Revisions of xapian-core
buildservice-autocommit
accepted
request 1007235
from
Antonio Larrosa (alarrosa)
(revision 102)
baserev update by copy to link target
Antonio Larrosa (alarrosa)
accepted
request 1007094
from
Dirk Mueller (dirkmueller)
(revision 101)
- update to 1.4.21:
  * Stop trying to check for incompatible C++ ABI between the compiler used to build xapian-core and the compiler used to build code using xapian-core.
  * Fix new warnings from GCC 12.
  * Avoid undefined value use when unpacking a key in a corrupted glass docdata table. We now skip further checks on the entry in this case.
  * Merge allocations in MSVC directory reading compatibility code so we can allocate in a single malloc() call.
  * Add accept() wrapper which checks an assumption that Microsoft's SOCKET type only actually holds 32 bit values even in 64 bit platforms and throws an exception if violated.
  * Eliminate a use of sprintf.
  * Squash some unhelpful MSVC deprecation warnings.
  * Declare dummy invalid parameter handler noexcept to fix a warning from MSVC.
  * Include <stdlib.h> in configure check for sys_errlist as that's where it is with mingw and MSVC.
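The accept() wrapper item in 1.4.21 describes checking an assumption rather than silently truncating a value. A minimal sketch of that kind of check follows. The names `raw_socket` and `socket_to_u32` are hypothetical, and `std::uint64_t` is only a stand-in for a 64-bit SOCKET handle; this is not xapian-core's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical stand-in for a 64-bit socket handle (assumption for this sketch).
using raw_socket = std::uint64_t;

// Narrow the handle to 32 bits, throwing if the value would not fit.
inline std::uint32_t socket_to_u32(raw_socket s) {
    if (s > std::numeric_limits<std::uint32_t>::max()) {
        throw std::range_error("socket handle does not fit in 32 bits");
    }
    return static_cast<std::uint32_t>(s);
}

// Usage: a small handle passes through unchanged.
inline void socket_to_u32_example() {
    assert(socket_to_u32(42) == 42u);
}
```

Throwing on violation makes a broken assumption visible at the point of failure instead of producing a truncated handle later.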
buildservice-autocommit
accepted
request 990888
from
Factory Maintainer (factory-maintainer)
(revision 100)
baserev update by copy to link target
Dirk Mueller (dirkmueller)
accepted
request 989717
from
Dirk Mueller (dirkmueller)
(revision 99)
- update to 1.4.20:
  * Throw DatabaseNotFoundError when the database directory doesn't exist or when it doesn't contain a Xapian database. Patch from Germán Méndez Bravo in https://github.com/xapian/xapian/pull/258
  * Improve exception message for attempting to remove an empty term (the exception type is still InvalidArgumentError). Reported by David Bremner.
  * Optimise when a value range is a superset of the slot bounds but the value slot frequency is not equal to the document count by replacing the lower bound with an empty string to make the bounds check very cheap.
  * Avoid creating a PostList tree for an empty shard. This avoids pointless work in an uncommon case, but also by handling this up front the code in PostList subclasses for query operators can assume the shard isn't empty which simplifies the code in several places.
  * Remove lingering handling for database backends without slot bounds since all backends have been required to support these since 1.4.11.
  * Fix collection frequency estimates for positional operators. This affects the weighting of positional operators in subqueries of OP_SYNONYM with weighting schemes which use the collection frequency.
  * xapian-check: Test decompress data in the spelling and synonym tables. We don't have structure checking for these tables, but we can at least fetch each entry and check for decompression problems.
  * Improve error if a block is detected as overwritten in WritableDatabase. Drop "are there multiple writers?" as it's rarely a useful question to ask since we started using fcntl() locking as it's now very hard to get multiple concurrent writers on a database. Instead suggest running xapian-check, which is probably the best next step for a user who hits this problem.
Dirk Mueller (dirkmueller)
committed
(revision 98)
- update to 1.4.19:
  * New QueryParser::FLAG_NO_POSITIONS flag. With this flag enabled, any query operations which would use positional information are replaced by the nearest equivalent which doesn't (so phrase searches, NEAR and ADJ will result in OP_AND). This is intended to replace the automatic conversion of OP_PHRASE, etc to OP_AND when a database has no positional information, which will no longer happen in the release series after 1.4.
  * Give a compile error for code which adds a Database to WritableDatabase. Prior to 1.4.19, this compiled and effectively created a "black-hole" shard which quietly discarded any changes made to it. In 1.4.19 it's still possible to perform this operation by assigning the WritableDatabase to a Database first, which is harder to fix. This case throws an exception on git master where it's easier to address.
  * Fix TermIterator::skip_to() with sharded databases which sometimes was failing to advance all the way to the requested term. Uncovered while addressing a warning from GCC's -Wduplicated-cond, reported by dcb in #816.
  * Clamp edit distance to one less than the length of the word we've been asked to correct, which makes the algorithm we use more efficient. We already require suggestions to have at least one character in common, so the only change to suggestions is we'll no longer suggest corrections which are twice as long or longer even if the edit distance would allow it, which seems like an improvement in itself.
  * Minor optimisation expanding wildcards.
  * PostingIterator::get_description(): For an all-docs iterator on a glass database, get_description() would call get_docid() which isn't valid to do once the iterator has reached the end.
  * Expand allterms test coverage.
  * Fetch wdf upper bound from postlist which avoids an extra postlist table cursor seek per weighted query term, and also means we now use a per-shard wdf upper bound for local shards which will typically give a tighter
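The edit distance clamp described in the 1.4.19 notes can be sketched as a small helper. The function name `clamp_edit_distance` and the treatment of an empty word (returning 0) are assumptions for illustration; the changelog does not say how xapian-core measures the word length or handles that edge case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: limit the requested maximum edit distance to one less
// than the length of the word being corrected. An empty word yields 0
// (an assumption made for this sketch).
inline unsigned clamp_edit_distance(unsigned max_edit_distance,
                                    std::size_t word_length) {
    if (word_length == 0) return 0;
    const std::size_t limit = word_length - 1;
    return static_cast<unsigned>(
        std::min<std::size_t>(max_edit_distance, limit));
}

// Usage: a 3-letter word can be corrected with at most 2 edits.
inline void clamp_edit_distance_example() {
    assert(clamp_edit_distance(5, 3) == 2);
}
```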
buildservice-autocommit
accepted
request 864581
from
Dirk Mueller (dirkmueller)
(revision 97)
baserev update by copy to link target
Dirk Mueller (dirkmueller)
accepted
request 864450
from
Dirk Mueller (dirkmueller)
(revision 96)
- update to 1.4.18:
  * QueryParser::FLAG_ACCUMULATE: New flag. Previously the unstem and stoplist data was always reset by a call to QueryParser::parse_query(), which makes sense if you use the same QueryParser object to parse a series of independent queries. If you're using the same QueryParser object to parse several fields on the same query form, you may want to have the unstem and stoplist data combined for all of them, in which case you can use this flag to prevent this data from being reset.
  * QueryParser::unstem_begin(): Eliminate unnecessary copying of the data.
  * Fix typo in Swedish stopword list, syncing change made to Snowball by Daniel Gómez Villanueva.
  * Remove some French stop words with other meanings, syncing change made to Snowball by PhilippeOuellet.
  testsuite:
  * Run testcase testlock4 using backend chert, not just using glass
  * Skip testcase testlock4 on platforms that don't allow us to implement Database::locked() (which notably include GNU Hurd and Microsoft Windows).
  documentation:
  * List DB_NO_TERMLIST in the WritableDatabase constructor API documentation where we already list the other DB_* constants.
  portability:
  * Eliminate single use of std::mem_fun() which was deprecated in C++11 and removed in C++17. Reported by Mateusz Pusz in #806.
  * Add missing includes for std::numeric_limits<>. Reported by stac47 in #805.
  * Work around mingw.org header issue. MSVC seems to implicitly include <winerror.h> but mingw.org's headers don't, leading to ERROR_PIPE_CONNECTED not being defined. Fixes https://github.com/xapian/xapian/pull/318, reported by Alex Sandro.
  * Suppress MSVC warnings about possible loss of data. The values involved are the number of set bits in a value of integer type, so these warnings are
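The portability item about std::mem_fun() reflects a common migration: std::mem_fn (available since C++11) is one standard replacement that still compiles under C++17. The sketch below shows the pattern in general; the `Term` type and `term_lengths` function are hypothetical and do not come from xapian-core, and the changelog does not say which replacement upstream chose.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical type used only for this illustration.
struct Term {
    std::string name;
    std::size_t length() const { return name.size(); }
};

// Collect the length of each term by calling a member function through
// std::mem_fn, which remains valid in C++17 and later.
inline std::vector<std::size_t> term_lengths(const std::vector<Term>& terms) {
    std::vector<std::size_t> out;
    out.reserve(terms.size());
    std::transform(terms.begin(), terms.end(), std::back_inserter(out),
                   std::mem_fn(&Term::length));
    return out;
}

// Usage: one term of length 3.
inline void term_lengths_example() {
    std::vector<Term> terms;
    terms.push_back(Term{"cat"});
    assert(term_lengths(terms).front() == 3);
}
```

A lambda such as `[](const Term& t) { return t.length(); }` would serve equally well in place of `std::mem_fn`.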
buildservice-autocommit
accepted
request 829956
from
Antonio Larrosa (alarrosa)
(revision 95)
baserev update by copy to link target
Antonio Larrosa (alarrosa)
accepted
request 829895
from
Antonio Larrosa (alarrosa)
(revision 94)
- Update to 1.4.17:
  + API:
    * Database::get_average_length(): Add this as an alias for Database::get_avlen(). In git master we've added this as a preferred new name - adding it to 1.4.x too will make it easier for users to update to using this.
    * Database::get_spelling_suggestion(): Optimise edit distance initialisation loop to significantly reduce the cost of a typical edit distance calculation.
    * Fix query expansion on sharded databases. The mechanism for passing in which shard a TermList is from wasn't hooked up and as a result we'd always think it's from the first shard, meaning the statistics would be wrong and that our suggested terms may not have been as good as they should be in this situation.
    * Enquire::get_eset(): Use string::compare() to avoid 1/3 of the string compares on average.
  + documentation:
    * Update doxygen HTML headers and footers to resolve issues with some interactive features of the API docs not working. Reported by Enrico Zini.
    * Stop specifying obsolete doxygen settings PERL_PATH and MSCGEN_PATH.
    * Clarify API docs for MSet::get_termfreq() to make it clear that this considers all documents in the database, not only those that matched the search (it would sometimes be useful to be able to report the number of occurrences of a term in the matched documents, but it's not something we currently keep track of). Reported by Tadeusz Sośnierz and Peter Salomonsen.
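The Enquire::get_eset() item mentions using string::compare() to save string comparisons. The following sketch shows the general idea under an assumption: that the original code ordered two strings with a pair of `<` tests. Whether this matches the actual change is not stated in the changelog, and the `Order` enum and both function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical three-way ordering result.
enum class Order { Less, Equal, Greater };

// Two-step version: needs a second comparison whenever a is not less than b.
inline Order order_two_step(const std::string& a, const std::string& b) {
    if (a < b) return Order::Less;
    if (b < a) return Order::Greater;
    return Order::Equal;
}

// Single-call version: one compare() distinguishes all three outcomes.
inline Order order_one_call(const std::string& a, const std::string& b) {
    const int c = a.compare(b);
    if (c < 0) return Order::Less;
    if (c > 0) return Order::Greater;
    return Order::Equal;
}

// Usage: both versions agree on the result.
inline void order_example() {
    assert(order_one_call("apple", "pear") == order_two_step("apple", "pear"));
}
```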
Dirk Mueller (dirkmueller)
committed
(revision 93)
- update to 1.4.16:
  * MSet::snippet(): The snippet now includes trailing punctuation which carries meaning or gives useful context. See https://github.com/xapian/xapian/pull/180, reported by Robert Stepanek.
  * MSet::snippet(): Fix segfault generating snippet from default-constructed MSet. This probably isn't something you'd typically do, but it shouldn't crash. Found during extended testing of #803 (which only affected git master) which was reported by Robert Stepanek.
  * Remove trailing full stop from exception messages. We conventionally don't include one, but a few cases didn't follow that convention.
  testsuite:
  * Replace direct use of ftime() which gives deprecation warnings with recent mingw. Reported by srinivasyadav22.
  matcher:
  * Fix segfault in rare cases in the query optimiser. We keep a pointer to the most recent posting list to use as a hint for opening the next posting list, but the existing mechanism to take ownership of this hint had a flaw. We now invalidate the hint in situations where it might be indirectly deleted which is safe, but somewhat conservative.
  * Improve the optimisation of an always-matching OP_VALUE_GE to also take effect when the value slot's lower bound is equal to the limit of the OP_VALUE_GE. Patch from boda sadalla.
  glass backend:
  * Report the correct errno value if commit() fails. We were potentially reporting ENOENT from an unlink() call cleaning up a temporary file prior to throwing the exception instead.
  documentation:
  * Fix missing menus in API documentation. Newer doxygen generates .js files which we also need to distribute and install. Reported by sec^nd on #xapian.
  * Note OP_FILTER ignored subquery bug fixed in 1.4.15 as present in 1.4.14 and
buildservice-autocommit
accepted
request 799125
from
Antonio Larrosa (alarrosa)
(revision 92)
baserev update by copy to link target
Antonio Larrosa (alarrosa)
accepted
request 798996
from
Dominique Leuenberger (dimstar)
(revision 91)
Update to 1.4.15
buildservice-autocommit
accepted
request 765865
from
Dirk Mueller (dirkmueller)
(revision 90)
baserev update by copy to link target
Dirk Mueller (dirkmueller)
accepted
request 764595
from
Antonio Larrosa (alarrosa)
(revision 89)
- Update to 1.4.14:
  * API:
    + Xapian::QueryParser: Handle "" inside a quoted phrase better. In a quoted boolean term, "" is treated as an escaped ", so handle it in a compatible way for quoted phrases. Previously we'd drop out of the phrase and start a new phrase. Fixes #630, reported by Austin Clements.
    + Xapian::Stem: The constructor which takes a stemmer name now takes an optional second bool parameter - if this is true, then an unknown stemmer name falls back to using the "none" stemmer instead of throwing an exception. This allows simply constructing a stemmer from an ISO language code without having to worry about whether there's a stemmer for that language, and without having to handle an exception if there isn't.
    + Xapian::Stem: Fix a bug with handling 4-byte UTF-8 sequences which potentially affects most of the stemmers. None of the stemmers work in languages where 4-byte UTF-8 sequences are part of the alphabet, but this bug could result in invalid UTF-8 sequences in terms generated from text containing high Unicode codepoints such as emoji, which can cause issues (for example, in some language bindings). Fix synced from Snowball git post 2.0.0.
    + Xapian::Stem: Add a new is_none() method which tests if this is a "none" stemmer.
    + Xapian::Weight: The total length of all documents is now made available to Xapian::Weight subclasses, and this is now used by DLHWeight, DPHWeight and LMWeight. To maintain ABI compatibility, internally this still fetches the average length and the number of documents, multiplies them, then rounds the result, but in the next release series this will be handled directly.
    + Xapian::Database::locked() on an inmemory database used to always return false, but an inmemory Database is always actually a WritableDatabase underneath, so now we always report true in this case because it really is always locked for writing.
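The Xapian::Weight item says the total document length is derived by multiplying the average length by the number of documents and rounding. A minimal sketch of that arithmetic follows; the function name `total_length_from_average` and the choice of `std::llround` for the rounding step are assumptions, since the changelog does not specify how the rounding is done.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: recover the total length of all documents from the
// average length and the document count, rounding to the nearest integer.
inline std::uint64_t total_length_from_average(double average_length,
                                               std::uint64_t doc_count) {
    const double product = average_length * static_cast<double>(doc_count);
    return static_cast<std::uint64_t>(std::llround(product));
}

// Usage: 4 documents averaging 2.5 terms each hold 10 terms in total.
inline void total_length_example() {
    assert(total_length_from_average(2.5, 4) == 10);
}
```

Rounding matters because the stored average is a floating-point value, so the product may land slightly above or below the true integer total.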
buildservice-autocommit
accepted
request 650369
from
Dirk Mueller (dirkmueller)
(revision 88)
baserev update by copy to link target
Dirk Mueller (dirkmueller)
accepted
request 650355
from
Antonio Larrosa (alarrosa)
(revision 87)
- Update to 1.4.9:
  * API:
    + Document::add_posting(): Fix bugs with the change in 1.4.8 to more efficiently handle insertion of a batch of extra positions in ascending order. These could lead to missing positions and corrupted encoded positional data.
  * remote backend:
    + Avoid hang if remote connection shutdown fails by not waiting for the connection to close in this situation. Seems to fix occasional hangs seen on macOS. Patch from Germán M. Bravo.
- Update to 1.4.8:
  * API:
    + QueryParser,TermGenerator: Add new stemming mode STEM_SOME_FULL_POS. This stores positional information for both stemmed and unstemmed terms, allowing NEAR and ADJ to work with stemmed terms. The extra positional information is likely to take up a significant amount of extra disk space so the default STEM_SOME is likely to be a better choice for most users.
    + Database::check(): Fetch and decompress the document data to catch problems with the splitting of large data into multiple entries, corruption of the compressed data, etc. Also check that empty document data isn't explicitly stored for glass.
    + Fix an incorrect type being used for term positions in the TermGenerator API. These were Xapian::termcount but should be Xapian::termpos. Both are typedefs for the same 32-bit unsigned integer type by default (almost always "unsigned int") so this change is entirely compatible, except that if you were configuring 1.4.7 or earlier with --enable-64bit-termcount you need to also use the new --enable-64bit-termpos configure option with 1.4.8 and up or rebuild your applications. This change was necessary to make --enable-64bit-termpos actually useful.
    + Add Document::remove_postings() method which removes all postings in a
buildservice-autocommit
accepted
request 644271
from
Antonio Larrosa (alarrosa)
(revision 86)
baserev update by copy to link target
Antonio Larrosa (alarrosa)
accepted
request 644270
from
Antonio Larrosa (alarrosa)
(revision 85)
* API:
  + Database::check(): Fix bogus error reports for documents with length zero due to a new check added in 1.4.6 that the doclength was between the stored upper and lower bounds, which failed to allow for the lower bound ignoring documents with length zero (since documents indexed only by boolean terms aren't involved in weighted searches).
  + Query: Use of Query::MatchAll in multithreaded code causes problems because the reference counting gets messed up by concurrent updates. Document that Query(string()) should be used instead of MatchAll in multithreaded code, and avoid using it in library code.
* Stem:
  + Stemming algorithms added for Irish, Lithuanian, Nepali and Tamil.
  + Merge Snowball compiler changes which improve code generation.
  + Merge optimisations to the Arabic and Turkish stemmers.
* testsuite:
  + Fix duplicate test in apitest closedb10 testcase.
* See also https://xapian.org/docs/xapian-core-1.4.7/NEWS
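The Database::check() fix above concerns a bounds test that must exempt zero-length documents, because the stored lower bound ignores them. A sketch of such a test follows; the function name `doclength_within_bounds` and the 32-bit length type are assumptions for illustration, not the actual xapian-core implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical consistency check: a document length is acceptable if it is
// zero (the lower bound does not account for zero-length documents) or if it
// lies between the stored lower and upper bounds, inclusive.
inline bool doclength_within_bounds(std::uint32_t doclen,
                                    std::uint32_t lower_bound,
                                    std::uint32_t upper_bound) {
    if (doclen == 0) return true;
    return doclen >= lower_bound && doclen <= upper_bound;
}

// Usage: a zero-length document is not reported as an error.
inline void doclength_example() {
    assert(doclength_within_bounds(0, 3, 10));
}
```

Without the zero-length exemption, a document indexed only by boolean terms would be flagged as being below the lower bound, which is the bogus report the changelog describes.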
Antonio Larrosa (alarrosa)
accepted
request 644121
from
Sean Lewis (seanlew)
(revision 84)
Update xapian-core to 1.4.7
buildservice-autocommit
accepted
request 626669
from
Factory Maintainer (factory-maintainer)
(revision 83)
baserev update by copy to link target