Revisions of python-charset-normalizer
Ana Guerrero (anag+factory) accepted request 1221058 from Factory Maintainer (factory-maintainer) (revision 24)
Automatic submission by obs-autosubmit
Ana Guerrero (anag+factory) accepted request 1217078 from Dirk Mueller (dirkmueller) (revision 23)
- update to 3.4.0:
  * Argument `--no-preemptive` in the CLI to prevent the detector to search for hints.
  * Support for Python 3.13
  * Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything else than a CharsetMatch.
  * Improved the general reliability of the detector based on user feedbacks. (#520) (#509) (#498) (#407)
  * Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes.
Ana Guerrero (anag+factory) accepted request 1128743 from Dirk Mueller (dirkmueller) (revision 22)
- update to 3.3.2:
  * Unintentional memory usage regression when using large payload that match several encoding (#376)
  * Regression on some detection case showcased in the documentation (#371)
  * Noise (md) probe that identify malformed arabic representation due to the presence of letters in isolated form
  * Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
  * Improved the general detection reliability based on reports from the community
Ana Guerrero (anag+factory) accepted request 1114778 from Dirk Mueller (dirkmueller) (revision 21)
- update to 3.3.0:
  * Allow to execute the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
  * Support for 9 forgotten encoding that are supported by Python but unlisted in `encoding.aliases` as they have no alias
  * Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.7
  * Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in `__lt__` (#350)
- Update to 3.0.1
- Update to 3.0.0
  * ASCII miss-detection on rare cases (PR #170)
  * Wrong logging level applied when setting kwarg `explain` to True
- require lower-case name instead of breaking build
Ana Guerrero (anag+factory) accepted request 1098807 from Dirk Mueller (dirkmueller) (revision 20)
- update to 3.2.0:
  * Typehint for function `from_path` no longer enforce `PathLike` as its first argument
  * Minor improvement over the global detection reliability
  * Introduce function `is_binary` that relies on main capabilities, and optimized to detect binaries
  * Propagate `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp` that allow a deeper control over the detection (default True)
  * Edge case detection failure where a file would contain 'very-long' camel cased word (Issue #289)
Dominique Leuenberger (dimstar_suse) accepted request 1084939 from Dirk Mueller (dirkmueller) (revision 19)
- add sle15_python_module_pythons (jsc#PED-68)
Dominique Leuenberger (dimstar_suse) accepted request 1074517 from Dirk Mueller (dirkmueller) (revision 18)
- update to 3.1.0:
  * Argument `should_rename_legacy` for legacy function `detect` and disregard any new arguments without errors (PR #262)
  * Removed Support for Python 3.6 (PR #260)
  * Optional speedup provided by mypy/c 1.0.1
Dominique Leuenberger (dimstar_suse) accepted request 1039740 from Dirk Mueller (dirkmueller) (revision 17)
Dominique Leuenberger (dimstar_suse) accepted request 1032182 from Matej Cepl (mcepl) (revision 16)
Forwarded request #1031656 from yarunachalam
- Update to 3.0.0
  Added
  * Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
  * Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
  * Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
  * normalizer --version now specify if current version provide extra speedup (meaning mypyc compilation whl)
  Changed
  * Build with static metadata using 'build' frontend
  * Make the language detection stricter
  * Optional: Module md.py can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1
  Fixed
  * CLI with opt --normalize fail when using full path for files
  * TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha character have been fed to it
  * Sphinx warnings when generating the documentation
  Removed
  * Coherence detector no longer return 'Simple English' instead return 'English'
  * Coherence detector no longer return 'Classical Chinese' instead return 'Chinese'
  * Breaking: Method first() and best() from CharsetMatch
  * UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
  * Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
  * Breaking: Top-level function normalize
  * Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
  * Support for the backport unicodedata2
Dominique Leuenberger (dimstar_suse) accepted request 1004361 from Dirk Mueller (dirkmueller) (revision 15)
- update to 2.1.1:
  * Function `normalize` scheduled for removal in 3.0
  * Removed useless call to decode in fn is_unprintable (#206)
Dominique Leuenberger (dimstar_suse) accepted request 998090 from Dirk Mueller (dirkmueller) (revision 14)
Richard Brown (RBrownFactory) accepted request 991152 from Dirk Mueller (dirkmueller) (revision 13)
- update to 2.1.0:
  * Output the Unicode table version when running the CLI with `--version`
  * Re-use decoded buffer for single byte character sets
  * Fixing some performance bottlenecks
  * Workaround potential bug in cpython with Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space
  * CLI default threshold aligned with the API threshold from
  * Support for Python 3.5 (PR #192)
  * Use of backport unicodedata from `unicodedata2` as Python is quickly catching up, scheduled for removal in 3.0
Dominique Leuenberger (dimstar_suse) accepted request 954654 from Dirk Mueller (dirkmueller) (revision 12)
- update to 2.0.12:
  * ASCII miss-detection on rare cases (PR #170)
  * Explicit support for Python 3.11 (PR #164)
  * The logging behavior have been completely reviewed, now using only TRACE and DEBUG levels
Dominique Leuenberger (dimstar_suse) accepted request 945443 from Dirk Mueller (dirkmueller) (revision 11)
- update to 2.0.10:
  * Fallback match entries might lead to UnicodeDecodeError for large bytes sequence
  * Skipping the language-detection (CD) on ASCII
Dominique Leuenberger (dimstar_suse) accepted request 936118 from Dirk Mueller (dirkmueller) (revision 10)
- update to 2.0.9:
  * Moderating the logging impact (since 2.0.8) for specific environments
  * Wrong logging level applied when setting kwarg `explain` to True
Dominique Leuenberger (dimstar_suse) accepted request 934519 from Dirk Mueller (dirkmueller) (revision 9)
- update to 2.0.8:
  * Improvement over Vietnamese detection
  * MD improvement on trailing data and long foreign (non-pure latin)
  * Efficiency improvements in cd/alphabet_languages
  * call sum() without an intermediary list following PEP 289 recommendations
  * Code style as refactored by Sourcery-AI
  * Minor adjustment on the MD around european words
  * Remove and replace SRTs from assets / tests
  * Initialize the library logger with a `NullHandler` by default
  * Setting kwarg `explain` to True will add provisionally
  * Fix large (misleading) sequence giving UnicodeDecodeError
  * Avoid using too insignificant chunk
  * Add and expose function `set_logging_handler` to configure a specific StreamHandler
- require lower-case name instead of breaking build
- Use lower-case name of prettytable package
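The logging items in 2.0.8 match a common pattern for Python libraries: attach a `NullHandler` to the library logger so that nothing is emitted unless the application opts in. The sketch below uses only the standard `logging` module. The logger name `demo_library` and the helper `attach_stream_handler` are hypothetical stand-ins for illustration; this is not charset_normalizer's own implementation of `set_logging_handler`.

```python
import io
import logging

# Hypothetical logger name; a real library would use its own package name.
LOGGER_NAME = "demo_library"

logger = logging.getLogger(LOGGER_NAME)
# Library default: records are swallowed unless the application configures logging.
logger.addHandler(logging.NullHandler())


def attach_stream_handler(level=logging.INFO, stream=None):
    """Hypothetical opt-in helper: route this logger's records to a stream."""
    handler = logging.StreamHandler(stream)  # stream=None falls back to sys.stderr
    handler.setFormatter(
        logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler


# Usage: capture output in memory to show that records now reach the handler.
buffer = io.StringIO()
attach_stream_handler(logging.DEBUG, buffer)
logger.debug("detector started")
captured = buffer.getvalue()
```

Keeping the `NullHandler` as the default leaves the decision about log output with the application, and the opt-in helper is the only place where a real handler gets attached.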
Dominique Leuenberger (dimstar_suse) accepted request 927599 from Dirk Mueller (dirkmueller) (revision 8)
- Update to version 2.0.7
  * Addition: Add support for Kazakh (Cyrillic) language detection
  * Improvement: Further improve inferring the language from a given code page (single-byte).
  * Removed: Remove redundant logging entry about detected language(s).
  * Improvement: Refactoring for potential performance improvements in loops.
  * Improvement: Various detection improvement (MD+CD).
  * Bugfix: Fix a minor inconsistency between Python 3.5 and other versions regarding language detection.
- Update to version 2.0.6
  * Bugfix: Unforeseen regression with the loss of the backward-compatibility with some older minor of Python 3.5.x.
  * Bugfix: Fix CLI crash when using --minimal output in certain cases.
  * Improvement: Minor improvement to the detection efficiency (less than 1%).
- Update to version 2.0.5
  * Improvement: The BC-support with v1.x was improved, the old staticmethods are restored.
  * Remove: The project no longer raise warning on tiny content given for detection, will be simply logged as warning instead.
  * Improvement: The Unicode detection is slightly improved, see #93
  * Bugfix: In some rare case, the chunks extractor could cut in the middle of a multi-byte character and could mislead the mess detection.
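The last 2.0.5 bugfix concerns chunks that end in the middle of a multi-byte character. The sketch below illustrates that failure mode with the standard library only; the helper names are hypothetical and it does not reproduce the library's chunk extractor or its actual fix. Splitting UTF-8 bytes at fixed offsets can leave a partial character at a chunk boundary, while an incremental decoder carries the partial bytes over to the next chunk.

```python
import codecs

# "naive cafe" with two accented letters, each encoded as 2 bytes in UTF-8.
text = "na\u00efve caf\u00e9"
payload = text.encode("utf-8")


def fixed_chunks(data, size):
    """Split bytes at fixed offsets, ignoring character boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def decode_chunks_strict(chunks):
    """Decode each chunk on its own; return True only if every chunk decodes."""
    try:
        for chunk in chunks:
            chunk.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_chunks_incremental(chunks):
    """Decode with an incremental decoder that buffers partial characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush; raises if bytes are left over
    return "".join(parts)


# The first 3-byte chunk ends on the lead byte of the first accented letter.
chunks = fixed_chunks(payload, 3)
independent_ok = decode_chunks_strict(chunks)
recovered = decode_chunks_incremental(chunks)
```

Here `independent_ok` is `False` because the first chunk ends inside a character, whereas `recovered` equals the original text.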
Dominique Leuenberger (dimstar_suse) accepted request 894589 from Markéta Machová (mcalabkova) (revision 7)