Revisions of python-charset-normalizer

Ana Guerrero (anag+factory) accepted request 1221058 from

Factory Maintainer (factory-maintainer) 10 days ago (revision 24)

Automatic submission by obs-autosubmit

Ana Guerrero (anag+factory) accepted request 1217078 from

Dirk Mueller (dirkmueller) 23 days ago (revision 23)

- update to 3.4.0:
  * Argument `--no-preemptive` in the CLI to prevent the detector
    from searching for hints.
  * Support for Python 3.13
  * Relax the TypeError exception thrown when trying to compare a
    CharsetMatch with anything other than a CharsetMatch.
  * Improved the general reliability of the detector based on
    user feedback. (#520) (#509) (#498) (#407)
  * Declared charset in content (preemptive detection) not
    changed when converting to utf-8 bytes.

Ana Guerrero (anag+factory) accepted request 1128743 from

Dirk Mueller (dirkmueller) 12 months ago (revision 22)

- update to 3.3.2:
  * Unintentional memory usage regression when using large
    payloads that match several encodings (#376)
  * Regression in some detection cases showcased in the
    documentation (#371)
  * Noise (md) probe that identifies malformed Arabic
    representation due to the presence of letters in isolated
    form
  * Optional mypyc compilation upgraded to version 1.6.1 for
    Python >= 3.8
  * Improved the general detection reliability based on reports
    from the community

Ana Guerrero (anag+factory) accepted request 1114778 from

Dirk Mueller (dirkmueller) 12 months ago (revision 21)

- update to 3.3.0:
  * Allow executing the CLI (e.g. normalizer) through `python -m
    charset_normalizer.cli` or `python -m charset_normalizer`
  * Support for 9 forgotten encodings that are supported by Python
    but unlisted in `encodings.aliases` as they have no alias
  * Optional mypyc compilation upgraded to version 1.5.1 for
    Python >= 3.7
  * Unable to properly sort CharsetMatch when both chaos/noise
    and coherence were close due to an unreachable condition in
    `__lt__` (#350)

- Update to 3.0.1
- Update to 3.0.0
  * ASCII misdetection in rare cases (PR #170)
  * Wrong logging level applied when setting kwarg `explain` to True
- require lower-case name instead of breaking build
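
The 3.3.0 entry above refers to encodings that Python ships but that have no entry in the standard library's `encodings.aliases` table. As a rough illustration of how such codecs can be spotted, the sketch below lists the modules of the `encodings` package whose name is a usable codec but never appears as an alias target. The helper name `codecs_without_alias` is hypothetical, this is not how charset-normalizer itself builds its list, and the result varies between Python versions.

```python
import codecs
import encodings
import pkgutil
from encodings.aliases import aliases


def codecs_without_alias():
    """Return codec module names that no alias in encodings.aliases points to.

    Hypothetical helper for illustration only.
    """
    aliased = set(aliases.values())
    found = []
    for module in pkgutil.iter_modules(encodings.__path__):
        name = module.name
        # Skip private helper modules and anything reachable through an alias.
        if name.startswith("_") or name in aliased:
            continue
        try:
            # Modules that are not codecs (or are unavailable on this
            # platform) fail the lookup and are ignored.
            codecs.lookup(name)
        except LookupError:
            continue
        found.append(name)
    return sorted(found)


unaliased = codecs_without_alias()
```

Printing `unaliased` on a given interpreter shows which codecs would be missed by a detector that only iterates over the alias table.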

Ana Guerrero (anag+factory) accepted request 1098807 from

Dirk Mueller (dirkmueller) over 1 year ago (revision 20)

- update to 3.2.0:
  * Typehint for function `from_path` no longer enforces
    `PathLike` as its first argument
  * Minor improvement over the global detection reliability
  * Introduce function `is_binary` that relies on main
    capabilities, and optimized to detect binaries
  * Propagate `enable_fallback` argument throughout `from_bytes`,
    `from_path`, and `from_fp`, which allows deeper control over
    the detection (default True)
  * Edge case detection failure where a file would contain a
    'very-long' camel-cased word (Issue #289)

Dominique Leuenberger (dimstar_suse) accepted request 1084939 from

Dirk Mueller (dirkmueller) over 1 year ago (revision 19)

- add sle15_python_module_pythons (jsc#PED-68)

Dominique Leuenberger (dimstar_suse) accepted request 1074517 from

Dirk Mueller (dirkmueller) over 1 year ago (revision 18)

- update to 3.1.0:
  * Argument `should_rename_legacy` for legacy function `detect`
    and disregard any new arguments without errors (PR #262)
  * Removed Support for Python 3.6 (PR #260)
  * Optional speedup provided by mypy/c 1.0.1

Dominique Leuenberger (dimstar_suse) accepted request 1039740 from

Dirk Mueller (dirkmueller) almost 2 years ago (revision 17)

Dominique Leuenberger (dimstar_suse) accepted request 1032182 from

Matej Cepl (mcepl) about 2 years ago (revision 16)

Forwarded request #1031656 from yarunachalam

- Update to 3.0.0
  * Added
    + Extend the capability of explain=True: when cp_isolation contains
      at most two entries (min one), the details of the mess detector
      results are logged
    + Support for alternative language frequency set in
      charset_normalizer.assets.FREQUENCIES
    + Add parameter language_threshold in from_bytes, from_path and
      from_fp to adjust the minimum expected coherence ratio
    + normalizer --version now specifies whether the current version
      provides extra speedup (meaning mypyc compilation whl)
  * Changed
    + Build with static metadata using 'build' frontend
    + Make the language detection stricter
    + Optional: Module md.py can be compiled using Mypyc to provide an
      extra speedup, up to 4x faster than v2.1
  * Fixed
    + CLI with opt --normalize fails when using full path for files
    + TooManyAccentuatedPlugin induces false positives on the mess
      detection when too few alpha characters have been fed to it
    + Sphinx warnings when generating the documentation
  * Removed
    + Coherence detector no longer returns 'Simple English'; it
      returns 'English' instead
    + Coherence detector no longer returns 'Classical Chinese'; it
      returns 'Chinese' instead
    + Breaking: Method first() and best() from CharsetMatch
    + UTF-7 will no longer appear as "detected" without a recognized
      SIG/mark (is unreliable/conflict with ASCII)
    + Breaking: Class aliases CharsetDetector, CharsetDoctor,
      CharsetNormalizerMatch and CharsetNormalizerMatches
    + Breaking: Top-level function normalize
    + Breaking: Properties chaos_secondary_pass, coherence_non_latin
      and w_counter from CharsetMatch
    + Support for the backport unicodedata2

Dominique Leuenberger (dimstar_suse) accepted request 1004361 from

Dirk Mueller (dirkmueller) about 2 years ago (revision 15)

- update to 2.1.1:
  * Function `normalize` scheduled for removal in 3.0
  * Removed useless call to decode in fn is_unprintable (#206)

Dominique Leuenberger (dimstar_suse) accepted request 998090 from

Dirk Mueller (dirkmueller) about 2 years ago (revision 14)

Richard Brown (RBrownFactory) accepted request 991152 from

Dirk Mueller (dirkmueller) over 2 years ago (revision 13)

- update to 2.1.0:
  * Output the Unicode table version when running the CLI with `--version`
  * Re-use decoded buffer for single byte character sets
  * Fixing some performance bottlenecks
  * Workaround potential bug in cpython with Zero Width No-Break Space
    located in Arabic Presentation Forms-B, Unicode 1.1 not
    acknowledged as space
  * CLI default threshold aligned with the API threshold from
  * Removed support for Python 3.5 (PR #192)
  * Deprecated use of backport unicodedata from `unicodedata2` as
    Python is quickly catching up, scheduled for removal in 3.0

Dominique Leuenberger (dimstar_suse) accepted request 954654 from

Dirk Mueller (dirkmueller) over 2 years ago (revision 12)

- update to 2.0.12:
  * ASCII misdetection in rare cases (PR #170)
  * Explicit support for Python 3.11 (PR #164)
  * The logging behavior has been completely reviewed, now using only TRACE
    and DEBUG levels

Dominique Leuenberger (dimstar_suse) accepted request 945443 from

Dirk Mueller (dirkmueller) almost 3 years ago (revision 11)

- update to 2.0.10:
  * Fallback match entries might lead to UnicodeDecodeError for large byte
    sequences
  * Skipping the language-detection (CD) on ASCII

Dominique Leuenberger (dimstar_suse) accepted request 936118 from

Dirk Mueller (dirkmueller) almost 3 years ago (revision 10)

- update to 2.0.9:
  * Moderating the logging impact (since 2.0.8) for specific
    environments
  * Wrong logging level applied when setting kwarg `explain` to True

Dominique Leuenberger (dimstar_suse) accepted request 934519 from

Dirk Mueller (dirkmueller) almost 3 years ago (revision 9)

- update to 2.0.8:
  * Improvement over Vietnamese detection
  * MD improvement on trailing data and long foreign (non-pure latin)
  * Efficiency improvements in cd/alphabet_languages
  * Call sum() without an intermediary list, following PEP 289 recommendations
  * Code style as refactored by Sourcery-AI
  * Minor adjustment on the MD around European words
  * Remove and replace SRTs from assets / tests
  * Initialize the library logger with a `NullHandler` by default
  * Setting kwarg `explain` to True will add provisionally
  * Fix large (misleading) sequence giving UnicodeDecodeError
  * Avoid using too insignificant chunk
  * Add and expose function `set_logging_handler` to configure a specific
    StreamHandler

- require lower-case name instead of breaking build 

- Use lower-case name of prettytable package
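
The 2.0.8 notes above mention initializing the library logger with a `NullHandler` and exposing a function to attach a specific `StreamHandler`. A minimal sketch of that general pattern with the standard `logging` module follows. The logger name `example_library` and the helper `attach_stream_handler` are hypothetical stand-ins, not charset-normalizer's own code or the signature of its `set_logging_handler`.

```python
import logging

# Hypothetical library logger; the name is chosen for illustration only.
logger = logging.getLogger("example_library")
# A NullHandler keeps the library silent unless the application opts in.
logger.addHandler(logging.NullHandler())


def detect_something(payload: bytes) -> int:
    """Stand-in for library work that emits debug records."""
    logger.debug("payload of %d bytes received", len(payload))
    return len(payload)


def attach_stream_handler(level: int = logging.DEBUG) -> logging.Handler:
    """Hypothetical opt-in helper: send the library's records to stderr."""
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler
```

With only the `NullHandler` attached, calling `detect_something(b"abc")` prints nothing. After `attach_stream_handler()`, the same call writes a DEBUG line to stderr.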

Dominique Leuenberger (dimstar_suse) accepted request 927599 from

Dirk Mueller (dirkmueller) about 3 years ago (revision 8)

- Update to version 2.0.7
  * Addition: Add support for Kazakh (Cyrillic) language
    detection
  * Improvement: Further improve inferring the language
    from a given code page (single-byte).
  * Removed: Remove redundant logging entry about detected
    language(s).
  * Improvement: Refactoring for potential performance
    improvements in loops.
  * Improvement: Various detection improvement (MD+CD).
  * Bugfix: Fix a minor inconsistency between Python 3.5 and
    other versions regarding language detection.
- Update to version 2.0.6
  * Bugfix: Unforeseen regression with the loss of the
    backward-compatibility with some older minor versions of
    Python 3.5.x.
  * Bugfix: Fix CLI crash when using --minimal output in
    certain cases.
  * Improvement: Minor improvement to the detection
    efficiency (less than 1%).
- Update to version 2.0.5
  * Improvement: The BC-support with v1.x was improved,
    the old staticmethods are restored.
  * Remove: The project no longer raises a warning on tiny
    content given for detection; it is simply logged as a
    warning instead.
  * Improvement: The Unicode detection is slightly
    improved, see #93
  * Bugfix: In some rare cases, the chunks extractor could cut
    in the middle of a multi-byte character and could mislead the
    mess detection.
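
The last 2.0.5 bugfix above concerns chunks that end in the middle of a multi-byte character. To illustrate the underlying problem (not the library's actual fix), the sketch below shows a naive byte slice failing to decode, and an incremental decoder from the standard `codecs` module carrying the partial character over to the next chunk. The helper name `decode_chunks` is hypothetical.

```python
import codecs


def decode_chunks(payload: bytes, chunk_size: int, encoding: str = "utf-8"):
    """Decode fixed-size byte chunks without splitting multi-byte characters.

    Hypothetical helper: the incremental decoder buffers an incomplete
    trailing sequence and completes it with the next chunk.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    pieces = []
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        is_last = start + chunk_size >= len(payload)
        pieces.append(decoder.decode(chunk, final=is_last))
    return pieces


payload = "h\u00e9llo".encode("utf-8")  # U+00E9 takes two bytes in UTF-8

# A naive two-byte slice ends between the two bytes of U+00E9.
try:
    payload[:2].decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

pieces = decode_chunks(payload, 2)
```

Here `naive_ok` ends up `False`, while joining `pieces` gives back the original string.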

Dominique Leuenberger (dimstar_suse) accepted request 894589 from