Revisions of python-nltk

Ana Guerrero (anag+factory) accepted request 1218976 from Daniel Garcia (dgarcia) (revision 19)
- Use _service to download source and exclude documentation that has
  a non-commercial license (boo#1232448)
- Remove nltk_data to avoid redistribution of files with
  non-commercial licenses (boo#1232448):
  > NLTK corpora are provided under the terms given in the README file
  > for each corpus; all are redistributable and available for
  > non-commercial use.
- Remove unneeded skip-networked-test.patch
Ana Guerrero (anag+factory) accepted request 1218494 from Matej Cepl (mcepl) (revision 18)
- Update to 3.9.1 (changes since 3.8.1):
  * Fixed bug that prevented wordnet from loading
  * Fix security vulnerability CVE-2024-39705 (breaking change)
  * Replace pickled models (punkt, chunker, taggers) by new
    pickle-free "_tab" packages
  * No longer sort Wordnet synsets and relations (sort in calling
    function when required)
  * Only strip the last suffix in Wordnet Morphy, thus
    restricting synsets() results
  * Add Python 3.12 support
  * Many other minor fixes
- Refresh nltk_data
- Remove upstreamed patches:
  - CVE-2024-39705.patch
  - nltk-pr3207-py312.patch

- Update to 3.8
Dominique Leuenberger (dimstar_suse) accepted request 1189727 from Daniel Garcia (dgarcia) (revision 17)
- Add CVE-2024-39705.patch upstream patch to fix unsafe pickle usage.
  (CVE-2024-39705, gh#nltk/nltk#3266, bsc#1227174).
- Drop CVE-2024-39705-disable-download.patch as it's not needed
  anymore.
Ana Guerrero (anag+factory) accepted request 1185062 from Matej Cepl (mcepl) (revision 16)
- Use tarball from GitHub instead of the Zip archive from PyPI;
  the latter has a very messy combination of CRLF and LF EOLs,
  which is hard to patch.
- Refresh all patches from the original locations.
- Add CVE-2024-39705-disable-download.patch as a crude
  workaround for CVE-2024-39705 (gh#nltk/nltk#3266,
  bsc#1227174).
Dominique Leuenberger (dimstar_suse) accepted request 1077159 from Factory Maintainer (factory-maintainer) (revision 14)
Automatic submission by obs-autosubmit
Dominique Leuenberger (dimstar_suse) accepted request 1045543 from Matej Cepl (mcepl) (revision 12)
- Complete nltk_data.tar.xz for offline testing
- Fix failing tests (gh#nltk/nltk#2969) by adding patches:
  - port-2to3.patch
  - skip-networked-test.patch
- Clean up the SPEC to get rid of rpmlint warnings.
Dominique Leuenberger (dimstar_suse) accepted request 965220 from Dirk Mueller (dirkmueller) (revision 11)
- Update to 3.7
  - Improve and update the NLTK team page on nltk.org (#2855,
    #2941)
  - Drop support for Python 3.6, support Python 3.10 (#2920)
- Update to 3.6.7
  - Resolve IndexError in `sent_tokenize` and `word_tokenize`
    (#2922)
- Update to 3.6.6
  - Refactor `gensim.doctest` to work for gensim 4.0.0 and up
    (#2914)
  - Add Precision, Recall, F-measure, Confusion Matrix to Taggers
    (#2862)
  - Added warnings if .zip files exist without any corresponding
    .csv files. (#2908)
  - Fix `FileNotFoundError` when the `download_dir` is
    a non-existing nested folder (#2910)
  - Rename omw to omw-1.4 (#2907)
  - Resolve ReDoS opportunity by fixing incorrectly specified
    regex (#2906, bsc#1191030, CVE-2021-3828).
  - Support OMW 1.4 (#2899)
  - Deprecate Tree get and set node methods (#2900)
  - Fix broken inaugural test case (#2903)
  - Use Multilingual Wordnet Data from OMW with newer Wordnet
    versions (#2889)
  - Keep NLTK's "tokenize" module working with pathlib (#2896)
  - Make prettyprinter more readable (#2893)
  - Update links to the nltk book (#2895)
  - Add `CITATION.cff` to nltk (#2880)
  - Resolve serious ReDoS in PunktSentenceTokenizer (#2869)
  - Delete old CI config files (#2881)
Dominique Leuenberger (dimstar_suse) accepted request 812413 from Tomáš Chvátal (scarabeus_iv) (revision 10)
- Update to v3.5
  * add support for Python 3.8
  * drop support for Python 2
  * create NLTK's own Tokenizer class distinct from the Treebank
    reference tokeniser
  * update Vader sentiment analyser
  * fix JSON serialization of some PoS taggers
  * minor improvements in grammar.CFG, Vader, pl196x corpus reader,
    StringTokenizer
  * change implementation of <= and >= for FreqDist so they are partial
    orders
  * make FreqDist iterable
  * correctly handle Penn Treebank trees with an unlabeled branching
    top node
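
The FreqDist comparison item above can be illustrated with a minimal sketch. It does not reproduce NLTK's code: it uses collections.Counter from the standard library as a stand-in for FreqDist, and the helper name is_subdistribution is hypothetical. It assumes non-negative counts and treats "a <= b" as "every count in a is at most the matching count in b", which is a partial order because two distributions can be incomparable.

```python
from collections import Counter


def is_subdistribution(a: Counter, b: Counter) -> bool:
    """Return True when every count in `a` is at most the count in `b`.

    Hypothetical helper for illustration only; a missing key in `b`
    counts as zero because Counter returns 0 for unknown keys.
    """
    return all(count <= b[key] for key, count in a.items())


small = Counter("aab")       # counts: a=2, b=1
large = Counter("aabbc")     # counts: a=2, b=2, c=1
other = Counter("xyz")       # shares no keys with `small`

# `small` is contained in `large`, but not the other way round.
contained = is_subdistribution(small, large)
reverse = is_subdistribution(large, small)

# `small` and `other` are incomparable: neither is contained in the other.
# A total order could not express this; a partial order can.
incomparable = (not is_subdistribution(small, other)
                and not is_subdistribution(other, small))
```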
Dominique Leuenberger (dimstar_suse) accepted request 787913 from Dirk Mueller (dirkmueller) (revision 9)
- Update to 3.4.5 (bsc#1146427, CVE-2019-14751):
Dominique Leuenberger (dimstar_suse) accepted request 738364 from Matej Cepl (mcepl) (revision 7)
Replace %fdupes -s with plain %fdupes; hardlinks are better.
Ludwig Nussel (lnussel_factory) accepted request 730102 from Tomáš Chvátal (scarabeus_iv) (revision 6)
- Update to 3.4.5:
  * Fixed security bug in downloader: Zip slip vulnerability - for the
    unlikely situation where a user configures their downloader to use
    a compromised server CVE-2019-14751
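
For readers unfamiliar with the term, a Zip slip attack relies on archive member names such as "../evil.txt" that resolve outside the extraction directory. The following is a minimal sketch of the usual defence, not the fix that NLTK shipped: the function names is_inside and safe_extract are hypothetical, and only standard-library calls are used.

```python
import os
import zipfile


def is_inside(root: str, member: str) -> bool:
    """Return True when `member`, joined onto `root`, stays under `root`."""
    real_root = os.path.realpath(root)
    target = os.path.realpath(os.path.join(real_root, member))
    # commonpath equals the root only when the target lies within it.
    return os.path.commonpath([real_root, target]) == real_root


def safe_extract(archive: zipfile.ZipFile, dest_dir: str) -> None:
    """Extract `archive` into `dest_dir`, rejecting escaping member names."""
    for member in archive.namelist():
        if not is_inside(dest_dir, member):
            raise ValueError(f"unsafe path in archive: {member!r}")
    archive.extractall(dest_dir)
```

The check runs over every member before anything is written, so a single malicious entry rejects the whole archive instead of leaving a partial extraction behind.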
Dominique Leuenberger (dimstar_suse) accepted request 717915 from Tomáš Chvátal (scarabeus_iv) (revision 5)
- Update to 3.4.4:
  * fix bug in plot function (probability.py)
  * add improved PanLex Swadesh corpus reader
  * add Text.generate()
  * add QuadgramAssocMeasures
  * add SSP to tokenizers
  * return confidence of best tag from AveragedPerceptron
  * make plot methods return Axes objects
  * don't require list arguments to PositiveNaiveBayesClassifier.train
  * fix Tree classes to work with native Python copy library
  * fix inconsistency for NomBank
  * fix random seeding in LanguageModel.generate
  * fix ConditionalFreqDist mutation on tabulate/plot call
  * fix broken links in documentation
  * fix misc Wordnet issues
  * update installation instructions
Dominique Leuenberger (dimstar_suse) accepted request 603179 from Tomáš Chvátal (scarabeus_iv) (revision 2)
- Trim redundant wording from description.
Dominique Leuenberger (dimstar_suse) accepted request 583014 from Atri Bhattacharya (badshah400) (revision 1)
NLTK -- the Natural Language Toolkit -- is a suite of open source Python modules, data sets and tutorials supporting research and development in Natural Language Processing.
Displaying all 19 revisions