Overview
Request 1180109 superseded
- Update to 1.5.0 (bsc#1226185, CVE-2024-5206):
## Security
* Fix: feature_extraction.text.CountVectorizer and
feature_extraction.text.TfidfVectorizer no longer store discarded
tokens from the training set in their stop_words_ attribute. This
attribute used to hold tokens that were too frequent (above max_df)
as well as tokens that were too rare (below min_df). This fixes a
potential security issue (data leak) when the discarded rare tokens
hold sensitive information from the training set without the model
developer’s knowledge.
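
A minimal sketch of the situation described above, assuming scikit-learn is
installed. The toy corpus and the token "secret123" are hypothetical. The
fitted vocabulary is the same on every release; only the presence of the
stop_words_ attribute depends on whether the installed version carries the
fix, so the example reads it defensively with getattr.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus: "secret123" occurs in a single document only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat saw secret123",
]

# min_df=2 drops every token that appears in fewer than two documents.
vectorizer = CountVectorizer(min_df=2)
vectorizer.fit(docs)

# Tokens that survived pruning.
vocabulary = sorted(vectorizer.vocabulary_)

# Releases without the fix keep the pruned tokens in stop_words_;
# fall back to None when the attribute is absent.
discarded = getattr(vectorizer, "stop_words_", None)

print("vocabulary:", vocabulary)
print("discarded tokens kept on the estimator:", discarded)
```

If the second line prints a set containing "secret123", a pickled copy of the
fitted vectorizer would carry that token along with it.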
## Changed models
* Efficiency: The subsampling in preprocessing.QuantileTransformer is
now more efficient for dense arrays, but the fitted quantiles and
the results of transform may differ slightly from before (while
keeping the same statistical properties). #27344 by Xuefeng Xu.
* Enhancement: decomposition.PCA, decomposition.SparsePCA and
decomposition.TruncatedSVD now set the sign of the components_
attribute based on the component values instead of using the
transformed data as reference. This change is needed to offer
consistent component signs across all PCA solvers, including the
new svd_solver="covariance_eigh" option introduced in this
release.
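
The sign convention above can be illustrated with a small NumPy sketch. The
helper name normalize_component_signs is hypothetical, and the rule it applies
(make the largest-magnitude entry of each component positive) is one plausible
convention that depends only on the component values; it is not presented as
scikit-learn's exact code. The example derives the same principal axes via an
SVD and via an eigendecomposition of X^T X and checks that the signs agree
after normalization.

```python
import numpy as np


def normalize_component_signs(components):
    """Flip each row so that its largest-magnitude entry is positive."""
    components = np.asarray(components, dtype=float)
    rows = np.arange(components.shape[0])
    max_abs_cols = np.argmax(np.abs(components), axis=1)
    signs = np.sign(components[rows, max_abs_cols])
    signs[signs == 0] = 1.0  # leave all-zero rows untouched
    return components * signs[:, np.newaxis]


# Hypothetical centered data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X -= X.mean(axis=0)

# Route 1: right singular vectors of the data matrix.
_, _, vt_svd = np.linalg.svd(X, full_matrices=False)

# Route 2: eigenvectors of X^T X, reordered to descending eigenvalues.
_, eigvecs = np.linalg.eigh(X.T @ X)
vt_eigh = eigvecs[:, ::-1].T

# Each route may return a component with either sign; after
# normalization both are expected to agree.
components_svd = normalize_component_signs(vt_svd)
components_eigh = normalize_component_signs(vt_eigh)
print(np.allclose(components_svd, components_eigh, atol=1e-6))
```

Because the rule looks only at the components themselves, it needs no access
to the transformed data, which is what lets solvers with different internals
share it.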
## Changes impacting many modules
* Fix: Raise a ValueError with an informative error message when
passing 1D sparse arrays to methods that expect 2D sparse inputs.
#28988 by Olivier Grisel.
* API Change The name of the input of the inverse_transform method
of estimators has been standardized to X. As a consequence, Xt is
deprecated and will be removed in version 1.7 in the following
estimators: cluster.FeatureAgglomeration,
decomposition.MiniBatchNMF, decomposition.NMF,
- Created by dgarcia
- In state superseded
- Superseded by 1180116
- Open review for licensedigger
- Open review for opensuse-review-team
- Open review for openSUSE:Factory:Staging:E
Request History
dgarcia created request
factory-auto added opensuse-review-team as a reviewer
Please review sources
factory-auto accepted review
Check script succeeded
anag+factory set openSUSE:Factory:Staging:E as a staging project
Being evaluated by staging project "openSUSE:Factory:Staging:E"
anag+factory accepted review
Picked "openSUSE:Factory:Staging:E"
superseded by 1180116