Revisions of python-Scrapy
Ana Guerrero (anag+factory) accepted request 1186841 from Dirk Mueller (dirkmueller) (revision 21)
- update to 2.11.2 (bsc#1224474, CVE-2024-1968):
  * Redirects to non-HTTP protocols are no longer followed. Please, see the 23j4-mw76-5v7h security advisory for more information. (:issue:`457`)
  * The Authorization header is now dropped on redirects to a different scheme (http:// or https://) or port, even if the domain is the same. Please, see the 4qqq-9vqf-3h3f security advisory for more information.
  * When using system proxy settings that are different for http:// and https://, redirects to a different URL scheme will now also trigger the corresponding change in proxy settings for the redirected request. Please, see the jm3v-qxmh-hxwv security advisory for more information. (:issue:`767`)
  * :attr:`Spider.allowed_domains <scrapy.Spider.allowed_domains>` is now enforced for all requests, and not only requests from spider callbacks.
  * :func:`~scrapy.utils.iterators.xmliter_lxml` no longer resolves XML entities.
  * defusedxml is now used to make :class:`scrapy.http.request.rpc.XmlRpcRequest` more secure.
  * Restored support for brotlipy_, which had been dropped in Scrapy 2.11.1 in favor of brotli. (:issue:`6261`) Note brotlipy is deprecated, both in Scrapy and upstream. Use brotli instead if you can.
  * Make :setting:`METAREFRESH_IGNORE_TAGS` ["noscript"] by default. This prevents :class:`~scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware` from following redirects that would not be followed by web browsers with JavaScript enabled.
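The Authorization rule described in the 2.11.2 entry can be illustrated with a small standard-library sketch. This is not Scrapy's implementation; the helper names `origin` and `headers_for_redirect` are hypothetical, and the sketch only follows the wording of the changelog (drop the header when scheme, host or port differ).

```python
from urllib.parse import urlsplit

# Default ports, used when a URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return (scheme, host, port) for a URL (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


def headers_for_redirect(headers, source_url, target_url):
    """Copy headers for a redirected request (hypothetical helper).

    The Authorization header is dropped when the redirect target differs
    from the source in scheme, host or port.
    """
    redirected = dict(headers)
    if origin(source_url) != origin(target_url):
        for name in list(redirected):
            if name.lower() == "authorization":
                del redirected[name]
    return redirected
```

With this sketch, a redirect from `http://example.com/a` to `https://example.com/a` loses the header, while a redirect between two paths on `https://example.com` keeps it.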
Ana Guerrero (anag+factory) accepted request 1164153 from Factory Maintainer (factory-maintainer) (revision 20)
Automatic submission by obs-autosubmit
Ana Guerrero (anag+factory) accepted request 1161494 from Dirk Mueller (dirkmueller) (revision 19)
- update to 2.11.1 (bsc#1220514, CVE-2024-1892):
  * Addressed `ReDoS vulnerabilities`_ (bsc#1220514, CVE-2024-1892)
    - ``scrapy.utils.iterators.xmliter`` is now deprecated in favor of :func:`~scrapy.utils.iterators.xmliter_lxml`, which :class:`~scrapy.spiders.XMLFeedSpider` now uses. To minimize the impact of this change on existing code, :func:`~scrapy.utils.iterators.xmliter_lxml` now supports indicating the node namespace with a prefix in the node name, and big files with highly nested trees when using libxml2 2.7+.
    - Fixed regular expressions in the implementation of the :func:`~scrapy.utils.response.open_in_browser` function.
    .. _ReDoS vulnerabilities: https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
  * :setting:`DOWNLOAD_MAXSIZE` and :setting:`DOWNLOAD_WARNSIZE` now also apply to the decompressed response body. Please, see the `7j7m-v7m3-jqm7 security advisory`_ for more information.
    .. _7j7m-v7m3-jqm7 security advisory: https://github.com/scrapy/scrapy/security/advisories/GHSA-7j7m-v7m3-jqm7
  * Also in relation with the `7j7m-v7m3-jqm7 security advisory`_, the deprecated ``scrapy.downloadermiddlewares.decompression`` module has been removed.
  * The ``Authorization`` header is now dropped on redirects to a different domain. Please, see the `cw9j-q3vf-hrrv security advisory`_ for more information.
  * The OS signal handling code was refactored to no longer use private Twisted functions. (:issue:`6024`, :issue:`6064`, :issue:`6112`)
  * Improved documentation for :class:`~scrapy.crawler.Crawler` initialization changes made in the 2.11.0 release. (:issue:`6057`, :issue:`6147`)
  * Extended documentation for :attr:`Request.meta <scrapy.http.Request.meta>`.
  * Fixed the :reqmeta:`dont_merge_cookies` documentation. (:issue:`5936`,
  * Added a link to Zyte's export guides to the :ref:`feed exports
  * Added a missing note about backward-incompatible changes in
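The idea of applying a size limit to the decompressed body, rather than only to the bytes on the wire, can be sketched with the standard library alone. This is an illustration of the general technique, not Scrapy's code; `gunzip_with_limit` and `ResponseTooLarge` are hypothetical names, and `max_size` merely plays the role that DOWNLOAD_MAXSIZE plays in the entry above.

```python
import zlib


class ResponseTooLarge(Exception):
    """Raised when the decompressed body exceeds the limit (hypothetical name)."""


def gunzip_with_limit(data, max_size, chunk_size=8192):
    """Decompress gzip data, aborting once the output exceeds max_size bytes."""
    # MAX_WBITS | 16 tells zlib to expect a gzip container.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    output = bytearray()
    for start in range(0, len(data), chunk_size):
        buf = data[start:start + chunk_size]
        while buf:
            # Ask for at most one byte more than the remaining budget, so an
            # oversized body is detected without inflating all of it.
            output += decompressor.decompress(buf, max_size - len(output) + 1)
            if len(output) > max_size:
                raise ResponseTooLarge(f"body exceeds {max_size} bytes")
            buf = decompressor.unconsumed_tail
    output += decompressor.flush()
    if len(output) > max_size:
        raise ResponseTooLarge(f"body exceeds {max_size} bytes")
    return bytes(output)
```

Capping the output per `decompress` call means a small, highly compressible payload is rejected after producing only slightly more than `max_size` bytes.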
Ana Guerrero (anag+factory) accepted request 1137882 from Daniel Garcia (dgarcia) (revision 18)
- Add patch twisted-23.8.0-compat.patch gh#scrapy/scrapy#6064
- Update to 2.11.0:
  - Spiders can now modify settings in their from_crawler methods, e.g. based on spider arguments.
  - Periodic logging of stats.
  - Bug fixes.
- 2.10.0:
  - Added Python 3.12 support, dropped Python 3.7 support.
  - The new add-ons framework simplifies configuring 3rd-party components that support it.
  - Exceptions to retry can now be configured.
  - Many fixes and improvements for feed exports.
- 2.9.0:
  - Per-domain download settings.
  - Compatibility with new cryptography and new parsel.
  - JMESPath selectors from the new parsel.
  - Bug fixes.
- 2.8.0:
  - This is a maintenance release, with minor features, bug fixes, and cleanups.
Dominique Leuenberger (dimstar_suse) accepted request 1034478 from Markéta Machová (mcalabkova) (revision 17)
Dominique Leuenberger (dimstar_suse) accepted request 1002736 from Dirk Mueller (dirkmueller) (revision 15)
Dominique Leuenberger (dimstar_suse) accepted request 959733 from Dirk Mueller (dirkmueller) (revision 14)
Dominique Leuenberger (dimstar_suse) accepted request 958587 from Matej Cepl (mcepl) (revision 13)
- Update to v2.6.1
  * Security fixes for cookie handling (CVE-2022-0577 aka bsc#1196638, GHSA-mfjm-vh54-3f96)
  * Python 3.10 support
  * asyncio support is no longer considered experimental, and works out-of-the-box on Windows regardless of your Python version
  * Feed exports now support pathlib.Path output paths and per-feed item filtering and post-processing
- Remove unnecessary patches:
  - remove-h2-version-restriction.patch
  - add-peak-method-to-queues.patch
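As a rough illustration of the feed export item in the 2.6.1 entry, a settings fragment along the following lines combines a pathlib.Path output path with per-feed filtering and post-processing. The output path and the `BookItem` class are made up for this sketch; `item_classes`, `postprocessing` and the gzip plugin path are the option names as best recalled from the Scrapy feed exports documentation and should be checked against the installed version.

```python
from pathlib import Path


class BookItem(dict):
    """Hypothetical stand-in for a project item class."""


# Hypothetical settings.py fragment: one feed, keyed by a pathlib.Path.
FEEDS = {
    Path("exports") / "books.jsonl.gz": {
        "format": "jsonlines",
        # Per-feed item filtering: only export items of these classes.
        "item_classes": [BookItem],
        # Post-processing: compress the feed as it is written.
        "postprocessing": ["scrapy.extensions.postprocessing.GzipPlugin"],
    },
}
```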
Dominique Leuenberger (dimstar_suse) accepted request 889037 from Markéta Machová (mcalabkova) (revision 9)
Dominique Leuenberger (dimstar_suse) accepted request 819355 from Tomáš Chvátal (scarabeus_iv) (revision 8)
Dominique Leuenberger (dimstar_suse) accepted request 807286 from Tomáš Chvátal (scarabeus_iv) (revision 7)
Dominique Leuenberger (dimstar_suse) accepted request 790737 from Steve Kowalik (StevenK) (revision 6)
- Update to 2.0.1:
  * Python 2 support has been removed
  * Partial coroutine syntax support and experimental asyncio support
  * New Response.follow_all method
  * FTP support for media pipelines
  * New Response.certificate attribute
  * IPv6 support through DNS_RESOLVER
  * Response.follow_all now supports an empty URL iterable as input
  * Removed top-level reactor imports to prevent errors about the wrong Twisted reactor being installed when setting a different Twisted reactor using TWISTED_REACTOR
- Add zope-exception-test_crawler.patch, rewriting one testcase to pass with our version of Zope.
- Update BuildRequires based on test requirements.
Dominique Leuenberger (dimstar_suse) accepted request 765023 from Tomáš Chvátal (scarabeus_iv) (revision 5)
Dominique Leuenberger (dimstar_suse) accepted request 725773 from Tomáš Chvátal (scarabeus_iv) (revision 4)
Dominique Leuenberger (dimstar_suse) accepted request 718179 from Tomáš Chvátal (scarabeus_iv) (revision 3)
- Format with spec-cleaner
- Use just python3 version of Sphinx
- version update to 1.7.1
  * Improvements for crawls targeting multiple domains
  * A cleaner way to pass arguments to callbacks
  * A new class for JSON requests
  * Improvements for rule-based spiders
  * New features for feed exports
  see news.rst for details
Dominique Leuenberger (dimstar_suse) accepted request 703535 from Tomáš Chvátal (scarabeus_iv) (revision 2)
Displaying revisions 1 - 20 of 21