File python-ocrmypdf.changes of Package python-ocrmypdf

-------------------------------------------------------------------
Sun Nov 17 11:06:17 UTC 2024 - ecsos <ecsos@opensuse.org>

- Update to 16.6.2
  - Remove invalid hyperlink annotations to satisfy Ghostscript 10.x during PDF/A conversion. :issue:`1425`
- Changes from 16.6.1
  - Fixed some issues with Docker build, such as removing unnecessary content and using a stable Tesseract version.
  - Reverted Docker image to Ubuntu 22.04 to access older/more stable Ghostscript for now.
  - Clarified batch commands in documentation.
  - Fixed an issue with JSON serialization and pickling of HOCRResult. :issue:`1427`

-------------------------------------------------------------------
Tue Oct 29 14:08:12 UTC 2024 - ecsos <ecsos@opensuse.org>

- Update to 16.6.0 
  - Fixed an issue where damaged PDFs would fail with --redo-ocr. :issue:`1403`
  - Fixed an error that prevented JBIG2 optimization on Windows if the image was optimized in an earlier step. :issue:`1396`
  - Fixed an error detecting the version of unpaper 7.0.0. :issue:`1409`
  - Fixed a performance regression when scanning pages. :issue:`1378`. Thanks @aliemjay.
  - Fixed Alpine Docker image by enforcing Alpine 3.19.
    Alpine 3.20 includes a defective version of Tesseract OCR and so is not usable.
  - Upgraded Ubuntu Docker image to use Ubuntu 24.04.
  - Build and test scripts/actions switched to uv.
  - When running in a container, we now remind the user that temporary folders are inside the container and may not be accessible.
  - Fixed Linux test coverage matrix, which was missing some key versions.
- Changes from 16.5.0
  - Fixed issue with interpreting PDFs that have images with array masks. :issue:`1377`
  - Enabled testing on Python 3.13.
  - Fixed a test that did not work correctly but still passed. :issue:`1382`
  - Improved "PDF/A conversion failed" warning message to better describe implications.
  - Updated documentation to better explain OCR_JSON_SETTINGS in batch processing.
  - Build backend changed from setuptools to hatchling.

-------------------------------------------------------------------
Sun Aug 11 13:11:42 UTC 2024 - ecsos <ecsos@opensuse.org>

- Update to 16.4.3
  - Work around pdfminer.six issue where a token on the buffer boundary is incorrectly parsed as two tokens. :issue:`1361`
  - New rules are applied to stencil masks and explicit masks when calculating the optimal page DPI for rendering. :issue:`1362`
  - Fixed attempts to use an incompatible jbig2.EXE provided by TeX Live. :issue:`1363`

- Changes from 16.4.2
  - Fixed order of filenames passed to Ghostscript for PDF/A generation. :issue:`1359`
  - Suppressed missing jbig2dec warning message. :issue:`1358`
  - Fixed calculation of image size when soft mask dimensions don't match image dimension. :issue:`1351`
  - Several fixes to documentation. Thanks to users Iris and JoKalliauer who contributed these changes.
  - Fixed error on processing PDFs that are missing certain image metadata. :issue:`1315`

-------------------------------------------------------------------
Tue Jul  2 16:57:23 UTC 2024 - ecsos <ecsos@opensuse.org>

- Update to 16.4.1
  - Fixed calculation of image printed area (used in finding weighted DPI for OCR). :issue:`1334`
  - Fixed "NotImplementedError: not sure how to get colorspace" error messages
    in logs, which simply record a failure to optimize images with print production colorspaces. :issue:`1315`

- Changes from 16.4.0
  - Selecting the osd and equ pseudo-languages with -l/--language now exits with an error
    when using Tesseract OCR, because these are not regular Tesseract languages
    but implementation details. Using them can cause Tesseract to crash.
  - The hOCR renderer is more tolerant of extra whitespace in input files.
  - watcher.py now changes the output file extension to .pdf when the input is not .pdf.
  - Improved handling of PDFs that contain circularly referenced Form XObjects. :issue:`1321`
  - Fixed Alpine Docker image for ARM64, which was not building correctly.
  - Docker images now use pikepdf 9.0.0.
  - Prevent use of Tesseract OCR 5.4.0, a version with known regressions.
  - Disabled progressbar for "Linearizing" when --no-progress-bar set.
  - Fixed some tests that warn about missing JBIG2 decoding via pikepdf,
    by installing the necessary libraries during tests.

-------------------------------------------------------------------
Sat May 25 09:33:33 UTC 2024 - ecsos <ecsos@opensuse.org>

- Update to 16.3.1
  - Fixed a test suite failure with Ghostscript 10.03.0+. :issue:`1316`
  - Fixed an issue with the presentation of the "OCR" progress bar. :issue:`1313`

- Changes from 16.3.0
  - Fixed progress bar not displaying for Ghostscript PDF/A conversion. :issue:`1313`
  - Added progress bar for linearization. :issue:`1313`
  - If --rotate-pages-threshold is issued without --rotate-pages, we now
    exit with an error since the user likely intended to use --rotate-pages. :issue:`1309`
  - If Tesseract hOCR gives an invalid line box, print an error message instead of exiting with an error. :issue:`1312`

-------------------------------------------------------------------
Fri Apr 19 08:20:24 UTC 2024 - ecsos <ecsos@opensuse.org>

- Update to 16.2.0
  - Fixed issue ‘NoneType’ object has no attribute ‘get’ when optimizing certain PDFs. :issue:`1293`, :issue:`1271`
  - Switched formatting from black to ruff.
  - Added support for sending sidecar output to io.BytesIO.
  - Added support for converting HEIF/HEIC images (the native image format of iPhones and some other devices) to PDFs,
    when the appropriate pi-heif library is installed. This library is marked as a dependency,
    but maintainers may opt out if needed.
  - We now default to downsampling large images that would exceed Tesseract’s internal limits,
    but only if they would otherwise cause processing to fail. Previously, this behavior only occurred if specifically
    requested on the command line. It can still be configured and disabled. See the --tesseract command line options.
  - Added MacPorts install instructions. Thanks @akierig.
  - Improved logging output when an unexpected error occurs while trying to obtain the version of a third party program.

- Changes from 16.1.2
  - Fixed test suite failure when using Ghostscript 10.3.
  - Other minor corrections.

-------------------------------------------------------------------
Sat Feb 17 09:38:04 UTC 2024 - ecsos <ecsos@opensuse.org>

- Update to 16.1.1
  - Fixed PyPy 3.10 support.
- Changes from 16.1.0
  - Improved hOCR renderer is now the default for left-to-right languages.
  - Improved handling of rotated pages. Previously, OCR text might be missing for
    pages that were rotated with a /Rotate tag on the page entry.
  - Improved handling of cropped pages. Previously, in some cases a page with a
    crop box would not have its OCR applied correctly and misalignment between
    OCR text and visible text could occur.
  - Documentation improvements, especially installation instructions for less
    common platforms.

-------------------------------------------------------------------
Mon Jan  8 15:26:44 UTC 2024 - ecsos <ecsos@opensuse.org>

- Update to 16.0.4
  - Fixed some issues for left-to-right text with the new hOCR renderer.
    It is still not default yet but will be made so soon.
    Right-to-left text is still in progress.
  - Added an error to prevent use of several versions of Ghostscript
    that seem to corrupt existing text in input PDFs.
    Newly generated OCR is not affected.
    For best results, use Ghostscript 10.02.1 or newer,
    which contains the fix for the issue.

-------------------------------------------------------------------
Thu Jan  4 10:05:05 UTC 2024 - ecsos <ecsos@opensuse.org>

- Update to 16.0.3
  - Changed minimum required Ghostscript to 9.54, to support users of RHEL 9 and its derivatives,
    since that is the latest version available there.
  - Removed warning message about CVE-2023-43115, on the assumption that most distributions have backported the patch by now.
- Changes from 16.0.2
  - Temporarily changed PDF text renderer back to sandwich by default to address regressions in macOS Preview.
- Changes from 16.0.1
  - Fixed text rendering issue with new hOCR text renderer - extraneous byte order marks.
  - Tightened dependencies.
- Changes from 16.0.0
  - Added a new OCR text renderer that combines the best ideas of Tesseract's PDF generator and the older hOCR transformer renderer.
    The result is a hopefully permanent fix for wordssmushedtogetherwithoutspaces issues in extracted text, better
    registration/position of text on skewed baselines :issue:`1009`, fixes to character output when the German Fraktur script
    is used :issue:`1191`, and proper rendering of right-to-left languages (Arabic, Hebrew, Persian) :issue:`1157`.
    Asian languages may still have excessive word breaks compared to expectations. The new renderer is the default;
    the old sandwich renderer is still available using --pdf-renderer sandwich; the old hOCR renderer has been removed.
  - The ocrmypdf.hocrtransform API has changed substantially.
  - Support for Python 3.9 has been dropped. Python 3.10+ is now required.
  - pikepdf >= 8.8.0 is now required.

-------------------------------------------------------------------
Fri Dec 15 08:32:05 UTC 2023 - ecsos <ecsos@opensuse.org>

- Initial version 15.4.4
