OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.

Derived packages (1): home:Simmphonie:python310:binary
Checkout Package
osc -A https://api.opensuse.org checkout home:frank_kunz/OCRmyPDF && cd $_

Source Files

Filename	Size	Changed
OCRmyPDF-16.6.1.tar.gz	6.39 MB	14 days ago
OCRmyPDF-rpmlintrc	48 Bytes	14 days ago
OCRmyPDF.changes	18.5 KB	14 days ago
OCRmyPDF.spec	4.74 KB	14 days ago

Latest Revision

Frank Kunz (frank_kunz) committed 14 days ago (revision 18)

- Update to version 16.6.1

v16.6.1
- Fixed some issues with Docker build, such as removing unnecessary content and using a stable Tesseract version.
- Reverted Docker image to Ubuntu 22.04 to access older/more stable Ghostscript for now.
- Clarified batch commands in documentation.
- Fixed an issue with JSON serialization and pickling of HOCRResult (issue #1427).

v16.6.0
- Fixed an issue where damaged PDFs would fail with --redo-ocr (issue #1403).
- Fixed an error that prevented JBIG2 optimization on Windows if the image was optimized in an earlier step (issue #1396).
- Fixed an error detecting the version of unpaper 7.0.0 (issue #1409).
- Fixed a performance regression when scanning pages (issue #1378). Thanks @aliemjay.
- Fixed Alpine Docker image by enforcing Alpine 3.19. Alpine 3.20 includes a defective version of Tesseract OCR and so is not usable.
- Upgraded Ubuntu Docker image to use Ubuntu 24.04.
- Build and test scripts/actions switched to uv.
- When running in a container, we now remind the user that temporary folders are inside the container and may not be accessible.
- Fixed Linux test coverage matrix, which was missing some key versions.

v16.5.0
- Fixed issue with interpreting PDFs that have images with array masks (issue #1377).
- Enabled testing on Python 3.13.
- Fixed a test that did not work correctly but still passed (issue #1382).
- Improved "PDF/A conversion failed" warning message to better describe implications.
- Updated documentation to better explain OCR_JSON_SETTINGS in batch processing.
- Build backend changed from setuptools to hatchling.

v16.4.3
- Work around pdfminer.six issue where a token on the buffer boundary is incorrectly parsed as two tokens (issue #1361).
- New rules are applied to stencil masks and explicit masks when calculating the optimal page DPI for rendering (issue #1362).
- Fixed attempts to use an incompatible jbig2.EXE provided by TeX Live (issue #1363).

v16.4.2
- Fixed order of filenames passed to Ghostscript for PDF/A generation (issue #1359).
- Suppressed missing jbig2dec warning message (issue #1358).

Comments 2

Thomas Glatt (tglatt) - over 1 year ago

Thank you for this package! If you are interested in some feedback: On my system I needed the following python modules to run the software:

pluggy
img2pdf
reportlab
pdfminer.six
coloredlogs
tqdm
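
A quick way to see which of the modules listed above are absent on a given system is sketched below. It is only an illustration: the list is the one reported in this comment and may not match what the current OCRmyPDF version needs, and the names `REPORTED_MODULES` and `missing_modules` are hypothetical helpers, not part of OCRmyPDF or this package. The one mapping that is not one-to-one is pdfminer.six, which installs the top-level package `pdfminer`.

```python
from importlib.util import find_spec

# Distribution name -> top-level import name, taken from the comment above.
# Assumption: this list reflects one user's system, not the package's
# declared requirements.
REPORTED_MODULES = {
    "pluggy": "pluggy",
    "img2pdf": "img2pdf",
    "reportlab": "reportlab",
    "pdfminer.six": "pdfminer",  # distribution and import names differ
    "coloredlogs": "coloredlogs",
    "tqdm": "tqdm",
}


def missing_modules(modules=REPORTED_MODULES):
    """Return the distribution names whose import name cannot be located."""
    return [dist for dist, name in modules.items() if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All reported modules are importable.")
```

Run it with the same Python interpreter that the packaged ocrmypdf uses, since modules installed for a different interpreter will not be found.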

Andrea Ippolito (andrea.ippo) - about 1 year ago

Yes, that's unfortunate. I tried to use this repo to avoid having to remember to pip update every time a new ocrmypdf version comes out, but as it stands it's not really practical: since I have to fetch the dependencies manually elsewhere, it becomes a bit too much. I guess I'll keep using pip (well, pipx actually) until the authors of ocrmypdf finally provide a build for openSUSE (or even a flatpak!).
