Tesseract Open Source OCR Engine
Tesseract is a free optical character recognition (OCR) engine, originally developed at Hewlett-Packard and now developed by Google. It is a raw OCR engine: it has no document layout analysis, no output formatting, and no graphical user interface. It processes only a TIFF or BMP image of a single column and produces text from it. It can distinguish fixed-pitch from proportional text. The engine ranked among the top three for character accuracy in 1995. The source code reads a binary, grey or color image and outputs text.
Tesseract can process English, French, Italian, German, Spanish, Brazilian Portuguese and Dutch, and can be trained to work with other languages as well.
- Developed at Publishing
- Sources inherited from project openSUSE:Factory
- 1 derived package
- Checkout package:
  osc -A https://api.opensuse.org checkout home:seife:Factory/tesseract-ocr && cd $_
Source Files
Filename | Size | Changed |
---|---|---|
tesseract-ocr-3.05.00.tar.gz | 3.42 MB (3581853 bytes) | |
tesseract-ocr.changes | 7.35 KB (7527 bytes) | |
tesseract-ocr.spec | 3.86 KB (3957 bytes) | |
Revision 5 (latest revision is 18)
Dominique Leuenberger (dimstar_suse) accepted request 458814 from Ismail Dönmez (namtrac) (revision 5)
### Depends on sr#458696 ###
- Update to 3.05.00
  * Made some fine tuning to the hOCR output.
  * Added TSV as another optional output format.
  * Fixed ABI break introduced in 3.04.00 with the AnalyseLayout() method.
  * text2image tool - Enable all OpenType ligatures available in a font. This feature requires Pango 1.38 or newer.
  * Training tools - Replaced asserts with tprintf() and exit(1).
  * Improved multipage tiff processing.
  * Improved the embedded pdf font (pdf.ttf).
  * Enable selection of OCR engine mode from command line.
  * Changed tesseract command line parameter '-psm' to '--psm'.
  * Added new C API for orientation and script detection, removed the old one.
  * Fixed many compiler warnings.
  * Fixed memory and resource leaks.
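Because 3.05.00 renames '-psm' to '--psm' and adds TSV output, scripts that call the tesseract binary may need to pick the flag spelling by version. The following is a minimal sketch of that idea, not part of the package. The helper names `parse_version` and `build_tesseract_command` and the file names are hypothetical. It assumes purely numeric dotted version strings such as `3.05.00`, the usual `tesseract imagename outputbase [options] [configfile]` argument order, and that TSV output is selected by the trailing config name `tsv`.

```python
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a numeric dotted version such as '3.05.00' into (3, 5, 0)."""
    return tuple(int(part) for part in version.split("."))


def build_tesseract_command(
    image: str,
    output_base: str,
    version: str,
    psm: Optional[int] = None,
    tsv: bool = False,
) -> List[str]:
    """Build an argv list for the tesseract CLI (hypothetical helper).

    Uses '--psm' from 3.05.00 onward and the older '-psm' before that,
    following the changelog entry above.
    """
    new_style = parse_version(version) >= (3, 5, 0)
    cmd = ["tesseract", image, output_base]
    if psm is not None:
        cmd += ["--psm" if new_style else "-psm", str(psm)]
    if tsv:
        # TSV output is listed as new in 3.05.00; assumed to be chosen
        # by passing the config name 'tsv' as the last argument.
        if not new_style:
            raise ValueError("TSV output requires tesseract 3.05.00 or newer")
        cmd.append("tsv")
    return cmd
```

The resulting list can be passed to `subprocess.run` on a system where tesseract is installed. Building the arguments separately from running them keeps the version check easy to test without the binary.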