cuneiform - multi-language OCR system
cuneiform [-l language] [--dotmatrix] [--fax] [--singlecolumn] [-f format] [-o output]
Cuneiform is an OCR system. In addition to text recognition it also does layout
analysis and text format recognition. Cuneiform supports several languages.
- --dotmatrix
- Use recognition mode optimized for text printed with a dot-matrix printer.
- --fax
- Use recognition mode optimized for text that has been faxed.
- --singlecolumn
- Disable page layout analysis and assume that the image
consists of only one column of text.
- -f format
- Select output format. The following formats are available:
html (HTML format), hocr (hOCR HTML format), native
(native Cuneiform 2000), rtf (RTF format), smarttext (plain
text with TeX paragraphs), text (plain text). The default is plain text.
- -l language
- By default Cuneiform recognizes English text. To change the
language, use the command-line switch -l followed by a language code
(typically an ISO 639-2 three-letter code). The following languages are supported:
- -o output
- Write the result to the file output. If you do not define an output file with the -o
switch, Cuneiform writes the result to a file ‘cuneiform-out.format’.
The file extension depends on your output format.
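As a rough illustration of how these options combine on one command line, here is a minimal Python sketch that assembles an argument list and runs it with subprocess. The helpers build_cuneiform_command and run_cuneiform are hypothetical, not part of Cuneiform. The sketch assumes the cuneiform binary is on PATH and that the input image is passed as the last positional argument. The example language code "eng" is likewise an assumption based on the ISO 639-2 convention mentioned above.

```python
import subprocess
from typing import List, Optional

# Output formats listed in the -f description of this manual page.
OUTPUT_FORMATS = {"html", "hocr", "native", "rtf", "smarttext", "text"}


def build_cuneiform_command(
    image: str,
    language: Optional[str] = None,
    fmt: Optional[str] = None,
    output: Optional[str] = None,
    dotmatrix: bool = False,
    fax: bool = False,
    singlecolumn: bool = False,
) -> List[str]:
    """Hypothetical helper: assemble a cuneiform argument list.

    Options left as None/False are omitted, so cuneiform falls back to its
    documented defaults (English text, plain-text output, cuneiform-out.format).
    """
    cmd = ["cuneiform"]
    if language is not None:
        cmd += ["-l", language]  # e.g. an ISO 639-2 code such as "eng"
    if dotmatrix:
        cmd.append("--dotmatrix")
    if fax:
        cmd.append("--fax")
    if singlecolumn:
        cmd.append("--singlecolumn")
    if fmt is not None:
        # Reject formats that are not in the documented list.
        if fmt not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output format: {fmt!r}")
        cmd += ["-f", fmt]
    if output is not None:
        cmd += ["-o", output]
    # Assumption: the input image is the final positional argument.
    cmd.append(image)
    return cmd


def run_cuneiform(image: str, **options) -> None:
    """Hypothetical helper: run cuneiform, raising CalledProcessError on failure."""
    subprocess.run(build_cuneiform_command(image, **options), check=True)
```

For example, build_cuneiform_command("scan.png", language="eng", fmt="hocr", output="scan.html") returns ["cuneiform", "-l", "eng", "-f", "hocr", "-o", "scan.html", "scan.png"]. Keeping command construction separate from execution lets the argument list be inspected or logged before anything is run.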
Cuneiform can process any single-page image that GraphicsMagick knows how to
open. Please consult the gm(1) manual page for the comprehensive list
of supported image formats.
More information about cuneiform can be found at <
cuneiform was written by Cognitive Technologies and Jussi Pakkanen <
This manual page was written by Daniel Baumann <
>, for the Debian project (but may be used by others).