diff options
Diffstat (limited to 'static/unix-v10/man1/ocr.1')
| -rw-r--r-- | static/unix-v10/man1/ocr.1 | 176 |
1 files changed, 176 insertions, 0 deletions
diff --git a/static/unix-v10/man1/ocr.1 b/static/unix-v10/man1/ocr.1 new file mode 100644 index 00000000..184a240a --- /dev/null +++ b/static/unix-v10/man1/ocr.1 @@ -0,0 +1,176 @@ +.TH OCR 1 cetus,hydra,coma +.CT 1 graphics +.SH NAME +ocr \- optical character recognition +.SH SYNOPSIS +.B ocr +[ +.I option ... +] +[ +.I file +] +.SH DESCRIPTION +.I Ocr +reads a black-and-white image of a page from +.IR file , +and writes ASCII to the standard output. +If no +.I file +is specified, it reads from the standard input. +.PP +The input is a +.IR picfile (5) +image of one column of machine-printed text, normally +scanned in by +.IR cscan (1). +Fonts, sizes, and line-spacings may vary within the column, +but each line should have a constant text size and baseline. +Lines should be parallel and roughly horizontal. +.PP +In the output, white space approximates the original page layout. +Words that +.IR spell (1) +are preferred, and hyphenations across lines are recombined. +.PP +The options are: +.nr xx \w'\fL-pn,m\ \ ' +.TP \n(xxu +.BI -a s +The alphabet is the union of symbol sets selected by characters in string +.IR s , +from among: +.RS +.PD +.nr yy \w'\fLA\ \ ' +.TP \n(yyu +.B A +ABCDEFGHIJKLMNOPQRSTUVWXYZ +.PD0 +.TP +.B a +abcdefghijklmnopqrstuvwxyz +.PD0 +.TP +.B 0 +0123456789 +.PD0 +.TP +.B . +.ie t \&.\^,\|-\^:\^;\|*\^'\|\^"\|?\^!\|/\|&\|$\^(\^)\^[\|\^]\|#\|@\|% \0\0\0\0\0\0\0\0\0\0\0 \kz(basic punctuation) +.el \&.\^,\|-\^:\^;\|*\^'\|\^"\|?\^!\|/\|&\|$\^(\^)\^[\|\^]\|#\|@\|% \0\0\0\0\0\0\0 \kz(basic punctuation) +.ig +should include ` /(em + ??? +shouldn't include []#@% ??? +.. +.PD0 +.TP +.B ^ +^\|\f(CW~\fR\^`\|\^\\\||\|\^{\|}\|_ \h'|\nzu'(extended punct'n) +.ig +should include []#@% ??? +shouldn't include ` ??? +.. +.PD0 +.TP +.B + ++\^\-\^*\|/\|<\^>\^=\^.\^E\|e\|[\|] \h'|\nzu'(numerical punct'n) +.PD0 +.TP +.B s +.ie t \(sc\^\(dg\^\(dd\^\(ct\|\(bu\|\(co\|\(rg\|\(de\^\(fm\^\(en\|\^\(mi\|\(em \h'|\nzu'(selected non-ASCII) +.el \\(sc\\(dg\\(dd\\(ct\\(bu\\(co ... \h'|\nzu'(selected non-ASCII) +.PD0 +.TP +.B l +.ie t \(fi\|\(fl\|f\h'-.1m'f\|f\h'-.1m'\(fi\|f\h'-.1m'\(fl\|\N'114'\|\N'115'\|\N'105'\|\N'106' \h'|\nzu'(ligatures and digraphs) +.el fi fl ff ffi ffl ae oe ... \h'|\nzu'(ligatures, digraphs) +.PD0 +.TP +.B g +.ie t \(*a\(*b\(*g\(*d\(*e\(*z\(*y\(*h\(*i\(*k\(*l\(*m\(*n\(*c\(*o\(*p\(*r\(*s\(*t\(*u\(*f\(*x\(*q\(*w \h'|\nzu'(Greek lower case) +.el \\(*a\\(*b\\(*g\\(*d\\(*e\\(*z ... \h'|\nzu'(Greek lower case) +.PD0 +.TP +.B G +.ie t AB\(*G\(*DEZH\(*HIK\(*LMN\(*CO\(*PP\(*STY\(*FX\(*Q\(*W \h'|\nzu'(Greek upper case) +.el AB\\(*G\\(*DEZ ... \h'|\nzu'(Greek upper case) +.PD +.PP +The default is +.BR -aAa0.+^ , +the full printable-ASCII set, which may be abbreviated as +.BR -ap . +Thus, +.B -apslgG +selects all of the above. +.RE +.PD +.TP \n(xxu +.B -c +Find columns in complex nested layouts using greedy white covers algorithm. +.TP +.BI -m l[,r] +Trim the left and right margins of the image by +.I l +and +.I r +inches, respectively, before looking for columns. +If +.I r +is omitted, it is assumed to equal +.IR l. +.TP +.BI -n n +Find the +.I n +largest columns by analysis of a single vertical projection. +Each column should be compactly-printed +and separated from the others by at least 2 ems of horizontal white space. +.TP +.BI -p n,m +Point sizes lie in the range [ +.I n, m +]; other sizes are discarded. +The default is +.BR -p6,24 . +.TP +.B -s +Defeat spelling check (but continue to favor numeric strings and good punctuation). +.TP +.B -t +Write +.IR troff (1) +format. +Each column is shown on a separate page, lines at their original height, +words at their original horizontal location, and +characters roughly original size in Times roman. +Hyphenated words are not recombined. +.TP +.B -u +Unspellable words are prefixed with `?' or, if +.B -t +is specified, printed boldface. +.TP +.BI -w w +Find the largest column of width +.I w +inches, within a single vertical projection. +.SS Fonts +Trained on over 100 Latin-alphabet book fonts in various italic, bold, etc styles. +Only one font of Greek, without diacriticals. +Also Swedish and Tibetan, on request. +.SH SEE ALSO +.IR bcp (1), +.IR cscan (1), +.IR font (6), +.IR picfile (5), +.IR spell (1), +.IR troff (1) +.SH BUGS +For best results, use images of high-contrast, cleanly-printed original +documents digitized at a resolution of 400 pixels/inch or higher. +It may help to restrict the alphabet and sizes to what's there. +.ig +8.7 CPU minutes on pipe to read this page, September 1989. +.. |
