Diffstat (limited to 'static/v10/man1/ocr.1')
-rw-r--r-- static/v10/man1/ocr.1 176
1 files changed, 176 insertions, 0 deletions
diff --git a/static/v10/man1/ocr.1 b/static/v10/man1/ocr.1
new file mode 100644
index 00000000..184a240a
--- /dev/null
+++ b/static/v10/man1/ocr.1
@@ -0,0 +1,176 @@
+.TH OCR 1 cetus,hydra,coma
+.CT 1 graphics
+.SH NAME
+ocr \- optical character recognition
+.SH SYNOPSIS
+.B ocr
+[
+.I option ...
+]
+[
+.I file
+]
+.SH DESCRIPTION
+.I Ocr
+reads a black-and-white image of a page from
+.IR file ,
+and writes ASCII to the standard output.
+If no
+.I file
+is specified, it reads from the standard input.
+.PP
+The input is a
+.IR picfile (5)
+image of one column of machine-printed text, normally
+scanned in by
+.IR cscan (1).
+Fonts, sizes, and line-spacings may vary within the column,
+but each line should have a constant text size and baseline.
+Lines should be parallel and roughly horizontal.
+.PP
+In the output, white space approximates the original page layout.
+Words accepted by
+.IR spell (1)
+are preferred, and words hyphenated across lines are recombined.
+.PP
+The options are:
+.nr xx \w'\fL-pn,m\ \ '
+.TP \n(xxu
+.BI -a s
+The alphabet is the union of symbol sets selected by characters in string
+.IR s ,
+from among:
+.RS
+.PD
+.nr yy \w'\fLA\ \ '
+.TP \n(yyu
+.B A
+ABCDEFGHIJKLMNOPQRSTUVWXYZ
+.PD0
+.TP
+.B a
+abcdefghijklmnopqrstuvwxyz
+.PD0
+.TP
+.B 0
+0123456789
+.PD0
+.TP
+.B .
+.ie t \&.\^,\|-\^:\^;\|*\^'\|\^"\|?\^!\|/\|&\|$\^(\^)\^[\|\^]\|#\|@\|% \0\0\0\0\0\0\0\0\0\0\0 \kz(basic punctuation)
+.el \&.\^,\|-\^:\^;\|*\^'\|\^"\|?\^!\|/\|&\|$\^(\^)\^[\|\^]\|#\|@\|% \0\0\0\0\0\0\0 \kz(basic punctuation)
+.ig
+should include ` /(em + ???
+shouldn't include []#@% ???
+..
+.PD0
+.TP
+.B ^
+^\|\f(CW~\fR\^`\|\^\\\||\|\^{\|}\|_ \h'|\nzu'(extended punct'n)
+.ig
+should include []#@% ???
+shouldn't include ` ???
+..
+.PD0
+.TP
+.B +
++\^\-\^*\|/\|<\^>\^=\^.\^E\|e\|[\|] \h'|\nzu'(numerical punct'n)
+.PD0
+.TP
+.B s
+.ie t \(sc\^\(dg\^\(dd\^\(ct\|\(bu\|\(co\|\(rg\|\(de\^\(fm\^\(en\|\^\(mi\|\(em \h'|\nzu'(selected non-ASCII)
+.el \\(sc\\(dg\\(dd\\(ct\\(bu\\(co ... \h'|\nzu'(selected non-ASCII)
+.PD0
+.TP
+.B l
+.ie t \(fi\|\(fl\|f\h'-.1m'f\|f\h'-.1m'\(fi\|f\h'-.1m'\(fl\|\N'114'\|\N'115'\|\N'105'\|\N'106' \h'|\nzu'(ligatures and digraphs)
+.el fi fl ff ffi ffl ae oe ... \h'|\nzu'(ligatures, digraphs)
+.PD0
+.TP
+.B g
+.ie t \(*a\(*b\(*g\(*d\(*e\(*z\(*y\(*h\(*i\(*k\(*l\(*m\(*n\(*c\(*o\(*p\(*r\(*s\(*t\(*u\(*f\(*x\(*q\(*w \h'|\nzu'(Greek lower case)
+.el \\(*a\\(*b\\(*g\\(*d\\(*e\\(*z ... \h'|\nzu'(Greek lower case)
+.PD0
+.TP
+.B G
+.ie t AB\(*G\(*DEZH\(*HIK\(*LMN\(*CO\(*PP\(*STY\(*FX\(*Q\(*W \h'|\nzu'(Greek upper case)
+.el AB\\(*G\\(*DEZ ... \h'|\nzu'(Greek upper case)
+.PD
+.PP
+The default is
+.BR -aAa0.+^ ,
+the full printable-ASCII set, which may be abbreviated as
+.BR -ap .
+Thus,
+.B -apslgG
+selects all of the above.
+.RE
+.PD
+.TP \n(xxu
+.B -c
+Find columns in complex nested layouts using a greedy white covers algorithm.
+.TP
+.BI -m l[,r]
+Trim the left and right margins of the image by
+.I l
+and
+.I r
+inches, respectively, before looking for columns.
+If
+.I r
+is omitted, it is assumed to equal
+.IR l .
+.TP
+.BI -n n
+Find the
+.I n
+largest columns by analysis of a single vertical projection.
+Each column should be compactly printed
+and separated from the others by at least 2 ems of horizontal white space.
+.TP
+.BI -p n,m
+Point sizes lie in the range [
+.I n, m
+]; other sizes are discarded.
+The default is
+.BR -p6,24 .
+.TP
+.B -s
+Disable the spelling check (but continue to favor numeric strings and good punctuation).
+.TP
+.B -t
+Write
+.IR troff (1)
+format.
+Each column is shown on a separate page, with lines at their original height,
+words at their original horizontal location, and
+characters at roughly their original size in Times roman.
+Hyphenated words are not recombined.
+.TP
+.B -u
+Unspellable words are prefixed with `?' or, if
+.B -t
+is specified, printed in boldface.
+.TP
+.BI -w w
+Find the largest column of width
+.I w
+inches, within a single vertical projection.
+.SS Fonts
+.I Ocr
+has been trained on over 100 Latin-alphabet book fonts in various styles (italic, bold, etc.).
+Only one Greek font, without diacriticals, is supported.
+Swedish and Tibetan are also available on request.
+.SH SEE ALSO
+.IR bcp (1),
+.IR cscan (1),
+.IR font (6),
+.IR picfile (5),
+.IR spell (1),
+.IR troff (1)
+.SH BUGS
+For best results, use images of high-contrast, cleanly printed original
+documents digitized at a resolution of 400 pixels/inch or higher.
+It may help to restrict the alphabet and point sizes to those actually present on the page.
+.ig
+8.7 CPU minutes on pipe to read this page, September 1989.
+..
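
Note on the -a option: the page describes the alphabet as the union of the symbol sets named by each character of the selector string, with -ap standing for the default -aAa0.+^. The following is a minimal sketch of that rule in Python, not the actual ocr implementation. The names SYMBOL_SETS and expand_alphabet are hypothetical, the set contents are transcribed from the table in the page, and the non-ASCII sets (s, l, g, G) are left out for brevity, so this sketch rejects them.

```python
import string

# Symbol sets transcribed from the -a table in the man page (ASCII sets only).
# The dictionary name and layout are illustrative assumptions.
SYMBOL_SETS = {
    "A": string.ascii_uppercase,
    "a": string.ascii_lowercase,
    "0": string.digits,
    ".": ".,-:;*'\"?!/&$()[]#@%",  # basic punctuation
    "^": "^~`\\|{}_",              # extended punctuation
    "+": "+-*/<>=.Ee[]",           # numerical punctuation
}


def expand_alphabet(selector):
    """Return the union of the symbol sets named by each character of selector."""
    alphabet = set()
    for key in selector:
        if key == "p":
            # -ap abbreviates the default -aAa0.+^ (full printable ASCII).
            alphabet |= expand_alphabet("Aa0.+^")
        elif key in SYMBOL_SETS:
            alphabet |= set(SYMBOL_SETS[key])
        else:
            # s, l, g, G are valid for ocr but omitted from this sketch.
            raise ValueError("symbol set not covered by this sketch: %r" % key)
    return alphabet
```

For instance, expand_alphabet("0+") yields the characters that a run such as "ocr -a0+" would be restricted to, which may suit a page of numeric tables.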