| author | Jacob McDonnell <jacob@jacobmcdonnell.com> | 2026-04-26 16:38:00 -0400 |
|---|---|---|
| committer | Jacob McDonnell <jacob@jacobmcdonnell.com> | 2026-04-26 16:38:00 -0400 |
| commit | 97d5c458cfa039d857301e1ca7d5af3beb37131d | |
| tree | b460cd850d0537eb71806ba30358840377b27688 /static/plan9-4e/man1/doc2txt.1 | |
| parent | b89dc2331a50c63f8b33272a5c4c61ab98abdaa3 | |
build: Better Build System
Diffstat (limited to 'static/plan9-4e/man1/doc2txt.1')
| -rw-r--r-- | static/plan9-4e/man1/doc2txt.1 | 53 |
1 files changed, 53 insertions, 0 deletions
diff --git a/static/plan9-4e/man1/doc2txt.1 b/static/plan9-4e/man1/doc2txt.1
new file mode 100644
index 00000000..db7beb78
--- /dev/null
+++ b/static/plan9-4e/man1/doc2txt.1
@@ -0,0 +1,53 @@
+.TH DOC2TXT 1
+.SH NAME
+doc2txt, olefs, mswordstrings \- extract printable strings from Microsoft Word documents
+.SH SYNOPSIS
+.B doc2txt
+[
+.I file.doc
+]
+.br
+.B aux/olefs
+[
+.B -m
+.I mtpt
+]
+.I file.doc
+.br
+.B aux/mswordstrings
+.I /mnt/doc/WordDocument
+.SH DESCRIPTION
+.I Doc2txt
+is a shell script that uses
+.I olefs
+and
+.I mswordstrings
+to extract the printable text from the body of a Microsoft Word document.
+.PP
+Microsoft Office documents are stored in OLE (Object Linking and Embedding)
+format, which is a scaled down version of Microsoft's FAT file system.
+.I Olefs
+presents the contents of an Office document as a file system
+on
+.IR mtpt ,
+which defaults to
+.BR /mnt/doc .
+.I Mswordstrings
+parses the
+.I WordDocument
+file inside an Office document, extracting
+the text stream.
+.SH SOURCE
+.B /sys/src/cmd/aux/mswordstrings.c
+.br
+.B /sys/src/cmd/aux/olefs.c
+.br
+.B /rc/bin/doc2txt
+.SH SEE ALSO
+.IR strings (1)
+.br
+``Microsoft Word 97 Binary File Format'',
+available on line at Microsoft's developer home page.
+.br
+``LAOLA Binary Structures'',
+.IR snake.cs.tu-berlin.de:8081/~schwartz/pmh .
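
Note on the added page: the DESCRIPTION says Office documents are stored in OLE format, which `olefs` serves as a file system. As a rough illustration only, the Python sketch below checks whether a byte string begins with the OLE compound-file signature, which is the kind of input `olefs` expects. The signature constant comes from general knowledge of the format, not from the man page, and the helper name `looks_like_ole` is hypothetical. It is not how `olefs` itself is implemented.

```python
# Eight-byte signature at the start of an OLE compound file.
# Assumption: taken from general knowledge of the format, not from the man page.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def looks_like_ole(data: bytes) -> bool:
    """Return True if data begins with the OLE compound-file signature."""
    return data[:len(OLE_MAGIC)] == OLE_MAGIC


def file_looks_like_ole(path: str) -> bool:
    """Read only the first bytes of the file at path and test the signature."""
    with open(path, "rb") as f:
        return looks_like_ole(f.read(len(OLE_MAGIC)))
```

A check like this only says that the container is OLE. It does not confirm that a `WordDocument` stream exists inside, which is what `mswordstrings` needs.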
