author Jacob McDonnell <jacob@jacobmcdonnell.com> 2026-04-26 16:38:00 -0400
committer Jacob McDonnell <jacob@jacobmcdonnell.com> 2026-04-26 16:38:00 -0400
commit 97d5c458cfa039d857301e1ca7d5af3beb37131d
tree b460cd850d0537eb71806ba30358840377b27688 /static/plan9-4e/man1/doc2txt.1
parent b89dc2331a50c63f8b33272a5c4c61ab98abdaa3
build: Better Build System
Diffstat (limited to 'static/plan9-4e/man1/doc2txt.1')
-rw-r--r-- static/plan9-4e/man1/doc2txt.1 53
1 files changed, 53 insertions, 0 deletions
diff --git a/static/plan9-4e/man1/doc2txt.1 b/static/plan9-4e/man1/doc2txt.1
new file mode 100644
index 00000000..db7beb78
--- /dev/null
+++ b/static/plan9-4e/man1/doc2txt.1
@@ -0,0 +1,53 @@
+.TH DOC2TXT 1
+.SH NAME
+doc2txt, olefs, mswordstrings \- extract printable strings from Microsoft Word documents
+.SH SYNOPSIS
+.B doc2txt
+[
+.I file.doc
+]
+.br
+.B aux/olefs
+[
+.B -m
+.I mtpt
+]
+.I file.doc
+.br
+.B aux/mswordstrings
+.I /mnt/doc/WordDocument
+.SH DESCRIPTION
+.I Doc2txt
+is a shell script that uses
+.I olefs
+and
+.I mswordstrings
+to extract the printable text from the body of a Microsoft Word document.
+.PP
+Microsoft Office documents are stored in OLE (Object Linking and Embedding)
+format, which is a scaled down version of Microsoft's FAT file system.
+.I Olefs
+presents the contents of an Office document as a file system
+on
+.IR mtpt ,
+which defaults to
+.BR /mnt/doc .
+.I Mswordstrings
+parses the
+.I WordDocument
+file inside an Office document, extracting
+the text stream.
+.SH SOURCE
+.B /sys/src/cmd/aux/mswordstrings.c
+.br
+.B /sys/src/cmd/aux/olefs.c
+.br
+.B /rc/bin/doc2txt
+.SH SEE ALSO
+.IR strings (1)
+.br
+``Microsoft Word 97 Binary File Format'',
+available on line at Microsoft's developer home page.
+.br
+``LAOLA Binary Structures'',
+.IR snake.cs.tu-berlin.de:8081/~schwartz/pmh .