diff --git a/static/plan9-4e/man8/scanmail.8 b/static/plan9-4e/man8/scanmail.8
new file mode 100644
index 00000000..ce639f72
--- /dev/null
+++ b/static/plan9-4e/man8/scanmail.8
@@ -0,0 +1,447 @@
+.TH SCANMAIL 8
+.SH NAME
+scanmail, testscan \- spam filters
+.SH SYNOPSIS
+.B upas/scanmail
+[
+.I options
+]
+[
+.I qer-args
+]
+.I root
+.B mail
+.I sender system rcpt-list
+.PP
+.B upas/testscan
+[
+.B -avd
+]
+[
+.B -p
+.I patfile
+]
+[
+.I filename
+]
+.SH DESCRIPTION
+.B Scanmail
+accepts a mail message supplied on standard input,
+applies a file of patterns to a portion of it,
+and dispatches
+the message based
+on the results.
+It is a drop-in replacement for the
+generic queuing command
+.IR qer (8),
+which is executed from the
+.IR rc (1)
+script
+.B /mail/lib/qmail
+in the mail processing pipeline.
+Each pattern is associated with an
+.IR action ;
+the actions, in order of decreasing priority, are:
+.in +5
+.TP 10
+.B dump
+the message is deleted and a log entry is written to
+.B /sys/log/smtpd
+.TP 10
+.B hold
+the message is placed in a queue for human inspection
+.TP
+.B log
+a line containing the matching portion of the message is written to a log
+.in -5
+.PP
+If no pattern matches or only patterns with an action of
+.B log
+match, the message is accepted and
+.I scanmail
+queues the message for delivery.
+.I Scanmail
+meshes with the blocking facilities
+of
+.IR smtpd (6)
+to provide several layers of
+filtering on gateway systems.
+In all cases the sender is told that the message was delivered
+successfully, so the sender remains unaware that it may have been
+delayed or deleted.
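+.PP
+The dispatch rule can be sketched in C.
+This is a hypothetical illustration rather than the
+.I scanmail
+source: the names
+.BR Action ,
+.B Disposition
+and
+.B dispatch
+are invented here, and each match is assumed to have already been
+reduced to its action.
+Because the actions are ranked, the single highest-priority match decides
+what happens to the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action codes, numbered so that a larger value means a
 * higher priority: dump outranks hold, which outranks log. */
typedef enum { ActNone = 0, ActLog = 1, ActHold = 2, ActDump = 3 } Action;

/* What finally happens to the message. */
typedef enum { Deliver, Hold, Delete } Disposition;

/* Return the highest-priority action among n matches (ActNone if n == 0). */
static Action highest(const Action *matches, size_t n)
{
	Action best = ActNone;

	assert(matches != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (matches[i] > best)
			best = matches[i];
	return best;
}

/* Map the winning action to a disposition: no match at all, or only
 * log matches, means the message is queued for delivery as usual. */
static Disposition dispatch(const Action *matches, size_t n)
{
	switch (highest(matches, n)) {
	case ActDump:
		return Delete;
	case ActHold:
		return Hold;
	default:
		return Deliver;
	}
}
```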
+.PP
+.I Scanmail
+accepts the arguments of
+.IR qer (8)
+as well as the following:
+.TF filename
+.TP
+.B -c
+Save a copy of each message in a
+randomly-named file in
+directory
+.BR /mail/copy .
+.TP
+.B -d
+Write debugging information to standard error.
+.TP
+.B -h
+Queue
+.I held
+messages by sending domain name.
+The
+.B -q
+option must specify a root directory; messages
+are queued in subdirectories of this directory.
+If the
+.B -h
+option is not specified,
+messages are accumulated in a subdirectory of
+.B /mail/queue.hold
+named for the contents of
+.BR /dev/user ,
+usually
+.BR none .
+.TP
+.B -n
+Messages are never held for inspection, but are delivered. Also known as
+.IR "vacation mode" .
+.TP
+.BI -p " filename"
+Read the patterns from
+.I filename
+rather than
+.BR /mail/lib/patterns .
+.TP
+.BI -q " holdroot"
+Queue deliverable messages in subdirectories of
+.IR holdroot .
+This option is the same as the
+.B -q
+option of
+.IR qer (8)
+and must be present if the
+.B -h
+option is given.
+.TP
+.B -s
+Save deleted
+messages. Messages are stored, one per randomly-named file,
+in subdirectories of
+.B /mail/queue.dump
+named with the date.
+.TP
+.B -t
+Test mode. The pattern matcher is applied but the message is
+discarded and the result is not logged.
+.TP
+.B -v
+Print the highest priority match.
+This is useful
+with the
+.B -t
+option for testing the pattern matcher without actually
+sending a message.
+.PD
+.PP
+.I Testscan
+is the command line version of
+.IR scanmail .
+If
+.I filename
+is missing, it applies the pattern set to
+the message on standard input. Unlike
+.IR scanmail ,
+which finds the highest priority match,
+.I testscan
+prints all matches in the portion of the message under test.
+It is useful for testing a pattern set or
+implementing a personal filter
+using the
+.B pipeto
+file in a user's mail directory.
+.I Testscan
+accepts the following options:
+.TP
+.B -a
+Print matches in the complete input message
+.TP
+.B -d
+Enable debug mode
+.TP
+.B -v
+Print the message after conversion to canonical form
+(see
+.I Canonicalization
+below).
+.TP
+.BI -p " filename"
+Read the patterns from
+.I filename
+rather than
+.BR /mail/lib/patterns .
+.SS Canonicalization
+Before pattern matching, both programs convert a portion of
+the message header and the beginning of the
+message to a canonical form.
+The amount of header and body text processed is set by
+compile-time parameters in the source files.
+The canonicalization process converts letters to lower-case and
+replaces consecutive spaces, tabs and newline characters
+with a single space. HTML tags are
+deleted, except for the parameters following the
+.B A
+.BR HREF ,
+.B IMG
+.BR SRC ,
+and
+.B IMG
+.B BORDER
+directives. Additionally, the following MIME escape sequences
+are replaced by their ASCII
+equivalents:
+.PP
+.EX
+ Escape Seq ASCII
+ ---------- -----
+ =2e .
+ =2f /
+ =20 <space>
+ =3d =
+.EE
+and the sequence
+.I =<newline>
+is elided.
+.I Scanmail
+assembles the sender, destination domain and recipient fields of
+the command line into a string that is
+subjected to the same canonical processing.
+Following canonicalization, the command line and
+the two long strings containing
+the header and the message body are passed to the
+matching engine for analysis.
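+.PP
+A minimal C sketch of the character-level steps follows.
+It is an illustration under stated assumptions, not the code in
+.BR /sys/src/cmd/upas/scanmail :
+the function names
+.B canon
+and
+.B unescape
+are hypothetical, HTML stripping and the compile-time size limits are omitted,
+and the order in which escapes are decoded relative to case folding and
+white-space collapsing is a guess.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* If s begins with the two hex digits of one of the escapes listed in
 * the manual (=2e, =2f, =20, =3d), store the ASCII character in *out
 * and return 1; otherwise return 0.  Digits match case-insensitively. */
static int unescape(const char *s, char *out)
{
	static const struct { char hex[3]; char c; } tab[] = {
		{ "2e", '.' }, { "2f", '/' }, { "20", ' ' }, { "3d", '=' },
	};

	for (size_t i = 0; i < sizeof tab / sizeof tab[0]; i++)
		if (tolower((unsigned char)s[0]) == tab[i].hex[0] &&
		    tolower((unsigned char)s[1]) == tab[i].hex[1]) {
			*out = tab[i].c;
			return 1;
		}
	return 0;
}

/* Canonicalize src into dst (len bytes, len > 0): elide "=\n", decode
 * the escapes above, fold letters to lower case and collapse each run
 * of spaces, tabs and newlines into one space.  Output is truncated.
 * ASSUMPTION: escapes are decoded before white-space is collapsed. */
static void canon(const char *src, char *dst, size_t len)
{
	size_t n = 0;
	int inspace = 0;

	assert(src != NULL && dst != NULL && len > 0);
	while (*src != '\0' && n + 1 < len) {
		char c = *src;

		if (c == '=' && src[1] == '\n') {	/* soft line break */
			src += 2;
			continue;
		}
		if (c == '=' && unescape(src + 1, &c))
			src += 3;	/* skip '=' and the two hex digits */
		else
			src++;
		if (c == ' ' || c == '\t' || c == '\n') {
			if (inspace)
				continue;	/* drop the rest of the run */
			inspace = 1;
			c = ' ';
		} else
			inspace = 0;
		dst[n++] = (char)tolower((unsigned char)c);
	}
	dst[n] = '\0';
}
```

+.PP
+With this ordering, an escaped space such as
+.B =20
+merges with neighbouring white-space, so
+.B "a =20b"
+canonicalizes to
+.BR "a b" .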
+.SS Pattern Syntax
+The matching engine compiles the pattern set
+and matches it to each canonicalized input string.
+Patterns are specified one per line
+as follows:
+.PP
+.EX
+ {*}\fIaction\fP: \fIpattern-spec\fP {~~\fIoverride\fP...~~\fIoverride\fP}
+.EE
+.PP
+On all lines, a
+.B #
+introduces a comment; there is no way to escape this character.
+.PP
+Lines beginning with
+.B *
+contain a
+.I pattern-spec
+that is a string; otherwise, the
+.I pattern-spec
+is a regular expression in the style of
+.IR regexp (6).
+Regular expression matching is many
+times less efficient than string matching, so it is
+wiser to enumerate several similar strings
+than to combine them into a regular expression.
+The
+.I action
+is a keyword terminated by a
+.B :
+and separated from the pattern by optional white-space.
+It must be one of the following:
+.TP 10
+.B dump
+if the pattern matches, the message is deleted. If the
+.B -s
+command line option is set, the message is saved.
+.TP 10
+.B hold
+if the pattern matches, the message is queued in a subdirectory
+of
+.B /mail/queue.hold
+for manual inspection. After inspection, the queue can be swept
+manually using
+.B runq
+(see
+.IR qer (8))
+to deliver messages that were inadvertently matched.
+.TP 10
+.B header
+this is the same as the
+.B hold
+action, except the pattern is only applied to the message header.
+This optimization is useful for patterns that match header fields
+that are unlikely to be present in the body of the message.
+.TP 10
+.B line
+the sender and a section of the message around the match are written to
+the file
+.BR /sys/log/lines .
+The message is always delivered.
+.TP 10
+.B loff
+patterns of this type are applied only to the canonicalized command line.
+When a match occurs, all patterns with
+.B line
+actions are disabled. This is useful for limiting
+the size of the log file by excluding repetitive messages, such
+as those from mailing lists.
+.PP
+Patterns are accumulated into pattern sets sharing the same action.
+The matching engine applies the
+.B dump
+pattern set first, then the
+.B header
+and
+.B hold
+pattern sets, and finally the
+.B line
+pattern set. Each pattern set is applied three times:
+to the canonicalized command line, to the message header, and
+finally to the message body. The ordering of patterns
+in the pattern file is insignificant.
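+.PP
+The order in which pattern sets are applied can be sketched in C
+under some simplifications: only string patterns are handled (with
+.BR strstr ),
+overrides are ignored, the
+.B loff
+check is assumed to happen just before the
+.B line
+set, and the types and the function
+.B scan
+are hypothetical rather than taken from the source.
+The function returns the first action that fires.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical action codes; NoMatch means nothing fired. */
typedef enum { NoMatch, Dump, Header, Hold, Line, Loff } Action;

/* A compiled string pattern (regular expressions and overrides omitted). */
typedef struct {
	Action action;
	const char *str;	/* already canonicalized */
} Pattern;

/* Return 1 if any pattern in the set for action a occurs in text.
 * The order of patterns within the set does not affect the result. */
static int setmatch(const Pattern *p, size_t n, Action a, const char *text)
{
	for (size_t i = 0; i < n; i++)
		if (p[i].action == a && strstr(text, p[i].str) != NULL)
			return 1;
	return 0;
}

/* Apply the sets in the documented order to the canonicalized command
 * line, header and body; return the first action that fires.
 * A header match is reported as Hold, since it has the same effect. */
static Action scan(const Pattern *p, size_t n,
		const char *cmd, const char *hdr, const char *body)
{
	const char *in[3] = { cmd, hdr, body };
	int i;

	assert(cmd != NULL && hdr != NULL && body != NULL);
	for (i = 0; i < 3; i++)
		if (setmatch(p, n, Dump, in[i]))
			return Dump;
	if (setmatch(p, n, Header, hdr))	/* header set: header only */
		return Hold;
	for (i = 0; i < 3; i++)
		if (setmatch(p, n, Hold, in[i]))
			return Hold;
	if (setmatch(p, n, Loff, cmd))	/* loff: command line only */
		return NoMatch;		/* line patterns are disabled */
	for (i = 0; i < 3; i++)
		if (setmatch(p, n, Line, in[i]))
			return Line;
	return NoMatch;
}
```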
+.PP
+The
+.I pattern-spec
+is a string of characters terminated by a
+.BR newline ,
+.B #
+or override indicator,
+.BR ~~ .
+Trailing white-space is deleted, but
+patterns containing leading or trailing white-space can
+be enclosed in double-quote
+characters. A double-quote inside a pattern must be preceded
+by a backslash, and the whole pattern must then be enclosed
+in double-quote characters.
+For example, the pattern
+.PP
+.EX
+ "this is not \\"spam\\""
+.EE
+.PP
+matches the string \fLthis is not "spam"\fP.
+The
+.I pattern-spec
+is followed by zero or more
+.I override
+strings. When the pattern matches,
+each override is applied in turn;
+if any override also matches, it cancels the effect of the pattern.
+Overrides must be strings; regular expressions are not supported.
+Each override is introduced by the string
+.BR ~~
+and continues until a subsequent
+.BR ~~ ,
+.B #
+or
+.BR newline ,
+white-space included.
+A
+.B ~~
+immediately followed by a
+.B newline
+indicates a line continuation and further overrides continue
+on the following line.
+Leading white-space
+on the continuation line is ignored. For example,
+.PP
+.EX
+ *hold: sex.com~~essex.com~~sussex.com~~sysex.com~~
+ lasex.com~~cse.psu.edu!owner-9fans
+.EE
+.PP
+matches all input containing the string
+.B sex.com
+except for messages that also contain any of the
+strings in the override list. Often it
+is desirable to override a pattern based on
+the name of the sender or
+recipient. For this reason, each override
+pattern is applied to the header and the command line as well
+as the section of the
+canonicalized input containing the matching data.
+Thus a pattern matching the command line or the header
+searches both the command line and the header
+for overrides while a match in the body searches
+the body, header and command line for overrides.
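+.PP
+The following C sketch parses one pattern line according to the rules above
+and tests its overrides with the search scope just described.
+It is hypothetical: the names
+.BR Entry ,
+.B parseline
+and
+.B overridden
+and the fixed size limits are invented, continuation lines are assumed to have
+been joined by the caller, and the whole body is searched where the manual speaks
+of the section containing the match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { Maxover = 8, Maxstr = 128 };

/* Hypothetical parsed form of one pattern line. */
typedef struct {
	int isstring;              /* line began with '*' */
	char action[16];           /* keyword before the ':' */
	char pattern[Maxstr];      /* the pattern-spec */
	int nover;                 /* number of overrides */
	char over[Maxover][Maxstr];
} Entry;

/* Copy n bytes of s into dst (Maxstr bytes), truncating if needed. */
static void copyn(char *dst, const char *s, size_t n)
{
	if (n >= Maxstr)
		n = Maxstr - 1;
	memcpy(dst, s, n);
	dst[n] = '\0';
}

/* Advance s to the next "~~" or the end of the string. */
static const char *tonext(const char *s)
{
	while (*s != '\0' && strncmp(s, "~~", 2) != 0)
		s++;
	return s;
}

/* Parse one logical line (continuations joined, no newline).
 * Returns 0 on success or -1 if the line has no "action:" prefix. */
static int parseline(const char *line, Entry *e)
{
	char buf[512];
	const char *s, *p;
	size_t n = strcspn(line, "#");  /* '#' always starts a comment */

	assert(e != NULL);
	memset(e, 0, sizeof *e);
	if (n >= sizeof buf)
		return -1;
	memcpy(buf, line, n);
	buf[n] = '\0';
	s = buf;
	if (*s == '*') {
		e->isstring = 1;
		s++;
	}
	p = strchr(s, ':');
	if (p == NULL || (size_t)(p - s) >= sizeof e->action)
		return -1;
	memcpy(e->action, s, (size_t)(p - s));  /* NUL-filled by memset */
	for (s = p + 1; *s == ' ' || *s == '\t'; s++)
		;
	if (*s == '"') {                /* quoted; \" is a literal quote */
		for (n = 0, s++; *s != '\0' && *s != '"'; s++) {
			if (s[0] == '\\' && s[1] == '"')
				s++;
			if (n < Maxstr - 1)
				e->pattern[n++] = *s;
		}
		if (*s++ != '"')
			return -1;
		while (*s == ' ' || *s == '\t')
			s++;
	} else {                        /* unquoted; trailing blanks dropped */
		p = s;
		s = tonext(s);
		for (n = (size_t)(s - p); n > 0 && (p[n-1] == ' ' || p[n-1] == '\t'); n--)
			;
		copyn(e->pattern, p, n);
	}
	while (strncmp(s, "~~", 2) == 0 && e->nover < Maxover) {
		p = s + 2;              /* override text keeps its white-space */
		s = tonext(p);
		if (s > p)              /* skip empty overrides */
			copyn(e->over[e->nover++], p, (size_t)(s - p));
	}
	return 0;
}

/* A match in the command line or header looks for overrides in both;
 * a body match (inbody != 0) also looks in the body.
 * ASSUMPTION: the whole body stands in for "the matching section". */
static int overridden(const Entry *e, int inbody,
		const char *cmd, const char *hdr, const char *body)
{
	for (int i = 0; i < e->nover; i++)
		if (strstr(cmd, e->over[i]) != NULL || strstr(hdr, e->over[i]) != NULL ||
		    (inbody && strstr(body, e->over[i]) != NULL))
			return 1;
	return 0;
}
```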
+.PP
+The structure of the pattern file and the matching
+algorithm define the strategy for detecting
+and filtering unwanted messages. Ideally, a
+.B hold
+pattern selects a message for inspection and if it
+is determined to be undesirable, a specific
+.B dump
+pattern is added to delete further instances
+of the message. Additionally, it is often
+useful to block the sender by updating the
+.B smtpd
+control file.
+.PP
+In this regime, patterns with a
+.B dump
+action generally match phrases
+that are likely to be unique. Patterns that
+hold a message for inspection
+match phrases commonly found in undesirable material and
+occasionally in legitimate messages. Patterns
+that log matches are less specific yet. In all
+cases, the ability to override a pattern by
+matching another string allows repetitive messages
+that trigger the pattern, such as mailing lists,
+to pass the filter after the first one is processed
+manually. The
+.B -s
+option allows deleted messages to be salvaged
+by either manual or semi-automatic review, supporting
+the specification of more aggressive patterns.
+Finally, the utility of the pattern matcher is not
+confined to filtering spam; it is a generally useful
+administrative tool for deleting inadvertently harmful
+messages such as mail loops, stuck senders or viruses.
+It is also useful for collecting or counting messages
+matching certain criteria.
+.SH FILES
+.TF /mail/queue.dump/*
+.TP
+.B /mail/lib/patterns
+default pattern file
+.TP
+.B /sys/log/smtpd
+log of deleted messages
+.TP
+.B /mail/log/lines
+file where
+.I log
+matches are logged
+.TP
+.B /mail/queue/*
+directories where legitimate messages are queued for delivery
+.TP
+.B /mail/queue.hold
+directory where held messages are queued for inspection
+.TP
+.B /mail/queue.dump/*
+directory where
+.I dumped
+messages are stored when the
+.B -s
+command line option is specified.
+.TP
+.B /mail/copy/*
+directory where copies of incoming messages
+are stored when the
+.B -c
+option is specified.
+.SH SOURCE
+.TP
+.B /sys/src/cmd/upas/scanmail
+.SH "SEE ALSO"
+.IR mail (1),
+.IR qer (8),
+.IR smtpd (6)
+.SH BUGS
+.I Testscan
+does not report a match when the body of a message
+contains exactly one line.