author Jacob McDonnell <jacob@jacobmcdonnell.com> 2026-04-25 21:07:28 -0400
committer Jacob McDonnell <jacob@jacobmcdonnell.com> 2026-04-25 21:07:28 -0400
commit 711594636704defae873be1a355a292505585afd (patch)
tree 59ee13f863830d8beba6cfd02bbe813dd486c26f /static/v10/man1/flex.1
parent 3258a063c1f189d7b019e40e525b46bef9b9a7b1 (diff)
docs: Added UNIX V10 Manuals
Diffstat (limited to 'static/v10/man1/flex.1')
-rw-r--r-- static/v10/man1/flex.1 716
1 files changed, 716 insertions, 0 deletions
diff --git a/static/v10/man1/flex.1 b/static/v10/man1/flex.1
new file mode 100644
index 00000000..99cb5667
--- /dev/null
+++ b/static/v10/man1/flex.1
@@ -0,0 +1,716 @@
+.TH FLEX 1 "20 June 1989" "Version 2.1"
+.SH NAME
+flex - fast lexical analyzer generator
+.SH SYNOPSIS
+.B flex
+[
+.B -bdfipstvFILT -c[efmF] -Sskeleton_file
+] [
+.I filename
+]
+.SH DESCRIPTION
+.I flex
+is a rewrite of
+.I lex
+intended to right some of that tool's deficiencies: in particular,
+.I flex
+generates lexical analyzers much faster, and the analyzers use
+smaller tables and run faster.
+.SH OPTIONS
+In addition to lex's
+.B -t
+flag, flex has the following options:
+.TP
+.B -b
+Generate backtracking information to
+.I lex.backtrack.
+This is a list of scanner states which require backtracking
+and the input characters on which they do so. By adding rules one
+can remove backtracking states. If all backtracking states
+are eliminated and
+.B -f
+or
+.B -F
+is used, the generated scanner will run faster (see the
+.B -p
+flag). Only users who wish to squeeze every last cycle out of their
+scanners need worry about this option.
+.TP
+.B -d
+makes the generated scanner run in
+.I debug
+mode. Whenever a pattern is recognized the scanner will
+write to
+.I stderr
+a line of the form:
+.nf
+
+ --accepting rule #n
+
+.fi
+Rules are numbered sequentially with the first one being 1. Rule #0
+is executed when the scanner backtracks; Rule #(n+1) (where
+.I n
+is the number of rules) indicates the default action; Rule #(n+2) indicates
+that the input buffer is empty and needs to be refilled and then the scan
+restarted. Rules beyond (n+2) are end-of-file actions.
+.TP
+.B -f
+has the same effect as lex's -f flag (do not compress the scanner
+tables); the mnemonic changes from
+.I fast compilation
+to (take your pick)
+.I full table
+or
+.I fast scanner.
+The actual compilation takes
+.I longer,
+since flex is I/O bound writing out the big table.
+.IP
+This option is equivalent to
+.B -cf
+(see below).
+.TP
+.B -i
+instructs flex to generate a
+.I case-insensitive
+scanner. The case of letters given in the flex input patterns will
+be ignored, and the rules will be matched regardless of case. The
+matched text given in
+.I yytext
+will have the preserved case (i.e., it will not be folded).
+.TP
+.B -p
+generates a performance report to stderr. The report
+consists of comments regarding features of the flex input file
+which will cause a loss of performance in the resulting scanner.
+Note that the use of
+.I REJECT
+and variable trailing context (see
+.B BUGS)
+entails a substantial performance penalty; use of
+.I yymore(),
+the
+.B ^
+operator,
+and the
+.B -I
+flag entail minor performance penalties.
+.TP
+.B -s
+causes the
+.I default rule
+(that unmatched scanner input is echoed to
+.I stdout)
+to be suppressed. If the scanner encounters input that does not
+match any of its rules, it aborts with an error. This option is
+useful for finding holes in a scanner's rule set.
+.TP
+.B -v
+has the same meaning as for lex (print to
+.I stderr
+a summary of statistics of the generated scanner). Many more statistics
+are printed, though, and the summary spans several lines. Most
+of the statistics are meaningless to the casual flex user, but the
+first line identifies the version of flex, which is useful for figuring
+out where you stand with respect to patches and new releases.
+.TP
+.B -F
+specifies that the
+.ul
+fast
+scanner table representation should be used. This representation is
+about as fast as the full table representation
+.ul
+(-f),
+and for some sets of patterns will be considerably smaller (and for
+others, larger). In general, if the pattern set contains both "keywords"
+and a catch-all, "identifier" rule, such as in the set:
+.nf
+
+ "case" return ( TOK_CASE );
+ "switch" return ( TOK_SWITCH );
+ ...
+ "default" return ( TOK_DEFAULT );
+ [a-z]+ return ( TOK_ID );
+
+.fi
+then you're better off using the full table representation. If only
+the "identifier" rule is present and you then use a hash table or some such
+to detect the keywords, you're better off using
+.ul
+-F.
+.IP
+This option is equivalent to
+.B -cF
+(see below).
+.TP
+.B -I
+instructs flex to generate an
+.I interactive
+scanner. Normally, scanners generated by flex always look ahead one
+character before deciding that a rule has been matched. At the cost of
+some scanning overhead, flex will generate a scanner which only looks ahead
+when needed. Such scanners are called
+.I interactive
+because if you want to write a scanner for an interactive system such as a
+command shell, you will probably want the user's input to be terminated
+with a newline, and without
+.B -I
+the user will have to type a character in addition to the newline in order
+to have the newline recognized. This leads to dreadful interactive
+performance.
+.IP
+If all this seems too confusing, here's the general rule: if a human will
+be typing in input to your scanner, use
+.B -I,
+otherwise don't; if you don't care about how fast your scanners run and
+don't want to make any assumptions about the input to your scanner,
+always use
+.B -I.
+.IP
+Note,
+.B -I
+cannot be used in conjunction with
+.I full
+or
+.I fast tables,
+i.e., the
+.B -f, -F, -cf,
+or
+.B -cF
+flags.
+.TP
+.B -L
+instructs flex to not generate
+.B #line
+directives (see below).
+.TP
+.B -T
+makes flex run in
+.I trace
+mode. It will generate a lot of messages to stdout concerning
+the form of the input and the resultant non-deterministic and deterministic
+finite automata. This option is mostly for use in maintaining flex.
+.TP
+.B -c[efmF]
+controls the degree of table compression.
+.B -ce
+directs flex to construct
+.I equivalence classes,
+i.e., sets of characters
+which have identical lexical properties (for example, if the only
+appearance of digits in the flex input is in the character class
+"[0-9]" then the digits '0', '1', ..., '9' will all be put
+in the same equivalence class).
+.B -cf
+specifies that the
+.I full
+scanner tables should be generated - flex should not compress the
+tables by taking advantage of similar transition functions for
+different states.
+.B -cF
+specifies that the alternate fast scanner representation (described
+above under the
+.B -F
+flag)
+should be used.
+.B -cm
+directs flex to construct
+.I meta-equivalence classes,
+which are sets of equivalence classes (or characters, if equivalence
+classes are not being used) that are commonly used together.
+A lone
+.B -c
+specifies that the scanner tables should be compressed but neither
+equivalence classes nor meta-equivalence classes should be used.
+.IP
+The options
+.B -cf
+or
+.B -cF
+and
+.B -cm
+do not make sense together - there is no opportunity for meta-equivalence
+classes if the table is not being compressed. Otherwise the options
+may be freely mixed.
+.IP
+The default setting is
+.B -cem
+which specifies that flex should generate equivalence classes
+and meta-equivalence classes. This setting provides the highest
+degree of table compression. You can obtain
+faster-executing scanners at the cost of larger tables, with
+the following generally being true:
+.nf
+
+ slowest smallest
+ -cem
+ -ce
+ -cm
+ -c
+ -c{f,F}e
+ -c{f,F}
+ fastest largest
+
+.fi
+Note that scanners with the smallest tables compile the quickest, so
+during development you will usually want to use the default, maximal
+compression.
+.TP
+.B -Sskeleton_file
+overrides the default skeleton file from which flex constructs
+its scanners. You'll never need this option unless you are doing
+flex maintenance or development.
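+.LP
+As a rough sketch of how these options combine (the input file name
+.I scanner.l
+is hypothetical, and the scanner is assumed to supply its own main program),
+each of the first three commands below is an alternative way to generate
+.I lex.yy.c:
+with the default compression for development, with full tables and a
+backtracking report, or as an interactive scanner:
+.nf
+
+ flex scanner.l
+ flex -b -f scanner.l
+ flex -I scanner.l
+
+ cc -o scanner lex.yy.c
+
+.fi
+After the second form,
+.I lex.backtrack
+lists the states that must be eliminated before the full tables pay off.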
+.SH INCOMPATIBILITIES WITH LEX
+.I flex
+is fully compatible with
+.I lex
+with the following exceptions:
+.IP -
+There is no run-time library to link with. You needn't
+specify
+.I -ll
+when linking, and you must supply a main program. (Hacker's note: since
+the lex library contains a main() which simply calls yylex(), you actually
+.I can
+be lazy and not supply your own main program and link with
+.I -ll.)
+.IP -
+lex's
+.B %r
+(Ratfor scanners) and
+.B %t
+(translation table) options
+are not supported.
+.IP -
+The do-nothing
+.ul
+-n
+flag is not supported.
+.IP -
+When definitions are expanded, flex encloses them in parentheses.
+With lex, the following
+.nf
+
+ NAME [A-Z][A-Z0-9]*
+ %%
+ foo{NAME}? printf( "Found it\\n" );
+ %%
+
+.fi
+will not match the string "foo" because when the macro
+is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
+and the precedence is such that the '?' is associated with
+"[A-Z0-9]*". With flex, the rule will be expanded to
+"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
+Note that because of this, the
+.B ^, $, <s>,
+and
+.B /
+operators cannot be used in a definition.
+.IP -
+The undocumented lex-scanner internal variable
+.B yylineno
+is not supported.
+.IP -
+The
+.B input()
+routine is not redefinable, though it may be called to read characters
+following whatever has been matched by a rule. If
+.B input()
+encounters an end-of-file the normal
+.B yywrap()
+processing is done. A ``real'' end-of-file is returned as
+.I EOF.
+.IP
+Input can be controlled by redefining the
+.B YY_INPUT
+macro.
+YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
+action is to place up to max_size characters in the character buffer "buf"
+and return in the integer variable "result" either the
+number of characters read or the constant YY_NULL (0 on Unix systems)
+to indicate EOF. The default YY_INPUT reads from the
+file-pointer "yyin" (which is by default
+.I stdin),
+so if you
+just want to change the input file, you needn't redefine
+YY_INPUT - just point yyin at the input file.
+.IP
+A sample redefinition of YY_INPUT (in the first section of the input
+file):
+.nf
+
+ %{
+ #undef YY_INPUT
+ #define YY_INPUT(buf,result,max_size) \\
+ { \\
+ int c = getchar(); /* int, so EOF differs from any char */ \\
+ result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
+ }
+ %}
+
+.fi
+You can also add things like keeping track of the
+input line number this way, but don't expect your scanner to
+go very fast.
+.IP -
+.B output()
+is not supported.
+Output from the ECHO macro is done to the file-pointer
+"yyout" (default
+.I stdout).
+.IP -
+If you are providing your own yywrap() routine, you must "#undef yywrap"
+first.
+.IP -
+To refer to yytext outside of your scanner source file, use
+"extern char *yytext;" rather than "extern char yytext[];".
+.IP -
+.B yyleng
+is a macro and not a variable, and hence cannot be accessed outside
+of the scanner source file.
+.IP -
+flex reads only one input file, while lex's input is made
+up of the concatenation of its input files.
+.IP -
+The name
+.bd
+FLEX_SCANNER
+is #define'd so scanners may be written for use with either
+flex or lex.
+.IP -
+The macro
+.bd
+YY_USER_ACTION
+can be redefined to provide an action
+which is always executed prior to the matched rule's action. For example,
+it could be #define'd to call a routine to convert yytext to lower-case,
+or to copy yyleng to a global variable to make it accessible outside of
+the scanner source file.
+.IP -
+In the generated scanner, rules are separated using
+.bd
+YY_BREAK
+instead of simple "break"'s. This allows, for example, C++ users to
+#define YY_BREAK to do nothing (while being very careful that every
+rule ends with a "break" or a "return"!) to avoid suffering from
+unreachable statement warnings where a rule's action ends with "return".
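+.LP
+As an illustration of several of the points above, here is a minimal
+sketch of a complete flex input that supplies its own main program and uses
+.B YY_USER_ACTION
+to export the match length. The variable name
+.I last_leng
+is hypothetical, and the sketch assumes, as in the YY_INPUT sample above,
+that macros are redefined in the first section of the input:
+.nf
+
+ %{
+ #include <stdio.h>
+
+ int last_leng;   /* hypothetical global, visible to other source files */
+
+ #undef YY_USER_ACTION
+ #define YY_USER_ACTION last_leng = yyleng;
+ %}
+ %%
+ [a-z]+   printf( "word of length %d\\n", last_leng );
+ .        ;
+ \\n       ;
+ %%
+ int main()
+     {
+     yylex();   /* own main program; no -ll at link time */
+     return 0;
+     }
+
+.fi
+Other source files could then declare "extern int last_leng;" instead of
+trying to reach the yyleng macro.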
+.SH ENHANCEMENTS
+.IP -
+.I Exclusive start-conditions
+can be declared by using
+.B %x
+instead of
+.B %s.
+These start-conditions have the property that when they are active,
+.I no other rules are active.
+Thus a set of rules governed by the same exclusive start condition
+describe a scanner which is independent of any of the other rules in
+the flex input. This feature makes it easy to specify "mini-scanners"
+which scan portions of the input that are syntactically different
+from the rest (e.g., comments).
+.IP -
+.I yyterminate()
+can be used in lieu of a return statement in an action. It terminates
+the scanner and returns a 0 to the scanner's caller, indicating "all done".
+.IP -
+.I End-of-file rules.
+The special rule "<<EOF>>" indicates
+actions which are to be taken when an end-of-file is
+encountered and yywrap() returns non-zero (i.e., indicates
+no further files to process). The action can either
+point yyin at a new file to process, in which case the
+action should finish with
+.I YY_NEW_FILE
+(this is a branch, so subsequent code in the action won't
+be executed), or it should finish with a
+.I return
+statement. <<EOF>> rules may not be used with other
+patterns; they may only be qualified with a list of start
+conditions. If an unqualified <<EOF>> rule is given, it
+applies only to the INITIAL start condition, and
+.I not
+to
+.B %s
+start conditions.
+These rules are useful for catching things like unclosed comments.
+An example:
+.nf
+
+ %x quote
+ %%
+ ...
+ <quote><<EOF>> {
+ error( "unterminated quote" );
+ yyterminate();
+ }
+ <<EOF>> {
+ yyin = fopen( next_file, "r" );
+ YY_NEW_FILE;
+ }
+
+.fi
+.IP -
+flex dynamically resizes its internal tables, so directives like "%a 3000"
+are not needed when specifying large scanners.
+.IP -
+The scanning routine generated by flex is declared using the macro
+.B YY_DECL.
+By redefining this macro you can change the routine's name and
+its calling sequence. For example, you could use:
+.nf
+
+ #undef YY_DECL
+ #define YY_DECL float lexscan( a, b ) float a, b;
+
+.fi
+to give it the name
+.I lexscan,
+returning a float, and taking two floats as arguments. Note that
+if you give arguments to the scanning routine, you must terminate
+the definition with a semi-colon (;).
+.IP -
+flex generates
+.B #line
+directives mapping lines in the output to
+their origin in the input file.
+.IP -
+You can put multiple actions on the same line, separated with
+semi-colons. With lex, the following
+.nf
+
+ foo handle_foo(); return 1;
+
+.fi
+is truncated to
+.nf
+
+ foo handle_foo();
+
+.fi
+flex does not truncate the action. Actions that are not enclosed in
+braces are terminated at the end of the line.
+.IP -
+Actions can be begun with
+.B %{
+and terminated with
+.B %}.
+In this case, flex does not count braces to figure out where the
+action ends - actions are terminated by the closing
+.B %}.
+This feature is useful when the enclosed action has extraneous
+braces in it (usually in comments or inside inactive #ifdef's)
+that throw off the brace-count.
+.IP -
+All of the scanner actions (e.g.,
+.B ECHO, yywrap ...)
+except the
+.B unput()
+and
+.B input()
+routines,
+are written as macros, so they can be redefined if necessary
+without requiring a separate library to link to.
+.IP -
+When
+.B yywrap()
+indicates that the scanner is done processing (it does this by returning
+non-zero), on subsequent calls the scanner will always immediately return
+a value of 0. To restart it on a new input file, the action
+.B yyrestart()
+is used. It takes one argument, the new input file. It closes the
+previous yyin (unless stdin) and sets up the scanner's internal variables
+so that the next call to yylex() will start scanning the new file. This
+functionality is useful for, e.g., programs which will process a file, do some
+work, and then get a message to parse another file.
+.IP -
+Flex scans the code in section 1 (inside %{}'s) and the actions for
+occurrences of
+.I REJECT
+and
+.I yymore().
+If it doesn't see any, it assumes the features are not used and generates
+higher-performance scanners. Flex tries to be correct in identifying
+uses but can be fooled (for example, if a reference is made in a macro from
+a #include file). If this happens (a feature is used and flex didn't
+realize it) you will get a compile-time error of the form
+.nf
+
+ reject_used_but_not_detected undefined
+
+.fi
+You can tell flex that a feature is used even if it doesn't think so
+with
+.B %used
+followed by the name of the feature (for example, "%used REJECT");
+similarly, you can specify that a feature is
+.I not
+used even though it thinks it is with
+.B %unused.
+.IP -
+Comments may be put in the first section of the input by preceding
+them with '#'.
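+.LP
+To illustrate exclusive start conditions and end-of-file rules together,
+the following sketch skips C-style comments with a "mini-scanner". The
+start condition name
+.I comment
+is arbitrary, and
+.I BEGIN
+is assumed to behave as it does in lex:
+.nf
+
+ %x comment
+ %%
+ "/*"             BEGIN(comment);
+ <comment>"*/"    BEGIN(INITIAL);
+ <comment>.       ;
+ <comment>\\n      ;
+ <comment><<EOF>> {
+                  fprintf( stderr, "unterminated comment\\n" );
+                  yyterminate();
+                  }
+
+.fi
+Because
+.I comment
+is exclusive, none of the scanner's other rules can match inside a comment,
+so keywords or quotes appearing there are ignored.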
+.SH FILES
+.TP
+.I flex.skel
+skeleton scanner
+.TP
+.I lex.yy.c
+generated scanner (called
+.I lexyy.c
+on some systems).
+.TP
+.I lex.backtrack
+backtracking information for
+.B -b
+flag (called
+.I lex.bck
+on some systems).
+.SH "SEE ALSO"
+.LP
+lex(1)
+.LP
+M. E. Lesk and E. Schmidt,
+.I LEX - Lexical Analyzer Generator
+.SH AUTHOR
+Vern Paxson, with the help of many ideas and much inspiration from
+Van Jacobson. Original version by Jef Poskanzer. Fast table
+representation is a partial implementation of a design done by Van
+Jacobson. The implementation was done by Kevin Gong and Vern Paxson.
+.LP
+Thanks to the many flex beta-testers and feedbackers, especially Casey
+Leedom, Frederic Brehm, Nick Christopher, Chris Faylor, Eric Goldman, Eric
+Hughes, Greg Lee, Craig Leres, Mohamed el Lozy, Jim Meyering, Esmond Pitt,
+Jef Poskanzer, and Dave Tallman. Thanks to Keith Bostic, John Gilmore, Bob
+Mulcahy, Rich Salz, and Richard Stallman for help with various distribution
+headaches.
+.LP
+Send comments to:
+.nf
+
+ Vern Paxson
+ Real Time Systems
+ Bldg. 46A
+ Lawrence Berkeley Laboratory
+ 1 Cyclotron Rd.
+ Berkeley, CA 94720
+
+ (415) 486-6411
+
+ vern@csam.lbl.gov
+ vern@rtsg.ee.lbl.gov
+ ucbvax!csam.lbl.gov!vern
+
+.fi
+I will be gone from mid-July '89 through mid-August '89. From August on,
+the addresses are:
+.nf
+
+ vern@cs.cornell.edu
+
+ Vern Paxson
+ CS Department
+ Grad Office
+ 4126 Upson
+ Cornell University
+ Ithaca, NY 14853-7501
+
+ <no phone number yet>
+
+.fi
+Email sent to the former addresses should continue to be forwarded for
+quite a while. Also, it looks like my username will be "paxson" and
+not "vern". I'm planning on having a mail alias set up so "vern" will
+still work, but if you encounter problems try "paxson".
+.SH DIAGNOSTICS
+.LP
+.I flex scanner jammed -
+a scanner compiled with
+.B -s
+has encountered an input string which wasn't matched by
+any of its rules.
+.LP
+.I flex input buffer overflowed -
+a scanner rule matched a string long enough to overflow the
+scanner's internal input buffer (16K bytes - controlled by
+.B YY_BUF_MAX
+in "flex.skel").
+.LP
+.I old-style lex command ignored -
+the flex input contains a lex command (e.g., "%n 1000") which
+is being ignored.
+.SH BUGS
+.LP
+Some trailing context
+patterns cannot be properly matched and generate
+warning messages ("Dangerous trailing context"). These are
+patterns where the ending of the
+first part of the rule matches the beginning of the second
+part, such as "zx*/xy*", where the 'x*' matches the 'x' at
+the beginning of the trailing context. (Lex doesn't get these
+patterns right either.)
+If desperate, you can use
+.B yyless()
+to effect arbitrary trailing context.
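+.LP
+For instance, a fixed trailing context such as "foo/bar" could be emulated
+along the following lines. This is only a sketch: the token name TOK_FOO is
+hypothetical, and yyless(n) is assumed to behave as in lex, keeping the first
+n characters of the match and returning the rest to the input:
+.nf
+
+ foobar   {
+          yyless( 3 );   /* keep "foo"; "bar" will be rescanned */
+          return ( TOK_FOO );
+          }
+
+.fi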
+.LP
+.I Variable
+trailing context (where both the leading and trailing parts do not have
+a fixed length) entails the same performance loss as
+.I REJECT
+(i.e., substantial).
+.LP
+For some trailing context rules, parts which are actually fixed-length are
+not recognized as such, leading to the above-mentioned performance loss.
+In particular, parts using '|' or {n} are always considered variable-length.
+.LP
+Use of unput() or input() trashes the current yytext and yyleng.
+.LP
+Use of unput() to push back more text than was matched can
+result in the pushed-back text matching a beginning-of-line ('^')
+rule even though it didn't come at the beginning of the line.
+.LP
+yytext and yyleng cannot be modified within a flex action.
+.LP
+Nulls are not allowed in flex inputs or in the inputs to
+scanners generated by flex. Their presence generates fatal
+errors.
+.LP
+Flex does not generate correct #line directives for code internal
+to the scanner; thus, bugs in
+.I
+flex.skel
+yield bogus line numbers.
+.LP
+Pushing back definitions enclosed in ()'s can result in nasty,
+difficult-to-understand problems like:
+.nf
+
+ {DIG} [0-9] /* a digit */
+
+.fi
+in which the pushed-back text is "([0-9] /* a digit */)".
+.LP
+Due to both buffering of input and read-ahead, you cannot intermix
+calls to stdio routines, such as
+.B getchar()
+with flex rules and expect it to work. Call
+.B input()
+instead.
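+.LP
+As a sketch of the latter, an action that discards the rest of a line
+after a '#' might read the characters with
+.B input()
+rather than
+.B getchar():
+.nf
+
+ "#"      {
+          int c;
+
+          /* input() returns EOF at a real end-of-file */
+          while ( (c = input()) != '\\n' && c != EOF )
+              ;
+          }
+
+.fi
+Note that, as mentioned above, calling input() trashes yytext and yyleng,
+so save anything you need from them first.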
+.LP
+The total number of table entries listed by the
+.B -v
+flag excludes the table entries needed to determine
+what rule has been matched. The number of entries is equal
+to the number of DFA states if the scanner does not use REJECT,
+and somewhat greater than the number of states if it does.
+.LP
+To be consistent with ANSI C, the escape sequence \\xhh should
+be recognized for hexadecimal escape sequences, such as '\\x41' for 'A'.
+.LP
+It would be useful if flex wrote to lex.yy.c a summary of the flags used in
+its generation (such as which table compression options).
+.LP
+The scanner run-time speeds still have not been optimized as much
+as they deserve. Van Jacobson's work shows that they can go
+faster still.
+.LP
+The utility needs more complete documentation.