author Jacob McDonnell <jacob@jacobmcdonnell.com> 2026-04-25 19:54:44 -0400
committer Jacob McDonnell <jacob@jacobmcdonnell.com> 2026-04-25 19:54:44 -0400
commit a9157ce950dfe2fc30795d43b9d79b9d1bffc48b
tree 9df484304b560466d145e662c1c254ff0e9ae0ba /static/openbsd/man1/flex.1
parent 160aa82b2d39c46ad33723d7d909cb4972efbb03
docs: Added All OpenBSD Manuals
Diffstat (limited to 'static/openbsd/man1/flex.1')
-rw-r--r-- static/openbsd/man1/flex.1 4427
1 files changed, 4427 insertions, 0 deletions
diff --git a/static/openbsd/man1/flex.1 b/static/openbsd/man1/flex.1
new file mode 100644
index 00000000..d06f2ffd
--- /dev/null
+++ b/static/openbsd/man1/flex.1
@@ -0,0 +1,4427 @@
+.\" $OpenBSD: flex.1,v 1.47 2025/05/22 07:31:18 bentley Exp $
+.\"
+.\" Copyright (c) 1990 The Regents of the University of California.
+.\" All rights reserved.
+.\"
+.\" This code is derived from software contributed to Berkeley by
+.\" Vern Paxson.
+.\"
+.\" The United States Government has rights in this work pursuant
+.\" to contract no. DE-AC03-76SF00098 between the United States
+.\" Department of Energy and the University of California.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\"
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" Neither the name of the University nor the names of its contributors
+.\" may be used to endorse or promote products derived from this software
+.\" without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
+.\" IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
+.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+.\" PURPOSE.
+.\"
+.Dd $Mdocdate: May 22 2025 $
+.Dt FLEX 1
+.Os
+.Sh NAME
+.Nm flex ,
+.Nm flex++ ,
+.Nm lex
+.Nd fast lexical analyzer generator
+.Sh SYNOPSIS
+.Nm
+.Bk -words
+.Op Fl 78BbdFfhIiLlnpsTtVvw+?
+.Op Fl C Ns Op Cm aeFfmr
+.Op Fl Fl help
+.Op Fl Fl version
+.Op Fl o Ns Ar output
+.Op Fl P Ns Ar prefix
+.Op Fl S Ns Ar skeleton
+.Op Ar
+.Ek
+.Sh DESCRIPTION
+.Nm
+is a tool for generating
+.Em scanners :
+programs which recognize lexical patterns in text.
+.Nm
+reads the given input files, or its standard input if no file names are given,
+for a description of a scanner to generate.
+The description is in the form of pairs of regular expressions and C code,
+called
+.Em rules .
+.Nm
+generates as output a C source file,
+.Pa lex.yy.c ,
+which defines a routine
+.Fn yylex .
+This file is compiled and linked with the
+.Fl lfl
+library to produce an executable.
+When the executable is run, it analyzes its input for occurrences
+of the regular expressions.
+Whenever it finds one, it executes the corresponding C code.
+.Pp
+.Nm lex
+is a synonym for
+.Nm flex .
+.Nm flex++
+is a synonym for
+.Nm
+.Fl + .
+.Pp
+The manual includes both tutorial and reference sections:
+.Bl -ohang
+.It Sy Some Simple Examples
+.It Sy Format of the Input File
+.It Sy Patterns
+The extended regular expressions used by
+.Nm .
+.It Sy How the Input is Matched
+The rules for determining what has been matched.
+.It Sy Actions
+How to specify what to do when a pattern is matched.
+.It Sy The Generated Scanner
+Details regarding the scanner that
+.Nm
+produces;
+how to control the input source.
+.It Sy Start Conditions
+Introducing context into scanners, and managing
+.Qq mini-scanners .
+.It Sy Multiple Input Buffers
+How to manipulate multiple input sources;
+how to scan from strings instead of files.
+.It Sy End-of-File Rules
+Special rules for matching the end of the input.
+.It Sy Miscellaneous Macros
+A summary of macros available to the actions.
+.It Sy Values Available to the User
+A summary of values available to the actions.
+.It Sy Interfacing with Yacc
+Connecting flex scanners together with
+.Xr yacc 1
+parsers.
+.It Sy Options
+.Nm
+command-line options, and the
+.Dq %option
+directive.
+.It Sy Performance Considerations
+How to make scanners go as fast as possible.
+.It Sy Generating C++ Scanners
+The
+.Pq experimental
+facility for generating C++ scanner classes.
+.It Sy Incompatibilities with Lex and POSIX
+How
+.Nm
+differs from
+.At
+.Nm lex
+and the
+.Tn POSIX
+.Nm lex
+standard.
+.It Sy Files
+Files used by
+.Nm .
+.It Sy Diagnostics
+Those error messages produced by
+.Nm
+.Pq or scanners it generates
+whose meanings might not be apparent.
+.It Sy See Also
+Other documentation, related tools.
+.It Sy Authors
+Includes contact information.
+.It Sy Bugs
+Known problems with
+.Nm .
+.El
+.Sh SOME SIMPLE EXAMPLES
+First, some simple examples to give the flavor of how one uses
+.Nm .
+The following
+.Nm
+input specifies a scanner which, whenever it encounters the string
+.Qq username ,
+replaces it with the user's login name:
+.Bd -literal -offset indent
+%%
+username printf("%s", getlogin());
+.Ed
+.Pp
+By default, any text not matched by a
+.Nm
+scanner is copied to the output, so the net effect of this scanner is
+to copy its input file to its output with each occurrence of
+.Qq username
+expanded.
+In this input, there is just one rule.
+.Qq username
+is the
+.Em pattern
+and the
+.Qq printf
+is the
+.Em action .
+The
+.Qq %%
+marks the beginning of the rules.
+.Pp
+Here's another simple example:
+.Bd -literal -offset indent
+%{
+int num_lines = 0, num_chars = 0;
+%}
+
+%%
+\en ++num_lines; ++num_chars;
+\&. ++num_chars;
+
+%%
+int
+main(void)
+{
+	yylex();
+	printf("# of lines = %d, # of chars = %d\en",
+	    num_lines, num_chars);
+	return 0;
+}
+.Ed
+.Pp
+This scanner counts the number of characters and the number
+of lines in its input
+(it produces no output other than the final report on the counts).
+The first line declares two globals,
+.Qq num_lines
+and
+.Qq num_chars ,
+which are accessible both inside
+.Fn yylex
+and in the
+.Fn main
+routine declared after the second
+.Qq %% .
+There are two rules, one which matches a newline
+.Pq \&"\en\&"
+and increments both the line count and the character count,
+and one which matches any character other than a newline
+(indicated by the
+.Qq \&.
+regular expression).
+.Pp
+A somewhat more complicated example:
+.Bd -literal -offset indent
+/* scanner for a toy Pascal-like language */
+
+DIGIT [0-9]
+ID [a-z][a-z0-9]*
+
+%%
+
+{DIGIT}+ {
+ printf("An integer: %s\en", yytext);
+}
+
+{DIGIT}+"."{DIGIT}* {
+ printf("A float: %s\en", yytext);
+}
+
+if|then|begin|end|procedure|function {
+ printf("A keyword: %s\en", yytext);
+}
+
+{ID} printf("An identifier: %s\en", yytext);
+
+"+"|"-"|"*"|"/" printf("An operator: %s\en", yytext);
+
+"{"[^}\en]*"}" /* eat up one-line comments */
+
+[ \et\en]+ /* eat up whitespace */
+
+\&. printf("Unrecognized character: %s\en", yytext);
+
+%%
+
+int
+main(int argc, char *argv[])
+{
+	++argv; --argc;	/* skip over program name */
+	if (argc > 0) {
+		if ((yyin = fopen(argv[0], "r")) == NULL) {
+			perror(argv[0]);
+			return 1;
+		}
+	} else
+		yyin = stdin;
+
+	yylex();
+	return 0;
+}
+.Ed
+.Pp
+This is the beginning of a simple scanner for a language like Pascal.
+It identifies different types of
+.Em tokens
+and reports on what it has seen.
+.Pp
+The details of this example will be explained in the following sections.
+.Sh FORMAT OF THE INPUT FILE
+The
+.Nm
+input file consists of three sections, separated by a line with just
+.Qq %%
+in it:
+.Bd -unfilled -offset indent
+definitions
+%%
+rules
+%%
+user code
+.Ed
+.Pp
+The
+.Em definitions
+section contains declarations of simple
+.Em name
+definitions to simplify the scanner specification, and declarations of
+.Em start conditions ,
+which are explained in a later section.
+.Pp
+Name definitions have the form:
+.Pp
+.D1 name definition
+.Pp
+The
+.Qq name
+is a word beginning with a letter or an underscore
+.Pq Sq _
+followed by zero or more letters, digits,
+.Sq _ ,
+or
+.Sq -
+.Pq dash .
+The definition is taken to begin at the first non-whitespace character
+following the name and to continue to the end of the line.
+The definition can subsequently be referred to using
+.Qq {name} ,
+which will expand to
+.Qq (definition) .
+For example:
+.Bd -literal -offset indent
+DIGIT [0-9]
+ID [a-z][a-z0-9]*
+.Ed
+.Pp
+This defines
+.Qq DIGIT
+to be a regular expression which matches a single digit, and
+.Qq ID
+to be a regular expression which matches a letter
+followed by zero-or-more letters-or-digits.
+A subsequent reference to
+.Pp
+.Dl {DIGIT}+"."{DIGIT}*
+.Pp
+is identical to
+.Pp
+.Dl ([0-9])+"."([0-9])*
+.Pp
+and matches one-or-more digits followed by a
+.Sq .\&
+followed by zero-or-more digits.
+.Pp
+The
+.Em rules
+section of the
+.Nm
+input contains a series of rules of the form:
+.Pp
+.Dl pattern action
+.Pp
+The pattern must be unindented and the action must begin
+on the same line.
+.Pp
+See below for a further description of patterns and actions.
+.Pp
+Finally, the user code section is simply copied to
+.Pa lex.yy.c
+verbatim.
+It is used for companion routines which call or are called by the scanner.
+The presence of this section is optional;
+if it is missing, the second
+.Qq %%
+in the input file may be skipped too.
+.Pp
+In the definitions and rules sections, any indented text or text enclosed in
+.Sq %{
+and
+.Sq %}
+is copied verbatim to the output
+.Pq with the %{}'s removed .
+The %{}'s must appear unindented on lines by themselves.
+.Pp
+In the rules section,
+any indented or %{} text appearing before the first rule may be used to
+declare variables which are local to the scanning routine and
+.Pq after the declarations
+code which is to be executed whenever the scanning routine is entered.
+Other indented or %{} text in the rule section is still copied to the output,
+but its meaning is not well-defined and it may well cause compile-time
+errors (this feature is present for
+.Tn POSIX
+compliance; see below for other such features).
+.Pp
+In the definitions section
+.Pq but not in the rules section ,
+an unindented comment
+(i.e., a line beginning with
+.Qq /* )
+is also copied verbatim to the output up to the next
+.Qq */ .
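+.Pp
+The following input file is a minimal sketch of all three sections,
+assuming a scanner that reports the number of words on each input line.
+The variable name
+.Qq words
+and the rules themselves are arbitrary choices for illustration.
+The unindented comment and the text between
+.Sq %{
+and
+.Sq %}
+are copied to the output.
+The indented declaration before the first rule becomes a local variable of
+.Fn yylex :
+.Bd -literal -offset indent
+/* definitions section: this comment is copied verbatim */
+%{
+#include <stdio.h>
+%}
+%option noyywrap
+
+%%
+	int words = 0;	/* local to yylex() */
+[a-zA-Z]+	++words;
+\en	{ printf("%d\en", words); words = 0; }
+\&.	/* discard everything else */
+
+%%
+int
+main(void)
+{
+	yylex();
+	return 0;
+}
+.Ed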
+.Sh PATTERNS
+The patterns in the input are written using an extended set of regular
+expressions.
+These are:
+.Bl -tag -width "XXXXXXXX"
+.It x
+Match the character
+.Sq x .
+.It .\&
+Any character
+.Pq byte
+except newline.
+.It [xyz]
+A
+.Qq character class ;
+in this case, the pattern matches either an
+.Sq x ,
+a
+.Sq y ,
+or a
+.Sq z .
+.It [abj-oZ]
+A
+.Qq character class
+with a range in it; matches an
+.Sq a ,
+a
+.Sq b ,
+any letter from
+.Sq j
+through
+.Sq o ,
+or a
+.Sq Z .
+.It [^A-Z]
+A
+.Qq negated character class ,
+i.e., any character but those in the class.
+In this case, any character EXCEPT an uppercase letter.
+.It [^A-Z\en]
+Any character EXCEPT an uppercase letter or a newline.
+.It r*
+Zero or more r's, where
+.Sq r
+is any regular expression.
+.It r+
+One or more r's.
+.It r?
+Zero or one r's (that is,
+.Qq an optional r ) .
+.It r{2,5}
+Anywhere from two to five r's.
+.It r{2,}
+Two or more r's.
+.It r{4}
+Exactly 4 r's.
+.It {name}
+The expansion of the
+.Qq name
+definition
+.Pq see above .
+.It \&"[xyz]\e\&"foo\&"
+The literal string: [xyz]"foo.
+.It \eX
+If
+.Sq X
+is an
+.Sq a ,
+.Sq b ,
+.Sq f ,
+.Sq n ,
+.Sq r ,
+.Sq t ,
+or
+.Sq v ,
+then the ANSI-C interpretation of
+.Sq \eX .
+Otherwise, a literal
+.Sq X
+(used to escape operators such as
+.Sq * ) .
+.It \e0
+A NUL character
+.Pq ASCII code 0 .
+.It \e123
+The character with octal value 123.
+.It \ex2a
+The character with hexadecimal value 2a.
+.It (r)
+Match an
+.Sq r ;
+parentheses are used to override precedence
+.Pq see below .
+.It rs
+The regular expression
+.Sq r
+followed by the regular expression
+.Sq s ;
+called
+.Qq concatenation .
+.It r|s
+Either an
+.Sq r
+or an
+.Sq s .
+.It r/s
+An
+.Sq r ,
+but only if it is followed by an
+.Sq s .
+The text matched by
+.Sq s
+is included when determining whether this rule is the
+.Qq longest match ,
+but is then returned to the input before the action is executed.
+So the action only sees the text matched by
+.Sq r .
+This type of pattern is called
+.Qq trailing context .
+(There are some combinations of r/s that
+.Nm
+cannot match correctly; see notes in the
+.Sx BUGS
+section below regarding
+.Qq dangerous trailing context . )
+.It ^r
+An
+.Sq r ,
+but only at the beginning of a line
+(i.e., just starting to scan, or right after a newline has been scanned).
+.It r$
+An
+.Sq r ,
+but only at the end of a line
+.Pq i.e., just before a newline .
+Equivalent to
+.Qq r/\en .
+.Pp
+Note that
+.Nm flex Ns 's
+notion of
+.Qq newline
+is exactly whatever the C compiler used to compile
+.Nm
+interprets
+.Sq \en
+as.
+.\" In particular, on some DOS systems you must either filter out \er's in the
+.\" input yourself, or explicitly use r/\er\en for
+.\" .Qq r$ .
+.It <s>r
+An
+.Sq r ,
+but only in start condition
+.Sq s
+.Pq see below for discussion of start conditions .
+.It <s1,s2,s3>r
+The same, but in any of start conditions s1, s2, or s3.
+.It <*>r
+An
+.Sq r
+in any start condition, even an exclusive one.
+.It <<EOF>>
+An end-of-file.
+.It <s1,s2><<EOF>>
+An end-of-file when in start condition s1 or s2.
+.El
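+.Pp
+As an illustration, the following rules combine beginning-of-line,
+end-of-line, and trailing-context patterns.
+They are arbitrary examples and are not part of any standard scanner.
+On the input line
+.Qq 50% done ,
+the second rule matches
+.Qq 50 ,
+because the trailing
+.Sq %
+is returned to the input before the action runs.
+The third rule matches
+.Qq done ,
+because it is followed by a newline:
+.Bd -literal -offset indent
+%%
+^"#".*		/* discard comment lines */
+[0-9]+/"%"	printf("percentage: %s\en", yytext);
+[a-z]+$	printf("last word: %s\en", yytext);
+[a-z]+		ECHO;
+\&.|\en		ECHO;
+.Ed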
+.Pp
+Note that inside of a character class, all regular expression operators
+lose their special meaning except escape
+.Pq Sq \e
+and the character class operators,
+.Sq - ,
+.Sq ]\& ,
+and, at the beginning of the class,
+.Sq ^ .
+.Pp
+The regular expressions listed above are grouped according to
+precedence, from highest precedence at the top to lowest at the bottom.
+Those grouped together have equal precedence.
+For example,
+.Pp
+.D1 foo|bar*
+.Pp
+is the same as
+.Pp
+.D1 (foo)|(ba(r*))
+.Pp
+since the
+.Sq *
+operator has higher precedence than concatenation,
+and concatenation higher than alternation
+.Pq Sq |\& .
+This pattern therefore matches
+.Em either
+the string
+.Qq foo
+.Em or
+the string
+.Qq ba
+followed by zero-or-more r's.
+To match
+.Qq foo
+or zero-or-more "bar"'s,
+use:
+.Pp
+.D1 foo|(bar)*
+.Pp
+and to match zero-or-more "foo"'s-or-"bar"'s:
+.Pp
+.D1 (foo|bar)*
+.Pp
+In addition to characters and ranges of characters, character classes
+can also contain character class
+.Em expressions .
+These are expressions enclosed inside
+.Sq [:
+and
+.Sq :]
+delimiters (which themselves must appear between the
+.Sq \&[
+and
+.Sq ]\&
+of the
+character class; other elements may occur inside the character class, too).
+The valid expressions are:
+.Bd -unfilled -offset indent
+[:alnum:] [:alpha:] [:blank:]
+[:cntrl:] [:digit:] [:graph:]
+[:lower:] [:print:] [:punct:]
+[:space:] [:upper:] [:xdigit:]
+.Ed
+.Pp
+These expressions all designate a set of characters equivalent to
+the corresponding standard C
+.Fn isXXX
+function.
+For example, [:alnum:] designates those characters for which
+.Xr isalnum 3
+returns true \- i.e., any alphabetic or numeric.
+Some systems don't provide
+.Xr isblank 3 ,
+so
+.Nm
+defines [:blank:] as a blank or a tab.
+.Pp
+For example, the following character classes are all equivalent:
+.Bd -unfilled -offset indent
+[[:alnum:]]
+[[:alpha:][:digit:]]
+[[:alpha:]0-9]
+[a-zA-Z0-9]
+.Ed
+.Pp
+If the scanner is case-insensitive (the
+.Fl i
+flag), then [:upper:] and [:lower:] are equivalent to [:alpha:].
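+.Pp
+As a further illustration, the following rules use character class
+expressions to recognize unsigned integers and C-style identifiers.
+This is a sketch and is not taken from any particular scanner:
+.Bd -literal -offset indent
+%%
+[[:digit:]]+			printf("number: %s\en", yytext);
+[[:alpha:]_][[:alnum:]_]*	printf("name: %s\en", yytext);
+[[:space:]]+			/* skip whitespace */
+\&.				ECHO;
+.Ed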
+.Pp
+Some notes on patterns:
+.Bl -dash
+.It
+A negated character class such as the example
+.Qq [^A-Z]
+above will match a newline unless "\en"
+.Pq or an equivalent escape sequence
+is one of the characters explicitly present in the negated character class
+(e.g.,
+.Qq [^A-Z\en] ) .
+This is unlike how many other regular expression tools treat negated character
+classes, but unfortunately the inconsistency is historically entrenched.
+Matching newlines means that a pattern like
+.Qq [^"]*
+can match the entire input unless there's another quote in the input.
+.It
+A rule can have at most one instance of trailing context
+(the
+.Sq /
+operator or the
+.Sq $
+operator).
+The start condition,
+.Sq ^ ,
+and
+.Qq <<EOF>>
+patterns can occur only at the beginning of a pattern and, like
+.Sq /
+and
+.Sq $ ,
+cannot be grouped inside parentheses.
+A
+.Sq ^
+which does not occur at the beginning of a rule or a
+.Sq $
+which does not occur at the end of a rule loses its special properties
+and is treated as a normal character.
+.It
+The following are illegal:
+.Bd -unfilled -offset indent
+foo/bar$
+<sc1>foo<sc2>bar
+.Ed
+.Pp
+Note that the first of these can be written as
+.Qq foo/bar\en .
+.It
+The following will result in
+.Sq $
+or
+.Sq ^
+being treated as a normal character:
+.Bd -unfilled -offset indent
+foo|(bar$)
+foo|^bar
+.Ed
+.Pp
+If what's wanted is a
+.Qq foo
+or a bar-followed-by-a-newline, the following could be used
+(the special
+.Sq |\&
+action is explained below):
+.Bd -unfilled -offset indent
+foo |
+bar$ /* action goes here */
+.Ed
+.Pp
+A similar trick will work for matching a foo or a
+bar-at-the-beginning-of-a-line.
+.El
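+.Pp
+The beginning-of-line variant of the
+.Sq |\&
+trick might be written as follows, with an arbitrary action:
+.Bd -literal -offset indent
+foo	|
+^bar	printf("matched: %s\en", yytext);
+.Ed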
+.Sh HOW THE INPUT IS MATCHED
+When the generated scanner is run,
+it analyzes its input looking for strings which match any of its patterns.
+If it finds more than one match,
+it takes the one matching the most text
+(for trailing context rules, this includes the length of the trailing part,
+even though it will then be returned to the input).
+If it finds two or more matches of the same length,
+the rule listed first in the
+.Nm
+input file is chosen.
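+.Pp
+For example, consider the following illustrative rules.
+The input
+.Qq if
+is reported as a keyword, because both rules match two characters and the
+keyword rule is listed first.
+The input
+.Qq iffy
+is reported as an identifier, because the second rule matches more text.
+If the two rules were listed in the opposite order, the keyword rule could
+never match:
+.Bd -literal -offset indent
+%%
+if	printf("keyword\en");
+[a-z]+	printf("identifier: %s\en", yytext);
+.Ed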
+.Pp
+Once the match is determined, the text corresponding to the match
+(called the
+.Em token )
+is made available in the global character pointer
+.Fa yytext ,
+and its length in the global integer
+.Fa yyleng .
+The
+.Em action
+corresponding to the matched pattern is then executed
+.Pq a more detailed description of actions follows ,
+and then the remaining input is scanned for another match.
+.Pp
+If no match is found, then the default rule is executed:
+the next character in the input is considered matched and
+copied to the standard output.
+Thus, the simplest legal
+.Nm
+input is:
+.Pp
+.D1 %%
+.Pp
+which generates a scanner that simply copies its input
+.Pq one character at a time
+to its output.
+.Pp
+Note that
+.Fa yytext
+can be defined in two different ways:
+either as a character pointer or as a character array.
+Which definition
+.Nm
+uses can be controlled by including one of the special directives
+.Dq %pointer
+or
+.Dq %array
+in the first
+.Pq definitions
+section of flex input.
+The default is
+.Dq %pointer ,
+unless the
+.Fl l
+.Nm lex
+compatibility option is used, in which case
+.Fa yytext
+will be an array.
+The advantage of using
+.Dq %pointer
+is substantially faster scanning and no buffer overflow when matching
+very large tokens
+.Pq unless not enough dynamic memory is available .
+The disadvantage is that actions are restricted in how they can modify
+.Fa yytext
+.Pq see the next section ,
+and calls to the
+.Fn unput
+function destroy the present contents of
+.Fa yytext ,
+which can be a considerable porting headache when moving between different
+.Nm lex
+versions.
+.Pp
+The advantage of
+.Dq %array
+is that
+.Fa yytext
+can be modified as much as wanted, and calls to
+.Fn unput
+do not destroy
+.Fa yytext
+.Pq see below .
+Furthermore, existing
+.Nm lex
+programs sometimes access
+.Fa yytext
+externally using declarations of the form:
+.Pp
+.D1 extern char yytext[];
+.Pp
+This definition is erroneous when used with
+.Dq %pointer ,
+but correct for
+.Dq %array .
+.Pp
+.Dq %array
+defines
+.Fa yytext
+to be an array of
+.Dv YYLMAX
+characters, which defaults to a fairly large value.
+The size can be changed by simply #define'ing
+.Dv YYLMAX
+to a different value in the first section of
+.Nm
+input.
+As mentioned above, with
+.Dq %pointer
+.Fa yytext
+grows dynamically to accommodate large tokens.
+While this means a
+.Dq %pointer
+scanner can accommodate very large tokens
+.Pq such as matching entire blocks of comments ,
+bear in mind that each time the scanner must resize
+.Fa yytext
+it also must rescan the entire token from the beginning, so matching such
+tokens can prove slow.
+.Fa yytext
+presently does not dynamically grow if a call to
+.Fn unput
+results in too much text being pushed back; instead, a run-time error results.
+.Pp
+Also note that
+.Dq %array
+cannot be used with C++ scanner classes
+.Pq the c++ option; see below .
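+.Pp
+The following sketch relies on
+.Dq %array ,
+because its action lengthens
+.Fa yytext ,
+which is not permitted with
+.Dq %pointer .
+It uses
+.Xr strlcat 3
+so that the result never exceeds the array of
+.Dv YYLMAX
+characters.
+The rule itself is an arbitrary example:
+.Bd -literal -offset indent
+%{
+#include <string.h>
+%}
+%array
+%option noyywrap
+%%
+[a-z]+	{
+		/* Appending to yytext is safe only with %array. */
+		strlcat(yytext, "!", YYLMAX);
+		yyleng = strlen(yytext);	/* ECHO writes yyleng bytes */
+		ECHO;
+	}
+%%
+int
+main(void)
+{
+	yylex();
+	return 0;
+}
+.Ed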
+.Sh ACTIONS
+Each pattern in a rule has a corresponding action,
+which can be any arbitrary C statement.
+The pattern ends at the first non-escaped whitespace character;
+the remainder of the line is its action.
+If the action is empty,
+then when the pattern is matched the input token is simply discarded.
+For example, here is the specification for a program
+which deletes all occurrences of
+.Qq zap me
+from its input:
+.Bd -literal -offset indent
+%%
+"zap me"
+.Ed
+.Pp
+(It will copy all other characters in the input to the output since
+they will be matched by the default rule.)
+.Pp
+Here is a program which compresses multiple blanks and tabs down to
+a single blank, and throws away whitespace found at the end of a line:
+.Bd -literal -offset indent
+%%
+[ \et]+ putchar(' ');
+[ \et]+$ /* ignore this token */
+.Ed
+.Pp
+If the action contains a
+.Sq { ,
+then the action spans until the balancing
+.Sq }
+is found, and the action may cross multiple lines.
+.Nm
+knows about C strings and comments and won't be fooled by braces found
+within them, but also allows actions to begin with
+.Sq %{
+and will consider the action to be all the text up to the next
+.Sq %}
+.Pq regardless of ordinary braces inside the action .
+.Pp
+An action consisting solely of a vertical bar
+.Pq Sq |\&
+means
+.Qq same as the action for the next rule .
+See below for an illustration.
+.Pp
+Actions can include arbitrary C code,
+including return statements to return a value to whatever routine called
+.Fn yylex .
+Each time
+.Fn yylex
+is called, it continues processing tokens from where it last left off
+until it either reaches the end of the file or executes a return.
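+.Pp
+A minimal sketch of this follows.
+The token codes
+.Dv NUMBER
+and
+.Dv WORD
+are hypothetical values chosen for the example.
+Each call to
+.Fn yylex
+returns one token and resumes where the previous call stopped.
+The loop in
+.Fn main
+ends when
+.Fn yylex
+returns 0 at end-of-file:
+.Bd -literal -offset indent
+%{
+#define NUMBER	258	/* hypothetical token codes */
+#define WORD	259
+%}
+%option noyywrap
+%%
+[0-9]+		return NUMBER;
+[a-zA-Z]+	return WORD;
+[ \et\en]+	/* skip whitespace */
+%%
+int
+main(void)
+{
+	int tok;
+
+	while ((tok = yylex()) != 0)
+		printf("%d: %s\en", tok, yytext);
+	return 0;
+}
+.Ed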
+.Pp
+Actions are free to modify
+.Fa yytext
+except for lengthening it
+(adding characters to its end \- these will overwrite later characters in the
+input stream).
+This, however, does not apply when using
+.Dq %array
+.Pq see above ;
+in that case,
+.Fa yytext
+may be freely modified in any way.
+.Pp
+Actions are free to modify
+.Fa yyleng
+except they should not do so if the action also includes use of
+.Fn yymore
+.Pq see below .
+.Pp
+There are a number of special directives which can be included within
+an action:
+.Bl -tag -width Ds
+.It ECHO
+Copies
+.Fa yytext
+to the scanner's output.
+.It BEGIN
+Followed by the name of a start condition, places the scanner in the
+corresponding start condition
+.Pq see below .
+.It REJECT
+Directs the scanner to proceed on to the
+.Qq second best
+rule which matched the input
+.Pq or a prefix of the input .
+The rule is chosen as described above in
+.Sx HOW THE INPUT IS MATCHED ,
+and
+.Fa yytext
+and
+.Fa yyleng
+set up appropriately.
+It may either be one which matched as much text
+as the originally chosen rule but came later in the
+.Nm
+input file, or one which matched less text.
+For example, the following will both count the
+words in the input and call the routine
+.Fn special
+whenever
+.Qq frob
+is seen:
+.Bd -literal -offset indent
+int word_count = 0;
+%%
+
+frob special(); REJECT;
+[^ \et\en]+ ++word_count;
+.Ed
+.Pp
+Without the
+.Em REJECT ,
+any "frob"'s in the input would not be counted as words,
+since the scanner normally executes only one action per token.
+Multiple
+.Em REJECT Ns 's
+are allowed,
+each one finding the next best choice to the currently active rule.
+For example, when the following scanner scans the token
+.Qq abcd ,
+it will write
+.Qq abcdabcaba
+to the output:
+.Bd -literal -offset indent
+%%
+a |
+ab |
+abc |
+abcd ECHO; REJECT;
+\&.|\en /* eat up any unmatched character */
+.Ed
+.Pp
+(The first three rules share the fourth's action since they use
+the special
+.Sq |\&
+action.)
+.Em REJECT
+is a particularly expensive feature in terms of scanner performance;
+if it is used in any of the scanner's actions it will slow down
+all of the scanner's matching.
+Furthermore,
+.Em REJECT
+cannot be used with the
+.Fl Cf
+or
+.Fl CF
+options
+.Pq see below .
+.Pp
+Note also that unlike the other special actions,
+.Em REJECT
+is a
+.Em branch ;
+code immediately following it in the action will not be executed.
+.It yymore()
+Tells the scanner that the next time it matches a rule, the corresponding
+token should be appended onto the current value of
+.Fa yytext
+rather than replacing it.
+For example, given the input
+.Qq mega-kludge
+the following will write
+.Qq mega-mega-kludge
+to the output:
+.Bd -literal -offset indent
+%%
+mega- ECHO; yymore();
+kludge ECHO;
+.Ed
+.Pp
+First
+.Qq mega-
+is matched and echoed to the output.
+Then
+.Qq kludge
+is matched, but the previous
+.Qq mega-
+is still hanging around at the beginning of
+.Fa yytext
+so the
+.Em ECHO
+for the
+.Qq kludge
+rule will actually write
+.Qq mega-kludge .
+.Pp
+Two notes regarding use of
+.Fn yymore :
+First,
+.Fn yymore
+depends on the value of
+.Fa yyleng
+correctly reflecting the size of the current token, so
+.Fa yyleng
+must not be modified when using
+.Fn yymore .
+Second, the presence of
+.Fn yymore
+in the scanner's action entails a minor performance penalty in the
+scanner's matching speed.
+.It yyless(n)
+Returns all but the first
+.Ar n
+characters of the current token back to the input stream, where they
+will be rescanned when the scanner looks for the next match.
+.Fa yytext
+and
+.Fa yyleng
+are adjusted appropriately (e.g.,
+.Fa yyleng
+will now be equal to
+.Ar n ) .
+For example, on the input
+.Qq foobar
+the following will write out
+.Qq foobarbar :
+.Bd -literal -offset indent
+%%
+foobar ECHO; yyless(3);
+[a-z]+ ECHO;
+.Ed
+.Pp
+An argument of 0 to
+.Fn yyless
+will cause the entire current input string to be scanned again.
+Unless the way the scanner subsequently processes its input has been changed
+(using
+.Em BEGIN ,
+for example),
+this will result in an endless loop.
+.Pp
+Note that
+.Fn yyless
+is a macro and can only be used in the
+.Nm
+input file, not from other source files.
+.It unput(c)
+Puts the character
+.Ar c
+back into the input stream.
+It will be the next character scanned.
+The following action will take the current token and cause it
+to be rescanned enclosed in parentheses.
+.Bd -literal -offset indent
+{
+ int i;
+ char *yycopy;
+
+ /* Copy yytext because unput() trashes yytext */
+ if ((yycopy = strdup(yytext)) == NULL)
+ err(1, NULL);
+ unput(')');
+ for (i = yyleng - 1; i >= 0; --i)
+ unput(yycopy[i]);
+ unput('(');
+ free(yycopy);
+}
+.Ed
+.Pp
+Note that since each
+.Fn unput
+puts the given character back at the beginning of the input stream,
+pushing back strings must be done back-to-front.
+.Pp
+An important potential problem when using
+.Fn unput
+is that if using
+.Dq %pointer
+.Pq the default ,
+a call to
+.Fn unput
+destroys the contents of
+.Fa yytext ,
+starting with its rightmost character and devouring one character to
+the left with each call.
+If the value of
+.Fa yytext
+should be preserved after a call to
+.Fn unput
+.Pq as in the above example ,
+it must either first be copied elsewhere, or the scanner must be built using
+.Dq %array
+instead (see
+.Sx HOW THE INPUT IS MATCHED ) .
+.Pp
+Finally, note that EOF cannot be put back
+to attempt to mark the input stream with an end-of-file.
+.It input()
+Reads the next character from the input stream.
+For example, the following is one way to eat up C comments:
+.Bd -literal -offset indent
+%%
+"/*" {
+ int c;
+
+ for (;;) {
+ while ((c = input()) != '*' && c != EOF)
+ ; /* eat up text of comment */
+
+ if (c == '*') {
+ while ((c = input()) == '*')
+ ;
+ if (c == '/')
+ break; /* found the end */
+ }
+
+ if (c == EOF) {
+ errx(1, "EOF in comment");
+ break;
+ }
+ }
+}
+.Ed
+.Pp
+(Note that if the scanner is compiled using C++, then
+.Fn input
+is instead referred to as
+.Fn yyinput ,
+in order to avoid a name clash with the C++ stream by the name of input.)
+.It YY_FLUSH_BUFFER
+Flushes the scanner's internal buffer
+so that the next time the scanner attempts to match a token,
+it will first refill the buffer using
+.Dv YY_INPUT
+(see
+.Sx THE GENERATED SCANNER ,
+below).
+This action is a special case of the more general
+.Fn yy_flush_buffer
+function, described below in the section
+.Sx MULTIPLE INPUT BUFFERS .
+.It yyterminate()
+Can be used in lieu of a return statement in an action.
+It terminates the scanner and returns a 0 to the scanner's caller, indicating
+.Qq all done .
+By default,
+.Fn yyterminate
+is also called when an end-of-file is encountered.
+It is a macro and may be redefined.
+.El
+.Sh THE GENERATED SCANNER
+The output of
+.Nm
+is the file
+.Pa lex.yy.c ,
+which contains the scanning routine
+.Fn yylex ,
+a number of tables used by it for matching tokens,
+and a number of auxiliary routines and macros.
+By default,
+.Fn yylex
+is declared as follows:
+.Bd -unfilled -offset indent
+int yylex()
+{
+ ... various definitions and the actions in here ...
+}
+.Ed
+.Pp
+(If the environment supports function prototypes, then it will
+be "int yylex(void)".)
+This definition may be changed by defining the
+.Dv YY_DECL
+macro.
+For example:
+.Bd -literal -offset indent
+#define YY_DECL float lexscan(a, b) float a, b;
+.Ed
+.Pp
+would give the scanning routine the name
+.Em lexscan ,
+returning a float, and taking two floats as arguments.
+Note that if arguments are given to the scanning routine using a
+K&R-style/non-prototyped function declaration,
+the definition must be terminated with a semi-colon
+.Pq Sq ;\& .
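+.Pp
+With a prototype-style declaration, no trailing semicolon should be added.
+In the following sketch, the name
+.Fn lexscan
+and its
+.Fa lines
+argument are hypothetical.
+The argument is visible to the actions, which use it to count newlines:
+.Bd -literal -offset indent
+%{
+#define YY_DECL int lexscan(int *lines)	/* hypothetical */
+%}
+%option noyywrap
+%%
+\en	++*lines;
+\&.	/* ignore other characters */
+%%
+int
+main(void)
+{
+	int lines = 0;
+
+	lexscan(&lines);
+	printf("%d lines\en", lines);
+	return 0;
+}
+.Ed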
+.Pp
+Whenever
+.Fn yylex
+is called, it scans tokens from the global input file
+.Em yyin
+.Pq which defaults to stdin .
+It continues until it either reaches an end-of-file
+.Pq at which point it returns the value 0
+or one of its actions executes a
+.Em return
+statement.
+.Pp
+If the scanner reaches an end-of-file, the behavior of subsequent calls is undefined
+unless either
+.Em yyin
+is pointed at a new input file
+.Pq in which case scanning continues from that file ,
+or
+.Fn yyrestart
+is called.
+.Fn yyrestart
+takes one argument, a
+.Fa FILE *
+pointer (which can be
+.Dv NULL
+if
+.Dv YY_INPUT
+has been set up to scan from a source other than
+.Em yyin ) ,
+and initializes
+.Em yyin
+for scanning from that file.
+Essentially there is no difference between just assigning
+.Em yyin
+to a new input file or using
+.Fn yyrestart
+to do so; the latter is available for compatibility with previous versions of
+.Nm ,
+and because it can be used to switch input files in the middle of scanning.
+It can also be used to throw away the current input buffer,
+by calling it with an argument of
+.Em yyin ;
+but it is better to use
+.Dv YY_FLUSH_BUFFER
+.Pq see above .
+Note that
+.Fn yyrestart
+does not reset the start condition to
+.Em INITIAL
+(see
+.Sx START CONDITIONS ,
+below).
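+.Pp
+For example, the following user code sketch scans each file named on the
+command line in turn.
+It assumes three things:
+the rules appear earlier in the same input file,
+the scanner is built with
+.Dq %option noyywrap ,
+and
+.In err.h
+is included in the definitions section:
+.Bd -literal -offset indent
+int
+main(int argc, char *argv[])
+{
+	FILE *fp;
+	int i;
+
+	for (i = 1; i < argc; i++) {
+		if ((fp = fopen(argv[i], "r")) == NULL)
+			err(1, "%s", argv[i]);
+		yyrestart(fp);		/* switch the scanner to this file */
+		while (yylex() != 0)	/* run until end-of-file */
+			;
+		fclose(fp);
+	}
+	return 0;
+}
+.Ed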
+.Pp
+If
+.Fn yylex
+stops scanning due to executing a
+.Em return
+statement in one of the actions, the scanner may then be called again and it
+will resume scanning where it left off.
+.Pp
+By default
+.Pq and for purposes of efficiency ,
+the scanner uses block-reads rather than simple
+.Xr getc 3
+calls to read characters from
+.Em yyin .
+The nature of how it gets its input can be controlled by defining the
+.Dv YY_INPUT
+macro.
+.Dv YY_INPUT Ns 's
+calling sequence is
+.Qq YY_INPUT(buf,result,max_size) .
+Its action is to place up to
+.Dv max_size
+characters in the character array
+.Em buf
+and return in the integer variable
+.Em result
+either the number of characters read or the constant
+.Dv YY_NULL
+(0 on
+.Ux
+systems)
+to indicate
+.Dv EOF .
+The default
+.Dv YY_INPUT
+reads from the global file-pointer
+.Em yyin .
+.Pp
+A sample definition of
+.Dv YY_INPUT
+.Pq in the definitions section of the input file :
+.Bd -literal -offset indent
+%{
+#define YY_INPUT(buf,result,max_size) \e
+{ \e
+ int c = getchar(); \e
+ result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \e
+}
+%}
+.Ed
+.Pp
+This definition will change the input processing to occur
+one character at a time.
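The sample above still reads from stdin. As a further sketch, assuming the text to scan is already held in memory, YY_INPUT can hand the scanner one block of a string per call. The names my_input, my_input_str and my_input_pos are hypothetical and are not part of flex. For most uses, flex's own yy_scan_string (see MULTIPLE INPUT BUFFERS) is the simpler choice.

```c
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0	/* flex's end-of-input value; repeated so this compiles alone */
#endif

/*
 * Hypothetical in-memory input source.  my_input_str and my_input_pos
 * are illustration-only names, not part of the flex API.
 */
static const char *my_input_str = "";
static size_t my_input_pos = 0;

/* Copy up to max_size unread bytes of my_input_str into buf. */
static int
my_input(char *buf, int max_size)
{
	size_t left = strlen(my_input_str + my_input_pos);
	size_t n = left < (size_t)max_size ? left : (size_t)max_size;

	if (n == 0)
		return YY_NULL;		/* tells the scanner: end of input */
	memcpy(buf, my_input_str + my_input_pos, n);
	my_input_pos += n;
	return (int)n;
}

/* In a real scanner, this #define goes in a %{ ... %} block. */
#define YY_INPUT(buf,result,max_size) \
	((result) = my_input((buf), (int)(max_size)))
```

When the string is exhausted, `my_input` returns YY_NULL. The scanner then consults `yywrap()`, just as it does at the end of a file.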
+.Pp
+When the scanner receives an end-of-file indication from
+.Dv YY_INPUT ,
+it then checks the
+.Fn yywrap
+function.
+If
+.Fn yywrap
+returns false
+.Pq zero ,
+then it is assumed that the function has set up
+.Em yyin
+to point to another input file, and scanning continues.
+If it returns true
+.Pq non-zero ,
+then the scanner terminates, returning 0 to its caller.
+Note that in either case, the start condition remains unchanged;
+it does not revert to
+.Em INITIAL .
+.Pp
+If you do not supply your own version of
+.Fn yywrap ,
+then you must either use
+.Dq %option noyywrap
+(in which case the scanner behaves as though
+.Fn yywrap
+returned 1), or you must link with
+.Fl lfl
+to obtain the default version of the routine, which always returns 1.
+.Pp
+Three routines are available for scanning from in-memory buffers rather
+than files:
+.Fn yy_scan_string ,
+.Fn yy_scan_bytes ,
+and
+.Fn yy_scan_buffer .
+See the discussion of them below in the section
+.Sx MULTIPLE INPUT BUFFERS .
+.Pp
+The scanner writes its
+.Em ECHO
+output to the
+.Em yyout
+global
+.Pq default, stdout ,
+which may be redefined by the user simply by assigning it to some other
+.Va FILE
+pointer.
+.Sh START CONDITIONS
+.Nm
+provides a mechanism for conditionally activating rules.
+Any rule whose pattern is prefixed with
+.Qq <sc>
+will only be active when the scanner is in the start condition named
+.Qq sc .
+For example,
+.Bd -literal -offset indent
+<STRING>[^"]* { /* eat up the string body ... */
+ ...
+}
+.Ed
+.Pp
+will be active only when the scanner is in the
+.Qq STRING
+start condition, and
+.Bd -literal -offset indent
+<INITIAL,STRING,QUOTE>\e. { /* handle an escape ... */
+ ...
+}
+.Ed
+.Pp
+will be active only when the current start condition is either
+.Qq INITIAL ,
+.Qq STRING ,
+or
+.Qq QUOTE .
+.Pp
+Start conditions are declared in the definitions
+.Pq first
+section of the input using unindented lines beginning with either
+.Sq %s
+or
+.Sq %x
+followed by a list of names.
+The former declares
+.Em inclusive
+start conditions, the latter
+.Em exclusive
+start conditions.
+A start condition is activated using the
+.Em BEGIN
+action.
+Until the next
+.Em BEGIN
+action is executed, rules with the given start condition will be active and
+rules with other start conditions will be inactive.
+If the start condition is inclusive,
+then rules with no start conditions at all will also be active.
+If it is exclusive,
+then only rules qualified with the start condition will be active.
+A set of rules contingent on the same exclusive start condition
+describes a scanner which is independent of any of the other rules in the
+.Nm
+input.
+Because of this, exclusive start conditions make it easy to specify
+.Qq mini-scanners
+which scan portions of the input that are syntactically different
+from the rest
+.Pq e.g., comments .
+.Pp
+If the distinction between inclusive and exclusive start conditions
+is still a little vague, here's a simple example illustrating the
+connection between the two.
+The set of rules:
+.Bd -literal -offset indent
+%s example
+%%
+
+<example>foo do_something();
+
+bar something_else();
+.Ed
+.Pp
+is equivalent to
+.Bd -literal -offset indent
+%x example
+%%
+
+<example>foo do_something();
+
+<INITIAL,example>bar something_else();
+.Ed
+.Pp
+Without the <INITIAL,example> qualifier, the
+.Dq bar
+pattern in the second example wouldn't be active
+.Pq i.e., couldn't match
+when in start condition
+.Dq example .
+If we just used <example> to qualify
+.Dq bar ,
+though, then it would only be active in
+.Dq example
+and not in
+.Em INITIAL ,
+while in the first example it's active in both,
+because in the first example the
+.Dq example
+start condition is an inclusive
+.Pq Sq %s
+start condition.
+.Pp
+Also note that the special start-condition specifier
+.Sq <*>
+matches every start condition.
+Thus, the above example could also have been written:
+.Bd -literal -offset indent
+%x example
+%%
+
+<example>foo do_something();
+
+<*>bar something_else();
+.Ed
+.Pp
+The default rule (to
+.Em ECHO
+any unmatched character) remains active in start conditions.
+It is equivalent to:
+.Bd -literal -offset indent
+<*>.|\en ECHO;
+.Ed
+.Pp
+.Dq BEGIN(0)
+returns to the original state where only the rules with
+no start conditions are active.
+This state can also be referred to as the start-condition
+.Em INITIAL ,
+so
+.Dq BEGIN(INITIAL)
+is equivalent to
+.Dq BEGIN(0) .
+(The parentheses around the start condition name are not required but
+are considered good style.)
+.Pp
+.Em BEGIN
+actions can also be given as indented code at the beginning
+of the rules section.
+For example, the following will cause the scanner to enter the
+.Qq SPECIAL
+start condition whenever
+.Fn yylex
+is called and the global variable
+.Fa enter_special
+is true:
+.Bd -literal -offset indent
+%{
+int enter_special;
+%}
+
+%x SPECIAL
+%%
+ if (enter_special)
+ BEGIN(SPECIAL);
+
+<SPECIAL>blahblahblah
+\&...more rules follow...
+.Ed
+.Pp
+To illustrate the uses of start conditions,
+here is a scanner which provides two different interpretations
+of a string like
+.Qq 123.456 .
+By default it will treat it as three tokens: the integer
+.Qq 123 ,
+a dot
+.Pq Sq .\& ,
+and the integer
+.Qq 456 .
+But if the string is preceded earlier in the line by the string
+.Qq expect-floats
+it will treat it as a single token, the floating-point number 123.456:
+.Bd -literal -offset indent
+%{
+#include <math.h>
+%}
+%s expect
+
+%%
+expect-floats BEGIN(expect);
+
+<expect>[0-9]+"."[0-9]+ {
+ printf("found a float, = %s\en", yytext);
+}
+<expect>\en {
+ /*
+ * That's the end of the line, so
+ * we need another "expect-floats"
+ * before we'll recognize any more
+ * numbers.
+ */
+ BEGIN(INITIAL);
+}
+
+[0-9]+ {
+ printf("found an integer, = %s\en", yytext);
+}
+
+"." printf("found a dot\en");
+.Ed
+.Pp
+Here is a scanner which recognizes
+.Pq and discards
+C comments while maintaining a count of the current input line:
+.Bd -literal -offset indent
+%x comment
+%%
+ int line_num = 1;
+
+"/*" BEGIN(comment);
+
+<comment>[^*\en]* /* eat anything that's not a '*' */
+<comment>"*"+[^*/\en]* /* eat up '*'s not followed by '/'s */
+<comment>\en ++line_num;
+<comment>"*"+"/" BEGIN(INITIAL);
+.Ed
+.Pp
+This scanner goes to a bit of trouble to match as much
+text as possible with each rule.
+In general, when writing a high-speed scanner,
+try to match as much text as possible in each rule; it's a big win.
+.Pp
+Note that start-condition names are really integer values and
+can be stored as such.
+Thus, the above could be extended in the following fashion:
+.Bd -literal -offset indent
+%x comment foo
+%%
+ int line_num = 1;
+ int comment_caller;
+
+"/*" {
+ comment_caller = INITIAL;
+ BEGIN(comment);
+}
+
+\&...
+
+<foo>"/*" {
+ comment_caller = foo;
+ BEGIN(comment);
+}
+
+<comment>[^*\en]* /* eat anything that's not a '*' */
+<comment>"*"+[^*/\en]* /* eat up '*'s not followed by '/'s */
+<comment>\en ++line_num;
+<comment>"*"+"/" BEGIN(comment_caller);
+.Ed
+.Pp
+Furthermore, the current start condition can be accessed by using
+the integer-valued
+.Dv YY_START
+macro.
+For example, the above assignments to
+.Em comment_caller
+could instead be written
+.Pp
+.Dl comment_caller = YY_START;
+.Pp
+.Nm
+provides
+.Dv YYSTATE
+as an alias for
+.Dv YY_START
+(since that is what's used by
+.At
+.Nm lex ) .
+.Pp
+Note that start conditions do not have their own name-space;
+%s's and %x's declare names in the same fashion as #define's.
+.Pp
+Finally, here's an example of how to match C-style quoted strings using
+exclusive start conditions, including expanded escape sequences
+(but not including checking for a string that's too long):
+.Bd -literal -offset indent
+%x str
+
+%%
+ #define MAX_STR_CONST 1024
+ char string_buf[MAX_STR_CONST];
+ char *string_buf_ptr;
+
+\e" string_buf_ptr = string_buf; BEGIN(str);
+
+<str>\e" { /* saw closing quote - all done */
+ BEGIN(INITIAL);
+ *string_buf_ptr = '\e0';
+ /*
+ * return string constant token type and
+ * value to parser
+ */
+}
+
+<str>\en {
+ /* error - unterminated string constant */
+ /* generate error message */
+}
+
+<str>\e\e[0-7]{1,3} {
+ /* octal escape sequence */
+ unsigned int result;
+
+ (void) sscanf(yytext + 1, "%o", &result);
+
+ if (result > 0xff) {
+ /* error, constant is out-of-bounds */
+ } else
+ *string_buf_ptr++ = result;
+}
+
+<str>\e\e[0-9]+ {
+ /*
+ * generate error - bad escape sequence; something
+ * like '\e48' or '\e0777777'
+ */
+}
+
+<str>\e\en *string_buf_ptr++ = '\en';
+<str>\e\et *string_buf_ptr++ = '\et';
+<str>\e\er *string_buf_ptr++ = '\er';
+<str>\e\eb *string_buf_ptr++ = '\eb';
+<str>\e\ef *string_buf_ptr++ = '\ef';
+
+<str>\e\e(.|\en) *string_buf_ptr++ = yytext[1];
+
+<str>[^\e\e\en\e"]+ {
+ char *yptr = yytext;
+
+ while (*yptr)
+ *string_buf_ptr++ = *yptr++;
+}
+.Ed
+.Pp
+Often, as in some of the examples above,
+many rules are preceded by the same start condition(s).
+.Nm
+makes this a little easier and cleaner by introducing a notion of
+start condition
+.Em scope .
+A start condition scope is begun with:
+.Pp
+.Dl <SCs>{
+.Pp
+where
+.Dq SCs
+is a list of one or more start conditions.
+Inside the start condition scope, every rule automatically has the prefix <SCs>
+applied to it, until a
+.Sq }
+which matches the initial
+.Sq { .
+So, for example,
+.Bd -literal -offset indent
+<ESC>{
+ "\e\en" return '\en';
+ "\e\er" return '\er';
+ "\e\ef" return '\ef';
+ "\e\e0" return '\e0';
+}
+.Ed
+.Pp
+is equivalent to:
+.Bd -literal -offset indent
+<ESC>"\e\en" return '\en';
+<ESC>"\e\er" return '\er';
+<ESC>"\e\ef" return '\ef';
+<ESC>"\e\e0" return '\e0';
+.Ed
+.Pp
+Start condition scopes may be nested.
+.Pp
+Three routines are available for manipulating stacks of start conditions:
+.Bl -tag -width Ds
+.It void yy_push_state(int new_state)
+Pushes the current start condition onto the top of the start condition
+stack and switches to
+.Fa new_state
+as though
+.Dq BEGIN new_state
+had been used
+.Pq recall that start condition names are also integers .
+.It void yy_pop_state()
+Pops the top of the stack and switches to it via
+.Em BEGIN .
+.It int yy_top_state()
+Returns the top of the stack without altering the stack's contents.
+.El
+.Pp
+The start condition stack grows dynamically and so has no built-in
+size limitation.
+If memory is exhausted, program execution aborts.
+.Pp
+To use start condition stacks, scanners must include a
+.Dq %option stack
+directive (see
+.Sx OPTIONS
+below).
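The following is a model of the semantics just described, written in plain C so the push, pop, and peek behaviour can be seen in isolation. It is not flex's implementation. The names cur_state, model_push_state, model_pop_state and model_top_state are hypothetical. cur_state stands in for the value YY_START would return, and 0 plays the role of INITIAL.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Model of the start condition stack, for illustration only.
 * None of these names are part of the flex API.
 */
static int cur_state = 0;		/* 0 stands in for INITIAL */
static int *sc_stack = NULL;
static size_t sc_depth = 0, sc_cap = 0;

/* Model of yy_push_state(): save the current condition, then switch. */
static void
model_push_state(int new_state)
{
	if (sc_depth == sc_cap) {
		/* Grow dynamically, so there is no fixed depth limit. */
		size_t ncap = sc_cap ? sc_cap * 2 : 16;
		int *p = realloc(sc_stack, ncap * sizeof(*p));

		if (p == NULL) {
			/* Out of memory: give up, as the text describes. */
			fputs("start condition stack: out of memory\n", stderr);
			abort();
		}
		sc_stack = p;
		sc_cap = ncap;
	}
	sc_stack[sc_depth++] = cur_state;
	cur_state = new_state;
}

/* Model of yy_pop_state(): switch back to the saved condition. */
static void
model_pop_state(void)
{
	if (sc_depth == 0) {
		fputs("start condition stack underflow\n", stderr);
		abort();
	}
	cur_state = sc_stack[--sc_depth];
}

/* Model of yy_top_state(): peek at the saved condition. */
static int
model_top_state(void)
{
	if (sc_depth == 0) {
		fputs("start condition stack is empty\n", stderr);
		abort();
	}
	return sc_stack[sc_depth - 1];
}
```

In a real scanner built with `%option stack`, the corresponding calls would be, for example, `yy_push_state(comment);` in the rule that opens a comment and `yy_pop_state();` in the rule that closes it. Used that way, nested constructs return to whichever start condition was active before them.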
+.Sh MULTIPLE INPUT BUFFERS
+Some scanners
+(such as those which support
+.Qq include
+files)
+require reading from several input streams.
+As
+.Nm
+scanners do a large amount of buffering, one cannot control
+where the next input will be read from by simply writing a
+.Dv YY_INPUT
+which is sensitive to the scanning context.
+.Dv YY_INPUT
+is only called when the scanner reaches the end of its buffer, which
+may be a long time after scanning a statement such as an
+.Qq include
+which requires switching the input source.
+.Pp
+To negotiate these sorts of problems,
+.Nm
+provides a mechanism for creating and switching between multiple
+input buffers.
+An input buffer is created by using:
+.Pp
+.D1 YY_BUFFER_STATE yy_create_buffer(FILE *file, int size)
+.Pp
+which takes a
+.Fa FILE
+pointer and a
+.Fa size
+and creates a buffer associated with the given file and large enough to hold
+.Fa size
+characters (when in doubt, use
+.Dv YY_BUF_SIZE
+for the size).
+It returns a
+.Dv YY_BUFFER_STATE
+handle, which may then be passed to other routines
+.Pq see below .
+The
+.Dv YY_BUFFER_STATE
+type is a pointer to an opaque
+.Dq struct yy_buffer_state
+structure, so
+.Dv YY_BUFFER_STATE
+variables may be safely initialized to
+.Dq ((YY_BUFFER_STATE) 0)
+if desired, and the opaque structure can also be named in order to
+declare input buffers correctly in source files other than the scanner's.
+Note that the
+.Fa FILE
+pointer in the call to
+.Fn yy_create_buffer
+is only used as the value of
+.Fa yyin
+seen by
+.Dv YY_INPUT ;
+if
+.Dv YY_INPUT
+is redefined so that it no longer uses
+.Fa yyin ,
+then a nil
+.Fa FILE
+pointer can safely be passed to
+.Fn yy_create_buffer .
+To select a particular buffer to scan:
+.Pp
+.D1 void yy_switch_to_buffer(YY_BUFFER_STATE new_buffer)
+.Pp
+It switches the scanner's input buffer so subsequent tokens will
+come from
+.Fa new_buffer .
+Note that
+.Fn yy_switch_to_buffer
+may be used by
+.Fn yywrap
+to set things up for continued scanning,
+instead of opening a new file and pointing
+.Fa yyin
+at it.
+Note also that switching input sources via either
+.Fn yy_switch_to_buffer
+or
+.Fn yywrap
+does not change the start condition.
+.Pp
+.D1 void yy_delete_buffer(YY_BUFFER_STATE buffer)
+.Pp
+is used to reclaim the storage associated with a buffer.
+.Pf ( Fa buffer
+can be nil, in which case the routine does nothing.)
+To clear the current contents of a buffer:
+.Pp
+.D1 void yy_flush_buffer(YY_BUFFER_STATE buffer)
+.Pp
+This function discards the buffer's contents,
+so the next time the scanner attempts to match a token from the buffer,
+it will first fill the buffer anew using
+.Dv YY_INPUT .
+.Pp
+.Fn yy_new_buffer
+is an alias for
+.Fn yy_create_buffer ,
+provided for compatibility with the C++ use of
+.Em new
+and
+.Em delete
+for creating and destroying dynamic objects.
+.Pp
+Finally, the
+.Dv YY_CURRENT_BUFFER
+macro returns a
+.Dv YY_BUFFER_STATE
+handle to the current buffer.
+.Pp
+Here is an example of using these features for writing a scanner
+which expands include files (the <<EOF>> feature is discussed below):
+.Bd -literal -offset indent
+/*
+ * the "incl" state is used for picking up the name
+ * of an include file
+ */
+%x incl
+
+%{
+#include <err.h>
+#define MAX_INCLUDE_DEPTH 10
+YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
+int include_stack_ptr = 0;
+%}
+
+%%
+include BEGIN(incl);
+
+[a-z]+ ECHO;
+[^a-z\en]*\en? ECHO;
+
+<incl>[ \et]* /* eat the whitespace */
+<incl>[^ \et\en]+ { /* got the include file name */
+ if (include_stack_ptr >= MAX_INCLUDE_DEPTH)
+ errx(1, "Includes nested too deeply");
+
+ include_stack[include_stack_ptr++] =
+ YY_CURRENT_BUFFER;
+
+ yyin = fopen(yytext, "r");
+
+ if (yyin == NULL)
+ err(1, NULL);
+
+ yy_switch_to_buffer(
+ yy_create_buffer(yyin, YY_BUF_SIZE));
+
+ BEGIN(INITIAL);
+}
+
+<<EOF>> {
+ if (--include_stack_ptr < 0)
+ yyterminate();
+ else {
+ yy_delete_buffer(YY_CURRENT_BUFFER);
+ yy_switch_to_buffer(
+ include_stack[include_stack_ptr]);
+ }
+}
+.Ed
+.Pp
+Three routines are available for setting up input buffers for
+scanning in-memory strings instead of files.
+All of them create a new input buffer for scanning the string,
+and return a corresponding
+.Dv YY_BUFFER_STATE
+handle (which should be deleted afterwards using
+.Fn yy_delete_buffer ) .
+They also switch to the new buffer using
+.Fn yy_switch_to_buffer ,
+so the next call to
+.Fn yylex
+will start scanning the string.
+.Bl -tag -width Ds
+.It YY_BUFFER_STATE yy_scan_string(const char *str)
+Scans a NUL-terminated string.
+.It YY_BUFFER_STATE yy_scan_bytes(const char *bytes, int len)
+Scans
+.Fa len
+bytes
+.Pq possibly including NULs
+starting at location
+.Fa bytes .
+.El
+.Pp
+Note that both of these functions create and scan a copy
+of the string or bytes.
+(This may be desirable, since
+.Fn yylex
+modifies the contents of the buffer it is scanning.)
+The copy can be avoided by using:
+.Bl -tag -width Ds
+.It YY_BUFFER_STATE yy_scan_buffer(char *base, yy_size_t size)
+Scans the buffer starting at
+.Fa base ,
+consisting of
+.Fa size
+bytes, the last two bytes of which must be
+.Dv YY_END_OF_BUFFER_CHAR
+.Pq ASCII NUL .
+These last two bytes are not scanned; thus, scanning consists of
+base[0] through base[size-3], inclusive.
+.Pp
+If
+.Fa base
+is not set up in this manner
+(i.e., the final two
+.Dv YY_END_OF_BUFFER_CHAR
+bytes are missing), then
+.Fn yy_scan_buffer
+returns a nil pointer instead of creating a new input buffer.
+.Pp
+The type
+.Fa yy_size_t
+is an integral type to which an integer expression
+reflecting the size of the buffer can be cast.
+.El
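Because yy_scan_buffer() scans the caller's memory in place, the caller usually prepares the buffer itself. The following is a minimal sketch, assuming the whole input is first read from a FILE into memory. read_scan_buffer is a hypothetical helper, not part of flex. It reads fp directly into a buffer that ends with the two required YY_END_OF_BUFFER_CHAR bytes and reports the size to pass to yy_scan_buffer().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef YY_END_OF_BUFFER_CHAR
#define YY_END_OF_BUFFER_CHAR 0	/* ASCII NUL, as in flex; repeated for a standalone build */
#endif

/*
 * Hypothetical helper: read all of fp into a malloc'd buffer followed by
 * two YY_END_OF_BUFFER_CHAR bytes.  On success, *sizep is the total size
 * (data plus the two terminators).  Returns NULL on error.
 */
static char *
read_scan_buffer(FILE *fp, size_t *sizep)
{
	size_t cap = 4096, len = 0;
	char *base = malloc(cap);

	if (base == NULL)
		return NULL;
	for (;;) {
		size_t n;

		/* Always keep at least two spare bytes for the terminators. */
		if (cap - len <= 2) {
			char *p;

			if (cap > SIZE_MAX / 2 ||
			    (p = realloc(base, cap * 2)) == NULL) {
				free(base);
				return NULL;
			}
			base = p;
			cap *= 2;
		}
		n = fread(base + len, 1, cap - len - 2, fp);
		len += n;
		if (n == 0) {
			if (ferror(fp)) {
				free(base);
				return NULL;
			}
			break;		/* end of file */
		}
	}
	base[len] = base[len + 1] = YY_END_OF_BUFFER_CHAR;
	*sizep = len + 2;
	return base;
}
```

In a scanner, the result would then be passed as `yy_scan_buffer(base, size)`, checking for a nil return. Because the buffer is not copied, it must stay allocated, and must not be modified by other code, for as long as it is being scanned. Free it only after the handle has been released with `yy_delete_buffer()`.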
+.Sh END-OF-FILE RULES
+The special rule
+.Qq <<EOF>>
+indicates actions which are to be taken when an end-of-file is encountered and
+.Fn yywrap
+returns non-zero
+.Pq i.e., indicates no further files to process .
+The action must finish by doing one of four things:
+.Bl -dash
+.It
+Assigning
+.Em yyin
+to a new input file
+(in previous versions of
+.Nm ,
+after doing the assignment, it was necessary to call the special action
+.Dv YY_NEW_FILE ;
+this is no longer necessary).
+.It
+Executing a
+.Em return
+statement.
+.It
+Executing the special
+.Fn yyterminate
+action.
+.It
+Switching to a new buffer using
+.Fn yy_switch_to_buffer
+as shown in the example above.
+.El
+.Pp
+<<EOF>> rules may not be used with other patterns;
+they may only be qualified with a list of start conditions.
+If an unqualified <<EOF>> rule is given, it applies to all start conditions
+which do not already have <<EOF>> actions.
+To specify an <<EOF>> rule for only the initial start condition, use
+.Pp
+.Dl <INITIAL><<EOF>>
+.Pp
+These rules are useful for catching things like unclosed comments.
+An example:
+.Bd -literal -offset indent
+%x quote
+%%
+
+\&...other rules for dealing with quotes...
+
+<quote><<EOF>> {
+ error("unterminated quote");
+ yyterminate();
+}
+<<EOF>> {
+ if (*++filelist)
+ yyin = fopen(*filelist, "r");
+ else
+ yyterminate();
+}
+.Ed
+.Sh MISCELLANEOUS MACROS
+The macro
+.Dv YY_USER_ACTION
+can be defined to provide an action
+which is always executed prior to the matched rule's action.
+For example,
+it could be #define'd to call a routine to convert yytext to lower-case.
+When
+.Dv YY_USER_ACTION
+is invoked, the variable
+.Fa yy_act
+gives the number of the matched rule
+.Pq rules are numbered starting with 1 .
+For example, to profile how often each rule is matched,
+the following would do the trick:
+.Pp
+.Dl #define YY_USER_ACTION ++ctr[yy_act]
+.Pp
+where
+.Fa ctr
+is an array to hold the counts for the different rules.
+Note that the macro
+.Dv YY_NUM_RULES
+gives the total number of rules
+(including the default rule, even if
+.Fl s
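+is used).
+Because rules are numbered starting with 1 and this count includes the
+default rule,
+.Fa yy_act
+can equal
+.Dv YY_NUM_RULES ,
+so a safe declaration for
+.Fa ctr
+is:
+.Pp
+.Dl int ctr[YY_NUM_RULES + 1];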
+.Pp
+The macro
+.Dv YY_USER_INIT
+may be defined to provide an action which is always executed before
+the first scan
+.Pq and before the scanner's internal initializations are done .
+For example, it could be used to call a routine to read
+in a data table or open a logging file.
+.Pp
+The macro
+.Dv yy_set_interactive(is_interactive)
+can be used to control whether the current buffer is considered
+.Em interactive .
+An interactive buffer is processed more slowly,
+but must be used when the scanner's input source is indeed
+interactive to avoid problems due to waiting to fill buffers
+(see the discussion of the
+.Fl I
+flag below).
+A non-zero value in the macro invocation marks the buffer as interactive,
+a zero value as non-interactive.
+Note that use of this macro overrides
+.Dq %option always-interactive
+or
+.Dq %option never-interactive
+(see
+.Sx OPTIONS
+below).
+.Fn yy_set_interactive
+must be invoked prior to beginning to scan the buffer that is
+.Pq or is not
+to be considered interactive.
+.Pp
+The macro
+.Dv yy_set_bol(at_bol)
+can be used to control whether the current buffer's scanning
+context for the next token match is done as though at the
+beginning of a line.
+A non-zero macro argument makes rules anchored with
+.Sq ^
+active, while a zero argument makes
+.Sq ^
+rules inactive.
+.Pp
+The macro
+.Dv YY_AT_BOL
+returns true if the next token scanned from the current buffer will have
+.Sq ^
+rules active, false otherwise.
+.Pp
+In the generated scanner, the actions are all gathered in one large
+switch statement and separated using
+.Dv YY_BREAK ,
+which may be redefined.
+By default, it is simply a
+.Qq break ,
+to separate each rule's action from the following rules.
+Redefining
+.Dv YY_BREAK
+allows, for example, C++ users to
+.Dq #define YY_BREAK
+to do nothing
+(while being very careful that every rule ends with a
+.Qq break
+or a
+.Qq return ! )
+to avoid unreachable-statement warnings in rules whose action ends with
+.Dq return ,
+since the
+.Dv YY_BREAK
+following such an action can never be reached.
+.Sh VALUES AVAILABLE TO THE USER
+This section summarizes the various values available to the user
+in the rule actions.
+.Bl -tag -width Ds
+.It char *yytext
+Holds the text of the current token.
+It may be modified but not lengthened
+.Pq characters cannot be appended to the end .
+.Pp
+If the special directive
+.Dq %array
+appears in the first section of the scanner description, then
+.Fa yytext
+is instead declared
+.Dq char yytext[YYLMAX] ,
+where
+.Dv YYLMAX
+is a macro definition that can be redefined in the first section
+to change the default value
+.Pq generally 8KB .
+Using
+.Dq %array
+results in somewhat slower scanners, but the value of
+.Fa yytext
+becomes immune to calls to
+.Fn input
+and
+.Fn unput ,
+which potentially destroy its value when
+.Fa yytext
+is a character pointer.
+The opposite of
+.Dq %array
+is
+.Dq %pointer ,
+which is the default.
+.Pp
+.Dq %array
+cannot be used when generating C++ scanner classes
+(the
+.Fl +
+flag).
+.It int yyleng
+Holds the length of the current token.
+.It FILE *yyin
+Is the file which by default
+.Nm
+reads from.
+It may be redefined, but doing so only makes sense before
+scanning begins or after an
+.Dv EOF
+has been encountered.
+Changing it in the midst of scanning will have unexpected results since
+.Nm
+buffers its input; use
+.Fn yyrestart
+instead.
+Once scanning terminates because an end-of-file
+has been seen,
+.Fa yyin
+can be assigned as the new input file
+and the scanner can be called again to continue scanning.
+.It void yyrestart(FILE *new_file)
+May be called to point
+.Fa yyin
+at the new input file.
+The switch-over to the new file is immediate
+.Pq any previously buffered-up input is lost .
+Note that calling
+.Fn yyrestart
+with
+.Fa yyin
+as an argument thus throws away the current input buffer and continues
+scanning the same input file.
+.It FILE *yyout
+Is the file to which
+.Em ECHO
+actions are done.
+It can be reassigned by the user.
+.It YY_CURRENT_BUFFER
+Returns a
+.Dv YY_BUFFER_STATE
+handle to the current buffer.
+.It YY_START
+Returns an integer value corresponding to the current start condition.
+This value can subsequently be used with
+.Em BEGIN
+to return to that start condition.
+.El
+.Sh INTERFACING WITH YACC
+One of the main uses of
+.Nm
+is as a companion to the
+.Xr yacc 1
+parser-generator.
+Parsers generated by yacc expect to call a routine named
+.Fn yylex
+to find the next input token.
+The routine is expected to return the type of the next token
+and to put any associated value in the global
+.Fa yylval ,
+which is defined externally
+and can be a union or any other complex data structure.
+To use
+.Nm
+with yacc, one specifies the
+.Fl d
+option to yacc to instruct it to generate the file
+.Pa y.tab.h
+containing definitions of all the
+.Dq %tokens
+appearing in the yacc input.
+This file is then included in the
+.Nm
+scanner.
+For example, part of the scanner might look like:
+.Bd -literal -offset indent
+%{
+#include "y.tab.h"
+%}
+
+%%
+
+if return TOK_IF;
+then return TOK_THEN;
+begin return TOK_BEGIN;
+end return TOK_END;
+.Ed
+.Sh OPTIONS
+.Nm
+has the following options:
+.Bl -tag -width Ds
+.It Fl 7
+Instructs
+.Nm
+to generate a 7-bit scanner, i.e., one which can only recognize 7-bit
+characters in its input.
+The advantage of using
+.Fl 7
+is that the scanner's tables can be up to half the size of those generated
+using the
+.Fl 8
+option
+.Pq see below .
+The disadvantage is that such scanners often hang
+or crash if their input contains an 8-bit character.
+.Pp
+Note, however, that unless generating a scanner using the
+.Fl Cf
+or
+.Fl CF
+table compression options, use of
+.Fl 7
+will save only a small amount of table space,
+and make the scanner considerably less portable.
+.Nm flex Ns 's
+default behavior is to generate an 8-bit scanner unless
+.Fl Cf
+or
+.Fl CF
+is specified, in which case
+.Nm
+defaults to generating 7-bit scanners unless it was
+configured to generate 8-bit scanners
+(as will often be the case with non-USA sites).
+It is possible to tell whether
+.Nm
+generated a 7-bit or an 8-bit scanner by inspecting the flag summary in the
+.Fl v
+output as described below.
+.Pp
+Note that if
+.Fl Cfe
+or
+.Fl CFe
+are used
+(the table compression options, but also using equivalence classes as
+discussed below),
+.Nm
+still defaults to generating an 8-bit scanner,
+since usually with these compression options full 8-bit tables
+are not much more expensive than 7-bit tables.
+.It Fl 8
+Instructs
+.Nm
+to generate an 8-bit scanner, i.e., one which can recognize 8-bit
+characters.
+This flag is only needed for scanners generated using
+.Fl Cf
+or
+.Fl CF ,
+as otherwise
+.Nm
+defaults to generating an 8-bit scanner anyway.
+.Pp
+See the discussion of
+.Fl 7
+above for
+.Nm flex Ns 's
+default behavior and the tradeoffs between 7-bit and 8-bit scanners.
+.It Fl B
+Instructs
+.Nm
+to generate a
+.Em batch
+scanner, the opposite of
+.Em interactive
+scanners generated by
+.Fl I
+.Pq see below .
+In general,
+.Fl B
+is used when the scanner will never be used interactively,
+and you want to squeeze a little more performance out of it.
+If the aim is instead to squeeze out a lot more performance,
+use the
+.Fl Cf
+or
+.Fl CF
+options
+.Pq discussed below ,
+which turn on
+.Fl B
+automatically anyway.
+.It Fl b
+Generate backing-up information to
+.Pa lex.backup .
+This is a list of scanner states which require backing up
+and the input characters on which they do so.
+By adding rules one can remove backing-up states.
+If all backing-up states are eliminated and
+.Fl Cf
+or
+.Fl CF
+is used, the generated scanner will run faster (see the
+.Fl p
+flag).
+Only users who wish to squeeze every last cycle out of their
+scanners need worry about this option.
+(See the section on
+.Sx PERFORMANCE CONSIDERATIONS
+below.)
+.It Fl C Ns Op Cm aeFfmr
+Controls the degree of table compression and, more generally, trade-offs
+between small scanners and fast scanners.
+.Bl -tag -width Ds
+.It Fl Ca
+Instructs
+.Nm
+to trade off larger tables in the generated scanner for faster performance
+because the elements of the tables are better aligned for memory access
+and computation.
+On some
+.Tn RISC
+architectures, fetching and manipulating longwords is more efficient
+than with smaller-sized units such as shortwords.
+This option can double the size of the tables used by the scanner.
+.It Fl Ce
+Directs
+.Nm
+to construct
+.Em equivalence classes ,
+i.e., sets of characters which have identical lexical properties
+(for example, if the only appearance of digits in the
+.Nm
+input is in the character class
+.Qq [0-9]
+then the digits
+.Sq 0 ,
+.Sq 1 ,
+.Sq ... ,
+.Sq 9
+will all be put in the same equivalence class).
+Equivalence classes usually give dramatic reductions in the final
+table/object file sizes
+.Pq typically a factor of 2\-5
+and are pretty cheap performance-wise
+.Pq one array look-up per character scanned .
+.It Fl CF
+Specifies that the alternate fast scanner representation
+(described below under the
+.Fl F
+option)
+should be used.
+This option cannot be used with
+.Fl + .
+.It Fl Cf
+Specifies that the
+.Em full
+scanner tables should be generated \-
+.Nm
+should not compress the tables by taking advantage of
+similar transition functions for different states.
+.It Fl \&Cm
+Directs
+.Nm
+to construct
+.Em meta-equivalence classes ,
+which are sets of equivalence classes
+(or characters, if equivalence classes are not being used)
+that are commonly used together.
+Meta-equivalence classes are often a big win when using compressed tables,
+but they have a moderate performance impact
+(one or two
+.Qq if
+tests and one array look-up per character scanned).
+.It Fl Cr
+Causes the generated scanner to
+.Em bypass
+use of the standard I/O library
+.Pq stdio
+for input.
+Instead of calling
+.Xr fread 3
+or
+.Xr getc 3 ,
+the scanner will use the
+.Xr read 2
+system call,
+resulting in a performance gain which varies from system to system,
+but in general is probably negligible unless
+.Fl Cf
+or
+.Fl CF
+are being used.
+Using
+.Fl Cr
+can cause strange behavior if, for example,
+.Fa yyin
+is read using stdio before the scanner is called
+(because the scanner will miss whatever text previous reads left
+in the stdio input buffer).
+.Pp
+.Fl Cr
+has no effect if
+.Dv YY_INPUT
+is defined
+(see
+.Sx THE GENERATED SCANNER
+above).
+.El
+.Pp
+A lone
+.Fl C
+specifies that the scanner tables should be compressed but neither
+equivalence classes nor meta-equivalence classes should be used.
+.Pp
+The options
+.Fl Cf
+or
+.Fl CF
+and
+.Fl \&Cm
+do not make sense together \- there is no opportunity for meta-equivalence
+classes if the table is not being compressed.
+Otherwise the options may be freely mixed, and are cumulative.
+.Pp
+The default setting is
+.Fl Cem
+which specifies that
+.Nm
+should generate equivalence classes and meta-equivalence classes.
+This setting provides the highest degree of table compression.
+Faster-executing scanners can be obtained at the cost of
+larger tables; the following ordering generally holds:
+.Bd -unfilled -offset indent
+slowest & smallest
+ -Cem
+ -Cm
+ -Ce
+ -C
+ -C{f,F}e
+ -C{f,F}
+ -C{f,F}a
+fastest & largest
+.Ed
+.Pp
+Note that scanners with the smallest tables are usually generated and
+compiled the quickest,
+so during development the default, maximal compression, is usually best.
+.Pp
+.Fl Cfe
+is often a good compromise between speed and size for production scanners.
+.It Fl d
+Makes the generated scanner run in debug mode.
+Whenever a pattern is recognized and the global
+.Fa yy_flex_debug
+is non-zero
+.Pq which is the default ,
+the scanner will write to stderr a line of the form:
+.Pp
+.D1 --accepting rule at line 53 ("the matched text")
+.Pp
+The line number refers to the location of the rule in the file
+defining the scanner
+(i.e., the file that was fed to
+.Nm ) .
+Messages are also generated when the scanner backs up,
+accepts the default rule,
+reaches the end of its input buffer
+(or encounters a NUL;
+at this point, the two look the same as far as the scanner's concerned),
+or reaches an end-of-file.
+.It Fl F
+Specifies that the fast scanner table representation should be used
+.Pq and stdio bypassed .
+This representation is about as fast as the full table representation
+.Pq Fl f ,
+and for some sets of patterns will be considerably smaller
+.Pq and for others, larger .
+In general, if the pattern set contains both
+.Qq keywords
+and a catch-all,
+.Qq identifier
+rule, such as in the set:
+.Bd -unfilled -offset indent
+"case" return TOK_CASE;
+"switch" return TOK_SWITCH;
+\&...
+"default" return TOK_DEFAULT;
+[a-z]+ return TOK_ID;
+.Ed
+.Pp
+then it's better to use the full table representation.
+If only the
+.Qq identifier
+rule is present and a hash table or some such is used to detect the keywords,
+it's better to use
+.Fl F .
+.Pp
+This option is equivalent to
+.Fl CFr
+.Pq see above .
+It cannot be used with
+.Fl + .
+.It Fl f
+Specifies
+.Em fast scanner .
+No table compression is done and stdio is bypassed.
+The result is large but fast.
+This option is equivalent to
+.Fl Cfr
+.Pq see above .
+.It Fl h
+Generates a help summary of
+.Nm flex Ns 's
+options to stdout and then exits.
+.Fl ?\&
+and
+.Fl Fl help
+are synonyms for
+.Fl h .
+.It Fl I
+Instructs
+.Nm
+to generate an
+.Em interactive
+scanner.
+An interactive scanner is one that only looks ahead to decide
+what token has been matched if it absolutely must.
+It turns out that always looking one extra character ahead,
+even if the scanner has already seen enough text
+to disambiguate the current token, is a bit faster than
+only looking ahead when necessary.
+But scanners that always look ahead give dreadful interactive performance;
+for example, when a user types a newline,
+it is not recognized as a newline token until they enter
+.Em another
+token, which often means typing in another whole line.
+.Pp
+.Nm
+scanners default to
+.Em interactive
+unless
+.Fl Cf
+or
+.Fl CF
+table-compression options are specified
+.Pq see above .
+That's because if high performance is most important,
+one of these options should be used,
+so if they weren't,
+.Nm
+assumes it is preferable to trade off a bit of run-time performance for
+intuitive interactive behavior.
+Note also that
+.Fl I
+cannot be used in conjunction with
+.Fl Cf
+or
+.Fl CF .
+Thus, this option is not really needed; it is on by default for all those
+cases in which it is allowed.
+.Pp
+A scanner can be forced to not be interactive by using
+.Fl B
+.Pq see above .
+.It Fl i
+Instructs
+.Nm
+to generate a case-insensitive scanner.
+The case of letters given in the
+.Nm
+input patterns will be ignored,
+and tokens in the input will be matched regardless of case.
+The matched text given in
+.Fa yytext
+will retain its original case
+.Pq i.e., it will not be folded .
+.It Fl L
+Instructs
+.Nm
+not to generate
+.Dq #line
+directives.
+Without this option,
+.Nm
+peppers the generated scanner with #line directives so error messages
+in the actions will be correctly located with respect to either the original
+.Nm
+input file
+(if the errors are due to code in the input file),
+or
+.Pa lex.yy.c
+(if the errors are
+.Nm flex Ns 's
+fault \- these sorts of errors should be reported to the email address
+given below).
+.It Fl l
+Turns on maximum compatibility with the original
+.At
+.Nm lex
+implementation.
+Note that this does not mean full compatibility.
+Use of this option costs a considerable amount of performance,
+and it cannot be used with the
+.Fl + , f , F , Cf ,
+or
+.Fl CF
+options.
+For details on the compatibilities it provides, see the section
+.Sx INCOMPATIBILITIES WITH LEX AND POSIX
+below.
+This option also results in the name
+.Dv YY_FLEX_LEX_COMPAT
+being #define'd in the generated scanner.
+.It Fl n
+Another do-nothing, deprecated option included only for
+.Tn POSIX
+compliance.
+.It Fl o Ns Ar output
+Directs
+.Nm
+to write the scanner to the file
+.Ar output
+instead of
+.Pa lex.yy.c .
+If
+.Fl o
+is combined with the
+.Fl t
+option, then the scanner is written to stdout but its
+.Dq #line
+directives
+(see the
+.Fl L
+option above)
+refer to the file
+.Ar output .
+.It Fl P Ns Ar prefix
+Changes the default
+.Qq yy
+prefix used by
+.Nm
+for all globally visible variable and function names to instead be
+.Ar prefix .
+For example,
+.Fl P Ns Ar foo
+changes the name of
+.Fa yytext
+to
+.Fa footext .
+It also changes the name of the default output file from
+.Pa lex.yy.c
+to
+.Pa lex.foo.c .
+Here are all of the names affected:
+.Bd -unfilled -offset indent
+yy_create_buffer
+yy_delete_buffer
+yy_flex_debug
+yy_init_buffer
+yy_flush_buffer
+yy_load_buffer_state
+yy_switch_to_buffer
+yyin
+yyleng
+yylex
+yylineno
+yyout
+yyrestart
+yytext
+yywrap
+.Ed
+.Pp
+(If using a C++ scanner, then only
+.Fa yywrap
+and
+.Fa yyFlexLexer
+are affected.)
+Within the scanner itself, it is still possible to refer to the global variables
+and functions using either version of their name; but externally, they
+have the modified name.
+.Pp
+This option allows multiple
+.Nm
+programs to be easily linked together into the same executable.
+Note, though, that using this option also renames
+.Fn yywrap ,
+so now either an
+.Pq appropriately named
+version of the routine for the scanner must be supplied, or
+.Dq %option noyywrap
+must be used, as linking with
+.Fl lfl
+no longer provides one by default.
+.It Fl p
+Generates a performance report to stderr.
+The report consists of comments regarding features of the
+.Nm
+input file which will cause a serious loss of performance in the resulting
+scanner.
+If the flag is specified twice,
+comments regarding features that lead to minor performance losses
+will also be reported.
+.Pp
+Note that the use of
+.Em REJECT ,
+.Dq %option yylineno ,
+and variable trailing context
+(see the
+.Sx BUGS
+section below)
+entails a substantial performance penalty; use of
+.Fn yymore ,
+the
+.Sq ^
+operator, and the
+.Fl I
+flag entails minor performance penalties.
+.It Fl S Ns Ar skeleton
+Overrides the default skeleton file from which
+.Nm
+constructs its scanners.
+This option is needed only for
+.Nm
+maintenance or development.
+.It Fl s
+Causes the default rule
+.Pq that unmatched scanner input is echoed to stdout
+to be suppressed.
+If the scanner encounters input that does not
+match any of its rules, it aborts with an error.
+This option is useful for finding holes in a scanner's rule set.
+.It Fl T
+Makes
+.Nm
+run in
+.Em trace
+mode.
+It will generate a lot of messages to stderr concerning
+the form of the input and the resultant non-deterministic and deterministic
+finite automata.
+This option is mostly for use in maintaining
+.Nm .
+.It Fl t
+Instructs
+.Nm
+to write the scanner it generates to standard output instead of
+.Pa lex.yy.c .
+.It Fl V
+Prints the version number to stdout and exits.
+.Fl Fl version
+is a synonym for
+.Fl V .
+.It Fl v
+Specifies that
+.Nm
+should write to stderr
+a summary of statistics regarding the scanner it generates.
+Most of the statistics are meaningless to the casual
+.Nm
+user, but the first line identifies the version of
+.Nm
+(same as reported by
+.Fl V ) ,
+and the next line the flags used when generating the scanner,
+including those that are on by default.
+.It Fl w
+Suppresses warning messages.
+.It Fl +
+Specifies that
+.Nm
+should generate a C++ scanner class.
+See the section on
+.Sx GENERATING C++ SCANNERS
+below for details.
+.El
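+.Pp
+The
+.Fl F
+entry above mentions matching every word with a single
+.Qq identifier
+rule and detecting keywords with a hash table or similar lookup.
+A minimal sketch of such a lookup in C follows.
+It uses a sorted table and
+.Xr bsearch 3
+instead of a hash table, and the function
+.Fn classify
+and the
+.Dv TOK_*
+token codes are hypothetical names, not part of
+.Nm :
+.Bd -literal -offset indent
+#include <stdlib.h>
+#include <string.h>
+
+/* Hypothetical token codes, normally shared with the parser. */
+enum { TOK_ID = 258, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };
+
+struct keyword {
+	const char	*name;
+	int		 token;
+};
+
+/* Must be kept sorted by name for bsearch(3). */
+static const struct keyword keywords[] = {
+	{ "case",	TOK_CASE },
+	{ "default",	TOK_DEFAULT },
+	{ "switch",	TOK_SWITCH },
+};
+
+static int
+keyword_cmp(const void *key, const void *elem)
+{
+	const struct keyword *kw = elem;
+
+	return strcmp(key, kw->name);
+}
+
+/* Map the text of a matched word to a keyword token or TOK_ID. */
+int
+classify(const char *word)
+{
+	const struct keyword *kw;
+
+	kw = bsearch(word, keywords, sizeof(keywords) / sizeof(keywords[0]),
+	    sizeof(keywords[0]), keyword_cmp);
+	return kw != NULL ? kw->token : TOK_ID;
+}
+.Ed
+.Pp
+The scanner then needs only one rule for all words,
+leaving the keyword test to the action:
+.Pp
+.Dl [a-z]+ return classify(yytext);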
+.Pp
+.Nm
+also provides a mechanism for controlling options within the
+scanner specification itself, rather than from the
+.Nm
+command line.
+This is done by including
+.Dq %option
+directives in the first section of the scanner specification.
+Multiple options can be specified with a single
+.Dq %option
+directive, and multiple directives in the first section of the
+.Nm
+input file.
+.Pp
+Most options are given simply as names, optionally preceded by the word
+.Qq no
+.Pq with no intervening whitespace
+to negate their meaning.
+A number of them are equivalent to
+.Nm
+flags or their negation:
+.Bd -unfilled -offset indent
+7bit -7 option
+8bit -8 option
+align -Ca option
+backup -b option
+batch -B option
+c++ -+ option
+
+caseful or
+case-sensitive opposite of -i (default)
+
+case-insensitive or
+caseless -i option
+
+debug -d option
+default opposite of -s option
+ecs -Ce option
+fast -F option
+full -f option
+interactive -I option
+lex-compat -l option
+meta-ecs -Cm option
+perf-report -p option
+read -Cr option
+stdout -t option
+verbose -v option
+warn opposite of -w option
+ (use "%option nowarn" for -w)
+
+array equivalent to "%array"
+pointer equivalent to "%pointer" (default)
+.Ed
+.Pp
+Some options provide features that are otherwise not available:
+.Bl -tag -width Ds
+.It always-interactive
+Instructs
+.Nm
+to generate a scanner which always considers its input
+.Qq interactive .
+Normally, on each new input file the scanner calls
+.Fn isatty
+in an attempt to determine whether the scanner's input source is interactive
+and thus should be read a character at a time.
+When this option is used, however, no such call is made.
+.It main
+Directs
+.Nm
+to provide a default
+.Fn main
+program for the scanner, which simply calls
+.Fn yylex .
+This option implies
+.Dq noyywrap
+.Pq see below .
+.It never-interactive
+Instructs
+.Nm
+to generate a scanner which never considers its input
+.Qq interactive
+(again, no call made to
+.Fn isatty ) .
+This is the opposite of
+.Dq always-interactive .
+.It stack
+Enables the use of start condition stacks
+(see
+.Sx START CONDITIONS
+above).
+.It stdinit
+If set (i.e.,
+.Dq %option stdinit ) ,
+initializes
+.Fa yyin
+and
+.Fa yyout
+to stdin and stdout, instead of the default of
+.Dq nil .
+Some existing
+.Nm lex
+programs depend on this behavior, even though it is not compliant with ANSI C,
+which does not require stdin and stdout to be compile-time constants.
+.It yylineno
+Directs
+.Nm
+to generate a scanner that maintains the number of the current line
+read from its input in the global variable
+.Fa yylineno .
+This option is implied by
+.Dq %option lex-compat .
+.It yywrap
+If unset (i.e.,
+.Dq %option noyywrap ) ,
+makes the scanner not call
+.Fn yywrap
+upon an end-of-file, but simply assume that there are no more files to scan
+(until the user points
+.Fa yyin
+at a new file and calls
+.Fn yylex
+again).
+.El
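+.Pp
+The
+.Xr isatty 3
+test described under
+.Dq always-interactive
+decides how input is read.
+The C function below is a simplified sketch of that idea, not the code
+.Nm
+actually generates:
+an interactive source is read one character at a time,
+so a token can be recognized as soon as it is typed,
+while any other source is read in large blocks.
+The name
+.Fn refill
+is hypothetical.
+.Bd -literal -offset indent
+#define _POSIX_C_SOURCE 200809L
+#include <stdio.h>
+#include <unistd.h>
+
+/*
+ * Fill buf with up to max_size bytes from fp and return the number
+ * of bytes stored; 0 means end of input (or a read error).
+ */
+size_t
+refill(FILE *fp, char *buf, size_t max_size)
+{
+	int c;
+
+	if (max_size == 0)
+		return 0;
+	if (isatty(fileno(fp))) {
+		/* Interactive: never wait for more than one character. */
+		if ((c = getc(fp)) == EOF)
+			return 0;
+		buf[0] = (char)c;
+		return 1;
+	}
+	/* Batch: read as much as fits in one call. */
+	return fread(buf, 1, max_size, fp);
+}
+.Ed
+.Pp
+In terms of this sketch,
+.Dq always-interactive
+and
+.Dq never-interactive
+correspond to replacing the
+.Fn isatty
+call with a constant true or false.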
+.Pp
+.Nm
+scans rule actions to determine whether the
+.Em REJECT
+or
+.Fn yymore
+features are being used.
+The
+.Dq reject
+and
+.Dq yymore
+options are available to override its decision as to whether these
+features are used, either by setting them (e.g.,
+.Dq %option reject )
+to indicate the feature is indeed used,
+or unsetting them to indicate it actually is not used
+(e.g.,
+.Dq %option noyymore ) .
+.Pp
+Three options take string-delimited values, offset with
+.Sq = :
+.Pp
+.D1 %option outfile="ABC"
+.Pp
+is equivalent to
+.Fl o Ns Ar ABC ,
+and
+.Pp
+.D1 %option prefix="XYZ"
+.Pp
+is equivalent to
+.Fl P Ns Ar XYZ .
+Finally,
+.Pp
+.D1 %option yyclass="foo"
+.Pp
+only applies when generating a C++ scanner
+.Pf ( Fl +
+option).
+It informs
+.Nm
+that
+.Dq foo
+has been derived as a subclass of
+.Fa yyFlexLexer ,
+so
+.Nm
+will place actions in the member function
+.Dq foo::yylex()
+instead of
+.Dq yyFlexLexer::yylex() .
+It also generates a
+.Dq yyFlexLexer::yylex()
+member function that emits a run-time error (by invoking
+.Dq yyFlexLexer::LexerError() )
+if called.
+See
+.Sx GENERATING C++ SCANNERS ,
+below, for additional information.
+.Pp
+A number of options are available for
+lint
+purists who want to suppress the appearance of unneeded routines
+in the generated scanner.
+Each of the following, if unset
+(e.g.,
+.Dq %option nounput ) ,
+results in the corresponding routine not appearing in the generated scanner:
+.Bd -unfilled -offset indent
+input, unput
+yy_push_state, yy_pop_state, yy_top_state
+yy_scan_buffer, yy_scan_bytes, yy_scan_string
+.Ed
+.Pp
+(though
+.Fn yy_push_state
+and friends won't appear anyway unless
+.Dq %option stack
+is being used).
+.Sh PERFORMANCE CONSIDERATIONS
+The main design goal of
+.Nm
+is that it generate high-performance scanners.
+It has been optimized for dealing well with large sets of rules.
+Aside from the effects on scanner speed of the table compression
+.Fl C
+options outlined above,
+there are a number of options/actions which degrade performance.
+These are, from most expensive to least:
+.Bd -unfilled -offset indent
+REJECT
+%option yylineno
+arbitrary trailing context
+
+pattern sets that require backing up
+%array
+%option interactive
+%option always-interactive
+
+\&'^' beginning-of-line operator
+yymore()
+.Ed
+.Pp
+with the first three all being quite expensive
+and the last two being quite cheap.
+Note also that
+.Fn unput
+is implemented as a routine call that potentially does quite a bit of work,
+while
+.Fn yyless
+is a cheap macro; so when only putting back some excess text,
+use
+.Fn yyless .
+.Pp
+.Em REJECT
+should be avoided at all costs when performance is important.
+It is a particularly expensive option.
+.Pp
+Getting rid of backing up is messy and often may be an enormous
+amount of work for a complicated scanner.
+In principle, one begins by using the
+.Fl b
+flag to generate a
+.Pa lex.backup
+file.
+For example, on the input
+.Bd -literal -offset indent
+%%
+foo return TOK_KEYWORD;
+foobar return TOK_KEYWORD;
+.Ed
+.Pp
+the file looks like:
+.Bd -literal -offset indent
+State #6 is non-accepting -
+ associated rule line numbers:
+ 2 3
+ out-transitions: [ o ]
+ jam-transitions: EOF [ \e001-n p-\e177 ]
+
+State #8 is non-accepting -
+ associated rule line numbers:
+ 3
+ out-transitions: [ a ]
+ jam-transitions: EOF [ \e001-` b-\e177 ]
+
+State #9 is non-accepting -
+ associated rule line numbers:
+ 3
+ out-transitions: [ r ]
+ jam-transitions: EOF [ \e001-q s-\e177 ]
+
+Compressed tables always back up.
+.Ed
+.Pp
+The first few lines tell us that there's a scanner state in
+which it can make a transition on an
+.Sq o
+but not on any other character,
+and that in that state the currently scanned text does not match any rule.
+The state occurs when trying to match the rules found
+at lines 2 and 3 in the input file.
+If the scanner is in that state and then reads something other than an
+.Sq o ,
+it will have to back up to find a rule which is matched.
+With a bit of head-scratching one can see that this must be the
+state it's in when it has seen
+.Sq fo .
+When this has happened, if anything other than another
+.Sq o
+is seen, the scanner will have to back up to simply match the
+.Sq f
+.Pq by the default rule .
+.Pp
+The comment regarding State #8 indicates there's a problem when
+.Qq foob
+has been scanned.
+Indeed, on any character other than an
+.Sq a ,
+the scanner will have to back up to accept
+.Qq foo .
+Similarly, the comment for State #9 concerns when
+.Qq fooba
+has been scanned and an
+.Sq r
+does not follow.
+.Pp
+The final comment reminds us that there's no point going to
+all the trouble of removing backing up from the rules unless we're using
+.Fl Cf
+or
+.Fl CF ,
+since there's no performance gain doing so with compressed scanners.
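+.Pp
+To make backing up concrete, the C function below models the
+longest-match scan for the
+.Qq foo
+and
+.Qq foobar
+rules above.
+It is a simplified sketch, not how
+.Nm
+works internally (the generated scanner uses state tables instead):
+it reads ahead while some rule could still match,
+remembers the last position at which a rule matched completely,
+and falls back to that position when the scan jams.
+The name
+.Fn scan_keyword
+and its
+.Fa backup
+parameter are hypothetical.
+.Bd -literal -offset indent
+#include <stddef.h>
+#include <string.h>
+
+/* The patterns of the two keyword rules above. */
+static const char *const rules[] = { "foo", "foobar" };
+
+#define NRULES	(sizeof(rules) / sizeof(rules[0]))
+
+/*
+ * Return the length of the longest rule matching a prefix of in
+ * (0 if none; a real scanner would then apply the default rule to
+ * one character) and store in *backup how many characters were read
+ * past that match and must be given back.
+ */
+size_t
+scan_keyword(const char *in, size_t *backup)
+{
+	size_t i, len, pos, last_accept = 0;
+	int extendable;
+
+	for (pos = 0;; pos++) {
+		extendable = 0;
+		for (i = 0; i < NRULES; i++) {
+			len = strlen(rules[i]);
+			/* Skip rules ruled out by the text read so far. */
+			if (len < pos || strncmp(rules[i], in, pos) != 0)
+				continue;
+			if (len == pos)
+				last_accept = pos;	/* complete match */
+			else if (rules[i][pos] == in[pos])
+				extendable = 1;		/* next char fits */
+		}
+		if (!extendable)
+			break;				/* the scan jams */
+	}
+	*backup = pos - last_accept;
+	return last_accept;
+}
+.Ed
+.Pp
+For example, on the input
+.Qq foobaz
+it returns 3 and sets
+.Fa *backup
+to 2:
+after reading
+.Qq fooba
+the scan jams on
+.Sq z ,
+so the two characters
+.Qq ba
+are given back and
+.Qq foo
+is accepted, which is the situation reported for State #9.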
+.Pp
+The way to remove the backing up is to add
+.Qq error
+rules:
+.Bd -literal -offset indent
+%%
+foo return TOK_KEYWORD;
+foobar return TOK_KEYWORD;
+
+fooba |
+foob |
+fo {
+ /* false alarm, not really a keyword */
+ return TOK_ID;
+}
+.Ed
+.Pp
+Eliminating backing up among a list of keywords can also be done using a
+.Qq catch-all
+rule:
+.Bd -literal -offset indent
+%%
+foo return TOK_KEYWORD;
+foobar return TOK_KEYWORD;
+
+[a-z]+ return TOK_ID;
+.Ed
+.Pp
+This is usually the best solution when appropriate.
+.Pp
+Backing up messages tend to cascade.
+With a complicated set of rules it's not uncommon to get hundreds of messages.
+If one can decipher them, though,
+it often only takes a dozen or so rules to eliminate the backing up
+(though it's easy to make a mistake and have an error rule accidentally match
+a valid token; a possible future
+.Nm
+feature will be to automatically add rules to eliminate backing up).
+.Pp
+It's important to keep in mind that the benefits of eliminating
+backing up are gained only if
+.Em every
+instance of backing up is eliminated.
+Leaving just one gains nothing.
+.Pp
+.Em Variable
+trailing context
+(where neither the leading nor the trailing part has a fixed length)
+entails almost the same performance loss as
+.Em REJECT
+.Pq i.e., substantial .
+So when possible a rule like:
+.Bd -literal -offset indent
+%%
+mouse|rat/(cat|dog) run();
+.Ed
+.Pp
+is better written:
+.Bd -literal -offset indent
+%%
+mouse/cat|dog run();
+rat/cat|dog run();
+.Ed
+.Pp
+or as
+.Bd -literal -offset indent
+%%
+mouse|rat/cat run();
+mouse|rat/dog run();
+.Ed
+.Pp
+Note that here the special
+.Sq |\&
+action does not provide any savings, and can even make things worse (see
+.Sx BUGS
+below).
+.Pp
+Another area where the user can increase a scanner's performance
+.Pq and one that's easier to implement
+arises from the fact that the longer the tokens matched,
+the faster the scanner will run.
+This is because with long tokens the processing of most input
+characters takes place in the
+.Pq short
+inner scanning loop, and does not often have to go through the additional work
+of setting up the scanning environment (e.g.,
+.Fa yytext )
+for the action.
+Recall the scanner for C comments:
+.Bd -literal -offset indent
+%x comment
+%%
+	int line_num = 1;
+
+"/*" BEGIN(comment);
+
+<comment>[^*\en]*
+<comment>"*"+[^*/\en]*
+<comment>\en ++line_num;
+<comment>"*"+"/" BEGIN(INITIAL);
+.Ed
+.Pp
+This could be sped up by writing it as:
+.Bd -literal -offset indent
+%x comment
+%%
+	int line_num = 1;
+
+"/*" BEGIN(comment);
+
+<comment>[^*\en]*
+<comment>[^*\en]*\en ++line_num;
+<comment>"*"+[^*/\en]*
+<comment>"*"+[^*/\en]*\en ++line_num;
+<comment>"*"+"/" BEGIN(INITIAL);
+.Ed
+.Pp
+Now instead of each newline requiring the processing of another action,
+recognizing the newlines is
+.Qq distributed
+over the other rules to keep the matched text as long as possible.
+Note that adding rules does
+.Em not
+slow down the scanner!
+The speed of the scanner is independent of the number of rules or
+(modulo the considerations given at the beginning of this section)
+how complicated the rules are with regard to operators such as
+.Sq *
+and
+.Sq |\& .
+.Pp
+A final example in speeding up a scanner:
+scan through a file containing identifiers and keywords, one per line
+and with no other extraneous characters, and recognize all the keywords.
+A natural first approach is:
+.Bd -literal -offset indent
+%%
+asm |
+auto |
+break |
+\&... etc ...
+volatile |
+while /* it's a keyword */
+
+\&.|\en /* it's not a keyword */
+.Ed
+.Pp
+To eliminate the backing up, introduce a catch-all rule:
+.Bd -literal -offset indent
+%%
+asm |
+auto |
+break |
+\&... etc ...
+volatile |
+while /* it's a keyword */
+
+[a-z]+ |
+\&.|\en /* it's not a keyword */
+.Ed
+.Pp
+Now, if it's guaranteed that there's exactly one word per line,
+then we can halve the total number of matches by
+merging the recognition of newlines into that of the other tokens:
+.Bd -literal -offset indent
+%%
+asm\en |
+auto\en |
+break\en |
+\&... etc ...
+volatile\en |
+while\en /* it's a keyword */
+
+[a-z]+\en |
+\&.|\en /* it's not a keyword */
+.Ed
+.Pp
+One has to be careful here,
+as we have now reintroduced backing up into the scanner.
+In particular, while we know that there will never be any characters
+in the input stream other than letters or newlines,
+.Nm
+can't figure this out, and it will plan for possibly needing to back up
+when it has scanned a token like
+.Qq auto
+and then the next character is something other than a newline or a letter.
+Previously it would then just match the
+.Qq auto
+rule and be done, but now it has no
+.Qq auto
+rule, only an
+.Qq auto\en
+rule.
+To eliminate the possibility of backing up,
+we could either duplicate all rules but without final newlines or,
+since we never expect to encounter such an input and therefore don't care
+how it's classified, we can introduce one more catch-all rule,
+one that doesn't include a newline:
+.Bd -literal -offset indent
+%%
+asm\en |
+auto\en |
+break\en |
+\&... etc ...
+volatile\en |
+while\en /* it's a keyword */
+
+[a-z]+\en |
+[a-z]+ |
+\&.|\en /* it's not a keyword */
+.Ed
+.Pp
+Compiled with
+.Fl Cf ,
+this is about as fast as one can get a
+.Nm
+scanner to go for this particular problem.
+.Pp
+A final note:
+.Nm
+is slow when matching NUL's,
+particularly when a token contains multiple NUL's.
+It's best to write rules which match short
+amounts of text if it's anticipated that the text will often include NUL's.
+.Pp
+Another final note regarding performance: as mentioned above in the section
+.Sx HOW THE INPUT IS MATCHED ,
+dynamically resizing
+.Fa yytext
+to accommodate huge tokens is a slow process because it presently requires that
+the
+.Pq huge
+token be rescanned from the beginning.
+Thus if performance is vital, it is better to attempt to match
+.Qq large
+quantities of text but not
+.Qq huge
+quantities, where the cutoff between the two is at about 8K characters/token.
+.Sh GENERATING C++ SCANNERS
+.Nm
+provides two different ways to generate scanners for use with C++.
+The first way is to simply compile a scanner generated by
+.Nm
+using a C++ compiler instead of a C compiler.
+This should not generate any compilation errors
+(please report any found to the email address given in the
+.Sx AUTHORS
+section below).
+C++ code can then be used in rule actions instead of C code.
+Note that the default input source for scanners remains
+.Fa yyin ,
+and default echoing is still done to
+.Fa yyout .
+Both of these remain
+.Fa FILE *
+variables and not C++ streams.
+.Pp
+.Nm
+can also be used to generate a C++ scanner class, using the
+.Fl +
+option (or, equivalently,
+.Dq %option c++ ) ,
+which is automatically specified if the name of the flex executable ends in a
+.Sq + ,
+such as
+.Nm flex++ .
+When using this option,
+.Nm
+defaults to generating the scanner to the file
+.Pa lex.yy.cc
+instead of
+.Pa lex.yy.c .
+The generated scanner includes the header file
+.In g++/FlexLexer.h ,
+which defines the interface to two C++ classes.
+.Pp
+The first class,
+.Em FlexLexer ,
+provides an abstract base class defining the general scanner class interface.
+It provides the following member functions:
+.Bl -tag -width Ds
+.It const char* YYText()
+Returns the text of the most recently matched token, the equivalent of
+.Fa yytext .
+.It int YYLeng()
+Returns the length of the most recently matched token, the equivalent of
+.Fa yyleng .
+.It int lineno() const
+Returns the current input line number
+(see
+.Dq %option yylineno ) ,
+or 1 if
+.Dq %option yylineno
+was not used.
+.It void set_debug(int flag)
+Sets the debugging flag for the scanner, equivalent to assigning to
+.Fa yy_flex_debug
+(see the
+.Sx OPTIONS
+section above).
+Note that the scanner must be built using
+.Dq %option debug
+to include debugging information in it.
+.It int debug() const
+Returns the current setting of the debugging flag.
+.El
+.Pp
+Also provided are member functions equivalent to
+.Fn yy_switch_to_buffer ,
+.Fn yy_create_buffer
+(though the first argument is an
+.Fa std::istream*
+object pointer and not a
+.Fa FILE* ) ,
+.Fn yy_flush_buffer ,
+.Fn yy_delete_buffer ,
+and
+.Fn yyrestart
+(again, the first argument is an
+.Fa std::istream*
+object pointer).
+.Pp
+The second class defined in
+.In g++/FlexLexer.h
+is
+.Fa yyFlexLexer ,
+which is derived from
+.Fa FlexLexer .
+It defines the following additional member functions:
+.Bl -tag -width Ds
+.It "yyFlexLexer(std::istream* arg_yyin = 0, std::ostream* arg_yyout = 0)"
+Constructs a
+.Fa yyFlexLexer
+object using the given streams for input and output.
+If not specified, the streams default to
+.Fa cin
+and
+.Fa cout ,
+respectively.
+.It virtual int yylex()
+Performs the same role as
+.Fn yylex
+does for ordinary flex scanners: it scans the input stream, consuming
+tokens, until a rule's action returns a value.
+If subclass
+.Sq S
+is derived from
+.Fa yyFlexLexer ,
+in order to access the member functions and variables of
+.Sq S
+inside
+.Fn yylex ,
+use
+.Dq %option yyclass="S"
+to inform
+.Nm
+that the
+.Sq S
+subclass will be used instead of
+.Fa yyFlexLexer .
+In this case, rather than generating
+.Dq yyFlexLexer::yylex() ,
+.Nm
+generates
+.Dq S::yylex()
+(and also generates a dummy
+.Dq yyFlexLexer::yylex()
+that calls
+.Dq yyFlexLexer::LexerError()
+if called).
+.It "virtual void switch_streams(std::istream* new_in = 0, std::ostream* new_out = 0)"
+Reassigns
+.Fa yyin
+to
+.Fa new_in
+.Pq if non-nil
+and
+.Fa yyout
+to
+.Fa new_out
+.Pq ditto ,
+deleting the previous input buffer if
+.Fa yyin
+is reassigned.
+.It int yylex(std::istream* new_in, std::ostream* new_out = 0)
+First switches the input streams via
+.Dq switch_streams(new_in, new_out)
+and then returns the value of
+.Fn yylex .
+.El
+.Pp
+In addition,
+.Fa yyFlexLexer
+defines the following protected virtual functions which can be redefined
+in derived classes to tailor the scanner:
+.Bl -tag -width Ds
+.It virtual int LexerInput(char* buf, int max_size)
+Reads up to
+.Fa max_size
+characters into
+.Fa buf
+and returns the number of characters read.
+To indicate end-of-input, return 0 characters.
+Note that
+.Qq interactive
+scanners (see the
+.Fl B
+and
+.Fl I
+flags) define the macro
+.Dv YY_INTERACTIVE .
+If
+.Fn LexerInput
+has been redefined, and it's necessary to take different actions depending on
+whether or not the scanner might be scanning an interactive input source,
+it's possible to test for the presence of this name via
+.Dq #ifdef .
+.It virtual void LexerOutput(const char* buf, int size)
+Writes out
+.Fa size
+characters from the buffer
+.Fa buf ,
+which, while NUL-terminated, may also contain
+.Qq internal
+NUL's if the scanner's rules can match text with NUL's in them.
+.It virtual void LexerError(const char* msg)
+Reports a fatal error message.
+The default version of this function writes the message to the stream
+.Fa cerr
+and exits.
+.El
+.Pp
+Note that a
+.Fa yyFlexLexer
+object contains its entire scanning state.
+Thus such objects can be used to create reentrant scanners.
+Multiple instances of the same
+.Fa yyFlexLexer
+class can be instantiated, and multiple C++ scanner classes can be combined
+in the same program using the
+.Fl P
+option discussed above.
+.Pp
+Finally, note that the
+.Dq %array
+feature is not available to C++ scanner classes;
+.Dq %pointer
+must be used
+.Pq the default .
+.Pp
+Here is an example of a simple C++ scanner:
+.Bd -literal -offset indent
+// An example of using the flex C++ scanner class.
+
+%{
+#include <errno.h>
+int mylineno = 0;
+%}
+
+string \e"[^\en"]+\e"
+
+ws [ \et]+
+
+alpha [A-Za-z]
+dig [0-9]
+name ({alpha}|{dig}|\e$)({alpha}|{dig}|[_.\e-/$])*
+num1 [-+]?{dig}+\e.?([eE][-+]?{dig}+)?
+num2 [-+]?{dig}*\e.{dig}+([eE][-+]?{dig}+)?
+number {num1}|{num2}
+
+%%
+
+{ws} /* skip blanks and tabs */
+
+"/*" {
+ int c;
+
+ while ((c = yyinput()) != EOF) {
+ if(c == '\en')
+ ++mylineno;
+ else if(c == '*') {
+ if ((c = yyinput()) == '/')
+ break;
+ else
+ unput(c);
+ }
+ }
+}
+
+{number} std::cout << "number " << YYText() << '\en';
+
+\en mylineno++;
+
+{name} std::cout << "name " << YYText() << '\en';
+
+{string} std::cout << "string " << YYText() << '\en';
+
+%%
+
+int main(int /* argc */, char** /* argv */)
+{
+ FlexLexer* lexer = new yyFlexLexer;
+ while(lexer->yylex() != 0)
+ ;
+ return 0;
+}
+.Ed
+.Pp
+To create multiple
+.Pq different
+lexer classes, use the
+.Fl P
+flag
+(or the
+.Dq prefix=
+option)
+to rename each
+.Fa yyFlexLexer
+to some other
+.Fa xxFlexLexer .
+.In g++/FlexLexer.h
+can then be included in other sources once per lexer class, first renaming
+.Fa yyFlexLexer
+as follows:
+.Bd -literal -offset indent
+#undef yyFlexLexer
+#define yyFlexLexer xxFlexLexer
+#include <g++/FlexLexer.h>
+
+#undef yyFlexLexer
+#define yyFlexLexer zzFlexLexer
+#include <g++/FlexLexer.h>
+.Ed
+.Pp
+This assumes, for example, that
+.Dq %option prefix="xx"
+is used for one scanner and
+.Dq %option prefix="zz"
+for the other.
+.Pp
+.Sy IMPORTANT :
+the present form of the scanning class is experimental
+and may change considerably between major releases.
+.Sh INCOMPATIBILITIES WITH LEX AND POSIX
+.Nm
+is a rewrite of the
+.At
+.Nm lex
+tool
+(the two implementations do not share any code, though),
+with some extensions and incompatibilities, both of which are of concern
+to those who wish to write scanners acceptable to either implementation.
+.Nm
+is fully compliant with the
+.Tn POSIX
+.Nm lex
+specification, except that when using
+.Dq %pointer
+.Pq the default ,
+a call to
+.Fn unput
+destroys the contents of
+.Fa yytext ,
+which is counter to the
+.Tn POSIX
+specification.
+.Pp
+In this section we discuss all of the known areas of incompatibility between
+.Nm ,
+.At
+.Nm lex ,
+and the
+.Tn POSIX
+specification.
+.Pp
+.Nm flex Ns 's
+.Fl l
+option turns on maximum compatibility with the original
+.At
+.Nm lex
+implementation, at the cost of a major loss in the generated scanner's
+performance.
+We note below which incompatibilities can be overcome using the
+.Fl l
+option.
+.Pp
+.Nm
+is fully compatible with
+.Nm lex
+with the following exceptions:
+.Bl -dash
+.It
+The undocumented
+.Nm lex
+scanner internal variable
+.Fa yylineno
+is not supported unless
+.Fl l
+or
+.Dq %option yylineno
+is used.
+.Pp
+.Fa yylineno
+should be maintained on a per-buffer basis, rather than a per-scanner
+.Pq single global variable
+basis.
+.Pp
+.Fa yylineno
+is not part of the
+.Tn POSIX
+specification.
+.It
+The
+.Fn input
+routine is not redefinable, though it may be called to read characters
+following whatever has been matched by a rule.
+If
+.Fn input
+encounters an end-of-file, the normal
+.Fn yywrap
+processing is done.
+A
+.Dq real
+end-of-file is returned by
+.Fn input
+as
+.Dv EOF .
+.Pp
+Input is instead controlled by defining the
+.Dv YY_INPUT
+macro.
+.Pp
+The
+.Nm
+restriction that
+.Fn input
+cannot be redefined is in accordance with the
+.Tn POSIX
+specification, which simply does not specify any way of controlling the
+scanner's input other than by making an initial assignment to
+.Fa yyin .
+.It
+The
+.Fn unput
+routine is not redefinable.
+This restriction is in accordance with
+.Tn POSIX .
+.It
+.Nm
+scanners are not as reentrant as
+.Nm lex
+scanners.
+In particular, if a scanner is interactive and
+an interrupt handler long-jumps out of the scanner,
+and the scanner is subsequently called again,
+the following error message may be displayed:
+.Pp
+.D1 fatal flex scanner internal error--end of buffer missed
+.Pp
+To reenter the scanner, first use
+.Pp
+.Dl yyrestart(yyin);
+.Pp
+Note that this call will throw away any buffered input;
+usually this isn't a problem with an interactive scanner.
+.Pp
+Also note that flex C++ scanner classes are reentrant,
+so if using C++ is an option, they should be used instead.
+See
+.Sx GENERATING C++ SCANNERS
+above for details.
+.It
+.Fn output
+is not supported.
+Output from the
+.Em ECHO
+macro is done to the file-pointer
+.Fa yyout
+.Pq default stdout .
+.Pp
+.Fn output
+is not part of the
+.Tn POSIX
+specification.
+.It
+.Nm lex
+does not support exclusive start conditions
+.Pq %x ,
+though they are in the
+.Tn POSIX
+specification.
+.It
+When definitions are expanded,
+.Nm
+encloses them in parentheses.
+With
+.Nm lex ,
+the following:
+.Bd -literal -offset indent
+NAME [A-Z][A-Z0-9]*
+%%
+foo{NAME}? printf("Found it\en");
+%%
+.Ed
+.Pp
+will not match the string
+.Qq foo
+because when the macro is expanded the rule is equivalent to
+.Qq foo[A-Z][A-Z0-9]*?
+and the precedence is such that the
+.Sq ?\&
+is associated with
+.Qq [A-Z0-9]* .
+With
+.Nm ,
+the rule will be expanded to
+.Qq foo([A-Z][A-Z0-9]*)?
+and so the string
+.Qq foo
+will match.
+.Pp
+Note that if the definition begins with
+.Sq ^
+or ends with
+.Sq $
+then it is not expanded with parentheses, to allow these operators to appear in
+definitions without losing their special meanings.
+But the
+.Sq <s> ,
+.Sq / ,
+and
+.Sq <<EOF>>
+operators cannot be used in a
+.Nm
+definition.
+.Pp
+Using
+.Fl l
+results in the
+.Nm lex
+behavior of no parentheses around the definition.
+.Pp
+The
+.Tn POSIX
+specification is that the definition be enclosed in parentheses.
+.It
+Some implementations of
+.Nm lex
+allow a rule's action to begin on a separate line,
+if the rule's pattern has trailing whitespace:
+.Bd -literal -offset indent
+%%
+foo|bar<space here>
+ { foobar_action(); }
+.Ed
+.Pp
+.Nm
+does not support this feature.
+.It
+The
+.Nm lex
+.Sq %r
+.Pq generate a Ratfor scanner
+option is not supported.
+It is not part of the
+.Tn POSIX
+specification.
+.It
+After a call to
+.Fn unput ,
+.Fa yytext
+is undefined until the next token is matched,
+unless the scanner was built using
+.Dq %array .
+This is not the case with
+.Nm lex
+or the
+.Tn POSIX
+specification.
+The
+.Fl l
+option does away with this incompatibility.
+.It
+The precedence of the
+.Sq {}
+.Pq numeric range
+operator is different.
+.Nm lex
+interprets
+.Qq abc{1,3}
+as matching one, two, or three occurrences of
+.Sq abc ,
+whereas
+.Nm
+interprets it as matching
+.Sq ab
+followed by one, two, or three occurrences of
+.Sq c .
+The latter is in agreement with the
+.Tn POSIX
+specification.
+.It
+The precedence of the
+.Sq ^
+operator is different.
+.Nm lex
+interprets
+.Qq ^foo|bar
+as matching either
+.Sq foo
+at the beginning of a line, or
+.Sq bar
+anywhere, whereas
+.Nm
+interprets it as matching either
+.Sq foo
+or
+.Sq bar
+if they come at the beginning of a line.
+The latter is in agreement with the
+.Tn POSIX
+specification.
+.It
+The special table-size declarations such as
+.Sq %a
+supported by
+.Nm lex
+are not required by
+.Nm
+scanners;
+.Nm
+ignores them.
+.It
+The name
+.Dv FLEX_SCANNER
+is #define'd so scanners may be written for use with either
+.Nm
+or
+.Nm lex .
+Scanners also include
+.Dv YY_FLEX_MAJOR_VERSION
+and
+.Dv YY_FLEX_MINOR_VERSION
+indicating which version of
+.Nm
+generated the scanner
+(for example, for the 2.5 release, these defines would be 2 and 5,
+respectively).
+.El
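+.Pp
+Two of the differences above, the parenthesized expansion of definitions
+and the precedence of the
+.Sq {}
+operator, can be reproduced with POSIX extended regular expressions,
+which group and bind these constructs the same way
+.Nm
+does.
+The following C sketch assumes a system providing
+.Xr regcomp 3 ;
+it only illustrates the precedence rules and is not how
+.Nm
+matches input.
+The helper
+.Fn full_match
+and the pattern names are hypothetical, and each pattern is anchored with
+.Sq ^
+and
+.Sq $
+so that it must match the whole string.
+Because adjacent repetition operators such as
+.Sq *?
+are undefined in POSIX extended regular expressions, the
+.Nm lex
+expansion is written with explicit grouping.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the POSIX extended regular
 * expression `pattern` matches `s`, 0 if it does not, or -1 if the
 * pattern fails to compile. Callers anchor patterns with ^...$. */
static int full_match(const char *pattern, const char *s)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* NAME is defined as [A-Z][A-Z0-9]* and the rule is foo{NAME}? */

/* flex: the definition is wrapped in parentheses before '?' applies. */
static const char *const FLEX_FOO_NAME = "^foo([A-Z][A-Z0-9]*)?$";

/* lex: textual expansion gives foo[A-Z][A-Z0-9]*?, where '?' binds only
 * to [A-Z0-9]*; written with explicit grouping because adjacent
 * repetition operators are undefined in POSIX EREs. */
static const char *const LEX_FOO_NAME = "^foo[A-Z]([A-Z0-9]*)?$";

/* {} binds tighter than concatenation: ab followed by one to three c's. */
static const char *const ABC_1_3 = "^abc{1,3}$";

/* Grouping is needed to repeat the whole string abc. */
static const char *const GROUPED_ABC_1_3 = "^(abc){1,3}$";
```

+.Pp
+With these patterns,
+.Fn full_match
+reports that
+.Qq foo
+matches the
+.Nm
+expansion of
+.Qq foo{NAME}?
+but not the
+.Nm lex
+one, and that
+.Qq abcabc
+matches only the grouped form of
+.Qq abc{1,3} .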
+.Pp
+The following
+.Nm
+features are not included in
+.Nm lex
+or the
+.Tn POSIX
+specification:
+.Bd -unfilled -offset indent
+C++ scanners
+%option
+start condition scopes
+start condition stacks
+interactive/non-interactive scanners
+yy_scan_string() and friends
+yyterminate()
+yy_set_interactive()
+yy_set_bol()
+YY_AT_BOL()
+<<EOF>>
+<*>
+YY_DECL
+YY_START
+YY_USER_ACTION
+YY_USER_INIT
+#line directives
+%{}'s around actions
+multiple actions on a line
+.Ed
+.Pp
+plus almost all of the
+.Nm
+flags.
+The last feature in the list refers to the fact that with
+.Nm
+multiple actions can be placed on the same line,
+separated by semicolons, while with
+.Nm lex ,
+the following
+.Pp
+.Dl foo handle_foo(); ++num_foos_seen;
+.Pp
+is
+.Pq rather surprisingly
+truncated to
+.Pp
+.Dl foo handle_foo();
+.Pp
+.Nm
+does not truncate the action.
+Actions that are not enclosed in braces
+are simply terminated at the end of the line.
+.Sh FILES
+.Bl -tag -width "<g++/FlexLexer.h>"
+.It Pa flex.skl
+Skeleton scanner.
+This file is only used when building flex, not when
+.Nm
+executes.
+.It Pa lex.backup
+Backing-up information for the
+.Fl b
+flag (called
+.Pa lex.bck
+on some systems).
+.It Pa lex.yy.c
+Generated scanner
+(called
+.Pa lexyy.c
+on some systems).
+.It Pa lex.yy.cc
+Generated C++ scanner class, when using
+.Fl + .
+.It In g++/FlexLexer.h
+Header file defining the C++ scanner base class,
+.Fa FlexLexer ,
+and its derived class,
+.Fa yyFlexLexer .
+.It Pa /usr/lib/libl.*
+.Nm
+libraries.
+The
+.Pa /usr/lib/libfl.*\&
+libraries are links to these.
+Scanners must be linked using either
+.Fl \&ll
+or
+.Fl lfl .
+.El
+.Sh EXIT STATUS
+.Ex -std flex
+.Sh DIAGNOSTICS
+.Bl -diag
+.It warning, rule cannot be matched
+Indicates that the given rule cannot be matched because it follows other rules
+that will always match the same text.
+For example, in the following,
+.Dq foo
+cannot be matched because it comes after an identifier
+.Qq catch-all
+rule:
+.Bd -literal -offset indent
+[a-z]+ got_identifier();
+foo got_foo();
+.Ed
+.Pp
+Using
+.Em REJECT
+in a scanner suppresses this warning.
+.It "warning, \-s option given but default rule can be matched"
+Means that it is possible
+.Pq perhaps only in a particular start condition
+that the default rule
+.Pq match any single character
+is the only one that will match a particular input.
+Since
+.Fl s
+was given, presumably this is not intended.
+.It reject_used_but_not_detected undefined
+.It yymore_used_but_not_detected undefined
+These errors can occur when the generated scanner is compiled.
+They indicate that the scanner uses
+.Em REJECT
+or
+.Fn yymore
+but that
+.Nm
+failed to notice the fact, meaning that
+.Nm
+scanned the first two sections looking for occurrences of these actions
+and failed to find any, but somehow they snuck in
+.Pq via an #include file, for example .
+Use
+.Dq %option reject
+or
+.Dq %option yymore
+to indicate to
+.Nm
+that these features are really needed.
+.It flex scanner jammed
+A scanner compiled with
+.Fl s
+has encountered an input string which wasn't matched by any of its rules.
+This error can also occur due to internal problems.
+.It token too large, exceeds YYLMAX
+The scanner uses
+.Dq %array
+and one of its rules matched a string longer than the
+.Dv YYLMAX
+constant
+.Pq 8K bytes by default .
+The value can be increased by #define'ing
+.Dv YYLMAX
+in the definitions section of the
+.Nm
+input.
+.It "scanner requires \-8 flag to use the character 'x'"
+The scanner specification includes recognizing the 8-bit character
+.Sq x ,
+but the
+.Fl 8
+flag was not specified, and the scanner defaulted to 7-bit because the
+.Fl Cf
+or
+.Fl CF
+table compression options were used.
+See the discussion of the
+.Fl 7
+flag for details.
+.It flex scanner push-back overflow
+.Fn unput
+was used to push back so much text that the scanner's buffer
+could not hold both the pushed-back text and the current token in
+.Fa yytext .
+Ideally the scanner should dynamically resize the buffer in this case,
+but at present it does not.
+.It "input buffer overflow, can't enlarge buffer because scanner uses REJECT"
+The scanner was working on matching an extremely large token and needed
+to expand the input buffer.
+This doesn't work with scanners that use
+.Em REJECT .
+.It "fatal flex scanner internal error--end of buffer missed"
+This can occur in a scanner that is reentered after a
+.Fn longjmp
+has jumped out of
+.Pq or over
+the scanner's activation frame.
+Before reentering the scanner, use:
+.Pp
+.Dl yyrestart(yyin);
+.Pp
+or, as noted above, switch to using the C++ scanner class.
+.It "too many start conditions in <> construct!"
+More start conditions than exist were listed in a <> construct
+(so at least one of them must have been listed twice).
+.El
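+.Pp
+The
+.Dq rule cannot be matched
+warning follows from how a scanner chooses between rules: the longest match
+wins, and among matches of equal length the rule listed first wins.
+The C sketch below models only that selection policy, using
+.Xr regcomp 3
+to try each rule at the start of the input.
+It assumes POSIX leftmost-longest matching, and the helpers
+.Fn match_len
+and
+.Fn pick_rule
+are hypothetical;
+.Nm
+itself compiles the rules into a DFA instead.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: length of the longest match of the POSIX ERE
 * `pattern` starting at the beginning of `s`, or -1 if it does not
 * match there or fails to compile. */
static int match_len(const char *pattern, const char *s)
{
    char anchored[256];
    regex_t re;
    regmatch_t m[1];
    int len = -1;

    /* Anchor the rule at the current input position, as a scanner does. */
    if (snprintf(anchored, sizeof anchored, "^(%s)", pattern) >= (int)sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, s, 1, m, 0) == 0)
        len = (int)(m[0].rm_eo - m[0].rm_so);
    regfree(&re);
    return len;
}

/* Hypothetical helper: choose a rule the way a lex scanner does. The
 * longest match wins; for equal lengths the earlier rule wins. Returns
 * the rule's index, or -1 if no rule matches a non-empty prefix (the
 * case handled by the default rule). */
static int pick_rule(const char *const rules[], size_t nrules, const char *input)
{
    int best = -1;
    int best_len = 0;

    for (size_t i = 0; i < nrules; i++) {
        int len = match_len(rules[i], input);

        /* Strictly longer only, so ties keep the earlier rule. */
        if (len > best_len) {
            best = (int)i;
            best_len = len;
        }
    }
    return best;
}
```

+.Pp
+With
+.Qq [a-z]+
+listed before
+.Qq foo ,
+.Fn pick_rule
+never selects the second rule, since any input that
+.Qq foo
+matches is matched at least as long by the first.
+Listing
+.Qq foo
+first lets it win on exactly
+.Qq foo ,
+while
+.Qq [a-z]+
+still wins on longer identifiers such as
+.Qq food .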
+.Sh SEE ALSO
+.Xr awk 1 ,
+.Xr sed 1 ,
+.Xr yacc 1
+.Rs
+.\" 4.4BSD PSD:16
+.%A M. E. Lesk
+.%T Lex \(em Lexical Analyzer Generator
+.%I AT&T Bell Laboratories
+.%R Computing Science Technical Report
+.%N 39
+.%D October 1975
+.Re
+.Rs
+.%A John Levine
+.%A Tony Mason
+.%A Doug Brown
+.%B Lex & Yacc
+.%I O'Reilly and Associates
+.%N 2nd edition
+.Re
+.Rs
+.%A Alfred Aho
+.%A Ravi Sethi
+.%A Jeffrey Ullman
+.%B Compilers: Principles, Techniques and Tools
+.%I Addison-Wesley
+.%D 1986
+.%O "Describes the pattern-matching techniques used by flex (deterministic finite automata)"
+.Re
+.Sh STANDARDS
+The
+.Nm lex
+utility is compliant with the
+.St -p1003.1-2008
+specification,
+though its presence is optional.
+.Pp
+The flags
+.Op Fl 78BbCdFfhIiLloPpSsTVw+? ,
+.Op Fl -help ,
+and
+.Op Fl -version
+are extensions to that specification.
+.Pp
+See also the
+.Sx INCOMPATIBILITIES WITH LEX AND POSIX
+section, above.
+.Sh AUTHORS
+Vern Paxson, with the help of many ideas and much inspiration from
+Van Jacobson.
+Original version by Jef Poskanzer.
+The fast table representation is a partial implementation of a design done by
+Van Jacobson.
+The implementation was done by Kevin Gong and Vern Paxson.
+.Pp
+Thanks to the many
+.Nm
+beta-testers, feedbackers, and contributors, especially Francois Pinard,
+Casey Leedom,
+Robert Abramovitz,
+Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
+Neal Becker, Nelson H.F. Beebe,
+.Mt benson@odi.com ,
+Karl Berry, Peter A. Bigot, Simon Blanchard,
+Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher,
+Brian Clapper, J.T. Conklin,
+Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David
+Daniels, Chris G. Demetriou, Theo de Raadt,
+Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin,
+Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,
+Joe Gayda, Kaveh R. Ghazi, Wolfgang Glunz,
+Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel,
+Jan Hajic, Charles Hemphill, NORO Hideo,
+Jarkko Hietaniemi, Scott Hofmann,
+Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
+Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
+Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
+Amir Katz,
+.Mt ken@ken.hilco.com ,
+Kevin B. Kenny,
+Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht,
+Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle,
+David Loffredo, Mike Long,
+Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall,
+Bengt Martensson, Chris Metcalf,
+Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
+G.T. Nicol, Landon Noll, James Nordby, Marc Nozell,
+Richard Ohnemus, Karsten Pahnke,
+Sven Panne, Roland Pesch, Walter Pelissero, Gaumond Pierre,
+Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha,
+Frederic Raimbault, Pat Rankin, Rick Richardson,
+Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini,
+Andreas Scherer, Darrell Schiebel, Raf Schietekat,
+Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
+Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Str\(:omquist,
+Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor,
+Chris Thewalt, Richard M. Timoney, Jodi Tsai,
+Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams,
+Ken Yap, Ron Zellar, Nathan Zelle, David Zuhn,
+and those whose names have slipped my marginal mail-archiving skills
+but whose contributions are appreciated all the
+same.
+.Pp
+Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
+John Gilmore, Craig Leres, John Levine, Bob Mulcahy,
+G.T. Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various
+distribution headaches.
+.Pp
+Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
+to Benson Margulies and Fred Burke for C++ support;
+to Kent Williams and Tom Epperly for C++ class support;
+to Ove Ewerlid for support of NULs;
+and to Eric Hughes for support of multiple buffers.
+.Pp
+This work was primarily done when I was with the Real Time Systems Group
+at the Lawrence Berkeley Laboratory in Berkeley, CA.
+Many thanks to all there for the support I received.
+.Pp
+Send comments to
+.Aq Mt vern@ee.lbl.gov .
+.Sh BUGS
+Some trailing context patterns cannot be properly matched and generate
+warning messages
+.Pq "dangerous trailing context" .
+These are patterns where the ending of the first part of the rule
+matches the beginning of the second part, such as
+.Qq zx*/xy* ,
+where the
+.Sq x*
+matches the
+.Sq x
+at the beginning of the trailing context.
+(Note that the POSIX draft states that the text matched by such patterns
+is undefined.)
+.Pp
+For some trailing context rules, parts that are actually fixed-length are
+not recognized as such, leading to the performance loss of variable
+trailing context.
+In particular, parts using
+.Sq |\&
+or
+.Sq {n}
+(such as
+.Qq foo{3} )
+are always considered variable-length.
+.Pp
+Combining trailing context with the special
+.Sq |\&
+action can result in fixed trailing context being turned into
+the more expensive variable trailing context.
+For example, this can happen with the following rules:
+.Bd -literal -offset indent
+%%
+abc |
+xyz/def
+.Ed
+.Pp
+Use of
+.Fn unput
+invalidates
+.Fa yytext
+and
+.Fa yyleng ,
+unless the
+.Dq %array
+directive
+or the
+.Fl l
+option has been used.
+.Pp
+Pattern matching of NULs is substantially slower than matching other
+characters.
+.Pp
+Dynamic resizing of the input buffer is slow, as it entails rescanning
+all the text matched so far by the current
+.Pq generally huge
+token.
+.Pp
+Due to both buffering of input and read-ahead,
+it is not possible to intermix calls to
+.In stdio.h
+routines, such as
+.Fn getchar ,
+with
+.Nm
+rules and expect it to work.
+Call
+.Fn input
+instead.
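+.Pp
+The read-ahead problem can be seen in a toy model.
+The C sketch below assumes POSIX
+.Xr fmemopen 3
+and uses the hypothetical
+.Fn getc_after_first_word
+to stand in for a scanner: much as a non-interactive
+.Nm
+scanner does, it fills a private buffer with one large
+.Xr fread 3
+and then consumes only the first word.
+The character the program expects to read next with
+.Xr getc 3
+is already in that buffer, so
+.Fn getc
+sees end-of-file instead.
+A real scanner's buffer is typically much larger than the 64 bytes used here.

```c
#define _POSIX_C_SOURCE 200809L /* for fmemopen() */
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model of a buffered scanner, not flex's own code:
 * read ahead with one fread() into a private buffer, copy the first
 * space-delimited word into `word`, then return what a following
 * getc() on the same stream yields. Returns -2 on setup failure. */
static int getc_after_first_word(const char *text, char *word, size_t wordsz)
{
    char block[64];
    FILE *in;
    size_t n, i = 0;
    int c;

    if (wordsz == 0)
        return -2;
    in = fmemopen((void *)text, strlen(text), "r");
    if (in == NULL)
        return -2;

    /* Read-ahead: this consumes up to 64 bytes, not just the first word. */
    n = fread(block, 1, sizeof block, in);
    while (i < n && i + 1 < wordsz && block[i] != ' ') {
        word[i] = block[i];
        i++;
    }
    word[i] = '\0';

    /* The caller wants the character after the word, but it was already
     * pulled into `block`, so getc() reports end-of-file here. */
    c = getc(in);
    fclose(in);
    return c;
}
```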
+.Pp
+The total number of table entries reported by the
+.Fl v
+flag excludes the entries needed to determine
+which rule has been matched.
+The number of entries is equal to the number of DFA states
+if the scanner does not use
+.Em REJECT ,
+and somewhat greater than the number of states if it does.
+.Pp
+.Em REJECT
+cannot be used with the
+.Fl f
+or
+.Fl F
+options.
+.Pp
+The
+.Nm
+internal algorithms need documentation.