diff options
| author | Jacob McDonnell <jacob@jacobmcdonnell.com> | 2026-04-25 21:07:28 -0400 |
|---|---|---|
| committer | Jacob McDonnell <jacob@jacobmcdonnell.com> | 2026-04-25 21:07:28 -0400 |
| commit | 711594636704defae873be1a355a292505585afd (patch) | |
| tree | 59ee13f863830d8beba6cfd02bbe813dd486c26f /static/v10/man1/flex.1 | |
| parent | 3258a063c1f189d7b019e40e525b46bef9b9a7b1 (diff) | |
docs: Added UNIX V10 Manuals
Diffstat (limited to 'static/v10/man1/flex.1')
| -rw-r--r-- | static/v10/man1/flex.1 | 716 |
1 files changed, 716 insertions, 0 deletions
diff --git a/static/v10/man1/flex.1 b/static/v10/man1/flex.1 new file mode 100644 index 00000000..99cb5667 --- /dev/null +++ b/static/v10/man1/flex.1 @@ -0,0 +1,716 @@ +.TH FLEX 1 "20 June 1989" "Version 2.1" +.SH NAME +flex - fast lexical analyzer generator +.SH SYNOPSIS +.B flex +[ +.B -bdfipstvFILT -c[efmF] -Sskeleton_file +] [ +.I filename +] +.SH DESCRIPTION +.I flex +is a rewrite of +.I lex +intended to right some of that tool's deficiencies: in particular, +.I flex +generates lexical analyzers much faster, and the analyzers use +smaller tables and run faster. +.SH OPTIONS +In addition to lex's +.B -t +flag, flex has the following options: +.TP +.B -b +Generate backtracking information to +.I lex.backtrack. +This is a list of scanner states which require backtracking +and the input characters on which they do so. By adding rules one +can remove backtracking states. If all backtracking states +are eliminated and +.B -f +or +.B -F +is used, the generated scanner will run faster (see the +.B -p +flag). Only users who wish to squeeze every last cycle out of their +scanners need worry about this option. +.TP +.B -d +makes the generated scanner run in +.I debug +mode. Whenever a pattern is recognized the scanner will +write to +.I stderr +a line of the form: +.nf + + --accepting rule #n + +.fi +Rules are numbered sequentially with the first one being 1. Rule #0 +is executed when the scanner backtracks; Rule #(n+1) (where +.I n +is the number of rules) indicates the default action; Rule #(n+2) indicates +that the input buffer is empty and needs to be refilled and then the scan +restarted. Rules beyond (n+2) are end-of-file actions. +.TP +.B -f +has the same effect as lex's -f flag (do not compress the scanner +tables); the mnemonic changes from +.I fast compilation +to (take your pick) +.I full table +or +.I fast scanner. +The actual compilation takes +.I longer, +since flex is I/O bound writing out the big table. +.IP +This option is equivalent to +.B -cf +(see below). +.TP +.B -i +instructs flex to generate a +.I case-insensitive +scanner. The case of letters given in the flex input patterns will +be ignored, and the rules will be matched regardless of case. The +matched text given in +.I yytext +will have the preserved case (i.e., it will not be folded). +.TP +.B -p +generates a performance report to stderr. The report +consists of comments regarding features of the flex input file +which will cause a loss of performance in the resulting scanner. +Note that the use of +.I REJECT +and variable trailing context (see +.B BUGS) +entails a substantial performance penalty; use of +.I yymore(), +the +.B ^ +operator, +and the +.B -I +flag entail minor performance penalties. +.TP +.B -s +causes the +.I default rule +(that unmatched scanner input is echoed to +.I stdout) +to be suppressed. If the scanner encounters input that does not +match any of its rules, it aborts with an error. This option is +useful for finding holes in a scanner's rule set. +.TP +.B -v +has the same meaning as for lex (print to +.I stderr +a summary of statistics of the generated scanner). Many more statistics +are printed, though, and the summary spans several lines. Most +of the statistics are meaningless to the casual flex user, but the +first line identifies the version of flex, which is useful for figuring +out where you stand with respect to patches and new releases. +.TP +.B -F +specifies that the +.ul +fast +scanner table representation should be used. This representation is +about as fast as the full table representation +.ul +(-f), +and for some sets of patterns will be considerably smaller (and for +others, larger). In general, if the pattern set contains both "keywords" +and a catch-all, "identifier" rule, such as in the set: +.nf + + "case" return ( TOK_CASE ); + "switch" return ( TOK_SWITCH ); + ... + "default" return ( TOK_DEFAULT ); + [a-z]+ return ( TOK_ID ); + +.fi +then you're better off using the full table representation. If only +the "identifier" rule is present and you then use a hash table or some such +to detect the keywords, you're better off using +.ul +-F. +.IP +This option is equivalent to +.B -cF +(see below). +.TP +.B -I +instructs flex to generate an +.I interactive +scanner. Normally, scanners generated by flex always look ahead one +character before deciding that a rule has been matched. At the cost of +some scanning overhead, flex will generate a scanner which only looks ahead +when needed. Such scanners are called +.I interactive +because if you want to write a scanner for an interactive system such as a +command shell, you will probably want the user's input to be terminated +with a newline, and without +.B -I +the user will have to type a character in addition to the newline in order +to have the newline recognized. This leads to dreadful interactive +performance. +.IP +If all this seems to confusing, here's the general rule: if a human will +be typing in input to your scanner, use +.B -I, +otherwise don't; if you don't care about how fast your scanners run and +don't want to make any assumptions about the input to your scanner, +always use +.B -I. +.IP +Note, +.B -I +cannot be used in conjunction with +.I full +or +.I fast tables, +i.e., the +.B -f, -F, -cf, +or +.B -cF +flags. +.TP +.B -L +instructs flex to not generate +.B #line +directives (see below). +.TP +.B -T +makes flex run in +.I trace +mode. It will generate a lot of messages to stdout concerning +the form of the input and the resultant non-deterministic and deterministic +finite automatons. This option is mostly for use in maintaining flex. +.TP +.B -c[efmF] +controls the degree of table compression. +.B -ce +directs flex to construct +.I equivalence classes, +i.e., sets of characters +which have identical lexical properties (for example, if the only +appearance of digits in the flex input is in the character class +"[0-9]" then the digits '0', '1', ..., '9' will all be put +in the same equivalence class). +.B -cf +specifies that the +.I full +scanner tables should be generated - flex should not compress the +tables by taking advantages of similar transition functions for +different states. +.B -cF +specifies that the alternate fast scanner representation (described +above under the +.B -F +flag) +should be used. +.B -cm +directs flex to construct +.I meta-equivalence classes, +which are sets of equivalence classes (or characters, if equivalence +classes are not being used) that are commonly used together. +A lone +.B -c +specifies that the scanner tables should be compressed but neither +equivalence classes nor meta-equivalence classes should be used. +.IP +The options +.B -cf +or +.B -cF +and +.B -cm +do not make sense together - there is no opportunity for meta-equivalence +classes if the table is not being compressed. Otherwise the options +may be freely mixed. +.IP +The default setting is +.B -cem +which specifies that flex should generate equivalence classes +and meta-equivalence classes. This setting provides the highest +degree of table compression. You can trade off +faster-executing scanners at the cost of larger tables with +the following generally being true: +.nf + + slowest smallest + -cem + -ce + -cm + -c + -c{f,F}e + -c{f,F} + fastest largest + +.fi +Note that scanners with the smallest tables compile the quickest, so +during development you will usually want to use the default, maximal +compression. +.TP +.B -Sskeleton_file +overrides the default skeleton file from which flex constructs +its scanners. You'll never need this option unless you are doing +flex maintenance or development. +.SH INCOMPATIBILITIES WITH LEX +.I flex +is fully compatible with +.I lex +with the following exceptions: +.IP - +There is no run-time library to link with. You needn't +specify +.I -ll +when linking, and you must supply a main program. (Hacker's note: since +the lex library contains a main() which simply calls yylex(), you actually +.I can +be lazy and not supply your own main program and link with +.I -ll.) +.IP - +lex's +.B %r +(Ratfor scanners) and +.B %t +(translation table) options +are not supported. +.IP - +The do-nothing +.ul +-n +flag is not supported. +.IP - +When definitions are expanded, flex encloses them in parentheses. +With lex, the following +.nf + + NAME [A-Z][A-Z0-9]* + %% + foo{NAME}? printf( "Found it\\n" ); + %% + +.fi +will not match the string "foo" because when the macro +is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?" +and the precedence is such that the '?' is associated with +"[A-Z0-9]*". With flex, the rule will be expanded to +"foo([A-z][A-Z0-9]*)?" and so the string "foo" will match. +Note that because of this, the +.B ^, $, <s>, +and +.B / +operators cannot be used in a definition. +.IP - +The undocumented lex-scanner internal variable +.B yylineno +is not supported. +.IP - +The +.B input() +routine is not redefinable, though may be called to read characters +following whatever has been matched by a rule. If +.B input() +encounters an end-of-file the normal +.B yywrap() +processing is done. A ``real'' end-of-file is returned as +.I EOF. +.IP +Input can be controlled by redefining the +.B YY_INPUT +macro. +YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its +action is to place up to max_size characters in the character buffer "buf" +and return in the integer variable "result" either the +number of characters read or the constant YY_NULL (0 on Unix systems) +systems) to indicate EOF. The default YY_INPUT reads from the +file-pointer "yyin" (which is by default +.I stdin), +so if you +just want to change the input file, you needn't redefine +YY_INPUT - just point yyin at the input file. +.IP +A sample redefinition of YY_INPUT (in the first section of the input +file): +.nf + + %{ + #undef YY_INPUT + #define YY_INPUT(buf,result,max_size) \\ + result = (buf[0] = getchar()) == EOF ? YY_NULL : 1; + %} + +.fi +You also can add in things like counting keeping track of the +input line number this way; but don't expect your scanner to +go very fast. +.IP - +.B output() +is not supported. +Output from the ECHO macro is done to the file-pointer +"yyout" (default +.I stdout). +.IP - +If you are providing your own yywrap() routine, you must "#undef yywrap" +first. +.IP - +To refer to yytext outside of your scanner source file, use +"extern char *yytext;" rather than "extern char yytext[];". +.IP - +.B yyleng +is a macro and not a variable, and hence cannot be accessed outside +of the scanner source file. +.IP - +flex reads only one input file, while lex's input is made +up of the concatenation of its input files. +.IP - +The name +.bd +FLEX_SCANNER +is #define'd so scanners may be written for use with either +flex or lex. +.IP - +The macro +.bd +YY_USER_ACTION +can be redefined to provide an action +which is always executed prior to the matched rule's action. For example, +it could be #define'd to call a routine to convert yytext to lower-case, +or to copy yyleng to a global variable to make it accessible outside of +the scanner source file. +.IP - +In the generated scanner, rules are separated using +.bd +YY_BREAK +instead of simple "break"'s. This allows, for example, C++ users to +#define YY_BREAK to do nothing (while being very careful that every +rule ends with a "break" or a "return"!) to avoid suffering from +unreachable statement warnings where a rule's action ends with "return". +.SH ENHANCEMENTS +.IP - +.I Exclusive start-conditions +can be declared by using +.B %x +instead of +.B %s. +These start-conditions have the property that when they are active, +.I no other rules are active. +Thus a set of rules governed by the same exclusive start condition +describe a scanner which is independent of any of the other rules in +the flex input. This feature makes it easy to specify "mini-scanners" +which scan portions of the input that are syntactically different +from the rest (e.g., comments). +.IP - +.I yyterminate() +can be used in lieu of a return statement in an action. It terminates +the scanner and returns a 0 to the scanner's caller, indicating "all done". +.IP - +.I End-of-file rules. +The special rule "<<EOF>>" indicates +actions which are to be taken when an end-of-file is +encountered and yywrap() returns non-zero (i.e., indicates +no further files to process). The action can either +point yyin at a new file to process, in which case the +action should finish with +.I YY_NEW_FILE +(this is a branch, so subsequent code in the action won't +be executed), or it should finish with a +.I return +statement. <<EOF>> rules may not be used with other +patterns; they may only be qualified with a list of start +conditions. If an unqualified <<EOF>> rule is given, it +applies only to the INITIAL start condition, and +.I not +to +.B %s +start conditions. +These rules are useful for catching things like unclosed comments. +An example: +.nf + + %x quote + %% + ... + <quote><<EOF>> { + error( "unterminated quote" ); + yyterminate(); + } + <<EOF>> { + yyin = fopen( next_file, "r" ); + YY_NEW_FILE; + } + +.fi +.IP - +flex dynamically resizes its internal tables, so directives like "%a 3000" +are not needed when specifying large scanners. +.IP - +The scanning routine generated by flex is declared using the macro +.B YY_DECL. +By redefining this macro you can change the routine's name and +its calling sequence. For example, you could use: +.nf + + #undef YY_DECL + #define YY_DECL float lexscan( a, b ) float a, b; + +.fi +to give it the name +.I lexscan, +returning a float, and taking two floats as arguments. Note that +if you give arguments to the scanning routine, you must terminate +the definition with a semi-colon (;). +.IP - +flex generates +.B #line +directives mapping lines in the output to +their origin in the input file. +.IP - +You can put multiple actions on the same line, separated with +semi-colons. With lex, the following +.nf + + foo handle_foo(); return 1; + +.fi +is truncated to +.nf + + foo handle_foo(); + +.fi +flex does not truncate the action. Actions that are not enclosed in +braces are terminated at the end of the line. +.IP - +Actions can be begun with +.B %{ +and terminated with +.B %}. +In this case, flex does not count braces to figure out where the +action ends - actions are terminated by the closing +.B %}. +This feature is useful when the enclosed action has extraneous +braces in it (usually in comments or inside inactive #ifdef's) +that throw off the brace-count. +.IP - +All of the scanner actions (e.g., +.B ECHO, yywrap ...) +except the +.B unput() +and +.B input() +routines, +are written as macros, so they can be redefined if necessary +without requiring a separate library to link to. +.IP - +When +.B yywrap() +indicates that the scanner is done processing (it does this by returning +non-zero), on subsequent calls the scanner will always immediately return +a value of 0. To restart it on a new input file, the action +.B yyrestart() +is used. It takes one argument, the new input file. It closes the +previous yyin (unless stdin) and sets up the scanners internal variables +so that the next call to yylex() will start scanning the new file. This +functionality is useful for, e.g., programs which will process a file, do some +work, and then get a message to parse another file. +.IP - +Flex scans the code in section 1 (inside %{}'s) and the actions for +occurrences of +.I REJECT +and +.I yymore(). +If it doesn't see any, it assumes the features are not used and generates +higher-performance scanners. Flex tries to be correct in identifying +uses but can be fooled (for example, if a reference is made in a macro from +a #include file). If this happens (a feature is used and flex didn't +realize it) you will get a compile-time error of the form +.nf + + reject_used_but_not_detected undefined + +.fi +You can tell flex that a feature is used even if it doesn't think so +with +.B %used +followed by the name of the feature (for example, "%used REJECT"); +similarly, you can specify that a feature is +.I not +used even though it thinks it is with +.B %unused. +.IP - +Comments may be put in the first section of the input by preceding +them with '#'. +.SH FILES +.TP +.I flex.skel +skeleton scanner +.TP +.I lex.yy.c +generated scanner (called +.I lexyy.c +on some systems). +.TP +.I lex.backtrack +backtracking information for +.B -b +flag (called +.I lex.bck +on some systems). +.SH "SEE ALSO" +.LP +lex(1) +.LP +M. E. Lesk and E. Schmidt, +.I LEX - Lexical Analyzer Generator +.SH AUTHOR +Vern Paxson, with the help of many ideas and much inspiration from +Van Jacobson. Original version by Jef Poskanzer. Fast table +representation is a partial implementation of a design done by Van +Jacobson. The implementation was done by Kevin Gong and Vern Paxson. +.LP +Thanks to the many flex beta-testers and feedbackers, especially Casey +Leedom, Frederic Brehm, Nick Christopher, Chris Faylor, Eric Goldman, Eric +Hughes, Greg Lee, Craig Leres, Mohamed el Lozy, Jim Meyering, Esmond Pitt, +Jef Poskanzer, and Dave Tallman. Thanks to Keith Bostic, John Gilmore, Bob +Mulcahy, Rich Salz, and Richard Stallman for help with various distribution +headaches. +.LP +Send comments to: +.nf + + Vern Paxson + Real Time Systems + Bldg. 46A + Lawrence Berkeley Laboratory + 1 Cyclotron Rd. + Berkeley, CA 94720 + + (415) 486-6411 + + vern@csam.lbl.gov + vern@rtsg.ee.lbl.gov + ucbvax!csam.lbl.gov!vern + +.fi +I will be gone from mid-July '89 through mid-August '89. From August on, +the addresses are: +.nf + + vern@cs.cornell.edu + + Vern Paxson + CS Department + Grad Office + 4126 Upson + Cornell University + Ithaca, NY 14853-7501 + + <no phone number yet> + +.fi +Email sent to the former addresses should continue to be forwarded for +quite a while. Also, it looks like my username will be "paxson" and +not "vern". I'm planning on having a mail alias set up so "vern" will +still work, but if you encounter problems try "paxson". +.SH DIAGNOSTICS +.LP +.I flex scanner jammed - +a scanner compiled with +.B -s +has encountered an input string which wasn't matched by +any of its rules. +.LP +.I flex input buffer overflowed - +a scanner rule matched a string long enough to overflow the +scanner's internal input buffer (16K bytes - controlled by +.B YY_BUF_MAX +in "flex.skel"). +.LP +.I old-style lex command ignored - +the flex input contains a lex command (e.g., "%n 1000") which +is being ignored. +.SH BUGS +.LP +Some trailing context +patterns cannot be properly matched and generate +warning messages ("Dangerous trailing context"). These are +patterns where the ending of the +first part of the rule matches the beginning of the second +part, such as "zx*/xy*", where the 'x*' matches the 'x' at +the beginning of the trailing context. (Lex doesn't get these +patterns right either.) +If desperate, you can use +.B yyless() +to effect arbitrary trailing context. +.LP +.I variable +trailing context (where both the leading and trailing parts do not have +a fixed length) entails the same performance loss as +.I REJECT +(i.e., substantial). +.LP +For some trailing context rules, parts which are actually fixed-length are +not recognized as such, leading to the abovementioned performance loss. +In particular, parts using '|' or {n} are always considered variable-length. +.LP +Use of unput() or input() trashes the current yytext and yyleng. +.LP +Use of unput() to push back more text than was matched can +result in the pushed-back text matching a beginning-of-line ('^') +rule even though it didn't come at the beginning of the line. +.LP +yytext and yyleng cannot be modified within a flex action. +.LP +Nulls are not allowed in flex inputs or in the inputs to +scanners generated by flex. Their presence generates fatal +errors. +.LP +Flex does not generate correct #line directives for code internal +to the scanner; thus, bugs in +.I +flex.skel +yield bogus line numbers. +.LP +Pushing back definitions enclosed in ()'s can result in nasty, +difficult-to-understand problems like: +.nf + + {DIG} [0-9] /* a digit */ + +.fi +In which the pushed-back text is "([0-9] /* a digit */)". +.LP +Due to both buffering of input and read-ahead, you cannot intermix +calls to stdio routines, such as, for example, +.B getchar() +with flex rules and expect it to work. Call +.B input() +instead. +.LP +The total table entries listed by the +.B -v +flag excludes the number of table entries needed to determine +what rule has been matched. The number of entries is equal +to the number of DFA states if the scanner does not use REJECT, +and somewhat greater than the number of states if it does. +.LP +To be consistent with ANSI C, the escape sequence \\xhh should +be recognized for hexadecimal escape sequences, such as '\\x41' for 'A'. +.LP +It would be useful if flex wrote to lex.yy.c a summary of the flags used in +its generation (such as which table compression options). +.LP +The scanner run-time speeds still have not been optimized as much +as they deserve. Van Jacobson's work shows that the can go +faster still. +.LP +The utility needs more complete documentation. |
