diff options
Diffstat (limited to 'static/netbsd/man1/awk.1')
| -rw-r--r-- | static/netbsd/man1/awk.1 | 696 |
1 files changed, 696 insertions, 0 deletions
diff --git a/static/netbsd/man1/awk.1 b/static/netbsd/man1/awk.1 new file mode 100644 index 00000000..496a2a65 --- /dev/null +++ b/static/netbsd/man1/awk.1 @@ -0,0 +1,696 @@ +.de EX +.nf +.ft CW +.. +.de EE +.br +.fi +.ft 1 +.. +.de TF +.IP "" "\w'\fB\\$1\ \ \fP'u" +.PD 0 +.. +.TH AWK 1 +.CT 1 files prog_other +.SH NAME +awk \- pattern-directed scanning and processing language +.SH SYNOPSIS +.B awk +[ +.BI \-F +.I fs +| +.B \-\^\-csv +] +[ +.BI \-v +.I var=value +] +[ +.I 'prog' +| +.BI \-f +.I progfile +] +[ +.I file ... +] +.SH DESCRIPTION +.I Awk +scans each input +.I file +for lines that match any of a set of patterns specified literally in +.I prog +or in one or more files +specified as +.B \-f +.IR progfile . +With each pattern +there can be an associated action that will be performed +when a line of a +.I file +matches the pattern. +Each line is matched against the +pattern portion of every pattern-action statement; +the associated action is performed for each matched pattern. +The file name +.B \- +means the standard input. +Any +.I file +of the form +.I var=value +is treated as an assignment, not a filename, +and is executed at the time it would have been opened if it were a filename. +The option +.B \-v +followed by +.I var=value +is an assignment to be done before +.I prog +is executed; +any number of +.B \-v +options may be present. +The +.B \-F +.I fs +option defines the input field separator to be the regular expression +.IR fs . +The +.B \-\^\-csv +option causes +.I awk +to process records using (more or less) standard comma-separated values +(CSV) format. +.PP +An input line is normally made up of fields separated by white space, +or by the regular expression +.BR FS . +The fields are denoted +.BR $1 , +.BR $2 , +\&..., while +.B $0 +refers to the entire line. +If +.BR FS +is null, the input line is split into one field per character. +.PP +A pattern-action statement has the form: +.IP +.IB pattern " { " action " } +.PP +A missing +.BI { " action " } +means print the line; +a missing pattern always matches. +Pattern-action statements are separated by newlines or semicolons. +.PP +An action is a sequence of statements. +A statement can be one of the following: +.PP +.EX +.ta \w'\f(CWdelete array[expression]\fR'u +.RS +.nf +.ft CW +if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP +while(\fI expression \fP)\fI statement\fP +for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP +for(\fI var \fPin\fI array \fP)\fI statement\fP +do\fI statement \fPwhile(\fI expression \fP) +break +continue +{\fR [\fP\fI statement ... \fP\fR] \fP} +\fIexpression\fP #\fR commonly\fP\fI var = expression\fP +print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP +printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP +return\fR [ \fP\fIexpression \fP\fR]\fP +next #\fR skip remaining patterns on this input line\fP +nextfile #\fR skip rest of this file, open next, start at top\fP +delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP +delete\fI array\fP #\fR delete all elements of array\fP +exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP +.fi +.RE +.EE +.DT +.PP +Statements are terminated by +semicolons, newlines or right braces. +An empty +.I expression-list +stands for +.BR $0 . +String constants are quoted \&\f(CW"\ "\fR, +with the usual C escapes recognized within. +Expressions take on string or numeric values as appropriate, +and are built using the operators +.B + \- * / % ^ +(exponentiation), and concatenation (indicated by white space). +The operators +.B +! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?: +are also available in expressions. +Variables may be scalars, array elements +(denoted +.IB x [ i ] \fR) +or fields. +Variables are initialized to the null string. +Array subscripts may be any string, +not necessarily numeric; +this allows for a form of associative memory. +Multiple subscripts such as +.B [i,j,k] +are permitted; the constituents are concatenated, +separated by the value of +.BR SUBSEP . +.PP +The +.B print +statement prints its arguments on the standard output +(or on a file if +.BI > " file +or +.BI >> " file +is present or on a pipe if +.BI | " cmd +is present), separated by the current output field separator, +and terminated by the output record separator. +.I file +and +.I cmd +may be literal names or parenthesized expressions; +identical string values in different statements denote +the same open file. +The +.B printf +statement formats its expression list according to the +.I format +(see +.IR printf (3)). +The built-in function +.BI close( expr ) +closes the file or pipe +.IR expr . +The built-in function +.BI fflush( expr ) +flushes any buffered output for the file or pipe +.IR expr . +.PP +The mathematical functions +.BR atan2 , +.BR cos , +.BR exp , +.BR log , +.BR sin , +and +.B sqrt +are built in. +Other built-in functions: +.TF "\fBlength(\fR[\fIv\^\fR]\fB)\fR" +.TP +\fBlength(\fR[\fIv\^\fR]\fB)\fR +the length of its argument +taken as a string, +number of elements in an array for an array argument, +or length of +.B $0 +if no argument. +.TP +.B rand() +random number on [0,1). +.TP +\fBsrand(\fR[\fIs\^\fR]\fB)\fR +sets seed for +.B rand +and returns the previous seed. +.TP +.BI int( x\^ ) +truncates to an integer value. +.TP +\fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR +the +.IR n -character +substring of +.I s +that begins at position +.I m +counted from 1. +If no +.IR n , +use the rest of the string. +.TP +.BI index( s , " t" ) +the position in +.I s +where the string +.I t +occurs, or 0 if it does not. +.TP +.BI match( s , " r" ) +the position in +.I s +where the regular expression +.I r +occurs, or 0 if it does not. +The variables +.B RSTART +and +.B RLENGTH +are set to the position and length of the matched string. +.TP +\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIfs\^\fR]\fB)\fR +splits the string +.I s +into array elements +.IB a [1] \fR, +.IB a [2] \fR, +\&..., +.IB a [ n ] \fR, +and returns +.IR n . +The separation is done with the regular expression +.I fs +or with the field separator +.B FS +if +.I fs +is not given. +An empty string as field separator splits the string +into one array element per character. +.TP +\fBsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB) +substitutes +.I t +for the first occurrence of the regular expression +.I r +in the string +.IR s . +If +.I s +is not given, +.B $0 +is used. +.TP +\fBgsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB) +same as +.B sub +except that all occurrences of the regular expression +are replaced; +.B sub +and +.B gsub +return the number of replacements. +.TP +\fBgensub(\fIpat\fB, \fIrepl\fB, \fIhow\fR [\fB, \fItarget\fR]\fB)\fR +replaces instances of +.I pat +in +.I target +with +.IR repl . +If +.I how +is \fB"g"\fR or \fB"G"\fR, do so globally. Otherwise, +.I how +is a number indicating which occurrence to replace. If no +.IR target , +use +.BR $0 . +Return the resulting string; +.I target +is not modified. +.TP +.BI sprintf( fmt , " expr" , " ...\fB) +the string resulting from formatting +.I expr ... +according to the +.IR printf (3) +format +.IR fmt . +.TP +.B systime() +returns the current date and time as a standard +``seconds since the epoch'' value. +.TP +.BI strftime( fmt ", " timestamp\^ ) +formats +.I timestamp +(a value in seconds since the epoch) +according to +.IR fmt , +which is a format string as supported by +.IR strftime (3). +Both +.I timestamp +and +.I fmt +may be omitted; if no +.IR timestamp , +the current time of day is used, and if no +.IR fmt , +a default format of \fB"%a %b %e %H:%M:%S %Z %Y"\fR is used. +.TP +.BI system( cmd ) +executes +.I cmd +and returns its exit status. This will be \-1 upon error, +.IR cmd 's +exit status upon a normal exit, +256 + +.I sig +upon death-by-signal, where +.I sig +is the number of the murdering signal, +or 512 + +.I sig +if there was a core dump. +.TP +.BI tolower( str ) +returns a copy of +.I str +with all upper-case characters translated to their +corresponding lower-case equivalents. +.TP +.BI toupper( str ) +returns a copy of +.I str +with all lower-case characters translated to their +corresponding upper-case equivalents. +.PD +.PP +The ``function'' +.B getline +sets +.B $0 +to the next input record from the current input file; +.B getline +.BI < " file +sets +.B $0 +to the next record from +.IR file . +.B getline +.I x +sets variable +.I x +instead. +Finally, +.IB cmd " | getline +pipes the output of +.I cmd +into +.BR getline ; +each call of +.B getline +returns the next line of output from +.IR cmd . +In all cases, +.B getline +returns 1 for a successful input, +0 for end of file, and \-1 for an error. +.PP +The functions +.BR compl , +.BR and , +.BR or , +.BR xor , +.BR lshift , +and +.B rshift +peform the corresponding bitwise operations on their +operands, which are first truncated to integer. +.PP +Patterns are arbitrary Boolean combinations +(with +.BR "! || &&" ) +of regular expressions and +relational expressions. +Regular expressions are as in +.IR egrep ; +see +.IR grep (1). +Isolated regular expressions +in a pattern apply to the entire line. +Regular expressions may also occur in +relational expressions, using the operators +.B ~ +and +.BR !~ . +.BI / re / +is a constant regular expression; +any string (constant or variable) may be used +as a regular expression, except in the position of an isolated regular expression +in a pattern. +.PP +A pattern may consist of two patterns separated by a comma; +in this case, the action is performed for all lines +from an occurrence of the first pattern +through an occurrence of the second, inclusive. +.PP +A relational expression is one of the following: +.IP +.I expression matchop regular-expression +.br +.I expression relop expression +.br +.IB expression " in " array-name +.br +.BI ( expr ,\| expr ,\| ... ") in " array-name +.PP +where a +.I relop +is any of the six relational operators in C, +and a +.I matchop +is either +.B ~ +(matches) +or +.B !~ +(does not match). +A conditional is an arithmetic expression, +a relational expression, +or a Boolean combination +of these. +.PP +The special patterns +.B BEGIN +and +.B END +may be used to capture control before the first input line is read +and after the last. +.B BEGIN +and +.B END +do not combine with other patterns. +They may appear multiple times in a program and execute +in the order they are read by +.IR awk . +.PP +Variable names with special meanings: +.TF FILENAME +.TP +.B ARGC +argument count, assignable. +.TP +.B ARGV +argument array, assignable; +non-null members are taken as filenames. +.TP +.B CONVFMT +conversion format used when converting numbers +(default +.BR "%.6g" ). +.TP +.B ENVIRON +array of environment variables; subscripts are names. +.TP +.B FILENAME +the name of the current input file. +.TP +.B FNR +ordinal number of the current record in the current file. +.TP +.B FS +regular expression used to separate fields; also settable +by option +.BI \-F fs\fR. +.TP +.BR NF +number of fields in the current record. +.TP +.B NR +ordinal number of the current record. +.TP +.B OFMT +output format for numbers (default +.BR "%.6g" ). +.TP +.B OFS +output field separator (default space). +.TP +.B ORS +output record separator (default newline). +.TP +.B RLENGTH +the length of a string matched by +.BR match . +.TP +.B RS +input record separator (default newline). +If empty, blank lines separate records. +If more than one character long, +.B RS +is treated as a regular expression, and records are +separated by text matching the expression. +.TP +.B RSTART +the start position of a string matched by +.BR match . +.TP +.B SUBSEP +separates multiple subscripts (default 034). +.PD +.PP +Functions may be defined (at the position of a pattern-action statement) thus: +.IP +.B +function foo(a, b, c) { ... } +.PP +Parameters are passed by value if scalar and by reference if array name; +functions may be called recursively. +Parameters are local to the function; all other variables are global. +Thus local variables may be created by providing excess parameters in +the function definition. +.SH ENVIRONMENT VARIABLES +If +.B POSIXLY_CORRECT +is set in the environment, then +.I awk +follows the POSIX rules for +.B sub +and +.B gsub +with respect to consecutive backslashes and ampersands. +.SH EXAMPLES +.TP +.EX +length($0) > 72 +.EE +Print lines longer than 72 characters. +.TP +.EX +{ print $2, $1 } +.EE +Print first two fields in opposite order. +.PP +.EX +BEGIN { FS = ",[ \et]*|[ \et]+" } + { print $2, $1 } +.EE +.ns +.IP +Same, with input fields separated by comma and/or spaces and tabs. +.PP +.EX +.nf + { s += $1 } +END { print "sum is", s, " average is", s/NR } +.fi +.EE +.ns +.IP +Add up first column, print sum and average. +.TP +.EX +/start/, /stop/ +.EE +Print all lines between start/stop pairs. +.PP +.EX +.nf +BEGIN { # Simulate echo(1) + for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i] + printf "\en" + exit } +.fi +.EE +.SH SEE ALSO +.IR grep (1), +.IR lex (1), +.IR sed (1) +.br +A. V. Aho, B. W. Kernighan, P. J. Weinberger, +.IR "The AWK Programming Language, Second Edition" , +Addison-Wesley, 2024. ISBN 978-0-13-826972-2, 0-13-826972-6. +.SH BUGS +There are no explicit conversions between numbers and strings. +To force an expression to be treated as a number add 0 to it; +to force it to be treated as a string concatenate +\&\f(CW""\fP to it. +.PP +The scope rules for variables in functions are a botch; +the syntax is worse. +.PP +Input is expected to be UTF-8 encoded. Other multibyte +character sets are not handled. +However, in eight-bit locales, +.I awk +treats each input byte as a separate character. +.SH UNUSUAL FLOATING-POINT VALUES +.I Awk +was designed before IEEE 754 arithmetic defined Not-A-Number (NaN) +and Infinity values, which are supported by all modern floating-point +hardware. +.PP +Because +.I awk +uses +.IR strtod (3) +and +.IR atof (3) +to convert string values to double-precision floating-point values, +modern C libraries also convert strings starting with +.B inf +and +.B nan +into infinity and NaN values respectively. This led to strange results, +with something like this: +.PP +.EX +.nf +echo nancy | awk '{ print $1 + 0 }' +.fi +.EE +.PP +printing +.B nan +instead of zero. +.PP +.I Awk +now follows GNU AWK, and prefilters string values before attempting +to convert them to numbers, as follows: +.TP +.I "Hexadecimal values" +Hexadecimal values (allowed since C99) convert to zero, as they did +prior to C99. +.TP +.I "NaN values" +The two strings +.B +nan +and +.B \-nan +(case independent) convert to NaN. No others do. +(NaNs can have signs.) +.TP +.I "Infinity values" +The two strings +.B +inf +and +.B \-inf +(case independent) convert to positive and negative infinity, respectively. +No others do. |
