Consider the following:
echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
This example uses the sub function (which we haven't discussed yet; see String Manipulation Functions) to make a change to the input record. Here, the regexp /a+/ indicates "one or more a characters," and the replacement text is <A>.
The input contains four a characters. awk (and POSIX) regular expressions always match the leftmost, longest sequence of input characters that can match. Thus, all four a characters are replaced with <A> in this example:
$ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
-| <A>bcd
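The two halves of the rule can be seen separately with a slightly different input. The following is a minimal sketch; the input string xaabaaaa and the shell variable name are made up for illustration. The match starts at the leftmost a, takes as many a characters as it can at that position, and leaves the longer run later in the line untouched:

```shell
# Leftmost: the match begins at the first a (the second character),
# even though a longer run of a characters appears later in the input.
# Longest: at that starting position, both a characters are consumed.
leftmost=$(echo xaabaaaa | awk '{ sub(/a+/, "<A>"); print }')
echo "$leftmost"    # prints x<A>baaaa
```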
For simple match/no-match tests, this is not so important. But when doing text matching and substitutions with the match, sub, gsub, and gensub functions, it is very important.
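As a rough illustration of why the rule matters for these functions, the sketch below uses match and gsub on inputs chosen for this example (the shell variable names are hypothetical). match reports where the leftmost, longest match starts and how long it is through RSTART and RLENGTH, and gsub replaces each maximal run rather than each single character:

```shell
# match sets RSTART and RLENGTH to the position and length of the
# leftmost, longest match: the run of four a characters at position 1.
pos=$(echo aaaabcd | awk '{ if (match($0, /a+/)) print RSTART, RLENGTH }')
echo "$pos"         # prints 1 4

# gsub returns the number of substitutions made. Each run of a
# characters counts as one match, so there are two, not six.
result=$(echo aaaabcdaa | awk '{ n = gsub(/a+/, "<A>"); print n, $0 }')
echo "$result"      # prints 2 <A>bcd<A>
```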
Understanding this principle is also important for regexp-based record
and field splitting (see How Input Is Split into Records,
and also see Specifying How Fields Are Separated).
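The effect on field splitting can be sketched with a regexp field separator. The input a,,,b and the variable names below are chosen only for illustration. With a single comma as the separator, each comma splits a field; with the regexp ,+ the whole run of commas is matched as one separator, because the longest match is taken:

```shell
# A single-character separator: every comma delimits a field,
# so the record has four fields, two of them empty.
single=$(echo 'a,,,b' | awk -F',' '{ print NF }')
echo "$single"      # prints 4

# A regexp separator: the longest run of commas is one separator,
# so the record has only two fields.
run=$(echo 'a,,,b' | awk -F',+' '{ print NF }')
echo "$run"         # prints 2
```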