PCRE2GREP(1)              General Commands Manual              PCRE2GREP(1)


NAME
       pcre2grep - a grep with Perl-compatible regular expressions.


SYNOPSIS
       pcre2grep [options] [long options] [pattern] [path1 path2 ...]


DESCRIPTION

       pcre2grep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE2 regular expression
       library to support patterns that are compatible with the regular
       expressions of Perl 5. See pcre2syntax(3) for a quick-reference
       summary of pattern syntax, or pcre2pattern(3) for a full description
       of the syntax and semantics of the regular expressions that PCRE2
       supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcre2grep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a
       pattern with slashes, as is common in Perl scripts), they are
       interpreted as part of the pattern. Quotes can of course be used to
       delimit patterns on the command line because they are interpreted by
       the shell, and indeed quotes are required if a pattern contains white
       space or shell metacharacters.

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of
       -e, -f, or an argument pattern must be provided.

       If no files are specified, pcre2grep reads the standard input. The
       standard input can also be referenced by a name consisting of a
       single hyphen. For example:

         pcre2grep some-pattern file1 - file3

       By default, input files are searched line by line, so pattern
       assertions about the beginning and end of a subject string (^, $, \A,
       \Z, and \z) match at the beginning and end of each line. When a line
       matches a pattern, it is copied to the standard output, and if there
       is more than one file, the file name is output at the start of each
       line, followed by a colon. However, there are options that can change
       how pcre2grep behaves. For example, the -M option makes it possible
       to search for strings that span line boundaries. What defines a line
       boundary is controlled by the -N (--newline) option. The -h and -H
       options control whether or not file names are shown, and the -Z
       option changes the file name terminator to a zero byte.

       The amount of memory used for buffering files that are being scanned
       is controlled by parameters that can be set by the --buffer-size and
       --max-buffer-size options. The first of these sets the size of buffer
       that is obtained at the start of processing. If an input file
       contains very long lines, a larger buffer may be needed; this is
       handled by automatically extending the buffer, up to the limit
       specified by --max-buffer-size. The default values for these
       parameters can be set when pcre2grep is built; if nothing is
       specified, the defaults are set to 20KiB and 1MiB respectively. An
       error occurs if a line is too long and the buffer can no longer be
       expanded.

       The block of memory that is actually used is three times the "buffer
       size", to allow for buffering "before" and "after" lines. If the
       buffer size is too small, fewer than requested "before" and "after"
       lines may be output.

       When matching with a multiline pattern, the size of the buffer must
       be at least half of the maximum match expected or the pattern might
       fail to match.

       Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the
       greater. BUFSIZ is defined in <stdio.h>.
       When there is more than one pattern (specified by the use of -e
       and/or -f), each pattern is applied to each line in the order in
       which they are defined, except that all the -e patterns are tried
       before the -f patterns.

       By default, as soon as one pattern matches a line, no further
       patterns are considered. However, if --colour (or --color) is used to
       colour the matching substrings, or if --only-matching,
       --file-offsets, --line-offsets, or --output is used to output only
       the part of the line that matched (either shown literally, or as an
       offset), the behaviour is different. In this situation, all the
       patterns are applied to the line. If there is more than one match,
       the one that begins nearest to the start of the subject is processed;
       if there is more than one match at that position, the one with the
       longest matching substring is processed; if the matching substrings
       are equal, the first match found is processed.

       Scanning with all the patterns resumes immediately following the
       match, so that later matches on the same line can be found. Note,
       however, that an overlapping match that starts in the middle of
       another match will not be processed.

       The above behaviour was changed at release 10.41 to be more
       compatible with GNU grep.
       In earlier releases, pcre2grep did not recognize matches from later
       patterns that were earlier in the subject.

       Patterns that can match an empty string are accepted, but empty
       string matches are never recognized. An example is the pattern
       "(super)?(man)?", in which all components are optional. This pattern
       finds all occurrences of both "super" and "man"; the output differs
       from matching with "super|man" when only the matching substrings are
       being shown.

       If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
       the value to set a locale when calling the PCRE2 library. The
       --locale option can be used to override this.


SUPPORT FOR COMPRESSED FILES

       Compile-time options for pcre2grep can set it up to use libz or
       libbz2 for reading compressed files whose names end in .gz or .bz2,
       respectively. You can find out whether your pcre2grep binary has
       support for one or both of these file types by running it with the
       --help option. If the appropriate support is not present, all files
       are treated as plain text. The standard input is always so treated.
       If a file with a .gz or .bz2 extension is not in fact compressed, it
       is read as a plain text file. When input is from a compressed .gz or
       .bz2 file, the --line-buffered option is ignored.
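
       The extension-based dispatch described above can be illustrated with
       a short Python sketch. This is not pcre2grep's own code: the function
       name read_lines is hypothetical, and Python's gzip and bz2 modules
       stand in for libz and libbz2. It only shows the fallback rule, namely
       that a file named .gz or .bz2 that is not really compressed is read
       as plain text.

```python
import bz2
import gzip


def read_lines(path):
    """Return the lines of `path` as bytes (hypothetical helper).

    Files whose names end in .gz or .bz2 are decompressed; if such a
    file turns out not to be compressed, it is read as plain text.
    """
    opener = None
    if path.endswith(".gz"):
        opener = gzip.open
    elif path.endswith(".bz2"):
        opener = bz2.open

    if opener is not None:
        try:
            with opener(path, "rb") as fh:
                return fh.read().splitlines()
        except OSError:
            # Not a valid compressed stream: fall through to plain text.
            pass

    with open(path, "rb") as fh:
        return fh.read().splitlines()
```

       A caller would pass each path through read_lines before matching, so
       the matching code never needs to know how the file was stored.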


BINARY FILES

       By default, a file that contains a binary zero byte within the first
       1024 bytes is identified as a binary file, and is processed
       specially. However, if the newline type is specified as NUL, that is,
       the line terminator is a binary zero, the test for a binary file is
       not applied. See the --binary-files option for a means of changing
       the way binary files are handled.


BINARY ZEROS IN PATTERNS

       Patterns passed from the command line are strings that are terminated
       by a binary zero, so cannot contain internal zeros. However, patterns
       that are read from a file via the -f option may contain binary zeros.


OPTIONS

       The order in which some of the options appear can affect the output.
       For example, both the -H and -l options affect the printing of file
       names. Whichever comes later in the command line will be the one that
       takes effect. Similarly, except where noted below, if an option is
       given twice, the later setting is used. Numerical values for options
       may be followed by K or M, to signify multiplication by 1024 or
       1024*1024 respectively.

       --     This terminates the list of options.
              It is useful if the next item on the command line starts with
              a hyphen but is not an option. This allows for the processing
              of patterns and file names that start with hyphens.

       -A number, --after-context=number
              Output up to number lines of context after each matching line.
              Fewer lines are output if the next match or the end of the
              file is reached, or if the processing buffer size has been set
              too small. If file names and/or line numbers are being output,
              a hyphen separator is used instead of a colon for the context
              lines (the -Z option can be used to change the file name
              terminator to a zero byte). A line containing "--" is output
              between each group of lines, unless they are in fact
              contiguous in the input file. The value of number is expected
              to be relatively small. When -c is used, -A is ignored.

       -a, --text
              Treat binary files as text. This is equivalent to
              --binary-files=text.

       --allow-lookaround-bsk
              PCRE2 now forbids the use of \K in lookarounds by default, in
              line with Perl. This option causes pcre2grep to set the
              PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option, which enables this
              somewhat dangerous usage.

       -B number, --before-context=number
              Output up to number lines of context before each matching
              line. Fewer lines are output if the previous match or the
              start of the file is within number lines, or if the processing
              buffer size has been set too small. If file names and/or line
              numbers are being output, a hyphen separator is used instead
              of a colon for the context lines (the -Z option can be used to
              change the file name terminator to a zero byte). A line
              containing "--" is output between each group of lines, unless
              they are in fact contiguous in the input file. The value of
              number is expected to be relatively small. When -c is used, -B
              is ignored.

       --binary-files=word
              Specify how binary files are to be processed. If the word is
              "binary" (the default), pattern matching is performed on
              binary files, but the only output is "Binary file <name>
              matches" when a match succeeds. If the word is "text", which
              is equivalent to the -a or --text option, binary files are
              processed in the same way as any other file. In this case,
              when a match succeeds, the output may be binary garbage, which
              can have nasty effects if sent to a terminal.
              If the word is "without-match", which is equivalent to the -I
              option, binary files are not processed at all; they are
              assumed not to be of interest and are skipped without causing
              any output or affecting the return code.

       --buffer-size=number
              Set the parameter that controls how much memory is obtained at
              the start of processing for buffering files that are being
              scanned. See also --max-buffer-size below.

       -C number, --context=number
              Output number lines of context both before and after each
              matching line. This is equivalent to setting both -A and -B to
              the same value.

       -c, --count
              Do not output lines from the files that are being scanned;
              instead output the number of lines that would have been shown,
              either because they matched, or, if -v is set, because they
              failed to match. By default, this count is exactly the same as
              the number of lines that would have been output, but if the -M
              (multiline) option is used (without -v), there may be more
              suppressed lines than the count (that is, the number of
              matches).

              If no lines are selected, the number zero is output.
              If several files are being scanned, a count is output for each
              of them and the -t option can be used to cause a total to be
              output at the end. However, if the --files-with-matches option
              is also used, only those files whose counts are greater than
              zero are listed. When -c is used, the -A, -B, and -C options
              are ignored.

       --colour, --color
              If this option is given without any data, it is equivalent to
              "--colour=auto". If data is required, it must be given in the
              same shell item, separated by an equals sign.

       --colour=value, --color=value
              This option specifies under what circumstances the parts of a
              line that matched a pattern should be coloured in the output.
              It is ignored if --file-offsets, --line-offsets, or --output
              is set. By default, output is not coloured. The value for the
              --colour option (which is optional, see above) may be "never",
              "always", or "auto". In the latter case, colouring happens
              only if the standard output is connected to a terminal. More
              resources are used when colouring is enabled, because
              pcre2grep has to search for all possible matches in a line,
              not just one, in order to colour them all.

              The colour that is used can be specified by setting one of the
              environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR,
              PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
              order. If none of these are set, pcre2grep looks for
              GREP_COLORS or GREP_COLOR (in that order). The value of the
              variable should be a string of two numbers, separated by a
              semicolon, except in the case of GREP_COLORS, which must start
              with "ms=" or "mt=" followed by two semicolon-separated
              colours, terminated by the end of the string or by a colon. If
              GREP_COLORS does not start with "ms=" or "mt=" it is ignored,
              and GREP_COLOR is checked.

              If the string obtained from one of the above variables
              contains any characters other than semicolon or digits, the
              setting is ignored and the default colour is used. The string
              is copied directly into the control string for setting colour
              on a terminal, so it is your responsibility to ensure that the
              values make sense. If no relevant environment variable is set,
              the default is "1;31", which gives red.

       -D action, --devices=action
              If an input path is not a regular file or a directory,
              "action" specifies how it is to be processed. Valid values are
              "read" (the default) or "skip" (silently skip the path).
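
       The environment variable lookup described under --colour above can be
       sketched in Python as follows. This is an illustration of the rules
       as stated in this page, not pcre2grep's source; the function name
       resolve_colour and the use of a plain dictionary for the environment
       are assumptions made for the example.

```python
DEFAULT_COLOUR = "1;31"  # documented default, which gives red

# Checked in this order, as documented.
PRIMARY_VARS = (
    "PCRE2GREP_COLOUR",
    "PCRE2GREP_COLOR",
    "PCREGREP_COLOUR",
    "PCREGREP_COLOR",
)


def resolve_colour(env):
    """Return the colour string selected by `env` (hypothetical helper)."""
    value = None
    for name in PRIMARY_VARS:
        if name in env:
            value = env[name]
            break

    if value is None:
        grep_colors = env.get("GREP_COLORS")
        if grep_colors is not None and grep_colors.startswith(("ms=", "mt=")):
            # The colour runs up to the end of the string or the first colon.
            value = grep_colors[3:].split(":", 1)[0]
        else:
            # GREP_COLORS is absent or ignored, so GREP_COLOR is checked.
            value = env.get("GREP_COLOR")

    # Anything other than digits and semicolons invalidates the setting.
    if value is None or any(ch not in "0123456789;" for ch in value):
        return DEFAULT_COLOUR
    return value
```

       For instance, resolve_colour(dict(os.environ)) would give the string
       to place in the terminal control sequence for the current process.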

       -d action, --directories=action
              If an input path is a directory, "action" specifies how it is
              to be processed. Valid values are "read" (the default in
              non-Windows environments, for compatibility with GNU grep),
              "recurse" (equivalent to the -r option), or "skip" (silently
              skip the path, the default in Windows environments). In the
              "read" case, directories are read as if they were ordinary
              files. In some operating systems the effect of reading a
              directory like this is an immediate end-of-file; in others it
              may provoke an error.

       --depth-limit=number
              See --match-limit below.

       -E, --case-restrict
              When case distinctions are being ignored in Unicode mode, two
              ASCII letters (K and S) will by default match Unicode
              characters U+212A (Kelvin sign) and U+017F (long S)
              respectively, as well as their lower case ASCII counterparts.
              When this option is set, case equivalences are restricted such
              that no ASCII character matches a non-ASCII character, and
              vice versa.

       -e pattern, --regex=pattern, --regexp=pattern
              Specify a pattern to be matched. This option can be used
              multiple times in order to specify several patterns.
              It can also be used as a way of specifying a single pattern
              that starts with a hyphen. When -e is used, no argument
              pattern is taken from the command line; all arguments are
              treated as file names. There is no limit to the number of
              patterns. They are applied to each line in the order in which
              they are defined.

              If -f is used with -e, the command line patterns are matched
              first, followed by the patterns from the file(s), independent
              of the order in which these options are specified.

       --exclude=pattern
              Files (but not directories) whose names match the pattern are
              skipped without being processed. This applies to all files,
              whether listed on the command line, obtained from --file-list,
              or by scanning a directory. The pattern is a PCRE2 regular
              expression, and is matched against the final component of the
              file name, not the entire path. The -F, -w, and -x options do
              not apply to this pattern. The option may be given any number
              of times in order to specify multiple patterns. If a file name
              matches both an --include and an --exclude pattern, it is
              excluded. There is no short form for this option.

       --exclude-from=filename
              Treat each non-empty line of the file as the data for an
              --exclude option.
              What constitutes a newline when reading the file is the
              operating system's default. The --newline option has no effect
              on this option. This option may be given more than once in
              order to specify a number of files to read.

       --exclude-dir=pattern
              Directories whose names match the pattern are skipped without
              being processed, whatever the setting of the --recursive
              option. This applies to all directories, whether listed on the
              command line, obtained from --file-list, or by scanning a
              parent directory. The pattern is a PCRE2 regular expression,
              and is matched against the final component of the directory
              name, not the entire path. The -F, -w, and -x options do not
              apply to this pattern. The option may be given any number of
              times in order to specify more than one pattern. If a
              directory matches both --include-dir and --exclude-dir, it is
              excluded. There is no short form for this option.

       -F, --fixed-strings
              Interpret each data-matching pattern as a list of fixed
              strings, separated by newlines, instead of as a regular
              expression. What constitutes a newline for this purpose is
              controlled by the --newline option. The -w (match as a word)
              and -x (match whole line) options can be used with -F. They
              apply to each of the fixed strings.
              A line is selected if any of the fixed strings are found in it
              (subject to -w or -x, if present). This option applies only to
              the patterns that are matched against the contents of files;
              it does not apply to patterns specified by any of the
              --include or --exclude options.

       -f filename, --file=filename
              Read patterns from the file, one per line. As is the case with
              patterns on the command line, no delimiters should be used.
              What constitutes a newline when reading the file is the
              operating system's default interpretation of \n. The --newline
              option has no effect on this option. Trailing white space is
              removed from each line, and blank lines are ignored. An empty
              file contains no patterns and therefore matches nothing.
              Patterns read from a file in this way may contain binary
              zeros, which are treated as ordinary data characters.

              If this option is given more than once, all the specified
              files are read. A data line is output if any of the patterns
              match it. A file name can be given as "-" to refer to the
              standard input. When -f is used, patterns specified on the
              command line using -e may also be present; they are matched
              before the file's patterns. However, no pattern is taken from
              the command line; all arguments are treated as the names of
              paths to be searched.
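
       The way a pattern file is turned into a pattern list, and how that
       list is ordered relative to -e patterns, can be sketched in Python.
       This is only a model of the rules stated above; the function names
       load_patterns and ordered_patterns are hypothetical, and the file
       contents are passed in as strings rather than read from disk.

```python
def load_patterns(text):
    """Split pattern-file text into patterns (hypothetical helper).

    Trailing white space is removed from each line and blank lines are
    ignored. Leading white space is kept, because it is part of the
    pattern.
    """
    patterns = []
    for line in text.split("\n"):
        line = line.rstrip()
        if line:
            patterns.append(line)
    return patterns


def ordered_patterns(e_patterns, file_texts):
    """Return all patterns in the order in which they are tried.

    Patterns given with -e come first, followed by the patterns from
    each -f file, whatever the order of the options themselves.
    """
    patterns = list(e_patterns)
    for text in file_texts:
        patterns.extend(load_patterns(text))
    return patterns
```

       An empty pattern file contributes nothing to the list, which matches
       the statement that such a file matches nothing.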

       --file-list=filename
              Read a list of files and/or directories that are to be scanned
              from the given file, one per line. What constitutes a newline
              when reading the file is the operating system's default.
              Trailing white space is removed from each line, and blank
              lines are ignored. These paths are processed before any that
              are listed on the command line. The file name can be given as
              "-" to refer to the standard input. If --file and --file-list
              are both specified as "-", patterns are read first. This is
              useful only when the standard input is a terminal, from which
              further lines (the list of files) can be read after an
              end-of-file indication. If this option is given more than
              once, all the specified files are read.

       --file-offsets
              Instead of showing lines or parts of lines that match, show
              each match as an offset from the start of the file and a
              length, separated by a comma. In this mode, --colour has no
              effect, and no context is shown. That is, the -A, -B, and -C
              options are ignored. If there is more than one match in a
              line, each of them is shown separately. This option is
              mutually exclusive with --output, --line-offsets, and
              --only-matching.
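
       The "offset,length" form produced by --file-offsets can be modelled
       with a few lines of Python. This is a rough sketch only: Python's re
       module stands in for PCRE2 and does not accept the full PCRE2 syntax,
       the function name file_offsets is hypothetical, and lines are assumed
       to end in a single linefeed.

```python
import re


def file_offsets(data, pattern):
    """Return "offset,length" strings for each match (hypothetical helper).

    `data` is the file content as bytes and `pattern` is a bytes regular
    expression. Matching is done line by line, and empty matches are
    skipped, as they are never recognized.
    """
    regex = re.compile(pattern)
    results = []
    line_start = 0  # offset of the current line from the start of the file
    for line in data.split(b"\n"):
        for match in regex.finditer(line):
            length = match.end() - match.start()
            if length > 0:
                results.append(f"{line_start + match.start()},{length}")
        line_start += len(line) + 1  # +1 for the linefeed
    return results
```

       Each returned string corresponds to one output line, so two matches
       on the same input line give two separate entries.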

       --group-separator=text
              Output this text string instead of two hyphens between groups
              of lines when -A, -B, or -C is in use. See also
              --no-group-separator.

       -H, --with-filename
              Force the inclusion of the file name at the start of output
              lines when searching a single file. The file name is not
              normally shown in this case. By default, for matching lines,
              the file name is followed by a colon; for context lines, a
              hyphen separator is used. The -Z option can be used to change
              the terminator to a zero byte. If a line number is also being
              output, it follows the file name. When the -M option causes a
              pattern to match more than one line, only the first is
              preceded by the file name. This option overrides any previous
              -h, -l, or -L options.

       -h, --no-filename
              Suppress the output file names when searching multiple files.
              File names are normally shown when multiple files are
              searched. By default, for matching lines, the file name is
              followed by a colon; for context lines, a hyphen separator is
              used. The -Z option can be used to change the terminator to a
              zero byte. If a line number is also being output, it follows
              the file name. This option overrides any previous -H, -L, or
              -l options.

       --heap-limit=number
              See --match-limit below.

       --help Output a help message, giving brief details of the command
              options and file type support, and then exit. Anything else on
              the command line is ignored.

       -I     Ignore binary files. This is equivalent to
              --binary-files=without-match.

       -i, --ignore-case
              Ignore upper/lower case distinctions when pattern matching.
              This applies when matching path names for inclusion or
              exclusion as well as when matching lines in files.

       --include=pattern
              If any --include patterns are specified, the only files that
              are processed are those whose names match one of the patterns
              and do not match an --exclude pattern. This option does not
              affect directories, but it applies to all files, whether listed
              on the command line, obtained from --file-list, or by scanning
              a directory. The pattern is a PCRE2 regular expression, and is
              matched against the final component of the file name, not the
              entire path. The -F, -w, and -x options do not apply to this
              pattern. The option may be given any number of times. If a file
              name matches both an --include and an --exclude pattern, it is
              excluded. There is no short form for this option.
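
              The selection rule for --include and --exclude can be sketched
              as follows. This is a Python model using the re module, not
              PCRE2, and file_is_selected is a hypothetical name. It assumes
              that patterns are unanchored searches against the final path
              component, and it applies the case-insensitivity that -i
              brings to path matching.

```python
import os
import re

def file_is_selected(path, includes=(), excludes=(), ignore_case=False):
    """Decide whether a file would be processed under the given patterns."""
    flags = re.IGNORECASE if ignore_case else 0
    # Only the final component of the path is matched, not the whole path.
    name = os.path.basename(path)
    # A name matching any exclude pattern is always excluded.
    if any(re.search(p, name, flags) for p in excludes):
        return False
    # If include patterns exist, the name must match at least one of them.
    if includes:
        return any(re.search(p, name, flags) for p in includes)
    return True

print(file_is_selected("src/main.c", includes=[r"\.c$"]))       # True
print(file_is_selected("src/test_main.c", includes=[r"\.c$"],
                       excludes=[r"^test_"]))                   # False
```

              Checking the exclude patterns first reflects the rule that a
              name matching both kinds of pattern is excluded.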

       --include-from=filename
              Treat each non-empty line of the file as the data for an
              --include option. What constitutes a newline for this purpose
              is the operating system's default. The --newline option has no
              effect on this option. This option may be given any number of
              times; all the files are read.

       --include-dir=pattern
              If any --include-dir patterns are specified, the only
              directories that are processed are those whose names match one
              of the patterns and do not match an --exclude-dir pattern. This
              applies to all directories, whether listed on the command line,
              obtained from --file-list, or by scanning a parent directory.
              The pattern is a PCRE2 regular expression, and is matched
              against the final component of the directory name, not the
              entire path. The -F, -w, and -x options do not apply to this
              pattern. The option may be given any number of times. If a
              directory matches both --include-dir and --exclude-dir, it is
              excluded. There is no short form for this option.

       -L, --files-without-match
              Instead of outputting lines from the files, just output the
              names of the files that do not contain any lines that would
              have been output.
              Each file name is output once, on a separate line by default,
              but if the -Z option is set, they are separated by zero bytes
              instead of newlines. This option overrides any previous -H, -h,
              or -l options.

       -l, --files-with-matches
              Instead of outputting lines from the files, just output the
              names of the files containing lines that would have been
              output. Each file name is output once, on a separate line, but
              if the -Z option is set, they are separated by zero bytes
              instead of newlines. Searching normally stops as soon as a
              matching line is found in a file. However, if the -c (count)
              option is also used, matching continues in order to obtain the
              correct count, and those files that have at least one match are
              listed along with their counts. Using this option with -c is a
              way of suppressing the listing of files with no matches that
              occurs with -c on its own. This option overrides any previous
              -H, -h, or -L options.

       --label=name
              This option supplies a name to be used for the standard input
              when file names are being output. If not supplied, "(standard
              input)" is used. There is no short form for this option.

       --line-buffered
              When this option is given, non-compressed input is read and
              processed line by line, and the output is flushed after each
              write. By default, input is read in large chunks, unless
              pcre2grep can determine that it is reading from a terminal,
              which is currently possible only in Unix-like environments or
              Windows. Output to a terminal is normally automatically flushed
              by the operating system. This option can be useful when the
              input or output is attached to a pipe and you do not want
              pcre2grep to buffer up large amounts of data. However, its use
              will affect performance, and the -M (multiline) option ceases
              to work. When input is from a compressed .gz or .bz2 file,
              --line-buffered is ignored.

       --line-offsets
              Instead of showing lines or parts of lines that match, show
              each match as a line number, the offset from the start of the
              line, and a length. The line number is terminated by a colon
              (as usual; see the -n option), and the offset and length are
              separated by a comma. In this mode, --colour has no effect, and
              no context is shown. That is, the -A, -B, and -C options are
              ignored. If there is more than one match in a line, each of
              them is shown separately. This option is mutually exclusive
              with --output, --file-offsets, and --only-matching.

       --locale=locale-name
              This option specifies a locale to be used for pattern matching.
              It overrides the value in the LC_ALL or LC_CTYPE environment
              variables. If no locale is specified, the PCRE2 library's
              default (usually the "C" locale) is used. There is no short
              form for this option.

       -M, --multiline
              Allow patterns to match more than one line. When this option is
              set, the PCRE2 library is called in "multiline" mode, and a
              match is allowed to continue past the end of the initial line
              and onto one or more subsequent lines.

              Patterns used with -M may usefully contain literal newline
              characters and internal occurrences of ^ and $ characters,
              because in multiline mode these can match at internal newlines.
              Because pcre2grep is scanning multiple lines, the \Z and \z
              assertions match only at the end of the last line in the file.
              The \A assertion matches at the start of the first line of a
              match. This can be any line in the file; it is not anchored to
              the first line.

              The output for a successful match may consist of more than one
              line. The first line is the line in which the match started,
              and the last line is the line in which the match ended.
              If the matched string ends with a newline sequence, the output
              ends at the end of that line. If -v is set, none of the lines
              in a multi-line match are output. Once a match has been
              handled, scanning restarts at the beginning of the line after
              the one in which the match ended.

              The newline sequence that separates multiple lines must be
              matched as part of the pattern. For example, to find the phrase
              "regular expression" in a file where "regular" might be at the
              end of a line and "expression" at the start of the next line,
              you could use this command:

                pcre2grep -M 'regular\s+expression' <file>

              The \s escape sequence matches any white space character,
              including newlines, and is followed by + so as to match
              trailing white space on the first line as well as possibly
              handling a two-character newline sequence.

              There is a limit to the number of lines that can be matched,
              imposed by the way that pcre2grep buffers the input file as it
              scans it. With a sufficiently large processing buffer, this
              should not be a problem.

              The -M option does not work when input is read line by line
              (see --line-buffered).
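
              To show which lines a multiline match causes to be output, here
              is a Python model built on the re module. The name
              multiline_match_lines is hypothetical; the sketch assumes LF
              line endings, handles only the first match, and ignores the
              buffering limits mentioned above.

```python
import re

def multiline_match_lines(text, pattern):
    """Return the whole lines spanned by the first match, or None."""
    m = re.search(pattern, text, re.MULTILINE)
    if m is None:
        return None
    # Extend back to the start of the line in which the match started.
    start = text.rfind("\n", 0, m.start()) + 1
    if m.group().endswith("\n"):
        # A match ending with a newline ends the output at that line.
        end = m.end() - 1
    else:
        # Otherwise extend to the end of the line in which the match ended.
        end = text.find("\n", m.end())
        if end == -1:
            end = len(text)
    return text[start:end].split("\n")

text = "This is a regular  \nexpression that spans\ntwo lines.\n"
print(multiline_match_lines(text, r"regular\s+expression"))
# ['This is a regular  ', 'expression that spans']
```

              The \s+ in the pattern absorbs both the trailing spaces and the
              newline, so the two lines containing the phrase are returned
              together.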

       -m number, --max-count=number
              Stop processing after finding number matching lines, or
              non-matching lines if -v is also set. Any trailing context
              lines are output after the final match. In multiline mode, each
              multiline match counts as just one line for this purpose. If
              this limit is reached when reading the standard input from a
              regular file, the file is left positioned just after the last
              matching line. If -c is also set, the count that is output is
              never greater than number. This option has no effect if used
              with -L, -l, or -q, or when just checking for a match in a
              binary file.

       --match-limit=number
              Processing some regular expression patterns may take a very
              long time to search for all possible matching strings. Others
              may require a very large amount of memory. There are three
              options that set resource limits for matching.

              The --match-limit option provides a means of limiting computing
              resource usage when processing patterns that are not going to
              match, but which have a very large number of possibilities in
              their search trees. The classic example is a pattern that uses
              nested unlimited repeats. Internally, PCRE2 has a counter that
              is incremented each time around its main processing loop.
              If the value set by --match-limit is reached, an error occurs.

              The --heap-limit option specifies, as a number of kibibytes
              (units of 1024 bytes), the maximum amount of heap memory that
              may be used for matching.

              The --depth-limit option limits the depth of nested
              backtracking points, which indirectly limits the amount of
              memory that is used. The amount of memory needed for each
              backtracking point depends on the number of capturing
              parentheses in the pattern, so the amount of memory that is
              used before this limit acts varies from pattern to pattern.
              This limit is of use only if it is set smaller than
              --match-limit.

              There are no short forms for these options. The default limits
              can be set when the PCRE2 library is compiled; if they are not
              specified, the defaults are very large and so effectively
              unlimited.

       --max-buffer-size=number
              This limits the expansion of the processing buffer, whose
              initial size can be set by --buffer-size. The maximum buffer
              size is silently forced to be no smaller than the starting
              buffer size.

       -N newline-type, --newline=newline-type
              Six different conventions for indicating the ends of lines in
              scanned files are supported.
              For example:

                pcre2grep -N CRLF 'some pattern' <file>

              The newline type may be specified in upper, lower, or mixed
              case. If the newline type is NUL, lines are separated by binary
              zero characters. The other types are the single-character
              sequences CR (carriage return) and LF (linefeed), the
              two-character sequence CRLF, an "anycrlf" type, which
              recognizes any of the preceding three types, and an "any" type,
              for which any Unicode line ending sequence is assumed to end a
              line. The Unicode sequences are the three just mentioned, plus
              VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next
              line, U+0085), LS (line separator, U+2028), and PS (paragraph
              separator, U+2029).

              When the PCRE2 library is built, a default line-ending sequence
              is specified. This is normally the standard sequence for the
              operating system. Unless otherwise specified by this option,
              pcre2grep uses the library's default.

              This option makes it possible to use pcre2grep to scan files
              that have come from other environments without having to modify
              their line endings. If the data that is being scanned does not
              agree with the convention set by this option, pcre2grep may
              behave in strange ways.
              Note that this option does not apply to files specified by the
              -f, --exclude-from, or --include-from options, which are
              expected to use the operating system's standard newline
              sequence.

       -n, --line-number
              Precede each output line by its line number in the file,
              followed by a colon for matching lines or a hyphen for context
              lines. If the file name is also being output, it precedes the
              line number. When the -M option causes a pattern to match more
              than one line, only the first is preceded by its line number.
              This option is forced if --line-offsets is used.

       --no-group-separator
              Do not output a separator between groups of lines when -A, -B,
              or -C is in use. The default is to output a line containing two
              hyphens. See also --group-separator.

       --no-jit If the PCRE2 library is built with support for just-in-time
              compiling (which speeds up matching), pcre2grep automatically
              makes use of this, unless it was explicitly disabled at build
              time. This option can be used to disable the use of JIT at run
              time. It is provided for testing and working around problems.
              It should never be needed in normal use.

       -O text, --output=text
              When there is a match, instead of outputting the line that
              matched, output just the text specified in this option,
              followed by an operating-system standard newline. In this mode,
              --colour has no effect, and no context is shown. That is, the
              -A, -B, and -C options are ignored. The --newline option has no
              effect on this option, which is mutually exclusive with
              --only-matching, --file-offsets, and --line-offsets. However,
              like --only-matching, if there is more than one match in a
              line, each of them causes a line of output.

              Escape sequences starting with a dollar character may be used
              to insert the contents of the matched part of the line and/or
              captured substrings into the text.

              $<digits> or ${<digits>} is replaced by the captured substring
              of the given decimal number; zero substitutes the whole match.
              If the number is greater than the number of capturing
              substrings, or if the capture is unset, the replacement is
              empty.

              $a is replaced by bell; $b by backspace; $e by escape; $f by
              form feed; $n by newline; $r by carriage return; $t by tab; $v
              by vertical tab.

              $o<digits> or $o{<digits>} is replaced by the character whose
              code point is the given octal number.
              In the first form, up to three octal digits are processed. When
              more digits are needed in Unicode mode to specify a wide
              character, the second form must be used.

              $x<digits> or $x{<digits>} is replaced by the character
              represented by the given hexadecimal number. In the first form,
              up to two hexadecimal digits are processed. When more digits
              are needed in Unicode mode to specify a wide character, the
              second form must be used.

              Any other character is substituted by itself. In particular, $$
              is replaced by a single dollar.

       -o, --only-matching
              Show only the part of the line that matched a pattern instead
              of the whole line. In this mode, no context is shown. That is,
              the -A, -B, and -C options are ignored. If there is more than
              one match in a line, each of them is shown separately, on a
              separate line of output. If -o is combined with -v (invert the
              sense of the match to find non-matching lines), no output is
              generated, but the return code is set appropriately. If the
              matched portion of the line is empty, nothing is output unless
              the file name or line number is being printed, in which case
              they are shown on an otherwise empty line. This option is
              mutually exclusive with --output, --file-offsets, and
              --line-offsets.
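
              The dollar escape sequences accepted by --output (see -O
              above) can be modelled in Python as shown below. This is only
              a sketch: expand_output is a hypothetical name, the matching
              uses Python's re module instead of PCRE2, $<digits> is assumed
              to consume all consecutive decimal digits, and a lone trailing
              dollar is left unchanged, which may differ from pcre2grep.

```python
import re

# Single-letter escapes listed for --output.
SIMPLE_ESCAPES = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
                  "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

TOKEN = re.compile(
    r"\$(?:"
    r"\{(?P<brace>\d+)\}"                # ${<digits>}
    r"|(?P<num>\d+)"                     # $<digits> (assumed: all digits)
    r"|o\{(?P<obrace>[0-7]+)\}"          # $o{<digits>}
    r"|o(?P<oct>[0-7]{1,3})"             # $o<digits>, up to three digits
    r"|x\{(?P<xbrace>[0-9A-Fa-f]+)\}"    # $x{<digits>}
    r"|x(?P<hex>[0-9A-Fa-f]{1,2})"       # $x<digits>, up to two digits
    r"|(?P<other>.)"                     # any other character
    r")",
    re.DOTALL,
)

def expand_output(template, match):
    """Expand --output style dollar escapes against a Python match object."""
    def replace(tok):
        number = tok.group("brace") or tok.group("num")
        if number is not None:
            n = int(number)
            # Out-of-range or unset captures expand to the empty string.
            if n > match.re.groups:
                return ""
            return match.group(n) or ""
        octal = tok.group("obrace") or tok.group("oct")
        if octal is not None:
            return chr(int(octal, 8))
        hexa = tok.group("xbrace") or tok.group("hex")
        if hexa is not None:
            return chr(int(hexa, 16))
        # Named escapes map to control characters; anything else is itself.
        other = tok.group("other")
        return SIMPLE_ESCAPES.get(other, other)
    return TOKEN.sub(replace, template)

m = re.search(r"(\w+)@(\w+)", "mail bob@example now")
print(expand_output("user=$1 host=${2} cost=$$5 $x41$o{101}", m))
# user=bob host=example cost=$5 AA
```

              Here $x41 and $o{101} both denote code point 65, the letter A,
              and $$ yields a literal dollar sign.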

       -onumber, --only-matching=number
              Show only the part of the line that matched the capturing
              parentheses of the given number. Up to 50 capturing parentheses
              are supported by default. This limit can be changed via the
              --om-capture option. A pattern may contain any number of
              capturing parentheses, but only those whose number is within
              the limit can be accessed by -o. An error occurs if the number
              specified by -o is greater than the limit.

              -o0 is the same as -o without a number. Because these options
              can be given without an argument (see above), if an argument is
              present, it must be given in the same shell item, for example,
              -o3 or --only-matching=2. The comments given for the
              non-argument case above also apply to this option. If the
              specified capturing parentheses do not exist in the pattern, or
              were not set in the match, nothing is output unless the file
              name or line number is being output.

              If this option is given multiple times, multiple substrings are
              output for each match, in the order the options are given, and
              all on one line. For example, -o3 -o1 -o3 causes the substrings
              matched by capturing parentheses 3 and 1 and then 3 again to be
              output. By default, there is no separator (but see the next but
              one option).
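
              The effect of repeating -o with different numbers can be
              modelled in Python as follows. The function only_matching is a
              hypothetical name, matching uses the re module rather than
              PCRE2, and the sketch assumes that empty or unset substrings
              are skipped and take no separator.

```python
import re

def only_matching(line, pattern, groups, separator=""):
    """Model of repeated -o<number> options: one output string per match."""
    outputs = []
    for m in re.finditer(pattern, line):
        parts = []
        for n in groups:
            # A group that does not exist or was not set contributes nothing.
            value = m.group(n) if n <= m.re.groups else None
            if value:
                parts.append(value)
        # Substrings appear in the order requested, all on one line.
        outputs.append(separator.join(parts))
    return outputs

date_pattern = r"(\d+)-(\d+)-(\d+)"
print(only_matching("2024-05-17", date_pattern, [3, 1, 3]))       # ['17202417']
print(only_matching("2024-05-17", date_pattern, [3, 1, 3], "/"))  # ['17/2024/17']
```

              The first call corresponds to -o3 -o1 -o3 with the default
              empty separator; the second shows the same request with a
              separating string supplied.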

       --om-capture=number
              Set the number of capturing parentheses that can be accessed by
              -o. The default is 50.

       --om-separator=text
              Specify a separating string for multiple occurrences of -o. The
              default is an empty string. Separating strings are never
              coloured.

       -P, --no-ucp
              Starting from release 10.43, when UTF/Unicode mode is specified
              with -u or -U, the PCRE2_UCP option is used by default. This
              means that the POSIX classes in patterns match more than just
              ASCII characters. For example, [:digit:] matches any Unicode
              decimal digit. The --no-ucp option suppresses PCRE2_UCP, thus
              restricting the POSIX classes to ASCII characters, as was the
              case in earlier releases. Note that there are now more
              fine-grained option settings within patterns that affect
              individual classes. For example, when in UCP mode, the sequence
              (?aP) restricts [:word:] to ASCII letters, while allowing \w to
              match Unicode letters and digits.

       -q, --quiet
              Work quietly, that is, display nothing except error messages.
              The exit status indicates whether or not any matches were
              found.

       -r, --recursive
              If any given path is a directory, recursively scan the files it
              contains, taking note of any --include and --exclude settings.
              By default, a directory is read as a normal file; in some
              operating systems this gives an immediate end-of-file. This
              option is a shorthand for setting the -d option to "recurse".

       --recursion-limit=number
              This is an obsolete synonym for --depth-limit. See
              --match-limit above for details.

       -s, --no-messages
              Suppress error messages about non-existent or unreadable files.
              Such files are quietly skipped. However, the return code is
              still 2, even if matches were found in other files.

       -t, --total-count
              This option is useful when scanning more than one file. If used
              on its own, -t suppresses all output except for a grand total
              number of matching lines (or non-matching lines if -v is used)
              in all the files. If -t is used with -c, a grand total is
              output except when the previous output is just one line. In
              other words, it is not output when just one file's count is
              listed. If file names are being output, the grand total is
              preceded by "TOTAL:". Otherwise, it appears as just another
              number.
              The -t option is ignored when used with -L (list files without
              matches), because the grand total would always be zero.

       -u, --utf Operate in UTF/Unicode mode. This option is available only
              if PCRE2 has been compiled with UTF-8 support. All patterns
              (including those for any --exclude and --include options) and
              all lines that are scanned must be valid strings of UTF-8
              characters. If an invalid UTF-8 string is encountered, an error
              occurs.

       -U, --utf-allow-invalid
              As --utf, but in addition subject lines may contain invalid
              UTF-8 code unit sequences. These can never form part of any
              pattern match. Patterns themselves, however, must still be
              valid UTF-8 strings. This facility allows valid UTF-8 strings
              to be sought within arbitrary byte sequences in executable or
              other binary files. For more details about matching in
              non-valid UTF-8 strings, see the pcre2unicode(3) documentation.

       -V, --version
              Write the version numbers of pcre2grep and the PCRE2 library to
              the standard output and then exit. Anything else on the command
              line is ignored.

       -v, --invert-match
              Invert the sense of the match, so that lines which do not match
              any of the patterns are the ones that are found.
              When this option is set, options such as --only-matching and
              --output, which specify parts of a match that are to be output,
              are ignored.

       -w, --word-regex, --word-regexp
              Force the patterns only to match "words". That is, there must
              be a word boundary at the start and end of each matched string.
              This is equivalent to having "\b(?:" at the start of each
              pattern, and ")\b" at the end. This option applies only to the
              patterns that are matched against the contents of files; it
              does not apply to patterns specified by any of the --include or
              --exclude options.

       -x, --line-regex, --line-regexp
              Force the patterns to start matching only at the beginnings of
              lines, and in addition, require them to match entire lines. In
              multiline mode the match may be more than one line. This is
              equivalent to having "^(?:" at the start of each pattern and
              ")$" at the end. This option applies only to the patterns that
              are matched against the contents of files; it does not apply to
              patterns specified by any of the --include or --exclude
              options.

       -Z, --null
              Terminate file names in the regular output with a zero byte
              (the NUL character) instead of what would normally appear.
                 This is useful when file names contain unusual characters
                 such as colons, hyphens, or even newlines. The option does
                 not apply to file names in error messages.


ENVIRONMENT VARIABLES

       The environment variables LC_ALL and LC_CTYPE are examined, in that
       order, for a locale. The first one that is set is used. This can be
       overridden by the --locale option. If no locale is set, the PCRE2
       library's default (usually the "C" locale) is used.


NEWLINES

       The -N (--newline) option allows pcre2grep to scan files with newline
       conventions that differ from the default. This option affects only the
       way scanned files are processed. It does not affect the interpretation
       of files specified by the -f, --file-list, --exclude-from, or
       --include-from options.

       Any parts of the scanned input files that are written to the standard
       output are copied with whatever newline sequences they have in the
       input. However, if the final line of a file is output, and it does not
       end with a newline sequence, a newline sequence is added. If the
       newline setting is CR, LF, CRLF or NUL, that line ending is output;
       for the other settings (ANYCRLF or ANY) a single NL is used.
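
       As an illustration of the final-line rule above, the following sketch
       builds a scratch file with CRLF line endings whose last line is
       unterminated, then searches it with -N CRLF. The scratch file and the
       use of od -c to make the added terminator visible are assumptions made
       for this example, not part of pcre2grep. The search is skipped if
       pcre2grep is not installed.

```shell
# Scratch file (hypothetical): CRLF line endings, final line unterminated.
tmp=$(mktemp)
printf 'one\r\ntwo' > "$tmp"

if command -v pcre2grep >/dev/null 2>&1; then
    # The matched final line has no terminator in the input, so pcre2grep
    # is expected to append one; od -c shows which bytes were written.
    pcre2grep -N CRLF 'two' "$tmp" | od -c
else
    echo "pcre2grep not found; skipping" >&2
fi

# Remove the scratch file again.
rm -f "$tmp"
```

       Repeating the run with a different -N setting (for example ANY) is a
       simple way to compare the terminator that is appended in each case.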

       The newline setting does not affect the way in which pcre2grep writes
       newlines in informational messages to the standard output and error
       streams. Under Windows, the standard output is set to be binary, so
       that "\r\n" at the ends of output lines that are copied from the input
       is not converted to "\r\r\n" by the C I/O library. This means that any
       messages written to the standard output must end with "\r\n". For all
       other operating systems, and for all messages to the standard error
       stream, "\n" is used.


OPTIONS COMPATIBILITY WITH GNU GREP

       Many of the short and long forms of pcre2grep's options are the same
       as in the GNU grep program. Any long option of the form --xxx-regexp
       (GNU terminology) is also available as --xxx-regex (PCRE2
       terminology). However, the --case-restrict, --depth-limit, -E,
       --file-list, --file-offsets, --heap-limit, --include-dir,
       --line-offsets, --locale, --match-limit, -M, --multiline, -N,
       --newline, --no-ucp, --om-separator, --output, -P, -u, --utf, -U, and
       --utf-allow-invalid options are specific to pcre2grep, as is the use
       of the --only-matching option with a capturing parentheses number.

       Although most of the common options work the same way, a few are
       different in pcre2grep.
       For example, the --include option's argument is a glob for GNU grep,
       but in pcre2grep it is a regular expression to which the -i option
       applies. If both the -c and -l options are given, GNU grep lists only
       file names, without counts, but pcre2grep gives the counts as well.


OPTIONS WITH DATA

       There are four different ways in which an option with data can be
       specified. If a short form option is used, the data may follow
       immediately, or (with one exception) in the next command line item.
       For example:

         -f/some/file
         -f /some/file

       The exception is the -o option, which may appear with or without data.
       Because of this, if data is present, it must follow immediately in the
       same item, for example -o3.

       If a long form option is used, the data may appear in the same command
       line item, separated by an equals character, or (with two exceptions)
       it may appear in the next command line item.
       For example:

         --file=/some/file
         --file /some/file

       Note, however, that if you want to supply a file name beginning with ~
       as data in a shell command, and have the shell expand ~ to a home
       directory, you must separate the file name from the option, because
       the shell does not treat ~ specially unless it is at the start of an
       item.

       The exceptions to the above are the --colour (or --color) and
       --only-matching options, for which the data is optional. If one of
       these options does have data, it must be given in the first form,
       using an equals character. Otherwise pcre2grep will assume that it has
       no data.


USING PCRE2'S CALLOUT FACILITY

       pcre2grep has, by default, support for calling external programs or
       scripts or echoing specific strings during matching by making use of
       PCRE2's callout facility. However, this support can be completely or
       partially disabled when pcre2grep is built. You can find out whether
       your binary has support for callouts by running it with the --help
       option. If callout support is completely disabled, all callouts in
       patterns are ignored by pcre2grep. If the facility is partially
       disabled, calling external programs is not supported, and callouts
       that request it are ignored.
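
       A quick way to apply the check described above is to filter the --help
       output for lines that mention callouts. The exact wording of that
       output is not specified here, so the case-insensitive search for
       "callout" is an assumption, and the function name and fallback
       messages are illustrative only.

```shell
# Hypothetical helper: report what the local pcre2grep says about callouts.
callout_report() {
    if command -v pcre2grep >/dev/null 2>&1; then
        # Print any lines of the help text that mention callouts.
        pcre2grep --help | grep -i 'callout' \
            || echo "no mention of callouts in --help output"
    else
        echo "pcre2grep not found"
    fi
}

callout_report
```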

       A callout in a PCRE2 pattern is of the form (?C<arg>) where the
       argument is either a number or a quoted string (see the pcre2callout
       documentation for details). Numbered callouts are ignored by
       pcre2grep; only callouts with string arguments are useful.

   Echoing a specific string

       Starting the callout string with a pipe character invokes an echoing
       facility that avoids calling an external program or script. This
       facility is always available, provided that callouts were not
       completely disabled when pcre2grep was built. The rest of the callout
       string is processed as a zero-terminated string, which means it should
       not contain any internal binary zeros. It is written to the output,
       having first been passed through the same escape processing as text
       from the --output (-O) option (see above). However, $0 cannot be used
       to insert a matched substring because the match is still in progress.
       Instead, the single character '0' is inserted. Any syntax errors in
       the string (for example, a dollar not followed by another character)
       cause the callout to be ignored. No terminator is added to the output
       string, so if you want a newline, you must include it explicitly using
       the escape $n.
       For example:

         pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' <some file>

       Matching continues normally after the string is output. If you want to
       see only the callout output but not any output from an actual match,
       you should end the pattern with (*FAIL).

   Calling external programs or scripts

       This facility can be independently disabled when pcre2grep is built.
       It is supported for Windows, where a call to _spawnvp() is used, for
       VMS, where lib$spawn() is used, and for any Unix-like environment
       where fork() and execv() are available.

       If the callout string does not start with a pipe (vertical bar)
       character, it is parsed into a list of substrings separated by pipe
       characters. The first substring must be an executable name, with the
       following substrings specifying arguments:

         executable_name|arg1|arg2|...

       Any substring (including the executable name) may contain escape
       sequences started by a dollar character. These are the same as for the
       --output (-O) option documented above, except that $0 cannot insert
       the matched string because the match is still in progress. Instead,
       the character '0' is inserted. If you need a literal dollar or pipe
       character in any substring, use $$ or $| respectively.
       Here is an example:

         echo -e "abcde\n12345" | pcre2grep \
         '(?x)(.)(..(.))
         (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -

         Output:

           Arg1: [a] [bcd] [d] Arg2: |a| ()
           abcde
           Arg1: [1] [234] [4] Arg2: |1| ()
           12345

       The parameters for the system call that is used to run the program or
       script are zero-terminated strings. This means that binary zero
       characters in the callout argument will cause premature termination of
       their substrings, and therefore should not be present. Any syntax
       errors in the string (for example, a dollar not followed by another
       character) cause the callout to be ignored. If running the program
       fails for any reason (including the non-existence of the executable),
       a local matching failure occurs and the matcher backtracks in the
       normal way.


MATCHING ERRORS

       It is possible to supply a regular expression that takes a very long
       time to fail to match certain lines. Such patterns normally involve
       nested indefinite repeats, for example: (a+)*\d when matched against a
       line of a's with no final digit. The PCRE2 matching function has a
       resource limit that causes it to abort in these circumstances.
       If this happens, pcre2grep outputs an error message and the line that
       caused the problem to the standard error stream. If there are more
       than 20 such errors, pcre2grep gives up.

       The --match-limit option of pcre2grep can be used to set the overall
       resource limit. There are also other limits that affect the amount of
       memory used during matching; see the discussion of --heap-limit and
       --depth-limit above.


DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were
       found, and 2 for syntax errors, overlong lines, non-existent or
       inaccessible files (even if matches were found in other files) or too
       many matching errors. Using the -s option to suppress error messages
       about inaccessible files does not affect the return code.

       When run under VMS, the return code is placed in the symbol
       PCRE2GREP_RC because VMS does not distinguish between exit(0) and
       exit(1).


SEE ALSO

       pcre2pattern(3), pcre2syntax(3), pcre2callout(3), pcre2unicode(3).


AUTHOR

       Philip Hazel
       Retired from University Computing Service
       Cambridge, England.


REVISION

       Last updated: 22 December 2023
       Copyright (c) 1997-2023 University of Cambridge.


PCRE2 10.43                    22 December 2023                   PCRE2GREP(1)