README file for PCRE2 (Perl-compatible regular expression library)
------------------------------------------------------------------

PCRE2 is a re-working of the original PCRE1 library to provide an entirely new
API. Since its initial release in 2015, there has been further development of
the code and it now differs from PCRE1 in more than just the API. There are new
features, and the internals have been improved. The original PCRE1 library is
now obsolete and no longer maintained. The latest release of PCRE2 is available
in .tar.gz, .tar.bz2, or .zip form from this GitHub repository:

https://github.com/PCRE2Project/pcre2/releases

There is a mailing list for discussion about the development of PCRE2 at
[email protected]. You can subscribe by sending an email to
[email protected].

You can access the archives and also subscribe or manage your subscription
here:

https://groups.google.com/g/pcre2-dev

Please read the NEWS file if you are upgrading from a previous release. The
contents of this README file are:

  The PCRE2 APIs
  Documentation for PCRE2
  Building PCRE2 on non-Unix-like systems
  Building PCRE2 without using autotools
  Building PCRE2 using autotools
  Retrieving configuration information
  Shared libraries
  Cross-compiling using autotools
  Making new tarballs
  Testing PCRE2
  Character tables
  File manifest


The PCRE2 APIs
--------------

PCRE2 is written in C, and it has its own API. There are three sets of
functions, one for the 8-bit library, which processes strings of bytes, one for
the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. Unlike PCRE1, there
are no C++ wrappers.

The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
man page). These are built into a library called libpcre2-posix. Note that this
just provides a POSIX calling interface to PCRE2; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE2's facilities.

The header file for the POSIX-style functions is called pcre2posix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE2 with
an existing program that uses the POSIX API, pcre2posix.h will have to be
renamed or pointed at by a link (or the program modified, of course). See the
pcre2posix documentation for more details.
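
The "pointed at by a link" approach could look roughly like the sketch below.
It is only an illustration: the helper name make_regex_shim and both directory
arguments are assumptions, not part of PCRE2. The idea is to place a regex.h
link in a private include directory so that the system's own regex.h is left
untouched.

```shell
# Sketch: expose pcre2posix.h under the POSIX name regex.h via a symlink
# in a private include directory, leaving the system regex.h untouched.
# make_regex_shim and both directory arguments are illustrative assumptions.
make_regex_shim() {
    incdir=$1     # directory containing pcre2posix.h (use an absolute path,
                  # because the link target is stored exactly as given)
    shimdir=$2    # private directory that will hold the regex.h link
    if [ ! -f "$incdir/pcre2posix.h" ]; then
        echo "pcre2posix.h not found in $incdir" >&2
        return 1
    fi
    mkdir -p "$shimdir" &&
        ln -sf "$incdir/pcre2posix.h" "$shimdir/regex.h" &&
        echo "$shimdir/regex.h"
}
```

A program (here a hypothetical prog.c) might then be built with the shim
directory first on the include path and linked against the wrapper and the
8-bit library, for example "cc -Icompat-include prog.c -lpcre2-posix
-lpcre2-8".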


Documentation for PCRE2
-----------------------

If you install PCRE2 in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre2". The one that is
just called "pcre2" lists all the others. In addition to these man pages, the
PCRE2 documentation is supplied in two other forms:

  1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
     doc/pcre2test.txt in the source distribution. The first of these is a
     concatenation of the text forms of all the section 3 man pages except the
     listing of pcre2demo.c and those that summarize individual functions. The
     other two are the text forms of the section 1 man pages for the pcre2grep
     and pcre2test commands. These text forms are provided for ease of scanning
     with text editors or similar tools. They are installed in
     <prefix>/share/doc/pcre2, where <prefix> is the installation prefix
     (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
     doc/html and installed in <prefix>/share/doc/pcre2/html.


Building PCRE2 on non-Unix-like systems
---------------------------------------

For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
your system supports the use of "configure" and "make" you may be able to build
PCRE2 using autotools in the same way as for many Unix-like systems.

PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
NON-AUTOTOOLS-BUILD has information about CMake.

PCRE2 has been compiled on many different operating systems. It should be
straightforward to build PCRE2 on any system that has a Standard C compiler and
library, because it uses only Standard C functions.


Building PCRE2 without using autotools
--------------------------------------

The use of autotools (in particular, libtool) is problematic in some
environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
file for ways of building PCRE2 without using autotools.


Building PCRE2 using autotools
------------------------------

The following instructions assume the use of the widely used "configure; make;
make install" (autotools) process.

If you have downloaded and unpacked a PCRE2 release tarball, run the
"configure" command from the PCRE2 directory, with your current directory set
to the directory where you want the files to be created. This command is a
standard GNU "autoconf" configuration script, for which generic instructions
are supplied in the file INSTALL.

The files in the GitHub repository do not contain "configure". If you have
downloaded the PCRE2 source files from GitHub, before you can run "configure"
you must run the shell script called autogen.sh. This runs a number of
autotools to create a "configure" script (you must of course have the autotools
commands installed in order to do this).
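
The difference between a release tarball and a GitHub checkout can be handled
with a small guard, sketched below. The helper name ensure_configure is
hypothetical; the sketch assumes only that autogen.sh sits at the top of the
source tree, as described above.

```shell
# Sketch: make sure a "configure" script exists in a PCRE2 source tree.
# ensure_configure is a hypothetical helper name, not part of PCRE2.
ensure_configure() {
    srcdir=$1
    if [ -f "$srcdir/configure" ]; then
        # Release tarballs already ship "configure".
        echo "configure already present"
    elif [ -f "$srcdir/autogen.sh" ]; then
        # GitHub checkouts: generate "configure" with the autotools.
        (cd "$srcdir" && sh ./autogen.sh) || return 1
        echo "configure generated"
    else
        echo "no configure or autogen.sh in $srcdir" >&2
        return 1
    fi
}
```

Calling ensure_configure with the path of the source directory before running
"configure" covers both cases.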

Most commonly, people build PCRE2 within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available. For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

This command specifies that the C compiler should be run with the flags '-O2
-Wall' instead of the default, and that "make install" should install PCRE2
under /opt/local instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE2 source
into /source/pcre2/pcre2-xxx, but you want to build it in
/build/pcre2/pcre2-xxx:

cd /build/pcre2/pcre2-xxx
/source/pcre2/pcre2-xxx/configure

PCRE2 is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE2
library. They are also documented in the pcre2build man page.

. By default, both shared and static libraries are built. You can change this
  by adding one of these options to the "configure" command:

  --disable-shared
  --disable-static

  Setting --disable-shared ensures that PCRE2 libraries are built as static
  libraries. The binaries that are then created as part of the build process
  (for example, pcre2test and pcre2grep) are linked statically with one or more
  PCRE2 libraries, but may also be dynamically linked with other libraries such
  as libc. If you want these binaries to be fully statically linked, you can
  set LDFLAGS like this:

  LDFLAGS=--static ./configure --disable-shared

  Note the two hyphens in --static. Of course, this works only if static
  versions of all the relevant libraries are available for linking. See also
  "Shared libraries" below.

. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
  the "configure" command, the 16-bit library is also built. If you add
  --enable-pcre2-32 to the "configure" command, the 32-bit library is also
  built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
  to disable building the 8-bit library.

. If you want to include support for just-in-time (JIT) compiling, which can
  give large performance improvements on certain platforms, add --enable-jit to
  the "configure" command. This support is available only for certain hardware
  architectures. If you try to enable it on an unsupported architecture, there
  will be a compile time error. If in doubt, use --enable-jit=auto, which
  enables JIT only if the current hardware is supported.

. If you are enabling JIT under an SELinux environment you may also want to add
  --enable-jit-sealloc, which enables the use of an executable memory allocator
  that is compatible with SELinux. Warning: this allocator is experimental!
  It does not support the fork() operation and may crash when no disk space is
  available. This option has no effect if JIT is disabled.

. If you do not want to make use of the default support for UTF-8 Unicode
  character strings in the 8-bit library, UTF-16 Unicode character strings in
  the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
  library, you can add --disable-unicode to the "configure" command. This
  reduces the size of the libraries. It is not possible to configure one
  library with Unicode support, and another without, in the same configuration.
  It is also not possible to use --enable-ebcdic (see below) with Unicode
  support, so if this option is set, you must also use --disable-unicode.

  When Unicode support is available, the use of a UTF encoding still has to be
  enabled by setting the PCRE2_UTF option at run time or starting a pattern
  with (*UTF). When PCRE2 is compiled with Unicode support, its input can be
  only ASCII or UTF-8/16/32, even when running on EBCDIC platforms.

  As well as supporting UTF strings, Unicode support includes support for the
  \P, \p, and \X sequences that recognize Unicode character properties.
  However, only a subset of Unicode properties are supported; see the
  pcre2pattern man page for details. Escape sequences such as \d and \w in
  patterns do not by default make use of Unicode properties, but can be made to
  do so by setting the PCRE2_UCP option or starting a pattern with (*UCP).

. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
  of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
  character as indicating the end of a line. Whatever you specify at build time
  is the default; the caller of PCRE2 can change the selection at run time. The
  default newline indicator is a single LF character (the Unix standard). You
  can specify the default newline indicator by adding --enable-newline-is-cr,
  --enable-newline-is-lf, --enable-newline-is-crlf,
  --enable-newline-is-anycrlf, --enable-newline-is-any, or
  --enable-newline-is-nul to the "configure" command, respectively.

. By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE2 considers
  to be the end of a line (see above). However, the caller of PCRE2 can
  restrict \R to match only CR, LF, or CRLF. You can make this the default by
  adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

. In a pattern, the escape sequence \C matches a single code unit, even in a
  UTF mode. This can be dangerous because it breaks up multi-code-unit
  characters. You can build PCRE2 with the use of \C permanently locked out by
  adding --enable-never-backslash-C (note the upper case C) to the "configure"
  command. When \C is allowed by the library, individual applications can lock
  it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.

. PCRE2 has a counter that limits the depth of nesting of parentheses in a
  pattern. This limits the amount of system stack that a pattern uses when it
  is compiled. The default is 250, but you can change it by setting, for
  example,

  --with-parens-nest-limit=500

. PCRE2 has a counter that can be set to limit the amount of computing resource
  it uses when matching a pattern. If the limit is exceeded during a match, the
  match fails. The default is ten million. You can change the default by
  setting, for example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
  discussion in the pcre2api man page (search for pcre2_set_match_limit).

. There is a separate counter that limits the depth of nested backtracking
  (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
  matching process, which indirectly limits the amount of heap memory that is
  used, and in the case of pcre2_dfa_match() the amount of stack as well. This
  counter also has a default of ten million, which is essentially "unlimited".
  You can change the default by setting, for example,

  --with-match-limit-depth=5000

  There is more discussion in the pcre2api man page (search for
  pcre2_set_depth_limit).

. You can also set an explicit limit on the amount of heap memory used by
  the pcre2_match() and pcre2_dfa_match() interpreters:

  --with-heap-limit=500

  The units are kibibytes (units of 1024 bytes). This limit does not apply when
  the JIT optimization (which has its own memory control features) is used.
  There is more discussion on the pcre2api man page (search for
  pcre2_set_heap_limit).

. In the 8-bit library, the default maximum compiled pattern size is around
  64 kibibytes. You can increase this by adding --with-link-size=3 to the
  "configure" command. PCRE2 then uses three bytes instead of two for offsets
  to different parts of the compiled pattern. In the 16-bit library,
  --with-link-size=3 is the same as --with-link-size=4, which (in both
  libraries) uses four-byte offsets. Increasing the internal link size reduces
  performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
  link size setting is ignored, as 4-byte offsets are always used.

. Lookbehind assertions in which one or more branches can match a variable
  number of characters are supported only if there is a maximum matching length
  for each top-level branch. There is a limit to this maximum that defaults to
  255 characters. You can alter this default by a setting such as

  --with-max-varlookbehind=100

  The limit can be changed at runtime by calling pcre2_set_max_varlookbehind().
  Lookbehind assertions in which every branch matches a fixed number of
  characters (not necessarily all the same) are not constrained by this limit.

. For speed, PCRE2 uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution. If you specify

  --enable-rebuild-chartables

  a program called pcre2_dftables is compiled and run in the default C locale
  when you run "make". It builds a source file called pcre2_chartables.c. If
  you do not specify this option, pcre2_chartables.c is created as a copy of
  pcre2_chartables.c.dist. See "Character tables" below for further
  information.

. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
  character code (as opposed to ASCII/Unicode) by specifying

  --enable-ebcdic --disable-unicode

  This automatically implies --enable-rebuild-chartables (see above). However,
  when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
  both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
  which specifies that the code value for the EBCDIC NL character is 0x25
  instead of the default 0x15.

. If you specify --enable-debug, additional debugging code is included in the
  build. This option is intended for use by the PCRE2 maintainers.

. In environments where valgrind is installed, if you specify

  --enable-valgrind

  PCRE2 will use valgrind annotations to mark certain memory regions as
  unaddressable. This allows it to detect invalid memory accesses, and is
  mostly useful for debugging PCRE2 itself.

. In environments where the gcc compiler is used and lcov is installed, if you
  specify

  --enable-coverage

  the build process implements a code coverage report for the test suite. The
  report is generated by running "make coverage". If ccache is installed on
  your system, it must be disabled when building PCRE2 for coverage reporting.
  You can do this by setting the environment variable CCACHE_DISABLE=1 before
  running "make" to build PCRE2. There is more information about coverage
  reporting in the "pcre2build" documentation.

. When JIT support is enabled, pcre2grep automatically makes use of it, unless
  you add --disable-pcre2grep-jit to the "configure" command.

. There is support for calling external programs during matching in the
  pcre2grep command, using PCRE2's callout facility with string arguments. This
  support can be disabled by adding --disable-pcre2grep-callout to the
  "configure" command. There are two kinds of callout: one that generates
  output from inbuilt code, and another that calls an external program. The
  latter has special support for Windows and VMS; otherwise it assumes the
  existence of the fork() function. This facility can be disabled by adding
  --disable-pcre2grep-callout-fork to the "configure" command.

. The pcre2grep program currently supports only 8-bit data files, and so
  requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
  libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
  specifying one or both of

  --enable-pcre2grep-libz
  --enable-pcre2grep-libbz2

  Of course, the relevant libraries must be installed on your system.

. The default starting size (in bytes) of the internal buffer used by pcre2grep
  can be set by, for example:

  --with-pcre2grep-bufsize=51200

  The value must be a plain integer. The default is 20480. The amount of memory
  used by pcre2grep is actually three times this number, to allow for "before"
  and "after" lines. If very long lines are encountered, the buffer is
  automatically enlarged, up to a fixed maximum size.

. The default maximum size of pcre2grep's internal buffer can be set by, for
  example:

  --with-pcre2grep-max-bufsize=2097152

  The default is either 1048576 or the value of --with-pcre2grep-bufsize,
  whichever is the larger.

. It is possible to compile pcre2test so that it links with the libreadline
  or libedit libraries, by specifying, respectively,

  --enable-pcre2test-libreadline or --enable-pcre2test-libedit

  If this is done, when pcre2test's input is from a terminal, it reads it using
  the readline() function. This provides line-editing and history facilities.
  Note that libreadline is GPL-licenced, so if you distribute a binary of
  pcre2test linked in this way, there may be licensing issues. These can be
  avoided by linking with libedit (which has a BSD licence) instead.

  Enabling libreadline causes the -lreadline option to be added to the
  pcre2test build. In many operating environments with a system-installed
  readline library this is sufficient. However, in some environments (e.g. if
  an unmodified distribution version of readline is in use), it may be
  necessary to specify something like LIBS="-lncurses" as well. This is
  because, to quote the readline INSTALL, "Readline uses the termcap functions,
  but does not link with the termcap or curses library itself, allowing
  applications which link with readline the option to choose an appropriate
  library." If you get error messages about missing functions tgetstr, tgetent,
  tputs, tgetflag, or tgoto, this is the problem, and linking with the ncurses
  library should fix it.

. The C99 standard defines formatting modifiers z and t for size_t and
  ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in
  environments other than Microsoft Visual Studio versions earlier than 2013
  when __STDC_VERSION__ is defined and has a value greater than or equal to
  199901L (indicating C99). However, there is at least one environment that
  claims to be C99 but does not support these modifiers. If
  --disable-percent-zt is specified, no use is made of the z or t modifiers.
  Instead of %td or %zu, %lu is used, with a cast for size_t values.

. There is a special option called --enable-fuzz-support for use by people who
  want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
  library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
  be built, but not installed. This contains a single function called
  LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
  length of the string. When called, this function tries to compile the string
  as a pattern, and if that succeeds, to match it. This is done both with no
  options and with some random option bits that are generated from the string.
  Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
  be created. This is normally run under valgrind or used when PCRE2 is
  compiled with address sanitizing enabled. It calls the fuzzing function and
  outputs information about what it is doing. The input strings are specified
  by arguments: if an argument starts with "=" the rest of it is a literal
  input string. Otherwise, it is assumed to be a file name, and the contents
  of the file are the test string.

. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
  which caused pcre2_match() to use individual blocks on the heap for
  backtracking instead of recursive function calls (which use the stack). This
  is now obsolete because pcre2_match() was refactored always to use the heap
  (in a much more efficient way than before). This option is retained for
  backwards compatibility, but has no effect other than to output a warning.
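
Several of the options described above can be combined in a single "configure"
invocation. The sketch below only assembles and prints such a command line; the
option names are those given in this README, while the particular values and
the variable name opts are illustrative choices rather than recommendations.

```shell
# Sketch: assemble one "configure" command line from several of the options
# described above. Values and the variable name are illustrative only.
opts="--enable-pcre2-16 --enable-pcre2-32"   # also build 16- and 32-bit libraries
opts="$opts --enable-jit=auto"               # JIT only where the hardware supports it
opts="$opts --enable-newline-is-anycrlf"     # default newline: CR, LF, or CRLF
opts="$opts --with-match-limit=500000"       # lower the default match limit
opts="$opts --with-heap-limit=500"           # heap limit, in kibibytes
opts="$opts --with-link-size=3"              # allow larger compiled patterns

# Print the command instead of running it, so it can be reviewed first.
echo "./configure $opts"
```

Printing the command first makes it easy to check the selection before
committing to a full build.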

The "configure" script builds the following files for the basic C library:

. Makefile             the makefile that builds the library
. src/config.h         build-time configuration options for the library
. src/pcre2.h          the public PCRE2 header file
. pcre2-config         script that shows the building settings such as CFLAGS
                         that were set for "configure"
. libpcre2-8.pc        )
. libpcre2-16.pc       ) data for the pkg-config command
. libpcre2-32.pc       )
. libpcre2-posix.pc    )
. libtool              script that builds shared and/or static libraries

Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
tarballs under the names config.h.generic and pcre2.h.generic. These are
provided for those who have to build PCRE2 without using "configure" or CMake.
If you use "configure" or CMake, the .generic versions are not used.

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". This builds whichever of the
libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
program called pcre2test. If you enabled JIT support with --enable-jit, another
test program called pcre2_jit_test is built as well. If the 8-bit library is
built, libpcre2-posix, pcre2posix_test, and the pcre2grep command are also
built. Running "make" with the -j option may speed up compilation on
multiprocessor systems.

The command "make check" runs all the appropriate tests. Details of the PCRE2
tests are given below in a separate section of this document. The -j option of
"make" can also be used when running the tests.
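
The build and test steps just described might be scripted as in the following
sketch. The helper name build_and_check is hypothetical, and the use of
"getconf _NPROCESSORS_ONLN" to pick a job count is an assumption about the
host system, which is why the sketch falls back to a single job.

```shell
# Sketch: build PCRE2 and run its test suite in parallel after "configure".
# build_and_check is a hypothetical helper; the job count falls back to 1
# when getconf cannot report the number of online processors.
build_and_check() {
    builddir=$1
    njobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || njobs=1
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$builddir" && make -j"$njobs" && make -j"$njobs" check)
}
```

The function returns a non-zero status if either the build or the tests fail,
so it can be used directly in a larger script.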

You can use "make install" to install PCRE2 into live directories on your
system. The following are installed (file names are all relative to the
<prefix> that is set when "configure" is run):

  Commands (bin):
    pcre2test
    pcre2grep (if 8-bit support is enabled)
    pcre2-config

  Libraries (lib):
    libpcre2-8      (if 8-bit support is enabled)
    libpcre2-16     (if 16-bit support is enabled)
    libpcre2-32     (if 32-bit support is enabled)
    libpcre2-posix  (if 8-bit support is enabled)

  Configuration information (lib/pkgconfig):
    libpcre2-8.pc
    libpcre2-16.pc
    libpcre2-32.pc
    libpcre2-posix.pc

  Header files (include):
    pcre2.h
    pcre2posix.h

  Man pages (share/man/man{1,3}):
    pcre2grep.1
    pcre2test.1
    pcre2-config.1
    pcre2.3
    pcre2*.3 (lots more pages, all starting "pcre2")

  HTML documentation (share/doc/pcre2/html):
    index.html
    *.html (lots more pages, hyperlinked from index.html)

  Text file documentation (share/doc/pcre2):
    AUTHORS
    COPYING
    ChangeLog
    LICENCE
    NEWS
    README
    pcre2.txt         (a concatenation of the man(3) pages)
    pcre2test.txt     the pcre2test man page
    pcre2grep.txt     the pcre2grep man page
    pcre2-config.txt  the pcre2-config man page

If you want to remove PCRE2 from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.


Retrieving configuration information
------------------------------------

Running "make install" installs the command pcre2-config, which can be used to
recall information about the PCRE2 configuration and installation. For example:

  pcre2-config --version

prints the version number, and

  pcre2-config --libs8

outputs information about where the 8-bit library is installed. This command
can be included in makefiles for programs that use PCRE2, saving the programmer
from having to remember too many details. Run pcre2-config with no arguments to
obtain a list of possible arguments.
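
As an illustration of using pcre2-config from a build script, the sketch below
prints a compile command for a program that uses the 8-bit library. The helper
name pcre2_compile_cmd and the file names are hypothetical, and "cc" is assumed
to be the C compiler.

```shell
# Sketch: print a compile command for a program using the 8-bit library.
# pcre2_compile_cmd and the file names passed to it are hypothetical;
# --cflags and --libs8 are pcre2-config arguments.
pcre2_compile_cmd() {
    src=$1
    out=$2
    if ! command -v pcre2-config >/dev/null 2>&1; then
        echo "pcre2-config not found in PATH" >&2
        return 1
    fi
    # Print the command so it can be inspected or pasted into a makefile.
    echo "cc $(pcre2-config --cflags) -o $out $src $(pcre2-config --libs8)"
}
```

For example, "pcre2_compile_cmd demo.c demo" prints a command for a
hypothetical demo.c. The source file itself still has to define
PCRE2_CODE_UNIT_WIDTH as 8 before including pcre2.h.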

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --libs libpcre2-16

The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig.
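
A script can check that a library is known to pkg-config before asking for its
flags, as in this sketch. The helper name pcre2_pc_flags is hypothetical; the
module names correspond to the *.pc files listed earlier.

```shell
# Sketch: query an installed PCRE2 library through pkg-config.
# pcre2_pc_flags is a hypothetical helper; the module names match the
# *.pc files listed in this README (libpcre2-8, libpcre2-16, ...).
pcre2_pc_flags() {
    module=${1:-libpcre2-8}
    if ! pkg-config --exists "$module"; then
        echo "$module is not known to pkg-config" >&2
        return 1
    fi
    # Emit compiler and linker flags together on one line.
    pkg-config --cflags --libs "$module"
}
```

If PCRE2 was installed under a prefix that pkg-config does not search by
default, the PKG_CONFIG_PATH environment variable may need to include
<prefix>/lib/pkgconfig.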


Shared libraries
----------------

The default distribution builds PCRE2 as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcre2test and pcre2grep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcre2grep and pcre2test are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.

To build PCRE2 using static libraries only, you must use --disable-shared when
configuring it. For example:

./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries. Note, however, that when you build only static
libraries, binary programs such as pcre2test and pcre2grep may still be
dynamically linked with other libraries (for example, libc) unless you set
LDFLAGS to --static when running "configure".


Cross-compiling using autotools
-------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE2 for some other host. However, you should NOT
specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c
source file is compiled and run on the local host, in order to generate the
inbuilt character tables (the pcre2_chartables.c file). This will probably not
work, because pcre2_dftables.c needs to be compiled with the local compiler,
not the cross compiler.

When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
created by making a copy of pcre2_chartables.c.dist, which is a default set of
tables that assumes ASCII code. Cross-compiling with the default tables should
not be a problem.
594
595If you need to modify the character tables when cross-compiling, you should
596move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
597hand and run it on the local host to make a new version of
598pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
599at build time" for more details.
600
601
602Making new tarballs
603-------------------
604
605The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
606zip formats. The command "make distcheck" does the same, but then does a trial
607build of the new distribution to ensure that it works.
608
609If you have modified any of the man page sources in the doc directory, you
610should first run the PrepareRelease script before making a distribution. This
611script creates the .txt and HTML forms of the documentation from the man pages.
612
613
614Testing PCRE2
615-------------
616
617To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
618There is another script called RunGrepTest that tests the pcre2grep command.
619When the 8-bit library is built, a test program for the POSIX wrapper, called
620pcre2posix_test, is compiled, and when JIT support is enabled, a test program
621called pcre2_jit_test is built. The scripts and the program tests are all run
622when you obey "make check". For other environments, see the instructions in
623NON-AUTOTOOLS-BUILD.
624
625The RunTest script runs the pcre2test test program (which is documented in its
626own man page) on each of the relevant testinput files in the testdata
627directory, and compares the output with the contents of the corresponding
628testoutput files. RunTest uses a file called testtry to hold the main output
629from pcre2test. Other files whose names begin with "test" are used as working
630files in some tests.
631
632Some tests are relevant only when certain build-time options were selected. For
633example, the tests for UTF-8/16/32 features are run only when Unicode support
634is available. RunTest outputs a comment when it skips a test.
635
636Many (but not all) of the tests that are not skipped are run twice if JIT
637support is available. On the second run, JIT compilation is forced. This
638testing can be suppressed by putting "-nojit" on the RunTest command line.
639
640The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
641libraries that are enabled. If you want to run just one set of tests, call
642RunTest with either the -8, -16 or -32 option.
643
644If valgrind is installed, you can run the tests under it by putting "-valgrind"
645on the RunTest command line. To run pcre2test on just one or more specific test
646files, give their numbers as arguments to RunTest, for example:
647
648  RunTest 2 7 11
649
650You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
651end), or a number preceded by ~ to exclude a test. For example:
652
653  RunTest 3-15 ~10
654
655This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests
656except test 13. Whatever order the arguments are in, the tests are always run
657in numerical order.
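
The selection rules described above can be sketched in C as follows. This is
only an illustration of the semantics, not the code of the RunTest script
(which is a shell script). The function name select_tests, the helper
read_number, and the MAX_TEST limit are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_TEST 26  /* assumption: the highest test number described here */

/* Read a non-negative decimal number at *p and advance *p past it.
   Returns -1 if there is no digit at *p. */
static long read_number(const char **p)
{
    if (**p < '0' || **p > '9') return -1;
    char *end;
    long n = strtol(*p, &end, 10);
    *p = end;
    return n;
}

/* Fill selected[0..MAX_TEST] from arguments such as "7", "3-6", "3-", "~10".
   Returns false if an argument is malformed or out of range. */
bool select_tests(int count, const char *const args[],
                  bool selected[MAX_TEST + 1])
{
    bool include[MAX_TEST + 1] = { false };
    bool exclude[MAX_TEST + 1] = { false };
    bool any_include = false;

    assert(selected != NULL);
    for (int i = 0; i < count; i++) {
        const char *p = args[i];
        bool negate = (*p == '~');
        if (negate) p++;

        long first = read_number(&p);
        long last = first;
        if (!negate && *p == '-') {      /* range "3-6" or open-ended "3-" */
            p++;
            last = (*p == '\0') ? MAX_TEST : read_number(&p);
        }
        if (first < 0 || last < first || last > MAX_TEST || *p != '\0')
            return false;

        if (negate) {
            exclude[first] = true;
        } else {
            any_include = true;
            for (long t = first; t <= last; t++) include[t] = true;
        }
    }

    /* With only exclusions (or no arguments), start from the full set. */
    for (int t = 0; t <= MAX_TEST; t++)
        selected[t] = (any_include ? include[t] : true) && !exclude[t];
    return true;
}
```

Because the result is a set indexed by test number, walking it from 0 upwards
gives the numerical order regardless of the order of the arguments.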
658
659You can also call RunTest with the single argument "list" to cause it to output
660a list of tests.
661
662The test sequence starts with "test 0", which is a special test that has no
663input file, and whose output is not checked. This is because it will be
664different on different hardware and with different configurations. The test
665exists in order to exercise some of pcre2test's code that would not otherwise
666be run.
667
668Tests 1 and 2 can always be run, as they expect only plain text strings (not
669UTF) and make no use of Unicode properties. The first test file can be fed
670directly into the perltest.sh script to check that Perl gives the same results.
671The only difference you should see is in the first few lines, where the Perl
672version is given instead of the PCRE2 version. The second set of tests checks
673auxiliary functions, error detection, and run-time flags that are specific to
674PCRE2. It also uses the debugging flags to check some of the internals of
675pcre2_compile().
676
677If you build PCRE2 with a locale setting that is not the standard C locale, the
678character tables may be different (see next paragraph). In some cases, this may
679cause failures in the second set of tests. For example, in a locale where the
680isprint() function yields TRUE for characters in the range 128-255, the use of
681[:isascii:] inside a character class defines a different set of characters, and
682this shows up in this test as a difference in the compiled code, which is being
683listed for checking. For example, where the comparison test output contains
684[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
685cases. This is not a bug in PCRE2.
686
687Test 3 checks pcre2_maketables(), the facility for building a set of character
688tables for a specific locale and using them instead of the default tables. The
689script uses the "locale" command to check for the availability of the "fr_FR",
690"french", or "fr" locale, and uses the first one that it finds. If the "locale"
691command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
692the list of available locales, the third test cannot be run, and a comment is
693output to say why. If running this test produces an error like this:
694
695  ** Failed to set locale "fr_FR"
696
697it means that the given locale is not available on your system, despite being
698listed by "locale". This does not mean that PCRE2 is broken. There are three
699alternative output files for the third test, because three different versions
700of the French locale have been encountered. The test passes if its output
701matches any one of them.
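
A locale can appear in a list of installed locales and still be refused when a
program tries to select it. The following C sketch probes a list of candidate
names with the standard setlocale() function and reports the first one that
can actually be set. It is not taken from RunTest or pcre2test; the function
name first_usable_locale and the fixed buffer size are assumptions made for
this illustration.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Return the first name in candidates[] that setlocale() accepts for
   LC_CTYPE, or NULL if none can be set. The locale that was in force on
   entry is restored before returning. */
const char *first_usable_locale(const char *const candidates[], size_t count)
{
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);
    const char *found = NULL;

    assert(candidates != NULL);

    /* The string returned by setlocale() may be overwritten by later
       calls, so keep a private copy for restoring. */
    if (current == NULL || strlen(current) >= sizeof saved) return NULL;
    strcpy(saved, current);

    for (size_t i = 0; i < count && found == NULL; i++)
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            found = candidates[i];

    setlocale(LC_CTYPE, saved);
    return found;
}
```

Called with the three names "fr_FR", "french", and "fr", a NULL result would
correspond to the situation in which the third test cannot be run.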
702
703Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
704with the perltest.sh script, and test 5 checking PCRE2-specific things.
705
706Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
707non-UTF mode and UTF mode with Unicode property support, respectively.
708
709Test 8 checks some internal offsets and code size features, but it is run only
710when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
71132-bit modes and for different link sizes, so there are different output files
712for each mode and link size.
713
714Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
71516-bit and 32-bit modes. These are tests that generate different output in
7168-bit mode. Each pair covers general cases and Unicode support, respectively.
717
718Test 13 checks the handling of non-UTF characters greater than 255 by
719pcre2_dfa_match() in 16-bit and 32-bit modes.
720
721Test 14 contains some special UTF and UCP tests that give different output for
722different code unit widths.
723
724Test 15 contains a number of tests that must not be run with JIT. They check,
725among other non-JIT things, the match-limiting features of the interpretive
726matcher.
727
728Test 16 is run only when JIT support is not available. It checks that an
729attempt to use JIT has the expected behaviour.
730
731Test 17 is run only when JIT support is available. It checks JIT complete and
732partial modes, match-limiting under JIT, and other JIT-specific features.
733
734Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
735the 8-bit library, without and with Unicode support, respectively.
736
737Test 20 checks the serialization functions by writing a set of compiled
738patterns to a file, and then reloading and checking them.
739
740Tests 21 and 22 test \C support when the use of \C is not locked out, without
741and with UTF support, respectively. Test 23 tests \C when it is locked out.
742
743Tests 24 and 25 test the experimental pattern conversion functions, without and
744with UTF support, respectively.
745
746Test 26 checks Unicode property support using tests that are generated
747automatically from the Unicode data tables.
748
749
750Character tables
751----------------
752
753For speed, PCRE2 uses four tables for manipulating and identifying characters
754whose code point values are less than 256. By default, a set of tables that is
755built into the library is used. The pcre2_maketables() function can be called
756by an application to create a new set of tables in the current locale. These are
757passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer into a
758compile context.
759
760The source file called pcre2_chartables.c contains the default set of tables.
761By default, this is created as a copy of pcre2_chartables.c.dist, which
762contains tables for ASCII coding. However, if --enable-rebuild-chartables is
763specified for ./configure, a new version of pcre2_chartables.c is built by the
764program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C
765character handling functions such as isalnum(), isalpha(), isupper(),
766islower(), etc. to build the table sources. This means that the default C
767locale that is set for your system will control the contents of these default
768tables. You can change the default tables by editing pcre2_chartables.c and
769then re-building PCRE2. If you do this, you should take care to ensure that the
770file does not get automatically re-generated. The best way to do this is to
771move pcre2_chartables.c.dist out of the way and replace it with your customized
772tables.
773
774When the pcre2_dftables program is run as a result of specifying
775--enable-rebuild-chartables, it uses the default C locale that is set on your
776system. It does not pay attention to the LC_xxx environment variables. In other
777words, it uses the system's default locale rather than whatever the compiling
778user happens to have set. If you really do want to build a source set of
779character tables in a locale that is specified by the LC_xxx variables, you can
780run the pcre2_dftables program by hand with the -L option. For example:
781
782  ./pcre2_dftables -L pcre2_chartables.c.special
783
784The second argument names the file where the source code for the tables is
785written. The first two 256-byte tables provide lower casing and case flipping
786functions, respectively. The next table consists of a number of 32-byte bit
787maps which identify certain character classes such as digits, "word"
788characters, white space, etc. These are used when building 32-byte bit maps
789that represent character classes for code points less than 256. The final
790256-byte table has bits indicating various character types, as follows:
791
792    1   white space character
793    2   letter
794    4   lower case letter
795    8   decimal digit
796   16   alphanumeric or '_'
797
798You can also specify -b (with or without -L) when running pcre2_dftables. This
799causes the tables to be written in binary instead of as source code. A set of
800binary tables can be loaded into memory by an application and passed to
801pcre2_compile() in the same way as tables created dynamically by calling
802pcre2_maketables(). The tables are just a string of bytes, independent of
803hardware characteristics such as endianness. This means they can be bundled
804with an application that runs in different environments, to ensure consistent
805behaviour.
806
807See also the pcre2build section "Creating character tables at build time".
808
809
810File manifest
811-------------
812
813The distribution should contain the files listed below.
814
815(A) Source files for the PCRE2 library functions and their headers are found in
816    the src directory:
817
818  src/pcre2_dftables.c     auxiliary program for building pcre2_chartables.c
819                           when --enable-rebuild-chartables is specified
820
821  src/pcre2_chartables.c.dist  a default set of character tables that assume
822                           ASCII coding; unless --enable-rebuild-chartables is
823                           specified, used by copying to pcre2_chartables.c
824
825  src/pcre2posix.c         )
826  src/pcre2_auto_possess.c )
827  src/pcre2_chkdint.c      )
828  src/pcre2_compile.c      )
829  src/pcre2_config.c       )
830  src/pcre2_context.c      )
831  src/pcre2_convert.c      )
832  src/pcre2_dfa_match.c    )
833  src/pcre2_error.c        )
834  src/pcre2_extuni.c       )
835  src/pcre2_find_bracket.c )
836  src/pcre2_jit_compile.c  )
837  src/pcre2_jit_match.c    ) sources for the functions in the library,
838  src/pcre2_jit_misc.c     )   and some internal functions that they use
839  src/pcre2_maketables.c   )
840  src/pcre2_match.c        )
841  src/pcre2_match_data.c   )
842  src/pcre2_newline.c      )
843  src/pcre2_ord2utf.c      )
844  src/pcre2_pattern_info.c )
845  src/pcre2_script_run.c   )
846  src/pcre2_serialize.c    )
847  src/pcre2_string_utils.c )
848  src/pcre2_study.c        )
849  src/pcre2_substitute.c   )
850  src/pcre2_substring.c    )
851  src/pcre2_tables.c       )
852  src/pcre2_ucd.c          )
853  src/pcre2_ucptables.c    )
854  src/pcre2_valid_utf.c    )
855  src/pcre2_xclass.c       )
856
857  src/pcre2_printint.c     debugging function that is used by pcre2test
858  src/pcre2_fuzzsupport.c  function for (optional) fuzzing support
859
860  src/config.h.in          template for config.h, when built by "configure"
861  src/pcre2.h.in           template for pcre2.h when built by "configure"
862  src/pcre2posix.h         header for the external POSIX wrapper API
863  src/pcre2_internal.h     header for internal use
864  src/pcre2_intmodedep.h   a mode-specific internal header
865  src/pcre2_jit_neon_inc.h header used by JIT
866  src/pcre2_jit_simd_inc.h header used by JIT
867  src/pcre2_ucp.h          header for Unicode property handling
868
869  sljit/*                  source files for the JIT compiler
870
871(B) Source files for programs that use PCRE2:
872
873  src/pcre2demo.c          simple demonstration of coding calls to PCRE2
874  src/pcre2grep.c          source of a grep utility that uses PCRE2
875  src/pcre2test.c          comprehensive test program
876  src/pcre2_jit_test.c     JIT test program
877  src/pcre2posix_test.c    POSIX wrapper API test program
878
879(C) Auxiliary files:
880
881  132html                  script to turn "man" pages into HTML
882  AUTHORS                  information about the author of PCRE2
883  ChangeLog                log of changes to the code
884  CleanTxt                 script to clean nroff output for txt man pages
885  Detrail                  script to remove trailing spaces
886  HACKING                  some notes about the internals of PCRE2
887  INSTALL                  generic installation instructions
888  LICENCE                  conditions for the use of PCRE2
889  COPYING                  the same, using GNU's standard name
890  Makefile.in              ) template for Unix Makefile, which is built by
891                           )   "configure"
892  Makefile.am              ) the automake input that was used to create
893                           )   Makefile.in
894  NEWS                     important changes in this release
895  NON-AUTOTOOLS-BUILD      notes on building PCRE2 without using autotools
896  PrepareRelease           script to make preparations for "make dist"
897  README                   this file
898  RunTest                  a Unix shell script for running tests
899  RunGrepTest              a Unix shell script for pcre2grep tests
900  aclocal.m4               m4 macros (generated by "aclocal")
901  config.guess             ) files used by libtool,
902  config.sub               )   used only when building a shared library
903  configure                a configuring shell script (built by autoconf)
904  configure.ac             ) the autoconf input that was used to build
905                           )   "configure" and config.h
906  depcomp                  ) script to find program dependencies, generated by
907                           )   automake
908  doc/*.3                  man page sources for PCRE2
909  doc/*.1                  man page sources for pcre2grep and pcre2test
910  doc/index.html.src       the base HTML page
911  doc/html/*               HTML documentation
912  doc/pcre2.txt            plain text version of the man pages
913  doc/pcre2test.txt        plain text documentation of test program
914  install-sh               a shell script for installing files
915  libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
916  libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
917  libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
918  libpcre2-posix.pc.in     template for libpcre2-posix.pc for pkg-config
919  ltmain.sh                file used to build a libtool script
920  missing                  ) common stub for a few missing GNU programs while
921                           )   installing, generated by automake
922  mkinstalldirs            script for making install directories
923  perltest.sh              script for running a Perl test program
924  pcre2-config.in          source of script which retains PCRE2 information
925  testdata/testinput*      test data for main library tests
926  testdata/testoutput*     expected test results
927  testdata/grep*           input and output for pcre2grep tests
928  testdata/*               other supporting test files
929
930(D) Auxiliary files for cmake support
931
932  cmake/COPYING-CMAKE-SCRIPTS
933  cmake/FindPackageHandleStandardArgs.cmake
934  cmake/FindEditline.cmake
935  cmake/FindReadline.cmake
936  CMakeLists.txt
937  config-cmake.h.in
938
939(E) Auxiliary files for building PCRE2 "by hand"
940
941  src/pcre2.h.generic     ) a version of the public PCRE2 header file
942                          )   for use in non-"configure" environments
943  src/config.h.generic    ) a version of config.h for use in non-"configure"
944                          )   environments
945
946(F) Auxiliary files for building PCRE2 under OpenVMS
947
948  vms/configure.com       )
949  vms/openvms_readme.txt  ) These files were contributed by a PCRE2 user.
950  vms/pcre2.h_patch       )
951  vms/stdint.h            )
952
953Philip Hazel
954Email local part: Philip.Hazel
955Email domain: gmail.com
956Last updated: 15 April 2024
957

README.md

1# PCRE2 - Perl-Compatible Regular Expressions
2
3The PCRE2 library is a set of C functions that implement regular expression
4pattern matching using the same syntax and semantics as Perl 5. PCRE2 has its
5own native API, as well as a set of wrapper functions that correspond to the
6POSIX regular expression API. The PCRE2 library is free, even for building
7proprietary software. It comes in three forms, for processing 8-bit, 16-bit,
8or 32-bit code units, in either literal or UTF encoding.
9
10PCRE2 was first released in 2015 to replace the API in the original PCRE
11library, which is now obsolete and no longer maintained. As well as a more
12flexible API, the code of PCRE2 has been much improved since the fork.
13
14## Download
15
16As well as downloading from the
17[GitHub site](https://github.com/PCRE2Project/pcre2), you can download PCRE2
18or the older, unmaintained PCRE1 library from an
19[*unofficial* mirror](https://sourceforge.net/projects/pcre/files/) at SourceForge.
20
21You can check out the PCRE2 source code via Git or Subversion:
22
23    git clone https://github.com/PCRE2Project/pcre2.git
24    svn co    https://github.com/PCRE2Project/pcre2.git
25
26## Contributed Ports
27
28If you just need the command-line PCRE2 tools on Windows, precompiled binary
29versions are available at this
30[Rexegg page](http://www.rexegg.com/pcregrep-pcretest.html).
31
32A PCRE2 port for z/OS, a mainframe operating system which uses EBCDIC as its
33default character encoding, can be found at
34[http://www.cbttape.org](http://www.cbttape.org/) (File 939).
35
36## Documentation
37
38You can read the PCRE2 documentation
39[here](https://PCRE2Project.github.io/pcre2/doc/html/index.html).
40
41Comparisons to Perl's regular expression semantics can be found in the
42community authored Wikipedia entry for PCRE.
43
44There is a curated summary of changes for each PCRE release, copies of
45documentation from older releases, and other useful information from the third
46party authored
47[RexEgg PCRE Documentation and Change Log page](http://www.rexegg.com/pcre-documentation.html).
48
49## Contact
50
51To report a problem with the PCRE2 library, or to make a feature request, please
52use the PCRE2 GitHub issues tracker. There is a mailing list for discussion of
53PCRE2 issues and development at pcre2-dev@googlegroups.com, which is where any
54announcements will be made. You can browse the
55[list archives](https://groups.google.com/g/pcre2-dev).
56
57