README file for PCRE2 (Perl-compatible regular expression library)
------------------------------------------------------------------

PCRE2 is a re-working of the original PCRE1 library to provide an entirely new
API. Since its initial release in 2015, there has been further development of
the code and it now differs from PCRE1 in more than just the API. There are new
features, and the internals have been improved. The original PCRE1 library is
now obsolete and no longer maintained. The latest release of PCRE2 is available
in .tar.gz, .tar.bz2, or .zip form from this GitHub repository:

https://github.com/PCRE2Project/pcre2/releases

There is a mailing list for discussion about the development of PCRE2 at
[email protected]. You can subscribe by sending an email to
[email protected].

You can access the archives and also subscribe or manage your subscription
here:

https://groups.google.com/g/pcre2-dev

Please read the NEWS file if you are upgrading from a previous release. The
contents of this README file are:

  The PCRE2 APIs
  Documentation for PCRE2
  Building PCRE2 on non-Unix-like systems
  Building PCRE2 without using autotools
  Building PCRE2 using autotools
  Retrieving configuration information
  Shared libraries
  Cross-compiling using autotools
  Making new tarballs
  Testing PCRE2
  Character tables
  File manifest


The PCRE2 APIs
--------------

PCRE2 is written in C, and it has its own API. There are three sets of
functions, one for the 8-bit library, which processes strings of bytes, one for
the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. Unlike PCRE1, there
are no C++ wrappers.

The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
man page).
These are built into a library called libpcre2-posix. Note that this
just provides a POSIX calling interface to PCRE2; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE2's facilities.

The header file for the POSIX-style functions is called pcre2posix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE2 with
an existing program that uses the POSIX API, pcre2posix.h will have to be
renamed or pointed at by a link (or the program modified, of course). See the
pcre2posix documentation for more details.


Documentation for PCRE2
-----------------------

If you install PCRE2 in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre2". The one that is
just called "pcre2" lists all the others. In addition to these man pages, the
PCRE2 documentation is supplied in two other forms:

  1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
     doc/pcre2test.txt in the source distribution. The first of these is a
     concatenation of the text forms of all the section 3 man pages except the
     listing of pcre2demo.c and those that summarize individual functions. The
     other two are the text forms of the section 1 man pages for the pcre2grep
     and pcre2test commands. These text forms are provided for ease of scanning
     with text editors or similar tools. They are installed in
     <prefix>/share/doc/pcre2, where <prefix> is the installation prefix
     (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
     doc/html and installed in <prefix>/share/doc/pcre2/html.


Building PCRE2 on non-Unix-like systems
---------------------------------------

For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
your system supports the use of "configure" and "make" you may be able to build
PCRE2 using autotools in the same way as for many Unix-like systems.

PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
NON-AUTOTOOLS-BUILD has information about CMake.

PCRE2 has been compiled on many different operating systems. It should be
straightforward to build PCRE2 on any system that has a Standard C compiler and
library, because it uses only Standard C functions.


Building PCRE2 without using autotools
--------------------------------------

The use of autotools (in particular, libtool) is problematic in some
environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
file for ways of building PCRE2 without using autotools.


Building PCRE2 using autotools
------------------------------

The following instructions assume the use of the widely used "configure; make;
make install" (autotools) process.

If you have downloaded and unpacked a PCRE2 release tarball, run the
"configure" command from the PCRE2 directory, with your current directory set
to the directory where you want the files to be created. This command is a
standard GNU "autoconf" configuration script, for which generic instructions
are supplied in the file INSTALL.

The files in the GitHub repository do not contain "configure". If you have
downloaded the PCRE2 source files from GitHub, before you can run "configure"
you must run the shell script called autogen.sh. This runs a number of
autotools to create a "configure" script (you must of course have the autotools
commands installed in order to do this).
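
As a rough illustration of the two starting points described above (a release
tarball that already contains "configure", and a GitHub checkout that needs
autogen.sh first), the following shell sketch prepares a source tree for
building. The function name ensure_configure is hypothetical and is not part of
PCRE2. The sketch assumes that autogen.sh can be run with sh and that the
autotools are installed.

```shell
# Sketch only: make sure a "configure" script exists in a PCRE2 source tree.
# ensure_configure is a hypothetical helper name, not something PCRE2 ships.
ensure_configure() {
    src_dir=$1
    if [ -f "$src_dir/configure" ]; then
        # Release tarballs already contain "configure".
        echo "configure already present in $src_dir"
    elif [ -f "$src_dir/autogen.sh" ]; then
        # GitHub checkouts contain autogen.sh, which creates "configure".
        (cd "$src_dir" && sh ./autogen.sh) || return 1
    else
        echo "neither configure nor autogen.sh found in $src_dir" >&2
        return 1
    fi
}

# Possible use from inside the source directory (not run here):
#   ensure_configure . && ./configure && make
```

The helper only decides whether autogen.sh has to be run. Running "configure"
and "make" afterwards is left to the caller, so that the usual options can
still be given on the "configure" command line.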

Most commonly, people build PCRE2 within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available. For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

This command specifies that the C compiler should be run with the flags '-O2
-Wall' instead of the default, and that "make install" should install PCRE2
under /opt/local instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE2 source
into /source/pcre2/pcre2-xxx, but you want to build it in
/build/pcre2/pcre2-xxx:

cd /build/pcre2/pcre2-xxx
/source/pcre2/pcre2-xxx/configure

PCRE2 is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE2
library. They are also documented in the pcre2build man page.

. By default, both shared and static libraries are built. You can change this
  by adding one of these options to the "configure" command:

    --disable-shared
    --disable-static

  Setting --disable-shared ensures that PCRE2 libraries are built as static
  libraries. The binaries that are then created as part of the build process
  (for example, pcre2test and pcre2grep) are linked statically with one or more
  PCRE2 libraries, but may also be dynamically linked with other libraries such
  as libc. If you want these binaries to be fully statically linked, you can
  set LDFLAGS like this:

    LDFLAGS=--static ./configure --disable-shared

  Note the two hyphens in --static.
  Of course, this works only if static
  versions of all the relevant libraries are available for linking. See also
  "Shared libraries" below.

. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
  the "configure" command, the 16-bit library is also built. If you add
  --enable-pcre2-32 to the "configure" command, the 32-bit library is also
  built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
  to disable building the 8-bit library.

. If you want to include support for just-in-time (JIT) compiling, which can
  give large performance improvements on certain platforms, add --enable-jit to
  the "configure" command. This support is available only for certain hardware
  architectures. If you try to enable it on an unsupported architecture, there
  will be a compile time error. If in doubt, use --enable-jit=auto, which
  enables JIT only if the current hardware is supported.

. If you are enabling JIT in an SELinux environment, you may also want to add
  --enable-jit-sealloc, which enables the use of an executable memory allocator
  that is compatible with SELinux. Warning: this allocator is experimental!
  It does not support the fork() operation and may crash when no disk space is
  available. This option has no effect if JIT is disabled.

. If you do not want to make use of the default support for UTF-8 Unicode
  character strings in the 8-bit library, UTF-16 Unicode character strings in
  the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
  library, you can add --disable-unicode to the "configure" command. This
  reduces the size of the libraries. It is not possible to configure one
  library with Unicode support, and another without, in the same configuration.
  It is also not possible to use --enable-ebcdic (see below) with Unicode
  support, so if this option is set, you must also use --disable-unicode.

  When Unicode support is available, the use of a UTF encoding still has to be
  enabled by setting the PCRE2_UTF option at run time or starting a pattern
  with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
  be either ASCII or UTF-8/16/32, even when running on EBCDIC platforms.

  As well as supporting UTF strings, Unicode support includes support for the
  \P, \p, and \X sequences that recognize Unicode character properties.
  However, only a subset of Unicode properties are supported; see the
  pcre2pattern man page for details. Escape sequences such as \d and \w in
  patterns do not by default make use of Unicode properties, but can be made to
  do so by setting the PCRE2_UCP option or starting a pattern with (*UCP).

. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
  of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
  character as indicating the end of a line. Whatever you specify at build time
  is the default; the caller of PCRE2 can change the selection at run time. The
  default newline indicator is a single LF character (the Unix standard). You
  can specify the default newline indicator by adding --enable-newline-is-cr,
  --enable-newline-is-lf, --enable-newline-is-crlf,
  --enable-newline-is-anycrlf, --enable-newline-is-any, or
  --enable-newline-is-nul to the "configure" command, respectively.

. By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE2 considers
  to be the end of a line (see above). However, the caller of PCRE2 can
  restrict \R to match only CR, LF, or CRLF. You can make this the default by
  adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

. In a pattern, the escape sequence \C matches a single code unit, even in a
  UTF mode.
  This can be dangerous because it breaks up multi-code-unit
  characters. You can build PCRE2 with the use of \C permanently locked out by
  adding --enable-never-backslash-C (note the upper case C) to the "configure"
  command. When \C is allowed by the library, individual applications can lock
  it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.

. PCRE2 has a counter that limits the depth of nesting of parentheses in a
  pattern. This limits the amount of system stack that a pattern uses when it
  is compiled. The default is 250, but you can change it by setting, for
  example,

    --with-parens-nest-limit=500

. PCRE2 has a counter that can be set to limit the amount of computing resource
  it uses when matching a pattern. If the limit is exceeded during a match, the
  match fails. The default is ten million. You can change the default by
  setting, for example,

    --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
  discussion in the pcre2api man page (search for pcre2_set_match_limit).

. There is a separate counter that limits the depth of nested backtracking
  (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
  matching process, which indirectly limits the amount of heap memory that is
  used, and in the case of pcre2_dfa_match() the amount of stack as well. This
  counter also has a default of ten million, which is essentially "unlimited".
  You can change the default by setting, for example,

    --with-match-limit-depth=5000

  There is more discussion in the pcre2api man page (search for
  pcre2_set_depth_limit).

. You can also set an explicit limit on the amount of heap memory used by
  the pcre2_match() and pcre2_dfa_match() interpreters:

    --with-heap-limit=500

  The units are kibibytes (units of 1024 bytes). This limit does not apply when
  the JIT optimization (which has its own memory control features) is used.
  There is more discussion on the pcre2api man page (search for
  pcre2_set_heap_limit).

. In the 8-bit library, the default maximum compiled pattern size is around
  64 kibibytes. You can increase this by adding --with-link-size=3 to the
  "configure" command. PCRE2 then uses three bytes instead of two for offsets
  to different parts of the compiled pattern. In the 16-bit library,
  --with-link-size=3 is the same as --with-link-size=4, which (in both
  libraries) uses four-byte offsets. Increasing the internal link size reduces
  performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
  link size setting is ignored, as 4-byte offsets are always used.

. Lookbehind assertions in which one or more branches can match a variable
  number of characters are supported only if there is a maximum matching length
  for each top-level branch. There is a limit to this maximum that defaults to
  255 characters. You can alter this default by a setting such as

    --with-max-varlookbehind=100

  The limit can be changed at runtime by calling pcre2_set_max_varlookbehind().
  Lookbehind assertions in which every branch matches a fixed number of
  characters (not necessarily all the same) are not constrained by this limit.

. For speed, PCRE2 uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution.
  If you specify

    --enable-rebuild-chartables

  a program called pcre2_dftables is compiled and run in the default C locale
  when you obey "make". It builds a source file called pcre2_chartables.c. If
  you do not specify this option, pcre2_chartables.c is created as a copy of
  pcre2_chartables.c.dist. See "Character tables" below for further
  information.

. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
  character code (as opposed to ASCII/Unicode) by specifying

    --enable-ebcdic --disable-unicode

  This automatically implies --enable-rebuild-chartables (see above). However,
  when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
  both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
  which specifies that the code value for the EBCDIC NL character is 0x25
  instead of the default 0x15.

. If you specify --enable-debug, additional debugging code is included in the
  build. This option is intended for use by the PCRE2 maintainers.

. In environments where valgrind is installed, if you specify

    --enable-valgrind

  PCRE2 will use valgrind annotations to mark certain memory regions as
  unaddressable. This allows it to detect invalid memory accesses, and is
  mostly useful for debugging PCRE2 itself.

. In environments where the gcc compiler is used and lcov is installed, if you
  specify

    --enable-coverage

  the build process implements a code coverage report for the test suite. The
  report is generated by running "make coverage". If ccache is installed on
  your system, it must be disabled when building PCRE2 for coverage reporting.
  You can do this by setting the environment variable CCACHE_DISABLE=1 before
  running "make" to build PCRE2. There is more information about coverage
  reporting in the "pcre2build" documentation.

. When JIT support is enabled, pcre2grep automatically makes use of it, unless
  you add --disable-pcre2grep-jit to the "configure" command.

. There is support for calling external programs during matching in the
  pcre2grep command, using PCRE2's callout facility with string arguments. This
  support can be disabled by adding --disable-pcre2grep-callout to the
  "configure" command. There are two kinds of callout: one that generates
  output from inbuilt code, and another that calls an external program. The
  latter has special support for Windows and VMS; otherwise it assumes the
  existence of the fork() function. This facility can be disabled by adding
  --disable-pcre2grep-callout-fork to the "configure" command.

. The pcre2grep program currently supports only 8-bit data files, and so
  requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
  libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
  specifying one or both of

    --enable-pcre2grep-libz
    --enable-pcre2grep-libbz2

  Of course, the relevant libraries must be installed on your system.

. The default starting size (in bytes) of the internal buffer used by pcre2grep
  can be set by, for example:

    --with-pcre2grep-bufsize=51200

  The value must be a plain integer. The default is 20480. The amount of memory
  used by pcre2grep is actually three times this number, to allow for "before"
  and "after" lines. If very long lines are encountered, the buffer is
  automatically enlarged, up to a fixed maximum size.

. The default maximum size of pcre2grep's internal buffer can be set by, for
  example:

    --with-pcre2grep-max-bufsize=2097152

  The default is either 1048576 or the value of --with-pcre2grep-bufsize,
  whichever is the larger.

. It is possible to compile pcre2test so that it links with the libreadline
  or libedit libraries, by specifying, respectively,

    --enable-pcre2test-libreadline or --enable-pcre2test-libedit

  If this is done, when pcre2test's input is from a terminal, it reads it using
  the readline() function. This provides line-editing and history facilities.
  Note that libreadline is GPL-licenced, so if you distribute a binary of
  pcre2test linked in this way, there may be licensing issues. These can be
  avoided by linking with libedit (which has a BSD licence) instead.

  Enabling libreadline causes the -lreadline option to be added to the
  pcre2test build. In many operating environments with a system-installed
  readline library this is sufficient. However, in some environments (e.g. if
  an unmodified distribution version of readline is in use), it may be
  necessary to specify something like LIBS="-lncurses" as well. This is
  because, to quote the readline INSTALL, "Readline uses the termcap functions,
  but does not link with the termcap or curses library itself, allowing
  applications which link with readline the option to choose an appropriate
  library." If you get error messages about missing functions tgetstr, tgetent,
  tputs, tgetflag, or tgoto, this is the problem, and linking with the ncurses
  library should fix it.

. The C99 standard defines formatting modifiers z and t for size_t and
  ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in
  environments other than Microsoft Visual Studio versions earlier than 2013
  when __STDC_VERSION__ is defined and has a value greater than or equal to
  199901L (indicating C99). However, there is at least one environment that
  claims to be C99 but does not support these modifiers. If
  --disable-percent-zt is specified, no use is made of the z or t modifiers.
  Instead of %td or %zu, %lu is used, with a cast for size_t values.

. There is a special option called --enable-fuzz-support for use by people who
  want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
  library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
  be built, but not installed. This contains a single function called
  LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
  length of the string. When called, this function tries to compile the string
  as a pattern, and if that succeeds, to match it. This is done both with no
  options and with some random options bits that are generated from the string.
  Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
  be created. This is normally run under valgrind or used when PCRE2 is
  compiled with address sanitizing enabled. It calls the fuzzing function and
  outputs information about what it is doing. The input strings are specified
  by arguments: if an argument starts with "=" the rest of it is a literal
  input string. Otherwise, it is assumed to be a file name, and the contents
  of the file are the test string.

. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
  which caused pcre2_match() to use individual blocks on the heap for
  backtracking instead of recursive function calls (which use the stack). This
  is now obsolete because pcre2_match() was refactored always to use the heap
  (in a much more efficient way than before). This option is retained for
  backwards compatibility, but has no effect other than to output a warning.

The "configure" script builds the following files for the basic C library:

. Makefile             the makefile that builds the library
. src/config.h         build-time configuration options for the library
. src/pcre2.h          the public PCRE2 header file
. pcre2-config         script that shows the building settings such as CFLAGS
                         that were set for "configure"
. libpcre2-8.pc        )
. libpcre2-16.pc       ) data for the pkg-config command
. libpcre2-32.pc       )
. libpcre2-posix.pc    )
. libtool              script that builds shared and/or static libraries

Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
tarballs under the names config.h.generic and pcre2.h.generic. These are
provided for those who have to build PCRE2 without using "configure" or CMake.
If you use "configure" or CMake, the .generic versions are not used.

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". This builds whichever of the
libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
program called pcre2test. If you enabled JIT support with --enable-jit, another
test program called pcre2_jit_test is built as well. If the 8-bit library is
built, libpcre2-posix, pcre2posix_test, and the pcre2grep command are also
built. Running "make" with the -j option may speed up compilation on
multiprocessor systems.

The command "make check" runs all the appropriate tests. Details of the PCRE2
tests are given below in a separate section of this document. The -j option of
"make" can also be used when running the tests.

You can use "make install" to install PCRE2 into live directories on your
system.
The following are installed (file names are all relative to the
<prefix> that is set when "configure" is run):

  Commands (bin):
    pcre2test
    pcre2grep (if 8-bit support is enabled)
    pcre2-config

  Libraries (lib):
    libpcre2-8      (if 8-bit support is enabled)
    libpcre2-16     (if 16-bit support is enabled)
    libpcre2-32     (if 32-bit support is enabled)
    libpcre2-posix  (if 8-bit support is enabled)

  Configuration information (lib/pkgconfig):
    libpcre2-8.pc
    libpcre2-16.pc
    libpcre2-32.pc
    libpcre2-posix.pc

  Header files (include):
    pcre2.h
    pcre2posix.h

  Man pages (share/man/man{1,3}):
    pcre2grep.1
    pcre2test.1
    pcre2-config.1
    pcre2.3
    pcre2*.3 (lots more pages, all starting "pcre2")

  HTML documentation (share/doc/pcre2/html):
    index.html
    *.html (lots more pages, hyperlinked from index.html)

  Text file documentation (share/doc/pcre2):
    AUTHORS
    COPYING
    ChangeLog
    LICENCE
    NEWS
    README
    pcre2.txt         (a concatenation of the man(3) pages)
    pcre2test.txt     the pcre2test man page
    pcre2grep.txt     the pcre2grep man page
    pcre2-config.txt  the pcre2-config man page

If you want to remove PCRE2 from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.


Retrieving configuration information
------------------------------------

Running "make install" installs the command pcre2-config, which can be used to
recall information about the PCRE2 configuration and installation. For example:

  pcre2-config --version

prints the version number, and

  pcre2-config --libs8

outputs information about where the 8-bit library is installed.
This command
can be included in makefiles for programs that use PCRE2, saving the programmer
from having to remember too many details. Run pcre2-config with no arguments to
obtain a list of possible arguments.

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --libs libpcre2-16

The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig.


Shared libraries
----------------

The default distribution builds PCRE2 as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcre2test and pcre2grep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcre2grep and pcre2test are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.

To build PCRE2 using static libraries only you must use --disable-shared when
configuring it. For example:

./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.
Note, however, that when you build only static
libraries, binary programs such as pcre2test and pcre2grep may still be
dynamically linked with other libraries (for example, libc) unless you set
LDFLAGS to --static when running "configure".


Cross-compiling using autotools
-------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE2 for some other host. However, you should NOT
specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c
source file is compiled and run on the local host, in order to generate the
inbuilt character tables (the pcre2_chartables.c file). This will probably not
work, because pcre2_dftables.c needs to be compiled with the local compiler,
not the cross compiler.

When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
created by making a copy of pcre2_chartables.c.dist, which is a default set of
tables that assumes ASCII code. Cross-compiling with the default tables should
not be a problem.

If you need to modify the character tables when cross-compiling, you should
move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
hand and run it on the local host to make a new version of
pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
at build time" for more details.


Making new tarballs
-------------------

The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
zip formats. The command "make distcheck" does the same, but then does a trial
build of the new distribution to ensure that it works.

If you have modified any of the man page sources in the doc directory, you
should first run the PrepareRelease script before making a distribution. This
script creates the .txt and HTML forms of the documentation from the man pages.


Testing PCRE2
-------------

To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
There is another script called RunGrepTest that tests the pcre2grep command.
When the 8-bit library is built, a test program for the POSIX wrapper, called
pcre2posix_test, is compiled, and when JIT support is enabled, a test program
called pcre2_jit_test is built. The scripts and the program tests are all run
when you obey "make check". For other environments, see the instructions in
NON-AUTOTOOLS-BUILD.

The RunTest script runs the pcre2test test program (which is documented in its
own man page) on each of the relevant testinput files in the testdata
directory, and compares the output with the contents of the corresponding
testoutput files. RunTest uses a file called testtry to hold the main output
from pcre2test. Other files whose names begin with "test" are used as working
files in some tests.

Some tests are relevant only when certain build-time options were selected. For
example, the tests for UTF-8/16/32 features are run only when Unicode support
is available. RunTest outputs a comment when it skips a test.

Many (but not all) of the tests that are not skipped are run twice if JIT
support is available. On the second run, JIT compilation is forced. This
testing can be suppressed by putting "-nojit" on the RunTest command line.

The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
libraries that are enabled. If you want to run just one set of tests, call
RunTest with either the -8, -16 or -32 option.

If valgrind is installed, you can run the tests under it by putting "-valgrind"
on the RunTest command line.
To run pcre2test on just one or more specific test 646files, give their numbers as arguments to RunTest, for example: 647 648 RunTest 2 7 11 649 650You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the 651end), or a number preceded by ~ to exclude a test. For example: 652 653 Runtest 3-15 ~10 654 655This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests 656except test 13. Whatever order the arguments are in, the tests are always run 657in numerical order. 658 659You can also call RunTest with the single argument "list" to cause it to output 660a list of tests. 661 662The test sequence starts with "test 0", which is a special test that has no 663input file, and whose output is not checked. This is because it will be 664different on different hardware and with different configurations. The test 665exists in order to exercise some of pcre2test's code that would not otherwise 666be run. 667 668Tests 1 and 2 can always be run, as they expect only plain text strings (not 669UTF) and make no use of Unicode properties. The first test file can be fed 670directly into the perltest.sh script to check that Perl gives the same results. 671The only difference you should see is in the first few lines, where the Perl 672version is given instead of the PCRE2 version. The second set of tests check 673auxiliary functions, error detection, and run-time flags that are specific to 674PCRE2. It also uses the debugging flags to check some of the internals of 675pcre2_compile(). 676 677If you build PCRE2 with a locale setting that is not the standard C locale, the 678character tables may be different (see next paragraph). In some cases, this may 679cause failures in the second set of tests. 
For example, in a locale where the 680isprint() function yields TRUE for characters in the range 128-255, the use of 681[:isascii:] inside a character class defines a different set of characters, and 682this shows up in this test as a difference in the compiled code, which is being 683listed for checking. For example, where the comparison test output contains 684[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other 685cases. This is not a bug in PCRE2. 686 687Test 3 checks pcre2_maketables(), the facility for building a set of character 688tables for a specific locale and using them instead of the default tables. The 689script uses the "locale" command to check for the availability of the "fr_FR", 690"french", or "fr" locale, and uses the first one that it finds. If the "locale" 691command fails, or if its output doesn't include "fr_FR", "french", or "fr" in 692the list of available locales, the third test cannot be run, and a comment is 693output to say why. If running this test produces an error like this: 694 695 ** Failed to set locale "fr_FR" 696 697it means that the given locale is not available on your system, despite being 698listed by "locale". This does not mean that PCRE2 is broken. There are three 699alternative output files for the third test, because three different versions 700of the French locale have been encountered. The test passes if its output 701matches any one of them. 702 703Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible 704with the perltest.sh script, and test 5 checking PCRE2-specific things. 705 706Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in 707non-UTF mode and UTF-mode with Unicode property support, respectively. 708 709Test 8 checks some internal offsets and code size features, but it is run only 710when Unicode support is enabled. 
The output is different in 8-bit, 16-bit, and 71132-bit modes and for different link sizes, so there are different output files 712for each mode and link size. 713 714Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in 71516-bit and 32-bit modes. These are tests that generate different output in 7168-bit mode. Each pair are for general cases and Unicode support, respectively. 717 718Test 13 checks the handling of non-UTF characters greater than 255 by 719pcre2_dfa_match() in 16-bit and 32-bit modes. 720 721Test 14 contains some special UTF and UCP tests that give different output for 722different code unit widths. 723 724Test 15 contains a number of tests that must not be run with JIT. They check, 725among other non-JIT things, the match-limiting features of the interpretive 726matcher. 727 728Test 16 is run only when JIT support is not available. It checks that an 729attempt to use JIT has the expected behaviour. 730 731Test 17 is run only when JIT support is available. It checks JIT complete and 732partial modes, match-limiting under JIT, and other JIT-specific features. 733 734Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to 735the 8-bit library, without and with Unicode support, respectively. 736 737Test 20 checks the serialization functions by writing a set of compiled 738patterns to a file, and then reloading and checking them. 739 740Tests 21 and 22 test \C support when the use of \C is not locked out, without 741and with UTF support, respectively. Test 23 tests \C when it is locked out. 742 743Tests 24 and 25 test the experimental pattern conversion functions, without and 744with UTF support, respectively. 745 746Test 26 checks Unicode property support using tests that are generated 747automatically from the Unicode data tables. 
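
The test-selection arguments described earlier in this section (explicit
numbers, ranges such as 3-6 or 3-, and ~N exclusions, with the tests always run
in numerical order) can be modelled with a small shell function. This is only
an illustrative sketch of those rules, not code taken from RunTest: the names
select_tests and MAX_TEST are hypothetical, and the upper limit of 26 is an
assumption based on the highest test number described above. Options such as
-8 or -nojit and the "list" argument are not modelled.

```shell
#!/bin/sh
# Illustrative model of RunTest's test-selection arguments.
# NOT the real RunTest code: select_tests and MAX_TEST are hypothetical.

MAX_TEST=26   # assumption: highest test number described in this README

# Print the selected test numbers on one line, in numerical order.
select_tests() {
  include=""   # numbers selected explicitly or by a range
  exclude=""   # numbers excluded with ~N
  for arg in "$@"; do
    case $arg in
      "~"*)                       # ~N excludes test N
        exclude="$exclude ${arg#?}" ;;
      *-*)                        # A-B, or A- meaning A to the end
        first=${arg%%-*}
        last=${arg#*-}
        [ -n "$last" ] || last=$MAX_TEST
        n=$first
        while [ "$n" -le "$last" ]; do
          include="$include $n"
          n=$((n + 1))
        done ;;
      *)                          # a single test number
        include="$include $arg" ;;
    esac
  done

  # If only exclusions were given, start from the full set of tests.
  if [ -z "$include" ]; then
    n=0
    while [ "$n" -le "$MAX_TEST" ]; do
      include="$include $n"
      n=$((n + 1))
    done
  fi

  # Sort numerically, drop duplicates, then remove excluded numbers.
  result=""
  for n in $(printf '%s\n' $include | sort -n -u); do
    case " $exclude " in
      *" $n "*) ;;                # excluded: skip it
      *) result="$result $n" ;;
    esac
  done
  echo "${result# }"
}

select_tests 11 2 7       # prints: 2 7 11
select_tests 3-15 "~10"   # prints: 3 4 5 6 7 8 9 11 12 13 14 15
```

The ~10 argument is quoted in the sketch only to rule out tilde expansion by
the calling shell; the argument order makes no difference to the result because
the list is sorted before the exclusions are applied.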


Character tables
----------------

For speed, PCRE2 uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, a set of tables that is
built into the library is used. The pcre2_maketables() function can be called
by an application to create a new set of tables in the current locale. These
are passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer
into a compile context.

The source file called pcre2_chartables.c contains the default set of tables.
By default, this is created as a copy of pcre2_chartables.c.dist, which
contains tables for ASCII coding. However, if --enable-rebuild-chartables is
specified for ./configure, a new version of pcre2_chartables.c is built by the
program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale that is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated. The best way to do this is to
move pcre2_chartables.c.dist out of the way and replace it with your customized
tables.

When the pcre2_dftables program is run as a result of specifying
--enable-rebuild-chartables, it uses the default C locale that is set on your
system. It does not pay attention to the LC_xxx environment variables. In other
words, it uses the system's default locale rather than whatever the compiling
user happens to have set. If you really do want to build a source set of
character tables in a locale that is specified by the LC_xxx variables, you can
run the pcre2_dftables program by hand with the -L option. For example:

  ./pcre2_dftables -L pcre2_chartables.c.special

The second argument names the file where the source code for the tables is
written. The first two 256-byte tables provide lower casing and case flipping
functions, respectively. The next table consists of a number of 32-byte bit
maps which identify certain character classes such as digits, "word"
characters, white space, etc. These are used when building 32-byte bit maps
that represent character classes for code points less than 256. The final
256-byte table has bits indicating various character types, as follows:

    1   white space character
    2   letter
    4   lower case letter
    8   decimal digit
   16   alphanumeric or '_'

You can also specify -b (with or without -L) when running pcre2_dftables. This
causes the tables to be written in binary instead of as source code. A set of
binary tables can be loaded into memory by an application and passed to
pcre2_compile() in the same way as tables created dynamically by calling
pcre2_maketables(). The tables are just a string of bytes, independent of
hardware characteristics such as endianness. This means they can be bundled
with an application that runs in different environments, to ensure consistent
behaviour.

See also the pcre2build section "Creating character tables at build time".


File manifest
-------------

The distribution should contain the files listed below.

(A) Source files for the PCRE2 library functions and their headers are found in
    the src directory:

  src/pcre2_dftables.c     auxiliary program for building pcre2_chartables.c
                           when --enable-rebuild-chartables is specified

  src/pcre2_chartables.c.dist  a default set of character tables that assume
                           ASCII coding; unless --enable-rebuild-chartables is
                           specified, used by copying to pcre2_chartables.c

  src/pcre2posix.c         )
  src/pcre2_auto_possess.c )
  src/pcre2_chkdint.c      )
  src/pcre2_compile.c      )
  src/pcre2_config.c       )
  src/pcre2_context.c      )
  src/pcre2_convert.c      )
  src/pcre2_dfa_match.c    )
  src/pcre2_error.c        )
  src/pcre2_extuni.c       )
  src/pcre2_find_bracket.c )
  src/pcre2_jit_compile.c  )
  src/pcre2_jit_match.c    ) sources for the functions in the library,
  src/pcre2_jit_misc.c     )   and some internal functions that they use
  src/pcre2_maketables.c   )
  src/pcre2_match.c        )
  src/pcre2_match_data.c   )
  src/pcre2_newline.c      )
  src/pcre2_ord2utf.c      )
  src/pcre2_pattern_info.c )
  src/pcre2_script_run.c   )
  src/pcre2_serialize.c    )
  src/pcre2_string_utils.c )
  src/pcre2_study.c        )
  src/pcre2_substitute.c   )
  src/pcre2_substring.c    )
  src/pcre2_tables.c       )
  src/pcre2_ucd.c          )
  src/pcre2_ucptables.c    )
  src/pcre2_valid_utf.c    )
  src/pcre2_xclass.c       )

  src/pcre2_printint.c     debugging function that is used by pcre2test
  src/pcre2_fuzzsupport.c  function for (optional) fuzzing support

  src/config.h.in          template for config.h, when built by "configure"
  src/pcre2.h.in           template for pcre2.h when built by "configure"
  src/pcre2posix.h         header for the external POSIX wrapper API
  src/pcre2_internal.h     header for internal use
  src/pcre2_intmodedep.h   a mode-specific internal header
  src/pcre2_jit_neon_inc.h header used by JIT
  src/pcre2_jit_simd_inc.h header used by JIT
  src/pcre2_ucp.h          header for Unicode property handling

  sljit/*                  source files for the JIT compiler

(B) Source files for programs that use PCRE2:

  src/pcre2demo.c          simple demonstration of coding calls to PCRE2
  src/pcre2grep.c          source of a grep utility that uses PCRE2
  src/pcre2test.c          comprehensive test program
  src/pcre2_jit_test.c     JIT test program
  src/pcre2posix_test.c    POSIX wrapper API test program

(C) Auxiliary files:

  132html                  script to turn "man" pages into HTML
  AUTHORS                  information about the author of PCRE2
  ChangeLog                log of changes to the code
  CleanTxt                 script to clean nroff output for txt man pages
  Detrail                  script to remove trailing spaces
  HACKING                  some notes about the internals of PCRE2
  INSTALL                  generic installation instructions
  LICENCE                  conditions for the use of PCRE2
  COPYING                  the same, using GNU's standard name
  Makefile.in              ) template for Unix Makefile, which is built by
                           )   "configure"
  Makefile.am              ) the automake input that was used to create
                           )   Makefile.in
  NEWS                     important changes in this release
  NON-AUTOTOOLS-BUILD      notes on building PCRE2 without using autotools
  PrepareRelease           script to make preparations for "make dist"
  README                   this file
  RunTest                  a Unix shell script for running tests
  RunGrepTest              a Unix shell script for pcre2grep tests
  aclocal.m4               m4 macros (generated by "aclocal")
  config.guess             ) files used by libtool,
  config.sub               )   used only when building a shared library
  configure                a configuring shell script (built by autoconf)
  configure.ac             ) the autoconf input that was used to build
                           )   "configure" and config.h
  depcomp                  ) script to find program dependencies, generated by
                           )   automake
  doc/*.3                  man page sources for PCRE2
  doc/*.1                  man page sources for pcre2grep and pcre2test
  doc/index.html.src       the base HTML page
  doc/html/*               HTML documentation
  doc/pcre2.txt            plain text version of the man pages
  doc/pcre2test.txt        plain text documentation of test program
  install-sh               a shell script for installing files
  libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
  libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
  libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
  libpcre2-posix.pc.in     template for libpcre2-posix.pc for pkg-config
  ltmain.sh                file used to build a libtool script
  missing                  ) common stub for a few missing GNU programs while
                           )   installing, generated by automake
  mkinstalldirs            script for making install directories
  perltest.sh              script for running a Perl test program
  pcre2-config.in          source of script which retains PCRE2 information
  testdata/testinput*      test data for main library tests
  testdata/testoutput*     expected test results
  testdata/grep*           input and output for pcre2grep tests
  testdata/*               other supporting test files

(D) Auxiliary files for cmake support

  cmake/COPYING-CMAKE-SCRIPTS
  cmake/FindPackageHandleStandardArgs.cmake
  cmake/FindEditline.cmake
  cmake/FindReadline.cmake
  CMakeLists.txt
  config-cmake.h.in

(E) Auxiliary files for building PCRE2 "by hand"

  src/pcre2.h.generic      ) a version of the public PCRE2 header file
                           )   for use in non-"configure" environments
  src/config.h.generic     ) a version of config.h for use in non-"configure"
                           )   environments

(F) Auxiliary files for building PCRE2 under OpenVMS

  vms/configure.com        )
  vms/openvms_readme.txt   ) These files were contributed by a PCRE2 user.
  vms/pcre2.h_patch        )
  vms/stdint.h             )

Philip Hazel
Email local part: Philip.Hazel
Email domain:     gmail.com
Last updated: 15 April 2024