RE2 regular expression syntax reference
---------------------------------------

Single characters:
.	any character, possibly including newline (s=true)
[xyz]	character class
[^xyz]	negated character class
\d	Perl character class
\D	negated Perl character class
[[:alpha:]]	ASCII character class
[[:^alpha:]]	negated ASCII character class
\pN	Unicode character class (one-letter name)
\p{Greek}	Unicode character class
\PN	negated Unicode character class (one-letter name)
\P{Greek}	negated Unicode character class
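
The single-character forms above can be tried out with a short Go sketch. This is an illustration, not part of the RE2 documentation: it assumes Go's standard «regexp» package, which accepts the syntax described here except for «\C», and the helper name «fullMatch» is made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch reports whether the whole input matches pattern.
// The pattern is wrapped in \A(?:...)\z so partial matches do not count.
// (Hypothetical helper, not an RE2 or Go API.)
func fullMatch(pattern, input string) bool {
	re := regexp.MustCompile(`\A(?:` + pattern + `)\z`)
	return re.MatchString(input)
}

func main() {
	fmt.Println(fullMatch(`[xyz]`, "y"))        // true: character class
	fmt.Println(fullMatch(`[^xyz]`, "y"))       // false: negated character class
	fmt.Println(fullMatch(`\d`, "7"))           // true: Perl character class
	fmt.Println(fullMatch(`[[:alpha:]]`, "q"))  // true: ASCII character class
	fmt.Println(fullMatch(`[[:^alpha:]]`, "q")) // false: negated ASCII class
	fmt.Println(fullMatch(`\pN`, "7"))          // true: one-letter Unicode class
	fmt.Println(fullMatch(`\p{Greek}`, "λ"))    // true: Unicode script class
	fmt.Println(fullMatch(`\P{Greek}`, "λ"))    // false: negated Unicode class
	fmt.Println(fullMatch(`.`, "\n"))           // false: s is off by default
	fmt.Println(fullMatch(`(?s).`, "\n"))       // true: s=true
}
```

The last two lines show why «.» is described as "possibly including newline": it matches «\n» only when the «s» flag is set.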

Composites:
xy	«x» followed by «y»
x|y	«x» or «y» (prefer «x»)

Repetitions:
x*	zero or more «x», prefer more
x+	one or more «x», prefer more
x?	zero or one «x», prefer one
x{n,m}	«n» or «n»+1 or ... or «m» «x», prefer more
x{n,}	«n» or more «x», prefer more
x{n}	exactly «n» «x»
x*?	zero or more «x», prefer fewer
x+?	one or more «x», prefer fewer
x??	zero or one «x», prefer zero
x{n,m}?	«n» or «n»+1 or ... or «m» «x», prefer fewer
x{n,}?	«n» or more «x», prefer fewer
x{n}?	exactly «n» «x»
x{}	(== x*) NOT SUPPORTED vim
x{-}	(== x*?) NOT SUPPORTED vim
x{-n}	(== x{n}?) NOT SUPPORTED vim
x=	(== x?) NOT SUPPORTED vim
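
The difference between "prefer more" and "prefer fewer" is easiest to see side by side. The sketch below assumes Go's standard «regexp» package as a stand-in for an RE2-syntax engine; «firstMatch» is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in input,
// or "" when nothing matches. (Hypothetical helper.)
func firstMatch(pattern, input string) string {
	return regexp.MustCompile(pattern).FindString(input)
}

func main() {
	fmt.Printf("%q\n", firstMatch(`a+`, "aaaa"))             // "aaaa": prefer more
	fmt.Printf("%q\n", firstMatch(`a+?`, "aaaa"))            // "a": prefer fewer
	fmt.Printf("%q\n", firstMatch(`a*?`, "aaaa"))            // "": zero repetitions suffice
	fmt.Printf("%q\n", firstMatch(`a{2,3}`, "aaaa"))         // "aaa"
	fmt.Printf("%q\n", firstMatch(`a{2,3}?`, "aaaa"))        // "aa"
	fmt.Printf("%q\n", firstMatch(`<.+>`, "<b>bold</b>"))    // "<b>bold</b>"
	fmt.Printf("%q\n", firstMatch(`<.+?>`, "<b>bold</b>"))   // "<b>"
}
```

Both forms accept the same set of strings; the preference only decides which of several possible matches is reported.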

Implementation restriction: The counting forms «x{n,m}», «x{n,}», and «x{n}»
reject forms that create a minimum or maximum repetition count above 1000.
Unlimited repetitions are not subject to this restriction.
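
A quick way to observe the restriction is to compile patterns on either side of the limit. The sketch assumes Go's standard «regexp» package, which enforces the same bound of 1000; other RE2-syntax implementations may report the error differently. «compiles» is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether pattern is accepted by the regexp compiler.
// (Hypothetical helper.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(compiles(`a{1000}`))   // true: at the limit
	fmt.Println(compiles(`a{1001}`))   // false: count above 1000
	fmt.Println(compiles(`a{2,1001}`)) // false: maximum above 1000
	fmt.Println(compiles(`a{1001,}`))  // false: minimum above 1000
	fmt.Println(compiles(`a{5,}`))     // true: unbounded maximum is fine
	fmt.Println(compiles(`a*`))        // true: unlimited repetition
}
```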

Possessive repetitions:
x*+	zero or more «x», possessive NOT SUPPORTED
x++	one or more «x», possessive NOT SUPPORTED
x?+	zero or one «x», possessive NOT SUPPORTED
x{n,m}+	«n» or ... or «m» «x», possessive NOT SUPPORTED
x{n,}+	«n» or more «x», possessive NOT SUPPORTED
x{n}+	exactly «n» «x», possessive NOT SUPPORTED

Grouping:
(re)	numbered capturing group (submatch)
(?P<name>re)	named & numbered capturing group (submatch)
(?<name>re)	named & numbered capturing group (submatch) NOT SUPPORTED
(?'name're)	named & numbered capturing group (submatch) NOT SUPPORTED
(?:re)	non-capturing group
(?flags)	set flags within current group; non-capturing
(?flags:re)	set flags during re; non-capturing
(?#text)	comment NOT SUPPORTED
(?|x|y|z)	branch numbering reset NOT SUPPORTED
(?>re)	possessive match of «re» NOT SUPPORTED
re@>	possessive match of «re» NOT SUPPORTED vim
%(re)	non-capturing group NOT SUPPORTED vim
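
The supported grouping forms can be combined in one pattern. The following sketch assumes Go's standard «regexp» package; the pattern «dateRE» and the helper «namedGroups» are invented for the example. It uses two «(?P<name>re)» groups, one plain numbered group, and one non-capturing group.

```go
package main

import (
	"fmt"
	"regexp"
)

// dateRE has three capturing groups: two named (year, month) and one
// unnamed (day). The (?:...) wrapper groups without capturing.
var dateRE = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})(?:-(\d{2}))?`)

// namedGroups returns a map from group name to captured text for the first
// match of re in s, or nil when there is no match. (Hypothetical helper.)
func namedGroups(re *regexp.Regexp, s string) map[string]string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" {
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	fmt.Println(dateRE.NumSubexp())                          // 3
	fmt.Println(namedGroups(dateRE, "released 2024-03-15"))  // map[month:03 year:2024]
	fmt.Println(dateRE.FindStringSubmatch("2024-03-15")[3])  // 15
}
```

Named groups are also numbered, so «year» is submatch 1 and «month» is submatch 2, while the non-capturing group does not consume a number.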

Flags:
i	case-insensitive (default false)
m	multi-line mode: «^» and «$» match begin/end line in addition to begin/end text (default false)
s	let «.» match «\n» (default false)
U	ungreedy: swap meaning of «x*» and «x*?», «x+» and «x+?», etc (default false)
Flag syntax is «xyz» (set) or «-xyz» (clear) or «xy-z» (set «xy», clear «z»).
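
The effect of the «i», «s», and «U» flags, and of the set/clear syntax, can be sketched as follows. The example assumes Go's standard «regexp» package; «findWithFlags» is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// findWithFlags returns the leftmost match of pattern in input,
// or "" when nothing matches. (Hypothetical helper.)
func findWithFlags(pattern, input string) string {
	return regexp.MustCompile(pattern).FindString(input)
}

func main() {
	fmt.Printf("%q\n", findWithFlags(`hello`, "Say HELLO"))     // ""
	fmt.Printf("%q\n", findWithFlags(`(?i)hello`, "Say HELLO")) // "HELLO"
	fmt.Printf("%q\n", findWithFlags(`a.b`, "a\nb"))            // ""
	fmt.Printf("%q\n", findWithFlags(`(?s)a.b`, "a\nb"))        // "a\nb"
	fmt.Printf("%q\n", findWithFlags(`(?U)a+`, "aaa"))          // "a"
	fmt.Printf("%q\n", findWithFlags(`(?U)a+?`, "aaa"))         // "aaa"
	fmt.Printf("%q\n", findWithFlags(`(?i:h)ello`, "HELLO"))    // "": i applies to «h» only
	fmt.Printf("%q\n", findWithFlags(`(?is-U)a.B+`, "A\nbbb"))  // "A\nbbb"
}
```

The «(?flags:re)» form limits the flags to the enclosed «re», whereas «(?flags)» applies to the rest of the current group.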

Empty strings:
^	at beginning of text or line («m»=true)
$	at end of text (like «\z» not «\Z») or line («m»=true)
\A	at beginning of text
\b	at ASCII word boundary («\w» on one side and «\W», «\A», or «\z» on the other)
\B	not at ASCII word boundary
\G	at beginning of subtext being searched NOT SUPPORTED pcre
\G	at end of last match NOT SUPPORTED perl
\Z	at end of text, or before newline at end of text NOT SUPPORTED
\z	at end of text
(?=re)	before text matching «re» NOT SUPPORTED
(?!re)	before text not matching «re» NOT SUPPORTED
(?<=re)	after text matching «re» NOT SUPPORTED
(?<!re)	after text not matching «re» NOT SUPPORTED
re&	before text matching «re» NOT SUPPORTED vim
re@=	before text matching «re» NOT SUPPORTED vim
re@!	before text not matching «re» NOT SUPPORTED vim
re@<=	after text matching «re» NOT SUPPORTED vim
re@<!	after text not matching «re» NOT SUPPORTED vim
\zs	sets start of match (= \K) NOT SUPPORTED vim
\ze	sets end of match NOT SUPPORTED vim
\%^	beginning of file NOT SUPPORTED vim
\%$	end of file NOT SUPPORTED vim
\%V	on screen NOT SUPPORTED vim
\%#	cursor position NOT SUPPORTED vim
\%'m	mark «m» position NOT SUPPORTED vim
\%23l	in line 23 NOT SUPPORTED vim
\%23c	in column 23 NOT SUPPORTED vim
\%23v	in virtual column 23 NOT SUPPORTED vim
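
The supported anchors behave as sketched below. The example assumes Go's standard «regexp» package; «findAll» is a hypothetical helper name. Note in particular that «$» without «m» acts like «\z», so it does not match before a trailing newline.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll returns every non-overlapping match of pattern in input.
// (Hypothetical helper.)
func findAll(pattern, input string) []string {
	return regexp.MustCompile(pattern).FindAllString(input, -1)
}

func main() {
	text := "one\ntwo\nthree"
	fmt.Printf("%q\n", findAll(`^\w+`, text))       // ["one"]
	fmt.Printf("%q\n", findAll(`(?m)^\w+`, text))   // ["one" "two" "three"]
	fmt.Printf("%q\n", findAll(`\w+$`, text))       // ["three"]
	fmt.Printf("%q\n", findAll(`(?m)\w+$`, text))   // ["one" "two" "three"]
	fmt.Printf("%q\n", findAll(`(?m)\A\w+`, text))  // ["one"]: \A ignores m
	fmt.Printf("%q\n", findAll(`\w+$`, "three\n"))  // []: $ is \z, not \Z

	fmt.Printf("%q\n", findAll(`\bcat\b`, "cat concat cat.")) // ["cat" "cat"]
	fmt.Printf("%q\n", findAll(`\Bcat`, "cat concat"))        // ["cat"], inside "concat"
}
```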

Escape sequences:
\a	bell (== \007)
\f	form feed (== \014)
\t	horizontal tab (== \011)
\n	newline (== \012)
\r	carriage return (== \015)
\v	vertical tab character (== \013)
\*	literal «*», for any punctuation character «*»
\123	octal character code (up to three digits)
\x7F	hex character code (exactly two digits)
\x{10FFFF}	hex character code
\C	match a single byte even in UTF-8 mode
\Q...\E	literal text «...» even if «...» has punctuation

\1	backreference NOT SUPPORTED
\b	backspace NOT SUPPORTED (use «\010»)
\cK	control char ^K NOT SUPPORTED (use «\001» etc)
\e	escape NOT SUPPORTED (use «\033»)
\g1	backreference NOT SUPPORTED
\g{1}	backreference NOT SUPPORTED
\g{+1}	backreference NOT SUPPORTED
\g{-1}	backreference NOT SUPPORTED
\g{name}	named backreference NOT SUPPORTED
\g<name>	subroutine call NOT SUPPORTED
\g'name'	subroutine call NOT SUPPORTED
\k<name>	named backreference NOT SUPPORTED
\k'name'	named backreference NOT SUPPORTED
\lX	lowercase «X» NOT SUPPORTED
\ux	uppercase «x» NOT SUPPORTED
\L...\E	lowercase text «...» NOT SUPPORTED
\K	reset beginning of «$0» NOT SUPPORTED
\N{name}	named Unicode character NOT SUPPORTED
\R	line break NOT SUPPORTED
\U...\E	upper case text «...» NOT SUPPORTED
\X	extended Unicode sequence NOT SUPPORTED

\%d123	decimal character 123 NOT SUPPORTED vim
\%xFF	hex character FF NOT SUPPORTED vim
\%o123	octal character 123 NOT SUPPORTED vim
\%u1234	Unicode character 0x1234 NOT SUPPORTED vim
\%U12345678	Unicode character 0x12345678 NOT SUPPORTED vim

Character class elements:
x	single character
A-Z	character range (inclusive)
\d	Perl character class
[:foo:]	ASCII character class «foo»
\p{Foo}	Unicode character class «Foo»
\pF	Unicode character class «F» (one-letter name)

Named character classes as character class elements:
[\d]	digits (== \d)
[^\d]	not digits (== \D)
[\D]	not digits (== \D)
[^\D]	not not digits (== \d)
[[:name:]]	named ASCII class inside character class (== [:name:])
[^[:name:]]	named ASCII class inside negated character class (== [:^name:])
[\p{Name}]	named Unicode property inside character class (== \p{Name})
[^\p{Name}]	named Unicode property inside negated character class (== \P{Name})

Perl character classes (all ASCII-only):
\d	digits (== [0-9])
\D	not digits (== [^0-9])
\s	whitespace (== [\t\n\f\r ])
\S	not whitespace (== [^\t\n\f\r ])
\w	word characters (== [0-9A-Za-z_])
\W	not word characters (== [^0-9A-Za-z_])
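
"ASCII-only" has practical consequences for non-English text, as the sketch below illustrates. It assumes Go's standard «regexp» package; «wholeMatch» is a hypothetical helper name, and the Unicode alternatives shown are one possible substitute rather than an exact equivalent of the Perl classes in other engines.

```go
package main

import (
	"fmt"
	"regexp"
)

// wholeMatch reports whether the entire input matches pattern.
// (Hypothetical helper.)
func wholeMatch(pattern, input string) bool {
	return regexp.MustCompile(`\A(?:` + pattern + `)\z`).MatchString(input)
}

func main() {
	arabicIndic := "\u0662\u0660\u0662\u0664" // the digits 2024 in Arabic-Indic form
	naive := "na\u00efve"                     // "naïve" with a precomposed ï

	fmt.Println(wholeMatch(`\d+`, "2024"))           // true
	fmt.Println(wholeMatch(`\d+`, arabicIndic))      // false: \d is [0-9] only
	fmt.Println(wholeMatch(`\p{Nd}+`, arabicIndic))  // true: Unicode decimal numbers
	fmt.Println(wholeMatch(`\w+`, naive))            // false: ï is not in [0-9A-Za-z_]
	fmt.Println(wholeMatch(`[\pL\pN_]+`, naive))     // true: Unicode letters and numbers
	fmt.Println(wholeMatch(`\s`, "\v"))              // false: \s excludes vertical tab
	fmt.Println(wholeMatch(`[[:space:]]`, "\v"))     // true: [[:space:]] includes it
	fmt.Println(wholeMatch(`\s`, "\u00a0"))          // false: no-break space is not ASCII
}
```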

\h	horizontal space NOT SUPPORTED
\H	not horizontal space NOT SUPPORTED
\v	vertical space NOT SUPPORTED
\V	not vertical space NOT SUPPORTED

ASCII character classes:
[[:alnum:]]	alphanumeric (== [0-9A-Za-z])
[[:alpha:]]	alphabetic (== [A-Za-z])
[[:ascii:]]	ASCII (== [\x00-\x7F])
[[:blank:]]	blank (== [\t ])
[[:cntrl:]]	control (== [\x00-\x1F\x7F])
[[:digit:]]	digits (== [0-9])
[[:graph:]]	graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
[[:lower:]]	lower case (== [a-z])
[[:print:]]	printable (== [ -~] == [ [:graph:]])
[[:punct:]]	punctuation (== [!-/:-@[-`{-~])
[[:space:]]	whitespace (== [\t\n\v\f\r ])
[[:upper:]]	upper case (== [A-Z])
[[:word:]]	word characters (== [0-9A-Za-z_])
[[:xdigit:]]	hex digit (== [0-9A-Fa-f])
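
As a small illustration of the ASCII classes, the sketch below filters a string down to the characters a class accepts. It assumes Go's standard «regexp» package; «keep» is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keep returns the characters of input that match class, in order.
// (Hypothetical helper; class is expected to match one character at a time.)
func keep(class, input string) string {
	re := regexp.MustCompile(class)
	return strings.Join(re.FindAllString(input, -1), "")
}

func main() {
	fmt.Printf("%q\n", keep(`[[:punct:]]`, "Hi, there! #42"))   // ",!#"
	fmt.Printf("%q\n", keep(`[[:upper:]]`, "Hi, there! #42"))   // "H"
	fmt.Printf("%q\n", keep(`[[:xdigit:]]`, "0xBEEF-cafe-zz"))  // "0BEEFcafe"
	fmt.Printf("%q\n", keep(`[[:word:]]`, "a_b-c"))             // "a_bc"
	fmt.Printf("%q\n", keep(`[[:^alnum:]]`, "a-b c"))           // "- "
}
```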

Unicode character class names--general category:
C	other
Cc	control
Cf	format
Cn	unassigned code points NOT SUPPORTED
Co	private use
Cs	surrogate
L	letter
LC	cased letter NOT SUPPORTED
L&	cased letter NOT SUPPORTED
Ll	lowercase letter
Lm	modifier letter
Lo	other letter
Lt	titlecase letter
Lu	uppercase letter
M	mark
Mc	spacing mark
Me	enclosing mark
Mn	non-spacing mark
N	number
Nd	decimal number
Nl	letter number
No	other number
P	punctuation
Pc	connector punctuation
Pd	dash punctuation
Pe	close punctuation
Pf	final punctuation
Pi	initial punctuation
Po	other punctuation
Ps	open punctuation
S	symbol
Sc	currency symbol
Sk	modifier symbol
Sm	math symbol
So	other symbol
Z	separator
Zl	line separator
Zp	paragraph separator
Zs	space separator

Unicode character class names--scripts:
Adlam
Ahom
Anatolian_Hieroglyphs
Arabic
Armenian
Avestan
Balinese
Bamum
Bassa_Vah
Batak
Bengali
Bhaiksuki
Bopomofo
Brahmi
Braille
Buginese
Buhid
Canadian_Aboriginal
Carian
Caucasian_Albanian
Chakma
Cham
Cherokee
Common
Coptic
Cuneiform
Cypriot
Cyrillic
Deseret
Devanagari
Dogra
Duployan
Egyptian_Hieroglyphs
Elbasan
Ethiopic
Georgian
Glagolitic
Gothic
Grantha
Greek
Gujarati
Gunjala_Gondi
Gurmukhi
Han
Hangul
Hanifi_Rohingya
Hanunoo
Hatran
Hebrew
Hiragana
Imperial_Aramaic
Inherited
Inscriptional_Pahlavi
Inscriptional_Parthian
Javanese
Kaithi
Kannada
Katakana
Kayah_Li
Kharoshthi
Khmer
Khojki
Khudawadi
Lao
Latin
Lepcha
Limbu
Linear_A
Linear_B
Lisu
Lycian
Lydian
Mahajani
Makasar
Malayalam
Mandaic
Manichaean
Marchen
Masaram_Gondi
Medefaidrin
Meetei_Mayek
Mende_Kikakui
Meroitic_Cursive
Meroitic_Hieroglyphs
Miao
Modi
Mongolian
Mro
Multani
Myanmar
Nabataean
New_Tai_Lue
Newa
Nko
Nushu
Ogham
Ol_Chiki
Old_Hungarian
Old_Italic
Old_North_Arabian
Old_Permic
Old_Persian
Old_Sogdian
Old_South_Arabian
Old_Turkic
Oriya
Osage
Osmanya
Pahawh_Hmong
Palmyrene
Pau_Cin_Hau
Phags_Pa
Phoenician
Psalter_Pahlavi
Rejang
Runic
Samaritan
Saurashtra
Sharada
Shavian
Siddham
SignWriting
Sinhala
Sogdian
Sora_Sompeng
Soyombo
Sundanese
Syloti_Nagri
Syriac
Tagalog
Tagbanwa
Tai_Le
Tai_Tham
Tai_Viet
Takri
Tamil
Tangut
Telugu
Thaana
Thai
Tibetan
Tifinagh
Tirhuta
Ugaritic
Vai
Warang_Citi
Yi
Zanabazar_Square
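
General-category and script names are used with «\p{...}» and «\P{...}» (or «\pX» for one-letter names). The sketch below assumes Go's standard «regexp» package, whose Unicode tables may differ in version from those of a given RE2 build; «runs» and the sample text are invented for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// runs returns every non-overlapping match of pattern in input.
// (Hypothetical helper.)
func runs(pattern, input string) []string {
	return regexp.MustCompile(pattern).FindAllString(input, -1)
}

func main() {
	text := "αβγ abc где 日本 42"

	// Script names.
	fmt.Printf("%q\n", runs(`\p{Greek}+`, text))    // ["αβγ"]
	fmt.Printf("%q\n", runs(`\p{Cyrillic}+`, text)) // ["где"]
	fmt.Printf("%q\n", runs(`\p{Han}+`, text))      // ["日本"]

	// General categories, including the one-letter form.
	fmt.Printf("%q\n", runs(`\pL+`, text))              // ["αβγ" "abc" "где" "日本"]
	fmt.Printf("%q\n", runs(`\p{Nd}+`, text))           // ["42"]
	fmt.Printf("%q\n", runs(`\p{Lu}`, "Hello World"))   // ["H" "W"]

	// Scripts and categories can be combined inside a character class.
	fmt.Printf("%q\n", runs(`[\p{Latin}\p{Nd}]+`, text)) // ["abc" "42"]
}
```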

Vim character classes:
\i	identifier character NOT SUPPORTED vim
\I	«\i» except digits NOT SUPPORTED vim
\k	keyword character NOT SUPPORTED vim
\K	«\k» except digits NOT SUPPORTED vim
\f	file name character NOT SUPPORTED vim
\F	«\f» except digits NOT SUPPORTED vim
\p	printable character NOT SUPPORTED vim
\P	«\p» except digits NOT SUPPORTED vim
\s	whitespace character (== [ \t]) NOT SUPPORTED vim
\S	non-white space character (== [^ \t]) NOT SUPPORTED vim
\d	digits (== [0-9]) vim
\D	not «\d» vim
\x	hex digits (== [0-9A-Fa-f]) NOT SUPPORTED vim
\X	not «\x» NOT SUPPORTED vim
\o	octal digits (== [0-7]) NOT SUPPORTED vim
\O	not «\o» NOT SUPPORTED vim
\w	word character vim
\W	not «\w» vim
\h	head of word character NOT SUPPORTED vim
\H	not «\h» NOT SUPPORTED vim
\a	alphabetic NOT SUPPORTED vim
\A	not «\a» NOT SUPPORTED vim
\l	lowercase NOT SUPPORTED vim
\L	not lowercase NOT SUPPORTED vim
\u	uppercase NOT SUPPORTED vim
\U	not uppercase NOT SUPPORTED vim
\_x	«\x» plus newline, for any «x» NOT SUPPORTED vim

Vim flags:
\c	ignore case NOT SUPPORTED vim
\C	match case NOT SUPPORTED vim
\m	magic NOT SUPPORTED vim
\M	nomagic NOT SUPPORTED vim
\v	verymagic NOT SUPPORTED vim
\V	verynomagic NOT SUPPORTED vim
\Z	ignore differences in Unicode combining characters NOT SUPPORTED vim

Magic:
(?{code})	arbitrary Perl code NOT SUPPORTED perl
(??{code})	postponed arbitrary Perl code NOT SUPPORTED perl
(?n)	recursive call to regexp capturing group «n» NOT SUPPORTED
(?+n)	recursive call to relative group «+n» NOT SUPPORTED
(?-n)	recursive call to relative group «-n» NOT SUPPORTED
(?C)	PCRE callout NOT SUPPORTED pcre
(?R)	recursive call to entire regexp (== (?0)) NOT SUPPORTED
(?&name)	recursive call to named group NOT SUPPORTED
(?P=name)	named backreference NOT SUPPORTED
(?P>name)	recursive call to named group NOT SUPPORTED
(?(cond)true|false)	conditional branch NOT SUPPORTED
(?(cond)true)	conditional branch NOT SUPPORTED
(*ACCEPT)	make regexps more like Prolog NOT SUPPORTED
(*COMMIT)	NOT SUPPORTED
(*F)	NOT SUPPORTED
(*FAIL)	NOT SUPPORTED
(*MARK)	NOT SUPPORTED
(*PRUNE)	NOT SUPPORTED
(*SKIP)	NOT SUPPORTED
(*THEN)	NOT SUPPORTED
(*ANY)	set newline convention NOT SUPPORTED
(*ANYCRLF)	NOT SUPPORTED
(*CR)	NOT SUPPORTED
(*CRLF)	NOT SUPPORTED
(*LF)	NOT SUPPORTED
(*BSR_ANYCRLF)	set \R convention NOT SUPPORTED pcre
(*BSR_UNICODE)	NOT SUPPORTED pcre