Regular Expression Engine Comparison Chart

Many applications claim to support regular expressions, but what does that actually mean?
There are many different regular expression engines, and each has its own feature set and its own time and space efficiency.

ORIGIN

The information here is copied from regular-expressions.mobi/refflavors.html
For some reason, that page is not accessible unless you have a mobile phone user agent.
Go to the main site for lots of regular expression information and
their commercial product, RegexBuddy.

Flavors (=engines)

Implementations

JGsoft: This flavor is used by the Just Great Software products, including PowerGREP and EditPad Pro.
.NET: This flavor is used by programming languages based on the Microsoft .NET framework versions 1.x, 2.0 or 3.x. It is generally also the regex flavor used by applications developed in these programming languages.
Java: The regex flavor of the java.util.regex package, available in Java 4 (JDK 1.4.x) and later. A few features were added in Java 5 (JDK 1.5.x) and Java 6 (JDK 1.6.x). It is generally also the regex flavor used by applications developed in Java.
Perl: The regex flavor used in the Perl programming language, versions 5.6 and 5.8. Versions prior to 5.6 do not support Unicode.
PCRE: The open source PCRE library. The feature set described here is available in PCRE 5.x and 6.x. PCRE is the regex engine used by the TPerlRegEx Delphi component and the RegularExpressions and RegularExpressionsCore units in Delphi XE and C++Builder XE.
ECMA (JavaScript): The regular expression syntax defined in the 3rd edition of the ECMA-262 standard, which defines the scripting language commonly known as JavaScript.
Python: The regex flavor supported by Python's built-in re module.
Ruby: The regex flavor built into the Ruby programming language.
Tcl ARE: The regex flavor developed by Henry Spencer for the regexp command in Tcl 8.2 and 8.4, dubbed Advanced Regular Expressions.
POSIX BRE: Basic Regular Expressions as defined in the IEEE POSIX standard 1003.2.
POSIX ERE: Extended Regular Expressions as defined in the IEEE POSIX standard 1003.2.
GNU BRE: GNU Basic Regular Expressions, which are POSIX BRE with GNU extensions, used in the GNU implementations of classic UNIX tools.
GNU ERE: GNU Extended Regular Expressions, which are POSIX ERE with GNU extensions, used in the GNU implementations of classic UNIX tools.
XML: The regular expression flavor defined in the XML Schema standard.
XPath: The regular expression flavor defined in the XQuery 1.0 and XPath 2.0 Functions and Operators standard.

AceText: Version 2 and later use the JGsoft engine. Version 1 did not support regular expressions at all.
awk: The awk UNIX tool and programming language uses POSIX ERE.
C#: As a .NET programming language, C# can use the System.Text.RegularExpressions classes, listed as ".NET" below.
Delphi for .NET: As a .NET programming language, the .NET version of Delphi can use the System.Text.RegularExpressions classes, listed as ".NET" below.
Delphi for Win32: Delphi for Win32 does not have built-in regular expression support. Many free PCRE wrappers are available.
EditPad Pro: Version 6 and later use the JGsoft engine. Earlier versions used PCRE, without Unicode support.
egrep: The traditional UNIX egrep command uses the "POSIX ERE" flavor, though not all implementations fully adhere to the standard. Linux usually ships with the GNU implementation, which uses "GNU ERE".
grep: The traditional UNIX grep command uses the "POSIX BRE" flavor, though not all implementations fully adhere to the standard. Linux usually ships with the GNU implementation, which uses "GNU BRE".
Emacs: The GNU implementation of this classic UNIX text editor uses the "GNU ERE" flavor, except that POSIX classes, collations and equivalences are not supported.
Java: The regex flavor of the java.util.regex package is listed as "Java" in the table below.
JavaScript: JavaScript's regex flavor is listed as "ECMA" in the table below.
MySQL: MySQL uses POSIX Extended Regular Expressions, listed as "POSIX ERE" in the table below.
Oracle: Oracle Database 10g implements POSIX Extended Regular Expressions, listed as "POSIX ERE" in the table below. Oracle supports backreferences \1 through \9, though these are not part of the POSIX ERE standard.
Perl: Perl's regex flavor is listed as "Perl" in the table below.
PHP: PHP's ereg functions implement the "POSIX ERE" flavor, while the preg functions implement the "PCRE" flavor.
PostgreSQL: PostgreSQL 7.4 and later use Henry Spencer's "Advanced Regular Expressions" flavor, listed as "Tcl ARE" in the table below. Earlier versions used POSIX Extended Regular Expressions, listed as "POSIX ERE" in the table below.

PowerGREP: Version 3 and later use the JGsoft engine. Earlier versions used PCRE, without Unicode support.
PowerShell: PowerShell's built-in -match and -replace operators use the .NET regex flavor. PowerShell can also use the System.Text.RegularExpressions classes directly.
Python: Python's regex flavor is listed as "Python" in the table below.
R: The regular expression functions in the R language for statistical programming use either the POSIX ERE flavor (default), the PCRE flavor (perl = TRUE) or the POSIX BRE flavor (perl = FALSE, extended = FALSE).
REALbasic: REALbasic's RegEx class is a wrapper around PCRE.
RegexBuddy: Version 3 and later use a special version of the JGsoft engine that emulates all the regular expression flavors in this comparison. Version 2 supported the JGsoft regex flavor only. Version 1 used PCRE, without Unicode support.
Ruby: Ruby's regex flavor is listed as "Ruby" in the table below.
sed: The sed UNIX tool uses POSIX BRE. Linux usually ships with the GNU implementation, which uses "GNU BRE".
Tcl: Tcl's Advanced Regular Expression flavor, the default flavor in Tcl 8.2 and later, is listed as "Tcl ARE" in the table below. Tcl's Extended Regular Expression and Basic Regular Expression flavors are listed as "POSIX ERE" and "POSIX BRE" in the table below.
VBScript: VBScript's RegExp object uses the same regex flavor as JavaScript, which is listed as "ECMA" in the table below.
Visual Basic 6: Visual Basic 6 does not have built-in support for regular expressions, but can easily use the "Microsoft VBScript Regular Expressions 5.5" COM object, which implements the "ECMA" flavor listed below.
Visual Basic.NET: As a .NET programming language, VB.NET can use the System.Text.RegularExpressions classes, listed as ".NET" below.
wxWidgets: The wxRegEx class supports 3 flavors. wxRE_ADVANCED is the "Tcl ARE" flavor, wxRE_EXTENDED is "POSIX ERE" and wxRE_BASIC is "POSIX BRE".
XML Schema: The XML Schema regular expression flavor is listed as "XML" in the table below.
XPath: The regex flavor used by XPath functions is listed as "XPath" in the table below.
XQuery: The regex flavor used by XQuery functions is listed as "XPath" in the table below.

Feature Comparison

Characters	JGsoft	.NET	Java	Perl	PCRE	ECMA	Python	Ruby	Tcl ARE	POSIX BRE	POSIX ERE	GNU BRE	GNU ERE	XML	XPath
Backslash escapes one metacharacter	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES
\Q...\E escapes a string of metacharacters	YES	NO	Java 6	YES	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\x00 through \xFF (ASCII character)	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	NO	NO	NO	NO	NO
\n (LF), \r (CR) and \t (tab)	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	NO	NO	NO	YES	YES
\f (form feed) and \v (vtab)	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	NO	NO	NO	NO	NO
\a (bell)	YES	YES	YES	YES	YES	NO	YES	YES	YES	NO	NO	NO	NO	NO	NO
\e (escape)	YES	YES	YES	YES	YES	NO	NO	YES	YES	NO	NO	NO	NO	NO	NO
\b (backspace) and \B (backslash)	NO	NO	NO	NO	NO	NO	NO	NO	YES	NO	NO	NO	NO	NO	NO
\cA through \cZ (control character)	YES	YES	YES	YES	YES	YES	NO	NO	YES	NO	NO	NO	NO	NO	NO
\ca through \cz (control character)	YES	YES	NO	YES	YES	YES	NO	NO	YES	NO	NO	NO	NO	NO	NO
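
The chart lists \Q...\E as unsupported in the Python column. As a minimal sketch, assuming Python's standard re module, re.escape can play a similar role by escaping a literal string before it is used as a pattern. The sample text is made up for illustration.

```python
import re

# Hypothetical literal text that contains regex metacharacters.
literal = "1+1=2 (really?)"

# re.escape backslash-escapes the metacharacters so the text matches itself.
escaped_pattern = re.compile(re.escape(literal))
matched = escaped_pattern.fullmatch(literal) is not None

# Without escaping, "+", "(", ")" and "?" are read as regex syntax,
# so the raw text does not match itself.
unescaped_matched = re.fullmatch(literal, literal) is not None
```

Here matched is True and unescaped_matched is False.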

Character Classes (=Sets)

[abc] (character class)	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES
[^abc] (negated character class)	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES
[a-z] (character class range)	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES
Hyphen in [\d-z] is a literal	YES	YES	YES	YES	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
Hyphen in [a-\d] is a literal	YES	NO	NO	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
Backslash escapes one character class metacharacter	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	NO	NO	NO	YES	YES
\Q...\E escapes a string of character class metacharacters	YES	NO	Java 6	YES	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\d (shorthand for digits)	YES	YES	ascii	YES	ascii	ascii	option	ascii	YES	NO	NO	NO	NO	YES	YES
\w (shorthand for word characters)	YES	YES	ascii	YES	ascii	ascii	option	ascii	YES	NO	NO	YES	YES	YES	YES
\s (shorthand for whitespace)	YES	YES	ascii	YES	ascii	YES	option	ascii	YES	NO	NO	YES	YES	ascii	ascii
\D, \W and \S (shorthand negated character classes)	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	NO	YES	YES	YES	YES
[\b] (backspace)	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	NO	NO	NO	NO	NO
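
The "option" entries in the Python column mean that the shorthand classes can be switched between Unicode and ASCII behaviour. A small sketch, assuming Python 3 and its standard re module, where string patterns are Unicode-aware unless the re.ASCII flag is given:

```python
import re

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, a Unicode decimal digit

# Default for str patterns in Python 3: \d matches any Unicode decimal digit.
unicode_match = re.fullmatch(r"\d", arabic_three) is not None

# With re.ASCII, \d is restricted to [0-9].
ascii_match = re.fullmatch(r"\d", arabic_three, flags=re.ASCII) is not None
ascii_digit = re.fullmatch(r"\d", "3", flags=re.ASCII) is not None
```

Here unicode_match and ascii_digit are True, while ascii_match is False.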

Dot and Anchors

. (dot; any character except line break)	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES
^ (start of string/line)	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	YES
$ (end of string/line)	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	YES
\A (start of string)	YES	YES	YES	YES	YES	NO	YES	YES	YES	NO	NO	NO	NO	NO	NO
\Z (end of string, before final line break)	YES	YES	YES	YES	YES	NO	NO	YES	YES	NO	NO	NO	NO	NO	NO
\z (end of string)	YES	YES	YES	YES	YES	NO	\Z	YES	NO	NO	NO	NO	NO	NO	NO
\` (start of string)	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	YES	YES	NO	NO
\' (end of string)	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	YES	YES	NO	NO
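
The Python column shows NO for \Z (end of string, before final line break) and "\Z" for \z, meaning Python's \Z only matches at the very end of the string. A short sketch with the standard re module illustrates the difference from $:

```python
import re

text = "abc\n"

# $ matches at the end of the string or just before a final line break.
dollar = re.search(r"abc$", text) is not None

# Python's \Z matches only at the very end, so the trailing "\n" blocks it.
big_z = re.search(r"abc\Z", text) is not None

# Consuming the line break explicitly lets \Z succeed.
big_z_exact = re.search(r"abc\n\Z", text) is not None
```

Here dollar and big_z_exact are True, while big_z is False.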

Word Boundaries

\b (at the beginning or end of a word)	YES	YES	YES	YES	ascii	ascii	option	ascii	NO	NO	NO	YES	YES	NO	NO
\B (NOT at the beginning or end of a word)	YES	YES	YES	YES	ascii	ascii	option	ascii	NO	NO	NO	YES	YES	NO	NO
\y (at the beginning or end of a word)	YES	NO	NO	NO	NO	NO	NO	NO	YES	NO	NO	NO	NO	NO	NO
\Y (NOT at the beginning or end of a word)	YES	NO	NO	NO	NO	NO	NO	NO	YES	NO	NO	NO	NO	NO	NO
\m (at the beginning of a word)	YES	NO	NO	NO	NO	NO	NO	NO	YES	NO	NO	NO	NO	NO	NO
\M (at the end of a word)	YES	NO	NO	NO	NO	NO	NO	NO	YES	NO	NO	NO	NO	NO	NO
\< (at the beginning of a word)	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	YES	YES	NO	NO
\> (at the end of a word)	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	YES	YES	NO	NO

Alternation and Quantifiers

| (alternation)	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	YES	\|	YES	YES	YES
? (0 or 1)	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	YES	\?	YES	YES	YES
* (0 or more)	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES
+ (1 or more)	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	YES	\+	YES	YES	YES
{n} (exactly n)	YES	YES	YES	YES	YES	YES	YES	YES	YES	\{n\}	YES	\{n\}	YES	YES	YES
{n,m} (between n and m)	YES	YES	YES	YES	YES	YES	YES	YES	YES	\{n,m\}	YES	\{n,m\}	YES	YES	YES
{n,} (n or more)	YES	YES	YES	YES	YES	YES	YES	YES	YES	\{n,\}	YES	\{n,\}	YES	YES	YES
? after any of the above quantifiers to make it "lazy"	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	NO	NO	NO	NO	YES
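
To illustrate the "lazy" row, here is a minimal sketch using Python's re module (one of the flavors marked YES). The sample input is made up.

```python
import re

sample = "<a><b>"

# Greedy: .+ grabs as much as possible, then backtracks to the last ">".
greedy = re.search(r"<.+>", sample).group()

# Lazy: .+? grabs as little as possible, stopping at the first ">".
lazy = re.search(r"<.+?>", sample).group()
```

Here greedy is "<a><b>" and lazy is "<a>".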

Grouping and Backreferences

(regex) (numbered capturing group)	YES	YES	YES	YES	YES	YES	YES	YES	YES	\( \)	YES	\( \)	YES	YES	YES
(?:regex) (non-capturing group)	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	NO	NO	NO	NO	NO
\1 through \9 (backreferences)	YES	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	YES	YES	NO	YES
\10 through \99 (backreferences)	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	N/A	NO	NO	N/A	YES
Forward references \1 through \9	YES	YES	YES	YES	YES	NO	NO	YES	NO	NO	N/A	NO	NO	N/A	NO
Nested references \1 through \9	YES	YES	YES	YES	YES	YES	NO	YES	NO	NO	N/A	NO	NO	N/A	NO
Backreferences to non-existent groups are an error	YES	YES	YES	YES	YES	NO	YES	NO	YES	YES	N/A	YES	YES	N/A	YES
Backreferences to failed groups also fail	YES	YES	YES	YES	YES	NO	YES	YES	YES	YES	N/A	YES	YES	N/A	YES
\G (start of match attempt)	YES	YES	YES	YES	YES	NO	NO	YES	NO	NO	NO	NO	NO	NO	NO
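
The backreference rows for the Python column can be checked with a short sketch, assuming the standard re module. The helper name compiles is hypothetical and only used for this illustration.

```python
import re

def compiles(pattern):
    """Return True if the pattern compiles, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# \1 repeats whatever group 1 captured, so this finds a doubled character.
doubled = re.search(r"(\w)\1", "hello").group()

# Referring to a group that does not exist is a compile-time error.
missing_group = compiles(r"(a)\2")

# Forward references (a backreference before its group) are rejected too.
forward_ref = compiles(r"\1(a)")
```

Here doubled is "ll", and both missing_group and forward_ref are False.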

Modifiers

(?i) (case insensitive)	YES	YES	YES	YES	YES	/i only	YES	YES	YES	NO	NO	NO	NO	NO	flag
(?s) (dot matches newlines)	YES	YES	YES	YES	YES	NO	YES	(?m)	NO	NO	NO	NO	NO	NO	flag
(?m) (^ and $ match at line breaks)	YES	YES	YES	YES	YES	/m only	YES	always on	NO	NO	NO	NO	NO	NO	flag
(?x) (free-spacing mode)	YES	YES	YES	YES	YES	NO	YES	YES	YES	NO	NO	NO	NO	NO	flag
(?n) (explicit capture)	YES	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
(?-ismxn) (turn off mode modifiers)	YES	YES	YES	YES	YES	NO	NO	YES	NO	NO	NO	NO	NO	NO	NO
(?ismxn:group) (mode modifiers local to group)	YES	YES	YES	YES	YES	NO	NO	YES	NO	NO	NO	NO	NO	NO	NO
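
As a sketch of the first three modifier rows, the following uses inline flags at the start of a pattern with Python's re module. The sample strings are made up.

```python
import re

# Without (?s), the dot does not match a line break.
plain = re.match(r"a.b", "a\nb") is not None

# (?s) makes the dot match line breaks as well.
dotall = re.match(r"(?s)a.b", "a\nb") is not None

# (?i) turns on case-insensitive matching.
ignorecase = re.match(r"(?i)hello", "HeLLo") is not None

# (?m) lets ^ match at the start of every line, not just the string.
line_starts = re.findall(r"(?m)^\w+", "one\ntwo")
```

Here plain is False, dotall and ignorecase are True, and line_starts is ["one", "two"].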

Atomic Grouping and Possessive Quantifiers

(?>regex) (atomic group)	YES	YES	YES	YES	YES	NO	3.11	YES	NO	NO	NO	NO	NO	NO	NO
?+, *+, ++ and {m,n}+ (possessive quantifiers)	YES	NO	YES	NO	YES	NO	3.11	NO	NO	NO	NO	NO	NO	NO	NO

Lookaround

(?=regex) (positive lookahead)	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	NO	NO	NO	NO	NO
(?!regex) (negative lookahead)	YES	YES	YES	YES	YES	YES	YES	YES	YES	NO	NO	NO	NO	NO	NO
(?<=text) (positive lookbehind)	full regex	full regex	finite length	fixed length	fixed + alternation	NO	fixed length	NO	NO	NO	NO	NO	NO	NO	NO
(?<!text) (negative lookbehind)	full regex	full regex	finite length	fixed length	fixed + alternation	NO	fixed length	NO	NO	NO	NO	NO	NO	NO	NO
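
The "fixed length" entry for Python means that the lookbehind must match a string of known length. A minimal sketch with the standard re module, where the helper name compiles is hypothetical:

```python
import re

def compiles(pattern):
    """Return True if the pattern compiles, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-length lookbehind: "USD " is always four characters.
amount = re.search(r"(?<=USD )\d+", "USD 42").group()

# Variable-length lookbehind: \s+ has no fixed width, so re rejects it.
variable_ok = compiles(r"(?<=USD\s+)\d+")
```

Here amount is "42" and variable_ok is False.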

Conditionals

(?(?=regex)then|else) (using any lookaround)	YES	YES	NO	YES	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
(?(regex)then|else)	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
(?(1)then|else)	YES	YES	NO	YES	YES	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO
(?(group)then|else)	YES	YES	NO	NO	YES	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO
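
The (?(1)then|else) form tests whether a numbered group took part in the match. As a small sketch in Python's re module, the pattern below (made up for illustration, with no "else" branch) requires a closing ">" only when an opening "<" was captured:

```python
import re

# Group 1 optionally captures "<"; (?(1)>) demands ">" only if it did.
pattern = re.compile(r"(<)?\w+(?(1)>)")

bracketed = pattern.fullmatch("<abc>") is not None
bare = pattern.fullmatch("abc") is not None
unclosed = pattern.fullmatch("<abc") is not None
unopened = pattern.fullmatch("abc>") is not None
```

Here bracketed and bare are True, while unclosed and unopened are False.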

Comments and Free-Spacing Syntax

(?#comment)	YES	YES	NO	YES	YES	NO	YES	YES	YES	NO	NO	NO	NO	NO	NO
Free-spacing syntax supported	YES	YES	YES	YES	YES	NO	YES	YES	YES	NO	NO	NO	NO	NO	YES
Character class is a single token	YES	YES	NO	YES	YES	N/A	YES	YES	YES	N/A	N/A	N/A	N/A	N/A	YES
# starts a comment	YES	YES	YES	YES	YES	N/A	YES	YES	YES	N/A	N/A	N/A	N/A	N/A	NO

Unicode Characters

\X (Unicode grapheme)	YES	NO	NO	YES	option	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\u0000 through \uFFFF (Unicode character)	YES	YES	YES	NO	NO	YES	3.0	NO	YES	NO	NO	NO	NO	NO	NO
\x{0} through \x{FFFF} (Unicode character)	YES	NO	NO	YES	option	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO

Unicode Properties, Scripts, Blocks

\pL through \pC (Unicode properties)	YES	NO	YES	YES	option	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\p{L} through \p{C} (Unicode properties)	YES	YES	YES	YES	option	NO	NO	NO	NO	NO	NO	NO	NO	YES	YES
\p{Lu} through \p{Cn} (Unicode property)	YES	YES	YES	YES	option	NO	NO	NO	NO	NO	NO	NO	NO	YES	YES
\p{L&} and \p{Letter&} (equivalent of [\p{Lu}\p{Ll}\p{Lt}] Unicode properties)	YES	NO	NO	YES	option	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\p{IsL} through \p{IsC} (Unicode properties)	YES	NO	YES	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\p{IsLu} through \p{IsCn} (Unicode property)	YES	NO	YES	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\p{Letter} through \p{Other} (Unicode properties)	YES	NO	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\p{Lowercase_Letter} through \p{Not_Assigned} (Unicode property)	YES	NO	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\p{IsLetter} through \p{IsOther} (Unicode properties)	YES	NO	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\p{IsLowercase_Letter} through \p{IsNot_Assigned} (Unicode property)	YES	NO	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\p{Arabic} through \p{Yi} (Unicode script)	YES	NO	NO	YES	option	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\p{IsArabic} through \p{IsYi} (Unicode script)	YES	NO	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\p{BasicLatin} through \p{Specials} (Unicode block)	YES	NO	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\p{InBasicLatin} through \p{InSpecials} (Unicode block)	YES	NO	YES	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\p{IsBasicLatin} through \p{IsSpecials} (Unicode block)	YES	YES	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	YES	YES
Part between {} in all of the above is case insensitive	YES	NO	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
Spaces, hyphens and underscores allowed in all long names listed above (e.g. "BasicLatin" can be written as "Basic-Latin" or "Basic_Latin" or "Basic Latin")	YES	NO	Java 5	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\P (negated variants of all \p as listed above)	YES	YES	YES	YES	option	NO	NO	NO	NO	NO	NO	NO	NO	YES	YES
\p{^...} (negated variants of all \p{...} as listed above)	YES	NO	NO	YES	option	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO

Named Capture and Backreferences

(?<name>regex) (.NET-style named capturing group)	YES	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
(?'name'regex) (.NET-style named capturing group)	YES	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\k<name> (.NET-style named backreference)	YES	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\k'name' (.NET-style named backreference)	YES	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
(?P<name>regex) (Python-style named capturing group)	YES	NO	NO	NO	YES	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO
(?P=name) (Python-style named backreference)	YES	NO	NO	NO	YES	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO
Multiple capturing groups can have the same name	YES	YES	N/A	N/A	NO	N/A	NO	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
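
A short sketch of the Python-style rows, assuming the standard re module. The helper name compiles is hypothetical, and the .NET-style check simply mirrors the NO entry in the Python column.

```python
import re

def compiles(pattern):
    """Return True if the pattern compiles, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# (?P<word>...) names the group; (?P=word) matches the same text again.
match = re.search(r"(?P<word>\w+) (?P=word)", "the the cat")
repeated = match.group("word")

# The .NET-style (?<name>...) syntax is not accepted by Python's re module.
dotnet_style = compiles(r"(?<word>\w+)")
```

Here repeated is "the" and dotnet_style is False.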

XML Character Classes

\i, \I, \c and \C (shorthand XML name character classes)	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	YES	YES
[abc-[abc]] (character class subtraction)	YES	2.0	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	YES	YES

POSIX Bracket Expressions

[:alpha:] (POSIX character class)	YES	NO	NO	YES	ascii	NO	NO	YES	YES	YES	YES	YES	YES	NO	NO
\p{Alpha} (POSIX character class)	YES	NO	ascii	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
\p{IsAlpha} (POSIX character class)	YES	NO	NO	YES	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO	NO
[.span-ll.] (POSIX collation sequence)	NO	NO	NO	NO	NO	NO	NO	NO	YES	YES	YES	YES	YES	NO	NO
[=x=] (POSIX character equivalence)	NO	NO	NO	NO	NO	NO	NO	NO	YES	YES	YES	YES	YES	NO	NO
