Regular Expression Syntax
Regular expressions or regex
are sequences of characters that define a pattern.
There are three types of regular expressions: regex (OS_Regex), sregex (OS_Match) and PCRE2.
Regex (OS_Regex) syntax
This is a fast and simple library for regular expressions in C.
This library is designed to be simple while still supporting the most common regular expressions.
Supported expressions
Expressions |
Valid characters |
---|---|
\w |
A-Z, a-z, 0-9, '-', '@', '_' characters |
\d |
0-9 character |
\s |
Spaces " " |
\t |
Tabs |
\p |
()*+,-.:;<=>?[]!"'#$%&|{} |
\W |
Anything not \w |
\D |
Anything not \d |
\S |
Anything not \s |
\. |
Anything |
Modifiers
Expressions |
Actions |
---|---|
+ |
To match one or more times |
* |
To match zero or more times |
Special characters
Expressions |
Actions |
---|---|
^ |
To specify the beginning of the text |
$ |
To specify the end of the text |
| |
To create a logical or between multiple patterns |
Characters escaping
To utilize the following characters they must be escaped with: \
$ |
( |
) |
\ |
| |
< |
---|---|---|---|---|---|
\$ |
\( |
\) |
\ \ |
\| |
\< |
Limitations
The
*
and+
modifiers can only be applied to backslash expressions, not bare characters (e.g.\d+
is supported,0+
is not)You cannot use alternation in a group, e.g.
(foo|bar)
is not permittedComplex backtracking is not supported, e.g.
\p*\d*\s*\w*:
does not match a single colon, because\p*
consumes the colon.
matches a literal dot, whereas\.
matches any character\s
matches only an ASCII space (32), not other whitespace like tabthere is no syntax to match a literal caret, asterisk or plus (although
\p
will match asterisk or plus, along with some other characters)
Sregex (OS_Match) syntax
This is faster than OS_Regex, but only supports simple string matching and the following special characters.
Special characters
Expressions |
Actions |
---|---|
^ |
To specify the beginning of the text |
$ |
To specify the end of the text |
| |
To create a logic: or, between multiple patterns |
! |
To negate the expression |
PCRE2 syntax
Perl Compatible Regular Expressions (PCRE) tries to match Perl syntax and semantics as closely as it can.
It provides features like recursive patterns, look-ahead and look-behind assertions, non-capturing groups, non-greedy quantifiers, extended syntax for characters and character classes, and many other. For more details, please refer to the PCRE Syntax documentation.
Supported expressions
Expressions |
Actions |
---|---|
. |
Any character except newline |
\d |
Any decimal digit, equal to [0-9] |
\D |
Any character that is not a decimal digit, equal to [^0-9] |
\h |
Any horizontal white space character |
\H |
Any character that is not a horizontal white space character |
\s |
Any white space character, equal to [\t\r\n\f] |
\S |
Any character that is not a white space character, equal to [^\t\r\n\f] |
\w |
Any "word" character |
\W |
Any "non-word" character |
Characters escaping
Expressions |
Actions |
---|---|
\f |
Form feed (hex 0C) |
\n |
Newline (hex 0A) |
\r |
Carriage return (hex 0D) |
\t |
Tab (hex 09) |
\0dd |
Character with octal code 0dd |
\o{ddd..} |
Character with octal code ddd.. |
\xhh |
Character with hex code hh |
\x{hh..} |
Character with hex code hh.. |
Quantifiers
Expressions |
Actions |
---|---|
? |
0 or 1, greedy |
?+ |
0 or 1, possessive |
?? |
0 or 1, lazy |
* |
0 or more, greedy |
*+ |
0 or more, possessive |
*? |
0 or more, lazy |
+ |
1 or more, greedy |
++ |
1 or more, possessive |
+? |
1 or more, lazy |
{n} |
Exactly n |
{n,m} |
At least n, no more than m, greedy |
{n,m}+ |
At least n, no more than m, possessive |
{n,m}? |
At least n, no more than m, lazy |
{n,} |
n or more, greedy |
{n,}+ |
n or more, possessive |
{n,}? |
n or more, lazy |