A concise cheatsheet for using Regular Expressions in JavaScript
- Follows my mental model.
- Intentionally non-comprehensive. Only includes syntax and parts of the API that I actually use.
- Certain concepts are imprecisely defined. (For example, some definitions do not account for characters that are very rarely encountered in input strings.)
There should be no capturing groups and no `g` flag in `regexp`.
/y/.test('xx') //=> false
/x/.test('xx') //=> true
If `regexp` has the `g` flag, `regexp.lastIndex` may change with each call to `regexp.test(string)`.
const regexp = /x/g
regexp.lastIndex //=> 0
regexp.test('xx') //=> true
regexp.lastIndex //=> 1
regexp.test('xx') //=> true
regexp.lastIndex //=> 2
regexp.test('xx') //=> false
regexp.lastIndex //=> 0
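One way to sidestep the moving `lastIndex` is to test against a copy of the regular expression that has no `g` flag. The sketch below assumes a hypothetical helper named `testStateless` (it is not part of any library). It only strips `g`, so a regexp with other stateful flags would still need separate care.

```javascript
// Hypothetical helper: test against a copy of `regexp` without the `g` flag,
// so the caller's `regexp.lastIndex` is never touched.
function testStateless (regexp, string) {
  const flags = regexp.flags.replace('g', '')
  return new RegExp(regexp.source, flags).test(string)
}

const globalRegexp = /x/g
const outcomes = [
  testStateless(globalRegexp, 'xx'),
  testStateless(globalRegexp, 'xx'),
  testStateless(globalRegexp, 'xx')
]
outcomes //=> [true, true, true]
globalRegexp.lastIndex //=> 0
```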
There should be no capturing groups if there’s also a `g` flag in `regexp`.
'xx'.match(/y/) //=> null
'xx'.match(/y/g) //=> null
const matches = 'xx'.match(/x/)
matches[0] //=> 'x'
matches.index //=> 0
matches.length //=> 1
const matches = 'xx'.match(/x/g)
matches[0] //=> 'x'
matches[1] //=> 'x'
matches.index //=> undefined
matches.length //=> 2
const matches = 'xyxy'.match(/x(y)/)
matches[0] //=> 'xy'
matches[1] //=> 'y'
matches.index //=> 0
matches.length //=> 2
Capturing groups in `regexp` are ignored; returns the matches only.
const matches = 'xyxy'.match(/x(y)/g)
matches[0] //=> 'xy'
matches[1] //=> 'xy'
matches.index //=> undefined
matches.length //=> 2
There should always be a `g` flag in `regexp`.
const iterator = 'xx'.matchAll(/y/g)
const result = []
for (const match of iterator) {
  result.push(match[0])
}
result //=> []
const iterator = 'xx'.matchAll(/x/g)
const result = []
for (const match of iterator) {
  result.push(match[0])
}
result //=> ['x', 'x']
const iterator = 'xyxy'.matchAll(/x(y)/g)
const result = []
for (const match of iterator) {
  result.push([match[0], match[1]])
}
result //=> [['xy', 'y'], ['xy', 'y']]
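The loops above can also be written by spreading the iterator into an array. This is a sketch rather than a recommendation; the variable names `pairs` and `caught` are made up for illustration. It also shows why the `g` flag is required: without it, `matchAll` throws a `TypeError`.

```javascript
// Spread the iterator into an array, then pick out what is needed.
const pairs = [...'xyxy'.matchAll(/x(y)/g)].map(match => [match[0], match[1], match.index])
pairs //=> [['xy', 'y', 0], ['xy', 'y', 2]]

// Without the `g` flag, `matchAll` throws a TypeError.
let caught = null
try {
  'xx'.matchAll(/x/)
} catch (error) {
  caught = error
}
caught instanceof TypeError //=> true
```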
There should be no capturing groups in `regexp`.
'xx'.replace(/y/, 'z') //=> 'xx'
'xx'.replace(/y/g, 'z') //=> 'xx'
'xx'.replace(/x/, 'z') //=> 'zx'
'xx'.replace(/x/g, 'z') //=> 'zz'
function callback (match) {
  return match.toUpperCase()
}
'xx'.replace(/y/, callback) //=> 'xx'
'xx'.replace(/y/g, callback) //=> 'xx'
function callback (match) {
  return match.toUpperCase()
}
'xx'.replace(/x/, callback) //=> 'Xx'
'xx'.replace(/x/g, callback) //=> 'XX'
function callback (_, y) {
  return y.toUpperCase()
}
'xyxy'.replace(/x(y)/, callback) //=> 'Yxy'
'xyxy'.replace(/x(y)/g, callback) //=> 'YY'
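The callback receives more than the captured groups: after one argument per capturing group comes the offset of the match within the string. A small sketch, with `describe` as a made-up callback name:

```javascript
// Arguments: the full match, one argument per capturing group, then the offset.
function describe (match, y, offset) {
  return `[${match}@${offset}:${y.toUpperCase()}]`
}

const described = 'xyxy'.replace(/x(y)/g, describe)
described //=> '[xy@0:Y][xy@2:Y]'
```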
Expression | Description |
---|---|
`.` or `[^\n\r]` | any character excluding a newline or carriage return |
`[A-Za-z]` | letter |
`[a-z]` | lowercase letter |
`[A-Z]` | uppercase letter |
`\d` or `[0-9]` | digit |
`\D` or `[^0-9]` | non-digit |
`_` | underscore |
`\w` or `[A-Za-z0-9_]` | letter, digit or underscore |
`\W` or `[^A-Za-z0-9_]` | inverse of `\w` |
`\S` | inverse of `\s` |
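A few of the character classes above in action. This is only an illustrative sketch; the input strings and variable names are invented.

```javascript
// \D: strip everything that is not a digit.
const digitsOnly = '2021-03-04'.replace(/\D/g, '')
digitsOnly //=> '20210304'

// \w: letters, digits and underscores.
const wordCharacters = 'a_1-b!'.match(/\w/g)
wordCharacters //=> ['a', '_', '1', 'b']

// . does not match the newline.
const dotSkipsNewline = 'a\nb'.match(/./g)
dotSkipsNewline //=> ['a', 'b']
```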
Expression | Description |
---|---|
` ` | space |
`\t` | tab |
`\n` | newline |
`\r` | carriage return |
`\s` | space, tab, newline or carriage return |
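As a quick illustration of the whitespace expressions, the sketch below collapses runs of whitespace into a single space and collects the non-whitespace characters. It borrows the `+` quantifier, and the inputs are made up.

```javascript
// \s+: one or more whitespace characters in a row.
const collapsed = 'a \t\n\r b'.replace(/\s+/g, ' ')
collapsed //=> 'a b'

// \S: anything that is not whitespace.
const nonSpaces = ' a b\t'.match(/\S/g)
nonSpaces //=> ['a', 'b']
```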
Expression | Description |
---|---|
`[xyz]` | either `x`, `y` or `z` |
`[^xyz]` | neither `x`, `y` nor `z` |
`[1-3]` | either `1`, `2` or `3` |
`[^1-3]` | neither `1`, `2` nor `3` |
- Think of a character set as an `OR` operation on the single characters that are enclosed between the square brackets.
- Use `^` after the opening `[` to “negate” the character set.
- Within a character set, `.` means a literal period.
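The three notes above can be checked with a short sketch (the sample word and variable names are arbitrary):

```javascript
// A character set matches any one of the enclosed characters.
const vowels = 'regexp'.match(/[aeiou]/g)
vowels //=> ['e', 'e']

// A leading ^ negates the set.
const consonants = 'regexp'.match(/[^aeiou]/g)
consonants //=> ['r', 'g', 'x', 'p']

// Inside a set, . is a literal period.
const periods = 'a.b'.match(/[.]/g)
periods //=> ['.']
```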
Expression | Description |
---|---|
`\.` | period |
`\^` | caret |
`\$` | dollar sign |
`\\|` | pipe |
`\\` | backslash |
`\/` | forward slash |
`\(` | opening bracket |
`\)` | closing bracket |
`\[` | opening square bracket |
`\]` | closing square bracket |
`\{` | opening curly bracket |
`\}` | closing curly bracket |
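To see why escaping matters, compare an unescaped `.` with an escaped one. The following is a minimal sketch with invented inputs:

```javascript
// Unescaped, . matches any character, so 'abc' passes.
const unescaped = /a.c/.test('abc')
unescaped //=> true

// Escaped, \. only matches a literal period.
const escaped = /a\.c/.test('abc')
escaped //=> false

// Escaping $ and . to match a price literally.
const price = '$10.50 (approx.)'.match(/\$\d\d\.\d\d/)[0]
price //=> '$10.50'

// Escaping the pipe so that it is not treated as alternation.
const pipes = 'a|b'.replace(/\|/g, ' or ')
pipes //=> 'a or b'
```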
Expression | Description |
---|---|
`\\` | backslash |
`\]` | closing square bracket |
- A `^` must be escaped only if it occurs immediately after the opening `[` of the character set.
- A `-` must be escaped only if it occurs between two letters or two digits.
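A small sketch of these escaping rules inside a character set (inputs and names are arbitrary):

```javascript
// Unescaped, - between two letters forms a range.
const rangeMatches = 'a-m-z'.match(/[a-z]/g)
rangeMatches //=> ['a', 'm', 'z']

// Escaped, - is a literal hyphen, so only a, - and z match.
const literalMatches = 'a-m-z'.match(/[a\-z]/g)
literalMatches //=> ['a', '-', '-', 'z']

// ^ needs escaping only when it comes first.
const escapedCaret = 'x^y'.match(/[\^x]/g)
escapedCaret //=> ['x', '^']
const trailingCaret = 'x^y'.match(/[x^]/g)
trailingCaret //=> ['x', '^']

// ] must be escaped to be part of the set.
const closingBrackets = 'a]b'.match(/[\]]/g)
closingBrackets //=> [']']
```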
Expression | Description |
---|---|
`{2}` | exactly 2 |
`{2,}` | at least 2 |
`{2,7}` | at least 2 but no more than 7 |
`*` | 0 or more |
`+` | 1 or more |
`?` | exactly 0 or 1 |
- The quantifier goes after the expression to be quantified.
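The quantifiers can be tried out as follows. This sketch anchors the patterns with `^` and `$` so that the whole string has to match; the sample strings are invented.

```javascript
// {2}: exactly two x characters.
const exactlyTwo = ['x', 'xx', 'xxx'].map(string => /^x{2}$/.test(string))
exactlyTwo //=> [false, true, false]

// {2,}: two or more x characters.
const atLeastTwo = ['x', 'xx', 'xxx'].map(string => /^x{2,}$/.test(string))
atLeastTwo //=> [false, true, true]

// ?: the u is optional, but may appear at most once.
const optional = ['color', 'colour', 'colouur'].map(string => /^colou?r$/.test(string))
optional //=> [true, true, false]

// +: runs of one or more digits.
const runs = 'a1b22c333'.match(/\d+/g)
runs //=> ['1', '22', '333']
```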
Expression | Description |
---|---|
`^` | start of string |
`$` | end of string |
`\b` | word boundary |
- How word boundary matching works:
  - At the beginning of the string if the first character is `\w`.
  - Between two adjacent characters within the string, if one character is `\w` and the other is `\W`.
  - At the end of the string if the last character is `\w`.
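A short sketch of word boundaries and anchors at work. The sentence is invented; note that the `cat` inside `concat` is skipped because it is preceded by a `\w` character.

```javascript
// \bcat\b only matches cat as a whole word.
const words = 'cat concat cat.'.match(/\bcat\b/g)
words //=> ['cat', 'cat']

const replaced = 'cat concat cat.'.replace(/\bcat\b/g, 'dog')
replaced //=> 'dog concat dog.'

// ^ and $ pin the match to the start and end of the string.
const anchored = ['cat', 'cat!', '!cat'].map(string => /^cat$/.test(string))
anchored //=> [true, false, false]
```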
Expression | Description |
---|---|
`foo\|bar` | match either `foo` or `bar` |
`foo(?=bar)` | match `foo` if it’s before `bar` |
`foo(?!bar)` | match `foo` if it’s not before `bar` |
`(?<=bar)foo` | match `foo` if it’s after `bar` |
`(?<!bar)foo` | match `foo` if it’s not after `bar` |
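The lookaheads and lookbehinds match `foo` without consuming `bar`, which is easiest to see with `replace`. The following sketch assumes a JavaScript engine that supports lookbehind, such as a current version of Node.js; the inputs are made up.

```javascript
// Alternation: either foo or bar.
const either = 'foo bar baz'.match(/foo|bar/g)
either //=> ['foo', 'bar']

// Lookahead: only the foo that is followed by bar is replaced.
const before = 'foobar foobaz'.replace(/foo(?=bar)/g, 'X')
before //=> 'Xbar foobaz'

// Negative lookahead: only the foo that is not followed by bar.
const notBefore = 'foobar foobaz'.replace(/foo(?!bar)/g, 'X')
notBefore //=> 'foobar Xbaz'

// Lookbehind: only the foo that is preceded by bar.
const after = 'barfoo bazfoo'.replace(/(?<=bar)foo/g, 'X')
after //=> 'barX bazfoo'

// Negative lookbehind: only the foo that is not preceded by bar.
const notAfter = 'barfoo bazfoo'.replace(/(?<!bar)foo/g, 'X')
notAfter //=> 'barfoo bazX'
```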
Expression | Description |
---|---|
`(foo)` | capturing group; match and capture `foo` |
`(?:foo)` | non-capturing group; match `foo` but without capturing `foo` |
`(foo)bar\1` | `\1` is a backreference to the 1st capturing group; match `foobarfoo` |
- Capturing groups are only relevant in the following methods:
  - `string.match(regexp)`
  - `string.matchAll(regexp)`
  - `string.replace(regexp, callback)`
- `\N` is a backreference to the `N`th capturing group. Capturing groups are numbered starting from 1.
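A brief sketch contrasting capturing and non-capturing groups, and showing a backreference. The variable names and inputs are only for illustration.

```javascript
// A capturing group adds an entry to the match array.
const capturing = 'foo'.match(/(foo)/).length
capturing //=> 2

// A non-capturing group does not.
const nonCapturing = 'foo'.match(/(?:foo)/).length
nonCapturing //=> 1

// \1 must repeat whatever the 1st group captured.
const repeated = ['foobarfoo', 'foobarbaz'].map(string => /(foo)bar\1/.test(string))
repeated //=> [true, false]

// Find a doubled character.
const doubled = 'hello'.match(/(\w)\1/)[0]
doubled //=> 'll'
```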
Flag | Description |
---|---|
`g` | global search |
`i` | case-insensitive search |
`m` | multi-line search |
- If the `g` flag is used, `regexp.lastIndex` may change with each call to `regexp.test(string)`.
- If the `m` flag is used, `^` and `$` will match the start and end of each line.
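The effect of the `i` and `m` flags can be sketched like this (inputs are invented):

```javascript
// i: case is ignored.
const caseInsensitive = /x/i.test('X')
caseInsensitive //=> true

// Without m, ^ and $ only match at the start and end of the whole string.
const singleLine = 'a\nb'.match(/^\w$/g)
singleLine //=> null

// With m, ^ and $ also match at the start and end of each line.
const multiLine = 'a\nb'.match(/^\w$/gm)
multiLine //=> ['a', 'b']
```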