Q: What causes 'catastrophic backtracking'?

Nested quantifiers on overlapping patterns like (a+)+ or (.*)*. The engine tries exponential combinations on a near-match. Fix with possessive quantifiers (PCRE), atomic groups, or by rewriting to avoid ambiguity.

Q: What's the difference between \w and [A-Za-z0-9_]?

In ASCII mode they're identical. With the Unicode (u) flag in JS or by default in Python 3, \w also matches letters/digits from other scripts. Use [A-Za-z0-9_] when you specifically need ASCII-only.

Q: How do I match a literal dot, parenthesis or backslash?

Escape it: \.  \\. Inside a character class [...] most metacharacters lose their meaning — [.] is just a dot. The only specials inside a class are ] \ ^ -.

Q: Why does my regex match more than expected?

Almost always greedy quantifiers. Switch . * to . *? (lazy), or use a negated class like [^"]* to bound the match precisely.

Q: Are regex flavors compatible across languages?

Core syntax (classes, quantifiers, anchors, groups) is portable. Lookbehind, named groups, possessive quantifiers and Unicode property escapes (\p{L}) vary. Go's RE2 has no backreferences or lookaround at all.

Question 1

What's the difference between greedy and lazy quantifiers?

Accepted Answer

Greedy (*, +, ?) match as much as possible. Lazy (*?, +?, ??) match as little as possible. Use lazy when matching between delimiters: <.+?> instead of <.+>, otherwise one match swallows everything up to the last >.

Question 2

When should I use a non-capturing group?

Accepted Answer

Use (?:...) when you need grouping for alternation or quantifiers but don't need the captured value. It's slightly faster and keeps your capture indices ($1, $2…) clean.

Question 3

Why doesn't my multiline regex work?

Accepted Answer

By default ^ and $ match the start/end of the whole string, not each line. Add the m flag for line-based anchors. For . to match newlines, add the s (dotall) flag.

Question 4

Should I parse HTML or JSON with regex?

Accepted Answer

No. Use a real parser (DOMParser, json.loads, etc.). Regex breaks on nested or malformed input. Use it only for token-level extraction inside already-parsed text.

Question 5

What is a lookahead vs a lookbehind?

Accepted Answer

Lookahead (?=...) asserts what comes next without consuming characters. Lookbehind (?<=...) asserts what came before. Negative versions (?!...) and (?<!...) assert absence. They are zero-width — the match length is unchanged.

Question 6

What causes 'catastrophic backtracking'?

Accepted Answer

Nested quantifiers on overlapping patterns like (a+)+ or (.*)*. The engine tries exponential combinations on a near-match. Fix with possessive quantifiers (PCRE), atomic groups, or by rewriting to avoid ambiguity.

Question 7

What's the difference between \w and [A-Za-z0-9_]?

Accepted Answer

In ASCII mode they're identical. With the Unicode (u) flag in JS or by default in Python 3, \w also matches letters/digits from other scripts. Use [A-Za-z0-9_] when you specifically need ASCII-only.

Question 8

How do I match a literal dot, parenthesis or backslash?

Accepted Answer

Escape it: \.  \. Inside a character class [...] most metacharacters lose their meaning — [.] is just a dot. The only specials inside a class are ] \ ^ -.

Question 9

Why does my regex match more than expected?

Accepted Answer

Almost always greedy quantifiers. Switch . * to . *? (lazy), or use a negated class like [^"]* to bound the match precisely.

Question 10

Are regex flavors compatible across languages?

Accepted Answer

Core syntax (classes, quantifiers, anchors, groups) is portable. Lookbehind, named groups, possessive quantifiers and Unicode property escapes (\p{L}) vary. Go's RE2 has no backreferences or lookaround at all.

Token	Matches
.	Any character except newline (use s flag for newlines).
\d	Digit [0-9]. \D = non-digit.
\w	Word char [A-Za-z0-9_]. \W = non-word.
\s	Whitespace (space, tab, newline). \S = non-whitespace.
[abc]	Any of a, b, c. [^abc] negates.
[a-z]	Range. Combine: [A-Za-z0-9_-].
\b	Word boundary. \B = non-boundary.
\t \n \r	Tab, newline, carriage return.
\xhh / \uhhhh	Hex escape / Unicode codepoint.
\\	Literal backslash. Escape any metachar with \.

Anchor	Matches
^	Start of string (or line in multiline mode).
$	End of string (or line in multiline mode).
\A	Absolute start of string (PCRE/Python).
\Z	Absolute end of string (PCRE/Python).
\b	Word boundary — between \w and \W.
\B	Non-word-boundary.

Quantifier	Meaning
*	0 or more (greedy). Use *? for lazy.
+	1 or more. +? for lazy.
?	0 or 1. ?? for lazy.
{n}	Exactly n times.
{n,}	n or more.
{n,m}	Between n and m.
*+ ++ ?+	Possessive (PCRE/Java) — no backtracking. Prevents catastrophic regex.

Construct	Meaning
(abc)	Capturing group. Refer back as \1 or $1.
(?:abc)	Non-capturing group. Use when you only need grouping.
(?<name>abc)	Named capture. Refer as \k<name> or $<name>.
(?=abc)	Positive lookahead — must be followed by abc.
(?!abc)	Negative lookahead.
(?<=abc)	Positive lookbehind.
(?<!abc)	Negative lookbehind.
a\|b	Alternation: a or b. Group with (?:a\|b) to scope.
\1 \2	Backreference to the Nth capture group.

Flag	Effect
i	Case-insensitive.
g	Global — find all matches (JS/PCRE).
m	Multiline — ^ and $ match line boundaries.
s	Dotall — . matches newlines.
x	Extended — ignore whitespace and # comments in pattern (PCRE/Python).
u	Unicode (JS, Python).
y	Sticky — match at lastIndex only (JS).

Regex Cheat Sheet (Syntax, Lookarounds & Recipes)

Quick regex tester

Character classes

Anchors

Quantifiers

Groups & lookarounds

Flags

The 7-second regex trick

Common recipes

Flavor gotchas

Worked examples

Get the Network Engineer Starter Pack

FAQ

Related

Pattern	Regex	Note
Email (pragmatic)	^[\w.+-]+@[\w-]+(\.[\w-]+)+$	Don't try to be RFC 5322 perfect.
IPv4 (strict)	^((25[0-5]\|2[0-4]\d\|1?\d?\d)\.){3}(25[0-5]\|2[0-4]\d\|1?\d?\d)$	Each octet 0–255.
IPv6 (loose)	^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$	Doesn't catch :: shorthand — use a parser for that.
URL (loose)	https?://[\w.-]+(?:\.[\w.-]+)+[\w\-._~:/?#[\]@!$&'()+,;=]	Use a parser if you can.
UUID v4	^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$	Add the i flag.
ISO 8601 date/time	^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z\|[+-]\d{2}:?\d{2})?)?$	Date or datetime.
US phone	^\+?1?[-. ]?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$	Accepts common separators.
Hex color	^#?([0-9a-f]{3}\|[0-9a-f]{6})$	3 or 6 digit. Case-insensitive.
Slug	^[a-z0-9]+(?:-[a-z0-9]+)*$	URL-safe kebab-case.
Strong password	^(?=.[a-z])(?=.[A-Z])(?=.\d)(?=.[^\w\s]).{8,}$	Lookaheads enforce each rule.
Semantic version	^\d+\.\d+\.\d+(?:-[\w.]+)?(?:\+[\w.]+)?$	MAJOR.MINOR.PATCH[-pre][+build].
Strip ANSI color codes	\x1b\[[0-9;]*m	Clean log output.
Markdown link	\[([^\]]+)\]\(([^)]+)\)	Group 1 = text, Group 2 = URL.
HTML tag (token only)	<\/?([a-zA-Z][a-zA-Z0-9])\b[^>]>	Tokens only — never parse real HTML with regex.

Engine	What's different
JavaScript	Lookbehind requires ES2018+. No \A / \Z — use ^ / $ with no m flag.
PCRE / PHP	Use delimiters: /pattern/flags. Supports possessive quantifiers and recursion.
Python (re)	Use raw strings: r"\d+". re.match anchors at start; use re.search otherwise.
Go (regexp)	RE2-based — no backreferences, no lookaround. Linear time guaranteed.
grep	grep = BRE (escape +, ?, \|). grep -E = ERE. grep -P = PCRE.
sed	BRE by default; sed -E for ERE. Use a different delimiter (s\|a\|b\|) to avoid escaping slashes.