// regex

Regex Cheat Sheet (Syntax, Lookarounds & Recipes)

Every regular expression construct you actually use — character classes, anchors, quantifiers, capturing groups, lookarounds and flags — plus a recipe book of battle-tested patterns and a live tester. Bookmark this. It answers 90% of regex questions in one glance.

Updated

Quick regex tester

Edit the pattern, flags or input — matches update instantly (JavaScript engine).

3 matches:
  • @10 2026-05-18
  • @34 2026-06-01
  • @63 2026-04-30

Character classes

TokenMatches
.Any character except newline (use s flag for newlines).
\dDigit [0-9]. \D = non-digit.
\wWord char [A-Za-z0-9_]. \W = non-word.
\sWhitespace (space, tab, newline). \S = non-whitespace.
[abc]Any of a, b, c. [^abc] negates.
[a-z]Range. Combine: [A-Za-z0-9_-].
\bWord boundary. \B = non-boundary.
\t \n \rTab, newline, carriage return.
\xhh / \uhhhhHex escape / Unicode codepoint.
\\Literal backslash. Escape any metachar with \.

Anchors

AnchorMatches
^Start of string (or line in multiline mode).
$End of string (or line in multiline mode).
\AAbsolute start of string (PCRE/Python).
\ZAbsolute end of string (PCRE/Python).
\bWord boundary — between \w and \W.
\BNon-word-boundary.

Quantifiers

QuantifierMeaning
*0 or more (greedy). Use *? for lazy.
+1 or more. +? for lazy.
?0 or 1. ?? for lazy.
{n}Exactly n times.
{n,}n or more.
{n,m}Between n and m.
*+ ++ ?+Possessive (PCRE/Java) — no backtracking. Prevents catastrophic regex.

Groups & lookarounds

ConstructMeaning
(abc)Capturing group. Refer back as \1 or $1.
(?:abc)Non-capturing group. Use when you only need grouping.
(?<name>abc)Named capture. Refer as \k<name> or $<name>.
(?=abc)Positive lookahead — must be followed by abc.
(?!abc)Negative lookahead.
(?<=abc)Positive lookbehind.
(?<!abc)Negative lookbehind.
a|bAlternation: a or b. Group with (?:a|b) to scope.
\1 \2Backreference to the Nth capture group.

Flags

FlagEffect
iCase-insensitive.
gGlobal — find all matches (JS/PCRE).
mMultiline — ^ and $ match line boundaries.
sDotall — . matches newlines.
xExtended — ignore whitespace and # comments in pattern (PCRE/Python).
uUnicode (JS, Python).
ySticky — match at lastIndex only (JS).

The 7-second regex trick

  1. Write the simplest pattern that matches one real example. Test it.
  2. Anchor with ^ and $ for validation, leave unanchored for extraction.
  3. Replace greedy .* with a negated class ([^"]*) or lazy .*?.
  4. Wrap alternatives in non-capturing groups: (?:a|b|c).
  5. Use named captures (?<year>\d4) when you'll reference the match.

Example: extract the timestamp from [2026-05-18 12:34:56] ERROR ... ^\[(?<ts>\d4-\d2-\d2 \d2:\d2:\d2)\]. Captures the date in ts, no greedy wildcards, runs in linear time.

Common recipes

PatternRegexNote
Email (pragmatic)^[\w.+-]+@[\w-]+(\.[\w-]+)+$Don't try to be RFC 5322 perfect.
IPv4 (strict)^((25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(25[0-5]|2[0-4]\d|1?\d?\d)$Each octet 0–255.
IPv6 (loose)^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$Doesn't catch :: shorthand — use a parser for that.
URL (loose)https?://[\w.-]+(?:\.[\w.-]+)+[\w\-._~:/?#[\]@!$&'()*+,;=]*Use a parser if you can.
UUID v4^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$Add the i flag.
ISO 8601 date/time^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?)?$Date or datetime.
US phone^\+?1?[-. ]?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$Accepts common separators.
Hex color^#?([0-9a-f]{3}|[0-9a-f]{6})$3 or 6 digit. Case-insensitive.
Slug^[a-z0-9]+(?:-[a-z0-9]+)*$URL-safe kebab-case.
Strong password^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\w\s]).{8,}$Lookaheads enforce each rule.
Semantic version^\d+\.\d+\.\d+(?:-[\w.]+)?(?:\+[\w.]+)?$MAJOR.MINOR.PATCH[-pre][+build].
Strip ANSI color codes\x1b\[[0-9;]*mClean log output.
Markdown link\[([^\]]+)\]\(([^)]+)\)Group 1 = text, Group 2 = URL.
HTML tag (token only)<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>Tokens only — never parse real HTML with regex.

Flavor gotchas

EngineWhat's different
JavaScriptLookbehind requires ES2018+. No \A / \Z — use ^ / $ with no m flag.
PCRE / PHPUse delimiters: /pattern/flags. Supports possessive quantifiers and recursion.
Python (re)Use raw strings: r"\d+". re.match anchors at start; use re.search otherwise.
Go (regexp)RE2-based — no backreferences, no lookaround. Linear time guaranteed.
grepgrep = BRE (escape +, ?, |). grep -E = ERE. grep -P = PCRE.
sedBRE by default; sed -E for ERE. Use a different delimiter (s|a|b|) to avoid escaping slashes.

Worked examples

Extract between quotes
Pattern
"([^"]*)"
Input
name="alice" age="30"
Match 1
alice
Why
Negated class is faster + safer than .*?
Strip trailing whitespace
Pattern
\s+$
Flags
gm
Replace
(empty)
Why
m makes $ match each line end
Validate semver
Pattern
^\d+\.\d+\.\d+$
Match
1.4.2
No match
1.4 / v1.4.2
Note
Add prefix/build suffix if needed
Lookahead password rule
Pattern
^(?=.*\d)(?=.*[A-Z]).{8,}$
Match
Hunter12x
No match
hunter12
Why
Lookaheads add rules without consuming chars
// free download

Get the Network Engineer Starter Pack

A printable 5-page PDF: first-60-seconds triage, modern Linux network commands, BGP show commands & path-selection order, and a symptom → root-cause map. Free, no fluff.

No spam. Unsubscribe anytime. We send occasional updates when we ship new tools or cheatsheets.

FAQ

What's the difference between greedy and lazy quantifiers?
Greedy (*, +, ?) match as much as possible. Lazy (*?, +?, ??) match as little as possible. Use lazy when matching between delimiters: <.+?> instead of <.+>, otherwise one match swallows everything up to the last >.
When should I use a non-capturing group?
Use (?:...) when you need grouping for alternation or quantifiers but don't need the captured value. It's slightly faster and keeps your capture indices ($1, $2…) clean.
Why doesn't my multiline regex work?
By default ^ and $ match the start/end of the whole string, not each line. Add the m flag for line-based anchors. For . to match newlines, add the s (dotall) flag.
Should I parse HTML or JSON with regex?
No. Use a real parser (DOMParser, json.loads, etc.). Regex breaks on nested or malformed input. Use it only for token-level extraction inside already-parsed text.
What is a lookahead vs a lookbehind?
Lookahead (?=...) asserts what comes next without consuming characters. Lookbehind (?<=...) asserts what came before. Negative versions (?!...) and (?<!...) assert absence. They are zero-width — the match length is unchanged.
What causes 'catastrophic backtracking'?
Nested quantifiers on overlapping patterns like (a+)+ or (.*)*. The engine tries exponential combinations on a near-match. Fix with possessive quantifiers (PCRE), atomic groups, or by rewriting to avoid ambiguity.
What's the difference between \w and [A-Za-z0-9_]?
In ASCII mode they're identical. With the Unicode (u) flag in JS or by default in Python 3, \w also matches letters/digits from other scripts. Use [A-Za-z0-9_] when you specifically need ASCII-only.
How do I match a literal dot, parenthesis or backslash?
Escape it: \. \( \) \\. Inside a character class [...] most metacharacters lose their meaning — [.] is just a dot. The only specials inside a class are ] \ ^ -.
Why does my regex match more than expected?
Almost always greedy quantifiers. Switch . * to . *? (lazy), or use a negated class like [^"]* to bound the match precisely.
Are regex flavors compatible across languages?
Core syntax (classes, quantifiers, anchors, groups) is portable. Lookbehind, named groups, possessive quantifiers and Unicode property escapes (\p{L}) vary. Go's RE2 has no backreferences or lookaround at all.

Related