// regex
Regex Cheat Sheet (Syntax, Lookarounds & Recipes)
Every regular expression construct you actually use — character classes, anchors, quantifiers, capturing groups, lookarounds and flags — plus a recipe book of battle-tested patterns and a live tester. Bookmark this. It answers 90% of regex questions in one glance.
Updated
Quick regex tester
Edit the pattern, flags or input — matches update instantly (JavaScript engine).
3 matches:
- @10 2026-05-18
- @34 2026-06-01
- @63 2026-04-30
Character classes
| Token | Matches |
|---|---|
| . | Any character except newline (use s flag for newlines). |
| \d | Digit [0-9]. \D = non-digit. |
| \w | Word char [A-Za-z0-9_]. \W = non-word. |
| \s | Whitespace (space, tab, newline). \S = non-whitespace. |
| [abc] | Any of a, b, c. [^abc] negates. |
| [a-z] | Range. Combine: [A-Za-z0-9_-]. |
| \b | Word boundary. \B = non-boundary. |
| \t \n \r | Tab, newline, carriage return. |
| \xhh / \uhhhh | Hex escape / Unicode codepoint. |
| \\ | Literal backslash. Escape any metachar with \. |
Anchors
| Anchor | Matches |
|---|---|
| ^ | Start of string (or line in multiline mode). |
| $ | End of string (or line in multiline mode). |
| \A | Absolute start of string (PCRE/Python). |
| \Z | Absolute end of string (PCRE/Python). |
| \b | Word boundary — between \w and \W. |
| \B | Non-word-boundary. |
Quantifiers
| Quantifier | Meaning |
|---|---|
| * | 0 or more (greedy). Use *? for lazy. |
| + | 1 or more. +? for lazy. |
| ? | 0 or 1. ?? for lazy. |
| {n} | Exactly n times. |
| {n,} | n or more. |
| {n,m} | Between n and m. |
| *+ ++ ?+ | Possessive (PCRE/Java) — no backtracking. Prevents catastrophic regex. |
Groups & lookarounds
| Construct | Meaning |
|---|---|
| (abc) | Capturing group. Refer back as \1 or $1. |
| (?:abc) | Non-capturing group. Use when you only need grouping. |
| (?<name>abc) | Named capture. Refer as \k<name> or $<name>. |
| (?=abc) | Positive lookahead — must be followed by abc. |
| (?!abc) | Negative lookahead. |
| (?<=abc) | Positive lookbehind. |
| (?<!abc) | Negative lookbehind. |
| a|b | Alternation: a or b. Group with (?:a|b) to scope. |
| \1 \2 | Backreference to the Nth capture group. |
Flags
| Flag | Effect |
|---|---|
| i | Case-insensitive. |
| g | Global — find all matches (JS/PCRE). |
| m | Multiline — ^ and $ match line boundaries. |
| s | Dotall — . matches newlines. |
| x | Extended — ignore whitespace and # comments in pattern (PCRE/Python). |
| u | Unicode (JS, Python). |
| y | Sticky — match at lastIndex only (JS). |
The 7-second regex trick
- Write the simplest pattern that matches one real example. Test it.
- Anchor with
^and$for validation, leave unanchored for extraction. - Replace greedy
.*with a negated class ([^"]*) or lazy.*?. - Wrap alternatives in non-capturing groups:
(?:a|b|c). - Use named captures
(?<year>\d4)when you'll reference the match.
Example: extract the timestamp from [2026-05-18 12:34:56] ERROR ... → ^\[(?<ts>\d4-\d2-\d2 \d2:\d2:\d2)\]. Captures the date in ts, no greedy wildcards, runs in linear time.
Common recipes
| Pattern | Regex | Note |
|---|---|---|
| Email (pragmatic) | ^[\w.+-]+@[\w-]+(\.[\w-]+)+$ | Don't try to be RFC 5322 perfect. |
| IPv4 (strict) | ^((25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(25[0-5]|2[0-4]\d|1?\d?\d)$ | Each octet 0–255. |
| IPv6 (loose) | ^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$ | Doesn't catch :: shorthand — use a parser for that. |
| URL (loose) | https?://[\w.-]+(?:\.[\w.-]+)+[\w\-._~:/?#[\]@!$&'()*+,;=]* | Use a parser if you can. |
| UUID v4 | ^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$ | Add the i flag. |
| ISO 8601 date/time | ^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?)?$ | Date or datetime. |
| US phone | ^\+?1?[-. ]?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$ | Accepts common separators. |
| Hex color | ^#?([0-9a-f]{3}|[0-9a-f]{6})$ | 3 or 6 digit. Case-insensitive. |
| Slug | ^[a-z0-9]+(?:-[a-z0-9]+)*$ | URL-safe kebab-case. |
| Strong password | ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\w\s]).{8,}$ | Lookaheads enforce each rule. |
| Semantic version | ^\d+\.\d+\.\d+(?:-[\w.]+)?(?:\+[\w.]+)?$ | MAJOR.MINOR.PATCH[-pre][+build]. |
| Strip ANSI color codes | \x1b\[[0-9;]*m | Clean log output. |
| Markdown link | \[([^\]]+)\]\(([^)]+)\) | Group 1 = text, Group 2 = URL. |
| HTML tag (token only) | <\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*> | Tokens only — never parse real HTML with regex. |
Flavor gotchas
| Engine | What's different |
|---|---|
| JavaScript | Lookbehind requires ES2018+. No \A / \Z — use ^ / $ with no m flag. |
| PCRE / PHP | Use delimiters: /pattern/flags. Supports possessive quantifiers and recursion. |
| Python (re) | Use raw strings: r"\d+". re.match anchors at start; use re.search otherwise. |
| Go (regexp) | RE2-based — no backreferences, no lookaround. Linear time guaranteed. |
| grep | grep = BRE (escape +, ?, |). grep -E = ERE. grep -P = PCRE. |
| sed | BRE by default; sed -E for ERE. Use a different delimiter (s|a|b|) to avoid escaping slashes. |
Worked examples
Extract between quotes
- Pattern
- "([^"]*)"
- Input
- name="alice" age="30"
- Match 1
- alice
- Why
- Negated class is faster + safer than .*?
Strip trailing whitespace
- Pattern
- \s+$
- Flags
- gm
- Replace
- (empty)
- Why
- m makes $ match each line end
Validate semver
- Pattern
- ^\d+\.\d+\.\d+$
- Match
- 1.4.2
- No match
- 1.4 / v1.4.2
- Note
- Add prefix/build suffix if needed
Lookahead password rule
- Pattern
- ^(?=.*\d)(?=.*[A-Z]).{8,}$
- Match
- Hunter12x
- No match
- hunter12
- Why
- Lookaheads add rules without consuming chars
// free download
Get the Network Engineer Starter Pack
A printable 5-page PDF: first-60-seconds triage, modern Linux network commands, BGP show commands & path-selection order, and a symptom → root-cause map. Free, no fluff.
FAQ
- What's the difference between greedy and lazy quantifiers?
- Greedy (*, +, ?) match as much as possible. Lazy (*?, +?, ??) match as little as possible. Use lazy when matching between delimiters: <.+?> instead of <.+>, otherwise one match swallows everything up to the last >.
- When should I use a non-capturing group?
- Use (?:...) when you need grouping for alternation or quantifiers but don't need the captured value. It's slightly faster and keeps your capture indices ($1, $2…) clean.
- Why doesn't my multiline regex work?
- By default ^ and $ match the start/end of the whole string, not each line. Add the m flag for line-based anchors. For . to match newlines, add the s (dotall) flag.
- Should I parse HTML or JSON with regex?
- No. Use a real parser (DOMParser, json.loads, etc.). Regex breaks on nested or malformed input. Use it only for token-level extraction inside already-parsed text.
- What is a lookahead vs a lookbehind?
- Lookahead (?=...) asserts what comes next without consuming characters. Lookbehind (?<=...) asserts what came before. Negative versions (?!...) and (?<!...) assert absence. They are zero-width — the match length is unchanged.
- What causes 'catastrophic backtracking'?
- Nested quantifiers on overlapping patterns like (a+)+ or (.*)*. The engine tries exponential combinations on a near-match. Fix with possessive quantifiers (PCRE), atomic groups, or by rewriting to avoid ambiguity.
- What's the difference between \w and [A-Za-z0-9_]?
- In ASCII mode they're identical. With the Unicode (u) flag in JS or by default in Python 3, \w also matches letters/digits from other scripts. Use [A-Za-z0-9_] when you specifically need ASCII-only.
- How do I match a literal dot, parenthesis or backslash?
- Escape it: \. \( \) \\. Inside a character class [...] most metacharacters lose their meaning — [.] is just a dot. The only specials inside a class are ] \ ^ -.
- Why does my regex match more than expected?
- Almost always greedy quantifiers. Switch . * to . *? (lazy), or use a negated class like [^"]* to bound the match precisely.
- Are regex flavors compatible across languages?
- Core syntax (classes, quantifiers, anchors, groups) is portable. Lookbehind, named groups, possessive quantifiers and Unicode property escapes (\p{L}) vary. Go's RE2 has no backreferences or lookaround at all.