Regex Cheatsheet
The pieces you actually use — character classes, anchors, quantifiers, groups, lookarounds, flags — plus a recipe book of common patterns.
Character classes
| Token | Matches |
|---|---|
| . | Any character except newline (any character at all with the s flag). |
| \d | Digit [0-9]. \D = non-digit. |
| \w | Word char [A-Za-z0-9_]. \W = non-word. |
| \s | Whitespace (space, tab, newline, carriage return, and similar). \S = non-whitespace. |
| [abc] | Any of a, b, c. [^abc] negates. |
| [a-z] | Range. Combine: [A-Za-z0-9_-]. |
| \b | Word boundary. \B = non-boundary. |
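
The tokens above can be tried out with a short sketch. It assumes Python's standard `re` module (the table itself is flavor-neutral), and the sample strings are made up for illustration.

```python
import re

text = "user_42 paid $19.99 on 2024-03-05"

# \d+ grabs each run of digits
digits = re.findall(r"\d+", text)

# \w+ grabs runs of letters, digits, and underscores
words = re.findall(r"\w+", text)

# [^\w\s] is a negated class: neither a word character nor whitespace
symbols = re.findall(r"[^\w\s]", text)

# \b pins the match to word boundaries, so the "on" inside "monitor" is skipped
standalone_on = re.findall(r"\bon\b", "on the monitor, turn it on")

print(digits)         # ['42', '19', '99', '2024', '03', '05']
print(words)          # ['user_42', 'paid', '19', '99', 'on', '2024', '03', '05']
print(symbols)        # ['$', '.', '-', '-']
print(standalone_on)  # ['on', 'on']
```
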
Anchors
| Anchor | Matches |
|---|---|
| ^ | Start of string (or line in multiline mode). |
| $ | End of string (or line in multiline mode). |
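| \A | Absolute start of string, even in multiline mode. Not supported in JavaScript. |
| \Z | End of string. Absolute in Python; in PCRE, Java, and .NET it also matches before a final newline (use \z for the absolute end). Not supported in JavaScript. |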
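
As a rough illustration of how the anchors differ, here is a sketch using Python's `re` module (an assumed flavor; `\Z` behaves differently in PCRE). The `log` string is invented sample data.

```python
import re

log = "ok: boot\nerror: disk\nok: net\n"

# Without re.MULTILINE, ^ only matches at the very start of the string
first_only = re.findall(r"^\w+", log)

# With re.MULTILINE, ^ also matches right after each newline
each_line = re.findall(r"^\w+", log, flags=re.MULTILINE)

# $ tolerates one trailing newline; \Z (absolute end in Python) does not
dollar_hit = re.search(r"net$", log) is not None
abs_end_hit = re.search(r"net\Z", log) is not None

print(first_only)   # ['ok']
print(each_line)    # ['ok', 'error', 'ok']
print(dollar_hit)   # True
print(abs_end_hit)  # False
```
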
Quantifiers
| Quantifier | Meaning |
|---|---|
| * | 0 or more (greedy). Use *? for lazy. |
| + | 1 or more. +? for lazy. |
| ? | 0 or 1. ?? for lazy. |
| {n} | Exactly n times. |
| {n,} | n or more. |
| {n,m} | Between n and m. |
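
The greedy and lazy forms are easiest to compare side by side. A minimal sketch, assuming Python's `re` module and made-up input strings:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the end of the string, then backs up to the last ">"
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">" it can reach
lazy = re.findall(r"<.+?>", html)

# ? makes the preceding token optional
spellings = re.findall(r"colou?r", "color colour")

# {n,m} bounds the repetition: 2 to 4 digits, taking as many as available
bounded = re.findall(r"\d{2,4}", "7 42 12345")

print(greedy)     # ['<b>bold</b> and <i>italic</i>']
print(lazy)       # ['<b>', '</b>', '<i>', '</i>']
print(spellings)  # ['color', 'colour']
print(bounded)    # ['42', '1234']
```
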
Groups & lookarounds
| Construct | Meaning |
|---|---|
| (abc) | Capturing group. Refer back as \1 inside the pattern, and as $1 (JS, Perl) or \1 (Python) in a replacement. |
| (?:abc) | Non-capturing group. Use when you only need grouping. |
| (?<name>abc) | Named capture. Refer as \k<name> or $<name>. Python spells it (?P<name>abc) and refers back with (?P=name). |
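| (?=abc) | Positive lookahead: must be followed by abc. Consumes nothing. |
| (?!abc) | Negative lookahead: must not be followed by abc. |
| (?<=abc) | Positive lookbehind: must be preceded by abc. Some engines (e.g. Python's re) require a fixed-width pattern here. |
| (?<!abc) | Negative lookbehind: must not be preceded by abc. |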
| a\|b | Alternation: a or b. Group with (?:a\|b) to scope. |
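
The constructs above might be exercised like this. The sketch assumes Python's `re` module, which uses the `(?P<name>...)` spelling for named groups; all input strings are illustrative.

```python
import re

# Named groups, in Python's (?P<name>...) spelling
m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-03")
year, month = m.group("year"), m.group("month")

# Backreference: \1 must repeat exactly what group 1 captured
doubled = re.findall(r"\b(\w+) \1\b", "it is is what it it is")

# Lookahead: digits only when followed by "px"; "px" is not part of the match
px_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Lookbehind: digits only when preceded by "$"
prices = re.findall(r"(?<=\$)\d+", "$5 and 7 and $12")

# Negative lookahead: words starting with "foo" unless "bar" comes next
not_foobar = re.findall(r"foo(?!bar)\w*", "foobar foobaz food")

print(year, month)  # 2024 03
print(doubled)      # ['is', 'it']
print(px_values)    # ['10', '30']
print(prices)       # ['5', '12']
print(not_foobar)   # ['foobaz', 'food']
```
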
Flags
| Flag | Effect |
|---|---|
| i | Case-insensitive. |
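| g | Global: find all matches instead of stopping at the first (JS, Perl). Other flavors expose this as a function, e.g. Python's re.findall. |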
| m | Multiline — ^ and $ match line boundaries. |
| s | Dotall — . matches newlines. |
| x | Extended — ignore whitespace and # comments in pattern (PCRE/Python). |
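| u | Unicode mode. In JS it enables code-point matching and \p{...} escapes; in Python 3, str patterns are Unicode-aware by default. |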
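
In Python the single-letter flags map to named constants (`re.IGNORECASE`, `re.MULTILINE`, `re.DOTALL`, `re.VERBOSE`), and there is no `g` flag because `re.findall` already returns every match. A small sketch, with invented sample text and a hypothetical `semver` pattern:

```python
import re

text = "Name: Ada\nname: Bob\n"

# i + m: match "name:" in any case at the start of every line
names = re.findall(r"^name: (\w+)$", text, flags=re.IGNORECASE | re.MULTILINE)

# s: let . cross the newline between the tags
block = re.search(r"<p>(.*)</p>", "<p>one\ntwo</p>", flags=re.DOTALL).group(1)

# x: whitespace and # comments inside the pattern are ignored
semver = re.compile(r"""
    (?P<major>\d+) \.   # major version
    (?P<minor>\d+) \.   # minor version
    (?P<patch>\d+)      # patch version
""", flags=re.VERBOSE)
parts = semver.fullmatch("1.20.3").groupdict()

print(names)        # ['Ada', 'Bob']
print(repr(block))  # 'one\ntwo'
print(parts)        # {'major': '1', 'minor': '20', 'patch': '3'}
```
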
Common recipes
| Pattern | Regex | Note |
|---|---|---|
| Email (pragmatic) | ^[\w.+-]+@[\w-]+(\.[\w-]+)+$ | Don't try to be RFC 5322 perfect. |
| IPv4 | ^((25[0-5]\|2[0-4]\d\|1?\d?\d)\.){3}(25[0-5]\|2[0-4]\d\|1?\d?\d)$ | Octets limited to 0-255; still accepts a leading zero, as in 01. |
| URL (loose) | https?://[\w.-]+(?:\.[\w.-]+)+[\w\-._~:/?#[\]@!$&'()*+,;=]* | Use a parser if you can. |
| UUID v4 | ^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$ | Case-insensitive flag recommended. |
| ISO 8601 date | ^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z\|[+-]\d{2}:?\d{2})?)?$ | Date or datetime. Checks the shape only, so month 13 still passes. |
| Strip ANSI color codes | \x1b\[[0-9;]*m | Useful for cleaning log output. |
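
A few of these recipes wired into helpers, as a sketch rather than a vetted validator. It assumes Python's `re` module; the function names are hypothetical. `fullmatch` is used because `$` alone would accept a trailing newline, and `re.ASCII` keeps `\d` to 0-9.

```python
import re

IPV4 = re.compile(
    r"^((25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(25[0-5]|2[0-4]\d|1?\d?\d)$", re.ASCII
)
UUID4 = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$",
    re.IGNORECASE,
)
ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?)?$",
    re.ASCII,
)
ANSI = re.compile(r"\x1b\[[0-9;]*m")


def is_ipv4(s: str) -> bool:
    return IPV4.fullmatch(s) is not None


def is_uuid4(s: str) -> bool:
    return UUID4.fullmatch(s) is not None


def is_iso_date(s: str) -> bool:
    return ISO_DATE.fullmatch(s) is not None


def strip_ansi(s: str) -> str:
    # Replace every color escape sequence with nothing
    return ANSI.sub("", s)


print(is_ipv4("192.168.0.1"), is_ipv4("256.1.1.1"))      # True False
print(is_uuid4("123E4567-E89B-42D3-A456-426614174000"))  # True
print(is_iso_date("2024-03-05T10:15:30Z"))               # True
print(strip_ansi("\x1b[31merror\x1b[0m"))                # error
```
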
FAQ
- What's the difference between greedy and lazy quantifiers?
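- Greedy quantifiers (*, +, ?) match as much as possible and give characters back only if the rest of the pattern fails. Lazy quantifiers (*?, +?, ??) match as little as possible and expand only when needed. Use lazy when matching between delimiters: <.+?> instead of <.+>, which on <a><b> would swallow both tags in one match.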
- When should I use a non-capturing group?
- Use (?:...) when you need grouping for alternation or quantifiers but don't need the captured value. The engine skips storing the capture, which can be marginally faster, and the numbering of your real capture groups stays unchanged.
- Why doesn't my multiline regex work?
- By default ^ and $ match the start/end of the string, not each line. Add the m flag for line-based anchors. For . to match newlines, add the s (dotall) flag.
- Should I parse HTML/JSON with regex?
- No. Use a real parser (DOMParser, json.loads, etc.). Regex breaks on nested or malformed input. Use it for token-level extraction inside already-parsed text.
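
Two of the answers above can be sketched in code, assuming Python's `re` and `json` modules. The `payload` string and its field names are invented for illustration.

```python
import json
import re

# With a capturing group, findall returns the group rather than the whole match
with_capture = re.findall(r"\d+(\.\d+)?", "3.14 and 42")

# With a non-capturing group, findall returns the whole match
without_capture = re.findall(r"\d+(?:\.\d+)?", "3.14 and 42")

# Parse the structure with a real parser first, then run the regex on one field
payload = '{"user": {"email": "ada@example.com"}, "note": "contact: bob@example.org"}'
data = json.loads(payload)
emails_in_note = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", data["note"])

print(with_capture)     # ['.14', '']
print(without_capture)  # ['3.14', '42']
print(emails_in_note)   # ['bob@example.org']
```

Running the email pattern on the raw JSON text instead would also pick up `ada@example.com`, with no way to tell which field it came from.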