
Regex Cheatsheet

The pieces you actually use — character classes, anchors, quantifiers, groups, lookarounds, flags — plus a recipe book of common patterns.


Character classes

Token     Matches
.         Any character except newline.
\d        Digit [0-9]. \D = non-digit.
\w        Word char [A-Za-z0-9_]. \W = non-word.
\s        Whitespace (space, tab, newline). \S = non-whitespace.
[abc]     Any of a, b, c. [^abc] negates.
[a-z]     Range. Combine: [A-Za-z0-9_-].
\b        Word boundary. \B = non-boundary.
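A minimal sketch of the classes above using Python's re module. The sample strings are made up for illustration.

```python
import re

text = "Order #42 shipped to unit_7 on 2024-01-15."

# \d matches one digit; the + quantifier groups adjacent digits into runs
digit_runs = re.findall(r"\d+", text)

# \w+ matches runs of letters, digits, and underscore
words = re.findall(r"\w+", text)

# [^...] negates: anything that is neither a word char nor whitespace
punctuation = re.findall(r"[^\w\s]", text)

# \b keeps "on" from matching inside a longer word such as "consonant"
standalone_on = re.findall(r"\bon\b", "on and on, the consonant went on")

print(digit_runs)     # ['42', '7', '2024', '01', '15']
print(words)          # ['Order', '42', 'shipped', 'to', 'unit_7', 'on', '2024', '01', '15']
print(punctuation)    # ['#', '-', '-', '.']
print(standalone_on)  # ['on', 'on', 'on']
```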

Anchors

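Anchor  Matches
^       Start of string (or line in multiline mode).
$       End of string (or line in multiline mode).
\A      Absolute start of string, regardless of multiline mode. Not available in JavaScript.
\Z      End of string, regardless of multiline mode. In Python this is the absolute end; in PCRE, Java, and .NET it also matches just before a final newline (use \z there for the absolute end). Not available in JavaScript.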
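The anchors behave differently depending on multiline mode. The sketch below shows this in Python, where \Z is the absolute end of the string; the log text is an invented example.

```python
import re

log = "INFO start\nERROR disk full\nINFO done\n"

# Without re.M, ^ only matches at the very start of the string
first_only = re.findall(r"^\w+", log)

# With re.M, ^ matches at the start of every line
each_line = re.findall(r"^\w+", log, flags=re.M)

# \A ignores re.M: still only the absolute start
absolute = re.findall(r"\A\w+", log, flags=re.M)

# In Python, $ also matches just before a trailing newline; \Z does not
dollar_matches = re.search(r"done$", log) is not None
strict_end_matches = re.search(r"done\Z", log) is not None

print(first_only)          # ['INFO']
print(each_line)           # ['INFO', 'ERROR', 'INFO']
print(absolute)            # ['INFO']
print(dollar_matches)      # True
print(strict_end_matches)  # False
```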

Quantifiers

Quantifier  Meaning
*           0 or more (greedy). Use *? for lazy.
+           1 or more. +? for lazy.
?           0 or 1. ?? for lazy.
{n}         Exactly n times.
{n,}        n or more.
{n,m}       Between n and m.
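A short Python sketch of the counted quantifiers and the optional ?. The inputs are illustrative only.

```python
import re

# {n}: exactly three digits, a dash, then exactly four digits
phone = re.fullmatch(r"\d{3}-\d{4}", "555-0199")

# {n,m}: whole words of 2 to 4 lowercase letters
short_words = re.findall(r"\b[a-z]{2,4}\b", "a to the quick brown fox")

# {n,}: three or more dashes in a row
rule = re.search(r"-{3,}", "title\n-----\nbody")

# ?: the preceding token is optional
spellings = re.findall(r"colou?r", "color colour colr")

print(phone is not None)  # True
print(short_words)        # ['to', 'the', 'fox']
print(rule.group())       # -----
print(spellings)          # ['color', 'colour']
```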

Groups & lookarounds

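Construct     Meaning
(abc)         Capturing group. Refer back as \1 inside the pattern; in a replacement string use $1 (JS) or \1 (Python).
(?:abc)       Non-capturing group. Use when you only need grouping.
(?<name>abc)  Named capture. Refer back as \k<name> in the pattern or $<name> in a JS replacement. Python spells these (?P<name>abc), (?P=name), and \g<name>.
(?=abc)       Positive lookahead: must be followed by abc.
(?!abc)       Negative lookahead: must not be followed by abc.
(?<=abc)      Positive lookbehind: must be preceded by abc.
(?<!abc)      Negative lookbehind: must not be preceded by abc.
a|b           Alternation: a or b. Group with (?:a|b) to scope.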
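The constructs above, sketched in Python. Note that Python's re module writes named groups as (?P<name>...) and replacement references as \g<name>, rather than the (?<name>...) and $<name> forms. All sample strings are invented.

```python
import re

# Named captures (Python spelling)
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2024-03")
year, month = m.group("year"), m.group("month")

# Backreference: \1 must repeat exactly what group 1 captured
doubled = re.findall(r"\b(\w+) \1\b", "it is is what it it is")

# Positive lookahead: digits only when followed by "px"; "px" is not consumed
px_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative lookbehind: digit runs not preceded by "$" (or by another digit)
not_prices = re.findall(r"(?<![$\d])\d+", "$15 for 3 items, $7 for 12")

# Named references in the replacement string
swapped = re.sub(r"(?P<a>\w+)@(?P<b>\w+)", r"\g<b>@\g<a>", "user@host")

print(year, month)  # 2024 03
print(doubled)      # ['is', 'it']
print(px_values)    # ['10', '30']
print(not_prices)   # ['3', '12']
print(swapped)      # host@user
```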

Flags

Flag  Effect
i     Case-insensitive.
g     Global: find all matches (JS, Perl). Python has no g flag; use re.findall or re.finditer.
m     Multiline: ^ and $ match line boundaries.
s     Dotall: . matches newlines.
x     Extended: ignore whitespace and # comments in pattern (PCRE/Python).
u     Unicode (JS, Python). Python 3 str patterns are Unicode by default.
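In Python the flags are constants on the re module (re.I, re.M, re.S, re.X) rather than letters after the pattern. A minimal sketch; the version-number pattern is a hypothetical example, not a full semver validator.

```python
import re

# i: case-insensitive
ci = re.findall(r"error", "Error ERROR error", flags=re.I)

# s: with dotall, . crosses the newline; without it, the match fails
with_dotall = re.search(r"<b>.*</b>", "<b>two\nlines</b>", flags=re.S)
without_dotall = re.search(r"<b>.*</b>", "<b>two\nlines</b>")

# x: whitespace and # comments inside the pattern are ignored
version = re.compile(r"""
    (\d+) \.   # major
    (\d+) \.   # minor
    (\d+)      # patch
""", flags=re.X)
parts = version.fullmatch("1.24.3").groups()

# No g flag: search returns the first match, findall returns all of them
first_digit = re.search(r"\d", "a1b2c3").group()
all_digits = re.findall(r"\d", "a1b2c3")

print(ci)                       # ['Error', 'ERROR', 'error']
print(with_dotall is not None)  # True
print(without_dotall)           # None
print(parts)                    # ('1', '24', '3')
print(first_digit, all_digits)  # 1 ['1', '2', '3']
```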

Common recipes

Pattern                 Regex                                                                  Note
Email (pragmatic)       ^[\w.+-]+@[\w-]+(\.[\w-]+)+$                                           Don't try to be RFC 5322 perfect.
IPv4                    ^((25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(25[0-5]|2[0-4]\d|1?\d?\d)$          Octets limited to 0-255; two-digit leading zeros such as 01 are accepted.
URL (loose)             https?://[\w.-]+(?:\.[\w.-]+)+[\w\-._~:/?#[\]@!$&'()*+,;=]*            Use a parser if you can.
UUID v4                 ^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$  Case-insensitive flag recommended.
ISO 8601 date           ^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?)?$  Date or datetime. Checks shape only, not value ranges.
Strip ANSI color codes  \x1b\[[0-9;]*m                                                         Useful for cleaning log output.
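A sketch of how some of these recipes might be wrapped in Python. The helper names (is_ipv4, is_uuid4, is_iso_date, strip_ansi) are hypothetical. fullmatch is used so that a trailing newline cannot slip past $.

```python
import re

IPV4 = re.compile(r"^((25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(25[0-5]|2[0-4]\d|1?\d?\d)$")
UUID4 = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$",
    re.I,  # accept upper-case hex too
)
ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?)?$"
)
ANSI = re.compile(r"\x1b\[[0-9;]*m")


def is_ipv4(s: str) -> bool:
    return IPV4.fullmatch(s) is not None


def is_uuid4(s: str) -> bool:
    return UUID4.fullmatch(s) is not None


def is_iso_date(s: str) -> bool:
    # Shape check only: "2024-13-45" passes because ranges are not validated
    return ISO_DATE.fullmatch(s) is not None


def strip_ansi(s: str) -> str:
    return ANSI.sub("", s)


print(is_ipv4("192.168.1.1"), is_ipv4("256.1.1.1"))      # True False
print(is_uuid4("123E4567-E89B-42D3-A456-426614174000"))  # True
print(is_iso_date("2024-01-15T10:30:00.123+02:00"))      # True
print(is_iso_date("2024-13-45"))                         # True
print(strip_ansi("\x1b[31mERROR\x1b[0m disk full"))      # ERROR disk full
```

If real calendar validity matters, a date parser is the safer follow-up step after the shape check.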

FAQ

What's the difference between greedy and lazy quantifiers?
Greedy (*, +, ?) match as much as possible. Lazy (*?, +?, ??) match as little as possible. Use lazy when matching between delimiters: <.+?> instead of <.+>.
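The difference is easy to see on a small made-up HTML snippet in Python. A negated class such as [^>]+ is shown as a common alternative to the lazy form; it is not part of the answer above.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last ">" in the string
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">" it can
lazy = re.findall(r"<.+?>", html)

# Alternative: forbid ">" inside the tag instead of relying on laziness
negated = re.findall(r"<[^>]+>", html)

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(negated)  # ['<b>', '</b>', '<i>', '</i>']
```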
When should I use a non-capturing group?
Use (?:...) when you need grouping for alternation or quantifiers but don't need the captured value. The engine skips storing the submatch, which can be slightly faster, and capture indices stay clean.
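In Python the choice is visible in re.findall, which returns group contents when the pattern has a capturing group and whole matches when it has none. A small sketch with invented input:

```python
import re

text = "2 apples, 10 oranges, 3 pears"

# Capturing group: findall returns only what the group captured
captured = re.findall(r"\d+ (apples|oranges)", text)

# Non-capturing group: findall returns the whole match
whole = re.findall(r"\d+ (?:apples|oranges)", text)

# Capture indices stay clean: the surname is group 1, not group 2
m = re.search(r"(?:Mr|Ms|Dr)\. (\w+)", "Dr. Who")
surname = m.group(1)

print(captured)  # ['apples', 'oranges']
print(whole)     # ['2 apples', '10 oranges']
print(surname)   # Who
```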
Why doesn't my multiline regex work?
By default ^ and $ match the start/end of the string, not each line. Add the m flag for line-based anchors. For . to match newlines, add the s (dotall) flag.
Should I parse HTML/JSON with regex?
No. Use a real parser (DOMParser, json.loads, etc.). Regex breaks on nested or malformed input. Use it for token-level extraction inside already-parsed text.
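As a sketch of that division of labor in Python: json.loads handles the structure, and a regex only extracts tokens from one already-parsed string field. The payload shape and the user=/id= message format are assumptions made up for this example.

```python
import json
import re

raw = '{"events": [{"msg": "login user=alice id=17"}, {"msg": "logout user=bob id=42"}]}'

# Step 1: let the parser deal with nesting, quoting, and escapes
data = json.loads(raw)

# Step 2: regex for token-level extraction inside a single field
USER_ID = re.compile(r"user=(?P<user>\w+) id=(?P<id>\d+)")

pairs = []
for event in data["events"]:
    m = USER_ID.search(event["msg"])
    if m:
        pairs.append((m.group("user"), int(m.group("id"))))

print(pairs)  # [('alice', 17), ('bob', 42)]
```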
