Network Troubleshooting Commands

A working triage playbook: what to check first, what each command actually tells you, and how common symptoms map to root causes.

Triage by layer (bottom-up)

Don't debug L7 until L1-L4 are clean. Saves hours.

| Layer | Question | Commands / Notes |
| --- | --- | --- |
| L1: link | Cable, optic, port up? | `ethtool eth0`, switch port counters, optic power levels (dBm). No link = stop here. |
| L2: switching | ARP / MAC learned? | `ip neigh` on the host, `show mac address-table` on the switch. Wrong VLAN and duplicate MACs live here. |
| L3: routing | Route to destination? | `ip r get <dst>`, `traceroute`. Asymmetric routing is sneaky, so check the return path too. |
| L4: transport | Port reachable? | `nc -zv host port`, `ss -tnp`. Firewall drops and full conntrack tables show up here. |
| L7: application | Service actually answering? | `curl -v`, app logs, DNS resolution, TLS handshake. Most 'network outages' end up being L7. |

First-60-seconds triage

Run these in order. Each one rules out a whole class of problem.

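| Command | Checks | How to read it |
| --- | --- | --- |
| `ping -c 4 8.8.8.8` | Internet reachable? | If yes, L1-L3 to the internet is fine. Move to DNS / app layer. |
| `ping -c 4 google.com` | DNS working? | IP ping works but name doesn't → resolver/DNS issue. Check `/etc/resolv.conf` and `dig`. |
| `mtr -rwzbc 50 <dst>` | Where's the loss? | Look for loss that starts at a hop and continues to the destination; that hop is the culprit. Loss at one hop only is often ICMP rate-limiting and can be ignored. |
| `nc -zv <host> <port>` | Port open from here? | Connection refused = host reachable but nothing listening (or a firewall reject rule). Timeout = packets silently dropped, usually by a firewall. No route to host = routing or ARP failure, or an ICMP unreachable from a device on the path. |
| `curl -v https://host` | Full HTTP transaction | Shows every phase: DNS, connect, TLS, request and response. Add `-w '@curl-format.txt'` for per-phase timing. |
| `dig +trace example.com` | DNS delegation chain | Walks root → TLD → authoritative. Shows exactly where resolution breaks. |
| `ss -s` | Socket summary | A TIME_WAIT explosion or socket exhaustion shows up here in seconds. `ss -s` does not report conntrack; for that, compare `/proc/sys/net/netfilter/nf_conntrack_count` with `nf_conntrack_max`. |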
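
The `nc -zv` outcomes in the table correspond to distinct socket errors, which makes them easy to tell apart in a script. The following is a minimal Python sketch, assuming a Linux host; the function names `explain_connect_error` and `probe` and the short labels they return are illustrative and not part of any tool mentioned here.

```python
import errno
import socket


def explain_connect_error(exc: OSError) -> str:
    """Map a failed TCP connect to a short triage label."""
    if isinstance(exc, socket.gaierror):
        return "dns-failure"   # name did not resolve; TCP was never attempted
    if isinstance(exc, ConnectionRefusedError):
        return "refused"       # RST came back: nothing listening, or a reject rule
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"       # no reply at all: usually a silent drop
    if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
        return "unreachable"   # routing/ARP failure or an ICMP unreachable
    return "other"


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connect, roughly what `nc -zv host port` does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except OSError as exc:
        return explain_connect_error(exc)
```

Calling `probe("<host>", 443)` returns one label per attempt. A run of `timeout` results points at a filter on the path, while `refused` means the host itself answered.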

Symptom → root cause

| Symptom | Usual cause | What to do |
| --- | --- | --- |
| Intermittent timeouts | Packet loss or session limits | Run `mtr` for 5+ minutes. Check conntrack table size and NIC error counters. |
| Slow but works | MTU, congestion or DNS | Try `ping -M do -s 1472 <dst>` (1472 bytes of payload + 28 bytes of ICMP and IPv4 headers = 1500). A "message too long" or "frag needed" error means the path MTU is below 1500. Check DNS query time with `dig +stats`. |
| DNS resolves wrong IP | Stale cache or split-horizon | `resolvectl flush-caches` (or `systemd-resolve --flush-caches` on older systems). Compare `dig @1.1.1.1 <name>` with `dig @<local-resolver> <name>`. |
| TLS handshake fails | Cert, SNI or TLS version | `openssl s_client -connect host:443 -servername host`. Check expiry, chain and accepted protocols. |
| Connection reset by peer | Firewall or app crash | RST mid-flow = something killed the session. Check stateful firewall idle timeouts and app-side OOM. |
| Asymmetric / one-way traffic | Routing + stateful FW | Reply path crosses a different firewall that has no session state → drop. Use `ip r get` from both ends. |
| Works locally, fails remotely | MTU, path MTU discovery | ICMP fragmentation-needed is blocked somewhere on the path. Clamp the TCP MSS, or allow ICMP type 3 code 4 so PMTUD can work. |
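
The `1472` in the MTU test is plain header arithmetic: the largest ICMP payload that fits an MTU is the MTU minus the IPv4 and ICMP headers. A small Python sketch of that calculation follows, assuming IPv4 with no IP or TCP options; the helper names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, without IP options
ICMP_HEADER = 8   # bytes, echo request header
TCP_HEADER = 20   # bytes, without TCP options


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest `ping -s` value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def tcp_mss_for_mtu(mtu: int) -> int:
    """Largest TCP segment payload for a given MTU, useful when clamping MSS."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1492, 1400):
        print(mtu, ping_payload_for_mtu(mtu), tcp_mss_for_mtu(mtu))
```

To find the actual path MTU, lower the `-s` value until `ping -M do` succeeds, then add 28 back.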


FAQ

What's the order I should troubleshoot network issues in?
Bottom-up: physical link → switching/ARP → routing → transport/firewall → DNS → application. Each layer depends on the one below. Most engineers waste hours debugging L7 when the real problem is a flapping interface or a full conntrack table.
Ping works but my app can't connect — what's wrong?
Firewalls match ICMP and TCP with separate rules, so a successful ping says nothing about a TCP port. Test the actual port with `nc -zv host port` or `curl -v`. Common causes: a stateful firewall blocking the port, the application listening on the wrong interface (check with `ss -tlnp`), or SELinux/AppArmor denying the bind.
How do I find packet loss between me and a remote host?
Use `mtr -rwzbc 100 <host>`. It sends 100 probes and reports per-hop loss and latency. Look for the first hop where loss starts and then continues on every later hop through to the destination; that's your problem. Loss at a single intermediate hop only (with no loss after) is usually ICMP rate-limiting on that router, not real loss.
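
The "loss that carries through to the destination" rule can be expressed as a short function. The sketch below is in Python and assumes you have already extracted per-hop loss percentages from an `mtr` report into a list. The function name, the 1% threshold and the sample numbers in the usage note are illustrative assumptions, not `mtr` output.

```python
from typing import List, Optional


def first_real_loss_hop(loss_by_hop: List[float], threshold: float = 1.0) -> Optional[int]:
    """Return the 1-based hop where loss begins and persists to the last hop.

    Walks backwards from the destination: as long as each hop shows loss at
    or above the threshold, the culprit moves one hop earlier. Loss at an
    intermediate hop that later hops do not share is ignored (likely ICMP
    rate-limiting). Returns None when the destination itself shows no loss.
    """
    culprit = None
    for hop in range(len(loss_by_hop), 0, -1):
        if loss_by_hop[hop - 1] >= threshold:
            culprit = hop
        else:
            break
    return culprit
```

For example, with made-up values, `first_real_loss_hop([0, 0, 10, 12, 11])` points at hop 3, while `first_real_loss_hop([0, 0, 40, 0, 0])` returns `None` because the loss does not reach the destination.
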
What does 'no route to host' mean?
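The connect failed with EHOSTUNREACH: the destination could not be reached. Usually ARP for the destination or the gateway got no answer, or a router or firewall on the path returned an ICMP host-unreachable. When the kernel has no matching route at all, you typically see "Network is unreachable" instead. Run `ip r get <destination>` to see which route would be used, then `ip neigh` to confirm the next hop resolved. Typical causes: a missing route, a gateway that is down, or an interface in a state that disqualifies it (no carrier, no IP).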
Why does my connection work for a while then drop?
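Three usual suspects: a stateful firewall idle timeout (often around 1 hour for TCP, though defaults vary by vendor), the conntrack table filling up (compare `cat /proc/sys/net/netfilter/nf_conntrack_count` with `nf_conntrack_max` in the same directory), or TCP keepalive disabled while a NAT in the path drops idle sessions. Enable keepalive in the app, or lower its interval below the shortest idle timeout on the path.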
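
The conntrack check is a ratio of two counters, so it is easy to script. A minimal Python sketch follows, assuming a Linux host with the `nf_conntrack` module loaded (otherwise the files do not exist). The 80% warning level and the function names are arbitrary choices for illustration.

```python
from pathlib import Path

PROC_DIR = Path("/proc/sys/net/netfilter")


def usage_ratio(count_text: str, max_text: str) -> float:
    """Turn the raw contents of nf_conntrack_count / nf_conntrack_max into a ratio."""
    count = int(count_text.strip())
    limit = int(max_text.strip())
    if limit <= 0:
        raise ValueError("nf_conntrack_max must be positive")
    return count / limit


def verdict(ratio: float, warn_at: float = 0.8) -> str:
    """Classify table usage; warn_at is an assumed threshold, tune it to taste."""
    if ratio >= 1.0:
        return "full"
    if ratio >= warn_at:
        return "near-full"
    return "ok"


def read_conntrack_usage(proc_dir: Path = PROC_DIR) -> float:
    """Read both counters from /proc and return current usage as a ratio."""
    return usage_ratio(
        (proc_dir / "nf_conntrack_count").read_text(),
        (proc_dir / "nf_conntrack_max").read_text(),
    )
```

On a live box, `verdict(read_conntrack_usage())` gives a one-word answer. A `full` table means new connections are being dropped.
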
What's the fastest way to know if it's DNS?
Ping the destination by IP. If IP works and name doesn't, it's DNS. Then `dig name` against your resolver and against `1.1.1.1` to isolate local vs upstream. `dig +trace` walks the delegation chain when something deeper is broken.
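
The decision steps in that answer can be written down as a small function. This Python sketch assumes you have already collected three facts by hand: whether the IP answers, and the address sets returned by your local resolver and by a public one such as `1.1.1.1`. The function name and the verdict labels are hypothetical, and the addresses in the usage note are documentation ranges, not real answers.

```python
from typing import Set


def dns_verdict(ip_reachable: bool, local_answers: Set[str], public_answers: Set[str]) -> str:
    """Combine the three checks into one triage label."""
    if not ip_reachable:
        return "check-lower-layers"   # not a DNS question yet: fix L1-L4 first
    if not local_answers and not public_answers:
        return "upstream-or-zone"     # nobody resolves it: run `dig +trace`
    if not local_answers:
        return "local-resolver"       # public works, local doesn't
    if not (local_answers & public_answers):
        return "answers-differ"       # stale cache or split-horizon
    return "dns-ok"
```

For example, `dns_verdict(True, {"192.0.2.10"}, {"198.51.100.7"})` returns `"answers-differ"`, which is the cue to flush the local cache or check for a split-horizon zone.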
