Debugging as a Discipline

The worst debugging sessions I have had shared a common pattern: I was thrashing. Changing things semi-randomly, hoping something would work, losing track of what I had already tried. The best ones felt almost meditative — a calm, systematic narrowing of possibilities.

The difference was not the complexity of the bug. It was whether I had a process.

The core loop

Good debugging is hypothesis-driven:

1. Observe the actual behaviour precisely
2. Form a hypothesis about the cause
3. Test the hypothesis with a minimal change or reproduction
4. Update your model based on what you observe

Most debugging goes wrong at step 2. People jump from observation directly to changing code, without forming an explicit hypothesis. Without a hypothesis, you cannot know what the result of your change is telling you.

Reproducing the bug

Before you can fix a bug, you need to reproduce it reliably. If you cannot reproduce it, you cannot know when you've fixed it.

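```bash
# Isolate: does the bug reproduce in a fresh environment?
# (git stash sets uncommitted changes aside; restore them later with git stash pop.
#  npm ci reinstalls node_modules exactly as package-lock.json specifies.)
git stash && npm ci && npm run test

# Bisect: which commit introduced the regression?
git bisect start
git bisect bad HEAD
git bisect good v1.2.0
# git bisect now checks out commits for you to test; mark each one with
# `git bisect good` or `git bisect bad` until it reports the first bad commit,
# then return to your original branch:
git bisect reset
```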
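The manual good/bad loop can be automated with `git bisect run`, which runs a command at each step and treats exit code 0 as good, 125 as "skip this commit", and any other code from 1 to 127 as bad. The script below is a self-contained sketch: it builds a throwaway repository in a temporary directory, plants a hypothetical regression in the sixth of eight commits, and lets bisect find it. The `check.sh` test, the `value.txt` file and the reuse of the `v1.2.0` tag name are illustrative assumptions; in a real project the command would typically be your test suite.

```bash
#!/usr/bin/env bash
# Sketch: automate git bisect against a throwaway repository.
# check.sh, value.txt and the v1.2.0 tag are hypothetical demo data.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

# The "test suite": exits 0 (good) while value.txt says ok, 1 (bad) otherwise.
printf '#!/bin/sh\ngrep -qx ok value.txt\n' > check.sh
chmod +x check.sh

# Eight commits; the regression lands in commit 6 and stays.
for i in 1 2 3 4 5 6 7 8; do
  if [ "$i" -ge 6 ]; then echo broken > value.txt; else echo ok > value.txt; fi
  echo "$i" > counter.txt
  git add -A
  git commit -q -m "commit $i"
done
git tag v1.2.0 HEAD~7   # commit 1, known good

git bisect start
git bisect bad HEAD
git bisect good v1.2.0
git bisect run ./check.sh   # checks out and tests each midpoint automatically

# When bisect finishes, refs/bisect/bad points at the first bad commit.
first_bad_subject=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset
echo "first bad commit: $first_bad_subject"
```

With eight commits, bisect needs only about three test runs to land on "commit 6"; the saving grows with history length, since the number of runs is roughly log2 of the number of commits in range.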

A minimal reproduction is the goal: the smallest possible input that triggers the problem. Strip out everything unrelated to the failure.
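Stripping out unrelated input can also be mechanised. Here is a minimal sketch of greedy line-by-line reduction (a simplified cousin of delta debugging) in bash: it deletes one line at a time and keeps each deletion whenever the failure still reproduces, repeating until a full pass removes nothing. `still_fails` and `minimize` are hypothetical helpers; in practice `still_fails` would run your program on the candidate file and check for the specific error you are chasing.

```bash
#!/usr/bin/env bash
# Greedy line-by-line reduction of a failing input file (in place).

# Hypothetical failure predicate: the "bug" triggers whenever the input
# contains both a line with "foo" and a line with "bar". Replace this with
# a command that runs your program and checks for the specific failure.
still_fails() {
  grep -q foo "$1" && grep -q bar "$1"
}

minimize() {
  local input=$1 tmp changed=1 i n
  tmp=$(mktemp)
  # Repeat full passes until a pass removes nothing.
  while [ "$changed" -eq 1 ]; do
    changed=0
    n=$(( $(wc -l < "$input") ))
    i=1
    while [ "$i" -le "$n" ]; do
      sed "${i}d" "$input" > "$tmp"   # candidate: input without line i
      if still_fails "$tmp"; then
        cp "$tmp" "$input"            # keep the smaller input
        n=$(( n - 1 ))
        changed=1
      else
        i=$(( i + 1 ))                # line i is needed; move on
      fi
    done
  done
  rm -f "$tmp"
}

# Usage: shrink a five-line input down to the lines that matter.
demo=$(mktemp)
printf 'a\nfoo\nb\nbar\nc\n' > "$demo"
minimize "$demo"
cat "$demo"   # prints foo and bar
```

Line granularity is a deliberate simplification: it works well for config files, logs and line-oriented data, while structured inputs usually shrink further with a reducer that understands their syntax.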

Reading the error

An error message is a clue, not an answer. Read it carefully:

What is the exact error?
What file and line does the stack trace point to?
Is the error in your code or in a dependency?
What was the state of the system at the moment of failure?

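The stack trace is a call history. Read it bottom-up to follow execution from where it started to where it failed: in most runtimes, including Java and Node.js, the bottom frame is the entry point and the top frame is the failure. Python prints frames in the opposite order (its tracebacks are headed "Traceback (most recent call last)"), so there the failure is at the bottom.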
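Bash can print its own call stack, which makes the ordering easy to see. In this sketch, `print_stack` walks bash's built-in `FUNCNAME`, `BASH_SOURCE` and `BASH_LINENO` arrays and prints frames in the same order Java and Node.js use: the frame where the problem surfaced first, the outermost caller last. The functions `parse_config` and `load_app` are hypothetical.

```bash
#!/usr/bin/env bash
# Print the current bash call stack, innermost frame first.
print_stack() {
  local i
  # Start at 1 to skip print_stack's own frame.
  for (( i = 1; i < ${#FUNCNAME[@]}; i++ )); do
    # BASH_LINENO[i-1] is the line in BASH_SOURCE[i] where FUNCNAME[i-1] was called.
    echo "  at ${FUNCNAME[$i]} (${BASH_SOURCE[$i]:-?}:${BASH_LINENO[$((i - 1))]})"
  done
}

parse_config() { echo "error: bad config"; print_stack; }
load_app()     { parse_config; }

load_app
```

Run as a script, this prints the error, then a frame for `parse_config` (where it failed), then `load_app`, and finally `main`, bash's name for the script's top level, at the bottom.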

When to ask for help

Wait too long and you waste time. Ask too soon and you miss the learning.

A useful heuristic: if you cannot make progress in 20–30 minutes of focused effort, stop and write down:

What you know about the problem
What you have tried and what each attempt revealed
Your current best hypothesis

Writing this down often reveals what you are missing. If it does not, share the write-up when you ask for help — it makes you much easier to help.

The log you wish you had

When a bug appears in production and you cannot reproduce it locally, you wish you had better logging. The places worth logging:

Entry and exit of critical functions (with inputs and outputs)
External calls (HTTP, DB, file system) including latency
State transitions in stateful systems
Unexpected but non-fatal conditions: the branch you assumed would almost never run, such as if (rarely_true_thing)
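A minimal sketch of logging an external call with its latency, assuming bash and GNU `date` (the `%N` nanoseconds format is a GNU extension and is not available in BSD or macOS `date`). `log_call` is a hypothetical helper: it wraps any command, writes key=value lines to stderr on entry and exit, and passes the command's exit status through unchanged.

```bash
#!/usr/bin/env bash
# log_call: hypothetical wrapper that logs entry, exit status and latency
# of an external command as key=value lines on stderr.
# Assumes GNU date (date +%s%N prints nanoseconds since the epoch).
log_call() {
  local start end status
  start=$(date +%s%N)
  echo "event=call_start cmd=\"$*\"" >&2
  # Capture the status without tripping `set -e` in the caller.
  "$@" && status=0 || status=$?
  end=$(date +%s%N)
  echo "event=call_end cmd=\"$*\" status=$status latency_ms=$(( (end - start) / 1000000 ))" >&2
  return "$status"
}

# Usage: wrap a slow external command and see its latency in the log.
log_call sleep 0.2
```

Writing one key=value pair per field keeps the lines greppable by hand and easy for log tooling to parse, and sending them to stderr leaves the wrapped command's stdout untouched.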

Over-logging is a real problem (noise, cost, performance). But most codebases I have seen are under-logged in the places that matter.

Debugging is a skill that improves with deliberate practice. The deliberate part is key: not just fixing bugs, but noticing how you fixed them and building a more reliable process for next time.