Using AI for Debugging: My Real-World Approach
Introduction
Three weeks ago, I was debugging a production issue that had me stumped. API timeouts, but only for about 30% of requests. No pattern I could see. Stack Overflow had nothing. I'd been staring at logs for two hours.
Then I copied everything into Claude and had it solved in 15 minutes.
Not because AI magically knew the answer (it didn't). But because it asked the right questions, generated hypotheses I hadn't considered, and helped me think systematically instead of randomly trying things.
That's what AI is actually good for in debugging: being a tireless partner who doesn't get frustrated, doesn't make assumptions, and helps you think through problems methodically.
Here's how I use it—the actual techniques and prompts that work.
When I Reach for AI (And When I Don't)
Not every bug needs AI. Here's when I use it:
Perfect for AI
Mystery errors with cryptic stack traces. Those errors where you Google the message and get nothing useful? AI has seen millions of these and can often explain what's actually happening.
Intermittent issues I can't reproduce. When a bug happens randomly in production but never locally, AI helps generate hypotheses about environmental differences.
Complex async/timing issues. Race conditions, promise chains gone wrong, event loop mysteries—AI is great at spotting these patterns.
Performance problems with no obvious bottleneck. When everything "looks fine" but it's slow, AI helps profile systematically.
Where I Debug Old-School
Typos and syntax errors. Just read the error message. Faster than asking AI.
Simple logic bugs. If I can see the bug by looking at the code, I fix it. No AI needed.
Domain-specific business logic. AI doesn't know my application's business rules. When a calculation is wrong, I need to understand the requirement, not ask AI.
The Framework That Actually Works
Here's my process when I'm stuck on a bug. I've used this dozens of times and it works.
Step 1: Capture Everything Relevant
AI is only as good as the context you give it. Here's what I include:
I'm debugging [brief problem description].
ERROR:
[Full error message and stack trace]
CODE:
[The function that's failing - keep it focused, not the entire file]
CONTEXT:
- Tech stack: [e.g., "Next.js 14, PostgreSQL, deployed on Vercel"]
- When it happens: [e.g., "30% of API requests, no clear pattern"]
- Environment: [e.g., "Only in production, not local or staging"]
- Recent changes: [e.g., "Started after adding caching layer yesterday"]
WHAT I'VE TRIED:
- [Thing 1 - didn't work]
- [Thing 2 - didn't work]
- [Thing 3 - made it worse]
What are the most likely causes, ranked by probability?
The "what I've tried" section is crucial—it stops AI from suggesting things you've already ruled out.
Step 2: Get Ranked Hypotheses
I don't want AI to just guess. I want it to generate multiple hypotheses ranked by likelihood:
Based on this error pattern, list 5 possible causes ranked by probability.
For each:
1. Why it might be the cause
2. How to verify it
3. How to fix it if confirmed
Start with the most likely.
In my experience, the real cause usually shows up in the top two or three hypotheses. I work through them systematically.
Step 3: Verify Systematically
Here's where AI really helps—it suggests how to verify each hypothesis:
Let's test hypothesis #1. What logging or instrumentation should I add
to confirm or rule this out?
Give me copy-pasteable code I can add to verify.
AI generates actual code I can drop in to test the hypothesis. Much faster than figuring it out myself.
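To give a sense of what comes back, here's the shape of instrumentation it typically suggests. This is a hypothetical sketch for a "the upstream API call is intermittently slow" hypothesis, not output from my project; fetchUserProfile and the URL are placeholders:

// Hypothesis: the upstream API call is the intermittent slow path.
// Wrap the suspect call with timing and log every invocation, then watch the logs.
async function fetchUserProfile(userId) {
  const start = Date.now();
  try {
    const response = await fetch(`https://api.example.com/users/${userId}`);
    return await response.json();
  } finally {
    console.log(`[verify] fetchUserProfile(${userId}) took ${Date.now() - start}ms`);
  }
}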
Step 4: Understand the Fix
When we find the cause, I never just copy the fix. I make AI explain it:
Explain:
1. Why this bug happened
2. Why your fix works
3. How to prevent this in the future
4. Are there other places in my codebase that might have the same issue?
That last question is gold—often I have the same bug pattern elsewhere.
The Real Debugging Prompts I Use
These are my actual go-to prompts. Copy them and adjust for your situation:
The "I'm Completely Stuck" Prompt:
I've been debugging this for [time] and I'm stuck.
PROBLEM: [One sentence description]
ERROR:
[Paste full error and stack trace]
CODE:
[Paste the relevant function/component]
WHAT I KNOW:
- Happens: [when/where]
- Doesn't happen: [when/where]
- Started: [when - e.g., "after deploying X" or "randomly"]
WHAT I'VE TRIED:
- [Thing 1]
- [Thing 2]
- [Thing 3]
Generate 5 hypotheses for what could be wrong, ranked by likelihood.
Include how to verify each one.
The "It Works Locally But Not in Production" Prompt:
This code works perfectly locally but fails in production.
SETUP:
- Local: [your local setup]
- Production: [your production setup]
ERROR IN PRODUCTION:
[Paste error]
CODE:
[Paste code]
What environmental differences should I check?
Give me a systematic checklist to work through.
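One thing I usually add alongside that checklist is a startup log of the configuration each environment actually resolved, so I can diff local against production. A minimal sketch, assuming a Node app; the variable names are whatever your own setup uses:

// Print the non-secret config the process actually sees, then compare environments.
console.log(JSON.stringify({
  nodeVersion: process.version,
  nodeEnv: process.env.NODE_ENV,
  timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
  dbHost: process.env.DATABASE_URL && new URL(process.env.DATABASE_URL).host, // host only, never credentials
}, null, 2));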
The "Performance Mystery" Prompt:
This [endpoint/function/page] is slow but I can't find the bottleneck.
CURRENT PERFORMANCE:
- Response time: [e.g., "800ms avg, 2s p95"]
- Target: [e.g., "< 200ms"]
CODE:
[Paste code]
WHAT I'VE CHECKED:
- Database queries: [findings]
- External API calls: [findings]
- Profiling results: [if any]
Help me systematically profile this. What instrumentation should I add?
Give me copy-pasteable code to identify the slow parts.
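The instrumentation it hands back is usually per-stage timing along these lines. A sketch, not literal output; db.query and enrichWithExternalData stand in for whatever your handler actually does:

// Time each stage of the handler separately so the slow part shows up in the logs.
app.get('/api/report', async (req, res) => {
  const t0 = Date.now();
  const rows = await db.query('SELECT * FROM reports WHERE user_id = $1', [req.query.userId]);
  const t1 = Date.now();
  const enriched = await enrichWithExternalData(rows); // placeholder for the external call
  const t2 = Date.now();
  res.json(enriched);
  console.log(`[profile] db=${t1 - t0}ms external=${t2 - t1}ms total=${t2 - t0}ms`);
});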
The "Race Condition" Prompt:
I have a bug that only happens sometimes, unpredictably.
I suspect a race condition or timing issue.
SYMPTOM:
[What happens when it fails]
CODE:
[Paste async code / event handlers / state management]
FREQUENCY:
[How often it fails - e.g., "1 in 20 times"]
Help me:
1. Identify potential race conditions in this code
2. Make this bug more reproducible so I can debug it
3. Fix the race condition once confirmed
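For concreteness, the pattern this prompt catches most often for me is a read-modify-write across an await: shared state is read, something is awaited, and the write lands on a stale value. A made-up minimal example:

let counter = 0; // shared state

async function increment() {
  const current = counter;                                      // read
  await new Promise((r) => setTimeout(r, Math.random() * 50));  // stand-in for a DB write
  counter = current + 1;                                        // write based on a possibly stale read
}

// Making it reproducible: force the calls to overlap instead of waiting for real traffic.
Promise.all([increment(), increment()]).then(() => {
  console.log(counter); // expected 2, frequently 1
});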
The "Memory Leak" Prompt:
My application's memory usage keeps growing over time.
METRICS:
- Starts at: [e.g., "120MB"]
- After 1 hour: [e.g., "450MB"]
- After 4 hours: [e.g., "1.2GB, then crashes"]
CODE:
[Paste code that runs repeatedly - event listeners, intervals, API handlers]
ENVIRONMENT:
[Node version, framework, etc.]
What are the most likely sources of a memory leak here?
How can I confirm and fix each one?
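The most common culprit it finds in my code is a timer or listener created per request and never cleaned up. A sketch of that pattern and its fix; the route is made up:

app.get('/api/stream/:id', (req, res) => {
  // Leak: without the cleanup below, every request leaves a live interval
  // (and everything its closure captures) behind for the life of the process.
  const timer = setInterval(() => res.write('ping\n'), 1000);

  // Fix: clear the interval when the client disconnects.
  req.on('close', () => clearInterval(timer));
});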
The "Explain This Error" Prompt:
I'm getting this error and I don't understand what it means:
[Paste error message]
Context:
[Paste relevant code]
Explain:
1. What this error actually means in plain English
2. What's causing it in my code
3. How to fix it
4. How to prevent it in the future
Advanced Technique: The Rubber Duck on Steroids
You know rubber duck debugging? Where you explain the problem aloud and often solve it yourself?
AI supercharges this because it asks follow-up questions:
You: "Let me walk through this step by step. I fetch user data when
the component mounts..."
AI: "Okay. What happens if the component unmounts before the fetch
completes?"
You: "It would... oh. I'm not canceling the fetch. So I'm calling
setState on an unmounted component."
AI: "Exactly. Want me to show you how to clean up with AbortController?"
The AI asks the questions a rubber duck can't. And often, just explaining the problem to AI helps you spot the issue yourself.
I use this all the time. Sometimes I type out the whole problem and realize the answer before AI even responds.
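For reference, the cleanup being offered in that exchange looks roughly like this, assuming a React effect with existing setUser and setError state setters:

useEffect(() => {
  const controller = new AbortController();

  fetch(`/api/users/${userId}`, { signal: controller.signal })
    .then((res) => res.json())
    .then((data) => setUser(data))
    .catch((err) => {
      if (err.name !== 'AbortError') setError(err); // aborts on unmount are expected
    });

  // Cancel the in-flight request when the component unmounts (or userId changes).
  return () => controller.abort();
}, [userId]);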
Real Example: The Connection Pool Mystery
Let me walk you through an actual bug I solved with AI help.
The Problem:
API returning 500 errors intermittently. About 30% of requests.
Error: Connection timeout
What I Tried:
- Increased connection pool size (didn't help)
- Checked database performance (CPU < 30%, plenty of capacity)
- Looked for slow queries (all fast)
What I Asked AI:
API endpoint returns 500 error intermittently (30% of requests).
ERROR:
Error: Connection timeout
at Timeout._onTimeout (/app/db/pool.js:45:12)
CODE:
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,
  connectionTimeoutMillis: 2000
});

app.get('/api/users/:id', async (req, res) => {
  const client = await pool.connect();
  const result = await client.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
  res.json(result.rows[0]);
});
CONTEXT:
- Node.js 18, PostgreSQL 14
- ~100 req/sec when errors occur
- Only happens in production
- Started after yesterday's deployment (added a new feature, no DB changes)
TRIED:
- Increased pool size to 20 (was 10)
- Checked DB performance (fine)
- Checked query performance (< 5ms)
What's most likely wrong?
AI's Response:
Most likely cause (90%): Connection leak.
Your code gets a client from the pool but never releases it.
There is no client.release() anywhere in the handler.
Each request takes a connection and never returns it. Eventually all 20
connections are taken, and new requests time out.
Fix:
app.get('/api/users/:id', async (req, res) => {
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
    res.json(result.rows[0]);
  } finally {
    client.release(); // Always release, even on error
  }
});
To verify this is the issue, add pool monitoring:
[AI provided monitoring code]
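I won't reproduce its exact output, but the monitoring was essentially this, using the counters node-postgres already exposes on the pool:

// If totalCount pins at max while waitingCount keeps growing, connections are
// being checked out and never released: a leak, not a slow database.
setInterval(() => {
  console.log(
    `[pool] total=${pool.totalCount} idle=${pool.idleCount} waiting=${pool.waitingCount}`
  );
}, 5000);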
That was it. Missing client.release(). Fixed in 2 minutes once AI spotted it.
What I Learned:
- The error message was misleading—it said "timeout" but the real issue was exhausted connections
- AI spotted the missing cleanup immediately (I'd been staring at that code for an hour)
- The "started after deployment" clue was a red herring—the bug was always there, deployment just increased traffic enough to trigger it
What AI Can't Do (Yet)
Let me be honest about limitations I've hit:
AI doesn't know your business logic. If a calculation is wrong, you need to verify it against requirements. AI can't tell you if 10% discount should apply before or after tax.
AI can't reproduce bugs. It can suggest how to make bugs reproducible, but you still have to run the code and observe the behavior.
AI sometimes confidently suggests wrong solutions. Always test AI's suggestions. I've had it recommend "fixes" that made things worse.
AI doesn't have access to your production logs. You have to copy relevant parts to it. And be careful not to include sensitive data.
My Debugging Checklist
When I'm stuck, I work through this:
First 5 Minutes: Try Without AI
1. Read the error message carefully
2. Check recent changes (git log)
3. Add logging around the failing code
4. Google the exact error message
If Still Stuck: Bring in AI
5. Collect context (error, code, environment, what I tried)
6. Ask AI for ranked hypotheses
7. Systematically verify each hypothesis
8. Understand the fix before implementing
9. Add tests to prevent regression
After Solving:
10. Document the bug and solution
11. Check for similar issues elsewhere in codebase
12. Update team docs if it's a common pattern
What Changed from 2024 to 2025
The debugging experience got better in a few key ways:
- Larger context windows: I can now paste entire stack traces and multiple files without hitting limits
- Better at suggesting instrumentation: AI generates better logging/profiling code
- Understands modern stacks better: Much better at Next.js, serverless, edge functions
- Faster responses: AI suggestions come in seconds, not minutes
But the core limitation remains: AI helps you think through problems, it doesn't magically know the answer.
Your Action Plan
Want to get better at AI-assisted debugging?
This Week:
Next time you're stuck on a bug for more than 20 minutes, try the framework:
- Collect complete context
- Ask for ranked hypotheses
- Verify systematically
- Understand the fix
Save your prompts. When you find a prompt that works well, save it in a file you can reference later.
Keep a debugging log. Note what worked, what AI got wrong, and what you learned.
Within a Month:
Build your personal collection of debugging prompts for common scenarios:
- "It works locally but not in production"
- "Performance is slow"
- "Intermittent errors"
- "Memory leak"
You'll get faster every time.
Related Reading
- Integrating AI into Your Development Workflow - My complete AI dev workflow
- AI-Powered Code Review: Best Practices for Teams - Using AI in code review
- Developer's Guide to Prompt Engineering - Writing better prompts for code
Final Thoughts
AI won't solve bugs for you. But it's an exceptional debugging partner.
It's tireless. It doesn't get frustrated. It suggests approaches you might not consider. It helps you think systematically instead of randomly trying things.
The developers who debug fastest with AI aren't the ones who blindly copy AI suggestions. They're the ones who use AI to generate hypotheses, verify them methodically, and understand root causes.
Start with one prompt. Use it next time you're stuck. Refine it based on what works. Build your debugging toolkit over time.
Your impossible bugs are still solvable. AI just helps you get there faster.