Support

Operations keeping production running

What is Support?

Support is the operational work that keeps production systems running smoothly. This includes monitoring, debugging, incident response, and optimization.

Support maintains Solutions, troubleshoots Systems, debugs Software, and monitors Services.

Runbook: Database Queries Suddenly Slow

Symptom

API responses taking 5-30 seconds. Users reporting "loading forever" on dashboard. Vercel Analytics showing increased database timeout errors.

Severity: High

Users can't access critical features

Time to Fix: 30-60 minutes (diagnosis + index creation)

Diagnosis Steps

  1. Check Supabase Dashboard
    # Navigate to Supabase Dashboard
    https://supabase.com/dashboard/project/YOUR_PROJECT/logs
    
    # Filter for slow queries (>1s execution time)
    Filter: "duration > 1000"
  2. Identify slow query pattern
    # Common culprits:
    - SELECT * FROM large_table WHERE unindexed_column = 'value'
    - JOINs without proper indexes
    - ORDER BY on unindexed columns
  3. Run EXPLAIN ANALYZE
    -- In Supabase SQL Editor
    EXPLAIN ANALYZE
    SELECT * FROM users
    WHERE email = 'test@example.com';
    
    -- Look for:
    -- "Seq Scan" = BAD (no index used)
    -- "Index Scan" = GOOD (index used)
  4. Check index usage
    -- See which indexes exist and are being used
    SELECT
      schemaname,
      tablename,
      indexname,
      idx_scan as times_used
    FROM pg_stat_user_indexes
    WHERE schemaname = 'public'
    ORDER BY idx_scan ASC;
    
    -- idx_scan = 0 means index never used (consider dropping)
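The Seq Scan vs Index Scan check in step 3 can be automated, e.g. in a CI script that runs EXPLAIN ANALYZE against a staging database. A minimal sketch of the plan-text analysis (the function names and shapes here are illustrative, not part of any library; fetching the plan text itself is left to your tooling):

```typescript
// Sketch: flag queries whose EXPLAIN ANALYZE output falls back to a
// sequential scan. Pure string analysis of the Postgres plan text.
interface PlanCheck {
  usesIndex: boolean;
  seqScanTables: string[];
}

function checkPlan(explainOutput: string): PlanCheck {
  // Postgres prints nodes like "Seq Scan on users" or
  // "Index Scan using idx_users_email on users".
  const seqScanTables: string[] = [];
  const re = /Seq Scan on (\w+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(explainOutput)) !== null) {
    seqScanTables.push(m[1]);
  }
  return {
    usesIndex: /Index (Only )?Scan using/.test(explainOutput),
    seqScanTables,
  };
}

// Example plan fragments:
const bad = "Seq Scan on users  (cost=0.00..458.00 rows=1)";
const good = "Index Scan using idx_users_email on users  (cost=0.29..8.30)";

console.log(checkPlan(bad).seqScanTables); // ["users"]
console.log(checkPlan(good).usesIndex);    // true
```

A check like this can fail the build when a hot-path query regresses to a sequential scan.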

Fix Steps

  1. Create missing index
    -- Example: Email lookup slow
    CREATE INDEX CONCURRENTLY idx_users_email
    ON users(email);
    
    -- CONCURRENTLY = no table lock during creation
    -- Safe for production with active traffic
  2. Verify index creation
    -- Check index exists
    SELECT indexname, indexdef
    FROM pg_indexes
    WHERE tablename = 'users'
    AND indexname = 'idx_users_email';
  3. Re-run slow query
    EXPLAIN ANALYZE
    SELECT * FROM users
    WHERE email = 'test@example.com';
    
    -- Should now show:
    -- "Index Scan using idx_users_email"
    -- Execution time: <10ms (was 5000ms+)
  4. Monitor production impact
    # Vercel Analytics (if using)
    - Check p95 response time improvement
    - Verify database timeout errors decreased
    
    # Supabase Dashboard
    - Confirm slow query count dropped

Prevention

  • Index planning: Add indexes for all WHERE, JOIN, and ORDER BY columns
  • Query testing: Run EXPLAIN ANALYZE on all queries before production
  • Monitoring: Set up Supabase alerts for queries >1s execution time
  • Regular audits: Review pg_stat_user_indexes monthly, drop unused indexes
  • Load testing: Test with production-scale data before deploy

Common Production Issues (Quick Reference)

Authentication Loops

Symptom: Users stuck redirecting between /login and /dashboard
Fix: Check middleware matcher excludes /login route
Code: export const config = { matcher: ['/dashboard/:path*'] }
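The matcher fix works because middleware no longer runs on /login at all, so it can never bounce a user back from there. The redirect decision itself can also be made loop-proof and unit-tested in isolation; a minimal sketch (the session flag stands in for your real auth lookup, e.g. a Supabase session check):

```typescript
// Sketch: loop-proof redirect decision. With the matcher above,
// middleware only runs on /dashboard/*, so /login is never intercepted.
function redirectFor(pathname: string, hasSession: boolean): string | null {
  // Belt-and-braces guard: even if the matcher widens later,
  // never redirect away from /login itself.
  if (pathname.startsWith("/login")) return null;
  // Unauthenticated users on a protected route go to /login.
  if (pathname.startsWith("/dashboard") && !hasSession) return "/login";
  return null; // authenticated, or route not protected
}

console.log(redirectFor("/dashboard/settings", false)); // "/login"
console.log(redirectFor("/dashboard/settings", true));  // null
console.log(redirectFor("/login", false));              // null (no loop)
```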

Build Failures on Vercel

Symptom: "Type error: Cannot find module" during build
Fix: Run npm run build locally first, fix TypeScript errors
Prevention: Add npm run type-check to pre-commit hooks

API Rate Limit Exceeded

Symptom: 429 errors from Claude API, users can't get AI responses
Fix: Implement client-side rate limiting with Upstash Redis
Code: @upstash/ratelimit with sliding window
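In production the limiter should be Redis-backed (@upstash/ratelimit's sliding window) so all serverless instances share state. The semantics can be sketched with an in-memory sliding-window log; this is an illustration only, not a substitute for the Redis-backed version, since per-instance state does not survive across serverless invocations:

```typescript
// Sketch: sliding-window-log rate limiter (in-memory, per instance).
// Production should use @upstash/ratelimit with Redis instead.
class SlidingWindow {
  private hits = new Map<string, number[]>();
  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only hits inside the current window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over limit: caller should respond with HTTP 429
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}

// 3 requests per 10 seconds per user:
const limiter = new SlidingWindow(3, 10_000);
const t0 = 1_000_000;
console.log(limiter.allow("user-1", t0));          // true
console.log(limiter.allow("user-1", t0 + 1));      // true
console.log(limiter.allow("user-1", t0 + 2));      // true
console.log(limiter.allow("user-1", t0 + 3));      // false (return 429)
console.log(limiter.allow("user-1", t0 + 10_004)); // true (window slid)
```

Rejected requests do not count against the window, so a client hammering a 429 does not extend its own lockout.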

Environment Variables Not Loading

Symptom: process.env.SUPABASE_URL is undefined in production
Fix: Add to Vercel dashboard → Settings → Environment Variables
Gotcha: Redeploy after adding env vars (not automatic)
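A startup assertion turns this from a silent `undefined` at request time into a loud failure at boot. A minimal sketch; the variable names are examples, so match them to your project:

```typescript
// Sketch: fail fast at startup instead of reading `undefined` later.
// Variable names below are examples for a Supabase project.
const REQUIRED_ENV = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
] as const;

function assertEnv(env: Record<string, string | undefined>): string[] {
  const missing = REQUIRED_ENV.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return [...REQUIRED_ENV];
}

// Call once at app bootstrap:
// assertEnv(process.env);
```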

Essential Monitoring Setup

Vercel Analytics (alert: >10 errors/min; cost: $0-26/mo)
  • JavaScript errors
  • API failures
  • Performance issues

Vercel Analytics (alert: p95 > 1s; cost: $10-20/mo)
  • Response times (p50, p95, p99)
  • Function execution duration
  • Edge function errors

Supabase Logs (alert: >10 slow queries/min; cost: included in Pro, $25/mo)
  • Slow queries (>1s)
  • Failed auth attempts
  • Database connection pool

Uptime Robot (alert: any downtime; cost: $0 for 50 monitors)
  • Homepage availability
  • API endpoint health
  • SSL certificate expiry
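These thresholds are easiest to keep honest when encoded as data rather than dashboard clicks. A sketch that mirrors the thresholds above, for use in a custom health-check job (the metric names and shape are illustrative, not any vendor's API):

```typescript
// Sketch: the alert thresholds from the monitoring setup as one
// checkable definition. Metric field names are illustrative.
interface Metrics {
  errorsPerMin: number;
  p95Ms: number;
  slowQueriesPerMin: number;
  siteUp: boolean;
}

function firingAlerts(m: Metrics): string[] {
  const alerts: string[] = [];
  if (m.errorsPerMin > 10) alerts.push("error-rate");        // >10 errors/min
  if (m.p95Ms > 1000) alerts.push("latency");                // p95 > 1s
  if (m.slowQueriesPerMin > 10) alerts.push("slow-queries"); // >10 slow queries/min
  if (!m.siteUp) alerts.push("downtime");                    // any downtime
  return alerts;
}

console.log(
  firingAlerts({ errorsPerMin: 2, p95Ms: 1800, slowQueriesPerMin: 0, siteUp: true })
); // ["latency"]
```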

AI Coding Assistance Tools (Late 2025)

AI coding tools have matured significantly in 2025. Terminal-native agents like Claude Code achieve 72.5% on SWE-bench Verified, while multi-platform orchestrators like OpenAI Codex hit 74.9%. These tools handle production debugging, complex refactoring, and multi-file edits.

  • Best for terminal-native workflows: Claude Code, Gemini CLI
  • Best for multi-platform orchestration: OpenAI Codex
  • Best for speed-critical tasks: grok-code-fast-1 (92 tokens/sec)
  • Claude Code: 72.5% SWE-bench Verified; terminal; $20-200/mo; best for terminal-native workflows, MCP integrations
  • OpenAI Codex: 74.5-74.9% SWE-bench Verified; VS Code/IDE; $20-200/mo; best for multi-platform orchestration, agent swarms
  • Gemini CLI: SWE-bench N/A; terminal; free (1K requests/day); best for open-source work, 1M token context, free tier
  • grok-code-fast-1: 70.8% SWE-bench Verified; API; $0.20/$1.50 per M tokens; best for speed-critical tasks (92 tokens/sec)

Claude Code: Terminal-Native Agent

72.5% SWE-bench Verified (Claude 3.7 Sonnet), terminal-native with MCP protocol for tool integrations. Agentic workflow with TodoWrite, MultiEdit, and specialized agents for complex tasks.

  • Strengths: Terminal integration, MCP tools (Supabase, GitHub, Playwright), multi-step reasoning
  • Use Cases: Production debugging, complex refactoring, full-stack development
  • Pricing: Pro ($20/mo) to Team ($200/mo), consumption-based API
  • Context: 200K tokens (Claude 3.7 Sonnet), effective codebase understanding

OpenAI Codex: Multi-Platform Orchestration

74.5-74.9% SWE-bench Verified (o1 and o3-mini models), multi-platform with VS Code, JetBrains, terminal. Agent orchestration with specialized sub-agents (architect, coder, tester).

  • Strengths: Platform flexibility, agent swarms, deep reasoning (o1/o3-mini)
  • Use Cases: Complex architecture, multi-component refactoring, team workflows
  • Pricing: Pro ($20/mo) to Teams ($200/mo), enhanced reasoning costs more
  • Context: 128K tokens (o1), 200K tokens (o3-mini)

Gemini CLI: Open-Source Terminal Agent

Apache 2.0 license, terminal-native with 1M token context window. Free tier: 1,000 requests/day. Best for open-source projects and developers who want full control.

  • Strengths: Free tier, 1M token context, open-source, self-hostable
  • Use Cases: Large codebases, cost-sensitive projects, customization needs
  • Pricing: Free (1K requests/day), paid tiers for production
  • Context: 1M tokens (best for large monorepos)

grok-code-fast-1: Speed-Optimized Model

70.8% SWE-bench Verified with 92 tokens/sec generation speed. API-first model for speed-critical applications. Best for real-time coding assistance and rapid iteration.

  • Strengths: 92 tokens/sec speed, cost-effective ($0.20/$1.50 per M tokens)
  • Use Cases: Real-time assistance, rapid prototyping, live coding sessions
  • Pricing: $0.20 input / $1.50 output per million tokens
  • Context: Standard context window, optimized for speed over depth

Selection Guide

  • Terminal-native workflow? Claude Code or Gemini CLI
  • IDE integration priority? OpenAI Codex (VS Code/JetBrains)
  • Budget constrained? Gemini CLI (free tier) or grok-code-fast-1 (API)
  • Complex multi-component work? OpenAI Codex (agent orchestration)
  • Speed-critical tasks? grok-code-fast-1 (92 tokens/sec)
  • MCP tool integrations? Claude Code (Supabase, GitHub, Playwright)

Incident Response Process

  1. Detect: Alert fires (Vercel Analytics, deployment logs, user reports)
  2. Assess Severity:
    • Critical: Complete outage, data loss risk → Fix immediately
    • High: Core features broken → Fix within 1 hour
    • Medium: Degraded performance → Fix within 4 hours
    • Low: Minor issues → Fix within 24 hours
  3. Communicate: Update status page, notify affected users if >5 min downtime
  4. Mitigate: Stop the bleeding (rollback, disable feature, scale resources)
  5. Diagnose: Use logs, metrics, and runbooks to find root cause
  6. Fix: Apply permanent solution, verify in production
  7. Document: Write post-mortem, update runbook, add monitoring
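The severity matrix in step 2 is worth encoding once so paging logic, dashboards, and post-mortems share a single definition. A minimal sketch, with deadlines expressed as minutes-to-fix taken from the matrix above:

```typescript
// Sketch: the severity matrix from step 2 as data.
// Deadlines are minutes from detection until a fix is due.
type Severity = "critical" | "high" | "medium" | "low";

const FIX_DEADLINE_MIN: Record<Severity, number> = {
  critical: 0,    // fix immediately
  high: 60,       // within 1 hour
  medium: 240,    // within 4 hours
  low: 1440,      // within 24 hours
};

function isOverdue(sev: Severity, minutesSinceDetect: number): boolean {
  return minutesSinceDetect > FIX_DEADLINE_MIN[sev];
}

console.log(isOverdue("high", 90));   // true (past the 1 hour budget)
console.log(isOverdue("medium", 90)); // false
```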

How Support Relates to Other Layers

  • Maintain Solutions: Keep AI chat working, users authenticated
  • Troubleshoot Systems: Debug auth loops, fix data flow issues
  • Debug Software: Fix TypeScript errors, optimize React renders
  • Monitor Services: Track costs, API usage, uptime