Support
Operations keeping production running
What is Support?
Support is the operational work that keeps production systems running smoothly. This includes monitoring, debugging, incident response, and optimization.
Support maintains Solutions, troubleshoots Systems, debugs Software, and monitors Services.
Runbook: Database Queries Suddenly Slow
Symptom
API responses taking 5-30 seconds. Users reporting "loading forever" on dashboard. Vercel Analytics showing increased database timeout errors.
Severity: High (users can't access critical features)
Time to Fix: 30-60 minutes (diagnosis + index creation)
Diagnosis Steps
- Check Supabase Dashboard
```
# Navigate to Supabase Dashboard
https://supabase.com/dashboard/project/YOUR_PROJECT/logs

# Filter for slow queries (>1s execution time)
Filter: "duration > 1000"
```
- Identify slow query pattern
```
# Common culprits:
- SELECT * FROM large_table WHERE unindexed_column = 'value'
- JOINs without proper indexes
- ORDER BY on unindexed columns
```
- Run EXPLAIN ANALYZE
```sql
-- In Supabase SQL Editor
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'test@example.com';

-- Look for:
-- "Seq Scan" = BAD (no index used)
-- "Index Scan" = GOOD (index used)
```
- Check index usage
```sql
-- See which indexes exist and how often they are used
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;

-- idx_scan = 0 means the index is never used (consider dropping it)
```
Fix Steps
- Create missing index
```sql
-- Example: Email lookup slow
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

-- CONCURRENTLY = no table lock during creation
-- Safe for production with active traffic
```
- Verify index creation
```sql
-- Check the index exists
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users' AND indexname = 'idx_users_email';
```
- Re-run slow query
```sql
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'test@example.com';

-- Should now show:
-- "Index Scan using idx_users_email"
-- Execution time: <10ms (was 5000ms+)
```
- Monitor production impact
```
# Vercel Analytics (if using) - Check p95 response time improvement
# Vercel Analytics - Verify database timeout errors decreased
# Supabase Dashboard - Confirm slow query count dropped
```
Prevention
- Index planning: Add indexes for all WHERE, JOIN, and ORDER BY columns
- Query testing: Run EXPLAIN ANALYZE on all queries before production
- Monitoring: Set up Supabase alerts for queries >1s execution time
- Regular audits: Review pg_stat_user_indexes monthly, drop unused indexes
- Load testing: Test with production-scale data before deploy
Common Production Issues (Quick Reference)
Authentication Loops
Symptom: Users stuck redirecting between /login and /dashboard
Fix: Check that the middleware matcher excludes the /login route
Code: `export const config = { matcher: ['/dashboard/:path*'] }`
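For context, a minimal sketch of middleware that avoids the loop. The cookie name and auth check are placeholders for your own session logic (a Supabase session cookie is assumed here); the essential part is that the matcher never covers /login, so the redirect cannot re-trigger the middleware.

```typescript
// middleware.ts (sketch; swap the placeholder auth check for your real one)
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

export function middleware(request: NextRequest) {
  // Placeholder: assumes a Supabase-style session cookie is present when logged in
  const hasSession = request.cookies.has("sb-access-token");
  if (!hasSession) {
    return NextResponse.redirect(new URL("/login", request.url));
  }
  return NextResponse.next();
}

export const config = {
  // Only /dashboard is protected; /login stays outside the matcher,
  // so unauthenticated users land there without looping
  matcher: ["/dashboard/:path*"],
};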
Build Failures on Vercel
Symptom: "Type error: Cannot find module" during build
Fix: Run `npm run build` locally first and fix TypeScript errors
Prevention: Add `npm run type-check` to pre-commit hooks
API Rate Limit Exceeded
Symptom: 429 errors from Claude API, users can't get AI responses
Fix: Implement server-side rate limiting with Upstash Redis
Code: `@upstash/ratelimit` with a sliding window (sketch below)
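A minimal sketch of the fix, assuming a Next.js route handler and Upstash credentials in environment variables. The route path, per-IP identifier, and 10-requests-per-10-seconds limit are illustrative choices, not requirements:

```typescript
// app/api/chat/route.ts (path is an assumption; adapt to your project)
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

// Sliding window: at most 10 requests per 10 seconds per identifier
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(), // reads UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN
  limiter: Ratelimit.slidingWindow(10, "10 s"),
});

export async function POST(request: Request) {
  // Rate-limit by client IP; use a user ID instead if you have auth
  const ip = request.headers.get("x-forwarded-for") ?? "anonymous";
  const { success } = await ratelimit.limit(ip);
  if (!success) {
    // Reject before spending a Claude API call
    return new Response("Too many requests", { status: 429 });
  }
  // ...call the Claude API and return its response here
  return new Response("ok");
}
```

Limiting by user ID rather than IP is usually fairer once you have auth, since many users can share one IP.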
Environment Variables Not Loading
Symptom: `process.env.SUPABASE_URL` is undefined in production
Fix: Add to Vercel dashboard → Settings → Environment Variables
Gotcha: Redeploy after adding env vars (not automatic)
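One way to catch this class of bug early is a fail-fast check at startup instead of discovering an undefined value mid-request. A minimal sketch, assuming the variable names used in this guide:

```typescript
// lib/env.ts: throw at build/boot if required variables are missing,
// instead of letting process.env.SUPABASE_URL come back undefined at runtime
const required = ["SUPABASE_URL", "SUPABASE_ANON_KEY"] as const;

for (const name of required) {
  if (!process.env[name]) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
}

export const env = {
  supabaseUrl: process.env.SUPABASE_URL!,
  supabaseAnonKey: process.env.SUPABASE_ANON_KEY!,
};
```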
Essential Monitoring Setup
| Tool | What to Monitor | Alert Threshold | Cost |
|---|---|---|---|
| Vercel Analytics | Error rate | >10 errors/min | $0-26/mo |
| Vercel Analytics | Response time | p95 > 1s | $10-20/mo |
| Supabase Logs | Slow queries | >10 slow queries/min | Included in Pro ($25/mo) |
| Uptime Robot | Uptime | Any downtime | $0 (50 monitors) |
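Uptime Robot needs an endpoint to ping. A minimal sketch of a health route, assuming a Next.js App Router project and a readable `users` table (both assumptions; adapt the path and query to your schema):

```typescript
// app/api/health/route.ts: cheap liveness check for Uptime Robot to ping
import { createClient } from "@supabase/supabase-js";

export async function GET() {
  try {
    const supabase = createClient(
      process.env.SUPABASE_URL!,
      process.env.SUPABASE_ANON_KEY!
    );
    // Tiny query to confirm the database is answering
    const { error } = await supabase.from("users").select("id").limit(1);
    if (error) throw error;
    return Response.json({ status: "ok" });
  } catch {
    // A 503 makes Uptime Robot flag the monitor as down
    return new Response(JSON.stringify({ status: "degraded" }), { status: 503 });
  }
}
```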
AI Coding Assistance Tools (Late 2025)
AI coding tools have matured significantly in 2025. Terminal-native agents like Claude Code achieve 72.5% on SWE-bench Verified, while multi-platform orchestrators like OpenAI Codex hit 74.9%. These tools handle production debugging, complex refactoring, and multi-file edits.
- Best for terminal-native workflows: Claude Code, Gemini CLI
- Best for multi-platform orchestration: OpenAI Codex
- Best for speed-critical tasks: grok-code-fast-1 (92 tokens/sec)
| Tool | SWE-bench | Interface | Pricing | Best For |
|---|---|---|---|---|
| Claude Code | 72.5% | Terminal | $20-200/mo | Terminal-native workflows, MCP integrations |
| OpenAI Codex | 74.5-74.9% | VS Code/IDE | $20-200/mo | Multi-platform orchestration, agent swarms |
| Gemini CLI | N/A | Terminal | Free (1K/day) | Open-source, 1M token context, free tier |
| grok-code-fast-1 | 70.8% | API | $0.20/$1.50 per M | Speed-critical tasks (92 tokens/sec) |
Claude Code: Terminal-Native Agent
72.5% SWE-bench Verified (Claude 3.7 Sonnet), terminal-native with MCP protocol for tool integrations. Agentic workflow with TodoWrite, MultiEdit, and specialized agents for complex tasks.
- Strengths: Terminal integration, MCP tools (Supabase, GitHub, Playwright), multi-step reasoning
- Use Cases: Production debugging, complex refactoring, full-stack development
- Pricing: Pro ($20/mo) to Team ($200/mo), consumption-based API
- Context: 200K tokens (Claude 3.7 Sonnet), effective codebase understanding
OpenAI Codex: Multi-Platform Orchestration
74.5-74.9% SWE-bench Verified (o1 and o3-mini models), multi-platform with VS Code, JetBrains, terminal. Agent orchestration with specialized sub-agents (architect, coder, tester).
- Strengths: Platform flexibility, agent swarms, deep reasoning (o1/o3-mini)
- Use Cases: Complex architecture, multi-component refactoring, team workflows
- Pricing: Pro ($20/mo) to Teams ($200/mo), enhanced reasoning costs more
- Context: 128K tokens (o1), 200K tokens (o3-mini)
Gemini CLI: Open-Source Terminal Agent
Apache 2.0 license, terminal-native with 1M token context window. Free tier: 1,000 requests/day. Best for open-source projects and developers who want full control.
- Strengths: Free tier, 1M token context, open-source, self-hostable
- Use Cases: Large codebases, cost-sensitive projects, customization needs
- Pricing: Free (1K requests/day), paid tiers for production
- Context: 1M tokens (best for large monorepos)
grok-code-fast-1: Speed-Optimized Model
70.8% SWE-bench Verified with 92 tokens/sec generation speed. API-first model for speed-critical applications. Best for real-time coding assistance and rapid iteration.
- Strengths: 92 tokens/sec speed, cost-effective ($0.20/$1.50 per M tokens)
- Use Cases: Real-time assistance, rapid prototyping, live coding sessions
- Pricing: $0.20 input / $1.50 output per million tokens
- Context: Standard context window, optimized for speed over depth
Selection Guide
- Terminal-native workflow? Claude Code or Gemini CLI
- IDE integration priority? OpenAI Codex (VS Code/JetBrains)
- Budget constrained? Gemini CLI (free tier) or grok-code-fast-1 (API)
- Complex multi-component work? OpenAI Codex (agent orchestration)
- Speed-critical tasks? grok-code-fast-1 (92 tokens/sec)
- MCP tool integrations? Claude Code (Supabase, GitHub, Playwright)
Incident Response Process
- Detect: Alert fires (Vercel Analytics, deployment logs, user reports)
- Assess Severity:
  - Critical: Complete outage, data loss risk → fix immediately
  - High: Core features broken → fix within 1 hour
  - Medium: Degraded performance → fix within 4 hours
  - Low: Minor issues → fix within 24 hours
- Communicate: Update the status page; notify affected users if downtime exceeds 5 minutes
- Mitigate: Stop the bleeding: roll back, disable the broken feature (see the kill-switch sketch after this list), or scale resources
- Diagnose: Use logs, metrics, and runbooks to find the root cause
- Fix: Apply a permanent solution and verify it in production
- Document: Write a post-mortem, update the runbook, add monitoring
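For the "disable feature" mitigation, a kill switch is often the fastest option. A hypothetical sketch using an env-var flag named DISABLE_AI_CHAT (the name is an assumption; also remember the gotcha above: Vercel env changes take effect only after a redeploy):

```typescript
// lib/flags.ts: hypothetical kill switch for a misbehaving feature
export function aiChatEnabled(): boolean {
  // Set DISABLE_AI_CHAT=true in the Vercel dashboard, then redeploy, to turn the feature off
  return process.env.DISABLE_AI_CHAT !== "true";
}

// Usage in a route handler:
// if (!aiChatEnabled()) {
//   return new Response("AI chat is temporarily unavailable", { status: 503 });
// }
```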