State Management for Multi-Agent Systems

State is memory. Memory is identity. Identity determines behavior. Managing state across multi-agent workflows is the foundation of reliable AI orchestration.

Strategic Foundation: State as Identity

Without temporal continuity, you have independent function calls, not an intelligent system. State creates memory. Memory creates identity. Identity enables coherent behavior over time.

Core insight: State is the foundation of temporal continuity. Multi-agent systems without proper state management are just random function calls. Read the full framework in Philosophy: The Industrialization of Intelligence.

The Core Principle

State determines what an agent can remember, and therefore what it can become.

In multi-agent orchestration, state management isn't about data storage—it's about creating temporal continuity across independent AI agents. Without it, you have stateless function calls. With it, you have systems that learn, adapt, and maintain coherent behavior.

State Management Patterns

1. Shared Context Pattern

All agents read from and write to shared state. Simplest pattern, but requires careful management to avoid conflicts.

Pattern: Shared Context Object

// Vercel AI SDK - Shared context across agents
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

interface SharedContext {
  conversationHistory: Array<{ role: string; content: string; agent?: string }>;
  userPreferences: Record<string, any>;
  sessionData: Record<string, any>;
  metadata: {
    userId: string;
    sessionId: string;
    startTime: number;
  };
}

class MultiAgentOrchestrator {
  private context: SharedContext;

  constructor(userId: string, sessionId: string) {
    this.context = {
      conversationHistory: [],
      userPreferences: {},
      sessionData: {},
      metadata: {
        userId,
        sessionId,
        startTime: Date.now(),
      },
    };
  }

  async runAgent(
    agentName: string,
    prompt: string,
    model: any = openai('gpt-4o-mini')
  ) {
    // Agent can read from shared context
    const contextualPrompt = `
Context:
- User ID: ${this.context.metadata.userId}
- Session started: ${new Date(this.context.metadata.startTime).toISOString()}
- Conversation history: ${this.context.conversationHistory.length} messages
- User preferences: ${JSON.stringify(this.context.userPreferences)}

Current task: ${prompt}
    `;

    const result = await generateText({
      model,
      prompt: contextualPrompt,
      maxTokens: 500,
    });

    // Agent writes to shared context
    this.context.conversationHistory.push(
      { role: 'user', content: prompt },
      { role: 'assistant', content: result.text, agent: agentName }
    );

    // Update session data based on agent output
    this.context.sessionData[`last_${agentName}_response`] = result.text;
    this.context.sessionData[`${agentName}_call_count`] =
      (this.context.sessionData[`${agentName}_call_count`] || 0) + 1;

    return result.text;
  }

  getContext() {
    return this.context;
  }

  updatePreferences(preferences: Record<string, any>) {
    this.context.userPreferences = {
      ...this.context.userPreferences,
      ...preferences,
    };
  }
}

// Usage
const orchestrator = new MultiAgentOrchestrator('user-123', 'session-456');

// Agent 1: Classify user intent
const intent = await orchestrator.runAgent(
  'classifier',
  'User says: "I need help with my billing"'
);

// Agent 2: Can access context from Agent 1
const response = await orchestrator.runAgent(
  'support',
  `Handle billing issue. Detected intent: ${intent}`
);

// Check accumulated context
console.log(orchestrator.getContext().conversationHistory);
// Shows full conversation across both agents

When to use: Sequential agent workflows where later agents need context from earlier agents. Keep the context object small (<10KB) to avoid token bloat.
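
One minimal way to enforce that size guideline (not part of the pattern above) is to measure the serialized context before each agent call. The sketch below assumes the SharedContext shape from Pattern 1; the 10KB threshold and the 4-characters-per-token conversion are rough heuristics, not SDK features.

// Rough context-size guard; assumes the SharedContext interface above.
// The 4-chars-per-token ratio is a coarse heuristic, not a real tokenizer.
function checkContextSize(context: SharedContext, maxBytes: number = 10 * 1024) {
  const serialized = JSON.stringify(context);
  const bytes = new TextEncoder().encode(serialized).length;
  const approxTokens = Math.ceil(serialized.length / 4);

  if (bytes > maxBytes) {
    console.warn(
      `Shared context is ${bytes} bytes (~${approxTokens} tokens); ` +
        `consider trimming conversationHistory or sessionData.`
    );
  }

  return { bytes, approxTokens };
}

// Call before building the contextual prompt inside runAgent
checkContextSize(orchestrator.getContext());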

2. Isolated State Pattern

Each agent maintains its own state. Pass only necessary data between agents. Better for parallel workflows and state isolation.

Pattern: Agent-Specific State with Explicit Passing

// Each agent manages own state, pass data explicitly
interface AgentState<T> {
  data: T;
  metadata: {
    agentName: string;
    lastUpdate: number;
    callCount: number;
  };
}

class StatefulAgent<T> {
  private state: AgentState<T>;

  constructor(agentName: string, initialData: T) {
    this.state = {
      data: initialData,
      metadata: {
        agentName,
        lastUpdate: Date.now(),
        callCount: 0,
      },
    };
  }

  async run(prompt: string, model: any, context?: Record<string, any>) {
    // Build prompt with agent-specific state + optional context
    const fullPrompt = `
Agent: ${this.state.metadata.agentName}
Agent state: ${JSON.stringify(this.state.data)}
Additional context: ${context ? JSON.stringify(context) : 'none'}

Task: ${prompt}
    `;

    const result = await generateText({
      model,
      prompt: fullPrompt,
      maxTokens: 500,
    });

    // Update agent-specific state
    this.state.metadata.lastUpdate = Date.now();
    this.state.metadata.callCount++;

    return {
      response: result.text,
      state: this.state,
    };
  }

  updateState(updates: Partial<T>) {
    this.state.data = { ...this.state.data, ...updates };
    this.state.metadata.lastUpdate = Date.now();
  }

  getState() {
    return this.state;
  }
}

// Usage: Parallel agents with isolated state
const classifierAgent = new StatefulAgent('classifier', {
  patterns: ['billing', 'technical', 'sales'],
  confidence_threshold: 0.7,
});

const routerAgent = new StatefulAgent('router', {
  routes: { billing: 'support-team-1', technical: 'support-team-2' },
  fallback: 'general-support',
});

// Run agents in parallel with isolated state
const [classification, routing] = await Promise.all([
  classifierAgent.run('Classify: user needs billing help', openai('gpt-4o-mini')),
  routerAgent.run('Determine best team', openai('gpt-4o-mini')),
]);

// Pass only necessary data between agents
const finalResponse = await routerAgent.run(
  `Route this request: ${classification.response}`,
  openai('gpt-4o-mini'),
  { classification: classification.response } // Explicit context passing
);

When to use: Parallel agent workflows, specialized agents that don't need full system context, or when you need strong state isolation for testing/debugging.

3. Persistent State Pattern (Database-Backed)

Store state in database for long-running workflows, cross-session memory, and production reliability.

Pattern: Supabase-Backed Agent State

// Production pattern: Database-backed state
import { createClient } from '@supabase/supabase-js';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

// Database schema:
// CREATE TABLE agent_state (
//   id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
//   session_id TEXT NOT NULL,
//   agent_name TEXT NOT NULL,
//   state JSONB NOT NULL,
//   created_at TIMESTAMPTZ DEFAULT NOW(),
//   updated_at TIMESTAMPTZ DEFAULT NOW()
// );
// CREATE INDEX idx_session_agent ON agent_state(session_id, agent_name);

class PersistentAgent {
  constructor(
    private sessionId: string,
    private agentName: string
  ) {}

  async loadState() {
    const { data, error } = await supabase
      .from('agent_state')
      .select('state')
      .eq('session_id', this.sessionId)
      .eq('agent_name', this.agentName)
      .single();

    // PGRST116 = row not found, which is fine for a fresh session
    if (error && error.code !== 'PGRST116') {
      throw error;
    }

    return data?.state || {};
  }

  async saveState(state: Record<string, any>) {
    const { error } = await supabase.from('agent_state').upsert(
      {
        session_id: this.sessionId,
        agent_name: this.agentName,
        state,
        updated_at: new Date().toISOString(),
      },
      {
        onConflict: 'session_id,agent_name',
      }
    );

    if (error) throw error;
  }

  async run(prompt: string, model: any = openai('gpt-4o-mini')) {
    // Load state from database
    const state = await this.loadState();

    const fullPrompt = `
Agent: ${this.agentName}
Persistent state: ${JSON.stringify(state)}
Task: ${prompt}
    `;

    const result = await generateText({
      model,
      prompt: fullPrompt,
      maxTokens: 500,
    });

    // Update and persist state
    const updatedState = {
      ...state,
      lastResponse: result.text,
      lastUpdate: new Date().toISOString(),
      callCount: (state.callCount || 0) + 1,
    };

    await this.saveState(updatedState);

    return result.text;
  }
}

// Usage: State persists across requests
const agent = new PersistentAgent('session-123', 'support-agent');

// Request 1
await agent.run('Help user with billing question');

// ... time passes, different server, different request ...

// Request 2: Agent remembers previous interaction
const agent2 = new PersistentAgent('session-123', 'support-agent');
await agent2.run('Follow up on previous billing issue');
// Agent has access to state from Request 1

Production necessity: For any multi-turn conversation or workflow spanning >5 minutes, database-backed state is required. In-memory state dies when your server restarts.

4. Vector Memory Pattern (Semantic Search)

Store conversation history and agent outputs as vector embeddings. Retrieve relevant context based on semantic similarity, not exact matching.

Pattern: Supabase pgvector for Agent Memory

// Semantic memory using vector embeddings
// (reuses the `supabase` client created in the previous pattern)
import { embed, generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Database schema:
// CREATE EXTENSION vector;
// CREATE TABLE agent_memory (
//   id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
//   session_id TEXT NOT NULL,
//   agent_name TEXT NOT NULL,
//   content TEXT NOT NULL,
//   embedding vector(1536),
//   metadata JSONB,
//   created_at TIMESTAMPTZ DEFAULT NOW()
// );
// CREATE INDEX ON agent_memory USING ivfflat (embedding vector_cosine_ops);

class VectorMemoryAgent {
  constructor(
    private sessionId: string,
    private agentName: string
  ) {}

  async storeMemory(content: string, metadata?: Record<string, any>) {
    // Generate embedding
    const { embedding } = await embed({
      model: openai.embedding('text-embedding-3-small'),
      value: content,
    });

    // Store in database with vector
    const { error } = await supabase.from('agent_memory').insert({
      session_id: this.sessionId,
      agent_name: this.agentName,
      content,
      embedding,
      metadata,
    });

    if (error) throw error;
  }

  async searchMemory(query: string, limit: number = 5) {
    // Generate query embedding
    const { embedding } = await embed({
      model: openai.embedding('text-embedding-3-small'),
      value: query,
    });

    // Semantic search using cosine similarity
    const { data, error } = await supabase.rpc('search_agent_memory', {
      query_embedding: embedding,
      match_session_id: this.sessionId,
      match_agent_name: this.agentName,
      match_limit: limit,
    });

    if (error) throw error;

    return data;
  }

  async runWithMemory(prompt: string, model: any = openai('gpt-4o')) {
    // Search for relevant past context
    const relevantMemories = await this.searchMemory(prompt);

    // Build context from semantic search results
    const memoryContext = relevantMemories
      .map((m: any) => `[Past context]: ${m.content}`)
      .join('\n');

    const fullPrompt = `
Relevant past interactions:
${memoryContext}

Current task: ${prompt}
    `;

    const result = await generateText({
      model,
      prompt: fullPrompt,
      maxTokens: 500,
    });

    // Store this interaction in memory
    await this.storeMemory(
      `User: ${prompt}\nAssistant: ${result.text}`,
      { timestamp: new Date().toISOString() }
    );

    return result.text;
  }
}

// SQL function for semantic search:
// CREATE OR REPLACE FUNCTION search_agent_memory(
//   query_embedding vector(1536),
//   match_session_id text,
//   match_agent_name text,
//   match_limit int
// )
// RETURNS TABLE (
//   id uuid,
//   content text,
//   similarity float
// )
// LANGUAGE plpgsql
// AS $$
// BEGIN
//   RETURN QUERY
//   SELECT
//     agent_memory.id,
//     agent_memory.content,
//     1 - (agent_memory.embedding <=> query_embedding) as similarity
//   FROM agent_memory
//   WHERE session_id = match_session_id
//     AND agent_name = match_agent_name
//   ORDER BY agent_memory.embedding <=> query_embedding
//   LIMIT match_limit;
// END;
// $$;

// Usage: Agent remembers semantically similar interactions
const agent = new VectorMemoryAgent('session-123', 'support-agent');

await agent.runWithMemory('User had billing issue last week');
// Agent searches memory, finds semantically related past billing conversations
// Even if exact words don't match

When to use: Long-running agents with extensive history, customer support bots, personal assistants, or any system where "remembering" semantically related context matters more than exact chronological history.

Production State Architectures

1. Layered State Architecture

Separate state by lifecycle and access patterns: ephemeral, session, and long-term state.

Pattern: Three-Layer State

// Production architecture: Layered state management
// Assumes `redis` (e.g. an Upstash Redis client) and the `supabase` client are initialized elsewhere
class LayeredStateAgent {
  // Layer 1: Ephemeral (in-memory, request-scoped)
  private ephemeral: Record<string, any> = {};

  // Layer 2: Session (Redis, 30-min TTL)
  private sessionKey: string;

  // Layer 3: Long-term (Postgres, permanent)
  private userId: string;

  constructor(userId: string, sessionId: string) {
    this.userId = userId;
    this.sessionKey = `session:${sessionId}`;
  }

  // Ephemeral: Temporary computation results
  setEphemeral(key: string, value: any) {
    this.ephemeral[key] = value;
  }

  getEphemeral(key: string) {
    return this.ephemeral[key];
  }

  // Session: Redis-backed, expires after 30 min
  async setSession(key: string, value: any) {
    // Using Upstash Redis or similar
    await redis.setex(`${this.sessionKey}:${key}`, 1800, JSON.stringify(value));
  }

  async getSession(key: string) {
    const data = await redis.get(`${this.sessionKey}:${key}`);
    return data ? JSON.parse(data) : null;
  }

  // Long-term: Postgres, permanent storage
  async setLongTerm(key: string, value: any) {
    await supabase.from('user_state').upsert({
      user_id: this.userId,
      key,
      value,
      updated_at: new Date().toISOString(),
    });
  }

  async getLongTerm(key: string) {
    const { data } = await supabase
      .from('user_state')
      .select('value')
      .eq('user_id', this.userId)
      .eq('key', key)
      .single();

    return data?.value;
  }

  async run(prompt: string, model: any = openai('gpt-4o-mini')) {
    // Read from all layers
    const userPreferences = await this.getLongTerm('preferences');
    const conversationHistory = await this.getSession('history');
    const currentContext = this.getEphemeral('context');

    const fullPrompt = `
User preferences (long-term): ${JSON.stringify(userPreferences)}
Recent conversation (session): ${JSON.stringify(conversationHistory)}
Current context (ephemeral): ${JSON.stringify(currentContext)}

Task: ${prompt}
    `;

    const result = await generateText({ model, prompt: fullPrompt });

    // Write to appropriate layers
    this.setEphemeral('lastResponse', result.text);
    await this.setSession('lastInteraction', {
      prompt,
      response: result.text,
      timestamp: Date.now(),
    });

    return result.text;
  }
}

// Layer decision guide:
// - Ephemeral: Intermediate computation, request-scoped temp data
// - Session: Conversation history, recent preferences, transient state
// - Long-term: User profile, learned preferences, permanent records

2. Event Sourcing Pattern

Store all state changes as immutable events. Reconstruct current state by replaying events. Perfect for audit trails and debugging.

Pattern: Event-Sourced Agent State

// Event sourcing for full agent history
interface StateEvent {
  id: string;
  sessionId: string;
  agentName: string;
  eventType: 'agent_called' | 'state_updated' | 'decision_made';
  data: Record<string, any>;
  timestamp: number;
}

class EventSourcedAgent {
  constructor(
    private sessionId: string,
    private agentName: string
  ) {}

  async appendEvent(eventType: StateEvent['eventType'], data: Record<string, any>) {
    const event: StateEvent = {
      id: crypto.randomUUID(),
      sessionId: this.sessionId,
      agentName: this.agentName,
      eventType,
      data,
      timestamp: Date.now(),
    };

    // Store event (immutable, append-only); columns use snake_case like the other tables
    const { error } = await supabase.from('agent_events').insert({
      id: event.id,
      session_id: event.sessionId,
      agent_name: event.agentName,
      event_type: event.eventType,
      data: event.data,
      timestamp: event.timestamp,
    });

    if (error) throw error;

    return event;
  }

  async getEvents(limit?: number): Promise<StateEvent[]> {
    let query = supabase
      .from('agent_events')
      .select('*')
      .eq('session_id', this.sessionId)
      .eq('agent_name', this.agentName)
      .order('timestamp', { ascending: true });

    if (limit) query = query.limit(limit);

    const { data } = await query;

    // Map snake_case rows back to the StateEvent shape
    return (data || []).map((row: any) => ({
      id: row.id,
      sessionId: row.session_id,
      agentName: row.agent_name,
      eventType: row.event_type,
      data: row.data,
      timestamp: row.timestamp,
    }));
  }

  async reconstructState(): Promise<Record<string, any>> {
    const events = await this.getEvents();

    // Replay events to build current state
    const state: Record<string, any> = {};

    for (const event of events) {
      switch (event.eventType) {
        case 'state_updated':
          Object.assign(state, event.data);
          break;
        case 'decision_made':
          state.lastDecision = event.data;
          break;
        // ... handle other event types
      }
    }

    return state;
  }

  async run(prompt: string, model: any = openai('gpt-4o-mini')) {
    // Reconstruct state from events
    const state = await this.reconstructState();

    await this.appendEvent('agent_called', { prompt });

    const result = await generateText({
      model,
      prompt: `State: ${JSON.stringify(state)}\nTask: ${prompt}`,
    });

    await this.appendEvent('state_updated', {
      lastResponse: result.text,
    });

    return result.text;
  }

  // Debugging: Replay history to specific point
  async replayToTimestamp(timestamp: number) {
    const query = supabase
      .from('agent_events')
      .select('*')
      .eq('session_id', this.sessionId)
      .eq('agent_name', this.agentName)
      .lte('timestamp', timestamp)
      .order('timestamp', { ascending: true });

    const { data: events } = await query;

    const state: Record<string, any> = {};
    for (const event of events || []) {
      // Raw rows come back with snake_case columns, so check event_type here
      if (event.event_type === 'state_updated') {
        Object.assign(state, event.data);
      }
    }

    return state;
  }
}
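
The class above isn't shown in use; here is a short usage sketch, assuming the agent_events table from the insert calls exists.

// Usage sketch: audit and replay an event-sourced agent
const auditAgent = new EventSourcedAgent('session-123', 'billing-agent');

await auditAgent.run('Summarize the outstanding billing dispute');

// Full, ordered history of everything this agent did in the session
const history = await auditAgent.getEvents();

// Debugging: reconstruct what the agent "knew" as of 24 hours ago
const yesterday = Date.now() - 24 * 60 * 60 * 1000;
const stateThen = await auditAgent.replayToTimestamp(yesterday);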

When to use: Systems requiring full audit trails, debugging complex agent behavior, compliance requirements, or when you need to "replay" agent decisions.

Common Mistakes

❌ No state management at all

Every agent call is independent. No memory, no continuity, no coherent behavior. This isn't a system; it's random function calls.

❌ In-memory state for production

When the server restarts, your state dies. Use in-memory storage for ephemeral data only; session and long-term state must be persistent.

❌ Putting everything in shared context

A bloated context object means every agent pays the token cost of irrelevant data. Pass only what each agent needs, as sketched below.
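
One lightweight fix is to project out only the fields an agent needs before building its prompt. The pickContext helper below is hypothetical (not part of the AI SDK or the orchestrator above), and the billing-agent field choices are illustrative.

// Hypothetical helper: project only the fields a given agent needs
// out of the full shared context before building its prompt.
function pickContext<T extends object, K extends keyof T>(
  context: T,
  keys: K[]
): Pick<T, K> {
  const slice = {} as Pick<T, K>;
  for (const key of keys) {
    slice[key] = context[key];
  }
  return slice;
}

// A billing agent might only need preferences, identifiers, and the last
// few messages, not the entire sessionData blob.
const fullContext = orchestrator.getContext();
const billingContext = {
  ...pickContext(fullContext, ['userPreferences', 'metadata']),
  recentMessages: fullContext.conversationHistory.slice(-5),
};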

❌ No state cleanup strategy

State accumulates forever. Set TTLs, implement cleanup jobs, or your database becomes a junk drawer.

❌ Synchronous state updates blocking agent execution

Don't wait for database writes before returning the agent's response. Update state asynchronously unless you need transaction guarantees.

Best Practices

✅ Layer state by lifecycle

Ephemeral (in-memory), session (Redis, 30min TTL), long-term (Postgres). Different data has different lifecycles—treat them differently.

✅ Set explicit TTLs on session state

Don't rely on manual cleanup. Use Redis SETEX or Postgres scheduled jobs to expire old data automatically.

✅ Use vector memory for semantic search

When conversation history grows beyond 10-20 messages, switch to vector-based retrieval. Pass only semantically relevant context to agents.

✅ Update state async when possible

Return the agent response immediately and update the database in the background (see the sketch below). Use transactions only when atomicity matters.
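
A minimal sketch of that approach, wrapped around the PersistentAgent from Pattern 3 (its loadState and saveState methods are already public). runWithBackgroundPersist is a hypothetical wrapper; on serverless platforms you would typically hand the unawaited promise to something like waitUntil rather than dropping it.

// Sketch: return the response first, persist state in the background.
async function runWithBackgroundPersist(
  agent: PersistentAgent,
  prompt: string,
  model: any = openai('gpt-4o-mini')
) {
  const state = await agent.loadState();

  const result = await generateText({
    model,
    prompt: `Persistent state: ${JSON.stringify(state)}\nTask: ${prompt}`,
    maxTokens: 500,
  });

  // Not awaited: the caller gets the response immediately, and a failed
  // write is logged rather than failing the request.
  agent
    .saveState({
      ...state,
      lastResponse: result.text,
      lastUpdate: new Date().toISOString(),
      callCount: (state.callCount || 0) + 1,
    })
    .catch((error) => console.error('Background state write failed:', error));

  return result.text;
}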

✅ Consider event sourcing for critical workflows

For compliance, debugging, or audit trails, event sourcing gives you complete history of all state changes.

Production Considerations

1. Token Economics of State

Every byte of state you pass to agents costs tokens. Optimize ruthlessly.

Token cost examples:

  • Entire conversation history (20 messages): ~2,000 tokens = $0.006/request @ Claude 3.5 Sonnet
  • Vector search (top 5 relevant messages): ~500 tokens = $0.0015/request
  • Minimal context (user ID + intent): ~50 tokens = $0.00015/request

💰 At 1M requests/month: Minimal context saves $5,850/month vs full history
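
The arithmetic behind those figures, as a small sketch; the $3-per-million-input-tokens rate and the 1M requests/month volume mirror the numbers above and are illustrative, not live pricing.

// Rough monthly cost of the context tokens sent to the model
function monthlyContextCost(
  tokensPerRequest: number,
  requestsPerMonth: number = 1_000_000,
  dollarsPerMillionTokens: number = 3
) {
  return (tokensPerRequest * requestsPerMonth * dollarsPerMillionTokens) / 1_000_000;
}

monthlyContextCost(2000); // full history: ~$6,000/month
monthlyContextCost(500);  // vector search: ~$1,500/month
monthlyContextCost(50);   // minimal context: ~$150/month (~$5,850/month saved vs full history)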

2. State Cleanup

Implement automated cleanup to prevent unbounded state growth.

-- Supabase cleanup job (run daily)
-- Delete session state older than 7 days
DELETE FROM agent_state
WHERE updated_at < NOW() - INTERVAL '7 days'
  AND session_id LIKE 'session-%';

-- Delete orphaned memories (no recent activity)
DELETE FROM agent_memory
WHERE session_id IN (
  SELECT session_id
  FROM agent_memory
  GROUP BY session_id
  HAVING MAX(created_at) < NOW() - INTERVAL '30 days'
);

-- Archive old events (keep last 90 days, archive rest)
INSERT INTO agent_events_archive
SELECT * FROM agent_events
WHERE timestamp < EXTRACT(EPOCH FROM NOW() - INTERVAL '90 days') * 1000;

DELETE FROM agent_events
WHERE timestamp < EXTRACT(EPOCH FROM NOW() - INTERVAL '90 days') * 1000;

3. State Monitoring

Track state size, access patterns, and performance impact.

Key metrics to monitor:

  • Average state size per session (bytes)
  • State access latency (Redis/Postgres read time)
  • State growth rate (MB/day)
  • Cache hit rate (if using Redis caching)
  • Token cost from state context ($/request)
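
A minimal instrumentation sketch for two of those metrics (state size and read latency), wrapped around the agent_state table from Pattern 3; the console.log stands in for whatever metrics pipeline you already run.

// Sketch: measure state size and read latency around a state load
async function loadStateWithMetrics(sessionId: string, agentName: string) {
  const start = performance.now();

  const { data, error } = await supabase
    .from('agent_state')
    .select('state')
    .eq('session_id', sessionId)
    .eq('agent_name', agentName)
    .single();

  const latencyMs = performance.now() - start;
  const sizeBytes = data
    ? new TextEncoder().encode(JSON.stringify(data.state)).length
    : 0;

  // Replace with your metrics pipeline (OpenTelemetry, Datadog, etc.)
  console.log({ metric: 'agent_state_read', sessionId, agentName, latencyMs, sizeBytes });

  if (error && error.code !== 'PGRST116') throw error;
  return data?.state ?? {};
}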

Key Takeaways

  • State is memory, memory is identity. Without proper state management, you have stateless function calls, not intelligent agents.
  • Layer state by lifecycle: Ephemeral (in-memory), session (Redis), long-term (Postgres). Different data needs different storage.
  • Token economics matter: In the example above, full conversation history costs roughly 4x more than vector-search retrieval and 40x more than minimal context. Optimize what you pass to agents.
  • Use vector memory for semantic search: When history grows beyond 10-20 messages, switch to embedding-based retrieval.
  • Implement state cleanup: Set TTLs, run cleanup jobs, archive old data. Unbounded state growth kills performance.
  • Consider event sourcing for critical workflows: Full audit trail, debugging capability, compliance requirements all benefit from event-sourced state.