Documentation as a Brain
These docs are not just pages to read. They are the source of truth for the entire Avolve game — and they are machine-readable.
Every document in this collection is embedded as vectors in Supabase pgvector. This means the content can be searched semantically (by meaning, not just keywords) and used as context for AI-generated answers.
How It Works
The pipeline has three stages:
1. Content → Chunks
Each MDX document is split into chunks by section heading. A document with five ## headings becomes six chunks: the introduction plus five sections. Each chunk carries its source file, slug, heading, and frontmatter metadata.
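The splitting described above can be sketched as a small pure function. This is illustrative only; the names (`chunkMdx`, `Chunk`) are hypothetical, not the actual pipeline code, and the real step also attaches source, slug, and frontmatter metadata to each chunk:

```typescript
type Chunk = { heading: string | null; content: string };

// Split an MDX body into an intro chunk plus one chunk per ## heading.
function chunkMdx(body: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading: string | null = null;
  let buf: string[] = [];

  const flush = () => {
    const content = buf.join("\n").trim();
    // Skip an empty intro; keep empty sections so headings are not lost.
    if (content.length > 0 || heading !== null) chunks.push({ heading, content });
    buf = [];
  };

  for (const line of body.split("\n")) {
    const m = line.match(/^## (.+)$/);
    if (m) {
      flush();            // close the previous chunk
      heading = m[1].trim();
    } else {
      buf.push(line);
    }
  }
  flush();                // close the final chunk
  return chunks;
}
```

A document with five `##` headings run through this sketch yields six chunks, matching the behavior described above.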
2. Chunks → Vectors
Each chunk is embedded using Google's text-embedding-005 model via Vercel AI Gateway, producing a 768-dimensional vector that captures the semantic meaning of the text. Similar concepts produce similar vectors, regardless of exact wording.
3. Vectors → Answers
When you search or ask a question:
- Your query is embedded into the same vector space
- The database finds the most similar document chunks using cosine similarity
- For search: the matching chunks are returned directly with relevance scores
- For chat: the matching chunks become context for Claude, which generates a grounded answer citing specific documents
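The retrieval step above can be sketched as a brute-force cosine-similarity ranking. In production, pgvector's HNSW index performs the equivalent search inside the database, so this is purely illustrative of the math, with hypothetical function names:

```typescript
// Cosine similarity: dot product of the vectors divided by the
// product of their magnitudes; 1 means identical direction.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks against a query embedding and keep the top k.
function topK<T>(
  query: number[],
  items: { embedding: number[]; item: T }[],
  k: number
): { item: T; score: number }[] {
  return items
    .map(({ embedding, item }) => ({ item, score: cosineSimilarity(query, embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The `score` here corresponds to the relevance score returned by search; for chat, the top-ranked chunks become the context passed to Claude.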
Who This Serves
The brain serves multiple stakeholders simultaneously:
| Stakeholder | How They Access It | What They Get |
|---|---|---|
| Players | Search bar on /docs, chat on /docs/ask | Find answers by meaning, ask questions in natural language |
| Search engines | Structured MDX with Schema.org JSON-LD | Clean indexable content with rich metadata |
| AI agents | Vector search API, Supabase RPC | Any agent can query the knowledge base programmatically |
| Admin | Supabase dashboard, embedding scripts | Full control over what is indexed and how |
The same content serves a human reading a doc page, Google indexing the site, an AI agent building context, and an admin debugging the system. One source of truth, multiple access patterns.
Architecture
MDX files (content/docs/)
↓ POST /api/embed
Google text-embedding-005 via Vercel AI Gateway (768 dimensions)
↓ upsert
Supabase pgvector (documents table, HNSW index)
↓ query
/api/search (vector similarity → ranked chunks)
/api/chat (vector similarity → Claude context → streamed answer)
The documents table stores:
| Column | Purpose |
|---|---|
| source | Origin path (e.g., docs/game-theory) |
| slug | URL-friendly identifier |
| heading | Section heading (null for introduction) |
| content | The chunk text |
| embedding | 768-dim vector |
| metadata | Frontmatter as JSON (title, tags, category) |
An HNSW index enables fast approximate nearest-neighbor search. The match_documents function handles similarity queries with configurable threshold and result count.
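Calling match_documents from application code might look like the sketch below. The client is injected so the shape is testable; the parameter names (`query_embedding`, `match_threshold`, `match_count`) follow the common Supabase pgvector pattern and are assumptions about this function's exact signature:

```typescript
// Minimal client interface; the real supabase-js client satisfies a superset of this.
type RpcClient = {
  rpc: (fn: string, args: Record<string, unknown>) => Promise<{ data: unknown; error: unknown }>;
};

// Hypothetical wrapper around the match_documents similarity query.
async function matchDocuments(
  client: RpcClient,
  queryEmbedding: number[],
  threshold = 0.5, // assumed default; the doc only says it is configurable
  count = 5        // assumed default
) {
  const { data, error } = await client.rpc("match_documents", {
    query_embedding: queryEmbedding, // assumed parameter names
    match_threshold: threshold,
    match_count: count,
  });
  if (error) throw new Error(`match_documents failed: ${String(error)}`);
  return data;
}
```

Both /api/search and /api/chat would sit on top of a call like this, differing only in what they do with the returned chunks.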
What Gets Embedded
Only public documentation in content/docs/ is embedded. Internal notes, skill files, and admin references are not included. This is intentional — the brain contains exactly what players should be able to find and what AI agents should be able to cite.
If a piece of information is important enough to be in the brain, it belongs in a doc. If it is internal-only, it stays in skill references or admin notes.
Keeping It Current
The embedding pipeline runs as an API route: POST /api/embed with the service role key in the x-embed-secret header. It deletes existing rows for each source before reinserting, ensuring the database always reflects the current state of the docs. There is no drift between what you read on the page and what the brain knows.
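Triggering a re-embed run can be sketched as below. The route and header name come from this page; the base URL and secret handling are placeholders, and the builder is a hypothetical helper, not pipeline code:

```typescript
// Build the re-embed request described above: POST /api/embed
// authenticated via the x-embed-secret header.
function buildEmbedRequest(
  baseUrl: string,
  secret: string
): { url: string; init: { method: string; headers: Record<string, string> } } {
  return {
    url: `${baseUrl}/api/embed`,
    init: {
      method: "POST",
      headers: { "x-embed-secret": secret },
    },
  };
}

// Usage (network call not executed here):
// const { url, init } = buildEmbedRequest("https://example.com", process.env.EMBED_SECRET!);
// await fetch(url, init);
```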
Limitations
- Embedding model: Google text-embedding-005 is fast and cost-effective ($0.025 per 1M tokens) via Vercel AI Gateway, and it is sufficient for a focused documentation set.
- Chunk granularity: Splitting by ## headings means very long sections become single chunks, and short sections may lack context. This works well for the current doc structure.
- Latency: Search adds an embedding call (~100ms) plus a database query. Chat adds a Claude inference step on top. Both are acceptable for the use case.
- Scope: Only docs are embedded. Future iterations could include Genius entry patterns, quest descriptions, or community knowledge.