diff --git a/docs/reference-architecture-mcp-chat.md b/docs/reference-architecture-mcp-chat.md
new file mode 100644
index 0000000..47f25c9
--- /dev/null
+++ b/docs/reference-architecture-mcp-chat.md
@@ -0,0 +1,967 @@
+# Reference Architecture: MCP Server + SSE Chat on FastAPI
+
+Pattern for adding an MCP server and a streaming chat assistant to an existing FastAPI application with any frontend framework. First built for the [Margaret Hamilton Digital Archive](https://hamilton.warehack.ing) (Starlight + vanilla JS + FastAPI), then adapted for [SpiceBook](https://spicebook.warehack.ing) (Astro SSR + React 19 + FastAPI). Both are in production.
+
+---
+
+## Origin Story
+
+The Hamilton Archive needed a chat assistant that could answer questions about Apollo-era documents using RAG (retrieval-augmented generation). The requirements were:
+
+1. **MCP server** — so Claude Code and other MCP clients could query the archive programmatically
+2. **Chat panel** — floating widget on all pages, streaming LLM responses via SSE, aware of whatever the user was currently reading (a Starlight page, a PDF in the viewer, etc.)
+3. **RAG pipeline** — semantic search → batch SQL fetch → character-budget truncation → LLM completion
+
+This was built as vanilla TypeScript (no framework) because the Hamilton Archive uses Starlight with static output — there's no React, no Zustand, no build-time component hydration. The chat widget is a single 1,125-line `.ts` file that does manual DOM manipulation, localStorage conversation management, and inline Lucide SVG icon paths.
+
+When the same pattern was needed for SpiceBook, the architecture was adapted:
+
+- **Frontend**: React 19 with Zustand for state, split across `ChatWidget.tsx` + `chat-store.ts` + `chat-api.ts`
+- **Context model**: `PageContext(title, path, description)` → `NotebookContext(notebook_id, title, engine)` — the domain changed but the shape is identical
+- **RAG function**: `_build_context(query)` → `_build_notebook_context(req)` — this is the main customization point between deployments
+- **Caddy routing**: per-route `handle` blocks → single `@api.path` matcher — simpler but less precise
+
+**What stays identical** across both projects:
+
+| Component | Identical? | Notes |
+|-----------|:----------:|-------|
+| SSE event protocol | Yes | `status`, `token`, `reasoning`, `error`, `done` |
+| SSE client parser | Yes | `parseSSEBlock()` with `\n\n` boundary detection |
+| `_sse_event()` helper | Yes | Compact JSON formatting |
+| httpx streaming client | Yes | Same timeouts, limits, connection pooling |
+| `_chat_completion_stream()` | Yes | Same SSE line parser for OpenAI-compatible endpoints |
+| MCP mounting pattern | Yes | `mcp.http_app()` + `combine_lifespans()` + `app.mount("/mcp", ...)` |
+| FastMCP tool conventions | Yes | Return `str` (JSON), never raise `HTTPException` |
+| Conversation limits | Yes | MAX_CONVERSATIONS=20, MAX_MESSAGES=50 |
+| Title derivation | Yes | First user message truncated to ~50-60 chars |
+
+---
+
+## Two Frontend Variants
+
+### Variant A: Vanilla TypeScript (Hamilton Archive)
+
+**Single file**: `chat-widget.ts` (1,125 lines) — no framework, no build-time hydration, no npm state library.
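
The SSE client parser row in the table above (`parseSSEBlock()` with `\n\n` boundary detection) can be illustrated with a minimal sketch. The five event names come from the table; everything else is an assumption made for illustration, not the project's actual code: the wire layout (one `event:` line plus `data:` lines carrying JSON), LF-only line endings, the `ChatEvent` shape, and the `drainSSEBuffer` helper.

```typescript
// Hypothetical sketch. Event names are taken from the table above; the
// `event:` + `data:` JSON layout and the ChatEvent shape are assumptions.
type ChatEventName = 'status' | 'token' | 'reasoning' | 'error' | 'done'

interface ChatEvent {
  event: ChatEventName
  data: unknown
}

const EVENT_NAMES: readonly string[] = ['status', 'token', 'reasoning', 'error', 'done']

// Parse one SSE block (the text between two blank-line boundaries).
// Returns null when the event name is unknown or the payload is not valid JSON.
function parseSSEBlock(block: string): ChatEvent | null {
  let event = ''
  const dataLines: string[] = []
  for (const line of block.split('\n')) {
    if (line.startsWith('event:')) event = line.slice(6).trim()
    else if (line.startsWith('data:')) dataLines.push(line.slice(5).trim())
  }
  if (!EVENT_NAMES.includes(event)) return null
  try {
    const data: unknown = dataLines.length > 0 ? JSON.parse(dataLines.join('\n')) : null
    return { event: event as ChatEventName, data }
  } catch {
    return null
  }
}

// Split a growing text buffer on "\n\n" boundaries. The trailing partial
// block is handed back so the caller can prepend it to the next chunk.
function drainSSEBuffer(buffer: string): { events: ChatEvent[]; rest: string } {
  const parts = buffer.split('\n\n')
  const rest = parts.pop() ?? ''
  const events: ChatEvent[] = []
  for (const part of parts) {
    const parsed = parseSSEBlock(part)
    if (parsed !== null) events.push(parsed)
  }
  return { events, rest }
}
```

A caller would typically append each decoded network chunk to a buffer, call `drainSSEBuffer(buffer)`, dispatch the returned events, and keep `rest` as the new buffer. Holding back the incomplete tail is what lets an event that is split across two chunks still parse correctly.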
+
+**Entry point**:
+```typescript
+// ChatWidget.astro (11 lines)
+import { initChatWidget } from './chat-widget.ts'
+initChatWidget()
+document.addEventListener('astro:after-swap', initChatWidget)
+```
+
+**State management**: Module-scoped variables + direct localStorage:
+```typescript
+const STORAGE_KEY_INDEX = 'hamilton-chat-conversations'
+const STORAGE_KEY_ACTIVE = 'hamilton-chat-active'
+const STORAGE_KEY_PREFIX = 'hamilton-chat-conv-'
+const STORAGE_KEY_LEGACY = 'hamilton-chat-history' // flat format, auto-migrated
+```
+
+Storage uses a split architecture: an index array (conversation metadata) stored separately from individual conversation message arrays (`STORAGE_KEY_PREFIX + id`). This avoids loading all message content when just rendering the history list.
+
+**DOM manipulation**: `data-open`/`data-view` attributes on the widget root element control CSS visibility. Rendering is imperative — `renderMessages()`, `renderHistoryList()`, etc. Lucide icons are pasted as inline SVG path strings (no icon library dependency).
+
+**Key advantage**: Zero JS framework overhead. The widget works in any static site (Starlight, plain HTML, Hugo) because it only needs a `