mcwaddams-site/src/content/docs/backstory.mdx
Ryan Malloy 32b41f79d9 Initial mcwaddams documentation site
Astro + Starlight documentation site for mcwaddams MCP server.

Features:
- Diátaxis documentation structure (tutorials, how-to, reference, explanation)
- Office Space theming (Milton Waddams, TPS Reports, red Swingline)
- 29 documentation pages covering all 20 tools
- TPS Reports section for test results
- Flair gamification config (pieces of documentation flair)
- Custom CSS with Office Space color scheme

Structure:
- Getting Started: backstory, installation, quickstart
- Tutorials: first extraction, legacy formats, indexing, resources
- How-To: tables, Excel analysis, markdown, pagination, URLs
- Reference: all tools, universal/word/excel tools, MCP resources, formats
- Explanation: architecture, mixins, fallbacks, resource system
- TPS Reports: dashboard, coverage, torture tests
- Community: credits, feedback, leaderboard
2026-01-11 12:21:49 -07:00


---
title: The Backstory
description: How Milton Waddams became the patron saint of legacy document processing.
---
import { Aside } from '@astrojs/starlight/components';
# The Backstory
> *"I was told I could listen to the radio at a reasonable volume from nine to eleven..."*
## The Relocation
Milton Waddams was relocated to the basement. They took his stapler. But down there, surrounded by boxes of `.doc` files from 1997 and `.xls` spreadsheets that predate Unicode, he became something else entirely.

He became a **document processing expert**.

<Aside type="tip" title="Fun Fact">
When *Office Space* was filmed, Swingline didn't actually make a red stapler; the one in the movie was painted red for the film. After the movie came out, demand was so high that Swingline started manufacturing them. The movie created the product.
</Aside>
## The Problem
Every enterprise has them:
- **The Archive Folder**: 50,000 Word documents from before the cloud existed
- **The Legacy Database Export**: Excel files with formulas referencing cells that no longer exist
- **The Board Presentations**: PowerPoint decks with embedded charts from 2003
- **The Contract Repository**: `.doc` files that crash modern Word

AI agents can read PDFs. They can parse JSON. But Office documents? The binary formats, the OLE containers, the OOXML with custom schemas?

*Nobody wants to deal with that.*
## The Solution
**mcwaddams** handles the documents nobody else wants to touch.
```python
import asyncio

async def main():
    # Extract text from a 1997 .doc file (extract_text is the mcwaddams tool)
    result = await extract_text("contract_final_FINAL_v2.doc")
    print(result["text"])  # It just works

asyncio.run(main())
```
### What We Handle
| Format | Era | Status |
|--------|-----|--------|
| `.docx` | 2007+ | ✅ Full support |
| `.doc` | 1997-2007 | ✅ Works fine |
| `.xlsx` | 2007+ | ✅ Full support |
| `.xls` | 1997-2007 | ✅ Works fine |
| `.pptx` | 2007+ | ✅ Full support |
| `.ppt` | 1997-2007 | ✅ Works fine |

<Aside type="caution" title="Did You Get the Memo?">
If a document is password-protected or encrypted, we'll detect it and tell you. We can't extract content from encrypted files, but we won't silently fail either.
</Aside>
## The Philosophy
### 1. No Silent Failures
When python-docx can't handle a file, mammoth tries. When openpyxl fails, pandas steps in. You'll always get either content or a clear error message explaining why.
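
The fallback idea can be sketched as an ordered list of extractors tried in turn, with every failure recorded so the final error explains what was attempted. This is a minimal illustration, not the mcwaddams implementation: `extract_with_fallbacks`, `ExtractionError`, and the stand-in extractors are hypothetical, and in practice each entry would wrap a real library such as python-docx or mammoth.

```python
from collections.abc import Callable, Sequence

# An extractor takes a file path and returns text, raising on failure
Extractor = Callable[[str], str]


class ExtractionError(Exception):
    """Raised when every extractor in the chain has failed."""


def extract_with_fallbacks(path: str, extractors: Sequence[tuple[str, Extractor]]) -> dict:
    failures: list[str] = []
    for name, extract in extractors:
        try:
            text = extract(path)
        except Exception as exc:  # record the reason, then try the next extractor
            failures.append(f"{name}: {exc}")
            continue
        return {"text": text, "method": name, "failed_attempts": failures}
    # Nothing worked: fail loudly with every reason instead of returning empty output
    raise ExtractionError(f"could not extract {path}: " + "; ".join(failures))


# Hypothetical stand-ins for library-backed extractors
def primary(path: str) -> str:
    raise ValueError("unsupported structure")


def fallback(path: str) -> str:
    return "Contract text..."


result = extract_with_fallbacks("report.docx", [("primary", primary), ("fallback", fallback)])
print(result["method"], result["failed_attempts"])  # prints: fallback ['primary: unsupported structure']
```

Returning the failed attempts alongside the text keeps the fallback visible to the caller even when extraction ultimately succeeds.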
### 2. Legacy is Not Abandoned
Those `.doc` files from 2003? They're still business-critical for someone. We don't treat legacy formats as second-class citizens.
### 3. Context-Aware Extraction
Large documents get paginated automatically. The MCP resource system lets you fetch chapters on demand. Your context window stays manageable.
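
Pagination can be sketched as splitting extracted text into pages under a size budget, breaking between paragraphs so none is cut in half unless it is larger than a whole page. The `paginate` and `get_page` helpers and the character-based budget are assumptions for illustration; the actual mcwaddams page sizes and parameters aren't described on this page.

```python
def paginate(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into pages of at most max_chars, breaking between paragraphs when possible."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    pages: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        # A paragraph longer than a whole page gets hard-split into page-sized pieces
        while len(para) > max_chars:
            if current:
                pages.append(current)
                current = ""
            pages.append(para[:max_chars])
            para = para[max_chars:]
        if not para:
            continue  # skip empty paragraphs and exhausted remainders
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pages.append(current)  # current is non-empty here, since para alone fits
            current = para
    if current:
        pages.append(current)
    return pages or [""]  # an empty document is still one (empty) page


def get_page(text: str, page: int, max_chars: int = 4000) -> dict:
    """Return one 1-based page plus enough metadata to request the next one."""
    pages = paginate(text, max_chars)
    if not 1 <= page <= len(pages):
        raise IndexError(f"page {page} out of range (1..{len(pages)})")
    return {"page": page, "total_pages": len(pages), "text": pages[page - 1]}
```

A client can walk a long document with `get_page(text, 1)`, `get_page(text, 2)`, and so on until it reaches `total_pages`, so no single response has to carry the whole file.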
### 4. Testing the Painful Stuff
We threw 301 random Office documents at mcwaddams. **299 succeeded** (a 99.3% success rate). The two failures were empty or corrupt files.

See the [TPS Reports](/tps/dashboard/) for proof.

---
## The Name
Milton Waddams. The guy with the stapler. Relegated to the basement with the old filing cabinets and the roaches.

That's where the legacy documents live too.

*"I could set the building on fire..."*

---
<div style="text-align: center; margin-top: 2rem;">
**Ready to start?** → [Installation](/installation/)
</div>