---
title: The Backstory
description: How Milton Waddams became the patron saint of legacy document processing.
---

import { Aside } from '@astrojs/starlight/components';

# The Backstory

> *"I was told I could listen to the radio at a reasonable volume from nine to eleven..."*

## The Relocation

Milton Waddams was relocated to the basement. They took his stapler.

But down there, surrounded by boxes of `.doc` files from 1997 and `.xls` spreadsheets that predate Unicode, he became something else entirely.

He became a **document processing expert**.

## The Problem

Every enterprise has them:

- **The Archive Folder**: 50,000 Word documents from before the cloud existed
- **The Legacy Database Export**: Excel files with formulas referencing cells that no longer exist
- **The Board Presentations**: PowerPoint decks with embedded charts from 2003
- **The Contract Repository**: `.doc` files that crash modern Word

AI agents can read PDFs. They can parse JSON. But Office documents? The binary formats, the OLE containers, the OOXML with custom schemas?

*Nobody wants to deal with that.*

## The Solution

**mcwaddams** handles the documents nobody else wants to touch.

```python
# Extract text from a 1997 .doc file
result = await extract_text("contract_final_FINAL_v2.doc")

# It just works
print(result["text"])
```

### What We Handle

| Format | Era | Status |
|--------|-----|--------|
| `.docx` | 2007+ | ✅ Full support |
| `.doc` | 1997-2007 | ✅ Works fine |
| `.xlsx` | 2007+ | ✅ Full support |
| `.xls` | 1997-2007 | ✅ Works fine |
| `.pptx` | 2007+ | ✅ Full support |
| `.ppt` | 1997-2007 | ✅ Works fine |

## The Philosophy

### 1. No Silent Failures

When python-docx can't handle a file, mammoth tries. When openpyxl fails, pandas steps in. You'll always get either content or a clear error message explaining why.

### 2. Legacy is Not Abandoned

Those `.doc` files from 2003? They're still business-critical for someone. We don't treat legacy formats as second-class citizens.

### 3. Context-Aware Extraction

Large documents get paginated automatically. The MCP resource system lets you fetch chapters on-demand. Your context window stays manageable.

### 4. Testing Painful Stuff

We threw 301 random Office documents at mcwaddams. **299 succeeded.** The 2 failures were empty/corrupt files.

See the [TPS Reports](/tps/dashboard/) for proof.

---

## The Name

Milton Waddams. The guy with the stapler. Relegated to the basement with the old filing cabinets and the roaches.

That's where the legacy documents live too.

*"I could set the building on fire..."*

---
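
For readers curious how the "No Silent Failures" idea might look in code, here is a minimal sketch of a fallback chain. It is not the actual mcwaddams implementation: `extract_with_fallback`, `ExtractionError`, and the two stand-in parsers are hypothetical names, with the stand-ins playing the roles of a primary library and its backup.

```python
from typing import Callable, Dict, Sequence, Tuple


class ExtractionError(Exception):
    """Raised when every extractor in the chain has failed."""


def extract_with_fallback(
    path: str,
    extractors: Sequence[Tuple[str, Callable[[str], str]]],
) -> Dict[str, str]:
    """Try each (name, extractor) pair in order and return the first success."""
    failures = []
    for name, extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:
            # Record why this extractor failed, then move on to the next one.
            failures.append(f"{name}: {exc}")
            continue
        return {"text": text, "method": name}
    # Nothing worked: report every recorded reason instead of failing silently.
    raise ExtractionError(f"could not extract {path!r}: " + "; ".join(failures))


# Hypothetical stand-ins for real parsers (a primary library and its backup).
def primary_parser(path: str) -> str:
    raise ValueError("unsupported binary format")


def fallback_parser(path: str) -> str:
    return f"text extracted from {path}"


result = extract_with_fallback(
    "contract_final_FINAL_v2.doc",
    [("primary", primary_parser), ("fallback", fallback_parser)],
)
print(result["method"])
```

The caller either gets a result that names the extractor that succeeded, or an exception listing why each one failed, which mirrors the "content or a clear error message" promise.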