---
title: Your First Extraction
description: Extract text from an Office document in 60 seconds.
---

import { Aside, Steps, Code, Tabs, TabItem } from '@astrojs/starlight/components';

> *"I'll be honest with you, I love extracting documents. I do. I'm a mcwaddams fan."*

Let's get you extracting documents faster than you can say "TPS report cover sheet."

---

## Prerequisites

Make sure you have mcwaddams installed and configured for the client you use.

For Claude Code, register the server:

```bash
claude mcp add mcwaddams "uvx mcwaddams"
```

Restart Claude Code, and you're ready.

For Claude Desktop, add this to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "mcwaddams": {
      "command": "uvx",
      "args": ["mcwaddams"]
    }
  }
}
```

Restart Claude Desktop.

---

## Step 1: Find a Document

Grab any Office document you have lying around:

- A `.docx` report
- An `.xlsx` spreadsheet
- A `.pptx` presentation
- Even a crusty `.doc` from 2005

---

## Step 2: Ask for Extraction

Just tell your AI assistant what you want:

```
Extract text from /path/to/quarterly-report.docx
```

That's it. No configuration, no options, no ceremony.

---

## Step 3: Get Results

mcwaddams returns structured data:

```json
{
  "text": "Q4 2024 Financial Summary\n\nRevenue increased by 15%...",
  "metadata": {
    "format": "Word Document (DOCX)",
    "extraction_method": "python-docx",
    "extraction_time": 0.042,
    "word_count": 3421
  }
}
```

The AI can now use this content to answer your questions, summarize, analyze, or whatever you need.

---

## What Just Happened?

Behind the scenes, mcwaddams:

1. **Detected the format** — Identified `.docx` as a modern Word document
2. **Selected the best method** — Used `python-docx` for optimal extraction
3. **Extracted the content** — Pulled text while preserving structure
4. **Added metadata** — Included timing and method information

---

## Try Different Formats

The same command works for all supported formats:

### Word Documents

```
Extract text from contract.docx
Extract text from legacy-proposal.doc
```

### Excel Spreadsheets

```
Extract text from sales-data.xlsx
Extract text from budget-2019.xls
```

### PowerPoint Presentations

```
Extract text from quarterly-deck.pptx
Extract text from old-presentation.ppt
```

### CSV Files

```
Extract text from export.csv
```

---

## Working with Large Documents

Documents over 25,000 tokens get automatically paginated:

```json
{
  "text": "Chapter 1: Introduction...",
  "pagination": {
    "current_page": 1,
    "total_pages": 5,
    "cursor_id": "abc123"
  }
}
```

To get the next page:

```
Continue extracting (cursor: abc123)
```

---

## Common Options

You can be more specific about what you want:

### Include Images

```
Extract text and images from report.docx
```

### Get Metadata Only

```
Get metadata from mystery-file.doc
```

### Convert to Markdown

```
Convert presentation.pptx to markdown
```

### Analyze Structure

```
Show me the structure of thesis.docx
```

---

## Error Messages

mcwaddams provides clear errors when something goes wrong:

### File Not Found

```json
{
  "error": "File not found",
  "path": "/path/to/missing.docx",
  "hint": "Check that the file path exists and is accessible"
}
```

### Unsupported Format

```json
{
  "error": "Unsupported format",
  "extension": ".xyz",
  "hint": "Use get_supported_formats to see all supported types"
}
```

### Password Protected

```json
{
  "error": "Document is password-protected",
  "hint": "Remove password protection or provide an unencrypted version"
}
```

---

## Next Steps

Now that you've extracted your first document:

- **[Working with Legacy Formats](/tutorials/legacy-formats/)** — Handle `.doc`, `.xls`, `.ppt`
- **[Indexing Large Documents](/tutorials/indexing/)** — Efficient access to huge files
- **[Extract Tables](/how-to/extract-tables/)** — Structured table extraction
- **[All Tools Reference](/reference/tools/)** — Complete tool documentation

---
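If you ever handle these responses in your own script rather than through the assistant, the shapes shown on this page (results, pagination, errors) could be processed roughly as follows. This is only a sketch under assumptions: the field names (`text`, `error`, `hint`, `pagination`, `current_page`, `total_pages`, `cursor_id`) are copied from the JSON examples above, while `handle_response`, `collect_text`, and the `fetch_next` callback are hypothetical helpers and not part of mcwaddams. It also assumes each paginated response carries the cursor needed to request the following page.

```python
from typing import Callable


def handle_response(response: dict) -> str:
    """Return the extracted text, or raise with the error message and hint."""
    if "error" in response:
        hint = response.get("hint", "")
        message = f"{response['error']}: {hint}" if hint else response["error"]
        raise RuntimeError(message)
    return response.get("text", "")


def collect_text(first: dict, fetch_next: Callable[[str], dict]) -> str:
    """Join the text of every page, following cursor_id until the last page.

    fetch_next is a hypothetical callback that takes a cursor id and
    returns the next response dict.
    """
    parts = [handle_response(first)]
    page = first
    while True:
        pagination = page.get("pagination")
        # Stop when the response is not paginated or this is the final page.
        if not pagination or pagination["current_page"] >= pagination["total_pages"]:
            break
        page = fetch_next(pagination["cursor_id"])
        parts.append(handle_response(page))
    return "\n".join(parts)


# Stand-in data shaped like the pagination example; a real caller would
# request the next page from the server instead of reading a dict.
fake_pages = {
    "abc123": {
        "text": "Chapter 2...",
        "pagination": {"current_page": 2, "total_pages": 2, "cursor_id": "abc123"},
    },
}
first_page = {
    "text": "Chapter 1: Introduction...",
    "pagination": {"current_page": 1, "total_pages": 2, "cursor_id": "abc123"},
}
full_text = collect_text(first_page, fake_pages.__getitem__)
```

Raising on the `error` key keeps failures such as a missing file or a password-protected document from being mistaken for empty text.

---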
> *"Looks like someone has a case of the Mondays."*

Not anymore. Your documents are extracted.