# mcarchive-org

An MCP (Model Context Protocol) server that lets an LLM search, inspect, and download content from the [Internet Archive](https://archive.org). Built on [FastMCP](https://gofastmcp.com) + [httpx](https://www.python-httpx.org/).

No API key is required: archive.org's read endpoints are public.

## Tools

| Tool | Purpose |
|------|---------|
| `search_items` | Small Solr-style search via `advancedsearch.php` (1–200 rows, paginated) |
| `scrape_items` | Bulk cursor-paginated search via Scrape API (count ≥ 100) |
| `get_item_metadata` | Metadata for one item; skips the (possibly huge) files list by default |
| `list_files` | Files array with optional format / glob filtering; includes `download_url` per file |
| `get_file_url` | Build a canonical download URL without hitting the network |
| `download_file` | Stream a file to disk with resume support and optional MD5 verification |

Also exposes an MCP resource template: `archive://item/{identifier}`.

## Install & run

```bash
# From a checkout:
uv sync
uv run mcarchive-org

# Or from PyPI (once published):
uvx mcarchive-org
```

Register with Claude Code:

```bash
claude mcp add archive-org -- uvx mcarchive-org
# or, from a local checkout:
claude mcp add archive-org -- uv run --directory /path/to/mcarchive-org mcarchive-org
```

## Environment

| Variable | Default | Purpose |
|----------|---------|---------|
| `MCARCHIVE_DOWNLOAD_ROOT` | `./downloads` | Base directory for `download_file` |

## Example flow

```
search_items(query='mediatype:audio AND creator:"Grateful Dead"', sort=['downloads desc'])
  → identifier 'gd77-05-08.sbd.hicks.4982.sbeok.shnf' (among others)

list_files(identifier='gd77-05-08.sbd.hicks.4982.sbeok.shnf', formats=['VBR MP3'])
  → [{ name: 'gd1977-05-08d1t01.mp3', size: 6342912, md5: '…', download_url: '…' }, …]

download_file(identifier='gd77-…', filename='gd1977-05-08d1t01.mp3', verify_md5='…')
  → { path: './downloads/gd77-…/gd1977-…mp3', bytes: 6342912, md5_ok: True }
```

## Query syntax notes

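
The `query` argument passed to `search_items` in the example flow above is a plain string, so it can be assembled from smaller clauses before the call. The following is a minimal sketch of one way to do that. `quote_phrase` and `build_query` are hypothetical helpers written for illustration and are not part of mcarchive-org; the escaping rule assumes standard Lucene phrase escaping, and joining an exclusion with `AND -field:value` is an assumption about how archive.org parses the query.

```python
from __future__ import annotations


def quote_phrase(value: str) -> str:
    """Wrap a value in double quotes for a phrase match.

    Assumption: backslashes and embedded double quotes are escaped
    with a backslash, as in standard Lucene query syntax.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(*clauses: str, exclude: tuple[str, ...] = ()) -> str:
    """Join non-empty clauses with AND; prefix each exclusion with '-'."""
    parts = [clause for clause in clauses if clause]
    parts += [f"-{clause}" for clause in exclude if clause]
    return " AND ".join(parts)


# Hypothetical usage: the resulting string would be passed as `query`.
query = build_query(
    "mediatype:audio",
    f"creator:{quote_phrase('Grateful Dead')}",
    "date:[1977-01-01 TO 1977-12-31]",
    exclude=("subject:bootleg",),
)
```

Building the string from a list of clauses keeps quoting in one place, which helps when a creator or title contains spaces or quotation marks.
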
archive.org uses a Solr/Lucene dialect:

- `mediatype:(audio OR movies)`: restrict to media types
- `collection:etree`: items in a specific collection
- `date:[1977-01-01 TO 1977-12-31]`: date ranges
- `creator:"Grateful Dead"`: phrase match
- `-subject:bootleg`: exclusion
- Sort by `downloads desc`, `date asc`, `addeddate desc`, etc.

See [archive.org's search docs](https://archive.org/advancedsearch.php) for the full grammar.

## License

MIT