Quick Start Guide
1. Installation
Option A: Run from PyPI with uvx (Recommended for end users)
No clone required — uvx fetches and runs in an isolated cached venv:
```bash
# Bare install
uvx mcp-pdf

# With markdown_to_pdf support (requires pandoc on host)
uvx --from "mcp-pdf[markdown]" mcp-pdf

# Force a refresh after a new release
uvx --refresh --from "mcp-pdf[markdown]" mcp-pdf
```
Option B: pip install from PyPI
```bash
pip install mcp-pdf

# Or with optional extras:
pip install "mcp-pdf[markdown]"  # adds markdown_to_pdf
pip install "mcp-pdf[forms]"     # adds form creation tools
pip install "mcp-pdf[tables]"    # adds Camelot/Tabula table extraction
pip install "mcp-pdf[all]"       # everything
```
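To see which table backends are importable in the current environment, a check along these lines works. It is a sketch: `camelot` and `tabula` are the import names of the `camelot-py` and `tabula-py` packages, but whether the `[tables]` extra installs exactly those is an assumption, and the `[markdown]` and `[forms]` extras are left out because their module names aren't stated here. `importable` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the table backends behind the [tables] extra
# (camelot-py imports as "camelot", tabula-py imports as "tabula").
TABLE_BACKENDS = ("camelot", "tabula")


def importable(modules):
    """Map each top-level module name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in importable(TABLE_BACKENDS).items():
        print(f"{name:8} {'installed' if ok else 'missing'}")
```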
Option C: Local development with uv
```bash
# Clone the repository
git clone https://github.com/rsp2k/mcp-pdf
cd mcp-pdf

# Install with uv
uv sync

# Verify installation
uv run python examples/verify_installation.py
```
Option D: Using Docker
```bash
git clone https://github.com/rsp2k/mcp-pdf
cd mcp-pdf
docker compose build
docker compose run --rm mcp-pdf python examples/verify_installation.py
```
2. System Dependencies
uvx and pip only handle Python deps. Some tools call out to system binaries that you'll need to install separately:
| Binary | Required for |
|---|---|
| `tesseract` | `ocr_pdf` |
| `ghostscript` | Camelot table extraction |
| `java` (JRE) | Tabula table extraction |
| `poppler` | PDF→image conversion |
| `pandoc` | `markdown_to_pdf` |
| `xelatex` / `pdflatex` / `tectonic` / `weasyprint` / `wkhtmltopdf` | `markdown_to_pdf` (need at least one) |
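A quick preflight check against the table can save a confusing error later. This sketch only tests whether each binary is on `PATH`. The executable names (`gs` for Ghostscript, `pdftoppm` for poppler) are assumptions that hold on typical Linux/macOS installs; Windows builds may use other names such as `gswin64c`. `missing_binaries` is a hypothetical helper, not part of mcp-pdf.

```python
import shutil

# Binary -> tool that needs it (mirrors the table above).
# Executable names are assumptions for typical Linux/macOS installs.
REQUIRED = {
    "tesseract": "ocr_pdf",
    "gs": "Camelot table extraction",
    "java": "Tabula table extraction",
    "pdftoppm": "PDF->image conversion (poppler)",
    "pandoc": "markdown_to_pdf",
}


def missing_binaries(required=REQUIRED, which=shutil.which):
    """Return {binary: purpose} for every binary not found on PATH."""
    return {exe: purpose for exe, purpose in required.items() if which(exe) is None}


if __name__ == "__main__":
    for exe, purpose in missing_binaries().items():
        print(f"missing {exe!r}, needed for {purpose}")
```

The `which` parameter defaults to `shutil.which` but can be swapped for a stub when testing.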
Note on the LaTeX engine:
`texlive-xetex` alone is often not enough for real markdown docs: pandoc's default template needs LaTeX packages (`lastpage`, `xcolor`, `framed`, `fancyhdr`, etc.) that live in `texlive-latex-extra` (Debian) / `texlive-latexextra` (Arch). If you don't already use TeX, `tectonic` is a much better choice: it's a ~30 MB static binary that downloads packages on demand. See the README's "Picking a PDF engine" table for details.
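To predict which engine a host would end up with, a first-match lookup over the engines listed above is enough. This is a sketch under assumptions: the preference order (tectonic first, as recommended here) is chosen for illustration and may not match mcp-pdf's own order, and `pick_pdf_engine` is a hypothetical helper. The tool's own `detected_engines` field remains the authoritative answer.

```python
import shutil

# Assumed preference order: tectonic first, then full TeX, then HTML-based engines.
ENGINE_PREFERENCE = ("tectonic", "xelatex", "pdflatex", "weasyprint", "wkhtmltopdf")


def pick_pdf_engine(preference=ENGINE_PREFERENCE, which=shutil.which):
    """Return the first engine found on PATH, or None if none is installed."""
    for engine in preference:
        if which(engine) is not None:
            return engine
    return None


if __name__ == "__main__":
    print(pick_pdf_engine() or "no PDF engine on PATH")
```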
Ubuntu/Debian
```bash
sudo apt-get update
sudo apt-get install -y \
  tesseract-ocr tesseract-ocr-eng \
  poppler-utils ghostscript \
  python3-tk default-jre-headless

# For markdown_to_pdf, pick one of:
sudo apt-get install -y pandoc  # then install tectonic separately
sudo apt-get install -y pandoc texlive-xetex texlive-latex-extra texlive-fonts-extra  # full TeX
sudo apt-get install -y pandoc && pip install weasyprint  # skip TeX
```
Arch Linux
```bash
sudo pacman -S \
  tesseract tesseract-data-eng \
  poppler ghostscript \
  jre-openjdk-headless tk

# For markdown_to_pdf, pick one of:
sudo pacman -S pandoc tectonic  # recommended
sudo pacman -S pandoc texlive-xetex texlive-latexextra texlive-fontsextra  # full TeX
sudo pacman -S pandoc && pip install weasyprint  # skip TeX
```
macOS (Homebrew)
```bash
brew install tesseract poppler ghostscript

# For markdown_to_pdf, pick one of:
brew install pandoc tectonic  # recommended
brew install pandoc && brew install --cask mactex-no-gui  # full TeX
brew install pandoc weasyprint  # skip TeX
```
Windows
- Install Tesseract: https://github.com/UB-Mannheim/tesseract/wiki
- Install Poppler: http://blog.alivate.com.au/poppler-windows/
- Install Ghostscript: https://www.ghostscript.com/download/gsdnld.html
- Install Java: https://www.java.com/download/
- Install Pandoc (for `markdown_to_pdf`): https://pandoc.org/installing.html
- Install MiKTeX or wkhtmltopdf for the PDF engine
3. Adding to Claude Code / Claude Desktop
Easiest — claude mcp add with uvx
```bash
# Bare
claude mcp add pdf-tools -- uvx mcp-pdf

# With markdown_to_pdf support
claude mcp add pdf-tools -- uvx --from "mcp-pdf[markdown]" mcp-pdf
```
The `--` separator is required so the Claude CLI doesn't try to parse `--from` as one of its own flags.
Manual config (Claude Desktop)
Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `~/.config/Claude/claude_desktop_config.json` (Linux):
```json
{
  "mcpServers": {
    "pdf-tools": {
      "command": "uvx",
      "args": ["--from", "mcp-pdf[markdown]", "mcp-pdf"]
    }
  }
}
```
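If `claude_desktop_config.json` already lists other servers, pasting the snippet by hand risks clobbering them. A merge along these lines adds or replaces only the `pdf-tools` entry and keeps everything else. `add_pdf_tools_server` is a hypothetical helper; pass it whichever of the two config paths above applies to your OS.

```python
import json
from pathlib import Path

# Same entry as the manual config above.
PDF_TOOLS_ENTRY = {
    "command": "uvx",
    "args": ["--from", "mcp-pdf[markdown]", "mcp-pdf"],
}


def add_pdf_tools_server(config_path):
    """Add or replace the "pdf-tools" server entry, leaving all other keys intact."""
    path = Path(config_path).expanduser()
    text = path.read_text() if path.exists() else ""
    # Treat a missing or empty file as an empty config.
    config = json.loads(text) if text.strip() else {}
    config.setdefault("mcpServers", {})["pdf-tools"] = PDF_TOOLS_ENTRY
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `add_pdf_tools_server("~/.config/Claude/claude_desktop_config.json")` on Linux. Claude Desktop reads this file at startup, so restart it afterwards.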
4. Test the Tools
```bash
# Test with a sample PDF
uv run python examples/test_pdf_tools.py /path/to/your/document.pdf
```
5. Common Issues
OCR not working
- Check Tesseract is installed: `tesseract --version`
- Install language packs: `sudo apt-get install tesseract-ocr-[lang]` (Debian) or `sudo pacman -S tesseract-data-[lang]` (Arch)
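To confirm a language pack actually landed, `tesseract --list-langs` prints a header line followed by one language code per line. The sketch below parses that output; it assumes Tesseract 4 or later, which writes the list to stdout. Both function names are hypothetical.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip the header ("List of available languages ...") and blank lines.
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def installed_tesseract_langs():
    """Run tesseract and return its language list (empty if tesseract is absent)."""
    if shutil.which("tesseract") is None:
        return []
    proc = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True, check=False
    )
    return parse_list_langs(proc.stdout)
```

A check such as `"eng" in installed_tesseract_langs()` then tells you whether English OCR will work before `ocr_pdf` is called.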
Table extraction failing
- Check Java is installed: `java -version`
- For Camelot issues, ensure Ghostscript is installed: `gs --version`
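`java -version` writes its banner to stderr, in forms like `openjdk version "17.0.2"` or the legacy `java version "1.8.0_351"`. This sketch extracts the major version so a script can report it; it does not assume any minimum version for Tabula, and both helper names are hypothetical.

```python
import re
import shutil
import subprocess

_VERSION_RE = re.compile(r'version "([^"]+)"')


def java_major_version(banner):
    """Parse the major version from `java -version` output, e.g. '17.0.2' -> 17.

    Legacy '1.x' strings such as '1.8.0_351' map to x (here 8).
    Returns None if no version string is found.
    """
    match = _VERSION_RE.search(banner)
    if not match:
        return None
    parts = re.split(r"[._\-+]", match.group(1))
    if parts[0] == "1" and len(parts) > 1 and parts[1].isdigit():
        return int(parts[1])
    return int(parts[0]) if parts[0].isdigit() else None


def installed_java_major():
    """Return the installed Java major version, or None if java is not on PATH."""
    if shutil.which("java") is None:
        return None
    # `java -version` prints to stderr, not stdout.
    proc = subprocess.run(["java", "-version"], capture_output=True, text=True, check=False)
    return java_major_version(proc.stderr)
```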
markdown_to_pdf errors
- "pandoc binary not found" → install pandoc (see System Dependencies)
- "No PDF engine found" → install at least one of `xelatex`, `pdflatex`, `tectonic`, `weasyprint`, `wkhtmltopdf`
- "Pandoc died with exitcode 43" + `mktexfmt` errors → your TeX install is missing format files; rebuild with `sudo fmtutil-sys --all` or use a different engine via `pdf_engine="weasyprint"`
- The tool reports `detected_engines` in its response; check that field to see what's actually available
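Outside mcp-pdf, the "use a different engine" workaround can be scripted by calling pandoc directly and moving on only when it exits with code 43. This is a sketch that bypasses the tool: `convert_with_fallback` is a hypothetical helper and the engine order is an assumption; `--pdf-engine` is pandoc's own flag.

```python
import shutil
import subprocess

# Assumed fallback order: try a TeX engine first, then skip TeX entirely.
ENGINES = ("xelatex", "tectonic", "weasyprint")
PANDOC_PDF_ERROR = 43  # pandoc's exit code when the PDF engine itself fails


def convert_with_fallback(src, dst, engines=ENGINES,
                          run=subprocess.run, which=shutil.which):
    """Render src to dst with the first engine that works; return its name.

    Engines missing from PATH are skipped. Exit code 43 moves on to the next
    engine; any other pandoc error is re-raised unchanged.
    """
    for engine in engines:
        if which(engine) is None:
            continue
        try:
            run(["pandoc", src, "-o", dst, f"--pdf-engine={engine}"], check=True)
            return engine
        except subprocess.CalledProcessError as err:
            if err.returncode != PANDOC_PDF_ERROR:
                raise
    raise RuntimeError("no installed PDF engine could render " + src)
```

Re-raising non-43 errors keeps genuine input problems (for example a malformed markdown file) from being masked by endless engine retries.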
Large PDF issues
- Process specific pages: `pages="1-10"` or `pages="1,3,5"`
- Increase memory: `export JAVA_OPTS="-Xmx2g"`
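When processing a large PDF in batches, it helps to see how a `pages` spec expands and to generate specs that cover the whole document. The sketch below assumes 1-based page numbers and the range/list syntax shown above; `parse_pages` and `chunk_specs` are hypothetical helpers, and mcp-pdf's own parser may treat edge cases differently.

```python
def parse_pages(spec):
    """Expand a spec like "1-10" or "1,3,5" into a sorted list of 1-based pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(p < 1 for p in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


def chunk_specs(total_pages, size=10):
    """Split a document into specs like "1-10", "11-20", ... for batch processing."""
    return [f"{s}-{min(s + size - 1, total_pages)}" for s in range(1, total_pages + 1, size)]
```

For a 25-page file, `chunk_specs(25)` yields three specs, and each can be passed as `pages=` in a separate request.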
6. Example Usage in Claude
Once configured, you can ask Claude:
- "Extract text from the PDF at /path/to/document.pdf"
- "Check if /path/to/scan.pdf is a scanned document"
- "Extract all tables from /path/to/report.pdf and format as markdown"
- "Convert /path/to/document.pdf to markdown format"
- "Extract images from the first 5 pages of /path/to/presentation.pdf"
- "Build a PDF from /path/to/notes.md with a table of contents"
7. Verify the Built-in Test
Convert this guide (`QUICKSTART.md`) itself to PDF as a smoke test once everything is wired up:
```python
markdown_to_pdf(
    markdown_path="QUICKSTART.md",
    output_path="/tmp/quickstart.pdf",
    toc=True,
)
```
The response includes `detected_engines`, so you can see exactly what's installed on your host.
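To confirm from a script that the smoke test really produced a PDF, checking the file header is a cheap first pass. This is a sketch, not a validator: PDF files normally begin with the `%PDF-` signature, so the check catches empty files or HTML error pages saved as `.pdf`, but a truncated PDF would still pass. `looks_like_pdf` is a hypothetical helper.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Cheap smoke check: file exists, is non-empty, and starts with the %PDF- header."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    with p.open("rb") as f:
        return f.read(5) == b"%PDF-"
```

For example, `looks_like_pdf("/tmp/quickstart.pdf")` should return `True` after the call above succeeds.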
Need Help?
- Check the full README.md for detailed documentation
- Run tests: `uv run pytest`
- Enable debug mode: set `DEBUG=true` in your `.env` file