docs: flag texlive-latex-extra requirement, recommend tectonic

texlive-xetex alone is rarely enough: pandoc's default template needs packages from texlive-latex-extra (Debian) / texlive-latexextra (Arch), such as lastpage, xcolor, framed, and fancyhdr. Without them, real markdown docs fail with "File 'X.sty' not found".

Restructure system deps to present three engine routes per platform:

- tectonic (recommended): ~30 MB static binary, downloads packages on demand
- full TeX: texlive-xetex + texlive-latex-extra + texlive-fonts-extra
- weasyprint: skip TeX entirely, pip-installable

Add an engine comparison table in the README explaining the disk-size and quality trade-offs so users can make an informed choice.
docs: cover markdown_to_pdf, [markdown] extra, uvx + pacman install
2026-05-05 16:29:05 -06:00 · 2026-05-05 16:27:28 -06:00
2 changed files with 218 additions and 37 deletions
--- a/QUICKSTART.md
+++ b/QUICKSTART.md
@@ -2,12 +2,38 @@

 ## 1. Installation

-### Option A: Using UV (Recommended for Development)
+### Option A: Run from PyPI with uvx (Recommended for end users)
+
+No clone required: `uvx` fetches the package and runs it in an isolated, cached venv:
+
+```bash
+# Bare install
+uvx mcp-pdf
+
+# With markdown_to_pdf support (requires pandoc on host)
+uvx --from "mcp-pdf[markdown]" mcp-pdf
+
+# Force a refresh after a new release
+uvx --refresh --from "mcp-pdf[markdown]" mcp-pdf
+```
+
+### Option B: pip install from PyPI
+
+```bash
+pip install mcp-pdf
+# Or with optional extras:
+pip install "mcp-pdf[markdown]"   # adds markdown_to_pdf
+pip install "mcp-pdf[forms]"      # adds form creation tools
+pip install "mcp-pdf[tables]"     # adds Camelot/Tabula table extraction
+pip install "mcp-pdf[all]"        # everything
+```
+
+### Option C: Local development with uv

 ```bash
 # Clone the repository
-git clone https://github.com/rpm/mcp-pdf-tools
-cd mcp-pdf-tools
+git clone https://github.com/rsp2k/mcp-pdf
+cd mcp-pdf

 # Install with uv
 uv sync
@@ -16,41 +42,66 @@ uv sync
 uv run python examples/verify_installation.py
 ```

-### Option B: Using Docker
+### Option D: Using Docker

 ```bash
-# Clone the repository
-git clone https://github.com/rpm/mcp-pdf-tools
-cd mcp-pdf-tools
+git clone https://github.com/rsp2k/mcp-pdf
+cd mcp-pdf

-# Build and run with Docker
-docker-compose build
-docker-compose run --rm mcp-pdf-tools python examples/verify_installation.py
-```
-
-### Option C: From PyPI
-
-```bash
-pip install mcp-pdf-tools
+docker compose build
+docker compose run --rm mcp-pdf python examples/verify_installation.py
 ```

 ## 2. System Dependencies

+`uvx` and `pip` only handle Python deps. Some tools call out to system binaries that you'll need to install separately:
+
+| Binary | Required for |
+|--------|-------------|
+| `tesseract` | `ocr_pdf` |
+| `ghostscript` | Camelot table extraction |
+| `java` (JRE) | Tabula table extraction |
+| `poppler` | PDF→image conversion |
+| `pandoc` | `markdown_to_pdf` |
+| `xelatex` / `pdflatex` / `tectonic` / `weasyprint` / `wkhtmltopdf` | `markdown_to_pdf` (need at least one) |
+
+> **Note on the LaTeX engine:** `texlive-xetex` alone is often not enough for real markdown docs — pandoc's default template needs LaTeX packages (`lastpage`, `xcolor`, `framed`, `fancyhdr`, etc.) that live in `texlive-latex-extra` (Debian) / `texlive-latexextra` (Arch). If you don't already use TeX, **`tectonic` is a much better choice** — it's a ~30 MB static binary that downloads packages on demand. See the README's "Picking a PDF engine" table for details.
+
 ### Ubuntu/Debian
 ```bash
 sudo apt-get update
 sudo apt-get install -y \
-    tesseract-ocr \
-    tesseract-ocr-eng \
-    poppler-utils \
-    ghostscript \
-    python3-tk \
-    default-jre-headless
+    tesseract-ocr tesseract-ocr-eng \
+    poppler-utils ghostscript \
+    python3-tk default-jre-headless
+
+# For markdown_to_pdf, pick one of:
+sudo apt-get install -y pandoc                              # then install tectonic separately
+sudo apt-get install -y pandoc texlive-xetex texlive-latex-extra texlive-fonts-extra   # full TeX
+sudo apt-get install -y pandoc && pip install weasyprint    # skip TeX
 ```

-### macOS
+### Arch Linux
+```bash
+sudo pacman -S \
+    tesseract tesseract-data-eng \
+    poppler ghostscript \
+    jre-openjdk-headless tk
+
+# For markdown_to_pdf, pick one of:
+sudo pacman -S pandoc tectonic                                                 # recommended
+sudo pacman -S pandoc texlive-xetex texlive-latexextra texlive-fontsextra      # full TeX
+sudo pacman -S pandoc && pip install weasyprint                                # skip TeX
+```
+
+### macOS (Homebrew)
 ```bash
 brew install tesseract poppler ghostscript
+
+# For markdown_to_pdf, pick one of:
+brew install pandoc tectonic                          # recommended
+brew install pandoc && brew install --cask mactex-no-gui   # full TeX
+brew install pandoc weasyprint                        # skip TeX
 ```

 ### Windows
@@ -58,18 +109,33 @@ brew install tesseract poppler ghostscript
 - Install Poppler: http://blog.alivate.com.au/poppler-windows/
 - Install Ghostscript: https://www.ghostscript.com/download/gsdnld.html
 - Install Java: https://www.java.com/download/
+- Install Pandoc (for `markdown_to_pdf`): https://pandoc.org/installing.html
+- Install MiKTeX or wkhtmltopdf for the PDF engine

-## 3. Claude Desktop Configuration
+## 3. Adding to Claude Code / Claude Desktop

-Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
+### Easiest — `claude mcp add` with uvx
+
+```bash
+# Bare
+claude mcp add pdf-tools -- uvx mcp-pdf
+
+# With markdown_to_pdf support
+claude mcp add pdf-tools -- uvx --from "mcp-pdf[markdown]" mcp-pdf
+```
+
+The `--` separator is required so the Claude CLI doesn't try to parse `--from` as one of its own flags.
+
+### Manual config (Claude Desktop)
+
+Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `~/.config/Claude/claude_desktop_config.json` (Linux):

 ```json
 {
  "mcpServers": {
    "pdf-tools": {
-      "command": "uv",
-      "args": ["run", "mcp-pdf-tools"],
-      "cwd": "/home/rpm/claude/mcp-pdf-tools"
+      "command": "uvx",
+      "args": ["--from", "mcp-pdf[markdown]", "mcp-pdf"]
    }
  }
 }
@@ -86,14 +152,20 @@ uv run python examples/test_pdf_tools.py /path/to/your/document.pdf

 ### OCR not working
 - Check Tesseract is installed: `tesseract --version`
-- Install language packs: `sudo apt-get install tesseract-ocr-[lang]`
+- Install language packs: `sudo apt-get install tesseract-ocr-[lang]` (Debian) or `sudo pacman -S tesseract-data-[lang]` (Arch)

 ### Table extraction failing
 - Check Java is installed: `java -version`
-- For Camelot issues, ensure Ghostscript is installed
+- For Camelot issues, ensure Ghostscript is installed: `gs --version`
+
+### `markdown_to_pdf` errors
+- "pandoc binary not found" → install pandoc (see System Dependencies)
+- "No PDF engine found" → install at least one of `xelatex`, `pdflatex`, `tectonic`, `weasyprint`, `wkhtmltopdf`
+- "Pandoc died with exitcode 43" + `mktexfmt` errors → your TeX install is missing format files; rebuild with `sudo fmtutil-sys --all` or use a different engine via `pdf_engine="weasyprint"`
+- The tool reports `detected_engines` in its response — check that field to see what's actually available
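
The first two errors in the list above can be reproduced outside the server with a short host check. This is a minimal sketch, not mcp-pdf's actual detection code: the function names `detect_engines` and `diagnose` are hypothetical, and the order of the candidate list is an assumption rather than the priority the tool really uses.

```python
import shutil
from typing import Callable, Optional

# Engine binaries named in this guide. The order here is an assumption,
# not necessarily the priority mcp-pdf applies when auto-detecting.
CANDIDATE_ENGINES = ["xelatex", "pdflatex", "tectonic", "weasyprint", "wkhtmltopdf"]

Lookup = Callable[[str], Optional[str]]


def detect_engines(which: Lookup = shutil.which) -> list[str]:
    """Return the candidate engines that resolve to an executable on PATH."""
    return [name for name in CANDIDATE_ENGINES if which(name)]


def diagnose(which: Lookup = shutil.which) -> str:
    """Map the host state to the matching troubleshooting hint."""
    if which("pandoc") is None:
        return "pandoc binary not found: install pandoc"
    engines = detect_engines(which)
    if not engines:
        return "No PDF engine found: install one of " + ", ".join(CANDIDATE_ENGINES)
    return "ok: " + ", ".join(engines)


if __name__ == "__main__":
    print(diagnose())
```

The lookup function is injectable only so the logic can be exercised without touching the real PATH; on a real host, calling `diagnose()` with no arguments is enough.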

 ### Large PDF issues
-- Process specific pages: `pages=[0, 1, 2]`
+- Process specific pages: `pages="1-10"` or `pages="1,3,5"`
 - Increase memory: `export JAVA_OPTS="-Xmx2g"`
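
For readers unsure how a page spec such as `"1-10"` or `"1,3,5"` expands, the sketch below shows one plausible interpretation. It is illustrative only: `parse_pages` is a hypothetical helper, and the 1-based numbering and inclusive ranges are assumptions, not confirmed behavior of mcp-pdf.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a spec such as "1-3" or "1,3,5" into sorted page numbers.

    Assumes 1-based page numbers and inclusive ranges (an assumption,
    not taken from the mcp-pdf source).
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)
```

Duplicates collapse and the result is sorted, so `"2, 4-5, 4"` and `"2,4,5"` describe the same set of pages under these assumptions.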

 ## 6. Example Usage in Claude
@@ -105,6 +177,21 @@ Once configured, you can ask Claude:
 - "Extract all tables from /path/to/report.pdf and format as markdown"
 - "Convert /path/to/document.pdf to markdown format"
 - "Extract images from the first 5 pages of /path/to/presentation.pdf"
+- "Build a PDF from /path/to/notes.md with a table of contents"
+
+## 7. Smoke Test
+
+Once everything is wired up, convert this quickstart itself to PDF as a smoke test:
+
+```python
+markdown_to_pdf(
+    markdown_path="QUICKSTART.md",
+    output_path="/tmp/quickstart.pdf",
+    toc=True,
+)
+```
+
+The response includes `detected_engines` so you can see exactly what's installed on your host.

 ## Need Help?

--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@

 **A FastMCP server for PDF processing**

-*46 tools for text extraction, OCR, tables, forms, annotations, and more*
+*47 tools for text extraction, OCR, tables, forms, annotations, markdown↔PDF, and more*

 [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg?style=flat-square)](https://www.python.org/downloads/)
 [![FastMCP](https://img.shields.io/badge/FastMCP-2.0+-green.svg?style=flat-square)](https://github.com/jlowin/fastmcp)
@@ -31,19 +31,25 @@ MCP PDF extracts content from PDFs using multiple libraries with automatic fallb
 - **Document assembly** - merge, split, reorder pages
 - **Annotations** - sticky notes, highlights, stamps
 - **Vector graphics** - extract to SVG for schematics and technical drawings
+- **Format conversion** - PDF ↔ Markdown (PDF→MD via PyMuPDF, MD→PDF via pandoc)

 ---

 ## Quick Start

 ```bash
-# Install from PyPI
+# Run from PyPI (one-shot, no permanent install)
 uvx mcp-pdf

-# Or add to Claude Code
-claude mcp add pdf-tools uvx mcp-pdf
+# Add to Claude Code — note the `--` separator before uvx
+claude mcp add pdf-tools -- uvx mcp-pdf
+
+# Include the markdown_to_pdf tool (requires pandoc on host)
+claude mcp add pdf-tools -- uvx --from "mcp-pdf[markdown]" mcp-pdf
 ```

+> `uvx` caches tool installs aggressively. After upgrading to a new release, force a refresh with `uvx --refresh mcp-pdf` (or `uvx --refresh --from "mcp-pdf[markdown]" mcp-pdf` if you're using extras).
+
 <details>
 <summary><b>Development Installation</b></summary>

@@ -55,6 +61,11 @@ uv sync
 # System dependencies (Ubuntu/Debian)
 sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils ghostscript

+# For markdown_to_pdf — pick one PDF-engine route:
+sudo apt-get install pandoc                                                   # recommended: then install tectonic separately (not in apt)
+# or:  sudo apt-get install pandoc texlive-xetex texlive-latex-extra          # full TeX
+# or:  sudo apt-get install pandoc && pip install weasyprint                  # skip TeX
+
 # Verify
 uv run python examples/verify_installation.py
 ```
@@ -73,10 +84,18 @@ uv run python examples/verify_installation.py
 | `extract_tables` | Extract tables to JSON, CSV, or Markdown |
 | `extract_images` | Extract embedded images |
 | `extract_links` | Get all hyperlinks with page filtering |
-| `pdf_to_markdown` | Convert PDF to markdown preserving structure |
 | `ocr_pdf` | OCR scanned documents using Tesseract |
 | `extract_vector_graphics` | Export vector graphics to SVG (schematics, charts, drawings) |

+### Format Conversion
+
+| Tool | What it does |
+|------|-------------|
+| `pdf_to_markdown` | Convert PDF to markdown preserving structure; extracts images and SVG vectors to disk |
+| `markdown_to_pdf` | Convert `.md` files (or inline text) to PDF via pandoc with auto-detected engine |
+
+**`markdown_to_pdf` requires:** `pip install "mcp-pdf[markdown]"` plus the pandoc binary and at least one PDF engine (`xelatex`, `pdflatex`, `tectonic`, `weasyprint`, or `wkhtmltopdf`) on PATH. The tool auto-detects what's available and uses the highest-quality one. Pass `pdf_engine=` to override or `extra_args=` for raw pandoc options.
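
To make the `pdf_engine=` override concrete, here is a rough sketch of how such an option could translate into a pandoc invocation through `pypandoc`. It is not the tool's implementation: `build_pandoc_args` and `convert` are hypothetical names, and only pandoc's `--pdf-engine` and `--toc` flags and `pypandoc.convert_file` are real.

```python
from typing import Optional


def build_pandoc_args(pdf_engine: Optional[str] = None, toc: bool = False) -> list[str]:
    """Translate high-level options into raw pandoc command-line flags."""
    args: list[str] = []
    if pdf_engine:
        # Without this flag pandoc falls back to its own default engine.
        args.append(f"--pdf-engine={pdf_engine}")
    if toc:
        args.append("--toc")
    return args


def convert(markdown_path: str, output_path: str,
            pdf_engine: Optional[str] = None, toc: bool = False) -> None:
    """Hypothetical wrapper: render a markdown file to PDF via pypandoc."""
    # Imported lazily so build_pandoc_args works without the [markdown] extra.
    import pypandoc

    pypandoc.convert_file(
        markdown_path,
        "pdf",
        outputfile=output_path,
        extra_args=build_pandoc_args(pdf_engine, toc),
    )
```

Calling `convert("notes.md", "/tmp/notes.pdf", pdf_engine="tectonic", toc=True)` would require both pandoc and tectonic on PATH; the paths in that call are placeholders.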
+
 ### Document Analysis

 | Tool | What it does |
@@ -193,12 +212,87 @@ Some features require system packages:
 | Camelot tables | `ghostscript` |
 | Tabula tables | `default-jre-headless` |
 | PDF to images | `poppler-utils` |
+| `markdown_to_pdf` | `pandoc` + one of: `tectonic`, `texlive-xetex` (+ `texlive-latex-extra`), `weasyprint`, `wkhtmltopdf` |

-Ubuntu/Debian:
+### Picking a PDF engine for `markdown_to_pdf`
+
+Pandoc takes markdown → HTML or LaTeX → PDF. The LaTeX path produces the most polished output but needs a TeX install. Trade-offs:
+
+| Engine | Disk size | Notes |
+|--------|----------|-------|
+| **`tectonic`** | ~30 MB | **Recommended for new installs.** Single static binary. Downloads LaTeX packages on demand — no upfront mass-install. |
+| `xelatex` + `texlive-latex-extra` | ~500 MB | Best output once installed. Use if you already run TeX. The `-extra` package matters: pandoc's default template needs `lastpage`, `xcolor`, `framed`, `fancyhdr`, etc. — all of which live there, **not** in `texlive-xetex`. |
+| `xelatex` alone (just `texlive-xetex`) | ~200 MB | **Often breaks.** Expect `! LaTeX Error: File 'X.sty' not found` on real docs. |
+| `weasyprint` | ~40 MB | Pure-Python (`pip install weasyprint`) + cairo/pango system libs. HTML/CSS path — no LaTeX. Good for simple docs; weaker on math, footnotes, citations. |
+| `wkhtmltopdf` | ~40 MB | Older HTML-to-PDF tool. Adequate but less actively maintained. |
+
+**Ubuntu/Debian:**
 ```bash
 sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils ghostscript default-jre-headless
+
+# For markdown_to_pdf — pick one engine route:
+
+# Option A — tectonic (smallest, downloads packages on demand)
+sudo apt-get install pandoc
+# tectonic isn't in apt — install via cargo or download static binary:
+#   https://tectonic-typesetting.github.io/en-US/install.html
+
+# Option B — full TeX (best quality, large download)
+sudo apt-get install pandoc texlive-xetex texlive-latex-extra texlive-fonts-extra
+
+# Option C — weasyprint (skip TeX entirely)
+sudo apt-get install pandoc
+pip install weasyprint
 ```

+**Arch Linux:**
+```bash
+sudo pacman -S tesseract tesseract-data-eng poppler ghostscript jre-openjdk-headless
+
+# For markdown_to_pdf — pick one engine route:
+
+# Option A — tectonic (recommended for new installs, in official repo)
+sudo pacman -S pandoc tectonic
+
+# Option B — full TeX (best output, ~500 MB)
+sudo pacman -S pandoc texlive-xetex texlive-latexextra texlive-fontsextra
+
+# Option C — weasyprint (skip TeX)
+sudo pacman -S pandoc
+pip install weasyprint   # or: uv pip install weasyprint
+
+# Option D — wkhtmltopdf (from AUR)
+yay -S wkhtmltopdf-static
+```
+
+**macOS (Homebrew):**
+```bash
+brew install tesseract poppler ghostscript
+
+# For markdown_to_pdf — pick one engine route:
+
+# Option A — tectonic (recommended)
+brew install pandoc tectonic
+
+# Option B — full TeX (mactex-no-gui includes the latex-extra equivalent)
+brew install pandoc
+brew install --cask mactex-no-gui
+
+# Option C — weasyprint
+brew install pandoc weasyprint
+```
+
+## Optional Extras
+
+The base install stays lean. Heavy or niche dependencies are gated behind extras:
+
+| Extra | Adds | When to install |
+|-------|------|----------------|
+| `mcp-pdf[forms]` | `reportlab` | Form creation tools (`create_form_pdf`, permit forms) |
+| `mcp-pdf[tables]` | `camelot-py`, `tabula-py` | Higher-accuracy table extraction (also needs Java + Ghostscript) |
+| `mcp-pdf[markdown]` | `pypandoc` | `markdown_to_pdf` tool (also needs pandoc binary) |
+| `mcp-pdf[all]` | All of the above | You want every optional tool |
+
 ---

 ## Configuration