docs: flag texlive-latex-extra requirement, recommend tectonic

texlive-xetex alone is rarely enough — pandoc's default template needs
packages from texlive-latex-extra (Debian) / texlive-latexextra (Arch):
lastpage, xcolor, framed, fancyhdr, etc. Real markdown docs fail with
"File 'X.sty' not found" without them.
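
One way to spot this failure mode before running pandoc is to ask TeX Live's `kpsewhich` whether each style file resolves. The sketch below is illustrative only: `missing_sty` is a hypothetical helper, not something this commit adds, and it assumes `kpsewhich` is on PATH (if it is not, every package is reported as missing).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each LaTeX package whose .sty file cannot be found.
# kpsewhich ships with TeX Live and exits non-zero when a file does not resolve.
missing_sty() {
  local pkg
  for pkg in "$@"; do
    if ! kpsewhich "${pkg}.sty" >/dev/null 2>&1; then
      printf '%s\n' "$pkg"
    fi
  done
}

# Check the packages named in this commit message.
missing_sty lastpage xcolor framed fancyhdr
```

Empty output suggests the packages listed above are installed; it does not prove that every package the template might load is present.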

Restructure system deps to present three engine routes per platform:
- tectonic (recommended): ~30 MB static binary, downloads packages on demand
- full TeX: texlive-xetex + texlive-latex-extra + texlive-fonts-extra
- weasyprint: skip TeX entirely, pip-installable
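
The three routes correspond to binaries that can be probed at runtime. A minimal sketch, assuming a preference order of tectonic, xelatex, weasyprint, wkhtmltopdf; `pick_pdf_engine` is a hypothetical helper and the order is an assumption, not behavior taken from this repo. The result can be passed to pandoc's `--pdf-engine` option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first PDF engine found on PATH.
# The preference order is an assumption made for this sketch.
pick_pdf_engine() {
  local engine
  for engine in tectonic xelatex weasyprint wkhtmltopdf; do
    if command -v "$engine" >/dev/null 2>&1; then
      printf '%s\n' "$engine"
      return 0
    fi
  done
  return 1  # no supported engine installed
}

# Example usage (input and output file names are placeholders):
# engine="$(pick_pdf_engine)" && pandoc README.md -o README.pdf --pdf-engine="$engine"
```

Finding `xelatex` on PATH only shows that the engine exists; it says nothing about whether the extra LaTeX packages are installed.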

Add an engine comparison table in the README explaining the disk-size
and quality trade-offs so users can make an informed choice.
Ryan Malloy 2026-05-05 16:29:05 -06:00
parent 964fd14a26
commit 31b8b2e6d4
2 changed files with 68 additions and 23 deletions

@@ -65,6 +65,8 @@ docker compose run --rm mcp-pdf python examples/verify_installation.py
 | `pandoc` | `markdown_to_pdf` |
 | `xelatex` / `pdflatex` / `tectonic` / `weasyprint` / `wkhtmltopdf` | `markdown_to_pdf` (need at least one) |
 
+> **Note on the LaTeX engine:** `texlive-xetex` alone is often not enough for real markdown docs — pandoc's default template needs LaTeX packages (`lastpage`, `xcolor`, `framed`, `fancyhdr`, etc.) that live in `texlive-latex-extra` (Debian) / `texlive-latexextra` (Arch). If you don't already use TeX, **`tectonic` is a much better choice** — it's a ~30 MB static binary that downloads packages on demand. See the README's "Picking a PDF engine" table for details.
+
 ### Ubuntu/Debian
 ```bash
 sudo apt-get update
@@ -73,8 +75,10 @@ sudo apt-get install -y \
 poppler-utils ghostscript \
 python3-tk default-jre-headless
 
-# For markdown_to_pdf
-sudo apt-get install -y pandoc texlive-xetex
+# For markdown_to_pdf, pick one of:
+sudo apt-get install -y pandoc # then install tectonic separately
+sudo apt-get install -y pandoc texlive-xetex texlive-latex-extra texlive-fonts-extra # full TeX
+sudo apt-get install -y pandoc && pip install weasyprint # skip TeX
 ```
 
 ### Arch Linux
@@ -84,21 +88,20 @@ sudo pacman -S \
 poppler ghostscript \
 jre-openjdk-headless tk
 
-# For markdown_to_pdf
-sudo pacman -S pandoc texlive-xetex
-# Lighter alternative engines: tectonic (official repo),
-# wkhtmltopdf (AUR), or `pip install weasyprint` (works in any venv)
+# For markdown_to_pdf, pick one of:
+sudo pacman -S pandoc tectonic # recommended
+sudo pacman -S pandoc texlive-xetex texlive-latexextra texlive-fontsextra # full TeX
+sudo pacman -S pandoc && pip install weasyprint # skip TeX
 ```
 
 ### macOS (Homebrew)
 ```bash
 brew install tesseract poppler ghostscript
 
-# For markdown_to_pdf
-brew install pandoc
-brew install --cask mactex-no-gui # full TeX with xelatex/pdflatex
-# Or lighter:
-brew install weasyprint
+# For markdown_to_pdf, pick one of:
+brew install pandoc tectonic # recommended
+brew install pandoc && brew install --cask mactex-no-gui # full TeX
+brew install pandoc weasyprint # skip TeX
 ```
 
 ### Windows

@@ -61,8 +61,10 @@ uv sync
 # System dependencies (Ubuntu/Debian)
 sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils ghostscript
 
-# For markdown_to_pdf:
-sudo apt-get install pandoc texlive-xetex # or: weasyprint, wkhtmltopdf
+# For markdown_to_pdf — pick one PDF-engine route:
+sudo apt-get install pandoc tectonic # recommended (small)
+# or: sudo apt-get install pandoc texlive-xetex texlive-latex-extra # full TeX
+# or: sudo apt-get install pandoc && pip install weasyprint # skip TeX
 
 # Verify
 uv run python examples/verify_installation.py
@@ -210,34 +212,74 @@ Some features require system packages:
 | Camelot tables | `ghostscript` |
 | Tabula tables | `default-jre-headless` |
 | PDF to images | `poppler-utils` |
-| `markdown_to_pdf` | `pandoc` + one of: `texlive-xetex`, `texlive-latex-base`, `tectonic`, `weasyprint`, `wkhtmltopdf` |
+| `markdown_to_pdf` | `pandoc` + one of: `tectonic`, `texlive-xetex` (+ `texlive-latex-extra`), `weasyprint`, `wkhtmltopdf` |
+
+### Picking a PDF engine for `markdown_to_pdf`
+
+Pandoc takes markdown → HTML or LaTeX → PDF. The LaTeX path produces the most polished output but needs a TeX install. Trade-offs:
+
+| Engine | Disk size | Notes |
+|--------|----------|-------|
+| **`tectonic`** | ~30 MB | **Recommended for new installs.** Single static binary. Downloads LaTeX packages on demand — no upfront mass-install. |
+| `xelatex` + `texlive-latex-extra` | ~500 MB | Best output once installed. Use if you already run TeX. The `-extra` package matters: pandoc's default template needs `lastpage`, `xcolor`, `framed`, `fancyhdr`, etc. — all of which live there, **not** in `texlive-xetex`. |
+| `xelatex` alone (just `texlive-xetex`) | ~200 MB | **Often breaks.** Expect `! LaTeX Error: File 'X.sty' not found` on real docs. |
+| `weasyprint` | ~40 MB | Pure-Python (`pip install weasyprint`) + cairo/pango system libs. HTML/CSS path — no LaTeX. Good for simple docs; weaker on math, footnotes, citations. |
+| `wkhtmltopdf` | ~40 MB | Older HTML-to-PDF tool. Adequate but less actively maintained. |
 
 **Ubuntu/Debian:**
 ```bash
 sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils ghostscript default-jre-headless
 
-# For markdown_to_pdf — pandoc plus at least one PDF engine
-sudo apt-get install pandoc texlive-xetex
+# For markdown_to_pdf — pick one engine route:
+
+# Option A — tectonic (smallest, downloads packages on demand)
+sudo apt-get install pandoc
+# tectonic isn't in apt — install via cargo or download static binary:
+# https://tectonic-typesetting.github.io/en-US/install.html
+
+# Option B — full TeX (best quality, large download)
+sudo apt-get install pandoc texlive-xetex texlive-latex-extra texlive-fonts-extra
+
+# Option C — weasyprint (skip TeX entirely)
+sudo apt-get install pandoc
+pip install weasyprint
 ```
 
 **Arch Linux:**
 ```bash
 sudo pacman -S tesseract tesseract-data-eng poppler ghostscript jre-openjdk-headless
 
-# For markdown_to_pdf — pandoc plus at least one PDF engine
-sudo pacman -S pandoc texlive-xetex
-# Lighter alternatives (pick one): tectonic, wkhtmltopdf (AUR), or pip install weasyprint
+# For markdown_to_pdf — pick one engine route:
+
+# Option A — tectonic (recommended for new installs, in official repo)
+sudo pacman -S pandoc tectonic
+
+# Option B — full TeX (best output, ~500 MB)
+sudo pacman -S pandoc texlive-xetex texlive-latexextra texlive-fontsextra
+
+# Option C — weasyprint (skip TeX)
+sudo pacman -S pandoc
+pip install weasyprint # or: uv pip install weasyprint
+
+# Option D — wkhtmltopdf (from AUR)
+yay -S wkhtmltopdf-static
 ```
 
 **macOS (Homebrew):**
 ```bash
 brew install tesseract poppler ghostscript
 
-# For markdown_to_pdf
+# For markdown_to_pdf — pick one engine route:
+
+# Option A — tectonic (recommended)
+brew install pandoc tectonic
+
+# Option B — full TeX (mactex-no-gui includes the latex-extra equivalent)
 brew install pandoc
-brew install --cask mactex-no-gui # for xelatex/pdflatex
-# Or a lighter engine:
-brew install weasyprint
+brew install --cask mactex-no-gui
+
+# Option C — weasyprint
+brew install pandoc weasyprint
 ```
 
 ## Optional Extras