diff --git a/QUICKSTART.md b/QUICKSTART.md
index 1e2dbdc..db5889f 100644
--- a/QUICKSTART.md
+++ b/QUICKSTART.md
@@ -65,6 +65,8 @@ docker compose run --rm mcp-pdf python examples/verify_installation.py
 | `pandoc` | `markdown_to_pdf` |
 | `xelatex` / `pdflatex` / `tectonic` / `weasyprint` / `wkhtmltopdf` | `markdown_to_pdf` (need at least one) |
 
+> **Note on the LaTeX engine:** `texlive-xetex` alone is often not enough for real-world markdown documents. Pandoc's default template needs LaTeX packages such as `lastpage`, `xcolor`, `framed`, and `fancyhdr`, several of which ship in `texlive-latex-extra` (Debian) / `texlive-latexextra` (Arch) rather than in `texlive-xetex`. If you don't already use TeX, **`tectonic` is a much better choice**: it's a ~30 MB static binary that downloads packages on demand. See the README's "Picking a PDF engine" table for details.
+
 ### Ubuntu/Debian
 ```bash
 sudo apt-get update
@@ -73,8 +75,10 @@ sudo apt-get install -y \
     poppler-utils ghostscript \
     python3-tk default-jre-headless
 
-# For markdown_to_pdf
-sudo apt-get install -y pandoc texlive-xetex
+# For markdown_to_pdf, pick one of:
+sudo apt-get install -y pandoc  # then install tectonic separately
+sudo apt-get install -y pandoc texlive-xetex texlive-latex-extra texlive-fonts-extra  # full TeX
+sudo apt-get install -y pandoc && pip install weasyprint  # skip TeX
 ```
 
 ### Arch Linux
@@ -84,21 +88,20 @@ sudo pacman -S \
     poppler ghostscript \
     jre-openjdk-headless tk
 
-# For markdown_to_pdf
-sudo pacman -S pandoc texlive-xetex
-# Lighter alternative engines: tectonic (official repo),
-# wkhtmltopdf (AUR), or `pip install weasyprint` (works in any venv)
+# For markdown_to_pdf, pick one of:
+sudo pacman -S pandoc tectonic  # recommended
+sudo pacman -S pandoc texlive-xetex texlive-latexextra texlive-fontsextra  # full TeX
+sudo pacman -S pandoc && pip install weasyprint  # skip TeX
 ```
 
 ### macOS (Homebrew)
 ```bash
 brew install tesseract poppler ghostscript
 
-# For markdown_to_pdf
-brew install pandoc
-brew install --cask mactex-no-gui  # full TeX with xelatex/pdflatex
-# Or lighter:
-brew install weasyprint
+# For markdown_to_pdf, pick one of:
+brew install pandoc tectonic  # recommended
+brew install pandoc && brew install --cask mactex-no-gui  # full TeX
+brew install pandoc weasyprint  # skip TeX
 ```
 
 ### Windows
diff --git a/README.md b/README.md
index 5e194e3..87c1368 100644
--- a/README.md
+++ b/README.md
@@ -61,8 +61,10 @@ uv sync
 # System dependencies (Ubuntu/Debian)
 sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils ghostscript
 
-# For markdown_to_pdf:
-sudo apt-get install pandoc texlive-xetex  # or: weasyprint, wkhtmltopdf
+# For markdown_to_pdf — pick one PDF-engine route:
+sudo apt-get install pandoc  # then install tectonic separately (recommended, small)
+# or: sudo apt-get install pandoc texlive-xetex texlive-latex-extra  # full TeX
+# or: sudo apt-get install pandoc && pip install weasyprint  # skip TeX
 
 # Verify
 uv run python examples/verify_installation.py
@@ -210,34 +212,74 @@ Some features require system packages:
 | Camelot tables | `ghostscript` |
 | Tabula tables | `default-jre-headless` |
 | PDF to images | `poppler-utils` |
-| `markdown_to_pdf` | `pandoc` + one of: `texlive-xetex`, `texlive-latex-base`, `tectonic`, `weasyprint`, `wkhtmltopdf` |
+| `markdown_to_pdf` | `pandoc` + one of: `tectonic`, `texlive-xetex` (+ `texlive-latex-extra`), `weasyprint`, `wkhtmltopdf` |
+
+### Picking a PDF engine for `markdown_to_pdf`
+
+Pandoc converts markdown to PDF through an intermediate format: either LaTeX (rendered by a TeX engine) or HTML (rendered by an HTML-to-PDF tool). The LaTeX path produces the most polished output but needs a TeX install. Trade-offs:
+
+| Engine | Disk size | Notes |
+|--------|----------|-------|
+| **`tectonic`** | ~30 MB | **Recommended for new installs.** Single static binary. Downloads LaTeX packages on demand — no upfront mass-install. |
+| `xelatex` + `texlive-latex-extra` | ~500 MB | Best output once installed. Use if you already run TeX. The `-extra` package matters: pandoc's default template needs `lastpage`, `xcolor`, `framed`, `fancyhdr`, and others, several of which are **not** in `texlive-xetex`. |
+| `xelatex` alone (just `texlive-xetex`) | ~200 MB | **Often breaks.** Expect `! LaTeX Error: File 'X.sty' not found` on real docs. |
+| `weasyprint` | ~40 MB | Pure-Python (`pip install weasyprint`) + cairo/pango system libs. HTML/CSS path — no LaTeX. Good for simple docs; weaker on math, footnotes, citations. |
+| `wkhtmltopdf` | ~40 MB | Older HTML-to-PDF tool. Adequate, but the upstream project is no longer maintained. |
 
 **Ubuntu/Debian:**
 ```bash
 sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils ghostscript default-jre-headless
 
-# For markdown_to_pdf — pandoc plus at least one PDF engine
-sudo apt-get install pandoc texlive-xetex
+# For markdown_to_pdf — pick one engine route:
+
+# Option A — tectonic (smallest, downloads packages on demand)
+sudo apt-get install pandoc
+# tectonic isn't in apt — install via cargo or download static binary:
+# https://tectonic-typesetting.github.io/en-US/install.html
+
+# Option B — full TeX (best quality, large download)
+sudo apt-get install pandoc texlive-xetex texlive-latex-extra texlive-fonts-extra
+
+# Option C — weasyprint (skip TeX entirely)
+sudo apt-get install pandoc
+pip install weasyprint  # or: uv pip install weasyprint
 ```
 
 **Arch Linux:**
 ```bash
 sudo pacman -S tesseract tesseract-data-eng poppler ghostscript jre-openjdk-headless
 
-# For markdown_to_pdf — pandoc plus at least one PDF engine
-sudo pacman -S pandoc texlive-xetex
-# Lighter alternatives (pick one): tectonic, wkhtmltopdf (AUR), or pip install weasyprint
+# For markdown_to_pdf — pick one engine route:
+
+# Option A — tectonic (recommended for new installs, in official repo)
+sudo pacman -S pandoc tectonic
+
+# Option B — full TeX (best output, ~500 MB)
+sudo pacman -S pandoc texlive-xetex texlive-latexextra texlive-fontsextra
+
+# Option C — weasyprint (skip TeX)
+sudo pacman -S pandoc
+pip install weasyprint  # or: uv pip install weasyprint
+
+# Option D — wkhtmltopdf (from AUR)
+yay -S wkhtmltopdf-static
 ```
 
 **macOS (Homebrew):**
 ```bash
 brew install tesseract poppler ghostscript
 
-# For markdown_to_pdf
+# For markdown_to_pdf — pick one engine route:
+
+# Option A — tectonic (recommended)
+brew install pandoc tectonic
+
+# Option B — full TeX (mactex-no-gui includes the latex-extra equivalent)
 brew install pandoc
-brew install --cask mactex-no-gui  # for xelatex/pdflatex
-# Or a lighter engine:
-brew install weasyprint
+brew install --cask mactex-no-gui
+
+# Option C — weasyprint
+brew install pandoc weasyprint
 ```
 
 ## Optional Extras
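The engine table in the README change boils down to "use whichever supported engine is installed, preferring tectonic." A minimal bash sketch of that check follows. The function name `pick_pdf_engine` and the preference order (tectonic, then xelatex/pdflatex, then the HTML-based engines) are assumptions for illustration; nothing like it ships in this repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): print the first PDF engine
# found on PATH, in an assumed preference order. Returns 1 and prints
# nothing if none of the listed engines is installed.
pick_pdf_engine() {
  local engine
  for engine in tectonic xelatex pdflatex weasyprint wkhtmltopdf; do
    # `command -v` succeeds only if the name resolves to a runnable command.
    if command -v "$engine" >/dev/null 2>&1; then
      printf '%s\n' "$engine"
      return 0
    fi
  done
  return 1
}
```

Usage: `engine="$(pick_pdf_engine)" && pandoc input.md -o output.pdf --pdf-engine="$engine"`. Each name in the list is a value pandoc's `--pdf-engine` option accepts; the `&&` keeps pandoc from running with an empty engine name when nothing is installed.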