docs: rewrite LOCAL_DEVELOPMENT.md, delete stale CLAUDE_DESKTOP_SETUP.md
Some checks are pending
Security Scan / security-scan (push) Waiting to run
CLAUDE_DESKTOP_SETUP.md was actively misleading: it listed 8 tools (out of 47), referenced the old `mcp-pdf-tools` package name, and had hardcoded user paths. README.md and QUICKSTART.md cover the same territory correctly now, and nothing in the repo links to it.

LOCAL_DEVELOPMENT.md kept its structure (setup → wiring up → testing → publishing → gotchas) but was updated to reflect current reality:

- `claude mcp add` syntax now uses the required `--` separator
- Three patterns shown (local source, pinned PyPI version, latest PyPI with --refresh) since they each serve different dev workflows
- markdown_to_pdf added to manual verification checklist
- Publishing pipeline now matches what we actually do (clean dist/, PII audit per global rules, twine for upload since uv publish doesn't read ~/.pypirc)
- Common gotchas section: mktexfmt errors, FunctionTool test failures, PyPI JSON caching (all real things hit during this session)
- Removed claim that the server has "23 PDF tools"
This commit is contained in:
parent
31b8b2e6d4
commit
c3dd788120
@@ -1,88 +0,0 @@
|
|||||||
# Claude Desktop MCP Configuration
|
|
||||||
|
|
||||||
This document explains how the MCP PDF Tools server has been configured for Claude Desktop.
|
|
||||||
|
|
||||||
## Configuration Location
|
|
||||||
|
|
||||||
The MCP configuration has been added to:
|
|
||||||
```
|
|
||||||
/home/rpm/.config/Claude/claude_desktop_config.json
|
|
||||||
```
|
|
||||||
|
|
||||||
## PDF Tools Server Configuration
|
|
||||||
|
|
||||||
The following configuration has been added to your Claude Desktop:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"mcpServers": {
|
|
||||||
"pdf-tools": {
|
|
||||||
"command": "uv",
|
|
||||||
"args": [
|
|
||||||
"--directory",
|
|
||||||
"/home/rpm/claude/mcp-pdf-tools",
|
|
||||||
"run",
|
|
||||||
"mcp-pdf-tools"
|
|
||||||
],
|
|
||||||
"env": {
|
|
||||||
"PDF_TEMP_DIR": "/tmp/mcp-pdf-processing"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
## What This Enables
|
|
||||||
|
|
||||||
With this configuration, all your Claude sessions will have access to:
|
|
||||||
|
|
||||||
- **extract_text**: Extract text from PDFs with multiple method support
|
|
||||||
- **extract_tables**: Extract tables from PDFs with intelligent fallbacks
|
|
||||||
- **extract_images**: Extract and filter images from PDFs
|
|
||||||
- **extract_metadata**: Get comprehensive PDF metadata and file information
|
|
||||||
- **get_document_structure**: Analyze PDF structure, outline, and fonts
|
|
||||||
- **is_scanned_pdf**: Detect if PDFs are scanned/image-based
|
|
||||||
- **ocr_pdf**: Perform OCR on scanned PDFs with preprocessing
|
|
||||||
- **pdf_to_markdown**: Convert PDFs to clean markdown format
|
|
||||||
|
|
||||||
## Environment Variables
|
|
||||||
|
|
||||||
- `PDF_TEMP_DIR`: Set to `/tmp/mcp-pdf-processing` for temporary file processing
|
|
||||||
|
|
||||||
## Backup
|
|
||||||
|
|
||||||
A backup of your original configuration has been saved to:
|
|
||||||
```
|
|
||||||
/home/rpm/.config/Claude/claude_desktop_config.json.backup
|
|
||||||
```
|
|
||||||
|
|
||||||
## Testing
|
|
||||||
|
|
||||||
The server has been tested and is working correctly. You can verify it's available in new Claude sessions by checking for the `mcp__pdf-tools__*` functions.
|
|
||||||
|
|
||||||
## Troubleshooting
|
|
||||||
|
|
||||||
If you encounter issues:
|
|
||||||
|
|
||||||
1. **Server not starting**: Check that all dependencies are installed:
|
|
||||||
```bash
|
|
||||||
cd /home/rpm/claude/mcp-pdf-tools
|
|
||||||
uv sync --dev
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **System dependencies missing**: Install required packages:
|
|
||||||
```bash
|
|
||||||
sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils ghostscript python3-tk default-jre-headless
|
|
||||||
```
|
|
||||||
|
|
||||||
3. **Permission issues**: Ensure temp directory exists:
|
|
||||||
```bash
|
|
||||||
mkdir -p /tmp/mcp-pdf-processing
|
|
||||||
chmod 755 /tmp/mcp-pdf-processing
|
|
||||||
```
|
|
||||||
|
|
||||||
4. **Test server manually**:
|
|
||||||
```bash
|
|
||||||
cd /home/rpm/claude/mcp-pdf-tools
|
|
||||||
uv run mcp-pdf-tools --help
|
|
||||||
```
|
|
||||||
@@ -1,201 +1,161 @@
|
|||||||
# 🔧 Local Development Guide for MCP PDF
|
# Local Development Guide
|
||||||
|
|
||||||
This guide shows how to test MCP PDF locally during development before publishing to PyPI.
|
This guide is for working **on** mcp-pdf — testing changes against a local Claude Code session before publishing to PyPI. End-user install instructions live in [README.md](README.md) and [QUICKSTART.md](QUICKSTART.md).
|
||||||
|
|
||||||
## 📋 Prerequisites
|
## Prerequisites
|
||||||
|
|
||||||
- Python 3.10+
|
- Python 3.10+
|
||||||
- uv package manager
|
- [uv](https://docs.astral.sh/uv/) package manager
|
||||||
- Claude Desktop app
|
- Claude Code CLI (or Claude Desktop app)
|
||||||
- Git repository cloned locally
|
- Git
|
||||||
|
|
||||||
## 🚀 Quick Start for Local Testing
|
System binaries needed at runtime are listed in [README.md → System Dependencies](README.md#system-dependencies). For development you can skip the ones you're not actively touching (e.g. skip `pandoc` if you're not changing `markdown_to_pdf`).
|
||||||
|
|
||||||
### 1. Clone and Setup
|
## Setup
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Clone the repository
|
|
||||||
git clone https://github.com/rsp2k/mcp-pdf.git
|
git clone https://github.com/rsp2k/mcp-pdf.git
|
||||||
cd mcp-pdf
|
cd mcp-pdf
|
||||||
|
|
||||||
# Install dependencies
|
# Install dev deps + all optional extras so every tool is exercised
|
||||||
uv sync --dev
|
uv sync --all-extras --dev
|
||||||
|
|
||||||
# Verify installation
|
# Smoke test — does the package even import?
|
||||||
uv run python -c "from mcp_pdf.server import create_server; print('✅ MCP PDF loads successfully')"
|
uv run python -c "from mcp_pdf.server import create_server; print('ok')"
|
||||||
```
|
```
|
||||||
|
|
||||||
### 2. Add MCP Server to Claude Desktop
|
## Wiring up a local checkout to Claude Code
|
||||||
|
|
||||||
#### For Production Use (PyPI Installation)
|
Three useful patterns, in order of how often you'll want each:
|
||||||
|
|
||||||
Install the published version from PyPI:
|
### Pattern 1 — Local source, hot-reload edits
|
||||||
|
|
||||||
|
Best when iterating on tool code. `uv run` resolves to the local checkout, so saved edits are picked up the next time the MCP server restarts (a restart is still required; there is no live reload).
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# For personal use across all projects
|
# From inside the mcp-pdf checkout
|
||||||
claude mcp add -s local pdf-tools uvx mcp-pdf
|
claude mcp add -s project pdf-tools-dev -- uv --directory . run mcp-pdf
|
||||||
|
|
||||||
# For project-specific use (isolated to current directory)
|
# Or from anywhere, with an absolute path
|
||||||
claude mcp add -s project pdf-tools uvx mcp-pdf
|
claude mcp add -s project pdf-tools-dev -- uv --directory /path/to/mcp-pdf run mcp-pdf
|
||||||
```
|
```
|
||||||
|
|
||||||
#### For Local Development (Source Installation)
|
The `--` separator is required so the Claude CLI doesn't try to parse `--directory` as one of its own flags.
|
||||||
|
|
||||||
When developing MCP PDF itself, use the local source:
|
### Pattern 2 — Specific PyPI version (regression testing)
|
||||||
|
|
||||||
|
Useful for confirming that a regression report on a published version actually reproduces.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# For development from local source
|
claude mcp add -s project pdf-tools-pinned -- uvx --from "mcp-pdf==2.2.0" mcp-pdf
|
||||||
claude mcp add -s project pdf-tools-dev uv -- --directory /path/to/mcp-pdf-tools run mcp-pdf
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Or if you're in the mcp-pdf directory:
|
### Pattern 3 — Latest from PyPI (post-publish smoke test)
|
||||||
|
|
||||||
|
After uploading a release (`uv run twine upload dist/*`, see the publishing pipeline), verify the new version actually works in Claude before announcing it:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Development server from current directory
|
claude mcp add -s project pdf-tools-released -- uvx --refresh --from "mcp-pdf[markdown]" mcp-pdf
|
||||||
claude mcp add -s project pdf-tools-dev uv -- --directory . run mcp-pdf
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### 3. Alternative: Manual Server Testing
|
`--refresh` forces uvx to refresh its cached package data. Without it, uvx may keep using the previously cached release.
|
||||||
|
|
||||||
You can also run the server manually for debugging:
|
## Running the server outside Claude (for debugging)
|
||||||
|
|
||||||
|
Sometimes the MCP transport hides the real failure. Run the server directly:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Run the MCP server directly
|
|
||||||
uv run mcp-pdf
|
uv run mcp-pdf
|
||||||
|
# Or, equivalently:
|
||||||
# Or run with specific FastMCP options
|
|
||||||
uv run python -m mcp_pdf.server
|
uv run python -m mcp_pdf.server
|
||||||
```
|
```
|
||||||
|
|
||||||
### 4. Test Core Functionality
|
Then use a test harness like the FastMCP in-process client, or hit the stdio transport directly with JSON-RPC messages. The server logs go to stderr.
|
||||||
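As a rough illustration of the "hit the stdio transport directly" option, the sketch below builds the first message an MCP client sends. It assumes the server speaks newline-delimited JSON-RPC 2.0 on stdin/stdout, as the MCP stdio transport does. The protocol version string and the client name are placeholder assumptions, and the subprocess call is left as a comment because it needs a synced checkout.

```python
import json


def build_initialize_request(request_id: int = 1) -> dict:
    """Build an MCP `initialize` request (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            # Assumption: adjust to the protocol revision your server supports.
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "manual-debug-client", "version": "0.0.0"},
        },
    }


def frame(message: dict) -> bytes:
    """Serialize one message as a single newline-terminated JSON line."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


if __name__ == "__main__":
    # To try it against a real checkout (not executed here):
    #   import subprocess
    #   proc = subprocess.Popen(["uv", "run", "mcp-pdf"],
    #                           stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    #   proc.stdin.write(frame(build_initialize_request())); proc.stdin.flush()
    #   print(proc.stdout.readline().decode())
    print(frame(build_initialize_request()).decode(), end="")
```

Because server logs go to stderr, anything read from stdout in such a session should be protocol traffic only.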
|
|
||||||
Once connected to Claude Code, test these key features:
|
## Tests
|
||||||
|
|
||||||
#### Basic PDF Processing
|
|
||||||
```
|
|
||||||
"Extract text from this PDF file: /path/to/test.pdf"
|
|
||||||
"Get metadata from this PDF: /path/to/document.pdf"
|
|
||||||
"Check if this PDF is scanned: /path/to/scan.pdf"
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Security Features
|
|
||||||
```
|
|
||||||
"Try to extract text from a very large PDF"
|
|
||||||
"Process a PDF with 2000 pages" (should be limited to 1000)
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Advanced Features
|
|
||||||
```
|
|
||||||
"Extract tables from this PDF: /path/to/tables.pdf"
|
|
||||||
"Convert this PDF to markdown: /path/to/document.pdf"
|
|
||||||
"Add annotations to this PDF: /path/to/target.pdf"
|
|
||||||
```
|
|
||||||
|
|
||||||
## 🔒 Security Testing
|
|
||||||
|
|
||||||
Verify the security hardening works:
|
|
||||||
|
|
||||||
### File Size Limits
|
|
||||||
- Try processing a PDF larger than 100MB
|
|
||||||
- Should see: "PDF file too large: X bytes > 104857600"
|
|
||||||
|
|
||||||
### Page Count Limits
|
|
||||||
- Try processing a PDF with >1000 pages
|
|
||||||
- Should see: "PDF too large for processing: X pages > 1000"
|
|
||||||
|
|
||||||
### Path Traversal Protection
|
|
||||||
- Test with malicious paths like `../../../etc/passwd`
|
|
||||||
- Should be blocked with security error
|
|
||||||
|
|
||||||
### JSON Input Validation
|
|
||||||
- Large JSON inputs (>10KB) should be rejected
|
|
||||||
- Malformed JSON should return clean error messages
|
|
||||||
|
|
||||||
## 🐛 Debugging
|
|
||||||
|
|
||||||
### Enable Debug Logging
|
|
||||||
```bash
|
```bash
|
||||||
export DEBUG=true
|
# Full suite
|
||||||
uv run mcp-pdf
|
uv run pytest
|
||||||
```
|
|
||||||
|
|
||||||
### Check Security Functions
|
# A single area
|
||||||
```bash
|
uv run pytest tests/test_server.py
|
||||||
# Test security validation functions
|
|
||||||
|
# With coverage
|
||||||
|
uv run pytest --cov=mcp_pdf
|
||||||
|
|
||||||
|
# Security-focused tests live at the repo root (not under tests/)
|
||||||
uv run python test_security_features.py
|
uv run python test_security_features.py
|
||||||
|
|
||||||
# Run integration tests
|
|
||||||
uv run python test_integration.py
|
uv run python test_integration.py
|
||||||
```
|
```
|
||||||
|
|
||||||
### Verify Package Structure
|
## Manual verification before a release
|
||||||
|
|
||||||
|
A short list to walk through against a real PDF before publishing. Update version in `pyproject.toml` first.
|
||||||
|
|
||||||
|
- [ ] `extract_text` on a small PDF returns text
|
||||||
|
- [ ] `extract_text` on a 1000+ page PDF chunks correctly (no MCP "response too large" errors)
|
||||||
|
- [ ] `extract_tables` finds tables in a financial-report-style PDF
|
||||||
|
- [ ] `pdf_to_markdown` writes to disk by default and the inline escape hatch (`inline=True`) returns full markdown
|
||||||
|
- [ ] `markdown_to_pdf` converts a real `.md` → PDF via the auto-detected engine (run with `pdf_engine=None` and check the `engine_used` field in the response)
|
||||||
|
- [ ] `extract_form_data` + `fill_form_pdf` round-trip on an interactive form
|
||||||
|
- [ ] `add_sticky_notes`, `add_highlights`, `add_stamps` all produce viewable annotations in Acrobat
|
||||||
|
- [ ] Path traversal blocked: passing `../../etc/passwd` returns a sanitized error
|
||||||
|
- [ ] Page count limit blocked: a >1000-page PDF returns a sanitized error
|
||||||
|
- [ ] Error messages don't leak `/home/$USER/` paths (the `sanitize_error_message` helper handles this)
|
||||||
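The last checklist item can be spot-checked with a small script. The sketch below is not the repo's `sanitize_error_message`. `redact_home_paths` and `leaks_home_path` are hypothetical stand-ins that show the kind of redaction the check looks for, assuming Linux-style `/home/<user>/` paths.

```python
import re

# Matches /home/<user>; whatever follows the user segment is left untouched.
_HOME_RE = re.compile(r"/home/[^/\s]+")
_PLACEHOLDER = "/home/<redacted>"


def redact_home_paths(message: str) -> str:
    """Replace any /home/<user> prefix with a neutral placeholder."""
    return _HOME_RE.sub(_PLACEHOLDER, message)


def leaks_home_path(message: str) -> bool:
    """True if the message still names a concrete home directory."""
    return any(m.group(0) != _PLACEHOLDER for m in _HOME_RE.finditer(message))
```

Running `leaks_home_path` over the error text returned by each tool is one way to tick this box without reading every message by hand.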
|
|
||||||
|
## Publishing pipeline
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Check package builds correctly
|
# 1. Bump version in pyproject.toml (date-based or semver — see global rules)
|
||||||
|
|
||||||
|
# 2. Sync uv.lock — easy to forget; the lockfile and pyproject.toml versions
|
||||||
|
# must agree or the published wheel and sdist disagree
|
||||||
|
uv lock
|
||||||
|
|
||||||
|
# 3. Clean dist/: the upload in step 7 sends everything in dist/, so stale files
|
||||||
|
# from previous releases cause duplicate-upload errors
|
||||||
|
rm -rf dist/
|
||||||
|
|
||||||
|
# 4. Build
|
||||||
uv build
|
uv build
|
||||||
|
|
||||||
# Verify package metadata
|
# 5. Pre-publish PII audit — see ~/.claude/rules/python.md for the
|
||||||
|
# full grep recipe. The unpacked-sdist check is non-negotiable.
|
||||||
|
mkdir -p /tmp/sdist-audit && tar -xzf dist/*.tar.gz -C /tmp/sdist-audit
|
||||||
|
grep -rnEi 'real-domain|10\.[0-9]+\.|192\.168\.|/home/' /tmp/sdist-audit/
|
||||||
|
|
||||||
|
# 6. Validate metadata
|
||||||
uv run twine check dist/*
|
uv run twine check dist/*
|
||||||
|
|
||||||
|
# 7. Upload (uv publish doesn't read ~/.pypirc — twine does)
|
||||||
|
uv run twine upload dist/*
|
||||||
|
|
||||||
|
# 8. Verify the version is live
|
||||||
|
curl -s -o /dev/null -w "HTTP %{http_code}\n" \
|
||||||
|
https://pypi.org/project/mcp-pdf/$(grep ^version pyproject.toml | cut -d'"' -f2)/
|
||||||
```
|
```
|
||||||
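The grep in step 5 can also be expressed in Python, which makes it easier to reuse in a pre-publish script. This is a sketch that mirrors the same pattern, not the recipe from the global rules. `real-domain` is the original's placeholder and should be swapped for whatever must not ship, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Same alternatives as the grep in step 5, case-insensitive like `grep -i`.
AUDIT_RE = re.compile(r"real-domain|10\.[0-9]+\.|192\.168\.|/home/", re.IGNORECASE)


def audit_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every line matching the audit pattern."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if AUDIT_RE.search(line)
    ]


def audit_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every regular file under root; undecodable bytes are replaced."""
    hits: dict[str, list[tuple[int, str]]] = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            found = audit_text(path.read_text(encoding="utf-8", errors="replace"))
            if found:
                hits[str(path)] = found
    return hits
```

Calling `audit_tree("/tmp/sdist-audit")` after unpacking the sdist returns an empty dict when nothing matches. Like the grep, the pattern is deliberately broad, so expect some false positives.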
|
|
||||||
## 📊 Testing Checklist
|
Once published, push tags + commits to both remotes:
|
||||||
|
|
||||||
Before publishing, verify:
|
|
||||||
|
|
||||||
- [ ] All 23 PDF tools work correctly
|
|
||||||
- [ ] Security limits are enforced (file size, page count)
|
|
||||||
- [ ] Error messages are clean and helpful
|
|
||||||
- [ ] No sensitive information leaked in errors
|
|
||||||
- [ ] Path traversal protection works
|
|
||||||
- [ ] JSON input validation works
|
|
||||||
- [ ] Memory limits prevent crashes
|
|
||||||
- [ ] CLI command `mcp-pdf` works
|
|
||||||
- [ ] Package imports correctly: `from mcp_pdf.server import create_server`
|
|
||||||
|
|
||||||
## 🚀 Publishing Pipeline
|
|
||||||
|
|
||||||
Once local testing passes:
|
|
||||||
|
|
||||||
1. **Version Bump**: Update version in `pyproject.toml`
|
|
||||||
2. **Build**: `uv build`
|
|
||||||
3. **Test Upload**: `uv run twine upload --repository testpypi dist/*`
|
|
||||||
4. **Test Install**: `pip install -i https://test.pypi.org/simple/ mcp-pdf`
|
|
||||||
5. **Production Upload**: `uv run twine upload dist/*`
|
|
||||||
|
|
||||||
## 🔧 Development Commands
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Format code
|
git push new-origin main
|
||||||
uv run black src/ tests/
|
git push origin main # gitea mirror
|
||||||
|
|
||||||
# Lint code
|
|
||||||
uv run ruff check src/ tests/
|
|
||||||
|
|
||||||
# Run tests
|
|
||||||
uv run pytest
|
|
||||||
|
|
||||||
# Security scan
|
|
||||||
uv run pip-audit
|
|
||||||
|
|
||||||
# Build package
|
|
||||||
uv build
|
|
||||||
|
|
||||||
# Install editable for development
|
|
||||||
pip install -e . # (in a venv)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## 🆘 Troubleshooting
|
## Common gotchas
|
||||||
|
|
||||||
### "Module not found" errors
|
### "Module not found" after a `git pull`
|
||||||
- Ensure you're in the right directory
|
Run `uv sync --all-extras` again — new optional dependencies don't auto-install.
|
||||||
- Run `uv sync` to install dependencies
|
|
||||||
- Check Python path with `uv run python -c "import sys; print(sys.path)"`
|
|
||||||
|
|
||||||
### MCP server won't start
|
### MCP server connects but tools don't appear
|
||||||
- Check that all system dependencies are installed (tesseract, java, ghostscript)
|
The Claude Code session caches the tool list at MCP-server-connect time. After adding a new tool, run `/mcp` and reconnect.
|
||||||
- Verify with: `uv run python examples/verify_installation.py`
|
|
||||||
|
|
||||||
### Security tests fail
|
### `markdown_to_pdf` errors with `mktexfmt: Did not find entry for byfmt=xelatex`
|
||||||
- Run `uv run python test_security_features.py -v` for detailed output
|
The host's TeX install is missing format files. Either run `sudo fmtutil-sys --all` to regenerate them, or sidestep by passing `pdf_engine="weasyprint"` (after `pip install weasyprint`) or `pdf_engine="tectonic"`.
|
||||||
- Check that security constants are properly set
|
|
||||||
|
|
||||||
This setup allows for rapid development and testing without polluting your system Python or needing to publish to PyPI for every change.
|
### Tests fail with `'FunctionTool' object is not callable`
|
||||||
|
Tests in `tests/test_server.py` call the `@mcp_tool`-decorated methods directly, but `@mcp_tool` wraps each method in a `FunctionTool` object, which is not callable. Call the methods via the mixin instance instead; see `tests/test_mixin_architecture.py` for the working pattern.
|
||||||
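The failure mode is easier to see in isolation. The sketch below does not use FastMCP or the repo's decorator. `FunctionTool`, `mcp_tool`, and the `fn` attribute here are simplified stand-ins, written only to reproduce the same `TypeError` and to show why reaching the underlying function avoids it.

```python
class FunctionTool:
    """Stand-in wrapper: stores the function but defines no __call__."""

    def __init__(self, fn):
        self.fn = fn
        self.name = fn.__name__


def mcp_tool(fn):
    """Hypothetical decorator that replaces fn with a wrapper object."""
    return FunctionTool(fn)


@mcp_tool
def page_count(pages: list) -> int:
    return len(pages)


def call_directly():
    # Fails: the name now refers to the wrapper object, not the function.
    return page_count(["p1", "p2"])


def call_underlying():
    # Works: reach through the wrapper to the original function.
    return page_count.fn(["p1", "p2"])
```

In the real test suite the equivalent of `call_underlying` is whatever `tests/test_mixin_architecture.py` does with the mixin instance; the stand-in only explains why the direct call breaks.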
|
|
||||||
|
### Upload succeeds but PyPI shows old version
|
||||||
|
PyPI's package-level JSON cache (`/pypi/<pkg>/json`) lags by a minute or two. The version-specific URL (`/pypi/<pkg>/<ver>/`) updates immediately — use that for verification.
|
||||||
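For scripted verification, the version-specific URL can be derived from `pyproject.toml` in the same way as the `grep | cut` in step 8 of the publishing pipeline. This is a minimal sketch with hypothetical function names. It assumes the version is written as a plain `version = "..."` line and uses a regex instead of a TOML parser so it also runs on Python 3.10.

```python
import re


def read_version(pyproject_text: str) -> str:
    """Return the first start-of-line `version = "..."` value."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no version field found")
    return match.group(1)


def version_url(pyproject_text: str, package: str = "mcp-pdf") -> str:
    """Build the per-version project page URL used for the post-upload check."""
    return f"https://pypi.org/project/{package}/{read_version(pyproject_text)}/"
```

The resulting URL can be passed to the same `curl` check as step 8; a 200 response indicates the release page exists.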
|
|||||||