pdf_to_markdown generated pdf-image:// URIs that never resolved — the
resource handler only existed in the legacy server. Add output_directory
parameter: when set, images extract to disk with relative ./images/ paths.
Without it, existing pdf-image:// behavior preserved for backward compat.
Also adds min_width/min_height filtering (matching extract_images),
save_markdown option, and fixes missing extract_vector_graphics in
list_capabilities.
Coordinate-based PDF form filling for scanned/flat PDFs.
reportlab is now optional - only loaded when permit tools used.
Install with: pip install mcp-pdf[forms]
PROBLEM:
Table extraction from large PDFs was exceeding MCP's 25,000 token limit,
causing "response too large" errors. A 5-page PDF with large tables
generated 59,005 tokens, more than double the allowed limit.
SOLUTION:
Added flexible table data limiting with two new parameters:
- max_rows_per_table: Limit rows returned per table (prevents overflow)
- summary_only: Return only metadata without table data
IMPLEMENTATION:
1. Added new parameters to extract_tables() method signature
2. Created _process_table_data() helper for consistent limiting logic
3. Updated all 3 extraction methods (Camelot, pdfplumber, Tabula)
4. Enhanced table metadata with truncation tracking:
- total_rows: Full row count from PDF
- rows_returned: Actual rows in response (after limiting)
- rows_truncated: Number of rows omitted (if limited)
USAGE EXAMPLES:
# Summary mode - metadata only (smallest response)
extract_tables(pdf_path, pages="1-5", summary_only=True)
# Limited data - first 100 rows per table
extract_tables(pdf_path, pages="1-5", max_rows_per_table=100)
# Full data (default behavior, may overflow on large tables)
extract_tables(pdf_path, pages="1-5")
BENEFITS:
- Prevents MCP token overflow errors
- Maintains backward compatibility (new params are optional)
- Clear guidance through metadata (shows when truncation occurred)
- Flexible - users choose between summary/limited/full modes
FILES MODIFIED:
- src/mcp_pdf/mixins_official/table_extraction.py (all changes)
- src/mcp_pdf/server.py (version bump to 2.0.7)
- pyproject.toml (version bump to 2.0.7)
VERSION: 2.0.7
PUBLISHED: https://pypi.org/project/mcp-pdf/2.0.7/
BREAKING ISSUE FIXED:
- Users reported "Output path not allowed: images" error
- extract_images tool was rejecting relative paths due to overly restrictive security
NEW SECURITY MODEL:
- MCP_PDF_ALLOWED_PATHS environment variable controls allowed output directories
- If unset: Allows any directory with "security theater" warnings
- If set: Restricts outputs to specified colon-separated paths
- Cross-platform compatible (: on Unix, ; on Windows)
SECURITY PHILOSOPHY ENHANCED:
- "TRUST NO ONE" - honest about application-level security limitations
- Clear warnings that this is "security theater"
- Emphasis on OS-level permissions and process isolation
- Educational guidance on real security practices
TECHNICAL CHANGES:
- validate_output_path() rewritten with environment variable control
- Path validation uses relative_to() for proper containment checking
- Enhanced warning messages with security education
- Updated documentation with honest security assessment
DOCUMENTATION UPDATES:
- Added MCP_PDF_ALLOWED_PATHS to configuration section
- New "REAL Security" section with OS-level recommendations
- Clear explanation of security theater vs actual protection
Version: 1.1.1 (patch version for critical bugfix)
New Features:
- extract_links: Extract all PDF hyperlinks with advanced filtering
- Page-specific filtering (e.g., "1,3,5" or "1-5,8,10-12")
- Link type categorization: external URLs, internal pages, emails, documents
- Coordinate tracking for precise link positioning
- FastMCP integration with proper tool registration
- Version banner display following CLAUDE.md guidelines
Technical Improvements:
- Enhanced startup banner with package version display
- Updated documentation to reflect 24 specialized tools
- Proper FastMCP @mcp.tool() decorator usage
- Comprehensive error handling and security validation
Documentation Updates:
- README.md: Updated tool count and installation guides
- CLAUDE.md: Added link extraction to implemented features
- LOCAL_DEVELOPMENT.md: Enhanced with scoped installation commands
Version: 1.1.0 (minor version bump for new feature)
- Fix variable scope bug in extract_text function
- Add local development setup with claude-mcp-manager
- Update author information
- Add comprehensive local development documentation
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
**Package Rebranding:**
- Renamed package from mcp-pdf-tools to mcp-pdf (cleaner name)
- Updated version to 1.0.0 (production ready with security hardening)
- Updated all import paths and references throughout codebase
**PyPI Preparation:**
- Enhanced package description and metadata
- Added proper project URLs and homepage
- Updated CLI command from mcp-pdf-tools to mcp-pdf
- Built distribution packages (wheel + source)
**Testing & Validation:**
- All 20 security tests pass with new package structure
- Local installation and import tests successful
- CLI command working correctly
- Package ready for PyPI publication
The secure, production-ready PDF processing platform is now ready
for public distribution and installation via pip.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>