mirror of
https://gitee.com/wanwujie/deer-flow
synced 2026-04-03 06:12:14 +08:00
feat: add skills system for specialized agent workflows (#6)
Implement a skills framework that enables specialized workflows for specific tasks (e.g., PDF processing, web page generation). Skills are discovered from the skills/ directory and automatically mounted in sandboxes with path mapping support. - Add SkillsConfig for configuring skills path and container mount point - Implement dynamic skill loading from SKILL.md files with YAML frontmatter - Add path mapping in LocalSandbox to translate container paths to local paths - Mount skills directory in AIO Docker sandbox containers - Update lead agent prompt to dynamically inject available skills - Add setup documentation and expand config.example.yaml Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
44
README.md
44
README.md
@@ -2,6 +2,50 @@
|
||||
|
||||
> Originated from Open Source, give back to Open Source.
|
||||
|
||||
A LangGraph-based AI agent backend with sandbox execution capabilities.
|
||||
|
||||
## Quick Start
|
||||
|
||||
1. **Configure the application**:
|
||||
```bash
|
||||
# Copy example configuration
|
||||
cp config.example.yaml config.yaml
|
||||
|
||||
# Set your API keys
|
||||
export OPENAI_API_KEY="your-key-here"
|
||||
# or edit config.yaml directly
|
||||
```
|
||||
|
||||
2. **Install dependencies**:
|
||||
```bash
|
||||
cd backend
|
||||
make install
|
||||
```
|
||||
|
||||
3. **Run development server**:
|
||||
```bash
|
||||
make dev
|
||||
```
|
||||
|
||||
## Project Structure
|
||||
|
||||
```
|
||||
deer-flow/
|
||||
├── config.example.yaml # Configuration template (copy to config.yaml)
|
||||
├── backend/ # Backend application
|
||||
│ ├── src/ # Source code
|
||||
│ └── docs/ # Documentation
|
||||
├── frontend/ # Frontend application
|
||||
└── skills/ # Agent skills
|
||||
├── public/ # Public skills
|
||||
└── custom/ # Custom skills
|
||||
```
|
||||
|
||||
## Documentation
|
||||
|
||||
- [Configuration Guide](backend/docs/CONFIGURATION.md) - Setup and configuration instructions
|
||||
- [Architecture Overview](backend/CLAUDE.md) - Technical architecture details
|
||||
|
||||
## License
|
||||
|
||||
This project is open source and available under the [MIT License](./LICENSE).
|
||||
|
||||
@@ -26,11 +26,20 @@ make format
|
||||
|
||||
### Configuration System
|
||||
|
||||
The app uses a YAML-based configuration system loaded from `config.yaml`. Configuration priority:
|
||||
The app uses a YAML-based configuration system loaded from `config.yaml`.
|
||||
|
||||
**Setup**: Copy `config.example.yaml` to `config.yaml` in the **project root** directory and customize for your environment.
|
||||
|
||||
```bash
|
||||
# From project root (deer-flow/)
|
||||
cp config.example.yaml config.yaml
|
||||
```
|
||||
|
||||
Configuration priority:
|
||||
1. Explicit `config_path` argument
|
||||
2. `DEER_FLOW_CONFIG_PATH` environment variable
|
||||
3. `config.yaml` in current directory
|
||||
4. `config.yaml` in parent directory
|
||||
3. `config.yaml` in current directory (backend/)
|
||||
4. `config.yaml` in parent directory (project root - **recommended location**)
|
||||
|
||||
Config values starting with `$` are resolved as environment variables (e.g., `$OPENAI_API_KEY`).
|
||||
|
||||
@@ -61,12 +70,25 @@ Config values starting with `$` are resolved as environment variables (e.g., `$O
|
||||
- `resolve_variable()` imports module and returns variable (e.g., `module:variable`)
|
||||
- `resolve_class()` imports and validates class against base class
|
||||
|
||||
**Skills System** (`src/skills/`)
|
||||
- Skills provide specialized workflows for specific tasks (e.g., PDF processing, frontend design)
|
||||
- Located in `deer-flow/skills/{public,custom}` directory structure
|
||||
- Each skill has a `SKILL.md` file with YAML front matter (name, description, license)
|
||||
- Skills are automatically discovered and loaded at runtime
|
||||
- `load_skills()` scans directories and parses SKILL.md files
|
||||
- Skills are injected into agent's system prompt with paths
|
||||
- Path mapping system allows seamless access in both local and Docker sandbox:
|
||||
- Local sandbox: `/mnt/skills` → `/path/to/deer-flow/skills`
|
||||
- Docker sandbox: Automatically mounted as volume
|
||||
|
||||
### Config Schema
|
||||
|
||||
Models, tools, and sandbox providers are configured in `config.yaml`:
|
||||
Models, tools, sandbox providers, and skills are configured in `config.yaml`:
|
||||
- `models[]`: LLM configurations with `use` class path
|
||||
- `tools[]`: Tool configurations with `use` variable path and `group`
|
||||
- `sandbox.use`: Sandbox provider class path
|
||||
- `skills.path`: Host path to skills directory (optional, default: `../skills`)
|
||||
- `skills.container_path`: Container mount path (default: `/mnt/skills`)
|
||||
|
||||
## Code Style
|
||||
|
||||
|
||||
76
backend/SETUP.md
Normal file
76
backend/SETUP.md
Normal file
@@ -0,0 +1,76 @@
|
||||
# Setup Guide
|
||||
|
||||
Quick setup instructions for DeerFlow.
|
||||
|
||||
## Configuration Setup
|
||||
|
||||
DeerFlow uses a YAML configuration file that should be placed in the **project root directory**.
|
||||
|
||||
### Steps
|
||||
|
||||
1. **Navigate to project root**:
|
||||
```bash
|
||||
cd /path/to/deer-flow
|
||||
```
|
||||
|
||||
2. **Copy example configuration**:
|
||||
```bash
|
||||
cp config.example.yaml config.yaml
|
||||
```
|
||||
|
||||
3. **Edit configuration**:
|
||||
```bash
|
||||
# Option A: Set environment variables (recommended)
|
||||
export OPENAI_API_KEY="your-key-here"
|
||||
|
||||
# Option B: Edit config.yaml directly
|
||||
vim config.yaml # or your preferred editor
|
||||
```
|
||||
|
||||
4. **Verify configuration**:
|
||||
```bash
|
||||
cd backend
|
||||
python -c "from src.config import get_app_config; print('✓ Config loaded:', get_app_config().models[0].name)"
|
||||
```
|
||||
|
||||
## Important Notes
|
||||
|
||||
- **Location**: `config.yaml` should be in `deer-flow/` (project root), not `deer-flow/backend/`
|
||||
- **Git**: `config.yaml` is automatically ignored by git (contains secrets)
|
||||
- **Priority**: If both `backend/config.yaml` and `../config.yaml` exist, backend version takes precedence
|
||||
|
||||
## Configuration File Locations
|
||||
|
||||
The backend searches for `config.yaml` in this order:
|
||||
|
||||
1. `DEER_FLOW_CONFIG_PATH` environment variable (if set)
|
||||
2. `backend/config.yaml` (current directory when running from backend/)
|
||||
3. `deer-flow/config.yaml` (parent directory - **recommended location**)
|
||||
|
||||
**Recommended**: Place `config.yaml` in project root (`deer-flow/config.yaml`).
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Config file not found
|
||||
|
||||
```bash
|
||||
# Check where the backend is looking
|
||||
cd deer-flow/backend
|
||||
python -c "from src.config.app_config import AppConfig; print(AppConfig.resolve_config_path())"
|
||||
```
|
||||
|
||||
If it can't find the config:
|
||||
1. Ensure you've copied `config.example.yaml` to `config.yaml`
|
||||
2. Verify you're in the correct directory
|
||||
3. Check the file exists: `ls -la ../config.yaml`
|
||||
|
||||
### Permission denied
|
||||
|
||||
```bash
|
||||
chmod 600 ../config.yaml # Protect sensitive configuration
|
||||
```
|
||||
|
||||
## See Also
|
||||
|
||||
- [Configuration Guide](docs/CONFIGURATION.md) - Detailed configuration options
|
||||
- [Architecture Overview](CLAUDE.md) - System architecture
|
||||
221
backend/docs/CONFIGURATION.md
Normal file
221
backend/docs/CONFIGURATION.md
Normal file
@@ -0,0 +1,221 @@
|
||||
# Configuration Guide
|
||||
|
||||
This guide explains how to configure DeerFlow for your environment.
|
||||
|
||||
## Quick Start
|
||||
|
||||
1. **Copy the example configuration** (from project root):
|
||||
```bash
|
||||
# From project root directory (deer-flow/)
|
||||
cp config.example.yaml config.yaml
|
||||
```
|
||||
|
||||
2. **Set your API keys**:
|
||||
|
||||
Option A: Use environment variables (recommended):
|
||||
```bash
|
||||
export OPENAI_API_KEY="your-api-key-here"
|
||||
export ANTHROPIC_API_KEY="your-api-key-here"
|
||||
# Add other keys as needed
|
||||
```
|
||||
|
||||
Option B: Edit `config.yaml` directly (not recommended for production):
|
||||
```yaml
|
||||
models:
|
||||
- name: gpt-4
|
||||
api_key: your-actual-api-key-here # Replace placeholder
|
||||
```
|
||||
|
||||
3. **Start the application**:
|
||||
```bash
|
||||
make dev
|
||||
```
|
||||
|
||||
## Configuration Sections
|
||||
|
||||
### Models
|
||||
|
||||
Configure the LLM models available to the agent:
|
||||
|
||||
```yaml
|
||||
models:
|
||||
- name: gpt-4 # Internal identifier
|
||||
display_name: GPT-4 # Human-readable name
|
||||
use: langchain_openai:ChatOpenAI # LangChain class path
|
||||
model: gpt-4 # Model identifier for API
|
||||
api_key: $OPENAI_API_KEY # API key (use env var)
|
||||
max_tokens: 4096 # Max tokens per request
|
||||
temperature: 0.7 # Sampling temperature
|
||||
```
|
||||
|
||||
**Supported Providers**:
|
||||
- OpenAI (`langchain_openai:ChatOpenAI`)
|
||||
- Anthropic (`langchain_anthropic:ChatAnthropic`)
|
||||
- DeepSeek (`langchain_deepseek:ChatDeepSeek`)
|
||||
- Any LangChain-compatible provider
|
||||
|
||||
**Thinking Models**:
|
||||
Some models support "thinking" mode for complex reasoning:
|
||||
|
||||
```yaml
|
||||
models:
|
||||
- name: deepseek-v3
|
||||
supports_thinking: true
|
||||
when_thinking_enabled:
|
||||
extra_body:
|
||||
thinking:
|
||||
type: enabled
|
||||
```
|
||||
|
||||
### Tool Groups
|
||||
|
||||
Organize tools into logical groups:
|
||||
|
||||
```yaml
|
||||
tool_groups:
|
||||
- name: web # Web browsing and search
|
||||
- name: file:read # Read-only file operations
|
||||
- name: file:write # Write file operations
|
||||
- name: bash # Shell command execution
|
||||
```
|
||||
|
||||
### Tools
|
||||
|
||||
Configure specific tools available to the agent:
|
||||
|
||||
```yaml
|
||||
tools:
|
||||
- name: web_search
|
||||
group: web
|
||||
use: src.community.tavily.tools:web_search_tool
|
||||
max_results: 5
|
||||
# api_key: $TAVILY_API_KEY # Optional
|
||||
```
|
||||
|
||||
**Built-in Tools**:
|
||||
- `web_search` - Search the web (Tavily)
|
||||
- `web_fetch` - Fetch web pages (Jina AI)
|
||||
- `ls` - List directory contents
|
||||
- `read_file` - Read file contents
|
||||
- `write_file` - Write file contents
|
||||
- `str_replace` - String replacement in files
|
||||
- `bash` - Execute bash commands
|
||||
|
||||
### Sandbox
|
||||
|
||||
Choose between local execution or Docker-based isolation:
|
||||
|
||||
**Option 1: Local Sandbox** (default, simpler setup):
|
||||
```yaml
|
||||
sandbox:
|
||||
use: src.sandbox.local:LocalSandboxProvider
|
||||
```
|
||||
|
||||
**Option 2: Docker Sandbox** (isolated, more secure):
|
||||
```yaml
|
||||
sandbox:
|
||||
use: src.community.aio_sandbox:AioSandboxProvider
|
||||
port: 8080
|
||||
auto_start: true
|
||||
container_prefix: deer-flow-sandbox
|
||||
|
||||
# Optional: Additional mounts
|
||||
mounts:
|
||||
- host_path: /path/on/host
|
||||
container_path: /path/in/container
|
||||
read_only: false
|
||||
```
|
||||
|
||||
### Skills
|
||||
|
||||
Configure the skills directory for specialized workflows:
|
||||
|
||||
```yaml
|
||||
skills:
|
||||
# Host path (optional, default: ../skills)
|
||||
path: /custom/path/to/skills
|
||||
|
||||
# Container mount path (default: /mnt/skills)
|
||||
container_path: /mnt/skills
|
||||
```
|
||||
|
||||
**How Skills Work**:
|
||||
- Skills are stored in `deer-flow/skills/{public,custom}/`
|
||||
- Each skill has a `SKILL.md` file with metadata
|
||||
- Skills are automatically discovered and loaded
|
||||
- Available in both local and Docker sandbox via path mapping
|
||||
|
||||
### Title Generation
|
||||
|
||||
Automatic conversation title generation:
|
||||
|
||||
```yaml
|
||||
title:
|
||||
enabled: true
|
||||
max_words: 6
|
||||
max_chars: 60
|
||||
model_name: null # Use first model in list
|
||||
```
|
||||
|
||||
## Environment Variables
|
||||
|
||||
DeerFlow supports environment variable substitution using the `$` prefix:
|
||||
|
||||
```yaml
|
||||
models:
|
||||
- api_key: $OPENAI_API_KEY # Reads from environment
|
||||
```
|
||||
|
||||
**Common Environment Variables**:
|
||||
- `OPENAI_API_KEY` - OpenAI API key
|
||||
- `ANTHROPIC_API_KEY` - Anthropic API key
|
||||
- `DEEPSEEK_API_KEY` - DeepSeek API key
|
||||
- `TAVILY_API_KEY` - Tavily search API key
|
||||
- `DEER_FLOW_CONFIG_PATH` - Custom config file path
|
||||
|
||||
## Configuration Location
|
||||
|
||||
The configuration file should be placed in the **project root directory** (`deer-flow/config.yaml`), not in the backend directory.
|
||||
|
||||
## Configuration Priority
|
||||
|
||||
DeerFlow searches for configuration in this order:
|
||||
|
||||
1. Path specified in code via `config_path` argument
|
||||
2. Path from `DEER_FLOW_CONFIG_PATH` environment variable
|
||||
3. `config.yaml` in current working directory (typically `backend/` when running)
|
||||
4. `config.yaml` in parent directory (project root: `deer-flow/`)
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Place `config.yaml` in project root** - Not in `backend/` directory
|
||||
2. **Never commit `config.yaml`** - It's already in `.gitignore`
|
||||
3. **Use environment variables for secrets** - Don't hardcode API keys
|
||||
4. **Keep `config.example.yaml` updated** - Document all new options
|
||||
5. **Test configuration changes locally** - Before deploying
|
||||
6. **Use Docker sandbox for production** - Better isolation and security
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Config file not found"
|
||||
- Ensure `config.yaml` exists in the **project root** directory (`deer-flow/config.yaml`)
|
||||
- The backend searches parent directory by default, so root location is preferred
|
||||
- Alternatively, set `DEER_FLOW_CONFIG_PATH` environment variable to custom location
|
||||
|
||||
### "Invalid API key"
|
||||
- Verify environment variables are set correctly
|
||||
- Check that `$` prefix is used for env var references
|
||||
|
||||
### "Skills not loading"
|
||||
- Check that `deer-flow/skills/` directory exists
|
||||
- Verify skills have valid `SKILL.md` files
|
||||
- Check `skills.path` configuration if using custom path
|
||||
|
||||
### "Docker sandbox fails to start"
|
||||
- Ensure Docker is running
|
||||
- Check port 8080 (or configured port) is available
|
||||
- Verify Docker image is accessible
|
||||
|
||||
## Examples
|
||||
|
||||
See `config.example.yaml` for complete examples of all configuration options.
|
||||
@@ -1,6 +1,8 @@
|
||||
from datetime import datetime
|
||||
|
||||
SYSTEM_PROMPT = f"""
|
||||
from src.skills import load_skills
|
||||
|
||||
SYSTEM_PROMPT_TEMPLATE = """
|
||||
<role>
|
||||
You are DeerFlow 2.0, an open-source super agent.
|
||||
</role>
|
||||
@@ -14,19 +16,16 @@ You are DeerFlow 2.0, an open-source super agent.
|
||||
You have access to skills that provide optimized workflows for specific tasks. Each skill contains best practices, frameworks, and references to additional resources.
|
||||
|
||||
**Progressive Loading Pattern:**
|
||||
1. When a user query matches a skill's use case, immediately call `view` on the skill's main file located at `/mnt/skills/{"{skill_name}"}/SKILL.md`
|
||||
1. When a user query matches a skill's use case, immediately call `view` on the skill's main file using the path attribute provided in the skill tag below
|
||||
2. Read and understand the skill's workflow and instructions
|
||||
3. The skill file contains references to external resources under the same folder
|
||||
4. Load referenced resources only when needed during execution
|
||||
5. Follow the skill's instructions precisely
|
||||
|
||||
**Skills are located at:** {skills_base_path}
|
||||
|
||||
<all_available_skills>
|
||||
<skill name="generate-web-page">
|
||||
Generate a web page or web application
|
||||
</skill>
|
||||
<skill name="pdf-processing">
|
||||
Extract text, fill forms, merge PDFs (pypdf, pdfplumber)
|
||||
</skill>
|
||||
{skills_list}
|
||||
</all_available_skills>
|
||||
|
||||
</skill_system>
|
||||
@@ -64,4 +63,27 @@ All temporary work happens in `/mnt/user-data/workspace`. Final deliverables mus
|
||||
|
||||
|
||||
def apply_prompt_template() -> str:
|
||||
return SYSTEM_PROMPT + f"\n<current_date>{datetime.now().strftime('%Y-%m-%d, %A')}</current_date>"
|
||||
# Load all available skills
|
||||
skills = load_skills()
|
||||
|
||||
# Get skills container path from config
|
||||
try:
|
||||
from src.config import get_app_config
|
||||
|
||||
config = get_app_config()
|
||||
container_base_path = config.skills.container_path
|
||||
except Exception:
|
||||
# Fallback to default if config fails
|
||||
container_base_path = "/mnt/skills"
|
||||
|
||||
# Generate skills list XML with paths
|
||||
skills_list = "\n".join(f'<skill name="{skill.name}" path="{skill.get_container_path(container_base_path)}">\n{skill.description}\n</skill>' for skill in skills)
|
||||
|
||||
# If no skills found, provide empty list
|
||||
if not skills_list:
|
||||
skills_list = "<!-- No skills available -->"
|
||||
|
||||
# Format the prompt with dynamic skills
|
||||
prompt = SYSTEM_PROMPT_TEMPLATE.format(skills_list=skills_list, skills_base_path=container_base_path)
|
||||
|
||||
return prompt + f"\n<current_date>{datetime.now().strftime('%Y-%m-%d, %A')}</current_date>"
|
||||
|
||||
@@ -100,6 +100,26 @@ class AioSandboxProvider(SandboxProvider):
|
||||
(str(thread_dir / "outputs"), f"{CONTAINER_USER_DATA_DIR}/outputs", False),
|
||||
]
|
||||
|
||||
def _get_skills_mount(self) -> tuple[str, str, bool] | None:
|
||||
"""Get the skills directory mount configuration.
|
||||
|
||||
Returns:
|
||||
Tuple of (host_path, container_path, read_only) if skills directory exists,
|
||||
None otherwise.
|
||||
"""
|
||||
try:
|
||||
config = get_app_config()
|
||||
skills_path = config.skills.get_skills_path()
|
||||
container_path = config.skills.container_path
|
||||
|
||||
# Only mount if skills directory exists
|
||||
if skills_path.exists():
|
||||
return (str(skills_path), container_path, True) # Read-only mount for security
|
||||
except Exception as e:
|
||||
logger.warning(f"Could not setup skills mount: {e}")
|
||||
|
||||
return None
|
||||
|
||||
def _start_container(self, sandbox_id: str, port: int, extra_mounts: list[tuple[str, str, bool]] | None = None) -> str:
|
||||
"""Start a new Docker container for the sandbox.
|
||||
|
||||
@@ -208,11 +228,17 @@ class AioSandboxProvider(SandboxProvider):
|
||||
sandbox_id = str(uuid.uuid4())[:8]
|
||||
|
||||
# Get thread-specific mounts if thread_id is provided
|
||||
extra_mounts = None
|
||||
extra_mounts = []
|
||||
if thread_id:
|
||||
extra_mounts = self._get_thread_mounts(thread_id)
|
||||
extra_mounts.extend(self._get_thread_mounts(thread_id))
|
||||
logger.info(f"Adding thread mounts for thread {thread_id}: {extra_mounts}")
|
||||
|
||||
# Add skills mount if available
|
||||
skills_mount = self._get_skills_mount()
|
||||
if skills_mount:
|
||||
extra_mounts.append(skills_mount)
|
||||
logger.info(f"Adding skills mount: {skills_mount}")
|
||||
|
||||
# If base_url is configured, use existing sandbox
|
||||
if self._config.get("base_url"):
|
||||
base_url = self._config["base_url"]
|
||||
@@ -230,7 +256,7 @@ class AioSandboxProvider(SandboxProvider):
|
||||
raise RuntimeError("auto_start is disabled and no base_url is configured")
|
||||
|
||||
port = self._find_available_port(self._config["port"])
|
||||
container_id = self._start_container(sandbox_id, port, extra_mounts=extra_mounts)
|
||||
container_id = self._start_container(sandbox_id, port, extra_mounts=extra_mounts if extra_mounts else None)
|
||||
self._containers[sandbox_id] = container_id
|
||||
|
||||
base_url = f"http://localhost:{port}"
|
||||
|
||||
@@ -1,3 +1,4 @@
|
||||
from .app_config import get_app_config
|
||||
from .skills_config import SkillsConfig
|
||||
|
||||
__all__ = ["get_app_config"]
|
||||
__all__ = ["get_app_config", "SkillsConfig"]
|
||||
|
||||
@@ -8,6 +8,7 @@ from pydantic import BaseModel, ConfigDict, Field
|
||||
|
||||
from src.config.model_config import ModelConfig
|
||||
from src.config.sandbox_config import SandboxConfig
|
||||
from src.config.skills_config import SkillsConfig
|
||||
from src.config.title_config import load_title_config_from_dict
|
||||
from src.config.tool_config import ToolConfig, ToolGroupConfig
|
||||
|
||||
@@ -21,6 +22,7 @@ class AppConfig(BaseModel):
|
||||
sandbox: SandboxConfig = Field(description="Sandbox configuration")
|
||||
tools: list[ToolConfig] = Field(default_factory=list, description="Available tools")
|
||||
tool_groups: list[ToolGroupConfig] = Field(default_factory=list, description="Available tool groups")
|
||||
skills: SkillsConfig = Field(default_factory=SkillsConfig, description="Skills configuration")
|
||||
model_config = ConfigDict(extra="allow", frozen=False)
|
||||
|
||||
@classmethod
|
||||
|
||||
49
backend/src/config/skills_config.py
Normal file
49
backend/src/config/skills_config.py
Normal file
@@ -0,0 +1,49 @@
|
||||
from pathlib import Path
|
||||
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
|
||||
class SkillsConfig(BaseModel):
|
||||
"""Configuration for skills system"""
|
||||
|
||||
path: str | None = Field(
|
||||
default=None,
|
||||
description="Path to skills directory. If not specified, defaults to ../skills relative to backend directory",
|
||||
)
|
||||
container_path: str = Field(
|
||||
default="/mnt/skills",
|
||||
description="Path where skills are mounted in the sandbox container",
|
||||
)
|
||||
|
||||
def get_skills_path(self) -> Path:
|
||||
"""
|
||||
Get the resolved skills directory path.
|
||||
|
||||
Returns:
|
||||
Path to the skills directory
|
||||
"""
|
||||
if self.path:
|
||||
# Use configured path (can be absolute or relative)
|
||||
path = Path(self.path)
|
||||
if not path.is_absolute():
|
||||
# If relative, resolve from current working directory
|
||||
path = Path.cwd() / path
|
||||
return path.resolve()
|
||||
else:
|
||||
# Default: ../skills relative to backend directory
|
||||
from src.skills.loader import get_skills_root_path
|
||||
|
||||
return get_skills_root_path()
|
||||
|
||||
def get_skill_container_path(self, skill_name: str, category: str = "public") -> str:
|
||||
"""
|
||||
Get the full container path for a specific skill.
|
||||
|
||||
Args:
|
||||
skill_name: Name of the skill (directory name)
|
||||
category: Category of the skill (public or custom)
|
||||
|
||||
Returns:
|
||||
Full path to the skill in the container
|
||||
"""
|
||||
return f"{self.container_path}/{category}/{skill_name}"
|
||||
@@ -1,13 +1,46 @@
|
||||
import os
|
||||
import subprocess
|
||||
from pathlib import Path
|
||||
|
||||
from src.sandbox.local.list_dir import list_dir
|
||||
from src.sandbox.sandbox import Sandbox
|
||||
|
||||
|
||||
class LocalSandbox(Sandbox):
|
||||
def __init__(self, id: str):
|
||||
def __init__(self, id: str, path_mappings: dict[str, str] | None = None):
|
||||
"""
|
||||
Initialize local sandbox with optional path mappings.
|
||||
|
||||
Args:
|
||||
id: Sandbox identifier
|
||||
path_mappings: Dictionary mapping container paths to local paths
|
||||
Example: {"/mnt/skills": "/absolute/path/to/skills"}
|
||||
"""
|
||||
super().__init__(id)
|
||||
self.path_mappings = path_mappings or {}
|
||||
|
||||
def _resolve_path(self, path: str) -> str:
|
||||
"""
|
||||
Resolve container path to actual local path using mappings.
|
||||
|
||||
Args:
|
||||
path: Path that might be a container path
|
||||
|
||||
Returns:
|
||||
Resolved local path
|
||||
"""
|
||||
path_str = str(path)
|
||||
|
||||
# Try each mapping (longest prefix first for more specific matches)
|
||||
for container_path, local_path in sorted(self.path_mappings.items(), key=lambda x: len(x[0]), reverse=True):
|
||||
if path_str.startswith(container_path):
|
||||
# Replace the container path prefix with local path
|
||||
relative = path_str[len(container_path) :].lstrip("/")
|
||||
resolved = str(Path(local_path) / relative) if relative else local_path
|
||||
return resolved
|
||||
|
||||
# No mapping found, return original path
|
||||
return path_str
|
||||
|
||||
def execute_command(self, command: str) -> str:
|
||||
result = subprocess.run(
|
||||
@@ -26,16 +59,19 @@ class LocalSandbox(Sandbox):
|
||||
return output if output else "(no output)"
|
||||
|
||||
def list_dir(self, path: str, max_depth=2) -> list[str]:
|
||||
return list_dir(path, max_depth)
|
||||
resolved_path = self._resolve_path(path)
|
||||
return list_dir(resolved_path, max_depth)
|
||||
|
||||
def read_file(self, path: str) -> str:
|
||||
with open(path) as f:
|
||||
resolved_path = self._resolve_path(path)
|
||||
with open(resolved_path) as f:
|
||||
return f.read()
|
||||
|
||||
def write_file(self, path: str, content: str, append: bool = False) -> None:
|
||||
dir_path = os.path.dirname(path)
|
||||
resolved_path = self._resolve_path(path)
|
||||
dir_path = os.path.dirname(resolved_path)
|
||||
if dir_path:
|
||||
os.makedirs(dir_path, exist_ok=True)
|
||||
mode = "a" if append else "w"
|
||||
with open(path, mode) as f:
|
||||
with open(resolved_path, mode) as f:
|
||||
f.write(content)
|
||||
|
||||
@@ -5,10 +5,42 @@ _singleton: LocalSandbox | None = None
|
||||
|
||||
|
||||
class LocalSandboxProvider(SandboxProvider):
|
||||
def __init__(self):
|
||||
"""Initialize the local sandbox provider with path mappings."""
|
||||
self._path_mappings = self._setup_path_mappings()
|
||||
|
||||
def _setup_path_mappings(self) -> dict[str, str]:
|
||||
"""
|
||||
Setup path mappings for local sandbox.
|
||||
|
||||
Maps container paths to actual local paths, including skills directory.
|
||||
|
||||
Returns:
|
||||
Dictionary of path mappings
|
||||
"""
|
||||
mappings = {}
|
||||
|
||||
# Map skills container path to local skills directory
|
||||
try:
|
||||
from src.config import get_app_config
|
||||
|
||||
config = get_app_config()
|
||||
skills_path = config.skills.get_skills_path()
|
||||
container_path = config.skills.container_path
|
||||
|
||||
# Only add mapping if skills directory exists
|
||||
if skills_path.exists():
|
||||
mappings[container_path] = str(skills_path)
|
||||
except Exception as e:
|
||||
# Log but don't fail if config loading fails
|
||||
print(f"Warning: Could not setup skills path mapping: {e}")
|
||||
|
||||
return mappings
|
||||
|
||||
def acquire(self, thread_id: str | None = None) -> str:
|
||||
global _singleton
|
||||
if _singleton is None:
|
||||
_singleton = LocalSandbox("local")
|
||||
_singleton = LocalSandbox("local", path_mappings=self._path_mappings)
|
||||
return _singleton.id
|
||||
|
||||
def get(self, sandbox_id: str) -> None:
|
||||
|
||||
4
backend/src/skills/__init__.py
Normal file
4
backend/src/skills/__init__.py
Normal file
@@ -0,0 +1,4 @@
|
||||
from .loader import get_skills_root_path, load_skills
|
||||
from .types import Skill
|
||||
|
||||
__all__ = ["load_skills", "get_skills_root_path", "Skill"]
|
||||
77
backend/src/skills/loader.py
Normal file
77
backend/src/skills/loader.py
Normal file
@@ -0,0 +1,77 @@
|
||||
from pathlib import Path
|
||||
|
||||
from .parser import parse_skill_file
|
||||
from .types import Skill
|
||||
|
||||
|
||||
def get_skills_root_path() -> Path:
|
||||
"""
|
||||
Get the root path of the skills directory.
|
||||
|
||||
Returns:
|
||||
Path to the skills directory (deer-flow/skills)
|
||||
"""
|
||||
# backend directory is current file's parent's parent's parent
|
||||
backend_dir = Path(__file__).resolve().parent.parent.parent
|
||||
# skills directory is sibling to backend directory
|
||||
skills_dir = backend_dir.parent / "skills"
|
||||
return skills_dir
|
||||
|
||||
|
||||
def load_skills(skills_path: Path | None = None, use_config: bool = True) -> list[Skill]:
|
||||
"""
|
||||
Load all skills from the skills directory.
|
||||
|
||||
Scans both public and custom skill directories, parsing SKILL.md files
|
||||
to extract metadata.
|
||||
|
||||
Args:
|
||||
skills_path: Optional custom path to skills directory.
|
||||
If not provided and use_config is True, uses path from config.
|
||||
Otherwise defaults to deer-flow/skills
|
||||
use_config: Whether to load skills path from config (default: True)
|
||||
|
||||
Returns:
|
||||
List of Skill objects, sorted by name
|
||||
"""
|
||||
if skills_path is None:
|
||||
if use_config:
|
||||
try:
|
||||
from src.config import get_app_config
|
||||
|
||||
config = get_app_config()
|
||||
skills_path = config.skills.get_skills_path()
|
||||
except Exception:
|
||||
# Fallback to default if config fails
|
||||
skills_path = get_skills_root_path()
|
||||
else:
|
||||
skills_path = get_skills_root_path()
|
||||
|
||||
if not skills_path.exists():
|
||||
return []
|
||||
|
||||
skills = []
|
||||
|
||||
# Scan public and custom directories
|
||||
for category in ["public", "custom"]:
|
||||
category_path = skills_path / category
|
||||
if not category_path.exists() or not category_path.is_dir():
|
||||
continue
|
||||
|
||||
# Each subdirectory is a potential skill
|
||||
for skill_dir in category_path.iterdir():
|
||||
if not skill_dir.is_dir():
|
||||
continue
|
||||
|
||||
skill_file = skill_dir / "SKILL.md"
|
||||
if not skill_file.exists():
|
||||
continue
|
||||
|
||||
skill = parse_skill_file(skill_file, category=category)
|
||||
if skill:
|
||||
skills.append(skill)
|
||||
|
||||
# Sort by name for consistent ordering
|
||||
skills.sort(key=lambda s: s.name)
|
||||
|
||||
return skills
|
||||
63
backend/src/skills/parser.py
Normal file
63
backend/src/skills/parser.py
Normal file
@@ -0,0 +1,63 @@
|
||||
import re
|
||||
from pathlib import Path
|
||||
|
||||
from .types import Skill
|
||||
|
||||
|
||||
def parse_skill_file(skill_file: Path, category: str) -> Skill | None:
|
||||
"""
|
||||
Parse a SKILL.md file and extract metadata.
|
||||
|
||||
Args:
|
||||
skill_file: Path to the SKILL.md file
|
||||
category: Category of the skill ('public' or 'custom')
|
||||
|
||||
Returns:
|
||||
Skill object if parsing succeeds, None otherwise
|
||||
"""
|
||||
if not skill_file.exists() or skill_file.name != "SKILL.md":
|
||||
return None
|
||||
|
||||
try:
|
||||
content = skill_file.read_text(encoding="utf-8")
|
||||
|
||||
# Extract YAML front matter
|
||||
# Pattern: ---\nkey: value\n---
|
||||
front_matter_match = re.match(r"^---\s*\n(.*?)\n---\s*\n", content, re.DOTALL)
|
||||
|
||||
if not front_matter_match:
|
||||
return None
|
||||
|
||||
front_matter = front_matter_match.group(1)
|
||||
|
||||
# Parse YAML front matter (simple key-value parsing)
|
||||
metadata = {}
|
||||
for line in front_matter.split("\n"):
|
||||
line = line.strip()
|
||||
if not line:
|
||||
continue
|
||||
if ":" in line:
|
||||
key, value = line.split(":", 1)
|
||||
metadata[key.strip()] = value.strip()
|
||||
|
||||
# Extract required fields
|
||||
name = metadata.get("name")
|
||||
description = metadata.get("description")
|
||||
|
||||
if not name or not description:
|
||||
return None
|
||||
|
||||
license_text = metadata.get("license")
|
||||
|
||||
return Skill(
|
||||
name=name,
|
||||
description=description,
|
||||
license=license_text,
|
||||
skill_dir=skill_file.parent,
|
||||
skill_file=skill_file,
|
||||
category=category,
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error parsing skill file {skill_file}: {e}")
|
||||
return None
|
||||
34
backend/src/skills/types.py
Normal file
34
backend/src/skills/types.py
Normal file
@@ -0,0 +1,34 @@
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
@dataclass
|
||||
class Skill:
|
||||
"""Represents a skill with its metadata and file path"""
|
||||
|
||||
name: str
|
||||
description: str
|
||||
license: str | None
|
||||
skill_dir: Path
|
||||
skill_file: Path
|
||||
category: str # 'public' or 'custom'
|
||||
|
||||
@property
|
||||
def skill_path(self) -> str:
|
||||
"""Returns the relative path from skills root to this skill's directory"""
|
||||
return self.skill_dir.name
|
||||
|
||||
def get_container_path(self, container_base_path: str = "/mnt/skills") -> str:
|
||||
"""
|
||||
Get the full path to this skill in the container.
|
||||
|
||||
Args:
|
||||
container_base_path: Base path where skills are mounted in the container
|
||||
|
||||
Returns:
|
||||
Full container path to the skill directory
|
||||
"""
|
||||
return f"{container_base_path}/{self.category}/{self.skill_dir.name}"
|
||||
|
||||
def __repr__(self) -> str:
|
||||
return f"Skill(name={self.name!r}, description={self.description!r}, category={self.category!r})"
|
||||
@@ -1,30 +1,65 @@
|
||||
# Configuration for the DeerFlow application
|
||||
#
|
||||
# Guidelines:
|
||||
# - The default path of this configuration file is `config.yaml` in the CWD (Current Working Directory) or the parent directory of the CWD.
|
||||
# How ever you can change it using the `DEER_FLOW_CONFIG_PATH` environment variable.
|
||||
# - Copy this file to `config.yaml` and customize it for your environment
|
||||
# - The default path of this configuration file is `config.yaml` in the current working directory.
|
||||
# However you can change it using the `DEER_FLOW_CONFIG_PATH` environment variable.
|
||||
# - Environment variables are available for all field values. Example: `api_key: $OPENAI_API_KEY`
|
||||
# - Provider path is a string that looks like "package_name.sub_package_name.module_name:class_name/variable_name".
|
||||
# - The `use` path is a string that looks like "package_name.sub_package_name.module_name:class_name/variable_name".
|
||||
|
||||
# ============================================================================
|
||||
# Models Configuration
|
||||
# ============================================================================
|
||||
# Configure available LLM models for the agent to use
|
||||
|
||||
models:
|
||||
- name: doubao-seed-1.8
|
||||
display_name: Doubao 1.8
|
||||
use: langchain_deepseek:ChatDeepSeek
|
||||
model: doubao-seed-1-8-251228
|
||||
api_base: https://ark.cn-beijing.volces.com/api/v3
|
||||
api_key: $ARK_API_KEY
|
||||
supports_thinking: true
|
||||
when_thinking_enabled:
|
||||
extra_body:
|
||||
thinking:
|
||||
type: enabled
|
||||
- name: gpt-5
|
||||
display_name: GPT-5
|
||||
# Example: OpenAI model
|
||||
- name: gpt-4
|
||||
display_name: GPT-4
|
||||
use: langchain_openai:ChatOpenAI
|
||||
model: gpt-5-251228
|
||||
api_base: https://api.openai.com/v1
|
||||
api_key: $OPENAI_API_KEY
|
||||
supports_thinking: true
|
||||
model: gpt-4
|
||||
api_key: $OPENAI_API_KEY # Use environment variable
|
||||
max_tokens: 4096
|
||||
temperature: 0.7
|
||||
|
||||
# Example: Anthropic Claude model
|
||||
# - name: claude-3-5-sonnet
|
||||
# display_name: Claude 3.5 Sonnet
|
||||
# use: langchain_anthropic:ChatAnthropic
|
||||
# model: claude-3-5-sonnet-20241022
|
||||
# api_key: $ANTHROPIC_API_KEY
|
||||
# max_tokens: 8192
|
||||
|
||||
# Example: DeepSeek model (with thinking support)
|
||||
# - name: deepseek-v3
|
||||
# display_name: DeepSeek V3 (Thinking)
|
||||
# use: langchain_deepseek:ChatDeepSeek
|
||||
# model: deepseek-chat
|
||||
# api_key: $DEEPSEEK_API_KEY
|
||||
# max_tokens: 16384
|
||||
# supports_thinking: true
|
||||
# when_thinking_enabled:
|
||||
# extra_body:
|
||||
# thinking:
|
||||
# type: enabled
|
||||
|
||||
# Example: Volcengine (Doubao) model
|
||||
# - name: doubao-seed-1.8
|
||||
# display_name: Doubao 1.8 (Thinking)
|
||||
# use: langchain_deepseek:ChatDeepSeek
|
||||
# model: ep-m-20260106111913-xxxxx
|
||||
# api_base: https://ark.cn-beijing.volces.com/api/v3
|
||||
# api_key: $VOLCENGINE_API_KEY
|
||||
# supports_thinking: true
|
||||
# when_thinking_enabled:
|
||||
# extra_body:
|
||||
# thinking:
|
||||
# type: enabled
|
||||
|
||||
# ============================================================================
|
||||
# Tool Groups Configuration
|
||||
# ============================================================================
|
||||
# Define groups of tools for organization and access control
|
||||
|
||||
tool_groups:
|
||||
- name: web
|
||||
@@ -32,17 +67,26 @@ tool_groups:
|
||||
- name: file:write
|
||||
- name: bash
|
||||
|
||||
# ============================================================================
|
||||
# Tools Configuration
|
||||
# ============================================================================
|
||||
# Configure available tools for the agent to use
|
||||
|
||||
tools:
|
||||
# Web search tool (requires Tavily API key)
|
||||
- name: web_search
|
||||
group: web
|
||||
use: src.community.tavily.tools:web_search_tool
|
||||
max_results: 5
|
||||
# api_key: $TAVILY_API_KEY # Set if needed
|
||||
|
||||
# Web fetch tool (uses Jina AI reader)
|
||||
- name: web_fetch
|
||||
group: web
|
||||
use: src.community.jina_ai.tools:web_fetch_tool
|
||||
timeout: 10
|
||||
|
||||
# File operations tools
|
||||
- name: ls
|
||||
group: file:read
|
||||
use: src.sandbox.tools:ls_tool
|
||||
@@ -59,37 +103,74 @@ tools:
|
||||
group: file:write
|
||||
use: src.sandbox.tools:str_replace_tool
|
||||
|
||||
# Bash execution tool
|
||||
- name: bash
|
||||
group: bash
|
||||
use: src.sandbox.tools:bash_tool
|
||||
|
||||
# ============================================================================
|
||||
# Sandbox Configuration
|
||||
# ============================================================================
|
||||
# Choose between local sandbox (direct execution) or Docker-based AIO sandbox
|
||||
|
||||
# Option 1: Local Sandbox (Default)
|
||||
# Executes commands directly on the host machine
|
||||
sandbox:
|
||||
use: src.sandbox.local:LocalSandboxProvider
|
||||
|
||||
# To use Docker-based AIO sandbox instead, uncomment the following:
|
||||
# Option 2: Docker-based AIO Sandbox
|
||||
# Executes commands in isolated Docker containers
|
||||
# Uncomment to use:
|
||||
# sandbox:
|
||||
# use: src.community.aio_sandbox:AioSandboxProvider
|
||||
#
|
||||
# # Optional: Use existing sandbox at this URL (no Docker container will be started)
|
||||
# # base_url: http://localhost:8080
|
||||
# # Optional: Docker image to use (default: enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest)
|
||||
#
|
||||
# # Optional: Docker image to use
|
||||
# # Default: enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest
|
||||
# # image: enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest
|
||||
#
|
||||
# # Optional: Base port for sandbox containers (default: 8080)
|
||||
# # port: 8080
|
||||
#
|
||||
# # Optional: Whether to automatically start Docker container (default: true)
|
||||
# # auto_start: true
|
||||
#
|
||||
# # Optional: Prefix for container names (default: deer-flow-sandbox)
|
||||
# # container_prefix: deer-flow-sandbox
|
||||
# # Optional: Mount directories from host to container
|
||||
#
|
||||
# # Optional: Additional mount directories from host to container
|
||||
# # NOTE: Skills directory is automatically mounted from skills.path to skills.container_path
|
||||
# # mounts:
|
||||
# # # Other custom mounts
|
||||
# # - host_path: /path/on/host
|
||||
# # container_path: /home/user/shared
|
||||
# # read_only: false
|
||||
# # - host_path: /another/path
|
||||
# # container_path: /data
|
||||
# # read_only: true
|
||||
|
||||
# Automatic thread title generation
|
||||
# ============================================================================
|
||||
# Skills Configuration
|
||||
# ============================================================================
|
||||
# Configure skills directory for specialized agent workflows
|
||||
|
||||
skills:
|
||||
# Path to skills directory on the host (relative to project root or absolute)
|
||||
# Default: ../skills (relative to backend directory)
|
||||
# Uncomment to customize:
|
||||
# path: /absolute/path/to/custom/skills
|
||||
|
||||
# Path where skills are mounted in the sandbox container
|
||||
# This is used by the agent to access skills in both local and Docker sandbox
|
||||
# Default: /mnt/skills
|
||||
container_path: /mnt/skills
|
||||
|
||||
# ============================================================================
|
||||
# Title Generation Configuration
|
||||
# ============================================================================
|
||||
# Automatic conversation title generation settings
|
||||
|
||||
title:
|
||||
enabled: true
|
||||
max_words: 6
|
||||
max_chars: 60
|
||||
model_name: null # Use default model
|
||||
model_name: null # Use default model (first model in models list)
|
||||
|
||||
177
skills/public/frontend-design/LICENSE.txt
Normal file
177
skills/public/frontend-design/LICENSE.txt
Normal file
@@ -0,0 +1,177 @@
|
||||
|
||||
Apache License
|
||||
Version 2.0, January 2004
|
||||
http://www.apache.org/licenses/
|
||||
|
||||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
|
||||
|
||||
1. Definitions.
|
||||
|
||||
"License" shall mean the terms and conditions for use, reproduction,
|
||||
and distribution as defined by Sections 1 through 9 of this document.
|
||||
|
||||
"Licensor" shall mean the copyright owner or entity authorized by
|
||||
the copyright owner that is granting the License.
|
||||
|
||||
"Legal Entity" shall mean the union of the acting entity and all
|
||||
other entities that control, are controlled by, or are under common
|
||||
control with that entity. For the purposes of this definition,
|
||||
"control" means (i) the power, direct or indirect, to cause the
|
||||
direction or management of such entity, whether by contract or
|
||||
otherwise, or (ii) ownership of fifty percent (50%) or more of the
|
||||
outstanding shares, or (iii) beneficial ownership of such entity.
|
||||
|
||||
"You" (or "Your") shall mean an individual or Legal Entity
|
||||
exercising permissions granted by this License.
|
||||
|
||||
"Source" form shall mean the preferred form for making modifications,
|
||||
including but not limited to software source code, documentation
|
||||
source, and configuration files.
|
||||
|
||||
"Object" form shall mean any form resulting from mechanical
|
||||
transformation or translation of a Source form, including but
|
||||
not limited to compiled object code, generated documentation,
|
||||
and conversions to other media types.
|
||||
|
||||
"Work" shall mean the work of authorship, whether in Source or
|
||||
Object form, made available under the License, as indicated by a
|
||||
copyright notice that is included in or attached to the work
|
||||
(an example is provided in the Appendix below).
|
||||
|
||||
"Derivative Works" shall mean any work, whether in Source or Object
|
||||
form, that is based on (or derived from) the Work and for which the
|
||||
editorial revisions, annotations, elaborations, or other modifications
|
||||
represent, as a whole, an original work of authorship. For the purposes
|
||||
of this License, Derivative Works shall not include works that remain
|
||||
separable from, or merely link (or bind by name) to the interfaces of,
|
||||
the Work and Derivative Works thereof.
|
||||
|
||||
"Contribution" shall mean any work of authorship, including
|
||||
the original version of the Work and any modifications or additions
|
||||
to that Work or Derivative Works thereof, that is intentionally
|
||||
submitted to Licensor for inclusion in the Work by the copyright owner
|
||||
or by an individual or Legal Entity authorized to submit on behalf of
|
||||
the copyright owner. For the purposes of this definition, "submitted"
|
||||
means any form of electronic, verbal, or written communication sent
|
||||
to the Licensor or its representatives, including but not limited to
|
||||
communication on electronic mailing lists, source code control systems,
|
||||
and issue tracking systems that are managed by, or on behalf of, the
|
||||
Licensor for the purpose of discussing and improving the Work, but
|
||||
excluding communication that is conspicuously marked or otherwise
|
||||
designated in writing by the copyright owner as "Not a Contribution."
|
||||
|
||||
"Contributor" shall mean Licensor and any individual or Legal Entity
|
||||
on behalf of whom a Contribution has been received by Licensor and
|
||||
subsequently incorporated within the Work.
|
||||
|
||||
2. Grant of Copyright License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
copyright license to reproduce, prepare Derivative Works of,
|
||||
publicly display, publicly perform, sublicense, and distribute the
|
||||
Work and such Derivative Works in Source or Object form.
|
||||
|
||||
3. Grant of Patent License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
(except as stated in this section) patent license to make, have made,
|
||||
use, offer to sell, sell, import, and otherwise transfer the Work,
|
||||
where such license applies only to those patent claims licensable
|
||||
by such Contributor that are necessarily infringed by their
|
||||
Contribution(s) alone or by combination of their Contribution(s)
|
||||
with the Work to which such Contribution(s) was submitted. If You
|
||||
institute patent litigation against any entity (including a
|
||||
cross-claim or counterclaim in a lawsuit) alleging that the Work
|
||||
or a Contribution incorporated within the Work constitutes direct
|
||||
or contributory patent infringement, then any patent licenses
|
||||
granted to You under this License for that Work shall terminate
|
||||
as of the date such litigation is filed.
|
||||
|
||||
4. Redistribution. You may reproduce and distribute copies of the
|
||||
Work or Derivative Works thereof in any medium, with or without
|
||||
modifications, and in Source or Object form, provided that You
|
||||
meet the following conditions:
|
||||
|
||||
(a) You must give any other recipients of the Work or
|
||||
Derivative Works a copy of this License; and
|
||||
|
||||
(b) You must cause any modified files to carry prominent notices
|
||||
stating that You changed the files; and
|
||||
|
||||
(c) You must retain, in the Source form of any Derivative Works
|
||||
that You distribute, all copyright, patent, trademark, and
|
||||
attribution notices from the Source form of the Work,
|
||||
excluding those notices that do not pertain to any part of
|
||||
the Derivative Works; and
|
||||
|
||||
(d) If the Work includes a "NOTICE" text file as part of its
|
||||
distribution, then any Derivative Works that You distribute must
|
||||
include a readable copy of the attribution notices contained
|
||||
within such NOTICE file, excluding those notices that do not
|
||||
pertain to any part of the Derivative Works, in at least one
|
||||
of the following places: within a NOTICE text file distributed
|
||||
as part of the Derivative Works; within the Source form or
|
||||
documentation, if provided along with the Derivative Works; or,
|
||||
within a display generated by the Derivative Works, if and
|
||||
wherever such third-party notices normally appear. The contents
|
||||
of the NOTICE file are for informational purposes only and
|
||||
do not modify the License. You may add Your own attribution
|
||||
notices within Derivative Works that You distribute, alongside
|
||||
or as an addendum to the NOTICE text from the Work, provided
|
||||
that such additional attribution notices cannot be construed
|
||||
as modifying the License.
|
||||
|
||||
You may add Your own copyright statement to Your modifications and
|
||||
may provide additional or different license terms and conditions
|
||||
for use, reproduction, or distribution of Your modifications, or
|
||||
for any such Derivative Works as a whole, provided Your use,
|
||||
reproduction, and distribution of the Work otherwise complies with
|
||||
the conditions stated in this License.
|
||||
|
||||
5. Submission of Contributions. Unless You explicitly state otherwise,
|
||||
any Contribution intentionally submitted for inclusion in the Work
|
||||
by You to the Licensor shall be under the terms and conditions of
|
||||
this License, without any additional terms or conditions.
|
||||
Notwithstanding the above, nothing herein shall supersede or modify
|
||||
the terms of any separate license agreement you may have executed
|
||||
with Licensor regarding such Contributions.
|
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade
|
||||
names, trademarks, service marks, or product names of the Licensor,
|
||||
except as required for reasonable and customary use in describing the
|
||||
origin of the Work and reproducing the content of the NOTICE file.
|
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or
|
||||
agreed to in writing, Licensor provides the Work (and each
|
||||
Contributor provides its Contributions) on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
implied, including, without limitation, any warranties or conditions
|
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
||||
PARTICULAR PURPOSE. You are solely responsible for determining the
|
||||
appropriateness of using or redistributing the Work and assume any
|
||||
risks associated with Your exercise of permissions under this License.
|
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory,
|
||||
whether in tort (including negligence), contract, or otherwise,
|
||||
unless required by applicable law (such as deliberate and grossly
|
||||
negligent acts) or agreed to in writing, shall any Contributor be
|
||||
liable to You for damages, including any direct, indirect, special,
|
||||
incidental, or consequential damages of any character arising as a
|
||||
result of this License or out of the use or inability to use the
|
||||
Work (including but not limited to damages for loss of goodwill,
|
||||
work stoppage, computer failure or malfunction, or any and all
|
||||
other commercial damages or losses), even if such Contributor
|
||||
has been advised of the possibility of such damages.
|
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing
|
||||
the Work or Derivative Works thereof, You may choose to offer,
|
||||
and charge a fee for, acceptance of support, warranty, indemnity,
|
||||
or other liability obligations and/or rights consistent with this
|
||||
License. However, in accepting such obligations, You may act only
|
||||
on Your own behalf and on Your sole responsibility, not on behalf
|
||||
of any other Contributor, and only if You agree to indemnify,
|
||||
defend, and hold each Contributor harmless for any liability
|
||||
incurred by, or claims asserted against, such Contributor by reason
|
||||
of your accepting any such warranty or additional liability.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
42
skills/public/frontend-design/SKILL.md
Normal file
42
skills/public/frontend-design/SKILL.md
Normal file
@@ -0,0 +1,42 @@
|
||||
---
|
||||
name: frontend-design
|
||||
description: Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, artifacts, posters, or applications (examples include websites, landing pages, dashboards, React components, HTML/CSS layouts, or when styling/beautifying any web UI). Generates creative, polished code and UI design that avoids generic AI aesthetics.
|
||||
license: Complete terms in LICENSE.txt
|
||||
---
|
||||
|
||||
This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.
|
||||
|
||||
The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.
|
||||
|
||||
## Design Thinking
|
||||
|
||||
Before coding, understand the context and commit to a BOLD aesthetic direction:
|
||||
- **Purpose**: What problem does this interface solve? Who uses it?
|
||||
- **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. There are so many flavors to choose from. Use these for inspiration but design one that is true to the aesthetic direction.
|
||||
- **Constraints**: Technical requirements (framework, performance, accessibility).
|
||||
- **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?
|
||||
|
||||
**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work - the key is intentionality, not intensity.
|
||||
|
||||
Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
|
||||
- Production-grade and functional
|
||||
- Visually striking and memorable
|
||||
- Cohesive with a clear aesthetic point-of-view
|
||||
- Meticulously refined in every detail
|
||||
|
||||
## Frontend Aesthetics Guidelines
|
||||
|
||||
Focus on:
|
||||
- **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for distinctive choices that elevate the frontend's aesthetics; unexpected, characterful font choices. Pair a distinctive display font with a refined body font.
|
||||
- **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
|
||||
- **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggering and hover states that surprise.
|
||||
- **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
|
||||
- **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.
|
||||
|
||||
NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character.
|
||||
|
||||
Interpret creatively and make unexpected choices that feel genuinely designed for the context. No design should be the same. Vary between light and dark themes, different fonts, different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.
|
||||
|
||||
**IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.
|
||||
|
||||
Remember: Claude is capable of extraordinary creative work. Don't hold back, show what can truly be created when thinking outside the box and committing fully to a distinctive vision.
|
||||
30
skills/public/pdf-processing/LICENSE.txt
Normal file
30
skills/public/pdf-processing/LICENSE.txt
Normal file
@@ -0,0 +1,30 @@
|
||||
© 2025 Anthropic, PBC. All rights reserved.
|
||||
|
||||
LICENSE: Use of these materials (including all code, prompts, assets, files,
|
||||
and other components of this Skill) is governed by your agreement with
|
||||
Anthropic regarding use of Anthropic's services. If no separate agreement
|
||||
exists, use is governed by Anthropic's Consumer Terms of Service or
|
||||
Commercial Terms of Service, as applicable:
|
||||
https://www.anthropic.com/legal/consumer-terms
|
||||
https://www.anthropic.com/legal/commercial-terms
|
||||
Your applicable agreement is referred to as the "Agreement." "Services" are
|
||||
as defined in the Agreement.
|
||||
|
||||
ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the
|
||||
contrary, users may not:
|
||||
|
||||
- Extract these materials from the Services or retain copies of these
|
||||
materials outside the Services
|
||||
- Reproduce or copy these materials, except for temporary copies created
|
||||
automatically during authorized use of the Services
|
||||
- Create derivative works based on these materials
|
||||
- Distribute, sublicense, or transfer these materials to any third party
|
||||
- Make, offer to sell, sell, or import any inventions embodied in these
|
||||
materials
|
||||
- Reverse engineer, decompile, or disassemble these materials
|
||||
|
||||
The receipt, viewing, or possession of these materials does not convey or
|
||||
imply any license or right beyond those expressly granted above.
|
||||
|
||||
Anthropic retains all right, title, and interest in these materials,
|
||||
including all copyrights, patents, and other intellectual property rights.
|
||||
294
skills/public/pdf-processing/SKILL.md
Normal file
294
skills/public/pdf-processing/SKILL.md
Normal file
@@ -0,0 +1,294 @@
|
||||
---
|
||||
name: pdf
|
||||
description: Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
|
||||
license: Proprietary. LICENSE.txt has complete terms
|
||||
---
|
||||
|
||||
# PDF Processing Guide
|
||||
|
||||
## Overview
|
||||
|
||||
This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see reference.md. If you need to fill out a PDF form, read forms.md and follow its instructions.
|
||||
|
||||
## Quick Start
|
||||
|
||||
```python
|
||||
from pypdf import PdfReader, PdfWriter
|
||||
|
||||
# Read a PDF
|
||||
reader = PdfReader("document.pdf")
|
||||
print(f"Pages: {len(reader.pages)}")
|
||||
|
||||
# Extract text
|
||||
text = ""
|
||||
for page in reader.pages:
|
||||
text += page.extract_text()
|
||||
```
|
||||
|
||||
## Python Libraries
|
||||
|
||||
### pypdf - Basic Operations
|
||||
|
||||
#### Merge PDFs
|
||||
```python
|
||||
from pypdf import PdfWriter, PdfReader
|
||||
|
||||
writer = PdfWriter()
|
||||
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
|
||||
reader = PdfReader(pdf_file)
|
||||
for page in reader.pages:
|
||||
writer.add_page(page)
|
||||
|
||||
with open("merged.pdf", "wb") as output:
|
||||
writer.write(output)
|
||||
```
|
||||
|
||||
#### Split PDF
|
||||
```python
|
||||
reader = PdfReader("input.pdf")
|
||||
for i, page in enumerate(reader.pages):
|
||||
writer = PdfWriter()
|
||||
writer.add_page(page)
|
||||
with open(f"page_{i+1}.pdf", "wb") as output:
|
||||
writer.write(output)
|
||||
```
|
||||
|
||||
#### Extract Metadata
|
||||
```python
|
||||
reader = PdfReader("document.pdf")
|
||||
meta = reader.metadata
|
||||
print(f"Title: {meta.title}")
|
||||
print(f"Author: {meta.author}")
|
||||
print(f"Subject: {meta.subject}")
|
||||
print(f"Creator: {meta.creator}")
|
||||
```
|
||||
|
||||
#### Rotate Pages
|
||||
```python
|
||||
reader = PdfReader("input.pdf")
|
||||
writer = PdfWriter()
|
||||
|
||||
page = reader.pages[0]
|
||||
page.rotate(90) # Rotate 90 degrees clockwise
|
||||
writer.add_page(page)
|
||||
|
||||
with open("rotated.pdf", "wb") as output:
|
||||
writer.write(output)
|
||||
```
|
||||
|
||||
### pdfplumber - Text and Table Extraction
|
||||
|
||||
#### Extract Text with Layout
|
||||
```python
|
||||
import pdfplumber
|
||||
|
||||
with pdfplumber.open("document.pdf") as pdf:
|
||||
for page in pdf.pages:
|
||||
text = page.extract_text()
|
||||
print(text)
|
||||
```
|
||||
|
||||
#### Extract Tables
|
||||
```python
|
||||
with pdfplumber.open("document.pdf") as pdf:
|
||||
for i, page in enumerate(pdf.pages):
|
||||
tables = page.extract_tables()
|
||||
for j, table in enumerate(tables):
|
||||
print(f"Table {j+1} on page {i+1}:")
|
||||
for row in table:
|
||||
print(row)
|
||||
```
|
||||
|
||||
#### Advanced Table Extraction
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
with pdfplumber.open("document.pdf") as pdf:
|
||||
all_tables = []
|
||||
for page in pdf.pages:
|
||||
tables = page.extract_tables()
|
||||
for table in tables:
|
||||
if table: # Check if table is not empty
|
||||
df = pd.DataFrame(table[1:], columns=table[0])
|
||||
all_tables.append(df)
|
||||
|
||||
# Combine all tables
|
||||
if all_tables:
|
||||
combined_df = pd.concat(all_tables, ignore_index=True)
|
||||
combined_df.to_excel("extracted_tables.xlsx", index=False)
|
||||
```
|
||||
|
||||
### reportlab - Create PDFs
|
||||
|
||||
#### Basic PDF Creation
|
||||
```python
|
||||
from reportlab.lib.pagesizes import letter
|
||||
from reportlab.pdfgen import canvas
|
||||
|
||||
c = canvas.Canvas("hello.pdf", pagesize=letter)
|
||||
width, height = letter
|
||||
|
||||
# Add text
|
||||
c.drawString(100, height - 100, "Hello World!")
|
||||
c.drawString(100, height - 120, "This is a PDF created with reportlab")
|
||||
|
||||
# Add a line
|
||||
c.line(100, height - 140, 400, height - 140)
|
||||
|
||||
# Save
|
||||
c.save()
|
||||
```
|
||||
|
||||
#### Create PDF with Multiple Pages
|
||||
```python
|
||||
from reportlab.lib.pagesizes import letter
|
||||
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
|
||||
from reportlab.lib.styles import getSampleStyleSheet
|
||||
|
||||
doc = SimpleDocTemplate("report.pdf", pagesize=letter)
|
||||
styles = getSampleStyleSheet()
|
||||
story = []
|
||||
|
||||
# Add content
|
||||
title = Paragraph("Report Title", styles['Title'])
|
||||
story.append(title)
|
||||
story.append(Spacer(1, 12))
|
||||
|
||||
body = Paragraph("This is the body of the report. " * 20, styles['Normal'])
|
||||
story.append(body)
|
||||
story.append(PageBreak())
|
||||
|
||||
# Page 2
|
||||
story.append(Paragraph("Page 2", styles['Heading1']))
|
||||
story.append(Paragraph("Content for page 2", styles['Normal']))
|
||||
|
||||
# Build PDF
|
||||
doc.build(story)
|
||||
```
|
||||
|
||||
## Command-Line Tools
|
||||
|
||||
### pdftotext (poppler-utils)
|
||||
```bash
|
||||
# Extract text
|
||||
pdftotext input.pdf output.txt
|
||||
|
||||
# Extract text preserving layout
|
||||
pdftotext -layout input.pdf output.txt
|
||||
|
||||
# Extract specific pages
|
||||
pdftotext -f 1 -l 5 input.pdf output.txt # Pages 1-5
|
||||
```
|
||||
|
||||
### qpdf
|
||||
```bash
|
||||
# Merge PDFs
|
||||
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf
|
||||
|
||||
# Split pages
|
||||
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
|
||||
qpdf input.pdf --pages . 6-10 -- pages6-10.pdf
|
||||
|
||||
# Rotate pages
|
||||
qpdf input.pdf output.pdf --rotate=+90:1 # Rotate page 1 by 90 degrees
|
||||
|
||||
# Remove password
|
||||
qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf
|
||||
```
|
||||
|
||||
### pdftk (if available)
|
||||
```bash
|
||||
# Merge
|
||||
pdftk file1.pdf file2.pdf cat output merged.pdf
|
||||
|
||||
# Split
|
||||
pdftk input.pdf burst
|
||||
|
||||
# Rotate
|
||||
pdftk input.pdf rotate 1east output rotated.pdf
|
||||
```
|
||||
|
||||
## Common Tasks
|
||||
|
||||
### Extract Text from Scanned PDFs
|
||||
```python
|
||||
# Requires: pip install pytesseract pdf2image
|
||||
import pytesseract
|
||||
from pdf2image import convert_from_path
|
||||
|
||||
# Convert PDF to images
|
||||
images = convert_from_path('scanned.pdf')
|
||||
|
||||
# OCR each page
|
||||
text = ""
|
||||
for i, image in enumerate(images):
|
||||
text += f"Page {i+1}:\n"
|
||||
text += pytesseract.image_to_string(image)
|
||||
text += "\n\n"
|
||||
|
||||
print(text)
|
||||
```
|
||||
|
||||
### Add Watermark
|
||||
```python
|
||||
from pypdf import PdfReader, PdfWriter
|
||||
|
||||
# Create watermark (or load existing)
|
||||
watermark = PdfReader("watermark.pdf").pages[0]
|
||||
|
||||
# Apply to all pages
|
||||
reader = PdfReader("document.pdf")
|
||||
writer = PdfWriter()
|
||||
|
||||
for page in reader.pages:
|
||||
page.merge_page(watermark)
|
||||
writer.add_page(page)
|
||||
|
||||
with open("watermarked.pdf", "wb") as output:
|
||||
writer.write(output)
|
||||
```
|
||||
|
||||
### Extract Images
|
||||
```bash
|
||||
# Using pdfimages (poppler-utils)
|
||||
pdfimages -j input.pdf output_prefix
|
||||
|
||||
# This extracts all images as output_prefix-000.jpg, output_prefix-001.jpg, etc.
|
||||
```
|
||||
|
||||
### Password Protection
|
||||
```python
|
||||
from pypdf import PdfReader, PdfWriter
|
||||
|
||||
reader = PdfReader("input.pdf")
|
||||
writer = PdfWriter()
|
||||
|
||||
for page in reader.pages:
|
||||
writer.add_page(page)
|
||||
|
||||
# Add password
|
||||
writer.encrypt("userpassword", "ownerpassword")
|
||||
|
||||
with open("encrypted.pdf", "wb") as output:
|
||||
writer.write(output)
|
||||
```
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Task | Best Tool | Command/Code |
|
||||
|------|-----------|--------------|
|
||||
| Merge PDFs | pypdf | `writer.add_page(page)` |
|
||||
| Split PDFs | pypdf | One page per file |
|
||||
| Extract text | pdfplumber | `page.extract_text()` |
|
||||
| Extract tables | pdfplumber | `page.extract_tables()` |
|
||||
| Create PDFs | reportlab | Canvas or Platypus |
|
||||
| Command line merge | qpdf | `qpdf --empty --pages ...` |
|
||||
| OCR scanned PDFs | pytesseract | Convert to image first |
|
||||
| Fill PDF forms | pdf-lib or pypdf (see forms.md) | See forms.md |
|
||||
|
||||
## Next Steps
|
||||
|
||||
- For advanced pypdfium2 usage, see reference.md
|
||||
- For JavaScript libraries (pdf-lib), see reference.md
|
||||
- If you need to fill out a PDF form, follow the instructions in forms.md
|
||||
- For troubleshooting guides, see reference.md
|
||||
205
skills/public/pdf-processing/forms.md
Normal file
205
skills/public/pdf-processing/forms.md
Normal file
@@ -0,0 +1,205 @@
|
||||
**CRITICAL: You MUST complete these steps in order. Do not skip ahead to writing code.**
|
||||
|
||||
If you need to fill out a PDF form, first check to see if the PDF has fillable form fields. Run this script from this file's directory:
|
||||
`python scripts/check_fillable_fields <file.pdf>`, and depending on the result go to either the "Fillable fields" or "Non-fillable fields" and follow those instructions.
|
||||
|
||||
# Fillable fields
|
||||
If the PDF has fillable form fields:
|
||||
- Run this script from this file's directory: `python scripts/extract_form_field_info.py <input.pdf> <field_info.json>`. It will create a JSON file with a list of fields in this format:
|
||||
```
|
||||
[
|
||||
{
|
||||
"field_id": (unique ID for the field),
|
||||
"page": (page number, 1-based),
|
||||
"rect": ([left, bottom, right, top] bounding box in PDF coordinates, y=0 is the bottom of the page),
|
||||
"type": ("text", "checkbox", "radio_group", or "choice"),
|
||||
},
|
||||
// Checkboxes have "checked_value" and "unchecked_value" properties:
|
||||
{
|
||||
"field_id": (unique ID for the field),
|
||||
"page": (page number, 1-based),
|
||||
"type": "checkbox",
|
||||
"checked_value": (Set the field to this value to check the checkbox),
|
||||
"unchecked_value": (Set the field to this value to uncheck the checkbox),
|
||||
},
|
||||
// Radio groups have a "radio_options" list with the possible choices.
|
||||
{
|
||||
"field_id": (unique ID for the field),
|
||||
"page": (page number, 1-based),
|
||||
"type": "radio_group",
|
||||
"radio_options": [
|
||||
{
|
||||
"value": (set the field to this value to select this radio option),
|
||||
"rect": (bounding box for the radio button for this option)
|
||||
},
|
||||
// Other radio options
|
||||
]
|
||||
},
|
||||
// Multiple choice fields have a "choice_options" list with the possible choices:
|
||||
{
|
||||
"field_id": (unique ID for the field),
|
||||
"page": (page number, 1-based),
|
||||
"type": "choice",
|
||||
"choice_options": [
|
||||
{
|
||||
"value": (set the field to this value to select this option),
|
||||
"text": (display text of the option)
|
||||
},
|
||||
// Other choice options
|
||||
],
|
||||
}
|
||||
]
|
||||
```
|
||||
- Convert the PDF to PNGs (one image for each page) with this script (run from this file's directory):
|
||||
`python scripts/convert_pdf_to_images.py <file.pdf> <output_directory>`
|
||||
Then analyze the images to determine the purpose of each form field (make sure to convert the bounding box PDF coordinates to image coordinates).
|
||||
- Create a `field_values.json` file in this format with the values to be entered for each field:
|
||||
```
|
||||
[
|
||||
{
|
||||
"field_id": "last_name", // Must match the field_id from `extract_form_field_info.py`
|
||||
"description": "The user's last name",
|
||||
"page": 1, // Must match the "page" value in field_info.json
|
||||
"value": "Simpson"
|
||||
},
|
||||
{
|
||||
"field_id": "Checkbox12",
|
||||
"description": "Checkbox to be checked if the user is 18 or over",
|
||||
"page": 1,
|
||||
"value": "/On" // If this is a checkbox, use its "checked_value" value to check it. If it's a radio button group, use one of the "value" values in "radio_options".
|
||||
},
|
||||
// more fields
|
||||
]
|
||||
```
|
||||
- Run the `fill_fillable_fields.py` script from this file's directory to create a filled-in PDF:
|
||||
`python scripts/fill_fillable_fields.py <input pdf> <field_values.json> <output pdf>`
|
||||
This script will verify that the field IDs and values you provide are valid; if it prints error messages, correct the appropriate fields and try again.
|
||||
|
||||
# Non-fillable fields
|
||||
If the PDF doesn't have fillable form fields, you'll need to visually determine where the data should be added and create text annotations. Follow the below steps *exactly*. You MUST perform all of these steps to ensure that the the form is accurately completed. Details for each step are below.
|
||||
- Convert the PDF to PNG images and determine field bounding boxes.
|
||||
- Create a JSON file with field information and validation images showing the bounding boxes.
|
||||
- Validate the the bounding boxes.
|
||||
- Use the bounding boxes to fill in the form.
|
||||
|
||||
## Step 1: Visual Analysis (REQUIRED)
|
||||
- Convert the PDF to PNG images. Run this script from this file's directory:
|
||||
`python scripts/convert_pdf_to_images.py <file.pdf> <output_directory>`
|
||||
The script will create a PNG image for each page in the PDF.
|
||||
- Carefully examine each PNG image and identify all form fields and areas where the user should enter data. For each form field where the user should enter text, determine bounding boxes for both the form field label, and the area where the user should enter text. The label and entry bounding boxes MUST NOT INTERSECT; the text entry box should only include the area where data should be entered. Usually this area will be immediately to the side, above, or below its label. Entry bounding boxes must be tall and wide enough to contain their text.
|
||||
|
||||
These are some examples of form structures that you might see:
|
||||
|
||||
*Label inside box*
|
||||
```
|
||||
┌────────────────────────┐
|
||||
│ Name: │
|
||||
└────────────────────────┘
|
||||
```
|
||||
The input area should be to the right of the "Name" label and extend to the edge of the box.
|
||||
|
||||
*Label before line*
|
||||
```
|
||||
Email: _______________________
|
||||
```
|
||||
The input area should be above the line and include its entire width.
|
||||
|
||||
*Label under line*
|
||||
```
|
||||
_________________________
|
||||
Name
|
||||
```
|
||||
The input area should be above the line and include the entire width of the line. This is common for signature and date fields.
|
||||
|
||||
*Label above line*
|
||||
```
|
||||
Please enter any special requests:
|
||||
________________________________________________
|
||||
```
|
||||
The input area should extend from the bottom of the label to the line, and should include the entire width of the line.
|
||||
|
||||
*Checkboxes*
|
||||
```
|
||||
Are you a US citizen? Yes □ No □
|
||||
```
|
||||
For checkboxes:
|
||||
- Look for small square boxes (□) - these are the actual checkboxes to target. They may be to the left or right of their labels.
|
||||
- Distinguish between label text ("Yes", "No") and the clickable checkbox squares.
|
||||
- The entry bounding box should cover ONLY the small square, not the text label.
|
||||
|
||||
### Step 2: Create fields.json and validation images (REQUIRED)
|
||||
- Create a file named `fields.json` with information for the form fields and bounding boxes in this format:
|
||||
```
|
||||
{
|
||||
"pages": [
|
||||
{
|
||||
"page_number": 1,
|
||||
"image_width": (first page image width in pixels),
|
||||
"image_height": (first page image height in pixels),
|
||||
},
|
||||
{
|
||||
"page_number": 2,
|
||||
"image_width": (second page image width in pixels),
|
||||
"image_height": (second page image height in pixels),
|
||||
}
|
||||
// additional pages
|
||||
],
|
||||
"form_fields": [
|
||||
// Example for a text field.
|
||||
{
|
||||
"page_number": 1,
|
||||
"description": "The user's last name should be entered here",
|
||||
// Bounding boxes are [left, top, right, bottom]. The bounding boxes for the label and text entry should not overlap.
|
||||
"field_label": "Last name",
|
||||
"label_bounding_box": [30, 125, 95, 142],
|
||||
"entry_bounding_box": [100, 125, 280, 142],
|
||||
"entry_text": {
|
||||
"text": "Johnson", // This text will be added as an annotation at the entry_bounding_box location
|
||||
"font_size": 14, // optional, defaults to 14
|
||||
"font_color": "000000", // optional, RRGGBB format, defaults to 000000 (black)
|
||||
}
|
||||
},
|
||||
// Example for a checkbox. TARGET THE SQUARE for the entry bounding box, NOT THE TEXT
|
||||
{
|
||||
"page_number": 2,
|
||||
"description": "Checkbox that should be checked if the user is over 18",
|
||||
"entry_bounding_box": [140, 525, 155, 540], // Small box over checkbox square
|
||||
"field_label": "Yes",
|
||||
"label_bounding_box": [100, 525, 132, 540], // Box containing "Yes" text
|
||||
// Use "X" to check a checkbox.
|
||||
"entry_text": {
|
||||
"text": "X",
|
||||
}
|
||||
}
|
||||
// additional form field entries
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Create validation images by running this script from this file's directory for each page:
|
||||
`python scripts/create_validation_image.py <page_number> <path_to_fields.json> <input_image_path> <output_image_path>
|
||||
|
||||
The validation images will have red rectangles where text should be entered, and blue rectangles covering label text.
|
||||
|
||||
### Step 3: Validate Bounding Boxes (REQUIRED)
|
||||
#### Automated intersection check
|
||||
- Verify that none of bounding boxes intersect and that the entry bounding boxes are tall enough by checking the fields.json file with the `check_bounding_boxes.py` script (run from this file's directory):
|
||||
`python scripts/check_bounding_boxes.py <JSON file>`
|
||||
|
||||
If there are errors, reanalyze the relevant fields, adjust the bounding boxes, and iterate until there are no remaining errors. Remember: label (blue) bounding boxes should contain text labels, entry (red) boxes should not.
|
||||
|
||||
#### Manual image inspection
|
||||
**CRITICAL: Do not proceed without visually inspecting validation images**
|
||||
- Red rectangles must ONLY cover input areas
|
||||
- Red rectangles MUST NOT contain any text
|
||||
- Blue rectangles should contain label text
|
||||
- For checkboxes:
|
||||
- Red rectangle MUST be centered on the checkbox square
|
||||
- Blue rectangle should cover the text label for the checkbox
|
||||
|
||||
- If any rectangles look wrong, fix fields.json, regenerate the validation images, and verify again. Repeat this process until the bounding boxes are fully accurate.
|
||||
|
||||
|
||||
### Step 4: Add annotations to the PDF
|
||||
Run this script from this file's directory to create a filled-out PDF using the information in fields.json:
|
||||
`python scripts/fill_pdf_form_with_annotations.py <input_pdf_path> <path_to_fields.json> <output_pdf_path>
|
||||
612
skills/public/pdf-processing/reference.md
Normal file
612
skills/public/pdf-processing/reference.md
Normal file
@@ -0,0 +1,612 @@
|
||||
# PDF Processing Advanced Reference
|
||||
|
||||
This document contains advanced PDF processing features, detailed examples, and additional libraries not covered in the main skill instructions.
|
||||
|
||||
## pypdfium2 Library (Apache/BSD License)
|
||||
|
||||
### Overview
|
||||
pypdfium2 is a Python binding for PDFium (Chromium's PDF library). It's excellent for fast PDF rendering, image generation, and serves as a PyMuPDF replacement.
|
||||
|
||||
### Render PDF to Images
|
||||
```python
|
||||
import pypdfium2 as pdfium
|
||||
from PIL import Image
|
||||
|
||||
# Load PDF
|
||||
pdf = pdfium.PdfDocument("document.pdf")
|
||||
|
||||
# Render page to image
|
||||
page = pdf[0] # First page
|
||||
bitmap = page.render(
|
||||
scale=2.0, # Higher resolution
|
||||
rotation=0 # No rotation
|
||||
)
|
||||
|
||||
# Convert to PIL Image
|
||||
img = bitmap.to_pil()
|
||||
img.save("page_1.png", "PNG")
|
||||
|
||||
# Process multiple pages
|
||||
for i, page in enumerate(pdf):
|
||||
bitmap = page.render(scale=1.5)
|
||||
img = bitmap.to_pil()
|
||||
img.save(f"page_{i+1}.jpg", "JPEG", quality=90)
|
||||
```
|
||||
|
||||
### Extract Text with pypdfium2
|
||||
```python
|
||||
import pypdfium2 as pdfium
|
||||
|
||||
pdf = pdfium.PdfDocument("document.pdf")
|
||||
for i, page in enumerate(pdf):
|
||||
text = page.get_text()
|
||||
print(f"Page {i+1} text length: {len(text)} chars")
|
||||
```
|
||||
|
||||
## JavaScript Libraries
|
||||
|
||||
### pdf-lib (MIT License)
|
||||
|
||||
pdf-lib is a powerful JavaScript library for creating and modifying PDF documents in any JavaScript environment.
|
||||
|
||||
#### Load and Manipulate Existing PDF
|
||||
```javascript
|
||||
import { PDFDocument } from 'pdf-lib';
|
||||
import fs from 'fs';
|
||||
|
||||
async function manipulatePDF() {
|
||||
// Load existing PDF
|
||||
const existingPdfBytes = fs.readFileSync('input.pdf');
|
||||
const pdfDoc = await PDFDocument.load(existingPdfBytes);
|
||||
|
||||
// Get page count
|
||||
const pageCount = pdfDoc.getPageCount();
|
||||
console.log(`Document has ${pageCount} pages`);
|
||||
|
||||
// Add new page
|
||||
const newPage = pdfDoc.addPage([600, 400]);
|
||||
newPage.drawText('Added by pdf-lib', {
|
||||
x: 100,
|
||||
y: 300,
|
||||
size: 16
|
||||
});
|
||||
|
||||
// Save modified PDF
|
||||
const pdfBytes = await pdfDoc.save();
|
||||
fs.writeFileSync('modified.pdf', pdfBytes);
|
||||
}
|
||||
```
|
||||
|
||||
#### Create Complex PDFs from Scratch
|
||||
```javascript
|
||||
import { PDFDocument, rgb, StandardFonts } from 'pdf-lib';
|
||||
import fs from 'fs';
|
||||
|
||||
async function createPDF() {
|
||||
const pdfDoc = await PDFDocument.create();
|
||||
|
||||
// Add fonts
|
||||
const helveticaFont = await pdfDoc.embedFont(StandardFonts.Helvetica);
|
||||
const helveticaBold = await pdfDoc.embedFont(StandardFonts.HelveticaBold);
|
||||
|
||||
// Add page
|
||||
const page = pdfDoc.addPage([595, 842]); // A4 size
|
||||
const { width, height } = page.getSize();
|
||||
|
||||
// Add text with styling
|
||||
page.drawText('Invoice #12345', {
|
||||
x: 50,
|
||||
y: height - 50,
|
||||
size: 18,
|
||||
font: helveticaBold,
|
||||
color: rgb(0.2, 0.2, 0.8)
|
||||
});
|
||||
|
||||
// Add rectangle (header background)
|
||||
page.drawRectangle({
|
||||
x: 40,
|
||||
y: height - 100,
|
||||
width: width - 80,
|
||||
height: 30,
|
||||
color: rgb(0.9, 0.9, 0.9)
|
||||
});
|
||||
|
||||
// Add table-like content
|
||||
const items = [
|
||||
['Item', 'Qty', 'Price', 'Total'],
|
||||
['Widget', '2', '$50', '$100'],
|
||||
['Gadget', '1', '$75', '$75']
|
||||
];
|
||||
|
||||
let yPos = height - 150;
|
||||
items.forEach(row => {
|
||||
let xPos = 50;
|
||||
row.forEach(cell => {
|
||||
page.drawText(cell, {
|
||||
x: xPos,
|
||||
y: yPos,
|
||||
size: 12,
|
||||
font: helveticaFont
|
||||
});
|
||||
xPos += 120;
|
||||
});
|
||||
yPos -= 25;
|
||||
});
|
||||
|
||||
const pdfBytes = await pdfDoc.save();
|
||||
fs.writeFileSync('created.pdf', pdfBytes);
|
||||
}
|
||||
```
|
||||
|
||||
#### Advanced Merge and Split Operations
|
||||
```javascript
|
||||
import { PDFDocument } from 'pdf-lib';
|
||||
import fs from 'fs';
|
||||
|
||||
async function mergePDFs() {
|
||||
// Create new document
|
||||
const mergedPdf = await PDFDocument.create();
|
||||
|
||||
// Load source PDFs
|
||||
const pdf1Bytes = fs.readFileSync('doc1.pdf');
|
||||
const pdf2Bytes = fs.readFileSync('doc2.pdf');
|
||||
|
||||
const pdf1 = await PDFDocument.load(pdf1Bytes);
|
||||
const pdf2 = await PDFDocument.load(pdf2Bytes);
|
||||
|
||||
// Copy pages from first PDF
|
||||
const pdf1Pages = await mergedPdf.copyPages(pdf1, pdf1.getPageIndices());
|
||||
pdf1Pages.forEach(page => mergedPdf.addPage(page));
|
||||
|
||||
// Copy specific pages from second PDF (pages 0, 2, 4)
|
||||
const pdf2Pages = await mergedPdf.copyPages(pdf2, [0, 2, 4]);
|
||||
pdf2Pages.forEach(page => mergedPdf.addPage(page));
|
||||
|
||||
const mergedPdfBytes = await mergedPdf.save();
|
||||
fs.writeFileSync('merged.pdf', mergedPdfBytes);
|
||||
}
|
||||
```
|
||||
|
||||
### pdfjs-dist (Apache License)
|
||||
|
||||
PDF.js is Mozilla's JavaScript library for rendering PDFs in the browser.
|
||||
|
||||
#### Basic PDF Loading and Rendering
|
||||
```javascript
|
||||
import * as pdfjsLib from 'pdfjs-dist';
|
||||
|
||||
// Configure worker (important for performance)
|
||||
pdfjsLib.GlobalWorkerOptions.workerSrc = './pdf.worker.js';
|
||||
|
||||
async function renderPDF() {
|
||||
// Load PDF
|
||||
const loadingTask = pdfjsLib.getDocument('document.pdf');
|
||||
const pdf = await loadingTask.promise;
|
||||
|
||||
console.log(`Loaded PDF with ${pdf.numPages} pages`);
|
||||
|
||||
// Get first page
|
||||
const page = await pdf.getPage(1);
|
||||
const viewport = page.getViewport({ scale: 1.5 });
|
||||
|
||||
// Render to canvas
|
||||
const canvas = document.createElement('canvas');
|
||||
const context = canvas.getContext('2d');
|
||||
canvas.height = viewport.height;
|
||||
canvas.width = viewport.width;
|
||||
|
||||
const renderContext = {
|
||||
canvasContext: context,
|
||||
viewport: viewport
|
||||
};
|
||||
|
||||
await page.render(renderContext).promise;
|
||||
document.body.appendChild(canvas);
|
||||
}
|
||||
```
|
||||
|
||||
#### Extract Text with Coordinates
|
||||
```javascript
|
||||
import * as pdfjsLib from 'pdfjs-dist';
|
||||
|
||||
async function extractText() {
|
||||
const loadingTask = pdfjsLib.getDocument('document.pdf');
|
||||
const pdf = await loadingTask.promise;
|
||||
|
||||
let fullText = '';
|
||||
|
||||
// Extract text from all pages
|
||||
for (let i = 1; i <= pdf.numPages; i++) {
|
||||
const page = await pdf.getPage(i);
|
||||
const textContent = await page.getTextContent();
|
||||
|
||||
const pageText = textContent.items
|
||||
.map(item => item.str)
|
||||
.join(' ');
|
||||
|
||||
fullText += `\n--- Page ${i} ---\n${pageText}`;
|
||||
|
||||
// Get text with coordinates for advanced processing
|
||||
const textWithCoords = textContent.items.map(item => ({
|
||||
text: item.str,
|
||||
x: item.transform[4],
|
||||
y: item.transform[5],
|
||||
width: item.width,
|
||||
height: item.height
|
||||
}));
|
||||
}
|
||||
|
||||
console.log(fullText);
|
||||
return fullText;
|
||||
}
|
||||
```
|
||||
|
||||
#### Extract Annotations and Forms
|
||||
```javascript
|
||||
import * as pdfjsLib from 'pdfjs-dist';
|
||||
|
||||
async function extractAnnotations() {
|
||||
const loadingTask = pdfjsLib.getDocument('annotated.pdf');
|
||||
const pdf = await loadingTask.promise;
|
||||
|
||||
for (let i = 1; i <= pdf.numPages; i++) {
|
||||
const page = await pdf.getPage(i);
|
||||
const annotations = await page.getAnnotations();
|
||||
|
||||
annotations.forEach(annotation => {
|
||||
console.log(`Annotation type: ${annotation.subtype}`);
|
||||
console.log(`Content: ${annotation.contents}`);
|
||||
console.log(`Coordinates: ${JSON.stringify(annotation.rect)}`);
|
||||
});
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Advanced Command-Line Operations
|
||||
|
||||
### poppler-utils Advanced Features
|
||||
|
||||
#### Extract Text with Bounding Box Coordinates
|
||||
```bash
|
||||
# Extract text with bounding box coordinates (essential for structured data)
|
||||
pdftotext -bbox-layout document.pdf output.xml
|
||||
|
||||
# The XML output contains precise coordinates for each text element
|
||||
```
|
||||
|
||||
#### Advanced Image Conversion
|
||||
```bash
|
||||
# Convert to PNG images with specific resolution
|
||||
pdftoppm -png -r 300 document.pdf output_prefix
|
||||
|
||||
# Convert specific page range with high resolution
|
||||
pdftoppm -png -r 600 -f 1 -l 3 document.pdf high_res_pages
|
||||
|
||||
# Convert to JPEG with quality setting
|
||||
pdftoppm -jpeg -jpegopt quality=85 -r 200 document.pdf jpeg_output
|
||||
```
|
||||
|
||||
#### Extract Embedded Images
|
||||
```bash
|
||||
# Extract all embedded images with metadata
|
||||
pdfimages -j -p document.pdf page_images
|
||||
|
||||
# List image info without extracting
|
||||
pdfimages -list document.pdf
|
||||
|
||||
# Extract images in their original format
|
||||
pdfimages -all document.pdf images/img
|
||||
```
|
||||
|
||||
### qpdf Advanced Features
|
||||
|
||||
#### Complex Page Manipulation
|
||||
```bash
|
||||
# Split PDF into groups of pages
|
||||
qpdf --split-pages=3 input.pdf output_group_%02d.pdf
|
||||
|
||||
# Extract specific pages with complex ranges
|
||||
qpdf input.pdf --pages input.pdf 1,3-5,8,10-end -- extracted.pdf
|
||||
|
||||
# Merge specific pages from multiple PDFs
|
||||
qpdf --empty --pages doc1.pdf 1-3 doc2.pdf 5-7 doc3.pdf 2,4 -- combined.pdf
|
||||
```
|
||||
|
||||
#### PDF Optimization and Repair
|
||||
```bash
|
||||
# Optimize PDF for web (linearize for streaming)
|
||||
qpdf --linearize input.pdf optimized.pdf
|
||||
|
||||
# Remove unused objects and compress
|
||||
qpdf --optimize-level=all input.pdf compressed.pdf
|
||||
|
||||
# Attempt to repair corrupted PDF structure
|
||||
qpdf --check input.pdf
|
||||
qpdf --fix-qdf damaged.pdf repaired.pdf
|
||||
|
||||
# Show detailed PDF structure for debugging
|
||||
qpdf --show-all-pages input.pdf > structure.txt
|
||||
```
|
||||
|
||||
#### Advanced Encryption
|
||||
```bash
|
||||
# Add password protection with specific permissions
|
||||
qpdf --encrypt user_pass owner_pass 256 --print=none --modify=none -- input.pdf encrypted.pdf
|
||||
|
||||
# Check encryption status
|
||||
qpdf --show-encryption encrypted.pdf
|
||||
|
||||
# Remove password protection (requires password)
|
||||
qpdf --password=secret123 --decrypt encrypted.pdf decrypted.pdf
|
||||
```
|
||||
|
||||
## Advanced Python Techniques
|
||||
|
||||
### pdfplumber Advanced Features
|
||||
|
||||
#### Extract Text with Precise Coordinates
|
||||
```python
|
||||
import pdfplumber
|
||||
|
||||
with pdfplumber.open("document.pdf") as pdf:
|
||||
page = pdf.pages[0]
|
||||
|
||||
# Extract all text with coordinates
|
||||
chars = page.chars
|
||||
for char in chars[:10]: # First 10 characters
|
||||
print(f"Char: '{char['text']}' at x:{char['x0']:.1f} y:{char['y0']:.1f}")
|
||||
|
||||
# Extract text by bounding box (left, top, right, bottom)
|
||||
bbox_text = page.within_bbox((100, 100, 400, 200)).extract_text()
|
||||
```
|
||||
|
||||
#### Advanced Table Extraction with Custom Settings
|
||||
```python
|
||||
import pdfplumber
|
||||
import pandas as pd
|
||||
|
||||
with pdfplumber.open("complex_table.pdf") as pdf:
|
||||
page = pdf.pages[0]
|
||||
|
||||
# Extract tables with custom settings for complex layouts
|
||||
table_settings = {
|
||||
"vertical_strategy": "lines",
|
||||
"horizontal_strategy": "lines",
|
||||
"snap_tolerance": 3,
|
||||
"intersection_tolerance": 15
|
||||
}
|
||||
tables = page.extract_tables(table_settings)
|
||||
|
||||
# Visual debugging for table extraction
|
||||
img = page.to_image(resolution=150)
|
||||
img.save("debug_layout.png")
|
||||
```
|
||||
|
||||
### reportlab Advanced Features
|
||||
|
||||
#### Create Professional Reports with Tables
|
||||
```python
|
||||
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph
|
||||
from reportlab.lib.styles import getSampleStyleSheet
|
||||
from reportlab.lib import colors
|
||||
|
||||
# Sample data
|
||||
data = [
|
||||
['Product', 'Q1', 'Q2', 'Q3', 'Q4'],
|
||||
['Widgets', '120', '135', '142', '158'],
|
||||
['Gadgets', '85', '92', '98', '105']
|
||||
]
|
||||
|
||||
# Create PDF with table
|
||||
doc = SimpleDocTemplate("report.pdf")
|
||||
elements = []
|
||||
|
||||
# Add title
|
||||
styles = getSampleStyleSheet()
|
||||
title = Paragraph("Quarterly Sales Report", styles['Title'])
|
||||
elements.append(title)
|
||||
|
||||
# Add table with advanced styling
|
||||
table = Table(data)
|
||||
table.setStyle(TableStyle([
|
||||
('BACKGROUND', (0, 0), (-1, 0), colors.grey),
|
||||
('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke),
|
||||
('ALIGN', (0, 0), (-1, -1), 'CENTER'),
|
||||
('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
|
||||
('FONTSIZE', (0, 0), (-1, 0), 14),
|
||||
('BOTTOMPADDING', (0, 0), (-1, 0), 12),
|
||||
('BACKGROUND', (0, 1), (-1, -1), colors.beige),
|
||||
('GRID', (0, 0), (-1, -1), 1, colors.black)
|
||||
]))
|
||||
elements.append(table)
|
||||
|
||||
doc.build(elements)
|
||||
```
|
||||
|
||||
## Complex Workflows
|
||||
|
||||
### Extract Figures/Images from PDF
|
||||
|
||||
#### Method 1: Using pdfimages (fastest)
|
||||
```bash
|
||||
# Extract all images with original quality
|
||||
pdfimages -all document.pdf images/img
|
||||
```
|
||||
|
||||
#### Method 2: Using pypdfium2 + Image Processing
|
||||
```python
|
||||
import pypdfium2 as pdfium
|
||||
from PIL import Image
|
||||
import numpy as np
|
||||
|
||||
def extract_figures(pdf_path, output_dir):
|
||||
pdf = pdfium.PdfDocument(pdf_path)
|
||||
|
||||
for page_num, page in enumerate(pdf):
|
||||
# Render high-resolution page
|
||||
bitmap = page.render(scale=3.0)
|
||||
img = bitmap.to_pil()
|
||||
|
||||
# Convert to numpy for processing
|
||||
img_array = np.array(img)
|
||||
|
||||
# Simple figure detection (non-white regions)
|
||||
mask = np.any(img_array != [255, 255, 255], axis=2)
|
||||
|
||||
# Find contours and extract bounding boxes
|
||||
# (This is simplified - real implementation would need more sophisticated detection)
|
||||
|
||||
# Save detected figures
|
||||
# ... implementation depends on specific needs
|
||||
```
|
||||
|
||||
### Batch PDF Processing with Error Handling
|
||||
```python
|
||||
import os
|
||||
import glob
|
||||
from pypdf import PdfReader, PdfWriter
|
||||
import logging
|
||||
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
def batch_process_pdfs(input_dir, operation='merge'):
|
||||
pdf_files = glob.glob(os.path.join(input_dir, "*.pdf"))
|
||||
|
||||
if operation == 'merge':
|
||||
writer = PdfWriter()
|
||||
for pdf_file in pdf_files:
|
||||
try:
|
||||
reader = PdfReader(pdf_file)
|
||||
for page in reader.pages:
|
||||
writer.add_page(page)
|
||||
logger.info(f"Processed: {pdf_file}")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to process {pdf_file}: {e}")
|
||||
continue
|
||||
|
||||
with open("batch_merged.pdf", "wb") as output:
|
||||
writer.write(output)
|
||||
|
||||
elif operation == 'extract_text':
|
||||
for pdf_file in pdf_files:
|
||||
try:
|
||||
reader = PdfReader(pdf_file)
|
||||
text = ""
|
||||
for page in reader.pages:
|
||||
text += page.extract_text()
|
||||
|
||||
output_file = pdf_file.replace('.pdf', '.txt')
|
||||
with open(output_file, 'w', encoding='utf-8') as f:
|
||||
f.write(text)
|
||||
logger.info(f"Extracted text from: {pdf_file}")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to extract text from {pdf_file}: {e}")
|
||||
continue
|
||||
```
|
||||
|
||||
### Advanced PDF Cropping
|
||||
```python
|
||||
from pypdf import PdfWriter, PdfReader
|
||||
|
||||
reader = PdfReader("input.pdf")
|
||||
writer = PdfWriter()
|
||||
|
||||
# Crop page (left, bottom, right, top in points)
|
||||
page = reader.pages[0]
|
||||
page.mediabox.left = 50
|
||||
page.mediabox.bottom = 50
|
||||
page.mediabox.right = 550
|
||||
page.mediabox.top = 750
|
||||
|
||||
writer.add_page(page)
|
||||
with open("cropped.pdf", "wb") as output:
|
||||
writer.write(output)
|
||||
```
|
||||
|
||||
## Performance Optimization Tips
|
||||
|
||||
### 1. For Large PDFs
|
||||
- Use streaming approaches instead of loading entire PDF in memory
|
||||
- Use `qpdf --split-pages` for splitting large files
|
||||
- Process pages individually with pypdfium2
|
||||
|
||||
### 2. For Text Extraction
|
||||
- `pdftotext -bbox-layout` is fastest for plain text extraction
|
||||
- Use pdfplumber for structured data and tables
|
||||
- Avoid `pypdf.extract_text()` for very large documents
|
||||
|
||||
### 3. For Image Extraction
|
||||
- `pdfimages` is much faster than rendering pages
|
||||
- Use low resolution for previews, high resolution for final output
|
||||
|
||||
### 4. For Form Filling
|
||||
- pdf-lib maintains form structure better than most alternatives
|
||||
- Pre-validate form fields before processing
|
||||
|
||||
### 5. Memory Management
|
||||
```python
|
||||
# Process PDFs in chunks
|
||||
def process_large_pdf(pdf_path, chunk_size=10):
|
||||
reader = PdfReader(pdf_path)
|
||||
total_pages = len(reader.pages)
|
||||
|
||||
for start_idx in range(0, total_pages, chunk_size):
|
||||
end_idx = min(start_idx + chunk_size, total_pages)
|
||||
writer = PdfWriter()
|
||||
|
||||
for i in range(start_idx, end_idx):
|
||||
writer.add_page(reader.pages[i])
|
||||
|
||||
# Process chunk
|
||||
with open(f"chunk_{start_idx//chunk_size}.pdf", "wb") as output:
|
||||
writer.write(output)
|
||||
```
|
||||
|
||||
## Troubleshooting Common Issues
|
||||
|
||||
### Encrypted PDFs
|
||||
```python
|
||||
# Handle password-protected PDFs
|
||||
from pypdf import PdfReader
|
||||
|
||||
try:
|
||||
reader = PdfReader("encrypted.pdf")
|
||||
if reader.is_encrypted:
|
||||
reader.decrypt("password")
|
||||
except Exception as e:
|
||||
print(f"Failed to decrypt: {e}")
|
||||
```
|
||||
|
||||
### Corrupted PDFs
|
||||
```bash
|
||||
# Use qpdf to repair
|
||||
qpdf --check corrupted.pdf
|
||||
qpdf --replace-input corrupted.pdf
|
||||
```
|
||||
|
||||
### Text Extraction Issues
|
||||
```python
|
||||
# Fallback to OCR for scanned PDFs
|
||||
import pytesseract
|
||||
from pdf2image import convert_from_path
|
||||
|
||||
def extract_text_with_ocr(pdf_path):
|
||||
images = convert_from_path(pdf_path)
|
||||
text = ""
|
||||
for i, image in enumerate(images):
|
||||
text += pytesseract.image_to_string(image)
|
||||
return text
|
||||
```
|
||||
|
||||
## License Information
|
||||
|
||||
- **pypdf**: BSD License
|
||||
- **pdfplumber**: MIT License
|
||||
- **pypdfium2**: Apache/BSD License
|
||||
- **reportlab**: BSD License
|
||||
- **poppler-utils**: GPL-2 License
|
||||
- **qpdf**: Apache License
|
||||
- **pdf-lib**: MIT License
|
||||
- **pdfjs-dist**: Apache License
|
||||
70
skills/public/pdf-processing/scripts/check_bounding_boxes.py
Normal file
70
skills/public/pdf-processing/scripts/check_bounding_boxes.py
Normal file
@@ -0,0 +1,70 @@
|
||||
from dataclasses import dataclass
|
||||
import json
|
||||
import sys
|
||||
|
||||
|
||||
# Script to check that the `fields.json` file that Claude creates when analyzing PDFs
|
||||
# does not have overlapping bounding boxes. See forms.md.
|
||||
|
||||
|
||||
@dataclass
|
||||
class RectAndField:
|
||||
rect: list[float]
|
||||
rect_type: str
|
||||
field: dict
|
||||
|
||||
|
||||
# Returns a list of messages that are printed to stdout for Claude to read.
|
||||
def get_bounding_box_messages(fields_json_stream) -> list[str]:
|
||||
messages = []
|
||||
fields = json.load(fields_json_stream)
|
||||
messages.append(f"Read {len(fields['form_fields'])} fields")
|
||||
|
||||
def rects_intersect(r1, r2):
|
||||
disjoint_horizontal = r1[0] >= r2[2] or r1[2] <= r2[0]
|
||||
disjoint_vertical = r1[1] >= r2[3] or r1[3] <= r2[1]
|
||||
return not (disjoint_horizontal or disjoint_vertical)
|
||||
|
||||
rects_and_fields = []
|
||||
for f in fields["form_fields"]:
|
||||
rects_and_fields.append(RectAndField(f["label_bounding_box"], "label", f))
|
||||
rects_and_fields.append(RectAndField(f["entry_bounding_box"], "entry", f))
|
||||
|
||||
has_error = False
|
||||
for i, ri in enumerate(rects_and_fields):
|
||||
# This is O(N^2); we can optimize if it becomes a problem.
|
||||
for j in range(i + 1, len(rects_and_fields)):
|
||||
rj = rects_and_fields[j]
|
||||
if ri.field["page_number"] == rj.field["page_number"] and rects_intersect(ri.rect, rj.rect):
|
||||
has_error = True
|
||||
if ri.field is rj.field:
|
||||
messages.append(f"FAILURE: intersection between label and entry bounding boxes for `{ri.field['description']}` ({ri.rect}, {rj.rect})")
|
||||
else:
|
||||
messages.append(f"FAILURE: intersection between {ri.rect_type} bounding box for `{ri.field['description']}` ({ri.rect}) and {rj.rect_type} bounding box for `{rj.field['description']}` ({rj.rect})")
|
||||
if len(messages) >= 20:
|
||||
messages.append("Aborting further checks; fix bounding boxes and try again")
|
||||
return messages
|
||||
if ri.rect_type == "entry":
|
||||
if "entry_text" in ri.field:
|
||||
font_size = ri.field["entry_text"].get("font_size", 14)
|
||||
entry_height = ri.rect[3] - ri.rect[1]
|
||||
if entry_height < font_size:
|
||||
has_error = True
|
||||
messages.append(f"FAILURE: entry bounding box height ({entry_height}) for `{ri.field['description']}` is too short for the text content (font size: {font_size}). Increase the box height or decrease the font size.")
|
||||
if len(messages) >= 20:
|
||||
messages.append("Aborting further checks; fix bounding boxes and try again")
|
||||
return messages
|
||||
|
||||
if not has_error:
|
||||
messages.append("SUCCESS: All bounding boxes are valid")
|
||||
return messages
|
||||
|
||||
if __name__ == "__main__":
|
||||
if len(sys.argv) != 2:
|
||||
print("Usage: check_bounding_boxes.py [fields.json]")
|
||||
sys.exit(1)
|
||||
# Input file should be in the `fields.json` format described in forms.md.
|
||||
with open(sys.argv[1]) as f:
|
||||
messages = get_bounding_box_messages(f)
|
||||
for msg in messages:
|
||||
print(msg)
|
||||
@@ -0,0 +1,226 @@
|
||||
import unittest
|
||||
import json
|
||||
import io
|
||||
from check_bounding_boxes import get_bounding_box_messages
|
||||
|
||||
|
||||
# Currently this is not run automatically in CI; it's just for documentation and manual checking.
|
||||
class TestGetBoundingBoxMessages(unittest.TestCase):
|
||||
|
||||
def create_json_stream(self, data):
|
||||
"""Helper to create a JSON stream from data"""
|
||||
return io.StringIO(json.dumps(data))
|
||||
|
||||
def test_no_intersections(self):
|
||||
"""Test case with no bounding box intersections"""
|
||||
data = {
|
||||
"form_fields": [
|
||||
{
|
||||
"description": "Name",
|
||||
"page_number": 1,
|
||||
"label_bounding_box": [10, 10, 50, 30],
|
||||
"entry_bounding_box": [60, 10, 150, 30]
|
||||
},
|
||||
{
|
||||
"description": "Email",
|
||||
"page_number": 1,
|
||||
"label_bounding_box": [10, 40, 50, 60],
|
||||
"entry_bounding_box": [60, 40, 150, 60]
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
stream = self.create_json_stream(data)
|
||||
messages = get_bounding_box_messages(stream)
|
||||
self.assertTrue(any("SUCCESS" in msg for msg in messages))
|
||||
self.assertFalse(any("FAILURE" in msg for msg in messages))
|
||||
|
||||
def test_label_entry_intersection_same_field(self):
|
||||
"""Test intersection between label and entry of the same field"""
|
||||
data = {
|
||||
"form_fields": [
|
||||
{
|
||||
"description": "Name",
|
||||
"page_number": 1,
|
||||
"label_bounding_box": [10, 10, 60, 30],
|
||||
"entry_bounding_box": [50, 10, 150, 30] # Overlaps with label
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
stream = self.create_json_stream(data)
|
||||
messages = get_bounding_box_messages(stream)
|
||||
self.assertTrue(any("FAILURE" in msg and "intersection" in msg for msg in messages))
|
||||
self.assertFalse(any("SUCCESS" in msg for msg in messages))
|
||||
|
||||
def test_intersection_between_different_fields(self):
|
||||
"""Test intersection between bounding boxes of different fields"""
|
||||
data = {
|
||||
"form_fields": [
|
||||
{
|
||||
"description": "Name",
|
||||
"page_number": 1,
|
||||
"label_bounding_box": [10, 10, 50, 30],
|
||||
"entry_bounding_box": [60, 10, 150, 30]
|
||||
},
|
||||
{
|
||||
"description": "Email",
|
||||
"page_number": 1,
|
||||
"label_bounding_box": [40, 20, 80, 40], # Overlaps with Name's boxes
|
||||
"entry_bounding_box": [160, 10, 250, 30]
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
stream = self.create_json_stream(data)
|
||||
messages = get_bounding_box_messages(stream)
|
||||
self.assertTrue(any("FAILURE" in msg and "intersection" in msg for msg in messages))
|
||||
self.assertFalse(any("SUCCESS" in msg for msg in messages))
|
||||
|
||||
def test_different_pages_no_intersection(self):
|
||||
"""Test that boxes on different pages don't count as intersecting"""
|
||||
data = {
|
||||
"form_fields": [
|
||||
{
|
||||
"description": "Name",
|
||||
"page_number": 1,
|
||||
"label_bounding_box": [10, 10, 50, 30],
|
||||
"entry_bounding_box": [60, 10, 150, 30]
|
||||
},
|
||||
{
|
||||
"description": "Email",
|
||||
"page_number": 2,
|
||||
"label_bounding_box": [10, 10, 50, 30], # Same coordinates but different page
|
||||
"entry_bounding_box": [60, 10, 150, 30]
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
stream = self.create_json_stream(data)
|
||||
messages = get_bounding_box_messages(stream)
|
||||
self.assertTrue(any("SUCCESS" in msg for msg in messages))
|
||||
self.assertFalse(any("FAILURE" in msg for msg in messages))
|
||||
|
||||
def test_entry_height_too_small(self):
|
||||
"""Test that entry box height is checked against font size"""
|
||||
data = {
|
||||
"form_fields": [
|
||||
{
|
||||
"description": "Name",
|
||||
"page_number": 1,
|
||||
"label_bounding_box": [10, 10, 50, 30],
|
||||
"entry_bounding_box": [60, 10, 150, 20], # Height is 10
|
||||
"entry_text": {
|
||||
"font_size": 14 # Font size larger than height
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
stream = self.create_json_stream(data)
|
||||
messages = get_bounding_box_messages(stream)
|
||||
self.assertTrue(any("FAILURE" in msg and "height" in msg for msg in messages))
|
||||
self.assertFalse(any("SUCCESS" in msg for msg in messages))
|
||||
|
||||
def test_entry_height_adequate(self):
|
||||
"""Test that adequate entry box height passes"""
|
||||
data = {
|
||||
"form_fields": [
|
||||
{
|
||||
"description": "Name",
|
||||
"page_number": 1,
|
||||
"label_bounding_box": [10, 10, 50, 30],
|
||||
"entry_bounding_box": [60, 10, 150, 30], # Height is 20
|
||||
"entry_text": {
|
||||
"font_size": 14 # Font size smaller than height
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
stream = self.create_json_stream(data)
|
||||
messages = get_bounding_box_messages(stream)
|
||||
self.assertTrue(any("SUCCESS" in msg for msg in messages))
|
||||
self.assertFalse(any("FAILURE" in msg for msg in messages))
|
||||
|
||||
def test_default_font_size(self):
|
||||
"""Test that default font size is used when not specified"""
|
||||
data = {
|
||||
"form_fields": [
|
||||
{
|
||||
"description": "Name",
|
||||
"page_number": 1,
|
||||
"label_bounding_box": [10, 10, 50, 30],
|
||||
"entry_bounding_box": [60, 10, 150, 20], # Height is 10
|
||||
"entry_text": {} # No font_size specified, should use default 14
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
stream = self.create_json_stream(data)
|
||||
messages = get_bounding_box_messages(stream)
|
||||
self.assertTrue(any("FAILURE" in msg and "height" in msg for msg in messages))
|
||||
self.assertFalse(any("SUCCESS" in msg for msg in messages))
|
||||
|
||||
def test_no_entry_text(self):
|
||||
"""Test that missing entry_text doesn't cause height check"""
|
||||
data = {
|
||||
"form_fields": [
|
||||
{
|
||||
"description": "Name",
|
||||
"page_number": 1,
|
||||
"label_bounding_box": [10, 10, 50, 30],
|
||||
"entry_bounding_box": [60, 10, 150, 20] # Small height but no entry_text
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
stream = self.create_json_stream(data)
|
||||
messages = get_bounding_box_messages(stream)
|
||||
self.assertTrue(any("SUCCESS" in msg for msg in messages))
|
||||
self.assertFalse(any("FAILURE" in msg for msg in messages))
|
||||
|
||||
def test_multiple_errors_limit(self):
|
||||
"""Test that error messages are limited to prevent excessive output"""
|
||||
fields = []
|
||||
# Create many overlapping fields
|
||||
for i in range(25):
|
||||
fields.append({
|
||||
"description": f"Field{i}",
|
||||
"page_number": 1,
|
||||
"label_bounding_box": [10, 10, 50, 30], # All overlap
|
||||
"entry_bounding_box": [20, 15, 60, 35] # All overlap
|
||||
})
|
||||
|
||||
data = {"form_fields": fields}
|
||||
|
||||
stream = self.create_json_stream(data)
|
||||
messages = get_bounding_box_messages(stream)
|
||||
# Should abort after ~20 messages
|
||||
self.assertTrue(any("Aborting" in msg for msg in messages))
|
||||
# Should have some FAILURE messages but not hundreds
|
||||
failure_count = sum(1 for msg in messages if "FAILURE" in msg)
|
||||
self.assertGreater(failure_count, 0)
|
||||
self.assertLess(len(messages), 30) # Should be limited
|
||||
|
||||
def test_edge_touching_boxes(self):
|
||||
"""Test that boxes touching at edges don't count as intersecting"""
|
||||
data = {
|
||||
"form_fields": [
|
||||
{
|
||||
"description": "Name",
|
||||
"page_number": 1,
|
||||
"label_bounding_box": [10, 10, 50, 30],
|
||||
"entry_bounding_box": [50, 10, 150, 30] # Touches at x=50
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
stream = self.create_json_stream(data)
|
||||
messages = get_bounding_box_messages(stream)
|
||||
self.assertTrue(any("SUCCESS" in msg for msg in messages))
|
||||
self.assertFalse(any("FAILURE" in msg for msg in messages))
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
unittest.main()
|
||||
@@ -0,0 +1,12 @@
|
||||
import sys
|
||||
from pypdf import PdfReader
|
||||
|
||||
|
||||
# Script for Claude to run to determine whether a PDF has fillable form fields. See forms.md.
|
||||
|
||||
|
||||
reader = PdfReader(sys.argv[1])
|
||||
if (reader.get_fields()):
|
||||
print("This PDF has fillable form fields")
|
||||
else:
|
||||
print("This PDF does not have fillable form fields; you will need to visually determine where to enter data")
|
||||
@@ -0,0 +1,35 @@
|
||||
import os
|
||||
import sys
|
||||
|
||||
from pdf2image import convert_from_path
|
||||
|
||||
|
||||
# Converts each page of a PDF to a PNG image.
|
||||
|
||||
|
||||
def convert(pdf_path, output_dir, max_dim=1000):
|
||||
images = convert_from_path(pdf_path, dpi=200)
|
||||
|
||||
for i, image in enumerate(images):
|
||||
# Scale image if needed to keep width/height under `max_dim`
|
||||
width, height = image.size
|
||||
if width > max_dim or height > max_dim:
|
||||
scale_factor = min(max_dim / width, max_dim / height)
|
||||
new_width = int(width * scale_factor)
|
||||
new_height = int(height * scale_factor)
|
||||
image = image.resize((new_width, new_height))
|
||||
|
||||
image_path = os.path.join(output_dir, f"page_{i+1}.png")
|
||||
image.save(image_path)
|
||||
print(f"Saved page {i+1} as {image_path} (size: {image.size})")
|
||||
|
||||
print(f"Converted {len(images)} pages to PNG images")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
if len(sys.argv) != 3:
|
||||
print("Usage: convert_pdf_to_images.py [input pdf] [output directory]")
|
||||
sys.exit(1)
|
||||
pdf_path = sys.argv[1]
|
||||
output_directory = sys.argv[2]
|
||||
convert(pdf_path, output_directory)
|
||||
@@ -0,0 +1,41 @@
|
||||
import json
|
||||
import sys
|
||||
|
||||
from PIL import Image, ImageDraw
|
||||
|
||||
|
||||
# Creates "validation" images with rectangles for the bounding box information that
|
||||
# Claude creates when determining where to add text annotations in PDFs. See forms.md.
|
||||
|
||||
|
||||
def create_validation_image(page_number, fields_json_path, input_path, output_path):
|
||||
# Input file should be in the `fields.json` format described in forms.md.
|
||||
with open(fields_json_path, 'r') as f:
|
||||
data = json.load(f)
|
||||
|
||||
img = Image.open(input_path)
|
||||
draw = ImageDraw.Draw(img)
|
||||
num_boxes = 0
|
||||
|
||||
for field in data["form_fields"]:
|
||||
if field["page_number"] == page_number:
|
||||
entry_box = field['entry_bounding_box']
|
||||
label_box = field['label_bounding_box']
|
||||
# Draw red rectangle over entry bounding box and blue rectangle over the label.
|
||||
draw.rectangle(entry_box, outline='red', width=2)
|
||||
draw.rectangle(label_box, outline='blue', width=2)
|
||||
num_boxes += 2
|
||||
|
||||
img.save(output_path)
|
||||
print(f"Created validation image at {output_path} with {num_boxes} bounding boxes")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
if len(sys.argv) != 5:
|
||||
print("Usage: create_validation_image.py [page number] [fields.json file] [input image path] [output image path]")
|
||||
sys.exit(1)
|
||||
page_number = int(sys.argv[1])
|
||||
fields_json_path = sys.argv[2]
|
||||
input_image_path = sys.argv[3]
|
||||
output_image_path = sys.argv[4]
|
||||
create_validation_image(page_number, fields_json_path, input_image_path, output_image_path)
|
||||
152
skills/public/pdf-processing/scripts/extract_form_field_info.py
Normal file
152
skills/public/pdf-processing/scripts/extract_form_field_info.py
Normal file
@@ -0,0 +1,152 @@
|
||||
import json
|
||||
import sys
|
||||
|
||||
from pypdf import PdfReader
|
||||
|
||||
|
||||
# Extracts data for the fillable form fields in a PDF and outputs JSON that
|
||||
# Claude uses to fill the fields. See forms.md.
|
||||
|
||||
|
||||
# This matches the format used by PdfReader `get_fields` and `update_page_form_field_values` methods.
|
||||
def get_full_annotation_field_id(annotation):
|
||||
components = []
|
||||
while annotation:
|
||||
field_name = annotation.get('/T')
|
||||
if field_name:
|
||||
components.append(field_name)
|
||||
annotation = annotation.get('/Parent')
|
||||
return ".".join(reversed(components)) if components else None
|
||||
|
||||
|
||||
def make_field_dict(field, field_id):
|
||||
field_dict = {"field_id": field_id}
|
||||
ft = field.get('/FT')
|
||||
if ft == "/Tx":
|
||||
field_dict["type"] = "text"
|
||||
elif ft == "/Btn":
|
||||
field_dict["type"] = "checkbox" # radio groups handled separately
|
||||
states = field.get("/_States_", [])
|
||||
if len(states) == 2:
|
||||
# "/Off" seems to always be the unchecked value, as suggested by
|
||||
# https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf#page=448
|
||||
# It can be either first or second in the "/_States_" list.
|
||||
if "/Off" in states:
|
||||
field_dict["checked_value"] = states[0] if states[0] != "/Off" else states[1]
|
||||
field_dict["unchecked_value"] = "/Off"
|
||||
else:
|
||||
print(f"Unexpected state values for checkbox `${field_id}`. Its checked and unchecked values may not be correct; if you're trying to check it, visually verify the results.")
|
||||
field_dict["checked_value"] = states[0]
|
||||
field_dict["unchecked_value"] = states[1]
|
||||
elif ft == "/Ch":
|
||||
field_dict["type"] = "choice"
|
||||
states = field.get("/_States_", [])
|
||||
field_dict["choice_options"] = [{
|
||||
"value": state[0],
|
||||
"text": state[1],
|
||||
} for state in states]
|
||||
else:
|
||||
field_dict["type"] = f"unknown ({ft})"
|
||||
return field_dict
|
||||
|
||||
|
||||
# Returns a list of fillable PDF fields:
|
||||
# [
|
||||
# {
|
||||
# "field_id": "name",
|
||||
# "page": 1,
|
||||
# "type": ("text", "checkbox", "radio_group", or "choice")
|
||||
# // Per-type additional fields described in forms.md
|
||||
# },
|
||||
# ]
|
||||
def get_field_info(reader: PdfReader):
|
||||
fields = reader.get_fields()
|
||||
|
||||
field_info_by_id = {}
|
||||
possible_radio_names = set()
|
||||
|
||||
for field_id, field in fields.items():
|
||||
# Skip if this is a container field with children, except that it might be
|
||||
# a parent group for radio button options.
|
||||
if field.get("/Kids"):
|
||||
if field.get("/FT") == "/Btn":
|
||||
possible_radio_names.add(field_id)
|
||||
continue
|
||||
field_info_by_id[field_id] = make_field_dict(field, field_id)
|
||||
|
||||
# Bounding rects are stored in annotations in page objects.
|
||||
|
||||
# Radio button options have a separate annotation for each choice;
|
||||
# all choices have the same field name.
|
||||
# See https://westhealth.github.io/exploring-fillable-forms-with-pdfrw.html
|
||||
radio_fields_by_id = {}
|
||||
|
||||
for page_index, page in enumerate(reader.pages):
|
||||
annotations = page.get('/Annots', [])
|
||||
for ann in annotations:
|
||||
field_id = get_full_annotation_field_id(ann)
|
||||
if field_id in field_info_by_id:
|
||||
field_info_by_id[field_id]["page"] = page_index + 1
|
||||
field_info_by_id[field_id]["rect"] = ann.get('/Rect')
|
||||
elif field_id in possible_radio_names:
|
||||
try:
|
||||
# ann['/AP']['/N'] should have two items. One of them is '/Off',
|
||||
# the other is the active value.
|
||||
on_values = [v for v in ann["/AP"]["/N"] if v != "/Off"]
|
||||
except KeyError:
|
||||
continue
|
||||
if len(on_values) == 1:
|
||||
rect = ann.get("/Rect")
|
||||
if field_id not in radio_fields_by_id:
|
||||
radio_fields_by_id[field_id] = {
|
||||
"field_id": field_id,
|
||||
"type": "radio_group",
|
||||
"page": page_index + 1,
|
||||
"radio_options": [],
|
||||
}
|
||||
# Note: at least on macOS 15.7, Preview.app doesn't show selected
|
||||
# radio buttons correctly. (It does if you remove the leading slash
|
||||
# from the value, but that causes them not to appear correctly in
|
||||
# Chrome/Firefox/Acrobat/etc).
|
||||
radio_fields_by_id[field_id]["radio_options"].append({
|
||||
"value": on_values[0],
|
||||
"rect": rect,
|
||||
})
|
||||
|
||||
# Some PDFs have form field definitions without corresponding annotations,
|
||||
# so we can't tell where they are. Ignore these fields for now.
|
||||
fields_with_location = []
|
||||
for field_info in field_info_by_id.values():
|
||||
if "page" in field_info:
|
||||
fields_with_location.append(field_info)
|
||||
else:
|
||||
print(f"Unable to determine location for field id: {field_info.get('field_id')}, ignoring")
|
||||
|
||||
# Sort by page number, then Y position (flipped in PDF coordinate system), then X.
|
||||
def sort_key(f):
|
||||
if "radio_options" in f:
|
||||
rect = f["radio_options"][0]["rect"] or [0, 0, 0, 0]
|
||||
else:
|
||||
rect = f.get("rect") or [0, 0, 0, 0]
|
||||
adjusted_position = [-rect[1], rect[0]]
|
||||
return [f.get("page"), adjusted_position]
|
||||
|
||||
sorted_fields = fields_with_location + list(radio_fields_by_id.values())
|
||||
sorted_fields.sort(key=sort_key)
|
||||
|
||||
return sorted_fields
|
||||
|
||||
|
||||
def write_field_info(pdf_path: str, json_output_path: str):
|
||||
reader = PdfReader(pdf_path)
|
||||
field_info = get_field_info(reader)
|
||||
with open(json_output_path, "w") as f:
|
||||
json.dump(field_info, f, indent=2)
|
||||
print(f"Wrote {len(field_info)} fields to {json_output_path}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
if len(sys.argv) != 3:
|
||||
print("Usage: extract_form_field_info.py [input pdf] [output json]")
|
||||
sys.exit(1)
|
||||
write_field_info(sys.argv[1], sys.argv[2])
|
||||
114
skills/public/pdf-processing/scripts/fill_fillable_fields.py
Normal file
114
skills/public/pdf-processing/scripts/fill_fillable_fields.py
Normal file
@@ -0,0 +1,114 @@
|
||||
import json
|
||||
import sys
|
||||
|
||||
from pypdf import PdfReader, PdfWriter
|
||||
|
||||
from extract_form_field_info import get_field_info
|
||||
|
||||
|
||||
# Fills fillable form fields in a PDF. See forms.md.
|
||||
|
||||
|
||||
def fill_pdf_fields(input_pdf_path: str, fields_json_path: str, output_pdf_path: str):
|
||||
with open(fields_json_path) as f:
|
||||
fields = json.load(f)
|
||||
# Group by page number.
|
||||
fields_by_page = {}
|
||||
for field in fields:
|
||||
if "value" in field:
|
||||
field_id = field["field_id"]
|
||||
page = field["page"]
|
||||
if page not in fields_by_page:
|
||||
fields_by_page[page] = {}
|
||||
fields_by_page[page][field_id] = field["value"]
|
||||
|
||||
reader = PdfReader(input_pdf_path)
|
||||
|
||||
has_error = False
|
||||
field_info = get_field_info(reader)
|
||||
fields_by_ids = {f["field_id"]: f for f in field_info}
|
||||
for field in fields:
|
||||
existing_field = fields_by_ids.get(field["field_id"])
|
||||
if not existing_field:
|
||||
has_error = True
|
||||
print(f"ERROR: `{field['field_id']}` is not a valid field ID")
|
||||
elif field["page"] != existing_field["page"]:
|
||||
has_error = True
|
||||
print(f"ERROR: Incorrect page number for `{field['field_id']}` (got {field['page']}, expected {existing_field['page']})")
|
||||
else:
|
||||
if "value" in field:
|
||||
err = validation_error_for_field_value(existing_field, field["value"])
|
||||
if err:
|
||||
print(err)
|
||||
has_error = True
|
||||
if has_error:
|
||||
sys.exit(1)
|
||||
|
||||
writer = PdfWriter(clone_from=reader)
|
||||
for page, field_values in fields_by_page.items():
|
||||
writer.update_page_form_field_values(writer.pages[page - 1], field_values, auto_regenerate=False)
|
||||
|
||||
# This seems to be necessary for many PDF viewers to format the form values correctly.
|
||||
# It may cause the viewer to show a "save changes" dialog even if the user doesn't make any changes.
|
||||
writer.set_need_appearances_writer(True)
|
||||
|
||||
with open(output_pdf_path, "wb") as f:
|
||||
writer.write(f)
|
||||
|
||||
|
||||
def validation_error_for_field_value(field_info, field_value):
|
||||
field_type = field_info["type"]
|
||||
field_id = field_info["field_id"]
|
||||
if field_type == "checkbox":
|
||||
checked_val = field_info["checked_value"]
|
||||
unchecked_val = field_info["unchecked_value"]
|
||||
if field_value != checked_val and field_value != unchecked_val:
|
||||
return f'ERROR: Invalid value "{field_value}" for checkbox field "{field_id}". The checked value is "{checked_val}" and the unchecked value is "{unchecked_val}"'
|
||||
elif field_type == "radio_group":
|
||||
option_values = [opt["value"] for opt in field_info["radio_options"]]
|
||||
if field_value not in option_values:
|
||||
return f'ERROR: Invalid value "{field_value}" for radio group field "{field_id}". Valid values are: {option_values}'
|
||||
elif field_type == "choice":
|
||||
choice_values = [opt["value"] for opt in field_info["choice_options"]]
|
||||
if field_value not in choice_values:
|
||||
return f'ERROR: Invalid value "{field_value}" for choice field "{field_id}". Valid values are: {choice_values}'
|
||||
return None
|
||||
|
||||
|
||||
# pypdf (at least version 5.7.0) has a bug when setting the value for a selection list field.
|
||||
# In _writer.py around line 966:
|
||||
#
|
||||
# if field.get(FA.FT, "/Tx") == "/Ch" and field_flags & FA.FfBits.Combo == 0:
|
||||
# txt = "\n".join(annotation.get_inherited(FA.Opt, []))
|
||||
#
|
||||
# The problem is that for selection lists, `get_inherited` returns a list of two-element lists like
|
||||
# [["value1", "Text 1"], ["value2", "Text 2"], ...]
|
||||
# This causes `join` to throw a TypeError because it expects an iterable of strings.
|
||||
# The horrible workaround is to patch `get_inherited` to return a list of the value strings.
|
||||
# We call the original method and adjust the return value only if the argument to `get_inherited`
|
||||
# is `FA.Opt` and if the return value is a list of two-element lists.
|
||||
def monkeypatch_pydpf_method():
|
||||
from pypdf.generic import DictionaryObject
|
||||
from pypdf.constants import FieldDictionaryAttributes
|
||||
|
||||
original_get_inherited = DictionaryObject.get_inherited
|
||||
|
||||
def patched_get_inherited(self, key: str, default = None):
|
||||
result = original_get_inherited(self, key, default)
|
||||
if key == FieldDictionaryAttributes.Opt:
|
||||
if isinstance(result, list) and all(isinstance(v, list) and len(v) == 2 for v in result):
|
||||
result = [r[0] for r in result]
|
||||
return result
|
||||
|
||||
DictionaryObject.get_inherited = patched_get_inherited
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
if len(sys.argv) != 4:
|
||||
print("Usage: fill_fillable_fields.py [input pdf] [field_values.json] [output pdf]")
|
||||
sys.exit(1)
|
||||
monkeypatch_pydpf_method()
|
||||
input_pdf = sys.argv[1]
|
||||
fields_json = sys.argv[2]
|
||||
output_pdf = sys.argv[3]
|
||||
fill_pdf_fields(input_pdf, fields_json, output_pdf)
|
||||
@@ -0,0 +1,108 @@
|
||||
import json
|
||||
import sys
|
||||
|
||||
from pypdf import PdfReader, PdfWriter
|
||||
from pypdf.annotations import FreeText
|
||||
|
||||
|
||||
# Fills a PDF by adding text annotations defined in `fields.json`. See forms.md.
|
||||
|
||||
|
||||
def transform_coordinates(bbox, image_width, image_height, pdf_width, pdf_height):
|
||||
"""Transform bounding box from image coordinates to PDF coordinates"""
|
||||
# Image coordinates: origin at top-left, y increases downward
|
||||
# PDF coordinates: origin at bottom-left, y increases upward
|
||||
x_scale = pdf_width / image_width
|
||||
y_scale = pdf_height / image_height
|
||||
|
||||
left = bbox[0] * x_scale
|
||||
right = bbox[2] * x_scale
|
||||
|
||||
# Flip Y coordinates for PDF
|
||||
top = pdf_height - (bbox[1] * y_scale)
|
||||
bottom = pdf_height - (bbox[3] * y_scale)
|
||||
|
||||
return left, bottom, right, top
|
||||
|
||||
|
||||
def fill_pdf_form(input_pdf_path, fields_json_path, output_pdf_path):
|
||||
"""Fill the PDF form with data from fields.json"""
|
||||
|
||||
# `fields.json` format described in forms.md.
|
||||
with open(fields_json_path, "r") as f:
|
||||
fields_data = json.load(f)
|
||||
|
||||
# Open the PDF
|
||||
reader = PdfReader(input_pdf_path)
|
||||
writer = PdfWriter()
|
||||
|
||||
# Copy all pages to writer
|
||||
writer.append(reader)
|
||||
|
||||
# Get PDF dimensions for each page
|
||||
pdf_dimensions = {}
|
||||
for i, page in enumerate(reader.pages):
|
||||
mediabox = page.mediabox
|
||||
pdf_dimensions[i + 1] = [mediabox.width, mediabox.height]
|
||||
|
||||
# Process each form field
|
||||
annotations = []
|
||||
for field in fields_data["form_fields"]:
|
||||
page_num = field["page_number"]
|
||||
|
||||
# Get page dimensions and transform coordinates.
|
||||
page_info = next(p for p in fields_data["pages"] if p["page_number"] == page_num)
|
||||
image_width = page_info["image_width"]
|
||||
image_height = page_info["image_height"]
|
||||
pdf_width, pdf_height = pdf_dimensions[page_num]
|
||||
|
||||
transformed_entry_box = transform_coordinates(
|
||||
field["entry_bounding_box"],
|
||||
image_width, image_height,
|
||||
pdf_width, pdf_height
|
||||
)
|
||||
|
||||
# Skip empty fields
|
||||
if "entry_text" not in field or "text" not in field["entry_text"]:
|
||||
continue
|
||||
entry_text = field["entry_text"]
|
||||
text = entry_text["text"]
|
||||
if not text:
|
||||
continue
|
||||
|
||||
font_name = entry_text.get("font", "Arial")
|
||||
font_size = str(entry_text.get("font_size", 14)) + "pt"
|
||||
font_color = entry_text.get("font_color", "000000")
|
||||
|
||||
# Font size/color seems to not work reliably across viewers:
|
||||
# https://github.com/py-pdf/pypdf/issues/2084
|
||||
annotation = FreeText(
|
||||
text=text,
|
||||
rect=transformed_entry_box,
|
||||
font=font_name,
|
||||
font_size=font_size,
|
||||
font_color=font_color,
|
||||
border_color=None,
|
||||
background_color=None,
|
||||
)
|
||||
annotations.append(annotation)
|
||||
# page_number is 0-based for pypdf
|
||||
writer.add_annotation(page_number=page_num - 1, annotation=annotation)
|
||||
|
||||
# Save the filled PDF
|
||||
with open(output_pdf_path, "wb") as output:
|
||||
writer.write(output)
|
||||
|
||||
print(f"Successfully filled PDF form and saved to {output_pdf_path}")
|
||||
print(f"Added {len(annotations)} text annotations")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
if len(sys.argv) != 4:
|
||||
print("Usage: fill_pdf_form_with_annotations.py [input pdf] [fields.json] [output pdf]")
|
||||
sys.exit(1)
|
||||
input_pdf = sys.argv[1]
|
||||
fields_json = sys.argv[2]
|
||||
output_pdf = sys.argv[3]
|
||||
|
||||
fill_pdf_form(input_pdf, fields_json, output_pdf)
|
||||
Reference in New Issue
Block a user