feat: add skills system for specialized agent workflows (#6)

Implement a skills framework that enables specialized workflows for
specific tasks (e.g., PDF processing, web page generation). Skills are
discovered from the skills/ directory and automatically mounted in
sandboxes with path mapping support.

- Add SkillsConfig for configuring skills path and container mount point
- Implement dynamic skill loading from SKILL.md files with YAML frontmatter
- Add path mapping in LocalSandbox to translate container paths to local paths
- Mount skills directory in AIO Docker sandbox containers
- Update lead agent prompt to dynamically inject available skills
- Add setup documentation and expand config.example.yaml

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
DanielWalnut
2026-01-16 14:44:51 +08:00
committed by GitHub
parent 5ef3cb57ee
commit cfa97f7a96
30 changed files with 2959 additions and 51 deletions


@@ -26,11 +26,20 @@ make format
### Configuration System
The app uses a YAML-based configuration system loaded from `config.yaml`. Configuration priority:
The app uses a YAML-based configuration system loaded from `config.yaml`.
**Setup**: Copy `config.example.yaml` to `config.yaml` in the **project root** directory and customize for your environment.
```bash
# From project root (deer-flow/)
cp config.example.yaml config.yaml
```
Configuration priority:
1. Explicit `config_path` argument
2. `DEER_FLOW_CONFIG_PATH` environment variable
3. `config.yaml` in current directory
4. `config.yaml` in parent directory
3. `config.yaml` in current directory (backend/)
4. `config.yaml` in parent directory (project root - **recommended location**)
Config values starting with `$` are resolved as environment variables (e.g., `$OPENAI_API_KEY`).
@@ -61,12 +70,25 @@ Config values starting with `$` are resolved as environment variables (e.g., `$O
- `resolve_variable()` imports module and returns variable (e.g., `module:variable`)
- `resolve_class()` imports and validates class against base class
**Skills System** (`src/skills/`)
- Skills provide specialized workflows for specific tasks (e.g., PDF processing, frontend design)
- Located in `deer-flow/skills/{public,custom}` directory structure
- Each skill has a `SKILL.md` file with YAML front matter (name, description, license)
- Skills are automatically discovered and loaded at runtime
- `load_skills()` scans directories and parses SKILL.md files
- Skills are injected into agent's system prompt with paths
- Path mapping system allows seamless access in both local and Docker sandbox:
  - Local sandbox: `/mnt/skills` → `/path/to/deer-flow/skills`
  - Docker sandbox: Automatically mounted as volume
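
The local-sandbox mapping described above can be sketched as a prefix translation. This is a minimal illustration, not the code from this commit: `resolve_path` is a hypothetical helper, the local root is assumed to be a POSIX path, and matching is done on whole path components so that a sibling such as `/mnt/skills-extra` is not mistaken for `/mnt/skills`.

```python
from pathlib import PurePosixPath


def resolve_path(path: str, mappings: dict[str, str]) -> str:
    """Translate a container path to a local path, matching whole components only."""
    candidate = PurePosixPath(path)
    # Try the longest container prefix first so more specific mappings win.
    for container_root, local_root in sorted(
        mappings.items(), key=lambda item: len(item[0]), reverse=True
    ):
        try:
            relative = candidate.relative_to(container_root)
        except ValueError:
            continue  # path is not located under this container root
        return str(PurePosixPath(local_root) / relative)
    # No mapping applies: return the path unchanged.
    return path
```

With `{"/mnt/skills": "/opt/deer-flow/skills"}` as the mapping, `/mnt/skills/public/pdf/SKILL.md` would resolve under `/opt/deer-flow/skills`, while unrelated paths pass through untouched.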
### Config Schema
Models, tools, and sandbox providers are configured in `config.yaml`:
Models, tools, sandbox providers, and skills are configured in `config.yaml`:
- `models[]`: LLM configurations with `use` class path
- `tools[]`: Tool configurations with `use` variable path and `group`
- `sandbox.use`: Sandbox provider class path
- `skills.path`: Host path to skills directory (optional, default: `../skills`)
- `skills.container_path`: Container mount path (default: `/mnt/skills`)
## Code Style

backend/SETUP.md Normal file

@@ -0,0 +1,76 @@
# Setup Guide
Quick setup instructions for DeerFlow.
## Configuration Setup
DeerFlow uses a YAML configuration file that should be placed in the **project root directory**.
### Steps
1. **Navigate to project root**:
```bash
cd /path/to/deer-flow
```
2. **Copy example configuration**:
```bash
cp config.example.yaml config.yaml
```
3. **Edit configuration**:
```bash
# Option A: Set environment variables (recommended)
export OPENAI_API_KEY="your-key-here"
# Option B: Edit config.yaml directly
vim config.yaml # or your preferred editor
```
4. **Verify configuration**:
```bash
cd backend
python -c "from src.config import get_app_config; print('✓ Config loaded:', get_app_config().models[0].name)"
```
## Important Notes
- **Location**: `config.yaml` should be in `deer-flow/` (project root), not `deer-flow/backend/`
- **Git**: `config.yaml` is automatically ignored by git (contains secrets)
- **Priority**: If both `backend/config.yaml` and `../config.yaml` exist, backend version takes precedence
## Configuration File Locations
The backend searches for `config.yaml` in this order:
1. `DEER_FLOW_CONFIG_PATH` environment variable (if set)
2. `backend/config.yaml` (current directory when running from backend/)
3. `deer-flow/config.yaml` (parent directory - **recommended location**)
**Recommended**: Place `config.yaml` in project root (`deer-flow/config.yaml`).
## Troubleshooting
### Config file not found
```bash
# Check where the backend is looking
cd deer-flow/backend
python -c "from src.config.app_config import AppConfig; print(AppConfig.resolve_config_path())"
```
If it can't find the config:
1. Ensure you've copied `config.example.yaml` to `config.yaml`
2. Verify you're in the correct directory
3. Check the file exists: `ls -la ../config.yaml`
### Permission denied
```bash
chmod 600 ../config.yaml # Protect sensitive configuration
```
## See Also
- [Configuration Guide](docs/CONFIGURATION.md) - Detailed configuration options
- [Architecture Overview](CLAUDE.md) - System architecture


@@ -0,0 +1,221 @@
# Configuration Guide
This guide explains how to configure DeerFlow for your environment.
## Quick Start
1. **Copy the example configuration** (from project root):
```bash
# From project root directory (deer-flow/)
cp config.example.yaml config.yaml
```
2. **Set your API keys**:
Option A: Use environment variables (recommended):
```bash
export OPENAI_API_KEY="your-api-key-here"
export ANTHROPIC_API_KEY="your-api-key-here"
# Add other keys as needed
```
Option B: Edit `config.yaml` directly (not recommended for production):
```yaml
models:
  - name: gpt-4
    api_key: your-actual-api-key-here # Replace placeholder
```
3. **Start the application**:
```bash
make dev
```
## Configuration Sections
### Models
Configure the LLM models available to the agent:
```yaml
models:
  - name: gpt-4 # Internal identifier
    display_name: GPT-4 # Human-readable name
    use: langchain_openai:ChatOpenAI # LangChain class path
    model: gpt-4 # Model identifier for API
    api_key: $OPENAI_API_KEY # API key (use env var)
    max_tokens: 4096 # Max tokens per request
    temperature: 0.7 # Sampling temperature
```
**Supported Providers**:
- OpenAI (`langchain_openai:ChatOpenAI`)
- Anthropic (`langchain_anthropic:ChatAnthropic`)
- DeepSeek (`langchain_deepseek:ChatDeepSeek`)
- Any LangChain-compatible provider
**Thinking Models**:
Some models support "thinking" mode for complex reasoning:
```yaml
models:
  - name: deepseek-v3
    supports_thinking: true
    when_thinking_enabled:
      extra_body:
        thinking:
          type: enabled
```
### Tool Groups
Organize tools into logical groups:
```yaml
tool_groups:
  - name: web # Web browsing and search
  - name: file:read # Read-only file operations
  - name: file:write # Write file operations
  - name: bash # Shell command execution
```
### Tools
Configure specific tools available to the agent:
```yaml
tools:
  - name: web_search
    group: web
    use: src.community.tavily.tools:web_search_tool
    max_results: 5
    # api_key: $TAVILY_API_KEY # Optional
```
**Built-in Tools**:
- `web_search` - Search the web (Tavily)
- `web_fetch` - Fetch web pages (Jina AI)
- `ls` - List directory contents
- `read_file` - Read file contents
- `write_file` - Write file contents
- `str_replace` - String replacement in files
- `bash` - Execute bash commands
### Sandbox
Choose between local execution or Docker-based isolation:
**Option 1: Local Sandbox** (default, simpler setup):
```yaml
sandbox:
  use: src.sandbox.local:LocalSandboxProvider
```
**Option 2: Docker Sandbox** (isolated, more secure):
```yaml
sandbox:
  use: src.community.aio_sandbox:AioSandboxProvider
  port: 8080
  auto_start: true
  container_prefix: deer-flow-sandbox
  # Optional: Additional mounts
  mounts:
    - host_path: /path/on/host
      container_path: /path/in/container
      read_only: false
```
### Skills
Configure the skills directory for specialized workflows:
```yaml
skills:
  # Host path (optional, default: ../skills)
  path: /custom/path/to/skills
  # Container mount path (default: /mnt/skills)
  container_path: /mnt/skills
```
**How Skills Work**:
- Skills are stored in `deer-flow/skills/{public,custom}/`
- Each skill has a `SKILL.md` file with metadata
- Skills are automatically discovered and loaded
- Available in both local and Docker sandbox via path mapping
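
The discovery step in the list above can be sketched as a two-level directory scan. This is an illustrative sketch rather than the project's loader: `find_skill_files` and `demo` are hypothetical names, and the temporary directory only stands in for `deer-flow/skills/`.

```python
import tempfile
from pathlib import Path


def find_skill_files(skills_root: Path) -> list[Path]:
    """Return SKILL.md paths found at <root>/{public,custom}/<skill>/SKILL.md."""
    found = []
    for category in ("public", "custom"):
        category_dir = skills_root / category
        if not category_dir.is_dir():
            continue  # a missing category directory is simply skipped
        for skill_dir in category_dir.iterdir():
            skill_file = skill_dir / "SKILL.md"
            if skill_dir.is_dir() and skill_file.is_file():
                found.append(skill_file)
    return sorted(found)


def demo() -> list[str]:
    """Build a throwaway skills tree and list the skills that would be discovered."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        skill_dir = root / "public" / "pdf-processing"
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text("---\nname: pdf-processing\n---\n")
        # A directory without SKILL.md is not treated as a skill.
        (root / "custom" / "drafts").mkdir(parents=True)
        return [p.relative_to(root).as_posix() for p in find_skill_files(root)]
```

A directory only counts as a skill when it contains `SKILL.md`, which is why the empty `custom/drafts` directory in the demo is ignored.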
### Title Generation
Automatic conversation title generation:
```yaml
title:
  enabled: true
  max_words: 6
  max_chars: 60
  model_name: null # Use first model in list
```
## Environment Variables
DeerFlow supports environment variable substitution using the `$` prefix:
```yaml
models:
  - api_key: $OPENAI_API_KEY # Reads from environment
```
**Common Environment Variables**:
- `OPENAI_API_KEY` - OpenAI API key
- `ANTHROPIC_API_KEY` - Anthropic API key
- `DEEPSEEK_API_KEY` - DeepSeek API key
- `TAVILY_API_KEY` - Tavily search API key
- `DEER_FLOW_CONFIG_PATH` - Custom config file path
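
As a rough sketch of the `$` substitution described above: the helper below is hypothetical, and raising on an unset variable is an assumption, since the guide does not say how DeerFlow handles that case.

```python
import os


def resolve_env_value(value: str) -> str:
    """Return the environment variable named by a `$NAME` value; pass other values through."""
    if value.startswith("$"):
        name = value[1:]
        if name not in os.environ:
            # Assumption: fail loudly rather than silently using an empty key.
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return value
```

Values without the `$` prefix are returned unchanged, so literal keys written directly in `config.yaml` keep working.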
## Configuration Location
The configuration file should be placed in the **project root directory** (`deer-flow/config.yaml`), not in the backend directory.
## Configuration Priority
DeerFlow searches for configuration in this order:
1. Path specified in code via `config_path` argument
2. Path from `DEER_FLOW_CONFIG_PATH` environment variable
3. `config.yaml` in current working directory (typically `backend/` when running)
4. `config.yaml` in parent directory (project root: `deer-flow/`)
## Best Practices
1. **Place `config.yaml` in project root** - Not in `backend/` directory
2. **Never commit `config.yaml`** - It's already in `.gitignore`
3. **Use environment variables for secrets** - Don't hardcode API keys
4. **Keep `config.example.yaml` updated** - Document all new options
5. **Test configuration changes locally** - Before deploying
6. **Use Docker sandbox for production** - Better isolation and security
## Troubleshooting
### "Config file not found"
- Ensure `config.yaml` exists in the **project root** directory (`deer-flow/config.yaml`)
- The backend checks the current working directory first and then its parent, so a `config.yaml` in the project root is found when running from `backend/`
- Alternatively, set `DEER_FLOW_CONFIG_PATH` environment variable to custom location
### "Invalid API key"
- Verify environment variables are set correctly
- Check that `$` prefix is used for env var references
### "Skills not loading"
- Check that `deer-flow/skills/` directory exists
- Verify skills have valid `SKILL.md` files
- Check `skills.path` configuration if using custom path
### "Docker sandbox fails to start"
- Ensure Docker is running
- Check port 8080 (or configured port) is available
- Verify Docker image is accessible
## Examples
See `config.example.yaml` for complete examples of all configuration options.


@@ -1,6 +1,8 @@
from datetime import datetime
SYSTEM_PROMPT = f"""
from src.skills import load_skills
SYSTEM_PROMPT_TEMPLATE = """
<role>
You are DeerFlow 2.0, an open-source super agent.
</role>
@@ -14,19 +16,16 @@ You are DeerFlow 2.0, an open-source super agent.
You have access to skills that provide optimized workflows for specific tasks. Each skill contains best practices, frameworks, and references to additional resources.
**Progressive Loading Pattern:**
1. When a user query matches a skill's use case, immediately call `view` on the skill's main file located at `/mnt/skills/{"{skill_name}"}/SKILL.md`
1. When a user query matches a skill's use case, immediately call `view` on the skill's main file using the path attribute provided in the skill tag below
2. Read and understand the skill's workflow and instructions
3. The skill file contains references to external resources under the same folder
4. Load referenced resources only when needed during execution
5. Follow the skill's instructions precisely
**Skills are located at:** {skills_base_path}
<all_available_skills>
<skill name="generate-web-page">
Generate a web page or web application
</skill>
<skill name="pdf-processing">
Extract text, fill forms, merge PDFs (pypdf, pdfplumber)
</skill>
{skills_list}
</all_available_skills>
</skill_system>
@@ -64,4 +63,27 @@ All temporary work happens in `/mnt/user-data/workspace`. Final deliverables mus
def apply_prompt_template() -> str:
return SYSTEM_PROMPT + f"\n<current_date>{datetime.now().strftime('%Y-%m-%d, %A')}</current_date>"
# Load all available skills
skills = load_skills()
# Get skills container path from config
try:
from src.config import get_app_config
config = get_app_config()
container_base_path = config.skills.container_path
except Exception:
# Fallback to default if config fails
container_base_path = "/mnt/skills"
# Generate skills list XML with paths
skills_list = "\n".join(f'<skill name="{skill.name}" path="{skill.get_container_path(container_base_path)}">\n{skill.description}\n</skill>' for skill in skills)
# If no skills found, provide empty list
if not skills_list:
skills_list = "<!-- No skills available -->"
# Format the prompt with dynamic skills
prompt = SYSTEM_PROMPT_TEMPLATE.format(skills_list=skills_list, skills_base_path=container_base_path)
return prompt + f"\n<current_date>{datetime.now().strftime('%Y-%m-%d, %A')}</current_date>"
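
One point the generated skills list may need to handle: names and descriptions come from user-editable `SKILL.md` files, so characters such as `&`, `<`, or `"` could produce malformed tags in the prompt. A minimal sketch of escaping them with the standard library follows; `render_skill_tag` is a hypothetical helper and is not part of this commit.

```python
from xml.sax.saxutils import escape, quoteattr


def render_skill_tag(name: str, path: str, description: str) -> str:
    """Render one <skill> entry with attribute values and body text escaped."""
    # quoteattr() adds the surrounding quotes and escapes the attribute value.
    return (
        f"<skill name={quoteattr(name)} path={quoteattr(path)}>\n"
        f"{escape(description)}\n"
        "</skill>"
    )
```

Whether escaping is desirable here is a judgment call, since the prompt is read by a model rather than an XML parser.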


@@ -100,6 +100,26 @@ class AioSandboxProvider(SandboxProvider):
(str(thread_dir / "outputs"), f"{CONTAINER_USER_DATA_DIR}/outputs", False),
]
def _get_skills_mount(self) -> tuple[str, str, bool] | None:
"""Get the skills directory mount configuration.
Returns:
Tuple of (host_path, container_path, read_only) if skills directory exists,
None otherwise.
"""
try:
config = get_app_config()
skills_path = config.skills.get_skills_path()
container_path = config.skills.container_path
# Only mount if skills directory exists
if skills_path.exists():
return (str(skills_path), container_path, True) # Read-only mount for security
except Exception as e:
logger.warning(f"Could not setup skills mount: {e}")
return None
def _start_container(self, sandbox_id: str, port: int, extra_mounts: list[tuple[str, str, bool]] | None = None) -> str:
"""Start a new Docker container for the sandbox.
@@ -208,11 +228,17 @@ class AioSandboxProvider(SandboxProvider):
sandbox_id = str(uuid.uuid4())[:8]
# Get thread-specific mounts if thread_id is provided
extra_mounts = None
extra_mounts = []
if thread_id:
extra_mounts = self._get_thread_mounts(thread_id)
extra_mounts.extend(self._get_thread_mounts(thread_id))
logger.info(f"Adding thread mounts for thread {thread_id}: {extra_mounts}")
# Add skills mount if available
skills_mount = self._get_skills_mount()
if skills_mount:
extra_mounts.append(skills_mount)
logger.info(f"Adding skills mount: {skills_mount}")
# If base_url is configured, use existing sandbox
if self._config.get("base_url"):
base_url = self._config["base_url"]
@@ -230,7 +256,7 @@ class AioSandboxProvider(SandboxProvider):
raise RuntimeError("auto_start is disabled and no base_url is configured")
port = self._find_available_port(self._config["port"])
container_id = self._start_container(sandbox_id, port, extra_mounts=extra_mounts)
container_id = self._start_container(sandbox_id, port, extra_mounts=extra_mounts if extra_mounts else None)
self._containers[sandbox_id] = container_id
base_url = f"http://localhost:{port}"
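
The body of `_start_container` is not shown in this diff. Assuming it passes mounts to the Docker CLI, the `(host_path, container_path, read_only)` tuples could be turned into `-v` flags roughly as follows; `mounts_to_docker_args` is a hypothetical helper for illustration only.

```python
def mounts_to_docker_args(mounts: list[tuple[str, str, bool]]) -> list[str]:
    """Convert (host_path, container_path, read_only) tuples into docker -v arguments."""
    args: list[str] = []
    for host_path, container_path, read_only in mounts:
        spec = f"{host_path}:{container_path}"
        if read_only:
            spec += ":ro"  # read-only bind mount, as used for the skills directory
        args += ["-v", spec]
    return args
```

Mounting the skills directory read-only means code running in the sandbox cannot modify the skill files on the host.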


@@ -1,3 +1,4 @@
from .app_config import get_app_config
from .skills_config import SkillsConfig
__all__ = ["get_app_config"]
__all__ = ["get_app_config", "SkillsConfig"]


@@ -8,6 +8,7 @@ from pydantic import BaseModel, ConfigDict, Field
from src.config.model_config import ModelConfig
from src.config.sandbox_config import SandboxConfig
from src.config.skills_config import SkillsConfig
from src.config.title_config import load_title_config_from_dict
from src.config.tool_config import ToolConfig, ToolGroupConfig
@@ -21,6 +22,7 @@ class AppConfig(BaseModel):
sandbox: SandboxConfig = Field(description="Sandbox configuration")
tools: list[ToolConfig] = Field(default_factory=list, description="Available tools")
tool_groups: list[ToolGroupConfig] = Field(default_factory=list, description="Available tool groups")
skills: SkillsConfig = Field(default_factory=SkillsConfig, description="Skills configuration")
model_config = ConfigDict(extra="allow", frozen=False)
@classmethod


@@ -0,0 +1,49 @@
from pathlib import Path

from pydantic import BaseModel, Field


class SkillsConfig(BaseModel):
    """Configuration for skills system"""

    path: str | None = Field(
        default=None,
        description="Path to skills directory. If not specified, defaults to ../skills relative to backend directory",
    )
    container_path: str = Field(
        default="/mnt/skills",
        description="Path where skills are mounted in the sandbox container",
    )

    def get_skills_path(self) -> Path:
        """
        Get the resolved skills directory path.

        Returns:
            Path to the skills directory
        """
        if self.path:
            # Use configured path (can be absolute or relative)
            path = Path(self.path)
            if not path.is_absolute():
                # If relative, resolve from current working directory
                path = Path.cwd() / path
            return path.resolve()
        else:
            # Default: ../skills relative to backend directory
            from src.skills.loader import get_skills_root_path

            return get_skills_root_path()

    def get_skill_container_path(self, skill_name: str, category: str = "public") -> str:
        """
        Get the full container path for a specific skill.

        Args:
            skill_name: Name of the skill (directory name)
            category: Category of the skill (public or custom)

        Returns:
            Full path to the skill in the container
        """
        return f"{self.container_path}/{category}/{skill_name}"


@@ -1,13 +1,46 @@
import os
import subprocess
from pathlib import Path
from src.sandbox.local.list_dir import list_dir
from src.sandbox.sandbox import Sandbox
class LocalSandbox(Sandbox):
def __init__(self, id: str):
def __init__(self, id: str, path_mappings: dict[str, str] | None = None):
"""
Initialize local sandbox with optional path mappings.
Args:
id: Sandbox identifier
path_mappings: Dictionary mapping container paths to local paths
Example: {"/mnt/skills": "/absolute/path/to/skills"}
"""
super().__init__(id)
self.path_mappings = path_mappings or {}
def _resolve_path(self, path: str) -> str:
"""
Resolve container path to actual local path using mappings.
Args:
path: Path that might be a container path
Returns:
Resolved local path
"""
path_str = str(path)
# Try each mapping (longest prefix first for more specific matches)
for container_path, local_path in sorted(self.path_mappings.items(), key=lambda x: len(x[0]), reverse=True):
if path_str.startswith(container_path):
# Replace the container path prefix with local path
relative = path_str[len(container_path) :].lstrip("/")
resolved = str(Path(local_path) / relative) if relative else local_path
return resolved
# No mapping found, return original path
return path_str
def execute_command(self, command: str) -> str:
result = subprocess.run(
@@ -26,16 +59,19 @@ class LocalSandbox(Sandbox):
return output if output else "(no output)"
def list_dir(self, path: str, max_depth=2) -> list[str]:
return list_dir(path, max_depth)
resolved_path = self._resolve_path(path)
return list_dir(resolved_path, max_depth)
def read_file(self, path: str) -> str:
with open(path) as f:
resolved_path = self._resolve_path(path)
with open(resolved_path) as f:
return f.read()
def write_file(self, path: str, content: str, append: bool = False) -> None:
dir_path = os.path.dirname(path)
resolved_path = self._resolve_path(path)
dir_path = os.path.dirname(resolved_path)
if dir_path:
os.makedirs(dir_path, exist_ok=True)
mode = "a" if append else "w"
with open(path, mode) as f:
with open(resolved_path, mode) as f:
f.write(content)


@@ -5,10 +5,42 @@ _singleton: LocalSandbox | None = None
class LocalSandboxProvider(SandboxProvider):
def __init__(self):
"""Initialize the local sandbox provider with path mappings."""
self._path_mappings = self._setup_path_mappings()
def _setup_path_mappings(self) -> dict[str, str]:
"""
Setup path mappings for local sandbox.
Maps container paths to actual local paths, including skills directory.
Returns:
Dictionary of path mappings
"""
mappings = {}
# Map skills container path to local skills directory
try:
from src.config import get_app_config
config = get_app_config()
skills_path = config.skills.get_skills_path()
container_path = config.skills.container_path
# Only add mapping if skills directory exists
if skills_path.exists():
mappings[container_path] = str(skills_path)
except Exception as e:
# Log but don't fail if config loading fails
print(f"Warning: Could not setup skills path mapping: {e}")
return mappings
def acquire(self, thread_id: str | None = None) -> str:
global _singleton
if _singleton is None:
_singleton = LocalSandbox("local")
_singleton = LocalSandbox("local", path_mappings=self._path_mappings)
return _singleton.id
def get(self, sandbox_id: str) -> None:


@@ -0,0 +1,4 @@
from .loader import get_skills_root_path, load_skills
from .types import Skill
__all__ = ["load_skills", "get_skills_root_path", "Skill"]


@@ -0,0 +1,77 @@
from pathlib import Path

from .parser import parse_skill_file
from .types import Skill


def get_skills_root_path() -> Path:
    """
    Get the root path of the skills directory.

    Returns:
        Path to the skills directory (deer-flow/skills)
    """
    # backend directory is current file's parent's parent's parent
    backend_dir = Path(__file__).resolve().parent.parent.parent
    # skills directory is sibling to backend directory
    skills_dir = backend_dir.parent / "skills"
    return skills_dir


def load_skills(skills_path: Path | None = None, use_config: bool = True) -> list[Skill]:
    """
    Load all skills from the skills directory.

    Scans both public and custom skill directories, parsing SKILL.md files
    to extract metadata.

    Args:
        skills_path: Optional custom path to skills directory.
            If not provided and use_config is True, uses path from config.
            Otherwise defaults to deer-flow/skills
        use_config: Whether to load skills path from config (default: True)

    Returns:
        List of Skill objects, sorted by name
    """
    if skills_path is None:
        if use_config:
            try:
                from src.config import get_app_config

                config = get_app_config()
                skills_path = config.skills.get_skills_path()
            except Exception:
                # Fallback to default if config fails
                skills_path = get_skills_root_path()
        else:
            skills_path = get_skills_root_path()

    if not skills_path.exists():
        return []

    skills = []

    # Scan public and custom directories
    for category in ["public", "custom"]:
        category_path = skills_path / category
        if not category_path.exists() or not category_path.is_dir():
            continue

        # Each subdirectory is a potential skill
        for skill_dir in category_path.iterdir():
            if not skill_dir.is_dir():
                continue

            skill_file = skill_dir / "SKILL.md"
            if not skill_file.exists():
                continue

            skill = parse_skill_file(skill_file, category=category)
            if skill:
                skills.append(skill)

    # Sort by name for consistent ordering
    skills.sort(key=lambda s: s.name)
    return skills


@@ -0,0 +1,63 @@
import re
from pathlib import Path

from .types import Skill


def parse_skill_file(skill_file: Path, category: str) -> Skill | None:
    """
    Parse a SKILL.md file and extract metadata.

    Args:
        skill_file: Path to the SKILL.md file
        category: Category of the skill ('public' or 'custom')

    Returns:
        Skill object if parsing succeeds, None otherwise
    """
    if not skill_file.exists() or skill_file.name != "SKILL.md":
        return None

    try:
        content = skill_file.read_text(encoding="utf-8")

        # Extract YAML front matter
        # Pattern: ---\nkey: value\n---
        front_matter_match = re.match(r"^---\s*\n(.*?)\n---\s*\n", content, re.DOTALL)
        if not front_matter_match:
            return None

        front_matter = front_matter_match.group(1)

        # Parse YAML front matter (simple key-value parsing)
        metadata = {}
        for line in front_matter.split("\n"):
            line = line.strip()
            if not line:
                continue
            if ":" in line:
                key, value = line.split(":", 1)
                metadata[key.strip()] = value.strip()

        # Extract required fields
        name = metadata.get("name")
        description = metadata.get("description")
        if not name or not description:
            return None

        license_text = metadata.get("license")

        return Skill(
            name=name,
            description=description,
            license=license_text,
            skill_dir=skill_file.parent,
            skill_file=skill_file,
            category=category,
        )
    except Exception as e:
        print(f"Error parsing skill file {skill_file}: {e}")
        return None
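
The simple key-value parsing above keeps any quotes that surround a value, so `description: "Extract text"` would be stored with its quotation marks. A hedged variant that strips one pair of matching quotes is sketched below; `parse_front_matter` is a hypothetical standalone helper, and like the original it does not handle nested or multi-line YAML.

```python
import re

# Same front matter pattern as the parser above: a block delimited by --- lines.
FRONT_MATTER = re.compile(r"^---\s*\n(.*?)\n---\s*\n", re.DOTALL)


def parse_front_matter(text: str) -> dict[str, str]:
    """Parse flat `key: value` front matter, stripping one pair of surrounding quotes."""
    match = FRONT_MATTER.match(text)
    if not match:
        return {}
    metadata: dict[str, str] = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # skip blank lines and lines without a key
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        metadata[key.strip()] = value
    return metadata
```

Splitting on the first colon only means a description that itself contains a colon is kept intact.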


@@ -0,0 +1,34 @@
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Skill:
    """Represents a skill with its metadata and file path"""

    name: str
    description: str
    license: str | None
    skill_dir: Path
    skill_file: Path
    category: str  # 'public' or 'custom'

    @property
    def skill_path(self) -> str:
        """Returns the name of this skill's directory (the category is not included)"""
        return self.skill_dir.name

    def get_container_path(self, container_base_path: str = "/mnt/skills") -> str:
        """
        Get the full path to this skill in the container.

        Args:
            container_base_path: Base path where skills are mounted in the container

        Returns:
            Full container path to the skill directory
        """
        return f"{container_base_path}/{self.category}/{self.skill_dir.name}"

    def __repr__(self) -> str:
        return f"Skill(name={self.name!r}, description={self.description!r}, category={self.category!r})"