mirror of
https://gitee.com/wanwujie/deer-flow
synced 2026-04-02 22:02:13 +08:00
* refactor: extract shared utils to break harness→app cross-layer imports Move _validate_skill_frontmatter to src/skills/validation.py and CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py. This eliminates the two reverse dependencies from client.py (harness layer) into gateway/routers/ (app layer), preparing for the harness/app package split. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: split backend/src into harness (deerflow.*) and app (app.*) Physically split the monolithic backend/src/ package into two layers: - **Harness** (`packages/harness/deerflow/`): publishable agent framework package with import prefix `deerflow.*`. Contains agents, sandbox, tools, models, MCP, skills, config, and all core infrastructure. - **App** (`app/`): unpublished application code with import prefix `app.*`. Contains gateway (FastAPI REST API) and channels (IM integrations). Key changes: - Move 13 harness modules to packages/harness/deerflow/ via git mv - Move gateway + channels to app/ via git mv - Rename all imports: src.* → deerflow.* (harness) / app.* (app layer) - Set up uv workspace with deerflow-harness as workspace member - Update langgraph.json, config.example.yaml, all scripts, Docker files - Add build-system (hatchling) to harness pyproject.toml - Add PYTHONPATH=. to gateway startup commands for app.* resolution - Update ruff.toml with known-first-party for import sorting - Update all documentation to reflect new directory structure Boundary rule enforced: harness code never imports from app. All 429 tests pass. Lint clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: add harness→app boundary check test and update docs Add test_harness_boundary.py that scans all Python files in packages/harness/deerflow/ and fails if any `from app.*` or `import app.*` statement is found. This enforces the architectural rule that the harness layer never depends on the app layer. Update CLAUDE.md to document the harness/app split architecture, import conventions, and the boundary enforcement test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add config versioning with auto-upgrade on startup When config.example.yaml schema changes, developers' local config.yaml files can silently become outdated. This adds a config_version field and auto-upgrade mechanism so breaking changes (like src.* → deerflow.* renames) are applied automatically before services start. - Add config_version: 1 to config.example.yaml - Add startup version check warning in AppConfig.from_file() - Add scripts/config-upgrade.sh with migration registry for value replacements - Add `make config-upgrade` target - Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services - Add config error hints in service failure messages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix comments * fix: update src.* import in test_sandbox_tools_security to deerflow.* Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle empty config and search parent dirs for config.example.yaml Address Copilot review comments on PR #1131: - Guard against yaml.safe_load() returning None for empty config files - Search parent directories for config.example.yaml instead of only looking next to config.yaml, fixing detection in common setups Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct skills root path depth and config_version type coercion - loader.py: fix get_skills_root_path() to use 5 parent levels (was 3) after harness split, file lives at packages/harness/deerflow/skills/ so parent×3 resolved to backend/packages/harness/ instead of backend/ - app_config.py: coerce config_version to int() before comparison in _check_config_version() to prevent TypeError when YAML stores value as string (e.g. config_version: "1") - tests: add regression tests for both fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: update test imports from src.* to deerflow.*/app.* after harness refactor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
29 KiB
29 KiB
Architecture Overview
This document provides a comprehensive overview of the DeerFlow backend architecture.
System Architecture
┌──────────────────────────────────────────────────────────────────────────┐
│ Client (Browser) │
└─────────────────────────────────┬────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Nginx (Port 2026) │
│ Unified Reverse Proxy Entry Point │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ /api/langgraph/* → LangGraph Server (2024) │ │
│ │ /api/* → Gateway API (8001) │ │
│ │ /* → Frontend (3000) │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────┬────────────────────────────────────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│ LangGraph Server │ │ Gateway API │ │ Frontend │
│ (Port 2024) │ │ (Port 8001) │ │ (Port 3000) │
│ │ │ │ │ │
│ - Agent Runtime │ │ - Models API │ │ - Next.js App │
│ - Thread Mgmt │ │ - MCP Config │ │ - React UI │
│ - SSE Streaming │ │ - Skills Mgmt │ │ - Chat Interface │
│ - Checkpointing │ │ - File Uploads │ │ │
│ │ │ - Artifacts │ │ │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘
│ │
│ ┌─────────────────┘
│ │
▼ ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Shared Configuration │
│ ┌─────────────────────────┐ ┌────────────────────────────────────────┐ │
│ │ config.yaml │ │ extensions_config.json │ │
│ │ - Models │ │ - MCP Servers │ │
│ │ - Tools │ │ - Skills State │ │
│ │ - Sandbox │ │ │ │
│ │ - Summarization │ │ │ │
│ └─────────────────────────┘ └────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
Component Details
LangGraph Server
The LangGraph server is the core agent runtime, built on LangGraph for robust multi-agent workflow orchestration.
Entry Point: packages/harness/deerflow/agents/lead_agent/agent.py:make_lead_agent
Key Responsibilities:
- Agent creation and configuration
- Thread state management
- Middleware chain execution
- Tool execution orchestration
- SSE streaming for real-time responses
Configuration: langgraph.json
{
"agent": {
"type": "agent",
"path": "deerflow.agents:make_lead_agent"
}
}
Gateway API
FastAPI application providing REST endpoints for non-agent operations.
Entry Point: app/gateway/app.py
Routers:
models.py-/api/models- Model listing and detailsmcp.py-/api/mcp- MCP server configurationskills.py-/api/skills- Skills managementuploads.py-/api/threads/{id}/uploads- File uploadartifacts.py-/api/threads/{id}/artifacts- Artifact serving
Agent Architecture
┌─────────────────────────────────────────────────────────────────────────┐
│ make_lead_agent(config) │
└────────────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Middleware Chain │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ 1. ThreadDataMiddleware - Initialize workspace/uploads/outputs │ │
│ │ 2. UploadsMiddleware - Process uploaded files │ │
│ │ 3. SandboxMiddleware - Acquire sandbox environment │ │
│ │ 4. SummarizationMiddleware - Context reduction (if enabled) │ │
│ │ 5. TitleMiddleware - Auto-generate titles │ │
│ │ 6. TodoListMiddleware - Task tracking (if plan_mode) │ │
│ │ 7. ViewImageMiddleware - Vision model support │ │
│ │ 8. ClarificationMiddleware - Handle clarifications │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Agent Core │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────┐ │
│ │ Model │ │ Tools │ │ System Prompt │ │
│ │ (from factory) │ │ (configured + │ │ (with skills) │ │
│ │ │ │ MCP + builtin) │ │ │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Thread State
The ThreadState extends LangGraph's AgentState with additional fields:
class ThreadState(AgentState):
# Core state from AgentState
messages: list[BaseMessage]
# DeerFlow extensions
sandbox: dict # Sandbox environment info
artifacts: list[str] # Generated file paths
thread_data: dict # {workspace, uploads, outputs} paths
title: str | None # Auto-generated conversation title
todos: list[dict] # Task tracking (plan mode)
viewed_images: dict # Vision model image data
Sandbox System
┌─────────────────────────────────────────────────────────────────────────┐
│ Sandbox Architecture │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────┐
│ SandboxProvider │ (Abstract)
│ - acquire() │
│ - get() │
│ - release() │
└────────────┬────────────┘
│
┌────────────────────┼────────────────────┐
│ │
▼ ▼
┌─────────────────────────┐ ┌─────────────────────────┐
│ LocalSandboxProvider │ │ AioSandboxProvider │
│ (packages/harness/deerflow/sandbox/local.py) │ │ (packages/harness/deerflow/community/) │
│ │ │ │
│ - Singleton instance │ │ - Docker-based │
│ - Direct execution │ │ - Isolated containers │
│ - Development use │ │ - Production use │
└─────────────────────────┘ └─────────────────────────┘
┌─────────────────────────┐
│ Sandbox │ (Abstract)
│ - execute_command() │
│ - read_file() │
│ - write_file() │
│ - list_dir() │
└─────────────────────────┘
Virtual Path Mapping:
| Virtual Path | Physical Path |
|---|---|
/mnt/user-data/workspace |
backend/.deer-flow/threads/{thread_id}/user-data/workspace |
/mnt/user-data/uploads |
backend/.deer-flow/threads/{thread_id}/user-data/uploads |
/mnt/user-data/outputs |
backend/.deer-flow/threads/{thread_id}/user-data/outputs |
/mnt/skills |
deer-flow/skills/ |
Tool System
┌─────────────────────────────────────────────────────────────────────────┐
│ Tool Sources │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│ Built-in Tools │ │ Configured Tools │ │ MCP Tools │
│ (packages/harness/deerflow/tools/) │ │ (config.yaml) │ │ (extensions.json) │
├─────────────────────┤ ├─────────────────────┤ ├─────────────────────┤
│ - present_file │ │ - web_search │ │ - github │
│ - ask_clarification │ │ - web_fetch │ │ - filesystem │
│ - view_image │ │ - bash │ │ - postgres │
│ │ │ - read_file │ │ - brave-search │
│ │ │ - write_file │ │ - puppeteer │
│ │ │ - str_replace │ │ - ... │
│ │ │ - ls │ │ │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘
│ │ │
└───────────────────────┴───────────────────────┘
│
▼
┌─────────────────────────┐
│ get_available_tools() │
│ (packages/harness/deerflow/tools/__init__) │
└─────────────────────────┘
Model Factory
┌─────────────────────────────────────────────────────────────────────────┐
│ Model Factory │
│ (packages/harness/deerflow/models/factory.py) │
└─────────────────────────────────────────────────────────────────────────┘
config.yaml:
┌─────────────────────────────────────────────────────────────────────────┐
│ models: │
│ - name: gpt-4 │
│ display_name: GPT-4 │
│ use: langchain_openai:ChatOpenAI │
│ model: gpt-4 │
│ api_key: $OPENAI_API_KEY │
│ max_tokens: 4096 │
│ supports_thinking: false │
│ supports_vision: true │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────┐
│ create_chat_model() │
│ - name: str │
│ - thinking_enabled │
└────────────┬────────────┘
│
▼
┌─────────────────────────┐
│ resolve_class() │
│ (reflection system) │
└────────────┬────────────┘
│
▼
┌─────────────────────────┐
│ BaseChatModel │
│ (LangChain instance) │
└─────────────────────────┘
Supported Providers:
- OpenAI (
langchain_openai:ChatOpenAI) - Anthropic (
langchain_anthropic:ChatAnthropic) - DeepSeek (
langchain_deepseek:ChatDeepSeek) - Custom via LangChain integrations
MCP Integration
┌─────────────────────────────────────────────────────────────────────────┐
│ MCP Integration │
│ (packages/harness/deerflow/mcp/manager.py) │
└─────────────────────────────────────────────────────────────────────────┘
extensions_config.json:
┌─────────────────────────────────────────────────────────────────────────┐
│ { │
│ "mcpServers": { │
│ "github": { │
│ "enabled": true, │
│ "type": "stdio", │
│ "command": "npx", │
│ "args": ["-y", "@modelcontextprotocol/server-github"], │
│ "env": {"GITHUB_TOKEN": "$GITHUB_TOKEN"} │
│ } │
│ } │
│ } │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────┐
│ MultiServerMCPClient │
│ (langchain-mcp-adapters)│
└────────────┬────────────┘
│
┌────────────────────┼────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ stdio │ │ SSE │ │ HTTP │
│ transport │ │ transport │ │ transport │
└───────────┘ └───────────┘ └───────────┘
Skills System
┌─────────────────────────────────────────────────────────────────────────┐
│ Skills System │
│ (packages/harness/deerflow/skills/loader.py) │
└─────────────────────────────────────────────────────────────────────────┘
Directory Structure:
┌─────────────────────────────────────────────────────────────────────────┐
│ skills/ │
│ ├── public/ # Public skills (committed) │
│ │ ├── pdf-processing/ │
│ │ │ └── SKILL.md │
│ │ ├── frontend-design/ │
│ │ │ └── SKILL.md │
│ │ └── ... │
│ └── custom/ # Custom skills (gitignored) │
│ └── user-installed/ │
│ └── SKILL.md │
└─────────────────────────────────────────────────────────────────────────┘
SKILL.md Format:
┌─────────────────────────────────────────────────────────────────────────┐
│ --- │
│ name: PDF Processing │
│ description: Handle PDF documents efficiently │
│ license: MIT │
│ allowed-tools: │
│ - read_file │
│ - write_file │
│ - bash │
│ --- │
│ │
│ # Skill Instructions │
│ Content injected into system prompt... │
└─────────────────────────────────────────────────────────────────────────┘
Request Flow
┌─────────────────────────────────────────────────────────────────────────┐
│ Request Flow Example │
│ User sends message to agent │
└─────────────────────────────────────────────────────────────────────────┘
1. Client → Nginx
POST /api/langgraph/threads/{thread_id}/runs
{"input": {"messages": [{"role": "user", "content": "Hello"}]}}
2. Nginx → LangGraph Server (2024)
Proxied to LangGraph server
3. LangGraph Server
a. Load/create thread state
b. Execute middleware chain:
- ThreadDataMiddleware: Set up paths
- UploadsMiddleware: Inject file list
- SandboxMiddleware: Acquire sandbox
- SummarizationMiddleware: Check token limits
- TitleMiddleware: Generate title if needed
- TodoListMiddleware: Load todos (if plan mode)
- ViewImageMiddleware: Process images
- ClarificationMiddleware: Check for clarifications
c. Execute agent:
- Model processes messages
- May call tools (bash, web_search, etc.)
- Tools execute via sandbox
- Results added to messages
d. Stream response via SSE
4. Client receives streaming response
Data Flow
File Upload Flow
1. Client uploads file
POST /api/threads/{thread_id}/uploads
Content-Type: multipart/form-data
2. Gateway receives file
- Validates file
- Stores in .deer-flow/threads/{thread_id}/user-data/uploads/
- If document: converts to Markdown via markitdown
3. Returns response
{
"files": [{
"filename": "doc.pdf",
"path": ".deer-flow/.../uploads/doc.pdf",
"virtual_path": "/mnt/user-data/uploads/doc.pdf",
"artifact_url": "/api/threads/.../artifacts/mnt/.../doc.pdf"
}]
}
4. Next agent run
- UploadsMiddleware lists files
- Injects file list into messages
- Agent can access via virtual_path
Configuration Reload
1. Client updates MCP config
PUT /api/mcp/config
2. Gateway writes extensions_config.json
- Updates mcpServers section
- File mtime changes
3. MCP Manager detects change
- get_cached_mcp_tools() checks mtime
- If changed: reinitializes MCP client
- Loads updated server configurations
4. Next agent run uses new tools
Security Considerations
Sandbox Isolation
- Agent code executes within sandbox boundaries
- Local sandbox: Direct execution (development only)
- Docker sandbox: Container isolation (production recommended)
- Path traversal prevention in file operations
API Security
- Thread isolation: Each thread has separate data directories
- File validation: Uploads checked for path safety
- Environment variable resolution: Secrets not stored in config
MCP Security
- Each MCP server runs in its own process
- Environment variables resolved at runtime
- Servers can be enabled/disabled independently
Performance Considerations
Caching
- MCP tools cached with file mtime invalidation
- Configuration loaded once, reloaded on file change
- Skills parsed once at startup, cached in memory
Streaming
- SSE used for real-time response streaming
- Reduces time to first token
- Enables progress visibility for long operations
Context Management
- Summarization middleware reduces context when limits approached
- Configurable triggers: tokens, messages, or fraction
- Preserves recent messages while summarizing older ones