# Architecture Overview This document provides a comprehensive overview of the DeerFlow backend architecture. ## System Architecture ``` ┌──────────────────────────────────────────────────────────────────────────┐ │ Client (Browser) │ └─────────────────────────────────┬────────────────────────────────────────┘ │ ▼ ┌──────────────────────────────────────────────────────────────────────────┐ │ Nginx (Port 2026) │ │ Unified Reverse Proxy Entry Point │ │ ┌────────────────────────────────────────────────────────────────────┐ │ │ │ /api/langgraph/* → LangGraph Server (2024) │ │ │ │ /api/* → Gateway API (8001) │ │ │ │ /* → Frontend (3000) │ │ │ └────────────────────────────────────────────────────────────────────┘ │ └─────────────────────────────────┬────────────────────────────────────────┘ │ ┌───────────────────────┼───────────────────────┐ │ │ │ ▼ ▼ ▼ ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐ │ LangGraph Server │ │ Gateway API │ │ Frontend │ │ (Port 2024) │ │ (Port 8001) │ │ (Port 3000) │ │ │ │ │ │ │ │ - Agent Runtime │ │ - Models API │ │ - Next.js App │ │ - Thread Mgmt │ │ - MCP Config │ │ - React UI │ │ - SSE Streaming │ │ - Skills Mgmt │ │ - Chat Interface │ │ - Checkpointing │ │ - File Uploads │ │ │ │ │ │ - Artifacts │ │ │ └─────────────────────┘ └─────────────────────┘ └─────────────────────┘ │ │ │ ┌─────────────────┘ │ │ ▼ ▼ ┌──────────────────────────────────────────────────────────────────────────┐ │ Shared Configuration │ │ ┌─────────────────────────┐ ┌────────────────────────────────────────┐ │ │ │ config.yaml │ │ extensions_config.json │ │ │ │ - Models │ │ - MCP Servers │ │ │ │ - Tools │ │ - Skills State │ │ │ │ - Sandbox │ │ │ │ │ │ - Summarization │ │ │ │ │ └─────────────────────────┘ └────────────────────────────────────────┘ │ └──────────────────────────────────────────────────────────────────────────┘ ``` ## Component Details ### LangGraph Server The LangGraph server is the core agent runtime, built on LangGraph for robust multi-agent workflow orchestration. **Entry Point**: `src/agents/lead_agent/agent.py:make_lead_agent` **Key Responsibilities**: - Agent creation and configuration - Thread state management - Middleware chain execution - Tool execution orchestration - SSE streaming for real-time responses **Configuration**: `langgraph.json` ```json { "agent": { "type": "agent", "path": "src.agents:make_lead_agent" } } ``` ### Gateway API FastAPI application providing REST endpoints for non-agent operations. **Entry Point**: `src/gateway/app.py` **Routers**: - `models.py` - `/api/models` - Model listing and details - `mcp.py` - `/api/mcp` - MCP server configuration - `skills.py` - `/api/skills` - Skills management - `uploads.py` - `/api/threads/{id}/uploads` - File upload - `artifacts.py` - `/api/threads/{id}/artifacts` - Artifact serving ### Agent Architecture ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ make_lead_agent(config) │ └────────────────────────────────────┬────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────────┐ │ Middleware Chain │ │ ┌──────────────────────────────────────────────────────────────────┐ │ │ │ 1. ThreadDataMiddleware - Initialize workspace/uploads/outputs │ │ │ │ 2. UploadsMiddleware - Process uploaded files │ │ │ │ 3. SandboxMiddleware - Acquire sandbox environment │ │ │ │ 4. SummarizationMiddleware - Context reduction (if enabled) │ │ │ │ 5. TitleMiddleware - Auto-generate titles │ │ │ │ 6. TodoListMiddleware - Task tracking (if plan_mode) │ │ │ │ 7. ViewImageMiddleware - Vision model support │ │ │ │ 8. ClarificationMiddleware - Handle clarifications │ │ │ └──────────────────────────────────────────────────────────────────┘ │ └────────────────────────────────────┬────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────────┐ │ Agent Core │ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────┐ │ │ │ Model │ │ Tools │ │ System Prompt │ │ │ │ (from factory) │ │ (configured + │ │ (with skills) │ │ │ │ │ │ MCP + builtin) │ │ │ │ │ └──────────────────┘ └──────────────────┘ └──────────────────────┘ │ └─────────────────────────────────────────────────────────────────────────┘ ``` ### Thread State The `ThreadState` extends LangGraph's `AgentState` with additional fields: ```python class ThreadState(AgentState): # Core state from AgentState messages: list[BaseMessage] # DeerFlow extensions sandbox: dict # Sandbox environment info artifacts: list[str] # Generated file paths thread_data: dict # {workspace, uploads, outputs} paths title: str | None # Auto-generated conversation title todos: list[dict] # Task tracking (plan mode) viewed_images: dict # Vision model image data ``` ### Sandbox System ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ Sandbox Architecture │ └─────────────────────────────────────────────────────────────────────────┘ ┌─────────────────────────┐ │ SandboxProvider │ (Abstract) │ - acquire() │ │ - get() │ │ - release() │ └────────────┬────────────┘ │ ┌────────────────────┼────────────────────┐ │ │ ▼ ▼ ┌─────────────────────────┐ ┌─────────────────────────┐ │ LocalSandboxProvider │ │ AioSandboxProvider │ │ (src/sandbox/local.py) │ │ (src/community/) │ │ │ │ │ │ - Singleton instance │ │ - Docker-based │ │ - Direct execution │ │ - Isolated containers │ │ - Development use │ │ - Production use │ └─────────────────────────┘ └─────────────────────────┘ ┌─────────────────────────┐ │ Sandbox │ (Abstract) │ - execute_command() │ │ - read_file() │ │ - write_file() │ │ - list_dir() │ └─────────────────────────┘ ``` **Virtual Path Mapping**: | Virtual Path | Physical Path | |-------------|---------------| | `/mnt/user-data/workspace` | `backend/.deer-flow/threads/{thread_id}/user-data/workspace` | | `/mnt/user-data/uploads` | `backend/.deer-flow/threads/{thread_id}/user-data/uploads` | | `/mnt/user-data/outputs` | `backend/.deer-flow/threads/{thread_id}/user-data/outputs` | | `/mnt/skills` | `deer-flow/skills/` | ### Tool System ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ Tool Sources │ └─────────────────────────────────────────────────────────────────────────┘ ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐ │ Built-in Tools │ │ Configured Tools │ │ MCP Tools │ │ (src/tools/) │ │ (config.yaml) │ │ (extensions.json) │ ├─────────────────────┤ ├─────────────────────┤ ├─────────────────────┤ │ - present_file │ │ - web_search │ │ - github │ │ - ask_clarification │ │ - web_fetch │ │ - filesystem │ │ - view_image │ │ - bash │ │ - postgres │ │ │ │ - read_file │ │ - brave-search │ │ │ │ - write_file │ │ - puppeteer │ │ │ │ - str_replace │ │ - ... │ │ │ │ - ls │ │ │ └─────────────────────┘ └─────────────────────┘ └─────────────────────┘ │ │ │ └───────────────────────┴───────────────────────┘ │ ▼ ┌─────────────────────────┐ │ get_available_tools() │ │ (src/tools/__init__) │ └─────────────────────────┘ ``` ### Model Factory ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ Model Factory │ │ (src/models/factory.py) │ └─────────────────────────────────────────────────────────────────────────┘ config.yaml: ┌─────────────────────────────────────────────────────────────────────────┐ │ models: │ │ - name: gpt-4 │ │ display_name: GPT-4 │ │ use: langchain_openai:ChatOpenAI │ │ model: gpt-4 │ │ api_key: $OPENAI_API_KEY │ │ max_tokens: 4096 │ │ supports_thinking: false │ │ supports_vision: true │ └─────────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────┐ │ create_chat_model() │ │ - name: str │ │ - thinking_enabled │ └────────────┬────────────┘ │ ▼ ┌─────────────────────────┐ │ resolve_class() │ │ (reflection system) │ └────────────┬────────────┘ │ ▼ ┌─────────────────────────┐ │ BaseChatModel │ │ (LangChain instance) │ └─────────────────────────┘ ``` **Supported Providers**: - OpenAI (`langchain_openai:ChatOpenAI`) - Anthropic (`langchain_anthropic:ChatAnthropic`) - DeepSeek (`langchain_deepseek:ChatDeepSeek`) - Custom via LangChain integrations ### MCP Integration ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ MCP Integration │ │ (src/mcp/manager.py) │ └─────────────────────────────────────────────────────────────────────────┘ extensions_config.json: ┌─────────────────────────────────────────────────────────────────────────┐ │ { │ │ "mcpServers": { │ │ "github": { │ │ "enabled": true, │ │ "type": "stdio", │ │ "command": "npx", │ │ "args": ["-y", "@modelcontextprotocol/server-github"], │ │ "env": {"GITHUB_TOKEN": "$GITHUB_TOKEN"} │ │ } │ │ } │ │ } │ └─────────────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────┐ │ MultiServerMCPClient │ │ (langchain-mcp-adapters)│ └────────────┬────────────┘ │ ┌────────────────────┼────────────────────┐ │ │ │ ▼ ▼ ▼ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ stdio │ │ SSE │ │ HTTP │ │ transport │ │ transport │ │ transport │ └───────────┘ └───────────┘ └───────────┘ ``` ### Skills System ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ Skills System │ │ (src/skills/loader.py) │ └─────────────────────────────────────────────────────────────────────────┘ Directory Structure: ┌─────────────────────────────────────────────────────────────────────────┐ │ skills/ │ │ ├── public/ # Public skills (committed) │ │ │ ├── pdf-processing/ │ │ │ │ └── SKILL.md │ │ │ ├── frontend-design/ │ │ │ │ └── SKILL.md │ │ │ └── ... │ │ └── custom/ # Custom skills (gitignored) │ │ └── user-installed/ │ │ └── SKILL.md │ └─────────────────────────────────────────────────────────────────────────┘ SKILL.md Format: ┌─────────────────────────────────────────────────────────────────────────┐ │ --- │ │ name: PDF Processing │ │ description: Handle PDF documents efficiently │ │ license: MIT │ │ allowed-tools: │ │ - read_file │ │ - write_file │ │ - bash │ │ --- │ │ │ │ # Skill Instructions │ │ Content injected into system prompt... │ └─────────────────────────────────────────────────────────────────────────┘ ``` ### Request Flow ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ Request Flow Example │ │ User sends message to agent │ └─────────────────────────────────────────────────────────────────────────┘ 1. Client → Nginx POST /api/langgraph/threads/{thread_id}/runs {"input": {"messages": [{"role": "user", "content": "Hello"}]}} 2. Nginx → LangGraph Server (2024) Proxied to LangGraph server 3. LangGraph Server a. Load/create thread state b. Execute middleware chain: - ThreadDataMiddleware: Set up paths - UploadsMiddleware: Inject file list - SandboxMiddleware: Acquire sandbox - SummarizationMiddleware: Check token limits - TitleMiddleware: Generate title if needed - TodoListMiddleware: Load todos (if plan mode) - ViewImageMiddleware: Process images - ClarificationMiddleware: Check for clarifications c. Execute agent: - Model processes messages - May call tools (bash, web_search, etc.) - Tools execute via sandbox - Results added to messages d. Stream response via SSE 4. Client receives streaming response ``` ## Data Flow ### File Upload Flow ``` 1. Client uploads file POST /api/threads/{thread_id}/uploads Content-Type: multipart/form-data 2. Gateway receives file - Validates file - Stores in .deer-flow/threads/{thread_id}/user-data/uploads/ - If document: converts to Markdown via markitdown 3. Returns response { "files": [{ "filename": "doc.pdf", "path": ".deer-flow/.../uploads/doc.pdf", "virtual_path": "/mnt/user-data/uploads/doc.pdf", "artifact_url": "/api/threads/.../artifacts/mnt/.../doc.pdf" }] } 4. Next agent run - UploadsMiddleware lists files - Injects file list into messages - Agent can access via virtual_path ``` ### Configuration Reload ``` 1. Client updates MCP config PUT /api/mcp/config 2. Gateway writes extensions_config.json - Updates mcpServers section - File mtime changes 3. MCP Manager detects change - get_cached_mcp_tools() checks mtime - If changed: reinitializes MCP client - Loads updated server configurations 4. Next agent run uses new tools ``` ## Security Considerations ### Sandbox Isolation - Agent code executes within sandbox boundaries - Local sandbox: Direct execution (development only) - Docker sandbox: Container isolation (production recommended) - Path traversal prevention in file operations ### API Security - Thread isolation: Each thread has separate data directories - File validation: Uploads checked for path safety - Environment variable resolution: Secrets not stored in config ### MCP Security - Each MCP server runs in its own process - Environment variables resolved at runtime - Servers can be enabled/disabled independently ## Performance Considerations ### Caching - MCP tools cached with file mtime invalidation - Configuration loaded once, reloaded on file change - Skills parsed once at startup, cached in memory ### Streaming - SSE used for real-time response streaming - Reduces time to first token - Enables progress visibility for long operations ### Context Management - Summarization middleware reduces context when limits approached - Configurable triggers: tokens, messages, or fraction - Preserves recent messages while summarizing older ones