docs: revise backend README and CLAUDE.md to reflect full architecture

Updated documentation to accurately cover all backend subsystems including subagents, memory, middleware chain, sandbox, MCP, skills, and gateway API. Fixed broken MCP_SETUP.md link in root README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 06:12:14 +08:00 · 2026-02-08 22:49:36 +08:00
parent 010aba1e28
commit 2703eb0b22
3 changed files with 395 additions and 371 deletions
--- a/backend/README.md
+++ b/backend/README.md
@@ -1,41 +1,133 @@
 # DeerFlow Backend

-DeerFlow is a LangGraph-based AI agent system that provides a powerful "super agent" with sandbox execution capabilities. The backend enables AI agents to execute code, browse the web, manage files, and perform complex multi-step tasks in isolated environments.
+DeerFlow is a LangGraph-based AI super agent with sandbox execution, persistent memory, and extensible tool integration. The backend enables AI agents to execute code, browse the web, manage files, delegate tasks to subagents, and retain context across conversations - all in isolated, per-thread environments.

 ---
-## Features

-- **LangGraph Agent Runtime**: Built on LangGraph for robust multi-agent workflow orchestration
-- **Sandbox Execution**: Safe code execution with local or Docker-based isolation
-- **Multi-Model Support**: OpenAI, Anthropic Claude, DeepSeek, Doubao, Kimi, and custom LangChain-compatible models
-- **MCP Integration**: Extensible tool ecosystem via Model Context Protocol
-- **Skills System**: Specialized domain workflows injected into agent prompts
-- **File Upload & Processing**: Multi-format document upload with automatic Markdown conversion
-- **Context Summarization**: Automatic conversation summarization for long conversations
-- **Plan Mode**: TodoList middleware for complex multi-step task tracking
-
----
 ## Architecture

 ```
-┌─────────────────────────────────────────────────────────────────┐
-│                         Nginx (Port 2026)                       │
-│              Unified reverse proxy entry point                   │
-└─────────────────┬───────────────────────────────┬───────────────┘
-                  │                               │
-                  ▼                               ▼
-┌─────────────────────────────┐   ┌─────────────────────────────┐
-│   LangGraph Server (2024)   │   │    Gateway API (8001)       │
-│   Agent runtime & workflows │   │   Models, MCP, Skills, etc. │
-└─────────────────────────────┘   └─────────────────────────────┘
+                        ┌──────────────────────────────────────┐
+                        │          Nginx (Port 2026)           │
+                        │      Unified reverse proxy           │
+                        └───────┬──────────────────┬───────────┘
+                                │                  │
+              /api/langgraph/*  │                  │  /api/* (other)
+                                ▼                  ▼
+               ┌────────────────────┐  ┌────────────────────────┐
+               │ LangGraph Server   │  │   Gateway API (8001)   │
+               │    (Port 2024)     │  │   FastAPI REST         │
+               │                    │  │                        │
+               │ ┌────────────────┐ │  │ Models, MCP, Skills,   │
+               │ │  Lead Agent    │ │  │ Memory, Uploads,       │
+               │ │  ┌──────────┐  │ │  │ Artifacts              │
+               │ │  │Middleware│  │ │  └────────────────────────┘
+               │ │  │  Chain   │  │ │
+               │ │  └──────────┘  │ │
+               │ │  ┌──────────┐  │ │
+               │ │  │  Tools   │  │ │
+               │ │  └──────────┘  │ │
+               │ │  ┌──────────┐  │ │
+               │ │  │Subagents │  │ │
+               │ │  └──────────┘  │ │
+               │ └────────────────┘ │
+               └────────────────────┘
 ```

-**Request Routing**:
-- `/api/langgraph/*` → LangGraph Server (agent interactions, threads, streaming)
-- `/api/*` (other) → Gateway API (models, MCP, skills, artifacts, uploads)
-- `/` (non-API) → Frontend (web interface)
+**Request Routing** (via Nginx):
+- `/api/langgraph/*` → LangGraph Server - agent interactions, threads, streaming
+- `/api/*` (other) → Gateway API - models, MCP, skills, memory, artifacts, uploads
+- `/` (non-API) → Frontend - Next.js web interface
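
The routing rules above amount to a prefix match evaluated in priority order, with the more specific `/api/langgraph/` prefix checked first. A minimal Python sketch of that decision, assuming plain prefix matching; the function name `route_request` and the returned labels are hypothetical and are not part of the Nginx configuration or the DeerFlow code:

```python
def route_request(path: str) -> str:
    """Return a label for the upstream that would handle `path`.

    Hypothetical helper: it only mirrors the three routing rules listed
    in the README, it is not how Nginx itself is configured.
    """
    # Most specific prefix first: agent interactions, threads, streaming.
    if path == "/api/langgraph" or path.startswith("/api/langgraph/"):
        return "langgraph"  # LangGraph Server (port 2024)
    # Every other /api/* path goes to the FastAPI gateway.
    if path == "/api" or path.startswith("/api/"):
        return "gateway"  # Gateway API (port 8001)
    # Anything that is not an API path is served by the frontend.
    return "frontend"
```

Checking the longer prefix first matters: if the `/api/` test ran first, every LangGraph request would be sent to the Gateway API.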

 ---
+
+## Core Components
+
+### Lead Agent
+
+The single LangGraph agent (`lead_agent`) is the runtime entry point, created via `make_lead_agent(config)`. It combines:
+
+- **Dynamic model selection** with thinking and vision support
+- **Middleware chain** for cross-cutting concerns (9 middlewares)
+- **Tool system** with sandbox, MCP, community, and built-in tools
+- **Subagent delegation** for parallel task execution
+- **System prompt** with skills injection, memory context, and working directory guidance
+
+### Middleware Chain
+
+Middlewares execute in strict order, each handling a specific concern:
+
+| # | Middleware | Purpose |
+|---|-----------|---------|
+| 1 | **ThreadDataMiddleware** | Creates per-thread isolated directories (workspace, uploads, outputs) |
+| 2 | **UploadsMiddleware** | Injects newly uploaded files into conversation context |
+| 3 | **SandboxMiddleware** | Acquires sandbox environment for code execution |
+| 4 | **SummarizationMiddleware** | Reduces context when approaching token limits (optional) |
+| 5 | **TodoListMiddleware** | Tracks multi-step tasks in plan mode (optional) |
+| 6 | **TitleMiddleware** | Auto-generates conversation titles after first exchange |
+| 7 | **MemoryMiddleware** | Queues conversations for async memory extraction |
+| 8 | **ViewImageMiddleware** | Injects image data for vision-capable models (conditional) |
+| 9 | **ClarificationMiddleware** | Intercepts clarification requests and interrupts execution (must be last) |
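
The ordering constraint in the table can be illustrated with a small Python sketch. It assumes each middleware is a plain function from state to state, which is a simplification: real LangGraph middlewares expose hooks rather than a single call. The names `CHAIN_ORDER`, `OPTIONAL`, `make_marker`, `build_chain`, and `run_chain` are hypothetical.

```python
from typing import Callable

State = dict[str, list[str]]
Middleware = Callable[[State], State]

# Order copied from the table above (names shortened for the sketch).
CHAIN_ORDER = [
    "ThreadData", "Uploads", "Sandbox", "Summarization", "TodoList",
    "Title", "Memory", "ViewImage", "Clarification",
]
# Entries the table marks as optional or conditional.
OPTIONAL = {"Summarization", "TodoList", "ViewImage"}


def make_marker(name: str) -> Middleware:
    """Build a stand-in middleware that only records that it ran."""
    def middleware(state: State) -> State:
        return {"trace": [*state.get("trace", []), name]}
    return middleware


def build_chain(enabled_optional: set[str]) -> list[Middleware]:
    """Keep the fixed order, dropping optional entries that are disabled."""
    names = [n for n in CHAIN_ORDER if n not in OPTIONAL or n in enabled_optional]
    return [make_marker(n) for n in names]


def run_chain(chain: list[Middleware], state: State) -> State:
    """Apply each middleware in sequence."""
    for middleware in chain:
        state = middleware(state)
    return state


result = run_chain(build_chain({"TodoList"}), {"trace": []})
```

Filtering a fixed list, rather than appending entries conditionally, keeps the clarification step last regardless of which optional middlewares are enabled.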
+
+### Sandbox System
+
+Per-thread isolated execution with virtual path translation:
+
+- **Abstract interface**: `execute_command`, `read_file`, `write_file`, `list_dir`
+- **Providers**: `LocalSandboxProvider` (filesystem) and `AioSandboxProvider` (Docker, in community/)
+- **Virtual paths**: `/mnt/user-data/{workspace,uploads,outputs}` → thread-specific physical directories
+- **Skills path**: `/mnt/skills` → `deer-flow/skills/` directory
+- **Tools**: `bash`, `ls`, `read_file`, `write_file`, `str_replace`
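
Virtual path translation can be pictured as rebasing a path under `/mnt/user-data` onto a per-thread directory. The Python sketch below assumes a simple layout in which each thread directory contains `workspace`, `uploads`, and `outputs` subdirectories; the function `to_physical` and the example thread directory are hypothetical, not DeerFlow's actual implementation.

```python
from pathlib import PurePosixPath

VIRTUAL_ROOT = PurePosixPath("/mnt/user-data")
ALLOWED_AREAS = {"workspace", "uploads", "outputs"}


def to_physical(virtual_path: str, thread_dir: str) -> str:
    """Map a virtual sandbox path to a path inside `thread_dir`.

    Assumption: the physical layout mirrors the virtual one.
    """
    path = PurePosixPath(virtual_path)
    try:
        relative = path.relative_to(VIRTUAL_ROOT)
    except ValueError:
        raise ValueError(f"not under {VIRTUAL_ROOT}: {virtual_path}") from None
    # The first component must be one of the three per-thread areas.
    if not relative.parts or relative.parts[0] not in ALLOWED_AREAS:
        raise ValueError(f"unknown area in: {virtual_path}")
    # Reject parent references so the result cannot leave the thread directory.
    if ".." in relative.parts:
        raise ValueError(f"parent references not allowed: {virtual_path}")
    return str(PurePosixPath(thread_dir) / relative)
```

For example, with a hypothetical thread directory `/data/threads/t1`, the virtual path `/mnt/user-data/workspace/a.py` maps to `/data/threads/t1/workspace/a.py`.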
+
+### Subagent System
+
+Async task delegation with concurrent execution:
+
+- **Built-in agents**: `general-purpose` (full toolset) and `bash` (command specialist)
+- **Concurrency**: Max 3 subagents per turn, 15-minute timeout
+- **Execution**: Background thread pools with status tracking and SSE events
+- **Flow**: Agent calls `task()` tool → executor runs subagent in background → polls for completion → returns result
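
The flow described above (submit to a background pool, poll, return the result) can be sketched with the standard library. This is an illustration under assumptions, not the project's executor: `run_subagent` is a stand-in for the real subagent call, and mapping the "max 3 per turn" limit onto the pool size is a simplification.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3          # from the README: max 3 subagents per turn
TIMEOUT_SECONDS = 15 * 60   # from the README: 15-minute timeout

# Assumption: one shared pool whose size enforces the concurrency limit.
_pool = ThreadPoolExecutor(max_workers=MAX_CONCURRENT)


def run_subagent(prompt: str) -> str:
    """Hypothetical stand-in for running a subagent to completion."""
    return f"done: {prompt}"


def task(prompt: str, poll_interval: float = 0.01,
         timeout: float = TIMEOUT_SECONDS) -> str:
    """Run a subagent in the background and poll until it finishes."""
    future = _pool.submit(run_subagent, prompt)
    deadline = time.monotonic() + timeout
    while not future.done():
        if time.monotonic() > deadline:
            # cancel() only prevents a task that has not started yet;
            # a running thread is not interrupted.
            future.cancel()
            raise TimeoutError(f"subagent timed out after {timeout}s")
        time.sleep(poll_interval)
    return future.result()
```

Polling keeps the calling code simple and gives a natural place to emit status events between checks; the status tracking and SSE events mentioned in the list are omitted here.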
+
+### Memory System
+
+LLM-powered persistent context retention across conversations:
+
+- **Automatic extraction**: Analyzes conversations for user context, facts, and preferences
+- **Structured storage**: User context (work, personal, top-of-mind), history, and confidence-scored facts
+- **Debounced updates**: Batches updates to minimize LLM calls (configurable wait time)
+- **System prompt injection**: Top facts + context injected into agent prompts
+- **Storage**: JSON file with mtime-based cache invalidation
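
The "mtime-based cache invalidation" item can be illustrated as follows: the parsed JSON is kept in memory and re-read only when the file's modification time changes. This is a minimal Python sketch; the class name `MemoryStore` and its layout are hypothetical, and the real storage code may differ.

```python
import json
import os


class MemoryStore:
    """Cache a JSON file in memory, reloading when its mtime changes."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._cache = None   # parsed JSON from the last read
        self._mtime = None   # modification time seen at the last read

    def load(self):
        mtime = os.path.getmtime(self._path)
        # Re-read only on first use or when the file changed on disk.
        if self._cache is None or mtime != self._mtime:
            with open(self._path, encoding="utf-8") as handle:
                self._cache = json.load(handle)
            self._mtime = mtime
        return self._cache
```

Repeated calls to `load()` cost one `stat` call while the file is unchanged, and a write from another process is picked up on the next call.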
+
+### Tool Ecosystem
+
+| Category | Tools |
+|----------|-------|
+| **Sandbox** | `bash`, `ls`, `read_file`, `write_file`, `str_replace` |
+| **Built-in** | `present_files`, `ask_clarification`, `view_image`, `task` (subagent) |
+| **Community** | Tavily (web search), Jina AI (web fetch), Firecrawl (scraping), DuckDuckGo (image search) |
+| **MCP** | Any Model Context Protocol server (stdio, SSE, HTTP transports) |
+| **Skills** | Domain-specific workflows injected via system prompt |
+
+### Gateway API
+
+FastAPI application providing REST endpoints for frontend integration:
+
+| Route | Purpose |
+|-------|---------|
+| `GET /api/models` | List available LLM models |
+| `GET/PUT /api/mcp/config` | Manage MCP server configurations |
+| `GET/PUT /api/skills` | List and manage skills |
+| `POST /api/skills/install` | Install skill from `.skill` archive |
+| `GET /api/memory` | Retrieve memory data |
+| `POST /api/memory/reload` | Force memory reload |
+| `GET /api/memory/config` | Memory configuration |
+| `GET /api/memory/status` | Combined config + data |
+| `POST /api/threads/{id}/uploads` | Upload files (auto-converts PDF/PPT/Excel/Word to Markdown) |
+| `GET /api/threads/{id}/uploads/list` | List uploaded files |
+| `GET /api/threads/{id}/artifacts/{path}` | Serve generated artifacts |
+
+---
+
 ## Quick Start

 ### Prerequisites
@@ -47,7 +139,6 @@ DeerFlow is a LangGraph-based AI agent system that provides a powerful "super ag
 ### Installation

 ```bash
-# Clone the repository (if not already)
 cd deer-flow

 # Copy configuration files
@@ -61,23 +152,23 @@ make install

 ### Configuration

-Edit `config.yaml` in the project root to configure your models and tools:
+Edit `config.yaml` in the project root:

 ```yaml
 models:
-  - name: gpt-4
-    display_name: GPT-4
+  - name: gpt-4o
+    display_name: GPT-4o
    use: langchain_openai:ChatOpenAI
-    model: gpt-4
-    api_key: $OPENAI_API_KEY  # Set environment variable
-    max_tokens: 4096
+    model: gpt-4o
+    api_key: $OPENAI_API_KEY
+    supports_thinking: false
+    supports_vision: true
 ```

 Set your API keys:

 ```bash
 export OPENAI_API_KEY="your-api-key-here"
-# Or other provider keys as needed
 ```

 ### Running
@@ -100,96 +191,70 @@ make dev
 make gateway
 ```

-Direct access:
-- LangGraph: http://localhost:2024
-- Gateway: http://localhost:8001
+Direct access: LangGraph at http://localhost:2024, Gateway at http://localhost:8001

 ---
+
 ## Project Structure

 ```
 backend/
 ├── src/
-│   ├── agents/              # LangGraph agents and workflows
-│   │   ├── lead_agent/      # Main agent implementation
-│   │   └── middlewares/     # Agent middlewares
-│   ├── gateway/             # FastAPI Gateway API
-│   │   └── routers/         # API route handlers
-│   ├── sandbox/             # Sandbox execution system
-│   ├── tools/               # Agent tools (builtins)
-│   ├── mcp/                 # MCP integration
-│   ├── models/              # Model factory
-│   ├── skills/              # Skills loader
-│   ├── config/              # Configuration system
-│   ├── community/           # Community tools (web search, etc.)
-│   ├── reflection/          # Dynamic module loading
-│   └── utils/               # Utility functions
-├── docs/                    # Documentation
-├── tests/                   # Test suite
-├── langgraph.json           # LangGraph server configuration
-├── config.yaml              # Application configuration (optional)
-├── pyproject.toml           # Python dependencies
-├── Makefile                 # Development commands
-└── Dockerfile               # Container build
+│   ├── agents/                  # Agent system
+│   │   ├── lead_agent/         # Main agent (factory, prompts)
+│   │   ├── middlewares/        # 9 middleware components
+│   │   ├── memory/             # Memory extraction & storage
+│   │   └── thread_state.py    # ThreadState schema
+│   ├── gateway/                # FastAPI Gateway API
+│   │   ├── app.py             # Application setup
+│   │   └── routers/           # 6 route modules
+│   ├── sandbox/                # Sandbox execution
+│   │   ├── local/             # Local filesystem provider
+│   │   ├── sandbox.py         # Abstract interface
+│   │   ├── tools.py           # bash, ls, read/write/str_replace
+│   │   └── middleware.py      # Sandbox lifecycle
+│   ├── subagents/              # Subagent delegation
+│   │   ├── builtins/          # general-purpose, bash agents
+│   │   ├── executor.py        # Background execution engine
+│   │   └── registry.py        # Agent registry
+│   ├── tools/builtins/         # Built-in tools
+│   ├── mcp/                    # MCP protocol integration
+│   ├── models/                 # Model factory
+│   ├── skills/                 # Skill discovery & loading
+│   ├── config/                 # Configuration system
+│   ├── community/              # Community tools & providers
+│   ├── reflection/             # Dynamic module loading
+│   └── utils/                  # Utilities
+├── docs/                       # Documentation
+├── tests/                      # Test suite
+├── langgraph.json              # LangGraph server configuration
+├── pyproject.toml              # Python dependencies
+├── Makefile                    # Development commands
+└── Dockerfile                  # Container build
 ```

 ---
-## API Reference

-### LangGraph API (via `/api/langgraph/*`)
-
-- `POST /threads` - Create new conversation thread
-- `POST /threads/{thread_id}/runs` - Execute agent with input
-- `GET /threads/{thread_id}/runs` - Get run history
-- `GET /threads/{thread_id}/state` - Get current conversation state
-- WebSocket support for streaming responses
-
-### Gateway API (via `/api/*`)
-
-**Models**:
-- `GET /api/models` - List available LLM models
-- `GET /api/models/{model_name}` - Get model details
-
-**MCP Configuration**:
-- `GET /api/mcp/config` - Get current MCP server configurations
-- `PUT /api/mcp/config` - Update MCP configuration
-
-**Skills Management**:
-- `GET /api/skills` - List all skills
-- `GET /api/skills/{skill_name}` - Get skill details
-- `POST /api/skills/{skill_name}/enable` - Enable a skill
-- `POST /api/skills/{skill_name}/disable` - Disable a skill
-- `POST /api/skills/install` - Install skill from `.skill` file
-
-**File Uploads**:
-- `POST /api/threads/{thread_id}/uploads` - Upload files
-- `GET /api/threads/{thread_id}/uploads/list` - List uploaded files
-- `DELETE /api/threads/{thread_id}/uploads/{filename}` - Delete file
-
-**Artifacts**:
-- `GET /api/threads/{thread_id}/artifacts/{path}` - Download generated artifacts
-
----
 ## Configuration

 ### Main Configuration (`config.yaml`)

-The application uses a YAML-based configuration file. Place it in the project root directory.
+Place in project root. Config values starting with `$` resolve as environment variables.

 Key sections:
-- `models`: LLM configurations with class paths and API keys
-- `tool_groups`: Logical groupings for tools
-- `tools`: Tool definitions with module paths
-- `sandbox`: Execution environment settings
-- `skills`: Skills directory configuration
-- `title`: Auto-title generation settings
-- `summarization`: Context summarization settings
-
-See [docs/CONFIGURATION.md](docs/CONFIGURATION.md) for detailed documentation.
+- `models` - LLM configurations with class paths, API keys, thinking/vision flags
+- `tools` - Tool definitions with module paths and groups
+- `tool_groups` - Logical tool groupings
+- `sandbox` - Execution environment provider
+- `skills` - Skills directory paths
+- `title` - Auto-title generation settings
+- `summarization` - Context summarization settings
+- `subagents` - Subagent system (enabled/disabled)
+- `memory` - Memory system settings (enabled, storage, debounce, facts limits)
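
The statement that config values starting with `$` resolve as environment variables can be sketched as a recursive walk over the parsed YAML. The function `resolve_env` is hypothetical, and raising on a missing variable is an assumption; the actual loader may behave differently.

```python
import os
from typing import Any


def resolve_env(value: Any) -> Any:
    """Replace "$NAME" strings with os.environ["NAME"], recursively."""
    if isinstance(value, str) and value.startswith("$"):
        name = value[1:]
        if name not in os.environ:
            # Assumption: fail loudly rather than pass "$NAME" through.
            raise KeyError(f"environment variable not set: {name}")
        return os.environ[name]
    if isinstance(value, dict):
        return {key: resolve_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item) for item in value]
    # Numbers, booleans, None, and ordinary strings are left unchanged.
    return value
```

Applied to the `models` entry shown earlier in the README, `api_key: $OPENAI_API_KEY` would be replaced by the value of `OPENAI_API_KEY`, while `supports_vision: true` passes through untouched.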

 ### Extensions Configuration (`extensions_config.json`)

-MCP servers and skills are configured in `extensions_config.json`:
+MCP servers and skill states in a single file:

 ```json
 {
@@ -216,6 +281,7 @@ MCP servers and skills are configured in `extensions_config.json`:
 - Tool API keys: `TAVILY_API_KEY`, `GITHUB_TOKEN`, etc.

 ---
+
 ## Development

 ### Commands
@@ -230,10 +296,11 @@ make format     # Format code (ruff)

 ### Code Style

-- Uses `ruff` for linting and formatting
-- Line length: 240 characters
-- Python 3.12+ with type hints
-- Double quotes, space indentation
+- **Linter/Formatter**: `ruff`
+- **Line length**: 240 characters
+- **Python**: 3.12+ with type hints
+- **Quotes**: Double quotes
+- **Indentation**: 4 spaces

 ### Testing

@@ -242,45 +309,36 @@ uv run pytest
 ```

 ---
-## Documentation

-- [Configuration Guide](docs/CONFIGURATION.md) - Detailed configuration options
-- [Setup Guide](docs/SETUP.md) - Quick setup instructions
-- [File Upload](docs/FILE_UPLOAD.md) - File upload functionality
-- [Path Examples](docs/PATH_EXAMPLES.md) - Path types and usage
-- [Summarization](docs/summarization.md) - Context summarization feature
-- [Plan Mode](docs/plan_mode_usage.md) - TodoList middleware usage
-
----
 ## Technology Stack

-### Core Frameworks
-- **LangChain** (1.2.3+) - LLM orchestration
-- **LangGraph** (1.0.6+) - Multi-agent workflows
-- **FastAPI** (0.115.0+) - REST API
-- **Uvicorn** (0.34.0+) - ASGI server
-
-### LLM Integrations
-- `langchain-openai` - OpenAI models
-- `langchain-anthropic` - Claude models
-- `langchain-deepseek` - DeepSeek models
-
-### Extensions
-- `langchain-mcp-adapters` - MCP protocol support
-- `agent-sandbox` - Sandboxed code execution
-
-### Utilities
-- `markitdown` - Multi-format to Markdown conversion
-- `tavily-python` - Web search
-- `firecrawl-py` - Web scraping
-- `ddgs` - DuckDuckGo image search
+- **LangGraph** (1.0.6+) - Agent framework and multi-agent orchestration
+- **LangChain** (1.2.3+) - LLM abstractions and tool system
+- **FastAPI** (0.115.0+) - Gateway REST API
+- **langchain-mcp-adapters** - Model Context Protocol support
+- **agent-sandbox** - Sandboxed code execution
+- **markitdown** - Multi-format document conversion
+- **tavily-python** / **firecrawl-py** - Web search and scraping

 ---
+
+## Documentation
+
+- [Configuration Guide](docs/CONFIGURATION.md)
+- [Architecture Details](docs/ARCHITECTURE.md)
+- [API Reference](docs/API.md)
+- [File Upload](docs/FILE_UPLOAD.md)
+- [Path Examples](docs/PATH_EXAMPLES.md)
+- [Context Summarization](docs/summarization.md)
+- [Plan Mode](docs/plan_mode_usage.md)
+- [Setup Guide](docs/SETUP.md)
+
+---
+
 ## License

 See the [LICENSE](../LICENSE) file in the project root.

----
 ## Contributing

 See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.