docs: add comprehensive backend documentation

- Add README.md with project overview, quick start, and API reference
- Add CONTRIBUTING.md with development setup and contribution guidelines
- Add docs/ARCHITECTURE.md with detailed system architecture diagrams
- Add docs/API.md with complete API reference for LangGraph and Gateway
- Add docs/README.md as documentation index
- Update CLAUDE.md with improved structure and new features
- Update docs/TODO.md to reflect current status
- Update pyproject.toml description

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-03 06:12:14 +08:00 · 2026-02-01 22:18:25 +08:00
parent 9b77070406
commit 9043c964ca
8 changed files with 2014 additions and 105 deletions
--- a/backend/CLAUDE.md
+++ b/backend/CLAUDE.md
@@ -8,7 +8,7 @@ DeerFlow is a LangGraph-based AI agent system with a full-stack architecture. Th

 **Architecture**:
 - **LangGraph Server** (port 2024): Agent runtime and workflow execution
-- **Gateway API** (port 8001): REST API for models, MCP, skills, and artifacts
+- **Gateway API** (port 8001): REST API for models, MCP, skills, artifacts, and uploads
 - **Frontend** (port 3000): Next.js web interface
 - **Nginx** (port 2026): Unified reverse proxy entry point

@@ -27,7 +27,12 @@ deer-flow/
 │   │   ├── sandbox/           # Sandbox execution system
 │   │   ├── tools/             # Agent tools
 │   │   ├── mcp/               # MCP integration
-│   │   └── skills/            # Skills loading and management
+│   │   ├── models/            # Model factory
+│   │   ├── skills/            # Skills loading and management
+│   │   ├── config/            # Configuration system
+│   │   ├── community/         # Community tools (web search, etc.)
+│   │   ├── reflection/        # Dynamic module loading
+│   │   └── utils/             # Utilities
 │   └── langgraph.json         # LangGraph server configuration
 ├── frontend/                   # Next.js frontend application
 └── skills/                     # Agent skills directory
@@ -74,9 +79,11 @@ make format

 ### Configuration System

-The app uses a YAML-based configuration system loaded from `config.yaml`.
+The app uses a two-tier configuration system: `config.yaml` (YAML) for core settings and `extensions_config.json` (JSON) for MCP servers and skills.

-**Setup**: Copy `config.example.yaml` to `config.yaml` in the **project root** directory and customize for your environment.
+**Main Configuration** (`config.yaml`):
+
+Setup: Copy `config.example.yaml` to `config.yaml` in the **project root** directory.

 ```bash
 # From project root (deer-flow/)
@@ -91,96 +98,10 @@ Configuration priority:

 Config values starting with `$` are resolved as environment variables (e.g., `$OPENAI_API_KEY`).
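
As an illustration of the `$`-prefixed resolution described above, here is a minimal sketch. The function name `resolve_env_values` and the choice to raise on a missing variable are assumptions for this example, not the project's actual loader behavior.

```python
import os
from typing import Any


def resolve_env_values(value: Any) -> Any:
    """Recursively replace strings like "$NAME" with the value of os.environ["NAME"]."""
    if isinstance(value, str) and value.startswith("$"):
        name = value[1:]
        if name not in os.environ:
            # Assumption: fail loudly rather than pass the literal "$NAME" through.
            raise KeyError(f"Environment variable {name} is not set")
        return os.environ[name]
    if isinstance(value, dict):
        return {key: resolve_env_values(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_values(item) for item in value]
    return value
```

Applied to a parsed `config.yaml`, a value such as `api_key: $OPENAI_API_KEY` would be replaced by the contents of that environment variable, while non-string values pass through unchanged.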

-### Core Components
-
-**Gateway API** (`src/gateway/`)
- FastAPI application that provides REST endpoints for frontend integration
- Endpoints:
-  - `/api/models` - List available LLM models from configuration
-  - `/api/mcp` - Manage MCP server configurations (GET, POST)
-  - `/api/skills` - Manage skill configurations (GET, POST)
-  - `/api/threads/{thread_id}/artifacts/*` - Serve agent-generated artifacts (files, images, etc.)
- Works alongside LangGraph server, handling non-agent HTTP operations
- Proxied through nginx under `/api/*` routes (except `/api/langgraph/*`)
-
-**Agent Graph** (`src/agents/`)
- `lead_agent` is the main entry point registered in `langgraph.json`
- Uses `ThreadState` which extends `AgentState` with sandbox state
- Agent is created via `create_agent()` with model, tools, middleware, and system prompt
-
-**Sandbox System** (`src/sandbox/`)
- Abstract `Sandbox` base class defines interface: `execute_command`, `read_file`, `write_file`, `list_dir`
- `SandboxProvider` manages sandbox lifecycle: `acquire`, `get`, `release`
- `SandboxMiddleware` automatically acquires sandbox on agent start and injects into state
- `LocalSandboxProvider` is a singleton implementation for local execution
- Sandbox tools (`bash`, `ls`, `read_file`, `write_file`, `str_replace`) extract sandbox from tool runtime
-
-**Model Factory** (`src/models/`)
- `create_chat_model()` instantiates LLM from config using reflection
- Supports `thinking_enabled` flag with per-model `when_thinking_enabled` overrides
-
-**Tool System** (`src/tools/`)
- Tools defined in config with `use` path (e.g., `src.sandbox.tools:bash_tool`)
- `get_available_tools()` resolves tool paths via reflection
- Community tools in `src/community/`: Jina AI (web fetch), Tavily (web search)
- Supports MCP (Model Context Protocol) for pluggable external tools
-
-**MCP System** (`src/mcp/`)
- Integrates with MCP servers to provide pluggable external tools using `langchain-mcp-adapters`
- Uses `MultiServerMCPClient` from langchain-mcp-adapters for multi-server management
- **Automatic initialization**: Tools are loaded on first use with lazy initialization
- Supports both eager loading (FastAPI startup) and lazy loading (LangGraph Studio)
- `initialize_mcp_tools()` can be called in FastAPI lifespan handler for eager loading
- `get_cached_mcp_tools()` automatically initializes tools if not already loaded
- Works seamlessly in both FastAPI server and LangGraph Studio environments
- Each server can be enabled/disabled independently via `enabled` flag
- Popular MCP servers: filesystem, postgres, github, brave-search, puppeteer
- Built on top of langchain-ai/langchain-mcp-adapters for seamless integration
-
-**Reflection System** (`src/reflection/`)
- `resolve_variable()` imports module and returns variable (e.g., `module:variable`)
- `resolve_class()` imports and validates class against base class
-
-**Skills System** (`src/skills/`)
- Skills provide specialized workflows for specific tasks (e.g., PDF processing, frontend design)
- Located in `deer-flow/skills/{public,custom}` directory structure
- Each skill has a `SKILL.md` file with YAML front matter (name, description, license)
- Skills are automatically discovered and loaded at runtime
- `load_skills()` scans directories and parses SKILL.md files
- Skills are injected into agent's system prompt with paths (only enabled skills)
- Path mapping system allows seamless access in both local and Docker sandbox:
-  - Local sandbox: `/mnt/skills` → `/path/to/deer-flow/skills`
-  - Docker sandbox: Automatically mounted as volume
- Each skill can be enabled/disabled independently via `enabled` flag in extensions config
-
-**Middleware System**
- Custom middlewares in `src/agents/middlewares/`: Title generation, thread data, clarification, etc.
- `SummarizationMiddleware` from LangChain automatically condenses conversation history when token limits are approached
- Configured in `config.yaml` under `summarization` key with trigger/keep thresholds
- Middlewares are registered in `src/agents/lead_agent/agent.py` with execution order:
-  1. `ThreadDataMiddleware` - Initializes thread context
-  2. `SandboxMiddleware` - Manages sandbox lifecycle
-  3. `SummarizationMiddleware` - Reduces context when limits are approached (if enabled)
-  4. `TitleMiddleware` - Generates conversation titles
-  5. `ClarificationMiddleware` - Handles clarification requests (must be last)
-
-### Config Schema
-
-Models, tools, sandbox providers, skills, and middleware settings are configured in `config.yaml`:
- `models[]`: LLM configurations with `use` class path
- `tools[]`: Tool configurations with `use` variable path and `group`
- `sandbox.use`: Sandbox provider class path
- `skills.path`: Host path to skills directory (optional, default: `../skills`)
- `skills.container_path`: Container mount path (default: `/mnt/skills`)
- `title`: Automatic thread title generation configuration
- `summarization`: Automatic conversation summarization configuration
-
-**Extensions Configuration** (`extensions_config.json`)
+**Extensions Configuration** (`extensions_config.json`):

 MCP servers and skills are configured together in `extensions_config.json` in the project root:

-**Setup**: Copy `extensions_config.example.json` to `extensions_config.json` in the **project root** directory.
-
 ```bash
 # From project root (deer-flow/)
 cp extensions_config.example.json extensions_config.json
@@ -193,12 +114,115 @@ Configuration priority:
 4. `extensions_config.json` in parent directory (project root - **recommended location**)
 5. For backward compatibility: `mcp_config.json` (will be deprecated)

-Structure:
+### Core Components
+
+**Gateway API** (`src/gateway/`)
+- FastAPI application that provides REST endpoints for frontend integration
+- Endpoints:
+  - `/api/models` - List available LLM models from configuration
+  - `/api/mcp` - Manage MCP server configurations (GET, POST)
+  - `/api/skills` - Manage skill configurations (GET, POST)
+  - `/api/threads/{thread_id}/artifacts/*` - Serve agent-generated artifacts
+  - `/api/threads/{thread_id}/uploads` - File upload, list, delete
+- Works alongside LangGraph server, handling non-agent HTTP operations
+- Proxied through nginx under `/api/*` routes (except `/api/langgraph/*`)
+
+**Agent Graph** (`src/agents/`)
+- `lead_agent` is the main entry point registered in `langgraph.json`
+- Uses `ThreadState` which extends `AgentState` with:
+  - `sandbox`: Sandbox environment info
+  - `artifacts`: Generated file paths
+  - `thread_data`: Workspace/uploads/outputs paths
+  - `title`: Auto-generated conversation title
+  - `todos`: Task tracking (plan mode)
+  - `viewed_images`: Vision model image data
+- Agent is created via `make_lead_agent(config)` with model, tools, middleware, and system prompt
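
The state fields listed above could be modeled roughly as follows. This is a sketch only: `AgentStateStub` stands in for the framework's `AgentState`, and every field type here is an assumption, since the document names the fields but not their types.

```python
from typing import Any, TypedDict


class AgentStateStub(TypedDict):
    """Stand-in for the framework's AgentState (assumed to carry the messages)."""

    messages: list[Any]


class ThreadStateSketch(AgentStateStub, total=False):
    """Hypothetical shape of ThreadState; all added keys are optional."""

    sandbox: dict[str, Any]          # sandbox environment info
    artifacts: list[str]             # generated file paths
    thread_data: dict[str, str]      # workspace/uploads/outputs paths
    title: str                       # auto-generated conversation title
    todos: list[dict[str, Any]]      # task tracking (plan mode)
    viewed_images: dict[str, Any]    # vision model image data
```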
+
+**Sandbox System** (`src/sandbox/`)
+- Abstract `Sandbox` base class defines interface: `execute_command`, `read_file`, `write_file`, `list_dir`
+- `SandboxProvider` manages sandbox lifecycle: `acquire`, `get`, `release`
+- `SandboxMiddleware` automatically acquires sandbox on agent start and injects into state
+- `LocalSandboxProvider` is a singleton implementation for local execution
+- `AioSandboxProvider` provides Docker-based isolation (in `src/community/`)
+- Sandbox tools (`bash`, `ls`, `read_file`, `write_file`, `str_replace`) extract sandbox from tool runtime
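
To make the four-method interface concrete, here is a sketch of an abstract base class with a toy in-memory implementation. The method names come from the list above; the signatures, the class names `SandboxSketch` and `InMemorySandboxSketch`, and the in-memory behavior are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class SandboxSketch(ABC):
    """Hypothetical rendering of the abstract sandbox interface."""

    @abstractmethod
    def execute_command(self, command: str) -> str: ...

    @abstractmethod
    def read_file(self, path: str) -> str: ...

    @abstractmethod
    def write_file(self, path: str, content: str) -> None: ...

    @abstractmethod
    def list_dir(self, path: str) -> list[str]: ...


class InMemorySandboxSketch(SandboxSketch):
    """Toy implementation that stores files in a dict and never runs commands."""

    def __init__(self) -> None:
        self._files: dict[str, str] = {}

    def execute_command(self, command: str) -> str:
        return f"(not executed) {command}"

    def read_file(self, path: str) -> str:
        return self._files[path]

    def write_file(self, path: str, content: str) -> None:
        self._files[path] = content

    def list_dir(self, path: str) -> list[str]:
        prefix = path.rstrip("/") + "/"
        return sorted(p for p in self._files if p.startswith(prefix))
```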
+
+**Virtual Path System**:
+- Paths map between virtual and physical locations
+- Virtual: `/mnt/user-data/{workspace,uploads,outputs}` - used by agent
+- Physical: `backend/.deer-flow/threads/{thread_id}/user-data/{workspace,uploads,outputs}`
+- Skills path: `/mnt/skills` maps to `deer-flow/skills/`
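
The virtual-to-physical mapping above can be sketched as a small path translation. The function name `to_physical_path` and the `base_dir` default are assumptions; only the two path layouts are taken from the list above.

```python
from pathlib import PurePosixPath

VIRTUAL_ROOT = PurePosixPath("/mnt/user-data")


def to_physical_path(virtual_path: str, thread_id: str, base_dir: str = "backend/.deer-flow") -> PurePosixPath:
    """Map /mnt/user-data/... to <base_dir>/threads/<thread_id>/user-data/..."""
    # relative_to raises ValueError when the path is outside the virtual root.
    relative = PurePosixPath(virtual_path).relative_to(VIRTUAL_ROOT)
    if ".." in relative.parts:
        raise ValueError(f"Path escapes the virtual root: {virtual_path}")
    return PurePosixPath(base_dir) / "threads" / thread_id / "user-data" / relative
```

For example, `to_physical_path("/mnt/user-data/uploads/report.pdf", "t1")` yields `backend/.deer-flow/threads/t1/user-data/uploads/report.pdf`.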
+
+**Model Factory** (`src/models/factory.py`)
+- `create_chat_model()` instantiates LLM from config using reflection
+- Supports `thinking_enabled` flag with per-model `when_thinking_enabled` overrides
+- Supports `supports_vision` flag for image understanding models
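
One plausible reading of the `when_thinking_enabled` overrides is a shallow merge over the base model settings. The sketch below assumes that reading; the function name `build_model_kwargs` and the example keys are hypothetical and not confirmed by the source.

```python
from typing import Any


def build_model_kwargs(model_config: dict[str, Any], thinking_enabled: bool) -> dict[str, Any]:
    """Return constructor kwargs, applying per-model overrides when thinking is on."""
    kwargs = {key: value for key, value in model_config.items() if key != "when_thinking_enabled"}
    if thinking_enabled:
        # Assumption: overrides replace base values key by key (shallow merge).
        kwargs.update(model_config.get("when_thinking_enabled", {}))
    return kwargs
```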
+
+**Tool System** (`src/tools/`)
+- Tools defined in config with `use` path (e.g., `src.sandbox.tools:bash_tool`)
+- `get_available_tools()` resolves tool paths via reflection
+- Built-in tools in `src/tools/builtins/`:
+  - `present_file_tool` - Display files to users
+  - `ask_clarification_tool` - Request clarification
+  - `view_image_tool` - Vision model integration (conditional on model capability)
+- Community tools in `src/community/`: Jina AI (web fetch), Tavily (web search), Firecrawl (scraping)
+- Supports MCP (Model Context Protocol) for pluggable external tools
+
+**MCP System** (`src/mcp/`)
+- Integrates with MCP servers to provide pluggable external tools using `langchain-mcp-adapters`
+- Uses `MultiServerMCPClient` from langchain-mcp-adapters for multi-server management
+- **Automatic initialization**: Tools are loaded on first use with lazy initialization
+- Supports both eager loading (FastAPI startup) and lazy loading (LangGraph Studio)
+- `initialize_mcp_tools()` can be called in FastAPI lifespan handler for eager loading
+- `get_cached_mcp_tools()` automatically initializes tools if not already loaded
+- Each server can be enabled/disabled independently via `enabled` flag
+- Supported transport types: stdio (command-based), SSE, HTTP
+- Built on top of langchain-ai/langchain-mcp-adapters for seamless integration
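
The eager and lazy paths described above share one idea: load once, then reuse. A minimal sketch of that pattern follows, with a hypothetical `LazyToolCache` class and a plain callable standing in for the real MCP client; it is not the project's implementation.

```python
from typing import Callable, Optional


class LazyToolCache:
    """Load tools on first access and reuse the result afterwards."""

    def __init__(self, loader: Callable[[], list[str]]) -> None:
        self._loader = loader
        self._tools: Optional[list[str]] = None

    def initialize(self) -> list[str]:
        """Eager path: call at server startup to pay the loading cost up front."""
        if self._tools is None:
            self._tools = self._loader()
        return self._tools

    def get(self) -> list[str]:
        """Lazy path: initializes on first use if startup did not do so."""
        return self.initialize()
```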
+
+**Reflection System** (`src/reflection/`)
+- `resolve_variable()` imports module and returns variable (e.g., `module:variable`)
+- `resolve_class()` imports and validates class against base class
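
The `module:variable` resolution above can be approximated with the standard `importlib` module. The `_sketch` function names and the error handling are assumptions; the real helpers may differ.

```python
from importlib import import_module
from typing import Any


def resolve_variable_sketch(path: str) -> Any:
    """Import the module before ':' and return the attribute named after it."""
    module_name, _, variable_name = path.partition(":")
    if not module_name or not variable_name:
        raise ValueError(f"Expected 'module:variable', got {path!r}")
    return getattr(import_module(module_name), variable_name)


def resolve_class_sketch(path: str, base_class: type) -> type:
    """Resolve a class and check that it derives from base_class."""
    candidate = resolve_variable_sketch(path)
    if not (isinstance(candidate, type) and issubclass(candidate, base_class)):
        raise TypeError(f"{path!r} is not a subclass of {base_class.__name__}")
    return candidate
```

With this sketch, a config entry such as `src.sandbox.tools:bash_tool` would import `src.sandbox.tools` and return its `bash_tool` attribute.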
+
+**Skills System** (`src/skills/`)
+- Skills provide specialized workflows for specific tasks (e.g., PDF processing, frontend design)
+- Located in `deer-flow/skills/{public,custom}` directory structure
+- Each skill has a `SKILL.md` file with YAML front matter (name, description, license, allowed-tools)
+- Skills are automatically discovered and loaded at runtime
+- `load_skills()` scans directories and parses SKILL.md files
+- Skills are injected into agent's system prompt with paths (only enabled skills)
+- Path mapping system allows seamless access in both local and Docker sandbox
+- Each skill can be enabled/disabled independently via `enabled` flag in extensions config
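
To illustrate how a `SKILL.md` front matter block might be read, here is a deliberately small sketch that handles only flat `key: value` pairs. A real loader would more likely use a YAML parser; the function name `parse_front_matter` is hypothetical.

```python
def parse_front_matter(text: str) -> dict[str, str]:
    """Extract flat key/value pairs from a leading '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    metadata: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata
        key, separator, value = line.partition(":")
        if separator:
            metadata[key.strip()] = value.strip()
    # No closing delimiter found: treat the file as having no front matter.
    return {}
```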
+
+**Middleware System** (`src/agents/middlewares/`)
+- Custom middlewares handle cross-cutting concerns
+- Middlewares are registered in `src/agents/lead_agent/agent.py` with execution order:
+  1. `ThreadDataMiddleware` - Initializes thread context (workspace, uploads, outputs paths)
+  2. `UploadsMiddleware` - Processes uploaded files, injects file list into state
+  3. `SandboxMiddleware` - Manages sandbox lifecycle, acquires on start
+  4. `SummarizationMiddleware` - Reduces context when token limits are approached (if enabled)
+  5. `TitleMiddleware` - Generates conversation titles
+  6. `TodoListMiddleware` - Tracks multi-step tasks (if `is_plan_mode` is enabled)
+  7. `ViewImageMiddleware` - Injects image details for vision models
+  8. `ClarificationMiddleware` - Handles clarification requests (must be last)
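
The execution order above, including the two conditional entries, can be sketched as a list builder. It works with class names as strings so it stays self-contained; the function name and parameters are hypothetical, and the actual registration in `agent.py` may be structured differently.

```python
def build_middleware_order(summarization_enabled: bool, is_plan_mode: bool) -> list[str]:
    """Return middleware names in execution order, skipping disabled optional ones."""
    order = ["ThreadDataMiddleware", "UploadsMiddleware", "SandboxMiddleware"]
    if summarization_enabled:
        order.append("SummarizationMiddleware")
    order.append("TitleMiddleware")
    if is_plan_mode:
        order.append("TodoListMiddleware")
    order.append("ViewImageMiddleware")
    # ClarificationMiddleware must always run last.
    order.append("ClarificationMiddleware")
    return order
```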
+
+### Config Schema
+
+Models, tools, sandbox providers, skills, and middleware settings are configured in `config.yaml`:
+- `models[]`: LLM configurations with `use` class path, `supports_thinking`, `supports_vision`
+- `tools[]`: Tool configurations with `use` variable path and `group`
+- `tool_groups[]`: Logical groupings for tools
+- `sandbox.use`: Sandbox provider class path
+- `skills.path`: Host path to skills directory (optional, default: `../skills`)
+- `skills.container_path`: Container mount path (default: `/mnt/skills`)
+- `title`: Automatic thread title generation configuration
+- `summarization`: Automatic conversation summarization configuration
+
+**Extensions Configuration Schema** (`extensions_config.json`):
 - `mcpServers`: Map of MCP server name to configuration
  - `enabled`: Whether the server is enabled (boolean)
-  - `command`: Command to execute to start the server (e.g., "npx", "python")
+  - `type`: Transport type (`stdio`, `sse`, `http`)
+  - `command`: Command to execute (for stdio type)
  - `args`: Arguments to pass to the command (array)
-  - `env`: Environment variables (object with `$VAR` support for env variable resolution)
+  - `env`: Environment variables (object with `$VAR` support)
  - `description`: Human-readable description
 - `skills`: Map of skill name to state configuration
  - `enabled`: Whether the skill is enabled (boolean, default: true if not specified)
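
A sketch of how the schema above might be consumed follows. The server and skill names, the package argument, and the rule that a server without an `enabled` key counts as disabled are all assumptions for illustration; only the field names and the skill default come from the schema.

```python
import json

EXAMPLE_CONFIG = """
{
  "mcpServers": {
    "filesystem": {
      "enabled": true,
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "example-mcp-server"],
      "env": {"API_TOKEN": "$API_TOKEN"},
      "description": "Example stdio server"
    },
    "remote": {"enabled": false, "type": "sse", "description": "Example disabled server"}
  },
  "skills": {
    "pdf-processing": {"enabled": false},
    "frontend-design": {}
  }
}
"""


def enabled_servers(config: dict) -> list[str]:
    """Names of MCP servers whose 'enabled' flag is true (assumed off if absent)."""
    return [name for name, server in config.get("mcpServers", {}).items() if server.get("enabled") is True]


def enabled_skills(config: dict) -> list[str]:
    """Names of skills that are enabled; a missing flag defaults to true."""
    return [name for name, state in config.get("skills", {}).items() if state.get("enabled", True)]


config = json.loads(EXAMPLE_CONFIG)
```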
@@ -218,7 +242,7 @@ This starts all services and makes the application available at `http://localhos

 **Nginx routing**:
 - `/api/langgraph/*` → LangGraph Server (2024) - Agent interactions, threads, streaming
-- `/api/*` (other) → Gateway API (8001) - Models, MCP, skills, artifacts
+- `/api/*` (other) → Gateway API (8001) - Models, MCP, skills, artifacts, uploads
 - `/` (non-API) → Frontend (3000) - Web interface
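
The routing rules above amount to a longest-prefix match. The following sketch mirrors them in Python for clarity; the upstream labels are hypothetical names paired with the ports listed above, and this is not the nginx configuration itself.

```python
def resolve_upstream(path: str) -> str:
    """Pick the backing service for a request path, most specific prefix first."""
    if path.startswith("/api/langgraph/"):
        return "langgraph:2024"
    if path.startswith("/api/"):
        return "gateway:8001"
    return "frontend:3000"
```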

 ### Running Backend Services Separately
@@ -245,9 +269,57 @@ The frontend uses environment variables to connect to backend services:

 When using `make dev` from root, the frontend automatically connects through nginx.

+## Key Features
+
+### File Upload
+
+The backend supports multi-file upload with automatic document conversion:
+- Endpoint: `POST /api/threads/{thread_id}/uploads`
+- Supports: PDF, PPT, Excel, Word documents
+- Auto-converts documents to Markdown using `markitdown`
+- Files stored in thread-isolated directories
+- Agent automatically receives uploaded file list via `UploadsMiddleware`
+
+See [docs/FILE_UPLOAD.md](docs/FILE_UPLOAD.md) for details.
+
+### Plan Mode
+
+Enable TodoList middleware for complex multi-step tasks:
+- Controlled via runtime config: `config.configurable.is_plan_mode = True`
+- Provides `write_todos` tool for task tracking
+- Agent can break down complex tasks and track progress
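
A run configuration that enables plan mode might be assembled as in the sketch below. The helper name `build_run_config` is hypothetical, and passing `thread_id` alongside `is_plan_mode` under `configurable` is an assumption based on common LangGraph usage.

```python
from typing import Any


def build_run_config(thread_id: str, plan_mode: bool = False) -> dict[str, Any]:
    """Build a runtime config dict with the plan-mode flag under 'configurable'."""
    return {"configurable": {"thread_id": thread_id, "is_plan_mode": plan_mode}}
```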
+
+See [docs/plan_mode_usage.md](docs/plan_mode_usage.md) for details.
+
+### Context Summarization
+
+Automatic conversation summarization when approaching token limits:
+- Configured in `config.yaml` under `summarization` key
+- Trigger types: tokens, messages, or fraction of max input
+- Keeps recent messages while summarizing older ones
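
The three trigger types above could be evaluated roughly as follows. All parameter names and the `>=` comparison are assumptions made for this sketch; see the summarization documentation for the actual configuration keys.

```python
def should_summarize(trigger_type: str, threshold: float, token_count: int, message_count: int, max_input_tokens: int) -> bool:
    """Decide whether summarization should run for the given trigger type."""
    if trigger_type == "tokens":
        return token_count >= threshold
    if trigger_type == "messages":
        return message_count >= threshold
    if trigger_type == "fraction":
        # Threshold is a share of the model's maximum input size, e.g. 0.8.
        return token_count >= threshold * max_input_tokens
    raise ValueError(f"Unknown trigger type: {trigger_type!r}")
```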
+
+See [docs/summarization.md](docs/summarization.md) for details.
+
+### Vision Support
+
+For models with `supports_vision: true`:
+- `ViewImageMiddleware` processes images in conversation
+- `view_image_tool` added to agent's toolset
+- Images are automatically converted and injected into state
+
 ## Code Style

 - Uses `ruff` for linting and formatting
 - Line length: 240 characters
 - Python 3.12+ with type hints
 - Double quotes, space indentation
+
+## Documentation
+
+See `docs/` directory for detailed documentation:
+- [CONFIGURATION.md](docs/CONFIGURATION.md) - Configuration options
+- [SETUP.md](docs/SETUP.md) - Setup guide
+- [FILE_UPLOAD.md](docs/FILE_UPLOAD.md) - File upload feature
+- [PATH_EXAMPLES.md](docs/PATH_EXAMPLES.md) - Path types and usage
+- [summarization.md](docs/summarization.md) - Context summarization
+- [plan_mode_usage.md](docs/plan_mode_usage.md) - Plan mode with TodoList