Mirror of https://gitee.com/wanwujie/deer-flow (synced 2026-04-19 12:24:46 +08:00)
docs: revise backend README and CLAUDE.md to reflect full architecture

Updated documentation to accurately cover all backend subsystems including subagents, memory, middleware chain, sandbox, MCP, skills, and gateway API. Fixed broken MCP_SETUP.md link in root README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -72,7 +72,7 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed local development guide.
 - [Contributing Guide](CONTRIBUTING.md) - Development environment setup and workflow
 - [Configuration Guide](backend/docs/CONFIGURATION.md) - Setup and configuration instructions
 - [Architecture Overview](backend/CLAUDE.md) - Technical architecture details
-- [MCP Setup Guide](MCP_SETUP.md) - Configure Model Context Protocol servers for additional tools
+- [Backend Architecture](backend/README.md) - Backend architecture and API reference

 ## Contributing

@@ -4,11 +4,11 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 ## Project Overview

-DeerFlow is a LangGraph-based AI agent system with a full-stack architecture. The backend provides a "super agent" with sandbox execution capabilities that can execute code, browse the web, and manage files in isolated environments.
+DeerFlow is a LangGraph-based AI super agent system with a full-stack architecture. The backend provides a "super agent" with sandbox execution, persistent memory, subagent delegation, and extensible tool integration - all operating in per-thread isolated environments.

 **Architecture**:
 - **LangGraph Server** (port 2024): Agent runtime and workflow execution
-- **Gateway API** (port 8001): REST API for models, MCP, skills, artifacts, and uploads
+- **Gateway API** (port 8001): REST API for models, MCP, skills, memory, artifacts, and uploads
 - **Frontend** (port 3000): Next.js web interface
 - **Nginx** (port 2026): Unified reverse proxy entry point

@@ -16,24 +16,39 @@ DeerFlow is a LangGraph-based AI agent system with a full-stack architecture. Th
 ```
 deer-flow/
 ├── Makefile                    # Root commands (check, install, dev, stop)
-├── nginx.conf                  # Nginx reverse proxy configuration
 ├── config.yaml                 # Main application configuration
 ├── extensions_config.json      # MCP servers and skills configuration
 ├── backend/                    # Backend application (this directory)
 │   ├── Makefile                # Backend-only commands (dev, gateway, lint)
+│   ├── langgraph.json          # LangGraph server configuration
 │   ├── src/
-│   │   ├── agents/             # LangGraph agents and workflows
+│   │   ├── agents/             # LangGraph agent system
+│   │   │   ├── lead_agent/     # Main agent (factory + system prompt)
+│   │   │   ├── middlewares/    # 9 middleware components
+│   │   │   ├── memory/         # Memory extraction, queue, prompts
+│   │   │   └── thread_state.py # ThreadState schema
 │   │   ├── gateway/            # FastAPI Gateway API
+│   │   │   ├── app.py          # FastAPI application
+│   │   │   └── routers/        # 6 route modules
 │   │   ├── sandbox/            # Sandbox execution system
-│   │   ├── tools/              # Agent tools
-│   │   ├── mcp/                # MCP integration
-│   │   ├── models/             # Model factory
-│   │   ├── skills/             # Skills loading and management
-│   │   ├── config/             # Configuration system
-│   │   ├── community/          # Community tools (web search, etc.)
-│   │   ├── reflection/         # Dynamic module loading
-│   │   └── utils/              # Utilities
-│   └── langgraph.json          # LangGraph server configuration
+│   │   │   ├── local/          # Local filesystem provider
+│   │   │   ├── sandbox.py      # Abstract Sandbox interface
+│   │   │   ├── tools.py        # bash, ls, read/write/str_replace
+│   │   │   └── middleware.py   # Sandbox lifecycle management
+│   │   ├── subagents/          # Subagent delegation system
+│   │   │   ├── builtins/       # general-purpose, bash agents
+│   │   │   ├── executor.py     # Background execution engine
+│   │   │   └── registry.py     # Agent registry
+│   │   ├── tools/builtins/     # Built-in tools (present_files, ask_clarification, view_image)
+│   │   ├── mcp/                # MCP integration (tools, cache, client)
+│   │   ├── models/             # Model factory with thinking/vision support
+│   │   ├── skills/             # Skills discovery, loading, parsing
+│   │   ├── config/             # Configuration system (app, model, sandbox, tool, etc.)
+│   │   ├── community/          # Community tools (tavily, jina_ai, firecrawl, image_search, aio_sandbox)
+│   │   ├── reflection/         # Dynamic module loading (resolve_variable, resolve_class)
+│   │   └── utils/              # Utilities (network, readability)
+│   ├── tests/                  # Test suite
+│   └── docs/                   # Documentation
 ├── frontend/                   # Next.js frontend application
 └── skills/                     # Agent skills directory
     ├── public/                 # Public skills (committed)
@@ -55,52 +70,61 @@ When making code changes, you MUST update the relevant documentation:

 **Root directory** (for full application):
 ```bash
-# Check system requirements
-make check
-
-# Install all dependencies (frontend + backend)
-make install
-
-# Start all services (LangGraph + Gateway + Frontend + Nginx)
-make dev
-
-# Stop all services
-make stop
+make check      # Check system requirements
+make install    # Install all dependencies (frontend + backend)
+make dev        # Start all services (LangGraph + Gateway + Frontend + Nginx)
+make stop       # Stop all services
 ```

 **Backend directory** (for backend development only):
 ```bash
-# Install backend dependencies
-make install
-
-# Run LangGraph server only (port 2024)
-make dev
-
-# Run Gateway API only (port 8001)
-make gateway
-
-# Lint
-make lint
-
-# Format code
-make format
+make install    # Install backend dependencies
+make dev        # Run LangGraph server only (port 2024)
+make gateway    # Run Gateway API only (port 8001)
+make lint       # Lint with ruff
+make format     # Format code with ruff
 ```

 ## Architecture

-### Configuration System
+### Agent System

-The app uses a two-tier YAML/JSON-based configuration system.
+**Lead Agent** (`src/agents/lead_agent/agent.py`):
+- Entry point: `make_lead_agent(config: RunnableConfig)` registered in `langgraph.json`
+- Dynamic model selection via `create_chat_model()` with thinking/vision support
+- Tools loaded via `get_available_tools()` - combines sandbox, built-in, MCP, community, and subagent tools
+- System prompt generated by `apply_prompt_template()` with skills, memory, and subagent instructions
+
+**ThreadState** (`src/agents/thread_state.py`):
+- Extends `AgentState` with: `sandbox`, `thread_data`, `title`, `artifacts`, `todos`, `uploaded_files`, `viewed_images`
+- Uses custom reducers: `merge_artifacts` (deduplicate), `merge_viewed_images` (merge/clear)
+
+**Runtime Configuration** (via `config.configurable`):
+- `thinking_enabled` - Enable model's extended thinking
+- `model_name` - Select specific LLM model
+- `is_plan_mode` - Enable TodoList middleware
+- `subagent_enabled` - Enable task delegation tool
+
+### Middleware Chain
+
+Middlewares execute in strict order in `src/agents/lead_agent/agent.py`:
+
+1. **ThreadDataMiddleware** - Creates per-thread directories (`backend/.deer-flow/threads/{thread_id}/user-data/{workspace,uploads,outputs}`)
+2. **UploadsMiddleware** - Tracks and injects newly uploaded files into conversation
+3. **SandboxMiddleware** - Acquires sandbox, stores `sandbox_id` in state
+4. **SummarizationMiddleware** - Context reduction when approaching token limits (optional, if enabled)
+5. **TodoListMiddleware** - Task tracking with `write_todos` tool (optional, if plan_mode)
+6. **TitleMiddleware** - Auto-generates thread title after first complete exchange
+7. **MemoryMiddleware** - Queues conversations for async memory update (filters to user + final AI responses)
+8. **ViewImageMiddleware** - Injects base64 image data before LLM call (conditional on vision support)
+9. **ClarificationMiddleware** - Intercepts `ask_clarification` tool calls, interrupts via `Command(goto=END)` (must be last)
+
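The strict ordering, with its two optional members, can be pictured as a small factory helper — hypothetical code, not the project's actual registration logic:

```python
def build_middlewares(is_plan_mode: bool, summarization_enabled: bool) -> list[str]:
    """Assemble the middleware chain in the required order."""
    chain = ["ThreadDataMiddleware", "UploadsMiddleware", "SandboxMiddleware"]
    if summarization_enabled:
        chain.append("SummarizationMiddleware")   # optional
    if is_plan_mode:
        chain.append("TodoListMiddleware")        # optional
    chain += ["TitleMiddleware", "MemoryMiddleware", "ViewImageMiddleware"]
    chain.append("ClarificationMiddleware")       # must be last
    return chain
```

Keeping ClarificationMiddleware last matters because it short-circuits the run with `Command(goto=END)`; anything registered after it would never see the interrupted turn.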
+### Configuration System

 **Main Configuration** (`config.yaml`):

 Setup: Copy `config.example.yaml` to `config.yaml` in the **project root** directory.

-```bash
-# From project root (deer-flow/)
-cp config.example.yaml config.yaml
-```

 Configuration priority:
 1. Explicit `config_path` argument
 2. `DEER_FLOW_CONFIG_PATH` environment variable
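Resolving `$`-prefixed config values against the environment, as described in the next hunk's context line, can be done with a small recursive helper — a generic sketch, not the project's loader:

```python
import os


def resolve_env_refs(value):
    """Resolve config values of the form `$VAR` from the environment."""
    if isinstance(value, str) and value.startswith("$"):
        return os.environ.get(value[1:], "")
    if isinstance(value, dict):
        return {k: resolve_env_refs(v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(v) for v in value]
    return value


os.environ["DEMO_API_KEY"] = "sk-demo"
resolved = resolve_env_refs({"api_key": "$DEMO_API_KEY", "model": "gpt-x"})
```

`DEMO_API_KEY` is an invented variable name for the demo; in practice the config references real provider keys.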
@@ -113,185 +137,149 @@ Config values starting with `$` are resolved as environment variables (e.g., `$O

 MCP servers and skills are configured together in `extensions_config.json` in project root:

-```bash
-# From project root (deer-flow/)
-cp extensions_config.example.json extensions_config.json
-```

 Configuration priority:
 1. Explicit `config_path` argument
 2. `DEER_FLOW_EXTENSIONS_CONFIG_PATH` environment variable
 3. `extensions_config.json` in current directory (backend/)
 4. `extensions_config.json` in parent directory (project root - **recommended location**)
-5. For backward compatibility: `mcp_config.json` (will be deprecated)

-### Core Components
+### Gateway API (`src/gateway/`)

-**Gateway API** (`src/gateway/`)
-- FastAPI application that provides REST endpoints for frontend integration
-- Endpoints:
-  - `/api/models` - List available LLM models from configuration
-  - `/api/mcp` - Manage MCP server configurations (GET, POST)
-  - `/api/skills` - Manage skill configurations (GET, POST)
-  - `/api/threads/{thread_id}/artifacts/*` - Serve agent-generated artifacts
-  - `/api/threads/{thread_id}/uploads` - File upload, list, delete
-- Works alongside LangGraph server, handling non-agent HTTP operations
-- Proxied through nginx under `/api/*` routes (except `/api/langgraph/*`)
+FastAPI application on port 8001 with health check at `GET /health`.

-**Agent Graph** (`src/agents/`)
-- `lead_agent` is the main entry point registered in `langgraph.json`
-- Uses `ThreadState` which extends `AgentState` with:
-  - `sandbox`: Sandbox environment info
-  - `artifacts`: Generated file paths
-  - `thread_data`: Workspace/uploads/outputs paths
-  - `title`: Auto-generated conversation title
-  - `todos`: Task tracking (plan mode)
-  - `viewed_images`: Vision model image data
-- Agent is created via `make_lead_agent(config)` with model, tools, middleware, and system prompt
+**Routers**:

-**Sandbox System** (`src/sandbox/`)
-- Abstract `Sandbox` base class defines interface: `execute_command`, `read_file`, `write_file`, `list_dir`
-- `SandboxProvider` manages sandbox lifecycle: `acquire`, `get`, `release`
-- `SandboxMiddleware` automatically acquires sandbox on agent start and injects into state
-- `LocalSandboxProvider` is a singleton implementation for local execution
-- `AioSandboxProvider` provides Docker-based isolation (in `src/community/`)
-- Sandbox tools (`bash`, `ls`, `read_file`, `write_file`, `str_replace`) extract sandbox from tool runtime
+| Router | Endpoints |
+|--------|-----------|
+| **Models** (`/api/models`) | `GET /` - list models; `GET /{name}` - model details |
+| **MCP** (`/api/mcp`) | `GET /config` - get config; `PUT /config` - update config (saves to extensions_config.json) |
+| **Skills** (`/api/skills`) | `GET /` - list skills; `GET /{name}` - details; `PUT /{name}` - update enabled; `POST /install` - install from .skill archive |
+| **Memory** (`/api/memory`) | `GET /` - memory data; `POST /reload` - force reload; `GET /config` - config; `GET /status` - config + data |
+| **Uploads** (`/api/threads/{id}/uploads`) | `POST /` - upload files (auto-converts PDF/PPT/Excel/Word); `GET /list` - list; `DELETE /(unknown)` - delete |
+| **Artifacts** (`/api/threads/{id}/artifacts`) | `GET /{path}` - serve artifacts; `?download=true` for download with citation removal |
+
+Proxied through nginx: `/api/langgraph/*` → LangGraph, all other `/api/*` → Gateway.

+### Sandbox System (`src/sandbox/`)
+
+**Interface**: Abstract `Sandbox` with `execute_command`, `read_file`, `write_file`, `list_dir`
+**Provider Pattern**: `SandboxProvider` with `acquire`, `get`, `release` lifecycle
+**Implementations**:
+- `LocalSandboxProvider` - Singleton local filesystem execution with path mappings
+- `AioSandboxProvider` (`src/community/`) - Docker-based isolation

 **Virtual Path System**:
-- Paths map between virtual and physical locations
-- Virtual: `/mnt/user-data/{workspace,uploads,outputs}` - used by agent
-- Physical: `backend/.deer-flow/threads/{thread_id}/user-data/{workspace,uploads,outputs}`
-- Skills path: `/mnt/skills` maps to `deer-flow/skills/`
+- Agent sees: `/mnt/user-data/{workspace,uploads,outputs}`, `/mnt/skills`
+- Physical: `backend/.deer-flow/threads/{thread_id}/user-data/...`, `deer-flow/skills/`
+- Translation: `replace_virtual_path()` / `replace_virtual_paths_in_command()`
+- Detection: `is_local_sandbox()` checks `sandbox_id == "local"`

-**Model Factory** (`src/models/factory.py`)
-- `create_chat_model()` instantiates LLM from config using reflection
+**Sandbox Tools** (in `src/sandbox/tools.py`):
+- `bash` - Execute commands with path translation and error handling
+- `ls` - Directory listing (tree format, max 2 levels)
+- `read_file` - Read file contents with optional line range
+- `write_file` - Write/append to files, creates directories
+- `str_replace` - Substring replacement (single or all occurrences)

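A minimal sketch of the abstract interface plus a local implementation doing virtual-to-physical path translation — the names follow the bullets above, but the signatures and the string-replacement strategy are assumptions:

```python
import subprocess
from abc import ABC, abstractmethod

VIRTUAL_ROOT = "/mnt/user-data"


class Sandbox(ABC):
    @abstractmethod
    def execute_command(self, command: str) -> str: ...

    @abstractmethod
    def read_file(self, path: str) -> str: ...


class LocalSandbox(Sandbox):
    """Runs commands on the host, mapping virtual paths to a thread directory."""

    sandbox_id = "local"  # is_local_sandbox() would check this marker

    def __init__(self, physical_root: str):
        self.physical_root = physical_root

    def replace_virtual_path(self, text: str) -> str:
        return text.replace(VIRTUAL_ROOT, self.physical_root)

    def execute_command(self, command: str) -> str:
        command = self.replace_virtual_path(command)
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        return result.stdout

    def read_file(self, path: str) -> str:
        with open(self.replace_virtual_path(path)) as f:
            return f.read()


sb = LocalSandbox("/tmp/demo-thread/user-data")
print(sb.replace_virtual_path("/mnt/user-data/outputs/report.md"))
```

Translating inside the command string (rather than chroot-style isolation) is what lets the same virtual paths work in both the local and the Docker provider.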
+### Subagent System (`src/subagents/`)
+
+**Built-in Agents**: `general-purpose` (all tools except `task`) and `bash` (command specialist)
+**Execution**: Dual thread pool - `_scheduler_pool` (3 workers) + `_execution_pool` (3 workers)
+**Concurrency**: `MAX_CONCURRENT_SUBAGENTS = 3` per trace, 15-minute timeout
+**Flow**: `task()` tool → `SubagentExecutor` → background thread → poll 5s → SSE events → result
+**Events**: `task_started`, `task_running`, `task_completed`/`task_failed`/`task_timed_out`

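The dual-pool arrangement can be approximated with `concurrent.futures` — a simplified model of the described behavior (no polling or SSE), not the actual `SubagentExecutor`:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_SUBAGENTS = 3

_scheduler_pool = ThreadPoolExecutor(max_workers=3)   # schedules/monitors tasks
_execution_pool = ThreadPoolExecutor(max_workers=3)   # runs the subagents
_semaphore = threading.Semaphore(MAX_CONCURRENT_SUBAGENTS)


def run_task(subagent_type: str, prompt: str, timeout: float = 900.0) -> str:
    """Schedule a subagent run, bounded by the concurrency limit and a 15-min ceiling."""

    def _execute() -> str:
        with _semaphore:  # at most 3 subagents in flight
            # Placeholder body standing in for the real subagent loop
            future = _execution_pool.submit(lambda: f"[{subagent_type}] done: {prompt}")
            return future.result(timeout=timeout)

    return _scheduler_pool.submit(_execute).result()
```

Separating the scheduler pool from the execution pool avoids deadlock: a scheduler thread can block on `future.result()` without occupying an execution worker.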
+### Tool System (`src/tools/`)
+
+`get_available_tools(groups, include_mcp, model_name, subagent_enabled)` assembles:
+1. **Config-defined tools** - Resolved from `config.yaml` via `resolve_variable()`
+2. **MCP tools** - From enabled MCP servers (lazy initialized, cached with mtime invalidation)
+3. **Built-in tools**:
+   - `present_files` - Make output files visible to user (only `/mnt/user-data/outputs`)
+   - `ask_clarification` - Request clarification (intercepted by ClarificationMiddleware → interrupts)
+   - `view_image` - Read image as base64 (added only if model supports vision)
+4. **Subagent tool** (if enabled):
+   - `task` - Delegate to subagent (description, prompt, subagent_type, max_turns)
+
+**Community tools** (`src/community/`):
+- `tavily/` - Web search (5 results default) and web fetch (4KB limit)
+- `jina_ai/` - Web fetch via Jina reader API with readability extraction
+- `firecrawl/` - Web scraping via Firecrawl API
+- `image_search/` - Image search via DuckDuckGo

+### MCP System (`src/mcp/`)
+
+- Uses `langchain-mcp-adapters` `MultiServerMCPClient` for multi-server management
+- **Lazy initialization**: Tools loaded on first use via `get_cached_mcp_tools()`
+- **Cache invalidation**: Detects config file changes via mtime comparison
+- **Transports**: stdio (command-based), SSE, HTTP
+- **Runtime updates**: Gateway API saves to extensions_config.json; LangGraph detects via mtime

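The mtime-based invalidation pattern looks roughly like this — a generic sketch with an injected loader, not the project's actual cache module:

```python
import os

_cache: dict = {"mtime": None, "tools": None}


def get_cached_tools(config_path: str, load):
    """Reload tools only when the config file's mtime changes."""
    mtime = os.path.getmtime(config_path)
    if _cache["tools"] is None or _cache["mtime"] != mtime:
        _cache["tools"] = load(config_path)  # expensive: spawns/contacts MCP servers
        _cache["mtime"] = mtime
    return _cache["tools"]
```

This is what lets the Gateway API save a new `extensions_config.json` while the LangGraph process picks the change up on its next tool lookup, without any cross-process signaling.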
+### Skills System (`src/skills/`)
+
+- **Location**: `deer-flow/skills/{public,custom}/`
+- **Format**: Directory with `SKILL.md` (YAML frontmatter: name, description, license, allowed-tools)
+- **Loading**: `load_skills()` scans directories, parses SKILL.md, reads enabled state from extensions_config.json
+- **Injection**: Enabled skills listed in agent system prompt with container paths
+- **Installation**: `POST /api/skills/install` extracts .skill ZIP archive to custom/ directory

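A `SKILL.md` using the frontmatter fields listed above might look like this — the field names come from the doc, while the skill name and values are invented for illustration:

```markdown
---
name: pdf-processing
description: Extract text and tables from PDF files
license: MIT
allowed-tools: [bash, read_file, write_file]
---

# PDF Processing

Use `pdftotext` for plain-text extraction, then post-process tables
into Markdown before writing results to /mnt/user-data/outputs/.
```

Only the frontmatter is parsed at load time; the Markdown body is the workflow the agent reads when the skill is invoked.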
+### Model Factory (`src/models/factory.py`)
+
+- `create_chat_model(name, thinking_enabled)` instantiates LLM from config via reflection
 - Supports `thinking_enabled` flag with per-model `when_thinking_enabled` overrides
 - Supports `supports_vision` flag for image understanding models
+- Config values starting with `$` resolved as environment variables

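A `models[]` entry combining these flags might be written as follows — field names are taken from the bullets above; the class path, key names, and override shape are assumptions:

```yaml
models:
  - name: my-claude
    use: langchain_anthropic:ChatAnthropic   # class resolved via reflection
    api_key: $ANTHROPIC_API_KEY              # "$" values come from the environment
    supports_thinking: true
    supports_vision: true
    when_thinking_enabled:                   # overrides merged in when thinking is on
      max_tokens: 16384
```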
-**Tool System** (`src/tools/`)
-- Tools defined in config with `use` path (e.g., `src.sandbox.tools:bash_tool`)
-- `get_available_tools()` resolves tool paths via reflection
-- Built-in tools in `src/tools/builtins/`:
-  - `present_file_tool` - Display files to users
-  - `ask_clarification_tool` - Request clarification
-  - `view_image_tool` - Vision model integration (conditional on model capability)
-- Community tools in `src/community/`: Jina AI (web fetch), Tavily (web search), Firecrawl (scraping)
-- Supports MCP (Model Context Protocol) for pluggable external tools
-
-**MCP System** (`src/mcp/`)
-- Integrates with MCP servers to provide pluggable external tools using `langchain-mcp-adapters`
-- Uses `MultiServerMCPClient` from langchain-mcp-adapters for multi-server management
-- **Automatic initialization**: Tools are loaded on first use with lazy initialization
-- Supports both eager loading (FastAPI startup) and lazy loading (LangGraph Studio)
-- `initialize_mcp_tools()` can be called in FastAPI lifespan handler for eager loading
-- `get_cached_mcp_tools()` automatically initializes tools if not already loaded
-- Each server can be enabled/disabled independently via `enabled` flag
-- Support types: stdio (command-based), SSE, HTTP
-- Built on top of langchain-ai/langchain-mcp-adapters for seamless integration
-
-**Reflection System** (`src/reflection/`)
-- `resolve_variable()` imports module and returns variable (e.g., `module:variable`)
-- `resolve_class()` imports and validates class against base class
+### Memory System (`src/agents/memory/`)
+
+**Components**:
+- `updater.py` - LLM-based memory updates with fact extraction and atomic file I/O
+- `queue.py` - Debounced update queue (per-thread deduplication, configurable wait time)
+- `prompt.py` - Prompt templates for memory updates
+
+**Data Structure** (stored in `backend/.deer-flow/memory.json`):
+- **User Context**: `workContext`, `personalContext`, `topOfMind` (1-3 sentence summaries)
+- **History**: `recentMonths`, `earlierContext`, `longTermBackground`
+- **Facts**: Discrete facts with `id`, `content`, `category` (preference/knowledge/context/behavior/goal), `confidence` (0-1), `createdAt`, `source`

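Putting those fields together, a `memory.json` could look like the following — the structure mirrors the bullets above, but every value is made up:

```json
{
  "userContext": {
    "workContext": "Backend engineer building LangGraph agents.",
    "personalContext": "Prefers concise answers with code examples.",
    "topOfMind": "Migrating the sandbox to Docker."
  },
  "history": {
    "recentMonths": "Worked on MCP integration and the memory queue.",
    "earlierContext": "Evaluated several agent frameworks.",
    "longTermBackground": "Python-focused full-stack developer."
  },
  "facts": [
    {
      "id": "fact-001",
      "content": "Prefers ruff for linting",
      "category": "preference",
      "confidence": 0.9,
      "createdAt": "2026-04-01T12:00:00Z",
      "source": "thread-abc123"
    }
  ]
}
```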
-**Skills System** (`src/skills/`)
-- Skills provide specialized workflows for specific tasks (e.g., PDF processing, frontend design)
-- Located in `deer-flow/skills/{public,custom}` directory structure
-- Each skill has a `SKILL.md` file with YAML front matter (name, description, license, allowed-tools)
-- Skills are automatically discovered and loaded at runtime
-- `load_skills()` scans directories and parses SKILL.md files
-- Skills are injected into agent's system prompt with paths (only enabled skills)
-- Path mapping system allows seamless access in both local and Docker sandbox
-- Each skill can be enabled/disabled independently via `enabled` flag in extensions config
+**Workflow**:
+1. `MemoryMiddleware` filters messages (user inputs + final AI responses) and queues conversation
+2. Queue debounces (30s default), batches updates, deduplicates per-thread
+3. Background thread invokes LLM to extract context updates and facts
+4. Applies updates atomically (temp file + rename) with cache invalidation
+5. Next interaction injects top 15 facts + context into `<memory>` tags in system prompt

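Steps 1-2 amount to a debounced, per-thread-deduplicated queue. A compact sketch (short delays for demonstration; the real `queue.py` will differ):

```python
import threading


class DebouncedQueue:
    """Coalesce per-thread updates; process once after a quiet period."""

    def __init__(self, wait_seconds: float, process):
        self.wait = wait_seconds
        self.process = process                    # called with {thread_id: payload}
        self.pending: dict[str, object] = {}
        self.timer: threading.Timer | None = None
        self.lock = threading.Lock()

    def put(self, thread_id: str, payload) -> None:
        with self.lock:
            self.pending[thread_id] = payload     # dedupe: newest payload per thread wins
            if self.timer:
                self.timer.cancel()               # new update resets the quiet period
            self.timer = threading.Timer(self.wait, self._flush)
            self.timer.start()

    def _flush(self) -> None:
        with self.lock:
            batch, self.pending = self.pending, {}
        self.process(batch)                       # e.g. invoke the memory-update LLM
```

Debouncing keeps a chatty thread from triggering one LLM call per message; only the final state of each thread within the window is processed.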
-**Middleware System** (`src/agents/middlewares/`)
-- Custom middlewares handle cross-cutting concerns
-- Middlewares are registered in `src/agents/lead_agent/agent.py` with execution order:
-  1. `ThreadDataMiddleware` - Initializes thread context (workspace, uploads, outputs paths)
-  2. `UploadsMiddleware` - Processes uploaded files, injects file list into state
-  3. `SandboxMiddleware` - Manages sandbox lifecycle, acquires on start
-  4. `SummarizationMiddleware` - Reduces context when token limits approached (if enabled)
-  5. `TitleMiddleware` - Generates conversation titles
-  6. `TodoListMiddleware` - Tracks multi-step tasks (if plan_mode enabled)
-  7. `ViewImageMiddleware` - Injects image details for vision models
-  8. `MemoryMiddleware` - Automatic context retention and personalization (if enabled)
-  9. `ClarificationMiddleware` - Handles clarification requests (must be last)
+**Configuration** (`config.yaml` → `memory`):
+- `enabled` / `injection_enabled` - Master switches
+- `storage_path` - Path to memory.json
+- `debounce_seconds` - Wait time before processing (default: 30)
+- `model_name` - LLM for updates (null = default model)
+- `max_facts` / `fact_confidence_threshold` - Fact storage limits (100 / 0.7)
+- `max_injection_tokens` - Token limit for prompt injection (2000)

-**Memory System** (`src/agents/memory/`)
-- LLM-powered personalization layer that automatically extracts and stores user context across conversations
-- Components:
-  - `updater.py` - LLM-based memory updates with fact extraction and file I/O
-  - `queue.py` - Debounced update queue for batching and performance optimization
-  - `prompt.py` - Prompt templates and formatting utilities for memory updates
-  - `MemoryMiddleware` (`src/agents/middlewares/memory_middleware.py`) - Queues conversations for memory updates
-  - Gateway API (`src/gateway/routers/memory.py`) - REST endpoints for memory management
-- Storage: JSON file at `backend/.deer-flow/memory.json`
-
-**Memory Data Structure**:
-- **User Context** (current state):
-  - `workContext` - Work-related information (job, projects, technologies)
-  - `personalContext` - Preferences, communication style, background
-  - `topOfMind` - Current focus areas and immediate priorities
-- **History** (temporal context):
-  - `recentMonths` - Recent activities and discussions
-  - `earlierContext` - Important historical context
-  - `longTermBackground` - Persistent background information
-- **Facts** (structured knowledge):
-  - Discrete facts with categories: `preference`, `knowledge`, `context`, `behavior`, `goal`
-  - Each fact includes: `id`, `content`, `category`, `confidence` (0-1), `createdAt`, `source` (thread ID)
-  - Confidence threshold (default 0.7) filters low-quality facts
-  - Max facts limit (default 100) keeps highest-confidence facts
-
-**Memory Workflow**:
-1. **Post-Interaction**: `MemoryMiddleware` filters messages (user inputs + final AI responses only) and queues conversation
-2. **Debounced Processing**: Queue waits 30s (configurable), batches multiple updates, resets timer on new updates
-3. **LLM-Based Update**: Background thread loads memory, formats conversation, invokes LLM to extract:
-   - Updated context summaries (1-3 sentences each)
-   - New facts with confidence scores and categories
-   - Facts to remove (contradictions)
-4. **Storage**: Applies updates atomically to `memory.json` with cache invalidation (mtime-based)
-5. **Injection**: Next interaction loads memory, formats top 15 facts + context, injects into `<memory>` tags in system prompt
-
-**Memory API Endpoints** (`/api/memory`):
-- `GET /api/memory` - Retrieve current memory data
-- `POST /api/memory/reload` - Force reload from file (invalidates cache)
-- `GET /api/memory/config` - Get memory configuration
-- `GET /api/memory/status` - Get both config and data
+### Reflection System (`src/reflection/`)
+
+- `resolve_variable(path)` - Import module and return variable (e.g., `module.path:variable_name`)
+- `resolve_class(path, base_class)` - Import and validate class against base class

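The `module.path:variable_name` convention can be implemented in a few lines with `importlib` — a simplified version, assuming no validation beyond the import and the subclass check:

```python
import importlib


def resolve_variable(path: str):
    """Resolve 'module.path:variable_name' to the named object."""
    module_path, _, variable_name = path.partition(":")
    module = importlib.import_module(module_path)
    return getattr(module, variable_name)


def resolve_class(path: str, base_class: type) -> type:
    """Like resolve_variable, but require a subclass of base_class."""
    cls = resolve_variable(path)
    if not (isinstance(cls, type) and issubclass(cls, base_class)):
        raise TypeError(f"{path} is not a subclass of {base_class.__name__}")
    return cls
```

This is the mechanism that lets `config.yaml` name tools (`use: src.sandbox.tools:bash_tool`) and sandbox providers (`sandbox.use`) without hard-coded imports.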
 ### Config Schema

-Models, tools, sandbox providers, skills, and middleware settings are configured in `config.yaml`:
-- `models[]`: LLM configurations with `use` class path, `supports_thinking`, `supports_vision`
-- `tools[]`: Tool configurations with `use` variable path and `group`
-- `tool_groups[]`: Logical groupings for tools
-- `sandbox.use`: Sandbox provider class path
-- `skills.path`: Host path to skills directory (optional, default: `../skills`)
-- `skills.container_path`: Container mount path (default: `/mnt/skills`)
-- `title`: Automatic thread title generation configuration
-- `summarization`: Automatic conversation summarization configuration
-- `subagents`: Subagent (task tool) configuration
-  - `enabled`: Master switch to enable/disable subagents (boolean, default: true)
-- `memory`: Memory system configuration
-  - `enabled`: Master switch (boolean)
-  - `storage_path`: Path to memory.json file (relative to backend/)
-  - `debounce_seconds`: Wait time before processing updates (default: 30)
-  - `model_name`: LLM model for memory updates (null = use default model)
-  - `max_facts`: Maximum facts to store (default: 100)
-  - `fact_confidence_threshold`: Minimum confidence to store fact (default: 0.7)
-  - `injection_enabled`: Inject memory into system prompt (boolean)
-  - `max_injection_tokens`: Token limit for memory injection (default: 2000)
+**`config.yaml`** key sections:
+- `models[]` - LLM configs with `use` class path, `supports_thinking`, `supports_vision`, provider-specific fields
+- `tools[]` - Tool configs with `use` variable path and `group`
+- `tool_groups[]` - Logical groupings for tools
+- `sandbox.use` - Sandbox provider class path
+- `skills.path` / `skills.container_path` - Host and container paths to skills directory
+- `title` - Auto-title generation (enabled, max_words, max_chars, prompt_template)
+- `summarization` - Context summarization (enabled, trigger conditions, keep policy)
+- `subagents.enabled` - Master switch for subagent delegation
+- `memory` - Memory system (enabled, storage_path, debounce_seconds, model_name, max_facts, fact_confidence_threshold, injection_enabled, max_injection_tokens)

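For instance, the `subagents` and `memory` sections could be written as follows — key names come from the schema above, and the values shown are the documented defaults:

```yaml
subagents:
  enabled: true

memory:
  enabled: true
  injection_enabled: true
  storage_path: .deer-flow/memory.json   # relative to backend/
  debounce_seconds: 30
  model_name: null                       # null = use the default model
  max_facts: 100
  fact_confidence_threshold: 0.7
  max_injection_tokens: 2000
```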
-**Extensions Configuration Schema** (`extensions_config.json`):
-- `mcpServers`: Map of MCP server name to configuration
-  - `enabled`: Whether the server is enabled (boolean)
-  - `type`: Transport type (`stdio`, `sse`, `http`)
-  - `command`: Command to execute (for stdio type)
-  - `args`: Arguments to pass to the command (array)
-  - `env`: Environment variables (object with `$VAR` support)
-  - `description`: Human-readable description
-- `skills`: Map of skill name to state configuration
-  - `enabled`: Whether the skill is enabled (boolean, default: true if not specified)
+**`extensions_config.json`**:
+- `mcpServers` - Map of server name → config (enabled, type, command, args, env, url, headers, description)
+- `skills` - Map of skill name → state (enabled)

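A concrete `extensions_config.json` with one stdio server and one skill entry might look like this — the shape follows the schema above; the server choice and values are illustrative:

```json
{
  "mcpServers": {
    "filesystem": {
      "enabled": true,
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
      "env": {"API_TOKEN": "$MY_API_TOKEN"},
      "description": "Local filesystem access"
    }
  },
  "skills": {
    "pdf-processing": {"enabled": true}
  }
}
```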
-Both MCP servers and skills can be modified at runtime via API endpoints.
+Both can be modified at runtime via Gateway API endpoints.

 ## Development Workflow

@@ -305,13 +293,13 @@ make dev
|
|||||||
This starts all services and makes the application available at `http://localhost:2026`.
|
This starts all services and makes the application available at `http://localhost:2026`.
|
||||||
|
|
||||||
**Nginx routing**:
|
**Nginx routing**:
|
||||||
- `/api/langgraph/*` → LangGraph Server (2024) - Agent interactions, threads, streaming
|
- `/api/langgraph/*` → LangGraph Server (2024)
|
||||||
- `/api/*` (other) → Gateway API (8001) - Models, MCP, skills, artifacts, uploads
|
- `/api/*` (other) → Gateway API (8001)
|
||||||
- `/` (non-API) → Frontend (3000) - Web interface
|
- `/` (non-API) → Frontend (3000)
|
||||||

### Running Backend Services Separately

From the **backend** directory:

```bash
# Terminal 1: LangGraph server
...

# Terminal 2: Gateway API
make gateway
```

When using `make dev` from root, the frontend automatically connects through nginx.

### File Upload

Multi-file upload with automatic document conversion:
- Endpoint: `POST /api/threads/{thread_id}/uploads`
- Supports: PDF, PPT, Excel, Word documents (converted via `markitdown`)
- Files stored in thread-isolated directories
- Agent receives uploaded file list via `UploadsMiddleware`

See [docs/FILE_UPLOAD.md](docs/FILE_UPLOAD.md) for details.
### Plan Mode

TodoList middleware for complex multi-step tasks:
- Controlled via runtime config: `config.configurable.is_plan_mode = True`
- Provides `write_todos` tool for task tracking
- One task in_progress at a time, real-time updates

See [docs/plan_mode_usage.md](docs/plan_mode_usage.md) for details.
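For illustration, plan mode is toggled per run through the standard configurable dict; only the config shape below is taken from the docs, while the `thread_id` value and the commented-out invoke call are hypothetical:

```python
# Hypothetical sketch: only the "configurable" shape comes from the docs above.
config = {
    "configurable": {
        "thread_id": "demo-thread",  # hypothetical thread id
        "is_plan_mode": True,        # enables the TodoList middleware
    }
}
# graph.invoke({"messages": [("user", "Research and write a report")]}, config=config)
print(config["configurable"]["is_plan_mode"])
```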

For models with `supports_vision: true`:
- `ViewImageMiddleware` processes images in conversation
- `view_image_tool` added to agent's toolset
- Images automatically converted to base64 and injected into state
## Code Style

See `docs/` directory for detailed documentation:
- [CONFIGURATION.md](docs/CONFIGURATION.md) - Configuration options
- [ARCHITECTURE.md](docs/ARCHITECTURE.md) - Architecture details
- [API.md](docs/API.md) - API reference
- [SETUP.md](docs/SETUP.md) - Setup guide
- [FILE_UPLOAD.md](docs/FILE_UPLOAD.md) - File upload feature
- [PATH_EXAMPLES.md](docs/PATH_EXAMPLES.md) - Path types and usage

# DeerFlow Backend

DeerFlow is a LangGraph-based AI super agent with sandbox execution, persistent memory, and extensible tool integration. The backend enables AI agents to execute code, browse the web, manage files, delegate tasks to subagents, and retain context across conversations - all in isolated, per-thread environments.

---
## Architecture

```
┌──────────────────────────────────────┐
│           Nginx (Port 2026)          │
│         Unified reverse proxy        │
└───────┬──────────────────┬───────────┘
        │ /api/langgraph/* │ /api/* (other)
        ▼                  ▼
┌────────────────────┐  ┌────────────────────────┐
│  LangGraph Server  │  │   Gateway API (8001)   │
│    (Port 2024)     │  │      FastAPI REST      │
│                    │  │                        │
│ ┌────────────────┐ │  │  Models, MCP, Skills,  │
│ │   Lead Agent   │ │  │  Memory, Uploads,      │
│ │ ┌──────────┐   │ │  │  Artifacts             │
│ │ │Middleware│   │ │  └────────────────────────┘
│ │ │  Chain   │   │ │
│ │ └──────────┘   │ │
│ │ ┌──────────┐   │ │
│ │ │  Tools   │   │ │
│ │ └──────────┘   │ │
│ │ ┌──────────┐   │ │
│ │ │Subagents │   │ │
│ │ └──────────┘   │ │
│ └────────────────┘ │
└────────────────────┘
```

**Request Routing** (via Nginx):
- `/api/langgraph/*` → LangGraph Server - agent interactions, threads, streaming
- `/api/*` (other) → Gateway API - models, MCP, skills, memory, artifacts, uploads
- `/` (non-API) → Frontend - Next.js web interface

---
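The routing above can be sketched as an nginx server block. This is a hedged illustration: the ports come from the docs, but the prefix handling and streaming directives are assumptions, not the project's actual nginx config:

```nginx
# Illustrative sketch only; real config lives in the project's nginx setup.
server {
    listen 2026;

    location /api/langgraph/ {
        proxy_pass http://127.0.0.1:2024/;  # LangGraph Server (prefix stripped; an assumption)
        proxy_buffering off;                # keep SSE/streaming responses flowing
    }

    location /api/ {
        proxy_pass http://127.0.0.1:8001;   # Gateway API
    }

    location / {
        proxy_pass http://127.0.0.1:3000;   # Next.js frontend
    }
}
```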

## Core Components

### Lead Agent

The single LangGraph agent (`lead_agent`) is the runtime entry point, created via `make_lead_agent(config)`. It combines:

- **Dynamic model selection** with thinking and vision support
- **Middleware chain** for cross-cutting concerns (9 middlewares)
- **Tool system** with sandbox, MCP, community, and built-in tools
- **Subagent delegation** for parallel task execution
- **System prompt** with skills injection, memory context, and working directory guidance

### Middleware Chain

Middlewares execute in strict order, each handling a specific concern:

| # | Middleware | Purpose |
|---|------------|---------|
| 1 | **ThreadDataMiddleware** | Creates per-thread isolated directories (workspace, uploads, outputs) |
| 2 | **UploadsMiddleware** | Injects newly uploaded files into conversation context |
| 3 | **SandboxMiddleware** | Acquires sandbox environment for code execution |
| 4 | **SummarizationMiddleware** | Reduces context when approaching token limits (optional) |
| 5 | **TodoListMiddleware** | Tracks multi-step tasks in plan mode (optional) |
| 6 | **TitleMiddleware** | Auto-generates conversation titles after first exchange |
| 7 | **MemoryMiddleware** | Queues conversations for async memory extraction |
| 8 | **ViewImageMiddleware** | Injects image data for vision-capable models (conditional) |
| 9 | **ClarificationMiddleware** | Intercepts clarification requests and interrupts execution (must be last) |
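The order matters because each middleware wraps everything after it, so before-hooks run 1→9 around the model/tool loop at the center. A minimal generic sketch of that wrapping idea (an illustration, not the project's actual middleware API):

```python
from typing import Callable

State = dict
Step = Callable[[State], State]

def thread_data(next_step: Step) -> Step:
    # Sketch of middleware 1: prepare per-thread directories before anything else.
    def run(state: State) -> State:
        state["dirs"] = ["workspace", "uploads", "outputs"]
        return next_step(state)
    return run

def sandbox(next_step: Step) -> Step:
    # Sketch of middleware 3: acquire a sandbox before the model runs.
    def run(state: State) -> State:
        state["sandbox"] = "acquired"
        return next_step(state)
    return run

def agent_core(state: State) -> State:
    # Stand-in for the model/tool loop at the center of the chain.
    state["answer"] = "done"
    return state

# Compose in reverse so the first-listed middleware ends up outermost (runs first).
chain: Step = agent_core
for middleware in reversed([thread_data, sandbox]):
    chain = middleware(chain)

result = chain({})
print(result)
```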

### Sandbox System

Per-thread isolated execution with virtual path translation:

- **Abstract interface**: `execute_command`, `read_file`, `write_file`, `list_dir`
- **Providers**: `LocalSandboxProvider` (filesystem) and `AioSandboxProvider` (Docker, in community/)
- **Virtual paths**: `/mnt/user-data/{workspace,uploads,outputs}` → thread-specific physical directories
- **Skills path**: `/mnt/skills` → `deer-flow/skills/` directory
- **Tools**: `bash`, `ls`, `read_file`, `write_file`, `str_replace`
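The virtual→physical mapping can be sketched as a prefix rewrite. Note the physical layout below (`.deer-flow/threads/{thread_id}`) is an assumption for illustration; only the virtual prefixes come from the docs:

```python
from pathlib import PurePosixPath

# Virtual prefix → physical root (the physical layout is assumed, not documented).
VIRTUAL_ROOTS = {
    "/mnt/user-data": ".deer-flow/threads/{thread_id}",  # workspace/uploads/outputs
    "/mnt/skills": "skills",                             # shared, thread-independent
}

def to_physical(virtual_path: str, thread_id: str) -> str:
    for prefix, physical in VIRTUAL_ROOTS.items():
        if virtual_path == prefix or virtual_path.startswith(prefix + "/"):
            rel = virtual_path[len(prefix):].lstrip("/")
            return str(PurePosixPath(physical.format(thread_id=thread_id)) / rel)
    raise ValueError(f"not a sandbox path: {virtual_path}")

print(to_physical("/mnt/user-data/uploads/report.pdf", "t-123"))
# → .deer-flow/threads/t-123/uploads/report.pdf
```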

### Subagent System

Async task delegation with concurrent execution:

- **Built-in agents**: `general-purpose` (full toolset) and `bash` (command specialist)
- **Concurrency**: Max 3 subagents per turn, 15-minute timeout
- **Execution**: Background thread pools with status tracking and SSE events
- **Flow**: Agent calls `task()` tool → executor runs subagent in background → polls for completion → returns result
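The concurrency limits above can be sketched with a thread pool. `run_subagent` here is a stand-in for invoking a registered subagent; the real executor additionally tracks status and emits SSE events:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CONCURRENT_SUBAGENTS = 3   # max subagents per turn
TIMEOUT_SECONDS = 15 * 60      # 15-minute timeout

def run_subagent(name: str, prompt: str) -> str:
    # Stand-in for running a registered subagent ("general-purpose" or "bash").
    return f"{name} finished: {prompt!r}"

tasks = [("general-purpose", "summarize logs"), ("bash", "wc -l *.py")]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_SUBAGENTS) as pool:
    futures = {pool.submit(run_subagent, name, prompt): name for name, prompt in tasks}
    results = [f.result(timeout=TIMEOUT_SECONDS) for f in as_completed(futures)]

print(sorted(results))
```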

### Memory System

LLM-powered persistent context retention across conversations:

- **Automatic extraction**: Analyzes conversations for user context, facts, and preferences
- **Structured storage**: User context (work, personal, top-of-mind), history, and confidence-scored facts
- **Debounced updates**: Batches updates to minimize LLM calls (configurable wait time)
- **System prompt injection**: Top facts + context injected into agent prompts
- **Storage**: JSON file with mtime-based cache invalidation
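The debouncing idea can be sketched in a few lines: updates are queued and flushed as one batch after a quiet period, which is how memory extraction keeps LLM calls to a minimum. This is an illustration only; the real implementation also filters messages and locks its cache:

```python
import threading
import time

class DebouncedQueue:
    """Collects items and flushes them in one batch after a quiet period."""

    def __init__(self, debounce_seconds: float, flush):
        self.debounce_seconds = debounce_seconds
        self.flush = flush
        self._pending: list = []
        self._timer: threading.Timer | None = None
        self._lock = threading.Lock()

    def push(self, item) -> None:
        with self._lock:
            self._pending.append(item)
            if self._timer:
                self._timer.cancel()  # restart the quiet-period timer
            self._timer = threading.Timer(self.debounce_seconds, self._fire)
            self._timer.start()

    def _fire(self) -> None:
        with self._lock:
            batch, self._pending = self._pending, []
        self.flush(batch)  # one (stand-in) LLM call per batch

batches: list = []
q = DebouncedQueue(0.05, batches.append)
for msg in ["user fact 1", "user fact 2", "final answer"]:
    q.push(msg)          # rapid pushes keep resetting the timer

time.sleep(0.2)          # wait past the quiet period
print(batches)
# → [['user fact 1', 'user fact 2', 'final answer']]
```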

### Tool Ecosystem

| Category | Tools |
|----------|-------|
| **Sandbox** | `bash`, `ls`, `read_file`, `write_file`, `str_replace` |
| **Built-in** | `present_files`, `ask_clarification`, `view_image`, `task` (subagent) |
| **Community** | Tavily (web search), Jina AI (web fetch), Firecrawl (scraping), DuckDuckGo (image search) |
| **MCP** | Any Model Context Protocol server (stdio, SSE, HTTP transports) |
| **Skills** | Domain-specific workflows injected via system prompt |

### Gateway API

FastAPI application providing REST endpoints for frontend integration:

| Route | Purpose |
|-------|---------|
| `GET /api/models` | List available LLM models |
| `GET/PUT /api/mcp/config` | Manage MCP server configurations |
| `GET/PUT /api/skills` | List and manage skills |
| `POST /api/skills/install` | Install skill from `.skill` archive |
| `GET /api/memory` | Retrieve memory data |
| `POST /api/memory/reload` | Force memory reload |
| `GET /api/memory/config` | Memory configuration |
| `GET /api/memory/status` | Combined config + data |
| `POST /api/threads/{id}/uploads` | Upload files (auto-converts PDF/PPT/Excel/Word to Markdown) |
| `GET /api/threads/{id}/uploads/list` | List uploaded files |
| `GET /api/threads/{id}/artifacts/{path}` | Serve generated artifacts |

---
## Quick Start

### Prerequisites

### Installation

```bash
cd deer-flow

# Copy configuration files
...

make install
```

### Configuration

Edit `config.yaml` in the project root:

```yaml
models:
  - name: gpt-4o
    display_name: GPT-4o
    use: langchain_openai:ChatOpenAI
    model: gpt-4o
    api_key: $OPENAI_API_KEY
    supports_thinking: false
    supports_vision: true
```

Set your API keys:

```bash
export OPENAI_API_KEY="your-api-key-here"
```
### Running

```bash
make dev
...
make gateway
```

Direct access: LangGraph at http://localhost:2024, Gateway at http://localhost:8001

---

## Project Structure

```
backend/
├── src/
│   ├── agents/              # Agent system
│   │   ├── lead_agent/      # Main agent (factory, prompts)
│   │   ├── middlewares/     # 9 middleware components
│   │   ├── memory/          # Memory extraction & storage
│   │   └── thread_state.py  # ThreadState schema
│   ├── gateway/             # FastAPI Gateway API
│   │   ├── app.py           # Application setup
│   │   └── routers/         # 6 route modules
│   ├── sandbox/             # Sandbox execution
│   │   ├── local/           # Local filesystem provider
│   │   ├── sandbox.py       # Abstract interface
│   │   ├── tools.py         # bash, ls, read/write/str_replace
│   │   └── middleware.py    # Sandbox lifecycle
│   ├── subagents/           # Subagent delegation
│   │   ├── builtins/        # general-purpose, bash agents
│   │   ├── executor.py      # Background execution engine
│   │   └── registry.py      # Agent registry
│   ├── tools/builtins/      # Built-in tools
│   ├── mcp/                 # MCP protocol integration
│   ├── models/              # Model factory
│   ├── skills/              # Skill discovery & loading
│   ├── config/              # Configuration system
│   ├── community/           # Community tools & providers
│   ├── reflection/          # Dynamic module loading
│   └── utils/               # Utilities
├── docs/                    # Documentation
├── tests/                   # Test suite
├── langgraph.json           # LangGraph server configuration
├── pyproject.toml           # Python dependencies
├── Makefile                 # Development commands
└── Dockerfile               # Container build
```

---
## Configuration

### Main Configuration (`config.yaml`)

Place in project root. Config values starting with `$` resolve as environment variables.

Key sections:
- `models` - LLM configurations with class paths, API keys, thinking/vision flags
- `tools` - Tool definitions with module paths and groups
- `tool_groups` - Logical tool groupings
- `sandbox` - Execution environment provider
- `skills` - Skills directory paths
- `title` - Auto-title generation settings
- `summarization` - Context summarization settings
- `subagents` - Subagent system (enabled/disabled)
- `memory` - Memory system settings (enabled, storage, debounce, facts limits)
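The `$`-prefix convention can be sketched in a few lines. The real loader lives in `src/config/`; this only illustrates the resolution behavior:

```python
import os

def resolve(value):
    """Recursively replace "$VAR" strings with os.environ["VAR"]."""
    if isinstance(value, str) and value.startswith("$"):
        return os.environ.get(value[1:], "")
    if isinstance(value, dict):
        return {k: resolve(v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v) for v in value]
    return value

os.environ["OPENAI_API_KEY"] = "sk-demo"  # demo value for illustration
model_cfg = {"name": "gpt-4o", "api_key": "$OPENAI_API_KEY"}
print(resolve(model_cfg))
# → {'name': 'gpt-4o', 'api_key': 'sk-demo'}
```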
### Extensions Configuration (`extensions_config.json`)

MCP servers and skill states in a single file:

```json
{
  ...
}
```

- Tool API keys: `TAVILY_API_KEY`, `GITHUB_TOKEN`, etc.

---

## Development

### Commands

```bash
...
make format      # Format code (ruff)
```

### Code Style

- **Linter/Formatter**: `ruff`
- **Line length**: 240 characters
- **Python**: 3.12+ with type hints
- **Quotes**: Double quotes
- **Indentation**: 4 spaces

### Testing

```bash
uv run pytest
```

---

## Technology Stack

- **LangGraph** (1.0.6+) - Agent framework and multi-agent orchestration
- **LangChain** (1.2.3+) - LLM abstractions and tool system
- **FastAPI** (0.115.0+) - Gateway REST API
- **langchain-mcp-adapters** - Model Context Protocol support
- **agent-sandbox** - Sandboxed code execution
- **markitdown** - Multi-format document conversion
- **tavily-python** / **firecrawl-py** - Web search and scraping

---

## Documentation

- [Configuration Guide](docs/CONFIGURATION.md)
- [Architecture Details](docs/ARCHITECTURE.md)
- [API Reference](docs/API.md)
- [File Upload](docs/FILE_UPLOAD.md)
- [Path Examples](docs/PATH_EXAMPLES.md)
- [Context Summarization](docs/summarization.md)
- [Plan Mode](docs/plan_mode_usage.md)
- [Setup Guide](docs/SETUP.md)

---

## License

See the [LICENSE](../LICENSE) file in the project root.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.