CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
DeerFlow is a LangGraph-based AI super agent system with a full-stack architecture. The backend provides a "super agent" with sandbox execution, persistent memory, subagent delegation, and extensible tool integration - all operating in per-thread isolated environments.
Architecture:
- LangGraph Server (port 2024): Agent runtime and workflow execution
- Gateway API (port 8001): REST API for models, MCP, skills, memory, artifacts, and uploads
- Frontend (port 3000): Next.js web interface
- Nginx (port 2026): Unified reverse proxy entry point
- Provisioner (port 8002, optional in Docker dev): Started only when sandbox is configured for provisioner/Kubernetes mode
Project Structure:
deer-flow/
├── Makefile # Root commands (check, install, dev, stop)
├── config.yaml # Main application configuration
├── extensions_config.json # MCP servers and skills configuration
├── backend/ # Backend application (this directory)
│ ├── Makefile # Backend-only commands (dev, gateway, lint)
│ ├── langgraph.json # LangGraph server configuration
│ ├── packages/
│ │ └── harness/ # deerflow-harness package (import: deerflow.*)
│ │ ├── pyproject.toml
│ │ └── deerflow/
│ │ ├── agents/ # LangGraph agent system
│ │ │ ├── lead_agent/ # Main agent (factory + system prompt)
│ │ │ ├── middlewares/ # 10 middleware components
│ │ │ ├── memory/ # Memory extraction, queue, prompts
│ │ │ └── thread_state.py # ThreadState schema
│ │ ├── sandbox/ # Sandbox execution system
│ │ │ ├── local/ # Local filesystem provider
│ │ │ ├── sandbox.py # Abstract Sandbox interface
│ │ │ ├── tools.py # bash, ls, read/write/str_replace
│ │ │ └── middleware.py # Sandbox lifecycle management
│ │ ├── subagents/ # Subagent delegation system
│ │ │ ├── builtins/ # general-purpose, bash agents
│ │ │ ├── executor.py # Background execution engine
│ │ │ └── registry.py # Agent registry
│ │ ├── tools/builtins/ # Built-in tools (present_files, ask_clarification, view_image)
│ │ ├── mcp/ # MCP integration (tools, cache, client)
│ │ ├── models/ # Model factory with thinking/vision support
│ │ ├── skills/ # Skills discovery, loading, parsing
│ │ ├── config/ # Configuration system (app, model, sandbox, tool, etc.)
│ │ ├── community/ # Community tools (tavily, jina_ai, firecrawl, image_search, aio_sandbox)
│ │ ├── reflection/ # Dynamic module loading (resolve_variable, resolve_class)
│ │ ├── utils/ # Utilities (network, readability)
│ │ └── client.py # Embedded Python client (DeerFlowClient)
│ ├── app/ # Application layer (import: app.*)
│ │ ├── gateway/ # FastAPI Gateway API
│ │ │ ├── app.py # FastAPI application
│ │ │ └── routers/ # 6 route modules
│ │ └── channels/ # IM platform integrations
│ ├── tests/ # Test suite
│ └── docs/ # Documentation
├── frontend/ # Next.js frontend application
└── skills/ # Agent skills directory
├── public/ # Public skills (committed)
└── custom/ # Custom skills (gitignored)
Important Development Guidelines
Documentation Update Policy
CRITICAL: Always update README.md and CLAUDE.md after every code change
When making code changes, you MUST update the relevant documentation:
- Update README.md for user-facing changes (features, setup, usage instructions)
- Update CLAUDE.md for development changes (architecture, commands, workflows, internal systems)
- Keep documentation synchronized with the codebase at all times
- Ensure accuracy and timeliness of all documentation
Commands
Root directory (for full application):
make check # Check system requirements
make install # Install all dependencies (frontend + backend)
make dev # Start all services (LangGraph + Gateway + Frontend + Nginx), with config.yaml preflight
make stop # Stop all services
Backend directory (for backend development only):
make install # Install backend dependencies
make dev # Run LangGraph server only (port 2024)
make gateway # Run Gateway API only (port 8001)
make test # Run all backend tests
make lint # Lint with ruff
make format # Format code with ruff
Regression tests related to Docker/provisioner behavior:
- tests/test_docker_sandbox_mode_detection.py (mode detection from config.yaml)
- tests/test_provisioner_kubeconfig.py (kubeconfig file/directory handling)
Boundary check (harness → app import firewall):
- tests/test_harness_boundary.py: ensures packages/harness/deerflow/ never imports from app.*
CI runs these regression tests for every pull request via .github/workflows/backend-unit-tests.yml.
Architecture
Harness / App Split
The backend is split into two layers with a strict dependency direction:
- Harness (packages/harness/deerflow/): Publishable agent framework package (deerflow-harness). Import prefix: deerflow.*. Contains agent orchestration, tools, sandbox, models, MCP, skills, and config, i.e. everything needed to build and run agents.
- App (app/): Unpublished application code. Import prefix: app.*. Contains the FastAPI Gateway API and IM channel integrations (Feishu, Slack, Telegram).
Dependency rule: App imports deerflow, but deerflow never imports app. This boundary is enforced by tests/test_harness_boundary.py which runs in CI.
Import conventions:
# Harness internal
from deerflow.agents import make_lead_agent
from deerflow.models import create_chat_model
# App internal
from app.gateway.app import app
from app.channels.service import start_channel_service
# App → Harness (allowed)
from deerflow.config import get_app_config
# Harness → App (FORBIDDEN — enforced by test_harness_boundary.py)
# from app.gateway.routers.uploads import ... # ← will fail CI
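The dependency rule above can be checked statically. The following is a minimal sketch of such a check using the standard-library ast module; it is not the contents of tests/test_harness_boundary.py, and the helper name find_forbidden_imports and the sample sources are hypothetical.

```python
import ast


def find_forbidden_imports(source: str, forbidden_root: str = "app") -> list[str]:
    """Return absolute imports in `source` that target `forbidden_root` or its submodules."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        # Match "app" and "app.x.y", but not unrelated packages such as "application".
        hits.extend(n for n in names if n == forbidden_root or n.startswith(forbidden_root + "."))
    return hits


# Hypothetical module sources standing in for files under packages/harness/deerflow/.
harness_ok = "from deerflow.config import get_app_config\n"
harness_bad = "from app.gateway.routers.uploads import router\nimport app.channels\n"

print(find_forbidden_imports(harness_ok))
print(find_forbidden_imports(harness_bad))
```

A real boundary test would presumably walk every .py file under the harness package and fail if any file yields a non-empty list.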
Agent System
Lead Agent (packages/harness/deerflow/agents/lead_agent/agent.py):
- Entry point: make_lead_agent(config: RunnableConfig) registered in langgraph.json
- Dynamic model selection via create_chat_model() with thinking/vision support
- Tools loaded via get_available_tools() - combines sandbox, built-in, MCP, community, and subagent tools
- System prompt generated by apply_prompt_template() with skills, memory, and subagent instructions
ThreadState (packages/harness/deerflow/agents/thread_state.py):
- Extends AgentState with: sandbox, thread_data, title, artifacts, todos, uploaded_files, viewed_images
- Uses custom reducers: merge_artifacts (deduplicate), merge_viewed_images (merge/clear)
Runtime Configuration (via config.configurable):
- thinking_enabled - Enable model's extended thinking
- model_name - Select specific LLM model
- is_plan_mode - Enable TodoList middleware
- subagent_enabled - Enable task delegation tool
Middleware Chain
Middlewares execute in strict order in packages/harness/deerflow/agents/lead_agent/agent.py:
- ThreadDataMiddleware - Creates per-thread directories (backend/.deer-flow/threads/{thread_id}/user-data/{workspace,uploads,outputs})
- UploadsMiddleware - Tracks and injects newly uploaded files into conversation
- SandboxMiddleware - Acquires sandbox, stores sandbox_id in state
- DanglingToolCallMiddleware - Injects placeholder ToolMessages for AIMessage tool_calls that lack responses (e.g., due to user interruption)
- SummarizationMiddleware - Context reduction when approaching token limits (optional, if enabled)
- TodoListMiddleware - Task tracking with write_todos tool (optional, if plan_mode)
- TitleMiddleware - Auto-generates thread title after first complete exchange and normalizes structured message content before prompting the title model
- MemoryMiddleware - Queues conversations for async memory update (filters to user + final AI responses)
- ViewImageMiddleware - Injects base64 image data before LLM call (conditional on vision support)
- SubagentLimitMiddleware - Truncates excess task tool calls from model response to enforce MAX_CONCURRENT_SUBAGENTS limit (optional, if subagent_enabled)
- ClarificationMiddleware - Intercepts ask_clarification tool calls, interrupts via Command(goto=END) (must be last)
Configuration System
Main Configuration (config.yaml):
Setup: Copy config.example.yaml to config.yaml in the project root directory.
Config Versioning: config.example.yaml has a config_version field. On startup, AppConfig.from_file() compares the user's version against the example version and emits a warning if the user's config is outdated. A missing config_version is treated as version 0. Run make config-upgrade to auto-merge missing fields. When changing the config schema, bump config_version in config.example.yaml.
Config Caching: get_app_config() caches the parsed config, but automatically reloads it when the resolved config path changes or the file's mtime increases. This keeps Gateway and LangGraph reads aligned with config.yaml edits without requiring a manual process restart.
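The reload rule described above (reload when the resolved path changes or the mtime increases) can be sketched as follows. This is an illustrative stand-in, not the real get_app_config(); the class MtimeCachedFile and the demo() helper are hypothetical, and the demo sets mtimes explicitly with os.utime so it does not depend on filesystem timestamp resolution.

```python
import os
import tempfile
from pathlib import Path


class MtimeCachedFile:
    """Cache a file's text; reload when the resolved path changes or its mtime increases."""

    def __init__(self) -> None:
        self._path: Path | None = None
        self._mtime: float | None = None
        self._text: str = ""
        self.loads = 0  # number of real disk reads, exposed for the demo

    def get(self, path: Path) -> str:
        resolved = path.resolve()
        mtime = resolved.stat().st_mtime
        if self._path != resolved or self._mtime is None or mtime > self._mtime:
            self._text = resolved.read_text(encoding="utf-8")
            self._path, self._mtime = resolved, mtime
            self.loads += 1
        return self._text


def demo() -> list[int]:
    """Return the load counter after each of three reads."""
    cache = MtimeCachedFile()
    counts: list[int] = []
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "config.yaml"
        cfg.write_text("config_version: 1\n", encoding="utf-8")
        os.utime(cfg, (1_000, 1_000))
        cache.get(cfg)
        counts.append(cache.loads)  # first read loads from disk
        cache.get(cfg)
        counts.append(cache.loads)  # unchanged mtime: served from cache
        cfg.write_text("config_version: 2\n", encoding="utf-8")
        os.utime(cfg, (2_000, 2_000))
        cache.get(cfg)
        counts.append(cache.loads)  # newer mtime: reloaded
    return counts


print(demo())
```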
Configuration priority:
- Explicit config_path argument
- DEER_FLOW_CONFIG_PATH environment variable
- config.yaml in current directory (backend/)
- config.yaml in parent directory (project root - recommended location)
Config values starting with $ are resolved as environment variables (e.g., $OPENAI_API_KEY).
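A minimal sketch of this kind of resolution is shown below. The function name resolve_env_values, the recursive handling of nested dicts and lists, and the choice to raise ValueError for an unset variable are all assumptions for illustration; the actual behavior in the config package may differ.

```python
import os
from typing import Any


def resolve_env_values(value: Any) -> Any:
    """Replace strings of the form "$NAME" with os.environ["NAME"], recursing into containers."""
    if isinstance(value, str) and value.startswith("$"):
        name = value[1:]
        if name not in os.environ:
            # Assumption: fail loudly rather than pass the literal "$NAME" through.
            raise ValueError(f"Environment variable {name} is not set")
        return os.environ[name]
    if isinstance(value, dict):
        return {key: resolve_env_values(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_values(item) for item in value]
    return value


os.environ["DEMO_API_KEY"] = "sk-demo"  # stand-in for a real secret
model_config = {"name": "demo-model", "api_key": "$DEMO_API_KEY", "max_tokens": 4096}
resolved = resolve_env_values(model_config)
print(resolved["api_key"])
```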
ModelConfig also declares use_responses_api and output_version so OpenAI /v1/responses can be enabled explicitly while still using langchain_openai:ChatOpenAI.
Extensions Configuration (extensions_config.json):
MCP servers and skills are configured together in extensions_config.json in the project root.
Configuration priority:
- Explicit config_path argument
- DEER_FLOW_EXTENSIONS_CONFIG_PATH environment variable
- extensions_config.json in current directory (backend/)
- extensions_config.json in parent directory (project root - recommended location)
Gateway API (app/gateway/)
FastAPI application on port 8001 with health check at GET /health.
Routers:
| Router | Endpoints |
|---|---|
| Models (/api/models) | GET / - list models; GET /{name} - model details |
| MCP (/api/mcp) | GET /config - get config; PUT /config - update config (saves to extensions_config.json) |
| Skills (/api/skills) | GET / - list skills; GET /{name} - details; PUT /{name} - update enabled; POST /install - install from .skill archive (accepts standard optional frontmatter like version, author, compatibility) |
| Memory (/api/memory) | GET / - memory data; POST /reload - force reload; GET /config - config; GET /status - config + data |
| Uploads (/api/threads/{id}/uploads) | POST / - upload files (auto-converts PDF/PPT/Excel/Word); GET /list - list; DELETE /{filename} - delete |
| Artifacts (/api/threads/{id}/artifacts) | GET /{path} - serve artifacts; ?download=true for file download |
| Suggestions (/api/threads/{id}/suggestions) | POST / - generate follow-up questions; rich list/block model content is normalized before JSON parsing |
Proxied through nginx: /api/langgraph/* → LangGraph, all other /api/* → Gateway.
Sandbox System (packages/harness/deerflow/sandbox/)
Interface: Abstract Sandbox with execute_command, read_file, write_file, list_dir
Provider Pattern: SandboxProvider with acquire, get, release lifecycle
Implementations:
- LocalSandboxProvider - Singleton local filesystem execution with path mappings
- AioSandboxProvider (packages/harness/deerflow/community/) - Docker-based isolation
Virtual Path System:
- Agent sees: /mnt/user-data/{workspace,uploads,outputs}, /mnt/skills
- Physical: backend/.deer-flow/threads/{thread_id}/user-data/..., deer-flow/skills/
- Translation: replace_virtual_path() / replace_virtual_paths_in_command()
- Detection: is_local_sandbox() checks sandbox_id == "local"
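The virtual-to-physical mapping can be pictured with the sketch below. It is a hypothetical helper (translate_virtual_path), not the signature or body of replace_virtual_path(), and it only covers the /mnt/user-data prefix; the thread id t1 is made up.

```python
from pathlib import PurePosixPath

VIRTUAL_ROOT = "/mnt/user-data"


def translate_virtual_path(path: str, thread_root: str) -> str:
    """Map a path under /mnt/user-data to the thread's physical user-data directory."""
    if path == VIRTUAL_ROOT or path.startswith(VIRTUAL_ROOT + "/"):
        relative = path[len(VIRTUAL_ROOT):].lstrip("/")
        return str(PurePosixPath(thread_root) / relative)
    # Paths outside the virtual root (including look-alike prefixes) are left untouched.
    return path


thread_root = "backend/.deer-flow/threads/t1/user-data"  # hypothetical thread id "t1"
print(translate_virtual_path("/mnt/user-data/outputs/report.md", thread_root))
```

Matching on the prefix plus a trailing slash avoids rewriting unrelated paths such as /mnt/user-database.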
Sandbox Tools (in packages/harness/deerflow/sandbox/tools.py):
- bash - Execute commands with path translation and error handling
- ls - Directory listing (tree format, max 2 levels)
- read_file - Read file contents with optional line range
- write_file - Write/append to files, creates directories
- str_replace - Substring replacement (single or all occurrences)
Subagent System (packages/harness/deerflow/subagents/)
Built-in Agents: general-purpose (all tools except task) and bash (command specialist)
Execution: Dual thread pool - _scheduler_pool (3 workers) + _execution_pool (3 workers)
Concurrency: MAX_CONCURRENT_SUBAGENTS = 3 enforced by SubagentLimitMiddleware (truncates excess tool calls in after_model), 15-minute timeout
Flow: task() tool → SubagentExecutor → background thread → poll 5s → SSE events → result
Events: task_started, task_running, task_completed/task_failed/task_timed_out
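The truncation step can be sketched as below, assuming tool calls are dicts with a "name" key (the shape LangChain uses for AIMessage.tool_calls). The function truncate_task_calls is hypothetical and only mirrors the rule stated above: keep the first MAX_CONCURRENT_SUBAGENTS task calls and leave other tool calls alone.

```python
MAX_CONCURRENT_SUBAGENTS = 3


def truncate_task_calls(tool_calls: list[dict], limit: int = MAX_CONCURRENT_SUBAGENTS) -> list[dict]:
    """Drop `task` tool calls beyond `limit`, preserving order and all non-task calls."""
    kept: list[dict] = []
    task_count = 0
    for call in tool_calls:
        if call.get("name") == "task":
            task_count += 1
            if task_count > limit:
                continue  # excess delegation request is discarded
        kept.append(call)
    return kept


calls = [
    {"name": "task", "id": "a"},
    {"name": "bash", "id": "b"},
    {"name": "task", "id": "c"},
    {"name": "task", "id": "d"},
    {"name": "task", "id": "e"},
]
print([call["id"] for call in truncate_task_calls(calls)])
```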
Tool System (packages/harness/deerflow/tools/)
get_available_tools(groups, include_mcp, model_name, subagent_enabled) assembles:
- Config-defined tools - Resolved from config.yaml via resolve_variable()
- MCP tools - From enabled MCP servers (lazy initialized, cached with mtime invalidation)
- Built-in tools:
  - present_files - Make output files visible to user (only /mnt/user-data/outputs)
  - ask_clarification - Request clarification (intercepted by ClarificationMiddleware → interrupts)
  - view_image - Read image as base64 (added only if model supports vision)
- Subagent tool (if enabled):
  - task - Delegate to subagent (description, prompt, subagent_type, max_turns)
Community tools (packages/harness/deerflow/community/):
- tavily/ - Web search (5 results default) and web fetch (4KB limit)
- jina_ai/ - Web fetch via Jina reader API with readability extraction
- firecrawl/ - Web scraping via Firecrawl API
- image_search/ - Image search via DuckDuckGo
MCP System (packages/harness/deerflow/mcp/)
- Uses langchain-mcp-adapters MultiServerMCPClient for multi-server management
- Lazy initialization: Tools loaded on first use via get_cached_mcp_tools()
- Cache invalidation: Detects config file changes via mtime comparison
- Transports: stdio (command-based), SSE, HTTP
- OAuth (HTTP/SSE): Supports token endpoint flows (client_credentials, refresh_token) with automatic token refresh + Authorization header injection
- Runtime updates: Gateway API saves to extensions_config.json; LangGraph detects via mtime
Skills System (packages/harness/deerflow/skills/)
- Location: deer-flow/skills/{public,custom}/
- Format: Directory with SKILL.md (YAML frontmatter: name, description, license, allowed-tools)
- Loading: load_skills() recursively scans skills/{public,custom} for SKILL.md, parses metadata, and reads enabled state from extensions_config.json
- Injection: Enabled skills listed in agent system prompt with container paths
- Installation: POST /api/skills/install extracts .skill ZIP archive to custom/ directory
Model Factory (packages/harness/deerflow/models/factory.py)
- create_chat_model(name, thinking_enabled) instantiates LLM from config via reflection
- Supports thinking_enabled flag with per-model when_thinking_enabled overrides
- Supports supports_vision flag for image understanding models
- Config values starting with $ resolved as environment variables
- Missing provider modules surface actionable install hints from reflection resolvers (for example uv add langchain-google-genai)
IM Channels System (app/channels/)
Bridges external messaging platforms (Feishu, Slack, Telegram) to the DeerFlow agent via the LangGraph Server.
Architecture: Channels communicate with the LangGraph Server through langgraph-sdk HTTP client (same as the frontend), ensuring threads are created and managed server-side.
Components:
- message_bus.py - Async pub/sub hub (InboundMessage → queue → dispatcher; OutboundMessage → callbacks → channels)
- store.py - JSON-file persistence mapping channel_name:chat_id[:topic_id] → thread_id (keys are channel:chat for root conversations and channel:chat:topic for threaded conversations)
- manager.py - Core dispatcher: creates threads via client.threads.create(), routes commands, keeps Slack/Telegram on client.runs.wait(), and uses client.runs.stream(["messages-tuple", "values"]) for Feishu incremental outbound updates
- base.py - Abstract Channel base class (start/stop/send lifecycle)
- service.py - Manages lifecycle of all configured channels from config.yaml
- slack.py / feishu.py / telegram.py - Platform-specific implementations (feishu.py tracks the running card message_id in memory and patches the same card in place)
Message Flow:
- External platform -> Channel impl -> MessageBus.publish_inbound()
- ChannelManager._dispatch_loop() consumes from queue
- For chat: look up/create thread on LangGraph Server
- Feishu chat: runs.stream() → accumulate AI text → publish multiple outbound updates (is_final=False) → publish final outbound (is_final=True)
- Slack/Telegram chat: runs.wait() → extract final response → publish outbound
- Feishu channel sends one running reply card up front, then patches the same card for each outbound update (card JSON sets config.update_multi=true for Feishu's patch API requirement)
- For commands (/new, /status, /models, /memory, /help): handle locally or query Gateway API
- Outbound → channel callbacks → platform reply
Configuration (config.yaml -> channels):
- langgraph_url - LangGraph Server URL (default: http://localhost:2024)
- gateway_url - Gateway API URL for auxiliary commands (default: http://localhost:8001)
- Per-channel configs: feishu (app_id, app_secret), slack (bot_token, app_token), telegram (bot_token)
Memory System (packages/harness/deerflow/agents/memory/)
Components:
- updater.py - LLM-based memory updates with fact extraction, whitespace-normalized fact deduplication (trims leading/trailing whitespace before comparing), and atomic file I/O
- queue.py - Debounced update queue (per-thread deduplication, configurable wait time)
- prompt.py - Prompt templates for memory updates
Data Structure (stored in backend/.deer-flow/memory.json):
- User Context: workContext, personalContext, topOfMind (1-3 sentence summaries)
- History: recentMonths, earlierContext, longTermBackground
- Facts: Discrete facts with id, content, category (preference/knowledge/context/behavior/goal), confidence (0-1), createdAt, source
Workflow:
- MemoryMiddleware filters messages (user inputs + final AI responses) and queues conversation
- Queue debounces (30s default), batches updates, deduplicates per-thread
- Background thread invokes LLM to extract context updates and facts
- Applies updates atomically (temp file + rename) with cache invalidation, skipping duplicate fact content before append
- Next interaction injects top 15 facts + context into <memory> tags in system prompt
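Two steps in the workflow above, the temp-file-plus-rename write and the whitespace-trimmed duplicate check, can be sketched with the standard library alone. The helpers write_json_atomically and append_new_facts are hypothetical names, not the functions in updater.py, and the fact dicts carry only a content field for brevity.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(path: Path, data: dict) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, ensure_ascii=False, indent=2)
        os.replace(tmp_name, path)  # readers see either the old or the new file
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def append_new_facts(existing: list[dict], candidates: list[dict]) -> list[dict]:
    """Append candidates whose trimmed content is not already present."""
    seen = {fact["content"].strip() for fact in existing}
    merged = list(existing)
    for fact in candidates:
        key = fact["content"].strip()
        if key and key not in seen:
            seen.add(key)
            merged.append(fact)
    return merged
```

Creating the temp file in the target's own directory keeps the rename on one filesystem, which os.replace needs in order to swap the file in a single step.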
Focused regression coverage for the updater lives in backend/tests/test_memory_updater.py.
Configuration (config.yaml → memory):
- enabled / injection_enabled - Master switches
- storage_path - Path to memory.json
- debounce_seconds - Wait time before processing (default: 30)
- model_name - LLM for updates (null = default model)
- max_facts / fact_confidence_threshold - Fact storage limits (100 / 0.7)
- max_injection_tokens - Token limit for prompt injection (2000)
Reflection System (packages/harness/deerflow/reflection/)
- resolve_variable(path) - Import module and return variable (e.g., module.path:variable_name)
- resolve_class(path, base_class) - Import and validate class against base class
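The module.path:variable_name convention can be illustrated with importlib. The sketch below uses the hypothetical names load_variable and load_class to make clear that it is a standalone illustration; the real resolvers may differ in error handling and in the install hints they surface.

```python
from importlib import import_module
from typing import Any


def load_variable(path: str) -> Any:
    """Import the module before ":" and return the attribute named after it."""
    module_path, separator, variable_name = path.partition(":")
    if not separator or not module_path or not variable_name:
        raise ValueError(f"Expected 'module.path:variable_name', got {path!r}")
    module = import_module(module_path)
    try:
        return getattr(module, variable_name)
    except AttributeError as exc:
        raise ImportError(f"{module_path} has no attribute {variable_name}") from exc


def load_class(path: str, base_class: type) -> type:
    """Resolve a class and check that it is a subclass of `base_class`."""
    resolved = load_variable(path)
    if not isinstance(resolved, type) or not issubclass(resolved, base_class):
        raise TypeError(f"{path} is not a subclass of {base_class.__name__}")
    return resolved


# Standard-library targets are used here so the example runs anywhere.
print(load_variable("math:pi"))
print(load_class("collections:OrderedDict", dict).__name__)
```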
Config Schema
config.yaml key sections:
- models[] - LLM configs with use class path, supports_thinking, supports_vision, provider-specific fields
- tools[] - Tool configs with use variable path and group
- tool_groups[] - Logical groupings for tools
- sandbox.use - Sandbox provider class path
- skills.path / skills.container_path - Host and container paths to skills directory
- title - Auto-title generation (enabled, max_words, max_chars, prompt_template)
- summarization - Context summarization (enabled, trigger conditions, keep policy)
- subagents.enabled - Master switch for subagent delegation
- memory - Memory system (enabled, storage_path, debounce_seconds, model_name, max_facts, fact_confidence_threshold, injection_enabled, max_injection_tokens)
extensions_config.json:
- mcpServers - Map of server name → config (enabled, type, command, args, env, url, headers, oauth, description)
- skills - Map of skill name → state (enabled)
Both can be modified at runtime via Gateway API endpoints or DeerFlowClient methods.
Embedded Client (packages/harness/deerflow/client.py)
DeerFlowClient provides direct in-process access to all DeerFlow capabilities without HTTP services. All return types align with the Gateway API response schemas, so consumer code works identically in HTTP and embedded modes.
Architecture: Imports the same deerflow modules that LangGraph Server and Gateway API use. Shares the same config files and data directories. No FastAPI dependency.
Agent Conversation (replaces LangGraph Server):
- chat(message, thread_id) - synchronous, returns final text
- stream(message, thread_id) - yields StreamEvent aligned with LangGraph SSE protocol:
  - "values" - full state snapshot (title, messages, artifacts)
  - "messages-tuple" - per-message update (AI text, tool calls, tool results)
  - "end" - stream finished
- Agent created lazily via create_agent() + _build_middlewares(), same as make_lead_agent
- Supports checkpointer parameter for state persistence across turns
- reset_agent() forces agent recreation (e.g. after memory or skill changes)
Gateway Equivalent Methods (replaces Gateway API):
| Category | Methods | Return format |
|---|---|---|
| Models | list_models(), get_model(name) | {"models": [...]}, {name, display_name, ...} |
| MCP | get_mcp_config(), update_mcp_config(servers) | {"mcp_servers": {...}} |
| Skills | list_skills(), get_skill(name), update_skill(name, enabled), install_skill(path) | {"skills": [...]} |
| Memory | get_memory(), reload_memory(), get_memory_config(), get_memory_status() | dict |
| Uploads | upload_files(thread_id, files), list_uploads(thread_id), delete_upload(thread_id, filename) | {"success": true, "files": [...]}, {"files": [...], "count": N} |
| Artifacts | get_artifact(thread_id, path) → (bytes, mime_type) | tuple |
Key difference from Gateway: Upload accepts local Path objects instead of HTTP UploadFile, rejects directory paths before copying, and reuses a single worker when document conversion must run inside an active event loop. Artifact returns (bytes, mime_type) instead of HTTP Response. update_mcp_config() and update_skill() automatically invalidate the cached agent.
Tests: tests/test_client.py (77 unit tests including TestGatewayConformance), tests/test_client_live.py (live integration tests, requires config.yaml)
Gateway Conformance Tests (TestGatewayConformance): Validate that every dict-returning client method conforms to the corresponding Gateway Pydantic response model. Each test parses the client output through the Gateway model — if Gateway adds a required field that the client doesn't provide, Pydantic raises ValidationError and CI catches the drift. Covers: ModelsListResponse, ModelResponse, SkillsListResponse, SkillResponse, SkillInstallResponse, McpConfigResponse, UploadResponse, MemoryConfigResponse, MemoryStatusResponse.
Development Workflow
Test-Driven Development (TDD) — MANDATORY
Every new feature or bug fix MUST be accompanied by unit tests. No exceptions.
- Write tests in backend/tests/ following the existing naming convention test_<feature>.py
- Run the full suite before and after your change: make test
- Tests must pass before a feature is considered complete
- For lightweight config/utility modules, prefer pure unit tests with no external dependencies
- If a module causes circular import issues in tests, add a sys.modules mock in tests/conftest.py (see existing example for deerflow.subagents.executor)
# Run all tests
make test
# Run a specific test file
PYTHONPATH=. uv run pytest tests/test_<feature>.py -v
Running the Full Application
From the project root directory:
make dev
This starts all services and makes the application available at http://localhost:2026.
Nginx routing:
- /api/langgraph/* → LangGraph Server (2024)
- /api/* (other) → Gateway API (8001)
- / (non-API) → Frontend (3000)
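The routing precedence can be expressed as a small function for reference. This is only an illustration of the rules listed above, not the nginx configuration itself; upstream_port is a hypothetical helper.

```python
def upstream_port(path: str) -> int:
    """Return the backend port a request path is proxied to, most specific prefix first."""
    if path.startswith("/api/langgraph/"):
        return 2024  # LangGraph Server
    if path.startswith("/api/"):
        return 8001  # Gateway API
    return 3000  # Frontend


for sample in ("/api/langgraph/threads", "/api/models", "/workspace"):
    print(sample, upstream_port(sample))
```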
Running Backend Services Separately
From the backend directory:
# Terminal 1: LangGraph server
make dev
# Terminal 2: Gateway API
make gateway
Direct access (without nginx):
- LangGraph: http://localhost:2024
- Gateway: http://localhost:8001
Frontend Configuration
The frontend uses environment variables to connect to backend services:
- NEXT_PUBLIC_LANGGRAPH_BASE_URL - Defaults to /api/langgraph (through nginx)
- NEXT_PUBLIC_BACKEND_BASE_URL - Defaults to empty string (through nginx)
When using make dev from root, the frontend automatically connects through nginx.
Key Features
File Upload
Multi-file upload with automatic document conversion:
- Endpoint: POST /api/threads/{thread_id}/uploads
- Supports: PDF, PPT, Excel, Word documents (converted via markitdown)
- Rejects directory inputs before copying so uploads stay all-or-nothing
- Reuses one conversion worker per request when called from an active event loop
- Files stored in thread-isolated directories
- Agent receives uploaded file list via UploadsMiddleware
See docs/FILE_UPLOAD.md for details.
Plan Mode
TodoList middleware for complex multi-step tasks:
- Controlled via runtime config: config.configurable.is_plan_mode = True
- Provides write_todos tool for task tracking
- One task in_progress at a time, real-time updates
See docs/plan_mode_usage.md for details.
Context Summarization
Automatic conversation summarization when approaching token limits:
- Configured in config.yaml under summarization key
- Trigger types: tokens, messages, or fraction of max input
- Keeps recent messages while summarizing older ones
See docs/summarization.md for details.
Vision Support
For models with supports_vision: true:
- ViewImageMiddleware processes images in conversation
- view_image_tool added to agent's toolset
- Images automatically converted to base64 and injected into state
Code Style
- Uses ruff for linting and formatting
- Line length: 240 characters
- Python 3.12+ with type hints
- Double quotes, space indentation
Documentation
See docs/ directory for detailed documentation:
- CONFIGURATION.md - Configuration options
- ARCHITECTURE.md - Architecture details
- API.md - API reference
- SETUP.md - Setup guide
- FILE_UPLOAD.md - File upload feature
- PATH_EXAMPLES.md - Path types and usage
- summarization.md - Context summarization
- plan_mode_usage.md - Plan mode with TodoList