diff --git a/backend/CLAUDE.md b/backend/CLAUDE.md index b46a034..1d48aad 100644 --- a/backend/CLAUDE.md +++ b/backend/CLAUDE.md @@ -8,7 +8,7 @@ DeerFlow is a LangGraph-based AI agent system with a full-stack architecture. Th **Architecture**: - **LangGraph Server** (port 2024): Agent runtime and workflow execution -- **Gateway API** (port 8001): REST API for models, MCP, skills, and artifacts +- **Gateway API** (port 8001): REST API for models, MCP, skills, artifacts, and uploads - **Frontend** (port 3000): Next.js web interface - **Nginx** (port 2026): Unified reverse proxy entry point @@ -27,7 +27,12 @@ deer-flow/ │ │ ├── sandbox/ # Sandbox execution system │ │ ├── tools/ # Agent tools │ │ ├── mcp/ # MCP integration -│ │ └── skills/ # Skills loading and management +│ │ ├── models/ # Model factory +│ │ ├── skills/ # Skills loading and management +│ │ ├── config/ # Configuration system +│ │ ├── community/ # Community tools (web search, etc.) +│ │ ├── reflection/ # Dynamic module loading +│ │ └── utils/ # Utilities │ └── langgraph.json # LangGraph server configuration ├── frontend/ # Next.js frontend application └── skills/ # Agent skills directory @@ -74,9 +79,11 @@ make format ### Configuration System -The app uses a YAML-based configuration system loaded from `config.yaml`. +The app uses a two-tier YAML/JSON-based configuration system. -**Setup**: Copy `config.example.yaml` to `config.yaml` in the **project root** directory and customize for your environment. +**Main Configuration** (`config.yaml`): + +Setup: Copy `config.example.yaml` to `config.yaml` in the **project root** directory. ```bash # From project root (deer-flow/) @@ -91,96 +98,10 @@ Configuration priority: Config values starting with `$` are resolved as environment variables (e.g., `$OPENAI_API_KEY`). 
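As a rough illustration of the `$`-prefix rule above, the sketch below resolves such values recursively against the process environment. The helper name `resolve_env_refs` and the decision to raise on a missing variable are assumptions made for this example; the actual DeerFlow loader may behave differently.

```python
import os
from typing import Any


def resolve_env_refs(value: Any) -> Any:
    """Recursively replace "$NAME" strings with os.environ["NAME"].

    Hypothetical helper: the name and the error handling are illustrative only.
    """
    if isinstance(value, str) and value.startswith("$"):
        name = value[1:]
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    if isinstance(value, dict):
        # Resolve every value in a mapping, keeping the keys unchanged
        return {key: resolve_env_refs(item) for key, item in value.items()}
    if isinstance(value, list):
        # Resolve every element of a list in order
        return [resolve_env_refs(item) for item in value]
    # Numbers, booleans, None, and plain strings pass through untouched
    return value
```

Applied to a parsed `config.yaml`, a call such as `resolve_env_refs(config)` would leave ordinary values alone and substitute only strings that begin with `$`.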
-### Core Components - -**Gateway API** (`src/gateway/`) -- FastAPI application that provides REST endpoints for frontend integration -- Endpoints: - - `/api/models` - List available LLM models from configuration - - `/api/mcp` - Manage MCP server configurations (GET, POST) - - `/api/skills` - Manage skill configurations (GET, POST) - - `/api/threads/{thread_id}/artifacts/*` - Serve agent-generated artifacts (files, images, etc.) -- Works alongside LangGraph server, handling non-agent HTTP operations -- Proxied through nginx under `/api/*` routes (except `/api/langgraph/*`) - -**Agent Graph** (`src/agents/`) -- `lead_agent` is the main entry point registered in `langgraph.json` -- Uses `ThreadState` which extends `AgentState` with sandbox state -- Agent is created via `create_agent()` with model, tools, middleware, and system prompt - -**Sandbox System** (`src/sandbox/`) -- Abstract `Sandbox` base class defines interface: `execute_command`, `read_file`, `write_file`, `list_dir` -- `SandboxProvider` manages sandbox lifecycle: `acquire`, `get`, `release` -- `SandboxMiddleware` automatically acquires sandbox on agent start and injects into state -- `LocalSandboxProvider` is a singleton implementation for local execution -- Sandbox tools (`bash`, `ls`, `read_file`, `write_file`, `str_replace`) extract sandbox from tool runtime - -**Model Factory** (`src/models/`) -- `create_chat_model()` instantiates LLM from config using reflection -- Supports `thinking_enabled` flag with per-model `when_thinking_enabled` overrides - -**Tool System** (`src/tools/`) -- Tools defined in config with `use` path (e.g., `src.sandbox.tools:bash_tool`) -- `get_available_tools()` resolves tool paths via reflection -- Community tools in `src/community/`: Jina AI (web fetch), Tavily (web search) -- Supports MCP (Model Context Protocol) for pluggable external tools - -**MCP System** (`src/mcp/`) -- Integrates with MCP servers to provide pluggable external tools using `langchain-mcp-adapters` -- 
Uses `MultiServerMCPClient` from langchain-mcp-adapters for multi-server management -- **Automatic initialization**: Tools are loaded on first use with lazy initialization -- Supports both eager loading (FastAPI startup) and lazy loading (LangGraph Studio) -- `initialize_mcp_tools()` can be called in FastAPI lifespan handler for eager loading -- `get_cached_mcp_tools()` automatically initializes tools if not already loaded -- Works seamlessly in both FastAPI server and LangGraph Studio environments -- Each server can be enabled/disabled independently via `enabled` flag -- Popular MCP servers: filesystem, postgres, github, brave-search, puppeteer -- Built on top of langchain-ai/langchain-mcp-adapters for seamless integration - -**Reflection System** (`src/reflection/`) -- `resolve_variable()` imports module and returns variable (e.g., `module:variable`) -- `resolve_class()` imports and validates class against base class - -**Skills System** (`src/skills/`) -- Skills provide specialized workflows for specific tasks (e.g., PDF processing, frontend design) -- Located in `deer-flow/skills/{public,custom}` directory structure -- Each skill has a `SKILL.md` file with YAML front matter (name, description, license) -- Skills are automatically discovered and loaded at runtime -- `load_skills()` scans directories and parses SKILL.md files -- Skills are injected into agent's system prompt with paths (only enabled skills) -- Path mapping system allows seamless access in both local and Docker sandbox: - - Local sandbox: `/mnt/skills` → `/path/to/deer-flow/skills` - - Docker sandbox: Automatically mounted as volume -- Each skill can be enabled/disabled independently via `enabled` flag in extensions config - -**Middleware System** -- Custom middlewares in `src/agents/middlewares/`: Title generation, thread data, clarification, etc. 
-- `SummarizationMiddleware` from LangChain automatically condenses conversation history when token limits are approached -- Configured in `config.yaml` under `summarization` key with trigger/keep thresholds -- Middlewares are registered in `src/agents/lead_agent/agent.py` with execution order: - 1. `ThreadDataMiddleware` - Initializes thread context - 2. `SandboxMiddleware` - Manages sandbox lifecycle - 3. `SummarizationMiddleware` - Reduces context when limits are approached (if enabled) - 4. `TitleMiddleware` - Generates conversation titles - 5. `ClarificationMiddleware` - Handles clarification requests (must be last) - -### Config Schema - -Models, tools, sandbox providers, skills, and middleware settings are configured in `config.yaml`: -- `models[]`: LLM configurations with `use` class path -- `tools[]`: Tool configurations with `use` variable path and `group` -- `sandbox.use`: Sandbox provider class path -- `skills.path`: Host path to skills directory (optional, default: `../skills`) -- `skills.container_path`: Container mount path (default: `/mnt/skills`) -- `title`: Automatic thread title generation configuration -- `summarization`: Automatic conversation summarization configuration - -**Extensions Configuration** (`extensions_config.json`) +**Extensions Configuration** (`extensions_config.json`): MCP servers and skills are configured together in `extensions_config.json` in project root: -**Setup**: Copy `extensions_config.example.json` to `extensions_config.json` in the **project root** directory. - ```bash # From project root (deer-flow/) cp extensions_config.example.json extensions_config.json @@ -193,12 +114,115 @@ Configuration priority: 4. `extensions_config.json` in parent directory (project root - **recommended location**) 5. 
For backward compatibility: `mcp_config.json` (will be deprecated) -Structure: +### Core Components + +**Gateway API** (`src/gateway/`) +- FastAPI application that provides REST endpoints for frontend integration +- Endpoints: + - `/api/models` - List available LLM models from configuration + - `/api/mcp` - Manage MCP server configurations (GET, POST) + - `/api/skills` - Manage skill configurations (GET, POST) + - `/api/threads/{thread_id}/artifacts/*` - Serve agent-generated artifacts + - `/api/threads/{thread_id}/uploads` - File upload, list, delete +- Works alongside LangGraph server, handling non-agent HTTP operations +- Proxied through nginx under `/api/*` routes (except `/api/langgraph/*`) + +**Agent Graph** (`src/agents/`) +- `lead_agent` is the main entry point registered in `langgraph.json` +- Uses `ThreadState` which extends `AgentState` with: + - `sandbox`: Sandbox environment info + - `artifacts`: Generated file paths + - `thread_data`: Workspace/uploads/outputs paths + - `title`: Auto-generated conversation title + - `todos`: Task tracking (plan mode) + - `viewed_images`: Vision model image data +- Agent is created via `make_lead_agent(config)` with model, tools, middleware, and system prompt + +**Sandbox System** (`src/sandbox/`) +- Abstract `Sandbox` base class defines interface: `execute_command`, `read_file`, `write_file`, `list_dir` +- `SandboxProvider` manages sandbox lifecycle: `acquire`, `get`, `release` +- `SandboxMiddleware` automatically acquires sandbox on agent start and injects into state +- `LocalSandboxProvider` is a singleton implementation for local execution +- `AioSandboxProvider` provides Docker-based isolation (in `src/community/`) +- Sandbox tools (`bash`, `ls`, `read_file`, `write_file`, `str_replace`) extract sandbox from tool runtime + +**Virtual Path System**: +- Paths map between virtual and physical locations +- Virtual: `/mnt/user-data/{workspace,uploads,outputs}` - used by agent +- Physical: 
`backend/.deer-flow/threads/{thread_id}/user-data/{workspace,uploads,outputs}`
+- Skills path: `/mnt/skills` maps to `deer-flow/skills/`
+
+**Model Factory** (`src/models/factory.py`)
+- `create_chat_model()` instantiates LLM from config using reflection
+- Supports `thinking_enabled` flag with per-model `when_thinking_enabled` overrides
+- Supports `supports_vision` flag for image understanding models
+
+**Tool System** (`src/tools/`)
+- Tools defined in config with `use` path (e.g., `src.sandbox.tools:bash_tool`)
+- `get_available_tools()` resolves tool paths via reflection
+- Built-in tools in `src/tools/builtins/`:
+  - `present_file_tool` - Display files to users
+  - `ask_clarification_tool` - Request clarification
+  - `view_image_tool` - Vision model integration (conditional on model capability)
+- Community tools in `src/community/`: Jina AI (web fetch), Tavily (web search), Firecrawl (scraping)
+- Supports MCP (Model Context Protocol) for pluggable external tools
+
+**MCP System** (`src/mcp/`)
+- Integrates with MCP servers to provide pluggable external tools using `langchain-mcp-adapters`
+- Uses `MultiServerMCPClient` from langchain-mcp-adapters for multi-server management
+- **Automatic initialization**: Tools are loaded on first use with lazy initialization
+- Supports both eager loading (FastAPI startup) and lazy loading (LangGraph Studio)
+- `initialize_mcp_tools()` can be called in FastAPI lifespan handler for eager loading
+- `get_cached_mcp_tools()` automatically initializes tools if not already loaded
+- Each server can be enabled/disabled independently via `enabled` flag
+- Supported transport types: stdio (command-based), SSE, HTTP
+- Built on top of langchain-ai/langchain-mcp-adapters for seamless integration
+
+**Reflection System** (`src/reflection/`)
+- `resolve_variable()` imports module and returns variable (e.g., `module:variable`)
+- `resolve_class()` imports and validates class against base class
+
+**Skills System** (`src/skills/`)
+- Skills 
provide specialized workflows for specific tasks (e.g., PDF processing, frontend design) +- Located in `deer-flow/skills/{public,custom}` directory structure +- Each skill has a `SKILL.md` file with YAML front matter (name, description, license, allowed-tools) +- Skills are automatically discovered and loaded at runtime +- `load_skills()` scans directories and parses SKILL.md files +- Skills are injected into agent's system prompt with paths (only enabled skills) +- Path mapping system allows seamless access in both local and Docker sandbox +- Each skill can be enabled/disabled independently via `enabled` flag in extensions config + +**Middleware System** (`src/agents/middlewares/`) +- Custom middlewares handle cross-cutting concerns +- Middlewares are registered in `src/agents/lead_agent/agent.py` with execution order: + 1. `ThreadDataMiddleware` - Initializes thread context (workspace, uploads, outputs paths) + 2. `UploadsMiddleware` - Processes uploaded files, injects file list into state + 3. `SandboxMiddleware` - Manages sandbox lifecycle, acquires on start + 4. `SummarizationMiddleware` - Reduces context when token limits approached (if enabled) + 5. `TitleMiddleware` - Generates conversation titles + 6. `TodoListMiddleware` - Tracks multi-step tasks (if plan_mode enabled) + 7. `ViewImageMiddleware` - Injects image details for vision models + 8. 
`ClarificationMiddleware` - Handles clarification requests (must be last) + +### Config Schema + +Models, tools, sandbox providers, skills, and middleware settings are configured in `config.yaml`: +- `models[]`: LLM configurations with `use` class path, `supports_thinking`, `supports_vision` +- `tools[]`: Tool configurations with `use` variable path and `group` +- `tool_groups[]`: Logical groupings for tools +- `sandbox.use`: Sandbox provider class path +- `skills.path`: Host path to skills directory (optional, default: `../skills`) +- `skills.container_path`: Container mount path (default: `/mnt/skills`) +- `title`: Automatic thread title generation configuration +- `summarization`: Automatic conversation summarization configuration + +**Extensions Configuration Schema** (`extensions_config.json`): - `mcpServers`: Map of MCP server name to configuration - `enabled`: Whether the server is enabled (boolean) - - `command`: Command to execute to start the server (e.g., "npx", "python") + - `type`: Transport type (`stdio`, `sse`, `http`) + - `command`: Command to execute (for stdio type) - `args`: Arguments to pass to the command (array) - - `env`: Environment variables (object with `$VAR` support for env variable resolution) + - `env`: Environment variables (object with `$VAR` support) - `description`: Human-readable description - `skills`: Map of skill name to state configuration - `enabled`: Whether the skill is enabled (boolean, default: true if not specified) @@ -218,7 +242,7 @@ This starts all services and makes the application available at `http://localhos **Nginx routing**: - `/api/langgraph/*` → LangGraph Server (2024) - Agent interactions, threads, streaming -- `/api/*` (other) → Gateway API (8001) - Models, MCP, skills, artifacts +- `/api/*` (other) → Gateway API (8001) - Models, MCP, skills, artifacts, uploads - `/` (non-API) → Frontend (3000) - Web interface ### Running Backend Services Separately @@ -245,9 +269,57 @@ The frontend uses environment 
variables to connect to backend services: When using `make dev` from root, the frontend automatically connects through nginx. +## Key Features + +### File Upload + +The backend supports multi-file upload with automatic document conversion: +- Endpoint: `POST /api/threads/{thread_id}/uploads` +- Supports: PDF, PPT, Excel, Word documents +- Auto-converts documents to Markdown using `markitdown` +- Files stored in thread-isolated directories +- Agent automatically receives uploaded file list via `UploadsMiddleware` + +See [docs/FILE_UPLOAD.md](docs/FILE_UPLOAD.md) for details. + +### Plan Mode + +Enable TodoList middleware for complex multi-step tasks: +- Controlled via runtime config: `config.configurable.is_plan_mode = True` +- Provides `write_todos` tool for task tracking +- Agent can break down complex tasks and track progress + +See [docs/plan_mode_usage.md](docs/plan_mode_usage.md) for details. + +### Context Summarization + +Automatic conversation summarization when approaching token limits: +- Configured in `config.yaml` under `summarization` key +- Trigger types: tokens, messages, or fraction of max input +- Keeps recent messages while summarizing older ones + +See [docs/summarization.md](docs/summarization.md) for details. 
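The trigger types and the keep-recent behavior listed above can be pictured with a small sketch. This is not the LangChain `SummarizationMiddleware` implementation: the `Trigger` class, the function names, and the use of plain strings as messages are all hypothetical and serve only to show the decision logic.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    """Hypothetical trigger; kind is "tokens", "messages", or "fraction"."""

    kind: str
    value: float


def should_summarize(trigger: Trigger, token_count: int, message_count: int, max_input_tokens: int) -> bool:
    """Decide whether the history has grown enough to warrant a summary."""
    if trigger.kind == "tokens":
        return token_count >= trigger.value
    if trigger.kind == "messages":
        return message_count >= trigger.value
    if trigger.kind == "fraction":
        # value is a share of the model's maximum input size, e.g. 0.8
        return token_count >= trigger.value * max_input_tokens
    raise ValueError(f"unknown trigger kind: {trigger.kind}")


def split_history(messages: list[str], keep_last: int) -> tuple[list[str], list[str]]:
    """Return (older messages to summarize, recent messages kept verbatim)."""
    if keep_last <= 0:
        return list(messages), []
    return messages[:-keep_last], messages[-keep_last:]
```

Under these assumptions, a `fraction` trigger of `0.8` on a model with a 1000-token input limit would fire once the history reaches 800 tokens, and only the older slice returned by `split_history` would be condensed.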
+ +### Vision Support + +For models with `supports_vision: true`: +- `ViewImageMiddleware` processes images in conversation +- `view_image_tool` added to agent's toolset +- Images automatically converted and injected into state + ## Code Style - Uses `ruff` for linting and formatting - Line length: 240 characters - Python 3.12+ with type hints - Double quotes, space indentation + +## Documentation + +See `docs/` directory for detailed documentation: +- [CONFIGURATION.md](docs/CONFIGURATION.md) - Configuration options +- [SETUP.md](docs/SETUP.md) - Setup guide +- [FILE_UPLOAD.md](docs/FILE_UPLOAD.md) - File upload feature +- [PATH_EXAMPLES.md](docs/PATH_EXAMPLES.md) - Path types and usage +- [summarization.md](docs/summarization.md) - Context summarization +- [plan_mode_usage.md](docs/plan_mode_usage.md) - Plan mode with TodoList diff --git a/backend/CONTRIBUTING.md b/backend/CONTRIBUTING.md new file mode 100644 index 0000000..d5dfaa3 --- /dev/null +++ b/backend/CONTRIBUTING.md @@ -0,0 +1,427 @@ +# Contributing to DeerFlow Backend + +Thank you for your interest in contributing to DeerFlow! This document provides guidelines and instructions for contributing to the backend codebase. + +## Table of Contents + +- [Getting Started](#getting-started) +- [Development Setup](#development-setup) +- [Project Structure](#project-structure) +- [Code Style](#code-style) +- [Making Changes](#making-changes) +- [Testing](#testing) +- [Pull Request Process](#pull-request-process) +- [Architecture Guidelines](#architecture-guidelines) + +## Getting Started + +### Prerequisites + +- Python 3.12 or higher +- [uv](https://docs.astral.sh/uv/) package manager +- Git +- Docker (optional, for Docker sandbox testing) + +### Fork and Clone + +1. Fork the repository on GitHub +2. 
Clone your fork locally: + ```bash + git clone https://github.com/YOUR_USERNAME/deer-flow.git + cd deer-flow + ``` + +## Development Setup + +### Install Dependencies + +```bash +# From project root +cp config.example.yaml config.yaml +cp extensions_config.example.json extensions_config.json + +# Install backend dependencies +cd backend +make install +``` + +### Configure Environment + +Set up your API keys for testing: + +```bash +export OPENAI_API_KEY="your-api-key" +# Add other keys as needed +``` + +### Run the Development Server + +```bash +# Terminal 1: LangGraph server +make dev + +# Terminal 2: Gateway API +make gateway +``` + +## Project Structure + +``` +backend/src/ +├── agents/ # Agent system +│ ├── lead_agent/ # Main agent implementation +│ │ └── agent.py # Agent factory and creation +│ ├── middlewares/ # Agent middlewares +│ │ ├── thread_data_middleware.py +│ │ ├── sandbox_middleware.py +│ │ ├── title_middleware.py +│ │ ├── uploads_middleware.py +│ │ ├── view_image_middleware.py +│ │ └── clarification_middleware.py +│ └── thread_state.py # Thread state definition +│ +├── gateway/ # FastAPI Gateway +│ ├── app.py # FastAPI application +│ └── routers/ # Route handlers +│ ├── models.py # /api/models endpoints +│ ├── mcp.py # /api/mcp endpoints +│ ├── skills.py # /api/skills endpoints +│ ├── artifacts.py # /api/threads/.../artifacts +│ └── uploads.py # /api/threads/.../uploads +│ +├── sandbox/ # Sandbox execution +│ ├── __init__.py # Sandbox interface +│ ├── local.py # Local sandbox provider +│ └── tools.py # Sandbox tools (bash, file ops) +│ +├── tools/ # Agent tools +│ └── builtins/ # Built-in tools +│ ├── present_file_tool.py +│ ├── ask_clarification_tool.py +│ └── view_image_tool.py +│ +├── mcp/ # MCP integration +│ └── manager.py # MCP server management +│ +├── models/ # Model system +│ └── factory.py # Model factory +│ +├── skills/ # Skills system +│ └── loader.py # Skills loader +│ +├── config/ # Configuration +│ ├── app_config.py # Main app config 
+│ ├── extensions_config.py # Extensions config +│ └── summarization_config.py +│ +├── community/ # Community tools +│ ├── tavily/ # Tavily web search +│ ├── jina/ # Jina web fetch +│ ├── firecrawl/ # Firecrawl scraping +│ └── aio_sandbox/ # Docker sandbox +│ +├── reflection/ # Dynamic loading +│ └── __init__.py # Module resolution +│ +└── utils/ # Utilities + └── __init__.py +``` + +## Code Style + +### Linting and Formatting + +We use `ruff` for both linting and formatting: + +```bash +# Check for issues +make lint + +# Auto-fix and format +make format +``` + +### Style Guidelines + +- **Line length**: 240 characters maximum +- **Python version**: 3.12+ features allowed +- **Type hints**: Use type hints for function signatures +- **Quotes**: Double quotes for strings +- **Indentation**: 4 spaces (no tabs) +- **Imports**: Group by standard library, third-party, local + +### Docstrings + +Use docstrings for public functions and classes: + +```python +def create_chat_model(name: str, thinking_enabled: bool = False) -> BaseChatModel: + """Create a chat model instance from configuration. + + Args: + name: The model name as defined in config.yaml + thinking_enabled: Whether to enable extended thinking + + Returns: + A configured LangChain chat model instance + + Raises: + ValueError: If the model name is not found in configuration + """ + ... 
+``` + +## Making Changes + +### Branch Naming + +Use descriptive branch names: + +- `feature/add-new-tool` - New features +- `fix/sandbox-timeout` - Bug fixes +- `docs/update-readme` - Documentation +- `refactor/config-system` - Code refactoring + +### Commit Messages + +Write clear, concise commit messages: + +``` +feat: add support for Claude 3.5 model + +- Add model configuration in config.yaml +- Update model factory to handle Claude-specific settings +- Add tests for new model +``` + +Prefix types: +- `feat:` - New feature +- `fix:` - Bug fix +- `docs:` - Documentation +- `refactor:` - Code refactoring +- `test:` - Tests +- `chore:` - Build/config changes + +## Testing + +### Running Tests + +```bash +uv run pytest +``` + +### Writing Tests + +Place tests in the `tests/` directory mirroring the source structure: + +``` +tests/ +├── test_models/ +│ └── test_factory.py +├── test_sandbox/ +│ └── test_local.py +└── test_gateway/ + └── test_models_router.py +``` + +Example test: + +```python +import pytest +from src.models.factory import create_chat_model + +def test_create_chat_model_with_valid_name(): + """Test that a valid model name creates a model instance.""" + model = create_chat_model("gpt-4") + assert model is not None + +def test_create_chat_model_with_invalid_name(): + """Test that an invalid model name raises ValueError.""" + with pytest.raises(ValueError): + create_chat_model("nonexistent-model") +``` + +## Pull Request Process + +### Before Submitting + +1. **Ensure tests pass**: `uv run pytest` +2. **Run linter**: `make lint` +3. **Format code**: `make format` +4. **Update documentation** if needed + +### PR Description + +Include in your PR description: + +- **What**: Brief description of changes +- **Why**: Motivation for the change +- **How**: Implementation approach +- **Testing**: How you tested the changes + +### Review Process + +1. Submit PR with clear description +2. Address review feedback +3. Ensure CI passes +4. 
Maintainer will merge when approved
+
+## Architecture Guidelines
+
+### Adding New Tools
+
+1. Create tool in `src/tools/builtins/` or `src/community/`:
+
+```python
+# src/tools/builtins/my_tool.py
+from langchain_core.tools import tool
+
+@tool
+def my_tool(param: str) -> str:
+    """Tool description for the agent.
+
+    Args:
+        param: Description of the parameter
+
+    Returns:
+        Description of return value
+    """
+    return f"Result: {param}"
+```
+
+2. Register in `config.yaml`:
+
+```yaml
+tools:
+  - name: my_tool
+    group: my_group
+    use: src.tools.builtins.my_tool:my_tool
+```
+
+### Adding New Middleware
+
+1. Create middleware in `src/agents/middlewares/`:
+
+```python
+# src/agents/middlewares/my_middleware.py
+from langchain.agents.middleware import AgentMiddleware, AgentState
+from langgraph.runtime import Runtime
+
+class MyMiddleware(AgentMiddleware):
+    """Middleware description."""
+
+    def before_agent(self, state: AgentState, runtime: Runtime) -> dict | None:
+        """Run before agent execution and optionally update the state."""
+        # Return a dict of state updates, or None to leave the state unchanged
+        return None
+```
+
+2. Register in `src/agents/lead_agent/agent.py`:
+
+```python
+middlewares = [
+    ThreadDataMiddleware(),
+    SandboxMiddleware(),
+    MyMiddleware(),  # Add your middleware
+    TitleMiddleware(),
+    ClarificationMiddleware(),  # Must remain last
+]
+```
+
+### Adding New API Endpoints
+
+1. Create router in `src/gateway/routers/`:
+
+```python
+# src/gateway/routers/my_router.py
+from fastapi import APIRouter
+
+router = APIRouter(prefix="/my-endpoint", tags=["my-endpoint"])
+
+@router.get("/")
+async def get_items():
+    """Get all items."""
+    return {"items": []}
+
+@router.post("/")
+async def create_item(data: dict):
+    """Create a new item."""
+    return {"created": data}
+```
+
+2. Register in `src/gateway/app.py`:
+
+```python
+from src.gateway.routers import my_router
+
+app.include_router(my_router.router)
+```
+
+### Configuration Changes
+
+When adding new configuration options:
+
+1. 
Update `src/config/app_config.py` with new fields +2. Add default values in `config.example.yaml` +3. Document in `docs/CONFIGURATION.md` + +### MCP Server Integration + +To add support for a new MCP server: + +1. Add configuration in `extensions_config.json`: + +```json +{ + "mcpServers": { + "my-server": { + "enabled": true, + "type": "stdio", + "command": "npx", + "args": ["-y", "@my-org/mcp-server"], + "description": "My MCP Server" + } + } +} +``` + +2. Update `extensions_config.example.json` with the new server + +### Skills Development + +To create a new skill: + +1. Create directory in `skills/public/` or `skills/custom/`: + +``` +skills/public/my-skill/ +└── SKILL.md +``` + +2. Write `SKILL.md` with YAML front matter: + +```markdown +--- +name: My Skill +description: What this skill does +license: MIT +allowed-tools: + - read_file + - write_file + - bash +--- + +# My Skill + +Instructions for the agent when this skill is enabled... +``` + +## Questions? + +If you have questions about contributing: + +1. Check existing documentation in `docs/` +2. Look for similar issues or PRs on GitHub +3. Open a discussion or issue on GitHub + +Thank you for contributing to DeerFlow! diff --git a/backend/README.md b/backend/README.md new file mode 100644 index 0000000..4c8e01c --- /dev/null +++ b/backend/README.md @@ -0,0 +1,275 @@ +# DeerFlow Backend + +DeerFlow is a LangGraph-based AI agent system that provides a powerful "super agent" with sandbox execution capabilities. The backend enables AI agents to execute code, browse the web, manage files, and perform complex multi-step tasks in isolated environments. 
+ +## Features + +- **LangGraph Agent Runtime**: Built on LangGraph for robust multi-agent workflow orchestration +- **Sandbox Execution**: Safe code execution with local or Docker-based isolation +- **Multi-Model Support**: OpenAI, Anthropic Claude, DeepSeek, Doubao, Kimi, and custom LangChain-compatible models +- **MCP Integration**: Extensible tool ecosystem via Model Context Protocol +- **Skills System**: Specialized domain workflows injected into agent prompts +- **File Upload & Processing**: Multi-format document upload with automatic Markdown conversion +- **Context Summarization**: Automatic conversation summarization for long conversations +- **Plan Mode**: TodoList middleware for complex multi-step task tracking + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ Nginx (Port 2026) │ +│ Unified reverse proxy entry point │ +└─────────────────┬───────────────────────────────┬───────────────┘ + │ │ + ▼ ▼ +┌─────────────────────────────┐ ┌─────────────────────────────┐ +│ LangGraph Server (2024) │ │ Gateway API (8001) │ +│ Agent runtime & workflows │ │ Models, MCP, Skills, etc. 
│ +└─────────────────────────────┘ └─────────────────────────────┘ +``` + +**Request Routing**: +- `/api/langgraph/*` → LangGraph Server (agent interactions, threads, streaming) +- `/api/*` (other) → Gateway API (models, MCP, skills, artifacts, uploads) +- `/` (non-API) → Frontend (web interface) + +## Quick Start + +### Prerequisites + +- Python 3.12+ +- [uv](https://docs.astral.sh/uv/) package manager +- API keys for your chosen LLM provider + +### Installation + +```bash +# Clone the repository (if not already) +cd deer-flow + +# Copy configuration files +cp config.example.yaml config.yaml +cp extensions_config.example.json extensions_config.json + +# Install backend dependencies +cd backend +make install +``` + +### Configuration + +Edit `config.yaml` in the project root to configure your models and tools: + +```yaml +models: + - name: gpt-4 + display_name: GPT-4 + use: langchain_openai:ChatOpenAI + model: gpt-4 + api_key: $OPENAI_API_KEY # Set environment variable + max_tokens: 4096 +``` + +Set your API keys: + +```bash +export OPENAI_API_KEY="your-api-key-here" +# Or other provider keys as needed +``` + +### Running + +**Full Application** (from project root): + +```bash +make dev # Starts LangGraph + Gateway + Frontend + Nginx +``` + +Access at: http://localhost:2026 + +**Backend Only** (from backend directory): + +```bash +# Terminal 1: LangGraph server +make dev + +# Terminal 2: Gateway API +make gateway +``` + +Direct access: +- LangGraph: http://localhost:2024 +- Gateway: http://localhost:8001 + +## Project Structure + +``` +backend/ +├── src/ +│ ├── agents/ # LangGraph agents and workflows +│ │ ├── lead_agent/ # Main agent implementation +│ │ └── middlewares/ # Agent middlewares +│ ├── gateway/ # FastAPI Gateway API +│ │ └── routers/ # API route handlers +│ ├── sandbox/ # Sandbox execution system +│ ├── tools/ # Agent tools (builtins) +│ ├── mcp/ # MCP integration +│ ├── models/ # Model factory +│ ├── skills/ # Skills loader +│ ├── config/ # 
Configuration system
+│ ├── community/ # Community tools (web search, etc.)
+│ ├── reflection/ # Dynamic module loading
+│ └── utils/ # Utility functions
+├── docs/ # Documentation
+├── tests/ # Test suite
+├── langgraph.json # LangGraph server configuration
+├── config.yaml # Application configuration (optional)
+├── pyproject.toml # Python dependencies
+├── Makefile # Development commands
+└── Dockerfile # Container build
+```
+
+## API Reference
+
+### LangGraph API (via `/api/langgraph/*`)
+
+- `POST /threads` - Create new conversation thread
+- `POST /threads/{thread_id}/runs` - Execute agent with input
+- `GET /threads/{thread_id}/runs` - Get run history
+- `GET /threads/{thread_id}/state` - Get current conversation state
+- Streaming responses are delivered as Server-Sent Events (SSE)
+
+### Gateway API (via `/api/*`)
+
+**Models**:
+- `GET /api/models` - List available LLM models
+- `GET /api/models/{model_name}` - Get model details
+
+**MCP Configuration**:
+- `GET /api/mcp/config` - Get current MCP server configurations
+- `PUT /api/mcp/config` - Update MCP configuration
+
+**Skills Management**:
+- `GET /api/skills` - List all skills
+- `GET /api/skills/{skill_name}` - Get skill details
+- `POST /api/skills/{skill_name}/enable` - Enable a skill
+- `POST /api/skills/{skill_name}/disable` - Disable a skill
+- `POST /api/skills/install` - Install skill from `.skill` file
+
+**File Uploads**:
+- `POST /api/threads/{thread_id}/uploads` - Upload files
+- `GET /api/threads/{thread_id}/uploads/list` - List uploaded files
+- `DELETE /api/threads/{thread_id}/uploads/{filename}` - Delete file
+
+**Artifacts**:
+- `GET /api/threads/{thread_id}/artifacts/{path}` - Download generated artifacts
+
+## Configuration
+
+### Main Configuration (`config.yaml`)
+
+The application uses a YAML-based configuration file. Place it in the project root directory. 
+ +Key sections: +- `models`: LLM configurations with class paths and API keys +- `tool_groups`: Logical groupings for tools +- `tools`: Tool definitions with module paths +- `sandbox`: Execution environment settings +- `skills`: Skills directory configuration +- `title`: Auto-title generation settings +- `summarization`: Context summarization settings + +See [docs/CONFIGURATION.md](docs/CONFIGURATION.md) for detailed documentation. + +### Extensions Configuration (`extensions_config.json`) + +MCP servers and skills are configured in `extensions_config.json`: + +```json +{ + "mcpServers": { + "github": { + "enabled": true, + "type": "stdio", + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-github"], + "env": {"GITHUB_TOKEN": "$GITHUB_TOKEN"} + } + }, + "skills": { + "pdf-processing": {"enabled": true} + } +} +``` + +### Environment Variables + +- `DEER_FLOW_CONFIG_PATH` - Override config.yaml location +- `DEER_FLOW_EXTENSIONS_CONFIG_PATH` - Override extensions_config.json location +- Model API keys: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `DEEPSEEK_API_KEY`, etc. +- Tool API keys: `TAVILY_API_KEY`, `GITHUB_TOKEN`, etc. 
+
+## Development
+
+### Commands
+
+```bash
+make install    # Install dependencies
+make dev        # Run LangGraph server (port 2024)
+make gateway    # Run Gateway API (port 8001)
+make lint       # Run linter (ruff)
+make format     # Format code (ruff)
+```
+
+### Code Style
+
+- Uses `ruff` for linting and formatting
+- Line length: 240 characters
+- Python 3.12+ with type hints
+- Double quotes, space indentation
+
+### Testing
+
+```bash
+uv run pytest
+```
+
+## Documentation
+
+- [Configuration Guide](docs/CONFIGURATION.md) - Detailed configuration options
+- [Setup Guide](docs/SETUP.md) - Quick setup instructions
+- [File Upload](docs/FILE_UPLOAD.md) - File upload functionality
+- [Path Examples](docs/PATH_EXAMPLES.md) - Path types and usage
+- [Summarization](docs/summarization.md) - Context summarization feature
+- [Plan Mode](docs/plan_mode_usage.md) - TodoList middleware usage
+
+## Technology Stack
+
+### Core Frameworks
+- **LangChain** (1.2.3+) - LLM orchestration
+- **LangGraph** (1.0.6+) - Multi-agent workflows
+- **FastAPI** (0.115.0+) - REST API
+- **Uvicorn** (0.34.0+) - ASGI server
+
+### LLM Integrations
+- `langchain-openai` - OpenAI models
+- `langchain-anthropic` - Claude models
+- `langchain-deepseek` - DeepSeek models
+
+### Extensions
+- `langchain-mcp-adapters` - MCP protocol support
+- `agent-sandbox` - Sandboxed code execution
+
+### Utilities
+- `markitdown` - Multi-format to Markdown conversion
+- `tavily-python` - Web search
+- `firecrawl-py` - Web scraping
+- `ddgs` - DuckDuckGo image search
+
+## License
+
+See the [LICENSE](../LICENSE) file in the project root.
+
+## Contributing
+
+See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.
diff --git a/backend/docs/API.md b/backend/docs/API.md
new file mode 100644
index 0000000..358257d
--- /dev/null
+++ b/backend/docs/API.md
@@ -0,0 +1,605 @@
+# API Reference
+
+This document provides a complete reference for the DeerFlow backend APIs.
+ +## Overview + +DeerFlow backend exposes two sets of APIs: + +1. **LangGraph API** - Agent interactions, threads, and streaming (`/api/langgraph/*`) +2. **Gateway API** - Models, MCP, skills, uploads, and artifacts (`/api/*`) + +All APIs are accessed through the Nginx reverse proxy at port 2026. + +## LangGraph API + +Base URL: `/api/langgraph` + +The LangGraph API is provided by the LangGraph server and follows the LangGraph SDK conventions. + +### Threads + +#### Create Thread + +```http +POST /api/langgraph/threads +Content-Type: application/json +``` + +**Request Body:** +```json +{ + "metadata": {} +} +``` + +**Response:** +```json +{ + "thread_id": "abc123", + "created_at": "2024-01-15T10:30:00Z", + "metadata": {} +} +``` + +#### Get Thread State + +```http +GET /api/langgraph/threads/{thread_id}/state +``` + +**Response:** +```json +{ + "values": { + "messages": [...], + "sandbox": {...}, + "artifacts": [...], + "thread_data": {...}, + "title": "Conversation Title" + }, + "next": [], + "config": {...} +} +``` + +### Runs + +#### Create Run + +Execute the agent with input. + +```http +POST /api/langgraph/threads/{thread_id}/runs +Content-Type: application/json +``` + +**Request Body:** +```json +{ + "input": { + "messages": [ + { + "role": "user", + "content": "Hello, can you help me?" + } + ] + }, + "config": { + "configurable": { + "model_name": "gpt-4", + "thinking_enabled": false, + "is_plan_mode": false + } + }, + "stream_mode": ["values", "messages"] +} +``` + +**Configurable Options:** +- `model_name` (string): Override the default model +- `thinking_enabled` (boolean): Enable extended thinking for supported models +- `is_plan_mode` (boolean): Enable TodoList middleware for task tracking + +**Response:** Server-Sent Events (SSE) stream + +``` +event: values +data: {"messages": [...], "title": "..."} + +event: messages +data: {"content": "Hello! 
I'd be happy to help.", "role": "assistant"} + +event: end +data: {} +``` + +#### Get Run History + +```http +GET /api/langgraph/threads/{thread_id}/runs +``` + +**Response:** +```json +{ + "runs": [ + { + "run_id": "run123", + "status": "success", + "created_at": "2024-01-15T10:30:00Z" + } + ] +} +``` + +#### Stream Run + +Stream responses in real-time. + +```http +POST /api/langgraph/threads/{thread_id}/runs/stream +Content-Type: application/json +``` + +Same request body as Create Run. Returns SSE stream. + +--- + +## Gateway API + +Base URL: `/api` + +### Models + +#### List Models + +Get all available LLM models from configuration. + +```http +GET /api/models +``` + +**Response:** +```json +{ + "models": [ + { + "name": "gpt-4", + "display_name": "GPT-4", + "supports_thinking": false, + "supports_vision": true + }, + { + "name": "claude-3-opus", + "display_name": "Claude 3 Opus", + "supports_thinking": false, + "supports_vision": true + }, + { + "name": "deepseek-v3", + "display_name": "DeepSeek V3", + "supports_thinking": true, + "supports_vision": false + } + ] +} +``` + +#### Get Model Details + +```http +GET /api/models/{model_name} +``` + +**Response:** +```json +{ + "name": "gpt-4", + "display_name": "GPT-4", + "model": "gpt-4", + "max_tokens": 4096, + "supports_thinking": false, + "supports_vision": true +} +``` + +### MCP Configuration + +#### Get MCP Config + +Get current MCP server configurations. + +```http +GET /api/mcp/config +``` + +**Response:** +```json +{ + "mcpServers": { + "github": { + "enabled": true, + "type": "stdio", + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-github"], + "env": { + "GITHUB_TOKEN": "***" + }, + "description": "GitHub operations" + }, + "filesystem": { + "enabled": false, + "type": "stdio", + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-filesystem"], + "description": "File system access" + } + } +} +``` + +#### Update MCP Config + +Update MCP server configurations. 
+ +```http +PUT /api/mcp/config +Content-Type: application/json +``` + +**Request Body:** +```json +{ + "mcpServers": { + "github": { + "enabled": true, + "type": "stdio", + "command": "npx", + "args": ["-y", "@modelcontextprotocol/server-github"], + "env": { + "GITHUB_TOKEN": "$GITHUB_TOKEN" + }, + "description": "GitHub operations" + } + } +} +``` + +**Response:** +```json +{ + "success": true, + "message": "MCP configuration updated" +} +``` + +### Skills + +#### List Skills + +Get all available skills. + +```http +GET /api/skills +``` + +**Response:** +```json +{ + "skills": [ + { + "name": "pdf-processing", + "display_name": "PDF Processing", + "description": "Handle PDF documents efficiently", + "enabled": true, + "license": "MIT", + "path": "public/pdf-processing" + }, + { + "name": "frontend-design", + "display_name": "Frontend Design", + "description": "Design and build frontend interfaces", + "enabled": false, + "license": "MIT", + "path": "public/frontend-design" + } + ] +} +``` + +#### Get Skill Details + +```http +GET /api/skills/{skill_name} +``` + +**Response:** +```json +{ + "name": "pdf-processing", + "display_name": "PDF Processing", + "description": "Handle PDF documents efficiently", + "enabled": true, + "license": "MIT", + "path": "public/pdf-processing", + "allowed_tools": ["read_file", "write_file", "bash"], + "content": "# PDF Processing\n\nInstructions for the agent..." +} +``` + +#### Enable Skill + +```http +POST /api/skills/{skill_name}/enable +``` + +**Response:** +```json +{ + "success": true, + "message": "Skill 'pdf-processing' enabled" +} +``` + +#### Disable Skill + +```http +POST /api/skills/{skill_name}/disable +``` + +**Response:** +```json +{ + "success": true, + "message": "Skill 'pdf-processing' disabled" +} +``` + +#### Install Skill + +Install a skill from a `.skill` file. 
+ +```http +POST /api/skills/install +Content-Type: multipart/form-data +``` + +**Request Body:** +- `file`: The `.skill` file to install + +**Response:** +```json +{ + "success": true, + "message": "Skill 'my-skill' installed successfully", + "skill": { + "name": "my-skill", + "display_name": "My Skill", + "path": "custom/my-skill" + } +} +``` + +### File Uploads + +#### Upload Files + +Upload one or more files to a thread. + +```http +POST /api/threads/{thread_id}/uploads +Content-Type: multipart/form-data +``` + +**Request Body:** +- `files`: One or more files to upload + +**Response:** +```json +{ + "success": true, + "files": [ + { + "filename": "document.pdf", + "size": 1234567, + "path": ".deer-flow/threads/abc123/user-data/uploads/document.pdf", + "virtual_path": "/mnt/user-data/uploads/document.pdf", + "artifact_url": "/api/threads/abc123/artifacts/mnt/user-data/uploads/document.pdf", + "markdown_file": "document.md", + "markdown_path": ".deer-flow/threads/abc123/user-data/uploads/document.md", + "markdown_virtual_path": "/mnt/user-data/uploads/document.md", + "markdown_artifact_url": "/api/threads/abc123/artifacts/mnt/user-data/uploads/document.md" + } + ], + "message": "Successfully uploaded 1 file(s)" +} +``` + +**Supported Document Formats** (auto-converted to Markdown): +- PDF (`.pdf`) +- PowerPoint (`.ppt`, `.pptx`) +- Excel (`.xls`, `.xlsx`) +- Word (`.doc`, `.docx`) + +#### List Uploaded Files + +```http +GET /api/threads/{thread_id}/uploads/list +``` + +**Response:** +```json +{ + "files": [ + { + "filename": "document.pdf", + "size": 1234567, + "path": ".deer-flow/threads/abc123/user-data/uploads/document.pdf", + "virtual_path": "/mnt/user-data/uploads/document.pdf", + "artifact_url": "/api/threads/abc123/artifacts/mnt/user-data/uploads/document.pdf", + "extension": ".pdf", + "modified": 1705997600.0 + } + ], + "count": 1 +} +``` + +#### Delete File + +```http +DELETE /api/threads/{thread_id}/uploads/{filename} +``` + +**Response:** +```json +{ + 
"success": true, + "message": "Deleted document.pdf" +} +``` + +### Artifacts + +#### Get Artifact + +Download or view an artifact generated by the agent. + +```http +GET /api/threads/{thread_id}/artifacts/{path} +``` + +**Path Examples:** +- `/api/threads/abc123/artifacts/mnt/user-data/outputs/result.txt` +- `/api/threads/abc123/artifacts/mnt/user-data/uploads/document.pdf` + +**Query Parameters:** +- `download` (boolean): If `true`, force download with Content-Disposition header + +**Response:** File content with appropriate Content-Type + +--- + +## Error Responses + +All APIs return errors in a consistent format: + +```json +{ + "detail": "Error message describing what went wrong" +} +``` + +**HTTP Status Codes:** +- `400` - Bad Request: Invalid input +- `404` - Not Found: Resource not found +- `422` - Validation Error: Request validation failed +- `500` - Internal Server Error: Server-side error + +--- + +## Authentication + +Currently, DeerFlow does not implement authentication. All APIs are accessible without credentials. + +For production deployments, it is recommended to: +1. Use Nginx for basic auth or OAuth integration +2. Deploy behind a VPN or private network +3. Implement custom authentication middleware + +--- + +## Rate Limiting + +No rate limiting is implemented by default. For production deployments, configure rate limiting in Nginx: + +```nginx +limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s; + +location /api/ { + limit_req zone=api burst=20 nodelay; + proxy_pass http://backend; +} +``` + +--- + +## WebSocket Support + +The LangGraph server supports WebSocket connections for real-time streaming. 
Connect to:
+
+```
+ws://localhost:2026/api/langgraph/threads/{thread_id}/runs/stream
+```
+
+---
+
+## SDK Usage
+
+### Python (LangGraph SDK)
+
+```python
+import asyncio
+
+from langgraph_sdk import get_client
+
+
+async def main() -> None:
+    # Connect through the Nginx reverse proxy
+    client = get_client(url="http://localhost:2026/api/langgraph")
+
+    # Create thread
+    thread = await client.threads.create()
+
+    # Run agent and print each streamed event
+    async for event in client.runs.stream(
+        thread["thread_id"],
+        "lead_agent",
+        input={"messages": [{"role": "user", "content": "Hello"}]},
+        config={"configurable": {"model_name": "gpt-4"}},
+        stream_mode=["values", "messages"],
+    ):
+        print(event)
+
+
+asyncio.run(main())
+```
+
+### JavaScript/TypeScript
+
+```typescript
+// Using fetch for Gateway API
+const response = await fetch('/api/models');
+const data = await response.json();
+console.log(data.models);
+
+// Streaming a run: the endpoint expects POST, which EventSource cannot send,
+// so read the SSE body from fetch instead
+const streamResponse = await fetch(
+  `/api/langgraph/threads/${threadId}/runs/stream`,
+  {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify({
+      input: { messages: [{ role: 'user', content: 'Hello' }] },
+    }),
+  }
+);
+const reader = streamResponse.body!.getReader();
+const decoder = new TextDecoder();
+while (true) {
+  const { done, value } = await reader.read();
+  if (done) break;
+  console.log(decoder.decode(value, { stream: true })); // raw SSE text
+}
+```
+
+### cURL Examples
+
+```bash
+# List models
+curl http://localhost:2026/api/models
+
+# Get MCP config
+curl http://localhost:2026/api/mcp/config
+
+# Upload file
+curl -X POST http://localhost:2026/api/threads/abc123/uploads \
+  -F "files=@document.pdf"
+
+# Enable skill
+curl -X POST http://localhost:2026/api/skills/pdf-processing/enable
+
+# Create thread and run agent
+curl -X POST http://localhost:2026/api/langgraph/threads \
+  -H "Content-Type: application/json" \
+  -d '{}'
+
+curl -X POST http://localhost:2026/api/langgraph/threads/abc123/runs \
+  -H "Content-Type: application/json" \
+  -d '{
+    "input": {"messages": [{"role": "user", "content": "Hello"}]},
+    "config": {"configurable": {"model_name": "gpt-4"}}
+  }'
+```
diff --git a/backend/docs/ARCHITECTURE.md b/backend/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..cf0285f
--- /dev/null
+++ b/backend/docs/ARCHITECTURE.md
@@ -0,0 +1,464 @@
+# Architecture Overview
+
+This document provides a comprehensive overview of
the DeerFlow backend architecture. + +## System Architecture + +``` +┌──────────────────────────────────────────────────────────────────────────┐ +│ Client (Browser) │ +└─────────────────────────────────┬────────────────────────────────────────┘ + │ + ▼ +┌──────────────────────────────────────────────────────────────────────────┐ +│ Nginx (Port 2026) │ +│ Unified Reverse Proxy Entry Point │ +│ ┌────────────────────────────────────────────────────────────────────┐ │ +│ │ /api/langgraph/* → LangGraph Server (2024) │ │ +│ │ /api/* → Gateway API (8001) │ │ +│ │ /* → Frontend (3000) │ │ +│ └────────────────────────────────────────────────────────────────────┘ │ +└─────────────────────────────────┬────────────────────────────────────────┘ + │ + ┌───────────────────────┼───────────────────────┐ + │ │ │ + ▼ ▼ ▼ +┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐ +│ LangGraph Server │ │ Gateway API │ │ Frontend │ +│ (Port 2024) │ │ (Port 8001) │ │ (Port 3000) │ +│ │ │ │ │ │ +│ - Agent Runtime │ │ - Models API │ │ - Next.js App │ +│ - Thread Mgmt │ │ - MCP Config │ │ - React UI │ +│ - SSE Streaming │ │ - Skills Mgmt │ │ - Chat Interface │ +│ - Checkpointing │ │ - File Uploads │ │ │ +│ │ │ - Artifacts │ │ │ +└─────────────────────┘ └─────────────────────┘ └─────────────────────┘ + │ │ + │ ┌─────────────────┘ + │ │ + ▼ ▼ +┌──────────────────────────────────────────────────────────────────────────┐ +│ Shared Configuration │ +│ ┌─────────────────────────┐ ┌────────────────────────────────────────┐ │ +│ │ config.yaml │ │ extensions_config.json │ │ +│ │ - Models │ │ - MCP Servers │ │ +│ │ - Tools │ │ - Skills State │ │ +│ │ - Sandbox │ │ │ │ +│ │ - Summarization │ │ │ │ +│ └─────────────────────────┘ └────────────────────────────────────────┘ │ +└──────────────────────────────────────────────────────────────────────────┘ +``` + +## Component Details + +### LangGraph Server + +The LangGraph server is the core agent runtime, built on LangGraph for robust 
multi-agent workflow orchestration. + +**Entry Point**: `src/agents/lead_agent/agent.py:make_lead_agent` + +**Key Responsibilities**: +- Agent creation and configuration +- Thread state management +- Middleware chain execution +- Tool execution orchestration +- SSE streaming for real-time responses + +**Configuration**: `langgraph.json` + +```json +{ + "agent": { + "type": "agent", + "path": "src.agents:make_lead_agent" + } +} +``` + +### Gateway API + +FastAPI application providing REST endpoints for non-agent operations. + +**Entry Point**: `src/gateway/app.py` + +**Routers**: +- `models.py` - `/api/models` - Model listing and details +- `mcp.py` - `/api/mcp` - MCP server configuration +- `skills.py` - `/api/skills` - Skills management +- `uploads.py` - `/api/threads/{id}/uploads` - File upload +- `artifacts.py` - `/api/threads/{id}/artifacts` - Artifact serving + +### Agent Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ make_lead_agent(config) │ +└────────────────────────────────────┬────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────┐ +│ Middleware Chain │ +│ ┌──────────────────────────────────────────────────────────────────┐ │ +│ │ 1. ThreadDataMiddleware - Initialize workspace/uploads/outputs │ │ +│ │ 2. UploadsMiddleware - Process uploaded files │ │ +│ │ 3. SandboxMiddleware - Acquire sandbox environment │ │ +│ │ 4. SummarizationMiddleware - Context reduction (if enabled) │ │ +│ │ 5. TitleMiddleware - Auto-generate titles │ │ +│ │ 6. TodoListMiddleware - Task tracking (if plan_mode) │ │ +│ │ 7. ViewImageMiddleware - Vision model support │ │ +│ │ 8. 
ClarificationMiddleware - Handle clarifications │ │ +│ └──────────────────────────────────────────────────────────────────┘ │ +└────────────────────────────────────┬────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────┐ +│ Agent Core │ +│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────┐ │ +│ │ Model │ │ Tools │ │ System Prompt │ │ +│ │ (from factory) │ │ (configured + │ │ (with skills) │ │ +│ │ │ │ MCP + builtin) │ │ │ │ +│ └──────────────────┘ └──────────────────┘ └──────────────────────┘ │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +### Thread State + +The `ThreadState` extends LangGraph's `AgentState` with additional fields: + +```python +class ThreadState(AgentState): + # Core state from AgentState + messages: list[BaseMessage] + + # DeerFlow extensions + sandbox: dict # Sandbox environment info + artifacts: list[str] # Generated file paths + thread_data: dict # {workspace, uploads, outputs} paths + title: str | None # Auto-generated conversation title + todos: list[dict] # Task tracking (plan mode) + viewed_images: dict # Vision model image data +``` + +### Sandbox System + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ Sandbox Architecture │ +└─────────────────────────────────────────────────────────────────────────┘ + + ┌─────────────────────────┐ + │ SandboxProvider │ (Abstract) + │ - acquire() │ + │ - get() │ + │ - release() │ + └────────────┬────────────┘ + │ + ┌────────────────────┼────────────────────┐ + │ │ + ▼ ▼ +┌─────────────────────────┐ ┌─────────────────────────┐ +│ LocalSandboxProvider │ │ AioSandboxProvider │ +│ (src/sandbox/local.py) │ │ (src/community/) │ +│ │ │ │ +│ - Singleton instance │ │ - Docker-based │ +│ - Direct execution │ │ - Isolated containers │ +│ - Development use │ │ - Production use │ +└─────────────────────────┘ └─────────────────────────┘ + + 
┌─────────────────────────┐ + │ Sandbox │ (Abstract) + │ - execute_command() │ + │ - read_file() │ + │ - write_file() │ + │ - list_dir() │ + └─────────────────────────┘ +``` + +**Virtual Path Mapping**: + +| Virtual Path | Physical Path | +|-------------|---------------| +| `/mnt/user-data/workspace` | `backend/.deer-flow/threads/{thread_id}/user-data/workspace` | +| `/mnt/user-data/uploads` | `backend/.deer-flow/threads/{thread_id}/user-data/uploads` | +| `/mnt/user-data/outputs` | `backend/.deer-flow/threads/{thread_id}/user-data/outputs` | +| `/mnt/skills` | `deer-flow/skills/` | + +### Tool System + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ Tool Sources │ +└─────────────────────────────────────────────────────────────────────────┘ + +┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐ +│ Built-in Tools │ │ Configured Tools │ │ MCP Tools │ +│ (src/tools/) │ │ (config.yaml) │ │ (extensions.json) │ +├─────────────────────┤ ├─────────────────────┤ ├─────────────────────┤ +│ - present_file │ │ - web_search │ │ - github │ +│ - ask_clarification │ │ - web_fetch │ │ - filesystem │ +│ - view_image │ │ - bash │ │ - postgres │ +│ │ │ - read_file │ │ - brave-search │ +│ │ │ - write_file │ │ - puppeteer │ +│ │ │ - str_replace │ │ - ... 
│ +│ │ │ - ls │ │ │ +└─────────────────────┘ └─────────────────────┘ └─────────────────────┘ + │ │ │ + └───────────────────────┴───────────────────────┘ + │ + ▼ + ┌─────────────────────────┐ + │ get_available_tools() │ + │ (src/tools/__init__) │ + └─────────────────────────┘ +``` + +### Model Factory + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ Model Factory │ +│ (src/models/factory.py) │ +└─────────────────────────────────────────────────────────────────────────┘ + +config.yaml: +┌─────────────────────────────────────────────────────────────────────────┐ +│ models: │ +│ - name: gpt-4 │ +│ display_name: GPT-4 │ +│ use: langchain_openai:ChatOpenAI │ +│ model: gpt-4 │ +│ api_key: $OPENAI_API_KEY │ +│ max_tokens: 4096 │ +│ supports_thinking: false │ +│ supports_vision: true │ +└─────────────────────────────────────────────────────────────────────────┘ + │ + ▼ + ┌─────────────────────────┐ + │ create_chat_model() │ + │ - name: str │ + │ - thinking_enabled │ + └────────────┬────────────┘ + │ + ▼ + ┌─────────────────────────┐ + │ resolve_class() │ + │ (reflection system) │ + └────────────┬────────────┘ + │ + ▼ + ┌─────────────────────────┐ + │ BaseChatModel │ + │ (LangChain instance) │ + └─────────────────────────┘ +``` + +**Supported Providers**: +- OpenAI (`langchain_openai:ChatOpenAI`) +- Anthropic (`langchain_anthropic:ChatAnthropic`) +- DeepSeek (`langchain_deepseek:ChatDeepSeek`) +- Custom via LangChain integrations + +### MCP Integration + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ MCP Integration │ +│ (src/mcp/manager.py) │ +└─────────────────────────────────────────────────────────────────────────┘ + +extensions_config.json: +┌─────────────────────────────────────────────────────────────────────────┐ +│ { │ +│ "mcpServers": { │ +│ "github": { │ +│ "enabled": true, │ +│ "type": "stdio", │ +│ "command": "npx", │ +│ "args": ["-y", "@modelcontextprotocol/server-github"], │ +│ "env": 
{"GITHUB_TOKEN": "$GITHUB_TOKEN"} │ +│ } │ +│ } │ +│ } │ +└─────────────────────────────────────────────────────────────────────────┘ + │ + ▼ + ┌─────────────────────────┐ + │ MultiServerMCPClient │ + │ (langchain-mcp-adapters)│ + └────────────┬────────────┘ + │ + ┌────────────────────┼────────────────────┐ + │ │ │ + ▼ ▼ ▼ + ┌───────────┐ ┌───────────┐ ┌───────────┐ + │ stdio │ │ SSE │ │ HTTP │ + │ transport │ │ transport │ │ transport │ + └───────────┘ └───────────┘ └───────────┘ +``` + +### Skills System + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ Skills System │ +│ (src/skills/loader.py) │ +└─────────────────────────────────────────────────────────────────────────┘ + +Directory Structure: +┌─────────────────────────────────────────────────────────────────────────┐ +│ skills/ │ +│ ├── public/ # Public skills (committed) │ +│ │ ├── pdf-processing/ │ +│ │ │ └── SKILL.md │ +│ │ ├── frontend-design/ │ +│ │ │ └── SKILL.md │ +│ │ └── ... │ +│ └── custom/ # Custom skills (gitignored) │ +│ └── user-installed/ │ +│ └── SKILL.md │ +└─────────────────────────────────────────────────────────────────────────┘ + +SKILL.md Format: +┌─────────────────────────────────────────────────────────────────────────┐ +│ --- │ +│ name: PDF Processing │ +│ description: Handle PDF documents efficiently │ +│ license: MIT │ +│ allowed-tools: │ +│ - read_file │ +│ - write_file │ +│ - bash │ +│ --- │ +│ │ +│ # Skill Instructions │ +│ Content injected into system prompt... │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +### Request Flow + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ Request Flow Example │ +│ User sends message to agent │ +└─────────────────────────────────────────────────────────────────────────┘ + +1. Client → Nginx + POST /api/langgraph/threads/{thread_id}/runs + {"input": {"messages": [{"role": "user", "content": "Hello"}]}} + +2. 
Nginx → LangGraph Server (2024) + Proxied to LangGraph server + +3. LangGraph Server + a. Load/create thread state + b. Execute middleware chain: + - ThreadDataMiddleware: Set up paths + - UploadsMiddleware: Inject file list + - SandboxMiddleware: Acquire sandbox + - SummarizationMiddleware: Check token limits + - TitleMiddleware: Generate title if needed + - TodoListMiddleware: Load todos (if plan mode) + - ViewImageMiddleware: Process images + - ClarificationMiddleware: Check for clarifications + + c. Execute agent: + - Model processes messages + - May call tools (bash, web_search, etc.) + - Tools execute via sandbox + - Results added to messages + + d. Stream response via SSE + +4. Client receives streaming response +``` + +## Data Flow + +### File Upload Flow + +``` +1. Client uploads file + POST /api/threads/{thread_id}/uploads + Content-Type: multipart/form-data + +2. Gateway receives file + - Validates file + - Stores in .deer-flow/threads/{thread_id}/user-data/uploads/ + - If document: converts to Markdown via markitdown + +3. Returns response + { + "files": [{ + "filename": "doc.pdf", + "path": ".deer-flow/.../uploads/doc.pdf", + "virtual_path": "/mnt/user-data/uploads/doc.pdf", + "artifact_url": "/api/threads/.../artifacts/mnt/.../doc.pdf" + }] + } + +4. Next agent run + - UploadsMiddleware lists files + - Injects file list into messages + - Agent can access via virtual_path +``` + +### Configuration Reload + +``` +1. Client updates MCP config + PUT /api/mcp/config + +2. Gateway writes extensions_config.json + - Updates mcpServers section + - File mtime changes + +3. MCP Manager detects change + - get_cached_mcp_tools() checks mtime + - If changed: reinitializes MCP client + - Loads updated server configurations + +4. 
Next agent run uses new tools
+```
+
+## Security Considerations
+
+### Sandbox Isolation
+
+- Agent code executes within sandbox boundaries
+- Local sandbox: Direct execution (development only)
+- Docker sandbox: Container isolation (production recommended)
+- Path traversal prevention in file operations
+
+### API Security
+
+- Thread isolation: Each thread has separate data directories
+- File validation: Uploads checked for path safety
+- Environment variable resolution: Secrets not stored in config
+
+### MCP Security
+
+- Each MCP server runs in its own process
+- Environment variables resolved at runtime
+- Servers can be enabled/disabled independently
+
+## Performance Considerations
+
+### Caching
+
+- MCP tools cached with file mtime invalidation
+- Configuration loaded once, reloaded on file change
+- Skills parsed once at startup, cached in memory
+
+### Streaming
+
+- SSE used for real-time response streaming
+- Reduces time to first token
+- Enables progress visibility for long operations
+
+### Context Management
+
+- Summarization middleware reduces context when limits are approached
+- Configurable triggers: tokens, messages, or fraction
+- Preserves recent messages while summarizing older ones
diff --git a/backend/docs/README.md b/backend/docs/README.md
new file mode 100644
index 0000000..bd8c178
--- /dev/null
+++ b/backend/docs/README.md
@@ -0,0 +1,53 @@
+# Documentation
+
+This directory contains detailed documentation for the DeerFlow backend.
+ +## Quick Links + +| Document | Description | +|----------|-------------| +| [ARCHITECTURE.md](ARCHITECTURE.md) | System architecture overview | +| [API.md](API.md) | Complete API reference | +| [CONFIGURATION.md](CONFIGURATION.md) | Configuration options | +| [SETUP.md](SETUP.md) | Quick setup guide | + +## Feature Documentation + +| Document | Description | +|----------|-------------| +| [FILE_UPLOAD.md](FILE_UPLOAD.md) | File upload functionality | +| [PATH_EXAMPLES.md](PATH_EXAMPLES.md) | Path types and usage examples | +| [summarization.md](summarization.md) | Context summarization feature | +| [plan_mode_usage.md](plan_mode_usage.md) | Plan mode with TodoList | +| [AUTO_TITLE_GENERATION.md](AUTO_TITLE_GENERATION.md) | Automatic title generation | + +## Development + +| Document | Description | +|----------|-------------| +| [TODO.md](TODO.md) | Planned features and known issues | + +## Getting Started + +1. **New to DeerFlow?** Start with [SETUP.md](SETUP.md) for quick installation +2. **Configuring the system?** See [CONFIGURATION.md](CONFIGURATION.md) +3. **Understanding the architecture?** Read [ARCHITECTURE.md](ARCHITECTURE.md) +4. 
**Building integrations?** Check [API.md](API.md) for API reference + +## Document Organization + +``` +docs/ +├── README.md # This file +├── ARCHITECTURE.md # System architecture +├── API.md # API reference +├── CONFIGURATION.md # Configuration guide +├── SETUP.md # Setup instructions +├── FILE_UPLOAD.md # File upload feature +├── PATH_EXAMPLES.md # Path usage examples +├── summarization.md # Summarization feature +├── plan_mode_usage.md # Plan mode feature +├── AUTO_TITLE_GENERATION.md # Title generation +├── TITLE_GENERATION_IMPLEMENTATION.md # Title implementation details +└── TODO.md # Roadmap and issues +``` diff --git a/backend/docs/TODO.md b/backend/docs/TODO.md index 1e5ff2c..a873db3 100644 --- a/backend/docs/TODO.md +++ b/backend/docs/TODO.md @@ -1,14 +1,27 @@ # TODO List -## Features +## Completed Features -[x] Launch the sandbox only after the first file system or bash tool is called -[ ] Pooling the sandbox resources to reduce the number of sandbox containers -[x] Add Clarification Process for the whole process -[x] Implement Context Summarization Mechanism to avoid context explosion\ -[ ] Integrate MCP +- [x] Launch the sandbox only after the first file system or bash tool is called +- [x] Add Clarification Process for the whole process +- [x] Implement Context Summarization Mechanism to avoid context explosion +- [x] Integrate MCP (Model Context Protocol) for extensible tools +- [x] Add file upload support with automatic document conversion +- [x] Implement automatic thread title generation +- [x] Add Plan Mode with TodoList middleware +- [x] Add vision model support with ViewImageMiddleware +- [x] Skills system with SKILL.md format -## Issues +## Planned Features -[x] Make sure that no duplicated files in `state.artifacts` -[x] Long thinking but with empty content (answer inside thinking process) +- [ ] Pooling the sandbox resources to reduce the number of sandbox containers +- [ ] Add authentication/authorization layer +- [ ] Implement rate 
limiting +- [ ] Add metrics and monitoring +- [ ] Support for more document formats in upload +- [ ] Skill marketplace / remote skill installation + +## Resolved Issues + +- [x] Make sure that no duplicated files in `state.artifacts` +- [x] Long thinking but with empty content (answer inside thinking process) diff --git a/backend/pyproject.toml b/backend/pyproject.toml index 4fe0d94..7daa573 100644 --- a/backend/pyproject.toml +++ b/backend/pyproject.toml @@ -1,7 +1,7 @@ [project] name = "deer-flow" version = "0.1.0" -description = "Add your description here" +description = "LangGraph-based AI agent system with sandbox execution capabilities" readme = "README.md" requires-python = ">=3.12" dependencies = [