mirror of https://gitee.com/wanwujie/deer-flow synced 2026-04-03 06:12:14 +08:00

Files

DanielWalnut 76803b826f refactor: split backend into harness (deerflow.*) and app (app.*) (#1131 )

* refactor: extract shared utils to break harness→app cross-layer imports

Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: split backend/src into harness (deerflow.*) and app (app.*)

Physically split the monolithic backend/src/ package into two layers:

- **Harness** (`packages/harness/deerflow/`): publishable agent framework
  package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
  models, MCP, skills, config, and all core infrastructure.

- **App** (`app/`): unpublished application code with import prefix `app.*`.
  Contains gateway (FastAPI REST API) and channels (IM integrations).

Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure

Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add harness→app boundary check test and update docs

Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.

Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add config versioning with auto-upgrade on startup

When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.

- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix comments

* fix: update src.* import in test_sandbox_tools_security to deerflow.*

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle empty config and search parent dirs for config.example.yaml

Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
  looking next to config.yaml, fixing detection in common setups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct skills root path depth and config_version type coercion

- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
  after harness split, file lives at packages/harness/deerflow/skills/
  so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
  _check_config_version() to prevent TypeError when YAML stores value
  as string (e.g. config_version: "1")
- tests: add regression tests for both fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update test imports from src.* to deerflow.*/app.* after harness refactor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-14 22:55:52 +08:00

9.9 KiB

Raw Blame History

Contributing to DeerFlow Backend

Thank you for your interest in contributing to DeerFlow! This document provides guidelines and instructions for contributing to the backend codebase.

Getting Started
Development Setup
Project Structure
Code Style
Making Changes
Testing
Pull Request Process
Architecture Guidelines

Getting Started

Prerequisites

Python 3.12 or higher
uv package manager
Git
Docker (optional, for Docker sandbox testing)

Fork and Clone

Fork the repository on GitHub

Clone your fork locally:

git clone https://github.com/YOUR_USERNAME/deer-flow.git
cd deer-flow

Development Setup

Install Dependencies

# From project root
cp config.example.yaml config.yaml

# Install backend dependencies
cd backend
make install

Configure Environment

Set up your API keys for testing:

export OPENAI_API_KEY="your-api-key"
# Add other keys as needed

Run the Development Server

# Terminal 1: LangGraph server
make dev

# Terminal 2: Gateway API
make gateway

Project Structure

backend/src/
├── agents/                  # Agent system
│   ├── lead_agent/         # Main agent implementation
│   │   └── agent.py        # Agent factory and creation
│   ├── middlewares/        # Agent middlewares
│   │   ├── thread_data_middleware.py
│   │   ├── sandbox_middleware.py
│   │   ├── title_middleware.py
│   │   ├── uploads_middleware.py
│   │   ├── view_image_middleware.py
│   │   └── clarification_middleware.py
│   └── thread_state.py     # Thread state definition
│
├── gateway/                 # FastAPI Gateway
│   ├── app.py              # FastAPI application
│   └── routers/            # Route handlers
│       ├── models.py       # /api/models endpoints
│       ├── mcp.py          # /api/mcp endpoints
│       ├── skills.py       # /api/skills endpoints
│       ├── artifacts.py    # /api/threads/.../artifacts
│       └── uploads.py      # /api/threads/.../uploads
│
├── sandbox/                 # Sandbox execution
│   ├── __init__.py         # Sandbox interface
│   ├── local.py            # Local sandbox provider
│   └── tools.py            # Sandbox tools (bash, file ops)
│
├── tools/                   # Agent tools
│   └── builtins/           # Built-in tools
│       ├── present_file_tool.py
│       ├── ask_clarification_tool.py
│       └── view_image_tool.py
│
├── mcp/                     # MCP integration
│   └── manager.py          # MCP server management
│
├── models/                  # Model system
│   └── factory.py          # Model factory
│
├── skills/                  # Skills system
│   └── loader.py           # Skills loader
│
├── config/                  # Configuration
│   ├── app_config.py       # Main app config
│   ├── extensions_config.py # Extensions config
│   └── summarization_config.py
│
├── community/               # Community tools
│   ├── tavily/             # Tavily web search
│   ├── jina/               # Jina web fetch
│   ├── firecrawl/          # Firecrawl scraping
│   └── aio_sandbox/        # Docker sandbox
│
├── reflection/              # Dynamic loading
│   └── __init__.py         # Module resolution
│
└── utils/                   # Utilities
    └── __init__.py

Code Style

Linting and Formatting

We use ruff for both linting and formatting:

# Check for issues
make lint

# Auto-fix and format
make format

Style Guidelines

Line length: 240 characters maximum
Python version: 3.12+ features allowed
Type hints: Use type hints for function signatures
Quotes: Double quotes for strings
Indentation: 4 spaces (no tabs)
Imports: Group by standard library, third-party, local

Docstrings

Use docstrings for public functions and classes:

def create_chat_model(name: str, thinking_enabled: bool = False) -> BaseChatModel:
    """Create a chat model instance from configuration.

    Args:
        name: The model name as defined in config.yaml
        thinking_enabled: Whether to enable extended thinking

    Returns:
        A configured LangChain chat model instance

    Raises:
        ValueError: If the model name is not found in configuration
    """
    ...

Making Changes

Branch Naming

Use descriptive branch names:

feature/add-new-tool - New features
fix/sandbox-timeout - Bug fixes
docs/update-readme - Documentation
refactor/config-system - Code refactoring

Commit Messages

Write clear, concise commit messages:

feat: add support for Claude 3.5 model

- Add model configuration in config.yaml
- Update model factory to handle Claude-specific settings
- Add tests for new model

Prefix types:

feat: - New feature
fix: - Bug fix
docs: - Documentation
refactor: - Code refactoring
test: - Tests
chore: - Build/config changes

Testing

Running Tests

uv run pytest

Writing Tests

Place tests in the tests/ directory mirroring the source structure:

tests/
├── test_models/
│   └── test_factory.py
├── test_sandbox/
│   └── test_local.py
└── test_gateway/
    └── test_models_router.py

Example test:

import pytest
from deerflow.models.factory import create_chat_model

def test_create_chat_model_with_valid_name():
    """Test that a valid model name creates a model instance."""
    model = create_chat_model("gpt-4")
    assert model is not None

def test_create_chat_model_with_invalid_name():
    """Test that an invalid model name raises ValueError."""
    with pytest.raises(ValueError):
        create_chat_model("nonexistent-model")

Pull Request Process

Before Submitting

Ensure tests pass: uv run pytest
Run linter: make lint
Format code: make format
Update documentation if needed

PR Description

Include in your PR description:

What: Brief description of changes
Why: Motivation for the change
How: Implementation approach
Testing: How you tested the changes

Review Process

Submit PR with clear description
Address review feedback
Ensure CI passes
Maintainer will merge when approved

Architecture Guidelines

Adding New Tools

Create tool in packages/harness/deerflow/tools/builtins/ or packages/harness/deerflow/community/:

# packages/harness/deerflow/tools/builtins/my_tool.py
from langchain_core.tools import tool

@tool
def my_tool(param: str) -> str:
    """Tool description for the agent.

    Args:
        param: Description of the parameter

    Returns:
        Description of return value
    """
    return f"Result: {param}"

tools:
  - name: my_tool
    group: my_group
    use: deerflow.tools.builtins.my_tool:my_tool

Adding New Middleware

Create middleware in packages/harness/deerflow/agents/middlewares/:

# packages/harness/deerflow/agents/middlewares/my_middleware.py
from langchain.agents.middleware import BaseMiddleware
from langchain_core.runnables import RunnableConfig

class MyMiddleware(BaseMiddleware):
    """Middleware description."""

    def transform_state(self, state: dict, config: RunnableConfig) -> dict:
        """Transform the state before agent execution."""
        # Modify state as needed
        return state

middlewares = [
    ThreadDataMiddleware(),
    SandboxMiddleware(),
    MyMiddleware(),  # Add your middleware
    TitleMiddleware(),
    ClarificationMiddleware(),
]

Adding New API Endpoints

Create router in app/gateway/routers/:

# app/gateway/routers/my_router.py
from fastapi import APIRouter

router = APIRouter(prefix="/my-endpoint", tags=["my-endpoint"])

@router.get("/")
async def get_items():
    """Get all items."""
    return {"items": []}

@router.post("/")
async def create_item(data: dict):
    """Create a new item."""
    return {"created": data}

from app.gateway.routers import my_router

app.include_router(my_router.router)

Configuration Changes

When adding new configuration options:

Update packages/harness/deerflow/config/app_config.py with new fields
Add default values in config.example.yaml
Document in docs/CONFIGURATION.md

MCP Server Integration

To add support for a new MCP server:

Add configuration in extensions_config.json:

{
  "mcpServers": {
    "my-server": {
      "enabled": true,
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@my-org/mcp-server"],
      "description": "My MCP Server"
    }
  }
}

Update extensions_config.example.json with the new server

Skills Development

To create a new skill:

Create directory in skills/public/ or skills/custom/:

skills/public/my-skill/
└── SKILL.md

Write SKILL.md with YAML front matter:

---
name: My Skill
description: What this skill does
license: MIT
allowed-tools:
  - read_file
  - write_file
  - bash
---

# My Skill

Instructions for the agent when this skill is enabled...

Questions?

If you have questions about contributing:

Check existing documentation in docs/
Look for similar issues or PRs on GitHub
Open a discussion or issue on GitHub

Thank you for contributing to DeerFlow!

9.9 KiB Raw Blame History