refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)

* refactor: extract shared utils to break harness→app cross-layer imports

Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: split backend/src into harness (deerflow.*) and app (app.*)

Physically split the monolithic backend/src/ package into two layers:

- **Harness** (`packages/harness/deerflow/`): publishable agent framework
  package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
  models, MCP, skills, config, and all core infrastructure.

- **App** (`app/`): unpublished application code with import prefix `app.*`.
  Contains gateway (FastAPI REST API) and channels (IM integrations).

Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure

Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add harness→app boundary check test and update docs

Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.

Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add config versioning with auto-upgrade on startup

When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.

- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix comments

* fix: update src.* import in test_sandbox_tools_security to deerflow.*

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle empty config and search parent dirs for config.example.yaml

Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
  looking next to config.yaml, fixing detection in common setups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct skills root path depth and config_version type coercion

- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
  after harness split, file lives at packages/harness/deerflow/skills/
  so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
  _check_config_version() to prevent TypeError when YAML stores value
  as string (e.g. config_version: "1")
- tests: add regression tests for both fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update test imports from src.* to deerflow.*/app.* after harness refactor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
DanielWalnut
2026-03-14 22:55:52 +08:00
committed by GitHub
parent 9b49a80dda
commit 76803b826f
198 changed files with 1786 additions and 941 deletions

View File

@@ -80,7 +80,7 @@ docker stop <id> # Auto-removes due to --rm
### Implementation Details
The implementation is in `backend/src/community/aio_sandbox/aio_sandbox_provider.py`:
The implementation is in `backend/packages/harness/deerflow/community/aio_sandbox/aio_sandbox_provider.py`:
- `_detect_container_runtime()`: Detects available runtime at startup
- `_start_container()`: Uses detected runtime, skips Docker-specific options for Apple Container
@@ -93,14 +93,14 @@ No configuration changes are needed! The system works automatically.
However, you can verify the runtime in use by checking the logs:
```
INFO:src.community.aio_sandbox.aio_sandbox_provider:Detected Apple Container: container version 0.1.0
INFO:src.community.aio_sandbox.aio_sandbox_provider:Starting sandbox container using container: ...
INFO:deerflow.community.aio_sandbox.aio_sandbox_provider:Detected Apple Container: container version 0.1.0
INFO:deerflow.community.aio_sandbox.aio_sandbox_provider:Starting sandbox container using container: ...
```
Or for Docker:
```
INFO:src.community.aio_sandbox.aio_sandbox_provider:Apple Container not available, falling back to Docker
INFO:src.community.aio_sandbox.aio_sandbox_provider:Starting sandbox container using docker: ...
INFO:deerflow.community.aio_sandbox.aio_sandbox_provider:Apple Container not available, falling back to Docker
INFO:deerflow.community.aio_sandbox.aio_sandbox_provider:Starting sandbox container using docker: ...
```
## Container Images
@@ -109,7 +109,7 @@ Both runtimes use OCI-compatible images. The default image works with both:
```yaml
sandbox:
use: src.community.aio_sandbox:AioSandboxProvider
use: deerflow.community.aio_sandbox:AioSandboxProvider
image: enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest # Default image
```

View File

@@ -55,7 +55,7 @@ This document provides a comprehensive overview of the DeerFlow backend architec
The LangGraph server is the core agent runtime, built on LangGraph for robust multi-agent workflow orchestration.
**Entry Point**: `src/agents/lead_agent/agent.py:make_lead_agent`
**Entry Point**: `packages/harness/deerflow/agents/lead_agent/agent.py:make_lead_agent`
**Key Responsibilities**:
- Agent creation and configuration
@@ -70,7 +70,7 @@ The LangGraph server is the core agent runtime, built on LangGraph for robust mu
{
"agent": {
"type": "agent",
"path": "src.agents:make_lead_agent"
"path": "deerflow.agents:make_lead_agent"
}
}
```
@@ -79,7 +79,7 @@ The LangGraph server is the core agent runtime, built on LangGraph for robust mu
FastAPI application providing REST endpoints for non-agent operations.
**Entry Point**: `src/gateway/app.py`
**Entry Point**: `app/gateway/app.py`
**Routers**:
- `models.py` - `/api/models` - Model listing and details
@@ -158,7 +158,7 @@ class ThreadState(AgentState):
▼ ▼
┌─────────────────────────┐ ┌─────────────────────────┐
│ LocalSandboxProvider │ │ AioSandboxProvider │
│ (src/sandbox/local.py) │ │ (src/community/) │
│ (packages/harness/deerflow/sandbox/local.py) │ │ (packages/harness/deerflow/community/) │
│ │ │ │
│ - Singleton instance │ │ - Docker-based │
│ - Direct execution │ │ - Isolated containers │
@@ -192,7 +192,7 @@ class ThreadState(AgentState):
┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│ Built-in Tools │ │ Configured Tools │ │ MCP Tools │
│ (src/tools/) │ │ (config.yaml) │ │ (extensions.json) │
│ (packages/harness/deerflow/tools/) │ │ (config.yaml) │ │ (extensions.json) │
├─────────────────────┤ ├─────────────────────┤ ├─────────────────────┤
│ - present_file │ │ - web_search │ │ - github │
│ - ask_clarification │ │ - web_fetch │ │ - filesystem │
@@ -208,7 +208,7 @@ class ThreadState(AgentState):
┌─────────────────────────┐
│ get_available_tools() │
│ (src/tools/__init__) │
│ (packages/harness/deerflow/tools/__init__) │
└─────────────────────────┘
```
@@ -217,7 +217,7 @@ class ThreadState(AgentState):
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Model Factory │
│ (src/models/factory.py) │
│ (packages/harness/deerflow/models/factory.py) │
└─────────────────────────────────────────────────────────────────────────┘
config.yaml:
@@ -264,7 +264,7 @@ config.yaml:
```
┌─────────────────────────────────────────────────────────────────────────┐
│ MCP Integration │
│ (src/mcp/manager.py) │
│ (packages/harness/deerflow/mcp/manager.py) │
└─────────────────────────────────────────────────────────────────────────┘
extensions_config.json:
@@ -302,7 +302,7 @@ extensions_config.json:
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Skills System │
│ (src/skills/loader.py) │
│ (packages/harness/deerflow/skills/loader.py) │
└─────────────────────────────────────────────────────────────────────────┘
Directory Structure:

View File

@@ -50,7 +50,7 @@ checkpointer = PostgresSaver.from_conn_string(
```json
{
"graphs": {
"lead_agent": "src.agents:lead_agent"
"lead_agent": "deerflow.agents:lead_agent"
},
"checkpointer": "checkpointer:checkpointer"
}
@@ -71,7 +71,7 @@ title:
或在代码中配置:
```python
from src.config.title_config import TitleConfig, set_title_config
from deerflow.config.title_config import TitleConfig, set_title_config
set_title_config(TitleConfig(
enabled=True,
@@ -185,7 +185,7 @@ sequenceDiagram
```python
# 测试 title 生成
import pytest
from src.agents.title_middleware import TitleMiddleware
from deerflow.agents.title_middleware import TitleMiddleware
def test_title_generation():
# TODO: 添加单元测试
@@ -243,11 +243,11 @@ def after_agent(self, state: TitleMiddlewareState, runtime: Runtime) -> dict | N
## 相关文件
- [`src/agents/thread_state.py`](../src/agents/thread_state.py) - ThreadState 定义
- [`src/agents/title_middleware.py`](../src/agents/title_middleware.py) - TitleMiddleware 实现
- [`src/config/title_config.py`](../src/config/title_config.py) - 配置管理
- [`packages/harness/deerflow/agents/thread_state.py`](../packages/harness/deerflow/agents/thread_state.py) - ThreadState 定义
- [`packages/harness/deerflow/agents/title_middleware.py`](../packages/harness/deerflow/agents/title_middleware.py) - TitleMiddleware 实现
- [`packages/harness/deerflow/config/title_config.py`](../packages/harness/deerflow/config/title_config.py) - 配置管理
- [`config.yaml`](../config.yaml) - 配置文件
- [`src/agents/lead_agent/agent.py`](../src/agents/lead_agent/agent.py) - Middleware 注册
- [`packages/harness/deerflow/agents/lead_agent/agent.py`](../packages/harness/deerflow/agents/lead_agent/agent.py) - Middleware 注册
## 参考资料

View File

@@ -2,6 +2,19 @@
This guide explains how to configure DeerFlow for your environment.
## Config Versioning
`config.example.yaml` contains a `config_version` field that tracks schema changes. When the example version is higher than your local `config.yaml`, the application emits a startup warning:
```
WARNING - Your config.yaml (version 0) is outdated — the latest version is 1.
Run `make config-upgrade` to merge new fields into your config.
```
- **Missing `config_version`** in your config is treated as version 0.
- Run `make config-upgrade` to auto-merge missing fields (your existing values are preserved, a `.bak` backup is created).
- When changing the config schema, bump `config_version` in `config.example.yaml`.
## Configuration Sections
### Models
@@ -103,7 +116,7 @@ Configure specific tools available to the agent:
tools:
- name: web_search
group: web
use: src.community.tavily.tools:web_search_tool
use: deerflow.community.tavily.tools:web_search_tool
max_results: 5
# api_key: $TAVILY_API_KEY # Optional
```
@@ -124,13 +137,13 @@ DeerFlow supports multiple sandbox execution modes. Configure your preferred mod
**Local Execution** (runs sandbox code directly on the host machine):
```yaml
sandbox:
use: src.sandbox.local:LocalSandboxProvider # Local execution
use: deerflow.sandbox.local:LocalSandboxProvider # Local execution
```
**Docker Execution** (runs sandbox code in isolated Docker containers):
```yaml
sandbox:
use: src.community.aio_sandbox:AioSandboxProvider # Docker-based sandbox
use: deerflow.community.aio_sandbox:AioSandboxProvider # Docker-based sandbox
```
**Docker Execution with Kubernetes** (runs sandbox code in Kubernetes pods via provisioner service):
@@ -139,7 +152,7 @@ This mode runs each sandbox in an isolated Kubernetes Pod on your **host machine
```yaml
sandbox:
use: src.community.aio_sandbox:AioSandboxProvider
use: deerflow.community.aio_sandbox:AioSandboxProvider
provisioner_url: http://provisioner:8002
```
@@ -152,13 +165,13 @@ Choose between local execution or Docker-based isolation:
**Option 1: Local Sandbox** (default, simpler setup):
```yaml
sandbox:
use: src.sandbox.local:LocalSandboxProvider
use: deerflow.sandbox.local:LocalSandboxProvider
```
**Option 2: Docker Sandbox** (isolated, more secure):
```yaml
sandbox:
use: src.community.aio_sandbox:AioSandboxProvider
use: deerflow.community.aio_sandbox:AioSandboxProvider
port: 8080
auto_start: true
container_prefix: deer-flow-sandbox

View File

@@ -212,11 +212,11 @@ backend/.deer-flow/threads/
### 组件
1. **Upload Router** (`src/gateway/routers/uploads.py`)
1. **Upload Router** (`app/gateway/routers/uploads.py`)
- 处理文件上传、列表、删除请求
- 使用 markitdown 转换文档
2. **Uploads Middleware** (`src/agents/middlewares/uploads_middleware.py`)
2. **Uploads Middleware** (`packages/harness/deerflow/agents/middlewares/uploads_middleware.py`)
- 在每次 Agent 请求前注入文件列表
- 自动生成格式化的文件列表消息

View File

@@ -0,0 +1,343 @@
# DeerFlow 后端拆分设计文档Harness + App
> 状态Draft
> 作者DeerFlow Team
> 日期2026-03-13
## 1. 背景与动机
DeerFlow 后端当前是一个单一 Python 包(`src.*`),包含了从底层 agent 编排到上层用户产品的所有代码。随着项目发展,这种结构带来了几个问题:
- **复用困难**其他产品CLI 工具、Slack bot、第三方集成想用 agent 能力,必须依赖整个后端,包括 FastAPI、IM SDK 等不需要的依赖
- **职责模糊**agent 编排逻辑和用户产品逻辑混在同一个 `src/` 下,边界不清晰
- **依赖膨胀**LangGraph Server 运行时不需要 FastAPI/uvicorn/Slack SDK但当前必须安装全部依赖
本文档提出将后端拆分为两部分:**deerflow-harness**(可发布的 agent 框架包)和 **app**(不打包的用户产品代码)。
## 2. 核心概念
### 2.1 Harness线束/框架层)
Harness 是 agent 的构建与编排框架,回答 **"如何构建和运行 agent"** 的问题:
- Agent 工厂与生命周期管理
- Middleware pipeline
- 工具系统(内置工具 + MCP + 社区工具)
- 沙箱执行环境
- 子 agent 委派
- 记忆系统
- 技能加载与注入
- 模型工厂
- 配置系统
**Harness 是一个可发布的 Python 包**`deerflow-harness`),可以独立安装和使用。
**Harness 的设计原则**:对上层应用完全无感知。它不知道也不关心谁在调用它——可以是 Web App、CLI、Slack Bot、或者一个单元测试。
### 2.2 App应用层
App 是面向用户的产品代码,回答 **"如何将 agent 呈现给用户"** 的问题:
- Gateway APIFastAPI REST 接口)
- IM Channels飞书、Slack、Telegram 集成)
- Custom Agent 的 CRUD 管理
- 文件上传/下载的 HTTP 接口
**App 不打包、不发布**,它是 DeerFlow 项目内部的应用代码,直接运行。
**App 依赖 Harness但 Harness 不依赖 App。**
### 2.3 边界划分
| 模块 | 归属 | 说明 |
|------|------|------|
| `config/` | Harness | 配置系统是基础设施 |
| `reflection/` | Harness | 动态模块加载工具 |
| `utils/` | Harness | 通用工具函数 |
| `agents/` | Harness | Agent 工厂、middleware、state、memory |
| `subagents/` | Harness | 子 agent 委派系统 |
| `sandbox/` | Harness | 沙箱执行环境 |
| `tools/` | Harness | 工具注册与发现 |
| `mcp/` | Harness | MCP 协议集成 |
| `skills/` | Harness | 技能加载、解析、定义 schema |
| `models/` | Harness | LLM 模型工厂 |
| `community/` | Harness | 社区工具tavily、jina 等) |
| `client.py` | Harness | 嵌入式 Python 客户端 |
| `gateway/` | App | FastAPI REST API |
| `channels/` | App | IM 平台集成 |
**关于 Custom Agents**agent 定义格式(`config.yaml` + `SOUL.md` schema由 Harness 层的 `config/agents_config.py` 定义但文件的存储、CRUD、发现机制由 App 层的 `gateway/routers/agents.py` 负责。
## 3. 目标架构
### 3.1 目录结构
```
backend/
├── packages/
│ └── harness/
│ ├── pyproject.toml # deerflow-harness 包定义
│ └── deerflow/ # Python 包根import 前缀: deerflow.*
│ ├── __init__.py
│ ├── config/
│ ├── reflection/
│ ├── utils/
│ ├── agents/
│ │ ├── lead_agent/
│ │ ├── middlewares/
│ │ ├── memory/
│ │ ├── checkpointer/
│ │ └── thread_state.py
│ ├── subagents/
│ ├── sandbox/
│ ├── tools/
│ ├── mcp/
│ ├── skills/
│ ├── models/
│ ├── community/
│ └── client.py
├── app/ # 不打包import 前缀: app.*
│ ├── __init__.py
│ ├── gateway/
│ │ ├── __init__.py
│ │ ├── app.py
│ │ ├── config.py
│ │ ├── path_utils.py
│ │ └── routers/
│ └── channels/
│ ├── __init__.py
│ ├── base.py
│ ├── manager.py
│ ├── service.py
│ ├── store.py
│ ├── message_bus.py
│ ├── feishu.py
│ ├── slack.py
│ └── telegram.py
├── pyproject.toml # uv workspace root
├── langgraph.json
├── tests/
├── docs/
└── Makefile
```
### 3.2 Import 规则
两个层使用不同的 import 前缀,职责边界一目了然:
```python
# ---------------------------------------------------------------
# Harness 内部互相引用deerflow.* 前缀)
# ---------------------------------------------------------------
from deerflow.agents import make_lead_agent
from deerflow.models import create_chat_model
from deerflow.config import get_app_config
from deerflow.tools import get_available_tools
# ---------------------------------------------------------------
# App 内部互相引用app.* 前缀)
# ---------------------------------------------------------------
from app.gateway.app import app
from app.gateway.routers.uploads import upload_files
from app.channels.service import start_channel_service
# ---------------------------------------------------------------
# App 调用 Harness单向依赖Harness 永远不 import app
# ---------------------------------------------------------------
from deerflow.agents import make_lead_agent
from deerflow.models import create_chat_model
from deerflow.skills import load_skills
from deerflow.config.extensions_config import get_extensions_config
```
**App 调用 Harness 示例 — Gateway 中启动 agent**
```python
# app/gateway/routers/chat.py
from deerflow.agents.lead_agent.agent import make_lead_agent
from deerflow.models import create_chat_model
from deerflow.config import get_app_config
async def create_chat_session(thread_id: str, model_name: str):
config = get_app_config()
model = create_chat_model(name=model_name)
agent = make_lead_agent(config=...)
# ... 使用 agent 处理用户消息
```
**App 调用 Harness 示例 — Channel 中查询 skills**
```python
# app/channels/manager.py
from deerflow.skills import load_skills
from deerflow.agents.memory.updater import get_memory_data
def handle_status_command():
skills = load_skills(enabled_only=True)
memory = get_memory_data()
return f"Skills: {len(skills)}, Memory facts: {len(memory.get('facts', []))}"
```
**禁止方向**Harness 代码中绝不能出现 `from app.``import app.`
### 3.3 为什么 App 不打包
| 方面 | 打包(放 packages/ 下) | 不打包(放 backend/app/ |
|------|------------------------|--------------------------|
| 命名空间 | 需要 pkgutil `extend_path` 合并,或独立前缀 | 天然独立,`app.*` vs `deerflow.*` |
| 发布需求 | 没有——App 是项目内部代码 | 不需要 pyproject.toml |
| 复杂度 | 需要管理两个包的构建、版本、依赖声明 | 直接运行,零额外配置 |
| 运行方式 | `pip install deerflow-app` | `PYTHONPATH=. uvicorn app.gateway.app:app` |
App 的唯一消费者是 DeerFlow 项目自身,没有独立发布的需求。放在 `backend/app/` 下作为普通 Python 包,通过 `PYTHONPATH` 或 editable install 让 Python 找到即可。
### 3.4 依赖关系
```
┌─────────────────────────────────────┐
│ app/ (不打包,直接运行) │
│ ├── fastapi, uvicorn │
│ ├── slack-sdk, lark-oapi, ... │
│ └── import deerflow.* │
└──────────────┬──────────────────────┘
┌─────────────────────────────────────┐
│ deerflow-harness (可发布的包) │
│ ├── langgraph, langchain │
│ ├── markitdown, pydantic, ... │
│ └── 零 app 依赖 │
└─────────────────────────────────────┘
```
**依赖分类**
| 分类 | 依赖包 |
|------|--------|
| Harness only | agent-sandbox, langchain*, langgraph*, markdownify, markitdown, pydantic, pyyaml, readabilipy, tavily-python, firecrawl-py, tiktoken, ddgs, duckdb, httpx, kubernetes, dotenv |
| App only | fastapi, uvicorn, sse-starlette, python-multipart, lark-oapi, slack-sdk, python-telegram-bot, markdown-to-mrkdwn |
| Shared | langgraph-sdkchannels 用 HTTP client, pydantic, httpx |
### 3.5 Workspace 配置
`backend/pyproject.toml`workspace root
```toml
[project]
name = "deer-flow"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["deerflow-harness"]
[dependency-groups]
dev = ["pytest>=8.0.0", "ruff>=0.14.11"]
# App 的额外依赖fastapi 等)也声明在 workspace root因为 app 不打包
app = ["fastapi", "uvicorn", "sse-starlette", "python-multipart"]
channels = ["lark-oapi", "slack-sdk", "python-telegram-bot"]
[tool.uv.workspace]
members = ["packages/harness"]
[tool.uv.sources]
deerflow-harness = { workspace = true }
```
## 4. 当前的跨层依赖问题
在拆分之前,需要先解决 `client.py` 中两处从 harness 到 app 的反向依赖:
### 4.1 `_validate_skill_frontmatter`
```python
# client.py — harness 导入了 app 层代码
from src.gateway.routers.skills import _validate_skill_frontmatter
```
**解决方案**:将该函数提取到 `deerflow/skills/validation.py`。这是一个纯逻辑函数(解析 YAML frontmatter、校验字段与 FastAPI 无关。
### 4.2 `CONVERTIBLE_EXTENSIONS` + `convert_file_to_markdown`
```python
# client.py — harness 导入了 app 层代码
from src.gateway.routers.uploads import CONVERTIBLE_EXTENSIONS, convert_file_to_markdown
```
**解决方案**:将它们提取到 `deerflow/utils/file_conversion.py`。仅依赖 `markitdown` + `pathlib`,是通用工具函数。
## 5. 基础设施变更
### 5.1 LangGraph Server
LangGraph Server 只需要 harness 包。`langgraph.json` 更新:
```json
{
"dependencies": ["./packages/harness"],
"graphs": {
"lead_agent": "deerflow.agents:make_lead_agent"
},
"checkpointer": {
"path": "./packages/harness/deerflow/agents/checkpointer/async_provider.py:make_checkpointer"
}
}
```
### 5.2 Gateway API
```bash
# serve.sh / Makefile
# PYTHONPATH 包含 backend/ 根目录,使 app.* 和 deerflow.* 都能被找到
PYTHONPATH=. uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001
```
### 5.3 Nginx
无需变更(只做 URL 路由,不涉及 Python 模块路径)。
### 5.4 Docker
Dockerfile 中的 module 引用从 `src.` 改为 `deerflow.` / `app.``COPY` 命令需覆盖 `packages/``app/` 目录。
## 6. 实施计划
分 3 个 PR 递进执行:
### PR 1提取共享工具函数Low Risk
1. 创建 `src/skills/validation.py`,从 `gateway/routers/skills.py` 提取 `_validate_skill_frontmatter`
2. 创建 `src/utils/file_conversion.py`,从 `gateway/routers/uploads.py` 提取文件转换逻辑
3. 更新 `client.py``gateway/routers/skills.py``gateway/routers/uploads.py` 的 import
4. 运行全部测试确认无回归
### PR 2Rename + 物理拆分High Risk原子操作
1. 创建 `packages/harness/` 目录,创建 `pyproject.toml`
2. `git mv` 将 harness 相关模块从 `src/` 移入 `packages/harness/deerflow/`
3. `git mv` 将 app 相关模块从 `src/` 移入 `app/`
4. 全局替换 import
- harness 模块:`src.*``deerflow.*`(所有 `.py` 文件、`langgraph.json`、测试、文档)
- app 模块:`src.gateway.*``app.gateway.*``src.channels.*``app.channels.*`
5. 更新 workspace root `pyproject.toml`
6. 更新 `langgraph.json``Makefile``Dockerfile`
7. `uv sync` + 全部测试 + 手动验证服务启动
### PR 3边界检查 + 文档Low Risk
1. 添加 lint 规则:检查 harness 不 import app 模块
2. 更新 `CLAUDE.md``README.md`
## 7. 风险与缓解
| 风险 | 影响 | 缓解措施 |
|------|------|----------|
| 全局 rename 误伤 | 字符串中的 `src` 被错误替换 | 正则精确匹配 `\bsrc\.`review diff |
| LangGraph Server 找不到模块 | 服务启动失败 | `langgraph.json``dependencies` 指向正确的 harness 包路径 |
| App 的 `PYTHONPATH` 缺失 | Gateway/Channel 启动 import 报错 | Makefile/Docker 统一设置 `PYTHONPATH=.` |
| `config.yaml` 中的 `use` 字段引用旧路径 | 运行时模块解析失败 | `config.yaml` 中的 `use` 字段同步更新为 `deerflow.*` |
| 测试中 `sys.path` 混乱 | 测试失败 | 用 editable install`uv sync`)确保 deerflow 可导入,`conftest.py` 中添加 `app/``sys.path` |
## 8. 未来演进
- **独立发布**harness 可以发布到内部 PyPI让其他项目直接 `pip install deerflow-harness`
- **插件化 App**:不同的 appweb、CLI、bot可以各自独立都依赖同一个 harness
- **更细粒度拆分**:如果 harness 内部模块继续增长,可以进一步拆分(如 `deerflow-sandbox``deerflow-mcp`

View File

@@ -33,6 +33,6 @@ No `current_context` argument is currently available in `main`.
## Verification Pointers
- Implementation: `backend/src/agents/memory/prompt.py`
- Prompt assembly: `backend/src/agents/lead_agent/prompt.py`
- Implementation: `packages/harness/deerflow/agents/memory/prompt.py`
- Prompt assembly: `packages/harness/deerflow/agents/lead_agent/prompt.py`
- Regression tests: `backend/tests/test_memory_prompt_injection.py`

View File

@@ -144,7 +144,7 @@ async function uploadAndProcess(threadId: string, file: File) {
```python
from pathlib import Path
from src.agents.middlewares.thread_data_middleware import THREAD_DATA_BASE_DIR
from deerflow.agents.middlewares.thread_data_middleware import THREAD_DATA_BASE_DIR
def process_uploaded_file(thread_id: str, filename: str):
# 使用实际路径

View File

@@ -30,7 +30,7 @@ DeerFlow uses a YAML configuration file that should be placed in the **project r
4. **Verify configuration**:
```bash
cd backend
python -c "from src.config import get_app_config; print('✓ Config loaded:', get_app_config().models[0].name)"
python -c "from deerflow.config import get_app_config; print('✓ Config loaded:', get_app_config().models[0].name)"
```
## Important Notes
@@ -51,7 +51,7 @@ The backend searches for `config.yaml` in this order:
## Sandbox Setup (Optional but Recommended)
If you plan to use Docker/Container-based sandbox (configured in `config.yaml` under `sandbox.use: src.community.aio_sandbox:AioSandboxProvider`), it's highly recommended to pre-pull the container image:
If you plan to use Docker/Container-based sandbox (configured in `config.yaml` under `sandbox.use: deerflow.community.aio_sandbox:AioSandboxProvider`), it's highly recommended to pre-pull the container image:
```bash
# From project root
@@ -72,7 +72,7 @@ If you skip this step, the image will be automatically pulled on first agent exe
```bash
# Check where the backend is looking
cd deer-flow/backend
python -c "from src.config.app_config import AppConfig; print(AppConfig.resolve_config_path())"
python -c "from deerflow.config.app_config import AppConfig; print(AppConfig.resolve_config_path())"
```
If it can't find the config:

View File

@@ -4,27 +4,27 @@
### 1. 核心实现文件
#### [`src/agents/thread_state.py`](../src/agents/thread_state.py)
#### [`packages/harness/deerflow/agents/thread_state.py`](../packages/harness/deerflow/agents/thread_state.py)
- ✅ 添加 `title: str | None = None` 字段到 `ThreadState`
#### [`src/config/title_config.py`](../src/config/title_config.py) (新建)
#### [`packages/harness/deerflow/config/title_config.py`](../packages/harness/deerflow/config/title_config.py) (新建)
- ✅ 创建 `TitleConfig` 配置类
- ✅ 支持配置enabled, max_words, max_chars, model_name, prompt_template
- ✅ 提供 `get_title_config()``set_title_config()` 函数
- ✅ 提供 `load_title_config_from_dict()` 从配置文件加载
#### [`src/agents/title_middleware.py`](../src/agents/title_middleware.py) (新建)
#### [`packages/harness/deerflow/agents/title_middleware.py`](../packages/harness/deerflow/agents/title_middleware.py) (新建)
- ✅ 创建 `TitleMiddleware`
- ✅ 实现 `_should_generate_title()` 检查是否需要生成
- ✅ 实现 `_generate_title()` 调用 LLM 生成标题
- ✅ 实现 `after_agent()` 钩子,在首次对话后自动触发
- ✅ 包含 fallback 策略LLM 失败时使用用户消息前几个词)
#### [`src/config/app_config.py`](../src/config/app_config.py)
#### [`packages/harness/deerflow/config/app_config.py`](../packages/harness/deerflow/config/app_config.py)
- ✅ 导入 `load_title_config_from_dict`
- ✅ 在 `from_file()` 中加载 title 配置
#### [`src/agents/lead_agent/agent.py`](../src/agents/lead_agent/agent.py)
#### [`packages/harness/deerflow/agents/lead_agent/agent.py`](../packages/harness/deerflow/agents/lead_agent/agent.py)
- ✅ 导入 `TitleMiddleware`
- ✅ 注册到 `middleware` 列表:`[SandboxMiddleware(), TitleMiddleware()]`
@@ -131,7 +131,7 @@ checkpointer = SqliteSaver.from_conn_string("checkpoints.db")
// langgraph.json
{
"graphs": {
"lead_agent": "src.agents:lead_agent"
"lead_agent": "deerflow.agents:lead_agent"
},
"checkpointer": "checkpointer:checkpointer"
}

View File

@@ -21,8 +21,8 @@
- [ ] Support for more document formats in upload
- [ ] Skill marketplace / remote skill installation
- [ ] Optimize async concurrency in agent hot path (IM channels multi-task scenario)
- Replace `time.sleep(5)` with `asyncio.sleep()` in `src/tools/builtins/task_tool.py` (subagent polling)
- Replace `subprocess.run()` with `asyncio.create_subprocess_shell()` in `src/sandbox/local/local_sandbox.py`
- Replace `time.sleep(5)` with `asyncio.sleep()` in `packages/harness/deerflow/tools/builtins/task_tool.py` (subagent polling)
- Replace `subprocess.run()` with `asyncio.create_subprocess_shell()` in `packages/harness/deerflow/sandbox/local/local_sandbox.py`
- Replace sync `requests` with `httpx.AsyncClient` in community tools (tavily, jina_ai, firecrawl, infoquest, image_search)
- Replace sync `model.invoke()` with async `model.ainvoke()` in title_middleware and memory updater
- Consider `asyncio.to_thread()` wrapper for remaining blocking file I/O

View File

@@ -19,7 +19,7 @@ Plan mode is controlled via **runtime configuration** through the `is_plan_mode`
```python
from langchain_core.runnables import RunnableConfig
from src.agents.lead_agent.agent import make_lead_agent
from deerflow.agents.lead_agent.agent import make_lead_agent
# Enable plan mode via runtime configuration
config = RunnableConfig(
@@ -72,7 +72,7 @@ The agent will skip using the todo list for:
```python
from langchain_core.runnables import RunnableConfig
from src.agents.lead_agent.agent import make_lead_agent
from deerflow.agents.lead_agent.agent import make_lead_agent
# Create agent with plan mode ENABLED
config_with_plan_mode = RunnableConfig(
@@ -101,7 +101,7 @@ You can enable/disable plan mode dynamically for different conversations or task
```python
from langchain_core.runnables import RunnableConfig
from src.agents.lead_agent.agent import make_lead_agent
from deerflow.agents.lead_agent.agent import make_lead_agent
def create_agent_for_task(task_complexity: str):
"""Create agent with plan mode based on task complexity."""
@@ -154,7 +154,7 @@ make_lead_agent(config)
## Implementation Details
### Agent Module
- **Location**: `src/agents/lead_agent/agent.py`
- **Location**: `packages/harness/deerflow/agents/lead_agent/agent.py`
- **Function**: `_create_todo_list_middleware(is_plan_mode: bool)` - Creates TodoListMiddleware if plan mode is enabled
- **Function**: `_build_middlewares(config: RunnableConfig)` - Builds middleware chain based on runtime config
- **Function**: `make_lead_agent(config: RunnableConfig)` - Creates agent with appropriate middlewares
@@ -194,7 +194,7 @@ DeerFlow uses custom `system_prompt` and `tool_description` for the TodoListMidd
- Comprehensive best practices section
- Task completion requirements to prevent premature marking
The custom prompts are defined in `_create_todo_list_middleware()` in `/Users/hetao/workspace/deer-flow/backend/src/agents/lead_agent/agent.py:57`.
The custom prompts are defined in `_create_todo_list_middleware()` in `/Users/hetao/workspace/deer-flow/backend/packages/harness/deerflow/agents/lead_agent/agent.py:57`.
## Notes

View File

@@ -269,8 +269,8 @@ The middleware intelligently preserves message context:
### Code Structure
- **Configuration**: `src/config/summarization_config.py`
- **Integration**: `src/agents/lead_agent/agent.py`
- **Configuration**: `packages/harness/deerflow/config/summarization_config.py`
- **Integration**: `packages/harness/deerflow/agents/lead_agent/agent.py`
- **Middleware**: Uses `langchain.agents.middleware.SummarizationMiddleware`
### Middleware Order

View File

@@ -65,7 +65,7 @@ The `task_status_tool` is no longer exposed to the LLM. It's kept in the codebas
### Polling Logic
Located in `src/tools/builtins/task_tool.py`:
Located in `packages/harness/deerflow/tools/builtins/task_tool.py`:
```python
# Start background execution
@@ -93,7 +93,7 @@ while True:
In addition to polling timeout, subagent execution now has a built-in timeout mechanism:
**Configuration** (`src/subagents/config.py`):
**Configuration** (`packages/harness/deerflow/subagents/config.py`):
```python
@dataclass
class SubagentConfig: