DeerFlow Backend Split Design Document: Harness + App
Status: Draft | Author: DeerFlow Team | Date: 2026-03-13
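1. Background and Motivation
The DeerFlow backend is currently a single Python package (src.*) that contains everything from low-level agent orchestration to the user-facing product. As the project has grown, this structure has caused several problems:
- Hard to reuse: other products (CLI tools, Slack bots, third-party integrations) that want the agent capabilities must depend on the entire backend, including dependencies they do not need, such as FastAPI and the IM SDKs
- Blurred responsibilities: agent orchestration logic and user-product logic are mixed under the same src/ directory, with no clear boundary
- Dependency bloat: the LangGraph Server runtime does not need FastAPI/uvicorn/Slack SDK, yet today every dependency must be installed
This document proposes splitting the backend into two parts: deerflow-harness (a publishable agent framework package) and app (user-product code that is not packaged).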
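2. Core Concepts
2.1 Harness (framework layer)
Harness is the framework for building and orchestrating agents. It answers the question "how do we build and run an agent":
- Agent factory and lifecycle management
- Middleware pipeline
- Tool system (built-in tools + MCP + community tools)
- Sandbox execution environment
- Sub-agent delegation
- Memory system
- Skill loading and injection
- Model factory
- Configuration system
Harness is a publishable Python package (deerflow-harness) that can be installed and used on its own.
Design principle of Harness: it is completely unaware of the application layer above it. It neither knows nor cares who is calling it, whether that is a web app, a CLI, a Slack bot, or a unit test.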
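2.2 App (application layer)
App is the user-facing product code. It answers the question "how do we present the agent to users":
- Gateway API (FastAPI REST endpoints)
- IM Channels (Feishu, Slack, Telegram integrations)
- CRUD management for Custom Agents
- HTTP endpoints for file upload/download
App is neither packaged nor published. It is application code internal to the DeerFlow project and is run directly.
App depends on Harness, but Harness does not depend on App.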
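2.3 Boundary Division
| Module | Layer | Description |
|---|---|---|
| config/ | Harness | The configuration system is infrastructure |
| reflection/ | Harness | Dynamic module loading utilities |
| utils/ | Harness | General-purpose utility functions |
| agents/ | Harness | Agent factory, middleware, state, memory |
| subagents/ | Harness | Sub-agent delegation system |
| sandbox/ | Harness | Sandbox execution environment |
| tools/ | Harness | Tool registration and discovery |
| mcp/ | Harness | MCP protocol integration |
| skills/ | Harness | Skill loading, parsing, and schema definition |
| models/ | Harness | LLM model factory |
| community/ | Harness | Community tools (tavily, jina, etc.) |
| client.py | Harness | Embedded Python client |
| gateway/ | App | FastAPI REST API |
| channels/ | App | IM platform integrations |
About Custom Agents: the agent definition format (config.yaml + SOUL.md schema) is defined by config/agents_config.py in the Harness layer, while file storage, CRUD, and discovery are handled by gateway/routers/agents.py in the App layer.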
3. Target Architecture
3.1 Directory Structure
backend/
├── packages/
│   └── harness/
│       ├── pyproject.toml          # deerflow-harness package definition
│       └── deerflow/               # Python package root (import prefix: deerflow.*)
│           ├── __init__.py
│           ├── config/
│           ├── reflection/
│           ├── utils/
│           ├── agents/
│           │   ├── lead_agent/
│           │   ├── middlewares/
│           │   ├── memory/
│           │   ├── checkpointer/
│           │   └── thread_state.py
│           ├── subagents/
│           ├── sandbox/
│           ├── tools/
│           ├── mcp/
│           ├── skills/
│           ├── models/
│           ├── community/
│           └── client.py
├── app/                            # Not packaged (import prefix: app.*)
│   ├── __init__.py
│   ├── gateway/
│   │   ├── __init__.py
│   │   ├── app.py
│   │   ├── config.py
│   │   ├── path_utils.py
│   │   └── routers/
│   └── channels/
│       ├── __init__.py
│       ├── base.py
│       ├── manager.py
│       ├── service.py
│       ├── store.py
│       ├── message_bus.py
│       ├── feishu.py
│       ├── slack.py
│       └── telegram.py
├── pyproject.toml                  # uv workspace root
├── langgraph.json
├── tests/
├── docs/
└── Makefile
3.2 Import Rules
The two layers use different import prefixes, so the boundary of responsibilities is visible at a glance:
# ---------------------------------------------------------------
# Imports within Harness (deerflow.* prefix)
# ---------------------------------------------------------------
from deerflow.agents import make_lead_agent
from deerflow.models import create_chat_model
from deerflow.config import get_app_config
from deerflow.tools import get_available_tools
# ---------------------------------------------------------------
# Imports within App (app.* prefix)
# ---------------------------------------------------------------
from app.gateway.app import app
from app.gateway.routers.uploads import upload_files
from app.channels.service import start_channel_service
# ---------------------------------------------------------------
# App calling Harness (one-way dependency; Harness never imports app)
# ---------------------------------------------------------------
from deerflow.agents import make_lead_agent
from deerflow.models import create_chat_model
from deerflow.skills import load_skills
from deerflow.config.extensions_config import get_extensions_config
Example of App calling Harness: starting an agent in the Gateway:
# app/gateway/routers/chat.py
from deerflow.agents.lead_agent.agent import make_lead_agent
from deerflow.models import create_chat_model
from deerflow.config import get_app_config
async def create_chat_session(thread_id: str, model_name: str):
    config = get_app_config()
    model = create_chat_model(name=model_name)
    agent = make_lead_agent(config=...)
    # ... use the agent to handle user messages
Example of App calling Harness: querying skills in a Channel:
# app/channels/manager.py
from deerflow.skills import load_skills
from deerflow.agents.memory.updater import get_memory_data
def handle_status_command():
    skills = load_skills(enabled_only=True)
    memory = get_memory_data()
    return f"Skills: {len(skills)}, Memory facts: {len(memory.get('facts', []))}"
Forbidden direction: Harness code must never contain from app. or import app. statements.
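The forbidden direction can be checked mechanically. The following is a minimal sketch, assuming a helper named find_app_imports (a hypothetical name, not taken from the codebase) that parses a source string with the standard ast module and reports absolute imports of the app package. Parsing the syntax tree avoids false positives from strings or comments that merely mention app.

```python
import ast


def find_app_imports(source: str) -> list[str]:
    """Return every absolute import in `source` that targets the `app` package."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "app" or alias.name.startswith("app."):
                    violations.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            # level == 0 means an absolute import; relative imports stay inside harness.
            module = node.module or ""
            if node.level == 0 and (module == "app" or module.startswith("app.")):
                violations.append(f"from {module} import ...")
    return violations
```

A boundary test could call this on the text of every .py file under packages/harness/deerflow/ and fail when any list is non-empty.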
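3.3 Why App Is Not Packaged
| Aspect | Packaged (under packages/) | Not packaged (under backend/app/) |
|---|---|---|
| Namespace | Requires merging via pkgutil extend_path, or a separate prefix | Naturally separate: app.* vs deerflow.* |
| Publishing needs | None: App is project-internal code | No pyproject.toml needed |
| Complexity | Build, versioning, and dependency declarations must be managed for two packages | Runs directly, with no extra configuration |
| How to run | pip install deerflow-app | PYTHONPATH=. uvicorn app.gateway.app:app |
The only consumer of App is the DeerFlow project itself, so there is no need to publish it separately. It lives under backend/app/ as an ordinary Python package, and Python only needs to be able to find it through PYTHONPATH or an editable install.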
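To illustrate why setting PYTHONPATH is enough for an unpackaged directory, here is a small sketch that builds a throwaway package and shows it becoming resolvable once its parent directory is on sys.path (which is what PYTHONPATH=. does at interpreter startup). The package name demo_app_pkg and the helper can_import are hypothetical and used only for this demonstration.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def can_import(name: str) -> bool:
    """True if `name` resolves on the current sys.path (the module is not executed)."""
    importlib.invalidate_caches()
    return importlib.util.find_spec(name) is not None


with tempfile.TemporaryDirectory() as tmp:
    backend = Path(tmp) / "backend"
    (backend / "demo_app_pkg").mkdir(parents=True)
    (backend / "demo_app_pkg" / "__init__.py").write_text("")

    before = can_import("demo_app_pkg")   # parent directory not on sys.path yet
    sys.path.insert(0, str(backend))      # equivalent effect to PYTHONPATH=<backend>
    after = can_import("demo_app_pkg")
    sys.path.remove(str(backend))
```

No pyproject.toml or install step is involved: a directory with an __init__.py whose parent is on the path is all Python needs.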
3.4 Dependencies
┌─────────────────────────────────────┐
│  app/ (not packaged, run directly)  │
│  ├── fastapi, uvicorn               │
│  ├── slack-sdk, lark-oapi, ...      │
│  └── import deerflow.*              │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│  deerflow-harness (publishable)     │
│  ├── langgraph, langchain           │
│  ├── markitdown, pydantic, ...      │
│  └── zero app dependencies          │
└─────────────────────────────────────┘
Dependency classification:
| Category | Packages |
|---|---|
| Harness only | agent-sandbox, langchain*, langgraph*, markdownify, markitdown, pydantic, pyyaml, readabilipy, tavily-python, firecrawl-py, tiktoken, ddgs, duckdb, httpx, kubernetes, dotenv |
| App only | fastapi, uvicorn, sse-starlette, python-multipart, lark-oapi, slack-sdk, python-telegram-bot, markdown-to-mrkdwn |
| Shared | langgraph-sdk (channels use it as an HTTP client), pydantic, httpx |
3.5 Workspace Configuration
backend/pyproject.toml (workspace root):
[project]
name = "deer-flow"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["deerflow-harness"]
[dependency-groups]
dev = ["pytest>=8.0.0", "ruff>=0.14.11"]
# App's extra dependencies (fastapi, etc.) are also declared in the workspace root, because app is not packaged
app = ["fastapi", "uvicorn", "sse-starlette", "python-multipart"]
channels = ["lark-oapi", "slack-sdk", "python-telegram-bot"]
[tool.uv.workspace]
members = ["packages/harness"]
[tool.uv.sources]
deerflow-harness = { workspace = true }
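4. Current Cross-Layer Dependency Problems
Before the split, two reverse dependencies from harness to app in client.py must be resolved:
4.1 _validate_skill_frontmatter
# client.py: harness imports app-layer code
from src.gateway.routers.skills import _validate_skill_frontmatter
Solution: extract the function into deerflow/skills/validation.py. It is a pure-logic function (it parses YAML frontmatter and validates fields) and has nothing to do with FastAPI.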
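To show what "pure logic" means here, the following is a simplified sketch of a frontmatter validator with no web-framework imports. It is not the project's implementation: the required field names are assumed for illustration, and a tiny key: value parser stands in for a real YAML parser so the example runs with the standard library alone.

```python
import re

# Assumed for illustration; the real required fields are not listed in this document.
REQUIRED_FIELDS = ("name", "description")
_FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def validate_skill_frontmatter(text: str) -> tuple[bool, str]:
    """Return (ok, message) for the frontmatter block at the top of `text`."""
    match = _FRONTMATTER.match(text)
    if match is None:
        return False, "missing frontmatter block"
    fields = {}
    for line in match.group(1).splitlines():
        # Simplified: flat `key: value` pairs only; real code would use a YAML parser.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    missing = [f for f in REQUIRED_FIELDS if not fields.get(f)]
    if missing:
        return False, "missing fields: " + ", ".join(missing)
    return True, "ok"
```

Because the function takes a string and returns a plain tuple, both the harness client and a gateway router can call it, with the router translating a failure into an HTTP error on its own side.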
4.2 CONVERTIBLE_EXTENSIONS + convert_file_to_markdown
# client.py: harness imports app-layer code
from src.gateway.routers.uploads import CONVERTIBLE_EXTENSIONS, convert_file_to_markdown
Solution: extract them into deerflow/utils/file_conversion.py. They depend only on markitdown + pathlib and are general-purpose utilities.
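A rough sketch of how such a utility can stay independent of the HTTP layer is shown below. The extension set and the function signature are assumptions made for illustration (the real ones are not shown in this document), and the converter is passed in as a callable so the example runs without markitdown installed.

```python
from pathlib import Path
from typing import Callable, Optional

# Assumed extension set for illustration; the real CONVERTIBLE_EXTENSIONS is not shown here.
CONVERTIBLE_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx"}


def convert_file_to_markdown(
    path: Path, to_markdown: Callable[[Path], str]
) -> Optional[Path]:
    """Write a .md sibling for a convertible file and return its path, else None."""
    if path.suffix.lower() not in CONVERTIBLE_EXTENSIONS:
        return None
    md_path = path.with_suffix(".md")
    # `to_markdown` is a hypothetical hook; real code would wrap markitdown here.
    md_path.write_text(to_markdown(path), encoding="utf-8")
    return md_path
```

Nothing in this module knows about upload requests or responses, which is what allows it to sit in the harness and be imported by both client.py and the uploads router.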
5. Infrastructure Changes
5.1 LangGraph Server
LangGraph Server only needs the harness package. langgraph.json is updated to:
{
  "dependencies": ["./packages/harness"],
  "graphs": {
    "lead_agent": "deerflow.agents:make_lead_agent"
  },
  "checkpointer": {
    "path": "./packages/harness/deerflow/agents/checkpointer/async_provider.py:make_checkpointer"
  }
}
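A renamed module that no longer resolves is only noticed when the server starts, so a pre-flight check on the graph entry can be useful. The sketch below resolves a "module:attribute" string with importlib. It is not LangGraph's own loader, resolve_graph_spec is a hypothetical helper, and it covers only the dotted-module form used by the graphs entry, not the file-path form used by the checkpointer entry.

```python
import importlib
from typing import Any


def resolve_graph_spec(spec: str) -> Any:
    """Resolve a "package.module:attribute" string to the object it names."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    # Raises ModuleNotFoundError if the module path is stale (e.g. still src.*).
    module = importlib.import_module(module_name)
    # Raises AttributeError if the factory was renamed or moved.
    return getattr(module, attr)
```

With the harness installed, calling resolve_graph_spec("deerflow.agents:make_lead_agent") in a test would surface a broken path before deployment.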
5.2 Gateway API
# serve.sh / Makefile
# PYTHONPATH includes the backend/ root directory so that both app.* and deerflow.* can be found
PYTHONPATH=. uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001
5.3 Nginx
No changes needed (Nginx only does URL routing and is not affected by Python module paths).
5.4 Docker
Module references in the Dockerfile change from src. to deerflow. / app., and the COPY commands must cover the packages/ and app/ directories.
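6. Implementation Plan
The work is carried out incrementally in 3 PRs:
PR 1: Extract shared utility functions (Low Risk)
- Create src/skills/validation.py and extract _validate_skill_frontmatter from gateway/routers/skills.py
- Create src/utils/file_conversion.py and extract the file conversion logic from gateway/routers/uploads.py
- Update the imports in client.py, gateway/routers/skills.py, and gateway/routers/uploads.py
- Run the full test suite to confirm there are no regressions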
PR 2: Rename + physical split (High Risk, atomic operation)
- Create the packages/harness/ directory and its pyproject.toml
- git mv the harness modules from src/ into packages/harness/deerflow/
- git mv the app modules from src/ into app/
- Replace imports globally:
  - harness modules: src.* → deerflow.* (all .py files, langgraph.json, tests, docs)
  - app modules: src.gateway.* → app.gateway.*, src.channels.* → app.channels.*
- Update the workspace root pyproject.toml
- Update langgraph.json, Makefile, and Dockerfile
- Run uv sync + the full test suite + manually verify that the services start
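The global import replacement in PR 2 can be sketched as a small rewrite function. This is an illustrative variant rather than the script actually used: it is deliberately narrower than a bare \bsrc\. match, rewriting only import statements that start a line, and it does not cover comma-separated imports or module paths stored in config strings. The names rewrite_imports and APP_MODULES are hypothetical.

```python
import re

# Match `src.<top-level module>` only where it begins the module path of an import statement.
_IMPORT_SRC = re.compile(r"^(\s*(?:from|import)\s+)src\.(\w+)", re.MULTILINE)

# Top-level modules that move to the app layer; everything else goes to deerflow.
APP_MODULES = {"gateway", "channels"}


def rewrite_imports(source: str) -> str:
    """Rewrite src.* imports to app.* (gateway, channels) or deerflow.* (the rest)."""

    def repl(m: re.Match) -> str:
        prefix = "app" if m.group(2) in APP_MODULES else "deerflow"
        return f"{m.group(1)}{prefix}.{m.group(2)}"

    return _IMPORT_SRC.sub(repl, source)
```

Anchoring on the import keyword keeps unrelated text such as "my_src.cfg" untouched, at the cost of needing a separate pass for non-import references like langgraph.json.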
PR 3: Boundary check + docs (Low Risk)
- Add a lint rule that checks harness does not import app modules
- Update CLAUDE.md and README.md
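7. Risks and Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Global rename hits unintended text | src inside strings is replaced by mistake | Match precisely with the regex \bsrc\. and review the diff |
| LangGraph Server cannot find modules | Service fails to start | Point dependencies in langgraph.json at the correct harness package path |
| PYTHONPATH missing for App | Import errors when Gateway/Channel start | Set PYTHONPATH=. consistently in Makefile/Docker |
| use fields in config.yaml reference old paths | Module resolution fails at runtime | Update the use fields in config.yaml to deerflow.* in the same change |
| sys.path confusion in tests | Tests fail | Use an editable install (uv sync) so that deerflow is importable, and add app/ to sys.path in conftest.py |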
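8. Future Evolution
- Independent publishing: harness can be published to an internal PyPI so that other projects can simply pip install deerflow-harness
- Pluggable apps: different apps (web, CLI, bot) can each be independent while all depending on the same harness
- Finer-grained split: if the modules inside harness keep growing, it can be split further (e.g. deerflow-sandbox, deerflow-mcp)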