

DeerFlow Backend Split Design Document: Harness + App

Status: Draft  Author: DeerFlow Team  Date: 2026-03-13

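1. Background and Motivation

The DeerFlow backend is currently a single Python package (src.*) that holds everything from low-level agent orchestration to the user-facing product. As the project has grown, this structure has caused several problems:

Hard to reuse: other products (CLI tools, Slack bots, third-party integrations) that want the agent capabilities must depend on the entire backend, including dependencies they do not need such as FastAPI and the IM SDKs
Blurred responsibilities: agent orchestration logic and user-facing product logic share the same src/ tree, with no clear boundary
Dependency bloat: the LangGraph Server runtime does not need FastAPI, uvicorn, or the Slack SDK, yet today all dependencies must be installed

This document proposes splitting the backend into two parts: deerflow-harness (a publishable agent framework package) and app (unpackaged user-facing product code).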

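2. Core Concepts

2.1 Harness (framework layer)

Harness is the framework for building and orchestrating agents. It answers the question "how do we build and run an agent":

Agent factory and lifecycle management
Middleware pipeline
Tool system (built-in tools + MCP + community tools)
Sandboxed execution environment
Sub-agent delegation
Memory system
Skill loading and injection
Model factory
Configuration system

Harness is a publishable Python package (deerflow-harness) that can be installed and used on its own.

Design principle: Harness is completely unaware of the application layer above it. It neither knows nor cares who is calling it; the caller may be a web app, a CLI, a Slack bot, or a unit test.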

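2.2 App (application layer)

App is the user-facing product code. It answers the question "how do we present the agent to users":

Gateway API (FastAPI REST interface)
IM Channels (Feishu, Slack, Telegram integrations)
CRUD management for Custom Agents
HTTP endpoints for file upload/download

App is neither packaged nor published. It is application code internal to the DeerFlow project and is run directly.

App depends on Harness, but Harness does not depend on App.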

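2.3 Boundary

Module	Layer	Notes
`config/`	Harness	The configuration system is infrastructure
`reflection/`	Harness	Dynamic module loading utilities
`utils/`	Harness	General-purpose utility functions
`agents/`	Harness	Agent factory, middleware, state, memory
`subagents/`	Harness	Sub-agent delegation system
`sandbox/`	Harness	Sandboxed execution environment
`tools/`	Harness	Tool registration and discovery
`mcp/`	Harness	MCP protocol integration
`skills/`	Harness	Skill loading, parsing, and schema definition
`models/`	Harness	LLM model factory
`community/`	Harness	Community tools (tavily, jina, etc.)
`client.py`	Harness	Embedded Python client
`gateway/`	App	FastAPI REST API
`channels/`	App	IM platform integrations

On Custom Agents: the agent definition format (config.yaml + SOUL.md schema) is defined by config/agents_config.py in the Harness layer, while file storage, CRUD, and discovery are handled by gateway/routers/agents.py in the App layer.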

3. Target Architecture

3.1 Directory Layout

backend/
├── packages/
│   └── harness/
│       ├── pyproject.toml          # deerflow-harness package definition
│       └── deerflow/               # Python package root (import prefix: deerflow.*)
│           ├── __init__.py
│           ├── config/
│           ├── reflection/
│           ├── utils/
│           ├── agents/
│           │   ├── lead_agent/
│           │   ├── middlewares/
│           │   ├── memory/
│           │   ├── checkpointer/
│           │   └── thread_state.py
│           ├── subagents/
│           ├── sandbox/
│           ├── tools/
│           ├── mcp/
│           ├── skills/
│           ├── models/
│           ├── community/
│           └── client.py
├── app/                            # Not packaged (import prefix: app.*)
│   ├── __init__.py
│   ├── gateway/
│   │   ├── __init__.py
│   │   ├── app.py
│   │   ├── config.py
│   │   ├── path_utils.py
│   │   └── routers/
│   └── channels/
│       ├── __init__.py
│       ├── base.py
│       ├── manager.py
│       ├── service.py
│       ├── store.py
│       ├── message_bus.py
│       ├── feishu.py
│       ├── slack.py
│       └── telegram.py
├── pyproject.toml                  # uv workspace root
├── langgraph.json
├── tests/
├── docs/
└── Makefile

3.2 Import Rules

The two layers use different import prefixes, which makes the responsibility boundary obvious at a glance:

# ---------------------------------------------------------------
# Imports within Harness (deerflow.* prefix)
# ---------------------------------------------------------------
from deerflow.agents import make_lead_agent
from deerflow.models import create_chat_model
from deerflow.config import get_app_config
from deerflow.tools import get_available_tools

# ---------------------------------------------------------------
# Imports within App (app.* prefix)
# ---------------------------------------------------------------
from app.gateway.app import app
from app.gateway.routers.uploads import upload_files
from app.channels.service import start_channel_service

# ---------------------------------------------------------------
# App calling Harness (one-way dependency; Harness never imports app)
# ---------------------------------------------------------------
from deerflow.agents import make_lead_agent
from deerflow.models import create_chat_model
from deerflow.skills import load_skills
from deerflow.config.extensions_config import get_extensions_config

Example of App calling Harness: starting an agent in the Gateway:

# app/gateway/routers/chat.py
from deerflow.agents.lead_agent.agent import make_lead_agent
from deerflow.models import create_chat_model
from deerflow.config import get_app_config

async def create_chat_session(thread_id: str, model_name: str):
    config = get_app_config()
    model = create_chat_model(name=model_name)
    agent = make_lead_agent(config=...)
    # ... use agent to handle the user's message

Example of App calling Harness: querying skills in a Channel:

# app/channels/manager.py
from deerflow.skills import load_skills
from deerflow.agents.memory.updater import get_memory_data

def handle_status_command():
    skills = load_skills(enabled_only=True)
    memory = get_memory_data()
    return f"Skills: {len(skills)}, Memory facts: {len(memory.get('facts', []))}"

Forbidden direction: `from app.` or `import app.` must never appear in Harness code.
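The rule above can be checked mechanically. The following is a minimal sketch, assuming an AST-based scan. The function name `find_app_imports` is hypothetical and is not taken from the project.

```python
import ast


def find_app_imports(source: str) -> list[str]:
    """Return the app-layer modules imported by a piece of harness source code."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import app.x` and `import a, app.y`
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # `from app.x import y`; relative imports (level > 0) are skipped
            names = [node.module or ""]
        else:
            continue
        offenders += [n for n in names if n == "app" or n.startswith("app.")]
    return offenders
```

A test could call this on every `*.py` file under `packages/harness/deerflow/` and fail if any call returns a non-empty list. Parsing with `ast` rather than matching text avoids false positives from comments and string literals.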

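3.3 Why App Is Not Packaged

Aspect	Packaged (under packages/)	Not packaged (under backend/app/)
Namespace	Needs pkgutil `extend_path` merging, or a separate prefix	Naturally separate: `app.` vs `deerflow.`
Publishing need	None; App is project-internal code	No pyproject.toml needed
Complexity	Builds, versions, and dependency declarations for two packages must be managed	Run directly, no extra configuration
How to run	`pip install deerflow-app`	`PYTHONPATH=. uvicorn app.gateway.app:app`

The only consumer of App is the DeerFlow project itself, so it has no need for independent release. It lives under backend/app/ as an ordinary Python package; Python only needs to find it via PYTHONPATH or an editable install.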

3.4 Dependencies

┌─────────────────────────────────────┐
│  app/  (not packaged, run directly) │
│  ├── fastapi, uvicorn               │
│  ├── slack-sdk, lark-oapi, ...      │
│  └── import deerflow.*              │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│  deerflow-harness  (publishable)    │
│  ├── langgraph, langchain           │
│  ├── markitdown, pydantic, ...      │
│  └── zero app dependencies          │
└─────────────────────────────────────┘

Dependency classification:

Category	Packages
Harness only	agent-sandbox, langchain, langgraph, markdownify, markitdown, pyyaml, readabilipy, tavily-python, firecrawl-py, tiktoken, ddgs, duckdb, kubernetes, dotenv
App only	fastapi, uvicorn, sse-starlette, python-multipart, lark-oapi, slack-sdk, python-telegram-bot, markdown-to-mrkdwn
Shared	langgraph-sdk (channels use it as an HTTP client), pydantic, httpx

3.5 Workspace Configuration

backend/pyproject.toml (workspace root):

[project]
name = "deer-flow"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["deerflow-harness"]

[dependency-groups]
dev = ["pytest>=8.0.0", "ruff>=0.14.11"]
# App's extra dependencies (fastapi, etc.) are also declared at the workspace root, because app is not packaged
app = ["fastapi", "uvicorn", "sse-starlette", "python-multipart"]
channels = ["lark-oapi", "slack-sdk", "python-telegram-bot"]

[tool.uv.workspace]
members = ["packages/harness"]

[tool.uv.sources]
deerflow-harness = { workspace = true }

4. Current Cross-Layer Dependencies

Before the split, two reverse dependencies from harness to app in client.py must be resolved:

4.1 `_validate_skill_frontmatter`

# client.py: harness imports app-layer code
from src.gateway.routers.skills import _validate_skill_frontmatter

Fix: extract the function into deerflow/skills/validation.py. It is pure logic (it parses YAML frontmatter and validates fields) and has nothing to do with FastAPI.
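To illustrate what "pure logic" means here, the sketch below validates a frontmatter block using only the standard library. It is a stand-in, not the project's `_validate_skill_frontmatter`. The function name, the required fields, and the simplified `key: value` parsing (instead of a real YAML parser) are all assumptions.

```python
REQUIRED_FIELDS = ("name", "description")  # hypothetical; the real schema may differ


def validate_frontmatter_sketch(text: str) -> tuple[bool, str]:
    """Check that `text` opens with a `---` delimited block holding the required keys."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False, "missing opening '---'"
    try:
        # Index of the first closing delimiter after the opening line.
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return False, "missing closing '---'"
    fields = {}
    for line in lines[1:end]:
        # Only top-level `key: value` pairs; nested YAML is ignored in this sketch.
        if ":" in line and not line.startswith((" ", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    missing = [f for f in REQUIRED_FIELDS if not fields.get(f)]
    if missing:
        return False, "missing fields: " + ", ".join(missing)
    return True, "ok"
```

Nothing in this function touches HTTP requests or routers, so both the gateway router and client.py can import it from the harness side.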

4.2 `CONVERTIBLE_EXTENSIONS` + `convert_file_to_markdown`

# client.py: harness imports app-layer code
from src.gateway.routers.uploads import CONVERTIBLE_EXTENSIONS, convert_file_to_markdown

Fix: extract them into deerflow/utils/file_conversion.py. They depend only on markitdown and pathlib and are general-purpose utilities.
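A rough sketch of what such a utility module might look like follows. The extension set, the `is_convertible` helper, and the function signature are assumptions; the project's actual `convert_file_to_markdown` may differ (for example, it may be async). The `MarkItDown().convert(...)` call and the `text_content` attribute follow markitdown's documented usage.

```python
from pathlib import Path
from typing import Optional

# Hypothetical set; the real CONVERTIBLE_EXTENSIONS may list different suffixes.
CONVERTIBLE_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx"}


def is_convertible(path: Path) -> bool:
    """Case-insensitive suffix check against the supported set."""
    return path.suffix.lower() in CONVERTIBLE_EXTENSIONS


def convert_file_to_markdown(path: Path) -> Optional[Path]:
    """Write `<name>.md` next to `path` and return it, or None if unsupported."""
    if not is_convertible(path):
        return None
    # Imported lazily so this module can be loaded without markitdown installed.
    from markitdown import MarkItDown

    result = MarkItDown().convert(str(path))
    target = path.with_suffix(".md")
    target.write_text(result.text_content, encoding="utf-8")
    return target
```

Because the module imports nothing from FastAPI or the gateway, the upload router can call it from the app side without creating a reverse dependency.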

5. Infrastructure Changes

5.1 LangGraph Server

LangGraph Server needs only the harness package. Updated langgraph.json:

{
  "dependencies": ["./packages/harness"],
  "graphs": {
    "lead_agent": "deerflow.agents:make_lead_agent"
  },
  "checkpointer": {
    "path": "./packages/harness/deerflow/agents/checkpointer/async_provider.py:make_checkpointer"
  }
}

5.2 Gateway API

# serve.sh / Makefile
# PYTHONPATH includes the backend/ root so that app.* resolves;
# deerflow.* resolves through the workspace install of deerflow-harness
PYTHONPATH=. uvicorn app.gateway.app:app --host 0.0.0.0 --port 8001

5.3 Nginx

No change needed (it only routes URLs and does not touch Python module paths).

5.4 Docker

Module references in the Dockerfile change from src. to deerflow. / app., and the COPY commands must cover the packages/ and app/ directories.

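6. Implementation Plan

Carried out incrementally in 3 PRs:

PR 1: Extract shared utility functions (Low Risk)

Create src/skills/validation.py and move _validate_skill_frontmatter into it from gateway/routers/skills.py
Create src/utils/file_conversion.py and move the file conversion logic into it from gateway/routers/uploads.py
Update the imports in client.py, gateway/routers/skills.py, and gateway/routers/uploads.py
Run the full test suite to confirm there are no regressions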

PR 2: Rename + physical split (High Risk, atomic)

Create the packages/harness/ directory and its pyproject.toml
git mv the harness modules from src/ into packages/harness/deerflow/
git mv the app modules from src/ into app/
Globally replace imports:
- harness modules: src.* → deerflow.* (all .py files, langgraph.json, tests, docs)
- app modules: src.gateway.* → app.gateway.*, src.channels.* → app.channels.*
Update the workspace root pyproject.toml
Update langgraph.json, Makefile, Dockerfile
uv sync + full test suite + manually verify that the services start
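The global import replacement in PR 2 could be scripted along these lines. This is a sketch, not the project's migration tooling: `rewrite_imports` is a hypothetical name, and the pattern is a stricter variant of `\bsrc\.` that also refuses to match when `src.` follows a dot or a word character.

```python
import re

# `src.` not preceded by a word character or a dot, so `my_src.x` and
# `pkg.src.x` are left alone. The first path segment decides the new prefix.
SRC_PREFIX = re.compile(r"(?<![\w.])src\.(\w+)")
APP_MODULES = {"gateway", "channels"}


def rewrite_imports(text: str) -> str:
    """Map src.gateway/src.channels to app.*, and every other src.* to deerflow.*."""

    def repl(match: re.Match) -> str:
        top = match.group(1)
        prefix = "app" if top in APP_MODULES else "deerflow"
        return f"{prefix}.{top}"

    return SRC_PREFIX.sub(repl, text)
```

The sketch does not handle the `from src import config` form, and it will still rewrite `src.` inside string literals, so reviewing the resulting diff remains necessary.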

PR 3: Boundary check + docs (Low Risk)

Add a lint rule that checks harness does not import app modules
Update CLAUDE.md and README.md

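7. Risks and Mitigations

Risk	Impact	Mitigation
Global rename hits unintended text	`src` inside strings is replaced by mistake	Match precisely with the regex `\bsrc\.` and review the diff
LangGraph Server cannot find the module	Service fails to start	Point `dependencies` in `langgraph.json` at the correct harness package path
`PYTHONPATH` missing for App	Gateway/Channel fails at startup with import errors	Set `PYTHONPATH=.` uniformly in the Makefile and Docker
`use` fields in `config.yaml` reference old paths	Module resolution fails at runtime	Update the `use` fields in `config.yaml` to `deerflow.*` in the same change
`sys.path` confusion in tests	Tests fail	Use an editable install (`uv sync`) so deerflow is importable, and add the `backend/` root (the parent of `app/`) to `sys.path` in `conftest.py`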
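For `import app.*` to resolve in tests, the directory that contains `app/` has to be on `sys.path`. A minimal helper that a `conftest.py` could use is sketched below. The name `ensure_importable` is hypothetical, and the usage comment assumes `tests/` sits directly under `backend/`, as in the directory layout in section 3.1.

```python
import sys
from pathlib import Path


def ensure_importable(root: Path) -> None:
    """Put `root` at the front of sys.path once, so `import app` finds `root/app`."""
    entry = str(root.resolve())
    if entry not in sys.path:
        sys.path.insert(0, entry)


# In backend/tests/conftest.py this might be called as:
#     ensure_importable(Path(__file__).resolve().parent.parent)
```

The membership check keeps repeated calls from stacking duplicate entries on `sys.path`.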

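8. Future Evolution

Independent release: harness can be published to an internal PyPI so that other projects can simply pip install deerflow-harness
Pluggable apps: different apps (web, CLI, bot) can each stand alone while depending on the same harness
Finer-grained split: if harness modules keep growing, they can be split further (for example deerflow-sandbox, deerflow-mcp)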
