mirror of https://gitee.com/wanwujie/deer-flow synced 2026-04-02 22:02:13 +08:00


DanielWalnut 76803b826f refactor: split backend into harness (deerflow.*) and app (app.*) (#1131 )

* refactor: extract shared utils to break harness→app cross-layer imports

Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: split backend/src into harness (deerflow.*) and app (app.*)

Physically split the monolithic backend/src/ package into two layers:

- **Harness** (`packages/harness/deerflow/`): publishable agent framework
  package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
  models, MCP, skills, config, and all core infrastructure.

- **App** (`app/`): unpublished application code with import prefix `app.*`.
  Contains gateway (FastAPI REST API) and channels (IM integrations).

Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure

Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add harness→app boundary check test and update docs

Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.

Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add config versioning with auto-upgrade on startup

When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.

- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix comments

* fix: update src.* import in test_sandbox_tools_security to deerflow.*

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle empty config and search parent dirs for config.example.yaml

Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
  looking next to config.yaml, fixing detection in common setups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct skills root path depth and config_version type coercion

- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
  after harness split, file lives at packages/harness/deerflow/skills/
  so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
  _check_config_version() to prevent TypeError when YAML stores value
  as string (e.g. config_version: "1")
- tests: add regression tests for both fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update test imports from src.* to deerflow.*/app.* after harness refactor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-14 22:55:52 +08:00

7.5 KiB


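Automatic Thread Title Generation

Overview

Automatically generates a title for a conversation thread. Generation is triggered once the user has asked their first question and received a reply.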

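Implementation

TitleMiddleware does the following in its after_agent hook:

Checks whether this is the first exchange (1 user message + 1 assistant reply)
Checks whether the state already has a title
Calls the LLM to generate a concise title (at most 6 words by default)
Stores the title in ThreadState (persisted by the checkpointer)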

⚠️ Important: Storage Mechanism

Where the Title Is Stored

The title is stored in ThreadState.title, not in the thread metadata:

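from typing import NotRequired
from langchain.agents import AgentState

class ThreadState(AgentState):  # a TypedDict: optional keys use NotRequired, not "= None"
    sandbox: NotRequired[SandboxState | None]
    title: NotRequired[str | None]  # ✅ Title stored here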

Persistence

Deployment	Persisted	Notes
LangGraph Studio (local)	❌ No	In-memory only; lost on restart
LangGraph Platform	✅ Yes	Automatically persisted to a database
Custom deployment + checkpointer	✅ Yes	Requires a PostgreSQL/SQLite checkpointer

Enabling Persistence

To persist titles during local development as well, configure a checkpointer:

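# Create checkpointer.py in the same directory as langgraph.json
from langgraph.checkpoint.postgres import PostgresSaver
from psycopg import Connection
from psycopg.rows import dict_row

# from_conn_string() is a context manager, so hold a connection open instead
conn = Connection.connect(
    "postgresql://user:pass@localhost/dbname",
    autocommit=True, prepare_threshold=0, row_factory=dict_row,
)
checkpointer = PostgresSaver(conn)
checkpointer.setup()  # creates the checkpoint tables if they do not exist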

Then reference it in langgraph.json:

{
  "graphs": {
    "lead_agent": "deerflow.agents:lead_agent"
  },
  "checkpointer": "checkpointer:checkpointer"
}

Configuration

Add the following to config.yaml (optional):

title:
  enabled: true
  max_words: 6
  max_chars: 60
  model_name: null  # use the default model

Or configure it in code:

from deerflow.config.title_config import TitleConfig, set_title_config

set_title_config(TitleConfig(
    enabled=True,
    max_words=8,
    max_chars=80,
))
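This section does not spell out how `max_words` and `max_chars` are applied to the model's output. One plausible post-processing step is sketched below, assuming words are whitespace-separated and the character limit is applied after the word limit; `clamp_title` is a hypothetical helper, not part of `deerflow.config.title_config`.

```python
def clamp_title(raw: str, max_words: int = 6, max_chars: int = 60) -> str:
    """Hypothetical helper: trim an LLM-generated title to the configured limits."""
    # Drop surrounding whitespace and the quotes models often wrap titles in
    title = raw.strip().strip("\"'").strip()
    # Enforce the word limit (assumes whitespace-separated words)
    title = " ".join(title.split()[:max_words])
    # Enforce the character limit
    if len(title) > max_chars:
        cut = title[:max_chars]
        # Keep the cut if it ends on a word boundary; otherwise drop the partial word
        if title[max_chars] != " " and " " in cut:
            cut = cut.rsplit(" ", 1)[0]
        title = cut.rstrip()
    return title


print(clamp_title('"Configuring Thread Title Generation in DeerFlow Backend"'))
# → Configuring Thread Title Generation in DeerFlow
print(clamp_title("Configuring Thread Title Generation", max_chars=20))
# → Configuring Thread
```

Whitespace splitting does not count words in scripts such as Chinese, so for those titles only the character cap takes effect.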

Client Usage

Getting the Thread Title

// Option 1: read it from the thread state
const state = await client.threads.getState(threadId);
const title = state.values.title || "New Conversation";

// Option 2: listen for stream events
for await (const chunk of client.runs.stream(threadId, assistantId, {
  input: { messages: [{ role: "user", content: "Hello" }] },
  streamMode: "values", // each "values" event carries the full state, including title
})) {
  if (chunk.event === "values" && chunk.data.title) {
    console.log("Title:", chunk.data.title);
  }
}

Displaying the Title

import { useEffect, useState } from "react";

// Show titles in the conversation list
// (`client` is a LangGraph SDK Client instance from @langchain/langgraph-sdk)
function ConversationList() {
  const [threads, setThreads] = useState([]);

  useEffect(() => {
    async function loadThreads() {
      // The LangGraph SDK lists threads via threads.search()
      const allThreads = await client.threads.search();

      // Fetch each thread's state to read its title
      const threadsWithTitles = await Promise.all(
        allThreads.map(async (t) => {
          const state = await client.threads.getState(t.thread_id);
          return {
            id: t.thread_id,
            title: state.values.title || "New Conversation",
            updatedAt: t.updated_at,
          };
        })
      );

      setThreads(threadsWithTitles);
    }
    loadThreads();
  }, []);

  return (
    <ul>
      {threads.map(thread => (
        <li key={thread.id}>
          <a href={`/chat/${thread.id}`}>{thread.title}</a>
        </li>
      ))}
    </ul>
  );
}

Workflow

sequenceDiagram
    participant User
    participant Client
    participant LangGraph
    participant Agent
    participant TitleMiddleware
    participant LLM
    participant Checkpointer

    User->>Client: Send first message
    Client->>LangGraph: POST /threads/{id}/runs
    LangGraph->>Agent: Process message
    Agent-->>LangGraph: Return reply
    LangGraph->>TitleMiddleware: after_agent()
    TitleMiddleware->>TitleMiddleware: Check whether a title is needed
    TitleMiddleware->>LLM: Generate title
    LLM-->>TitleMiddleware: Return title
    TitleMiddleware->>LangGraph: return {"title": "..."}
    LangGraph->>Checkpointer: Save state (including title)
    LangGraph-->>Client: Return response
    Client->>Client: Read state.values.title

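Benefits

✅ Reliable persistence - uses LangGraph's state mechanism, so the title is persisted automatically wherever a checkpointer is configured
✅ Fully backend-driven - clients need no extra logic beyond reading the title
✅ Automatic - generated after the first exchange without any client request
✅ Configurable - length limits, model and more can be customized
✅ Fault-tolerant - falls back to a simpler strategy if generation fails
✅ Consistent architecture - follows the same pattern as the existing SandboxMiddleware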

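Caveats

Different read location: the title lives in state.values.title, not thread.metadata.title
Performance: title generation adds roughly 0.5 to 1 second of latency; a faster model can reduce this
Concurrency: the middleware runs only after the agent has finished, so it does not interfere with the agent's own execution; the run as a whole completes once the title has been generated
Fallback strategy: if the LLM call fails, the first few words of the user message are used as the title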
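The fallback described in the last point can be sketched as a wrapper around the LLM call. This is an illustration under assumptions: `generate` stands in for whatever callable performs the LLM request, and taking the first 6 words (matching the default `max_words`) with a 60-character cap is a guess at "the first few words", not confirmed behavior.

```python
from collections.abc import Callable


def fallback_title(user_message: str, max_words: int = 6, max_chars: int = 60) -> str:
    """Hypothetical fallback: use the first few words of the user's message."""
    title = " ".join(user_message.split()[:max_words])[:max_chars].rstrip()
    return title or "New Conversation"


def title_with_fallback(generate: Callable[[str], str], user_message: str) -> str:
    """Call the (hypothetical) LLM-backed generator; fall back on any error or empty result."""
    try:
        title = generate(user_message).strip()
    except Exception:
        # An LLM or network failure should not break the run, so degrade gracefully
        return fallback_title(user_message)
    return title or fallback_title(user_message)


def failing_llm(_: str) -> str:
    raise TimeoutError("model did not respond")


print(title_with_fallback(failing_llm, "Please explain how LangGraph checkpointers persist thread state"))
# → Please explain how LangGraph checkpointers persist
```

Catching a broad `Exception` here is deliberate: a title is cosmetic, so any failure should degrade to the fallback rather than surface as a run error.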

Testing

# Test title generation
import pytest
from deerflow.agents.title_middleware import TitleMiddleware

def test_title_generation():
    # TODO: add unit tests
    pytest.skip("TitleMiddleware unit tests not written yet")

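Troubleshooting

Title is not generated

Check that the feature is enabled: get_title_config().enabled == True
Check the server output: look for "Generated thread title" (printed by the middleware) or error messages
Confirm it is the first exchange: generation only triggers when there is exactly 1 user message and 1 assistant reply

Title is generated but the client cannot see it

Check where you read it from: use state.values.title, not thread.metadata.title
Check the API response: confirm the state includes a title field
Try fetching the state again: client.threads.getState(threadId)

Title is lost after a restart

Check whether a checkpointer is configured (required for local development)
Check the deployment type: LangGraph Platform persists automatically
Inspect the database: confirm the checkpointer is writing checkpoints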

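Architecture

Why State Instead of Metadata?

Property	State	Metadata
Persistence	✅ Automatic (via the checkpointer)	⚠️ Implementation-dependent
Versioning	✅ Supports time travel	❌ Not supported
Type safety	✅ Defined as a TypedDict	❌ Arbitrary dict
Traceability	✅ Every update is recorded	⚠️ Latest value only
Standardization	✅ Core LangGraph mechanism	⚠️ Extension feature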

Implementation Details

# Core logic of TitleMiddleware (method excerpt; helper methods omitted)
from typing import override  # Python 3.12+

from langgraph.runtime import Runtime

@override
def after_agent(self, state: TitleMiddlewareState, runtime: Runtime) -> dict | None:
    """Generate and set thread title after the first agent response."""
    if self._should_generate_title(state, runtime):
        title = self._generate_title(runtime)
        print(f"Generated thread title: {title}")

        # ✅ Return a state update; the checkpointer persists it automatically
        return {"title": title}

    return None
