deer-flow/backend/docs/AUTO_TITLE_GENERATION.md
DanielWalnut 76803b826f refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)
2026-03-14 22:55:52 +08:00


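Automatic Thread Title Generation

Overview

Automatically generates a title for a conversation thread. Generation is triggered once the user's first question has received a reply.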

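Implementation

TitleMiddleware does this in its after_agent hook:

  1. Checks whether this is the first exchange (1 user message + 1 assistant reply)
  2. Checks whether the state already has a title
  3. Calls the LLM to generate a concise title (at most 6 words by default)
  4. Stores the title in ThreadState (persisted by the checkpointer)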
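The four steps above can be expressed as a small pure function. This is a minimal sketch, not the actual TitleMiddleware code. It assumes messages are plain dicts shaped like serialized LangChain messages (`type` is `"human"` or `"ai"`), and `generate` is a hypothetical stand-in for the LLM call. `is_first_exchange` and `after_agent_sketch` are illustrative names.

```python
from typing import Callable, Optional


def is_first_exchange(messages: list[dict]) -> bool:
    """Step 1: exactly one human message and one AI reply so far."""
    humans = sum(1 for m in messages if m.get("type") == "human")
    ais = sum(1 for m in messages if m.get("type") == "ai")
    return humans == 1 and ais == 1


def after_agent_sketch(
    state: dict,
    generate: Callable[[str, str], str],
    max_words: int = 6,
) -> Optional[dict]:
    """Return a partial state update {"title": ...}, or None to leave state as-is."""
    messages = state.get("messages", [])
    if not is_first_exchange(messages):
        return None
    if state.get("title"):  # Step 2: never overwrite an existing title
        return None
    question = next(m["content"] for m in messages if m.get("type") == "human")
    answer = next(m["content"] for m in messages if m.get("type") == "ai")
    # Step 3: call the stand-in LLM, then enforce the word limit locally
    title = " ".join(generate(question, answer).split()[:max_words])
    # Step 4: the runtime merges this dict into ThreadState; a checkpointer persists it
    return {"title": title}
```

Injecting `generate` means the decision logic can be tested without a model. The middleware described here calls its configured LLM at step 3 instead.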

⚠️ Important: Storage Mechanism

Where the Title Is Stored

The title is stored in ThreadState.title, not in the thread metadata.

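```python
from typing import NotRequired  # Python 3.11+; use typing_extensions on older versions

# AgentState is a TypedDict, so optional fields are NotRequired rather than given defaults
class ThreadState(AgentState):
    sandbox: NotRequired[SandboxState | None]
    title: NotRequired[str | None]  # ✅ Title stored here
```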

Persistence

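| Deployment | Persisted | Notes |
|---|---|---|
| LangGraph Studio (local) | No | In-memory only; lost on restart |
| LangGraph Platform | Yes | Persisted to a database automatically |
| Custom + checkpointer | Yes | Requires a PostgreSQL/SQLite checkpointer |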

Enabling Persistence

To persist the title during local development as well, configure a checkpointer:

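```python
# checkpointer.py, created in the same directory as langgraph.json
from langgraph.checkpoint.postgres import PostgresSaver
from psycopg import Connection
from psycopg.rows import dict_row

# from_conn_string() is a context manager, so keep a long-lived connection instead
conn = Connection.connect("postgresql://user:pass@localhost/dbname",
                          autocommit=True, row_factory=dict_row)
checkpointer = PostgresSaver(conn)
checkpointer.setup()  # creates the checkpoint tables on first run
```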

Then reference it in langgraph.json:

```json
{
  "graphs": {
    "lead_agent": "deerflow.agents:lead_agent"
  },
  "checkpointer": "checkpointer:checkpointer"
}
```

Configuration

Add the following to config.yaml (optional):

```yaml
title:
  enabled: true
  max_words: 6
  max_chars: 60
  model_name: null  # use the default model
```

Or configure it in code:

```python
from deerflow.config.title_config import TitleConfig, set_title_config

set_title_config(TitleConfig(
    enabled=True,
    max_words=8,
    max_chars=80,
))
```
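Both forms set the same four options. The following is a hedged sketch of how the `title` section could be read and validated after the YAML has been parsed into a dict (for example with `yaml.safe_load`). `TitleSettings` is a hypothetical class, not `deerflow.config.title_config.TitleConfig`. Its defaults are assumed to mirror the YAML example above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class TitleSettings:
    enabled: bool = True
    max_words: int = 6
    max_chars: int = 60
    model_name: Optional[str] = None  # None means "use the default model"

    @classmethod
    def from_config(cls, raw: Optional[dict[str, Any]]) -> "TitleSettings":
        """Build settings from the parsed top-level config; the section is optional."""
        # An empty file or a bare "title:" key both parse to None
        section = (raw or {}).get("title") or {}
        unknown = set(section) - {"enabled", "max_words", "max_chars", "model_name"}
        if unknown:
            raise ValueError(f"unknown title options: {sorted(unknown)}")
        settings = cls(**section)
        if settings.max_words < 1 or settings.max_chars < 1:
            raise ValueError("max_words and max_chars must be positive")
        return settings
```

Rejecting unknown keys catches typos such as `max_word`, which would otherwise be ignored without any warning.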

Client Usage

Reading the Thread Title

```typescript
// Option 1: read it from the thread state
const state = await client.threads.getState(threadId);
const title = state.values.title || "New Conversation";

// Option 2: watch stream events
for await (const chunk of client.runs.stream(threadId, assistantId, {
  input: { messages: [{ role: "user", content: "Hello" }] }
})) {
  if (chunk.event === "values" && chunk.data.title) {
    console.log("Title:", chunk.data.title);
  }
}
```

Displaying the Title

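```jsx
import { useEffect, useState } from "react";
import { Client } from "@langchain/langgraph-sdk";

// Assumption: point apiUrl at your own LangGraph server
const client = new Client({ apiUrl: "http://localhost:2024" });

// Show titles in the conversation list
function ConversationList() {
  const [threads, setThreads] = useState([]);

  useEffect(() => {
    async function loadThreads() {
      // The JS SDK lists threads with threads.search()
      const allThreads = await client.threads.search();

      // Fetch each thread's state to read its title
      const threadsWithTitles = await Promise.all(
        allThreads.map(async (t) => {
          const state = await client.threads.getState(t.thread_id);
          return {
            id: t.thread_id,
            title: state.values?.title || "New Conversation",
            updatedAt: t.updated_at,
          };
        })
      );

      setThreads(threadsWithTitles);
    }
    loadThreads();
  }, []);

  return (
    <ul>
      {threads.map(thread => (
        <li key={thread.id}>
          <a href={`/chat/${thread.id}`}>{thread.title}</a>
        </li>
      ))}
    </ul>
  );
}
```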
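The component above makes one state request per thread. The LangGraph Platform thread schema includes a `values` field holding the latest state. If your server returns that field, the list can be built from the thread-search results alone (`threads.search` in the SDKs).

The following is a minimal Python sketch of that mapping. It assumes each thread is a dict with `thread_id`, `updated_at` (ISO-8601 string), and `values` keys. `to_list_items` is a hypothetical helper.

```python
from typing import Any

FALLBACK_TITLE = "New Conversation"


def to_list_items(threads: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Map thread payloads to sidebar items, newest first.

    Reads the title from thread["values"]["title"] (the thread state),
    never from thread["metadata"], because that is where the title is stored.
    """
    items = []
    for t in threads:
        values = t.get("values") or {}  # may be missing or None before the first run
        items.append({
            "id": t["thread_id"],
            "title": values.get("title") or FALLBACK_TITLE,
            "updated_at": t.get("updated_at", ""),
        })
    # ISO-8601 timestamps in one consistent format sort correctly as strings
    items.sort(key=lambda item: item["updated_at"], reverse=True)
    return items
```

With the Python SDK (`langgraph_sdk`), the input could come from `await get_client(url=...).threads.search()`. The URL depends on your deployment.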

Workflow

```mermaid
sequenceDiagram
    participant User
    participant Client
    participant LangGraph
    participant Agent
    participant TitleMiddleware
    participant LLM
    participant Checkpointer

    User->>Client: Send first message
    Client->>LangGraph: POST /threads/{id}/runs
    LangGraph->>Agent: Process message
    Agent-->>LangGraph: Return reply
    LangGraph->>TitleMiddleware: after_agent()
    TitleMiddleware->>TitleMiddleware: Check whether a title is needed
    TitleMiddleware->>LLM: Generate title
    LLM-->>TitleMiddleware: Return title
    TitleMiddleware->>LangGraph: return {"title": "..."}
    LangGraph->>Checkpointer: Save state (including title)
    LangGraph-->>Client: Return response
    Client->>Client: Read state.values.title
```

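Advantages

Reliable persistence - uses LangGraph's state mechanism, so the title is persisted automatically wherever a checkpointer is configured
Handled entirely on the backend - no extra client logic required
Automatic triggering - generated after the first exchange
Configurable - length, model, and other settings can be customized
Fault tolerant - falls back to a simpler strategy on failure
Architecturally consistent - follows the same pattern as the existing SandboxMiddleware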

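Notes

  1. Different read location: the title is in state.values.title, not thread.metadata.title
  2. Performance: title generation adds roughly 0.5-1 s of latency; a faster model reduces this
  3. Concurrency: the middleware runs after the agent has finished, so it does not block the main flow
  4. Fallback: if the LLM call fails, the first few words of the user's message are used as the title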
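Note 4 describes the fallback. Below is a minimal sketch of such a fallback that also applies the `max_words`/`max_chars` limits from the configuration section. The helper name `fallback_title` is hypothetical. The truncation rule is an assumption, not the middleware's documented behavior: whole words first, then a character cut with an ellipsis.

```python
def fallback_title(user_message: str, max_words: int = 6, max_chars: int = 60) -> str:
    """Derive a title from the first words of the user's message."""
    words = user_message.split()  # also collapses newlines and repeated spaces
    if not words:
        return "New Conversation"
    title = " ".join(words[:max_words])
    if len(title) > max_chars:
        # Cut to max_chars total, reserving one character for the ellipsis
        title = title[: max_chars - 1].rstrip() + "…"
    return title
```

Applying the word limit before the character limit keeps short titles intact. The character cut only applies to unusually long words.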

Testing

```python
# Title generation test
import pytest
from deerflow.agents.title_middleware import TitleMiddleware

def test_title_generation():
    # TODO: add unit tests
    pass
```

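Troubleshooting

Title Is Not Generated

  1. Check that the feature is enabled: get_title_config().enabled == True
  2. Check the logs: look for "Generated thread title" or error messages
  3. Confirm it is the first exchange: generation triggers only when there is exactly 1 user message and 1 assistant reply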
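Checks 1 and 3 can be automated against a state fetched with getState. The following is a minimal diagnostic sketch. It assumes `state` is the thread's `values` dict, with messages serialized as LangChain message dicts (`type` is `"human"` or `"ai"`), and that `enabled` is read from your title config. `why_no_title` is hypothetical and restates the trigger rule described here, not the middleware's exact code.

```python
from collections import Counter
from typing import Optional


def why_no_title(state: dict, enabled: bool) -> Optional[str]:
    """Return why no title would be generated, or None if it should be."""
    if not enabled:
        return "title generation is disabled in config"
    if state.get("title"):
        return "thread already has a title"
    # Count serialized LangChain messages by type ("human", "ai", "tool", ...)
    counts = Counter(m.get("type") for m in state.get("messages") or [])
    if counts["human"] != 1 or counts["ai"] != 1:
        return (f"not the first exchange: {counts['human']} human / "
                f"{counts['ai']} ai messages")
    return None
```

Under the rule as stated, a first turn that used tools (and so contains several `ai` messages) would be reported as not being the first exchange.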

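Title Is Generated but the Client Cannot See It

  1. Confirm the read location: read from state.values.title, not thread.metadata.title
  2. Check the API response: confirm the state contains a title field
  3. Try refetching the state: client.threads.getState(threadId)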

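Title Is Lost After a Restart

  1. Check whether a checkpointer is configured (required for local development)
  2. Confirm the deployment type: LangGraph Platform persists state automatically
  3. Inspect the database: confirm the checkpointer is writing checkpoints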

Architecture

Why State Instead of Metadata

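| Feature | State | Metadata |
|---|---|---|
| Persistence | Automatic (via checkpointer) | ⚠️ Depends on implementation |
| Versioning | Supports time travel | Not supported |
| Type safety | TypedDict definition | Arbitrary dict |
| Traceability | Every update recorded | ⚠️ Latest value only |
| Standardization | Core LangGraph mechanism | ⚠️ Extension feature |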

Implementation Details

```python
# Core logic of TitleMiddleware
@override
def after_agent(self, state: TitleMiddlewareState, runtime: Runtime) -> dict | None:
    """Generate and set thread title after the first agent response."""
    if self._should_generate_title(state, runtime):
        title = self._generate_title(runtime)
        print(f"Generated thread title: {title}")

        # ✅ Return a state update; the checkpointer persists it automatically
        return {"title": title}

    return None
```

Related Files

References