mirror of
https://gitee.com/wanwujie/deer-flow
synced 2026-04-02 22:02:13 +08:00
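Automatic Thread Title Generation
Overview
Automatically generates a title for a conversation thread. Generation is triggered once the user has asked a first question and received a reply.
Implementation
TitleMiddleware does the following in its after_agent hook:
- Detects whether this is the first exchange (1 user message + 1 assistant reply)
- Checks whether the state already has a title
- Calls the LLM to generate a concise title (at most 6 words by default)
- Stores the title in ThreadState (where the checkpointer persists it)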
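The trigger condition in the list above can be sketched roughly as follows. This is a minimal illustration, not the actual TitleMiddleware code: the function name `should_generate_title` is hypothetical, and messages are assumed to be plain dicts with a `role` key rather than the message objects the real middleware receives.

```python
from typing import Any


def should_generate_title(state: dict[str, Any]) -> bool:
    """Return True only right after the first user/assistant exchange."""
    # A title that is already set is never regenerated.
    if state.get("title"):
        return False
    messages = state.get("messages", [])
    # Assumption: each message is a dict carrying a "role" key.
    user_count = sum(1 for m in messages if m.get("role") == "user")
    assistant_count = sum(1 for m in messages if m.get("role") == "assistant")
    return user_count == 1 and assistant_count == 1
```

Checking for an existing title first keeps the check cheap on every later turn, since the message counts are only inspected while the thread is still untitled.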
⚠️ Important: Storage Mechanism
Where the Title Is Stored
The title is stored in ThreadState.title, not in the thread metadata:
class ThreadState(AgentState):
    sandbox: SandboxState | None = None
    title: str | None = None  # ✅ Title stored here
Persistence
| Deployment | Persisted | Notes |
|---|---|---|
| LangGraph Studio (local) | ❌ No | In-memory only; lost on restart |
| LangGraph Platform | ✅ Yes | Persisted to the database automatically |
| Custom + Checkpointer | ✅ Yes | Requires a PostgreSQL or SQLite checkpointer |
How to Enable Persistence
To persist titles during local development as well, configure a checkpointer:
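# Create checkpointer.py in the same directory as langgraph.json
from langgraph.checkpoint.postgres import PostgresSaver
from psycopg import Connection
from psycopg.rows import dict_row
# from_conn_string() is a context manager in recent releases, so connect explicitly
conn = Connection.connect(
    "postgresql://user:pass@localhost/dbname", autocommit=True, prepare_threshold=0, row_factory=dict_row
)
checkpointer = PostgresSaver(conn)
checkpointer.setup()  # creates the checkpoint tables on first run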
Then reference it in langgraph.json:
{
  "graphs": {
    "lead_agent": "deerflow.agents:lead_agent"
  },
  "checkpointer": "checkpointer:checkpointer"
}
Configuration
Add the following to config.yaml (optional):
title:
  enabled: true
  max_words: 6
  max_chars: 60
  model_name: null  # use the default model
Or configure it in code:
from deerflow.config.title_config import TitleConfig, set_title_config
set_title_config(TitleConfig(
    enabled=True,
    max_words=8,
    max_chars=80,
))
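To make the effect of max_words and max_chars concrete, here is a minimal sketch of how a generated title could be clamped to those limits. It assumes the limits are enforced by simple truncation, which the source does not confirm; `TitleLimits` and `clamp_title` are hypothetical names and are not part of deerflow.config.title_config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TitleLimits:
    """Hypothetical stand-in for the max_words / max_chars settings."""

    max_words: int = 6
    max_chars: int = 60


def clamp_title(raw: str, limits: TitleLimits = TitleLimits()) -> str:
    """Trim a candidate title to the configured word and character limits."""
    # Models often wrap titles in quotes, so strip them along with whitespace.
    words = raw.strip().strip("\"'").split()
    title = " ".join(words[: limits.max_words])
    if len(title) > limits.max_chars:
        title = title[: limits.max_chars].rstrip()
    return title
```

Applying the word limit before the character limit means the character cut only matters for unusually long words, so most titles end on a word boundary.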
Client Usage
Getting the Thread Title
// Option 1: read it from the thread state
const state = await client.threads.getState(threadId);
const title = state.values.title || "New Conversation";
// Option 2: listen for stream events
for await (const chunk of client.runs.stream(threadId, assistantId, {
  input: { messages: [{ role: "user", content: "Hello" }] }
})) {
  if (chunk.event === "values" && chunk.data.title) {
    console.log("Title:", chunk.data.title);
  }
}
Displaying the Title
// Show titles in the conversation list
// (`client` is assumed to be a LangGraph SDK client created elsewhere)
import { useEffect, useState } from "react";

function ConversationList() {
  const [threads, setThreads] = useState([]);
  useEffect(() => {
    async function loadThreads() {
      const allThreads = await client.threads.list();
      // Fetch each thread's state to read its title
      const threadsWithTitles = await Promise.all(
        allThreads.map(async (t) => {
          const state = await client.threads.getState(t.thread_id);
          return {
            id: t.thread_id,
            title: state.values.title || "New Conversation",
            updatedAt: t.updated_at,
          };
        })
      );
      setThreads(threadsWithTitles);
    }
    loadThreads();
  }, []);
  return (
    <ul>
      {threads.map(thread => (
        <li key={thread.id}>
          <a href={`/chat/${thread.id}`}>{thread.title}</a>
        </li>
      ))}
    </ul>
  );
}
Workflow
sequenceDiagram
    participant User
    participant Client
    participant LangGraph
    participant Agent
    participant TitleMiddleware
    participant LLM
    participant Checkpointer
    User->>Client: Send first message
    Client->>LangGraph: POST /threads/{id}/runs
    LangGraph->>Agent: Process message
    Agent-->>LangGraph: Return reply
    LangGraph->>TitleMiddleware: after_agent()
    TitleMiddleware->>TitleMiddleware: Check whether a title is needed
    TitleMiddleware->>LLM: Generate title
    LLM-->>TitleMiddleware: Return title
    TitleMiddleware->>LangGraph: return {"title": "..."}
    LangGraph->>Checkpointer: Save state (including title)
    LangGraph-->>Client: Return response
    Client->>Client: Read from state.values.title
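Advantages
✅ Reliable persistence - uses LangGraph's state mechanism, so the title is persisted automatically whenever a checkpointer is configured
✅ Handled entirely on the backend - the client needs no extra logic
✅ Automatic trigger - the title is generated right after the first exchange
✅ Configurable - length limits, model, and other settings can be customized
✅ Fault tolerant - a fallback strategy is used when generation fails
✅ Architecturally consistent - follows the same pattern as the existing SandboxMiddleware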
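Notes
- Different read path: the title lives in state.values.title, not thread.metadata.title
- Performance: title generation adds roughly 0.5-1 second of latency, which a faster model can reduce
- Concurrency safety: the middleware runs after the agent has executed, so it does not block the agent's main flow
- Fallback strategy: if the LLM call fails, the first few words of the user message are used as the title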
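The fallback strategy described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the middleware's actual code: `title_or_fallback` is a hypothetical helper, `generate` stands in for whatever callable wraps the LLM request, and the "New Conversation" default is borrowed from the client examples in this document.

```python
from typing import Callable


def title_or_fallback(
    generate: Callable[[str], str],
    user_message: str,
    max_words: int = 6,
) -> str:
    """Try the LLM-backed generator; fall back to the user's first words."""
    try:
        title = generate(user_message).strip()
        if title:
            return title
    except Exception:
        # Any failure in the LLM call is treated as "no title produced".
        pass
    words = user_message.split()
    # Assumption: an empty message falls back to a fixed placeholder.
    return " ".join(words[:max_words]) if words else "New Conversation"
```

Because the fallback never raises, a failed title request cannot break the run that produced the assistant's reply.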
Testing
# Test title generation
import pytest
from deerflow.agents.title_middleware import TitleMiddleware

def test_title_generation():
    # TODO: add unit tests
    pass
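Troubleshooting
Title Is Not Generated
- Check that the feature is enabled: get_title_config().enabled == True
- Check the logs: look for "Generated thread title" or error messages
- Confirm it is the first exchange: generation only triggers with exactly 1 user message and 1 assistant reply
Title Is Generated but the Client Cannot See It
- Confirm the read location: read from state.values.title, not thread.metadata.title
- Check the API response: confirm that the state contains a title field
- Try fetching the state again: client.threads.getState(threadId)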
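When debugging from Python, a small helper that reads the title from the right place can rule out the first item above. This is a sketch only: `read_title` is a hypothetical name, and the state snapshot is assumed to be a plain mapping with a `values` key, mirroring the shape used in the JavaScript examples.

```python
from typing import Any, Mapping


def read_title(state: Mapping[str, Any], default: str = "New Conversation") -> str:
    """Read the title from state values, ignoring anything in metadata."""
    values = state.get("values") or {}
    title = values.get("title")
    # Treat missing, non-string, and blank titles the same way.
    if isinstance(title, str) and title.strip():
        return title
    return default
```

The helper deliberately never looks at `metadata`, so a stale or unrelated metadata field cannot mask a missing state title.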
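Title Is Lost After a Restart
- Check whether a checkpointer is configured (required for local development)
- Confirm the deployment type: LangGraph Platform persists state automatically
- Inspect the database: confirm that the checkpointer is working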
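Architecture
Why State Rather Than Metadata?
| Feature | State | Metadata |
|---|---|---|
| Persistence | ✅ Automatic (via checkpointer) | ⚠️ Depends on the implementation |
| Versioning | ✅ Supports time travel | ❌ Not supported |
| Type safety | ✅ Defined with TypedDict | ❌ Arbitrary dict |
| Traceability | ✅ Every update is recorded | ⚠️ Latest value only |
| Standardization | ✅ Core LangGraph mechanism | ⚠️ Extension feature |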
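The versioning and traceability rows can be pictured with a toy model. The class below is purely illustrative and is not LangGraph code: `ToyThread` is a hypothetical name, and it only mimics the idea that state updates append snapshots while metadata updates overwrite a single dict.

```python
from copy import deepcopy
from typing import Any


class ToyThread:
    """Toy model (not LangGraph): state keeps history, metadata keeps one value."""

    def __init__(self) -> None:
        self.checkpoints: list[dict[str, Any]] = []
        self.metadata: dict[str, Any] = {}

    def update_state(self, **changes: Any) -> None:
        # Each update appends a new snapshot instead of mutating the last one.
        snapshot = deepcopy(self.checkpoints[-1]) if self.checkpoints else {}
        snapshot.update(changes)
        self.checkpoints.append(snapshot)

    def update_metadata(self, **changes: Any) -> None:
        # Metadata is overwritten in place, so earlier values are gone.
        self.metadata.update(changes)


thread = ToyThread()
thread.update_state(title=None)
thread.update_state(title="Plan a trip to Kyoto")
thread.update_metadata(title=None)
thread.update_metadata(title="Plan a trip to Kyoto")
```

After these calls the toy thread holds two state snapshots, one before and one after the title was set, but only a single metadata value.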
Implementation Details
# Core TitleMiddleware logic
@override
def after_agent(self, state: TitleMiddlewareState, runtime: Runtime) -> dict | None:
    """Generate and set thread title after the first agent response."""
    if self._should_generate_title(state, runtime):
        title = self._generate_title(runtime)
        print(f"Generated thread title: {title}")
        # ✅ Return a state update; the checkpointer persists it automatically
        return {"title": title}
    return None
Related Files
- packages/harness/deerflow/agents/thread_state.py - ThreadState definition
- packages/harness/deerflow/agents/title_middleware.py - TitleMiddleware implementation
- packages/harness/deerflow/config/title_config.py - configuration management
- config.yaml - configuration file
- packages/harness/deerflow/agents/lead_agent/agent.py - middleware registration