Automatic Title Generation: Implementation Summary
✅ Completed Work
1. Core Implementation Files
packages/harness/deerflow/agents/thread_state.py
- ✅ Added the field title: str | None = None to ThreadState
packages/harness/deerflow/config/title_config.py (new)
- ✅ Created the TitleConfig configuration class
- ✅ Supported options: enabled, max_words, max_chars, model_name, prompt_template
- ✅ Provides the get_title_config() and set_title_config() functions
- ✅ Provides load_title_config_from_dict() to load the settings from the config file
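packages/harness/deerflow/agents/title_middleware.py (new)
- ✅ Created the TitleMiddleware class
- ✅ Implemented _should_generate_title() to check whether a title needs to be generated
- ✅ Implemented _generate_title() to call the LLM and produce the title
- ✅ Implemented the after_agent() hook, which fires automatically after the first exchange
- ✅ Includes a fallback strategy (uses the first few words of the user message when the LLM call fails)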
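The check performed by _should_generate_title() might look something like the sketch below. It is an illustration, not the actual method: messages are modeled as plain dicts with a hypothetical role key (the real code presumably works with LangChain message objects), and accepting more than one assistant message is an assumption.

```python
def should_generate_title(state: dict, enabled: bool = True) -> bool:
    """Decide whether a title should be generated for this thread state."""
    if not enabled:
        return False  # feature switched off in the title config
    if state.get("title"):
        return False  # never overwrite an existing title

    messages = state.get("messages", [])
    user_count = sum(1 for m in messages if m.get("role") == "user")
    assistant_count = sum(1 for m in messages if m.get("role") == "assistant")
    # First exchange: exactly one user message and at least one assistant reply.
    return user_count == 1 and assistant_count >= 1
```

Keeping the predicate free of side effects makes it easy to unit test without mocking the runtime.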
packages/harness/deerflow/config/app_config.py
- ✅ Imports load_title_config_from_dict
- ✅ Loads the title configuration in from_file()
packages/harness/deerflow/agents/lead_agent/agent.py
- ✅ Imports TitleMiddleware
- ✅ Registered in the middleware list: [SandboxMiddleware(), TitleMiddleware()]
2. Configuration Files
config.yaml
- ✅ Added a title configuration section:
title:
  enabled: true
  max_words: 6
  max_chars: 60
  model_name: null
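3. Documentation
docs/AUTO_TITLE_GENERATION.md (new)
- ✅ Complete feature documentation
- ✅ Implementation approach and architecture design
- ✅ Configuration reference
- ✅ Client usage example (TypeScript)
- ✅ Workflow diagram (Mermaid)
- ✅ Troubleshooting guide
- ✅ State vs. Metadata comparison
BACKEND_TODO.md
- ✅ Added a completion record for the feature
4. Tests
tests/test_title_generation.py (new)
- ✅ Configuration class tests
- ✅ Middleware initialization tests
- TODO: integration tests (require mocking the Runtime)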
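🎯 Core Design Decisions
Why State rather than Metadata?
| Aspect | State (✅ adopted) | Metadata (❌ not adopted) |
|---|---|---|
| Persistence | Automatic (via the checkpointer) | Implementation-dependent, unreliable |
| Versioning | Supports time travel | Not supported |
| Type safety | Defined via TypedDict | Arbitrary dict |
| Standardization | Core LangGraph mechanism | Extension feature |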
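Workflow
User sends the first message
↓
Agent processes it and returns a reply
↓
TitleMiddleware.after_agent() fires
↓
Check: is this the first exchange? Is there already a title?
↓
Call the LLM to generate a title
↓
Return {"title": "..."} to update the state
↓
Checkpointer persists it automatically (if configured)
↓
Client reads it from state.values.title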
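The flow above can be sketched as a single hook function. This is a simplified illustration under stated assumptions rather than the actual TitleMiddleware: the LLM is injected as a plain callable named generate (hypothetical), messages are dicts with a hypothetical role key, and any exception from the LLM triggers the fallback.

```python
from typing import Callable, Optional


def after_agent(
    state: dict,
    generate: Callable[[str], str],
    max_words: int = 6,
) -> Optional[dict]:
    """Return a partial state update {"title": ...}, or None to leave state unchanged."""
    messages = state.get("messages", [])
    user_texts = [m["content"] for m in messages if m.get("role") == "user"]
    has_reply = any(m.get("role") == "assistant" for m in messages)

    # Only act on the first exchange, and never overwrite an existing title.
    if state.get("title") or len(user_texts) != 1 or not has_reply:
        return None

    try:
        title = generate(user_texts[0]).strip()
    except Exception:
        title = ""  # treat any LLM failure as "no title produced"

    if not title:
        # Fallback: first few words of the user's message.
        title = " ".join(user_texts[0].split()[:max_words])
    return {"title": title}
```

Returning only the changed key, rather than the whole state, lets the graph runtime merge the update and lets the checkpointer persist it along with everything else.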
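📋 Usage Guide
Backend Configuration
- Enable or disable the feature
# config.yaml
title:
  enabled: true # set to false to disable
- Customize the settings
title:
  enabled: true
  max_words: 8 # at most 8 words in the title
  max_chars: 80 # at most 80 characters in the title
  model_name: null # use the default model
- Configure persistence (optional)
To persist titles during local development: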
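# checkpointer.py
import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver
# In recent langgraph-checkpoint-sqlite releases from_conn_string() is a context manager, so build the saver from a connection
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)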
// langgraph.json
{
  "graphs": {
    "lead_agent": "deerflow.agents:lead_agent"
  },
  "checkpointer": "checkpointer:checkpointer"
}
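As an illustration of how the max_words and max_chars limits from the title settings might be enforced on a generated title, here is a small sketch. The clamp_title name and the quote-stripping step are assumptions, not part of the documented implementation.

```python
def clamp_title(raw: str, max_words: int = 6, max_chars: int = 60) -> str:
    """Trim a raw LLM answer to at most max_words words and max_chars characters."""
    # Models often wrap titles in quotes; strip them (assumption).
    words = raw.strip().strip("\"'").split()
    title = " ".join(words[:max_words])
    if len(title) > max_chars:
        # Hard cut at the character limit, dropping any trailing space.
        title = title[:max_chars].rstrip()
    return title
```

Whitespace-based word counting does not apply to text written without spaces, such as Chinese, so for those titles max_chars is the limit that actually takes effect.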
Client Usage
// Get the thread title
const state = await client.threads.getState(threadId);
const title = state.values.title || "New Conversation";
// Show it in the conversation list
<li>{title}</li>
⚠️ Note: the title lives in state.values.title, not thread.metadata.title
🧪 Testing
# Run the title generation tests
pytest tests/test_title_generation.py -v
# Run all tests
pytest
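🔍 Troubleshooting
Title not generated?
- Check the configuration: title.enabled = true
- Check the logs: search for "Generated thread title"
- Confirm it is the first exchange (1 user message + 1 assistant reply)
Title generated but not visible?
- Confirm where you read it from: state.values.title (not thread.metadata.title)
- Check whether the API response includes the title
- Fetch the state again
Title lost after a restart?
- Local development requires a configured checkpointer
- LangGraph Platform persists state automatically
- Inspect the database to confirm the checkpointer is working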
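📊 Performance Impact
- Added latency: about 0.5-1 second (one LLM call)
- Concurrency safety: runs in after_agent and does not block the main flow
- Resource usage: the title is generated only once per thread
Optimization Tips
- Use a faster model (e.g. gpt-3.5-turbo)
- Lower max_words and max_chars
- Tighten the prompt so it is more concise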
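🚀 Next Steps
- Add integration tests (requires mocking the LangGraph Runtime)
- Support custom prompt templates
- Support multilingual title generation
- Add a way to regenerate titles
- Monitor title generation success rate and latency
📚 Related Resources
Implementation completed: 2026-01-14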