Files
deer-flow/backend/docs/TITLE_GENERATION_IMPLEMENTATION.md
DanielWalnut 76803b826f refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)
* refactor: extract shared utils to break harness→app cross-layer imports

Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: split backend/src into harness (deerflow.*) and app (app.*)

Physically split the monolithic backend/src/ package into two layers:

- **Harness** (`packages/harness/deerflow/`): publishable agent framework
  package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
  models, MCP, skills, config, and all core infrastructure.

- **App** (`app/`): unpublished application code with import prefix `app.*`.
  Contains gateway (FastAPI REST API) and channels (IM integrations).

Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure

Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add harness→app boundary check test and update docs

Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.

Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add config versioning with auto-upgrade on startup

When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.

- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix comments

* fix: update src.* import in test_sandbox_tools_security to deerflow.*

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle empty config and search parent dirs for config.example.yaml

Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
  looking next to config.yaml, fixing detection in common setups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct skills root path depth and config_version type coercion

- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
  after harness split, file lives at packages/harness/deerflow/skills/
  so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
  _check_config_version() to prevent TypeError when YAML stores value
  as string (e.g. config_version: "1")
- tests: add regression tests for both fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update test imports from src.* to deerflow.*/app.* after harness refactor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 22:55:52 +08:00

223 lines
5.6 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# 自动 Title 生成功能实现总结
## ✅ 已完成的工作
### 1. 核心实现文件
#### [`packages/harness/deerflow/agents/thread_state.py`](../packages/harness/deerflow/agents/thread_state.py)
- ✅ 添加 `title: str | None = None` 字段到 `ThreadState`
#### [`packages/harness/deerflow/config/title_config.py`](../packages/harness/deerflow/config/title_config.py) (新建)
- ✅ 创建 `TitleConfig` 配置类
- ✅ 支持配置enabled, max_words, max_chars, model_name, prompt_template
- ✅ 提供 `get_title_config()``set_title_config()` 函数
- ✅ 提供 `load_title_config_from_dict()` 从配置文件加载
#### [`packages/harness/deerflow/agents/title_middleware.py`](../packages/harness/deerflow/agents/title_middleware.py) (新建)
- ✅ 创建 `TitleMiddleware`
- ✅ 实现 `_should_generate_title()` 检查是否需要生成
- ✅ 实现 `_generate_title()` 调用 LLM 生成标题
- ✅ 实现 `after_agent()` 钩子,在首次对话后自动触发
- ✅ 包含 fallback 策略LLM 失败时使用用户消息前几个词)
#### [`packages/harness/deerflow/config/app_config.py`](../packages/harness/deerflow/config/app_config.py)
- ✅ 导入 `load_title_config_from_dict`
- ✅ 在 `from_file()` 中加载 title 配置
#### [`packages/harness/deerflow/agents/lead_agent/agent.py`](../packages/harness/deerflow/agents/lead_agent/agent.py)
- ✅ 导入 `TitleMiddleware`
- ✅ 注册到 `middleware` 列表:`[SandboxMiddleware(), TitleMiddleware()]`
### 2. 配置文件
#### [`config.yaml`](../config.yaml)
- ✅ 添加 title 配置段:
```yaml
title:
enabled: true
max_words: 6
max_chars: 60
model_name: null
```
### 3. 文档
#### [`docs/AUTO_TITLE_GENERATION.md`](../docs/AUTO_TITLE_GENERATION.md) (新建)
- ✅ 完整的功能说明文档
- ✅ 实现方式和架构设计
- ✅ 配置说明
- ✅ 客户端使用示例TypeScript
- ✅ 工作流程图Mermaid
- ✅ 故障排查指南
- ✅ State vs Metadata 对比
#### [`BACKEND_TODO.md`](../BACKEND_TODO.md)
- ✅ 添加功能完成记录
### 4. 测试
#### [`tests/test_title_generation.py`](../tests/test_title_generation.py) (新建)
- ✅ 配置类测试
- ✅ Middleware 初始化测试
- ✅ TODO: 集成测试(需要 mock Runtime
---
## 🎯 核心设计决策
### 为什么使用 State 而非 Metadata
| 方面 | State (✅ 采用) | Metadata (❌ 未采用) |
|------|----------------|---------------------|
| **持久化** | 自动(通过 checkpointer | 取决于实现,不可靠 |
| **版本控制** | 支持时间旅行 | 不支持 |
| **类型安全** | TypedDict 定义 | 任意字典 |
| **标准化** | LangGraph 核心机制 | 扩展功能 |
### 工作流程
```
用户发送首条消息
Agent 处理并返回回复
TitleMiddleware.after_agent() 触发
检查:是否首次对话?是否已有 title
调用 LLM 生成 title
返回 {"title": "..."} 更新 state
Checkpointer 自动持久化(如果配置了)
客户端从 state.values.title 读取
```
---
## 📋 使用指南
### 后端配置
1. **启用/禁用功能**
```yaml
# config.yaml
title:
enabled: true # 设为 false 禁用
```
2. **自定义配置**
```yaml
title:
enabled: true
max_words: 8 # 标题最多 8 个词
max_chars: 80 # 标题最多 80 个字符
model_name: null # 使用默认模型
```
3. **配置持久化(可选)**
如果需要在本地开发时持久化 title
```python
# checkpointer.py
from langgraph.checkpoint.sqlite import SqliteSaver
checkpointer = SqliteSaver.from_conn_string("checkpoints.db")
```
```json
// langgraph.json
{
"graphs": {
"lead_agent": "deerflow.agents:lead_agent"
},
"checkpointer": "checkpointer:checkpointer"
}
```
### 客户端使用
```typescript
// 获取 thread title
const state = await client.threads.getState(threadId);
const title = state.values.title || "New Conversation";
// 显示在对话列表
<li>{title}</li>
```
**⚠️ 注意**Title 在 `state.values.title`,而非 `thread.metadata.title`
---
## 🧪 测试
```bash
# 运行测试
pytest tests/test_title_generation.py -v
# 运行所有测试
pytest
```
---
## 🔍 故障排查
### Title 没有生成?
1. 检查配置:`title.enabled = true`
2. 查看日志:搜索 "Generated thread title"
3. 确认是首次对话1 个用户消息 + 1 个助手回复)
### Title 生成但看不到?
1. 确认读取位置:`state.values.title`(不是 `thread.metadata.title`
2. 检查 API 响应是否包含 title
3. 重新获取 state
### Title 重启后丢失?
1. 本地开发需要配置 checkpointer
2. LangGraph Platform 会自动持久化
3. 检查数据库确认 checkpointer 工作正常
---
## 📊 性能影响
- **延迟增加**:约 0.5-1 秒LLM 调用)
- **并发安全**:在 `after_agent` 中运行,不阻塞主流程
- **资源消耗**:每个 thread 只生成一次
### 优化建议
1. 使用更快的模型(如 `gpt-3.5-turbo`
2. 减少 `max_words``max_chars`
3. 调整 prompt 使其更简洁
---
## 🚀 下一步
- [ ] 添加集成测试(需要 mock LangGraph Runtime
- [ ] 支持自定义 prompt template
- [ ] 支持多语言 title 生成
- [ ] 添加 title 重新生成功能
- [ ] 监控 title 生成成功率和延迟
---
## 📚 相关资源
- [完整文档](../docs/AUTO_TITLE_GENERATION.md)
- [LangGraph Middleware](https://langchain-ai.github.io/langgraph/concepts/middleware/)
- [LangGraph State 管理](https://langchain-ai.github.io/langgraph/concepts/low_level/#state)
- [LangGraph Checkpointer](https://langchain-ai.github.io/langgraph/concepts/persistence/)
---
*实现完成时间: 2026-01-14*