# Conversation Summarization

DeerFlow includes automatic conversation summarization to handle long conversations that approach model token limits. When enabled, the system automatically condenses older messages while preserving recent context.

## Overview

The summarization feature uses LangChain's `SummarizationMiddleware` to monitor conversation history and trigger summarization based on configurable thresholds. When activated, it:

1. Monitors message token counts in real time
2. Triggers summarization when thresholds are met
3. Keeps recent messages intact while summarizing older exchanges
4. Maintains AI/Tool message pairs together for context continuity
5. Injects the summary back into the conversation

## Configuration

Summarization is configured in `config.yaml` under the `summarization` key:

```yaml
summarization:
  enabled: true
  model_name: null # Use default model or specify a lightweight model

  # Trigger conditions (OR logic - any condition triggers summarization)
  trigger:
    - type: tokens
      value: 4000
    # Additional triggers (optional)
    # - type: messages
    #   value: 50
    # - type: fraction
    #   value: 0.8 # 80% of model's max input tokens

  # Context retention policy
  keep:
    type: messages
    value: 20

  # Token trimming for the summarization call
  trim_tokens_to_summarize: 4000

  # Custom summary prompt (optional)
  summary_prompt: null
```

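For illustration, the block above could be parsed into a small config object like the following sketch. The `SummarizationConfig` dataclass and `load_summarization_config` helper here are hypothetical stand-ins that mirror the YAML keys; the real schema lives in `packages/harness/deerflow/config/summarization_config.py`.

```python
# Hypothetical sketch: load the `summarization` block from config.yaml.
# Field names mirror the YAML keys above; the real schema is defined in
# packages/harness/deerflow/config/summarization_config.py.
from dataclasses import dataclass, field

import yaml


@dataclass
class SummarizationConfig:
    enabled: bool = False
    model_name: str | None = None
    trigger: list[dict] = field(default_factory=list)
    keep: dict = field(default_factory=lambda: {"type": "messages", "value": 20})
    trim_tokens_to_summarize: int | None = 4000
    summary_prompt: str | None = None


def load_summarization_config(path: str = "config.yaml") -> SummarizationConfig:
    with open(path) as f:
        raw = yaml.safe_load(f) or {}  # guard against empty config files
    section = raw.get("summarization") or {}
    # Normalize a single trigger mapping to a one-element list.
    trigger = section.get("trigger", [])
    if isinstance(trigger, dict):
        trigger = [trigger]
    section["trigger"] = trigger
    return SummarizationConfig(**section)
```
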
### Configuration Options

#### `enabled`

- **Type**: Boolean
- **Default**: `false`
- **Description**: Enable or disable automatic summarization

#### `model_name`

- **Type**: String or null
- **Default**: `null` (uses the default model)
- **Description**: Model to use for generating summaries. A lightweight, cost-effective model such as `gpt-4o-mini` or equivalent is recommended.

#### `trigger`

- **Type**: Single `ContextSize` object or list of `ContextSize` objects
- **Required**: At least one trigger must be specified when summarization is enabled
- **Description**: Thresholds that trigger summarization. Uses OR logic: summarization runs when any threshold is met (see the sketch at the end of this subsection).

**ContextSize Types:**

1. **Token-based trigger**: Activates when the token count reaches the specified value

   ```yaml
   trigger:
     type: tokens
     value: 4000
   ```

2. **Message-based trigger**: Activates when the message count reaches the specified value

   ```yaml
   trigger:
     type: messages
     value: 50
   ```

3. **Fraction-based trigger**: Activates when token usage reaches a percentage of the model's maximum input tokens

   ```yaml
   trigger:
     type: fraction
     value: 0.8 # 80% of max input tokens
   ```

**Multiple Triggers:**

```yaml
trigger:
  - type: tokens
    value: 4000
  - type: messages
    value: 50
```

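To make the OR semantics concrete, here is a minimal illustrative sketch of the trigger check (the function name and dict shape are hypothetical, not DeerFlow's actual implementation):

```python
# Illustrative only: evaluate trigger thresholds with OR logic, as the
# `trigger` option describes. Names mirror the YAML above.
def should_summarize(
    triggers: list[dict],
    token_count: int,
    message_count: int,
    max_input_tokens: int,
) -> bool:
    """Return True if ANY configured threshold is met."""
    for trigger in triggers:
        kind, value = trigger["type"], trigger["value"]
        if kind == "tokens" and token_count >= value:
            return True
        if kind == "messages" and message_count >= value:
            return True
        if kind == "fraction" and token_count >= value * max_input_tokens:
            return True
    return False


# With the two triggers above, 4200 tokens alone is enough to fire.
triggers = [{"type": "tokens", "value": 4000}, {"type": "messages", "value": 50}]
assert should_summarize(triggers, token_count=4200, message_count=12, max_input_tokens=8192)
```
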
#### `keep`

- **Type**: `ContextSize` object
- **Default**: `{type: messages, value: 20}`
- **Description**: Specifies how much recent conversation history to preserve after summarization.

**Examples:**

```yaml
# Keep the most recent 20 messages
keep:
  type: messages
  value: 20

# Keep the most recent 3000 tokens
keep:
  type: tokens
  value: 3000

# Keep the most recent 30% of the model's max input tokens
keep:
  type: fraction
  value: 0.3
```

#### `trim_tokens_to_summarize`

- **Type**: Integer or null
- **Default**: `4000`
- **Description**: Maximum tokens to include when preparing messages for the summarization call itself. Set to `null` to skip trimming (not recommended for very long conversations).

#### `summary_prompt`

- **Type**: String or null
- **Default**: `null` (uses LangChain's default prompt)
- **Description**: Custom prompt template for generating summaries. The prompt should guide the model to extract the most important context.

**Default Prompt Behavior:**

The default LangChain prompt instructs the model to:

- Extract the highest-quality, most relevant context
- Focus on information critical to the overall goal
- Avoid repeating completed actions
- Return only the extracted context

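If you do override it, keep the same goals in mind. For example, a custom prompt might look like this (an illustrative prompt, not a shipped default):

```yaml
summary_prompt: |
  Condense the conversation below into a brief summary. Preserve the user's
  goals, decisions made, and any facts needed to continue the task. Do not
  repeat completed actions. Return only the summary text.
```
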
## How It Works

### Summarization Flow

1. **Monitoring**: Before each model call, the middleware counts the tokens in the message history
2. **Trigger Check**: If any configured threshold is met, summarization is triggered
3. **Message Partitioning**: Messages are split into:
   - Messages to summarize (older messages beyond the `keep` threshold)
   - Messages to preserve (recent messages within the `keep` threshold)
4. **Summary Generation**: The model generates a concise summary of the older messages
5. **Context Replacement**: The message history is updated:
   - All old messages are removed
   - A single summary message is added
   - Recent messages are preserved
6. **AI/Tool Pair Protection**: The system ensures that AI messages and their corresponding tool messages stay together

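The flow can be condensed into a short conceptual sketch (hypothetical names, not the middleware's actual code):

```python
# Conceptual outline of the flow above; `summarize_fn` wraps the model call.
def summarize_history(messages: list, keep_count: int, summarize_fn) -> list:
    """Replace older messages with a single summary, keeping recent ones."""
    cutoff = max(len(messages) - keep_count, 0)
    # In practice the cutoff is also adjusted so AI/Tool pairs are never
    # split; see the sketch under "Message Preservation" below.
    to_summarize, to_keep = messages[:cutoff], messages[cutoff:]
    if not to_summarize:
        return messages  # nothing old enough to summarize yet
    summary_text = summarize_fn(to_summarize)  # the summarization model call
    summary = f"Here is a summary of the conversation to date:\n\n{summary_text}"
    return [("human", summary), *to_keep]
```
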
### Token Counting

- Uses approximate token counting based on character count
- For Anthropic models: ~3.3 characters per token
- For other models: uses LangChain's default estimation
- Can be customized with a custom `token_counter` function (see the sketch below)

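For example, a character-based counter matching the approximation above could look like this (illustrative; the exact counter interface expected by the middleware depends on your installed LangChain version):

```python
# Illustrative ~3.3 characters-per-token estimate, as described above for
# Anthropic models. Message `content` is coerced to str to keep the sketch
# simple; real content may be a list of content blocks.
def approximate_token_counter(messages) -> int:
    total_chars = sum(len(str(getattr(m, "content", "") or "")) for m in messages)
    return int(total_chars / 3.3)
```
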
### Message Preservation

The middleware intelligently preserves message context:

- **Recent Messages**: Always kept intact based on the `keep` configuration
- **AI/Tool Pairs**: Never split. If a cutoff point falls within tool messages, the system adjusts it to keep the entire AI + Tool message sequence together (sketched below)
- **Summary Format**: The summary is injected as a HumanMessage with the format:

  ```
  Here is a summary of the conversation to date:

  [Generated summary text]
  ```

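The pair-protection rule can be sketched as follows (hypothetical helper, not the actual middleware code): starting from a proposed cutoff, step backwards while the boundary message is a tool result, so the AI message that issued the tool calls stays with its results.

```python
# Hypothetical sketch of AI/Tool pair protection. `is_tool_message` stands
# in for an isinstance check against your ToolMessage type.
def adjust_for_tool_pairs(messages: list, cutoff: int, is_tool_message) -> int:
    while cutoff > 0 and is_tool_message(messages[cutoff]):
        cutoff -= 1  # step back past tool results to the issuing AI message
    return cutoff


# Example: AI with tool calls at index 1, results at indices 2-3. A cutoff
# of 3 would orphan a tool result, so it is pulled back to index 1.
msgs = ["human", "ai+tool_calls", "tool", "tool", "ai"]
assert adjust_for_tool_pairs(msgs, 3, lambda m: m == "tool") == 1
```
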
## Best Practices

### Choosing Trigger Thresholds

1. **Token-based triggers**: Recommended for most use cases
   - Set to 60-80% of your model's context window
   - Example: for an 8K context, use 4000-6000 tokens

2. **Message-based triggers**: Useful for controlling conversation length
   - Good for applications with many short messages
   - Example: 50-100 messages, depending on average message length

3. **Fraction-based triggers**: Ideal when using multiple models
   - Automatically adapts to each model's capacity
   - Example: 0.8 (80% of the model's max input tokens)

### Choosing Retention Policy (`keep`)

1. **Message-based retention**: Best for most scenarios
   - Preserves natural conversation flow
   - Recommended: 15-25 messages

2. **Token-based retention**: Use when precise control is needed
   - Good for managing exact token budgets
   - Recommended: 2000-4000 tokens

3. **Fraction-based retention**: For multi-model setups
   - Automatically scales with model capacity
   - Recommended: 0.2-0.4 (20-40% of max input)

### Model Selection

- **Recommended**: Use a lightweight, cost-effective model for summaries
  - Examples: `gpt-4o-mini`, `claude-haiku`, or equivalent
  - Summaries don't require the most powerful models
  - Significant cost savings for high-volume applications

- **Default**: If `model_name` is `null`, the default model is used
  - May be more expensive, but ensures consistency
  - Good for simple setups

### Optimization Tips

1. **Balance triggers**: Combine token and message triggers for robust handling

   ```yaml
   trigger:
     - type: tokens
       value: 4000
     - type: messages
       value: 50
   ```

2. **Conservative retention**: Keep more messages initially, then adjust based on performance

   ```yaml
   keep:
     type: messages
     value: 25 # Start higher, reduce if needed
   ```

3. **Trim strategically**: Limit the tokens sent to the summarization model

   ```yaml
   trim_tokens_to_summarize: 4000 # Prevents expensive summarization calls
   ```

4. **Monitor and iterate**: Track summary quality and adjust the configuration accordingly

## Troubleshooting

### Summary Quality Issues

**Problem**: Summaries lose important context

**Solutions**:

1. Increase the `keep` value to preserve more messages
2. Decrease trigger thresholds to summarize earlier
3. Customize `summary_prompt` to emphasize key information
4. Use a more capable model for summarization

### Performance Issues

**Problem**: Summarization calls take too long

**Solutions**:

1. Use a faster model for summaries (e.g., `gpt-4o-mini`)
2. Reduce `trim_tokens_to_summarize` to send less context
3. Increase trigger thresholds to summarize less frequently

### Token Limit Errors

**Problem**: Still hitting token limits despite summarization

**Solutions**:

1. Lower trigger thresholds to summarize earlier
2. Reduce the `keep` value to preserve fewer messages
3. Check whether individual messages are very large
4. Consider using fraction-based triggers

## Implementation Details

### Code Structure

- **Configuration**: `packages/harness/deerflow/config/summarization_config.py`
- **Integration**: `packages/harness/deerflow/agents/lead_agent/agent.py`
- **Middleware**: Uses `langchain.agents.middleware.SummarizationMiddleware`

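As a rough wiring sketch, the middleware might be constructed from the config and handed to the agent like this. The parameter names below are assumptions that mirror this document's config keys; verify them against `SummarizationMiddleware`'s signature in your installed LangChain version.

```python
# Assumed wiring; parameter names mirror this document's config keys and
# must be checked against your LangChain version before relying on this.
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware

summarization = SummarizationMiddleware(
    model="openai:gpt-4o-mini",     # summarization.model_name
    trigger=[("tokens", 4000)],     # summarization.trigger (OR logic)
    keep=("messages", 20),          # summarization.keep
    trim_tokens_to_summarize=4000,  # summarization.trim_tokens_to_summarize
)

agent = create_agent(
    model="openai:gpt-4o",
    tools=[],
    middleware=[summarization],  # in DeerFlow this sits between Sandbox and Title
)
```
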
### Middleware Order

Summarization runs after ThreadData and Sandbox initialization but before Title and Clarification:

1. ThreadDataMiddleware
2. SandboxMiddleware
3. **SummarizationMiddleware** ← Runs here
4. TitleMiddleware
5. ClarificationMiddleware

### State Management

- Summarization is stateless: the configuration is loaded once at startup
- Summaries are added as regular messages in the conversation history
- The checkpointer persists the summarized history automatically

## Example Configurations

### Minimal Configuration

```yaml
summarization:
  enabled: true
  trigger:
    type: tokens
    value: 4000
  keep:
    type: messages
    value: 20
```

### Production Configuration

```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini # Lightweight model for cost efficiency
  trigger:
    - type: tokens
      value: 6000
    - type: messages
      value: 75
  keep:
    type: messages
    value: 25
  trim_tokens_to_summarize: 5000
```

### Multi-Model Configuration

```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini
  trigger:
    type: fraction
    value: 0.7 # 70% of model's max input
  keep:
    type: fraction
    value: 0.3 # Keep 30% of max input
  trim_tokens_to_summarize: 4000
```

### Conservative Configuration (High Quality)

```yaml
summarization:
  enabled: true
  model_name: gpt-4 # Use a full-size model for high-quality summaries
  trigger:
    type: tokens
    value: 8000
  keep:
    type: messages
    value: 40 # Keep more context
  trim_tokens_to_summarize: null # No trimming
```

## References

- [LangChain Summarization Middleware Documentation](https://docs.langchain.com/oss/python/langchain/middleware/built-in#summarization)
- [LangChain Source Code](https://github.com/langchain-ai/langchain)