mirror of
https://gitee.com/wanwujie/deer-flow
synced 2026-04-03 06:12:14 +08:00

# Conversation Summarization

DeerFlow includes automatic conversation summarization to handle long conversations that approach model token limits. When enabled, the system condenses older messages while preserving recent context.

## Overview

The summarization feature uses LangChain's `SummarizationMiddleware` to monitor conversation history and trigger summarization based on configurable thresholds. When activated, it:

1. Monitors message token counts in real-time
2. Triggers summarization when thresholds are met
3. Keeps recent messages intact while summarizing older exchanges
4. Maintains AI/Tool message pairs together for context continuity
5. Injects the summary back into the conversation

## Configuration

Summarization is configured in `config.yaml` under the `summarization` key:

```yaml
summarization:
  enabled: true
  model_name: null  # Use default model or specify a lightweight model

  # Trigger conditions (OR logic - any condition triggers summarization)
  trigger:
    - type: tokens
      value: 4000
    # Additional triggers (optional)
    # - type: messages
    #   value: 50
    # - type: fraction
    #   value: 0.8  # 80% of model's max input tokens

  # Context retention policy
  keep:
    type: messages
    value: 20

  # Token trimming for summarization call
  trim_tokens_to_summarize: 4000

  # Custom summary prompt (optional)
  summary_prompt: null
```

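As a rough illustration of how these keys fit together, the sketch below mirrors them in a plain Python dataclass. The class name `SummarizationSettings` and its validation logic are assumptions made for this example, not the contents of DeerFlow's `src/config/summarization_config.py`; the defaults follow the option descriptions in this document.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SummarizationSettings:
    """Hypothetical in-memory mirror of the `summarization` block."""

    enabled: bool = False
    model_name: Optional[str] = None  # None means "use the default model"
    trigger: Any = None  # one {"type", "value"} mapping or a list of them
    keep: dict = field(default_factory=lambda: {"type": "messages", "value": 20})
    trim_tokens_to_summarize: Optional[int] = 4000
    summary_prompt: Optional[str] = None

    def __post_init__(self) -> None:
        # Normalize a single trigger mapping into a one-element list.
        if self.trigger is None:
            self.trigger = []
        elif isinstance(self.trigger, dict):
            self.trigger = [self.trigger]
        # At least one trigger is required once summarization is enabled.
        if self.enabled and not self.trigger:
            raise ValueError("summarization.enabled is true but no trigger is set")


settings = SummarizationSettings(enabled=True, trigger={"type": "tokens", "value": 4000})
print(settings.trigger)  # [{'type': 'tokens', 'value': 4000}]
```

Normalizing `trigger` to a list up front means later code only has to handle one shape, whether the YAML used a single mapping or a list.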
### Configuration Options

#### `enabled`
- **Type**: Boolean
- **Default**: `false`
- **Description**: Enable or disable automatic summarization

#### `model_name`
- **Type**: String or null
- **Default**: `null` (uses default model)
- **Description**: Model used to generate summaries. A lightweight, cost-effective model such as `gpt-4o-mini` or an equivalent is recommended.

#### `trigger`
- **Type**: Single `ContextSize` or list of `ContextSize` objects
- **Required**: At least one trigger must be specified when enabled
- **Description**: Thresholds that trigger summarization. Uses OR logic - summarization runs when ANY threshold is met.

**ContextSize Types:**

1. **Token-based trigger**: Activates when token count reaches the specified value
   ```yaml
   trigger:
     type: tokens
     value: 4000
   ```

2. **Message-based trigger**: Activates when message count reaches the specified value
   ```yaml
   trigger:
     type: messages
     value: 50
   ```

3. **Fraction-based trigger**: Activates when token usage reaches a percentage of the model's maximum input tokens
   ```yaml
   trigger:
     type: fraction
     value: 0.8  # 80% of max input tokens
   ```

**Multiple Triggers:**
```yaml
trigger:
  - type: tokens
    value: 4000
  - type: messages
    value: 50
```

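The OR logic across trigger types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ContextSize` dataclass here is a stand-in for one `trigger` entry, and `should_summarize` and its parameters are hypothetical names, not the middleware's actual API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ContextSize:
    """Hypothetical stand-in for one `trigger` entry from config.yaml."""

    type: str  # "tokens", "messages", or "fraction"
    value: float


def should_summarize(
    triggers: Sequence[ContextSize],
    token_count: int,
    message_count: int,
    max_input_tokens: int,
) -> bool:
    """Return True if ANY trigger threshold is met (OR logic)."""
    for trigger in triggers:
        if trigger.type == "tokens" and token_count >= trigger.value:
            return True
        if trigger.type == "messages" and message_count >= trigger.value:
            return True
        # A fraction is measured against the model's maximum input tokens.
        if trigger.type == "fraction" and token_count >= trigger.value * max_input_tokens:
            return True
    return False


triggers = [ContextSize("tokens", 4000), ContextSize("messages", 50)]
print(should_summarize(triggers, token_count=1200, message_count=50, max_input_tokens=8000))  # True
print(should_summarize(triggers, token_count=1200, message_count=10, max_input_tokens=8000))  # False
```

In the first call the token threshold is not reached, but the message threshold is, which is enough to trigger summarization.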
#### `keep`
- **Type**: `ContextSize` object
- **Default**: `{type: messages, value: 20}`
- **Description**: Specifies how much recent conversation history to preserve after summarization.

**Examples:**
```yaml
# Keep most recent 20 messages
keep:
  type: messages
  value: 20

# Keep most recent 3000 tokens
keep:
  type: tokens
  value: 3000

# Keep most recent 30% of model's max input tokens
keep:
  type: fraction
  value: 0.3
```

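To make the three `keep` types concrete, the following sketch computes where the preserved tail of the history would start. The helper `keep_start_index`, the `count_tokens` callable, and the `max_input_tokens` parameter are hypothetical names for this example; the actual middleware may pick the boundary differently.

```python
from typing import Callable, Sequence


def keep_start_index(
    messages: Sequence[str],
    keep_type: str,
    keep_value: float,
    count_tokens: Callable[[str], int],
    max_input_tokens: int = 8000,
) -> int:
    """Index of the first message to preserve; earlier messages get summarized."""
    if keep_type == "messages":
        return max(0, len(messages) - int(keep_value))
    if keep_type == "tokens":
        budget = int(keep_value)
    elif keep_type == "fraction":
        budget = int(keep_value * max_input_tokens)
    else:
        raise ValueError(f"unknown keep type: {keep_type!r}")
    # Walk backwards from the newest message until the token budget is used up.
    used = 0
    start = len(messages)
    for i in range(len(messages) - 1, -1, -1):
        cost = count_tokens(messages[i])
        if used + cost > budget:
            break
        used += cost
        start = i
    return start


history = ["x" * 40] * 10          # ten messages of 40 characters each
estimate = lambda m: len(m) // 4   # crude stand-in: 10 tokens per message
print(keep_start_index(history, "messages", 4, estimate))  # 6
print(keep_start_index(history, "tokens", 30, estimate))   # 7
```

With a 30-token budget and 10 tokens per message, only the last three messages fit, so the preserved tail starts at index 7.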
#### `trim_tokens_to_summarize`
- **Type**: Integer or null
- **Default**: `4000`
- **Description**: Maximum tokens to include when preparing messages for the summarization call itself. Set to `null` to skip trimming (not recommended for very long conversations).

#### `summary_prompt`
- **Type**: String or null
- **Default**: `null` (uses LangChain's default prompt)
- **Description**: Custom prompt template for generating summaries. The prompt should guide the model to extract the most important context.

**Default Prompt Behavior:**
The default LangChain prompt instructs the model to:
- Extract highest quality/most relevant context
- Focus on information critical to the overall goal
- Avoid repeating completed actions
- Return only the extracted context

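A custom prompt might look like the sketch below. The wording is purely illustrative, and the `{messages}` placeholder is an assumption about how the conversation text is substituted into the template; check the LangChain version in use before relying on it. The same text would go under `summary_prompt` in `config.yaml`.

```python
# Hypothetical custom prompt; the `{messages}` placeholder is an assumption
# about how the conversation text is substituted into the template.
CUSTOM_SUMMARY_PROMPT = """\
Summarize the conversation below for an assistant that will continue it.
Preserve: the user's overall goal, decisions already made, and open questions.
Omit: greetings, repeated tool output, and steps that are already complete.
Return only the summary text.

Conversation:
{messages}
"""

# Render the template with a tiny sample conversation to check the placeholder.
rendered = CUSTOM_SUMMARY_PROMPT.format(messages="user: hi\nassistant: hello")
print(rendered)
```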
## How It Works

### Summarization Flow

1. **Monitoring**: Before each model call, the middleware counts tokens in the message history
2. **Trigger Check**: If any configured threshold is met, summarization is triggered
3. **Message Partitioning**: Messages are split into:
   - Messages to summarize (older messages beyond the `keep` threshold)
   - Messages to preserve (recent messages within the `keep` threshold)
4. **Summary Generation**: The model generates a concise summary of the older messages
5. **Context Replacement**: The message history is updated:
   - All old messages are removed
   - A single summary message is added
   - Recent messages are preserved
6. **AI/Tool Pair Protection**: The system ensures AI messages and their corresponding tool messages stay together

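Steps 1 to 5 of the flow can be sketched as a single function. This is a simplified illustration, not the middleware's code: messages are plain dictionaries, only a single token-based trigger and a message-count `keep` are modeled, AI/Tool pair protection is left out, and `summarize_if_needed`, `count_tokens`, and `summarize` are hypothetical names.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # e.g. {"role": "human", "content": "..."}


def summarize_if_needed(
    messages: List[Message],
    token_limit: int,
    keep_last: int,
    count_tokens: Callable[[List[Message]], int],
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Return the history unchanged, or a summary followed by recent messages."""
    # Steps 1-2: count tokens and check the (single, token-based) trigger.
    if count_tokens(messages) < token_limit:
        return messages
    # Step 3: split into older messages and the recent tail to preserve.
    cutoff = max(0, len(messages) - keep_last)
    older, recent = messages[:cutoff], messages[cutoff:]
    if not older:
        return messages  # nothing old enough to summarize
    # Step 4: ask the model (here: any callable) for a summary.
    summary_text = summarize(older)
    # Step 5: replace the older messages with one summary message.
    summary_message = {
        "role": "human",
        "content": f"Here is a summary of the conversation to date:\n\n{summary_text}",
    }
    return [summary_message] + recent


history = [{"role": "human", "content": f"message {i}"} for i in range(6)]
result = summarize_if_needed(
    history,
    token_limit=10,
    keep_last=2,
    count_tokens=lambda msgs: sum(len(m["content"]) for m in msgs),  # crude stand-in
    summarize=lambda msgs: f"{len(msgs)} earlier messages",  # stub instead of an LLM call
)
print(len(result))  # 3: one summary message plus the two most recent messages
```

Passing the counter and the summarizer in as callables keeps the sketch testable without a model; in the real system both are supplied by the middleware.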
### Token Counting

- Uses approximate token counting based on character count
- For Anthropic models: ~3.3 characters per token
- For other models: Uses LangChain's default estimation
- Can be customized with a custom `token_counter` function

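Character-based estimation is simple enough to sketch directly. The functions below are illustrative stand-ins rather than LangChain's own counter; the 3.3 ratio comes from the list above, while the 4.0 default is an assumption chosen for this example.

```python
import math
from typing import Iterable


def approximate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate tokens from character count, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def count_history_tokens(contents: Iterable[str], chars_per_token: float = 4.0) -> int:
    """Sum the estimate over every message's text content."""
    return sum(approximate_tokens(text, chars_per_token) for text in contents)


text = "x" * 100
print(approximate_tokens(text, chars_per_token=3.3))  # 31
print(approximate_tokens(text))                       # 25
```

A lower characters-per-token ratio yields a higher estimate for the same text, so the same token threshold is reached sooner.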
### Message Preservation

The middleware preserves message context as follows:

- **Recent Messages**: Always kept intact based on `keep` configuration
- **AI/Tool Pairs**: Never split - if a cutoff point falls within tool messages, the system adjusts to keep the entire AI + Tool message sequence together
- **Summary Format**: Summary is injected as a `HumanMessage` with the format:
  ```
  Here is a summary of the conversation to date:

  [Generated summary text]
  ```

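The AI/Tool pair rule can be illustrated with a small cutoff adjustment. This sketch assumes messages are dictionaries with a `role` field and moves the cutoff earlier until it no longer points at a tool message; the helper name `adjust_cutoff` is hypothetical, and the real middleware may resolve the boundary in a different direction.

```python
from typing import Dict, List


def adjust_cutoff(messages: List[Dict[str, str]], cutoff: int) -> int:
    """Move the cutoff earlier until it no longer points at a tool message."""
    while 0 < cutoff < len(messages) and messages[cutoff]["role"] == "tool":
        cutoff -= 1
    return cutoff


roles = ["human", "ai", "tool", "tool", "human", "ai"]
messages = [{"role": r, "content": ""} for r in roles]
print(adjust_cutoff(messages, 3))  # 1: the AI message and both tool results stay together
print(adjust_cutoff(messages, 4))  # 4: already on a safe boundary
```

Everything from the returned index onward is preserved, so a tool result is never kept without the AI message that requested it.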
## Best Practices

### Choosing Trigger Thresholds

1. **Token-based triggers**: Recommended for most use cases
   - Set to 60-80% of your model's context window
   - Example: For an 8K context window, use roughly 4800-6400 tokens

2. **Message-based triggers**: Useful for controlling conversation length
   - Good for applications with many short messages
   - Example: 50-100 messages depending on average message length

3. **Fraction-based triggers**: Ideal when using multiple models
   - Automatically adapts to each model's capacity
   - Example: 0.8 (80% of model's max input tokens)

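The 60-80% guideline is plain arithmetic; a minimal helper (the name `token_trigger_range` is made up for this example) shows the resulting thresholds for a given context window.

```python
def token_trigger_range(context_window: int, low_pct: int = 60, high_pct: int = 80) -> tuple:
    """Return the (low, high) token thresholds for a context window."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return context_window * low_pct // 100, context_window * high_pct // 100


print(token_trigger_range(8000))  # (4800, 6400)
print(token_trigger_range(8192))  # (4915, 6553)
```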
### Choosing Retention Policy (`keep`)

1. **Message-based retention**: Best for most scenarios
   - Preserves natural conversation flow
   - Recommended: 15-25 messages

2. **Token-based retention**: Use when precise control is needed
   - Good for managing exact token budgets
   - Recommended: 2000-4000 tokens

3. **Fraction-based retention**: For multi-model setups
   - Automatically scales with model capacity
   - Recommended: 0.2-0.4 (20-40% of max input)

### Model Selection

- **Recommended**: Use a lightweight, cost-effective model for summaries
  - Examples: `gpt-4o-mini`, `claude-haiku`, or equivalent
  - Summaries don't require the most powerful models
  - Significant cost savings on high-volume applications

- **Default**: If `model_name` is `null`, uses the default model
  - May be more expensive but ensures consistency
  - Good for simple setups

### Optimization Tips

1. **Balance triggers**: Combine token and message triggers for robust handling
   ```yaml
   trigger:
     - type: tokens
       value: 4000
     - type: messages
       value: 50
   ```

2. **Conservative retention**: Keep more messages initially, adjust based on performance
   ```yaml
   keep:
     type: messages
     value: 25  # Start higher, reduce if needed
   ```

3. **Trim strategically**: Limit tokens sent to summarization model
   ```yaml
   trim_tokens_to_summarize: 4000  # Prevents expensive summarization calls
   ```

4. **Monitor and iterate**: Track summary quality and adjust configuration

## Troubleshooting

### Summary Quality Issues

**Problem**: Summaries losing important context

**Solutions**:
1. Increase `keep` value to preserve more messages
2. Decrease trigger thresholds to summarize earlier
3. Customize `summary_prompt` to emphasize key information
4. Use a more capable model for summarization

### Performance Issues

**Problem**: Summarization calls taking too long

**Solutions**:
1. Use a faster model for summaries (e.g., `gpt-4o-mini`)
2. Reduce `trim_tokens_to_summarize` to send less context
3. Increase trigger thresholds to summarize less frequently

### Token Limit Errors

**Problem**: Still hitting token limits despite summarization

**Solutions**:
1. Lower trigger thresholds to summarize earlier
2. Reduce `keep` value to preserve fewer messages
3. Check if individual messages are very large
4. Consider using fraction-based triggers

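One way to check for very large individual messages is to estimate each message's size against the token budget. The helper below is a hypothetical diagnostic, not part of DeerFlow; it assumes roughly 4 characters per token and flags any message that alone exceeds a chosen share of the budget.

```python
from typing import List, Tuple


def oversized_messages(
    contents: List[str],
    max_share: float,
    token_budget: int,
    chars_per_token: float = 4.0,
) -> List[Tuple[int, int]]:
    """Return (index, estimated_tokens) for messages above max_share of the budget."""
    limit = max_share * token_budget
    flagged = []
    for index, text in enumerate(contents):
        estimated = len(text) / chars_per_token
        if estimated > limit:
            flagged.append((index, round(estimated)))
    return flagged


contents = ["short question", "x" * 12000, "ok"]
print(oversized_messages(contents, max_share=0.25, token_budget=8000))  # [(1, 3000)]
```

A single message that consumes a large share of the budget (for example, a long tool output) can push the history past the limit even right after summarization, because recent messages are kept verbatim.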
## Implementation Details

### Code Structure

- **Configuration**: `src/config/summarization_config.py`
- **Integration**: `src/agents/lead_agent/agent.py`
- **Middleware**: Uses `langchain.agents.middleware.SummarizationMiddleware`

### Middleware Order

Summarization runs after ThreadData and Sandbox initialization but before Title and Clarification:

1. ThreadDataMiddleware
2. SandboxMiddleware
3. **SummarizationMiddleware** ← Runs here
4. TitleMiddleware
5. ClarificationMiddleware

### State Management

- Summarization is stateless - configuration is loaded once at startup
- Summaries are added as regular messages in the conversation history
- The checkpointer persists the summarized history automatically

## Example Configurations

### Minimal Configuration
```yaml
summarization:
  enabled: true
  trigger:
    type: tokens
    value: 4000
  keep:
    type: messages
    value: 20
```

### Production Configuration
```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini  # Lightweight model for cost efficiency
  trigger:
    - type: tokens
      value: 6000
    - type: messages
      value: 75
  keep:
    type: messages
    value: 25
  trim_tokens_to_summarize: 5000
```

### Multi-Model Configuration
```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini
  trigger:
    type: fraction
    value: 0.7  # 70% of model's max input
  keep:
    type: fraction
    value: 0.3  # Keep 30% of max input
  trim_tokens_to_summarize: 4000
```

### Conservative Configuration (High Quality)
```yaml
summarization:
  enabled: true
  model_name: gpt-4  # Use full model for high-quality summaries
  trigger:
    type: tokens
    value: 8000
  keep:
    type: messages
    value: 40  # Keep more context
  trim_tokens_to_summarize: null  # No trimming
```

## References

- [LangChain Summarization Middleware Documentation](https://docs.langchain.com/oss/python/langchain/middleware/built-in#summarization)
- [LangChain Source Code](https://github.com/langchain-ai/langchain)