DeerFlow includes automatic conversation summarization to handle long conversations that approach model token limits. When enabled, the system automatically condenses older messages while preserving recent context.
## Overview
The summarization feature uses LangChain's `SummarizationMiddleware` to monitor conversation history and trigger summarization based on configurable thresholds. When activated, it:
1. Monitors message token counts in real-time
2. Triggers summarization when thresholds are met
3. Keeps recent messages intact while summarizing older exchanges
4. Maintains AI/Tool message pairs together for context continuity
5. Injects the summary back into the conversation
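For orientation, the sketch below shows how such a middleware is typically attached to a LangChain agent. It is a minimal, hypothetical example, not DeerFlow's actual wiring; the model names are placeholders, and the `trigger`/`keep` tuple syntax assumes a LangChain version matching the options documented below.

```python
# Minimal sketch (not DeerFlow's actual code): attaching SummarizationMiddleware
# to an agent. Model names are placeholders; trigger/keep mirror the config
# options documented below.
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware

agent = create_agent(
    model="openai:gpt-4o",  # main agent model (placeholder)
    tools=[],
    middleware=[
        SummarizationMiddleware(
            model="openai:gpt-4o-mini",     # lightweight summarizer
            trigger=[("tokens", 4000)],     # summarize once history hits 4000 tokens
            keep=("messages", 20),          # always keep the 20 newest messages
            trim_tokens_to_summarize=4000,  # cap input to the summary call
        )
    ],
)
```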
## Configuration
Summarization is configured in `config.yaml` under the `summarization` key:
```yaml
summarization:
  enabled: true
  model_name: null # Use default model or specify a lightweight model

  # Trigger conditions (OR logic - any condition triggers summarization)
  trigger:
    - type: tokens
      value: 4000
    # Additional triggers (optional)
    # - type: messages
    #   value: 50
    # - type: fraction
    #   value: 0.8 # 80% of model's max input tokens

  # Context retention policy
  keep:
    type: messages
    value: 20

  # Token trimming for summarization call
  trim_tokens_to_summarize: 4000

  # Custom summary prompt (optional)
  summary_prompt: null
```
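How these keys reach the middleware is an implementation detail, but the mapping is conceptually direct. The following is hypothetical glue code (the file name, key layout, and tuple form follow the YAML above):

```python
# Hypothetical glue code: turning the YAML section above into middleware
# arguments. Assumes PyYAML and the trigger/keep tuple form shown earlier.
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f).get("summarization", {})

if cfg.get("enabled"):
    raw_trigger = cfg["trigger"]
    # A single mapping or a list of mappings are both accepted in the config.
    if isinstance(raw_trigger, dict):
        raw_trigger = [raw_trigger]
    triggers = [(t["type"], t["value"]) for t in raw_trigger]
    keep = (cfg["keep"]["type"], cfg["keep"]["value"])
```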
### Configuration Options
#### `enabled`
- **Type**: Boolean
- **Default**: `false`
- **Description**: Enable or disable automatic summarization
#### `model_name`
- **Type**: String or null
- **Default**: `null` (uses default model)
- **Description**: Model to use for generating summaries. A lightweight, cost-effective model such as `gpt-4o-mini` or equivalent is recommended.
#### `trigger`
- **Type**: Single `ContextSize` or list of `ContextSize` objects
- **Required**: At least one trigger must be specified when enabled
- **Description**: Thresholds that trigger summarization. Uses OR logic - summarization runs when ANY threshold is met (a sketch of this evaluation follows the examples below).
**ContextSize Types:**
1. **Token-based trigger**: Activates when the token count reaches the specified value

   ```yaml
   trigger:
     type: tokens
     value: 4000
   ```

2. **Message-based trigger**: Activates when the message count reaches the specified value

   ```yaml
   trigger:
     type: messages
     value: 50
   ```

3. **Fraction-based trigger**: Activates when token usage reaches the given fraction of the model's maximum input tokens

   ```yaml
   trigger:
     type: fraction
     value: 0.8 # 80% of max input tokens
   ```
**Multiple Triggers:**
```yaml
trigger:
  - type: tokens
    value: 4000
  - type: messages
    value: 50
```
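Conceptually, the OR logic behaves like the sketch below (illustrative only; `max_input_tokens` stands in for the model's input limit):

```python
# Illustrative evaluation of OR logic across triggers: any satisfied
# threshold is enough to start summarization.
def should_summarize(token_count, message_count, max_input_tokens, triggers):
    for kind, value in triggers:
        if kind == "tokens" and token_count >= value:
            return True
        if kind == "messages" and message_count >= value:
            return True
        if kind == "fraction" and token_count >= value * max_input_tokens:
            return True
    return False

# e.g. 4200 tokens over a 4000-token trigger -> True
print(should_summarize(4200, 30, 128_000, [("tokens", 4000), ("messages", 50)]))
```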
#### `keep`
- **Type**: `ContextSize` object
- **Default**: `{type: messages, value: 20}`
- **Description**: Specifies how much recent conversation history to preserve after summarization.
**Examples:**
```yaml
# Keep the most recent 20 messages
keep:
  type: messages
  value: 20

# Keep the most recent 3000 tokens
keep:
  type: tokens
  value: 3000

# Keep the most recent 30% of the model's max input tokens
keep:
  type: fraction
  value: 0.3
```
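For token-based retention, the cutoff is conceptually found by walking backwards from the newest message until the budget is spent, along these lines (a sketch, not the middleware's code):

```python
# Conceptual sketch of token-based retention: keep the newest messages that
# fit within the token budget; everything older becomes input to the summary.
def keep_by_tokens(messages, budget, count_tokens):
    kept = []
    for msg in reversed(messages):
        cost = count_tokens([msg])
        if cost > budget:
            break
        budget -= cost
        kept.append(msg)
    kept.reverse()  # restore chronological order
    return kept
```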
#### `trim_tokens_to_summarize`
- **Type**: Integer or null
- **Default**: `4000`
- **Description**: Maximum tokens to include when preparing messages for the summarization call itself. Set to `null` to skip trimming (not recommended for very long conversations).
#### `summary_prompt`
- **Type**: String or null
- **Default**: `null` (uses LangChain's default summary prompt)
- **Description**: Custom prompt template for generating summaries. The prompt should guide the model to extract the most important context.
**Default Prompt Behavior:**
The default LangChain prompt instructs the model to:
- Extract highest quality/most relevant context
- Focus on information critical to the overall goal
- Avoid repeating completed actions
- Return only the extracted context
## How It Works
### Summarization Flow
1. **Monitoring**: Before each model call, the middleware counts tokens in the message history
2. **Trigger Check**: If any configured threshold is met, summarization is triggered
3. **Message Partitioning**: Messages are split into:
   - Messages to summarize (older messages beyond the `keep` threshold)
   - Messages to preserve (recent messages within the `keep` threshold)
4. **Summary Generation**: The model generates a concise summary of the older messages
5. **Context Replacement**: The message history is updated:
   - All old messages are removed
   - A single summary message is added
   - Recent messages are preserved
6. **AI/Tool Pair Protection**: The system ensures AI messages and their corresponding tool messages stay together
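Conceptually, steps 3-5 amount to the following (a simplified sketch that ignores the AI/Tool pair adjustment from step 6; `summarize` stands in for the summary model call):

```python
# Simplified sketch of partition -> summarize -> replace. The summary prefix
# matches the format shown under "Message Preservation" below.
from langchain_core.messages import HumanMessage

def apply_summarization(messages, keep_count, summarize):
    to_summarize = messages[:-keep_count]   # older messages
    to_keep = messages[-keep_count:]        # recent messages, kept verbatim
    summary_text = summarize(to_summarize)  # one call to the summary model
    summary_msg = HumanMessage(
        content=f"Here is a summary of the conversation to date:\n\n{summary_text}"
    )
    return [summary_msg] + to_keep
```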
### Token Counting
- Uses approximate token counting based on character count
- For Anthropic models: ~3.3 characters per token
- For other models: Uses LangChain's default estimation
- Can be overridden with a custom `token_counter` function (see the sketch below)
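As a rough illustration, a character-based counter in the spirit described above might look like this (heuristic only; a real counter should use the provider's tokenizer when available):

```python
# Heuristic token counter (~3.3 characters per token, per the note above).
# Coerces content to str to stay robust against non-string message content.
def approx_token_counter(messages) -> int:
    total_chars = sum(len(str(m.content)) for m in messages)
    return max(1, round(total_chars / 3.3))
```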
### Message Preservation
The middleware intelligently preserves message context:
- **Recent Messages**: Always kept intact based on `keep` configuration
- **AI/Tool Pairs**: Never split - if a cutoff point falls within tool messages, the system adjusts the cutoff to keep the entire AI + Tool message sequence together (see the sketch after this list)
- **Summary Format**: The summary is injected as a `HumanMessage` with the format:
```
Here is a summary of the conversation to date:
[Generated summary text]
```
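The pair-protection rule can be pictured as a cutoff adjustment (a conceptual sketch, not the middleware's implementation):

```python
# Conceptual sketch: messages[:cutoff] get summarized and messages[cutoff:]
# are kept. If the cutoff lands inside a run of tool results, move it back
# to the AI message that requested them, so the pair stays together.
from langchain_core.messages import ToolMessage

def adjust_cutoff(messages, cutoff):
    while cutoff > 0 and isinstance(messages[cutoff], ToolMessage):
        cutoff -= 1
    return cutoff
```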
## Best Practices
### Choosing Trigger Thresholds
1. **Token-based triggers**: Recommended for most use cases
   - Set to 60-80% of your model's context window
   - Example: for an 8K context window, use 4000-6000 tokens
2. **Message-based triggers**: Useful for controlling conversation length
   - Good for applications with many short messages
   - Example: 50-100 messages, depending on average message length
3. **Fraction-based triggers**: Ideal when using multiple models
   - Automatically adapts to each model's capacity
   - Example: 0.8 (80% of max input tokens), which triggers at 102,400 tokens on a 128K-input model
### Choosing Retention Policy (`keep`)
1. **Message-based retention**: Best for most scenarios
   - Preserves natural conversation flow
   - Recommended: 15-25 messages
2. **Token-based retention**: Use when precise control is needed
   - Good for managing exact token budgets
   - Recommended: 2000-4000 tokens
3. **Fraction-based retention**: For multi-model setups
   - Automatically scales with model capacity
   - Recommended: 0.2-0.4 (20-40% of max input tokens)
### Model Selection
- **Recommended**: Use a lightweight, cost-effective model for summaries
  - Examples: `gpt-4o-mini`, `claude-haiku`, or equivalent
  - Summaries don't require the most powerful models
  - Significant cost savings in high-volume applications
- **Default**: If `model_name` is `null`, the default model is used
  - May be more expensive, but ensures consistency
  - Good for simple setups
### Optimization Tips
1. **Balance triggers**: Combine token and message triggers for robust handling

   ```yaml
   trigger:
     - type: tokens
       value: 4000
     - type: messages
       value: 50
   ```

2. **Conservative retention**: Keep more messages initially, then adjust based on performance

   ```yaml
   keep:
     type: messages
     value: 25 # Start higher, reduce if needed
   ```

3. **Trim strategically**: Limit the tokens sent to the summarization model (e.g. `trim_tokens_to_summarize: 3000`) so that summary calls stay fast and cheap even for very long histories