feat: implement summarization (#14)

Commit f0a2381bd5 (parent 1352b0e0ba)
Author: DanielWalnut
Date: 2026-01-19 16:17:31 +08:00
Committed by: GitHub
8 changed files with 555 additions and 5 deletions

View File

@@ -81,14 +81,27 @@ Config values starting with `$` are resolved as environment variables (e.g., `$O
- Local sandbox: `/path/to/deer-flow/skills` mounted at `/mnt/skills`
- Docker sandbox: Automatically mounted as volume
**Middleware System**
- Custom middlewares in `src/agents/middlewares/`: Title generation, thread data, clarification, etc.
- `SummarizationMiddleware` from LangChain automatically condenses conversation history when token limits are approached
- Configured in `config.yaml` under `summarization` key with trigger/keep thresholds
- Middlewares are registered in `src/agents/lead_agent/agent.py` with execution order:
1. `ThreadDataMiddleware` - Initializes thread context
2. `SandboxMiddleware` - Manages sandbox lifecycle
3. `SummarizationMiddleware` - Reduces context when limits are approached (if enabled)
4. `TitleMiddleware` - Generates conversation titles
5. `ClarificationMiddleware` - Handles clarification requests (must be last)
### Config Schema
Models, tools, sandbox providers, skills, and middleware settings are configured in `config.yaml`:
- `models[]`: LLM configurations with `use` class path
- `tools[]`: Tool configurations with `use` variable path and `group`
- `sandbox.use`: Sandbox provider class path
- `skills.path`: Host path to skills directory (optional, default: `../skills`)
- `skills.container_path`: Container mount path (default: `/mnt/skills`)
- `title`: Automatic thread title generation configuration
- `summarization`: Automatic conversation summarization configuration
## Code Style

View File

@@ -4,7 +4,9 @@
[x] Launch the sandbox only after the first file system or bash tool is called
[ ] Pooling the sandbox resources to reduce the number of sandbox containers
[x] Add Clarification Process for the whole process
[x] Implement Context Summarization Mechanism to avoid context explosion
[ ] Integrate MCP
## Issues

View File

@@ -0,0 +1,353 @@
# Conversation Summarization
DeerFlow includes automatic conversation summarization to handle long conversations that approach model token limits. When enabled, the system automatically condenses older messages while preserving recent context.
## Overview
The summarization feature uses LangChain's `SummarizationMiddleware` to monitor conversation history and trigger summarization based on configurable thresholds. When activated, it:
1. Monitors message token counts in real-time
2. Triggers summarization when thresholds are met
3. Keeps recent messages intact while summarizing older exchanges
4. Maintains AI/Tool message pairs together for context continuity
5. Injects the summary back into the conversation
## Configuration
Summarization is configured in `config.yaml` under the `summarization` key:
```yaml
summarization:
  enabled: true
  model_name: null # Use default model or specify a lightweight model

  # Trigger conditions (OR logic - any condition triggers summarization)
  trigger:
    - type: tokens
      value: 4000
    # Additional triggers (optional)
    # - type: messages
    #   value: 50
    # - type: fraction
    #   value: 0.8 # 80% of model's max input tokens

  # Context retention policy
  keep:
    type: messages
    value: 20

  # Token trimming for summarization call
  trim_tokens_to_summarize: 4000

  # Custom summary prompt (optional)
  summary_prompt: null
```
### Configuration Options
#### `enabled`
- **Type**: Boolean
- **Default**: `false`
- **Description**: Enable or disable automatic summarization
#### `model_name`
- **Type**: String or null
- **Default**: `null` (uses default model)
- **Description**: Model to use for generating summaries. Recommended to use a lightweight, cost-effective model like `gpt-4o-mini` or equivalent.
#### `trigger`
- **Type**: Single `ContextSize` or list of `ContextSize` objects
- **Required**: At least one trigger must be specified when enabled
- **Description**: Thresholds that trigger summarization. Uses OR logic - summarization runs when ANY threshold is met.
**ContextSize Types:**
1. **Token-based trigger**: Activates when token count reaches the specified value
```yaml
trigger:
  type: tokens
  value: 4000
```
2. **Message-based trigger**: Activates when message count reaches the specified value
```yaml
trigger:
  type: messages
  value: 50
```
3. **Fraction-based trigger**: Activates when token usage reaches a percentage of the model's maximum input tokens
```yaml
trigger:
  type: fraction
  value: 0.8 # 80% of max input tokens
```
**Multiple Triggers:**
```yaml
trigger:
  - type: tokens
    value: 4000
  - type: messages
    value: 50
```
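The OR semantics above can be sketched as a small check. This is illustrative only; `check_triggers` and its signature are not part of the actual middleware API, and the counts would come from the live message history:

```python
def check_triggers(triggers, token_count, message_count, max_input_tokens):
    """Return True when ANY (type, value) threshold is met (OR logic)."""
    for kind, value in triggers:
        if kind == "tokens" and token_count >= value:
            return True
        if kind == "messages" and message_count >= value:
            return True
        # fraction: compare token usage against a share of the model's capacity
        if kind == "fraction" and token_count >= value * max_input_tokens:
            return True
    return False
```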
#### `keep`
- **Type**: `ContextSize` object
- **Default**: `{type: messages, value: 20}`
- **Description**: Specifies how much recent conversation history to preserve after summarization.
**Examples:**
```yaml
# Keep most recent 20 messages
keep:
  type: messages
  value: 20

# Keep most recent 3000 tokens
keep:
  type: tokens
  value: 3000

# Keep most recent 30% of model's max input tokens
keep:
  type: fraction
  value: 0.3
```
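One way to read the `keep` policy is as a resolver that decides how many trailing messages survive. This is a sketch, not the middleware's internal code; `count_tokens` here is a stand-in counter:

```python
def messages_to_keep(messages, keep, max_input_tokens, count_tokens=len):
    """Resolve a (type, value) keep policy into a count of trailing messages."""
    kind, value = keep
    if kind == "messages":
        return min(int(value), len(messages))
    # tokens / fraction: walk backwards until the budget is exhausted
    budget = value * max_input_tokens if kind == "fraction" else value
    kept = total = 0
    for message in reversed(messages):
        total += count_tokens(message)
        if total > budget:
            break
        kept += 1
    return kept
```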
#### `trim_tokens_to_summarize`
- **Type**: Integer or null
- **Default**: `4000`
- **Description**: Maximum tokens to include when preparing messages for the summarization call itself. Set to `null` to skip trimming (not recommended for very long conversations).
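The trimming step can be pictured like this (a sketch, not the middleware's internal code; `count_tokens` is a placeholder counter):

```python
def trim_for_summary(messages, max_tokens, count_tokens=len):
    """Drop the oldest messages until what remains fits the summarization budget."""
    if max_tokens is None:
        return messages  # null config: skip trimming entirely
    total = sum(count_tokens(m) for m in messages)
    start = 0
    while total > max_tokens and start < len(messages):
        total -= count_tokens(messages[start])
        start += 1
    return messages[start:]
```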
#### `summary_prompt`
- **Type**: String or null
- **Default**: `null` (uses LangChain's default prompt)
- **Description**: Custom prompt template for generating summaries. The prompt should guide the model to extract the most important context.
**Default Prompt Behavior:**
The default LangChain prompt instructs the model to:
- Extract highest quality/most relevant context
- Focus on information critical to the overall goal
- Avoid repeating completed actions
- Return only the extracted context
## How It Works
### Summarization Flow
1. **Monitoring**: Before each model call, the middleware counts tokens in the message history
2. **Trigger Check**: If any configured threshold is met, summarization is triggered
3. **Message Partitioning**: Messages are split into:
- Messages to summarize (older messages beyond the `keep` threshold)
- Messages to preserve (recent messages within the `keep` threshold)
4. **Summary Generation**: The model generates a concise summary of the older messages
5. **Context Replacement**: The message history is updated:
- All old messages are removed
- A single summary message is added
- Recent messages are preserved
6. **AI/Tool Pair Protection**: The system ensures AI messages and their corresponding tool messages stay together
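The flow above, reduced to a minimal sketch over plain strings (the real middleware operates on LangChain message objects; `summarize` stands in for the model call):

```python
def apply_summarization(messages, keep_last, summarize):
    """Partition history, summarize the older part, rebuild the message list."""
    cut = max(len(messages) - keep_last, 0)
    older, recent = messages[:cut], messages[cut:]
    if not older:
        return messages  # nothing old enough to summarize
    # The summary replaces all older messages as a single message
    summary = "Here is a summary of the conversation to date:\n\n" + summarize(older)
    return [summary] + recent
```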
### Token Counting
- Uses approximate token counting based on character count
- For Anthropic models: ~3.3 characters per token
- For other models: Uses LangChain's default estimation
- Can be customized with a custom `token_counter` function
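A character-based estimate like the one described is trivial to sketch (the 3.3 ratio is the Anthropic approximation mentioned above; treat this as an illustration, not the library's exact counter):

```python
def approx_token_count(text, chars_per_token=3.3):
    """Rough token estimate from character count; cheap but imprecise."""
    return round(len(text) / chars_per_token)
```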
### Message Preservation
The middleware intelligently preserves message context:
- **Recent Messages**: Always kept intact based on `keep` configuration
- **AI/Tool Pairs**: Never split - if a cutoff point falls within tool messages, the system adjusts to keep the entire AI + Tool message sequence together
- **Summary Format**: Summary is injected as a HumanMessage with the format:
```
Here is a summary of the conversation to date:
[Generated summary text]
```
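The pair-protection adjustment can be sketched as follows (illustrative; roles are simplified to strings, and the helper name is hypothetical):

```python
def adjust_cutoff(messages, cutoff):
    """Move the cutoff back so tool results are never split from their AI message."""
    while 0 < cutoff < len(messages) and messages[cutoff]["role"] == "tool":
        cutoff -= 1  # pull the preceding AI (tool-calling) message into the kept region
    return cutoff
```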
## Best Practices
### Choosing Trigger Thresholds
1. **Token-based triggers**: Recommended for most use cases
- Set to 60-80% of your model's context window
- Example: For 8K context, use 4000-6000 tokens
2. **Message-based triggers**: Useful for controlling conversation length
- Good for applications with many short messages
- Example: 50-100 messages depending on average message length
3. **Fraction-based triggers**: Ideal when using multiple models
- Automatically adapts to each model's capacity
- Example: 0.8 (80% of model's max input tokens)
### Choosing Retention Policy (`keep`)
1. **Message-based retention**: Best for most scenarios
- Preserves natural conversation flow
- Recommended: 15-25 messages
2. **Token-based retention**: Use when precise control is needed
- Good for managing exact token budgets
- Recommended: 2000-4000 tokens
3. **Fraction-based retention**: For multi-model setups
- Automatically scales with model capacity
- Recommended: 0.2-0.4 (20-40% of max input)
### Model Selection
- **Recommended**: Use a lightweight, cost-effective model for summaries
- Examples: `gpt-4o-mini`, `claude-haiku`, or equivalent
- Summaries don't require the most powerful models
- Significant cost savings on high-volume applications
- **Default**: If `model_name` is `null`, uses the default model
- May be more expensive but ensures consistency
- Good for simple setups
### Optimization Tips
1. **Balance triggers**: Combine token and message triggers for robust handling
```yaml
trigger:
  - type: tokens
    value: 4000
  - type: messages
    value: 50
```
2. **Conservative retention**: Keep more messages initially, adjust based on performance
```yaml
keep:
  type: messages
  value: 25 # Start higher, reduce if needed
```
3. **Trim strategically**: Limit tokens sent to summarization model
```yaml
trim_tokens_to_summarize: 4000 # Prevents expensive summarization calls
```
4. **Monitor and iterate**: Track summary quality and adjust configuration
## Troubleshooting
### Summary Quality Issues
**Problem**: Summaries losing important context
**Solutions**:
1. Increase `keep` value to preserve more messages
2. Decrease trigger thresholds to summarize earlier
3. Customize `summary_prompt` to emphasize key information
4. Use a more capable model for summarization
### Performance Issues
**Problem**: Summarization calls taking too long
**Solutions**:
1. Use a faster model for summaries (e.g., `gpt-4o-mini`)
2. Reduce `trim_tokens_to_summarize` to send less context
3. Increase trigger thresholds to summarize less frequently
### Token Limit Errors
**Problem**: Still hitting token limits despite summarization
**Solutions**:
1. Lower trigger thresholds to summarize earlier
2. Reduce `keep` value to preserve fewer messages
3. Check if individual messages are very large
4. Consider using fraction-based triggers
## Implementation Details
### Code Structure
- **Configuration**: `src/config/summarization_config.py`
- **Integration**: `src/agents/lead_agent/agent.py`
- **Middleware**: Uses `langchain.agents.middleware.SummarizationMiddleware`
### Middleware Order
Summarization runs after ThreadData and Sandbox initialization but before Title and Clarification:
1. ThreadDataMiddleware
2. SandboxMiddleware
3. **SummarizationMiddleware** ← Runs here
4. TitleMiddleware
5. ClarificationMiddleware
### State Management
- Summarization is stateless - configuration is loaded once at startup
- Summaries are added as regular messages in the conversation history
- The checkpointer persists the summarized history automatically
## Example Configurations
### Minimal Configuration
```yaml
summarization:
  enabled: true
  trigger:
    type: tokens
    value: 4000
  keep:
    type: messages
    value: 20
```
### Production Configuration
```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini # Lightweight model for cost efficiency
  trigger:
    - type: tokens
      value: 6000
    - type: messages
      value: 75
  keep:
    type: messages
    value: 25
  trim_tokens_to_summarize: 5000
```
### Multi-Model Configuration
```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini
  trigger:
    type: fraction
    value: 0.7 # 70% of model's max input
  keep:
    type: fraction
    value: 0.3 # Keep 30% of max input
  trim_tokens_to_summarize: 4000
```
### Conservative Configuration (High Quality)
```yaml
summarization:
  enabled: true
  model_name: gpt-4 # Use full model for high-quality summaries
  trigger:
    type: tokens
    value: 8000
  keep:
    type: messages
    value: 40 # Keep more context
  trim_tokens_to_summarize: null # No trimming
```
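A minimal sanity check for a parsed `summarization` block might look like this (a hypothetical stdlib-only helper for illustration — the project itself validates the config with Pydantic models):

```python
VALID_TYPES = {"tokens", "messages", "fraction"}

def validate_trigger(trigger):
    """Normalize a single {type, value} mapping or a list of them; raise on bad input."""
    specs = trigger if isinstance(trigger, list) else [trigger]
    for spec in specs:
        if spec.get("type") not in VALID_TYPES:
            raise ValueError(f"unknown trigger type: {spec.get('type')!r}")
        if spec["type"] == "fraction" and not 0 < spec["value"] <= 1:
            raise ValueError("fraction value must be in (0, 1]")
    return specs
```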
## References
- [LangChain Summarization Middleware Documentation](https://docs.langchain.com/oss/python/langchain/middleware/built-in#summarization)
- [LangChain Source Code](https://github.com/langchain-ai/langchain)

View File

@@ -1,4 +1,5 @@
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware
from langchain_core.runnables import RunnableConfig
from src.agents.lead_agent.prompt import apply_prompt_template
@@ -6,12 +7,66 @@ from src.agents.middlewares.clarification_middleware import ClarificationMiddlew
from src.agents.middlewares.thread_data_middleware import ThreadDataMiddleware
from src.agents.middlewares.title_middleware import TitleMiddleware
from src.agents.thread_state import ThreadState
from src.config.summarization_config import get_summarization_config
from src.models import create_chat_model
from src.sandbox.middleware import SandboxMiddleware
def _create_summarization_middleware() -> SummarizationMiddleware | None:
    """Create and configure the summarization middleware from config."""
    config = get_summarization_config()
    if not config.enabled:
        return None

    # Prepare trigger parameter
    trigger = None
    if config.trigger is not None:
        if isinstance(config.trigger, list):
            trigger = [t.to_tuple() for t in config.trigger]
        else:
            trigger = config.trigger.to_tuple()

    # Prepare keep parameter
    keep = config.keep.to_tuple()

    # Prepare model parameter
    if config.model_name:
        model = config.model_name
    else:
        # Use a lightweight model for summarization to save costs
        # Falls back to default model if not explicitly specified
        model = create_chat_model(thinking_enabled=False)

    # Prepare kwargs
    kwargs = {
        "model": model,
        "trigger": trigger,
        "keep": keep,
    }
    if config.trim_tokens_to_summarize is not None:
        kwargs["trim_tokens_to_summarize"] = config.trim_tokens_to_summarize
    if config.summary_prompt is not None:
        kwargs["summary_prompt"] = config.summary_prompt

    return SummarizationMiddleware(**kwargs)


# ThreadDataMiddleware must be before SandboxMiddleware to ensure thread_id is available
# SummarizationMiddleware should be early to reduce context before other processing
# ClarificationMiddleware should be last to intercept clarification requests after model calls
def _build_middlewares():
    middlewares = [ThreadDataMiddleware(), SandboxMiddleware()]

    # Add summarization middleware if enabled
    summarization_middleware = _create_summarization_middleware()
    if summarization_middleware is not None:
        middlewares.append(summarization_middleware)

    middlewares.extend([TitleMiddleware(), ClarificationMiddleware()])
    return middlewares
def make_lead_agent(config: RunnableConfig):
@@ -24,7 +79,7 @@ def make_lead_agent(config: RunnableConfig):
    return create_agent(
        model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled),
        tools=get_available_tools(),
        middleware=_build_middlewares(),
        system_prompt=apply_prompt_template(),
        state_schema=ThreadState,
    )

View File

@@ -89,7 +89,7 @@ You: "Deploying to staging..." [proceed]
You have access to skills that provide optimized workflows for specific tasks. Each skill contains best practices, frameworks, and references to additional resources.
**Progressive Loading Pattern:**
1. When a user query matches a skill's use case, immediately call `read_file` on the skill's main file using the path attribute provided in the skill tag below
2. Read and understand the skill's workflow and instructions
3. The skill file contains references to external resources under the same folder
4. Load referenced resources only when needed during execution

View File

@@ -9,6 +9,7 @@ from pydantic import BaseModel, ConfigDict, Field
from src.config.model_config import ModelConfig
from src.config.sandbox_config import SandboxConfig
from src.config.skills_config import SkillsConfig
from src.config.summarization_config import load_summarization_config_from_dict
from src.config.title_config import load_title_config_from_dict
from src.config.tool_config import ToolConfig, ToolGroupConfig
@@ -75,6 +76,10 @@ class AppConfig(BaseModel):
        if "title" in config_data:
            load_title_config_from_dict(config_data["title"])

        # Load summarization config if present
        if "summarization" in config_data:
            load_summarization_config_from_dict(config_data["summarization"])

        result = cls.model_validate(config_data)
        return result

View File

@@ -0,0 +1,74 @@
"""Configuration for conversation summarization."""
from typing import Literal
from pydantic import BaseModel, Field
ContextSizeType = Literal["fraction", "tokens", "messages"]
class ContextSize(BaseModel):
"""Context size specification for trigger or keep parameters."""
type: ContextSizeType = Field(description="Type of context size specification")
value: int | float = Field(description="Value for the context size specification")
def to_tuple(self) -> tuple[ContextSizeType, int | float]:
"""Convert to tuple format expected by SummarizationMiddleware."""
return (self.type, self.value)
class SummarizationConfig(BaseModel):
"""Configuration for automatic conversation summarization."""
enabled: bool = Field(
default=False,
description="Whether to enable automatic conversation summarization",
)
model_name: str | None = Field(
default=None,
description="Model name to use for summarization (None = use a lightweight model)",
)
trigger: ContextSize | list[ContextSize] | None = Field(
default=None,
description="One or more thresholds that trigger summarization. When any threshold is met, summarization runs. "
"Examples: {'type': 'messages', 'value': 50} triggers at 50 messages, "
"{'type': 'tokens', 'value': 4000} triggers at 4000 tokens, "
"{'type': 'fraction', 'value': 0.8} triggers at 80% of model's max input tokens",
)
keep: ContextSize = Field(
default_factory=lambda: ContextSize(type="messages", value=20),
description="Context retention policy after summarization. Specifies how much history to preserve. "
"Examples: {'type': 'messages', 'value': 20} keeps 20 messages, "
"{'type': 'tokens', 'value': 3000} keeps 3000 tokens, "
"{'type': 'fraction', 'value': 0.3} keeps 30% of model's max input tokens",
)
trim_tokens_to_summarize: int | None = Field(
default=4000,
description="Maximum tokens to keep when preparing messages for summarization. Pass null to skip trimming.",
)
summary_prompt: str | None = Field(
default=None,
description="Custom prompt template for generating summaries. If not provided, uses the default LangChain prompt.",
)
# Global configuration instance
_summarization_config: SummarizationConfig = SummarizationConfig()
def get_summarization_config() -> SummarizationConfig:
"""Get the current summarization configuration."""
return _summarization_config
def set_summarization_config(config: SummarizationConfig) -> None:
"""Set the summarization configuration."""
global _summarization_config
_summarization_config = config
def load_summarization_config_from_dict(config_dict: dict) -> None:
"""Load summarization configuration from a dictionary."""
global _summarization_config
_summarization_config = SummarizationConfig(**config_dict)

View File

@@ -174,3 +174,51 @@ title:
  max_words: 6
  max_chars: 60
  model_name: null # Use default model (first model in models list)
# ============================================================================
# Summarization Configuration
# ============================================================================
# Automatically summarize conversation history when token limits are approached
# This helps maintain context in long conversations without exceeding model limits
summarization:
  enabled: true

  # Model to use for summarization (null = use default model)
  # Recommended: Use a lightweight, cost-effective model like "gpt-4o-mini" or similar
  model_name: null

  # Trigger conditions - at least one required
  # Summarization runs when ANY threshold is met (OR logic)
  # You can specify a single trigger or a list of triggers
  trigger:
    # Trigger when token count reaches 4000
    - type: tokens
      value: 4000
    # Uncomment to also trigger when message count reaches 50
    # - type: messages
    #   value: 50
    # Uncomment to trigger when 80% of model's max input tokens is reached
    # - type: fraction
    #   value: 0.8

  # Context retention policy after summarization
  # Specifies how much recent history to preserve
  keep:
    # Keep the most recent 20 messages (recommended)
    type: messages
    value: 20
    # Alternative: Keep specific token count
    # type: tokens
    # value: 3000
    # Alternative: Keep percentage of model's max input tokens
    # type: fraction
    # value: 0.3

  # Maximum tokens to keep when preparing messages for summarization
  # Set to null to skip trimming (not recommended for very long conversations)
  trim_tokens_to_summarize: 4000

  # Custom summary prompt template (null = use default LangChain prompt)
  # The prompt should guide the model to extract important context
  summary_prompt: null