mirror of
https://gitee.com/wanwujie/deer-flow
synced 2026-04-03 06:12:14 +08:00
feat: implement summarization (#14)
@@ -81,14 +81,27 @@ Config values starting with `$` are resolved as environment variables (e.g., `$O
- Local sandbox: `/mnt/skills` → `/path/to/deer-flow/skills`
- Docker sandbox: Automatically mounted as volume

**Middleware System**
- Custom middlewares in `src/agents/middlewares/`: Title generation, thread data, clarification, etc.
- `SummarizationMiddleware` from LangChain automatically condenses conversation history when token limits are approached
- Configured in `config.yaml` under `summarization` key with trigger/keep thresholds
- Middlewares are registered in `src/agents/lead_agent/agent.py` with execution order:
  1. `ThreadDataMiddleware` - Initializes thread context
  2. `SandboxMiddleware` - Manages sandbox lifecycle
  3. `SummarizationMiddleware` - Reduces context when limits are approached (if enabled)
  4. `TitleMiddleware` - Generates conversation titles
  5. `ClarificationMiddleware` - Handles clarification requests (must be last)

### Config Schema

Models, tools, sandbox providers, and skills are configured in `config.yaml`:
Models, tools, sandbox providers, skills, and middleware settings are configured in `config.yaml`:
- `models[]`: LLM configurations with `use` class path
- `tools[]`: Tool configurations with `use` variable path and `group`
- `sandbox.use`: Sandbox provider class path
- `skills.path`: Host path to skills directory (optional, default: `../skills`)
- `skills.container_path`: Container mount path (default: `/mnt/skills`)
- `title`: Automatic thread title generation configuration
- `summarization`: Automatic conversation summarization configuration

## Code Style

@@ -4,7 +4,9 @@

[x] Launch the sandbox only after the first file system or bash tool is called
[ ] Pooling the sandbox resources to reduce the number of sandbox containers
[ ] Add Clarification Process for the whole process
[x] Add Clarification Process for the whole process
[x] Implement Context Summarization Mechanism to avoid context explosion\
[ ] Integrate MCP

## Issues

353
backend/docs/summarization.md
Normal file
@@ -0,0 +1,353 @@
# Conversation Summarization

DeerFlow includes automatic conversation summarization to handle long conversations that approach model token limits. When enabled, the system automatically condenses older messages while preserving recent context.

## Overview

The summarization feature uses LangChain's `SummarizationMiddleware` to monitor conversation history and trigger summarization based on configurable thresholds. When activated, it:

1. Counts message tokens before each model call
2. Triggers summarization when thresholds are met
3. Keeps recent messages intact while summarizing older exchanges
4. Keeps AI/Tool message pairs together for context continuity
5. Injects the summary back into the conversation

## Configuration

Summarization is configured in `config.yaml` under the `summarization` key:

```yaml
summarization:
  enabled: true
  model_name: null # Use default model or specify a lightweight model

  # Trigger conditions (OR logic - any condition triggers summarization)
  trigger:
    - type: tokens
      value: 4000
    # Additional triggers (optional)
    # - type: messages
    #   value: 50
    # - type: fraction
    #   value: 0.8 # 80% of model's max input tokens

  # Context retention policy
  keep:
    type: messages
    value: 20

  # Token trimming for summarization call
  trim_tokens_to_summarize: 4000

  # Custom summary prompt (optional)
  summary_prompt: null
```

### Configuration Options

#### `enabled`
- **Type**: Boolean
- **Default**: `false`
- **Description**: Enable or disable automatic summarization

#### `model_name`
- **Type**: String or null
- **Default**: `null` (uses default model)
- **Description**: Model to use for generating summaries. A lightweight, cost-effective model such as `gpt-4o-mini` or an equivalent is recommended.

#### `trigger`
- **Type**: Single `ContextSize` or list of `ContextSize` objects
- **Required**: At least one trigger must be specified when enabled
- **Description**: Thresholds that trigger summarization. Uses OR logic - summarization runs when ANY threshold is met.

**ContextSize Types:**

1. **Token-based trigger**: Activates when token count reaches the specified value
   ```yaml
   trigger:
     type: tokens
     value: 4000
   ```

2. **Message-based trigger**: Activates when message count reaches the specified value
   ```yaml
   trigger:
     type: messages
     value: 50
   ```

3. **Fraction-based trigger**: Activates when token usage reaches a percentage of the model's maximum input tokens
   ```yaml
   trigger:
     type: fraction
     value: 0.8 # 80% of max input tokens
   ```

**Multiple Triggers:**
```yaml
trigger:
  - type: tokens
    value: 4000
  - type: messages
    value: 50
```
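
The OR logic across triggers can be sketched in plain Python. This is an illustrative sketch, not LangChain's implementation: `should_summarize` and its parameters are hypothetical names, and the `(type, value)` tuples simply mirror the `trigger` entries above.

```python
from __future__ import annotations

from typing import Literal

ContextSizeType = Literal["tokens", "messages", "fraction"]
Trigger = tuple[ContextSizeType, float]


def should_summarize(
    triggers: Trigger | list[Trigger],
    token_count: int,
    message_count: int,
    max_input_tokens: int,
) -> bool:
    """Return True when ANY configured threshold is met (OR logic)."""
    # Accept both the single-trigger and the list form, as config.yaml does.
    if isinstance(triggers, tuple):
        triggers = [triggers]
    for kind, value in triggers:
        if kind == "tokens" and token_count >= value:
            return True
        if kind == "messages" and message_count >= value:
            return True
        # A fraction is measured against the model's maximum input tokens.
        if kind == "fraction" and token_count >= value * max_input_tokens:
            return True
    return False
```

With the two triggers shown above, a conversation of 50 short messages would be summarized even if it is well under 4000 tokens, because either threshold alone is sufficient.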

#### `keep`
- **Type**: `ContextSize` object
- **Default**: `{type: messages, value: 20}`
- **Description**: Specifies how much recent conversation history to preserve after summarization.

**Examples:**
```yaml
# Keep most recent 20 messages
keep:
  type: messages
  value: 20

# Keep most recent 3000 tokens
keep:
  type: tokens
  value: 3000

# Keep most recent 30% of model's max input tokens
keep:
  type: fraction
  value: 0.3
```
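
To make the three `keep` policies concrete, the sketch below turns a policy into a cutoff index: messages before the index would be summarized, messages from the index onward are kept. It is a simplified illustration under stated assumptions, not the middleware's code; `find_cutoff`, the per-message `token_counts` list, and the default `max_input_tokens` are all hypothetical.

```python
from __future__ import annotations


def find_cutoff(
    token_counts: list[int],
    keep: tuple[str, float],
    max_input_tokens: int = 8000,
) -> int:
    """Return the index of the first message to keep."""
    kind, value = keep
    if kind not in ("messages", "tokens", "fraction"):
        raise ValueError(f"unknown keep type: {kind}")
    if kind == "messages":
        return max(len(token_counts) - int(value), 0)
    # "tokens" and "fraction" both reduce to a token budget.
    budget = value if kind == "tokens" else value * max_input_tokens
    kept = 0
    # Walk backwards from the newest message until the budget is exhausted.
    for index in range(len(token_counts) - 1, -1, -1):
        if kept + token_counts[index] > budget:
            return index + 1
        kept += token_counts[index]
    return 0
```

A message-based policy ignores message size entirely, while the token and fraction policies keep fewer messages when recent messages are large.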

#### `trim_tokens_to_summarize`
- **Type**: Integer or null
- **Default**: `4000`
- **Description**: Maximum tokens to include when preparing messages for the summarization call itself. Set to `null` to skip trimming (not recommended for very long conversations).

#### `summary_prompt`
- **Type**: String or null
- **Default**: `null` (uses LangChain's default prompt)
- **Description**: Custom prompt template for generating summaries. The prompt should guide the model to extract the most important context.

**Default Prompt Behavior:**
The default LangChain prompt instructs the model to:
- Extract highest quality/most relevant context
- Focus on information critical to the overall goal
- Avoid repeating completed actions
- Return only the extracted context

## How It Works

### Summarization Flow

1. **Monitoring**: Before each model call, the middleware counts tokens in the message history
2. **Trigger Check**: If any configured threshold is met, summarization is triggered
3. **Message Partitioning**: Messages are split into:
   - Messages to summarize (older messages beyond the `keep` threshold)
   - Messages to preserve (recent messages within the `keep` threshold)
4. **Summary Generation**: The model generates a concise summary of the older messages
5. **Context Replacement**: The message history is updated:
   - All old messages are removed
   - A single summary message is added
   - Recent messages are preserved
6. **AI/Tool Pair Protection**: The system ensures AI messages and their corresponding tool messages stay together
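
The partitioning and replacement steps can be sketched as follows. This is a minimal illustration, not the middleware's implementation: the dict-based `Message` shape, `summarize_history`, and the `summarize` callable (a stand-in for the model call) are hypothetical, and AI/Tool pair protection is left out for brevity.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical minimal message shape: {"role": ..., "content": ...}
Message = dict[str, str]

SUMMARY_PREFIX = "Here is a summary of the conversation to date:\n\n"


def summarize_history(
    messages: list[Message],
    keep_last: int,
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Replace everything except the last `keep_last` messages with one summary."""
    cutoff = max(len(messages) - keep_last, 0)
    if cutoff == 0:
        # Nothing is older than the retention window, so nothing changes.
        return list(messages)
    older, recent = messages[:cutoff], messages[cutoff:]
    summary = {"role": "human", "content": SUMMARY_PREFIX + summarize(older)}
    return [summary, *recent]
```

Passing the summarizer in as a callable keeps the sketch independent of any particular model client.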

### Token Counting

- Uses approximate token counting based on character count
- For Anthropic models: ~3.3 characters per token
- For other models: Uses LangChain's default estimation
- Can be customized with a custom `token_counter` function
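
Character-based estimation amounts to dividing the total character count by an assumed characters-per-token ratio. The sketch below is an illustration only: `approx_token_count` is a hypothetical helper, and the default ratio of 4.0 is an assumption rather than a value taken from this document (only the ~3.3 figure for Anthropic models is).

```python
import math


def approx_token_count(texts: list[str], chars_per_token: float = 4.0) -> int:
    """Estimate tokens from character count; a rough stand-in for a real tokenizer."""
    total_chars = sum(len(text) for text in texts)
    # Round up so the estimate errs on the side of triggering earlier.
    return math.ceil(total_chars / chars_per_token)


# The same text is estimated as more tokens under the smaller ratio.
sample = ["x" * 100]
default_estimate = approx_token_count(sample)
anthropic_estimate = approx_token_count(sample, chars_per_token=3.3)
```

Because a smaller ratio yields a larger estimate, a token-based trigger fires somewhat earlier for models counted at ~3.3 characters per token.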

### Message Preservation

The middleware preserves message context as follows:

- **Recent Messages**: Always kept intact based on `keep` configuration
- **AI/Tool Pairs**: Never split - if a cutoff point falls within tool messages, the system adjusts to keep the entire AI + Tool message sequence together
- **Summary Format**: Summary is injected as a HumanMessage with the format:
  ```
  Here is a summary of the conversation to date:

  [Generated summary text]
  ```
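
One way to picture the AI/Tool pair adjustment: if the cutoff would make a tool result the first kept message, move the cutoff until the AI message that issued the tool call is kept as well. The sketch below moves the cutoff backwards; that direction is an assumption, as are the `safe_cutoff` name and the role strings, and LangChain's actual adjustment may differ.

```python
from __future__ import annotations


def safe_cutoff(roles: list[str], cutoff: int) -> int:
    """Move `cutoff` back so kept messages never start with orphaned tool results."""
    # If the first kept message is a tool result, step back towards the AI
    # message that issued the call so the whole sequence stays together.
    while 0 < cutoff < len(roles) and roles[cutoff] == "tool":
        cutoff -= 1
    return cutoff
```

For roles `["human", "ai", "tool", "tool", "human", "ai"]`, a cutoff of 3 lands on a tool result and is moved back to 1, so the AI message and both of its tool results are kept together.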

## Best Practices

### Choosing Trigger Thresholds

1. **Token-based triggers**: Recommended for most use cases
   - Set to 60-80% of your model's context window
   - Example: For an 8K context, use roughly 4800-6400 tokens

2. **Message-based triggers**: Useful for controlling conversation length
   - Good for applications with many short messages
   - Example: 50-100 messages depending on average message length

3. **Fraction-based triggers**: Ideal when using multiple models
   - Automatically adapts to each model's capacity
   - Example: 0.8 (80% of model's max input tokens)
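
The 60-80% guideline is simple arithmetic on the context window. The helper below is a hypothetical convenience for deriving a `tokens` trigger range from a window size; the 128,000-token window is only an example value, not a recommendation for any particular model.

```python
def token_trigger_range(
    context_window: int, low: float = 0.6, high: float = 0.8
) -> tuple[int, int]:
    """Return the (low, high) token thresholds for a given context window."""
    return round(context_window * low), round(context_window * high)


# Example: derive a trigger range for a hypothetical 128K-token model.
low_threshold, high_threshold = token_trigger_range(128_000)
```

A fraction-based trigger performs the same scaling automatically, which is why it suits setups where the context window varies by model.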

### Choosing Retention Policy (`keep`)

1. **Message-based retention**: Best for most scenarios
   - Preserves natural conversation flow
   - Recommended: 15-25 messages

2. **Token-based retention**: Use when precise control is needed
   - Good for managing exact token budgets
   - Recommended: 2000-4000 tokens

3. **Fraction-based retention**: For multi-model setups
   - Automatically scales with model capacity
   - Recommended: 0.2-0.4 (20-40% of max input)

### Model Selection

- **Recommended**: Use a lightweight, cost-effective model for summaries
  - Examples: `gpt-4o-mini`, `claude-haiku`, or equivalent
  - Summaries don't require the most powerful models
  - Significant cost savings on high-volume applications

- **Default**: If `model_name` is `null`, uses the default model
  - May be more expensive but ensures consistency
  - Good for simple setups

### Optimization Tips

1. **Balance triggers**: Combine token and message triggers for robust handling
   ```yaml
   trigger:
     - type: tokens
       value: 4000
     - type: messages
       value: 50
   ```

2. **Conservative retention**: Keep more messages initially, adjust based on performance
   ```yaml
   keep:
     type: messages
     value: 25 # Start higher, reduce if needed
   ```

3. **Trim strategically**: Limit tokens sent to summarization model
   ```yaml
   trim_tokens_to_summarize: 4000 # Prevents expensive summarization calls
   ```

4. **Monitor and iterate**: Track summary quality and adjust configuration

## Troubleshooting

### Summary Quality Issues

**Problem**: Summaries losing important context

**Solutions**:
1. Increase `keep` value to preserve more messages
2. Decrease trigger thresholds to summarize earlier
3. Customize `summary_prompt` to emphasize key information
4. Use a more capable model for summarization

### Performance Issues

**Problem**: Summarization calls taking too long

**Solutions**:
1. Use a faster model for summaries (e.g., `gpt-4o-mini`)
2. Reduce `trim_tokens_to_summarize` to send less context
3. Increase trigger thresholds to summarize less frequently

### Token Limit Errors

**Problem**: Still hitting token limits despite summarization

**Solutions**:
1. Lower trigger thresholds to summarize earlier
2. Reduce `keep` value to preserve fewer messages
3. Check if individual messages are very large
4. Consider using fraction-based triggers
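
When checking for very large individual messages, a quick character-based estimate is usually enough to spot outliers. The helper below is a hypothetical diagnostic, not part of DeerFlow; the 4.0 characters-per-token ratio is an assumed rough average.

```python
from __future__ import annotations


def oversized_messages(
    messages: list[str], max_tokens: int, chars_per_token: float = 4.0
) -> list[int]:
    """Return indexes of messages whose estimated size alone exceeds `max_tokens`."""
    return [
        index
        for index, text in enumerate(messages)
        if len(text) / chars_per_token > max_tokens
    ]
```

A single tool result larger than the trigger threshold cannot be handled by a message-based `keep` policy alone, because that one message already exceeds the budget after summarization.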

## Implementation Details

### Code Structure

- **Configuration**: `src/config/summarization_config.py`
- **Integration**: `src/agents/lead_agent/agent.py`
- **Middleware**: Uses `langchain.agents.middleware.SummarizationMiddleware`

### Middleware Order

Summarization runs after ThreadData and Sandbox initialization but before Title and Clarification:

1. ThreadDataMiddleware
2. SandboxMiddleware
3. **SummarizationMiddleware** ← Runs here
4. TitleMiddleware
5. ClarificationMiddleware

### State Management

- Summarization is stateless - configuration is loaded once at startup
- Summaries are added as regular messages in the conversation history
- The checkpointer persists the summarized history automatically

## Example Configurations

### Minimal Configuration
```yaml
summarization:
  enabled: true
  trigger:
    type: tokens
    value: 4000
  keep:
    type: messages
    value: 20
```

### Production Configuration
```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini # Lightweight model for cost efficiency
  trigger:
    - type: tokens
      value: 6000
    - type: messages
      value: 75
  keep:
    type: messages
    value: 25
  trim_tokens_to_summarize: 5000
```

### Multi-Model Configuration
```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini
  trigger:
    type: fraction
    value: 0.7 # 70% of model's max input
  keep:
    type: fraction
    value: 0.3 # Keep 30% of max input
  trim_tokens_to_summarize: 4000
```

### Conservative Configuration (High Quality)
```yaml
summarization:
  enabled: true
  model_name: gpt-4 # Use full model for high-quality summaries
  trigger:
    type: tokens
    value: 8000
  keep:
    type: messages
    value: 40 # Keep more context
  trim_tokens_to_summarize: null # No trimming
```

## References

- [LangChain Summarization Middleware Documentation](https://docs.langchain.com/oss/python/langchain/middleware/built-in#summarization)
- [LangChain Source Code](https://github.com/langchain-ai/langchain)
@@ -1,4 +1,5 @@
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware
from langchain_core.runnables import RunnableConfig

from src.agents.lead_agent.prompt import apply_prompt_template
@@ -6,12 +7,66 @@ from src.agents.middlewares.clarification_middleware import ClarificationMiddlew
from src.agents.middlewares.thread_data_middleware import ThreadDataMiddleware
from src.agents.middlewares.title_middleware import TitleMiddleware
from src.agents.thread_state import ThreadState
from src.config.summarization_config import get_summarization_config
from src.models import create_chat_model
from src.sandbox.middleware import SandboxMiddleware


def _create_summarization_middleware() -> SummarizationMiddleware | None:
    """Create and configure the summarization middleware from config."""
    config = get_summarization_config()

    if not config.enabled:
        return None

    # Prepare trigger parameter
    trigger = None
    if config.trigger is not None:
        if isinstance(config.trigger, list):
            trigger = [t.to_tuple() for t in config.trigger]
        else:
            trigger = config.trigger.to_tuple()

    # Prepare keep parameter
    keep = config.keep.to_tuple()

    # Prepare model parameter
    if config.model_name:
        model = config.model_name
    else:
        # No summarization model configured: fall back to the default chat model
        # (thinking disabled)
        model = create_chat_model(thinking_enabled=False)

    # Prepare kwargs
    kwargs = {
        "model": model,
        "trigger": trigger,
        "keep": keep,
    }

    if config.trim_tokens_to_summarize is not None:
        kwargs["trim_tokens_to_summarize"] = config.trim_tokens_to_summarize

    if config.summary_prompt is not None:
        kwargs["summary_prompt"] = config.summary_prompt

    return SummarizationMiddleware(**kwargs)


# ThreadDataMiddleware must be before SandboxMiddleware to ensure thread_id is available
# SummarizationMiddleware should be early to reduce context before other processing
# ClarificationMiddleware should be last to intercept clarification requests after model calls
middlewares = [ThreadDataMiddleware(), SandboxMiddleware(), TitleMiddleware(), ClarificationMiddleware()]
def _build_middlewares():
    middlewares = [ThreadDataMiddleware(), SandboxMiddleware()]

    # Add summarization middleware if enabled
    summarization_middleware = _create_summarization_middleware()
    if summarization_middleware is not None:
        middlewares.append(summarization_middleware)

    middlewares.extend([TitleMiddleware(), ClarificationMiddleware()])
    return middlewares


def make_lead_agent(config: RunnableConfig):
@@ -24,7 +79,7 @@ def make_lead_agent(config: RunnableConfig):
    return create_agent(
        model=create_chat_model(name=model_name, thinking_enabled=thinking_enabled),
        tools=get_available_tools(),
        middleware=middlewares,
        middleware=_build_middlewares(),
        system_prompt=apply_prompt_template(),
        state_schema=ThreadState,
    )

@@ -89,7 +89,7 @@ You: "Deploying to staging..." [proceed]
You have access to skills that provide optimized workflows for specific tasks. Each skill contains best practices, frameworks, and references to additional resources.

**Progressive Loading Pattern:**
1. When a user query matches a skill's use case, immediately call `view` on the skill's main file using the path attribute provided in the skill tag below
1. When a user query matches a skill's use case, immediately call `read_file` on the skill's main file using the path attribute provided in the skill tag below
2. Read and understand the skill's workflow and instructions
3. The skill file contains references to external resources under the same folder
4. Load referenced resources only when needed during execution

@@ -9,6 +9,7 @@ from pydantic import BaseModel, ConfigDict, Field
from src.config.model_config import ModelConfig
from src.config.sandbox_config import SandboxConfig
from src.config.skills_config import SkillsConfig
from src.config.summarization_config import load_summarization_config_from_dict
from src.config.title_config import load_title_config_from_dict
from src.config.tool_config import ToolConfig, ToolGroupConfig

@@ -75,6 +76,10 @@ class AppConfig(BaseModel):
        if "title" in config_data:
            load_title_config_from_dict(config_data["title"])

        # Load summarization config if present
        if "summarization" in config_data:
            load_summarization_config_from_dict(config_data["summarization"])

        result = cls.model_validate(config_data)
        return result

74
backend/src/config/summarization_config.py
Normal file
@@ -0,0 +1,74 @@
"""Configuration for conversation summarization."""

from typing import Literal

from pydantic import BaseModel, Field

ContextSizeType = Literal["fraction", "tokens", "messages"]


class ContextSize(BaseModel):
    """Context size specification for trigger or keep parameters."""

    type: ContextSizeType = Field(description="Type of context size specification")
    value: int | float = Field(description="Value for the context size specification")

    def to_tuple(self) -> tuple[ContextSizeType, int | float]:
        """Convert to tuple format expected by SummarizationMiddleware."""
        return (self.type, self.value)


class SummarizationConfig(BaseModel):
    """Configuration for automatic conversation summarization."""

    enabled: bool = Field(
        default=False,
        description="Whether to enable automatic conversation summarization",
    )
    model_name: str | None = Field(
        default=None,
        description="Model name to use for summarization (None = use the default model)",
    )
    trigger: ContextSize | list[ContextSize] | None = Field(
        default=None,
        description="One or more thresholds that trigger summarization. When any threshold is met, summarization runs. "
        "Examples: {'type': 'messages', 'value': 50} triggers at 50 messages, "
        "{'type': 'tokens', 'value': 4000} triggers at 4000 tokens, "
        "{'type': 'fraction', 'value': 0.8} triggers at 80% of model's max input tokens",
    )
    keep: ContextSize = Field(
        default_factory=lambda: ContextSize(type="messages", value=20),
        description="Context retention policy after summarization. Specifies how much history to preserve. "
        "Examples: {'type': 'messages', 'value': 20} keeps 20 messages, "
        "{'type': 'tokens', 'value': 3000} keeps 3000 tokens, "
        "{'type': 'fraction', 'value': 0.3} keeps 30% of model's max input tokens",
    )
    trim_tokens_to_summarize: int | None = Field(
        default=4000,
        description="Maximum tokens to keep when preparing messages for summarization. Pass null to skip trimming.",
    )
    summary_prompt: str | None = Field(
        default=None,
        description="Custom prompt template for generating summaries. If not provided, uses the default LangChain prompt.",
    )


# Global configuration instance
_summarization_config: SummarizationConfig = SummarizationConfig()


def get_summarization_config() -> SummarizationConfig:
    """Get the current summarization configuration."""
    return _summarization_config


def set_summarization_config(config: SummarizationConfig) -> None:
    """Set the summarization configuration."""
    global _summarization_config
    _summarization_config = config


def load_summarization_config_from_dict(config_dict: dict) -> None:
    """Load summarization configuration from a dictionary."""
    global _summarization_config
    _summarization_config = SummarizationConfig(**config_dict)
@@ -174,3 +174,51 @@ title:
  max_words: 6
  max_chars: 60
  model_name: null # Use default model (first model in models list)

# ============================================================================
# Summarization Configuration
# ============================================================================
# Automatically summarize conversation history when token limits are approached
# This helps maintain context in long conversations without exceeding model limits

summarization:
  enabled: true

  # Model to use for summarization (null = use default model)
  # Recommended: Use a lightweight, cost-effective model like "gpt-4o-mini" or similar
  model_name: null

  # Trigger conditions - at least one required
  # Summarization runs when ANY threshold is met (OR logic)
  # You can specify a single trigger or a list of triggers
  trigger:
    # Trigger when token count reaches 4000
    - type: tokens
      value: 4000
    # Uncomment to also trigger when message count reaches 50
    # - type: messages
    #   value: 50
    # Uncomment to trigger when 80% of model's max input tokens is reached
    # - type: fraction
    #   value: 0.8

  # Context retention policy after summarization
  # Specifies how much recent history to preserve
  keep:
    # Keep the most recent 20 messages (recommended)
    type: messages
    value: 20
    # Alternative: Keep specific token count
    # type: tokens
    # value: 3000
    # Alternative: Keep percentage of model's max input tokens
    # type: fraction
    # value: 0.3

  # Maximum tokens to keep when preparing messages for summarization
  # Set to null to skip trimming (not recommended for very long conversations)
  trim_tokens_to_summarize: 4000

  # Custom summary prompt template (null = use default LangChain prompt)
  # The prompt should guide the model to extract important context
  summary_prompt: null
