# Conversation Summarization
DeerFlow includes automatic conversation summarization to handle long conversations that approach model token limits. When enabled, the system automatically condenses older messages while preserving recent context.
## Overview

The summarization feature uses LangChain's `SummarizationMiddleware` to monitor conversation history and trigger summarization based on configurable thresholds. When activated, it:
- Monitors message token counts in real-time
- Triggers summarization when thresholds are met
- Keeps recent messages intact while summarizing older exchanges
- Maintains AI/Tool message pairs together for context continuity
- Injects the summary back into the conversation
## Configuration

Summarization is configured in `config.yaml` under the `summarization` key:
```yaml
summarization:
  enabled: true
  model_name: null  # Use default model or specify a lightweight model

  # Trigger conditions (OR logic - any condition triggers summarization)
  trigger:
    - type: tokens
      value: 4000
    # Additional triggers (optional)
    # - type: messages
    #   value: 50
    # - type: fraction
    #   value: 0.8  # 80% of model's max input tokens

  # Context retention policy
  keep:
    type: messages
    value: 20

  # Token trimming for summarization call
  trim_tokens_to_summarize: 4000

  # Custom summary prompt (optional)
  summary_prompt: null
```
### Configuration Options

#### `enabled`

- Type: Boolean
- Default: `false`
- Description: Enable or disable automatic summarization
#### `model_name`

- Type: String or null
- Default: `null` (uses default model)
- Description: Model to use for generating summaries. Recommended to use a lightweight, cost-effective model like `gpt-4o-mini` or equivalent.
#### `trigger`

- Type: Single `ContextSize` or list of `ContextSize` objects
- Required: At least one trigger must be specified when enabled
- Description: Thresholds that trigger summarization. Uses OR logic - summarization runs when ANY threshold is met.
`ContextSize` Types:

- Token-based trigger: Activates when token count reaches the specified value

  ```yaml
  trigger:
    type: tokens
    value: 4000
  ```

- Message-based trigger: Activates when message count reaches the specified value

  ```yaml
  trigger:
    type: messages
    value: 50
  ```

- Fraction-based trigger: Activates when token usage reaches a percentage of the model's maximum input tokens

  ```yaml
  trigger:
    type: fraction
    value: 0.8  # 80% of max input tokens
  ```
Multiple Triggers:

```yaml
trigger:
  - type: tokens
    value: 4000
  - type: messages
    value: 50
```
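The OR semantics described above can be sketched as a simple threshold check. This is an illustrative helper, not the middleware's actual code:

```python
def should_summarize(triggers, token_count, message_count, max_input_tokens):
    """Return True if ANY configured threshold is met (OR logic).

    `triggers` is a list of dicts such as {"type": "tokens", "value": 4000}.
    """
    for t in triggers:
        if t["type"] == "tokens" and token_count >= t["value"]:
            return True
        if t["type"] == "messages" and message_count >= t["value"]:
            return True
        if t["type"] == "fraction" and token_count >= t["value"] * max_input_tokens:
            return True
    return False
```

With both a token and a message trigger configured, crossing either threshold alone is enough to start a summarization pass.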
#### `keep`

- Type: `ContextSize` object
- Default: `{type: messages, value: 20}`
- Description: Specifies how much recent conversation history to preserve after summarization.
Examples:

```yaml
# Keep most recent 20 messages
keep:
  type: messages
  value: 20

# Keep most recent 3000 tokens
keep:
  type: tokens
  value: 3000

# Keep most recent 30% of model's max input tokens
keep:
  type: fraction
  value: 0.3
```
#### `trim_tokens_to_summarize`

- Type: Integer or null
- Default: `4000`
- Description: Maximum tokens to include when preparing messages for the summarization call itself. Set to `null` to skip trimming (not recommended for very long conversations).
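Trimming for the summarization call can be sketched as dropping the oldest messages until the remainder fits the budget. This is a simplification under assumed names; the real middleware works on LangChain message objects:

```python
def trim_to_budget(messages, max_tokens, count_tokens):
    """Drop oldest messages until the remainder fits within `max_tokens`.

    `count_tokens` is any callable that estimates tokens for a message list.
    A `max_tokens` of None mirrors the `null` config value: no trimming.
    """
    if max_tokens is None:
        return list(messages)
    kept = list(messages)
    while kept and count_tokens(kept) > max_tokens:
        kept.pop(0)  # discard the oldest message first
    return kept
```

Trimming bounds the cost of each summarization call, at the price of the summary not seeing the very oldest history.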
#### `summary_prompt`

- Type: String or null
- Default: `null` (uses LangChain's default prompt)
- Description: Custom prompt template for generating summaries. The prompt should guide the model to extract the most important context.

Default Prompt Behavior: The default LangChain prompt instructs the model to:

- Extract highest quality/most relevant context
- Focus on information critical to the overall goal
- Avoid repeating completed actions
- Return only the extracted context
## How It Works

### Summarization Flow

1. Monitoring: Before each model call, the middleware counts tokens in the message history
2. Trigger Check: If any configured threshold is met, summarization is triggered
3. Message Partitioning: Messages are split into:
   - Messages to summarize (older messages beyond the `keep` threshold)
   - Messages to preserve (recent messages within the `keep` threshold)
4. Summary Generation: The model generates a concise summary of the older messages
5. Context Replacement: The message history is updated:
   - All old messages are removed
   - A single summary message is added
   - Recent messages are preserved
6. AI/Tool Pair Protection: The system ensures AI messages and their corresponding tool messages stay together
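The partition-and-replace steps can be sketched as follows. This is deliberately simplified: real histories contain LangChain message objects and the pair-protection logic is richer, and `summarize_fn` stands in for the model call:

```python
def summarize_history(messages, keep_count, summarize_fn):
    """Condense older messages, keeping the most recent `keep_count` intact."""
    if len(messages) <= keep_count:
        return list(messages)  # nothing old enough to summarize
    to_summarize = messages[:-keep_count]   # older messages beyond `keep`
    to_keep = messages[-keep_count:]        # recent messages within `keep`
    summary = summarize_fn(to_summarize)
    # A single summary message replaces all of the older messages.
    header = "Here is a summary of the conversation to date: "
    return [header + summary] + to_keep
```

The summary message plus the preserved tail then becomes the new history that the checkpointer persists.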
### Token Counting

- Uses approximate token counting based on character count
- For Anthropic models: ~3.3 characters per token
- For other models: Uses LangChain's default estimation
- Can be customized with a custom `token_counter` function
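A character-based approximation like the one described might look like this. The 3.3 ratio is the figure quoted above; the function itself is an illustrative sketch, not DeerFlow's implementation:

```python
def approx_token_count(messages, chars_per_token=3.3):
    """Estimate token usage from total character count.

    `messages` is a list of message strings; the divisor reflects the
    ~3.3 characters-per-token heuristic used for Anthropic models.
    """
    total_chars = sum(len(m) for m in messages)
    return int(total_chars / chars_per_token)
```

A heuristic like this trades exactness for speed: no tokenizer call is needed on every model invocation, which matters when the check runs before each turn.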
### Message Preservation

The middleware intelligently preserves message context:

- Recent Messages: Always kept intact based on the `keep` configuration
- AI/Tool Pairs: Never split - if a cutoff point falls within tool messages, the system adjusts to keep the entire AI + Tool message sequence together
- Summary Format: The summary is injected as a HumanMessage with the format:

  ```
  Here is a summary of the conversation to date: [Generated summary text]
  ```
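The cutoff adjustment for AI/Tool pairs can be sketched with role-tagged messages. This is a simplification of the middleware's behavior, using hypothetical role strings rather than LangChain message types:

```python
def safe_cutoff(roles, cutoff):
    """Move `cutoff` earlier until it does not split an AI/Tool sequence.

    `roles` is a list like ["human", "ai", "tool", "tool", "ai"]; everything
    before index `cutoff` is summarized, everything from it onward is kept.
    """
    # If the first preserved message is a tool result, the AI message that
    # issued the tool calls must be preserved too, so walk the cutoff back.
    while cutoff > 0 and roles[cutoff] == "tool":
        cutoff -= 1
    return cutoff
```

Backing the cutoff up, rather than forward, errs on the side of preserving context: the whole AI-plus-tool-results sequence survives summarization intact.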
## Best Practices

### Choosing Trigger Thresholds

- Token-based triggers: Recommended for most use cases
  - Set to 60-80% of your model's context window
  - Example: For 8K context, use 4000-6000 tokens
- Message-based triggers: Useful for controlling conversation length
  - Good for applications with many short messages
  - Example: 50-100 messages depending on average message length
- Fraction-based triggers: Ideal when using multiple models
  - Automatically adapts to each model's capacity
  - Example: 0.8 (80% of model's max input tokens)
### Choosing Retention Policy (`keep`)

- Message-based retention: Best for most scenarios
  - Preserves natural conversation flow
  - Recommended: 15-25 messages
- Token-based retention: Use when precise control is needed
  - Good for managing exact token budgets
  - Recommended: 2000-4000 tokens
- Fraction-based retention: For multi-model setups
  - Automatically scales with model capacity
  - Recommended: 0.2-0.4 (20-40% of max input)
### Model Selection

- Recommended: Use a lightweight, cost-effective model for summaries
  - Examples: `gpt-4o-mini`, `claude-haiku`, or equivalent
  - Summaries don't require the most powerful models
  - Significant cost savings on high-volume applications
- Default: If `model_name` is `null`, the default model is used
  - May be more expensive but ensures consistency
  - Good for simple setups
### Optimization Tips

1. Balance triggers: Combine token and message triggers for robust handling

   ```yaml
   trigger:
     - type: tokens
       value: 4000
     - type: messages
       value: 50
   ```

2. Conservative retention: Keep more messages initially, adjust based on performance

   ```yaml
   keep:
     type: messages
     value: 25  # Start higher, reduce if needed
   ```

3. Trim strategically: Limit tokens sent to the summarization model

   ```yaml
   trim_tokens_to_summarize: 4000  # Prevents expensive summarization calls
   ```

4. Monitor and iterate: Track summary quality and adjust configuration
## Troubleshooting

### Summary Quality Issues

Problem: Summaries losing important context

Solutions:

- Increase the `keep` value to preserve more messages
- Decrease trigger thresholds to summarize earlier
- Customize `summary_prompt` to emphasize key information
- Use a more capable model for summarization
### Performance Issues

Problem: Summarization calls taking too long

Solutions:

- Use a faster model for summaries (e.g., `gpt-4o-mini`)
- Reduce `trim_tokens_to_summarize` to send less context
- Increase trigger thresholds to summarize less frequently
### Token Limit Errors

Problem: Still hitting token limits despite summarization

Solutions:

- Lower trigger thresholds to summarize earlier
- Reduce the `keep` value to preserve fewer messages
- Check if individual messages are very large
- Consider using fraction-based triggers
## Implementation Details

### Code Structure

- Configuration: `packages/harness/deerflow/config/summarization_config.py`
- Integration: `packages/harness/deerflow/agents/lead_agent/agent.py`
- Middleware: Uses `langchain.agents.middleware.SummarizationMiddleware`
### Middleware Order

Summarization runs after ThreadData and Sandbox initialization but before Title and Clarification:

1. ThreadDataMiddleware
2. SandboxMiddleware
3. SummarizationMiddleware ← Runs here
4. TitleMiddleware
5. ClarificationMiddleware
### State Management

- Summarization is stateless - configuration is loaded once at startup
- Summaries are added as regular messages in the conversation history
- The checkpointer persists the summarized history automatically
## Example Configurations

### Minimal Configuration

```yaml
summarization:
  enabled: true
  trigger:
    type: tokens
    value: 4000
  keep:
    type: messages
    value: 20
```
### Production Configuration

```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini  # Lightweight model for cost efficiency
  trigger:
    - type: tokens
      value: 6000
    - type: messages
      value: 75
  keep:
    type: messages
    value: 25
  trim_tokens_to_summarize: 5000
```
### Multi-Model Configuration

```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini
  trigger:
    type: fraction
    value: 0.7  # 70% of model's max input
  keep:
    type: fraction
    value: 0.3  # Keep 30% of max input
  trim_tokens_to_summarize: 4000
```
### Conservative Configuration (High Quality)

```yaml
summarization:
  enabled: true
  model_name: gpt-4  # Use full model for high-quality summaries
  trigger:
    type: tokens
    value: 8000
  keep:
    type: messages
    value: 40  # Keep more context
  trim_tokens_to_summarize: null  # No trimming
```