feat: add context compress (#590)

* feat:Add context compress * feat: Add unit test * feat: add unit test for context manager * feat: add postprocessor param && code format * feat: add configuration guide * fix: fix the configuration_guide * fix: fix the unit test * fix: fix the default value * feat: add test and log for context_manager
2026-04-03 06:12:14 +08:00 · 2025-09-27 06:42:22 -07:00
parent c214999606
commit 5f4eb38fdb
9 changed files with 1032 additions and 7 deletions
--- a/docs/configuration_guide.md
+++ b/docs/configuration_guide.md
@@ -180,6 +180,20 @@ BASIC_MODEL:
  api_key: $AZURE_OPENAI_API_KEY
 ```

+### How to configure context length for different models
+
+Different models have different context length limitations. DeerFlow provides a method to control the context length between different models. You can configure the context length between different models in the `conf.yaml` file. For example:
+```yaml
+BASIC_MODEL:
+  base_url: https://ark.cn-beijing.volces.com/api/v3
+  model: "doubao-1-5-pro-32k-250115"
+  api_key: ""
+  token_limit: 128000
+```
+This means that the context length limit using this model is 128k. 
+
+The context management doesn't work if the token_limit is not set.
+
 ## About Search Engine

 ### How to control search domains for Tavily?
@@ -210,6 +224,28 @@ SEARCH_ENGINE:
  include_raw_content: false
 ```

+### How to post-process Tavily search results
+
+DeerFlow can post-process Tavily search results:
+* Remove duplicate content
+* Filter low-quality content: Filter out results with low relevance scores
+* Clear base64 encoded images
+* Length truncation: Truncate each search result according to the user-configured length
+
+The filtering of low-quality content and length truncation depend on user configuration, providing two configurable parameters:
+* min_score_threshold: Minimum relevance score threshold, search results below this threshold will be filtered. If not set, no filtering will be performed;
+* max_content_length_per_page: Maximum length limit for each search result content, parts exceeding this length will be truncated. If not set, no truncation will be performed;
+
+These two parameters can be configured in `conf.yaml` as shown below:
+```yaml
+SEARCH_ENGINE:
+  engine: tavily
+  include_images: true
+  min_score_threshold: 0.4
+  max_content_length_per_page: 5000
+```
+That's meaning that the search results will be filtered based on the minimum relevance score threshold and truncated to the maximum length limit for each search result content.
+
 ## RAG (Retrieval-Augmented Generation) Configuration

 DeerFlow supports multiple RAG providers for document retrieval. Configure the RAG provider by setting environment variables.
@@ -244,4 +280,4 @@ MILVUS_EMBEDDING_PROVIDER=openai
 MILVUS_EMBEDDING_BASE_URL=
 MILVUS_EMBEDDING_MODEL=
 MILVUS_EMBEDDING_API_KEY=
-```
+```