deer-flow

mirror of https://gitee.com/wanwujie/deer-flow synced 2026-04-13 10:24:44 +08:00

Author	SHA1	Message	Date
Willem Jiang	b24f4d3f38	fix: apply context compression to prevent token overflow (Issue #721) (#722) * fix: apply context compression to prevent token overflow (Issue #721) - Add token_limit configuration to conf.yaml.example for BASIC_MODEL and REASONING_MODEL - Implement context compression in _execute_agent_step() before agent invocation - Preserve first 3 messages (system prompt + context) during compression - Enhance ContextManager logging with better token count reporting - Prevent 400 Input tokens exceeded errors by automatically compressing message history * feat: add model-based token limit inference for Issue #721 - Add smart default token limits based on common LLM models - Support model name inference when token_limit not explicitly configured - Models include: OpenAI (GPT-4o, GPT-4, etc.), Claude, Gemini, Doubao, DeepSeek, etc. - Conservative defaults prevent token overflow even without explicit configuration - Priority: explicit config > model inference > safe default (100,000 tokens) - Ensures Issue #721 protection for all users, not just those with token_limit set	2025-11-28 18:52:42 +08:00
Willem Jiang	b4c09aa4b1	security: add log injection attack prevention with input sanitization (#667) * security: add log injection attack prevention with input sanitization - Created src/utils/log_sanitizer.py to sanitize user-controlled input before logging - Prevents log injection attacks using newlines, tabs, carriage returns, etc. - Escapes dangerous characters: \n, \r, \t, \0, \x1b - Provides specialized functions for different input types: - sanitize_log_input: general purpose sanitization - sanitize_thread_id: for user-provided thread IDs - sanitize_user_content: for user messages (more aggressive truncation) - sanitize_agent_name: for agent identifiers - sanitize_tool_name: for tool names - sanitize_feedback: for user interrupt feedback - create_safe_log_message: template-based safe message creation - Updated src/server/app.py to sanitize all user input in logging: - Thread IDs from request parameter - Message content from user - Agent names and node information - Tool names and feedback - Updated src/agents/tool_interceptor.py to sanitize: - Tool names during execution - User feedback during interrupt handling - Tool input data - Added 29 comprehensive unit tests covering: - Classic newline injection attacks - Carriage return injection - Tab and null character injection - HTML/ANSI escape sequence injection - Combined multi-character attacks - Truncation and length limits Fixes potential log forgery vulnerability where malicious users could inject fake log entries via unsanitized input containing control characters.	2025-10-27 20:57:23 +08:00
Willem Jiang	c7a82b82b4	fix: parsed json with extra tokens issue (#656) Fixes #598 * fix: parsed json with extra tokens issue * Added unit test for json.ts * fix the json unit test running issue * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update the code with code review suggestion --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Willem Jiang <143703838+willem-bd@users.noreply.github.com>	2025-10-26 07:24:25 +08:00
Willem Jiang	052490b116	fix: resolve issue #467 - message content validation and Tavily search error handling (#645) * fix: resolve issue #467 - message content validation and Tavily search error handling This commit implements a comprehensive fix for issue #467 where the application crashed with 'Field required: input.messages.3.content' error when generating reports. ## Root Cause Analysis The issue had multiple interconnected causes: 1. Tavily tool returned mixed types (lists/error strings) instead of consistent JSON 2. background_investigation_node didn't handle error cases properly, returning None 3. Missing message content validation before LLM calls 4. Insufficient error diagnostics for content-related errors ## Changes Made ### Part 1: Fix Tavily Search Tool (tavily_search_results_with_images.py) - Modified _run() and _arun() methods to return JSON strings instead of mixed types - Error responses now return JSON: {"error": repr(e)} - Successful responses return JSON string: json.dumps(cleaned_results) - Ensures tool results always have valid string content for ToolMessages ### Part 2: Fix background_investigation_node Error Handling (graph/nodes.py) - Initialize background_investigation_results to empty list instead of None - Added proper JSON parsing for string responses from Tavily tool - Handle error responses with explicit error logging - Always return valid JSON (empty list if error) instead of None ### Part 3: Add Message Content Validation (utils/context_manager.py) - New validate_message_content() function validates all messages before LLM calls - Ensures all messages have content attribute and valid string content - Converts complex types (lists, dicts) to JSON strings - Provides graceful fallback for messages with issues ### Part 4: Enhanced Error Diagnostics (_execute_agent_step in graph/nodes.py) - Call message validation before agent invocation - Add detailed logging for content-related errors - Log message types, content types, and lengths when validation fails - Helps with future debugging of similar issues ## Testing - All unit tests pass (395 tests) - Python syntax verified for all modified files - No breaking changes to existing functionality * test: update tests for issue #467 fixes Update test expectations to match the new implementation: - Tavily search tool now returns JSON strings instead of mixed types - background_investigation_node returns empty list [] for errors instead of None - All tests updated to verify the new behavior - All 391 tests pass successfully * Update src/graph/nodes.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-23 22:08:14 +08:00
jimmyuconn1982	2510cc61de	feat: Add intelligent clarification feature in coordinate step for research queries (#613) * fix: support local models by making thought field optional in Plan model - Make thought field optional in Plan model to fix Pydantic validation errors with local models - Add Ollama configuration example to conf.yaml.example - Update documentation to include local model support - Improve planner prompt with better JSON format requirements Fixes local model integration issues where models like qwen3:14b would fail due to missing thought field in JSON output. * feat: Add intelligent clarification feature for research queries - Add multi-turn clarification process to refine vague research questions - Implement three-dimension clarification standard (Tech/App, Focus, Scope) - Add clarification state management in coordinator node - Update coordinator prompt with detailed clarification guidelines - Add UI settings to enable/disable clarification feature (disabled by default) - Update workflow to handle clarification rounds recursively - Add comprehensive test coverage for clarification functionality - Update documentation with clarification feature usage guide Key components: - src/graph/nodes.py: Core clarification logic and state management - src/prompts/coordinator.md: Detailed clarification guidelines - src/workflow.py: Recursive clarification handling - web/: UI settings integration - tests/: Comprehensive test coverage - docs/: Updated configuration guide * fix: Improve clarification conversation continuity - Add comprehensive conversation history to clarification context - Include previous exchanges summary in system messages - Add explicit guidelines for continuing rounds in coordinator prompt - Prevent LLM from starting new topics during clarification - Ensure topic continuity across clarification rounds Fixes issue where LLM would restart clarification instead of building upon previous exchanges. * fix: Add conversation history to clarification context * fix: resolve clarification feature message to planner, prompt, test issues - Optimize coordinator.md prompt template for better clarification flow - Simplify final message sent to planner after clarification - Fix API key assertion issues in test_search.py * fix: Add configurable max_clarification_rounds and comprehensive tests - Add max_clarification_rounds parameter for external configuration - Add comprehensive test cases for clarification feature in test_app.py - Fixes issues found during interactive mode testing where: - Recursive call failed due to missing initial_state parameter - Clarification exited prematurely at max rounds - Incorrect logging of max rounds reached * Move clarification tests to test_nodes.py and add max_clarification_rounds to zh.json	2025-10-14 13:35:57 +08:00
Fancy-hjyp	5f4eb38fdb	feat: add context compress (#590) * feat: Add context compress * feat: Add unit test * feat: add unit test for context manager * feat: add postprocessor param && code format * feat: add configuration guide * fix: fix the configuration_guide * fix: fix the unit test * fix: fix the default value * feat: add test and log for context_manager	2025-09-27 21:42:22 +08:00
Chayton Bai	7694bb5d72	feat: support dify in rag module (#550) Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2025-09-16 20:30:45 +08:00
zgjja	3b4e993531	feat: 1. replace black with ruff for formatting and import sorting (#489) 2. use tavily from `langchain-tavily` rather than the older one from `langchain-community` Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2025-08-17 22:57:23 +08:00
CHANGXUBO	1bfec3ad05	feat: Enhance chat streaming and tool call processing (#498) * feat: Enhance chat streaming and tool call processing - Added support for MongoDB checkpointer in the chat streaming workflow. - Introduced functions to process tool call chunks and sanitize arguments. - Improved event message creation with additional metadata. - Enhanced error handling for JSON serialization in event messages. - Updated the frontend to convert escaped characters in tool call arguments. - Refactored the workflow input preparation and initial message processing. - Added new dependencies for MongoDB integration and tool argument sanitization. * fix: Update MongoDB checkpointer configuration to use LANGGRAPH_CHECKPOINT_DB_URL * feat: Add support for Postgres checkpointing and update README with database recommendations * feat: Implement checkpoint saver functionality and update MongoDB connection handling * refactor: Improve code formatting and readability in app.py and json_utils.py * refactor: Clean up commented code and improve formatting in server.py * refactor: Remove unused imports and improve code organization in app.py * refactor: Improve code organization and remove unnecessary comments in app.py * chore: use langgraph-checkpoint-postgres==2.0.21 to avoid the JSON convert issue in the latest version, implement chat stream persistent with Postgres * feat: add MongoDB and PostgreSQL support for LangGraph checkpointing, enhance environment variable handling * fix: update comments for clarity on Windows event loop policy * chore: remove empty code changes in MongoDB and PostgreSQL checkpoint tests * chore: clean up unused imports and code in checkpoint-related files * chore: remove empty code changes in test_checkpoint.py * chore: remove empty code changes in test_checkpoint.py * chore: remove empty code changes in test_checkpoint.py * test: update status code assertions in MCP endpoint tests to allow for 403 responses * test: update MCP endpoint tests to assert specific status codes and enable MCP server configuration * chore: remove unnecessary environment variables from unittest workflow * fix: invert condition for MCP server configuration check to raise 403 when disabled * chore: remove pymongo from test dependencies in uv.lock * chore: optimize the _get_agent_name method * test: enhance ChatStreamManager tests for PostgreSQL and MongoDB initialization * test: add persistence tests for ChatStreamManager with PostgreSQL and MongoDB * test: add unit tests for ChatStreamManager initialization with PostgreSQL and MongoDB * test: enhance persistence tests for ChatStreamManager with PostgreSQL and MongoDB to verify message aggregation * test: add unit tests for ChatStreamManager with PostgreSQL and MongoDB * test: add unit tests for ChatStreamManager initialization with PostgreSQL and MongoDB * test: add unit tests for ChatStreamManager initialization with PostgreSQL and MongoDB --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2025-08-16 21:03:12 +08:00
cmq2525	0dc6c16c42	fix: repair_json_output cannot process messages that do not start with {, [ or ``` (#384) Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2025-07-12 23:29:22 +08:00
He Tao	6937abcd91	chore: add license headers	2025-04-17 11:34:42 +08:00
He Tao	03798ded08	feat: lite deep researcher implementation	2025-04-09 20:32:16 +08:00

12 Commits
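
Note on commit b4c09aa4b1: its message describes escaping the control characters \n, \r, \t, \0 and \x1b and truncating user-controlled input before it is logged. The sketch below illustrates that idea only. It is not the contents of src/utils/log_sanitizer.py; the function name is borrowed from the commit message, while the `max_length` parameter, its default of 500, and the `...` truncation marker are assumptions made for illustration.

```python
# Illustrative sketch only; NOT the actual src/utils/log_sanitizer.py.
# The character set comes from the commit message; max_length and the
# "..." marker are hypothetical choices.
_ESCAPES = {
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\0": "\\0",
    "\x1b": "\\x1b",
}


def sanitize_log_input(value, max_length=500):
    """Return a single-line, length-limited string that is safe to log."""
    # Replace each control character with a visible escape sequence so a
    # caller cannot start a new, forged log line.
    text = "".join(_ESCAPES.get(ch, ch) for ch in str(value))
    # Truncate very long input so one request cannot flood the log.
    if len(text) > max_length:
        text = text[:max_length] + "..."
    return text


if __name__ == "__main__":
    attack = "thread-1\n2025-10-27 INFO fake entry"
    print(sanitize_log_input(attack))
```

With this approach an injected newline ends up in the log as the two visible characters `\n`, so the forged text stays on the same line as the entry that recorded it.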