Commit Graph

17 Commits

Author SHA1 Message Date
Willem Jiang
2a97170b6c feat: add Serper search engine support (#762)
* feat: add Serper search engine support

* docs: update configuration guide and env example for Serper

* test: add test case for Serper with missing API key
2025-12-15 23:04:26 +08:00
infoquest-byteplus
7ec9e45702 feat: support infoquest (#708)
* support infoquest

* support html checker

* support html checker

* change line break format

* change line break format

* change line break format

* change line break format

* change line break format

* change line break format

* change line break format

* change line break format

* Fix several critical issues in the codebase
- Resolve crawler panic by improving error handling
- Fix plan validation to prevent invalid configurations
- Correct InfoQuest crawler JSON conversion logic

* add test for infoquest

* add test for infoquest

* Add InfoQuest introduction to the README

* add test for infoquest

* fix readme for infoquest

* fix readme for infoquest

* resolve the conflict

* resolve the conflict

* resolve the conflict

* Fix formatting of INFOQUEST in SearchEngine enum

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Willem Jiang <143703838+willem-bd@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-02 08:16:35 +08:00
Willem Jiang
170c4eb33c Upgrade langchain version to 1.x (#720)
* fix: revert the part of patch of issue-710 to extract the content from the plan

* Upgrade the ddgs for the new compatible version

* Upgraded langchain to 1.1.0
updated langchain related package to the new compatable version

* Update pyproject.toml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-28 22:09:13 +08:00
Willem Jiang
bec97f02ae fix: the crawling error when encountering PDF URLs (#707)
* fix: the crawling error when encountering PDF URLs

* Added the unit test for the new feature of crawl tool

* fix: address the code review problems

* fix: address the code review problems
2025-11-25 09:24:52 +08:00
Willem Jiang
975b344ca7 fix: resolve issue #651 - crawl error with None content handling (#652)
* fix: resolve issue #651 - crawl error with None content handling
Fixed issue #651 by adding comprehensive null-safety checks and error handling to the crawl system.
The fix prevents the ‘TypeError: Incoming markup is of an invalid type: None’ crash by:
1. Validating HTTP responses from Jina API
2. Handling None/empty content at extraction stage
3. Adding fallback handling in Article markdown/message conversion
4. Improving error diagnostics with detailed logging
5. Adding 16 new tests with 100% coverage for critical paths

* Update src/crawler/readability_extractor.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update src/crawler/article.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-24 17:06:54 +08:00
jimmyuconn1982
2001a7c223 Fix: clarification bugs - max rounds, locale passing, and over-clarification (#647)
Fixes: Max rounds bug, locale passing bug, over-clarification issue

* reslove Copilot spelling comments

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2025-10-24 16:43:39 +08:00
Willem Jiang
052490b116 fix: resolve issue #467 - message content validation and Tavily search error handling (#645)
* fix: resolve issue #467 - message content validation and Tavily search error handling

This commit implements a comprehensive fix for issue #467 where the application
crashed with 'Field required: input.messages.3.content' error when generating reports.

## Root Cause Analysis
The issue had multiple interconnected causes:
1. Tavily tool returned mixed types (lists/error strings) instead of consistent JSON
2. background_investigation_node didn't handle error cases properly, returning None
3. Missing message content validation before LLM calls
4. Insufficient error diagnostics for content-related errors

## Changes Made

### Part 1: Fix Tavily Search Tool (tavily_search_results_with_images.py)
- Modified _run() and _arun() methods to return JSON strings instead of mixed types
- Error responses now return JSON: {"error": repr(e)}
- Successful responses return JSON string: json.dumps(cleaned_results)
- Ensures tool results always have valid string content for ToolMessages

### Part 2: Fix background_investigation_node Error Handling (graph/nodes.py)
- Initialize background_investigation_results to empty list instead of None
- Added proper JSON parsing for string responses from Tavily tool
- Handle error responses with explicit error logging
- Always return valid JSON (empty list if error) instead of None

### Part 3: Add Message Content Validation (utils/context_manager.py)
- New validate_message_content() function validates all messages before LLM calls
- Ensures all messages have content attribute and valid string content
- Converts complex types (lists, dicts) to JSON strings
- Provides graceful fallback for messages with issues

### Part 4: Enhanced Error Diagnostics (_execute_agent_step in graph/nodes.py)
- Call message validation before agent invocation
- Add detailed logging for content-related errors
- Log message types, content types, and lengths when validation fails
- Helps with future debugging of similar issues

## Testing
- All unit tests pass (395 tests)
- Python syntax verified for all modified files
- No breaking changes to existing functionality

* test: update tests for issue #467 fixes

Update test expectations to match the new implementation:
- Tavily search tool now returns JSON strings instead of mixed types
- background_investigation_node returns empty list [] for errors instead of None
- All tests updated to verify the new behavior
- All 391 tests pass successfully

* Update src/graph/nodes.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-23 22:08:14 +08:00
Willem Jiang
9ece3fd9c3 fix: support additional Tavily search parameters via configuration to fix #548 (#643)
* fix: support additional Tavily search parameters via configuration to fix #548

- Add include_answer, search_depth, include_raw_content, include_images, include_image_descriptions to SEARCH_ENGINE config
- Update get_web_search_tool() to load these parameters from configuration with sensible defaults
- Parameters are now properly passed to TavilySearchWithImages during initialization
- This fixes 'got an unexpected keyword argument' errors when using web_search tool
- Update tests to verify new parameters are correctly set

* test: add comprehensive unit tests for web search configuration loading

- Add test for custom configuration values (include_answer, search_depth, etc.)
- Add test for empty configuration (all defaults)
- Add test for image_descriptions logic when include_images is false
- Add test for partial configuration
- Add test for missing config file
- Add test for multiple domains in include/exclude lists

All 7 new tests pass and provide comprehensive coverage of configuration loading
and parameter handling for Tavily search tool initialization.

* test: verify all Tavily configuration parameters are optional

Add 8 comprehensive tests to verify that all Tavily engine configuration
parameters are truly optional:

- test_tavily_with_no_search_engine_section: SEARCH_ENGINE section missing
- test_tavily_with_completely_empty_config: Entire config missing
- test_tavily_with_only_include_answer_param: Single param, rest default
- test_tavily_with_only_search_depth_param: Single param, rest default
- test_tavily_with_only_include_domains_param: Domain param, rest default
- test_tavily_with_explicit_false_boolean_values: False values work correctly
- test_tavily_with_empty_domain_lists: Empty lists handled correctly
- test_tavily_all_parameters_optional_mix: Multiple missing params work

These tests verify:
- Tool creation never fails regardless of missing configuration
- All parameters have sensible defaults
- Boolean parameters can be explicitly set to False
- Any combination of optional parameters works
- Domain lists can be empty or omitted

All 15 Tavily configuration tests pass successfully.
2025-10-22 22:56:02 +08:00
Willem Jiang
d30c4d00d3 fix: convert crawl_tool dict return to JSON string for type consistency (#636)
Keep fixing #631
This pull request updates the crawl_tool function to return its results as a JSON string instead of a dictionary, and adjusts the unit tests accordingly to handle the new return type. The changes ensure consistent serialization of output and proper validation in tests.
2025-10-21 10:00:33 +08:00
Willem Jiang
e2ff765460 fix: correct image result format for OpenAI compatibility to fix #632 (#634)
- Change image result type from 'image' to 'image_url' to match OpenAI API expectations
- Wrap image URL in dict structure: {"url": "..."} instead of plain string
- Update SearchResultPostProcessor to handle dict-based image_url during duplicate removal
- Update tests to validate new image format

This fixes the 400 error: Invalid value: 'image'. Supported values are: 'text', 'image_url'...

Co-authored-by: Willem Jiang <143703838+willem-bd@users.noreply.github.com>
2025-10-20 23:14:09 +08:00
jimmyuconn1982
2510cc61de feat: Add intelligent clarification feature in coordinate step for research queries (#613)
* fix: support local models by making thought field optional in Plan model

- Make thought field optional in Plan model to fix Pydantic validation errors with local models
- Add Ollama configuration example to conf.yaml.example
- Update documentation to include local model support
- Improve planner prompt with better JSON format requirements

Fixes local model integration issues where models like qwen3:14b would fail
due to missing thought field in JSON output.

* feat: Add intelligent clarification feature for research queries

- Add multi-turn clarification process to refine vague research questions
- Implement three-dimension clarification standard (Tech/App, Focus, Scope)
- Add clarification state management in coordinator node
- Update coordinator prompt with detailed clarification guidelines
- Add UI settings to enable/disable clarification feature (disabled by default)
- Update workflow to handle clarification rounds recursively
- Add comprehensive test coverage for clarification functionality
- Update documentation with clarification feature usage guide

Key components:
- src/graph/nodes.py: Core clarification logic and state management
- src/prompts/coordinator.md: Detailed clarification guidelines
- src/workflow.py: Recursive clarification handling
- web/: UI settings integration
- tests/: Comprehensive test coverage
- docs/: Updated configuration guide

* fix: Improve clarification conversation continuity

- Add comprehensive conversation history to clarification context
- Include previous exchanges summary in system messages
- Add explicit guidelines for continuing rounds in coordinator prompt
- Prevent LLM from starting new topics during clarification
- Ensure topic continuity across clarification rounds

Fixes issue where LLM would restart clarification instead of building upon previous exchanges.

* fix: Add conversation history to clarification context

* fix: resolve clarification feature message to planer, prompt, test issues

- Optimize coordinator.md prompt template for better clarification flow
- Simplify final message sent to planner after clarification
- Fix API key assertion issues in test_search.py

* fix: Add configurable max_clarification_rounds and comprehensive tests

- Add max_clarification_rounds parameter for external configuration
- Add comprehensive test cases for clarification feature in test_app.py
- Fixes issues found during interactive mode testing where:
  - Recursive call failed due to missing initial_state parameter
  - Clarification exited prematurely at max rounds
  - Incorrect logging of max rounds reached

* Move clarification tests to test_nodes.py and add max_clarification_rounds to zh.json
2025-10-14 13:35:57 +08:00
Fancy-hjyp
5f4eb38fdb feat: add context compress (#590)
* feat:Add context compress

* feat: Add unit test

* feat: add unit test for context manager

* feat: add postprocessor param && code format

* feat: add configuration guide

* fix: fix the configuration_guide

* fix: fix the unit test

* fix: fix the default value

* feat: add test and log for context_manager
2025-09-27 21:42:22 +08:00
zgjja
3b4e993531 feat: 1. replace black with ruff for fomatting and sort import (#489)
2. use tavily from`langchain-tavily` rather than the older one from `langchain-community`

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2025-08-17 22:57:23 +08:00
Willem Jiang
9e691ecf20 fix: added configuration of python_repl (#503)
* fix: added configuration of python_repl

* fix the lint and unit test errors

* fix the lint and unit test errors

* fix:the lint check errors
2025-08-06 14:27:03 +08:00
DanielWalnut
6d8853b7c7 refine the research prompt (#460) 2025-07-22 14:49:04 +08:00
Willem Jiang
3c46201ff0 fix: fix the lint check errors of the main branch (#403) 2025-07-12 14:43:25 +08:00
Willem Jiang
4c2fe2e7f5 test: add more unit tests of tools (#315)
* test: add more test on test_tts.py

* test: add unit test of search and retriever in tools

* test: remove the main code of search.py

* test: add the travily_search unit test

* reformate the codes

* test: add unit tests of tools

* Added the pytest-asyncio dependency

* added the license header of test_tavily_search_api_wrapper.py
2025-06-12 20:43:32 +08:00