feat: support infoquest (#708)

* support infoquest

* support html checker

* support html checker

* change line break format

* change line break format

* change line break format

* change line break format

* change line break format

* change line break format

* change line break format

* change line break format

* Fix several critical issues in the codebase
- Resolve crawler panic by improving error handling
- Fix plan validation to prevent invalid configurations
- Correct InfoQuest crawler JSON conversion logic

* add test for infoquest

* add test for infoquest

* Add InfoQuest introduction to the README

* add test for infoquest

* fix readme for infoquest

* fix readme for infoquest

* resolve the conflict

* resolve the conflict

* resolve the conflict

* Fix formatting of INFOQUEST in SearchEngine enum

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Willem Jiang <143703838+willem-bd@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
infoquest-byteplus
2025-12-02 08:16:35 +08:00
committed by GitHub
parent e179fb1632
commit 7ec9e45702
22 changed files with 2103 additions and 94 deletions


@@ -61,9 +61,13 @@ BASIC_MODEL:
 # # When interrupt is triggered, user will be prompted to approve/reject
 # # Approved keywords: "approved", "approve", "yes", "proceed", "continue", "ok", "okay", "accepted", "accept"
-# Search engine configuration (Only supports Tavily currently)
+# Search engine configuration
+# Supported engines: tavily, infoquest
 # SEARCH_ENGINE:
-# engine: tavily
+# # Engine type to use: "tavily" or "infoquest"
+# engine: tavily or infoquest
+#
+# # The following parameters are specific to Tavily
 # # Only include results from these domains
 # include_domains:
 # - example.com
@@ -88,3 +92,28 @@ BASIC_MODEL:
 # min_score_threshold: 0.0
 # # Maximum content length per page
 # max_content_length_per_page: 4000
+#
+# # The following parameters are specific to InfoQuest
+# # Used to limit the scope of search results, only returns content within the specified time range. Set to -1 to disable time filtering
+# time_range: 30
+# # Used to limit the scope of search results, only returns content from specified whitelisted domains. Set to empty string to disable site filtering
+# site: "example.com"
+# Crawler engine configuration
+# Supported engines: jina (default), infoquest
+# Uncomment the following section to configure crawler engine
+# CRAWLER_ENGINE:
+# # Engine type to use: "jina" (default) or "infoquest"
+# engine: infoquest
+#
+# # The following timeout parameters are only effective when engine is set to "infoquest"
+# # Waiting time after page loading (in seconds)
+# # Set to positive value to enable, -1 to disable
+# fetch_time: 10
+# # Overall timeout for the entire crawling process (in seconds)
+# # Set to positive value to enable, -1 to disable
+# timeout: 30
+# # Timeout for navigating to the page (in seconds)
+# # Set to positive value to enable, -1 to disable
+# navi_timeout: 15
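
The comments in this example config describe a few constraints: the search engine is "tavily" or "infoquest", the crawler engine is "jina" or "infoquest", and the three crawler timeouts take a positive value or -1. A minimal sketch of checking those constraints on an already-parsed config dict might look like the following. The function name `check_engine_config` and the constants are hypothetical and are not part of this commit; how the project itself loads or validates the file is not shown here.

```python
# Hypothetical helper for illustration only; not code from this commit.
# It assumes the YAML file has already been parsed into a plain dict.

SEARCH_ENGINES = {"tavily", "infoquest"}       # from "Supported engines: tavily, infoquest"
CRAWLER_ENGINES = {"jina", "infoquest"}        # from "Supported engines: jina (default), infoquest"
CRAWLER_TIMEOUT_KEYS = ("fetch_time", "timeout", "navi_timeout")


def check_engine_config(conf: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Only check the search engine name when the key is actually present.
    search = conf.get("SEARCH_ENGINE") or {}
    if "engine" in search and search["engine"] not in SEARCH_ENGINES:
        problems.append(f"SEARCH_ENGINE.engine: unsupported value {search['engine']!r}")

    crawler = conf.get("CRAWLER_ENGINE") or {}
    if "engine" in crawler and crawler["engine"] not in CRAWLER_ENGINES:
        problems.append(f"CRAWLER_ENGINE.engine: unsupported value {crawler['engine']!r}")

    # The config comments say: positive value to enable, -1 to disable.
    for key in CRAWLER_TIMEOUT_KEYS:
        if key not in crawler:
            continue
        value = crawler[key]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not (is_number and (value > 0 or value == -1)):
            problems.append(
                f"CRAWLER_ENGINE.{key}: expected a positive number or -1, got {value!r}"
            )

    return problems
```

One practical consequence: uncommenting the `engine: tavily or infoquest` line verbatim yields the single string "tavily or infoquest", which a check like this reports as unsupported, so the value should be edited down to one engine name.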