6 Commits

Author SHA1 Message Date
infoquest-byteplus
7ec9e45702 feat: support infoquest (#708)
* support infoquest

* support html checker

* support html checker

* change line break format

* change line break format

* change line break format

* change line break format

* change line break format

* change line break format

* change line break format

* change line break format

* Fix several critical issues in the codebase
- Resolve crawler panic by improving error handling
- Fix plan validation to prevent invalid configurations
- Correct InfoQuest crawler JSON conversion logic

* add test for infoquest

* add test for infoquest

* Add InfoQuest introduction to the README

* add test for infoquest

* fix readme for infoquest

* fix readme for infoquest

* resolve the conflict

* resolve the conflict

* resolve the conflict

* Fix formatting of INFOQUEST in SearchEngine enum

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Willem Jiang <143703838+willem-bd@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-02 08:16:35 +08:00
Willem Jiang
bec97f02ae fix: the crawling error when encountering PDF URLs (#707)
* fix: the crawling error when encountering PDF URLs

* Added the unit test for the new feature of crawl tool

* fix: address the code review problems

* fix: address the code review problems
2025-11-25 09:24:52 +08:00
Willem Jiang
b4c09aa4b1 security: add log injection attack prevention with input sanitization (#667)
* security: add log injection attack prevention with input sanitization

- Created src/utils/log_sanitizer.py to sanitize user-controlled input before logging
- Prevents log injection attacks using newlines, tabs, carriage returns, etc.
- Escapes dangerous characters: \n, \r, \t, \0, \x1b
- Provides specialized functions for different input types:
  - sanitize_log_input: general purpose sanitization
  - sanitize_thread_id: for user-provided thread IDs
  - sanitize_user_content: for user messages (more aggressive truncation)
  - sanitize_agent_name: for agent identifiers
  - sanitize_tool_name: for tool names
  - sanitize_feedback: for user interrupt feedback
  - create_safe_log_message: template-based safe message creation

- Updated src/server/app.py to sanitize all user input in logging:
  - Thread IDs from request parameter
  - Message content from user
  - Agent names and node information
  - Tool names and feedback

- Updated src/agents/tool_interceptor.py to sanitize:
  - Tool names during execution
  - User feedback during interrupt handling
  - Tool input data

- Added 29 comprehensive unit tests covering:
  - Classic newline injection attacks
  - Carriage return injection
  - Tab and null character injection
  - HTML/ANSI escape sequence injection
  - Combined multi-character attacks
  - Truncation and length limits

Fixes potential log forgery vulnerability where malicious users could inject
fake log entries via unsanitized input containing control characters.
2025-10-27 20:57:23 +08:00
Willem Jiang
975b344ca7 fix: resolve issue #651 - crawl error with None content handling (#652)
* fix: resolve issue #651 - crawl error with None content handling
Fixed issue #651 by adding comprehensive null-safety checks and error handling to the crawl system.
The fix prevents the ‘TypeError: Incoming markup is of an invalid type: None’ crash by:
1. Validating HTTP responses from Jina API
2. Handling None/empty content at extraction stage
3. Adding fallback handling in Article markdown/message conversion
4. Improving error diagnostics with detailed logging
5. Adding 16 new tests with 100% coverage for critical paths

* Update src/crawler/readability_extractor.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update src/crawler/article.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-24 17:06:54 +08:00
Willem Jiang
3c46201ff0 fix: fix the lint check errors of the main branch (#403) 2025-07-12 14:43:25 +08:00
Willem Jiang
c6ed423021 test: add unit tests of crawler (#292)
* test: add unit tests of crawler

* test: polish the code of crawler unit tests
2025-06-07 21:51:05 +08:00