backend/packages/harness/deerflow/community/tavily/tools.py

import json

from langchain.tools import tool
from tavily import TavilyClient

from deerflow.config import get_app_config


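def _get_tavily_client() -> TavilyClient:
    """Build a Tavily client, using the api_key from the web_search tool config if set."""
    config = get_app_config().get_tool_config("web_search")
    # Undeclared config fields are exposed through pydantic's model_extra, which may be None.
    extra = (config.model_extra or {}) if config is not None else {}
    # With api_key=None, TavilyClient falls back to the TAVILY_API_KEY environment variable.
    return TavilyClient(api_key=extra.get("api_key"))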


@tool("web_search", parse_docstring=True)
def web_search_tool(query: str) -> str:
    """Search the web.

    Args:
        query: The query to search for.
    """
    config = get_app_config().get_tool_config("web_search")
    # Default to 5 results unless the tool config overrides max_results.
    extra = (config.model_extra or {}) if config is not None else {}
    max_results = extra.get("max_results", 5)

    client = _get_tavily_client()
    res = client.search(query, max_results=max_results)
    normalized_results = [
        {
            "title": result["title"],
            "url": result["url"],
            "snippet": result["content"],
        }
        for result in res["results"]
    ]
    # ensure_ascii=False keeps non-ASCII titles and snippets readable in the output.
    return json.dumps(normalized_results, indent=2, ensure_ascii=False)


@tool("web_fetch", parse_docstring=True)
def web_fetch_tool(url: str) -> str:
    """Fetch the contents of a web page at a given URL.
    Only fetch EXACT URLs that have been provided directly by the user or have been returned in results from the web_search and web_fetch tools.
    This tool can NOT access content that requires authentication, such as private Google Docs or pages behind login walls.
    Do NOT add www. to URLs that do NOT have them.
    URLs must include the scheme: https://example.com is a valid URL while example.com is an invalid URL.

    Args:
        url: The URL to fetch the contents of.
    """
    client = _get_tavily_client()
    res = client.extract([url])
    if res.get("failed_results"):
        return f"Error: {res['failed_results'][0]['error']}"
    if res.get("results"):
        result = res["results"][0]
        # Extract results may lack a title, so fall back to the URL for the heading.
        title = result.get("title") or url
        # Cap the body at 4096 characters to keep tool output small.
        content = result.get("raw_content") or ""
        return f"# {title}\n\n{content[:4096]}"
    return "Error: No results found"
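
The response handling in the two tools above can be exercised without a network call or API key. The following is a minimal sketch, not part of the module: `normalize_search_results`, `format_extract_response`, and the sample payloads are hypothetical names and data introduced for illustration, and the payload shapes are assumed from the keys the tools read (`results`, `failed_results`, `title`, `url`, `content`, `raw_content`, `error`).

```python
import json
from typing import Any


def normalize_search_results(res: dict[str, Any]) -> str:
    """Reduce a search response to title/url/snippet entries and serialize as JSON."""
    normalized = [
        {
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            "snippet": item.get("content", ""),
        }
        for item in res.get("results", [])
    ]
    return json.dumps(normalized, indent=2, ensure_ascii=False)


def format_extract_response(res: dict[str, Any], limit: int = 4096) -> str:
    """Turn an extract response into a short Markdown string or an error message."""
    failed = res.get("failed_results") or []
    if failed:
        return f"Error: {failed[0].get('error', 'unknown error')}"
    results = res.get("results") or []
    if results:
        first = results[0]
        # Assumption: a result may have no title, so the URL serves as the heading.
        heading = first.get("title") or first.get("url", "")
        body = (first.get("raw_content") or "")[:limit]
        return f"# {heading}\n\n{body}"
    return "Error: No results found"


# Hypothetical sample payloads, shaped after the keys read by the tools above.
sample_search = {
    "results": [
        {
            "title": "Example",
            "url": "https://example.com",
            "content": "An example page.",
            "score": 0.9,
        }
    ]
}
sample_extract = {
    "results": [{"url": "https://example.com", "raw_content": "x" * 5000}],
    "failed_results": [],
}

if __name__ == "__main__":
    print(normalize_search_results(sample_search))
    print(len(format_extract_response(sample_extract)))
```

Keeping the formatting in pure functions like these would let the truncation limit and the error branches be unit-tested separately from the Tavily client.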