refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)

* refactor: extract shared utils to break harness→app cross-layer imports

Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to
src/utils/file_conversion.py. This eliminates the two reverse
dependencies from client.py (harness layer) into gateway/routers/
(app layer), preparing for the harness/app package split.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: split backend/src into harness (deerflow.*) and app (app.*)

Physically split the monolithic backend/src/ package into two layers:

- **Harness** (`packages/harness/deerflow/`): publishable agent
  framework package with import prefix `deerflow.*`. Contains agents,
  sandbox, tools, models, MCP, skills, config, and all core
  infrastructure.
- **App** (`app/`): unpublished application code with import prefix
  `app.*`. Contains gateway (FastAPI REST API) and channels (IM
  integrations).

Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure

Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add harness→app boundary check test and update docs

Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.

Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add config versioning with auto-upgrade on startup

When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field
and auto-upgrade mechanism so breaking changes (like src.* →
deerflow.* renames) are applied automatically before services start.

- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value
  replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before
  starting services
- Add config error hints in service failure messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix comments

* fix: update src.* import in test_sandbox_tools_security to deerflow.*

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle empty config and search parent dirs for config.example.yaml

Address Copilot review comments on PR #1131:

- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
  looking next to config.yaml, fixing detection in common setups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct skills root path depth and config_version type coercion

- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
  after harness split, file lives at packages/harness/deerflow/skills/
  so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
  _check_config_version() to prevent TypeError when YAML stores value
  as string (e.g. config_version: "1")
- tests: add regression tests for both fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update test imports from src.* to deerflow.*/app.* after harness refactor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
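
For illustration, the harness→app boundary check described above could look roughly like the sketch below. This is not the contents of test_harness_boundary.py: the helper names `find_app_imports` and `scan_harness` are hypothetical, and using the `ast` module (rather than a text scan) is an assumption made here.

```python
import ast
from pathlib import Path
from typing import Dict, List


def find_app_imports(source: str) -> List[str]:
    """Return the app-layer modules imported by a piece of Python source.

    Hypothetical helper: only absolute imports of `app` or `app.*` count.
    """
    offenders: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import app.gateway` or `import app.gateway as gw`
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # `from app.gateway import router` (relative imports are skipped)
            names = [node.module]
        else:
            continue
        offenders.extend(n for n in names if n == "app" or n.startswith("app."))
    return offenders


def scan_harness(root: Path) -> Dict[str, List[str]]:
    """Map each offending file under `root` to the app modules it imports."""
    violations: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.py")):
        found = find_app_imports(path.read_text(encoding="utf-8"))
        if found:
            violations[str(path)] = found
    return violations
```

A test could then assert that `scan_harness(Path("packages/harness/deerflow"))` returns an empty dict, with the path adjusted to wherever the test runs from.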
2026-04-18 12:04:45 +08:00 · 2026-03-14 22:55:52 +08:00
parent 9b49a80dda
commit 76803b826f
198 changed files with 1786 additions and 941 deletions
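
As a rough sketch of the version check mentioned in the commit message (the empty-file guard and the int() coercion), assuming the parsed YAML is passed in as a dict or None. The function names `read_config_version` and `is_outdated` are hypothetical, and treating a missing `config_version` as 0 is an assumption made here, not something the commit states.

```python
from typing import Any, Dict, Optional


def read_config_version(config: Optional[Dict[str, Any]]) -> int:
    """Return config_version as an int.

    `config` stands in for the result of yaml.safe_load(), which is None
    for an empty file. A missing field is treated as version 0 (assumption).
    """
    data = config or {}
    # int() makes both `config_version: 1` and `config_version: "1"` comparable.
    return int(data.get("config_version", 0))


def is_outdated(
    user_config: Optional[Dict[str, Any]],
    example_config: Optional[Dict[str, Any]],
) -> bool:
    """True when the user's config is older than the example config."""
    return read_config_version(user_config) < read_config_version(example_config)
```

Without the coercion, comparing the string `"1"` against the integer `1` with `<` raises a TypeError in Python 3, which is the failure the fix addresses.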
--- a/config.example.yaml
+++ b/config.example.yaml
@@ -7,6 +7,13 @@
 # - Environment variables are available for all field values. Example: `api_key: $OPENAI_API_KEY`
 # - The `use` path is a string that looks like "package_name.sub_package_name.module_name:class_name/variable_name".

+# ============================================================================
+# Config Version (used to detect outdated config files)
+# ============================================================================
+# Bump this number when the config schema changes.
+# Run `make config-upgrade` to merge new fields into your local config.yaml.
+config_version: 1
+
 # ============================================================================
 # Models Configuration
 # ============================================================================
@@ -16,7 +23,7 @@ models:
  # Example: Volcengine (Doubao) model
  # - name: doubao-seed-1.8
  #   display_name: Doubao-Seed-1.8
-  #   use: src.models.patched_deepseek:PatchedChatDeepSeek
+  #   use: deerflow.models.patched_deepseek:PatchedChatDeepSeek
  #   model: doubao-seed-1-8-251228
  #   api_base: https://ark.cn-beijing.volces.com/api/v3
  #   api_key: $VOLCENGINE_API_KEY
@@ -62,7 +69,7 @@ models:
  # Example: DeepSeek model (with thinking support)
  # - name: deepseek-v3
  #   display_name: DeepSeek V3 (Thinking)
-  #   use: src.models.patched_deepseek:PatchedChatDeepSeek
+  #   use: deerflow.models.patched_deepseek:PatchedChatDeepSeek
  #   model: deepseek-reasoner
  #   api_key: $DEEPSEEK_API_KEY
  #   max_tokens: 16384
@@ -76,7 +83,7 @@ models:
  # Example: Kimi K2.5 model
  # - name: kimi-k2.5
  #   display_name: Kimi K2.5
-  #   use: src.models.patched_deepseek:PatchedChatDeepSeek
+  #   use: deerflow.models.patched_deepseek:PatchedChatDeepSeek
  #   model: kimi-k2.5
  #   api_base: https://api.moonshot.cn/v1
  #   api_key: $MOONSHOT_API_KEY
@@ -160,27 +167,27 @@ tools:
  # Web search tool (requires Tavily API key)
  - name: web_search
    group: web
-    use: src.community.tavily.tools:web_search_tool
+    use: deerflow.community.tavily.tools:web_search_tool
    max_results: 5
    # api_key: $TAVILY_API_KEY  # Set if needed

  # Web search tool (requires InfoQuest API key)
  # - name: web_search
  #   group: web
-  #   use: src.community.infoquest.tools:web_search_tool
+  #   use: deerflow.community.infoquest.tools:web_search_tool
  #   # Used to limit the scope of search results, only returns content within the specified time range. Set to -1 to disable time filtering
  #   search_time_range: 10

  # Web fetch tool (uses Jina AI reader)
  - name: web_fetch
    group: web
-    use: src.community.jina_ai.tools:web_fetch_tool
+    use: deerflow.community.jina_ai.tools:web_fetch_tool
    timeout: 10

  # Web fetch tool (uses InfoQuest AI reader)
  # - name: web_fetch
  #   group: web
-  #   use: src.community.infoquest.tools:web_fetch_tool
+  #   use: deerflow.community.infoquest.tools:web_fetch_tool
  #   # Overall timeout for the entire crawling process (in seconds). Set to positive value to enable, -1 to disable
  #   timeout: 10
  #   # Waiting time after page loading (in seconds). Set to positive value to enable, -1 to disable
@@ -192,30 +199,30 @@ tools:
  # Use this to find reference images before image generation
  - name: image_search
    group: web
-    use: src.community.image_search.tools:image_search_tool
+    use: deerflow.community.image_search.tools:image_search_tool
    max_results: 5

  # File operations tools
  - name: ls
    group: file:read
-    use: src.sandbox.tools:ls_tool
+    use: deerflow.sandbox.tools:ls_tool

  - name: read_file
    group: file:read
-    use: src.sandbox.tools:read_file_tool
+    use: deerflow.sandbox.tools:read_file_tool

  - name: write_file
    group: file:write
-    use: src.sandbox.tools:write_file_tool
+    use: deerflow.sandbox.tools:write_file_tool

  - name: str_replace
    group: file:write
-    use: src.sandbox.tools:str_replace_tool
+    use: deerflow.sandbox.tools:str_replace_tool

  # Bash execution tool
  - name: bash
    group: bash
-    use: src.sandbox.tools:bash_tool
+    use: deerflow.sandbox.tools:bash_tool

 # ============================================================================
 # Sandbox Configuration
@@ -225,7 +232,7 @@ tools:
 # Option 1: Local Sandbox (Default)
 # Executes commands directly on the host machine
 sandbox:
-  use: src.sandbox.local:LocalSandboxProvider
+  use: deerflow.sandbox.local:LocalSandboxProvider

 # Option 2: Container-based AIO Sandbox
 # Executes commands in isolated containers (Docker or Apple Container)
@@ -233,7 +240,7 @@ sandbox:
 # On other platforms: Uses Docker
 # Uncomment to use:
 # sandbox:
-#   use: src.community.aio_sandbox:AioSandboxProvider
+#   use: deerflow.community.aio_sandbox:AioSandboxProvider
 #
 #   # Optional: Container image to use (works with both Docker and Apple Container)
 #   # Default: enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest
@@ -271,7 +278,7 @@ sandbox:
 # Each sandbox_id gets a dedicated Pod in k3s, managed by the provisioner.
 # Recommended for production or advanced users who want better isolation and scalability.:
 # sandbox:
-#   use: src.community.aio_sandbox:AioSandboxProvider
+#   use: deerflow.community.aio_sandbox:AioSandboxProvider
 #   provisioner_url: http://provisioner:8002

 # ============================================================================