refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)

* refactor: extract shared utils to break harness→app cross-layer imports

Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: split backend/src into harness (deerflow.*) and app (app.*)

Physically split the monolithic backend/src/ package into two layers:

- **Harness** (`packages/harness/deerflow/`): publishable agent framework
  package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
  models, MCP, skills, config, and all core infrastructure.

- **App** (`app/`): unpublished application code with import prefix `app.*`.
  Contains gateway (FastAPI REST API) and channels (IM integrations).

Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure

Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add harness→app boundary check test and update docs

Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.

Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add config versioning with auto-upgrade on startup

When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.

- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix comments

* fix: update src.* import in test_sandbox_tools_security to deerflow.*

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle empty config and search parent dirs for config.example.yaml

Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
  looking next to config.yaml, fixing detection in common setups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct skills root path depth and config_version type coercion

- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
  after harness split, file lives at packages/harness/deerflow/skills/
  so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
  _check_config_version() to prevent TypeError when YAML stores value
  as string (e.g. config_version: "1")
- tests: add regression tests for both fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update test imports from src.* to deerflow.*/app.* after harness refactor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
DanielWalnut
2026-03-14 22:55:52 +08:00
committed by GitHub
parent 9b49a80dda
commit 76803b826f
198 changed files with 1786 additions and 941 deletions

View File

@@ -7,6 +7,13 @@
# - Environment variables are available for all field values. Example: `api_key: $OPENAI_API_KEY`
# - The `use` path is a string that looks like "package_name.sub_package_name.module_name:class_name/variable_name".
# ============================================================================
# Config Version (used to detect outdated config files)
# ============================================================================
# Bump this number when the config schema changes.
# Run `make config-upgrade` to merge new fields into your local config.yaml.
config_version: 1
# ============================================================================
# Models Configuration
# ============================================================================
@@ -16,7 +23,7 @@ models:
# Example: Volcengine (Doubao) model
# - name: doubao-seed-1.8
# display_name: Doubao-Seed-1.8
# use: src.models.patched_deepseek:PatchedChatDeepSeek
# use: deerflow.models.patched_deepseek:PatchedChatDeepSeek
# model: doubao-seed-1-8-251228
# api_base: https://ark.cn-beijing.volces.com/api/v3
# api_key: $VOLCENGINE_API_KEY
@@ -62,7 +69,7 @@ models:
# Example: DeepSeek model (with thinking support)
# - name: deepseek-v3
# display_name: DeepSeek V3 (Thinking)
# use: src.models.patched_deepseek:PatchedChatDeepSeek
# use: deerflow.models.patched_deepseek:PatchedChatDeepSeek
# model: deepseek-reasoner
# api_key: $DEEPSEEK_API_KEY
# max_tokens: 16384
@@ -76,7 +83,7 @@ models:
# Example: Kimi K2.5 model
# - name: kimi-k2.5
# display_name: Kimi K2.5
# use: src.models.patched_deepseek:PatchedChatDeepSeek
# use: deerflow.models.patched_deepseek:PatchedChatDeepSeek
# model: kimi-k2.5
# api_base: https://api.moonshot.cn/v1
# api_key: $MOONSHOT_API_KEY
@@ -160,27 +167,27 @@ tools:
# Web search tool (requires Tavily API key)
- name: web_search
group: web
use: src.community.tavily.tools:web_search_tool
use: deerflow.community.tavily.tools:web_search_tool
max_results: 5
# api_key: $TAVILY_API_KEY # Set if needed
# Web search tool (requires InfoQuest API key)
# - name: web_search
# group: web
# use: src.community.infoquest.tools:web_search_tool
# use: deerflow.community.infoquest.tools:web_search_tool
# # Used to limit the scope of search results, only returns content within the specified time range. Set to -1 to disable time filtering
# search_time_range: 10
# Web fetch tool (uses Jina AI reader)
- name: web_fetch
group: web
use: src.community.jina_ai.tools:web_fetch_tool
use: deerflow.community.jina_ai.tools:web_fetch_tool
timeout: 10
# Web fetch tool (uses InfoQuest AI reader)
# - name: web_fetch
# group: web
# use: src.community.infoquest.tools:web_fetch_tool
# use: deerflow.community.infoquest.tools:web_fetch_tool
# # Overall timeout for the entire crawling process (in seconds). Set to positive value to enable, -1 to disable
# timeout: 10
# # Waiting time after page loading (in seconds). Set to positive value to enable, -1 to disable
@@ -192,30 +199,30 @@ tools:
# Use this to find reference images before image generation
- name: image_search
group: web
use: src.community.image_search.tools:image_search_tool
use: deerflow.community.image_search.tools:image_search_tool
max_results: 5
# File operations tools
- name: ls
group: file:read
use: src.sandbox.tools:ls_tool
use: deerflow.sandbox.tools:ls_tool
- name: read_file
group: file:read
use: src.sandbox.tools:read_file_tool
use: deerflow.sandbox.tools:read_file_tool
- name: write_file
group: file:write
use: src.sandbox.tools:write_file_tool
use: deerflow.sandbox.tools:write_file_tool
- name: str_replace
group: file:write
use: src.sandbox.tools:str_replace_tool
use: deerflow.sandbox.tools:str_replace_tool
# Bash execution tool
- name: bash
group: bash
use: src.sandbox.tools:bash_tool
use: deerflow.sandbox.tools:bash_tool
# ============================================================================
# Sandbox Configuration
@@ -225,7 +232,7 @@ tools:
# Option 1: Local Sandbox (Default)
# Executes commands directly on the host machine
sandbox:
use: src.sandbox.local:LocalSandboxProvider
use: deerflow.sandbox.local:LocalSandboxProvider
# Option 2: Container-based AIO Sandbox
# Executes commands in isolated containers (Docker or Apple Container)
@@ -233,7 +240,7 @@ sandbox:
# On other platforms: Uses Docker
# Uncomment to use:
# sandbox:
# use: src.community.aio_sandbox:AioSandboxProvider
# use: deerflow.community.aio_sandbox:AioSandboxProvider
#
# # Optional: Container image to use (works with both Docker and Apple Container)
# # Default: enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest
@@ -271,7 +278,7 @@ sandbox:
# Each sandbox_id gets a dedicated Pod in k3s, managed by the provisioner.
# Recommended for production or advanced users who want better isolation and scalability.:
# sandbox:
# use: src.community.aio_sandbox:AioSandboxProvider
# use: deerflow.community.aio_sandbox:AioSandboxProvider
# provisioner_url: http://provisioner:8002
# ============================================================================