refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)

* refactor: extract shared utils to break harness→app cross-layer imports

Move _validate_skill_frontmatter to src/skills/validation.py and CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py. This eliminates the two reverse dependencies from client.py (harness layer) into gateway/routers/ (app layer), preparing for the harness/app package split.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: split backend/src into harness (deerflow.*) and app (app.*)

Physically split the monolithic backend/src/ package into two layers:

- **Harness** (`packages/harness/deerflow/`): publishable agent framework package with import prefix `deerflow.*`. Contains agents, sandbox, tools, models, MCP, skills, config, and all core infrastructure.
- **App** (`app/`): unpublished application code with import prefix `app.*`. Contains gateway (FastAPI REST API) and channels (IM integrations).

Key changes:

- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure

Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add harness→app boundary check test and update docs

Add test_harness_boundary.py that scans all Python files in packages/harness/deerflow/ and fails if any `from app.*` or `import app.*` statement is found. This enforces the architectural rule that the harness layer never depends on the app layer.

Update CLAUDE.md to document the harness/app split architecture, import conventions, and the boundary enforcement test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add config versioning with auto-upgrade on startup

When config.example.yaml schema changes, developers' local config.yaml files can silently become outdated. This adds a config_version field and auto-upgrade mechanism so breaking changes (like src.* → deerflow.* renames) are applied automatically before services start.

- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix comments

* fix: update src.* import in test_sandbox_tools_security to deerflow.*

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle empty config and search parent dirs for config.example.yaml

Address Copilot review comments on PR #1131:

- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only looking next to config.yaml, fixing detection in common setups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct skills root path depth and config_version type coercion

- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3) after harness split, file lives at packages/harness/deerflow/skills/ so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in _check_config_version() to prevent TypeError when YAML stores value as string (e.g. config_version: "1")
- tests: add regression tests for both fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update test imports from src.* to deerflow.*/app.* after harness refactor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
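
The boundary check described in the commit message (scan every Python file under the harness package and fail on any import of `app`) could be sketched roughly as follows. This is an illustration only, not the contents of test_harness_boundary.py: the helper name `find_app_imports` is hypothetical, and using the `ast` module instead of text matching is an assumption about one reasonable way to do it.

```python
import ast
from pathlib import Path


def find_app_imports(harness_root: Path) -> list[str]:
    """Return "file:line" entries for every import of the `app` package.

    Hypothetical helper: the real test may detect imports differently.
    """
    violations = []
    for py_file in sorted(harness_root.rglob("*.py")):
        source = py_file.read_text(encoding="utf-8")
        tree = ast.parse(source, filename=str(py_file))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # `import app` / `import app.gateway as gw`
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0:
                # `from app.gateway import router` (absolute imports only)
                names = [node.module or ""]
            else:
                continue
            # Match `app` itself or any submodule, but not e.g. `application`.
            if any(name == "app" or name.startswith("app.") for name in names):
                violations.append(f"{py_file}:{node.lineno}")
    return violations
```

In a pytest-style test, the check would then reduce to asserting that `find_app_imports(Path("packages/harness/deerflow"))` returns an empty list, with the returned entries doubling as the failure message.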
2026-04-03 06:12:14 +08:00 · 2026-03-14 22:55:52 +08:00
parent 9b49a80dda
commit 76803b826f
198 changed files with 1786 additions and 941 deletions
--- a/backend/packages/harness/deerflow/config/app_config.py
+++ b/backend/packages/harness/deerflow/config/app_config.py
@@ -0,0 +1,273 @@
+import logging
+import os
+from pathlib import Path
+from typing import Any, Self
+
+import yaml
+from dotenv import load_dotenv
+from pydantic import BaseModel, ConfigDict, Field
+
+from deerflow.config.checkpointer_config import CheckpointerConfig, load_checkpointer_config_from_dict
+from deerflow.config.extensions_config import ExtensionsConfig
+from deerflow.config.memory_config import load_memory_config_from_dict
+from deerflow.config.model_config import ModelConfig
+from deerflow.config.sandbox_config import SandboxConfig
+from deerflow.config.skills_config import SkillsConfig
+from deerflow.config.subagents_config import load_subagents_config_from_dict
+from deerflow.config.summarization_config import load_summarization_config_from_dict
+from deerflow.config.title_config import load_title_config_from_dict
+from deerflow.config.tool_config import ToolConfig, ToolGroupConfig
+
+load_dotenv()
+
+logger = logging.getLogger(__name__)
+
+
+class AppConfig(BaseModel):
+    """Config for the DeerFlow application"""
+
+    models: list[ModelConfig] = Field(default_factory=list, description="Available models")
+    sandbox: SandboxConfig = Field(description="Sandbox configuration")
+    tools: list[ToolConfig] = Field(default_factory=list, description="Available tools")
+    tool_groups: list[ToolGroupConfig] = Field(default_factory=list, description="Available tool groups")
+    skills: SkillsConfig = Field(default_factory=SkillsConfig, description="Skills configuration")
+    extensions: ExtensionsConfig = Field(default_factory=ExtensionsConfig, description="Extensions configuration (MCP servers and skills state)")
+    model_config = ConfigDict(extra="allow", frozen=False)
+    checkpointer: CheckpointerConfig | None = Field(default=None, description="Checkpointer configuration")
+
+    @classmethod
+    def resolve_config_path(cls, config_path: str | None = None) -> Path:
+        """Resolve the config file path.
+
+        Priority:
+        1. If the `config_path` argument is provided, use it.
+        2. Otherwise, if the `DEER_FLOW_CONFIG_PATH` environment variable is set, use it.
+        3. Otherwise, check for `config.yaml` in the current directory, then fall back to `config.yaml` in the parent directory.
+        """
+        if config_path:
+            path = Path(config_path)
+            if not path.exists():
+                raise FileNotFoundError(f"Config file specified by param `config_path` not found at {path}")
+            return path
+        elif os.getenv("DEER_FLOW_CONFIG_PATH"):
+            path = Path(os.getenv("DEER_FLOW_CONFIG_PATH"))
+            if not path.exists():
+                raise FileNotFoundError(f"Config file specified by environment variable `DEER_FLOW_CONFIG_PATH` not found at {path}")
+            return path
+        else:
+            # Check if the config.yaml is in the current directory
+            path = Path(os.getcwd()) / "config.yaml"
+            if not path.exists():
+                # Check if the config.yaml is in the parent directory of CWD
+                path = Path(os.getcwd()).parent / "config.yaml"
+                if not path.exists():
+                    raise FileNotFoundError("`config.yaml` file not found at the current directory nor its parent directory")
+            return path
+
+    @classmethod
+    def from_file(cls, config_path: str | None = None) -> Self:
+        """Load config from YAML file.
+
+        See `resolve_config_path` for more details.
+
+        Args:
+            config_path: Path to the config file.
+
+        Returns:
+            AppConfig: The loaded config.
+        """
+        resolved_path = cls.resolve_config_path(config_path)
+        with open(resolved_path, encoding="utf-8") as f:
+            config_data = yaml.safe_load(f) or {}
+
+        # Check config version before processing
+        cls._check_config_version(config_data, resolved_path)
+
+        config_data = cls.resolve_env_variables(config_data)
+
+        # Load title config if present
+        if "title" in config_data:
+            load_title_config_from_dict(config_data["title"])
+
+        # Load summarization config if present
+        if "summarization" in config_data:
+            load_summarization_config_from_dict(config_data["summarization"])
+
+        # Load memory config if present
+        if "memory" in config_data:
+            load_memory_config_from_dict(config_data["memory"])
+
+        # Load subagents config if present
+        if "subagents" in config_data:
+            load_subagents_config_from_dict(config_data["subagents"])
+
+        # Load checkpointer config if present
+        if "checkpointer" in config_data:
+            load_checkpointer_config_from_dict(config_data["checkpointer"])
+
+        # Load extensions config separately (it's in a different file)
+        extensions_config = ExtensionsConfig.from_file()
+        config_data["extensions"] = extensions_config.model_dump()
+
+        result = cls.model_validate(config_data)
+        return result
+
+    @classmethod
+    def _check_config_version(cls, config_data: dict, config_path: Path) -> None:
+        """Check if the user's config.yaml is outdated compared to config.example.yaml.
+
+        Emits a warning if the user's config_version is lower than the example's.
+        Missing config_version is treated as version 0 (pre-versioning).
+        """
+        try:
+            user_version = int(config_data.get("config_version", 0))
+        except (TypeError, ValueError):
+            user_version = 0
+
+        # Find config.example.yaml by searching config.yaml's directory and its parents
+        example_path = None
+        search_dir = config_path.parent
+        for _ in range(5):  # search up to 5 levels
+            candidate = search_dir / "config.example.yaml"
+            if candidate.exists():
+                example_path = candidate
+                break
+            parent = search_dir.parent
+            if parent == search_dir:
+                break
+            search_dir = parent
+        if example_path is None:
+            return
+
+        try:
+            with open(example_path, encoding="utf-8") as f:
+                example_data = yaml.safe_load(f)
+            raw = example_data.get("config_version", 0) if example_data else 0
+            try:
+                example_version = int(raw)
+            except (TypeError, ValueError):
+                example_version = 0
+        except Exception:
+            return
+
+        if user_version < example_version:
+            logger.warning(
+                "Your config.yaml (version %d) is outdated — the latest version is %d. "
+                "Run `make config-upgrade` to merge new fields into your config.",
+                user_version,
+                example_version,
+            )
+
+    @classmethod
+    def resolve_env_variables(cls, config: Any) -> Any:
+        """Recursively resolve environment variables in the config.
+
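+        A string value that starts with `$` is replaced by the value of the named environment variable, read via `os.getenv` (example: `$OPENAI_API_KEY`). A `ValueError` is raised if the variable is not set.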
+
+        Args:
+            config: The config to resolve environment variables in.
+
+        Returns:
+            The config with environment variables resolved.
+        """
+        if isinstance(config, str):
+            if config.startswith("$"):
+                env_value = os.getenv(config[1:])
+                if env_value is None:
+                    raise ValueError(f"Environment variable {config[1:]} not found for config value {config}")
+                return env_value
+            return config
+        elif isinstance(config, dict):
+            return {k: cls.resolve_env_variables(v) for k, v in config.items()}
+        elif isinstance(config, list):
+            return [cls.resolve_env_variables(item) for item in config]
+        return config
+
+    def get_model_config(self, name: str) -> ModelConfig | None:
+        """Get the model config by name.
+
+        Args:
+            name: The name of the model to get the config for.
+
+        Returns:
+            The model config if found, otherwise None.
+        """
+        return next((model for model in self.models if model.name == name), None)
+
+    def get_tool_config(self, name: str) -> ToolConfig | None:
+        """Get the tool config by name.
+
+        Args:
+            name: The name of the tool to get the config for.
+
+        Returns:
+            The tool config if found, otherwise None.
+        """
+        return next((tool for tool in self.tools if tool.name == name), None)
+
+    def get_tool_group_config(self, name: str) -> ToolGroupConfig | None:
+        """Get the tool group config by name.
+
+        Args:
+            name: The name of the tool group to get the config for.
+
+        Returns:
+            The tool group config if found, otherwise None.
+        """
+        return next((group for group in self.tool_groups if group.name == name), None)
+
+
+_app_config: AppConfig | None = None
+
+
+def get_app_config() -> AppConfig:
+    """Get the DeerFlow config instance.
+
+    Returns a cached singleton instance. Use `reload_app_config()` to reload
+    from file, or `reset_app_config()` to clear the cache.
+    """
+    global _app_config
+    if _app_config is None:
+        _app_config = AppConfig.from_file()
+    return _app_config
+
+
+def reload_app_config(config_path: str | None = None) -> AppConfig:
+    """Reload the config from file and update the cached instance.
+
+    This is useful when the config file has been modified and you want
+    to pick up the changes without restarting the application.
+
+    Args:
+        config_path: Optional path to config file. If not provided,
+                     uses the default resolution strategy.
+
+    Returns:
+        The newly loaded AppConfig instance.
+    """
+    global _app_config
+    _app_config = AppConfig.from_file(config_path)
+    return _app_config
+
+
+def reset_app_config() -> None:
+    """Reset the cached config instance.
+
+    This clears the singleton cache, causing the next call to
+    `get_app_config()` to reload from file. Useful for testing
+    or when switching between different configurations.
+    """
+    global _app_config
+    _app_config = None
+
+
+def set_app_config(config: AppConfig) -> None:
+    """Set a custom config instance.
+
+    This allows injecting a custom or mock config for testing purposes.
+
+    Args:
+        config: The AppConfig instance to use.
+    """
+    global _app_config
+    _app_config = config
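
The `int()` coercion in `_check_config_version` matters because YAML may store the version as a string, and comparing `"1" < 2` raises `TypeError` in Python. The following standalone sketch isolates that logic; `coerce_version` and `is_outdated` are hypothetical helper names that do not appear in the commit, and the dictionaries stand in for parsed YAML.

```python
from typing import Any


def coerce_version(raw: Any) -> int:
    """Best-effort conversion of a raw YAML value to an integer version."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        # None, lists, or non-numeric strings are treated as version 0
        # (pre-versioning), mirroring the fallback in the diff above.
        return 0


def is_outdated(user_config: dict, example_config: dict) -> bool:
    """True when the user's config_version is lower than the example's."""
    user_version = coerce_version(user_config.get("config_version", 0))
    example_version = coerce_version(example_config.get("config_version", 0))
    return user_version < example_version


print(is_outdated({"config_version": "1"}, {"config_version": 2}))  # True
print(is_outdated({}, {"config_version": 1}))                       # True
print(is_outdated({"config_version": 2}, {"config_version": "2"}))  # False
```

Falling back to 0 rather than raising keeps a malformed `config_version` from blocking startup; the worst case is an extra upgrade warning.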