mirror of https://gitee.com/wanwujie/deer-flow (synced 2026-04-02 22:02:13 +08:00)
* refactor: extract shared utils to break harness→app cross-layer imports

  Move _validate_skill_frontmatter to src/skills/validation.py and CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py. This eliminates the two reverse dependencies from client.py (harness layer) into gateway/routers/ (app layer), preparing for the harness/app package split.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: split backend/src into harness (deerflow.*) and app (app.*)

  Physically split the monolithic backend/src/ package into two layers:
  - **Harness** (`packages/harness/deerflow/`): publishable agent framework package with import prefix `deerflow.*`. Contains agents, sandbox, tools, models, MCP, skills, config, and all core infrastructure.
  - **App** (`app/`): unpublished application code with import prefix `app.*`. Contains gateway (FastAPI REST API) and channels (IM integrations).

  Key changes:
  - Move 13 harness modules to packages/harness/deerflow/ via git mv
  - Move gateway + channels to app/ via git mv
  - Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
  - Set up uv workspace with deerflow-harness as workspace member
  - Update langgraph.json, config.example.yaml, all scripts, Docker files
  - Add build-system (hatchling) to harness pyproject.toml
  - Add PYTHONPATH=. to gateway startup commands for app.* resolution
  - Update ruff.toml with known-first-party for import sorting
  - Update all documentation to reflect new directory structure

  Boundary rule enforced: harness code never imports from app. All 429 tests pass. Lint clean.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add harness→app boundary check test and update docs

  Add test_harness_boundary.py that scans all Python files in packages/harness/deerflow/ and fails if any `from app.*` or `import app.*` statement is found. This enforces the architectural rule that the harness layer never depends on the app layer.

  Update CLAUDE.md to document the harness/app split architecture, import conventions, and the boundary enforcement test.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add config versioning with auto-upgrade on startup

  When config.example.yaml schema changes, developers' local config.yaml files can silently become outdated. This adds a config_version field and auto-upgrade mechanism so breaking changes (like src.* → deerflow.* renames) are applied automatically before services start.
  - Add config_version: 1 to config.example.yaml
  - Add startup version check warning in AppConfig.from_file()
  - Add scripts/config-upgrade.sh with migration registry for value replacements
  - Add `make config-upgrade` target
  - Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
  - Add config error hints in service failure messages

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix comments

* fix: update src.* import in test_sandbox_tools_security to deerflow.*

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle empty config and search parent dirs for config.example.yaml

  Address Copilot review comments on PR #1131:
  - Guard against yaml.safe_load() returning None for empty config files
  - Search parent directories for config.example.yaml instead of only looking next to config.yaml, fixing detection in common setups

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct skills root path depth and config_version type coercion

  - loader.py: fix get_skills_root_path() to use 5 parent levels (was 3) after harness split, file lives at packages/harness/deerflow/skills/ so parent×3 resolved to backend/packages/harness/ instead of backend/
  - app_config.py: coerce config_version to int() before comparison in _check_config_version() to prevent TypeError when YAML stores value as string (e.g. config_version: "1")
  - tests: add regression tests for both fixes

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update test imports from src.* to deerflow.*/app.* after harness refactor

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(harness): add tool-first ACP agent invocation (#37)

* feat(harness): add tool-first ACP agent invocation

* build(harness): make ACP dependency required

* fix(harness): address ACP review feedback

* feat(harness): decouple ACP agent workspace from thread data

  ACP agents (codex, claude-code) previously used per-thread workspace directories, causing path resolution complexity and coupling task execution to DeerFlow's internal thread data layout. This change:
  - Replace _resolve_cwd() with a fixed _get_work_dir() that always uses {base_dir}/acp-workspace/, eliminating virtual path translation and thread_id lookups
  - Introduce /mnt/acp-workspace virtual path for lead agent read-only access to ACP agent output files (same pattern as /mnt/skills)
  - Add security guards: read-only validation, path traversal prevention, command path allowlisting, and output masking for acp-workspace
  - Update system prompt and tool description to guide LLM: send self-contained tasks to ACP agents, copy results via /mnt/acp-workspace
  - Add 11 new security tests for ACP workspace path handling

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(prompt): inject ACP section only when ACP agents are configured

  The ACP agent guidance in the system prompt is now conditionally built by _build_acp_section(), which checks get_acp_agents() and returns an empty string when no ACP agents are configured. This avoids polluting the prompt with irrelevant instructions for users who don't use ACP.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix lint

* fix(harness): address Copilot review comments on sandbox path handling and ACP tool

  - local_sandbox: fix path-segment boundary bug in _resolve_path (== or startswith +"/") and add lookahead in _resolve_paths_in_command regex to prevent /mnt/skills matching inside /mnt/skills-extra
  - local_sandbox_provider: replace print() with logger.warning(..., exc_info=True)
  - invoke_acp_agent_tool: guard getattr(option, "optionId") with None default + continue; move full prompt from INFO to DEBUG level (truncated to 200 chars)
  - sandbox/tools: fix _get_acp_workspace_host_path docstring to match implementation; remove misleading "read-only" language from validate_local_bash_command_paths

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(acp): thread-isolated workspaces, permission guardrail, and ContextVar registry

  P1.1 – ACP workspace thread isolation
  - Add `Paths.acp_workspace_dir(thread_id)` for per-thread paths
  - `_get_work_dir(thread_id)` in invoke_acp_agent_tool now uses `{base_dir}/threads/{thread_id}/acp-workspace/`; falls back to global workspace when thread_id is absent or invalid
  - `_invoke` extracts thread_id from `RunnableConfig` via `Annotated[RunnableConfig, InjectedToolArg]`
  - `sandbox/tools.py`: `_get_acp_workspace_host_path(thread_id)`, `_resolve_acp_workspace_path(path, thread_id)`, and all callers (`replace_virtual_paths_in_command`, `mask_local_paths_in_output`, `ls_tool`, `read_file_tool`) now resolve ACP paths per-thread

  P1.2 – ACP permission guardrail
  - New `auto_approve_permissions: bool = False` field in `ACPAgentConfig`
  - `_build_permission_response(options, *, auto_approve: bool)` now defaults to deny; only approves when `auto_approve=True`
  - Document field in `config.example.yaml`

  P2 – Deferred tool registry race condition
  - Replace module-level `_registry` global with `contextvars.ContextVar`
  - Each asyncio request context gets its own registry; worker threads inherit the context automatically via `loop.run_in_executor`
  - Expose `get_deferred_registry` / `set_deferred_registry` / `reset_deferred_registry` helpers

  Tests: 831 pass (57 for affected modules, 3 new tests)

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sandbox): mount /mnt/acp-workspace in docker sandbox container

  The AioSandboxProvider was not mounting the ACP workspace into the sandbox container, so /mnt/acp-workspace was inaccessible when the lead agent tried to read ACP results in docker mode.

  Changes:
  - `ensure_thread_dirs`: also create `acp-workspace/` (chmod 0o777) so the directory exists before the sandbox container starts — required for Docker volume mounts
  - `_get_thread_mounts`: add read-only `/mnt/acp-workspace` mount using the per-thread host path (`host_paths.acp_workspace_dir(thread_id)`)
  - Update stale CLAUDE.md description (was "fixed global workspace")

  Tests: `test_aio_sandbox_provider.py` (4 new tests)

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lint): remove unused imports in test_aio_sandbox_provider

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix config

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
334 lines
13 KiB
Python
import logging
import os
from pathlib import Path
from typing import Any, Self

import yaml
from dotenv import load_dotenv
from pydantic import BaseModel, ConfigDict, Field

from deerflow.config.acp_config import load_acp_config_from_dict
from deerflow.config.checkpointer_config import CheckpointerConfig, load_checkpointer_config_from_dict
from deerflow.config.extensions_config import ExtensionsConfig
from deerflow.config.guardrails_config import load_guardrails_config_from_dict
from deerflow.config.memory_config import load_memory_config_from_dict
from deerflow.config.model_config import ModelConfig
from deerflow.config.sandbox_config import SandboxConfig
from deerflow.config.skills_config import SkillsConfig
from deerflow.config.subagents_config import load_subagents_config_from_dict
from deerflow.config.summarization_config import load_summarization_config_from_dict
from deerflow.config.title_config import load_title_config_from_dict
from deerflow.config.token_usage_config import TokenUsageConfig
from deerflow.config.tool_config import ToolConfig, ToolGroupConfig
from deerflow.config.tool_search_config import ToolSearchConfig, load_tool_search_config_from_dict

load_dotenv()

logger = logging.getLogger(__name__)


class AppConfig(BaseModel):
    """Config for the DeerFlow application"""

    log_level: str = Field(default="info", description="Logging level for deerflow modules (debug/info/warning/error)")
    token_usage: TokenUsageConfig = Field(default_factory=TokenUsageConfig, description="Token usage tracking configuration")
    models: list[ModelConfig] = Field(default_factory=list, description="Available models")
    sandbox: SandboxConfig = Field(description="Sandbox configuration")
    tools: list[ToolConfig] = Field(default_factory=list, description="Available tools")
    tool_groups: list[ToolGroupConfig] = Field(default_factory=list, description="Available tool groups")
    skills: SkillsConfig = Field(default_factory=SkillsConfig, description="Skills configuration")
    extensions: ExtensionsConfig = Field(default_factory=ExtensionsConfig, description="Extensions configuration (MCP servers and skills state)")
    tool_search: ToolSearchConfig = Field(default_factory=ToolSearchConfig, description="Tool search / deferred loading configuration")
    model_config = ConfigDict(extra="allow", frozen=False)
    checkpointer: CheckpointerConfig | None = Field(default=None, description="Checkpointer configuration")

    @classmethod
    def resolve_config_path(cls, config_path: str | None = None) -> Path:
        """Resolve the config file path.

        Priority:
        1. If provided `config_path` argument, use it.
        2. If provided `DEER_FLOW_CONFIG_PATH` environment variable, use it.
        3. Otherwise, first check the `config.yaml` in the current directory, then fallback to `config.yaml` in the parent directory.
        """
        if config_path:
            path = Path(config_path)
            if not Path.exists(path):
                raise FileNotFoundError(f"Config file specified by param `config_path` not found at {path}")
            return path
        elif os.getenv("DEER_FLOW_CONFIG_PATH"):
            path = Path(os.getenv("DEER_FLOW_CONFIG_PATH"))
            if not Path.exists(path):
                raise FileNotFoundError(f"Config file specified by environment variable `DEER_FLOW_CONFIG_PATH` not found at {path}")
            return path
        else:
            # Check if the config.yaml is in the current directory
            path = Path(os.getcwd()) / "config.yaml"
            if not path.exists():
                # Check if the config.yaml is in the parent directory of CWD
                path = Path(os.getcwd()).parent / "config.yaml"
                if not path.exists():
                    raise FileNotFoundError("`config.yaml` file not found at the current directory nor its parent directory")
            return path

    @classmethod
    def from_file(cls, config_path: str | None = None) -> Self:
        """Load config from YAML file.

        See `resolve_config_path` for more details.

        Args:
            config_path: Path to the config file.

        Returns:
            AppConfig: The loaded config.
        """
        resolved_path = cls.resolve_config_path(config_path)
        with open(resolved_path, encoding="utf-8") as f:
            config_data = yaml.safe_load(f) or {}

        # Check config version before processing
        cls._check_config_version(config_data, resolved_path)

        config_data = cls.resolve_env_variables(config_data)

        # Load title config if present
        if "title" in config_data:
            load_title_config_from_dict(config_data["title"])

        # Load summarization config if present
        if "summarization" in config_data:
            load_summarization_config_from_dict(config_data["summarization"])

        # Load memory config if present
        if "memory" in config_data:
            load_memory_config_from_dict(config_data["memory"])

        # Load subagents config if present
        if "subagents" in config_data:
            load_subagents_config_from_dict(config_data["subagents"])

        # Load tool_search config if present
        if "tool_search" in config_data:
            load_tool_search_config_from_dict(config_data["tool_search"])

        # Load guardrails config if present
        if "guardrails" in config_data:
            load_guardrails_config_from_dict(config_data["guardrails"])

        # Load checkpointer config if present
        if "checkpointer" in config_data:
            load_checkpointer_config_from_dict(config_data["checkpointer"])

        # Always refresh ACP agent config so removed entries do not linger across reloads.
        load_acp_config_from_dict(config_data.get("acp_agents", {}))

        # Load extensions config separately (it's in a different file)
        extensions_config = ExtensionsConfig.from_file()
        config_data["extensions"] = extensions_config.model_dump()

        result = cls.model_validate(config_data)
        return result

    @classmethod
    def _check_config_version(cls, config_data: dict, config_path: Path) -> None:
        """Check if the user's config.yaml is outdated compared to config.example.yaml.

        Emits a warning if the user's config_version is lower than the example's.
        Missing config_version is treated as version 0 (pre-versioning).
        """
        try:
            user_version = int(config_data.get("config_version", 0))
        except (TypeError, ValueError):
            user_version = 0

        # Find config.example.yaml by searching config.yaml's directory and its parents
        example_path = None
        search_dir = config_path.parent
        for _ in range(5):  # search up to 5 levels
            candidate = search_dir / "config.example.yaml"
            if candidate.exists():
                example_path = candidate
                break
            parent = search_dir.parent
            if parent == search_dir:
                break
            search_dir = parent
        if example_path is None:
            return

        try:
            with open(example_path, encoding="utf-8") as f:
                example_data = yaml.safe_load(f)
            raw = example_data.get("config_version", 0) if example_data else 0
            try:
                example_version = int(raw)
            except (TypeError, ValueError):
                example_version = 0
        except Exception:
            return

        if user_version < example_version:
            logger.warning(
                "Your config.yaml (version %d) is outdated — the latest version is %d. Run `make config-upgrade` to merge new fields into your config.",
                user_version,
                example_version,
            )

    @classmethod
    def resolve_env_variables(cls, config: Any) -> Any:
        """Recursively resolve environment variables in the config.

        Environment variables are resolved using the `os.getenv` function. Example: $OPENAI_API_KEY

        Args:
            config: The config to resolve environment variables in.

        Returns:
            The config with environment variables resolved.
        """
        if isinstance(config, str):
            if config.startswith("$"):
                env_value = os.getenv(config[1:])
                if env_value is None:
                    raise ValueError(f"Environment variable {config[1:]} not found for config value {config}")
                return env_value
            return config
        elif isinstance(config, dict):
            return {k: cls.resolve_env_variables(v) for k, v in config.items()}
        elif isinstance(config, list):
            return [cls.resolve_env_variables(item) for item in config]
        return config

    def get_model_config(self, name: str) -> ModelConfig | None:
        """Get the model config by name.

        Args:
            name: The name of the model to get the config for.

        Returns:
            The model config if found, otherwise None.
        """
        return next((model for model in self.models if model.name == name), None)

    def get_tool_config(self, name: str) -> ToolConfig | None:
        """Get the tool config by name.

        Args:
            name: The name of the tool to get the config for.

        Returns:
            The tool config if found, otherwise None.
        """
        return next((tool for tool in self.tools if tool.name == name), None)

    def get_tool_group_config(self, name: str) -> ToolGroupConfig | None:
        """Get the tool group config by name.

        Args:
            name: The name of the tool group to get the config for.

        Returns:
            The tool group config if found, otherwise None.
        """
        return next((group for group in self.tool_groups if group.name == name), None)


_app_config: AppConfig | None = None
_app_config_path: Path | None = None
_app_config_mtime: float | None = None
_app_config_is_custom = False


def _get_config_mtime(config_path: Path) -> float | None:
    """Get the modification time of a config file if it exists."""
    try:
        return config_path.stat().st_mtime
    except OSError:
        return None


def _load_and_cache_app_config(config_path: str | None = None) -> AppConfig:
    """Load config from disk and refresh cache metadata."""
    global _app_config, _app_config_path, _app_config_mtime, _app_config_is_custom

    resolved_path = AppConfig.resolve_config_path(config_path)
    _app_config = AppConfig.from_file(str(resolved_path))
    _app_config_path = resolved_path
    _app_config_mtime = _get_config_mtime(resolved_path)
    _app_config_is_custom = False
    return _app_config


def get_app_config() -> AppConfig:
    """Get the DeerFlow config instance.

    Returns a cached singleton instance and automatically reloads it when the
    underlying config file path or modification time changes. Use
    `reload_app_config()` to force a reload, or `reset_app_config()` to clear
    the cache.
    """
    global _app_config, _app_config_path, _app_config_mtime

    if _app_config is not None and _app_config_is_custom:
        return _app_config

    resolved_path = AppConfig.resolve_config_path()
    current_mtime = _get_config_mtime(resolved_path)

    should_reload = _app_config is None or _app_config_path != resolved_path or _app_config_mtime != current_mtime
    if should_reload:
        if _app_config_path == resolved_path and _app_config_mtime is not None and current_mtime is not None and _app_config_mtime != current_mtime:
            logger.info(
                "Config file has been modified (mtime: %s -> %s), reloading AppConfig",
                _app_config_mtime,
                current_mtime,
            )
        _load_and_cache_app_config(str(resolved_path))
    return _app_config


def reload_app_config(config_path: str | None = None) -> AppConfig:
    """Reload the config from file and update the cached instance.

    This is useful when the config file has been modified and you want
    to pick up the changes without restarting the application.

    Args:
        config_path: Optional path to config file. If not provided,
            uses the default resolution strategy.

    Returns:
        The newly loaded AppConfig instance.
    """
    return _load_and_cache_app_config(config_path)


def reset_app_config() -> None:
    """Reset the cached config instance.

    This clears the singleton cache, causing the next call to
    `get_app_config()` to reload from file. Useful for testing
    or when switching between different configurations.
    """
    global _app_config, _app_config_path, _app_config_mtime, _app_config_is_custom
    _app_config = None
    _app_config_path = None
    _app_config_mtime = None
    _app_config_is_custom = False


def set_app_config(config: AppConfig) -> None:
    """Set a custom config instance.

    This allows injecting a custom or mock config for testing purposes.

    Args:
        config: The AppConfig instance to use.
    """
    global _app_config, _app_config_path, _app_config_mtime, _app_config_is_custom
    _app_config = config
    _app_config_path = None
    _app_config_mtime = None
    _app_config_is_custom = True
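
The recursive `$VAR` substitution performed by `AppConfig.resolve_env_variables` can be tried in isolation. The block below is a minimal standalone sketch, not the module's own code: `resolve_env`, `DEMO_API_KEY`, and the sample dictionary are hypothetical names chosen for illustration, and the function only mirrors the behaviour visible in the listing above (whole-string values starting with `$` are looked up with `os.getenv`; dicts and lists are walked recursively).

```python
import os
from typing import Any


def resolve_env(value: Any) -> Any:
    """Hypothetical standalone equivalent of the recursive `$VAR` lookup."""
    if isinstance(value, str):
        if value.startswith("$"):
            # Everything after the leading "$" is treated as the variable name.
            resolved = os.getenv(value[1:])
            if resolved is None:
                raise ValueError(f"Environment variable {value[1:]} not found")
            return resolved
        return value
    if isinstance(value, dict):
        return {key: resolve_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value


# Illustrative input shaped like a parsed YAML document (made-up values).
os.environ["DEMO_API_KEY"] = "sk-demo"
raw = {"models": [{"name": "demo", "api_key": "$DEMO_API_KEY"}], "log_level": "info"}
resolved = resolve_env(raw)
```

With this approach only values that begin with `$` are substituted, and the rest of the string is taken as the variable name, so a value such as `prefix-$DEMO_API_KEY` would be left untouched. A reference to an unset variable raises `ValueError` rather than silently resolving to an empty string.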