mirror of
https://gitee.com/wanwujie/deer-flow
synced 2026-04-02 22:02:13 +08:00
* refactor: extract shared utils to break harness→app cross-layer imports Move _validate_skill_frontmatter to src/skills/validation.py and CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py. This eliminates the two reverse dependencies from client.py (harness layer) into gateway/routers/ (app layer), preparing for the harness/app package split. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: split backend/src into harness (deerflow.*) and app (app.*) Physically split the monolithic backend/src/ package into two layers: - **Harness** (`packages/harness/deerflow/`): publishable agent framework package with import prefix `deerflow.*`. Contains agents, sandbox, tools, models, MCP, skills, config, and all core infrastructure. - **App** (`app/`): unpublished application code with import prefix `app.*`. Contains gateway (FastAPI REST API) and channels (IM integrations). Key changes: - Move 13 harness modules to packages/harness/deerflow/ via git mv - Move gateway + channels to app/ via git mv - Rename all imports: src.* → deerflow.* (harness) / app.* (app layer) - Set up uv workspace with deerflow-harness as workspace member - Update langgraph.json, config.example.yaml, all scripts, Docker files - Add build-system (hatchling) to harness pyproject.toml - Add PYTHONPATH=. to gateway startup commands for app.* resolution - Update ruff.toml with known-first-party for import sorting - Update all documentation to reflect new directory structure Boundary rule enforced: harness code never imports from app. All 429 tests pass. Lint clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: add harness→app boundary check test and update docs Add test_harness_boundary.py that scans all Python files in packages/harness/deerflow/ and fails if any `from app.*` or `import app.*` statement is found. This enforces the architectural rule that the harness layer never depends on the app layer. 
Update CLAUDE.md to document the harness/app split architecture, import conventions, and the boundary enforcement test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add config versioning with auto-upgrade on startup When config.example.yaml schema changes, developers' local config.yaml files can silently become outdated. This adds a config_version field and auto-upgrade mechanism so breaking changes (like src.* → deerflow.* renames) are applied automatically before services start. - Add config_version: 1 to config.example.yaml - Add startup version check warning in AppConfig.from_file() - Add scripts/config-upgrade.sh with migration registry for value replacements - Add `make config-upgrade` target - Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services - Add config error hints in service failure messages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix comments * fix: update src.* import in test_sandbox_tools_security to deerflow.* Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle empty config and search parent dirs for config.example.yaml Address Copilot review comments on PR #1131: - Guard against yaml.safe_load() returning None for empty config files - Search parent directories for config.example.yaml instead of only looking next to config.yaml, fixing detection in common setups Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct skills root path depth and config_version type coercion - loader.py: fix get_skills_root_path() to use 5 parent levels (was 3) after harness split, file lives at packages/harness/deerflow/skills/ so parent×3 resolved to backend/packages/harness/ instead of backend/ - app_config.py: coerce config_version to int() before comparison in _check_config_version() to prevent TypeError when YAML stores value as string (e.g. 
config_version: "1") - tests: add regression tests for both fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: update test imports from src.* to deerflow.*/app.* after harness refactor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(harness): add tool-first ACP agent invocation (#37) * feat(harness): add tool-first ACP agent invocation * build(harness): make ACP dependency required * fix(harness): address ACP review feedback * feat(harness): decouple ACP agent workspace from thread data ACP agents (codex, claude-code) previously used per-thread workspace directories, causing path resolution complexity and coupling task execution to DeerFlow's internal thread data layout. This change: - Replace _resolve_cwd() with a fixed _get_work_dir() that always uses {base_dir}/acp-workspace/, eliminating virtual path translation and thread_id lookups - Introduce /mnt/acp-workspace virtual path for lead agent read-only access to ACP agent output files (same pattern as /mnt/skills) - Add security guards: read-only validation, path traversal prevention, command path allowlisting, and output masking for acp-workspace - Update system prompt and tool description to guide LLM: send self-contained tasks to ACP agents, copy results via /mnt/acp-workspace - Add 11 new security tests for ACP workspace path handling Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(prompt): inject ACP section only when ACP agents are configured The ACP agent guidance in the system prompt is now conditionally built by _build_acp_section(), which checks get_acp_agents() and returns an empty string when no ACP agents are configured. This avoids polluting the prompt with irrelevant instructions for users who don't use ACP. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix lint * fix(harness): address Copilot review comments on sandbox path handling and ACP tool - local_sandbox: fix path-segment boundary bug in _resolve_path (== or startswith +"/") and add lookahead in _resolve_paths_in_command regex to prevent /mnt/skills matching inside /mnt/skills-extra - local_sandbox_provider: replace print() with logger.warning(..., exc_info=True) - invoke_acp_agent_tool: guard getattr(option, "optionId") with None default + continue; move full prompt from INFO to DEBUG level (truncated to 200 chars) - sandbox/tools: fix _get_acp_workspace_host_path docstring to match implementation; remove misleading "read-only" language from validate_local_bash_command_paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(acp): thread-isolated workspaces, permission guardrail, and ContextVar registry P1.1 – ACP workspace thread isolation - Add `Paths.acp_workspace_dir(thread_id)` for per-thread paths - `_get_work_dir(thread_id)` in invoke_acp_agent_tool now uses `{base_dir}/threads/{thread_id}/acp-workspace/`; falls back to global workspace when thread_id is absent or invalid - `_invoke` extracts thread_id from `RunnableConfig` via `Annotated[RunnableConfig, InjectedToolArg]` - `sandbox/tools.py`: `_get_acp_workspace_host_path(thread_id)`, `_resolve_acp_workspace_path(path, thread_id)`, and all callers (`replace_virtual_paths_in_command`, `mask_local_paths_in_output`, `ls_tool`, `read_file_tool`) now resolve ACP paths per-thread P1.2 – ACP permission guardrail - New `auto_approve_permissions: bool = False` field in `ACPAgentConfig` - `_build_permission_response(options, *, auto_approve: bool)` now defaults to deny; only approves when `auto_approve=True` - Document field in `config.example.yaml` P2 – Deferred tool registry race condition - Replace module-level `_registry` global with `contextvars.ContextVar` - Each asyncio request context gets its own registry; worker threads inherit 
the context automatically via `loop.run_in_executor` - Expose `get_deferred_registry` / `set_deferred_registry` / `reset_deferred_registry` helpers Tests: 831 pass (57 for affected modules, 3 new tests) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(sandbox): mount /mnt/acp-workspace in docker sandbox container The AioSandboxProvider was not mounting the ACP workspace into the sandbox container, so /mnt/acp-workspace was inaccessible when the lead agent tried to read ACP results in docker mode. Changes: - `ensure_thread_dirs`: also create `acp-workspace/` (chmod 0o777) so the directory exists before the sandbox container starts — required for Docker volume mounts - `_get_thread_mounts`: add read-only `/mnt/acp-workspace` mount using the per-thread host path (`host_paths.acp_workspace_dir(thread_id)`) - Update stale CLAUDE.md description (was "fixed global workspace") Tests: `test_aio_sandbox_provider.py` (4 new tests) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(lint): remove unused imports in test_aio_sandbox_provider Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix config --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
769 lines
31 KiB
Python
"""End-to-end tests for DeerFlowClient.

Middle tier of the test pyramid:
- Top: test_client_live.py - real LLM, needs API key
- Middle: test_client_e2e.py - real LLM + real modules ← THIS FILE
- Bottom: test_client.py - unit tests, mock everything

Core principle: use the real LLM from config.yaml, let config, middleware
chain, tool registration, file I/O, and event serialization all run for real.
Only DEER_FLOW_HOME is redirected to tmp_path for filesystem isolation.

Tests that call the LLM are marked ``requires_llm`` and skipped in CI.
File-management tests (upload/list/delete) don't need LLM and run everywhere.
"""

import json
import os
import uuid
import zipfile

import pytest
from dotenv import load_dotenv

from deerflow.client import DeerFlowClient, StreamEvent
from deerflow.config.app_config import AppConfig
from deerflow.config.model_config import ModelConfig
from deerflow.config.sandbox_config import SandboxConfig

# Load .env from project root (for OPENAI_API_KEY etc.)
load_dotenv(os.path.join(os.path.dirname(__file__), "../../.env"))

# ---------------------------------------------------------------------------
# Markers
# ---------------------------------------------------------------------------

requires_llm = pytest.mark.skipif(
    os.getenv("CI", "").lower() in ("true", "1") or not os.getenv("OPENAI_API_KEY"),
    reason="Requires LLM API key - skipped in CI or when OPENAI_API_KEY is unset",
)


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def _make_e2e_config() -> AppConfig:
    """Build a minimal AppConfig using real LLM credentials from environment.

    All LLM connection details come from environment variables so that both
    internal CI and external contributors can run the tests:

    - ``E2E_MODEL_NAME`` (default: ``volcengine-ark``)
    - ``E2E_MODEL_USE`` (default: ``langchain_openai:ChatOpenAI``)
    - ``E2E_MODEL_ID`` (default: ``ep-20251211175242-llcmh``)
    - ``E2E_BASE_URL`` (default: ``https://ark-cn-beijing.bytedance.net/api/v3``)
    - ``OPENAI_API_KEY`` (required for LLM tests)
    """
    return AppConfig(
        models=[
            ModelConfig(
                name=os.getenv("E2E_MODEL_NAME", "volcengine-ark"),
                display_name="E2E Test Model",
                use=os.getenv("E2E_MODEL_USE", "langchain_openai:ChatOpenAI"),
                model=os.getenv("E2E_MODEL_ID", "ep-20251211175242-llcmh"),
                base_url=os.getenv("E2E_BASE_URL", "https://ark-cn-beijing.bytedance.net/api/v3"),
                api_key=os.getenv("OPENAI_API_KEY", ""),
                max_tokens=512,
                temperature=0.7,
                supports_thinking=False,
                supports_reasoning_effort=False,
                supports_vision=False,
            )
        ],
        sandbox=SandboxConfig(use="deerflow.sandbox.local:LocalSandboxProvider"),
    )


# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------


@pytest.fixture()
def e2e_env(tmp_path, monkeypatch):
    """Isolated filesystem environment for E2E tests.

    - DEER_FLOW_HOME → tmp_path (all thread data lands in a temp dir)
    - Singletons reset so they pick up the new env
    - Title/memory/summarization disabled to avoid extra LLM calls
    - AppConfig built programmatically (avoids config.yaml param-name issues)
    """
    # 1. Filesystem isolation
    monkeypatch.setenv("DEER_FLOW_HOME", str(tmp_path))
    monkeypatch.setattr("deerflow.config.paths._paths", None)
    monkeypatch.setattr("deerflow.sandbox.sandbox_provider._default_sandbox_provider", None)

    # 2. Inject a clean AppConfig via the global singleton.
    config = _make_e2e_config()
    monkeypatch.setattr("deerflow.config.app_config._app_config", config)
    monkeypatch.setattr("deerflow.config.app_config._app_config_is_custom", True)

    # 3. Disable title generation (extra LLM call, non-deterministic)
    from deerflow.config.title_config import TitleConfig

    monkeypatch.setattr("deerflow.config.title_config._title_config", TitleConfig(enabled=False))

    # 4. Disable memory queueing (avoids background threads & file writes)
    from deerflow.config.memory_config import MemoryConfig

    monkeypatch.setattr(
        "deerflow.agents.middlewares.memory_middleware.get_memory_config",
        lambda: MemoryConfig(enabled=False),
    )

    # 5. Ensure summarization is off (default, but be explicit)
    from deerflow.config.summarization_config import SummarizationConfig

    monkeypatch.setattr("deerflow.config.summarization_config._summarization_config", SummarizationConfig(enabled=False))

    # 6. Exclude TitleMiddleware from the chain.
    # It triggers an extra LLM call to generate a thread title, which adds
    # non-determinism and cost to E2E tests (title generation is already
    # disabled via TitleConfig above, but the middleware still participates
    # in the chain and can interfere with event ordering).
    from deerflow.agents.lead_agent.agent import _build_middlewares as _original_build_middlewares
    from deerflow.agents.middlewares.title_middleware import TitleMiddleware

    def _sync_safe_build_middlewares(*args, **kwargs):
        mws = _original_build_middlewares(*args, **kwargs)
        return [m for m in mws if not isinstance(m, TitleMiddleware)]

    monkeypatch.setattr("deerflow.client._build_middlewares", _sync_safe_build_middlewares)

    return {"tmp_path": tmp_path}


@pytest.fixture()
def client(e2e_env):
    """A DeerFlowClient wired to the isolated e2e_env."""
    return DeerFlowClient(checkpointer=None, thinking_enabled=False)


# ---------------------------------------------------------------------------
# Step 2: Basic streaming (requires LLM)
# ---------------------------------------------------------------------------


class TestBasicChat:
    """Basic chat and streaming behavior with real LLM."""

    @requires_llm
    def test_basic_chat(self, client):
        """chat() returns a non-empty text response."""
        result = client.chat("Say exactly: pong")
        assert isinstance(result, str)
        assert len(result) > 0

    @requires_llm
    def test_stream_event_sequence(self, client):
        """stream() yields events: messages-tuple, values, and end."""
        events = list(client.stream("Say hi"))

        types = [e.type for e in events]
        assert types[-1] == "end"
        assert "messages-tuple" in types
        assert "values" in types

    @requires_llm
    def test_stream_event_data_format(self, client):
        """Each event type has the expected data structure."""
        events = list(client.stream("Say hello"))

        for event in events:
            assert isinstance(event, StreamEvent)
            assert isinstance(event.type, str)
            assert isinstance(event.data, dict)

            if event.type == "messages-tuple" and event.data.get("type") == "ai":
                assert "content" in event.data
                assert "id" in event.data
            elif event.type == "values":
                assert "messages" in event.data
                assert "artifacts" in event.data
            elif event.type == "end":
                assert event.data == {}

    @requires_llm
    def test_multi_turn_stateless(self, client):
        """Without checkpointer, two calls to the same thread_id are independent."""
        tid = str(uuid.uuid4())

        r1 = client.chat("Remember the number 42", thread_id=tid)
        # Reset so agent is recreated (simulates no cross-turn state)
        client.reset_agent()
        r2 = client.chat("What number did I say?", thread_id=tid)

        # Without a checkpointer the second call has no memory of the first.
        # We can't assert exact content, but both should be non-empty.
        assert isinstance(r1, str) and len(r1) > 0
        assert isinstance(r2, str) and len(r2) > 0


# ---------------------------------------------------------------------------
# Step 3: Tool call flow (requires LLM)
# ---------------------------------------------------------------------------


class TestToolCallFlow:
    """Verify the LLM actually invokes tools through the real agent pipeline."""

    @requires_llm
    def test_tool_call_produces_events(self, client):
        """When the LLM decides to use a tool, we see tool call + result events."""
        # Give a clear instruction that forces a tool call
        events = list(client.stream("Use the bash tool to run: echo hello_e2e_test"))

        types = [e.type for e in events]
        assert types[-1] == "end"

        # Should have at least one tool call event
        tool_call_events = [e for e in events if e.type == "messages-tuple" and e.data.get("tool_calls")]
        tool_result_events = [e for e in events if e.type == "messages-tuple" and e.data.get("type") == "tool"]
        assert len(tool_call_events) >= 1, "Expected at least one tool_call event"
        assert len(tool_result_events) >= 1, "Expected at least one tool result event"

    @requires_llm
    def test_tool_call_event_structure(self, client):
        """Tool call events contain name, args, and id fields."""
        events = list(client.stream("Use the read_file tool to read /mnt/user-data/workspace/nonexistent.txt"))

        tc_events = [e for e in events if e.type == "messages-tuple" and e.data.get("tool_calls")]
        if tc_events:
            tc = tc_events[0].data["tool_calls"][0]
            assert "name" in tc
            assert "args" in tc
            assert "id" in tc


# ---------------------------------------------------------------------------
# Step 4: File upload integration (no LLM needed for most)
# ---------------------------------------------------------------------------


class TestFileUploadIntegration:
    """Upload, list, and delete files through the real client path."""

    def test_upload_files(self, e2e_env, tmp_path):
        """upload_files() copies files and returns metadata."""
        test_file = tmp_path / "source" / "readme.txt"
        test_file.parent.mkdir(parents=True, exist_ok=True)
        test_file.write_text("Hello world")

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())

        result = c.upload_files(tid, [test_file])
        assert result["success"] is True
        assert len(result["files"]) == 1
        assert result["files"][0]["filename"] == "readme.txt"

        # Physically exists
        from deerflow.config.paths import get_paths

        assert (get_paths().sandbox_uploads_dir(tid) / "readme.txt").exists()

    def test_upload_duplicate_rename(self, e2e_env, tmp_path):
        """Uploading two files with the same name auto-renames the second."""
        d1 = tmp_path / "dir1"
        d2 = tmp_path / "dir2"
        d1.mkdir()
        d2.mkdir()
        (d1 / "data.txt").write_text("content A")
        (d2 / "data.txt").write_text("content B")

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())

        result = c.upload_files(tid, [d1 / "data.txt", d2 / "data.txt"])
        assert result["success"] is True
        assert len(result["files"]) == 2

        filenames = {f["filename"] for f in result["files"]}
        assert "data.txt" in filenames
        assert "data_1.txt" in filenames

    def test_upload_list_and_delete(self, e2e_env, tmp_path):
        """Upload → list → delete → list lifecycle."""
        test_file = tmp_path / "lifecycle.txt"
        test_file.write_text("lifecycle test")

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())

        c.upload_files(tid, [test_file])

        listing = c.list_uploads(tid)
        assert listing["count"] == 1
        assert listing["files"][0]["filename"] == "lifecycle.txt"

        del_result = c.delete_upload(tid, "lifecycle.txt")
        assert del_result["success"] is True

        listing = c.list_uploads(tid)
        assert listing["count"] == 0

    @requires_llm
    def test_upload_then_chat(self, e2e_env, tmp_path):
        """Upload a file then ask the LLM about it; UploadsMiddleware injects file info."""
        test_file = tmp_path / "source" / "notes.txt"
        test_file.parent.mkdir(parents=True, exist_ok=True)
        test_file.write_text("The secret code is 7749.")

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())

        c.upload_files(tid, [test_file])
        # Chat: the middleware should inject <uploaded_files> context
        response = c.chat("What files are available?", thread_id=tid)
        assert isinstance(response, str) and len(response) > 0


# ---------------------------------------------------------------------------
# Step 5: Lifecycle and configuration (no LLM needed for most)
# ---------------------------------------------------------------------------


class TestLifecycleAndConfig:
    """Agent recreation and configuration behavior."""

    @requires_llm
    def test_agent_recreation_on_config_change(self, client):
        """Changing thinking_enabled triggers agent recreation (different config key)."""
        list(client.stream("hi"))
        key1 = client._agent_config_key

        # Stream with a different config override
        client.reset_agent()
        list(client.stream("hi", thinking_enabled=True))
        key2 = client._agent_config_key

        # thinking_enabled changed: False → True → keys differ
        assert key1 != key2

    def test_reset_agent_clears_state(self, e2e_env):
        """reset_agent() sets the internal agent to None."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        # Before any call, agent is None
        assert c._agent is None

        c.reset_agent()
        assert c._agent is None
        assert c._agent_config_key is None

    def test_plan_mode_config_key(self, e2e_env):
        """plan_mode is part of the config key tuple."""
        c = DeerFlowClient(checkpointer=None, plan_mode=False)
        cfg1 = c._get_runnable_config("test-thread")
        key1 = (
            cfg1["configurable"]["model_name"],
            cfg1["configurable"]["thinking_enabled"],
            cfg1["configurable"]["is_plan_mode"],
            cfg1["configurable"]["subagent_enabled"],
        )

        c2 = DeerFlowClient(checkpointer=None, plan_mode=True)
        cfg2 = c2._get_runnable_config("test-thread")
        key2 = (
            cfg2["configurable"]["model_name"],
            cfg2["configurable"]["thinking_enabled"],
            cfg2["configurable"]["is_plan_mode"],
            cfg2["configurable"]["subagent_enabled"],
        )

        assert key1 != key2
        assert key1[2] is False
        assert key2[2] is True


# ---------------------------------------------------------------------------
# Step 6: Middleware chain verification (requires LLM)
# ---------------------------------------------------------------------------


class TestMiddlewareChain:
    """Verify middleware side effects through real execution."""

    @requires_llm
    def test_thread_data_paths_in_state(self, client):
        """After streaming, thread directory paths are computed correctly."""
        tid = str(uuid.uuid4())
        events = list(client.stream("hi", thread_id=tid))

        # The values event should contain messages
        values_events = [e for e in events if e.type == "values"]
        assert len(values_events) >= 1

        # ThreadDataMiddleware should have set paths in the state.
        # We verify the paths singleton can resolve the thread dir.
        from deerflow.config.paths import get_paths

        thread_dir = get_paths().thread_dir(tid)
        assert str(thread_dir).endswith(tid)

    @requires_llm
    def test_stream_completes_without_middleware_errors(self, client):
        """Full middleware chain (ThreadData, Uploads, Sandbox, DanglingToolCall,
        Memory, Clarification) executes without errors."""
        events = list(client.stream("What is 1+1?"))

        types = [e.type for e in events]
        assert types[-1] == "end"
        # Should have at least one AI response
        ai_events = [e for e in events if e.type == "messages-tuple" and e.data.get("type") == "ai"]
        assert len(ai_events) >= 1


# ---------------------------------------------------------------------------
# Step 7: Error and boundary conditions
# ---------------------------------------------------------------------------


class TestErrorAndBoundary:
    """Error propagation and edge cases."""

    def test_upload_nonexistent_file_raises(self, e2e_env):
        """Uploading a file that doesn't exist raises FileNotFoundError."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(FileNotFoundError):
            c.upload_files("test-thread", ["/nonexistent/file.txt"])

    def test_delete_nonexistent_upload_raises(self, e2e_env):
        """Deleting a file that doesn't exist raises FileNotFoundError."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())
        # Ensure the uploads dir exists first
        c.list_uploads(tid)
        with pytest.raises(FileNotFoundError):
            c.delete_upload(tid, "ghost.txt")

    def test_artifact_path_traversal_blocked(self, e2e_env):
        """get_artifact blocks path traversal attempts."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(ValueError):
            c.get_artifact("test-thread", "../../etc/passwd")

    def test_upload_directory_rejected(self, e2e_env, tmp_path):
        """Uploading a directory (not a file) is rejected."""
        d = tmp_path / "a_directory"
        d.mkdir()
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(ValueError, match="not a file"):
            c.upload_files("test-thread", [d])

    @requires_llm
    def test_empty_message_still_gets_response(self, client):
        """Even an empty-ish message should produce a valid event stream."""
        events = list(client.stream(" "))
        types = [e.type for e in events]
        assert types[-1] == "end"


# ---------------------------------------------------------------------------
# Step 8: Artifact access (no LLM needed)
# ---------------------------------------------------------------------------


class TestArtifactAccess:
    """Read artifacts through get_artifact() with real filesystem."""

    def test_get_artifact_happy_path(self, e2e_env):
        """Write a file to outputs, then read it back via get_artifact()."""
        from deerflow.config.paths import get_paths

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())

        # Create an output file in the thread's outputs directory
        outputs_dir = get_paths().sandbox_outputs_dir(tid)
        outputs_dir.mkdir(parents=True, exist_ok=True)
        (outputs_dir / "result.txt").write_text("hello artifact")

        data, mime = c.get_artifact(tid, "mnt/user-data/outputs/result.txt")
        assert data == b"hello artifact"
        assert "text" in mime

    def test_get_artifact_nested_path(self, e2e_env):
        """Artifacts in subdirectories are accessible."""
        from deerflow.config.paths import get_paths

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())

        outputs_dir = get_paths().sandbox_outputs_dir(tid)
        sub = outputs_dir / "charts"
        sub.mkdir(parents=True, exist_ok=True)
        (sub / "data.json").write_text('{"x": 1}')

        data, mime = c.get_artifact(tid, "mnt/user-data/outputs/charts/data.json")
        assert b'"x"' in data
        assert "json" in mime

    def test_get_artifact_nonexistent_raises(self, e2e_env):
        """Reading a nonexistent artifact raises FileNotFoundError."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(FileNotFoundError):
            c.get_artifact("test-thread", "mnt/user-data/outputs/ghost.txt")

    def test_get_artifact_traversal_within_prefix_blocked(self, e2e_env):
        """Path traversal within the valid prefix is still blocked."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises((PermissionError, ValueError, FileNotFoundError)):
            c.get_artifact("test-thread", "mnt/user-data/outputs/../../etc/passwd")


# ---------------------------------------------------------------------------
|
|
# Step 9: Skill installation (no LLM needed)
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
class TestSkillInstallation:
|
|
"""install_skill() with real ZIP handling and filesystem."""
|
|
|
|
@pytest.fixture(autouse=True)
|
|
def _isolate_skills_dir(self, tmp_path, monkeypatch):
|
|
"""Redirect skill installation to a temp directory."""
|
|
skills_root = tmp_path / "skills"
|
|
(skills_root / "public").mkdir(parents=True)
|
|
(skills_root / "custom").mkdir(parents=True)
|
|
monkeypatch.setattr(
|
|
"deerflow.skills.installer.get_skills_root_path",
|
|
lambda: skills_root,
|
|
)
|
|
self._skills_root = skills_root
|
|
|
|
@staticmethod
|
|
def _make_skill_zip(tmp_path, skill_name="test-e2e-skill"):
|
|
"""Create a minimal valid .skill archive."""
|
|
skill_dir = tmp_path / "build" / skill_name
|
|
skill_dir.mkdir(parents=True)
|
|
(skill_dir / "SKILL.md").write_text(f"---\nname: {skill_name}\ndescription: E2E test skill\n---\n\nTest content.\n")
|
|
archive_path = tmp_path / f"{skill_name}.skill"
|
|
with zipfile.ZipFile(archive_path, "w") as zf:
|
|
for file in skill_dir.rglob("*"):
|
|
zf.write(file, file.relative_to(tmp_path / "build"))
|
|
return archive_path
|
|
|
|
def test_install_skill_success(self, e2e_env, tmp_path):
|
|
"""A valid .skill archive installs to the custom skills directory."""
|
|
archive = self._make_skill_zip(tmp_path)
|
|
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
|
|
|
result = c.install_skill(archive)
|
|
assert result["success"] is True
|
|
assert result["skill_name"] == "test-e2e-skill"
|
|
assert (self._skills_root / "custom" / "test-e2e-skill" / "SKILL.md").exists()
|
|
|
|
def test_install_skill_duplicate_rejected(self, e2e_env, tmp_path):
|
|
"""Installing the same skill twice raises ValueError."""
|
|
archive = self._make_skill_zip(tmp_path)
|
|
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
|
|
|
c.install_skill(archive)
|
|
with pytest.raises(ValueError, match="already exists"):
|
|
c.install_skill(archive)
|
|
|
|
def test_install_skill_invalid_extension(self, e2e_env, tmp_path):
|
|
"""A file without .skill extension is rejected."""
|
|
bad_file = tmp_path / "not_a_skill.zip"
|
|
bad_file.write_bytes(b"PK\x03\x04") # ZIP magic bytes
|
|
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
|
with pytest.raises(ValueError, match=".skill extension"):
|
|
c.install_skill(bad_file)
|
|
|
|
    def test_install_skill_missing_frontmatter(self, e2e_env, tmp_path):
        """A .skill archive without valid SKILL.md frontmatter is rejected."""
        skill_dir = tmp_path / "build" / "bad-skill"
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text("No frontmatter here.")

        archive = tmp_path / "bad-skill.skill"
        with zipfile.ZipFile(archive, "w") as zf:
            for file in skill_dir.rglob("*"):
                zf.write(file, file.relative_to(tmp_path / "build"))

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(ValueError, match="Invalid skill"):
            c.install_skill(archive)

    def test_install_skill_nonexistent_file(self, e2e_env):
        """Installing from a nonexistent path raises FileNotFoundError."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(FileNotFoundError):
            c.install_skill("/nonexistent/skill.skill")


# ---------------------------------------------------------------------------
# Step 10: Configuration management (no LLM needed)
# ---------------------------------------------------------------------------


class TestConfigManagement:
    """Config queries and updates through real code paths."""

    def test_list_models_returns_injected_config(self, e2e_env):
        """list_models() returns the model from the injected AppConfig."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.list_models()
        assert "models" in result
        assert len(result["models"]) == 1
        assert result["models"][0]["name"] == "volcengine-ark"
        assert result["models"][0]["display_name"] == "E2E Test Model"

    def test_get_model_found(self, e2e_env):
        """get_model() returns the model when it exists."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        model = c.get_model("volcengine-ark")
        assert model is not None
        assert model["name"] == "volcengine-ark"
        assert model["supports_thinking"] is False

    def test_get_model_not_found(self, e2e_env):
        """get_model() returns None for a nonexistent model."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        assert c.get_model("nonexistent-model") is None

    def test_list_skills_returns_list(self, e2e_env):
        """list_skills() returns a dict with 'skills' key from real directory scan."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.list_skills()
        assert "skills" in result
        assert isinstance(result["skills"], list)
        # The real skills/ directory should have some public skills
        assert len(result["skills"]) > 0

    def test_get_skill_found(self, e2e_env):
        """get_skill() returns skill info for a known public skill."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        # 'deep-research' is a built-in public skill
        skill = c.get_skill("deep-research")
        # The field checks run only when the skill is present in this checkout;
        # a missing skill does not fail the test.
        if skill is not None:
            assert skill["name"] == "deep-research"
            assert "description" in skill
            assert "enabled" in skill

    def test_get_skill_not_found(self, e2e_env):
        """get_skill() returns None for a nonexistent skill."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        assert c.get_skill("nonexistent-skill-xyz") is None

    def test_get_mcp_config_returns_dict(self, e2e_env):
        """get_mcp_config() returns a dict with 'mcp_servers' key."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.get_mcp_config()
        assert "mcp_servers" in result
        assert isinstance(result["mcp_servers"], dict)

    def test_update_mcp_config_writes_and_invalidates(self, e2e_env, tmp_path, monkeypatch):
        """update_mcp_config() writes extensions_config.json and invalidates the agent."""
        # Set up a writable extensions_config.json
        config_file = tmp_path / "extensions_config.json"
        config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
        monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))

        # Force reload so the singleton picks up our test file
        from deerflow.config.extensions_config import reload_extensions_config

        reload_extensions_config()

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        # Simulate a cached agent
        c._agent = "fake-agent-placeholder"
        c._agent_config_key = ("a", "b", "c", "d")

        result = c.update_mcp_config({"test-server": {"enabled": True, "type": "stdio", "command": "echo"}})
        assert "mcp_servers" in result

        # Agent should be invalidated
        assert c._agent is None
        assert c._agent_config_key is None

        # File should be written
        written = json.loads(config_file.read_text())
        assert "test-server" in written["mcpServers"]

    def test_update_skill_writes_and_invalidates(self, e2e_env, tmp_path, monkeypatch):
        """update_skill() writes extensions_config.json and invalidates the agent."""
        config_file = tmp_path / "extensions_config.json"
        config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
        monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))

        from deerflow.config.extensions_config import reload_extensions_config

        reload_extensions_config()

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        c._agent = "fake-agent-placeholder"
        c._agent_config_key = ("a", "b", "c", "d")

        # Use a real skill name from the public skills directory
        skills = c.list_skills()
        if not skills["skills"]:
            pytest.skip("No skills available for testing")
        skill_name = skills["skills"][0]["name"]

        result = c.update_skill(skill_name, enabled=False)
        assert result["name"] == skill_name
        assert result["enabled"] is False

        # Agent should be invalidated
        assert c._agent is None
        assert c._agent_config_key is None

    def test_update_skill_nonexistent_raises(self, e2e_env, tmp_path, monkeypatch):
        """update_skill() raises ValueError for a nonexistent skill."""
        config_file = tmp_path / "extensions_config.json"
        config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
        monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))

        from deerflow.config.extensions_config import reload_extensions_config

        reload_extensions_config()

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(ValueError, match="not found"):
            c.update_skill("nonexistent-skill-xyz", enabled=True)


# ---------------------------------------------------------------------------
# Step 11: Memory access (no LLM needed)
# ---------------------------------------------------------------------------


class TestMemoryAccess:
    """Memory system queries through real code paths."""

    def test_get_memory_returns_dict(self, e2e_env):
        """get_memory() returns a dict (possibly the empty initial state)."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.get_memory()
        assert isinstance(result, dict)

    def test_reload_memory_returns_dict(self, e2e_env):
        """reload_memory() forces a reload and returns a dict."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.reload_memory()
        assert isinstance(result, dict)

    def test_get_memory_config_fields(self, e2e_env):
        """get_memory_config() returns the expected config fields."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.get_memory_config()
        assert "enabled" in result
        assert "storage_path" in result
        assert "debounce_seconds" in result
        assert "max_facts" in result
        assert "fact_confidence_threshold" in result
        assert "injection_enabled" in result
        assert "max_injection_tokens" in result

    def test_get_memory_status_combines_config_and_data(self, e2e_env):
        """get_memory_status() returns both 'config' and 'data' keys."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.get_memory_status()
        assert "config" in result
        assert "data" in result
        assert "enabled" in result["config"]
        assert isinstance(result["data"], dict)