Mirror of https://gitee.com/wanwujie/deer-flow (synced 2026-04-03 06:12:14 +08:00)
refactor: extract shared skill installer and upload manager to harness (#1202)
* refactor: extract shared skill installer and upload manager to harness

  Move duplicated business logic from Gateway routers and Client into shared harness modules, eliminating code duplication.

  New shared modules:
  - deerflow.skills.installer: 6 functions (zip security, extraction, install)
  - deerflow.uploads.manager: 7 functions (normalize, deduplicate, validate, list, delete, get_uploads_dir, ensure_uploads_dir)

  Key improvements:
  - SkillAlreadyExistsError replaces stringly-typed 409 status routing
  - normalize_filename rejects backslash-containing filenames
  - Read paths (list/delete) no longer mkdir via get_uploads_dir
  - Write paths use ensure_uploads_dir for explicit directory creation
  - list_files_in_dir stats inside the scandir context (no re-stat)
  - install_skill_from_archive uses a single is_file() check (one syscall)
  - Fix agent config key not being reset on update_mcp_config/update_skill

  Tests: 42 new (22 installer + 20 upload manager) + client hardening

* refactor: centralize upload URL construction and clean up installer

  - Extract upload_virtual_path(), upload_artifact_url(), enrich_file_listing() into shared manager.py, eliminating 6 duplicated URL constructions across the Gateway router and Client
  - Derive all upload URLs from the VIRTUAL_PATH_PREFIX constant instead of hardcoded "mnt/user-data/uploads" strings
  - Eliminate TOCTOU pre-checks and a double file read in the installer — a single ZipFile() open with exception handling replaces the is_file() + is_zipfile() + ZipFile() sequence
  - Add missing re-exports: ensure_uploads_dir in uploads/__init__.py, SkillAlreadyExistsError in skills/__init__.py
  - Remove redundant .lower() on already-lowercase CONVERTIBLE_EXTENSIONS
  - Hoist sandbox_uploads_dir(thread_id) out of the loop in the uploads router

* fix: add input validation for thread_id and filename length

  - Reject thread_id containing unsafe filesystem characters (only alphanumerics, hyphens, underscores, and dots are allowed) — prevents a 500 on inputs like <script> or shell metacharacters
  - Reject filenames longer than 255 bytes (OS limit) in normalize_filename
  - Gateway upload router maps ValueError to 400 for an invalid thread_id

* fix: address PR review — symlink safety, input validation coverage, error ordering

  - list_files_in_dir: use follow_symlinks=False to prevent symlink metadata leakage; check is_dir() instead of exists() for non-directory paths
  - install_skill_from_archive: restore the is_file() pre-check before extension validation so error messages match the documented exception contract
  - validate_thread_id: move from ensure_uploads_dir to get_uploads_dir so all entry points (upload/list/delete) are protected
  - delete_uploaded_file: catch ValueError from thread_id validation (was a 500)
  - requires_llm marker: also skip when OPENAI_API_KEY is unset
  - e2e fixture: update the TitleMiddleware exclusion comment (kept the filtering — the middleware triggers extra LLM calls that add non-determinism to tests)

* chore: revert uv.lock to main — no dependency changes in this PR

* fix: use monkeypatch for global config in the e2e fixture to prevent test pollution

  The e2e_env fixture was calling set_title_config() and set_summarization_config() directly, which mutated global singletons without automatic cleanup. When pytest ran test_client_e2e.py before test_title_middleware_core_logic.py, the leaked enabled=False caused 5 title tests to fail in CI. Switched to monkeypatch.setattr on the module-level private variables so pytest restores the originals after each test.

* fix: address code review — URL encoding, API consistency, test isolation

  - upload_artifact_url: percent-encode the filename to handle spaces/#/?
  - deduplicate_filename: mutate the seen set in place (the caller no longer needs a manual .add() — a less error-prone API)
  - list_files_in_dir: document that size is an int; enrich stringifies it
  - e2e fixture: monkeypatch _app_config instead of calling set_app_config() to prevent global singleton pollution (same pattern as the title/summarization fix)
  - _make_e2e_config: read LLM connection details from env vars so external contributors can override the defaults
  - Update tests to match the new deduplicate_filename contract

* docs: rewrite the RFC in English and add alternatives/breaking-changes sections

* fix: address code review feedback on PR #1202

  - Rename deduplicate_filename to claim_unique_filename to make the in-place set mutation explicit in the function name
  - Replace PermissionError with PathTraversalError(ValueError) for path traversal detection — malformed input is a 400, not a 403

* fix: set _app_config_is_custom in the e2e test fixture to prevent the config.yaml lookup in CI

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
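The input-validation rules described in the commit message (a thread_id character allowlist, backslash rejection, a 255-byte filename cap, and PathTraversalError as a ValueError subclass) can be sketched as below. The function names come from the commit message itself; the bodies are assumptions for illustration, not the actual deerflow.uploads.manager implementation:

```python
import re

# Allowlist from the commit message: alphanumerics, hyphens, underscores, dots.
_THREAD_ID_RE = re.compile(r"^[A-Za-z0-9._-]+$")


class PathTraversalError(ValueError):
    """Path traversal is malformed client input: maps to HTTP 400, not 403."""


def validate_thread_id(thread_id: str) -> str:
    # Reject anything with unsafe filesystem characters (e.g. <script>,
    # shell metacharacters) so they surface as 400s, not 500s.
    if not _THREAD_ID_RE.match(thread_id):
        raise ValueError(f"invalid thread_id: {thread_id!r}")
    return thread_id


def normalize_filename(filename: str) -> str:
    # Backslash-containing filenames are rejected outright.
    if "\\" in filename:
        raise ValueError("filename must not contain backslashes")
    # 255 bytes is the common OS filename limit.
    if len(filename.encode("utf-8")) > 255:
        raise ValueError("filename exceeds 255 bytes")
    # '..' components are a traversal attempt, not a permissions problem.
    if ".." in filename.split("/"):
        raise PathTraversalError(f"unsafe filename: {filename!r}")
    return filename
```

Mapping ValueError (and its PathTraversalError subclass) to 400 at the router keeps the HTTP semantics in one place while the harness module stays transport-agnostic.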
@@ -9,7 +9,7 @@ from pathlib import Path
from unittest.mock import MagicMock, patch

import pytest
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage  # noqa: F401
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage  # noqa: F401

from app.gateway.routers.mcp import McpConfigResponse
from app.gateway.routers.memory import MemoryConfigResponse, MemoryStatusResponse
@@ -17,6 +17,8 @@ from app.gateway.routers.models import ModelResponse, ModelsListResponse
from app.gateway.routers.skills import SkillInstallResponse, SkillResponse, SkillsListResponse
from app.gateway.routers.uploads import UploadResponse
from deerflow.client import DeerFlowClient
from deerflow.config.paths import Paths
from deerflow.uploads.manager import PathTraversalError

# ---------------------------------------------------------------------------
# Fixtures
@@ -609,10 +611,7 @@ class TestSkillsManagement:
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)

with (
patch("deerflow.skills.loader.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.validation._validate_skill_frontmatter", return_value=(True, "OK", "my-skill")),
):
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
result = client.install_skill(archive_path)

assert result["success"] is True
@@ -700,7 +699,7 @@ class TestUploads:
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()

with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("thread-1", [src_file])

assert result["success"] is True
@@ -756,7 +755,7 @@ class TestUploads:
return client.upload_files("thread-async", [first, second])

with (
patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir),
patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
patch("deerflow.utils.file_conversion.CONVERTIBLE_EXTENSIONS", {".pdf"}),
patch("deerflow.utils.file_conversion.convert_file_to_markdown", side_effect=fake_convert),
patch("concurrent.futures.ThreadPoolExecutor", FakeExecutor),
@@ -777,7 +776,7 @@ class TestUploads:
(uploads_dir / "a.txt").write_text("a")
(uploads_dir / "b.txt").write_text("bb")

with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.list_uploads("thread-1")

assert result["count"] == 2
@@ -793,7 +792,7 @@ class TestUploads:
uploads_dir = Path(tmp)
(uploads_dir / "delete-me.txt").write_text("gone")

with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.delete_upload("thread-1", "delete-me.txt")

assert result["success"] is True
@@ -802,15 +801,15 @@ class TestUploads:

def test_delete_upload_not_found(self, client):
with tempfile.TemporaryDirectory() as tmp:
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=Path(tmp)):
with patch("deerflow.client.get_uploads_dir", return_value=Path(tmp)):
with pytest.raises(FileNotFoundError):
client.delete_upload("thread-1", "nope.txt")

def test_delete_upload_path_traversal(self, client):
with tempfile.TemporaryDirectory() as tmp:
uploads_dir = Path(tmp)
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with pytest.raises(PermissionError):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
with pytest.raises(PathTraversalError):
client.delete_upload("thread-1", "../../etc/passwd")

@@ -822,15 +821,12 @@ class TestUploads:
class TestArtifacts:
def test_get_artifact(self, client):
with tempfile.TemporaryDirectory() as tmp:
user_data_dir = Path(tmp) / "user-data"
outputs = user_data_dir / "outputs"
paths = Paths(base_dir=tmp)
outputs = paths.sandbox_outputs_dir("t1")
outputs.mkdir(parents=True)
(outputs / "result.txt").write_text("artifact content")

mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.return_value = user_data_dir

with patch("deerflow.client.get_paths", return_value=mock_paths):
with patch("deerflow.client.get_paths", return_value=paths):
content, mime = client.get_artifact("t1", "mnt/user-data/outputs/result.txt")

assert content == b"artifact content"
@@ -838,13 +834,10 @@ class TestArtifacts:

def test_get_artifact_not_found(self, client):
with tempfile.TemporaryDirectory() as tmp:
user_data_dir = Path(tmp) / "user-data"
user_data_dir.mkdir()
paths = Paths(base_dir=tmp)
paths.sandbox_user_data_dir("t1").mkdir(parents=True)

mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.return_value = user_data_dir

with patch("deerflow.client.get_paths", return_value=mock_paths):
with patch("deerflow.client.get_paths", return_value=paths):
with pytest.raises(FileNotFoundError):
client.get_artifact("t1", "mnt/user-data/outputs/nope.txt")

@@ -854,14 +847,11 @@ class TestArtifacts:

def test_get_artifact_path_traversal(self, client):
with tempfile.TemporaryDirectory() as tmp:
user_data_dir = Path(tmp) / "user-data"
user_data_dir.mkdir()
paths = Paths(base_dir=tmp)
paths.sandbox_user_data_dir("t1").mkdir(parents=True)

mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.return_value = user_data_dir

with patch("deerflow.client.get_paths", return_value=mock_paths):
with pytest.raises(PermissionError):
with patch("deerflow.client.get_paths", return_value=paths):
with pytest.raises(PathTraversalError):
client.get_artifact("t1", "mnt/user-data/../../../etc/passwd")
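The traversal tests here and in TestUploads expect PathTraversalError, a ValueError subclass per the commit message, rather than the old PermissionError. A minimal sketch of the resolve-and-compare guard such tests typically exercise; the helper name resolve_inside is hypothetical and not taken from the diff:

```python
from pathlib import Path


class PathTraversalError(ValueError):
    """Malformed path input: a client error (400), not a permission issue (403)."""


def resolve_inside(base_dir: Path, relative: str) -> Path:
    # Resolve the candidate against the base directory, then require that
    # the resolved result still lives under the resolved base. Any '..'
    # escape (e.g. "../../etc/passwd") fails this containment check.
    base = base_dir.resolve()
    candidate = (base_dir / relative).resolve()
    if candidate != base and base not in candidate.parents:
        raise PathTraversalError(f"path escapes base directory: {relative!r}")
    return candidate
```

Raising a ValueError subclass lets one `except ValueError` handler at the router map both malformed thread_ids and traversal attempts to HTTP 400.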
@@ -1013,7 +1003,7 @@ class TestScenarioFileLifecycle:
(tmp_path / "report.txt").write_text("quarterly report data")
(tmp_path / "data.csv").write_text("a,b,c\n1,2,3")

with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
# Step 1: Upload
result = client.upload_files(
"t-lifecycle",
@@ -1046,15 +1036,16 @@ class TestScenarioFileLifecycle:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
user_data_dir = tmp_path / "user-data"
outputs_dir = user_data_dir / "outputs"

paths = Paths(base_dir=tmp_path)
outputs_dir = paths.sandbox_outputs_dir("t-artifact")
outputs_dir.mkdir(parents=True)

# Upload phase
src_file = tmp_path / "input.txt"
src_file.write_text("raw data to process")

with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
uploaded = client.upload_files("t-artifact", [src_file])
assert len(uploaded["files"]) == 1

@@ -1062,10 +1053,7 @@ class TestScenarioFileLifecycle:
(outputs_dir / "analysis.json").write_text('{"result": "processed"}')

# Retrieve artifact
mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.return_value = user_data_dir

with patch("deerflow.client.get_paths", return_value=mock_paths):
with patch("deerflow.client.get_paths", return_value=paths):
content, mime = client.get_artifact("t-artifact", "mnt/user-data/outputs/analysis.json")

assert json.loads(content) == {"result": "processed"}
@@ -1286,7 +1274,7 @@ class TestScenarioThreadIsolation:
def get_dir(thread_id):
return uploads_a if thread_id == "thread-a" else uploads_b

with patch.object(DeerFlowClient, "_get_uploads_dir", side_effect=get_dir):
with patch("deerflow.client.get_uploads_dir", side_effect=get_dir), patch("deerflow.client.ensure_uploads_dir", side_effect=get_dir):
client.upload_files("thread-a", [src_file])

files_a = client.list_uploads("thread-a")
@@ -1298,18 +1286,13 @@ class TestScenarioThreadIsolation:
def test_artifacts_isolated_per_thread(self, client):
"""Artifacts in thread-A are not accessible from thread-B."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
paths = Paths(base_dir=tmp)
outputs_a = paths.sandbox_outputs_dir("thread-a")
outputs_a.mkdir(parents=True)
paths.sandbox_user_data_dir("thread-b").mkdir(parents=True)
(outputs_a / "result.txt").write_text("thread-a artifact")

data_a = tmp_path / "thread-a"
data_b = tmp_path / "thread-b"
(data_a / "outputs").mkdir(parents=True)
(data_b / "outputs").mkdir(parents=True)
(data_a / "outputs" / "result.txt").write_text("thread-a artifact")

mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.side_effect = lambda tid: data_a if tid == "thread-a" else data_b

with patch("deerflow.client.get_paths", return_value=mock_paths):
with patch("deerflow.client.get_paths", return_value=paths):
content, _ = client.get_artifact("thread-a", "mnt/user-data/outputs/result.txt")
assert content == b"thread-a artifact"

@@ -1377,10 +1360,7 @@ class TestScenarioSkillInstallAndUse:
(skills_root / "custom").mkdir(parents=True)

# Step 1: Install
with (
patch("deerflow.skills.loader.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.validation._validate_skill_frontmatter", return_value=(True, "OK", "my-analyzer")),
):
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
result = client.install_skill(archive)
assert result["success"] is True
assert (skills_root / "custom" / "my-analyzer" / "SKILL.md").exists()
@@ -1512,7 +1492,7 @@ class TestScenarioEdgeCases:
pdf_file.write_bytes(b"%PDF-1.4 fake content")

with (
patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir),
patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
patch("deerflow.utils.file_conversion.CONVERTIBLE_EXTENSIONS", {".pdf"}),
patch("deerflow.utils.file_conversion.convert_file_to_markdown", side_effect=Exception("conversion failed")),
):
@@ -1614,9 +1594,7 @@ class TestGatewayConformance:
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "my-skill/SKILL.md")

custom_dir = tmp_path / "custom"
custom_dir.mkdir()
with patch("deerflow.skills.loader.get_skills_root_path", return_value=tmp_path):
with patch("deerflow.skills.installer.get_skills_root_path", return_value=tmp_path):
result = client.install_skill(archive)

parsed = SkillInstallResponse(**result)
@@ -1680,7 +1658,7 @@ class TestGatewayConformance:
src_file = tmp_path / "hello.txt"
src_file.write_text("hello")

with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("t-conform", [src_file])

parsed = UploadResponse(**result)
@@ -1739,3 +1717,694 @@ class TestGatewayConformance:
parsed = MemoryStatusResponse(**result)
assert parsed.config.enabled is True
assert parsed.data.version == "1.0"


# ===========================================================================
# Hardening — install_skill security gates
# ===========================================================================


class TestInstallSkillSecurity:
"""Every security gate in install_skill() must have a red-line test."""

def test_zip_bomb_rejected(self, client):
"""Archives whose extracted size exceeds the limit are rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "bomb.skill"
# Create a small archive that claims huge uncompressed size.
# Write 200 bytes but the safe_extract checks cumulative file_size.
data = b"\x00" * 200
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
zf.writestr("big.bin", data)

skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)

# Patch max_total_size to a small value to trigger the bomb check.
from deerflow.skills import installer as _installer
orig = _installer.safe_extract_skill_archive

def patched_extract(zf, dest, max_total_size=100):
return orig(zf, dest, max_total_size=100)

with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer.safe_extract_skill_archive", side_effect=patched_extract),
):
with pytest.raises(ValueError, match="too large"):
client.install_skill(archive)

def test_absolute_path_in_archive_rejected(self, client):
"""ZIP entries with absolute paths are rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "abs.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.writestr("/etc/passwd", "root:x:0:0")

skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)

with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
with pytest.raises(ValueError, match="unsafe"):
client.install_skill(archive)

def test_dotdot_path_in_archive_rejected(self, client):
"""ZIP entries with '..' path components are rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "traversal.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.writestr("skill/../../../etc/shadow", "bad")

skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)

with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
with pytest.raises(ValueError, match="unsafe"):
client.install_skill(archive)

def test_symlinks_skipped_during_extraction(self, client):
"""Symlink entries in the archive are skipped (never written to disk)."""
import stat as stat_mod

with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)

archive = tmp_path / "sym-skill.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.writestr("sym-skill/SKILL.md", "---\nname: sym-skill\ndescription: test\n---\nBody")
# Inject a symlink entry via ZipInfo with Unix symlink mode.
link_info = zipfile.ZipInfo("sym-skill/sneaky_link")
link_info.external_attr = (stat_mod.S_IFLNK | 0o777) << 16
zf.writestr(link_info, "/etc/passwd")

skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)

with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
result = client.install_skill(archive)

assert result["success"] is True
installed = skills_root / "custom" / "sym-skill"
assert (installed / "SKILL.md").exists()
assert not (installed / "sneaky_link").exists()

def test_invalid_skill_name_rejected(self, client):
"""Skill names containing special characters are rejected."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)

skill_dir = tmp_path / "bad-name"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("---\nname: ../evil\ndescription: test\n---\n")

archive = tmp_path / "bad.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "bad-name/SKILL.md")

skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)

with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(True, "OK", "../evil")),
):
with pytest.raises(ValueError, match="Invalid skill name"):
client.install_skill(archive)

def test_existing_skill_rejected(self, client):
"""Installing a skill that already exists is rejected."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)

skill_dir = tmp_path / "dupe-skill"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("---\nname: dupe-skill\ndescription: test\n---\n")

archive = tmp_path / "dupe-skill.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "dupe-skill/SKILL.md")

skills_root = tmp_path / "skills"
(skills_root / "custom" / "dupe-skill").mkdir(parents=True)

with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(True, "OK", "dupe-skill")),
):
with pytest.raises(ValueError, match="already exists"):
client.install_skill(archive)

def test_empty_archive_rejected(self, client):
"""An archive with no entries is rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "empty.skill"
with zipfile.ZipFile(archive, "w"):
pass  # empty archive

skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)

with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
with pytest.raises(ValueError, match="empty"):
client.install_skill(archive)

def test_invalid_frontmatter_rejected(self, client):
"""Archive with invalid SKILL.md frontmatter is rejected."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
skill_dir = tmp_path / "bad-meta"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("no frontmatter at all")

archive = tmp_path / "bad-meta.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "bad-meta/SKILL.md")

skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)

with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(False, "Missing name field", "")),
):
with pytest.raises(ValueError, match="Invalid skill"):
client.install_skill(archive)

def test_not_a_zip_rejected(self, client):
"""A .skill file that is not a valid ZIP is rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "fake.skill"
archive.write_text("this is not a zip file")

with pytest.raises(ValueError, match="not a valid ZIP"):
client.install_skill(archive)

def test_directory_path_rejected(self, client):
"""Passing a directory instead of a file is rejected."""
with tempfile.TemporaryDirectory() as tmp:
with pytest.raises(ValueError, match="not a file"):
client.install_skill(tmp)
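The gates exercised above (cumulative size cap, absolute and '..' entry paths, symlink skipping, empty archives) can be sketched roughly as below. This is an assumed shape for safe_extract_skill_archive based on what the tests assert, not the actual deerflow.skills.installer code:

```python
import stat
import zipfile
from pathlib import Path


def safe_extract_skill_archive(
    zf: zipfile.ZipFile,
    dest: Path,
    max_total_size: int = 100 * 1024 * 1024,
) -> None:
    """Extract a skill archive, enforcing the security gates tested above."""
    entries = zf.infolist()
    if not entries:
        raise ValueError("archive is empty")

    total = 0
    for info in entries:
        # Zip-bomb gate: sum the declared uncompressed sizes, not the
        # archive's on-disk size, and bail before writing anything big.
        total += info.file_size
        if total > max_total_size:
            raise ValueError("archive too large when extracted")

        # Path gate: absolute paths and '..' components are unsafe.
        if info.filename.startswith(("/", "\\")) or ".." in Path(info.filename).parts:
            raise ValueError(f"unsafe path in archive: {info.filename!r}")

        # Symlink gate: the Unix mode lives in the high 16 bits of
        # external_attr; symlink entries are skipped, never written.
        if stat.S_ISLNK(info.external_attr >> 16):
            continue

        zf.extract(info, dest)
```

Note the declared-size check is advisory (a hostile archive can lie in its headers), so a production extractor would typically also cap bytes actually read per entry.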
# ===========================================================================
# Hardening — _atomic_write_json error paths
# ===========================================================================


class TestAtomicWriteJson:
def test_temp_file_cleaned_on_serialization_failure(self):
"""If json.dump raises, the temp file is removed."""
with tempfile.TemporaryDirectory() as tmp:
target = Path(tmp) / "config.json"

# An object that cannot be serialized to JSON.
bad_data = {"key": object()}

with pytest.raises(TypeError):
DeerFlowClient._atomic_write_json(target, bad_data)

# Target should not have been created.
assert not target.exists()
# No stray .tmp files should remain.
tmp_files = list(Path(tmp).glob("*.tmp"))
assert tmp_files == []

def test_happy_path_writes_atomically(self):
"""Normal write produces correct JSON and no temp files."""
with tempfile.TemporaryDirectory() as tmp:
target = Path(tmp) / "out.json"
data = {"key": "value", "nested": [1, 2, 3]}

DeerFlowClient._atomic_write_json(target, data)

assert target.exists()
with open(target) as f:
loaded = json.load(f)
assert loaded == data
# No temp files left behind.
assert list(Path(tmp).glob("*.tmp")) == []

def test_original_preserved_on_failure(self):
"""If write fails, the original file is not corrupted."""
with tempfile.TemporaryDirectory() as tmp:
target = Path(tmp) / "config.json"
target.write_text('{"original": true}')

bad_data = {"key": object()}
with pytest.raises(TypeError):
DeerFlowClient._atomic_write_json(target, bad_data)

# Original content must survive.
with open(target) as f:
assert json.load(f) == {"original": True}
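These three tests pin down the classic write-temp-then-rename contract: the target only ever holds complete JSON, a failed write cleans up its temp file, and an existing target survives failure untouched. A minimal sketch of an atomic writer with those semantics, assuming the behavior the assertions describe rather than the real _atomic_write_json implementation:

```python
import json
import os
from pathlib import Path


def atomic_write_json(target: Path, data) -> None:
    """Write JSON so the target is either the old content or the new, never partial."""
    tmp = target.with_suffix(target.suffix + ".tmp")
    try:
        with open(tmp, "w", encoding="utf-8") as f:
            # If serialization raises here (e.g. TypeError on an
            # unserializable object), only the temp file is affected.
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        # os.replace is atomic on POSIX and Windows, so readers never
        # observe a half-written target.
        os.replace(tmp, target)
    except BaseException:
        # Clean up the temp file; the original target is untouched.
        tmp.unlink(missing_ok=True)
        raise
```

Serializing before the rename is what makes the "original preserved on failure" test pass: the old file is only replaced once a complete new file exists on the same filesystem.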
|
||||
|
||||
|
||||
# ===========================================================================
|
||||
# Hardening — config update error paths
|
||||
# ===========================================================================
|
||||
|
||||
|
||||
class TestConfigUpdateErrors:
|
||||
def test_update_mcp_config_no_config_file(self, client):
|
||||
"""FileNotFoundError when extensions_config.json cannot be located."""
|
||||
with patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=None):
|
||||
with pytest.raises(FileNotFoundError, match="Cannot locate"):
|
||||
client.update_mcp_config({"server": {}})
|
||||
|
||||
def test_update_skill_no_config_file(self, client):
|
||||
"""FileNotFoundError when extensions_config.json cannot be located."""
|
||||
skill = MagicMock()
|
||||
skill.name = "some-skill"
|
||||
|
||||
with (
|
||||
patch("deerflow.skills.loader.load_skills", return_value=[skill]),
|
||||
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=None),
|
||||
):
|
||||
with pytest.raises(FileNotFoundError, match="Cannot locate"):
|
||||
client.update_skill("some-skill", enabled=False)
|
||||
|
||||
def test_update_skill_disappears_after_write(self, client):
|
||||
"""RuntimeError when skill vanishes between write and re-read."""
|
||||
skill = MagicMock()
|
||||
skill.name = "ghost-skill"
|
||||
|
||||
ext_config = MagicMock()
|
||||
ext_config.mcp_servers = {}
|
||||
ext_config.skills = {}
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
config_file = Path(tmp) / "extensions_config.json"
|
||||
config_file.write_text("{}")
|
||||
|
||||
with (
|
||||
patch("deerflow.skills.loader.load_skills", side_effect=[[skill], []]),
|
||||
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
|
||||
patch("deerflow.client.get_extensions_config", return_value=ext_config),
|
||||
patch("deerflow.client.reload_extensions_config"),
|
||||
):
|
||||
with pytest.raises(RuntimeError, match="disappeared"):
|
||||
client.update_skill("ghost-skill", enabled=False)
|
||||
|
||||
|
||||
# ===========================================================================
|
||||
# Hardening — stream / chat edge cases
|
||||
# ===========================================================================
|
||||
|
||||
|
||||
class TestStreamHardening:
    def test_agent_exception_propagates(self, client):
        """Exceptions from agent.stream() propagate to caller."""
        agent = MagicMock()
        agent.stream.side_effect = RuntimeError("model quota exceeded")

        with (
            patch.object(client, "_ensure_agent"),
            patch.object(client, "_agent", agent),
        ):
            with pytest.raises(RuntimeError, match="model quota exceeded"):
                list(client.stream("hi", thread_id="t-err"))

    def test_messages_without_id(self, client):
        """Messages without id attribute are emitted without crashing."""
        ai = AIMessage(content="no id here")
        # Forcibly remove the id attribute to simulate edge case.
        object.__setattr__(ai, "id", None)
        chunks = [{"messages": [ai]}]
        agent = _make_agent_mock(chunks)

        with (
            patch.object(client, "_ensure_agent"),
            patch.object(client, "_agent", agent),
        ):
            events = list(client.stream("hi", thread_id="t-noid"))

        # Should produce events without error.
        assert events[-1].type == "end"
        ai_events = _ai_events(events)
        assert len(ai_events) == 1
        assert ai_events[0].data["content"] == "no id here"

    def test_tool_calls_only_no_text(self, client):
        """chat() returns empty string when agent only emits tool calls."""
        ai = AIMessage(
            content="",
            id="ai-1",
            tool_calls=[{"name": "bash", "args": {"cmd": "ls"}, "id": "tc-1"}],
        )
        tool = ToolMessage(content="output", id="tm-1", tool_call_id="tc-1", name="bash")
        chunks = [
            {"messages": [ai]},
            {"messages": [ai, tool]},
        ]
        agent = _make_agent_mock(chunks)

        with (
            patch.object(client, "_ensure_agent"),
            patch.object(client, "_agent", agent),
        ):
            result = client.chat("do it", thread_id="t-tc-only")

        assert result == ""

    def test_duplicate_messages_without_id_not_deduplicated(self, client):
        """Messages with id=None are NOT deduplicated (each is emitted)."""
        ai1 = AIMessage(content="first")
        ai2 = AIMessage(content="second")
        object.__setattr__(ai1, "id", None)
        object.__setattr__(ai2, "id", None)

        chunks = [
            {"messages": [ai1]},
            {"messages": [ai2]},
        ]
        agent = _make_agent_mock(chunks)

        with (
            patch.object(client, "_ensure_agent"),
            patch.object(client, "_agent", agent),
        ):
            events = list(client.stream("hi", thread_id="t-dup-noid"))

        ai_msgs = _ai_events(events)
        assert len(ai_msgs) == 2


# ===========================================================================
# Hardening — _serialize_message coverage
# ===========================================================================


class TestSerializeMessage:
    def test_system_message(self):
        msg = SystemMessage(content="You are a helpful assistant.", id="sys-1")
        result = DeerFlowClient._serialize_message(msg)
        assert result["type"] == "system"
        assert result["content"] == "You are a helpful assistant."
        assert result["id"] == "sys-1"

    def test_unknown_message_type(self):
        """Non-standard message types serialize as 'unknown'."""
        msg = MagicMock()
        msg.id = "unk-1"
        msg.content = "something"
        # Not an instance of AIMessage/ToolMessage/HumanMessage/SystemMessage
        type(msg).__name__ = "CustomMessage"
        result = DeerFlowClient._serialize_message(msg)
        assert result["type"] == "unknown"
        assert result["id"] == "unk-1"

    def test_ai_message_with_tool_calls(self):
        msg = AIMessage(
            content="",
            id="ai-tc",
            tool_calls=[{"name": "bash", "args": {"cmd": "ls"}, "id": "tc-1"}],
        )
        result = DeerFlowClient._serialize_message(msg)
        assert result["type"] == "ai"
        assert len(result["tool_calls"]) == 1
        assert result["tool_calls"][0]["name"] == "bash"

    def test_tool_message_non_string_content(self):
        msg = ToolMessage(content={"key": "value"}, id="tm-1", tool_call_id="tc-1", name="tool")
        result = DeerFlowClient._serialize_message(msg)
        assert result["type"] == "tool"
        assert isinstance(result["content"], str)


# ===========================================================================
# Hardening — upload / delete symlink attack
# ===========================================================================


class TestUploadDeleteSymlink:
    def test_delete_upload_symlink_outside_dir(self, client):
        """A symlink in uploads dir pointing outside is caught by path traversal check."""
        with tempfile.TemporaryDirectory() as tmp:
            uploads_dir = Path(tmp) / "uploads"
            uploads_dir.mkdir()

            # Create a target file outside uploads dir.
            outside = Path(tmp) / "secret.txt"
            outside.write_text("sensitive data")

            # Create a symlink inside uploads dir pointing to outside file.
            link = uploads_dir / "harmless.txt"
            link.symlink_to(outside)

            with (
                patch("deerflow.client.get_uploads_dir", return_value=uploads_dir),
                patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
            ):
                # The resolved path of the symlink escapes uploads_dir,
                # so path traversal check should catch it.
                with pytest.raises(PathTraversalError):
                    client.delete_upload("thread-1", "harmless.txt")

            # The outside file must NOT have been deleted.
            assert outside.exists()

    def test_upload_filename_with_spaces_and_unicode(self, client):
        """Files with spaces and unicode characters in names upload correctly."""
        with tempfile.TemporaryDirectory() as tmp:
            tmp_path = Path(tmp)
            uploads_dir = tmp_path / "uploads"
            uploads_dir.mkdir()

            weird_name = "report 2024 数据.txt"
            src_file = tmp_path / weird_name
            src_file.write_text("data")

            with (
                patch("deerflow.client.get_uploads_dir", return_value=uploads_dir),
                patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
            ):
                result = client.upload_files("thread-1", [src_file])

            assert result["success"] is True
            assert result["files"][0]["filename"] == weird_name
            assert (uploads_dir / weird_name).exists()


# ===========================================================================
# Hardening — artifact edge cases
# ===========================================================================


class TestArtifactHardening:
    def test_artifact_directory_rejected(self, client):
        """get_artifact rejects paths that resolve to a directory."""
        with tempfile.TemporaryDirectory() as tmp:
            paths = Paths(base_dir=tmp)
            subdir = paths.sandbox_outputs_dir("t1") / "subdir"
            subdir.mkdir(parents=True)

            with patch("deerflow.client.get_paths", return_value=paths):
                with pytest.raises(ValueError, match="not a file"):
                    client.get_artifact("t1", "mnt/user-data/outputs/subdir")

    def test_artifact_leading_slash_stripped(self, client):
        """Paths with leading slash are handled correctly."""
        with tempfile.TemporaryDirectory() as tmp:
            paths = Paths(base_dir=tmp)
            outputs = paths.sandbox_outputs_dir("t1")
            outputs.mkdir(parents=True)
            (outputs / "file.txt").write_text("content")

            with patch("deerflow.client.get_paths", return_value=paths):
                content, _mime = client.get_artifact("t1", "/mnt/user-data/outputs/file.txt")

            assert content == b"content"


# ===========================================================================
# BUG DETECTION — tests that expose real bugs in client.py
# ===========================================================================


class TestUploadDuplicateFilenames:
    """Regression: upload_files must auto-rename duplicate basenames.

    Previously it silently overwrote the first file with the second,
    then reported both in the response while only one existed on disk.
    Now duplicates are renamed (data.txt → data_1.txt) and the response
    includes original_filename so the agent / caller can see what happened.
    """

    def test_duplicate_filenames_auto_renamed(self, client):
        """Two files with same basename → second gets _1 suffix."""
        with tempfile.TemporaryDirectory() as tmp:
            tmp_path = Path(tmp)
            uploads_dir = tmp_path / "uploads"
            uploads_dir.mkdir()

            dir_a = tmp_path / "a"
            dir_b = tmp_path / "b"
            dir_a.mkdir()
            dir_b.mkdir()
            (dir_a / "data.txt").write_text("version A")
            (dir_b / "data.txt").write_text("version B")

            with (
                patch("deerflow.client.get_uploads_dir", return_value=uploads_dir),
                patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
            ):
                result = client.upload_files("t-dup", [dir_a / "data.txt", dir_b / "data.txt"])

            assert result["success"] is True
            assert len(result["files"]) == 2

            # Both files exist on disk with distinct names.
            disk_files = sorted(p.name for p in uploads_dir.iterdir())
            assert disk_files == ["data.txt", "data_1.txt"]

            # First keeps original name, second is renamed.
            assert result["files"][0]["filename"] == "data.txt"
            assert "original_filename" not in result["files"][0]

            assert result["files"][1]["filename"] == "data_1.txt"
            assert result["files"][1]["original_filename"] == "data.txt"

            # Content preserved correctly.
            assert (uploads_dir / "data.txt").read_text() == "version A"
            assert (uploads_dir / "data_1.txt").read_text() == "version B"

    def test_triple_duplicate_increments_counter(self, client):
        """Three files with same basename → _1, _2 suffixes."""
        with tempfile.TemporaryDirectory() as tmp:
            tmp_path = Path(tmp)
            uploads_dir = tmp_path / "uploads"
            uploads_dir.mkdir()

            for name in ["x", "y", "z"]:
                d = tmp_path / name
                d.mkdir()
                (d / "report.csv").write_text(f"from {name}")

            with (
                patch("deerflow.client.get_uploads_dir", return_value=uploads_dir),
                patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
            ):
                result = client.upload_files(
                    "t-triple",
                    [tmp_path / "x" / "report.csv", tmp_path / "y" / "report.csv", tmp_path / "z" / "report.csv"],
                )

            filenames = [f["filename"] for f in result["files"]]
            assert filenames == ["report.csv", "report_1.csv", "report_2.csv"]
            assert len(list(uploads_dir.iterdir())) == 3

    def test_different_filenames_no_rename(self, client):
        """Non-duplicate filenames upload normally without rename."""
        with tempfile.TemporaryDirectory() as tmp:
            tmp_path = Path(tmp)
            uploads_dir = tmp_path / "uploads"
            uploads_dir.mkdir()

            (tmp_path / "a.txt").write_text("aaa")
            (tmp_path / "b.txt").write_text("bbb")

            with (
                patch("deerflow.client.get_uploads_dir", return_value=uploads_dir),
                patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
            ):
                result = client.upload_files("t-ok", [tmp_path / "a.txt", tmp_path / "b.txt"])

            assert result["success"] is True
            assert len(result["files"]) == 2
            assert all("original_filename" not in f for f in result["files"])
            assert len(list(uploads_dir.iterdir())) == 2


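The rename scheme these tests pin down (data.txt → data_1.txt → data_2.txt) can be sketched as a small helper. The names here are hypothetical — the real logic lives in deerflow.uploads.manager:

```python
def dedupe_filename(name: str, taken: set[str]) -> str:
    """Return name unchanged, or stem_N.ext for the smallest free N >= 1."""
    if name not in taken:
        return name
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension: suffix the whole name
        stem = name
    counter = 1
    while True:
        candidate = f"{stem}_{counter}.{ext}" if dot else f"{stem}_{counter}"
        if candidate not in taken:
            return candidate
        counter += 1
```

The counter keeps incrementing past names that are already taken, which matches the `report.csv → report_1.csv → report_2.csv` expectation in the triple-duplicate test.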
class TestBugArtifactPrefixMatchTooLoose:
    """Regression: get_artifact must reject paths like ``mnt/user-data-evil/...``.

    Previously ``startswith("mnt/user-data")`` matched ``"mnt/user-data-evil"``
    because it was a string prefix, not a path-segment check.
    """

    def test_non_canonical_prefix_rejected(self, client):
        """Paths that share a string prefix but differ at segment boundary are rejected."""
        with pytest.raises(ValueError, match="must start with"):
            client.get_artifact("t1", "mnt/user-data-evil/secret.txt")

    def test_exact_prefix_without_subpath_accepted(self, client):
        """Bare 'mnt/user-data' is accepted (will later fail as directory, not at prefix)."""
        with tempfile.TemporaryDirectory() as tmp:
            paths = Paths(base_dir=tmp)
            paths.sandbox_user_data_dir("t1").mkdir(parents=True)

            with patch("deerflow.client.get_paths", return_value=paths):
                # Accepted at prefix check, but fails because it's a directory.
                with pytest.raises(ValueError, match="not a file"):
                    client.get_artifact("t1", "mnt/user-data")


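The fix this class describes — comparing path segments instead of raw string prefixes — can be sketched like this (hypothetical helper name, for illustration only):

```python
from pathlib import PurePosixPath


def has_path_prefix(path: str, prefix: str) -> bool:
    # Compare whole path segments, so "mnt/user-data-evil" does NOT
    # match the "mnt/user-data" prefix the way startswith() would.
    parts = PurePosixPath(path.lstrip("/")).parts
    want = PurePosixPath(prefix).parts
    return parts[: len(want)] == want
```

Stripping the leading slash first also covers the leading-slash case exercised by the artifact tests above.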
class TestBugListUploadsDeadCode:
    """Regression: list_uploads works even when called on a fresh thread
    (directory does not exist yet — returns empty without creating it).
    """

    def test_list_uploads_on_fresh_thread(self, client):
        """list_uploads on a thread that never had uploads returns empty list."""
        with tempfile.TemporaryDirectory() as tmp:
            non_existent = Path(tmp) / "does-not-exist" / "uploads"
            assert not non_existent.exists()

            mock_paths = MagicMock()
            mock_paths.sandbox_uploads_dir.return_value = non_existent

            with patch("deerflow.uploads.manager.get_paths", return_value=mock_paths):
                result = client.list_uploads("thread-fresh")

            # Read path should NOT create the directory
            assert not non_existent.exists()
            assert result == {"files": [], "count": 0}


class TestBugAgentInvalidationInconsistency:
    """Regression: update_skill and update_mcp_config must reset both
    _agent and _agent_config_key, just like reset_agent() does.
    """

    def test_update_mcp_resets_config_key(self, client):
        """After update_mcp_config, both _agent and _agent_config_key are None."""
        client._agent = MagicMock()
        client._agent_config_key = ("model", True, False, False)

        current_config = MagicMock()
        current_config.skills = {}
        reloaded = MagicMock()
        reloaded.mcp_servers = {}

        with tempfile.TemporaryDirectory() as tmp:
            config_file = Path(tmp) / "ext.json"
            config_file.write_text("{}")

            with (
                patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
                patch("deerflow.client.get_extensions_config", return_value=current_config),
                patch("deerflow.client.reload_extensions_config", return_value=reloaded),
            ):
                client.update_mcp_config({})

            assert client._agent is None
            assert client._agent_config_key is None

    def test_update_skill_resets_config_key(self, client):
        """After update_skill, both _agent and _agent_config_key are None."""
        client._agent = MagicMock()
        client._agent_config_key = ("model", True, False, False)

        skill = MagicMock()
        skill.name = "s1"
        updated = MagicMock()
        updated.name = "s1"
        updated.description = "d"
        updated.license = "MIT"
        updated.category = "c"
        updated.enabled = False

        ext_config = MagicMock()
        ext_config.mcp_servers = {}
        ext_config.skills = {}

        with tempfile.TemporaryDirectory() as tmp:
            config_file = Path(tmp) / "ext.json"
            config_file.write_text("{}")

            with (
                patch("deerflow.skills.loader.load_skills", side_effect=[[skill], [updated]]),
                patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
                patch("deerflow.client.get_extensions_config", return_value=ext_config),
                patch("deerflow.client.reload_extensions_config"),
            ):
                client.update_skill("s1", enabled=False)

            assert client._agent is None
            assert client._agent_config_key is None

781  backend/tests/test_client_e2e.py  Normal file
@@ -0,0 +1,781 @@
"""End-to-end tests for DeerFlowClient.

Middle tier of the test pyramid:
- Top: test_client_live.py — real LLM, needs API key
- Middle: test_client_e2e.py — real LLM + real modules ← THIS FILE
- Bottom: test_client.py — unit tests, mock everything

Core principle: use the real LLM from config.yaml, let config, middleware
chain, tool registration, file I/O, and event serialization all run for real.
Only DEER_FLOW_HOME is redirected to tmp_path for filesystem isolation.

Tests that call the LLM are marked ``requires_llm`` and skipped in CI.
File-management tests (upload/list/delete) don't need LLM and run everywhere.
"""

import json
import os
import uuid
import zipfile

import pytest
from dotenv import load_dotenv

from deerflow.client import DeerFlowClient, StreamEvent
from deerflow.config.app_config import AppConfig
from deerflow.config.model_config import ModelConfig
from deerflow.config.sandbox_config import SandboxConfig

# Load .env from project root (for OPENAI_API_KEY etc.)
load_dotenv(os.path.join(os.path.dirname(__file__), "../../.env"))

# ---------------------------------------------------------------------------
# Markers
# ---------------------------------------------------------------------------

requires_llm = pytest.mark.skipif(
    os.getenv("CI", "").lower() in ("true", "1") or not os.getenv("OPENAI_API_KEY"),
    reason="Requires LLM API key — skipped in CI or when OPENAI_API_KEY is unset",
)


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def _make_e2e_config() -> AppConfig:
    """Build a minimal AppConfig using real LLM credentials from environment.

    All LLM connection details come from environment variables so that both
    internal CI and external contributors can run the tests:

    - ``E2E_MODEL_NAME`` (default: ``volcengine-ark``)
    - ``E2E_MODEL_USE`` (default: ``langchain_openai:ChatOpenAI``)
    - ``E2E_MODEL_ID`` (default: ``ep-20251211175242-llcmh``)
    - ``E2E_BASE_URL`` (default: ``https://ark-cn-beijing.bytedance.net/api/v3``)
    - ``OPENAI_API_KEY`` (required for LLM tests)
    """
    return AppConfig(
        models=[
            ModelConfig(
                name=os.getenv("E2E_MODEL_NAME", "volcengine-ark"),
                display_name="E2E Test Model",
                use=os.getenv("E2E_MODEL_USE", "langchain_openai:ChatOpenAI"),
                model=os.getenv("E2E_MODEL_ID", "ep-20251211175242-llcmh"),
                base_url=os.getenv("E2E_BASE_URL", "https://ark-cn-beijing.bytedance.net/api/v3"),
                api_key=os.getenv("OPENAI_API_KEY", ""),
                max_tokens=512,
                temperature=0.7,
                supports_thinking=False,
                supports_reasoning_effort=False,
                supports_vision=False,
            )
        ],
        sandbox=SandboxConfig(use="deerflow.sandbox.local:LocalSandboxProvider"),
    )


# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------


@pytest.fixture()
def e2e_env(tmp_path, monkeypatch):
    """Isolated filesystem environment for E2E tests.

    - DEER_FLOW_HOME → tmp_path (all thread data lands in a temp dir)
    - Singletons reset so they pick up the new env
    - Title/memory/summarization disabled to avoid extra LLM calls
    - AppConfig built programmatically (avoids config.yaml param-name issues)
    """
    # 1. Filesystem isolation
    monkeypatch.setenv("DEER_FLOW_HOME", str(tmp_path))
    monkeypatch.setattr("deerflow.config.paths._paths", None)
    monkeypatch.setattr("deerflow.sandbox.sandbox_provider._default_sandbox_provider", None)

    # 2. Inject a clean AppConfig via the global singleton.
    config = _make_e2e_config()
    monkeypatch.setattr("deerflow.config.app_config._app_config", config)
    monkeypatch.setattr("deerflow.config.app_config._app_config_is_custom", True)

    # 3. Disable title generation (extra LLM call, non-deterministic)
    from deerflow.config.title_config import TitleConfig

    monkeypatch.setattr("deerflow.config.title_config._title_config", TitleConfig(enabled=False))

    # 4. Disable memory queueing (avoids background threads & file writes)
    from deerflow.config.memory_config import MemoryConfig

    monkeypatch.setattr(
        "deerflow.agents.middlewares.memory_middleware.get_memory_config",
        lambda: MemoryConfig(enabled=False),
    )

    # 5. Ensure summarization is off (default, but be explicit)
    from deerflow.config.summarization_config import SummarizationConfig

    monkeypatch.setattr("deerflow.config.summarization_config._summarization_config", SummarizationConfig(enabled=False))

    # 6. Exclude TitleMiddleware from the chain.
    # It triggers an extra LLM call to generate a thread title, which adds
    # non-determinism and cost to E2E tests (title generation is already
    # disabled via TitleConfig above, but the middleware still participates
    # in the chain and can interfere with event ordering).
    from deerflow.agents.lead_agent.agent import _build_middlewares as _original_build_middlewares
    from deerflow.agents.middlewares.title_middleware import TitleMiddleware

    def _sync_safe_build_middlewares(*args, **kwargs):
        mws = _original_build_middlewares(*args, **kwargs)
        return [m for m in mws if not isinstance(m, TitleMiddleware)]

    monkeypatch.setattr("deerflow.client._build_middlewares", _sync_safe_build_middlewares)

    return {"tmp_path": tmp_path}


@pytest.fixture()
def client(e2e_env):
    """A DeerFlowClient wired to the isolated e2e_env."""
    return DeerFlowClient(checkpointer=None, thinking_enabled=False)


# ---------------------------------------------------------------------------
# Step 2: Basic streaming (requires LLM)
# ---------------------------------------------------------------------------


class TestBasicChat:
    """Basic chat and streaming behavior with real LLM."""

    @requires_llm
    def test_basic_chat(self, client):
        """chat() returns a non-empty text response."""
        result = client.chat("Say exactly: pong")
        assert isinstance(result, str)
        assert len(result) > 0

    @requires_llm
    def test_stream_event_sequence(self, client):
        """stream() yields events: messages-tuple, values, and end."""
        events = list(client.stream("Say hi"))

        types = [e.type for e in events]
        assert types[-1] == "end"
        assert "messages-tuple" in types
        assert "values" in types

    @requires_llm
    def test_stream_event_data_format(self, client):
        """Each event type has the expected data structure."""
        events = list(client.stream("Say hello"))

        for event in events:
            assert isinstance(event, StreamEvent)
            assert isinstance(event.type, str)
            assert isinstance(event.data, dict)

            if event.type == "messages-tuple" and event.data.get("type") == "ai":
                assert "content" in event.data
                assert "id" in event.data
            elif event.type == "values":
                assert "messages" in event.data
                assert "artifacts" in event.data
            elif event.type == "end":
                assert event.data == {}

    @requires_llm
    def test_multi_turn_stateless(self, client):
        """Without checkpointer, two calls to the same thread_id are independent."""
        tid = str(uuid.uuid4())

        r1 = client.chat("Remember the number 42", thread_id=tid)
        # Reset so agent is recreated (simulates no cross-turn state)
        client.reset_agent()
        r2 = client.chat("What number did I say?", thread_id=tid)

        # Without a checkpointer the second call has no memory of the first.
        # We can't assert exact content, but both should be non-empty.
        assert isinstance(r1, str) and len(r1) > 0
        assert isinstance(r2, str) and len(r2) > 0


# ---------------------------------------------------------------------------
# Step 3: Tool call flow (requires LLM)
# ---------------------------------------------------------------------------


class TestToolCallFlow:
    """Verify the LLM actually invokes tools through the real agent pipeline."""

    @requires_llm
    def test_tool_call_produces_events(self, client):
        """When the LLM decides to use a tool, we see tool call + result events."""
        # Give a clear instruction that forces a tool call
        events = list(client.stream(
            "Use the bash tool to run: echo hello_e2e_test"
        ))

        types = [e.type for e in events]
        assert types[-1] == "end"

        # Should have at least one tool call event
        tool_call_events = [
            e for e in events
            if e.type == "messages-tuple" and e.data.get("tool_calls")
        ]
        tool_result_events = [
            e for e in events
            if e.type == "messages-tuple" and e.data.get("type") == "tool"
        ]
        assert len(tool_call_events) >= 1, "Expected at least one tool_call event"
        assert len(tool_result_events) >= 1, "Expected at least one tool result event"

    @requires_llm
    def test_tool_call_event_structure(self, client):
        """Tool call events contain name, args, and id fields."""
        events = list(client.stream(
            "Use the read_file tool to read /mnt/user-data/workspace/nonexistent.txt"
        ))

        tc_events = [
            e for e in events
            if e.type == "messages-tuple" and e.data.get("tool_calls")
        ]
        if tc_events:
            tc = tc_events[0].data["tool_calls"][0]
            assert "name" in tc
            assert "args" in tc
            assert "id" in tc


# ---------------------------------------------------------------------------
# Step 4: File upload integration (no LLM needed for most)
# ---------------------------------------------------------------------------


class TestFileUploadIntegration:
    """Upload, list, and delete files through the real client path."""

    def test_upload_files(self, e2e_env, tmp_path):
        """upload_files() copies files and returns metadata."""
        test_file = tmp_path / "source" / "readme.txt"
        test_file.parent.mkdir(parents=True, exist_ok=True)
        test_file.write_text("Hello world")

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())

        result = c.upload_files(tid, [test_file])
        assert result["success"] is True
        assert len(result["files"]) == 1
        assert result["files"][0]["filename"] == "readme.txt"

        # Physically exists
        from deerflow.config.paths import get_paths
        assert (get_paths().sandbox_uploads_dir(tid) / "readme.txt").exists()

    def test_upload_duplicate_rename(self, e2e_env, tmp_path):
        """Uploading two files with the same name auto-renames the second."""
        d1 = tmp_path / "dir1"
        d2 = tmp_path / "dir2"
        d1.mkdir()
        d2.mkdir()
        (d1 / "data.txt").write_text("content A")
        (d2 / "data.txt").write_text("content B")

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())

        result = c.upload_files(tid, [d1 / "data.txt", d2 / "data.txt"])
        assert result["success"] is True
        assert len(result["files"]) == 2

        filenames = {f["filename"] for f in result["files"]}
        assert "data.txt" in filenames
        assert "data_1.txt" in filenames

    def test_upload_list_and_delete(self, e2e_env, tmp_path):
        """Upload → list → delete → list lifecycle."""
        test_file = tmp_path / "lifecycle.txt"
        test_file.write_text("lifecycle test")

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())

        c.upload_files(tid, [test_file])

        listing = c.list_uploads(tid)
        assert listing["count"] == 1
        assert listing["files"][0]["filename"] == "lifecycle.txt"

        del_result = c.delete_upload(tid, "lifecycle.txt")
        assert del_result["success"] is True

        listing = c.list_uploads(tid)
        assert listing["count"] == 0

    @requires_llm
    def test_upload_then_chat(self, e2e_env, tmp_path):
        """Upload a file then ask the LLM about it — UploadsMiddleware injects file info."""
        test_file = tmp_path / "source" / "notes.txt"
        test_file.parent.mkdir(parents=True, exist_ok=True)
        test_file.write_text("The secret code is 7749.")

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())

        c.upload_files(tid, [test_file])
        # Chat — the middleware should inject <uploaded_files> context
        response = c.chat("What files are available?", thread_id=tid)
        assert isinstance(response, str) and len(response) > 0


# ---------------------------------------------------------------------------
# Step 5: Lifecycle and configuration (no LLM needed)
# ---------------------------------------------------------------------------


class TestLifecycleAndConfig:
    """Agent recreation and configuration behavior."""

    @requires_llm
    def test_agent_recreation_on_config_change(self, client):
        """Changing thinking_enabled triggers agent recreation (different config key)."""
        list(client.stream("hi"))
        key1 = client._agent_config_key

        # Stream with a different config override
        client.reset_agent()
        list(client.stream("hi", thinking_enabled=True))
        key2 = client._agent_config_key

        # thinking_enabled changed: False → True → keys differ
        assert key1 != key2

    def test_reset_agent_clears_state(self, e2e_env):
        """reset_agent() sets the internal agent to None."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        # Before any call, agent is None
        assert c._agent is None

        c.reset_agent()
        assert c._agent is None
        assert c._agent_config_key is None

    def test_plan_mode_config_key(self, e2e_env):
        """plan_mode is part of the config key tuple."""
        c = DeerFlowClient(checkpointer=None, plan_mode=False)
        cfg1 = c._get_runnable_config("test-thread")
        key1 = (
            cfg1["configurable"]["model_name"],
            cfg1["configurable"]["thinking_enabled"],
            cfg1["configurable"]["is_plan_mode"],
            cfg1["configurable"]["subagent_enabled"],
        )

        c2 = DeerFlowClient(checkpointer=None, plan_mode=True)
        cfg2 = c2._get_runnable_config("test-thread")
        key2 = (
            cfg2["configurable"]["model_name"],
            cfg2["configurable"]["thinking_enabled"],
            cfg2["configurable"]["is_plan_mode"],
            cfg2["configurable"]["subagent_enabled"],
        )

        assert key1 != key2
        assert key1[2] is False
        assert key2[2] is True


# ---------------------------------------------------------------------------
# Step 6: Middleware chain verification (requires LLM)
# ---------------------------------------------------------------------------


class TestMiddlewareChain:
    """Verify middleware side effects through real execution."""

    @requires_llm
    def test_thread_data_paths_in_state(self, client):
        """After streaming, thread directory paths are computed correctly."""
        tid = str(uuid.uuid4())
        events = list(client.stream("hi", thread_id=tid))

        # The values event should contain messages
        values_events = [e for e in events if e.type == "values"]
        assert len(values_events) >= 1

        # ThreadDataMiddleware should have set paths in the state.
        # We verify the paths singleton can resolve the thread dir.
        from deerflow.config.paths import get_paths
        thread_dir = get_paths().thread_dir(tid)
        assert str(thread_dir).endswith(tid)

    @requires_llm
    def test_stream_completes_without_middleware_errors(self, client):
        """Full middleware chain (ThreadData, Uploads, Sandbox, DanglingToolCall,
        Memory, Clarification) executes without errors."""
        events = list(client.stream("What is 1+1?"))

        types = [e.type for e in events]
        assert types[-1] == "end"
        # Should have at least one AI response
        ai_events = [
            e for e in events
            if e.type == "messages-tuple" and e.data.get("type") == "ai"
        ]
        assert len(ai_events) >= 1


# ---------------------------------------------------------------------------
# Step 7: Error and boundary conditions
# ---------------------------------------------------------------------------


class TestErrorAndBoundary:
    """Error propagation and edge cases."""

    def test_upload_nonexistent_file_raises(self, e2e_env):
        """Uploading a file that doesn't exist raises FileNotFoundError."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(FileNotFoundError):
            c.upload_files("test-thread", ["/nonexistent/file.txt"])

    def test_delete_nonexistent_upload_raises(self, e2e_env):
        """Deleting a file that doesn't exist raises FileNotFoundError."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())
        # Ensure the uploads dir exists first
        c.list_uploads(tid)
        with pytest.raises(FileNotFoundError):
            c.delete_upload(tid, "ghost.txt")

    def test_artifact_path_traversal_blocked(self, e2e_env):
        """get_artifact blocks path traversal attempts."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(ValueError):
            c.get_artifact("test-thread", "../../etc/passwd")

    def test_upload_directory_rejected(self, e2e_env, tmp_path):
        """Uploading a directory (not a file) is rejected."""
        d = tmp_path / "a_directory"
        d.mkdir()
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(ValueError, match="not a file"):
            c.upload_files("test-thread", [d])

    @requires_llm
    def test_empty_message_still_gets_response(self, client):
        """Even an empty-ish message should produce a valid event stream."""
        events = list(client.stream(" "))
        types = [e.type for e in events]
        assert types[-1] == "end"

# ---------------------------------------------------------------------------
# Step 8: Artifact access (no LLM needed)
# ---------------------------------------------------------------------------


class TestArtifactAccess:
    """Read artifacts through get_artifact() with real filesystem."""

    def test_get_artifact_happy_path(self, e2e_env):
        """Write a file to outputs, then read it back via get_artifact()."""
        from deerflow.config.paths import get_paths

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())

        # Create an output file in the thread's outputs directory
        outputs_dir = get_paths().sandbox_outputs_dir(tid)
        outputs_dir.mkdir(parents=True, exist_ok=True)
        (outputs_dir / "result.txt").write_text("hello artifact")

        data, mime = c.get_artifact(tid, "mnt/user-data/outputs/result.txt")
        assert data == b"hello artifact"
        assert "text" in mime

    def test_get_artifact_nested_path(self, e2e_env):
        """Artifacts in subdirectories are accessible."""
        from deerflow.config.paths import get_paths

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        tid = str(uuid.uuid4())

        outputs_dir = get_paths().sandbox_outputs_dir(tid)
        sub = outputs_dir / "charts"
        sub.mkdir(parents=True, exist_ok=True)
        (sub / "data.json").write_text('{"x": 1}')

        data, mime = c.get_artifact(tid, "mnt/user-data/outputs/charts/data.json")
        assert b'"x"' in data
        assert "json" in mime

    def test_get_artifact_nonexistent_raises(self, e2e_env):
        """Reading a nonexistent artifact raises FileNotFoundError."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(FileNotFoundError):
            c.get_artifact("test-thread", "mnt/user-data/outputs/ghost.txt")

    def test_get_artifact_traversal_within_prefix_blocked(self, e2e_env):
        """Path traversal within the valid prefix is still blocked."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises((PermissionError, ValueError, FileNotFoundError)):
            c.get_artifact("test-thread", "mnt/user-data/outputs/../../etc/passwd")

# ---------------------------------------------------------------------------
# Step 9: Skill installation (no LLM needed)
# ---------------------------------------------------------------------------


class TestSkillInstallation:
    """install_skill() with real ZIP handling and filesystem."""

    @pytest.fixture(autouse=True)
    def _isolate_skills_dir(self, tmp_path, monkeypatch):
        """Redirect skill installation to a temp directory."""
        skills_root = tmp_path / "skills"
        (skills_root / "public").mkdir(parents=True)
        (skills_root / "custom").mkdir(parents=True)
        monkeypatch.setattr(
            "deerflow.skills.installer.get_skills_root_path",
            lambda: skills_root,
        )
        self._skills_root = skills_root

    @staticmethod
    def _make_skill_zip(tmp_path, skill_name="test-e2e-skill"):
        """Create a minimal valid .skill archive."""
        skill_dir = tmp_path / "build" / skill_name
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text(
            f"---\nname: {skill_name}\ndescription: E2E test skill\n---\n\nTest content.\n"
        )
        archive_path = tmp_path / f"{skill_name}.skill"
        with zipfile.ZipFile(archive_path, "w") as zf:
            for file in skill_dir.rglob("*"):
                zf.write(file, file.relative_to(tmp_path / "build"))
        return archive_path

    def test_install_skill_success(self, e2e_env, tmp_path):
        """A valid .skill archive installs to the custom skills directory."""
        archive = self._make_skill_zip(tmp_path)
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)

        result = c.install_skill(archive)
        assert result["success"] is True
        assert result["skill_name"] == "test-e2e-skill"
        assert (self._skills_root / "custom" / "test-e2e-skill" / "SKILL.md").exists()

    def test_install_skill_duplicate_rejected(self, e2e_env, tmp_path):
        """Installing the same skill twice raises ValueError."""
        archive = self._make_skill_zip(tmp_path)
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)

        c.install_skill(archive)
        with pytest.raises(ValueError, match="already exists"):
            c.install_skill(archive)

    def test_install_skill_invalid_extension(self, e2e_env, tmp_path):
        """A file without .skill extension is rejected."""
        bad_file = tmp_path / "not_a_skill.zip"
        bad_file.write_bytes(b"PK\x03\x04")  # ZIP magic bytes
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(ValueError, match=".skill extension"):
            c.install_skill(bad_file)

    def test_install_skill_missing_frontmatter(self, e2e_env, tmp_path):
        """A .skill archive without valid SKILL.md frontmatter is rejected."""
        skill_dir = tmp_path / "build" / "bad-skill"
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text("No frontmatter here.")

        archive = tmp_path / "bad-skill.skill"
        with zipfile.ZipFile(archive, "w") as zf:
            for file in skill_dir.rglob("*"):
                zf.write(file, file.relative_to(tmp_path / "build"))

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(ValueError, match="Invalid skill"):
            c.install_skill(archive)

    def test_install_skill_nonexistent_file(self, e2e_env):
        """Installing from a nonexistent path raises FileNotFoundError."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(FileNotFoundError):
            c.install_skill("/nonexistent/skill.skill")

# ---------------------------------------------------------------------------
# Step 10: Configuration management (no LLM needed)
# ---------------------------------------------------------------------------


class TestConfigManagement:
    """Config queries and updates through real code paths."""

    def test_list_models_returns_injected_config(self, e2e_env):
        """list_models() returns the model from the injected AppConfig."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.list_models()
        assert "models" in result
        assert len(result["models"]) == 1
        assert result["models"][0]["name"] == "volcengine-ark"
        assert result["models"][0]["display_name"] == "E2E Test Model"

    def test_get_model_found(self, e2e_env):
        """get_model() returns the model when it exists."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        model = c.get_model("volcengine-ark")
        assert model is not None
        assert model["name"] == "volcengine-ark"
        assert model["supports_thinking"] is False

    def test_get_model_not_found(self, e2e_env):
        """get_model() returns None for nonexistent model."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        assert c.get_model("nonexistent-model") is None

    def test_list_skills_returns_list(self, e2e_env):
        """list_skills() returns a dict with 'skills' key from real directory scan."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.list_skills()
        assert "skills" in result
        assert isinstance(result["skills"], list)
        # The real skills/ directory should have some public skills
        assert len(result["skills"]) > 0

    def test_get_skill_found(self, e2e_env):
        """get_skill() returns skill info for a known public skill."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        # 'deep-research' is a built-in public skill
        skill = c.get_skill("deep-research")
        if skill is not None:
            assert skill["name"] == "deep-research"
            assert "description" in skill
            assert "enabled" in skill

    def test_get_skill_not_found(self, e2e_env):
        """get_skill() returns None for nonexistent skill."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        assert c.get_skill("nonexistent-skill-xyz") is None

    def test_get_mcp_config_returns_dict(self, e2e_env):
        """get_mcp_config() returns a dict with 'mcp_servers' key."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.get_mcp_config()
        assert "mcp_servers" in result
        assert isinstance(result["mcp_servers"], dict)

    def test_update_mcp_config_writes_and_invalidates(self, e2e_env, tmp_path, monkeypatch):
        """update_mcp_config() writes extensions_config.json and invalidates the agent."""
        # Set up a writable extensions_config.json
        config_file = tmp_path / "extensions_config.json"
        config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
        monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))

        # Force reload so the singleton picks up our test file
        from deerflow.config.extensions_config import reload_extensions_config

        reload_extensions_config()

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        # Simulate a cached agent
        c._agent = "fake-agent-placeholder"
        c._agent_config_key = ("a", "b", "c", "d")

        result = c.update_mcp_config(
            {"test-server": {"enabled": True, "type": "stdio", "command": "echo"}}
        )
        assert "mcp_servers" in result

        # Agent should be invalidated
        assert c._agent is None
        assert c._agent_config_key is None

        # File should be written
        written = json.loads(config_file.read_text())
        assert "test-server" in written["mcpServers"]

    def test_update_skill_writes_and_invalidates(self, e2e_env, tmp_path, monkeypatch):
        """update_skill() writes extensions_config.json and invalidates the agent."""
        config_file = tmp_path / "extensions_config.json"
        config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
        monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))

        from deerflow.config.extensions_config import reload_extensions_config

        reload_extensions_config()

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        c._agent = "fake-agent-placeholder"
        c._agent_config_key = ("a", "b", "c", "d")

        # Use a real skill name from the public skills directory
        skills = c.list_skills()
        if not skills["skills"]:
            pytest.skip("No skills available for testing")
        skill_name = skills["skills"][0]["name"]

        result = c.update_skill(skill_name, enabled=False)
        assert result["name"] == skill_name
        assert result["enabled"] is False

        # Agent should be invalidated
        assert c._agent is None
        assert c._agent_config_key is None

    def test_update_skill_nonexistent_raises(self, e2e_env, tmp_path, monkeypatch):
        """update_skill() raises ValueError for nonexistent skill."""
        config_file = tmp_path / "extensions_config.json"
        config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
        monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))

        from deerflow.config.extensions_config import reload_extensions_config

        reload_extensions_config()

        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        with pytest.raises(ValueError, match="not found"):
            c.update_skill("nonexistent-skill-xyz", enabled=True)

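The invalidation assertions above pin down a config-keyed cache pattern: the agent is rebuilt only when a behavior-affecting field changes, and config updates must reset both the cached agent and its key. A minimal sketch of that pattern, with hypothetical stand-in names (the real client builds a full agent, not an `object()`):

```python
class AgentCacheSketch:
    """Sketch of the config-keyed agent cache the tests above exercise."""

    def __init__(self):
        self._agent = None
        self._agent_config_key = None

    def get_agent(self, model_name, thinking_enabled, is_plan_mode, subagent_enabled):
        # Rebuild only when any field that affects agent behavior changes.
        key = (model_name, thinking_enabled, is_plan_mode, subagent_enabled)
        if self._agent is None or self._agent_config_key != key:
            self._agent = object()  # stand-in for the real agent build
            self._agent_config_key = key
        return self._agent

    def invalidate(self):
        # What update_mcp_config()/update_skill() must do: reset BOTH
        # fields, or a stale agent could be served for an unchanged key.
        self._agent = None
        self._agent_config_key = None
```

Resetting only `_agent` without `_agent_config_key` (the bug this PR fixes) would let a later call with the same key skip the rebuild.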
# ---------------------------------------------------------------------------
# Step 11: Memory access (no LLM needed)
# ---------------------------------------------------------------------------


class TestMemoryAccess:
    """Memory system queries through real code paths."""

    def test_get_memory_returns_dict(self, e2e_env):
        """get_memory() returns a dict (may be empty initial state)."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.get_memory()
        assert isinstance(result, dict)

    def test_reload_memory_returns_dict(self, e2e_env):
        """reload_memory() forces reload and returns a dict."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.reload_memory()
        assert isinstance(result, dict)

    def test_get_memory_config_fields(self, e2e_env):
        """get_memory_config() returns expected config fields."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.get_memory_config()
        assert "enabled" in result
        assert "storage_path" in result
        assert "debounce_seconds" in result
        assert "max_facts" in result
        assert "fact_confidence_threshold" in result
        assert "injection_enabled" in result
        assert "max_injection_tokens" in result

    def test_get_memory_status_combines_config_and_data(self, e2e_env):
        """get_memory_status() returns both 'config' and 'data' keys."""
        c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
        result = c.get_memory_status()
        assert "config" in result
        assert "data" in result
        assert "enabled" in result["config"]
        assert isinstance(result["data"], dict)
@@ -1,8 +1,8 @@
 from pathlib import Path

-from fastapi import HTTPException
+import pytest

-from app.gateway.routers.skills import _resolve_skill_dir_from_archive_root
+from deerflow.skills.installer import resolve_skill_dir_from_archive


 def _write_skill(skill_dir: Path) -> None:
@@ -23,24 +23,19 @@ def test_resolve_skill_dir_ignores_macosx_wrapper(tmp_path: Path) -> None:
     _write_skill(tmp_path / "demo-skill")
     (tmp_path / "__MACOSX").mkdir()

-    assert _resolve_skill_dir_from_archive_root(tmp_path) == tmp_path / "demo-skill"
+    assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "demo-skill"


 def test_resolve_skill_dir_ignores_hidden_top_level_entries(tmp_path: Path) -> None:
     _write_skill(tmp_path / "demo-skill")
     (tmp_path / ".DS_Store").write_text("metadata", encoding="utf-8")

-    assert _resolve_skill_dir_from_archive_root(tmp_path) == tmp_path / "demo-skill"
+    assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "demo-skill"


 def test_resolve_skill_dir_rejects_archive_with_only_metadata(tmp_path: Path) -> None:
     (tmp_path / "__MACOSX").mkdir()
     (tmp_path / ".DS_Store").write_text("metadata", encoding="utf-8")

-    try:
-        _resolve_skill_dir_from_archive_root(tmp_path)
-    except HTTPException as error:
-        assert error.status_code == 400
-        assert error.detail == "Skill archive is empty"
-    else:
-        raise AssertionError("Expected HTTPException for metadata-only archive")
+    with pytest.raises(ValueError, match="empty"):
+        resolve_skill_dir_from_archive(tmp_path)

backend/tests/test_skills_installer.py (new file, 220 lines)
@@ -0,0 +1,220 @@
"""Tests for deerflow.skills.installer — shared skill installation logic."""
|
||||
|
||||
import stat
|
||||
import zipfile
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
from deerflow.skills.installer import (
|
||||
install_skill_from_archive,
|
||||
is_symlink_member,
|
||||
is_unsafe_zip_member,
|
||||
resolve_skill_dir_from_archive,
|
||||
safe_extract_skill_archive,
|
||||
should_ignore_archive_entry,
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# is_unsafe_zip_member
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestIsUnsafeZipMember:
|
||||
def test_absolute_path(self):
|
||||
info = zipfile.ZipInfo("/etc/passwd")
|
||||
assert is_unsafe_zip_member(info) is True
|
||||
|
||||
def test_dotdot_traversal(self):
|
||||
info = zipfile.ZipInfo("foo/../../../etc/passwd")
|
||||
assert is_unsafe_zip_member(info) is True
|
||||
|
||||
def test_safe_member(self):
|
||||
info = zipfile.ZipInfo("my-skill/SKILL.md")
|
||||
assert is_unsafe_zip_member(info) is False
|
||||
|
||||
def test_empty_filename(self):
|
||||
info = zipfile.ZipInfo("")
|
||||
assert is_unsafe_zip_member(info) is False
|
||||
|
||||
|
||||
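The behavior these four cases pin down can be implemented roughly as follows. This is a sketch consistent with the tests, not the module's actual code: reject absolute member names and any `..` path component before extraction.

```python
import zipfile
from pathlib import PurePosixPath


def is_unsafe_zip_member_sketch(info: zipfile.ZipInfo) -> bool:
    """Sketch of the safety check the tests above describe."""
    name = info.filename
    # Absolute paths would escape the extraction root outright.
    if name.startswith(("/", "\\")):
        return True
    # Any ".." component allows traversal after joining with the dest dir.
    return ".." in PurePosixPath(name).parts
```

An empty filename has no components and no leading slash, so it is not flagged here; the extractor simply has nothing to write for it.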
# ---------------------------------------------------------------------------
# is_symlink_member
# ---------------------------------------------------------------------------


class TestIsSymlinkMember:
    def test_detects_symlink(self):
        info = zipfile.ZipInfo("link.txt")
        info.external_attr = (stat.S_IFLNK | 0o777) << 16
        assert is_symlink_member(info) is True

    def test_regular_file(self):
        info = zipfile.ZipInfo("file.txt")
        info.external_attr = (stat.S_IFREG | 0o644) << 16
        assert is_symlink_member(info) is False

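The ZIP format stores the Unix `st_mode` in the high 16 bits of `external_attr`, which is exactly what the fixtures above encode with `<< 16`. A matching check is essentially a one-liner (a sketch; the real `is_symlink_member` may differ):

```python
import stat
import zipfile


def is_symlink_member_sketch(info: zipfile.ZipInfo) -> bool:
    # The Unix file mode lives in the upper 16 bits of external_attr.
    return bool(stat.S_ISLNK(info.external_attr >> 16))
```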
# ---------------------------------------------------------------------------
# should_ignore_archive_entry
# ---------------------------------------------------------------------------


class TestShouldIgnoreArchiveEntry:
    def test_macosx_ignored(self):
        assert should_ignore_archive_entry(Path("__MACOSX")) is True

    def test_dotfile_ignored(self):
        assert should_ignore_archive_entry(Path(".DS_Store")) is True

    def test_normal_dir_not_ignored(self):
        assert should_ignore_archive_entry(Path("my-skill")) is False


# ---------------------------------------------------------------------------
# resolve_skill_dir_from_archive
# ---------------------------------------------------------------------------


class TestResolveSkillDir:
    def test_single_dir(self, tmp_path):
        (tmp_path / "my-skill").mkdir()
        (tmp_path / "my-skill" / "SKILL.md").write_text("content")
        assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "my-skill"

    def test_with_macosx(self, tmp_path):
        (tmp_path / "my-skill").mkdir()
        (tmp_path / "my-skill" / "SKILL.md").write_text("content")
        (tmp_path / "__MACOSX").mkdir()
        assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "my-skill"

    def test_empty_after_filter(self, tmp_path):
        (tmp_path / "__MACOSX").mkdir()
        (tmp_path / ".DS_Store").write_text("meta")
        with pytest.raises(ValueError, match="empty"):
            resolve_skill_dir_from_archive(tmp_path)


# ---------------------------------------------------------------------------
# safe_extract_skill_archive
# ---------------------------------------------------------------------------


class TestSafeExtract:
    def _make_zip(self, tmp_path, members: dict[str, str | bytes]) -> Path:
        """Create a zip with given filename->content entries."""
        zip_path = tmp_path / "test.zip"
        with zipfile.ZipFile(zip_path, "w") as zf:
            for name, content in members.items():
                if isinstance(content, str):
                    content = content.encode()
                zf.writestr(name, content)
        return zip_path

    def test_rejects_zip_bomb(self, tmp_path):
        zip_path = self._make_zip(tmp_path, {"big.txt": "x" * 1000})
        dest = tmp_path / "out"
        dest.mkdir()
        with zipfile.ZipFile(zip_path) as zf:
            with pytest.raises(ValueError, match="too large"):
                safe_extract_skill_archive(zf, dest, max_total_size=100)

    def test_rejects_absolute_path(self, tmp_path):
        zip_path = tmp_path / "abs.zip"
        with zipfile.ZipFile(zip_path, "w") as zf:
            zf.writestr("/etc/passwd", "root:x:0:0")
        dest = tmp_path / "out"
        dest.mkdir()
        with zipfile.ZipFile(zip_path) as zf:
            with pytest.raises(ValueError, match="unsafe"):
                safe_extract_skill_archive(zf, dest)

    def test_skips_symlinks(self, tmp_path):
        zip_path = tmp_path / "sym.zip"
        with zipfile.ZipFile(zip_path, "w") as zf:
            info = zipfile.ZipInfo("link.txt")
            info.external_attr = (stat.S_IFLNK | 0o777) << 16
            zf.writestr(info, "/etc/passwd")
            zf.writestr("normal.txt", "hello")
        dest = tmp_path / "out"
        dest.mkdir()
        with zipfile.ZipFile(zip_path) as zf:
            safe_extract_skill_archive(zf, dest)
        assert (dest / "normal.txt").exists()
        assert not (dest / "link.txt").exists()

    def test_normal_archive(self, tmp_path):
        zip_path = self._make_zip(tmp_path, {
            "my-skill/SKILL.md": "---\nname: test\ndescription: x\n---\n# Test",
            "my-skill/README.md": "readme",
        })
        dest = tmp_path / "out"
        dest.mkdir()
        with zipfile.ZipFile(zip_path) as zf:
            safe_extract_skill_archive(zf, dest)
        assert (dest / "my-skill" / "SKILL.md").exists()
        assert (dest / "my-skill" / "README.md").exists()


# ---------------------------------------------------------------------------
# install_skill_from_archive (full integration)
# ---------------------------------------------------------------------------


class TestInstallSkillFromArchive:
    def _make_skill_zip(self, tmp_path: Path, skill_name: str = "test-skill") -> Path:
        """Create a valid .skill archive."""
        zip_path = tmp_path / f"{skill_name}.skill"
        with zipfile.ZipFile(zip_path, "w") as zf:
            zf.writestr(
                f"{skill_name}/SKILL.md",
                f"---\nname: {skill_name}\ndescription: A test skill\n---\n\n# {skill_name}\n",
            )
        return zip_path

    def test_success(self, tmp_path):
        zip_path = self._make_skill_zip(tmp_path)
        skills_root = tmp_path / "skills"
        skills_root.mkdir()
        result = install_skill_from_archive(zip_path, skills_root=skills_root)
        assert result["success"] is True
        assert result["skill_name"] == "test-skill"
        assert (skills_root / "custom" / "test-skill" / "SKILL.md").exists()

    def test_duplicate_raises(self, tmp_path):
        zip_path = self._make_skill_zip(tmp_path)
        skills_root = tmp_path / "skills"
        (skills_root / "custom" / "test-skill").mkdir(parents=True)
        with pytest.raises(ValueError, match="already exists"):
            install_skill_from_archive(zip_path, skills_root=skills_root)

    def test_invalid_extension(self, tmp_path):
        bad_path = tmp_path / "bad.zip"
        bad_path.write_text("not a skill")
        with pytest.raises(ValueError, match=".skill"):
            install_skill_from_archive(bad_path)

    def test_bad_frontmatter(self, tmp_path):
        zip_path = tmp_path / "bad.skill"
        with zipfile.ZipFile(zip_path, "w") as zf:
            zf.writestr("bad/SKILL.md", "no frontmatter here")
        skills_root = tmp_path / "skills"
        skills_root.mkdir()
        with pytest.raises(ValueError, match="Invalid skill"):
            install_skill_from_archive(zip_path, skills_root=skills_root)

    def test_nonexistent_file(self):
        with pytest.raises(FileNotFoundError):
            install_skill_from_archive(Path("/nonexistent/path.skill"))

    def test_macosx_filtered_during_resolve(self, tmp_path):
        """Archive with __MACOSX dir still installs correctly."""
        zip_path = tmp_path / "mac.skill"
        with zipfile.ZipFile(zip_path, "w") as zf:
            zf.writestr(
                "my-skill/SKILL.md",
                "---\nname: my-skill\ndescription: desc\n---\n# My Skill\n",
            )
            zf.writestr("__MACOSX/._my-skill", "meta")
        skills_root = tmp_path / "skills"
        skills_root.mkdir()
        result = install_skill_from_archive(zip_path, skills_root=skills_root)
        assert result["success"] is True
        assert result["skill_name"] == "my-skill"
backend/tests/test_uploads_manager.py (new file, 146 lines)
@@ -0,0 +1,146 @@
"""Tests for deerflow.uploads.manager — shared upload management logic."""
|
||||
|
||||
import pytest
|
||||
|
||||
from deerflow.uploads.manager import (
|
||||
PathTraversalError,
|
||||
claim_unique_filename,
|
||||
delete_file_safe,
|
||||
list_files_in_dir,
|
||||
normalize_filename,
|
||||
validate_path_traversal,
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# normalize_filename
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestNormalizeFilename:
|
||||
def test_safe_filename(self):
|
||||
assert normalize_filename("report.pdf") == "report.pdf"
|
||||
|
||||
def test_strips_path_components(self):
|
||||
assert normalize_filename("../../etc/passwd") == "passwd"
|
||||
|
||||
def test_rejects_empty(self):
|
||||
with pytest.raises(ValueError, match="empty"):
|
||||
normalize_filename("")
|
||||
|
||||
def test_rejects_dot_dot(self):
|
||||
with pytest.raises(ValueError, match="unsafe"):
|
||||
normalize_filename("..")
|
||||
|
||||
def test_strips_separators(self):
|
||||
assert normalize_filename("path/to/file.txt") == "file.txt"
|
||||
|
||||
def test_dot_only(self):
|
||||
with pytest.raises(ValueError, match="unsafe"):
|
||||
normalize_filename(".")
|
||||
|
||||
|
||||
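A sketch consistent with these cases, and with the PR note that backslash-containing filenames are rejected. The real `normalize_filename` may differ in details such as exact error messages:

```python
from pathlib import PurePosixPath


def normalize_filename_sketch(raw: str) -> str:
    """Reduce an untrusted filename to a safe basename, or raise ValueError."""
    if not raw:
        raise ValueError("filename is empty")
    if "\\" in raw:
        # Backslashes are ambiguous across platforms; reject outright.
        raise ValueError("unsafe filename")
    name = PurePosixPath(raw).name  # strips any directory components
    if name in ("", ".", ".."):
        raise ValueError("unsafe filename")
    return name
```

Note `PurePosixPath("..").name` is the empty string, so both `"."` and `".."` fall through to the `"unsafe"` branch rather than surviving as names.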
# ---------------------------------------------------------------------------
# claim_unique_filename
# ---------------------------------------------------------------------------


class TestDeduplicateFilename:
    def test_no_collision(self):
        seen: set[str] = set()
        assert claim_unique_filename("data.txt", seen) == "data.txt"
        assert "data.txt" in seen

    def test_single_collision(self):
        seen = {"data.txt"}
        assert claim_unique_filename("data.txt", seen) == "data_1.txt"
        assert "data_1.txt" in seen

    def test_triple_collision(self):
        seen = {"data.txt", "data_1.txt", "data_2.txt"}
        assert claim_unique_filename("data.txt", seen) == "data_3.txt"
        assert "data_3.txt" in seen

    def test_mutates_seen(self):
        seen: set[str] = set()
        claim_unique_filename("a.txt", seen)
        claim_unique_filename("a.txt", seen)
        assert seen == {"a.txt", "a_1.txt"}

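The contract these tests fix — append `_1`, `_2`, … before the extension until the name is free, and record the claimed name in `seen` — can be sketched as (not the module's actual implementation):

```python
from pathlib import PurePosixPath


def claim_unique_filename_sketch(filename: str, seen: set[str]) -> str:
    """Return filename, or filename with _N inserted before the suffix,
    whichever is not yet in `seen`; mutate `seen` to claim the result."""
    if filename not in seen:
        seen.add(filename)
        return filename
    p = PurePosixPath(filename)
    i = 1
    while f"{p.stem}_{i}{p.suffix}" in seen:
        i += 1
    candidate = f"{p.stem}_{i}{p.suffix}"
    seen.add(candidate)
    return candidate
```

Mutating `seen` in place is what lets a single upload batch deduplicate against both existing files and earlier files in the same batch.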
# ---------------------------------------------------------------------------
# validate_path_traversal
# ---------------------------------------------------------------------------


class TestValidatePathTraversal:
    def test_inside_base_ok(self, tmp_path):
        child = tmp_path / "file.txt"
        child.touch()
        validate_path_traversal(child, tmp_path)  # no exception

    def test_outside_base_raises(self, tmp_path):
        outside = tmp_path / ".." / "evil.txt"
        with pytest.raises(PathTraversalError, match="traversal"):
            validate_path_traversal(outside, tmp_path)

    def test_symlink_escape(self, tmp_path):
        target = tmp_path.parent / "secret.txt"
        target.touch()
        link = tmp_path / "escape"
        link.symlink_to(target)
        with pytest.raises(PathTraversalError, match="traversal"):
            validate_path_traversal(link, tmp_path)

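Because the symlink-escape case must be caught, a string-prefix check is not enough; the check has to resolve symlinks first. A sketch under that assumption (the real module's `PathTraversalError` and message may differ):

```python
from pathlib import Path


class PathTraversalErrorSketch(ValueError):
    pass


def validate_path_traversal_sketch(candidate: Path, base: Path) -> None:
    # resolve() follows symlinks, so a link pointing outside `base` is
    # caught just like a literal ".." component.
    resolved = candidate.resolve()
    if not resolved.is_relative_to(base.resolve()):
        raise PathTraversalErrorSketch(f"path traversal detected: {candidate}")
```

`Path.is_relative_to` requires Python 3.9+; resolving `base` as well keeps the comparison stable when the base directory itself sits behind a symlink (e.g. `/tmp` on macOS).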
# ---------------------------------------------------------------------------
# list_files_in_dir
# ---------------------------------------------------------------------------


class TestListFilesInDir:
    def test_empty_dir(self, tmp_path):
        result = list_files_in_dir(tmp_path)
        assert result == {"files": [], "count": 0}

    def test_nonexistent_dir(self, tmp_path):
        result = list_files_in_dir(tmp_path / "nope")
        assert result == {"files": [], "count": 0}

    def test_multiple_files_sorted(self, tmp_path):
        (tmp_path / "b.txt").write_text("b")
        (tmp_path / "a.txt").write_text("a")
        result = list_files_in_dir(tmp_path)
        assert result["count"] == 2
        assert result["files"][0]["filename"] == "a.txt"
        assert result["files"][1]["filename"] == "b.txt"
        for f in result["files"]:
            assert set(f.keys()) == {"filename", "size", "path", "extension", "modified"}

    def test_ignores_subdirectories(self, tmp_path):
        (tmp_path / "file.txt").write_text("data")
        (tmp_path / "subdir").mkdir()
        result = list_files_in_dir(tmp_path)
        assert result["count"] == 1
        assert result["files"][0]["filename"] == "file.txt"


# ---------------------------------------------------------------------------
# delete_file_safe
# ---------------------------------------------------------------------------


class TestDeleteFileSafe:
    def test_delete_existing_file(self, tmp_path):
        f = tmp_path / "test.txt"
        f.write_text("data")
        result = delete_file_safe(tmp_path, "test.txt")
        assert result["success"] is True
        assert not f.exists()

    def test_delete_nonexistent_raises(self, tmp_path):
        with pytest.raises(FileNotFoundError):
            delete_file_safe(tmp_path, "nope.txt")

    def test_delete_traversal_raises(self, tmp_path):
        with pytest.raises(PathTraversalError, match="traversal"):
            delete_file_safe(tmp_path, "../outside.txt")

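The delete path composes the earlier pieces: validate containment first, then require the file to exist, then unlink. A sketch of that ordering (the real module raises its own `PathTraversalError`; here a plain `ValueError` with a "traversal" message stands in):

```python
from pathlib import Path


def delete_file_safe_sketch(base_dir: Path, filename: str) -> dict:
    """Delete base_dir/filename after containment and existence checks."""
    target = (base_dir / filename).resolve()
    # Containment check runs first, so "../outside.txt" never reaches
    # the existence check (and never leaks whether the target exists).
    if not target.is_relative_to(base_dir.resolve()):
        raise ValueError(f"path traversal detected: {filename}")
    if not target.is_file():
        raise FileNotFoundError(filename)
    target.unlink()
    return {"success": True, "filename": filename}
```

Checking traversal before existence matters: reversing the order would report `FileNotFoundError` for out-of-base paths and reveal filesystem layout outside the uploads directory.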
@@ -19,6 +19,7 @@ def test_upload_files_writes_thread_storage_and_skips_local_sandbox_sync(tmp_pat

     with (
         patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
+        patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
         patch.object(uploads, "get_sandbox_provider", return_value=provider),
     ):
         file = UploadFile(filename="notes.txt", file=BytesIO(b"hello uploads"))
@@ -48,6 +49,7 @@ def test_upload_files_syncs_non_local_sandbox_and_marks_markdown_file(tmp_path):

     with (
         patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
+        patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
         patch.object(uploads, "get_sandbox_provider", return_value=provider),
         patch.object(uploads, "convert_file_to_markdown", AsyncMock(side_effect=fake_convert)),
     ):
@@ -78,6 +80,7 @@ def test_upload_files_rejects_dotdot_and_dot_filenames(tmp_path):

     with (
         patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
+        patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
         patch.object(uploads, "get_sandbox_provider", return_value=provider),
     ):
         # These filenames must be rejected outright