refactor: extract shared skill installer and upload manager to harness (#1202)

* refactor: extract shared skill installer and upload manager to harness

Move business logic that was duplicated between the Gateway routers and
the Client into shared harness modules.

New shared modules:
- deerflow.skills.installer: 6 functions (zip security, extraction, install)
- deerflow.uploads.manager: 7 functions (normalize, deduplicate, validate,
  list, delete, get_uploads_dir, ensure_uploads_dir)

Key improvements:
- SkillAlreadyExistsError replaces stringly-typed 409 status routing
- normalize_filename rejects backslash-containing filenames
- Read paths (list/delete) no longer mkdir via get_uploads_dir
- Write paths use ensure_uploads_dir for explicit directory creation
- list_files_in_dir does stat inside scandir context (no re-stat)
- install_skill_from_archive uses single is_file() check (one syscall)
- Fix agent config key not reset on update_mcp_config/update_skill

Tests: 42 new (22 installer + 20 upload manager) + client hardening
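The SkillAlreadyExistsError change above can be sketched as follows. This is an illustrative reconstruction, not the actual implementation: the exception name comes from the commit message, but the function bodies and the status-mapping helper are assumptions.

```python
class SkillAlreadyExistsError(ValueError):
    """Raised when the target skill directory already exists."""


def install_skill(name: str, installed: set[str]) -> None:
    # Previously the router matched on the error message string to decide
    # whether to return 409; a dedicated exception type lets it route on
    # the exception class instead.
    if name in installed:
        raise SkillAlreadyExistsError(f"Skill '{name}' already exists")
    installed.add(name)


def status_for(exc: Exception) -> int:
    # Hypothetical router-side mapping from exception type to HTTP status.
    if isinstance(exc, SkillAlreadyExistsError):
        return 409
    if isinstance(exc, ValueError):
        return 400
    return 500
```

Routing on the type rather than the message means a reworded error string can no longer silently turn a 409 into a 500.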

* refactor: centralize upload URL construction and clean up installer

- Extract upload_virtual_path(), upload_artifact_url(), enrich_file_listing()
  into shared manager.py, eliminating 6 duplicated URL constructions across
  Gateway router and Client
- Derive all upload URLs from VIRTUAL_PATH_PREFIX constant instead of
  hardcoded "mnt/user-data/uploads" strings
- Eliminate TOCTOU pre-checks and double file read in installer — single
  ZipFile() open with exception handling replaces is_file() + is_zipfile()
  + ZipFile() sequence
- Add missing re-exports: ensure_uploads_dir in uploads/__init__.py,
  SkillAlreadyExistsError in skills/__init__.py
- Remove redundant .lower() on already-lowercase CONVERTIBLE_EXTENSIONS
- Hoist sandbox_uploads_dir(thread_id) before loop in uploads router
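The TOCTOU elimination above amounts to opening the archive once and translating the failures, rather than pre-checking. A minimal sketch, with error messages assumed for illustration:

```python
import zipfile
from pathlib import Path


def open_skill_archive(path: Path) -> zipfile.ZipFile:
    # One open attempt replaces the is_file() + is_zipfile() + ZipFile()
    # sequence: the filesystem cannot change between check and use, and
    # the file is read once instead of twice.
    try:
        return zipfile.ZipFile(path)
    except zipfile.BadZipFile as exc:
        raise ValueError(f"{path} is not a valid ZIP archive") from exc
    except OSError as exc:
        # Covers missing files and directories passed as the archive path.
        raise ValueError(f"{path} is not a file") from exc
```

The exception-translation approach also keeps the caller's contract stable: every rejection surfaces as a ValueError regardless of which OS-level error triggered it.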

* fix: add input validation for thread_id and filename length

- Reject thread_id containing unsafe filesystem characters (only allow
  alphanumeric, hyphens, underscores, dots) — prevents 500 on inputs
  like <script> or shell metacharacters
- Reject filenames longer than 255 bytes (OS limit) in normalize_filename
- Gateway upload router maps ValueError to 400 for invalid thread_id
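The validation rules above can be sketched like this. Function names follow the commit message; the exact regex and the basename-stripping behavior are assumptions for illustration.

```python
import re

_THREAD_ID_RE = re.compile(r"[A-Za-z0-9._-]+")
_MAX_FILENAME_BYTES = 255  # typical OS filename limit


def validate_thread_id(thread_id: str) -> None:
    # Only alphanumerics, hyphens, underscores, and dots: rejects
    # <script>, shell metacharacters, and path separators up front,
    # so they become a 400 instead of a 500 deeper in the stack.
    if not _THREAD_ID_RE.fullmatch(thread_id):
        raise ValueError(f"Invalid thread_id: {thread_id!r}")


def normalize_filename(filename: str) -> str:
    if "\\" in filename:
        raise ValueError("Backslash not allowed in filename")
    name = filename.rsplit("/", 1)[-1]  # keep the basename only
    # Check encoded bytes, not characters: multibyte UTF-8 names can
    # exceed the OS limit with far fewer than 255 characters.
    if len(name.encode("utf-8")) > _MAX_FILENAME_BYTES:
        raise ValueError("Filename exceeds 255 bytes")
    return name
```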

* fix: address PR review — symlink safety, input validation coverage, error ordering

- list_files_in_dir: use follow_symlinks=False to prevent symlink metadata
  leakage; check is_dir() instead of exists() for non-directory paths
- install_skill_from_archive: restore is_file() pre-check before extension
  validation so error messages match the documented exception contract
- validate_thread_id: move from ensure_uploads_dir to get_uploads_dir so
  all entry points (upload/list/delete) are protected
- delete_uploaded_file: catch ValueError from thread_id validation (was 500)
- requires_llm marker: also skip when OPENAI_API_KEY is unset
- e2e fixture: update TitleMiddleware exclusion comment (kept filtering —
  middleware triggers extra LLM calls that add non-determinism to tests)
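The list_files_in_dir changes above (stat inside the scandir context, follow_symlinks=False, is_dir() instead of exists()) can be sketched as a standalone function; the dict shape is assumed from the surrounding test code:

```python
import os


def list_files_in_dir(path: str) -> list[dict]:
    # is_dir() (not exists()) so a regular file at `path` is treated the
    # same as a missing directory.
    if not os.path.isdir(path):
        return []
    results = []
    with os.scandir(path) as it:
        for entry in it:
            # follow_symlinks=False reports the link entry itself rather
            # than leaking the metadata of whatever it points at.
            if entry.is_file(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                results.append({"filename": entry.name, "size": st.st_size})
    return sorted(results, key=lambda f: f["filename"])
```

Doing the stat while the scandir handle is still open reuses the directory-entry data the OS already returned, avoiding a second stat syscall per file on most platforms.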

* chore: revert uv.lock to main — no dependency changes in this PR

* fix: use monkeypatch for global config in e2e fixture to prevent test pollution

The e2e_env fixture was calling set_title_config() and
set_summarization_config() directly, which mutated global singletons
without automatic cleanup. When pytest ran test_client_e2e.py before
test_title_middleware_core_logic.py, the leaked enabled=False caused
5 title tests to fail in CI.

Switched to monkeypatch.setattr on the module-level private variables
so pytest restores the originals after each test.
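A minimal sketch of the pattern, assuming a module that keeps its config in a module-level private variable; the module and attribute names here are stand-ins, not the real ones:

```python
import types

from pytest import MonkeyPatch

# Stand-in for the config module holding a private singleton.
title_module = types.SimpleNamespace(_title_config={"enabled": True})


def simulate_e2e_test() -> None:
    mp = MonkeyPatch()
    try:
        # setattr records the original value; undo() restores it. This is
        # exactly what pytest's monkeypatch fixture does automatically
        # after each test, so enabled=False cannot leak into test modules
        # that run later in the same session.
        mp.setattr(title_module, "_title_config", {"enabled": False})
        assert title_module._title_config["enabled"] is False
    finally:
        mp.undo()
```

In a real fixture the try/finally is unnecessary: accepting `monkeypatch` as a fixture argument gives the same record-and-restore behavior for free.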

* fix: address code review — URL encoding, API consistency, test isolation

- upload_artifact_url: percent-encode filename to handle spaces/#/?
- deduplicate_filename: mutate seen set in place (caller no longer
  needs manual .add() — less error-prone API)
- list_files_in_dir: document that size is int, enrich stringifies
- e2e fixture: monkeypatch _app_config instead of set_app_config()
  to prevent global singleton pollution (same pattern as title/summarization fix)
- _make_e2e_config: read LLM connection details from env vars so
  external contributors can override defaults
- Update tests to match new deduplicate_filename contract
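The URL-encoding change above can be sketched with the shared helpers; the route shape (`/api/threads/...`) is an assumption for illustration, while the prefix value follows the earlier commit text:

```python
from urllib.parse import quote

VIRTUAL_PATH_PREFIX = "mnt/user-data/uploads"


def upload_virtual_path(filename: str) -> str:
    return f"{VIRTUAL_PATH_PREFIX}/{filename}"


def upload_artifact_url(thread_id: str, filename: str) -> str:
    # Percent-encode the filename so spaces, '#', and '?' survive as path
    # characters instead of being parsed as fragment/query delimiters.
    # quote() leaves '/' unescaped by default, so the prefix stays intact.
    return f"/api/threads/{thread_id}/artifacts/{quote(upload_virtual_path(filename))}"
```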

* docs: rewrite RFC in English and add alternatives/breaking changes sections

* fix: address code review feedback on PR #1202

- Rename deduplicate_filename to claim_unique_filename to make
  the in-place set mutation explicit in the function name
- Replace PermissionError with PathTraversalError(ValueError) for
  path traversal detection — malformed input is 400, not 403
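Both review fixes above can be sketched together. The names come from the commit message; the suffix-numbering logic is an illustrative reconstruction consistent with the `data.txt → data_1.txt` behavior described later in this test file:

```python
class PathTraversalError(ValueError):
    """Malformed client input: maps to 400, unlike PermissionError's 403."""


def claim_unique_filename(filename: str, seen: set[str]) -> str:
    # The rename makes the side effect explicit: this function both picks
    # a free name and records it in `seen`, so callers no longer need a
    # manual seen.add() afterwards.
    if filename not in seen:
        seen.add(filename)
        return filename
    stem, dot, ext = filename.rpartition(".")
    i = 1
    while True:
        candidate = f"{stem}_{i}.{ext}" if dot else f"{filename}_{i}"
        if candidate not in seen:
            seen.add(candidate)
            return candidate
        i += 1
```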

* fix: set _app_config_is_custom in e2e test fixture to prevent config.yaml lookup in CI

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
This commit is contained in:
greatmengqi
2026-03-25 16:28:33 +08:00
committed by GitHub
parent ec46ae075d
commit b8bc80d89b
14 changed files with 2591 additions and 567 deletions


@@ -9,7 +9,7 @@ from pathlib import Path
from unittest.mock import MagicMock, patch
import pytest
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage # noqa: F401
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage # noqa: F401
from app.gateway.routers.mcp import McpConfigResponse
from app.gateway.routers.memory import MemoryConfigResponse, MemoryStatusResponse
@@ -17,6 +17,8 @@ from app.gateway.routers.models import ModelResponse, ModelsListResponse
from app.gateway.routers.skills import SkillInstallResponse, SkillResponse, SkillsListResponse
from app.gateway.routers.uploads import UploadResponse
from deerflow.client import DeerFlowClient
from deerflow.config.paths import Paths
from deerflow.uploads.manager import PathTraversalError
# ---------------------------------------------------------------------------
# Fixtures
@@ -609,10 +611,7 @@ class TestSkillsManagement:
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)
with (
patch("deerflow.skills.loader.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.validation._validate_skill_frontmatter", return_value=(True, "OK", "my-skill")),
):
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
result = client.install_skill(archive_path)
assert result["success"] is True
@@ -700,7 +699,7 @@ class TestUploads:
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("thread-1", [src_file])
assert result["success"] is True
@@ -756,7 +755,7 @@ class TestUploads:
return client.upload_files("thread-async", [first, second])
with (
patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir),
patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
patch("deerflow.utils.file_conversion.CONVERTIBLE_EXTENSIONS", {".pdf"}),
patch("deerflow.utils.file_conversion.convert_file_to_markdown", side_effect=fake_convert),
patch("concurrent.futures.ThreadPoolExecutor", FakeExecutor),
@@ -777,7 +776,7 @@ class TestUploads:
(uploads_dir / "a.txt").write_text("a")
(uploads_dir / "b.txt").write_text("bb")
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.list_uploads("thread-1")
assert result["count"] == 2
@@ -793,7 +792,7 @@ class TestUploads:
uploads_dir = Path(tmp)
(uploads_dir / "delete-me.txt").write_text("gone")
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.delete_upload("thread-1", "delete-me.txt")
assert result["success"] is True
@@ -802,15 +801,15 @@ class TestUploads:
def test_delete_upload_not_found(self, client):
with tempfile.TemporaryDirectory() as tmp:
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=Path(tmp)):
with patch("deerflow.client.get_uploads_dir", return_value=Path(tmp)):
with pytest.raises(FileNotFoundError):
client.delete_upload("thread-1", "nope.txt")
def test_delete_upload_path_traversal(self, client):
with tempfile.TemporaryDirectory() as tmp:
uploads_dir = Path(tmp)
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with pytest.raises(PermissionError):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
with pytest.raises(PathTraversalError):
client.delete_upload("thread-1", "../../etc/passwd")
@@ -822,15 +821,12 @@ class TestUploads:
class TestArtifacts:
def test_get_artifact(self, client):
with tempfile.TemporaryDirectory() as tmp:
user_data_dir = Path(tmp) / "user-data"
outputs = user_data_dir / "outputs"
paths = Paths(base_dir=tmp)
outputs = paths.sandbox_outputs_dir("t1")
outputs.mkdir(parents=True)
(outputs / "result.txt").write_text("artifact content")
mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.return_value = user_data_dir
with patch("deerflow.client.get_paths", return_value=mock_paths):
with patch("deerflow.client.get_paths", return_value=paths):
content, mime = client.get_artifact("t1", "mnt/user-data/outputs/result.txt")
assert content == b"artifact content"
@@ -838,13 +834,10 @@ class TestArtifacts:
def test_get_artifact_not_found(self, client):
with tempfile.TemporaryDirectory() as tmp:
user_data_dir = Path(tmp) / "user-data"
user_data_dir.mkdir()
paths = Paths(base_dir=tmp)
paths.sandbox_user_data_dir("t1").mkdir(parents=True)
mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.return_value = user_data_dir
with patch("deerflow.client.get_paths", return_value=mock_paths):
with patch("deerflow.client.get_paths", return_value=paths):
with pytest.raises(FileNotFoundError):
client.get_artifact("t1", "mnt/user-data/outputs/nope.txt")
@@ -854,14 +847,11 @@ class TestArtifacts:
def test_get_artifact_path_traversal(self, client):
with tempfile.TemporaryDirectory() as tmp:
user_data_dir = Path(tmp) / "user-data"
user_data_dir.mkdir()
paths = Paths(base_dir=tmp)
paths.sandbox_user_data_dir("t1").mkdir(parents=True)
mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.return_value = user_data_dir
with patch("deerflow.client.get_paths", return_value=mock_paths):
with pytest.raises(PermissionError):
with patch("deerflow.client.get_paths", return_value=paths):
with pytest.raises(PathTraversalError):
client.get_artifact("t1", "mnt/user-data/../../../etc/passwd")
@@ -1013,7 +1003,7 @@ class TestScenarioFileLifecycle:
(tmp_path / "report.txt").write_text("quarterly report data")
(tmp_path / "data.csv").write_text("a,b,c\n1,2,3")
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
# Step 1: Upload
result = client.upload_files(
"t-lifecycle",
@@ -1046,15 +1036,16 @@ class TestScenarioFileLifecycle:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
user_data_dir = tmp_path / "user-data"
outputs_dir = user_data_dir / "outputs"
paths = Paths(base_dir=tmp_path)
outputs_dir = paths.sandbox_outputs_dir("t-artifact")
outputs_dir.mkdir(parents=True)
# Upload phase
src_file = tmp_path / "input.txt"
src_file.write_text("raw data to process")
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
uploaded = client.upload_files("t-artifact", [src_file])
assert len(uploaded["files"]) == 1
@@ -1062,10 +1053,7 @@ class TestScenarioFileLifecycle:
(outputs_dir / "analysis.json").write_text('{"result": "processed"}')
# Retrieve artifact
mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.return_value = user_data_dir
with patch("deerflow.client.get_paths", return_value=mock_paths):
with patch("deerflow.client.get_paths", return_value=paths):
content, mime = client.get_artifact("t-artifact", "mnt/user-data/outputs/analysis.json")
assert json.loads(content) == {"result": "processed"}
@@ -1286,7 +1274,7 @@ class TestScenarioThreadIsolation:
def get_dir(thread_id):
return uploads_a if thread_id == "thread-a" else uploads_b
with patch.object(DeerFlowClient, "_get_uploads_dir", side_effect=get_dir):
with patch("deerflow.client.get_uploads_dir", side_effect=get_dir), patch("deerflow.client.ensure_uploads_dir", side_effect=get_dir):
client.upload_files("thread-a", [src_file])
files_a = client.list_uploads("thread-a")
@@ -1298,18 +1286,13 @@ class TestScenarioThreadIsolation:
def test_artifacts_isolated_per_thread(self, client):
"""Artifacts in thread-A are not accessible from thread-B."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
paths = Paths(base_dir=tmp)
outputs_a = paths.sandbox_outputs_dir("thread-a")
outputs_a.mkdir(parents=True)
paths.sandbox_user_data_dir("thread-b").mkdir(parents=True)
(outputs_a / "result.txt").write_text("thread-a artifact")
data_a = tmp_path / "thread-a"
data_b = tmp_path / "thread-b"
(data_a / "outputs").mkdir(parents=True)
(data_b / "outputs").mkdir(parents=True)
(data_a / "outputs" / "result.txt").write_text("thread-a artifact")
mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.side_effect = lambda tid: data_a if tid == "thread-a" else data_b
with patch("deerflow.client.get_paths", return_value=mock_paths):
with patch("deerflow.client.get_paths", return_value=paths):
content, _ = client.get_artifact("thread-a", "mnt/user-data/outputs/result.txt")
assert content == b"thread-a artifact"
@@ -1377,10 +1360,7 @@ class TestScenarioSkillInstallAndUse:
(skills_root / "custom").mkdir(parents=True)
# Step 1: Install
with (
patch("deerflow.skills.loader.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.validation._validate_skill_frontmatter", return_value=(True, "OK", "my-analyzer")),
):
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
result = client.install_skill(archive)
assert result["success"] is True
assert (skills_root / "custom" / "my-analyzer" / "SKILL.md").exists()
@@ -1512,7 +1492,7 @@ class TestScenarioEdgeCases:
pdf_file.write_bytes(b"%PDF-1.4 fake content")
with (
patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir),
patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
patch("deerflow.utils.file_conversion.CONVERTIBLE_EXTENSIONS", {".pdf"}),
patch("deerflow.utils.file_conversion.convert_file_to_markdown", side_effect=Exception("conversion failed")),
):
@@ -1614,9 +1594,7 @@ class TestGatewayConformance:
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "my-skill/SKILL.md")
custom_dir = tmp_path / "custom"
custom_dir.mkdir()
with patch("deerflow.skills.loader.get_skills_root_path", return_value=tmp_path):
with patch("deerflow.skills.installer.get_skills_root_path", return_value=tmp_path):
result = client.install_skill(archive)
parsed = SkillInstallResponse(**result)
@@ -1680,7 +1658,7 @@ class TestGatewayConformance:
src_file = tmp_path / "hello.txt"
src_file.write_text("hello")
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("t-conform", [src_file])
parsed = UploadResponse(**result)
@@ -1739,3 +1717,694 @@ class TestGatewayConformance:
parsed = MemoryStatusResponse(**result)
assert parsed.config.enabled is True
assert parsed.data.version == "1.0"
# ===========================================================================
# Hardening — install_skill security gates
# ===========================================================================
class TestInstallSkillSecurity:
"""Every security gate in install_skill() must have a red-line test."""
def test_zip_bomb_rejected(self, client):
"""Archives whose extracted size exceeds the limit are rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "bomb.skill"
# Create a small archive that claims huge uncompressed size.
# Write 200 bytes but the safe_extract checks cumulative file_size.
data = b"\x00" * 200
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
zf.writestr("big.bin", data)
skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)
# Patch max_total_size to a small value to trigger the bomb check.
from deerflow.skills import installer as _installer
orig = _installer.safe_extract_skill_archive
def patched_extract(zf, dest, max_total_size=100):
return orig(zf, dest, max_total_size=100)
with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer.safe_extract_skill_archive", side_effect=patched_extract),
):
with pytest.raises(ValueError, match="too large"):
client.install_skill(archive)
def test_absolute_path_in_archive_rejected(self, client):
"""ZIP entries with absolute paths are rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "abs.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.writestr("/etc/passwd", "root:x:0:0")
skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
with pytest.raises(ValueError, match="unsafe"):
client.install_skill(archive)
def test_dotdot_path_in_archive_rejected(self, client):
"""ZIP entries with '..' path components are rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "traversal.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.writestr("skill/../../../etc/shadow", "bad")
skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
with pytest.raises(ValueError, match="unsafe"):
client.install_skill(archive)
def test_symlinks_skipped_during_extraction(self, client):
"""Symlink entries in the archive are skipped (never written to disk)."""
import stat as stat_mod
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
archive = tmp_path / "sym-skill.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.writestr("sym-skill/SKILL.md", "---\nname: sym-skill\ndescription: test\n---\nBody")
# Inject a symlink entry via ZipInfo with Unix symlink mode.
link_info = zipfile.ZipInfo("sym-skill/sneaky_link")
link_info.external_attr = (stat_mod.S_IFLNK | 0o777) << 16
zf.writestr(link_info, "/etc/passwd")
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
result = client.install_skill(archive)
assert result["success"] is True
installed = skills_root / "custom" / "sym-skill"
assert (installed / "SKILL.md").exists()
assert not (installed / "sneaky_link").exists()
def test_invalid_skill_name_rejected(self, client):
"""Skill names containing special characters are rejected."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
skill_dir = tmp_path / "bad-name"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("---\nname: ../evil\ndescription: test\n---\n")
archive = tmp_path / "bad.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "bad-name/SKILL.md")
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)
with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(True, "OK", "../evil")),
):
with pytest.raises(ValueError, match="Invalid skill name"):
client.install_skill(archive)
def test_existing_skill_rejected(self, client):
"""Installing a skill that already exists is rejected."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
skill_dir = tmp_path / "dupe-skill"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("---\nname: dupe-skill\ndescription: test\n---\n")
archive = tmp_path / "dupe-skill.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "dupe-skill/SKILL.md")
skills_root = tmp_path / "skills"
(skills_root / "custom" / "dupe-skill").mkdir(parents=True)
with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(True, "OK", "dupe-skill")),
):
with pytest.raises(ValueError, match="already exists"):
client.install_skill(archive)
def test_empty_archive_rejected(self, client):
"""An archive with no entries is rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "empty.skill"
with zipfile.ZipFile(archive, "w"):
pass # empty archive
skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
with pytest.raises(ValueError, match="empty"):
client.install_skill(archive)
def test_invalid_frontmatter_rejected(self, client):
"""Archive with invalid SKILL.md frontmatter is rejected."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
skill_dir = tmp_path / "bad-meta"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("no frontmatter at all")
archive = tmp_path / "bad-meta.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "bad-meta/SKILL.md")
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)
with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(False, "Missing name field", "")),
):
with pytest.raises(ValueError, match="Invalid skill"):
client.install_skill(archive)
def test_not_a_zip_rejected(self, client):
"""A .skill file that is not a valid ZIP is rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "fake.skill"
archive.write_text("this is not a zip file")
with pytest.raises(ValueError, match="not a valid ZIP"):
client.install_skill(archive)
def test_directory_path_rejected(self, client):
"""Passing a directory instead of a file is rejected."""
with tempfile.TemporaryDirectory() as tmp:
with pytest.raises(ValueError, match="not a file"):
client.install_skill(tmp)
# ===========================================================================
# Hardening — _atomic_write_json error paths
# ===========================================================================
class TestAtomicWriteJson:
def test_temp_file_cleaned_on_serialization_failure(self):
"""If json.dump raises, the temp file is removed."""
with tempfile.TemporaryDirectory() as tmp:
target = Path(tmp) / "config.json"
# An object that cannot be serialized to JSON.
bad_data = {"key": object()}
with pytest.raises(TypeError):
DeerFlowClient._atomic_write_json(target, bad_data)
# Target should not have been created.
assert not target.exists()
# No stray .tmp files should remain.
tmp_files = list(Path(tmp).glob("*.tmp"))
assert tmp_files == []
def test_happy_path_writes_atomically(self):
"""Normal write produces correct JSON and no temp files."""
with tempfile.TemporaryDirectory() as tmp:
target = Path(tmp) / "out.json"
data = {"key": "value", "nested": [1, 2, 3]}
DeerFlowClient._atomic_write_json(target, data)
assert target.exists()
with open(target) as f:
loaded = json.load(f)
assert loaded == data
# No temp files left behind.
assert list(Path(tmp).glob("*.tmp")) == []
def test_original_preserved_on_failure(self):
"""If write fails, the original file is not corrupted."""
with tempfile.TemporaryDirectory() as tmp:
target = Path(tmp) / "config.json"
target.write_text('{"original": true}')
bad_data = {"key": object()}
with pytest.raises(TypeError):
DeerFlowClient._atomic_write_json(target, bad_data)
# Original content must survive.
with open(target) as f:
assert json.load(f) == {"original": True}
# ===========================================================================
# Hardening — config update error paths
# ===========================================================================
class TestConfigUpdateErrors:
def test_update_mcp_config_no_config_file(self, client):
"""FileNotFoundError when extensions_config.json cannot be located."""
with patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=None):
with pytest.raises(FileNotFoundError, match="Cannot locate"):
client.update_mcp_config({"server": {}})
def test_update_skill_no_config_file(self, client):
"""FileNotFoundError when extensions_config.json cannot be located."""
skill = MagicMock()
skill.name = "some-skill"
with (
patch("deerflow.skills.loader.load_skills", return_value=[skill]),
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=None),
):
with pytest.raises(FileNotFoundError, match="Cannot locate"):
client.update_skill("some-skill", enabled=False)
def test_update_skill_disappears_after_write(self, client):
"""RuntimeError when skill vanishes between write and re-read."""
skill = MagicMock()
skill.name = "ghost-skill"
ext_config = MagicMock()
ext_config.mcp_servers = {}
ext_config.skills = {}
with tempfile.TemporaryDirectory() as tmp:
config_file = Path(tmp) / "extensions_config.json"
config_file.write_text("{}")
with (
patch("deerflow.skills.loader.load_skills", side_effect=[[skill], []]),
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
patch("deerflow.client.get_extensions_config", return_value=ext_config),
patch("deerflow.client.reload_extensions_config"),
):
with pytest.raises(RuntimeError, match="disappeared"):
client.update_skill("ghost-skill", enabled=False)
# ===========================================================================
# Hardening — stream / chat edge cases
# ===========================================================================
class TestStreamHardening:
def test_agent_exception_propagates(self, client):
"""Exceptions from agent.stream() propagate to caller."""
agent = MagicMock()
agent.stream.side_effect = RuntimeError("model quota exceeded")
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
with pytest.raises(RuntimeError, match="model quota exceeded"):
list(client.stream("hi", thread_id="t-err"))
def test_messages_without_id(self, client):
"""Messages without id attribute are emitted without crashing."""
ai = AIMessage(content="no id here")
# Forcibly remove the id attribute to simulate edge case.
object.__setattr__(ai, "id", None)
chunks = [{"messages": [ai]}]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-noid"))
# Should produce events without error.
assert events[-1].type == "end"
ai_events = _ai_events(events)
assert len(ai_events) == 1
assert ai_events[0].data["content"] == "no id here"
def test_tool_calls_only_no_text(self, client):
"""chat() returns empty string when agent only emits tool calls."""
ai = AIMessage(
content="",
id="ai-1",
tool_calls=[{"name": "bash", "args": {"cmd": "ls"}, "id": "tc-1"}],
)
tool = ToolMessage(content="output", id="tm-1", tool_call_id="tc-1", name="bash")
chunks = [
{"messages": [ai]},
{"messages": [ai, tool]},
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
result = client.chat("do it", thread_id="t-tc-only")
assert result == ""
def test_duplicate_messages_without_id_not_deduplicated(self, client):
"""Messages with id=None are NOT deduplicated (each is emitted)."""
ai1 = AIMessage(content="first")
ai2 = AIMessage(content="second")
object.__setattr__(ai1, "id", None)
object.__setattr__(ai2, "id", None)
chunks = [
{"messages": [ai1]},
{"messages": [ai2]},
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-dup-noid"))
ai_msgs = _ai_events(events)
assert len(ai_msgs) == 2
# ===========================================================================
# Hardening — _serialize_message coverage
# ===========================================================================
class TestSerializeMessage:
def test_system_message(self):
msg = SystemMessage(content="You are a helpful assistant.", id="sys-1")
result = DeerFlowClient._serialize_message(msg)
assert result["type"] == "system"
assert result["content"] == "You are a helpful assistant."
assert result["id"] == "sys-1"
def test_unknown_message_type(self):
"""Non-standard message types serialize as 'unknown'."""
msg = MagicMock()
msg.id = "unk-1"
msg.content = "something"
# Not an instance of AIMessage/ToolMessage/HumanMessage/SystemMessage
type(msg).__name__ = "CustomMessage"
result = DeerFlowClient._serialize_message(msg)
assert result["type"] == "unknown"
assert result["id"] == "unk-1"
def test_ai_message_with_tool_calls(self):
msg = AIMessage(
content="",
id="ai-tc",
tool_calls=[{"name": "bash", "args": {"cmd": "ls"}, "id": "tc-1"}],
)
result = DeerFlowClient._serialize_message(msg)
assert result["type"] == "ai"
assert len(result["tool_calls"]) == 1
assert result["tool_calls"][0]["name"] == "bash"
def test_tool_message_non_string_content(self):
msg = ToolMessage(content={"key": "value"}, id="tm-1", tool_call_id="tc-1", name="tool")
result = DeerFlowClient._serialize_message(msg)
assert result["type"] == "tool"
assert isinstance(result["content"], str)
# ===========================================================================
# Hardening — upload / delete symlink attack
# ===========================================================================
class TestUploadDeleteSymlink:
def test_delete_upload_symlink_outside_dir(self, client):
"""A symlink in uploads dir pointing outside is caught by path traversal check."""
with tempfile.TemporaryDirectory() as tmp:
uploads_dir = Path(tmp) / "uploads"
uploads_dir.mkdir()
# Create a target file outside uploads dir.
outside = Path(tmp) / "secret.txt"
outside.write_text("sensitive data")
# Create a symlink inside uploads dir pointing to outside file.
link = uploads_dir / "harmless.txt"
link.symlink_to(outside)
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
# The resolved path of the symlink escapes uploads_dir,
# so path traversal check should catch it.
with pytest.raises(PathTraversalError):
client.delete_upload("thread-1", "harmless.txt")
# The outside file must NOT have been deleted.
assert outside.exists()
def test_upload_filename_with_spaces_and_unicode(self, client):
"""Files with spaces and unicode characters in names upload correctly."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
weird_name = "report 2024 数据.txt"
src_file = tmp_path / weird_name
src_file.write_text("data")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("thread-1", [src_file])
assert result["success"] is True
assert result["files"][0]["filename"] == weird_name
assert (uploads_dir / weird_name).exists()
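The symlink test above relies on resolving the candidate path before the containment check. A minimal sketch of that pattern (illustrative names only, not the actual deerflow.client implementation):

```python
from pathlib import Path


class PathTraversalError(Exception):
    """Raised when a filename resolves outside the uploads directory."""


def ensure_inside(uploads_dir: Path, filename: str) -> Path:
    """Resolve filename (following symlinks) and require containment."""
    candidate = (uploads_dir / filename).resolve()
    root = uploads_dir.resolve()
    # relative_to raises ValueError when candidate escapes root, which
    # also catches symlinks inside the directory pointing outside it.
    try:
        candidate.relative_to(root)
    except ValueError:
        raise PathTraversalError(f"{filename!r} escapes uploads dir")
    return candidate
```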
# ===========================================================================
# Hardening — artifact edge cases
# ===========================================================================
class TestArtifactHardening:
def test_artifact_directory_rejected(self, client):
"""get_artifact rejects paths that resolve to a directory."""
with tempfile.TemporaryDirectory() as tmp:
paths = Paths(base_dir=tmp)
subdir = paths.sandbox_outputs_dir("t1") / "subdir"
subdir.mkdir(parents=True)
with patch("deerflow.client.get_paths", return_value=paths):
with pytest.raises(ValueError, match="not a file"):
client.get_artifact("t1", "mnt/user-data/outputs/subdir")
def test_artifact_leading_slash_stripped(self, client):
"""Paths with leading slash are handled correctly."""
with tempfile.TemporaryDirectory() as tmp:
paths = Paths(base_dir=tmp)
outputs = paths.sandbox_outputs_dir("t1")
outputs.mkdir(parents=True)
(outputs / "file.txt").write_text("content")
with patch("deerflow.client.get_paths", return_value=paths):
content, _mime = client.get_artifact("t1", "/mnt/user-data/outputs/file.txt")
assert content == b"content"
# ===========================================================================
# BUG DETECTION — tests that expose real bugs in client.py
# ===========================================================================
class TestUploadDuplicateFilenames:
"""Regression: upload_files must auto-rename duplicate basenames.
Previously it silently overwrote the first file with the second,
then reported both in the response while only one existed on disk.
Now duplicates are renamed (data.txt → data_1.txt) and the response
includes original_filename so the agent / caller can see what happened.
"""
def test_duplicate_filenames_auto_renamed(self, client):
"""Two files with same basename → second gets _1 suffix."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
dir_a = tmp_path / "a"
dir_b = tmp_path / "b"
dir_a.mkdir()
dir_b.mkdir()
(dir_a / "data.txt").write_text("version A")
(dir_b / "data.txt").write_text("version B")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("t-dup", [dir_a / "data.txt", dir_b / "data.txt"])
assert result["success"] is True
assert len(result["files"]) == 2
# Both files exist on disk with distinct names.
disk_files = sorted(p.name for p in uploads_dir.iterdir())
assert disk_files == ["data.txt", "data_1.txt"]
# First keeps original name, second is renamed.
assert result["files"][0]["filename"] == "data.txt"
assert "original_filename" not in result["files"][0]
assert result["files"][1]["filename"] == "data_1.txt"
assert result["files"][1]["original_filename"] == "data.txt"
# Content preserved correctly.
assert (uploads_dir / "data.txt").read_text() == "version A"
assert (uploads_dir / "data_1.txt").read_text() == "version B"
def test_triple_duplicate_increments_counter(self, client):
"""Three files with same basename → _1, _2 suffixes."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
for name in ["x", "y", "z"]:
d = tmp_path / name
d.mkdir()
(d / "report.csv").write_text(f"from {name}")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files(
"t-triple",
[tmp_path / "x" / "report.csv", tmp_path / "y" / "report.csv", tmp_path / "z" / "report.csv"],
)
filenames = [f["filename"] for f in result["files"]]
assert filenames == ["report.csv", "report_1.csv", "report_2.csv"]
assert len(list(uploads_dir.iterdir())) == 3
def test_different_filenames_no_rename(self, client):
"""Non-duplicate filenames upload normally without rename."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
(tmp_path / "a.txt").write_text("aaa")
(tmp_path / "b.txt").write_text("bbb")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("t-ok", [tmp_path / "a.txt", tmp_path / "b.txt"])
assert result["success"] is True
assert len(result["files"]) == 2
assert all("original_filename" not in f for f in result["files"])
assert len(list(uploads_dir.iterdir())) == 2
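The renaming behavior these tests lock in (data.txt → data_1.txt → data_2.txt) can be sketched as a counter loop that inserts a suffix before the extension. Hypothetical helper; the real deerflow.uploads.manager code may differ in details:

```python
from pathlib import Path


def dedupe_filename(uploads_dir: Path, filename: str) -> str:
    """Return filename, or a _1/_2/... variant if the name is taken."""
    if not (uploads_dir / filename).exists():
        return filename
    stem, dot, ext = filename.partition(".")
    counter = 1
    while True:
        candidate = f"{stem}_{counter}{dot}{ext}"
        if not (uploads_dir / candidate).exists():
            return candidate
        counter += 1
```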
class TestBugArtifactPrefixMatchTooLoose:
"""Regression: get_artifact must reject paths like ``mnt/user-data-evil/...``.
Previously ``startswith("mnt/user-data")`` matched ``"mnt/user-data-evil"``
because it was a string prefix, not a path-segment check.
"""
def test_non_canonical_prefix_rejected(self, client):
"""Paths that share a string prefix but differ at segment boundary are rejected."""
with pytest.raises(ValueError, match="must start with"):
client.get_artifact("t1", "mnt/user-data-evil/secret.txt")
def test_exact_prefix_without_subpath_accepted(self, client):
"""Bare 'mnt/user-data' is accepted (will later fail as directory, not at prefix)."""
with tempfile.TemporaryDirectory() as tmp:
paths = Paths(base_dir=tmp)
paths.sandbox_user_data_dir("t1").mkdir(parents=True)
with patch("deerflow.client.get_paths", return_value=paths):
# Accepted at prefix check, but fails because it's a directory.
with pytest.raises(ValueError, match="not a file"):
client.get_artifact("t1", "mnt/user-data")
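The segment-boundary check this regression class describes amounts to comparing against the prefix exactly or with a trailing slash; a bare `startswith()` wrongly accepts `mnt/user-data-evil`. A minimal sketch (function name is illustrative; `VIRTUAL_PATH_PREFIX` matches the constant named in the PR description):

```python
VIRTUAL_PATH_PREFIX = "mnt/user-data"


def has_valid_prefix(path: str) -> bool:
    """Accept the prefix itself or any path below it, at a segment boundary."""
    path = path.lstrip("/")  # leading slash is tolerated and stripped
    return path == VIRTUAL_PATH_PREFIX or path.startswith(VIRTUAL_PATH_PREFIX + "/")
```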
class TestBugListUploadsDeadCode:
"""Regression: list_uploads works even when called on a fresh thread
(directory does not exist yet — returns empty without creating it).
"""
def test_list_uploads_on_fresh_thread(self, client):
"""list_uploads on a thread that never had uploads returns empty list."""
with tempfile.TemporaryDirectory() as tmp:
non_existent = Path(tmp) / "does-not-exist" / "uploads"
assert not non_existent.exists()
mock_paths = MagicMock()
mock_paths.sandbox_uploads_dir.return_value = non_existent
with patch("deerflow.uploads.manager.get_paths", return_value=mock_paths):
result = client.list_uploads("thread-fresh")
# Read path should NOT create the directory
assert not non_existent.exists()
assert result == {"files": [], "count": 0}
class TestBugAgentInvalidationInconsistency:
"""Regression: update_skill and update_mcp_config must reset both
_agent and _agent_config_key, just like reset_agent() does.
"""
def test_update_mcp_resets_config_key(self, client):
"""After update_mcp_config, both _agent and _agent_config_key are None."""
client._agent = MagicMock()
client._agent_config_key = ("model", True, False, False)
current_config = MagicMock()
current_config.skills = {}
reloaded = MagicMock()
reloaded.mcp_servers = {}
with tempfile.TemporaryDirectory() as tmp:
config_file = Path(tmp) / "ext.json"
config_file.write_text("{}")
with (
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
patch("deerflow.client.get_extensions_config", return_value=current_config),
patch("deerflow.client.reload_extensions_config", return_value=reloaded),
):
client.update_mcp_config({})
assert client._agent is None
assert client._agent_config_key is None
def test_update_skill_resets_config_key(self, client):
"""After update_skill, both _agent and _agent_config_key are None."""
client._agent = MagicMock()
client._agent_config_key = ("model", True, False, False)
skill = MagicMock()
skill.name = "s1"
updated = MagicMock()
updated.name = "s1"
updated.description = "d"
updated.license = "MIT"
updated.category = "c"
updated.enabled = False
ext_config = MagicMock()
ext_config.mcp_servers = {}
ext_config.skills = {}
with tempfile.TemporaryDirectory() as tmp:
config_file = Path(tmp) / "ext.json"
config_file.write_text("{}")
with (
patch("deerflow.skills.loader.load_skills", side_effect=[[skill], [updated]]),
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
patch("deerflow.client.get_extensions_config", return_value=ext_config),
patch("deerflow.client.reload_extensions_config"),
):
client.update_skill("s1", enabled=False)
assert client._agent is None
assert client._agent_config_key is None

# ===========================================================================
# test_client_e2e.py: new file added in this PR
# ===========================================================================
"""End-to-end tests for DeerFlowClient.
Middle tier of the test pyramid:
- Top: test_client_live.py — real LLM, needs API key
- Middle: test_client_e2e.py — real LLM + real modules ← THIS FILE
- Bottom: test_client.py — unit tests, mock everything
Core principle: use the real LLM from config.yaml, let config, middleware
chain, tool registration, file I/O, and event serialization all run for real.
Only DEER_FLOW_HOME is redirected to tmp_path for filesystem isolation.
Tests that call the LLM are marked ``requires_llm`` and skipped in CI.
File-management tests (upload/list/delete) don't need LLM and run everywhere.
"""
import json
import os
import uuid
import zipfile
import pytest
from dotenv import load_dotenv
from deerflow.client import DeerFlowClient, StreamEvent
from deerflow.config.app_config import AppConfig
from deerflow.config.model_config import ModelConfig
from deerflow.config.sandbox_config import SandboxConfig
# Load .env from project root (for OPENAI_API_KEY etc.)
load_dotenv(os.path.join(os.path.dirname(__file__), "../../.env"))
# ---------------------------------------------------------------------------
# Markers
# ---------------------------------------------------------------------------
requires_llm = pytest.mark.skipif(
os.getenv("CI", "").lower() in ("true", "1") or not os.getenv("OPENAI_API_KEY"),
reason="Requires LLM API key — skipped in CI or when OPENAI_API_KEY is unset",
)
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_e2e_config() -> AppConfig:
"""Build a minimal AppConfig using real LLM credentials from environment.
All LLM connection details come from environment variables so that both
internal CI and external contributors can run the tests:
- ``E2E_MODEL_NAME`` (default: ``volcengine-ark``)
- ``E2E_MODEL_USE`` (default: ``langchain_openai:ChatOpenAI``)
- ``E2E_MODEL_ID`` (default: ``ep-20251211175242-llcmh``)
- ``E2E_BASE_URL`` (default: ``https://ark-cn-beijing.bytedance.net/api/v3``)
- ``OPENAI_API_KEY`` (required for LLM tests)
"""
return AppConfig(
models=[
ModelConfig(
name=os.getenv("E2E_MODEL_NAME", "volcengine-ark"),
display_name="E2E Test Model",
use=os.getenv("E2E_MODEL_USE", "langchain_openai:ChatOpenAI"),
model=os.getenv("E2E_MODEL_ID", "ep-20251211175242-llcmh"),
base_url=os.getenv("E2E_BASE_URL", "https://ark-cn-beijing.bytedance.net/api/v3"),
api_key=os.getenv("OPENAI_API_KEY", ""),
max_tokens=512,
temperature=0.7,
supports_thinking=False,
supports_reasoning_effort=False,
supports_vision=False,
)
],
sandbox=SandboxConfig(use="deerflow.sandbox.local:LocalSandboxProvider"),
)
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
@pytest.fixture()
def e2e_env(tmp_path, monkeypatch):
"""Isolated filesystem environment for E2E tests.
- DEER_FLOW_HOME → tmp_path (all thread data lands in a temp dir)
- Singletons reset so they pick up the new env
- Title/memory/summarization disabled to avoid extra LLM calls
- AppConfig built programmatically (avoids config.yaml param-name issues)
"""
# 1. Filesystem isolation
monkeypatch.setenv("DEER_FLOW_HOME", str(tmp_path))
monkeypatch.setattr("deerflow.config.paths._paths", None)
monkeypatch.setattr("deerflow.sandbox.sandbox_provider._default_sandbox_provider", None)
# 2. Inject a clean AppConfig via the global singleton.
config = _make_e2e_config()
monkeypatch.setattr("deerflow.config.app_config._app_config", config)
monkeypatch.setattr("deerflow.config.app_config._app_config_is_custom", True)
# 3. Disable title generation (extra LLM call, non-deterministic)
from deerflow.config.title_config import TitleConfig
monkeypatch.setattr("deerflow.config.title_config._title_config", TitleConfig(enabled=False))
# 4. Disable memory queueing (avoids background threads & file writes)
from deerflow.config.memory_config import MemoryConfig
monkeypatch.setattr(
"deerflow.agents.middlewares.memory_middleware.get_memory_config",
lambda: MemoryConfig(enabled=False),
)
# 5. Ensure summarization is off (default, but be explicit)
from deerflow.config.summarization_config import SummarizationConfig
monkeypatch.setattr("deerflow.config.summarization_config._summarization_config", SummarizationConfig(enabled=False))
# 6. Exclude TitleMiddleware from the chain.
# It triggers an extra LLM call to generate a thread title, which adds
# non-determinism and cost to E2E tests (title generation is already
# disabled via TitleConfig above, but the middleware still participates
# in the chain and can interfere with event ordering).
from deerflow.agents.lead_agent.agent import _build_middlewares as _original_build_middlewares
from deerflow.agents.middlewares.title_middleware import TitleMiddleware
def _sync_safe_build_middlewares(*args, **kwargs):
mws = _original_build_middlewares(*args, **kwargs)
return [m for m in mws if not isinstance(m, TitleMiddleware)]
monkeypatch.setattr("deerflow.client._build_middlewares", _sync_safe_build_middlewares)
return {"tmp_path": tmp_path}
@pytest.fixture()
def client(e2e_env):
"""A DeerFlowClient wired to the isolated e2e_env."""
return DeerFlowClient(checkpointer=None, thinking_enabled=False)
# ---------------------------------------------------------------------------
# Step 2: Basic streaming (requires LLM)
# ---------------------------------------------------------------------------
class TestBasicChat:
"""Basic chat and streaming behavior with real LLM."""
@requires_llm
def test_basic_chat(self, client):
"""chat() returns a non-empty text response."""
result = client.chat("Say exactly: pong")
assert isinstance(result, str)
assert len(result) > 0
@requires_llm
def test_stream_event_sequence(self, client):
"""stream() yields events: messages-tuple, values, and end."""
events = list(client.stream("Say hi"))
types = [e.type for e in events]
assert types[-1] == "end"
assert "messages-tuple" in types
assert "values" in types
@requires_llm
def test_stream_event_data_format(self, client):
"""Each event type has the expected data structure."""
events = list(client.stream("Say hello"))
for event in events:
assert isinstance(event, StreamEvent)
assert isinstance(event.type, str)
assert isinstance(event.data, dict)
if event.type == "messages-tuple" and event.data.get("type") == "ai":
assert "content" in event.data
assert "id" in event.data
elif event.type == "values":
assert "messages" in event.data
assert "artifacts" in event.data
elif event.type == "end":
assert event.data == {}
@requires_llm
def test_multi_turn_stateless(self, client):
"""Without checkpointer, two calls to the same thread_id are independent."""
tid = str(uuid.uuid4())
r1 = client.chat("Remember the number 42", thread_id=tid)
# Reset so agent is recreated (simulates no cross-turn state)
client.reset_agent()
r2 = client.chat("What number did I say?", thread_id=tid)
# Without a checkpointer the second call has no memory of the first.
# We can't assert exact content, but both should be non-empty.
assert isinstance(r1, str) and len(r1) > 0
assert isinstance(r2, str) and len(r2) > 0
# ---------------------------------------------------------------------------
# Step 3: Tool call flow (requires LLM)
# ---------------------------------------------------------------------------
class TestToolCallFlow:
"""Verify the LLM actually invokes tools through the real agent pipeline."""
@requires_llm
def test_tool_call_produces_events(self, client):
"""When the LLM decides to use a tool, we see tool call + result events."""
# Give a clear instruction that forces a tool call
events = list(client.stream(
"Use the bash tool to run: echo hello_e2e_test"
))
types = [e.type for e in events]
assert types[-1] == "end"
# Should have at least one tool call event
tool_call_events = [
e for e in events
if e.type == "messages-tuple" and e.data.get("tool_calls")
]
tool_result_events = [
e for e in events
if e.type == "messages-tuple" and e.data.get("type") == "tool"
]
assert len(tool_call_events) >= 1, "Expected at least one tool_call event"
assert len(tool_result_events) >= 1, "Expected at least one tool result event"
@requires_llm
def test_tool_call_event_structure(self, client):
"""Tool call events contain name, args, and id fields."""
events = list(client.stream(
"Use the read_file tool to read /mnt/user-data/workspace/nonexistent.txt"
))
tc_events = [
e for e in events
if e.type == "messages-tuple" and e.data.get("tool_calls")
]
if tc_events:
tc = tc_events[0].data["tool_calls"][0]
assert "name" in tc
assert "args" in tc
assert "id" in tc
# ---------------------------------------------------------------------------
# Step 4: File upload integration (no LLM needed for most)
# ---------------------------------------------------------------------------
class TestFileUploadIntegration:
"""Upload, list, and delete files through the real client path."""
def test_upload_files(self, e2e_env, tmp_path):
"""upload_files() copies files and returns metadata."""
test_file = tmp_path / "source" / "readme.txt"
test_file.parent.mkdir(parents=True, exist_ok=True)
test_file.write_text("Hello world")
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
result = c.upload_files(tid, [test_file])
assert result["success"] is True
assert len(result["files"]) == 1
assert result["files"][0]["filename"] == "readme.txt"
# Physically exists
from deerflow.config.paths import get_paths
assert (get_paths().sandbox_uploads_dir(tid) / "readme.txt").exists()
def test_upload_duplicate_rename(self, e2e_env, tmp_path):
"""Uploading two files with the same name auto-renames the second."""
d1 = tmp_path / "dir1"
d2 = tmp_path / "dir2"
d1.mkdir()
d2.mkdir()
(d1 / "data.txt").write_text("content A")
(d2 / "data.txt").write_text("content B")
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
result = c.upload_files(tid, [d1 / "data.txt", d2 / "data.txt"])
assert result["success"] is True
assert len(result["files"]) == 2
filenames = {f["filename"] for f in result["files"]}
assert "data.txt" in filenames
assert "data_1.txt" in filenames
def test_upload_list_and_delete(self, e2e_env, tmp_path):
"""Upload → list → delete → list lifecycle."""
test_file = tmp_path / "lifecycle.txt"
test_file.write_text("lifecycle test")
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
c.upload_files(tid, [test_file])
listing = c.list_uploads(tid)
assert listing["count"] == 1
assert listing["files"][0]["filename"] == "lifecycle.txt"
del_result = c.delete_upload(tid, "lifecycle.txt")
assert del_result["success"] is True
listing = c.list_uploads(tid)
assert listing["count"] == 0
@requires_llm
def test_upload_then_chat(self, e2e_env, tmp_path):
"""Upload a file then ask the LLM about it — UploadsMiddleware injects file info."""
test_file = tmp_path / "source" / "notes.txt"
test_file.parent.mkdir(parents=True, exist_ok=True)
test_file.write_text("The secret code is 7749.")
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
c.upload_files(tid, [test_file])
# Chat — the middleware should inject <uploaded_files> context
response = c.chat("What files are available?", thread_id=tid)
assert isinstance(response, str) and len(response) > 0
# ---------------------------------------------------------------------------
# Step 5: Lifecycle and configuration (no LLM needed)
# ---------------------------------------------------------------------------
class TestLifecycleAndConfig:
"""Agent recreation and configuration behavior."""
@requires_llm
def test_agent_recreation_on_config_change(self, client):
"""Changing thinking_enabled triggers agent recreation (different config key)."""
list(client.stream("hi"))
key1 = client._agent_config_key
# Stream with a different config override
client.reset_agent()
list(client.stream("hi", thinking_enabled=True))
key2 = client._agent_config_key
# thinking_enabled changed: False → True → keys differ
assert key1 != key2
def test_reset_agent_clears_state(self, e2e_env):
"""reset_agent() sets the internal agent to None."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
# Before any call, agent is None
assert c._agent is None
c.reset_agent()
assert c._agent is None
assert c._agent_config_key is None
def test_plan_mode_config_key(self, e2e_env):
"""plan_mode is part of the config key tuple."""
c = DeerFlowClient(checkpointer=None, plan_mode=False)
cfg1 = c._get_runnable_config("test-thread")
key1 = (
cfg1["configurable"]["model_name"],
cfg1["configurable"]["thinking_enabled"],
cfg1["configurable"]["is_plan_mode"],
cfg1["configurable"]["subagent_enabled"],
)
c2 = DeerFlowClient(checkpointer=None, plan_mode=True)
cfg2 = c2._get_runnable_config("test-thread")
key2 = (
cfg2["configurable"]["model_name"],
cfg2["configurable"]["thinking_enabled"],
cfg2["configurable"]["is_plan_mode"],
cfg2["configurable"]["subagent_enabled"],
)
assert key1 != key2
assert key1[2] is False
assert key2[2] is True
# ---------------------------------------------------------------------------
# Step 6: Middleware chain verification (requires LLM)
# ---------------------------------------------------------------------------
class TestMiddlewareChain:
"""Verify middleware side effects through real execution."""
@requires_llm
def test_thread_data_paths_in_state(self, client):
"""After streaming, thread directory paths are computed correctly."""
tid = str(uuid.uuid4())
events = list(client.stream("hi", thread_id=tid))
# The values event should contain messages
values_events = [e for e in events if e.type == "values"]
assert len(values_events) >= 1
# ThreadDataMiddleware should have set paths in the state.
# We verify the paths singleton can resolve the thread dir.
from deerflow.config.paths import get_paths
thread_dir = get_paths().thread_dir(tid)
assert str(thread_dir).endswith(tid)
@requires_llm
def test_stream_completes_without_middleware_errors(self, client):
"""Full middleware chain (ThreadData, Uploads, Sandbox, DanglingToolCall,
Memory, Clarification) executes without errors."""
events = list(client.stream("What is 1+1?"))
types = [e.type for e in events]
assert types[-1] == "end"
# Should have at least one AI response
ai_events = [
e for e in events
if e.type == "messages-tuple" and e.data.get("type") == "ai"
]
assert len(ai_events) >= 1
# ---------------------------------------------------------------------------
# Step 7: Error and boundary conditions
# ---------------------------------------------------------------------------
class TestErrorAndBoundary:
"""Error propagation and edge cases."""
def test_upload_nonexistent_file_raises(self, e2e_env):
"""Uploading a file that doesn't exist raises FileNotFoundError."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(FileNotFoundError):
c.upload_files("test-thread", ["/nonexistent/file.txt"])
def test_delete_nonexistent_upload_raises(self, e2e_env):
"""Deleting a file that doesn't exist raises FileNotFoundError."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
# Ensure the uploads dir exists first
c.list_uploads(tid)
with pytest.raises(FileNotFoundError):
c.delete_upload(tid, "ghost.txt")
def test_artifact_path_traversal_blocked(self, e2e_env):
"""get_artifact blocks path traversal attempts."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(ValueError):
c.get_artifact("test-thread", "../../etc/passwd")
def test_upload_directory_rejected(self, e2e_env, tmp_path):
"""Uploading a directory (not a file) is rejected."""
d = tmp_path / "a_directory"
d.mkdir()
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(ValueError, match="not a file"):
c.upload_files("test-thread", [d])
@requires_llm
def test_empty_message_still_gets_response(self, client):
"""Even an empty-ish message should produce a valid event stream."""
events = list(client.stream(" "))
types = [e.type for e in events]
assert types[-1] == "end"
# ---------------------------------------------------------------------------
# Step 8: Artifact access (no LLM needed)
# ---------------------------------------------------------------------------
class TestArtifactAccess:
"""Read artifacts through get_artifact() with real filesystem."""
def test_get_artifact_happy_path(self, e2e_env):
"""Write a file to outputs, then read it back via get_artifact()."""
from deerflow.config.paths import get_paths
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
# Create an output file in the thread's outputs directory
outputs_dir = get_paths().sandbox_outputs_dir(tid)
outputs_dir.mkdir(parents=True, exist_ok=True)
(outputs_dir / "result.txt").write_text("hello artifact")
data, mime = c.get_artifact(tid, "mnt/user-data/outputs/result.txt")
assert data == b"hello artifact"
assert "text" in mime
def test_get_artifact_nested_path(self, e2e_env):
"""Artifacts in subdirectories are accessible."""
from deerflow.config.paths import get_paths
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
outputs_dir = get_paths().sandbox_outputs_dir(tid)
sub = outputs_dir / "charts"
sub.mkdir(parents=True, exist_ok=True)
(sub / "data.json").write_text('{"x": 1}')
data, mime = c.get_artifact(tid, "mnt/user-data/outputs/charts/data.json")
assert b'"x"' in data
assert "json" in mime
def test_get_artifact_nonexistent_raises(self, e2e_env):
"""Reading a nonexistent artifact raises FileNotFoundError."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(FileNotFoundError):
c.get_artifact("test-thread", "mnt/user-data/outputs/ghost.txt")
def test_get_artifact_traversal_within_prefix_blocked(self, e2e_env):
"""Path traversal within the valid prefix is still blocked."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises((PermissionError, ValueError, FileNotFoundError)):
c.get_artifact("test-thread", "mnt/user-data/outputs/../../etc/passwd")
# ---------------------------------------------------------------------------
# Step 9: Skill installation (no LLM needed)
# ---------------------------------------------------------------------------
class TestSkillInstallation:
"""install_skill() with real ZIP handling and filesystem."""
@pytest.fixture(autouse=True)
def _isolate_skills_dir(self, tmp_path, monkeypatch):
"""Redirect skill installation to a temp directory."""
skills_root = tmp_path / "skills"
(skills_root / "public").mkdir(parents=True)
(skills_root / "custom").mkdir(parents=True)
monkeypatch.setattr(
"deerflow.skills.installer.get_skills_root_path",
lambda: skills_root,
)
self._skills_root = skills_root
@staticmethod
def _make_skill_zip(tmp_path, skill_name="test-e2e-skill"):
"""Create a minimal valid .skill archive."""
skill_dir = tmp_path / "build" / skill_name
skill_dir.mkdir(parents=True)
(skill_dir / "SKILL.md").write_text(
f"---\nname: {skill_name}\ndescription: E2E test skill\n---\n\nTest content.\n"
)
archive_path = tmp_path / f"{skill_name}.skill"
with zipfile.ZipFile(archive_path, "w") as zf:
for file in skill_dir.rglob("*"):
zf.write(file, file.relative_to(tmp_path / "build"))
return archive_path
def test_install_skill_success(self, e2e_env, tmp_path):
"""A valid .skill archive installs to the custom skills directory."""
archive = self._make_skill_zip(tmp_path)
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.install_skill(archive)
assert result["success"] is True
assert result["skill_name"] == "test-e2e-skill"
assert (self._skills_root / "custom" / "test-e2e-skill" / "SKILL.md").exists()
def test_install_skill_duplicate_rejected(self, e2e_env, tmp_path):
"""Installing the same skill twice raises ValueError."""
archive = self._make_skill_zip(tmp_path)
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
c.install_skill(archive)
with pytest.raises(ValueError, match="already exists"):
c.install_skill(archive)
def test_install_skill_invalid_extension(self, e2e_env, tmp_path):
"""A file without .skill extension is rejected."""
bad_file = tmp_path / "not_a_skill.zip"
bad_file.write_bytes(b"PK\x03\x04") # ZIP magic bytes
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(ValueError, match=".skill extension"):
c.install_skill(bad_file)
def test_install_skill_missing_frontmatter(self, e2e_env, tmp_path):
"""A .skill archive without valid SKILL.md frontmatter is rejected."""
skill_dir = tmp_path / "build" / "bad-skill"
skill_dir.mkdir(parents=True)
(skill_dir / "SKILL.md").write_text("No frontmatter here.")
archive = tmp_path / "bad-skill.skill"
with zipfile.ZipFile(archive, "w") as zf:
for file in skill_dir.rglob("*"):
zf.write(file, file.relative_to(tmp_path / "build"))
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(ValueError, match="Invalid skill"):
c.install_skill(archive)
def test_install_skill_nonexistent_file(self, e2e_env):
"""Installing from a nonexistent path raises FileNotFoundError."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(FileNotFoundError):
c.install_skill("/nonexistent/skill.skill")
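These installer tests exercise the single-open pattern from the PR description: one `ZipFile()` call with exception handling replaces the racy `is_file()` + `is_zipfile()` pre-checks. A sketch under that assumption (function name and error messages are illustrative):

```python
import zipfile
from pathlib import Path


def open_skill_archive(path: Path) -> zipfile.ZipFile:
    """Open a .skill archive with a single syscall path, no TOCTOU pre-checks."""
    if path.suffix != ".skill":
        raise ValueError("archive must have a .skill extension")
    try:
        return zipfile.ZipFile(path)
    except FileNotFoundError:
        raise  # nonexistent path propagates as-is
    except (zipfile.BadZipFile, IsADirectoryError) as exc:
        raise ValueError(f"not a valid skill archive: {path.name}") from exc
```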
# ---------------------------------------------------------------------------
# Step 10: Configuration management (no LLM needed)
# ---------------------------------------------------------------------------
class TestConfigManagement:
"""Config queries and updates through real code paths."""
def test_list_models_returns_injected_config(self, e2e_env):
"""list_models() returns the model from the injected AppConfig."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.list_models()
assert "models" in result
assert len(result["models"]) == 1
assert result["models"][0]["name"] == "volcengine-ark"
assert result["models"][0]["display_name"] == "E2E Test Model"
def test_get_model_found(self, e2e_env):
"""get_model() returns the model when it exists."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
model = c.get_model("volcengine-ark")
assert model is not None
assert model["name"] == "volcengine-ark"
assert model["supports_thinking"] is False
def test_get_model_not_found(self, e2e_env):
"""get_model() returns None for nonexistent model."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
assert c.get_model("nonexistent-model") is None
def test_list_skills_returns_list(self, e2e_env):
"""list_skills() returns a dict with 'skills' key from real directory scan."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.list_skills()
assert "skills" in result
assert isinstance(result["skills"], list)
# The real skills/ directory should have some public skills
assert len(result["skills"]) > 0
def test_get_skill_found(self, e2e_env):
"""get_skill() returns skill info for a known public skill."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
# 'deep-research' is a built-in public skill
skill = c.get_skill("deep-research")
if skill is not None:
assert skill["name"] == "deep-research"
assert "description" in skill
assert "enabled" in skill
def test_get_skill_not_found(self, e2e_env):
"""get_skill() returns None for nonexistent skill."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
assert c.get_skill("nonexistent-skill-xyz") is None
def test_get_mcp_config_returns_dict(self, e2e_env):
"""get_mcp_config() returns a dict with 'mcp_servers' key."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.get_mcp_config()
assert "mcp_servers" in result
assert isinstance(result["mcp_servers"], dict)
def test_update_mcp_config_writes_and_invalidates(self, e2e_env, tmp_path, monkeypatch):
"""update_mcp_config() writes extensions_config.json and invalidates the agent."""
# Set up a writable extensions_config.json
config_file = tmp_path / "extensions_config.json"
config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))
# Force reload so the singleton picks up our test file
from deerflow.config.extensions_config import reload_extensions_config
reload_extensions_config()
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
# Simulate a cached agent
c._agent = "fake-agent-placeholder"
c._agent_config_key = ("a", "b", "c", "d")
result = c.update_mcp_config({"test-server": {"enabled": True, "type": "stdio", "command": "echo"}})
assert "mcp_servers" in result
# Agent should be invalidated
assert c._agent is None
assert c._agent_config_key is None
# File should be written
written = json.loads(config_file.read_text())
assert "test-server" in written["mcpServers"]
def test_update_skill_writes_and_invalidates(self, e2e_env, tmp_path, monkeypatch):
"""update_skill() writes extensions_config.json and invalidates the agent."""
config_file = tmp_path / "extensions_config.json"
config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))
from deerflow.config.extensions_config import reload_extensions_config
reload_extensions_config()
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
c._agent = "fake-agent-placeholder"
c._agent_config_key = ("a", "b", "c", "d")
# Use a real skill name from the public skills directory
skills = c.list_skills()
if not skills["skills"]:
pytest.skip("No skills available for testing")
skill_name = skills["skills"][0]["name"]
result = c.update_skill(skill_name, enabled=False)
assert result["name"] == skill_name
assert result["enabled"] is False
# Agent should be invalidated
assert c._agent is None
assert c._agent_config_key is None
def test_update_skill_nonexistent_raises(self, e2e_env, tmp_path, monkeypatch):
"""update_skill() raises ValueError for nonexistent skill."""
config_file = tmp_path / "extensions_config.json"
config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))
from deerflow.config.extensions_config import reload_extensions_config
reload_extensions_config()
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(ValueError, match="not found"):
c.update_skill("nonexistent-skill-xyz", enabled=True)
# ---------------------------------------------------------------------------
# Step 11: Memory access (no LLM needed)
# ---------------------------------------------------------------------------
class TestMemoryAccess:
"""Memory system queries through real code paths."""
def test_get_memory_returns_dict(self, e2e_env):
"""get_memory() returns a dict (may be empty initial state)."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.get_memory()
assert isinstance(result, dict)
def test_reload_memory_returns_dict(self, e2e_env):
"""reload_memory() forces reload and returns a dict."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.reload_memory()
assert isinstance(result, dict)
def test_get_memory_config_fields(self, e2e_env):
"""get_memory_config() returns expected config fields."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.get_memory_config()
assert "enabled" in result
assert "storage_path" in result
assert "debounce_seconds" in result
assert "max_facts" in result
assert "fact_confidence_threshold" in result
assert "injection_enabled" in result
assert "max_injection_tokens" in result
def test_get_memory_status_combines_config_and_data(self, e2e_env):
"""get_memory_status() returns both 'config' and 'data' keys."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.get_memory_status()
assert "config" in result
assert "data" in result
assert "enabled" in result["config"]
assert isinstance(result["data"], dict)


@@ -1,8 +1,8 @@
 from pathlib import Path
-from fastapi import HTTPException
+import pytest
-from app.gateway.routers.skills import _resolve_skill_dir_from_archive_root
+from deerflow.skills.installer import resolve_skill_dir_from_archive
 def _write_skill(skill_dir: Path) -> None:
@@ -23,24 +23,19 @@ def test_resolve_skill_dir_ignores_macosx_wrapper(tmp_path: Path) -> None:
     _write_skill(tmp_path / "demo-skill")
     (tmp_path / "__MACOSX").mkdir()
-    assert _resolve_skill_dir_from_archive_root(tmp_path) == tmp_path / "demo-skill"
+    assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "demo-skill"
 def test_resolve_skill_dir_ignores_hidden_top_level_entries(tmp_path: Path) -> None:
     _write_skill(tmp_path / "demo-skill")
     (tmp_path / ".DS_Store").write_text("metadata", encoding="utf-8")
-    assert _resolve_skill_dir_from_archive_root(tmp_path) == tmp_path / "demo-skill"
+    assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "demo-skill"
 def test_resolve_skill_dir_rejects_archive_with_only_metadata(tmp_path: Path) -> None:
     (tmp_path / "__MACOSX").mkdir()
     (tmp_path / ".DS_Store").write_text("metadata", encoding="utf-8")
-    try:
-        _resolve_skill_dir_from_archive_root(tmp_path)
-    except HTTPException as error:
-        assert error.status_code == 400
-        assert error.detail == "Skill archive is empty"
-    else:
-        raise AssertionError("Expected HTTPException for metadata-only archive")
+    with pytest.raises(ValueError, match="empty"):
+        resolve_skill_dir_from_archive(tmp_path)


@@ -0,0 +1,220 @@
"""Tests for deerflow.skills.installer — shared skill installation logic."""
import stat
import zipfile
from pathlib import Path
import pytest
from deerflow.skills.installer import (
install_skill_from_archive,
is_symlink_member,
is_unsafe_zip_member,
resolve_skill_dir_from_archive,
safe_extract_skill_archive,
should_ignore_archive_entry,
)
# ---------------------------------------------------------------------------
# is_unsafe_zip_member
# ---------------------------------------------------------------------------
class TestIsUnsafeZipMember:
def test_absolute_path(self):
info = zipfile.ZipInfo("/etc/passwd")
assert is_unsafe_zip_member(info) is True
def test_dotdot_traversal(self):
info = zipfile.ZipInfo("foo/../../../etc/passwd")
assert is_unsafe_zip_member(info) is True
def test_safe_member(self):
info = zipfile.ZipInfo("my-skill/SKILL.md")
assert is_unsafe_zip_member(info) is False
def test_empty_filename(self):
info = zipfile.ZipInfo("")
assert is_unsafe_zip_member(info) is False
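For reviewers, a minimal sketch of a member check consistent with these assertions. This is an assumed shape reconstructed from the tests, not the actual `deerflow.skills.installer` implementation:

```python
import zipfile
from pathlib import PurePosixPath


def is_unsafe_zip_member(info: zipfile.ZipInfo) -> bool:
    """Flag members whose stored path could escape the extraction directory."""
    name = info.filename
    # Absolute paths extract outside the destination directory.
    if name.startswith(("/", "\\")):
        return True
    # ".." components traverse upward out of the destination.
    return ".." in PurePosixPath(name).parts
```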
# ---------------------------------------------------------------------------
# is_symlink_member
# ---------------------------------------------------------------------------
class TestIsSymlinkMember:
def test_detects_symlink(self):
info = zipfile.ZipInfo("link.txt")
info.external_attr = (stat.S_IFLNK | 0o777) << 16
assert is_symlink_member(info) is True
def test_regular_file(self):
info = zipfile.ZipInfo("file.txt")
info.external_attr = (stat.S_IFREG | 0o644) << 16
assert is_symlink_member(info) is False
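The `external_attr` shifting in these tests mirrors how zip stores Unix modes. A sketch of the presumed check (assumed implementation, not the shipped one):

```python
import stat
import zipfile


def is_symlink_member(info: zipfile.ZipInfo) -> bool:
    """Detect symlink entries via the Unix mode bits in external_attr."""
    # Archives created on Unix store the file mode in the high 16 bits.
    return stat.S_ISLNK(info.external_attr >> 16)
```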
# ---------------------------------------------------------------------------
# should_ignore_archive_entry
# ---------------------------------------------------------------------------
class TestShouldIgnoreArchiveEntry:
def test_macosx_ignored(self):
assert should_ignore_archive_entry(Path("__MACOSX")) is True
def test_dotfile_ignored(self):
assert should_ignore_archive_entry(Path(".DS_Store")) is True
def test_normal_dir_not_ignored(self):
assert should_ignore_archive_entry(Path("my-skill")) is False
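A one-line sketch matching these three cases (an assumption; the real predicate may cover more patterns):

```python
from pathlib import Path


def should_ignore_archive_entry(entry: Path) -> bool:
    """Skip macOS resource forks and hidden metadata when scanning an archive root."""
    return entry.name == "__MACOSX" or entry.name.startswith(".")
```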
# ---------------------------------------------------------------------------
# resolve_skill_dir_from_archive
# ---------------------------------------------------------------------------
class TestResolveSkillDir:
def test_single_dir(self, tmp_path):
(tmp_path / "my-skill").mkdir()
(tmp_path / "my-skill" / "SKILL.md").write_text("content")
assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "my-skill"
def test_with_macosx(self, tmp_path):
(tmp_path / "my-skill").mkdir()
(tmp_path / "my-skill" / "SKILL.md").write_text("content")
(tmp_path / "__MACOSX").mkdir()
assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "my-skill"
def test_empty_after_filter(self, tmp_path):
(tmp_path / "__MACOSX").mkdir()
(tmp_path / ".DS_Store").write_text("meta")
with pytest.raises(ValueError, match="empty"):
resolve_skill_dir_from_archive(tmp_path)
# ---------------------------------------------------------------------------
# safe_extract_skill_archive
# ---------------------------------------------------------------------------
class TestSafeExtract:
def _make_zip(self, tmp_path, members: dict[str, str | bytes]) -> Path:
"""Create a zip with given filename->content entries."""
zip_path = tmp_path / "test.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
for name, content in members.items():
if isinstance(content, str):
content = content.encode()
zf.writestr(name, content)
return zip_path
def test_rejects_zip_bomb(self, tmp_path):
zip_path = self._make_zip(tmp_path, {"big.txt": "x" * 1000})
dest = tmp_path / "out"
dest.mkdir()
with zipfile.ZipFile(zip_path) as zf:
with pytest.raises(ValueError, match="too large"):
safe_extract_skill_archive(zf, dest, max_total_size=100)
def test_rejects_absolute_path(self, tmp_path):
zip_path = tmp_path / "abs.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
zf.writestr("/etc/passwd", "root:x:0:0")
dest = tmp_path / "out"
dest.mkdir()
with zipfile.ZipFile(zip_path) as zf:
with pytest.raises(ValueError, match="unsafe"):
safe_extract_skill_archive(zf, dest)
def test_skips_symlinks(self, tmp_path):
zip_path = tmp_path / "sym.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
info = zipfile.ZipInfo("link.txt")
info.external_attr = (stat.S_IFLNK | 0o777) << 16
zf.writestr(info, "/etc/passwd")
zf.writestr("normal.txt", "hello")
dest = tmp_path / "out"
dest.mkdir()
with zipfile.ZipFile(zip_path) as zf:
safe_extract_skill_archive(zf, dest)
assert (dest / "normal.txt").exists()
assert not (dest / "link.txt").exists()
def test_normal_archive(self, tmp_path):
zip_path = self._make_zip(tmp_path, {
"my-skill/SKILL.md": "---\nname: test\ndescription: x\n---\n# Test",
"my-skill/README.md": "readme",
})
dest = tmp_path / "out"
dest.mkdir()
with zipfile.ZipFile(zip_path) as zf:
safe_extract_skill_archive(zf, dest)
assert (dest / "my-skill" / "SKILL.md").exists()
assert (dest / "my-skill" / "README.md").exists()
# ---------------------------------------------------------------------------
# install_skill_from_archive (full integration)
# ---------------------------------------------------------------------------
class TestInstallSkillFromArchive:
def _make_skill_zip(self, tmp_path: Path, skill_name: str = "test-skill") -> Path:
"""Create a valid .skill archive."""
zip_path = tmp_path / f"{skill_name}.skill"
with zipfile.ZipFile(zip_path, "w") as zf:
zf.writestr(
f"{skill_name}/SKILL.md",
f"---\nname: {skill_name}\ndescription: A test skill\n---\n\n# {skill_name}\n",
)
return zip_path
def test_success(self, tmp_path):
zip_path = self._make_skill_zip(tmp_path)
skills_root = tmp_path / "skills"
skills_root.mkdir()
result = install_skill_from_archive(zip_path, skills_root=skills_root)
assert result["success"] is True
assert result["skill_name"] == "test-skill"
assert (skills_root / "custom" / "test-skill" / "SKILL.md").exists()
def test_duplicate_raises(self, tmp_path):
zip_path = self._make_skill_zip(tmp_path)
skills_root = tmp_path / "skills"
(skills_root / "custom" / "test-skill").mkdir(parents=True)
with pytest.raises(ValueError, match="already exists"):
install_skill_from_archive(zip_path, skills_root=skills_root)
def test_invalid_extension(self, tmp_path):
bad_path = tmp_path / "bad.zip"
bad_path.write_text("not a skill")
with pytest.raises(ValueError, match=".skill"):
install_skill_from_archive(bad_path)
def test_bad_frontmatter(self, tmp_path):
zip_path = tmp_path / "bad.skill"
with zipfile.ZipFile(zip_path, "w") as zf:
zf.writestr("bad/SKILL.md", "no frontmatter here")
skills_root = tmp_path / "skills"
skills_root.mkdir()
with pytest.raises(ValueError, match="Invalid skill"):
install_skill_from_archive(zip_path, skills_root=skills_root)
def test_nonexistent_file(self):
with pytest.raises(FileNotFoundError):
install_skill_from_archive(Path("/nonexistent/path.skill"))
def test_macosx_filtered_during_resolve(self, tmp_path):
"""Archive with __MACOSX dir still installs correctly."""
zip_path = tmp_path / "mac.skill"
with zipfile.ZipFile(zip_path, "w") as zf:
zf.writestr("my-skill/SKILL.md", "---\nname: my-skill\ndescription: desc\n---\n# My Skill\n")
zf.writestr("__MACOSX/._my-skill", "meta")
skills_root = tmp_path / "skills"
skills_root.mkdir()
result = install_skill_from_archive(zip_path, skills_root=skills_root)
assert result["success"] is True
assert result["skill_name"] == "my-skill"


@@ -0,0 +1,146 @@
"""Tests for deerflow.uploads.manager — shared upload management logic."""
import pytest
from deerflow.uploads.manager import (
PathTraversalError,
claim_unique_filename,
delete_file_safe,
list_files_in_dir,
normalize_filename,
validate_path_traversal,
)
# ---------------------------------------------------------------------------
# normalize_filename
# ---------------------------------------------------------------------------
class TestNormalizeFilename:
def test_safe_filename(self):
assert normalize_filename("report.pdf") == "report.pdf"
def test_strips_path_components(self):
assert normalize_filename("../../etc/passwd") == "passwd"
def test_rejects_empty(self):
with pytest.raises(ValueError, match="empty"):
normalize_filename("")
def test_rejects_dot_dot(self):
with pytest.raises(ValueError, match="unsafe"):
normalize_filename("..")
def test_strips_separators(self):
assert normalize_filename("path/to/file.txt") == "file.txt"
def test_dot_only(self):
with pytest.raises(ValueError, match="unsafe"):
normalize_filename(".")
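A sketch consistent with these cases, including the backslash rejection called out in the PR description (the error messages are assumptions, constrained only by the `match="empty"` and `match="unsafe"` patterns):

```python
import posixpath


def normalize_filename(raw: str) -> str:
    """Reduce a client-supplied filename to a safe basename."""
    if not raw:
        raise ValueError("Filename must not be empty")
    if "\\" in raw:
        # Windows-style separators are rejected outright rather than split.
        raise ValueError("Filename contains unsafe characters")
    name = posixpath.basename(raw)  # strips any path components
    if name in {"", ".", ".."}:
        raise ValueError("Filename is unsafe")
    return name
```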
# ---------------------------------------------------------------------------
# claim_unique_filename
# ---------------------------------------------------------------------------
class TestClaimUniqueFilename:
def test_no_collision(self):
seen: set[str] = set()
assert claim_unique_filename("data.txt", seen) == "data.txt"
assert "data.txt" in seen
def test_single_collision(self):
seen = {"data.txt"}
assert claim_unique_filename("data.txt", seen) == "data_1.txt"
assert "data_1.txt" in seen
def test_triple_collision(self):
seen = {"data.txt", "data_1.txt", "data_2.txt"}
assert claim_unique_filename("data.txt", seen) == "data_3.txt"
assert "data_3.txt" in seen
def test_mutates_seen(self):
seen: set[str] = set()
claim_unique_filename("a.txt", seen)
claim_unique_filename("a.txt", seen)
assert seen == {"a.txt", "a_1.txt"}
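The `stem_N.ext` suffixing pinned down by these assertions can be sketched as (an assumed implementation matching the tested contract, including mutation of `seen`):

```python
from pathlib import PurePath


def claim_unique_filename(filename: str, seen: set[str]) -> str:
    """Return filename, or stem_N.ext on collision, recording the claim in `seen`."""
    if filename not in seen:
        seen.add(filename)
        return filename
    stem, suffix = PurePath(filename).stem, PurePath(filename).suffix
    n = 1
    while f"{stem}_{n}{suffix}" in seen:
        n += 1
    unique = f"{stem}_{n}{suffix}"
    seen.add(unique)
    return unique
```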
# ---------------------------------------------------------------------------
# validate_path_traversal
# ---------------------------------------------------------------------------
class TestValidatePathTraversal:
def test_inside_base_ok(self, tmp_path):
child = tmp_path / "file.txt"
child.touch()
validate_path_traversal(child, tmp_path) # no exception
def test_outside_base_raises(self, tmp_path):
outside = tmp_path / ".." / "evil.txt"
with pytest.raises(PathTraversalError, match="traversal"):
validate_path_traversal(outside, tmp_path)
def test_symlink_escape(self, tmp_path):
target = tmp_path.parent / "secret.txt"
target.touch()
link = tmp_path / "escape"
link.symlink_to(target)
with pytest.raises(PathTraversalError, match="traversal"):
validate_path_traversal(link, tmp_path)
# ---------------------------------------------------------------------------
# list_files_in_dir
# ---------------------------------------------------------------------------
class TestListFilesInDir:
def test_empty_dir(self, tmp_path):
result = list_files_in_dir(tmp_path)
assert result == {"files": [], "count": 0}
def test_nonexistent_dir(self, tmp_path):
result = list_files_in_dir(tmp_path / "nope")
assert result == {"files": [], "count": 0}
def test_multiple_files_sorted(self, tmp_path):
(tmp_path / "b.txt").write_text("b")
(tmp_path / "a.txt").write_text("a")
result = list_files_in_dir(tmp_path)
assert result["count"] == 2
assert result["files"][0]["filename"] == "a.txt"
assert result["files"][1]["filename"] == "b.txt"
for f in result["files"]:
assert set(f.keys()) == {"filename", "size", "path", "extension", "modified"}
def test_ignores_subdirectories(self, tmp_path):
(tmp_path / "file.txt").write_text("data")
(tmp_path / "subdir").mkdir()
result = list_files_in_dir(tmp_path)
assert result["count"] == 1
assert result["files"][0]["filename"] == "file.txt"
# ---------------------------------------------------------------------------
# delete_file_safe
# ---------------------------------------------------------------------------
class TestDeleteFileSafe:
def test_delete_existing_file(self, tmp_path):
f = tmp_path / "test.txt"
f.write_text("data")
result = delete_file_safe(tmp_path, "test.txt")
assert result["success"] is True
assert not f.exists()
def test_delete_nonexistent_raises(self, tmp_path):
with pytest.raises(FileNotFoundError):
delete_file_safe(tmp_path, "nope.txt")
def test_delete_traversal_raises(self, tmp_path):
with pytest.raises(PathTraversalError, match="traversal"):
delete_file_safe(tmp_path, "../outside.txt")


@@ -19,6 +19,7 @@ def test_upload_files_writes_thread_storage_and_skips_local_sandbox_sync(tmp_path):
     with (
         patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
+        patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
         patch.object(uploads, "get_sandbox_provider", return_value=provider),
     ):
         file = UploadFile(filename="notes.txt", file=BytesIO(b"hello uploads"))
@@ -48,6 +49,7 @@ def test_upload_files_syncs_non_local_sandbox_and_marks_markdown_file(tmp_path):
     with (
         patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
+        patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
         patch.object(uploads, "get_sandbox_provider", return_value=provider),
         patch.object(uploads, "convert_file_to_markdown", AsyncMock(side_effect=fake_convert)),
     ):
@@ -78,6 +80,7 @@ def test_upload_files_rejects_dotdot_and_dot_filenames(tmp_path):
     with (
         patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
+        patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
         patch.object(uploads, "get_sandbox_provider", return_value=provider),
     ):
         # These filenames must be rejected outright