Files
deer-flow/backend/tests/test_uploads_router.py
greatmengqi b8bc80d89b refactor: extract shared skill installer and upload manager to harness (#1202)
* refactor: extract shared skill installer and upload manager to harness

Move duplicated business logic from Gateway routers and Client into
shared harness modules, eliminating code duplication.

New shared modules:
- deerflow.skills.installer: 6 functions (zip security, extraction, install)
- deerflow.uploads.manager: 7 functions (normalize, deduplicate, validate,
  list, delete, get_uploads_dir, ensure_uploads_dir)

Key improvements:
- SkillAlreadyExistsError replaces stringly-typed 409 status routing
- normalize_filename rejects backslash-containing filenames
- Read paths (list/delete) no longer mkdir via get_uploads_dir
- Write paths use ensure_uploads_dir for explicit directory creation
- list_files_in_dir does stat inside scandir context (no re-stat)
- install_skill_from_archive uses single is_file() check (one syscall)
- Fix agent config key not reset on update_mcp_config/update_skill

Tests: 42 new (22 installer + 20 upload manager) + client hardening

* refactor: centralize upload URL construction and clean up installer

- Extract upload_virtual_path(), upload_artifact_url(), enrich_file_listing()
  into shared manager.py, eliminating 6 duplicated URL constructions across
  Gateway router and Client
- Derive all upload URLs from VIRTUAL_PATH_PREFIX constant instead of
  hardcoded "mnt/user-data/uploads" strings
- Eliminate TOCTOU pre-checks and double file read in installer — single
  ZipFile() open with exception handling replaces is_file() + is_zipfile()
  + ZipFile() sequence
- Add missing re-exports: ensure_uploads_dir in uploads/__init__.py,
  SkillAlreadyExistsError in skills/__init__.py
- Remove redundant .lower() on already-lowercase CONVERTIBLE_EXTENSIONS
- Hoist sandbox_uploads_dir(thread_id) before loop in uploads router

* fix: add input validation for thread_id and filename length

- Reject thread_id containing unsafe filesystem characters (only allow
  alphanumeric, hyphens, underscores, dots) — prevents 500 on inputs
  like <script> or shell metacharacters
- Reject filenames longer than 255 bytes (OS limit) in normalize_filename
- Gateway upload router maps ValueError to 400 for invalid thread_id

* fix: address PR review — symlink safety, input validation coverage, error ordering

- list_files_in_dir: use follow_symlinks=False to prevent symlink metadata
  leakage; check is_dir() instead of exists() for non-directory paths
- install_skill_from_archive: restore is_file() pre-check before extension
  validation so error messages match the documented exception contract
- validate_thread_id: move from ensure_uploads_dir to get_uploads_dir so
  all entry points (upload/list/delete) are protected
- delete_uploaded_file: catch ValueError from thread_id validation (was 500)
- requires_llm marker: also skip when OPENAI_API_KEY is unset
- e2e fixture: update TitleMiddleware exclusion comment (kept filtering —
  middleware triggers extra LLM calls that add non-determinism to tests)

* chore: revert uv.lock to main — no dependency changes in this PR

* fix: use monkeypatch for global config in e2e fixture to prevent test pollution

The e2e_env fixture was calling set_title_config() and
set_summarization_config() directly, which mutated global singletons
without automatic cleanup. When pytest ran test_client_e2e.py before
test_title_middleware_core_logic.py, the leaked enabled=False caused
5 title tests to fail in CI.

Switched to monkeypatch.setattr on the module-level private variables
so pytest restores the originals after each test.

* fix: address code review — URL encoding, API consistency, test isolation

- upload_artifact_url: percent-encode filename to handle spaces/#/?
- deduplicate_filename: mutate seen set in place (caller no longer
  needs manual .add() — less error-prone API)
- list_files_in_dir: document that size is int, enrich stringifies
- e2e fixture: monkeypatch _app_config instead of set_app_config()
  to prevent global singleton pollution (same pattern as title/summarization fix)
- _make_e2e_config: read LLM connection details from env vars so
  external contributors can override defaults
- Update tests to match new deduplicate_filename contract

* docs: rewrite RFC in English and add alternatives/breaking changes sections

* fix: address code review feedback on PR #1202

- Rename deduplicate_filename to claim_unique_filename to make
  the in-place set mutation explicit in the function name
- Replace PermissionError with PathTraversalError(ValueError) for
  path traversal detection — malformed input is 400, not 403

* fix: set _app_config_is_custom in e2e test fixture to prevent config.yaml lookup in CI

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
2026-03-25 16:28:33 +08:00

116 lines
4.7 KiB
Python

import asyncio
from io import BytesIO
from pathlib import Path
from unittest.mock import AsyncMock, MagicMock, patch
from fastapi import UploadFile
from app.gateway.routers import uploads
def test_upload_files_writes_thread_storage_and_skips_local_sandbox_sync(tmp_path):
thread_uploads_dir = tmp_path / "uploads"
thread_uploads_dir.mkdir(parents=True)
provider = MagicMock()
provider.acquire.return_value = "local"
sandbox = MagicMock()
provider.get.return_value = sandbox
with (
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
patch.object(uploads, "get_sandbox_provider", return_value=provider),
):
file = UploadFile(filename="notes.txt", file=BytesIO(b"hello uploads"))
result = asyncio.run(uploads.upload_files("thread-local", files=[file]))
assert result.success is True
assert len(result.files) == 1
assert result.files[0]["filename"] == "notes.txt"
assert (thread_uploads_dir / "notes.txt").read_bytes() == b"hello uploads"
sandbox.update_file.assert_not_called()
def test_upload_files_syncs_non_local_sandbox_and_marks_markdown_file(tmp_path):
thread_uploads_dir = tmp_path / "uploads"
thread_uploads_dir.mkdir(parents=True)
provider = MagicMock()
provider.acquire.return_value = "aio-1"
sandbox = MagicMock()
provider.get.return_value = sandbox
async def fake_convert(file_path: Path) -> Path:
md_path = file_path.with_suffix(".md")
md_path.write_text("converted", encoding="utf-8")
return md_path
with (
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
patch.object(uploads, "get_sandbox_provider", return_value=provider),
patch.object(uploads, "convert_file_to_markdown", AsyncMock(side_effect=fake_convert)),
):
file = UploadFile(filename="report.pdf", file=BytesIO(b"pdf-bytes"))
result = asyncio.run(uploads.upload_files("thread-aio", files=[file]))
assert result.success is True
assert len(result.files) == 1
file_info = result.files[0]
assert file_info["filename"] == "report.pdf"
assert file_info["markdown_file"] == "report.md"
assert (thread_uploads_dir / "report.pdf").read_bytes() == b"pdf-bytes"
assert (thread_uploads_dir / "report.md").read_text(encoding="utf-8") == "converted"
sandbox.update_file.assert_any_call("/mnt/user-data/uploads/report.pdf", b"pdf-bytes")
sandbox.update_file.assert_any_call("/mnt/user-data/uploads/report.md", b"converted")
def test_upload_files_rejects_dotdot_and_dot_filenames(tmp_path):
thread_uploads_dir = tmp_path / "uploads"
thread_uploads_dir.mkdir(parents=True)
provider = MagicMock()
provider.acquire.return_value = "local"
sandbox = MagicMock()
provider.get.return_value = sandbox
with (
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
patch.object(uploads, "get_sandbox_provider", return_value=provider),
):
# These filenames must be rejected outright
for bad_name in ["..", "."]:
file = UploadFile(filename=bad_name, file=BytesIO(b"data"))
result = asyncio.run(uploads.upload_files("thread-local", files=[file]))
assert result.success is True
assert result.files == [], f"Expected no files for unsafe filename {bad_name!r}"
# Path-traversal prefixes are stripped to the basename and accepted safely
file = UploadFile(filename="../etc/passwd", file=BytesIO(b"data"))
result = asyncio.run(uploads.upload_files("thread-local", files=[file]))
assert result.success is True
assert len(result.files) == 1
assert result.files[0]["filename"] == "passwd"
# Only the safely normalised file should exist
assert [f.name for f in thread_uploads_dir.iterdir()] == ["passwd"]
def test_delete_uploaded_file_removes_generated_markdown_companion(tmp_path):
thread_uploads_dir = tmp_path / "uploads"
thread_uploads_dir.mkdir(parents=True)
(thread_uploads_dir / "report.pdf").write_bytes(b"pdf-bytes")
(thread_uploads_dir / "report.md").write_text("converted", encoding="utf-8")
with patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir):
result = asyncio.run(uploads.delete_uploaded_file("thread-aio", "report.pdf"))
assert result == {"success": True, "message": "Deleted report.pdf"}
assert not (thread_uploads_dir / "report.pdf").exists()
assert not (thread_uploads_dir / "report.md").exists()