refactor: extract shared skill installer and upload manager to harness (#1202)

* refactor: extract shared skill installer and upload manager to harness

  Move duplicated business logic from Gateway routers and Client into shared harness modules, eliminating code duplication.

  New shared modules:
  - deerflow.skills.installer: 6 functions (zip security, extraction, install)
  - deerflow.uploads.manager: 7 functions (normalize, deduplicate, validate, list, delete, get_uploads_dir, ensure_uploads_dir)

  Key improvements:
  - SkillAlreadyExistsError replaces stringly-typed 409 status routing
  - normalize_filename rejects backslash-containing filenames
  - Read paths (list/delete) no longer mkdir via get_uploads_dir
  - Write paths use ensure_uploads_dir for explicit directory creation
  - list_files_in_dir does stat inside scandir context (no re-stat)
  - install_skill_from_archive uses single is_file() check (one syscall)
  - Fix agent config key not reset on update_mcp_config/update_skill

  Tests: 42 new (22 installer + 20 upload manager) + client hardening

* refactor: centralize upload URL construction and clean up installer

  - Extract upload_virtual_path(), upload_artifact_url(), enrich_file_listing() into shared manager.py, eliminating 6 duplicated URL constructions across Gateway router and Client
  - Derive all upload URLs from VIRTUAL_PATH_PREFIX constant instead of hardcoded "mnt/user-data/uploads" strings
  - Eliminate TOCTOU pre-checks and double file read in installer: a single ZipFile() open with exception handling replaces the is_file() + is_zipfile() + ZipFile() sequence
  - Add missing re-exports: ensure_uploads_dir in uploads/__init__.py, SkillAlreadyExistsError in skills/__init__.py
  - Remove redundant .lower() on already-lowercase CONVERTIBLE_EXTENSIONS
  - Hoist sandbox_uploads_dir(thread_id) before loop in uploads router

* fix: add input validation for thread_id and filename length

  - Reject thread_id containing unsafe filesystem characters (only allow alphanumeric, hyphens, underscores, dots); prevents 500 on inputs like <script> or shell metacharacters
  - Reject filenames longer than 255 bytes (OS limit) in normalize_filename
  - Gateway upload router maps ValueError to 400 for invalid thread_id

* fix: address PR review (symlink safety, input validation coverage, error ordering)

  - list_files_in_dir: use follow_symlinks=False to prevent symlink metadata leakage; check is_dir() instead of exists() for non-directory paths
  - install_skill_from_archive: restore is_file() pre-check before extension validation so error messages match the documented exception contract
  - validate_thread_id: move from ensure_uploads_dir to get_uploads_dir so all entry points (upload/list/delete) are protected
  - delete_uploaded_file: catch ValueError from thread_id validation (was 500)
  - requires_llm marker: also skip when OPENAI_API_KEY is unset
  - e2e fixture: update TitleMiddleware exclusion comment (kept filtering; middleware triggers extra LLM calls that add non-determinism to tests)

* chore: revert uv.lock to main (no dependency changes in this PR)

* fix: use monkeypatch for global config in e2e fixture to prevent test pollution

  The e2e_env fixture was calling set_title_config() and set_summarization_config() directly, which mutated global singletons without automatic cleanup. When pytest ran test_client_e2e.py before test_title_middleware_core_logic.py, the leaked enabled=False caused 5 title tests to fail in CI. Switched to monkeypatch.setattr on the module-level private variables so pytest restores the originals after each test.

* fix: address code review (URL encoding, API consistency, test isolation)

  - upload_artifact_url: percent-encode filename to handle spaces/#/?
  - deduplicate_filename: mutate seen set in place (caller no longer needs manual .add(); less error-prone API)
  - list_files_in_dir: document that size is int, enrich stringifies
  - e2e fixture: monkeypatch _app_config instead of set_app_config() to prevent global singleton pollution (same pattern as title/summarization fix)
  - _make_e2e_config: read LLM connection details from env vars so external contributors can override defaults
  - Update tests to match new deduplicate_filename contract

* docs: rewrite RFC in English and add alternatives/breaking changes sections

* fix: address code review feedback on PR #1202

  - Rename deduplicate_filename to claim_unique_filename to make the in-place set mutation explicit in the function name
  - Replace PermissionError with PathTraversalError(ValueError) for path traversal detection: malformed input is 400, not 403

* fix: set _app_config_is_custom in e2e test fixture to prevent config.yaml lookup in CI

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
2026-04-18 03:54:46 +08:00 · 2026-03-25 16:28:33 +08:00
parent ec46ae075d
commit b8bc80d89b
14 changed files with 2591 additions and 567 deletions
--- a/backend/packages/harness/deerflow/client.py
+++ b/backend/packages/harness/deerflow/client.py
@@ -19,12 +19,9 @@ import asyncio
 import json
 import logging
 import mimetypes
-import os
-import re
 import shutil
 import tempfile
 import uuid
-import zipfile
 from collections.abc import Generator
 from dataclasses import dataclass, field
 from pathlib import Path
@@ -42,6 +39,17 @@ from deerflow.config.app_config import get_app_config, reload_app_config
 from deerflow.config.extensions_config import ExtensionsConfig, SkillStateConfig, get_extensions_config, reload_extensions_config
 from deerflow.config.paths import get_paths
 from deerflow.models import create_chat_model
+from deerflow.skills.installer import install_skill_from_archive
+from deerflow.uploads.manager import (
+    claim_unique_filename,
+    delete_file_safe,
+    enrich_file_listing,
+    ensure_uploads_dir,
+    get_uploads_dir,
+    list_files_in_dir,
+    upload_artifact_url,
+    upload_virtual_path,
+)

 logger = logging.getLogger(__name__)

@@ -566,6 +574,7 @@ class DeerFlowClient:
        self._atomic_write_json(config_path, config_data)

        self._agent = None
+        self._agent_config_key = None
        reloaded = reload_extensions_config()
        return {"mcp_servers": {name: server.model_dump() for name, server in reloaded.mcp_servers.items()}}

@@ -631,6 +640,7 @@ class DeerFlowClient:
        self._atomic_write_json(config_path, config_data)

        self._agent = None
+        self._agent_config_key = None
        reload_extensions_config()

        updated = next((s for s in load_skills(enabled_only=False) if s.name == name), None)
@@ -657,56 +667,7 @@ class DeerFlowClient:
            FileNotFoundError: If the file does not exist.
            ValueError: If the file is invalid.
        """
-        from deerflow.skills.loader import get_skills_root_path
-        from deerflow.skills.validation import _validate_skill_frontmatter
-
-        path = Path(skill_path)
-        if not path.exists():
-            raise FileNotFoundError(f"Skill file not found: {skill_path}")
-        if not path.is_file():
-            raise ValueError(f"Path is not a file: {skill_path}")
-        if path.suffix != ".skill":
-            raise ValueError("File must have .skill extension")
-        if not zipfile.is_zipfile(path):
-            raise ValueError("File is not a valid ZIP archive")
-
-        skills_root = get_skills_root_path()
-        custom_dir = skills_root / "custom"
-        custom_dir.mkdir(parents=True, exist_ok=True)
-
-        with tempfile.TemporaryDirectory() as tmp:
-            tmp_path = Path(tmp)
-            with zipfile.ZipFile(path, "r") as zf:
-                total_size = sum(info.file_size for info in zf.infolist())
-                if total_size > 100 * 1024 * 1024:
-                    raise ValueError("Skill archive too large when extracted (>100MB)")
-                for info in zf.infolist():
-                    if Path(info.filename).is_absolute() or ".." in Path(info.filename).parts:
-                        raise ValueError(f"Unsafe path in archive: {info.filename}")
-                zf.extractall(tmp_path)
-            for p in tmp_path.rglob("*"):
-                if p.is_symlink():
-                    p.unlink()
-
-            items = list(tmp_path.iterdir())
-            if not items:
-                raise ValueError("Skill archive is empty")
-
-            skill_dir = items[0] if len(items) == 1 and items[0].is_dir() else tmp_path
-
-            is_valid, message, skill_name = _validate_skill_frontmatter(skill_dir)
-            if not is_valid:
-                raise ValueError(f"Invalid skill: {message}")
-            if not re.fullmatch(r"[a-zA-Z0-9_-]+", skill_name):
-                raise ValueError(f"Invalid skill name: {skill_name}")
-
-            target = custom_dir / skill_name
-            if target.exists():
-                raise ValueError(f"Skill '{skill_name}' already exists")
-
-            shutil.copytree(skill_dir, target)
-
-        return {"success": True, "skill_name": skill_name, "message": f"Skill '{skill_name}' installed successfully"}
+        return install_skill_from_archive(skill_path)

    # ------------------------------------------------------------------
    # Public API — memory management
@@ -756,13 +717,6 @@ class DeerFlowClient:
    # Public API — file uploads
    # ------------------------------------------------------------------

-    @staticmethod
-    def _get_uploads_dir(thread_id: str) -> Path:
-        """Get (and create) the uploads directory for a thread."""
-        base = get_paths().sandbox_uploads_dir(thread_id)
-        base.mkdir(parents=True, exist_ok=True)
-        return base
-
    def upload_files(self, thread_id: str, files: list[str | Path]) -> dict:
        """Upload local files into a thread's uploads directory.

@@ -784,7 +738,7 @@ class DeerFlowClient:

        # Validate all files upfront to avoid partial uploads.
        resolved_files = []
-        convertible_extensions = {ext.lower() for ext in CONVERTIBLE_EXTENSIONS}
+        seen_names: set[str] = set()
        has_convertible_file = False
        for f in files:
            p = Path(f)
@@ -792,11 +746,12 @@ class DeerFlowClient:
                raise FileNotFoundError(f"File not found: {f}")
            if not p.is_file():
                raise ValueError(f"Path is not a file: {f}")
-            resolved_files.append(p)
-            if not has_convertible_file and p.suffix.lower() in convertible_extensions:
+            dest_name = claim_unique_filename(p.name, seen_names)
+            resolved_files.append((p, dest_name))
+            if not has_convertible_file and p.suffix.lower() in CONVERTIBLE_EXTENSIONS:
                has_convertible_file = True

-        uploads_dir = self._get_uploads_dir(thread_id)
+        uploads_dir = ensure_uploads_dir(thread_id)
        uploaded_files: list[dict] = []

        conversion_pool = None
@@ -816,19 +771,21 @@ class DeerFlowClient:
            return asyncio.run(convert_file_to_markdown(path))

        try:
-            for src_path in resolved_files:
-                dest = uploads_dir / src_path.name
+            for src_path, dest_name in resolved_files:
+                dest = uploads_dir / dest_name
                shutil.copy2(src_path, dest)

                info: dict[str, Any] = {
-                    "filename": src_path.name,
+                    "filename": dest_name,
                    "size": str(dest.stat().st_size),
                    "path": str(dest),
-                    "virtual_path": f"/mnt/user-data/uploads/{src_path.name}",
-                    "artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{src_path.name}",
+                    "virtual_path": upload_virtual_path(dest_name),
+                    "artifact_url": upload_artifact_url(thread_id, dest_name),
                }
+                if dest_name != src_path.name:
+                    info["original_filename"] = src_path.name

-                if src_path.suffix.lower() in convertible_extensions:
+                if src_path.suffix.lower() in CONVERTIBLE_EXTENSIONS:
                    try:
                        if conversion_pool is not None:
                            md_path = conversion_pool.submit(_convert_in_thread, dest).result()
@@ -844,8 +801,9 @@ class DeerFlowClient:

                    if md_path is not None:
                        info["markdown_file"] = md_path.name
-                        info["markdown_virtual_path"] = f"/mnt/user-data/uploads/{md_path.name}"
-                        info["markdown_artifact_url"] = f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{md_path.name}"
+                        info["markdown_path"] = str(uploads_dir / md_path.name)
+                        info["markdown_virtual_path"] = upload_virtual_path(md_path.name)
+                        info["markdown_artifact_url"] = upload_artifact_url(thread_id, md_path.name)

                uploaded_files.append(info)
        finally:
@@ -868,29 +826,9 @@ class DeerFlowClient:
            Dict with "files" and "count" keys, matching the Gateway API
            ``list_uploaded_files`` response.
        """
-        uploads_dir = self._get_uploads_dir(thread_id)
-        if not uploads_dir.exists():
-            return {"files": [], "count": 0}
-
-        files = []
-        with os.scandir(uploads_dir) as entries:
-            file_entries = [entry for entry in entries if entry.is_file()]
-
-        for entry in sorted(file_entries, key=lambda item: item.name):
-            stat = entry.stat()
-            filename = entry.name
-            files.append(
-                {
-                    "filename": filename,
-                    "size": str(stat.st_size),
-                    "path": str(Path(entry.path)),
-                    "virtual_path": f"/mnt/user-data/uploads/{filename}",
-                    "artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{filename}",
-                    "extension": Path(filename).suffix,
-                    "modified": stat.st_mtime,
-                }
-            )
-        return {"files": files, "count": len(files)}
+        uploads_dir = get_uploads_dir(thread_id)
+        result = list_files_in_dir(uploads_dir)
+        return enrich_file_listing(result, thread_id)

    def delete_upload(self, thread_id: str, filename: str) -> dict:
        """Delete a file from a thread's uploads directory.
@@ -907,19 +845,10 @@ class DeerFlowClient:
            FileNotFoundError: If the file does not exist.
            PermissionError: If path traversal is detected.
        """
-        uploads_dir = self._get_uploads_dir(thread_id)
-        file_path = (uploads_dir / filename).resolve()
+        from deerflow.utils.file_conversion import CONVERTIBLE_EXTENSIONS

-        try:
-            file_path.relative_to(uploads_dir.resolve())
-        except ValueError as exc:
-            raise PermissionError("Access denied: path traversal detected") from exc
-
-        if not file_path.is_file():
-            raise FileNotFoundError(f"File not found: {filename}")
-
-        file_path.unlink()
-        return {"success": True, "message": f"Deleted {filename}"}
+        uploads_dir = get_uploads_dir(thread_id)
+        return delete_file_safe(uploads_dir, filename, convertible_extensions=CONVERTIBLE_EXTENSIONS)

    # ------------------------------------------------------------------
    # Public API — artifacts
@@ -939,19 +868,13 @@ class DeerFlowClient:
            FileNotFoundError: If the artifact does not exist.
            ValueError: If the path is invalid.
        """
-        virtual_prefix = "mnt/user-data"
-        clean_path = path.lstrip("/")
-        if not clean_path.startswith(virtual_prefix):
-            raise ValueError(f"Path must start with /{virtual_prefix}")
-
-        relative = clean_path[len(virtual_prefix) :].lstrip("/")
-        base_dir = get_paths().sandbox_user_data_dir(thread_id)
-        actual = (base_dir / relative).resolve()
-
        try:
-            actual.relative_to(base_dir.resolve())
+            actual = get_paths().resolve_virtual_path(thread_id, path)
        except ValueError as exc:
-            raise PermissionError("Access denied: path traversal detected") from exc
+            if "traversal" in str(exc):
+                from deerflow.uploads.manager import PathTraversalError
+                raise PathTraversalError("Path traversal detected") from exc
+            raise
        if not actual.exists():
            raise FileNotFoundError(f"Artifact not found: {path}")
        if not actual.is_file():
--- a/backend/packages/harness/deerflow/skills/__init__.py
+++ b/backend/packages/harness/deerflow/skills/__init__.py
@@ -1,5 +1,14 @@
+from .installer import SkillAlreadyExistsError, install_skill_from_archive
 from .loader import get_skills_root_path, load_skills
 from .types import Skill
 from .validation import ALLOWED_FRONTMATTER_PROPERTIES, _validate_skill_frontmatter

-__all__ = ["load_skills", "get_skills_root_path", "Skill", "ALLOWED_FRONTMATTER_PROPERTIES", "_validate_skill_frontmatter"]
+__all__ = [
+    "load_skills",
+    "get_skills_root_path",
+    "Skill",
+    "ALLOWED_FRONTMATTER_PROPERTIES",
+    "_validate_skill_frontmatter",
+    "install_skill_from_archive",
+    "SkillAlreadyExistsError",
+]
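
Because `SkillAlreadyExistsError` subclasses `ValueError`, any caller that turns harness exceptions into HTTP statuses has to test the subclass before the base class. The following is a minimal sketch of that ordering, not the Gateway's actual handler: `status_for` is a hypothetical helper, the two exception classes are local stand-ins so the snippet runs on its own, and the 404 for `FileNotFoundError` is an assumption (the commit message only states 409 for an existing skill and 400 for malformed input).

```python
class SkillAlreadyExistsError(ValueError):
    """Local stand-in for deerflow.skills.installer.SkillAlreadyExistsError."""


class PathTraversalError(ValueError):
    """Local stand-in for deerflow.uploads.manager.PathTraversalError."""


def status_for(exc: Exception) -> int:
    """Hypothetical helper: choose an HTTP status for a harness exception."""
    # Check the subclass first; it is also a ValueError and would
    # otherwise be reported as a generic 400.
    if isinstance(exc, SkillAlreadyExistsError):
        return 409
    if isinstance(exc, ValueError):  # includes PathTraversalError
        return 400
    if isinstance(exc, FileNotFoundError):
        return 404  # assumption, not stated in the commit
    return 500


for exc in (
    SkillAlreadyExistsError("Skill 'demo' already exists"),
    PathTraversalError("Path traversal detected"),
    ValueError("File must have .skill extension"),
    FileNotFoundError("Skill file not found: demo.skill"),
):
    print(type(exc).__name__, status_for(exc))
```

Swapping the first two `isinstance` checks would silently downgrade the duplicate-skill case to 400, which is the kind of mistake the typed exception is meant to make visible.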
--- a/backend/packages/harness/deerflow/skills/installer.py
+++ b/backend/packages/harness/deerflow/skills/installer.py
@@ -0,0 +1,176 @@
+"""Shared skill archive installation logic.
+
+Pure business logic — no FastAPI/HTTP dependencies.
+Both Gateway and Client delegate to these functions.
+"""
+
+import logging
+import shutil
+import stat
+import tempfile
+import zipfile
+from pathlib import Path
+
+from deerflow.skills.loader import get_skills_root_path
+from deerflow.skills.validation import _validate_skill_frontmatter
+
+logger = logging.getLogger(__name__)
+
+
+class SkillAlreadyExistsError(ValueError):
+    """Raised when a skill with the same name is already installed."""
+
+
+def is_unsafe_zip_member(info: zipfile.ZipInfo) -> bool:
+    """Return True if the zip member path is absolute or attempts directory traversal."""
+    name = info.filename
+    if not name:
+        return False
+    path = Path(name)
+    if path.is_absolute():
+        return True
+    if ".." in path.parts:
+        return True
+    return False
+
+
+def is_symlink_member(info: zipfile.ZipInfo) -> bool:
+    """Detect symlinks based on the external attributes stored in the ZipInfo."""
+    mode = info.external_attr >> 16
+    return stat.S_ISLNK(mode)
+
+
+def should_ignore_archive_entry(path: Path) -> bool:
+    """Return True for macOS metadata dirs and dotfiles."""
+    return path.name.startswith(".") or path.name == "__MACOSX"
+
+
+def resolve_skill_dir_from_archive(temp_path: Path) -> Path:
+    """Locate the skill root directory from extracted archive contents.
+
+    Filters out macOS metadata (__MACOSX) and dotfiles (.DS_Store).
+
+    Returns:
+        Path to the skill directory.
+
+    Raises:
+        ValueError: If the archive is empty after filtering.
+    """
+    items = [p for p in temp_path.iterdir() if not should_ignore_archive_entry(p)]
+    if not items:
+        raise ValueError("Skill archive is empty")
+    if len(items) == 1 and items[0].is_dir():
+        return items[0]
+    return temp_path
+
+
+def safe_extract_skill_archive(
+    zip_ref: zipfile.ZipFile,
+    dest_path: Path,
+    max_total_size: int = 512 * 1024 * 1024,
+) -> None:
+    """Safely extract a skill archive with security protections.
+
+    Protections:
+    - Reject absolute paths and directory traversal (..).
+    - Skip symlink entries instead of materialising them.
+    - Enforce a hard limit on total uncompressed size (zip bomb defence).
+
+    Raises:
+        ValueError: If unsafe members or size limit exceeded.
+    """
+    dest_root = dest_path.resolve()
+    total_written = 0
+
+    for info in zip_ref.infolist():
+        if is_unsafe_zip_member(info):
+            raise ValueError(f"Archive contains unsafe member path: {info.filename!r}")
+
+        if is_symlink_member(info):
+            logger.warning("Skipping symlink entry in skill archive: %s", info.filename)
+            continue
+
+        member_path = dest_root / info.filename
+        if not member_path.resolve().is_relative_to(dest_root):
+            raise ValueError(f"Zip entry escapes destination: {info.filename!r}")
+        member_path.parent.mkdir(parents=True, exist_ok=True)
+
+        if info.is_dir():
+            member_path.mkdir(parents=True, exist_ok=True)
+            continue
+
+        with zip_ref.open(info) as src, member_path.open("wb") as dst:
+            while chunk := src.read(65536):
+                total_written += len(chunk)
+                if total_written > max_total_size:
+                    raise ValueError("Skill archive is too large or appears highly compressed.")
+                dst.write(chunk)
+
+
+def install_skill_from_archive(
+    zip_path: str | Path,
+    *,
+    skills_root: Path | None = None,
+) -> dict:
+    """Install a skill from a .skill archive (ZIP).
+
+    Args:
+        zip_path: Path to the .skill file.
+        skills_root: Override the skills root directory. If None, uses
+            the default from config.
+
+    Returns:
+        Dict with success, skill_name, message.
+
+    Raises:
+        FileNotFoundError: If the file does not exist.
+        ValueError: If the file is invalid (wrong extension, bad ZIP,
+            invalid frontmatter, duplicate name).
+    """
+    logger.info("Installing skill from %s", zip_path)
+    path = Path(zip_path)
+    if not path.is_file():
+        if not path.exists():
+            raise FileNotFoundError(f"Skill file not found: {zip_path}")
+        raise ValueError(f"Path is not a file: {zip_path}")
+    if path.suffix != ".skill":
+        raise ValueError("File must have .skill extension")
+
+    if skills_root is None:
+        skills_root = get_skills_root_path()
+    custom_dir = skills_root / "custom"
+    custom_dir.mkdir(parents=True, exist_ok=True)
+
+    with tempfile.TemporaryDirectory() as tmp:
+        tmp_path = Path(tmp)
+
+        try:
+            zf = zipfile.ZipFile(path, "r")
+        except FileNotFoundError:
+            raise FileNotFoundError(f"Skill file not found: {zip_path}") from None
+        except (zipfile.BadZipFile, IsADirectoryError):
+            raise ValueError("File is not a valid ZIP archive") from None
+
+        with zf:
+            safe_extract_skill_archive(zf, tmp_path)
+
+        skill_dir = resolve_skill_dir_from_archive(tmp_path)
+
+        is_valid, message, skill_name = _validate_skill_frontmatter(skill_dir)
+        if not is_valid:
+            raise ValueError(f"Invalid skill: {message}")
+        if not skill_name or "/" in skill_name or "\\" in skill_name or ".." in skill_name:
+            raise ValueError(f"Invalid skill name: {skill_name}")
+
+        target = custom_dir / skill_name
+        if target.exists():
+            raise SkillAlreadyExistsError(f"Skill '{skill_name}' already exists")
+
+        shutil.copytree(skill_dir, target)
+        logger.info("Skill %r installed to %s", skill_name, target)
+
+    return {
+        "success": True,
+        "skill_name": skill_name,
+        "message": f"Skill '{skill_name}' installed successfully",
+    }
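
The per-member checks in `safe_extract_skill_archive` can be exercised without touching the filesystem, since they only inspect `ZipInfo` metadata. The sketch below re-declares the two predicates so it runs standalone; it uses `PurePosixPath` instead of `Path` so the result does not depend on the host OS (a deliberate deviation from the diff), and `classify` and `make_symlink_info` are hypothetical helpers added for illustration.

```python
import stat
import zipfile
from pathlib import PurePosixPath


def is_unsafe_zip_member(info: zipfile.ZipInfo) -> bool:
    """True if the member path is absolute or contains a '..' component."""
    path = PurePosixPath(info.filename)
    return path.is_absolute() or ".." in path.parts


def is_symlink_member(info: zipfile.ZipInfo) -> bool:
    """True if the Unix mode stored in the upper 16 bits marks a symlink."""
    return stat.S_ISLNK(info.external_attr >> 16)


def classify(info: zipfile.ZipInfo) -> str:
    """Hypothetical helper mirroring the order of checks during extraction."""
    if is_unsafe_zip_member(info):
        return "reject"   # the installer raises ValueError here
    if is_symlink_member(info):
        return "skip"     # the installer logs a warning and continues
    return "extract"


def make_symlink_info(name: str) -> zipfile.ZipInfo:
    """Build a ZipInfo whose external attributes describe a symlink."""
    info = zipfile.ZipInfo(name)
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    return info


members = [
    zipfile.ZipInfo("demo/SKILL.md"),
    zipfile.ZipInfo("../evil.txt"),
    zipfile.ZipInfo("/etc/passwd"),
    make_symlink_info("demo/link"),
]
for member in members:
    print(f"{member.filename}: {classify(member)}")
```

The unsafe-path check runs before the symlink check, so a symlink entry with a traversal path is rejected outright rather than skipped.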
--- a/backend/packages/harness/deerflow/uploads/__init__.py
+++ b/backend/packages/harness/deerflow/uploads/__init__.py
@@ -0,0 +1,29 @@
+from .manager import (
+    PathTraversalError,
+    claim_unique_filename,
+    delete_file_safe,
+    enrich_file_listing,
+    ensure_uploads_dir,
+    get_uploads_dir,
+    list_files_in_dir,
+    normalize_filename,
+    upload_artifact_url,
+    upload_virtual_path,
+    validate_path_traversal,
+    validate_thread_id,
+)
+
+__all__ = [
+    "get_uploads_dir",
+    "ensure_uploads_dir",
+    "normalize_filename",
+    "PathTraversalError",
+    "claim_unique_filename",
+    "validate_path_traversal",
+    "list_files_in_dir",
+    "delete_file_safe",
+    "upload_artifact_url",
+    "upload_virtual_path",
+    "enrich_file_listing",
+    "validate_thread_id",
+]
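
The `validate_thread_id` export above is an allow-list check on the characters of a thread ID. A minimal standalone sketch of that idea follows, reusing the character class from this commit; `is_safe_thread_id` is a hypothetical name, and the use of `fullmatch` is this sketch's own choice, made because `$` in a `match`-style pattern also matches just before a trailing newline.

```python
import re

# Same character class as the commit: letters, digits, dot, underscore, hyphen.
_SAFE_THREAD_ID = re.compile(r"[a-zA-Z0-9._-]+")


def is_safe_thread_id(thread_id: str) -> bool:
    """Hypothetical helper: True only if every character is on the allow-list."""
    # fullmatch requires the whole string to match, so "abc\n" is rejected.
    return bool(_SAFE_THREAD_ID.fullmatch(thread_id))


for candidate in ["thread-42_a.b", "<script>", "a/b", "", "abc\n", ".."]:
    print(repr(candidate), is_safe_thread_id(candidate))
```

Note that the character class on its own still admits `.` and `..`. Whether that matters depends on how `sandbox_uploads_dir` builds the path, which this diff does not show.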
--- a/backend/packages/harness/deerflow/uploads/manager.py
+++ b/backend/packages/harness/deerflow/uploads/manager.py
@@ -0,0 +1,198 @@
+"""Shared upload management logic.
+
+Pure business logic — no FastAPI/HTTP dependencies.
+Both Gateway and Client delegate to these functions.
+"""
+
+import os
+import re
+from pathlib import Path
+from urllib.parse import quote
+
+from deerflow.config.paths import VIRTUAL_PATH_PREFIX, get_paths
+
+
+class PathTraversalError(ValueError):
+    """Raised when a path escapes its allowed base directory."""
+
+# thread_id must be alphanumeric, hyphens, underscores, or dots only.
+_SAFE_THREAD_ID = re.compile(r"^[a-zA-Z0-9._-]+$")
+
+
+def validate_thread_id(thread_id: str) -> None:
+    """Reject thread IDs containing characters unsafe for filesystem paths.
+
+    Raises:
+        ValueError: If thread_id is empty or contains unsafe characters.
+    """
+    if not thread_id or not _SAFE_THREAD_ID.fullmatch(thread_id):
+        raise ValueError(f"Invalid thread_id: {thread_id!r}")
+
+
+def get_uploads_dir(thread_id: str) -> Path:
+    """Return the uploads directory path for a thread (no side effects)."""
+    validate_thread_id(thread_id)
+    return get_paths().sandbox_uploads_dir(thread_id)
+
+
+def ensure_uploads_dir(thread_id: str) -> Path:
+    """Return the uploads directory for a thread, creating it if needed."""
+    base = get_uploads_dir(thread_id)
+    base.mkdir(parents=True, exist_ok=True)
+    return base
+
+
+def normalize_filename(filename: str) -> str:
+    """Sanitize a filename by extracting its basename.
+
+    Strips any directory components and rejects traversal patterns.
+
+    Args:
+        filename: Raw filename from user input (may contain path components).
+
+    Returns:
+        Safe filename (basename only).
+
+    Raises:
+        ValueError: If filename is empty or resolves to a traversal pattern.
+    """
+    if not filename:
+        raise ValueError("Filename is empty")
+    safe = Path(filename).name
+    if not safe or safe in {".", ".."}:
+        raise ValueError(f"Filename is unsafe: {filename!r}")
+    # Reject backslashes — on Linux Path.name keeps them as literal chars,
+    # but they indicate a Windows-style path that should be stripped or rejected.
+    if "\\" in safe:
+        raise ValueError(f"Filename contains backslash: {filename!r}")
+    if len(safe.encode("utf-8")) > 255:
+        raise ValueError(f"Filename too long: {len(safe.encode('utf-8'))} bytes (max 255)")
+    return safe
+
+
+def claim_unique_filename(name: str, seen: set[str]) -> str:
+    """Generate a unique filename by appending ``_N`` suffix on collision.
+
+    Automatically adds the returned name to *seen* so callers don't need to.
+
+    Args:
+        name: Candidate filename.
+        seen: Set of filenames already claimed (mutated in place).
+
+    Returns:
+        A filename not present in *seen* (already added to *seen*).
+    """
+    if name not in seen:
+        seen.add(name)
+        return name
+    stem, suffix = Path(name).stem, Path(name).suffix
+    counter = 1
+    candidate = f"{stem}_{counter}{suffix}"
+    while candidate in seen:
+        counter += 1
+        candidate = f"{stem}_{counter}{suffix}"
+    seen.add(candidate)
+    return candidate
+
+
+def validate_path_traversal(path: Path, base: Path) -> None:
+    """Verify that *path* is inside *base*.
+
+    Raises:
+        PathTraversalError: If a path traversal is detected.
+    """
+    try:
+        path.resolve().relative_to(base.resolve())
+    except ValueError:
+        raise PathTraversalError("Path traversal detected") from None
+
+
+def list_files_in_dir(directory: Path) -> dict:
+    """List files (not directories) in *directory*.
+
+    Args:
+        directory: Directory to scan.
+
+    Returns:
+        Dict with "files" list (sorted by name) and "count".
+        Each file entry has ``size`` as *int* (bytes).  Call
+        :func:`enrich_file_listing` to stringify sizes and add
+        virtual / artifact URLs.
+    """
+    if not directory.is_dir():
+        return {"files": [], "count": 0}
+
+    files = []
+    with os.scandir(directory) as entries:
+        for entry in sorted(entries, key=lambda e: e.name):
+            if not entry.is_file(follow_symlinks=False):
+                continue
+            st = entry.stat(follow_symlinks=False)
+            files.append({
+                "filename": entry.name,
+                "size": st.st_size,
+                "path": entry.path,
+                "extension": Path(entry.name).suffix,
+                "modified": st.st_mtime,
+            })
+    return {"files": files, "count": len(files)}
+
+
+def delete_file_safe(base_dir: Path, filename: str, *, convertible_extensions: set[str] | None = None) -> dict:
+    """Delete a file inside *base_dir* after path-traversal validation.
+
+    If *convertible_extensions* is provided and the file's extension matches,
+    the companion ``.md`` file is also removed (if it exists).
+
+    Args:
+        base_dir: Directory containing the file.
+        filename: Name of file to delete.
+        convertible_extensions: Lowercase extensions (e.g. ``{".pdf", ".docx"}``)
+            whose companion markdown should be cleaned up.
+
+    Returns:
+        Dict with success and message.
+
+    Raises:
+        FileNotFoundError: If the file does not exist.
+        PathTraversalError: If path traversal is detected.
+    """
+    file_path = (base_dir / filename).resolve()
+    validate_path_traversal(file_path, base_dir)
+
+    if not file_path.is_file():
+        raise FileNotFoundError(f"File not found: {filename}")
+
+    file_path.unlink()
+
+    # Clean up companion markdown generated during upload conversion.
+    if convertible_extensions and file_path.suffix.lower() in convertible_extensions:
+        file_path.with_suffix(".md").unlink(missing_ok=True)
+
+    return {"success": True, "message": f"Deleted {filename}"}
+
+
+def upload_artifact_url(thread_id: str, filename: str) -> str:
+    """Build the artifact URL for a file in a thread's uploads directory.
+
+    *filename* is percent-encoded so that spaces, ``#``, ``?`` etc. are safe.
+    """
+    return f"/api/threads/{thread_id}/artifacts{VIRTUAL_PATH_PREFIX}/uploads/{quote(filename, safe='')}"
+
+
+def upload_virtual_path(filename: str) -> str:
+    """Build the virtual path for a file in the uploads directory."""
+    return f"{VIRTUAL_PATH_PREFIX}/uploads/{filename}"
+
+
+def enrich_file_listing(result: dict, thread_id: str) -> dict:
+    """Add virtual paths, artifact URLs, and stringify sizes on a listing result.
+
+    Mutates *result* in place and returns it for convenience.
+    """
+    for f in result["files"]:
+        filename = f["filename"]
+        f["size"] = str(f["size"])
+        f["virtual_path"] = upload_virtual_path(filename)
+        f["artifact_url"] = upload_artifact_url(thread_id, filename)
+    return result
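
To see how name claiming and URL construction interact for a batch upload, the sketch below re-declares `claim_unique_filename`, `upload_virtual_path` and `upload_artifact_url` so it runs standalone. The value `VIRTUAL_PATH_PREFIX = "/mnt/user-data"` is an assumption inferred from the hardcoded strings removed in `client.py`; the real constant lives in `deerflow.config.paths`.

```python
from pathlib import Path
from urllib.parse import quote

VIRTUAL_PATH_PREFIX = "/mnt/user-data"  # assumed value, see lead-in


def claim_unique_filename(name: str, seen: set[str]) -> str:
    """Return a name not yet in *seen*, adding it to *seen* in place."""
    if name not in seen:
        seen.add(name)
        return name
    stem, suffix = Path(name).stem, Path(name).suffix
    counter = 1
    while (candidate := f"{stem}_{counter}{suffix}") in seen:
        counter += 1
    seen.add(candidate)
    return candidate


def upload_virtual_path(filename: str) -> str:
    """Sandbox-visible path; the filename is used verbatim."""
    return f"{VIRTUAL_PATH_PREFIX}/uploads/{filename}"


def upload_artifact_url(thread_id: str, filename: str) -> str:
    """HTTP-facing URL; the filename is percent-encoded."""
    return f"/api/threads/{thread_id}/artifacts{VIRTUAL_PATH_PREFIX}/uploads/{quote(filename, safe='')}"


seen: set[str] = set()
incoming = ["report.pdf", "report.pdf", "report.pdf", "my notes #1.txt"]
claimed = [claim_unique_filename(name, seen) for name in incoming]
print(claimed)
for name in claimed:
    print(upload_virtual_path(name), "|", upload_artifact_url("t1", name))
```

The virtual path keeps spaces and `#` as they are because it names a file inside the sandbox, while the artifact URL encodes them because it is parsed as a URL. Passing the same `seen` set across calls is what makes the claim atomic from the caller's point of view: there is no separate step in which the caller could forget to record the name.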