refactor: extract shared skill installer and upload manager to harness (#1202)

* refactor: extract shared skill installer and upload manager to harness

Move duplicated business logic from Gateway routers and Client into
shared harness modules, eliminating code duplication.

New shared modules:
- deerflow.skills.installer: 6 functions (zip security, extraction, install)
- deerflow.uploads.manager: 7 functions (normalize, deduplicate, validate,
  list, delete, get_uploads_dir, ensure_uploads_dir)

Key improvements:
- SkillAlreadyExistsError replaces stringly-typed 409 status routing
- normalize_filename rejects backslash-containing filenames
- Read paths (list/delete) no longer mkdir via get_uploads_dir
- Write paths use ensure_uploads_dir for explicit directory creation
- list_files_in_dir does stat inside scandir context (no re-stat)
- install_skill_from_archive uses single is_file() check (one syscall)
- Fix agent config key not reset on update_mcp_config/update_skill

Tests: 42 new (22 installer + 20 upload manager) + client hardening

* refactor: centralize upload URL construction and clean up installer

- Extract upload_virtual_path(), upload_artifact_url(), enrich_file_listing()
  into shared manager.py, eliminating 6 duplicated URL constructions across
  Gateway router and Client
- Derive all upload URLs from VIRTUAL_PATH_PREFIX constant instead of
  hardcoded "mnt/user-data/uploads" strings
- Eliminate TOCTOU pre-checks and double file read in installer — single
  ZipFile() open with exception handling replaces is_file() + is_zipfile()
  + ZipFile() sequence
- Add missing re-exports: ensure_uploads_dir in uploads/__init__.py,
  SkillAlreadyExistsError in skills/__init__.py
- Remove redundant .lower() on already-lowercase CONVERTIBLE_EXTENSIONS
- Hoist sandbox_uploads_dir(thread_id) before loop in uploads router

* fix: add input validation for thread_id and filename length

- Reject thread_id containing unsafe filesystem characters (only allow
  alphanumeric, hyphens, underscores, dots) — prevents 500 on inputs
  like <script> or shell metacharacters
- Reject filenames longer than 255 bytes (OS limit) in normalize_filename
- Gateway upload router maps ValueError to 400 for invalid thread_id

* fix: address PR review — symlink safety, input validation coverage, error ordering

- list_files_in_dir: use follow_symlinks=False to prevent symlink metadata
  leakage; check is_dir() instead of exists() for non-directory paths
- install_skill_from_archive: restore is_file() pre-check before extension
  validation so error messages match the documented exception contract
- validate_thread_id: move from ensure_uploads_dir to get_uploads_dir so
  all entry points (upload/list/delete) are protected
- delete_uploaded_file: catch ValueError from thread_id validation (was 500)
- requires_llm marker: also skip when OPENAI_API_KEY is unset
- e2e fixture: update TitleMiddleware exclusion comment (kept filtering —
  middleware triggers extra LLM calls that add non-determinism to tests)

* chore: revert uv.lock to main — no dependency changes in this PR

* fix: use monkeypatch for global config in e2e fixture to prevent test pollution

The e2e_env fixture was calling set_title_config() and
set_summarization_config() directly, which mutated global singletons
without automatic cleanup. When pytest ran test_client_e2e.py before
test_title_middleware_core_logic.py, the leaked enabled=False caused
5 title tests to fail in CI.

Switched to monkeypatch.setattr on the module-level private variables
so pytest restores the originals after each test.

* fix: address code review — URL encoding, API consistency, test isolation

- upload_artifact_url: percent-encode filename to handle spaces/#/?
- deduplicate_filename: mutate seen set in place (caller no longer
  needs manual .add() — less error-prone API)
- list_files_in_dir: document that size is int, enrich stringifies
- e2e fixture: monkeypatch _app_config instead of set_app_config()
  to prevent global singleton pollution (same pattern as title/summarization fix)
- _make_e2e_config: read LLM connection details from env vars so
  external contributors can override defaults
- Update tests to match new deduplicate_filename contract

* docs: rewrite RFC in English and add alternatives/breaking changes sections

* fix: address code review feedback on PR #1202

- Rename deduplicate_filename to claim_unique_filename to make
  the in-place set mutation explicit in the function name
- Replace PermissionError with PathTraversalError(ValueError) for
  path traversal detection — malformed input is 400, not 403

* fix: set _app_config_is_custom in e2e test fixture to prevent config.yaml lookup in CI

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
This commit is contained in:
greatmengqi
2026-03-25 16:28:33 +08:00
committed by GitHub
parent ec46ae075d
commit b8bc80d89b
14 changed files with 2591 additions and 567 deletions

View File

@@ -1,9 +1,5 @@
import json
import logging
import shutil
import stat
import tempfile
import zipfile
from pathlib import Path
from fastapi import APIRouter, HTTPException
@@ -12,80 +8,10 @@ from pydantic import BaseModel, Field
from app.gateway.path_utils import resolve_thread_virtual_path
from deerflow.config.extensions_config import ExtensionsConfig, SkillStateConfig, get_extensions_config, reload_extensions_config
from deerflow.skills import Skill, load_skills
from deerflow.skills.loader import get_skills_root_path
from deerflow.skills.validation import _validate_skill_frontmatter
from deerflow.skills.installer import SkillAlreadyExistsError, install_skill_from_archive
logger = logging.getLogger(__name__)
def _is_unsafe_zip_member(info: zipfile.ZipInfo) -> bool:
"""Return True if the zip member path is absolute or attempts directory traversal."""
name = info.filename
if not name:
return False
path = Path(name)
if path.is_absolute():
return True
if ".." in path.parts:
return True
return False
def _is_symlink_member(info: zipfile.ZipInfo) -> bool:
"""Detect symlinks based on the external attributes stored in the ZipInfo."""
# Upper 16 bits of external_attr contain the Unix file mode when created on Unix.
mode = info.external_attr >> 16
return stat.S_ISLNK(mode)
def _safe_extract_skill_archive(
zip_ref: zipfile.ZipFile,
dest_path: Path,
max_total_size: int = 512 * 1024 * 1024,
) -> None:
"""Safely extract a skill archive into dest_path with basic protections.
Protections:
- Reject absolute paths and directory traversal (..).
- Skip symlink entries instead of materialising them.
- Enforce a hard limit on total uncompressed size to mitigate zip bombs.
"""
dest_root = Path(dest_path).resolve()
total_size = 0
for info in zip_ref.infolist():
# Reject absolute paths or any path that attempts directory traversal.
if _is_unsafe_zip_member(info):
raise HTTPException(
status_code=400,
detail=f"Archive contains unsafe member path: {info.filename!r}",
)
# Skip any symlink entries instead of materialising them on disk.
if _is_symlink_member(info):
logger.warning("Skipping symlink entry in skill archive: %s", info.filename)
continue
# Basic unzip-bomb defence: bound the total uncompressed size we will write.
total_size += max(info.file_size, 0)
if total_size > max_total_size:
raise HTTPException(
status_code=400,
detail="Skill archive is too large or appears highly compressed.",
)
member_path = dest_root / info.filename
member_path_parent = member_path.parent
member_path_parent.mkdir(parents=True, exist_ok=True)
if info.is_dir():
member_path.mkdir(parents=True, exist_ok=True)
continue
with zip_ref.open(info) as src, open(member_path, "wb") as dst:
shutil.copyfileobj(src, dst)
router = APIRouter(prefix="/api", tags=["skills"])
@@ -126,19 +52,6 @@ class SkillInstallResponse(BaseModel):
message: str = Field(..., description="Installation result message")
def _should_ignore_archive_entry(path: Path) -> bool:
return path.name.startswith(".") or path.name == "__MACOSX"
def _resolve_skill_dir_from_archive_root(temp_path: Path) -> Path:
extracted_items = [item for item in temp_path.iterdir() if not _should_ignore_archive_entry(item)]
if len(extracted_items) == 0:
raise HTTPException(status_code=400, detail="Skill archive is empty")
if len(extracted_items) == 1 and extracted_items[0].is_dir():
return extracted_items[0]
return temp_path
def _skill_to_response(skill: Skill) -> SkillResponse:
"""Convert a Skill object to a SkillResponse."""
return SkillResponse(
@@ -157,37 +70,7 @@ def _skill_to_response(skill: Skill) -> SkillResponse:
description="Retrieve a list of all available skills from both public and custom directories.",
)
async def list_skills() -> SkillsListResponse:
"""List all available skills.
Returns all skills regardless of their enabled status.
Returns:
A list of all skills with their metadata.
Example Response:
```json
{
"skills": [
{
"name": "PDF Processing",
"description": "Extract and analyze PDF content",
"license": "MIT",
"category": "public",
"enabled": true
},
{
"name": "Frontend Design",
"description": "Generate frontend designs and components",
"license": null,
"category": "custom",
"enabled": false
}
]
}
```
"""
try:
# Load all skills (including disabled ones)
skills = load_skills(enabled_only=False)
return SkillsListResponse(skills=[_skill_to_response(skill) for skill in skills])
except Exception as e:
@@ -202,28 +85,6 @@ async def list_skills() -> SkillsListResponse:
description="Retrieve detailed information about a specific skill by its name.",
)
async def get_skill(skill_name: str) -> SkillResponse:
"""Get a specific skill by name.
Args:
skill_name: The name of the skill to retrieve.
Returns:
Skill information if found.
Raises:
HTTPException: 404 if skill not found.
Example Response:
```json
{
"name": "PDF Processing",
"description": "Extract and analyze PDF content",
"license": "MIT",
"category": "public",
"enabled": true
}
```
"""
try:
skills = load_skills(enabled_only=False)
skill = next((s for s in skills if s.name == skill_name), None)
@@ -246,76 +107,32 @@ async def get_skill(skill_name: str) -> SkillResponse:
description="Update a skill's enabled status by modifying the extensions_config.json file.",
)
async def update_skill(skill_name: str, request: SkillUpdateRequest) -> SkillResponse:
"""Update a skill's enabled status.
This will modify the extensions_config.json file to update the enabled state.
The SKILL.md file itself is not modified.
Args:
skill_name: The name of the skill to update.
request: The update request containing the new enabled status.
Returns:
The updated skill information.
Raises:
HTTPException: 404 if skill not found, 500 if update fails.
Example Request:
```json
{
"enabled": false
}
```
Example Response:
```json
{
"name": "PDF Processing",
"description": "Extract and analyze PDF content",
"license": "MIT",
"category": "public",
"enabled": false
}
```
"""
try:
# Find the skill to verify it exists
skills = load_skills(enabled_only=False)
skill = next((s for s in skills if s.name == skill_name), None)
if skill is None:
raise HTTPException(status_code=404, detail=f"Skill '{skill_name}' not found")
# Get or create config path
config_path = ExtensionsConfig.resolve_config_path()
if config_path is None:
# Create new config file in parent directory (project root)
config_path = Path.cwd().parent / "extensions_config.json"
logger.info(f"No existing extensions config found. Creating new config at: {config_path}")
# Load current configuration
extensions_config = get_extensions_config()
# Update the skill's enabled status
extensions_config.skills[skill_name] = SkillStateConfig(enabled=request.enabled)
# Convert to JSON format (preserve MCP servers config)
config_data = {
"mcpServers": {name: server.model_dump() for name, server in extensions_config.mcp_servers.items()},
"skills": {name: {"enabled": skill_config.enabled} for name, skill_config in extensions_config.skills.items()},
}
# Write the configuration to file
with open(config_path, "w", encoding="utf-8") as f:
json.dump(config_data, f, indent=2)
logger.info(f"Skills configuration updated and saved to: {config_path}")
# Reload the extensions config to update the global cache
reload_extensions_config()
# Reload the skills to get the updated status (for API response)
skills = load_skills(enabled_only=False)
updated_skill = next((s for s in skills if s.name == skill_name), None)
@@ -339,98 +156,16 @@ async def update_skill(skill_name: str, request: SkillUpdateRequest) -> SkillRes
description="Install a skill from a .skill file (ZIP archive) located in the thread's user-data directory.",
)
async def install_skill(request: SkillInstallRequest) -> SkillInstallResponse:
"""Install a skill from a .skill file.
The .skill file is a ZIP archive containing a skill directory with SKILL.md
and optional resources (scripts, references, assets).
Args:
request: The install request containing thread_id and virtual path to .skill file.
Returns:
Installation result with skill name and status message.
Raises:
HTTPException:
- 400 if path is invalid or file is not a valid .skill file
- 403 if access denied (path traversal detected)
- 404 if file not found
- 409 if skill already exists
- 500 if installation fails
Example Request:
```json
{
"thread_id": "abc123-def456",
"path": "/mnt/user-data/outputs/my-skill.skill"
}
```
Example Response:
```json
{
"success": true,
"skill_name": "my-skill",
"message": "Skill 'my-skill' installed successfully"
}
```
"""
try:
# Resolve the virtual path to actual file path
skill_file_path = resolve_thread_virtual_path(request.thread_id, request.path)
# Check if file exists
if not skill_file_path.exists():
raise HTTPException(status_code=404, detail=f"Skill file not found: {request.path}")
# Check if it's a file
if not skill_file_path.is_file():
raise HTTPException(status_code=400, detail=f"Path is not a file: {request.path}")
# Check file extension
if not skill_file_path.suffix == ".skill":
raise HTTPException(status_code=400, detail="File must have .skill extension")
# Verify it's a valid ZIP file
if not zipfile.is_zipfile(skill_file_path):
raise HTTPException(status_code=400, detail="File is not a valid ZIP archive")
# Get the custom skills directory
skills_root = get_skills_root_path()
custom_skills_dir = skills_root / "custom"
# Create custom directory if it doesn't exist
custom_skills_dir.mkdir(parents=True, exist_ok=True)
# Extract to a temporary directory first for validation
with tempfile.TemporaryDirectory() as temp_dir:
temp_path = Path(temp_dir)
# Extract the .skill file with validation and protections.
with zipfile.ZipFile(skill_file_path, "r") as zip_ref:
_safe_extract_skill_archive(zip_ref, temp_path)
skill_dir = _resolve_skill_dir_from_archive_root(temp_path)
# Validate the skill
is_valid, message, skill_name = _validate_skill_frontmatter(skill_dir)
if not is_valid:
raise HTTPException(status_code=400, detail=f"Invalid skill: {message}")
if not skill_name:
raise HTTPException(status_code=400, detail="Could not determine skill name")
# Check if skill already exists
target_dir = custom_skills_dir / skill_name
if target_dir.exists():
raise HTTPException(status_code=409, detail=f"Skill '{skill_name}' already exists. Please remove it first or use a different name.")
# Move the skill directory to the custom skills directory
shutil.copytree(skill_dir, target_dir)
logger.info(f"Skill '{skill_name}' installed successfully to {target_dir}")
return SkillInstallResponse(success=True, skill_name=skill_name, message=f"Skill '{skill_name}' installed successfully")
result = install_skill_from_archive(skill_file_path)
return SkillInstallResponse(**result)
except FileNotFoundError as e:
raise HTTPException(status_code=404, detail=str(e))
except SkillAlreadyExistsError as e:
raise HTTPException(status_code=409, detail=str(e))
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
except HTTPException:
raise
except Exception as e:

View File

@@ -1,13 +1,23 @@
"""Upload router for handling file uploads."""
import logging
from pathlib import Path
from fastapi import APIRouter, File, HTTPException, UploadFile
from pydantic import BaseModel
from deerflow.config.paths import VIRTUAL_PATH_PREFIX, get_paths
from deerflow.config.paths import get_paths
from deerflow.sandbox.sandbox_provider import get_sandbox_provider
from deerflow.uploads.manager import (
PathTraversalError,
delete_file_safe,
enrich_file_listing,
ensure_uploads_dir,
get_uploads_dir,
list_files_in_dir,
normalize_filename,
upload_artifact_url,
upload_virtual_path,
)
from deerflow.utils.file_conversion import CONVERTIBLE_EXTENSIONS, convert_file_to_markdown
logger = logging.getLogger(__name__)
@@ -23,18 +33,6 @@ class UploadResponse(BaseModel):
message: str
def get_uploads_dir(thread_id: str) -> Path:
"""Get the uploads directory for a thread.
Args:
thread_id: The thread ID.
Returns:
Path to the uploads directory.
"""
base_dir = get_paths().sandbox_uploads_dir(thread_id)
base_dir.mkdir(parents=True, exist_ok=True)
return base_dir
@router.post("", response_model=UploadResponse)
@@ -42,23 +40,15 @@ async def upload_files(
thread_id: str,
files: list[UploadFile] = File(...),
) -> UploadResponse:
"""Upload multiple files to a thread's uploads directory.
For PDF, PPT, Excel, and Word files, they will be converted to markdown using markitdown.
All files (original and converted) are saved to /mnt/user-data/uploads.
Args:
thread_id: The thread ID to upload files to.
files: List of files to upload.
Returns:
Upload response with success status and file information.
"""
"""Upload multiple files to a thread's uploads directory."""
if not files:
raise HTTPException(status_code=400, detail="No files provided")
uploads_dir = get_uploads_dir(thread_id)
paths = get_paths()
try:
uploads_dir = ensure_uploads_dir(thread_id)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
sandbox_uploads = get_paths().sandbox_uploads_dir(thread_id)
uploaded_files = []
sandbox_provider = get_sandbox_provider()
@@ -70,50 +60,44 @@ async def upload_files(
continue
try:
# Normalize filename to prevent path traversal
safe_filename = Path(file.filename).name
if not safe_filename or safe_filename in {".", ".."} or "/" in safe_filename or "\\" in safe_filename:
logger.warning(f"Skipping file with unsafe filename: {file.filename!r}")
continue
safe_filename = normalize_filename(file.filename)
except ValueError:
logger.warning(f"Skipping file with unsafe filename: {file.filename!r}")
continue
try:
content = await file.read()
file_path = uploads_dir / safe_filename
file_path.write_bytes(content)
# Build relative path from backend root
relative_path = str(paths.sandbox_uploads_dir(thread_id) / safe_filename)
virtual_path = f"{VIRTUAL_PATH_PREFIX}/uploads/{safe_filename}"
virtual_path = upload_virtual_path(safe_filename)
# Keep local sandbox source of truth in thread-scoped host storage.
# For non-local sandboxes, also sync to virtual path for runtime visibility.
if sandbox_id != "local":
sandbox.update_file(virtual_path, content)
file_info = {
"filename": safe_filename,
"size": str(len(content)),
"path": relative_path, # Actual filesystem path (relative to backend/)
"virtual_path": virtual_path, # Path for Agent in sandbox
"artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{safe_filename}", # HTTP URL
"path": str(sandbox_uploads / safe_filename),
"virtual_path": virtual_path,
"artifact_url": upload_artifact_url(thread_id, safe_filename),
}
logger.info(f"Saved file: {safe_filename} ({len(content)} bytes) to {relative_path}")
logger.info(f"Saved file: {safe_filename} ({len(content)} bytes) to {file_info['path']}")
# Check if file should be converted to markdown
file_ext = file_path.suffix.lower()
if file_ext in CONVERTIBLE_EXTENSIONS:
md_path = await convert_file_to_markdown(file_path)
if md_path:
md_relative_path = str(paths.sandbox_uploads_dir(thread_id) / md_path.name)
md_virtual_path = f"{VIRTUAL_PATH_PREFIX}/uploads/{md_path.name}"
md_virtual_path = upload_virtual_path(md_path.name)
if sandbox_id != "local":
sandbox.update_file(md_virtual_path, md_path.read_bytes())
file_info["markdown_file"] = md_path.name
file_info["markdown_path"] = md_relative_path
file_info["markdown_path"] = str(sandbox_uploads / md_path.name)
file_info["markdown_virtual_path"] = md_virtual_path
file_info["markdown_artifact_url"] = f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{md_path.name}"
file_info["markdown_artifact_url"] = upload_artifact_url(thread_id, md_path.name)
uploaded_files.append(file_info)
@@ -130,69 +114,35 @@ async def upload_files(
@router.get("/list", response_model=dict)
async def list_uploaded_files(thread_id: str) -> dict:
"""List all files in a thread's uploads directory.
"""List all files in a thread's uploads directory."""
try:
uploads_dir = get_uploads_dir(thread_id)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
result = list_files_in_dir(uploads_dir)
enrich_file_listing(result, thread_id)
Args:
thread_id: The thread ID to list files for.
# Gateway additionally includes the sandbox-relative path.
sandbox_uploads = get_paths().sandbox_uploads_dir(thread_id)
for f in result["files"]:
f["path"] = str(sandbox_uploads / f["filename"])
Returns:
Dictionary containing list of files with their metadata.
"""
uploads_dir = get_uploads_dir(thread_id)
if not uploads_dir.exists():
return {"files": [], "count": 0}
files = []
for file_path in sorted(uploads_dir.iterdir()):
if file_path.is_file():
stat = file_path.stat()
relative_path = str(get_paths().sandbox_uploads_dir(thread_id) / file_path.name)
files.append(
{
"filename": file_path.name,
"size": stat.st_size,
"path": relative_path, # Actual filesystem path
"virtual_path": f"{VIRTUAL_PATH_PREFIX}/uploads/{file_path.name}", # Path for Agent in sandbox
"artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{file_path.name}", # HTTP URL
"extension": file_path.suffix,
"modified": stat.st_mtime,
}
)
return {"files": files, "count": len(files)}
return result
@router.delete("/{filename}")
async def delete_uploaded_file(thread_id: str, filename: str) -> dict:
"""Delete a file from a thread's uploads directory.
Args:
thread_id: The thread ID.
filename: The filename to delete.
Returns:
Success message.
"""
uploads_dir = get_uploads_dir(thread_id)
file_path = uploads_dir / filename
if not file_path.exists():
"""Delete a file from a thread's uploads directory."""
try:
uploads_dir = get_uploads_dir(thread_id)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
try:
return delete_file_safe(uploads_dir, filename, convertible_extensions=CONVERTIBLE_EXTENSIONS)
except FileNotFoundError:
raise HTTPException(status_code=404, detail=f"File not found: {filename}")
# Security check: ensure the path is within the uploads directory
try:
file_path.resolve().relative_to(uploads_dir.resolve())
except ValueError:
raise HTTPException(status_code=403, detail="Access denied")
try:
if file_path.suffix.lower() in CONVERTIBLE_EXTENSIONS:
companion_markdown = file_path.with_suffix(".md")
companion_markdown.unlink(missing_ok=True)
file_path.unlink(missing_ok=True)
logger.info(f"Deleted file: {filename}")
return {"success": True, "message": f"Deleted {filename}"}
except PathTraversalError:
raise HTTPException(status_code=400, detail="Invalid path")
except Exception as e:
logger.error(f"Failed to delete {filename}: {e}")
raise HTTPException(status_code=500, detail=f"Failed to delete {filename}: {str(e)}")

View File

@@ -0,0 +1,190 @@
# RFC: Extract Shared Skill Installer and Upload Manager into Harness
## 1. Problem
Gateway (`app/gateway/routers/skills.py`, `uploads.py`) and Client (`deerflow/client.py`) each independently implement the same business logic:
### Skill Installation
| Logic | Gateway (`skills.py`) | Client (`client.py`) |
|-------|----------------------|---------------------|
| Zip safety check | `_is_unsafe_zip_member()` | Inline `Path(info.filename).is_absolute()` |
| Symlink filtering | `_is_symlink_member()` | `p.is_symlink()` post-extraction delete |
| Zip bomb defence | `total_size += info.file_size` (declared) | `total_size > 100MB` (declared) |
| macOS metadata filter | `_should_ignore_archive_entry()` | None |
| Frontmatter validation | `_validate_skill_frontmatter()` | `_validate_skill_frontmatter()` |
| Duplicate detection | `HTTPException(409)` | `ValueError` |
**Two implementations, inconsistent behaviour**: Gateway streams writes and tracks real decompressed size; Client sums declared `file_size`. Gateway skips symlinks during extraction; Client extracts everything then walks and deletes symlinks.
### Upload Management
| Logic | Gateway (`uploads.py`) | Client (`client.py`) |
|-------|----------------------|---------------------|
| Directory access | `get_uploads_dir()` + `mkdir` | `_get_uploads_dir()` + `mkdir` |
| Filename safety | Inline `Path(f).name` + manual checks | No checks, uses `src_path.name` directly |
| Duplicate handling | None (overwrites) | None (overwrites) |
| Listing | Inline `iterdir()` | Inline `os.scandir()` |
| Deletion | Inline `unlink()` + traversal check | Inline `unlink()` + traversal check |
| Path traversal | `resolve().relative_to()` | `resolve().relative_to()` |
**The same traversal check is written twice** — any security fix must be applied to both locations.
## 2. Design Principles
### Dependency Direction
```
app.gateway.routers.skills ──┐
app.gateway.routers.uploads ──┤── calls ──→ deerflow.skills.installer
deerflow.client ──┘ deerflow.uploads.manager
```
- Shared modules live in the harness layer (`deerflow.*`), pure business logic, no FastAPI dependency
- Gateway handles HTTP adaptation (`UploadFile` → bytes, exceptions → `HTTPException`)
- Client handles local adaptation (`Path` → copy, exceptions → Python exceptions)
- Satisfies `test_harness_boundary.py` constraint: harness never imports app
### Exception Strategy
| Shared Layer Exception | Gateway Maps To | Client |
|----------------------|-----------------|--------|
| `FileNotFoundError` | `HTTPException(404)` | Propagates |
| `ValueError` | `HTTPException(400)` | Propagates |
| `SkillAlreadyExistsError` | `HTTPException(409)` | Propagates |
| `PermissionError` | `HTTPException(403)` | Propagates |
Replaces stringly-typed routing (`"already exists" in str(e)`) with typed exception matching (`SkillAlreadyExistsError`).
## 3. New Modules
### 3.1 `deerflow.skills.installer`
```python
# Safety checks
is_unsafe_zip_member(info: ZipInfo) -> bool # Absolute path / .. traversal
is_symlink_member(info: ZipInfo) -> bool # Unix symlink detection
should_ignore_archive_entry(path: Path) -> bool # __MACOSX / dotfiles
# Extraction
safe_extract_skill_archive(zip_ref, dest_path, max_total_size=512MB)
# Streaming write, accumulates real bytes (vs declared file_size)
# Dual traversal check: member-level + resolve-level
# Directory resolution
resolve_skill_dir_from_archive(temp_path: Path) -> Path
# Auto-enters single directory, filters macOS metadata
# Install entry point
install_skill_from_archive(zip_path, *, skills_root=None) -> dict
# is_file() pre-check before extension validation
# SkillAlreadyExistsError replaces ValueError
# Exception
class SkillAlreadyExistsError(ValueError)
```
### 3.2 `deerflow.uploads.manager`
```python
# Directory management
get_uploads_dir(thread_id: str) -> Path # Pure path, no side effects
ensure_uploads_dir(thread_id: str) -> Path # Creates directory (for write paths)
# Filename safety
normalize_filename(filename: str) -> str
# Path.name extraction + rejects ".." / "." / backslash / >255 bytes
deduplicate_filename(name: str, seen: set) -> str
# _N suffix increment for dedup, mutates seen in place
# Path safety
validate_path_traversal(path: Path, base: Path) -> None
# resolve().relative_to(), raises PermissionError on failure
# File operations
list_files_in_dir(directory: Path) -> dict
# scandir with stat inside context (no re-stat)
# follow_symlinks=False to prevent metadata leakage
# Non-existent directory returns empty list
delete_file_safe(base_dir: Path, filename: str) -> dict
# Validates traversal first, then unlinks
# URL helpers
upload_artifact_url(thread_id, filename) -> str # Percent-encoded for HTTP safety
upload_virtual_path(filename) -> str # Sandbox-internal path
enrich_file_listing(result, thread_id) -> dict # Adds URLs, stringifies sizes
```
## 4. Changes
### 4.1 Gateway Slimming
**`app/gateway/routers/skills.py`**:
- Remove `_is_unsafe_zip_member`, `_is_symlink_member`, `_safe_extract_skill_archive`, `_should_ignore_archive_entry`, `_resolve_skill_dir_from_archive_root` (~80 lines)
- `install_skill` route becomes a single call to `install_skill_from_archive(path)`
- Exception mapping: `SkillAlreadyExistsError → 409`, `ValueError → 400`, `FileNotFoundError → 404`
**`app/gateway/routers/uploads.py`**:
- Remove inline `get_uploads_dir` (replaced by `ensure_uploads_dir`/`get_uploads_dir`)
- `upload_files` uses `normalize_filename()` instead of inline safety checks
- `list_uploaded_files` uses `list_files_in_dir()` + enrichment
- `delete_uploaded_file` uses `delete_file_safe()` + companion markdown cleanup
### 4.2 Client Slimming
**`deerflow/client.py`**:
- Remove `_get_uploads_dir` static method
- Remove ~50 lines of inline zip handling in `install_skill`
- `install_skill` delegates to `install_skill_from_archive()`
- `upload_files` uses `deduplicate_filename()` + `ensure_uploads_dir()`
- `list_uploads` uses `get_uploads_dir()` + `list_files_in_dir()`
- `delete_upload` uses `get_uploads_dir()` + `delete_file_safe()`
- `update_mcp_config` / `update_skill` now reset `_agent_config_key = None`
### 4.3 Read/Write Path Separation
| Operation | Function | Creates dir? |
|-----------|----------|:------------:|
| upload (write) | `ensure_uploads_dir()` | Yes |
| list (read) | `get_uploads_dir()` | No |
| delete (read) | `get_uploads_dir()` | No |
Read paths no longer have `mkdir` side effects — non-existent directories return empty lists.
## 5. Security Improvements
| Improvement | Before | After |
|-------------|--------|-------|
| Zip bomb detection | Sum of declared `file_size` | Streaming write, accumulates real bytes |
| Symlink handling | Gateway skips / Client deletes post-extract | Unified skip + log |
| Traversal check | Member-level only | Member-level + `resolve().is_relative_to()` |
| Filename backslash | Gateway checks / Client doesn't | Unified rejection |
| Filename length | No check | Reject > 255 bytes (OS limit) |
| thread_id validation | None | Reject unsafe filesystem characters |
| Listing symlink leak | `follow_symlinks=True` (default) | `follow_symlinks=False` |
| 409 status routing | `"already exists" in str(e)` | `SkillAlreadyExistsError` type match |
| Artifact URL encoding | Raw filename in URL | `urllib.parse.quote()` |
## 6. Alternatives Considered
| Alternative | Why Not |
|-------------|---------|
| Keep logic in Gateway, Client calls Gateway via HTTP | Adds network dependency to embedded Client; defeats the purpose of `DeerFlowClient` as an in-process API |
| Abstract base class with Gateway/Client subclasses | Over-engineered for what are pure functions; no polymorphism needed |
| Move everything into `client.py` and have Gateway import it | Violates harness/app boundary — Client is in harness, but Gateway-specific models (Pydantic response types) should stay in app layer |
| Merge Gateway and Client into one module | They serve different consumers (HTTP vs in-process) with different adaptation needs |
## 7. Breaking Changes
**None.** All public APIs (Gateway HTTP endpoints, `DeerFlowClient` methods) retain their existing signatures and return formats. The `SkillAlreadyExistsError` is a subclass of `ValueError`, so existing `except ValueError` handlers still catch it.
## 8. Tests
| Module | Test File | Count |
|--------|-----------|:-----:|
| `skills.installer` | `tests/test_skills_installer.py` | 22 |
| `uploads.manager` | `tests/test_uploads_manager.py` | 20 |
| `client` hardening | `tests/test_client.py` (new cases) | ~40 |
| `client` e2e | `tests/test_client_e2e.py` (new file) | ~20 |
Coverage: unsafe zip / symlink / zip bomb / frontmatter / duplicate / extension / macOS filter / normalize / deduplicate / traversal / list / delete / agent invalidation / upload lifecycle / thread isolation / URL encoding / config pollution.

View File

@@ -19,12 +19,9 @@ import asyncio
import json
import logging
import mimetypes
import os
import re
import shutil
import tempfile
import uuid
import zipfile
from collections.abc import Generator
from dataclasses import dataclass, field
from pathlib import Path
@@ -42,6 +39,17 @@ from deerflow.config.app_config import get_app_config, reload_app_config
from deerflow.config.extensions_config import ExtensionsConfig, SkillStateConfig, get_extensions_config, reload_extensions_config
from deerflow.config.paths import get_paths
from deerflow.models import create_chat_model
from deerflow.skills.installer import install_skill_from_archive
from deerflow.uploads.manager import (
claim_unique_filename,
delete_file_safe,
enrich_file_listing,
ensure_uploads_dir,
get_uploads_dir,
list_files_in_dir,
upload_artifact_url,
upload_virtual_path,
)
logger = logging.getLogger(__name__)
@@ -566,6 +574,7 @@ class DeerFlowClient:
self._atomic_write_json(config_path, config_data)
self._agent = None
self._agent_config_key = None
reloaded = reload_extensions_config()
return {"mcp_servers": {name: server.model_dump() for name, server in reloaded.mcp_servers.items()}}
@@ -631,6 +640,7 @@ class DeerFlowClient:
self._atomic_write_json(config_path, config_data)
self._agent = None
self._agent_config_key = None
reload_extensions_config()
updated = next((s for s in load_skills(enabled_only=False) if s.name == name), None)
@@ -657,56 +667,7 @@ class DeerFlowClient:
FileNotFoundError: If the file does not exist.
ValueError: If the file is invalid.
"""
from deerflow.skills.loader import get_skills_root_path
from deerflow.skills.validation import _validate_skill_frontmatter
path = Path(skill_path)
if not path.exists():
raise FileNotFoundError(f"Skill file not found: {skill_path}")
if not path.is_file():
raise ValueError(f"Path is not a file: {skill_path}")
if path.suffix != ".skill":
raise ValueError("File must have .skill extension")
if not zipfile.is_zipfile(path):
raise ValueError("File is not a valid ZIP archive")
skills_root = get_skills_root_path()
custom_dir = skills_root / "custom"
custom_dir.mkdir(parents=True, exist_ok=True)
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
with zipfile.ZipFile(path, "r") as zf:
total_size = sum(info.file_size for info in zf.infolist())
if total_size > 100 * 1024 * 1024:
raise ValueError("Skill archive too large when extracted (>100MB)")
for info in zf.infolist():
if Path(info.filename).is_absolute() or ".." in Path(info.filename).parts:
raise ValueError(f"Unsafe path in archive: {info.filename}")
zf.extractall(tmp_path)
for p in tmp_path.rglob("*"):
if p.is_symlink():
p.unlink()
items = list(tmp_path.iterdir())
if not items:
raise ValueError("Skill archive is empty")
skill_dir = items[0] if len(items) == 1 and items[0].is_dir() else tmp_path
is_valid, message, skill_name = _validate_skill_frontmatter(skill_dir)
if not is_valid:
raise ValueError(f"Invalid skill: {message}")
if not re.fullmatch(r"[a-zA-Z0-9_-]+", skill_name):
raise ValueError(f"Invalid skill name: {skill_name}")
target = custom_dir / skill_name
if target.exists():
raise ValueError(f"Skill '{skill_name}' already exists")
shutil.copytree(skill_dir, target)
return {"success": True, "skill_name": skill_name, "message": f"Skill '{skill_name}' installed successfully"}
return install_skill_from_archive(skill_path)
# ------------------------------------------------------------------
# Public API — memory management
@@ -756,13 +717,6 @@ class DeerFlowClient:
# Public API — file uploads
# ------------------------------------------------------------------
@staticmethod
def _get_uploads_dir(thread_id: str) -> Path:
"""Get (and create) the uploads directory for a thread."""
base = get_paths().sandbox_uploads_dir(thread_id)
base.mkdir(parents=True, exist_ok=True)
return base
def upload_files(self, thread_id: str, files: list[str | Path]) -> dict:
"""Upload local files into a thread's uploads directory.
@@ -784,7 +738,7 @@ class DeerFlowClient:
# Validate all files upfront to avoid partial uploads.
resolved_files = []
convertible_extensions = {ext.lower() for ext in CONVERTIBLE_EXTENSIONS}
seen_names: set[str] = set()
has_convertible_file = False
for f in files:
p = Path(f)
@@ -792,11 +746,12 @@ class DeerFlowClient:
raise FileNotFoundError(f"File not found: {f}")
if not p.is_file():
raise ValueError(f"Path is not a file: {f}")
resolved_files.append(p)
if not has_convertible_file and p.suffix.lower() in convertible_extensions:
dest_name = claim_unique_filename(p.name, seen_names)
resolved_files.append((p, dest_name))
if not has_convertible_file and p.suffix.lower() in CONVERTIBLE_EXTENSIONS:
has_convertible_file = True
uploads_dir = self._get_uploads_dir(thread_id)
uploads_dir = ensure_uploads_dir(thread_id)
uploaded_files: list[dict] = []
conversion_pool = None
@@ -816,19 +771,21 @@ class DeerFlowClient:
return asyncio.run(convert_file_to_markdown(path))
try:
for src_path in resolved_files:
dest = uploads_dir / src_path.name
for src_path, dest_name in resolved_files:
dest = uploads_dir / dest_name
shutil.copy2(src_path, dest)
info: dict[str, Any] = {
"filename": src_path.name,
"filename": dest_name,
"size": str(dest.stat().st_size),
"path": str(dest),
"virtual_path": f"/mnt/user-data/uploads/{src_path.name}",
"artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{src_path.name}",
"virtual_path": upload_virtual_path(dest_name),
"artifact_url": upload_artifact_url(thread_id, dest_name),
}
if dest_name != src_path.name:
info["original_filename"] = src_path.name
if src_path.suffix.lower() in convertible_extensions:
if src_path.suffix.lower() in CONVERTIBLE_EXTENSIONS:
try:
if conversion_pool is not None:
md_path = conversion_pool.submit(_convert_in_thread, dest).result()
@@ -844,8 +801,9 @@ class DeerFlowClient:
if md_path is not None:
info["markdown_file"] = md_path.name
info["markdown_virtual_path"] = f"/mnt/user-data/uploads/{md_path.name}"
info["markdown_artifact_url"] = f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{md_path.name}"
info["markdown_path"] = str(uploads_dir / md_path.name)
info["markdown_virtual_path"] = upload_virtual_path(md_path.name)
info["markdown_artifact_url"] = upload_artifact_url(thread_id, md_path.name)
uploaded_files.append(info)
finally:
@@ -868,29 +826,9 @@ class DeerFlowClient:
Dict with "files" and "count" keys, matching the Gateway API
``list_uploaded_files`` response.
"""
uploads_dir = self._get_uploads_dir(thread_id)
if not uploads_dir.exists():
return {"files": [], "count": 0}
files = []
with os.scandir(uploads_dir) as entries:
file_entries = [entry for entry in entries if entry.is_file()]
for entry in sorted(file_entries, key=lambda item: item.name):
stat = entry.stat()
filename = entry.name
files.append(
{
"filename": filename,
"size": str(stat.st_size),
"path": str(Path(entry.path)),
"virtual_path": f"/mnt/user-data/uploads/{filename}",
"artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{filename}",
"extension": Path(filename).suffix,
"modified": stat.st_mtime,
}
)
return {"files": files, "count": len(files)}
uploads_dir = get_uploads_dir(thread_id)
result = list_files_in_dir(uploads_dir)
return enrich_file_listing(result, thread_id)
def delete_upload(self, thread_id: str, filename: str) -> dict:
"""Delete a file from a thread's uploads directory.
@@ -907,19 +845,10 @@ class DeerFlowClient:
FileNotFoundError: If the file does not exist.
PermissionError: If path traversal is detected.
"""
uploads_dir = self._get_uploads_dir(thread_id)
file_path = (uploads_dir / filename).resolve()
from deerflow.utils.file_conversion import CONVERTIBLE_EXTENSIONS
try:
file_path.relative_to(uploads_dir.resolve())
except ValueError as exc:
raise PermissionError("Access denied: path traversal detected") from exc
if not file_path.is_file():
raise FileNotFoundError(f"File not found: {filename}")
file_path.unlink()
return {"success": True, "message": f"Deleted {filename}"}
uploads_dir = get_uploads_dir(thread_id)
return delete_file_safe(uploads_dir, filename, convertible_extensions=CONVERTIBLE_EXTENSIONS)
# ------------------------------------------------------------------
# Public API — artifacts
@@ -939,19 +868,13 @@ class DeerFlowClient:
FileNotFoundError: If the artifact does not exist.
ValueError: If the path is invalid.
"""
virtual_prefix = "mnt/user-data"
clean_path = path.lstrip("/")
if not clean_path.startswith(virtual_prefix):
raise ValueError(f"Path must start with /{virtual_prefix}")
relative = clean_path[len(virtual_prefix) :].lstrip("/")
base_dir = get_paths().sandbox_user_data_dir(thread_id)
actual = (base_dir / relative).resolve()
try:
actual.relative_to(base_dir.resolve())
actual = get_paths().resolve_virtual_path(thread_id, path)
except ValueError as exc:
raise PermissionError("Access denied: path traversal detected") from exc
if "traversal" in str(exc):
from deerflow.uploads.manager import PathTraversalError
raise PathTraversalError("Path traversal detected") from exc
raise
if not actual.exists():
raise FileNotFoundError(f"Artifact not found: {path}")
if not actual.is_file():

View File

@@ -1,5 +1,14 @@
from .installer import SkillAlreadyExistsError, install_skill_from_archive
from .loader import get_skills_root_path, load_skills
from .types import Skill
from .validation import ALLOWED_FRONTMATTER_PROPERTIES, _validate_skill_frontmatter
__all__ = ["load_skills", "get_skills_root_path", "Skill", "ALLOWED_FRONTMATTER_PROPERTIES", "_validate_skill_frontmatter"]
__all__ = [
"load_skills",
"get_skills_root_path",
"Skill",
"ALLOWED_FRONTMATTER_PROPERTIES",
"_validate_skill_frontmatter",
"install_skill_from_archive",
"SkillAlreadyExistsError",
]

View File

@@ -0,0 +1,176 @@
"""Shared skill archive installation logic.
Pure business logic — no FastAPI/HTTP dependencies.
Both Gateway and Client delegate to these functions.
"""
import logging
import shutil
import stat
import tempfile
import zipfile
from pathlib import Path
from deerflow.skills.loader import get_skills_root_path
from deerflow.skills.validation import _validate_skill_frontmatter
logger = logging.getLogger(__name__)
class SkillAlreadyExistsError(ValueError):
"""Raised when a skill with the same name is already installed."""
def is_unsafe_zip_member(info: zipfile.ZipInfo) -> bool:
"""Return True if the zip member path is absolute or attempts directory traversal."""
name = info.filename
if not name:
return False
path = Path(name)
if path.is_absolute():
return True
if ".." in path.parts:
return True
return False
def is_symlink_member(info: zipfile.ZipInfo) -> bool:
"""Detect symlinks based on the external attributes stored in the ZipInfo."""
mode = info.external_attr >> 16
return stat.S_ISLNK(mode)
def should_ignore_archive_entry(path: Path) -> bool:
"""Return True for macOS metadata dirs and dotfiles."""
return path.name.startswith(".") or path.name == "__MACOSX"
def resolve_skill_dir_from_archive(temp_path: Path) -> Path:
"""Locate the skill root directory from extracted archive contents.
Filters out macOS metadata (__MACOSX) and dotfiles (.DS_Store).
Returns:
Path to the skill directory.
Raises:
ValueError: If the archive is empty after filtering.
"""
items = [p for p in temp_path.iterdir() if not should_ignore_archive_entry(p)]
if not items:
raise ValueError("Skill archive is empty")
if len(items) == 1 and items[0].is_dir():
return items[0]
return temp_path
def safe_extract_skill_archive(
zip_ref: zipfile.ZipFile,
dest_path: Path,
max_total_size: int = 512 * 1024 * 1024,
) -> None:
"""Safely extract a skill archive with security protections.
Protections:
- Reject absolute paths and directory traversal (..).
- Skip symlink entries instead of materialising them.
- Enforce a hard limit on total uncompressed size (zip bomb defence).
Raises:
ValueError: If unsafe members or size limit exceeded.
"""
dest_root = dest_path.resolve()
total_written = 0
for info in zip_ref.infolist():
if is_unsafe_zip_member(info):
raise ValueError(f"Archive contains unsafe member path: {info.filename!r}")
if is_symlink_member(info):
logger.warning("Skipping symlink entry in skill archive: %s", info.filename)
continue
member_path = dest_root / info.filename
if not member_path.resolve().is_relative_to(dest_root):
raise ValueError(f"Zip entry escapes destination: {info.filename!r}")
member_path.parent.mkdir(parents=True, exist_ok=True)
if info.is_dir():
member_path.mkdir(parents=True, exist_ok=True)
continue
with zip_ref.open(info) as src, member_path.open("wb") as dst:
while chunk := src.read(65536):
total_written += len(chunk)
if total_written > max_total_size:
raise ValueError("Skill archive is too large or appears highly compressed.")
dst.write(chunk)
def install_skill_from_archive(
zip_path: str | Path,
*,
skills_root: Path | None = None,
) -> dict:
"""Install a skill from a .skill archive (ZIP).
Args:
zip_path: Path to the .skill file.
skills_root: Override the skills root directory. If None, uses
the default from config.
Returns:
Dict with success, skill_name, message.
Raises:
FileNotFoundError: If the file does not exist.
ValueError: If the file is invalid (wrong extension, bad ZIP,
invalid frontmatter, duplicate name).
"""
logger.info("Installing skill from %s", zip_path)
path = Path(zip_path)
if not path.is_file():
if not path.exists():
raise FileNotFoundError(f"Skill file not found: {zip_path}")
raise ValueError(f"Path is not a file: {zip_path}")
if path.suffix != ".skill":
raise ValueError("File must have .skill extension")
if skills_root is None:
skills_root = get_skills_root_path()
custom_dir = skills_root / "custom"
custom_dir.mkdir(parents=True, exist_ok=True)
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
try:
zf = zipfile.ZipFile(path, "r")
except FileNotFoundError:
raise FileNotFoundError(f"Skill file not found: {zip_path}") from None
except (zipfile.BadZipFile, IsADirectoryError):
raise ValueError("File is not a valid ZIP archive") from None
with zf:
safe_extract_skill_archive(zf, tmp_path)
skill_dir = resolve_skill_dir_from_archive(tmp_path)
is_valid, message, skill_name = _validate_skill_frontmatter(skill_dir)
if not is_valid:
raise ValueError(f"Invalid skill: {message}")
if not skill_name or "/" in skill_name or "\\" in skill_name or ".." in skill_name:
raise ValueError(f"Invalid skill name: {skill_name}")
target = custom_dir / skill_name
if target.exists():
raise SkillAlreadyExistsError(f"Skill '{skill_name}' already exists")
shutil.copytree(skill_dir, target)
logger.info("Skill %r installed to %s", skill_name, target)
return {
"success": True,
"skill_name": skill_name,
"message": f"Skill '{skill_name}' installed successfully",
}

View File

@@ -0,0 +1,29 @@
from .manager import (
PathTraversalError,
claim_unique_filename,
delete_file_safe,
enrich_file_listing,
ensure_uploads_dir,
get_uploads_dir,
list_files_in_dir,
normalize_filename,
upload_artifact_url,
upload_virtual_path,
validate_path_traversal,
validate_thread_id,
)
__all__ = [
"get_uploads_dir",
"ensure_uploads_dir",
"normalize_filename",
"PathTraversalError",
"claim_unique_filename",
"validate_path_traversal",
"list_files_in_dir",
"delete_file_safe",
"upload_artifact_url",
"upload_virtual_path",
"enrich_file_listing",
"validate_thread_id",
]

View File

@@ -0,0 +1,198 @@
"""Shared upload management logic.
Pure business logic — no FastAPI/HTTP dependencies.
Both Gateway and Client delegate to these functions.
"""
import os
import re
from pathlib import Path
from urllib.parse import quote
from deerflow.config.paths import VIRTUAL_PATH_PREFIX, get_paths
class PathTraversalError(ValueError):
"""Raised when a path escapes its allowed base directory."""
# thread_id must be alphanumeric, hyphens, underscores, or dots only.
_SAFE_THREAD_ID = re.compile(r"^[a-zA-Z0-9._-]+$")
def validate_thread_id(thread_id: str) -> None:
"""Reject thread IDs containing characters unsafe for filesystem paths.
Raises:
ValueError: If thread_id is empty or contains unsafe characters.
"""
if not thread_id or not _SAFE_THREAD_ID.match(thread_id):
raise ValueError(f"Invalid thread_id: {thread_id!r}")
def get_uploads_dir(thread_id: str) -> Path:
"""Return the uploads directory path for a thread (no side effects)."""
validate_thread_id(thread_id)
return get_paths().sandbox_uploads_dir(thread_id)
def ensure_uploads_dir(thread_id: str) -> Path:
"""Return the uploads directory for a thread, creating it if needed."""
base = get_uploads_dir(thread_id)
base.mkdir(parents=True, exist_ok=True)
return base
def normalize_filename(filename: str) -> str:
"""Sanitize a filename by extracting its basename.
Strips any directory components and rejects traversal patterns.
Args:
filename: Raw filename from user input (may contain path components).
Returns:
Safe filename (basename only).
Raises:
ValueError: If filename is empty or resolves to a traversal pattern.
"""
if not filename:
raise ValueError("Filename is empty")
safe = Path(filename).name
if not safe or safe in {".", ".."}:
raise ValueError(f"Filename is unsafe: {filename!r}")
# Reject backslashes — on Linux Path.name keeps them as literal chars,
# but they indicate a Windows-style path that should be stripped or rejected.
if "\\" in safe:
raise ValueError(f"Filename contains backslash: {filename!r}")
if len(safe.encode("utf-8")) > 255:
raise ValueError(f"Filename too long: {len(safe)} chars")
return safe
def claim_unique_filename(name: str, seen: set[str]) -> str:
"""Generate a unique filename by appending ``_N`` suffix on collision.
Automatically adds the returned name to *seen* so callers don't need to.
Args:
name: Candidate filename.
seen: Set of filenames already claimed (mutated in place).
Returns:
A filename not present in *seen* (already added to *seen*).
"""
if name not in seen:
seen.add(name)
return name
stem, suffix = Path(name).stem, Path(name).suffix
counter = 1
candidate = f"{stem}_{counter}{suffix}"
while candidate in seen:
counter += 1
candidate = f"{stem}_{counter}{suffix}"
seen.add(candidate)
return candidate
def validate_path_traversal(path: Path, base: Path) -> None:
"""Verify that *path* is inside *base*.
Raises:
PathTraversalError: If a path traversal is detected.
"""
try:
path.resolve().relative_to(base.resolve())
except ValueError:
raise PathTraversalError("Path traversal detected") from None
def list_files_in_dir(directory: Path) -> dict:
"""List files (not directories) in *directory*.
Args:
directory: Directory to scan.
Returns:
Dict with "files" list (sorted by name) and "count".
Each file entry has ``size`` as *int* (bytes). Call
:func:`enrich_file_listing` to stringify sizes and add
virtual / artifact URLs.
"""
if not directory.is_dir():
return {"files": [], "count": 0}
files = []
with os.scandir(directory) as entries:
for entry in sorted(entries, key=lambda e: e.name):
if not entry.is_file(follow_symlinks=False):
continue
st = entry.stat(follow_symlinks=False)
files.append({
"filename": entry.name,
"size": st.st_size,
"path": entry.path,
"extension": Path(entry.name).suffix,
"modified": st.st_mtime,
})
return {"files": files, "count": len(files)}
def delete_file_safe(base_dir: Path, filename: str, *, convertible_extensions: set[str] | None = None) -> dict:
"""Delete a file inside *base_dir* after path-traversal validation.
If *convertible_extensions* is provided and the file's extension matches,
the companion ``.md`` file is also removed (if it exists).
Args:
base_dir: Directory containing the file.
filename: Name of file to delete.
convertible_extensions: Lowercase extensions (e.g. ``{".pdf", ".docx"}``)
whose companion markdown should be cleaned up.
Returns:
Dict with success and message.
Raises:
FileNotFoundError: If the file does not exist.
PathTraversalError: If path traversal is detected.
"""
file_path = (base_dir / filename).resolve()
validate_path_traversal(file_path, base_dir)
if not file_path.is_file():
raise FileNotFoundError(f"File not found: {filename}")
file_path.unlink()
# Clean up companion markdown generated during upload conversion.
if convertible_extensions and file_path.suffix.lower() in convertible_extensions:
file_path.with_suffix(".md").unlink(missing_ok=True)
return {"success": True, "message": f"Deleted {filename}"}
def upload_artifact_url(thread_id: str, filename: str) -> str:
"""Build the artifact URL for a file in a thread's uploads directory.
*filename* is percent-encoded so that spaces, ``#``, ``?`` etc. are safe.
"""
return f"/api/threads/{thread_id}/artifacts{VIRTUAL_PATH_PREFIX}/uploads/{quote(filename, safe='')}"
def upload_virtual_path(filename: str) -> str:
"""Build the virtual path for a file in the uploads directory."""
return f"{VIRTUAL_PATH_PREFIX}/uploads/{filename}"
def enrich_file_listing(result: dict, thread_id: str) -> dict:
"""Add virtual paths, artifact URLs, and stringify sizes on a listing result.
Mutates *result* in place and returns it for convenience.
"""
for f in result["files"]:
filename = f["filename"]
f["size"] = str(f["size"])
f["virtual_path"] = upload_virtual_path(filename)
f["artifact_url"] = upload_artifact_url(thread_id, filename)
return result

View File

@@ -9,7 +9,7 @@ from pathlib import Path
from unittest.mock import MagicMock, patch
import pytest
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage # noqa: F401
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage # noqa: F401
from app.gateway.routers.mcp import McpConfigResponse
from app.gateway.routers.memory import MemoryConfigResponse, MemoryStatusResponse
@@ -17,6 +17,8 @@ from app.gateway.routers.models import ModelResponse, ModelsListResponse
from app.gateway.routers.skills import SkillInstallResponse, SkillResponse, SkillsListResponse
from app.gateway.routers.uploads import UploadResponse
from deerflow.client import DeerFlowClient
from deerflow.config.paths import Paths
from deerflow.uploads.manager import PathTraversalError
# ---------------------------------------------------------------------------
# Fixtures
@@ -609,10 +611,7 @@ class TestSkillsManagement:
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)
with (
patch("deerflow.skills.loader.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.validation._validate_skill_frontmatter", return_value=(True, "OK", "my-skill")),
):
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
result = client.install_skill(archive_path)
assert result["success"] is True
@@ -700,7 +699,7 @@ class TestUploads:
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("thread-1", [src_file])
assert result["success"] is True
@@ -756,7 +755,7 @@ class TestUploads:
return client.upload_files("thread-async", [first, second])
with (
patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir),
patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
patch("deerflow.utils.file_conversion.CONVERTIBLE_EXTENSIONS", {".pdf"}),
patch("deerflow.utils.file_conversion.convert_file_to_markdown", side_effect=fake_convert),
patch("concurrent.futures.ThreadPoolExecutor", FakeExecutor),
@@ -777,7 +776,7 @@ class TestUploads:
(uploads_dir / "a.txt").write_text("a")
(uploads_dir / "b.txt").write_text("bb")
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.list_uploads("thread-1")
assert result["count"] == 2
@@ -793,7 +792,7 @@ class TestUploads:
uploads_dir = Path(tmp)
(uploads_dir / "delete-me.txt").write_text("gone")
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.delete_upload("thread-1", "delete-me.txt")
assert result["success"] is True
@@ -802,15 +801,15 @@ class TestUploads:
def test_delete_upload_not_found(self, client):
with tempfile.TemporaryDirectory() as tmp:
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=Path(tmp)):
with patch("deerflow.client.get_uploads_dir", return_value=Path(tmp)):
with pytest.raises(FileNotFoundError):
client.delete_upload("thread-1", "nope.txt")
def test_delete_upload_path_traversal(self, client):
with tempfile.TemporaryDirectory() as tmp:
uploads_dir = Path(tmp)
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with pytest.raises(PermissionError):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
with pytest.raises(PathTraversalError):
client.delete_upload("thread-1", "../../etc/passwd")
@@ -822,15 +821,12 @@ class TestUploads:
class TestArtifacts:
def test_get_artifact(self, client):
with tempfile.TemporaryDirectory() as tmp:
user_data_dir = Path(tmp) / "user-data"
outputs = user_data_dir / "outputs"
paths = Paths(base_dir=tmp)
outputs = paths.sandbox_outputs_dir("t1")
outputs.mkdir(parents=True)
(outputs / "result.txt").write_text("artifact content")
mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.return_value = user_data_dir
with patch("deerflow.client.get_paths", return_value=mock_paths):
with patch("deerflow.client.get_paths", return_value=paths):
content, mime = client.get_artifact("t1", "mnt/user-data/outputs/result.txt")
assert content == b"artifact content"
@@ -838,13 +834,10 @@ class TestArtifacts:
def test_get_artifact_not_found(self, client):
with tempfile.TemporaryDirectory() as tmp:
user_data_dir = Path(tmp) / "user-data"
user_data_dir.mkdir()
paths = Paths(base_dir=tmp)
paths.sandbox_user_data_dir("t1").mkdir(parents=True)
mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.return_value = user_data_dir
with patch("deerflow.client.get_paths", return_value=mock_paths):
with patch("deerflow.client.get_paths", return_value=paths):
with pytest.raises(FileNotFoundError):
client.get_artifact("t1", "mnt/user-data/outputs/nope.txt")
@@ -854,14 +847,11 @@ class TestArtifacts:
def test_get_artifact_path_traversal(self, client):
with tempfile.TemporaryDirectory() as tmp:
user_data_dir = Path(tmp) / "user-data"
user_data_dir.mkdir()
paths = Paths(base_dir=tmp)
paths.sandbox_user_data_dir("t1").mkdir(parents=True)
mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.return_value = user_data_dir
with patch("deerflow.client.get_paths", return_value=mock_paths):
with pytest.raises(PermissionError):
with patch("deerflow.client.get_paths", return_value=paths):
with pytest.raises(PathTraversalError):
client.get_artifact("t1", "mnt/user-data/../../../etc/passwd")
@@ -1013,7 +1003,7 @@ class TestScenarioFileLifecycle:
(tmp_path / "report.txt").write_text("quarterly report data")
(tmp_path / "data.csv").write_text("a,b,c\n1,2,3")
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
# Step 1: Upload
result = client.upload_files(
"t-lifecycle",
@@ -1046,15 +1036,16 @@ class TestScenarioFileLifecycle:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
user_data_dir = tmp_path / "user-data"
outputs_dir = user_data_dir / "outputs"
paths = Paths(base_dir=tmp_path)
outputs_dir = paths.sandbox_outputs_dir("t-artifact")
outputs_dir.mkdir(parents=True)
# Upload phase
src_file = tmp_path / "input.txt"
src_file.write_text("raw data to process")
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
uploaded = client.upload_files("t-artifact", [src_file])
assert len(uploaded["files"]) == 1
@@ -1062,10 +1053,7 @@ class TestScenarioFileLifecycle:
(outputs_dir / "analysis.json").write_text('{"result": "processed"}')
# Retrieve artifact
mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.return_value = user_data_dir
with patch("deerflow.client.get_paths", return_value=mock_paths):
with patch("deerflow.client.get_paths", return_value=paths):
content, mime = client.get_artifact("t-artifact", "mnt/user-data/outputs/analysis.json")
assert json.loads(content) == {"result": "processed"}
@@ -1286,7 +1274,7 @@ class TestScenarioThreadIsolation:
def get_dir(thread_id):
return uploads_a if thread_id == "thread-a" else uploads_b
with patch.object(DeerFlowClient, "_get_uploads_dir", side_effect=get_dir):
with patch("deerflow.client.get_uploads_dir", side_effect=get_dir), patch("deerflow.client.ensure_uploads_dir", side_effect=get_dir):
client.upload_files("thread-a", [src_file])
files_a = client.list_uploads("thread-a")
@@ -1298,18 +1286,13 @@ class TestScenarioThreadIsolation:
def test_artifacts_isolated_per_thread(self, client):
"""Artifacts in thread-A are not accessible from thread-B."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
paths = Paths(base_dir=tmp)
outputs_a = paths.sandbox_outputs_dir("thread-a")
outputs_a.mkdir(parents=True)
paths.sandbox_user_data_dir("thread-b").mkdir(parents=True)
(outputs_a / "result.txt").write_text("thread-a artifact")
data_a = tmp_path / "thread-a"
data_b = tmp_path / "thread-b"
(data_a / "outputs").mkdir(parents=True)
(data_b / "outputs").mkdir(parents=True)
(data_a / "outputs" / "result.txt").write_text("thread-a artifact")
mock_paths = MagicMock()
mock_paths.sandbox_user_data_dir.side_effect = lambda tid: data_a if tid == "thread-a" else data_b
with patch("deerflow.client.get_paths", return_value=mock_paths):
with patch("deerflow.client.get_paths", return_value=paths):
content, _ = client.get_artifact("thread-a", "mnt/user-data/outputs/result.txt")
assert content == b"thread-a artifact"
@@ -1377,10 +1360,7 @@ class TestScenarioSkillInstallAndUse:
(skills_root / "custom").mkdir(parents=True)
# Step 1: Install
with (
patch("deerflow.skills.loader.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.validation._validate_skill_frontmatter", return_value=(True, "OK", "my-analyzer")),
):
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
result = client.install_skill(archive)
assert result["success"] is True
assert (skills_root / "custom" / "my-analyzer" / "SKILL.md").exists()
@@ -1512,7 +1492,7 @@ class TestScenarioEdgeCases:
pdf_file.write_bytes(b"%PDF-1.4 fake content")
with (
patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir),
patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
patch("deerflow.utils.file_conversion.CONVERTIBLE_EXTENSIONS", {".pdf"}),
patch("deerflow.utils.file_conversion.convert_file_to_markdown", side_effect=Exception("conversion failed")),
):
@@ -1614,9 +1594,7 @@ class TestGatewayConformance:
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "my-skill/SKILL.md")
custom_dir = tmp_path / "custom"
custom_dir.mkdir()
with patch("deerflow.skills.loader.get_skills_root_path", return_value=tmp_path):
with patch("deerflow.skills.installer.get_skills_root_path", return_value=tmp_path):
result = client.install_skill(archive)
parsed = SkillInstallResponse(**result)
@@ -1680,7 +1658,7 @@ class TestGatewayConformance:
src_file = tmp_path / "hello.txt"
src_file.write_text("hello")
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("t-conform", [src_file])
parsed = UploadResponse(**result)
@@ -1739,3 +1717,694 @@ class TestGatewayConformance:
parsed = MemoryStatusResponse(**result)
assert parsed.config.enabled is True
assert parsed.data.version == "1.0"
# ===========================================================================
# Hardening — install_skill security gates
# ===========================================================================
class TestInstallSkillSecurity:
"""Every security gate in install_skill() must have a red-line test."""
def test_zip_bomb_rejected(self, client):
"""Archives whose extracted size exceeds the limit are rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "bomb.skill"
# Create a small archive that claims huge uncompressed size.
# Write 200 bytes but the safe_extract checks cumulative file_size.
data = b"\x00" * 200
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
zf.writestr("big.bin", data)
skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)
# Patch max_total_size to a small value to trigger the bomb check.
from deerflow.skills import installer as _installer
orig = _installer.safe_extract_skill_archive
def patched_extract(zf, dest, max_total_size=100):
return orig(zf, dest, max_total_size=100)
with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer.safe_extract_skill_archive", side_effect=patched_extract),
):
with pytest.raises(ValueError, match="too large"):
client.install_skill(archive)
def test_absolute_path_in_archive_rejected(self, client):
"""ZIP entries with absolute paths are rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "abs.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.writestr("/etc/passwd", "root:x:0:0")
skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
with pytest.raises(ValueError, match="unsafe"):
client.install_skill(archive)
def test_dotdot_path_in_archive_rejected(self, client):
"""ZIP entries with '..' path components are rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "traversal.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.writestr("skill/../../../etc/shadow", "bad")
skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
with pytest.raises(ValueError, match="unsafe"):
client.install_skill(archive)
def test_symlinks_skipped_during_extraction(self, client):
"""Symlink entries in the archive are skipped (never written to disk)."""
import stat as stat_mod
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
archive = tmp_path / "sym-skill.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.writestr("sym-skill/SKILL.md", "---\nname: sym-skill\ndescription: test\n---\nBody")
# Inject a symlink entry via ZipInfo with Unix symlink mode.
link_info = zipfile.ZipInfo("sym-skill/sneaky_link")
link_info.external_attr = (stat_mod.S_IFLNK | 0o777) << 16
zf.writestr(link_info, "/etc/passwd")
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
result = client.install_skill(archive)
assert result["success"] is True
installed = skills_root / "custom" / "sym-skill"
assert (installed / "SKILL.md").exists()
assert not (installed / "sneaky_link").exists()
def test_invalid_skill_name_rejected(self, client):
"""Skill names containing special characters are rejected."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
skill_dir = tmp_path / "bad-name"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("---\nname: ../evil\ndescription: test\n---\n")
archive = tmp_path / "bad.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "bad-name/SKILL.md")
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)
with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(True, "OK", "../evil")),
):
with pytest.raises(ValueError, match="Invalid skill name"):
client.install_skill(archive)
def test_existing_skill_rejected(self, client):
"""Installing a skill that already exists is rejected."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
skill_dir = tmp_path / "dupe-skill"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("---\nname: dupe-skill\ndescription: test\n---\n")
archive = tmp_path / "dupe-skill.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "dupe-skill/SKILL.md")
skills_root = tmp_path / "skills"
(skills_root / "custom" / "dupe-skill").mkdir(parents=True)
with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(True, "OK", "dupe-skill")),
):
with pytest.raises(ValueError, match="already exists"):
client.install_skill(archive)
def test_empty_archive_rejected(self, client):
"""An archive with no entries is rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "empty.skill"
with zipfile.ZipFile(archive, "w"):
pass # empty archive
skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
with pytest.raises(ValueError, match="empty"):
client.install_skill(archive)
def test_invalid_frontmatter_rejected(self, client):
"""Archive with invalid SKILL.md frontmatter is rejected."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
skill_dir = tmp_path / "bad-meta"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("no frontmatter at all")
archive = tmp_path / "bad-meta.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "bad-meta/SKILL.md")
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)
with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(False, "Missing name field", "")),
):
with pytest.raises(ValueError, match="Invalid skill"):
client.install_skill(archive)
def test_not_a_zip_rejected(self, client):
"""A .skill file that is not a valid ZIP is rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "fake.skill"
archive.write_text("this is not a zip file")
with pytest.raises(ValueError, match="not a valid ZIP"):
client.install_skill(archive)
def test_directory_path_rejected(self, client):
"""Passing a directory instead of a file is rejected."""
with tempfile.TemporaryDirectory() as tmp:
with pytest.raises(ValueError, match="not a file"):
client.install_skill(tmp)
# ===========================================================================
# Hardening — _atomic_write_json error paths
# ===========================================================================
class TestAtomicWriteJson:
def test_temp_file_cleaned_on_serialization_failure(self):
"""If json.dump raises, the temp file is removed."""
with tempfile.TemporaryDirectory() as tmp:
target = Path(tmp) / "config.json"
# An object that cannot be serialized to JSON.
bad_data = {"key": object()}
with pytest.raises(TypeError):
DeerFlowClient._atomic_write_json(target, bad_data)
# Target should not have been created.
assert not target.exists()
# No stray .tmp files should remain.
tmp_files = list(Path(tmp).glob("*.tmp"))
assert tmp_files == []
def test_happy_path_writes_atomically(self):
"""Normal write produces correct JSON and no temp files."""
with tempfile.TemporaryDirectory() as tmp:
target = Path(tmp) / "out.json"
data = {"key": "value", "nested": [1, 2, 3]}
DeerFlowClient._atomic_write_json(target, data)
assert target.exists()
with open(target) as f:
loaded = json.load(f)
assert loaded == data
# No temp files left behind.
assert list(Path(tmp).glob("*.tmp")) == []
def test_original_preserved_on_failure(self):
"""If write fails, the original file is not corrupted."""
with tempfile.TemporaryDirectory() as tmp:
target = Path(tmp) / "config.json"
target.write_text('{"original": true}')
bad_data = {"key": object()}
with pytest.raises(TypeError):
DeerFlowClient._atomic_write_json(target, bad_data)
# Original content must survive.
with open(target) as f:
assert json.load(f) == {"original": True}
# ===========================================================================
# Hardening — config update error paths
# ===========================================================================
class TestConfigUpdateErrors:
def test_update_mcp_config_no_config_file(self, client):
"""FileNotFoundError when extensions_config.json cannot be located."""
with patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=None):
with pytest.raises(FileNotFoundError, match="Cannot locate"):
client.update_mcp_config({"server": {}})
def test_update_skill_no_config_file(self, client):
"""FileNotFoundError when extensions_config.json cannot be located."""
skill = MagicMock()
skill.name = "some-skill"
with (
patch("deerflow.skills.loader.load_skills", return_value=[skill]),
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=None),
):
with pytest.raises(FileNotFoundError, match="Cannot locate"):
client.update_skill("some-skill", enabled=False)
def test_update_skill_disappears_after_write(self, client):
"""RuntimeError when skill vanishes between write and re-read."""
skill = MagicMock()
skill.name = "ghost-skill"
ext_config = MagicMock()
ext_config.mcp_servers = {}
ext_config.skills = {}
with tempfile.TemporaryDirectory() as tmp:
config_file = Path(tmp) / "extensions_config.json"
config_file.write_text("{}")
with (
patch("deerflow.skills.loader.load_skills", side_effect=[[skill], []]),
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
patch("deerflow.client.get_extensions_config", return_value=ext_config),
patch("deerflow.client.reload_extensions_config"),
):
with pytest.raises(RuntimeError, match="disappeared"):
client.update_skill("ghost-skill", enabled=False)
# ===========================================================================
# Hardening — stream / chat edge cases
# ===========================================================================
class TestStreamHardening:
def test_agent_exception_propagates(self, client):
"""Exceptions from agent.stream() propagate to caller."""
agent = MagicMock()
agent.stream.side_effect = RuntimeError("model quota exceeded")
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
with pytest.raises(RuntimeError, match="model quota exceeded"):
list(client.stream("hi", thread_id="t-err"))
def test_messages_without_id(self, client):
"""Messages without id attribute are emitted without crashing."""
ai = AIMessage(content="no id here")
# Forcibly remove the id attribute to simulate edge case.
object.__setattr__(ai, "id", None)
chunks = [{"messages": [ai]}]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-noid"))
# Should produce events without error.
assert events[-1].type == "end"
ai_events = _ai_events(events)
assert len(ai_events) == 1
assert ai_events[0].data["content"] == "no id here"
def test_tool_calls_only_no_text(self, client):
"""chat() returns empty string when agent only emits tool calls."""
ai = AIMessage(
content="",
id="ai-1",
tool_calls=[{"name": "bash", "args": {"cmd": "ls"}, "id": "tc-1"}],
)
tool = ToolMessage(content="output", id="tm-1", tool_call_id="tc-1", name="bash")
chunks = [
{"messages": [ai]},
{"messages": [ai, tool]},
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
result = client.chat("do it", thread_id="t-tc-only")
assert result == ""
def test_duplicate_messages_without_id_not_deduplicated(self, client):
"""Messages with id=None are NOT deduplicated (each is emitted)."""
ai1 = AIMessage(content="first")
ai2 = AIMessage(content="second")
object.__setattr__(ai1, "id", None)
object.__setattr__(ai2, "id", None)
chunks = [
{"messages": [ai1]},
{"messages": [ai2]},
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-dup-noid"))
ai_msgs = _ai_events(events)
assert len(ai_msgs) == 2
# ===========================================================================
# Hardening — _serialize_message coverage
# ===========================================================================
class TestSerializeMessage:
def test_system_message(self):
msg = SystemMessage(content="You are a helpful assistant.", id="sys-1")
result = DeerFlowClient._serialize_message(msg)
assert result["type"] == "system"
assert result["content"] == "You are a helpful assistant."
assert result["id"] == "sys-1"
def test_unknown_message_type(self):
"""Non-standard message types serialize as 'unknown'."""
msg = MagicMock()
msg.id = "unk-1"
msg.content = "something"
# Not an instance of AIMessage/ToolMessage/HumanMessage/SystemMessage
type(msg).__name__ = "CustomMessage"
result = DeerFlowClient._serialize_message(msg)
assert result["type"] == "unknown"
assert result["id"] == "unk-1"
def test_ai_message_with_tool_calls(self):
msg = AIMessage(
content="",
id="ai-tc",
tool_calls=[{"name": "bash", "args": {"cmd": "ls"}, "id": "tc-1"}],
)
result = DeerFlowClient._serialize_message(msg)
assert result["type"] == "ai"
assert len(result["tool_calls"]) == 1
assert result["tool_calls"][0]["name"] == "bash"
def test_tool_message_non_string_content(self):
msg = ToolMessage(content={"key": "value"}, id="tm-1", tool_call_id="tc-1", name="tool")
result = DeerFlowClient._serialize_message(msg)
assert result["type"] == "tool"
assert isinstance(result["content"], str)
# ===========================================================================
# Hardening — upload / delete symlink attack
# ===========================================================================
class TestUploadDeleteSymlink:
def test_delete_upload_symlink_outside_dir(self, client):
"""A symlink in uploads dir pointing outside is caught by path traversal check."""
with tempfile.TemporaryDirectory() as tmp:
uploads_dir = Path(tmp) / "uploads"
uploads_dir.mkdir()
# Create a target file outside uploads dir.
outside = Path(tmp) / "secret.txt"
outside.write_text("sensitive data")
# Create a symlink inside uploads dir pointing to outside file.
link = uploads_dir / "harmless.txt"
link.symlink_to(outside)
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
# The resolved path of the symlink escapes uploads_dir,
# so path traversal check should catch it.
with pytest.raises(PathTraversalError):
client.delete_upload("thread-1", "harmless.txt")
# The outside file must NOT have been deleted.
assert outside.exists()
def test_upload_filename_with_spaces_and_unicode(self, client):
"""Files with spaces and unicode characters in names upload correctly."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
weird_name = "report 2024 数据.txt"
src_file = tmp_path / weird_name
src_file.write_text("data")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("thread-1", [src_file])
assert result["success"] is True
assert result["files"][0]["filename"] == weird_name
assert (uploads_dir / weird_name).exists()
# ===========================================================================
# Hardening — artifact edge cases
# ===========================================================================
class TestArtifactHardening:
def test_artifact_directory_rejected(self, client):
"""get_artifact rejects paths that resolve to a directory."""
with tempfile.TemporaryDirectory() as tmp:
paths = Paths(base_dir=tmp)
subdir = paths.sandbox_outputs_dir("t1") / "subdir"
subdir.mkdir(parents=True)
with patch("deerflow.client.get_paths", return_value=paths):
with pytest.raises(ValueError, match="not a file"):
client.get_artifact("t1", "mnt/user-data/outputs/subdir")
def test_artifact_leading_slash_stripped(self, client):
"""Paths with leading slash are handled correctly."""
with tempfile.TemporaryDirectory() as tmp:
paths = Paths(base_dir=tmp)
outputs = paths.sandbox_outputs_dir("t1")
outputs.mkdir(parents=True)
(outputs / "file.txt").write_text("content")
with patch("deerflow.client.get_paths", return_value=paths):
content, _mime = client.get_artifact("t1", "/mnt/user-data/outputs/file.txt")
assert content == b"content"
# ===========================================================================
# BUG DETECTION — tests that expose real bugs in client.py
# ===========================================================================
class TestUploadDuplicateFilenames:
"""Regression: upload_files must auto-rename duplicate basenames.
Previously it silently overwrote the first file with the second,
then reported both in the response while only one existed on disk.
Now duplicates are renamed (data.txt → data_1.txt) and the response
includes original_filename so the agent / caller can see what happened.
"""
def test_duplicate_filenames_auto_renamed(self, client):
"""Two files with same basename → second gets _1 suffix."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
dir_a = tmp_path / "a"
dir_b = tmp_path / "b"
dir_a.mkdir()
dir_b.mkdir()
(dir_a / "data.txt").write_text("version A")
(dir_b / "data.txt").write_text("version B")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("t-dup", [dir_a / "data.txt", dir_b / "data.txt"])
assert result["success"] is True
assert len(result["files"]) == 2
# Both files exist on disk with distinct names.
disk_files = sorted(p.name for p in uploads_dir.iterdir())
assert disk_files == ["data.txt", "data_1.txt"]
# First keeps original name, second is renamed.
assert result["files"][0]["filename"] == "data.txt"
assert "original_filename" not in result["files"][0]
assert result["files"][1]["filename"] == "data_1.txt"
assert result["files"][1]["original_filename"] == "data.txt"
# Content preserved correctly.
assert (uploads_dir / "data.txt").read_text() == "version A"
assert (uploads_dir / "data_1.txt").read_text() == "version B"
def test_triple_duplicate_increments_counter(self, client):
"""Three files with same basename → _1, _2 suffixes."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
for name in ["x", "y", "z"]:
d = tmp_path / name
d.mkdir()
(d / "report.csv").write_text(f"from {name}")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files(
"t-triple",
[tmp_path / "x" / "report.csv", tmp_path / "y" / "report.csv", tmp_path / "z" / "report.csv"],
)
filenames = [f["filename"] for f in result["files"]]
assert filenames == ["report.csv", "report_1.csv", "report_2.csv"]
assert len(list(uploads_dir.iterdir())) == 3
def test_different_filenames_no_rename(self, client):
"""Non-duplicate filenames upload normally without rename."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
(tmp_path / "a.txt").write_text("aaa")
(tmp_path / "b.txt").write_text("bbb")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("t-ok", [tmp_path / "a.txt", tmp_path / "b.txt"])
assert result["success"] is True
assert len(result["files"]) == 2
assert all("original_filename" not in f for f in result["files"])
assert len(list(uploads_dir.iterdir())) == 2
class TestBugArtifactPrefixMatchTooLoose:
"""Regression: get_artifact must reject paths like ``mnt/user-data-evil/...``.
Previously ``startswith("mnt/user-data")`` matched ``"mnt/user-data-evil"``
because it was a string prefix, not a path-segment check.
"""
def test_non_canonical_prefix_rejected(self, client):
"""Paths that share a string prefix but differ at segment boundary are rejected."""
with pytest.raises(ValueError, match="must start with"):
client.get_artifact("t1", "mnt/user-data-evil/secret.txt")
def test_exact_prefix_without_subpath_accepted(self, client):
"""Bare 'mnt/user-data' is accepted (will later fail as directory, not at prefix)."""
with tempfile.TemporaryDirectory() as tmp:
paths = Paths(base_dir=tmp)
paths.sandbox_user_data_dir("t1").mkdir(parents=True)
with patch("deerflow.client.get_paths", return_value=paths):
# Accepted at prefix check, but fails because it's a directory.
with pytest.raises(ValueError, match="not a file"):
client.get_artifact("t1", "mnt/user-data")
class TestBugListUploadsDeadCode:
"""Regression: list_uploads works even when called on a fresh thread
(directory does not exist yet — returns empty without creating it).
"""
def test_list_uploads_on_fresh_thread(self, client):
"""list_uploads on a thread that never had uploads returns empty list."""
with tempfile.TemporaryDirectory() as tmp:
non_existent = Path(tmp) / "does-not-exist" / "uploads"
assert not non_existent.exists()
mock_paths = MagicMock()
mock_paths.sandbox_uploads_dir.return_value = non_existent
with patch("deerflow.uploads.manager.get_paths", return_value=mock_paths):
result = client.list_uploads("thread-fresh")
# Read path should NOT create the directory
assert not non_existent.exists()
assert result == {"files": [], "count": 0}
class TestBugAgentInvalidationInconsistency:
"""Regression: update_skill and update_mcp_config must reset both
_agent and _agent_config_key, just like reset_agent() does.
"""
def test_update_mcp_resets_config_key(self, client):
"""After update_mcp_config, both _agent and _agent_config_key are None."""
client._agent = MagicMock()
client._agent_config_key = ("model", True, False, False)
current_config = MagicMock()
current_config.skills = {}
reloaded = MagicMock()
reloaded.mcp_servers = {}
with tempfile.TemporaryDirectory() as tmp:
config_file = Path(tmp) / "ext.json"
config_file.write_text("{}")
with (
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
patch("deerflow.client.get_extensions_config", return_value=current_config),
patch("deerflow.client.reload_extensions_config", return_value=reloaded),
):
client.update_mcp_config({})
assert client._agent is None
assert client._agent_config_key is None
def test_update_skill_resets_config_key(self, client):
"""After update_skill, both _agent and _agent_config_key are None."""
client._agent = MagicMock()
client._agent_config_key = ("model", True, False, False)
skill = MagicMock()
skill.name = "s1"
updated = MagicMock()
updated.name = "s1"
updated.description = "d"
updated.license = "MIT"
updated.category = "c"
updated.enabled = False
ext_config = MagicMock()
ext_config.mcp_servers = {}
ext_config.skills = {}
with tempfile.TemporaryDirectory() as tmp:
config_file = Path(tmp) / "ext.json"
config_file.write_text("{}")
with (
patch("deerflow.skills.loader.load_skills", side_effect=[[skill], [updated]]),
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
patch("deerflow.client.get_extensions_config", return_value=ext_config),
patch("deerflow.client.reload_extensions_config"),
):
client.update_skill("s1", enabled=False)
assert client._agent is None
assert client._agent_config_key is None

View File

@@ -0,0 +1,781 @@
"""End-to-end tests for DeerFlowClient.
Middle tier of the test pyramid:
- Top: test_client_live.py — real LLM, needs API key
- Middle: test_client_e2e.py — real LLM + real modules ← THIS FILE
- Bottom: test_client.py — unit tests, mock everything
Core principle: use the real LLM from config.yaml, let config, middleware
chain, tool registration, file I/O, and event serialization all run for real.
Only DEER_FLOW_HOME is redirected to tmp_path for filesystem isolation.
Tests that call the LLM are marked ``requires_llm`` and skipped in CI.
File-management tests (upload/list/delete) don't need LLM and run everywhere.
"""
import json
import os
import uuid
import zipfile
import pytest
from dotenv import load_dotenv
from deerflow.client import DeerFlowClient, StreamEvent
from deerflow.config.app_config import AppConfig
from deerflow.config.model_config import ModelConfig
from deerflow.config.sandbox_config import SandboxConfig
# Load .env from project root (for OPENAI_API_KEY etc.)
load_dotenv(os.path.join(os.path.dirname(__file__), "../../.env"))
# ---------------------------------------------------------------------------
# Markers
# ---------------------------------------------------------------------------
requires_llm = pytest.mark.skipif(
os.getenv("CI", "").lower() in ("true", "1") or not os.getenv("OPENAI_API_KEY"),
reason="Requires LLM API key — skipped in CI or when OPENAI_API_KEY is unset",
)
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_e2e_config() -> AppConfig:
"""Build a minimal AppConfig using real LLM credentials from environment.
All LLM connection details come from environment variables so that both
internal CI and external contributors can run the tests:
- ``E2E_MODEL_NAME`` (default: ``volcengine-ark``)
- ``E2E_MODEL_USE`` (default: ``langchain_openai:ChatOpenAI``)
- ``E2E_MODEL_ID`` (default: ``ep-20251211175242-llcmh``)
- ``E2E_BASE_URL`` (default: ``https://ark-cn-beijing.bytedance.net/api/v3``)
- ``OPENAI_API_KEY`` (required for LLM tests)
"""
return AppConfig(
models=[
ModelConfig(
name=os.getenv("E2E_MODEL_NAME", "volcengine-ark"),
display_name="E2E Test Model",
use=os.getenv("E2E_MODEL_USE", "langchain_openai:ChatOpenAI"),
model=os.getenv("E2E_MODEL_ID", "ep-20251211175242-llcmh"),
base_url=os.getenv("E2E_BASE_URL", "https://ark-cn-beijing.bytedance.net/api/v3"),
api_key=os.getenv("OPENAI_API_KEY", ""),
max_tokens=512,
temperature=0.7,
supports_thinking=False,
supports_reasoning_effort=False,
supports_vision=False,
)
],
sandbox=SandboxConfig(use="deerflow.sandbox.local:LocalSandboxProvider"),
)
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
@pytest.fixture()
def e2e_env(tmp_path, monkeypatch):
"""Isolated filesystem environment for E2E tests.
- DEER_FLOW_HOME → tmp_path (all thread data lands in a temp dir)
- Singletons reset so they pick up the new env
- Title/memory/summarization disabled to avoid extra LLM calls
- AppConfig built programmatically (avoids config.yaml param-name issues)
"""
# 1. Filesystem isolation
monkeypatch.setenv("DEER_FLOW_HOME", str(tmp_path))
monkeypatch.setattr("deerflow.config.paths._paths", None)
monkeypatch.setattr("deerflow.sandbox.sandbox_provider._default_sandbox_provider", None)
# 2. Inject a clean AppConfig via the global singleton.
config = _make_e2e_config()
monkeypatch.setattr("deerflow.config.app_config._app_config", config)
monkeypatch.setattr("deerflow.config.app_config._app_config_is_custom", True)
# 3. Disable title generation (extra LLM call, non-deterministic)
from deerflow.config.title_config import TitleConfig
monkeypatch.setattr("deerflow.config.title_config._title_config", TitleConfig(enabled=False))
# 4. Disable memory queueing (avoids background threads & file writes)
from deerflow.config.memory_config import MemoryConfig
monkeypatch.setattr(
"deerflow.agents.middlewares.memory_middleware.get_memory_config",
lambda: MemoryConfig(enabled=False),
)
# 5. Ensure summarization is off (default, but be explicit)
from deerflow.config.summarization_config import SummarizationConfig
monkeypatch.setattr("deerflow.config.summarization_config._summarization_config", SummarizationConfig(enabled=False))
# 6. Exclude TitleMiddleware from the chain.
# It triggers an extra LLM call to generate a thread title, which adds
# non-determinism and cost to E2E tests (title generation is already
# disabled via TitleConfig above, but the middleware still participates
# in the chain and can interfere with event ordering).
from deerflow.agents.lead_agent.agent import _build_middlewares as _original_build_middlewares
from deerflow.agents.middlewares.title_middleware import TitleMiddleware
def _sync_safe_build_middlewares(*args, **kwargs):
mws = _original_build_middlewares(*args, **kwargs)
return [m for m in mws if not isinstance(m, TitleMiddleware)]
monkeypatch.setattr("deerflow.client._build_middlewares", _sync_safe_build_middlewares)
return {"tmp_path": tmp_path}
@pytest.fixture()
def client(e2e_env):
"""A DeerFlowClient wired to the isolated e2e_env."""
return DeerFlowClient(checkpointer=None, thinking_enabled=False)
# ---------------------------------------------------------------------------
# Step 2: Basic streaming (requires LLM)
# ---------------------------------------------------------------------------
class TestBasicChat:
"""Basic chat and streaming behavior with real LLM."""
@requires_llm
def test_basic_chat(self, client):
"""chat() returns a non-empty text response."""
result = client.chat("Say exactly: pong")
assert isinstance(result, str)
assert len(result) > 0
@requires_llm
def test_stream_event_sequence(self, client):
"""stream() yields events: messages-tuple, values, and end."""
events = list(client.stream("Say hi"))
types = [e.type for e in events]
assert types[-1] == "end"
assert "messages-tuple" in types
assert "values" in types
@requires_llm
def test_stream_event_data_format(self, client):
"""Each event type has the expected data structure."""
events = list(client.stream("Say hello"))
for event in events:
assert isinstance(event, StreamEvent)
assert isinstance(event.type, str)
assert isinstance(event.data, dict)
if event.type == "messages-tuple" and event.data.get("type") == "ai":
assert "content" in event.data
assert "id" in event.data
elif event.type == "values":
assert "messages" in event.data
assert "artifacts" in event.data
elif event.type == "end":
assert event.data == {}
@requires_llm
def test_multi_turn_stateless(self, client):
"""Without checkpointer, two calls to the same thread_id are independent."""
tid = str(uuid.uuid4())
r1 = client.chat("Remember the number 42", thread_id=tid)
# Reset so agent is recreated (simulates no cross-turn state)
client.reset_agent()
r2 = client.chat("What number did I say?", thread_id=tid)
# Without a checkpointer the second call has no memory of the first.
# We can't assert exact content, but both should be non-empty.
assert isinstance(r1, str) and len(r1) > 0
assert isinstance(r2, str) and len(r2) > 0
# ---------------------------------------------------------------------------
# Step 3: Tool call flow (requires LLM)
# ---------------------------------------------------------------------------
class TestToolCallFlow:
"""Verify the LLM actually invokes tools through the real agent pipeline."""
@requires_llm
def test_tool_call_produces_events(self, client):
"""When the LLM decides to use a tool, we see tool call + result events."""
# Give a clear instruction that forces a tool call
events = list(client.stream(
"Use the bash tool to run: echo hello_e2e_test"
))
types = [e.type for e in events]
assert types[-1] == "end"
# Should have at least one tool call event
tool_call_events = [
e for e in events
if e.type == "messages-tuple" and e.data.get("tool_calls")
]
tool_result_events = [
e for e in events
if e.type == "messages-tuple" and e.data.get("type") == "tool"
]
assert len(tool_call_events) >= 1, "Expected at least one tool_call event"
assert len(tool_result_events) >= 1, "Expected at least one tool result event"
@requires_llm
def test_tool_call_event_structure(self, client):
"""Tool call events contain name, args, and id fields."""
events = list(client.stream(
"Use the read_file tool to read /mnt/user-data/workspace/nonexistent.txt"
))
tc_events = [
e for e in events
if e.type == "messages-tuple" and e.data.get("tool_calls")
]
if tc_events:
tc = tc_events[0].data["tool_calls"][0]
assert "name" in tc
assert "args" in tc
assert "id" in tc
# ---------------------------------------------------------------------------
# Step 4: File upload integration (no LLM needed for most)
# ---------------------------------------------------------------------------
class TestFileUploadIntegration:
"""Upload, list, and delete files through the real client path."""
def test_upload_files(self, e2e_env, tmp_path):
"""upload_files() copies files and returns metadata."""
test_file = tmp_path / "source" / "readme.txt"
test_file.parent.mkdir(parents=True, exist_ok=True)
test_file.write_text("Hello world")
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
result = c.upload_files(tid, [test_file])
assert result["success"] is True
assert len(result["files"]) == 1
assert result["files"][0]["filename"] == "readme.txt"
# Physically exists
from deerflow.config.paths import get_paths
assert (get_paths().sandbox_uploads_dir(tid) / "readme.txt").exists()
def test_upload_duplicate_rename(self, e2e_env, tmp_path):
"""Uploading two files with the same name auto-renames the second."""
d1 = tmp_path / "dir1"
d2 = tmp_path / "dir2"
d1.mkdir()
d2.mkdir()
(d1 / "data.txt").write_text("content A")
(d2 / "data.txt").write_text("content B")
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
result = c.upload_files(tid, [d1 / "data.txt", d2 / "data.txt"])
assert result["success"] is True
assert len(result["files"]) == 2
filenames = {f["filename"] for f in result["files"]}
assert "data.txt" in filenames
assert "data_1.txt" in filenames
def test_upload_list_and_delete(self, e2e_env, tmp_path):
"""Upload → list → delete → list lifecycle."""
test_file = tmp_path / "lifecycle.txt"
test_file.write_text("lifecycle test")
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
c.upload_files(tid, [test_file])
listing = c.list_uploads(tid)
assert listing["count"] == 1
assert listing["files"][0]["filename"] == "lifecycle.txt"
del_result = c.delete_upload(tid, "lifecycle.txt")
assert del_result["success"] is True
listing = c.list_uploads(tid)
assert listing["count"] == 0
@requires_llm
def test_upload_then_chat(self, e2e_env, tmp_path):
"""Upload a file then ask the LLM about it — UploadsMiddleware injects file info."""
test_file = tmp_path / "source" / "notes.txt"
test_file.parent.mkdir(parents=True, exist_ok=True)
test_file.write_text("The secret code is 7749.")
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
c.upload_files(tid, [test_file])
# Chat — the middleware should inject <uploaded_files> context
response = c.chat("What files are available?", thread_id=tid)
assert isinstance(response, str) and len(response) > 0
# ---------------------------------------------------------------------------
# Step 5: Lifecycle and configuration (no LLM needed)
# ---------------------------------------------------------------------------
class TestLifecycleAndConfig:
"""Agent recreation and configuration behavior."""
@requires_llm
def test_agent_recreation_on_config_change(self, client):
"""Changing thinking_enabled triggers agent recreation (different config key)."""
list(client.stream("hi"))
key1 = client._agent_config_key
# Stream with a different config override
client.reset_agent()
list(client.stream("hi", thinking_enabled=True))
key2 = client._agent_config_key
# thinking_enabled changed: False → True → keys differ
assert key1 != key2
def test_reset_agent_clears_state(self, e2e_env):
"""reset_agent() sets the internal agent to None."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
# Before any call, agent is None
assert c._agent is None
c.reset_agent()
assert c._agent is None
assert c._agent_config_key is None
def test_plan_mode_config_key(self, e2e_env):
"""plan_mode is part of the config key tuple."""
c = DeerFlowClient(checkpointer=None, plan_mode=False)
cfg1 = c._get_runnable_config("test-thread")
key1 = (
cfg1["configurable"]["model_name"],
cfg1["configurable"]["thinking_enabled"],
cfg1["configurable"]["is_plan_mode"],
cfg1["configurable"]["subagent_enabled"],
)
c2 = DeerFlowClient(checkpointer=None, plan_mode=True)
cfg2 = c2._get_runnable_config("test-thread")
key2 = (
cfg2["configurable"]["model_name"],
cfg2["configurable"]["thinking_enabled"],
cfg2["configurable"]["is_plan_mode"],
cfg2["configurable"]["subagent_enabled"],
)
assert key1 != key2
assert key1[2] is False
assert key2[2] is True
# ---------------------------------------------------------------------------
# Step 6: Middleware chain verification (requires LLM)
# ---------------------------------------------------------------------------
class TestMiddlewareChain:
"""Verify middleware side effects through real execution."""
@requires_llm
def test_thread_data_paths_in_state(self, client):
"""After streaming, thread directory paths are computed correctly."""
tid = str(uuid.uuid4())
events = list(client.stream("hi", thread_id=tid))
# The values event should contain messages
values_events = [e for e in events if e.type == "values"]
assert len(values_events) >= 1
# ThreadDataMiddleware should have set paths in the state.
# We verify the paths singleton can resolve the thread dir.
from deerflow.config.paths import get_paths
thread_dir = get_paths().thread_dir(tid)
assert str(thread_dir).endswith(tid)
@requires_llm
def test_stream_completes_without_middleware_errors(self, client):
"""Full middleware chain (ThreadData, Uploads, Sandbox, DanglingToolCall,
Memory, Clarification) executes without errors."""
events = list(client.stream("What is 1+1?"))
types = [e.type for e in events]
assert types[-1] == "end"
# Should have at least one AI response
ai_events = [
e for e in events
if e.type == "messages-tuple" and e.data.get("type") == "ai"
]
assert len(ai_events) >= 1
# ---------------------------------------------------------------------------
# Step 7: Error and boundary conditions
# ---------------------------------------------------------------------------
class TestErrorAndBoundary:
"""Error propagation and edge cases."""
def test_upload_nonexistent_file_raises(self, e2e_env):
"""Uploading a file that doesn't exist raises FileNotFoundError."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(FileNotFoundError):
c.upload_files("test-thread", ["/nonexistent/file.txt"])
def test_delete_nonexistent_upload_raises(self, e2e_env):
"""Deleting a file that doesn't exist raises FileNotFoundError."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
# Ensure the uploads dir exists first
c.list_uploads(tid)
with pytest.raises(FileNotFoundError):
c.delete_upload(tid, "ghost.txt")
def test_artifact_path_traversal_blocked(self, e2e_env):
"""get_artifact blocks path traversal attempts."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(ValueError):
c.get_artifact("test-thread", "../../etc/passwd")
def test_upload_directory_rejected(self, e2e_env, tmp_path):
"""Uploading a directory (not a file) is rejected."""
d = tmp_path / "a_directory"
d.mkdir()
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(ValueError, match="not a file"):
c.upload_files("test-thread", [d])
@requires_llm
def test_empty_message_still_gets_response(self, client):
"""Even an empty-ish message should produce a valid event stream."""
events = list(client.stream(" "))
types = [e.type for e in events]
assert types[-1] == "end"
# ---------------------------------------------------------------------------
# Step 8: Artifact access (no LLM needed)
# ---------------------------------------------------------------------------
class TestArtifactAccess:
"""Read artifacts through get_artifact() with real filesystem."""
def test_get_artifact_happy_path(self, e2e_env):
"""Write a file to outputs, then read it back via get_artifact()."""
from deerflow.config.paths import get_paths
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
# Create an output file in the thread's outputs directory
outputs_dir = get_paths().sandbox_outputs_dir(tid)
outputs_dir.mkdir(parents=True, exist_ok=True)
(outputs_dir / "result.txt").write_text("hello artifact")
data, mime = c.get_artifact(tid, "mnt/user-data/outputs/result.txt")
assert data == b"hello artifact"
assert "text" in mime
def test_get_artifact_nested_path(self, e2e_env):
"""Artifacts in subdirectories are accessible."""
from deerflow.config.paths import get_paths
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
tid = str(uuid.uuid4())
outputs_dir = get_paths().sandbox_outputs_dir(tid)
sub = outputs_dir / "charts"
sub.mkdir(parents=True, exist_ok=True)
(sub / "data.json").write_text('{"x": 1}')
data, mime = c.get_artifact(tid, "mnt/user-data/outputs/charts/data.json")
assert b'"x"' in data
assert "json" in mime
def test_get_artifact_nonexistent_raises(self, e2e_env):
"""Reading a nonexistent artifact raises FileNotFoundError."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(FileNotFoundError):
c.get_artifact("test-thread", "mnt/user-data/outputs/ghost.txt")
def test_get_artifact_traversal_within_prefix_blocked(self, e2e_env):
"""Path traversal within the valid prefix is still blocked."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises((PermissionError, ValueError, FileNotFoundError)):
c.get_artifact("test-thread", "mnt/user-data/outputs/../../etc/passwd")
# ---------------------------------------------------------------------------
# Step 9: Skill installation (no LLM needed)
# ---------------------------------------------------------------------------
class TestSkillInstallation:
"""install_skill() with real ZIP handling and filesystem."""
@pytest.fixture(autouse=True)
def _isolate_skills_dir(self, tmp_path, monkeypatch):
"""Redirect skill installation to a temp directory."""
skills_root = tmp_path / "skills"
(skills_root / "public").mkdir(parents=True)
(skills_root / "custom").mkdir(parents=True)
monkeypatch.setattr(
"deerflow.skills.installer.get_skills_root_path",
lambda: skills_root,
)
self._skills_root = skills_root
@staticmethod
def _make_skill_zip(tmp_path, skill_name="test-e2e-skill"):
"""Create a minimal valid .skill archive."""
skill_dir = tmp_path / "build" / skill_name
skill_dir.mkdir(parents=True)
(skill_dir / "SKILL.md").write_text(
f"---\nname: {skill_name}\ndescription: E2E test skill\n---\n\nTest content.\n"
)
archive_path = tmp_path / f"{skill_name}.skill"
with zipfile.ZipFile(archive_path, "w") as zf:
for file in skill_dir.rglob("*"):
zf.write(file, file.relative_to(tmp_path / "build"))
return archive_path
def test_install_skill_success(self, e2e_env, tmp_path):
"""A valid .skill archive installs to the custom skills directory."""
archive = self._make_skill_zip(tmp_path)
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.install_skill(archive)
assert result["success"] is True
assert result["skill_name"] == "test-e2e-skill"
assert (self._skills_root / "custom" / "test-e2e-skill" / "SKILL.md").exists()
def test_install_skill_duplicate_rejected(self, e2e_env, tmp_path):
"""Installing the same skill twice raises ValueError."""
archive = self._make_skill_zip(tmp_path)
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
c.install_skill(archive)
with pytest.raises(ValueError, match="already exists"):
c.install_skill(archive)
def test_install_skill_invalid_extension(self, e2e_env, tmp_path):
"""A file without .skill extension is rejected."""
bad_file = tmp_path / "not_a_skill.zip"
bad_file.write_bytes(b"PK\x03\x04") # ZIP magic bytes
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(ValueError, match=".skill extension"):
c.install_skill(bad_file)
def test_install_skill_missing_frontmatter(self, e2e_env, tmp_path):
"""A .skill archive without valid SKILL.md frontmatter is rejected."""
skill_dir = tmp_path / "build" / "bad-skill"
skill_dir.mkdir(parents=True)
(skill_dir / "SKILL.md").write_text("No frontmatter here.")
archive = tmp_path / "bad-skill.skill"
with zipfile.ZipFile(archive, "w") as zf:
for file in skill_dir.rglob("*"):
zf.write(file, file.relative_to(tmp_path / "build"))
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(ValueError, match="Invalid skill"):
c.install_skill(archive)
def test_install_skill_nonexistent_file(self, e2e_env):
"""Installing from a nonexistent path raises FileNotFoundError."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(FileNotFoundError):
c.install_skill("/nonexistent/skill.skill")
# ---------------------------------------------------------------------------
# Step 10: Configuration management (no LLM needed)
# ---------------------------------------------------------------------------
class TestConfigManagement:
"""Config queries and updates through real code paths."""
def test_list_models_returns_injected_config(self, e2e_env):
"""list_models() returns the model from the injected AppConfig."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.list_models()
assert "models" in result
assert len(result["models"]) == 1
assert result["models"][0]["name"] == "volcengine-ark"
assert result["models"][0]["display_name"] == "E2E Test Model"
def test_get_model_found(self, e2e_env):
"""get_model() returns the model when it exists."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
model = c.get_model("volcengine-ark")
assert model is not None
assert model["name"] == "volcengine-ark"
assert model["supports_thinking"] is False
def test_get_model_not_found(self, e2e_env):
"""get_model() returns None for nonexistent model."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
assert c.get_model("nonexistent-model") is None
def test_list_skills_returns_list(self, e2e_env):
"""list_skills() returns a dict with 'skills' key from real directory scan."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.list_skills()
assert "skills" in result
assert isinstance(result["skills"], list)
# The real skills/ directory should have some public skills
assert len(result["skills"]) > 0
def test_get_skill_found(self, e2e_env):
"""get_skill() returns skill info for a known public skill."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
# 'deep-research' is a built-in public skill
skill = c.get_skill("deep-research")
if skill is not None:
assert skill["name"] == "deep-research"
assert "description" in skill
assert "enabled" in skill
def test_get_skill_not_found(self, e2e_env):
"""get_skill() returns None for nonexistent skill."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
assert c.get_skill("nonexistent-skill-xyz") is None
def test_get_mcp_config_returns_dict(self, e2e_env):
"""get_mcp_config() returns a dict with 'mcp_servers' key."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.get_mcp_config()
assert "mcp_servers" in result
assert isinstance(result["mcp_servers"], dict)
def test_update_mcp_config_writes_and_invalidates(self, e2e_env, tmp_path, monkeypatch):
"""update_mcp_config() writes extensions_config.json and invalidates the agent."""
# Set up a writable extensions_config.json
config_file = tmp_path / "extensions_config.json"
config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))
# Force reload so the singleton picks up our test file
from deerflow.config.extensions_config import reload_extensions_config
reload_extensions_config()
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
# Simulate a cached agent
c._agent = "fake-agent-placeholder"
c._agent_config_key = ("a", "b", "c", "d")
result = c.update_mcp_config({"test-server": {"enabled": True, "type": "stdio", "command": "echo"}})
assert "mcp_servers" in result
# Agent should be invalidated
assert c._agent is None
assert c._agent_config_key is None
# File should be written
written = json.loads(config_file.read_text())
assert "test-server" in written["mcpServers"]
def test_update_skill_writes_and_invalidates(self, e2e_env, tmp_path, monkeypatch):
"""update_skill() writes extensions_config.json and invalidates the agent."""
config_file = tmp_path / "extensions_config.json"
config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))
from deerflow.config.extensions_config import reload_extensions_config
reload_extensions_config()
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
c._agent = "fake-agent-placeholder"
c._agent_config_key = ("a", "b", "c", "d")
# Use a real skill name from the public skills directory
skills = c.list_skills()
if not skills["skills"]:
pytest.skip("No skills available for testing")
skill_name = skills["skills"][0]["name"]
result = c.update_skill(skill_name, enabled=False)
assert result["name"] == skill_name
assert result["enabled"] is False
# Agent should be invalidated
assert c._agent is None
assert c._agent_config_key is None
def test_update_skill_nonexistent_raises(self, e2e_env, tmp_path, monkeypatch):
"""update_skill() raises ValueError for nonexistent skill."""
config_file = tmp_path / "extensions_config.json"
config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))
from deerflow.config.extensions_config import reload_extensions_config
reload_extensions_config()
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
with pytest.raises(ValueError, match="not found"):
c.update_skill("nonexistent-skill-xyz", enabled=True)
# ---------------------------------------------------------------------------
# Step 11: Memory access (no LLM needed)
# ---------------------------------------------------------------------------
class TestMemoryAccess:
"""Memory system queries through real code paths."""
def test_get_memory_returns_dict(self, e2e_env):
"""get_memory() returns a dict (may be empty initial state)."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.get_memory()
assert isinstance(result, dict)
def test_reload_memory_returns_dict(self, e2e_env):
"""reload_memory() forces reload and returns a dict."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.reload_memory()
assert isinstance(result, dict)
def test_get_memory_config_fields(self, e2e_env):
"""get_memory_config() returns expected config fields."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.get_memory_config()
assert "enabled" in result
assert "storage_path" in result
assert "debounce_seconds" in result
assert "max_facts" in result
assert "fact_confidence_threshold" in result
assert "injection_enabled" in result
assert "max_injection_tokens" in result
def test_get_memory_status_combines_config_and_data(self, e2e_env):
"""get_memory_status() returns both 'config' and 'data' keys."""
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
result = c.get_memory_status()
assert "config" in result
assert "data" in result
assert "enabled" in result["config"]
assert isinstance(result["data"], dict)

View File

@@ -1,8 +1,8 @@
from pathlib import Path
from fastapi import HTTPException
import pytest
from app.gateway.routers.skills import _resolve_skill_dir_from_archive_root
from deerflow.skills.installer import resolve_skill_dir_from_archive
def _write_skill(skill_dir: Path) -> None:
@@ -23,24 +23,19 @@ def test_resolve_skill_dir_ignores_macosx_wrapper(tmp_path: Path) -> None:
_write_skill(tmp_path / "demo-skill")
(tmp_path / "__MACOSX").mkdir()
assert _resolve_skill_dir_from_archive_root(tmp_path) == tmp_path / "demo-skill"
assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "demo-skill"
def test_resolve_skill_dir_ignores_hidden_top_level_entries(tmp_path: Path) -> None:
_write_skill(tmp_path / "demo-skill")
(tmp_path / ".DS_Store").write_text("metadata", encoding="utf-8")
assert _resolve_skill_dir_from_archive_root(tmp_path) == tmp_path / "demo-skill"
assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "demo-skill"
def test_resolve_skill_dir_rejects_archive_with_only_metadata(tmp_path: Path) -> None:
(tmp_path / "__MACOSX").mkdir()
(tmp_path / ".DS_Store").write_text("metadata", encoding="utf-8")
try:
_resolve_skill_dir_from_archive_root(tmp_path)
except HTTPException as error:
assert error.status_code == 400
assert error.detail == "Skill archive is empty"
else:
raise AssertionError("Expected HTTPException for metadata-only archive")
with pytest.raises(ValueError, match="empty"):
resolve_skill_dir_from_archive(tmp_path)

View File

@@ -0,0 +1,220 @@
"""Tests for deerflow.skills.installer — shared skill installation logic."""
import stat
import zipfile
from pathlib import Path
import pytest
from deerflow.skills.installer import (
install_skill_from_archive,
is_symlink_member,
is_unsafe_zip_member,
resolve_skill_dir_from_archive,
safe_extract_skill_archive,
should_ignore_archive_entry,
)
# ---------------------------------------------------------------------------
# is_unsafe_zip_member
# ---------------------------------------------------------------------------
class TestIsUnsafeZipMember:
def test_absolute_path(self):
info = zipfile.ZipInfo("/etc/passwd")
assert is_unsafe_zip_member(info) is True
def test_dotdot_traversal(self):
info = zipfile.ZipInfo("foo/../../../etc/passwd")
assert is_unsafe_zip_member(info) is True
def test_safe_member(self):
info = zipfile.ZipInfo("my-skill/SKILL.md")
assert is_unsafe_zip_member(info) is False
def test_empty_filename(self):
info = zipfile.ZipInfo("")
assert is_unsafe_zip_member(info) is False
# ---------------------------------------------------------------------------
# is_symlink_member
# ---------------------------------------------------------------------------
class TestIsSymlinkMember:
def test_detects_symlink(self):
info = zipfile.ZipInfo("link.txt")
info.external_attr = (stat.S_IFLNK | 0o777) << 16
assert is_symlink_member(info) is True
def test_regular_file(self):
info = zipfile.ZipInfo("file.txt")
info.external_attr = (stat.S_IFREG | 0o644) << 16
assert is_symlink_member(info) is False
# ---------------------------------------------------------------------------
# should_ignore_archive_entry
# ---------------------------------------------------------------------------
class TestShouldIgnoreArchiveEntry:
def test_macosx_ignored(self):
assert should_ignore_archive_entry(Path("__MACOSX")) is True
def test_dotfile_ignored(self):
assert should_ignore_archive_entry(Path(".DS_Store")) is True
def test_normal_dir_not_ignored(self):
assert should_ignore_archive_entry(Path("my-skill")) is False
# ---------------------------------------------------------------------------
# resolve_skill_dir_from_archive
# ---------------------------------------------------------------------------
class TestResolveSkillDir:
def test_single_dir(self, tmp_path):
(tmp_path / "my-skill").mkdir()
(tmp_path / "my-skill" / "SKILL.md").write_text("content")
assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "my-skill"
def test_with_macosx(self, tmp_path):
(tmp_path / "my-skill").mkdir()
(tmp_path / "my-skill" / "SKILL.md").write_text("content")
(tmp_path / "__MACOSX").mkdir()
assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "my-skill"
def test_empty_after_filter(self, tmp_path):
(tmp_path / "__MACOSX").mkdir()
(tmp_path / ".DS_Store").write_text("meta")
with pytest.raises(ValueError, match="empty"):
resolve_skill_dir_from_archive(tmp_path)
# ---------------------------------------------------------------------------
# safe_extract_skill_archive
# ---------------------------------------------------------------------------
class TestSafeExtract:
def _make_zip(self, tmp_path, members: dict[str, str | bytes]) -> Path:
"""Create a zip with given filename->content entries."""
zip_path = tmp_path / "test.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
for name, content in members.items():
if isinstance(content, str):
content = content.encode()
zf.writestr(name, content)
return zip_path
def test_rejects_zip_bomb(self, tmp_path):
zip_path = self._make_zip(tmp_path, {"big.txt": "x" * 1000})
dest = tmp_path / "out"
dest.mkdir()
with zipfile.ZipFile(zip_path) as zf:
with pytest.raises(ValueError, match="too large"):
safe_extract_skill_archive(zf, dest, max_total_size=100)
def test_rejects_absolute_path(self, tmp_path):
zip_path = tmp_path / "abs.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
zf.writestr("/etc/passwd", "root:x:0:0")
dest = tmp_path / "out"
dest.mkdir()
with zipfile.ZipFile(zip_path) as zf:
with pytest.raises(ValueError, match="unsafe"):
safe_extract_skill_archive(zf, dest)
def test_skips_symlinks(self, tmp_path):
zip_path = tmp_path / "sym.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
info = zipfile.ZipInfo("link.txt")
info.external_attr = (stat.S_IFLNK | 0o777) << 16
zf.writestr(info, "/etc/passwd")
zf.writestr("normal.txt", "hello")
dest = tmp_path / "out"
dest.mkdir()
with zipfile.ZipFile(zip_path) as zf:
safe_extract_skill_archive(zf, dest)
assert (dest / "normal.txt").exists()
assert not (dest / "link.txt").exists()
def test_normal_archive(self, tmp_path):
zip_path = self._make_zip(tmp_path, {
"my-skill/SKILL.md": "---\nname: test\ndescription: x\n---\n# Test",
"my-skill/README.md": "readme",
})
dest = tmp_path / "out"
dest.mkdir()
with zipfile.ZipFile(zip_path) as zf:
safe_extract_skill_archive(zf, dest)
assert (dest / "my-skill" / "SKILL.md").exists()
assert (dest / "my-skill" / "README.md").exists()
# ---------------------------------------------------------------------------
# install_skill_from_archive (full integration)
# ---------------------------------------------------------------------------
class TestInstallSkillFromArchive:
def _make_skill_zip(self, tmp_path: Path, skill_name: str = "test-skill") -> Path:
"""Create a valid .skill archive."""
zip_path = tmp_path / f"{skill_name}.skill"
with zipfile.ZipFile(zip_path, "w") as zf:
zf.writestr(
f"{skill_name}/SKILL.md",
f"---\nname: {skill_name}\ndescription: A test skill\n---\n\n# {skill_name}\n",
)
return zip_path
def test_success(self, tmp_path):
zip_path = self._make_skill_zip(tmp_path)
skills_root = tmp_path / "skills"
skills_root.mkdir()
result = install_skill_from_archive(zip_path, skills_root=skills_root)
assert result["success"] is True
assert result["skill_name"] == "test-skill"
assert (skills_root / "custom" / "test-skill" / "SKILL.md").exists()
def test_duplicate_raises(self, tmp_path):
zip_path = self._make_skill_zip(tmp_path)
skills_root = tmp_path / "skills"
(skills_root / "custom" / "test-skill").mkdir(parents=True)
with pytest.raises(ValueError, match="already exists"):
install_skill_from_archive(zip_path, skills_root=skills_root)
def test_invalid_extension(self, tmp_path):
bad_path = tmp_path / "bad.zip"
bad_path.write_text("not a skill")
with pytest.raises(ValueError, match=".skill"):
install_skill_from_archive(bad_path)
def test_bad_frontmatter(self, tmp_path):
zip_path = tmp_path / "bad.skill"
with zipfile.ZipFile(zip_path, "w") as zf:
zf.writestr("bad/SKILL.md", "no frontmatter here")
skills_root = tmp_path / "skills"
skills_root.mkdir()
with pytest.raises(ValueError, match="Invalid skill"):
install_skill_from_archive(zip_path, skills_root=skills_root)
def test_nonexistent_file(self):
with pytest.raises(FileNotFoundError):
install_skill_from_archive(Path("/nonexistent/path.skill"))
def test_macosx_filtered_during_resolve(self, tmp_path):
"""Archive with __MACOSX dir still installs correctly."""
zip_path = tmp_path / "mac.skill"
with zipfile.ZipFile(zip_path, "w") as zf:
zf.writestr("my-skill/SKILL.md", "---\nname: my-skill\ndescription: desc\n---\n# My Skill\n")
zf.writestr("__MACOSX/._my-skill", "meta")
skills_root = tmp_path / "skills"
skills_root.mkdir()
result = install_skill_from_archive(zip_path, skills_root=skills_root)
assert result["success"] is True
assert result["skill_name"] == "my-skill"

View File

@@ -0,0 +1,146 @@
"""Tests for deerflow.uploads.manager — shared upload management logic."""
import pytest
from deerflow.uploads.manager import (
PathTraversalError,
claim_unique_filename,
delete_file_safe,
list_files_in_dir,
normalize_filename,
validate_path_traversal,
)
# ---------------------------------------------------------------------------
# normalize_filename
# ---------------------------------------------------------------------------
class TestNormalizeFilename:
def test_safe_filename(self):
assert normalize_filename("report.pdf") == "report.pdf"
def test_strips_path_components(self):
assert normalize_filename("../../etc/passwd") == "passwd"
def test_rejects_empty(self):
with pytest.raises(ValueError, match="empty"):
normalize_filename("")
def test_rejects_dot_dot(self):
with pytest.raises(ValueError, match="unsafe"):
normalize_filename("..")
def test_strips_separators(self):
assert normalize_filename("path/to/file.txt") == "file.txt"
def test_dot_only(self):
with pytest.raises(ValueError, match="unsafe"):
normalize_filename(".")
# ---------------------------------------------------------------------------
# claim_unique_filename
# ---------------------------------------------------------------------------
class TestDeduplicateFilename:
def test_no_collision(self):
seen: set[str] = set()
assert claim_unique_filename("data.txt", seen) == "data.txt"
assert "data.txt" in seen
def test_single_collision(self):
seen = {"data.txt"}
assert claim_unique_filename("data.txt", seen) == "data_1.txt"
assert "data_1.txt" in seen
def test_triple_collision(self):
seen = {"data.txt", "data_1.txt", "data_2.txt"}
assert claim_unique_filename("data.txt", seen) == "data_3.txt"
assert "data_3.txt" in seen
def test_mutates_seen(self):
seen: set[str] = set()
claim_unique_filename("a.txt", seen)
claim_unique_filename("a.txt", seen)
assert seen == {"a.txt", "a_1.txt"}
# ---------------------------------------------------------------------------
# validate_path_traversal
# ---------------------------------------------------------------------------
class TestValidatePathTraversal:
def test_inside_base_ok(self, tmp_path):
child = tmp_path / "file.txt"
child.touch()
validate_path_traversal(child, tmp_path) # no exception
def test_outside_base_raises(self, tmp_path):
outside = tmp_path / ".." / "evil.txt"
with pytest.raises(PathTraversalError, match="traversal"):
validate_path_traversal(outside, tmp_path)
def test_symlink_escape(self, tmp_path):
target = tmp_path.parent / "secret.txt"
target.touch()
link = tmp_path / "escape"
link.symlink_to(target)
with pytest.raises(PathTraversalError, match="traversal"):
validate_path_traversal(link, tmp_path)
# ---------------------------------------------------------------------------
# list_files_in_dir
# ---------------------------------------------------------------------------
class TestListFilesInDir:
def test_empty_dir(self, tmp_path):
result = list_files_in_dir(tmp_path)
assert result == {"files": [], "count": 0}
def test_nonexistent_dir(self, tmp_path):
result = list_files_in_dir(tmp_path / "nope")
assert result == {"files": [], "count": 0}
def test_multiple_files_sorted(self, tmp_path):
(tmp_path / "b.txt").write_text("b")
(tmp_path / "a.txt").write_text("a")
result = list_files_in_dir(tmp_path)
assert result["count"] == 2
assert result["files"][0]["filename"] == "a.txt"
assert result["files"][1]["filename"] == "b.txt"
for f in result["files"]:
assert set(f.keys()) == {"filename", "size", "path", "extension", "modified"}
def test_ignores_subdirectories(self, tmp_path):
(tmp_path / "file.txt").write_text("data")
(tmp_path / "subdir").mkdir()
result = list_files_in_dir(tmp_path)
assert result["count"] == 1
assert result["files"][0]["filename"] == "file.txt"
# ---------------------------------------------------------------------------
# delete_file_safe
# ---------------------------------------------------------------------------
class TestDeleteFileSafe:
def test_delete_existing_file(self, tmp_path):
f = tmp_path / "test.txt"
f.write_text("data")
result = delete_file_safe(tmp_path, "test.txt")
assert result["success"] is True
assert not f.exists()
def test_delete_nonexistent_raises(self, tmp_path):
with pytest.raises(FileNotFoundError):
delete_file_safe(tmp_path, "nope.txt")
def test_delete_traversal_raises(self, tmp_path):
with pytest.raises(PathTraversalError, match="traversal"):
delete_file_safe(tmp_path, "../outside.txt")

View File

@@ -19,6 +19,7 @@ def test_upload_files_writes_thread_storage_and_skips_local_sandbox_sync(tmp_pat
with (
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
patch.object(uploads, "get_sandbox_provider", return_value=provider),
):
file = UploadFile(filename="notes.txt", file=BytesIO(b"hello uploads"))
@@ -48,6 +49,7 @@ def test_upload_files_syncs_non_local_sandbox_and_marks_markdown_file(tmp_path):
with (
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
patch.object(uploads, "get_sandbox_provider", return_value=provider),
patch.object(uploads, "convert_file_to_markdown", AsyncMock(side_effect=fake_convert)),
):
@@ -78,6 +80,7 @@ def test_upload_files_rejects_dotdot_and_dot_filenames(tmp_path):
with (
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
patch.object(uploads, "get_sandbox_provider", return_value=provider),
):
# These filenames must be rejected outright