mirror of
https://gitee.com/wanwujie/deer-flow
synced 2026-04-02 22:02:13 +08:00
refactor: extract shared skill installer and upload manager to harness (#1202)
* refactor: extract shared skill installer and upload manager to harness Move duplicated business logic from Gateway routers and Client into shared harness modules, eliminating code duplication. New shared modules: - deerflow.skills.installer: 6 functions (zip security, extraction, install) - deerflow.uploads.manager: 7 functions (normalize, deduplicate, validate, list, delete, get_uploads_dir, ensure_uploads_dir) Key improvements: - SkillAlreadyExistsError replaces stringly-typed 409 status routing - normalize_filename rejects backslash-containing filenames - Read paths (list/delete) no longer mkdir via get_uploads_dir - Write paths use ensure_uploads_dir for explicit directory creation - list_files_in_dir does stat inside scandir context (no re-stat) - install_skill_from_archive uses single is_file() check (one syscall) - Fix agent config key not reset on update_mcp_config/update_skill Tests: 42 new (22 installer + 20 upload manager) + client hardening * refactor: centralize upload URL construction and clean up installer - Extract upload_virtual_path(), upload_artifact_url(), enrich_file_listing() into shared manager.py, eliminating 6 duplicated URL constructions across Gateway router and Client - Derive all upload URLs from VIRTUAL_PATH_PREFIX constant instead of hardcoded "mnt/user-data/uploads" strings - Eliminate TOCTOU pre-checks and double file read in installer — single ZipFile() open with exception handling replaces is_file() + is_zipfile() + ZipFile() sequence - Add missing re-exports: ensure_uploads_dir in uploads/__init__.py, SkillAlreadyExistsError in skills/__init__.py - Remove redundant .lower() on already-lowercase CONVERTIBLE_EXTENSIONS - Hoist sandbox_uploads_dir(thread_id) before loop in uploads router * fix: add input validation for thread_id and filename length - Reject thread_id containing unsafe filesystem characters (only allow alphanumeric, hyphens, underscores, dots) — prevents 500 on inputs like <script> or shell metacharacters - Reject filenames longer than 255 bytes (OS limit) in normalize_filename - Gateway upload router maps ValueError to 400 for invalid thread_id * fix: address PR review — symlink safety, input validation coverage, error ordering - list_files_in_dir: use follow_symlinks=False to prevent symlink metadata leakage; check is_dir() instead of exists() for non-directory paths - install_skill_from_archive: restore is_file() pre-check before extension validation so error messages match the documented exception contract - validate_thread_id: move from ensure_uploads_dir to get_uploads_dir so all entry points (upload/list/delete) are protected - delete_uploaded_file: catch ValueError from thread_id validation (was 500) - requires_llm marker: also skip when OPENAI_API_KEY is unset - e2e fixture: update TitleMiddleware exclusion comment (kept filtering — middleware triggers extra LLM calls that add non-determinism to tests) * chore: revert uv.lock to main — no dependency changes in this PR * fix: use monkeypatch for global config in e2e fixture to prevent test pollution The e2e_env fixture was calling set_title_config() and set_summarization_config() directly, which mutated global singletons without automatic cleanup. When pytest ran test_client_e2e.py before test_title_middleware_core_logic.py, the leaked enabled=False caused 5 title tests to fail in CI. Switched to monkeypatch.setattr on the module-level private variables so pytest restores the originals after each test. * fix: address code review — URL encoding, API consistency, test isolation - upload_artifact_url: percent-encode filename to handle spaces/#/? - deduplicate_filename: mutate seen set in place (caller no longer needs manual .add() — less error-prone API) - list_files_in_dir: document that size is int, enrich stringifies - e2e fixture: monkeypatch _app_config instead of set_app_config() to prevent global singleton pollution (same pattern as title/summarization fix) - _make_e2e_config: read LLM connection details from env vars so external contributors can override defaults - Update tests to match new deduplicate_filename contract * docs: rewrite RFC in English and add alternatives/breaking changes sections * fix: address code review feedback on PR #1202 - Rename deduplicate_filename to claim_unique_filename to make the in-place set mutation explicit in the function name - Replace PermissionError with PathTraversalError(ValueError) for path traversal detection — malformed input is 400, not 403 * fix: set _app_config_is_custom in e2e test fixture to prevent config.yaml lookup in CI --------- Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
This commit is contained in:
@@ -1,9 +1,5 @@
|
||||
import json
|
||||
import logging
|
||||
import shutil
|
||||
import stat
|
||||
import tempfile
|
||||
import zipfile
|
||||
from pathlib import Path
|
||||
|
||||
from fastapi import APIRouter, HTTPException
|
||||
@@ -12,80 +8,10 @@ from pydantic import BaseModel, Field
|
||||
from app.gateway.path_utils import resolve_thread_virtual_path
|
||||
from deerflow.config.extensions_config import ExtensionsConfig, SkillStateConfig, get_extensions_config, reload_extensions_config
|
||||
from deerflow.skills import Skill, load_skills
|
||||
from deerflow.skills.loader import get_skills_root_path
|
||||
from deerflow.skills.validation import _validate_skill_frontmatter
|
||||
from deerflow.skills.installer import SkillAlreadyExistsError, install_skill_from_archive
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def _is_unsafe_zip_member(info: zipfile.ZipInfo) -> bool:
|
||||
"""Return True if the zip member path is absolute or attempts directory traversal."""
|
||||
name = info.filename
|
||||
if not name:
|
||||
return False
|
||||
path = Path(name)
|
||||
if path.is_absolute():
|
||||
return True
|
||||
if ".." in path.parts:
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def _is_symlink_member(info: zipfile.ZipInfo) -> bool:
|
||||
"""Detect symlinks based on the external attributes stored in the ZipInfo."""
|
||||
# Upper 16 bits of external_attr contain the Unix file mode when created on Unix.
|
||||
mode = info.external_attr >> 16
|
||||
return stat.S_ISLNK(mode)
|
||||
|
||||
|
||||
def _safe_extract_skill_archive(
|
||||
zip_ref: zipfile.ZipFile,
|
||||
dest_path: Path,
|
||||
max_total_size: int = 512 * 1024 * 1024,
|
||||
) -> None:
|
||||
"""Safely extract a skill archive into dest_path with basic protections.
|
||||
|
||||
Protections:
|
||||
- Reject absolute paths and directory traversal (..).
|
||||
- Skip symlink entries instead of materialising them.
|
||||
- Enforce a hard limit on total uncompressed size to mitigate zip bombs.
|
||||
"""
|
||||
dest_root = Path(dest_path).resolve()
|
||||
total_size = 0
|
||||
|
||||
for info in zip_ref.infolist():
|
||||
# Reject absolute paths or any path that attempts directory traversal.
|
||||
if _is_unsafe_zip_member(info):
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Archive contains unsafe member path: {info.filename!r}",
|
||||
)
|
||||
|
||||
# Skip any symlink entries instead of materialising them on disk.
|
||||
if _is_symlink_member(info):
|
||||
logger.warning("Skipping symlink entry in skill archive: %s", info.filename)
|
||||
continue
|
||||
|
||||
# Basic unzip-bomb defence: bound the total uncompressed size we will write.
|
||||
total_size += max(info.file_size, 0)
|
||||
if total_size > max_total_size:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail="Skill archive is too large or appears highly compressed.",
|
||||
)
|
||||
|
||||
member_path = dest_root / info.filename
|
||||
member_path_parent = member_path.parent
|
||||
member_path_parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
if info.is_dir():
|
||||
member_path.mkdir(parents=True, exist_ok=True)
|
||||
continue
|
||||
|
||||
with zip_ref.open(info) as src, open(member_path, "wb") as dst:
|
||||
shutil.copyfileobj(src, dst)
|
||||
|
||||
|
||||
router = APIRouter(prefix="/api", tags=["skills"])
|
||||
|
||||
|
||||
@@ -126,19 +52,6 @@ class SkillInstallResponse(BaseModel):
|
||||
message: str = Field(..., description="Installation result message")
|
||||
|
||||
|
||||
def _should_ignore_archive_entry(path: Path) -> bool:
|
||||
return path.name.startswith(".") or path.name == "__MACOSX"
|
||||
|
||||
|
||||
def _resolve_skill_dir_from_archive_root(temp_path: Path) -> Path:
|
||||
extracted_items = [item for item in temp_path.iterdir() if not _should_ignore_archive_entry(item)]
|
||||
if len(extracted_items) == 0:
|
||||
raise HTTPException(status_code=400, detail="Skill archive is empty")
|
||||
if len(extracted_items) == 1 and extracted_items[0].is_dir():
|
||||
return extracted_items[0]
|
||||
return temp_path
|
||||
|
||||
|
||||
def _skill_to_response(skill: Skill) -> SkillResponse:
|
||||
"""Convert a Skill object to a SkillResponse."""
|
||||
return SkillResponse(
|
||||
@@ -157,37 +70,7 @@ def _skill_to_response(skill: Skill) -> SkillResponse:
|
||||
description="Retrieve a list of all available skills from both public and custom directories.",
|
||||
)
|
||||
async def list_skills() -> SkillsListResponse:
|
||||
"""List all available skills.
|
||||
|
||||
Returns all skills regardless of their enabled status.
|
||||
|
||||
Returns:
|
||||
A list of all skills with their metadata.
|
||||
|
||||
Example Response:
|
||||
```json
|
||||
{
|
||||
"skills": [
|
||||
{
|
||||
"name": "PDF Processing",
|
||||
"description": "Extract and analyze PDF content",
|
||||
"license": "MIT",
|
||||
"category": "public",
|
||||
"enabled": true
|
||||
},
|
||||
{
|
||||
"name": "Frontend Design",
|
||||
"description": "Generate frontend designs and components",
|
||||
"license": null,
|
||||
"category": "custom",
|
||||
"enabled": false
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
"""
|
||||
try:
|
||||
# Load all skills (including disabled ones)
|
||||
skills = load_skills(enabled_only=False)
|
||||
return SkillsListResponse(skills=[_skill_to_response(skill) for skill in skills])
|
||||
except Exception as e:
|
||||
@@ -202,28 +85,6 @@ async def list_skills() -> SkillsListResponse:
|
||||
description="Retrieve detailed information about a specific skill by its name.",
|
||||
)
|
||||
async def get_skill(skill_name: str) -> SkillResponse:
|
||||
"""Get a specific skill by name.
|
||||
|
||||
Args:
|
||||
skill_name: The name of the skill to retrieve.
|
||||
|
||||
Returns:
|
||||
Skill information if found.
|
||||
|
||||
Raises:
|
||||
HTTPException: 404 if skill not found.
|
||||
|
||||
Example Response:
|
||||
```json
|
||||
{
|
||||
"name": "PDF Processing",
|
||||
"description": "Extract and analyze PDF content",
|
||||
"license": "MIT",
|
||||
"category": "public",
|
||||
"enabled": true
|
||||
}
|
||||
```
|
||||
"""
|
||||
try:
|
||||
skills = load_skills(enabled_only=False)
|
||||
skill = next((s for s in skills if s.name == skill_name), None)
|
||||
@@ -246,76 +107,32 @@ async def get_skill(skill_name: str) -> SkillResponse:
|
||||
description="Update a skill's enabled status by modifying the extensions_config.json file.",
|
||||
)
|
||||
async def update_skill(skill_name: str, request: SkillUpdateRequest) -> SkillResponse:
|
||||
"""Update a skill's enabled status.
|
||||
|
||||
This will modify the extensions_config.json file to update the enabled state.
|
||||
The SKILL.md file itself is not modified.
|
||||
|
||||
Args:
|
||||
skill_name: The name of the skill to update.
|
||||
request: The update request containing the new enabled status.
|
||||
|
||||
Returns:
|
||||
The updated skill information.
|
||||
|
||||
Raises:
|
||||
HTTPException: 404 if skill not found, 500 if update fails.
|
||||
|
||||
Example Request:
|
||||
```json
|
||||
{
|
||||
"enabled": false
|
||||
}
|
||||
```
|
||||
|
||||
Example Response:
|
||||
```json
|
||||
{
|
||||
"name": "PDF Processing",
|
||||
"description": "Extract and analyze PDF content",
|
||||
"license": "MIT",
|
||||
"category": "public",
|
||||
"enabled": false
|
||||
}
|
||||
```
|
||||
"""
|
||||
try:
|
||||
# Find the skill to verify it exists
|
||||
skills = load_skills(enabled_only=False)
|
||||
skill = next((s for s in skills if s.name == skill_name), None)
|
||||
|
||||
if skill is None:
|
||||
raise HTTPException(status_code=404, detail=f"Skill '{skill_name}' not found")
|
||||
|
||||
# Get or create config path
|
||||
config_path = ExtensionsConfig.resolve_config_path()
|
||||
if config_path is None:
|
||||
# Create new config file in parent directory (project root)
|
||||
config_path = Path.cwd().parent / "extensions_config.json"
|
||||
logger.info(f"No existing extensions config found. Creating new config at: {config_path}")
|
||||
|
||||
# Load current configuration
|
||||
extensions_config = get_extensions_config()
|
||||
|
||||
# Update the skill's enabled status
|
||||
extensions_config.skills[skill_name] = SkillStateConfig(enabled=request.enabled)
|
||||
|
||||
# Convert to JSON format (preserve MCP servers config)
|
||||
config_data = {
|
||||
"mcpServers": {name: server.model_dump() for name, server in extensions_config.mcp_servers.items()},
|
||||
"skills": {name: {"enabled": skill_config.enabled} for name, skill_config in extensions_config.skills.items()},
|
||||
}
|
||||
|
||||
# Write the configuration to file
|
||||
with open(config_path, "w", encoding="utf-8") as f:
|
||||
json.dump(config_data, f, indent=2)
|
||||
|
||||
logger.info(f"Skills configuration updated and saved to: {config_path}")
|
||||
|
||||
# Reload the extensions config to update the global cache
|
||||
reload_extensions_config()
|
||||
|
||||
# Reload the skills to get the updated status (for API response)
|
||||
skills = load_skills(enabled_only=False)
|
||||
updated_skill = next((s for s in skills if s.name == skill_name), None)
|
||||
|
||||
@@ -339,98 +156,16 @@ async def update_skill(skill_name: str, request: SkillUpdateRequest) -> SkillRes
|
||||
description="Install a skill from a .skill file (ZIP archive) located in the thread's user-data directory.",
|
||||
)
|
||||
async def install_skill(request: SkillInstallRequest) -> SkillInstallResponse:
|
||||
"""Install a skill from a .skill file.
|
||||
|
||||
The .skill file is a ZIP archive containing a skill directory with SKILL.md
|
||||
and optional resources (scripts, references, assets).
|
||||
|
||||
Args:
|
||||
request: The install request containing thread_id and virtual path to .skill file.
|
||||
|
||||
Returns:
|
||||
Installation result with skill name and status message.
|
||||
|
||||
Raises:
|
||||
HTTPException:
|
||||
- 400 if path is invalid or file is not a valid .skill file
|
||||
- 403 if access denied (path traversal detected)
|
||||
- 404 if file not found
|
||||
- 409 if skill already exists
|
||||
- 500 if installation fails
|
||||
|
||||
Example Request:
|
||||
```json
|
||||
{
|
||||
"thread_id": "abc123-def456",
|
||||
"path": "/mnt/user-data/outputs/my-skill.skill"
|
||||
}
|
||||
```
|
||||
|
||||
Example Response:
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"skill_name": "my-skill",
|
||||
"message": "Skill 'my-skill' installed successfully"
|
||||
}
|
||||
```
|
||||
"""
|
||||
try:
|
||||
# Resolve the virtual path to actual file path
|
||||
skill_file_path = resolve_thread_virtual_path(request.thread_id, request.path)
|
||||
|
||||
# Check if file exists
|
||||
if not skill_file_path.exists():
|
||||
raise HTTPException(status_code=404, detail=f"Skill file not found: {request.path}")
|
||||
|
||||
# Check if it's a file
|
||||
if not skill_file_path.is_file():
|
||||
raise HTTPException(status_code=400, detail=f"Path is not a file: {request.path}")
|
||||
|
||||
# Check file extension
|
||||
if not skill_file_path.suffix == ".skill":
|
||||
raise HTTPException(status_code=400, detail="File must have .skill extension")
|
||||
|
||||
# Verify it's a valid ZIP file
|
||||
if not zipfile.is_zipfile(skill_file_path):
|
||||
raise HTTPException(status_code=400, detail="File is not a valid ZIP archive")
|
||||
|
||||
# Get the custom skills directory
|
||||
skills_root = get_skills_root_path()
|
||||
custom_skills_dir = skills_root / "custom"
|
||||
|
||||
# Create custom directory if it doesn't exist
|
||||
custom_skills_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Extract to a temporary directory first for validation
|
||||
with tempfile.TemporaryDirectory() as temp_dir:
|
||||
temp_path = Path(temp_dir)
|
||||
|
||||
# Extract the .skill file with validation and protections.
|
||||
with zipfile.ZipFile(skill_file_path, "r") as zip_ref:
|
||||
_safe_extract_skill_archive(zip_ref, temp_path)
|
||||
|
||||
skill_dir = _resolve_skill_dir_from_archive_root(temp_path)
|
||||
|
||||
# Validate the skill
|
||||
is_valid, message, skill_name = _validate_skill_frontmatter(skill_dir)
|
||||
if not is_valid:
|
||||
raise HTTPException(status_code=400, detail=f"Invalid skill: {message}")
|
||||
|
||||
if not skill_name:
|
||||
raise HTTPException(status_code=400, detail="Could not determine skill name")
|
||||
|
||||
# Check if skill already exists
|
||||
target_dir = custom_skills_dir / skill_name
|
||||
if target_dir.exists():
|
||||
raise HTTPException(status_code=409, detail=f"Skill '{skill_name}' already exists. Please remove it first or use a different name.")
|
||||
|
||||
# Move the skill directory to the custom skills directory
|
||||
shutil.copytree(skill_dir, target_dir)
|
||||
|
||||
logger.info(f"Skill '{skill_name}' installed successfully to {target_dir}")
|
||||
return SkillInstallResponse(success=True, skill_name=skill_name, message=f"Skill '{skill_name}' installed successfully")
|
||||
|
||||
result = install_skill_from_archive(skill_file_path)
|
||||
return SkillInstallResponse(**result)
|
||||
except FileNotFoundError as e:
|
||||
raise HTTPException(status_code=404, detail=str(e))
|
||||
except SkillAlreadyExistsError as e:
|
||||
raise HTTPException(status_code=409, detail=str(e))
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
|
||||
@@ -1,13 +1,23 @@
|
||||
"""Upload router for handling file uploads."""
|
||||
|
||||
import logging
|
||||
from pathlib import Path
|
||||
|
||||
from fastapi import APIRouter, File, HTTPException, UploadFile
|
||||
from pydantic import BaseModel
|
||||
|
||||
from deerflow.config.paths import VIRTUAL_PATH_PREFIX, get_paths
|
||||
from deerflow.config.paths import get_paths
|
||||
from deerflow.sandbox.sandbox_provider import get_sandbox_provider
|
||||
from deerflow.uploads.manager import (
|
||||
PathTraversalError,
|
||||
delete_file_safe,
|
||||
enrich_file_listing,
|
||||
ensure_uploads_dir,
|
||||
get_uploads_dir,
|
||||
list_files_in_dir,
|
||||
normalize_filename,
|
||||
upload_artifact_url,
|
||||
upload_virtual_path,
|
||||
)
|
||||
from deerflow.utils.file_conversion import CONVERTIBLE_EXTENSIONS, convert_file_to_markdown
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
@@ -23,18 +33,6 @@ class UploadResponse(BaseModel):
|
||||
message: str
|
||||
|
||||
|
||||
def get_uploads_dir(thread_id: str) -> Path:
|
||||
"""Get the uploads directory for a thread.
|
||||
|
||||
Args:
|
||||
thread_id: The thread ID.
|
||||
|
||||
Returns:
|
||||
Path to the uploads directory.
|
||||
"""
|
||||
base_dir = get_paths().sandbox_uploads_dir(thread_id)
|
||||
base_dir.mkdir(parents=True, exist_ok=True)
|
||||
return base_dir
|
||||
|
||||
|
||||
@router.post("", response_model=UploadResponse)
|
||||
@@ -42,23 +40,15 @@ async def upload_files(
|
||||
thread_id: str,
|
||||
files: list[UploadFile] = File(...),
|
||||
) -> UploadResponse:
|
||||
"""Upload multiple files to a thread's uploads directory.
|
||||
|
||||
For PDF, PPT, Excel, and Word files, they will be converted to markdown using markitdown.
|
||||
All files (original and converted) are saved to /mnt/user-data/uploads.
|
||||
|
||||
Args:
|
||||
thread_id: The thread ID to upload files to.
|
||||
files: List of files to upload.
|
||||
|
||||
Returns:
|
||||
Upload response with success status and file information.
|
||||
"""
|
||||
"""Upload multiple files to a thread's uploads directory."""
|
||||
if not files:
|
||||
raise HTTPException(status_code=400, detail="No files provided")
|
||||
|
||||
uploads_dir = get_uploads_dir(thread_id)
|
||||
paths = get_paths()
|
||||
try:
|
||||
uploads_dir = ensure_uploads_dir(thread_id)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
sandbox_uploads = get_paths().sandbox_uploads_dir(thread_id)
|
||||
uploaded_files = []
|
||||
|
||||
sandbox_provider = get_sandbox_provider()
|
||||
@@ -70,50 +60,44 @@ async def upload_files(
|
||||
continue
|
||||
|
||||
try:
|
||||
# Normalize filename to prevent path traversal
|
||||
safe_filename = Path(file.filename).name
|
||||
if not safe_filename or safe_filename in {".", ".."} or "/" in safe_filename or "\\" in safe_filename:
|
||||
safe_filename = normalize_filename(file.filename)
|
||||
except ValueError:
|
||||
logger.warning(f"Skipping file with unsafe filename: {file.filename!r}")
|
||||
continue
|
||||
|
||||
try:
|
||||
content = await file.read()
|
||||
file_path = uploads_dir / safe_filename
|
||||
file_path.write_bytes(content)
|
||||
|
||||
# Build relative path from backend root
|
||||
relative_path = str(paths.sandbox_uploads_dir(thread_id) / safe_filename)
|
||||
virtual_path = f"{VIRTUAL_PATH_PREFIX}/uploads/{safe_filename}"
|
||||
virtual_path = upload_virtual_path(safe_filename)
|
||||
|
||||
# Keep local sandbox source of truth in thread-scoped host storage.
|
||||
# For non-local sandboxes, also sync to virtual path for runtime visibility.
|
||||
if sandbox_id != "local":
|
||||
sandbox.update_file(virtual_path, content)
|
||||
|
||||
file_info = {
|
||||
"filename": safe_filename,
|
||||
"size": str(len(content)),
|
||||
"path": relative_path, # Actual filesystem path (relative to backend/)
|
||||
"virtual_path": virtual_path, # Path for Agent in sandbox
|
||||
"artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{safe_filename}", # HTTP URL
|
||||
"path": str(sandbox_uploads / safe_filename),
|
||||
"virtual_path": virtual_path,
|
||||
"artifact_url": upload_artifact_url(thread_id, safe_filename),
|
||||
}
|
||||
|
||||
logger.info(f"Saved file: {safe_filename} ({len(content)} bytes) to {relative_path}")
|
||||
logger.info(f"Saved file: {safe_filename} ({len(content)} bytes) to {file_info['path']}")
|
||||
|
||||
# Check if file should be converted to markdown
|
||||
file_ext = file_path.suffix.lower()
|
||||
if file_ext in CONVERTIBLE_EXTENSIONS:
|
||||
md_path = await convert_file_to_markdown(file_path)
|
||||
if md_path:
|
||||
md_relative_path = str(paths.sandbox_uploads_dir(thread_id) / md_path.name)
|
||||
md_virtual_path = f"{VIRTUAL_PATH_PREFIX}/uploads/{md_path.name}"
|
||||
md_virtual_path = upload_virtual_path(md_path.name)
|
||||
|
||||
if sandbox_id != "local":
|
||||
sandbox.update_file(md_virtual_path, md_path.read_bytes())
|
||||
|
||||
file_info["markdown_file"] = md_path.name
|
||||
file_info["markdown_path"] = md_relative_path
|
||||
file_info["markdown_path"] = str(sandbox_uploads / md_path.name)
|
||||
file_info["markdown_virtual_path"] = md_virtual_path
|
||||
file_info["markdown_artifact_url"] = f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{md_path.name}"
|
||||
file_info["markdown_artifact_url"] = upload_artifact_url(thread_id, md_path.name)
|
||||
|
||||
uploaded_files.append(file_info)
|
||||
|
||||
@@ -130,69 +114,35 @@ async def upload_files(
|
||||
|
||||
@router.get("/list", response_model=dict)
|
||||
async def list_uploaded_files(thread_id: str) -> dict:
|
||||
"""List all files in a thread's uploads directory.
|
||||
|
||||
Args:
|
||||
thread_id: The thread ID to list files for.
|
||||
|
||||
Returns:
|
||||
Dictionary containing list of files with their metadata.
|
||||
"""
|
||||
"""List all files in a thread's uploads directory."""
|
||||
try:
|
||||
uploads_dir = get_uploads_dir(thread_id)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
result = list_files_in_dir(uploads_dir)
|
||||
enrich_file_listing(result, thread_id)
|
||||
|
||||
if not uploads_dir.exists():
|
||||
return {"files": [], "count": 0}
|
||||
# Gateway additionally includes the sandbox-relative path.
|
||||
sandbox_uploads = get_paths().sandbox_uploads_dir(thread_id)
|
||||
for f in result["files"]:
|
||||
f["path"] = str(sandbox_uploads / f["filename"])
|
||||
|
||||
files = []
|
||||
for file_path in sorted(uploads_dir.iterdir()):
|
||||
if file_path.is_file():
|
||||
stat = file_path.stat()
|
||||
relative_path = str(get_paths().sandbox_uploads_dir(thread_id) / file_path.name)
|
||||
files.append(
|
||||
{
|
||||
"filename": file_path.name,
|
||||
"size": stat.st_size,
|
||||
"path": relative_path, # Actual filesystem path
|
||||
"virtual_path": f"{VIRTUAL_PATH_PREFIX}/uploads/{file_path.name}", # Path for Agent in sandbox
|
||||
"artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{file_path.name}", # HTTP URL
|
||||
"extension": file_path.suffix,
|
||||
"modified": stat.st_mtime,
|
||||
}
|
||||
)
|
||||
|
||||
return {"files": files, "count": len(files)}
|
||||
return result
|
||||
|
||||
|
||||
@router.delete("/{filename}")
|
||||
async def delete_uploaded_file(thread_id: str, filename: str) -> dict:
|
||||
"""Delete a file from a thread's uploads directory.
|
||||
|
||||
Args:
|
||||
thread_id: The thread ID.
|
||||
filename: The filename to delete.
|
||||
|
||||
Returns:
|
||||
Success message.
|
||||
"""
|
||||
"""Delete a file from a thread's uploads directory."""
|
||||
try:
|
||||
uploads_dir = get_uploads_dir(thread_id)
|
||||
file_path = uploads_dir / filename
|
||||
|
||||
if not file_path.exists():
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
try:
|
||||
return delete_file_safe(uploads_dir, filename, convertible_extensions=CONVERTIBLE_EXTENSIONS)
|
||||
except FileNotFoundError:
|
||||
raise HTTPException(status_code=404, detail=f"File not found: {filename}")
|
||||
|
||||
# Security check: ensure the path is within the uploads directory
|
||||
try:
|
||||
file_path.resolve().relative_to(uploads_dir.resolve())
|
||||
except ValueError:
|
||||
raise HTTPException(status_code=403, detail="Access denied")
|
||||
|
||||
try:
|
||||
if file_path.suffix.lower() in CONVERTIBLE_EXTENSIONS:
|
||||
companion_markdown = file_path.with_suffix(".md")
|
||||
companion_markdown.unlink(missing_ok=True)
|
||||
file_path.unlink(missing_ok=True)
|
||||
logger.info(f"Deleted file: {filename}")
|
||||
return {"success": True, "message": f"Deleted {filename}"}
|
||||
except PathTraversalError:
|
||||
raise HTTPException(status_code=400, detail="Invalid path")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to delete {filename}: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"Failed to delete {filename}: {str(e)}")
|
||||
|
||||
190
backend/docs/rfc-extract-shared-modules.md
Normal file
190
backend/docs/rfc-extract-shared-modules.md
Normal file
@@ -0,0 +1,190 @@
|
||||
# RFC: Extract Shared Skill Installer and Upload Manager into Harness
|
||||
|
||||
## 1. Problem
|
||||
|
||||
Gateway (`app/gateway/routers/skills.py`, `uploads.py`) and Client (`deerflow/client.py`) each independently implement the same business logic:
|
||||
|
||||
### Skill Installation
|
||||
|
||||
| Logic | Gateway (`skills.py`) | Client (`client.py`) |
|
||||
|-------|----------------------|---------------------|
|
||||
| Zip safety check | `_is_unsafe_zip_member()` | Inline `Path(info.filename).is_absolute()` |
|
||||
| Symlink filtering | `_is_symlink_member()` | `p.is_symlink()` post-extraction delete |
|
||||
| Zip bomb defence | `total_size += info.file_size` (declared) | `total_size > 100MB` (declared) |
|
||||
| macOS metadata filter | `_should_ignore_archive_entry()` | None |
|
||||
| Frontmatter validation | `_validate_skill_frontmatter()` | `_validate_skill_frontmatter()` |
|
||||
| Duplicate detection | `HTTPException(409)` | `ValueError` |
|
||||
|
||||
**Two implementations, inconsistent behaviour**: Gateway streams writes and tracks real decompressed size; Client sums declared `file_size`. Gateway skips symlinks during extraction; Client extracts everything then walks and deletes symlinks.
|
||||
|
||||
### Upload Management
|
||||
|
||||
| Logic | Gateway (`uploads.py`) | Client (`client.py`) |
|
||||
|-------|----------------------|---------------------|
|
||||
| Directory access | `get_uploads_dir()` + `mkdir` | `_get_uploads_dir()` + `mkdir` |
|
||||
| Filename safety | Inline `Path(f).name` + manual checks | No checks, uses `src_path.name` directly |
|
||||
| Duplicate handling | None (overwrites) | None (overwrites) |
|
||||
| Listing | Inline `iterdir()` | Inline `os.scandir()` |
|
||||
| Deletion | Inline `unlink()` + traversal check | Inline `unlink()` + traversal check |
|
||||
| Path traversal | `resolve().relative_to()` | `resolve().relative_to()` |
|
||||
|
||||
**The same traversal check is written twice** — any security fix must be applied to both locations.
|
||||
|
||||
## 2. Design Principles
|
||||
|
||||
### Dependency Direction
|
||||
|
||||
```
|
||||
app.gateway.routers.skills ──┐
|
||||
app.gateway.routers.uploads ──┤── calls ──→ deerflow.skills.installer
|
||||
deerflow.client ──┘ deerflow.uploads.manager
|
||||
```
|
||||
|
||||
- Shared modules live in the harness layer (`deerflow.*`), pure business logic, no FastAPI dependency
|
||||
- Gateway handles HTTP adaptation (`UploadFile` → bytes, exceptions → `HTTPException`)
|
||||
- Client handles local adaptation (`Path` → copy, exceptions → Python exceptions)
|
||||
- Satisfies `test_harness_boundary.py` constraint: harness never imports app
|
||||
|
||||
### Exception Strategy
|
||||
|
||||
| Shared Layer Exception | Gateway Maps To | Client |
|
||||
|----------------------|-----------------|--------|
|
||||
| `FileNotFoundError` | `HTTPException(404)` | Propagates |
|
||||
| `ValueError` | `HTTPException(400)` | Propagates |
|
||||
| `SkillAlreadyExistsError` | `HTTPException(409)` | Propagates |
|
||||
| `PermissionError` | `HTTPException(403)` | Propagates |
|
||||
|
||||
Replaces stringly-typed routing (`"already exists" in str(e)`) with typed exception matching (`SkillAlreadyExistsError`).
|
||||
|
||||
## 3. New Modules
|
||||
|
||||
### 3.1 `deerflow.skills.installer`
|
||||
|
||||
```python
|
||||
# Safety checks
|
||||
is_unsafe_zip_member(info: ZipInfo) -> bool # Absolute path / .. traversal
|
||||
is_symlink_member(info: ZipInfo) -> bool # Unix symlink detection
|
||||
should_ignore_archive_entry(path: Path) -> bool # __MACOSX / dotfiles
|
||||
|
||||
# Extraction
|
||||
safe_extract_skill_archive(zip_ref, dest_path, max_total_size=512MB)
|
||||
# Streaming write, accumulates real bytes (vs declared file_size)
|
||||
# Dual traversal check: member-level + resolve-level
|
||||
|
||||
# Directory resolution
|
||||
resolve_skill_dir_from_archive(temp_path: Path) -> Path
|
||||
# Auto-enters single directory, filters macOS metadata
|
||||
|
||||
# Install entry point
|
||||
install_skill_from_archive(zip_path, *, skills_root=None) -> dict
|
||||
# is_file() pre-check before extension validation
|
||||
# SkillAlreadyExistsError replaces ValueError
|
||||
|
||||
# Exception
|
||||
class SkillAlreadyExistsError(ValueError)
|
||||
```
|
||||
|
||||
### 3.2 `deerflow.uploads.manager`
|
||||
|
||||
```python
|
||||
# Directory management
|
||||
get_uploads_dir(thread_id: str) -> Path # Pure path, no side effects
|
||||
ensure_uploads_dir(thread_id: str) -> Path # Creates directory (for write paths)
|
||||
|
||||
# Filename safety
|
||||
normalize_filename(filename: str) -> str
|
||||
# Path.name extraction + rejects ".." / "." / backslash / >255 bytes
|
||||
deduplicate_filename(name: str, seen: set) -> str
|
||||
# _N suffix increment for dedup, mutates seen in place
|
||||
|
||||
# Path safety
|
||||
validate_path_traversal(path: Path, base: Path) -> None
|
||||
# resolve().relative_to(), raises PermissionError on failure
|
||||
|
||||
# File operations
|
||||
list_files_in_dir(directory: Path) -> dict
|
||||
# scandir with stat inside context (no re-stat)
|
||||
# follow_symlinks=False to prevent metadata leakage
|
||||
# Non-existent directory returns empty list
|
||||
delete_file_safe(base_dir: Path, filename: str) -> dict
|
||||
# Validates traversal first, then unlinks
|
||||
|
||||
# URL helpers
|
||||
upload_artifact_url(thread_id, filename) -> str # Percent-encoded for HTTP safety
|
||||
upload_virtual_path(filename) -> str # Sandbox-internal path
|
||||
enrich_file_listing(result, thread_id) -> dict # Adds URLs, stringifies sizes
|
||||
```
|
||||
|
||||
## 4. Changes
|
||||
|
||||
### 4.1 Gateway Slimming
|
||||
|
||||
**`app/gateway/routers/skills.py`**:
|
||||
- Remove `_is_unsafe_zip_member`, `_is_symlink_member`, `_safe_extract_skill_archive`, `_should_ignore_archive_entry`, `_resolve_skill_dir_from_archive_root` (~80 lines)
|
||||
- `install_skill` route becomes a single call to `install_skill_from_archive(path)`
|
||||
- Exception mapping: `SkillAlreadyExistsError → 409`, `ValueError → 400`, `FileNotFoundError → 404`
|
||||
|
||||
**`app/gateway/routers/uploads.py`**:
|
||||
- Remove inline `get_uploads_dir` (replaced by `ensure_uploads_dir`/`get_uploads_dir`)
|
||||
- `upload_files` uses `normalize_filename()` instead of inline safety checks
|
||||
- `list_uploaded_files` uses `list_files_in_dir()` + enrichment
|
||||
- `delete_uploaded_file` uses `delete_file_safe()` + companion markdown cleanup
|
||||
|
||||
### 4.2 Client Slimming
|
||||
|
||||
**`deerflow/client.py`**:
|
||||
- Remove `_get_uploads_dir` static method
|
||||
- Remove ~50 lines of inline zip handling in `install_skill`
|
||||
- `install_skill` delegates to `install_skill_from_archive()`
|
||||
- `upload_files` uses `deduplicate_filename()` + `ensure_uploads_dir()`
|
||||
- `list_uploads` uses `get_uploads_dir()` + `list_files_in_dir()`
|
||||
- `delete_upload` uses `get_uploads_dir()` + `delete_file_safe()`
|
||||
- `update_mcp_config` / `update_skill` now reset `_agent_config_key = None`
|
||||
|
||||
### 4.3 Read/Write Path Separation
|
||||
|
||||
| Operation | Function | Creates dir? |
|
||||
|-----------|----------|:------------:|
|
||||
| upload (write) | `ensure_uploads_dir()` | Yes |
|
||||
| list (read) | `get_uploads_dir()` | No |
|
||||
| delete (read) | `get_uploads_dir()` | No |
|
||||
|
||||
Read paths no longer have `mkdir` side effects — non-existent directories return empty lists.
|
||||
|
||||
## 5. Security Improvements
|
||||
|
||||
| Improvement | Before | After |
|
||||
|-------------|--------|-------|
|
||||
| Zip bomb detection | Sum of declared `file_size` | Streaming write, accumulates real bytes |
|
||||
| Symlink handling | Gateway skips / Client deletes post-extract | Unified skip + log |
|
||||
| Traversal check | Member-level only | Member-level + `resolve().is_relative_to()` |
|
||||
| Filename backslash | Gateway checks / Client doesn't | Unified rejection |
|
||||
| Filename length | No check | Reject > 255 bytes (OS limit) |
|
||||
| thread_id validation | None | Reject unsafe filesystem characters |
|
||||
| Listing symlink leak | `follow_symlinks=True` (default) | `follow_symlinks=False` |
|
||||
| 409 status routing | `"already exists" in str(e)` | `SkillAlreadyExistsError` type match |
|
||||
| Artifact URL encoding | Raw filename in URL | `urllib.parse.quote()` |
|
||||
|
||||
## 6. Alternatives Considered
|
||||
|
||||
| Alternative | Why Not |
|
||||
|-------------|---------|
|
||||
| Keep logic in Gateway, Client calls Gateway via HTTP | Adds network dependency to embedded Client; defeats the purpose of `DeerFlowClient` as an in-process API |
|
||||
| Abstract base class with Gateway/Client subclasses | Over-engineered for what are pure functions; no polymorphism needed |
|
||||
| Move everything into `client.py` and have Gateway import it | Violates harness/app boundary — Client is in harness, but Gateway-specific models (Pydantic response types) should stay in app layer |
|
||||
| Merge Gateway and Client into one module | They serve different consumers (HTTP vs in-process) with different adaptation needs |
|
||||
|
||||
## 7. Breaking Changes
|
||||
|
||||
**None.** All public APIs (Gateway HTTP endpoints, `DeerFlowClient` methods) retain their existing signatures and return formats. The `SkillAlreadyExistsError` is a subclass of `ValueError`, so existing `except ValueError` handlers still catch it.
|
||||
|
||||
## 8. Tests
|
||||
|
||||
| Module | Test File | Count |
|
||||
|--------|-----------|:-----:|
|
||||
| `skills.installer` | `tests/test_skills_installer.py` | 22 |
|
||||
| `uploads.manager` | `tests/test_uploads_manager.py` | 20 |
|
||||
| `client` hardening | `tests/test_client.py` (new cases) | ~40 |
|
||||
| `client` e2e | `tests/test_client_e2e.py` (new file) | ~20 |
|
||||
|
||||
Coverage: unsafe zip / symlink / zip bomb / frontmatter / duplicate / extension / macOS filter / normalize / deduplicate / traversal / list / delete / agent invalidation / upload lifecycle / thread isolation / URL encoding / config pollution.
|
||||
@@ -19,12 +19,9 @@ import asyncio
|
||||
import json
|
||||
import logging
|
||||
import mimetypes
|
||||
import os
|
||||
import re
|
||||
import shutil
|
||||
import tempfile
|
||||
import uuid
|
||||
import zipfile
|
||||
from collections.abc import Generator
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
@@ -42,6 +39,17 @@ from deerflow.config.app_config import get_app_config, reload_app_config
|
||||
from deerflow.config.extensions_config import ExtensionsConfig, SkillStateConfig, get_extensions_config, reload_extensions_config
|
||||
from deerflow.config.paths import get_paths
|
||||
from deerflow.models import create_chat_model
|
||||
from deerflow.skills.installer import install_skill_from_archive
|
||||
from deerflow.uploads.manager import (
|
||||
claim_unique_filename,
|
||||
delete_file_safe,
|
||||
enrich_file_listing,
|
||||
ensure_uploads_dir,
|
||||
get_uploads_dir,
|
||||
list_files_in_dir,
|
||||
upload_artifact_url,
|
||||
upload_virtual_path,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -566,6 +574,7 @@ class DeerFlowClient:
|
||||
self._atomic_write_json(config_path, config_data)
|
||||
|
||||
self._agent = None
|
||||
self._agent_config_key = None
|
||||
reloaded = reload_extensions_config()
|
||||
return {"mcp_servers": {name: server.model_dump() for name, server in reloaded.mcp_servers.items()}}
|
||||
|
||||
@@ -631,6 +640,7 @@ class DeerFlowClient:
|
||||
self._atomic_write_json(config_path, config_data)
|
||||
|
||||
self._agent = None
|
||||
self._agent_config_key = None
|
||||
reload_extensions_config()
|
||||
|
||||
updated = next((s for s in load_skills(enabled_only=False) if s.name == name), None)
|
||||
@@ -657,56 +667,7 @@ class DeerFlowClient:
|
||||
FileNotFoundError: If the file does not exist.
|
||||
ValueError: If the file is invalid.
|
||||
"""
|
||||
from deerflow.skills.loader import get_skills_root_path
|
||||
from deerflow.skills.validation import _validate_skill_frontmatter
|
||||
|
||||
path = Path(skill_path)
|
||||
if not path.exists():
|
||||
raise FileNotFoundError(f"Skill file not found: {skill_path}")
|
||||
if not path.is_file():
|
||||
raise ValueError(f"Path is not a file: {skill_path}")
|
||||
if path.suffix != ".skill":
|
||||
raise ValueError("File must have .skill extension")
|
||||
if not zipfile.is_zipfile(path):
|
||||
raise ValueError("File is not a valid ZIP archive")
|
||||
|
||||
skills_root = get_skills_root_path()
|
||||
custom_dir = skills_root / "custom"
|
||||
custom_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
with zipfile.ZipFile(path, "r") as zf:
|
||||
total_size = sum(info.file_size for info in zf.infolist())
|
||||
if total_size > 100 * 1024 * 1024:
|
||||
raise ValueError("Skill archive too large when extracted (>100MB)")
|
||||
for info in zf.infolist():
|
||||
if Path(info.filename).is_absolute() or ".." in Path(info.filename).parts:
|
||||
raise ValueError(f"Unsafe path in archive: {info.filename}")
|
||||
zf.extractall(tmp_path)
|
||||
for p in tmp_path.rglob("*"):
|
||||
if p.is_symlink():
|
||||
p.unlink()
|
||||
|
||||
items = list(tmp_path.iterdir())
|
||||
if not items:
|
||||
raise ValueError("Skill archive is empty")
|
||||
|
||||
skill_dir = items[0] if len(items) == 1 and items[0].is_dir() else tmp_path
|
||||
|
||||
is_valid, message, skill_name = _validate_skill_frontmatter(skill_dir)
|
||||
if not is_valid:
|
||||
raise ValueError(f"Invalid skill: {message}")
|
||||
if not re.fullmatch(r"[a-zA-Z0-9_-]+", skill_name):
|
||||
raise ValueError(f"Invalid skill name: {skill_name}")
|
||||
|
||||
target = custom_dir / skill_name
|
||||
if target.exists():
|
||||
raise ValueError(f"Skill '{skill_name}' already exists")
|
||||
|
||||
shutil.copytree(skill_dir, target)
|
||||
|
||||
return {"success": True, "skill_name": skill_name, "message": f"Skill '{skill_name}' installed successfully"}
|
||||
return install_skill_from_archive(skill_path)
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Public API — memory management
|
||||
@@ -756,13 +717,6 @@ class DeerFlowClient:
|
||||
# Public API — file uploads
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
@staticmethod
|
||||
def _get_uploads_dir(thread_id: str) -> Path:
|
||||
"""Get (and create) the uploads directory for a thread."""
|
||||
base = get_paths().sandbox_uploads_dir(thread_id)
|
||||
base.mkdir(parents=True, exist_ok=True)
|
||||
return base
|
||||
|
||||
def upload_files(self, thread_id: str, files: list[str | Path]) -> dict:
|
||||
"""Upload local files into a thread's uploads directory.
|
||||
|
||||
@@ -784,7 +738,7 @@ class DeerFlowClient:
|
||||
|
||||
# Validate all files upfront to avoid partial uploads.
|
||||
resolved_files = []
|
||||
convertible_extensions = {ext.lower() for ext in CONVERTIBLE_EXTENSIONS}
|
||||
seen_names: set[str] = set()
|
||||
has_convertible_file = False
|
||||
for f in files:
|
||||
p = Path(f)
|
||||
@@ -792,11 +746,12 @@ class DeerFlowClient:
|
||||
raise FileNotFoundError(f"File not found: {f}")
|
||||
if not p.is_file():
|
||||
raise ValueError(f"Path is not a file: {f}")
|
||||
resolved_files.append(p)
|
||||
if not has_convertible_file and p.suffix.lower() in convertible_extensions:
|
||||
dest_name = claim_unique_filename(p.name, seen_names)
|
||||
resolved_files.append((p, dest_name))
|
||||
if not has_convertible_file and p.suffix.lower() in CONVERTIBLE_EXTENSIONS:
|
||||
has_convertible_file = True
|
||||
|
||||
uploads_dir = self._get_uploads_dir(thread_id)
|
||||
uploads_dir = ensure_uploads_dir(thread_id)
|
||||
uploaded_files: list[dict] = []
|
||||
|
||||
conversion_pool = None
|
||||
@@ -816,19 +771,21 @@ class DeerFlowClient:
|
||||
return asyncio.run(convert_file_to_markdown(path))
|
||||
|
||||
try:
|
||||
for src_path in resolved_files:
|
||||
dest = uploads_dir / src_path.name
|
||||
for src_path, dest_name in resolved_files:
|
||||
dest = uploads_dir / dest_name
|
||||
shutil.copy2(src_path, dest)
|
||||
|
||||
info: dict[str, Any] = {
|
||||
"filename": src_path.name,
|
||||
"filename": dest_name,
|
||||
"size": str(dest.stat().st_size),
|
||||
"path": str(dest),
|
||||
"virtual_path": f"/mnt/user-data/uploads/{src_path.name}",
|
||||
"artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{src_path.name}",
|
||||
"virtual_path": upload_virtual_path(dest_name),
|
||||
"artifact_url": upload_artifact_url(thread_id, dest_name),
|
||||
}
|
||||
if dest_name != src_path.name:
|
||||
info["original_filename"] = src_path.name
|
||||
|
||||
if src_path.suffix.lower() in convertible_extensions:
|
||||
if src_path.suffix.lower() in CONVERTIBLE_EXTENSIONS:
|
||||
try:
|
||||
if conversion_pool is not None:
|
||||
md_path = conversion_pool.submit(_convert_in_thread, dest).result()
|
||||
@@ -844,8 +801,9 @@ class DeerFlowClient:
|
||||
|
||||
if md_path is not None:
|
||||
info["markdown_file"] = md_path.name
|
||||
info["markdown_virtual_path"] = f"/mnt/user-data/uploads/{md_path.name}"
|
||||
info["markdown_artifact_url"] = f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{md_path.name}"
|
||||
info["markdown_path"] = str(uploads_dir / md_path.name)
|
||||
info["markdown_virtual_path"] = upload_virtual_path(md_path.name)
|
||||
info["markdown_artifact_url"] = upload_artifact_url(thread_id, md_path.name)
|
||||
|
||||
uploaded_files.append(info)
|
||||
finally:
|
||||
@@ -868,29 +826,9 @@ class DeerFlowClient:
|
||||
Dict with "files" and "count" keys, matching the Gateway API
|
||||
``list_uploaded_files`` response.
|
||||
"""
|
||||
uploads_dir = self._get_uploads_dir(thread_id)
|
||||
if not uploads_dir.exists():
|
||||
return {"files": [], "count": 0}
|
||||
|
||||
files = []
|
||||
with os.scandir(uploads_dir) as entries:
|
||||
file_entries = [entry for entry in entries if entry.is_file()]
|
||||
|
||||
for entry in sorted(file_entries, key=lambda item: item.name):
|
||||
stat = entry.stat()
|
||||
filename = entry.name
|
||||
files.append(
|
||||
{
|
||||
"filename": filename,
|
||||
"size": str(stat.st_size),
|
||||
"path": str(Path(entry.path)),
|
||||
"virtual_path": f"/mnt/user-data/uploads/{filename}",
|
||||
"artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{filename}",
|
||||
"extension": Path(filename).suffix,
|
||||
"modified": stat.st_mtime,
|
||||
}
|
||||
)
|
||||
return {"files": files, "count": len(files)}
|
||||
uploads_dir = get_uploads_dir(thread_id)
|
||||
result = list_files_in_dir(uploads_dir)
|
||||
return enrich_file_listing(result, thread_id)
|
||||
|
||||
def delete_upload(self, thread_id: str, filename: str) -> dict:
|
||||
"""Delete a file from a thread's uploads directory.
|
||||
@@ -907,19 +845,10 @@ class DeerFlowClient:
|
||||
FileNotFoundError: If the file does not exist.
|
||||
PermissionError: If path traversal is detected.
|
||||
"""
|
||||
uploads_dir = self._get_uploads_dir(thread_id)
|
||||
file_path = (uploads_dir / filename).resolve()
|
||||
from deerflow.utils.file_conversion import CONVERTIBLE_EXTENSIONS
|
||||
|
||||
try:
|
||||
file_path.relative_to(uploads_dir.resolve())
|
||||
except ValueError as exc:
|
||||
raise PermissionError("Access denied: path traversal detected") from exc
|
||||
|
||||
if not file_path.is_file():
|
||||
raise FileNotFoundError(f"File not found: {filename}")
|
||||
|
||||
file_path.unlink()
|
||||
return {"success": True, "message": f"Deleted {filename}"}
|
||||
uploads_dir = get_uploads_dir(thread_id)
|
||||
return delete_file_safe(uploads_dir, filename, convertible_extensions=CONVERTIBLE_EXTENSIONS)
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Public API — artifacts
|
||||
@@ -939,19 +868,13 @@ class DeerFlowClient:
|
||||
FileNotFoundError: If the artifact does not exist.
|
||||
ValueError: If the path is invalid.
|
||||
"""
|
||||
virtual_prefix = "mnt/user-data"
|
||||
clean_path = path.lstrip("/")
|
||||
if not clean_path.startswith(virtual_prefix):
|
||||
raise ValueError(f"Path must start with /{virtual_prefix}")
|
||||
|
||||
relative = clean_path[len(virtual_prefix) :].lstrip("/")
|
||||
base_dir = get_paths().sandbox_user_data_dir(thread_id)
|
||||
actual = (base_dir / relative).resolve()
|
||||
|
||||
try:
|
||||
actual.relative_to(base_dir.resolve())
|
||||
actual = get_paths().resolve_virtual_path(thread_id, path)
|
||||
except ValueError as exc:
|
||||
raise PermissionError("Access denied: path traversal detected") from exc
|
||||
if "traversal" in str(exc):
|
||||
from deerflow.uploads.manager import PathTraversalError
|
||||
raise PathTraversalError("Path traversal detected") from exc
|
||||
raise
|
||||
if not actual.exists():
|
||||
raise FileNotFoundError(f"Artifact not found: {path}")
|
||||
if not actual.is_file():
|
||||
|
||||
@@ -1,5 +1,14 @@
|
||||
from .installer import SkillAlreadyExistsError, install_skill_from_archive
|
||||
from .loader import get_skills_root_path, load_skills
|
||||
from .types import Skill
|
||||
from .validation import ALLOWED_FRONTMATTER_PROPERTIES, _validate_skill_frontmatter
|
||||
|
||||
__all__ = ["load_skills", "get_skills_root_path", "Skill", "ALLOWED_FRONTMATTER_PROPERTIES", "_validate_skill_frontmatter"]
|
||||
__all__ = [
|
||||
"load_skills",
|
||||
"get_skills_root_path",
|
||||
"Skill",
|
||||
"ALLOWED_FRONTMATTER_PROPERTIES",
|
||||
"_validate_skill_frontmatter",
|
||||
"install_skill_from_archive",
|
||||
"SkillAlreadyExistsError",
|
||||
]
|
||||
|
||||
176
backend/packages/harness/deerflow/skills/installer.py
Normal file
176
backend/packages/harness/deerflow/skills/installer.py
Normal file
@@ -0,0 +1,176 @@
|
||||
"""Shared skill archive installation logic.
|
||||
|
||||
Pure business logic — no FastAPI/HTTP dependencies.
|
||||
Both Gateway and Client delegate to these functions.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import shutil
|
||||
import stat
|
||||
import tempfile
|
||||
import zipfile
|
||||
from pathlib import Path
|
||||
|
||||
from deerflow.skills.loader import get_skills_root_path
|
||||
from deerflow.skills.validation import _validate_skill_frontmatter
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class SkillAlreadyExistsError(ValueError):
|
||||
"""Raised when a skill with the same name is already installed."""
|
||||
|
||||
|
||||
def is_unsafe_zip_member(info: zipfile.ZipInfo) -> bool:
|
||||
"""Return True if the zip member path is absolute or attempts directory traversal."""
|
||||
name = info.filename
|
||||
if not name:
|
||||
return False
|
||||
path = Path(name)
|
||||
if path.is_absolute():
|
||||
return True
|
||||
if ".." in path.parts:
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def is_symlink_member(info: zipfile.ZipInfo) -> bool:
|
||||
"""Detect symlinks based on the external attributes stored in the ZipInfo."""
|
||||
mode = info.external_attr >> 16
|
||||
return stat.S_ISLNK(mode)
|
||||
|
||||
|
||||
def should_ignore_archive_entry(path: Path) -> bool:
|
||||
"""Return True for macOS metadata dirs and dotfiles."""
|
||||
return path.name.startswith(".") or path.name == "__MACOSX"
|
||||
|
||||
|
||||
def resolve_skill_dir_from_archive(temp_path: Path) -> Path:
|
||||
"""Locate the skill root directory from extracted archive contents.
|
||||
|
||||
Filters out macOS metadata (__MACOSX) and dotfiles (.DS_Store).
|
||||
|
||||
Returns:
|
||||
Path to the skill directory.
|
||||
|
||||
Raises:
|
||||
ValueError: If the archive is empty after filtering.
|
||||
"""
|
||||
items = [p for p in temp_path.iterdir() if not should_ignore_archive_entry(p)]
|
||||
if not items:
|
||||
raise ValueError("Skill archive is empty")
|
||||
if len(items) == 1 and items[0].is_dir():
|
||||
return items[0]
|
||||
return temp_path
|
||||
|
||||
|
||||
def safe_extract_skill_archive(
|
||||
zip_ref: zipfile.ZipFile,
|
||||
dest_path: Path,
|
||||
max_total_size: int = 512 * 1024 * 1024,
|
||||
) -> None:
|
||||
"""Safely extract a skill archive with security protections.
|
||||
|
||||
Protections:
|
||||
- Reject absolute paths and directory traversal (..).
|
||||
- Skip symlink entries instead of materialising them.
|
||||
- Enforce a hard limit on total uncompressed size (zip bomb defence).
|
||||
|
||||
Raises:
|
||||
ValueError: If unsafe members or size limit exceeded.
|
||||
"""
|
||||
dest_root = dest_path.resolve()
|
||||
total_written = 0
|
||||
|
||||
for info in zip_ref.infolist():
|
||||
if is_unsafe_zip_member(info):
|
||||
raise ValueError(f"Archive contains unsafe member path: {info.filename!r}")
|
||||
|
||||
if is_symlink_member(info):
|
||||
logger.warning("Skipping symlink entry in skill archive: %s", info.filename)
|
||||
continue
|
||||
|
||||
member_path = dest_root / info.filename
|
||||
if not member_path.resolve().is_relative_to(dest_root):
|
||||
raise ValueError(f"Zip entry escapes destination: {info.filename!r}")
|
||||
member_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
if info.is_dir():
|
||||
member_path.mkdir(parents=True, exist_ok=True)
|
||||
continue
|
||||
|
||||
with zip_ref.open(info) as src, member_path.open("wb") as dst:
|
||||
while chunk := src.read(65536):
|
||||
total_written += len(chunk)
|
||||
if total_written > max_total_size:
|
||||
raise ValueError("Skill archive is too large or appears highly compressed.")
|
||||
dst.write(chunk)
|
||||
|
||||
|
||||
def install_skill_from_archive(
|
||||
zip_path: str | Path,
|
||||
*,
|
||||
skills_root: Path | None = None,
|
||||
) -> dict:
|
||||
"""Install a skill from a .skill archive (ZIP).
|
||||
|
||||
Args:
|
||||
zip_path: Path to the .skill file.
|
||||
skills_root: Override the skills root directory. If None, uses
|
||||
the default from config.
|
||||
|
||||
Returns:
|
||||
Dict with success, skill_name, message.
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If the file does not exist.
|
||||
ValueError: If the file is invalid (wrong extension, bad ZIP,
|
||||
invalid frontmatter, duplicate name).
|
||||
"""
|
||||
logger.info("Installing skill from %s", zip_path)
|
||||
path = Path(zip_path)
|
||||
if not path.is_file():
|
||||
if not path.exists():
|
||||
raise FileNotFoundError(f"Skill file not found: {zip_path}")
|
||||
raise ValueError(f"Path is not a file: {zip_path}")
|
||||
if path.suffix != ".skill":
|
||||
raise ValueError("File must have .skill extension")
|
||||
|
||||
if skills_root is None:
|
||||
skills_root = get_skills_root_path()
|
||||
custom_dir = skills_root / "custom"
|
||||
custom_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
|
||||
try:
|
||||
zf = zipfile.ZipFile(path, "r")
|
||||
except FileNotFoundError:
|
||||
raise FileNotFoundError(f"Skill file not found: {zip_path}") from None
|
||||
except (zipfile.BadZipFile, IsADirectoryError):
|
||||
raise ValueError("File is not a valid ZIP archive") from None
|
||||
|
||||
with zf:
|
||||
safe_extract_skill_archive(zf, tmp_path)
|
||||
|
||||
skill_dir = resolve_skill_dir_from_archive(tmp_path)
|
||||
|
||||
is_valid, message, skill_name = _validate_skill_frontmatter(skill_dir)
|
||||
if not is_valid:
|
||||
raise ValueError(f"Invalid skill: {message}")
|
||||
if not skill_name or "/" in skill_name or "\\" in skill_name or ".." in skill_name:
|
||||
raise ValueError(f"Invalid skill name: {skill_name}")
|
||||
|
||||
target = custom_dir / skill_name
|
||||
if target.exists():
|
||||
raise SkillAlreadyExistsError(f"Skill '{skill_name}' already exists")
|
||||
|
||||
shutil.copytree(skill_dir, target)
|
||||
logger.info("Skill %r installed to %s", skill_name, target)
|
||||
|
||||
return {
|
||||
"success": True,
|
||||
"skill_name": skill_name,
|
||||
"message": f"Skill '{skill_name}' installed successfully",
|
||||
}
|
||||
29
backend/packages/harness/deerflow/uploads/__init__.py
Normal file
29
backend/packages/harness/deerflow/uploads/__init__.py
Normal file
@@ -0,0 +1,29 @@
|
||||
from .manager import (
|
||||
PathTraversalError,
|
||||
claim_unique_filename,
|
||||
delete_file_safe,
|
||||
enrich_file_listing,
|
||||
ensure_uploads_dir,
|
||||
get_uploads_dir,
|
||||
list_files_in_dir,
|
||||
normalize_filename,
|
||||
upload_artifact_url,
|
||||
upload_virtual_path,
|
||||
validate_path_traversal,
|
||||
validate_thread_id,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"get_uploads_dir",
|
||||
"ensure_uploads_dir",
|
||||
"normalize_filename",
|
||||
"PathTraversalError",
|
||||
"claim_unique_filename",
|
||||
"validate_path_traversal",
|
||||
"list_files_in_dir",
|
||||
"delete_file_safe",
|
||||
"upload_artifact_url",
|
||||
"upload_virtual_path",
|
||||
"enrich_file_listing",
|
||||
"validate_thread_id",
|
||||
]
|
||||
198
backend/packages/harness/deerflow/uploads/manager.py
Normal file
198
backend/packages/harness/deerflow/uploads/manager.py
Normal file
@@ -0,0 +1,198 @@
|
||||
"""Shared upload management logic.
|
||||
|
||||
Pure business logic — no FastAPI/HTTP dependencies.
|
||||
Both Gateway and Client delegate to these functions.
|
||||
"""
|
||||
|
||||
import os
|
||||
import re
|
||||
from pathlib import Path
|
||||
from urllib.parse import quote
|
||||
|
||||
from deerflow.config.paths import VIRTUAL_PATH_PREFIX, get_paths
|
||||
|
||||
|
||||
class PathTraversalError(ValueError):
|
||||
"""Raised when a path escapes its allowed base directory."""
|
||||
|
||||
# thread_id must be alphanumeric, hyphens, underscores, or dots only.
|
||||
_SAFE_THREAD_ID = re.compile(r"^[a-zA-Z0-9._-]+$")
|
||||
|
||||
|
||||
def validate_thread_id(thread_id: str) -> None:
|
||||
"""Reject thread IDs containing characters unsafe for filesystem paths.
|
||||
|
||||
Raises:
|
||||
ValueError: If thread_id is empty or contains unsafe characters.
|
||||
"""
|
||||
if not thread_id or not _SAFE_THREAD_ID.match(thread_id):
|
||||
raise ValueError(f"Invalid thread_id: {thread_id!r}")
|
||||
|
||||
|
||||
def get_uploads_dir(thread_id: str) -> Path:
|
||||
"""Return the uploads directory path for a thread (no side effects)."""
|
||||
validate_thread_id(thread_id)
|
||||
return get_paths().sandbox_uploads_dir(thread_id)
|
||||
|
||||
|
||||
def ensure_uploads_dir(thread_id: str) -> Path:
|
||||
"""Return the uploads directory for a thread, creating it if needed."""
|
||||
base = get_uploads_dir(thread_id)
|
||||
base.mkdir(parents=True, exist_ok=True)
|
||||
return base
|
||||
|
||||
|
||||
def normalize_filename(filename: str) -> str:
|
||||
"""Sanitize a filename by extracting its basename.
|
||||
|
||||
Strips any directory components and rejects traversal patterns.
|
||||
|
||||
Args:
|
||||
filename: Raw filename from user input (may contain path components).
|
||||
|
||||
Returns:
|
||||
Safe filename (basename only).
|
||||
|
||||
Raises:
|
||||
ValueError: If filename is empty or resolves to a traversal pattern.
|
||||
"""
|
||||
if not filename:
|
||||
raise ValueError("Filename is empty")
|
||||
safe = Path(filename).name
|
||||
if not safe or safe in {".", ".."}:
|
||||
raise ValueError(f"Filename is unsafe: {filename!r}")
|
||||
# Reject backslashes — on Linux Path.name keeps them as literal chars,
|
||||
# but they indicate a Windows-style path that should be stripped or rejected.
|
||||
if "\\" in safe:
|
||||
raise ValueError(f"Filename contains backslash: {filename!r}")
|
||||
if len(safe.encode("utf-8")) > 255:
|
||||
raise ValueError(f"Filename too long: {len(safe)} chars")
|
||||
return safe
|
||||
|
||||
|
||||
def claim_unique_filename(name: str, seen: set[str]) -> str:
|
||||
"""Generate a unique filename by appending ``_N`` suffix on collision.
|
||||
|
||||
Automatically adds the returned name to *seen* so callers don't need to.
|
||||
|
||||
Args:
|
||||
name: Candidate filename.
|
||||
seen: Set of filenames already claimed (mutated in place).
|
||||
|
||||
Returns:
|
||||
A filename not present in *seen* (already added to *seen*).
|
||||
"""
|
||||
if name not in seen:
|
||||
seen.add(name)
|
||||
return name
|
||||
stem, suffix = Path(name).stem, Path(name).suffix
|
||||
counter = 1
|
||||
candidate = f"{stem}_{counter}{suffix}"
|
||||
while candidate in seen:
|
||||
counter += 1
|
||||
candidate = f"{stem}_{counter}{suffix}"
|
||||
seen.add(candidate)
|
||||
return candidate
|
||||
|
||||
|
||||
def validate_path_traversal(path: Path, base: Path) -> None:
|
||||
"""Verify that *path* is inside *base*.
|
||||
|
||||
Raises:
|
||||
PathTraversalError: If a path traversal is detected.
|
||||
"""
|
||||
try:
|
||||
path.resolve().relative_to(base.resolve())
|
||||
except ValueError:
|
||||
raise PathTraversalError("Path traversal detected") from None
|
||||
|
||||
|
||||
def list_files_in_dir(directory: Path) -> dict:
|
||||
"""List files (not directories) in *directory*.
|
||||
|
||||
Args:
|
||||
directory: Directory to scan.
|
||||
|
||||
Returns:
|
||||
Dict with "files" list (sorted by name) and "count".
|
||||
Each file entry has ``size`` as *int* (bytes). Call
|
||||
:func:`enrich_file_listing` to stringify sizes and add
|
||||
virtual / artifact URLs.
|
||||
"""
|
||||
if not directory.is_dir():
|
||||
return {"files": [], "count": 0}
|
||||
|
||||
files = []
|
||||
with os.scandir(directory) as entries:
|
||||
for entry in sorted(entries, key=lambda e: e.name):
|
||||
if not entry.is_file(follow_symlinks=False):
|
||||
continue
|
||||
st = entry.stat(follow_symlinks=False)
|
||||
files.append({
|
||||
"filename": entry.name,
|
||||
"size": st.st_size,
|
||||
"path": entry.path,
|
||||
"extension": Path(entry.name).suffix,
|
||||
"modified": st.st_mtime,
|
||||
})
|
||||
return {"files": files, "count": len(files)}
|
||||
|
||||
|
||||
def delete_file_safe(base_dir: Path, filename: str, *, convertible_extensions: set[str] | None = None) -> dict:
|
||||
"""Delete a file inside *base_dir* after path-traversal validation.
|
||||
|
||||
If *convertible_extensions* is provided and the file's extension matches,
|
||||
the companion ``.md`` file is also removed (if it exists).
|
||||
|
||||
Args:
|
||||
base_dir: Directory containing the file.
|
||||
filename: Name of file to delete.
|
||||
convertible_extensions: Lowercase extensions (e.g. ``{".pdf", ".docx"}``)
|
||||
whose companion markdown should be cleaned up.
|
||||
|
||||
Returns:
|
||||
Dict with success and message.
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If the file does not exist.
|
||||
PathTraversalError: If path traversal is detected.
|
||||
"""
|
||||
file_path = (base_dir / filename).resolve()
|
||||
validate_path_traversal(file_path, base_dir)
|
||||
|
||||
if not file_path.is_file():
|
||||
raise FileNotFoundError(f"File not found: {filename}")
|
||||
|
||||
file_path.unlink()
|
||||
|
||||
# Clean up companion markdown generated during upload conversion.
|
||||
if convertible_extensions and file_path.suffix.lower() in convertible_extensions:
|
||||
file_path.with_suffix(".md").unlink(missing_ok=True)
|
||||
|
||||
return {"success": True, "message": f"Deleted {filename}"}
|
||||
|
||||
|
||||
def upload_artifact_url(thread_id: str, filename: str) -> str:
|
||||
"""Build the artifact URL for a file in a thread's uploads directory.
|
||||
|
||||
*filename* is percent-encoded so that spaces, ``#``, ``?`` etc. are safe.
|
||||
"""
|
||||
return f"/api/threads/{thread_id}/artifacts{VIRTUAL_PATH_PREFIX}/uploads/{quote(filename, safe='')}"
|
||||
|
||||
|
||||
def upload_virtual_path(filename: str) -> str:
|
||||
"""Build the virtual path for a file in the uploads directory."""
|
||||
return f"{VIRTUAL_PATH_PREFIX}/uploads/{filename}"
|
||||
|
||||
|
||||
def enrich_file_listing(result: dict, thread_id: str) -> dict:
|
||||
"""Add virtual paths, artifact URLs, and stringify sizes on a listing result.
|
||||
|
||||
Mutates *result* in place and returns it for convenience.
|
||||
"""
|
||||
for f in result["files"]:
|
||||
filename = f["filename"]
|
||||
f["size"] = str(f["size"])
|
||||
f["virtual_path"] = upload_virtual_path(filename)
|
||||
f["artifact_url"] = upload_artifact_url(thread_id, filename)
|
||||
return result
|
||||
@@ -9,7 +9,7 @@ from pathlib import Path
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
import pytest
|
||||
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage # noqa: F401
|
||||
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, ToolMessage # noqa: F401
|
||||
|
||||
from app.gateway.routers.mcp import McpConfigResponse
|
||||
from app.gateway.routers.memory import MemoryConfigResponse, MemoryStatusResponse
|
||||
@@ -17,6 +17,8 @@ from app.gateway.routers.models import ModelResponse, ModelsListResponse
|
||||
from app.gateway.routers.skills import SkillInstallResponse, SkillResponse, SkillsListResponse
|
||||
from app.gateway.routers.uploads import UploadResponse
|
||||
from deerflow.client import DeerFlowClient
|
||||
from deerflow.config.paths import Paths
|
||||
from deerflow.uploads.manager import PathTraversalError
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Fixtures
|
||||
@@ -609,10 +611,7 @@ class TestSkillsManagement:
|
||||
skills_root = tmp_path / "skills"
|
||||
(skills_root / "custom").mkdir(parents=True)
|
||||
|
||||
with (
|
||||
patch("deerflow.skills.loader.get_skills_root_path", return_value=skills_root),
|
||||
patch("deerflow.skills.validation._validate_skill_frontmatter", return_value=(True, "OK", "my-skill")),
|
||||
):
|
||||
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
|
||||
result = client.install_skill(archive_path)
|
||||
|
||||
assert result["success"] is True
|
||||
@@ -700,7 +699,7 @@ class TestUploads:
|
||||
uploads_dir = tmp_path / "uploads"
|
||||
uploads_dir.mkdir()
|
||||
|
||||
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
|
||||
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
|
||||
result = client.upload_files("thread-1", [src_file])
|
||||
|
||||
assert result["success"] is True
|
||||
@@ -756,7 +755,7 @@ class TestUploads:
|
||||
return client.upload_files("thread-async", [first, second])
|
||||
|
||||
with (
|
||||
patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir),
|
||||
patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
|
||||
patch("deerflow.utils.file_conversion.CONVERTIBLE_EXTENSIONS", {".pdf"}),
|
||||
patch("deerflow.utils.file_conversion.convert_file_to_markdown", side_effect=fake_convert),
|
||||
patch("concurrent.futures.ThreadPoolExecutor", FakeExecutor),
|
||||
@@ -777,7 +776,7 @@ class TestUploads:
|
||||
(uploads_dir / "a.txt").write_text("a")
|
||||
(uploads_dir / "b.txt").write_text("bb")
|
||||
|
||||
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
|
||||
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
|
||||
result = client.list_uploads("thread-1")
|
||||
|
||||
assert result["count"] == 2
|
||||
@@ -793,7 +792,7 @@ class TestUploads:
|
||||
uploads_dir = Path(tmp)
|
||||
(uploads_dir / "delete-me.txt").write_text("gone")
|
||||
|
||||
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
|
||||
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
|
||||
result = client.delete_upload("thread-1", "delete-me.txt")
|
||||
|
||||
assert result["success"] is True
|
||||
@@ -802,15 +801,15 @@ class TestUploads:
|
||||
|
||||
def test_delete_upload_not_found(self, client):
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=Path(tmp)):
|
||||
with patch("deerflow.client.get_uploads_dir", return_value=Path(tmp)):
|
||||
with pytest.raises(FileNotFoundError):
|
||||
client.delete_upload("thread-1", "nope.txt")
|
||||
|
||||
def test_delete_upload_path_traversal(self, client):
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
uploads_dir = Path(tmp)
|
||||
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
|
||||
with pytest.raises(PermissionError):
|
||||
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
|
||||
with pytest.raises(PathTraversalError):
|
||||
client.delete_upload("thread-1", "../../etc/passwd")
|
||||
|
||||
|
||||
@@ -822,15 +821,12 @@ class TestUploads:
|
||||
class TestArtifacts:
|
||||
def test_get_artifact(self, client):
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
user_data_dir = Path(tmp) / "user-data"
|
||||
outputs = user_data_dir / "outputs"
|
||||
paths = Paths(base_dir=tmp)
|
||||
outputs = paths.sandbox_outputs_dir("t1")
|
||||
outputs.mkdir(parents=True)
|
||||
(outputs / "result.txt").write_text("artifact content")
|
||||
|
||||
mock_paths = MagicMock()
|
||||
mock_paths.sandbox_user_data_dir.return_value = user_data_dir
|
||||
|
||||
with patch("deerflow.client.get_paths", return_value=mock_paths):
|
||||
with patch("deerflow.client.get_paths", return_value=paths):
|
||||
content, mime = client.get_artifact("t1", "mnt/user-data/outputs/result.txt")
|
||||
|
||||
assert content == b"artifact content"
|
||||
@@ -838,13 +834,10 @@ class TestArtifacts:
|
||||
|
||||
def test_get_artifact_not_found(self, client):
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
user_data_dir = Path(tmp) / "user-data"
|
||||
user_data_dir.mkdir()
|
||||
paths = Paths(base_dir=tmp)
|
||||
paths.sandbox_user_data_dir("t1").mkdir(parents=True)
|
||||
|
||||
mock_paths = MagicMock()
|
||||
mock_paths.sandbox_user_data_dir.return_value = user_data_dir
|
||||
|
||||
with patch("deerflow.client.get_paths", return_value=mock_paths):
|
||||
with patch("deerflow.client.get_paths", return_value=paths):
|
||||
with pytest.raises(FileNotFoundError):
|
||||
client.get_artifact("t1", "mnt/user-data/outputs/nope.txt")
|
||||
|
||||
@@ -854,14 +847,11 @@ class TestArtifacts:
|
||||
|
||||
def test_get_artifact_path_traversal(self, client):
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
user_data_dir = Path(tmp) / "user-data"
|
||||
user_data_dir.mkdir()
|
||||
paths = Paths(base_dir=tmp)
|
||||
paths.sandbox_user_data_dir("t1").mkdir(parents=True)
|
||||
|
||||
mock_paths = MagicMock()
|
||||
mock_paths.sandbox_user_data_dir.return_value = user_data_dir
|
||||
|
||||
with patch("deerflow.client.get_paths", return_value=mock_paths):
|
||||
with pytest.raises(PermissionError):
|
||||
with patch("deerflow.client.get_paths", return_value=paths):
|
||||
with pytest.raises(PathTraversalError):
|
||||
client.get_artifact("t1", "mnt/user-data/../../../etc/passwd")
|
||||
|
||||
|
||||
@@ -1013,7 +1003,7 @@ class TestScenarioFileLifecycle:
|
||||
(tmp_path / "report.txt").write_text("quarterly report data")
|
||||
(tmp_path / "data.csv").write_text("a,b,c\n1,2,3")
|
||||
|
||||
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
|
||||
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
|
||||
# Step 1: Upload
|
||||
result = client.upload_files(
|
||||
"t-lifecycle",
|
||||
@@ -1046,15 +1036,16 @@ class TestScenarioFileLifecycle:
|
||||
tmp_path = Path(tmp)
|
||||
uploads_dir = tmp_path / "uploads"
|
||||
uploads_dir.mkdir()
|
||||
user_data_dir = tmp_path / "user-data"
|
||||
outputs_dir = user_data_dir / "outputs"
|
||||
|
||||
paths = Paths(base_dir=tmp_path)
|
||||
outputs_dir = paths.sandbox_outputs_dir("t-artifact")
|
||||
outputs_dir.mkdir(parents=True)
|
||||
|
||||
# Upload phase
|
||||
src_file = tmp_path / "input.txt"
|
||||
src_file.write_text("raw data to process")
|
||||
|
||||
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
|
||||
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
|
||||
uploaded = client.upload_files("t-artifact", [src_file])
|
||||
assert len(uploaded["files"]) == 1
|
||||
|
||||
@@ -1062,10 +1053,7 @@ class TestScenarioFileLifecycle:
|
||||
(outputs_dir / "analysis.json").write_text('{"result": "processed"}')
|
||||
|
||||
# Retrieve artifact
|
||||
mock_paths = MagicMock()
|
||||
mock_paths.sandbox_user_data_dir.return_value = user_data_dir
|
||||
|
||||
with patch("deerflow.client.get_paths", return_value=mock_paths):
|
||||
with patch("deerflow.client.get_paths", return_value=paths):
|
||||
content, mime = client.get_artifact("t-artifact", "mnt/user-data/outputs/analysis.json")
|
||||
|
||||
assert json.loads(content) == {"result": "processed"}
|
||||
@@ -1286,7 +1274,7 @@ class TestScenarioThreadIsolation:
|
||||
def get_dir(thread_id):
|
||||
return uploads_a if thread_id == "thread-a" else uploads_b
|
||||
|
||||
with patch.object(DeerFlowClient, "_get_uploads_dir", side_effect=get_dir):
|
||||
with patch("deerflow.client.get_uploads_dir", side_effect=get_dir), patch("deerflow.client.ensure_uploads_dir", side_effect=get_dir):
|
||||
client.upload_files("thread-a", [src_file])
|
||||
|
||||
files_a = client.list_uploads("thread-a")
|
||||
@@ -1298,18 +1286,13 @@ class TestScenarioThreadIsolation:
|
||||
def test_artifacts_isolated_per_thread(self, client):
|
||||
"""Artifacts in thread-A are not accessible from thread-B."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
paths = Paths(base_dir=tmp)
|
||||
outputs_a = paths.sandbox_outputs_dir("thread-a")
|
||||
outputs_a.mkdir(parents=True)
|
||||
paths.sandbox_user_data_dir("thread-b").mkdir(parents=True)
|
||||
(outputs_a / "result.txt").write_text("thread-a artifact")
|
||||
|
||||
data_a = tmp_path / "thread-a"
|
||||
data_b = tmp_path / "thread-b"
|
||||
(data_a / "outputs").mkdir(parents=True)
|
||||
(data_b / "outputs").mkdir(parents=True)
|
||||
(data_a / "outputs" / "result.txt").write_text("thread-a artifact")
|
||||
|
||||
mock_paths = MagicMock()
|
||||
mock_paths.sandbox_user_data_dir.side_effect = lambda tid: data_a if tid == "thread-a" else data_b
|
||||
|
||||
with patch("deerflow.client.get_paths", return_value=mock_paths):
|
||||
with patch("deerflow.client.get_paths", return_value=paths):
|
||||
content, _ = client.get_artifact("thread-a", "mnt/user-data/outputs/result.txt")
|
||||
assert content == b"thread-a artifact"
|
||||
|
||||
@@ -1377,10 +1360,7 @@ class TestScenarioSkillInstallAndUse:
|
||||
(skills_root / "custom").mkdir(parents=True)
|
||||
|
||||
# Step 1: Install
|
||||
with (
|
||||
patch("deerflow.skills.loader.get_skills_root_path", return_value=skills_root),
|
||||
patch("deerflow.skills.validation._validate_skill_frontmatter", return_value=(True, "OK", "my-analyzer")),
|
||||
):
|
||||
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
|
||||
result = client.install_skill(archive)
|
||||
assert result["success"] is True
|
||||
assert (skills_root / "custom" / "my-analyzer" / "SKILL.md").exists()
|
||||
@@ -1512,7 +1492,7 @@ class TestScenarioEdgeCases:
|
||||
pdf_file.write_bytes(b"%PDF-1.4 fake content")
|
||||
|
||||
with (
|
||||
patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir),
|
||||
patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
|
||||
patch("deerflow.utils.file_conversion.CONVERTIBLE_EXTENSIONS", {".pdf"}),
|
||||
patch("deerflow.utils.file_conversion.convert_file_to_markdown", side_effect=Exception("conversion failed")),
|
||||
):
|
||||
@@ -1614,9 +1594,7 @@ class TestGatewayConformance:
|
||||
with zipfile.ZipFile(archive, "w") as zf:
|
||||
zf.write(skill_dir / "SKILL.md", "my-skill/SKILL.md")
|
||||
|
||||
custom_dir = tmp_path / "custom"
|
||||
custom_dir.mkdir()
|
||||
with patch("deerflow.skills.loader.get_skills_root_path", return_value=tmp_path):
|
||||
with patch("deerflow.skills.installer.get_skills_root_path", return_value=tmp_path):
|
||||
result = client.install_skill(archive)
|
||||
|
||||
parsed = SkillInstallResponse(**result)
|
||||
@@ -1680,7 +1658,7 @@ class TestGatewayConformance:
|
||||
src_file = tmp_path / "hello.txt"
|
||||
src_file.write_text("hello")
|
||||
|
||||
with patch.object(DeerFlowClient, "_get_uploads_dir", return_value=uploads_dir):
|
||||
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
|
||||
result = client.upload_files("t-conform", [src_file])
|
||||
|
||||
parsed = UploadResponse(**result)
|
||||
@@ -1739,3 +1717,694 @@ class TestGatewayConformance:
|
||||
parsed = MemoryStatusResponse(**result)
|
||||
assert parsed.config.enabled is True
|
||||
assert parsed.data.version == "1.0"
|
||||
|
||||
|
||||
|
||||
# ===========================================================================
|
||||
# Hardening — install_skill security gates
|
||||
# ===========================================================================
|
||||
|
||||
|
||||
class TestInstallSkillSecurity:
|
||||
"""Every security gate in install_skill() must have a red-line test."""
|
||||
|
||||
def test_zip_bomb_rejected(self, client):
|
||||
"""Archives whose extracted size exceeds the limit are rejected."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
archive = Path(tmp) / "bomb.skill"
|
||||
# Create a small archive that claims huge uncompressed size.
|
||||
# Write 200 bytes but the safe_extract checks cumulative file_size.
|
||||
data = b"\x00" * 200
|
||||
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
|
||||
zf.writestr("big.bin", data)
|
||||
|
||||
skills_root = Path(tmp) / "skills"
|
||||
(skills_root / "custom").mkdir(parents=True)
|
||||
|
||||
# Patch max_total_size to a small value to trigger the bomb check.
|
||||
from deerflow.skills import installer as _installer
|
||||
orig = _installer.safe_extract_skill_archive
|
||||
|
||||
def patched_extract(zf, dest, max_total_size=100):
|
||||
return orig(zf, dest, max_total_size=100)
|
||||
|
||||
with (
|
||||
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
|
||||
patch("deerflow.skills.installer.safe_extract_skill_archive", side_effect=patched_extract),
|
||||
):
|
||||
with pytest.raises(ValueError, match="too large"):
|
||||
client.install_skill(archive)
|
||||
|
||||
def test_absolute_path_in_archive_rejected(self, client):
|
||||
"""ZIP entries with absolute paths are rejected."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
archive = Path(tmp) / "abs.skill"
|
||||
with zipfile.ZipFile(archive, "w") as zf:
|
||||
zf.writestr("/etc/passwd", "root:x:0:0")
|
||||
|
||||
skills_root = Path(tmp) / "skills"
|
||||
(skills_root / "custom").mkdir(parents=True)
|
||||
|
||||
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
|
||||
with pytest.raises(ValueError, match="unsafe"):
|
||||
client.install_skill(archive)
|
||||
|
||||
def test_dotdot_path_in_archive_rejected(self, client):
|
||||
"""ZIP entries with '..' path components are rejected."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
archive = Path(tmp) / "traversal.skill"
|
||||
with zipfile.ZipFile(archive, "w") as zf:
|
||||
zf.writestr("skill/../../../etc/shadow", "bad")
|
||||
|
||||
skills_root = Path(tmp) / "skills"
|
||||
(skills_root / "custom").mkdir(parents=True)
|
||||
|
||||
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
|
||||
with pytest.raises(ValueError, match="unsafe"):
|
||||
client.install_skill(archive)
|
||||
|
||||
def test_symlinks_skipped_during_extraction(self, client):
|
||||
"""Symlink entries in the archive are skipped (never written to disk)."""
|
||||
import stat as stat_mod
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
|
||||
archive = tmp_path / "sym-skill.skill"
|
||||
with zipfile.ZipFile(archive, "w") as zf:
|
||||
zf.writestr("sym-skill/SKILL.md", "---\nname: sym-skill\ndescription: test\n---\nBody")
|
||||
# Inject a symlink entry via ZipInfo with Unix symlink mode.
|
||||
link_info = zipfile.ZipInfo("sym-skill/sneaky_link")
|
||||
link_info.external_attr = (stat_mod.S_IFLNK | 0o777) << 16
|
||||
zf.writestr(link_info, "/etc/passwd")
|
||||
|
||||
skills_root = tmp_path / "skills"
|
||||
(skills_root / "custom").mkdir(parents=True)
|
||||
|
||||
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
|
||||
result = client.install_skill(archive)
|
||||
|
||||
assert result["success"] is True
|
||||
installed = skills_root / "custom" / "sym-skill"
|
||||
assert (installed / "SKILL.md").exists()
|
||||
assert not (installed / "sneaky_link").exists()
|
||||
|
||||
def test_invalid_skill_name_rejected(self, client):
|
||||
"""Skill names containing special characters are rejected."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
|
||||
skill_dir = tmp_path / "bad-name"
|
||||
skill_dir.mkdir()
|
||||
(skill_dir / "SKILL.md").write_text("---\nname: ../evil\ndescription: test\n---\n")
|
||||
|
||||
archive = tmp_path / "bad.skill"
|
||||
with zipfile.ZipFile(archive, "w") as zf:
|
||||
zf.write(skill_dir / "SKILL.md", "bad-name/SKILL.md")
|
||||
|
||||
skills_root = tmp_path / "skills"
|
||||
(skills_root / "custom").mkdir(parents=True)
|
||||
|
||||
with (
|
||||
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
|
||||
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(True, "OK", "../evil")),
|
||||
):
|
||||
with pytest.raises(ValueError, match="Invalid skill name"):
|
||||
client.install_skill(archive)
|
||||
|
||||
def test_existing_skill_rejected(self, client):
|
||||
"""Installing a skill that already exists is rejected."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
|
||||
skill_dir = tmp_path / "dupe-skill"
|
||||
skill_dir.mkdir()
|
||||
(skill_dir / "SKILL.md").write_text("---\nname: dupe-skill\ndescription: test\n---\n")
|
||||
|
||||
archive = tmp_path / "dupe-skill.skill"
|
||||
with zipfile.ZipFile(archive, "w") as zf:
|
||||
zf.write(skill_dir / "SKILL.md", "dupe-skill/SKILL.md")
|
||||
|
||||
skills_root = tmp_path / "skills"
|
||||
(skills_root / "custom" / "dupe-skill").mkdir(parents=True)
|
||||
|
||||
with (
|
||||
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
|
||||
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(True, "OK", "dupe-skill")),
|
||||
):
|
||||
with pytest.raises(ValueError, match="already exists"):
|
||||
client.install_skill(archive)
|
||||
|
||||
def test_empty_archive_rejected(self, client):
|
||||
"""An archive with no entries is rejected."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
archive = Path(tmp) / "empty.skill"
|
||||
with zipfile.ZipFile(archive, "w"):
|
||||
pass # empty archive
|
||||
|
||||
skills_root = Path(tmp) / "skills"
|
||||
(skills_root / "custom").mkdir(parents=True)
|
||||
|
||||
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
|
||||
with pytest.raises(ValueError, match="empty"):
|
||||
client.install_skill(archive)
|
||||
|
||||
def test_invalid_frontmatter_rejected(self, client):
|
||||
"""Archive with invalid SKILL.md frontmatter is rejected."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
skill_dir = tmp_path / "bad-meta"
|
||||
skill_dir.mkdir()
|
||||
(skill_dir / "SKILL.md").write_text("no frontmatter at all")
|
||||
|
||||
archive = tmp_path / "bad-meta.skill"
|
||||
with zipfile.ZipFile(archive, "w") as zf:
|
||||
zf.write(skill_dir / "SKILL.md", "bad-meta/SKILL.md")
|
||||
|
||||
skills_root = tmp_path / "skills"
|
||||
(skills_root / "custom").mkdir(parents=True)
|
||||
|
||||
with (
|
||||
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
|
||||
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(False, "Missing name field", "")),
|
||||
):
|
||||
with pytest.raises(ValueError, match="Invalid skill"):
|
||||
client.install_skill(archive)
|
||||
|
||||
def test_not_a_zip_rejected(self, client):
|
||||
"""A .skill file that is not a valid ZIP is rejected."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
archive = Path(tmp) / "fake.skill"
|
||||
archive.write_text("this is not a zip file")
|
||||
|
||||
with pytest.raises(ValueError, match="not a valid ZIP"):
|
||||
client.install_skill(archive)
|
||||
|
||||
def test_directory_path_rejected(self, client):
|
||||
"""Passing a directory instead of a file is rejected."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
with pytest.raises(ValueError, match="not a file"):
|
||||
client.install_skill(tmp)
|
||||
|
||||
|
||||
# ===========================================================================
|
||||
# Hardening — _atomic_write_json error paths
|
||||
# ===========================================================================
|
||||
|
||||
|
||||
class TestAtomicWriteJson:
|
||||
def test_temp_file_cleaned_on_serialization_failure(self):
|
||||
"""If json.dump raises, the temp file is removed."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
target = Path(tmp) / "config.json"
|
||||
|
||||
# An object that cannot be serialized to JSON.
|
||||
bad_data = {"key": object()}
|
||||
|
||||
with pytest.raises(TypeError):
|
||||
DeerFlowClient._atomic_write_json(target, bad_data)
|
||||
|
||||
# Target should not have been created.
|
||||
assert not target.exists()
|
||||
# No stray .tmp files should remain.
|
||||
tmp_files = list(Path(tmp).glob("*.tmp"))
|
||||
assert tmp_files == []
|
||||
|
||||
def test_happy_path_writes_atomically(self):
|
||||
"""Normal write produces correct JSON and no temp files."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
target = Path(tmp) / "out.json"
|
||||
data = {"key": "value", "nested": [1, 2, 3]}
|
||||
|
||||
DeerFlowClient._atomic_write_json(target, data)
|
||||
|
||||
assert target.exists()
|
||||
with open(target) as f:
|
||||
loaded = json.load(f)
|
||||
assert loaded == data
|
||||
# No temp files left behind.
|
||||
assert list(Path(tmp).glob("*.tmp")) == []
|
||||
|
||||
def test_original_preserved_on_failure(self):
|
||||
"""If write fails, the original file is not corrupted."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
target = Path(tmp) / "config.json"
|
||||
target.write_text('{"original": true}')
|
||||
|
||||
bad_data = {"key": object()}
|
||||
with pytest.raises(TypeError):
|
||||
DeerFlowClient._atomic_write_json(target, bad_data)
|
||||
|
||||
# Original content must survive.
|
||||
with open(target) as f:
|
||||
assert json.load(f) == {"original": True}
|
||||
|
||||
|
||||
# ===========================================================================
|
||||
# Hardening — config update error paths
|
||||
# ===========================================================================
|
||||
|
||||
|
||||
class TestConfigUpdateErrors:
|
||||
def test_update_mcp_config_no_config_file(self, client):
|
||||
"""FileNotFoundError when extensions_config.json cannot be located."""
|
||||
with patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=None):
|
||||
with pytest.raises(FileNotFoundError, match="Cannot locate"):
|
||||
client.update_mcp_config({"server": {}})
|
||||
|
||||
def test_update_skill_no_config_file(self, client):
|
||||
"""FileNotFoundError when extensions_config.json cannot be located."""
|
||||
skill = MagicMock()
|
||||
skill.name = "some-skill"
|
||||
|
||||
with (
|
||||
patch("deerflow.skills.loader.load_skills", return_value=[skill]),
|
||||
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=None),
|
||||
):
|
||||
with pytest.raises(FileNotFoundError, match="Cannot locate"):
|
||||
client.update_skill("some-skill", enabled=False)
|
||||
|
||||
def test_update_skill_disappears_after_write(self, client):
|
||||
"""RuntimeError when skill vanishes between write and re-read."""
|
||||
skill = MagicMock()
|
||||
skill.name = "ghost-skill"
|
||||
|
||||
ext_config = MagicMock()
|
||||
ext_config.mcp_servers = {}
|
||||
ext_config.skills = {}
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
config_file = Path(tmp) / "extensions_config.json"
|
||||
config_file.write_text("{}")
|
||||
|
||||
with (
|
||||
patch("deerflow.skills.loader.load_skills", side_effect=[[skill], []]),
|
||||
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
|
||||
patch("deerflow.client.get_extensions_config", return_value=ext_config),
|
||||
patch("deerflow.client.reload_extensions_config"),
|
||||
):
|
||||
with pytest.raises(RuntimeError, match="disappeared"):
|
||||
client.update_skill("ghost-skill", enabled=False)
|
||||
|
||||
|
||||
# ===========================================================================
|
||||
# Hardening — stream / chat edge cases
|
||||
# ===========================================================================
|
||||
|
||||
|
||||
class TestStreamHardening:
|
||||
def test_agent_exception_propagates(self, client):
|
||||
"""Exceptions from agent.stream() propagate to caller."""
|
||||
agent = MagicMock()
|
||||
agent.stream.side_effect = RuntimeError("model quota exceeded")
|
||||
|
||||
with (
|
||||
patch.object(client, "_ensure_agent"),
|
||||
patch.object(client, "_agent", agent),
|
||||
):
|
||||
with pytest.raises(RuntimeError, match="model quota exceeded"):
|
||||
list(client.stream("hi", thread_id="t-err"))
|
||||
|
||||
def test_messages_without_id(self, client):
|
||||
"""Messages without id attribute are emitted without crashing."""
|
||||
ai = AIMessage(content="no id here")
|
||||
# Forcibly remove the id attribute to simulate edge case.
|
||||
object.__setattr__(ai, "id", None)
|
||||
chunks = [{"messages": [ai]}]
|
||||
agent = _make_agent_mock(chunks)
|
||||
|
||||
with (
|
||||
patch.object(client, "_ensure_agent"),
|
||||
patch.object(client, "_agent", agent),
|
||||
):
|
||||
events = list(client.stream("hi", thread_id="t-noid"))
|
||||
|
||||
# Should produce events without error.
|
||||
assert events[-1].type == "end"
|
||||
ai_events = _ai_events(events)
|
||||
assert len(ai_events) == 1
|
||||
assert ai_events[0].data["content"] == "no id here"
|
||||
|
||||
def test_tool_calls_only_no_text(self, client):
|
||||
"""chat() returns empty string when agent only emits tool calls."""
|
||||
ai = AIMessage(
|
||||
content="",
|
||||
id="ai-1",
|
||||
tool_calls=[{"name": "bash", "args": {"cmd": "ls"}, "id": "tc-1"}],
|
||||
)
|
||||
tool = ToolMessage(content="output", id="tm-1", tool_call_id="tc-1", name="bash")
|
||||
chunks = [
|
||||
{"messages": [ai]},
|
||||
{"messages": [ai, tool]},
|
||||
]
|
||||
agent = _make_agent_mock(chunks)
|
||||
|
||||
with (
|
||||
patch.object(client, "_ensure_agent"),
|
||||
patch.object(client, "_agent", agent),
|
||||
):
|
||||
result = client.chat("do it", thread_id="t-tc-only")
|
||||
|
||||
assert result == ""
|
||||
|
||||
def test_duplicate_messages_without_id_not_deduplicated(self, client):
|
||||
"""Messages with id=None are NOT deduplicated (each is emitted)."""
|
||||
ai1 = AIMessage(content="first")
|
||||
ai2 = AIMessage(content="second")
|
||||
object.__setattr__(ai1, "id", None)
|
||||
object.__setattr__(ai2, "id", None)
|
||||
|
||||
chunks = [
|
||||
{"messages": [ai1]},
|
||||
{"messages": [ai2]},
|
||||
]
|
||||
agent = _make_agent_mock(chunks)
|
||||
|
||||
with (
|
||||
patch.object(client, "_ensure_agent"),
|
||||
patch.object(client, "_agent", agent),
|
||||
):
|
||||
events = list(client.stream("hi", thread_id="t-dup-noid"))
|
||||
|
||||
ai_msgs = _ai_events(events)
|
||||
assert len(ai_msgs) == 2
|
||||
|
||||
|
||||
# ===========================================================================
|
||||
# Hardening — _serialize_message coverage
|
||||
# ===========================================================================
|
||||
|
||||
|
||||
class TestSerializeMessage:
|
||||
def test_system_message(self):
|
||||
msg = SystemMessage(content="You are a helpful assistant.", id="sys-1")
|
||||
result = DeerFlowClient._serialize_message(msg)
|
||||
assert result["type"] == "system"
|
||||
assert result["content"] == "You are a helpful assistant."
|
||||
assert result["id"] == "sys-1"
|
||||
|
||||
def test_unknown_message_type(self):
|
||||
"""Non-standard message types serialize as 'unknown'."""
|
||||
msg = MagicMock()
|
||||
msg.id = "unk-1"
|
||||
msg.content = "something"
|
||||
# Not an instance of AIMessage/ToolMessage/HumanMessage/SystemMessage
|
||||
type(msg).__name__ = "CustomMessage"
|
||||
result = DeerFlowClient._serialize_message(msg)
|
||||
assert result["type"] == "unknown"
|
||||
assert result["id"] == "unk-1"
|
||||
|
||||
def test_ai_message_with_tool_calls(self):
|
||||
msg = AIMessage(
|
||||
content="",
|
||||
id="ai-tc",
|
||||
tool_calls=[{"name": "bash", "args": {"cmd": "ls"}, "id": "tc-1"}],
|
||||
)
|
||||
result = DeerFlowClient._serialize_message(msg)
|
||||
assert result["type"] == "ai"
|
||||
assert len(result["tool_calls"]) == 1
|
||||
assert result["tool_calls"][0]["name"] == "bash"
|
||||
|
||||
def test_tool_message_non_string_content(self):
|
||||
msg = ToolMessage(content={"key": "value"}, id="tm-1", tool_call_id="tc-1", name="tool")
|
||||
result = DeerFlowClient._serialize_message(msg)
|
||||
assert result["type"] == "tool"
|
||||
assert isinstance(result["content"], str)
|
||||
|
||||
|
||||
# ===========================================================================
|
||||
# Hardening — upload / delete symlink attack
|
||||
# ===========================================================================
|
||||
|
||||
|
||||
class TestUploadDeleteSymlink:
|
||||
def test_delete_upload_symlink_outside_dir(self, client):
|
||||
"""A symlink in uploads dir pointing outside is caught by path traversal check."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
uploads_dir = Path(tmp) / "uploads"
|
||||
uploads_dir.mkdir()
|
||||
|
||||
# Create a target file outside uploads dir.
|
||||
outside = Path(tmp) / "secret.txt"
|
||||
outside.write_text("sensitive data")
|
||||
|
||||
# Create a symlink inside uploads dir pointing to outside file.
|
||||
link = uploads_dir / "harmless.txt"
|
||||
link.symlink_to(outside)
|
||||
|
||||
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
|
||||
# The resolved path of the symlink escapes uploads_dir,
|
||||
# so path traversal check should catch it.
|
||||
with pytest.raises(PathTraversalError):
|
||||
client.delete_upload("thread-1", "harmless.txt")
|
||||
|
||||
# The outside file must NOT have been deleted.
|
||||
assert outside.exists()
|
||||
|
||||
def test_upload_filename_with_spaces_and_unicode(self, client):
|
||||
"""Files with spaces and unicode characters in names upload correctly."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
uploads_dir = tmp_path / "uploads"
|
||||
uploads_dir.mkdir()
|
||||
|
||||
weird_name = "report 2024 数据.txt"
|
||||
src_file = tmp_path / weird_name
|
||||
src_file.write_text("data")
|
||||
|
||||
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
|
||||
result = client.upload_files("thread-1", [src_file])
|
||||
|
||||
assert result["success"] is True
|
||||
assert result["files"][0]["filename"] == weird_name
|
||||
assert (uploads_dir / weird_name).exists()
|
||||
|
||||
|
||||
# ===========================================================================
|
||||
# Hardening — artifact edge cases
|
||||
# ===========================================================================
|
||||
|
||||
|
||||
class TestArtifactHardening:
|
||||
def test_artifact_directory_rejected(self, client):
|
||||
"""get_artifact rejects paths that resolve to a directory."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
paths = Paths(base_dir=tmp)
|
||||
subdir = paths.sandbox_outputs_dir("t1") / "subdir"
|
||||
subdir.mkdir(parents=True)
|
||||
|
||||
with patch("deerflow.client.get_paths", return_value=paths):
|
||||
with pytest.raises(ValueError, match="not a file"):
|
||||
client.get_artifact("t1", "mnt/user-data/outputs/subdir")
|
||||
|
||||
def test_artifact_leading_slash_stripped(self, client):
|
||||
"""Paths with leading slash are handled correctly."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
paths = Paths(base_dir=tmp)
|
||||
outputs = paths.sandbox_outputs_dir("t1")
|
||||
outputs.mkdir(parents=True)
|
||||
(outputs / "file.txt").write_text("content")
|
||||
|
||||
with patch("deerflow.client.get_paths", return_value=paths):
|
||||
content, _mime = client.get_artifact("t1", "/mnt/user-data/outputs/file.txt")
|
||||
|
||||
assert content == b"content"
|
||||
|
||||
|
||||
# ===========================================================================
|
||||
# BUG DETECTION — tests that expose real bugs in client.py
|
||||
# ===========================================================================
|
||||
|
||||
|
||||
class TestUploadDuplicateFilenames:
|
||||
"""Regression: upload_files must auto-rename duplicate basenames.
|
||||
|
||||
Previously it silently overwrote the first file with the second,
|
||||
then reported both in the response while only one existed on disk.
|
||||
Now duplicates are renamed (data.txt → data_1.txt) and the response
|
||||
includes original_filename so the agent / caller can see what happened.
|
||||
"""
|
||||
|
||||
def test_duplicate_filenames_auto_renamed(self, client):
|
||||
"""Two files with same basename → second gets _1 suffix."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
uploads_dir = tmp_path / "uploads"
|
||||
uploads_dir.mkdir()
|
||||
|
||||
dir_a = tmp_path / "a"
|
||||
dir_b = tmp_path / "b"
|
||||
dir_a.mkdir()
|
||||
dir_b.mkdir()
|
||||
(dir_a / "data.txt").write_text("version A")
|
||||
(dir_b / "data.txt").write_text("version B")
|
||||
|
||||
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
|
||||
result = client.upload_files("t-dup", [dir_a / "data.txt", dir_b / "data.txt"])
|
||||
|
||||
assert result["success"] is True
|
||||
assert len(result["files"]) == 2
|
||||
|
||||
# Both files exist on disk with distinct names.
|
||||
disk_files = sorted(p.name for p in uploads_dir.iterdir())
|
||||
assert disk_files == ["data.txt", "data_1.txt"]
|
||||
|
||||
# First keeps original name, second is renamed.
|
||||
assert result["files"][0]["filename"] == "data.txt"
|
||||
assert "original_filename" not in result["files"][0]
|
||||
|
||||
assert result["files"][1]["filename"] == "data_1.txt"
|
||||
assert result["files"][1]["original_filename"] == "data.txt"
|
||||
|
||||
# Content preserved correctly.
|
||||
assert (uploads_dir / "data.txt").read_text() == "version A"
|
||||
assert (uploads_dir / "data_1.txt").read_text() == "version B"
|
||||
|
||||
def test_triple_duplicate_increments_counter(self, client):
|
||||
"""Three files with same basename → _1, _2 suffixes."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
uploads_dir = tmp_path / "uploads"
|
||||
uploads_dir.mkdir()
|
||||
|
||||
for name in ["x", "y", "z"]:
|
||||
d = tmp_path / name
|
||||
d.mkdir()
|
||||
(d / "report.csv").write_text(f"from {name}")
|
||||
|
||||
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
|
||||
result = client.upload_files(
|
||||
"t-triple",
|
||||
[tmp_path / "x" / "report.csv", tmp_path / "y" / "report.csv", tmp_path / "z" / "report.csv"],
|
||||
)
|
||||
|
||||
filenames = [f["filename"] for f in result["files"]]
|
||||
assert filenames == ["report.csv", "report_1.csv", "report_2.csv"]
|
||||
assert len(list(uploads_dir.iterdir())) == 3
|
||||
|
||||
def test_different_filenames_no_rename(self, client):
|
||||
"""Non-duplicate filenames upload normally without rename."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
uploads_dir = tmp_path / "uploads"
|
||||
uploads_dir.mkdir()
|
||||
|
||||
(tmp_path / "a.txt").write_text("aaa")
|
||||
(tmp_path / "b.txt").write_text("bbb")
|
||||
|
||||
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
|
||||
result = client.upload_files("t-ok", [tmp_path / "a.txt", tmp_path / "b.txt"])
|
||||
|
||||
assert result["success"] is True
|
||||
assert len(result["files"]) == 2
|
||||
assert all("original_filename" not in f for f in result["files"])
|
||||
assert len(list(uploads_dir.iterdir())) == 2
|
||||
|
||||
|
||||
class TestBugArtifactPrefixMatchTooLoose:
|
||||
"""Regression: get_artifact must reject paths like ``mnt/user-data-evil/...``.
|
||||
|
||||
Previously ``startswith("mnt/user-data")`` matched ``"mnt/user-data-evil"``
|
||||
because it was a string prefix, not a path-segment check.
|
||||
"""
|
||||
|
||||
def test_non_canonical_prefix_rejected(self, client):
|
||||
"""Paths that share a string prefix but differ at segment boundary are rejected."""
|
||||
with pytest.raises(ValueError, match="must start with"):
|
||||
client.get_artifact("t1", "mnt/user-data-evil/secret.txt")
|
||||
|
||||
def test_exact_prefix_without_subpath_accepted(self, client):
|
||||
"""Bare 'mnt/user-data' is accepted (will later fail as directory, not at prefix)."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
paths = Paths(base_dir=tmp)
|
||||
paths.sandbox_user_data_dir("t1").mkdir(parents=True)
|
||||
|
||||
with patch("deerflow.client.get_paths", return_value=paths):
|
||||
# Accepted at prefix check, but fails because it's a directory.
|
||||
with pytest.raises(ValueError, match="not a file"):
|
||||
client.get_artifact("t1", "mnt/user-data")
|
||||
|
||||
|
||||
class TestBugListUploadsDeadCode:
|
||||
"""Regression: list_uploads works even when called on a fresh thread
|
||||
(directory does not exist yet — returns empty without creating it).
|
||||
"""
|
||||
|
||||
def test_list_uploads_on_fresh_thread(self, client):
|
||||
"""list_uploads on a thread that never had uploads returns empty list."""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
non_existent = Path(tmp) / "does-not-exist" / "uploads"
|
||||
assert not non_existent.exists()
|
||||
|
||||
mock_paths = MagicMock()
|
||||
mock_paths.sandbox_uploads_dir.return_value = non_existent
|
||||
|
||||
with patch("deerflow.uploads.manager.get_paths", return_value=mock_paths):
|
||||
result = client.list_uploads("thread-fresh")
|
||||
|
||||
# Read path should NOT create the directory
|
||||
assert not non_existent.exists()
|
||||
assert result == {"files": [], "count": 0}
|
||||
|
||||
|
||||
class TestBugAgentInvalidationInconsistency:
|
||||
"""Regression: update_skill and update_mcp_config must reset both
|
||||
_agent and _agent_config_key, just like reset_agent() does.
|
||||
"""
|
||||
|
||||
def test_update_mcp_resets_config_key(self, client):
|
||||
"""After update_mcp_config, both _agent and _agent_config_key are None."""
|
||||
client._agent = MagicMock()
|
||||
client._agent_config_key = ("model", True, False, False)
|
||||
|
||||
current_config = MagicMock()
|
||||
current_config.skills = {}
|
||||
reloaded = MagicMock()
|
||||
reloaded.mcp_servers = {}
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
config_file = Path(tmp) / "ext.json"
|
||||
config_file.write_text("{}")
|
||||
|
||||
with (
|
||||
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
|
||||
patch("deerflow.client.get_extensions_config", return_value=current_config),
|
||||
patch("deerflow.client.reload_extensions_config", return_value=reloaded),
|
||||
):
|
||||
client.update_mcp_config({})
|
||||
|
||||
assert client._agent is None
|
||||
assert client._agent_config_key is None
|
||||
|
||||
def test_update_skill_resets_config_key(self, client):
|
||||
"""After update_skill, both _agent and _agent_config_key are None."""
|
||||
client._agent = MagicMock()
|
||||
client._agent_config_key = ("model", True, False, False)
|
||||
|
||||
skill = MagicMock()
|
||||
skill.name = "s1"
|
||||
updated = MagicMock()
|
||||
updated.name = "s1"
|
||||
updated.description = "d"
|
||||
updated.license = "MIT"
|
||||
updated.category = "c"
|
||||
updated.enabled = False
|
||||
|
||||
ext_config = MagicMock()
|
||||
ext_config.mcp_servers = {}
|
||||
ext_config.skills = {}
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
config_file = Path(tmp) / "ext.json"
|
||||
config_file.write_text("{}")
|
||||
|
||||
with (
|
||||
patch("deerflow.skills.loader.load_skills", side_effect=[[skill], [updated]]),
|
||||
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
|
||||
patch("deerflow.client.get_extensions_config", return_value=ext_config),
|
||||
patch("deerflow.client.reload_extensions_config"),
|
||||
):
|
||||
client.update_skill("s1", enabled=False)
|
||||
|
||||
assert client._agent is None
|
||||
assert client._agent_config_key is None
|
||||
|
||||
781
backend/tests/test_client_e2e.py
Normal file
781
backend/tests/test_client_e2e.py
Normal file
@@ -0,0 +1,781 @@
|
||||
"""End-to-end tests for DeerFlowClient.
|
||||
|
||||
Middle tier of the test pyramid:
|
||||
- Top: test_client_live.py — real LLM, needs API key
|
||||
- Middle: test_client_e2e.py — real LLM + real modules ← THIS FILE
|
||||
- Bottom: test_client.py — unit tests, mock everything
|
||||
|
||||
Core principle: use the real LLM from config.yaml, let config, middleware
|
||||
chain, tool registration, file I/O, and event serialization all run for real.
|
||||
Only DEER_FLOW_HOME is redirected to tmp_path for filesystem isolation.
|
||||
|
||||
Tests that call the LLM are marked ``requires_llm`` and skipped in CI.
|
||||
File-management tests (upload/list/delete) don't need LLM and run everywhere.
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import uuid
|
||||
import zipfile
|
||||
|
||||
import pytest
|
||||
from dotenv import load_dotenv
|
||||
|
||||
from deerflow.client import DeerFlowClient, StreamEvent
|
||||
from deerflow.config.app_config import AppConfig
|
||||
from deerflow.config.model_config import ModelConfig
|
||||
from deerflow.config.sandbox_config import SandboxConfig
|
||||
|
||||
# Load .env from project root (for OPENAI_API_KEY etc.)
|
||||
load_dotenv(os.path.join(os.path.dirname(__file__), "../../.env"))
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Markers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
requires_llm = pytest.mark.skipif(
|
||||
os.getenv("CI", "").lower() in ("true", "1") or not os.getenv("OPENAI_API_KEY"),
|
||||
reason="Requires LLM API key — skipped in CI or when OPENAI_API_KEY is unset",
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _make_e2e_config() -> AppConfig:
|
||||
"""Build a minimal AppConfig using real LLM credentials from environment.
|
||||
|
||||
All LLM connection details come from environment variables so that both
|
||||
internal CI and external contributors can run the tests:
|
||||
|
||||
- ``E2E_MODEL_NAME`` (default: ``volcengine-ark``)
|
||||
- ``E2E_MODEL_USE`` (default: ``langchain_openai:ChatOpenAI``)
|
||||
- ``E2E_MODEL_ID`` (default: ``ep-20251211175242-llcmh``)
|
||||
- ``E2E_BASE_URL`` (default: ``https://ark-cn-beijing.bytedance.net/api/v3``)
|
||||
- ``OPENAI_API_KEY`` (required for LLM tests)
|
||||
"""
|
||||
return AppConfig(
|
||||
models=[
|
||||
ModelConfig(
|
||||
name=os.getenv("E2E_MODEL_NAME", "volcengine-ark"),
|
||||
display_name="E2E Test Model",
|
||||
use=os.getenv("E2E_MODEL_USE", "langchain_openai:ChatOpenAI"),
|
||||
model=os.getenv("E2E_MODEL_ID", "ep-20251211175242-llcmh"),
|
||||
base_url=os.getenv("E2E_BASE_URL", "https://ark-cn-beijing.bytedance.net/api/v3"),
|
||||
api_key=os.getenv("OPENAI_API_KEY", ""),
|
||||
max_tokens=512,
|
||||
temperature=0.7,
|
||||
supports_thinking=False,
|
||||
supports_reasoning_effort=False,
|
||||
supports_vision=False,
|
||||
)
|
||||
],
|
||||
sandbox=SandboxConfig(use="deerflow.sandbox.local:LocalSandboxProvider"),
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Fixtures
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def e2e_env(tmp_path, monkeypatch):
|
||||
"""Isolated filesystem environment for E2E tests.
|
||||
|
||||
- DEER_FLOW_HOME → tmp_path (all thread data lands in a temp dir)
|
||||
- Singletons reset so they pick up the new env
|
||||
- Title/memory/summarization disabled to avoid extra LLM calls
|
||||
- AppConfig built programmatically (avoids config.yaml param-name issues)
|
||||
"""
|
||||
# 1. Filesystem isolation
|
||||
monkeypatch.setenv("DEER_FLOW_HOME", str(tmp_path))
|
||||
monkeypatch.setattr("deerflow.config.paths._paths", None)
|
||||
monkeypatch.setattr("deerflow.sandbox.sandbox_provider._default_sandbox_provider", None)
|
||||
|
||||
# 2. Inject a clean AppConfig via the global singleton.
|
||||
config = _make_e2e_config()
|
||||
monkeypatch.setattr("deerflow.config.app_config._app_config", config)
|
||||
monkeypatch.setattr("deerflow.config.app_config._app_config_is_custom", True)
|
||||
|
||||
# 3. Disable title generation (extra LLM call, non-deterministic)
|
||||
from deerflow.config.title_config import TitleConfig
|
||||
|
||||
monkeypatch.setattr("deerflow.config.title_config._title_config", TitleConfig(enabled=False))
|
||||
|
||||
# 4. Disable memory queueing (avoids background threads & file writes)
|
||||
from deerflow.config.memory_config import MemoryConfig
|
||||
|
||||
monkeypatch.setattr(
|
||||
"deerflow.agents.middlewares.memory_middleware.get_memory_config",
|
||||
lambda: MemoryConfig(enabled=False),
|
||||
)
|
||||
|
||||
# 5. Ensure summarization is off (default, but be explicit)
|
||||
from deerflow.config.summarization_config import SummarizationConfig
|
||||
|
||||
monkeypatch.setattr("deerflow.config.summarization_config._summarization_config", SummarizationConfig(enabled=False))
|
||||
|
||||
# 6. Exclude TitleMiddleware from the chain.
|
||||
# It triggers an extra LLM call to generate a thread title, which adds
|
||||
# non-determinism and cost to E2E tests (title generation is already
|
||||
# disabled via TitleConfig above, but the middleware still participates
|
||||
# in the chain and can interfere with event ordering).
|
||||
from deerflow.agents.lead_agent.agent import _build_middlewares as _original_build_middlewares
|
||||
from deerflow.agents.middlewares.title_middleware import TitleMiddleware
|
||||
|
||||
def _sync_safe_build_middlewares(*args, **kwargs):
|
||||
mws = _original_build_middlewares(*args, **kwargs)
|
||||
return [m for m in mws if not isinstance(m, TitleMiddleware)]
|
||||
|
||||
monkeypatch.setattr("deerflow.client._build_middlewares", _sync_safe_build_middlewares)
|
||||
|
||||
return {"tmp_path": tmp_path}
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def client(e2e_env):
|
||||
"""A DeerFlowClient wired to the isolated e2e_env."""
|
||||
return DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 2: Basic streaming (requires LLM)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestBasicChat:
|
||||
"""Basic chat and streaming behavior with real LLM."""
|
||||
|
||||
@requires_llm
|
||||
def test_basic_chat(self, client):
|
||||
"""chat() returns a non-empty text response."""
|
||||
result = client.chat("Say exactly: pong")
|
||||
assert isinstance(result, str)
|
||||
assert len(result) > 0
|
||||
|
||||
@requires_llm
|
||||
def test_stream_event_sequence(self, client):
|
||||
"""stream() yields events: messages-tuple, values, and end."""
|
||||
events = list(client.stream("Say hi"))
|
||||
|
||||
types = [e.type for e in events]
|
||||
assert types[-1] == "end"
|
||||
assert "messages-tuple" in types
|
||||
assert "values" in types
|
||||
|
||||
@requires_llm
|
||||
def test_stream_event_data_format(self, client):
|
||||
"""Each event type has the expected data structure."""
|
||||
events = list(client.stream("Say hello"))
|
||||
|
||||
for event in events:
|
||||
assert isinstance(event, StreamEvent)
|
||||
assert isinstance(event.type, str)
|
||||
assert isinstance(event.data, dict)
|
||||
|
||||
if event.type == "messages-tuple" and event.data.get("type") == "ai":
|
||||
assert "content" in event.data
|
||||
assert "id" in event.data
|
||||
elif event.type == "values":
|
||||
assert "messages" in event.data
|
||||
assert "artifacts" in event.data
|
||||
elif event.type == "end":
|
||||
assert event.data == {}
|
||||
|
||||
@requires_llm
|
||||
def test_multi_turn_stateless(self, client):
|
||||
"""Without checkpointer, two calls to the same thread_id are independent."""
|
||||
tid = str(uuid.uuid4())
|
||||
|
||||
r1 = client.chat("Remember the number 42", thread_id=tid)
|
||||
# Reset so agent is recreated (simulates no cross-turn state)
|
||||
client.reset_agent()
|
||||
r2 = client.chat("What number did I say?", thread_id=tid)
|
||||
|
||||
# Without a checkpointer the second call has no memory of the first.
|
||||
# We can't assert exact content, but both should be non-empty.
|
||||
assert isinstance(r1, str) and len(r1) > 0
|
||||
assert isinstance(r2, str) and len(r2) > 0
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 3: Tool call flow (requires LLM)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestToolCallFlow:
|
||||
"""Verify the LLM actually invokes tools through the real agent pipeline."""
|
||||
|
||||
@requires_llm
|
||||
def test_tool_call_produces_events(self, client):
|
||||
"""When the LLM decides to use a tool, we see tool call + result events."""
|
||||
# Give a clear instruction that forces a tool call
|
||||
events = list(client.stream(
|
||||
"Use the bash tool to run: echo hello_e2e_test"
|
||||
))
|
||||
|
||||
types = [e.type for e in events]
|
||||
assert types[-1] == "end"
|
||||
|
||||
# Should have at least one tool call event
|
||||
tool_call_events = [
|
||||
e for e in events
|
||||
if e.type == "messages-tuple" and e.data.get("tool_calls")
|
||||
]
|
||||
tool_result_events = [
|
||||
e for e in events
|
||||
if e.type == "messages-tuple" and e.data.get("type") == "tool"
|
||||
]
|
||||
assert len(tool_call_events) >= 1, "Expected at least one tool_call event"
|
||||
assert len(tool_result_events) >= 1, "Expected at least one tool result event"
|
||||
|
||||
@requires_llm
|
||||
def test_tool_call_event_structure(self, client):
|
||||
"""Tool call events contain name, args, and id fields."""
|
||||
events = list(client.stream(
|
||||
"Use the read_file tool to read /mnt/user-data/workspace/nonexistent.txt"
|
||||
))
|
||||
|
||||
tc_events = [
|
||||
e for e in events
|
||||
if e.type == "messages-tuple" and e.data.get("tool_calls")
|
||||
]
|
||||
if tc_events:
|
||||
tc = tc_events[0].data["tool_calls"][0]
|
||||
assert "name" in tc
|
||||
assert "args" in tc
|
||||
assert "id" in tc
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 4: File upload integration (no LLM needed for most)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestFileUploadIntegration:
|
||||
"""Upload, list, and delete files through the real client path."""
|
||||
|
||||
def test_upload_files(self, e2e_env, tmp_path):
|
||||
"""upload_files() copies files and returns metadata."""
|
||||
test_file = tmp_path / "source" / "readme.txt"
|
||||
test_file.parent.mkdir(parents=True, exist_ok=True)
|
||||
test_file.write_text("Hello world")
|
||||
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
tid = str(uuid.uuid4())
|
||||
|
||||
result = c.upload_files(tid, [test_file])
|
||||
assert result["success"] is True
|
||||
assert len(result["files"]) == 1
|
||||
assert result["files"][0]["filename"] == "readme.txt"
|
||||
|
||||
# Physically exists
|
||||
from deerflow.config.paths import get_paths
|
||||
assert (get_paths().sandbox_uploads_dir(tid) / "readme.txt").exists()
|
||||
|
||||
def test_upload_duplicate_rename(self, e2e_env, tmp_path):
|
||||
"""Uploading two files with the same name auto-renames the second."""
|
||||
d1 = tmp_path / "dir1"
|
||||
d2 = tmp_path / "dir2"
|
||||
d1.mkdir()
|
||||
d2.mkdir()
|
||||
(d1 / "data.txt").write_text("content A")
|
||||
(d2 / "data.txt").write_text("content B")
|
||||
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
tid = str(uuid.uuid4())
|
||||
|
||||
result = c.upload_files(tid, [d1 / "data.txt", d2 / "data.txt"])
|
||||
assert result["success"] is True
|
||||
assert len(result["files"]) == 2
|
||||
|
||||
filenames = {f["filename"] for f in result["files"]}
|
||||
assert "data.txt" in filenames
|
||||
assert "data_1.txt" in filenames
|
||||
|
||||
def test_upload_list_and_delete(self, e2e_env, tmp_path):
|
||||
"""Upload → list → delete → list lifecycle."""
|
||||
test_file = tmp_path / "lifecycle.txt"
|
||||
test_file.write_text("lifecycle test")
|
||||
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
tid = str(uuid.uuid4())
|
||||
|
||||
c.upload_files(tid, [test_file])
|
||||
|
||||
listing = c.list_uploads(tid)
|
||||
assert listing["count"] == 1
|
||||
assert listing["files"][0]["filename"] == "lifecycle.txt"
|
||||
|
||||
del_result = c.delete_upload(tid, "lifecycle.txt")
|
||||
assert del_result["success"] is True
|
||||
|
||||
listing = c.list_uploads(tid)
|
||||
assert listing["count"] == 0
|
||||
|
||||
@requires_llm
|
||||
def test_upload_then_chat(self, e2e_env, tmp_path):
|
||||
"""Upload a file then ask the LLM about it — UploadsMiddleware injects file info."""
|
||||
test_file = tmp_path / "source" / "notes.txt"
|
||||
test_file.parent.mkdir(parents=True, exist_ok=True)
|
||||
test_file.write_text("The secret code is 7749.")
|
||||
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
tid = str(uuid.uuid4())
|
||||
|
||||
c.upload_files(tid, [test_file])
|
||||
# Chat — the middleware should inject <uploaded_files> context
|
||||
response = c.chat("What files are available?", thread_id=tid)
|
||||
assert isinstance(response, str) and len(response) > 0
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 5: Lifecycle and configuration (no LLM needed)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestLifecycleAndConfig:
|
||||
"""Agent recreation and configuration behavior."""
|
||||
|
||||
@requires_llm
|
||||
def test_agent_recreation_on_config_change(self, client):
|
||||
"""Changing thinking_enabled triggers agent recreation (different config key)."""
|
||||
list(client.stream("hi"))
|
||||
key1 = client._agent_config_key
|
||||
|
||||
# Stream with a different config override
|
||||
client.reset_agent()
|
||||
list(client.stream("hi", thinking_enabled=True))
|
||||
key2 = client._agent_config_key
|
||||
|
||||
# thinking_enabled changed: False → True → keys differ
|
||||
assert key1 != key2
|
||||
|
||||
def test_reset_agent_clears_state(self, e2e_env):
|
||||
"""reset_agent() sets the internal agent to None."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
# Before any call, agent is None
|
||||
assert c._agent is None
|
||||
|
||||
c.reset_agent()
|
||||
assert c._agent is None
|
||||
assert c._agent_config_key is None
|
||||
|
||||
def test_plan_mode_config_key(self, e2e_env):
|
||||
"""plan_mode is part of the config key tuple."""
|
||||
c = DeerFlowClient(checkpointer=None, plan_mode=False)
|
||||
cfg1 = c._get_runnable_config("test-thread")
|
||||
key1 = (
|
||||
cfg1["configurable"]["model_name"],
|
||||
cfg1["configurable"]["thinking_enabled"],
|
||||
cfg1["configurable"]["is_plan_mode"],
|
||||
cfg1["configurable"]["subagent_enabled"],
|
||||
)
|
||||
|
||||
c2 = DeerFlowClient(checkpointer=None, plan_mode=True)
|
||||
cfg2 = c2._get_runnable_config("test-thread")
|
||||
key2 = (
|
||||
cfg2["configurable"]["model_name"],
|
||||
cfg2["configurable"]["thinking_enabled"],
|
||||
cfg2["configurable"]["is_plan_mode"],
|
||||
cfg2["configurable"]["subagent_enabled"],
|
||||
)
|
||||
|
||||
assert key1 != key2
|
||||
assert key1[2] is False
|
||||
assert key2[2] is True
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 6: Middleware chain verification (requires LLM)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestMiddlewareChain:
|
||||
"""Verify middleware side effects through real execution."""
|
||||
|
||||
@requires_llm
|
||||
def test_thread_data_paths_in_state(self, client):
|
||||
"""After streaming, thread directory paths are computed correctly."""
|
||||
tid = str(uuid.uuid4())
|
||||
events = list(client.stream("hi", thread_id=tid))
|
||||
|
||||
# The values event should contain messages
|
||||
values_events = [e for e in events if e.type == "values"]
|
||||
assert len(values_events) >= 1
|
||||
|
||||
# ThreadDataMiddleware should have set paths in the state.
|
||||
# We verify the paths singleton can resolve the thread dir.
|
||||
from deerflow.config.paths import get_paths
|
||||
thread_dir = get_paths().thread_dir(tid)
|
||||
assert str(thread_dir).endswith(tid)
|
||||
|
||||
@requires_llm
|
||||
def test_stream_completes_without_middleware_errors(self, client):
|
||||
"""Full middleware chain (ThreadData, Uploads, Sandbox, DanglingToolCall,
|
||||
Memory, Clarification) executes without errors."""
|
||||
events = list(client.stream("What is 1+1?"))
|
||||
|
||||
types = [e.type for e in events]
|
||||
assert types[-1] == "end"
|
||||
# Should have at least one AI response
|
||||
ai_events = [
|
||||
e for e in events
|
||||
if e.type == "messages-tuple" and e.data.get("type") == "ai"
|
||||
]
|
||||
assert len(ai_events) >= 1
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 7: Error and boundary conditions
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestErrorAndBoundary:
|
||||
"""Error propagation and edge cases."""
|
||||
|
||||
def test_upload_nonexistent_file_raises(self, e2e_env):
|
||||
"""Uploading a file that doesn't exist raises FileNotFoundError."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
with pytest.raises(FileNotFoundError):
|
||||
c.upload_files("test-thread", ["/nonexistent/file.txt"])
|
||||
|
||||
def test_delete_nonexistent_upload_raises(self, e2e_env):
|
||||
"""Deleting a file that doesn't exist raises FileNotFoundError."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
tid = str(uuid.uuid4())
|
||||
# Ensure the uploads dir exists first
|
||||
c.list_uploads(tid)
|
||||
with pytest.raises(FileNotFoundError):
|
||||
c.delete_upload(tid, "ghost.txt")
|
||||
|
||||
def test_artifact_path_traversal_blocked(self, e2e_env):
|
||||
"""get_artifact blocks path traversal attempts."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
with pytest.raises(ValueError):
|
||||
c.get_artifact("test-thread", "../../etc/passwd")
|
||||
|
||||
def test_upload_directory_rejected(self, e2e_env, tmp_path):
|
||||
"""Uploading a directory (not a file) is rejected."""
|
||||
d = tmp_path / "a_directory"
|
||||
d.mkdir()
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
with pytest.raises(ValueError, match="not a file"):
|
||||
c.upload_files("test-thread", [d])
|
||||
|
||||
@requires_llm
|
||||
def test_empty_message_still_gets_response(self, client):
|
||||
"""Even an empty-ish message should produce a valid event stream."""
|
||||
events = list(client.stream(" "))
|
||||
types = [e.type for e in events]
|
||||
assert types[-1] == "end"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 8: Artifact access (no LLM needed)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestArtifactAccess:
|
||||
"""Read artifacts through get_artifact() with real filesystem."""
|
||||
|
||||
def test_get_artifact_happy_path(self, e2e_env):
|
||||
"""Write a file to outputs, then read it back via get_artifact()."""
|
||||
from deerflow.config.paths import get_paths
|
||||
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
tid = str(uuid.uuid4())
|
||||
|
||||
# Create an output file in the thread's outputs directory
|
||||
outputs_dir = get_paths().sandbox_outputs_dir(tid)
|
||||
outputs_dir.mkdir(parents=True, exist_ok=True)
|
||||
(outputs_dir / "result.txt").write_text("hello artifact")
|
||||
|
||||
data, mime = c.get_artifact(tid, "mnt/user-data/outputs/result.txt")
|
||||
assert data == b"hello artifact"
|
||||
assert "text" in mime
|
||||
|
||||
def test_get_artifact_nested_path(self, e2e_env):
|
||||
"""Artifacts in subdirectories are accessible."""
|
||||
from deerflow.config.paths import get_paths
|
||||
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
tid = str(uuid.uuid4())
|
||||
|
||||
outputs_dir = get_paths().sandbox_outputs_dir(tid)
|
||||
sub = outputs_dir / "charts"
|
||||
sub.mkdir(parents=True, exist_ok=True)
|
||||
(sub / "data.json").write_text('{"x": 1}')
|
||||
|
||||
data, mime = c.get_artifact(tid, "mnt/user-data/outputs/charts/data.json")
|
||||
assert b'"x"' in data
|
||||
assert "json" in mime
|
||||
|
||||
def test_get_artifact_nonexistent_raises(self, e2e_env):
|
||||
"""Reading a nonexistent artifact raises FileNotFoundError."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
with pytest.raises(FileNotFoundError):
|
||||
c.get_artifact("test-thread", "mnt/user-data/outputs/ghost.txt")
|
||||
|
||||
def test_get_artifact_traversal_within_prefix_blocked(self, e2e_env):
|
||||
"""Path traversal within the valid prefix is still blocked."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
with pytest.raises((PermissionError, ValueError, FileNotFoundError)):
|
||||
c.get_artifact("test-thread", "mnt/user-data/outputs/../../etc/passwd")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 9: Skill installation (no LLM needed)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestSkillInstallation:
|
||||
"""install_skill() with real ZIP handling and filesystem."""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _isolate_skills_dir(self, tmp_path, monkeypatch):
|
||||
"""Redirect skill installation to a temp directory."""
|
||||
skills_root = tmp_path / "skills"
|
||||
(skills_root / "public").mkdir(parents=True)
|
||||
(skills_root / "custom").mkdir(parents=True)
|
||||
monkeypatch.setattr(
|
||||
"deerflow.skills.installer.get_skills_root_path",
|
||||
lambda: skills_root,
|
||||
)
|
||||
self._skills_root = skills_root
|
||||
|
||||
@staticmethod
|
||||
def _make_skill_zip(tmp_path, skill_name="test-e2e-skill"):
|
||||
"""Create a minimal valid .skill archive."""
|
||||
skill_dir = tmp_path / "build" / skill_name
|
||||
skill_dir.mkdir(parents=True)
|
||||
(skill_dir / "SKILL.md").write_text(
|
||||
f"---\nname: {skill_name}\ndescription: E2E test skill\n---\n\nTest content.\n"
|
||||
)
|
||||
archive_path = tmp_path / f"{skill_name}.skill"
|
||||
with zipfile.ZipFile(archive_path, "w") as zf:
|
||||
for file in skill_dir.rglob("*"):
|
||||
zf.write(file, file.relative_to(tmp_path / "build"))
|
||||
return archive_path
|
||||
|
||||
def test_install_skill_success(self, e2e_env, tmp_path):
|
||||
"""A valid .skill archive installs to the custom skills directory."""
|
||||
archive = self._make_skill_zip(tmp_path)
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
|
||||
result = c.install_skill(archive)
|
||||
assert result["success"] is True
|
||||
assert result["skill_name"] == "test-e2e-skill"
|
||||
assert (self._skills_root / "custom" / "test-e2e-skill" / "SKILL.md").exists()
|
||||
|
||||
def test_install_skill_duplicate_rejected(self, e2e_env, tmp_path):
|
||||
"""Installing the same skill twice raises ValueError."""
|
||||
archive = self._make_skill_zip(tmp_path)
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
|
||||
c.install_skill(archive)
|
||||
with pytest.raises(ValueError, match="already exists"):
|
||||
c.install_skill(archive)
|
||||
|
||||
def test_install_skill_invalid_extension(self, e2e_env, tmp_path):
|
||||
"""A file without .skill extension is rejected."""
|
||||
bad_file = tmp_path / "not_a_skill.zip"
|
||||
bad_file.write_bytes(b"PK\x03\x04") # ZIP magic bytes
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
with pytest.raises(ValueError, match=".skill extension"):
|
||||
c.install_skill(bad_file)
|
||||
|
||||
def test_install_skill_missing_frontmatter(self, e2e_env, tmp_path):
|
||||
"""A .skill archive without valid SKILL.md frontmatter is rejected."""
|
||||
skill_dir = tmp_path / "build" / "bad-skill"
|
||||
skill_dir.mkdir(parents=True)
|
||||
(skill_dir / "SKILL.md").write_text("No frontmatter here.")
|
||||
|
||||
archive = tmp_path / "bad-skill.skill"
|
||||
with zipfile.ZipFile(archive, "w") as zf:
|
||||
for file in skill_dir.rglob("*"):
|
||||
zf.write(file, file.relative_to(tmp_path / "build"))
|
||||
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
with pytest.raises(ValueError, match="Invalid skill"):
|
||||
c.install_skill(archive)
|
||||
|
||||
def test_install_skill_nonexistent_file(self, e2e_env):
|
||||
"""Installing from a nonexistent path raises FileNotFoundError."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
with pytest.raises(FileNotFoundError):
|
||||
c.install_skill("/nonexistent/skill.skill")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 10: Configuration management (no LLM needed)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestConfigManagement:
|
||||
"""Config queries and updates through real code paths."""
|
||||
|
||||
def test_list_models_returns_injected_config(self, e2e_env):
|
||||
"""list_models() returns the model from the injected AppConfig."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
result = c.list_models()
|
||||
assert "models" in result
|
||||
assert len(result["models"]) == 1
|
||||
assert result["models"][0]["name"] == "volcengine-ark"
|
||||
assert result["models"][0]["display_name"] == "E2E Test Model"
|
||||
|
||||
def test_get_model_found(self, e2e_env):
|
||||
"""get_model() returns the model when it exists."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
model = c.get_model("volcengine-ark")
|
||||
assert model is not None
|
||||
assert model["name"] == "volcengine-ark"
|
||||
assert model["supports_thinking"] is False
|
||||
|
||||
def test_get_model_not_found(self, e2e_env):
|
||||
"""get_model() returns None for nonexistent model."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
assert c.get_model("nonexistent-model") is None
|
||||
|
||||
def test_list_skills_returns_list(self, e2e_env):
|
||||
"""list_skills() returns a dict with 'skills' key from real directory scan."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
result = c.list_skills()
|
||||
assert "skills" in result
|
||||
assert isinstance(result["skills"], list)
|
||||
# The real skills/ directory should have some public skills
|
||||
assert len(result["skills"]) > 0
|
||||
|
||||
def test_get_skill_found(self, e2e_env):
|
||||
"""get_skill() returns skill info for a known public skill."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
# 'deep-research' is a built-in public skill
|
||||
skill = c.get_skill("deep-research")
|
||||
if skill is not None:
|
||||
assert skill["name"] == "deep-research"
|
||||
assert "description" in skill
|
||||
assert "enabled" in skill
|
||||
|
||||
def test_get_skill_not_found(self, e2e_env):
|
||||
"""get_skill() returns None for nonexistent skill."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
assert c.get_skill("nonexistent-skill-xyz") is None
|
||||
|
||||
def test_get_mcp_config_returns_dict(self, e2e_env):
|
||||
"""get_mcp_config() returns a dict with 'mcp_servers' key."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
result = c.get_mcp_config()
|
||||
assert "mcp_servers" in result
|
||||
assert isinstance(result["mcp_servers"], dict)
|
||||
|
||||
def test_update_mcp_config_writes_and_invalidates(self, e2e_env, tmp_path, monkeypatch):
|
||||
"""update_mcp_config() writes extensions_config.json and invalidates the agent."""
|
||||
# Set up a writable extensions_config.json
|
||||
config_file = tmp_path / "extensions_config.json"
|
||||
config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
|
||||
monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))
|
||||
|
||||
# Force reload so the singleton picks up our test file
|
||||
from deerflow.config.extensions_config import reload_extensions_config
|
||||
reload_extensions_config()
|
||||
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
# Simulate a cached agent
|
||||
c._agent = "fake-agent-placeholder"
|
||||
c._agent_config_key = ("a", "b", "c", "d")
|
||||
|
||||
result = c.update_mcp_config({"test-server": {"enabled": True, "type": "stdio", "command": "echo"}})
|
||||
assert "mcp_servers" in result
|
||||
|
||||
# Agent should be invalidated
|
||||
assert c._agent is None
|
||||
assert c._agent_config_key is None
|
||||
|
||||
# File should be written
|
||||
written = json.loads(config_file.read_text())
|
||||
assert "test-server" in written["mcpServers"]
|
||||
|
||||
def test_update_skill_writes_and_invalidates(self, e2e_env, tmp_path, monkeypatch):
|
||||
"""update_skill() writes extensions_config.json and invalidates the agent."""
|
||||
config_file = tmp_path / "extensions_config.json"
|
||||
config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
|
||||
monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))
|
||||
|
||||
from deerflow.config.extensions_config import reload_extensions_config
|
||||
reload_extensions_config()
|
||||
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
c._agent = "fake-agent-placeholder"
|
||||
c._agent_config_key = ("a", "b", "c", "d")
|
||||
|
||||
# Use a real skill name from the public skills directory
|
||||
skills = c.list_skills()
|
||||
if not skills["skills"]:
|
||||
pytest.skip("No skills available for testing")
|
||||
skill_name = skills["skills"][0]["name"]
|
||||
|
||||
result = c.update_skill(skill_name, enabled=False)
|
||||
assert result["name"] == skill_name
|
||||
assert result["enabled"] is False
|
||||
|
||||
# Agent should be invalidated
|
||||
assert c._agent is None
|
||||
assert c._agent_config_key is None
|
||||
|
||||
def test_update_skill_nonexistent_raises(self, e2e_env, tmp_path, monkeypatch):
|
||||
"""update_skill() raises ValueError for nonexistent skill."""
|
||||
config_file = tmp_path / "extensions_config.json"
|
||||
config_file.write_text(json.dumps({"mcpServers": {}, "skills": {}}))
|
||||
monkeypatch.setenv("DEER_FLOW_EXTENSIONS_CONFIG_PATH", str(config_file))
|
||||
|
||||
from deerflow.config.extensions_config import reload_extensions_config
|
||||
reload_extensions_config()
|
||||
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
with pytest.raises(ValueError, match="not found"):
|
||||
c.update_skill("nonexistent-skill-xyz", enabled=True)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 11: Memory access (no LLM needed)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestMemoryAccess:
|
||||
"""Memory system queries through real code paths."""
|
||||
|
||||
def test_get_memory_returns_dict(self, e2e_env):
|
||||
"""get_memory() returns a dict (may be empty initial state)."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
result = c.get_memory()
|
||||
assert isinstance(result, dict)
|
||||
|
||||
def test_reload_memory_returns_dict(self, e2e_env):
|
||||
"""reload_memory() forces reload and returns a dict."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
result = c.reload_memory()
|
||||
assert isinstance(result, dict)
|
||||
|
||||
def test_get_memory_config_fields(self, e2e_env):
|
||||
"""get_memory_config() returns expected config fields."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
result = c.get_memory_config()
|
||||
assert "enabled" in result
|
||||
assert "storage_path" in result
|
||||
assert "debounce_seconds" in result
|
||||
assert "max_facts" in result
|
||||
assert "fact_confidence_threshold" in result
|
||||
assert "injection_enabled" in result
|
||||
assert "max_injection_tokens" in result
|
||||
|
||||
def test_get_memory_status_combines_config_and_data(self, e2e_env):
|
||||
"""get_memory_status() returns both 'config' and 'data' keys."""
|
||||
c = DeerFlowClient(checkpointer=None, thinking_enabled=False)
|
||||
result = c.get_memory_status()
|
||||
assert "config" in result
|
||||
assert "data" in result
|
||||
assert "enabled" in result["config"]
|
||||
assert isinstance(result["data"], dict)
|
||||
@@ -1,8 +1,8 @@
|
||||
from pathlib import Path
|
||||
|
||||
from fastapi import HTTPException
|
||||
import pytest
|
||||
|
||||
from app.gateway.routers.skills import _resolve_skill_dir_from_archive_root
|
||||
from deerflow.skills.installer import resolve_skill_dir_from_archive
|
||||
|
||||
|
||||
def _write_skill(skill_dir: Path) -> None:
|
||||
@@ -23,24 +23,19 @@ def test_resolve_skill_dir_ignores_macosx_wrapper(tmp_path: Path) -> None:
|
||||
_write_skill(tmp_path / "demo-skill")
|
||||
(tmp_path / "__MACOSX").mkdir()
|
||||
|
||||
assert _resolve_skill_dir_from_archive_root(tmp_path) == tmp_path / "demo-skill"
|
||||
assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "demo-skill"
|
||||
|
||||
|
||||
def test_resolve_skill_dir_ignores_hidden_top_level_entries(tmp_path: Path) -> None:
|
||||
_write_skill(tmp_path / "demo-skill")
|
||||
(tmp_path / ".DS_Store").write_text("metadata", encoding="utf-8")
|
||||
|
||||
assert _resolve_skill_dir_from_archive_root(tmp_path) == tmp_path / "demo-skill"
|
||||
assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "demo-skill"
|
||||
|
||||
|
||||
def test_resolve_skill_dir_rejects_archive_with_only_metadata(tmp_path: Path) -> None:
|
||||
(tmp_path / "__MACOSX").mkdir()
|
||||
(tmp_path / ".DS_Store").write_text("metadata", encoding="utf-8")
|
||||
|
||||
try:
|
||||
_resolve_skill_dir_from_archive_root(tmp_path)
|
||||
except HTTPException as error:
|
||||
assert error.status_code == 400
|
||||
assert error.detail == "Skill archive is empty"
|
||||
else:
|
||||
raise AssertionError("Expected HTTPException for metadata-only archive")
|
||||
with pytest.raises(ValueError, match="empty"):
|
||||
resolve_skill_dir_from_archive(tmp_path)
|
||||
|
||||
220
backend/tests/test_skills_installer.py
Normal file
220
backend/tests/test_skills_installer.py
Normal file
@@ -0,0 +1,220 @@
|
||||
"""Tests for deerflow.skills.installer — shared skill installation logic."""
|
||||
|
||||
import stat
|
||||
import zipfile
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
from deerflow.skills.installer import (
|
||||
install_skill_from_archive,
|
||||
is_symlink_member,
|
||||
is_unsafe_zip_member,
|
||||
resolve_skill_dir_from_archive,
|
||||
safe_extract_skill_archive,
|
||||
should_ignore_archive_entry,
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# is_unsafe_zip_member
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestIsUnsafeZipMember:
|
||||
def test_absolute_path(self):
|
||||
info = zipfile.ZipInfo("/etc/passwd")
|
||||
assert is_unsafe_zip_member(info) is True
|
||||
|
||||
def test_dotdot_traversal(self):
|
||||
info = zipfile.ZipInfo("foo/../../../etc/passwd")
|
||||
assert is_unsafe_zip_member(info) is True
|
||||
|
||||
def test_safe_member(self):
|
||||
info = zipfile.ZipInfo("my-skill/SKILL.md")
|
||||
assert is_unsafe_zip_member(info) is False
|
||||
|
||||
def test_empty_filename(self):
|
||||
info = zipfile.ZipInfo("")
|
||||
assert is_unsafe_zip_member(info) is False
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# is_symlink_member
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestIsSymlinkMember:
|
||||
def test_detects_symlink(self):
|
||||
info = zipfile.ZipInfo("link.txt")
|
||||
info.external_attr = (stat.S_IFLNK | 0o777) << 16
|
||||
assert is_symlink_member(info) is True
|
||||
|
||||
def test_regular_file(self):
|
||||
info = zipfile.ZipInfo("file.txt")
|
||||
info.external_attr = (stat.S_IFREG | 0o644) << 16
|
||||
assert is_symlink_member(info) is False
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# should_ignore_archive_entry
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestShouldIgnoreArchiveEntry:
|
||||
def test_macosx_ignored(self):
|
||||
assert should_ignore_archive_entry(Path("__MACOSX")) is True
|
||||
|
||||
def test_dotfile_ignored(self):
|
||||
assert should_ignore_archive_entry(Path(".DS_Store")) is True
|
||||
|
||||
def test_normal_dir_not_ignored(self):
|
||||
assert should_ignore_archive_entry(Path("my-skill")) is False
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# resolve_skill_dir_from_archive
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestResolveSkillDir:
|
||||
def test_single_dir(self, tmp_path):
|
||||
(tmp_path / "my-skill").mkdir()
|
||||
(tmp_path / "my-skill" / "SKILL.md").write_text("content")
|
||||
assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "my-skill"
|
||||
|
||||
def test_with_macosx(self, tmp_path):
|
||||
(tmp_path / "my-skill").mkdir()
|
||||
(tmp_path / "my-skill" / "SKILL.md").write_text("content")
|
||||
(tmp_path / "__MACOSX").mkdir()
|
||||
assert resolve_skill_dir_from_archive(tmp_path) == tmp_path / "my-skill"
|
||||
|
||||
def test_empty_after_filter(self, tmp_path):
|
||||
(tmp_path / "__MACOSX").mkdir()
|
||||
(tmp_path / ".DS_Store").write_text("meta")
|
||||
with pytest.raises(ValueError, match="empty"):
|
||||
resolve_skill_dir_from_archive(tmp_path)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# safe_extract_skill_archive
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestSafeExtract:
|
||||
def _make_zip(self, tmp_path, members: dict[str, str | bytes]) -> Path:
|
||||
"""Create a zip with given filename->content entries."""
|
||||
zip_path = tmp_path / "test.zip"
|
||||
with zipfile.ZipFile(zip_path, "w") as zf:
|
||||
for name, content in members.items():
|
||||
if isinstance(content, str):
|
||||
content = content.encode()
|
||||
zf.writestr(name, content)
|
||||
return zip_path
|
||||
|
||||
def test_rejects_zip_bomb(self, tmp_path):
|
||||
zip_path = self._make_zip(tmp_path, {"big.txt": "x" * 1000})
|
||||
dest = tmp_path / "out"
|
||||
dest.mkdir()
|
||||
with zipfile.ZipFile(zip_path) as zf:
|
||||
with pytest.raises(ValueError, match="too large"):
|
||||
safe_extract_skill_archive(zf, dest, max_total_size=100)
|
||||
|
||||
def test_rejects_absolute_path(self, tmp_path):
|
||||
zip_path = tmp_path / "abs.zip"
|
||||
with zipfile.ZipFile(zip_path, "w") as zf:
|
||||
zf.writestr("/etc/passwd", "root:x:0:0")
|
||||
dest = tmp_path / "out"
|
||||
dest.mkdir()
|
||||
with zipfile.ZipFile(zip_path) as zf:
|
||||
with pytest.raises(ValueError, match="unsafe"):
|
||||
safe_extract_skill_archive(zf, dest)
|
||||
|
||||
def test_skips_symlinks(self, tmp_path):
|
||||
zip_path = tmp_path / "sym.zip"
|
||||
with zipfile.ZipFile(zip_path, "w") as zf:
|
||||
info = zipfile.ZipInfo("link.txt")
|
||||
info.external_attr = (stat.S_IFLNK | 0o777) << 16
|
||||
zf.writestr(info, "/etc/passwd")
|
||||
zf.writestr("normal.txt", "hello")
|
||||
dest = tmp_path / "out"
|
||||
dest.mkdir()
|
||||
with zipfile.ZipFile(zip_path) as zf:
|
||||
safe_extract_skill_archive(zf, dest)
|
||||
assert (dest / "normal.txt").exists()
|
||||
assert not (dest / "link.txt").exists()
|
||||
|
||||
def test_normal_archive(self, tmp_path):
|
||||
zip_path = self._make_zip(tmp_path, {
|
||||
"my-skill/SKILL.md": "---\nname: test\ndescription: x\n---\n# Test",
|
||||
"my-skill/README.md": "readme",
|
||||
})
|
||||
dest = tmp_path / "out"
|
||||
dest.mkdir()
|
||||
with zipfile.ZipFile(zip_path) as zf:
|
||||
safe_extract_skill_archive(zf, dest)
|
||||
assert (dest / "my-skill" / "SKILL.md").exists()
|
||||
assert (dest / "my-skill" / "README.md").exists()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# install_skill_from_archive (full integration)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestInstallSkillFromArchive:
|
||||
def _make_skill_zip(self, tmp_path: Path, skill_name: str = "test-skill") -> Path:
|
||||
"""Create a valid .skill archive."""
|
||||
zip_path = tmp_path / f"{skill_name}.skill"
|
||||
with zipfile.ZipFile(zip_path, "w") as zf:
|
||||
zf.writestr(
|
||||
f"{skill_name}/SKILL.md",
|
||||
f"---\nname: {skill_name}\ndescription: A test skill\n---\n\n# {skill_name}\n",
|
||||
)
|
||||
return zip_path
|
||||
|
||||
def test_success(self, tmp_path):
|
||||
zip_path = self._make_skill_zip(tmp_path)
|
||||
skills_root = tmp_path / "skills"
|
||||
skills_root.mkdir()
|
||||
result = install_skill_from_archive(zip_path, skills_root=skills_root)
|
||||
assert result["success"] is True
|
||||
assert result["skill_name"] == "test-skill"
|
||||
assert (skills_root / "custom" / "test-skill" / "SKILL.md").exists()
|
||||
|
||||
def test_duplicate_raises(self, tmp_path):
|
||||
zip_path = self._make_skill_zip(tmp_path)
|
||||
skills_root = tmp_path / "skills"
|
||||
(skills_root / "custom" / "test-skill").mkdir(parents=True)
|
||||
with pytest.raises(ValueError, match="already exists"):
|
||||
install_skill_from_archive(zip_path, skills_root=skills_root)
|
||||
|
||||
def test_invalid_extension(self, tmp_path):
|
||||
bad_path = tmp_path / "bad.zip"
|
||||
bad_path.write_text("not a skill")
|
||||
with pytest.raises(ValueError, match=".skill"):
|
||||
install_skill_from_archive(bad_path)
|
||||
|
||||
def test_bad_frontmatter(self, tmp_path):
|
||||
zip_path = tmp_path / "bad.skill"
|
||||
with zipfile.ZipFile(zip_path, "w") as zf:
|
||||
zf.writestr("bad/SKILL.md", "no frontmatter here")
|
||||
skills_root = tmp_path / "skills"
|
||||
skills_root.mkdir()
|
||||
with pytest.raises(ValueError, match="Invalid skill"):
|
||||
install_skill_from_archive(zip_path, skills_root=skills_root)
|
||||
|
||||
def test_nonexistent_file(self):
|
||||
with pytest.raises(FileNotFoundError):
|
||||
install_skill_from_archive(Path("/nonexistent/path.skill"))
|
||||
|
||||
def test_macosx_filtered_during_resolve(self, tmp_path):
|
||||
"""Archive with __MACOSX dir still installs correctly."""
|
||||
zip_path = tmp_path / "mac.skill"
|
||||
with zipfile.ZipFile(zip_path, "w") as zf:
|
||||
zf.writestr("my-skill/SKILL.md", "---\nname: my-skill\ndescription: desc\n---\n# My Skill\n")
|
||||
zf.writestr("__MACOSX/._my-skill", "meta")
|
||||
skills_root = tmp_path / "skills"
|
||||
skills_root.mkdir()
|
||||
result = install_skill_from_archive(zip_path, skills_root=skills_root)
|
||||
assert result["success"] is True
|
||||
assert result["skill_name"] == "my-skill"
|
||||
146
backend/tests/test_uploads_manager.py
Normal file
146
backend/tests/test_uploads_manager.py
Normal file
@@ -0,0 +1,146 @@
|
||||
"""Tests for deerflow.uploads.manager — shared upload management logic."""
|
||||
|
||||
import pytest
|
||||
|
||||
from deerflow.uploads.manager import (
|
||||
PathTraversalError,
|
||||
claim_unique_filename,
|
||||
delete_file_safe,
|
||||
list_files_in_dir,
|
||||
normalize_filename,
|
||||
validate_path_traversal,
|
||||
)
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# normalize_filename
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestNormalizeFilename:
|
||||
def test_safe_filename(self):
|
||||
assert normalize_filename("report.pdf") == "report.pdf"
|
||||
|
||||
def test_strips_path_components(self):
|
||||
assert normalize_filename("../../etc/passwd") == "passwd"
|
||||
|
||||
def test_rejects_empty(self):
|
||||
with pytest.raises(ValueError, match="empty"):
|
||||
normalize_filename("")
|
||||
|
||||
def test_rejects_dot_dot(self):
|
||||
with pytest.raises(ValueError, match="unsafe"):
|
||||
normalize_filename("..")
|
||||
|
||||
def test_strips_separators(self):
|
||||
assert normalize_filename("path/to/file.txt") == "file.txt"
|
||||
|
||||
def test_dot_only(self):
|
||||
with pytest.raises(ValueError, match="unsafe"):
|
||||
normalize_filename(".")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# claim_unique_filename
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestDeduplicateFilename:
|
||||
def test_no_collision(self):
|
||||
seen: set[str] = set()
|
||||
assert claim_unique_filename("data.txt", seen) == "data.txt"
|
||||
assert "data.txt" in seen
|
||||
|
||||
def test_single_collision(self):
|
||||
seen = {"data.txt"}
|
||||
assert claim_unique_filename("data.txt", seen) == "data_1.txt"
|
||||
assert "data_1.txt" in seen
|
||||
|
||||
def test_triple_collision(self):
|
||||
seen = {"data.txt", "data_1.txt", "data_2.txt"}
|
||||
assert claim_unique_filename("data.txt", seen) == "data_3.txt"
|
||||
assert "data_3.txt" in seen
|
||||
|
||||
def test_mutates_seen(self):
|
||||
seen: set[str] = set()
|
||||
claim_unique_filename("a.txt", seen)
|
||||
claim_unique_filename("a.txt", seen)
|
||||
assert seen == {"a.txt", "a_1.txt"}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# validate_path_traversal
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestValidatePathTraversal:
|
||||
def test_inside_base_ok(self, tmp_path):
|
||||
child = tmp_path / "file.txt"
|
||||
child.touch()
|
||||
validate_path_traversal(child, tmp_path) # no exception
|
||||
|
||||
def test_outside_base_raises(self, tmp_path):
|
||||
outside = tmp_path / ".." / "evil.txt"
|
||||
with pytest.raises(PathTraversalError, match="traversal"):
|
||||
validate_path_traversal(outside, tmp_path)
|
||||
|
||||
def test_symlink_escape(self, tmp_path):
|
||||
target = tmp_path.parent / "secret.txt"
|
||||
target.touch()
|
||||
link = tmp_path / "escape"
|
||||
link.symlink_to(target)
|
||||
with pytest.raises(PathTraversalError, match="traversal"):
|
||||
validate_path_traversal(link, tmp_path)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# list_files_in_dir
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestListFilesInDir:
|
||||
def test_empty_dir(self, tmp_path):
|
||||
result = list_files_in_dir(tmp_path)
|
||||
assert result == {"files": [], "count": 0}
|
||||
|
||||
def test_nonexistent_dir(self, tmp_path):
|
||||
result = list_files_in_dir(tmp_path / "nope")
|
||||
assert result == {"files": [], "count": 0}
|
||||
|
||||
def test_multiple_files_sorted(self, tmp_path):
|
||||
(tmp_path / "b.txt").write_text("b")
|
||||
(tmp_path / "a.txt").write_text("a")
|
||||
result = list_files_in_dir(tmp_path)
|
||||
assert result["count"] == 2
|
||||
assert result["files"][0]["filename"] == "a.txt"
|
||||
assert result["files"][1]["filename"] == "b.txt"
|
||||
for f in result["files"]:
|
||||
assert set(f.keys()) == {"filename", "size", "path", "extension", "modified"}
|
||||
|
||||
def test_ignores_subdirectories(self, tmp_path):
|
||||
(tmp_path / "file.txt").write_text("data")
|
||||
(tmp_path / "subdir").mkdir()
|
||||
result = list_files_in_dir(tmp_path)
|
||||
assert result["count"] == 1
|
||||
assert result["files"][0]["filename"] == "file.txt"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# delete_file_safe
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestDeleteFileSafe:
|
||||
def test_delete_existing_file(self, tmp_path):
|
||||
f = tmp_path / "test.txt"
|
||||
f.write_text("data")
|
||||
result = delete_file_safe(tmp_path, "test.txt")
|
||||
assert result["success"] is True
|
||||
assert not f.exists()
|
||||
|
||||
def test_delete_nonexistent_raises(self, tmp_path):
|
||||
with pytest.raises(FileNotFoundError):
|
||||
delete_file_safe(tmp_path, "nope.txt")
|
||||
|
||||
def test_delete_traversal_raises(self, tmp_path):
|
||||
with pytest.raises(PathTraversalError, match="traversal"):
|
||||
delete_file_safe(tmp_path, "../outside.txt")
|
||||
@@ -19,6 +19,7 @@ def test_upload_files_writes_thread_storage_and_skips_local_sandbox_sync(tmp_pat
|
||||
|
||||
with (
|
||||
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
||||
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
||||
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
||||
):
|
||||
file = UploadFile(filename="notes.txt", file=BytesIO(b"hello uploads"))
|
||||
@@ -48,6 +49,7 @@ def test_upload_files_syncs_non_local_sandbox_and_marks_markdown_file(tmp_path):
|
||||
|
||||
with (
|
||||
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
||||
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
||||
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
||||
patch.object(uploads, "convert_file_to_markdown", AsyncMock(side_effect=fake_convert)),
|
||||
):
|
||||
@@ -78,6 +80,7 @@ def test_upload_files_rejects_dotdot_and_dot_filenames(tmp_path):
|
||||
|
||||
with (
|
||||
patch.object(uploads, "get_uploads_dir", return_value=thread_uploads_dir),
|
||||
patch.object(uploads, "ensure_uploads_dir", return_value=thread_uploads_dir),
|
||||
patch.object(uploads, "get_sandbox_provider", return_value=provider),
|
||||
):
|
||||
# These filenames must be rejected outright
|
||||
|
||||
Reference in New Issue
Block a user