mirror of
https://gitee.com/wanwujie/deer-flow
synced 2026-04-18 03:54:46 +08:00
refactor: extract shared skill installer and upload manager to harness (#1202)
* refactor: extract shared skill installer and upload manager to harness

  Move duplicated business logic from Gateway routers and Client into shared
  harness modules, eliminating code duplication.

  New shared modules:
  - deerflow.skills.installer: 6 functions (zip security, extraction, install)
  - deerflow.uploads.manager: 7 functions (normalize, deduplicate, validate,
    list, delete, get_uploads_dir, ensure_uploads_dir)

  Key improvements:
  - SkillAlreadyExistsError replaces stringly-typed 409 status routing
  - normalize_filename rejects backslash-containing filenames
  - Read paths (list/delete) no longer mkdir via get_uploads_dir
  - Write paths use ensure_uploads_dir for explicit directory creation
  - list_files_in_dir does stat inside the scandir context (no re-stat)
  - install_skill_from_archive uses a single is_file() check (one syscall)
  - Fix agent config key not reset on update_mcp_config/update_skill

  Tests: 42 new (22 installer + 20 upload manager) + client hardening

* refactor: centralize upload URL construction and clean up installer

  - Extract upload_virtual_path(), upload_artifact_url(), enrich_file_listing()
    into shared manager.py, eliminating 6 duplicated URL constructions across
    the Gateway router and Client
  - Derive all upload URLs from the VIRTUAL_PATH_PREFIX constant instead of
    hardcoded "mnt/user-data/uploads" strings
  - Eliminate TOCTOU pre-checks and the double file read in the installer — a
    single ZipFile() open with exception handling replaces the is_file() +
    is_zipfile() + ZipFile() sequence
  - Add missing re-exports: ensure_uploads_dir in uploads/__init__.py,
    SkillAlreadyExistsError in skills/__init__.py
  - Remove redundant .lower() on the already-lowercase CONVERTIBLE_EXTENSIONS
  - Hoist sandbox_uploads_dir(thread_id) before the loop in the uploads router

* fix: add input validation for thread_id and filename length

  - Reject thread_id containing unsafe filesystem characters (only allow
    alphanumerics, hyphens, underscores, dots) — prevents a 500 on inputs like
    <script> or shell metacharacters
  - Reject filenames longer than 255 bytes (OS limit) in normalize_filename
  - Gateway upload router maps ValueError to 400 for an invalid thread_id

* fix: address PR review — symlink safety, input validation coverage, error ordering

  - list_files_in_dir: use follow_symlinks=False to prevent symlink metadata
    leakage; check is_dir() instead of exists() for non-directory paths
  - install_skill_from_archive: restore the is_file() pre-check before extension
    validation so error messages match the documented exception contract
  - validate_thread_id: move from ensure_uploads_dir to get_uploads_dir so all
    entry points (upload/list/delete) are protected
  - delete_uploaded_file: catch ValueError from thread_id validation (was a 500)
  - requires_llm marker: also skip when OPENAI_API_KEY is unset
  - e2e fixture: update the TitleMiddleware exclusion comment (kept the
    filtering — the middleware triggers extra LLM calls that add
    non-determinism to tests)

* chore: revert uv.lock to main — no dependency changes in this PR

* fix: use monkeypatch for global config in e2e fixture to prevent test pollution

  The e2e_env fixture was calling set_title_config() and
  set_summarization_config() directly, which mutated global singletons without
  automatic cleanup. When pytest ran test_client_e2e.py before
  test_title_middleware_core_logic.py, the leaked enabled=False caused 5 title
  tests to fail in CI. Switched to monkeypatch.setattr on the module-level
  private variables so pytest restores the originals after each test.

* fix: address code review — URL encoding, API consistency, test isolation

  - upload_artifact_url: percent-encode the filename to handle spaces/#/?
  - deduplicate_filename: mutate the seen set in place (the caller no longer
    needs a manual .add() — a less error-prone API)
  - list_files_in_dir: document that size is an int; enrich stringifies it
  - e2e fixture: monkeypatch _app_config instead of set_app_config() to prevent
    global singleton pollution (same pattern as the title/summarization fix)
  - _make_e2e_config: read LLM connection details from env vars so external
    contributors can override the defaults
  - Update tests to match the new deduplicate_filename contract

* docs: rewrite RFC in English and add alternatives/breaking changes sections

* fix: address code review feedback on PR #1202

  - Rename deduplicate_filename to claim_unique_filename to make the in-place
    set mutation explicit in the function name
  - Replace PermissionError with PathTraversalError(ValueError) for path
    traversal detection — malformed input is 400, not 403

* fix: set _app_config_is_custom in e2e test fixture to prevent config.yaml lookup in CI

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
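The typed-exception routing described above (SkillAlreadyExistsError for 409, PathTraversalError as malformed input for 400) can be sketched as a small status mapper. This is illustrative only — the exception classes are stand-ins for the ones this PR adds, and the handler shape is not taken from the diff:

```python
class SkillAlreadyExistsError(ValueError):
    """Stand-in for deerflow.skills.installer.SkillAlreadyExistsError."""

class PathTraversalError(ValueError):
    """Stand-in for deerflow.uploads.manager.PathTraversalError."""

def status_for(exc: Exception) -> int:
    # Order matters: the duplicate-install conflict (409) is checked before
    # the generic ValueError bucket. Traversal subclasses ValueError, so
    # malformed input lands on 400 rather than the old PermissionError 403.
    if isinstance(exc, SkillAlreadyExistsError):
        return 409
    if isinstance(exc, ValueError):
        return 400
    if isinstance(exc, FileNotFoundError):
        return 404
    return 500

print(status_for(SkillAlreadyExistsError("Skill 'x' already exists")))  # → 409
print(status_for(PathTraversalError("Path traversal detected")))        # → 400
print(status_for(FileNotFoundError("missing")))                         # → 404
```

Because both custom errors subclass ValueError, any caller that only knows about ValueError still gets a 400 by default.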
@@ -19,12 +19,9 @@ import asyncio
import json
import logging
import mimetypes
import os
import re
import shutil
import tempfile
import uuid
import zipfile
from collections.abc import Generator
from dataclasses import dataclass, field
from pathlib import Path
@@ -42,6 +39,17 @@ from deerflow.config.app_config import get_app_config, reload_app_config
from deerflow.config.extensions_config import ExtensionsConfig, SkillStateConfig, get_extensions_config, reload_extensions_config
from deerflow.config.paths import get_paths
from deerflow.models import create_chat_model
from deerflow.skills.installer import install_skill_from_archive
from deerflow.uploads.manager import (
    claim_unique_filename,
    delete_file_safe,
    enrich_file_listing,
    ensure_uploads_dir,
    get_uploads_dir,
    list_files_in_dir,
    upload_artifact_url,
    upload_virtual_path,
)

logger = logging.getLogger(__name__)
@@ -566,6 +574,7 @@ class DeerFlowClient:
        self._atomic_write_json(config_path, config_data)

        self._agent = None
        self._agent_config_key = None
        reloaded = reload_extensions_config()
        return {"mcp_servers": {name: server.model_dump() for name, server in reloaded.mcp_servers.items()}}
@@ -631,6 +640,7 @@ class DeerFlowClient:
        self._atomic_write_json(config_path, config_data)

        self._agent = None
        self._agent_config_key = None
        reload_extensions_config()

        updated = next((s for s in load_skills(enabled_only=False) if s.name == name), None)
@@ -657,56 +667,7 @@ class DeerFlowClient:
            FileNotFoundError: If the file does not exist.
            ValueError: If the file is invalid.
        """
        from deerflow.skills.loader import get_skills_root_path
        from deerflow.skills.validation import _validate_skill_frontmatter

        path = Path(skill_path)
        if not path.exists():
            raise FileNotFoundError(f"Skill file not found: {skill_path}")
        if not path.is_file():
            raise ValueError(f"Path is not a file: {skill_path}")
        if path.suffix != ".skill":
            raise ValueError("File must have .skill extension")
        if not zipfile.is_zipfile(path):
            raise ValueError("File is not a valid ZIP archive")

        skills_root = get_skills_root_path()
        custom_dir = skills_root / "custom"
        custom_dir.mkdir(parents=True, exist_ok=True)

        with tempfile.TemporaryDirectory() as tmp:
            tmp_path = Path(tmp)
            with zipfile.ZipFile(path, "r") as zf:
                total_size = sum(info.file_size for info in zf.infolist())
                if total_size > 100 * 1024 * 1024:
                    raise ValueError("Skill archive too large when extracted (>100MB)")
                for info in zf.infolist():
                    if Path(info.filename).is_absolute() or ".." in Path(info.filename).parts:
                        raise ValueError(f"Unsafe path in archive: {info.filename}")
                zf.extractall(tmp_path)
            for p in tmp_path.rglob("*"):
                if p.is_symlink():
                    p.unlink()

            items = list(tmp_path.iterdir())
            if not items:
                raise ValueError("Skill archive is empty")

            skill_dir = items[0] if len(items) == 1 and items[0].is_dir() else tmp_path

            is_valid, message, skill_name = _validate_skill_frontmatter(skill_dir)
            if not is_valid:
                raise ValueError(f"Invalid skill: {message}")
            if not re.fullmatch(r"[a-zA-Z0-9_-]+", skill_name):
                raise ValueError(f"Invalid skill name: {skill_name}")

            target = custom_dir / skill_name
            if target.exists():
                raise ValueError(f"Skill '{skill_name}' already exists")

            shutil.copytree(skill_dir, target)

            return {"success": True, "skill_name": skill_name, "message": f"Skill '{skill_name}' installed successfully"}
        return install_skill_from_archive(skill_path)
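The TOCTOU fix mentioned in the commit message — a single ZipFile() open with exception handling instead of the is_file() + is_zipfile() + ZipFile() sequence — can be sketched in isolation. The helper name here is illustrative, not from the diff:

```python
import tempfile
import zipfile
from pathlib import Path

def open_zip_eafp(path: Path) -> zipfile.ZipFile:
    # EAFP: one open attempt replaces the pre-checks, closing the window
    # where the file could change between the check and the open.
    try:
        return zipfile.ZipFile(path, "r")
    except FileNotFoundError:
        raise FileNotFoundError(f"Skill file not found: {path}") from None
    except zipfile.BadZipFile:
        raise ValueError("File is not a valid ZIP archive") from None

with tempfile.TemporaryDirectory() as tmp:
    bogus = Path(tmp) / "bad.skill"
    bogus.write_text("not a zip")
    try:
        open_zip_eafp(bogus)
    except ValueError as exc:
        print(exc)  # → File is not a valid ZIP archive

    good = Path(tmp) / "ok.skill"
    with zipfile.ZipFile(good, "w") as zf:
        zf.writestr("SKILL.md", "---\nname: demo\n---\n")
    with open_zip_eafp(good) as zf:
        print(zf.namelist())  # → ['SKILL.md']
```

The shared installer below follows the same pattern, additionally mapping IsADirectoryError to the bad-archive ValueError.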
    # ------------------------------------------------------------------
    # Public API — memory management
@@ -756,13 +717,6 @@ class DeerFlowClient:
    # Public API — file uploads
    # ------------------------------------------------------------------

    @staticmethod
    def _get_uploads_dir(thread_id: str) -> Path:
        """Get (and create) the uploads directory for a thread."""
        base = get_paths().sandbox_uploads_dir(thread_id)
        base.mkdir(parents=True, exist_ok=True)
        return base

    def upload_files(self, thread_id: str, files: list[str | Path]) -> dict:
        """Upload local files into a thread's uploads directory.

@@ -784,7 +738,7 @@

        # Validate all files upfront to avoid partial uploads.
        resolved_files = []
        convertible_extensions = {ext.lower() for ext in CONVERTIBLE_EXTENSIONS}
        seen_names: set[str] = set()
        has_convertible_file = False
        for f in files:
            p = Path(f)
@@ -792,11 +746,12 @@
                raise FileNotFoundError(f"File not found: {f}")
            if not p.is_file():
                raise ValueError(f"Path is not a file: {f}")
            resolved_files.append(p)
            if not has_convertible_file and p.suffix.lower() in convertible_extensions:
            dest_name = claim_unique_filename(p.name, seen_names)
            resolved_files.append((p, dest_name))
            if not has_convertible_file and p.suffix.lower() in CONVERTIBLE_EXTENSIONS:
                has_convertible_file = True

        uploads_dir = self._get_uploads_dir(thread_id)
        uploads_dir = ensure_uploads_dir(thread_id)
        uploaded_files: list[dict] = []

        conversion_pool = None
@@ -816,19 +771,21 @@
            return asyncio.run(convert_file_to_markdown(path))

        try:
            for src_path in resolved_files:
                dest = uploads_dir / src_path.name
            for src_path, dest_name in resolved_files:
                dest = uploads_dir / dest_name
                shutil.copy2(src_path, dest)

                info: dict[str, Any] = {
                    "filename": src_path.name,
                    "filename": dest_name,
                    "size": str(dest.stat().st_size),
                    "path": str(dest),
                    "virtual_path": f"/mnt/user-data/uploads/{src_path.name}",
                    "artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{src_path.name}",
                    "virtual_path": upload_virtual_path(dest_name),
                    "artifact_url": upload_artifact_url(thread_id, dest_name),
                }
                if dest_name != src_path.name:
                    info["original_filename"] = src_path.name

                if src_path.suffix.lower() in convertible_extensions:
                if src_path.suffix.lower() in CONVERTIBLE_EXTENSIONS:
                    try:
                        if conversion_pool is not None:
                            md_path = conversion_pool.submit(_convert_in_thread, dest).result()
@@ -844,8 +801,9 @@

                        if md_path is not None:
                            info["markdown_file"] = md_path.name
                            info["markdown_virtual_path"] = f"/mnt/user-data/uploads/{md_path.name}"
                            info["markdown_artifact_url"] = f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{md_path.name}"
                            info["markdown_path"] = str(uploads_dir / md_path.name)
                            info["markdown_virtual_path"] = upload_virtual_path(md_path.name)
                            info["markdown_artifact_url"] = upload_artifact_url(thread_id, md_path.name)

                uploaded_files.append(info)
        finally:
@@ -868,29 +826,9 @@ class DeerFlowClient:
            Dict with "files" and "count" keys, matching the Gateway API
            ``list_uploaded_files`` response.
        """
        uploads_dir = self._get_uploads_dir(thread_id)
        if not uploads_dir.exists():
            return {"files": [], "count": 0}

        files = []
        with os.scandir(uploads_dir) as entries:
            file_entries = [entry for entry in entries if entry.is_file()]

        for entry in sorted(file_entries, key=lambda item: item.name):
            stat = entry.stat()
            filename = entry.name
            files.append(
                {
                    "filename": filename,
                    "size": str(stat.st_size),
                    "path": str(Path(entry.path)),
                    "virtual_path": f"/mnt/user-data/uploads/{filename}",
                    "artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{filename}",
                    "extension": Path(filename).suffix,
                    "modified": stat.st_mtime,
                }
            )
        return {"files": files, "count": len(files)}
        uploads_dir = get_uploads_dir(thread_id)
        result = list_files_in_dir(uploads_dir)
        return enrich_file_listing(result, thread_id)

    def delete_upload(self, thread_id: str, filename: str) -> dict:
        """Delete a file from a thread's uploads directory.

@@ -907,19 +845,10 @@ class DeerFlowClient:
            FileNotFoundError: If the file does not exist.
            PermissionError: If path traversal is detected.
        """
        uploads_dir = self._get_uploads_dir(thread_id)
        file_path = (uploads_dir / filename).resolve()
        from deerflow.utils.file_conversion import CONVERTIBLE_EXTENSIONS

        try:
            file_path.relative_to(uploads_dir.resolve())
        except ValueError as exc:
            raise PermissionError("Access denied: path traversal detected") from exc

        if not file_path.is_file():
            raise FileNotFoundError(f"File not found: {filename}")

        file_path.unlink()
        return {"success": True, "message": f"Deleted {filename}"}
        uploads_dir = get_uploads_dir(thread_id)
        return delete_file_safe(uploads_dir, filename, convertible_extensions=CONVERTIBLE_EXTENSIONS)
    # ------------------------------------------------------------------
    # Public API — artifacts
@@ -939,19 +868,13 @@ class DeerFlowClient:
            FileNotFoundError: If the artifact does not exist.
            ValueError: If the path is invalid.
        """
        virtual_prefix = "mnt/user-data"
        clean_path = path.lstrip("/")
        if not clean_path.startswith(virtual_prefix):
            raise ValueError(f"Path must start with /{virtual_prefix}")

        relative = clean_path[len(virtual_prefix) :].lstrip("/")
        base_dir = get_paths().sandbox_user_data_dir(thread_id)
        actual = (base_dir / relative).resolve()

        try:
            actual.relative_to(base_dir.resolve())
            actual = get_paths().resolve_virtual_path(thread_id, path)
        except ValueError as exc:
            raise PermissionError("Access denied: path traversal detected") from exc
            if "traversal" in str(exc):
                from deerflow.uploads.manager import PathTraversalError

                raise PathTraversalError("Path traversal detected") from exc
            raise
        if not actual.exists():
            raise FileNotFoundError(f"Artifact not found: {path}")
        if not actual.is_file():
@@ -1,5 +1,14 @@
from .installer import SkillAlreadyExistsError, install_skill_from_archive
from .loader import get_skills_root_path, load_skills
from .types import Skill
from .validation import ALLOWED_FRONTMATTER_PROPERTIES, _validate_skill_frontmatter

__all__ = ["load_skills", "get_skills_root_path", "Skill", "ALLOWED_FRONTMATTER_PROPERTIES", "_validate_skill_frontmatter"]
__all__ = [
    "load_skills",
    "get_skills_root_path",
    "Skill",
    "ALLOWED_FRONTMATTER_PROPERTIES",
    "_validate_skill_frontmatter",
    "install_skill_from_archive",
    "SkillAlreadyExistsError",
]
176  backend/packages/harness/deerflow/skills/installer.py  Normal file
@@ -0,0 +1,176 @@
"""Shared skill archive installation logic.

Pure business logic — no FastAPI/HTTP dependencies.
Both Gateway and Client delegate to these functions.
"""

import logging
import shutil
import stat
import tempfile
import zipfile
from pathlib import Path

from deerflow.skills.loader import get_skills_root_path
from deerflow.skills.validation import _validate_skill_frontmatter

logger = logging.getLogger(__name__)


class SkillAlreadyExistsError(ValueError):
    """Raised when a skill with the same name is already installed."""


def is_unsafe_zip_member(info: zipfile.ZipInfo) -> bool:
    """Return True if the zip member path is absolute or attempts directory traversal."""
    name = info.filename
    if not name:
        return False
    path = Path(name)
    if path.is_absolute():
        return True
    if ".." in path.parts:
        return True
    return False


def is_symlink_member(info: zipfile.ZipInfo) -> bool:
    """Detect symlinks based on the external attributes stored in the ZipInfo."""
    mode = info.external_attr >> 16
    return stat.S_ISLNK(mode)


def should_ignore_archive_entry(path: Path) -> bool:
    """Return True for macOS metadata dirs and dotfiles."""
    return path.name.startswith(".") or path.name == "__MACOSX"


def resolve_skill_dir_from_archive(temp_path: Path) -> Path:
    """Locate the skill root directory from extracted archive contents.

    Filters out macOS metadata (__MACOSX) and dotfiles (.DS_Store).

    Returns:
        Path to the skill directory.

    Raises:
        ValueError: If the archive is empty after filtering.
    """
    items = [p for p in temp_path.iterdir() if not should_ignore_archive_entry(p)]
    if not items:
        raise ValueError("Skill archive is empty")
    if len(items) == 1 and items[0].is_dir():
        return items[0]
    return temp_path


def safe_extract_skill_archive(
    zip_ref: zipfile.ZipFile,
    dest_path: Path,
    max_total_size: int = 512 * 1024 * 1024,
) -> None:
    """Safely extract a skill archive with security protections.

    Protections:
    - Reject absolute paths and directory traversal (..).
    - Skip symlink entries instead of materialising them.
    - Enforce a hard limit on total uncompressed size (zip bomb defence).

    Raises:
        ValueError: If unsafe members or size limit exceeded.
    """
    dest_root = dest_path.resolve()
    total_written = 0

    for info in zip_ref.infolist():
        if is_unsafe_zip_member(info):
            raise ValueError(f"Archive contains unsafe member path: {info.filename!r}")

        if is_symlink_member(info):
            logger.warning("Skipping symlink entry in skill archive: %s", info.filename)
            continue

        member_path = dest_root / info.filename
        if not member_path.resolve().is_relative_to(dest_root):
            raise ValueError(f"Zip entry escapes destination: {info.filename!r}")
        member_path.parent.mkdir(parents=True, exist_ok=True)

        if info.is_dir():
            member_path.mkdir(parents=True, exist_ok=True)
            continue

        with zip_ref.open(info) as src, member_path.open("wb") as dst:
            while chunk := src.read(65536):
                total_written += len(chunk)
                if total_written > max_total_size:
                    raise ValueError("Skill archive is too large or appears highly compressed.")
                dst.write(chunk)


def install_skill_from_archive(
    zip_path: str | Path,
    *,
    skills_root: Path | None = None,
) -> dict:
    """Install a skill from a .skill archive (ZIP).

    Args:
        zip_path: Path to the .skill file.
        skills_root: Override the skills root directory. If None, uses
            the default from config.

    Returns:
        Dict with success, skill_name, message.

    Raises:
        FileNotFoundError: If the file does not exist.
        ValueError: If the file is invalid (wrong extension, bad ZIP,
            invalid frontmatter, duplicate name).
    """
    logger.info("Installing skill from %s", zip_path)
    path = Path(zip_path)
    if not path.is_file():
        if not path.exists():
            raise FileNotFoundError(f"Skill file not found: {zip_path}")
        raise ValueError(f"Path is not a file: {zip_path}")
    if path.suffix != ".skill":
        raise ValueError("File must have .skill extension")

    if skills_root is None:
        skills_root = get_skills_root_path()
    custom_dir = skills_root / "custom"
    custom_dir.mkdir(parents=True, exist_ok=True)

    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)

        try:
            zf = zipfile.ZipFile(path, "r")
        except FileNotFoundError:
            raise FileNotFoundError(f"Skill file not found: {zip_path}") from None
        except (zipfile.BadZipFile, IsADirectoryError):
            raise ValueError("File is not a valid ZIP archive") from None

        with zf:
            safe_extract_skill_archive(zf, tmp_path)

        skill_dir = resolve_skill_dir_from_archive(tmp_path)

        is_valid, message, skill_name = _validate_skill_frontmatter(skill_dir)
        if not is_valid:
            raise ValueError(f"Invalid skill: {message}")
        if not skill_name or "/" in skill_name or "\\" in skill_name or ".." in skill_name:
            raise ValueError(f"Invalid skill name: {skill_name}")

        target = custom_dir / skill_name
        if target.exists():
            raise SkillAlreadyExistsError(f"Skill '{skill_name}' already exists")

        shutil.copytree(skill_dir, target)
        logger.info("Skill %r installed to %s", skill_name, target)

        return {
            "success": True,
            "skill_name": skill_name,
            "message": f"Skill '{skill_name}' installed successfully",
        }
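The member checks above are pure functions of the ZipInfo, so they can be exercised without touching the filesystem. A minimal sketch, with the two helpers reproduced from installer.py above:

```python
import stat
import zipfile
from pathlib import Path

def is_unsafe_zip_member(info: zipfile.ZipInfo) -> bool:
    # Reproduced from installer.py: reject absolute paths and ".." segments.
    name = info.filename
    if not name:
        return False
    path = Path(name)
    return path.is_absolute() or ".." in path.parts

def is_symlink_member(info: zipfile.ZipInfo) -> bool:
    # Reproduced from installer.py: Unix mode bits live in the high
    # 16 bits of external_attr.
    return stat.S_ISLNK(info.external_attr >> 16)

# Traversal and absolute paths are flagged; a plain member is not.
print(is_unsafe_zip_member(zipfile.ZipInfo("docs/SKILL.md")))     # → False
print(is_unsafe_zip_member(zipfile.ZipInfo("../../etc/passwd")))  # → True
print(is_unsafe_zip_member(zipfile.ZipInfo("/etc/passwd")))       # → True

# A symlink entry carries S_IFLNK in its external attributes.
link = zipfile.ZipInfo("skill/link")
link.external_attr = (stat.S_IFLNK | 0o777) << 16
print(is_symlink_member(link))                                    # → True
```

This is why the extractor can skip symlinks before writing anything: the decision is made from archive metadata alone.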
29  backend/packages/harness/deerflow/uploads/__init__.py  Normal file
@@ -0,0 +1,29 @@
from .manager import (
    PathTraversalError,
    claim_unique_filename,
    delete_file_safe,
    enrich_file_listing,
    ensure_uploads_dir,
    get_uploads_dir,
    list_files_in_dir,
    normalize_filename,
    upload_artifact_url,
    upload_virtual_path,
    validate_path_traversal,
    validate_thread_id,
)

__all__ = [
    "get_uploads_dir",
    "ensure_uploads_dir",
    "normalize_filename",
    "PathTraversalError",
    "claim_unique_filename",
    "validate_path_traversal",
    "list_files_in_dir",
    "delete_file_safe",
    "upload_artifact_url",
    "upload_virtual_path",
    "enrich_file_listing",
    "validate_thread_id",
]
198  backend/packages/harness/deerflow/uploads/manager.py  Normal file
@@ -0,0 +1,198 @@
"""Shared upload management logic.

Pure business logic — no FastAPI/HTTP dependencies.
Both Gateway and Client delegate to these functions.
"""

import os
import re
from pathlib import Path
from urllib.parse import quote

from deerflow.config.paths import VIRTUAL_PATH_PREFIX, get_paths


class PathTraversalError(ValueError):
    """Raised when a path escapes its allowed base directory."""


# thread_id must be alphanumeric, hyphens, underscores, or dots only.
_SAFE_THREAD_ID = re.compile(r"^[a-zA-Z0-9._-]+$")


def validate_thread_id(thread_id: str) -> None:
    """Reject thread IDs containing characters unsafe for filesystem paths.

    Raises:
        ValueError: If thread_id is empty or contains unsafe characters.
    """
    if not thread_id or not _SAFE_THREAD_ID.match(thread_id):
        raise ValueError(f"Invalid thread_id: {thread_id!r}")
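The thread_id guard that turns `<script>`-style inputs into a 400 instead of a 500 is just an anchored character-class match. Reproduced standalone (the sample IDs are illustrative):

```python
import re

# Reproduced from manager.py: thread IDs are restricted to
# filesystem-safe characters only.
_SAFE_THREAD_ID = re.compile(r"^[a-zA-Z0-9._-]+$")

def validate_thread_id(thread_id: str) -> None:
    if not thread_id or not _SAFE_THREAD_ID.match(thread_id):
        raise ValueError(f"Invalid thread_id: {thread_id!r}")

validate_thread_id("thread-42.v1")  # passes silently
for bad in ("", "a/b", "<script>", "x; rm -rf"):
    try:
        validate_thread_id(bad)
    except ValueError:
        print(f"rejected: {bad!r}")
```

Per the commit message, the Gateway upload router maps this ValueError to a 400 response.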


def get_uploads_dir(thread_id: str) -> Path:
    """Return the uploads directory path for a thread (no side effects)."""
    validate_thread_id(thread_id)
    return get_paths().sandbox_uploads_dir(thread_id)


def ensure_uploads_dir(thread_id: str) -> Path:
    """Return the uploads directory for a thread, creating it if needed."""
    base = get_uploads_dir(thread_id)
    base.mkdir(parents=True, exist_ok=True)
    return base


def normalize_filename(filename: str) -> str:
    """Sanitize a filename by extracting its basename.

    Strips any directory components and rejects traversal patterns.

    Args:
        filename: Raw filename from user input (may contain path components).

    Returns:
        Safe filename (basename only).

    Raises:
        ValueError: If filename is empty or resolves to a traversal pattern.
    """
    if not filename:
        raise ValueError("Filename is empty")
    safe = Path(filename).name
    if not safe or safe in {".", ".."}:
        raise ValueError(f"Filename is unsafe: {filename!r}")
    # Reject backslashes — on Linux Path.name keeps them as literal chars,
    # but they indicate a Windows-style path that should be stripped or rejected.
    if "\\" in safe:
        raise ValueError(f"Filename contains backslash: {filename!r}")
    if len(safe.encode("utf-8")) > 255:
        raise ValueError(f"Filename too long: {len(safe)} chars")
    return safe
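Because normalize_filename is side-effect free, its contract is easy to demonstrate. A standalone copy of the function (example inputs are illustrative):

```python
from pathlib import Path

def normalize_filename(filename: str) -> str:
    # Standalone copy of the manager's sanitizer, for illustration.
    if not filename:
        raise ValueError("Filename is empty")
    safe = Path(filename).name  # strip any directory components
    if not safe or safe in {".", ".."}:
        raise ValueError(f"Filename is unsafe: {filename!r}")
    if "\\" in safe:
        raise ValueError(f"Filename contains backslash: {filename!r}")
    if len(safe.encode("utf-8")) > 255:
        raise ValueError(f"Filename too long: {len(safe)} chars")
    return safe

print(normalize_filename("tmp/evil/report.pdf"))  # → report.pdf
try:
    normalize_filename("..\\..\\boot.ini")
except ValueError as exc:
    # On Linux, Path.name would keep the backslashes as literal characters,
    # so the explicit check rejects the Windows-style path outright.
    print(exc)
```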


def claim_unique_filename(name: str, seen: set[str]) -> str:
    """Generate a unique filename by appending ``_N`` suffix on collision.

    Automatically adds the returned name to *seen* so callers don't need to.

    Args:
        name: Candidate filename.
        seen: Set of filenames already claimed (mutated in place).

    Returns:
        A filename not present in *seen* (already added to *seen*).
    """
    if name not in seen:
        seen.add(name)
        return name
    stem, suffix = Path(name).stem, Path(name).suffix
    counter = 1
    candidate = f"{stem}_{counter}{suffix}"
    while candidate in seen:
        counter += 1
        candidate = f"{stem}_{counter}{suffix}"
    seen.add(candidate)
    return candidate
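The in-place mutation that motivated the rename from deduplicate_filename is visible in a quick session, with the function reproduced from manager.py above:

```python
from pathlib import Path

def claim_unique_filename(name: str, seen: set[str]) -> str:
    # Reproduced from manager.py: returns a collision-free name and records
    # it in `seen`, so the caller never has to call seen.add() manually.
    if name not in seen:
        seen.add(name)
        return name
    stem, suffix = Path(name).stem, Path(name).suffix
    counter = 1
    candidate = f"{stem}_{counter}{suffix}"
    while candidate in seen:
        counter += 1
        candidate = f"{stem}_{counter}{suffix}"
    seen.add(candidate)
    return candidate

seen: set[str] = set()
print(claim_unique_filename("report.pdf", seen))  # → report.pdf
print(claim_unique_filename("report.pdf", seen))  # → report_1.pdf
print(claim_unique_filename("report.pdf", seen))  # → report_2.pdf
print(sorted(seen))  # all three names are now claimed
```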


def validate_path_traversal(path: Path, base: Path) -> None:
    """Verify that *path* is inside *base*.

    Raises:
        PathTraversalError: If a path traversal is detected.
    """
    try:
        path.resolve().relative_to(base.resolve())
    except ValueError:
        raise PathTraversalError("Path traversal detected") from None
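The containment check leans on `Path.relative_to`, which raises ValueError when one path is not under the other. A self-contained sketch with a local stand-in for the error class (the base directory is illustrative):

```python
from pathlib import Path

class PathTraversalError(ValueError):
    # Local stand-in for deerflow.uploads.manager.PathTraversalError.
    pass

def validate_path_traversal(path: Path, base: Path) -> None:
    # relative_to() raises ValueError when `path` does not sit under `base`
    # after both are resolved, which catches ".." components.
    try:
        path.resolve().relative_to(base.resolve())
    except ValueError:
        raise PathTraversalError("Path traversal detected") from None

base = Path("/srv/uploads/thread-1")
validate_path_traversal(base / "notes.txt", base)  # inside: passes silently
try:
    validate_path_traversal(base / ".." / ".." / "etc" / "passwd", base)
except PathTraversalError as exc:
    print(exc)  # the ".." components resolve outside base
```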


def list_files_in_dir(directory: Path) -> dict:
    """List files (not directories) in *directory*.

    Args:
        directory: Directory to scan.

    Returns:
        Dict with "files" list (sorted by name) and "count".
        Each file entry has ``size`` as *int* (bytes). Call
        :func:`enrich_file_listing` to stringify sizes and add
        virtual / artifact URLs.
    """
    if not directory.is_dir():
        return {"files": [], "count": 0}

    files = []
    with os.scandir(directory) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if not entry.is_file(follow_symlinks=False):
                continue
            st = entry.stat(follow_symlinks=False)
            files.append({
                "filename": entry.name,
                "size": st.st_size,
                "path": entry.path,
                "extension": Path(entry.name).suffix,
                "modified": st.st_mtime,
            })
    return {"files": files, "count": len(files)}


def delete_file_safe(base_dir: Path, filename: str, *, convertible_extensions: set[str] | None = None) -> dict:
    """Delete a file inside *base_dir* after path-traversal validation.

    If *convertible_extensions* is provided and the file's extension matches,
    the companion ``.md`` file is also removed (if it exists).

    Args:
        base_dir: Directory containing the file.
        filename: Name of file to delete.
        convertible_extensions: Lowercase extensions (e.g. ``{".pdf", ".docx"}``)
            whose companion markdown should be cleaned up.

    Returns:
        Dict with success and message.

    Raises:
        FileNotFoundError: If the file does not exist.
        PathTraversalError: If path traversal is detected.
    """
    file_path = (base_dir / filename).resolve()
    validate_path_traversal(file_path, base_dir)

    if not file_path.is_file():
        raise FileNotFoundError(f"File not found: {filename}")

    file_path.unlink()

    # Clean up companion markdown generated during upload conversion.
    if convertible_extensions and file_path.suffix.lower() in convertible_extensions:
        file_path.with_suffix(".md").unlink(missing_ok=True)

    return {"success": True, "message": f"Deleted {filename}"}


def upload_artifact_url(thread_id: str, filename: str) -> str:
    """Build the artifact URL for a file in a thread's uploads directory.

    *filename* is percent-encoded so that spaces, ``#``, ``?`` etc. are safe.
    """
    return f"/api/threads/{thread_id}/artifacts{VIRTUAL_PATH_PREFIX}/uploads/{quote(filename, safe='')}"
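Percent-encoding the filename keeps reserved URL characters from splitting the path or starting a query string. A sketch that assumes VIRTUAL_PATH_PREFIX is "/mnt/user-data" — the value implied by the hardcoded strings this PR replaces:

```python
from urllib.parse import quote

VIRTUAL_PATH_PREFIX = "/mnt/user-data"  # assumed value, per the old hardcoded URLs

def upload_artifact_url(thread_id: str, filename: str) -> str:
    # safe='' also encodes "/", so a filename can never add path segments.
    return f"/api/threads/{thread_id}/artifacts{VIRTUAL_PATH_PREFIX}/uploads/{quote(filename, safe='')}"

print(upload_artifact_url("t1", "q3 report #2.pdf"))
# → /api/threads/t1/artifacts/mnt/user-data/uploads/q3%20report%20%232.pdf
```

Without the encoding, the `#` would be read as a fragment delimiter and everything after it would be dropped by the client.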


def upload_virtual_path(filename: str) -> str:
    """Build the virtual path for a file in the uploads directory."""
    return f"{VIRTUAL_PATH_PREFIX}/uploads/{filename}"


def enrich_file_listing(result: dict, thread_id: str) -> dict:
    """Add virtual paths, artifact URLs, and stringify sizes on a listing result.

    Mutates *result* in place and returns it for convenience.
    """
    for f in result["files"]:
        filename = f["filename"]
        f["size"] = str(f["size"])
        f["virtual_path"] = upload_virtual_path(filename)
        f["artifact_url"] = upload_artifact_url(thread_id, filename)
    return result
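The split between list_files_in_dir (int sizes, no URLs) and enrich_file_listing (stringified sizes plus both URL forms) is a pure dict transformation, so it can be shown without a filesystem. A sketch with the URL helpers reproduced from above and an assumed VIRTUAL_PATH_PREFIX of "/mnt/user-data":

```python
from urllib.parse import quote

VIRTUAL_PATH_PREFIX = "/mnt/user-data"  # assumed value for illustration

def upload_virtual_path(filename: str) -> str:
    return f"{VIRTUAL_PATH_PREFIX}/uploads/{filename}"

def upload_artifact_url(thread_id: str, filename: str) -> str:
    return f"/api/threads/{thread_id}/artifacts{VIRTUAL_PATH_PREFIX}/uploads/{quote(filename, safe='')}"

def enrich_file_listing(result: dict, thread_id: str) -> dict:
    # Mutates the listing in place: stringify sizes, attach both URL forms.
    for f in result["files"]:
        filename = f["filename"]
        f["size"] = str(f["size"])
        f["virtual_path"] = upload_virtual_path(filename)
        f["artifact_url"] = upload_artifact_url(thread_id, filename)
    return result

listing = {"files": [{"filename": "notes.txt", "size": 12}], "count": 1}
enriched = enrich_file_listing(listing, "t1")
print(enriched["files"][0]["size"])          # → 12  (now a string)
print(enriched["files"][0]["virtual_path"])  # → /mnt/user-data/uploads/notes.txt
print(enriched is listing)                   # → True (mutated in place)
```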