Files
deer-flow/backend/app/gateway/routers/uploads.py
DanielWalnut 76803b826f refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)
* refactor: extract shared utils to break harness→app cross-layer imports

Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: split backend/src into harness (deerflow.*) and app (app.*)

Physically split the monolithic backend/src/ package into two layers:

- **Harness** (`packages/harness/deerflow/`): publishable agent framework
  package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
  models, MCP, skills, config, and all core infrastructure.

- **App** (`app/`): unpublished application code with import prefix `app.*`.
  Contains gateway (FastAPI REST API) and channels (IM integrations).

Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure

Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add harness→app boundary check test and update docs

Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.

Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add config versioning with auto-upgrade on startup

When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.

- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix comments

* fix: update src.* import in test_sandbox_tools_security to deerflow.*

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle empty config and search parent dirs for config.example.yaml

Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
  looking next to config.yaml, fixing detection in common setups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct skills root path depth and config_version type coercion

- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
  after harness split, file lives at packages/harness/deerflow/skills/
  so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
  _check_config_version() to prevent TypeError when YAML stores value
  as string (e.g. config_version: "1")
- tests: add regression tests for both fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update test imports from src.* to deerflow.*/app.* after harness refactor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 22:55:52 +08:00

196 lines
6.9 KiB
Python

"""Upload router for handling file uploads."""
import logging
from pathlib import Path
from fastapi import APIRouter, File, HTTPException, UploadFile
from pydantic import BaseModel
from deerflow.config.paths import VIRTUAL_PATH_PREFIX, get_paths
from deerflow.sandbox.sandbox_provider import get_sandbox_provider
from deerflow.utils.file_conversion import CONVERTIBLE_EXTENSIONS, convert_file_to_markdown
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/api/threads/{thread_id}/uploads", tags=["uploads"])
class UploadResponse(BaseModel):
"""Response model for file upload."""
success: bool
files: list[dict[str, str]]
message: str
def get_uploads_dir(thread_id: str) -> Path:
"""Get the uploads directory for a thread.
Args:
thread_id: The thread ID.
Returns:
Path to the uploads directory.
"""
base_dir = get_paths().sandbox_uploads_dir(thread_id)
base_dir.mkdir(parents=True, exist_ok=True)
return base_dir
@router.post("", response_model=UploadResponse)
async def upload_files(
thread_id: str,
files: list[UploadFile] = File(...),
) -> UploadResponse:
"""Upload multiple files to a thread's uploads directory.
For PDF, PPT, Excel, and Word files, they will be converted to markdown using markitdown.
All files (original and converted) are saved to /mnt/user-data/uploads.
Args:
thread_id: The thread ID to upload files to.
files: List of files to upload.
Returns:
Upload response with success status and file information.
"""
if not files:
raise HTTPException(status_code=400, detail="No files provided")
uploads_dir = get_uploads_dir(thread_id)
paths = get_paths()
uploaded_files = []
sandbox_provider = get_sandbox_provider()
sandbox_id = sandbox_provider.acquire(thread_id)
sandbox = sandbox_provider.get(sandbox_id)
for file in files:
if not file.filename:
continue
try:
# Normalize filename to prevent path traversal
safe_filename = Path(file.filename).name
if not safe_filename or safe_filename in {".", ".."} or "/" in safe_filename or "\\" in safe_filename:
logger.warning(f"Skipping file with unsafe filename: {file.filename!r}")
continue
content = await file.read()
file_path = uploads_dir / safe_filename
file_path.write_bytes(content)
# Build relative path from backend root
relative_path = str(paths.sandbox_uploads_dir(thread_id) / safe_filename)
virtual_path = f"{VIRTUAL_PATH_PREFIX}/uploads/{safe_filename}"
# Keep local sandbox source of truth in thread-scoped host storage.
# For non-local sandboxes, also sync to virtual path for runtime visibility.
if sandbox_id != "local":
sandbox.update_file(virtual_path, content)
file_info = {
"filename": safe_filename,
"size": str(len(content)),
"path": relative_path, # Actual filesystem path (relative to backend/)
"virtual_path": virtual_path, # Path for Agent in sandbox
"artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{safe_filename}", # HTTP URL
}
logger.info(f"Saved file: {safe_filename} ({len(content)} bytes) to {relative_path}")
# Check if file should be converted to markdown
file_ext = file_path.suffix.lower()
if file_ext in CONVERTIBLE_EXTENSIONS:
md_path = await convert_file_to_markdown(file_path)
if md_path:
md_relative_path = str(paths.sandbox_uploads_dir(thread_id) / md_path.name)
md_virtual_path = f"{VIRTUAL_PATH_PREFIX}/uploads/{md_path.name}"
if sandbox_id != "local":
sandbox.update_file(md_virtual_path, md_path.read_bytes())
file_info["markdown_file"] = md_path.name
file_info["markdown_path"] = md_relative_path
file_info["markdown_virtual_path"] = md_virtual_path
file_info["markdown_artifact_url"] = f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{md_path.name}"
uploaded_files.append(file_info)
except Exception as e:
logger.error(f"Failed to upload {file.filename}: {e}")
raise HTTPException(status_code=500, detail=f"Failed to upload {file.filename}: {str(e)}")
return UploadResponse(
success=True,
files=uploaded_files,
message=f"Successfully uploaded {len(uploaded_files)} file(s)",
)
@router.get("/list", response_model=dict)
async def list_uploaded_files(thread_id: str) -> dict:
"""List all files in a thread's uploads directory.
Args:
thread_id: The thread ID to list files for.
Returns:
Dictionary containing list of files with their metadata.
"""
uploads_dir = get_uploads_dir(thread_id)
if not uploads_dir.exists():
return {"files": [], "count": 0}
files = []
for file_path in sorted(uploads_dir.iterdir()):
if file_path.is_file():
stat = file_path.stat()
relative_path = str(get_paths().sandbox_uploads_dir(thread_id) / file_path.name)
files.append(
{
"filename": file_path.name,
"size": stat.st_size,
"path": relative_path, # Actual filesystem path
"virtual_path": f"{VIRTUAL_PATH_PREFIX}/uploads/{file_path.name}", # Path for Agent in sandbox
"artifact_url": f"/api/threads/{thread_id}/artifacts/mnt/user-data/uploads/{file_path.name}", # HTTP URL
"extension": file_path.suffix,
"modified": stat.st_mtime,
}
)
return {"files": files, "count": len(files)}
@router.delete("/{filename}")
async def delete_uploaded_file(thread_id: str, filename: str) -> dict:
"""Delete a file from a thread's uploads directory.
Args:
thread_id: The thread ID.
filename: The filename to delete.
Returns:
Success message.
"""
uploads_dir = get_uploads_dir(thread_id)
file_path = uploads_dir / filename
if not file_path.exists():
raise HTTPException(status_code=404, detail=f"File not found: {filename}")
# Security check: ensure the path is within the uploads directory
try:
file_path.resolve().relative_to(uploads_dir.resolve())
except ValueError:
raise HTTPException(status_code=403, detail="Access denied")
try:
file_path.unlink()
logger.info(f"Deleted file: {filename}")
return {"success": True, "message": f"Deleted {filename}"}
except Exception as e:
logger.error(f"Failed to delete {filename}: {e}")
raise HTTPException(status_code=500, detail=f"Failed to delete {filename}: {str(e)}")