backend/packages/harness/deerflow/agents/middlewares/thread_data_middleware.py

from typing import NotRequired, override

from langchain.agents import AgentState
from langchain.agents.middleware import AgentMiddleware
from langgraph.config import get_config
from langgraph.runtime import Runtime

from deerflow.agents.thread_state import ThreadDataState
from deerflow.config.paths import Paths, get_paths


class ThreadDataMiddlewareState(AgentState):
    """Compatible with the `ThreadState` schema."""

    thread_data: NotRequired[ThreadDataState | None]


class ThreadDataMiddleware(AgentMiddleware[ThreadDataMiddlewareState]):
    """Create thread data directories for each thread execution.

    Creates the following directory structure:
    - {base_dir}/threads/{thread_id}/user-data/workspace
    - {base_dir}/threads/{thread_id}/user-data/uploads
    - {base_dir}/threads/{thread_id}/user-data/outputs

    Lifecycle Management:
    - With lazy_init=True (default): Only compute paths; directories are created on demand
    - With lazy_init=False: Eagerly create directories in before_agent()
    """

    state_schema = ThreadDataMiddlewareState

    def __init__(self, base_dir: str | None = None, lazy_init: bool = True):
        """Initialize the middleware.

        Args:
            base_dir: Base directory for thread data. If omitted, the shared
                      Paths instance returned by get_paths() is used.
            lazy_init: If True, defer directory creation until needed.
                      If False, create directories eagerly in before_agent().
                      Defaults to True to avoid unnecessary filesystem work.
        """
        super().__init__()
        self._paths = Paths(base_dir) if base_dir else get_paths()
        self._lazy_init = lazy_init

    def _get_thread_paths(self, thread_id: str) -> dict[str, str]:
        """Get the paths for a thread's data directories.

        Args:
            thread_id: The thread ID.

        Returns:
            Dictionary with workspace_path, uploads_path, and outputs_path.
        """
        return {
            "workspace_path": str(self._paths.sandbox_work_dir(thread_id)),
            "uploads_path": str(self._paths.sandbox_uploads_dir(thread_id)),
            "outputs_path": str(self._paths.sandbox_outputs_dir(thread_id)),
        }

    def _create_thread_directories(self, thread_id: str) -> dict[str, str]:
        """Create the thread data directories.

        Args:
            thread_id: The thread ID.

        Returns:
            Dictionary with the created directory paths.
        """
        self._paths.ensure_thread_dirs(thread_id)
        return self._get_thread_paths(thread_id)

    @override
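    def before_agent(self, state: ThreadDataMiddlewareState, runtime: Runtime) -> dict | None:
        """Resolve the thread ID and return the thread's data paths as a state update.

        The thread ID is read from the runtime context first, then from
        ``config["configurable"]`` as a fallback.

        Raises:
            ValueError: If no thread ID is found in either place.
        """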
        context = runtime.context or {}
        thread_id = context.get("thread_id")
        if thread_id is None:
            config = get_config()
            thread_id = config.get("configurable", {}).get("thread_id")

        if thread_id is None:
            raise ValueError("Thread ID is required in runtime context or config.configurable")

        if self._lazy_init:
            # Lazy initialization: only compute paths, don't create directories
            paths = self._get_thread_paths(thread_id)
        else:
            # Eager initialization: create directories immediately
            paths = self._create_thread_directories(thread_id)
            print(f"Created thread data directories for thread {thread_id}")

        return {
            "thread_data": {
                **paths,
            }
        }