backend/packages/harness/deerflow/agents/middlewares/subagent_limit_middleware.py

"""Middleware to enforce maximum concurrent subagent tool calls per model response."""

import logging
from typing import override

from langchain.agents import AgentState
from langchain.agents.middleware import AgentMiddleware
from langgraph.runtime import Runtime

from deerflow.subagents.executor import MAX_CONCURRENT_SUBAGENTS

logger = logging.getLogger(__name__)

# Valid range for max_concurrent_subagents
MIN_SUBAGENT_LIMIT = 2
MAX_SUBAGENT_LIMIT = 4


def _clamp_subagent_limit(value: int) -> int:
    """Clamp subagent limit to valid range [2, 4]."""
    return max(MIN_SUBAGENT_LIMIT, min(MAX_SUBAGENT_LIMIT, value))


class SubagentLimitMiddleware(AgentMiddleware[AgentState]):
    """Truncates excess 'task' tool calls from a single model response.

    When an LLM generates more than max_concurrent parallel task tool calls
    in one response, this middleware keeps only the first max_concurrent and
    discards the rest. This is more reliable than prompt-based limits.

    Args:
        max_concurrent: Maximum number of concurrent subagent calls allowed.
            Defaults to MAX_CONCURRENT_SUBAGENTS (3). Clamped to [2, 4].
    """

    def __init__(self, max_concurrent: int = MAX_CONCURRENT_SUBAGENTS):
        super().__init__()
        self.max_concurrent = _clamp_subagent_limit(max_concurrent)

    def _truncate_task_calls(self, state: AgentState) -> dict | None:
        messages = state.get("messages", [])
        if not messages:
            return None

        last_msg = messages[-1]
        if getattr(last_msg, "type", None) != "ai":
            return None

        tool_calls = getattr(last_msg, "tool_calls", None)
        if not tool_calls:
            return None

        # Count task tool calls
        task_indices = [i for i, tc in enumerate(tool_calls) if tc.get("name") == "task"]
        if len(task_indices) <= self.max_concurrent:
            return None

        # Build set of indices to drop (excess task calls beyond the limit)
        indices_to_drop = set(task_indices[self.max_concurrent :])
        truncated_tool_calls = [tc for i, tc in enumerate(tool_calls) if i not in indices_to_drop]

        dropped_count = len(indices_to_drop)
        logger.warning(f"Truncated {dropped_count} excess task tool call(s) from model response (limit: {self.max_concurrent})")

        # Replace the AIMessage with truncated tool_calls (same id triggers replacement)
        updated_msg = last_msg.model_copy(update={"tool_calls": truncated_tool_calls})
        return {"messages": [updated_msg]}

    @override
    def after_model(self, state: AgentState, runtime: Runtime) -> dict | None:
        return self._truncate_task_calls(state)

    @override
    async def aafter_model(self, state: AgentState, runtime: Runtime) -> dict | None:
        return self._truncate_task_calls(state)
feat: add DanglingToolCallMiddleware and SubagentLimitMiddleware Add two new middlewares to improve robustness of the agent pipeline: - DanglingToolCallMiddleware injects placeholder ToolMessages for interrupted tool calls, preventing LLM errors from malformed history - SubagentLimitMiddleware truncates excess parallel task tool calls at the model response level, replacing the runtime check in task_tool Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-09 13:21:58 +08:00			`"""Middleware to enforce maximum concurrent subagent tool calls per model response."""`

			`import logging`
			`from typing import override`

			`from langchain.agents import AgentState`
			`from langchain.agents.middleware import AgentMiddleware`
			`from langgraph.runtime import Runtime`

refactor: split backend into harness (deerflow.) and app (app.) (#1131) * refactor: extract shared utils to break harness→app cross-layer imports Move _validate_skill_frontmatter to src/skills/validation.py and CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py. This eliminates the two reverse dependencies from client.py (harness layer) into gateway/routers/ (app layer), preparing for the harness/app package split. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: split backend/src into harness (deerflow.) and app (app.) Physically split the monolithic backend/src/ package into two layers: - Harness (`packages/harness/deerflow/`): publishable agent framework package with import prefix `deerflow.`. Contains agents, sandbox, tools, models, MCP, skills, config, and all core infrastructure. - App* (`app/`): unpublished application code with import prefix `app.`. Contains gateway (FastAPI REST API) and channels (IM integrations). Key changes: - Move 13 harness modules to packages/harness/deerflow/ via git mv - Move gateway + channels to app/ via git mv - Rename all imports: src. → deerflow.* (harness) / app.* (app layer) - Set up uv workspace with deerflow-harness as workspace member - Update langgraph.json, config.example.yaml, all scripts, Docker files - Add build-system (hatchling) to harness pyproject.toml - Add PYTHONPATH=. to gateway startup commands for app.* resolution - Update ruff.toml with known-first-party for import sorting - Update all documentation to reflect new directory structure Boundary rule enforced: harness code never imports from app. All 429 tests pass. Lint clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: add harness→app boundary check test and update docs Add test_harness_boundary.py that scans all Python files in packages/harness/deerflow/ and fails if any `from app.` or `import app.` statement is found. This enforces the architectural rule that the harness layer never depends on the app layer. Update CLAUDE.md to document the harness/app split architecture, import conventions, and the boundary enforcement test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add config versioning with auto-upgrade on startup When config.example.yaml schema changes, developers' local config.yaml files can silently become outdated. This adds a config_version field and auto-upgrade mechanism so breaking changes (like src.* → deerflow.* renames) are applied automatically before services start. - Add config_version: 1 to config.example.yaml - Add startup version check warning in AppConfig.from_file() - Add scripts/config-upgrade.sh with migration registry for value replacements - Add `make config-upgrade` target - Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services - Add config error hints in service failure messages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix comments * fix: update src.* import in test_sandbox_tools_security to deerflow.* Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle empty config and search parent dirs for config.example.yaml Address Copilot review comments on PR #1131: - Guard against yaml.safe_load() returning None for empty config files - Search parent directories for config.example.yaml instead of only looking next to config.yaml, fixing detection in common setups Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct skills root path depth and config_version type coercion - loader.py: fix get_skills_root_path() to use 5 parent levels (was 3) after harness split, file lives at packages/harness/deerflow/skills/ so parent×3 resolved to backend/packages/harness/ instead of backend/ - app_config.py: coerce config_version to int() before comparison in _check_config_version() to prevent TypeError when YAML stores value as string (e.g. config_version: "1") - tests: add regression tests for both fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: update test imports from src.* to deerflow./app. after harness refactor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> 2026-03-14 22:55:52 +08:00			`from deerflow.subagents.executor import MAX_CONCURRENT_SUBAGENTS`
feat: add DanglingToolCallMiddleware and SubagentLimitMiddleware Add two new middlewares to improve robustness of the agent pipeline: - DanglingToolCallMiddleware injects placeholder ToolMessages for interrupted tool calls, preventing LLM errors from malformed history - SubagentLimitMiddleware truncates excess parallel task tool calls at the model response level, replacing the runtime check in task_tool Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-09 13:21:58 +08:00
			`logger = logging.getLogger(__name__)`

feat: make max concurrent subagents configurable via runtime config Support configuring max_concurrent_subagents (2-4, default 3) through config.configurable, with automatic clamping and dynamic prompt generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-11 11:41:30 +08:00			`# Valid range for max_concurrent_subagents`
			`MIN_SUBAGENT_LIMIT = 2`
			`MAX_SUBAGENT_LIMIT = 4`


			`def _clamp_subagent_limit(value: int) -> int:`
			`"""Clamp subagent limit to valid range [2, 4]."""`
			`return max(MIN_SUBAGENT_LIMIT, min(MAX_SUBAGENT_LIMIT, value))`

feat: add DanglingToolCallMiddleware and SubagentLimitMiddleware Add two new middlewares to improve robustness of the agent pipeline: - DanglingToolCallMiddleware injects placeholder ToolMessages for interrupted tool calls, preventing LLM errors from malformed history - SubagentLimitMiddleware truncates excess parallel task tool calls at the model response level, replacing the runtime check in task_tool Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-09 13:21:58 +08:00
			`class SubagentLimitMiddleware(AgentMiddleware[AgentState]):`
			`"""Truncates excess 'task' tool calls from a single model response.`

feat: make max concurrent subagents configurable via runtime config Support configuring max_concurrent_subagents (2-4, default 3) through config.configurable, with automatic clamping and dynamic prompt generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-11 11:41:30 +08:00			`When an LLM generates more than max_concurrent parallel task tool calls`
			`in one response, this middleware keeps only the first max_concurrent and`
feat: add DanglingToolCallMiddleware and SubagentLimitMiddleware Add two new middlewares to improve robustness of the agent pipeline: - DanglingToolCallMiddleware injects placeholder ToolMessages for interrupted tool calls, preventing LLM errors from malformed history - SubagentLimitMiddleware truncates excess parallel task tool calls at the model response level, replacing the runtime check in task_tool Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-09 13:21:58 +08:00			`discards the rest. This is more reliable than prompt-based limits.`
feat: make max concurrent subagents configurable via runtime config Support configuring max_concurrent_subagents (2-4, default 3) through config.configurable, with automatic clamping and dynamic prompt generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-11 11:41:30 +08:00
			`Args:`
			`max_concurrent: Maximum number of concurrent subagent calls allowed.`
			`Defaults to MAX_CONCURRENT_SUBAGENTS (3). Clamped to [2, 4].`
feat: add DanglingToolCallMiddleware and SubagentLimitMiddleware Add two new middlewares to improve robustness of the agent pipeline: - DanglingToolCallMiddleware injects placeholder ToolMessages for interrupted tool calls, preventing LLM errors from malformed history - SubagentLimitMiddleware truncates excess parallel task tool calls at the model response level, replacing the runtime check in task_tool Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-09 13:21:58 +08:00			`"""`

feat: make max concurrent subagents configurable via runtime config Support configuring max_concurrent_subagents (2-4, default 3) through config.configurable, with automatic clamping and dynamic prompt generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-11 11:41:30 +08:00			`def __init__(self, max_concurrent: int = MAX_CONCURRENT_SUBAGENTS):`
			`super().__init__()`
			`self.max_concurrent = _clamp_subagent_limit(max_concurrent)`

feat: add DanglingToolCallMiddleware and SubagentLimitMiddleware Add two new middlewares to improve robustness of the agent pipeline: - DanglingToolCallMiddleware injects placeholder ToolMessages for interrupted tool calls, preventing LLM errors from malformed history - SubagentLimitMiddleware truncates excess parallel task tool calls at the model response level, replacing the runtime check in task_tool Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-09 13:21:58 +08:00			`def _truncate_task_calls(self, state: AgentState) -> dict \| None:`
			`messages = state.get("messages", [])`
			`if not messages:`
			`return None`

			`last_msg = messages[-1]`
			`if getattr(last_msg, "type", None) != "ai":`
			`return None`

			`tool_calls = getattr(last_msg, "tool_calls", None)`
			`if not tool_calls:`
			`return None`

			`# Count task tool calls`
			`task_indices = [i for i, tc in enumerate(tool_calls) if tc.get("name") == "task"]`
feat: make max concurrent subagents configurable via runtime config Support configuring max_concurrent_subagents (2-4, default 3) through config.configurable, with automatic clamping and dynamic prompt generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-11 11:41:30 +08:00			`if len(task_indices) <= self.max_concurrent:`
feat: add DanglingToolCallMiddleware and SubagentLimitMiddleware Add two new middlewares to improve robustness of the agent pipeline: - DanglingToolCallMiddleware injects placeholder ToolMessages for interrupted tool calls, preventing LLM errors from malformed history - SubagentLimitMiddleware truncates excess parallel task tool calls at the model response level, replacing the runtime check in task_tool Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-09 13:21:58 +08:00			`return None`

			`# Build set of indices to drop (excess task calls beyond the limit)`
feat: make max concurrent subagents configurable via runtime config Support configuring max_concurrent_subagents (2-4, default 3) through config.configurable, with automatic clamping and dynamic prompt generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-11 11:41:30 +08:00			`indices_to_drop = set(task_indices[self.max_concurrent :])`
feat: add DanglingToolCallMiddleware and SubagentLimitMiddleware Add two new middlewares to improve robustness of the agent pipeline: - DanglingToolCallMiddleware injects placeholder ToolMessages for interrupted tool calls, preventing LLM errors from malformed history - SubagentLimitMiddleware truncates excess parallel task tool calls at the model response level, replacing the runtime check in task_tool Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-09 13:21:58 +08:00			`truncated_tool_calls = [tc for i, tc in enumerate(tool_calls) if i not in indices_to_drop]`

			`dropped_count = len(indices_to_drop)`
feat: make max concurrent subagents configurable via runtime config Support configuring max_concurrent_subagents (2-4, default 3) through config.configurable, with automatic clamping and dynamic prompt generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-11 11:41:30 +08:00			`logger.warning(f"Truncated {dropped_count} excess task tool call(s) from model response (limit: {self.max_concurrent})")`
feat: add DanglingToolCallMiddleware and SubagentLimitMiddleware Add two new middlewares to improve robustness of the agent pipeline: - DanglingToolCallMiddleware injects placeholder ToolMessages for interrupted tool calls, preventing LLM errors from malformed history - SubagentLimitMiddleware truncates excess parallel task tool calls at the model response level, replacing the runtime check in task_tool Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-09 13:21:58 +08:00
			`# Replace the AIMessage with truncated tool_calls (same id triggers replacement)`
			`updated_msg = last_msg.model_copy(update={"tool_calls": truncated_tool_calls})`
			`return {"messages": [updated_msg]}`

			`@override`
			`def after_model(self, state: AgentState, runtime: Runtime) -> dict \| None:`
			`return self._truncate_task_calls(state)`

			`@override`
			`async def aafter_model(self, state: AgentState, runtime: Runtime) -> dict \| None:`
			`return self._truncate_task_calls(state)`