Files
deer-flow/backend/packages/harness/deerflow/agents/middlewares/view_image_middleware.py
DanielWalnut 76803b826f refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)
* refactor: extract shared utils to break harness→app cross-layer imports

Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: split backend/src into harness (deerflow.*) and app (app.*)

Physically split the monolithic backend/src/ package into two layers:

- **Harness** (`packages/harness/deerflow/`): publishable agent framework
  package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
  models, MCP, skills, config, and all core infrastructure.

- **App** (`app/`): unpublished application code with import prefix `app.*`.
  Contains gateway (FastAPI REST API) and channels (IM integrations).

Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure

Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add harness→app boundary check test and update docs

Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.

Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add config versioning with auto-upgrade on startup

When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.

- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix comments

* fix: update src.* import in test_sandbox_tools_security to deerflow.*

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle empty config and search parent dirs for config.example.yaml

Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
  looking next to config.yaml, fixing detection in common setups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct skills root path depth and config_version type coercion

- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
  after harness split, file lives at packages/harness/deerflow/skills/
  so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
  _check_config_version() to prevent TypeError when YAML stores value
  as string (e.g. config_version: "1")
- tests: add regression tests for both fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update test imports from src.* to deerflow.*/app.* after harness refactor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 22:55:52 +08:00

222 lines
8.4 KiB
Python

"""Middleware for injecting image details into conversation before LLM call."""
from typing import NotRequired, override
from langchain.agents import AgentState
from langchain.agents.middleware import AgentMiddleware
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
from langgraph.runtime import Runtime
from deerflow.agents.thread_state import ViewedImageData
class ViewImageMiddlewareState(AgentState):
"""Compatible with the `ThreadState` schema."""
viewed_images: NotRequired[dict[str, ViewedImageData] | None]
class ViewImageMiddleware(AgentMiddleware[ViewImageMiddlewareState]):
"""Injects image details as a human message before LLM calls when view_image tools have completed.
This middleware:
1. Runs before each LLM call
2. Checks if the last assistant message contains view_image tool calls
3. Verifies all tool calls in that message have been completed (have corresponding ToolMessages)
4. If conditions are met, creates a human message with all viewed image details (including base64 data)
5. Adds the message to state so the LLM can see and analyze the images
This enables the LLM to automatically receive and analyze images that were loaded via view_image tool,
without requiring explicit user prompts to describe the images.
"""
state_schema = ViewImageMiddlewareState
def _get_last_assistant_message(self, messages: list) -> AIMessage | None:
"""Get the last assistant message from the message list.
Args:
messages: List of messages
Returns:
Last AIMessage or None if not found
"""
for msg in reversed(messages):
if isinstance(msg, AIMessage):
return msg
return None
def _has_view_image_tool(self, message: AIMessage) -> bool:
"""Check if the assistant message contains view_image tool calls.
Args:
message: Assistant message to check
Returns:
True if message contains view_image tool calls
"""
if not hasattr(message, "tool_calls") or not message.tool_calls:
return False
return any(tool_call.get("name") == "view_image" for tool_call in message.tool_calls)
def _all_tools_completed(self, messages: list, assistant_msg: AIMessage) -> bool:
"""Check if all tool calls in the assistant message have been completed.
Args:
messages: List of all messages
assistant_msg: The assistant message containing tool calls
Returns:
True if all tool calls have corresponding ToolMessages
"""
if not hasattr(assistant_msg, "tool_calls") or not assistant_msg.tool_calls:
return False
# Get all tool call IDs from the assistant message
tool_call_ids = {tool_call.get("id") for tool_call in assistant_msg.tool_calls if tool_call.get("id")}
# Find the index of the assistant message
try:
assistant_idx = messages.index(assistant_msg)
except ValueError:
return False
# Get all ToolMessages after the assistant message
completed_tool_ids = set()
for msg in messages[assistant_idx + 1 :]:
if isinstance(msg, ToolMessage) and msg.tool_call_id:
completed_tool_ids.add(msg.tool_call_id)
# Check if all tool calls have been completed
return tool_call_ids.issubset(completed_tool_ids)
def _create_image_details_message(self, state: ViewImageMiddlewareState) -> list[str | dict]:
"""Create a formatted message with all viewed image details.
Args:
state: Current state containing viewed_images
Returns:
List of content blocks (text and images) for the HumanMessage
"""
viewed_images = state.get("viewed_images", {})
if not viewed_images:
return ["No images have been viewed."]
# Build the message with image information
content_blocks: list[str | dict] = [{"type": "text", "text": "Here are the images you've viewed:"}]
for image_path, image_data in viewed_images.items():
mime_type = image_data.get("mime_type", "unknown")
base64_data = image_data.get("base64", "")
# Add text description
content_blocks.append({"type": "text", "text": f"\n- **{image_path}** ({mime_type})"})
# Add the actual image data so LLM can "see" it
if base64_data:
content_blocks.append(
{
"type": "image_url",
"image_url": {"url": f"data:{mime_type};base64,{base64_data}"},
}
)
return content_blocks
def _should_inject_image_message(self, state: ViewImageMiddlewareState) -> bool:
"""Determine if we should inject an image details message.
Args:
state: Current state
Returns:
True if we should inject the message
"""
messages = state.get("messages", [])
if not messages:
return False
# Get the last assistant message
last_assistant_msg = self._get_last_assistant_message(messages)
if not last_assistant_msg:
return False
# Check if it has view_image tool calls
if not self._has_view_image_tool(last_assistant_msg):
return False
# Check if all tools have been completed
if not self._all_tools_completed(messages, last_assistant_msg):
return False
# Check if we've already added an image details message
# Look for a human message after the last assistant message that contains image details
assistant_idx = messages.index(last_assistant_msg)
for msg in messages[assistant_idx + 1 :]:
if isinstance(msg, HumanMessage):
content_str = str(msg.content)
if "Here are the images you've viewed" in content_str or "Here are the details of the images you've viewed" in content_str:
# Already added, don't add again
return False
return True
def _inject_image_message(self, state: ViewImageMiddlewareState) -> dict | None:
"""Internal helper to inject image details message.
Args:
state: Current state
Returns:
State update with additional human message, or None if no update needed
"""
if not self._should_inject_image_message(state):
return None
# Create the image details message with text and image content
image_content = self._create_image_details_message(state)
# Create a new human message with mixed content (text + images)
human_msg = HumanMessage(content=image_content)
print("[ViewImageMiddleware] Injecting image details message with images before LLM call")
# Return state update with the new message
return {"messages": [human_msg]}
@override
def before_model(self, state: ViewImageMiddlewareState, runtime: Runtime) -> dict | None:
"""Inject image details message before LLM call if view_image tools have completed (sync version).
This runs before each LLM call, checking if the previous turn included view_image
tool calls that have all completed. If so, it injects a human message with the image
details so the LLM can see and analyze the images.
Args:
state: Current state
runtime: Runtime context (unused but required by interface)
Returns:
State update with additional human message, or None if no update needed
"""
return self._inject_image_message(state)
@override
async def abefore_model(self, state: ViewImageMiddlewareState, runtime: Runtime) -> dict | None:
"""Inject image details message before LLM call if view_image tools have completed (async version).
This runs before each LLM call, checking if the previous turn included view_image
tool calls that have all completed. If so, it injects a human message with the image
details so the LLM can see and analyze the images.
Args:
state: Current state
runtime: Runtime context (unused but required by interface)
Returns:
State update with additional human message, or None if no update needed
"""
return self._inject_image_message(state)