* refactor: extract shared utils to break harness→app cross-layer imports
Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: split backend/src into harness (deerflow.*) and app (app.*)
Physically split the monolithic backend/src/ package into two layers:
- **Harness** (`packages/harness/deerflow/`): publishable agent framework
package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
models, MCP, skills, config, and all core infrastructure.
- **App** (`app/`): unpublished application code with import prefix `app.*`.
Contains gateway (FastAPI REST API) and channels (IM integrations).
Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure
Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: add harness→app boundary check test and update docs
Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.
Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add config versioning with auto-upgrade on startup
When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.
- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix comments
* fix: update src.* import in test_sandbox_tools_security to deerflow.*
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle empty config and search parent dirs for config.example.yaml
Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
looking next to config.yaml, fixing detection in common setups
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: correct skills root path depth and config_version type coercion
- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
after harness split, file lives at packages/harness/deerflow/skills/
so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
_check_config_version() to prevent TypeError when YAML stores value
as string (e.g. config_version: "1")
- tests: add regression tests for both fixes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: update test imports from src.* to deerflow.*/app.* after harness refactor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(harness): add tool-first ACP agent invocation (#37)
* feat(harness): add tool-first ACP agent invocation
* build(harness): make ACP dependency required
* fix(harness): address ACP review feedback
* feat(harness): decouple ACP agent workspace from thread data
ACP agents (codex, claude-code) previously used per-thread workspace
directories, causing path resolution complexity and coupling task
execution to DeerFlow's internal thread data layout. This change:
- Replace _resolve_cwd() with a fixed _get_work_dir() that always uses
{base_dir}/acp-workspace/, eliminating virtual path translation and
thread_id lookups
- Introduce /mnt/acp-workspace virtual path for lead agent read-only
access to ACP agent output files (same pattern as /mnt/skills)
- Add security guards: read-only validation, path traversal prevention,
command path allowlisting, and output masking for acp-workspace
- Update system prompt and tool description to guide LLM: send
self-contained tasks to ACP agents, copy results via /mnt/acp-workspace
- Add 11 new security tests for ACP workspace path handling
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(prompt): inject ACP section only when ACP agents are configured
The ACP agent guidance in the system prompt is now conditionally built
by _build_acp_section(), which checks get_acp_agents() and returns an
empty string when no ACP agents are configured. This avoids polluting
the prompt with irrelevant instructions for users who don't use ACP.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix lint
* fix(harness): address Copilot review comments on sandbox path handling and ACP tool
- local_sandbox: fix path-segment boundary bug in _resolve_path (== or startswith +"/")
and add lookahead in _resolve_paths_in_command regex to prevent /mnt/skills matching
inside /mnt/skills-extra
- local_sandbox_provider: replace print() with logger.warning(..., exc_info=True)
- invoke_acp_agent_tool: guard getattr(option, "optionId") with None default + continue;
move full prompt from INFO to DEBUG level (truncated to 200 chars)
- sandbox/tools: fix _get_acp_workspace_host_path docstring to match implementation;
remove misleading "read-only" language from validate_local_bash_command_paths
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(acp): thread-isolated workspaces, permission guardrail, and ContextVar registry
P1.1 – ACP workspace thread isolation
- Add `Paths.acp_workspace_dir(thread_id)` for per-thread paths
- `_get_work_dir(thread_id)` in invoke_acp_agent_tool now uses
`{base_dir}/threads/{thread_id}/acp-workspace/`; falls back to
global workspace when thread_id is absent or invalid
- `_invoke` extracts thread_id from `RunnableConfig` via
`Annotated[RunnableConfig, InjectedToolArg]`
- `sandbox/tools.py`: `_get_acp_workspace_host_path(thread_id)`,
`_resolve_acp_workspace_path(path, thread_id)`, and all callers
(`replace_virtual_paths_in_command`, `mask_local_paths_in_output`,
`ls_tool`, `read_file_tool`) now resolve ACP paths per-thread
P1.2 – ACP permission guardrail
- New `auto_approve_permissions: bool = False` field in `ACPAgentConfig`
- `_build_permission_response(options, *, auto_approve: bool)` now
defaults to deny; only approves when `auto_approve=True`
- Document field in `config.example.yaml`
P2 – Deferred tool registry race condition
- Replace module-level `_registry` global with `contextvars.ContextVar`
- Each asyncio request context gets its own registry; worker threads
inherit the context automatically via `loop.run_in_executor`
- Expose `get_deferred_registry` / `set_deferred_registry` /
`reset_deferred_registry` helpers
Tests: 831 pass (57 for affected modules, 3 new tests)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(sandbox): mount /mnt/acp-workspace in docker sandbox container
The AioSandboxProvider was not mounting the ACP workspace into the
sandbox container, so /mnt/acp-workspace was inaccessible when the lead
agent tried to read ACP results in docker mode.
Changes:
- `ensure_thread_dirs`: also create `acp-workspace/` (chmod 0o777) so
the directory exists before the sandbox container starts — required
for Docker volume mounts
- `_get_thread_mounts`: add read-only `/mnt/acp-workspace` mount using
the per-thread host path (`host_paths.acp_workspace_dir(thread_id)`)
- Update stale CLAUDE.md description (was "fixed global workspace")
Tests: `test_aio_sandbox_provider.py` (4 new tests)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): remove unused imports in test_aio_sandbox_provider
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix config
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* test: add unit tests for TodoMiddleware
Cover context-loss detection logic:
- _todos_in_messages and _reminder_in_messages helpers
- _format_todos formatting
- Reminder injection when write_todos truncated
- No-op when todos visible or reminder already present
- abefore_model async delegation
* test: fix event loop error in todo middleware async test
Use asyncio.run() instead of get_event_loop().run_until_complete()
to avoid RuntimeError on Python 3.12 where no default event loop
exists in the main thread.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* test: add unit tests for DanglingToolCallMiddleware
Cover message patching logic for dangling tool calls:
- No-op when all tool calls have responses
- Synthetic ToolMessage insertion at correct positions
- Mixed responded/dangling scenarios
- wrap_model_call and awrap_model_call integration
* test: fix async tests and strengthen override assertions
- Use @pytest.mark.anyio + async def instead of deprecated
asyncio.get_event_loop().run_until_complete() (fixes Py3.12 CI failure)
- Assert that override() receives the correct patched messages kwarg
in both wrap_model_call and awrap_model_call tests
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
- Add null checks for runtime.context in uploads_middleware.py and
sandbox/middleware.py to prevent NPE when langgraph runtime context is None
- Tighten langgraph version constraint from >=1.0.6 to >=1.0.6,<1.0.10
to avoid context=None incompatibility with langgraph-api 0.7.x
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* refactor: extract shared skill installer and upload manager to harness
Move duplicated business logic from Gateway routers and Client into
shared harness modules, eliminating code duplication.
New shared modules:
- deerflow.skills.installer: 6 functions (zip security, extraction, install)
- deerflow.uploads.manager: 7 functions (normalize, deduplicate, validate,
list, delete, get_uploads_dir, ensure_uploads_dir)
Key improvements:
- SkillAlreadyExistsError replaces stringly-typed 409 status routing
- normalize_filename rejects backslash-containing filenames
- Read paths (list/delete) no longer mkdir via get_uploads_dir
- Write paths use ensure_uploads_dir for explicit directory creation
- list_files_in_dir does stat inside scandir context (no re-stat)
- install_skill_from_archive uses single is_file() check (one syscall)
- Fix agent config key not reset on update_mcp_config/update_skill
Tests: 42 new (22 installer + 20 upload manager) + client hardening
* refactor: centralize upload URL construction and clean up installer
- Extract upload_virtual_path(), upload_artifact_url(), enrich_file_listing()
into shared manager.py, eliminating 6 duplicated URL constructions across
Gateway router and Client
- Derive all upload URLs from VIRTUAL_PATH_PREFIX constant instead of
hardcoded "mnt/user-data/uploads" strings
- Eliminate TOCTOU pre-checks and double file read in installer — single
ZipFile() open with exception handling replaces is_file() + is_zipfile()
+ ZipFile() sequence
- Add missing re-exports: ensure_uploads_dir in uploads/__init__.py,
SkillAlreadyExistsError in skills/__init__.py
- Remove redundant .lower() on already-lowercase CONVERTIBLE_EXTENSIONS
- Hoist sandbox_uploads_dir(thread_id) before loop in uploads router
* fix: add input validation for thread_id and filename length
- Reject thread_id containing unsafe filesystem characters (only allow
alphanumeric, hyphens, underscores, dots) — prevents 500 on inputs
like <script> or shell metacharacters
- Reject filenames longer than 255 bytes (OS limit) in normalize_filename
- Gateway upload router maps ValueError to 400 for invalid thread_id
* fix: address PR review — symlink safety, input validation coverage, error ordering
- list_files_in_dir: use follow_symlinks=False to prevent symlink metadata
leakage; check is_dir() instead of exists() for non-directory paths
- install_skill_from_archive: restore is_file() pre-check before extension
validation so error messages match the documented exception contract
- validate_thread_id: move from ensure_uploads_dir to get_uploads_dir so
all entry points (upload/list/delete) are protected
- delete_uploaded_file: catch ValueError from thread_id validation (was 500)
- requires_llm marker: also skip when OPENAI_API_KEY is unset
- e2e fixture: update TitleMiddleware exclusion comment (kept filtering —
middleware triggers extra LLM calls that add non-determinism to tests)
* chore: revert uv.lock to main — no dependency changes in this PR
* fix: use monkeypatch for global config in e2e fixture to prevent test pollution
The e2e_env fixture was calling set_title_config() and
set_summarization_config() directly, which mutated global singletons
without automatic cleanup. When pytest ran test_client_e2e.py before
test_title_middleware_core_logic.py, the leaked enabled=False caused
5 title tests to fail in CI.
Switched to monkeypatch.setattr on the module-level private variables
so pytest restores the originals after each test.
* fix: address code review — URL encoding, API consistency, test isolation
- upload_artifact_url: percent-encode filename to handle spaces/#/?
- deduplicate_filename: mutate seen set in place (caller no longer
needs manual .add() — less error-prone API)
- list_files_in_dir: document that size is int, enrich stringifies
- e2e fixture: monkeypatch _app_config instead of set_app_config()
to prevent global singleton pollution (same pattern as title/summarization fix)
- _make_e2e_config: read LLM connection details from env vars so
external contributors can override defaults
- Update tests to match new deduplicate_filename contract
* docs: rewrite RFC in English and add alternatives/breaking changes sections
* fix: address code review feedback on PR #1202
- Rename deduplicate_filename to claim_unique_filename to make
the in-place set mutation explicit in the function name
- Replace PermissionError with PathTraversalError(ValueError) for
path traversal detection — malformed input is 400, not 403
* fix: set _app_config_is_custom in e2e test fixture to prevent config.yaml lookup in CI
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
* feat: add configurable log level and token usage tracking
- Add `log_level` config to control deerflow module log level, synced
to LangGraph Server via serve.sh `--server-log-level`
- Add `token_usage.enabled` config with TokenUsageMiddleware that logs
input/output/total tokens per LLM call from usage_metadata
- Add .omc/ to .gitignore
* fix: use info level for token usage logs since feature has its own toggle
* fix: sort imports to pass lint check
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
LoopDetectionMiddleware injected SystemMessage mid-conversation to warn
about repetitive tool calls. This crashes Anthropic models because
langchain_anthropic's _format_messages() requires system messages to
appear only at the start of the conversation — interleaved system
messages raise 'Received multiple non-consecutive system messages'.
Switch the warning injection from SystemMessage to HumanMessage, which
works with all providers (Anthropic, OpenAI, Google, etc.).
Fixes#1299
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
* fix: add Windows compatibility for make dev/start commands
On Windows with MinGW/Git Bash, the Makefile's direct shell script
execution fails with 'CreateProcess(NULL, env bash ...)' error.
This fix:
- Detects Windows via $(OS) == Windows_NT
- Uses explicit bash invocation on Windows
- Falls back to direct execution on Unix
Users need Git Bash installed (comes with Git for Windows).
Fixes#1288
Related #1278
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix(mcp): implement sync invocation wrapper for async MCP tools
Since DeerFlowClient streams synchronously, invoking async-only MCP tools
(loaded via langchain-mcp-adapters) resulted in a NotImplementedError.
This commit bridges the sync/async gap by dynamically injecting a `func`
wrapper into `StructuredTool` instances that only have a `coroutine`.
Key changes:
- Added `sync_wrapper` in `get_mcp_tools` to execute async tool calls.
- Handled nested event loops by delegating to a global `ThreadPoolExecutor`
when an event loop is already running, avoiding `RuntimeError`.
- Added detailed error logging within the wrapper for better transparency.
- Added comprehensive test coverage in `test_mcp_sync_wrapper.py` verifying
tool patching, event loop behavior, and exception propagation.
* refactor(mcp): extract sync wrapper to module level and fix test mocks
Addressed PR review comments:
- Extracted _make_sync_tool_wrapper to module level to avoid nested func definitions.
- Refactored tests to use the actual production helper instead of duplicating logic.
- Fixed AsyncMock patching for awaited dependencies in tests.
- Added atexit hook for graceful thread pool shutdown.
- Fixed PEP8 blank line formatting in tests.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
os.walk() does not follow symbolic links by default. This means
custom skills installed as symlinks in skills/custom/ are discovered
as directories but never descended into, so their SKILL.md files
are never found and the skills silently fail to load.
Adding followlinks=True fixes this for users who symlink skill
directories from external projects into the custom skills folder.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Only tool calls with name === "task" should be rendered as SubtaskCard.
Previously all tool_calls were mapped to IDs, causing SubtaskCard to
render for non-task tool calls whose IDs were never registered in the
subtask context, resulting in a TypeError on task.status.
Signed-off-by: Gao Mingfei <g199209@gmail.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Surface the usage_metadata that PR #1218 added to the streaming API.
A compact indicator in the chat header shows cumulative tokens consumed
per thread, with a tooltip breakdown of input/output/total counts.
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(threads): clean up local thread data after thread deletion
Delete DeerFlow-managed thread directories after the web UI removes a LangGraph thread.
This keeps local thread data in sync with conversation deletion and adds regression coverage for the cleanup flow.
* fix(threads): address thread cleanup review feedback
Encode thread cleanup URLs in the web client, keep cache updates explicit when no thread search data is cached, and return a generic 500 response from the cleanup endpoint while documenting the sanitized error behavior.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: add error handling for podcast generation failures
When TTS processing fails, the system was generating 0-second audio files
without any error indication. This fix adds:
1. Track failed TTS lines and log warning with indices
2. Raise ValueError when all TTS generation fails with helpful message
3. Check for empty audio output in mix_audio and raise error
4. Log success/failure ratio for debugging
Fixes#30
* fix: address Copilot review feedback
- Use `not audio` to catch both None and empty bytes
- Log failed lines with 1-based indices for user-friendly output
- Handle empty script case with clear error message
- Validate env vars before ThreadPoolExecutor for fast-fail on config errors
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(frontend): add Cmd+K command palette and keyboard shortcuts
Wire up the existing shadcn/ui Command component as a global command
palette. Adds a useGlobalShortcuts hook for Cmd+K (palette), Cmd+Shift+N
(new chat), Cmd+, (settings), and Cmd+/ (shortcuts help overlay).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(frontend): address Copilot review feedback on command palette
- Normalize event.key with toLowerCase() for reliable Shift+key matching
- Replace dead deerflow:open-settings event with router.push navigation
- Use platform-appropriate Shift label (Shift+ on Windows/Linux, glyph on Mac)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Add GuardrailMiddleware that evaluates every tool call before execution.
Three provider options: built-in AllowlistProvider (zero deps), OAP passport
providers (open standard), or custom providers loaded by class path.
- GuardrailProvider protocol with GuardrailRequest/Decision dataclasses
- GuardrailMiddleware (AgentMiddleware, position 5 in chain)
- AllowlistProvider for simple deny/allow by tool name
- GuardrailsConfig (Pydantic singleton, loaded from config.yaml)
- 25 tests covering allow/deny, fail-closed/open, async, GraphBubbleUp
- Comprehensive docs at backend/docs/GUARDRAILS.md
Closes#1213
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(web): add conversation export as Markdown and JSON (#976)
Add the ability to export conversations in Markdown and JSON formats,
accessible from both the chat header and the sidebar context menu.
- Add export utility (formatThreadAsMarkdown, formatThreadAsJSON) with
support for user/assistant messages, thinking blocks, and tool calls
- Add ExportTrigger component in chat header (appears when messages exist)
- Add Export submenu to sidebar dropdown (fetches full thread state on demand)
- Add i18n translations for en-US and zh-CN
Closes#976
Made-with: Cursor
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update thread creation timestamp to updated_at
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The dev compose file was missing CLI auth directory mounts that exist in
the production compose file. This caused CodexChatModel to fail with
'Codex CLI credential not found' error in dev mode.
Fixes#1246
* feat: add Claude Code OAuth and Codex CLI providers
Port of bytedance/deer-flow#1136 from @solanian's feat/cli-oauth-providers branch.\n\nCarries the feature forward on top of current main without the original CLA-blocked commit metadata, while preserving attribution in the commit message for review.
* fix: harden CLI credential loading
Align Codex auth loading with the current ~/.codex/auth.json shape, make Docker credential mounts directory-based to avoid broken file binds on hosts without exported credential files, and add focused loader tests.
* refactor: tighten codex auth typing
Replace the temporary Any return type in CodexChatModel._load_codex_auth with the concrete CodexCliCredential type after the credential loader was stabilized.
* fix: load Claude Code OAuth from Keychain
Match Claude Code's macOS storage strategy more closely by checking the Keychain-backed credentials store before falling back to ~/.claude/.credentials.json. Keep explicit file overrides and add focused tests for the Keychain path.
* fix: require explicit Claude OAuth handoff
* style: format thread hooks reasoning request
* docs: document CLI-backed auth providers
* fix: address provider review feedback
* fix: harden provider edge cases
* Fix deferred tools, Codex message normalization, and local sandbox paths
* chore: narrow PR scope to OAuth providers
* chore: remove unrelated frontend changes
* chore: reapply OAuth branch frontend scope cleanup
* fix: preserve upload guards with reasoning effort wiring
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: normalize ToolMessage structured content in serialization
When models return ToolMessage content as a list of content blocks
(e.g. [{"type": "text", "text": "..."}]), the UI previously displayed
the raw Python repr string instead of the extracted text.
Replace str(msg.content) with the existing _extract_text() helper in
both _serialize_message() and stream() to properly normalize
list-of-blocks content to plain text.
Fixes#1149
Also fixes the same root cause as #1188 (characters displayed one per
line when tool response content is returned as structured blocks).
Added 11 regression tests covering string, list-of-blocks, mixed,
empty, and fallback content types.
* fix(memory): extract text from structured LLM responses in memory updater
When LLMs return response content as list of content blocks
(e.g. [{"type": "text", "text": "..."}]) instead of plain strings,
str() produces Python repr which breaks JSON parsing in the memory
updater. This caused memory updates to silently fail.
Changes:
- Add _extract_text() helper in updater.py for safe content normalization
- Use _extract_text() instead of str(response.content) in update_memory()
- Fix format_conversation_for_update() to handle plain strings in list content
- Fix subagent executor fallback path to extract text from list content
- Replace print() with structured logging (logger.info/warning/error)
- Add 13 regression tests covering _extract_text, format_conversation,
and update_memory with structured LLM responses
* fix: address Copilot review - defensive text extraction + logger.exception
- client.py _extract_text: use block.get('text') + isinstance check (prevent KeyError/TypeError)
- prompt.py format_conversation_for_update: same defensive check for dict text blocks
- executor.py: type-safe text extraction in both code paths, fallback to placeholder instead of str(raw_content)
- updater.py: use logger.exception() instead of logger.error() for traceback preservation
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: preserve chunked structured content without spurious newlines
* fix: restore backend unit test compatibility
---------
Co-authored-by: Exploreunive <Exploreunive@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: track token usage per conversation turn
Add token usage tracking to the streaming API so consumers can monitor
cost per turn without additional API calls.
Changes:
1. _serialize_message now includes usage_metadata for AI messages in
values events, exposing input_tokens/output_tokens/total_tokens
from LangChain's native metadata.
2. stream() accumulates token usage across all AI messages in a turn
and emits the cumulative totals in the end event:
{usage: {input_tokens: N, output_tokens: N, total_tokens: N}}
3. Each messages-tuple AI event with text content now includes a
per-message usage_metadata field for granular tracking.
This enables the frontend to display token consumption per turn,
support cost-aware UX, and let users monitor API spending.
10 tests added covering serialization passthrough and cumulative
aggregation logic.
Co-Authored-By: OpenClaw <noreply@openclaw.ai>
* fix: address Copilot review - use Mapping access for usage_metadata
- Replace getattr(usage, 'input_tokens', 0) with usage.get('input_tokens', 0)
since LangChain usage_metadata is a dict, not an object
- Remove unused 'import pytest' (fixes Ruff F401)
- Add proper stream() integration tests for cumulative usage in end event
and per-message usage_metadata in messages-tuple events
---------
Co-authored-by: Exploreunive <Exploreunive@users.noreply.github.com>
Co-authored-by: OpenClaw <noreply@openclaw.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>