os.walk() does not follow symbolic links by default. This means
custom skills installed as symlinks in skills/custom/ are discovered
as directories but never descended into, so their SKILL.md files
are never found and the skills silently fail to load.
Adding followlinks=True fixes this for users who symlink skill
directories from external projects into the custom skills folder.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Only tool calls with name === "task" should be rendered as SubtaskCard.
Previously all tool_calls were mapped to IDs, causing SubtaskCard to
render for non-task tool calls whose IDs were never registered in the
subtask context, resulting in a TypeError on task.status.
Signed-off-by: Gao Mingfei <g199209@gmail.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Surface the usage_metadata that PR #1218 added to the streaming API.
A compact indicator in the chat header shows cumulative tokens consumed
per thread, with a tooltip breakdown of input/output/total counts.
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(threads): clean up local thread data after thread deletion
Delete DeerFlow-managed thread directories after the web UI removes a LangGraph thread.
This keeps local thread data in sync with conversation deletion and adds regression coverage for the cleanup flow.
* fix(threads): address thread cleanup review feedback
Encode thread cleanup URLs in the web client, keep cache updates explicit when no thread search data is cached, and return a generic 500 response from the cleanup endpoint while documenting the sanitized error behavior.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: add error handling for podcast generation failures
When TTS processing fails, the system was generating 0-second audio files
without any error indication. This fix adds:
1. Track failed TTS lines and log warning with indices
2. Raise ValueError when all TTS generation fails with helpful message
3. Check for empty audio output in mix_audio and raise error
4. Log success/failure ratio for debugging
Fixes#30
* fix: address Copilot review feedback
- Use `not audio` to catch both None and empty bytes
- Log failed lines with 1-based indices for user-friendly output
- Handle empty script case with clear error message
- Validate env vars before ThreadPoolExecutor for fast-fail on config errors
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(frontend): add Cmd+K command palette and keyboard shortcuts
Wire up the existing shadcn/ui Command component as a global command
palette. Adds a useGlobalShortcuts hook for Cmd+K (palette), Cmd+Shift+N
(new chat), Cmd+, (settings), and Cmd+/ (shortcuts help overlay).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(frontend): address Copilot review feedback on command palette
- Normalize event.key with toLowerCase() for reliable Shift+key matching
- Replace dead deerflow:open-settings event with router.push navigation
- Use platform-appropriate Shift label (Shift+ on Windows/Linux, glyph on Mac)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Add GuardrailMiddleware that evaluates every tool call before execution.
Three provider options: built-in AllowlistProvider (zero deps), OAP passport
providers (open standard), or custom providers loaded by class path.
- GuardrailProvider protocol with GuardrailRequest/Decision dataclasses
- GuardrailMiddleware (AgentMiddleware, position 5 in chain)
- AllowlistProvider for simple deny/allow by tool name
- GuardrailsConfig (Pydantic singleton, loaded from config.yaml)
- 25 tests covering allow/deny, fail-closed/open, async, GraphBubbleUp
- Comprehensive docs at backend/docs/GUARDRAILS.md
Closes#1213
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(web): add conversation export as Markdown and JSON (#976)
Add the ability to export conversations in Markdown and JSON formats,
accessible from both the chat header and the sidebar context menu.
- Add export utility (formatThreadAsMarkdown, formatThreadAsJSON) with
support for user/assistant messages, thinking blocks, and tool calls
- Add ExportTrigger component in chat header (appears when messages exist)
- Add Export submenu to sidebar dropdown (fetches full thread state on demand)
- Add i18n translations for en-US and zh-CN
Closes#976
Made-with: Cursor
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update thread creation timestamp to updated_at
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The dev compose file was missing CLI auth directory mounts that exist in
the production compose file. This caused CodexChatModel to fail with
'Codex CLI credential not found' error in dev mode.
Fixes#1246
* feat: add Claude Code OAuth and Codex CLI providers
Port of bytedance/deer-flow#1136 from @solanian's feat/cli-oauth-providers branch.\n\nCarries the feature forward on top of current main without the original CLA-blocked commit metadata, while preserving attribution in the commit message for review.
* fix: harden CLI credential loading
Align Codex auth loading with the current ~/.codex/auth.json shape, make Docker credential mounts directory-based to avoid broken file binds on hosts without exported credential files, and add focused loader tests.
* refactor: tighten codex auth typing
Replace the temporary Any return type in CodexChatModel._load_codex_auth with the concrete CodexCliCredential type after the credential loader was stabilized.
* fix: load Claude Code OAuth from Keychain
Match Claude Code's macOS storage strategy more closely by checking the Keychain-backed credentials store before falling back to ~/.claude/.credentials.json. Keep explicit file overrides and add focused tests for the Keychain path.
* fix: require explicit Claude OAuth handoff
* style: format thread hooks reasoning request
* docs: document CLI-backed auth providers
* fix: address provider review feedback
* fix: harden provider edge cases
* Fix deferred tools, Codex message normalization, and local sandbox paths
* chore: narrow PR scope to OAuth providers
* chore: remove unrelated frontend changes
* chore: reapply OAuth branch frontend scope cleanup
* fix: preserve upload guards with reasoning effort wiring
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: normalize ToolMessage structured content in serialization
When models return ToolMessage content as a list of content blocks
(e.g. [{"type": "text", "text": "..."}]), the UI previously displayed
the raw Python repr string instead of the extracted text.
Replace str(msg.content) with the existing _extract_text() helper in
both _serialize_message() and stream() to properly normalize
list-of-blocks content to plain text.
Fixes#1149
Also fixes the same root cause as #1188 (characters displayed one per
line when tool response content is returned as structured blocks).
Added 11 regression tests covering string, list-of-blocks, mixed,
empty, and fallback content types.
* fix(memory): extract text from structured LLM responses in memory updater
When LLMs return response content as list of content blocks
(e.g. [{"type": "text", "text": "..."}]) instead of plain strings,
str() produces Python repr which breaks JSON parsing in the memory
updater. This caused memory updates to silently fail.
Changes:
- Add _extract_text() helper in updater.py for safe content normalization
- Use _extract_text() instead of str(response.content) in update_memory()
- Fix format_conversation_for_update() to handle plain strings in list content
- Fix subagent executor fallback path to extract text from list content
- Replace print() with structured logging (logger.info/warning/error)
- Add 13 regression tests covering _extract_text, format_conversation,
and update_memory with structured LLM responses
* fix: address Copilot review - defensive text extraction + logger.exception
- client.py _extract_text: use block.get('text') + isinstance check (prevent KeyError/TypeError)
- prompt.py format_conversation_for_update: same defensive check for dict text blocks
- executor.py: type-safe text extraction in both code paths, fallback to placeholder instead of str(raw_content)
- updater.py: use logger.exception() instead of logger.error() for traceback preservation
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: preserve chunked structured content without spurious newlines
* fix: restore backend unit test compatibility
---------
Co-authored-by: Exploreunive <Exploreunive@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: track token usage per conversation turn
Add token usage tracking to the streaming API so consumers can monitor
cost per turn without additional API calls.
Changes:
1. _serialize_message now includes usage_metadata for AI messages in
values events, exposing input_tokens/output_tokens/total_tokens
from LangChain's native metadata.
2. stream() accumulates token usage across all AI messages in a turn
and emits the cumulative totals in the end event:
{usage: {input_tokens: N, output_tokens: N, total_tokens: N}}
3. Each messages-tuple AI event with text content now includes a
per-message usage_metadata field for granular tracking.
This enables the frontend to display token consumption per turn,
support cost-aware UX, and let users monitor API spending.
10 tests added covering serialization passthrough and cumulative
aggregation logic.
Co-Authored-By: OpenClaw <noreply@openclaw.ai>
* fix: address Copilot review - use Mapping access for usage_metadata
- Replace getattr(usage, 'input_tokens', 0) with usage.get('input_tokens', 0)
since LangChain usage_metadata is a dict, not an object
- Remove unused 'import pytest' (fixes Ruff F401)
- Add proper stream() integration tests for cumulative usage in end event
and per-message usage_metadata in messages-tuple events
---------
Co-authored-by: Exploreunive <Exploreunive@users.noreply.github.com>
Co-authored-by: OpenClaw <noreply@openclaw.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
This PR improves MiniMax Code Plan integration in DeerFlow by fixing three issues in the current flow: stream errors were not clearly surfaced in the UI, the frontend could not display the actual provider model ID, and MiniMax reasoning output could leak into final assistant content as inline <think>...</think>. The change adds a MiniMax-specific adapter, exposes real model IDs end-to-end, and adds a frontend fallback for historical messages.
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(feishu): support @bot message in topic groups
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(feishu): preserve rich-text formatting and add parser unit tests
* chore(test): remove unused import to fix ruff lint error
* style: auto-format imports to satisfy ruff
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(manager): add bootstrap command to initialize soul.md in correct place
* feat(channels): add /bootstrap command to IM channels
Add a `/bootstrap` command that routes to the chat handler with
`is_bootstrap: True` in the run context, allowing the agent to invoke
its setup/initialization flow (e.g. `setup_agent`).
- The text after `/bootstrap` is forwarded as the chat message; when
omitted a default "Initialize workspace" message is used.
- Feishu channels use the streaming path as with normal chat.
- No changes to ChannelStore — bootstrap is stateless and triggered
purely by the command.
- Update /help output to include /bootstrap.
- Add 5 tests covering: text/no-text variants, Feishu streaming path,
thread creation, and help text.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix: accept copilot suggestion
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(scripts): handle docker-init failures gracefully for private registry
The make docker-init command was failing on Linux environments when users
could not access the private Volces container registry. This commonly
occurs in corporate intranet environments with proxies or for users
without registry credentials.
Changes:
- Detect sandbox mode from config.yaml before attempting image pull
- Skip image pull entirely for local sandbox mode (default)
- Gracefully handle pull failures with informative messages
- Update setup-sandbox Makefile target with same error handling
Fixes#1168
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: BillionClaw <billionclaw@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(frontend): block duplicate sends during uploads
Expose pre-submit upload work as a busy state so the chat input does not allow a second send while the first attachment is still uploading.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* docs(frontend): document upload and stream ownership
Record that thread hooks own upload-before-submit state while the chat page owns composer busy wiring, so future changes do not reintroduce duplicate socket or upload state handling.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* fix(frontend): separate upload busy state from streaming
Keep uploads from reusing the streaming stop state so duplicate submits are blocked without turning the composer into a stop button during file uploads.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(harness): allow agent read access to /mnt/skills in local sandbox
Skill files under /mnt/skills/ were blocked by the path validator,
preventing agents from reading skill definitions. This change:
- Refactors `resolve_local_tool_path` into `validate_local_tool_path`,
a pure security gate that no longer resolves paths (left to the sandbox)
- Permits read-only access to the skills container path (/mnt/skills by
default, configurable via config.skills.container_path)
- Blocks write access to skills paths (PermissionError)
- Allows /mnt/skills in bash command path validation
- Adds `LocalSandbox.update_path_mappings` and injects per-thread
user-data mappings into the sandbox so all virtual-path resolution
is handled uniformly by the sandbox layer
- Covers all new behaviour with tests
Fixes#1177
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(sandbox): unify all virtual path resolution in tools.py
Move skills path resolution from LocalSandbox into tools.py so that all
virtual-to-host path translation (user-data and skills) lives in one
layer. LocalSandbox becomes a pure execution layer that receives only
real host paths — no more path_mappings, _resolve_path, or reverse
resolve logic.
This addresses architecture feedback that path resolution was split
across two layers (tools.py for user-data, LocalSandbox for skills),
making the flow hard to follow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(sandbox): address Copilot review — cache-on-success and error path masking
- Replace @lru_cache with manual cache-on-success for _get_skills_container_path
and _get_skills_host_path so transient failures at startup don't permanently
disable skills access.
- Add _sanitize_error() helper that masks host filesystem paths in error
messages via mask_local_paths_in_output before returning them to the agent.
- Apply _sanitize_error() to all catch-all (Exception/OSError) handlers in
sandbox tool functions to prevent host path leakage in error output.
- Remove unused lru_cache import.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(tools): add tool_search for deferred MCP tool loading
When multiple MCP servers are enabled, total tool count can exceed 30-50,
causing context bloat and degraded tool selection accuracy. This adds a
deferred tool loading mechanism controlled by `tool_search.enabled` config.
- Add ToolSearchConfig with single `enabled` field
- Add DeferredToolRegistry with regex search (select:, +keyword, keyword)
- Add tool_search tool returning OpenAI-compatible function JSON
- Add DeferredToolFilterMiddleware to hide deferred schemas from bind_tools
- Add <available-deferred-tools> section to system prompt
- Enable MCP tool_name_prefix to prevent cross-server name collisions
- Add 34 unit tests covering registry, tool, prompt, and middleware
* fix: reset stale deferred registry and bump config_version
- Reset deferred registry upfront in get_available_tools() to prevent
stale tool entries when MCP servers are disabled between calls
- Bump config_version to 2 for new tool_search config field
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tests): mock get_app_config in prompt section tests for CI
CI has no config.yaml, causing TestDeferredToolsPromptSection to fail
with FileNotFoundError. Add autouse fixture to mock get_app_config.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The cleanup() trap kills "next dev" and "next start" but not
"next-server". Since "next start" forks a "next-server" child
process, killing the parent may leave the child running as a
zombie, holding port 3000. The startup teardown block (line 35)
already handles this, but the Ctrl+C / SIGTERM trap did not.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add citation/reference support to deep research reports (#1141)
- Enhance lead agent system prompt with mandatory citation requirements
after web_search/web_fetch tool usage
- Add citation examples and best practices to GitHub Deep Research skill
- Add citation hints to report template (Executive Summary, Key Analysis)
- Style regular markdown links in frontend for visual distinction
(color, underline, hover effect)
- Fix TitleMiddleware being registered when title generation is disabled
* fix: address PR review comments
- Revert TitleMiddleware conditional registration (agent.py) to avoid
sync/async incompatibility with DeerFlowClient
- Fix markdown link rendering: merge classNames instead of overwriting,
only set target=_blank for external http(s) URLs
- Remove unrelated package.json/pnpm-lock.yaml changes
* fix: use plain markdown links in Sources section for cleaner rendering
Inline citations in report body use [citation:Title](URL) for pill/badge style.
Sources section uses plain [Title](URL) for simple underlined link style.
* fix(frontend): render plain links as underlined text in artifact markdown
Only links with citation: prefix render as Badge pills.
Regular links in Sources section now render as underlined text links.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The help text described docker-init as "Build the custom k3s image"
but the actual implementation (scripts/docker.sh init) only pulls
the sandbox image. Updated to match the real behavior.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Wrap the OGL Renderer instantiation in a try-catch so the app does not
crash when WebGL is unavailable (e.g. hardware acceleration disabled).
The Galaxy background simply does not render instead of taking down the
entire page.
Fixes#1144
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>