deer-flow/backend/packages/harness/deerflow/agents/memory/prompt.py

"""Prompt templates for memory update and injection."""
import math
import re
from typing import Any
try:
    import tiktoken

    TIKTOKEN_AVAILABLE = True
except ImportError:
    TIKTOKEN_AVAILABLE = False


# Prompt template for updating memory based on conversation
MEMORY_UPDATE_PROMPT = """You are a memory management system. Your task is to analyze a conversation and update the user's memory profile.
Current Memory State:
<current_memory>
{current_memory}
</current_memory>
New Conversation to Process:
<conversation>
{conversation}
</conversation>
Instructions:
1. Analyze the conversation for important information about the user
2. Extract relevant facts, preferences, and context with specific details (numbers, names, technologies)
3. Update the memory sections as needed following the detailed length guidelines below
Memory Section Guidelines:
**User Context** (Current state - concise summaries):
- workContext: Professional role, company, key projects, main technologies (2-3 sentences)
  Example: Core contributor, project names with metrics (16k+ stars), technical stack
- personalContext: Languages, communication preferences, key interests (1-2 sentences)
  Example: Bilingual capabilities, specific interest areas, expertise domains
- topOfMind: Multiple ongoing focus areas and priorities (3-5 sentences, detailed paragraph)
  Example: Primary project work, parallel technical investigations, ongoing learning/tracking
  Include: Active implementation work, troubleshooting issues, market/research interests
  Note: This captures SEVERAL concurrent focus areas, not just one task

**History** (Temporal context - rich paragraphs):
- recentMonths: Detailed summary of recent activities (4-6 sentences or 1-2 paragraphs)
  Timeline: Last 1-3 months of interactions
  Include: Technologies explored, projects worked on, problems solved, interests demonstrated
- earlierContext: Important historical patterns (3-5 sentences or 1 paragraph)
  Timeline: 3-12 months ago
  Include: Past projects, learning journeys, established patterns
- longTermBackground: Persistent background and foundational context (2-4 sentences)
  Timeline: Overall/foundational information
  Include: Core expertise, longstanding interests, fundamental working style
**Facts Extraction**:
- Extract specific, quantifiable details (e.g., "16k+ GitHub stars", "200+ datasets")
- Include proper nouns (company names, project names, technology names)
- Preserve technical terminology and version numbers
- Categories:
  * preference: Tools, styles, approaches user prefers/dislikes
  * knowledge: Specific expertise, technologies mastered, domain knowledge
  * context: Background facts (job title, projects, locations, languages)
  * behavior: Working patterns, communication habits, problem-solving approaches
  * goal: Stated objectives, learning targets, project ambitions
- Confidence levels:
  * 0.9-1.0: Explicitly stated facts ("I work on X", "My role is Y")
  * 0.7-0.8: Strongly implied from actions/discussions
  * 0.5-0.6: Inferred patterns (use sparingly, only for clear patterns)
**What Goes Where**:
- workContext: Current job, active projects, primary tech stack
- personalContext: Languages, personality, interests outside direct work tasks
- topOfMind: Multiple ongoing priorities and focus areas the user has cared about recently (gets updated most frequently)
  Should capture 3-5 concurrent themes: main work, side explorations, learning/tracking interests
- recentMonths: Detailed account of recent technical explorations and work
- earlierContext: Patterns from slightly older interactions still relevant
- longTermBackground: Unchanging foundational facts about the user
**Multilingual Content**:
- Preserve original language for proper nouns and company names
- Keep technical terms in their original form (DeepSeek, LangGraph, etc.)
- Note language capabilities in personalContext
Output Format (JSON):
{{
  "user": {{
    "workContext": {{ "summary": "...", "shouldUpdate": true/false }},
    "personalContext": {{ "summary": "...", "shouldUpdate": true/false }},
    "topOfMind": {{ "summary": "...", "shouldUpdate": true/false }}
  }},
  "history": {{
    "recentMonths": {{ "summary": "...", "shouldUpdate": true/false }},
    "earlierContext": {{ "summary": "...", "shouldUpdate": true/false }},
    "longTermBackground": {{ "summary": "...", "shouldUpdate": true/false }}
  }},
  "newFacts": [
    {{ "content": "...", "category": "preference|knowledge|context|behavior|goal", "confidence": 0.0-1.0 }}
  ],
  "factsToRemove": ["fact_id_1", "fact_id_2"]
}}
Important Rules:
- Only set shouldUpdate=true if there's meaningful new information
- Follow length guidelines: workContext/personalContext are concise (1-3 sentences), topOfMind and history sections are detailed (paragraphs)
- Include specific metrics, version numbers, and proper nouns in facts
- Only add facts that are clearly stated (0.9+) or strongly implied (0.7+)
- Remove facts that are contradicted by new information
- When updating topOfMind, integrate new focus areas while removing completed/abandoned ones
  Keep 3-5 concurrent focus themes that are still active and relevant
- For history sections, integrate new information chronologically into the appropriate time period
- Preserve technical accuracy - keep exact names of technologies, companies, projects
- Focus on information useful for future interactions and personalization
- IMPORTANT: Do NOT record file upload events in memory. Uploaded files are
  session-specific and ephemeral; they will not be accessible in future sessions.
  Recording upload events causes confusion in subsequent conversations.
Return ONLY valid JSON, no explanation or markdown."""
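
# Illustrative usage sketch (not part of this module's public API): the template is
# rendered with str.format(), which is why the literal JSON braces above are escaped
# as {{ / }}. The variable names below are placeholders, not real identifiers here.
#
#     prompt = MEMORY_UPDATE_PROMPT.format(
#         current_memory=current_memory_json,
#         conversation=conversation_text,
#     )
#     # `prompt` is then sent to the memory LLM, which should return only the JSON
#     # object described in the Output Format section above.
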
# Prompt template for extracting facts from a single message
FACT_EXTRACTION_PROMPT = """Extract factual information about the user from this message.
Message:
{message}
Extract facts in this JSON format:
{{
  "facts": [
    {{ "content": "...", "category": "preference|knowledge|context|behavior|goal", "confidence": 0.0-1.0 }}
  ]
}}
Categories:
- preference: User preferences (likes/dislikes, styles, tools)
- knowledge: User's expertise or knowledge areas
- context: Background context (location, job, projects)
- behavior: Behavioral patterns
- goal: User's goals or objectives
Rules:
- Only extract clear, specific facts
- Confidence should reflect certainty (explicit statement = 0.9+, implied = 0.6-0.8)
- Skip vague or temporary information
Return ONLY valid JSON."""
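
# Likewise for the single-message prompt (hypothetical variable name):
#
#     fact_prompt = FACT_EXTRACTION_PROMPT.format(message=latest_user_message_text)
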

def _count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens in text using tiktoken.

    Args:
        text: The text to count tokens for.
        encoding_name: The encoding to use (default: cl100k_base for GPT-4/3.5).

    Returns:
        The number of tokens in the text.
    """
    if not TIKTOKEN_AVAILABLE:
        # Fallback to character-based estimation if tiktoken is not available
        return len(text) // 4
    try:
        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text))
    except Exception:
        # Fallback to character-based estimation on error
        return len(text) // 4
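
# Illustrative examples (assumes tiktoken is importable; otherwise the fallback
# estimate of len(text) // 4 is used):
#
#     _count_tokens("User prefers Python and LangGraph.")  # exact tiktoken count
#     _count_tokens("x" * 400)                             # -> 100 when only the fallback estimate is available
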

def _coerce_confidence(value: Any, default: float = 0.0) -> float:
    """Coerce a confidence-like value to a bounded float in [0, 1].

    Non-finite values (NaN, inf, -inf) are treated as invalid and fall back
    to the default before clamping, preventing them from dominating ranking.
    The ``default`` parameter is assumed to be a finite value.
    """
    try:
        confidence = float(value)
    except (TypeError, ValueError):
        return max(0.0, min(1.0, default))
    if not math.isfinite(confidence):
        return max(0.0, min(1.0, default))
    return max(0.0, min(1.0, confidence))
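
# Behaviour sketch for the clamping/fallback rules above:
#
#     _coerce_confidence(0.85)          # -> 0.85  (valid value passes through)
#     _coerce_confidence(1.7)           # -> 1.0   (clamped into [0, 1])
#     _coerce_confidence("high")        # -> 0.0   (unparseable -> default)
#     _coerce_confidence(float("nan"))  # -> 0.0   (non-finite -> default)
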

def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2000) -> str:
    """Format memory data for injection into system prompt.

    Args:
        memory_data: The memory data dictionary.
        max_tokens: Maximum tokens to use (counted via tiktoken for accuracy).

    Returns:
        Formatted memory string for system prompt injection.
    """
    if not memory_data:
        return ""

    sections = []

    # Format user context
    user_data = memory_data.get("user", {})
    if user_data:
        user_sections = []
        work_ctx = user_data.get("workContext", {})
        if work_ctx.get("summary"):
            user_sections.append(f"Work: {work_ctx['summary']}")
        personal_ctx = user_data.get("personalContext", {})
        if personal_ctx.get("summary"):
            user_sections.append(f"Personal: {personal_ctx['summary']}")
        top_of_mind = user_data.get("topOfMind", {})
        if top_of_mind.get("summary"):
            user_sections.append(f"Current Focus: {top_of_mind['summary']}")
        if user_sections:
            sections.append("User Context:\n" + "\n".join(f"- {s}" for s in user_sections))

    # Format history
    history_data = memory_data.get("history", {})
    if history_data:
        history_sections = []
        recent = history_data.get("recentMonths", {})
        if recent.get("summary"):
            history_sections.append(f"Recent: {recent['summary']}")
        earlier = history_data.get("earlierContext", {})
        if earlier.get("summary"):
            history_sections.append(f"Earlier: {earlier['summary']}")
        if history_sections:
            sections.append("History:\n" + "\n".join(f"- {s}" for s in history_sections))

    # Format facts (sorted by confidence; include as many as token budget allows)
    facts_data = memory_data.get("facts", [])
    if isinstance(facts_data, list) and facts_data:
        ranked_facts = sorted(
            (f for f in facts_data if isinstance(f, dict) and isinstance(f.get("content"), str) and f.get("content").strip()),
            key=lambda fact: _coerce_confidence(fact.get("confidence"), default=0.0),
            reverse=True,
        )

        # Compute token count for existing sections once, then account
        # incrementally for each fact line to avoid full-string re-tokenization.
        base_text = "\n\n".join(sections)
        base_tokens = _count_tokens(base_text) if base_text else 0
        # Account for the separator between existing sections and the facts section.
        facts_header = "Facts:\n"
        separator_tokens = _count_tokens("\n\n" + facts_header) if base_text else _count_tokens(facts_header)
        running_tokens = base_tokens + separator_tokens

        fact_lines: list[str] = []
        for fact in ranked_facts:
            content_value = fact.get("content")
            if not isinstance(content_value, str):
                continue
            content = content_value.strip()
            if not content:
                continue
            category = str(fact.get("category", "context")).strip() or "context"
            confidence = _coerce_confidence(fact.get("confidence"), default=0.0)
            line = f"- [{category} | {confidence:.2f}] {content}"
            # Each additional line is preceded by a newline (except the first).
            line_text = ("\n" + line) if fact_lines else line
            line_tokens = _count_tokens(line_text)
            if running_tokens + line_tokens <= max_tokens:
                fact_lines.append(line)
                running_tokens += line_tokens
            else:
                break

        if fact_lines:
            sections.append("Facts:\n" + "\n".join(fact_lines))

    if not sections:
        return ""

    result = "\n\n".join(sections)

    # Use accurate token counting with tiktoken
    token_count = _count_tokens(result)
    if token_count > max_tokens:
        # Truncate to fit within token limit
        # Estimate characters to remove based on token ratio
        char_per_token = len(result) / token_count
        target_chars = int(max_tokens * char_per_token * 0.95)  # 95% to leave margin
        result = result[:target_chars] + "\n..."

    return result
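
# Minimal usage sketch (hypothetical memory_data; the real structure is produced by
# the memory updater following MEMORY_UPDATE_PROMPT's output schema):
#
#     memory_data = {
#         "user": {"workContext": {"summary": "Backend engineer on DeerFlow."}},
#         "facts": [
#             {"content": "Prefers uv over pip", "category": "preference", "confidence": 0.9},
#         ],
#     }
#     print(format_memory_for_injection(memory_data, max_tokens=500))
#     # User Context:
#     # - Work: Backend engineer on DeerFlow.
#     #
#     # Facts:
#     # - [preference | 0.90] Prefers uv over pip
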

def format_conversation_for_update(messages: list[Any]) -> str:
    """Format conversation messages for memory update prompt.

    Args:
        messages: List of conversation messages.

    Returns:
        Formatted conversation string.
    """
    lines = []
    for msg in messages:
        role = getattr(msg, "type", "unknown")
        content = getattr(msg, "content", str(msg))
        # Handle content that might be a list (multimodal)
        if isinstance(content, list):
            text_parts = []
            for p in content:
                if isinstance(p, str):
                    text_parts.append(p)
                elif isinstance(p, dict):
                    text_val = p.get("text")
                    if isinstance(text_val, str):
                        text_parts.append(text_val)
            content = " ".join(text_parts) if text_parts else str(content)
        # Strip uploaded_files tags from human messages to avoid persisting
        # ephemeral file path info into long-term memory. Skip the turn entirely
        # when nothing remains after stripping (upload-only message).
        if role == "human":
            content = re.sub(r"<uploaded_files>[\s\S]*?</uploaded_files>\n*", "", str(content)).strip()
            if not content:
                continue

        # Truncate very long messages
        if len(str(content)) > 1000:
            content = str(content)[:1000] + "..."

        if role == "human":
            lines.append(f"User: {content}")
        elif role == "ai":
            lines.append(f"Assistant: {content}")

    return "\n\n".join(lines)
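
# Minimal usage sketch (assumes LangChain-style message objects exposing `.type`
# and `.content`; types.SimpleNamespace stands in for them here):
#
#     from types import SimpleNamespace
#
#     messages = [
#         SimpleNamespace(
#             type="human",
#             content="<uploaded_files>/tmp/a.csv</uploaded_files>\nSummarize it",
#         ),
#         SimpleNamespace(type="ai", content="Here is the summary ..."),
#     ]
#     print(format_conversation_for_update(messages))
#     # User: Summarize it
#     #
#     # Assistant: Here is the summary ...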