fix(memory): prevent file upload events from persisting in long-term memory (#971)

* fix(memory): prevent file upload events from persisting in long-term memory Uploaded files are session-scoped and unavailable in future sessions. Previously, upload interactions were recorded in memory, causing the agent to search for non-existent files in subsequent conversations. Changes: - memory_middleware: skip human messages containing <uploaded_files> and their paired AI responses from the memory queue - updater: post-process generated memory to strip upload mentions before saving to file - prompt: instruct the memory LLM to ignore file upload events Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(memory): address Copilot review feedback on upload filtering - memory_middleware: strip <uploaded_files> block from human messages instead of dropping the entire turn; only skip the turn (and paired AI response) when nothing remains after stripping - updater: narrow the upload-scrubbing regex to explicit upload events (avoids false-positive removal of "User works with CSV files" etc.); also filter upload-event facts from the facts array - prompt: move `import re` to module scope; skip upload-only human messages (empty after stripping) rather than appending "User: " Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(memory): allow optional words between 'upload' and 'file' in scrub regex The previous pattern required 'uploading file' with no intervening words, so 'uploading a test file' was not matched and leaked into long-term memory. Allow up to 3 modifier words between the verb and noun (e.g. 'uploading a test file', 'uploaded the attachment'). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(memory): add unit tests for upload filtering in memory pipeline Covers _filter_messages_for_memory and _strip_upload_mentions_from_memory per Copilot review suggestion. 15 test cases verify: - Upload-only turns (and paired AI responses) are excluded from memory queue - User's real question is preserved when combined with an upload block - Upload file paths are never present in filtered message content - Intermediate tool messages are always excluded - Multi-turn conversations: only the upload turn is dropped - Multimodal (list-content) human messages are handled - Upload-event sentences are removed from summaries and facts - Legitimate file-related facts (CSV preferences, PDF exports) are preserved - "uploading a test file" (words between verb and noun) is caught by regex Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-26 07:14:47 +08:00 · 2026-03-05 11:14:34 +08:00
parent 6ac0042cfe
commit 3ada4f98b1
4 changed files with 336 additions and 5 deletions
--- a/backend/src/agents/memory/prompt.py
+++ b/backend/src/agents/memory/prompt.py
@@ -1,5 +1,6 @@
 """Prompt templates for memory update and injection."""

+import re
 from typing import Any

 try:
@@ -108,6 +109,9 @@ Important Rules:
 - For history sections, integrate new information chronologically into appropriate time period
 - Preserve technical accuracy - keep exact names of technologies, companies, projects
 - Focus on information useful for future interactions and personalization
+- IMPORTANT: Do NOT record file upload events in memory. Uploaded files are
+  session-specific and ephemeral — they will not be accessible in future sessions.
+  Recording upload events causes confusion in subsequent conversations.

 Return ONLY valid JSON, no explanation or markdown."""

@@ -249,6 +253,16 @@ def format_conversation_for_update(messages: list[Any]) -> str:
            text_parts = [p.get("text", "") for p in content if isinstance(p, dict) and "text" in p]
            content = " ".join(text_parts) if text_parts else str(content)

+        # Strip uploaded_files tags from human messages to avoid persisting
+        # ephemeral file path info into long-term memory.  Skip the turn entirely
+        # when nothing remains after stripping (upload-only message).
+        if role == "human":
+            content = re.sub(
+                r"<uploaded_files>[\s\S]*?</uploaded_files>\n*", "", str(content)
+            ).strip()
+            if not content:
+                continue
+
        # Truncate very long messages
        if len(str(content)) > 1000:
            content = str(content)[:1000] + "..."