diff --git a/backend/docs/MEMORY_IMPROVEMENTS.md b/backend/docs/MEMORY_IMPROVEMENTS.md
index e916c40..3fddd4b 100644
--- a/backend/docs/MEMORY_IMPROVEMENTS.md
+++ b/backend/docs/MEMORY_IMPROVEMENTS.md
@@ -1,281 +1,65 @@
 # Memory System Improvements

-This document describes recent improvements to the memory system's fact injection mechanism.
+This document tracks memory injection behavior and roadmap status.

-## Overview
+## Status (As Of 2026-03-10)

-Two major improvements have been made to the `format_memory_for_injection` function:
+Implemented in `main`:
+- Accurate token counting via `tiktoken` in `format_memory_for_injection`.
+- Facts are injected into prompt memory context.
+- Facts are ranked by confidence (descending).
+- Injection respects `max_injection_tokens` budget.

-1. **Similarity-Based Fact Retrieval**: Uses TF-IDF to select facts most relevant to current conversation context
-2. **Accurate Token Counting**: Uses tiktoken for precise token estimation instead of rough character-based approximation
+Planned / not yet merged:
+- TF-IDF similarity-based fact retrieval.
+- `current_context` input for context-aware scoring.
+- Configurable similarity/confidence weights (`similarity_weight`, `confidence_weight`).
+- Middleware/runtime wiring for context-aware retrieval before each model call.

-## 1. Similarity-Based Fact Retrieval
+## Current Behavior

-### Problem
-The original implementation selected facts based solely on confidence scores, taking the top 15 highest-confidence facts regardless of their relevance to the current conversation. This could result in injecting irrelevant facts while omitting contextually important ones.
-
-### Solution
-The new implementation uses **TF-IDF (Term Frequency-Inverse Document Frequency)** vectorization with cosine similarity to measure how relevant each fact is to the current conversation context.
-
-**Scoring Formula**:
-```
-final_score = (similarity × 0.6) + (confidence × 0.4)
-```
-
-- **Similarity (60% weight)**: Cosine similarity between fact content and current context
-- **Confidence (40% weight)**: LLM-assigned confidence score (0-1)
-
-### Benefits
-- **Context-Aware**: Prioritizes facts relevant to what the user is currently discussing
-- **Dynamic**: Different facts surface based on conversation topic
-- **Balanced**: Considers both relevance and reliability
-- **Fallback**: Gracefully degrades to confidence-only ranking if context is unavailable
-
-### Example
-Given facts about Python, React, and Docker:
-- User asks: *"How should I write Python tests?"*
-  - Prioritizes: Python testing, type hints, pytest
-- User asks: *"How to optimize my Next.js app?"*
-  - Prioritizes: React/Next.js experience, performance optimization
-
-### Configuration
-Customize weights in `config.yaml` (optional):
-```yaml
-memory:
-  similarity_weight: 0.6  # Weight for TF-IDF similarity (0-1)
-  confidence_weight: 0.4  # Weight for confidence score (0-1)
-```
-
-**Note**: Weights should sum to 1.0 for best results.
-
-## 2. Accurate Token Counting
-
-### Problem
-The original implementation estimated tokens using a simple formula:
-```python
-max_chars = max_tokens * 4
-```
-
-This assumes ~4 characters per token, which is:
-- Inaccurate for many languages and content types
-- Can lead to over-injection (exceeding token limits)
-- Can lead to under-injection (wasting available budget)
-
-### Solution
-The new implementation uses **tiktoken**, OpenAI's official tokenizer library, to count tokens accurately:
+Function today:

 ```python
-import tiktoken
-
-def _count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
-    encoding = tiktoken.get_encoding(encoding_name)
-    return len(encoding.encode(text))
+def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2000) -> str:
 ```

-- Uses `cl100k_base` encoding (GPT-4, GPT-3.5, text-embedding-ada-002)
-- Provides exact token counts for budget management
-- Falls back to character-based estimation if tiktoken fails
+Current injection format:
+- `User Context` section from `user.*.summary`
+- `History` section from `history.*.summary`
+- `Facts` section from `facts[]`, sorted by confidence, appended until token budget is reached

-### Benefits
-- **Precision**: Exact token counts match what the model sees
-- **Budget Optimization**: Maximizes use of available token budget
-- **No Overflows**: Prevents exceeding `max_injection_tokens` limit
-- **Better Planning**: Each section's token cost is known precisely
+Token counting:
+- Uses `tiktoken` (`cl100k_base`) when available
+- Falls back to `len(text) // 4` if tokenizer import fails

-### Example
-```python
-text = "This is a test string to count tokens accurately using tiktoken."
+## Known Gap

-# Old method
-char_count = len(text)  # 64 characters
-old_estimate = char_count // 4  # 16 tokens (overestimate)
+Previous versions of this document described TF-IDF/context-aware retrieval as if it were already shipped.
+That was not accurate for `main` and caused confusion.
-# New method
-accurate_count = _count_tokens(text)  # 13 tokens (exact)
+Issue reference: `#1059`
+
+## Roadmap (Planned)
+
+Planned scoring strategy:
+
+```text
+final_score = (similarity * 0.6) + (confidence * 0.4)
 ```

-**Result**: 3-token difference (18.75% error rate)
+Planned integration shape:
+1. Extract recent conversational context from filtered user/final-assistant turns.
+2. Compute TF-IDF cosine similarity between each fact and current context.
+3. Rank by weighted score and inject under token budget.
+4. Fall back to confidence-only ranking if context is unavailable.

-In production, errors can be much larger for:
-- Code snippets (more tokens per character)
-- Non-English text (variable token ratios)
-- Technical jargon (often multi-token words)
+## Validation

-## Implementation Details
+Current regression coverage includes:
+- facts inclusion in memory injection output
+- confidence ordering
+- token-budget-limited fact inclusion

-### Function Signature
-```python
-def format_memory_for_injection(
-    memory_data: dict[str, Any],
-    max_tokens: int = 2000,
-    current_context: str | None = None,
-) -> str:
-```
-
-**New Parameter**:
-- `current_context`: Optional string containing recent conversation messages for similarity calculation
-
-### Backward Compatibility
-The function remains **100% backward compatible**:
-- If `current_context` is `None` or empty, falls back to confidence-only ranking
-- Existing callers without the parameter work exactly as before
-- Token counting is always accurate (transparent improvement)
-
-### Integration Point
-Memory is **dynamically injected** via `MemoryMiddleware.before_model()`:
-
-```python
-# src/agents/middlewares/memory_middleware.py
-
-def _extract_conversation_context(messages: list, max_turns: int = 3) -> str:
-    """Extract recent conversation (user input + final responses only)."""
-    context_parts = []
-    turn_count = 0
-
-    for msg in reversed(messages):
-        if msg.type == "human":
-            # Always include user messages
-            context_parts.append(extract_text(msg))
-            turn_count += 1
-            if turn_count >= max_turns:
-                break
-
-        elif msg.type == "ai" and not msg.tool_calls:
-            # Only include final AI responses (no tool_calls)
-            context_parts.append(extract_text(msg))
-
-        # Skip tool messages and AI messages with tool_calls
-
-    return " ".join(reversed(context_parts))
-
-
-class MemoryMiddleware:
-    def before_model(self, state, runtime):
-        """Inject memory before EACH LLM call (not just before_agent)."""
-
-        # Get recent conversation context (filtered)
-        conversation_context = _extract_conversation_context(
-            state["messages"],
-            max_turns=3
-        )
-
-        # Load memory with context-aware fact selection
-        memory_data = get_memory_data()
-        memory_content = format_memory_for_injection(
-            memory_data,
-            max_tokens=config.max_injection_tokens,
-            current_context=conversation_context,  # ✅ Clean conversation only
-        )
-
-        # Inject as system message
-        memory_message = SystemMessage(
-            content=f"\n{memory_content}\n",
-            name="memory_context",
-        )
-
-        return {"messages": [memory_message] + state["messages"]}
-```
-
-### How It Works
-
-1. **User continues conversation**:
-   ```
-   Turn 1: "I'm working on a Python project"
-   Turn 2: "It uses FastAPI and SQLAlchemy"
-   Turn 3: "How do I write tests?" ← Current query
-   ```
-
-2. **Extract recent context**: Last 3 turns combined:
-   ```
-   "I'm working on a Python project. It uses FastAPI and SQLAlchemy. How do I write tests?"
-   ```
-
-3. **TF-IDF scoring**: Ranks facts by relevance to this context
-   - High score: "Prefers pytest for testing" (testing + Python)
-   - High score: "Likes type hints in Python" (Python related)
-   - High score: "Expert in Python and FastAPI" (Python + FastAPI)
-   - Low score: "Uses Docker for containerization" (less relevant)
-
-4. **Injection**: Top-ranked facts injected into system prompt's `` section
-
-5. **Agent sees**: Full system prompt with relevant memory context
-
-### Benefits of Dynamic System Prompt
-
-- **Multi-Turn Context**: Uses last 3 turns, not just current question
-  - Captures ongoing conversation flow
-  - Better understanding of user's current focus
-- **Query-Specific Facts**: Different facts surface based on conversation topic
-- **Clean Architecture**: No middleware message manipulation
-- **LangChain Native**: Uses built-in dynamic system prompt support
-- **Runtime Flexibility**: Memory regenerated for each agent invocation
-
-## Dependencies
-
-New dependencies added to `pyproject.toml`:
-```toml
-dependencies = [
-    # ... existing dependencies ...
-    "tiktoken>=0.8.0",      # Accurate token counting
-    "scikit-learn>=1.6.1",  # TF-IDF vectorization
-]
-```
-
-Install with:
-```bash
-cd backend
-uv sync
-```
-
-## Testing
-
-Run the test script to verify improvements:
-```bash
-cd backend
-python test_memory_improvement.py
-```
-
-Expected output shows:
-- Different fact ordering based on context
-- Accurate token counts vs old estimates
-- Budget-respecting fact selection
-
-## Performance Impact
-
-### Computational Cost
-- **TF-IDF Calculation**: O(n × m) where n=facts, m=vocabulary
-  - Negligible for typical fact counts (10-100 facts)
-  - Caching opportunities if context doesn't change
-- **Token Counting**: ~10-100µs per call
-  - Faster than the old character-counting approach
-  - Minimal overhead compared to LLM inference
-
-### Memory Usage
-- **TF-IDF Vectorizer**: ~1-5MB for typical vocabulary
-  - Instantiated once per injection call
-  - Garbage collected after use
-- **Tiktoken Encoding**: ~1MB (cached singleton)
-  - Loaded once per process lifetime
-
-### Recommendations
-- Current implementation is optimized for accuracy over caching
-- For high-throughput scenarios, consider:
-  - Pre-computing fact embeddings (store in memory.json)
-  - Caching TF-IDF vectorizer between calls
-  - Using approximate nearest neighbor search for >1000 facts
-## Summary
-
-| Aspect | Before | After |
-|--------|--------|-------|
-| Fact Selection | Top 15 by confidence only | Relevance-based (similarity + confidence) |
-| Token Counting | `len(text) // 4` | `tiktoken.encode(text)` |
-| Context Awareness | None | TF-IDF cosine similarity |
-| Accuracy | ±25% token estimate | Exact token count |
-| Configuration | Fixed weights | Customizable similarity/confidence weights |
-
-These improvements result in:
-- **More relevant** facts injected into context
-- **Better utilization** of available token budget
-- **Fewer hallucinations** due to focused context
-- **Higher quality** agent responses
+Tests:
+- `backend/tests/test_memory_prompt_injection.py`
diff --git a/backend/docs/MEMORY_IMPROVEMENTS_SUMMARY.md b/backend/docs/MEMORY_IMPROVEMENTS_SUMMARY.md
index 67701cb..da2bcd8 100644
--- a/backend/docs/MEMORY_IMPROVEMENTS_SUMMARY.md
+++ b/backend/docs/MEMORY_IMPROVEMENTS_SUMMARY.md
@@ -1,260 +1,38 @@
 # Memory System Improvements - Summary

-## Improvement Overview
+## Sync Note (2026-03-10)

-Optimizations were made for the two problems you raised:
-1. ✅ **Rough token counting** (`character count * 4`) → precise counting with tiktoken
-2. ✅ **No similarity-based recall** → TF-IDF + recent conversation context
+This summary is synchronized with the `main` branch implementation.
+TF-IDF/context-aware retrieval is **planned**, not merged yet.

-## Core Improvements
+## Implemented

-### 1. Smart Facts Recall Based on Conversation Context
+- Accurate token counting with `tiktoken` in memory injection.
+- Facts are injected into `` prompt content.
+- Facts are ordered by confidence and bounded by `max_injection_tokens`.

-**Before**:
-- Sort by confidence only and take the top 15
-- Inject the same facts regardless of what the user is discussing
+## Planned (Not Yet Merged)

-**Now**:
-- Extract the last **3 conversation turns** (human + AI messages) as context
-- Use **TF-IDF cosine similarity** to score each fact's relevance to the conversation
-- Combined score: `similarity (60%) + confidence (40%)`
-- Dynamically select the most relevant facts
+- TF-IDF cosine similarity recall based on recent conversation context.
+- `current_context` parameter for `format_memory_for_injection`.
+- Weighted ranking (`similarity` + `confidence`).
+- Runtime extraction/injection flow for context-aware fact selection.
-**Example**:
-```
-Conversation history:
-Turn 1: "I'm working on a Python project"
-Turn 2: "It uses FastAPI and SQLAlchemy"
-Turn 3: "How do I write tests?"
+## Why This Sync Was Needed

-Context: "I'm working on a Python project It uses FastAPI and SQLAlchemy How do I write tests?"
+Earlier docs described TF-IDF behavior as already implemented, which did not match code in `main`.
+This mismatch is tracked in issue `#1059`.

-Highly relevant facts:
-✓ "Prefers pytest for testing" (Python + testing)
-✓ "Expert in Python and FastAPI" (Python + FastAPI)
-✓ "Likes type hints in Python" (Python)
-
-Low-relevance facts:
-✗ "Uses Docker for containerization" (not relevant)
-```
-
-### 2. Accurate Token Counting
-
-**Before**:
-```python
-max_chars = max_tokens * 4  # rough estimate
-```
-
-**Now**:
-```python
-import tiktoken
-
-def _count_tokens(text: str) -> int:
-    encoding = tiktoken.get_encoding("cl100k_base")  # GPT-4/3.5
-    return len(encoding.encode(text))
-```
-
-**Comparison**:
-```python
-text = "This is a test string to count tokens accurately."
-Old method: len(text) // 4 = 12 tokens (estimate)
-New method: tiktoken.encode = 10 tokens (exact)
-Error: 20%
-```
-
-### 3. Multi-Turn Conversation Context
-
-**Earlier concern**:
-> "Won't passing only the most recent human message give too little context?"
-
-**Current solution**:
-- Extract the last **3 conversation turns** (configurable)
-- Include both human and AI messages
-- More complete conversation context
-
-**Example**:
-```
-Single message: "How do I write tests?"
-→ Missing context; the project is unknown
-
-3 turns: "Python project + FastAPI + How do I write tests?"
-→ Full context, so more relevant facts can be selected
-```
-
-## Implementation
-
-### Dynamic Injection via Middleware
-
-Use the `before_model` hook to inject memory **before every LLM call**:
+## Current API Shape

 ```python
-# src/agents/middlewares/memory_middleware.py
-
-def _extract_conversation_context(messages: list, max_turns: int = 3) -> str:
-    """Extract the last 3 conversation turns (user input and final replies only)."""
-    context_parts = []
-    turn_count = 0
-
-    for msg in reversed(messages):
-        msg_type = getattr(msg, "type", None)
-
-        if msg_type == "human":
-            # ✅ Always include user messages
-            content = extract_text(msg)
-            if content:
-                context_parts.append(content)
-                turn_count += 1
-                if turn_count >= max_turns:
-                    break
-
-        elif msg_type == "ai":
-            # ✅ Only include AI messages without tool_calls (final replies)
-            tool_calls = getattr(msg, "tool_calls", None)
-            if not tool_calls:
-                content = extract_text(msg)
-                if content:
-                    context_parts.append(content)
-
-        # ✅ Skip tool messages and AI messages with tool_calls
-
-    return " ".join(reversed(context_parts))
-
-
-class MemoryMiddleware:
-    def before_model(self, state, runtime):
-        """Inject memory before every LLM call (not before_agent)."""
-
-        # 1. Extract the last 3 conversation turns (tool calls filtered out)
-        messages = state["messages"]
-        conversation_context = _extract_conversation_context(messages, max_turns=3)
-
-        # 2. Select relevant facts using the clean conversation context
-        memory_data = get_memory_data()
-        memory_content = format_memory_for_injection(
-            memory_data,
-            max_tokens=config.max_injection_tokens,
-            current_context=conversation_context,  # ✅ Real conversation content only
-        )
-
-        # 3. Inject as a system message at the start of the message list
-        memory_message = SystemMessage(
-            content=f"\n{memory_content}\n",
-            name="memory_context",  # Used for duplicate detection
-        )
-
-        # 4. Insert at the start of the message list
-        updated_messages = [memory_message] + messages
-        return {"messages": updated_messages}
+def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2000) -> str:
 ```

-### Why This Design?
+No `current_context` argument is currently available in `main`.

-Based on your three important observations:
+## Verification Pointers

-1. **Use `before_model` rather than `before_agent`**
-   - ✅ `before_agent`: called only once, when the whole agent starts
-   - ✅ `before_model`: called **before every LLM call**
-   - ✅ This way every LLM inference sees the latest relevant memory
-
-2. **The messages array contains only human/ai/tool, no system**
-   - ✅ Although uncommon, LangChain allows inserting a system message into the conversation
-   - ✅ Middleware can modify the messages array
-   - ✅ Use `name="memory_context"` to prevent duplicate injection
-
-3. **Drop AI messages with tool calls; pass only user input and final output**
-   - ✅ Filter out AI messages with `tool_calls` (intermediate steps)
-   - ✅ Keep only:
-     - Human messages (user input)
-     - AI messages without tool_calls (final replies)
-   - ✅ Cleaner context, so TF-IDF similarity is more accurate
-
-## Configuration Options
-
-Adjustable in `config.yaml`:
-
-```yaml
-memory:
-  enabled: true
-  max_injection_tokens: 2000  # ✅ Uses precise token counting
-
-  # Advanced settings (optional)
-  # max_context_turns: 3     # Conversation turns (default 3)
-  # similarity_weight: 0.6   # Similarity weight
-  # confidence_weight: 0.4   # Confidence weight
-```
-
-## Dependency Changes
-
-New dependencies:
-```toml
-dependencies = [
-    "tiktoken>=0.8.0",      # Precise token counting
-    "scikit-learn>=1.6.1",  # TF-IDF vectorization
-]
-```
-
-Install:
-```bash
-cd backend
-uv sync
-```
-
-## Performance Impact
-
-- **TF-IDF computation**: O(n × m), n = number of facts, m = vocabulary size
-  - Typical case (10-100 facts): < 10ms
-- **Token counting**: ~100µs per call
-  - Even faster than character counting
-- **Total overhead**: negligible (compared with LLM inference)
-
-## Backward Compatibility
-
-✅ Fully backward compatible:
-- Without `current_context`, falls back to sorting by confidence
-- All existing configuration continues to work
-- Other functionality is unaffected
-
-## File Change List
-
-1. **Core functionality**
-   - `src/agents/memory/prompt.py` - add TF-IDF recall and precise token counting
-   - `src/agents/lead_agent/prompt.py` - dynamic system prompt
-   - `src/agents/lead_agent/agent.py` - pass a function instead of a string
-
-2. **Dependencies**
-   - `pyproject.toml` - add tiktoken and scikit-learn
-
-3. **Documentation**
-   - `docs/MEMORY_IMPROVEMENTS.md` - detailed technical documentation
-   - `docs/MEMORY_IMPROVEMENTS_SUMMARY.md` - improvement summary (this file)
-   - `CLAUDE.md` - update architecture notes
-   - `config.example.yaml` - add configuration notes
-
-## Test Verification
-
-Run the project to verify:
-```bash
-cd backend
-make dev
-```
-
-Test in a conversation:
-1. Discuss different topics (Python, React, Docker, etc.)
-2. Observe whether different conversations inject different facts
-3. Check whether the token budget is controlled accurately
-
-## Summary
-
-| Problem | Before | Now |
-|------|------|------|
-| Token counting | `len(text) // 4` (±25% error) | `tiktoken.encode()` (exact) |
-| Facts selection | Fixed ordering by confidence | TF-IDF similarity + confidence |
-| Context | None | Last 3 conversation turns |
-| Implementation | Static system prompt | Dynamic system prompt function |
-| Configurability | Limited | Adjustable turn count and weights |
-
-All improvements are implemented, and:
-- ✅ No modification of the messages array
-- ✅ Multi-turn conversation context
-- ✅ Precise token counting
-- ✅ Smart similarity-based recall
-- ✅ Fully backward compatible
+- Implementation: `backend/src/agents/memory/prompt.py`
+- Prompt assembly: `backend/src/agents/lead_agent/prompt.py`
+- Regression tests: `backend/tests/test_memory_prompt_injection.py`
diff --git a/backend/src/agents/memory/prompt.py b/backend/src/agents/memory/prompt.py
index 4529156..9d6b1b0 100644
--- a/backend/src/agents/memory/prompt.py
+++ b/backend/src/agents/memory/prompt.py
@@ -1,5 +1,6 @@
 """Prompt templates for memory update and injection."""

+import math
 import re
 from typing import Any

@@ -166,6 +167,22 @@ def _count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
         return len(text) // 4


+def _coerce_confidence(value: Any, default: float = 0.0) -> float:
+    """Coerce a confidence-like value to a bounded float in [0, 1].
+
+    Non-finite values (NaN, inf, -inf) are treated as invalid and fall back
+    to the default before clamping, preventing them from dominating ranking.
+    The ``default`` parameter is assumed to be a finite value.
+    """
+    try:
+        confidence = float(value)
+    except (TypeError, ValueError):
+        return max(0.0, min(1.0, default))
+    if not math.isfinite(confidence):
+        return max(0.0, min(1.0, default))
+    return max(0.0, min(1.0, confidence))
+
+
 def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2000) -> str:
     """Format memory data for injection into system prompt.
@@ -217,6 +234,55 @@ def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2
         if history_sections:
             sections.append("History:\n" + "\n".join(f"- {s}" for s in history_sections))

+    # Format facts (sorted by confidence; include as many as token budget allows)
+    facts_data = memory_data.get("facts", [])
+    if isinstance(facts_data, list) and facts_data:
+        ranked_facts = sorted(
+            (
+                f
+                for f in facts_data
+                if isinstance(f, dict)
+                and isinstance(f.get("content"), str)
+                and f.get("content").strip()
+            ),
+            key=lambda fact: _coerce_confidence(fact.get("confidence"), default=0.0),
+            reverse=True,
+        )
+
+        # Compute token count for existing sections once, then account
+        # incrementally for each fact line to avoid full-string re-tokenization.
+        base_text = "\n\n".join(sections)
+        base_tokens = _count_tokens(base_text) if base_text else 0
+        # Account for the separator between existing sections and the facts section.
+        facts_header = "Facts:\n"
+        separator_tokens = _count_tokens("\n\n" + facts_header) if base_text else _count_tokens(facts_header)
+        running_tokens = base_tokens + separator_tokens
+
+        fact_lines: list[str] = []
+        for fact in ranked_facts:
+            content_value = fact.get("content")
+            if not isinstance(content_value, str):
+                continue
+            content = content_value.strip()
+            if not content:
+                continue
+            category = str(fact.get("category", "context")).strip() or "context"
+            confidence = _coerce_confidence(fact.get("confidence"), default=0.0)
+            line = f"- [{category} | {confidence:.2f}] {content}"
+
+            # Each additional line is preceded by a newline (except the first).
+            line_text = ("\n" + line) if fact_lines else line
+            line_tokens = _count_tokens(line_text)
+
+            if running_tokens + line_tokens <= max_tokens:
+                fact_lines.append(line)
+                running_tokens += line_tokens
+            else:
+                break
+
+        if fact_lines:
+            sections.append("Facts:\n" + "\n".join(fact_lines))
+
     if not sections:
         return ""

diff --git a/backend/tests/test_memory_prompt_injection.py b/backend/tests/test_memory_prompt_injection.py
new file mode 100644
index 0000000..d00bbd5
--- /dev/null
+++ b/backend/tests/test_memory_prompt_injection.py
@@ -0,0 +1,122 @@
+"""Tests for memory prompt injection formatting."""
+
+import math
+
+from src.agents.memory.prompt import _coerce_confidence, format_memory_for_injection
+
+
+def test_format_memory_includes_facts_section() -> None:
+    memory_data = {
+        "user": {},
+        "history": {},
+        "facts": [
+            {"content": "User uses PostgreSQL", "category": "knowledge", "confidence": 0.9},
+            {"content": "User prefers SQLAlchemy", "category": "preference", "confidence": 0.8},
+        ],
+    }
+
+    result = format_memory_for_injection(memory_data, max_tokens=2000)
+
+    assert "Facts:" in result
+    assert "User uses PostgreSQL" in result
+    assert "User prefers SQLAlchemy" in result
+
+
+def test_format_memory_sorts_facts_by_confidence_desc() -> None:
+    memory_data = {
+        "user": {},
+        "history": {},
+        "facts": [
+            {"content": "Low confidence fact", "category": "context", "confidence": 0.4},
+            {"content": "High confidence fact", "category": "knowledge", "confidence": 0.95},
+        ],
+    }
+
+    result = format_memory_for_injection(memory_data, max_tokens=2000)
+
+    assert result.index("High confidence fact") < result.index("Low confidence fact")
+
+
+def test_format_memory_respects_budget_when_adding_facts(monkeypatch) -> None:
+    # Make token counting deterministic for this test by counting characters.
+    monkeypatch.setattr("src.agents.memory.prompt._count_tokens", lambda text, encoding_name="cl100k_base": len(text))
+
+    memory_data = {
+        "user": {},
+        "history": {},
+        "facts": [
+            {"content": "First fact should fit", "category": "knowledge", "confidence": 0.95},
+            {"content": "Second fact should not fit in tiny budget", "category": "knowledge", "confidence": 0.90},
+        ],
+    }
+
+    first_fact_only_memory_data = {
+        "user": {},
+        "history": {},
+        "facts": [
+            {"content": "First fact should fit", "category": "knowledge", "confidence": 0.95},
+        ],
+    }
+    one_fact_result = format_memory_for_injection(first_fact_only_memory_data, max_tokens=2000)
+    two_facts_result = format_memory_for_injection(memory_data, max_tokens=2000)
+    # Choose a budget that can include exactly one fact section line.
+    max_tokens = (len(one_fact_result) + len(two_facts_result)) // 2
+
+    first_only_result = format_memory_for_injection(memory_data, max_tokens=max_tokens)
+
+    assert "First fact should fit" in first_only_result
+    assert "Second fact should not fit in tiny budget" not in first_only_result
+
+
+def test_coerce_confidence_nan_falls_back_to_default() -> None:
+    """NaN should not be treated as a valid confidence value."""
+    result = _coerce_confidence(math.nan, default=0.5)
+    assert result == 0.5
+
+
+def test_coerce_confidence_inf_falls_back_to_default() -> None:
+    """Infinite values should fall back to default rather than clamping to 1.0."""
+    assert _coerce_confidence(math.inf, default=0.3) == 0.3
+    assert _coerce_confidence(-math.inf, default=0.3) == 0.3
+
+
+def test_coerce_confidence_valid_values_are_clamped() -> None:
+    """Valid floats outside [0, 1] are clamped; values inside are preserved."""
+    assert _coerce_confidence(1.5) == 1.0
+    assert _coerce_confidence(-0.5) == 0.0
+    assert abs(_coerce_confidence(0.75) - 0.75) < 1e-9
+
+
+def test_format_memory_skips_none_content_facts() -> None:
+    """Facts with content=None must not produce a 'None' line in the output."""
+    memory_data = {
+        "facts": [
+            {"content": None, "category": "knowledge", "confidence": 0.9},
+            {"content": "Real fact", "category": "knowledge", "confidence": 0.8},
+        ],
+    }
+
+    result = format_memory_for_injection(memory_data, max_tokens=2000)
+
+    assert "None" not in result
+    assert "Real fact" in result
+
+
+def test_format_memory_skips_non_string_content_facts() -> None:
+    """Facts with non-string content (e.g. int/list) must be ignored."""
+    memory_data = {
+        "facts": [
+            {"content": 42, "category": "knowledge", "confidence": 0.9},
+            {"content": ["list"], "category": "knowledge", "confidence": 0.85},
+            {"content": "Valid fact", "category": "knowledge", "confidence": 0.7},
+        ],
+    }
+
+    result = format_memory_for_injection(memory_data, max_tokens=2000)
+
+    # The formatted line for an integer content would be "- [knowledge | 0.90] 42".
+    assert "| 0.90] 42" not in result
+    # The formatted line for a list content would be "- [knowledge | 0.85] ['list']".
+    assert "| 0.85]" not in result
+    assert "Valid fact" in result
+
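
Review note: the budget loop added to `format_memory_for_injection` can be tried in isolation with the sketch below. It is not the code from this diff. `coerce_confidence` and `select_fact_lines` are hypothetical stand-ins, the `category` field is left out for brevity, and the default counter assumes one character equals one token (the same simplification the monkeypatched test uses).

```python
import math
from typing import Any, Callable


def coerce_confidence(value: Any, default: float = 0.0) -> float:
    """Clamp a confidence-like value to [0, 1]; invalid or non-finite input uses the default."""
    try:
        confidence = float(value)
    except (TypeError, ValueError):
        confidence = default
    if not math.isfinite(confidence):
        confidence = default
    return max(0.0, min(1.0, confidence))


def select_fact_lines(
    facts: list[dict[str, Any]],
    max_tokens: int,
    count_tokens: Callable[[str], int] = len,  # assumption: 1 character == 1 token
) -> list[str]:
    """Return formatted fact lines, highest confidence first, that fit in the budget."""
    # Keep only dict entries whose content is a non-blank string.
    valid = [
        f for f in facts
        if isinstance(f, dict) and isinstance(f.get("content"), str) and f["content"].strip()
    ]
    ranked = sorted(valid, key=lambda f: coerce_confidence(f.get("confidence")), reverse=True)

    used = count_tokens("Facts:\n")  # the section header is charged before any fact
    lines: list[str] = []
    for fact in ranked:
        confidence = coerce_confidence(fact.get("confidence"))
        line = f"- [{confidence:.2f}] {fact['content'].strip()}"
        # Every line after the first also pays for the newline that joins it.
        cost = count_tokens(("\n" + line) if lines else line)
        if used + cost > max_tokens:
            break  # stop at the first fact that does not fit
        lines.append(line)
        used += cost
    return lines


facts = [
    {"content": "low", "confidence": 0.2},
    {"content": "high", "confidence": float("nan")},  # NaN falls back to 0.0
    {"content": "top", "confidence": 5},              # clamped to 1.0
    {"content": None, "confidence": 0.9},             # skipped: content is not a string
]
print(select_fact_lines(facts, max_tokens=100))
print(select_fact_lines(facts, max_tokens=20))
```

One design point to keep in mind: with a BPE tokenizer such as `tiktoken`, counts taken per line may not sum exactly to the count of the joined string, so a running total of this kind is best read as a close estimate of the final prompt size.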