refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)

* refactor: extract shared utils to break harness→app cross-layer imports Move _validate_skill_frontmatter to src/skills/validation.py and CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py. This eliminates the two reverse dependencies from client.py (harness layer) into gateway/routers/ (app layer), preparing for the harness/app package split. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: split backend/src into harness (deerflow.*) and app (app.*) Physically split the monolithic backend/src/ package into two layers: - **Harness** (`packages/harness/deerflow/`): publishable agent framework package with import prefix `deerflow.*`. Contains agents, sandbox, tools, models, MCP, skills, config, and all core infrastructure. - **App** (`app/`): unpublished application code with import prefix `app.*`. Contains gateway (FastAPI REST API) and channels (IM integrations). Key changes: - Move 13 harness modules to packages/harness/deerflow/ via git mv - Move gateway + channels to app/ via git mv - Rename all imports: src.* → deerflow.* (harness) / app.* (app layer) - Set up uv workspace with deerflow-harness as workspace member - Update langgraph.json, config.example.yaml, all scripts, Docker files - Add build-system (hatchling) to harness pyproject.toml - Add PYTHONPATH=. to gateway startup commands for app.* resolution - Update ruff.toml with known-first-party for import sorting - Update all documentation to reflect new directory structure Boundary rule enforced: harness code never imports from app. All 429 tests pass. Lint clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: add harness→app boundary check test and update docs Add test_harness_boundary.py that scans all Python files in packages/harness/deerflow/ and fails if any `from app.*` or `import app.*` statement is found. This enforces the architectural rule that the harness layer never depends on the app layer. Update CLAUDE.md to document the harness/app split architecture, import conventions, and the boundary enforcement test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add config versioning with auto-upgrade on startup When config.example.yaml schema changes, developers' local config.yaml files can silently become outdated. This adds a config_version field and auto-upgrade mechanism so breaking changes (like src.* → deerflow.* renames) are applied automatically before services start. - Add config_version: 1 to config.example.yaml - Add startup version check warning in AppConfig.from_file() - Add scripts/config-upgrade.sh with migration registry for value replacements - Add `make config-upgrade` target - Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services - Add config error hints in service failure messages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix comments * fix: update src.* import in test_sandbox_tools_security to deerflow.* Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle empty config and search parent dirs for config.example.yaml Address Copilot review comments on PR #1131: - Guard against yaml.safe_load() returning None for empty config files - Search parent directories for config.example.yaml instead of only looking next to config.yaml, fixing detection in common setups Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct skills root path depth and config_version type coercion - loader.py: fix get_skills_root_path() to use 5 parent levels (was 3) after harness split, file lives at packages/harness/deerflow/skills/ so parent×3 resolved to backend/packages/harness/ instead of backend/ - app_config.py: coerce config_version to int() before comparison in _check_config_version() to prevent TypeError when YAML stores value as string (e.g. config_version: "1") - tests: add regression tests for both fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: update test imports from src.* to deerflow.*/app.* after harness refactor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-01 17:50:44 +08:00 · 2026-03-14 22:55:52 +08:00
parent 9b49a80dda
commit 76803b826f
198 changed files with 1786 additions and 941 deletions
--- a/backend/packages/harness/deerflow/agents/memory/prompt.py
+++ b/backend/packages/harness/deerflow/agents/memory/prompt.py
@@ -0,0 +1,339 @@
+"""Prompt templates for memory update and injection."""
+
+import math
+import re
+from typing import Any
+
+try:
+    import tiktoken
+
+    TIKTOKEN_AVAILABLE = True
+except ImportError:
+    TIKTOKEN_AVAILABLE = False
+
+# Prompt template for updating memory based on conversation
+MEMORY_UPDATE_PROMPT = """You are a memory management system. Your task is to analyze a conversation and update the user's memory profile.
+
+Current Memory State:
+<current_memory>
+{current_memory}
+</current_memory>
+
+New Conversation to Process:
+<conversation>
+{conversation}
+</conversation>
+
+Instructions:
+1. Analyze the conversation for important information about the user
+2. Extract relevant facts, preferences, and context with specific details (numbers, names, technologies)
+3. Update the memory sections as needed following the detailed length guidelines below
+
+Memory Section Guidelines:
+
+**User Context** (Current state - concise summaries):
+- workContext: Professional role, company, key projects, main technologies (2-3 sentences)
+  Example: Core contributor, project names with metrics (16k+ stars), technical stack
+- personalContext: Languages, communication preferences, key interests (1-2 sentences)
+  Example: Bilingual capabilities, specific interest areas, expertise domains
+- topOfMind: Multiple ongoing focus areas and priorities (3-5 sentences, detailed paragraph)
+  Example: Primary project work, parallel technical investigations, ongoing learning/tracking
+  Include: Active implementation work, troubleshooting issues, market/research interests
+  Note: This captures SEVERAL concurrent focus areas, not just one task
+
+**History** (Temporal context - rich paragraphs):
+- recentMonths: Detailed summary of recent activities (4-6 sentences or 1-2 paragraphs)
+  Timeline: Last 1-3 months of interactions
+  Include: Technologies explored, projects worked on, problems solved, interests demonstrated
+- earlierContext: Important historical patterns (3-5 sentences or 1 paragraph)
+  Timeline: 3-12 months ago
+  Include: Past projects, learning journeys, established patterns
+- longTermBackground: Persistent background and foundational context (2-4 sentences)
+  Timeline: Overall/foundational information
+  Include: Core expertise, longstanding interests, fundamental working style
+
+**Facts Extraction**:
+- Extract specific, quantifiable details (e.g., "16k+ GitHub stars", "200+ datasets")
+- Include proper nouns (company names, project names, technology names)
+- Preserve technical terminology and version numbers
+- Categories:
+  * preference: Tools, styles, approaches user prefers/dislikes
+  * knowledge: Specific expertise, technologies mastered, domain knowledge
+  * context: Background facts (job title, projects, locations, languages)
+  * behavior: Working patterns, communication habits, problem-solving approaches
+  * goal: Stated objectives, learning targets, project ambitions
+- Confidence levels:
+  * 0.9-1.0: Explicitly stated facts ("I work on X", "My role is Y")
+  * 0.7-0.8: Strongly implied from actions/discussions
+  * 0.5-0.6: Inferred patterns (use sparingly, only for clear patterns)
+
+**What Goes Where**:
+- workContext: Current job, active projects, primary tech stack
+- personalContext: Languages, personality, interests outside direct work tasks
+- topOfMind: Multiple ongoing priorities and focus areas user cares about recently (gets updated most frequently)
+  Should capture 3-5 concurrent themes: main work, side explorations, learning/tracking interests
+- recentMonths: Detailed account of recent technical explorations and work
+- earlierContext: Patterns from slightly older interactions still relevant
+- longTermBackground: Unchanging foundational facts about the user
+
+**Multilingual Content**:
+- Preserve original language for proper nouns and company names
+- Keep technical terms in their original form (DeepSeek, LangGraph, etc.)
+- Note language capabilities in personalContext
+
+Output Format (JSON):
+{{
+  "user": {{
+    "workContext": {{ "summary": "...", "shouldUpdate": true/false }},
+    "personalContext": {{ "summary": "...", "shouldUpdate": true/false }},
+    "topOfMind": {{ "summary": "...", "shouldUpdate": true/false }}
+  }},
+  "history": {{
+    "recentMonths": {{ "summary": "...", "shouldUpdate": true/false }},
+    "earlierContext": {{ "summary": "...", "shouldUpdate": true/false }},
+    "longTermBackground": {{ "summary": "...", "shouldUpdate": true/false }}
+  }},
+  "newFacts": [
+    {{ "content": "...", "category": "preference|knowledge|context|behavior|goal", "confidence": 0.0-1.0 }}
+  ],
+  "factsToRemove": ["fact_id_1", "fact_id_2"]
+}}
+
+Important Rules:
+- Only set shouldUpdate=true if there's meaningful new information
+- Follow length guidelines: workContext/personalContext are concise (1-3 sentences), topOfMind and history sections are detailed (paragraphs)
+- Include specific metrics, version numbers, and proper nouns in facts
+- Only add facts that are clearly stated (0.9+) or strongly implied (0.7+)
+- Remove facts that are contradicted by new information
+- When updating topOfMind, integrate new focus areas while removing completed/abandoned ones
+  Keep 3-5 concurrent focus themes that are still active and relevant
+- For history sections, integrate new information chronologically into appropriate time period
+- Preserve technical accuracy - keep exact names of technologies, companies, projects
+- Focus on information useful for future interactions and personalization
+- IMPORTANT: Do NOT record file upload events in memory. Uploaded files are
+  session-specific and ephemeral — they will not be accessible in future sessions.
+  Recording upload events causes confusion in subsequent conversations.
+
+Return ONLY valid JSON, no explanation or markdown."""
+
+
+# Prompt template for extracting facts from a single message
+FACT_EXTRACTION_PROMPT = """Extract factual information about the user from this message.
+
+Message:
+{message}
+
+Extract facts in this JSON format:
+{{
+  "facts": [
+    {{ "content": "...", "category": "preference|knowledge|context|behavior|goal", "confidence": 0.0-1.0 }}
+  ]
+}}
+
+Categories:
+- preference: User preferences (likes/dislikes, styles, tools)
+- knowledge: User's expertise or knowledge areas
+- context: Background context (location, job, projects)
+- behavior: Behavioral patterns
+- goal: User's goals or objectives
+
+Rules:
+- Only extract clear, specific facts
+- Confidence should reflect certainty (explicit statement = 0.9+, implied = 0.6-0.8)
+- Skip vague or temporary information
+
+Return ONLY valid JSON."""
+
+
+def _count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
+    """Count tokens in text using tiktoken.
+
+    Args:
+        text: The text to count tokens for.
+        encoding_name: The encoding to use (default: cl100k_base for GPT-4/3.5).
+
+    Returns:
+        The number of tokens in the text.
+    """
+    if not TIKTOKEN_AVAILABLE:
+        # Fallback to character-based estimation if tiktoken is not available
+        return len(text) // 4
+
+    try:
+        encoding = tiktoken.get_encoding(encoding_name)
+        return len(encoding.encode(text))
+    except Exception:
+        # Fallback to character-based estimation on error
+        return len(text) // 4
+
+
+def _coerce_confidence(value: Any, default: float = 0.0) -> float:
+    """Coerce a confidence-like value to a bounded float in [0, 1].
+
+    Non-finite values (NaN, inf, -inf) are treated as invalid and fall back
+    to the default before clamping, preventing them from dominating ranking.
+    The ``default`` parameter is assumed to be a finite value.
+    """
+    try:
+        confidence = float(value)
+    except (TypeError, ValueError):
+        return max(0.0, min(1.0, default))
+    if not math.isfinite(confidence):
+        return max(0.0, min(1.0, default))
+    return max(0.0, min(1.0, confidence))
+
+
+def format_memory_for_injection(memory_data: dict[str, Any], max_tokens: int = 2000) -> str:
+    """Format memory data for injection into system prompt.
+
+    Args:
+        memory_data: The memory data dictionary.
+        max_tokens: Maximum tokens to use (counted via tiktoken for accuracy).
+
+    Returns:
+        Formatted memory string for system prompt injection.
+    """
+    if not memory_data:
+        return ""
+
+    sections = []
+
+    # Format user context
+    user_data = memory_data.get("user", {})
+    if user_data:
+        user_sections = []
+
+        work_ctx = user_data.get("workContext", {})
+        if work_ctx.get("summary"):
+            user_sections.append(f"Work: {work_ctx['summary']}")
+
+        personal_ctx = user_data.get("personalContext", {})
+        if personal_ctx.get("summary"):
+            user_sections.append(f"Personal: {personal_ctx['summary']}")
+
+        top_of_mind = user_data.get("topOfMind", {})
+        if top_of_mind.get("summary"):
+            user_sections.append(f"Current Focus: {top_of_mind['summary']}")
+
+        if user_sections:
+            sections.append("User Context:\n" + "\n".join(f"- {s}" for s in user_sections))
+
+    # Format history
+    history_data = memory_data.get("history", {})
+    if history_data:
+        history_sections = []
+
+        recent = history_data.get("recentMonths", {})
+        if recent.get("summary"):
+            history_sections.append(f"Recent: {recent['summary']}")
+
+        earlier = history_data.get("earlierContext", {})
+        if earlier.get("summary"):
+            history_sections.append(f"Earlier: {earlier['summary']}")
+
+        if history_sections:
+            sections.append("History:\n" + "\n".join(f"- {s}" for s in history_sections))
+
+    # Format facts (sorted by confidence; include as many as token budget allows)
+    facts_data = memory_data.get("facts", [])
+    if isinstance(facts_data, list) and facts_data:
+        ranked_facts = sorted(
+            (
+                f
+                for f in facts_data
+                if isinstance(f, dict)
+                and isinstance(f.get("content"), str)
+                and f.get("content").strip()
+            ),
+            key=lambda fact: _coerce_confidence(fact.get("confidence"), default=0.0),
+            reverse=True,
+        )
+
+        # Compute token count for existing sections once, then account
+        # incrementally for each fact line to avoid full-string re-tokenization.
+        base_text = "\n\n".join(sections)
+        base_tokens = _count_tokens(base_text) if base_text else 0
+        # Account for the separator between existing sections and the facts section.
+        facts_header = "Facts:\n"
+        separator_tokens = _count_tokens("\n\n" + facts_header) if base_text else _count_tokens(facts_header)
+        running_tokens = base_tokens + separator_tokens
+
+        fact_lines: list[str] = []
+        for fact in ranked_facts:
+            content_value = fact.get("content")
+            if not isinstance(content_value, str):
+                continue
+            content = content_value.strip()
+            if not content:
+                continue
+            category = str(fact.get("category", "context")).strip() or "context"
+            confidence = _coerce_confidence(fact.get("confidence"), default=0.0)
+            line = f"- [{category} | {confidence:.2f}] {content}"
+
+            # Each additional line is preceded by a newline (except the first).
+            line_text = ("\n" + line) if fact_lines else line
+            line_tokens = _count_tokens(line_text)
+
+            if running_tokens + line_tokens <= max_tokens:
+                fact_lines.append(line)
+                running_tokens += line_tokens
+            else:
+                break
+
+        if fact_lines:
+            sections.append("Facts:\n" + "\n".join(fact_lines))
+
+    if not sections:
+        return ""
+
+    result = "\n\n".join(sections)
+
+    # Use accurate token counting with tiktoken
+    token_count = _count_tokens(result)
+    if token_count > max_tokens:
+        # Truncate to fit within token limit
+        # Estimate characters to remove based on token ratio
+        char_per_token = len(result) / token_count
+        target_chars = int(max_tokens * char_per_token * 0.95)  # 95% to leave margin
+        result = result[:target_chars] + "\n..."
+
+    return result
+
+
+def format_conversation_for_update(messages: list[Any]) -> str:
+    """Format conversation messages for memory update prompt.
+
+    Args:
+        messages: List of conversation messages.
+
+    Returns:
+        Formatted conversation string.
+    """
+    lines = []
+    for msg in messages:
+        role = getattr(msg, "type", "unknown")
+        content = getattr(msg, "content", str(msg))
+
+        # Handle content that might be a list (multimodal)
+        if isinstance(content, list):
+            text_parts = [p.get("text", "") for p in content if isinstance(p, dict) and "text" in p]
+            content = " ".join(text_parts) if text_parts else str(content)
+
+        # Strip uploaded_files tags from human messages to avoid persisting
+        # ephemeral file path info into long-term memory.  Skip the turn entirely
+        # when nothing remains after stripping (upload-only message).
+        if role == "human":
+            content = re.sub(r"<uploaded_files>[\s\S]*?</uploaded_files>\n*", "", str(content)).strip()
+            if not content:
+                continue
+
+        # Truncate very long messages
+        if len(str(content)) > 1000:
+            content = str(content)[:1000] + "..."
+
+        if role == "human":
+            lines.append(f"User: {content}")
+        elif role == "ai":
+            lines.append(f"Assistant: {content}")
+
+    return "\n\n".join(lines)