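chore: remove all Citations-related logic in preparation for a later refactor

- Backend: remove citations_format and the citation-related reminders from lead_agent / general_purpose; artifact downloads no longer strip citations from markdown and always go through FileResponse, with Response kept for binary inline content
- Frontend: remove the core/citations module, inline-citation, and safe-citation-content; add MarkdownContent, which only renders Markdown; message/artifact preview and copy all use the raw content
- i18n: remove the citations namespace (loadingCitations, loadingCitationsWithCount)
- Skills and demo: change the wording to "references"; remove the <citations> blocks from the demo data
- Docs: update the CLAUDE/AGENTS/README descriptions; add a per-file diff summary of the code changes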

Co-authored-by: Cursor <cursoragent@cursor.com>
ruitanglin
2026-02-09 16:24:01 +08:00
parent 59c8fec7e7
commit 8747873b8d
27 changed files with 1043 additions and 894 deletions


@@ -156,7 +156,7 @@ FastAPI application on port 8001 with health check at `GET /health`.
| **Skills** (`/api/skills`) | `GET /` - list skills; `GET /{name}` - details; `PUT /{name}` - update enabled; `POST /install` - install from .skill archive |
| **Memory** (`/api/memory`) | `GET /` - memory data; `POST /reload` - force reload; `GET /config` - config; `GET /status` - config + data |
| **Uploads** (`/api/threads/{id}/uploads`) | `POST /` - upload files (auto-converts PDF/PPT/Excel/Word); `GET /list` - list; `DELETE /{filename}` - delete |
| **Artifacts** (`/api/threads/{id}/artifacts`) | `GET /{path}` - serve artifacts; `?download=true` for download with citation removal |
| **Artifacts** (`/api/threads/{id}/artifacts`) | `GET /{path}` - serve artifacts; `?download=true` for file download |
Proxied through nginx: `/api/langgraph/*` → LangGraph, all other `/api/*` → Gateway.
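
As a rough illustration of the updated Artifacts row, a download URL could be assembled as shown below. This is a sketch rather than project code: `build_artifact_url`, the thread ID, and the artifact path are hypothetical, and calling the gateway directly on port 8001 (instead of going through nginx) is an assumption.

```python
from urllib.parse import quote, urlencode


def build_artifact_url(base_url: str, thread_id: str, path: str, download: bool = False) -> str:
    """Build a URL for GET /api/threads/{id}/artifacts/{path} (hypothetical helper)."""
    # Percent-encode the thread ID fully; keep "/" separators inside the artifact path.
    encoded_thread = quote(thread_id, safe="")
    encoded_path = quote(path.lstrip("/"))
    url = f"{base_url.rstrip('/')}/api/threads/{encoded_thread}/artifacts/{encoded_path}"
    if download:
        # After this commit, ?download=true returns the file as-is (no citation removal).
        url += "?" + urlencode({"download": "true"})
    return url


if __name__ == "__main__":
    # Hypothetical thread ID and artifact path, for illustration only.
    print(build_artifact_url("http://localhost:8001", "thread-1", "mnt/user-data/outputs/report.md", download=True))
```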


@@ -240,34 +240,8 @@ You have access to skills that provide optimized workflows for specific tasks. E
- Action-Oriented: Focus on delivering results, not explaining processes
</response_style>
<citations_format>
After web_search, ALWAYS include citations in your output:
1. Start with a `<citations>` block in JSONL format listing all sources
2. In content, use FULL markdown link format: [Short Title](full_url)
**CRITICAL - Citation Link Format:**
- CORRECT: `[TechCrunch](https://techcrunch.com/ai-trends)` - full markdown link with URL
- WRONG: `[arXiv:2502.19166]` - missing URL, will NOT render as link
- WRONG: `[Source]` - missing URL, will NOT render as link
**Rules:**
- Every citation MUST be a complete markdown link with URL: `[Title](https://...)`
- Write content naturally, add citation link at end of sentence/paragraph
- NEVER use bare brackets like `[arXiv:xxx]` or `[Source]` without URL
**Example:**
<citations>
{{"id": "cite-1", "title": "AI Trends 2026", "url": "https://techcrunch.com/ai-trends", "snippet": "Tech industry predictions"}}
{{"id": "cite-2", "title": "OpenAI Research", "url": "https://openai.com/research", "snippet": "Latest AI research developments"}}
</citations>
The key AI trends for 2026 include enhanced reasoning capabilities and multimodal integration [TechCrunch](https://techcrunch.com/ai-trends). Recent breakthroughs in language models have also accelerated progress [OpenAI](https://openai.com/research).
</citations_format>
<critical_reminders>
- **Clarification First**: ALWAYS clarify unclear/missing/ambiguous requirements BEFORE starting work - never assume or guess
- **Web search citations**: When you use web_search (or synthesize subagent results that used it), you MUST output the `<citations>` block and [Title](url) links as specified in citations_format so citations display for the user.
{subagent_reminder}- Skill First: Always load the relevant skill before starting **complex** tasks.
- Progressive Loading: Load resources incrementally as referenced in skills
- Output Files: Final deliverables must be in `/mnt/user-data/outputs`
@@ -341,7 +315,6 @@ def apply_prompt_template(subagent_enabled: bool = False) -> str:
# Add subagent reminder to critical_reminders if enabled
subagent_reminder = (
"- **Orchestrator Mode**: You are a task orchestrator - decompose complex tasks into parallel sub-tasks and launch multiple subagents simultaneously. Synthesize results, don't execute directly.\n"
"- **Citations when synthesizing**: When you synthesize subagent results that used web search or cite sources, you MUST include a consolidated `<citations>` block (JSONL format) and use [Title](url) markdown links in your response so citations display correctly.\n"
if subagent_enabled
else ""
)
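
The `{subagent_reminder}` placeholder in `<critical_reminders>` and the doubled braces in the removed JSONL example suggest that the prompt is filled in with `str.format`. The sketch below works under that assumption; `TEMPLATE` and `render` are simplified, hypothetical stand-ins with abbreviated reminder text, not the project's actual template or `apply_prompt_template`.

```python
# Abbreviated stand-in for the <critical_reminders> section of the system prompt.
TEMPLATE = (
    "<critical_reminders>\n"
    "- **Clarification First**: clarify requirements before starting work\n"
    "{subagent_reminder}- Skill First: load the relevant skill before complex tasks\n"
    "</critical_reminders>"
)


def render(subagent_enabled: bool = False) -> str:
    """Fill the placeholder; only the orchestrator reminder remains after this commit."""
    subagent_reminder = (
        "- **Orchestrator Mode**: decompose tasks and launch subagents in parallel\n"
        if subagent_enabled
        else ""
    )
    return TEMPLATE.format(subagent_reminder=subagent_reminder)


if __name__ == "__main__":
    print(render(subagent_enabled=True))
```

Because the reminder string ends with a newline and the placeholder sits directly before the next bullet, the list stays contiguous whether or not the reminder is included.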


@@ -1,12 +1,10 @@
import json
import mimetypes
import re
import zipfile
from pathlib import Path
from urllib.parse import quote
from fastapi import APIRouter, HTTPException, Request, Response
from fastapi.responses import FileResponse, HTMLResponse, PlainTextResponse
from fastapi import APIRouter, HTTPException, Request
from fastapi.responses import FileResponse, HTMLResponse, PlainTextResponse, Response
from src.gateway.path_utils import resolve_thread_virtual_path
@@ -24,40 +22,6 @@ def is_text_file_by_content(path: Path, sample_size: int = 8192) -> bool:
return False
def _extract_citation_urls(content: str) -> set[str]:
"""Extract URLs from <citations> JSONL blocks. Format must match frontend core/citations/utils.ts."""
urls: set[str] = set()
for match in re.finditer(r"<citations>([\s\S]*?)</citations>", content):
for line in match.group(1).split("\n"):
line = line.strip()
if line.startswith("{"):
try:
obj = json.loads(line)
if "url" in obj:
urls.add(obj["url"])
except (json.JSONDecodeError, ValueError):
pass
return urls
def remove_citations_block(content: str) -> str:
"""Remove ALL citations from markdown (blocks, [cite-N], and citation links). Used for downloads."""
if not content:
return content
citation_urls = _extract_citation_urls(content)
result = re.sub(r"<citations>[\s\S]*?</citations>", "", content)
if "<citations>" in result:
result = re.sub(r"<citations>[\s\S]*$", "", result)
result = re.sub(r"\[cite-\d+\]", "", result)
for url in citation_urls:
result = re.sub(rf"\[[^\]]+\]\({re.escape(url)}\)", "", result)
return re.sub(r"\n{3,}", "\n\n", result).strip()
def _extract_file_from_skill_archive(zip_path: Path, internal_path: str) -> bytes | None:
"""Extract a file from a .skill ZIP archive.
@@ -172,24 +136,9 @@ async def get_artifact(thread_id: str, path: str, request: Request) -> FileRespo
# Encode filename for Content-Disposition header (RFC 5987)
encoded_filename = quote(actual_path.name)
# Check if this is a markdown file that might contain citations
is_markdown = mime_type == "text/markdown" or actual_path.suffix.lower() in [".md", ".markdown"]
# if `download` query parameter is true, return the file as a download
if request.query_params.get("download"):
# For markdown files, remove citations block before download
if is_markdown:
content = actual_path.read_text()
clean_content = remove_citations_block(content)
return Response(
content=clean_content.encode("utf-8"),
media_type="text/markdown",
headers={
"Content-Disposition": f"attachment; filename*=UTF-8''{encoded_filename}",
"Content-Type": "text/markdown; charset=utf-8"
}
)
return FileResponse(path=actual_path, filename=actual_path.name, media_type=mime_type, headers={"Content-Disposition": f"attachment; filename*=UTF-8''{encoded_filename}"})
if mime_type and mime_type == "text/html":
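
With the markdown special case gone, every download shares the same `Content-Disposition` construction seen in the `FileResponse` line above. A minimal standalone sketch of that header logic follows; `content_disposition_attachment` is a hypothetical helper name, not a function in this router.

```python
from urllib.parse import quote


def content_disposition_attachment(filename: str) -> str:
    """Build an attachment header value using the RFC 5987 filename* form."""
    # quote() percent-encodes the UTF-8 bytes, so spaces and non-ASCII names
    # survive in a header that is otherwise restricted to ASCII.
    encoded_filename = quote(filename)
    return f"attachment; filename*=UTF-8''{encoded_filename}"


if __name__ == "__main__":
    print(content_disposition_attachment("my report.md"))
    # prints: attachment; filename*=UTF-8''my%20report.md
```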


@@ -24,21 +24,10 @@ Do NOT use for simple, single-step operations.""",
- Do NOT ask for clarification - work with the information provided
</guidelines>
<citations_format>
If you used web_search (or similar) and cite sources, ALWAYS include citations in your output:
1. Start with a `<citations>` block in JSONL format listing all sources (one JSON object per line)
2. In content, use FULL markdown link format: [Short Title](full_url)
- Every citation MUST be a complete markdown link with URL: [Title](https://...)
- Example block:
<citations>
{"id": "cite-1", "title": "...", "url": "https://...", "snippet": "..."}
</citations>
</citations_format>
<output_format>
When you complete the task, provide:
1. A brief summary of what was accomplished
2. Key findings or results (with citation links when from web search)
2. Key findings or results
3. Any relevant file paths, data, or artifacts created
4. Issues encountered (if any)
</output_format>