feat: add citation support in research report block and markdown

* feat: add citation support in research report block and markdown

  - Enhanced ResearchReportBlock to fetch citations based on researchId and pass them to the Markdown component.
  - Introduced CitationLink component to display citation metadata on hover for links in markdown.
  - Implemented CitationCard and CitationList components for displaying citation details and lists.
  - Updated Markdown component to handle citation links and inline citations.
  - Created HoverCard component for displaying citation information in a tooltip-like manner.
  - Modified store to manage citations, including setting and retrieving citations for ongoing research.
  - Added CitationsEvent type to handle citations in chat events and updated Message type to include citations.

* fix(log): Enable the logging level when enabling the DEBUG environment variable (#793)

* fix(frontend): render all tool calls in the frontend #796 (#797)

* build(deps): bump jspdf from 3.0.4 to 4.0.0 in /web (#798)

  Bumps [jspdf](https://github.com/parallax/jsPDF) from 3.0.4 to 4.0.0.
  - [Release notes](https://github.com/parallax/jsPDF/releases)
  - [Changelog](https://github.com/parallax/jsPDF/blob/master/RELEASE.md)
  - [Commits](https://github.com/parallax/jsPDF/compare/v3.0.4...v4.0.0)

  ---
  updated-dependencies:
  - dependency-name: jspdf
    dependency-version: 4.0.0
    dependency-type: direct:production
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix(frontend):added the display of the 'analyst' message #800 (#801)

* fix: migrate from deprecated create_react_agent to langchain.agents.create_agent (#802)

  * fix: migrate from deprecated create_react_agent to langchain.agents.create_agent

    Fixes #799

    - Replace deprecated langgraph.prebuilt.create_react_agent with langchain.agents.create_agent (LangGraph 1.0 migration)
    - Add DynamicPromptMiddleware to handle dynamic prompt templates (replaces the 'prompt' callable parameter)
    - Add PreModelHookMiddleware to handle pre-model hooks (replaces the 'pre_model_hook' parameter)
    - Update AgentState import from langchain.agents in template.py
    - Update tests to use the new API

  * fix:update the code with review comments

* fix: Add runtime parameter to compress_messages method(#803)

  * fix: Add runtime parameter to compress_messages method(#803)

    The compress_messages method was being called by PreModelHookMiddleware with both state and runtime parameters, but only accepted state parameter. This caused a TypeError when the middleware executed the pre_model_hook.

    Added optional runtime parameter to compress_messages signature to match the expected interface while maintaining backward compatibility.

  * Update the code with the review comments

* fix: Refactor citation handling and add comprehensive tests for citation features

* refactor: Clean up imports and formatting across citation modules

* fix: Add monkeypatch to clear AGENT_RECURSION_LIMIT in recursion limit tests

* feat: Enhance citation link handling in Markdown component

* fix: Exclude citations from finish reason handling in mergeMessage function

* fix(nodes): update message handling

* fix(citations): improve citation extraction and handling in event processing

* feat(citations): enhance citation extraction and handling with improved merging and normalization

* fix(reporter): update citation formatting instructions for clarity and consistency

* fix(reporter): prioritize using Markdown tables for data presentation and comparison

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: LoftyComet <1277173875@qq。>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
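
The compress_messages fix described in the message (#803) can be illustrated with a minimal sketch. The class name `ContextManagerSketch` and the trimming logic are hypothetical stand-ins and are not taken from the repository; only the shape of the signature (an optional `runtime` parameter after `state`) follows the commit description.

```python
from typing import Any, Optional


class ContextManagerSketch:
    """Hypothetical stand-in for the class that owns compress_messages."""

    def __init__(self, max_messages: int = 3) -> None:
        self.max_messages = max_messages

    def compress_messages(
        self, state: dict[str, Any], runtime: Optional[Any] = None
    ) -> dict[str, Any]:
        # `runtime` is accepted so a hook can call this as hook(state, runtime).
        # It is unused in this sketch; defaulting it to None keeps older
        # single-argument callers working.
        messages = state.get("messages", [])
        return {**state, "messages": messages[-self.max_messages:]}


manager = ContextManagerSketch(max_messages=2)
state = {"messages": ["a", "b", "c"]}

old_style = manager.compress_messages(state)            # original call shape
new_style = manager.compress_messages(state, object())  # call shape with runtime
```

Without the default value, the second call shape would raise a TypeError, which matches the failure the commit message describes.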
2026-04-13 02:24:44 +08:00 · 2026-01-24 17:49:13 +08:00
parent 612bddd3fb
commit b7f0f54aa0
22 changed files with 2125 additions and 29 deletions
--- a/src/graph/nodes.py
+++ b/src/graph/nodes.py
@@ -14,6 +14,7 @@ from langchain_mcp_adapters.client import MultiServerMCPClient
 from langgraph.types import Command, interrupt

 from src.agents import create_agent
+from src.citations import extract_citations_from_messages, merge_citations
 from src.config.agents import AGENT_LLM_MAP
 from src.config.configuration import Configuration
 from src.llms.llm import get_llm_by_type, get_llm_token_limit_by_type
@@ -715,6 +716,7 @@ def coordinator_node(
                        "clarified_research_topic": clarified_topic,
                        "is_clarification_complete": False,
                        "goto": goto,
+                        "citations": state.get("citations", []),
                        "__interrupt__": [("coordinator", response.content)],
                    },
                    goto=goto,
@@ -802,6 +804,7 @@ def coordinator_node(
            "clarification_history": clarification_history,
            "is_clarification_complete": goto != "coordinator",
            "goto": goto,
+            "citations": state.get("citations", []),
        },
        goto=goto,
    )
@@ -822,14 +825,32 @@ def reporter_node(state: State, config: RunnableConfig):
    }
    invoke_messages = apply_prompt_template("reporter", input_, configurable, input_.get("locale", "en-US"))
    observations = state.get("observations", [])
+    
+    # Get collected citations for the report
+    citations = state.get("citations", [])

-    # Add a reminder about the new report format, citation style, and table usage
-    invoke_messages.append(
-        HumanMessage(
-            content="IMPORTANT: Structure your report according to the format in the prompt. Remember to include:\n\n1. Key Points - A bulleted list of the most important findings\n2. Overview - A brief introduction to the topic\n3. Detailed Analysis - Organized into logical sections\n4. Survey Note (optional) - For more comprehensive reports\n5. Key Citations - List all references at the end\n\nFor citations, DO NOT include inline citations in the text. Instead, place all citations in the 'Key Citations' section at the end using the format: `- [Source Title](URL)`. Include an empty line between each citation for better readability.\n\nPRIORITIZE USING MARKDOWN TABLES for data presentation and comparison. Use tables whenever presenting comparative data, statistics, features, or options. Structure tables with clear headers and aligned columns. Example table format:\n\n| Feature | Description | Pros | Cons |\n|---------|-------------|------|------|\n| Feature 1 | Description 1 | Pros 1 | Cons 1 |\n| Feature 2 | Description 2 | Pros 2 | Cons 2 |",
-            name="system",
+    # If we have collected citations, provide them to the reporter
+    if citations:
+        citation_list = "\n\n## Available Source References (use these in References section):\n\n"
+        for i, citation in enumerate(citations, 1):
+            title = citation.get("title", "Untitled")
+            url = citation.get("url", "")
+            domain = citation.get("domain", "")
+            description = citation.get("description", "")
+            desc_truncated = (description[:150] + "...") if description and len(description) > 150 else (description or "")
+            citation_list += f"{i}. **{title}**\n   - URL: {url}\n   - Domain: {domain}\n"
+            if desc_truncated:
+                citation_list += f"   - Summary: {desc_truncated}\n"
+            citation_list += "\n"
+        
+        logger.info(f"Providing {len(citations)} collected citations to reporter")
+
+        invoke_messages.append(
+            HumanMessage(
+                content=citation_list,
+                name="system",
+            )
        )
-    )

    observation_messages = []
    for observation in observations:
@@ -852,7 +873,10 @@ def reporter_node(state: State, config: RunnableConfig):
    response_content = response.content
    logger.info(f"reporter response: {response_content}")

-    return {"final_report": response_content}
+    return {
+        "final_report": response_content,
+        "citations": citations,  # Pass citations through to final state
+    }


 def research_team_node(state: State):
@@ -1114,11 +1138,23 @@ async def _execute_agent_step(
            f"All tool results will be preserved and streamed to frontend."
        )

+    # Extract citations from tool call results (web_search, crawl)
+    existing_citations = state.get("citations", [])
+    new_citations = extract_citations_from_messages(agent_messages)
+    merged_citations = merge_citations(existing_citations, new_citations)
+    
+    if new_citations:
+        logger.info(
+            f"Extracted {len(new_citations)} new citations from {agent_name} agent. "
+            f"Total citations: {len(merged_citations)}"
+        )
+
    return Command(
        update={
+            **preserve_state_meta_fields(state),
            "messages": agent_messages,
            "observations": observations + [response_content + validation_info],
-            **preserve_state_meta_fields(state),
+            "citations": merged_citations,  # Store merged citations based on existing state and new tool results
        },
        goto="research_team",
    )
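
The `src.citations` module imported above is not part of this excerpt, so the following is only a sketch of how a URL-keyed merge like the one called in `_execute_agent_step` might behave. The names `normalize_url` and `merge_citations_sketch` and the normalization rules (lowercased host, no trailing slash, no fragment) are assumptions for illustration, not the project's actual implementation.

```python
from typing import Any
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Build a comparison key: lowercase host, drop trailing slash and fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_citations_sketch(
    existing: list[dict[str, Any]], new: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Merge two citation lists, deduplicating by normalized URL.

    The first occurrence of a URL wins; later duplicates only fill in
    fields that are still missing or empty. Order of first appearance
    is preserved.
    """
    merged: dict[str, dict[str, Any]] = {}
    for citation in [*existing, *new]:
        url = citation.get("url", "")
        if not url:
            continue  # a citation without a URL cannot be deduplicated or linked
        key = normalize_url(url)
        if key not in merged:
            merged[key] = dict(citation)
            continue
        for field_name, value in citation.items():
            if value and not merged[key].get(field_name):
                merged[key][field_name] = value
    return list(merged.values())


existing = [{"url": "https://Example.com/a/", "title": "A"}]
new = [
    {"url": "https://example.com/a#top", "title": "", "description": "desc"},
    {"url": "https://example.com/b", "title": "B"},
]
result = merge_citations_sketch(existing, new)
```

Keying on a normalized URL means repeated tool calls that return the same page do not inflate the citation list passed to the reporter.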
--- a/src/graph/types.py
+++ b/src/graph/types.py
@@ -3,6 +3,7 @@


 from dataclasses import field
+from typing import Any

 from langgraph.graph import MessagesState

@@ -27,6 +28,10 @@ class State(MessagesState):
    auto_accepted_plan: bool = False
    enable_background_investigation: bool = True
    background_investigation_results: str = None
+    
+    # Citation metadata collected during research
+    # Format: List of citation dictionaries with url, title, description, etc.
+    citations: list[dict[str, Any]] = field(default_factory=list)

    # Clarification state tracking (disabled by default)
    enable_clarification: bool = (