fix: move Key Citations to early position in reporter prompt to reduce URL hallucination (#859)

* fix: move Key Citations to early position in reporter prompt to reduce URL hallucination

  Move the Key Citations section from position 6 (end of report) to position 2 (immediately after title) in the reporter prompt. When citations are placed at the end of a long report, LLMs tend to forget real URLs from source material and fabricate plausible-looking but non-existent URLs.

  Changes to src/prompts/reporter.md:
  - Move Key Citations from section 6 to section 2 (right after Title)
  - Add explicit anti-hallucination instructions: only use URLs from provided source material, never fabricate or guess URLs
  - Keep a repeated citation list at the end (section 7) for completeness
  - Renumber all subsequent sections accordingly
  - Update Notes section to reflect new structure

  Tested with real DeerFlow backend + DuckDuckGo search:
  - Before: multiple hallucinated URLs in report citations
  - After: hallucinated URLs reduced significantly

  Closes #825

* fix: move citations after observations in reporter_node to reduce URL hallucination

  Previously, the citation message was appended BEFORE observation messages, meaning it got buried under potentially thousands of chars of research data. By the time the LLM reached the end of the context to generate the report, it had 'forgotten' the real URLs and fabricated plausible-looking ones.

  Now citations are appended AFTER compressed observations, placing them closest to the LLM's generation point for maximum recall accuracy.

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
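
The message reordering described in the second fix can be illustrated with a minimal, self-contained sketch. It is not the project's code: `Message` is a hypothetical stand-in for the real chat message class, `build_invoke_messages` is an illustrative helper name, and the citation shape (a dict with `title` and `url`) is an assumption, since the diff does not show the body of the citation loop. Observation compression is left out.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a chat message; not the real message class."""
    content: str
    name: str = ""


def build_invoke_messages(base, observations, citations):
    """Return messages ordered so the citation list comes last.

    Observations are added first; the citation list is appended afterwards
    so it sits closest to the point where the model starts generating.
    """
    messages = list(base)
    messages += [Message(content=obs, name="observation") for obs in observations]

    # Assumed citation shape: {"title": ..., "url": ...}
    if citations:
        header = "## Available Source References (use these in References section):"
        lines = [f"- [{c['title']}]({c['url']})" for c in citations]
        messages.append(
            Message(content=header + "\n\n" + "\n".join(lines), name="system")
        )
    return messages


example = build_invoke_messages(
    base=[Message("Write the report.")],
    observations=["finding A", "finding B"],
    citations=[{"title": "Example", "url": "https://example.com/a"}],
)
```

In this sketch the citation message is always the final element when any citations exist, and it is omitted entirely when the list is empty, which mirrors the `if citation_list:` guard in the diff.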
2026-04-15 11:04:44 +08:00 · 2026-02-14 15:21:24 +08:00
parent c95b2711c3
commit 13a25112b1
2 changed files with 31 additions and 17 deletions
--- a/src/graph/nodes.py
+++ b/src/graph/nodes.py
@@ -853,7 +853,8 @@ def reporter_node(state: State, config: RunnableConfig):
    # Get collected citations for the report
    citations = state.get("citations", [])

-    # If we have collected citations, provide them to the reporter
+    # Build citation messages for the reporter
+    citation_list = ""
    if citations:
        citation_list = "\n\n## Available Source References (use these in References section):\n\n"
        for i, citation in enumerate(citations, 1):
@@ -869,13 +870,6 @@ def reporter_node(state: State, config: RunnableConfig):
        
        logger.info(f"Providing {len(citations)} collected citations to reporter")

-        invoke_messages.append(
-            HumanMessage(
-                content=citation_list,
-                name="system",
-            )
-        )
-
    observation_messages = []
    for observation in observations:
        observation_messages.append(
@@ -892,6 +886,17 @@ def reporter_node(state: State, config: RunnableConfig):
    )
    invoke_messages += compressed_state.get("messages", [])

+    # Append citations AFTER observations so they are closest to the LLM's
+    # generation point.  This reduces the chance of the model "forgetting"
+    # real URLs and fabricating plausible-looking ones instead.
+    if citation_list:
+        invoke_messages.append(
+            HumanMessage(
+                content=citation_list,
+                name="system",
+            )
+        )
+
    logger.debug(f"Current invoke messages: {invoke_messages}")
    response = get_llm_by_type(AGENT_LLM_MAP["reporter"]).invoke(invoke_messages)
    response_content = response.content