fix: move Key Citations to early position in reporter prompt to reduce URL hallucination (#859)

* fix: move Key Citations to early position in reporter prompt to reduce URL hallucination

Move the Key Citations section from position 6 (end of report) to position 2
(immediately after title) in the reporter prompt. When citations are placed at
the end of a long report, LLMs tend to forget real URLs from source material
and fabricate plausible-looking but non-existent URLs.

Changes to src/prompts/reporter.md:
- Move Key Citations from section 6 to section 2 (right after Title)
- Add explicit anti-hallucination instructions: only use URLs from provided
  source material, never fabricate or guess URLs
- Keep a repeated citation list at the end (section 7) for completeness
- Renumber all subsequent sections accordingly
- Update Notes section to reflect new structure

Tested against a real DeerFlow backend with DuckDuckGo search:
- Before: multiple hallucinated URLs in report citations
- After: significantly fewer hallucinated URLs in report citations
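
A before/after comparison like the one above can be approximated with a small check that flags report URLs missing from the source material. This is a minimal sketch, not part of the DeerFlow codebase: `find_unsupported_urls`, the regular expression, and the example URLs are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical helper, not part of DeerFlow. The pattern is a deliberately
# simple assumption: a URL ends at whitespace or at ')', ']' or '>'.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def find_unsupported_urls(report: str, source_urls: set[str]) -> list[str]:
    """Return URLs that appear in the report but not in the source material."""
    unsupported: list[str] = []
    for match in URL_PATTERN.findall(report):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in source_urls and url not in unsupported:
            unsupported.append(url)
    return unsupported


# Example data (made up for illustration).
sources = {"https://example.com/a", "https://example.org/b"}
report = (
    "# Title\n\n"
    "## Key Citations\n\n"
    "- [A](https://example.com/a)\n"
    "- [C](https://example.net/made-up)\n"
)
print(find_unsupported_urls(report, sources))
# -> ['https://example.net/made-up']
```

Exact string matching is the strictest choice; a real check might normalize trailing slashes or query strings before comparing.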

Closes #825

* fix: move citations after observations in reporter_node to reduce URL hallucination

Previously, the citation message was appended BEFORE the observation messages,
so it was buried under potentially thousands of characters of research data.
By the time the LLM reached the end of the context to generate the report,
it had 'forgotten' the real URLs and fabricated plausible-looking ones.

Now citations are appended AFTER compressed observations, placing them
closest to the LLM's generation point for maximum recall accuracy.
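
The ordering described above can be sketched in isolation as follows. This is an illustrative sketch under stated assumptions, not the actual `reporter_node` code: `Message` is a stand-in for a chat message class such as `HumanMessage`, `format_citations` and `build_invoke_messages` are hypothetical helpers, the `title` and `url` keys of each citation are assumed, and the observation compression step is omitted.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical stand-in for a chat message type such as HumanMessage.
    content: str
    name: str = ""


def format_citations(citations: list[dict]) -> str:
    """Render citations as one text block; returns "" when there are none."""
    if not citations:
        return ""
    lines = ["## Available Source References (use these in References section):", ""]
    for i, citation in enumerate(citations, 1):
        # Assumption: each citation carries 'title' and 'url' keys.
        lines.append(f"{i}. [{citation['title']}]({citation['url']})")
    return "\n".join(lines)


def build_invoke_messages(prompt_messages, observations, citations):
    """Place observations first and the citation block last."""
    messages = list(prompt_messages)
    messages += [Message(content=text, name="observation") for text in observations]
    citation_list = format_citations(citations)
    if citation_list:
        # Appended last so the real URLs sit closest to the generation point.
        messages.append(Message(content=citation_list, name="system"))
    return messages


messages = build_invoke_messages(
    [Message(content="Write the report.")],
    ["finding one", "finding two"],
    [{"title": "A", "url": "https://example.com/a"}],
)
print(messages[-1].content)
```

With an empty citation list no extra message is appended, which mirrors the `if citation_list:` guard in the change.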

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
大猫子
2026-02-14 15:21:24 +08:00
committed by GitHub
parent c95b2711c3
commit 13a25112b1
2 changed files with 31 additions and 17 deletions


@@ -853,7 +853,8 @@ def reporter_node(state: State, config: RunnableConfig):
     # Get collected citations for the report
     citations = state.get("citations", [])
-    # If we have collected citations, provide them to the reporter
+    # Build citation messages for the reporter
+    citation_list = ""
     if citations:
         citation_list = "\n\n## Available Source References (use these in References section):\n\n"
         for i, citation in enumerate(citations, 1):
@@ -869,13 +870,6 @@ def reporter_node(state: State, config: RunnableConfig):
         logger.info(f"Providing {len(citations)} collected citations to reporter")
-        invoke_messages.append(
-            HumanMessage(
-                content=citation_list,
-                name="system",
-            )
-        )
     observation_messages = []
     for observation in observations:
         observation_messages.append(
@@ -892,6 +886,17 @@ def reporter_node(state: State, config: RunnableConfig):
     )
     invoke_messages += compressed_state.get("messages", [])
+    # Append citations AFTER observations so they are closest to the LLM's
+    # generation point. This reduces the chance of the model "forgetting"
+    # real URLs and fabricating plausible-looking ones instead.
+    if citation_list:
+        invoke_messages.append(
+            HumanMessage(
+                content=citation_list,
+                name="system",
+            )
+        )
     logger.debug(f"Current invoke messages: {invoke_messages}")
     response = get_llm_by_type(AGENT_LLM_MAP["reporter"]).invoke(invoke_messages)
     response_content = response.content