mirror of
https://gitee.com/wanwujie/deer-flow
synced 2026-04-20 04:44:46 +08:00
* feat: add citation/reference support to deep research reports (#1141) - Enhance lead agent system prompt with mandatory citation requirements after web_search/web_fetch tool usage - Add citation examples and best practices to GitHub Deep Research skill - Add citation hints to report template (Executive Summary, Key Analysis) - Style regular markdown links in frontend for visual distinction (color, underline, hover effect) - Fix TitleMiddleware being registered when title generation is disabled * fix: address PR review comments - Revert TitleMiddleware conditional registration (agent.py) to avoid sync/async incompatibility with DeerFlowClient - Fix markdown link rendering: merge classNames instead of overwriting, only set target=_blank for external http(s) URLs - Remove unrelated package.json/pnpm-lock.yaml changes * fix: use plain markdown links in Sources section for cleaner rendering Inline citations in report body use [citation:Title](URL) for pill/badge style. Sources section uses plain [Title](URL) for simple underlined link style. * fix(frontend): render plain links as underlined text in artifact markdown Only links with citation: prefix render as Badge pills. Regular links in Sources section now render as underlined text links. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
167 lines
4.9 KiB
Markdown
---
name: github-deep-research
description: Conduct multi-round deep research on any GitHub repository. Use when users request comprehensive analysis, timeline reconstruction, competitive analysis, or in-depth investigation of a GitHub project. Produces structured markdown reports with executive summaries, chronological timelines, metrics analysis, and Mermaid diagrams. Triggers on GitHub repository URLs or open source projects.
---

# GitHub Deep Research Skill

Multi-round research that combines the GitHub API, web_search, and web_fetch to produce comprehensive markdown reports.

## Research Workflow

- Round 1: GitHub API
- Round 2: Discovery
- Round 3: Deep Investigation
- Round 4: Deep Dive

## Core Methodology

### Query Strategy

**Broad to Narrow**: Start with the GitHub API, move on to general queries, then refine based on findings.

```
Round 1: GitHub API
Round 2: "{topic} overview"
Round 3: "{topic} architecture", "{topic} vs alternatives"
Round 4: "{topic} issues", "{topic} roadmap", "site:github.com {topic}"
```

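
The query templates above can be expanded mechanically for a given topic. The following is a minimal sketch, not part of the skill itself; the function name `build_queries` is hypothetical, and Round 1 is omitted because it uses the GitHub API rather than web_search.

```python
def build_queries(topic: str) -> dict[int, list[str]]:
    """Return web_search queries for Rounds 2-4, ordered broad to narrow.

    Hypothetical helper: the templates are copied from the Query Strategy
    block; Round 1 is absent because it calls the GitHub API instead.
    """
    templates = {
        2: ["{topic} overview"],
        3: ["{topic} architecture", "{topic} vs alternatives"],
        4: ["{topic} issues", "{topic} roadmap", "site:github.com {topic}"],
    }
    return {
        round_no: [template.format(topic=topic) for template in round_templates]
        for round_no, round_templates in templates.items()
    }


if __name__ == "__main__":
    for round_no, queries in build_queries("deer-flow").items():
        print(f"Round {round_no}: {queries}")
```

In practice the later rounds would be refined with terms discovered in earlier rounds; the fixed templates are only a starting point.
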
**Source Prioritization**:
1. Official docs/repos (highest weight)
2. Technical blogs (Medium, Dev.to)
3. News articles (verified outlets)
4. Community discussions (Reddit, HN)
5. Social media (lowest weight, for sentiment)

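
One way to apply the ordering above is to rank collected sources by category before reading them. This is a sketch under stated assumptions: the category keys, the numeric weights, and the `kind` field are all hypothetical, chosen only to reflect the ranking in the list.

```python
# Hypothetical weights: only their relative order comes from the list above.
SOURCE_WEIGHTS = {
    "official": 5,        # official docs/repos
    "technical_blog": 4,  # Medium, Dev.to
    "news": 3,            # verified outlets
    "community": 2,       # Reddit, HN
    "social": 1,          # social media, sentiment only
}


def rank_sources(sources: list[dict]) -> list[dict]:
    """Sort sources from highest to lowest weight.

    Each source is assumed to carry a "kind" key; unknown kinds sort last.
    sorted() is stable, so sources of the same kind keep their input order.
    """
    return sorted(
        sources,
        key=lambda source: SOURCE_WEIGHTS.get(source["kind"], 0),
        reverse=True,
    )


if __name__ == "__main__":
    found = [
        {"kind": "social", "url": "https://example.com/post"},
        {"kind": "official", "url": "https://github.com/owner/repo"},
    ]
    for source in rank_sources(found):
        print(source["kind"], source["url"])
```
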
### Research Rounds

**Round 1 - GitHub API**

Execute `scripts/github_api.py` directly, without calling `read_file()` on it first:

```bash
python /path/to/skill/scripts/github_api.py <owner> <repo> summary
python /path/to/skill/scripts/github_api.py <owner> <repo> readme
python /path/to/skill/scripts/github_api.py <owner> <repo> tree
```

**Available commands (the last argument of `github_api.py`):**
- summary
- info
- readme
- tree
- languages
- contributors
- commits
- issues
- prs
- releases

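
When the script is driven from Python rather than a shell, the command name can be validated before anything runs. The sketch below only builds the argument list; `build_argv` is a hypothetical helper, and the script path is left for the caller to supply because the real install location is not given here.

```python
import sys

# Command names copied from the list above.
COMMANDS = (
    "summary", "info", "readme", "tree", "languages",
    "contributors", "commits", "issues", "prs", "releases",
)


def build_argv(script_path: str, owner: str, repo: str, command: str) -> list[str]:
    """Build the argument list for one github_api.py invocation.

    Raises ValueError for a command that is not in the documented list,
    so typos fail early instead of reaching the script.
    """
    if command not in COMMANDS:
        raise ValueError(f"unknown command {command!r}; expected one of {COMMANDS}")
    return [sys.executable, script_path, owner, repo, command]


if __name__ == "__main__":
    # Placeholder path, matching the placeholder used in the bash example.
    print(build_argv("/path/to/skill/scripts/github_api.py", "owner", "repo", "summary"))
```

The resulting list can be passed to `subprocess.run(argv, capture_output=True, text=True, check=True)` to execute the script and capture its output.
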
**Round 2 - Discovery (3-5 web_search)**
- Get overview and identify key terms
- Find official website/repo
- Identify main players/competitors

**Round 3 - Deep Investigation (5-10 web_search + web_fetch)**
- Technical architecture details
- Timeline of key events
- Community sentiment
- Use web_fetch on valuable URLs for full content

**Round 4 - Deep Dive**
- Analyze commit history for timeline
- Review issues/PRs for feature evolution
- Check contributor activity

## Report Structure

Follow the template in `assets/report_template.md`:

1. **Metadata Block** - Date, confidence level, subject
2. **Executive Summary** - 2-3 sentence overview with key metrics
3. **Chronological Timeline** - Phased breakdown with dates
4. **Key Analysis Sections** - Topic-specific deep dives
5. **Metrics & Comparisons** - Tables, growth charts
6. **Strengths & Weaknesses** - Balanced assessment
7. **Sources** - Categorized references
8. **Confidence Assessment** - Claims by confidence level
9. **Methodology** - Research approach used

### Mermaid Diagrams

Include diagrams where helpful:

**Timeline (Gantt)**:
```mermaid
gantt
    title Project Timeline
    dateFormat YYYY-MM-DD
    section Phase 1
    Development    :2025-01-01, 2025-03-01
    section Phase 2
    Launch         :2025-03-01, 2025-04-01
```

**Architecture (Flowchart)**:
```mermaid
flowchart TD
    A[User] --> B[Coordinator]
    B --> C[Planner]
    C --> D[Research Team]
    D --> E[Reporter]
```

**Comparison (Pie/Bar)**:
```mermaid
pie title Market Share
    "Project A" : 45
    "Project B" : 30
    "Others" : 25
```

## Confidence Scoring

Assign confidence based on source quality:

| Confidence | Criteria |
|------------|----------|
| High (90%+) | Official docs, GitHub data, multiple corroborating sources |
| Medium (70-89%) | Single reliable source, recent articles |
| Low (50-69%) | Social media, unverified claims, outdated info |

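
The bands in the table map directly onto a small lookup. This is a minimal sketch: the function name is hypothetical, and because the table defines no band below 50%, the `"Unrated"` label for that range is an assumption rather than part of the skill.

```python
def confidence_label(score: float) -> str:
    """Map a 0-100 confidence score to the band names used in the table."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "High"
    if score >= 70:
        return "Medium"
    if score >= 50:
        return "Low"
    # Assumption: the table does not cover scores below 50.
    return "Unrated"


if __name__ == "__main__":
    for score in (95, 75, 55, 20):
        print(score, confidence_label(score))
```
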
## Output

Save report as: `research_{topic}_{YYYYMMDD}.md`

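
The filename pattern can be produced with a short helper. This is a sketch only: `report_filename` is a hypothetical name, and the rule for turning a topic into a filename-safe slug (lowercase ASCII letters and digits joined by underscores) is an assumption, since the pattern does not say how `{topic}` should be normalized.

```python
import re
from datetime import date


def report_filename(topic: str, on: date) -> str:
    """Build research_{topic}_{YYYYMMDD}.md for the given topic and date."""
    # Assumption: collapse every run of non-alphanumeric ASCII into "_".
    slug = re.sub(r"[^A-Za-z0-9]+", "_", topic).strip("_").lower()
    return f"research_{slug}_{on:%Y%m%d}.md"


if __name__ == "__main__":
    print(report_filename("deer-flow", date.today()))
```

Topics written entirely in non-ASCII characters would yield an empty slug under this rule, so a real implementation would need a different normalization for them.
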
### Formatting Rules

- Chinese content: Use full-width punctuation (,。:;!?)
- Technical terms: Provide Wiki/doc URL on first mention
- Tables: Use for metrics, comparisons
- Code blocks: For technical examples
- Mermaid: For architecture, timelines, flows

## Best Practices

1. **Start with official sources** - Repo, docs, company blog
2. **Verify dates from commits/PRs** - More reliable than articles
3. **Triangulate claims** - 2+ independent sources
4. **Note conflicting info** - Don't hide contradictions
5. **Distinguish fact vs opinion** - Label speculation clearly
6. **CRITICAL: Always include inline citations** - Use the `[citation:Title](URL)` format immediately after each claim from external sources
7. **Extract URLs from search results** - web_search returns {title, url, snippet}; always use the URL field
8. **Update as you go** - Don't wait until the end to synthesize

### Citation Examples

**Good - With inline citations:**
```markdown
The project gained 10,000 stars within 3 months of launch [citation:GitHub Stats](https://github.com/owner/repo).
The architecture uses LangGraph for workflow orchestration [citation:LangGraph Docs](https://langchain.com/langgraph).
```

**Bad - Without citations:**
```markdown
The project gained 10,000 stars within 3 months of launch.
The architecture uses LangGraph for workflow orchestration.
```

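
The inline format shown in the good example can be generated directly from a search result. The following is a minimal sketch, assuming each result is a dict with the `title` and `url` keys described under Best Practices; both function names are hypothetical. The plain `[Title](URL)` form is the one used for entries in the Sources section.

```python
def _escape_title(title: str) -> str:
    """Escape square brackets so a title cannot break the markdown link."""
    return title.replace("[", "\\[").replace("]", "\\]")


def inline_citation(result: dict) -> str:
    """Format a search result as an inline citation for the report body."""
    return f"[citation:{_escape_title(result['title'])}]({result['url']})"


def source_entry(result: dict) -> str:
    """Format a search result as a plain link for the Sources section."""
    return f"- [{_escape_title(result['title'])}]({result['url']})"


if __name__ == "__main__":
    result = {
        "title": "GitHub Stats",
        "url": "https://github.com/owner/repo",
        "snippet": "",
    }
    print("The project gained 10,000 stars " + inline_citation(result) + ".")
    print(source_entry(result))
```
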