deer-flow/web/messages/en.json
Willem Jiang 8d9d767051 feat(eval): add report quality evaluation module and UI integration (#776)
* feat(eval): add report quality evaluation module

Addresses issue #773 - How to evaluate generated report quality objectively.

This module provides two evaluation approaches:
1. Automated metrics (no LLM required):
   - Citation count and source diversity
   - Word count compliance per report style
   - Section structure validation
   - Image inclusion tracking

2. LLM-as-Judge evaluation:
   - Factual accuracy scoring
   - Completeness assessment
   - Coherence evaluation
   - Relevance and citation quality checks

The combined evaluator provides a final score (1-10) and letter grade (A+ to F).
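The automated half of this can be illustrated with a minimal sketch. The helper names, regular expressions, and grade cutoffs below are assumptions made for illustration; the actual contents of src/eval/metrics.py and src/eval/evaluator.py are not shown here and may differ.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns: Markdown links count as citations, Markdown images as images.
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\((https?://[^)\s]+)\)")
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]+\)")


def compute_metrics(report: str) -> dict:
    """Collect simple counts from a Markdown report without calling an LLM."""
    urls = LINK_RE.findall(report)
    domains = {urlparse(url).netloc for url in urls}
    return {
        "word_count": len(report.split()),
        "citation_count": len(urls),
        "unique_sources": len(domains),
        "image_count": len(IMAGE_RE.findall(report)),
    }


def letter_grade(score: float) -> str:
    """Map a 1-10 score to a letter grade. The cutoffs are assumptions."""
    cutoffs = [(9.5, "A+"), (9.0, "A"), (8.0, "B"), (7.0, "C"), (6.0, "D")]
    for minimum, grade in cutoffs:
        if score >= minimum:
            return grade
    return "F"
```

Counting distinct domains rather than distinct URLs is one plausible reading of "source diversity"; a report citing ten pages from one site would then count as a single source.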

Files added:
- src/eval/__init__.py
- src/eval/metrics.py
- src/eval/llm_judge.py
- src/eval/evaluator.py
- tests/unit/eval/test_metrics.py
- tests/unit/eval/test_evaluator.py

* feat(eval): integrate report evaluation with web UI

This commit adds the web UI integration for the evaluation module:

Backend:
- Add EvaluateReportRequest/Response models in src/server/eval_request.py
- Add /api/report/evaluate endpoint to src/server/app.py

Frontend:
- Add evaluateReport API function in web/src/core/api/evaluate.ts
- Create EvaluationDialog component with grade badge, metrics display,
  and optional LLM deep evaluation
- Add evaluation button (graduation cap icon) to research-block.tsx toolbar
- Add i18n translations for English and Chinese

The evaluation UI allows users to:
1. View quick metrics-only evaluation (instant)
2. Optionally run deep LLM-based evaluation for detailed analysis
3. See grade (A+ to F), score (1-10), and metric breakdown

* feat(eval): improve evaluation reliability and add LLM judge tests

- Extract MAX_REPORT_LENGTH constant in llm_judge.py for maintainability
- Add comprehensive unit tests for LLMJudge class (parse_response,
  calculate_weighted_score, evaluate with mocked LLM)
- Pass reportStyle prop to EvaluationDialog for accurate evaluation criteria
- Add researchQueries store map to reliably associate queries with research
- Add getResearchQuery helper to retrieve query by researchId
- Remove unused imports in test_metrics.py
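As a rough sketch of what the two tested helpers might do: the function names follow the commit message, but the bodies and the weights are assumptions, written here as free functions rather than LLMJudge methods. The criterion keys match the score labels in the English translations.

```python
import json
import re
from typing import Dict, Optional

# Hypothetical weights; the values used by the real LLMJudge are not shown in the source.
WEIGHTS: Dict[str, float] = {
    "factual_accuracy": 0.25,
    "completeness": 0.20,
    "coherence": 0.15,
    "relevance": 0.15,
    "citation_quality": 0.15,
    "writing_quality": 0.10,
}


def parse_response(text: str) -> Optional[dict]:
    """Extract the first-to-last brace span from an LLM reply and parse it as JSON."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None


def calculate_weighted_score(scores: Dict[str, float]) -> float:
    """Weighted mean over the known criteria present, renormalised by their total weight."""
    total_weight = sum(WEIGHTS[name] for name in scores if name in WEIGHTS)
    if total_weight == 0:
        return 0.0
    weighted = sum(WEIGHTS[name] * value for name, value in scores.items() if name in WEIGHTS)
    return round(weighted / total_weight, 2)
```

Renormalising by the weights actually present means a reply that omits one criterion is not penalised as if that criterion had scored zero; whether the real module behaves this way is not stated.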

* fix(eval): use resolveServiceURL for evaluate API endpoint

The evaluateReport function was using a relative URL '/api/report/evaluate'
which sent requests to the Next.js server instead of the FastAPI backend.
Changed to use resolveServiceURL() consistent with other API functions.

* fix: improve type accuracy and React hooks in evaluation components

- Fix get_word_count_target return type from Optional[Dict] to Dict since it always returns a value via default fallback
- Fix useEffect dependency issue in EvaluationDialog using useRef to prevent unwanted re-evaluations
- Add aria-label to GradeBadge for screen reader accessibility
2025-12-25 21:55:48 +08:00
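
The return-type fix for get_word_count_target noted in the commit message can be illustrated as follows. This is a sketch only: the style keys and word ranges are hypothetical, and the point is that a lookup with a default fallback never returns None, so `Dict` is the accurate annotation rather than `Optional[Dict]`.

```python
from typing import Dict

# Hypothetical targets per report style; the real values are not shown in the source.
WORD_COUNT_TARGETS: Dict[str, Dict[str, int]] = {
    "academic": {"min": 3000, "max": 8000},
    "news": {"min": 500, "max": 1500},
    "social_media": {"min": 100, "max": 500},
}
DEFAULT_TARGET: Dict[str, int] = {"min": 1000, "max": 5000}


def get_word_count_target(report_style: str) -> Dict[str, int]:
    """Return the target range for a style, falling back to the default for unknown styles."""
    return WORD_COUNT_TARGETS.get(report_style.lower(), DEFAULT_TARGET)
```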

{
"common": {
"cancel": "Cancel",
"save": "Save",
"settings": "Settings",
"getStarted": "Get Started",
"learnMore": "Learn More",
"starOnGitHub": "Star on GitHub",
"send": "Send",
"stop": "Stop",
"linkNotReliable": "This link might be a hallucination from the AI model and may not be reliable.",
"noResult": "No result"
},
"messageInput": {
"placeholder": "What can I do for you?",
"placeholderWithRag": "What can I do for you? \nYou may refer to RAG resources by using @."
},
"header": {
"title": "DeerFlow"
},
"hero": {
"title": "Deep Research",
"subtitle": "at Your Fingertips",
"description": "Meet DeerFlow, your personal Deep Research assistant. With powerful tools like search engines, web crawlers, Python and MCP services, it delivers instant insights, comprehensive reports, and even captivating podcasts.",
"footnote": "* DEER stands for Deep Exploration and Efficient Research."
},
"settings": {
"title": "DeerFlow Settings",
"description": "Manage your DeerFlow settings here.",
"addServers": "Add Servers",
"cancel": "Cancel",
"addNewMCPServers": "Add New MCP Servers",
"mcpConfigDescription": "DeerFlow uses the standard JSON MCP config to create a new server.",
"pasteConfigBelow": "Paste your config below and click \"Add\" to add new servers.",
"rag": {
"title": "Resources",
"description": "Manage your knowledge base resources here. Upload markdown or text files to be indexed for retrieval.",
"upload": "Upload",
"uploading": "Uploading...",
"uploadSuccess": "File uploaded successfully",
"uploadFailed": "Failed to upload file",
"emptyFile": "Cannot upload an empty file",
"loading": "Loading resources...",
"noResources": "No resources found. Upload a file to get started."
},
"add": "Add",
"general": {
"title": "General",
"autoAcceptPlan": "Allow automatic acceptance of plans",
"enableClarification": "Allow clarification",
"maxClarificationRounds": "Max clarification rounds",
"maxClarificationRoundsDescription": "Maximum number of clarification rounds when clarification is enabled (default: 3).",
"maxPlanIterations": "Max plan iterations",
"maxPlanIterationsDescription": "Set to 1 for single-step planning. Set to 2 or more to enable re-planning.",
"maxStepsOfPlan": "Max steps of a research plan",
"maxStepsDescription": "By default, each research plan has 3 steps.",
"maxSearchResults": "Max search results",
"maxSearchResultsDescription": "By default, each search step has 3 results.",
"enableWebSearch": "Enable web search",
"enableWebSearchDescription": "When disabled, only local RAG knowledge base will be used. Useful for environments without internet access."
},
"mcp": {
"title": "MCP Servers",
"description": "The Model Context Protocol boosts DeerFlow by integrating external tools for tasks like private domain searches, web browsing, food ordering, and more. Click here to",
"learnMore": "learn more about MCP.",
"enableDisable": "Enable/disable server",
"deleteServer": "Delete server",
"editServer": "Edit server",
"refreshServer": "Refresh server",
"editServerDescription": "Edit the MCP server configuration",
"editServerNote": "Update the server configuration in JSON format",
"disabled": "Disabled",
"new": "New",
"invalidJson": "Invalid JSON format",
"validationFailed": "Validation failed",
"missingServerName": "Missing server name"
},
"about": {
"title": "About"
},
"reportStyle": {
"writingStyle": "Writing Style",
"chooseTitle": "Choose Writing Style",
"chooseDesc": "Select the writing style for your research reports. Different styles are optimized for different audiences and purposes.",
"academic": "Academic",
"academicDesc": "Formal, objective, and analytical with precise terminology",
"popularScience": "Popular Science",
"popularScienceDesc": "Engaging and accessible for a general audience",
"news": "News",
"newsDesc": "Factual, concise, and impartial journalistic style",
"socialMedia": "Social Media",
"socialMediaDesc": "Concise, attention-grabbing, and shareable",
"strategicInvestment": "Strategic Investment",
"strategicInvestmentDesc": "Deep, comprehensive analysis for strategic investment institutions with actionable insights"
}
},
"footer": {
"quote": "Originated from Open Source, give back to Open Source.",
"license": "Licensed under MIT License",
"copyright": "DeerFlow"
},
"chat": {
"page": {
"loading": "Loading DeerFlow...",
"welcomeUser": "Welcome, {username}",
"starOnGitHub": "Star DeerFlow on GitHub"
},
"welcome": {
"greeting": "👋 Hello, there!",
"description": "Welcome to 🦌 DeerFlow, a deep research assistant built on cutting-edge language models that helps you search the web, browse information, and handle complex tasks."
},
"conversationStarters": [
"How many times taller is the Eiffel Tower than the tallest building in the world?",
"How many years does an average Tesla battery last compared to a gasoline engine?",
"How many liters of water are required to produce 1 kg of beef?",
"How many times faster is the speed of light compared to the speed of sound?"
],
"inputBox": {
"deepThinking": "Deep Thinking",
"deepThinkingTooltip": {
"title": "Deep Thinking Mode: {status}",
"description": "When enabled, DeerFlow will use a reasoning model ({model}) to generate more thoughtful plans."
},
"investigation": "Investigation",
"investigationTooltip": {
"title": "Investigation Mode: {status}",
"description": "When enabled, DeerFlow will perform a quick search before planning. This is useful for research related to ongoing events and news."
},
"enhancePrompt": "Enhance prompt with AI",
"on": "On",
"off": "Off"
},
"research": {
"deepResearch": "Deep Research",
"researching": "Researching...",
"generatingReport": "Generating report...",
"reportGenerated": "Report generated",
"open": "Open",
"close": "Close",
"deepThinking": "Deep Thinking",
"report": "Report",
"activities": "Activities",
"generatePodcast": "Generate podcast",
"edit": "Edit",
"copy": "Copy",
"downloadReport": "Download report",
"downloadMarkdown": "Markdown (.md)",
"downloadHTML": "HTML (.html)",
"downloadPDF": "PDF (.pdf)",
"downloadWord": "Word (.docx)",
"downloadImage": "Image (.png)",
"exportFailed": "Export failed, please try again",
"evaluateReport": "Evaluate report quality",
"searchingFor": "Searching for",
"reading": "Reading",
"runningPythonCode": "Running Python code",
"errorExecutingCode": "Error when executing the above code",
"executionOutput": "Execution output",
"retrievingDocuments": "Retrieving documents from RAG",
"running": "Running",
"generatingPodcast": "Generating podcast...",
"nowPlayingPodcast": "Now playing podcast...",
"podcast": "Podcast",
"errorGeneratingPodcast": "Error when generating podcast. Please try again.",
"downloadPodcast": "Download podcast"
},
"evaluation": {
"title": "Report Quality Evaluation",
"description": "Evaluate your report using automated metrics and AI analysis.",
"evaluating": "Evaluating report...",
"analyzing": "Running deep analysis...",
"overallScore": "Overall Score",
"metrics": "Report Metrics",
"wordCount": "Word Count",
"citations": "Citations",
"sources": "Unique Sources",
"images": "Images",
"sectionCoverage": "Section Coverage",
"detailedAnalysis": "Detailed Analysis",
"deepEvaluation": "Deep Evaluation (AI)",
"strengths": "Strengths",
"weaknesses": "Areas for Improvement",
"scores": {
"factual_accuracy": "Factual Accuracy",
"completeness": "Completeness",
"coherence": "Coherence",
"relevance": "Relevance",
"citation_quality": "Citation Quality",
"writing_quality": "Writing Quality"
}
},
"messages": {
"replaying": "Replaying",
"replayDescription": "DeerFlow is now replaying the conversation...",
"replayHasStopped": "The replay has been stopped.",
"replayModeDescription": "You're now in DeerFlow's replay mode. Click the \"Play\" button on the right to start.",
"play": "Play",
"fastForward": "Fast Forward",
"demoNotice": "* This site is for demo purposes only. If you want to try your own question, please",
"clickHere": "click here",
"cloneLocally": "to clone it locally and run it."
},
"multiAgent": {
"moveToPrevious": "Move to the previous step",
"playPause": "Play / Pause",
"moveToNext": "Move to the next step",
"toggleFullscreen": "Toggle fullscreen"
}
},
"landing": {
"caseStudies": {
"title": "Case Studies",
"description": "See DeerFlow in action through replays.",
"clickToWatch": "Click to watch replay",
"cases": [
{
"title": "How tall is the Eiffel Tower compared to the tallest building?",
"description": "The research compares the heights and global significance of the Eiffel Tower and Burj Khalifa, and uses Python code to calculate the multiples."
},
{
"title": "What are the top trending repositories on GitHub?",
"description": "The research utilized MCP services to identify the most popular GitHub repositories and documented them in detail using search engines."
},
{
"title": "Write an article about Nanjing's traditional dishes",
"description": "The study vividly showcases Nanjing's famous dishes through rich content and imagery, uncovering their hidden histories and cultural significance."
},
{
"title": "How to decorate a small rental apartment?",
"description": "The study provides readers with practical and straightforward methods for decorating apartments, accompanied by inspiring images."
},
{
"title": "Introduce the movie 'Léon: The Professional'",
"description": "The research provides a comprehensive introduction to the movie 'Léon: The Professional', including its plot, characters, and themes."
},
{
"title": "How do you view the takeaway war in China? (in Chinese)",
"description": "The research analyzes the intensifying competition between JD and Meituan, highlighting their strategies, technological innovations, and challenges."
},
{
"title": "Are ultra-processed foods linked to health?",
"description": "The research examines the health risks of rising ultra-processed food consumption, urging more research on long-term effects and individual differences."
},
{
"title": "Write an article on \"Would you insure your AI twin?\"",
"description": "The research explores the concept of insuring AI twins, highlighting their benefits, risks, ethical considerations, and the evolving regulatory landscape."
}
]
},
"coreFeatures": {
"title": "Core Features",
"description": "Find out what makes DeerFlow effective.",
"features": [
{
"name": "Dive Deeper and Reach Wider",
"description": "Unlock deeper insights with advanced tools. Our powerful search, crawling, and Python tools gather comprehensive data, delivering in-depth reports to enhance your study."
},
{
"name": "Human-in-the-loop",
"description": "Refine your research plan or adjust focus areas, all through simple natural language."
},
{
"name": "Lang Stack",
"description": "Build with confidence using the LangChain and LangGraph frameworks."
},
{
"name": "MCP Integrations",
"description": "Supercharge your research workflow and expand your toolkit with seamless MCP integrations."
},
{
"name": "Podcast Generation",
"description": "Instantly generate podcasts from reports. Perfect for on-the-go learning or sharing findings effortlessly."
}
]
},
"multiAgent": {
"title": "Multi-Agent Architecture",
"description": "Experience the agent teamwork with our Supervisor + Handoffs design pattern."
},
"joinCommunity": {
"title": "Join the DeerFlow Community",
"description": "Contribute brilliant ideas to shape the future of DeerFlow. Collaborate, innovate, and make an impact.",
"contributeNow": "Contribute Now"
}
}
}
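
The catalog above is a nested object whose leaf strings may carry `{placeholder}` tokens such as `{username}`, `{status}`, and `{model}`. The sketch below shows one way to resolve a dotted key and fill those tokens. It is an illustration only: the `lookup` helper is hypothetical, and the web frontend presumably relies on its own i18n library rather than anything like this.

```python
import json
from typing import Any

# A small excerpt of the catalog, inlined so the example is self-contained.
EXCERPT = json.loads('{"chat": {"page": {"welcomeUser": "Welcome, {username}"}}}')


def lookup(messages: dict, dotted_key: str, **params: Any) -> str:
    """Walk the nested catalog by dotted key, then fill {name} placeholders."""
    node: Any = messages
    for part in dotted_key.split("."):
        node = node[part]  # raises KeyError if the key path does not exist
    if not isinstance(node, str):
        raise TypeError(f"{dotted_key!r} does not point at a message string")
    return node.format(**params)


print(lookup(EXCERPT, "chat.page.welcomeUser", username="Ada"))
```

A missing placeholder value raises `KeyError` from `str.format`, which surfaces translation or call-site mismatches early instead of rendering a raw `{username}` to the user.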