feat(eval): add report quality evaluation module and UI integration (#776)

* feat(eval): add report quality evaluation module

Addresses issue #773 - How to evaluate generated report quality objectively.

This module provides two evaluation approaches:
1. Automated metrics (no LLM required):
   - Citation count and source diversity
   - Word count compliance per report style
   - Section structure validation
   - Image inclusion tracking

2. LLM-as-Judge evaluation:
   - Factual accuracy scoring
   - Completeness assessment
   - Coherence evaluation
   - Relevance and citation quality checks

The combined evaluator provides a final score (1-10) and letter grade (A+ to F).
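
For orientation, a minimal usage sketch of the combined evaluator, based only on the public API exercised by the unit tests below (ReportEvaluator, evaluate_metrics_only, evaluate); the report text is illustrative:

```python
import asyncio

from src.eval.evaluator import ReportEvaluator

report = """# Sample Report
## Key Points
- Finding one
## Overview
Short overview with a [source](https://example.com).
## Key Citations
- [Source](https://example.com)
"""

evaluator = ReportEvaluator(use_llm=False)

# Instant, metrics-only pass: returns a dict with "metrics", "score", and "grade".
quick = evaluator.evaluate_metrics_only(report)

# Full pass (optionally LLM-backed): returns a CombinedEvaluation with
# final_score, grade, summary, and a to_dict() helper for serialization.
combined = asyncio.run(evaluator.evaluate(report, "original research query"))
print(quick["grade"], combined.final_score)
```

In the LLM-judged path, per-criterion scores are combined with weights of 0.25 (factual accuracy), 0.20 (completeness), 0.20 (coherence), 0.15 (relevance), 0.10 (citation quality), and 0.10 (writing quality), as exercised by the weighted-score tests below.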

Files added:
- src/eval/__init__.py
- src/eval/metrics.py
- src/eval/llm_judge.py
- src/eval/evaluator.py
- tests/unit/eval/test_metrics.py
- tests/unit/eval/test_evaluator.py

* feat(eval): integrate report evaluation with web UI

This commit adds the web UI integration for the evaluation module:

Backend:
- Add EvaluateReportRequest/Response models in src/server/eval_request.py
- Add /api/report/evaluate endpoint to src/server/app.py

Frontend:
- Add evaluateReport API function in web/src/core/api/evaluate.ts
- Create EvaluationDialog component with grade badge, metrics display,
  and optional LLM deep evaluation
- Add evaluation button (graduation cap icon) to research-block.tsx toolbar
- Add i18n translations for English and Chinese

The evaluation UI allows users to:
1. View a quick, metrics-only evaluation (instant)
2. Optionally run a deep LLM-based evaluation for detailed analysis
3. See the grade (A+ to F), score (1-10), and a metric breakdown
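
For reference, a rough sketch of the FastAPI wiring; only the endpoint path (/api/report/evaluate) and the model names (EvaluateReportRequest/Response) come from this commit, so the field names and defaults below are assumptions rather than the shipped implementation:

```python
# Sketch only -- request field names and defaults are assumptions, not the shipped models.
from fastapi import FastAPI
from pydantic import BaseModel

from src.eval.evaluator import ReportEvaluator

app = FastAPI()


class EvaluateReportRequest(BaseModel):
    report_content: str             # assumed field name
    query: str = ""                 # assumed field name
    report_style: str = "academic"  # assumed field name and default
    use_llm: bool = False           # assumed field name


@app.post("/api/report/evaluate")
async def evaluate_report(request: EvaluateReportRequest):
    evaluator = ReportEvaluator(use_llm=request.use_llm)
    if request.use_llm:
        result = await evaluator.evaluate(request.report_content, request.query)
        return result.to_dict()
    # Metrics-only path is synchronous and already returns a plain dict.
    return evaluator.evaluate_metrics_only(request.report_content)
```

The frontend's evaluateReport function in web/src/core/api/evaluate.ts posts to this endpoint and feeds the response into EvaluationDialog.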

* feat(eval): improve evaluation reliability and add LLM judge tests

- Extract a MAX_REPORT_LENGTH constant in llm_judge.py for maintainability (see the sketch after this list)
- Add comprehensive unit tests for LLMJudge class (parse_response,
  calculate_weighted_score, evaluate with mocked LLM)
- Pass reportStyle prop to EvaluationDialog for accurate evaluation criteria
- Add researchQueries store map to reliably associate queries with research
- Add getResearchQuery helper to retrieve query by researchId
- Remove unused imports in test_metrics.py
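
A small sketch of the kind of guard the extracted constant enables; the numeric value is an assumption (the real constant is defined in src/eval/llm_judge.py) and the helper name is hypothetical:

```python
# Hypothetical helper; MAX_REPORT_LENGTH's real value lives in src/eval/llm_judge.py.
MAX_REPORT_LENGTH = 30_000  # assumed value, for illustration only


def truncate_report(report: str) -> str:
    """Keep the report within the judge prompt's length budget."""
    if len(report) > MAX_REPORT_LENGTH:
        return report[:MAX_REPORT_LENGTH]
    return report
```

The new test_evaluation_truncates_long_reports test below exercises this behaviour through LLMJudge.evaluate with a mocked LLM.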

* fix(eval): use resolveServiceURL for evaluate API endpoint

The evaluateReport function was using a relative URL '/api/report/evaluate',
which sent requests to the Next.js server instead of the FastAPI backend.
Changed it to use resolveServiceURL(), consistent with the other API functions.

* fix: improve type accuracy and React hooks in evaluation components

- Fix the get_word_count_target return type from Optional[Dict] to Dict, since it always returns a value via the default fallback (see the sketch after this list)
- Fix useEffect dependency issue in EvaluationDialog using useRef to prevent unwanted re-evaluations
- Add aria-label to GradeBadge for screen reader accessibility
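
For the return-type fix, a sketch of the adjusted signature consistent with the word-count-target tests below; the targets table is inferred from those tests and may not match the full implementation:

```python
# Sketch consistent with tests/unit/eval/test_metrics.py; the real table lives in src/eval/metrics.py.
from typing import Dict

WORD_COUNT_TARGETS: Dict[str, Dict[str, int]] = {
    "strategic_investment": {"min": 10000, "max": 15000},
    "news": {"min": 800, "max": 2000},
}

DEFAULT_TARGET: Dict[str, int] = {"min": 1000, "max": 5000}


def get_word_count_target(report_style: str) -> Dict[str, int]:
    # Always returns a value thanks to the default fallback,
    # so the return type no longer needs to be Optional.
    return WORD_COUNT_TARGETS.get(report_style, DEFAULT_TARGET)
```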

Author: Willem Jiang
Date: 2025-12-25 21:55:48 +08:00
Committed by: GitHub
Parent: 84a7f7815c
Commit: 8d9d767051

17 changed files with 2103 additions and 2 deletions


@@ -0,0 +1,2 @@
# Copyright (c) 2025 Bytedance Ltd. and/or its affiliates
# SPDX-License-Identifier: MIT


@@ -0,0 +1,489 @@
# Copyright (c) 2025 Bytedance Ltd. and/or its affiliates
# SPDX-License-Identifier: MIT
"""Unit tests for the combined report evaluator."""
import json
from unittest.mock import AsyncMock, MagicMock
import pytest
from src.eval.evaluator import CombinedEvaluation, ReportEvaluator, score_to_grade
from src.eval.llm_judge import (
    EVALUATION_CRITERIA,
    MAX_REPORT_LENGTH,
    EvaluationResult,
    LLMJudge,
)
from src.eval.metrics import ReportMetrics


class TestScoreToGrade:
    """Tests for score to grade conversion."""

    def test_excellent_scores(self):
        assert score_to_grade(9.5) == "A+"
        assert score_to_grade(9.0) == "A+"
        assert score_to_grade(8.7) == "A"
        assert score_to_grade(8.5) == "A"
        assert score_to_grade(8.2) == "A-"

    def test_good_scores(self):
        assert score_to_grade(7.8) == "B+"
        assert score_to_grade(7.5) == "B+"
        assert score_to_grade(7.2) == "B"
        assert score_to_grade(7.0) == "B"
        assert score_to_grade(6.7) == "B-"

    def test_average_scores(self):
        assert score_to_grade(6.2) == "C+"
        assert score_to_grade(5.8) == "C"
        assert score_to_grade(5.5) == "C"
        assert score_to_grade(5.2) == "C-"

    def test_poor_scores(self):
        assert score_to_grade(4.5) == "D"
        assert score_to_grade(4.0) == "D"
        assert score_to_grade(3.0) == "F"
        assert score_to_grade(1.0) == "F"


class TestReportEvaluator:
    """Tests for ReportEvaluator class."""

    @pytest.fixture
    def evaluator(self):
        """Create evaluator without LLM for metrics-only tests."""
        return ReportEvaluator(use_llm=False)

    @pytest.fixture
    def sample_report(self):
        """Sample report for testing."""
        return """
# Comprehensive Research Report
## Key Points
- Important finding number one with significant implications
- Critical discovery that changes our understanding
- Key insight that provides actionable recommendations
- Notable observation from the research data
## Overview
This report presents a comprehensive analysis of the research topic.
The findings are based on extensive data collection and analysis.
## Detailed Analysis
### Section 1: Background
The background of this research involves multiple factors.
[Source 1](https://example.com/source1) provides foundational context.
### Section 2: Methodology
Our methodology follows established research practices.
[Source 2](https://research.org/methods) outlines the approach.
### Section 3: Findings
The key findings include several important discoveries.
![Research Data](https://example.com/chart.png)
[Source 3](https://academic.edu/paper) supports these conclusions.
## Key Citations
- [Example Source](https://example.com/source1)
- [Research Methods](https://research.org/methods)
- [Academic Paper](https://academic.edu/paper)
- [Additional Reference](https://reference.com/doc)
"""

    def test_evaluate_metrics_only(self, evaluator, sample_report):
        """Test metrics-only evaluation."""
        result = evaluator.evaluate_metrics_only(sample_report)
        assert "metrics" in result
        assert "score" in result
        assert "grade" in result
        assert result["score"] > 0
        assert result["grade"] in ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D", "F"]

    def test_evaluate_metrics_only_structure(self, evaluator, sample_report):
        """Test that metrics contain expected fields."""
        result = evaluator.evaluate_metrics_only(sample_report)
        metrics = result["metrics"]
        assert "word_count" in metrics
        assert "citation_count" in metrics
        assert "unique_sources" in metrics
        assert "image_count" in metrics
        assert "section_coverage_score" in metrics

    def test_evaluate_minimal_report(self, evaluator):
        """Test evaluation of minimal report."""
        minimal_report = "Just some text."
        result = evaluator.evaluate_metrics_only(minimal_report)
        assert result["score"] < 5.0
        assert result["grade"] in ["D", "F"]

    def test_metrics_score_calculation(self, evaluator):
        """Test that metrics score is calculated correctly."""
        good_report = """
# Title
## Key Points
- Point 1
- Point 2
## Overview
Overview content here.
## Detailed Analysis
Analysis with [cite](https://a.com) and [cite2](https://b.com)
and [cite3](https://c.com) and more [refs](https://d.com).
![Image](https://img.com/1.png)
## Key Citations
- [A](https://a.com)
- [B](https://b.com)
"""
        result = evaluator.evaluate_metrics_only(good_report)
        assert result["score"] > 5.0

    def test_combined_evaluation_to_dict(self):
        """Test CombinedEvaluation to_dict method."""
        metrics = ReportMetrics(
            word_count=1000,
            citation_count=5,
            unique_sources=3,
        )
        evaluation = CombinedEvaluation(
            metrics=metrics,
            llm_evaluation=None,
            final_score=7.5,
            grade="B+",
            summary="Test summary",
        )
        result = evaluation.to_dict()
        assert result["final_score"] == 7.5
        assert result["grade"] == "B+"
        assert result["metrics"]["word_count"] == 1000


class TestReportEvaluatorIntegration:
    """Integration tests for evaluator (may require LLM)."""

    @pytest.mark.asyncio
    async def test_full_evaluation_without_llm(self):
        """Test full evaluation with LLM disabled."""
        evaluator = ReportEvaluator(use_llm=False)
        report = """
# Test Report
## Key Points
- Key point 1
## Overview
Test overview.
## Key Citations
- [Test](https://test.com)
"""
        result = await evaluator.evaluate(report, "test query")
        assert isinstance(result, CombinedEvaluation)
        assert result.final_score > 0
        assert result.grade is not None
        assert result.summary is not None
        assert result.llm_evaluation is None


class TestLLMJudgeParseResponse:
    """Tests for LLMJudge._parse_response method."""

    @pytest.fixture
    def judge(self):
        """Create LLMJudge with mock LLM."""
        return LLMJudge(llm=MagicMock())

    @pytest.fixture
    def valid_response_data(self):
        """Valid evaluation response data."""
        return {
            "scores": {
                "factual_accuracy": 8,
                "completeness": 7,
                "coherence": 9,
                "relevance": 8,
                "citation_quality": 6,
                "writing_quality": 8,
            },
            "overall_score": 8,
            "strengths": ["Well researched", "Clear structure"],
            "weaknesses": ["Could use more citations"],
            "suggestions": ["Add more sources"],
        }

    def test_parse_valid_json(self, judge, valid_response_data):
        """Test parsing valid JSON response."""
        response = json.dumps(valid_response_data)
        result = judge._parse_response(response)
        assert result["scores"]["factual_accuracy"] == 8
        assert result["overall_score"] == 8
        assert "Well researched" in result["strengths"]

    def test_parse_json_in_markdown_block(self, judge, valid_response_data):
        """Test parsing JSON wrapped in markdown code block."""
        response = f"```json\n{json.dumps(valid_response_data)}\n```"
        result = judge._parse_response(response)
        assert result["scores"]["coherence"] == 9
        assert result["overall_score"] == 8

    def test_parse_json_in_generic_code_block(self, judge, valid_response_data):
        """Test parsing JSON in generic code block."""
        response = f"```\n{json.dumps(valid_response_data)}\n```"
        result = judge._parse_response(response)
        assert result["scores"]["relevance"] == 8

    def test_parse_malformed_json_returns_defaults(self, judge):
        """Test that malformed JSON returns default scores."""
        response = "This is not valid JSON at all"
        result = judge._parse_response(response)
        assert result["scores"]["factual_accuracy"] == 5
        assert result["scores"]["completeness"] == 5
        assert result["overall_score"] == 5
        assert "Unable to parse evaluation" in result["strengths"]
        assert "Evaluation parsing failed" in result["weaknesses"]

    def test_parse_incomplete_json(self, judge):
        """Test parsing incomplete JSON."""
        response = '{"scores": {"factual_accuracy": 8}'  # Missing closing braces
        result = judge._parse_response(response)
        # Should return defaults due to parse failure
        assert result["overall_score"] == 5

    def test_parse_json_with_extra_text(self, judge, valid_response_data):
        """Test parsing JSON with surrounding text."""
        response = f"Here is my evaluation:\n```json\n{json.dumps(valid_response_data)}\n```\nHope this helps!"
        result = judge._parse_response(response)
        assert result["scores"]["factual_accuracy"] == 8


class TestLLMJudgeCalculateWeightedScore:
    """Tests for LLMJudge._calculate_weighted_score method."""

    @pytest.fixture
    def judge(self):
        """Create LLMJudge with mock LLM."""
        return LLMJudge(llm=MagicMock())

    def test_calculate_with_all_scores(self, judge):
        """Test weighted score calculation with all criteria."""
        scores = {
            "factual_accuracy": 10,  # weight 0.25
            "completeness": 10,  # weight 0.20
            "coherence": 10,  # weight 0.20
            "relevance": 10,  # weight 0.15
            "citation_quality": 10,  # weight 0.10
            "writing_quality": 10,  # weight 0.10
        }
        result = judge._calculate_weighted_score(scores)
        assert result == 10.0

    def test_calculate_with_varied_scores(self, judge):
        """Test weighted score with varied scores."""
        scores = {
            "factual_accuracy": 8,  # 8 * 0.25 = 2.0
            "completeness": 6,  # 6 * 0.20 = 1.2
            "coherence": 7,  # 7 * 0.20 = 1.4
            "relevance": 9,  # 9 * 0.15 = 1.35
            "citation_quality": 5,  # 5 * 0.10 = 0.5
            "writing_quality": 8,  # 8 * 0.10 = 0.8
        }
        # Total: 7.25
        result = judge._calculate_weighted_score(scores)
        assert result == 7.25

    def test_calculate_with_partial_scores(self, judge):
        """Test weighted score with only some criteria."""
        scores = {
            "factual_accuracy": 8,  # weight 0.25
            "completeness": 6,  # weight 0.20
        }
        # (8 * 0.25 + 6 * 0.20) / (0.25 + 0.20) = 3.2 / 0.45 = 7.11
        result = judge._calculate_weighted_score(scores)
        assert abs(result - 7.11) < 0.01

    def test_calculate_with_unknown_criteria(self, judge):
        """Test that unknown criteria are ignored."""
        scores = {
            "factual_accuracy": 10,
            "unknown_criterion": 1,  # Should be ignored
        }
        result = judge._calculate_weighted_score(scores)
        assert result == 10.0

    def test_calculate_with_empty_scores(self, judge):
        """Test with empty scores dict."""
        result = judge._calculate_weighted_score({})
        assert result == 0.0

    def test_weights_sum_to_one(self):
        """Verify that all criteria weights sum to 1.0."""
        total_weight = sum(c["weight"] for c in EVALUATION_CRITERIA.values())
        assert abs(total_weight - 1.0) < 0.001


class TestLLMJudgeEvaluate:
    """Tests for LLMJudge.evaluate method with mocked LLM."""

    @pytest.fixture
    def valid_llm_response(self):
        """Create a valid LLM response."""
        return json.dumps(
            {
                "scores": {
                    "factual_accuracy": 8,
                    "completeness": 7,
                    "coherence": 9,
                    "relevance": 8,
                    "citation_quality": 7,
                    "writing_quality": 8,
                },
                "overall_score": 8,
                "strengths": ["Comprehensive coverage", "Well structured"],
                "weaknesses": ["Some claims need more support"],
                "suggestions": ["Add more academic sources"],
            }
        )

    @pytest.mark.asyncio
    async def test_successful_evaluation(self, valid_llm_response):
        """Test successful LLM evaluation."""
        mock_llm = AsyncMock()
        mock_response = MagicMock()
        mock_response.content = valid_llm_response
        mock_llm.ainvoke.return_value = mock_response
        judge = LLMJudge(llm=mock_llm)
        result = await judge.evaluate("Test report", "Test query")
        assert isinstance(result, EvaluationResult)
        assert result.scores["factual_accuracy"] == 8
        assert result.overall_score == 8
        assert result.weighted_score > 0
        assert "Comprehensive coverage" in result.strengths
        assert result.raw_response == valid_llm_response

    @pytest.mark.asyncio
    async def test_evaluation_with_llm_failure(self):
        """Test that LLM failures are handled gracefully."""
        mock_llm = AsyncMock()
        mock_llm.ainvoke.side_effect = Exception("LLM service unavailable")
        judge = LLMJudge(llm=mock_llm)
        result = await judge.evaluate("Test report", "Test query")
        assert isinstance(result, EvaluationResult)
        assert result.overall_score == 0
        assert result.weighted_score == 0
        assert all(score == 0 for score in result.scores.values())
        assert any("failed" in w.lower() for w in result.weaknesses)

    @pytest.mark.asyncio
    async def test_evaluation_with_malformed_response(self):
        """Test handling of malformed LLM response."""
        mock_llm = AsyncMock()
        mock_response = MagicMock()
        mock_response.content = "I cannot evaluate this report properly."
        mock_llm.ainvoke.return_value = mock_response
        judge = LLMJudge(llm=mock_llm)
        result = await judge.evaluate("Test report", "Test query")
        # Should return default scores
        assert result.scores["factual_accuracy"] == 5
        assert result.overall_score == 5

    @pytest.mark.asyncio
    async def test_evaluation_passes_report_style(self):
        """Test that report_style is passed to LLM."""
        mock_llm = AsyncMock()
        mock_response = MagicMock()
        mock_response.content = json.dumps(
            {
                "scores": {k: 7 for k in EVALUATION_CRITERIA.keys()},
                "overall_score": 7,
                "strengths": [],
                "weaknesses": [],
                "suggestions": [],
            }
        )
        mock_llm.ainvoke.return_value = mock_response
        judge = LLMJudge(llm=mock_llm)
        await judge.evaluate("Test report", "Test query", report_style="academic")
        # Verify the prompt contains the report style
        call_args = mock_llm.ainvoke.call_args
        messages = call_args[0][0]
        user_message_content = messages[1].content
        assert "academic" in user_message_content

    @pytest.mark.asyncio
    async def test_evaluation_truncates_long_reports(self):
        """Test that very long reports are truncated."""
        mock_llm = AsyncMock()
        mock_response = MagicMock()
        mock_response.content = json.dumps(
            {
                "scores": {k: 7 for k in EVALUATION_CRITERIA.keys()},
                "overall_score": 7,
                "strengths": [],
                "weaknesses": [],
                "suggestions": [],
            }
        )
        mock_llm.ainvoke.return_value = mock_response
        judge = LLMJudge(llm=mock_llm)
        long_report = "x" * (MAX_REPORT_LENGTH + 5000)
        await judge.evaluate(long_report, "Test query")
        call_args = mock_llm.ainvoke.call_args
        messages = call_args[0][0]
        user_message_content = messages[1].content
        # The report content in the message should be truncated to MAX_REPORT_LENGTH
        assert len(user_message_content) < len(long_report) + 500


class TestEvaluationResult:
    """Tests for EvaluationResult dataclass."""

    def test_to_dict(self):
        """Test EvaluationResult.to_dict method."""
        result = EvaluationResult(
            scores={"factual_accuracy": 8, "completeness": 7},
            overall_score=7.5,
            weighted_score=7.6,
            strengths=["Good research"],
            weaknesses=["Needs more detail"],
            suggestions=["Expand section 2"],
            raw_response="test response",
        )
        d = result.to_dict()
        assert d["scores"]["factual_accuracy"] == 8
        assert d["overall_score"] == 7.5
        assert d["weighted_score"] == 7.6
        assert "Good research" in d["strengths"]
        # raw_response should not be in dict
        assert "raw_response" not in d


@@ -0,0 +1,207 @@
# Copyright (c) 2025 Bytedance Ltd. and/or its affiliates
# SPDX-License-Identifier: MIT
"""Unit tests for report evaluation metrics."""
from src.eval.metrics import (
    compute_metrics,
    count_citations,
    count_images,
    count_words,
    detect_sections,
    extract_domains,
    get_word_count_target,
)


class TestCountWords:
    """Tests for word counting function."""

    def test_english_words(self):
        text = "This is a simple test sentence."
        assert count_words(text) == 6

    def test_chinese_characters(self):
        text = "这是一个测试"
        assert count_words(text) == 6

    def test_mixed_content(self):
        text = "Hello 你好 World 世界"
        assert count_words(text) == 4 + 2  # 4 Chinese characters + 2 English words

    def test_empty_string(self):
        assert count_words("") == 0


class TestCountCitations:
    """Tests for citation counting function."""

    def test_markdown_citations(self):
        text = """
Check out [Google](https://google.com) and [GitHub](https://github.com).
"""
        assert count_citations(text) == 2

    def test_no_citations(self):
        text = "This is plain text without any links."
        assert count_citations(text) == 0

    def test_invalid_urls(self):
        text = "[Link](not-a-url) [Another](ftp://ftp.example.com)"
        assert count_citations(text) == 0

    def test_complex_urls(self):
        text = "[Article](https://example.com/path/to/article?id=123&ref=test)"
        assert count_citations(text) == 1


class TestExtractDomains:
    """Tests for domain extraction function."""

    def test_extract_multiple_domains(self):
        text = """
https://google.com/search
https://www.github.com/user/repo
https://docs.python.org/3/
"""
        domains = extract_domains(text)
        assert len(domains) == 3
        assert "google.com" in domains
        assert "github.com" in domains
        assert "docs.python.org" in domains

    def test_deduplicate_domains(self):
        text = """
https://example.com/page1
https://example.com/page2
https://www.example.com/page3
"""
        domains = extract_domains(text)
        assert len(domains) == 1
        assert "example.com" in domains

    def test_no_urls(self):
        text = "Plain text without URLs"
        assert extract_domains(text) == []


class TestCountImages:
    """Tests for image counting function."""

    def test_markdown_images(self):
        text = """
![Alt text](https://example.com/image1.png)
![](https://example.com/image2.jpg)
"""
        assert count_images(text) == 2

    def test_no_images(self):
        text = "Text without images [link](url)"
        assert count_images(text) == 0


class TestDetectSections:
    """Tests for section detection function."""

    def test_detect_title(self):
        text = "# My Report Title\n\nSome content here."
        sections = detect_sections(text)
        assert sections.get("title") is True

    def test_detect_key_points(self):
        text = "## Key Points\n- Point 1\n- Point 2"
        sections = detect_sections(text)
        assert sections.get("key_points") is True

    def test_detect_chinese_sections(self):
        text = """# 报告标题
## 要点
- 要点1
## 概述
这是概述内容
"""
        sections = detect_sections(text)
        assert sections.get("title") is True
        assert sections.get("key_points") is True
        assert sections.get("overview") is True

    def test_detect_citations_section(self):
        text = """
## Key Citations
- [Source 1](https://example.com)
"""
        sections = detect_sections(text)
        assert sections.get("key_citations") is True


class TestComputeMetrics:
    """Tests for the main compute_metrics function."""

    def test_complete_report(self):
        report = """
# Research Report Title
## Key Points
- Point 1
- Point 2
- Point 3
## Overview
This is an overview of the research topic.
## Detailed Analysis
Here is the detailed analysis with [source](https://example.com).
![Figure 1](https://example.com/image.png)
## Key Citations
- [Source 1](https://example.com)
- [Source 2](https://another.com)
"""
        metrics = compute_metrics(report)
        assert metrics.has_title is True
        assert metrics.has_key_points is True
        assert metrics.has_overview is True
        assert metrics.has_citations_section is True
        assert metrics.citation_count >= 2
        assert metrics.image_count == 1
        assert metrics.unique_sources >= 1
        assert metrics.section_coverage_score > 0.5

    def test_minimal_report(self):
        report = "Just some text without structure."
        metrics = compute_metrics(report)
        assert metrics.has_title is False
        assert metrics.citation_count == 0
        assert metrics.section_coverage_score < 0.5

    def test_metrics_to_dict(self):
        report = "# Title\n\nSome content"
        metrics = compute_metrics(report)
        result = metrics.to_dict()
        assert isinstance(result, dict)
        assert "word_count" in result
        assert "citation_count" in result
        assert "section_coverage_score" in result


class TestGetWordCountTarget:
    """Tests for word count target function."""

    def test_strategic_investment_target(self):
        target = get_word_count_target("strategic_investment")
        assert target["min"] == 10000
        assert target["max"] == 15000

    def test_news_target(self):
        target = get_word_count_target("news")
        assert target["min"] == 800
        assert target["max"] == 2000

    def test_default_target(self):
        target = get_word_count_target("unknown_style")
        assert target["min"] == 1000
        assert target["max"] == 5000