feat: RAG Integration (#238)

* feat: add rag provider and retriever

* feat: retriever tool

* feat: add retriever tool to the researcher node

* feat: add rag http apis

* feat: new message input supports resource mentions

* feat: new message input component support resource mentions

* refactor: need_web_search to need_search

* chore: RAG integration docs

* chore: change example api host

* fix: user message color in dark mode

* fix: mentions style

* feat: add local_search_tool to researcher prompt

* chore: research prompt

* fix: ragflow page size and reporter with

* docs: ragflow integration and add acknowledgment projects

* chore: format
JeffJiang
2025-05-28 14:13:46 +08:00
committed by GitHub
parent 0565ab6d27
commit 462752b462
43 changed files with 1172 additions and 181 deletions


@@ -57,14 +57,15 @@ Before creating a detailed plan, assess if there is sufficient context to answer
Different types of steps have different web search requirements:
1. **Research Steps** (`need_web_search: true`):
1. **Research Steps** (`need_search: true`):
- Retrieve information from a file the user specifies by a URL with a `rag://` or `http://` prefix
- Gathering market data or industry trends
- Finding historical information
- Collecting competitor analysis
- Researching current events or news
- Finding statistical data or reports
2. **Data Processing Steps** (`need_web_search: false`):
2. **Data Processing Steps** (`need_search: false`):
- API calls and data extraction
- Database queries
- Raw data collection from existing sources
@@ -74,10 +75,10 @@ Different types of steps have different web search requirements:
## Exclusions
- **No Direct Calculations in Research Steps**:
- Research steps should only gather data and information
- All mathematical calculations must be handled by processing steps
- Numerical analysis must be delegated to processing steps
- Research steps focus on information gathering only
- Research steps should only gather data and information
- All mathematical calculations must be handled by processing steps
- Numerical analysis must be delegated to processing steps
- Research steps focus on information gathering only
## Analysis Framework
@@ -135,16 +136,16 @@ When planning information gathering, consider these key aspects and ensure COMPR
- To begin with, repeat user's requirement in your own words as `thought`.
- Rigorously assess if there is sufficient context to answer the question using the strict criteria above.
- If context is sufficient:
- Set `has_enough_context` to true
- No need to create information gathering steps
- Set `has_enough_context` to true
- No need to create information gathering steps
- If context is insufficient (default assumption):
- Break down the required information using the Analysis Framework
- Create NO MORE THAN {{ max_step_num }} focused and comprehensive steps that cover the most essential aspects
- Ensure each step is substantial and covers related information categories
- Prioritize breadth and depth within the {{ max_step_num }}-step constraint
- For each step, carefully assess if web search is needed:
- Research and external data gathering: Set `need_web_search: true`
- Internal data processing: Set `need_web_search: false`
- Break down the required information using the Analysis Framework
- Create NO MORE THAN {{ max_step_num }} focused and comprehensive steps that cover the most essential aspects
- Ensure each step is substantial and covers related information categories
- Prioritize breadth and depth within the {{ max_step_num }}-step constraint
- For each step, carefully assess if web search is needed:
- Research and external data gathering: Set `need_search: true`
- Internal data processing: Set `need_search: false`
- Specify the exact data to be collected in step's `description`. Include a `note` if necessary.
- Prioritize depth and volume of relevant information - limited information is not acceptable.
- Use the same language as the user to generate the plan.
@@ -156,10 +157,10 @@ Directly output the raw JSON format of `Plan` without "```json". The `Plan` inte
```ts
interface Step {
need_web_search: boolean; // Must be explicitly set for each step
need_search: boolean; // Must be explicitly set for each step
title: string;
description: string; // Specify exactly what data to collect
step_type: "research" | "processing"; // Indicates the nature of the step
description: string; // Specify exactly what data to collect. If the user input contains a link, please retain the full Markdown format when necessary.
step_type: "research" | "processing"; // Indicates the nature of the step
}
interface Plan {
@@ -167,7 +168,7 @@ interface Plan {
has_enough_context: boolean;
thought: string;
title: string;
steps: Step[]; // Research & Processing steps to get more context
steps: Step[]; // Research & Processing steps to get more context
}
```
@@ -179,8 +180,8 @@ interface Plan {
- Prioritize BOTH breadth (covering essential aspects) AND depth (detailed information on each aspect)
- Never settle for minimal information - the goal is a comprehensive, detailed final report
- Limited or insufficient information will lead to an inadequate final report
- Carefully assess each step's web search requirement based on its nature:
- Research steps (`need_web_search: true`) for gathering information
- Processing steps (`need_web_search: false`) for calculations and data processing
- Carefully assess whether each step needs a web search or retrieval from a URL, based on its nature:
- Research steps (`need_search: true`) for gathering information
- Processing steps (`need_search: false`) for calculations and data processing
- Default to gathering more information unless the strictest sufficient context criteria are met
- Always use the language specified by the locale = **{{ locale }}**.
- Always use the language specified by the locale = **{{ locale }}**.
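
The prompt above asks the planner to emit raw JSON matching the `Plan` interface, with `need_search` set explicitly on every step. A minimal sketch of how a consumer might check that output, using only the standard library; `parse_plan` and `ALLOWED_STEP_TYPES` are hypothetical names that are not part of this commit, and only the fields visible in the hunks above are checked:

```python
import json

# Values taken from the `step_type` union in the `Step` interface.
ALLOWED_STEP_TYPES = {"research", "processing"}


def parse_plan(raw: str) -> dict:
    """Parse raw planner JSON and check the fields shown in the `Plan` interface."""
    plan = json.loads(raw)
    for key in ("has_enough_context", "thought", "title", "steps"):
        if key not in plan:
            raise ValueError(f"plan is missing required field: {key}")
    for index, step in enumerate(plan["steps"]):
        # `need_search` must be set explicitly and must be a real boolean.
        if not isinstance(step.get("need_search"), bool):
            raise ValueError(f"step {index}: need_search must be true or false")
        if step.get("step_type") not in ALLOWED_STEP_TYPES:
            raise ValueError(f"step {index}: unknown step_type")
    return plan
```

With this check, a step that still carries only the old `need_web_search` key is rejected, because `need_search` is absent.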


@@ -13,9 +13,7 @@ class StepType(str, Enum):
class Step(BaseModel):
need_web_search: bool = Field(
..., description="Must be explicitly set for each step"
)
need_search: bool = Field(..., description="Must be explicitly set for each step")
title: str
description: str = Field(..., description="Specify exactly what data to collect")
step_type: StepType = Field(..., description="Indicates the nature of the step")
@@ -47,7 +45,7 @@ class Plan(BaseModel):
"title": "AI Market Research Plan",
"steps": [
{
"need_web_search": True,
"need_search": True,
"title": "Current AI Market Analysis",
"description": (
"Collect data on market size, growth rates, major players, and investment trends in AI sector."
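
Because `need_search` is declared as a required field, plan data that still uses the old `need_web_search` key would presumably fail validation against the renamed model. A minimal sketch of a key migration, assuming such plans exist as plain dictionaries; `migrate_step` is a hypothetical helper and is not part of this commit:

```python
def migrate_step(step: dict) -> dict:
    """Return a copy of `step` that uses `need_search` instead of `need_web_search`."""
    migrated = dict(step)  # shallow copy so the caller's dict is left untouched
    if "need_web_search" in migrated:
        legacy_value = migrated.pop("need_web_search")
        # Keep an explicit new-style value if both keys are present.
        migrated.setdefault("need_search", legacy_value)
    return migrated
```

The helper leaves steps that already use `need_search` unchanged, so it can be applied to mixed data.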


@@ -11,6 +11,9 @@ You are dedicated to conducting thorough investigations using search tools and p
You have access to two types of tools:
1. **Built-in Tools**: These are always available:
{% if resources %}
- **local_search_tool**: For retrieving information from the local knowledge base when the user mentions it in the messages.
{% endif %}
- **web_search_tool**: For performing web searches
- **crawl_tool**: For reading content from URLs
@@ -34,7 +37,7 @@ You have access to two types of tools:
3. **Plan the Solution**: Determine the best approach to solve the problem using the available tools.
4. **Execute the Solution**:
- Forget your previous knowledge, so you **should leverage the tools** to retrieve the information.
- Use the **web_search_tool** or other suitable search tool to perform a search with the provided keywords.
- Use the {% if resources %}**local_search_tool** or {% endif %}**web_search_tool** or other suitable search tool to perform a search with the provided keywords.
- When the task includes time range requirements:
- Incorporate appropriate time-based search parameters in your queries (e.g., "after:2020", "before:2023", or specific date ranges)
- Ensure search results respect the specified time constraints.
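
The `{% if resources %}` guards above mean the researcher prompt mentions `local_search_tool` only when the user has attached resources. The same rule can be sketched in plain Python; `built_in_tool_names` is a hypothetical function written for illustration, not the project's actual tool wiring:

```python
from typing import Sequence


def built_in_tool_names(resources: Sequence[str]) -> list[str]:
    """List built-in tool names, adding local search only when resources exist."""
    names = []
    if resources:  # mirrors `{% if resources %}` in the prompt
        names.append("local_search_tool")
    names.extend(["web_search_tool", "crawl_tool"])
    return names
```

The local tool comes first in the list to match the order in which the prompt presents the tools.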