mirror of
https://gitee.com/wanwujie/deer-flow
synced 2026-04-27 15:54:48 +08:00
- Add podcast-generation skill for creating tech explainer podcasts
- Include generate.py script with TTS synthesis capabilities
- Add tech-explainer template for structured podcast content
- Increase sandbox command timeout from 30s to 600s to support longer-running skill scripts
125 lines
4.7 KiB
Markdown
---
name: podcast-generation
description: Use this skill when the user requests to generate, create, or produce podcasts from text content. Converts written content into a two-host conversational podcast audio format with natural dialogue.
---

# Podcast Generation Skill

## Overview

This skill generates high-quality podcast audio from text content using a multi-stage pipeline. The workflow includes script generation (converting input to conversational dialogue), text-to-speech synthesis, and audio mixing to produce the final podcast.

## Core Capabilities

- Convert any text content (articles, reports, documentation) into podcast scripts
- Generate natural two-host conversational dialogue (male and female hosts)
- Synthesize speech audio using text-to-speech
- Mix audio chunks into a final podcast MP3 file
- Support both English and Chinese content

## Workflow

### Step 1: Understand Requirements

When a user requests podcast generation, identify:

- Source content: The text/article/report to convert into a podcast
- Language: English or Chinese (auto-detected from content)
- Output location: Where to save the generated podcast
- You don't need to check the folder under `/mnt/user-data`
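
The language is auto-detected by the script, so no manual check is required. If an explicit `--locale` is wanted anyway, a rough guess can be made from the share of Chinese characters. The sketch below is only an illustration: `guess_locale` and its threshold are hypothetical and do not describe how `generate.py` detects the language.

```python
def guess_locale(text: str, threshold: float = 0.3) -> str:
    """Guess "zh" or "en" from the share of CJK ideographs among letters.

    Hypothetical helper: the 0.3 threshold is an assumption, not a value
    taken from the skill's own script.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        # No letters at all (empty text, digits only): fall back to English.
        return "en"
    # CJK Unified Ideographs block, which covers common Chinese characters.
    cjk = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return "zh" if cjk / len(letters) >= threshold else "en"
```

Mixed content such as an English article with a few Chinese terms stays below the threshold and is treated as English.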

### Step 2: Prepare Input Content

The input content should be plain text or Markdown. Save it to a text file in `/mnt/user-data/workspace/` using the naming pattern `{descriptive-name}-content.md`.
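
The naming pattern above can be sketched as a small helper. The function names `content_path` and `save_content`, and the ASCII-only slug rule, are assumptions made for illustration; only the directory and the `-content.md` suffix come from this skill.

```python
import re
from pathlib import Path

# Workspace directory named in Step 2.
WORKSPACE = Path("/mnt/user-data/workspace")


def content_path(topic: str, workspace: Path = WORKSPACE) -> Path:
    """Build `{descriptive-name}-content.md` from a free-form topic.

    Hypothetical helper: the slug keeps only ASCII letters and digits,
    so a non-ASCII topic falls back to the generic name "podcast".
    """
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-") or "podcast"
    return workspace / f"{slug}-content.md"


def save_content(topic: str, text: str, workspace: Path = WORKSPACE) -> Path:
    """Write the source text as UTF-8 and return the file path."""
    path = content_path(topic, workspace)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

For example, `content_path("AI History")` yields a file named `ai-history-content.md` in the workspace directory.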

### Step 3: Execute Generation

Call the Python script directly. There is no need to worry about timeouts or to pre-test the script:

```bash
python /mnt/skills/public/podcast-generation/scripts/generate.py \
  --input-file /mnt/user-data/workspace/content-file.md \
  --output-file /mnt/user-data/outputs/generated-podcast.mp3 \
  --locale en
```

Parameters:

- `--input-file`: Absolute path to input text/markdown file (required)
- `--output-file`: Absolute path to output MP3 file (required)
- `--locale`: Language locale - "en" for English or "zh" for Chinese (optional, auto-detected if not specified)

> [!IMPORTANT]
> - Execute the script in one complete call. Do NOT split the workflow into separate steps (e.g., testing script generation first, then TTS).
> - The script handles all external API calls and audio generation internally with proper timeout management.
> - Do NOT read the Python file, just call it with the parameters.
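
When the call is made from Python rather than a shell, the same single invocation can be sketched as follows. `build_command` and `run_generation` are hypothetical wrappers; only the script path and the three flags come from this document.

```python
import subprocess
from typing import List, Optional

# Script path as given in Step 3.
SCRIPT = "/mnt/skills/public/podcast-generation/scripts/generate.py"


def build_command(input_file: str, output_file: str,
                  locale: Optional[str] = None) -> List[str]:
    """Assemble the argument list for one complete generate.py call."""
    cmd = ["python", SCRIPT,
           "--input-file", input_file,
           "--output-file", output_file]
    if locale is not None:
        # Only "en" and "zh" are documented; omit the flag to auto-detect.
        if locale not in ("en", "zh"):
            raise ValueError(f"unsupported locale: {locale!r}")
        cmd += ["--locale", locale]
    return cmd


def run_generation(input_file: str, output_file: str,
                   locale: Optional[str] = None) -> None:
    """Run the whole pipeline in one call; raises if the script fails."""
    subprocess.run(build_command(input_file, output_file, locale), check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.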

## Podcast Generation Example

User request: "Generate a podcast about the history of artificial intelligence"

Step 1: Create content file `/mnt/user-data/workspace/ai-history-content.md` with the source text:
```markdown
# The History of Artificial Intelligence

Artificial intelligence has a rich history spanning over seven decades...

## Early Beginnings (1950s)
The term "artificial intelligence" was coined by John McCarthy in 1956...

## The First AI Winter (1970s)
After initial enthusiasm, AI research faced significant setbacks...

## Modern Era (2010s-Present)
Deep learning revolutionized the field with breakthrough results...
```

Step 2: Execute generation:
```bash
python /mnt/skills/public/podcast-generation/scripts/generate.py \
  --input-file /mnt/user-data/workspace/ai-history-content.md \
  --output-file /mnt/user-data/outputs/ai-history-podcast.mp3 \
  --locale en
```

## Specific Templates

Read the following template file only when it matches the user request.

- [Tech Explainer](templates/tech-explainer.md) - For converting technical documentation and tutorials

## Output Format

The generated podcast follows the "Hello Deer" format:
- Two hosts: one male, one female
- Natural conversational dialogue
- Starts with "Hello Deer" greeting
- Target duration: approximately 10 minutes
- Alternating speakers for engaging flow

## Output Handling

After generation:

- Podcasts are saved in `/mnt/user-data/outputs/`
- Share the generated podcast with the user using the `present_files` tool
- Provide a brief description of the generation result (topic, duration, hosts)
- Offer to regenerate if adjustments are needed
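
Before sharing, it can help to confirm that the MP3 was actually written. A minimal sketch, assuming the output path passed to `--output-file` is known; `describe_output` is a hypothetical helper and is not part of the skill.

```python
from pathlib import Path


def describe_output(path: str) -> str:
    """Return a short "name (size)" summary, or raise if the file is unusable."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        # A missing or empty file means generation did not finish properly.
        raise FileNotFoundError(f"no podcast audio found at {path}")
    size_mib = p.stat().st_size / (1024 * 1024)
    return f"{p.name} ({size_mib:.1f} MiB)"
```

The returned summary can be included in the brief description given to the user.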

## Requirements

The following environment variables must be set:
- `OPENAI_API_KEY` or equivalent LLM API key for script generation
- `VOLCENGINE_TTS_APPID`: Volcengine TTS application ID
- `VOLCENGINE_TTS_ACCESS_TOKEN`: Volcengine TTS access token
- `VOLCENGINE_TTS_CLUSTER`: Volcengine TTS cluster (optional, defaults to "volcano_tts")
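
A quick pre-flight check for these variables might look like the sketch below. The helper names are hypothetical, and because the list allows an "equivalent" LLM key whose name is not given here, the accepted key names are left as a parameter.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Variables listed as required above.
REQUIRED_TTS = ("VOLCENGINE_TTS_APPID", "VOLCENGINE_TTS_ACCESS_TOKEN")


def missing_variables(env: Optional[Mapping[str, str]] = None,
                      llm_key_names: Sequence[str] = ("OPENAI_API_KEY",)) -> List[str]:
    """List required variables that are unset or empty.

    `llm_key_names` is an assumption: add whichever equivalent key
    the deployment actually uses.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_TTS if not env.get(name)]
    if not any(env.get(name) for name in llm_key_names):
        # Report the first accepted name as the one to set.
        missing.append(llm_key_names[0])
    return missing


def tts_cluster(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the TTS cluster, using the documented default when unset."""
    env = os.environ if env is None else env
    return env.get("VOLCENGINE_TTS_CLUSTER") or "volcano_tts"
```

An empty result from `missing_variables()` means the documented requirements are satisfied.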

## Notes

- **Always execute the full pipeline in one call** - no need to test individual steps or worry about timeouts
- Input content language is auto-detected and matched in output
- Script generation uses an LLM to create natural conversational dialogue
- Technical content is automatically simplified for audio accessibility
- Complex notations (formulas, code) are translated to plain language
- Long content may result in longer podcasts