deer-flow/skills/public/podcast-generation/SKILL.md
hetaoBackend 9f5658fa0e feat: add podcast generation skill
- Add podcast-generation skill for creating tech explainer podcasts
- Include generate.py script with TTS synthesis capabilities
- Add tech-explainer template for structured podcast content
- Increase sandbox command timeout from 30s to 600s to support
  longer-running skill scripts
2026-01-26 13:16:35 +08:00


name: podcast-generation
description: Use this skill when the user requests to generate, create, or produce podcasts from text content. Converts written content into a two-host conversational podcast audio format with natural dialogue.

Podcast Generation Skill

Overview

This skill generates podcast audio from text content using a three-stage pipeline: script generation (converting the input into a conversational dialogue), text-to-speech (TTS) synthesis, and audio mixing to produce the final podcast.
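The three stages can be pictured as a simple composition of functions. This is a hypothetical sketch: `run_pipeline`, `DialogueLine`, and the three stage callables are illustrative names, not functions exported by generate.py, and the stubs in the usage example stand in for the real LLM, TTS, and mixing steps.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DialogueLine:
    speaker: str  # hypothetical label, e.g. "male" or "female" host
    text: str


def run_pipeline(
    content: str,
    generate_script: Callable[[str], List[DialogueLine]],
    synthesize: Callable[[DialogueLine], bytes],
    mix: Callable[[List[bytes]], bytes],
) -> bytes:
    """Hypothetical three-stage flow: dialogue script -> per-line speech -> mixed audio."""
    script = generate_script(content)               # stage 1: LLM writes the dialogue
    chunks = [synthesize(line) for line in script]  # stage 2: one audio chunk per line
    return mix(chunks)                              # stage 3: combine chunks into one file


if __name__ == "__main__":
    # Stub stages only; no real LLM or TTS service is called.
    audio = run_pipeline(
        "Today: AI history.",
        lambda text: [DialogueLine("female", "Hello Deer!"), DialogueLine("male", text)],
        lambda line: line.text.encode("utf-8"),
        lambda parts: b"|".join(parts),
    )
    print(audio)
```

Injecting the stages as callables keeps the sketch testable without network access; the real script handles all three stages internally.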

Core Capabilities

  • Convert any text content (articles, reports, documentation) into podcast scripts
  • Generate natural two-host conversational dialogue (male and female hosts)
  • Synthesize the dialogue into speech using Volcengine text-to-speech (TTS)
  • Mix audio chunks into a final podcast MP3 file
  • Support both English and Chinese content

Workflow

Step 1: Understand Requirements

When a user requests podcast generation, identify:

  • Source content: The text/article/report to convert into a podcast
  • Language: English or Chinese (auto-detected from content)
  • Output location: Where to save the generated podcast

There is no need to inspect the folders under /mnt/user-data beforehand.
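The Language item above is detected automatically; how generate.py does this is not documented here. Purely as an illustration, one simple heuristic counts CJK ideographs. `guess_locale` is hypothetical, and the 30% threshold is an arbitrary assumption.

```python
def guess_locale(text: str, threshold: float = 0.3) -> str:
    """Hypothetical heuristic: return "zh" if CJK ideographs dominate the letters, else "en"."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "en"  # assumption: fall back to English when there are no letters
    # U+4E00..U+9FFF is the CJK Unified Ideographs block (common Chinese characters).
    cjk = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return "zh" if cjk / len(letters) >= threshold else "en"
```

When in doubt, passing --locale explicitly avoids relying on detection at all.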

Step 2: Prepare Input Content

The input content should be plain text or markdown. Save it to a file in /mnt/user-data/workspace/ using the naming pattern {descriptive-name}-content.md.
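Step 2 might look like this in Python. `content_path`, `save_content`, and the slug rule (lowercase, runs of non-alphanumerics collapsed to hyphens, "podcast" as a fallback) are assumptions for illustration; the skill itself only prescribes the directory and the {descriptive-name}-content.md pattern.

```python
import re
from pathlib import Path

WORKSPACE = Path("/mnt/user-data/workspace")


def content_path(descriptive_name: str, workspace: Path = WORKSPACE) -> Path:
    """Build {descriptive-name}-content.md; the slug rule here is a hypothetical choice."""
    slug = re.sub(r"[^a-z0-9]+", "-", descriptive_name.lower()).strip("-")
    # Non-ASCII names (e.g. Chinese titles) slug to "", so fall back to a generic name.
    return workspace / f"{slug or 'podcast'}-content.md"


def save_content(descriptive_name: str, text: str, workspace: Path = WORKSPACE) -> Path:
    """Write the plain-text/markdown source into the workspace and return its path."""
    path = content_path(descriptive_name, workspace)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

For example, `save_content("AI History", text)` writes /mnt/user-data/workspace/ai-history-content.md, the file name used in the example later in this document.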

Step 3: Execute Generation

Call the Python script directly. There is no need to worry about timeouts or to test anything beforehand:

```bash
python /mnt/skills/public/podcast-generation/scripts/generate.py \
  --input-file /mnt/user-data/workspace/content-file.md \
  --output-file /mnt/user-data/outputs/generated-podcast.mp3 \
  --locale en
```

Parameters:

  • --input-file: Absolute path to input text/markdown file (required)
  • --output-file: Absolute path to output MP3 file (required)
  • --locale: Language locale, "en" for English or "zh" for Chinese (optional; auto-detected if not specified)
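If the script is driven from Python rather than a shell, the parameters map onto one argument list, and the whole pipeline still runs in a single call. `build_command` and `run_generation` are hypothetical helpers; only the script path and the three flags come from this skill. The 600-second timeout is an assumption chosen to match the sandbox command limit noted in the commit message.

```python
import subprocess
from typing import List, Optional

SCRIPT = "/mnt/skills/public/podcast-generation/scripts/generate.py"


def build_command(input_file: str, output_file: str, locale: Optional[str] = None) -> List[str]:
    """Assemble the generate.py call; omit --locale to let the script auto-detect it."""
    if locale not in (None, "en", "zh"):
        raise ValueError(f"unsupported locale: {locale!r}")
    # "python" mirrors the documented command line.
    cmd = ["python", SCRIPT, "--input-file", input_file, "--output-file", output_file]
    if locale is not None:
        cmd += ["--locale", locale]
    return cmd


def run_generation(input_file: str, output_file: str, locale: Optional[str] = None) -> None:
    """Run the full pipeline once; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(input_file, output_file, locale), check=True, timeout=600)
```

Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.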

Important

  • Execute the script in one complete call. Do NOT split the workflow into separate steps (e.g., testing script generation first, then TTS).
  • The script handles all external API calls and audio generation internally with proper timeout management.
  • Do NOT read the Python file; just call it with the parameters.

Podcast Generation Example

User request: "Generate a podcast about the history of artificial intelligence"

Step 1: Create the content file /mnt/user-data/workspace/ai-history-content.md containing the source text:

```markdown
# The History of Artificial Intelligence

Artificial intelligence has a rich history spanning over seven decades...

## Early Beginnings (1950s)
The term "artificial intelligence" was coined by John McCarthy in 1956...

## The First AI Winter (1970s)
After initial enthusiasm, AI research faced significant setbacks...

## Modern Era (2010s-Present)
Deep learning revolutionized the field with breakthrough results...
```

Step 2: Execute generation:

```bash
python /mnt/skills/public/podcast-generation/scripts/generate.py \
  --input-file /mnt/user-data/workspace/ai-history-content.md \
  --output-file /mnt/user-data/outputs/ai-history-podcast.mp3 \
  --locale en
```

Specific Templates

Read the template file below only when the user's request matches it.

  • Tech Explainer: for converting technical documentation and tutorials

Output Format

The generated podcast follows the "Hello Deer" format:

  • Two hosts: one male, one female
  • Natural conversational dialogue
  • Starts with "Hello Deer" greeting
  • Target duration: approximately 10 minutes
  • Alternating speakers for engaging flow
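A generated dialogue could be checked against this format with a small validator. The sketch assumes the script is a list of (speaker, text) pairs labeled "male" and "female" and reads "alternating" strictly; that representation and `check_format` are illustrative assumptions, not the script's internal data model. The ten-minute target is not checked here.

```python
from typing import List, Tuple

Line = Tuple[str, str]  # hypothetical (speaker, text) pair
HOSTS = {"male", "female"}


def check_format(script: List[Line]) -> List[str]:
    """Return a list of problems; an empty list means the script fits the "Hello Deer" format."""
    if not script:
        return ["script is empty"]
    problems = []
    if "hello deer" not in script[0][1].lower():
        problems.append('first line lacks the "Hello Deer" greeting')
    speakers = {speaker for speaker, _ in script}
    if speakers != HOSTS:
        problems.append(f"expected hosts {sorted(HOSTS)}, got {sorted(speakers)}")
    # Strict reading of "alternating": no host speaks twice in a row.
    for i in range(1, len(script)):
        if script[i][0] == script[i - 1][0]:
            problems.append(f"line {i} repeats speaker {script[i][0]!r}")
    return problems
```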

Output Handling

After generation:

  • Podcasts are saved in /mnt/user-data/outputs/
  • Share the generated podcast with the user using the present_files tool
  • Provide a brief description of the result (topic, duration, hosts)
  • Offer to regenerate it if adjustments are needed
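Before sharing the file, it can help to confirm that it actually exists. `verify_output` is a hypothetical check (the .mp3 suffix, location under /mnt/user-data/outputs/, and a non-empty file); it does not decode the audio or measure its duration.

```python
from pathlib import Path

OUTPUTS = Path("/mnt/user-data/outputs")


def verify_output(path: str, outputs_dir: Path = OUTPUTS) -> Path:
    """Raise if the podcast is misnamed, outside outputs_dir, missing, or empty."""
    p = Path(path).resolve()
    if p.suffix.lower() != ".mp3":
        raise ValueError(f"expected an .mp3 file, got {p.name}")
    if outputs_dir.resolve() not in p.parents:
        raise ValueError(f"{p} is not under {outputs_dir}")
    if not p.is_file() or p.stat().st_size == 0:
        raise FileNotFoundError(f"missing or empty output: {p}")
    return p
```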

Requirements

The following environment variables must be set:

  • OPENAI_API_KEY (or an equivalent LLM API key): used for script generation
  • VOLCENGINE_TTS_APPID: Volcengine TTS application ID
  • VOLCENGINE_TTS_ACCESS_TOKEN: Volcengine TTS access token
  • VOLCENGINE_TTS_CLUSTER: Volcengine TTS cluster (optional, defaults to "volcano_tts")
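A pre-flight check for the TTS variables might look like this. `tts_settings` is a hypothetical helper: it checks only the Volcengine variables listed above and applies the documented "volcano_tts" default for the cluster. The LLM key is left out because its variable name depends on the provider.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_TTS = ("VOLCENGINE_TTS_APPID", "VOLCENGINE_TTS_ACCESS_TOKEN")


def tts_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect Volcengine TTS settings, failing early if a required variable is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_TTS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    settings = {name: env[name] for name in REQUIRED_TTS}
    # Optional variable; an unset or empty value falls back to the documented default.
    settings["VOLCENGINE_TTS_CLUSTER"] = env.get("VOLCENGINE_TTS_CLUSTER") or "volcano_tts"
    return settings
```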

Notes

  • Always execute the full pipeline in one call; there is no need to test individual steps or worry about timeouts
  • The input language is auto-detected, and the podcast is generated in the same language
  • Script generation uses an LLM to create natural conversational dialogue
  • Technical content is automatically simplified for audio accessibility
  • Complex notations (formulas, code) are translated to plain language
  • Long content may result in longer podcasts
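Because the target is about ten minutes, a quick word-count estimate can flag a script that will run long. This assumes roughly 150 spoken words per minute, a common rule of thumb for English narration; real length depends on the TTS voices and pacing, and whitespace splitting does not count Chinese words. `estimate_minutes` is a hypothetical helper.

```python
def estimate_minutes(script_text: str, words_per_minute: float = 150.0) -> float:
    """Rough spoken length of an English script at an assumed speaking rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    # Whitespace splitting counts English words; it does not segment Chinese text.
    return len(script_text.split()) / words_per_minute
```

At the assumed 150 words per minute, a 1,500-word dialogue comes out to about 10 minutes.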