feat: add podcast generation skill

- Add podcast-generation skill for creating tech explainer podcasts
- Include generate.py script with TTS synthesis capabilities
- Add tech-explainer template for structured podcast content
- Increase sandbox command timeout from 30s to 600s to support longer-running skill scripts
2026-04-24 22:54:46 +08:00 · 2026-01-25 22:37:53 +08:00
parent 038f5d44f4
commit 3fa16467a2
5 changed files with 542 additions and 1 deletions
--- a/skills/public/podcast-generation/SKILL.md
+++ b/skills/public/podcast-generation/SKILL.md
@@ -0,0 +1,124 @@
+---
+name: podcast-generation
+description: Use this skill when the user asks to generate, create, or produce a podcast from text content. Converts written content into two-host conversational podcast audio with natural dialogue.
+---
+
+# Podcast Generation Skill
+
+## Overview
+
+This skill generates high-quality podcast audio from text content using a multi-stage pipeline. The workflow includes script generation (converting input to conversational dialogue), text-to-speech synthesis, and audio mixing to produce the final podcast.
+
+## Core Capabilities
+
+- Convert any text content (articles, reports, documentation) into podcast scripts
+- Generate natural two-host conversational dialogue (male and female hosts)
+- Synthesize speech audio using text-to-speech
+- Mix audio chunks into a final podcast MP3 file
+- Support both English and Chinese content
+
+## Workflow
+
+### Step 1: Understand Requirements
+
+When a user requests podcast generation, identify:
+
+- Source content: The text/article/report to convert into a podcast
+- Language: English or Chinese (auto-detected from content)
+- Output location: Where to save the generated podcast
+- Note: there is no need to inspect the folders under `/mnt/user-data` beforehand
+
+### Step 2: Prepare Input Content
+
+The input content should be plain text or markdown. Save it to a file in `/mnt/user-data/workspace/` using the naming pattern `{descriptive-name}-content.md`.
+
+### Step 3: Execute Generation
+
+Call the Python script directly. There is no need to pre-test it or to work around timeouts:
+
+```bash
+python /mnt/skills/public/podcast-generation/scripts/generate.py \
+  --input-file /mnt/user-data/workspace/content-file.md \
+  --output-file /mnt/user-data/outputs/generated-podcast.mp3 \
+  --locale en
+```
+
+Parameters:
+
+- `--input-file`: Absolute path to input text/markdown file (required)
+- `--output-file`: Absolute path to output MP3 file (required)
+- `--locale`: Language locale - "en" for English or "zh" for Chinese (optional, auto-detected if not specified)
+
+> [!IMPORTANT]
+> - Execute the script in one complete call. Do NOT split the workflow into separate steps (e.g., testing script generation first, then TTS).
+> - The script handles all external API calls and audio generation internally with proper timeout management.
+> - Do NOT read the Python file, just call it with the parameters.
+
+## Podcast Generation Example
+
+User request: "Generate a podcast about the history of artificial intelligence"
+
+Step 1: Create content file `/mnt/user-data/workspace/ai-history-content.md` with the source text:
+```markdown
+# The History of Artificial Intelligence
+
+Artificial intelligence has a rich history spanning over seven decades...
+
+## Early Beginnings (1950s)
+The term "artificial intelligence" was coined by John McCarthy in 1956...
+
+## The First AI Winter (1970s)
+After initial enthusiasm, AI research faced significant setbacks...
+
+## Modern Era (2010s-Present)
+Deep learning revolutionized the field with breakthrough results...
+```
+
+Step 2: Execute generation:
+```bash
+python /mnt/skills/public/podcast-generation/scripts/generate.py \
+  --input-file /mnt/user-data/workspace/ai-history-content.md \
+  --output-file /mnt/user-data/outputs/ai-history-podcast.mp3 \
+  --locale en
+```
+
+## Specific Templates
+
+Read the following template file only when it matches the user request.
+
+- [Tech Explainer](templates/tech-explainer.md) - For converting technical documentation and tutorials
+
+## Output Format
+
+The generated podcast follows the "Hello Deer" format:
+- Two hosts: one male, one female
+- Natural conversational dialogue
+- Starts with "Hello Deer" greeting
+- Target duration: approximately 10 minutes
+- Alternating speakers for engaging flow
+
+## Output Handling
+
+After generation:
+
+- Podcasts are saved in `/mnt/user-data/outputs/`
+- Share the generated podcast with the user using the `present_files` tool
+- Provide a brief description of the generation result (topic, duration, hosts)
+- Offer to regenerate if adjustments are needed
+
+## Requirements
+
+The following environment variables must be set:
+- `OPENAI_API_KEY` or equivalent LLM API key for script generation
+- `VOLCENGINE_TTS_APPID`: Volcengine TTS application ID
+- `VOLCENGINE_TTS_ACCESS_TOKEN`: Volcengine TTS access token
+- `VOLCENGINE_TTS_CLUSTER`: Volcengine TTS cluster (optional, defaults to "volcano_tts")
+
+## Notes
+
+- **Always execute the full pipeline in one call** - no need to test individual steps or worry about timeouts
+- Input content language is auto-detected and matched in output
+- Script generation uses an LLM to create natural conversational dialogue
+- Technical content is automatically simplified for audio accessibility
+- Complex notations (formulas, code) are translated to plain language
+- Long input content may produce podcasts that run past the 10-minute target
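
The Requirements section of SKILL.md lists the Volcengine TTS variables the pipeline expects. The following is a minimal preflight sketch, assuming only those documented variable names and the documented `volcano_tts` default. `check_tts_env` is a hypothetical helper for illustration and is not part of `generate.py`; whether the script performs a similar check itself is not stated in the source. The LLM key is left out of the check because the source allows `OPENAI_API_KEY` or an equivalent key.

```python
import os

# Required variable names, as listed in the Requirements section of SKILL.md.
REQUIRED_VARS = ("VOLCENGINE_TTS_APPID", "VOLCENGINE_TTS_ACCESS_TOKEN")
# Documented default for the optional VOLCENGINE_TTS_CLUSTER variable.
DEFAULT_CLUSTER = "volcano_tts"


def check_tts_env(env=None):
    """Hypothetical helper: report missing TTS settings.

    Returns a (missing, cluster) tuple, where `missing` lists the required
    variables that are unset or empty and `cluster` is the configured
    cluster or the documented default.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    cluster = env.get("VOLCENGINE_TTS_CLUSTER") or DEFAULT_CLUSTER
    return missing, cluster
```

Calling `check_tts_env()` with no arguments reads the current process environment; a non-empty `missing` list means the TTS step is likely to fail before any audio is produced.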