TubeScribe 🎬

Turn any YouTube video into a polished document + audio summary.

Drop a YouTube link → get a beautiful transcript with speaker labels, key quotes, timestamps that link back to the video, and an audio summary you can listen to on the go.

💸 Free & No Paid APIs

No subscriptions or API keys — works out of the box
Local processing — transcription, speaker detection, and TTS run on your machine
Network access — fetching from YouTube (captions, metadata, comments) requires internet
No data uploaded — nothing is sent to external services; all processing stays on your machine
Safe sub-agent — spawned sub-agent has strict instructions: no software installation, no network calls beyond YouTube

✨ Features

📄 Transcript with summary and key quotes — Export as DOCX, HTML, or Markdown
🎯 Smart Speaker Detection — Automatically identifies participants
🔊 Audio Summaries — Listen to key points (MP3/WAV)
📝 Clickable Timestamps — Every quote links directly to that moment in the video
💬 YouTube Comments — Viewer sentiment analysis and best comments
📋 Queue Support — Send multiple links, they get processed in order
🚀 Non-Blocking Workflow — Conversation continues while video processes in background

🎬 Works With Any Video

Interviews & podcasts (multi-speaker detection)
Lectures & tutorials (single speaker)
Music videos (lyrics extraction)
News & documentaries
Any YouTube content with captions

Quick Start

When user sends a YouTube URL:

Spawn sub-agent with the full pipeline task immediately
Reply: "🎬 TubeScribe is processing — I'll let you know when it's ready!"
Continue conversation (don't wait!)
Sub-agent notification will announce completion with title and details

DO NOT BLOCK — spawn and move on instantly.

First-Time Setup

Run setup to check dependencies and configure defaults:

python skills/tubescribe/scripts/setup.py

This checks:

summarize

CLI,

pandoc

ffmpeg

Kokoro TTS

Full Workflow (Single Sub-Agent)

Spawn ONE sub-agent that does the entire pipeline:

sessions_spawn(
    task=f"""
## TubeScribe: Process {youtube_url}
⚠️ CRITICAL: Do NOT install any software.
No pip, brew, curl, venv, or binary downloads.
If a tool is missing, STOP and report what's needed.
Run the COMPLETE pipeline — do not stop until all steps are done.
Step 1: Extract
python3 skills/tubescribe/scripts/tubescribe.py &quot;{youtube_url}&quot;
</code></pre>
<p>Note the <strong>Source</strong> and <strong>Output</strong> paths printed by the script. Use those exact paths in subsequent steps.</p>
<h3>Step 2: Read source JSON</h3>
<p>Read the Source path from Step 1 output and note:</p>
<ul>
<li>metadata.title (for filename)</li>
<li>metadata.video_id</li>
<li>metadata.channel, upload_date, duration_string</li>
</ul>
<h3>Step 3: Create formatted markdown</h3>
<p>Write to the Output path from Step 1:</p>
<ol>
<li><code># **&lt;title&gt;**</code></li>
</ol>
<hr/>
<ol start="2">
<li>Video info block — Channel, Date, Duration, URL (clickable). Empty line between each field.</li>
</ol>
<hr/>
<ol start="3">
<li><code>## **Participants**</code> — table with bold headers:<!-- -->
<pre><code>| **Name** | **Role** | **Description** |
|----------|----------|-----------------|
</code></pre>
</li>
</ol>
<hr/>
<ol start="4">
<li><code>## **Summary**</code> — 3-5 paragraphs of prose</li>
</ol>
<hr/>
<ol start="5">
<li><code>## **Key Quotes**</code> — 5 best with clickable YouTube timestamps. Format each as:<!-- -->
<pre><code>&quot;Quote text here.&quot; - [12:34](https://www.youtube.com/watch?v=ID&amp;t=754s)

&quot;Another quote.&quot; - [25:10](https://www.youtube.com/watch?v=ID&amp;t=1510s)
</code></pre>
<!-- -->Use regular dash <code>-</code>, NOT em dash <code>—</code>. Do NOT use blockquotes <code>&gt;</code>. Plain paragraphs only.</li>
</ol>
<hr/>
<ol start="6">
<li><code>## **Viewer Sentiment**</code> (if comments exist)</li>
</ol>
<hr/>
<ol start="7">
<li><code>## **Best Comments**</code> (if comments exist) — Top 5, NO lines between them:<!-- -->
<pre><code>Comment text here.

*- ▲ 123 @AuthorName*

Next comment text here.

*- ▲ 45 @AnotherAuthor*
</code></pre>
<!-- -->Attribution line: dash + italic. Just blank line between comments, NO <code>---</code> separators.</li>
</ol>
<hr/>
<ol start="8">
<li><code>## **Full Transcript**</code> — merge segments, speaker labels, clickable timestamps</li>
</ol>
<h3>Step 4: Create DOCX</h3>
<p>Clean the title for filename (remove special chars), then:</p>
<pre><code class="language-bash">pandoc &lt;output_path&gt; -o ~/Documents/TubeScribe/&lt;safe_title&gt;.docx
</code></pre>
<h3>Step 5: Generate audio</h3>
<p>Write the summary text to a temp file, then use TubeScribe&#x27;s built-in audio generation:</p>
<pre><code class="language-bash"># Write summary to temp file (use python3 to write, avoids shell escaping issues)
python3 -c &quot;
text = &#x27;&#x27;&#x27;YOUR SUMMARY TEXT HERE&#x27;&#x27;&#x27;
with open(&#x27;&lt;temp_dir&gt;/tubescribe_&lt;video_id&gt;_summary.txt&#x27;, &#x27;w&#x27;) as f:
    f.write(text)
&quot;

# Generate audio (auto-detects engine, voice, format from config)
python3 skills/tubescribe/scripts/tubescribe.py \
  --generate-audio &lt;temp_dir&gt;/tubescribe_&lt;video_id&gt;_summary.txt \
  --audio-output ~/Documents/TubeScribe/&lt;safe_title&gt;_summary
</code></pre>
<p>This reads <code>~/.tubescribe/config.json</code> and uses the configured TTS engine (mlx/kokoro/builtin), voice blend, and speed automatically. Output format (mp3/wav) comes from config.</p>
<h3>Step 6: Cleanup</h3>
<pre><code class="language-bash">python3 skills/tubescribe/scripts/tubescribe.py --cleanup &lt;video_id&gt;
</code></pre>
<h3>Step 7: Open folder</h3>
<pre><code class="language-bash">open ~/Documents/TubeScribe/
</code></pre>
<h3>Report</h3>
<p>Tell what was created: DOCX name, MP3 name + duration, video stats.
&quot;&quot;&quot;,
label=&quot;tubescribe&quot;,
runTimeoutSeconds=900,
cleanup=&quot;delete&quot;
)</p>
<pre><code>
**After spawning, reply immediately:**
&gt; 🎬 TubeScribe is processing - I&#x27;ll let you know when it&#x27;s ready!
Then continue the conversation. The sub-agent notification announces completion.

## Configuration

Config file: `~/.tubescribe/config.json`

```json
{
  &quot;output&quot;: {
    &quot;folder&quot;: &quot;~/Documents/TubeScribe&quot;,
    &quot;open_folder_after&quot;: true,
    &quot;open_document_after&quot;: false,
    &quot;open_audio_after&quot;: false
  },
  &quot;document&quot;: {
    &quot;format&quot;: &quot;docx&quot;,
    &quot;engine&quot;: &quot;pandoc&quot;
  },
  &quot;audio&quot;: {
    &quot;enabled&quot;: true,
    &quot;format&quot;: &quot;mp3&quot;,
    &quot;tts_engine&quot;: &quot;mlx&quot;
  },
  &quot;mlx_audio&quot;: {
    &quot;path&quot;: &quot;~/.openclaw/tools/mlx-audio&quot;,
    &quot;model&quot;: &quot;mlx-community/Kokoro-82M-bf16&quot;,
    &quot;voice&quot;: &quot;af_heart&quot;,
    &quot;lang_code&quot;: &quot;a&quot;,
    &quot;speed&quot;: 1.05
  },
  &quot;kokoro&quot;: {
    &quot;path&quot;: &quot;~/.openclaw/tools/kokoro&quot;,
    &quot;voice_blend&quot;: { &quot;af_heart&quot;: 0.6, &quot;af_sky&quot;: 0.4 },
    &quot;speed&quot;: 1.05
  },
  &quot;processing&quot;: {
    &quot;subagent_timeout&quot;: 600,
    &quot;cleanup_temp_files&quot;: true
  }
}
</code></pre>
<h3>Output Options</h3>
<table><thead><tr><th>Option</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><code>output.folder</code></td><td><code>~/Documents/TubeScribe</code></td><td>Where to save files</td></tr><tr><td><code>output.open_folder_after</code></td><td><code>true</code></td><td>Open output folder when done</td></tr><tr><td><code>output.open_document_after</code></td><td><code>false</code></td><td>Auto-open generated document</td></tr><tr><td><code>output.open_audio_after</code></td><td><code>false</code></td><td>Auto-open generated audio summary</td></tr></tbody></table>
<h3>Document Options</h3>
<table><thead><tr><th>Option</th><th>Default</th><th>Values</th><th>Description</th></tr></thead><tbody><tr><td><code>document.format</code></td><td><code>docx</code></td><td><code>docx</code>, <code>html</code>, <code>md</code></td><td>Output format</td></tr><tr><td><code>document.engine</code></td><td><code>pandoc</code></td><td><code>pandoc</code></td><td>Converter for DOCX (falls back to HTML)</td></tr></tbody></table>
<h3>Audio Options</h3>
<table><thead><tr><th>Option</th><th>Default</th><th>Values</th><th>Description</th></tr></thead><tbody><tr><td><code>audio.enabled</code></td><td><code>true</code></td><td><code>true</code>, <code>false</code></td><td>Generate audio summary</td></tr><tr><td><code>audio.format</code></td><td><code>mp3</code></td><td><code>mp3</code>, <code>wav</code></td><td>Audio format (mp3 needs ffmpeg)</td></tr><tr><td><code>audio.tts_engine</code></td><td><code>mlx</code></td><td><code>mlx</code>, <code>kokoro</code>, <code>builtin</code></td><td>TTS engine (mlx = fastest on Apple Silicon)</td></tr></tbody></table>
<h3>MLX-Audio Options (preferred on Apple Silicon)</h3>
<table><thead><tr><th>Option</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><code>mlx_audio.path</code></td><td><code>~/.openclaw/tools/mlx-audio</code></td><td>mlx-audio venv location</td></tr><tr><td><code>mlx_audio.model</code></td><td><code>mlx-community/Kokoro-82M-bf16</code></td><td>MLX model to use</td></tr><tr><td><code>mlx_audio.voice</code></td><td><code>af_heart</code></td><td>Voice preset (used if no voice_blend)</td></tr><tr><td><code>mlx_audio.voice_blend</code></td><td><code>{af_heart: 0.6, af_sky: 0.4}</code></td><td>Custom voice mix (weighted blend)</td></tr><tr><td><code>mlx_audio.lang_code</code></td><td><code>a</code></td><td>Language code (a=US English)</td></tr><tr><td><code>mlx_audio.speed</code></td><td><code>1.05</code></td><td>Playback speed (1.0 = normal, 1.05 = 5% faster)</td></tr></tbody></table>
<h3>Kokoro PyTorch Options (fallback)</h3>
<table><thead><tr><th>Option</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><code>kokoro.path</code></td><td><code>~/.openclaw/tools/kokoro</code></td><td>Kokoro repo location</td></tr><tr><td><code>kokoro.voice_blend</code></td><td><code>{af_heart: 0.6, af_sky: 0.4}</code></td><td>Custom voice mix</td></tr><tr><td><code>kokoro.speed</code></td><td><code>1.05</code></td><td>Playback speed (1.0 = normal, 1.05 = 5% faster)</td></tr></tbody></table>
<h3>Processing Options</h3>
<table><thead><tr><th>Option</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><code>processing.subagent_timeout</code></td><td><code>600</code></td><td>Seconds for sub-agent (increase for long videos)</td></tr><tr><td><code>processing.cleanup_temp_files</code></td><td><code>true</code></td><td>Remove /tmp files after completion</td></tr></tbody></table>
<h3>Comment Options</h3>
<table><thead><tr><th>Option</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><code>comments.max_count</code></td><td><code>50</code></td><td>Number of comments to fetch</td></tr><tr><td><code>comments.timeout</code></td><td><code>90</code></td><td>Timeout for comment fetching (seconds)</td></tr></tbody></table>
<h3>Queue Options</h3>
<table><thead><tr><th>Option</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><code>queue.stale_minutes</code></td><td><code>30</code></td><td>Consider a processing job stale after this many minutes</td></tr></tbody></table>
<h2>Output Structure</h2>
<pre><code>~/Documents/TubeScribe/
├── {Video Title}.html         # Formatted document (or .docx / .md)
└── {Video Title}_summary.mp3  # Audio summary (or .wav)
</code></pre>
<p>After generation, opens the folder (not individual files) so you can access everything.</p>
<h2>Dependencies</h2>
<p><strong>Required:</strong></p>
<ul>
<li><code>summarize</code> CLI — <code>brew install steipete/tap/summarize</code></li>
<li>Python 3.8+</li>
</ul>
<p><strong>Optional (better quality):</strong></p>
<ul>
<li><code>pandoc</code> — DOCX output: <code>brew install pandoc</code></li>
<li><code>ffmpeg</code> — MP3 audio: <code>brew install ffmpeg</code></li>
<li><code>yt-dlp</code> — YouTube comments: <code>brew install yt-dlp</code></li>
<li>mlx-audio — Fastest TTS on Apple Silicon: <code>pip install mlx-audio</code> (uses MLX backend for Kokoro)</li>
<li>Kokoro TTS — PyTorch fallback: see <a href="https://github.com/hexgrad/kokoro">https://github.com/hexgrad/kokoro</a></li>
</ul>
<h3>yt-dlp Search Paths</h3>
<p>TubeScribe checks these locations (in order):</p>
<table><thead><tr><th>Priority</th><th>Path</th><th>Source</th></tr></thead><tbody><tr><td>1</td><td><code>which yt-dlp</code></td><td>System PATH</td></tr><tr><td>2</td><td><code>/opt/homebrew/bin/yt-dlp</code></td><td>Homebrew (Apple Silicon)</td></tr><tr><td>3</td><td><code>/usr/local/bin/yt-dlp</code></td><td>Homebrew (Intel) / Linux</td></tr><tr><td>4</td><td><code>~/.local/bin/yt-dlp</code></td><td>pip install --user</td></tr><tr><td>5</td><td><code>~/.local/pipx/venvs/yt-dlp/bin/yt-dlp</code></td><td>pipx</td></tr><tr><td>6</td><td><code>~/.openclaw/tools/yt-dlp/yt-dlp</code></td><td>TubeScribe auto-install</td></tr></tbody></table>
<p>If not found, setup downloads a standalone binary to the tools directory.
The tools directory version doesn&#x27;t conflict with system installations.</p>
<h2>Queue Handling</h2>
<p>When user sends multiple YouTube URLs while one is processing:</p>
<h3>Check Before Starting</h3>
<pre><code class="language-bash">python skills/tubescribe/scripts/tubescribe.py --queue-status
</code></pre>
<h3>If Already Processing</h3>
<pre><code class="language-bash"># Add to queue instead of starting parallel processing
python skills/tubescribe/scripts/tubescribe.py --queue-add &quot;NEW_URL&quot;
# → Replies: &quot;📋 Added to queue (position 2)&quot;
</code></pre>
<h3>After Completion</h3>
<pre><code class="language-bash"># Check if more in queue
python skills/tubescribe/scripts/tubescribe.py --queue-next
# → Automatically pops and processes next URL
</code></pre>
<h3>Queue Commands</h3>
<table><thead><tr><th>Command</th><th>Description</th></tr></thead><tbody><tr><td><code>--queue-status</code></td><td>Show what&#x27;s processing + queued items</td></tr><tr><td><code>--queue-add URL</code></td><td>Add URL to queue</td></tr><tr><td><code>--queue-next</code></td><td>Process next item from queue</td></tr><tr><td><code>--queue-clear</code></td><td>Clear entire queue</td></tr></tbody></table>
<h3>Batch Processing (multiple URLs at once)</h3>
<pre><code class="language-bash">python skills/tubescribe/scripts/tubescribe.py url1 url2 url3
</code></pre>
<p>Processes all URLs sequentially with a summary at the end.</p>
<h2>Error Handling</h2>
<p>The script detects and reports these errors with clear messages:</p>
<table><thead><tr><th>Error</th><th>Message</th></tr></thead><tbody><tr><td>Invalid URL</td><td>❌ Not a valid YouTube URL</td></tr><tr><td>Private video</td><td>❌ Video is private — can&#x27;t access</td></tr><tr><td>Video removed</td><td>❌ Video not found or removed</td></tr><tr><td>No captions</td><td>❌ No captions available for this video</td></tr><tr><td>Age-restricted</td><td>❌ Age-restricted video — can&#x27;t access without login</td></tr><tr><td>Region-blocked</td><td>❌ Video blocked in your region</td></tr><tr><td>Live stream</td><td>❌ Live streams not supported — wait until it ends</td></tr><tr><td>Network error</td><td>❌ Network error — check your connection</td></tr><tr><td>Timeout</td><td>❌ Request timed out — try again later</td></tr></tbody></table>
<p>When an error occurs, report it to the user and don&#x27;t proceed with that video.</p>
<h2>Tips</h2>
<ul>
<li>For long videos (&gt;30 min), increase sub-agent timeout to 900s</li>
<li>Speaker detection works best with clear interview/podcast formats</li>
<li>Single-speaker videos (tutorials, lectures) skip speaker labels automatically</li>
<li>Timestamps link directly to YouTube at that moment</li>
<li>Use batch mode for multiple videos: <code>tubescribe url1 url2 url3</code></li>
</ul>

TubeScribe

AI Skill Market Insights

Be Part of the 0+ Developer Community

TubeScribe 🎬

💸 Free & No Paid APIs

✨ Features

🎬 Works With Any Video

Quick Start

First-Time Setup

Full Workflow (Single Sub-Agent)

Step 1: Extract

Quick Start

Manual Installation

TEAR & SHARE

Tags

plan-design-review

design-review

design-html

design-shotgun

design-consultation

Channels

Learn

Compare

Company