wjs-localizing-video
Thin orchestrator for the end-to-end video localization pipeline. Routes to the four focused sub-skills — /wjs-transcribing-audio, /wjs-translating-subtitles, /wjs-dubbing-video, /wjs-burning-subtitle
Thin orchestrator for the end-to-end video localization pipeline. Routes to the four focused sub-skills — /wjs-transcribing-audio, /wjs-translating-subtitles, /wjs-dubbing-video, /wjs-burning-subtitle
Real data. Real impact.
Emerging
Developers
Per week
Open source
Skills give you superpowers. Install in 30 seconds.
This skill was split. What used to be one 1300-line catch-all is now 4 focused sub-skills. Use this orchestrator only when the user asks for the full pipeline in one request. For any individual step, route directly to the sub-skill.
| Step | Skill | IN | OUT |
|---|---|---|---|
| 1. ASR | | audio / video + source language | source-language SRT |
| 2. Translate | | source SRT + target language | target SRT (punct-bounded) |
| 3. Dub (optional) | | video + target SRT + voice | (TTS audio swapped in) |
| 4. Burn / mix (optional) | | video + SRT + (optional dub) | final MP4 (one encode) |
Steps 3 and 4 are optional and independent — most "subtitle-only" jobs stop after step 2 + soft-mux via
/wjs-burning-subtitles. Most "dub-only" jobs stop after step 3.
Invoke this skill (
) when the user asks for the whole chain in one go:wjs-localizing-video
Invoke a sub-skill directly when the user asks for one step:
/wjs-transcribing-audio/wjs-translating-subtitles/wjs-dubbing-video/wjs-burning-subtitlesThe sub-skills are more focused and document their input/output contracts tighter. Don't route through this orchestrator when a single step is the whole ask — you'll just be reading docs you don't need.
The pipeline has up to four steps and runs for several minutes. The user wants to see live progress, not a wall of silence followed by a finished file. Before invoking the first sub-skill, lay out the planned steps as a TaskCreate checklist. As you start each step, update its task to
in_progress; when it finishes, mark completed. Claude Code renders this as a checklist in the UI that ticks off in real time.
Decide from the user's ask which subset of the 4 steps will run:
| User asked for | Tasks to create |
|---|---|
| Full localization (subs + dub + final mix) | ① 转写 ② 翻译 ③ 配音 ④ 烧字幕 + 混音 |
| Subtitles only (no voice change) | ① 转写 ② 翻译 ③ 烧字幕 (or soft-mux) |
| Dub only (no burn) | ① 转写 ② 翻译 ③ 配音 |
| User already has source SRT | skip ①; create only the remaining steps |
| User already has target SRT | skip ① and ②; create only the remaining |
Don't create tasks for steps you won't run — an unchecked item at the end reads as "we forgot," not "we skipped."
Use the same labels you'll use in the "Final response template" below, so the in-flight checklist and the completion summary read as the same artifact:
转写 <source-lang> (e.g., 转写 西班牙语)翻译 → <target-lang> (e.g., 翻译 → 中文)配音 <target-lang> (e.g., 配音 中文 (高冷御姐 -8%))烧字幕 + 原声底层 <bed> (e.g., 烧字幕 + 原声底层 0.18)For
activeForm, use the present-continuous variant (转写中…, 翻译中…, 配音中…, 合成中…).
in_progress only the task currently running (one at a time).completed immediately when the sub-skill returns its output file — don't batch.in_progress so the user can see exactly where the pipeline stopped. Don't mark a half-finished step as completed.This is the original "validation scenario" — Spanish yoga/spiritual content into Chinese for 微信视频号 / 小红书. Walks through all 4 sub-skills.
Invoke /wjs-transcribing-audio with the video and `--language es`.
The sub-skill handles: chunking, word-level timestamps, cue assembly at punctuation boundaries, loop guard, retry. Output:
entrevista.srt (Spanish, source-language).
Invoke /wjs-translating-subtitles with the Spanish SRT and target `zh-CN`.
The sub-skill handles: re-segmenting cues at punctuation, minimizing filler demonstratives, capping line length, preserving speaker tone. Output:
entrevista.zh-CN.srt.
Invoke /wjs-dubbing-video with the video + entrevista.zh-CN.srt + a voice ID.
Default for mature contemplative female:
zh_female_gaolengyujie_moon_bigtts (Volcano 高冷御姐, --rate -8% +0Hz). If no Volcano credentials: zh-CN-XiaoxiaoNeural --rate -8% --pitch -10Hz. The sub-skill samples first, then commits.
Output:
entrevista_zh_dub.mp4.
Invoke /wjs-burning-subtitles with --video entrevista.mp4 --srt entrevista.zh-CN.srt --dub entrevista_zh_dub.mp4
The sub-skill handles: libass availability check, evermeet static-build fallback, Fontsize calibration, frame-check before full render, audio bed mix at 0.18.
Output:
entrevista_zh_final.mp4 (ship-ready).
These apply across the pipeline; each sub-skill enforces its own slice:
/wjs-transcribing-audio./wjs-dubbing-video./wjs-burning-subtitles./wjs-burning-subtitles.~/.claude/CLAUDE.md; applies when this pipeline is followed by /wjs-uploading-video.)input: entrevista.mp4 Chinese pipeline: source SRT entrevista.srt Chinese SRT entrevista.zh-CN.srt Chinese dub (audio only) entrevista_zh_dub.mp4 Chinese final (subs+dub) entrevista_zh_final.mp4 English pipeline: English SRT entrevista.en.srt English dub entrevista_en_dub.mp4 English final entrevista_en_final.mp4 Bilingual subtitles: Spanish + Chinese entrevista.es-zh.srt Spanish + English entrevista.es-en.srt three-language entrevista.es-zh-en.srt
BCP-47-style suffixes keep multiple target-language outputs side-by-side and make the target obvious at a glance.
dub.py lives in /wjs-dubbing-video/scripts/, render.py lives in /wjs-burning-subtitles/scripts/, visual_diarize.py lives in /wjs-dubbing-video/scripts/. If you see stale copies under wjs-localizing-video/scripts/, prefer the canonical sub-skill locations.When the full pipeline completes, respond briefly in the user's language. Match the original "Done" template the user is used to:
已完成: - 西班牙语转写 (/wjs-transcribing-audio) - 中文翻译 (/wjs-translating-subtitles) - 中文配音 (/wjs-dubbing-video, voice: 高冷御姐 -8%) - 烧入字幕 + 原声底层 0.18 (/wjs-burning-subtitles) 输出: - entrevista.srt - entrevista.zh-CN.srt - entrevista_zh_dub.mp4 - entrevista_zh_final.mp4 不确定片段: - 00:01:23–00:01:26 背景噪音较大,原文可能不完全准确。
If there are no uncertain parts, drop the second list.
/wjs-segmenting-video — cut long-form video into stand-alone short clips (uses its own SRT slicer; orthogonal to this pipeline)./wjs-overlaying-video — HTML/CSS captions on a clip via HyperFrames. Don't combine with /wjs-burning-subtitles — pick one caption system per output./wjs-uploading-video — push the final MP4 to YouTube./lark-minutes — alternate Chinese-only transcript path via 飞书妙记 when local 豆包 ASR isn't wired up.MIT
No automatic installation available. Please visit the source repository for installation instructions.
View Installation Instructions1,500+ AI skills, agents & workflows. Install in 30 seconds. Part of the Torly.ai family.
© 2026 Torly.ai. All rights reserved.