The 15-Skill Video Pipeline: How One Engineer Built an End-to-End Editor in a Night
Wang Jianshuo shipped a Chinese-podcast-to-YouTube-Shorts pipeline as 15 single-purpose Claude Code skills. The interesting part isn't the result — it's the design.
Wang Jianshuo (王建硕) — early-internet veteran, longtime blogger, ex-Kijiji China lead — published 15 skills last week that, taken together, are a complete video-editing pipeline. Each skill does exactly one thing. Chained together, they take a raw multi-hour recording and produce upload-ready Spanish-dubbed Shorts in a single overnight run.
What he did is interesting. How he structured it is more interesting.
The Workflow He Actually Ran
After recording a long-form podcast with Ren Xin, Wang chained the skills in exactly this order, typing the request to Claude Code as one plain-language sentence:
Turn this recording into subtitles → cut it into four short clips → reframe the second one to vertical → add covers and animated captions to all of them → dub the AI-education segment in Spanish → upload everything to YouTube.
Five hours of unattended rendering later (he slept), he woke up to the finished deliverables. The repo is at jianshuo/claude-skills.
The Five Groups
Wang split the 15 skills into five thematic groups. Each group is independently useful; the seams between them are deliberate.
Group 1 — Localize a video (5 skills)
Turn a recording in one language into a polished version in another, end to end.
- wjs-transcribing-audio — audio/video in, timestamped SRT out. Chinese routes through Volcano (Doubao) ASR; everything else uses Whisper.
- wjs-translating-subtitles — SRT in, translated SRT out, with smart resegmentation so lines break at natural punctuation, not mid-clause.
- wjs-dubbing-video — translated SRT plus the source video, TTS dub aligned to the original timing.
- wjs-burning-subtitles — final compositing: subtitles in-frame, dubbed audio over a low-volume original bed, one encode.
- wjs-localizing-video — orchestrator. Calls the four above in order. This is what you actually invoke.
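The "smart resegmentation" idea in wjs-translating-subtitles can be pictured with a small sketch. This is not Wang's implementation: the function name `resegment`, the 42-character limit, and the punctuation set are all assumptions chosen for illustration. It splits a translated sentence into subtitle lines, preferring a break at the last punctuation mark that fits within the limit and falling back to a word boundary only when no punctuation is available.

```python
import re

# Assumed set of "natural" break characters (Latin and CJK punctuation).
BREAK_CHARS = ",.;:!?，。；：！？"


def resegment(text: str, max_chars: int = 42) -> list[str]:
    """Split text into lines of at most max_chars, preferring punctuation breaks."""
    text = re.sub(r"\s+", " ", text).strip()
    lines = []
    while len(text) > max_chars:
        window = text[:max_chars]
        # Prefer the last punctuation mark inside the window.
        cut = max(window.rfind(ch) for ch in BREAK_CHARS) + 1
        if cut == 0:
            # No punctuation fits: fall back to the last space.
            cut = window.rfind(" ")
        if cut <= 0:
            # No space either (e.g. unspaced CJK text): hard cut at the limit.
            cut = max_chars
        lines.append(text[:cut].strip())
        text = text[cut:].strip()
    if text:
        lines.append(text)
    return lines
```

Calling `resegment("When the name describes one action, the responsibility is obvious, and so are the seams.")` breaks after each comma rather than mid-clause. A real skill would also have to redistribute SRT timestamps across the new lines, which this sketch leaves out.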
Group 2 — Long recording → publishable shorts (3 skills)
- wjs-segmenting-video — takes a multi-hour interview and produces 3-6 self-contained short clips by topic. It only segments — composition is downstream.
- wjs-overlaying-video — post-production: AI-generated cover art, kinetic captions that follow the SRT, animated transitions at key moments, chapter tags, end-card CTAs. Built on top of Hyperframe.
- wjs-reframing-video — landscape ↔ portrait, but it tracks the speaker by detecting mouth motion rather than centroid-cropping. When two people are on-screen, the crop follows whoever's talking.
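To make the speaker-following crop in wjs-reframing-video concrete, here is a minimal sketch for the landscape-to-portrait case. It assumes some upstream detector has already produced, for each frame, a list of faces with a horizontal centre and a mouth-motion score; the `Face` type, the `crop_offsets` function, and the smoothing factor are hypothetical and not taken from the repo. The crop window is centred on whichever face moves its mouth most, clamped to the frame, and smoothed so it glides rather than jumps.

```python
from dataclasses import dataclass


@dataclass
class Face:
    center_x: float      # horizontal centre of the face, in pixels
    mouth_motion: float  # hypothetical per-frame mouth-movement score


def crop_offsets(frames, frame_w, frame_h, smoothing=0.8):
    """Return the left edge of a 9:16 crop window for each frame.

    frames is a list with one list of Face objects per video frame.
    Assumes a landscape source, i.e. frame_w >= frame_h * 9 / 16.
    """
    crop_w = frame_h * 9 / 16
    max_left = frame_w - crop_w
    left = max_left / 2  # start centred
    offsets = []
    for faces in frames:
        if faces:
            # The active speaker is taken to be the face with the most mouth motion.
            speaker = max(faces, key=lambda f: f.mouth_motion)
            target = speaker.center_x - crop_w / 2
            target = min(max(target, 0.0), max_left)  # keep the window inside the frame
            # Exponential smoothing so the crop glides instead of jumping.
            left = smoothing * left + (1 - smoothing) * target
        # With no faces detected, the previous position is held.
        offsets.append(left)
    return offsets
```

The contrast with centroid cropping is the `max(..., key=mouth_motion)` line: a centroid crop would average the face positions and can end up framing the empty space between two speakers.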
Group 3 — Multicam (2 skills)
The hardest part of video editing, finally tractable.
- wjs-syncing-multicam — N camera angles + audio tracks → one .sync.json timeline alignment file. Doesn't touch the source media.
- wjs-editing-multicam — synced angles → single final video, auto-cutting between cameras based on audio level, with optional picture-in-picture.
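Auto-cutting on audio level can be sketched in a few lines. The version below is an illustration under stated assumptions, not the logic of wjs-editing-multicam: it assumes each camera has its own microphone, that loudness has already been measured per fixed time window (for example RMS per second), and that a hypothetical `min_hold` parameter sets the shortest allowed shot.

```python
def pick_angles(levels, min_hold=3):
    """Choose one camera index per time window.

    levels[c][t] is the loudness of camera c's microphone in window t.
    A cut is only allowed after the current angle has been held for
    at least min_hold windows, which avoids rapid flicker.
    """
    n_windows = len(levels[0])
    if n_windows == 0:
        return []
    cameras = range(len(levels))
    # Open on whichever camera is loudest in the first window.
    current = max(cameras, key=lambda c: levels[c][0])
    held = 0
    choice = []
    for t in range(n_windows):
        loudest = max(cameras, key=lambda c: levels[c][t])
        if loudest != current and held >= min_hold:
            current, held = loudest, 0
        choice.append(current)
        held += 1
    return choice
```

The hold counter is the design choice that matters: without it, every cough or "mm-hm" from the other speaker would trigger a cut.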
Group 4 — Distribution (3 skills)
The unsexy work that kills most pipelines.
- wjs-uploading-video — batch YouTube upload. Title/description/tags come from an UPLOAD_META.md file. No web UI.
- wjs-publishing-wechat — WeChat official-account article generation: copy polish, header image, inline explanatory figures, backend-upload-ready. (Wang notes this very skill wrote his original announcement post.)
- wjs-promoting-skills — researches how other authors market their skills, generates a promo plan, posts to X.
Group 5 — Off-piste utilities (2 skills)
- wjs-auditing-project — "something's off but I can't say what" diagnostic. Lists unmerged branches, stuck PRs, failed CI, plan-vs-reality drift. Read-only by default; user opts in to fixes.
- wjs-eating-and-growing — Wang's favorite. A 吃一堑长一智 ("learn from each setback") reflection skill. He says it's been the most personally useful of the lot.
What's Actually New Here
Plenty of people have built video-editing automations. Plenty of people have used Claude Code. The unusual move is the granularity of the decomposition.
Every skill is named with a present-participle verb: transcribing, translating, dubbing, burning, segmenting, overlaying, reframing. The naming is the API. When the name describes one action, the skill's responsibility surface becomes obvious — and so do the seams where the next skill picks up.
This sounds like trivial advice until you compare against the average skill in the wild, which is named something like "video-pipeline" or "auto-editor" — names that promise everything and constrain nothing. Those skills accumulate features. Wang's don't.
Where The Decomposition Pays Off
Two places, immediately.
1. Mix and match for a specific job. Wang's example flow used 6 of the 15 skills. The other 9 didn't run. That only works because each skill terminates cleanly with a well-defined artifact (an SRT, a .sync.json, an UPLOAD_META.md). When skills do too many things, you can't peel them apart.
2. Reuse outside video. wjs-auditing-project and wjs-eating-and-growing have nothing to do with video. They're in this repo because they emerged from the same project hygiene Wang was practicing while building the video stack. They're as installable as the rest.
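The mix-and-match property can be checked mechanically once each skill is described by the artifacts it consumes and produces. The sketch below is a toy model: the artifact labels ("video", "srt", "clips", "upload_meta") and the input/output sets assigned to each skill are my reading of the descriptions above, not a specification from the repo, and `validate_chain` is a hypothetical helper.

```python
# Assumed artifact kinds per skill: (inputs required, outputs produced).
SKILLS = {
    "wjs-transcribing-audio": ({"video"}, {"srt"}),
    "wjs-segmenting-video": ({"video", "srt"}, {"clips"}),
    "wjs-reframing-video": ({"clips"}, {"clips"}),
    "wjs-overlaying-video": ({"clips", "srt"}, {"clips"}),
    "wjs-uploading-video": ({"clips", "upload_meta"}, set()),
}


def validate_chain(chain, available):
    """Return the first skill whose inputs are missing, or None if the chain is valid."""
    have = set(available)
    for name in chain:
        needs, makes = SKILLS[name]
        if not needs <= have:
            return name  # this skill cannot run with the artifacts produced so far
        have |= makes
    return None
```

Under these assumptions, a chain that starts from a raw video and an UPLOAD_META.md validates from transcription through upload, while running wjs-segmenting-video on a bare video fails immediately because no SRT exists yet. A monolithic "auto-editor" skill offers no such intermediate artifacts to check against.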
Practical Takeaway
If you publish video content in any language — long-form podcasts, interviews, conference talks — this is the most complete open-source pipeline I've seen for getting it to YouTube + Shorts + WeChat with minimal hand-editing. The localization group alone is worth the install.
Install any of them in 30 seconds via aiskill.market or directly:
claude plugin marketplace add jianshuo/claude-skills
claude plugin install wjs-localizing-video
The deeper lesson — why this many small skills works better than three big ones — is the subject of the next post.
Part of the AI Skill Daily series — skills worth understanding, one at a time.