name: wjs-translating-subtitles
description: Use when the user has an SRT (or transcript text) in one language and wants it translated to another, with punctuation-bounded re-segmentation so cues end at real sentence breaks. Simplified Chinese (zh-CN) and English (en) are first-class targets; other targets follow the same rules. Outputs a target-language SRT or bilingual SRT — no audio, no burn-in. Triggers — "翻译字幕", "翻成中文", "translate this SRT", "中英双语字幕", "把这个 SRT 翻译成 X", "bilingual subtitles".
wjs-translating-subtitles
Source-language SRT in → target-language (or bilingual) SRT out. This skill is text-only. Burn-in lives in
/wjs-burning-subtitles
; voice dub in
/wjs-dubbing-video
.
When to use
- User has an SRT in language A and wants it in language B.
- User pasted a transcript (with or without timestamps) and wants a translation that becomes an SRT.
- User has an SRT but cues end mid-sentence — this skill's re-segmentation step fixes that.
When NOT to use
- No source-language SRT yet → run
/wjs-transcribing-audio
first.
- User wants burned-in subtitles → finish translation here, then
/wjs-burning-subtitles
.
- User wants voice dub → finish translation here, then
/wjs-dubbing-video
.
Pick the target
Resolve target from the user's phrasing once, don't re-ask:
- "翻成中文 / 中文字幕 / 中文配音" →
zh-CN
.
- "translate to English / English subs / English dub" →
en
.
- "bilingual" / "双语" → produce both
.<source>.srt
and .<target>.srt
(and optionally a combined .<source>-<target>.srt
).
- Ambiguous → default to whichever the user has historically chosen in the project.
Simplified Chinese and English are fully validated. Other targets (Japanese, Korean, French, etc.) work via the same rules; the bottleneck is TTS-voice availability if dubbing follows — see
/wjs-dubbing-video
before promising.
Shared translation principles
- Prioritize meaning over literal wording.
- Use concise subtitle-style language — viewers read at ~3 wps for Chinese, ~3–4 wps for English; lines that exceed that go off-screen before they can be read.
- Preserve the tone of the speaker. Casual source → casual target; formal source → formal target.
- Do not over-translate names, brands, cultural references, or technical terms.
- Keep numbers, dates, names, and places accurate.
- If a phrase has no exact equivalent, translate the meaning naturally. No literal/word-for-word constructions.
- Avoid stiff, machine-translated output.
Translating into Simplified Chinese (zh-CN)
- Use natural spoken Mandarin for casual speech, formal Mandarin for formal speech.
- Use Simplified characters only (do NOT use Traditional Hanzi unless the user explicitly asks).
- Subtitle lines should be roughly 15 Chinese characters or fewer per line, max 2 lines per cue (3 only when unavoidable for very long cues).
- Use Chinese punctuation: 「,」「。」「;」「:」「、」「——」. Never mix English commas/periods into Chinese subtitles.
- Minimize filler demonstratives 「这」「那」「这个」「那个」「那份」「那种」「那里」「那样」. Spanish-to-Chinese (and English-to-Chinese) MT routinely inserts these because the source has overt demonstratives that Chinese usually drops. Examples:
- "这把我们带入二元世界的载体" → "把我们带入二元的载体"
- "运用那份能量" → "运用这股能量" if needed, or just "运用能量"
- "正是在这合一里" → "正是在合一中"
- "像罪人那样翻滚" → "像罪人翻滚" / "像罪人般翻滚"
- "那份精微的觉知" → "精微的觉知"
Keep them only when they carry real meaning (deixis, contrast, or fixed phrase like spiritual "我就是那" / "tat tvam asi"). Default is to delete; add back only if the sentence becomes ambiguous.
Examples (Spanish → Chinese):
Spanish: No pasa nada. → Chinese: 没关系。
Spanish: Vamos a ver qué pasa. → Chinese: 我们看看会发生什么。
Spanish: Me parece una locura. → Chinese: 我觉得这太疯狂了。
Spanish: ¿Qué quieres decir? → Chinese: 你是什么意思?
Spanish: La verdad es que no lo esperaba.
→ Chinese: 说实话,我没想到会这样。
Translating into English (en)
- Use natural conversational English. Avoid translationese ("It is precisely through entering the body…" → "It's by entering the body…").
- Lines should be roughly 40–42 characters or fewer (about 7–9 words), max 2 lines per cue. Hard cap 50 chars per line.
- Use ASCII punctuation:
,
.
;
:
—
(em-dash). Avoid Unicode curly quotes — keeps .srt
portable.
- For contemplative/spiritual content, prefer plain words over Latinate jargon: "presence" over "manifestation," "wholeness" over "totality," "wake up" over "awaken to consciousness."
Examples (Spanish → English):
Spanish: No pasa nada. → English: It's nothing.
Spanish: Vamos a ver qué pasa. → English: Let's see what happens.
Spanish: Me parece una locura. → English: This feels crazy to me.
Spanish: ¿Qué quieres decir? → English: What do you mean?
Spanish: La verdad es que no lo esperaba.
→ English: Honestly, I wasn't expecting this.
Re-segment at punctuation boundaries (mandatory)
Whisper segments by silence/breath, not grammar. The result almost always has cues that end mid-sentence (e.g., "...es una forma de aterrizar," next cue starts "el espíritu en el cuerpo..."). Any TTS that processes one cue at a time will then insert an unnatural pause exactly where the original speaker did not. The fix is mandatory before dubbing — and improves on-screen reading too.
Punctuation set differs:
- Chinese cues must end at
,
。
;
:
——
or 、
.
- English cues must end at
,
.
;
:
—
(em-dash) or, in practice for subtitles, occasionally a single dash. Never end an English cue on a comma-less clause break, and never split inside a phrase like "kind of" or "in order to".
Rules:
- Every cue must end at a real punctuation mark. Never let a cue end on a noun, verb, conjunction, or article that flows into the next cue.
- It is fine (and often necessary) to split a single source cue into 2–4 shorter cues, with timestamps interpolated by character position within the original cue's duration.
- It is fine to merge the tail of one source cue with the head of the next when they form one clause — the merged cue inherits the start of the first and the end of the second.
- Target 3–8 seconds per cue. Cues shorter than ~1.5s feel choppy on screen; cues longer than ~10s usually contain a missed punctuation break.
A typical 2–3 minute talk yields roughly 25–40 punct-bounded cues from 12–18 raw source cues. Don't try to keep the original cue count.
When TTS dubbing follows: the punctuation-bounded structure means each TTS clip is a complete utterance with proper end-intonation, and concatenating clips sounds natural because every join is at a real pause point.
SRT output rules
1
00:00:01,200 --> 00:00:04,800
中文字幕内容
2
00:00:04,800 --> 00:00:08,500
中文字幕内容
- Number subtitles sequentially starting from
1
.
- Timestamp format:
HH:MM:SS,mmm
. Comma milliseconds, never period milliseconds.
- Do not overlap timestamps.
- Preserve the original timing unless adjustment is necessary.
- Each subtitle should usually be 1–2 lines.
- If one subtitle is too long, split it into shorter subtitles when timing allows.
- Do not add commentary inside the subtitle file.
Bilingual output
When the user asks for bilingual: source on first line, target on second:
1
00:00:01,200 --> 00:00:04,800
No pasa nada.
没关系。
Rules:
- Keep source first, target second.
- Preserve timing.
- Avoid adding extra explanations unless requested.
- Keep both lines short enough to read.
Output formats
Depending on the user request, provide one or more:
- Target-only
.srt
- Bilingual
.srt
(source line + target line)
- Target transcript without timestamps
- Side-by-side source/target table
Default output for "translate this SRT" with no other modifiers: target-only .srt
+ a short uncertainty note if needed.
File naming
input.srt # source (e.g., from /wjs-transcribing-audio)
translated outputs:
input.zh-CN.srt # Simplified Chinese only
input.en.srt # English only
input.es-zh.srt # Spanish + Chinese bilingual
input.es-en.srt # Spanish + English bilingual
input.es-zh-en.srt # three-language
BCP-47-style suffixes make the target language obvious at a glance and keep multiple target-language outputs side-by-side.
Handling unclear audio markers
If the source SRT contains
[inaudible]
or
[unclear]
:
- Translate the surrounding context naturally.
- Keep the bracketed marker in the target SRT (don't invent content).
- If a
[unclear]
chunk makes a cue ungrammatical in the target language, leave it bracketed and add a note in the response (not in the SRT file).
Quality gate before handoff
- Subtitle numbers are sequential
- Timestamps are valid (
HH:MM:SS,mmm
, no overlap)
- Milliseconds use commas
- Translation is natural; speaker tone preserved
- Line length within platform/cue caps
- Proper nouns accurate
- No cue ends mid-clause / mid-phrase
- No invented content
Downstream
/wjs-burning-subtitles
— burn this SRT onto the video, or soft-mux as a togglable track.
/wjs-dubbing-video
— generate a TTS voice dub from this SRT, time-aligned to the original timing.
- For bilingual playback: most platforms can soft-mux multiple subtitle tracks, but if you need bilingual visible at once, burn the
*.source-target.srt
directly via /wjs-burning-subtitles
.
Common pitfalls
- Letting the cue end mid-sentence after translation. The source's silence-aligned cues are unsafe boundaries; re-segment at punctuation, always.
- Filler demonstratives in Chinese output. MT inserts 「这」/「那」 because the source had
eso/that
. Delete them aggressively.
- Period milliseconds. Whisper local writes
.mmm
; SRT spec is ,mmm
. Always normalize.
- Translating proper nouns. Brand names, place names, technical terms — leave as-is or use the conventional target-language version (e.g., "OpenAI" stays, "New York" → "纽约").
- Over-shortening for cue caps. If a line is genuinely longer than the cap, split into two cues with interpolated timestamps; don't drop meaning to fit the cap.
- Forgetting to do re-segmentation when no dub is requested. The punct-bounded SRT is also better for reading — line endings at natural pauses match how viewers scan. Re-segment even when burn-only.