wjs-syncing-multicam
Use when the user has 2+ video / audio recordings of the same event captured by different devices (cameras, phones, separate audio recorders) and wants them aligned to a single common timeline. Output
`.sync.json` sidecar per input; original files are never re-encoded. Triggers: "多机位同步" (multi-camera sync), "对齐这几个机位" (align these camera angles), "match camera timelines", "sync these angles", "audio drift between cameras", "separate audio recorder", "Riverside / Zoom recording that needs to line up".

Compute a single time offset for each multi-source recording of the same event using audio cross-correlation, and emit a `.sync.json` sidecar next to each original. Originals are never modified, copied, or re-encoded. Downstream tools use `-itsoffset` to apply the offset at consume time.
Earlier versions of this skill produced `*_synced.MOV` files by trimming and re-encoding to bake the offset into the file. We removed that:

- `_synced.MOV` generation took 10+ min per file on Apple Silicon; sidecar emission takes seconds.

How it works:

- Correlate log-energy envelopes rather than raw PCM. Raw PCM cross-correlation gives weak peaks and false matches when the two mics have different gain or room response, which is almost always the case with a secondary cam. The log-energy envelope captures dialogue and music dynamics, which both mics hear regardless of frequency response. Don't skip the envelope step; it's the entire reason this skill is robust at low SNR.
- Fit the drift. A linear fit `delta(t) = slope·t + intercept` reveals real clock drift (5–50 ppm typical). Use the midpoint-canonical offset (`slope · midpoint + intercept`) so residual error is symmetric around zero.
- Compute the overlap window: `overlap = [max(0, delta), min(ref_dur, delta + src_dur)]`.
- Emit a `.sync.json` sidecar next to each non-reference input. No file is copied, trimmed, or re-encoded. The reference input gets a sidecar too (with `delta_seconds: 0`) so downstream code can treat all inputs uniformly.

`scripts/sync.py` is the implementation. Note: the current script still emits `_synced.MOV` files alongside the sidecar. That path is deprecated; the sidecar is the only authoritative output.
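To make the envelope step concrete, here is a minimal sketch of correlating log-energy envelopes instead of raw samples. It is not the code in `scripts/sync.py`: the function names, the non-overlapping framing, and the brute-force lag search (a real implementation would more likely use an FFT-based correlation) are all assumptions made for illustration.

```python
import math


def log_energy_envelope(samples, frame_len):
    """Log of the mean-square energy of each non-overlapping frame."""
    env = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        env.append(math.log(energy + 1e-10))  # epsilon avoids log(0) on silence
    return env


def pearson(xs, ys):
    """Pearson correlation; a constant gain difference between mics cancels out."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_lag(ref_env, src_env, max_lag):
    """Lag in frames at which src_env best matches ref_env.

    A positive lag means the source starts after the reference.
    """
    best, best_score = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(ref_env[i + lag], s) for i, s in enumerate(src_env)
                 if 0 <= i + lag < len(ref_env)]
        if len(pairs) < 2:
            continue
        score = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if score > best_score:
            best, best_score = lag, score
    return best
```

Under these assumptions the offset in seconds would be `lag * frame_len / sample_rate`. Because a gain change only adds a constant to the log envelope, the correlation peak is unaffected by one mic being quieter than the other.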
Sidecar (`<input>.sync.json`)

One sidecar per original input, written next to it. Pure JSON with no in-file comments; the field reference below is canonical.
```json
{
  "_about": "Sync metadata for cam_b.MOV. Apply via ffmpeg -itsoffset. See wjs-syncing-multicam SKILL.md for full schema.",
  "schema_version": 1,
  "source": "cam_b.MOV",
  "reference": "cam_a.MOV",
  "delta_seconds": 12.345,
  "drift_slope": 1.8e-5,
  "overlap_in_reference": [12.345, 4512.180],
  "overlap_in_source": [0.000, 4499.835],
  "verification": {
    "median_residual_ms": 4.2,
    "residual_spread_ms": 11.8,
    "probe_count": 24
  }
}
```
| Field | Type | Meaning |
|---|---|---|
| `_about` | string | Human-readable one-liner. Includes pointer back to this SKILL.md. Always present. |
| `schema_version` | int | Bumps on any breaking change to this schema. Current: `1`. |
| `source` | string | Filename of the original this sidecar describes. Relative to the sidecar's directory. Never points to a re-encoded file. |
| `reference` | string | The input whose timeline we're aligned to. Reference's own sidecar lists itself here. |
| `delta_seconds` | float | The source's `t=0` expressed in the reference's timeline. If positive, source starts after reference; pass to ffmpeg as `-itsoffset <delta_seconds>`. Can be negative (source starts before reference, e.g. early-rolling camera). |
| `drift_slope` | float | Linear clock-drift slope (dimensionless, ~10⁻⁵). `0.0` means no measurable drift. Downstream applies `atempo = 1 + drift_slope` to the source ONLY for sync-sound / long-form lip-sync; for camera-cut editing, ignore. |
| `overlap_in_reference` | [float, float] (seconds) | The window during which both source and reference have coverage, expressed in the reference's timeline. Use this to trim outputs to mutually-valid time ranges. |
| `overlap_in_source` | [float, float] (seconds) | Same window expressed in the source's local timeline. In the example above, `overlap_in_reference` minus `delta_seconds`. |
| `verification` | object | Output of running `verify.py`; drives a "did sync converge?" gate. `median_residual_ms` should be a few ms; `residual_spread_ms` > 1 frame at delivery fps means drift correction was needed but skipped. |
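As an illustration of how the overlap fields relate to `delta_seconds`, the sketch below derives both windows from the offset and the two durations, using the overlap formula `[max(0, delta), min(ref_dur, delta + src_dur)]`. The function name `build_sidecar` and its parameters are hypothetical, and the `_about` and `verification` fields are left out for brevity; this is not the writer used by `scripts/sync.py`.

```python
def build_sidecar(source, reference, delta, src_dur, ref_dur, drift_slope=0.0):
    """Assemble the core sidecar fields from an offset and two durations (seconds)."""
    # Overlap in the reference's timeline.
    start = max(0.0, delta)
    end = min(ref_dur, delta + src_dur)
    if end <= start:
        raise ValueError("source and reference do not overlap")
    return {
        "schema_version": 1,
        "source": source,
        "reference": reference,
        "delta_seconds": delta,
        "drift_slope": drift_slope,
        "overlap_in_reference": [round(start, 3), round(end, 3)],
        # Same window shifted into the source's local timeline.
        "overlap_in_source": [round(start - delta, 3), round(end - delta, 3)],
    }
```

With a negative `delta` (source rolling before the reference), the reference-side window starts at `0` and the source-side window starts at `-delta`.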
`-itsoffset` is a per-input option in ffmpeg and must appear BEFORE the `-i` it applies to. Always read the source's `delta_seconds` from the sidecar:
```bash
# Play cam_b aligned to cam_a's timeline
ffmpeg -itsoffset "$(jq -r .delta_seconds cam_b.MOV.sync.json)" -i cam_b.MOV \
       -i cam_a.MOV \
       -filter_complex "[0:v][1:v]hstack" out.mp4

# Trim to mutual overlap window (read from cam_b.MOV.sync.json)
ffmpeg -ss <overlap_in_source[0]> -i cam_b.MOV -t <overlap_dur> ...
```
For `wjs-editing-multicam`, the EDL builder in `autoedit.py` ingests every `<input>.sync.json` automatically; you don't compose these flags by hand.
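For consumers written in Python rather than shell, the same per-input flags can be assembled from a sidecar. This is a sketch only: `ffmpeg_input_args` is a hypothetical helper, not part of `autoedit.py`, and it reads just the `source` and `delta_seconds` fields documented in the schema.

```python
import json
from pathlib import Path


def ffmpeg_input_args(sidecar_path):
    """Return the ffmpeg arguments for one input, with -itsoffset before -i."""
    sidecar_path = Path(sidecar_path)
    meta = json.loads(sidecar_path.read_text(encoding="utf-8"))
    # "source" is relative to the sidecar's directory, not the working directory.
    source = sidecar_path.parent / meta["source"]
    return ["-itsoffset", str(meta["delta_seconds"]), "-i", str(source)]
```

The returned list can be spliced into a full command, for example `["ffmpeg", *ffmpeg_input_args("cam_b.MOV.sync.json"), "-i", "cam_a.MOV", ...]`, and passed to `subprocess.run` as an argument list so no shell quoting is involved.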
Common case — main cams cover 75 min, a Riverside / phone / lavalier recorder only covers the middle 30 min.
`scripts/sync_partial.py REF.MOV NEW.mp4`:

- Finds where the new input's `t=0` sits in the reference timeline (`delta_seconds` may be large, e.g. `1842.5`).
- `overlap_in_reference` tells consumers exactly when this input has coverage; outside that window, fall back to the main cams.
- The `--audio-only` flag is meaningful only for hinting downstream that this source has no video stream; there's no encoding step to skip anymore.
For camera-cut editing (the common case), ±25 ms residual across an hour is below human perception: pass `drift_slope: 0.0` and use only the midpoint `delta_seconds`.

For sync-sound / lip-sync at long durations (>30 min and `verification.residual_spread_ms > 40`), downstream applies `atempo = 1 + drift_slope` to the source. Source files are still not modified; the `atempo` filter runs at consume time.
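The relationship between the linear drift model `delta(t) = slope·t + intercept` and the midpoint offset can be sketched with an ordinary least-squares fit. The names `fit_drift` and `midpoint_offset` are hypothetical, and treating the midpoint as the centre of the probed time span is an assumption; the actual probe layout in `scripts/sync.py` may differ.

```python
def fit_drift(probe_times, probe_deltas):
    """Least-squares fit of delta(t) = slope * t + intercept over per-probe offsets."""
    n = len(probe_times)
    if n < 2:
        raise ValueError("need at least two probes to fit a slope")
    mean_t = sum(probe_times) / n
    mean_d = sum(probe_deltas) / n
    sxx = sum((t - mean_t) ** 2 for t in probe_times)
    if sxx == 0:
        raise ValueError("probe times must not all be identical")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(probe_times, probe_deltas))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    return slope, intercept


def midpoint_offset(slope, intercept, t_start, t_end):
    """Offset evaluated at the centre of the span, so end residuals are symmetric."""
    midpoint = (t_start + t_end) / 2
    return slope * midpoint + intercept
```

For probes spanning 0 to 4000 s with a slope of 2e-5, the model offset differs from the midpoint value by -40 ms at the start and +40 ms at the end, which is the symmetry the midpoint-canonical offset is meant to provide.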
scripts/verify.py REF.MOV SRC.MOV SRC.sync.json re-extracts audio from BOTH originals (with -itsoffset applied to the source per the sidecar) and runs multi-probe correlation again. Writes results back into the sidecar's verification field.
Pass criteria: `median_residual_ms < 15` and `residual_spread_ms` < 1 frame at delivery fps. Fail = retry with drift correction enabled.
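A consumer could apply the pass criteria to a sidecar's `verification` object roughly as follows. `sync_converged` is a hypothetical helper, not part of `verify.py`; the one-frame threshold is computed as `1000 / delivery_fps` milliseconds.

```python
def sync_converged(verification, delivery_fps):
    """True when the sidecar's verification block meets the pass criteria."""
    frame_ms = 1000.0 / delivery_fps  # duration of one frame in milliseconds
    return (verification["median_residual_ms"] < 15
            and verification["residual_spread_ms"] < frame_ms)
```

The example sidecar (median 4.2 ms, spread 11.8 ms) passes at 30 fps, where one frame is about 33.3 ms, but would fail at 120 fps, where one frame is about 8.3 ms.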
- `-itsoffset` semantics differ for audio vs video. For sync-correctness it must be the FIRST flag for that input: `ffmpeg -i src -itsoffset X` is wrong; `ffmpeg -itsoffset X -i src` is right.
- Resolve `source` / `reference` against `Path(sidecar).parent`.
- Don't fold `drift_slope` into the sidecar's `delta_seconds`. They're separate fields for a reason: naive consumers can ignore drift, sync-sound consumers can apply it. Mixing them loses information.

License: MIT