wjs-reframing-video
Use when the user wants to convert a video between horizontal and vertical orientations while preserving the inverted aspect ratio (16:9 ↔ 9:16, 4:3 ↔ 3:4, 21:9 ↔ 9:21). The skill crops a narrow band
Use when the user wants to convert a video between horizontal and vertical orientations while preserving the inverted aspect ratio (16:9 ↔ 9:16, 4:3 ↔ 3:4, 21:9 ↔ 9:21). The skill crops a narrow band
Real data. Real impact.
Emerging
Developers
Per week
Open source
Skills give you superpowers. Install in 30 seconds.
Convert a video's orientation by cropping a narrow band from the source — not by physically rotating it. The crop window follows the active speaker (the face whose mouth is moving), not just the largest or most-confident face. A
.crop.json sidecar records the crop plan, the per-segment speaker decisions, and the parameters used. The original input is never modified.
The output aspect is the source aspect with width and height swapped — 16:9 → 9:16, not "letterboxed 16:9 in a 9:16 frame".
| Is | Is not |
|---|---|
| Visual active-speaker detection via MAR (mouth-aspect-ratio) variance | Audio-visual fusion (audio energy + lip motion cross-correlated) |
| Stable face tracking across frames by center-distance matching | Re-identification across long gaps / occlusions |
| Speaker-aligned segments with hysteresis to prevent flicker | Frame-by-frame switching on every flicker |
(default) — pick whoever's mouth is moving | (opt-in legacy) — pick largest face |
Hard cuts between segments, fixed crop within each segment (, default) | Smooth panning that drifts during a speaker's turn (opt-in ) |
| Audio stream-copy (bit-exact) | Audio reprocessing / re-encoding |
MediaPipe Tasks (478-pt mesh) at 5 fps sampled via ffmpeg | Per-frame neural inpainting / out-painting |
One pass | Frame-by-frame Python compositor |
Falls back to "largest face" automatically when no one is talking (silence, music-only stretches).
pip install mediapipe opencv-python numpy
(MediaPipe lives outside the standard Python distribution; ffmpeg and ffprobe must be on
PATH.)
First-run model download: MediaPipe 0.10+ uses the Tasks API, which needs a
face_landmarker.task model file (~4 MB). On the first call, crop.py downloads it to ~/.claude/skills/wjs-reframing-video/models/ and caches it for subsequent runs. The script fails offline on first run.
Range limitation: The bundled landmarker is tuned for faces within ~2 m of the camera (selfie / podcast / interview distance). Wide event shots with small faces may not detect — sample a frame first to confirm.
Source aspect =
W / H. Target aspect = H / W (inverted). Compute crop window:
| Source orientation | Crop window |
|---|---|
| Horizontal (W > H) → Portrait | , (narrow vertical band) |
| Portrait (W < H) → Horizontal | , (narrow horizontal band) |
For 1920×1080 → portrait,
W_crop = 608, H_crop = 1080. Final scale to 1080×1920 (upscale ~1.78×).
For 1080×1920 → landscape, W_crop = 1080, H_crop = 608. Final scale to 1920×1080.
Override the final size via
--output-size 1080x1920 if you want native crop dimensions instead of upscaling.
--target portrait|landscape to override).--sample-fps (default 5; high enough to catch mouth motion — Nyquist for speech is ~10 Hz, we need at least 4–5 fps).FaceLandmarker (478 landmarks). For each detected face record: center, size proxy, MAR (mouth-aspect-ratio = inner-lip vertical distance / horizontal mouth-corner distance).face_id.--mar-var-window-sec, default 1 s). The face with the highest variance is "speaking". Below --mar-var-threshold, no one is speaking → fall back to largest face.--min-segment-sec (default 1.5 s). Shorter flickers are squashed — prevents the crop from ping-ponging on a one-frame mis-detection.--motion cut, default) that holds each segment's crop position constant and jumps instantly at each segment boundary — the visual feel of a real cut between camera angles. (--motion smooth switches to piecewise-linear pan between segment midpoints; rarely the right call for talking-head content because the camera appears to drift mid-sentence.)crop=W:H:x='expr':y='expr', scale=OUT_W:OUT_H. The crop filter evaluates x and y per frame natively. Audio stream-copied.scripts/crop.py is the implementation. Output side effects:
<input>.crop.json — sidecar with the crop plan<input>_cropped.mp4 — final cropped + scaled video<input>.crop.json){ "_about": "wjs-reframing-video crop plan for cam_a.MOV. Active-speaker detected via MAR variance.", "_help": { "source_size": "[width, height] in pixels.", "target_size": "[width, height] of the final rendered output.", "crop_window": "[width, height] of the moving crop in source coords.", "chunks": "Speaker-aligned segments: {t0, t1, cx, cy, speaker_id}.", "face_pick_mode": "speaker = MAR-variance active-speaker; largest = old behavior.", "speaker_id": "Stable face track id. null means no face / silence fallback." }, "schema_version": 2, "source": "cam_a.MOV", "source_size": [1920, 1080], "target": "portrait", "target_size": [1080, 1920], "crop_window": [608, 1080], "face_pick_mode": "speaker", "sample_fps": 5.0, "mar_var_window_sec": 1.0, "mar_var_threshold": 1.5e-4, "min_segment_sec": 1.5, "chunks": [ {"t0": 0.0, "t1": 4.2, "cx": 808, "cy": 540, "speaker_id": 0}, {"t0": 4.2, "t1": 11.6, "cx": 1182, "cy": 540, "speaker_id": 1}, {"t0": 11.6, "t1": 14.0, "cx": 808, "cy": 540, "speaker_id": 0} ], "face_sample_count": 1234, "track_count": 2 }
--sample-fps makes detection slower but tracking more responsive.hevc_videotoolbox on macOS). Often <1× realtime for a 1080p source.face#N: Xs on screen (Y%) summary) and re-run with a different --mar-var-threshold if needed.--face-pick largest.ffmpeg lenscorrection first.608×1080 → 1080×1920 is a 1.78× upscale and visible on sharp text. Render at native crop dims (--output-size 608x1080) and let the platform upscale, if you have overlays you want to keep sharp.--bitrate 12M. WeChat Channels (视频号) caps at 10 Mbps; pass --bitrate 8M for that target.MIT
No automatic installation available. Please visit the source repository for installation instructions.
View Installation Instructions1,500+ AI skills, agents & workflows. Install in 30 seconds. Part of the Torly.ai family.
© 2026 Torly.ai. All rights reserved.