Speech To Text
Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation,...
Transcribe audio to text via inference.sh CLI.

curl -fsSL https://cli.inference.sh | sh && infsh login
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'
Install note: The install script only detects your OS/architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. No elevated permissions or background processes. Manual install & verification available.
| Model | App ID | Best For |
|---|---|---|
| Fast Whisper V3 | `infsh/fast-whisper-large-v3` | Fast transcription |
| Whisper V3 Large | `infsh/whisper-v3-large` | Highest accuracy |
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://meeting.mp3"}'
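When the command above is driven from a script rather than typed by hand, building the argument list programmatically avoids shell-quoting mistakes in the JSON input. The following is a minimal sketch, not an official client: the helper names `build_run_command` and `run_app` are hypothetical, and it assumes the CLI prints its JSON result to stdout (the redirects to `.json` files elsewhere in this document suggest this, but do not confirm it).

```python
import json
import subprocess


def build_run_command(app_id, payload):
    """Return the argv list for `infsh app run <app_id> --input <json>`."""
    # json.dumps produces the JSON string, so no manual quoting is needed.
    return ["infsh", "app", "run", app_id, "--input", json.dumps(payload)]


def run_app(app_id, payload):
    """Run the app and parse stdout as JSON.

    Assumption: the CLI writes only the JSON result to stdout.
    """
    result = subprocess.run(
        build_run_command(app_id, payload),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


# Example (requires the CLI to be installed and logged in):
# transcript = run_app("infsh/fast-whisper-large-v3",
#                      {"audio_url": "https://meeting.mp3"})
```

Passing the arguments as a list, rather than a single shell string, means the JSON payload reaches the CLI unchanged even if a URL contains quotes or spaces.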
infsh app sample infsh/fast-whisper-large-v3 --save input.json
{
  "audio_url": "https://podcast.mp3",
  "timestamps": true
}
infsh app run infsh/fast-whisper-large-v3 --input input.json
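An input file like the one above can also be generated from a script, which helps when transcribing many files. This sketch uses only the fields that appear in this document (`audio_url`, `timestamps`, `task`). The function name `write_input_file` is hypothetical, and whether each app accepts every field is an assumption to check against the output of `infsh app sample`.

```python
import json


def write_input_file(path, audio_url, timestamps=False, task=None):
    """Write an input file using only fields shown in this document."""
    payload = {"audio_url": audio_url}
    if timestamps:
        payload["timestamps"] = True
    if task is not None:
        payload["task"] = task  # e.g. "translate"
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    return payload


# Example:
# write_input_file("input.json", "https://podcast.mp3", timestamps=True)
```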
infsh app run infsh/whisper-v3-large --input '{ "audio_url": "https://french-audio.mp3", "task": "translate" }'
# Extract audio from video first
infsh app run infsh/video-audio-extractor --input '{"video_url": "https://video.mp4"}' > audio.json
# Transcribe the extracted audio
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "<audio-url>"}'
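The `<audio-url>` placeholder has to be filled from `audio.json`, but the extractor's output schema is not documented here. As a hedged sketch, the hypothetical helper below avoids guessing a field name and instead searches the parsed JSON for the first string that looks like an http(s) URL. It assumes `audio.json` is valid JSON that contains such a URL somewhere.

```python
import json


def find_first_url(value):
    """Depth-first search for the first http(s) string in parsed JSON."""
    if isinstance(value, str):
        return value if value.startswith(("http://", "https://")) else None
    if isinstance(value, dict):
        children = value.values()
    elif isinstance(value, list):
        children = value
    else:
        return None  # numbers, booleans, and null carry no URL
    for child in children:
        found = find_first_url(child)
        if found is not None:
            return found
    return None


# Example:
# with open("audio.json", encoding="utf-8") as handle:
#     audio_url = find_first_url(json.load(handle))
```

If the output contains more than one URL, inspect the file and read the relevant field directly instead of relying on search order.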
# 1. Transcribe video audio
infsh app run infsh/fast-whisper-large-v3 --input '{ "audio_url": "https://video.mp4", "timestamps": true }' > transcript.json
# 2. Use transcript for captions
infsh app run infsh/caption-videos --input '{ "video_url": "https://video.mp4", "captions": "<transcript-from-step-1>" }'
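For captioning tools that expect a subtitle file, the timestamped segments in `transcript.json` can be converted to SubRip (SRT) text. This is an illustrative sketch only: it assumes each segment is an object with `start` and `end` in seconds and a `text` string, which this document does not confirm, and it makes no claim about which format `caption-videos` accepts.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Convert segments (assumed keys: start, end, text) to SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Each SRT cue: a counter, a time range, then the caption text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Example:
# import json
# with open("transcript.json", encoding="utf-8") as handle:
#     print(segments_to_srt(json.load(handle)["segments"]))
```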
Whisper supports 99+ languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and many more.
Returns JSON with:
- text: Full transcription
- segments: Timestamped segments (if requested)
- language: Detected language

# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@inference-sh
# Text-to-speech (reverse direction)
npx skills add inference-sh/skills@text-to-speech
# Video generation (add captions)
npx skills add inference-sh/skills@ai-video-generation
# AI avatars (lipsync with transcripts)
npx skills add inference-sh/skills@ai-avatar-video
Browse all audio apps:
infsh app list --category audio