Name: Speech To Text
Availability: InStock
Author: okaris

Speech-to-Text

Transcribe audio to text via inference.sh CLI.

Quick Start

curl -fsSL https://cli.inference.sh | sh && infsh login
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'

Install note: The install script only detects your OS/architecture, downloads the matching binary from dist.inference.sh, and verifies its SHA-256 checksum. It needs no elevated permissions and starts no background processes. Manual install and verification are also available.
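For readers who prefer to check a download by hand, the checksum step mentioned in the install note amounts to comparing a file's SHA-256 digest against a published value. The sketch below is a generic illustration, not the install script's actual code: `verify_sha256` is a hypothetical helper, and it assumes GNU `sha256sum` is on the PATH (on macOS, `shasum -a 256` is the usual substitute).

```shell
# Hypothetical helper: succeed only if FILE's SHA-256 digest equals EXPECTED.
# Assumes GNU coreutils `sha256sum`; this is not the installer's own logic.
verify_sha256() {
  expected=$1
  file=$2
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual=$(sha256sum "$file" | awk '{print $1}') || return 1
  [ "$actual" = "$expected" ]
}
```

It would be called as `verify_sha256 "<published-digest>" "<downloaded-binary>" && echo "checksum OK"`, where both arguments are placeholders for the values given in the manual install instructions.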

Available Models

Model	App ID	Best For
Fast Whisper V3	`infsh/fast-whisper-large-v3`	Fast transcription
Whisper V3 Large	`infsh/whisper-v3-large`	Highest accuracy

Examples

Basic Transcription

infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://meeting.mp3"}'

With Timestamps

# Generate a sample input file
infsh app sample infsh/fast-whisper-large-v3 --save input.json

# Edit input.json so that it contains:
# {
#   "audio_url": "https://podcast.mp3",
#   "timestamps": true
# }

infsh app run infsh/fast-whisper-large-v3 --input input.json

Translation (to English)

infsh app run infsh/whisper-v3-large --input '{
  "audio_url": "https://french-audio.mp3",
  "task": "translate"
}'

From Video

# Extract audio from video first
infsh app run infsh/video-audio-extractor --input '{"video_url": "https://video.mp4"}' > audio.json

# Transcribe the extracted audio
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "<audio-url>"}'
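The `<audio-url>` placeholder has to be filled in from the extractor's saved output. One way to do that, sketched below under stated assumptions, is to read the URL out of `audio.json` with `jq` and build the transcription input from it. `make_transcribe_input` is a hypothetical helper, `jq` is assumed to be installed, and the name of the key that holds the extracted audio URL is not documented here, so it is passed in as an argument; inspect `audio.json` to find the real one.

```shell
# Hypothetical helper: read a URL from a saved JSON result and print a
# transcription input object. The key name is an assumption supplied by the
# caller; check the extractor's real output before relying on it.
make_transcribe_input() {
  result_file=$1   # e.g. audio.json from the extractor step
  url_key=$2       # top-level key holding the audio URL (assumed)
  # -e makes jq exit non-zero when the key is missing or null.
  url=$(jq -e -r --arg k "$url_key" '.[$k]' "$result_file") || return 1
  # Let jq build the JSON so the URL is quoted and escaped correctly.
  jq -n --arg url "$url" '{audio_url: $url}'
}
```

With a helper like this, the second step could become `make_transcribe_input audio.json <key> > input.json` followed by `infsh app run infsh/fast-whisper-large-v3 --input input.json`.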

Workflow: Video Subtitles

# 1. Transcribe video audio
infsh app run infsh/fast-whisper-large-v3 --input '{
  "audio_url": "https://video.mp4",
  "timestamps": true
}' > transcript.json

# 2. Use transcript for captions
infsh app run infsh/caption-videos --input '{
  "video_url": "https://video.mp4",
  "captions": "<transcript-from-step-1>"
}'

Supported Languages

Whisper supports 99+ languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and many more.

Use Cases

Meetings: Transcribe recordings
Podcasts: Generate transcripts
Subtitles: Create captions for videos
Voice Notes: Convert to searchable text
Interviews: Transcription for research
Accessibility: Make audio content accessible

Output Format

Returns JSON with:

`text`: Full transcription
`segments`: Timestamped segments (if requested)
`language`: Detected language
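
As an illustration of working with these fields, the sketch below turns the `segments` array of a saved result into SRT subtitle text. It rests on assumptions the page does not confirm: that each segment is an object with `start` and `end` times in seconds and a `text` string, and that `jq` and `awk` are installed. `segments_to_srt` is a hypothetical helper name.

```shell
# Hypothetical helper: convert .segments from a saved transcript to SRT.
# Assumes each segment looks like {"start": 0.0, "end": 2.5, "text": "..."}
# with times in seconds; adjust the jq filter if the real shape differs.
segments_to_srt() {
  jq -r '.segments[] | [.start, .end, .text] | @tsv' "$1" |
  awk -F '\t' '
    # Format seconds as HH:MM:SS,mmm (the SRT timestamp layout).
    function ts(s,   ms, h, m, sec) {
      ms = int(s * 1000 + 0.5)
      h = int(ms / 3600000); ms -= h * 3600000
      m = int(ms / 60000);   ms -= m * 60000
      sec = int(ms / 1000);  ms -= sec * 1000
      return sprintf("%02d:%02d:%02d,%03d", h, m, sec, ms)
    }
    # One numbered cue per segment, separated by a blank line.
    { printf "%d\n%s --> %s\n%s\n\n", NR, ts($1), ts($2), $3 }
  '
}
```

For example, `segments_to_srt transcript.json > captions.srt` would write one cue per segment, provided the transcript was produced with `"timestamps": true`.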

Related Skills

# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@inference-sh

# Text-to-speech (reverse direction)
npx skills add inference-sh/skills@text-to-speech

# Video generation (add captions)
npx skills add inference-sh/skills@ai-video-generation

# AI avatars (lipsync with transcripts)
npx skills add inference-sh/skills@ai-avatar-video

Browse all audio apps:

infsh app list --category audio

Documentation

Running Apps - How to run apps via CLI
Audio Transcription Example - Complete transcription guide
Apps Overview - Understanding the app ecosystem
