Gemini STT
Transcribe audio files using Google's Gemini API or Vertex AI
Transcribe audio files using Google's Gemini API or Vertex AI. The default model is gemini-2.0-flash-lite, chosen for the fastest transcription.
To use Vertex AI with Application Default Credentials (ADC), log in and set your project:
gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID
The script will automatically detect and use ADC when available.
Alternatively, set GEMINI_API_KEY in your environment (e.g., in ~/.env or ~/.clawdbot/.env).
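The detection order described here (ADC first, then GEMINI_API_KEY) can be sketched roughly as follows. This is not the script's actual code: `choose_auth` and `adc_file_present` are hypothetical helpers, and the check assumes the Linux/macOS location where `gcloud auth application-default login` stores credentials. The real script may instead rely on a Google auth library to resolve ADC.

```python
from pathlib import Path
from typing import Mapping


def adc_file_present(home: Path) -> bool:
    """Return True if gcloud's default ADC file exists under `home`.

    Assumption: Linux/macOS layout, where the login command writes
    ~/.config/gcloud/application_default_credentials.json.
    """
    adc = home / ".config" / "gcloud" / "application_default_credentials.json"
    return adc.is_file()


def choose_auth(env: Mapping[str, str], home: Path) -> str:
    """Pick an auth mode: ADC first, then GEMINI_API_KEY (hypothetical helper)."""
    # GOOGLE_APPLICATION_CREDENTIALS is the standard variable that points
    # at a credentials file, so it is treated as "ADC is available".
    if env.get("GOOGLE_APPLICATION_CREDENTIALS") or adc_file_present(home):
        return "vertex-adc"
    if env.get("GEMINI_API_KEY"):
        return "api-key"
    raise RuntimeError("No credentials found: set up ADC or GEMINI_API_KEY")
```

Calling `choose_auth(os.environ, Path.home())` would then report which path applies on the current machine.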
Supported audio formats: .ogg / .opus (Telegram voice messages), .mp3, .wav, .m4a
Auto-detect auth (tries ADC first, then GEMINI_API_KEY)
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg
Force Vertex AI
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex
With a specific model
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --model gemini-2.5-pro
Vertex AI with specific project and region
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex --project my-project --region us-central1
With Clawdbot media
python ~/.claude/skills/gemini-stt/transcribe.py ~/.clawdbot/media/inbound/voice-message.ogg
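Before invoking the script from another program, it can help to check that a file has one of the supported extensions. The sketch below is a hypothetical pre-check, not part of the skill itself. It only looks at the file extension and does not inspect the audio content.

```python
from pathlib import Path

# Extensions listed as supported in this document.
SUPPORTED_EXTENSIONS = {".ogg", ".opus", ".mp3", ".wav", ".m4a"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is in the supported list.

    The comparison is case-insensitive, so "VOICE.MP3" is accepted.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```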
| Option | Description |
|---|---|
| audio file (positional) | Path to the audio file (required) |
| --model | Gemini model to use (default: gemini-2.0-flash-lite) |
| --vertex | Force use of Vertex AI with ADC |
| --project | GCP project ID (for Vertex, defaults to gcloud config) |
| --region | GCP region (for Vertex) |
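A command-line interface with the flags that appear in the usage examples could be declared with `argparse` along these lines. This is a sketch under assumptions rather than the script's real parser: the positional name `audio_file` is made up, short flags are omitted, and the region default is left as `None` because it is not shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Transcribe audio with Gemini")
    # Positional argument; the name "audio_file" is an assumption.
    parser.add_argument("audio_file", help="Path to the audio file (required)")
    parser.add_argument(
        "--model",
        default="gemini-2.0-flash-lite",
        help="Gemini model to use",
    )
    parser.add_argument(
        "--vertex",
        action="store_true",
        help="Force use of Vertex AI with ADC",
    )
    # None here stands for "fall back to the gcloud config".
    parser.add_argument("--project", default=None, help="GCP project ID (Vertex)")
    # The real default region is not stated in this document.
    parser.add_argument("--region", default=None, help="GCP region (Vertex)")
    return parser
```

For example, `build_parser().parse_args(["audio.ogg", "--vertex"])` yields a namespace with `vertex=True` and the default model.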
Any Gemini model that supports audio input can be used. Recommended models:
| Model | Notes |
|---|---|
| gemini-2.0-flash-lite | Default. Fastest transcription speed. |
| | Fast and cost-effective. |
| | Lightweight 2.5 model. |
| | Balanced speed and quality. |
| | Higher quality, slower. |
| | Latest flash model. |
| | Latest pro model, best quality. |
See Gemini API Models for the latest list.
For Clawdbot voice message handling:
# Transcribe incoming voice message
TRANSCRIPT=$(python ~/.claude/skills/gemini-stt/transcribe.py "$AUDIO_PATH")
echo "User said: $TRANSCRIPT"
On failure, the script exits with code 1 and prints an error message to stderr.
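A caller written in Python can rely on this exit-code convention to separate a transcript from a failure. The wrapper below is a minimal sketch: `transcribe` is a hypothetical helper that takes the full command as a list, so it makes no assumptions about where the script is installed. It assumes the transcript is written to stdout, as the shell example implies.

```python
import subprocess
import sys


def transcribe(command: list[str]) -> str:
    """Run a transcription command and return its stdout as the transcript.

    Raises RuntimeError carrying the stderr text when the command exits
    with a non-zero status.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"transcription failed (exit {result.returncode}): "
            f"{result.stderr.strip()}"
        )
    return result.stdout.strip()
```

A call might look like `transcribe([sys.executable, os.path.expanduser("~/.claude/skills/gemini-stt/transcribe.py"), audio_path])`, wrapped in `try`/`except RuntimeError` to log or report the error.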
No automatic installation available. Please visit the source repository for installation instructions.
© 2026 Torly.ai. All rights reserved.