Voice Agent
Local Voice Input/Output for Agents using the AI Voice Agent API.
This skill allows you to speak and listen to the user using a local Voice Agent API. It is client-only and does not start containers or services. It uses local Whisper for Speech-to-Text transcription and AWS Polly for Text-to-Speech generation.
Requires a running backend API at http://localhost:8000.
Backend setup instructions are in this repository: README.md, walkthrough.md, and DOCKER_README.md.
Use transcribe to read an audio file.
Use synthesize to generate the audio file.
If health fails or connection errors occur, do not attempt service management from this skill. Ask the user to start or fix the backend using the repository docs.
To transcribe an audio file with local Whisper STT, run the client script with the transcribe command:
python3 {baseDir}/scripts/client.py transcribe "/path/to/audio/file.ogg"
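If the transcribe command is called from another Python program, it could be wrapped roughly as follows. This is a minimal sketch, not part of the skill: the helper names `build_transcribe_cmd` and `transcribe` are hypothetical, `base_dir` stands in for `{baseDir}`, and the wrapper assumes the client prints the transcript to standard output and exits with a non-zero status on failure, neither of which the text above confirms.

```python
import subprocess
from pathlib import Path


def build_transcribe_cmd(base_dir: str, audio_path: str) -> list[str]:
    """Return the argument list for the documented transcribe command."""
    client = Path(base_dir) / "scripts" / "client.py"
    return ["python3", str(client), "transcribe", audio_path]


def transcribe(base_dir: str, audio_path: str) -> str:
    """Run the client and return whatever it writes to stdout.

    Assumption: the transcript goes to stdout and a failure produces a
    non-zero exit status, in which case check=True raises
    subprocess.CalledProcessError.
    """
    result = subprocess.run(
        build_transcribe_cmd(base_dir, audio_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems with audio paths that contain spaces.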
To generate audio from text with AWS Polly TTS and save it to a file, run the client script with the synthesize command:
python3 {baseDir}/scripts/client.py synthesize "Text to speak" --output "/path/to/output.mp3"
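Text passed to synthesize often contains quotes or other characters that the shell treats specially. One way to avoid quoting by hand is to build the argument list in Python and let `shlex.join` render a shell-safe command line. The sketch below makes some assumptions: `build_synthesize_cmd` is a hypothetical helper and `base_dir` stands in for `{baseDir}`. Only the synthesize command and the `--output` flag come from the text above.

```python
import shlex
from pathlib import Path


def build_synthesize_cmd(base_dir: str, text: str, output_path: str) -> list[str]:
    """Return the argument list for the documented synthesize command."""
    client = Path(base_dir) / "scripts" / "client.py"
    return ["python3", str(client), "synthesize", text, "--output", output_path]


# Example text with both double and single quotes (illustrative only).
cmd = build_synthesize_cmd(
    "/opt/skill", 'She said "hi", didn\'t she?', "/tmp/reply.mp3"
)

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(cmd))
```

The list in `cmd` can also be handed directly to `subprocess.run`, in which case no shell quoting is involved at all.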
To check if the voice agent API is running and healthy:
python3 {baseDir}/scripts/client.py health
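When the health check fails, it can be useful to distinguish a backend that is not listening at all from one that is up but unhealthy. The sketch below only tests whether something accepts TCP connections on the documented address (localhost, port 8000). It does not call the Voice Agent API and does not try to start any service. The function name `backend_reachable` and the timeout value are illustrative assumptions.

```python
import socket


def backend_reachable(host: str = "localhost", port: int = 8000,
                      timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if not backend_reachable():
    # No service management here: just hand the problem back to the user.
    print("Voice backend is not reachable at http://localhost:8000. "
          "Please start or fix it using the repository docs.")
```

A successful connection only shows that the port is open, so the client's own health command remains the authoritative check.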
No automatic installation available. Please visit the source repository for installation instructions.
© 2026 Torly.ai. All rights reserved.