ElevenLabs TTS
ElevenLabs TTS - the best ElevenLabs integration for OpenClaw. ElevenLabs Text-to-Speech with emotional audio tags, ElevenLabs voice synthesis for WhatsApp,...
Generate expressive voice messages using ElevenLabs v3 with audio tags.
API key (ELEVENLABS_API_KEY): Required. Get one at elevenlabs.io → Profile → API Keys. Configure it in openclaw.json under messages.tts.elevenlabs.apiKey.

Storytelling (emotional journey):
[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!
Horror/Suspense (building dread):
[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!
Conversation with reactions:
[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.
Hebrew (romantic moment):
[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?
Spanish (celebration to reflection):
[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.
In openclaw.json, configure TTS under messages.tts:
{
  "messages": {
    "tts": {
      "provider": "elevenlabs",
      "elevenlabs": {
        "apiKey": "sk_your_api_key_here",
        "voiceId": "pNInz6obpgDQGcFmaJgB",
        "modelId": "eleven_v3",
        "languageCode": "en",
        "voiceSettings": {
          "stability": 0.5,
          "similarityBoost": 0.75,
          "style": 0,
          "useSpeakerBoost": true,
          "speed": 1
        }
      }
    }
  }
}
Getting your API Key: go to elevenlabs.io → Profile → API Keys.
These premade voices are optimized for v3 and work well with audio tags:
| Voice | ID | Gender | Accent | Best For |
|---|---|---|---|---|
| Adam | | Male | American | Deep narration, general use |
| Rachel | | Female | American | Calm narration, conversational |
| Brian | | Male | American | Deep narration, podcasts |
| Charlotte | | Female | English-Swedish | Expressive, video games |
| George | | Male | British | Raspy narration, storytelling |
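To look up IDs beyond the premade voices in the table above, the voices endpoint (GET https://api.elevenlabs.io/v1/voices) can be queried with the API key. The sketch below is a minimal illustration, not part of the skill: the function names are hypothetical, and the `xi-api-key` header and the `voices` / `name` / `voice_id` response fields are assumptions based on the public ElevenLabs API that should be checked against its reference.

```python
import json
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def fetch_voices(api_key):
    """Call the voices endpoint and return the decoded JSON body.

    Assumption: the API authenticates via an `xi-api-key` header.
    """
    request = urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def voice_index(payload):
    """Map voice name -> voice ID.

    Assumption: the payload looks like {"voices": [{"name": ..., "voice_id": ...}]}.
    """
    return {voice["name"]: voice["voice_id"] for voice in payload.get("voices", [])}


# Example usage (requires network access and a valid key):
#   import os
#   index = voice_index(fetch_voices(os.environ["ELEVENLABS_API_KEY"]))
#   for name, voice_id in sorted(index.items()):
#       print(name, voice_id)
```

The ID found this way is the value that goes into `voiceId` in openclaw.json.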
Finding more voices:
GET https://api.elevenlabs.io/v1/voices

Voice selection tips:
eleven_v3 (alpha) - ONLY model supporting audio tags

| Mode | Stability | Description |
|---|---|---|
| Creative | 0.3-0.5 | More emotional/expressive, may hallucinate |
| Natural | 0.5-0.7 | Balanced, closest to original voice |
| Robust | 0.7-1.0 | Highly stable, less responsive to tags |
For audio tags, use Creative (0.5) or Natural. Higher stability reduces tag responsiveness.
Speed range: 0.7 (slow) to 1.2 (fast); default 1.0.
Extreme values affect quality. For pacing, prefer audio tags like [rushed] or [drawn out].
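The stability bands and the speed range above can be checked before a request is sent. The helper below is a hypothetical sketch (its name and warning texts are not part of OpenClaw or ElevenLabs); it only encodes the numbers stated in this section and assumes the `voiceSettings` keys shown in the openclaw.json example.

```python
def check_voice_settings(settings):
    """Return a list of warnings for a voiceSettings dict (hypothetical helper)."""
    warnings = []

    # Documented speed range: 0.7 (slow) to 1.2 (fast), default 1.0.
    speed = settings.get("speed", 1.0)
    if not 0.7 <= speed <= 1.2:
        warnings.append(f"speed {speed} is outside the documented 0.7-1.2 range")

    # Above 0.7 is the "Robust" band, which is less responsive to audio tags.
    stability = settings.get("stability", 0.5)
    if stability > 0.7:
        warnings.append(
            f"stability {stability} is in the Robust band; audio tags may have little effect"
        )
    return warnings
```

An empty list means the settings sit inside the ranges described here; it says nothing about how a particular voice will actually sound.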
How many tags to use:
Where to place tags:
Context matters:
"[nervous] I... I'm not sure about this. What if it doesn't work?" works better than "[nervous] Hello."

Combine tags for nuance:
[nervously][whispers] = nervous whispering
[excited][laughs] = excited laughter

Regenerate for best results:

Match tag to voice (avoid mismatches such as):
[shouts] on a whispering voice
[whispers] on a loud/energetic voice

v3 does NOT support SSML break tags. Use audio tags and punctuation instead.
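For text that already contains SSML breaks, one option is to rewrite them as audio tags before generation. This is a sketch under an assumption: every `<break>` element is mapped to a single `[pause]`, so the original duration attribute is discarded. The function name is hypothetical.

```python
import re

# Matches <break/>, <break time="1s"/>, <break time="500ms" /> and similar.
_BREAK_TAG = re.compile(r"<break\b[^>]*>", re.IGNORECASE)


def ssml_breaks_to_pause(text):
    """Replace SSML break elements with the [pause] audio tag."""
    return _BREAK_TAG.sub("[pause]", text)
```

For longer silences, punctuation such as an ellipsis next to the tag may work better than stacking several `[pause]` tags; that is worth testing per voice.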
Punctuation enhances audio tags:
[nervous] I... I don't know...
[excited] That's AMAZING!
[explaining] So what you do is— [interrupting] Wait!
[nervous] Are you sure about this?
[happy] We did it!

Combine tags + punctuation for maximum effect:
[tired] It was a long day... [sighs] Nobody listens anymore.
tts tool (returns Opus in /tmp/openclaw/tts-*/)
message tool

1. Generate TTS (add [pause] at end to prevent cutoff):
tts text="[excited] This is amazing! [pause]" channel=whatsapp
2. Find the LATEST file (⚠️ CRITICAL - always use the newest file!):
find /tmp/openclaw/tts-* /tmp/tts-* -name "*.opus" -o -name "*.mp3" -o -name "*.ogg" 2>/dev/null | xargs ls -t | head -1
The tts tool now outputs to /tmp/openclaw/tts-*/ (NOT /tmp/tts-*/).
Old files may exist in /tmp/tts-*/ from previous sessions - never use those!
3. If file is MP3, convert to Opus:
ffmpeg -i /path/to/voice.mp3 -c:a libopus -b:a 64k -vbr on -application voip /path/to/voice.ogg
If the file is already .opus, skip this step.
4. Copy to workspace and send:
cp /tmp/openclaw/tts-xxx/voice.opus ~/.openclaw/workspace/voice-temp.ogg
message action=send channel=whatsapp target="+972..." filePath="/root/.openclaw/workspace/voice-temp.ogg" asVoice=true message=" "
5. Cleanup:
rm /root/.openclaw/workspace/voice-temp.ogg
WhatsApp requires a non-empty message body to send voice notes. Use a single space as the message.
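Steps 2 and 3 above (pick the newest file, convert only if it is MP3) can also be scripted. The following is a minimal sketch with hypothetical function names; it searches only /tmp/openclaw/tts-* by default, in line with the warning about stale files in /tmp/tts-*/, and it builds the same ffmpeg arguments as the command in step 3 without running them.

```python
import glob
import os

AUDIO_EXTENSIONS = (".opus", ".ogg", ".mp3")


def latest_audio_file(dir_patterns=("/tmp/openclaw/tts-*",)):
    """Return the most recently modified audio file under the matching dirs, or None."""
    candidates = []
    for pattern in dir_patterns:
        for directory in glob.glob(pattern):
            for root, _dirs, files in os.walk(directory):
                for name in files:
                    if name.lower().endswith(AUDIO_EXTENSIONS):
                        candidates.append(os.path.join(root, name))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def needs_conversion(path):
    """Only MP3 output has to be converted; .opus/.ogg files are used as-is."""
    return path.lower().endswith(".mp3")


def opus_convert_command(src, dst):
    """ffmpeg argument list mirroring the conversion command in step 3."""
    return ["ffmpeg", "-i", src, "-c:a", "libopus", "-b:a", "64k",
            "-vbr", "on", "-application", "voip", dst]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Sending the result still goes through the message tool as in step 4.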
| Format | iOS | Android | Transcribe |
|---|---|---|---|
| MP3 | ✅ Works | ❌ May fail | ❌ No |
| Opus (.ogg) | ✅ Works | ✅ Works | ✅ Yes |
Always convert to Opus - per the table above, it's the only format that works on both iOS and Android and supports transcription.
ElevenLabs sometimes cuts off the last word. Always add ... or [pause] at the end:
[excited] This is amazing! [pause]
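When text is assembled programmatically, the trailing tag is easy to forget. A small guard like the one below can append it; the function name is hypothetical, and it treats a trailing `...` or `[pause]` as already sufficient, following the advice above.

```python
def with_trailing_pause(text):
    """Append [pause] unless the text already ends with [pause] or an ellipsis."""
    stripped = text.rstrip()
    if stripped.endswith("[pause]") or stripped.endswith("..."):
        return stripped
    return stripped + " [pause]"
```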
For content >800 chars:
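Split the text into chunks, generate each chunk with the tts tool, then concatenate the resulting files with ffmpeg:
cat > list.txt << EOF
file '/path/file1.mp3'
file '/path/file2.mp3'
EOF
ffmpeg -f concat -safe 0 -i list.txt -c copy final.mp3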
Important: Don't mention "part 2" or "chapter" - keep it seamless.
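Splitting long content is easier to keep seamless when chunks end on sentence boundaries. The sketch below is one possible approach, not the skill's own method: `split_for_tts` and `concat_list` are hypothetical names, the 800-character limit comes from the guidance above, and a single sentence longer than the limit is left as one oversized chunk rather than cut mid-sentence.

```python
import re


def split_for_tts(text, limit=800):
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit or not current:
            # Either it still fits, or this is the first sentence of a chunk.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def concat_list(paths):
    """Build the contents of list.txt for ffmpeg's concat demuxer."""
    return "".join(f"file '{path}'\n" for path in paths)
```

Each chunk is then generated separately and the resulting files are listed in order in list.txt. Paths containing single quotes would need escaping, which this sketch does not handle.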
v3 can handle multiple characters in one generation:
Jessica: [whispers] Did you hear that? Chris: [interrupting] —I heard it too! Jessica: [panicking] We need to hide!
Dialogue tags:
[interrupting], [overlapping], [cuts in], [interjecting]
| Category | Tags | When to Use |
|---|---|---|
| Emotions | [excited], [happy], [sad], [angry], [nervous], [curious] | Main emotional state - use 1 per section |
| Delivery | [whispers], [shouts], [soft], [rushed], [drawn out] | Volume/speed changes |
| Reactions | [laughs], [sighs], [gasps], [clears throat], [gulps] | Natural human moments - sprinkle sparingly |
| Pacing | [pause], [hesitates], [stammers], [breathes] | Dramatic timing |
| Character | [French accent], [British accent], [robotic tone] | Character voice shifts |
| Dialogue | [interrupting], [overlapping], [cuts in] | Multi-speaker conversations |
Most effective tags (reliable results):
[excited], [nervous], [sad], [happy]
[laughs], [sighs], [whispers]
[pause]

Less reliable (test and regenerate):
[explosion], [gunshot]

Full tag list: See references/audio-tags.md
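Before generating, a script can be scanned for its tags so that the less reliable ones are flagged for a test run. This is an illustrative sketch: the helper names are hypothetical, and the `LESS_RELIABLE` set contains only the two tags named above, not the full list in references/audio-tags.md.

```python
import re

# Only the tags this section calls out as less reliable.
LESS_RELIABLE = {"explosion", "gunshot"}


def find_tags(text):
    """Return the audio tags in `text`, without brackets, in order of appearance."""
    return re.findall(r"\[([^\[\]]+)\]", text)


def less_reliable_tags(text):
    """Return the tags that may need testing and regeneration."""
    return [tag for tag in find_tags(text) if tag.lower() in LESS_RELIABLE]
```

Counting the result of `find_tags` per paragraph is also a quick way to spot over-tagged text.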
Tags read aloud?
Use the eleven_v3 model; it is the only model that supports audio tags.

Voice inconsistent?
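WhatsApp won't play?
Convert the file to Opus (.ogg); MP3 may fail on Android.

No emotion despite tags?
Lower stability toward the Creative range; higher stability reduces tag responsiveness.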
No automatic installation available. Please visit the source repository for installation instructions.