Build a Voice AI Agent in 30 Minutes with the ElevenLabs Skill

A year ago, scaffolding a real-time voice agent was a multi-week project. You had to wire up WebSocket streams, handle turn detection, manage barge-in, integrate a tool layer, deal with audio buffering on both ends, and somehow keep latency acceptable. Many teams gave up after the prototype.

The official ElevenLabs Voice Agents skill compresses that wiring layer into something your AI coding assistant can scaffold for you. This tutorial walks through building a working voice agent — a customer-support bot for a fictional SaaS product — from zero to first conversation in about half an hour.

Key Takeaways

The voice-agents skill scaffolds a full conversational agent, including persona, tool calls, and streaming setup.
You configure the agent in the ElevenLabs dashboard and the skill emits the wiring code your app needs.
Tool calling is first-class — the agent can look up customers, create tickets, or trigger workflows mid-conversation.
Turn-taking and barge-in are handled by the skill, not your application code.
Local development uses the streaming WebSocket; production deployments can stream over WebRTC for sub-200ms latency.

What you'll build

A voice agent named "Aria" that:

Greets a caller and asks how it can help.
Looks up account information when given an email address (tool call).
Creates a support ticket when the issue cannot be resolved (tool call).
Hands off to a human when the caller asks for one.

The core project fits in three small files: an agent definition (agent.yaml), the tool implementations (tools.ts), and a Node.js entry point (index.ts).

Prerequisites

An ElevenLabs account, with the voice-agents skill installed in your coding assistant
An ELEVENLABS_API_KEY configured
Node.js 20 or newer
A Claude Code or Cursor workspace

If you have not set up the API key yet, install the setup-api-key skill first and ask your agent: "Use the setup-api-key skill to configure ElevenLabs." The skill walks through key creation in the dashboard and verifies the configuration before continuing.
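The entry point in Step 3 reads ELEVENLABS_API_KEY and AGENT_ID from the environment with non-null assertions, so a missing variable would only surface once the session tries to connect. A minimal sketch of a fail-fast check follows. The `requireEnv` helper is a hypothetical name introduced here, not part of the skill or the SDK.

```typescript
// Hypothetical helper: not part of the ElevenLabs SDK or the skill.
type Env = Record<string, string | undefined>

// Returns the requested variables, or throws one error listing every missing name.
export function requireEnv(env: Env, names: string[]): Record<string, string> {
  const missing = names.filter((name) => !env[name] || env[name]!.trim() === '')
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`)
  }
  const values: Record<string, string> = {}
  for (const name of names) values[name] = env[name] as string
  return values
}
```

In a Node.js entry point this could be called as `requireEnv(process.env, ['ELEVENLABS_API_KEY', 'AGENT_ID'])` before any client is created.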

Step 1 — Define the agent

Open Claude Code in an empty directory and prompt:

Use the elevenlabs voice-agents skill to scaffold a customer support agent named Aria. The agent should be friendly, professional, and capable of looking up accounts and creating tickets. Use English (US) and the voice "Sarah".

The skill emits a configuration block that looks roughly like this:

# agent.yaml
name: Aria
voice: sarah
language: en-US
system_prompt: |
  You are Aria, a customer support agent for Acme SaaS. Be friendly, concise,
  and professional. When a customer provides an email, use the lookup_account
  tool. If you cannot resolve their issue, use the create_ticket tool.
  Hand off to a human when explicitly requested.
tools:
  - name: lookup_account
    description: Look up a customer account by email address
    parameters:
      email: string
  - name: create_ticket
    description: Create a support ticket for the current customer
    parameters:
      summary: string
      severity: enum [low, medium, high]

Save it. The skill will reference this file when generating the runtime code.
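Because the tools section declares a type for every parameter, the same declarations can be used to reject malformed tool calls before they reach application code. The sketch below assumes the parameter declarations are mirrored by hand into TypeScript. `ToolSpec`, `toolSpecs`, and `validateArgs` are hypothetical names, and this is not a description of how the skill itself validates arguments.

```typescript
// Hypothetical mirror of the tools section in agent.yaml.
type ParamType = 'string' | { enum: readonly string[] }

interface ToolSpec {
  name: string
  parameters: Record<string, ParamType>
}

export const toolSpecs: ToolSpec[] = [
  { name: 'lookup_account', parameters: { email: 'string' } },
  {
    name: 'create_ticket',
    parameters: { summary: 'string', severity: { enum: ['low', 'medium', 'high'] } },
  },
]

// Returns a list of problems; an empty list means the arguments match the spec.
export function validateArgs(spec: ToolSpec, args: Record<string, unknown>): string[] {
  const problems: string[] = []
  for (const [key, type] of Object.entries(spec.parameters)) {
    const value = args[key]
    if (typeof value !== 'string') {
      problems.push(`${key}: expected a string`)
    } else if (type !== 'string' && !type.enum.includes(value)) {
      problems.push(`${key}: expected one of ${type.enum.join(', ')}`)
    }
  }
  return problems
}
```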

Step 2 — Implement the tool layer

The agent calls tools through your application code. Prompt your assistant:

Generate the tool implementations for lookup_account and create_ticket. Use a stub data layer for now — we'll wire it to a real database after the agent is working.

You will get something like:

// tools.ts
export async function lookupAccount({ email }: { email: string }) {
  // Stub — replace with real DB query later
  if (email === 'demo@acme.com') {
    return { found: true, plan: 'pro', mrr: 99 }
  }
  return { found: false }
}

export async function createTicket({
  summary,
  severity,
}: {
  summary: string
  severity: 'low' | 'medium' | 'high'
}) {
  // Stub — replace with real ticket-system call later
  const id = `TICK-${Math.floor(Math.random() * 10000)}`
  console.log(`[ticket created] ${id} (${severity}): ${summary}`)
  return { ticketId: id }
}

Stubs are deliberate. You want to validate the conversational flow before wiring real systems. The skill knows this and scaffolds stubs by default.

Step 3 — Wire the streaming entry point

This is the part that used to be the hardest. Ask the assistant:

Generate the streaming entry point that connects to ElevenLabs over WebSocket and routes tool calls to my tool implementations.

The skill emits something like:

// index.ts
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js'
import { lookupAccount, createTicket } from './tools'

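// The ! assertions assume ELEVENLABS_API_KEY and AGENT_ID are set in the environment.
const client = new ElevenLabsClient({ apiKey: process.env.ELEVENLABS_API_KEY! })

// Top-level await requires this file to run as an ES module.
const session = await client.agents.startSession({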
  agentId: process.env.AGENT_ID!,
  toolHandlers: {
    lookup_account: lookupAccount,
    create_ticket: createTicket,
  },
})

session.on('audio', (chunk) => {
  // Stream chunk to the caller's audio output
})

session.on('user_speech', (transcript) => {
  console.log(`[caller] ${transcript}`)
})

session.on('agent_speech', (transcript) => {
  console.log(`[aria] ${transcript}`)
})

session.on('end', (summary) => {
  console.log('[call ended]', summary)
})

Notice what is not in this file: WebSocket framing, audio buffering, turn detection, barge-in handling, or retry logic. The scaffolded code hides all of that behind startSession. You are left with the parts that actually matter to your application: what the tools do and where the audio goes.
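To make the tool routing less opaque, here is a minimal sketch of what mapping a tool name to a handler could look like. It is not the SDK's implementation. `dispatchToolCall`, `ToolCall`, and the result shape are hypothetical names chosen for illustration.

```typescript
// Hypothetical shapes: the real SDK may represent tool calls differently.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>

interface ToolCall {
  name: string
  args: Record<string, unknown>
}

type ToolResult = { ok: true; value: unknown } | { ok: false; error: string }

export async function dispatchToolCall(
  handlers: Record<string, ToolHandler>,
  call: ToolCall,
): Promise<ToolResult> {
  // Own-property check so inherited names such as "toString" are not treated as tools.
  const handler = Object.prototype.hasOwnProperty.call(handlers, call.name)
    ? handlers[call.name]
    : undefined
  if (!handler) {
    // Report unknown tools as a result instead of throwing.
    return { ok: false, error: `Unknown tool: ${call.name}` }
  }
  try {
    return { ok: true, value: await handler(call.args) }
  } catch (err) {
    // A failing tool becomes an error result rather than crashing the session.
    return { ok: false, error: err instanceof Error ? err.message : String(err) }
  }
}
```

Returning errors as values rather than throwing is a design choice in this sketch: it leaves room for the agent to apologize or retry in speech instead of dropping the call.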

Step 4 — Run a local conversation

For local testing, the easiest path is the dashboard's "test in browser" option. The skill prints a deep link to that page when scaffolding, so you can talk to your agent before wiring real audio I/O.

Once you are ready to drive audio from your own app, the skill scaffolds either a browser client (WebRTC) or a server bridge (Twilio, LiveKit, or Agora). Ask your assistant for the variant you want:

Add a Twilio bridge so we can dial Aria from a phone number.

You will get a webhook handler that bridges Twilio Media Streams to the ElevenLabs session. Drop it into a Vercel Function, point a Twilio number at it, and you have a working phone agent.
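For orientation, the inbound half of such a bridge can be sketched as follows. It relies on the message format Twilio documents for Media Streams: JSON text frames with an `event` field, where `media` frames carry base64-encoded audio. Forwarding the audio to the ElevenLabs session is left out because it depends on the code the skill scaffolds. `parseTwilioFrame` and `BridgeEvent` are hypothetical names.

```typescript
// Hypothetical helper for the inbound half of a Twilio Media Streams bridge.
export type BridgeEvent =
  | { kind: 'start'; streamSid: string; callSid: string }
  | { kind: 'audio'; base64Mulaw: string }
  | { kind: 'stop' }
  | { kind: 'ignored'; event: string }

// Classifies one WebSocket text frame sent by Twilio.
export function parseTwilioFrame(raw: string): BridgeEvent {
  const msg = JSON.parse(raw) as {
    event?: string
    start?: { streamSid?: string; callSid?: string }
    media?: { payload?: string }
  }
  switch (msg.event) {
    case 'start':
      return {
        kind: 'start',
        streamSid: msg.start?.streamSid ?? '',
        callSid: msg.start?.callSid ?? '',
      }
    case 'media':
      // The payload is base64-encoded audio; in Node, Buffer.from(payload, 'base64')
      // decodes it before it is forwarded to the agent session.
      return { kind: 'audio', base64Mulaw: msg.media?.payload ?? '' }
    case 'stop':
      return { kind: 'stop' }
    default:
      // 'connected', 'mark', 'dtmf', and anything unrecognized.
      return { kind: 'ignored', event: msg.event ?? 'unknown' }
  }
}
```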

Step 5 — Replace stubs with real systems

Once the conversational flow is right, swap the stub tools for real implementations. This is the cheap part: the wiring that used to dominate the work is already done, and what remains is mostly straightforward database and ticketing-system calls.
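One way to make that swap low-risk is to put the data layer behind an interface, so the handler keeps the return shape the stub had. The sketch below assumes a hypothetical `AccountStore` interface. `makeLookupAccount` and `InMemoryAccountStore` are likewise illustrative names, and a real store would wrap your database client.

```typescript
// Hypothetical data-layer interface; a real implementation would query your database.
export interface Account {
  email: string
  plan: string
  mrr: number
}

export interface AccountStore {
  findByEmail(email: string): Promise<Account | null>
}

// Builds a lookup_account handler with the same return shape as the stub.
export function makeLookupAccount(store: AccountStore) {
  return async ({ email }: { email: string }) => {
    // Normalize before lookup: spoken and transcribed emails vary in case and spacing.
    const account = await store.findByEmail(email.trim().toLowerCase())
    return account ? { found: true, plan: account.plan, mrr: account.mrr } : { found: false }
  }
}

// In-memory store, useful for tests and for keeping the demo account working.
export class InMemoryAccountStore implements AccountStore {
  private readonly accounts = new Map<string, Account>()

  constructor(seed: Account[]) {
    for (const account of seed) this.accounts.set(account.email.toLowerCase(), account)
  }

  async findByEmail(email: string): Promise<Account | null> {
    return this.accounts.get(email) ?? null
  }
}
```

The entry point would then register `makeLookupAccount(store)` under `lookup_account` in place of the stub, with nothing else changing.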

Common pitfalls

Latency feels off. The skill defaults to WebSocket streaming, which is fine for development but adds 200-400ms of buffering. For production, switch to WebRTC. The skill knows how — ask it.
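Before switching transports, it can help to put a number on "feels off". The sketch below measures the gap between the caller finishing a turn and the first agent audio chunk that follows. It assumes you call its two methods from the 'user_speech' and 'audio' handlers shown in Step 3, which makes the result an approximation because transcript events may lag the actual end of speech. `TurnLatencyTracker` is a hypothetical helper.

```typescript
// Hypothetical helper: measures the gap between the caller finishing a turn
// and the first agent audio chunk that follows it.
export class TurnLatencyTracker {
  private turnEndedAt: number | null = null
  private readonly now: () => number
  readonly samplesMs: number[] = []

  // The clock is injectable so the tracker can be tested without real time passing.
  constructor(now: () => number = () => Date.now()) {
    this.now = now
  }

  // Call when a caller transcript arrives (for example in the 'user_speech' handler).
  callerTurnEnded(): void {
    this.turnEndedAt = this.now()
  }

  // Call for every agent audio chunk; only the first chunk after a turn is recorded.
  agentAudioReceived(): void {
    if (this.turnEndedAt === null) return
    this.samplesMs.push(this.now() - this.turnEndedAt)
    this.turnEndedAt = null
  }

  // Median of the recorded gaps, or null when nothing has been recorded yet.
  medianMs(): number | null {
    if (this.samplesMs.length === 0) return null
    const sorted = [...this.samplesMs].sort((a, b) => a - b)
    const mid = Math.floor(sorted.length / 2)
    return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
  }
}
```

Comparing the median before and after a transport change gives a more reliable signal than listening for the difference.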

The agent ignores tools. This usually means the system prompt doesn't tell the agent when to call them. Be explicit: "Whenever a user provides an email address, call lookup_account." Skill-scaffolded prompts include this guidance, but it is worth checking.
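A small check can catch the most obvious form of this problem: a tool that the system prompt never names. The sketch below is a hypothetical lint-style helper, `findUnmentionedTools`. It only tests whether each tool name appears in the prompt, so it cannot tell whether the instruction around that name is clear.

```typescript
// Hypothetical lint-style check: which tool names never appear in the system prompt?
export function findUnmentionedTools(systemPrompt: string, toolNames: string[]): string[] {
  const prompt = systemPrompt.toLowerCase()
  return toolNames.filter((name) => !prompt.includes(name.toLowerCase()))
}
```

Run against the prompt and tool names from agent.yaml, an empty result means every tool is at least mentioned.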

Barge-in feels awkward. ElevenLabs handles barge-in by default, but some voice and pacing settings can make interruptions less responsive. The skill exposes interruptible: true on the session; make sure it is set.
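If your own audio output buffers chunks before playing them, leftover buffered audio can also make interruptions feel late even when the session reacts promptly. The sketch below shows a playback buffer that can be emptied on interruption. It is an assumption about your application's output path, not something the skill requires. `PlaybackQueue` is a hypothetical name, and which event signals an interruption depends on the scaffolded code.

```typescript
// Hypothetical playback buffer that sits between the 'audio' handler and the speaker.
export class PlaybackQueue<Chunk> {
  private chunks: Chunk[] = []

  // Add an agent audio chunk to the end of the buffer.
  enqueue(chunk: Chunk): void {
    this.chunks.push(chunk)
  }

  // Next chunk to play, or undefined when the buffer is empty.
  next(): Chunk | undefined {
    return this.chunks.shift()
  }

  // Call when the caller interrupts: drops queued agent audio and reports how many chunks.
  flush(): number {
    const dropped = this.chunks.length
    this.chunks = []
    return dropped
  }
}
```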

Where to go next

Add a knowledge base — the agents skill supports retrieving from a vector store mid-conversation.
Add analytics — pipe session.on('end', ...) summaries into your analytics layer.
Layer in voice-isolator — pre-process inbound audio if your callers are in noisy environments.

The point of skills is that adding any of these is a one-line prompt to your assistant, not a one-week project.
