Podcast Generation with Microsoft Foundry
Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation f
Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation f
Real data. Real impact.
Emerging
Developers
Per week
Open source
Skills give you superpowers. Install in 30 seconds.
Generate real audio narratives from text content using Azure OpenAI's Realtime API.
AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini
Note: Endpoint should NOT include
/openai/v1/ - just the base URL.
from openai import AsyncOpenAI import base64Convert HTTPS endpoint to WebSocket URL
ws_url = endpoint.replace("https://", "wss://") + "/openai/v1"
client = AsyncOpenAI( websocket_base_url=ws_url, api_key=api_key )
audio_chunks = [] transcript_parts = []
async with client.realtime.connect(model="gpt-realtime-mini") as conn: # Configure for audio-only output await conn.session.update(session={ "output_modalities": ["audio"], "instructions": "You are a narrator. Speak naturally." })
# Send text to narrate await conn.conversation.item.create(item={ "type": "message", "role": "user", "content": [{"type": "input_text", "text": prompt}] }) await conn.response.create() # Collect streaming events async for event in conn: if event.type == "response.output_audio.delta": audio_chunks.append(base64.b64decode(event.delta)) elif event.type == "response.output_audio_transcript.delta": transcript_parts.append(event.delta) elif event.type == "response.done": breakConvert PCM to WAV (see scripts/pcm_to_wav.py)
pcm_audio = b''.join(audio_chunks) wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)
// Convert base64 WAV to playable blob const base64ToBlob = (base64, mimeType) => { const bytes = atob(base64); const arr = new Uint8Array(bytes.length); for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i); return new Blob([arr], { type: mimeType }); };const audioBlob = base64ToBlob(response.audio_data, 'audio/wav'); const audioUrl = URL.createObjectURL(audioBlob); new Audio(audioUrl).play();
| Voice | Character |
|---|---|
| alloy | Neutral |
| echo | Warm |
| fable | Expressive |
| onyx | Deep |
| nova | Friendly |
| shimmer | Clear |
response.output_audio.delta - Base64 audio chunkresponse.output_audio_transcript.delta - Transcript textresponse.done - Generation completeerror - Handle with event.error.messageNo automatic installation available. Please visit the source repository for installation instructions.
View Installation Instructions1,500+ AI skills, agents & workflows. Install in 30 seconds. Part of the Torly.ai family.
© 2026 Torly.ai. All rights reserved.