Google Gemini Media
Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech, and audio understanding) to deliver end-to-end multimodal media workflows and code templates for "generation + understanding".
This Skill consolidates six Gemini API capabilities into reusable workflows and implementation templates:
Convention: This Skill follows the official Google Gen AI SDK (Node.js/REST) as its baseline; currently only Node.js/REST examples are provided. If your project already wraps other languages or frameworks, map this Skill's request structure, model selection, and I/O spec onto your wrapper layer.
Prerequisites and I/O conventions:

- SDK: `npm install @google/genai`. REST examples use `curl`; if you need to parse image Base64 out of JSON responses, also install `jq` (optional).
- Auth: set the `GEMINI_API_KEY` environment variable and send it as the `x-goog-api-key: $GEMINI_API_KEY` header in REST requests.
- Media input has two routes:
  - Inline (embedded bytes/Base64) directly in the request.
  - Files API (upload then reference): `files.upload(...)` in the SDK, or `POST /upload/v1beta/files` (REST resumable), then reference the file via `file_data` / `file_uri` in `generateContent`.
- Engineering suggestion: implement an `ensure_file_uri()` helper so that when a file exceeds a threshold (for example, warn above 10-15 MB) or is reused, you automatically route through the Files API.
- Media output: generated images arrive as `inline_data` (Base64) in response parts; in the Node.js SDK, decode `part.inlineData.data` and save as PNG/JPG. TTS returns raw PCM; save as `.pcm` or wrap into `.wav` (commonly 24 kHz, 16-bit, mono).

Important: model names, versions, limits, and quotas can change over time. Verify against official docs before use. Last updated: 2026-01-22.
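The routing suggestion above can be sketched as a small helper. The names `shouldUseFilesApi` and `ensureFileUri` are hypothetical (not part of the SDK), and the `ai` argument is assumed to be a `GoogleGenAI` client, injected so the threshold logic stands alone:

```javascript
import * as fs from "node:fs";

// Hypothetical helper: route large or reused files through the Files API,
// otherwise fall back to inline Base64 bytes.
const FILES_API_THRESHOLD = 15 * 1024 * 1024; // ~15 MB

function shouldUseFilesApi(sizeBytes, threshold = FILES_API_THRESHOLD) {
  return sizeBytes >= threshold;
}

async function ensureFileUri(ai, path, mimeType) {
  const sizeBytes = fs.statSync(path).size;
  if (shouldUseFilesApi(sizeBytes)) {
    // Files API: upload once, then reference by URI in generateContent.
    const uploaded = await ai.files.upload({ file: path });
    return { fileData: { fileUri: uploaded.uri, mimeType: uploaded.mimeType } };
  }
  // Small file: embed the bytes inline.
  const data = fs.readFileSync(path).toString("base64");
  return { inlineData: { mimeType, data } };
}
```

The returned object is a ready-made part to append to a `contents` array; caching upload results for reused files is left to the wrapper layer.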
Default models:

- `gemini-3-flash-preview` for image, video, and audio understanding (choose stronger models as needed for quality/cost).
- `veo-3.1-generate-preview` for video generation (generates 8-second videos and can natively generate audio).
- `gemini-2.5-flash-preview-tts` for speech generation (native TTS, currently in preview).
- `gemini-2.5-flash-image` (Nano Banana) for image generation and editing, as used in the templates below.

Image generation — SDK (Node.js) minimal template
```javascript
import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image",
  contents: "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme",
});

const parts = response.candidates?.[0]?.content?.parts ?? [];
for (const part of parts) {
  if (part.text) console.log(part.text);
  if (part.inlineData?.data) {
    fs.writeFileSync("out.png", Buffer.from(part.inlineData.data, "base64"));
  }
}
```
REST (with imageConfig) minimal template
```sh
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"}]}],
    "generationConfig": {"imageConfig": {"aspectRatio": "16:9"}}
  }'
```
REST image parsing (Base64 decode)
```sh
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"A minimal studio product shot of a nano banana"}]}]}' \
  | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' \
  | base64 --decode > out.png
```

On macOS, use `base64 -D > out.png` for the final step.
Use case: given an image, add/remove/modify elements, change style, color grading, etc.
SDK (Node.js) minimal template
```javascript
import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const prompt = "Add a nano banana on the table, keep lighting consistent, cinematic tone.";
const imageBase64 = fs.readFileSync("input.png").toString("base64");

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image",
  contents: [
    { text: prompt },
    { inlineData: { mimeType: "image/png", data: imageBase64 } },
  ],
});

const parts = response.candidates?.[0]?.content?.parts ?? [];
for (const part of parts) {
  if (part.inlineData?.data) {
    fs.writeFileSync("edited.png", Buffer.from(part.inlineData.data, "base64"));
  }
}
```
Best practice: use chat for continuous iteration (for example: generate first, then "only edit a specific region/element", then "make variants in the same style").
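That iteration loop can be sketched as a thin helper over the SDK's chat API (this assumes `ai.chats.create` and `chat.sendMessage({ message })` as exposed by recent `@google/genai` versions; the client is injected, so the loop itself is plain logic):

```javascript
// Sketch: run a sequence of edit instructions in one chat session, so each
// step sees the previous result (e.g. generate, then edit one region, then
// make same-style variants). `ai` is a GoogleGenAI client (or a stub).
async function iterateEdits(ai, model, steps) {
  const chat = ai.chats.create({ model });
  const responses = [];
  for (const step of steps) {
    // Each sendMessage call carries the full session history forward.
    responses.push(await chat.sendMessage({ message: step }));
  }
  return responses;
}
```

Each response's image parts can be saved exactly as in the template above; only the last response is usually the deliverable.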
To output mixed "text + image" results, set `responseModalities` to `["TEXT", "IMAGE"]`. Image options can be set in `generationConfig.imageConfig` (REST) or the SDK `config`:
- `aspectRatio`: e.g. `16:9`, `1:1`.
- `imageSize`: e.g. `2K`, `4K` (higher resolution is usually slower/more expensive, and model support can vary).

Image understanding (inline bytes) — SDK (Node.js) minimal template

```javascript
import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const imageBase64 = fs.readFileSync("image.jpg").toString("base64");

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: [
    { inlineData: { mimeType: "image/jpeg", data: imageBase64 } },
    { text: "Caption this image, and list any visible brands." },
  ],
});

console.log(response.text);
```
Image understanding (Files API upload) — SDK template

```javascript
import { GoogleGenAI, createPartFromUri, createUserContent } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const uploaded = await ai.files.upload({ file: "image.jpg" });

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: createUserContent([
    createPartFromUri(uploaded.uri, uploaded.mimeType),
    "Caption this image.",
  ]),
});

console.log(response.text);
```
Append multiple images as multiple `Part` entries in the same `contents`; you can mix uploaded references and inline bytes.
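As an illustration, a hypothetical helper that builds such a mixed `contents` array from plain part objects (the same shapes the SDK's `createPartFromUri` and inline-data helpers produce):

```javascript
// Sketch: one user turn containing two images (a Files API reference plus
// inline Base64 bytes) and a text question. Plain objects only.
function buildMultiImageParts(fileUri, fileMimeType, inlineBase64, question) {
  return [
    { fileData: { fileUri, mimeType: fileMimeType } },
    { inlineData: { mimeType: "image/png", data: inlineBase64 } },
    { text: question },
  ];
}
```

The returned array can be passed directly as `contents` in `generateContent`.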
Video generation (Veo) — SDK template

```javascript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const prompt = "A cinematic shot of a cat astronaut walking on the moon. Include subtle wind ambience.";

let operation = await ai.models.generateVideos({
  model: "veo-3.1-generate-preview",
  prompt,
  config: { resolution: "1080p" },
});

// Video generation is long-running: poll until the operation completes.
while (!operation.done) {
  await new Promise((resolve) => setTimeout(resolve, 10_000));
  operation = await ai.operations.getVideosOperation({ operation });
}

const video = operation.response?.generatedVideos?.[0]?.video;
if (!video) throw new Error("No video returned");
await ai.files.download({ file: video, downloadPath: "out.mp4" });
```
Key point: Veo REST uses `:predictLongRunning`, which returns an operation name; poll `GET /v1beta/{operation_name}`, and once done, download from the video URI in the response.
Veo config options:

- `aspectRatio`: `"16:9"` or `"9:16"`.
- `resolution`: `"720p" | "1080p" | "4k"` (higher resolutions are usually slower/more expensive).

Polling fallback (with timeout/backoff) pseudocode
```javascript
const deadline = Date.now() + 300_000; // 5 min
let sleepMs = 2000;
while (!operation.done && Date.now() < deadline) {
  await new Promise((resolve) => setTimeout(resolve, sleepMs));
  sleepMs = Math.min(Math.floor(sleepMs * 1.5), 15_000); // exponential backoff, capped
  operation = await ai.operations.getVideosOperation({ operation });
}
if (!operation.done) throw new Error("video generation timed out");
```
Video understanding — SDK template

```javascript
import { GoogleGenAI, createPartFromUri, createUserContent } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const uploaded = await ai.files.upload({ file: "sample.mp4" });

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: createUserContent([
    createPartFromUri(uploaded.uri, uploaded.mimeType),
    "Summarize this video. Provide timestamps for key events.",
  ]),
});

console.log(response.text);
```
Speech generation (TTS) — SDK template

```javascript
import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-preview-tts",
  contents: [{ parts: [{ text: "Say cheerfully: Have a wonderful day!" }] }],
  config: {
    responseModalities: ["AUDIO"],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "Kore" },
      },
    },
  },
});

const data = response.candidates?.[0]?.content?.parts?.[0]?.inlineData?.data ?? "";
if (!data) throw new Error("No audio returned");
fs.writeFileSync("out.pcm", Buffer.from(data, "base64"));
```
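Since the TTS bytes are raw PCM, here is a sketch for wrapping them into a playable `.wav`, assuming the commonly documented output format of 24 kHz, 16-bit, mono (verify against current docs):

```javascript
// Sketch: prepend a 44-byte RIFF/WAVE header to raw PCM bytes.
function wavFromPcm(pcm, { sampleRate = 24000, bitsPerSample = 16, channels = 1 } = {}) {
  const byteRate = (sampleRate * channels * bitsPerSample) / 8;
  const blockAlign = (channels * bitsPerSample) / 8;
  const header = Buffer.alloc(44);
  header.write("RIFF", 0);
  header.writeUInt32LE(36 + pcm.length, 4); // total size minus 8
  header.write("WAVE", 8);
  header.write("fmt ", 12);
  header.writeUInt32LE(16, 16);             // fmt chunk size for PCM
  header.writeUInt16LE(1, 20);              // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36);
  header.writeUInt32LE(pcm.length, 40);
  return Buffer.concat([header, pcm]);
}
```

Usage after the template above: `fs.writeFileSync("out.wav", wavFromPcm(Buffer.from(data, "base64")))`.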
Requirements:

- Multi-speaker output uses `multiSpeakerVoiceConfig` (assign a voice to each named speaker).
- `voiceName` supports 30 prebuilt voices (for example `Zephyr`, `Puck`, `Charon`, `Kore`).
- Provide controllable directions for style, pace, accent, etc., but avoid over-constraining.
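A sketch of what a two-speaker request config might look like (the speaker labels `Host` and `Guest` are illustrative; they must match the speaker names written into the prompt text itself):

```javascript
// Sketch: multiSpeakerVoiceConfig assigning one prebuilt voice per speaker.
const speechConfig = {
  multiSpeakerVoiceConfig: {
    speakerVoiceConfigs: [
      { speaker: "Host", voiceConfig: { prebuiltVoiceConfig: { voiceName: "Kore" } } },
      { speaker: "Guest", voiceConfig: { prebuiltVoiceConfig: { voiceName: "Puck" } } },
    ],
  },
};
```

Pass it as `config: { responseModalities: ["AUDIO"], speechConfig }` with a prompt such as `"Host: Welcome back! Guest: Thanks for having me."`.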
Audio understanding — SDK template

```javascript
import { GoogleGenAI, createPartFromUri, createUserContent } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const uploaded = await ai.files.upload({ file: "sample.mp3" });

const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: createUserContent([
    "Describe this audio clip.",
    createPartFromUri(uploaded.uri, uploaded.mimeType),
  ]),
});

console.log(response.text);
```