AI Images in the Terminal: Creative Direction, Not a Slot Machine

Most people use AI image generation like a slot machine. They type "blog header about productivity," pull the lever, get a stock-feeling render of a desk with a plant, and either accept it or pull again. The output is generic because the input was generic — the model filled in every unstated decision with its median guess, and the median is a desk with a plant. This is the same failure as a vibe-coded landing page: when you don't direct, the model ships the average, and the average is dead.

The fix isn't a better model. It's a better brief, generated without leaving your terminal. You can produce blog headers, thumbnails, product mockups, even room redesigns from inside the agent at roughly four cents an image — and a wrapper skill can take your lazy one-liner and expand it into a directed shot using a five-component formula. Stop pulling the lever. Start writing the brief.

Key Takeaways

Generic in, generic out. A one-line prompt makes the model guess every unstated decision — and its guesses are the median. Direction is the entire game.
Five components turn a lever-pull into a shot. Subject, Action, Location, Composition, Style — name all five and the slot machine becomes a brief.
It lives in the terminal. nano-banana-gemini-images generates blog headers, thumbnails, mockups and room redesigns without leaving the agent at ~$0.04/image.
A creative director can do the directing for you. banana-claude-creative-director expands a lazy request into the full five-component prompt before it ever hits the model.
One free key unlocks it. nano-banana needs a free Gemini API key — that's the only setup between you and terminal-native image generation.

The Slot Machine vs. The Brief

The difference between a generic render and a striking one is almost never the model. It's whether the prompt specified the decisions the model would otherwise make for you. Leave a decision unstated and the model fills it with the most probable value — which, by definition, is what everyone else gets too.

The five-component formula forces every decision onto the table: Subject (what), Action (what it's doing), Location (where), Composition (how it's framed), Style (how it's rendered). Name all five and there's nothing left for the model to median.

| Decision | Lazy prompt (slot machine) | Directed shot (brief) |
| --- | --- | --- |
| Prompt | "blog header about productivity" | "A single brass stopwatch (Subject) mid-fall, frozen an inch above a marble desk (Action), in a sunlit minimalist studio (Location), shot from a low 3/4 angle with shallow depth of field (Composition), editorial product-photography style, warm key light, high contrast (Style)" |
| Subject | Unspecified → desk + plant | Brass stopwatch |
| Composition | Centered, flat | Low 3/4 angle, shallow DoF |
| Style | Model's default | Editorial product photo |
| Result | Stock-feeling, forgettable | Specific, ownable, on-brief |

The right column isn't more creative than you — it's more complete. Every cell the model would have guessed, you answered. That's the whole difference between an asset that looks chosen and one that looks defaulted.
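
To make "more complete" concrete, here is a minimal Python sketch of the five-component formula as a data structure. The `ShotBrief` class and its methods are hypothetical names invented for illustration; neither skill is known to expose anything like them. The point is only that a brief with a blank component can be rejected before it reaches the model.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ShotBrief:
    """Hypothetical container for the five decisions a directed prompt states."""
    subject: str
    action: str
    location: str
    composition: str
    style: str

    def missing(self) -> list[str]:
        """Return the names of components that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_prompt(self) -> str:
        """Render the brief as labeled sentences, refusing incomplete briefs."""
        gaps = self.missing()
        if gaps:
            raise ValueError(f"brief is missing: {', '.join(gaps)}")
        return " ".join(
            f"{f.name.capitalize()}: {getattr(self, f.name).strip().rstrip('.')}."
            for f in fields(self)
        )


# The directed shot from the table above, one field per decision.
brief = ShotBrief(
    subject="a single brass stopwatch",
    action="mid-fall, frozen an inch above a marble desk",
    location="sunlit minimalist studio",
    composition="low 3/4 angle, shallow depth of field",
    style="editorial product photography, warm key light, high contrast",
)
print(brief.to_prompt())
```

Passing only "blog header about productivity" as the subject and leaving the other four fields empty makes `to_prompt()` raise, which is the slot-machine prompt failing loudly instead of silently defaulting.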

Generating From Inside the Agent

nano-banana-gemini-images puts Gemini image generation inside Claude Code, so you produce assets in the same loop where you write the post — no browser tab, no copy-paste, no context switch. It handles blog headers, thumbnails, product mockups, and even room redesigns, at roughly $0.04 per image.

Install it as a plugin:

/plugin marketplace add devonjones/devon-claude-skills
/plugin install nano-banana@devon-claude-skills

It needs a free Gemini API key — that's the only prerequisite. Set it, and the agent can generate inline:

"Generate a 1200x630 blog header. Subject: a brass stopwatch mid-fall.
Action: frozen an inch above a marble desk. Location: sunlit minimalist
studio. Composition: low 3/4 angle, shallow depth of field, subject
left-of-center for headline space. Style: editorial product photography,
warm key light, high contrast."

Because it runs in the terminal at roughly four cents a shot, iteration is cheap. Generate three framings, keep the one that works, and you've spent about twelve cents and zero context switches. Those economics are what make direction practical: you can afford to be picky when each pull costs four cents.
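
The three-framings arithmetic can be sketched in a few lines of Python. Everything here is an assumption for illustration: `framing_variants` and `estimated_cost` are hypothetical helpers, the per-image rate is the article's rough figure rather than a quoted price, and the second and third compositions are made-up variants.

```python
from decimal import Decimal

# Assumption: the article's rough per-image figure, not an official price.
COST_PER_IMAGE = Decimal("0.04")


def framing_variants(base_prompt: str, compositions: list[str]) -> list[str]:
    """Return one prompt per composition, holding the other four components fixed."""
    return [f"{base_prompt} Composition: {c}." for c in compositions]


def estimated_cost(image_count: int, per_image: Decimal = COST_PER_IMAGE) -> Decimal:
    """Estimated spend in dollars for a batch of generations."""
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    return per_image * image_count


base = (
    "Subject: a brass stopwatch mid-fall. "
    "Action: frozen an inch above a marble desk. "
    "Location: sunlit minimalist studio. "
    "Style: editorial product photography, warm key light, high contrast."
)
prompts = framing_variants(
    base,
    [
        "low 3/4 angle, shallow depth of field",     # from the example above
        "top-down flat lay, subject left-of-center",  # illustrative variant
        "eye-level close-up, tight crop",             # illustrative variant
    ],
)
print(f"{len(prompts)} framings, roughly ${estimated_cost(len(prompts))}")
```

Using `Decimal` keeps the cents exact, so three framings at the assumed rate come to $0.12 rather than a floating-point approximation of it.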

Letting a Creative Director Write the Brief

Writing all five components every time is friction, and friction is where people slide back to lazy one-liners. banana-claude-creative-director closes that gap: it takes a vague request and expands it into the full directed prompt before generation, so you get the quality of a complete brief from the convenience of a casual ask.

/plugin marketplace add AgriciDaniel/banana-claude

With the skill installed from that marketplace, the lazy input becomes a directed output automatically:

You:  "thumbnail for my video on shipping faster"
CD:   → Subject: a paper plane made of code.
      → Action: launching off a laptop trackpad.
      → Location: dark desk, neon rim light.
      → Composition: dynamic diagonal, rule-of-thirds, bold negative space.
      → Style: high-contrast tech-editorial, cinematic.
      [generates the directed shot]

This is the same pattern as the rest of the series: a SKILL.md overrides the model's default behavior before the work starts. The creative-director skill doesn't make the model better at images; it makes your prompts better at directing, which is where the median dies. Pair it with nano-banana-gemini-images for generation and you've got a terminal-native studio: one skill directs the shot, the other renders it.
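
If you want to confirm that an expanded brief really covers all five components before spending a generation on it, a small parser is enough. The sketch below assumes the labeled-line layout shown in the illustration above; the skill's real output format may differ, and `parse_direction` and `missing_components` are hypothetical helpers, not part of either plugin.

```python
import re

COMPONENTS = ("Subject", "Action", "Location", "Composition", "Style")

# Matches lines such as "→ Subject: a paper plane made of code."
# An optional "CD:" speaker tag and an optional arrow may precede the label.
_LINE = re.compile(
    r"^\s*(?:CD:)?\s*(?:→|->)?\s*(?P<label>[A-Za-z]+):\s*(?P<value>.+?)\s*$"
)


def parse_direction(text: str) -> dict[str, str]:
    """Extract the labeled components from an expanded brief."""
    found: dict[str, str] = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match and match.group("label") in COMPONENTS:
            found[match.group("label")] = match.group("value")
    return found


def missing_components(found: dict[str, str]) -> list[str]:
    """List the components the expansion failed to specify."""
    return [name for name in COMPONENTS if name not in found]


expanded = """\
CD:   → Subject: a paper plane made of code.
      → Action: launching off a laptop trackpad.
      → Location: dark desk, neon rim light.
      → Composition: dynamic diagonal, rule-of-thirds, bold negative space.
      → Style: high-contrast tech-editorial, cinematic.
"""
direction = parse_direction(expanded)
print(missing_components(direction))
```

Run against the original one-liner instead of the expansion, `missing_components` reports all five names, which is a compact way to see how many decisions the lazy prompt left to the model.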

Frequently Asked Questions

What does it cost to run image generation in the terminal?

Roughly $0.04 per image with nano-banana-gemini-images. Because each pull is a few cents, iteration is genuinely cheap: generate three framings, keep one, and you've spent about twelve cents. That low per-image cost is what makes serious direction practical; you can afford to reject the first two and refine.

Do I need a paid API key?

No. nano-banana needs a Gemini API key, and the key is free to create; that's the only setup step. No paid tier is required to start. The per-image cost is billed through Gemini, and once the key is set you can generate from inside the agent immediately.

What's the five-component formula again?

Subject, Action, Location, Composition, Style. Name all five and you've answered every decision the model would otherwise guess with its median value. banana-claude-creative-director can expand a vague request into all five for you, so you get a complete brief without writing one by hand each time.

Why not just use a web image tool?

Two reasons. First, context: generating inside the agent keeps you in the same loop where you're writing the post, with no tab-switching or copy-paste. Second, direction: the creative-director skill enforces the five-component brief automatically, which a generic web prompt box does not. The terminal isn't just convenient; it's where the anti-slop direction lives.

Browse 135+ agent-ready design systems in the Designs category, or explore the full skill catalog at aiskill.market.
