Designing AI Product Behavior, Not Just Pixels
The next design surface isn't the layout — it's how the product behaves. Context budgeting, turn-taking, generative UI, and frustration repair, in one skill.
You can give an AI product a beautiful interface and still ship a bad product. The slop in an AI app isn't usually the layout — it's the behavior. The assistant that dumps a wall of text when a card would do. The one that asks three clarifying questions when it should have just acted, or acts when it should have asked. The one that has no idea the user is frustrated and keeps cheerfully repeating the same wrong answer. None of that is a pixel problem. It's a behavior-design problem, and it's the surface most teams haven't learned to see yet.
This is the design layer past the design layer. Once your design system handles the look, the next question is how the thing acts — and "acts" is designable. Context-window budgeting, conversation and turn-taking, when to render generative UI versus plain text, progressive disclosure, mixed-initiative flow, frustration detection: these are real decisions with right and wrong answers. The source thread for this series imagined a separate skill per concept, but the real repo bundles all of it into one — model-interaction-design. The model ships median behavior by default; this skill overwrites it with deliberate behavior.
Key Takeaways
- Behavior is a design surface. How an AI product responds — render UI or text, ask or act, recover or repeat — is designed, not accidental, and it's where most AI slop actually lives.
- One real skill, many concerns. model-interaction-design covers context budgeting, turn-taking and repair, generative UI decisions, progressive disclosure, mixed-initiative flow, and frustration detection. Install it with npx skills add https://github.com/Owl-Listener/ai-design-skills --skill model-interaction-design.
- Context is a budget, not a bucket. Treat the window as a finite resource you allocate on purpose; spend it on what the current turn needs.
- Generative UI is a choice, not a default. Render structured UI when the user needs to act or compare; render plain text when they need to read or reason.
- Frustration is a signal to design for. Detect it and change behavior — escalate, simplify, or hand off — instead of repeating the failing response.
Why Behavior Is the Real Surface
A static product is judged by its screens. An AI product is judged by its turns — the back-and-forth, the recovery when it's wrong, the moments it reads the room or fails to. You can A/B test a button color, but behavior is what users remember. When someone says an AI feature "feels dumb," they almost never mean it looked bad. They mean it asked when it should have acted, buried the answer in prose, or ignored that they were clearly annoyed.
That's why behavior design is high-leverage. Most teams pour their effort into the chat UI — the bubbles, the streaming animation, the input box — and leave the behavior at whatever the base model does out of the box. The base model's behavior is the median behavior, and the median is forgettable. model-interaction-design exists to make the behavior deliberate, the same way a DESIGN.md makes the pixels deliberate. It's a single skill covering several concerns, so installing it overwrites the agent's default interaction taste across the board.
The Behavior Concerns, and What Good Looks Like
Here's the map. Each row is a behavior decision the skill helps you make on purpose instead of by default.
| Behavior concern | Default / slop behavior | What good looks like |
|---|---|---|
| Context-window budgeting | Stuff everything in until it overflows | Allocate the window on purpose; keep what this turn needs, summarize or drop the rest |
| Turn-taking & repair | Plough ahead; restate the same wrong answer | Yield at natural points; on failure, diagnose and try a different approach |
| Generative UI vs text | Always reply in prose | Render UI to act or compare; use text to read or reason |
| Progressive disclosure | Dump everything at once | Reveal depth on demand; lead with the answer, offer the detail |
| Mixed-initiative flow | Either always asks or never asks | Ask when ambiguity is costly; act when intent is clear |
| Frustration detection | Oblivious; repeats the failing reply | Detect the signal; simplify, escalate, or hand off |
Read the table top to bottom: the gap between the middle column and the right column is the gap between a forgettable AI feature and one that feels like it was designed by someone who'd used AI products before. That's the whole bet of this series.
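The repair and frustration rows translate most directly into code. Below is a minimal Python sketch. It assumes that something upstream already counts consecutive failed turns (re-asks, explicit "that's wrong" replies) and flags a request for a human. The ladder order and the move names are hypothetical choices for illustration, not part of model-interaction-design. The point is that each additional failure changes the behavior instead of restating the same answer.

```python
# Hypothetical escalation ladder: each rung is a different move, never a rerun
# of the reply that just failed. Names and order are illustrative only.
REPAIR_LADDER = (
    "proceed",                 # no failure signal: carry on normally
    "try_different_approach",  # one failure: diagnose, change tactic
    "simplify",                # two in a row: shorter answer, fewer options
    "escalate",                # three in a row: e.g. a stronger tool or richer format
)

def repair_move(consecutive_failures: int, asked_for_human: bool = False) -> str:
    """Pick the next move from how many turns in a row have failed."""
    if consecutive_failures < 0:
        raise ValueError("consecutive_failures cannot be negative")
    if asked_for_human:
        # An explicit request for a person skips the ladder entirely.
        return "hand_off"
    if consecutive_failures < len(REPAIR_LADDER):
        return REPAIR_LADDER[consecutive_failures]
    # Past the top rung, stop retrying and hand the user to a person.
    return "hand_off"
```

With this shape, repair_move(1) returns "try_different_approach" and repair_move(5) returns "hand_off". The thresholds are the part a real product would tune from session data.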
Context as a Budget
The most common behavior bug is treating the context window as a bucket you fill until it overflows. It's a budget. Every token you spend on stale history or irrelevant retrieved chunks is a token not spent on the current task, and an overstuffed window makes the model worse, not better. Designing context budgeting means deciding, per turn, what earns its place, then writing that decision into the product's instructions:
```
You manage a finite context budget. Before each turn, decide what to
keep, summarize, or drop:
- Keep: the current goal, the last 2-3 relevant turns, active state.
- Summarize: older conversation into a compact running brief.
- Drop: resolved subtasks, superseded plans, tool output already acted on.
Never carry forward context just because it fits. Spend the window on
what THIS turn needs.
```
That's a behavior instruction, and it changes how the product feels: sharper, less prone to drift, less likely to "forget" the actual task by burying it under its own history. The budgeting discipline pairs directly with the prompt-architecture work in "System Prompts Are Design": what you put in the window and how you structure it are the same problem from two angles.
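The same keep/summarize/drop discipline can also be enforced in code while assembling each turn's context, so the model never receives what the rules say to drop. This is a minimal Python sketch under loud assumptions. The Turn record and its status labels are hypothetical. count_tokens is a whitespace word count standing in for a real tokenizer. summarize is a placeholder for what would likely be a model call.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    # Hypothetical turn record; the status labels mirror the keep/summarize/drop rules.
    role: str             # "user", "assistant", or "tool"
    text: str
    status: str = "open"  # "open", "resolved", "superseded", or "acted_on"

def count_tokens(text: str) -> int:
    # Stand-in only: a whitespace word count, not a real tokenizer.
    return len(text.split())

def summarize(turns: list[Turn]) -> str:
    # Placeholder running brief: the first six words of each older turn.
    return "Earlier: " + "; ".join(" ".join(t.text.split()[:6]) for t in turns)

def build_context(goal: str, state: str, history: list[Turn],
                  budget: int, recent: int = 3) -> list[str]:
    # Drop: resolved subtasks, superseded plans, tool output already acted on.
    live = [t for t in history if t.status == "open"]
    # Keep the last `recent` live turns verbatim; everything older gets summarized.
    split = max(len(live) - recent, 0)
    older, kept = live[:split], live[split:]
    parts = [f"GOAL: {goal}", f"STATE: {state}"]
    if older:
        parts.append(summarize(older))
    parts += [f"{t.role}: {t.text}" for t in kept]
    # Enforce the budget by trimming at index 2 (the running brief if there is
    # one, then the oldest verbatim turn); goal and active state always stay.
    while sum(count_tokens(p) for p in parts) > budget and len(parts) > 2:
        parts.pop(2)
    return parts
```

One design choice worth questioning: this version sheds the running brief before any recent turn, on the theory that the last few turns matter most to the current one. A product with long-lived goals might reverse that order.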
Generative UI vs Plain Text
The single highest-impact behavior decision in an AI product is when to render structured UI instead of text. Get it wrong in one direction and the user reads a paragraph when they wanted a button. Get it wrong in the other and you render a complex widget when a sentence would have done. The rule is about the user's next move:
```
Decide the response format by what the user needs to do next:
- They need to ACT (choose, confirm, edit, compare options) -> render UI.
  Buttons, a comparison table, a form, a card with actions.
- They need to READ or REASON (an explanation, a judgment, a narrative)
  -> render plain text.
- Mixed -> lead with text, then offer UI for the actionable part.
Default to the lighter option; escalate to UI only when it earns itself.
```
This is where "generative UI" stops being a buzzword and becomes a decision rule. The product that nails it feels responsive and considered; the one that dumps prose for everything feels like a chat log. Tie this to frustration detection: when a user re-asks the same thing, that's often a signal your last response was the wrong format, not just the wrong content. Designing the recovery is part of the behavior, and it's exactly the kind of trust-building move covered in "Trust Is a Design Material".
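The format rule is small enough to write down as a function. This is a minimal Python sketch. It assumes that a hypothetical upstream step has already classified what the user needs to do next. The re-ask branch is one possible reading of the paragraph above: when a user re-asks right after a reply of the same shape, switch the shape instead of resending it.

```python
from enum import Enum
from typing import List, Optional

class Need(Enum):
    ACT = "act"      # choose, confirm, edit, compare options
    READ = "read"    # explanation, judgment, narrative
    MIXED = "mixed"  # some of each

def choose_format(need: Need, reasked: bool = False,
                  last: Optional[List[str]] = None) -> List[str]:
    """Return the response parts in order, e.g. ["text"] or ["text", "ui"]."""
    if need is Need.ACT:
        plan = ["ui"]
    elif need is Need.READ:
        plan = ["text"]
    else:
        # Mixed: lead with text, then offer UI for the actionable part.
        plan = ["text", "ui"]
    # Repair: a re-ask after a reply of this exact shape suggests the format,
    # not only the content, missed. Swap to the other shape.
    if reasked and last == plan:
        plan = ["text", "ui"] if plan == ["text"] else ["text"]
    return plan
```

Falling back to plain text when a widget was ignored is the lighter option, which is in line with the rule's "escalate to UI only when it earns itself".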
Frequently Asked Questions
Isn't this just prompt engineering?
No — prompt engineering is one tool inside behavior design. Behavior design is the broader discipline of deciding how the product acts across turns: when to render UI, how to budget context, how to recover from failure, how to read frustration. Prompts implement some of those decisions, but the decisions themselves are design choices. The prompt-architecture layer is the implementation; this is the spec.
Why one skill instead of a skill per concept?
Because these concerns are entangled. Context budgeting affects turn-taking; turn-taking affects when you render UI; frustration detection changes all of them. A single model-interaction-design skill keeps them coherent, so you overwrite the agent's whole interaction taste at once rather than bolting on six conflicting fragments.
How do I know if my AI product has a behavior problem?
Watch real sessions, not happy-path demos. Look for users re-asking the same question (format or content failure), abandoning mid-flow (turn-taking or initiative failure), or the assistant repeating itself after a wrong answer (no repair, no frustration detection). Those are behavior bugs, and no amount of UI polish fixes them.
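Part of that session review can be automated. This is a rough Python sketch using the standard library's difflib.SequenceMatcher. It flags a user message that closely matches the previous user message (a likely re-ask). It also flags an assistant reply that closely matches the previous one (a likely repeat with no repair). The (role, text) transcript shape and the 0.8 similarity threshold are assumptions. Abandonment needs completion or funnel events, so this sketch leaves it out.

```python
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    # Case-insensitive character-level similarity ratio in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def find_behavior_bugs(session, threshold: float = 0.8):
    """session: ordered (role, text) pairs. Returns a list of (index, label) flags."""
    flags = []
    last = {"user": None, "assistant": None}
    for i, (role, text) in enumerate(session):
        if role not in last:
            continue  # ignore tool or system messages
        prev = last[role]
        if prev is not None and similar(text, prev, threshold):
            flags.append((i, "re-ask" if role == "user" else "repeated reply"))
        last[role] = text
    return flags
```

Character similarity is crude. It will miss a re-ask that is reworded from scratch, and it can flag legitimately repeated short messages like "yes". Treat the flags as pointers to sessions worth watching, not as a verdict.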
Does this only apply to chat products?
No. Any product where an AI takes actions, makes decisions, or responds to ambiguous input has behavior to design — agents, copilots, search, recommendations. The chat case is just the most visible. Mixed-initiative flow and generative UI decisions matter anywhere the AI has to choose between asking and acting.
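One way to make "ask when ambiguity is costly; act when intent is clear" operational on any surface is an expected-cost comparison. This is a swapped-in heuristic, not something the skill prescribes. The rule is to ask only when the chance of misreading intent, multiplied by the cost of acting on the wrong reading, exceeds the cost of an extra turn. How you estimate those three numbers is the hard, product-specific part and is assumed here.

```python
def should_ask(p_wrong: float, cost_if_wrong: float, cost_of_asking: float) -> bool:
    """Ask a clarifying question only if acting on a guess is expected to cost more."""
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability")
    # Expected cost of acting now vs the fixed cost of one more turn.
    return p_wrong * cost_if_wrong > cost_of_asking

# Illustrative numbers only (cost units are arbitrary):
# ambiguous request, irreversible action -> ask
print(should_ask(0.3, cost_if_wrong=10.0, cost_of_asking=1.0))   # True
# clear intent -> act
print(should_ask(0.05, cost_if_wrong=10.0, cost_of_asking=1.0))  # False
# ambiguous but cheap to undo -> act, then offer to adjust
print(should_ask(0.3, cost_if_wrong=2.0, cost_of_asking=1.0))    # False
```

The third case is the interesting one. High ambiguity alone doesn't justify a question if a wrong guess is cheap to undo.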
Where does this sit in the anti-slop stack?
Behavior is the layer between pixels and trust in the six-layer anti-slop stack. The pillar argument — the model ships the median — applies to behavior even harder than to pixels, because almost no one is designing behavior on purpose yet.
Explore the deeper design layers in the Designs category, or browse the full skill catalog at aiskill.market.