Conceptual product case study ~2,000 words | 5–6 min full read | 2–3 min skim

Explore how AI and humans can share decision-making in high-stakes workflows while maintaining user control. The project examines how design patterns around confidence signals, escalation pathways, and user override capabilities can reduce decision friction and accelerate time to resolution while keeping users in the loop and acknowledging their anxiety about AI-made decisions.
The project uses customer support as a familiar context to prototype conceptual UX patterns for safer AI-assisted decisions.
The project was intentionally constrained in scope, and it centers on one question:
How should systems decide when AI can act independently, when humans should review, and when escalation is required?
I used customer support as the primary context because it’s widely understood and directly relevant to many companies building AI-assisted support workflows, such as Intercom, Zendesk, and similar customer service platforms where AI must decide when to respond, defer, or escalate. The patterns here apply anywhere AI assists humans with decisions that carry real consequences.
This is a conceptual project, grounded in public documentation, real workflows, and observed patterns. The goal wasn't to design a "perfect AI," but to design a system that helps humans make better judgment calls.
Support teams are under pressure to move faster without eroding trust.
AI helps by drafting responses, but it introduces new tensions.
The core challenge isn't speed or accuracy in isolation. It's helping humans decide, quickly and safely: is this response good enough to send, does it need review, or should it escalate?
One principle shaped every decision in this system.
AI is a tool, not a decision-maker.
This preserves accountability, builds trust gradually, and respects human expertise. It also reflects real operational and legal constraints. In many customer-facing or regulated environments, responsibility can't sit with an automated system. Someone has to own the outcome.
Everything that follows is designed around that reality.
Most AI tools treat confidence as the primary signal for action. I started there too.
But looking at real support behavior across products like Intercom Fin and Zendesk AI revealed something different.
Agents don't escalate because AI is uncertain. They escalate because the stakes are high.
Example 1: Low stakes
"Where's my order?" AI confidence: 65%. Action: Sent anyway (low stakes, easy to correct)
Example 2: High stakes
"Can I get a refund after 30 days?" AI confidence: 92%. Action: Reviewed carefully (policy and revenue impact)
A wrong answer to a low-stakes question is an annoyance. A wrong answer to a high-stakes question is a crisis.
This shifted the design from a single axis (confidence) to a two-axis system: confidence and stakes.
AI behavior varies along two dimensions:
AI confidence Based on source reliability and answer consistency.
Question stakes Inferred from cost impact and reversibility.
| | Low Stakes | High Stakes |
|---|---|---|
| High Confidence | Auto-send | Suggest with review |
| Low Confidence | Flag for review | Escalate immediately |
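The two-axis table above can be sketched as a single decision function. This is an illustrative sketch, not a real product API; the `Action` names and `decide` signature are assumptions made for clarity:

```python
from enum import Enum

class Action(Enum):
    AUTO_SEND = "auto_send"                    # high confidence, low stakes
    SUGGEST_WITH_REVIEW = "suggest_with_review"  # high confidence, high stakes
    FLAG_FOR_REVIEW = "flag_for_review"        # low confidence, low stakes
    ESCALATE = "escalate"                      # low confidence, high stakes

def decide(high_confidence: bool, high_stakes: bool) -> Action:
    """Map a confidence/stakes quadrant to the matrix action (hypothetical)."""
    if high_confidence:
        return Action.SUGGEST_WITH_REVIEW if high_stakes else Action.AUTO_SEND
    return Action.ESCALATE if high_stakes else Action.FLAG_FOR_REVIEW
```

The point of the sketch is that stakes, not confidence alone, determine whether a human stays in the loop: the fast path exists only in the low-stakes column.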
This framework doesn't just explain behavior. It directly drives the interface.
The system evaluates confidence and question type in real time to determine the UI state.
Customer asks:
"Can I get a refund for my annual plan?"
System evaluation: high confidence, high stakes (a billing question with policy and revenue impact).
UI changes: the draft is held for review instead of auto-sending, and the agent must explicitly choose to send, edit, or escalate.
AI is likely correct, but the cost of being wrong is high. The interface forces a conscious decision without removing agency.
The system doesn't treat all AI output the same. Its authority changes based on the cost of being wrong.
When confidence is high and the question is low risk, the AI can respond immediately. Speed matters more than precision, and mistakes are easy to correct.
When confidence is high but stakes increase, the AI drafts a response instead of sending it. The agent decides whether to send, edit, or escalate.
When confidence is low or judgment is required, escalation becomes the safest default.
This keeps automation fast where it's safe, and deliberately slower where errors carry real consequences.
What the agent sees: Customer message, AI draft, confidence badge (High, Medium, Low), question type tag (General, Billing, Security), actions (Send, Edit, Escalate).

Caption: High-confidence general inquiry. "Send" is the primary action. Reasoning is collapsed by default but available on demand. The interface supports fast decisions without removing verification.
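The review card described above can be sketched as one assembly step. The field names, the badge cutoffs (0.85 and 0.6), and the high-stakes tags are all illustrative assumptions, not a documented schema:

```python
def review_card(message: str, draft: str, confidence: float, qtype: str) -> dict:
    """Assemble the agent-facing card from model output (hypothetical shape)."""
    badge = "High" if confidence >= 0.85 else "Medium" if confidence >= 0.6 else "Low"
    high_stakes = qtype in {"Billing", "Security"}  # assumed stakes tags
    if badge == "High" and not high_stakes:
        primary = "Send"        # fast path: cheap to correct
    elif badge == "Low" and high_stakes:
        primary = "Escalate"    # safest default
    else:
        primary = "Edit"        # draft for review
    return {
        "customer_message": message,
        "ai_draft": draft,
        "confidence_badge": badge,
        "question_type": qtype,
        "actions": ["Send", "Edit", "Escalate"],
        "primary_action": primary,
        "reasoning_collapsed": primary == "Send",  # collapsed only on the fast path
    }
```

Note the asymmetry: reasoning stays collapsed only when "Send" is primary, so friction appears exactly where the cost of being wrong rises.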
Key design decisions
Trade-off: Power users might want more granular data. I chose clarity over precision for most workflows.
Collaboration: Confidence states assume close collaboration with ML and data partners to calibrate thresholds using real accuracy and override data.
When AI detects conflicting information, the interface shifts.
What changes: Confidence drops to Medium. An inline explanation appears ("Conflicting policy documents found"). Reasoning auto-expands showing both sources with timestamps. Edit becomes the primary action.

Caption: Medium confidence with specific uncertainty surfaced. Conflicting sources are shown inline so the agent can resolve the issue quickly.
What happens: AI says "I'm about 50% sure this is the right policy, but I found conflicting information in our KB."
Why it matters: An agent might trust "high confidence" and send a wrong answer. They won't trust "low confidence" and will review more carefully or escalate.
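The conflict behavior above can be sketched as a post-retrieval check: if sources disagree, confidence is capped rather than hidden, and the disagreement is surfaced. The data shapes, the 0.5 cap, and the field names are assumptions for illustration:

```python
def apply_conflict_check(confidence: float, sources: list[dict]) -> dict:
    """Downgrade and surface uncertainty when KB sources disagree (sketch)."""
    answers = {s["answer"] for s in sources}
    if len(answers) > 1:  # retrieved policy documents disagree
        return {
            "confidence": min(confidence, 0.5),  # cap rather than silently trust
            "badge": "Medium",
            "note": "Conflicting policy documents found",
            "expand_reasoning": True,            # show both sources with timestamps
            "primary_action": "Edit",
        }
    return {
        "confidence": confidence,
        "badge": None,
        "note": None,
        "expand_reasoning": False,
        "primary_action": "Send",
    }
```

The design choice is that a conflict is never resolved by the model picking a side; it is converted into visible uncertainty the agent can act on.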
Key design decisions
Trade-off: Surfacing uncertainty increases cognitive load. But it prevents silent failure on ambiguous cases.
Collaboration: Source conflict detection would be reviewed with knowledge and content teams to ensure surfaced issues reflect real gaps, not outdated content.
Low confidence combined with high-stakes question types triggers escalation mode.
What the agent sees: Escalate is primary. The form is pre-filled with customer context, what AI attempted, and a suggested specialist. The agent can override routing if needed.

Caption: Escalation framed as a routing decision, not a failure. Context is preserved so specialists can act immediately.
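The pre-filled escalation form can be sketched as a handoff payload that carries context forward. The routing table, queue names, and field names are hypothetical, standing in for whatever support-ops structure the real system would use:

```python
# Hypothetical routing table; real queues would come from support ops.
ROUTING = {"Billing": "billing-specialists", "Security": "trust-and-safety"}

def escalation_form(customer_context: dict, ai_attempt: str, qtype: str) -> dict:
    """Pre-fill the escalation handoff so the specialist starts with history."""
    return {
        "customer_context": customer_context,            # conversation so far
        "ai_attempted": ai_attempt,                      # what the AI tried, and why it stopped
        "suggested_queue": ROUTING.get(qtype, "general-support"),
        "routing_overridable": True,                     # agent can re-route if needed
    }
```

Framing escalation as routing, not failure, shows up in the payload itself: the AI's attempt travels with the case instead of being discarded.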
Key design decisions
Trade-off: Less manual control by default. Chosen to reduce cognitive load and mis-routing.
Collaboration: Routing logic would be co-designed with support ops and engineering to reflect team structure and availability.
The system improves by observing real behavior.
What happens: An agent edits tone from formal to casual. The system logs the pattern. A brief indicator appears ("System learned from tone adjustment") and fades.

Caption: Learning happens silently in the background. No surveys, no ratings, no workflow interruption.
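The quiet learning loop above can be sketched as pattern counting with an adoption threshold: an edit pattern is only adopted after it repeats, which trades speed of adaptation for stability. The class name and the threshold value are assumptions:

```python
from collections import Counter

class EditLearner:
    """Log agent edits; adopt a pattern only after repeated observation (sketch)."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold      # assumed value; tuned with data science
        self.patterns = Counter()

    def log_edit(self, pattern: str) -> bool:
        """Record an observed edit pattern; return True once it is adopted."""
        self.patterns[pattern] += 1
        return self.patterns[pattern] >= self.threshold
```

A single tone edit changes nothing; only a repeated pattern crosses the threshold, which is the "slower adaptation" trade-off named below.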
Key design decisions
Trade-off: Slower adaptation. Chosen to avoid over-fitting to individual preferences.
Collaboration: Learning thresholds would be reviewed with data science to balance responsiveness and stability.
Primary metric: deflection rate without CSAT degradation.
Supporting measures would span leading indicators, lagging indicators, and overall system health metrics.
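The primary metric pairs a gain with a guardrail: a deflection improvement only counts if CSAT has not meaningfully dropped. A minimal sketch, where the tolerance value is an assumption:

```python
def deflection_healthy(deflection_delta: float, csat_delta: float,
                       csat_tolerance: float = -0.02) -> bool:
    """True if deflection improved without meaningful CSAT loss (sketch).

    Deltas are changes versus the pre-launch baseline; the -2 point
    CSAT tolerance is an assumed threshold, not a measured one.
    """
    return deflection_delta > 0 and csat_delta >= csat_tolerance
```

Pairing the two deltas keeps the system from optimizing deflection at the expense of trust, which is the failure mode the guardrail exists to catch.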
Each builds on the same principle: AI assists judgment, it doesn't replace it.
This is a conceptual project, so there are no shipped metrics. If the system shipped tomorrow, the measures above are what I would validate first.
AI confidence matters less than helping humans understand the cost of being wrong.
Trust is built by moving fast when it is safe, slowing down when stakes are high, and improving quietly over time.
Handoff is not about maximizing automation. It is about designing systems that make human judgment better.