An Approach for Designing Trustworthy Agents
Agents & People
I’ve been thinking about trust for years. How it’s granted, how it’s built, how it breaks, and how it’s repaired. Recently, as I’ve watched companies struggle to apply AI for their customers, agentic AI has become a forcing function for bringing clarity to these thoughts.
Collaboration with agents may be new, but we collaborate with them in much the same way we collaborate with people. You need to know that you can trust both agents and people to complete specific tasks and, more broadly, to have your best interests in mind.
So how do we build that trust?
Trust Happens in Phases
Logically, trust doesn’t happen all at once, but moves through distinct phases. Each one involves different considerations and a different design approach.
| Acquiring Trust | Growing Trust | Maintaining Trust | Repairing Trust |
|---|---|---|---|
The throughline: if you wouldn’t accept a behavior from a teammate, you shouldn’t accept it from an agent. What follows is a framework for building trust in agentic systems by reflecting on how we build trust with each other.
Acquiring Trust
The first time a user encounters your AI agent, they’re asking: Can this thing actually do what it claims? Is it trying to be something it’s not? Can I trust it with real work?
Humility
The systems that fail here are the ones that overpromise. AI that claims to think like you or understand your business sets expectations it can’t meet. When the agent inevitably falls short, trust collapses before the user even sees what it can do well.
In marketing, sales, and UX guidance, position the agent as a capable tool, not a god-like solution to everything. If customers find it valuable, they’ll drive word of mouth. Underpromise and overdeliver, and the hype will take care of itself.
Small Wins
Don’t show users everything the agent can do on day one. Let them experience it succeeding at simple tasks first. Build confidence through evidence, not assertion. Start with high supervision on simple tasks, then expand scope as trust builds through an increasing range of wins.
Growing Trust
Consistency
Once initial trust forms, it deepens through accumulated experience. Users learn to anticipate how the system behaves, and consistency becomes the primary trust-builder. The question users are now asking: can it prove that the trust I’ve granted is sustainable?
Progressive Mental Models
Onboarding experiences aren’t just for teaching users what to do or how to do it. They’re for building a mental model of how the system works. When mental models align with actual system behavior, trust grows. When they diverge, trust erodes, even if the agent performs well.
This is why progressive clarity matters. Users shouldn’t encounter full complexity on day one:
- Day 1: Simple tasks, high supervision, immediate feedback
- Day 10: Pattern recognition begins, users predict agent behavior
- Day 100: Earned autonomy, users delegate without constant monitoring
Bias Towards (Low-Risk) Action
Trust builds through action, not contemplation. Agentic systems that require extensive configuration before delivering value miss the window for trust-building. Agents that succeed demonstrate competence immediately, even at limited scope.
Purposeful Check-Ins
Research reveals a dangerous dynamic. Users can shift from appropriate skepticism to following AI recommendations without question. This happens when agents work too seamlessly and users lose the cognitive engagement that keeps trust calibrated. The solution: design systems where humans remain reviewers and decision-makers while agents handle execution.
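To make the "humans as reviewers, agents as executors" split concrete, here’s a minimal sketch in Python. Everything in it, from the `ProposedAction` shape to the impact levels and the `requires_review` threshold, is an assumption for illustration, not an API from any particular framework.

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str   # what the agent wants to do, in plain language
    impact: str        # "low", "medium", or "high", assessed before execution

def requires_review(action: ProposedAction, review_threshold: str = "medium") -> bool:
    """Anything at or above the threshold pauses for a human reviewer."""
    order = ["low", "medium", "high"]
    return order.index(action.impact) >= order.index(review_threshold)

def run_with_checkin(action: ProposedAction, approve, execute) -> str:
    """The agent proposes and executes; the human stays the decision-maker."""
    if requires_review(action) and not approve(action):
        return "deferred: reviewer declined or asked for changes"
    execute(action)
    return "executed"
```

The point of the gate isn’t friction for its own sake. It keeps the user cognitively engaged exactly where the stakes justify it.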
Progressive Scope & Autonomy
Training wheels at first. Autonomy earned through performance. Each successful cycle proves the agent can handle more, and supervision requirements reduce gradually. You wouldn’t give a new employee full autonomy on day one. Don’t do it with an AI agent.
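As a sketch of what earned autonomy could look like in code (the level names and the ten-win promotion streak are arbitrary choices of mine, not a standard):

```python
# Assumed autonomy ladder, most to least supervised.
AUTONOMY_LEVELS = ["approve_everything", "approve_risky_only", "notify_after_the_fact"]

def next_autonomy(current: str, consecutive_successes: int, promotion_streak: int = 10) -> str:
    """Loosen supervision one notch at a time, and only after a streak of reviewed wins."""
    idx = AUTONOMY_LEVELS.index(current)
    if consecutive_successes >= promotion_streak and idx < len(AUTONOMY_LEVELS) - 1:
        return AUTONOMY_LEVELS[idx + 1]
    return current
```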
Maintaining Trust
Fragility
Trust is easier to destroy than build. A single significant failure can devastate years of credibility. In agentic systems where autonomous actions cascade into serious consequences, maintaining trust becomes an active discipline.
Transparency
Agents should show their work, and be clear about how confident they are. You can’t be certain about everything. But here’s where it gets nuanced: sometimes you want tools to disappear in skilled hands, other times you need mechanisms visible. The design challenge is giving users control over this dynamic.
Meaningful Explainability
How things are explained matters. You could dump the entire thought log of how the agent came to an output. Explanations must be meaningful, not performative. Showing your work only builds trust if the work you’re showing is actually what happened, and if what you’re showing is understandable. This reminds me of Grok’s early UX thought bubble pattern that got such accolades, yet (IMO) failed on this heuristic.
Agents need persistent, digestible audit trails—not incoherent braindumps or reverse rationalizations. Every decision should be logged, reviewable, and explainable on demand. Users should trace back from any agent action to the reasoning and data that informed it.
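Here’s one hedged sketch of what such an audit trail could look like. The field names and the append-only JSONL storage are my assumptions, not a prescribed format:

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class AuditEntry:
    action: str              # what the agent did, in plain language
    reasoning_summary: str   # short and human-readable, not a raw thought dump
    data_sources: list       # references to the inputs that informed the decision
    confidence: float        # 0.0 to 1.0, surfaced on demand rather than hidden
    timestamp: float = field(default_factory=time.time)

def append_audit(path: str, entry: AuditEntry) -> None:
    """Append-only JSONL log: every decision stays reviewable and traceable later."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(entry)) + "\n")
```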
Repairing Trust
In complex systems of people and agents, trust will not always grow or be sustained. Mistakes can happen, sometimes with significant consequences. The difference between systems that recover and those that don’t lies in how failures are handled.
Acknowledge, Explain, Correct, Commit
Research on trust repair identifies key strategies: acknowledge the breach, explain what went wrong, demonstrate corrective action, explicitly recommit to trustworthy outcomes. For agentic systems, this means graceful error handling that doesn’t just say sorry—it shows what’s being done to prevent recurrence.
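A minimal sketch of what that could look like as a structured failure report. The structure and field names are mine, purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class TrustRepairReport:
    acknowledgment: str     # name the breach plainly
    explanation: str        # what went wrong, in the user's terms
    corrective_action: str  # the concrete change that prevents recurrence
    recommitment: str       # what the user can expect going forward

def render(report: TrustRepairReport) -> str:
    """Turn the four repair moves into a message that is more than an apology."""
    return "\n".join([
        f"What happened: {report.acknowledgment}",
        f"Why it happened: {report.explanation}",
        f"What we changed: {report.corrective_action}",
        f"What you can expect now: {report.recommitment}",
    ])
```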
Timing
I heard recently that apologies made immediately after a loss of trust are less effective than apologies offered when new opportunities to trust arise. Who knows how true that is, but it checks out on my end. Error acknowledgment from systems should include explicit recommitment and give users control over how to reengage. Forcing immediate re-engagement, or taking that control away, backfires.
Systems that maintain trust through failures reflect, learn, and focus forward rather than deflect blame. After significant errors, agents should automatically scale back to requiring more human approval. They need to re-earn autonomy through demonstrated improvement.
Mode Reversal
When an agent fails at a high-stakes task, it should automatically revert to training-wheel mode for similar tasks. Users should see explicit evidence of improved performance before expanding autonomy again.
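Continuing the earned-autonomy sketch from earlier (same caveats: assumed names and policy, not a spec), mode reversal is just the demotion path:

```python
def on_task_result(levels_by_task: dict, task_type: str, succeeded: bool, high_stakes: bool) -> dict:
    """A high-stakes failure drops that task type back to full supervision."""
    if high_stakes and not succeeded:
        levels_by_task[task_type] = "approve_everything"  # training wheels back on
    return levels_by_task
```

Autonomy then gets re-earned through the same promotion streak as before, with the improved performance visible to the user.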
Agentic Empathy
I’m still noodling on this one: the relationship between Trust and Empathy. First, I’m thinking of Empathy as a means towards Trust.
Second, if we’re to design empathetic agents, then we ourselves must become even more empathetic.
You can’t design trustworthy systems from a distance. Empathy underlies all of the points discussed above. If you nurture empathy as a muscle, or a skill, then I’d expect that many of the items above will come naturally. In other words, empathetic agents and people embody more trustworthy habits.
When you transcend intellectually “knowing” into feeling what’s at risk for users, design decisions change. The weight of agentic delegation that users feel becomes butterflies in your own gut.
When a user trusts your agent with high-stakes work, they’re putting their reputation and outcomes in your hands. That understanding shapes everything: how you position the agent’s role, how you design for transparency, how you handle failures.
Empathy takes work. Using AI ourselves, to the extent we want our users to and in the ways we think they will, to understand how it makes them feel. Having deep and honest conversations with them about it. Not just reading about it.
Conclusion
Technology may be increasingly commoditized. Trust won’t be.
Companies that build trust architecture into their agents will win, period. And designers are the ones best positioned to ensure that happens.
So, reviewing our four phases:
| | Acquiring Trust | Growing Trust | Maintaining Trust | Repairing Trust |
|---|---|---|---|---|
| Principle | Progressive disclosure | Staged autonomy | Persistent audit trails | Reduced autonomy during recovery |
| Application | Simple wins before expanding to full scope | Consistent outcomes and purposeful check-ins | Show work, give users visibility and control | Auto-rollback to more supervised modes |
At best, products that treat trust as an afterthought will see users revert to manual processes the first time something goes wrong. At worst, they’ll run to a competitor who prioritizes their trust.