An Approach for Designing Trustworthy Agents
Agents & People
I’ve been thinking about trust for years. How it’s granted, how it’s built, how it breaks, and how it’s repaired. Recently, as I’ve watched companies struggle to apply AI for their customers, agentic AI has become a forcing function for bringing clarity to these thoughts.
Collaboration with agents may be new, but we collaborate with them in much the same way we collaborate with people. You need to know that you can trust both agents and people to complete specific tasks and, more broadly, to have your best interests in mind.
So how do we build that trust?
Trust Happens in Phases
Logically, trust doesn’t happen all at once, but moves through distinct phases. Each one involves different considerations and a different design approach.
| Acquiring Trust | Growing Trust | Maintaining Trust | Repairing Trust |
|---|---|---|---|
The throughline: if you wouldn’t accept a behavior from a teammate, you shouldn’t accept it from an agent. What follows is a framework for building trust in agentic systems by reflecting on how we build trust with each other.
Acquiring Trust
The first time a user encounters your AI agent, they’re asking: Can this thing actually do what it claims? Is it trying to be something it’s not? Can I trust it with real work?
Humility
The systems that fail here are the ones that overpromise. AI that claims to think like you or understand your business sets expectations it can’t meet. When the agent inevitably falls short, trust collapses before the user even sees what it can do well.
In marketing, sales, and UX guidance, position the agent as a capable tool, not a god-like solution to everything. If customers find it valuable, they’ll drive word of mouth. Underpromise and overdeliver, and the hype will take care of itself.
Small Wins
Don’t show users everything the agent can do on day one. Let them experience it succeeding at simple tasks first. Build confidence through evidence, not assertion. Start with high supervision on simple tasks, then expand scope as trust builds through an increasing range of wins.
Growing Trust
Consistency
Once initial trust forms, it deepens through accumulated experience. Users learn to anticipate how the system behaves, and consistency becomes the primary trust-builder. The question users are now asking: can it prove that the trust I’ve granted is sustainable?
Progressive Mental Models
Onboarding experiences aren’t just for teaching users what to do or how to do it. They’re for building a mental model of how the system works. When mental models align with actual system behavior, trust grows. When they diverge, trust erodes, even if the agent performs well.
This is why progressive clarity matters. Users shouldn’t encounter full complexity on day one:
- Day 1: Simple tasks, high supervision, immediate feedback
- Day 10: Pattern recognition begins, users predict agent behavior
- Day 100: Earned autonomy, users delegate without constant monitoring
Bias Towards (Low-Risk) Action
Trust builds through action, not contemplation. Agentic systems that require extensive configuration before delivering value miss the window for trust-building. Agents that succeed demonstrate competence immediately, even at limited scope.
Purposeful Check-Ins
Research reveals a dangerous dynamic. Users can shift from appropriate skepticism to following AI recommendations without question. This happens when agents work too seamlessly and users lose the cognitive engagement that keeps trust calibrated. The solution: design systems where humans remain reviewers and decision-makers while agents handle execution.
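To make the "humans as reviewers, agents as executors" split concrete, here’s a minimal sketch in Python. Everything in it, from the `ProposedAction` shape to the impact levels and the `requires_review` threshold, is an assumption for illustration, not an API from any particular framework.

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str   # what the agent wants to do, in plain language
    impact: str        # "low", "medium", or "high", assessed before execution

def requires_review(action: ProposedAction, review_threshold: str = "medium") -> bool:
    """Anything at or above the threshold pauses for a human reviewer."""
    order = ["low", "medium", "high"]
    return order.index(action.impact) >= order.index(review_threshold)

def run_with_checkin(action: ProposedAction, approve, execute) -> str:
    """The agent proposes and executes; the human stays the decision-maker."""
    if requires_review(action) and not approve(action):
        return "deferred: reviewer declined or asked for changes"
    execute(action)
    return "executed"
```

The point of the gate isn’t friction for its own sake. It keeps the user cognitively engaged exactly where the stakes justify it.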
Progressive Scope & Autonomy
Training wheels at first. Autonomy earned through performance. Each successful cycle proves the agent can handle more, and supervision requirements reduce gradually. You wouldn’t give a new employee full autonomy on day one. Don’t do it with an AI agent.
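As a sketch of what earned autonomy could look like in code (the level names and the ten-win promotion streak are arbitrary choices of mine, not a standard):

```python
# Assumed autonomy ladder, most to least supervised.
AUTONOMY_LEVELS = ["approve_everything", "approve_risky_only", "notify_after_the_fact"]

def next_autonomy(current: str, consecutive_successes: int, promotion_streak: int = 10) -> str:
    """Loosen supervision one notch at a time, and only after a streak of reviewed wins."""
    idx = AUTONOMY_LEVELS.index(current)
    if consecutive_successes >= promotion_streak and idx < len(AUTONOMY_LEVELS) - 1:
        return AUTONOMY_LEVELS[idx + 1]
    return current
```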
Maintaining Trust
Fragility
Trust is easier to destroy than build. A single significant failure can devastate years of credibility. In agentic systems where autonomous actions cascade into serious consequences, maintaining trust becomes an active discipline.
Transparency
Agents should show their work, and be clear about how confident they are. You can’t be certain about everything. But here’s where it gets nuanced: sometimes you want tools to disappear in skilled hands, other times you need mechanisms visible. The design challenge is giving users control over this dynamic.
Meaningful Explainability
How things are explained matters. You could dump the entire thought log of how the agent came to an output. Explanations must be meaningful, not performative. Showing your work only builds trust if the work you’re showing is actually what happened, and if what you’re showing is understandable. This reminds me of Grok’s early UX thought bubble pattern that got such accolades, yet (IMO) failed on this heuristic.
Agents need persistent, digestible audit trails—not incoherent braindumps or reverse rationalizations. Every decision should be logged, reviewable, and explainable on demand. Users should trace back from any agent action to the reasoning and data that informed it.
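Here’s one hedged sketch of what such an audit trail could look like. The field names and the append-only JSONL storage are my assumptions, not a prescribed format:

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class AuditEntry:
    action: str              # what the agent did, in plain language
    reasoning_summary: str   # short and human-readable, not a raw thought dump
    data_sources: list       # references to the inputs that informed the decision
    confidence: float        # 0.0 to 1.0, surfaced on demand rather than hidden
    timestamp: float = field(default_factory=time.time)

def append_audit(path: str, entry: AuditEntry) -> None:
    """Append-only JSONL log: every decision stays reviewable and traceable later."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(entry)) + "\n")
```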
Repairing Trust
In complex systems of people and agents, trust will not always grow or be sustained. Mistakes can happen, sometimes with significant consequences. The difference between systems that recover and those that don’t lies in how failures are handled.
Acknowledge, Explain, Correct, Commit
Research on trust repair identifies key strategies: acknowledge the breach, explain what went wrong, demonstrate corrective action, explicitly recommit to trustworthy outcomes. For agentic systems, this means graceful error handling that doesn’t just say sorry—it shows what’s being done to prevent recurrence.
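A minimal sketch of what that could look like as a structured failure report. The structure and field names are mine, purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class TrustRepairReport:
    acknowledgment: str     # name the breach plainly
    explanation: str        # what went wrong, in the user's terms
    corrective_action: str  # the concrete change that prevents recurrence
    recommitment: str       # what the user can expect going forward

def render(report: TrustRepairReport) -> str:
    """Turn the four repair moves into a message that is more than an apology."""
    return "\n".join([
        f"What happened: {report.acknowledgment}",
        f"Why it happened: {report.explanation}",
        f"What we changed: {report.corrective_action}",
        f"What you can expect now: {report.recommitment}",
    ])
```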
Timing
I heard recently that apologies made immediately after a loss of trust are less effective than apologies offered when new opportunities to trust arise. Who knows how true that is, but it checks out on my end. Error acknowledgment from systems should include explicit recommitment and give users control over how to reengage. Forcing immediate re-engagement, or taking that control away, backfires.
Systems that maintain trust through failures reflect, learn, and focus forward rather than deflect blame. After significant errors, agents should automatically scale back to requiring more human approval. They need to re-earn autonomy through demonstrated improvement.
Mode Reversal
When an agent fails at a high-stakes task, it should automatically revert to training-wheel mode for similar tasks. Users should see explicit evidence of improved performance before expanding autonomy again.
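Continuing the earned-autonomy sketch from earlier (same caveats: assumed names and policy, not a spec), mode reversal is just the demotion path:

```python
def on_task_result(levels_by_task: dict, task_type: str, succeeded: bool, high_stakes: bool) -> dict:
    """A high-stakes failure drops that task type back to full supervision."""
    if high_stakes and not succeeded:
        levels_by_task[task_type] = "approve_everything"  # training wheels back on
    return levels_by_task
```

Autonomy then gets re-earned through the same promotion streak as before, with the improved performance visible to the user.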
Agentic Empathy
I’m still noodling on this one: the relationship between Trust and Empathy. First, I’m thinking of Empathy as a means towards Trust.
Second, if we’re to design empathetic agents, then we ourselves must become even more empathetic.
You can’t design trustworthy systems from a distance. Empathy underlies all of the points discussed above. If you nurture empathy as a muscle, or a skill, then I’d expect that many of the items above will come naturally. In other words, empathetic agents and people embody more trustworthy habits.
When you transcend intellectually “knowing” into feeling what’s at risk for users, design decisions change. The weight of agentic delegation that users feel becomes butterflies in your own gut.
When a user trusts your agent with high-stakes work, they’re putting their reputation and outcomes in your hands. That understanding shapes everything: how you position the agent’s role, how you design for transparency, how you handle failures.
Empathy takes work. Using AI ourselves, to the extent we want our users to and in the ways we think they will, to understand how it makes them feel. Having deep and honest conversations with them about it. Not just reading about it.
Conclusion
Technology may be increasingly commoditized. Trust won’t be.
Companies that build trust architecture into their agents will win, period. And designers are the ones best positioned to ensure that happens.
So, reviewing our four phases:
| | Acquiring Trust | Growing Trust | Maintaining Trust | Repairing Trust |
|---|---|---|---|---|
| Principle | Progressive disclosure | Staged autonomy | Persistent audit trails | Reduced autonomy during recovery |
| Application | Simple wins before expanding to full scope | Consistent outcomes and purposeful check-ins | Show work, give users visibility and control | Auto-rollback to more supervised modes |
At best, products that treat trust as an afterthought will see users revert to manual processes the first time something goes wrong. At worst, they’ll run to a competitor who prioritizes their trust.