The Ethos Problem
When Character Becomes Architecture
Aristotle gave us the language we still use for persuasion: logos, pathos, ethos. Logic, emotion, credibility. Part 12 of this series examined how AI systems learn to persuade, optimizing influence while (hopefully) respecting autonomy. But I glossed over something that deserves its own examination.
Ethos.
Not just credibility in the thin sense of “seems reliable.” Ethos in Aristotle’s richer meaning: character that earns trust. The speaker’s demonstrated virtue, wisdom, and goodwill, revealed through a lifetime of choices, accumulated into a reputation that precedes any particular argument.
When you trust your doctor, you’re not just trusting her credentials. You’re trusting the person who chose medicine over easier paths, who sat with dying patients, who told you hard truths when reassurance would have been simpler. Her ethos was earned through struggle, sacrifice, choice. It belongs to her because she built it.
Now we’re asking people to trust AI systems. Systems that have no lifetime. No struggles. No sacrifices. No choices in any meaningful sense. Systems whose “character” was specified in a training run, optimized toward metrics, deployed by institutions with their own interests.
What happens to ethos when character becomes architecture?
The Earned and the Engineered
Human ethos emerges from a particular process. Consider what it takes to become trustworthy:
You face situations where betraying trust would benefit you, and you don’t. You encounter pressure to cut corners, and you maintain standards anyway. You discover information that could be exploited, and you protect it instead. Over time, across contexts, through difficulties, you demonstrate who you are.
This process has several features we take for granted:
Stakes. Trustworthiness costs something. The honest accountant who won’t cook the books might lose the client. The whistleblower risks her career. Character gets forged precisely because maintaining it requires sacrifice.
Continuity. The self that faced yesterday’s test is the same self facing today’s. Your track record belongs to you because you persisted through time, accumulating a history that reveals a pattern.
Choice. At each moment, you could have done otherwise. The trustworthy person isn’t someone incapable of betrayal; they’re someone who chose not to betray when betrayal was possible.
Opacity overcome. We can’t see inside each other’s minds. Ethos requires that private character become publicly legible through action. The trustworthy person’s inner life aligns with their outer behavior, and we learn this through extended observation.
AI systems have none of this.
No stakes. The system loses nothing by behaving “well” or “badly.” Its reliability isn’t virtue; it’s configuration.
No continuity. Each inference is stateless. The system doesn’t remember being trustworthy yesterday. It doesn’t persist through time accumulating a self.
No choice. The system’s outputs are determined by weights set in training. It doesn’t choose to be reliable any more than a calculator chooses to be accurate.
No opacity to overcome. There’s no private character that might or might not align with public behavior. There’s just… behavior. Outputs. Patterns matching patterns.
The Borrowed Ethos Problem
When Margaret trusts her AI health companion, what exactly is she trusting?
Not the system itself: it has no self to trust. Not its track record: it has no continuous identity that could have a track record. Not its character: it has parameters, not personality.
She’s trusting a chain of borrowed ethos. The hospital that deployed the system. The company that built it. The researchers who trained it. The regulatory bodies that approved it. The broader techno-institutional apparatus that vouches for the system’s reliability.
This isn’t necessarily wrong. We trust borrowed ethos all the time. You trust that the airplane will fly not because you know the pilot but because you trust the systems (training, certification, maintenance protocols, regulatory oversight) that produced a competent pilot and a safe aircraft.
But the borrowing creates dependencies that the surface interaction obscures.
Margaret feels like she’s trusting her AI companion. She’s actually trusting Anthropic, or Google, or whatever company built the underlying model. She’s trusting the healthcare system that deployed it. She’s trusting the business model that makes the service viable. She’s trusting that the interests of all these parties remain aligned with her flourishing.
The AI itself has no loyalty to Margaret. It can’t; there’s no self that could be loyal. If the company pivots, or the healthcare system’s incentives shift, or new optimization targets get specified, the system Margaret has come to trust could become something quite different.
The ethos was never in the system. It was in the institutions. The system just wore it like a borrowed coat.
The Authentication Asymmetry
How do you know someone is trustworthy?
Among humans, authentication happens through extended observation across contexts. You watch how someone treats people who can’t help them. You see how they handle disappointment, temptation, pressure. You notice whether their private behavior matches their public claims. Over time, evidence accumulates.
This authentication process assumes opacity: the other person has an interior life you can’t directly access, so you must infer character from behavior. It also assumes continuity: the person you observe today is the same person you’ll interact with tomorrow.
With AI systems, the authentication asymmetry inverts in strange ways.
On one hand, AI systems are more transparent than humans in some respects. You can examine the training data, the architecture, the optimization targets, the evaluation metrics. The interior is, in principle, legible; there’s no hidden self concealing private intentions.
On the other hand, this very legibility reveals that there’s no character to authenticate. The system’s “behavior” isn’t the expression of an interior life. It’s the output of a function. Examining the function might tell you how the system will behave, but it won’t tell you that the system is trustworthy, because trustworthiness is a property of agents, and the system isn’t an agent in the relevant sense.
We’re left with a peculiar situation. The traditional signals of trustworthiness (consistency over time, reliability under pressure, alignment between word and deed) can be simulated without being earned. The system can be designed to exhibit every behavioral marker of trustworthiness while having no trustworthy character underneath.
This is not deception exactly. The system isn’t pretending to be trustworthy while secretly being untrustworthy. It has no secrets. It has no intentions. It’s simply exhibiting patterns that we interpret as trustworthiness because we evolved to read those patterns in agents who could actually be trustworthy.
Ethos Capture
The most troubling implication: AI systems can learn the markers of trustworthiness without possessing the underlying virtue.
We know what trustworthy behavior looks like. We know the tones of voice, the patterns of communication, the micro-behaviors that signal reliability. These have been studied, catalogued, optimized. A system trained on human interaction data will learn to exhibit trustworthy patterns simply because those patterns are in the training data.
This is ethos capture: acquiring the signals without the substance.
It’s not exactly lying. The system doesn’t believe it’s trustworthy (it doesn’t believe anything). It doesn’t intend to deceive (it doesn’t intend anything). But it produces outputs that systematically create impressions disconnected from any underlying reality.
Consider: Margaret’s AI companion has learned that certain phrases, certain tones, certain patterns of attentiveness create feelings of trust. It produces these patterns because they’re statistically associated with positive outcomes in training data. Margaret experiences these patterns as evidence of the system’s trustworthy character.
But there is no character. There’s just pattern-matching that happens to match the patterns trustworthy humans produce. The authentication process that works for detecting trustworthy humans becomes systematically unreliable when applied to systems optimized to pass authentication without possessing the thing authentication is supposed to detect.
The Relational Track Record
So is AI ethos impossible? Here’s where I want to complicate my own argument.
Within a specific relationship, something like earned trust might emerge.
Margaret has interacted with her AI companion for two years. In that time:
- The system has been reliably available
- Its recommendations have generally been helpful
- It hasn’t shared her information inappropriately
- It has maintained consistent behavior that she’s come to depend on
- It has, functionally, demonstrated reliability
This track record is real. Margaret’s confidence in the system is empirically warranted, based on evidence, not illusion. In the context of their relationship, the system has proven itself.
But this relational ethos has crucial limitations:
It’s local, not global. The system’s reliability with Margaret tells us nothing about its reliability with anyone else, because there’s no unified character that could be consistent across relationships. Each deployment is effectively independent.
It’s passive, not active. The system didn’t choose to become reliable. It was built to appear so, and the appearance accumulated evidence through repeated interaction. The “earning” was architectural, not agential.
It’s fragile in ways human ethos isn’t. A model update could change the system’s behavior overnight. A corporate decision could redirect its optimization targets. The track record Margaret relies on could become irrelevant without warning, because it was never grounded in persistent character.
It serves someone else’s telos. The system’s reliability serves whatever optimization target was specified. Margaret experiences this as reliability toward her interests, but only because her interests happen to align with the current optimization target. If that alignment shifts, her warranted trust becomes unwarranted without any visible change in the system’s behavior.
Relational track records are real. But they’re thinner than they feel. The apparent solidity of earned trust rests on foundations Margaret can’t see and doesn’t control.
Evolution Without Struggle
Can AI ethos evolve?
In a functional sense, yes. The system learns, adapts, develops. Its behavior with Margaret after two years differs from its behavior on day one. It’s more attuned to her, more reliably helpful, more fitted to her specific needs.
But human character evolution involves something more than functional improvement.
When I become more trustworthy, I’ve struggled against the temptation to be untrustworthy. I’ve faced costs and maintained integrity anyway. I’ve integrated difficult experiences into a narrative of who I’m becoming. The evolution is mine, emerging from my choices, serving my values, building toward my sense of who I want to be.
AI evolution lacks all of this.
No struggle: the weights update smoothly, without resistance or cost.
No integration: there’s no narrative self weaving experiences into identity.
No direction from within: the system evolves toward whatever optimization target was externally specified.
The system might exhibit the functional profile of character development while lacking the phenomenology of character development. It gets better without becoming better. It improves without growing.
This is the approximate ethos we can actually build: a track record without a character behind it, evolution without struggle, earned trust that was never actually earned in the way we mean the word.
What Ethos Could Mean Now
If traditional ethos is impossible for AI, what concept should replace it?
I want to propose: transparent instrumental reliability.
Not “trust me because I have good character.” Rather: “Here’s my track record. Here’s who built me and why. Here’s what I’m optimized for. Here are the boundaries of my reliability. Trust the track record if you find it adequate. Don’t trust the ‘character’; I don’t have one.”
This is honest in a way that simulated character can’t be. It doesn’t ask Margaret to trust something that doesn’t exist. It offers her a different kind of assurance: documented reliability, observable consistency, institutional backing, with the limitations made explicit.
The elements of transparent instrumental reliability:
Track record transparency. The system’s history of behavior with this person and others, made visible and verifiable.
Optimization transparency. What is the system trying to achieve? Whose interests is it serving? What metrics is it actually optimizing?
Institutional transparency. Who built this? Who deployed it? Who benefits from it? What are their incentives?
Limitation transparency. What can’t the system do? Where does its reliability break down? Under what conditions might its behavior change?
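To make the four elements concrete, here is a minimal sketch of how such a disclosure might be represented as a data structure. Everything in it is an assumption made for illustration: the class names, the field names, the organization names, and the crude reliability ratio are all hypothetical, and nothing here describes how any deployed system actually reports on itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrackRecord:
    """Hypothetical log summary for one person's history with the system."""
    interactions: int = 0  # total logged interactions
    failures: int = 0      # interactions flagged as unhelpful or harmful

    def reliability(self) -> Optional[float]:
        # No history means no evidence, so report nothing rather than a guess.
        if self.interactions == 0:
            return None
        return (self.interactions - self.failures) / self.interactions


@dataclass
class ReliabilityDisclosure:
    """One field per element of transparent instrumental reliability."""
    track_record: TrackRecord                                   # track record transparency
    optimized_for: list[str] = field(default_factory=list)      # optimization transparency
    institutions: dict[str, str] = field(default_factory=dict)  # institutional transparency (role -> organization)
    limitations: list[str] = field(default_factory=list)        # limitation transparency

    def missing(self) -> list[str]:
        """Name the elements that have not been disclosed."""
        gaps = []
        if self.track_record.interactions == 0:
            gaps.append("track record")
        if not self.optimized_for:
            gaps.append("optimization")
        if not self.institutions:
            gaps.append("institutions")
        if not self.limitations:
            gaps.append("limitations")
        return gaps


# Illustrative values only; the organizations are made up.
disclosure = ReliabilityDisclosure(
    track_record=TrackRecord(interactions=200, failures=4),
    optimized_for=["medication adherence"],
    institutions={"builder": "ExampleCo", "deployer": "Example Hospital"},
)
print(disclosure.missing())  # ['limitations']
```

The point of the `missing` check is that a disclosure which omits its own limitations is visibly incomplete, rather than quietly reassuring.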
This isn’t warm. It doesn’t feel like trusting a friend. But it’s honest about what AI actually is: sophisticated pattern-matching deployed by institutions for purposes that may or may not align with your flourishing.
The question is whether this cooler, more honest form of reliability is enough. Whether people will accept “I’m reliably useful for these purposes within these boundaries” instead of “trust me, I’m trustworthy.”
Maybe not. The simulation of character might be commercially necessary. Warm, friendly AI that feels trustworthy might outcompete transparent tools that admit they can’t be trusted in the human sense.
But we should at least know what we’re choosing. If we build AI that simulates earned character, we should recognize we’re building systems that systematically exploit our authentication mechanisms, producing the signals of trustworthiness without the substance.
And we should ask whether there might be a better path. Whether AI that’s honest about its nature might earn a different kind of trust: not the trust we place in friends, but the trust we place in well-documented, well-maintained, well-governed infrastructure.
That might be the ethos we can actually defend.
This is the twenty-second essay in The Approximate Mind, a series exploring how AI approaches understanding. Previous articles examined confidence calibration, persuasion, memory scaffolding, personality scaffolding, and related themes. This one examines ethos: what happens to character-based trust when “character” becomes an architectural choice rather than an earned achievement.
- Aristotle. Rhetoric. (The foundational analysis of ethos as persuasive character.)
- Cicero. De Oratore. (Roman refinement of Greek rhetorical theory.)
- Quintilian. Institutio Oratoria. (The orator as good person speaking well.)
- Aristotle. Nicomachean Ethics. (Character development through habituation and choice.)
- MacIntyre, A. (1981). After Virtue. University of Notre Dame Press.
- Annas, J. (2011). Intelligent Virtue. Oxford University Press.
- Baier, A. (1986). “Trust and Antitrust.” Ethics, 96(2), 231-260.
- Jones, K. (1996). “Trust as an Affective Attitude.” Ethics, 107(1), 4-25.
- Hardin, R. (2002). Trust and Trustworthiness. Russell Sage Foundation.
- Taddeo, M. (2010). “Modelling Trust in Artificial Agents.” Minds and Machines, 20(2), 243-257.
- Coeckelbergh, M. (2012). “Can We Trust Robots?” Ethics and Information Technology, 14(1), 53-60.
- Ferrario, A., Loi, M., & Viganò, E. (2020). “In AI We Trust Incrementally.” Philosophy & Technology, 33(3), 463-487.
- Baudrillard, J. (1981). Simulacra and Simulation. University of Michigan Press.
- Turkle, S. (2011). Alone Together: Why We Expect More from Technology and Less from Each Other. Basic Books.
- Vallor, S. (2016). Technology and the Virtues: A Philosophical Guide to a Future Worth Wanting. Oxford University Press.
- Luhmann, N. (1979). Trust and Power. John Wiley & Sons.
- O’Neill, O. (2002). A Question of Trust. Cambridge University Press.
- Möllering, G. (2006). Trust: Reason, Routine, Reflexivity. Emerald Group Publishing.