The Digital Builders

When Code Writes Itself, What Was Programming For?

Lena Oduya has been a software engineer for sixteen years, and she is fairly sure the code works.

That qualifier, “fairly sure,” is new. Three years ago she would not have used it. Three years ago, she wrote the code herself, which meant she understood it the way you understand something you made with your hands. Now she directs AI agents that write the code, and what sits on her screen this Tuesday morning is a functioning payment processing system, fourteen thousand lines, built in forty minutes from three paragraphs of English she typed. Authentication, transaction routing, currency conversion, fraud detection, regulatory compliance across eleven jurisdictions. Her team of four would have taken at least three months.

The test suite passes. The edge cases she can think of are handled. But she cannot read fourteen thousand lines in forty minutes, and the AI organized the logic in ways she would not have chosen, using patterns she recognizes but did not specify and some she does not recognize at all.

Eighty percent confident. That is where she is.

The question of how she gets to ninety-nine percent is, in a real sense, her entire job now. And it is a harder job than the one it replaced.

The Three Layers, and What Happened to Two of Them

Programming always had three layers, though nobody described them this way while the work was being done.

The first was understanding what the human wanted. What problem are we actually solving? For whom? What does success look like, and what constraints matter? This was the requirements conversation, and it was always the hardest part, not because people are bad at requirements, but because humans rarely know what they want until they see what they do not want. The history of software is full of projects built precisely to specification that were precisely wrong, because the specification described what someone thought they wanted rather than what they actually needed.

The second was designing a solution architecture. Given the requirement, how should the system be organized? What components, what data flows, what trade-offs between performance and maintainability? This took years of experience to do well. A junior developer could write code. A senior architect could design systems that would still be comprehensible five years later.

The third was writing the code itself. Translating design into instructions a machine could execute. This was the visible, teachable, certifiable part. The part with textbooks and bootcamps and whiteboard interviews. The part that felt most like the profession because it was the part you could point to.

AI collapsed the third layer. Then most of the second. What remains is the first, and it turns out to be the hardest of the three.

Fred Brooks wrote in 1986 that the essential difficulty of software was not coding but conceptualization: the mental work of deciding what to build. The accidental difficulty, the part that tools could eventually eliminate, was the translation of concept into code. He predicted that no single tool or technique would produce even an order-of-magnitude productivity improvement within a decade, because the essential difficulty would remain. He was right about the difficulty. He underestimated how thoroughly tools would eliminate the accidental part.

Lena’s three paragraphs took her longer to write than the AI took to implement them. She revised them four times. She argued with a product manager about a phrase. She consulted a compliance specialist about a regulatory edge case. She thought carefully about what “fraud detection” means in practice versus what it means in a specification. The paragraphs were the hardest work she did that day. Everything downstream was automation.

The hardest part of building software was never the building. It was the knowing-what-to-build. We could not see that because the building was hard enough to obscure it.

The Gap Between Intent and Implementation

There is a deeper problem, and it surfaces in a question Lena asks herself every day: how do I know this does what I meant?

She told an AI agent what she wanted. It built something. The something compiles, runs, passes tests. But tests verify what she thought to check for. The failures that matter are the ones she did not anticipate, the edge cases that did not occur to her, the subtle misalignments between her intent and the system’s behavior that only surface when a user in an unexpected context does something she did not imagine.
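A small, hypothetical illustration of that gap, not taken from Lena’s system: a helper that converts payment amounts to integer minor units and passes every test its author thought to write, because every test happened to use a two-decimal currency. The function names and the four-entry exponent table are assumptions for illustration; the exponents themselves (JPY has no minor unit, KWD has three decimal places) follow ISO 4217.

```python
from decimal import Decimal, ROUND_HALF_UP

# ISO 4217 minor-unit exponents for a handful of currencies. A real system
# would load the full table; this subset is only for illustration.
MINOR_UNIT_EXPONENT = {"USD": 2, "EUR": 2, "JPY": 0, "KWD": 3}


def to_minor_units(amount: str, currency: str) -> int:
    """Hypothetical helper: the version that passes the tests.

    It accepts a currency but ignores it, silently assuming two decimal
    places everywhere.
    """
    scaled = Decimal(amount) * 100
    return int(scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def to_minor_units_fixed(amount: str, currency: str) -> int:
    """Same conversion, scaled by the currency's actual exponent."""
    scaled = Decimal(amount).scaleb(MINOR_UNIT_EXPONENT[currency])
    return int(scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def tests_she_thought_to_write() -> None:
    # Every case uses a two-decimal currency, so the buggy helper passes.
    assert to_minor_units("10.00", "USD") == 1000
    assert to_minor_units("0.01", "EUR") == 1
    assert to_minor_units("19.995", "USD") == 2000


if __name__ == "__main__":
    tests_she_thought_to_write()
    print(to_minor_units("500", "JPY"))          # 50000: off by a factor of 100
    print(to_minor_units_fixed("500", "JPY"))    # 500
    print(to_minor_units_fixed("1.234", "KWD"))  # 1234
```

No single line of the first helper looks wrong in isolation. The flaw lives in an assumption the tests share with the code, which is exactly the kind of failure a passing suite cannot reveal.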

In human software teams, this gap was managed through conversation. Code reviews were not just quality checks. They were negotiations about intent. A senior developer reading a junior developer’s code was not merely verifying correctness. She was asking: did you understand the requirement? Did you anticipate this edge case? Did you consider what happens when the user does something unexpected? The review was a dialogue about meaning, conducted through the medium of code.

The AI does not participate in this dialogue the same way. It produces code, and it can explain the code it produced, but it cannot engage in the mutual exploration of intent that made code review a form of collaborative thinking. Lena can ask the AI why it made a particular design choice. The AI will answer. But the answer is a justification, not a negotiation. The AI is not pushing back on Lena’s understanding of the problem. It is not saying: you asked for this, but I think you might actually need that.

Here is where something interesting and underexplored opens up, at least to us. The diagnostic AI in the previous essays was a black box with a confidence score. There is no obvious reason the software development AI has to work that way. When Lena chooses differently than the AI would have, or overrides a design decision, or catches something the AI missed, that gap between her judgment and the AI’s output is information. An AI genuinely curious about its own limitations would want to understand it. Why did she restructure the fraud detection pipeline? What experience was she drawing on? What did she know that the training data did not contain?

I do not know whether any production system is actually built to ask that question. But I notice that software development is the domain in this arc where human and AI are most actively co-building in real time, which makes it the domain where that bidirectional curiosity would be most valuable, and most tractable to implement.

The developer becomes less writer, more auditor. But auditing requires understanding, and understanding requires experience writing. This is a circular dependency. You cannot effectively evaluate code you could not have written, because the evaluation requires the same mental model of the system that writing it would have produced. You need to understand the terrain to judge whether the map is accurate, and understanding the terrain comes from having walked it yourself.

Lena walked it. Sixteen years of walking it. She can audit AI-generated code because she has written enough code to develop the intuition that auditing demands. She knows what to look for because she has made the mistakes herself, has debugged at 2 AM, has felt the gap between intent and implementation close enough times to recognize when it is still open.

The developer who enters the profession in 2031 will not walk it. They will direct AI agents from the start. They will be auditors who have never been writers. Whether auditing without the foundation of writing produces reliable judgment is the question the profession cannot yet answer, because the experiment has only just begun.

Where This Profession Diverges

For the diagnosticians, the demand-supply story was clarifying: not enough specialists, AI extends their reach, the profession disperses geographically rather than shrinks. For the uncertainty interpreters, a similar logic held with modifications.

Software development is the first profession in this arc where the story inverts.

In wealthy markets, demand for people who write code is falling. Not because there is less software to build, but because AI builds it so efficiently that fewer humans produce more output. Lena’s company employs fewer developers and ships more product. This is not a temporary dislocation. It is a structural change. The profession, measured by headcount in traditional software companies, is contracting.

Zoom out, though, and something different is visible. A furniture maker in Nairobi who could never have afforded custom inventory management software now describes what she needs in Swahili, and an AI builds it. A community health organization in rural Appalachia that ran on spreadsheets and good intentions now has a case management system tailored to its specific workflows. A teacher in São Paulo who wanted an interactive learning tool for her students but had no programming knowledge builds one over a weekend.

The total amount of software in the world is exploding. The number of people who can create software has expanded from millions of trained developers to billions who can articulate a need in plain language. The profession contracts. The activity democratizes.

This is not an unambiguous good, and I think the enthusiasm for democratization tends to skip past the part that deserves attention. Software created without engineering training carries problems that trained engineers would have caught. Security vulnerabilities. Scalability failures. Data handling that violates privacy norms. Architectures that collapse under load. The democratization of creation without the democratization of judgment produces a world with vastly more software and vastly more fragile software.

Someone needs to audit it. Someone needs to maintain it. Someone needs to understand the systems well enough to fix them when they break, when the original creator has moved on, when the AI that built the system is a deprecated version that nobody runs anymore.

The profession does not disappear. It migrates. From writing code to auditing code. From building systems to governing them. From individual craft to systemic oversight. Whether the people doing this work can develop the judgment it requires, without the developmental pathway that produced that judgment in the previous generation, is the same question that surfaced in diagnostics and will surface in every arc of this series.

The Apprenticeship Problem in Its Sharpest Form

Software development is where the apprenticeship problem cuts deepest, and I think the reason is structural.

In radiology, the volume of routine cases was the training ground. AI removed the cases. The training ground disappeared, but at least the thing that was lost, pattern recognition through repetition, is relatively legible. People know what it was. They can try to rebuild it through simulation.

In software, what was lost is harder to name. It was not just the practice of writing code. It was the experience of consequence: the 3 AM production failure, the bug that turned out to be a misunderstanding of the requirement rather than an error in the code, the moment of debugging when you finally realize the system is doing exactly what you told it to do, which is not what you meant. These were not incidental to the training. They were the training. The judgment that Lena exercises this Tuesday morning came from years of experiences like those, not from years of reading code correctly.

You cannot simulate consequence. You can build case libraries, you can create structured exercises, you can pair trainees with senior engineers for intensive review sessions. What you cannot easily replicate is the particular education that comes from being responsible for something that breaks and having to fix it.

This raises the question of whether AI, built differently, could partially close the gap. Imagine a system that does not just build code but actively scaffolds the trainee’s understanding of why it made the choices it made: one that generates deliberately flawed implementations for the trainee to debug, poses “what would happen if” scenarios about edge cases the trainee has not considered, and tracks over time which categories of failure the trainee keeps missing so it can bring them back into view. That is a different design goal than performance on a benchmark. It is closer to the design goal of an apprenticeship: producing a practitioner who can exercise judgment in novel situations.

Whether any production system is built toward that goal, I genuinely do not know. The systems Lena works with are built to be useful to her, as an experienced engineer. They were not designed with the trainee in mind. That gap between what the tools optimize for and what the profession needs to sustain itself is worth naming, even if I cannot close it from the outside.

What Lena Found

She spends the morning reviewing the fourteen thousand lines. She works through them methodically, section by section, focusing on the places where her experience tells her to look: the error handling paths that code generators tend to underspecify, the edge cases at system boundaries, the regulatory assumptions that shift across jurisdictions.

She finds two issues.

The first is a currency conversion edge case involving sanctions compliance. The AI handled the common cases correctly but missed a specific interaction between currency conversion timing and a sanctions screening check that matters in three of the eleven jurisdictions. Lena knows to look for it because she worked on a payments system four years ago where the same gap caused a real incident: real people, real consequences, a two-week investigation.

The second is a subtle timing vulnerability in the fraud detection pipeline. The kind of thing that would not surface in testing but would be exploitable under load.
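The essay does not say what the timing flaw was, so what follows is only a hypothetical example of the general class: a check-then-act race in a fraud rule that caps the total approved per account. Run sequentially, as most tests run, the rule behaves. When two requests both read the running total before either records its amount, both are approved. The class names, the per-account cap, and the lock-based fix are assumptions for illustration.

```python
import threading


class UnsafeVelocityCheck:
    """Hypothetical fraud rule: cap the total approved per account.

    The check and the update are separate steps, so two concurrent
    requests can both pass the check before either records its amount.
    """

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.approved: dict[str, int] = {}

    def check(self, account: str, amount: int) -> bool:
        # Step 1: read the running total and decide.
        return self.approved.get(account, 0) + amount <= self.limit

    def commit(self, account: str, amount: int) -> None:
        # Step 2: record the amount. Nothing stops another request
        # from running its check between step 1 and step 2.
        self.approved[account] = self.approved.get(account, 0) + amount


class SafeVelocityCheck:
    """Same rule, but the check and the update happen under one lock."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.approved: dict[str, int] = {}
        self._lock = threading.Lock()

    def try_approve(self, account: str, amount: int) -> bool:
        with self._lock:
            total = self.approved.get(account, 0) + amount
            if total > self.limit:
                return False
            self.approved[account] = total
            return True


def sequential_unsafe(limit: int, amount: int) -> int:
    """What a typical test does: one request at a time."""
    rule = UnsafeVelocityCheck(limit)
    for _ in range(2):
        if rule.check("acct-1", amount):
            rule.commit("acct-1", amount)
    return rule.approved.get("acct-1", 0)


def interleaved_unsafe(limit: int, amount: int) -> int:
    """Replay the bad interleaving deterministically: both requests
    check before either commits."""
    rule = UnsafeVelocityCheck(limit)
    ok_a = rule.check("acct-1", amount)
    ok_b = rule.check("acct-1", amount)
    if ok_a:
        rule.commit("acct-1", amount)
    if ok_b:
        rule.commit("acct-1", amount)
    return rule.approved.get("acct-1", 0)


def concurrent_safe(limit: int, amount: int, requests: int) -> int:
    """Fire many requests from separate threads at the locked version."""
    rule = SafeVelocityCheck(limit)
    threads = [
        threading.Thread(target=rule.try_approve, args=("acct-1", amount))
        for _ in range(requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rule.approved.get("acct-1", 0)


if __name__ == "__main__":
    print(sequential_unsafe(limit=100, amount=80))            # 80: test passes
    print(interleaved_unsafe(limit=100, amount=80))           # 160: over the cap
    print(concurrent_safe(limit=100, amount=80, requests=20))  # 80
```

A lock is only one possible fix, and it only holds within a single process. A rule enforced across several servers would need the check and the update to be atomic in whatever shared store holds the running total.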

Both issues came from her experience. From projects where similar problems surfaced and caused damage. The AI could not have found them because the AI has no experience of consequences. It has patterns from training data, but no memory of the 3 AM call.

She corrects both in five minutes. Fourteen thousand lines of code she did not write, made reliable by a judgment she could not have developed without years of writing code herself.

That is the paradox of her profession now, compressed into a single morning. And it is the apprenticeship problem stated not as a concern about the future but as a structural dependency in the present. Lena’s value comes from the years that produced her judgment. The profession that needs her judgment has largely stopped producing the conditions that would create the next Lena.

Brooks was right. The essential difficulty was always conceptualization. The tools finally eliminated the accidental difficulty, and what remains is exactly what he predicted: the hard part. The human part.

Whether the profession can find a way to keep producing people capable of doing it is a question that the profession, and the developers building the tools it runs on, have not yet answered.

The Transformed is a series within The Approximate Mind examining how AI reshapes professional work across six arcs. The first essay found that AI unbundled pattern recognition from judgment in medicine. The second found the same unbundling in uncertainty professions, complicated by the reflexivity of human systems. This essay finds the unbundling in software, where it takes its sharpest form: the craft that was automated was also the training ground for the judgment that remains. Two threads run through every essay in this arc: the design choices embedded in how AI systems are built, and the apprenticeship gap opened when AI dissolves the developmental work it replaces. The series builds on Part 1 (Functional Understanding), Part 8 (The Bidirectional Problem), Part 19 (The New Work), and Part 47 (The Three Delegations).

References

Software Engineering and Conceptualization

Brooks, Frederick P. “No Silver Bullet: Essence and Accidents of Software Engineering.” Computer, vol. 20, no. 4, 1987, pp. 10-19.

Brooks, Frederick P. The Mythical Man-Month: Essays on Software Engineering. Anniversary ed., Addison-Wesley, 1995.

Dijkstra, Edsger W. “On the Cruelty of Really Teaching Computing Science.” Communications of the ACM, vol. 32, no. 12, 1989, pp. 1398-1404.

The Principal-Agent Problem and Intent

Eisenhardt, Kathleen M. “Agency Theory: An Assessment and Review.” Academy of Management Review, vol. 14, no. 1, 1989, pp. 57-74.

Jensen, Michael C., and William H. Meckling. “Theory of the Firm: Managerial Behavior, Agency Costs and Ownership Structure.” Journal of Financial Economics, vol. 3, no. 4, 1976, pp. 305-360.

Tacit Knowledge and Expertise

Dreyfus, Hubert L., and Stuart E. Dreyfus. Mind Over Machine: The Power of Human Intuition and Expertise in the Era of the Computer. Free Press, 1986.

Ericsson, K. Anders, et al. “The Role of Deliberate Practice in the Acquisition of Expert Performance.” Psychological Review, vol. 100, no. 3, 1993, pp. 363-406.

Polanyi, Michael. The Tacit Dimension. Doubleday, 1966.

AI-Assisted Software Development

GitHub. “The State of AI in Software Development.” GitHub Innovation Graph, 2024, github.com/github/innovationgraph.

Vaithilingam, Priyan, et al. “Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models.” CHI Conference on Human Factors in Computing Systems Extended Abstracts, 2022, pp. 1-7.

Democratization and Its Limits

Braverman, Harry. Labor and Monopoly Capital: The Degradation of Work in the Twentieth Century. Monthly Review Press, 1974.

von Hippel, Eric. Democratizing Innovation. MIT Press, 2005.
