The Cost Collapse
When Compute Stops Being the Barrier
TAM-UNF.11 · The Ungoverned Frontier · The Approximate Mind
Dr. Yuki Tanaka assembles the swarm on a Friday afternoon, drinking black tea from a thermos she has carried since graduate school. The thermos is dented on one side from a fall on a research vessel off the Kuril Islands in 2019. She cannot break the habit of it.
She is a marine biologist at a regional university in Hokkaido. Her institution does not have a frontier compute cluster. What it has is a laptop cluster, three GPUs shared across the department, and the accumulated knowledge of thirty years of field research on cold-water kelp forest dynamics, documented in field notes, gray literature, and a handful of published papers that the major journals found too regional to warrant wide attention.
She is assembling five models. A small language model trained on the oceanographic literature relevant to her region. A state-space model optimized for the time-series analysis of temperature gradients her buoys have been recording for a decade. A Tiny LM, built from her team’s own field notes, that knows things about this specific stretch of coast that no published paper contains. A transformer-based model for cross-domain inference with atmospheric chemistry, because the kelp dynamics she’s studying are downstream of weather patterns she can’t model alone. And a routing layer that will assemble whichever combination is most relevant to the specific question she puts to it.
The whole thing cost less to build than the conference she attended in Bergen last autumn. The inference cost per query is less than a cup of coffee.
Five years ago, this was frontier lab territory. Last year, it was expensive but possible. Today, it is an afternoon’s work.
What Changed
The cost of AI capability has not declined linearly. It has declined architecturally.
The frontier model approach (train one enormous general model on the broadest possible corpus and achieve general capability through scale) concentrates both capability and cost at the top of the distribution. The institutions that can train and run these models are few enough to count. The problems they prioritize are the problems those institutions have reason to care about.
The swarm approach inverts this logic. Instead of one large general model, assemble a mixture of specialized models: small language models trained on specific knowledge domains, state-space models optimized for time-series or sequential data, Tiny LMs built from curated datasets that may be tiny in volume but deep in domain specificity, routing layers that direct queries to the most relevant combination. Each component is cheap to train and cheap to run. The assembly is dynamic: the swarm configures itself differently for different questions, rather than running a large general system continuously.
The cost does not just drop. The cost structure changes. Training a frontier model costs what a mid-sized country spends on its public agricultural research system annually. Training a swarm component costs what a single postdoctoral researcher earns in a year. Running inference on the assembled swarm costs what Yuki’s department spends on field equipment in a month. These are not refinements of the same economy. They are a different economy.
The hyper-local contextual assembly matters as much as the cost. The frontier model is general: it brings broad knowledge to every query, most of which is irrelevant to the specific context. Yuki’s swarm brings the knowledge most relevant to this coast, this season, this question, assembled on demand, discarded when the query is complete. This is closer to how domain expertise actually works than how frontier models work. The expert doesn’t activate everything she knows at once. She assembles what the situation requires.
What the Swarm Enables That Scale Cannot
The frontier model’s broad knowledge is also its limitation. A system trained on the entire published corpus knows the general case well and the specific case poorly. Yuki’s coastline is not the general case. Her kelp forests behave according to dynamics that are partly documented in the published literature and partly documented only in her team’s thirty years of field notes, in the specific interaction between the Kuril Current and the local seafloor topology, in the seasonal patterns that no global oceanographic model has the resolution to represent.
The frontier model can tell her what kelp forests generally do. The swarm can tell her what this kelp forest is doing now, in relation to what it was doing last October, in the specific temperature gradient this buoy has been recording since 2014. This is not a marginal improvement in specificity. It is a different kind of knowledge.
And it is the knowledge that matters for the decisions that need to be made. Conservation policy for this coastline, fisheries management for this season, early warning systems for this community: all of these require the specific knowledge, not the general. The frontier model gestures toward the specific from the general. The swarm is built from the specific.
The hyper-local contextual assembly extends this further. Yuki does not run the full swarm continuously. She assembles the relevant configuration for a specific question and dissolves it when the question is answered. If she is asking about temperature gradient anomalies, she pulls the state-space model and the Tiny LM. If she is asking about cross-domain interactions with atmospheric chemistry, she adds the transformer component. The routing layer makes this selection, but the selection reflects a design Yuki made about what knowledge domains are relevant to her research questions. The swarm’s configuration is itself an epistemological argument about what matters.
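The selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the component names, the keyword sets, and the keyword-overlap rule are all hypothetical, and a real routing layer would more likely use a learned classifier or embedding similarity. What the sketch shows is that the routing table is a design decision made by the domain expert.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One swarm member and the topics its curator judged it relevant to."""
    name: str
    keywords: frozenset


# Hypothetical registry: names and keyword sets are illustrative assumptions,
# standing in for the design choices Yuki makes about what knowledge matters.
COMPONENTS = [
    Component("regional_slm", frozenset({"literature", "published", "species"})),
    Component("buoy_ssm", frozenset({"temperature", "gradient", "anomaly", "anomalies", "buoy"})),
    Component("field_notes_tiny_lm", frozenset({"temperature", "gradient", "anomaly", "anomalies", "kelp"})),
    Component("atmos_transformer", frozenset({"atmospheric", "chemistry", "weather"})),
]


def route(query: str) -> list[str]:
    """Return the names of components whose keywords overlap the query words."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    return [c.name for c in COMPONENTS if c.keywords & words]


if __name__ == "__main__":
    print(route("Any temperature gradient anomalies this week?"))
    print(route("How does atmospheric chemistry affect the temperature gradient?"))
```

In this toy version, a question about temperature gradient anomalies pulls only the state-space model and the Tiny LM, and mentioning atmospheric chemistry adds the transformer component. Changing what the swarm attends to means editing the registry, not retraining anything.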
This configurability is what makes the swarm more than a cheaper version of the frontier model. It is a different instrument, suited to different problems, building different kinds of knowledge. The discovery pipeline run through a frontier model finds what the frontier model’s architecture can find. The discovery pipeline run through a swarm finds what the swarm’s curated components were built to find. The two search spaces are not the same.
Who Is Inside the Pipeline Now
The series argued in earlier essays that the architecture choice is the equity choice. That framing assumed that cost was the primary barrier. The swarm architecture largely removes cost as that barrier.
What the swarm requires instead of compute budget is curation expertise: the knowledge to build the right Tiny LM for a specific domain, to identify which gray literature to include in a small language model’s training data, to know which time-series model architecture fits the data structure of a specific research program, to design a routing layer that correctly identifies which component combination is relevant to a given question.
This is a different kind of expertise than the technical expertise required to train frontier models. It is domain-adjacent rather than technically specialized. Yuki can curate the training data for her kelp forest Tiny LM because she knows the domain well enough to know what knowledge is most important and what is missing from the published record. She does not need to know how to train a frontier model. She needs to know her field.
This expertise is more distributed than frontier compute. It lives in domain communities: the marine biologists who know their coastlines, the water engineers who know their watersheds, the historians who know their archives, the agricultural researchers who know their specific crops in specific microclimates. It does not require institutional affiliation with a major AI laboratory. It requires deep knowledge of a specific domain and the curation judgment to translate that knowledge into training data.
The equity barrier has not dissolved. It has moved. The question is no longer “who can afford the compute” but “who has the curation knowledge and the institutional context to build the relevant components.” These two questions have different answers. The first excluded most of the world. The second excludes much less, and the exclusions follow different patterns.
What the Equity Problem Looks Like Now
Yuki can build her swarm because she has thirty years of accumulated field knowledge and a team that can curate it. The marine biologist at a university with no long-term field program cannot build the same thing, not because of compute but because the Tiny LM’s value comes from the knowledge it was built from, and the knowledge requires the field program that produced it.
The new equity question: who has accumulated the domain-specific knowledge that makes a Tiny LM valuable? The answer maps partly onto the same institutions that benefited from the old infrastructure: research universities with long-term programs, government agencies with decades of monitoring data, established scientific communities with substantial gray literature. It maps partly onto communities that accumulated knowledge outside the institutional research framework: traditional practitioners whose knowledge was never formalized, communities with long empirical relationships with specific landscapes, industries with proprietary operational knowledge that was never published.
The second group can now build knowledge infrastructure that was previously inaccessible to them. The first group has accumulated knowledge that translates directly into swarm components. The distance between them is not gone. It is smaller, and differently shaped, than it was.
I wonder whether the institutions that hold the most relevant domain knowledge for the most urgent problems, the communities with deep situational knowledge of their own conditions, will develop the curation capacity to build the swarm components that represent what they know, or whether the second architecture will replicate the first architecture’s concentration in a new form.
Yuki closes the routing layer’s configuration file. The swarm is assembled. She puts the thermos back in its place beside the monitor, the dent from the Kuril Islands fall still in its side. She types the first query. The swarm assembles the relevant components. The response arrives in seconds, in the specific frame of her specific stretch of coast.
She has been waiting thirty years for something that could hold all of this at once.
This is Part 11 of The Ungoverned Frontier. The cost barrier to the discovery pipeline has changed shape. Part 12 (The Utility Layer) asks what happens when the distance between discovery and benefit compresses as dramatically as the cost of the discovery itself.
References
AI Architecture and Efficiency
Lepikhin, Dmitry, et al. “GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.” ICLR 2021.
Fedus, William, Barret Zoph, and Noam Shazeer. “Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.” Journal of Machine Learning Research, vol. 23, 2022.
Small and Specialized Models
Gunasekar, Suriya, et al. “Textbooks Are All You Need.” arXiv, 2023.
Abdin, Marah, et al. “Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone.” arXiv, 2024.
State Space Models
Gu, Albert, and Tri Dao. “Mamba: Linear-Time Sequence Modeling with Selective State Spaces.” arXiv, 2023.
Knowledge and Curation
Bommasani, Rishi, et al. “On the Opportunities and Risks of Foundation Models.” arXiv, 2021.
Crawford, Kate. Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence. Yale University Press, 2021.