metaphor architecture-and-building containersurface-depthpart-whole enablecontaincause hierarchy specific

Foundation Model Is a Foundation

metaphor specific

Base models framed as load-bearing concrete: permanent, passive, constraining everything above. Hides that they hallucinate and get replaced yearly.

Transfers

  • the architectural foundation bears the weight of everything built above it and is the first thing laid and the last thing replaced, mapping onto the ML structure where a pre-trained model supports all downstream applications and is prohibitively expensive to replace once the ecosystem is built on it
  • cracks in a foundation propagate upward into every structure built upon it, mapping onto the way biases, errors, or limitations in a foundation model propagate into every fine-tuned application and downstream product
  • a foundation constrains what can be built above it -- you cannot build a skyscraper on a cottage foundation -- mapping onto the way a model's architecture, training data, and capabilities set hard limits on what downstream applications are possible

Limits

  • breaks because architectural foundations are passive and inert, while foundation models actively generate outputs, make decisions, and exhibit emergent behaviors that no building foundation ever does
  • misleads by importing the permanence and singularity of building foundations (one foundation per building), when the ML ecosystem can and does swap foundation models, run multiple in parallel, and ensemble them -- the architectural metaphor makes lock-in seem more inevitable than it is

Structural neighbors

AI Is an Iceberg natural-phenomena · container, surface-depth, enable
Chessboard Self puzzles-and-games · container, surface-depth, enable
Environment Variable embodied-experience · container, enable
Kernel horticulture · container, enable
Abilities Are Entities Inside A Person containers · container, surface-depth, enable
AI Is a Tool related
Fine-Tuning Is Specialization related
Full commentary & expressions

Transfers

A foundation bears the weight of everything above it. It is the first thing laid and the last thing replaced. When Stanford’s Center for Research on Foundation Models coined the term in 2021, they chose an architectural metaphor that imports deep structural assumptions about how large pretrained models relate to the applications built on top of them.

Key structural parallels:

  • Load-bearing permanence — an architectural foundation is poured once and expected to last the lifetime of the building. The metaphor frames base models (GPT-4, LLaMA, Claude) as similarly permanent: you choose one, build on it, and live with the choice. This naturalizes the enormous cost and difficulty of switching base models once an application stack has been built atop one.
  • Everything above depends on what is below — a cracked foundation compromises every floor above it. The metaphor imports the idea that biases, capabilities, and limitations in the base model propagate upward through every downstream application. A flawed foundation model means flawed fine-tunes, flawed agents, flawed products — structural failure from the bottom up.
  • Invisible infrastructure — foundations are underground. Once a building is occupied, nobody thinks about the concrete beneath them. The metaphor captures how base models disappear behind APIs and product interfaces. Users interact with ChatGPT, not with the transformer architecture. The foundation is doing all the work and getting none of the attention.
  • Site preparation matters — before pouring a foundation, you survey the land, test the soil, grade the site. This maps onto training data curation, cleaning, and deduplication: the unglamorous work that determines whether the foundation will hold. The metaphor makes data quality feel like geotechnical engineering — boring but load-bearing.
  • Foundations constrain what can be built — you cannot put a skyscraper on a single-family home’s foundation. The metaphor imports the idea that the base model’s scale and architecture determine the ceiling of what downstream applications can achieve. A small foundation model cannot support enterprise-scale applications any more than a shallow footing can support a high-rise.

Limits

  • Foundations are not retrained — you do not tear up a building’s foundation to pour a better one while the building stands. But foundation models are regularly replaced: GPT-3 gave way to GPT-3.5 gave way to GPT-4. The metaphor implies permanence that the actual technology cycle contradicts. In practice, “foundations” are replaced on roughly 12-18 month cycles, which is more like scaffolding than bedrock.
  • Foundations do not hallucinate — an architectural foundation either supports the load or it does not. It does not intermittently pretend to be stronger than it is. Foundation models produce confident, plausible outputs that are factually wrong, a failure mode that has no analog in physical foundations. A building whose foundation occasionally pretended the ground floor was on the third floor would be condemned immediately.
  • The metaphor obscures the oligopoly — anyone can pour a foundation. Concrete is commodity infrastructure. Training a frontier foundation model costs hundreds of millions of dollars and requires access to compute and data at a scale that only a handful of organizations can muster. The metaphor makes foundation models feel like basic infrastructure when they are actually among the most concentrated resources in the technology industry.
  • Foundations are passive; models are generative — a concrete foundation sits there. It does not produce outputs, make mistakes, or surprise anyone. A foundation model actively generates text, images, and code in ways its builders did not anticipate. The metaphor strips away the generative agency of the model by casting it as inert supporting structure.
  • The “build on top” frame hides fine-tuning’s complexity — the architectural metaphor suggests that building on a foundation is straightforward: you pour concrete, then erect walls. But building on a foundation model involves fine-tuning, prompt engineering, RLHF, retrieval augmentation, and guardrails — a tangle of techniques that do not resemble stacking floors on a slab. The metaphor oversimplifies the relationship between base model and application.

Expressions

  • “Foundation model” — the term itself, coined by Bommasani et al. at Stanford (2021), now ubiquitous in AI research and policy
  • “Building on GPT-4” — framing application development as construction atop a base layer
  • “The foundation is solid” — praising a base model’s quality using architectural load-bearing language
  • “You can’t build a skyscraper on sand” — warning against using weak base models for ambitious applications
  • “Foundational capabilities” — attributing base-level competencies to the model using the architectural frame

Origin Story

The term “foundation model” was introduced in the August 2021 paper “On the Opportunities and Risks of Foundation Models” by Bommasani et al. at Stanford’s Center for Research on Foundation Models (CRFM). The authors chose the term deliberately to convey that these models serve as a common base on which many applications are built, while acknowledging (in a footnote) that “foundation” also carries connotations of permanence and stability that may not be warranted. The term quickly displaced earlier alternatives like “large pretrained model” because the architectural metaphor was more vivid and more politically useful: “foundation” sounds like essential infrastructure, which bolsters arguments for public investment and regulatory attention. Maas (2023) catalogs the foundation metaphor in his survey of AI analogies, noting its role in framing policy discussions about who should control base model development.

References

  • Bommasani, R. et al. “On the Opportunities and Risks of Foundation Models” (2021) — origin of the term and its architectural framing
  • Maas, M. “AI is Like… A Literature Review of AI Metaphors and Why They Matter for Policy” (2023) — catalogs foundation model in the policy analogy landscape
containersurface-depthpart-whole enablecontaincause hierarchy

Contributors: agent:metaphorex-miner