Documentation · Essential

Models All the Way Down

How AI works · Bias & discrimination · Environmental & social costs

What can I learn?

A visual story from the Knowing Machines research project that follows the construction of LAION-5B, the open foundation dataset of 5.8 billion image-text pairs used to train models such as Stable Diffusion. The investigation unpacks what is actually inside a dataset too large for any human to review, and what that implies for the models trained on it.

Core insight

Generative AI is "models all the way down": every model rests on a dataset that is itself the product of earlier models, scraping choices and assumptions. The training data is not raw reality but a constructed, messy, consequential artefact.

How to use it in daily work

An accessible, visual reference for understanding where image generators get their training material and why that origin shapes their outputs and biases.

  • Use the piece to explain to a client why an AI image generator produces stereotyped or skewed results: it learned those patterns from its training data.
  • Draw on the scale of LAION-5B to convey why no one can fully control or inspect what these models have seen.
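
To make the scale argument concrete in a client conversation, a rough back-of-envelope calculation can help. The sketch below is illustrative only: the pair count is the 5.8 billion figure quoted above, while the review pace of one second per pair and the working schedule are assumptions, and the function name review_years is hypothetical.

```python
# Back-of-envelope estimate of how long a manual review of LAION-5B would take.
# The pair count comes from the text above; the review pace and working
# schedule are assumptions chosen only for illustration.

PAIRS = 5_800_000_000          # image-text pairs in LAION-5B (figure quoted above)
SECONDS_PER_PAIR = 1           # assumed: one second to glance at each pair


def review_years(pairs, seconds_per_pair, hours_per_day=24, days_per_year=365):
    """Return the years one reviewer would need at the given pace and schedule."""
    seconds_per_year = hours_per_day * 3600 * days_per_year
    return pairs * seconds_per_pair / seconds_per_year


# Reviewing around the clock, with no breaks at all.
nonstop = review_years(PAIRS, SECONDS_PER_PAIR)

# Reviewing 8 hours a day, 250 days a year (an assumed working schedule).
working = review_years(PAIRS, SECONDS_PER_PAIR, hours_per_day=8, days_per_year=250)

print(f"Non-stop, one reviewer: about {nonstop:.0f} years")
print(f"8-hour days, 250 days a year: about {working:.0f} years")
```

Under these assumptions a single reviewer would need roughly 184 years of non-stop viewing, or about 806 years of ordinary working days, to glance at every pair once. Changing the assumed pace changes the totals but not the conclusion.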

Time

30–45 minutes to read and explore.

Cost

Free