Models All the Way Down
How AI works
Bias & discrimination
Environmental & social costs
What can I learn?
A visual story from the Knowing Machines research project that follows the construction of LAION-5B, the open foundation dataset of roughly 5.8 billion image-text pairs scraped from the web and used to train models such as Stable Diffusion. The investigation unpacks what is actually inside a dataset too large for any human to review, and what that means for the models trained on it.
Core insight
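Generative AI is "models all the way down": every model rests on a dataset that is itself the product of earlier models, scraping choices and assumptions. In LAION-5B's case, scraped image-text pairs were kept or discarded according to scores from OpenAI's CLIP model, which was itself trained on image-text data collected from the web. The training data is not raw reality but a constructed, messy, consequential artefact.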
How to use it in daily work
An accessible, visual reference for understanding where image generators get their material and why that origin shapes their outputs and biases.
- Use the piece to explain to a client why an AI image generator produces stereotyped or skewed results — it learned them from the data.
- Draw on the scale of LAION-5B to convey why no one fully controls or inspects what these models have seen.
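To make the scale point concrete in a conversation, a back-of-envelope estimate helps. The sketch below assumes, purely for illustration, that a reviewer spends one second per image-text pair and, in the second case, works 250 eight-hour days a year. These review speeds and hours are assumptions, not figures from the dataset or its authors, and the names `review_years`, `PAIRS` and the other constants are hypothetical.

```python
# Back-of-envelope estimate: how long would one person need to glance at every
# image-text pair in LAION-5B? Review pace and working hours are illustrative
# assumptions, not figures published for the dataset.

PAIRS = 5_800_000_000      # approximate size of LAION-5B (image-text pairs)
SECONDS_PER_PAIR = 1.0     # assumption: one second to glance at each pair

SECONDS_PER_YEAR = 365 * 24 * 60 * 60      # reviewing non-stop, around the clock
WORK_SECONDS_PER_YEAR = 250 * 8 * 60 * 60  # assumption: 250 eight-hour workdays


def review_years(pairs: float, seconds_per_pair: float, seconds_per_year: float) -> float:
    """Return the years needed to review `pairs` items at a fixed pace."""
    return pairs * seconds_per_pair / seconds_per_year


if __name__ == "__main__":
    nonstop = review_years(PAIRS, SECONDS_PER_PAIR, SECONDS_PER_YEAR)
    working = review_years(PAIRS, SECONDS_PER_PAIR, WORK_SECONDS_PER_YEAR)
    print(f"Non-stop: {nonstop:.0f} years")
    print(f"Working hours only: {working:.0f} years")
```

Even at this unrealistically fast pace, one person would need about 184 years of non-stop viewing, or roughly 806 years of ordinary working hours, which is why no one has seen everything these models were trained on.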