How we got here, what the words actually mean, and why this decade is different from every AI hype cycle before it.
Seventy years of waves — and why this one didn't crash.
The family tree, the recipe, and the cake.
Training vs. inference. Context windows. Cutoffs.
Money, time, and carbon — the numbers nobody puts on the landing page.
Symbolic AI: experts wrote logic by hand. ELIZA, early chess engines. Couldn't handle messy reality.
Machine learning: spam filters, recommendation engines. Needed huge labeled datasets to work well.
Deep learning: many-layer networks, massive datasets, GPU acceleration. AlphaGo, ChatGPT, image generators.
1956 Dartmouth · 1997 Deep Blue · 2016 AlphaGo · 2022 ChatGPT
None of the three ingredients (deeper networks, massive datasets, GPU compute) would have been enough on its own. All three together changed what was possible.
These terms get used interchangeably. They mean different things. Knowing which is which is the first unlock.
ChatGPT is GenAI, which is a kind of Deep Learning, which is a kind of Machine Learning, which is a kind of AI. All four statements are true at once.
Rule-based (logic written by hand):
IF temperature > 30°C
THEN turn on cooling
ELSE turn off cooling

Machine learning (logic learned from data):
Learn from 1M examples of temperature/comfort data → predict optimal settings
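The two approaches above can be made runnable. A minimal Python sketch with a hypothetical six-point comfort dataset; the "training" step is just the midpoint between the two class averages, standing in for a real learning algorithm:

```python
# Rule-based: a human writes the logic by hand.
def rule_based_cooling(temp_c: float) -> bool:
    return temp_c > 30.0  # hand-picked threshold

# Machine learning: derive the threshold from labeled examples.
# Toy data: (temperature, user_turned_cooling_on)
data = [(18, False), (22, False), (26, False),
        (27, True), (31, True), (35, True)]

def learn_threshold(examples):
    on = [t for t, label in examples if label]
    off = [t for t, label in examples if not label]
    # Midpoint between the two class means, standing in for real training.
    return (sum(on) / len(on) + sum(off) / len(off)) / 2

threshold = learn_threshold(data)  # 26.5 on this toy data

def learned_cooling(temp_c: float) -> bool:
    return temp_c > threshold
```

At 28°C the hand-written rule says "off" while the learned model, having seen that users turn cooling on from 27°C, says "on": the data, not the programmer, set the boundary.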
An algorithm tells a computer how to solve a problem. A model is what you get after running the algorithm on real data. Two teams with the same algorithm and different data produce different models.
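A toy illustration of that split, using a hypothetical "fit a threshold at the mean" algorithm chosen only for brevity: one algorithm, two datasets, two different models.

```python
def train(examples):
    """The algorithm: a fixed procedure that turns data into a model."""
    mean = sum(examples) / len(examples)
    # The model is the artifact the algorithm produces: here, one learned
    # number plus a prediction rule built around it.
    return lambda x: x > mean

# Two teams, same algorithm, different data:
model_a = train([10, 20, 30])     # learns mean 20
model_b = train([100, 200, 300])  # learns mean 200

print(model_a(50), model_b(50))   # → True False
```

Same code path both times; only the data differed, so the two models disagree about the same input.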
The three ideas that explain most of what you experience when you use ChatGPT, Claude, or Gemini.
Common misconception: "The AI is learning from me right now." No — it's applying patterns it already learned.
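One way to make that concrete: during inference the model's parameters are read-only. A sketch with a hypothetical one-number "model":

```python
# A hypothetical one-number "model", fixed when training ended.
params = {"threshold": 26.5}
snapshot = dict(params)

def predict(temp_c: float) -> bool:
    # Inference only READS the parameters; nothing is written back.
    return temp_c > params["threshold"]

answers = [predict(t) for t in (10, 20, 30, 40)]

# However many predictions we make, the weights never change:
assert params == snapshot
```

Your chat messages shape the context the model reads, not the weights it runs on.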
Discriminative AI: learns decision boundaries between categories. Outputs a class, a score, a yes/no.
Easier to measure. Easier to validate. Can be overconfident.
Generative AI: learns the distribution of its training data and samples new outputs from it.
Harder to validate. Can "hallucinate" — produce plausible but false output.
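Both styles can be sketched on a toy four-sentence corpus (all data here is illustrative): a crude word-overlap score stands in for a discriminative classifier, and a bigram sampler stands in for a generative model.

```python
import random

# Discriminative: learn a boundary, output a class.
spam = ["win money now", "free money win"]
ham = ["meeting at noon", "see you at lunch"]

def word_counts(docs):
    counts = {}
    for doc in docs:
        for word in doc.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

spam_counts, ham_counts = word_counts(spam), word_counts(ham)

def classify(text):
    # Score by which class's vocabulary the words match more often.
    score = sum(spam_counts.get(w, 0) - ham_counts.get(w, 0)
                for w in text.split())
    return "spam" if score > 0 else "ham"

# Generative: learn a distribution, sample new output.
def train_bigrams(docs):
    model = {}
    for doc in docs:
        words = doc.split()
        for a, b in zip(words, words[1:]):
            model.setdefault(a, []).append(b)
    return model

bigrams = train_bigrams(spam + ham)

def generate(start, n=3, seed=0):
    random.seed(seed)
    out = [start]
    for _ in range(n):
        nxt = bigrams.get(out[-1])
        if not nxt:
            break
        out.append(random.choice(nxt))
    return " ".join(out)
```

`classify` always answers with a label it was given; `generate` produces word sequences nobody wrote, each step plausible given the last word — which is also exactly how it can drift into fluent nonsense.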
2020 GPT-3 · 2022 ChatGPT · 2023 GPT-4 · 2024 Claude 3.5, Llama 3, Gemini 1.5 · 2025 GPT-5, Claude 4, Gemini 2.5
The maximum amount of text the model can "see" at once — your prompt plus its response. A token ≈ 0.75 words.
Why it matters: long documents need splitting, early conversation gets forgotten.
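A back-of-envelope check built on the ≈0.75-words-per-token rule of thumb above. The 128k window size and the 1,000-token reply reserve are illustrative assumptions; real tokenizers vary by model, so treat this as an estimate, not an exact count.

```python
def estimate_tokens(text: str) -> int:
    # Rule of thumb: a token is roughly 0.75 words, so tokens ≈ words / 0.75.
    return round(len(text.split()) / 0.75)

def fits_window(text: str, window: int, reserved_for_reply: int = 1000) -> bool:
    # The window covers prompt AND response, so reserve room for the reply.
    return estimate_tokens(text) + reserved_for_reply <= window

doc = "word " * 90_000              # a ~90k-word document
print(estimate_tokens(doc))          # → 120000
print(fits_window(doc, 128_000))     # → True: fits with room for a reply
```

Grow the same document to ~120k words and it no longer fits a 128k window — that is the point where you start splitting documents or summarizing earlier conversation.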
The latest date in the model's training data. Everything after is unknown unless the model can search the web.
Why it matters: current events, new laws, new libraries — the model doesn't know them.
Biased data produces biased models. Incomplete data produces blind spots. Mislabeled data teaches wrong answers confidently. Amazon's recruiting AI showed bias against women because it learned from ten years of male-dominated hiring. Quality beats quantity — every time.
Efficient algorithms · renewable power · model compression · edge inference — the next decade's engineering fight.
AI is not magic — it's data + compute + statistics, scaled to an unprecedented degree. Understand how it learns, and you'll know when to trust it.
Next up · Module 02 — responsibility: bias, hallucinations, and the human in the loop.