AI Foundations · Module 01 · April 2026
Demystifying AI.

How we got here, what the words actually mean, and why this decade is different from every AI hype cycle before it.

What we'll cover 02 / 18
Agenda

Four questions. One working mental model.

01

Where did AI come from?

Seventy years of waves — and why this one didn't crash.

02

What are ML, DL, GenAI?

The family tree, the recipe, and the cake.

03

How do modern LLMs work?

Training vs. inference. Context windows. Cutoffs.

04

What does it actually cost?

Money, time, and carbon — the numbers nobody puts on the landing page.

The journey of AI 03 / 18
1950 → today

Three eras of artificial intelligence.

1950s – 1980s · Symbolic

Hand-coded rules

Experts wrote logic by hand. ELIZA, early chess engines. Couldn't handle messy reality.

1990s – 2010s · Statistical ML

Learning from data

Spam filters, recommendation engines. Needed huge labeled datasets to work well.

2012 → today · Deep learning

Neural networks at scale

Many-layer networks, massive datasets, GPU acceleration. AlphaGo, ChatGPT, image generators.

1956 Dartmouth · 1997 Deep Blue · 2016 AlphaGo · 2022 ChatGPT

Why this wave didn't crash 04 / 18
The perfect storm

Three forces arrived in the same decade.

Data
The internet produced more text, images, and code than any prior civilization — a training set that didn't exist before 2010.
Internet · Web · Social
GPUs
Graphics cards built for gaming and crypto turned out to be ideal for training neural networks.
NVIDIA · CUDA
200×
NVIDIA's market value rose more than 200-fold over the last decade — the symbol of the hardware boom powering modern AI.
Market cap · 2014 → today

None of these on their own would have been enough. All three together changed what was possible.

Part One 05 / 18
01
Part One · The words

AI, ML, DL, GenAI — what's what.

These terms get used interchangeably. They mean different things. Knowing which is which is the first unlock.

Three levels of AI 06 / 18
Narrow → General → Super

Today's AI is narrow. The rest is aspiration.

ANI · TODAY

Narrow

Good at one thing.
Excellent at specific tasks. Cannot transfer knowledge across domains.
Examples
Siri · Google Translate · Netflix recommendations · every LLM you use today.
AGI · THE GOAL

General

Good at everything humans are.
Human-level intelligence across any domain. Can learn and adapt flexibly.
Status
Does not exist. Active target of OpenAI, Anthropic, DeepMind, others.
ASI · SPECULATION

Super

Beyond human ability.
Surpasses humans in all areas. Purely theoretical.
Status
Science fiction territory. Subject of debate — not engineering yet.
The family tree 07 / 18
Russian nesting dolls

Each term lives inside the one before it.

Artificial Intelligence
Machines simulating human intelligence
Machine Learning
Learns from data, not rules
Deep Learning
Multi-layer neural networks
GenAI · LLMs
Creates new content

ChatGPT is a GenAI system, which is a kind of Deep Learning, which is a kind of Machine Learning, which is a kind of AI. All four statements are correct.

Rules vs. patterns 08 / 18
A different kind of software

The fundamental shift: from coding rules to learning from data.

Traditional software

You write the rules.

IF temperature > 30°C THEN turn on cooling ELSE turn off cooling
  • Deterministic · same input → same output
  • Logic is explicit and auditable
  • Update by changing code
  • Test by checking every path
ML software

The system learns the rules.

Learn from 1M examples of temperature/comfort data → predict optimal settings
  • Probabilistic · same input may produce different output
  • Logic is learned, not written
  • Update by retraining with new data
  • Test by statistical validation
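The contrast in miniature, in Python. The threshold-fitting loop below is a toy stand-in for real training — the data and the `learn_threshold` helper are illustrative, not a real product:

```python
# Traditional software: a person writes the rule.
def cooling_rule(temp_c: float) -> bool:
    return temp_c > 30.0

# ML-style software: the "rule" (here, a single threshold) is learned
# from labeled examples of (temperature, wants_cooling).
def learn_threshold(examples: list[tuple[float, bool]]) -> float:
    best_t, best_acc = examples[0][0], 0.0
    for t in sorted(temp for temp, _ in examples):
        acc = sum((temp > t) == label for temp, label in examples) / len(examples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

data = [(22.0, False), (26.0, False), (29.0, True), (33.0, True), (35.0, True)]
t = learn_threshold(data)
print(cooling_rule(31.0))  # rule written by a person → True
print(31.0 > t)            # rule recovered from data  → True
```

Same behavior, very different origin: one rule is auditable in the source code, the other lives in a learned parameter and changes if you retrain on new data.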
Three kinds of machine learning 09 / 18
How does it learn?

Three answers, three kinds of ML.

KIND 01

Supervised

Flashcards with answers on the back.
Learn from labeled examples. Each training example has an input and the correct output.
In your life
Spam filters · image classification · credit scoring.
KIND 02

Unsupervised

Sorting coins without knowing their values.
Find patterns in unlabeled data. The system groups, clusters, compresses.
In your life
Customer segmentation · anomaly detection · topic clustering.
KIND 03

Reinforcement

Training a dog with treats.
Learn through trial, error, and reward signals over time.
In your life
AlphaGo · game-playing agents · robotics · RLHF for LLMs.
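The "flashcards" metaphor for supervised learning, as a minimal sketch — a 1-nearest-neighbour classifier over a single made-up feature (the numbers and labels are invented for illustration):

```python
# Supervised learning in miniature: memorise labeled examples,
# answer a new input with the label of the closest known one.
def nearest_neighbor(train: list[tuple[float, str]], x: float) -> str:
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# Each "flashcard" pairs an input (a feature score) with the correct answer.
flashcards = [(1.0, "spam"), (2.0, "spam"), (8.0, "ham"), (9.0, "ham")]
print(nearest_neighbor(flashcards, 1.5))  # → "spam"
print(nearest_neighbor(flashcards, 8.5))  # → "ham"
```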
Algorithm vs. model 10 / 18
The recipe and the cake
Algorithm is the recipe.
Data is the ingredients.
Model is the cake you baked.

An algorithm tells a computer how to solve a problem. A model is what you get after running the algorithm on real data. Two teams with the same algorithm and different data produce different models.
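The recipe/cake split, reduced to the simplest possible "algorithm" (averaging — deliberately trivial, for illustration only):

```python
# The algorithm is the recipe: how to turn data into a model.
def fit_mean(data: list[float]) -> float:
    return sum(data) / len(data)

# Same recipe, different ingredients → different cakes.
team_a_model = fit_mean([10, 20, 30])  # → 20.0
team_b_model = fit_mean([100, 200])    # → 150.0
```

Both teams ran identical code; their models disagree because their data did.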

Part Two 11 / 18
02
Part Two · How LLMs work

Training, inference, and the working memory.

The three ideas that explain most of what you experience when you use ChatGPT, Claude, or Gemini.

Training vs. inference 12 / 18
Learning vs. performing

Two phases. Very different economics.

Training · learning

A student studying to become a doctor.

  • Months of computation
  • Millions of dollars
  • Terabytes of data
  • Thousands of GPUs
  • Happens once, or periodically
Inference · performing

The doctor diagnosing a patient.

  • Milliseconds to seconds per request
  • Relatively cheap per call
  • Happens every time you prompt
  • What you actually pay for as a user
  • Model is frozen — not learning from you

Common misconception: "The AI is learning from me right now." No — it's applying patterns it already learned.
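The two phases as a sketch — a least-squares fit stands in for real training, which is vastly more expensive; the point is that the weight is computed once and then frozen:

```python
# Training: expensive, happens once. Produces frozen weights.
def train(pairs: list[tuple[float, float]]) -> float:
    # Least-squares slope for y = w * x — a toy stand-in for training.
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den

# Inference: cheap, happens on every request. Weights never change.
def infer(w: float, x: float) -> float:
    return w * x

w = train([(1, 2), (2, 4), (3, 6)])  # learn once → w == 2.0
print(infer(w, 10))                  # answer many times; w is not updated
```

Note that `infer` never writes to `w` — exactly why the model is not "learning from you right now."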

Two paradigms of AI 13 / 18
Analyzer vs. creator

Predictive AI decides. Generative AI creates.

Predictive · the analyzer

Sorts. Scores. Flags.

Learns decision boundaries between categories. Outputs a class, a score, a yes/no.

  • Spam filters · fraud detection
  • Credit scoring · medical diagnosis
  • Recommendation engines

Easier to measure. Easier to validate. Can be overconfident.

Generative · the creator

Writes. Draws. Codes.

Learns the distribution of its training data and samples new outputs from it.

  • ChatGPT, Claude · text
  • DALL·E, Midjourney · images
  • GitHub Copilot, Cursor · code

Harder to validate. Can "hallucinate" — produce plausible but false output.
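The two paradigms side by side, as toy code — both "models" below are invented stand-ins (`predict_spam`, `generate_word`), not real systems:

```python
import random

# Predictive AI: outputs a decision — a class, a score, a yes/no.
def predict_spam(score: float) -> bool:
    return score > 0.5

# Generative AI: samples new output from a learned distribution.
# Here the "learned distribution" is a hand-written next-word table.
def generate_word(dist: dict[str, float]) -> str:
    words, weights = zip(*dist.items())
    return random.choices(words, weights=weights)[0]

print(predict_spam(0.8))                                    # → True
print(generate_word({"the": 0.5, "cat": 0.3, "sat": 0.2}))  # varies per run
```

The sampling step is also why generative output varies between runs — and why it can be fluent yet wrong.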

What makes LLMs different 14 / 18
The game changer

One model. Countless tasks.

1T+ params
Frontier LLMs encode billions to trillions of learned patterns — the "weights" of the network.
GPT-4 · Claude · Gemini
Internet-scale
Pre-trained on huge text corpora by predicting the next word, over and over, for billions of steps.
Common Crawl · books · code
General-purpose
Before: narrow task-specific models. Now: one model that translates, summarizes, reasons, codes, explains.
The LLM revolution

2020 GPT-3 · 2022 ChatGPT · 2023 GPT-4 · 2024 Claude 3.5, Llama 3, Gemini 1.5 · 2025 GPT-5, Claude 4, Gemini 2.5

The two limits that shape every answer 15 / 18
Working memory · frozen knowledge

What the model sees and what the model knows.

Context window

Working memory, measured in tokens.

The maximum text the model can "see" at once — your prompt plus its response. A token ≈ 0.75 words.

  • GPT-4 · 128K tokens (~96K words)
  • GPT-5 · 400K tokens
  • Claude 4 · 200K – 1M tokens
  • Gemini 2.5 Pro · 2M tokens

Why it matters: long documents need splitting, early conversation gets forgotten.

Knowledge cutoff

An encyclopedia with a publication date.

The latest date in the model's training data. Everything after is unknown unless the model can search the web.

  • GPT-5 · Oct 2024
  • Claude 4.5 · Apr 2025
  • Gemini 2.5 · Jan 2025
  • Llama 4 · Aug 2024

Why it matters: current events, new laws, new libraries — the model doesn't know them.
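The slide's rule of thumb (1 token ≈ 0.75 words) as back-of-envelope arithmetic — the ratio is an approximation, and real tokenizers vary by language and content:

```python
# Rough token ↔ word conversion using the 0.75 words-per-token rule of thumb.
def tokens_to_words(tokens: int) -> int:
    return round(tokens * 0.75)

def words_to_tokens(words: int) -> int:
    return round(words / 0.75)

print(tokens_to_words(128_000))  # GPT-4's 128K window ≈ 96,000 words
print(words_to_tokens(96_000))   # → 128,000 tokens
```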

The one rule that governs everything 16 / 18
Garbage in, garbage out
AI is only as good as the data it was trained on.
GIGO principle · the oldest law in computing, rediscovered

Biased data produces biased models. Incomplete data produces blind spots. Mislabeled data teaches wrong answers confidently. Amazon's recruiting AI showed bias against women because it learned from ten years of male-dominated hiring. Quality beats quantity — every time.

The true cost of AI 17 / 18
Money · time · carbon

The numbers that don't appear on the landing page.

~$500M
Estimated training cost of a 2025 frontier model (GPT-5 class). GPT-3 cost ~$4.6M in 2020 — a hundredfold increase in five years.
Training · 2020 → 2025
~$100B+
Enterprise AI spend in 2025 — training, inference, and infrastructure combined. Inference at scale is a utility-bill-sized operation.
IDC · Enterprise AI · 2025
280t CO₂
CO₂ emissions of training one large model — roughly the lifetime emissions of five cars. Data centers use ~1% of global electricity.
Environmental · peer-reviewed

Efficient algorithms · renewable power · model compression · edge inference — the next decade's engineering fight.

End of Module 01 18 / 18
// TAKE HOME
Know the words.
Respect the limits.
Judge the output.

AI is not magic — it's data + compute + statistics, scaled to an unprecedented degree. Understand how it learns, and you'll know when to trust it.

Next up · Module 02 — responsibility: bias, hallucinations, and the human in the loop.