AI Foundations · Module 01 · April 2026
Demystifying AI.

How we got here, what the words actually mean, and why this decade is different from every AI hype cycle before it.

What we'll cover 02 / 18
Agenda

Four questions. One working mental model.

01

Where did AI come from?

Seventy years of waves — and why this one didn't crash.

02

What are ML, DL, GenAI?

The family tree, the recipe, and the cake.

03

How do modern LLMs work?

Training vs. inference. Context windows. Cutoffs.

04

What does it actually cost?

Money, time, and carbon — the numbers nobody puts on the landing page.

The journey of AI 03 / 18
1950 → today

Three eras of artificial intelligence.

1950s – 1980s · Symbolic

Hand-coded rules

Experts wrote logic by hand. ELIZA, early chess engines. Couldn't handle messy reality.

1990s – 2010s · Statistical ML

Learning from data

Spam filters, recommendation engines. Needed huge labeled datasets to work well.

2012 → today · Deep learning

Neural networks at scale

Many-layer networks, massive datasets, GPU acceleration. AlphaGo, ChatGPT, image generators.

1956 Dartmouth · 1997 Deep Blue · 2016 AlphaGo · 2022 ChatGPT

Why this wave didn't crash 04 / 18
The perfect storm

Three forces arrived in the same decade.

Data
The internet produced more text, images, and code than any prior civilization — a training set that didn't exist before 2010.
Internet · Web · Social
GPUs
Graphics cards built for gaming and crypto turned out to be ideal for training neural networks.
NVIDIA · CUDA
200×
NVIDIA's market value rose more than 200-fold over the last decade — the symbol of the hardware boom powering modern AI.
Market cap · 2014 → today

None of these on their own would have been enough. All three together changed what was possible.

Part One 05 / 18
01
Part One · The words

AI, ML, DL, GenAI — what's what.

These terms get used interchangeably. They mean different things. Knowing which is which is the first unlock.

Three levels of AI 06 / 18
Narrow → General → Super

Today's AI is narrow. The rest is aspiration.

ANI · TODAY

Narrow

Good at one thing.
Excellent at specific tasks. Cannot transfer knowledge across domains.
Examples
Siri · Google Translate · Netflix recommendations · every LLM you use today.
AGI · THE GOAL

General

Good at everything humans are.
Human-level intelligence across any domain. Can learn and adapt flexibly.
Status
Does not exist. Active target of OpenAI, Anthropic, DeepMind, others.
ASI · SPECULATION

Super

Beyond human ability.
Surpasses humans in all areas. Purely theoretical.
Status
Science fiction territory. Subject of debate — not engineering yet.
The family tree 07 / 18
Russian nesting dolls

Each term lives inside the one before it.

Artificial Intelligence
Machines simulating human intelligence
Machine Learning
Learns from data, not rules
Deep Learning
Multi-layer neural networks
GenAI · LLMs
Creates new content

ChatGPT is a GenAI system, which is a kind of Deep Learning, which is a kind of Machine Learning, which is a kind of AI. All four statements are correct.

Rules vs. patterns 08 / 18
A different kind of software

The fundamental shift: from coding rules to learning from data.

Traditional software

You write the rules.

IF temperature > 30°C THEN turn on cooling ELSE turn off cooling
  • Deterministic · same input → same output
  • Logic is explicit and auditable
  • Update by changing code
  • Test by checking every path
ML software

The system learns the rules.

Learn from 1M examples of temperature/comfort data → predict optimal settings
  • Probabilistic · same input may produce different output
  • Logic is learned, not written
  • Update by retraining with new data
  • Test by statistical validation
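The contrast in miniature, in Python. The threshold-fitting loop below is a toy stand-in for real training — the data and the `learn_threshold` helper are illustrative, not a real product:

```python
# Traditional software: a person writes the rule.
def cooling_rule(temp_c: float) -> bool:
    return temp_c > 30.0

# ML-style software: the "rule" (here, a single threshold) is learned
# from labeled examples of (temperature, wants_cooling).
def learn_threshold(examples: list[tuple[float, bool]]) -> float:
    best_t, best_acc = examples[0][0], 0.0
    for t in sorted(temp for temp, _ in examples):
        acc = sum((temp > t) == label for temp, label in examples) / len(examples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

data = [(22.0, False), (26.0, False), (29.0, True), (33.0, True), (35.0, True)]
t = learn_threshold(data)
print(cooling_rule(31.0))  # rule written by a person → True
print(31.0 > t)            # rule recovered from data  → True
```

Same behavior, very different origin: one rule is auditable in the source code, the other lives in a learned parameter and changes if you retrain on new data.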
Three kinds of machine learning 09 / 18
How does it learn?

Three answers, three kinds of ML.

KIND 01

Supervised

Flashcards with answers on the back.
Learn from labeled examples. Each training example has an input and the correct output.
In your life
Spam filters · image classification · credit scoring.
KIND 02

Unsupervised

Sorting coins without knowing their values.
Find patterns in unlabeled data. The system groups, clusters, compresses.
In your life
Customer segmentation · anomaly detection · topic clustering.
KIND 03

Reinforcement

Training a dog with treats.
Learn through trial, error, and reward signals over time.
In your life
AlphaGo · game-playing agents · robotics · RLHF for LLMs.
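The "flashcards" metaphor for supervised learning, as a minimal sketch — a 1-nearest-neighbour classifier over a single made-up feature (the numbers and labels are invented for illustration):

```python
# Supervised learning in miniature: memorise labeled examples,
# answer a new input with the label of the closest known one.
def nearest_neighbor(train: list[tuple[float, str]], x: float) -> str:
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# Each "flashcard" pairs an input (a feature score) with the correct answer.
flashcards = [(1.0, "spam"), (2.0, "spam"), (8.0, "ham"), (9.0, "ham")]
print(nearest_neighbor(flashcards, 1.5))  # → "spam"
print(nearest_neighbor(flashcards, 8.5))  # → "ham"
```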
Algorithm vs. model 10 / 18
The recipe and the cake
Algorithm is the recipe.
Data is the ingredients.
Model is the cake you baked.

An algorithm tells a computer how to solve a problem. A model is what you get after running the algorithm on real data. Two teams with the same algorithm and different data produce different models.
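The recipe/cake split, reduced to the simplest possible "algorithm" (averaging — deliberately trivial, for illustration only):

```python
# The algorithm is the recipe: how to turn data into a model.
def fit_mean(data: list[float]) -> float:
    return sum(data) / len(data)

# Same recipe, different ingredients → different cakes.
team_a_model = fit_mean([10, 20, 30])  # → 20.0
team_b_model = fit_mean([100, 200])    # → 150.0
```

Both teams ran identical code; their models disagree because their data did.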

Part Two 11 / 18
02
Part Two · How LLMs work

Training, inference, and the working memory.

The three ideas that explain most of what you experience when you use ChatGPT, Claude, or Gemini.

Training vs. inference 12 / 18
Learning vs. performing

Two phases. Very different economics.

Training · learning

A student studying to become a doctor.

  • Months of computation
  • Millions of dollars
  • Terabytes of data
  • Thousands of GPUs
  • Happens once, or periodically
Inference · performing

The doctor diagnosing a patient.

  • Milliseconds to seconds per request
  • Relatively cheap per call
  • Happens every time you prompt
  • What you actually pay for as a user
  • Model is frozen — not learning from you

Common misconception: "The AI is learning from me right now." No — it's applying patterns it already learned.
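The two phases as a sketch — a least-squares fit stands in for real training, which is vastly more expensive; the point is that the weight is computed once and then frozen:

```python
# Training: expensive, happens once. Produces frozen weights.
def train(pairs: list[tuple[float, float]]) -> float:
    # Least-squares slope for y = w * x — a toy stand-in for training.
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den

# Inference: cheap, happens on every request. Weights never change.
def infer(w: float, x: float) -> float:
    return w * x

w = train([(1, 2), (2, 4), (3, 6)])  # learn once → w == 2.0
print(infer(w, 10))                  # answer many times; w is not updated
```

Note that `infer` never writes to `w` — exactly why the model is not "learning from you right now."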

Two paradigms of AI 13 / 18
Analyzer vs. creator

Predictive AI decides. Generative AI creates.

Predictive · the analyzer

Sorts. Scores. Flags.

Learns decision boundaries between categories. Outputs a class, a score, a yes/no.

  • Spam filters · fraud detection
  • Credit scoring · medical diagnosis
  • Recommendation engines

Easier to measure. Easier to validate. Can be overconfident.

Generative · the creator

Writes. Draws. Codes.

Learns the distribution of its training data and samples new outputs from it.

  • ChatGPT, Claude · text
  • DALL·E, Midjourney · images
  • GitHub Copilot, Cursor · code

Harder to validate. Can "hallucinate" — produce plausible but false output.
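The two paradigms side by side, as toy code — both "models" below are invented stand-ins (`predict_spam`, `generate_word`), not real systems:

```python
import random

# Predictive AI: outputs a decision — a class, a score, a yes/no.
def predict_spam(score: float) -> bool:
    return score > 0.5

# Generative AI: samples new output from a learned distribution.
# Here the "learned distribution" is a hand-written next-word table.
def generate_word(dist: dict[str, float]) -> str:
    words, weights = zip(*dist.items())
    return random.choices(words, weights=weights)[0]

print(predict_spam(0.8))                                    # → True
print(generate_word({"the": 0.5, "cat": 0.3, "sat": 0.2}))  # varies per run
```

The sampling step is also why generative output varies between runs — and why it can be fluent yet wrong.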

What makes LLMs different 14 / 18
The game changer

One model. Countless tasks.

1T+ params
Frontier LLMs encode billions to trillions of learned patterns — the "weights" of the network.
GPT-4 · Claude · Gemini
Internet-scale
Pre-trained on huge text corpora by predicting the next word, over and over, for billions of steps.
Common Crawl · books · code
General-purpose
Before: narrow task-specific models. Now: one model that translates, summarizes, reasons, codes, explains.
The LLM revolution

2020 GPT-3 · 2022 ChatGPT · 2023 GPT-4 · 2024 Claude 3.5, Llama 3, Gemini 1.5 · 2025 GPT-5, Claude 4, Gemini 2.5

The two limits that shape every answer 15 / 18
Working memory · frozen knowledge

What the model sees and what the model knows.

Context window

Working memory, measured in tokens.

The maximum text the model can "see" at once — your prompt plus its response. A token ≈ 0.75 words.

  • GPT-4 · 128K tokens (~96K words)
  • GPT-5 · 400K tokens
  • Claude 4 · 200K – 1M tokens
  • Gemini 2.5 Pro · 2M tokens

Why it matters: long documents need splitting, early conversation gets forgotten.

Knowledge cutoff

An encyclopedia with a publication date.

The latest date in the model's training data. Everything after is unknown unless the model can search the web.

  • GPT-5 · Oct 2024
  • Claude 4.5 · Apr 2025
  • Gemini 2.5 · Jan 2025
  • Llama 4 · Aug 2024

Why it matters: current events, new laws, new libraries — the model doesn't know them.
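The slide's rule of thumb (1 token ≈ 0.75 words) as back-of-envelope arithmetic — the ratio is an approximation, and real tokenizers vary by language and content:

```python
# Rough token ↔ word conversion using the 0.75 words-per-token rule of thumb.
def tokens_to_words(tokens: int) -> int:
    return round(tokens * 0.75)

def words_to_tokens(words: int) -> int:
    return round(words / 0.75)

print(tokens_to_words(128_000))  # GPT-4's 128K window ≈ 96,000 words
print(words_to_tokens(96_000))   # → 128,000 tokens
```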

The one rule that governs everything 16 / 18
Garbage in, garbage out
AI is only as good as the data it was trained on.
GIGO principle · the oldest law in computing, rediscovered

Biased data produces biased models. Incomplete data produces blind spots. Mislabeled data teaches wrong answers confidently. Amazon's recruiting AI showed bias against women because it learned from ten years of male-dominated hiring. Quality beats quantity — every time.

The true cost of AI 17 / 18
Money · time · carbon

The numbers that don't appear on the landing page.

~$500M
Estimated training cost of a 2025 frontier model (GPT-5 class). GPT-3 cost ~$4.6M in 2020 — a hundredfold increase in five years.
Training · 2020 → 2025
~$100B+
Enterprise AI spend in 2025 — training, inference, and infrastructure combined. Inference at scale is a utility-bill-sized operation.
IDC · Enterprise AI · 2025
280t CO₂
CO₂ emissions of training one large model — roughly the lifetime emissions of five cars. Data centers use ~1% of global electricity.
Environmental · peer-reviewed

Efficient algorithms · renewable power · model compression · edge inference — the next decade's engineering fight.

End of Module 01 18 / 18
// TAKE HOME
Know the words.
Respect the limits.
Judge the output.

AI is not magic — it's data + compute + statistics, scaled to an unprecedented degree. Understand how it learns, and you'll know when to trust it.

Next up · Module 02 — responsibility: bias, hallucinations, and the human in the loop.