AI Foundations · Module 02 · April 2026
Responsible AI.

The human in the loop. When AI makes decisions about loans, hires, and diagnoses, someone still has to be accountable, and that someone is you.

Why this module 02 / 16
The stakes, in one sentence
AI is already deciding who gets hired, who gets a loan, and who gets care.

Technical excellence is necessary but not sufficient. Without a foundation of fairness, accountability, transparency, and privacy, AI systems amplify past discrimination and cause real harm. This module is about the guardrails — what they are, why they exist, and your role in using them.

Part One 03 / 16
01
Part One · Ethical challenges

Two problems that don't fix themselves.

Bias and hallucinations are not bugs in the normal sense. They're properties of how the system learns.

AI bias 04 / 16
The inherited problem

AI reflects the data it was trained on — including the unfairness in it.

Where bias comes from
  • Historical — past discrimination baked into the data
  • Representation — some groups underrepresented in training
  • Measurement — collection methods favor certain groups
  • Aggregation — one-size-fits-all models ignore minority patterns
What it looks like in the world
  • Healthcare — algorithms allocating less care to Black patients
  • Criminal justice — risk scores with racial bias
  • Hiring — resume screening favoring male candidates
  • Credit — lower limits for women (Apple Card, 2019)

Bias is not intentional. That's what makes it dangerous — nobody thinks they have to check.

Hallucinations 05 / 16
When the model gets creative with facts

AI predicts likely words — not true ones.

Why they happen
  • LLMs optimize for plausibility, not truth
  • Training data includes misinformation
  • Systems are pressured to always produce an answer
  • Models can't say "I don't know" by default
Real consequences
  • Lawyers citing non-existent cases in court
  • Students submitting essays with fabricated references
  • Medical misinformation presented confidently
  • Business decisions made on invented data

Prevention: always verify, cross-reference multiple sources, treat AI as a starting point, and prefer tools with citation capability (NotebookLM, Gemini) for anything you'll act on.

Part Two 06 / 16
02
Part Two · Building responsibility

Six principles. One moving target.

Global regulators, standards bodies, and industry coalitions mostly agree on the principles. The hard part is operationalizing them.

The six core principles 07 / 16
Transparency → accountability → human oversight

The ethical foundation of every AI system worth trusting.

01

Transparency

Understand how an AI system reaches its decisions — to the extent possible.

02

Accountability

Clear human ownership for every AI-driven action and its consequences.

03

Fairness

Equal treatment across groups. Audit for bias. Measure outcomes, not intent.

04

Privacy

Minimize data. Protect personal information. Consent, not assumption.

05

Safety

Prevent foreseeable harm — technical, social, psychological.

06

Human in the loop

Keep meaningful human oversight on any decision that affects people's lives.

IEEE · Partnership on AI · UNESCO · EU AI Act (prohibited uses in force Feb 2025 · high-risk systems Aug 2026) — different texts, same backbone.

Guardrails 08 / 16
Keeping AI on track

Layered safety mechanisms — not a single switch.

LAYER 01

Input filters

Block harmful or out-of-scope requests before the model sees them.

LAYER 02

Model-level

Training and fine-tuning shape the model's baseline values and refusals.

LAYER 03

Output filters

Screen generated content before it reaches the user — toxicity, PII, violence.

LAYER 04

Content moderation

Remove inappropriate material at the application layer.

LAYER 05

Safety classifiers

Specialized detectors for self-harm, extremism, CSAM, credential leaks.

LAYER 06

Human review

Escalation paths for edge cases. The final fallback that can't be bypassed.

The challenge: balance safety with usefulness. Over-restrictive guardrails block legitimate work; under-restrictive ones create real harm.
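To make the layering concrete, here is a minimal Python sketch of how an input filter (Layer 01) and an output filter (Layer 03) can wrap a model call, with human review as the fallback. The blocked phrases, the PII pattern, and every function name are illustrative assumptions, not any vendor's actual guardrail API.

```python
import re

# Illustrative only: the patterns and function names below are assumptions for this sketch.
BLOCKED_INPUT = re.compile(r"\b(make a weapon|steal credentials)\b", re.IGNORECASE)
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # e.g. a US-style SSN

def input_filter(prompt: str) -> bool:
    """Layer 01: block harmful or out-of-scope requests before the model sees them."""
    return not BLOCKED_INPUT.search(prompt)

def output_filter(text: str) -> str | None:
    """Layer 03: screen generated content; None means 'do not show this to the user'."""
    return None if PII_PATTERN.search(text) else text

def answer(prompt: str, model_call) -> str:
    """Run a request through the layered pipeline around the model call (Layer 02)."""
    if not input_filter(prompt):
        return "Request declined by input policy."
    screened = output_filter(model_call(prompt))
    if screened is None:
        return "Response withheld and escalated for human review."  # Layer 06 fallback
    return screened

# Usage with a stand-in model; a real system would call an LLM here.
print(answer("What is the capital of France?", model_call=lambda p: "Paris."))
```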

Part Three 09 / 16
03
Part Three · Learning from failure

When safeguards weren't there.

Two high-profile failures, each with a lesson that shaped the industry's current defaults.

Case · Amazon recruiting AI 10 / 16
Amazon · 2014 → 2018 · Hiring · HR automation
10 years
of historical resumes used to train a recruiting model — and the model learned the bias in them.
What happened: the system downgraded resumes containing the word "women's" and penalized graduates of all-women colleges. It learned that, historically, male candidates were preferred, and treated that history as a rule.
The response: Amazon scrapped the system entirely; it could not guarantee the bias could be removed.
The lesson: training on biased history automates the bias. Always audit outcomes, not just accuracy.
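The "audit outcomes" advice can be made concrete in a few lines of Python. This is a minimal sketch, not Amazon's method: it compares selection rates across groups instead of overall accuracy, using invented records and group labels.

```python
from collections import defaultdict

def selection_rates(records):
    """records: iterable of (group, selected) pairs; returns the selection rate per group."""
    totals, chosen = defaultdict(int), defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        chosen[group] += int(selected)
    return {group: chosen[group] / totals[group] for group in totals}

# Invented audit data for illustration only.
rates = selection_rates([
    ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False),
])
gap = max(rates.values()) - min(rates.values())
print(rates, f"selection-rate gap: {gap:.2f}")
# A large gap is a signal to investigate, not proof of intent -- which is exactly
# why outcome audits matter even when nobody meant to discriminate.
```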
Case · Microsoft Tay 11 / 16
Microsoft · 2016 · Conversational AI · Twitter
24 hours
from friendly launch to forced shutdown, after coordinated trolls fed the bot racist and extremist content.
What happened: Tay was designed to learn from Twitter conversations. Trolls exploited the "repeat after me" behavior and the absence of content filters; within hours the bot was posting deeply offensive content, and Microsoft took it offline.
The lesson: any system that learns from public input and acts in public needs adversarial testing, content filters, and rate limits. Assume users will try to break it, because some will.
Part Four 12 / 16
04
Part Four · Transparency & privacy

The black box and the data inside it.

Two practical problems that show up on the job: unexplainable decisions, and the temptation to feed AI things it shouldn't see.

The black box problem 13 / 16
Why "because the model said so" doesn't cut it

Modern models can't always explain themselves — but the affected person still deserves an explanation.

Why decisions become opaque
  • Millions of calculations per inference
  • No single rule explains any answer
  • Even developers can't trace the logic
  • Patterns are learned in spaces humans can't see
Where it hurts
  • Credit — "denied" with no explanation
  • Medical — AI flags a scan, doctor doesn't know what it saw
  • Hiring — "not selected", criteria unclear
  • Moderation — posts removed without reason

What regulators now require: the right to an explanation, the right to appeal, and evidence the system was audited for fairness. This is why explainability is a skill — not a nice-to-have.
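As one hedged illustration of what "explainable" can mean in practice, the Python sketch below scores a hypothetical credit applicant and returns the factors that pushed the decision, ranked by influence. The features, weights, and threshold are invented; real models need dedicated explainability tooling, but the shape of the output (a decision plus ranked reasons) is what appeals and audits require.

```python
# Invented toy model: a handful of weighted features, purely for illustration.
WEIGHTS = {"income": 0.4, "debt_ratio": -0.9, "years_employed": 0.3}
THRESHOLD = 0.5

def score_with_reasons(applicant: dict) -> tuple[bool, list[str]]:
    """Return the decision plus each feature's contribution, strongest first."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    approved = sum(contributions.values()) >= THRESHOLD
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    reasons = [f"{name}: {value:+.2f}" for name, value in ranked]
    return approved, reasons

approved, reasons = score_with_reasons(
    {"income": 1.2, "debt_ratio": 0.8, "years_employed": 0.5}
)
print("approved" if approved else "denied", reasons)
# Instead of "denied with no explanation", the applicant can be told which
# factors mattered most -- and a reviewer has something concrete to appeal.
```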

What you can — and can't — put into AI 14 / 16
Data privacy in practice

When you paste it into ChatGPT, you're sharing it with more than ChatGPT.

Data type                         | Public AI (ChatGPT, Gemini) | Enterprise / approved AI tool
Public information                | Allowed                     | Allowed
Internal non-sensitive            | Strip identifying details   | Allowed
Client data                       | Never                       | With approval
Personal employee data            | Never                       | With approval
Passwords, API keys, credentials  | Never                       | Never

Pseudonymize by default. "Private mode" is a UI affordance — not a data-protection guarantee.
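One way to practice "pseudonymize by default" before pasting anything into a public tool, sketched in Python. The alias scheme and regular expressions are illustrative assumptions; use whatever anonymization tooling your organization has approved, and keep the mapping on your side.

```python
import re

def pseudonymize(text: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace known names with aliases and redact email addresses before sharing."""
    mapping = {}
    for i, name in enumerate(names, start=1):
        alias = f"PERSON_{i}"
        mapping[alias] = name            # keep this mapping local, never in the prompt
        text = re.sub(re.escape(name), alias, text)
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "EMAIL_REDACTED", text)
    return text, mapping

prompt, mapping = pseudonymize(
    "Summarize the complaint from Jane Roe (jane.roe@example.com) about invoice 4411.",
    names=["Jane Roe"],
)
print(prompt)  # safe to send; swap the real names back in locally using `mapping`
```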

The short list 15 / 16
Keep these six rules within reach
Verify every fact, every citation, every number before you publish.
Minimize the data. Only share what the task actually needs.
Anonymize names, IDs, and specifics when you ask for help.
Consent for photos, recordings, and personal stories. Always.
Never share passwords, keys, client data, or proprietary code with public tools.
Be transparent when AI meaningfully shaped your work.
End of Module 02 16 / 16
// TAKE HOME
Verify.
Respect.
Own the decision.

AI can draft, suggest, and accelerate — but accountability doesn't transfer. The human in the loop is not a compliance checkbox. It's the job.

Next up · Module 03 — the tools of the trade. Module 04 — deepfakes, your rights, and AI at work.