Module 3: Neural Networks – Teaching Computers to Think
Duration: Week 5-6
Difficulty: Intermediate
Prerequisites: Modules 1-2 completed
—
What You’ll Learn
Your brain has billions of neurons that work together to help you think. We’ll build artificial neurons that work together to help computers “think”!
The Big Picture:
Human Brain → Artificial Neural Network
Neurons (cells) → Nodes (math functions)
Connections → Weights (numbers)
Learning → Adjusting weights
—
Core Concepts
1. The Perceptron – A Single Artificial Neuron
Think of it like a decision-maker:
- Inputs: Gets information (Is it sunny? Is it the weekend?)
- Weights: How important is each input?
- Output: Makes a decision (go to beach: yes/no)
Math (simple!):
def perceptron(inputs, weights, threshold):
    # 1. Multiply each input by its weight
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    # 2. If sum > threshold → output 1, else → output 0
    return 1 if weighted_sum > threshold else 0
You’ll code this in 10 lines of Python!
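To make this concrete, here’s a quick usage sketch that reuses the beach decision from above; the inputs, weights, and threshold are illustrative assumptions, not prescribed values:
# Illustrative values: inputs are [sunny, weekend] as 1/0 flags
inputs = [1, 1]          # it is sunny AND it is the weekend
weights = [0.6, 0.4]     # sunshine matters a bit more than the day of the week
threshold = 0.5

decision = perceptron(inputs, weights, threshold)
print("Go to the beach!" if decision == 1 else "Stay home.")  # -> Go to the beach!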
—
2. Activation Functions – Adding Flexibility
Problem: A single perceptron can only draw straight lines (linear decision boundaries)
Solution: Activation functions add curves!
Types:
Sigmoid: Smooth S-curve (0 to 1)
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

ReLU: Simple but powerful
def relu(x):
    return max(0, x)  # If negative → 0, else → keep it the same

Tanh: Like sigmoid, but ranges from -1 to 1
def tanh(x):
    return np.tanh(x)
Visual: We’ll plot these so you SEE the difference!
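If you want to see those curves right away, here’s a minimal plotting sketch (assuming numpy and matplotlib are installed; the range -5 to 5 is just a convenient choice):
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 200)

plt.plot(x, 1 / (1 + np.exp(-x)), label="sigmoid")
plt.plot(x, np.maximum(0, x), label="relu")
plt.plot(x, np.tanh(x), label="tanh")
plt.axhline(0, color="gray", linewidth=0.5)
plt.legend()
plt.title("Activation functions")
plt.show()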
—
3. Multi-Layer Networks – Stacking Neurons
One neuron is like one student. A network is like a whole classroom working together!
Structure:
Input Layer → Hidden Layer(s) → Output Layer
[Data] → [Processing] → [Answer]
Example: Recognizing handwritten digits
- Input: 784 pixels (28×28 image)
- Hidden: 128 neurons (finding patterns)
- Output: 10 neurons (one for each digit 0-9)
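As a quick shape check (a sketch, not the full classifier), here’s how the 784 → 128 → 10 structure looks as plain NumPy matrix multiplications; the random weights are placeholders:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.random((1, 784))               # one flattened 28x28 image
W1 = rng.standard_normal((784, 128))   # input -> hidden weights
W2 = rng.standard_normal((128, 10))    # hidden -> output weights

hidden = sigmoid(x @ W1)               # shape (1, 128)
output = sigmoid(hidden @ W2)          # shape (1, 10), one score per digit
print(hidden.shape, output.shape)      # (1, 128) (1, 10)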
—
4. Backpropagation – How Networks Learn
This is the “magic” that makes neural networks work!
Simple explanation:
1. Make a prediction (forward pass)
2. See how wrong you were (calculate error)
3. Figure out which weights caused the error (backward pass)
4. Adjust those weights a tiny bit (learning!)
5. Repeat 1000s of times
Analogy: Like practicing free throws in basketball
- Shoot (prediction)
- Miss (error)
- Adjust your form (update weights)
- Try again (next iteration)
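Here’s the five-step loop above as a tiny numeric sketch: one weight, one input, plain gradient descent. The data point, starting weight, and learning rate are illustrative assumptions:
# Learn w so that the prediction w * x matches the target y
x, y = 2.0, 8.0          # assumed toy data point (the true w would be 4)
w = 0.0                  # start with a bad guess
learning_rate = 0.05

for step in range(100):
    prediction = w * x                # 1. forward pass
    error = prediction - y            # 2. how wrong were we?
    gradient = 2 * error * x          # 3. backward pass: d(error^2)/dw
    w -= learning_rate * gradient     # 4. adjust the weight a tiny bit
                                      # 5. repeat!

print(round(w, 3))  # close to 4.0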
—
Hands-on Projects
Project 1: Build a Neural Network Library from Scratch
Goal: Create your own mini-PyTorch!
Components:
class Layer:
    def forward(self, inputs):
        """Compute outputs"""
        pass

    def backward(self, grad):
        """Compute gradients"""
        pass


class Network:
    def __init__(self):
        self.layers = []

    def add(self, layer):
        self.layers.append(layer)

    def train(self, X, y, epochs=100):
        for epoch in range(epochs):
            # Forward pass
            output = self.forward(X)
            # Calculate loss
            loss = self.loss_function(output, y)
            # Backward pass
            self.backward(loss)
            # Update weights
            self.update_weights()
Lines of code: ~200 (you’ll understand every single one!)
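As a hedged preview of what some of those lines might look like, here’s a sketch of a fully connected (dense) layer that fits the Layer interface above; the gradient formulas assume a plain linear layer y = xW + b:
import numpy as np

class Dense(Layer):
    def __init__(self, n_inputs, n_outputs):
        self.W = np.random.randn(n_inputs, n_outputs) * 0.01
        self.b = np.zeros(n_outputs)

    def forward(self, inputs):
        self.inputs = inputs              # remember for the backward pass
        return inputs @ self.W + self.b

    def backward(self, grad):
        # Gradients for the weights, the bias, and the layer's input
        self.dW = self.inputs.T @ grad
        self.db = grad.sum(axis=0)
        return grad @ self.W.T            # passed on to the previous layer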
—
Project 2: MNIST Digit Classifier
Dataset: MNIST – 60,000 training images of handwritten digits (plus 10,000 for testing)
Your network: Input(784) → Hidden(128) → Output(10)
Goal: 95%+ accuracy
Time to train: 5-10 minutes on your laptop!
Cool demo: Draw digits in Paint and your AI recognizes them!
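If you want to cross-check your from-scratch results, here’s a minimal PyTorch sketch of the same 784 → 128 → 10 network (assuming torch and torchvision are installed; the hyperparameters are reasonable defaults, not values fixed by this module):
import torch
from torch import nn
from torchvision import datasets, transforms

train_data = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Flatten(),
                      nn.Linear(784, 128), nn.ReLU(),
                      nn.Linear(128, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for images, labels in loader:      # one pass over the data already gives high accuracy
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()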
—
Project 3: Automatic Differentiation (Autograd)
Build the tool that PyTorch uses internally!
What it does: Automatically calculates derivatives
Why it’s cool: Never calculate gradients by hand again!
Concept:
You write:
y = x**2 + 3*x + 5

Autograd automatically knows:
dy/dx = 2*x + 3
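You can verify that claim with PyTorch’s built-in autograd before building your own (a quick sketch; the value x = 2.0 is arbitrary):
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 3*x + 5
y.backward()      # autograd applies the chain rule for us
print(x.grad)     # tensor(7.), which is 2*x + 3 at x = 2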
—
The Math You’ll Master
Chain Rule (The heart of backpropagation)
If y = f(g(x)), then:
dy/dx = (dy/dg) × (dg/dx)

In English: To find how the output changes with the input, multiply the changes at each step!
Example:
y = (x + 2)²

Let g(x) = x + 2
Then y = g²
dy/dg = 2g = 2(x + 2)
dg/dx = 1
dy/dx = 2(x + 2) × 1 = 2(x + 2)
We’ll use visual diagrams and step-by-step examples!
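A numeric sanity check of that result (a sketch using a finite-difference approximation; the point x = 3 and the step size h are arbitrary choices):
def f(x):
    return (x + 2) ** 2

x, h = 3.0, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)   # central difference
exact = 2 * (x + 2)                          # from the chain rule above
print(round(numeric, 4), exact)              # both are 10.0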
—
Optimization Algorithms
1. SGD (Stochastic Gradient Descent)
- Basic: Move opposite to gradient
- Like: Walking downhill to find the valley
2. Momentum
- Improvement: Remember previous direction
- Like: Rolling a ball downhill (builds speed)
3. Adam (Most popular!)
- Smart: Adapts learning rate for each weight
- Like: GPS that adjusts your route in real-time
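The update rules behind those three optimizers, written as a short sketch; the learning rates, momentum factor, and Adam constants below are the usual textbook defaults, not values fixed by this module:
import numpy as np

def sgd_step(w, grad, lr=0.01):
    return w - lr * grad                      # move opposite to the gradient

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    velocity = beta * velocity + grad         # remember the previous direction
    return w - lr * velocity, velocity

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # t is the step count, starting at 1
    m = b1 * m + (1 - b1) * grad              # running mean of gradients
    v = b2 * v + (1 - b2) * grad**2           # running mean of squared gradients
    m_hat = m / (1 - b1**t)                   # bias correction
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v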
—
ISL Optimization – Training on 16GB RAM
Problem: Big networks use lots of memory
Solutions you’ll implement:
1. Gradient Checkpointing
- Don’t save all intermediate values
- Recalculate when needed
- Result: roughly 10x less memory at about 30% extra compute time (worth it!); see the checkpointing sketch after this list
2. Mixed Precision Training
- Use 16-bit numbers instead of 32-bit
- Result: 2x faster, 2x less memory
- Accuracy: Almost the same!
3. Smart Batch Sizing
import torch

def find_max_batch_size(model, input_shape):
    batch_size = 1
    while True:
        try:
            # Try to process a batch of this size
            test_input = torch.randn(batch_size, *input_shape)
            model(test_input)
            batch_size *= 2              # It worked, so try a batch twice as large
        except RuntimeError:             # Out of memory!
            return batch_size // 2
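Here’s the gradient-checkpointing idea from item 1 as a hedged PyTorch sketch (assuming a reasonably recent PyTorch; torch.utils.checkpoint recomputes the wrapped block’s activations during the backward pass instead of storing them):
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

block1 = nn.Sequential(nn.Linear(784, 512), nn.ReLU())
block2 = nn.Sequential(nn.Linear(512, 10))

def forward_with_checkpointing(x):
    # Activations inside block1 are not saved; they are recomputed on backward
    h = checkpoint(block1, x, use_reentrant=False)
    return block2(h)

x = torch.randn(64, 784)
loss = forward_with_checkpointing(x).sum()
loss.backward()   # works as usual, just with less saved memory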
—
Resources
Videos (MUST WATCH!)
Reading
Datasets
- MNIST: Handwritten digits
- Fashion-MNIST: Clothing items
- Your own handwritten digits!
—
Learning Checklist
- [ ] Implement perceptron from scratch
- [ ] Understand activation functions
- [ ] Build multi-layer neural network
- [ ] Implement backpropagation
- [ ] Train MNIST classifier (95%+ accuracy)
- [ ] Create automatic differentiation system
- [ ] Apply gradient checkpointing
- [ ] Use mixed precision training
—
Next Steps
Module 4: Advanced AI Architectures
Learn CNNs for images, RNNs for sequences, and Transformers with attention!
References & Further Reading
Dive deeper with these carefully selected resources:
- Neural Networks and Deep Learning, by Michael Nielsen
- Backpropagation Explained, by 3Blue1Brown
- PyTorch Tutorials, by the PyTorch Team
Related Topics
- Understanding Backpropagation Step by Step
- Activation Functions: ReLU vs Sigmoid vs Tanh
- Gradient Descent Optimization