Module 3: Neural Networks – Teaching Computers to Think

Duration: Weeks 5-6
Difficulty: Intermediate
Prerequisites: Modules 1-2 completed

🎯 What You’ll Learn

Your brain has billions of neurons that work together to help you think. We’ll build artificial neurons that work together to help computers “think”!

The Big Picture:

Human Brain          →    Artificial Neural Network
Neurons (cells)      →    Nodes (math functions)
Connections          →    Weights (numbers)
Learning             →    Adjusting weights

🧠 Core Concepts

1. The Perceptron – A Single Artificial Neuron

Think of it like a decision-maker:

  • Inputs: Gets information (is it sunny? is it weekend?)
  • Weights: How important is each input?
  • Output: Makes a decision (go to beach: yes/no)

Math (simple!):

def perceptron(inputs, weights, threshold):
    # 1. Multiply each input by its weight
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    
    # 2. If sum > threshold → output 1, else → output 0
    return 1 if weighted_sum > threshold else 0

You’ll code this in 10 lines of Python!
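
For example, here's a quick use of the perceptron function above for the beach decision (the weights and threshold are made-up illustration values):

# Uses the perceptron defined above; numbers are purely illustrative
inputs = [1, 1]                  # it's sunny AND it's the weekend
weights = [0.6, 0.4]             # sunshine matters a bit more to us
threshold = 0.5
print(perceptron(inputs, weights, threshold))   # 1, so: go to the beach!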

2. Activation Functions – Adding Flexibility

Problem: A perceptron can only draw straight lines (it is a linear classifier)
Solution: Activation functions add curves!

Types:

Sigmoid: Smooth S-curve (0 to 1)

import numpy as np  # needed for np.exp and np.tanh below
def sigmoid(x): return 1 / (1 + np.exp(-x))

ReLU: Simple but powerful

def relu(x): return max(0, x)  # If negative → 0, else → keep the same (use np.maximum for arrays)

Tanh: Like sigmoid but (-1 to 1)

def tanh(x): return np.tanh(x)

Visual: We’ll plot these so you SEE the difference!
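
One way to do that, as a quick sketch (assuming numpy and matplotlib are installed):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 200)
plt.plot(x, 1 / (1 + np.exp(-x)), label="sigmoid")   # smooth S-curve between 0 and 1
plt.plot(x, np.maximum(0, x), label="ReLU")          # 0 for negatives, unchanged otherwise
plt.plot(x, np.tanh(x), label="tanh")                # S-curve between -1 and 1
plt.legend()
plt.title("Activation functions")
plt.show()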

3. Multi-Layer Networks – Stacking Neurons

One neuron is like one student. A network is like a whole classroom working together!

Structure:

Input Layer → Hidden Layer(s) → Output Layer
  [Data]    →  [Processing]   →  [Answer]

Example: Recognizing handwritten digits

  • Input: 784 pixels (28×28 image)
  • Hidden: 128 neurons (finding patterns)
  • Output: 10 neurons (one for each digit 0-9)
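
To make the shapes concrete, here is a minimal sketch of one forward pass through that 784 → 128 → 10 network (random weights, numpy assumed, no training yet):

import numpy as np

x = np.random.rand(784)                  # input: a flattened 28x28 image
W1 = np.random.randn(128, 784) * 0.01    # weights into the hidden layer
W2 = np.random.randn(10, 128) * 0.01     # weights into the output layer

hidden = np.maximum(0, W1 @ x)           # hidden layer with ReLU
output = W2 @ hidden                     # 10 scores, one per digit
print(output.shape)                      # (10,)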

4. Backpropagation – How Networks Learn

This is the “magic” that makes neural networks work!

Simple explanation:

1. Make a prediction (forward pass)
2. See how wrong you were (calculate error)
3. Figure out which weights caused the error (backward pass)
4. Adjust those weights a tiny bit (learning!)
5. Repeat 1000s of times

Analogy: Like practicing free throws in basketball

  • Shoot (prediction)
  • Miss (error)
  • Adjust your form (update weights)
  • Try again (next iteration)
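
To make those five steps concrete, here is a toy sketch that learns a single weight (all numbers are made up for illustration):

w = 0.0                              # the weight we're learning
learning_rate = 0.1
for step in range(100):              # 5. repeat many times
    prediction = w * 2.0             # 1. forward pass (input = 2.0)
    error = prediction - 10.0        # 2. how wrong were we? (target = 10.0)
    gradient = 2 * error * 2.0       # 3. backward pass: d(error**2)/dw via the chain rule
    w -= learning_rate * gradient    # 4. adjust the weight a tiny bit
print(w)                             # close to 5.0, so the prediction is close to 10.0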

πŸ› οΈ Hands-on Projects

Project 1: Build a Neural Network Library from Scratch

Goal: Create your own mini-PyTorch!

Components:

class Layer:
    def forward(self, inputs):
        """Compute outputs"""
        pass
    
    def backward(self, grad):
        """Compute gradients"""
        pass

class Network:
    def __init__(self):
        self.layers = []

    def add(self, layer):
        self.layers.append(layer)

    def train(self, X, y, epochs=100):
        for epoch in range(epochs):
            # Forward pass
            output = self.forward(X)
            # Calculate loss
            loss = self.loss_function(output, y)
            # Backward pass
            self.backward(loss)
            # Update weights
            self.update_weights()

Lines of code: ~200 (you’ll understand every single one!)

Project 2: MNIST Digit Classifier

Dataset: 60,000 handwritten digits
Your network: Input(784) → Hidden(128) → Output(10)
Goal: 95%+ accuracy
Time to train: 5-10 minutes on your laptop!

Cool demo: Draw digits in Paint, your AI recognizes them!
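
As a sketch of the first step, loading the data might look like this (assuming torchvision is installed):

from torchvision import datasets, transforms

train_data = datasets.MNIST(root="data", train=True, download=True,
                            transform=transforms.ToTensor())
print(len(train_data))        # 60000 training images
image, label = train_data[0]
print(image.shape, label)     # torch.Size([1, 28, 28]) and the digit it shows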

Project 3: Automatic Differentiation (Autograd)

Build the tool that PyTorch uses internally!

What it does: Automatically calculates derivatives
Why it’s cool: Never calculate gradients by hand again!

Concept:

You write:

y = x**2 + 3*x + 5

Autograd automatically knows:

dy/dx = 2*x + 3
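
You can check the idea against PyTorch's built-in autograd, which this project mimics in miniature:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 3*x + 5
y.backward()                  # autograd computes dy/dx for us
print(x.grad)                 # tensor(7.), which matches 2*x + 3 at x = 2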

πŸ“ The Math You’ll Master

Chain Rule (The heart of backpropagation)

If y = f(g(x)), then:
dy/dx = (dy/dg) × (dg/dx)

In English: To find how output changes with input, multiply the changes at each step!

Example:

y = (x + 2)²

Let g(x) = x + 2, then y = g²

dy/dg = 2g = 2(x + 2)
dg/dx = 1

dy/dx = 2(x + 2) × 1 = 2(x + 2)

We’ll use visual diagrams and step-by-step examples!
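
If you want to sanity-check the result numerically (the step size h is arbitrary, just small):

f = lambda x: (x + 2)**2
h = 1e-6
x = 1.0
numeric = (f(x + h) - f(x)) / h      # finite-difference approximation
analytic = 2 * (x + 2)               # the chain-rule answer
print(numeric, analytic)             # both ≈ 6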

Optimization Algorithms

1. SGD (Stochastic Gradient Descent)

  • Basic: Move opposite to gradient
  • Like: Walking downhill to find the valley

2. Momentum

  • Improvement: Remember previous direction
  • Like: Rolling a ball downhill (builds speed)

3. Adam (Most popular!)

  • Smart: Adapts learning rate for each weight
  • Like: GPS that adjusts your route in real-time
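
Here is a minimal sketch of all three update rules, each minimizing the toy loss (w - 3)² (the hyperparameters are typical defaults, chosen just for illustration):

def grad(w):                          # gradient of the toy loss (w - 3)**2
    return 2 * (w - 3)

def sgd(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)             # move opposite to the gradient
    return w

def momentum(w, lr=0.1, beta=0.9, steps=100):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)        # remember the previous direction
        w -= lr * v
    return w

def adam(w, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=100):
    m = s = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g                          # running average of gradients
        s = b2 * s + (1 - b2) * g * g                      # running average of squared gradients
        m_hat, s_hat = m / (1 - b1**t), s / (1 - b2**t)    # bias correction
        w -= lr * m_hat / (s_hat**0.5 + eps)               # per-weight adaptive step size
    return w

print(sgd(0.0), momentum(0.0), adam(0.0))   # all three end up near 3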

⚡ ISL Optimization – Training on 16GB RAM

Problem: Big networks use lots of memory

Solutions you’ll implement:

1. Gradient Checkpointing

  • Don’t save all intermediate values
  • Recalculate when needed
  • Result: 10x less memory, 30% slower (worth it!)
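
PyTorch exposes this idea as torch.utils.checkpoint; here is a minimal sketch (the tiny two-block model is hypothetical, just to show where the checkpoint goes):

import torch
from torch.utils.checkpoint import checkpoint

# Activations inside block1 are not stored; they are recomputed during the backward pass
block1 = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU())
block2 = torch.nn.Linear(128, 10)

x = torch.randn(32, 784, requires_grad=True)
h = checkpoint(block1, x, use_reentrant=False)   # use_reentrant=False is the mode recommended in recent PyTorch versions
loss = block2(h).sum()
loss.backward()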

2. Mixed Precision Training

  • Use 16-bit numbers instead of 32-bit
  • Result: 2x faster, 2x less memory
  • Accuracy: Almost the same!
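
A minimal sketch of one mixed-precision training step with PyTorch's automatic mixed precision (this assumes a CUDA GPU is available):

import torch

model = torch.nn.Linear(784, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()     # rescales the loss so tiny float16 gradients don't vanish

x = torch.randn(32, 784, device="cuda")
y = torch.randint(0, 10, (32,), device="cuda")

with torch.cuda.amp.autocast():          # the forward pass runs in float16 where it is safe
    loss = torch.nn.functional.cross_entropy(model(x), y)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()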

3. Smart Batch Sizing

import torch

def find_max_batch_size(model, input_shape):
    """Keep doubling the batch size until we run out of memory, then back off."""
    batch_size = 1
    while True:
        try:
            # Try a forward pass with this batch size
            test_input = torch.randn(batch_size, *input_shape)
            model(test_input)
            batch_size *= 2
        except RuntimeError:  # Out of memory!
            return max(1, batch_size // 2)

📚 Resources

Videos (MUST WATCH!)

Reading

Datasets

  • MNIST: Handwritten digits
  • Fashion-MNIST: Clothing items
  • Your own handwritten digits!

✅ Learning Checklist

  • [ ] Implement perceptron from scratch
  • [ ] Understand activation functions
  • [ ] Build multi-layer neural network
  • [ ] Implement backpropagation
  • [ ] Train MNIST classifier (95%+ accuracy)
  • [ ] Create automatic differentiation system
  • [ ] Apply gradient checkpointing
  • [ ] Use mixed precision training

🚀 Next Steps

Module 4: Advanced AI Architectures

Learn CNNs for images, RNNs for sequences, and Transformers with attention!

πŸ“ Related Topics

  • β†’
    Understanding Backpropagation Step by Step
  • β†’
    Activation Functions: ReLU vs Sigmoid vs Tanh
  • β†’
    Gradient Descent Optimization