Module 3: Neural Networks – Teaching Computers to Think

Duration: Weeks 5-6
Difficulty: Intermediate
Prerequisites: Modules 1-2 completed

🎯 What You’ll Learn

Your brain has billions of neurons that work together to help you think. We’ll build artificial neurons that work together to help computers “think”!

The Big Picture:

Human Brain          →    Artificial Neural Network
Neurons (cells)      →    Nodes (math functions)
Connections          →    Weights (numbers)
Learning             →    Adjusting weights

🧠 Core Concepts

1. The Perceptron – A Single Artificial Neuron

Think of it like a decision-maker:

  • Inputs: Gets information (is it sunny? is it weekend?)
  • Weights: How important is each input?
  • Output: Makes a decision (go to beach: yes/no)

Math (simple!):

def perceptron(inputs, weights, threshold):
    # 1. Multiply each input by its weight
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    
    # 2. If sum > threshold → output 1, else → output 0
    return 1 if weighted_sum > threshold else 0

You’ll code this in 10 lines of Python!
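
For example, here's a quick use of the perceptron function above for the beach decision (the weights and threshold are made-up illustration values):

# Uses the perceptron defined above; numbers are purely illustrative
inputs = [1, 1]                  # it's sunny AND it's the weekend
weights = [0.6, 0.4]             # sunshine matters a bit more to us
threshold = 0.5
print(perceptron(inputs, weights, threshold))   # 1, so: go to the beach!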

2. Activation Functions – Adding Flexibility

Problem: A perceptron can only draw straight lines (it is a linear classifier)
Solution: Activation functions add curves!

Types:

Sigmoid: Smooth S-curve (0 to 1)

import numpy as np  # needed for np.exp and np.tanh below
def sigmoid(x): return 1 / (1 + np.exp(-x))

ReLU: Simple but powerful

def relu(x): return max(0, x)  # If negative → 0, else → keep the same (use np.maximum for arrays)

Tanh: Like sigmoid but (-1 to 1)

def tanh(x): return np.tanh(x)

Visual: We’ll plot these so you SEE the difference!
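
One way to do that, as a quick sketch (assuming numpy and matplotlib are installed):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 200)
plt.plot(x, 1 / (1 + np.exp(-x)), label="sigmoid")   # smooth S-curve between 0 and 1
plt.plot(x, np.maximum(0, x), label="ReLU")          # 0 for negatives, unchanged otherwise
plt.plot(x, np.tanh(x), label="tanh")                # S-curve between -1 and 1
plt.legend()
plt.title("Activation functions")
plt.show()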

3. Multi-Layer Networks – Stacking Neurons

One neuron is like one student. A network is like a whole classroom working together!

Structure:

Input Layer → Hidden Layer(s) → Output Layer
  [Data]    →  [Processing]   →  [Answer]

Example: Recognizing handwritten digits

  • Input: 784 pixels (28×28 image)
  • Hidden: 128 neurons (finding patterns)
  • Output: 10 neurons (one for each digit 0-9)
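
To make the shapes concrete, here is a minimal sketch of one forward pass through that 784 → 128 → 10 network (random weights, numpy assumed, no training yet):

import numpy as np

x = np.random.rand(784)                  # input: a flattened 28x28 image
W1 = np.random.randn(128, 784) * 0.01    # weights into the hidden layer
W2 = np.random.randn(10, 128) * 0.01     # weights into the output layer

hidden = np.maximum(0, W1 @ x)           # hidden layer with ReLU
output = W2 @ hidden                     # 10 scores, one per digit
print(output.shape)                      # (10,)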

4. Backpropagation – How Networks Learn

This is the “magic” that makes neural networks work!

Simple explanation:

1. Make a prediction (forward pass)
2. See how wrong you were (calculate error)
3. Figure out which weights caused the error (backward pass)
4. Adjust those weights a tiny bit (learning!)
5. Repeat 1000s of times

Analogy: Like practicing free throws in basketball

  • Shoot (prediction)
  • Miss (error)
  • Adjust your form (update weights)
  • Try again (next iteration)
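
To make those five steps concrete, here is a toy sketch that learns a single weight (all numbers are made up for illustration):

w = 0.0                              # the weight we're learning
learning_rate = 0.1
for step in range(100):              # 5. repeat many times
    prediction = w * 2.0             # 1. forward pass (input = 2.0)
    error = prediction - 10.0        # 2. how wrong were we? (target = 10.0)
    gradient = 2 * error * 2.0       # 3. backward pass: d(error**2)/dw via the chain rule
    w -= learning_rate * gradient    # 4. adjust the weight a tiny bit
print(w)                             # close to 5.0, so the prediction is close to 10.0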

πŸ› οΈ Hands-on Projects

Project 1: Build a Neural Network Library from Scratch

Goal: Create your own mini-PyTorch!

Components:

class Layer:
    def forward(self, inputs):
        """Compute outputs"""
        pass
    
    def backward(self, grad):
        """Compute gradients"""
        pass

class Network:
    def __init__(self):
        self.layers = []

    def add(self, layer):
        self.layers.append(layer)

    def train(self, X, y, epochs=100):
        for epoch in range(epochs):
            # Forward pass
            output = self.forward(X)
            # Calculate loss
            loss = self.loss_function(output, y)
            # Backward pass
            self.backward(loss)
            # Update weights
            self.update_weights()

Lines of code: ~200 (you’ll understand every single one!)

Project 2: MNIST Digit Classifier

Dataset: 60,000 handwritten digits
Your network: Input(784) → Hidden(128) → Output(10)
Goal: 95%+ accuracy
Time to train: 5-10 minutes on your laptop!

Cool demo: Draw digits in Paint, your AI recognizes them!
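
As a sketch of the first step, loading the data might look like this (assuming torchvision is installed):

from torchvision import datasets, transforms

train_data = datasets.MNIST(root="data", train=True, download=True,
                            transform=transforms.ToTensor())
print(len(train_data))        # 60000 training images
image, label = train_data[0]
print(image.shape, label)     # torch.Size([1, 28, 28]) and the digit it shows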

Project 3: Automatic Differentiation (Autograd)

Build the tool that PyTorch uses internally!

What it does: Automatically calculates derivatives
Why it’s cool: Never calculate gradients by hand again!

Concept:

You write:

y = x**2 + 3*x + 5

Autograd automatically knows:

dy/dx = 2*x + 3
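
You can check the idea against PyTorch's built-in autograd, which this project mimics in miniature:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 3*x + 5
y.backward()                  # autograd computes dy/dx for us
print(x.grad)                 # tensor(7.), which matches 2*x + 3 at x = 2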

πŸ“ The Math You’ll Master

Chain Rule (The heart of backpropagation)

If y = f(g(x)), then:
dy/dx = (dy/dg) × (dg/dx)

In English: To find how output changes with input, multiply the changes at each step!

Example:

y = (x + 2)²

Let g(x) = x + 2, then y = g²

dy/dg = 2g = 2(x + 2)
dg/dx = 1

dy/dx = 2(x + 2) × 1 = 2(x + 2)

We’ll use visual diagrams and step-by-step examples!
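
If you want to sanity-check the result numerically (the step size h is arbitrary, just small):

f = lambda x: (x + 2)**2
h = 1e-6
x = 1.0
numeric = (f(x + h) - f(x)) / h      # finite-difference approximation
analytic = 2 * (x + 2)               # the chain-rule answer
print(numeric, analytic)             # both ≈ 6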

Optimization Algorithms

1. SGD (Stochastic Gradient Descent)

  • Basic: Move opposite to gradient
  • Like: Walking downhill to find the valley

2. Momentum

  • Improvement: Remember previous direction
  • Like: Rolling a ball downhill (builds speed)

3. Adam (Most popular!)

  • Smart: Adapts learning rate for each weight
  • Like: GPS that adjusts your route in real-time
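
Here is a minimal sketch of all three update rules, each minimizing the toy loss (w - 3)² (the hyperparameters are typical defaults, chosen just for illustration):

def grad(w):                          # gradient of the toy loss (w - 3)**2
    return 2 * (w - 3)

def sgd(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)             # move opposite to the gradient
    return w

def momentum(w, lr=0.1, beta=0.9, steps=100):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)        # remember the previous direction
        w -= lr * v
    return w

def adam(w, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=100):
    m = s = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g                          # running average of gradients
        s = b2 * s + (1 - b2) * g * g                      # running average of squared gradients
        m_hat, s_hat = m / (1 - b1**t), s / (1 - b2**t)    # bias correction
        w -= lr * m_hat / (s_hat**0.5 + eps)               # per-weight adaptive step size
    return w

print(sgd(0.0), momentum(0.0), adam(0.0))   # all three end up near 3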

⚡ ISL Optimization – Training on 16GB RAM

Problem: Big networks use lots of memory

Solutions you’ll implement:

1. Gradient Checkpointing

  • Don’t save all intermediate values
  • Recalculate when needed
  • Result: 10x less memory, 30% slower (worth it!)
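
PyTorch exposes this idea as torch.utils.checkpoint; here is a minimal sketch (the tiny two-block model is hypothetical, just to show where the checkpoint goes):

import torch
from torch.utils.checkpoint import checkpoint

# Activations inside block1 are not stored; they are recomputed during the backward pass
block1 = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU())
block2 = torch.nn.Linear(128, 10)

x = torch.randn(32, 784, requires_grad=True)
h = checkpoint(block1, x, use_reentrant=False)   # use_reentrant=False is the mode recommended in recent PyTorch versions
loss = block2(h).sum()
loss.backward()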

2. Mixed Precision Training

  • Use 16-bit numbers instead of 32-bit
  • Result: 2x faster, 2x less memory
  • Accuracy: Almost the same!
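
A minimal sketch of one mixed-precision training step with PyTorch's automatic mixed precision (this assumes a CUDA GPU is available):

import torch

model = torch.nn.Linear(784, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()     # rescales the loss so tiny float16 gradients don't vanish

x = torch.randn(32, 784, device="cuda")
y = torch.randint(0, 10, (32,), device="cuda")

with torch.cuda.amp.autocast():          # the forward pass runs in float16 where it is safe
    loss = torch.nn.functional.cross_entropy(model(x), y)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()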

3. Smart Batch Sizing

import torch

def find_max_batch_size(model, input_shape):
    """Keep doubling the batch size until we run out of memory, then back off."""
    batch_size = 1
    while True:
        try:
            # Try a forward pass with this batch size
            test_input = torch.randn(batch_size, *input_shape)
            model(test_input)
            batch_size *= 2
        except RuntimeError:  # Out of memory!
            return max(1, batch_size // 2)

📚 Resources

Videos (MUST WATCH!)

Reading

Datasets

  • MNIST: Handwritten digits
  • Fashion-MNIST: Clothing items
  • Your own handwritten digits!

✅ Learning Checklist

  • [ ] Implement perceptron from scratch
  • [ ] Understand activation functions
  • [ ] Build multi-layer neural network
  • [ ] Implement backpropagation
  • [ ] Train MNIST classifier (95%+ accuracy)
  • [ ] Create automatic differentiation system
  • [ ] Apply gradient checkpointing
  • [ ] Use mixed precision training

🚀 Next Steps

Module 4: Advanced AI Architectures

Learn CNNs for images, RNNs for sequences, and Transformers with attention!

πŸ“ Related Topics

  • β†’
    Understanding Backpropagation Step by Step
  • β†’
    Activation Functions: ReLU vs Sigmoid vs Tanh
  • β†’
    Gradient Descent Optimization