Module 8: Deploy Your AI – Share It With The World!
Duration: Week 15-16
Difficulty: Intermediate
Prerequisites: Modules 1-7 completed
---
What You’ll Learn
Turn your trained models into real applications that anyone can use!
---
Part 1: Model Export – Making Models Portable
The Problem
You trained in PyTorch, but want to run on:
- Phones (Android/iOS)
- Websites (JavaScript)
- Other frameworks
Solution: ONNX (Open Neural Network Exchange)
Think of ONNX like a universal translator:
PyTorch Model → ONNX → Run anywhere!
Project: Export to ONNX
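    import torch

    # Your PyTorch model (YourModel is your own class from earlier modules)
    model = YourModel()
    model.eval()  # switch to inference mode before exporting

    # Example input with the shape your model expects (adjust to your model)
    dummy_input = torch.randn(1, 3, 224, 224)

    # Export (one line!)
    torch.onnx.export(model, dummy_input, "model.onnx")
Now run model.onnx with ONNX Runtime (often 2-3x faster, depending on the model and hardware).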
---
Part 2: Quantization – Make Models Tiny & Fast
The Magic of Quantization
Before: 32-bit numbers (4 bytes each)
After: 8-bit numbers (1 byte each)
Result:
- About 4x smaller file size
- Up to 3-4x faster inference (depends on the model and hardware)
- Usually a small accuracy loss (often under 1%)
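To make the 4-bytes-to-1-byte idea concrete, here is a minimal sketch of symmetric int8 quantization using only the Python standard library. The weight values and the function names `quantize_int8` and `dequantize` are made up for illustration; real frameworks use more refined schemes (per-channel scales, zero points, calibration).

```python
from array import array

def quantize_int8(values):
    """Map floats to int8 using one shared scale (symmetric quantization)."""
    max_abs = max(abs(v) for v in values) or 1.0   # avoid dividing by zero
    scale = max_abs / 127.0                        # float size of one int8 step
    quantized = array("b", (round(v / scale) for v in values))  # "b" = signed 8-bit
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the int8 values."""
    return [q * scale for q in quantized]

weights = array("f", [0.5, -1.27, 0.031, 1.0, -0.64])  # "f" = 32-bit float
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

size_fp32 = len(weights) * weights.itemsize      # 5 values * 4 bytes
size_int8 = len(q_weights) * q_weights.itemsize  # 5 values * 1 byte
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {size_fp32} bytes, int8: {size_int8} bytes")
print(f"largest rounding error: {max_error:.4f}")
```

The rounding error per weight is at most half a quantization step, which is why accuracy usually drops only a little.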
Types of Quantization
1. Dynamic Quantization (Easiest)
- One line of code
- Instant speedup
2. Static Quantization (Better)
- Need calibration data
- Better accuracy
3. Quantization-Aware Training (Best)
- Train with quantization in mind
- Best accuracy
Project: Quantize Your Model
- Take your trained classifier
- Apply dynamic quantization
- Benchmark: Size, speed, accuracy
- Goal: 4x smaller, 3x faster!
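For the benchmark step, a small timing and accuracy harness could look like the sketch below. The two `*_predict` functions are hypothetical stand-ins so the code runs without a real model; swap in calls to your original and quantized models. File size can be compared separately with `os.path.getsize` on the saved model files.

```python
import statistics
import time

def benchmark(predict, inputs, repeats=5):
    """Return the median seconds for one pass of `predict` over `inputs`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for x in inputs:
            predict(x)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

def accuracy(predict, inputs, labels):
    """Fraction of inputs whose prediction matches the label."""
    correct = sum(predict(x) == y for x, y in zip(inputs, labels))
    return correct / len(labels)

# Hypothetical stand-ins so the sketch runs without a real model.
def original_predict(x):
    return x > 0.5

def quantized_predict(x):
    return round(x, 1) > 0.5   # coarser arithmetic, mimicking quantization

inputs = [0.1, 0.4, 0.52, 0.58, 0.9]
labels = [False, False, True, True, True]

for name, fn in [("original", original_predict), ("quantized", quantized_predict)]:
    seconds = benchmark(fn, inputs)
    print(name, f"{seconds:.6f}s", f"accuracy={accuracy(fn, inputs, labels):.2f}")
```

Using the median of several runs keeps one slow run (for example, a cold cache) from skewing the comparison.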
---
Part 3: Building a Web API
What's an API?
A way for programs to talk to each other!
Example:
User uploads image → Your API → Model predicts → Return "Cat!"
FastAPI - Easy Python Web Framework
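    from fastapi import FastAPI, File, UploadFile

    app = FastAPI()

    # File uploads also need: pip install python-multipart
    @app.post("/predict")
    async def predict(image: UploadFile = File(...)):
        data = await image.read()  # Load image bytes
        # Run model on `data` here
        # Return prediction (placeholder values)
        return {"class": "cat", "confidence": 0.95}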
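The "Return prediction" step usually means turning raw model scores into a class name and a confidence. Here is one possible sketch using softmax; the function name `to_response`, the scores, and the class names are all made up for illustration.

```python
import math

def to_response(logits, class_names):
    """Turn raw model scores into a {"class", "confidence"} dictionary."""
    # Softmax: subtract the max first so exp() cannot overflow.
    peak = max(logits)
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index of the most likely class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"class": class_names[best], "confidence": round(probs[best], 2)}

# Hypothetical scores for a three-class model.
response = to_response([4.0, 1.0, 0.5], ["cat", "dog", "bird"])
print(response)
```

The returned dictionary has the same shape as the placeholder in the endpoint, so it can be returned from `predict` directly.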
Project: Build Image Classifier API
- Create FastAPI server
- Load quantized model
- Accept image uploads
- Return predictions
- Deploy locally
---
Part 4: Creating a User Interface
Option 1: Gradio (Easiest)
    import gradio as gr

    def classify(image):
        # Replace with a real model call
        return "Cat: 95%"

    gr.Interface(fn=classify,
                 inputs="image",
                 outputs="text").launch()
A few lines of code → a working web interface!
Option 2: Custom HTML/JavaScript
- Build custom website
- Upload images
- Display results beautifully
Project: Create Web Interface
- Use Gradio for quick demo
- Build custom HTML page
- Connect to FastAPI backend
- Style with CSS
---
Part 5: Docker - Package Everything Together
What's Docker?
Put your entire app in a "container" that runs anywhere!
Analogy: Like a shipping container
- Pack everything inside
- Ship anywhere
- Works the same everywhere
Dockerfile Example
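    FROM python:3.9
    WORKDIR /app
    COPY model.onnx api.py /app/
    # uvicorn serves the FastAPI app; python-multipart handles file uploads
    RUN pip install fastapi uvicorn python-multipart onnxruntime
    # Assumes api.py defines `app = FastAPI()`
    CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]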
Project: Dockerize Your App
- Write Dockerfile
- Build container
- Run locally
- Share with friends!
---
ISL Optimization - Fast Inference
1. Batch Inference
Instead of:
Predict image 1 → 100ms
Predict image 2 → 100ms
Total: 200ms
Do:
Predict [image 1, 2] together → 120ms
Total: 120ms (about 1.7x faster!)
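The batching idea can be sketched as grouping requests into chunks and calling the model once per chunk. `predict_batch` below is a hypothetical stand-in for a real model call that accepts several images at once; the timings quoted above are not reproduced here, only the call pattern.

```python
def batches(items, batch_size):
    """Yield consecutive chunks of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def predict_batch(images):
    """Hypothetical stand-in for one model call on a whole batch."""
    return [f"label-for-{image}" for image in images]

def predict_all(images, batch_size=2):
    """Run the model once per batch instead of once per image."""
    results, model_calls = [], 0
    for batch in batches(images, batch_size):
        results.extend(predict_batch(batch))
        model_calls += 1
    return results, model_calls

results, model_calls = predict_all(["img1", "img2", "img3", "img4", "img5"])
print(model_calls, "model calls for", len(results), "images")
```

Larger batches mean fewer model calls, but each caller waits for the whole batch, so the best batch size is a trade-off between throughput and latency.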
2. Caching
If the same image is uploaded again → Return cached result
No need to run model again!
3. Model Pruning
- Find weights close to zero
- Remove them
- Retrain briefly
- Result: up to 50% smaller (the exact figure depends on how much you prune)
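The first two pruning steps (find the near-zero weights, remove them) can be sketched on a plain list of numbers. This is a toy illustration with made-up weights and a hypothetical helper `prune_by_magnitude`; the retraining step is not shown.

```python
def prune_by_magnitude(weights, fraction=0.5):
    """Zero out the `fraction` of weights with the smallest absolute value."""
    n_prune = int(len(weights) * fraction)
    # Indices sorted from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]

weights = [0.8, -0.01, 0.03, -0.6, 0.002, 0.45]
pruned = prune_by_magnitude(weights, fraction=0.5)
print(pruned)
```

Zeroed weights only save space once the model is stored in a sparse format or whole channels are removed, which is one reason the real-world size reduction varies.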
4. CPU Optimizations
- Use AVX2 vector instructions (most modern x86 CPUs support them)
- Result: up to 2x faster on the same hardware!
---
Real-World Deployment Options
Free Options
1. Hugging Face Spaces (Free hosting!)
- Upload Gradio app
- Get free URL
2. Google Colab (For demos)
- Run in notebook
- Free GPU!
3. Render / Railway (Free tier)
- Deploy FastAPI app
- Get public URL
---
Resources
- FastAPI documentation
- Gradio documentation
- ONNX Runtime guides
- Docker tutorials
- Hugging Face Spaces
---
Learning Checklist
- [ ] Export models to ONNX
- [ ] Quantize models for about 4x smaller size and faster inference
- [ ] Build REST API with FastAPI
- [ ] Create web interface with Gradio
- [ ] Deploy with Docker
- [ ] Optimize inference for production
---
Final Project Ideas
1. Image Classifier Website: Upload photo, get classification
2. Text Generator API: Send prompt, receive story
3. Object Detection App: Upload image, see boxes
4. Chatbot Interface: Simple conversational AI
5. Style Transfer Tool: Turn photos into paintings
---
Congratulations!
You've completed the AI Engineering Syllabus!
You now know how to:
- Build AI models from scratch
- Train them efficiently on 16GB RAM
- Deploy them as real applications
- Optimize for speed and memory
Keep learning, keep building, and share your projects with the world!