Docker & Containerization

INFO 153B/253B: Backend Web Architecture

Week 5


Kay Ashaolu - Instructor

Suk Min Hwang - GSI

Today's Agenda

  • Part 1: Prep Work Recap - Docker basics check-in
  • Part 2: Containers vs VMs - Why containers won
  • Part 3: Images & Layers - How Docker builds efficiently
  • Part 4: Dockerfile Deep Dive - Every command explained
  • Part 5: docker-compose - Multi-container development
  • Demo: Dockerize a Flask App (5 min)
  • In-Class Exploration: Containerize a Product API (45 min)

Part 1: Prep Work Recap

Quick check-in on O'Reilly Chapter 4

What You Learned in Prep

  • Containers vs VMs: Lighter weight, share OS kernel
  • Images: Blueprints for containers (read-only)
  • Containers: Running instances of images
  • Dockerfile: Recipe for building images
  • docker-compose: Run multiple containers together
Key insight: Docker packages your app WITH its environment. "It works on my machine" becomes "It works everywhere."

The Problem Docker Solves

  • Developer laptop: Python 3.11, Flask 3.0, macOS
  • Server: Python 3.9, Flask 2.0, Ubuntu
  • Result: "It works on my machine!" (but crashes in production)
The real world: Different team members have different OS versions, different Python versions, different installed packages. Docker makes this problem disappear.

Quick Check: Docker Vocabulary

Term       | Definition                       | Analogy
-----------|----------------------------------|------------------------
Image      | Read-only template with your app | Recipe / Blueprint
Container  | Running instance of an image     | Cooked meal / Building
Dockerfile | Instructions to build an image   | Recipe steps
Registry   | Storage for images (Docker Hub)  | Recipe book
  • You build images from Dockerfiles
  • You run containers from images
  • You push/pull images to/from registries
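
Pushing to a registry follows the same pattern; a minimal sketch, assuming a Docker Hub account named yourname (hypothetical — substitute your own namespace):

```shell
# Tag the local image with your registry namespace and a version
docker tag my-flask-app yourname/my-flask-app:1.0

# Authenticate, then upload the image to Docker Hub
docker login
docker push yourname/my-flask-app:1.0

# Any machine with Docker can now pull and run it
docker pull yourname/my-flask-app:1.0
```

The part after the colon is the version tag; pushing a new tag doesn't overwrite old ones, so teams can roll back by pulling an earlier version.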

Quick Check: Basic Commands

# Build an image from Dockerfile
docker build -t my-flask-app .

# Run a container from an image
docker run -p 5000:5000 my-flask-app

# List running containers
docker ps

# Stop a container
docker stop <container_id>

# List all images
docker images
  • -t tags (names) the image
  • -p maps ports (host:container)
  • . means use current directory's Dockerfile

Prep Recap: Key Takeaways

  • Docker solves: "Works on my machine" problem
  • Images are blueprints: Built once, run anywhere
  • Containers are isolated: Don't affect host system
  • Dockerfile is the recipe: FROM, COPY, RUN, CMD
The connection: Last week you built Flask APIs that run locally. This week: package them to run anywhere.

Part 2: Containers vs VMs

Why containers won

Virtual Machines: The Old Way

How VMs Work

  • Full OS for each application
  • Hypervisor manages VMs
  • Complete isolation
  • Heavy: GBs of overhead

VM Stack

+-------+ +-------+ +-------+
| App 1 | | App 2 | | App 3 |
+-------+ +-------+ +-------+
| Guest | | Guest | | Guest |
|  OS   | |  OS   | |  OS   |
+-------+-+-------+-+-------+
|       Hypervisor          |
+---------------------------+
|        Host OS            |
+---------------------------+
|        Hardware           |
+---------------------------+

Containers: The Modern Way

How Containers Work

  • Share host OS kernel
  • Docker daemon manages
  • Process-level isolation
  • Light: MBs of overhead

Container Stack

+-------+ +-------+ +-------+
| App 1 | | App 2 | | App 3 |
+-------+-+-------+-+-------+
|     Docker Engine         |
+---------------------------+
|        Host OS            |
+---------------------------+
|        Hardware           |
+---------------------------+
Key difference: No Guest OS layer! Apps share the host kernel.
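
You can observe the shared kernel directly; a quick check, assuming Docker is installed and running:

```shell
# Kernel version on the host
uname -r

# Kernel version inside a container: the same, because there is no guest OS
docker run --rm python:3.11-slim uname -r
```

One caveat: on macOS and Windows, Docker runs containers inside a lightweight Linux VM, so the container's kernel matches that VM's kernel rather than the host's.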

Containers vs VMs: Comparison

Aspect    | Virtual Machine        | Container
----------|------------------------|----------------
Size      | GBs (full OS)          | MBs (just app)
Startup   | Minutes                | Seconds
Memory    | Dedicated per VM       | Shared
Isolation | Complete (separate OS) | Process-level
Density   | ~10 per host           | ~100+ per host
  • Containers are 10-100x more efficient
  • Same hardware runs more applications
  • Faster deployments, faster scaling

When to Use What

Use VMs When:

  • Need different OS (Windows on Linux)
  • Complete kernel isolation required
  • Running untrusted code
  • Legacy applications

Use Containers When:

  • Deploying microservices
  • CI/CD pipelines
  • Development environments
  • Cloud-native applications
In practice: the vast majority of new applications ship in containers; VMs remain for the edge cases above.

The Docker Workflow

┌─────────────┐    docker build    ┌─────────────┐
│  Dockerfile │ ──────────────────▶│    Image    │
└─────────────┘                    └──────┬──────┘
                                          │
                                   docker run
                                          │
                                          ▼
                                   ┌─────────────┐
                                   │  Container  │
                                   └─────────────┘
  • Write a Dockerfile (recipe)
  • Build an image (docker build)
  • Run containers from the image (docker run)
  • Share the image via a registry (docker push/pull)

Part 3: Images & Layers

How Docker builds efficiently

Images Are Made of Layers

FROM python:3.11-slim      # Layer 1: Base Python image
WORKDIR /app               # Layer 2: Create directory
COPY requirements.txt .    # Layer 3: Copy requirements
RUN pip install -r ...     # Layer 4: Install dependencies
COPY app.py .              # Layer 5: Copy application
CMD ["python", "app.py"]   # Layer 6: Set startup command
  • Each Dockerfile instruction creates a layer
  • Layers are cached and reused
  • If a layer hasn't changed, Docker skips rebuilding it
  • This is why Docker builds are fast after the first time

Layer Caching in Action

First Build

Step 1/6: FROM python
 ---> Pulling...
Step 2/6: WORKDIR
 ---> Running...
Step 3/6: COPY reqs
 ---> Running...
Step 4/6: RUN pip
 ---> Running... (slow)
Step 5/6: COPY app
 ---> Running...
Step 6/6: CMD
 ---> Running...

After Editing app.py

Step 1/6: FROM python
 ---> Using cache
Step 2/6: WORKDIR
 ---> Using cache
Step 3/6: COPY reqs
 ---> Using cache
Step 4/6: RUN pip
 ---> Using cache (fast!)
Step 5/6: COPY app
 ---> Running...
Step 6/6: CMD
 ---> Running...
Key insight: pip install is cached! Only app.py is rebuilt.

Dockerfile Order Matters

Bad: Slow Rebuilds

# Copy everything first
COPY . .

# Then install
RUN pip install -r requirements.txt

# Every code change
# reinstalls ALL dependencies!

Good: Fast Rebuilds

# Copy requirements FIRST
COPY requirements.txt .

# Install dependencies
RUN pip install -r requirements.txt

# THEN copy code
COPY . .
  • Put things that change rarely at the top (dependencies)
  • Put things that change often at the bottom (code)
  • When a layer changes, all layers after it rebuild

Base Images

# Full Python image (~1 GB)
FROM python:3.11

# Slim image (~150 MB) - no build tools
FROM python:3.11-slim

# Alpine image (~50 MB) - minimal Linux
FROM python:3.11-alpine
Image              | Size    | Use Case
-------------------|---------|---------------------------------
python:3.11        | ~1 GB   | Need to compile packages (numpy)
python:3.11-slim   | ~150 MB | Most Flask apps (recommended)
python:3.11-alpine | ~50 MB  | Size critical, no C extensions

Inspecting Images

# List all images
docker images

# See image history (layers)
docker history flask-app

# Inspect image details
docker inspect flask-app

# Remove unused images
docker image prune
  • docker images shows all local images with sizes
  • docker history shows each layer and its size
  • Prune regularly to free disk space

Part 4: Dockerfile Deep Dive

Every command explained

Complete Dockerfile Example

# Base image with Python
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Copy requirements and install
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Document the port
EXPOSE 5000

# Default command
CMD ["python", "app.py"]
  • This is the standard pattern for Flask apps
  • Let's break down each instruction

FROM: Start with a Base

# REQUIRED: Every Dockerfile starts with FROM
FROM python:3.11-slim
  • Always first - defines the starting point
  • Format: image:tag
  • Where it comes from: Docker Hub (default registry)
  • Includes: OS, Python, pip, common tools
Pro tip: Always specify a version tag (3.11, not latest). "latest" can change and break your build.

WORKDIR: Set the Directory

# Create and switch to /app directory
WORKDIR /app

# All following commands run in /app
COPY requirements.txt .    # Copies to /app/requirements.txt
RUN pip install ...        # Runs in /app
COPY . .                   # Copies to /app
  • Creates the directory if it doesn't exist
  • Changes to that directory (like cd)
  • Persists for all following instructions
  • Convention: Use /app or /code

COPY: Add Your Files

# Copy single file
COPY requirements.txt .

# Copy multiple files
COPY app.py config.py ./

# Copy entire directory
COPY . .

# Copy with rename
COPY app.py /app/server.py
  • Source: Path relative to Dockerfile location
  • Destination: Path inside the container
  • . means current directory (WORKDIR)
  • Trailing slash on destination means "into this directory"
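
COPY . . also sweeps in files you don't want in the image (virtualenvs, git history, caches). A .dockerignore file next to the Dockerfile filters them out of the build context; a typical sketch:

```
# .dockerignore - excluded from the build context
venv/
__pycache__/
.git/
*.pyc
```

A smaller build context uploads faster and avoids cache-busting rebuilds when irrelevant files change.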

RUN: Execute Commands

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# System updates (less common for slim images)
RUN apt-get update && apt-get install -y curl

# Multiple commands in one layer
RUN pip install flask && \
    pip install requests && \
    rm -rf /root/.cache
  • Executes shell commands during build
  • Creates a new layer for each RUN
  • Combine related commands with && to reduce layers
  • --no-cache-dir reduces image size

EXPOSE: Document Ports

# Document that the container listens on 5000
EXPOSE 5000

# Multiple ports
EXPOSE 5000 8080
  • Documentation only - doesn't actually publish ports
  • Tells humans and tools which ports the app uses
  • Still need -p flag when running: docker run -p 5000:5000
Common confusion: EXPOSE doesn't make ports accessible. You still need -p 5000:5000 when running.

CMD: Default Startup Command

# Exec form (preferred) - runs directly
CMD ["python", "app.py"]

# Shell form - runs in /bin/sh -c
CMD python app.py

# With Flask CLI
CMD ["flask", "run", "--host=0.0.0.0"]
  • Runs when container starts (not during build)
  • Exec form (brackets): Runs directly, handles signals properly
  • Shell form: Runs in a shell, more flexible but less clean
  • Only one CMD takes effect: if a Dockerfile has several, the last one wins

ENV: Environment Variables

# Set environment variables
ENV FLASK_APP=app.py
ENV FLASK_ENV=production

# Multiple in one line
ENV FLASK_APP=app.py FLASK_ENV=production

# Use in later commands
RUN echo "App: $FLASK_APP"
  • Sets environment variables in the image
  • Persists when container runs
  • Can override with docker run -e VAR=value
Note: Flask 2.3+ no longer reads FLASK_ENV; use FLASK_DEBUG=1 to enable debug mode.

Part 5: docker-compose

Multi-container development

Why docker-compose?

  • Problem: Real apps have multiple services
  • Flask app + PostgreSQL + Redis + Celery
  • Running separately: docker run for each (painful)
  • Solution: Define all services in one file
# Without docker-compose
docker run -d --name postgres postgres:15
docker run -d --name redis redis:7
docker run -d -p 5000:5000 --link postgres --link redis flask-app

# With docker-compose
docker-compose up

docker-compose.yml Structure

version: '3.8'

services:
  web:                          # Service name
    build: .                    # Build from Dockerfile
    ports:
      - "5000:5000"            # Port mapping
    environment:
      - FLASK_ENV=development  # Environment variables
    volumes:
      - .:/app                 # Mount code for hot reload
  • version: Compose file format version (optional in current Compose releases)
  • services: Define each container
  • build: Path to Dockerfile
  • ports: Same as -p flag
  • volumes: Mount host directories into container

Volumes: Hot Reload Magic

services:
  web:
    build: .
    volumes:
      - .:/app    # Mount current directory to /app in container
  • Volumes mount host directories into container
  • Changes on host immediately appear in container
  • No need to rebuild when code changes!
  • Format: host_path:container_path
Development workflow: Edit code locally, Flask reloads automatically. No docker build needed during development!
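
For mounted code changes to actually trigger reloads, Flask's debug reloader has to be on. One way to wire that up is to override the image's startup command in the compose file; a sketch, assuming the app uses the Flask CLI:

```yaml
services:
  web:
    build: .
    ports:
      - "5000:5000"
    volumes:
      - .:/app
    # Override the image's CMD so Flask runs with the auto-reloader enabled
    command: flask run --debug --host=0.0.0.0
```

The --host=0.0.0.0 part matters in containers: Flask's default of 127.0.0.1 would only accept connections from inside the container itself.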

Multi-Service Example

version: '3.8'

services:
  web:
    build: .
    ports:
      - "5000:5000"
    depends_on:
      - db
    environment:
      - DATABASE_URL=postgresql://postgres:password@db:5432/app

  db:
    image: postgres:15
    environment:
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
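
Inside the web container, the app reads DATABASE_URL from the environment compose injected. A minimal sketch of that pattern (plain stdlib, no database driver; the localhost fallback is an assumption so the same code also runs outside Docker):

```python
import os

# Compose sets DATABASE_URL for the web service; fall back to localhost
# when running outside Docker (the fallback value is an assumption).
DATABASE_URL = os.environ.get(
    "DATABASE_URL",
    "postgresql://postgres:password@localhost:5432/app",
)

def db_host(url: str) -> str:
    """Pull the hostname out of a postgres URL (naive string split, no driver)."""
    # postgresql://user:pass@host:port/db -> host
    return url.split("@", 1)[1].split(":", 1)[0]
```

Note the hostname in the compose example is db — the service name, which Docker's internal DNS resolves to the database container.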

docker-compose Commands

# Start all services (detached)
docker-compose up -d

# Start with build
docker-compose up --build

# View logs
docker-compose logs -f web

# Stop all services
docker-compose down

# Stop and remove volumes
docker-compose down -v
  • up: Start services (builds if needed)
  • -d: Detached mode (background)
  • --build: Force rebuild
  • down: Stop and remove containers
  • -v: Also remove volumes (data loss!)

Development vs Production

Development

# docker-compose.yml
services:
  web:
    build: .
    volumes:
      - .:/app  # Hot reload
    environment:
      - FLASK_ENV=development

Production

# docker-compose.prod.yml
services:
  web:
    image: my-app:v1.0
    # No volumes!
    environment:
      - FLASK_ENV=production
  • Dev: Volumes for hot reload, debug mode
  • Prod: Pre-built image, no volumes, production mode
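
To run the production variant, point compose at the prod file explicitly with -f; a sketch:

```shell
# Use the production compose file instead of the default docker-compose.yml
docker-compose -f docker-compose.prod.yml up -d
```

Without -f, compose reads docker-compose.yml, so the development setup stays the default.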

Summary

Key takeaways from today

What You Learned Today

  • Containers vs VMs: Lighter, faster, more efficient
  • Images & Layers: Order matters for caching
  • Dockerfile commands: FROM, WORKDIR, COPY, RUN, EXPOSE, CMD
  • docker-compose: Define multi-container apps in YAML
  • Volumes: Hot reload during development
Key insight: Your Flask app from Week 3 + a Dockerfile = runs anywhere.

Looking Ahead

Week | Topic                   | Building On
-----|-------------------------|----------------------------
6    | Validation & Schemas    | Flask-Smorest, Marshmallow
7    | SQLAlchemy + Migrations | PostgreSQL in containers
8    | Async Task Queues       | Redis + rq in containers
Assignment 1: Due before Week 6 Friday lab.
Friday: Assignment 1 Help
Prep for Week 6: O'Reilly Chapter 5 - Flask-Smorest

Live Demo

Dockerize a Flask App

Demo: What We'll Build

  • Start with a simple Flask app (already written)
  • Create a Dockerfile from scratch
  • Build the image with docker build
  • Run the container with docker run
  • Test with curl - see it work!
  • Create docker-compose.yml for development
Watch along: I'll type everything live. You'll practice in the in-class exploration.

In-Class Exploration

Containerize a Product API

Project Overview

  • You receive a complete Flask Product API (already written)
  • Your job: containerize it!
  • Tasks:
    • Write a Dockerfile (15 min)
    • Build and run the container (10 min)
    • Write docker-compose.yml (15 min)
    • Test and submit (5 min)
Focus: This is about Docker, not Flask. The Flask code is provided and working.

Getting Started

  1. Click the GitHub Classroom link (in bCourses)
  2. Clone your repo:
    git clone <your-repo-url>
    cd in-class-exploration-week-5-<your-username>
  3. Verify Flask works locally first:
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    flask run
  4. Then containerize it!

Testing Your Container

# Build the image
docker build -t product-api .

# Run the container
docker run -p 5000:5000 product-api

# In another terminal, test it
curl http://localhost:5000/products
curl http://localhost:5000/health

# With docker-compose
docker-compose up

Submitting (at 10:45 AM)

  1. Commit everything you have (even if incomplete):
    git add .
    git commit -m "In-class exploration submission"
  2. Push to your repository:
    git push
  3. Submit on bCourses: Add your GitHub repo URL
Grading: Pass/No Pass based on engagement. Just show you worked on it!

Want to keep working? Continue after submitting - just push additional changes.

Questions?


Website: groups.ischool.berkeley.edu/i253/sp26

Email: kay@ischool.berkeley.edu