Validation & Data Schemas

INFO 153B/253B: Backend Web Architecture

Week 6

 

Kay Ashaolu - Instructor

Suk Min Hwang - GSI

Today's Agenda

  • Part 1: Prep Work Recap - Flask-Smorest basics
  • Part 2: Why Validation Matters - Security and integrity
  • Part 3: Marshmallow Deep Dive - Schema definitions
  • Part 4: Flask-Smorest Integration - Putting it together
  • Part 5: Error Handling - User-friendly validation errors
  • Demo: Add validation to a Store API (5 min)
  • In-Class Exploration: Validate an Items API (45 min)

Part 1: Prep Work Recap

Quick check-in on O'Reilly Chapter 5

What You Learned in Prep

  • Blueprints: Organize Flask routes into modules
  • MethodViews: Class-based views for cleaner code
  • Marshmallow: Define data schemas for validation
  • Flask-Smorest: Connect schemas to Flask routes
  • OpenAPI: Automatic API documentation
Key insight: Schemas define what valid data looks like. Instead of manual if checks, declare the rules once.

The Problem We're Solving

  • User sends: {"price": "expensive"}
  • Your API expects: {"price": 9.99}
  • Without validation: Crash, bad data, security holes
# Without validation - what could go wrong?
@app.route('/items', methods=['POST'])
def create_item():
    data = request.get_json()
    # Hope for the best!
    item = {"name": data["name"], "price": data["price"]}
    items.append(item)
    return jsonify(item), 201
Problems: Missing fields? Wrong types? Negative prices? This code trusts user input completely - never do this!

Quick Check: Marshmallow Vocabulary

Term     Definition                      Example
Schema   Blueprint for data structure    ItemSchema
Field    Single piece of data            fields.String()
load     Deserialize (JSON to Python)    Incoming request
dump     Serialize (Python to JSON)      Outgoing response
  • Load = when data comes IN (validate input)
  • Dump = when data goes OUT (format response)
  • Schemas work in both directions!

Quick Check: Flask-Smorest Decorators

from flask.views import MethodView
from flask_smorest import Blueprint

blp = Blueprint("items", __name__, description="Operations on items")

@blp.route("/items")
class ItemList(MethodView):
    @blp.response(200, ItemSchema(many=True))  # Output
    def get(self):
        return items

    @blp.arguments(ItemSchema)                  # Input
    @blp.response(201, ItemSchema)              # Output
    def post(self, item_data):
        # item_data is already validated!
        items.append(item_data)
        return item_data
  • @blp.arguments - validates incoming JSON
  • @blp.response - formats outgoing JSON
  • Validation happens before your function runs

Prep Recap: Key Takeaways

  • Schemas define rules: Required fields, types, constraints
  • Decorators apply schemas: @blp.arguments, @blp.response
  • Validation is automatic: Bad data = 422 error, good data = your function runs
  • Blueprints organize code: Group related routes together
The connection: In Week 3, our APIs trusted all input. In Week 5, we containerized them. In Week 6, we make them bulletproof.

Part 2: Why Validation Matters

Security, integrity, and user experience

The Three Pillars of Validation

1. Security

  • Prevent injection attacks
  • Block malformed payloads
  • Limit data sizes

2. Data Integrity

  • Correct types in database
  • Required fields present
  • Values within ranges

3. User Experience

  • Clear error messages
  • Fast feedback
  • Helpful guidance
Good validation: "Price must be a positive number"
Bad validation: "500 Internal Server Error"

What Happens Without Validation?

# No validation - user sends: {"price": "free"}
@app.route('/items', methods=['POST'])
def create_item():
    data = request.get_json()
    new_item = {
        "name": data["name"],      # KeyError if missing!
        "price": data["price"]     # "free" is not a number!
    }

    # Later, in another endpoint...
    total = sum(item["price"] for item in items)  # TypeError!
  • Missing field: KeyError crash (500 error)
  • Wrong type: Bad data saved, crashes later
  • Negative price: Business logic errors
  • Huge payload: Memory exhaustion, DoS
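
The "crashes later" failure mode can be reproduced without Flask. This is a small sketch with made-up data: the bad value is stored without complaint, and the error only surfaces when another piece of code does arithmetic on it. The cart_total helper is hypothetical.

```python
# One well-formed item and one that slipped in without validation.
items = [
    {"name": "Chair", "price": 49.99},
    {"name": "Gift card", "price": "free"},
]


def cart_total(stored_items):
    """Sum prices, assuming every price is a number."""
    return sum(item["price"] for item in stored_items)


try:
    cart_total(items)
    failure = None
except TypeError as exc:
    # The crash happens here, far away from the request that stored "free".
    failure = str(exc)
```

The point of the sketch: the endpoint that accepted the bad data never failed, so the bug is hard to trace back to its source.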

Real-World Horror Stories

  • The $0.00 Bug: E-commerce site accepted negative prices. Users "bought" items and got refunded more than they paid.
  • The JSON Bomb: API accepted nested JSON. Attacker sent 10GB of nested arrays. Server ran out of memory.
  • The Type Confusion: User ID accepted as string OR number. Led to authentication bypass.
  • The Missing Check: "Delete all items" endpoint didn't validate ownership. Anyone could delete anyone's data.
Industry wisdom: "Never trust user input" is rule #1 of backend development.

Manual Validation: The Hard Way

@app.route('/items', methods=['POST'])
def create_item():
    data = request.get_json()

    # Manual validation - tedious and error-prone
    if not data:
        return {"error": "No data provided"}, 400
    if "name" not in data:
        return {"error": "Name is required"}, 400
    if not isinstance(data["name"], str):
        return {"error": "Name must be a string"}, 400
    if len(data["name"]) > 100:
        return {"error": "Name too long"}, 400
    if "price" not in data:
        return {"error": "Price is required"}, 400
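    # bool is a subclass of int, so True/False must be rejected explicitly
    if isinstance(data["price"], bool) or not isinstance(data["price"], (int, float)):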
        return {"error": "Price must be a number"}, 400
    if data["price"] < 0:
        return {"error": "Price cannot be negative"}, 400

    # Finally, do the actual work...
    new_item = {"name": data["name"], "price": data["price"]}
    items.append(new_item)
    return jsonify(new_item), 201

Schema Validation: The Smart Way

from marshmallow import Schema, fields, validate

class ItemSchema(Schema):
    name = fields.Str(required=True, validate=validate.Length(max=100))
    price = fields.Float(required=True, validate=validate.Range(min=0))

# In your route:
@blp.arguments(ItemSchema)
@blp.response(201, ItemSchema)
def post(self, item_data):
    # item_data is already validated!
    items.append(item_data)
    return item_data
  • 5 lines replace 20 lines of if-statements
  • Declarative: Say what you want, not how to check it
  • Reusable: Same schema for input and output
  • Automatic errors: Flask-Smorest handles the 422 response

Part 3: Marshmallow Deep Dive

Schema definitions and field types

Basic Schema Structure

from marshmallow import Schema, fields

class ItemSchema(Schema):
    id = fields.Int(dump_only=True)        # Only in output
    name = fields.Str(required=True)        # Required in input
    price = fields.Float(required=True)
    description = fields.Str()              # Optional
    created_at = fields.DateTime(dump_only=True)
  • Schema class: Define fields as class attributes
  • Field types: Str, Int, Float, Bool, DateTime
  • required=True: Must be present in input
  • dump_only=True: Only appears in output (auto-generated IDs)
  • load_only=True: Only accepted in input (passwords)

Common Field Types

Field Type                   Python Type   Example
fields.Str()                 str           "hello"
fields.Int()                 int           42
fields.Float()               float         3.14
fields.Bool()                bool          true
fields.DateTime()            datetime      "2024-01-15T10:30:00"
fields.List(fields.Str())    list          ["a", "b", "c"]
fields.Nested(OtherSchema)   dict          {"name": "..."}

Validation Rules

from marshmallow import Schema, fields, validate

class ItemSchema(Schema):
    # Length constraints
    name = fields.Str(
        required=True,
        validate=validate.Length(min=1, max=100)
    )

    # Range constraints
    price = fields.Float(
        required=True,
        validate=validate.Range(min=0, max=10000)
    )

    # Choice from options
    category = fields.Str(
        validate=validate.OneOf(["electronics", "clothing", "food"])
    )

    # Regex pattern
    sku = fields.Str(
        validate=validate.Regexp(r'^[A-Z]{3}-\d{4}$')
    )

Custom Validators

from marshmallow import Schema, fields, validates, ValidationError

class ItemSchema(Schema):
    name = fields.Str(required=True)
    price = fields.Float(required=True)
    discount_price = fields.Float()

    @validates("name")
    def validate_name(self, value, **kwargs):
        """Custom validation for name field."""
        # **kwargs keeps this compatible with newer marshmallow releases,
        # which pass extra keyword arguments (such as data_key) to validators
        if value.lower() == "test":
            raise ValidationError("Name cannot be 'test'")
        if "  " in value:
            raise ValidationError("Name cannot have double spaces")

    @validates("discount_price")
    def validate_discount(self, value, **kwargs):
        """Discount price cannot be negative."""
        # Note: comparing discount_price against price is cross-field
        # validation, which a single-field validator cannot do.
        # For those cases, use @validates_schema
        if value < 0:
            raise ValidationError("Discount cannot be negative")

Nested Schemas

class ItemSchema(Schema):
    id = fields.Int(dump_only=True)
    name = fields.Str(required=True)
    price = fields.Float(required=True)

class StoreSchema(Schema):
    id = fields.Int(dump_only=True)
    name = fields.Str(required=True)
    # Nested items - one store has many items
    items = fields.List(fields.Nested(ItemSchema), dump_only=True)
// JSON output:
{
    "id": 1,
    "name": "My Store",
    "items": [
        {"id": 1, "name": "Chair", "price": 49.99},
        {"id": 2, "name": "Table", "price": 149.99}
    ]
}

Schema Options: dump_only vs load_only

dump_only=True

# Only in OUTPUT
id = fields.Int(dump_only=True)
created_at = fields.DateTime(dump_only=True)
  • Auto-generated values
  • Server-computed fields
  • Not accepted in input (by default, sending one is reported as an unknown field)

load_only=True

# Only in INPUT
password = fields.Str(load_only=True)
confirm_password = fields.Str(load_only=True)
  • Sensitive input
  • Never returned to user
  • Ignored in output
Rule of thumb: dump_only for IDs and timestamps. load_only for passwords and secrets.

Part 4: Flask-Smorest Integration

Connecting schemas to Flask routes

Setting Up Flask-Smorest

from flask import Flask
from flask_smorest import Api

app = Flask(__name__)

# Required configuration
app.config["API_TITLE"] = "Stores API"
app.config["API_VERSION"] = "v1"
app.config["OPENAPI_VERSION"] = "3.0.3"

# Optional: Enable Swagger UI
app.config["OPENAPI_URL_PREFIX"] = "/"
app.config["OPENAPI_SWAGGER_UI_PATH"] = "/swagger-ui"
app.config["OPENAPI_SWAGGER_UI_URL"] = "https://cdn.jsdelivr.net/npm/swagger-ui-dist/"

api = Api(app)

# Register blueprints
api.register_blueprint(items_blp)
api.register_blueprint(stores_blp)

Creating Blueprints

from flask_smorest import Blueprint
from flask.views import MethodView

# Create a blueprint for items
blp = Blueprint(
    "items",                    # Blueprint name
    __name__,                   # Import name
    description="Operations on items"  # For API docs
)

@blp.route("/items")
class ItemList(MethodView):
    def get(self):
        """Get all items."""
        pass

    def post(self):
        """Create a new item."""
        pass

@blp.route("/items/<int:item_id>")
class Item(MethodView):
    def get(self, item_id):
        """Get a specific item."""
        pass

The @blp.arguments Decorator

@blp.route("/items")
class ItemList(MethodView):
    @blp.arguments(ItemSchema)
    @blp.response(201, ItemSchema)
    def post(self, item_data):
        """Create a new item.

        item_data is already validated!
        - If invalid: 422 Unprocessable Entity (automatic)
        - If valid: this function runs with clean data
        """
        new_item = {
            "id": len(items) + 1,
            **item_data  # Already a dict with correct types
        }
        items.append(new_item)
        return new_item
  • Validates incoming JSON against ItemSchema
  • Passes validated data as the first argument after self (before URL parameters such as item_id)
  • Returns 422 with error details if invalid

The @blp.response Decorator

@blp.route("/items")
class ItemList(MethodView):
    @blp.response(200, ItemSchema(many=True))
    def get(self):
        """Get all items.

        Returns a list of items.
        many=True means "this returns a list, not a single object"
        """
        return items  # List of dicts

@blp.route("/items/<int:item_id>")
class Item(MethodView):
    @blp.response(200, ItemSchema)
    def get(self, item_id):
        """Get a specific item."""
        item = next((i for i in items if i["id"] == item_id), None)
        if not item:
            abort(404, message="Item not found")
        return item  # Single dict
  • Documents the response format in OpenAPI
  • many=True for lists, omit for single objects
  • Serializes output through the schema

Complete Example: Items API

# schemas.py
from marshmallow import Schema, fields, validate

class ItemSchema(Schema):
    id = fields.Int(dump_only=True)
    name = fields.Str(required=True, validate=validate.Length(min=1, max=100))
    price = fields.Float(required=True, validate=validate.Range(min=0))

class ItemUpdateSchema(Schema):
    name = fields.Str(validate=validate.Length(min=1, max=100))
    price = fields.Float(validate=validate.Range(min=0))
  • ItemSchema: For create (POST) - fields required
  • ItemUpdateSchema: For update (PUT) - fields optional
  • Separate schemas for different operations

Complete Example: Routes

# resources/item.py
from flask_smorest import Blueprint, abort
from flask.views import MethodView
from schemas import ItemSchema, ItemUpdateSchema

blp = Blueprint("items", __name__, description="Operations on items")
items = []

@blp.route("/items")
class ItemList(MethodView):
    @blp.response(200, ItemSchema(many=True))
    def get(self):
        return items

    @blp.arguments(ItemSchema)
    @blp.response(201, ItemSchema)
    def post(self, item_data):
        item = {"id": len(items) + 1, **item_data}
        items.append(item)
        return item

@blp.route("/items/<int:item_id>")
class Item(MethodView):
    @blp.response(200, ItemSchema)
    def get(self, item_id):
        item = next((i for i in items if i["id"] == item_id), None)
        if not item:
            abort(404, message="Item not found")
        return item

    @blp.arguments(ItemUpdateSchema)
    @blp.response(200, ItemSchema)
    def put(self, item_data, item_id):
        item = next((i for i in items if i["id"] == item_id), None)
        if not item:
            abort(404, message="Item not found")
        item.update(item_data)
        return item

Automatic API Documentation

  • Flask-Smorest generates OpenAPI 3.0 documentation
  • Schemas become request/response schemas in docs
  • Decorators add metadata automatically
  • Swagger UI provides interactive testing
Visit: http://localhost:5000/swagger-ui
- See all endpoints documented
- Try requests directly from the browser
- View request/response schemas

Part 5: Error Handling

User-friendly validation errors

Default Validation Errors

# Request with missing field
curl -X POST http://localhost:5000/items \
  -H "Content-Type: application/json" \
  -d '{"name": "Chair"}'  # Missing price!
// Response: 422 Unprocessable Entity
{
    "code": 422,
    "errors": {
        "json": {
            "price": ["Missing data for required field."]
        }
    },
    "status": "Unprocessable Entity"
}
  • Flask-Smorest returns structured error responses
  • errors.json contains field-by-field errors
  • Multiple errors returned at once

Multiple Validation Errors

# Request with multiple issues
curl -X POST http://localhost:5000/items \
  -H "Content-Type: application/json" \
  -d '{"price": -10}'  # Missing name AND negative price!
// Response: 422 Unprocessable Entity
{
    "code": 422,
    "errors": {
        "json": {
            "name": ["Missing data for required field."],
            "price": ["Must be greater than or equal to 0."]
        }
    },
    "status": "Unprocessable Entity"
}
  • All validation errors reported together
  • User can fix everything in one attempt
  • Better UX than "fix one, find another"

Custom Error Messages

from marshmallow import Schema, fields, validate

class ItemSchema(Schema):
    name = fields.Str(
        required=True,
        validate=validate.Length(min=1, max=100),
        error_messages={
            "required": "Item name is required.",
            "null": "Item name cannot be null.",
        }
    )
    price = fields.Float(
        required=True,
        validate=validate.Range(
            min=0,
            error="Price cannot be negative."
        ),
        error_messages={
            "required": "Price is required.",
            "invalid": "Price must be a valid number.",
        }
    )
  • error_messages dict for field-level messages
  • error parameter on validators for custom messages
  • Match your application's tone and language

Using abort() for Business Logic Errors

from flask_smorest import abort

@blp.route("/items/<int:item_id>")
class Item(MethodView):
    @blp.response(200, ItemSchema)
    def get(self, item_id):
        item = next((i for i in items if i["id"] == item_id), None)
        if not item:
            abort(404, message="Item not found")
        return item

    def delete(self, item_id):
        item = next((i for i in items if i["id"] == item_id), None)
        if not item:
            abort(404, message="Item not found")
        if item.get("protected"):
            abort(403, message="Cannot delete protected items")
        items.remove(item)
        return "", 204
  • abort() for non-validation errors (404, 403, etc.)
  • message parameter for error details
  • Returns JSON error response automatically

Best Practices Summary

  • Validate at the boundary: APIs are where external data enters
  • Use schemas, not if-statements: Declarative, reusable, testable
  • Separate input/output schemas: Create vs Update, User vs UserWithPassword
  • Return all errors at once: Better UX than one-at-a-time
  • Custom messages for UX: "Price required" beats "Missing data for required field"
  • Use abort() for business errors: 404, 403, 409 for logic issues

Summary

Key takeaways from today

What You Learned Today

  • Why validation matters: Security, integrity, UX
  • Marshmallow schemas: Declarative field definitions
  • Field options: required, dump_only, load_only, validate
  • Flask-Smorest: @blp.arguments, @blp.response
  • Error handling: Automatic 422s, custom messages, abort()
Key insight: Schemas turn validation from tedious if-statements into clean, declarative rules.

Looking Ahead

Week   Topic                     Building On
7      SQLAlchemy + Migrations   Store validated data in PostgreSQL
8      Async Task Queues         Background processing with Celery
9      System Design             Scalability, reliability, caching
Assignment 1: Due before Friday, 3/6 lab!
Friday: Lab 3: Validation
Prep for Week 7: O'Reilly Chapter 6 (SQLAlchemy) + Chapter 9 (Migrations)

Live Demo

Add validation to a Store API

Demo: What We'll Build

  • Start with a Flask Stores API (no validation)
  • Create ItemSchema and StoreSchema
  • Add @blp.arguments and @blp.response
  • Test with valid and invalid data
  • See automatic error responses
  • Visit Swagger UI for documentation
Watch along: I'll type everything live. You'll practice in the in-class exploration.

In-Class Exploration

Add Validation to an Items API

Project Overview

  • You receive a working Flask Items/Stores API (no validation)
  • Your job: add Marshmallow schemas and Flask-Smorest!
  • Tasks:
    • Write ItemSchema and StoreSchema (15 min)
    • Add @blp.arguments decorators (10 min)
    • Add @blp.response decorators (10 min)
    • Test validation and submit (10 min)
Focus: This is about validation, not Flask routing. The routes are provided and working.

Getting Started

  1. Click the GitHub Classroom link (in bCourses)
  2. Clone your repo:
    git clone <your-repo-url>
    cd in-class-exploration-week-6-<your-username>
  3. Set up and run:
    python3 -m venv venv
    source venv/bin/activate  # Windows: venv\Scripts\activate
    pip install -r requirements.txt
    flask run
  4. Test the API works (no validation yet):
    curl http://localhost:5000/items
    curl -X POST http://localhost:5000/items \
      -H "Content-Type: application/json" \
      -d '{"anything": "works"}'  # No validation!

Testing Your Validation

# Valid request - should succeed
curl -X POST http://localhost:5000/items \
  -H "Content-Type: application/json" \
  -d '{"name": "Chair", "price": 49.99}'

# Invalid request - should return 422
curl -X POST http://localhost:5000/items \
  -H "Content-Type: application/json" \
  -d '{"name": "Chair"}'  # Missing price!

# Check Swagger UI (open is macOS-only; on Linux use xdg-open, or paste the URL into a browser)
open http://localhost:5000/swagger-ui

Submitting (at 10:45 AM)

  1. Commit everything you have (even if incomplete):
    git add .
    git commit -m "In-class exploration submission"
  2. Push to your repository:
    git push
  3. Submit on bCourses: Add your GitHub repo URL
Grading: Pass/No Pass based on engagement. Just show you worked on it!

Want to keep working? Continue after submitting - just push additional changes.

Questions?

 

Website: groups.ischool.berkeley.edu/i253/sp26

Email: kay@ischool.berkeley.edu