Introduction

Autograd is PyTorch's automatic differentiation engine, the machinery that powers neural network training: it records the operations performed on tensors and uses that record to compute gradients, which are essential for optimizing network parameters. This module covers the basics of autograd, how to use it, and practical examples to solidify your understanding.

Key Concepts

  1. Tensors and Gradients

  • Tensors: The fundamental building blocks in PyTorch, similar to NumPy arrays but with support for GPU acceleration and gradient tracking.
  • Gradients: Partial derivatives of a scalar value, typically a loss, with respect to tensors. Autograd stores them in each tensor's .grad attribute.

  2. Computational Graph

  • Computational Graph: A directed acyclic graph where nodes represent operations and edges represent tensors. PyTorch dynamically builds this graph as operations are performed.

  3. Backpropagation

  • Backpropagation: The process of computing gradients by traversing the computational graph in reverse, from the output back to the inputs (see the short snippet after this list).
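
To make the last two concepts concrete, the following snippet (a minimal sketch) builds a tiny graph and inspects the grad_fn attribute that records each operation; backward() then traverses this graph in reverse.

import torch

# Operations on tensors that require gradients are recorded in the graph
a = torch.tensor(2.0, requires_grad=True)
b = a * 3      # recorded as a multiplication node
c = b + 1      # recorded as an addition node

# Each result remembers the operation that created it
print(c.grad_fn)                  # e.g. <AddBackward0 object at ...>
print(c.grad_fn.next_functions)   # links back to the multiplication node

# backward() walks the graph from c back to a
c.backward()
print(a.grad)   # dc/da = 3.0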

Practical Examples

Example 1: Basic Tensor Operations with Autograd

import torch

# Create tensors
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

# Perform operations
z = x * y + y**2

# Compute gradients
z.backward()

# Print gradients
print(f"Gradient of x: {x.grad}")
print(f"Gradient of y: {y.grad}")

Explanation:

  1. requires_grad=True tells PyTorch to track operations on these tensors.
  2. z.backward() computes the gradients of z with respect to x and y.
  3. x.grad and y.grad hold the computed gradients.
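
As a hand check: z = x*y + y**2, so dz/dx = y = 3 and dz/dy = x + 2*y = 2 + 6 = 8. The script should therefore print 3.0 for x.grad and 8.0 for y.grad.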

Example 2: Using Autograd with a Simple Neural Network

import torch
import torch.nn as nn

# Define a simple linear model
model = nn.Linear(1, 1)

# Input tensor
input_tensor = torch.tensor([[1.0]], requires_grad=True)

# Forward pass
output = model(input_tensor)

# Define a simple loss function
loss = (output - 2.0)**2

# Backward pass
loss.backward()

# Print gradients
print(f"Gradient of input_tensor: {input_tensor.grad}")
print(f"Gradient of model weight: {model.weight.grad}")
print(f"Gradient of model bias: {model.bias.grad}")

Explanation:

  1. A simple linear model is defined using nn.Linear.
  2. The forward pass computes the output.
  3. A loss function is defined as the squared difference between the output and a target value.
  4. loss.backward() computes the gradients of the loss with respect to the input tensor and model parameters.
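
In practice you would normally let an optimizer apply the computed gradients rather than update parameters by hand. Continuing from the example above, a minimal sketch (the choice of SGD and the learning rate are illustrative) looks like this:

import torch.optim as optim

# Create an optimizer over the model's parameters
optimizer = optim.SGD(model.parameters(), lr=0.01)

# After loss.backward(), apply the gradients and then clear them
optimizer.step()        # updates model.weight and model.bias in place
optimizer.zero_grad()   # resets the .grad buffers for the next iteration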

Exercises

Exercise 1: Compute Gradients for a Polynomial Function

Task:

  1. Create a tensor x with value 3.0 and set requires_grad=True.
  2. Define a polynomial function y = 3x^3 + 2x^2 + x.
  3. Compute the gradient of y with respect to x.

Solution:

import torch

# Create tensor
x = torch.tensor(3.0, requires_grad=True)

# Define polynomial function
y = 3 * x**3 + 2 * x**2 + x

# Compute gradient
y.backward()

# Print gradient
print(f"Gradient of x: {x.grad}")

Explanation:

  1. The tensor x is created with requires_grad=True.
  2. The polynomial function y is defined.
  3. y.backward() computes the gradient of y with respect to x.
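
As a hand check: dy/dx = 9*x**2 + 4*x + 1, which at x = 3 is 81 + 12 + 1 = 94, so the script should print 94.0 for x.grad.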

Exercise 2: Gradient Descent Step

Task:

  1. Create a tensor w with value 1.0 and set requires_grad=True.
  2. Define a simple quadratic function loss = (w - 5)**2.
  3. Perform a gradient descent step to update w.

Solution:

import torch

# Create tensor
w = torch.tensor(1.0, requires_grad=True)

# Define loss function
loss = (w - 5)**2

# Compute gradient
loss.backward()

# Perform gradient descent step
learning_rate = 0.1
with torch.no_grad():
    w -= learning_rate * w.grad

# Print updated value of w
print(f"Updated value of w: {w}")

Explanation:

  1. The tensor w is created with requires_grad=True.
  2. The loss function is defined.
  3. loss.backward() computes the gradient of the loss with respect to w.
  4. A gradient descent step is performed to update w.
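
As a hand check: d(loss)/dw = 2*(w - 5) = 2*(1 - 5) = -8, so the update gives w = 1.0 - 0.1 * (-8) = 1.8, which is what the script should print. In a full training loop you would also call w.grad.zero_() before the next backward pass, as discussed in the next section.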

Common Mistakes and Tips

  • Forgetting requires_grad=True: Ensure that tensors for which you need gradients have requires_grad=True.
  • Clearing Gradients: Gradients accumulate by default. Use optimizer.zero_grad() or tensor.grad.zero_() to clear gradients before the next backward pass.
  • Using torch.no_grad(): Use this context manager to perform operations that should not track gradients, such as during model evaluation or updating parameters manually (a minimal loop combining these tips follows below).
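
Putting these tips together, here is a minimal sketch of a manual training loop for the quadratic loss from Exercise 2 (the learning rate and number of steps are illustrative):

import torch

w = torch.tensor(1.0, requires_grad=True)
learning_rate = 0.1

for step in range(20):
    loss = (w - 5)**2            # forward pass rebuilds the graph each iteration
    loss.backward()              # accumulates the gradient into w.grad

    with torch.no_grad():        # the parameter update must not be tracked
        w -= learning_rate * w.grad

    w.grad.zero_()               # clear the gradient before the next pass

print(f"Final value of w: {w.item():.4f}")  # approaches 5.0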

Conclusion

In this section, you learned about PyTorch's autograd functionality, which is crucial for training neural networks. You explored basic tensor operations, computational graphs, and backpropagation. Practical examples and exercises helped reinforce these concepts. In the next module, you will dive into building neural networks using PyTorch.
