In this section, we will learn how to build a Convolutional Neural Network (CNN) from scratch using PyTorch. CNNs are particularly effective for image recognition tasks due to their ability to capture spatial hierarchies in images.

Key Concepts

  1. Convolutional Layers: These layers apply convolution operations to the input, which helps in detecting features such as edges, textures, and patterns.
  2. Pooling Layers: These layers reduce the spatial dimensions of the input, which helps in reducing the computational load and controlling overfitting.
  3. Fully Connected Layers: These layers are used at the end of the network to make predictions based on the features extracted by the convolutional and pooling layers.
  4. Activation Functions: Functions like ReLU (Rectified Linear Unit) introduce non-linearity into the model, enabling it to learn complex patterns.
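
To make these layers concrete, here is a small, self-contained sketch (our own illustration, not part of the tutorial code) that passes a dummy MNIST-sized tensor through a convolution, a pooling layer, and a linear layer, printing the resulting shape:

import torch
import torch.nn as nn

# Dummy batch: one grayscale 28x28 image (MNIST-sized)
x = torch.randn(1, 1, 28, 28)

conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.relu(conv(x))    # -> (1, 32, 28, 28): padding=1 preserves spatial size
x = pool(x)                # -> (1, 32, 14, 14): pooling halves height and width
x = x.view(x.size(0), -1)  # -> (1, 32 * 14 * 14): flatten for the linear layer
out = nn.Linear(32 * 14 * 14, 10)(x)  # -> (1, 10): one score per class
print(out.shape)  # torch.Size([1, 10])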

Step-by-Step Guide

Step 1: Import Libraries

First, we need to import the necessary libraries.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms

Step 2: Define the CNN Architecture

We will define a simple CNN with the following architecture:

  • Two convolutional layers
  • Two pooling layers
  • Two fully connected layers

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 7 * 7)  # Flatten the tensor
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
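
The 64 * 7 * 7 input size of fc1 follows from tracking the spatial dimensions: each MNIST image is 28x28, the first pooling layer halves that to 14x14, and the second halves it again to 7x7, leaving 64 channels of 7x7 feature maps. If you want to verify this, a quick shape check (an optional snippet, not part of the training script) looks like:

net = SimpleCNN()
dummy = torch.randn(1, 1, 28, 28)         # one MNIST-sized image
out = net.pool(F.relu(net.conv1(dummy)))  # -> (1, 32, 14, 14)
out = net.pool(F.relu(net.conv2(out)))    # -> (1, 64, 7, 7)
print(out.shape)  # torch.Size([1, 64, 7, 7]): 64 * 7 * 7 = 3136 inputs to fc1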

Step 3: Load and Preprocess Data

We will use the MNIST dataset for this example. The images are downloaded automatically, converted to tensors, and normalized with MNIST's mean (0.1307) and standard deviation (0.3081).

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=1000, shuffle=False)
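
Before training, it is worth confirming that a batch has the shape the model expects. This optional sanity check (our addition) pulls a single batch from train_loader:

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 1, 28, 28]): 64 grayscale 28x28 images
print(labels.shape)  # torch.Size([64]): one integer label (0-9) per image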

Step 4: Define Loss Function and Optimizer

We will use Cross-Entropy Loss and the Adam optimizer.

model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
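
This tutorial runs on the CPU. If a GPU is available, a common pattern (shown here as an optional variation) is to move the model and each batch of data to the device:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
# Inside the training and evaluation loops, each batch would then also be moved:
# images, labels = images.to(device), labels.to(device)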

Step 5: Train the Model

We will train the model for five epochs, printing the average loss over all batches at the end of each epoch.

num_epochs = 5

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')
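
After training, you may want to persist the learned weights so the model can be reused without retraining. A minimal sketch (the filename is our choice):

torch.save(model.state_dict(), 'simple_cnn.pth')

# To restore the weights later:
restored = SimpleCNN()
restored.load_state_dict(torch.load('simple_cnn.pth'))
restored.eval()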

Step 6: Evaluate the Model

We will evaluate the model on the test dataset to check its performance.

model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the model on the test images: {100 * correct / total:.2f}%')
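
Aggregate accuracy hides individual mistakes, so it can also be instructive to classify a single test image. The snippet below is an illustrative addition:

image, label = test_dataset[0]          # first test image and its true label
with torch.no_grad():
    output = model(image.unsqueeze(0))  # add a batch dimension: (1, 1, 28, 28)
    prediction = output.argmax(dim=1).item()
print(f'Predicted: {prediction}, Actual: {label}')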

Practical Exercises

Exercise 1: Modify the CNN Architecture

Modify the SimpleCNN class to include an additional convolutional layer and observe how it affects the model's performance.

Solution

class ModifiedCNN(nn.Module):
    def __init__(self):
        super(ModifiedCNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(128 * 3 * 3, 128)
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 128 * 3 * 3)  # Flatten the tensor
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
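
Note the new flattened size: three rounds of 2x2 pooling take the spatial dimensions from 28 to 14 to 7 to 3 (MaxPool2d floors odd sizes, so 7 // 2 = 3), leaving 128 channels of 3x3 feature maps and hence 128 * 3 * 3 inputs to fc1.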

Exercise 2: Implement Dropout

Add dropout layers to the SimpleCNN class to prevent overfitting and observe the changes in performance.

Solution

class DropoutCNN(nn.Module):
    def __init__(self):
        super(DropoutCNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)
        self.dropout = nn.Dropout(0.5)
    
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 7 * 7)  # Flatten the tensor
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x
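
Keep in mind that dropout is only active in training mode: model.train() enables it and model.eval() disables it, which is why the evaluation loop in Step 6 calls model.eval() before measuring accuracy. With p=0.5, each element of fc1's output is zeroed with probability 0.5 during training, and the surviving elements are scaled up to compensate.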

Conclusion

In this section, we have learned how to build a simple CNN from scratch using PyTorch. We covered the key components of a CNN, including convolutional layers, pooling layers, and fully connected layers. We also walked through the process of loading and preprocessing data, defining the model architecture, training the model, and evaluating its performance. Finally, we provided practical exercises to reinforce these concepts.

In the next section, we will explore transfer learning with pre-trained models, which can significantly speed up the training process and improve performance on complex tasks.
