Activation functions play a crucial role in neural networks by introducing non-linearity into the model, allowing it to learn complex patterns. In this section, we will explore various activation functions, their properties, and their applications.
Key Concepts
What is an Activation Function?
- An activation function determines whether, and how strongly, a neuron is activated: the neuron first computes the weighted sum of its inputs plus a bias, and the activation function is then applied to that result (see the sketch after this list).
- It introduces non-linearity into the output of a neuron, enabling the network to learn complex patterns.
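The minimal sketch below illustrates where the activation fits in a single neuron's computation. The input values, weights, and choice of sigmoid are purely illustrative, not taken from the text above.

```python
import torch

# One "neuron": weighted sum of inputs plus bias, then a non-linearity.
x = torch.tensor([0.5, -1.0, 2.0])   # inputs (illustrative values)
w = torch.tensor([0.8, 0.3, -0.5])   # weights
b = torch.tensor(0.1)                # bias

z = torch.dot(w, x) + b              # weighted sum + bias (pre-activation)
a = torch.sigmoid(z)                 # activation function introduces non-linearity
print(z, a)
```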
Types of Activation Functions
- Linear Activation Function
- Non-Linear Activation Functions
  - Sigmoid
  - Tanh
  - ReLU (Rectified Linear Unit)
  - Leaky ReLU
  - Softmax
Linear Activation Function
Definition
A linear activation function leaves the input unchanged; in its simplest form it is the identity function, so the output is directly proportional to the input.
Formula
\[ f(x) = x \]
Code Example
```python
import torch
import torch.nn as nn

# Linear (identity) activation function
class LinearActivation(nn.Module):
    def forward(self, x):
        return x

# Example usage
x = torch.tensor([1.0, 2.0, 3.0])
linear_activation = LinearActivation()
output = linear_activation(x)
print(output)  # Output: tensor([1., 2., 3.])
```
Drawbacks
- Cannot handle complex patterns due to its linear nature.
- All layers in the network would collapse into a single layer.
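The collapse claim can be checked directly: stacking two linear layers with no non-linearity in between is equivalent to a single linear layer whose weight and bias are the composed parameters. A minimal sketch, with layer sizes chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc1 = nn.Linear(4, 8)
fc2 = nn.Linear(8, 3)

x = torch.randn(5, 4)

# Two stacked linear layers with no activation in between...
y_stacked = fc2(fc1(x))

# ...equal one linear layer with composed weight and bias.
W = fc2.weight @ fc1.weight                   # shape (3, 4)
b = fc2.weight @ fc1.bias + fc2.bias          # shape (3,)
y_single = x @ W.T + b

print(torch.allclose(y_stacked, y_single, atol=1e-5))  # True
```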
Non-Linear Activation Functions
Sigmoid
Definition
The sigmoid function maps any input to a value between 0 and 1.
Formula
\[ f(x) = \frac{1}{1 + e^{-x}} \]
Code Example
```python
import torch
import torch.nn as nn

# Sigmoid activation function
sigmoid = nn.Sigmoid()

# Example usage
x = torch.tensor([1.0, 2.0, 3.0])
output = sigmoid(x)
print(output)  # Output: tensor([0.7311, 0.8808, 0.9526])
```
Properties
- Smooth gradient, preventing jumps in output values.
- Output range: (0, 1)
- Commonly used in binary classification problems.
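As a small illustration of that last point, the sigmoid maps a model's raw output (a logit) to a probability that can be thresholded or passed to a binary cross-entropy loss. The logits and labels below are made up for the example:

```python
import torch
import torch.nn as nn

logits = torch.tensor([2.0, -1.0, 0.3])   # raw model outputs (illustrative)
targets = torch.tensor([1.0, 0.0, 1.0])   # binary labels

probs = torch.sigmoid(logits)             # probabilities in (0, 1)
preds = (probs > 0.5).float()             # hard class predictions

# In practice, BCEWithLogitsLoss is preferred over sigmoid + BCELoss:
# it applies the sigmoid internally in a numerically stable way.
loss = nn.BCEWithLogitsLoss()(logits, targets)
print(probs, preds, loss.item())
```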
Tanh
Definition
The tanh function maps any input to a value between -1 and 1.
Formula
\[ f(x) = \tanh(x) = \frac{2}{1 + e^{-2x}} - 1 \]
Code Example
```python
import torch
import torch.nn as nn

# Tanh activation function
tanh = nn.Tanh()

# Example usage
x = torch.tensor([1.0, 2.0, 3.0])
output = tanh(x)
print(output)  # Output: tensor([0.7616, 0.9640, 0.9951])
```
Properties
- Zero-centered, making optimization easier.
- Output range: (-1, 1)
- Often used in hidden layers of neural networks.
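A quick numerical check of the formula above and of the zero-centered property, using arbitrary sample values:

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# tanh(x) equals 2 / (1 + e^(-2x)) - 1, i.e. a rescaled, shifted sigmoid.
lhs = torch.tanh(x)
rhs = 2.0 / (1.0 + torch.exp(-2.0 * x)) - 1.0
print(torch.allclose(lhs, rhs))          # True

# Zero-centered: tanh is an odd function, symmetric around 0.
print(torch.tanh(-x) + torch.tanh(x))    # approximately 0 everywhere
```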
ReLU (Rectified Linear Unit)
Definition
The ReLU function outputs the input directly if it is positive; otherwise, it outputs zero.
Formula
\[ f(x) = \max(0, x) \]
Code Example
```python
import torch
import torch.nn as nn

# ReLU activation function
relu = nn.ReLU()

# Example usage
x = torch.tensor([-1.0, 0.0, 1.0, 2.0])
output = relu(x)
print(output)  # Output: tensor([0., 0., 1., 2.])
```
Properties
- Computationally efficient.
- Helps mitigate the vanishing gradient problem.
- Output range: [0, ∞)
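To make the vanishing-gradient point concrete, the sketch below (with arbitrary input values) compares the gradients of sigmoid and ReLU: the sigmoid gradient shrinks toward zero as the input grows, while the ReLU gradient stays at 1 for every positive input.

```python
import torch

x = torch.tensor([0.5, 5.0, 10.0], requires_grad=True)

# Sigmoid saturates, so its gradient vanishes for large inputs.
torch.sigmoid(x).sum().backward()
print(x.grad)          # gradients shrink toward zero as x grows

x.grad = None

# ReLU's gradient is exactly 1 for every positive input.
torch.relu(x).sum().backward()
print(x.grad)          # tensor([1., 1., 1.])
```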
Leaky ReLU
Definition
Leaky ReLU allows a small, non-zero gradient when the input is negative.
Formula
\[ f(x) = \begin{cases}
x & \text{if } x > 0 \\
\alpha x & \text{if } x \leq 0
\end{cases} \]
Code Example
```python
import torch
import torch.nn as nn

# Leaky ReLU activation function with a small negative slope (alpha = 0.01)
leaky_relu = nn.LeakyReLU(negative_slope=0.01)

# Example usage
x = torch.tensor([-1.0, 0.0, 1.0, 2.0])
output = leaky_relu(x)
print(output)  # Output: tensor([-0.0100, 0.0000, 1.0000, 2.0000])
```
Properties
- Helps mitigate the dying ReLU problem.
- Output range: (-∞, ∞)
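The sketch below (illustrative values) contrasts the two gradients on negative inputs: ReLU's gradient is zero there, so a neuron stuck in the negative region stops learning ("dies"), while Leaky ReLU keeps a small gradient of α.

```python
import torch
import torch.nn as nn

x = torch.tensor([-3.0, -1.0, 2.0], requires_grad=True)

# ReLU: gradient is 0 for negative inputs, which can "kill" a neuron.
nn.ReLU()(x).sum().backward()
print(x.grad)          # tensor([0., 0., 1.])

x.grad = None

# Leaky ReLU: gradient is alpha (here 0.01) for negative inputs, 1 otherwise.
nn.LeakyReLU(negative_slope=0.01)(x).sum().backward()
print(x.grad)          # tensor([0.0100, 0.0100, 1.0000])
```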
Softmax
Definition
The softmax function converts logits into probabilities, making it useful for multi-class classification problems.
Formula
\[ f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \]
Code Example
```python
import torch
import torch.nn as nn

# Softmax activation function over dimension 0 (a single vector of logits)
softmax = nn.Softmax(dim=0)

# Example usage
x = torch.tensor([1.0, 2.0, 3.0])
output = softmax(x)
print(output)  # Output: tensor([0.0900, 0.2447, 0.6652])
```
Properties
- Outputs a probability distribution.
- Output range: (0, 1)
- Sum of outputs is 1.
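These properties are easy to verify. Note that for a batch of logits the softmax is usually taken over the class dimension (dim=1); the logits below are illustrative:

```python
import torch
import torch.nn as nn

# Batch of 2 samples, 3 classes each (illustrative logits).
logits = torch.tensor([[1.0, 2.0, 3.0],
                       [0.5, 0.5, 0.5]])

probs = nn.Softmax(dim=1)(logits)   # softmax over the class dimension
print(probs)                        # each entry lies in (0, 1)
print(probs.sum(dim=1))             # tensor([1., 1.]) -- each row sums to 1
```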
Practical Exercise
Task
Implement a simple neural network using PyTorch that uses different activation functions in its layers. Train the network on a small dataset and observe the effects of different activation functions.
Solution
```python
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the dataset
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
y_test = torch.tensor(y_test, dtype=torch.long)

# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(20, 50)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(50, 2)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        # Return raw logits: CrossEntropyLoss applies log-softmax internally,
        # so an explicit Softmax layer here would be redundant.
        return x

# Initialize the model, loss function, and optimizer
model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 20
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 5 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# Evaluate the model
model.eval()
with torch.no_grad():
    outputs = model(X_test)
    _, predicted = torch.max(outputs, 1)
    accuracy = (predicted == y_test).sum().item() / y_test.size(0)
    print(f'Accuracy: {accuracy * 100:.2f}%')
```
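To actually compare activation functions as the task asks, one option (not part of the solution above) is to pass the activation in as a constructor argument and train the same architecture once per choice. A minimal sketch, assuming the imports and the X_train/X_test/y_train/y_test tensors defined above; ConfigurableNN is a hypothetical name introduced here for illustration:

```python
# Hypothetical variant of SimpleNN that takes the activation as an argument.
class ConfigurableNN(nn.Module):
    def __init__(self, activation):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(20, 50),
            activation,
            nn.Linear(50, 2),   # raw logits; CrossEntropyLoss handles the softmax
        )

    def forward(self, x):
        return self.net(x)

# Train and evaluate once per activation, then compare accuracies.
for name, act in [("ReLU", nn.ReLU()), ("Tanh", nn.Tanh()), ("LeakyReLU", nn.LeakyReLU())]:
    model = ConfigurableNN(act)
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    criterion = nn.CrossEntropyLoss()
    for _ in range(20):
        optimizer.zero_grad()
        loss = criterion(model(X_train), y_train)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        acc = (model(X_test).argmax(dim=1) == y_test).float().mean().item()
    print(f"{name}: accuracy = {acc * 100:.2f}%")
```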
Summary
In this section, we covered the importance of activation functions in neural networks and explored various types, including linear and non-linear activation functions. We provided practical examples and a hands-on exercise to solidify your understanding. In the next section, we will delve into loss functions and optimization techniques to further enhance your neural network models.