Recurrent Neural Networks (RNNs) are a class of neural networks that are particularly effective for sequential data. Unlike traditional feedforward neural networks, RNNs have connections that form directed cycles, allowing them to maintain a 'memory' of previous inputs. This makes them well-suited for tasks where the order of the data is important, such as time series forecasting, natural language processing, and speech recognition.

Key Concepts

  1. Sequential Data: Data where the order of elements is significant. Examples include time series data, text, and audio.
  2. Recurrent Connections: Connections that feed a layer's hidden state from one time step back into the same layer at the next time step, enabling the network to retain information from previous time steps.
  3. Hidden State: A vector that captures information from previous time steps and is updated at each time step.
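  4. Backpropagation Through Time (BPTT): A variant of backpropagation used to train RNNs. The network is unrolled across the time steps of the sequence, and gradients are propagated backward through the unrolled copies, which all share the same weights.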
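As a rough illustration of items 3 and 4, the sketch below runs PyTorch's built-in nn.RNN over a short random sequence, defines a toy loss on the last time step only, and calls backward(). The sizes, the seed, and the loss are assumptions chosen purely for illustration. The point is that the gradient reaches the inputs at every earlier time step, which is what BPTT provides.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # illustrative seed, not part of the original text

# Assumed toy sizes: 3 input features, 4 hidden units, 6 time steps
rnn = nn.RNN(input_size=3, hidden_size=4, batch_first=True)
x = torch.randn(1, 6, 3, requires_grad=True)  # (batch, seq_len, input_size)

out, h_n = rnn(x)            # out holds the hidden state at every time step
loss = out[:, -1, :].sum()   # toy loss that depends only on the last step
loss.backward()              # gradients flow backward through all time steps

# Gradient magnitude with respect to the input at each time step
grad_per_step = x.grad.norm(dim=-1).squeeze(0)  # shape: (6,)
print(grad_per_step)
```

Even though the loss only looks at the final step, the gradient at the first step is nonzero, because the last hidden state depends on every earlier input through the recurrence.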

Basic Structure of an RNN

An RNN processes input data one element at a time, maintaining a hidden state that is updated at each time step. The basic structure can be summarized as follows:

  1. Input Sequence: \( x_1, x_2, \ldots, x_T \)
  2. Hidden State: \( h_t \) at time step \( t \)
  3. Output Sequence: \( y_1, y_2, \ldots, y_T \)

The hidden state \( h_t \) is computed using the current input \( x_t \) and the previous hidden state \( h_{t-1} \):

\[ h_t = f(W_{hx} x_t + W_{hh} h_{t-1} + b_h) \]

where:


  • \( W_{hx} \) is the weight matrix for the input.
  • \( W_{hh} \) is the weight matrix for the hidden state.
  • \( b_h \) is the bias term.
  • \( f \) is the activation function (commonly \( \tanh \) or \( \text{ReLU} \)).
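The recurrence above can be written out by hand in a few lines. This is a minimal sketch, assuming \( f = \tanh \), small random weights, and toy sizes chosen only for illustration; the variable names mirror the symbols in the formula.

```python
import torch

torch.manual_seed(0)  # illustrative seed

input_size, hidden_size, T = 3, 4, 5  # assumed toy dimensions

# Parameters named after the symbols in the formula
W_hx = torch.randn(hidden_size, input_size) * 0.1
W_hh = torch.randn(hidden_size, hidden_size) * 0.1
b_h = torch.zeros(hidden_size)

xs = torch.randn(T, input_size)  # input sequence x_1 ... x_T
h = torch.zeros(hidden_size)     # initial hidden state h_0

hidden_states = []
for x_t in xs:
    # h_t = tanh(W_hx x_t + W_hh h_{t-1} + b_h)
    h = torch.tanh(W_hx @ x_t + W_hh @ h + b_h)
    hidden_states.append(h)

hidden_states = torch.stack(hidden_states)  # (T, hidden_size)
print(hidden_states.shape)
```

The same weights are reused at every step; only the hidden state changes as the loop moves through the sequence.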

The output \( y_t \) is computed using the hidden state \( h_t \):

\[ y_t = g(W_{hy} h_t + b_y) \]

where:


  • \( W_{hy} \) is the weight matrix for the output.
  • \( b_y \) is the bias term.
  • \( g \) is the activation function (commonly \( \text{softmax} \) for classification tasks).
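For a classification setting, the output step might look like the following sketch. It assumes \( g = \text{softmax} \), a made-up hidden state, and three classes; none of these values come from the text above.

```python
import torch

torch.manual_seed(0)  # illustrative seed

hidden_size, num_classes = 4, 3  # assumed toy dimensions

W_hy = torch.randn(num_classes, hidden_size) * 0.1
b_y = torch.zeros(num_classes)

h_t = torch.tanh(torch.randn(hidden_size))  # stand-in for a hidden state

# y_t = softmax(W_hy h_t + b_y): a probability distribution over classes
y_t = torch.softmax(W_hy @ h_t + b_y, dim=0)
print(y_t, y_t.sum())
```

When training with nn.CrossEntropyLoss in PyTorch, the softmax is usually left out of the model, because that loss applies log-softmax to the raw scores internally.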

Practical Example: Implementing a Simple RNN in PyTorch

Let's implement a simple RNN in PyTorch to understand how it works in practice.

Step 1: Import Libraries

import torch
import torch.nn as nn
import torch.optim as optim

Step 2: Define the RNN Model

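class SimpleRNN(nn.Module):
    """Single-layer RNN followed by a linear layer applied to the last time step."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        # batch_first=True: inputs are (batch_size, sequence_length, input_size)
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initial hidden state of zeros: (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        out, _ = self.rnn(x, h0)      # out: (batch_size, sequence_length, hidden_size)
        out = self.fc(out[:, -1, :])  # keep only the last time step
        return out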
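To see why the forward method indexes out[:, -1, :], it can help to inspect the shapes that nn.RNN returns. The sketch below uses an assumed batch of 8 sequences with the same input and hidden sizes as this example, and checks that, for a single-layer unidirectional RNN, the last time step of the output equals the final hidden state.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # illustrative seed

rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(8, 5, 10)  # (batch_size, sequence_length, input_size)

out, h_n = rnn(x)  # h0 defaults to zeros when it is not passed

print(out.shape)  # torch.Size([8, 5, 20]): hidden state at every time step
print(h_n.shape)  # torch.Size([1, 8, 20]): final hidden state per layer

# For one layer and one direction, these are the same values
same = torch.allclose(out[:, -1, :], h_n[-1])
print(same)
```

Note that h_n keeps the layer dimension first even when batch_first=True; only the input and output tensors are affected by that flag.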

Step 3: Initialize the Model, Loss Function, and Optimizer

input_size = 10
hidden_size = 20
output_size = 1

model = SimpleRNN(input_size, hidden_size, output_size)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Step 4: Training the Model

# Dummy data
x_train = torch.randn(100, 5, input_size)  # (batch_size, sequence_length, input_size)
y_train = torch.randn(100, output_size)    # (batch_size, output_size)

num_epochs = 100

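for epoch in range(num_epochs):
    outputs = model(x_train)            # forward pass
    loss = criterion(outputs, y_train)
    optimizer.zero_grad()               # clear gradients from the previous step
    loss.backward()                     # backpropagation through time
    optimizer.step()                    # update the model parameters
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')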


  • Model Definition: The SimpleRNN class defines an RNN with one hidden layer and a fully connected layer for the output.
  • Forward Pass: The forward method initializes the hidden state, processes the input sequence through the RNN, and applies the fully connected layer to the last hidden state.
  • Training Loop: The training loop iterates over the epochs, computes the loss, performs backpropagation, and updates the model parameters.

Practical Exercise


Implement an RNN to perform a simple sequence prediction task. Given a sequence of numbers, predict the next number in the sequence.


import torch
import torch.nn as nn
import torch.optim as optim

# Define the RNN model
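class SequenceRNN(nn.Module):
    """Single-layer RNN that predicts one value from the last time step."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        # batch_first=True: inputs are (batch_size, sequence_length, input_size)
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initial hidden state of zeros: (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        out, _ = self.rnn(x, h0)      # out: (batch_size, sequence_length, hidden_size)
        out = self.fc(out[:, -1, :])  # keep only the last time step
        return out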

# Initialize the model, loss function, and optimizer
input_size = 1
hidden_size = 10
output_size = 1

model = SequenceRNN(input_size, hidden_size, output_size)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Dummy data for sequence prediction
x_train = torch.tensor([[[i] for i in range(j, j+5)] for j in range(100)], dtype=torch.float32)
y_train = torch.tensor([[j+5] for j in range(100)], dtype=torch.float32)

# Training the model
num_epochs = 200

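for epoch in range(num_epochs):
    outputs = model(x_train)            # forward pass
    loss = criterion(outputs, y_train)
    optimizer.zero_grad()               # clear gradients from the previous step
    loss.backward()                     # backpropagation through time
    optimizer.step()                    # update the model parameters
    if (epoch+1) % 20 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')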

# Testing the model
test_seq = torch.tensor([[[i] for i in range(95, 100)]], dtype=torch.float32)
model.eval()
with torch.no_grad():  # gradients are not needed for inference
    predicted = model(test_seq)
print(f'Predicted next number: {predicted.item():.4f}')


  • Model Definition: The SequenceRNN class defines an RNN for sequence prediction.
  • Training Data: The training data consists of sequences of 5 numbers, and the target is the next number in the sequence.
  • Training Loop: The training loop trains the model for 200 epochs and prints the loss every 20 epochs.
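  • Testing: The model predicts the next number for the sequence 95 to 99. This sequence also appears in the training data, so the test checks how well the model fits rather than how well it generalizes. With unscaled inputs and only 200 epochs, the prediction can be far from the true value of 100; scaling the inputs and targets (for example, dividing by 100) typically helps.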
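The training data described above (windows of 5 numbers, each paired with the number that follows) can also be built from a plain 1-D series. The sketch below is one possible way to do it; make_windows is a hypothetical helper introduced here for illustration, not part of PyTorch, and it relies on Tensor.unfold to slide the window.

```python
import torch

def make_windows(series, window):
    """Hypothetical helper: split a 1-D tensor into (input window, next value) pairs."""
    windows = series.unfold(0, window, 1)  # (len(series) - window + 1, window)
    x = windows[:-1].unsqueeze(-1)         # drop the last window, which has no next value
    y = series[window:].unsqueeze(-1)      # the value that follows each window
    return x, y

series = torch.arange(105, dtype=torch.float32)  # 0, 1, ..., 104
x, y = make_windows(series, window=5)

print(x.shape)  # torch.Size([100, 5, 1]): (batch_size, sequence_length, input_size)
print(y.shape)  # torch.Size([100, 1]):    (batch_size, output_size)
```

Building the windows from a single series makes it easier to hold out the final part of the series for testing, so that the test sequence is not one the model has already seen.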


In this section, we introduced Recurrent Neural Networks (RNNs) and their key concepts. We discussed the basic structure of an RNN and implemented a simple RNN in PyTorch. We also provided a practical exercise to reinforce the learned concepts. In the next section, we will delve deeper into building an RNN from scratch and explore more advanced architectures like LSTMs and GRUs.

© Copyright 2024. All rights reserved