
Gated Recurrent Units (GRUs) are a Recurrent Neural Network (RNN) architecture designed to mitigate the vanishing gradient problem commonly encountered in traditional RNNs. GRUs are similar to Long Short-Term Memory (LSTM) networks but simpler: they use two gates instead of three and keep no separate cell state, yet they often perform comparably well.

Key Concepts

  • Gates: GRUs use gating units to control the flow of information.
    • Update Gate: Decides how much of the previous hidden state is carried forward and how much is replaced by new candidate information.
    • Reset Gate: Decides how much of the previous hidden state is used when computing the candidate hidden state.
  • Hidden State: The hidden state is updated through the gates, which helps the network retain information across long sequences.

GRU Architecture

The GRU architecture can be summarized by the following equations:

  1. Update Gate: \[ z_t = \sigma(W_z \cdot [h_{t-1}, x_t]) \]
  2. Reset Gate: \[ r_t = \sigma(W_r \cdot [h_{t-1}, x_t]) \]
  3. Candidate Hidden State: \[ \tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t]) \]
  4. Final Hidden State: \[ h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t \]


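Where:

  • \( \sigma \) is the sigmoid function.
  • \( \tanh \) is the hyperbolic tangent function.
  • \( W_z, W_r, W \) are weight matrices (bias terms are omitted for brevity).
  • \( h_{t-1} \) is the previous hidden state.
  • \( x_t \) is the input at time step \( t \).
  • \( [h_{t-1}, x_t] \) denotes concatenation, and \( * \) denotes element-wise multiplication.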
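
The four equations above can be sketched as a single time step in plain PyTorch tensor operations. This is a minimal illustration, not how nn.GRU is implemented internally: the function name gru_step, the random weights, and the sizes are assumptions made for the example, and biases are left out to match the equations.

```python
import torch

def gru_step(x_t, h_prev, W_z, W_r, W):
    """One GRU time step following the equations above (no biases)."""
    # Concatenate previous hidden state and current input: [h_{t-1}, x_t]
    hx = torch.cat([h_prev, x_t], dim=-1)
    z_t = torch.sigmoid(hx @ W_z.T)  # update gate
    r_t = torch.sigmoid(hx @ W_r.T)  # reset gate
    # The candidate state sees the reset-scaled previous hidden state
    rhx = torch.cat([r_t * h_prev, x_t], dim=-1)
    h_tilde = torch.tanh(rhx @ W.T)
    # Interpolate between the old state and the candidate
    return (1 - z_t) * h_prev + z_t * h_tilde

torch.manual_seed(0)
batch_size, input_size, hidden_size = 4, 3, 5  # illustrative sizes
W_z = torch.randn(hidden_size, hidden_size + input_size)
W_r = torch.randn(hidden_size, hidden_size + input_size)
W = torch.randn(hidden_size, hidden_size + input_size)

x_t = torch.randn(batch_size, input_size)
h_prev = torch.zeros(batch_size, hidden_size)
h_t = gru_step(x_t, h_prev, W_z, W_r, W)
print(h_t.shape)  # torch.Size([4, 5])
```

Note that PyTorch's nn.GRU documents a slightly different convention: it adds bias terms and swaps the roles of \( z_t \) and \( 1 - z_t \) in the final interpolation. The gating idea is the same, but weights from this sketch are not interchangeable with those of nn.GRU.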

Practical Example: Building a GRU in PyTorch

Step 1: Import Libraries

import torch
import torch.nn as nn
import torch.optim as optim

Step 2: Define the GRU Model

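class GRUNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # batch_first=True: inputs are (batch_size, sequence_length, input_size)
        self.gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        # Maps the last hidden state to the output
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initial hidden state: zeros of shape (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        out, _ = self.gru(x, h0)  # out: (batch_size, sequence_length, hidden_size)
        # Use only the last time step for the prediction
        out = self.fc(out[:, -1, :])
        return out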
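
To see why forward() indexes out[:, -1, :], it can help to inspect what nn.GRU returns. The following sketch uses illustrative sizes (assumptions, chosen to match the values used later in this example) and checks that the last time step of the output sequence equals the final hidden state of the top layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru = nn.GRU(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
x = torch.randn(32, 5, 10)  # (batch_size, sequence_length, input_size)

out, h_n = gru(x)  # the initial hidden state defaults to zeros when omitted
print(out.shape)   # torch.Size([32, 5, 20])
print(h_n.shape)   # torch.Size([2, 32, 20])

# For a unidirectional GRU, the last time step of `out` is the
# final hidden state of the top layer.
last_step = out[:, -1, :]
print(torch.allclose(last_step, h_n[-1]))  # True
```

Because the initial hidden state defaults to zeros, passing an explicit h0 of zeros, as the model above does, gives the same result as omitting it.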

Step 3: Initialize the Model, Loss Function, and Optimizer

input_size = 10
hidden_size = 20
output_size = 1
num_layers = 2

model = GRUNet(input_size, hidden_size, output_size, num_layers)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
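
As a rough size check for the recurrent part of the model initialized above, and to illustrate the sense in which GRUs are simpler than LSTMs, the sketch below counts trainable parameters for an nn.GRU and an nn.LSTM with the same hyperparameters. The helper count_params is a hypothetical name introduced only for this example.

```python
import torch.nn as nn

def count_params(module):
    """Total number of parameters in a module."""
    return sum(p.numel() for p in module.parameters())

# Same sizes as in Step 3: input_size=10, hidden_size=20, num_layers=2
gru = nn.GRU(10, 20, 2, batch_first=True)
lstm = nn.LSTM(10, 20, 2, batch_first=True)

print(count_params(gru))   # 4440
print(count_params(lstm))  # 5920
```

The GRU stores three weight blocks per layer where the LSTM stores four, so with identical sizes it has three quarters as many recurrent parameters.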

Step 4: Training Loop

num_epochs = 100
for epoch in range(num_epochs):
    # Dummy input and target for illustration
    inputs = torch.randn(32, 5, input_size)  # Batch size: 32, Sequence length: 5
    targets = torch.randn(32, output_size)
    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    # Backward pass and parameter update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Step 5: Evaluation

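model.eval()  # switch to evaluation mode (disables dropout, if any)
with torch.no_grad():  # no gradients are needed for inference
    test_inputs = torch.randn(32, 5, input_size)
    test_outputs = model(test_inputs)  # shape: (32, output_size)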

Practical Exercise


Create a GRU model to predict the next value in a sequence of sine wave data.


  1. Generate sine wave data.
  2. Split the data into training and testing sets.
  3. Define and train a GRU model.
  4. Evaluate the model on the test set.


import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim

# Generate sine wave data
data = np.sin(np.linspace(0, 100, 1000))
sequence_length = 50

# Prepare the dataset
def create_sequences(data, seq_length):
    xs, ys = [], []
    for i in range(len(data)-seq_length):
        x = data[i:i+seq_length]   # input window
        y = data[i+seq_length]     # next value after the window
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

X, y = create_sequences(data, sequence_length)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

# Convert to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32).unsqueeze(-1)
y_train = torch.tensor(y_train, dtype=torch.float32).unsqueeze(-1)
X_test = torch.tensor(X_test, dtype=torch.float32).unsqueeze(-1)
y_test = torch.tensor(y_test, dtype=torch.float32).unsqueeze(-1)

# Define the GRU model
class GRUNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initial hidden state: zeros of shape (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        out, _ = self.gru(x, h0)
        # Predict from the last time step only
        out = self.fc(out[:, -1, :])
        return out

# Initialize the model, loss function, and optimizer
input_size = 1
hidden_size = 20
output_size = 1
num_layers = 2

model = GRUNet(input_size, hidden_size, output_size, num_layers)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 200
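for epoch in range(num_epochs):
    # Forward pass on the full training set
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    # Backward pass and parameter update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch+1) % 20 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')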

# Evaluation
with torch.no_grad():
    test_outputs = model(X_test)
    test_loss = criterion(test_outputs, y_test)
    print(f'Test Loss: {test_loss.item():.4f}')

# Plot the results
plt.plot(y_test.numpy(), label='True')
plt.plot(test_outputs.numpy(), label='Predicted')
plt.legend()
plt.show()

Common Mistakes and Tips

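  • Incorrect Input Shape: With batch_first=True, the input tensor must have shape (batch_size, sequence_length, input_size). The nn.GRU default (batch_first=False) expects (sequence_length, batch_size, input_size).
  • Overfitting: Use techniques like dropout or weight regularization if the model overfits. The dropout argument of nn.GRU applies only between stacked layers, so it has an effect only when num_layers > 1.
  • Learning Rate: Adjust the learning rate if the model is not converging: lower it if the loss diverges or oscillates, raise it if training is very slow.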
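
The input-shape and dropout tips above can be illustrated with a short sketch. The sizes and the dropout value of 0.2 are assumptions chosen for the example, not recommendations.

```python
import torch
import torch.nn as nn

x = torch.randn(32, 5, 10)  # (batch_size, sequence_length, input_size)

# batch_first=True matches the layout used throughout this section.
# dropout is applied between the two stacked layers during training.
gru_bf = nn.GRU(10, 20, num_layers=2, batch_first=True, dropout=0.2)
out_bf, _ = gru_bf(x)
print(out_bf.shape)  # torch.Size([32, 5, 20])

# The default layout is (sequence_length, batch_size, input_size),
# so the same data must be transposed first.
gru_sf = nn.GRU(10, 20, num_layers=2)
out_sf, _ = gru_sf(x.transpose(0, 1))
print(out_sf.shape)  # torch.Size([5, 32, 20])
```

Passing x untransposed to gru_sf would not raise an error: the layer would silently treat 32 as the sequence length and 5 as the batch size. This is why shape mistakes of this kind are easy to miss.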


In this section, we explored Gated Recurrent Units (GRUs), their architecture, and how to implement them in PyTorch. We also provided a practical example and exercise to solidify your understanding. In the next module, we will delve into advanced topics such as Generative Adversarial Networks (GANs).

© Copyright 2024. All rights reserved