Activation functions play a crucial role in neural networks by introducing non-linearity into the model, allowing it to learn complex patterns. In this section, we will explore various activation functions, their properties, and their applications.
Key Concepts
What is an Activation Function?
- An activation function determines whether (and how strongly) a neuron fires by transforming the weighted sum of its inputs plus a bias into the neuron's output, as sketched in the example below.
- It introduces non-linearity into the output of a neuron, enabling the network to learn complex patterns.
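To make this concrete, here is a minimal sketch of a single neuron (the inputs, weights, and bias are made-up illustrative values): it computes the weighted sum of its inputs plus a bias, then applies an activation function to produce the output.

import tensorflow as tf

# Illustrative inputs, weights, and bias for a single neuron
x = tf.constant([0.5, -1.0, 2.0])   # inputs
w = tf.constant([0.1, 0.4, -0.3])   # weights
b = tf.constant(0.2)                # bias

# Pre-activation value: weighted sum of inputs plus bias
z = tf.reduce_sum(w * x) + b

# The activation function (here sigmoid) maps z to the neuron's output
a = tf.nn.sigmoid(z)
print(z.numpy(), a.numpy())  # approx -0.75 and 0.32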
Types of Activation Functions
- Linear Activation Function
- Non-Linear Activation Functions
  - Sigmoid
  - Tanh
  - ReLU (Rectified Linear Unit)
  - Leaky ReLU
  - Softmax
Linear Activation Function
Definition
A linear activation function is simply the identity function: the output equals the input.
Formula
\[ f(x) = x \]
Characteristics
- Pros: Simple and easy to implement.
- Cons: Cannot model complex patterns; stacking layers with linear activations collapses into a single linear transformation (illustrated after the code example below).
Code Example
import tensorflow as tf

# Linear activation function
def linear_activation(x):
    return x

# Example usage
x = tf.constant([1.0, 2.0, 3.0])
output = linear_activation(x)
print(output.numpy())  # Output: [1. 2. 3.]
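To see why a purely linear activation limits the network, the sketch below (with arbitrary example weights) shows that two stacked linear layers are equivalent to a single linear transformation, so adding depth gains no extra expressive power.

import tensorflow as tf

# Arbitrary example weights for two layers with linear (identity) activation
W1 = tf.constant([[1.0, 2.0], [3.0, 4.0]])
W2 = tf.constant([[0.5, -1.0], [1.5, 0.5]])
x = tf.constant([[1.0, 2.0]])

# Passing x through both layers...
h = tf.matmul(x, W1)                 # first linear layer
y = tf.matmul(h, W2)                 # second linear layer

# ...gives the same result as one layer with weights W1 @ W2
y_single = tf.matmul(x, tf.matmul(W1, W2))

print(y.numpy())         # [[18.5 -2. ]]
print(y_single.numpy())  # [[18.5 -2. ]]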
Non-Linear Activation Functions
Sigmoid
Definition
The sigmoid function maps any input to a value between 0 and 1.
Formula
\[ f(x) = \frac{1}{1 + e^{-x}} \]
Characteristics
- Pros: Smooth gradient, output range (0, 1), good for binary classification.
- Cons: Vanishing gradient problem (the gradient approaches zero for large |x|; see the sketch below), not zero-centered (outputs are always positive).
Code Example
import tensorflow as tf

# Sigmoid activation function
x = tf.constant([1.0, 2.0, 3.0])
output = tf.nn.sigmoid(x)
print(output.numpy())  # Output: [0.7310586 0.880797 0.95257413]
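The vanishing gradient problem follows from the sigmoid's derivative, \( f'(x) = f(x)(1 - f(x)) \), which approaches zero as |x| grows. A minimal sketch using tf.GradientTape shows how small the gradients become for larger inputs:

import tensorflow as tf

# Gradients of the sigmoid at increasingly large inputs
x = tf.Variable([0.0, 2.0, 10.0])
with tf.GradientTape() as tape:
    y = tf.nn.sigmoid(x)

grads = tape.gradient(y, x)
print(grads.numpy())  # approx [0.25, 0.105, 0.000045] -- nearly zero for large inputs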
Tanh
Definition
The tanh function maps any input to a value between -1 and 1.
Formula
\[ f(x) = \tanh(x) = \frac{2}{1 + e^{-2x}} - 1 \]
Characteristics
- Pros: Zero-centered (illustrated below), smooth gradient.
- Cons: Vanishing gradient problem.
Code Example
import tensorflow as tf

# Tanh activation function
x = tf.constant([1.0, 2.0, 3.0])
output = tf.nn.tanh(x)
print(output.numpy())  # Output: [0.7615942 0.9640276 0.9950547]
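To see what "zero-centered" means in practice, the sketch below evaluates tanh and sigmoid on inputs that are symmetric around zero: the tanh outputs average to 0, while the sigmoid outputs average to 0.5.

import tensorflow as tf

# Inputs symmetric around zero
x = tf.constant([-2.0, -1.0, 0.0, 1.0, 2.0])

print(tf.reduce_mean(tf.nn.tanh(x)).numpy())     # approx 0.0 (zero-centered)
print(tf.reduce_mean(tf.nn.sigmoid(x)).numpy())  # approx 0.5 (not zero-centered)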
ReLU (Rectified Linear Unit)
Definition
The ReLU function outputs the input directly if it is positive; otherwise, it outputs zero.
Formula
\[ f(x) = \max(0, x) \]
Characteristics
- Pros: Computationally efficient, mitigates the vanishing gradient problem.
- Cons: Can cause "dead neurons": a neuron whose inputs stay negative always outputs zero, receives zero gradient, and stops learning (see the gradient sketch below).
Code Example
import tensorflow as tf

# ReLU activation function
x = tf.constant([-1.0, 2.0, 3.0])
output = tf.nn.relu(x)
print(output.numpy())  # Output: [0. 2. 3.]
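ReLU's gradient is 1 for positive inputs and 0 for negative inputs, which is why it does not saturate for positive activations but can leave "dead neurons" with no gradient signal when their inputs stay negative. A quick sketch with tf.GradientTape:

import tensorflow as tf

x = tf.Variable([-3.0, -0.5, 0.5, 3.0])
with tf.GradientTape() as tape:
    y = tf.nn.relu(x)

grads = tape.gradient(y, x)
print(grads.numpy())  # [0. 0. 1. 1.] -- no gradient flows for negative inputs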
Leaky ReLU
Definition
Leaky ReLU allows a small, non-zero gradient when the input is negative.
Formula
\[ f(x) = \max(0.01x, x) \]
Characteristics
- Pros: Prevents "dead neurons" by allowing a small gradient when the input is negative.
- Cons: The slope of the negative part (alpha) is a hyperparameter that needs tuning; two values are compared below.
Code Example
import tensorflow as tf

# Leaky ReLU activation function
x = tf.constant([-1.0, 2.0, 3.0])
output = tf.nn.leaky_relu(x, alpha=0.01)
print(output.numpy())  # Output: [-0.01  2.    3.  ]
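Because the negative-part slope (alpha) is a hyperparameter, it helps to see how it changes the output; the sketch below simply evaluates tf.nn.leaky_relu with two different alpha values.

import tensorflow as tf

x = tf.constant([-2.0, -1.0, 1.0, 2.0])

# A larger alpha lets more signal through for negative inputs
print(tf.nn.leaky_relu(x, alpha=0.01).numpy())  # [-0.02 -0.01  1.    2.  ]
print(tf.nn.leaky_relu(x, alpha=0.2).numpy())   # [-0.4  -0.2   1.    2.  ]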
Softmax
Definition
The softmax function converts a vector of values into a probability distribution.
Formula
\[ f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} \]
Characteristics
- Pros: Useful for multi-class classification problems.
- Cons: Computationally expensive for a large number of classes.
Code Example
import tensorflow as tf

# Softmax activation function
x = tf.constant([1.0, 2.0, 3.0])
output = tf.nn.softmax(x)
print(output.numpy())  # Output: [0.09003057 0.24472848 0.66524094]
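To connect the formula to the built-in op, the sketch below computes softmax directly from the definition and checks that it matches tf.nn.softmax and that the outputs sum to 1.

import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])

# Direct implementation of the formula above
manual = tf.exp(x) / tf.reduce_sum(tf.exp(x))
built_in = tf.nn.softmax(x)

print(manual.numpy())                   # matches tf.nn.softmax
print(built_in.numpy())                 # [0.09003057 0.24472848 0.66524094]
print(tf.reduce_sum(built_in).numpy())  # 1.0 -- a valid probability distribution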
Practical Exercise
Task
Implement a simple neural network in TensorFlow, train it with several different activation functions, and compare their performance on the MNIST dataset.
Solution
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

# Load and normalize the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define a simple neural network model with a configurable hidden activation
def create_model(activation_function):
    model = Sequential([
        Flatten(input_shape=(28, 28)),
        Dense(128, activation=activation_function),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Train and evaluate the model with different activation functions.
# Leaky ReLU is passed as a callable (tf.nn.leaky_relu) because not every
# TensorFlow version accepts the string 'leaky_relu' as a Dense activation.
activation_functions = {
    'sigmoid': 'sigmoid',
    'tanh': 'tanh',
    'relu': 'relu',
    'leaky_relu': tf.nn.leaky_relu,
}

for name, activation in activation_functions.items():
    print(f"Training with {name} activation function")
    model = create_model(activation)
    model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
    test_loss, test_acc = model.evaluate(x_test, y_test)
    print(f"Test accuracy with {name}: {test_acc}\n")
Summary
In this section, we covered the importance of activation functions in neural networks and explored various types, including linear, sigmoid, tanh, ReLU, Leaky ReLU, and softmax. Each activation function has its own characteristics, advantages, and disadvantages. Understanding these will help you choose the right activation function for your specific neural network model.