In this section, we will explore some of the most influential and widely used Convolutional Neural Network (CNN) architectures. These architectures have set benchmarks in various image recognition tasks and have inspired numerous advancements in the field of deep learning.
Key Concepts
- LeNet-5
- AlexNet
- VGGNet
- GoogLeNet (Inception)
- ResNet
- DenseNet
- LeNet-5
Overview
LeNet-5, developed by Yann LeCun and his colleagues in 1998, is one of the earliest CNN architectures. It was designed for handwritten digit recognition (MNIST dataset).
Architecture
- Input Layer: 32x32 grayscale image.
- C1: Convolutional layer with 6 filters of size 5x5.
- S2: Subsampling layer (average pooling) with a 2x2 filter.
- C3: Convolutional layer with 16 filters of size 5x5.
- S4: Subsampling layer (average pooling) with a 2x2 filter.
- C5: Convolutional layer with 120 filters of size 5x5; since its input is also 5x5 at this point, it acts as a fully connected layer.
- F6: Fully connected layer with 84 units.
- Output Layer: Fully connected layer with 10 units (one for each digit).
Code Example
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential()
# C1: 6 filters of 5x5 (the original used tanh; ReLU is the modern choice)
model.add(layers.Conv2D(6, (5, 5), activation='relu', input_shape=(32, 32, 1)))
# S2: 2x2 average pooling (subsampling)
model.add(layers.AveragePooling2D((2, 2)))
# C3: 16 filters of 5x5
model.add(layers.Conv2D(16, (5, 5), activation='relu'))
# S4: 2x2 average pooling
model.add(layers.AveragePooling2D((2, 2)))
# C5: 120 filters of 5x5, acting as a fully connected layer on the 5x5 input
model.add(layers.Conv2D(120, (5, 5), activation='relu'))
model.add(layers.Flatten())
# F6: 84 units
model.add(layers.Dense(84, activation='relu'))
# Output: 10 digit classes
model.add(layers.Dense(10, activation='softmax'))
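To see the model run end to end, here is a minimal training sketch. It assumes the standard tf.keras MNIST loader and pads the 28x28 digits to the 32x32 input LeNet-5 expects; Adam is just a convenient modern choice, not what the original paper used.

import numpy as np

# Load MNIST and zero-pad 28x28 images to 32x32, adding a channel axis
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = np.pad(x_train, ((0, 0), (2, 2), (2, 2)))[..., np.newaxis] / 255.0
x_test = np.pad(x_test, ((0, 0), (2, 2), (2, 2)))[..., np.newaxis] / 255.0

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))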
AlexNet
Overview
AlexNet, created by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012, won that year's ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and significantly advanced the field of deep learning.
Architecture
- Input Layer: 227x227 RGB image.
- Conv1: Convolutional layer with 96 filters of size 11x11, stride 4, followed by max pooling.
- Conv2: Convolutional layer with 256 filters of size 5x5, followed by max pooling.
- Conv3: Convolutional layer with 384 filters of size 3x3.
- Conv4: Convolutional layer with 384 filters of size 3x3.
- Conv5: Convolutional layer with 256 filters of size 3x3, followed by max pooling.
- FC6: Fully connected layer with 4096 units.
- FC7: Fully connected layer with 4096 units.
- Output Layer: Fully connected layer with 1000 units (one for each class in ImageNet).
Code Example
from tensorflow.keras import layers, models

model = models.Sequential()
# Conv1: 96 filters of 11x11 with stride 4
model.add(layers.Conv2D(96, (11, 11), strides=(4, 4), activation='relu', input_shape=(227, 227, 3)))
model.add(layers.MaxPooling2D((3, 3), strides=(2, 2)))
# Conv2: 256 filters of 5x5 ('same' padding, as in the original network)
model.add(layers.Conv2D(256, (5, 5), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((3, 3), strides=(2, 2)))
# Conv3-Conv5: 3x3 filters
model.add(layers.Conv2D(384, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(384, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((3, 3), strides=(2, 2)))
model.add(layers.Flatten())
# FC6 and FC7: 4096 units each
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(4096, activation='relu'))
# Output: 1000 ImageNet classes
model.add(layers.Dense(1000, activation='softmax'))
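For context on training, the original paper optimized with stochastic gradient descent using momentum 0.9, weight decay 0.0005, and an initial learning rate of 0.01, plus dropout in the fully connected layers. A minimal compile sketch with the SGD settings (weight decay and dropout omitted for brevity, and reusing the tensorflow import from the LeNet-5 example):

# SGD with momentum, roughly following the original training setup
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])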
VGGNet
Overview
VGGNet, developed by the Visual Geometry Group at Oxford in 2014, is known for its simplicity and depth. It uses very small (3x3) convolution filters throughout and has a uniform architecture.
Architecture
- Input Layer: 224x224 RGB image.
- Conv Layers: Multiple convolutional layers with 3x3 filters, followed by max pooling.
- FC Layers: Three fully connected layers, the first two with 4096 units and the third with 1000 units.
Code Example
from tensorflow.keras import layers, models

# VGG-16: five blocks of 3x3 convolutions ('same' padding, as in the paper),
# each followed by 2x2 max pooling
model = models.Sequential()
model.add(layers.Conv2D(64, (3, 3), padding='same', activation='relu', input_shape=(224, 224, 3)))
model.add(layers.Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
# Classifier head: two 4096-unit layers, then 1000 ImageNet classes
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(1000, activation='softmax'))
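Writing out all sixteen layers by hand is mostly boilerplate; tf.keras also ships a reference implementation in keras.applications. A short sketch that loads the pretrained network (the ImageNet weights are downloaded on first use):

from tensorflow.keras.applications import VGG16

# Reference VGG16 with pretrained ImageNet weights
pretrained = VGG16(weights='imagenet')
pretrained.summary()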
GoogLeNet (Inception)
Overview
GoogLeNet, also known as Inception, was developed by Google and won the ILSVRC 2014. It introduced the Inception module, which allows for more efficient computation.
Architecture
- Input Layer: 224x224 RGB image.
- Inception Modules: Multiple inception modules that apply convolutional filters of different sizes in parallel.
- Auxiliary Classifiers: Intermediate classifiers to combat the vanishing gradient problem.
- Output Layer: Fully connected layer with 1000 units.
Code Example
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D,
                                     GlobalAveragePooling2D, concatenate, Dense)
from tensorflow.keras.models import Model

def inception_module(x, filters):
    # Four parallel branches: 1x1, 1x1 -> 3x3, 1x1 -> 5x5, and pool -> 1x1
    f1, f3_r, f3, f5_r, f5, pool_proj = filters
    conv1 = Conv2D(f1, (1, 1), padding='same', activation='relu')(x)
    conv3 = Conv2D(f3_r, (1, 1), padding='same', activation='relu')(x)
    conv3 = Conv2D(f3, (3, 3), padding='same', activation='relu')(conv3)
    conv5 = Conv2D(f5_r, (1, 1), padding='same', activation='relu')(x)
    conv5 = Conv2D(f5, (5, 5), padding='same', activation='relu')(conv5)
    pool = MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
    pool = Conv2D(pool_proj, (1, 1), padding='same', activation='relu')(pool)
    # Concatenate the branch outputs along the channel axis
    return concatenate([conv1, conv3, conv5, pool], axis=-1)

# Simplified sketch: the full GoogLeNet stacks nine inception modules
# on top of a convolutional stem
input_img = Input(shape=(224, 224, 3))
x = inception_module(input_img, [64, 96, 128, 16, 32, 32])
x = MaxPooling2D((3, 3), strides=(2, 2))(x)
x = inception_module(x, [128, 128, 192, 32, 96, 64])
x = MaxPooling2D((3, 3), strides=(2, 2))(x)
# GoogLeNet ends with global average pooling rather than flattening
x = GlobalAveragePooling2D()(x)
x = Dense(1000, activation='softmax')(x)
model = Model(input_img, x)
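The sketch above omits the auxiliary classifiers listed in the architecture. In the paper, each one branches off an intermediate feature map through average pooling, a 1x1 convolution, and two dense layers with heavy dropout. A minimal sketch of such a head, reusing the imports above (the sizes follow the paper, but treat the attachment point as illustrative):

from tensorflow.keras.layers import AveragePooling2D, Flatten, Dropout

def auxiliary_classifier(x, num_classes=1000):
    # Side branch attached to an intermediate inception output
    aux = AveragePooling2D((5, 5), strides=(3, 3))(x)
    aux = Conv2D(128, (1, 1), padding='same', activation='relu')(aux)
    aux = Flatten()(aux)
    aux = Dense(1024, activation='relu')(aux)
    aux = Dropout(0.7)(aux)
    return Dense(num_classes, activation='softmax')(aux)

During training, the auxiliary outputs are added as extra model outputs with down-weighted losses (0.3 in the paper); at inference time they are discarded.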
ResNet
Overview
ResNet, or Residual Network, introduced by Microsoft in 2015, won the ILSVRC 2015. It uses residual connections (skip connections) to allow for training much deeper networks.
Architecture
- Input Layer: 224x224 RGB image.
- Residual Blocks: Multiple residual blocks with identity shortcuts.
- Output Layer: Fully connected layer with 1000 units.
Code Example
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                                     MaxPooling2D, GlobalAveragePooling2D, add, Dense)
from tensorflow.keras.models import Model

def residual_block(x, filters):
    # Identity shortcut: the block's input is added back to its output
    shortcut = x
    x = Conv2D(filters, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(filters, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = add([x, shortcut])
    x = Activation('relu')(x)
    return x

input_img = Input(shape=(224, 224, 3))
# Stem: 7x7 convolution with stride 2, then max pooling
x = Conv2D(64, (7, 7), strides=(2, 2), padding='same')(input_img)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D((3, 3), strides=(2, 2))(x)
x = residual_block(x, 64)
x = residual_block(x, 64)
x = residual_block(x, 64)
# ResNet ends with global average pooling before the classifier
x = GlobalAveragePooling2D()(x)
x = Dense(1000, activation='softmax')(x)
model = Model(input_img, x)
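The residual_block above assumes the shortcut and the block output have identical shapes. When a stage changes the number of filters or downsamples, ResNet switches to a projection shortcut: a 1x1 convolution that reshapes the identity path. A sketch, reusing the imports above:

def projection_block(x, filters, strides=(2, 2)):
    # A 1x1 convolution reshapes the shortcut when dimensions change
    shortcut = Conv2D(filters, (1, 1), strides=strides, padding='same')(x)
    shortcut = BatchNormalization()(shortcut)
    x = Conv2D(filters, (3, 3), strides=strides, padding='same')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(filters, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = add([x, shortcut])
    return Activation('relu')(x)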
DenseNet
Overview
DenseNet, or Densely Connected Convolutional Networks, introduced by Gao Huang et al. in 2016, connects each layer to every other layer in a feed-forward fashion.
Architecture
- Input Layer: 224x224 RGB image.
- Dense Blocks: Multiple dense blocks where each layer receives the feature maps of all preceding layers in the block as input.
- Output Layer: Fully connected layer with 1000 units.
Code Example
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                                     MaxPooling2D, GlobalAveragePooling2D, concatenate, Dense)
from tensorflow.keras.models import Model

def dense_block(x, growth_rate, num_layers):
    # Each layer's output is concatenated onto the block's running feature map
    for _ in range(num_layers):
        conv = BatchNormalization()(x)
        conv = Activation('relu')(conv)
        conv = Conv2D(growth_rate, (3, 3), padding='same')(conv)
        x = concatenate([x, conv])
    return x

input_img = Input(shape=(224, 224, 3))
# Stem: 7x7 convolution with stride 2, then max pooling
x = Conv2D(64, (7, 7), strides=(2, 2), padding='same')(input_img)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D((3, 3), strides=(2, 2))(x)
# One dense block of 6 layers with growth rate 32
x = dense_block(x, growth_rate=32, num_layers=6)
# DenseNet ends with global average pooling before the classifier
x = GlobalAveragePooling2D()(x)
x = Dense(1000, activation='softmax')(x)
model = Model(input_img, x)
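A full DenseNet alternates dense blocks with transition layers that compress and downsample the concatenated feature maps; in the paper, a transition is batch normalization, a 1x1 convolution (typically halving the channel count), and 2x2 average pooling. A sketch, reusing the imports above:

from tensorflow.keras.layers import AveragePooling2D

def transition_layer(x, compression=0.5):
    # Compress the channel count, then halve the spatial resolution
    channels = int(x.shape[-1] * compression)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(channels, (1, 1), padding='same')(x)
    return AveragePooling2D((2, 2), strides=(2, 2))(x)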
Conclusion
In this section, we explored some of the most popular CNN architectures that have significantly impacted the field of deep learning. Each architecture has its unique characteristics and has contributed to advancements in image recognition tasks. Understanding these architectures provides a solid foundation for designing and implementing your own CNN models.
Next, we will delve into the applications of CNNs in image recognition, where we will see how these architectures are applied to solve real-world problems.