In this section, we will explore some of the most influential and widely used Convolutional Neural Network (CNN) architectures. These architectures have set benchmarks in various image recognition tasks and have inspired numerous advancements in the field of deep learning.

Key Concepts

  1. LeNet-5
  2. AlexNet
  3. VGGNet
  4. GoogLeNet (Inception)
  5. ResNet
  6. DenseNet

  1. LeNet-5

Overview

LeNet-5, developed by Yann LeCun and his colleagues in 1998, is one of the earliest CNN architectures. It was designed for handwritten digit recognition and was evaluated on the MNIST dataset, whose 28x28 digit images are padded to 32x32 before entering the network.

Architecture

  • Input Layer: 32x32 grayscale image.
  • C1: Convolutional layer with 6 filters of size 5x5.
  • S2: Subsampling layer (average pooling) with a 2x2 window and stride 2.
  • C3: Convolutional layer with 16 filters of size 5x5.
  • S4: Subsampling layer (average pooling) with a 2x2 window and stride 2.
  • C5: Convolutional layer with 120 filters of size 5x5. Because its input is only 5x5, each filter produces a single value, so C5 acts as a fully connected layer.
  • F6: Fully connected layer with 84 units.
  • Output Layer: Fully connected layer with 10 units (one for each digit).
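The feature-map sizes behind this list follow from the usual output-size formula for a convolution or pooling window, floor((n + 2p - k) / s) + 1 for input size n, kernel k, stride s, and padding p. The plain-Python sketch below (no deep learning framework; `conv_output_size` and `lenet5_shapes` are illustrative helpers, not library functions) traces an input through LeNet-5, assuming unpadded convolutions and 2x2 pooling with stride 2.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_shapes(input_size: int = 32):
    """Return (layer, height/width, channels) for each LeNet-5 feature map."""
    # (layer, kernel, stride, output channels); None means pooling keeps the channel count
    spec = [
        ("C1", 5, 1, 6),
        ("S2", 2, 2, None),
        ("C3", 5, 1, 16),
        ("S4", 2, 2, None),
        ("C5", 5, 1, 120),
    ]
    size, channels = input_size, 1
    trace = [("input", size, channels)]
    for name, kernel, stride, out_channels in spec:
        size = conv_output_size(size, kernel, stride)
        channels = out_channels if out_channels is not None else channels
        trace.append((name, size, channels))
    return trace


for name, size, channels in lenet5_shapes():
    print(f"{name:>5}: {size}x{size}x{channels}")
```

The trace ends at 1x1x120, which is why C5 behaves like a fully connected layer. Calling `lenet5_shapes(28)` shows why MNIST digits are padded: without padding, S4 would output 4x4 maps, smaller than C5's 5x5 kernel.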

Code Example

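import tensorflow as tf
from tensorflow.keras import layers, models

# LeNet-5 layout with two modern substitutions: ReLU instead of the original
# scaled tanh, and plain average pooling instead of trainable subsampling
model = models.Sequential()
model.add(layers.Input(shape=(32, 32, 1)))                 # 32x32 grayscale image
model.add(layers.Conv2D(6, (5, 5), activation='relu'))     # C1 -> 28x28x6
model.add(layers.AveragePooling2D((2, 2)))                 # S2 -> 14x14x6
model.add(layers.Conv2D(16, (5, 5), activation='relu'))    # C3 -> 10x10x16
model.add(layers.AveragePooling2D((2, 2)))                 # S4 -> 5x5x16
model.add(layers.Conv2D(120, (5, 5), activation='relu'))   # C5 -> 1x1x120
model.add(layers.Flatten())                                # 120 features
model.add(layers.Dense(84, activation='relu'))             # F6
model.add(layers.Dense(10, activation='softmax'))          # one unit per digit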

  2. AlexNet

Overview

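AlexNet, created by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 with a top-5 error of 15.3%, far ahead of the 26.2% achieved by the second-best entry. Its combination of ReLU activations, dropout, and GPU training made deep CNNs the dominant approach to image recognition.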

Architecture

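  • Input Layer: 227x227 RGB image (the paper states 224x224, but 227x227 is the size for which the layer arithmetic produces the reported 55x55 output of Conv1).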
  • Conv1: Convolutional layer with 96 filters of size 11x11, stride 4, followed by max pooling.
  • Conv2: Convolutional layer with 256 filters of size 5x5, followed by max pooling.
  • Conv3: Convolutional layer with 384 filters of size 3x3.
  • Conv4: Convolutional layer with 384 filters of size 3x3.
  • Conv5: Convolutional layer with 256 filters of size 3x3, followed by max pooling.
  • FC6: Fully connected layer with 4096 units.
  • FC7: Fully connected layer with 4096 units.
  • Output Layer: Fully connected layer with 1000 units (one for each class in ImageNet).
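The same output-size arithmetic explains the 227x227 input and shows where AlexNet's parameters live. The sketch below is plain Python and assumes a single-tower network (the original split some layers across two GPUs, which changes the counts) with padding 2 for Conv2 and padding 1 for Conv3 to Conv5, as in common reference implementations; the helper names are illustrative.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output height/width of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer, kernel, stride, padding, output channels); None marks a pooling layer.
# Padding values are an assumption taken from common single-tower implementations.
ALEXNET_FEATURES = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]
FC_UNITS = [4096, 4096, 1000]


def alexnet_param_counts(input_size=227, in_channels=3):
    """Return (final spatial size, conv parameters, fully connected parameters)."""
    size, channels = input_size, in_channels
    conv_params = 0
    for _, kernel, stride, padding, out_channels in ALEXNET_FEATURES:
        size = conv_output_size(size, kernel, stride, padding)
        if out_channels is not None:
            # weights (k * k * C_in * C_out) plus one bias per filter
            conv_params += kernel * kernel * channels * out_channels + out_channels
            channels = out_channels
    fan_in = size * size * channels  # length of the flattened feature vector
    fc_params = 0
    for units in FC_UNITS:
        fc_params += fan_in * units + units
        fan_in = units
    return size, conv_params, fc_params


size, conv_p, fc_p = alexnet_param_counts()
print(size, conv_p, fc_p, conv_p + fc_p)
```

With a 227x227 input, the last pooling layer outputs 6x6x256 = 9216 values and the network has about 62.4 million parameters, roughly 94% of them in the three fully connected layers. Starting from 224x224 instead yields a 5x5 final map.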

Code Example

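from tensorflow.keras import layers, models

# Single-tower AlexNet. 'same' padding on Conv2 to Conv5 preserves the 27x27 and
# 13x13 feature maps of the original, so FC6 receives 6x6x256 = 9216 inputs.
# The original also used local response normalization and dropout, omitted here.
model = models.Sequential()
model.add(layers.Input(shape=(227, 227, 3)))
model.add(layers.Conv2D(96, (11, 11), strides=(4, 4), activation='relu'))   # -> 55x55x96
model.add(layers.MaxPooling2D((3, 3), strides=(2, 2)))                      # -> 27x27x96
model.add(layers.Conv2D(256, (5, 5), padding='same', activation='relu'))    # -> 27x27x256
model.add(layers.MaxPooling2D((3, 3), strides=(2, 2)))                      # -> 13x13x256
model.add(layers.Conv2D(384, (3, 3), padding='same', activation='relu'))    # -> 13x13x384
model.add(layers.Conv2D(384, (3, 3), padding='same', activation='relu'))    # -> 13x13x384
model.add(layers.Conv2D(256, (3, 3), padding='same', activation='relu'))    # -> 13x13x256
model.add(layers.MaxPooling2D((3, 3), strides=(2, 2)))                      # -> 6x6x256
model.add(layers.Flatten())                                                 # -> 9216
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(1000, activation='softmax'))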

  3. VGGNet

Overview

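VGGNet, developed by Karen Simonyan and Andrew Zisserman of the Visual Geometry Group at Oxford, placed second in the ILSVRC 2014 classification task and is known for its simplicity and depth: its best-known variants, VGG-16 and VGG-19, have 16 and 19 weight layers. It uses only very small (3x3) convolution filters and a uniform architecture in which the number of filters doubles after each pooling stage, up to 512.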

Architecture

  • Input Layer: 224x224 RGB image.
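  • Conv Layers: Five blocks of 3x3 convolutions (stride 1, padding 1, so the spatial size is preserved) with 64, 128, 256, 512, and 512 filters; each block ends with 2x2 max pooling with stride 2, shrinking 224x224 down to 7x7.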
  • FC Layers: Three fully connected layers, the first two with 4096 units and the third with 1000 units.
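The VGG paper motivates its 3x3 filters with a simple argument: a stack of n stride-1 3x3 convolutions sees a (2n + 1) x (2n + 1) region of the input, yet uses fewer weights than one large filter covering the same region and adds a nonlinearity after every layer. The plain-Python sketch below checks this for two and three stacked layers; the channel count of 512 is an arbitrary illustrative choice, biases are ignored, and the helper names are hypothetical.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of num_layers stacked stride-1 k x k convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int, num_layers: int = 1) -> int:
    """Weights (biases ignored) of num_layers k x k convs mapping C channels to C channels."""
    return num_layers * kernel * kernel * channels * channels


channels = 512  # illustrative; the weight ratio does not depend on the channel count
for n in (2, 3):
    rf = stacked_receptive_field(n)
    stacked = conv_weights(3, channels, num_layers=n)
    single = conv_weights(rf, channels)
    print(f"{n} stacked 3x3 convs: {rf}x{rf} receptive field, "
          f"{stacked:,} weights vs {single:,} for a single {rf}x{rf} conv")
```

Three stacked 3x3 layers need 27C^2 weights against 49C^2 for a single 7x7 layer, about 45% fewer, while applying three ReLUs instead of one.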

Code Example

from tensorflow.keras import layers, models

# VGG-16: 13 convolutional layers (3x3, stride 1, 'same' padding) in five blocks,
# each closed by 2x2 max pooling: 224 -> 112 -> 56 -> 28 -> 14 -> 7
model = models.Sequential()
model.add(layers.Input(shape=(224, 224, 3)))
# Block 1
model.add(layers.Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Block 2
model.add(layers.Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Block 3
model.add(layers.Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(256, (3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Block 4
model.add(layers.Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Block 5
model.add(layers.Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(512, (3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Classifier: 7x7x512 = 25088 features
model.add(layers.Flatten())
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dense(1000, activation='softmax'))

  4. GoogLeNet (Inception)

Overview

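GoogLeNet, developed by Christian Szegedy et al. at Google, won the ILSVRC 2014 classification task with a top-5 error of 6.67%. It introduced the Inception module (the network is often called Inception v1), whose 1x1 convolutions reduce the number of channels before the expensive 3x3 and 5x5 convolutions. This lets the 22-layer network use about 12 times fewer parameters than AlexNet.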

Architecture

  • Input Layer: 224x224 RGB image.
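  • Inception Modules: Nine inception modules that apply 1x1, 3x3, and 5x5 convolutions and 3x3 max pooling in parallel and concatenate the results along the channel axis.
  • Auxiliary Classifiers: Two intermediate classifiers attached to middle layers that add extra gradient signal during training (countering vanishing gradients) and are discarded at inference.
  • Output Layer: Global average pooling followed by a fully connected layer with 1000 units.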
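The efficiency of the Inception module comes from the 1x1 "reduce" convolutions (f3_r and f5_r in the code example). The plain-Python sketch below counts multiplications for the 5x5 branch of the first module (3a), assuming its published sizes: a 28x28x192 input, a 16-filter 1x1 reduction, and 32 filters of size 5x5; biases and activations are ignored, and the helper names are illustrative. It also checks how many channels each module in the code example outputs.

```python
def conv_mults(height: int, width: int, kernel: int, in_ch: int, out_ch: int) -> int:
    """Multiplications for a stride-1, 'same'-padded k x k convolution."""
    return height * width * kernel * kernel * in_ch * out_ch


def inception_output_channels(filters) -> int:
    """Channels after concatenation; the reduce sizes (f3_r, f5_r) are internal only."""
    f1, _f3_r, f3, _f5_r, f5, pool_proj = filters
    return f1 + f3 + f5 + pool_proj


# Inception (3a) 5x5 branch: 28x28x192 input, 1x1 reduction to 16, then 32 5x5 filters
H = W = 28
IN_CH, REDUCE, OUT = 192, 16, 32

direct = conv_mults(H, W, 5, IN_CH, OUT)
reduced = conv_mults(H, W, 1, IN_CH, REDUCE) + conv_mults(H, W, 5, REDUCE, OUT)

print(f"direct 5x5:       {direct:,} multiplications")
print(f"1x1 then 5x5:     {reduced:,} multiplications")
print(f"reduction factor: {direct / reduced:.1f}x")
```

Reducing to 16 channels first cuts the branch's cost by nearly a factor of 10 (about 120 million versus 12 million multiplications). The two modules in the code example output 256 and 480 channels.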

Code Example

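from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, GlobalAveragePooling2D,
                                     concatenate, Dense)
from tensorflow.keras.models import Model


def inception_module(x, filters):
    """Four parallel branches concatenated along the channel axis."""
    f1, f3_r, f3, f5_r, f5, pool_proj = filters
    # Branch 1: 1x1 convolution
    conv1 = Conv2D(f1, (1, 1), padding='same', activation='relu')(x)
    # Branch 2: 1x1 reduction, then 3x3 convolution
    conv3 = Conv2D(f3_r, (1, 1), padding='same', activation='relu')(x)
    conv3 = Conv2D(f3, (3, 3), padding='same', activation='relu')(conv3)
    # Branch 3: 1x1 reduction, then 5x5 convolution
    conv5 = Conv2D(f5_r, (1, 1), padding='same', activation='relu')(x)
    conv5 = Conv2D(f5, (5, 5), padding='same', activation='relu')(conv5)
    # Branch 4: 3x3 max pooling, then 1x1 projection
    pool = MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
    pool = Conv2D(pool_proj, (1, 1), padding='same', activation='relu')(pool)
    return concatenate([conv1, conv3, conv5, pool], axis=-1)


input_img = Input(shape=(224, 224, 3))

# Stem: shrink 224x224x3 to 28x28x192 before the first inception module
x = Conv2D(64, (7, 7), strides=(2, 2), padding='same', activation='relu')(input_img)  # 112x112
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)                            # 56x56
x = Conv2D(64, (1, 1), padding='same', activation='relu')(x)
x = Conv2D(192, (3, 3), padding='same', activation='relu')(x)
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)                            # 28x28x192

x = inception_module(x, [64, 96, 128, 16, 32, 32])            # inception (3a) -> 28x28x256
x = inception_module(x, [128, 128, 192, 32, 96, 64])          # inception (3b) -> 28x28x480
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)   # 14x14x480

# Truncated: the full GoogLeNet stacks nine inception modules (3a through 5b)
# and adds two auxiliary classifiers during training
x = GlobalAveragePooling2D()(x)
output = Dense(1000, activation='softmax')(x)

model = Model(input_img, output)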

  5. ResNet

Overview

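ResNet (Residual Network), introduced by Kaiming He et al. at Microsoft Research in 2015, won the ILSVRC 2015 classification task with a top-5 error of 3.57% (using an ensemble). Instead of asking a stack of layers to learn a mapping H(x) directly, each residual block learns a residual F(x) and outputs F(x) + x through a skip (shortcut) connection. Because the identity path lets signals and gradients pass through unchanged, networks with more than 100 layers (152 in the competition entry) can be trained effectively.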

Architecture

  • Input Layer: 224x224 RGB image.
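  • Stem: 7x7 convolution with stride 2, followed by 3x3 max pooling with stride 2.
  • Residual Blocks: Four stages of residual blocks with identity shortcuts; where a stage changes the resolution or the number of filters, a 1x1 projection shortcut matches the shapes.
  • Output Layer: Global average pooling followed by a fully connected layer with 1000 units.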
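The depth in names such as ResNet-34 or ResNet-152 counts weighted layers: the stem convolution, every convolution inside the residual blocks, and the final fully connected layer. The sketch below reproduces the standard depths from the per-stage block counts of the ImageNet ResNets; deeper variants switch from two-convolution basic blocks to three-convolution bottleneck blocks. The three 64-filter blocks in the code example correspond to the first stage of ResNet-34. `RESNET_CONFIGS` and `weighted_layers` are illustrative names.

```python
# Residual blocks per stage (conv2_x to conv5_x) and convolutions per block
RESNET_CONFIGS = {
    18: ([2, 2, 2, 2], 2),    # basic block: two 3x3 convs
    34: ([3, 4, 6, 3], 2),
    50: ([3, 4, 6, 3], 3),    # bottleneck block: 1x1, 3x3, 1x1 convs
    101: ([3, 4, 23, 3], 3),
    152: ([3, 8, 36, 3], 3),
}


def weighted_layers(blocks_per_stage, convs_per_block):
    """Stem conv + all convs inside residual blocks + final fully connected layer.

    By convention, 1x1 projection shortcuts are not counted.
    """
    return 1 + convs_per_block * sum(blocks_per_stage) + 1


for depth, (blocks, convs) in RESNET_CONFIGS.items():
    print(f"ResNet-{depth}: {weighted_layers(blocks, convs)} weighted layers")
```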

Code Example

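from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, Activation, MaxPooling2D,
                                     GlobalAveragePooling2D, add, Dense)
from tensorflow.keras.models import Model


def residual_block(x, filters):
    """Basic residual block with an identity shortcut (x must already have `filters` channels)."""
    shortcut = x
    x = Conv2D(filters, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(filters, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = add([x, shortcut])          # y = F(x) + x
    x = Activation('relu')(x)
    return x


input_img = Input(shape=(224, 224, 3))
# Stem: 7x7/2 convolution and 3x3/2 max pooling -> 56x56x64
x = Conv2D(64, (7, 7), strides=(2, 2), padding='same')(input_img)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
# First stage (conv2_x of ResNet-34): three identity blocks with 64 filters
x = residual_block(x, 64)
x = residual_block(x, 64)
x = residual_block(x, 64)
# Truncated: later stages double the filters, halve the resolution, and use
# 1x1 projection shortcuts where the shapes change
x = GlobalAveragePooling2D()(x)
x = Dense(1000, activation='softmax')(x)

model = Model(input_img, x)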

  6. DenseNet

Overview

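DenseNet (Densely Connected Convolutional Networks), introduced by Gao Huang et al. in 2016, connects each layer to every other layer within a dense block in a feed-forward fashion: each layer receives the concatenated feature maps of all preceding layers and contributes k new feature maps of its own, where k is called the growth rate. Concatenating, rather than adding as ResNet does, lets later layers reuse earlier features directly, which keeps the network parameter-efficient.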

Architecture

  • Input Layer: 224x224 RGB image.
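  • Dense Blocks: Multiple dense blocks where each layer receives the feature maps of all previous layers in the same block.
  • Transition Layers: Between dense blocks, a 1x1 convolution (which can also reduce the number of channels) followed by 2x2 average pooling halves the spatial resolution.
  • Output Layer: Global average pooling followed by a fully connected layer with 1000 units.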
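Because each layer concatenates k new feature maps onto its input, the channel count grows linearly inside a dense block: the l-th layer sees k0 + (l - 1)k input channels, where k0 is the block's input width. The plain-Python sketch below tracks this growth assuming DenseNet-121's configuration (a 64-channel stem, growth rate k = 32, dense blocks of 6, 12, 24, and 16 layers, and transition layers that halve the channel count, i.e. compression 0.5); the helper names are illustrative. With these values, the code example's single six-layer block turns 64 channels into 256.

```python
def dense_block_channels(in_channels: int, growth_rate: int, num_layers: int):
    """Channels at the input of each layer in a dense block, plus the block output."""
    return [in_channels + i * growth_rate for i in range(num_layers + 1)]


def densenet_channels(stem=64, growth_rate=32, blocks=(6, 12, 24, 16), compression=0.5):
    """Output channels of each dense block; transitions compress between blocks."""
    channels = stem
    outputs = []
    for index, num_layers in enumerate(blocks):
        channels += num_layers * growth_rate   # every layer adds growth_rate feature maps
        outputs.append(channels)
        if index < len(blocks) - 1:
            channels = int(channels * compression)  # transition layer's 1x1 conv
    return outputs


def direct_connections(num_layers: int) -> int:
    """L(L+1)/2: layer l takes l inputs (the block input plus l - 1 earlier layers)."""
    return num_layers * (num_layers + 1) // 2


print(dense_block_channels(64, 32, 6))
print(densenet_channels())
print(direct_connections(6))
```

The final 1024 channels are what global average pooling passes to the classifier in DenseNet-121.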

Code Example

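from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, Activation, MaxPooling2D,
                                     GlobalAveragePooling2D, concatenate, Dense)
from tensorflow.keras.models import Model


def dense_block(x, growth_rate, num_layers):
    """Each BN-ReLU-3x3 conv layer adds growth_rate channels and sees all earlier outputs."""
    for _ in range(num_layers):
        conv = BatchNormalization()(x)
        conv = Activation('relu')(conv)
        conv = Conv2D(growth_rate, (3, 3), padding='same')(conv)
        x = concatenate([x, conv])   # concatenate, not add: earlier features are kept
    return x


input_img = Input(shape=(224, 224, 3))
x = Conv2D(64, (7, 7), strides=(2, 2), padding='same')(input_img)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)   # 56x56x64
x = dense_block(x, growth_rate=32, num_layers=6)               # 56x56x(64 + 6*32) = 56x56x256
# Truncated: the full network alternates dense blocks with transition layers
# (1x1 convolution + 2x2 average pooling)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = GlobalAveragePooling2D()(x)
x = Dense(1000, activation='softmax')(x)

model = Model(input_img, x)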

Conclusion

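In this section, we explored some of the most influential CNN architectures. Each contributed an idea that later designs built on: LeNet-5 established the convolution, pooling, and fully connected pattern; AlexNet showed that deep CNNs trained on GPUs with ReLU and dropout could dominate ImageNet; VGGNet showed that depth built from small 3x3 filters pays off; GoogLeNet introduced multi-branch Inception modules with 1x1 channel reduction; ResNet made very deep networks trainable with residual connections; and DenseNet reused features through dense concatenation. Understanding these architectures provides a solid foundation for designing and implementing your own CNN models.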

Next, we will delve into the applications of CNNs in image recognition, where we will see how these architectures are applied to solve real-world problems.

© Copyright 2024. All rights reserved