Deep learning is a family of machine learning methods based on neural networks in which the input passes through multiple layers. Each layer learns a representation of the output of the layer before it, starting from low-level features, and by stacking these layers the model can learn more complex patterns and predict more accurately. Regular neural networks transform an input by passing it through a series of hidden layers. Every layer is made up of a set of neurons, and each neuron is fully connected to all neurons in the layer before.
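As a rough illustration of a regular, fully connected network, the sketch below passes an input through one hidden layer and one output layer using plain Python lists. The layer sizes, weights, and biases are made-up values chosen only for readability, and the ReLU activation is an assumption of this sketch rather than something the text prescribes.

```python
def relu(values):
    """Replace negative values with 0."""
    return [max(0.0, v) for v in values]

def dense_layer(inputs, weights, biases):
    """One fully connected layer: every output neuron sums over every input."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        for neuron_weights, bias in zip(weights, biases)
    ]

# Hypothetical toy network: 3 inputs -> 2 hidden neurons -> 1 output.
hidden_weights = [[0.5, -1.0, 0.25], [1.0, 1.0, -0.5]]
hidden_biases = [0.0, 0.5]
output_weights = [[1.0, -2.0]]
output_biases = [0.5]

x = [1.0, 2.0, 4.0]
hidden = relu(dense_layer(x, hidden_weights, hidden_biases))
output = dense_layer(hidden, output_weights, output_biases)

print(hidden)  # [0.0, 1.5]
print(output)  # [-2.5]
```

Note that each row of a weight matrix has one entry per input, which is what "fully connected" means in practice.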
Convolutional neural networks (CNNs, or ConvNets) are a type of neural network that uses a different architecture from regular neural networks. CNNs make the explicit assumption that the inputs are images, so instead of connecting every neuron to every input, most of their neurons connect only to a small local region of the layer before. The structure of CNNs was influenced by the neural connections in human vision: the way in which neurons respond to overlapping regions of the visual field to build up an image is similar to how the overlapping filters in CNNs build up a more accurate representation of the input image.
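One way to see why the image assumption matters is to count weights. The sketch below compares a fully connected layer with a convolutional layer on the same input. The image size (224 x 224 RGB), the 64 neurons or filters, and the 3 x 3 filter size are hypothetical values picked for illustration, and bias terms are ignored for simplicity.

```python
def dense_weight_count(height, width, channels, neurons):
    """Weights in a fully connected layer: each neuron connects to every input value."""
    return height * width * channels * neurons

def conv_weight_count(kernel_size, in_channels, filters):
    """Weights in a convolutional layer: each small filter is reused at every position."""
    return kernel_size * kernel_size * in_channels * filters

# Hypothetical example: a 224x224 RGB image, 64 neurons vs. 64 filters of size 3x3.
dense_weights = dense_weight_count(224, 224, 3, 64)
conv_weights = conv_weight_count(3, 3, 64)

print(dense_weights)  # 9633792
print(conv_weights)   # 1728
```

Because a filter's weights do not depend on the image size, the convolutional count stays the same for larger images, while the fully connected count grows with every added pixel.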
Built around a mathematical operation called convolution, CNNs have a more specific set of ideal use cases than deep learning in general, including facial recognition and medical imaging. CNNs are generally composed of four types of layers: convolutional layer, rectified linear unit layer, pooling layer, and fully connected layer.
- Convolution Layer slides small N x N filters over overlapping regions of the image to detect features, computing a weighted sum of the inputs under the filter at each position. However, a limitation of convolution layers is that they become computationally expensive for larger inputs like high resolution images, because every filter must be evaluated at every position of the input.
- Rectified Linear Unit Layer (ReLU layer) keeps the CNN in check by replacing negative numbers with 0s, which helps prevent the learned values from getting stuck near zero or growing toward infinity. It also makes the network non-linear, so that it can represent more than a linear function of the input image.
- Pooling Layer reduces the dimensions of the feature map by combining a group of neighboring points at one layer into a single point at the next layer (for example, by keeping only the maximum value), retaining only the most important information and increasing efficiency.
- Fully Connected Layer takes the pooled feature maps, flattened into a single vector, and connects every value to every output neuron to produce the final prediction. Each additional layer adds more sophistication to the features to create a better model.
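The four layer types above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not how a real framework implements them: the 5 x 5 image, the 2 x 2 edge filter, the 2 x 2 max pooling window, and the output weights are all hypothetical values, and the filter is applied without flipping (cross-correlation), which is the usual convention in deep learning libraries.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding), taking a weighted sum at each position."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Replace negative values with 0."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

def flatten(feature_map):
    """Turn a 2D feature map into a single vector."""
    return [v for row in feature_map for v in row]

def dense(inputs, weights, bias):
    """A single fully connected output neuron."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Hypothetical 5x5 image containing a bright vertical stripe.
image = [[0, 1, 1, 0, 0] for _ in range(5)]
# Hypothetical 2x2 filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, kernel)  # 4x4, each row is [2, 0, -2, 0]
activated = relu(feature_map)            # 4x4, each row is [2, 0, 0, 0]
pooled = max_pool(activated)             # [[2, 0], [2, 0]]
flat = flatten(pooled)                   # [2, 0, 2, 0]
score = dense(flat, [0.5, 0.5, 0.5, 0.5], 0.0)

print(score)  # 2.0
```

In this toy run the filter gives a positive response at the left edge of the stripe and a negative one at the right edge; ReLU discards the negative response, pooling halves each dimension, and the final neuron combines what is left into one score.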