Commit 16ea20f

b-shermanastorfi authored and committed

Cnn (#22)

* Created cnn module
  + Added current rst for cnn
  + Added current images I have made
  * Still need to make 9 more images
  * Still need to link images in rst
* Added remaining images
  + Added images to _img folder
  * Still need to put figures in the rst
* Added figures to rst
* Updated cnn.rst
  + Add bold terms
  + Fixed image link
* Updated cnn.rst
  + Added references section

1 parent c103c5d commit 16ea20f

File tree: 12 files changed, +249 −0 lines changed

Lines changed: 249 additions & 0 deletions
#############################
Convolutional Neural Networks
#############################

.. contents::
    :local:
    :depth: 2


********
Overview
********
In the last module, we started our dive into deep learning by talking about
multi-layer perceptrons. In this module, we will learn about **convolutional
neural networks**, also called **CNNs** or **ConvNets**. CNNs differ from
other neural networks in that sequential layers are not necessarily fully
connected, meaning that a subset of the input neurons may feed into only a
single neuron in the next layer. Another interesting feature of CNNs is their
inputs. With other neural networks we might use vectors as inputs, but with
CNNs we are typically working with images and other objects with many
dimensions. *Figure 1* shows some sample images that are each 6 pixels by 6
pixels. The first image is colored and has three channels for red, green, and
blue values. The second image is black-and-white and has only one channel for
gray values.

.. figure:: _img/Images.png

   **Figure 1. Two sample images and their color channels**

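To make the channel layout concrete, here is a minimal NumPy sketch. The
shapes follow the images in *Figure 1*, but the random pixel values are
placeholders for illustration only.

.. code-block:: python

    import numpy as np

    # A 6x6 color image: one 6x6 grid of intensities per channel
    # (red, green, blue).
    color_image = np.random.randint(0, 256, size=(6, 6, 3))

    # A 6x6 black-and-white image: a single channel of gray values.
    gray_image = np.random.randint(0, 256, size=(6, 6, 1))

    print(color_image.shape)  # (6, 6, 3)
    print(gray_image.shape)   # (6, 6, 1)
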
**********
Motivation
**********
CNNs are widely used in computer vision, where we are trying to analyze
visual imagery. CNNs can also be used for other applications such as natural
language processing. We will focus on the former case here because it is one
of the most common applications of CNNs.

Because we assume that we're working with images, we can design our
architecture so that it specifically does a good job of analyzing images.
Images have heights, widths, and one or more channels for color. In an image,
there might be lines and edges that make up shapes, as well as more complex
structures such as cars and faces. We will potentially need to identify a
large set of relevant features in order to properly classify an image. But
just identifying individual features in an image usually isn't enough. Say we
have an image that may or may not be a face. If we saw three noses, an eye,
and an ear, we probably wouldn't call it a face even though those are common
features of a face. So we must also care about where features are located in
the image and their proximity to other features. That is a lot of information
to keep track of! Fortunately, the architecture of CNNs covers a lot of these
requirements.

************
Architecture
************
The architecture of a CNN can be broken down into an input layer, a set of
hidden layers, and an output layer. These are shown in *Figure 2*.

.. figure:: _img/Layers.png

   **Figure 2. The layers of a CNN**

The hidden layers are where the magic happens. The hidden layers break down
our input image in order to identify features present in the image. The
initial layers focus on low-level features such as edges, while the later
layers progressively get more abstract. At the end of all the layers, we have
a fully connected layer with neurons for each of our classification values.
What we end up with is a probability for each of the classification values,
and we choose the classification with the highest probability as our guess
for what the image shows.

Below, we will talk about some types of layers we might use in our hidden
layers. Remember that sequential layers are not necessarily fully connected,
with the exception of the final output layer.

Convolutional Layers
====================
The first type of layer we will discuss is called a **convolutional layer**.
The name comes from the concept of a convolution in mathematics. Roughly, a
convolution is an operation that acts on two input functions and produces an
output function combining the information present in the inputs. The first
input will be our image, and the second input will be some sort of filter,
such as a blur or sharpen filter. When we combine our image with the filter,
we extract some information about the image. This process is shown in
*Figure 3*. This is precisely how a CNN goes about extracting features.

.. figure:: _img/Filtering.png

   **Figure 3. An image before and after filtering**

In the human eye, a single neuron is only responsible for a small region of
our field of view. It is through many neurons with overlapping regions that
we are able to see the world. CNNs are similar. The neurons in a
convolutional layer are only responsible for analyzing a small region of the
input image but overlap so that we ultimately analyze the whole image. Let's
examine the filter concept we mentioned above.

The **filter** or **kernel** is one of the functions used in the convolution.
The filter will likely have a smaller height and width than the input image
and can be thought of as a window sliding over the image. *Figure 4* shows a
sample filter and the region of the image it will interact with in the first
step of convolution.

.. figure:: _img/Filter1.png

   **Figure 4. A sample filter and sample window of an image**

As the filter moves across the image, we calculate values for the convolution
output, called a **feature map**. At each step, we multiply each entry in the
image sample elementwise by the corresponding entry in the filter and sum up
all the products. That sum becomes an entry in the feature map. This process
is shown in *Figure 5*.

.. figure:: _img/Filter2.png

   **Figure 5. Calculating an entry in the feature map**

After the window traverses the entire image, we have the complete feature
map. This is shown in *Figure 6*.

.. figure:: _img/Filter3.png

   **Figure 6. The complete feature map**

In the example above, we moved the filter one unit horizontally or one unit
vertically from its previous position. This step size is called the
**stride**. We could have used other values for the stride, but a stride of
one tends to produce the best results.

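To make the sliding-window computation concrete, here is a minimal NumPy
sketch of the operation described above. (Strictly speaking it computes a
cross-correlation, which is what deep learning libraries typically implement
under the name convolution.) The image and filter values are made up for
illustration.

.. code-block:: python

    import numpy as np

    def convolve2d(image, kernel, stride=1):
        """Slide `kernel` over `image` and compute the feature map."""
        kh, kw = kernel.shape
        ih, iw = image.shape
        out_h = (ih - kh) // stride + 1
        out_w = (iw - kw) // stride + 1
        feature_map = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                window = image[i * stride:i * stride + kh,
                               j * stride:j * stride + kw]
                # Multiply elementwise and sum the products (Figure 5).
                feature_map[i, j] = np.sum(window * kernel)
        return feature_map

    image = np.arange(36).reshape(6, 6)     # a made-up 6x6 grayscale image
    kernel = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]])         # a made-up 3x3 filter
    print(convolve2d(image, kernel).shape)  # (4, 4) feature map
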
You may have noticed that the feature map we ended up with had a smaller
height and width than the original image sample. This is a result of the way
we moved the filter around the sample. If we wanted the feature map to have
the same height and width as the input, we could **pad** the sample. This
involves adding zero entries around the sample so that moving the filter over
it preserves the original sample's dimensions in the feature map. *Figure 7*
illustrates this process.

.. figure:: _img/Padding.png

   **Figure 7. Padding before applying a filter**

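With a 3x3 filter at a stride of one, a one-pixel border of zeros is enough
to keep the output the same size as the input. A short sketch using NumPy's
``np.pad`` together with the ``convolve2d`` function from the previous
example:

.. code-block:: python

    # Zero-pad a one-pixel border so that a 3x3 filter at stride 1
    # produces a feature map with the input's original height and width.
    padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)

    print(padded.shape)                      # (8, 8)
    print(convolve2d(padded, kernel).shape)  # (6, 6), same as the image
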
A feature map represents one type of feature we're analyzing the image for.
Often, we want to analyze the image for a bunch of features, so we end up
with a bunch of feature maps! The output of the convolutional layer is a set
of feature maps. *Figure 8* shows the process of going from an image to the
resulting feature maps.

.. figure:: _img/Convo_Output.png

   **Figure 8. The output of a convolutional layer**

After a convolutional layer, it is common to have a **ReLU** (rectified
linear unit) layer. The purpose of this layer is to introduce non-linearity
into the system. Real-world problems are rarely nice and linear, so we want
our CNN to account for this when it trains. A full explanation of this layer
requires math that we don't expect you to know. If you are curious about the
topic, you can find an explanation here_.

.. _here: https://www.kaggle.com/dansbecker/rectified-linear-units-relu-in-deep-learning

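While the justification takes some math, the function itself is simple: ReLU
replaces every negative value with zero and passes positive values through
unchanged. A one-line NumPy sketch:

.. code-block:: python

    def relu(x):
        # Keep positive values; replace negative values with zero.
        return np.maximum(0, x)

    print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
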
Pooling Layers
==============
The next type of layer we will cover is called a **pooling layer**. The
purpose of pooling layers is to reduce the spatial size of the problem. This
in turn reduces the number of parameters needed for processing and the total
amount of computation in the CNN. There are several options for pooling, but
we will cover the most common approach, **max pooling**.

In max pooling, we slide a window over the input and take the max value in
the window at each step. This process is shown in *Figure 9*.

.. figure:: _img/Pooled.png

   **Figure 9. Max pooling on a feature map**

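Here is a minimal NumPy sketch of max pooling. It assumes a 2x2 window that
moves by its own width at each step, a common configuration, though other
window sizes and strides are possible.

.. code-block:: python

    def max_pool(feature_map, size=2):
        """Take the max of each non-overlapping `size` x `size` window."""
        h, w = feature_map.shape
        pooled = np.zeros((h // size, w // size))
        for i in range(0, h - size + 1, size):
            for j in range(0, w - size + 1, size):
                pooled[i // size, j // size] = np.max(
                    feature_map[i:i + size, j:j + size])
        return pooled

    fm = np.array([[1, 3, 2, 4],
                   [5, 6, 1, 2],
                   [7, 2, 9, 1],
                   [0, 8, 3, 5]])
    print(max_pool(fm))  # [[6. 4.]
                         #  [8. 9.]]
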
Max pooling is good because it maintains important features of the input,
reduces noise by ignoring small values, and reduces the spatial size of the
problem. We can use pooling layers after convolutional layers to keep the
computation manageable.

Fully Connected Layers
======================
The last type of layer we will discuss is called a **fully connected layer**.
Fully connected layers are used to make the final classification in the CNN.
They work exactly like they do in other neural networks. Before moving to the
first fully connected layer, we must flatten our input values into a
one-dimensional vector that the layer can interpret. *Figure 10* shows a
simple example of converting a multi-dimensional input into a one-dimensional
vector.

.. figure:: _img/Flatten.png

   **Figure 10. Flattening input values**

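In code, flattening is just a reshape. The sketch below assumes, for
illustration, that pooling has left us with four 3x3 feature maps:

.. code-block:: python

    # Four 3x3 feature maps (height x width x number of maps),
    # with random values standing in for real activations.
    feature_maps = np.random.rand(3, 3, 4)

    # Flatten into a single vector for the fully connected layer.
    flat = feature_maps.flatten()
    print(flat.shape)  # (36,)
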
After doing this, we may have several fully connected layers before the final
output layer. The output layer uses some function, such as softmax_, to
convert the neuron values into a probability distribution over our classes.
This means that the image has a certain probability of being classified as
each of our classes, and the sum of all those probabilities equals one. This
is clearly visible in *Figure 11*.

.. _softmax: https://developers.google.com/machine-learning/crash-course/multi-class-neural-networks/softmax

.. figure:: _img/Layers_Final.png

   **Figure 11. The final probabilistic outputs**

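Softmax exponentiates each neuron's value and divides by the sum of all the
exponentials, so the outputs are positive and sum to one. A small NumPy
sketch with made-up output values:

.. code-block:: python

    def softmax(logits):
        # Subtract the max for numerical stability; this does not
        # change the result.
        exps = np.exp(logits - np.max(logits))
        return exps / np.sum(exps)

    scores = np.array([2.0, 1.0, 0.1])  # raw outputs for three classes
    probs = softmax(scores)
    print(probs)        # approximately [0.659 0.242 0.099]
    print(probs.sum())  # 1.0
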
********
Training
********
Now that we have the architecture in place for CNNs, we can move on to
training. Training a CNN is pretty much the same as training a normal neural
network. There is some added complexity due to the convolutional layers, but
the strategies for training remain the same. Techniques such as gradient
descent and backpropagation can be used to train the filter values and other
parameters in the network. As with all the other training we have covered,
having a large training set will improve the performance of the CNN. The
problem with training CNNs and other deep learning models is that they are
much more complex than the models we covered in earlier modules. This makes
training much more computationally expensive, to the point where we would
need specialized hardware like GPUs to run our code. However, we get what we
pay for, because deep learning models are much more powerful than the models
covered in earlier modules.

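To tie the layer types and the training process together, here is a minimal
sketch of a CNN in Keras. The input size (28x28 grayscale images), the layer
widths, and the ten output classes are assumptions chosen for illustration,
not values taken from this module's figures; ``x_train`` and ``y_train``
stand in for whatever labeled image data you have.

.. code-block:: python

    import tensorflow as tf

    model = tf.keras.Sequential([
        # Convolutional layer: 32 filters of size 3x3, followed by ReLU.
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                               input_shape=(28, 28, 1)),
        # Max pooling layer with a 2x2 window.
        tf.keras.layers.MaxPooling2D((2, 2)),
        # Flatten the feature maps into a one-dimensional vector.
        tf.keras.layers.Flatten(),
        # Fully connected layer, then a softmax output over 10 classes.
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

    # Gradient descent (here, the Adam variant) with backpropagation
    # trains the filter values and weights.
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # x_train: (num_images, 28, 28, 1), y_train: integer class labels.
    # model.fit(x_train, y_train, epochs=5)
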
*******
Summary
*******
In this module, we learned about convolutional neural networks. CNNs differ
from other neural networks because they usually take images as input and can
have hidden layers that are not fully connected. CNNs are powerful tools
widely used in image classification applications. By using a variety of
hidden layers, we can extract features from an image and use them to
probabilistically guess a classification. CNNs are also complex models, and
understanding how they work can be an intimidating task. We hope that the
information presented gives you a better understanding of how CNNs work so
that you can continue to learn about them and deep learning.

**********
References
**********
#. https://towardsdatascience.com/convolutional-neural-networks-for-beginners-practical-guide-with-python-and-keras-dc688ea90dca
#. https://medium.com/technologymadeeasy/the-best-explanation-of-convolutional-neural-networks-on-the-internet-fbb8b1ad5df8
#. https://medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050
#. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
#. https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
#. https://www.kaggle.com/dansbecker/rectified-linear-units-relu-in-deep-learning
#. https://en.wikipedia.org/wiki/Convolutional_neural_network#ReLU_layer
