Hello, and welcome! In this video, we’ll explain how a CNN can solve an image classification problem such as digit recognition.

First, let’s see why traditional shallow neural networks do not work as well as CNNs. As mentioned before, one of the important steps in modeling for classification with shallow neural networks is the feature extraction step. The chosen features could simply be color, object edges, pixel location, or countless other features that could be extracted from the images. Of course, the better and more effective the feature set you find, the more accurate and efficient the image classification you can obtain. However, as you can imagine, the process of selecting the best features is tremendously time-consuming and often ineffective. Also, extending the features to other types of images becomes an even greater problem.

Convolutional neural networks (or CNNs) try to solve this problem by using more hidden layers, and also more specialized layers. So, when using a CNN, instead of you choosing the image features to classify dogs vs. cats, for instance, the CNN can automatically find those features and classify the images for you.

Instead of using a dogs and cats dataset, though, let’s use a more practical example here. The MNIST dataset is a database of handwritten digits that has a training set of 60,000 examples and a test set of 10,000 examples. The great thing about the MNIST dataset is that the digits have been size-normalized and centered in a fixed-size image.

Now, consider the digit recognition problem. We would like to classify images of handwritten digits, where the observations are the pixel intensities and the target is a digit from 0 to 9. So, our objective here is to build a digit recognition system using a CNN.

Now let’s look at it from a higher level. Basically, if we look at the pipeline of our deep learning process, we can see the following phases: First, pre-processing of input data. Second, training the deep learning model.
And third, inference and deployment of the model.

In the first phase, we have to convert the images to a format that our network can read. Then, in the training phase, an untrained network is fed a big dataset of images so that it can learn. That is, we build a CNN and train it with the many handwritten digit images in the training set. Finally, we use the trained model in the inference phase, where it classifies a new image based on what it learned during training. So, this model can be deployed and used as a digit recognition model for unseen cases.

Now let’s focus on the training process. As mentioned before, a deep neural network such as a CNN not only has multiple hidden layers; the types of layers and their connectivity are also different from those of a shallow neural network. A CNN usually has multiple convolutional layers, pooling layers, and fully connected layers. The convolutional layer applies a convolution operation to its input and passes the result to the next layer. The pooling layer combines the outputs of a cluster of neurons in the previous layer into a single neuron in the next layer. And then, the fully connected layers connect every neuron in the previous layer to every neuron in the next layer.

In the next video, you will learn more about these layers and their specifications in detail. Thanks for watching.
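
As a companion to the description of the three layer types above, here is a minimal pure-Python sketch of what each one computes. It is only an illustration under stated assumptions, not the implementation used in this course: the function names (`conv2d`, `max_pool`, `fully_connected`), the toy 5x5 image, the hand-picked edge kernel, and the placeholder weights are all hypothetical. In a real CNN, the kernel and the weights are learned during training.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding).

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Combine each non-overlapping size x size cluster into one value (its max)."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def fully_connected(inputs, weights, biases):
    """Connect every input to every output: out[k] = sum_i w[k][i] * x[i] + b[k]."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Toy 5x5 "image" of pixel intensities with a vertical edge (assumption).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked vertical-edge kernel (assumption; a CNN would learn this).
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d(image, kernel)            # 4x4 feature map
pooled = max_pool(features, size=2)         # 2x2 after pooling
flat = [v for row in pooled for v in row]   # flatten to 4 values

# Untrained placeholder weights for 10 digit classes (assumption).
weights = [[0.1 * k] * len(flat) for k in range(10)]
biases = [0.0] * 10

scores = fully_connected(flat, weights, biases)          # one score per digit 0-9
predicted = max(range(10), key=lambda k: scores[k])      # index of the highest score
```

Because the weights here are placeholders, `predicted` carries no meaning; the point is only the flow of data from pixel intensities, through convolution and pooling, to one score per digit class.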