In order to capture the relationship between our different features, those features being the different pixels within our image, we're going to make use of kernels. Now, a kernel is just a grid of weights that gets overlaid on a certain portion of our image, centered around a single pixel. Once that kernel is overlaid on that portion of the image, each weight from the kernel is multiplied by the pixel beneath it. Remember, that pixel is just some number. The output for that centered pixel is simply the sum of all of those multiplications of the kernel weights with their respective pixels. That's the convolution operation, and that's where convolutional neural nets get their name. This method of using kernels is what allows us to capture the relationships of nearby pixels, to detect blurred portions of images, sharp portions, edges, and so on. Let's look at an example of this in action using a three by three kernel. Say the image in this example is just three by three, with the pixel values shown, and what we have here is our kernel. How would we calculate the output? Note that overlaying a three by three kernel on a three by three image will only output one single value, and that value will sit at the center of what will ultimately be our output matrix, which we see here to the right. The key is to overlay that kernel on top of the image.
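As a sketch of that multiply-and-sum step, here it is in NumPy. The first two rows of pixel values and the kernel weights follow the worked example that comes next; the third row of pixels isn't read out in the lecture, so those values are an assumption chosen for illustration.

```python
import numpy as np

# Pixel values from the worked example; the third row isn't read out
# in the lecture, so [1, 2, 1] is an assumed value for illustration.
image = np.array([[3, 2, 1],
                  [1, 2, 3],
                  [1, 2, 1]])

# The 3x3 grid of kernel weights, consistent with the multiplications
# described in the worked example (a vertical-edge-style kernel).
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])

# Overlay the kernel, multiply each weight by the pixel beneath it,
# and sum all nine products to get the single centered output.
output = np.sum(image * kernel)
print(output)  # 2
```

Because the kernel and the image patch are the same size here, the elementwise multiply-and-sum produces exactly one number, which lands at the center of the output matrix.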
Now, in a way, it's just like a dot product. Starting at the first row, cell by cell, we take the 3 and multiply it by negative 1, that's the top-left corner of the image times the top-left corner of the kernel, plus 2 times 0, plus 1 times 1, so each value in that first row is multiplied by its respective weight within the kernel. We keep adding that up row by row. For the second row, we do 1 times negative 2, plus 2 times 0, plus 3 times 2, and then we do the same thing for the third row. Now we have nine different multiplications all being added up, each one with its respective value in the kernel, similar to how we'd work with a dot product. We add those all together, and we end up with the output value of 2. When you're working with an actual image, your original input will probably be something larger than three by three, as we see here. What we do is slide that same kernel one cell over to the right, and by sliding it over, we get the output to the right of that 2 in our output matrix. Similarly, with a larger input, we can slide the kernel one cell down, do all the multiplications, take that dot product, and get the output right below the 2, because we slid it down one. We slide that kernel across every single position it can take throughout our input image. Now, you can think of kernels as feature detectors. Here, we have a vertical line detector, and there are some good videos on how you detect an actual line using a matrix similar to what we see here with that convolution operation.
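The sliding process can be sketched as a small NumPy function. The vertical-line detector here uses Sobel-style weights consistent with the worked example, and the five by five input is a made-up image with a vertical edge down the middle, since the lecture's larger input values aren't given.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over every position where it fully fits,
    taking the sum of elementwise products at each stop."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r+kh, c:c+kw] * kernel)
    return out

# A vertical-line detector (Sobel-style weights; illustrative, since
# the lecture's exact filter values aren't shown).
vertical = np.array([[-1, 0, 1],
                     [-2, 0, 2],
                     [-1, 0, 1]])

# A 5x5 image that is dark on the left and bright on the right,
# so a vertical edge runs down the middle.
image = np.zeros((5, 5))
image[:, 3:] = 1.0

out = convolve2d_valid(image, vertical)
print(out)
```

Positions whose patch straddles the edge produce a strong response, while patches sitting entirely on a flat region produce zero, which is exactly the feature-detector behavior described above.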
But the basic concept is that as you move this filter along some type of vertical edge, run the convolution, and look at the output, you end up highlighting the existence of that vertical line. Similarly, we can overlay the filter we see here and detect a horizontal line, or take this other filter, run it across the image, and detect any corners the image may have. The point to take away is that each one of these different kernels can detect edges, whether vertical, horizontal, or diagonal, as well as corners or other combinations of features that may be important. Now, the filters we just introduced are useful for building some intuition about what a filter can be, but in reality, the network will find the most useful kernels for you. Also, I'd like to note that we'll usually set up our framework so that we learn many different kernels, not just one. Every single one of these kernels will operate across the entire image, and that's what allows for translation invariance: it doesn't matter where the object is within the image, because the same kernel slides over every location. Then there's also the comparison to our fully connected architecture. If you think about having as many kernels as we do, with each one only having nine weights, this requires far fewer parameters to learn, and that reduces the overall variance in regard to the bias-variance tradeoff. Now, to bring this all home: when you're working with images, generally speaking, most of them will not just be grayscale, but rather have color. For a color image to be represented numerically, the most common approach is three two-dimensional arrays, all stacked one on top of the other.
As we see here, each one of these two-dimensional arrays represents the red channel, the green channel, or the blue channel, respectively. Now, to move our kernels to three dimensions, rather than running the convolution operation with just this three by three kernel, we're going to run convolutions with a filter. Filter is the term once we move up to three dimensions; it may be three by three by three, which is just three three by three kernels all stacked together. So instead of having nine multiplications added together to get our one output, we have the sum of 27 multiplications. You can think of the filters that we learn as having nine weights for each one of these dimensions, nine for red, nine for green, and nine for blue, and we multiply those respectively against each one of their components within the input image to get our one centered output. So we're adding together 27 different multiplications. Once we use that filter, we're back to a two-dimensional output rather than three dimensions. Now, something you may have noticed as we went through this idea of working with convolutions is that when we compute these centered output values, the edges and corners of our image tend to get somewhat overlooked. In the next video, we're going to address this problem and introduce the concept of padding. All right, I'll see you there.
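As a quick recap of the 27-multiplication arithmetic on a color input, here's a minimal sketch; the patch values and filter weights are random placeholders, not numbers from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical 3x3 color patch: three 3x3 arrays (red, green, blue)
# stacked one on top of the other.
patch = rng.integers(0, 256, size=(3, 3, 3))  # (channel, height, width)

# A filter is three 3x3 kernels stacked together: nine weights for red,
# nine for green, and nine for blue, 27 weights in total.
filt = rng.standard_normal((3, 3, 3))

# Multiply each weight by its corresponding pixel in its channel and sum:
# the 27 multiplications collapse to a single output value, so applying
# the filter across the image gives back a two-dimensional output.
output = np.sum(patch * filt)
print(output)
```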