In this Colab we'll explore a very basic class activation map using the Fashion MNIST dataset. I'm going to start off by importing the libraries I need. Then I create my x_train, y_train, x_test, and y_test by loading the data from Fashion MNIST. Because this is a convolutional neural network, the 28 by 28 images have to be reshaped to 28 by 28 by 1 for the monochrome channel, so I reshape x_train and x_test that way. If we just want to look at an image, we can plot one and see what it looks like; this is item three in x_train, just so we can check that we're loading our images correctly.

Next, I create the convolutional neural network. This is gratuitous, just for the point of the demo; you probably wouldn't need four layers of convolutions for something like Fashion MNIST, but I want to build it that way so we can see the class activation maps. So I'm creating it here with four convolutional layers. At the end, instead of a max pooling layer, I'm going to use an average pooling layer, so that we don't lose some of the features the way max pooling would, and so that we don't reduce the feature map from three by three all the way down to one by one, because then we'd lose pretty much all of our information. I've trained it for five epochs, and at the end of those five epochs it's about 90 percent accurate. We could probably train it for longer and do better, but this is enough for our demonstration.

When you look at the layers in a convolutional neural network, or in any Sequential network, you can index the layers by counting backwards from the bottom: dense_1 is index -1, the global average pooling is index -2, and so on. For example, if I print the model layers by those indices, we can see that they are conv2d_7 and dense_1, so the last convolutional layer is indeed conv2d_7.

Now I import Model from keras.models. I can use that to create a sub-model, specifying my inputs, which are the original model's inputs, and my outputs. It's going to be a model with two outputs: one of them is model.layers[-3], which is conv2d_7, and the other is model.layers[-1], which is our dense output. Here I also take model.layers[-1], the dense output, and get its weights; I'm going to use them as the weights on my final classification, as you'll see in just a moment.

I can then do features, results = cam_model.predict and pass in the test set. If I pass in the test set of 10,000 items, I get back features, which is 10,000 items that are three by three with 128 filters. Remember that the output of conv2d_7 was a three by three feature map with 128 filters, so I know my features are going to have that shape. My results, of course, are the predictions: ten neurons, each containing the probability that the test item matches that class.

Here's the show_cam function that we spoke about in the lecture. I won't go through the code for that again, but what I've added to this one is the ability to show the images. I've set it up so that if the prediction is greater than 0.95, it overlays the class activation map shaded in greens; otherwise it shades it in reds. All I do is pick a desired class, say class number nine, then go through the first 100 items in the test set, and if an item is the desired class, show it. Then, based on the prediction, we'll be able to see whether it's a green or a red prediction.
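To make that concrete, here's a minimal sketch of roughly what the notebook does up to this point, assuming the standard tf.keras APIs. The exact filter counts and intermediate pooling layers aren't spelled out above, so those are illustrative, and the layer names (conv2d_7, dense_1) depend on how many layers have been created in the session; what matters is that the last convolutional layer produces a three by three by 128 feature map and that the sub-model returns both it and the class probabilities.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Load Fashion MNIST and reshape to 28x28x1 (single monochrome channel) for the conv layers.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0   # normalization is assumed, not stated above
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

# Four convolutional layers (deliberately more than Fashion MNIST needs), ending in
# global average pooling rather than another max pool, then the dense classifier.
# Filter counts and the two intermediate max pools are illustrative assumptions.
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),   # last conv: 3x3 spatial, 128 filters
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

# Layers can be indexed from the end: -1 is the dense output, -2 the global
# average pooling, -3 the last convolutional layer.
print(model.layers[-3].name, model.layers[-1].name)

# Sub-model with two outputs: the last conv feature maps and the class probabilities.
cam_model = tf.keras.models.Model(
    inputs=model.input,
    outputs=(model.layers[-3].output, model.layers[-1].output))

# Weights of the final dense layer, used later to weight the 128 feature maps per class.
gap_weights = model.layers[-1].get_weights()[0]      # shape (128, 10)

# features: (10000, 3, 3, 128); results: (10000, 10) class probabilities.
features, results = cam_model.predict(x_test)
```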
Here are the predictions that come out of it. As you can see, this one is class nine, an ankle boot, and the probability that came out was 0.9963 that it's an ankle boot. But I'm mostly interested in where the attention was spent, and we can see from the shading here that a lot of attention was spent in the center of the image. It looks like when the attention is concentrated in the center of the image, we're getting really good probabilities. But not always: here we can see the attention was still in the center of the image, but there was something different enough that, while the classification was still correct, the certainty was lower. This one is 0.882, and we can see that this ankle boot is somewhat different from the other; these areas at the top and the bottom didn't have as many pixels in them, so it may be that that made the difference. But given that the feature maps coming out of this were only three by three, and we scale them back up to 28 by 28, this is not going to be very accurate by any means. It just gives us an idea of how class activation mapping works. When we look at more detailed images later on that are higher resolution, the final image coming out of the convolutions will also be higher resolution, and it might give us better clues as to where the attention was being spent. But it has given us the ability to visualize that.
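The notebook's show_cam isn't reproduced here, but a minimal sketch of that kind of overlay looks something like the following. The function name, its signature, and the scipy.ndimage.zoom upscaling from three by three to 28 by 28 are assumptions, and it reuses features, results, and the dense-layer weights from the earlier sketch.

```python
import numpy as np
import scipy.ndimage
import matplotlib.pyplot as plt

def show_cam(image_index, features, results, gap_weights, x_test, y_test):
    # Hypothetical re-creation of the overlay described above, not the notebook's exact code.
    feature_map = features[image_index]                    # (3, 3, 128)
    predicted_class = np.argmax(results[image_index])
    probability = results[image_index][predicted_class]

    # Weight each of the 128 feature maps by the dense-layer weight for the
    # predicted class, then sum them into a single 3x3 class activation map.
    cam = np.dot(feature_map, gap_weights[:, predicted_class])   # (3, 3)

    # Scale the coarse 3x3 map back up to 28x28 so it can sit on top of the image.
    cam = scipy.ndimage.zoom(cam, (28 / 3, 28 / 3), order=2)

    # Shade in greens when the prediction is confident (> 0.95), in reds otherwise.
    overlay_cmap = 'Greens' if probability > 0.95 else 'Reds'
    plt.imshow(x_test[image_index].reshape(28, 28), cmap='gray')
    plt.imshow(cam, cmap=overlay_cmap, alpha=0.5)
    plt.title(f'class {y_test[image_index]}  p={probability:.4f}')
    plt.show()

# Go through the first 100 test items and show only the desired class (9 = ankle boot),
# reusing features, results, and gap_weights from the earlier sketch.
desired_class = 9
for idx in range(100):
    if y_test[idx] == desired_class:
        show_cam(idx, features, results, gap_weights, x_test, y_test)
```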