Welcome to our demo here on recurrent neural nets. In this demo we're going to be using recurrent neural nets to classify the sentiment of an IMDB dataset. IMDB here is just movie reviews. Our data consists of 25,000 training sequences, each sequence representing a different review, and we also have 25,000 test sequences we can use to check how well we were able to train. The outcome will be binary, either a positive review or a negative review, and those labels are provided for us. Keras gives a convenient interface to load in this data, since it's actually built into Keras as one of its datasets. The words come already encoded as integers, based on the most common words, and we'll see in just a bit how that encoding works. From there, we're going to show you how to come up with vector representations of those words and then train our actual recurrent neural net.

First things first, we import all the necessary libraries from Keras. One that I want to point out here is Embedding, and we didn't talk about this much. When we use an embedding, we're taking those integer-encoded words and coming up with word vectors that represent, in a way, the context of each word. If you have two words that are basically synonyms, such as doing something fast or doing something quickly, "fast" and "quickly" can have very similar meanings, so the embedding will give them vectors that are very similar to one another. That gives you another layer of learning, and that's going to be our embedding layer. We're also going to import SimpleRNN, which is just the basic recurrent neural net. We talked about how it's simpler than the versions we'll learn in later videos, such as LSTMs, and how it may have that problem of losing longer-term dependencies within a longer sequence. But just to learn how the cells piece together, we'll start off here with a SimpleRNN.

We're then going to initialize the length of our features, and we see that max_features is 20,000. This is used when we're loading in our data: the IMDB dataset will keep only the most common max_features words, here the 20,000 most common. We're then going to set the maximum length of each sequence, as we discussed a bit in lecture; we'll truncate anything longer than this and pad anything that doesn't reach it. We'll also decide our batch size here. When we do our deep learning, of course, we always say what the batch size is, so how many iterations make up one epoch is decided by our batch size and the size of the entire dataset. You can imagine that with a batch size of 32 and quite a large dataset, we'll go through many iterations of gradient descent before completing a single epoch.

We're then going to load in our data by calling imdb.load_data, and the only parameter we pass here is that max_features of 20,000, which again keeps the 20,000 most common words within our dataset. That will output x_train and y_train as well as x_test and y_test. We can look at the length of each, and they should be equal to that 25,000 we just discussed.
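For reference, here's a minimal sketch of the setup just described. I'm assuming the tensorflow.keras namespace and the variable names used later in the demo (max_features, maxlen, batch_size); if you're working with standalone Keras, the imports differ slightly.

```python
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, SimpleRNN
from tensorflow.keras.datasets import imdb

max_features = 20000  # keep only the 20,000 most common words
maxlen = 30           # pad or truncate every review to 30 tokens
batch_size = 32       # reviews per gradient-descent step

# Each review comes back as a list of integers, one integer per word,
# where the integer reflects how common the word is in the corpus.
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
```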
We'll take just a second to load here, and here we see 25,000 train sequences and 25,000 test sequences. Now, as I mentioned, we're also going to pad or truncate each of our sequences using that max length we discussed earlier, which was equal to 30, as we see right over here. Using the sequence module we pulled in from that preprocessing library, Keras has something built in to quickly pad or truncate our sequences, so we pad x_train to that max length, as well as x_test. Now when we look at the shapes, we see that we still have 25,000 examples each, where every one of those examples is now of length 30. If we want to see what one of those examples looks like, we see here that, again, it's meant to represent a bunch of words, and each one of those words is represented by a single integer.

Now, our goal is to build out our recurrent neural net, and in order to do so, we should dive in a bit into what this Embedding layer is, as well as how this SimpleRNN layer works. Rather than using pretrained word vectors, we're going to learn what those vectors actually are using this Embedding layer. When I say that, again, the embedding will allow you to capture context, so that similar words end up with vectors close to each other. If we're talking about X-dimensional space, let's see, what is our dimensionality here? We put in a word embedding dimension of 50, we see it down here, so each word will get a vector of 50 numbers, and in that 50-dimensional space one vector should be close to another if the words are similar in meaning. We're going to learn whether they're similar in meaning using this Embedding layer. The layer maps each integer to a distinct, dense word vector of length output_dim, as we just mentioned, and we can think of this as learning the word vector embedding on the fly, using the context of IMDB reviews, so it will be specific to IMDB, which could be powerful. Something to note if you're trying to do embeddings on your own: there are pretrained embeddings available, such as word2vec, and because they're pretrained, it's easy to take whatever dataset you have and automatically use that embedding to get vectors that are similar to one another for synonyms.

Then, again, the input dimension should be the size of our vocabulary, and the input length specifies the length of the sequences the network is going to expect, and we just discussed how we're going to keep that at 30 by padding or truncating accordingly. We then have our SimpleRNN layer, where we pass in the number of units, so thinking back to the diagram we saw earlier, we can say how many units we want it to output. We can also say what type of activation to use; tanh is usually best as we pass through our SimpleRNN, but we have the option of working with others, and feel free to play around with that. Then we have our kernel initializer and our recurrent initializer, which give the initial values for our weight matrices. Again, the kernel initializer sets the weights for the input, and the recurrent initializer sets the initial weights for those state layers. Here, we're actually going to change that activation to ReLU, as you can see.
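In code, the padding step looks roughly like this, continuing from the snippet above (`sequence` is the Keras preprocessing module we imported):

```python
# Pad short reviews with zeros and truncate long ones, so that every
# example ends up as exactly maxlen = 30 integer word indices.
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

print('x_train shape:', x_train.shape)  # (25000, 30)
print('x_test shape:', x_test.shape)    # (25000, 30)
print(x_train[0])                       # one review as 30 integers
```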
You can try going back to tanh and see how that works, and then we're also going to pass in that input shape, which is just going to be x_train.shape[1:]. We can look at what that is, x_train.shape[1:], and we see that it's (30,). It doesn't matter how many examples we have; in general, when you pass a shape, you pass the shape of what a single example looks like.

Let's build out our first RNN. Our rnn_hidden_dim is going to be equal to five, and our word embedding dimension is going to be 50. So again, we're going to take those integers that we currently have and, given their context, come up with an embedding that transforms each one of those single values into a vector of dimension 50. We then initialize our model and add on our Embedding layer, passing in max_features as well as word_embedding_dim. That max_features is what we have here, 20,000, which gives the actual input dimension, and the word embedding dimension tells us the new dimensionality; that's going to be the first layer. Once we have our new embedding and our data ready to be fed forward, we can pass that through our simple recurrent neural network. We pass in the number of hidden dimensions, which is just five. We then set our kernel initializer as well as our recurrent initializer, again initializing the weights for that first layer for our input as well as for that state layer. The kernel initializer here is just a random normal with a very tight standard deviation around zero, and the recurrent initializer is just a diagonal matrix with ones along the diagonal. This shouldn't make that large of a difference starting off; you can try removing these and using the default values we have up here. I've tried it, and I believe they give similar performance, though I think this outperforms the defaults by just a bit. We then set our activation to ReLU and the input shape equal to that x_train.shape[1:] we just saw, and finally, to get just one output, because we just want positive or negative, we add on a Dense layer with a sigmoid activation.

Now we have our model and we can look at the summary, and we see we have a bunch of parameters to train for that embedding layer. Then for the SimpleRNN, if we think about it, in that initial matrix going from our input to our state layer, we have 50 as input and five hidden cells. We add on the bias term, so we end up with 50 times 5 plus the five bias terms, so 255 weights there, and then to go from one state layer to the next, recall that we use a five-by-five matrix to keep that same dimension. That's another 25 weights that we learn, and that's how we get the 280 parameters we're currently learning. Finally there's that Dense layer, which is just the five inputs plus the bias term. We then define our optimizer; we're going to use RMSprop with a learning rate of 0.0001. We're going to use binary cross entropy since we're deciding between 0 and 1, we're going to use that optimizer we just discussed, and we're going to track accuracy as well.
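Putting that together, a sketch of the model definition and compile step might look like the following, continuing from the earlier snippets (the initializer and optimizer settings are the ones discussed above; exact syntax assumes tensorflow.keras):

```python
from tensorflow.keras import initializers
from tensorflow.keras.optimizers import RMSprop

rnn_hidden_dim = 5        # number of units in the recurrent state
word_embedding_dim = 50   # length of each learned word vector

model_rnn = Sequential()
# Embedding: 20,000 vocabulary entries x 50 dimensions = 1,000,000 weights
model_rnn.add(Embedding(max_features, word_embedding_dim))
# SimpleRNN: 50*5 input weights + 5*5 recurrent weights + 5 biases = 280.
# (The demo also passes input_shape=x_train.shape[1:] here; with the
# Embedding layer first, Keras can infer the shape on its own.)
model_rnn.add(SimpleRNN(rnn_hidden_dim,
                        kernel_initializer=initializers.RandomNormal(stddev=0.001),
                        recurrent_initializer=initializers.Identity(gain=1.0),
                        activation='relu'))
# Dense: 5 inputs + 1 bias = 6 weights, sigmoid for a 0/1 prediction
model_rnn.add(Dense(1, activation='sigmoid'))

model_rnn.summary()

rmsprop = RMSprop(learning_rate=0.0001)
model_rnn.compile(loss='binary_crossentropy',
                  optimizer=rmsprop,
                  metrics=['accuracy'])
```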
Finally, in order to fit the model, we pass in our x_train and y_train, our batch size, which we defined earlier as 32, the number of epochs, as well as the validation set, which is going to be our x_test and y_test, to let us evaluate how well we're actually performing and whether we're overfitting on that holdout set. You run this, and it will take just a bit, so I'm going to pause the video here, and we'll come back when it's done running and discuss the results.

Now our model has run and gone through the 10 epochs. As we're starting to learn, or probably have learned at this point, it often takes a while for our deep learning models to actually learn each one of the weights and optimize the models we're trying to run. Now we're going to call model_rnn.evaluate on our test set, so on x_test and y_test; this will take just a second, it's not too long, and we're going to get our score and our accuracy. That score is just our binary cross entropy loss, so our loss score, and the accuracy is just our actual accuracy. We have all these lines here, so we'll scroll down to the bottom since we've printed it out, and we see that we have a score, that log loss, of 0.45, and a test accuracy of 0.78.

That closes out this video. In the next video we're going to briefly touch on different ways we can tweak the model we just went through, trying different parameters and hyperparameters. We're not going to go through every possible parameter and hyperparameter, but we will discuss them, and after we go through it in the next video, I suggest that you go through at home and try playing around with each of the different parameters as well. I'll see you there.
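If you want to reproduce this last part on your own before experimenting with the parameters, here is roughly what the fit and evaluate calls from the demo look like, continuing from the snippets above (the reported numbers will vary a bit from run to run):

```python
# Train for 10 epochs, using the test set as a validation set so we can
# watch for overfitting while training.
model_rnn.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=10,
              validation_data=(x_test, y_test))

# Returns [binary cross-entropy loss, accuracy] on the held-out reviews;
# in the demo this came out around 0.45 loss and 0.78 accuracy.
score, acc = model_rnn.evaluate(x_test, y_test, batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
```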