Okay. So, what we would like to do now is dive a little deeper into the details of what machine learning is. One of the key aspects of machine learning is that we have a training set, which again consists of examples of data and outcomes. Our goal is to learn a model that is capable of predicting the outcomes given the data. To do learning we have an algorithm. That algorithm is characterised by a set of parameters, and learning means that we try to infer which parameters of the model are consistent with our training data. To make this a little more precise, we're going to start by considering one of the most basic and widely used algorithms, logistic regression. Logistic regression is a very good starting point and, as we'll see, a nice launching point or building block for subsequent, more complicated models. So, in machine learning the goal is to take a training set, which in this case corresponds to N examples, each made up of data X and an outcome Y. What we would like to do is build a predictive model that is capable of predicting Y given X. That model, again, is characterised by parameters. Let's look at a very simple model, which is a linear predictive model. Remember that X corresponds to the data, and we will say that X sub i corresponds to the ith example. So, remember that we have N training examples, shown here. We're going to look at one of them, the ith example. X sub i is a vector with M components: X sub i1 corresponds to the first component of the vector X sub i, X sub i2 corresponds to the second component, and so on down to X sub iM, which corresponds to the Mth component. A very simple idea here is that we multiply every component of the vector X sub i by a parameter: parameter B1 is multiplied by the first component of X sub i, parameter B2 is multiplied by the second component, and parameter BM is multiplied by the Mth component.
We do all of those multiplications, we add them up, and then we add an additional constant, which we call a bias, B sub zero. We call this a mapping from the data X sub i to a number, a variable Z sub i. This is a very simple linear predictive model. So, if we go back to our example, Y, if you recall, corresponded to whether or not it rained on a given day: Yi equal to one means that on day i it rained, and Yi equal to zero means that on day i it did not rain. The features, or the data that we feed into our model to make a prediction, are for example the cloud cover, humidity, temperature, et cetera. So what we're doing with this model is taking the values for cloud cover, humidity and temperature and multiplying them by the corresponding parameters B1, B2, B3, et cetera. We do those multiplications, add everything up, and that constitutes a variable Z1 for the first example, Z2 for the second example, and so on. Then what we would like to do is convert our prediction of whether or not it's going to rain into a probabilistic representation. Recall that Y sub i equal to one corresponds to it raining on day i. Whenever we observe the data, for example cloud cover, humidity, temperature, et cetera, and we want to make a prediction, it's oftentimes better not to say absolutely whether or not it will rain on a given day, but rather to assign a probability: to say, for example, that with 70 percent confidence it's going to rain on a given day. The way we do this is with something called a logistic function, which is represented by the symbol Sigma and is drawn here. So, the idea is that we take the variable Zi, which is the sum of the components of the data X sub i multiplied by the corresponding parameters B1, B2 through BM, plus the bias B sub zero. We calculate Zi, we feed it into this function, and out comes a number between zero and one.
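The linear part of the model described so far, before the logistic function is applied, can be sketched in a few lines of Python. This is only an illustration: the function name linear_score, the three feature values and the parameter values are made up for the example and do not come from the lecture. In practice the parameters would be learned from the training set.

```python
def linear_score(x, b, b0):
    """Return z = b0 + b[0]*x[0] + ... + b[M-1]*x[M-1] for one example x."""
    if len(x) != len(b):
        raise ValueError("x and b must have the same number of components")
    return b0 + sum(bj * xj for bj, xj in zip(b, x))


# Hypothetical measurements for one day i, each scaled to the range 0..1:
# cloud cover, humidity, temperature.
x_i = [0.9, 0.8, 0.4]

# Hypothetical parameters B1..B3 and bias B0 (assumed values, not learned here).
b = [2.0, 1.5, -1.0]
b0 = -1.5

z_i = linear_score(x_i, b, b0)
print(z_i)  # a single real number; it can be positive or negative
```

Note that Zi is not yet a probability. It can be any real number, which is why a further step is needed to map it into the range between zero and one.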
So, the key thing to notice about this function Sigma, which is called the Sigmoid function, is that it always lives between zero and one. When Zi is large and positive, say around five or six, the Sigmoid function converts it into a number close to one, which means we have high confidence, based upon that value of Zi, that it is going to rain on that day. If Zi is negative, say negative two, negative three, negative four, then the output of the Sigmoid function is small, close to zero, which means that based upon that data we say there is a low probability that it will rain on that day. So, this Sigmoid function is a way for us to convert predictions about the outcome on a given day into a probabilistic perspective. As was said, if Z is large and positive, we have a high probability of outcome Y equal to one, and if Z is negative, we have a high probability of outcome Y equal to zero. This, again, is called logistic regression. To link this back to the example once more: our objective is, based upon observed information about the climate, to make a prediction about whether it is going to rain today. We look at, for example, cloud cover and humidity, and we multiply those by corresponding parameter values B1, B2, et cetera. Those parameters B1, B2, B3 tell us how important those data variables are to the prediction. Finally, that Z is sent into the Sigmoid function. So this is a very simple model: a linear combination in which we multiply the observed data variables by associated parameters, sum them up, map that to a variable Zi, and then run it through a Sigmoid function. This very classic model is called logistic regression. So, this really encapsulates what machine learning is all about.
So, at the top center is the heart of machine learning: we have an algorithmic model, which is a parametric model, characterised in terms of a set of parameters that we would like to learn. As for how we do the learning, on the left we have a set of data, and for that data we have observations X and outcomes Y. We would like to learn the parameters of our model such that the predictions of the model are consistent with the training data. So, what we mean by learning is to infer the parameters B0 through BM that provide a mapping from X to Y that is consistent with the data. This concept of logistic regression is one of the simplest models one can look at. Although it is simple, it is widely used and quite effective, and it captures the heart of what machine learning is about in perhaps its simplest form. What we will see as we move forward is that the basic concepts of logistic regression are used repeatedly in many aspects of machine learning, in particular deep learning.
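The lecture does not say at this point how the parameters are actually inferred. One common approach, assumed here purely for illustration, is gradient descent on the average log loss. The sketch below uses a tiny made-up training set with two features, and the function name, learning rate and number of steps are all hypothetical choices rather than anything specified in the lecture.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for very negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(X, y, learning_rate=0.5, steps=2000):
    """Infer B1..BM and B0 by gradient descent on the average log loss."""
    n, m = len(X), len(X[0])
    b, b0 = [0.0] * m, 0.0
    for _ in range(steps):
        grad_b, grad_b0 = [0.0] * m, 0.0
        for x_i, y_i in zip(X, y):
            p_i = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x_i)))
            err = p_i - y_i  # gradient of the log loss with respect to z_i
            grad_b0 += err
            for j in range(m):
                grad_b[j] += err * x_i[j]
        # Step against the average gradient.
        b0 -= learning_rate * grad_b0 / n
        b = [bj - learning_rate * gj / n for bj, gj in zip(b, grad_b)]
    return b, b0


# Made-up training set: (cloud cover, humidity) per day, and whether it rained.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.2, 0.3], [0.1, 0.4], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]

b, b0 = fit_logistic_regression(X, y)
probs = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x_i))) for x_i in X]
for p_i, y_i in zip(probs, y):
    print(round(p_i, 3), y_i)
```

Here "consistent with the training data" shows up as predicted probabilities above one half for the rainy days and below one half for the dry days. Libraries typically use more refined optimisers and add regularisation, so this should be read as a minimal sketch of the idea, not as a reference implementation.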