Now let's zoom in on a single node in the middle of our basic neural network. First, that node receives input values from the previous layer. Those input values are then combined via weights, one weight per value, much like basic multiple linear regression. That weighted combination is then transformed, similar to how logistic regression transforms a linear combination to squash those values between 0 and 1, and that transformed value is used as input for the next layer.

Now let's add in some variables to paint this process a bit more clearly. We have our three input values, X1, X2, and X3, and we'll also assume an intercept term with value equal to 1, as we do with multiple regression. We also have the respective weights for each of our values, W1, W2, and W3, as well as b, and our model is going to learn each one of these weights as well as b. As mentioned, we multiply each value by its weight, as we do with linear regression, and end up with some output value z. Finally, we use an activation function, such as the sigmoid used in logistic regression, to transform that output and use the result as input for the next layer.

Now, without this activation function we are restricted to only linear combinations of our inputs, and no matter how many layers deep we go, we're still just working with a linear combination of our features. It's this activation function that allows for the great flexibility in how a neural network maps model inputs to model outputs.

Now, some notation that'll be worth getting familiar with as we walk through working with neural networks. We have z, which is the net input, or the linear combination of the inputs prior to activation: essentially the output of just that linear regression part.
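To make this concrete, here's a minimal sketch of that single-neuron computation in Python. The specific numbers for x, w, and b are just illustrative placeholders; in a real network, w and b would be learned.

```python
import math

def sigmoid(z):
    # Squash any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative inputs, weights, and bias; a real network learns w and b.
x = [0.5, -1.2, 3.0]   # X1, X2, X3 from the previous layer
w = [0.4, 0.7, -0.2]   # W1, W2, W3
b = 0.1                # bias (the intercept term)

# Net input z: the linear-regression part of the neuron.
z = sum(wi * xi for wi, xi in zip(w, x)) + b

# Activation a: the non-linear transform passed to the next layer.
a = sigmoid(z)
```

Swapping in a different activation function only requires changing `sigmoid`; the net-input step stays the same.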
We'll have our bias term, that b we just saw, which is similar to the intercept term within linear regression. We'll have f, our activation function, the non-linear function we use to transform z. Then we have a, our activation, or the value once we take f of z, which we ultimately pass through to our next layer.

Now, with this notation in mind, as well as that basic unit we just walked through, that basic neuron, we've seen that there is a close relationship between the neuron and logistic regression. When we choose f of z equal to 1 over 1 plus e to the negative z, where z is the output of just the linear part of that neuron, we are actually looking at something very similar to logistic regression, and what we have here with z is just that intercept term, plus the sum of each of the different inputs multiplied by their respective weights, which we've expanded out here. Our neuron is then simply a unit of logistic regression: the weights that we learn are just the coefficients of logistic regression, the inputs are the different variables, and the bias term is that constant term. It all relates back to our basic logistic regression.

Because logistic regression and a neural network can, in a way, accomplish the same task when we're doing classification, we want to ensure, before we move to a neural network, that we actually need a more complex model: that we don't just need this single unit, but multiple units and perhaps multiple layers. That's when we would switch over to neural networks. The trade-off is that you may be able to come up with a more complex decision boundary with neural networks, but you'll lose a lot of the explanatory value that you have with logistic regression.
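The equivalence above can be sketched directly in code: if we set a neuron's weights to a logistic regression's coefficients and its bias to the intercept, the two produce the same predicted probability. The intercept and coefficient values below are made up for illustration, not taken from a fitted model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative logistic-regression parameters: intercept b0, coefficients b1..b3.
b0 = -0.5
coefs = [1.2, -0.8, 0.3]
x = [2.0, 1.0, -1.5]

# Logistic regression: p = sigmoid(b0 + b1*x1 + b2*x2 + b3*x3)
p_logreg = sigmoid(b0 + sum(c * xi for c, xi in zip(coefs, x)))

# A single neuron whose weights are the coefficients, whose bias is the
# intercept, and whose activation is the sigmoid computes the same thing.
w, b = coefs, b0
z = sum(wi * xi for wi, xi in zip(w, x)) + b
p_neuron = sigmoid(z)

assert abs(p_logreg - p_neuron) < 1e-12
```

The equality holds by construction; the point is that the neuron adds nothing over logistic regression until we stack multiple units and layers with non-linear activations between them.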
What we have here is the sigmoid function, which we use for logistic regression, as well as for our activation when we talked about the neuron and the output of the neural network. What our sigmoid function does is take that linear combination and apply a non-linear transformation, as we see here: we have non-linearity, not a straight line, and it squashes those values between 0 and 1, which will be useful as we walk through the different steps of our neural network.

Now, there are nice properties of the sigmoid function which make it particularly useful within a neural network. Let's say we set our activation function to the sigmoid function. What's special about it is that we will want to take its derivative at each step, and we'll see why later in this course. In order to take this derivative we use the quotient rule: the derivative of the top multiplied by the bottom, minus the top multiplied by the derivative of the bottom, all over the bottom squared.

Now, if we're taking the derivative of the sigmoid function, plugging in each of the different values, the derivative of the top, 1, is 0, so that first term is 0. Then we subtract the top value, which is just 1, multiplied by the derivative of the bottom, which is negative e to the negative z, all over 1 plus e to the negative z, squared. The two negatives cancel, and we end up with e to the negative z over 1 plus e to the negative z, squared. We can expand that out by adding and subtracting 1 in the numerator, which makes clear that we have 1 plus e to the negative z, minus 1, all over that same 1 plus e to the negative z, squared. We can then split that into two fractions, cancelling as we see here, and we end up with 1 over 1 plus e to the negative z, minus 1 over 1 plus e to the negative z, squared. Now we factor that out.
Factoring out that 1 over 1 plus e to the negative z, we have it multiplied by 1 minus 1 over 1 plus e to the negative z, which is just our original sigmoid function multiplied by 1 minus the sigmoid function. That's a nice shortcut for finding the derivative along the way, which will be useful as we compute with our neural network. Now that closes out this video. In the next video we'll introduce a means of actually doing this within Python, after focusing in on a single perceptron and then seeing how each layer connects to the next. I'll see you there.
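The derivation above can be checked numerically: the shortcut formula sigma(z) times 1 minus sigma(z) should agree with a finite-difference estimate of the sigmoid's slope. A small sketch of that check:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_deriv(z):
    # The shortcut derived above: sigma(z) * (1 - sigma(z)).
    s = sigmoid(z)
    return s * (1.0 - s)

# Compare the closed form against a central finite difference
# at a handful of points.
h = 1e-6
for z in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
    assert abs(numeric - sigmoid_deriv(z)) < 1e-8
```

Note that the derivative is computed from values of the sigmoid itself, which is exactly why this form is convenient when we later reuse the forward-pass activations during training.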