Okay, so let's talk about the subject of prediction. Imagine that we have a y variable and an x variable, and we have fit a linear model, either via least squares or, more likely, with some distributional assumptions, to obtain beta hat, which is (X transpose X) inverse X transpose y. Okay. What would be our obvious estimate for the expected value of y at a particular value of x? So, the expected value of y at a given value of x. Now, this is a vector y, so let's say we want the expected value of y naught, a scalar version of y, at the particular value x naught. Well, that expected value is x naught transpose beta. Okay? And so our estimate of that, clearly, is going to be x naught transpose beta hat. So we can create a confidence interval for this prediction very easily using the tools that we've developed so far, because this is, again, just a linear contrast of the betas, and we've already covered how to create a confidence interval for that. So we know the variance of this, which we'll call y hat naught. The variance of y hat naught is equal to x naught transpose times the variance of beta hat times x naught, which is equal to x naught transpose (X transpose X) inverse x naught times sigma squared. Okay. And we know we can generate our t interval like we've done in some of the previous lectures, so our interval estimate is going to be y hat naught plus or minus t 1 - alpha over 2 with n - p degrees of freedom, times s, our estimate of the residual standard deviation, times the square root of x naught transpose (X transpose X) inverse x naught, okay? So if we have a linear regression, what that is, is a confidence interval for the line at a given value of x naught, okay. But that's not the entire story here about prediction intervals, because this only talks about how well we've estimated the line, okay.
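The confidence interval for the line just described can be sketched in code. This is a minimal illustration with simulated, hypothetical "diamond" data (the variable names, sample size, and true coefficients are all made up for the example); it computes beta hat, s, and the interval y hat naught plus or minus t times s times the square root of x naught transpose (X transpose X) inverse x naught.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: price vs. mass for n diamonds (illustrative values only)
n = 50
mass = rng.uniform(0.5, 2.0, n)
X = np.column_stack([np.ones(n), mass])        # design matrix with intercept
y = 100 + 300 * mass + rng.normal(0, 25, n)    # true line plus noise

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                   # (X'X)^{-1} X'y

p = X.shape[1]
resid = y - X @ beta_hat
s = np.sqrt(resid @ resid / (n - p))           # residual standard deviation estimate

# Confidence interval for the mean response at x0
x0 = np.array([1.0, 1.5])                      # intercept term, mass = 1.5
y0_hat = x0 @ beta_hat
se_mean = s * np.sqrt(x0 @ XtX_inv @ x0)       # s * sqrt(x0'(X'X)^{-1} x0)
t_crit = stats.t.ppf(0.975, n - p)             # t_{1 - alpha/2, n-p} for alpha = 0.05
ci = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
print(ci)
```

In practice you would get the same interval from a regression package, but writing it out makes clear that it is just a t interval for the linear contrast x naught transpose beta.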
So if we think about our diamond prices, for example, it talks about how well we've estimated the average cost of a diamond at that particular weight, that particular mass. But if you're selling a diamond, you might be interested in knowing, okay, not if I collected all the diamonds of this particular mass and took the average price that they were valued at. Not that, but if I were to sell this particular diamond, what's the range of possible values that would be reasonable as a price for this diamond? And that's a different thing. So there's a difference, in this context, between a confidence interval for the mean value, in other words the value of the line or the plane or whatever at that particular collection of x values, versus a prediction that incorporates the uncertainty that is included in the y's themselves, okay. So imagine we want to predict y naught, which is the price of this diamond at this particular mass, where we haven't actually observed the y at this particular value of x naught. Think of it as a new value of y. Well, think about the quantity y naught minus x naught transpose beta hat, okay? That's the difference between our actual y naught at that particular value of x naught, the new realized value of y, and what we would predict at this value of x naught, where again our beta hat hasn't used this y naught in its calculation, okay? So now the variance of this is the variance of y naught plus the variance of, let's say, y hat naught. Okay, and I can move that variance across that sum, again, because this beta hat didn't involve that y naught, this potential new value of y naught, in its calculations, so they're independent. Well, this variance of y naught is sigma squared, plus the variance of y hat naught, which we just did a second ago: that's sigma squared times x naught transpose (X transpose X) inverse x naught, okay?
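The variance decomposition just described can be written out in the same notation; the independence of y naught and beta hat is what lets the variance split across the difference:

```latex
\begin{aligned}
\operatorname{Var}\!\left(y_0 - \hat y_0\right)
  &= \operatorname{Var}(y_0) + \operatorname{Var}\!\left(x_0^\top \hat\beta\right) \\
  &= \sigma^2 + x_0^\top (X^\top X)^{-1} x_0 \,\sigma^2 \\
  &= \sigma^2\!\left(1 + x_0^\top (X^\top X)^{-1} x_0\right).
\end{aligned}
```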
So this variance is sigma squared times (1 plus x naught transpose (X transpose X) inverse x naught), and to estimate it we replace sigma squared with s squared. And then what I'm going to ask you to do for homework, because it should be old hat for you now, is to prove to yourself that y naught minus x naught transpose beta hat, over s times the square root of (1 plus x naught transpose (X transpose X) inverse x naught), follows the t distribution with n minus p degrees of freedom. And so we can calculate the probability that, say, the alpha over 2 t quantile with n minus p degrees of freedom is less than or equal to y naught minus x naught transpose beta hat over s times the square root of (1 plus x naught transpose (X transpose X) inverse x naught), which is less than or equal to the t 1 - alpha over 2 upper quantile. That should be equal to 1 - alpha. In other words, we're looking at our t distribution, and if we put alpha over 2 of the mass in each tail, the probability that our statistic lies between those two cutoffs should be equal to 1 - alpha. And we can rearrange that to make the probability statement: the probability that y naught is in the interval x naught transpose beta hat plus or minus t 1 - alpha over 2 times s times the square root of (1 plus x naught transpose (X transpose X) inverse x naught) is 1 - alpha. Okay. So that quantity is our prediction interval. Notice, it's exactly the same as our confidence interval for the mean of the regression surface at that point, the only difference being the extra 1 plus. Okay, so what is that extra 1? Well, if we look back, that extra 1 is from the variance of this y naught term right here. And it makes total sense that we would have to have that extra 1 in there for a prediction interval versus a confidence interval for the mean.
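The only change from the confidence interval code is that extra 1 inside the square root. A minimal sketch, again with simulated, hypothetical data (all names and numbers are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Same hypothetical setup: price vs. mass
n = 50
mass = rng.uniform(0.5, 2.0, n)
X = np.column_stack([np.ones(n), mass])
y = 100 + 300 * mass + rng.normal(0, 25, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
p = X.shape[1]
s = np.sqrt(np.sum((y - X @ beta_hat) ** 2) / (n - p))

x0 = np.array([1.0, 1.5])
y0_hat = x0 @ beta_hat
t_crit = stats.t.ppf(0.975, n - p)
h0 = x0 @ XtX_inv @ x0                   # x0'(X'X)^{-1} x0, the same term as before

ci_half = t_crit * s * np.sqrt(h0)       # confidence interval half-width
pi_half = t_crit * s * np.sqrt(1 + h0)   # prediction interval: note the extra 1
print(y0_hat - pi_half, y0_hat + pi_half)
```

The prediction interval is always wider than the confidence interval at the same x naught, since the extra 1 under the square root adds the intrinsic variability of a new y to the estimation uncertainty in the line.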
Okay? So if we want a confidence interval for the mean of the regression surface, then as we collect an infinite amount of data, that confidence interval should get narrower and narrower. It should limit to exactly the mean price. So if we collected all the diamonds in the world of that specific weight, we should have a very good estimate of what the line is at that particular point. On the other hand, if we want to know the potential set of prices we could get for this specific new diamond that we're trying to sell, okay, there's some intrinsic variability surrounding the line, no matter how well we estimate the line. Okay? And that's why this 1 part doesn't go away: no matter how much data we have, however much better we estimate the line, it's still there. It's an intrinsic variability. So as we collect an infinite amount of data, s converges to sigma, and the x naught transpose (X transpose X) inverse x naught part converges to 0. And so that 1 just stays there, representing the natural variability around the line. Okay? So that's the distinction between a prediction interval and a confidence interval. A prediction interval, by the way, is not a confidence interval, because if you look at the actual probability statement we used, the quantity we're saying is in the interval is a random quantity. So it's not a confidence interval, but we derive it in much the same way. And at this point in the class, I think you should be able to derive both the prediction interval and the confidence interval, which is why I'm glossing over this a little bit. But I just want to make sure that everyone understands the distinction between the two and why there's this fixed quantity in the prediction interval.
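You can see this limiting behavior numerically. A small simulation sketch (hypothetical data again, with sigma = 25 known to the simulation but estimated by the fit): as n grows, the confidence half-width shrinks toward 0 while the prediction half-width settles near the 0.975 normal quantile times sigma.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sigma = 25.0
x0 = np.array([1.0, 1.5])

def interval_halfwidths(n):
    """CI and PI half-widths at x0 for a line fit to n simulated points."""
    mass = rng.uniform(0.5, 2.0, n)
    X = np.column_stack([np.ones(n), mass])
    y = 100 + 300 * mass + rng.normal(0, sigma, n)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    s = np.sqrt(np.sum((y - X @ beta_hat) ** 2) / (n - X.shape[1]))
    t_crit = stats.t.ppf(0.975, n - X.shape[1])
    h0 = x0 @ XtX_inv @ x0              # converges to 0 as n grows
    return t_crit * s * np.sqrt(h0), t_crit * s * np.sqrt(1 + h0)

for n in (50, 5000, 500000):
    ci, pi = interval_halfwidths(n)
    print(n, round(ci, 2), round(pi, 2))
# CI half-width shrinks toward 0; PI half-width approaches roughly 1.96 * sigma
```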
Okay, it's because of this natural variability that exists around the regression line, or around the regression surface, that doesn't go away, if what you want to estimate is the set of potential likely values for a response at that given value of x, or that given collection of values of x.