So we get to the Z-distribution. Now, in order to use the Z-distribution, we should know the population standard deviation and, in this example, the population mean. Most of the time we don't know those; we only know the mean and the standard deviation of our small sample. But imagine we did know them; then we can use the Z-distribution. It's actually easy to explain. Now, I'm going to use an example which is not set in reality at all; it's just to illustrate the point, so follow along. We have a hormone level, and we measure it in the blood. Imagine we always did it by some technique: all the machines that were ever built to do that measurement used the same technique. And every now and then we've got to calibrate each machine. So we take a little sample reagent that was made in a factory, so it doesn't come from anyone's blood, and it has a value of exactly three in it. If we were to put it in the machine, the machine must measure three. And so every machine that's ever been made will measure a mean of three for that reagent. That's just to say that we know the machine works. And we know that it would always give a standard deviation of about 1.44. So imagine we knew this absolutely: that's a whole population, and we know it has a mean of three and a standard deviation of 1.44. Now someone invents a new technique, a new method for this machine to work. And we want to know: if we took that reagent that we know has a value of three in it and gave it to these new machines that use the new technique, what was the probability of finding a mean of more than 3.5 from a sample of 25? You might say, well, it should have given three; so how likely was it that, from a sample of 25, we would find a mean of 3.5 or more? What was the P value, the probability of finding that? Now, remember that this 3.5 in our sample of 25 is just one of many possible values we could have gotten.
We could have taken 25 other machines, or repeated this experiment later, and we would get another mean of 3.5 or some other mean. So this one single 3.5 is just somewhere on a curve of many, many possible means. That means we can construct this Z-distribution, this curve, and we can ask how many standard errors away from the mean 3.5 would represent. Remember, we said we can't talk about standard deviations anymore; we've got to ask ourselves how many standard errors away from the mean this 3.5 would be. So first of all we've got to convert the standard deviation. We knew from the population that it was 1.44, and we just have to change that into units of standard error. That is going to be 1.44 divided by the square root of 25, because our sample had 25 in it, and we see it changes to 0.288. That would now represent one standard error away from the mean. Now we change the question ever so slightly. We know what one standard error is now: it's 0.288. How many standard errors away from the mean is 3.5 away from 3? So I've just got to take that difference, which is a half (3.5 minus 3 is a half), and change it into units of standard error. That will be a half divided by the standard error, divided by 0.288, and we see that the half-unit distance between 3 and 3.5 is 1.74 standard errors away from the mean. So I've just converted my absolute values: I had a mean of 3 and a sample mean of 3.5, and I asked how far 3.5 is away from 3 in units of standard error. That is what we do to calculate the test statistic. Now we've got to convert that into a P value. That can be done by a computer, which does it by numerical analysis, an integration. Or many of the textbooks on statistics have tables at the back, and you can just do it from those tables.
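The arithmetic above can be sketched in a few lines of Python. All the numbers come straight from the example; nothing else is assumed:

```python
import math

# Known population parameters for the calibration reagent (from the example)
pop_mean = 3.0      # the reagent's true value
pop_sd = 1.44       # population standard deviation
n = 25              # number of new machines sampled
sample_mean = 3.5   # mean reading from the new machines

# Standard error of the mean: the population SD divided by sqrt(n)
se = pop_sd / math.sqrt(n)
print(se)           # 0.288

# z-statistic: how many standard errors the sample mean lies from 3
z = (sample_mean - pop_mean) / se
print(round(z, 2))  # 1.74
```

Note that 0.5 / 0.288 is really 1.736…; the table forces us to round it to two decimals, 1.74.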
And the way those tables work: you see the Z table there; down the left-hand side you go until you find 1.7, and then you go across the top to find the second decimal, which would be 0.04, so that in the end it reads 1.74. You reconstruct 1.74 by finding the units and first decimal place along the left side, and then along the top you find your second decimal. You look at where they cross, and you see that would represent 0.0409. So being 1.74 standard errors away from the mean, which is how far 3.5 is from 3, would represent an area of about 0.04 under the curve, from 3.5 out to positive infinity; remember, 3.5 is larger than 3, so we're looking at the positive side. That's only 4% of the area under the curve, and if 0.05 was our cutoff value, this is a statistically significant finding. Now, we can use that table in a different way. We can also ask: what value, in standard errors, would mark off an area of 5%, 0.05, under the curve? So we just use the table in reverse. We look at the actual values inside the table, and we look for something that is close enough to what we want. Remember, this graph reads from left to right, and we're looking from our mark out to positive infinity. So we've actually got to ask ourselves what value gives 0.95, because we want the mark that represents 95%; we're only interested in the last 5%. And if we read the table off, the closest we'll find to 0.95 is going to be 0.9495 in most of those tables; of course the computer will work it out much more precisely. That will be at about 1.6 on the left, and then you look up and see it's in the 0.04 column: 1.6 plus 0.04 is 1.64. So if we were to go 1.64 standard errors away from the mean, towards positive infinity, that little area under the curve there would represent 5% of the area under the curve. So, look at this last slide.
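Both directions of the table lookup can be reproduced with Python's standard library: `statistics.NormalDist` gives the cumulative area under the standard normal curve (`cdf`) and its inverse (`inv_cdf`), so it can stand in for the printed Z table:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, sd 1: the Z-distribution

# Forward lookup: area to the right of z = 1.74 (the upper-tail P value)
p = 1 - std_normal.cdf(1.74)
print(round(p, 4))       # 0.0409

# Reverse lookup: which z marks off the upper 5% of the curve?
z_crit = std_normal.inv_cdf(0.95)
print(round(z_crit, 2))  # 1.64
```

The computer's reverse lookup gives 1.6449, which the two-decimal table rounds to 1.64.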
That is what we have done now. We have the Z-distribution, which is a central limit theorem distribution, and our sample mean of 3.5 just fell somewhere on that curve. If we had to repeat this over and over again, it would occur with a certain frequency. The frequency we really think about is from 3.5 towards the right-hand side, towards positive infinity; if it was less than the population mean, it would be the other side. So you see the little green bar; you see the 1.64. For this instance, knowing the population mean of three and the population standard deviation of 1.44, we know that 1.64 would mark off an area under the curve towards the right of 5%. If you are 1.64 standard errors away from the mean, towards positive infinity, that represents 5% of the area under the curve. Finding any value larger than 1.64, expressed in terms of the standard error, represents a statistically significant finding; it would be rare to find that. Remember that the table always reads from the negative-infinity side: everything up to that mark would represent 95% of the area under the curve, and we're interested in that last 5%. Now look at the value we found. We converted the difference between 3.5 and 3, which is a half, into units of standard error: we divided it by 0.288, and that gave us 1.74. So finding a 3.5 was 1.74 standard errors away from the mean, and that fell within that grey area of 5%, of 0.05. That is a significant finding; it would be rare to find that 3.5. It is statistically significant, being less than 0.05, if 0.05 was our cutoff value.
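The idea that our 3.5 is just one of many possible sample means can also be checked by simulation. This sketch assumes, as in the example, that readings really do come from a population with mean 3 and standard deviation 1.44; it repeats the 25-machine experiment many times and counts how often the sample mean lands at 3.5 or beyond:

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

pop_mean, pop_sd, n = 3.0, 1.44, 25
trials = 100_000

# Repeat the experiment: draw 25 readings from the known population
# and check whether their mean is at least 3.5.
extreme = 0
for _ in range(trials):
    sample = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
    if statistics.fmean(sample) >= 3.5:
        extreme += 1

# Fraction of sample means at or beyond 3.5: an estimate of the P value,
# which should come out close to the theoretical 0.041
print(extreme / trials)
```

The simulated fraction hovers around 0.04, matching the area we read off the Z table.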
I think this last slide beautifully illustrates where the P value comes from. We construct a graph, which is a sampling distribution; we convert the differences into units of standard error; we see what value marks off an area of five percent; and we see where our mean, or our difference in means, actually fell. Did it fall inside of that very rare five percent, which really defines those kinds of differences? Or did it fall in an area which was more common to find, so that it really wasn't a significant finding?