Now we can represent our whole dataset by a single value. But we also care about how spread out those data points are. Are they all bunched up close to the median? Are they spread further away?

The first way to describe our spread is simply the range. The range just requires two things: the minimum value and the maximum value in our dataset. If we look again at the report Mgelea and colleagues published on detecting virological failure in Tanzanian children, you'd see that those 280 children with a mean age of 10.6 years had a range of 1 to 16. So the youngest child was 1, and the oldest child was 16. It just tells us that whole range of values. And we use that when we simply want to describe our dataset. We're not inferring this result onto a larger population, just describing what we find here. There's no need to do anything more than state the minimum and maximum values, and we call that the range.

Now let's move on to quartiles and percentiles. With quartiles, the word says it all: quartile, quarter, four. We divide our dataset into four sections, each with an equal number of values. Remember the median: the median divided the numbers into two sections, where half of the values would be less than it and half would be more. The quartiles do exactly the same, but instead of having just two sets, we now have four. So the first quartile is the value for which a quarter of the values are less than it and three-quarters, irrespective of what they actually are, are more than it. The second quartile is just the median: half will be less than, half will be more than. The third quartile is simply the value with three-quarters of the values less than it and one quarter more than it. We do have a zeroth quartile, which is actually the minimum value, and we do have a fourth quartile, which is the maximum value. Percentiles just give us a bit of finer detail.
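As a minimal sketch of these ideas, the range and quartiles can be computed with Python's standard library. The ages below are made up for illustration; they are not the actual data from the study.

```python
import statistics

# Hypothetical ages (NOT the actual study data), already sorted
ages = [1, 3, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

# The range needs only two things: the minimum and the maximum
data_range = (min(ages), max(ages))
print("range:", data_range)  # (1, 16)

# Quartiles: three cut points dividing the data into four equal parts.
# method="inclusive" treats the data as the whole population.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")
print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3)

# Percentiles are the same idea with finer granularity: n=100 gives
# 99 cut points, and the 25th percentile equals the first quartile.
percentiles = statistics.quantiles(ages, n=100, method="inclusive")
print("25th percentile:", percentiles[24])
```

Note that the second quartile returned here matches `statistics.median(ages)`, exactly as described above.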
Instead of just those four cut points, I can drill down into each integer percentage: first, second, third, and so on. That means the 25th percentile is actually the first quartile, and the 50th percentile is the second quartile, or the median. It's just a bit of finer granularity.

So why do we care about quartiles? Well, we've got this very interesting concept called the interquartile range. And this is simple subtraction: you take the value of the third quartile and subtract from it the value of the first quartile. Let's have a visual representation of that. You can see there on the left-hand side all our data values. Imagine that to be all our data values. We have a nice box-and-whisker plot of that data. It tells us what the median was, and it tells us the values of the first and the third quartiles. The interquartile range is just the value at the third quartile minus the value at the first quartile. What we do with that value is multiply it by 1.5. If we add the result to the third quartile, that's where you see the upper whisker, and if we subtract it from the first quartile, we see the lower whisker. And those are the values we actually care about, because values outside of them can be called statistical outliers. You can make a case for actually excluding them from the data analysis. When you do that, though, you have to report that you are doing it.

Let's move on to the very exciting topic of standard deviations. It's actually a beautiful concept and makes so much sense. We've got our mean, the value that represents all our values by a single number. But we want to know the average distance, the average difference, between all the other values and that mean. So we take the difference between every single value and the mean, and we see what the average of those differences is. Now, you can well imagine that some of the values are going to be less than the mean.
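The whisker calculation described above can be sketched like this; the data values are made up for illustration, with one deliberately extreme value included.

```python
import statistics

# Hypothetical data with one suspiciously large value
values = [2, 4, 4, 5, 5, 6, 6, 7, 8, 30]

q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: third quartile minus first quartile

# Multiply the IQR by 1.5 to place the whiskers
lower_whisker = q1 - 1.5 * iqr
upper_whisker = q3 + 1.5 * iqr

# Values outside the whiskers can be called statistical outliers;
# if you exclude them from an analysis, you must report that you did.
outliers = [v for v in values if v < lower_whisker or v > upper_whisker]
print("IQR:", iqr, "whiskers:", (lower_whisker, upper_whisker))
print("outliers:", outliers)  # [30]
```

This is the same arithmetic a box-and-whisker plot performs behind the scenes to decide where to draw the whiskers and which points to plot individually.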
So if we do a subtraction, we're going to end up with a negative value, and we can't have that. What we do instead is square all those differences. Remember, if you square a negative you end up with a positive, because a minus times a minus is a positive, and a positive squared is also positive. So now we can add all of those squared differences and divide by the number of values to find our final number. But because we squared, that number is actually called the variance, and at the end we have to take the square root of it, which we call the standard deviation.

So there you have it: we have measures of central tendency and measures of dispersion. Those are the common ones that you're going to see in the literature. But I want to warn you about something. Remember, these are calculations. They are not the whole dataset. We are representing the dataset by some summary, and there's a very interesting thing called Anscombe's quartet. Anscombe came up with it many years ago, and it really shows how these summaries don't tell the full truth. I want you to look at this table. You'll see the variables there: x1, y1, x2, y2, x3, y3, x4, y4. If you were to do the calculations, you'd very quickly see that all the x's, x1, x2, x3, and x4, have exactly the same mean and exactly the same standard deviation. If you look at the y columns, y1 through y4, those are wildly different numbers, but again, those columns have exactly the same mean and exactly the same standard deviation. Completely different datasets, and they have exactly the same summary values, until you plot them. And that's very important: if you're looking at your own data, or at someone else's data, plot it first. It gives you a much better understanding. Look at that: simple scatter plots of those four sets of columns, and you can see how differently those things really occur.
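The variance and standard deviation steps above can be sketched directly, and the same sketch can check the claim about Anscombe's quartet. The first list of values is made up for the worked example; the two x columns are from Anscombe's published quartet.

```python
import math
import statistics

# Hypothetical data for the worked example
values = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(values) / len(values)  # 5.0

# Square every difference from the mean so the negatives can't cancel
squared_diffs = [(v - mean) ** 2 for v in values]

# The average of the squared differences is the variance
# (dividing by n; a *sample* variance would divide by n - 1)
variance = sum(squared_diffs) / len(values)

# The square root of the variance is the standard deviation
std_dev = math.sqrt(variance)
print("variance:", variance, "standard deviation:", std_dev)

# Two of Anscombe's x columns: completely different values...
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

# ...yet identical mean and identical (sample) standard deviation
print(statistics.mean(x1), statistics.mean(x4))
print(statistics.stdev(x1), statistics.stdev(x4))
```

The summaries agree perfectly, yet one column is an even spread of values and the other is ten identical points plus one extreme point, which is exactly why you should plot your data before trusting any summary.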