In this example, I want to look at a simple one-way ANOVA using the InsectSprays dataset. This dataset is built into R, and it has a couple of columns that we're going to look at. I'm going to present the slides first and show you how to calculate the one-way ANOVA manually, and then we'll look at the R example, where you'll see that it's very easy to implement.

The dataset has two columns. One is a count variable, the number of bugs that were counted, and the other is the type of insecticide, of which there are six: A, B, C, D, E, and F. Within each type of insecticide spray, there are 12 numbers. So you can imagine a giant plot that they've divided up into six subplots, and within each subplot they've sprayed one of the different chemicals. When they took measurements, they took 12 from A, 12 from B, 12 from C, etc. We want to test whether or not there was any difference among these sprays. Below are some examples of the data: spray A has counts of 10, 7, 20, and so on, 12 of them in all; spray B has 11, 17, 21, etc.

Recall that in the ANOVA test we're testing to see whether the group means are the same. That's our null hypothesis: the group mean of A is the same as the group mean of B, which is the same as C, D, E, and F. The alternative hypothesis is that at least two of the group means are different. Here y_i indicates the mean of insecticide i, where i ranges over A, B, C, D, E, F. We're going to use a significance level of 0.05. That's something you'll always want to communicate when presenting a report. In this case, the sample size for each group is 12, and that's denoted n sub A is the same as n sub B, and so on through n sub F.
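To make the setup concrete, here is a minimal sketch for inspecting the dataset and confirming the group sizes. It is not the instructor's script; the variable names `group_sizes`, `k`, and `n_total` are chosen here purely for illustration.

```r
# InsectSprays ships with base R (package "datasets"), so nothing to download.
data(InsectSprays)

str(InsectSprays)        # two columns: count (numeric) and spray (factor A-F)
head(InsectSprays, 3)    # the first few rows, all from spray A

# Illustrative names (not from the lecture):
group_sizes <- table(InsectSprays$spray)   # observations per spray
k <- nlevels(InsectSprays$spray)           # number of groups
n_total <- nrow(InsectSprays)              # total number of observations

print(group_sizes)
cat("k =", k, " n =", n_total, "\n")
```

Checking the group sizes first is worthwhile because the hand formulas on the slides assume every group has the same number of observations.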
So all the group sizes are the same, my total number of data points collected is 72, and k, the number of groups, is six. Make sure you have that straight in your head: I have six groups, 12 items within each group, and 72 items altogether.

Here are the sum of squares calculations. Firstly, we want to calculate the grand mean. There it is. For all i's and j's, you take every element, add them up, and then divide by n, the total number of observations you have. In this case, that happens to be 72. Then within each group, you're going to want to calculate the group mean. Notice that the difference between this formula up here and one of these group mean formulas is that we're holding the group fixed, denoted by A, and just going through every element in group A. That's what this notation is telling me here. Then you divide by the group size n. We know in this case that n is constant at 12. If it were not constant, then you would have to account for that. So y_A, the group mean of A, is 14.5, B is 15.33, and you'd want to do that for each of the groups.

Your sum of squares treatment, recall, is built from the distance between your group mean y_i dot and your grand mean, denoted y dot dot. y_i dot means group i over all the j's. That's what the little dot means. Then we have 12 in each sample. So there's the formula. This is group A, this is group B, this one here, 2.08 minus 9.5, is C, and you do the rest of them. I hope you can see that here. Let me point that out with a pen. The 14.5 here came from this number here. So that's the group mean minus your grand mean. Here's the next group mean minus your grand mean, and so on and so forth. You take the difference between the group mean and the grand mean, you square that, you add them all up together, and then you multiply by n.

Here is the sum of squared error, and this goes from the actual data point. So y_ij is group i, the j'th element. Then here's the group mean.
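The grand mean, the group means, and the between-group (treatment) sum of squares described above can be sketched in a few lines. This is a rough illustration rather than the slides' own code, and the names `grand_mean`, `group_means`, `n_per_group`, and `ss_treatment` are assumptions made for this example.

```r
data(InsectSprays)

counts <- InsectSprays$count
spray  <- InsectSprays$spray

# Illustrative names (not from the lecture):
grand_mean  <- mean(counts)                    # mean over all observations
group_means <- tapply(counts, spray, mean)     # one mean per spray, A to F
n_per_group <- tapply(counts, spray, length)   # group sizes

# Between-group (treatment) sum of squares. Weighting each squared
# deviation by its own group size also covers the unequal-n case.
ss_treatment <- sum(n_per_group * (group_means - grand_mean)^2)

print(grand_mean)
print(round(group_means, 2))
print(ss_treatment)
```

With equal group sizes, weighting by `n_per_group` gives the same result as summing the squared deviations and multiplying by n at the end, which is how the slide does it.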
So here are the group means that I calculated for A, B, and C: 14.5, 15.33, and so on, and you can see them here, 14.5, 14.5, etc. The first element in group A is 10, the second element is 7, and so you just have to calculate this long equation.

Now we want to calculate the mean squares for the treatment and the mean squares for the error terms. For the treatment, you take the sum of squares treatment divided by k minus one, the number of groups minus one, and you get this value. For the error, you take the sum of squared errors over the total number of elements minus the number of groups, and you get this 15.38. So that's your mean squared error. Then your F statistic is the ratio of those two values. You get 34.7, and recall that if the number is way out to the right, you have a smaller p-value. You are going to need to know your degrees of freedom to look it up, which here are 5 and 66. The F distribution does change shape based on these two parameters, df1 and df2. In this case, we got a p-value that's far less than 0.05, so we can reject the null hypothesis that all six group means are the same.

This is what a standard ANOVA table would look like. Between groups, that's your treatment: here's your sum of squares treatment, here's your mean square treatment, and when you see a table, you can eyeball that the sum of squares divided by your degrees of freedom should equal your mean square. Here's your F statistic, and here are your within-group values. There you go. In this particular case, we get the following table. So let me show you how you run that in R.
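As a bridge to that demo, here is a rough sketch that reproduces the hand calculation step by step, so each number on the slides can be checked. It is not the instructor's script; all variable names below are assumptions chosen for illustration, and it uses only base R functions (`ave`, `pf`, `data.frame`).

```r
data(InsectSprays)

counts <- InsectSprays$count
spray  <- InsectSprays$spray

k <- nlevels(spray)          # number of groups
n_total <- length(counts)    # total number of observations

# ave() returns, for every observation, the mean of its own group.
fitted_means <- ave(counts, spray)

# Illustrative names (not from the lecture):
ss_treatment <- sum((fitted_means - mean(counts))^2)   # between groups
ss_error     <- sum((counts - fitted_means)^2)         # within groups

df_treatment <- k - 1
df_error     <- n_total - k

ms_treatment <- ss_treatment / df_treatment
ms_error     <- ss_error / df_error
f_stat       <- ms_treatment / ms_error

# Upper-tail probability of the F distribution with (df1, df2)
p_value <- pf(f_stat, df_treatment, df_error, lower.tail = FALSE)

anova_table <- data.frame(
  Df     = c(df_treatment, df_error),
  SumSq  = c(ss_treatment, ss_error),
  MeanSq = c(ms_treatment, ms_error),
  row.names = c("Between (spray)", "Within (error)")
)
print(anova_table)
cat("F =", round(f_stat, 1), " p-value =", format(p_value), "\n")
```

Laying the pieces out in a small data frame mirrors the ANOVA table on the slide, so the "sum of squares divided by degrees of freedom equals mean square" check can be done by eye.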