Okay, so we're here in the R environment. InsectSprays is a preloaded data set in R, so we can just attach it, which adds the data set to R's search path so its columns are available by name. But we might want to get a handle on this data set first. There are two common commands you can use. One is dim, for dimensions. It reports 72 and 2, so the data set has 72 rows and 2 columns. And remember, it's always rows first, then columns. The other is the str command, which tells us a little bit about the structure of the data set. The first column is named count, the second column is named spray, and after each name you see the first few actual values. If we type InsectSprays, we can take a look at the data set itself. You can see here's row 1: it's from insecticide A, with a count of 10 bugs in that sample, and so on. As we'll confirm in a moment, there are 12 samples for each spray.
Okay, so let's go back to the code. This next line uses tapply, which is one of the suite of apply functions in R. I know I did not discuss it in the tutorial videos, but let me just explain what it's doing. Recall that the column names of InsectSprays are count and spray, and that the length command gets the length of a vector. We saw the dimensions earlier, 72 by 2, and the structure command showed us the count and spray columns. So if I run length(InsectSprays$count), it's 72, and that's the length of that column. I can also run length on spray, S-P-R-A-Y, and that is also 72. What the tapply command is asking is: what is the length of count if I separate it out by each of the values of spray? So I'm going to count how many observations there are for spray type A, how many for spray type B, spray type C, and so on. The arguments to tapply are the values, then the grouping variable, and then the function. When I run it, I get 12 in each category, so now I know that there are 12 elements in each group.
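The exploration steps just described can be sketched roughly as follows. This is a minimal reconstruction, not the exact on-screen script: the call to data() and the variable name group_sizes are assumptions added for illustration.

```r
# InsectSprays ships with base R's datasets package.
# data() is optional here (assumption: the video uses attach() instead).
data(InsectSprays)

dim(InsectSprays)    # number of rows, then number of columns
str(InsectSprays)    # column names, types, and the first few values
head(InsectSprays)   # first six rows of the data set

length(InsectSprays$count)   # length of the count column
length(InsectSprays$spray)   # length of the spray column

# tapply(X, INDEX, FUN): apply FUN to X separately for each level of INDEX.
# Here: how many counts are there for each spray type?
group_sizes <- tapply(InsectSprays$count, InsectSprays$spray, length)
group_sizes
```

Swapping length for another function, such as mean, gives a per-spray summary with the same call shape.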
Next, let's do a little visualization, and there's our box plot. The lower whisker is the min, the upper whisker is the max (apart from any points R draws separately as outliers), and the dark line in the middle is the median. The box represents the quartiles, 25% and 75%. Let me expand that. Just by looking at this graph on the right, you can see that some of the insecticides had a better effect than others. Clearly, C, D, and E have lower counts, which means those insecticides worked well. The others are up here, so they were not as effective.
So that's the box plot, and now let's run a one-way ANOVA. Here's the command, aov. We're interested in count as our response variable, and the type of spray is our treatment, so we put that in as the independent variable, with data = InsectSprays. And here's your standard R technique of running your function and putting the results of the function in some variable name. Here I called it anova, but I could have called it anything I wanted to. Then I use the summary command to actually see the results. Let me clear the bottom pane so it's easier to see and run summary again. So you can see the degrees of freedom for spray is five, which is the number of groups minus 1. The sum of squares for the treatment is 2669, and if you divide 2669 by 5, you should get about 533.8, the mean square for the treatment. For the bottom part of the F ratio, it's this number, 1015, divided by 66. The 66 is 72 minus 6, the number of observations minus the number of groups, and if you divide 1015 by 66, you should get about 15.4. This is your actual value of the F statistic, but really the key number is the p-value here, and that it's less than 0.05. So we can reject the null hypothesis that all the group means are the same, and conclude that at least two of them are different. In practice, you'll just run this and you should know how to read a one-way ANOVA table, but I also wanted to show you some code on how to calculate these numbers manually.
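As a rough sketch of the two steps above, the box plot and the one-way ANOVA could be run like this. The axis labels are assumptions added for readability; the variable name anova follows the transcript.

```r
# Box plot of insect counts, one box per spray type.
# The axis labels are illustrative and not taken from the video.
boxplot(count ~ spray, data = InsectSprays,
        xlab = "Spray", ylab = "Insect count")

# One-way ANOVA: count is the response, spray is the treatment factor.
anova <- aov(count ~ spray, data = InsectSprays)

# The ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
summary(anova)
```

Reading the table row by row, each Mean Sq entry is the Sum Sq divided by its Df, and the F value is the ratio of the two mean squares.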
I did show you some formulas in the PowerPoint slides, so I thought we'd do this manually. n is the number of observations, and that should be 72, so n is 72. For the number of observations per group, we saw that before, so I'm just going to hard code it as 12. k is the number of unique sprays. I know they go A, B, C, D, E, and F, but here I get the unique ones and then count them. If you just run the inner portion of the command, you get a vector of the unique values. And what's the length of this vector? It should be 6, and you can see up here that k is equal to 6. Here's the grand mean, and that's just the mean of all the data in this data set. So we take the mean of InsectSprays$count, and that's now the grand mean. Now I want to get the mean within each group, so I use the tapply command again. This time the function is mean: I'm going to get the mean of count, broken down by the type of spray. So this gives me six group means, and there they are. We can take a look at group_mean, and we get 14.5 for group A, then 15.33, 2.08, and so on.
Okay, the sum of squares for the treatment is the number of observations per group times the sum of the squared differences between each group mean and the grand mean. Then the mean square for the treatment is just the treatment sum of squares divided by k minus 1, and there you have it. The sum of squared errors is similar. The only weird thing in this implementation is this part, where I repeat each group mean 12 times, once for each of the elements in that group. I'm sure there's a more elegant way to do that, but it worked. Then I get the mean squared error, which is the sum of squared errors divided by n minus k. The F statistic is just the ratio of the two mean squares. I'm not going to go through this last command in too much detail, but it basically looks up the p-value based on that F statistic.
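The manual calculation walked through above might look roughly like this. Only n, k, and group_mean are names mentioned in the video; every other variable name here is an assumption chosen for illustration.

```r
n <- length(InsectSprays$count)           # total number of observations
n_per_group <- 12                         # hard coded, as in the video
k <- length(unique(InsectSprays$spray))   # number of spray types

grand_mean <- mean(InsectSprays$count)
group_mean <- tapply(InsectSprays$count, InsectSprays$spray, mean)

# Treatment: squared distance of each group mean from the grand mean,
# weighted by the (equal) group size.
ss_treatment <- n_per_group * sum((group_mean - grand_mean)^2)
ms_treatment <- ss_treatment / (k - 1)

# Error: squared distance of each observation from its own group mean.
# rep(..., each = 12) lines up with the data only because the rows are
# stored in spray order (all A rows, then all B rows, and so on).
ss_error <- sum((InsectSprays$count - rep(group_mean, each = n_per_group))^2)
ms_error <- ss_error / (n - k)

f_stat <- ms_treatment / ms_error
f_stat
```

One possible alternative to the rep() step is ave(InsectSprays$count, InsectSprays$spray), which returns each row's group mean directly and does not depend on the row order.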
For those of you who might have seen something like this before in a more traditional statistics class, this is where you would calculate the F statistic, and then go to the back of the book and look up the p-value in the tables in the appendix. But R can just do that for you. Actually, it's not too bad: it's just pf, then the F statistic that you're looking up, and your two degrees of freedom. Oops, I hadn't run this code yet, so let me do that. And so we have a p-value of 3.18 times 10 to the minus 17. It's very, very small, and that lines up with the summary of the ANOVA that I showed you earlier. And that wraps it up for a one-way ANOVA and how to read the table. In practice, it's this line here, up to line 22, that you'll need to be able to do. And next, we'll talk about two-way ANOVAs.
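For reference, the p-value lookup described above can be sketched as follows. The exact pf call is not shown in the transcript, so the use of lower.tail = FALSE and the names fit, tab, df1, df2, and p_value are assumptions.

```r
# Refit the model so this snippet stands on its own.
fit <- aov(count ~ spray, data = InsectSprays)
tab <- summary(fit)[[1]]

f_stat <- tab[["F value"]][1]   # observed F statistic
df1 <- tab[["Df"]][1]           # treatment degrees of freedom, k - 1
df2 <- tab[["Df"]][2]           # error degrees of freedom, n - k

# Upper-tail probability of the F distribution at the observed statistic.
# lower.tail = FALSE avoids the precision loss of computing 1 - pf(...).
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)
p_value

# Should agree with the p-value in the ANOVA table.
all.equal(p_value, tab[["Pr(>F)"]][1])
```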