In this last ANOVA video, I'm going to talk about two-way ANOVA. We're going to talk about this tooth growth data set. So what is a two-way ANOVA? Recall that in the one-way ANOVA example, we compared group means by analyzing variance, considering only one independent variable. In the insect spray example, spray was the only independent variable, and there were six treatments, A through F, one for each type of spray. In a two-way ANOVA, we have two independent variables instead of one. It lets us study whether each of the two independent variables, and the interaction between them, affects the response variable. The response variable is our variable of interest; you can think of it as the dependent variable. The independent variables are the two treatments. For example, through a two-way ANOVA we could study whether spray type and spray dosage affect the insect count. Before, we assumed that the same amount of each spray, A through F, was applied to every group; now we can also vary the dose. That would be an example of a two-way ANOVA.

So what are the assumptions of a two-way ANOVA? First, all the observations are independent of each other, so the existence of one observation doesn't affect another observation. The dependent variable is continuous. The dependent variable is also normally distributed for every combination of groups of the two independent variables. Each of the two independent variables consists of two or more categorical groups: within an independent variable, say bug spray, there are at least two categorical groups. In the one-way ANOVA, the independent variable was bug spray and the groups were A, B, C through F. The variance is homogeneous, meaning the same, across every combination of groups. And there are no significant outliers, which you can check through visual inspection. I think it helps to look at a table of data.
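For reference, the assumptions above line up with a standard textbook way of writing the two-way ANOVA model. The notation ($\mu$, $\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$, $\varepsilon_{ijk}$) is an assumption of this sketch, not something used in the lecture: $y_{ijk}$ is the $k$-th observation in level $i$ of the first factor and level $j$ of the second. The independent, normal errors with one common variance $\sigma^2$ encode the independence, normality, and equal-variance assumptions; the additive model simply drops the $(\alpha\beta)_{ij}$ term.

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2),
\qquad i = 1,\dots,I,\; j = 1,\dots,J,\; k = 1,\dots,n_{ij}
```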
So we're going to look at the SaratogaHouses dataset that we looked at before in A/B testing. Here, one of our independent variables is heating type and the other is sewer type. Our outcome, or dependent, variable is the housing price. So we'll look at heating type and sewer type and see whether those two categorical variables have an effect on housing prices.

So here's a table. Heating type can take three possible values: a hot air heating system, a hot water or steam system, or an electric system. Across the top, you have the type of sewer on the property: a septic tank, a public or commercial sewer system, or no sewer system at all. Each cell holds the prices of the houses with that combination.

If we assume the two variables don't interact, that is, the effect of heating type on price doesn't depend on sewer type, we can use the same aov command as before. Now it's a two-way ANOVA: price on heating and sewer. We fit this additive model, and the results are shown here; we'll see this in R in just a moment. The key thing to notice is the sum of squares for each treatment. Here's the mean square. Recall that heating and sewer each have three categories, so the degrees of freedom for each is the number of groups minus one, which is two. If you divide the sum of squares by the degrees of freedom, you get the mean square. Here are the F values, each of which is a treatment's mean square divided by the residual mean square. Here are the p-values, and these are the key numbers you'll want to look at. Both are less than 0.05, our significance level. So we conclude that both heating type and sewer type affect the housing price.

If we model an interaction effect between heating and sewer, we use a star instead of the additive plus. That fits heating as one term, sewer type as another term, and then the interaction between them.
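To make the sum of squares, degrees of freedom, mean square, and F columns concrete, here is a minimal Python sketch of the two-way ANOVA table with an interaction term. It assumes a balanced design (the same number of observations in every cell) and uses a small hypothetical data set; the function name `two_way_anova` and the data are illustrative, not from the lecture. SaratogaHouses is unbalanced (compare the 319, 791, and 11 houses in the hot air row), and for unbalanced data R's summary of an aov fit reports sequential sums of squares, so these closed-form formulas would not reproduce its output.

```python
from collections import defaultdict
from statistics import mean


def two_way_anova(rows):
    """Two-way ANOVA with interaction for a balanced, fully crossed design.

    rows: list of (level_a, level_b, y) tuples. Assumes every (a, b) cell
    exists and has the same number of observations, at least two.
    Returns {source: (df, SS, MS, F)}; F is None for the residual row.
    """
    cells = defaultdict(list)
    for a, b, y in rows:
        cells[(a, b)].append(y)
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    I, J = len(a_levels), len(b_levels)
    r = len(next(iter(cells.values())))  # observations per cell
    balanced = len(cells) == I * J and all(len(ys) == r for ys in cells.values())
    if not balanced or r < 2 or I < 2 or J < 2:
        raise ValueError("this sketch only handles balanced designs")

    grand = mean(y for _, _, y in rows)
    cell_mean = {k: mean(ys) for k, ys in cells.items()}
    # In a balanced design, each marginal mean is the average of its cell means.
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    ss = {
        "A": r * J * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "B": r * I * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "A:B": r * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a in a_levels
            for b in b_levels
        ),
        "Residuals": sum((y - cell_mean[(a, b)]) ** 2 for a, b, y in rows),
    }
    df = {"A": I - 1, "B": J - 1, "A:B": (I - 1) * (J - 1), "Residuals": I * J * (r - 1)}
    ms = {src: ss[src] / df[src] for src in ss}  # mean square = SS / df
    return {
        src: (df[src], ss[src], ms[src],
              None if src == "Residuals" else ms[src] / ms["Residuals"])  # F = MS / MS_resid
        for src in ss
    }


# Hypothetical toy data: 2 levels of each factor, 2 observations per cell.
HYPOTHETICAL_ROWS = [
    ("a1", "b1", 4.0), ("a1", "b1", 6.0),
    ("a1", "b2", 6.0), ("a1", "b2", 8.0),
    ("a2", "b1", 8.0), ("a2", "b1", 10.0),
    ("a2", "b2", 14.0), ("a2", "b2", 16.0),
]

for source, (df, ss, ms, f) in two_way_anova(HYPOTHETICAL_ROWS).items():
    f_text = "-" if f is None else f"{f:.2f}"
    print(f"{source:<10} df={df}  SS={ss:.2f}  MS={ms:.2f}  F={f_text}")
```

For this toy data the F values come out to 36, 16, and 4, each on 1 and 4 degrees of freedom. The sketch stops at F; turning an F value into a p-value needs the F distribution, for example `scipy.stats.f.sf(F, df, df_resid)` if SciPy is available.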
Here's how I had to think about interaction effects when I was studying ANOVA for the first time. I like the metaphor of a medical experiment with two drugs, A and B. If you take drug A, what's the effect of drug A? If you take drug B, what's the effect of drug B? But if you take them both at the same time, some of your benefit might come from drug A and some might come from drug B. Then there's the interaction effect of A and B together, when they're both working at the same time. An interaction effect can run both ways, positive or negative. So by looking at the interaction term, the combination, we can see what its p-value is. Again, I look at this last column and I can see that the interaction term is significant: it's less than 0.05. So the interaction effect also impacts the housing price.

So let's look at an example in RStudio. Here we are in the RStudio environment, and I'm going to load the SaratogaHouses dataset, the same dataset we saw before. We can look at the first few rows; there they are. If I double-click on it up here in the top right, you can see it's the same dataset. For this analysis we're going to look at these two columns: sewer type and heating type. Recall that the head command returns the first few rows, six by default.

Let's look at the sample sizes. Here's a table of heating by sewer; you can see it on the bottom right of my screen. Here's the heating type, here's the sewer type, and here are the counts. So there are 319 houses with a hot air heating system and a septic tank, versus 791 houses with hot air heating and public sewer, and 11 houses with hot air heating and no sewer system. We can look at each of those cells. So the table command is useful: you just pass it your two variables of interest.
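The drug metaphor can be made concrete for the simplest case, a 2x2 design. A minimal sketch, assuming we already have the mean response in each of the four groups (no drug, A only, B only, both): the interaction is how much drug A's effect changes when drug B is also taken. The function name and all numbers are hypothetical.

```python
def interaction_contrast(neither, a_only, b_only, both):
    """Interaction in a 2x2 design, computed from the four cell means.

    The effect of drug A is measured twice: once without drug B and once
    with it. The interaction is the difference between those two effects.
    """
    effect_a_without_b = a_only - neither
    effect_a_with_b = both - b_only
    return effect_a_with_b - effect_a_without_b


# Hypothetical mean improvement scores for the four treatment groups.
print(interaction_contrast(neither=10, a_only=15, b_only=13, both=18))  # 0: purely additive
print(interaction_contrast(neither=10, a_only=15, b_only=13, both=25))  # 7: positive interaction
print(interaction_contrast(neither=10, a_only=15, b_only=13, both=14))  # -4: negative interaction
```

The contrast is symmetric: measuring how drug B's effect changes when drug A is added gives the same number, which is why a single interaction term covers both directions. A value of zero corresponds to the additive model; a positive or negative value is the "both ways" interaction described above.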
Now, if we assume that the two variables, sewer and heating, don't interact, we can use this additive model. Here it is. I'm going to call it anova2 and run that. Notice the formula has heating plus sewer. Then, to get the output, I use the summary command. Here are the values. We can see that heating type is an important factor in determining the price of a house, along with the type of sewer.

If we want an interaction effect, we use the same aov command but build the model this way. The difference is that I use the asterisk to indicate that I want to include the interaction term. I'm going to run that, put the results in anova3, and call summary on anova3. You can see that all three terms are now significant: the type of heating, the type of sewer, and the interaction effect. They're significant at the 0.05 level.

Finally, anova4. This is the same model as the asterisk notation, but here I list heating, sewer, and then explicitly add the interaction term using a colon between the two variables. This notation is useful when you have more than two independent variables, because then you have more than one interaction term. Say you have three independent variables, A, B, and C: you'll have the interaction of A and B, the interaction of A and C, and the interaction of B and C, plus the three-way interaction of A, B, and C. If you use the multiplicative (asterisk) model, you get all of these interaction terms. With the explicit notation, you can drop interaction terms if you have some theoretical reason for doing so. That's why this notation is useful.
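To see which terms the asterisk stands for, here is a small Python sketch that lists every main effect and interaction of a fully crossed model, writing interactions with a colon as R formulas do. It is plain string manipulation under that assumption, not R's formula machinery; the helper names `expand_crossed` and `explicit_formula` are hypothetical.

```python
from itertools import combinations


def expand_crossed(*factors):
    """List the terms a fully crossed formula like A*B*C stands for.

    Main effects come first, then two-way interactions, then higher-order
    ones; interaction terms are joined with ':' as in R formulas.
    """
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


def explicit_formula(response, terms, drop=()):
    """Build an explicit additive formula string, leaving out dropped terms."""
    kept = [t for t in terms if t not in drop]
    return f"{response} ~ " + " + ".join(kept)


print(expand_crossed("heating", "sewer"))
print(expand_crossed("A", "B", "C"))
# Writing terms out explicitly lets you leave one out, e.g. the three-way term.
print(explicit_formula("y", expand_crossed("A", "B", "C"), drop={"A:B:C"}))
```

With two factors the expansion is heating, sewer, and heating:sewer, which is why the asterisk model and the explicit colon model give the same fit. With three factors the asterisk also brings in A:B:C, and the explicit form is where you would drop it.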