We have seen that Dutch children are pretty happy on average. Are they all happy, or are there differences within the country? I have a hypothesis. I think that the place of residence plays an important role, and that children living in the countryside are actually happier. They play outside more, have more freedom, and get more sunlight, even with the rainy Dutch weather. So the question is, how can I test this hypothesis? To investigate hypotheses or theories, scientists often use hypothesis testing. In this lecture, we will discuss the fundamentals of this procedure. First, we will discuss what the null hypothesis and the alternative hypothesis are. Second, we will learn how to decide whether to reject the null hypothesis using data from a sample.
A hypothesis is a statement about the world around us. Remember our thoughts about differences in happiness, and imagine that we believe that children who play outside every day are all very happy. To test whether this hypothesis is true, we could do two things. We could find all children in the Netherlands who play outside every day and measure their happiness scores. If all of them score very high, eureka! We would have proven our theory. Nice, but really tedious and difficult, even impossible to implement. Another way of approaching the problem is to find just one child who plays outside every day but is still not really happy. This would already prove that our hypothesis is wrong: not all children playing outside are happy.
This simple example illustrates a universal and important fact. It is much easier to find evidence against a hypothesis than to prove that it is always true. In fact, Karl Popper, a famous philosopher, said that this idea of falsifiability is the key concept of science. Statistical methods help to apply the idea of falsifiability to test a specific type of hypothesis, called the null hypothesis.
The null hypothesis usually states that there is no effect, no relation between two variables, or no difference between groups. Here, we could formulate our null hypothesis by saying that the average happiness score of children living in the countryside is not different from the average happiness score of the whole country. The alternative hypothesis is the opposite of the null hypothesis. It is usually the initial theory under investigation. In our example, we could formulate the alternative hypothesis by saying that the average happiness score of children living in the countryside is different from the average of the whole country.
So how does hypothesis testing work? Suppose that we want to test the hypothesis that the mean happiness score of Dutch children living in the countryside is different from the mean happiness score of the whole country, which is 7.8. We consider this a fixed reference value. For that, we collect a representative sample of 60 children living in rural areas in the Netherlands and calculate their mean happiness score. We obtain 8.1, higher than the Dutch average. Before concluding that the mean level of happiness in the countryside is indeed different from the Dutch average, we need to consider whether the observed difference could have arisen by chance alone, due to sampling variation.
The main idea of hypothesis testing is to ask how likely the observed difference would be if sampling variation were the only thing at work. More specifically, we calculate, assuming that the null hypothesis is true, the probability of finding a difference like the one we found, or an even larger one. This probability is called the p-value. The p-value is used as a measure of the strength of evidence against the null hypothesis. The smaller the p-value, the stronger the evidence against the null hypothesis.
Let's come back to our example. We calculate the p-value using the sampling distribution of the mean happiness score, assuming that the null hypothesis is true.
In that case, the sampling distribution would follow a normal distribution, with mean 7.8, the population mean, and a spread given by the standard error. Using the properties of the normal distribution, we can now calculate the probability associated with our sample mean of 8.1 in rural areas. Usually we consider both sides of the mean. The same distance from the population mean on the other side gives 7.5. The proportion of the area under the curve outside these two boundaries is the p-value, 0.25. This is not a small probability, so the data do not provide clear evidence against the null hypothesis. The higher happiness score observed in our sample could well be due to sampling variation and not to a real difference at the population level, so we cannot reject the null hypothesis.
Can we instead claim that we proved the null hypothesis? Can we claim that children living in the countryside are just as happy as the average Dutch child? No, we cannot do that. We tried to reject the null hypothesis but failed. We can disprove but never prove the null hypothesis. Absence of evidence is not evidence of absence.
It is common practice to interpret the results of a test using a predetermined threshold. Often, if the p-value is lower than 0.05, the null hypothesis is rejected and the difference is called statistically significant. This means that the p-value is considered small enough to justify rejecting the null hypothesis. This threshold, however, is arbitrary and subject to a lot of debate. We will cover this in the last module of the course.
It is important to realize that the p-value depends on the sample size and not only on the difference between groups. In studies with small samples, the sampling distribution is wide, so it is difficult to reach statistical significance. In studies with large samples, however, the sampling distribution is very narrow, so even small differences can be statistically significant.
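The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the exact procedure used for the lecture: the function name `two_sided_p_value` is made up for this example, and the standard deviation of 2.0 is an assumption, since the lecture gives the standard error only implicitly. It was chosen because it roughly reproduces the p-value of 0.25 mentioned for the sample of 60 children.

```python
import math


def two_sided_p_value(sample_mean, null_mean, sd, n):
    """Two-sided p-value for a sample mean, assuming a normal sampling distribution."""
    # Spread of the sampling distribution of the mean.
    standard_error = sd / math.sqrt(n)
    # Distance between the sample mean and the null value, in standard errors.
    z = (sample_mean - null_mean) / standard_error
    # erfc(|z| / sqrt(2)) is the total area in both tails beyond |z|.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical value: the lecture does not state the standard deviation.
ASSUMED_SD = 2.0

# Same observed difference (8.1 versus 7.8), two different sample sizes.
p_60 = two_sided_p_value(8.1, 7.8, ASSUMED_SD, 60)
p_600 = two_sided_p_value(8.1, 7.8, ASSUMED_SD, 600)

print(f"n = 60:  p = {p_60:.2f}")
print(f"n = 600: p = {p_600:.4f}")
```

Under these assumptions, the sample of 60 children gives a p-value of about 0.25, while the same difference of 0.3 points in a hypothetical sample of 600 children would fall well below the 0.05 threshold. The difference is the same in both cases; only the width of the sampling distribution changes.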
In this lecture, we have learned some key aspects of hypothesis testing. We have introduced the null and the alternative hypotheses. The null hypothesis states that there is no difference or effect, while the alternative hypothesis states that there is one. We have discussed how to do a statistical test. We assume that the null hypothesis is true and calculate how likely it is to find the observed difference, or an even bigger one. This probability, the p-value, is used to judge whether a difference or effect is statistically significant. However, you should remember that the p-value depends on both the size of the effect and the sample size. We have now seen a first example of a hypothesis test. In the next lecture, we will discuss the various tests we need for different types of studies.