So we have the Hardy-Weinberg law, and we've got a measure, a means of describing any departure from Hardy-Weinberg. Let's say we now have some data: observations on the genotypes for a SNP. We have sample proportions for the three genotypes in a sample of individuals from a population. Do those sample proportions suggest that the population we sampled is in Hardy-Weinberg equilibrium?
The simplest way to proceed is to estimate the inbreeding coefficient, and there's a very simple expression. The inbreeding coefficient estimate is 1 minus a ratio, and the ratio is the observed proportion of heterozygotes divided by what we would have expected to see if there was Hardy-Weinberg. We call that denominator the expected proportion of heterozygotes. So this number is very simple, and it turns out to be a very good statistical estimate. It's actually a maximum likelihood estimate, so it has a lot of desirable properties, although unfortunately those don't include a guarantee that it's unbiased. For large samples this estimator is close to unbiased: on average it is close to the true value. For small samples, not so much.
So in order to do the calculation, we start with the genotype proportions and we translate those into allele proportions: the proportion for allele A is the AA homozygote proportion plus half the heterozygote proportion, and similarly for the B allele. So by very simple means we have an estimate of the inbreeding coefficient for each SNP.
Now it might be that we have a range of SNPs. Typical studies nowadays have a million, and this study we are working on here at the University of Washington has a billion SNPs per individual. It might be, as a starting point, that we suppose every SNP has the same inbreeding coefficient, the same degree of departure from Hardy-Weinberg. How might we use all the SNPs to get an even better estimate of this common inbreeding coefficient?
The estimate has the same functional form: it's just 1 minus the ratio of the observed to the expected heterozygosity. To calculate the observed heterozygosity on the computer, it's convenient to have an equation that makes the computation simple. So what we do is work with those allele scores, the zeros, ones or twos, for each SNP. For a particular individual, individual j, we'll say the score at SNP l is x_jl, so x will be 0, 1 or 2 depending on the number of copies of one of the alleles, typically the reference allele. If we take the product x times (2 minus x), that product is 1 only for heterozygotes and 0 for either homozygote. So if we add those products over all the SNPs and all the individuals, we get the total number of heterozygotes in the data, and dividing by the number of individuals puts it on the same scale as the expected proportions. For the expected proportion, similarly to the one-SNP case, we take the sample allele frequencies, and I've called those p-tilde for the l-th SNP: we take 2 times p-tilde times (1 minus p-tilde) and add up over SNPs. So it's a very simple calculation if we wish to use more than one SNP to get the inbreeding coefficient estimate.
We have some data, genotype proportions. We translate those to allele proportions. We estimate the inbreeding coefficient. And now the question is, does our estimate suggest that the population value was 0? Even if the population was strictly in Hardy-Weinberg equilibrium, so the true value was 0, it's very unlikely any estimate will be exactly 0. The question is how different from 0 can it be? It turns out that because the estimate f-hat is maximum likelihood, it's approximately normally distributed, and that allows us to square it, multiply by the sample size, and end up with a statistic that has a chi-square distribution with one degree of freedom when Hardy-Weinberg holds. The consequence of all that is simply to say: if we square the estimate, multiply by the sample size, and get a number bigger than 3.84, we have an unusual number. We have a number we would expect to find only 5% of the time if there really was Hardy-Weinberg equilibrium.
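
As a rough illustration of the multi-SNP calculation described above, here is a minimal Python sketch. The function name `multi_snp_inbreeding` and the list-of-lists layout for the scores are assumptions made for this example, not something from the lecture, and the sketch assumes there are no missing genotypes.

```python
def multi_snp_inbreeding(scores):
    """Estimate a common inbreeding coefficient from 0/1/2 allele scores.

    scores[j][l] is the number of copies (0, 1 or 2) of the counted allele
    carried by individual j at SNP l. This layout is an assumption of the
    sketch; missing data are not handled.
    """
    n_ind = len(scores)
    n_snp = len(scores[0])

    # x * (2 - x) is 1 for a heterozygote (x = 1) and 0 for a homozygote,
    # so this sum is the total heterozygote count over SNPs and individuals.
    het_count = sum(x * (2 - x) for row in scores for x in row)
    # Dividing by the number of individuals gives the sum over SNPs of the
    # observed heterozygote proportions.
    observed = het_count / n_ind

    # Sum over SNPs of 2 * p * (1 - p), with p the sample allele frequency.
    expected = 0.0
    for l in range(n_snp):
        p = sum(row[l] for row in scores) / (2 * n_ind)
        expected += 2 * p * (1 - p)

    if expected == 0:
        raise ValueError("every SNP is monomorphic; the estimate is undefined")
    return 1 - observed / expected


# One SNP, 100 individuals: 60 with score 0, 30 with score 1, 10 with score 2.
example = [[0]] * 60 + [[1]] * 30 + [[2]] * 10
print(round(multi_snp_inbreeding(example), 3))
```

With a single SNP the function reduces to the one-SNP estimate, 1 minus observed over expected heterozygosity. Real data sets would normally be held in an array library rather than nested lists; plain lists are used here only to keep the sketch self-contained.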
So conventionally, although for no particularly good reason, we adopt 5% as our standard significance level for testing Hardy-Weinberg at a single SNP. The chi-square distribution enables us to translate the data, via the inbreeding coefficient estimate, into some measure of departure from Hardy-Weinberg, and in particular we can calculate the p-value. The p-value is the probability of finding data at least as extreme as the data we saw if there was Hardy-Weinberg. If that p-value is small, the data are unusual if there is Hardy-Weinberg, and we would tend to reject the hypothesis of Hardy-Weinberg. So that's a nice consequence of the chi-square distribution.
Strictly speaking, it's the central chi-square distribution, the distribution when the hypothesis is true. If there is not Hardy-Weinberg, then the test statistic, f-hat squared times n, has a different distribution called the noncentral chi-square. It's part of the same family of distributions, but there's one more quantity, one more parameter, to describe that distribution. The parameter is called the noncentrality. It has the same form as the test statistic: it's simply the sample size times the square of the parametric, the true, value of f. So lambda, the noncentrality parameter, is n f squared, and there's theory, there are tables, and there are now functions in R and other packages that tell us immediately what the power is and help us interpret the results.
It's a lot easier to put all this into context with an example. Let's suppose, in a very simple example, 100 individuals have been sampled from a population, and among those 100 there are 60 homozygotes for one of the alleles, capital A, 30 heterozygotes, and 10 of the other homozygote. The estimated inbreeding coefficient is 1 minus 30 over 100, the proportion 0.3, divided by twice the product of the allele proportions, and those are 0.75 and 0.25. That is 1 minus 0.3 over 0.375, so the inbreeding coefficient is 0.2. That's actually quite a high number for a human population, but anyway, there it is. Is that a big number?
So to answer that question, we square it to get 0.04, multiply by the sample size of 100, and we end up with 4. So our test statistic is 4, and that is indeed bigger than 3.84, so it's significant at the 5% level, and we would say those data cause us to reject Hardy-Weinberg for that population. We can be a little more precise. Instead of saying that the data have a p-value less than 0.05, we can get the p-value exactly with a little R command using the pchisq function. We need to put in the test statistic value, the 4, and the degrees of freedom, in this case 1, and we leave out the noncentrality because we don't need it under the null distribution. One minus that quantity is 0.045, so the p-value is 0.045, which is indeed just a little bit less than 0.05.
The sample size is small at 100. If we had 10 times as much data, 1000 individuals, and the genotype proportions were the same, the test statistic would be 10 times bigger. That would be 40, and then the p-value drops down to about 2.5 times 10 to the minus 10. It's a tiny number, and we would certainly be happy to reject Hardy-Weinberg if we had such a large estimate as 0.2.
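
The worked example can be reproduced with a short Python sketch, shown here instead of the R command mentioned in the talk. The function names are made up for this illustration. Rather than calling a chi-square library routine, it relies on two standard facts that hold only for one degree of freedom: the upper tail of the central chi-square is erfc(sqrt(x / 2)), and a noncentral chi-square with noncentrality lambda is the square of a standard normal shifted by sqrt(lambda). The power function assumes the approximate noncentral chi-square theory described earlier, with lambda equal to n times f squared.

```python
import math


def inbreeding_estimate(n_aa, n_ab, n_bb):
    """Single-SNP estimate: 1 - observed / expected heterozygosity."""
    n = n_aa + n_ab + n_bb
    p_a = (n_aa + 0.5 * n_ab) / n          # allele proportion for A
    observed_het = n_ab / n
    expected_het = 2 * p_a * (1 - p_a)
    return 1 - observed_het / expected_het


def chi_square_1df_p_value(statistic):
    """Upper-tail probability of a central chi-square with 1 df."""
    return math.erfc(math.sqrt(statistic / 2))


def hwe_test(n_aa, n_ab, n_bb):
    """Return (f_hat, n * f_hat**2, p-value) for one SNP."""
    n = n_aa + n_ab + n_bb
    f_hat = inbreeding_estimate(n_aa, n_ab, n_bb)
    statistic = n * f_hat ** 2
    return f_hat, statistic, chi_square_1df_p_value(statistic)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def power_1df(n, f, critical=3.841459):
    """Approximate power of the 5% test when the true coefficient is f.

    Uses noncentrality lambda = n * f**2; 3.841459 is the 95th percentile
    of the central chi-square with 1 df.
    """
    root_lambda = math.sqrt(n) * abs(f)
    root_c = math.sqrt(critical)
    return (1 - normal_cdf(root_c - root_lambda)) + normal_cdf(-root_c - root_lambda)


for counts in [(60, 30, 10), (600, 300, 100)]:
    f_hat, statistic, p_value = hwe_test(*counts)
    print(f"f_hat={f_hat:.3f}  statistic={statistic:.2f}  p={p_value:.2e}")
print(f"power with n=100, f=0.2: {power_1df(100, 0.2):.3f}")
```

For the 60, 30, 10 counts this gives an estimate of 0.2, a statistic of 4 and a p-value of about 0.0455. With ten times the counts the statistic is 40 and the p-value is about 2.5 times 10 to the minus 10. Under these assumptions the power to detect a true f of 0.2 with 100 individuals comes out at a little over one half.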