In the previous lesson, we studied the case where the instrument Z was binary and the intermediate outcome M was continuous. In this lesson, we're going to do something a little different: we're going to have M also be binary, and we're going to make the exclusion restriction assumption again. Remember, that says that when a unit has value m on the mediating variable, it doesn't matter whether that value arose under treatment or under control; the treatment operates only through the mediator. That allows us to take the extended potential outcome Y_i(z, M_i(z)), which depends on both z and the mediator value M_i(z), and simply drop the z, because it's the same for z = 0 and z = 1: we can write Y_i(M_i(z)).

Now, the first thing we're going to do is introduce so-called moment-based estimation. We're going to assume that treatment assignment is unconditionally unconfounded, and I've written down what that means below. The other thing we're going to assume is that the average effect of the instrument on M is nonzero; in other words, the instrument does have an effect on the mediator. So that's what we're starting out with.

We're going to use the unconfoundedness assumption first to look at τ, the overall average treatment effect, which we've sometimes called the intent-to-treat estimand; it's also been called the total effect back in some of the previous modules. We should be familiar with it at this point, and by the unconfoundedness assumption we get the expression below. Then, using the exclusion restriction, the total effect breaks down into the average treatment effects in two subgroups: the subgroup with M_i(0) = 0 and M_i(1) = 1, that is, the units who do what they're supposed to do under the treatment assignment, and the subgroup who do exactly the opposite. For the other two subgroups, M_i(0) = M_i(1), so by the exclusion restriction the differences Y_i(M_i(1)) - Y_i(M_i(0)) are zero and those terms go away.

One of the ways in which this has been studied is in the context of compliance. If you have an experiment where there is noncompliance and M_i denotes whether or not unit i receives the treatment, it's typical to assume that the probability of taking the treatment when assigned to control but not when assigned to treatment, that is, P(M_i(0) = 1, M_i(1) = 0), is zero. This is also called the monotonicity assumption, or the assumption of no so-called "defiers." In this setting it's actually an identification condition, but it's quite reasonable in many cases. Under this condition, the average effect of Z on M reduces to the probability that M_i(0) = 0 and M_i(1) = 1. So, under the additional assumption we made above, i.e., no defiers, we obtain the so-called IV, or instrumental variables, estimand, which I've written below; you can see from the arithmetic above that it takes a special form. It's just the average treatment effect in the subpopulation of units who do what they're supposed to do under the treatment assignment: when assigned to control they don't take the treatment, and when assigned to treatment they do.
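To make that arithmetic concrete, here is a sketch of the decomposition just described; the symbols τ, Y_i(m), and M_i(z) are my rendering of the lesson's notation and may differ from the slides:

```latex
\begin{align*}
\tau &= E\big[Y_i(M_i(1)) - Y_i(M_i(0))\big] \\
     &= E\big[Y_i(1) - Y_i(0) \mid M_i(0)=0,\, M_i(1)=1\big]\, P\big(M_i(0)=0,\, M_i(1)=1\big) \\
     &\quad + E\big[Y_i(0) - Y_i(1) \mid M_i(0)=1,\, M_i(1)=0\big]\, P\big(M_i(0)=1,\, M_i(1)=0\big),
\end{align*}
```

since the two subgroups with M_i(0) = M_i(1) contribute zero. With no defiers the second term vanishes and E[M_i(1) - M_i(0)] = P(M_i(0) = 0, M_i(1) = 1), so dividing the effect of Z on Y by the effect of Z on M gives the IV estimand:

```latex
\tau_{IV} \;=\; \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[M \mid Z=1] - E[M \mid Z=0]}
\;=\; E\big[Y_i(1) - Y_i(0) \mid M_i(0)=0,\, M_i(1)=1\big].
```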
So, usually we're used to dealing with a lot of assumptions that are untestable, but the assumption that Z affects M is testable. Now, the estimand that we have is often called the complier average causal effect, because we're looking at the special subpopulation with M_i(0) = 0 and M_i(1) = 1, and these are the folks who do what they're supposed to do; we can call them compliers. The compliers are a latent subpopulation: we can't actually observe who they are, but it's reasonable to talk about this latent subpopulation and to call its members compliers. For this reason we call this estimand the complier average causal effect (CACE); it's sometimes also called the local average treatment effect.

Now, if we go ahead and assume the observations are independent and identically distributed, the IV estimand can be estimated using sample averages. I think that's pretty straightforward, and I'll give a small code sketch after this discussion. So those are the very basics. More generally, you might wish to think about variation in the complier average causal effect within levels of covariates X. Or it may be the case that the unconfoundedness we assumed above doesn't hold unconditionally, and you need to condition on some covariates. For the case where X is discrete, sample means within covariate classes can be used: you just partition into covariate classes and do the same thing. But of course, if X is continuous or there are many levels of X, we're going to need to introduce some modeling assumptions.

Standing back and looking at the CACE (I'm just going to start calling it the CACE), it's been argued that for a given intervention Z, the policy-relevant question is the overall effect of the intervention, sometimes called the intent-to-treat or ITT estimand, which I think I've used before, not the effect for some subset of units who will comply with whatever assignment they're given. For example, if Z is a medical treatment that has to be taken in an unsupervised environment, say at home, the practical effectiveness of the treatment is its effect in the context in which the treatment is delivered. Furthermore, the CACE is not even the average effect for all those who would take up treatment if it were offered, as the experiment is not informative about units who take up treatment regardless of their assignment. In addition, even if you wanted to target the treatment toward the units who might benefit from being offered it, the compliers are a subpopulation that cannot be identified, because we cannot observe both potential values M_i(0) and M_i(1) for the same unit. Those are some arguments against the utility of the CACE as an estimand in a real practical context.

On the other hand, people have argued that the CACE is sometimes more broadly representative of the population, so let's look at that. When there are no defiers, the average treatment effect is a mixture of the average treatment effects among three subgroups: the compliers, the never-takers (units with M_i(0) = M_i(1) = 0), and the always-takers (units with M_i(0) = M_i(1) = 1). If the average treatment effect is the same for compliers and always-takers, the CACE gives the effect of treatment on the treated, and if this extends also to the never-takers, the complier average causal effect is also the average treatment effect. But of course you'd have to make arguments for why that would be the case, substantive arguments, and they'd really be tantamount to making assumptions, because you can't know whether they're true.
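Here is the minimal sketch of the moment-based estimation promised above, assuming iid draws of (Z, M, Y) held in NumPy arrays; the function names are mine, not from the lesson:

```python
import numpy as np

def wald_iv_estimate(z, m, y):
    """Moment-based (Wald) IV estimate of the complier average causal effect:
    the sample ITT effect of Z on Y divided by the sample ITT effect of Z on M."""
    z, m, y = map(np.asarray, (z, m, y))
    itt_y = y[z == 1].mean() - y[z == 0].mean()  # effect of assignment on the outcome
    itt_m = m[z == 1].mean() - m[z == 0].mean()  # effect of assignment on take-up (testable: nonzero)
    return itt_y / itt_m

def wald_within_classes(z, m, y, x):
    """With a discrete covariate x, apply the same moment estimator within each covariate class."""
    z, m, y, x = map(np.asarray, (z, m, y, x))
    return {level: wald_iv_estimate(z[x == level], m[x == level], y[x == level])
            for level in np.unique(x)}
```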
That said, unfortunately it's pretty common to see empirical work where the ATT or the ATE seems to be the parameter of substantive interest, but where the CACE is estimated and extrapolated to the parameter of interest without further argument or consideration; that is, some folks doing empirical work seem to forget that what they're estimating is the CACE, not the ATT or the ATE that they're really interested in. If the compliers are a large majority of the population, that problem may not be too bad. But let's look at the Angrist, Imbens, and Rubin paper, a great paper, where the question of interest they start out with is the effect of military service on excess civilian mortality, and it turns out that the compliers are about 16 percent of the population. So they give you the complier average causal effect. In this case, it might be reasonable to believe that never-takers are unhealthier than compliers or always-takers, and that the mortality rate would have been higher had it been possible to take the mortality of that subpopulation into account. The estimate applies clearly to that 16 percent, but how representative is that 16 percent of all the other folks?

Now, to take confounders into consideration, unless they are discrete with few levels, you can't proceed non-parametrically as we'd been doing above. So let's suppose we have a random sample, and let's suppose that the unconfoundedness assumption holds conditional on X. Let f(Y | Z = z, M = m, X) denote the observable conditional distribution of the outcome Y among subjects with assignment z and observed response m. Remember, M is binary now, as is Z, so there are four such distributions, and you can write them out. Now, if you use the monotonicity condition, M_i(1) ≥ M_i(0) (remember, that's no defiers), then subjects with M_i(0) = 1 are always-takers, and thus if we see Z = 0 and M = 1, we know we're seeing an always-taker. That's why I've relabeled that distribution f_{A,0}(Y | X): always-takers, under assignment Z = 0, that is, when you're assigned not to have treatment. Similarly, subjects with M_i(1) = 0 are never-takers by monotonicity, and I've relabeled f(Y | Z = 1, M = 0, X) as f_{N,1}(Y | X), the distribution of the never-takers under assignment to treatment.

The remaining distributions are mixtures. If I see M = 0 when you're assigned to the control group, you could be a complier, but you could also be a never-taker, so that cell is a mixture of the complier and never-taker distributions under control. Similarly, if I see that you take up treatment when you're assigned to treatment, I don't really know whether you're a complier or an always-taker, so that cell is a mixture of the complier and always-taker distributions under treatment. The mixture probabilities and the component distributions can be identified in some instances, under additional assumptions.
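Putting the four observable cells together, the structure looks like this; the weights are my reconstruction of what was just described, with π_C(x), π_N(x), π_A(x) denoting the conditional probabilities of being a complier, never-taker, or always-taker given X = x:

```latex
\begin{align*}
f(y \mid Z=0, M=1, x) &= f_{A,0}(y \mid x) \\
f(y \mid Z=1, M=0, x) &= f_{N,1}(y \mid x) \\
f(y \mid Z=0, M=0, x) &= \frac{\pi_C(x)\, f_{C,0}(y \mid x) + \pi_N(x)\, f_{N,0}(y \mid x)}{\pi_C(x) + \pi_N(x)} \\
f(y \mid Z=1, M=1, x) &= \frac{\pi_C(x)\, f_{C,1}(y \mid x) + \pi_A(x)\, f_{A,1}(y \mid x)}{\pi_C(x) + \pi_A(x)}.
\end{align*}
```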
As an example, suppose the outcome Y is continuous and the mixture components are assumed to be normal, with means such as μ_{C,0}(x), the conditional mean for compliers under no treatment given X = x; μ_{C,1}(x), the mean for compliers under treatment; and so on for the other subgroups. We might assume a common variance σ². A thing that's been done in the literature is to have the mixing probabilities follow a multinomial logit model, and you can see there are only two of them to model, because the other one, the probability of being a never-taker given covariates X, is determined by default. The assumption of a common variance can be relaxed, but that can create some difficulties; it's a general phenomenon.

An assumption that was commonly made in the analyses I just described is that the distributions for always-takers and never-takers are the same for Z = 0 and Z = 1, which is basically a stochastic version of the exclusion restriction that we made in conjunction with the discussion of instrumental variables. But it is important to note that this assumption doesn't require the potential outcomes Y(z, m) to be well defined, as they had to be when we studied mediation.

One of the earliest papers in this genre, a very nice paper, was written by Little and Yau in 1998 and appeared in the Journal of the American Statistical Association. There, there are no always-takers, because you really can't access the treatment outside of the treatment group, so the analysis reduces a little; that's a fairly common situation in actual experiments. A likelihood sketch for that simplified setting follows.
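Here is a minimal likelihood sketch for that simplified setting, under assumptions of my own for concreteness: no always-takers, normal outcome components with a common variance, a single scalar covariate held in a NumPy array, and a logistic model (the two-class reduction of the multinomial logit) for the probability of being a complier. All names here are mine, not from the paper:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def neg_log_lik(theta, z, m, y, x):
    """Observed-data negative log-likelihood: no always-takers, normal
    components with common variance, logistic complier-probability model.
    The never-taker mean mu_n is shared across Z = 0 and Z = 1, which
    encodes the stochastic exclusion restriction described above."""
    mu_c0, mu_c1, mu_n, log_sigma, g0, g1 = theta
    sigma = np.exp(log_sigma)                    # keep the standard deviation positive
    pi_c = 1.0 / (1.0 + np.exp(-(g0 + g1 * x)))  # P(complier | x)

    ll = np.empty_like(y, dtype=float)
    i = (z == 1) & (m == 1)        # assigned treatment and took it: compliers only
    ll[i] = np.log(pi_c[i]) + norm.logpdf(y[i], mu_c1, sigma)
    i = (z == 1) & (m == 0)        # assigned treatment, refused it: never-takers only
    ll[i] = np.log(1 - pi_c[i]) + norm.logpdf(y[i], mu_n, sigma)
    i = (z == 0)                   # control, no access to treatment: complier/never-taker mixture
    ll[i] = np.log(pi_c[i] * norm.pdf(y[i], mu_c0, sigma)
                   + (1 - pi_c[i]) * norm.pdf(y[i], mu_n, sigma))
    return -ll.sum()

# Hypothetical usage: fit by maximum likelihood; the CACE is then mu_c1 - mu_c0.
# fit = minimize(neg_log_lik, x0=np.zeros(6), args=(z, m, y, x), method="BFGS")
# cace_hat = fit.x[1] - fit.x[0]
```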