Now we come to one of the most famous and powerful theorems in probability theory: Bayes' Theorem. We start with the familiar product rule. Notice that if we multiply both sides of this equation by the probability of B, we get something that is still equivalent, due to the product rule, that looks like this: the conditional probability of A given B, times the marginal probability of B, equals the joint probability of A and B. Now, we can substitute the probability of B and A for the probability of A and B, because we know they are completely equivalent. Then we can apply the product rule again to the probability of B and A in the form we've written it: the probability of B given A, times the probability of A, equals the probability of A and B. We are then left with an equivalence, and that equivalence is Bayes' Theorem.

One of the most powerful uses of Bayes' Theorem is for something we call inverse probability problems. Inverse probability problems are those where the answer is in the form of the probability that a certain process, with a certain probability parameter, is generating the observed data. I'll give some examples of this, so it will make more sense in a moment. We sometimes use the Greek letter theta to refer to the parameter that might be generating the data we observed, and we sometimes just use Ai, indicating one of the possible processes generating the observed data. For our purposes, they are essentially equivalent.

So we're going to look at an example. Let's say that we have two urns: one has 20% white marbles and one has 10% white marbles. We observe three white marbles in a row being drawn, with replacement, from one or the other of the urns, but we don't know which urn we are observing. What is the probability that we are observing Urn 1, and what is the probability that we are observing Urn 2?
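The derivation above can be checked numerically. Here is a minimal sketch in Python, using a small made-up joint distribution over two processes (A1, A2) and an event B (the numbers are purely illustrative): the product rule holds in both directions, and dividing through gives Bayes' Theorem.

```python
# Illustrative joint distribution P(A, B); values are made up for the check.
p_joint = {("a1", "b"): 0.12, ("a1", "not_b"): 0.18,
           ("a2", "b"): 0.28, ("a2", "not_b"): 0.42}

p_b = p_joint[("a1", "b")] + p_joint[("a2", "b")]       # marginal P(B)
p_a1 = p_joint[("a1", "b")] + p_joint[("a1", "not_b")]  # marginal P(A1)

p_a1_given_b = p_joint[("a1", "b")] / p_b   # conditional P(A1|B)
p_b_given_a1 = p_joint[("a1", "b")] / p_a1  # conditional P(B|A1)

# Product rule both ways: P(A1|B)P(B) = P(A1,B) = P(B|A1)P(A1)
assert abs(p_a1_given_b * p_b - p_joint[("a1", "b")]) < 1e-12
assert abs(p_b_given_a1 * p_a1 - p_joint[("a1", "b")]) < 1e-12

# Bayes' Theorem: P(A1|B) = P(B|A1) P(A1) / P(B)
assert abs(p_a1_given_b - p_b_given_a1 * p_a1 / p_b) < 1e-12
```

Any joint distribution would pass these checks; the point is that Bayes' Theorem is just the product rule written twice and rearranged.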
These probabilities are the probabilities of the process Urn 1, i.e. the parameter P(white) = 0.2, and the process Urn 2, i.e. the parameter that the probability of white on each individual draw is 0.1. In a more conventional forward probability problem, we're interested in the probability of a certain observed outcome given a known process. So, for example, if we know that the urn is Urn 1 (a known process), then the probability that we would observe three white marbles in a row is 0.2 times 0.2 times 0.2, or 8 one-thousandths. That would be a conventional probability problem. In this problem we are starting with the known outcome, and we're interested in how probable it is that we are observing Urn 1 or Urn 2: an unknown process.

Written in the terminology of conditional probability, we're interested in the probability of the process or parameter, which is Urn 1 or Urn 2, equivalently P(white) = 0.2 or P(white) = 0.1, given the observed data of three white marbles in a row. Bayes' Theorem tells us that if this is the probability of Ai given B, then it is equal to the probability of B given Ai, times the probability of Ai, divided by the probability of B. Here we have broken up the probability of B, using the sum rule, into a series of joint probabilities: the probability of the observed data under parameter 1, plus the probability of the observed data under parameter 2, each times the original probability of that parameter or process.

Part of Bayes' Theorem is the particular term: what is the probability of the observation given the parameter? Here B is the observation, and A1 and A2 are the parameters. This portion of the solution to Bayes' Theorem is known as the likelihood, and solving for it is solving a simple forward probability problem. So the probability of observing three whites in a row, if we know we're observing Urn 1, is 8 in 1,000.
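The likelihoods are simple forward probability computations. A short sketch (variable names are my own, not from the lecture), assuming independent draws with replacement:

```python
# Forward probability: P(data | parameter) for each urn,
# with three independent draws with replacement.
p_white_urn1 = 0.2   # Urn 1: P(white) on each draw
p_white_urn2 = 0.1   # Urn 2: P(white) on each draw
n_draws = 3

likelihood_urn1 = p_white_urn1 ** n_draws  # P(3 whites | Urn 1) = 8/1,000
likelihood_urn2 = p_white_urn2 ** n_draws  # P(3 whites | Urn 2) = 1/1,000

assert abs(likelihood_urn1 - 8 / 1000) < 1e-9
assert abs(likelihood_urn2 - 1 / 1000) < 1e-9
```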
Similarly, if the probability of each individual white marble being drawn with replacement is 10%, then the probability of three whites in a row is 1 in 1,000. These terms are known as likelihoods: the likelihood of the data given the parameter.

So what is our solution? We start with the principle of indifference, which tells us that we have no basis for choosing between the two urns, so P(A1) = 0.5 and P(A2) = 0.5. In other words, before we observe any data, we are neutral as to which urn we are observing; they have equal probability. Because this is before any data are observed, we call this the prior probability of the parameters, or of the process.

So we're going to evaluate P(B given A1)P(A1)/P(B), where by the sum rule P(B) = P(B,A1) + P(B,A2), which by the product rule is equal to P(B given A1)P(A1) + P(B given A2)P(A2). So you see, this is Bayes' Theorem for A1. We could also solve for A2, but because there are only two possibilities, they are going to sum to 1, so if we solve for one we've also solved for the other.

P(B given A1), as we just said a minute ago, is 8/1,000. That's times 1/2, divided by 8/1,000 times 1/2, plus the probability of observing three white marbles if we have a 10% chance of drawing a white marble, which is 1/1,000, times 1/2, the prior probability of Urn 2. The 1/2s cancel out, and we're left with 8/9, which is the probability that we are observing Urn 1. And obviously the probability that we are observing Urn 2 is 1 minus 8/9, or 1/9. And that is the application of Bayes' Theorem using inverse probability.
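The whole calculation above can be sketched in a few lines of Python (names are illustrative): likelihoods times uniform priors, normalized by the total probability of the data from the sum rule, giving the 8/9 vs 1/9 posterior.

```python
# Inverse probability for the two-urn problem via Bayes' Theorem.
likelihoods = {"urn1": 0.2 ** 3, "urn2": 0.1 ** 3}  # P(3 whites | urn)
priors = {"urn1": 0.5, "urn2": 0.5}                 # principle of indifference

# P(B): total probability of the data, summed over both urns (sum rule).
p_data = sum(likelihoods[u] * priors[u] for u in priors)

# Posterior P(urn | data) = likelihood * prior / P(B).
posterior_urn1 = likelihoods["urn1"] * priors["urn1"] / p_data
posterior_urn2 = likelihoods["urn2"] * priors["urn2"] / p_data

assert abs(posterior_urn1 - 8 / 9) < 1e-9  # Urn 1: 8/9
assert abs(posterior_urn2 - 1 / 9) < 1e-9  # Urn 2: 1/9
```

Note that because both priors are 1/2, they cancel in the ratio, exactly as in the hand calculation: the posterior depends only on the relative likelihoods 8:1.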