When we discussed Variable Elimination, we said that Variable Elimination is correct no matter which elimination ordering is used. But we also showed that the elimination ordering can make a very significant difference in the computational complexity of the algorithm. So that begs the question of how we find a good elimination ordering. Fortunately, the graph-based computational analysis that we recently did gives us intuition into how to find good elimination orderings, because it turns out that the induced graph, and the way in which it characterizes the complexity of Variable Elimination, allows us to construct good methods for finding elimination orderings with good computational properties.

How good an elimination ordering can we construct? Here is yet another example of the fact that any interesting problem is NP-hard: the following problem is NP-hard as well. For a graph H, determining whether there exists an elimination ordering alpha for which the induced width is less than or equal to some fixed value K is NP-complete. You might say to yourself, well, that's not surprising; probabilistic inference is NP-hard, so obviously finding a really good elimination ordering is going to be NP-complete. But it turns out that this is a separate NP-hardness result. That is, even if you could solve this problem, even if somebody gave you the best elimination ordering, there would still be graphs whose induced width is large enough that you can't actually solve the inference problem in polynomial time. So finding the elimination ordering that gives you the best induced width does not, in general, give you polynomial-time performance, and so these are two separate NP-hardness problems.

So how do you find a good elimination ordering? Fortunately, simple heuristics actually do pretty well. One standard way of finding a good elimination ordering is to do a greedy search, eliminating one variable at a time, where at each point in the elimination you use some kind of heuristic cost function to decide which variable to eliminate next. There are some obvious cost functions to use, and it turns out that, perhaps surprisingly, they work pretty well. For example, one cost function is called min-neighbors, which picks the node that has the smallest number of neighbors in the current graph. That is, you want to pick the variable that's connected to the fewest other variables, so that it produces the smallest factor, the one whose scope is smallest in terms of the number of variables. Min-weight accounts for the fact that sometimes you have variables with 100 values and others with only two values, and if you just count variables you ignore that distinction. So min-weight computes the weight, that is, the total number of entries, of the factor formed. Min-weight captures the fact that, rather than a factor over two variables each of which has 100 values, you might nevertheless prefer a factor over five variables each of which has only two values. These heuristics just look at the size of the factor formed, and they ignore the fact that some of these factors might be inevitable. A different strategy is to look at how much extra work is caused by the elimination step.
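To make the greedy procedure concrete, here is a minimal sketch, not code from the course, of greedy elimination ordering with the min-neighbors and min-weight cost functions. The graph representation (a dictionary of neighbor sets) and the cardinality table `card` are assumptions made for illustration.

```python
from math import prod

def min_neighbors(graph, card, v):
    # Cost = number of neighbors of v in the current graph.
    return len(graph[v])

def min_weight(graph, card, v):
    # Cost = product of the cardinalities of v and its neighbors,
    # i.e. the number of entries in the factor created by eliminating v.
    return card[v] * prod(card[u] for u in graph[v])

def greedy_ordering(graph, card, cost):
    """Greedily eliminate the cheapest variable at each step."""
    graph = {v: set(nbrs) for v, nbrs in graph.items()}  # work on a copy
    order = []
    while graph:
        v = min(graph, key=lambda x: cost(graph, card, x))
        nbrs = graph.pop(v)
        # Eliminating v connects all of its remaining neighbors (fill edges).
        for u in nbrs:
            graph[u].discard(v)
            graph[u] |= (nbrs - {u})
        order.append(v)
    return order
```

Swapping in a different heuristic only changes the `cost` argument, which makes it easy to compare the different orderings they produce.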
Min-fill is a strategy that asks: if I eliminate this node, how many pairs of its neighbors that weren't connected before become connected due to this elimination step? Here, basically, I'm counting added complexity above and beyond the complexity I would have had from the edges that were already present. And finally, you again have a weighted version of the fill heuristic that takes into account not just the number of fill edges but also how heavy they are, that is, the domain sizes of the variables that they connect. It turns out that min-fill, which is quite often used in practice, is a pretty good heuristic.
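As a rough sketch, and assuming the same graph and `card` representation as in the `greedy_ordering` example above, the fill-based cost functions might look like this (the helper names are illustrative, not from the lecture):

```python
from itertools import combinations

def min_fill(graph, card, v):
    # Cost = number of fill edges introduced by eliminating v now,
    # i.e. pairs of v's neighbors that are not already connected.
    return sum(1 for a, b in combinations(graph[v], 2) if b not in graph[a])

def weighted_min_fill(graph, card, v):
    # Same, but each candidate fill edge is weighted by the product of the
    # cardinalities of the two variables it would connect.
    return sum(card[a] * card[b]
               for a, b in combinations(graph[v], 2) if b not in graph[a])
```

Either function can be passed directly as the `cost` argument of `greedy_ordering`.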
Now, one important way of understanding the issues in finding elimination orderings is to look at the following result, which tells us that the induced graph produced by Variable Elimination, regardless of the elimination ordering, is triangulated. What does triangulated mean? Triangulated means that you cannot have a loop of size greater than three without a bridge, an edge that cuts across the loop in one direction or another, so that there is a way of crossing the loop without going all the way around.

Let's convince ourselves that this is true; here's a simple proof. Assume, for contradiction, that the induced graph contains a loop of size greater than three with no bridging edge. One of the variables on that loop has to be the first to be eliminated; assume that C is eliminated first. When C is eliminated, the C-B edge and the C-D edge must both exist, because, as we observed before, no neighbors are added to C once it has been eliminated. So eliminating C introduces an edge between its two neighbors B and D on the loop, and at that point a fill edge bridging the loop must be added. Now, it turns out that one way to find an elimination ordering, an alternative to the heuristics I mentioned earlier, is to find a low-width triangulation of the original graph. There's been a lot of work in the graph-theory literature on triangulating graphs, which we're not going to talk about, but you can use all of that literature to find good low-width triangulations and then use them to construct an elimination ordering.

Now, let's convince ourselves that the ordering can make a big difference by considering a practical application: Robot Localization & Mapping. What we're going to see here is a robot moving around; at each point in time it sees several landmarks, and it knows which landmarks it sees because it can recognize them. What it senses at each point is some kind of approximate distance to the closest landmark. Let's write that down as a graphical model.

The graphical model looks something like this. We have, at each point in time, a robot pose: the pose at time zero, time one, and so on, up to the end of the trajectory, T. And we have a set of landmarks whose positions are fixed; the landmarks don't move. Notice that these are not random variables that change over time; they're just there, and they're constant. What you see here in gray are the observations. This one, for example, assumes that at time one the robot saw landmark one, and so what we have here is the observation: the observed distance is a function of the robot pose and the landmark position. Here, at time two, the robot saw landmark two, so we have this dependency over here. And at time three the robot also saw landmark two, so you have two observations for the same landmark. Here I didn't draw the fact that you could actually see more than one landmark at a given point in time, but that is also easy to add, okay?

So now, if we take the previous robot trajectory that we saw and write down the Markov network that represents the factors, we see these light blue dots that represent the robot pose at every point in time. We see temporal persistence edges that represent the fact that the robot pose at time t is correlated with its pose at time t plus one. And we have these dark blue landmark position variables, where we see that a landmark is connected to all of the poses at which it was visible to the robot. This is an edge that's induced by the V-structure that we have over here, a V-structure that is active because Z1 has been observed, okay?

So, does elimination ordering matter? Oh boy. This is what you get when you eliminate the robot poses first and then the landmarks; what you see here is the induced graph. You can see that you get this huge spaghetti where most of the landmarks are connected to all of the other landmarks, okay? What about a different elimination ordering? This one's already better. This is what happens when you eliminate landmarks first and then poses, and you see the induced graph that you get over the poses. You can see that it's still pretty densely connected, but it's not nearly as bad as the spaghetti that we had before. And finally, let's look at the result of min-fill elimination; what you're seeing here are the fill edges being constructed, and you can see that the result is actually very sparse. Let's look at it again: very few fill edges are added over the course of the trajectory. So the elimination ordering makes a very big difference (the small sketch at the end of this section illustrates the same effect on a toy version of this graph).

In summary, we've seen that finding the optimal elimination ordering is an NP-hard problem, and that this is an NP-hardness that is separate from the intrinsic difficulty of inference in graphical models. However, we've also shown that the graph-based view of Variable Elimination gives us a set of very simple and intuitive heuristics that allow us to construct good elimination orderings. These heuristics work by looking at the induced graph as it's constructed in the course of Variable Elimination and trying to keep it small and sparse. Although they are simple and certainly not guaranteed to give good performance in the worst case, they are often quite reasonable in practice and are very commonly used.
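To illustrate on a toy scale why the ordering matters, the following sketch builds a small, purely hypothetical SLAM-like Markov network (a short chain of poses plus two landmarks, with made-up observation edges) and compares the induced width under a poses-first ordering, a landmarks-first ordering, and the ordering chosen by min-fill, reusing `greedy_ordering` and `min_fill` from the sketches above.

```python
def induced_width(graph, order):
    # Largest number of neighbors a variable has at the moment it is
    # eliminated, following the given ordering and adding fill edges.
    graph = {v: set(nbrs) for v, nbrs in graph.items()}
    width = 0
    for v in order:
        nbrs = graph.pop(v)
        width = max(width, len(nbrs))
        for u in nbrs:
            graph[u].discard(v)
            graph[u] |= (nbrs - {u})
    return width

# A tiny, made-up trajectory: poses x0..x4 and landmarks l1, l2.
poses = [f"x{t}" for t in range(5)]
graph = {v: set() for v in poses + ["l1", "l2"]}
def connect(a, b):
    graph[a].add(b); graph[b].add(a)
for t in range(4):
    connect(poses[t], poses[t + 1])            # temporal persistence edges
for t, lm in [(0, "l1"), (1, "l1"), (2, "l2"), (3, "l2"), (4, "l2")]:
    connect(poses[t], lm)                      # edges induced by observations

card = {v: 2 for v in graph}                   # arbitrary cardinalities
print(induced_width(graph, poses + ["l1", "l2"]))                  # poses first
print(induced_width(graph, ["l1", "l2"] + poses))                  # landmarks first
print(induced_width(graph, greedy_ordering(graph, card, min_fill)))  # min-fill
```

On a graph this tiny the numbers stay close; the contrast between the orderings grows with longer trajectories and more landmarks, which is what the induced graphs in the lecture illustrate.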