So, in this set of lectures we'll be discussing the minimum cut problem and graphs and we'll be discussing the randomized contraction algorithm. Randomize algorithm which is so simple and elegant and it's almost impossible to believe that it can possibly work but that's exactly at what we'll be approving. So one way you can think about these set of lectures, as a segue of sorts, between our discussion of randomization and our discussion of graphs. So we just finished talking about randomization in the context of sorting and searching. We'll pick it up again toward the end of the class when we discuss hashing. But while we're in the middle of randomization probability review, I'm going to give you another application of randomization in a totally different domain. In particular to the domain of graphs, rather than to sorting and searching. So that's one high level goal of these lectures. The second one, is we'll get our feet wet talking about graphs, and a lot of the next couple weeks, that's what we're going to be talking about, fundamental graph primitives. So this will give us an excuse to start warming up with the vocabulary, some of the basic concepts of the graphs and what a graph algorithm looks like. Another perk, although it's not one of the main goals, but I want to do, I do want to point this fact, compared to most of this stuff that we're discussing in this class, this is a relatively recent algorithm, the contraction algorithm. By relatively recent I mean, it's 20 years old. But at least that means most of us, I know not all of us, but most of us at least were born at the time that this algorithm was invented. And so just one quick digression. In an intro course like this, most of what we're going to cover are oldies but goodies, stuff from as much as 50 years ago. And while it's kind of amazing, given how much the world and how much technology has changed over the past 50 years, that ideas in computer science from that long ago are still useful, they are. It's just sort of an amazing thing about the stuff that the first generation of computer scientists figured out. It's still relevant to this day. That said, algorithms is still a vibrant field with lots of open questions. And when I have an opportunity, I'll try and give you glimpses of that fact. So I do want to point out here that this is a somewhat more recent algorithm than most of the other ones we'll see, which dates back to the 90s. So let's talk about graphs. Fundamentally, what a graph does is represent pair-wise relationships among a set of objects. So, as such, a graph is going to have two ingredients. So first of all, there's the objects that you're talking about. And these have two very common names and you're just going to have to know both of the names, even though they're completely synonymous. The first name is vertices. So vertex is the singular, vertices is the plural. Also known interchangeably as nodes. I'll be using the notation V for the set up of vertices. So those are the objects, now I want to represent pair wise relationships so these pairs are going to be called edges. Will be noted by, denoted by E. And there's two flavors of graphs and both are really important. Both come up all the time in applications, so you should be aware of both kinds. So there's undirected graphs and directed graphs and that just depends on whether the edges themselves are undirected or directed. So edges can be undirected by which I mean this pair is unordered. There are just two vertices of an edge the two endpoints, say U and V, and you don't distinguish one as the first and one as the second. Or edges can be directed, in which case you have a directed graph. And here, a pair is ordered, so you do have a notion of a first vertex, or a first end point. And the second vertex or second end point of an edge. Those are often called the tail and the head respectively. And once in a while, although I will try to not use this terminology, you hear directed edges called arcs. Now I think all of this is much clearer if I just draw some pictures. Indeed one use to call graphs, dots and lines. The dots would refer to the vertices, so here's four dots, or four vertices. And the edges would be lines, so the way you denote one of these edges is you just draw a line between the two end points of that edge, the two vertices that it corresponds to. So this is undirected graph with four vertices and five edges. We can equally we'll have a directed version of this graph. So let's still have four vertices and five edges, but to indicate that this is directed graph and then each edge was first vertex and the second vertex, were going to add arrows to the line. So the arrow points to the second vertex, or to the head of the edge. So, the first vertex is often called the tail of the edge. So, graphs are completely fundamental, they show up not just in computer science but in all kinds of different disciplines, social sciences and biology being two prominent ones. So, let me just mention a couple of reasons you might use them just off the top of my head but literally there's hundreds or thousands of others, so a very literal example would be road networks. So imagine you type in asking for your driving directions from point A to point B in some web application or software, or whatever, it computes a route for you. What it's doing, is it's manipulating some representation of a road network, which inevitably is going to be stored as a graph, where the vertices corresponds to intersections and the edges correspond to individual roads. The Web is often fruitfully thought of as a directed graph, so here the vertices are the individual web pages, and edges correspond to hyperlinks. So the first vertex in an edge detail is going to be the page that contains the hyperlink. The second vertex, or the head of the edge, is going to be what the hyperlink points to. So that's the Web as a directed graph. Social networks are quite naturally represented as graphs. So here the vertices correspond to the individuals in the social network. And the edges correspond to relationships. They have friendship links. I encourage you to think about among the popular social networks these days, which ones are undirected graphs and which ones are directed graphs, we have some interesting examples of each of those.. And often graphs are useful even when there isn't such an obvious network structure. So just to mention one example. Let me just write down precedence constraints. So to say what I mean, you might think about, let's say you're a freshman in college and you're looking at your majors, you're a science major and you want to know what courses to take in what order. And you can think about the following graph where each of the courses in your major corresponds to a vertex And you draw a directed edge from course A to course B, if course A is a prerequisite for course B. That is, it has to be completed before you begin course B. Okay, so that's a way to represent dependencies, sort of a temporal ordering, between pairs of objects using a directed graph. So that's the basic language of graphs. Let me now talk about cuts in graphs. because again, this set of lectures is going to be about so called minimum cut problem. So, the definition of a cut of a graph is very simple, it's just a grouping, a partition of the vertices of the graph into two groups, A and B, and both of those two groups should be non-empty. So, to describe this in pictures, let me give you a cartoon of the cut in both the undirected and directed cases. So for an undirected graph, you can imagine drawing your two sets, A and B. And once you've defined the two sets A and B, the edges then fall into one of three categories. You've got edges with both of the endpoints in A. You've got edges with both of the endpoints in B. And then, you've got edges with one endpoint in A, and one endpoint in B. So that's generically what the picture of the graph looks like viewed through the lens of a particular cut, A B. The picture for directed graphs is similar. You would again have an A, and you'd again have a B, you have directed edges with both endpoints in A, directed edges with both endpoints in B. And now you should have two further categories, so you have edges who cross the cut from left to right, that is tail vertex is in A and the head vertex is in B and you can also have edges which cross the cut in the opposite direction, that is their tail is in B and their head is in A. Usually when we talk about cuts, we're going to be concerned with how many edges cross the given cuts. And by that I mean the following, the crossing edges of a cut (A,B) are those that satisfy the following property. So in the undirected case, it's exactly what you think it would be, one of the endpoints is an A, the other endpoint is in B, that's what it means to cross the cut. Now in the directed case, there's a number of reasonable definitions you could propose, about which edges crossed the cut. Typically and in this course, we're going to focus on the case where we only think about edges that cross the cut from the left to the right, and we ignore edges which cross from the right to the left. So that is the edges that cross the cut are those with tail in A and head in B. So referring to our two pictures, our two corrections of cuts for the underrated one all three of these blue edges would be the edges crossing the cut AB. because they're the ones that have one end point on the left side and one end point on the right side. Now for the directed one, we only have two crossing edges. So the two that cross from left to right. We have tail in A and head in B. The one that's crossing backwards does not contribute. We don't count it as a crossing edge of the cut. So the next quiz is just a sanity check that you've absorbed the definition of a cut of a graph. All right, so the answer to this quiz is the third option. Recall what is the definition of a cut, it's just a way to group the vertices into two sets A and B, both should also be not empty. So we have N vertices and essentially we have one binary degree of freedom for each, for each vertex, we can decide whether or not it goes in set A or it goes in set B, so two choices for each of the N vertices, that gives us a two to the N possible choices, two to the N possible cuts overall. Now that's slightly incorrect because we call that a cut. You can't have a non empty set A or a non empty set B, so those are two of the two to the N options which are disallowed. So strictly speaking the number is two to the N minus two, but two to the N is certainly the closest answer of the four provided. Now, the minimum cut problem is exactly what you'd think it would be. I give you as input a graph and among these exponentially, many cuts, I want you to identify one for me with the fewest number of crossing edges. So a few quick comments, so first of all the name for this cut is a min cut. A min cut is one with the fewest number of crossing edges. Secondly, to clarify, I am going to allow in the input what's called parallel edges. There will be lots of applications where parallel edges are sort of pointless, but for minimum cut actually it's natural to allow parallel edges. And that means you have two edges that correspond to exactly the same pair of vertices. Finally, the more seasoned programmers among you are probably wondering what I mean by, you're given the graph as input. You might be wondering about how exactly that's represented, so the next video's going to discuss exactly that, the popular ways of representing graphs and how you're usually going to do it in this course, specifically via what's called an adjacency list. Okay, so I want to make sure that everybody understands exactly what the minimum problem is asking. So, let me draw for you a particular graph with eight vertices and quite a few edges. And what I want you to answer is what is the min cut value in this graph? That is, how many edges cross the minimum cut, the cut with the fewest number of crossing edges? All right, so the correct answer is the second choice. The min cut value is 2 and the cut which shows that, is just to break it basically in half. And there were two halves. In this case, there are only two crossing edges, this one and this one. And I'll leave it for you to check that there's no other edge that has as few as two edges. Now in this case, we got a very balanced split when we took the minimum cut. In general, that need not be true. Sometimes even a single vertex can define the minimum cut of a graph, and I encourage you to think about a concrete example that proves that. So why should you care about computing the minimum cut? Well, this is one problem among a genre called graph partitioning, where you're given a graph and you want to break it into two or more pieces. And these kinds of graph partitioning problem comes up all the time, in a surprisingly diverse array of applications. So let me just mention a couple at a high level. So one very obvious one when your graph is representing its physical network, when identifying something like a min cut allows you to do, is identify weaknesses in your network. Perhaps it's your own network, and you want to understand where you soup of the infrastructure because it's, in some sense, a hot spot of your network or a weak point. Or, maybe there's someone else's network and you want to know where the weak spot in their network. In fact, there are some declassified documents about 15 years ago or so. Which showed that the United States and Soviet Union militaries, back during the Cold War, were actually quite interested in computing minimum cuts, because they were looking for things like, for example, what's the most efficient way to disrupt the other country's transportation network? Another application, which is a big deal in social network analysis these days, is the idea of community detection. So the question is among the huge graph, say the graph of everybody who is on Facebook or something like that. How can you identify small pockets of people that seem tightly knit, that seem closely related, from which you like to infer that there are community of some sort? Maybe they all go to the same school, maybe they all have the same interest, maybe they're part of the same biological family whatever. Now, it's to some degree still an open question how to best define communities and social networks. But as a quick and dirty sort of first order heuristic, you can imagine looking for small regions, which on the one hand, are highly interconnected among themselves, but quite weakly connected to the rest of the graph. So sub-routines like the minimum cut problem, can be used for identifying these small densely interconnected, but then weakly connected to everybody else, pockets of a graph. Finally, cut problems are also used a lot in vision. So for example, one way you can use them in what's called image segmentation. So here what's going on is you're given as input a 2D array where each entry is a pixel from some image. And there's a graph, which is very natural to define, given a 2D array of pixels. Namely, you have an edge between two pixels if they are neighboring. So for two pixels that are immediately next to each other left and right or top to bottom, you put an edge there. So that gives you what's called a grid graph. And now unlike the basic minimum cut problem that we're talking about here, in image segmentation it's most natural to use edge weights. Where the weight of an edge is basically how likely you expect those two pixels to be coming from a common object. Why might you're expect to enabling pixels to come from the same object, well perhaps their color maps were almost exactly the same and you just expected that they're part of the same thing. So once you've defined the screen graph which suitable edge ways now you run a graph partitioning or maybe cut type separate team, and the hope is that the cut that it identifies rips off one of the contiguous objects in the picture. And then you do that a few times and you get the major objects in the given picture. So this list is by no means exhaustive of the applications of min cut and graph partitioning server teams, but I hope it serves as sufficient motivation to watch the rest of the lectures in this sequence.