In this section, we're going to focus on undersampling. The idea is to shrink the majority class until it is similar in size to the minority class. Let's start with a family of methods called NearMiss, which offers different ways of keeping the majority points that sit closest to the minority points. In our plot, the positives are the majority class and the negatives are the minority class. NearMiss-1 selects only the positive samples, from the majority class, for which the average distance to the N closest samples of the negative class is the smallest. In other words, we are generally trying to keep points that sit near the decision boundary. That's going to be our goal. What we need to be aware of, though, is that this type of downsampling can easily be skewed by outliers or noise, which may pull the retained points toward a cluster far from the true boundary. As we see in our example, a few outliers can make all of the points near one cluster the closest points to that cluster, so the retained sample changes and the decision boundary we see in the middle plot ends up somewhere other than where we would optimally want it. So NearMiss-1 can be easily thrown off by outliers.

NearMiss-2, instead, selects the positive samples for which the average distance to the N farthest samples of the negative class is the smallest. Because it does not focus on minimizing the distance to the nearest negative samples, but rather to the farthest ones, NearMiss-2 is not as affected by outliers: minimizing the distance to the farthest samples helps reduce the effect of noise. As we can see, we end up with a somewhat better classification, but note that it can still be affected by marginal outliers, so it's not perfect.

NearMiss-3 is a two-step algorithm. First, for each negative sample, we keep its K nearest neighbors from the positive class. Then, from those kept points, we select the positive samples for which the average distance to the N nearest negative neighbors is the largest. Because of that first selection step, NearMiss-3 is probably the version least affected by noise and outliers. Again, we get a fairly good decision boundary, which we see here in the middle plot.

Another method available is Tomek links. A Tomek link exists when two samples from different classes are each other's nearest neighbors. In the figure, a Tomek link is illustrated by highlighting the samples of interest in green: the green samples are each other's nearest neighbors, but they belong to different classes. We can then either remove only the majority-class sample in each link, as in the middle figure where just the positive is dropped, or remove both samples, as in the figure on the right.
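To make this concrete, here is a minimal sketch of these undersamplers using the imbalanced-learn library. The dataset built with make_classification, its class weights, and the random seed are assumptions chosen purely for illustration; the resampler calls themselves (NearMiss with version 1, 2, or 3, and TomekLinks) follow imbalanced-learn's API.

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.under_sampling import NearMiss, TomekLinks

# Hypothetical imbalanced dataset: roughly 90% majority class, 10% minority class.
X, y = make_classification(
    n_samples=1000,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    weights=[0.9, 0.1],
    random_state=42,
)
print("Original class counts:", Counter(y))

# NearMiss versions 1-3 differ only in how they choose which majority points to keep.
for version in (1, 2, 3):
    nm = NearMiss(version=version)
    X_res, y_res = nm.fit_resample(X, y)
    print(f"NearMiss-{version} counts:", Counter(y_res))

# Tomek links: remove only the majority-class sample in each link...
tl_majority = TomekLinks(sampling_strategy="majority")
X_tl, y_tl = tl_majority.fit_resample(X, y)
print("Tomek links (majority removed) counts:", Counter(y_tl))

# ...or remove both samples in each link.
tl_all = TomekLinks(sampling_strategy="all")
X_tl_all, y_tl_all = tl_all.fit_resample(X, y)
print("Tomek links (both removed) counts:", Counter(y_tl_all))
```

Note that Tomek links is a cleaning method rather than a balancing method: it only removes the linked points, so the classes become more separated but not necessarily equal in size.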
The point is that removing points that sit too close together creates more distinct classes. Finally, we have edited nearest neighbors. Essentially, we run a K-nearest-neighbors classifier with K equal to 1, and if a point from the majority class is misclassified by its neighbor, that point is removed. We again end up with more distinct clusters: the remaining data is what we see on the right, and the decision boundary fit on this undersampled data is what we see on the left. Now that closes out this section on undersampling. In the next video, we'll briefly touch on bringing oversampling and undersampling together, as well as closing out the section on working with unbalanced classes. I'll see you there.
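For reference, here is a minimal sketch of edited nearest neighbors with imbalanced-learn's EditedNearestNeighbours. The dataset is the same hypothetical one as above, and setting n_neighbors=1 is an assumption made to mirror the K = 1 rule described in this section; the library's default is 3 neighbors.

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.under_sampling import EditedNearestNeighbours

# Hypothetical imbalanced dataset for illustration.
X, y = make_classification(
    n_samples=1000,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    weights=[0.9, 0.1],
    random_state=42,
)

# Edited nearest neighbors: drop a non-minority point whenever its nearest
# neighbor (K = 1 here) carries a different label.
enn = EditedNearestNeighbours(n_neighbors=1)
X_res, y_res = enn.fit_resample(X, y)
print("Before:", Counter(y), "After:", Counter(y_res))
```

Like Tomek links, edited nearest neighbors cleans the boundary region rather than forcing the classes to be exactly the same size.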