In this next set of videos, we'll introduce more complex, commonly used modeling approaches for working with unbalanced datasets. Let's go over the learning goals for this section. We'll cover some additional approaches to dealing with unbalanced outcomes: random over-sampling, as well as the concept of synthetic over-sampling; some techniques for under-sampling; and then we'll touch on using balanced bagging, or "blagging," to address unbalanced class data as well.

Now, I want to start off by restating the problem when working with unbalanced classes. The problem is that both the learning phase of our machine learning models and the eventual predictions they make can easily be affected by an imbalanced dataset. We see here an example with support vector machines at different levels of imbalance: with the imbalanced datasets, such as the top two that we see here, we end up heavily favoring the majority class, and we're unable to correctly classify the values in the minority class.

Now, to correct this, there are many approaches we can use in practice. First off, we have the general sklearn approaches. With many of our models, there's a hyperparameter available called class_weight. By default this is set to None, but we can set it to 'balanced', literally the string "balanced", to help balance out the error attributed to our minority and majority classes. We can over-sample our minority class or under-sample our majority class, and we can also use a combination of over-sampling and under-sampling. And we can even use ensemble methods that leverage these over-sampling or under-sampling techniques to ensure balance between each one of our classes.
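As a quick illustration of the first two approaches, here's a minimal sketch. The toy dataset is made up for illustration, and the over-sampling is done by hand with numpy; in practice you'd typically reach for a library resampler rather than writing this yourself:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy imbalanced dataset: 90 majority-class points (class 0) vs 10 minority (class 1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(2, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)

# Approach 1: reweight the loss so each class contributes equally to training.
clf = LogisticRegression(class_weight="balanced").fit(X, y)

# Approach 2: random over-sampling of the minority class, sketched with numpy.
minority_idx = np.flatnonzero(y == 1)
extra = rng.choice(minority_idx, size=90 - len(minority_idx), replace=True)
X_res = np.vstack([X, X[extra]])
y_res = np.concatenate([y, y[extra]])

print(np.bincount(y_res))  # both classes now have 90 samples
```

Note that the over-sampling should only ever be applied to the training split, never to the test set, so that evaluation still reflects the real class balance.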
Now, I would highly suggest looking at this link after we go through this set of videos. We're just going to be showing you the techniques here, and in order to apply them in Python, you'll want to look at all the code that's available within this library.

Now, as mentioned, many models allow weighted observations. We can adjust these so that the total weights are equal across classes, either by setting that class_weight hyperparameter to 'balanced', again, literally the string "balanced", or by using some other specific weighting of your choosing. This can be convenient, as it's easy to do as long as it's available for that specific model, and there's no need to sacrifice data with either over- or under-sampling.

Now, as we've seen, we can and should also stratify our samples when working with unbalanced datasets. This ensures that our class balance remains consistent in both our train and test sets. The options available in Python are: using the stratify argument in our train_test_split function (our vanilla train_test_split function has this option); using the StratifiedShuffleSplit method, which we've used in many of our notebooks, which will also allow us to stratify according to the balance of our outcome variable; or using StratifiedKFold or RepeatedStratifiedKFold to create many different train and test sets that are stratified as well.

In our next video, we'll start going over more technical means of balancing out our dataset, starting off with over-sampling. All right, I'll see you there.
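The stratification options discussed above can be sketched as follows. This is a minimal example on a made-up 9:1 dataset; the point is that every split preserves the original class ratio:

```python
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold

# Toy labels: 90% class 0, 10% class 1.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

# Stratified hold-out split: class proportions are preserved in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(np.bincount(y_te))  # the 20-point test set keeps the 9:1 ratio: 18 vs 2

# Stratified cross-validation: every fold keeps the same 9:1 class ratio.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    assert np.bincount(y[test_idx]).tolist() == [18, 2]
```

Without stratify=y, a small minority class can end up over- or under-represented in the test set purely by chance, which makes evaluation metrics unreliable.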