In this section, I want to give a high-level overview of the models we'll build on the CAS server: logistic regression, support vector machines, decision trees, random forests, gradient boosting, and neural networks. Let's start with a simple data set consisting of two interval inputs, X-1 and X-2, along with a binary target, blue or yellow. Although most real data will have more than two inputs, it is useful to visualize how our models work in two dimensions. The intuition we build in this two-dimensional case extends to higher dimensions when we work with real data. Each point in the plot represents a single observation from our data set: its coordinates correspond to the values of the input variables, and its color corresponds to the value of the target variable. Each of our predictive modeling algorithms will create a rule, applied to this plot, that distinguishes the blue dots from the yellow dots.

The simplest model we will consider is logistic regression, which uses a linear combination of the input variables to predict the target. With our two input variables, we get three model parameters: an intercept and a weight for each input. Instead of predicting an unbounded number, as linear regression does, we want to predict a probability, so we use the logit link function: its inverse (the logistic function) converts the model output from an unbounded number into a probability between zero and one. The model estimates p-hat, the probability that an observation is an event. In this case, p-hat tells us the probability of an observation being yellow (the event of interest) rather than blue. In the context of the two-dimensional plot, the logistic regression model draws lines on the plot corresponding to different probabilities of an observation being yellow; each line represents a different probability.
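To make the intercept-plus-weights idea concrete, here is a minimal sketch in Python (outside CAS) of how p-hat is computed for one observation. The parameter values b0, b1, and b2 are illustrative assumptions, not fitted estimates:

```python
import numpy as np

def p_hat(x1, x2, b0=-1.0, b1=2.0, b2=0.5):
    """Probability of the event (yellow) from a two-input logistic regression.

    b0 is the intercept; b1 and b2 are the weights for X-1 and X-2.
    These values are made up for illustration, not estimated from data.
    """
    eta = b0 + b1 * x1 + b2 * x2       # linear predictor (unbounded)
    return 1.0 / (1.0 + np.exp(-eta))  # inverse logit maps eta into (0, 1)

# With these assumed parameters, an observation at (0.5, 0.0) sits exactly
# on the 0.50 line, because the linear predictor there is zero.
print(p_hat(0.5, 0.0))
```

Each constant-probability line in the plot is the set of points where the linear predictor eta takes a fixed value; the 0.50 line is where eta equals zero.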
After we choose a cutoff probability (for example, 0.50), we would say that every observation above the line for that probability is yellow and every observation below that line is blue. The angle and location of the lines in the plot are determined by the parameter estimates, which are found by maximizing the log-likelihood function. The limitation of logistic regression is that it can draw only straight lines in this plot, even if the data would be better separated by a more complicated geometry. In higher dimensions, this problem persists, although the lines are replaced by planes (in three dimensions) or hyperplanes (in more than three). One way to overcome the straight-line limitation and introduce curved decision boundaries is to include higher-order polynomial terms in the model. A second-order polynomial regression model includes all quadratic terms, so in our example we would have a parameter for X-1-squared, a parameter for X-2-squared, and a parameter for X-1 times X-2. Adding these terms to the model equation enables the model to draw quadratic decision boundaries rather than just linear ones. Adding third-order terms (like X-1-cubed) enables cubic decision boundaries, and so on for higher-order terms. These extra terms increase the model's flexibility, possibly leading to overfitting. If we know that the inputs are nonlinearly related to the target, adding polynomial terms can be useful, but it is important to evaluate performance on validation data to ensure that we are not overfitting.
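The effect of the quadratic terms can be sketched in Python with scikit-learn standing in for CAS. On synthetic data whose true boundary is a circle (a geometry a straight line cannot capture), a plain logistic regression is compared on held-out validation data against one given the X-1-squared, X-2-squared, and X-1-times-X-2 terms:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: two interval inputs, binary target with a circular boundary.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)  # yellow inside the circle

# Hold out validation data to check for overfitting, as the text advises.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Plain logistic regression: straight-line boundary only.
linear = LogisticRegression().fit(X_train, y_train)

# Same model after adding x1^2, x2^2, and x1*x2: quadratic boundary.
quadratic = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LogisticRegression(),
).fit(X_train, y_train)

print("linear validation accuracy:   ", linear.score(X_valid, y_valid))
print("quadratic validation accuracy:", quadratic.score(X_valid, y_valid))
```

On this data the quadratic model should separate the classes far better than the linear one, because the added terms let it represent the circular boundary exactly; the same comparison on validation data is how you would catch the reverse situation, where extra terms only fit noise.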