Decision trees recursively partition the input space into distinct regions and classify each region according to the target value of the majority of training cases in that region. The result is a list of rules for classifying new data points. The decision tree in this example partitions the input space into four regions, classifying one region as blue because only 40% of the training cases in that region are yellow, and the other three regions as yellow because each of those regions has a majority of yellow training cases. The tree has therefore generated four rules for classifying new data; one example rule is: if X2 is greater than 0.63 and X1 is less than 0.51, classify the new data point (in the upper left-hand corner of the plot) as yellow. Larger and more complicated decision trees generate more distinct regions and longer lists of rules.

The first part of the tree-building algorithm is called the split search. The split search starts by selecting an input for partitioning the available training data. For a selected input and a fixed split point, two groups are generated: cases with input values less than the split point are said to branch left, and cases with input values greater than the split point are said to branch right. The groups, combined with the target outcomes, form a 2x2 contingency table with columns that specify a branch direction and rows that specify a target value. A Pearson chi-square statistic is used to quantify the independence of the counts in the table's columns. Large values of the chi-square statistic suggest that the proportion of 0s and 1s in the left branch differs from the proportion in the right branch, and a large difference in outcome proportions indicates a good split. Because the Pearson chi-square statistic also applies to multiway splits and multi-outcome targets, whose tables have different degrees of freedom, the statistic is converted to a p-value so that candidate splits can be compared on a common scale.
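The split scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the production algorithm: it assumes a binary 0/1 target, a single numeric input, and a fixed candidate split point, and the function name is made up for this example.

```python
def chi_square_for_split(x, y, split_point):
    """Score one candidate split with a Pearson chi-square statistic.

    x: numeric input values; y: binary target values (0 or 1).
    Cases with x < split_point branch left; the rest branch right.
    Builds the 2x2 contingency table (rows = branch, columns = target)
    and returns sum of (observed - expected)^2 / expected over its cells.
    The split point should fall strictly between observed input values
    so that neither branch (and no expected count) is empty.
    """
    # table[branch][target]: branch 0 = left, 1 = right
    table = [[0, 0], [0, 0]]
    for xi, yi in zip(x, y):
        branch = 0 if xi < split_point else 1
        table[branch][yi] += 1

    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]                  # branch counts
    col_totals = [table[0][j] + table[1][j] for j in (0, 1)]  # target counts

    chi2 = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2
```

For instance, a split that puts all the 0s on the left and all the 1s on the right yields a large statistic, while a split that leaves identical target proportions in both branches yields zero, matching the interpretation in the text.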
The p-value indicates the likelihood of obtaining the observed value of the statistic if you assume identical target proportions in each branch direction. For large data sets, these p-values can be very close to zero. For this reason, the quality of a split is reported as the logworth, defined as -log(chi-square p-value). The best split for an input is the split that yields the highest logworth.
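A minimal sketch of the p-value-to-logworth conversion follows. It assumes the 2x2 (one degree of freedom) case, where the chi-square p-value has the closed form erfc(sqrt(chi2 / 2)), and takes the logarithm base 10, which is the usual convention for logworth; the function name is illustrative.

```python
import math

def logworth(chi2):
    """Convert a chi-square statistic (1 degree of freedom, i.e. a 2x2
    table) to logworth = -log10(p-value).

    For df = 1 the p-value is erfc(sqrt(chi2 / 2)), so no external
    statistics library is needed. Larger logworth = better split.
    """
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return -math.log10(p_value)
```

In a full split search, this value would be computed for every candidate split point of every input, and the split with the highest logworth kept. A statistic of 0 (identical branch proportions) gives a p-value of 1 and a logworth of 0, while larger statistics drive the p-value toward zero and the logworth upward.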