Decision Tree

Decision trees are known for producing interpretable rules: a sequence of splits that sorts the data into categories (classes).

The top node of the tree is the root node; it contains all of the training data.

The nodes at the very bottom, which have no children, are the leaf nodes. The nodes in between are called split nodes.
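
The structure described above can be sketched as a small recursive data type. This is a minimal sketch, assuming a binary tree whose split nodes compare a single numeric feature against a threshold; the names `Node`, `predict`, `feature`, `threshold`, and `label` are hypothetical and not taken from any library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical structure)."""
    # Split nodes test `features[feature] <= threshold`; leaves leave these as None.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional[Node] = None    # followed when the test is true
    right: Optional[Node] = None   # followed when the test is false
    label: Optional[str] = None    # predicted class, set only on leaves

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def predict(root: Node, features: Sequence[float]) -> str:
    """Start at the root and follow split nodes until a leaf is reached."""
    node = root
    while not node.is_leaf():
        if features[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# A tiny hand-built tree: the root splits on feature 0, its right child splits on feature 1.
tree = Node(
    feature=0, threshold=2.5,
    left=Node(label="A"),
    right=Node(
        feature=1, threshold=1.0,
        left=Node(label="B"),
        right=Node(label="C"),
    ),
)

print(predict(tree, [1.0, 0.0]))  # A
print(predict(tree, [3.0, 0.5]))  # B
print(predict(tree, [3.0, 2.0]))  # C
```

Every prediction starts at the root, and each split node sends the instance down exactly one branch, so the path ends at a single leaf whose label is the prediction.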

Gini Impurity

$$G_{i} = 1 - \sum_{k=1}^{N} p_{i, k}^{2}$$

Where:

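$G_{i}$ is the Gini impurity of the $i^{\text{th}}$ node
$p_{i, k}$ is the ratio of class $k$ instances among the training instances in the $i^{\text{th}}$ node
$N$ is the number of classes

A node is pure ($G_{i} = 0$) when all of its training instances belong to a single class; the impurity increases as the classes in the node become more evenly mixed.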
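
The formula translates directly into code. A minimal sketch, assuming a node is represented simply by the list of class labels of the training instances that reach it; `gini_impurity` is a hypothetical helper name.

```python
from collections import Counter
from typing import Hashable, Iterable


def gini_impurity(labels: Iterable[Hashable]) -> float:
    """Gini impurity G = 1 - sum_k p_k^2 of the class labels in one node."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a node must contain at least one instance")
    # p_k is the fraction of the node's instances that belong to class k.
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


print(gini_impurity(["a"] * 100))              # 0.0 (pure node)
print(gini_impurity(["a"] * 50 + ["b"] * 50))  # 0.5
print(gini_impurity(["a", "b", "c"]))          # about 0.667
```

Only the class counts matter, so the labels can be any hashable values (strings, integers, and so on).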

If the $i^{\text{th}}$ node contains 100 training instances from two classes, with $x$ of them belonging to the first class, then:

$$G_{i} = 1 - \left(\frac{x}{100}\right)^{2} - \left(\frac{100 - x}{100}\right)^{2}$$
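
Writing $p = x/100$ for the fraction of instances in the first class, the two-class formula simplifies, which shows where the impurity peaks. The sample values of $x$ below are illustrative only.

```latex
\begin{aligned}
G_{i} &= 1 - p^{2} - (1 - p)^{2} = 1 - p^{2} - 1 + 2p - p^{2} = 2p(1 - p), \qquad p = \frac{x}{100} \\
x = 100:\; G_{i} &= 1 - 1^{2} - 0^{2} = 0 \\
x = 90:\; G_{i} &= 1 - 0.9^{2} - 0.1^{2} = 1 - 0.81 - 0.01 = 0.18 \\
x = 50:\; G_{i} &= 1 - 0.5^{2} - 0.5^{2} = 0.5
\end{aligned}
```

Because $2p(1 - p)$ is largest at $p = 1/2$, a two-class node is most impure ($G_{i} = 0.5$) when its instances are split evenly, and pure ($G_{i} = 0$) when $x = 0$ or $x = 100$.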