A decision tree is a simple supervised machine learning model used to classify data. It is a tree-like structure used to make decisions: each leaf node holds a class label (i.e. the target class), each internal node (or decision node) denotes a test on an attribute, and the topmost node is the root node.

In this algorithm for decision tree induction, we follow a top-down approach, which starts with a training set of tuples and their associated class labels. The training set is recursively partitioned into smaller subsets as the tree is built.

Algorithm

Let us understand the algorithm with an example.

A small dataset

Step 1 : The algorithm is called with three parameters: the training data, the attribute list (Age and Income; Marital status is our class label), and an attribute selection method. The attribute selection method specifies the attribute that best splits the tuples according to the class. Some attribute selection measures are Information gain, Gini index, etc. (in this example we use Information gain).
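To make the later steps concrete in code, here is a hypothetical Python version of such a training set. The rows are invented for illustration (the exact values in the figure are not reproduced here); only the attribute names and the class label follow the text, and Age is shown already discretised at the threshold 30 used in the final rules, to keep the sketches short.

# Hypothetical stand-in for the small dataset in the figure above.
# The values are invented; only the column names match the article.
training_data = [
    {"Age": "<30",  "Income": "low",  "Marital status": "unmarried"},
    {"Age": "<30",  "Income": "low",  "Marital status": "unmarried"},
    {"Age": "<30",  "Income": "low",  "Marital status": "unmarried"},
    {"Age": "<30",  "Income": "low",  "Marital status": "unmarried"},
    {"Age": ">=30", "Income": "low",  "Marital status": "married"},
    {"Age": "<30",  "Income": "high", "Marital status": "married"},
    {"Age": "<30",  "Income": "high", "Marital status": "married"},
    {"Age": ">=30", "Income": "high", "Marital status": "married"},
    {"Age": ">=30", "Income": "high", "Marital status": "married"},
]

attribute_list = ["Age", "Income"]   # candidate splitting attributes
class_label = "Marital status"       # the target class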

Information gain

We want to determine which attribute in the given set of attributes is most useful. We use information gain to decide the ordering of attributes in the nodes of a decision tree.

Calculating Information gain

Information gain = Entropy(parent) - [Weighted average entropy(children)]

Entropy

Entropy(D) = - Σ p_i * log2(p_i), where p_i is the proportion of tuples in partition D that belong to class i
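A minimal Python sketch of both formulas, operating on partitions represented as lists of row dicts like the hypothetical training_data above (the helper names entropy and information_gain are our own):

import math
from collections import Counter

def entropy(rows, label="Marital status"):
    # Entropy of the class-label distribution in a partition.
    counts = Counter(row[label] for row in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, attribute, label="Marital status"):
    # Entropy(parent) minus the weighted average entropy of the
    # children produced by splitting on `attribute`.
    total = len(rows)
    children = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [row for row in rows if row[attribute] == value]
        children += (len(subset) / total) * entropy(subset, label)
    return entropy(rows, label) - children

# e.g. entropy(training_data) and information_gain(training_data, "Income")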

Step 2 : The tree starts as a single node that represents all the training tuples in the dataset.
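One simple way to represent the growing tree in code (purely an assumption of these sketches, not something the algorithm prescribes) is a plain string for a leaf and a small dict for an internal node:

# A leaf is just the class-label string; an internal node records the
# attribute tested at that node and one subtree per outcome.
leaf_example = "married"
node_example = {
    "attribute": "Income",
    "branches": {
        "low": "unmarried",                            # a child can be a leaf ...
        "high": {"attribute": "Age", "branches": {}},  # ... or another node
    },
}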

Starting node with the whole dataset

Step 3 : If the tuples are all of the same class, then the node becomes a leaf and is labelled with that class. In our case the tuples are not all of the same class (i.e. we have a mix of both married and unmarried class labels), so we need to split the dataset.

Step 4 : Now a question arises: which attribute should we choose to split on? Here the attribute selection method determines the splitting criterion. It tells us which attribute to test at the node, and the splitting criterion also tells us which branches to grow from the node with respect to the outcomes of the chosen test. So we calculate the information gain of all the attributes and choose the attribute with the highest information gain (in our case Income has the maximum information gain).
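A minimal sketch of that selection, reusing information_gain() and the hypothetical training_data from the earlier sketches (with those invented rows Income also happens to come out on top, but the numbers are not the ones behind the figure):

# Compute the gain of every candidate attribute and keep the best one.
gains = {a: information_gain(training_data, a) for a in attribute_list}
best_attribute = max(gains, key=gains.get)
print(gains)            # gain of Age and Income on the invented rows
print(best_attribute)   # the attribute tested at the root node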

A partition is pure if all the tuples in it belong to the same class. Here, since all tuples are not of the same class, it is not a pure partition.
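The purity check is a one-liner on the same representation; a small sketch with a hypothetical helper name:

def is_pure(rows, label="Marital status"):
    # True if every tuple in the partition has the same class label.
    return len(set(row[label] for row in rows)) <= 1

# is_pure(training_data) is False: the full dataset mixes both classes.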

Step 5 : There are three possible scenarios of splitting or partitioning; a short code sketch of all three follows the cases below.

a. If the splitting attribute is discrete-valued: a branch is created for each known value of the attribute and labelled with that value.

Attribute with discrete values
Attribute with discrete values and multiple branches

b. If the splitting attribute is continuous-valued: the test at the node has two possible outcomes, corresponding to the conditions true or false (or Yes or No).

Attribute with continuous values

c. If the splitting attribute is discrete-valued and a binary tree must be produced: a subset of the known values is created, and the test at the node checks whether the attribute value belongs to that subset. So two branches are grown, Yes and No.

Attribute with discrete values when a binary tree must be produced
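Here is a minimal sketch of what the three kinds of tests look like on the list-of-dicts representation used above; the attribute, split point, and value subset passed in are whatever the attribute selection method chose, and the helper names are our own:

# (a) discrete-valued attribute: one branch per known value
def split_discrete(rows, attribute):
    branches = {}
    for row in rows:
        branches.setdefault(row[attribute], []).append(row)
    return branches                     # e.g. {"low": [...], "high": [...]}

# (b) continuous-valued attribute: two branches, test value <= split_point
def split_continuous(rows, attribute, split_point):
    yes = [r for r in rows if r[attribute] <= split_point]
    no = [r for r in rows if r[attribute] > split_point]
    return yes, no                      # assumes numeric attribute values

# (c) discrete-valued attribute when a binary tree must be produced:
#     two branches, test membership in a chosen subset of values
def split_discrete_binary(rows, attribute, value_subset):
    yes = [r for r in rows if r[attribute] in value_subset]
    no = [r for r in rows if r[attribute] not in value_subset]
    return yes, no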

Step 6 : The algorithm uses the same process recursively to form a decision tree for the tuples at each resulting partition.

Step 7 : This process stops only when any one of the terminating conditions is true. The terminating conditions are listed below, followed by a short recursive sketch that combines Steps 6 and 7:

a. All the tuples in the partition belong to the same class. In the figure above we can see that all the partitions end up in the same class, so the algorithm will stop.

b. There are no remaining attributes on which the tuples may be further partitioned. In this case, we choose the class label that has the majority and convert the node into a leaf (majority voting). This situation does not arise in our example.

c. There are no tuples for a given branch, that is, a partition is empty. In this case, a leaf is created with the majority class of the parent partition.
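Putting Steps 6 and 7 together, here is a minimal recursive sketch. It assumes the hypothetical training_data and the entropy, information_gain, and is_pure helpers from the earlier sketches, and it uses the string-or-dict node representation sketched at Step 2.

from collections import Counter

def majority_class(rows, label="Marital status"):
    return Counter(row[label] for row in rows).most_common(1)[0][0]

def build_tree(rows, attributes, label="Marital status"):
    # (a) all tuples in the partition belong to the same class -> leaf
    if is_pure(rows, label):
        return rows[0][label]
    # (b) no remaining attributes -> leaf labelled with the majority class
    if not attributes:
        return majority_class(rows, label)

    # otherwise test the attribute with the highest information gain ...
    best = max(attributes, key=lambda a: information_gain(rows, a, label))
    remaining = [a for a in attributes if a != best]

    # ... and grow one branch per value of that attribute seen in `rows`
    node = {"attribute": best, "branches": {}}
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        # (c) an empty partition would become a majority-class leaf; with
        # branches only for values seen here, subsets are never empty.
        node["branches"][value] = build_tree(subset, remaining, label)
    return node

# tree = build_tree(training_data, attribute_list)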

Step 8 : We get our final resultant tree. We can also generate rules from it that let us predict the class label for unseen data.

Final decision tree

Now we get the decision rules, where the outcome is the content of the leaf node. We can use these rules to predict the class label (or target class) of an unknown dataset (i.e. the testing dataset).

Decision rules can be generated in the form:

if condition1 and condition2 and condition3 then outcome

In our case the rules are: 1. if income is low and age < 30 then unmarried, 2. if income is high and age >= 30 then married, and so on.
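The same rules can also be applied programmatically by walking the tree built in the recursive sketch above; a minimal version, assuming that nested-dict representation and an unseen tuple that uses the same discretised Age values:

def predict(tree, sample):
    # Descend from the root until a leaf (a plain class-label string)
    # is reached; assumes every attribute value was seen in training.
    while isinstance(tree, dict):
        tree = tree["branches"][sample[tree["attribute"]]]
    return tree

# predict(tree, {"Age": "<30", "Income": "low"}) -> "unmarried"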

Conclusion

So we learned how to construct a decision tree from a simple dataset, and finally we generated decision rules for future prediction.