Introduction to Decision Tree

A decision tree is a type of supervised learning algorithm (one with a pre-defined target variable) that is mostly used in classification problems. It is one of the most popular predictive modeling techniques in machine learning.

A decision tree is basically a decision support tool that builds classification or regression models in the form of a tree-like graph: nodes represent the places where we pick an attribute and ask a question, edges represent the answers to that question, and leaves represent the actual output or class label. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is developed incrementally. The final result is a tree with decision nodes and leaf nodes.

The goal of using this algorithm is to create a training model that can predict the value of the target variable by learning decision rules from prior data. It uses one of several algorithms to decide how to split a node into two or more sub-nodes. The creation of sub-nodes increases the homogeneity of the resultant sub-nodes; in other words, the purity of each node increases with respect to the target variable. The decision tree considers splits on all available variables and then selects the split which results in the most homogeneous sub-nodes.
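To make "homogeneity" concrete, here is a minimal Python sketch that scores candidate splits with Gini impurity, one common purity measure (the labels below are invented for illustration):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2); 0 means a perfectly pure node."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a split: size-weighted average of the child impurities."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = ["yes", "yes", "no", "no", "yes", "no"]        # labels before the split
split_a = (["yes", "yes", "yes"], ["no", "no", "no"])   # perfectly pure children
split_b = (["yes", "no", "yes"], ["no", "yes", "no"])   # still mixed children

print(gini(parent))             # 0.5
print(weighted_gini(*split_a))  # 0.0 -> preferred: most homogeneous sub-nodes
print(weighted_gini(*split_b))  # ~0.444
```

The split with the lowest weighted impurity (equivalently, the highest purity gain) is the one the tree keeps.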

Types of Decision Tree

1. Categorical Variable Decision Tree

A decision tree that has a categorical target variable is called a categorical variable decision tree.

E.g.: in a student scenario, the target variable is "the student will play badminton or not", i.e. YES or NO.
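As a quick illustration, here is a minimal scikit-learn sketch of such a categorical-target tree; the weather features, their encodings, and the labels are all invented for the example:

```python
from sklearn.tree import DecisionTreeClassifier

# Encoded features: [outlook (0 = sunny, 1 = rainy), wind (0 = weak, 1 = strong)]
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = ["yes", "no", "yes", "no"]  # will the student play badminton?

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[0, 0]]))    # e.g. ['yes']
```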

2. Continuous Variable Decision Tree

A decision tree that has a continuous target variable is called a continuous variable decision tree.

E.g.: let's say we have a problem of predicting whether a customer will pay his renewal premium with an insurance company (yes/no). We know that a customer's income is a significant variable here, but the insurance company does not have income details for all customers. Since this is an important variable, we can build a decision tree to predict customer income based on occupation, product and various other variables. In this case, we are predicting values of a continuous variable.
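A minimal sketch of that idea with scikit-learn's regression tree; the occupation/product encodings and the income values are invented for illustration:

```python
from sklearn.tree import DecisionTreeRegressor

# Encoded features: [occupation_code, product_code]
X = [[0, 1], [1, 0], [2, 1], [0, 0], [2, 0]]
y = [42000.0, 55000.0, 78000.0, 40000.0, 80000.0]  # known customer incomes

reg = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(reg.predict([[2, 1]]))  # estimated income for a customer with no record
```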

Frequently used terminology in Decision Trees

1. Root Node

It represents the entire population or sample, which further gets divided into two or more homogeneous sets.

2. Splitting

It is the process of dividing a node into two or more sub-nodes.

3. Decision Node

When a sub-node splits into further sub-nodes, it is called a decision node.

4. Leaf/ Terminal Node

Nodes that do not split are called leaf or terminal nodes.

5. Pruning

When we remove sub-nodes of a decision node, the process is called pruning. It is the opposite of splitting.

6. Branch / Sub-Tree

A sub-section of the entire tree is called a branch or sub-tree.

7. Parent and Child Node

A node that is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of that parent node.


A decision tree classifies examples by sorting them down the tree from the root to some leaf/terminal node, with the leaf/terminal node providing the classification of the example.

Decision Tree Algorithm Pseudocode

1. Place the best attribute of the dataset at the root of the tree.
2. Split the training set into subsets. Subsets should be made in such a way that each subset contains data with the same value for an attribute.
3. Repeat step 1 and step 2 on each subset until you find leaf nodes in all the branches of the tree.
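As a rough illustration of these three steps, here is a minimal, self-contained Python sketch of an ID3-style tree builder for categorical features; the data layout (a list of dicts plus a label list), the attribute names, and the use of information gain as the "best attribute" criterion are assumptions made for the example:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list; 0 means the set is pure."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Step 1: pick the attribute whose split yields the largest information gain."""
    def gain(attr):
        remainder = 0.0
        for value in {row[attr] for row in rows}:
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    if len(set(labels)) == 1:           # pure subset -> leaf node (step 3 stops)
        return labels[0]
    if not attributes:                  # nothing left to split on -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)      # step 1
    tree = {attr: {}}
    for value in {row[attr] for row in rows}:            # step 2: one subset per value
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        rest = [a for a in attributes if a != attr]
        tree[attr][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx], rest)  # step 3
    return tree

rows = [{"outlook": "sunny", "wind": "weak"},
        {"outlook": "sunny", "wind": "strong"},
        {"outlook": "rainy", "wind": "weak"}]
labels = ["yes", "no", "yes"]
print(build_tree(rows, labels, ["outlook", "wind"]))
# e.g. {'wind': {'weak': 'yes', 'strong': 'no'}}
```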
Decision Tree Classifier (image source: www.packtpub.com)

In a decision tree, to predict a class label for a record, we start from the root of the tree. We compare the value of the root attribute with the record's attribute. On the basis of that comparison, we follow the branch corresponding to the value and jump to the next node.

We continue comparing the record's attribute values with the internal nodes of the tree until we reach a leaf node with the predicted class value. This is how a trained decision tree is used to predict the target class or value.
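A minimal sketch of that traversal in Python, assuming the nested-dict tree format from the build sketch above (a dict per decision node, a plain label at each leaf):

```python
def predict(tree, record):
    while isinstance(tree, dict):        # still at an internal (decision) node
        attr = next(iter(tree))          # the attribute tested at this node
        tree = tree[attr][record[attr]]  # follow the branch matching the record
    return tree                          # reached a leaf: the class label

tree = {"wind": {"weak": "yes", "strong": "no"}}
print(predict(tree, {"outlook": "sunny", "wind": "strong"}))  # 'no'
```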

Assumptions while creating Decision Tree

Below are some of the assumptions we make while using a decision tree:

  • At the beginning, the whole training set is considered as the root.
  • Feature values are preferred to be categorical. If the values are continuous, they are discretized prior to building the model (see the sketch after this list).
  • Records are distributed recursively on the basis of attribute values.
  • The order in which attributes are placed as the root or as internal nodes of the tree is decided using some statistical approach.
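For the discretization point above, here is a minimal sketch using scikit-learn's KBinsDiscretizer; the ages and the choice of three quantile bins are arbitrary examples:

```python
from sklearn.preprocessing import KBinsDiscretizer

ages = [[22], [25], [31], [38], [45], [52], [61]]  # a continuous feature
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
print(disc.fit_transform(ages).ravel())  # e.g. [0. 0. 1. 1. 1. 2. 2.]
```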

Decision trees follow a Sum of Product (SOP) representation when traversing from the root node to the leaf nodes. The sum of products form is also known as Disjunctive Normal Form (DNF). For a class, every branch from the root of the tree to a leaf node having that class is a conjunction (product) of attribute values, and the different branches ending in that class form a disjunction (sum).
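To make this concrete, here is a tiny Python sketch expressing such a rule; the weather attributes and the rule itself are invented for illustration:

```python
def plays_badminton(outlook, humidity):
    # Each root-to-leaf path for the "yes" class is a conjunction (product);
    # the class as a whole is the disjunction (sum) of those paths.
    return ((outlook == "sunny" and humidity == "normal")
            or outlook == "overcast")

print(plays_badminton("sunny", "normal"))  # True
print(plays_badminton("rainy", "normal"))  # False
```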

Advantages

A decision tree is simple and easy to use, understand and explain. It requires less data cleaning compared to some other modeling techniques and, to a fair degree, is not influenced by outliers and missing values. Data type is not a constraint: it can handle both numerical and categorical variables. A decision tree is considered to be a non-parametric method, which means it makes no assumptions about the space distribution or the classifier structure. Interpretation of a complex decision tree model can be simplified by visualizing it; even a non-expert can understand the logic.

Disadvantages of Decision Tree

There is a high probability of overfitting in a decision tree. It is not a good fit for continuous variables: it loses information when it bins them into categories. Prediction accuracy for a dataset is generally lower than with other machine learning algorithms. Information gain in a decision tree with categorical variables gives a biased response toward attributes with a greater number of categories. Also, calculations can become complex when there are many class labels.
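On the overfitting point, scikit-learn exposes a few standard controls; a minimal sketch follows (the specific depth/leaf/alpha values are illustrative, not recommendations):

```python
from sklearn.tree import DecisionTreeClassifier

# Pre-pruning: stop splitting early so the tree cannot memorize the training data.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)

# Post-pruning: grow the full tree, then cut it back via cost-complexity pruning.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01)
```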

Conclusion

In this article, we covered a lot about decision trees: how they work, their types, the pseudocode, the assumptions made while creating them, and their advantages and disadvantages.