Decision Tree
A Decision Tree is a supervised machine learning algorithm used for both classification and
regression tasks. It works by splitting the data into branches based on conditions or questions about
the input features. Each internal node represents a decision (based on a feature), each branch
represents the outcome of the decision, and each leaf node gives the final result or prediction.
It is called a "tree" because it starts from a root node and splits into branches like a tree.
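The description above can be sketched with scikit-learn's decision tree implementation. This is a minimal, illustrative example; the toy animal data (heights and weights for cats and dogs) is made up for demonstration.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features: [height_cm, weight_kg]; labels: 0 = cat, 1 = dog
X = [[25, 4], [30, 6], [60, 25], [55, 20]]
y = [0, 0, 1, 1]

# Fit a decision tree: it learns feature-based splits from the data
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# A small, light animal lands on the "cat" side of the learned split
print(clf.predict([[28, 5]]))  # [0]
```

The same class family (`DecisionTreeRegressor`) covers the regression case mentioned above.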
Root Node: The topmost node that represents the entire dataset. This node is split into two
or more homogeneous sets.
Decision Nodes: These are the nodes where the data is split based on certain criteria
(features).
Leaf Nodes: These nodes represent the outcome (classification or decision) and do not split
further.
Branches: The arrows from one node to another, representing the outcome of a decision.
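The four terms above map directly onto a simple data structure. The sketch below is illustrative only; the names (`Node`, `feature`, `threshold`, `prediction`) are chosen for this example and do not come from any library.

```python
class Node:
    """One node of a decision tree: a decision node or a leaf."""
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, prediction=None):
        self.feature = feature        # index of the feature this decision node tests
        self.threshold = threshold    # split point: go left if value <= threshold
        self.left = left              # branch taken when the condition holds
        self.right = right            # branch taken otherwise
        self.prediction = prediction  # set only on leaf nodes

    def is_leaf(self):
        return self.prediction is not None

# Root node (tests feature 0); its two branches both end in leaf nodes.
root = Node(feature=0, threshold=5.0,
            left=Node(prediction="A"),
            right=Node(prediction="B"))
```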
- The decision tree begins at the root node, which holds the entire dataset.
- The goal is to split the data in a way that makes each group more similar (pure).
- The chosen feature is used to divide the data into smaller groups.
5. Making Predictions:
- For a new data point, start from the root node and follow the branch matching its feature values at each decision node until a leaf node is reached; the leaf's value is the prediction.
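The "purity" goal in the steps above is usually measured numerically; Gini impurity is one common choice (entropy is another). A group is pure (impurity 0) when all its labels are the same. The labels below are made-up illustrations.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

parent = ["cat", "cat", "dog", "dog"]   # mixed group before the split
left, right = ["cat", "cat"], ["dog", "dog"]  # groups after splitting

print(gini(parent))  # 0.5 -> mixed, impure group
print(gini(left))    # 0.0 -> pure group
```

A good split is one that lowers the (weighted) impurity of the child groups relative to the parent, which is exactly what "making each group more similar" means.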
Easy to Understand: The structure of decision trees makes them easy to interpret and
visualize.
Overfitting: Decision trees can become very complex and overfit the training data.
Unstable: Small changes in the data can lead to a completely different tree.
Bias: They can be biased towards features with more levels (high cardinality).
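The overfitting weakness above is commonly mitigated by constraining the tree, for example with a depth limit. This sketch uses scikit-learn on a small synthetic dataset purely for illustration; the exact scores depend on the data and random seed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, split into train and test sets
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree memorizes the training set (overfitting risk);
# capping max_depth trades some training accuracy for simpler structure.
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("deep    train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow train/test:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
```

Pruning (`ccp_alpha` in scikit-learn) and ensembles such as random forests are other standard remedies for the overfitting and instability issues listed above.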