Boosting algorithms are a general strategy for improving the accuracy of weak classifiers by combining them.
We take a weak classifier that performs only "just" better than chance and combine many of them into a much better classifier, thus boosting the overall performance.
One of the most famous algorithms is AdaBoost, also known as Adaptive Boosting. It is an ensemble learning method that improves the performance of binary classifiers. It is a form of sequential learning, where each model learns from the mistakes of the previous ones.
Ensemble methods can be divided into:
- Sequential learning (e.g., AdaBoost)
- Parallel learning (e.g., Random Forest)
Some well-known boosting algorithms are AdaBoost, gradient boosting, and XGBoost. Here we will look into AdaBoost.
A single classifier often fails to separate the points well. We therefore build a strong model by combining multiple weak classifiers, each of which progressively focuses on the points misclassified before it.
Let's start with a training set $\{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $x_i \in \mathbb{R}^K$ and $y_i \in \{-1, +1\}$.
We can define a weak classifier as a function $k_m(x) \in \{-1, +1\}$ that performs only slightly better than chance.
Thus we can create a loss function using the indicator $I(k_m(x_i) \neq y_i)$, which equals 1 when point $i$ is misclassified and 0 otherwise.
The pseudocode for the AdaBoost algorithm is as follows:

Initialize the weights $w_i = \frac{1}{N}$ for i from 1 to N
for m = 1 to M do
  Fit a weak classifier $k_m(x)$ to the training data using the weights $w_i$
  Compute its weighted error $\epsilon_m = \frac{\sum_i w_i \, I(k_m(x_i) \neq y_i)}{\sum_i w_i}$,
  where $I(k_m(x_i) \neq y_i)$ is 1 if point $i$ is misclassified and 0 otherwise
  Compute the classifier weight $\alpha_m = \frac{1}{2}\ln\frac{1 - \epsilon_m}{\epsilon_m}$
  for all i do
    $w_i \leftarrow w_i \, e^{-\alpha_m y_i k_m(x_i)}$
  end
end
Output the final classifier $f(x) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m k_m(x)\right)$
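To make the loop concrete, here is a rough Python sketch of the same steps. It assumes labels in $\{-1, +1\}$ and uses scikit-learn decision stumps as the weak learners; the function names and the number of rounds M are placeholders, not a definitive implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=10):
    # Sketch of the AdaBoost loop; assumes y is a numpy array with values in {-1, +1}.
    N = len(y)
    w = np.full(N, 1.0 / N)                      # start with equal weights
    stumps, alphas = [], []
    for m in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)         # fit weak classifier on weighted data
        pred = stump.predict(X)
        miss = (pred != y).astype(float)         # indicator I(k_m(x_i) != y_i)
        eps = np.sum(w * miss) / np.sum(w)       # weighted error
        eps = np.clip(eps, 1e-10, 1 - 1e-10)     # guard against log(0) / division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)    # classifier weight
        w = w * np.exp(-alpha * y * pred)        # up-weight the misclassified points
        w = w / np.sum(w)                        # renormalize
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # f(x) = sign(sum_m alpha_m k_m(x))
    scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(scores)
```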
The diagram below shows the decision boundaries and the weight updates at each round.
[Figure: decision boundaries and weight updates at each boosting round]
Let's break down the pseudocode for easy understanding:
Taking the total number of data points to be N, we can start by weighting all the points equally.
Thus $w_i = \frac{1}{N}$ for all $i = 1, \dots, N$.
Each round then fits a weak classifier on the weighted data, computes its weighted error $\epsilon_m$, and assigns it a vote $\alpha_m$. We can check the values of $\alpha_m$ for different errors: when $\epsilon_m$ is close to 0 the classifier gets a large positive weight, when $\epsilon_m = 0.5$ (no better than chance) its weight is 0, and when $\epsilon_m > 0.5$ its weight becomes negative.
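As a tiny illustrative check (a sketch that simply evaluates the $\alpha_m = \frac{1}{2}\ln\frac{1-\epsilon_m}{\epsilon_m}$ formula above; the error values are arbitrary examples):

```python
import numpy as np

# alpha_m for a few weighted errors eps_m: small error -> large positive vote,
# error of 0.5 (chance level) -> zero vote, error above 0.5 -> negative vote.
for eps in [0.1, 0.3, 0.5, 0.7]:
    alpha = 0.5 * np.log((1 - eps) / eps)
    print(f"eps = {eps:.1f}  ->  alpha = {alpha:+.3f}")
```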
The final classifier is a linear combination of the weak (base) classifiers:

$f(x) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m k_m(x)\right)$
The diagram below shows how the weak classifiers are linearly combined into one final classifier.
[Figure: weak classifiers combined into the final strong classifier]
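For comparison, here is a quick sketch of running the off-the-shelf scikit-learn implementation; the synthetic dataset, n_estimators value, and random seeds are just placeholders.

```python
# Sketch: AdaBoost via scikit-learn on synthetic data (dataset and parameters are placeholders).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=50, random_state=0)  # uses decision stumps by default
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```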
Decision stumps:
These can be considered a simple form of weak classifier. A decision stump can be represented as

$h(x; k, c, s) = \begin{cases} s & \text{if } x_k > c \\ -s & \text{otherwise} \end{cases}$

where the value of the function is either 1 or -1, i.e. if the value of the $k$-th feature $x_k$ is greater than the threshold $c$ the stump outputs $s$, otherwise it outputs $-s$.
The number of parameters of a decision stump is relatively small. These are:
- the threshold $c \in \mathbb{R}$
- the feature index $k \in \{1, \dots, K\}$, where $K$ is the dimensionality of $x$
- the sign $s \in \{-1, 1\}$
As the number of parameters is small, we can use brute force: discretize the thresholds from the lowest to the highest feature value in the training set, enumerate all possible stumps, and pick the one with the lowest training error (weighted by $w_i$ when used inside AdaBoost). A minimal sketch of this search is shown below.
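A minimal sketch of this brute-force search, assuming numpy arrays, labels in $\{-1, +1\}$, and a weight vector w as in the pseudocode; candidate thresholds are taken as midpoints between consecutive sorted feature values (an assumption, not the only possible discretization).

```python
import numpy as np

def fit_stump(X, y, w):
    """Brute-force search over (feature k, threshold c, sign s) for the
    decision stump with the lowest weighted training error."""
    N, K = X.shape
    best = (None, None, None, np.inf)    # (k, c, s, weighted error)
    for k in range(K):
        values = np.unique(X[:, k])      # sorted unique values of feature k
        # candidate thresholds: below the minimum, midpoints, above the maximum
        thresholds = np.concatenate(
            ([values[0] - 1], (values[:-1] + values[1:]) / 2, [values[-1] + 1]))
        for c in thresholds:
            for s in (-1, 1):
                pred = np.where(X[:, k] > c, s, -s)   # stump prediction
                err = np.sum(w * (pred != y))         # weighted 0-1 error
                if err < best[3]:
                    best = (k, c, s, err)
    return best

def stump_predict(X, k, c, s):
    return np.where(X[:, k] > c, s, -s)
```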
In the next blog post, we will analyze the working of AdaBoost in more detail, along with a Python implementation.