
AdaBoost - Boosting Algorithm

Gaurav Singh

2021/06/24 11:03

#AdaBoost #Ensemble #Decision Tree #Boosting Algorithm

Boosting algorithms are a general strategy for improving the accuracy of weak classifiers by combining them. We take a weak classifier that does "just" better than chance and build a much better classifier from it, thus boosting the performance of the learning classifiers.

One of the most famous of these algorithms is AdaBoost, also known as Adaptive Boosting. It is an ensemble learning method which increases the efficiency of binary classifiers. Here we deal with sequential learning, where we learn from the mistakes of the previous models. Ensemble methods are divided into:

    • Sequential learning (e.g. AdaBoost)
    • Parallel learning (e.g. Random Forest)

Some of the boosting algorithms are AdaBoost, gradient boosting and XGBoost. Here we will be looking into AdaBoost.

Sometimes we try to use a single classifier to group the points but it fails quite a few times. Thus we try to create a strong model by combining multiple weak classifiers which progressively learn from the wrongly classified points.

Let's start with a training set $\{(x_i, y_i)\}^N$ where $x_i \in R^k$ and $y_i \in \{-1, 1\}$. We can define a weak classifier as $g_m(x) \in \{-1, 1\}$.

Thus we can create a loss function $I(g_m(x), y)$ which will give 0 when $g_m(x) = y_i$ and 1 otherwise.
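As a minimal sketch (the function name `indicator_loss` is just for illustration), this loss can be written in Python as:

```python
import numpy as np

def indicator_loss(pred, y):
    # 0 where the prediction matches the label, 1 otherwise
    return (np.asarray(pred) != np.asarray(y)).astype(int)
```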

The pseudocode for the AdaBoost algorithm is as follows:

for i from 1 to N: $w_i^{(1)} = 1$

for m = 1 to M do

  fit a weak classifier $g_m(x)$ to the training data using the weights $w_i^{(m)}$

  $\epsilon_m = \frac{\sum_{i=1}^{N} w_i^{(m)} I(g_m(x_i) \neq y_i)}{\sum_{i} w_i^{(m)}}$  # the weighted error, i.e. the weighted fraction of misclassified points

  where $I(g_m(x_i) \neq y_i) = 1$ when $g_m(x_i) \neq y_i$ and 0 otherwise

  $\alpha_m = \ln\left(\frac{1 - \epsilon_m}{\epsilon_m}\right)$

  for all i do

    $w_i^{(m+1)} = w_i^{(m)} e^{\alpha_m I(g_m(x_i) \neq y_i)}$

  end

end
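A minimal Python sketch of this loop follows, using one-level decision trees from scikit-learn as the weak classifiers $g_m$; the function name `adaboost_fit`, the early-stopping guard, and the choice of scikit-learn stumps are illustrative assumptions, not part of the pseudocode above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """Sketch of the AdaBoost training loop; labels y are assumed to be in {-1, 1}."""
    N = X.shape[0]
    w = np.ones(N)                                   # w_i^(1) = 1 for all i
    stumps, alphas = [], []
    for m in range(M):
        # fit a weak classifier (decision stump) on the weighted samples
        g = DecisionTreeClassifier(max_depth=1)
        g.fit(X, y, sample_weight=w)
        miss = (g.predict(X) != y).astype(float)     # I(g_m(x_i) != y_i)
        eps = np.sum(w * miss) / np.sum(w)           # weighted error epsilon_m
        if eps <= 0 or eps >= 0.5:                   # practical guard, not in the pseudocode above
            break
        alpha = np.log((1 - eps) / eps)              # alpha_m = ln((1 - eps_m) / eps_m)
        w = w * np.exp(alpha * miss)                 # up-weight the misclassified points
        stumps.append(g)
        alphas.append(alpha)
    return stumps, alphas
```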

The below diagram shows the decision boundaries and how the weights are updated at each round.
[Image: decision boundaries and updated weights at each boosting round]

Let's break down the pseudocode for easy understanding:

Taking the total number of data points to be N, we can start by weighting all the points equally. Thus $w_{initial} = 1/N \in [0, 1]$, with the sample weights summing to 1. $\alpha_m$ is the influence a single stump will have in the final classification (we will learn about decision stumps later on). We can plot the value of $\alpha$ as follows:

[Image: plot of the value of $\alpha$]

The value of $\alpha$ ranges from infinity to negative infinity: it is 0 when half of the objects are classified correctly, largely positive when there are only a few misclassifications, and largely negative when the classified objects are almost all wrong.
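A short matplotlib sketch (assuming the same $\alpha_m = \ln\frac{1 - \epsilon_m}{\epsilon_m}$ formula as above) reproduces this behaviour:

```python
import numpy as np
import matplotlib.pyplot as plt

eps = np.linspace(0.01, 0.99, 200)           # weighted error in (0, 1)
alpha = np.log((1 - eps) / eps)              # alpha -> +inf as eps -> 0, 0 at eps = 0.5, -inf as eps -> 1

plt.plot(eps, alpha)
plt.axhline(0, color="gray", linewidth=0.5)
plt.xlabel("weighted error")
plt.ylabel("alpha")
plt.show()
```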

The final classifier is a linear combination of the weak classifiers (base classifiers):

$f(x) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m g_m(x)\right)$

The below diagram shows the linear combination of weak classifiers into one final classifier.
[Image: combining the weak classifiers into one final classifier]
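Continuing the training sketch above, the final prediction is just the sign of this weighted vote (the name `adaboost_predict` is again illustrative):

```python
import numpy as np

def adaboost_predict(X, stumps, alphas):
    # f(x) = sign( sum_m alpha_m * g_m(x) )
    scores = np.zeros(X.shape[0])
    for g, alpha in zip(stumps, alphas):
        scores += alpha * g.predict(X)
    return np.sign(scores)
```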

Decision stumps:

These can be considered a form of weak classifier. Its representation is:

$f(x) = s(x_k > c)$

where the value of the function is either 1 or -1: if $x_k > c$ it comes out to be $s$, and $-s$ otherwise. Here $x_k$ is the $k^{th}$ element of the vector x.

The number of parameters of a decision stump is relatively small. These are:

    • $c \in R$
    • $k \in \{1, \dots, K\}$, where K is the dimensionality of x.
    • $s \in \{-1, 1\}$

As the number of parameters is small, we can use brute force: discretize the candidate thresholds from the lowest to the highest value in the training set, enumerate all possible classifiers, and pick the one with the lowest training error.
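A rough sketch of this brute-force search over the stump parameters $(k, c, s)$ might look as follows; the function name `fit_stump` and the choice of midpoints between sorted feature values as candidate thresholds are assumptions for illustration:

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustively search (k, c, s) for the lowest weighted training error."""
    N, K = X.shape
    best = (None, None, None, np.inf)                 # (k, c, s, error)
    for k in range(K):                                # feature index
        values = np.unique(X[:, k])                   # sorted unique values of feature k
        thresholds = (values[:-1] + values[1:]) / 2   # candidate cut points c
        for c in thresholds:
            for s in (-1, 1):                         # sign s in {-1, 1}
                pred = np.where(X[:, k] > c, s, -s)   # f(x) = s if x_k > c else -s
                err = np.sum(w * (pred != y)) / np.sum(w)
                if err < best[3]:
                    best = (k, c, s, err)
    return best
```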

In the next blog post, we will analyze how AdaBoost works in more detail, together with a Python implementation.

