AdaBoost - Boosting Algorithm (Implementation)

Gaurav Singh

2021/07/07 11:11

#AdaBoost #Boosting Algorithm #Implementation #Python #Two-Classes

In the last article, we discussed why AdaBoost works, which gave us an intuitive idea of how it operates through exponential loss function optimization. Although that gives a reasonably good understanding, it may still not show the complete picture. We also used the scikit-learn library's AdaBoostClassifier class to evaluate the error on a given classification problem.
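For reference, a minimal scikit-learn baseline along those lines could look roughly like the sketch below. The dataset and hyperparameters here are only illustrative, not necessarily the ones used in the previous article.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets

# Illustrative baseline: scikit-learn's AdaBoostClassifier on the Hastie dataset
X, y = datasets.make_hastie_10_2()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X_train, y_train)
print(1 - clf.score(X_test, y_test))   # test error rate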

Here we will implement an AdaBoost classifier function in Python based on the pseudocode given in the first AdaBoost article of this series. We are dealing with a classification problem with two classes.

We will use a decision tree as the base estimator and fit a new classifier with updated sample weights in each boosting round.
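For reference, the standard two-class AdaBoost update rules that the implementation below follows are (using the same names as the code):

err_m = Σ_i w_i · 1[h_m(x_i) ≠ y_i] / Σ_i w_i
alpha_m = 0.5 · ln((1 − err_m) / err_m)
w_i ← w_i · exp(+alpha_m) if sample i is misclassified, w_i · exp(−alpha_m) otherwise
H(x) = sign(Σ_m alpha_m · h_m(x))

Here h_m is the classifier fitted in round m, w_i are the sample weights, and H is the final ensemble prediction.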


First, we import the necessary packages for our AdaBoost classifier.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets

We divide our dataset into training and test sets using the train_test_split function.

hastie = datasets.make_hastie_10_2()
x, y = hastie
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3)
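Note that make_hastie_10_2 already encodes the two classes as +1 and -1, which is exactly what the sign-based aggregation below relies on. A quick sanity check (illustrative, not part of the original walkthrough):

print(x.shape, y.shape)   # (12000, 10) (12000,)
print(np.unique(y))       # [-1.  1.]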

The function to calculate the error rate is given below:

def error_rate(pred, Y):
    error = 0
    for i in range(len(Y)):
        if pred[i] != Y[i]:
            error = error + 1
    return error / len(Y)
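The same quantity can also be computed in one line with NumPy; this vectorized version is equivalent and shown only for comparison:

def error_rate_np(pred, Y):
    # Fraction of positions where the prediction disagrees with the label
    return np.mean(np.asarray(pred) != np.asarray(Y))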

Next, we will write the AdaBoost implementation function, following the algorithm given in the pseudocode from the first article. The function returns the error rates on the training set and the test set.

def adaboost(X_train, Y_train, X_test, Y_test, M, clf):
    n_train = len(X_train)
    n_test = len(X_test)
    # Initializing the weights uniformly
    w = np.ones(n_train) / n_train
    pred_train, pred_test = [np.zeros(n_train), np.zeros(n_test)]
    for m in range(M):
        # Fitting the classifier with the current sample weights
        clf.fit(X_train, Y_train, sample_weight=w)
        pred_train_ite = clf.predict(X_train)
        pred_test_ite = clf.predict(X_test)
        # diff is 1 for misclassified samples and 0 otherwise;
        # diff2 is +1 for misclassified samples and -1 otherwise
        diff = []
        diff2 = []
        for i in range(len(Y_train)):
            if pred_train_ite[i] != Y_train[i]:
                diff.append(1)
            else:
                diff.append(0)
        for i in range(len(diff)):
            if diff[i] == 0:
                diff2.append(-1)
            else:
                diff2.append(1)
        # Weighted error of the current classifier
        summ = 0
        for i in range(len(w)):
            summ = summ + w[i] * diff[i]
        err_m = summ / sum(w)
        # Alpha value
        alpha_m = 0.5 * np.log((1 - err_m) / float(err_m))
        # Updating weights: increase for misclassified, decrease for correctly classified samples
        w = np.multiply(w, np.exp([float(x) * alpha_m for x in diff2]))
        # Adding this round's weighted predictions to the running sums for the training and test set
        pred_train = [sum(x) for x in zip(pred_train, [x * alpha_m for x in pred_train_ite])]
        pred_test = [sum(x) for x in zip(pred_test, [x * alpha_m for x in pred_test_ite])]
    # Signum function applied to the weighted sums of predictions
    pred_train, pred_test = np.sign(pred_train), np.sign(pred_test)
    # Returning the training and test error rates
    return error_rate(pred_train, Y_train), \
           error_rate(pred_test, Y_test)

We use a decision stump (a decision tree of depth 1) as the base classifier. Limiting the depth keeps it a weak learner; an unrestricted tree would drive the weighted training error to zero, and the alpha computation above would then divide by zero.

clf_tree = DecisionTreeClassifier(max_depth=1)

err_train = []
err_test = []
x_range = range(10, 500, 10)
for i in x_range:
    err_i = adaboost(X_train, Y_train, X_test, Y_test, i, clf_tree)
    err_train.append(err_i[0])
    err_test.append(err_i[1])

The training and test error rates for each number of boosting rounds are stored in the err_train and err_test lists.
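To see how the ensemble behaves as more rounds are added, the stored error rates can be plotted against the number of boosting rounds. Below is a small illustrative sketch using matplotlib, which is not imported in the code above:

import matplotlib.pyplot as plt

# Training and test error as a function of the number of boosting rounds M
plt.plot(x_range, err_train, label='Training error')
plt.plot(x_range, err_test, label='Test error')
plt.xlabel('Number of boosting rounds (M)')
plt.ylabel('Error rate')
plt.legend()
plt.show()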

Next, we will look into the multi-class AdaBoost algorithm by Trevor Hastie and Ji Zhu and try to implement it in the coming articles.
