Trying Out AutoGluon

Yuichiro Minato

2021/07/28 18:14

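I've been trying out various AutoML tools.

I was able to run this one on the paid version of blueqat cloud, which I needed because of the Python version.

There are plenty of options, but I'll try AutoGluon, which seems to have a good reputation. I ran it with Python 3.6.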

!python3 -m pip install -U pip
!python3 -m pip install -U setuptools wheel
!python3 -m pip install -U "mxnet<2.0.0"
!python3 -m pip install autogluon

Next, load the tool. The import didn't work for me at first, so I followed the article below as I went.

https://qiita.com/dyamaguc/items/dded739f35e59a6491c8

from autogluon import TabularPrediction

I subsampled the training data down to 5,000 rows.

train_data = TabularPrediction.Dataset(file_path='https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
subsample_size = 5000
train_data = train_data.sample(n=subsample_size, random_state=0)
train_data.head()
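Passing a fixed random_state is what makes the subsample above reproducible from run to run. The same idea can be sketched with only the standard library. This is an illustration, not what pandas does internally: the `subsample` helper and the toy row indices are hypothetical, and the rows it selects will differ from what `train_data.sample` returns.

```python
import random

def subsample(rows, n, seed=0):
    """Return up to n rows drawn without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return rng.sample(rows, min(n, len(rows)))

rows = list(range(100))  # hypothetical stand-in for the table's row indices
first = subsample(rows, 10, seed=0)
second = subsample(rows, 10, seed=0)
print(first == second)  # the same seed gives the same rows
```

Unlike a strict sampler, this helper caps n at the number of available rows instead of raising an error, which is a design choice for the sketch only.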

The target variable is class, which is binary.

label = 'class'
print("Summary of class variable: \n", train_data[label].describe())

Now let's time the run and train the predictor.

%%time
save_path = 'agModels-predictClass'  # folder where the trained models are stored
predictor = TabularPrediction.fit(train_data=train_data, label=label, output_directory=save_path)
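The %%time cell magic only works inside IPython or Jupyter. When running the same steps as a plain script, a rough equivalent is a small wall-clock timer. The sketch below assumes nothing about AutoGluon: `timed` and `train_placeholder` are hypothetical names, and the placeholder just sleeps where the fit call would go. It measures wall-clock time only, whereas %%time also reports CPU time.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Measure wall-clock time of the enclosed block and store it in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

def train_placeholder():
    # Hypothetical stand-in for the TabularPrediction.fit(...) call above.
    time.sleep(0.05)

timings = {}
with timed("fit", timings):
    train_placeholder()
print(f"fit took {timings['fit']:.3f} s")
```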

Let's look at the training results.

results = predictor.fit_summary()

The summary lists the models in order of score, best first. It was really easy to use.

*** Summary of fit() ***
Estimated performance of each model:
    model                       score_val  pred_time_val   fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0   weighted_ensemble_k0_l1         0.878       0.280509   5.315961                0.002607           0.637513            1       True         12
1   CatboostClassifier              0.872       0.021003   3.832444                0.021003           3.832444            0       True          9
2   LightGBMClassifierXT            0.870       0.024198   0.412143                0.024198           0.412143            0       True          8
3   LightGBMClassifier              0.862       0.023735   0.412788                0.023735           0.412788            0       True          7
4   NeuralNetClassifier             0.862       0.184480  21.246298                0.184480          21.246298            0       True         10
5   LightGBMClassifierCustom        0.856       0.024570   0.710194                0.024570           0.710194            0       True         11
6   ExtraTreesClassifierGini        0.852       0.111771   0.884064                0.111771           0.884064            0       True          3
7   RandomForestClassifierGini      0.850       0.111832   1.104524                0.111832           1.104524            0       True          1
8   RandomForestClassifierEntr      0.846       0.111633   1.388850                0.111633           1.388850            0       True          2
9   ExtraTreesClassifierEntr        0.836       0.111355   0.983722                0.111355           0.983722            0       True          4
10  KNeighborsClassifierUnif        0.750       0.104517   0.011417                0.104517           0.011417            0       True          5
11  KNeighborsClassifierDist        0.732       0.104449   0.009657                0.104449           0.009657            0       True          6
Number of models trained: 12
Types of models trained:
{'WeightedEnsembleModel', 'TabularNeuralNetModel', 'CatboostModel', 'RFModel', 'KNNModel', 'XTModel', 'LGBModel'}
Bagging used: False
Stack-ensembling used: False
Hyperparameter-tuning used: False
User-specified hyperparameters:
{'default': {'NN': [{}], 'GBM': [{}, {'extra_trees': True, 'AG_args': {'name_suffix': 'XT'}}], 'CAT': [{}], 'RF': [{'criterion': 'gini', 'AG_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'AG_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}], 'XT': [{'criterion': 'gini', 'AG_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'AG_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}], 'KNN': [{'weights': 'uniform', 'AG_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'AG_args': {'name_suffix': 'Dist'}}], 'custom': [{'num_boost_round': 10000, 'num_threads': -1, 'objective': 'binary', 'verbose': -1, 'boosting_type': 'gbdt', 'learning_rate': 0.03, 'num_leaves': 128, 'feature_fraction': 0.9, 'min_data_in_leaf': 5, 'two_round': True, 'seed_value': 0, 'AG_args': {'model_type': 'GBM', 'name_suffix': 'Custom', 'disable_in_hpo': True}}]}}
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
Plot summary of models saved to file: agModels-predictClass/SummaryOfModels.html
*** End of fit() summary ***
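The leaderboard can also be read as a trade-off between score and training time. As a minimal sketch, the rows below are copied by hand from the summary above into plain Python tuples (model, score_val, fit_time); this does not use AutoGluon's own result object, and the helper names and the 0.01 tolerance are arbitrary choices for illustration.

```python
# Rows copied from the fit() summary above: (model, score_val, fit_time in seconds).
leaderboard = [
    ("weighted_ensemble_k0_l1", 0.878, 5.315961),
    ("CatboostClassifier", 0.872, 3.832444),
    ("LightGBMClassifierXT", 0.870, 0.412143),
    ("LightGBMClassifier", 0.862, 0.412788),
    ("NeuralNetClassifier", 0.862, 21.246298),
    ("LightGBMClassifierCustom", 0.856, 0.710194),
    ("ExtraTreesClassifierGini", 0.852, 0.884064),
    ("RandomForestClassifierGini", 0.850, 1.104524),
    ("RandomForestClassifierEntr", 0.846, 1.388850),
    ("ExtraTreesClassifierEntr", 0.836, 0.983722),
    ("KNeighborsClassifierUnif", 0.750, 0.011417),
    ("KNeighborsClassifierDist", 0.732, 0.009657),
]

def best_model(rows):
    """Return the row with the highest validation score."""
    return max(rows, key=lambda r: r[1])

def within_tolerance(rows, tolerance):
    """Return rows scoring within `tolerance` of the best score, fastest to train first."""
    top = best_model(rows)[1]
    close = [r for r in rows if top - r[1] <= tolerance]
    return sorted(close, key=lambda r: r[2])

best = best_model(leaderboard)
fast = within_tolerance(leaderboard, tolerance=0.01)
print(best[0])
print([name for name, _, _ in fast])
```

In this run the weighted ensemble scores highest, while LightGBMClassifierXT comes within 0.01 of it at a fraction of the training time.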

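I tried Auto Sklearn in the previous article and AutoGluon in this one. Being able to compare so many models with so little effort is very convenient. It also made me realize how fierce the competition in the data industry is. That's all for now, but I'd like to learn more data techniques and try implementing them.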