This is the second internship course. Last time we took a light look at machine learning and neural network models; this time we will review optimization.
Optimization can be divided into two main types: "discrete optimization," in which the values take discrete steps, and "continuous optimization," in which the values are continuous.
Discrete optimization deals with values that change in discrete steps and therefore cannot be differentiated. It is used for "combinatorial optimization problems," in which the best choice is selected from a large number of alternatives.
In continuous optimization (mathematical optimization) the values are continuous, so the functions can be differentiated. Although there are also methods that do not use differentiation, continuous optimization is mainly used to optimize models in machine learning and quantum computation.
On top of these, we will also take a rough look at Bayesian optimization, which is used for hyperparameter search in machine learning, for a total of three types of optimization.
Discrete Optimization
Well-known methods here include Genetic Algorithms (GA), Tabu Search, and Simulated Annealing. Calculations are often performed with meta-heuristic methods, and on the quantum side, algorithms such as quantum annealing and QAOA are used. A cost function is set and the computation proceeds in the direction of minimizing it; the quantity being minimized can be, for example, the distance a car travels or the cost of a schedule. In this lesson we will learn GA.
Genetic Algorithm
Description from Wikipedia:
A genetic algorithm is a meta-heuristic algorithm that searches for approximate solutions, proposed by John H. Holland of the University of Michigan in 1975. Like artificial life, it deliberately incorporates elements of chance into the computation; it is one of the four main types of evolutionary algorithms and the most commonly used of them.
The "Genetic Algorithm" Flow
Genetic algorithms are generally implemented with the following flow, where the number of individuals is N and the maximum number of generations is G. A minimal code sketch of this flow follows the list.
1. Prepare two sets, each able to hold N individuals. These two sets are called the "current generation" and the "next generation."
2. Generate N individuals randomly in the current generation. Calculate the fitness of each individual in the current generation using an evaluation function, and, with certain probabilities, perform one of the following three actions and store the result in the next generation.
A. Select two individuals and mate them.
B. Select one individual and mutate it.
C. Select one individual and copy it as is.
3. Repeat the above until there are N individuals in the next generation. When the number of individuals in the next generation reaches N, move all of its contents to the current generation.
4. Repeat steps 2-3 up to the maximum number of generations G, and finally output the individual with the highest fitness in the current generation as the solution.
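Below is a minimal sketch of this flow in plain Python, assuming a toy evaluation function f(x) = x² to minimize; the crossover, mutation, and selection rules are simple illustrative choices, not what a real GA library would use.
import random

def fitness(x):
    return -x**2  # higher fitness is better, so negate the function we want to minimize

def select(pop):
    # tournament selection: keep the fitter of two randomly chosen individuals
    a, b = random.sample(pop, 2)
    return a if fitness(a) > fitness(b) else b

N, G = 10, 50  # number of individuals and maximum number of generations
current = [random.uniform(-5, 5) for _ in range(N)]  # step 2: random current generation

for _ in range(G):
    nxt = []
    while len(nxt) < N:
        r = random.random()
        if r < 0.7:
            # A. mate two individuals (blend crossover)
            a, b = select(current), select(current)
            w = random.random()
            nxt.append(w * a + (1 - w) * b)
        elif r < 0.9:
            # B. mutate one individual
            nxt.append(select(current) + random.gauss(0, 0.5))
        else:
            # C. copy one individual as is
            nxt.append(select(current))
    current = nxt  # step 3: transfer the next generation to the current generation

# step 4: output the best individual (should be close to 0)
print(max(current, key=fitness))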
Let's try implementing it right away using vcopt, a Python package for genetic algorithms written by one of our interns.
!pip install vcopt
Requirement already satisfied: vcopt in /opt/conda/lib/python3.10/site-packages (1.6.3)
First, load the tool.
from vcopt import vcopt
Next, set the evaluation function to be optimized. The algorithm basically proceeds in the direction of minimizing this value. Here we use a simple quadratic function.
#Evaluation function
def f(x):
    return x**2
Next, set the range of the parameter to search.
#param
x_range = [-5, 5]
print(x_range)
[-5, 5]
Then execute it right away. The arguments are, in order: the parameter search range, the evaluation function, and the target value. The target is set to -9999 this time because we want the value to be as small as possible; aiming at an unreachably low target effectively means minimizing.
#execute optimization
para, score = vcopt().rcGA(x_range, f, -9999)
________________________________________ info ________________________________________
para_range : n=1
score_func : <class 'function'>
aim : ==-9999.0
show_pool_func : 'bar'
seed : None
pool_num : 10
max_gen : None
core_num : 1 (*vcopt, vc-grendel)
_______________________________________ start ________________________________________
Scoring first gen 10/10
| +< | gen=60, best_score=0.0001
_______________________________________ result _______________________________________
para = np.array([-0.010855754936406115])
score = 0.00011784741523930572
________________________________________ end _________________________________________
#result
print(para)
print(score)
[-0.01085575]
0.00011784741523930572
The true minimum is y = 0 at x = 0, and the search returned a value close to it. In this way, you set the evaluation function and the range of values, and the genetic algorithm procedure runs in Python and outputs a solution.
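As an aside, the same call appears to extend to multiple parameters. The following is only a sketch under the assumption, suggested by the "para_range : n=1" line in the info output above, that rcGA accepts one [min, max] pair per parameter; check the vcopt documentation before relying on it.
#Evaluation function of two parameters (assumed interface: para arrives as an array)
def f2(para):
    return para[0]**2 + para[1]**2

#One [min, max] range per parameter (assumption)
para_range = [[-5, 5], [-5, 5]]

para, score = vcopt().rcGA(para_range, f2, -9999)
print(para, score)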
Continuous Optimization
This approach optimizes models for machine learning and similar tasks. The best-known family is gradient methods, which compute the slope of a function and move toward lower values. Gradient-free methods are also used at times because they are easier to apply. Continuous optimization also appears in quantum computation, in so-called quantum-classical hybrid methods. In this article we will look at scipy.optimize.minimize, which is available in the scipy library, and at PyTorch's optim module, which contains several optimization algorithms including SGD; the latter is gradient-based.
scipy.optimize.minimize
This is the optimization interface in the scipy library. Let's take a look at how to use it.
Some optimization algorithms require first- or second-order derivatives, while others do not. Here we will simply use methods that do not require them, such as Powell, Nelder-Mead, and COBYLA. The function is the same as in the GA example above, and we pass it an initial value of x.
import scipy.optimize
import numpy as np
x0 = np.array([1.0])
scipy.optimize.minimize(f, x0, method='Powell')
direc: array([[1.]])
fun: array(0.)
message: 'Optimization terminated successfully.'
nfev: 19
nit: 2
status: 0
success: True
x: array([0.])
x0 = np.array([1.0])
scipy.optimize.minimize(f, x0, method='Nelder-Mead')
final_simplex: (array([[-8.8817842e-16],
[ 9.7656250e-05]]), array([7.88860905e-31, 9.53674316e-09]))
fun: 7.888609052210118e-31
message: 'Optimization terminated successfully.'
nfev: 34
nit: 17
status: 0
success: True
x: array([-8.8817842e-16])
x0 = np.array([1.0])
scipy.optimize.minimize(f, x0, method='COBYLA')
fun: 1e-08
maxcv: 0.0
message: 'Optimization terminated successfully.'
nfev: 18
status: 1
success: True
x: array([0.0001])
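For comparison, scipy.optimize.minimize also offers gradient-based methods such as BFGS. Here is a quick sketch that passes the analytic derivative of f(x) = x², namely 2x, through the jac argument; if jac is omitted, scipy estimates the gradient numerically.
x0 = np.array([1.0])
#BFGS uses gradient information; we supply the derivative of x**2 explicitly
res = scipy.optimize.minimize(f, x0, method='BFGS', jac=lambda x: 2 * x)
print(res.x, res.fun)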
Whichever of these you use here, you get a similar solution. Next, let's look at PyTorch's optim.
PyTorch Optimizer
Optimization algorithms for machine learning are mainly gradient-based: they compute the derivative at the current position and move downhill toward lower values. PyTorch implements SGD (stochastic gradient descent) and a number of algorithms derived from it.
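The update rule itself fits in a few lines. Here is a plain-Python sketch of gradient descent on f(x) = x², where each step moves x by minus the learning rate times the derivative 2x:
x = 10.0
lr = 0.1
for _ in range(100):
    grad = 2 * x       # derivative of x**2 at the current point
    x = x - lr * grad  # step downhill
print(x)  # approaches 0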
First load the tool.
!pip --no-cache-dir install torch
Requirement already satisfied: torch in /opt/conda/lib/python3.10/site-packages (1.12.1)
import torch
Prepare the initial values and the optimization algorithm.
The initial values must be given as a torch.tensor with requires_grad=True.
There are various optimization algorithms based on SGD; lr is the learning rate, which in this case is set to 0.1.
x = torch.tensor([10.0],requires_grad=True)
op = torch.optim.SGD([x], lr=0.1)
Execution runs in a loop: at each step we evaluate the function at the current x, clear the gradients, backpropagate, and let the optimizer update x.
for _ in range(100):
    y = x**2          # evaluate the function at the current x
    op.zero_grad()    # clear previously accumulated gradients
    y.backward()      # compute dy/dx
    op.step()         # update x with SGD
print(x)
tensor([2.0370e-09], requires_grad=True)
The calculation converged to a value very close to zero. Note that the implementation procedure is a little different from the other optimization libraries.
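As a quick side note, other optimizers in torch.optim can be dropped into the same loop. A sketch with Adam instead of SGD (the learning rate and step count here are arbitrary choices):
x = torch.tensor([10.0], requires_grad=True)
op = torch.optim.Adam([x], lr=0.1)  # swap SGD for Adam; the loop is unchanged
for _ in range(1000):
    y = x**2
    op.zero_grad()
    y.backward()
    op.step()
print(x)  # also approaches 0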
Bayesian optimization
Bayesian optimization searches efficiently by using the results of previous evaluations to decide which candidates to try next, prioritizing promising regions. It is often used in machine learning to search over hyperparameters of mixed kinds, such as categorical choices, discrete values, and continuous values. In this article we will look at a library called optuna.
!pip install optuna sklearn
Requirement already satisfied: optuna in /opt/conda/lib/python3.10/site-packages (3.0.0)
Successfully installed scikit-learn-1.1.2 sklearn-0.0 threadpoolctl-3.1.0
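Before the rent example, here is a minimal sketch of the optuna workflow on the simple quadratic used earlier; the parameter name 'x' and its range are arbitrary choices for illustration.
import optuna

def simple_objective(trial):
    # suggest a candidate value for x in [-5, 5] and return the value to minimize
    x = trial.suggest_float('x', -5, 5)
    return x**2

study = optuna.create_study()  # the default direction is 'minimize'
study.optimize(simple_objective, n_trials=30)
print(study.best_params, study.best_value)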
This time, we will optimize hyperparameters for the rent-prediction problem from the previous course, with the size of the NN middle layer, the optimization algorithm, and the learning rate as the variables. The quantity to be minimized is the squared difference between the model's prediction and the correct rent value of 11.5.
import torch
import random
import optuna
import numpy as np
import pandas as pd
#data
data = np.array([[8,3,83.74,18,22],[5,2,75.72,19,19],[7,2,31.47,46,7.8],[18,2,46.62,36,8],[8,3,41.02,49,8],[8,1,70.43,25,15.8],[8,1,70.43,25,15.8],[12,1,48.02,3,12.5],[10,4,58.57,36,11.8]])
df = pd.DataFrame(data,columns=['walk','floor','area','age','rent'])
#rent
y = df["rent"]
#others
X = df.drop(columns=["rent"], axis=1)
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=1)
#fix the random seed
seed = 1
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
def objective(trial):
    #middle layer x
    x = trial.suggest_int('x', 4, 20)
    model2 = torch.nn.Sequential()
    model2.add_module('fc1', torch.nn.Linear(4, x))
    model2.add_module('relu', torch.nn.ReLU())
    model2.add_module('fc2', torch.nn.Linear(x, 1))
    optimizer_name = trial.suggest_categorical('optimizer_name', ['SGD', 'Adam'])
    learning_rate = trial.suggest_float('learning_rate', 0.01, 1.)
    if optimizer_name == "SGD":
        optimizer = torch.optim.SGD(model2.parameters(), lr = learning_rate)
    else:
        optimizer = torch.optim.Adam(model2.parameters(), lr = learning_rate)
    #loss function
    lossf = torch.nn.MSELoss()
    #input and target
    input = torch.Tensor(X_train.values)
    target = torch.Tensor([[i] for i in y_train.values])
    #train
    for _ in range(100):
        loss = lossf(model2(input), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    #evaluate on a held-out example and return the squared error against the correct rent 11.5
    X_test = np.array([[9,5,58.3,34]])
    pred = model2(torch.Tensor(X_test))
    return (pred - 11.5)**2
study = optuna.create_study()
study.optimize(objective, n_trials=20)
[I 2022-08-31 01:21:34,912] A new study created in memory with name: no-name-4803a178-32c9-490f-a115-7a5451ca4d3b
[I 2022-08-31 01:21:35,062] Trial 0 finished with value: 8.660400390625 and parameters: {'x': 20, 'optimizer_name': 'SGD', 'learning_rate': 0.1439265941761127}. Best is trial 0 with value: 8.660400390625.
[I 2022-08-31 01:21:35,230] Trial 1 finished with value: 3.636240005493164 and parameters: {'x': 7, 'optimizer_name': 'Adam', 'learning_rate': 0.019234503707883495}. Best is trial 1 with value: 3.636240005493164.
[I 2022-08-31 01:21:35,321] Trial 2 finished with value: 8.660406112670898 and parameters: {'x': 15, 'optimizer_name': 'SGD', 'learning_rate': 0.5354997444906043}. Best is trial 1 with value: 3.636240005493164.
[I 2022-08-31 01:21:35,461] Trial 3 finished with value: 5.593413352966309 and parameters: {'x': 10, 'optimizer_name': 'Adam', 'learning_rate': 0.3933733505221534}. Best is trial 1 with value: 3.636240005493164.
[I 2022-08-31 01:21:35,548] Trial 4 finished with value: 8.660406112670898 and parameters: {'x': 15, 'optimizer_name': 'SGD', 'learning_rate': 0.8059628003874073}. Best is trial 1 with value: 3.636240005493164.
[I 2022-08-31 01:21:35,706] Trial 5 finished with value: 0.014808638952672482 and parameters: {'x': 12, 'optimizer_name': 'Adam', 'learning_rate': 0.021233917148336535}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:35,800] Trial 6 finished with value: 9.847352027893066 and parameters: {'x': 12, 'optimizer_name': 'Adam', 'learning_rate': 0.49292223138525765}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:35,859] Trial 7 finished with value: 8.660406112670898 and parameters: {'x': 17, 'optimizer_name': 'SGD', 'learning_rate': 0.18078972340025243}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:35,922] Trial 8 finished with value: 8.660344123840332 and parameters: {'x': 19, 'optimizer_name': 'SGD', 'learning_rate': 0.8687629045697284}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:36,020] Trial 9 finished with value: 6.170830249786377 and parameters: {'x': 15, 'optimizer_name': 'Adam', 'learning_rate': 0.06719771416280539}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:36,137] Trial 10 finished with value: 7.751178741455078 and parameters: {'x': 4, 'optimizer_name': 'Adam', 'learning_rate': 0.3191524969920427}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:36,279] Trial 11 finished with value: 4.513744354248047 and parameters: {'x': 7, 'optimizer_name': 'Adam', 'learning_rate': 0.02837373827391308}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:36,403] Trial 12 finished with value: 3.097856044769287 and parameters: {'x': 8, 'optimizer_name': 'Adam', 'learning_rate': 0.2622355814702628}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:36,510] Trial 13 finished with value: 4.379225730895996 and parameters: {'x': 10, 'optimizer_name': 'Adam', 'learning_rate': 0.26472855178681554}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:36,616] Trial 14 finished with value: 1.1451605558395386 and parameters: {'x': 8, 'optimizer_name': 'Adam', 'learning_rate': 0.6468756571686085}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:36,777] Trial 15 finished with value: 7.920164108276367 and parameters: {'x': 5, 'optimizer_name': 'Adam', 'learning_rate': 0.7074803077867501}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:36,890] Trial 16 finished with value: 3.3594002723693848 and parameters: {'x': 12, 'optimizer_name': 'Adam', 'learning_rate': 0.9956476163968441}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:37,067] Trial 17 finished with value: 2.313199996948242 and parameters: {'x': 10, 'optimizer_name': 'Adam', 'learning_rate': 0.6474106907810393}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:37,255] Trial 18 finished with value: 1.465577483177185 and parameters: {'x': 13, 'optimizer_name': 'Adam', 'learning_rate': 0.5314045913335149}. Best is trial 5 with value: 0.014808638952672482.
[I 2022-08-31 01:21:37,365] Trial 19 finished with value: 10.00622272491455 and parameters: {'x': 9, 'optimizer_name': 'Adam', 'learning_rate': 0.6614527173591361}. Best is trial 5 with value: 0.014808638952672482.
In this way, the hyperparameters were explored; the best result here came from trial 5, with a middle layer of size 12, the Adam optimizer, and a learning rate of about 0.021.
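The best trial can then be read back from the study object, for example:
#Best hyperparameters and the corresponding objective value found in the search
print(study.best_params)
print(study.best_value)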