 # Internship Lecture 2: Discrete optimization (GA) / Continuous optimization (scipy optimize / torch optimize) / Bayesian optimization (optuna)

#### Yuichiro Minato a month ago

This is the second internship course. Last time we looked lightly at machine learning and neural network models. This time we will review optimization.

Optimization can be divided into two main types. One is "discrete optimization," in which values are not continuous, and the other is "continuous optimization," in which values are continuous.

Discrete optimization" takes discrete values at intervals and cannot be differentiated. Discrete Optimization is used for "combinatorial optimization problems," in which the best choice is selected from a large number of alternatives.

Continuous optimization (mathematical optimization) can be differentiated because the values are continuous. Although there are other ways to solve the problem without differentiation, it is mainly used to optimize models for machine learning and quantum computation.

In addition to that, you will learn roughly three types of optimization, including Bayesian optimization, which is used in machine learning parameter search.

Discrete Optimization
Genetic Algorithms (GA), Tabu Search, Simulated Annealing, etc. are well-known here. Often, calculations are performed using meta-heuristics methods, and algorithms such as quantum annealing and QAOA are used to deal with quantum. A cost function is set and the computation is done in the direction of minimizing it, for example. Minimization can be, for example, the shortest distance a car can travel or the most efficient schedule. In this lesson, we will learn GA.

Genetic Algorithm."
Description from wikipedia.

A genetic algorithm is a meta-heuristic algorithm that searches for approximate solutions, proposed by John H. Holland of the University of Michigan in 1975. Like artificial life, it uses chance factors to influence computer control; it is one of the four main evolutionary algorithms and the most commonly used of them.

The "Genetic Algorithm" Flow

Genetic algorithms are generally implemented in the following flow. In the following, the number of individuals is N and the maximum number of generations is G.

1.Prepare two sets that contain N individuals in advance. These two sets are called "current generation" and "next generation.
2.Generate N individuals randomly in the current generation. Calculate the degree of adaptation of each individual in the current generation using an evaluation function, and with a certain probability, perform one of the following three actions and store the results in the next generation.

A. Select two individuals to mate.
B. Select one individual to mutate.
C. Select one individual and copy it as is.

Repeat the above until there are N individuals in the next generation. When the number of individuals in the next generation reaches N, transfer all the contents of the next generation to the current generation.

1. Repeat the above steps up to the maximum number of generations G, and finally output the individual with the highest degree of adaptation in the current generation as the solution.

We will immediately try to implement it using vcopt, a python package for genetic algorithms our intern made.

`.css-wy3l36{position:absolute;top:0px;right:0px;}.css-u40cf7{display:-webkit-inline-box;display:-webkit-inline-flex;display:-ms-inline-flexbox;display:inline-flex;-webkit-appearance:none;-moz-appearance:none;-ms-appearance:none;appearance:none;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;-webkit-box-pack:center;-ms-flex-pack:center;-webkit-justify-content:center;justify-content:center;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;user-select:none;position:relative;white-space:nowrap;vertical-align:middle;outline:2px solid transparent;outline-offset:2px;line-height:1.2;border-radius:var(--chakra-radii-md);font-weight:var(--chakra-fontWeights-semibold);transition-property:var(--chakra-transition-property-common);transition-duration:var(--chakra-transition-duration-normal);height:var(--chakra-sizes-8);min-width:var(--chakra-sizes-8);font-size:var(--chakra-fontSizes-sm);-webkit-padding-start:var(--chakra-space-3);padding-inline-start:var(--chakra-space-3);-webkit-padding-end:var(--chakra-space-3);padding-inline-end:var(--chakra-space-3);background:var(--chakra-colors-teal-500);color:var(--chakra-colors-white);}.css-u40cf7:focus-visible,.css-u40cf7[data-focus-visible]{box-shadow:var(--chakra-shadows-outline);}.css-u40cf7[disabled],.css-u40cf7[aria-disabled=true],.css-u40cf7[data-disabled]{opacity:0.4;cursor:not-allowed;box-shadow:var(--chakra-shadows-none);}.css-u40cf7:hover,.css-u40cf7[data-hover]{background:var(--chakra-colors-teal-600);}.css-u40cf7:hover[disabled],.css-u40cf7[data-hover][disabled],.css-u40cf7:hover[aria-disabled=true],.css-u40cf7[data-hover][aria-disabled=true],.css-u40cf7:hover[data-disabled],.css-u40cf7[data-hover][data-disabled]{background:var(--chakra-colors-teal-500);}.css-u40cf7:active,.css-u40cf7[data-active]{background:var(--chakra-colors-teal-700);}.css-rwh0nv{display:inline-block;font-family:var(--chakra-fonts-mono);font-size:var(--chakra-fontSizes-sm);-webkit-padding-start:0.2em;padding-inline-start:0.2em;-webkit-padding-end:0.2em;padding-inline-end:0.2em;border-radius:var(--chakra-radii-sm);background:grey.100;color:grey.800;}`!pip install vcopt``
```.css-4m8w8z{display:inline-block;font-family:var(--chakra-fonts-mono);font-size:var(--chakra-fontSizes-sm);-webkit-padding-start:0.2em;padding-inline-start:0.2em;-webkit-padding-end:0.2em;padding-inline-end:0.2em;border-radius:var(--chakra-radii-sm);background:var(--chakra-colors-gray-100);color:var(--chakra-colors-gray-800);}```Requirement already satisfied: vcopt in /opt/conda/lib/python3.10/site-packages (1.6.3)

Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (from vcopt) (1.21.0)

Requirement already satisfied: joblib in /opt/conda/lib/python3.10/site-packages (from vcopt) (1.1.0)

Requirement already satisfied: matplotlib in /opt/conda/lib/python3.10/site-packages (from vcopt) (3.5.2)

Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.10/site-packages (from matplotlib->vcopt) (9.2.0)

Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.10/site-packages (from matplotlib->vcopt) (1.4.4)

Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.10/site-packages (from matplotlib->vcopt) (2.8.2)

Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.10/site-packages (from matplotlib->vcopt) (21.3)

Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.10/site-packages (from matplotlib->vcopt) (0.11.0)

Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.10/site-packages (from matplotlib->vcopt) (4.34.4)

First, load the tool.

``from vcopt import vcopt``

Next, set the evaluation function for optimization. We are basically heading in the direction of minimizing this value. In this case we looked at a simple quadratic function.

``````#Evaluation function
def f(x):
return x**2``````

Next, set the range of parameters to search for。

``````#param
x_range = [-5, 5]
print(x_range)``````
``````[-5, 5]
``````

Then execute it immediately. The arguments are set in the following order: parameter search range, evaluation function, and target. The target is set to -9999 this time because we want to make it small.

``````#execute optimization
para, score = vcopt().rcGA(x_range, f, -9999)``````
``````________________________________________ info ________________________________________

para_range     : n=1

score_func     : <class 'function'>

aim            : ==-9999.0

show_pool_func : 'bar'

seed           : None

pool_num       : 10

max_gen        : None

core_num       : 1 (*vcopt, vc-grendel)

_______________________________________ start ________________________________________
``````
``````#result
print(para)
print(score)``````
``````[-0.01085575]

0.00011784741523930572
``````

When x=0, the minimization proceeds to y=0 and some value is obtained. In this way, the evaluation function and range of values are set, and the process proceeds in python following the genetic algorithm procedure to output a solution.

Continuous Optimization
This method optimizes models for machine learning, etc. The gradient method, which calculates the slope of a function so that it advances toward a lower value, is well-known. Other methods that do not use gradients are sometimes used because they are easier to use. It is also used in quantum computation in a method called quantum-classical hybrid. In this article, we will look at scipy.optimize.minimize, which is available in a library called scipy. We will also look at the optim library, which contains several optimization algorithms, including SGD, which is included in pytorch. The latter is a gradient-based algorithm.

scipy.optimize.minimize
This is an optimization algorithm in a library called scipy. Let's take a look at how to use it.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html

Some optimization algorithms require first-order second-order derivatives, while others do not. In this case, we will simply use Powell/Nelder-Mead/COBYLA, etc. as those that do not require them. The functions are the same as in the previous GA. Here, we will throw it with the initial x value.

``````import scipy.optimize
import numpy as np

x0 = np.array([1.0])
scipy.optimize.minimize(f, x0, method='Powell')``````
``````   direc: array([[1.]])

fun: array(0.)

message: 'Optimization terminated successfully.'

nfev: 19

nit: 2

status: 0

success: True

x: array([0.])``````
``````x0 = np.array([1.0])
`````` final_simplex: (array([[-8.8817842e-16],

[ 9.7656250e-05]]), array([7.88860905e-31, 9.53674316e-09]))

fun: 7.888609052210118e-31

message: 'Optimization terminated successfully.'

nfev: 34

nit: 17

status: 0

success: True

x: array([-8.8817842e-16])``````
``````x0 = np.array([1.0])
scipy.optimize.minimize(f, x0, method='COBYLA')``````
``````     fun: 1e-08

maxcv: 0.0

message: 'Optimization terminated successfully.'

nfev: 18

status: 1

success: True

x: array([0.0001])``````

It doesn't make much difference which one you use, but it will give you a similar solution. Next, let's look at PyTorch's optim.

PyTorch Optimizer
Optimization algorithms for machine learning are mainly gradient-based. It calculates the derivative from the initial position and descends to a lower one, and implements an algorithm based on the stochastic gradient method, called SGD.

First load the tool.

``!pip --no-cache-dir install torch``
``````Requirement already satisfied: torch in /opt/conda/lib/python3.10/site-packages (1.12.1)

Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch) (4.3.0)
``````
``import torch``
``````/opt/conda/lib/python3.10/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html

from .autonotebook import tqdm as notebook_tqdm
``````

Prepare the initial values and the optimization algorithm.
The initial values must be in the form torch.sensor.

There are various optimization algorithms based on SGD. lr is the learning rate, which in this case is set to 0.1.

``````x = torch.tensor([10.0],requires_grad=True)
op = torch.optim.SGD([x], lr=0.1)``````

In execution, the loop is passed around. The actual value of x is calculated by substituting the value of x as a function.

``````for _ in range(100):
y = x**2

y.backward()
op.step()

print(x)``````
``````tensor([2.0370e-09], requires_grad=True)
``````

The calculation was completed in this way. The implementation procedure was a little different from other optimization algorithms.

Bayesian optimization
When optimizing something, Bayesian optimization is a prioritized search. Bayesian optimization is often used in machine learning to search for a set of parameters that require various settings, such as choices, discrete values, and continuous values. In this article, we will look at a library called optuna.

``!pip install optuna sklearn``
``````Requirement already satisfied: optuna in /opt/conda/lib/python3.10/site-packages (3.0.0)

Collecting sklearn

Using cached sklearn-0.0-py2.py3-none-any.whl

Requirement already satisfied: typing-extensions>=3.10.0.0 in /opt/conda/lib/python3.10/site-packages (from optuna) (4.3.0)

Requirement already satisfied: cliff in /opt/conda/lib/python3.10/site-packages (from optuna) (4.0.0)

Requirement already satisfied: tqdm in /opt/conda/lib/python3.10/site-packages (from optuna) (4.64.0)

Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.10/site-packages (from optuna) (21.3)

Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (from optuna) (1.21.0)

Requirement already satisfied: alembic in /opt/conda/lib/python3.10/site-packages (from optuna) (1.8.1)
``````

This time, we will optimize the parameters of the previous problem of guessing the rent, with the NN intermediate layer, the optimization algorithm, and the learning rate as variables, respectively. The parameter to be minimized is the one that is closest to the correct value of 11.5 for rent.

``````import torch
import random
import optuna
import numpy as np
import pandas as pd

#data
data = np.array([[8,3,83.74,18,22],[5,2,75.72,19,19],[7,2,31.47,46,7.8],[18,2,46.62,36,8],[8,3,41.02,49,8],[8,1,70.43,25,15.8],[8,1,70.43,25,15.8],[12,1,48.02,3,12.5],[10,4,58.57,36,11.8]])
df = pd.DataFrame(data,columns=['walk','floor','area','age','rent'])

#rent
y = df["rent"]

#others
X = df.drop(columns=["rent"], axis=1)

from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=1)

#fix the random seed
seed = 1
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)

def objective(trial):
#middle layer x
x = trial.suggest_int('x', 4, 20)

model2 = torch.nn.Sequential()

optimizer_name = trial.suggest_categorical('optimizer_name', ['SGD', 'Adam'])
learning_rate = trial.suggest_float('learning_rate', 0.01, 1.)
if optimizer_name == "SGD":
optimizer = torch.optim.SGD(model2.parameters(), lr = learning_rate)
else:
optimizer = torch.optim.Adam(model2.parameters(), lr = learning_rate)

#loss function
lossf = torch.nn.MSELoss()

#input and target
input = torch.Tensor(X_train.values)
target = torch.Tensor([[i] for i in y_train.values])

#train
for _ in range(100):
loss = lossf(model2(input), target)
loss.backward()
optimizer.step()

X_test = np.array([[9,5,58.3,34]])
pred = model2(torch.Tensor(X_test))

return (pred - 11.5)**2

study = optuna.create_study()
study.optimize(objective, n_trials=20)``````
``````[32m[I 2022-08-31 01:21:34,912][0m A new study created in memory with name: no-name-4803a178-32c9-490f-a115-7a5451ca4d3b[0m

[32m[I 2022-08-31 01:21:35,062][0m Trial 0 finished with value: 8.660400390625 and parameters: {'x': 20, 'optimizer_name': 'SGD', 'learning_rate': 0.1439265941761127}. Best is trial 0 with value: 8.660400390625.[0m

[32m[I 2022-08-31 01:21:35,230][0m Trial 1 finished with value: 3.636240005493164 and parameters: {'x': 7, 'optimizer_name': 'Adam', 'learning_rate': 0.019234503707883495}. Best is trial 1 with value: 3.636240005493164.[0m

[32m[I 2022-08-31 01:21:35,321][0m Trial 2 finished with value: 8.660406112670898 and parameters: {'x': 15, 'optimizer_name': 'SGD', 'learning_rate': 0.5354997444906043}. Best is trial 1 with value: 3.636240005493164.[0m

[32m[I 2022-08-31 01:21:35,461][0m Trial 3 finished with value: 5.593413352966309 and parameters: {'x': 10, 'optimizer_name': 'Adam', 'learning_rate': 0.3933733505221534}. Best is trial 1 with value: 3.636240005493164.[0m

[32m[I 2022-08-31 01:21:35,548][0m Trial 4 finished with value: 8.660406112670898 and parameters: {'x': 15, 'optimizer_name': 'SGD', 'learning_rate': 0.8059628003874073}. Best is trial 1 with value: 3.636240005493164.[0m

[32m[I 2022-08-31 01:21:35,706][0m Trial 5 finished with value: 0.014808638952672482 and parameters: {'x': 12, 'optimizer_name': 'Adam', 'learning_rate': 0.021233917148336535}. Best is trial 5 with value: 0.014808638952672482.[0m

[32m[I 2022-08-31 01:21:35,800][0m Trial 6 finished with value: 9.847352027893066 and parameters: {'x': 12, 'optimizer_name': 'Adam', 'learning_rate': 0.49292223138525765}. Best is trial 5 with value: 0.014808638952672482.[0m

[32m[I 2022-08-31 01:21:35,859][0m Trial 7 finished with value: 8.660406112670898 and parameters: {'x': 17, 'optimizer_name': 'SGD', 'learning_rate': 0.18078972340025243}. Best is trial 5 with value: 0.014808638952672482.[0m

[32m[I 2022-08-31 01:21:35,922][0m Trial 8 finished with value: 8.660344123840332 and parameters: {'x': 19, 'optimizer_name': 'SGD', 'learning_rate': 0.8687629045697284}. Best is trial 5 with value: 0.014808638952672482.[0m
``````

Thus, parameters were explored.

## Yuichiro Minato

@yuichiro_minato2

blueqat CEO/CTO