Lower Confidence Bound
This section demonstrates the lower confidence bound (LCB) approach, where the next infill point is obtained by minimizing the LCB, which is written as:
\[ LCB(x) = \hat{y}(x) - A \, \hat{\sigma}(x) \]
where \(\hat{y}(x)\) and \(\hat{\sigma}(x)\) are the prediction and the uncertainty in the prediction (standard deviation) from the surrogate model, and \(A\) is a constant that controls the trade-off between exploration and exploitation. The larger the value of \(A\), the more the algorithm explores the search space; the smaller the value of \(A\), the more it exploits the current best solution. The value of \(A\) is usually set to 2 or 3 (a short sketch illustrating the effect of \(A\) follows the setup code below). The code below imports the required packages, defines the modified Branin function, and creates the plotting data:
# Imports
import numpy as np
from smt.surrogate_models import KRG
from smt.sampling_methods import LHS, FullFactorial
import matplotlib.pyplot as plt
from pymoo.core.problem import Problem
from pymoo.algorithms.soo.nonconvex.de import DE
from pymoo.optimize import minimize
from pymoo.config import Config
Config.warnings['not_compiled'] = False
def modified_branin(x):
    dim = x.ndim
    if dim == 1:
        x = x.reshape(1, -1)
    x1 = x[:,0]
    x2 = x[:,1]
    a = 1.
    b = 5.1 / (4.*np.pi**2)
    c = 5. / np.pi
    r = 6.
    s = 10.
    t = 1. / (8.*np.pi)
    y = a * (x2 - b*x1**2 + c*x1 - r)**2 + s*(1-t)*np.cos(x1) + s + 5*x1
    if dim == 1:
        y = y.reshape(-1)
    return y
# Bounds
lb = np.array([-5, 0])
ub = np.array([10, 15])
# Plotting data
sampler = FullFactorial(xlimits=np.array( [[lb[0], ub[0]], [lb[1], ub[1]]] ))
num_plot = 400
xplot = sampler(num_plot)
yplot = modified_branin(xplot)
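To make the role of \(A\) concrete, the short optional sketch below (not part of the original workflow) trains a quick Kriging model on a handful of LHS samples and prints the LCB at a single query point for a few values of \(A\). The sample count, the query point, and the names demo_sampler, sm_demo, and x_query are chosen purely for illustration.
# Optional: effect of A on the LCB at a single query point (illustrative)
demo_sampler = LHS( xlimits=np.array( [[lb[0], ub[0]], [lb[1], ub[1]]] ), criterion='ese')
x_demo = demo_sampler(10) # 10 samples, chosen arbitrarily
sm_demo = KRG(theta0=[1e-2], print_global=False)
sm_demo.set_training_values(x_demo, modified_branin(x_demo))
sm_demo.train()
x_query = np.array([[2.5, 7.5]]) # arbitrary query point
mean = sm_demo.predict_values(x_query)
std = np.sqrt(sm_demo.predict_variances(x_query))
for A in [0, 2, 3]:
    print("A = {}: LCB = {}".format(A, mean - A*std)) # larger A lowers the bound more where uncertainty is high
A larger \(A\) subtracts a bigger multiple of the standard deviation, so points with high uncertainty become more attractive and the search leans toward exploration.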
Differential evolution (DE) from pymoo is used to minimize the lower confidence bound. The code below defines the problem class and initializes DE. Note how the problem class uses both the prediction and the uncertainty to define the LCB. An optional one-off check follows the code.
# Problem class
class LCB(Problem):
    def __init__(self, sm):
        super().__init__(n_var=2, n_obj=1, n_constr=0, xl=lb, xu=ub)
        self.sm = sm # store the surrogate model

    def _evaluate(self, x, out, *args, **kwargs):
        A = 3
        out["F"] = self.sm.predict_values(x) - A * np.sqrt(self.sm.predict_variances(x)) # Standard deviation
# Optimization algorithm
algorithm = DE(pop_size=100, CR=0.8, dither="vector")
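As an optional quick check before entering the full loop, the LCB can be minimized once with the small demo surrogate sm_demo from the earlier sketch; res_demo is an illustrative name and the resulting point will vary between runs.
# Optional one-off check: a single LCB minimization with the demo surrogate
res_demo = minimize(LCB(sm_demo), algorithm, verbose=False)
print("Demo infill point: {}, LCB value: {}".format(res_demo.X, res_demo.F))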
The block of code below creates 5 initial training points and performs sequential sampling using the LCB criterion. The maximum number of iterations is set to 25, and a convergence criterion is defined as the relative difference between the surrogate's predicted optimum and the true function value at the infill point, \(\Delta f = | \hat{f}^* - f(x^*) | / | \hat{f}^* |\), which must fall below a tolerance of \(10^{-3}\).
sampler = LHS( xlimits=np.array( [[lb[0], ub[0]], [lb[1], ub[1]]] ), criterion='ese')
# Training data
num_train = 5
xtrain = sampler(num_train)
ytrain = modified_branin(xtrain)
# Variables
itr = 0
max_itr = 25
tol = 1e-3
delta_f = [1]
bounds = [(lb[0], ub[0]), (lb[1], ub[1])]
corr = 'squar_exp'
fs = 12
# Sequential sampling Loop
while itr < max_itr and tol < delta_f[-1]:
    print("\nIteration {}".format(itr + 1))

    # Initializing the kriging model
    sm = KRG(theta0=[1e-2], corr=corr, theta_bounds=[1e-6, 1e2], print_global=False)

    # Setting the training values
    sm.set_training_values(xtrain, ytrain)

    # Creating surrogate model
    sm.train()

    # Find the minimum of surrogate model
    result = minimize(LCB(sm), algorithm, verbose=False)

    # Computing true function value at infill point
    y_infill = modified_branin(result.X.reshape(1,-1))

    if itr == 0:
        delta_f = [np.abs(result.F - y_infill)/np.abs(result.F)]
    else:
        delta_f = np.append(delta_f, np.abs(result.F - y_infill)/np.abs(result.F))

    print("Change in f: {}".format(delta_f[-1]))
    print("f*: {}".format(y_infill))
    print("x*: {}".format(result.X))

    # Appending the new point to the current data set
    xtrain = np.vstack(( xtrain, result.X.reshape(1,-1) ))
    ytrain = np.append( ytrain, y_infill )

    itr = itr + 1 # Increasing the iteration number
# Printing the final results
print("\nBest obtained point:")
print("x*: {}".format(xtrain[np.argmin(ytrain)]))
print("f*: {}".format(np.min(ytrain)))
Iteration 1
Change in f: [0.82629283]
f*: [-15.86579262]
x*: [-4.1447757 14.95867195]
Iteration 2
Change in f: 8.785577236201734
f*: [283.12909601]
x*: [-5.00000000e+00 1.77841142e-12]
Iteration 3
Change in f: 1.181492020710294
f*: [39.92483075]
x*: [-1.58172211 15. ]
Iteration 4
Change in f: 1.1375959952809056
f*: [24.63609396]
x*: [2.25294317e+00 1.49544216e-17]
Iteration 5
Change in f: 1.3278481309980739
f*: [60.96088904]
x*: [1.00000000e+01 1.93586961e-15]
Iteration 6
Change in f: 1.1490009953234266
f*: [22.61963859]
x*: [0.56287704 6.44297449]
Iteration 7
Change in f: 1.2296539552475179
f*: [21.39175629]
x*: [-5. 11.38494659]
Iteration 8
Change in f: 0.6363844452098978
f*: [-15.82902209]
x*: [-3.25878153 12.52464905]
Iteration 9
Change in f: 0.8262348028432618
f*: [-3.77340434]
x*: [-1.99222383 10.02343312]
Iteration 10
Change in f: 0.18483278053574043
f*: [-15.0066736]
x*: [-3.72557506 15. ]
Iteration 11
Change in f: 0.05904914293384039
f*: [-16.51316754]
x*: [-3.76701529 13.5022388 ]
Iteration 12
Change in f: 0.0407660189289802
f*: [-16.09661078]
x*: [-3.55634078 12.60612523]
Iteration 13
Change in f: 0.0011318254014887669
f*: [-16.64375032]
x*: [-3.68349124 13.62680128]
Iteration 14
Change in f: 7.092807339645009e-05
f*: [-16.64370748]
x*: [-3.69783167 13.64784312]
Best obtained point:
x*: [-3.68349124 13.62680128]
f*: -16.643750324531275
The block of code below plots the convergence history of the LCB process.
####################################### Plotting convergence history
fig, ax = plt.subplots(1, 2, figsize=(14,6))
ax[0].plot(np.arange(itr) + 1, xtrain[num_train:,0], c="black", label='$x_1^*$', marker=".")
ax[0].plot(np.arange(itr) + 1, xtrain[num_train:,1], c="green", label='$x_2^*$', marker=".")
ax[0].plot(np.arange(itr) + 1, ytrain[num_train:], c="blue", label='$f^*$', marker=".")
ax[0].set_xlabel("Iterations", fontsize=14)
ax[0].set_ylabel("$f(x^*)$ and ($x_1^*$, $x_2^*$)", fontsize=14)
ax[0].legend()
ax[0].set_xlim(left=1, right=itr)
ax[0].grid()
ax[1].plot(np.arange(itr) + 1, delta_f, c="red", marker=".")
ax[1].plot(np.arange(itr+1), [tol]*(itr+1), c="black", linestyle="--", label="Tolerance")
ax[1].set_xlabel("Iterations", fontsize=14)
ax[1].set_ylabel(r"$| \hat{f}^* - f(x^*) | / | \hat{f}^* |$", fontsize=14)
ax[1].set_xlim(left=1, right=itr)
ax[1].grid()
ax[1].legend()
ax[1].set_yscale("log")
fig.suptitle("Lower Confidence Bound", fontsize=15)
The figure on the left shows the history of infill points and the corresponding true function values, and the figure on the right shows the convergence of the LCB process. At the start, the algorithm explores the search space, and then it starts exploiting the current best solution. The process stopped before the maximum number of iterations since the convergence criterion was met.
The code below plots the initial samples and the infill points.
####################################### Plotting initial samples and infills
# Reshaping into grid
reshape_size = int(np.sqrt(num_plot))
X = xplot[:,0].reshape(reshape_size, reshape_size)
Y = xplot[:,1].reshape(reshape_size, reshape_size)
Z = yplot.reshape(reshape_size, reshape_size)
# Level
levels = np.linspace(-17, -5, 5)
levels = np.concatenate((levels, np.linspace(0, 30, 7)))
levels = np.concatenate((levels, np.linspace(35, 60, 5)))
levels = np.concatenate((levels, np.linspace(70, 300, 12)))
fig, ax = plt.subplots(figsize=(8,6))
CS=ax.contour(X, Y, Z, levels=levels, colors='k', linestyles='solid', alpha=0.5, zorder=-10)
ax.clabel(CS, inline=1, fontsize=8)
ax.scatter(xtrain[0:num_train,0], xtrain[0:num_train,1], c="red", label='Initial samples')
ax.scatter(xtrain[num_train:,0], xtrain[num_train:,1], c="black", label='Infills')
ax.plot(-3.689, 13.630, 'g*', markersize=15, label="Global minimum")
ax.plot(xtrain[np.argmin(ytrain)][0], xtrain[np.argmin(ytrain)][1], 'bo', label="Obtained minimum")
ax.legend()
ax.set_xlabel("$x_1$", fontsize=14)
ax.set_ylabel("$x_2$", fontsize=14)
ax.set_title("Modified Branin function", fontsize=15)
As can be seen in the above plot, some infills are used for exploration and some for exploitation. However, it is hard to know in advance what value of \(A\) to use. You can change the value of \(A\) and see how it changes the infill locations; one way to do this is sketched below.
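One way to experiment with \(A\) is to expose it as a constructor argument of the problem class. The sketch below is illustrative only: the class name LCBWithA and the value \(A = 5\) are arbitrary, and it reuses the surrogate model sm left over from the last iteration of the sampling loop.
# A variant of the problem class with A exposed as a parameter (illustrative)
class LCBWithA(Problem):
    def __init__(self, sm, A=2.0):
        super().__init__(n_var=2, n_obj=1, n_constr=0, xl=lb, xu=ub)
        self.sm = sm # surrogate model
        self.A = A # exploration-exploitation constant

    def _evaluate(self, x, out, *args, **kwargs):
        out["F"] = self.sm.predict_values(x) - self.A * np.sqrt(self.sm.predict_variances(x))
# Example: a more exploratory infill using A = 5 with the last trained surrogate
result_A5 = minimize(LCBWithA(sm, A=5.0), algorithm, verbose=False)
print("Infill point with A = 5: {}".format(result_A5.X))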
NOTE: Due to randomness in differential evolution, results may vary slightly between runs, so it is recommended to run the code multiple times to observe the average behavior. A small reproducibility sketch follows this note.
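If exactly reproducible runs are preferred, one option (not used above) is to fix the random seeds: SMT's LHS accepts a random_state argument and pymoo's minimize accepts a seed argument. The sketch below is a minimal illustration with arbitrary seed values.
# Fixing seeds for reproducibility (illustrative; seed values are arbitrary)
sampler = LHS( xlimits=np.array( [[lb[0], ub[0]], [lb[1], ub[1]]] ), criterion='ese', random_state=42) # reproducible initial samples
result = minimize(LCB(sm), algorithm, seed=42, verbose=False) # reproducible DE run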
Final result:
Parameter | True minimum | Obtained minimum
---|---|---
\(x_1^*\) | -3.689 | -3.684
\(x_2^*\) | 13.630 | 13.627
\(f(x_1^*, x_2^*)\) | -16.644 | -16.644