Constrained Optimization#

This section covers local constrained optimization. We will be using the scipy.optimize module to minimize an analytical function subject to constraints. Going through the following documentation will help in understanding the code:

  1. Scipy minimize

  2. Matplotlib contour plot

  3. Scipy nonlinear constraint

The block of code below imports all the required packages:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.optimize import NonlinearConstraint

Consider the optimization problem described below:

\[\begin{split} \begin{aligned} \min_{x_1,x_2} \quad & x_1^2 - \frac{x_1}{2} - x_2 - 2 \\ \textrm{subject to} \quad & x_1^2 -4x_1 + x_2 + 1 \leq 0 \\ \quad & \frac{x_1^2}{2} + x_2^2 - x_1 - 4 \leq 0 \\ \quad & -3 \leq x_1 \leq 5 \\ \quad & -3 \leq x_2 \leq 4 \\ \end{aligned} \end{split}\]

Question: Plot the objective function contour and constraint boundaries.

Answer: The block of code below defines various helper functions. Read the comments to understand the code.

def f(x):
    """
        Function which calculates the function value at given x.
        
        Input:
        x - 1D/2D numpy array
    """

    # Number of dimensions of input
    dim = x.ndim

    # To ensure n x 2
    if dim == 1:
        x = x.reshape(1,-1)

    x1 = x[:,0]
    x2 = x[:,1]
    
    y = x1**2 - x1/2 - x2 - 2

    y = y.reshape(-1,1)
    
    if dim == 1:
        y = y.reshape(-1,)
    
    return y

def g1(x):
    """
        Function which calculates the g1 value at given x.
        
        Input:
        x - 1D/2D numpy array
    """

    # Number of dimensions of input
    dim = x.ndim

    # To ensure n x 2
    if dim == 1:
        x = x.reshape(1,-1)

    x1 = x[:,0]
    x2 = x[:,1]
    
    g1 = x1**2 - 4*x1 + x2 + 1

    g1 = g1.reshape(-1,1)
    
    if dim == 1:
        g1 = g1.reshape(-1,)
    
    return g1

def g2(x):
    """
        Function which calculates the g2 value at given x.
        
        Input:
        x - 1D/2D numpy array
    """

    # Number of dimensions of input
    dim = x.ndim

    # To ensure n x 2
    if dim == 1:
        x = x.reshape(1,-1)

    x1 = x[:,0]
    x2 = x[:,1]
    
    g2 = x1**2 / 2 + x2**2 - x1 - 4

    g2 = g2.reshape(-1,1)
    
    if dim == 1:
        g2 = g2.reshape(-1,)
    
    return g2

def plot_prob():
    """
        Function which plots the function contour and constraints
    """

    num_points = 100

    # Defining x and y values
    x = np.linspace(-3,5,num_points)
    y = np.linspace(-3,4,num_points)

    # Creating a mesh at which the objective and
    # constraint values will be evaluated and plotted
    X, Y = np.meshgrid(x, y)

    # Evaluating the function values at meshpoints
    obj = f(np.hstack((X.reshape(-1,1),Y.reshape(-1,1)))).reshape(num_points,num_points)
    const1 = g1(np.hstack((X.reshape(-1,1),Y.reshape(-1,1)))).reshape(num_points,num_points)
    const2 = g2(np.hstack((X.reshape(-1,1),Y.reshape(-1,1)))).reshape(num_points,num_points)

    # Plotting the filled contours
    fig, ax = plt.subplots(figsize=(7,5))
    CS = ax.contourf(X, Y, obj, levels=30)
    fig.colorbar(CS, orientation='vertical')

    # Plotting g1
    ax.contour(X, Y, const1, levels=[0], colors="k")
    ax.contourf(X, Y, const1, levels=np.linspace(0,const1.max()), colors="white", alpha=0.3, antialiased = True)
    ax.annotate('g1', xy =(3.1, 2.0))
 
    # Plotting g2
    ax.contour(X, Y, const2, levels=[0], colors="k")
    ax.contourf(X, Y, const2, levels=np.linspace(0,const2.max()), colors="white", alpha=0.3, antialiased = True)
    ax.annotate('g2', xy =(0.0, 2.2))

    ax.set_xlabel("$x_1$", fontsize=14)
    ax.set_ylabel("$x_2$", fontsize=14)

    # A few annotations marking the corners of the feasible region
    ax.annotate('A', xy =(-0.1, -1.8), c='r', fontweight='bold')
    ax.annotate('B', xy =(1.0, 1.8), c='r', fontweight='bold')
    ax.annotate('C', xy =(3.0, 1.2), c='r', fontweight='bold')
    ax.annotate('D', xy =(3.5, -0.6), c='r', fontweight='bold')

    return ax

The block of code below makes a contour plot of the objective with the constraints, using the functions defined in the previous block:

plot_prob()
(Figure: filled contour of the objective over \(x_1\) and \(x_2\), with the constraint boundaries \(g_1\) and \(g_2\) shown as black curves and points A, B, C, and D marked in red.)

The black curves denote \(g_i=0\). The region ABCDA is feasible, while the remaining design space (the white masked region) is infeasible.
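
A quick numerical check of this (an illustrative snippet using the functions defined above; the two sample points are arbitrary choices):

# Feasibility check at two sample points (arbitrary choices for illustration)
x_inside = np.array([1.0, 0.0])     # a point inside region ABCDA
x_outside = np.array([-2.0, -2.0])  # a point in the white masked region

print("Inside : g1 = {}, g2 = {}".format(g1(x_inside), g2(x_inside)))
print("Outside: g1 = {}, g2 = {}".format(g1(x_outside), g2(x_outside)))

Both constraints are satisfied (\(g_i \leq 0\)) at the inside point, while both are violated (\(g_i > 0\)) at the outside point.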

Penalty method#

The goal of penalty methods is to convert a constrained optimization problem into an unconstrained one by adding a penalty term to the objective function. The penalty term increases the objective when the constraints are violated and is zero otherwise. There are many ways to define the penalty function, but we will use the quadratic penalty function here. The optimization problem is transformed to:

\[\begin{split} \begin{aligned} \min_{x_1,x_2} \quad & f(x_1,x_2) + \lambda _1 \frac{[\max(g_1(x_1,x_2),0)]^2}{2} + \lambda _2 \frac{[\max(g_2(x_1,x_2),0)]^2}{2} \\ \textrm{subject to} \quad & -3 \leq x_1 \leq 5 \\ \quad & -3 \leq x_2 \leq 4 \\ \text{where} \quad & f(x_1, x_2) = x_1^2 - \frac{x_1}{2} - x_2 - 2 \\ \quad & g_1(x_1, x_2) = x_1^2 -4x_1 + x_2 + 1 \\ \quad & g_2(x_1, x_2) = \frac{x_1^2}{2} + x_2^2 - x_1 - 4 \end{aligned} \end{split}\]

So, the second and third terms are zero when the constraints are satisfied. Otherwise, the penalty terms are positive and increase the objective function value. The problem is now unconstrained, and we can use methods like BFGS or conjugate gradient to solve it.
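
As a concrete illustration with \(\lambda_1 = 100\): at a point where \(g_1 = 0.5\) (violated), the first penalty term adds \(100 \cdot 0.5^2 / 2 = 12.5\) to the objective, whereas at a point where \(g_1 = -0.5\) (satisfied), \(\max(g_1, 0) = 0\) and the term adds nothing.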

The block of code below defines various functions which are used during the optimization. Read the comments in the functions for more details.

def callback(x):
    """
        Function which is called after every iteration of optimization.
        It stores the values of x1, x2, the objective, and both constraints.
        
        Input: Current x value
        Output: None
    """
    
    history["x1"].append(x[0])
    history["x2"].append(x[1])
    history["f"].append(f(x))
    history["g1"].append(g1(x))
    history["g2"].append(g2(x))
    
def opt_plots(history, method):
    """
        Function used for plotting the results of the optimization.
        
        Input: 
        history - A dict which contains five key-value pairs - x1, x2, f, g1, and g2.
        Each entry should be a list which contains the values of the
        respective quantity at each iteration. Look at the usage of this
        function in the following blocks for better understanding.
        
        method - A str which denotes the method used for optimization.
        It is only used in the title of the plots.
    """
    
    # Number of iterations.
    # Subtracting 1, since it also contains starting point
    num_itr = len(history["x1"]) - 1

    # Plotting the convergence history
    fig, ax = plt.subplots(figsize=(6,5))
    ax.plot(np.arange(num_itr+1), history["x1"], "k", marker=".", label="$x_1$")
    ax.plot(np.arange(num_itr+1), history["x2"], "b", marker=".", label="$x_2$")
    ax.plot(np.arange(num_itr+1), history["f"], "g", marker=".", label="$f$")
    ax.plot(np.arange(num_itr+1), history["g1"], "k--", marker=".", label="$g_1$")
    ax.plot(np.arange(num_itr+1), history["g2"], "g--", marker=".", label="$g_2$")
    ax.set_xlabel("Iterations", fontsize=14)
    ax.set_xlim(left=0, right=num_itr)
    ax.set_ylabel("Quantities", fontsize=14)
    ax.grid()
    ax.legend(fontsize=12)
    ax.set_title("Convergence history - " + method, fontsize=14)

    # Plotting function contours
    ax = plot_prob()

    # Plotting the optimization path, starting point, and final point
    ax.plot(history["x1"], history["x2"], "k", marker=".", label="Path")
    ax.scatter(history["x1"][0], history["x2"][0], label="Starting point", c="red")
    ax.scatter(history["x1"][-1], history["x2"][-1], label="Final point", c="orange")
    ax.legend(loc='upper left')

The block of code below defines a function which returns the penalized objective function value.

def penalized_obj(x):
    """
        Function for evaluating penalized objective.
    """
    
    # Compute objective and constraints
    obj = f(x)
    const1 = g1(x)
    const2 = g2(x)
    
    # Penalty multipliers
    p1 = 100
    p2 = 100
    
    # Returns the quadratic penalized objective; each penalty term is
    # active only when the corresponding constraint is violated (g > 0)
    return (obj + p1 * np.maximum(const1, 0)**2 / 2
                + p2 * np.maximum(const2, 0)**2 / 2)

NOTE: The penalty parameters \(\lambda_1\) and \(\lambda_2\) are kept constant here, but they can be updated during the optimization process to improve the convergence.
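
A minimal sketch of this idea is the sequential penalty method: solve a sequence of unconstrained problems, increasing the multiplier after each solve so that constraint violations are driven towards zero. The names penalized_obj_param, p, and growth below are made up for illustration and are not part of the code above.

# Sketch of a sequential quadratic penalty method (illustrative only)
def penalized_obj_param(x, p):
    """Quadratic penalized objective with a user-supplied multiplier p."""
    return f(x) + p * np.maximum(g1(x), 0)**2 / 2 + p * np.maximum(g2(x), 0)**2 / 2

p = 1.0                      # initial (small) penalty multiplier
growth = 10.0                # factor by which the multiplier is increased
xk = np.array([-2.0, -2.0])  # starting point

for outer in range(5):
    # Solve the unconstrained subproblem with the current multiplier
    res = minimize(fun=penalized_obj_param, x0=xk, args=(p,),
                   method="L-BFGS-B", jac="3-point", bounds=[(-3, 5), (-3, 4)])
    xk = res.x
    p = p * growth           # tighten the penalty for the next outer iteration

print("Approximate optimum:", xk, " g1 =", g1(xk), " g2 =", g2(xk))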

The block of code below defines various parameters for optimization using the L-BFGS-B method.

# Starting point
x0 = np.array([-2.0, -2.0])

# Solver
method = "L-BFGS-B"

# Defines which finite difference scheme to use. Possible values are:
# "2-point" - forward/backward difference
# "3-point" - central difference
# "cs" - complex step
jac = "3-point"

# Defining dict for storing history of optimization
history = {}
history["x1"] = [x0[0]]
history["x2"] = [x0[1]]
history["f"]= [f(x0)]
history["g1"] = [g1(x0)]
history["g2"] = [g2(x0)]

# Solver options
options ={
    "disp": True
}

bounds = [(-3,5), (-3,4)]

# Minimize the function
result = minimize(fun=penalized_obj, x0=x0, method=method, jac=jac, 
                  callback=callback, options=options, bounds=bounds)

# Convergence plots
opt_plots(history, method=method)
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            2     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  6.85500D+03    |proj g|=  7.00000D+00

At iterate    1    f= -1.86931D+00    |proj g|=  1.96247D+00

At iterate    2    f= -3.01019D+00    |proj g|=  1.00000D+00

At iterate    3    f= -3.12991D+00    |proj g|=  9.50783D-01

At iterate    4    f= -3.13199D+00    |proj g|=  9.84686D-01

At iterate    5    f= -3.21668D+00    |proj g|=  1.48971D+00

At iterate    6    f= -3.35361D+00    |proj g|=  4.16358D+00

At iterate    7    f= -3.39142D+00    |proj g|=  4.40044D+00

At iterate    8    f= -3.47010D+00    |proj g|=  2.76947D+00

At iterate    9    f= -3.50464D+00    |proj g|=  1.09467D+00

At iterate   10    f= -3.51375D+00    |proj g|=  1.12794D+00

At iterate   11    f= -3.52039D+00    |proj g|=  1.04807D+00

At iterate   12    f= -3.52312D+00    |proj g|=  9.74099D-01

At iterate   13    f= -3.52554D+00    |proj g|=  8.84907D-01

At iterate   14    f= -3.52650D+00    |proj g|=  1.19177D-01

At iterate   15    f= -3.52655D+00    |proj g|=  1.23359D-01

At iterate   16    f= -3.52672D+00    |proj g|=  2.15794D-01

At iterate   17    f= -3.52688D+00    |proj g|=  2.59330D-01

At iterate   18    f= -3.52703D+00    |proj g|=  2.70157D-01

At iterate   19    f= -3.52704D+00    |proj g|=  2.63544D-01

At iterate   20    f= -3.52713D+00    |proj g|=  2.75143D-02

At iterate   21    f= -3.52714D+00    |proj g|=  3.10332D-05

At iterate   22    f= -3.52714D+00    |proj g|=  3.67419D-09

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    2     22    100     24     0     0   3.674D-09  -3.527D+00
  F =  -3.5271354414899267     

CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL            
 Bad direction in the line search;
   refresh the lbfgs memory and restart the iteration.
(Figures: convergence history of \(x_1\), \(x_2\), \(f\), \(g_1\), and \(g_2\) versus iterations for L-BFGS-B, and the optimization path overlaid on the objective contour.)

By using the penalty method, we reach the optimum while satisfying the constraints. The optimum point lies at the intersection of the two constraint boundaries, which makes both constraints active. It is recommended to change the optimization parameters (such as the starting point, penalty multipliers, etc.) and see how the solution changes.

NOTE: The optimization method used is L-BFGS-B, which is a limited-memory version of BFGS that handles bound constraints.

Question: Use SLSQP method for solving the original constrained optimization problem.

Answer: The block of code below defines various parameters for optimization using the SLSQP method:

# Starting point
x0 = np.array([-2.0, -2.0])

# Solver
method = "SLSQP"

# Defines which finite difference scheme to use. Possible values are:
# "2-point" - forward/backward difference
# "3-point" - central difference
# "cs" - complex step
jac = "3-point"

# Defining dict for storing history of optimization
history = {}
history["x1"] = [x0[0]]
history["x2"] = [x0[1]]
history["f"] = [f(x0)]
history["g1"] = [g1(x0)]
history["g2"] = [g2(x0)]

# Solver options
options ={
    "disp": True
}

# Setting bounds
bounds = [(-3,5), (-3,4)]

# Setting constraints - the optimizer needs a list of NonlinearConstraint objects.
# Read the documentation for more details.
constraints = [NonlinearConstraint(g1, -np.inf, 0),
               NonlinearConstraint(g2, -np.inf, 0)]

# Minimize the function
result = minimize(fun=f, x0=x0, method=method, jac=jac, 
                  constraints=constraints, callback=callback, 
                  options=options, bounds=bounds)

# Print value of x
print("Value of x1 at optimum: {}".format(result.x[0]))
print("Value of x2 at optimum: {}".format(result.x[1]))
print("Value of g1 at optimum: {}".format(g1(result.x)))
print("Value of g2 at optimum: {}".format(g2(result.x)))

# Convergence plots
opt_plots(history, method=method)
Optimization terminated successfully    (Exit mode 0)
            Current function value: -3.523406546437449
            Iterations: 7
            Function evaluations: 36
            Gradient evaluations: 7
Value of x1 at optimum: 1.0623763071639867
Value of x2 at optimum: 2.120861810878845
Value of g1 at optimum: [2.46287879e-10]
Value of g2 at optimum: [2.22692002e-07]
(Figures: convergence history for SLSQP and the optimization path overlaid on the objective contour.)

SLSQP reaches the same optimum as the penalty method but with far fewer iterations (and function evaluations). It is specifically designed for constrained optimization and uses a sequential quadratic programming (SQP) approach. SQP-based methods are recommended for constrained optimization.
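
The iteration and function-evaluation counts reported above can also be read directly from the OptimizeResult object returned by minimize:

# Inspect the result object returned by minimize
print("Iterations          :", result.nit)
print("Function evaluations:", result.nfev)
print("Objective at optimum:", result.fun)
print("Converged           :", result.success)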

The SLSQP method is very sensitive to the starting point. Try changing the starting point and see how the optimization path changes. For example, with (4, -2) as the starting point, the optimization process moves to the boundary of the contour plot and then comes back towards the optimum, since bounds are provided to the optimizer, as sketched in the block below.
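
A minimal sketch of such an experiment (same setup as above; only the starting point is changed and the history dict is reset):

# Re-run SLSQP from a different starting point to study its effect on the path
x0 = np.array([4.0, -2.0])

# Reset the history before re-running
history = {"x1": [x0[0]], "x2": [x0[1]],
           "f": [f(x0)], "g1": [g1(x0)], "g2": [g2(x0)]}

result = minimize(fun=f, x0=x0, method="SLSQP", jac="3-point",
                  constraints=constraints, callback=callback,
                  options={"disp": True}, bounds=bounds)

# Convergence plots
opt_plots(history, method="SLSQP")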