Gaussian Process based Models#

This section provides implementation for concepts related to gaussian process based models. Below block imports required packages:

import numpy as np
from smt.surrogate_models import KRG
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error

Forrester function (given below) will be used for demonstration.

\[\begin{split} y(x) = (6x - 2)^2sin(12x-4) \\ 0 \leq x \leq 1 \end{split}\]

Below block of code defines this function:

forrester = lambda x: (6*x-2)**2*np.sin(12*x-4)

Below block of code creates train dataset consists of 4 equally spaced points:

# Training data
xtrain = np.linspace(0, 1, 4)
ytrain = forrester(xtrain)

# Plotting data
xplot = np.linspace(0, 1, 100)
yplot = (6*xplot-2)**2 * np.sin(12*xplot-4)

Kriging model is created using smt package. A multistart gradient based optimization method is used for finding the optimal length scale for each dimension of the input space. Following are the important parameters while creating the model:

  • theta0 : Initial hyperparameters for the model

  • theta_bounds : Bounds for hyperparameters

  • n_start : Number of starting points for optimization

  • hyper_opt : Optimization method for hyperparameters

  • corr : correlation function

Please read documentation for more details. Below block of code creates the model using radial basis (exponential) kernel function and plots the prediction:

# Create model
corr = 'squar_exp'
sm = KRG(theta0=[1e-2], corr=corr, theta_bounds=[1e-6, 1], print_global=False)
sm.set_training_values(xtrain, ytrain)
sm.train()

print("Optimal theta: {}".format(sm.optimal_theta))

# Prediction for plotting
yplot_pred = sm.predict_values(xplot).reshape(-1,)
yplot_var = sm.predict_variances(xplot).reshape(-1,)

# Plotting
fig, ax = plt.subplots()
ax.plot(xtrain, ytrain, 'ro', label='Training data')
ax.plot(xplot, yplot, 'b--', label='True function')
ax.plot(xplot, yplot_pred, 'k', label='Prediction')
ax.fill_between(xplot, yplot_pred - 2*np.sqrt(yplot_var), yplot_pred + 2*np.sqrt(yplot_var), color='lightblue', label='95% CI')
ax.set_xlim([0, 1])
plt.xlabel("x", fontsize=14)
plt.ylabel("y", fontsize=14)
ax.legend()
ax.grid()
Optimal theta: [1.]
_images/cfdc903240be2dde72edaed95ac7e09aa3448525337bd1ee04fe87c91ca8baca.png

Few important points to note:

  • Kriging is an interpolating model as it passes through the training points.

  • Model provides a mean prediction and an uncertainty estimate in the prediction.

  • Optimal theta is at the upper bound of the range. You can change the range and see how it affects the performance of the model.

Below block of code creates the model using Matern 32 kernel function:

# Create model
corr = 'matern32'
sm = KRG(theta0=[1e-2], corr=corr, theta_bounds=[1e-6, 1], print_global=False)
sm.set_training_values(xtrain, ytrain)
sm.train()

print("Optimal theta: {}".format(sm.optimal_theta))

# Prediction for plotting
yplot_pred = sm.predict_values(xplot).reshape(-1,)
yplot_var = sm.predict_variances(xplot).reshape(-1,)

# Plotting
fig, ax = plt.subplots()
ax.plot(xtrain, ytrain, 'ro', label='Training data')
ax.plot(xplot, yplot, 'b--', label='True function')
ax.plot(xplot, yplot_pred, 'k', label='Prediction')
ax.fill_between(xplot, yplot_pred - 2*np.sqrt(yplot_var), yplot_pred + 2*np.sqrt(yplot_var), color='lightblue', label='95% CI')
ax.set_xlim([0, 1])
plt.xlabel("x", fontsize=14)
plt.ylabel("y", fontsize=14)
ax.legend()
ax.grid()
Optimal theta: [1.]
_images/ded34712a0091549c8bcad3460663464555660353df5e1143ae976d38ad6b2bf.png

Below block of code creates the model using Matern 52 kernel function:

# Create model
corr = 'matern52'
sm = KRG(theta0=[1e-2], corr=corr, theta_bounds=[1e-6, 1], print_global=False)
sm.set_training_values(xtrain, ytrain)
sm.train()

print("Optimal theta: {}".format(sm.optimal_theta))

# Prediction for plotting
yplot_pred = sm.predict_values(xplot).reshape(-1,)
yplot_var = sm.predict_variances(xplot).reshape(-1,)

# Plotting
fig, ax = plt.subplots()
ax.plot(xtrain, ytrain, 'ro', label='Training data')
ax.plot(xplot, yplot, 'b--', label='True function')
ax.plot(xplot, yplot_pred, 'k', label='Prediction')
ax.fill_between(xplot, yplot_pred - 2*np.sqrt(yplot_var), yplot_pred + 2*np.sqrt(yplot_var), color='lightblue', label='95% CI')
ax.set_xlim([0, 1])
plt.xlabel("x", fontsize=14)
plt.ylabel("y", fontsize=14)
ax.legend()
ax.grid()
Optimal theta: [1.]
_images/782d8546f711013fe26aa1e9ac0ddd5bfe8fa804e6d5cfb1c40d66d78c1e8550.png

Note that mean prediction and uncertainty estimates are different for different kernel functions and you need to experiment with different kernel functions to find the best model for your problem. Now, number of samples will be increased to see when model prediction is good. This method is also known as one-shot sampling. More efficient way to obtain good fit is to use sequential sampling which will be discussed in future sections. Below block of code fits Kriging model on different sample sizes, plots the fit, and computes normalized rmse on plotting data.

# Creating array of training sample sizes
samples = np.array([3, 4, 5, 6, 7, 8, 9, 10])

# Initializing nrmse list
nrmse = []

# Fitting with different sample size
for sample in samples:
    
    xtrain = np.linspace(0, 1, sample)
    ytrain = forrester(xtrain)
    
    # Fitting the kriging
    sm = KRG(theta0=[1e-2], corr='matern52', theta_bounds=[1e-6, 1], print_global=False)
    sm.set_training_values(xtrain, ytrain)
    sm.train()
    
    # Predict at test values
    yplot_pred = sm.predict_values(xplot).reshape(-1,)
    yplot_var = sm.predict_variances(xplot).reshape(-1,)

    # Calculating average nrmse
    nrmse.append( np.sqrt(mean_squared_error(yplot, yplot_pred)) / np.ptp(yplot) )
    
    # Plotting prediction
    fig, ax = plt.subplots(1,2, figsize=(12,5))
    
    ax[0].plot(xtrain, ytrain, 'ro', label='Training data')
    ax[0].plot(xplot, yplot, 'b--', label='True function')
    ax[0].plot(xplot, yplot_pred, 'k', label='Prediction')
    ax[0].fill_between(xplot, yplot_pred - 2*np.sqrt(yplot_var), yplot_pred + 2*np.sqrt(yplot_var), color='lightblue', label='95% CI')
    ax[0].set_xlim((0, 1))
    ax[0].set_xlabel("x", fontsize=14)
    ax[0].set_ylabel("y", fontsize=14)
    ax[0].grid()
    ax[0].legend()
    
    ax[1].scatter(yplot_pred, yplot, c="k")
    ax[1].set_xlabel("Prediction", fontsize=14)
    ax[1].set_ylabel("True values", fontsize=14)
    ax[1].grid()
    
    fig.suptitle("Number of samples: {}".format(sample))
_images/7e833504759246af99885753bcb25178b78453884fc5bc0b9c6068775dc2d46e.png _images/42ea4177e05d98e0bfddbeb291f335f02694e5eb620e5c2b35fb0b25640e5e5b.png _images/5826d89316f86dadd54506b41d1f9fa84a43394a4601be5a7622ada82690f2af.png _images/0b10175243b2e2efe5bacd193f99c4de87d0b61082e4f58d39e46f3553260293.png _images/9169dcfde7303a8d825230afa3bf4533757820e65423ea251f40e8ee8b5e27d6.png _images/2399c837159549ad89f7468cd801fd828e0768985d6c1aeb53b0bedf2e21175b.png _images/bdbf87c447dc116b28f3828612b3ddd73982433bc6d07e0803056e45cebdf0dc.png _images/22a44c33d5c76a03e6cde2a3a25374f759190a4b547e644387f59a82015eb497.png

As the number of samples increase, the model prediction becomes better. Also, the value of length scale will change with number of samples. So, it is important to change the refit when model for different data. Below block of code plots the normalized rmse (computed using plotting data) as a function of number of samples. The nrmse decreases as the number of samples increase since prediction improves.

# Plotting NMRSE
fig, ax = plt.subplots()
ax.plot(samples, np.array(nrmse), c="k", marker=".")
ax.grid()
ax.set_yscale("log")
ax.set_xlim((samples[0], samples[-1]))
ax.set_xlabel("Number of samples", fontsize=14)
ax.set_ylabel("Normalized RMSE", fontsize=14)
Text(0, 0.5, 'Normalized RMSE')
_images/17af3fbdd892c524a0d7a32e737f49aa83dfd595dced45b9054c49a161307d18.png