Sampling Plans

This section provides implementations of the sampling-plan concepts covered in class. The following topics are covered in this section:

  1. Full Factorial Sampling

  2. Latin Hypercube Sampling

NOTE: You need to install smt, a Python-based surrogate modeling toolbox. It provides various surrogate modeling techniques along with different sampling plans. Activate the environment you created for this class in the Anaconda Prompt and install smt using pip install smt.
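If the installation succeeded, a quick check like the one below should run without errors (this assumes smt exposes a __version__ attribute, which is the usual Python packaging convention):

import smt
print(smt.__version__) # prints the installed smt version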

The block of code below imports the required packages:

import numpy as np
import matplotlib.pyplot as plt
from smt.sampling_methods import FullFactorial, LHS

Full Factorial Sampling

First, we will look at full-factorial sampling. You can read about how to use smt to generate full-factorial samples in the documentation. The block of code below generates full-factorial samples for a two-dimensional design space and plots them. Read the comments for more details.

NOTE: When generating full-factorial samples, the number of samples grows exponentially with the number of input parameters. For example, if you have 3 input parameters, each with 3 levels, then the total number of samples will be 3^3 = 27. This quickly becomes infeasible for a large number of input parameters or levels.
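To see this growth concretely, here is a quick sketch in three dimensions before returning to the two-dimensional example below (the unit limits for the three inputs are illustrative, not from the class):

xlimits_3d = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]) # three inputs with illustrative unit limits
sampling_3d = FullFactorial(xlimits=xlimits_3d) # Instantiating the FullFactorial class

x_3d = sampling_3d(27) # smt infers 3 levels per input, giving 3**3 = 27 samples
print(x_3d.shape) # (27, 3)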

xlimits = np.array([[-5.0, 10.0], [0.0, 15.0]]) # limits for x and y
sampling = FullFactorial(xlimits=xlimits) # Instantiating the FullFactorial class

num = 49 # Number of samples to generate, each dimension will have 7 levels
x = sampling(num) # Generating DOE

fig, ax = plt.subplots()
ax.plot(x[:, 0], x[:, 1], "o")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.grid()
[Figure: full-factorial samples (num = 49) in the x1-x2 design space, forming a 7 x 7 grid]

As you can see, the samples are generated in a grid-like fashion.

NOTE: For a two-dimensional design space, the number of samples in a full-factorial plan is typically a perfect square such as 4, 9, 16, 25, etc. However, num does not have to be a perfect square. For example, if you change num to 46, you will see the following result:

num = 46

x = sampling(num) # Generating DOE

fig, ax = plt.subplots()
ax.plot(x[:, 0], x[:, 1], "o")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.grid()
[Figure: full-factorial samples (num = 46) in the x1-x2 design space; the grid is no longer perfect]

Now, you can see that the generated samples do not form a perfect grid since num is not a perfect square.

Latin Hypercube Sampling

Now, we will look at Latin hypercube sampling. You can read about how to use smt to generate LH samples in the documentation. The process for generating samples is similar to that described for full-factorial samples, but with one additional step: you need to set a criterion that determines how samples are distributed in the design space. The options of interest are:

  1. center or c (default) - samples are placed at the centers of the strata

  2. maximin or m - maximize the minimum distance between samples (based on heuristics, no optimization is performed)

  3. centermaximin or cm - combination of c and maximin

  4. ese - minimize the \(\phi_p\) metric (described below) using the enhanced stochastic evolutionary (ESE) algorithm

The goodness of a sampling plan can be measured using an extension of the maximin distance criterion proposed by Morris and Mitchell (1995). For a given sampling plan, all pairwise distances between samples are calculated and sorted to obtain a distance list (\(d_1\), \(d_2\), …, \(d_s\)) and an index list (\(J_1\), \(J_2\), …, \(J_s\)), where the \(d_i\) are the distinct distance values with \(d_1 < d_2 < \dots < d_s\), and \(J_i\) is the number of pairs of samples separated by distance \(d_i\). The criterion is then calculated using the following formula:

\[ \phi_p = \Bigg [ \sum_{i=1}^{s} J_i d_i^{-p} \Bigg ]^{1/p} \]

where \(p\) is a positive integer. The smaller the value of \(\phi_p\), the more space-filling the sampling plan.
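To make the formula concrete, below is a minimal sketch of \(\phi_p\) using scipy (a dependency of smt) and the np import from above. The function name phi_p and the default p = 10 are illustrative choices, not part of smt. Note that summing \(d^{-p}\) over all pairs of samples is equivalent to summing \(J_i d_i^{-p}\) over the distinct distances, so no explicit sorting is needed:

from scipy.spatial.distance import pdist

def phi_p(x, p=10): # p = 10 is an illustrative choice of the positive integer p
    d = pdist(x) # all pairwise Euclidean distances between samples
    return np.sum(d ** (-p)) ** (1.0 / p)

A smaller returned value indicates a more space-filling sampling plan.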

The block of code below generates LH samples for a two-dimensional design space, plots them, and computes the \(\phi_p\) value. Read the comments for more details.

xlimits = np.array([[-5.0, 10.0], [0.0, 15.0]]) # limits for x and y
criterion = "c" # Sampling criterion
random_state = 1 # Fixing random state for reproducibility and comparison
sampling = LHS(xlimits=xlimits, criterion=criterion, random_state=random_state) # Instantiating the LHS class

num = 15 # Number of samples to generate, each dimension will be divided into 15 strata
x = sampling(num) # Generating DOE

phi_p = sampling._PhiP(x) # Calculating the phi_p value
print("phi_p value: {}".format(phi_p))

fig, ax = plt.subplots()
ax.plot(x[:, 0], x[:, 1], "o")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_xlim(-5,10)
ax.set_ylim(0,15)
ax.set_xticks(np.linspace(-5, 10, num+1)) # Tick marks at the strata boundaries
ax.set_yticks(np.linspace(0, 15, num+1))
ax.grid()
phi_p value: 0.7900654411297857
[Figure: LH samples (criterion c) in the x1-x2 design space with the 15 x 15 strata grid]

Here, the samples are distributed in a more random fashion compared to the full-factorial samples. The \(\phi_p\) value for this sampling plan is 0.79. In the next block, the criterion is set to cm:

criterion = "cm" # Sampling criterion
random_state = 1 # Fixing random state for comparison
sampling = LHS(xlimits=xlimits, criterion=criterion, random_state=random_state) # Instantiating the LHS class

num = 15 # Number of samples to generate, each dimension will be divided into 15 strata
x = sampling(num) # Generating DOE

phi_p = sampling._PhiP(x) # Calculating the phi_p value
print("phi_p value: {}".format(phi_p))

fig, ax = plt.subplots()
ax.plot(x[:, 0], x[:, 1], "o")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_xlim(-5,10)
ax.set_ylim(0,15)
ax.set_xticks(np.linspace(-5, 10, num+1)) # Tick marks at the strata boundaries
ax.set_yticks(np.linspace(0, 15, num+1))
ax.grid()
phi_p value: 0.7094603158576284
[Figure: LH samples (criterion cm) in the x1-x2 design space with the 15 x 15 strata grid]

Now, the samples are more spread out since the cm criterion uses heuristics to maximize the minimum distance between samples. Accordingly, the \(\phi_p\) value is reduced to 0.71. In the next block, the criterion is set to ese:

criterion = "ese" # Sampling criterion
random_state = 1 # Fixing random state for comparison
sampling = LHS(xlimits=xlimits, criterion=criterion, random_state=random_state) # Instantiating the LHS class

num = 15 # Number of samples to generate, each dimension will be divided into 15 strata
x = sampling(num) # Generating DOE

phi_p = sampling._PhiP(x) # Calculating the phi_p value
print("phi_p value: {}".format(phi_p))

fig, ax = plt.subplots()
ax.plot(x[:, 0], x[:, 1], "o")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_xlim(-5,10)
ax.set_ylim(0,15)
ax.set_xticks(np.linspace(-5, 10, num+1)) # Tick marks at the strata boundaries
ax.set_yticks(np.linspace(0, 15, num+1))
ax.grid()
phi_p value: 0.3453512854915221
[Figure: LH samples (criterion ese) in the x1-x2 design space with the 15 x 15 strata grid]

Now, the samples are even more spread out compared to the cm criterion. The \(\phi_p\) value is reduced from 0.71 for the cm criterion to 0.35, which is a significant improvement. You can read more about how ese is used to minimize \(\phi_p\) in this paper.
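To recap the comparison, the short loop below (a minimal sketch reusing xlimits and the imports from above) generates a plan with each criterion and prints its \(\phi_p\) value. Note that sampling._PhiP is the same private helper used earlier in this section, so it may change in future smt releases:

# Compare the three criteria with identical settings; lower phi_p is more space-filling
for criterion in ["c", "cm", "ese"]:
    sampling = LHS(xlimits=xlimits, criterion=criterion, random_state=1)
    x = sampling(15) # 15 samples, same as the examples above
    print("{}: phi_p = {}".format(criterion, sampling._PhiP(x)))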