# Sampling Plans#

This section provides implementation for concepts related to sampling plans. Following topics are covered in this section:

Full Factorial Sampling

Latin Hypercube Sampling

You need to install smt which is a python-based surrogate modeling toolbox. It provides various surrogate modeling techniques, along with different sampling plans. Activate the environment you created in anaconda prompt and install smt usingNOTE:`pip install smt`

.

Below block of code imports required packages:

```
import numpy as np
import matplotlib.pyplot as plt
from smt.sampling_methods import FullFactorial, LHS
```

## Full Factorial Sampling#

First, we will look at full factorial sampling. You can read about how to use `smt`

for generating full-factorial samples in the documentation. Below block of code generates full-factorial samples for *two* dimensional design space and plots it. Read comments for more details.

NOTE:When generating full-factorial samples, the number of samples grows exponentially with the number of input parameters. For example, if you have 3 input parameters, each with 3 levels, then the total number of samples will be 3^3 = 27. This can quickly become infeasible for large number of input parameters or levels.

```
xlimits = np.array([[-5.0, 10.0], [0.0, 15.0]]) # limits for x and y
sampling = FullFactorial(xlimits=xlimits) # Instantiating the FullFactorial class
num = 49 # Number of samples to generate, each dimension will have 7 levels
x = sampling(num) # Generating DOE
fig, ax = plt.subplots()
ax.plot(x[:, 0], x[:, 1], "o")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.grid()
```

As you can see, samples are generated in a grid-like fashion.

NOTE:Typically, the number of samples generated by full-factorial samples is a square number such as 4, 9, 16, 25, etc. However, it is not necessary to generate perfect square samples. For example, you can change`num`

to 46 and you will see following result:

```
num = 46
x = sampling(num) # Generating DOE
fig, ax = plt.subplots()
ax.plot(x[:, 0], x[:, 1], "o")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.grid()
```

Now, you can see that samples generated are not in a perfect grid-like fashion since `num`

is not a square number.

## Latin Hypercube Sampling#

Now, we will look into latin hypercube sampling. You can read about how to use `smt`

for generating LH samples in the documentation. The process to generate samples is similar to what is described for full-factorial samples but with one additional step. You need to set criterion based on which samples are distributed in design space. Options of interest are:

`center`

or`c`

(default) - center of the strata`maximin`

or`m`

- maximize the minimum distance between samples (based on heuristics, no optimization is performed)`centermaximin`

or`cm`

- combination of`c`

and`maximin`

`ese`

- minimizing the \(\phi_p\) metric (described below) using enhanced stochastic evolution algorithm

Goodness of a sampling plan can be measured using an extension of maximin distance criterion proposed by Morris et al. (1995). For a sampling plan, all the pairwise distance between samples are calculated and sorted to get a distance list (\(d_1\), \(d_2\), …, \(d_s\)) and index list (\(J_1\), \(J_2\), …, \(J_s\)), where \(d_i\)’s are distinct distance values with \(d_1 < d_2 < \dots < d_s\), \(J_i\)’s are the number of pairs of samples with distance \(d_i\). Then, maximin distance criterion by Morris et al. is calculated using the following formula:

where, \(p\) is a positive integer. Smaller the value of \(\phi_p\), more space-filling the sampling plan.

Below block of code generates LH samples for *two* dimensional design space, plots it, and computes \(\phi_p\) value. Read comments for more details.

```
xlimits = np.array([[-5.0, 10.0], [0.0, 15.0]]) # limits for x and y
criterion = "c" # Sampling criterion
random_state = 1 # Fixing random state for reproducibility and comparison
sampling = LHS(xlimits=xlimits, criterion=criterion, random_state=random_state) # Instantiating the LHS class
num = 15 # Number of samples to generate, each dimension will be divided into 15 strata
x = sampling(num) # Generating DOE
phi_p = sampling._PhiP(x) # Calculating the phi_p value
print("phi_p value: {}".format(phi_p))
fig, ax = plt.subplots()
ax.plot(x[:, 0], x[:, 1], "o")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_xlim(-5,10)
ax.set_ylim(0,15)
plt.xticks(np.linspace(-5, 10, num+1))
plt.yticks(np.linspace(0, 15, num+1))
ax.grid()
```

```
phi_p value: 0.7900654411297857
```

Here, the samples are distributed in a more random fashion as compared to full-factorial samples. \(\phi_p\) for the sampling plan is 0.79. In the next block, criterion is set to `cm`

:

```
criterion = "cm" # Sampling criterion
random_state = 1 # Fixing random state for comparison
sampling = LHS(xlimits=xlimits, criterion=criterion, random_state=random_state) # Instantiating the LHS class
num = 15 # Number of samples to generate, each dimension will be divided into 15 strata
x = sampling(num) # Generating DOE
phi_p = sampling._PhiP(x) # Calculating the phi_p value
print("phi_p value: {}".format(phi_p))
fig, ax = plt.subplots()
ax.plot(x[:, 0], x[:, 1], "o")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_xlim(-5,10)
ax.set_ylim(0,15)
plt.xticks(np.linspace(-5, 10, num+1))
plt.yticks(np.linspace(0, 15, num+1))
ax.grid()
```

```
phi_p value: 0.7094603158576284
```

Now, the samples are spread out more since `cm`

criterion is using heuristics to maximize the minimum distance between samples. Accordingly, the \(\phi_p\) value is reduced to 0.71. In the next block, criterion is set to `ese`

:

```
criterion = "ese" # Sampling criterion
random_state = 1 # Fixing random state for comparison
sampling = LHS(xlimits=xlimits, criterion=criterion, random_state=random_state) # Instantiating the LHS class
num = 15 # Number of samples to generate, each dimension will be divided into 15 strata
x = sampling(num) # Generating DOE
phi_p = sampling._PhiP(x) # Calculating the phi_p value
print("phi_p value: {}".format(phi_p))
fig, ax = plt.subplots()
ax.plot(x[:, 0], x[:, 1], "o")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_xlim(-5,10)
ax.set_ylim(0,15)
plt.xticks(np.linspace(-5, 10, num+1))
plt.yticks(np.linspace(0, 15, num+1))
ax.grid()
```

```
phi_p value: 0.3453512854915221
```

Now, the samples are even more spread out as compared to `cm`

criterion. The \(\phi_p\) value is reduced to 0.35 from 0.71 for `cm`

criterion which is a significant improvement. You can read more about how `ese`

is used to minimize \(\phi_p\) in this paper.