Abstract
Introduction
The identification and graphical representation of process design space are critical in locating not only feasible but also optimum operating variable ranges and design configurations. In this work, the mapping of the design space of pharmaceutical processes is achieved using the ideas of process operability and flexibility under uncertainty.
Methods
For this purpose, three approaches are proposed, based on different data-driven modeling techniques: response surface methodology, high-dimensional model representation, and kriging. Using these approaches, models that describe the behavior of the process at different design configurations are generated solely from experimental data. The models are then used in mixed-integer nonlinear programming formulations, where the optimum designs are identified for different combinations of input parameters within the operating parameter and material property ranges.
Results
Based on this idea, once a desirable output range is defined, the corresponding range of input variables that results in acceptable performance can be accurately calculated and graphically represented.
Conclusions
The main advantages of the methodologies used in this work are, first, that they are not restricted by the lack of first-principles models describing the investigated process and, second, that the models developed are computationally inexpensive. This work can also serve as a comparative analysis of different surrogate-based methodologies for identifying the design space of pharmaceutical processes.
Acknowledgments
This work was supported by the ERC-SOPS (NSF-0504497, NSF-ECC 0540855). Special thanks also go to Bill Englisch and Aditya Vanarase for providing the experimental data.
Appendices
Appendix 1—Equipment Specifications
Gericke GCM 250 Mixer Specifications
Length: 0.3 m
Diameter: 0.1 m
Impeller: 12 equally spaced, triangular-shaped blades
Gericke Loss-in-Weight Feeder Specifications
The following picture shows (from left to right) the main agitator piece that is used for all screws, the size 2 screw-dependent agitator piece, the open helix screw, and the closed auger screw, followed by the size 3 and size 4 sets.
The following is a picture of the feeder with the tooling before the nozzle plate is attached.
Appendix 2—Experimental Data and Analysis of Variance (ANOVA)
Appendix 3—Interpolation Techniques

A. High-Dimensional Model Representation Algorithm
The HDMR algorithm steps are the following:

1. Obtain input-output data \( g\left( {x_1, x_2, \ldots, x_n} \right) \) for n input variables (Appendix 2).

2. Choose nominal point conditions \( {x^N} = \left( {x_1^N, \ldots, x_n^N} \right) \) to calculate the \( f_0 \) term as:
\[ f_0 = g\left( {x^N} \right) \]

3. Calculate the first-order component function values \( f_i\left( {x_i} \right) \) by varying only \( x_i \) while keeping all other input variables at their nominal values, and always subtracting \( f_0 \). The number of component function values computed equals the number of experimentally sampled conditions along the \( x_i \) axis.

4. Calculate the second-order component function values \( f_{i,j}\left( {x_i, x_j} \right) \) by keeping all variables other than \( x_i \) and \( x_j \) at their nominal values and subtracting the corresponding lower-order function values, based on the following equation:
\[ f_{i,j}\left( {x_i, x_j} \right) = g\left( {x_1^N, \ldots, x_i, \ldots, x_j, \ldots, x_n^N} \right) - f_i\left( {x_i} \right) - f_j\left( {x_j} \right) - f_0 \]

5. Create a lookup table of the component function values at the sampled values of the input variables. Once the lookup tables are created, the HDMR-predicted output is calculated as the sum of the component functions up to the second-order terms (Eq. 3A4). To predict the output at an unsampled position, each component function is linearly interpolated, and the interpolated terms are summed according to Eq. 3A4.
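The HDMR steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the callable `g` stands in for a lookup of the experimental data, the function names (`build_cut_hdmr`, `predict_cut_hdmr`, `_bilinear`) and the test function are hypothetical, and bilinear interpolation is assumed for the second-order tables, since the text only states that linear interpolation is applied to each component function.

```python
import numpy as np

def build_cut_hdmr(g, grids, nominal):
    """Tabulate f0, first-order and second-order terms on the sampled grids."""
    nominal = np.asarray(nominal, dtype=float)
    n = len(grids)
    f0 = g(nominal)  # step 2: output at the nominal point

    # Step 3: vary x_i alone, hold the rest at nominal, subtract f0.
    f1 = []
    for i in range(n):
        vals = []
        for xi in grids[i]:
            x = nominal.copy()
            x[i] = xi
            vals.append(g(x) - f0)
        f1.append(np.array(vals))

    # Step 4: vary x_i and x_j together, subtract f_i, f_j and f0.
    f2 = {}
    for i in range(n):
        for j in range(i + 1, n):
            table = np.empty((len(grids[i]), len(grids[j])))
            for a, xi in enumerate(grids[i]):
                for b, xj in enumerate(grids[j]):
                    x = nominal.copy()
                    x[i], x[j] = xi, xj
                    table[a, b] = g(x) - f1[i][a] - f1[j][b] - f0
            f2[(i, j)] = table
    return f0, f1, f2

def _bilinear(gx, gy, table, x, y):
    """Bilinear interpolation in a 2-D lookup table, clamped to the grid edges."""
    a = int(np.clip(np.searchsorted(gx, x) - 1, 0, len(gx) - 2))
    b = int(np.clip(np.searchsorted(gy, y) - 1, 0, len(gy) - 2))
    tx = np.clip((x - gx[a]) / (gx[a + 1] - gx[a]), 0.0, 1.0)
    ty = np.clip((y - gy[b]) / (gy[b + 1] - gy[b]), 0.0, 1.0)
    return ((1 - tx) * (1 - ty) * table[a, b] + tx * (1 - ty) * table[a + 1, b]
            + (1 - tx) * ty * table[a, b + 1] + tx * ty * table[a + 1, b + 1])

def predict_cut_hdmr(x, grids, f0, f1, f2):
    """Step 5: sum the interpolated component functions up to second order."""
    total = f0
    for i, grid in enumerate(grids):
        total += np.interp(x[i], grid, f1[i])
    for (i, j), table in f2.items():
        total += _bilinear(grids[i], grids[j], table, x[i], x[j])
    return float(total)

# Hypothetical two-variable response used only to exercise the sketch.
g = lambda x: 1.0 + 2.0 * x[0] + 3.0 * x[1] + x[0] * x[1]
grids = [np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.5, 1.0])]
nominal = [0.5, 0.5]
f0, f1, f2 = build_cut_hdmr(g, grids, nominal)
y_hat = predict_cut_hdmr([0.3, 0.7], grids, f0, f1, f2)
```

Each grid is assumed to be sorted in ascending order and to contain at least two points. Because the test response contains no terms above second order, the sketch reproduces it at unsampled points as well; for real experimental data the truncation and interpolation both introduce error.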

B. Kriging Algorithm
The kriging algorithm steps are the following:

1. Obtain input-output data \( f\left( {x_i} \right) \) for N sampling points \( x_i \) (Appendix 2).

2. For every pair of sampling points \( \left( {x_i, x_j} \right) \), calculate the distance between them as:
\[ h_{ij} = \left\| {x_i - x_j} \right\| \qquad \text{(3B1)} \]
In total there are N(N−1)/2 sampling pairs.

3. For every sampling pair, calculate the squared difference between the corresponding output values as:
\[ \left[ {f\left( {x_i} \right) - f\left( {x_j} \right)} \right]^2 \qquad \text{(3B2)} \]

4. Plot the squared differences (Eq. 3B2) against the corresponding distances (Eq. 3B1) to form a scatterplot from which the semivariogram function \( \gamma (h) \) is obtained. If needed, data smoothing is first applied to the scatterplot. The following candidate models are then fitted to the scattered data: exponential, Gaussian, spherical, linear, and power. The best model is the one with the minimum least-squares error. The semivariogram model is necessary for the computation of the kriging weights.

5. The peak value of the chosen semivariogram model is identified as \( \sigma_{{ \max }}^2 \) and is used to calculate the covariance function, which by definition is given by:
\[ \mathrm{Cov}(h) = \sigma_{{ \max }}^2 - \gamma (h) \qquad \text{(3B3)} \]

6. Based on Eq. 3B3, the covariance between any pair of sampling points \( \left( {x_i, x_j} \right) \) can be obtained. Given a new unsampled (test) point, the sampled points that will affect its predicted value are chosen first. In this algorithm, the seven sampled points closest to the test point are always chosen. The kriging prediction at the test point is expressed as a weighted sum of the output values of the chosen sampled points, with the kriging weights as coefficients (Eq. 3B4). The kriging weights depend on the covariances between pairs of sampling points, as well as on the covariances between the sampling points and the test point, and are obtained by solving the system of equations given by Eq. 5. The weights for any test point must sum to 1.

7. The final step is the calculation of the variance estimate at the test point, which is given by Eq. 6.
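The kriging steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions: the distance is taken as Euclidean, only three of the five candidate models are included, the fit uses a grid search over the range parameter with a closed-form sill instead of a general least-squares solver, no smoothing is applied, and the weight system and variance use the standard ordinary-kriging form with a Lagrange multiplier, since Eq. 5 and Eq. 6 are not reproduced here. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def pairwise_stats(X, y):
    """Steps 2-3: distance and squared output difference for all N(N-1)/2 pairs."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(X), k=1)
    return np.linalg.norm(X[i] - X[j], axis=1), (y[i] - y[j]) ** 2

# Candidate shapes rising from 0 towards 1; the sill scales them separately.
SHAPES = {
    "exponential": lambda h, a: 1.0 - np.exp(-h / a),
    "gaussian": lambda h, a: 1.0 - np.exp(-(h / a) ** 2),
    "spherical": lambda h, a: np.where(h < a, 1.5 * h / a - 0.5 * (h / a) ** 3, 1.0),
}

def fit_semivariogram(h, v, n_ranges=50):
    """Step 4: pick the model, sill and range with the smallest squared error."""
    best = None
    for a in np.linspace(0.05 * h.max(), 2.0 * h.max(), n_ranges):
        for name, shape in SHAPES.items():
            s = shape(h, a)
            sill = float(s @ v) / float(s @ s)  # least-squares sill for this shape
            sse = float(np.sum((v - sill * s) ** 2))
            if best is None or sse < best[0]:
                best = (sse, name, sill, a)
    return {"sse": best[0], "model": best[1], "sill": best[2], "range": best[3]}

def krige(x0, X, y, model, sill, a, k=7):
    """Steps 5-7: ordinary kriging prediction and variance from the k nearest samples."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    cov = lambda d: sill - sill * SHAPES[model](d, a)  # Cov(h) = sill - gamma(h)
    d0 = np.linalg.norm(X - np.asarray(x0, dtype=float), axis=1)
    idx = np.argsort(d0)[:k]  # indices of the k closest sampled points
    Xk, yk, m = X[idx], y[idx], len(idx)
    D = np.linalg.norm(Xk[:, None, :] - Xk[None, :, :], axis=2)
    A = np.ones((m + 1, m + 1))  # last row/column enforce sum(weights) = 1
    A[:m, :m] = cov(D)
    A[m, m] = 0.0
    b = np.ones(m + 1)
    b[:m] = cov(d0[idx])
    sol = np.linalg.solve(A, b)
    w, mu = sol[:m], sol[m]  # kriging weights and Lagrange multiplier
    pred = float(w @ yk)
    var = float(sill - w @ b[:m] - mu)
    return pred, var, w

# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(0)
X = rng.random((12, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2
h, v = pairwise_stats(X, y)
fit = fit_semivariogram(h, v)
pred, var, w = krige([0.5, 0.5], X, y, fit["model"], fit["sill"], fit["range"])
```

The sketch fits the raw squared differences, as described in the steps; many references define the semivariogram with a factor of one half, which rescales the sill and the variance estimate but leaves the ordinary-kriging weights unchanged. A Gaussian model with a large range can make the linear system ill-conditioned, so a small nugget term or a different model may be needed in practice.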
