
Design Space of Pharmaceutical Processes Using Data-Driven-Based Methods

  • Research Article

Journal of Pharmaceutical Innovation



The identification and graphical representation of the process design space are critical for locating not only feasible but also optimal ranges of operating variables and design configurations. In this work, the design space of pharmaceutical processes is mapped using the concepts of process operability and flexibility under uncertainty.


For this purpose, three approaches are proposed, each based on a different data-driven modeling technique: response surface methodology, high-dimensional model representation, and kriging. Using these approaches, models that describe the behavior of the process at different design configurations are generated solely from experimental data. The models are then used in mixed-integer nonlinear programming (MINLP) formulations to identify the optimal designs for different combinations of input parameters within the ranges of the operating parameters and material properties.
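Response surface methodology typically fits a second-order polynomial to the experimental data by least squares. The sketch below shows that fitting step in Python with NumPy for any number of inputs. It is only an illustration: the function names are hypothetical, and the full quadratic model (intercept, linear, two-factor interaction, and pure quadratic terms) is an assumption, since the polynomial order used in this work is not stated here.

```python
import itertools

import numpy as np


def quadratic_features(X):
    """Full second-order RSM design matrix (assumed model form):
    intercept, linear terms, two-factor interactions, pure quadratic terms."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n_samples, n_vars = X.shape
    columns = [np.ones(n_samples)]
    columns += [X[:, i] for i in range(n_vars)]
    columns += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(n_vars), 2)]
    columns += [X[:, i] ** 2 for i in range(n_vars)]
    return np.column_stack(columns)


def fit_response_surface(X, y):
    """Least-squares estimate of the quadratic response-surface coefficients."""
    A = quadratic_features(X)
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_response_surface(coef, X):
    """Evaluate the fitted quadratic surface at new input points."""
    return quadratic_features(X) @ coef
```

A three-level factorial design in each input is enough data to estimate every coefficient of this model; the fitted surface is then cheap to evaluate inside an optimization formulation.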


Based on this idea, by defining a desirable output range, the corresponding range of input variables that results in acceptable performance can be accurately calculated and graphically represented.
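As a simple illustration of this idea, the sketch below evaluates a fitted surrogate (RSM, HDMR, or kriging) on a grid of two input variables and flags the grid points whose predicted output lies in the desirable range. This brute-force grid scan is a stand-in for, not a reproduction of, the MINLP and flexibility-based formulation described above; `map_design_space` and its arguments are hypothetical names.

```python
import numpy as np


def map_design_space(surrogate, bounds, output_range, n_points=50):
    """Grid-based sketch of a two-input design space (hypothetical helper).

    surrogate    : callable mapping an (m, 2) array of inputs to m predicted outputs
    bounds       : [(low1, high1), (low2, high2)] ranges of the two input variables
    output_range : (y_min, y_max) desirable output range
    Returns the two grid axes and a boolean mask of acceptable grid points.
    """
    axes = [np.linspace(low, high, n_points) for low, high in bounds]
    g1, g2 = np.meshgrid(*axes, indexing="ij")
    inputs = np.column_stack([g1.ravel(), g2.ravel()])
    # Predicted output at every grid point, reshaped back onto the grid
    y = np.asarray(surrogate(inputs)).reshape(g1.shape)
    y_min, y_max = output_range
    feasible = (y >= y_min) & (y <= y_max)
    return axes[0], axes[1], feasible
```

The returned mask can be passed to a contour or image plot to draw the region of acceptable inputs; the grid resolution limits how sharply the boundary is resolved.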


The main advantages of the methodologies used in this work are, first, that they are not limited by the lack of first-principles models describing the investigated process and, second, that the resulting models are computationally inexpensive. This work can also serve as a comparative analysis of different surrogate-based methodologies for identifying the design space of pharmaceutical processes.


Figs. 1–9 (images not included)



Acknowledgments

This work was supported by ERC-SOPS (NSF-0504497, NSF-ECC 0540855). Special thanks also go to Bill Englisch and Aditya Vanarase for providing the experimental data.

Author information

Corresponding author

Correspondence to Marianthi G. Ierapetritou.


Appendix 1—Equipment Specifications

Gericke GCM 250 Mixer Specifications


0.3 m

0.1 m

12 equally spaced triangular blades

Gericke Loss-in-Weight Feeder Specifications

The following picture shows (from left to right) the main agitator piece used for all screws, the size 2 screw-dependent agitator piece, the open helix screw, and the closed auger screw, followed by the size 3 and size 4 sets.

The following is a picture of the feeder with the tooling before the nozzle plate is attached.

Appendix 2—Experimental Data and Analysis of Variance (ANOVA)

Table 1 Continuous mixer case study
Table 2 Source for Table 1
Table 3 Loss-in-weight feeder case study
Table 4 ANOVA table for feeder case study

Appendix 3—Interpolation Techniques

A. High-Dimensional Model Representation Algorithm

The HDMR algorithm consists of the following steps:

  1. Obtain input–output data \( g(x_1, x_2, \ldots, x_n) \) for the \( n \) input variables (Appendix 2).

  2. Choose the nominal point \( x^N = \left(x_1^N, \ldots, x_n^N\right) \) and calculate the zeroth-order term \( f_0 \) as:

$$ f_0 = g\left(x_1^N, \ldots, x_n^N\right) \tag{3A-1} $$
  3. Calculate the values of the first-order component functions \( f_i(x_i) \) by varying only \( x_i \) while keeping all other input variables at their nominal values, and subtracting \( f_0 \). The number of computed component function values equals the number of experimentally sampled conditions along the \( x_i \) axis.

$$ f_i(x_i) = g\left(x_i, x_j^N\right) - f_0, \quad j \ne i \tag{3A-2} $$
  4. Calculate the values of the second-order component functions \( f_{i,j}(x_i, x_j) \) by keeping all variables other than \( x_i \) and \( x_j \) at their nominal values and subtracting the corresponding lower-order component function values:

$$ f_{i,j}(x_i, x_j) = g\left(x_i, x_j, x_k^N\right) - f_i(x_i) - f_j(x_j) - f_0, \quad k \ne i, j \tag{3A-3} $$
  5. Store the component function values at the sampled values of the input variables in look-up tables. The HDMR prediction is then the sum of the component functions up to the second-order terms (Eq. 3A-4). To predict the output at an unsampled point, each component function is linearly interpolated from its look-up table, and the interpolated terms are summed according to Eq. 3A-4.

$$ f_{HDMR}(x_1, \ldots, x_n) = f_0 + \sum_{i=1}^{n} f_i(x_i) + \sum_{1 \le i < j \le n} f_{i,j}(x_i, x_j) \tag{3A-4} $$
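The HDMR steps above can be sketched in Python with NumPy as follows. The callable `g` is a hypothetical stand-in for the experimental response; in practice its values come from experiments run along each axis and each two-variable plane through the nominal point. The function names are illustrative, and the second-order tables are interpolated bilinearly by applying one-dimensional linear interpolation along each axis in turn.

```python
import itertools

import numpy as np


def build_cut_hdmr(g, nominal, axes):
    """Build second-order HDMR look-up tables (Steps 2 to 5).

    g       : hypothetical callable returning the output for a full input vector
    nominal : nominal point x^N, one value per input variable
    axes    : list of increasing 1-D arrays with the sampled levels of each input
    """
    nominal = np.asarray(nominal, dtype=float)
    n = len(nominal)

    def evaluate(updates):
        # Run "g" with the listed inputs changed and all others at nominal values
        x = nominal.copy()
        for idx, value in updates.items():
            x[idx] = value
        return g(x)

    f0 = evaluate({})                                     # Eq. 3A-1
    first = {i: np.array([evaluate({i: v}) - f0 for v in axes[i]])
             for i in range(n)}                           # Eq. 3A-2
    second = {}
    for i, j in itertools.combinations(range(n), 2):      # pairs with i < j
        table = np.empty((len(axes[i]), len(axes[j])))
        for a, vi in enumerate(axes[i]):
            for b, vj in enumerate(axes[j]):
                table[a, b] = (evaluate({i: vi, j: vj})
                               - first[i][a] - first[j][b] - f0)  # Eq. 3A-3
        second[(i, j)] = table
    return f0, first, second


def predict_cut_hdmr(model, axes, x):
    """Eq. 3A-4 with linear interpolation inside every look-up table."""
    f0, first, second = model
    total = f0
    for i, values in first.items():
        total += np.interp(x[i], axes[i], values)
    for (i, j), table in second.items():
        # Interpolate along x_j for every sampled x_i level, then along x_i
        along_j = np.array([np.interp(x[j], axes[j], row) for row in table])
        total += np.interp(x[i], axes[i], along_j)
    return float(total)
```

For a response with at most pairwise interactions and bilinear component functions, this second-order expansion reproduces the response at unsampled points; stronger nonlinearity or higher-order interactions introduce interpolation and truncation error.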
B. Kriging Algorithm

The kriging algorithm consists of the following steps:

  1. Obtain input–output data \( f(x_i) \) for \( N \) sampling points \( x_i \) (Appendix 2).

  2. For every pair of sampling points \( x_i, x_j \), calculate the Euclidean distance between them:

$$ d_{i,j} = \sqrt{\left(x_{1,i} - x_{1,j}\right)^2 + \cdots + \left(x_{n,i} - x_{n,j}\right)^2}, \quad i, j = 1, \ldots, N, \; i < j \tag{3B-1} $$

In total there are \( N(N-1)/2 \) distinct sampling pairs \( (x_i, x_j) \).

  3. For every sampling pair, calculate the squared difference between the corresponding output values:

$$ f_{i,j}^2 = \left(f(x_i) - f(x_j)\right)^2 \tag{3B-2} $$
  4. Plot the squared differences (Eq. 3B-2) against the corresponding distances (Eq. 3B-1). The resulting scatterplot is used to fit the semivariogram function \( \gamma(h) \), which is required to compute the kriging weights. If needed, the scatterplot is smoothed before fitting. The candidate models are exponential, Gaussian, spherical, linear, and power, and the model with the smallest least-squares error is selected.

  5. The maximum value (sill) of the chosen semivariogram model is denoted \( \sigma_{\max}^2 \) and is used to calculate the covariance function, which by definition is:

$$ \mathrm{Cov}(h) = \sigma_{\max}^2 - \gamma(h) \tag{3B-3} $$
  6. Using Eq. 3B-3, the covariance between any pair of sampling points \( x_i, x_j \) can be obtained. For a new, unsampled test point \( x_k \), the sampled points that will influence its prediction are selected first; this algorithm always uses the seven sampled points closest to the test point. The kriging prediction at the test point is the sum of the output values at these points, each multiplied by its kriging weight \( w_i \) (Eq. 3B-4). The weights depend on the covariances among the sampling points and between the sampling points and the test point, and are obtained by solving the system of equations given by Eq. 5. For any test point, the weights must sum to 1.

$$ \tilde{f}(x_k) = \sum_{i=1}^{7} w_i f(x_i) \tag{3B-4} $$
  7. Finally, calculate the estimated variance at the test point, which is given by Eq. 6.
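The kriging steps above can be sketched in Python with NumPy under several stated assumptions. Only two of the five candidate semivariogram models (exponential and spherical) are included, and each is fitted by a simple grid search over its range parameter with a closed-form least-squares sill, which is then used as \( \sigma_{\max}^2 \). Half the squared differences are used, the usual semivariance convention; this rescales the covariance but leaves the kriging weights unchanged. Because Eq. 5 and Eq. 6 are not reproduced in this appendix, the weights and variance follow the standard ordinary-kriging system, which may differ in detail from the paper's equations:

$$ \sum_{j=1}^{7} w_j \,\mathrm{Cov}(x_i, x_j) + \mu = \mathrm{Cov}(x_i, x_k), \quad \sum_{j=1}^{7} w_j = 1, \qquad \sigma^2(x_k) = \sigma_{\max}^2 - \sum_{i=1}^{7} w_i \,\mathrm{Cov}(x_i, x_k) - \mu $$

where \( \mu \) is a Lagrange multiplier. All function names are hypothetical.

```python
import numpy as np


def _exponential(h, r):
    return 1.0 - np.exp(-h / r)


def _spherical(h, r):
    u = np.minimum(h / r, 1.0)
    return 1.5 * u - 0.5 * u ** 3


# Assumed subset of the candidate models; gamma(h) = sill * shape(h, r)
MODELS = {"exponential": _exponential, "spherical": _spherical}


def fit_semivariogram(X, y):
    """Steps 2 to 4: pairwise distances, half squared differences, and a
    least-squares fit over a grid of range values r for each candidate model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(X), k=1)          # the N(N-1)/2 distinct pairs
    h = np.linalg.norm(X[i] - X[j], axis=1)      # Eq. 3B-1
    semivar = 0.5 * (y[i] - y[j]) ** 2           # half of Eq. 3B-2
    best = None
    for name, shape in MODELS.items():
        for r in np.linspace(h.max() / 50, 2 * h.max(), 100):
            phi = shape(h, r)
            sill = phi @ semivar / (phi @ phi)   # closed-form least-squares sill
            sse = np.sum((semivar - sill * phi) ** 2)
            if best is None or sse < best[0]:
                best = (sse, name, r, sill)
    _, name, r, sill = best
    return name, r, sill


def covariance(h, model):
    """Eq. 3B-3: Cov(h) = sigma_max^2 - gamma(h), with sigma_max^2 = fitted sill."""
    name, r, sill = model
    return sill - sill * MODELS[name](h, r)


def krige(X, y, model, x_new, n_neighbors=7):
    """Steps 6 and 7: ordinary-kriging prediction, variance, and weights."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    nearest = np.argsort(np.linalg.norm(X - x_new, axis=1))[:n_neighbors]
    Xn, yn = X[nearest], y[nearest]
    k = len(nearest)
    # Bordered system [C 1; 1^T 0] [w; mu] = [c0; 1]
    A = np.ones((k + 1, k + 1))
    A[:k, :k] = covariance(np.linalg.norm(Xn[:, None, :] - Xn[None, :, :], axis=2), model)
    A[k, k] = 0.0
    c0 = covariance(np.linalg.norm(Xn - x_new, axis=1), model)
    solution = np.linalg.solve(A, np.append(c0, 1.0))
    w, mu = solution[:k], solution[k]
    prediction = w @ yn                          # Eq. 3B-4
    variance = model[2] - w @ c0 - mu
    return prediction, variance, w
```

At a sampled point the prediction reproduces the measured value with zero variance, since ordinary kriging without a nugget term is an exact interpolator; away from the data the variance grows toward the sill.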

Cite this article

Boukouvala, F., Muzzio, F.J. & Ierapetritou, M.G. Design Space of Pharmaceutical Processes Using Data-Driven-Based Methods. J Pharm Innov 5, 119–137 (2010).
