
Bayesian optimization using deep Gaussian processes with applications to aerospace system design


Bayesian Optimization using Gaussian Processes is a popular approach for optimization problems involving expensive black-box functions. However, because classic Gaussian Processes assume a stationary covariance function, this method may be poorly suited to the non-stationary functions that arise in some optimization problems. To overcome this issue, Deep Gaussian Processes can be used as surrogate models instead of classic Gaussian Processes. This modeling technique increases representational power and captures non-stationarity by composing stationary Gaussian Processes in a multi-layer structure. This paper investigates the application of Deep Gaussian Processes within the Bayesian Optimization context. The specificities of this optimization method are discussed and highlighted with academic test cases. The performance of Bayesian Optimization with Deep Gaussian Processes is assessed on analytical test cases and aerospace design optimization problems and compared to state-of-the-art stationary and non-stationary Bayesian Optimization approaches.




  • Amari S-I, Douglas SC (1998) Why natural gradient? In: Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing, ICASSP’98 (Cat. No. 98CH36181), vol 2. IEEE, pp 1213–1216

  • Amine Bouhlel M, Bartoli N, Regis RG, Otsmane A, Morlier J (2018) Efficient global optimization for high-dimensional constrained problems by using the kriging models combined with the partial least squares method. Eng Optim 50(12):2038–2053


  • Atkinson PM, Lloyd CD (2007) Non-stationary variogram models for geostatistical sampling optimisation: an empirical investigation using elevation data. Comput Geosci 33(10):1285–1300


  • Audet C, Dennis J, Moore D, Booker A, Frank P (2000) A surrogate-model-based method for constrained optimization. In: 8th symposium on multidisciplinary analysis and optimization, p 4891

  • Bartoli N, Lefebvre T, Dubreuil S, Olivanti R, Priem R, Bons N, Martins JRRA, Morlier J (2019) Adaptive modeling strategy for constrained global optimization with application to aerodynamic wing design. Aerosp Sci Technol 90:85–102


  • Basu K, Ghosh S (2017) Analysis of Thompson sampling for Gaussian process optimization in the bandit setting. arXiv preprint arXiv:1705.06808

  • Breiman L (2017) Classification and regression trees. Routledge, Abingdon


  • Bui T, Hernández-Lobato D, Hernández-Lobato J, Li Y, Turner R (2016) Deep Gaussian processes for regression using approximate expectation propagation. In: International conference on machine learning, pp 1472–1481

  • Cordery I, Yao SL (1993) Non stationarity of phenomena related to drought. Extreme hydrological events. In: Proceedings of the international symposium, Yokohama, 1993

  • Cox DD, John S (1997) SDO: a statistical method for global optimization. In: Multidisciplinary design optimization: state-of-the-art, pp 315–329

  • Dai Z, Damianou A, González J, Lawrence N (2015) Variational auto-encoded deep Gaussian processes. arXiv preprint arXiv:1511.06455

  • Damianou A, Lawrence N (2013) Deep Gaussian processes. In: Artificial intelligence and statistics, pp 207–215

  • de G Matthews AG, van der Wilk M, Nickson T, Fujii K, Boukouvalas A, León-Villagrá P, Ghahramani Z, Hensman J (2017) GPflow: a Gaussian process library using TensorFlow. J Mach Learn Res 18(40):1–6


  • Forrester A, Sobester A, Keane A (2008) Engineering design via surrogate modelling: a practical guide. Wiley, New York


  • Frazier PI (2018) A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811

  • Garg S, Singh A, Ramos F (2012) Learning non-stationary space-time models for environmental monitoring. In: Twenty-sixth AAAI conference on artificial intelligence, Toronto

  • Gibbs MN (1998) Bayesian Gaussian processes for regression and classification. PhD thesis, University of Cambridge

  • Gramacy RB, Apley DW (2015) Local Gaussian process approximation for large computer experiments. J Comput Gr Stat 24(2):561–578


  • Gramacy RB, Lee HKH (2008) Bayesian treed Gaussian process models with an application to computer modeling. J Am Stat Assoc 103(483):1119–1130


  • Gray JS, Hwang JT, Martins JRRA, Moore KT, Naylor BA (2019) OpenMDAO: an open-source framework for multidisciplinary design, analysis, and optimization. Struct Multidiscip Optim 59:1075–1104


  • Haas TC (1990) Kriging and automated variogram modeling within a moving window. Atmos Environ Part A Gen Top 24(7):1759–1769


  • Havasi M, Hernández-Lobato JM, Murillo-Fuentes JJ (2018) Inference in deep Gaussian processes using stochastic gradient Hamiltonian Monte Carlo. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems, vol 31. Curran Associates Inc, New York, pp 7506–7516


  • Hensman J, Fusi N, Lawrence ND (2013) Gaussian processes for big data. arXiv preprint arXiv:1309.6835

  • Hernández-Lobato JM, Hoffman MW, Ghahramani Z (2014) Predictive entropy search for efficient global optimization of black-box functions. In: Advances in neural information processing systems, pp 918–926

  • Higdon D, Swall J, Kern J (1999) Non-stationary spatial modeling. Bayesian Stat 6(1):761–768


  • Hoffman MD, Brochu E, de Freitas N (2011) Portfolio allocation for Bayesian optimization. In: UAI. Citeseer, pp 327–336

  • Huang W, Zhao D, Sun F, Liu H, Chang E (2015) Scalable Gaussian process regression using deep neural networks. In: Twenty-fourth international joint conference on artificial intelligence

  • Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J Glob Optim 13(4):455–492


  • Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980

  • Konda S (2006) Fitting models of nonstationary time series: an application to EEG data. PhD thesis, Case Western Reserve University

  • Krityakierne T, Ginsbourger D (2015) Global optimization with sparse and local Gaussian process models. In: International workshop on machine learning, optimization and big data. Springer, Berlin, pp 185–196

  • LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436


  • Marmin S, Ginsbourger D, Baccou J, Liandrat J (2018) Warped Gaussian processes and derivative-based sequential designs for functions with heterogeneous variations. SIAM/ASA J Uncertain Quantif 6(3):991–1018


  • Milly PCD, Betancourt J, Falkenmark M, Hirsch RM, Kundzewicz ZW, Lettenmaier DP, Stouffer RJ (2008) Stationarity is dead: Whither water management? Science 319(5863):573–574


  • Močkus J (1975) On Bayesian methods for seeking the extremum. In: Optimization techniques IFIP technical conference. Springer, Berlin, pp 400–404

  • Paciorek CJ, Schervish MJ (2006) Spatial modelling using a new class of nonstationary covariance functions. Environmetrics 17(5):483–506


  • Papoulis A, Unnikrishna P (1991) Probability, random variables and stochastic processes. Tata McGraw-Hill Education, New York


  • Parr JM, Keane AJ, Forrester AIJ, Holden CME (2012) Infill sampling criteria for surrogate-based optimization with constraint handling. Eng Optim 44(10):1147–1166


  • Picheny V, Wagner T, Ginsbourger D (2013) A benchmark of kriging-based infill criteria for noisy optimization. Struct Multidiscip Optim 48(3):607–626


  • Picheny V, Gramacy RB, Wild S, Le Digabel S (2016) Bayesian optimization under mixed constraints with a slack-variable augmented Lagrangian. In: Advances in neural information processing systems, pp 1435–1443

  • Powell MJD (2009) The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06, University of Cambridge, Cambridge, pp 26–46

  • Powell MJD (2003) On trust region methods for unconstrained minimization without derivatives. Math Program 97(3):605–623


  • Priem R, Bartoli N, Diouane Y (2019) On the use of upper trust bounds in constrained Bayesian optimization infill criteria. In: AIAA aviation 2019 forum, p 2986

  • Qin AK, Huang VL, Suganthan PN (2009) Differential evolution algorithm with strategy adaptation for global numerical optimization. IEEE Trans Evol Comput 13(2):398–417


  • Rasmussen CE, Ghahramani Z (2002) Infinite mixtures of Gaussian process experts. In: Advances in neural information processing systems, pp 881–888

  • Rasmussen C, Williams CKI (2006) Gaussian processes for machine learning, vol 1. MIT Press, Cambridge


  • Remes S, Heinonen M, Kaski S (2017) Non-stationary spectral kernels. In: Advances in neural information processing systems, pp 4642–4651

  • Salimbeni H, Deisenroth M (2017) Doubly stochastic variational inference for deep Gaussian processes. In: Advances in neural information processing systems, pp 4588–4599

  • Salimbeni H, Eleftheriadis S, Hensman J (2018) Natural gradients in practice: non-conjugate variational inference in Gaussian process models. In: Artificial intelligence and statistics

  • Sampson P, Guttorp PD (1992) Nonparametric estimation of nonstationary spatial covariance structure. J Am Stat Assoc 87(417):108–119


  • Sasena MJ (2002) Flexibility and efficiency enhancements for constrained global design optimization with kriging approximations. PhD thesis, University of Michigan Ann Arbor, MI

  • Sasena MJ, Papalambros PY, Goovaerts P (2001) The use of surrogate modeling algorithms to exploit disparities in function computation time within simulation-based optimization. Constraints 2:5


  • Schonlau M, Welch WJ, Jones D (1996) Global optimization with nonparametric function fitting. In: Proceedings of the ASA, section on physical and engineering sciences, pp 183–186

  • Shahriari B, Swersky K, Wang Z, Adams RP, De Freitas N (2015) Taking the human out of the loop: a review of Bayesian optimization. Proc IEEE 104(1):148–175


  • Shahriari B, Wang Z, Hoffman MW, Bouchard-Côté A, de Freitas N (2014) An entropy search portfolio for Bayesian optimization. arXiv preprint arXiv:1406.4625

  • Snelson E, Ghahramani Z (2006) Sparse Gaussian processes using pseudo-inputs. In: Advances in neural information processing systems, pp 1257–1264

  • Snoek J, Rippel O, Swersky K, Kiros R, Satish N, Sundaram N, Patwary M, Prabhat MR, Adams R (2015) Scalable Bayesian optimization using deep neural networks. In: International conference on machine learning, pp 2171–2180

  • Snoek J, Swersky K, Zemel R, Adams R (2014) Input warping for Bayesian optimization of non-stationary functions. In: International conference on machine learning, pp 1674–1682

  • Titsias M (2009) Variational learning of inducing variables in sparse Gaussian processes. In: Artificial intelligence and statistics, pp 567–574

  • Toal DJJ, Keane AJ (2012) Non-stationary kriging for design optimization. Eng Optim 44(6):741–765


  • Viana FAC, Haftka RT, Watson LT (2013) Efficient global optimization algorithm assisted by multiple surrogate techniques. J Glob Optim 56(2):669–689


  • Vidakovic B (2009) Statistical modeling by wavelets, vol 503. Wiley, New York


  • Wang G, Shan S (2007) Review of metamodeling techniques in support of engineering design optimization. J Mech Des 129(4):370–380


  • Watson AG, Barnes RJ (1995) Infill sampling criteria to locate extremes. Math Geol 27(5):589–608


  • Wild SM, Regis RG, Shoemaker CA (2008) ORBIT: optimization by radial basis function interpolation in trust-regions. SIAM J Sci Comput 30(6):3197–3219


  • Xiong Y, Chen W, Apley D, Ding X (2007) A non-stationary covariance-based kriging method for metamodelling in engineering design. Int J Numer Methods Eng 71(6):733–756




This work is co-funded by ONERA-The French Aerospace Lab and Université de Lille, in the context of a joint PhD thesis. The authors thank Hugh Salimbeni and Zhenwen Dai for very helpful discussions. The experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations (see

Author information



Corresponding author

Correspondence to Ali Hebbal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix A: Functions

Modified Xiong function (Fig. 22):

$$\begin{aligned} f(x)=-0.5\left( \sin \left( 40(x-0.85)^4\right) \cos \left( 2.5(x-0.95)\right) +0.5(x-0.9)+1\right) , \quad x \in [0,1] \end{aligned}$$
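
The expression above can be evaluated directly. The following is a minimal sketch, not the authors' code; the function name `modified_xiong` and the grid are illustrative assumptions. It transcribes the formula term by term.

```python
import math


def modified_xiong(x: float) -> float:
    """Modified Xiong function on [0, 1], transcribed from Appendix A."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    # Oscillatory part: the inner argument 40 (x - 0.85)^4 changes fast
    # far from x = 0.85 and slowly close to it.
    oscillation = math.sin(40.0 * (x - 0.85) ** 4) * math.cos(2.5 * (x - 0.95))
    # Linear trend plus offset.
    trend = 0.5 * (x - 0.9) + 1.0
    return -0.5 * (oscillation + trend)


# Evaluate on a coarse grid (hypothetical choice of 11 points).
grid = [i / 10 for i in range(11)]
values = [modified_xiong(x) for x in grid]
```

Because the derivative of the inner argument, 160(x-0.85)^3, is large in magnitude near x = 0 and vanishes at x = 0.85, the function oscillates quickly at one end of the domain and slowly near the other, which is the kind of non-stationary behaviour the test case is meant to exhibit.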

Modified TNK constraint function (Fig. 23):

$$\begin{aligned} f(\mathbf{x})=1.6(x_0-0.6)^2+1.6(x_1-0.6)^2-0.2\cos \left( 20\arctan \left( \frac{0.3x_0}{x_1+10^{-8}}\right) \right) -0.4, \quad \mathbf{x}\in [0,1]\times [0,1] \end{aligned}$$

Fig. 22 Modified Xiong function

Fig. 23 Modified TNK constraint
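
The modified TNK constraint can be transcribed in the same way. This is a sketch under stated assumptions: the function name `modified_tnk` is hypothetical, and the sign convention that defines feasibility is not given in this appendix, so the code only returns the constraint value.

```python
import math


def modified_tnk(x0: float, x1: float) -> float:
    """Modified TNK constraint value on [0, 1] x [0, 1] (Appendix A)."""
    # Quadratic bowl centred at (0.6, 0.6).
    bowl = 1.6 * (x0 - 0.6) ** 2 + 1.6 * (x1 - 0.6) ** 2
    # The 1e-8 offset in the denominator avoids a division by zero at x1 = 0.
    angle = math.atan(0.3 * x0 / (x1 + 1e-8))
    # Angular ripple superimposed on the bowl.
    return bowl - 0.2 * math.cos(20.0 * angle) - 0.4


# Example evaluation at the centre of the bowl.
centre_value = modified_tnk(0.6, 0.6)
```

On the edge x_0 = 0 the arctangent term vanishes, so the value reduces to 1.6(0.36) + 1.6(x_1-0.6)^2 - 0.6, which gives a quick hand check of the implementation.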

10d Trid function (Fig. 24):

$$\begin{aligned} f(\mathbf{x})=\sum _{i=1}^{10} (x_i-1)^2-\sum _{i=2}^{10}x_ix_{i-1}, \quad x_i \in [-100,100], \ \forall i=1,\dots ,10 \end{aligned}$$
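
A short sketch of the Trid function follows. The helper names are illustrative, and the closed-form minimizer and minimum value used for checking are the standard results for this benchmark; they are not stated in this appendix.

```python
def trid(x):
    """Trid function: sum of squared offsets minus products of neighbours."""
    squares = sum((xi - 1.0) ** 2 for xi in x)
    cross = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    return squares - cross


def trid_minimizer(d):
    """Known minimizer of the d-dimensional Trid function: x_i = i (d + 1 - i)."""
    return [float(i * (d + 1 - i)) for i in range(1, d + 1)]


def trid_minimum(d):
    """Known minimum value: -d (d + 4) (d - 1) / 6."""
    return -d * (d + 4) * (d - 1) / 6.0


x_star = trid_minimizer(10)   # [10, 18, 24, 28, 30, 30, 28, 24, 18, 10]
f_star = trid(x_star)         # equals trid_minimum(10) = -210
```

For d = 10 every coordinate of the minimizer lies inside the stated bounds [-100, 100], so the bound constraints are inactive at the optimum.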

Hartmann-6d function (Fig. 25):

$$\begin{aligned} f(\mathbf{x})= \sum _{i=1}^4 \alpha _i \exp \left( -\sum _{j=1}^6 A_{ij} (x_j-P_{ij})^2 \right) , \quad x_j\in [0,1], \ \forall j=1,\dots ,6 \end{aligned}$$


$$\begin{aligned} \alpha =[1,1.2,3,3.2]^\top \end{aligned}$$


$$\begin{aligned} P=10^{-4} \begin{bmatrix} 1312 & 1696 & 5569 & 124 & 8283 & 5886\\ 2329 & 4135 & 8307 & 3736 & 1004 & 9991\\ 2348 & 1451 & 3522 & 2883 & 3047 & 6650\\ 4047 & 8828 & 8732 & 5743 & 1091 & 381 \end{bmatrix} \end{aligned}$$


$$\begin{aligned} A= \begin{bmatrix} 10 & 3 & 17 & 3.5 & 1.7 & 8\\ 0.05 & 10 & 17 & 0.1 & 8 & 14\\ 3 & 3.5 & 1.7 & 10 & 17 & 8\\ 17 & 8 & 0.05 & 10 & 0.1 & 14 \end{bmatrix} \end{aligned}$$

Fig. 24 Sectional 2d view of the Trid function showing where the global minimum lies

Fig. 25 Sectional 2d view of the Hartmann-6d function showing where the global minimum lies
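
The Hartmann-6d definition, with its vector α and matrices A and P, can be sketched as below. The function name `hartmann6_sum` is hypothetical. The code returns the weighted sum exactly as printed in this appendix; the commonly used form of this benchmark puts a minus sign in front of the sum, and since the intended sign is not stated here, negating the result is left to the caller.

```python
import math

ALPHA = [1.0, 1.2, 3.0, 3.2]
A = [
    [10.0, 3.0, 17.0, 3.5, 1.7, 8.0],
    [0.05, 10.0, 17.0, 0.1, 8.0, 14.0],
    [3.0, 3.5, 1.7, 10.0, 17.0, 8.0],
    [17.0, 8.0, 0.05, 10.0, 0.1, 14.0],
]
# P is given in the appendix as 1e-4 times an integer matrix.
P = [[1e-4 * entry for entry in row] for row in [
    [1312, 1696, 5569, 124, 8283, 5886],
    [2329, 4135, 8307, 3736, 1004, 9991],
    [2348, 1451, 3522, 2883, 3047, 6650],
    [4047, 8828, 8732, 5743, 1091, 381],
]]


def hartmann6_sum(x):
    """Weighted sum of four anisotropic Gaussian bumps, as printed in Appendix A."""
    if len(x) != 6:
        raise ValueError("x must have 6 components")
    total = 0.0
    for alpha_i, a_row, p_row in zip(ALPHA, A, P):
        exponent = sum(a * (xj - p) ** 2 for a, xj, p in zip(a_row, x, p_row))
        total += alpha_i * math.exp(-exponent)
    return total


# Commonly reported optimizer of the benchmark (not stated in this appendix).
x_ref = [0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573]
value_at_ref = hartmann6_sum(x_ref)   # about 3.322, i.e. -3.322 with the usual sign
```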

Appendix B: Experimental setup

  • All experiments were executed on Grid’5000 using a Tesla P100 GPU. The code is based on GPflow (de G Matthews et al. 2017) and Doubly-Stochastic-DGP (Salimbeni and Deisenroth 2017).

  • For all DGPs, RBF kernels are used, with the length-scale and variance initialized to 1 unless they are initialized from a previous DGP. The data are scaled to have zero mean and unit variance.

  • The Adam optimizer is set with \(\beta _1=0.8\) and \(\beta _2=0.9\) and a step size \(\gamma ^{adam}=0.01\).

  • The natural gradient step size is initialized for all layers at \(\gamma ^{nat}=0.1\).

  • For BO with DGP, the number of successive updates before optimizing from scratch is 5.

  • The infill criteria are optimized using a parallel differential evolution algorithm with a population of 400 and 100 generations.

  • A GitHub repository featuring the BO & DGP algorithm will be made available after the publication of the paper.
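
Two of the settings listed above, the data scaling and the Adam hyperparameters, can be illustrated with a short standard-library sketch. This is not the GPflow-based code used for the experiments: the function names, the scalar toy objective, the use of the population variance, and the value `eps=1e-8` (the default suggested by Kingma and Ba 2014) are assumptions made for illustration.

```python
import math


def standardize(values):
    """Scale a 1-D sample to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    # Population variance (divide by n); the appendix does not say which
    # variance estimator was used, so this is an assumption.
    std = math.sqrt(sum((val - mean) ** 2 for val in values) / n)
    return [(val - mean) / std for val in values], mean, std


def adam_step(param, grad, m, v, t, step=0.01, beta1=0.8, beta2=0.9, eps=1e-8):
    """One Adam update of a scalar parameter with the settings of Appendix B."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction, t starts at 1
    v_hat = v / (1.0 - beta2 ** t)
    param = param - step * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


scaled, mean, std = standardize([1.0, 2.0, 3.0, 4.0, 5.0])

# Toy objective (hypothetical): minimize (theta - 3)^2 starting from theta = 0.
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t)
```

With these values of β1 and β2 the moment estimates forget past gradients faster than with the usual defaults of 0.9 and 0.999, so the step direction reacts more quickly to changes in the gradient.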


Cite this article

Hebbal, A., Brevault, L., Balesdent, M. et al. Bayesian optimization using deep Gaussian processes with applications to aerospace system design. Optim Eng 22, 321–361 (2021).



  • Bayesian optimization
  • Gaussian process
  • Deep Gaussian process
  • Non-stationary function
  • Global constrained optimization