1 Introduction

In light of increasing awareness concerning the detrimental effects of the transportation sector on the environment, lawmakers and consumers demand stricter legal guidelines for motor vehicles. This obliges manufacturers to comply with demanding standards such as low carbon dioxide emissions, little pollutants, and extensive onboard diagnosis functions while expecting high performance and driving comfort [1]. European legislation also stipulates that each vehicle in the field must be compliant with emission regulations, which is verified by random tests on fleet vehicles randomly taken from customers [2]. For manufacturers, this means that their product must not only comply with emission limits but needs also to be robust with regard to them.

A major contributor to the uncertainty of fleet emissions is the component tolerances of the internal combustion engine (ICE). To provide an intuition: deviations between the physical air mass flow and the measured value of the air mass flow meter may lead to a change in the volumetric efficiency of the ICE, which in turn leads to different combustion behavior and combustion products. Manufacturers usually rely on empirical evidence and conservative decisionmaking to account for this. Emission targets are artificially decreased by using safety coefficients to determine a so called enginnering target. Calibration is then performed on an ICE consisting of nominal parts and aims at meeting the engineering target. However, calibration design that is based on an engineering target lacks sound statistical evaluation. It can lead to overly conservative calibration results and leaves room for improvement in quantities of interest such as carbon dioxide, pollutants, or power output. On the contrary, it can also lead to a non-compliant calibration, which is associated with costs and even delays of the start of production. The quantification and consideration of component-related uncertainties, therefore, has the potential to better meet desired optimization targets and reduce costs.

The complex nature of engine calibration tasks [3] with usually 10 and more design parameters generally renders manual calibration suboptimal. Model-based calibration methods have proven to be a successful solution in the past and have become state of the art [1, 3, 4]. Since modeling the entire combustion process using physical models is complex and computationally intensive, statistical models turned out to be a good alternative [3,4,5]. In particular, Gaussian processes [6] in combination with adapted experimental designs [7] are established in the field.

Reliability-based design is of crucial importance in engineering design. Hence, numerous methods have been proposed to systematically treat uncertainties in product design and carry out stochastic [8,9,10,11] or reliability-based [12] design optimization [13,14,15]. Stochastic methodology has already been applied in powertrain development. [16] and [17] performed stochastic optimization aiming to quantify and minimize uncertainty resulting from different driving cycles. In [18] chance-constrained optimization was performed to improve thermal efficiency. [19] did some early work on stochastic optimization of spark ignition engines considering spark advance and injection time uncertainty. Given the resources at the time, they make use of many linear approximations, which can be problematic given the nonlinear nature of more complex calibration problems. To the authors’ knowledge, there is no article addressing stochastic design optimization that considers component tolerances and is suitable for large-scale engine calibration problems. The objective of this paper is, therefore, to develop a methodology that solves this issue and incorporates it into a statistical framework.

The contribution of this article is a stochastic design optimization methodology for engine calibration problems. It is imperative to reduce the computational effort to a manageable level, taking into account the dimension of a typical calibration problem consisting of numerous operating points (OP) that need to be optimized. The proposed method aims to improve calibration results by replacing conventional safety coefficient-based calibration with uncertainty-based optimizations. The uncertainty of the design parameters is derived from the component tolerances employing a priori uncertainty analysis. The original optimization problem is reformulated to minimizing the expectation of a modified objective function taking uncertainties into account while limiting some high or low quantiles of the constraint functions. This is similar to the chance-constrained optimization setting in the literature [20, 21]. Solving this type of stochastic optimization task using Monte Carlo (MC) methods [22] can be computationally expensive for state-of-the-art calibration tasks where the engine models need to be evaluated several times. To address this issue, we approximate the necessary distribution statistics of the quantities of interest using Gaussian Process (GP) regression models. Since the GP expression has an analytical form, we can simplify the stochastic problem into a deterministic one. Thus reducing the computational burden by orders of magnitude.

This article is structured as follows: In Sect. 2 we provide some necessary fundamentals. In Sect. 3 we describe the proposed stochastic design optimization framework. We then introduce the test subject in Sect. 4. In Sect. 5, we discuss the asymptotic properties of the VaR surrogate models. In Sect. 6, the proposed methodology is applied to an engine calibration problem. First, the performance of the method is contrasted against a state-of-the-art deterministic calibration setting followed by a comparison with the numerical alternative using the quasi-Monte Carlo (QMC) method to quantify the computational effort needed to solve the optimization problem. Finally, the results of the article are discussed in sect. 7.

2 Background

As described above, this section introduces some necessary fundamentals. First, we give some intuition on Gaussian Process regression. We then provide an overview of nonlinear constrained optimization problems.

2.1 Black-box modeling with Gaussian processes

Bayesian regression based on Gaussian processes is an approach for nonparametric modeling where an unknown system \(y = f(\mathbf {x}) + \epsilon\) is modeled from some observed data [23,24,25]. For combustion engine models we usually assume our various system outputs to be uncorrelated. We, therefore, focus on the basic single output supervised learning setting where our goal is to approximate a function \(f:{\mathcal {X}}\subset {\mathbb {R}}^{d\in {\mathbb {N}}}\rightarrow {\mathcal {Y}}\subset {\mathbb {R}}\) from a set of \(n\in {\mathbb {N}}\) observations \({\mathcal {D}}=\{\mathbf {x}_{i},y_{i}\}_{i=1}^n\) [26].

In the Bayesian framework, we assume a GP to be a collection of random variables so that every finite set of these has a joint Gaussian distribution and is completely specified by its mean \(\mu (\mathbf {x})\) and covariance function \(k(\mathbf {x}, \mathbf {x}_{j})\) and can be written as \(f(\mathbf {x})\sim {{\mathcal {G}}}{{\mathcal {P}}}(\mu (\mathbf {x}), k(\mathbf {x}, \mathbf {x}_{j}))\). Given noisy observations and assuming the GP to have a zero-mean prior, its joint distribution for an arbitrary input is given as

$$\begin{aligned} p(y\mid {\mathcal {D}}) = {\mathcal {N}}(y|0,{\mathsf {K}}+\sigma _{\epsilon }^{2}{\mathsf {I}}_{n})., \end{aligned}$$
(1)

where \({\mathsf {K}}\in {\mathbb {R}}^{n\times n}\) is the covariance matrix, \(\sigma _{\epsilon }^{2}\) the noise variance, and \({\mathsf {I}}_{n}\) the \(n\times n\) identity matrix. Given this joint distribution, the predictive distribution of the GP for an arbitrary input \(\mathbf {x}\) is specified by its mean and variance

$$\begin{aligned} p(f\mid \mathbf {x},{\mathcal {D}}) = {\mathcal {N}}(f|{\mathbb {E}}[f(\mathbf {x})],{\mathbb {V}}[f(\mathbf {x})]), \end{aligned}$$
(2)

with

$$\begin{aligned} {\mathbb {E}}[f(\mathbf {x})]&=k^\top (\mathbf {x})({\mathsf {K}}+\sigma _{\epsilon }^{2}{\mathsf {I}}_n)^{-1}(y_{1},\dots ,y_{n})^\top , \end{aligned}$$
(3a)
$$\begin{aligned} {\mathbb {V}}[f(\mathbf {x})]&= 1-k^\top (\mathbf {x})({\mathsf {K}}+\sigma _{\epsilon }^{2}{\mathsf {I}}_n)^{-1}k(\mathbf {x}), \end{aligned}$$
(3b)

where \(k(\mathbf {x})=(k(\mathbf {x},\mathbf {x}_{1}),...,k(\mathbf {x},\mathbf {x}_{n}))^\top\) is a column vector containing the covariance of the input \(\mathbf {x}\) and the training samples.

By definition, the covariance function is symmetric positive-definite, i.e. for any set of inputs the resulting covariance matrix is positive-definite. A popular choice for the covariance function is the squared exponential (SE) kernel given by

$$\begin{aligned} k(\mathbf {x}_{p},\mathbf {x}_{q}) = \sigma _{s}^{2}\exp \left( -\frac{\left\| \mathbf {x}_{p}-\mathbf {x}_{q}\right\| _{l}}{2}\right) , \end{aligned}$$
(4)

where \(\sigma _{s}^{2}\in {\mathbb {R}}_0^+\) is the signal variance and \(\left\| \cdot \right\| _{l}\) designates the norm taking into account the characteristic length-scales \(l=\{l_{i}\}\) with \(i=1,\cdots ,d\) given by the sum of squares of the product \((\mathbf {x}_{p}-\mathbf {x}_{q})l_{i}^{-1}\). The infinite differentiability of the SE kernel results in smooth predictive functions. Further intuition on covariance functions is given in [6].

The prior and likelihood of the GP encode our beliefs about the model we wish to learn. With \(\sigma _{\epsilon }, \sigma _{s}\), and l being the hyperparameters of the model. GP hyperparameters can be learned using different methods such as integration via Markov chain Monte Carlo [25, 27]. Given the above assumptions, the hyperparameters can also be learned by maximizing the log marginal likelihood

$$\begin{aligned}&\log p(y|\sigma _{\epsilon }, l, \sigma _{s})\nonumber \\&\quad = -\frac{\mathbf {y}({\mathsf {K}}+\sigma _{\epsilon }^{2}{\mathsf {I}}_n)^{-1}\mathbf {y}^\top +\log |{\mathsf {K}}+\sigma _{\epsilon }^{2}{\mathsf {I}}_n|+n\log 2\pi }{2}, \end{aligned}$$
(5)

with \(\mathbf {y}=(y_{1},\dots ,y_{n})\). For covariance functions with first-order derivatives, this can be solved using gradient-based optimization.

2.2 Probabilistic nonlinear constrained optimization

Many industrial tasks such as base engine calibration are nonlinear constrained optimization problems. They aim at minimizing an objective f on a domain \({\mathcal {X}}\) subject to equality and inequality constraints. Since equality constraints are of no interest to this particular paper, they shall be disregarded from here on. The general mathematical formulation of a deterministic nonlinear constrained optimization problem is, therefore, given by

$$\begin{aligned} \min _{\mathbf {x}}&f(\mathbf {x}) \end{aligned}$$
(6a)
$$\begin{aligned} \text {s.t.}&c_{i}(\mathbf {x})\le 0,i=1,\dots ,n_{c} \end{aligned}$$
(6b)

where f is the cost function that needs to be minimized, which is subject to constraints c. This kind of problem classifies as a convex or non-convex mixed-integer nonlinear programming (MINLP) problem depending on the space \({\mathcal {X}}_{feasible}\subseteq {\mathcal {X}}\) of feasible solutions. More intuition on MINLP problems and an overview of methods for solving these are provided in [28, 29]. Most real-world applications, however, are subject to uncertainties ranging from environmental and operating conditions to manufacturing tolerances. In this probabilistic setting, the system inputs are assumed to be random quantities.

Definition 1

Let X be a multivariate random variable on a probability space \(({\Omega },{\Sigma },P)\) with sample space \({\Omega }\), sigma-algebra \({\Sigma }\), and probability measure P. Then \(Y=f(X)\) is a transformation of X under f and again a random variable on \({\Omega }\). Its cumulative distribution function is given by

$$\begin{aligned} P(Y\le y) = P(f(X)\le y). \end{aligned}$$
(7)

Given this, a possible probabilistic formulation of the nonlinear constrained optimization problem in equation 6a and 6bis given by the individual chance-constraint optimization setting

$$\begin{aligned} \min _{x}&{\mathbb {E}}[f(X)] \end{aligned}$$
(8a)
$$\begin{aligned} \text {s.t.}&P(c_{i}(X)\le 0)\ge \alpha _{i},i=1,\dots ,n_{c} \end{aligned}$$
(8b)

where the new objective is to minimize the first statistical moment of the distribution f(X) subject to probabilistic constraints. Solving these kinds of stochastic MINLP problems involves the evaluation of multi-dimensional integrals for the objective and constraints, which can be computationally expensive even for low dimensional problems. Often a more feasible alternative, especially for high dimensional problems is, therefore, the utilization of sampling techniques such as MC sampling [22]. This involves drawing samples from the distribution X, evaluating these, and approximating the distribution f(X). However, evaluating a high number of samples for optimization tasks involving a large number of function evaluations is computationally infeasible. A popular approach to tackle this issue is to simplify the stochastic problem into a deterministic task [30].

3 The stochastic design optimization framework

In this section, we propose a framework for stochastic design optimization, that aims at quantifying and considering component tolerances during the model-based calibration process. The methodology aims to find optimal design parameters for a system subject to uncertainty by solving a probabilistic constrained optimization problem. An necessity for the methodology is short computing time. This results from the requirement to keep the powertrain development cycle as short as possible. The overall objective of the methodology is to replace conventional safety factors using a priori uncertainty analysis and improve the calibration results. Figure 1 depicts the general structure of the framework.

Fig. 1
figure 1

Schematic representation of the framework

3.1 Uncertainty analysis

In general, there are two types of uncertainties [31]: While pure random effects such as geometric tolerances due to imprecisions during production are classified as aleatoric uncertainty, systematic effects like material aging are epistemic. Both effects need to be considered in an uncertainty analysis.

To take input uncertainty into account during optimization, it has to be determined beforehand. The combustion process of an engine is mainly determined by the design parameters or the set ECU parameters. Manufacturing tolerances lead to deviations from the set values and in turn to changes in quantities such as fuel consumption and emissions. Since the design parameters of our engine models are not directly subject to uncertainty, we have to perform an uncertainty analysis that maps relevant sources of uncertainty such as manufacturing tolerances to the design parameters. This is achieved by performing an uncertainty analysis, where uncertainty propagation is used to determine the design parameter uncertainty.

Figure 2 depicts an excerpt from the uncertainty analysis we performed for the calibration problem in Sect. 6. To convey an intuition here: using the results of tolerance investigations on a set of injectors, we determine the uncertainty distribution of the injected quantity (IQ) of a single pilot injection (PI), which depends on the nominal rail pressure and nominal IQ. Since we cannot measure the emissions of each individual cylinder, we sample all cylinders and assume the sample mean to represent the final quantity that results in the measured emissions.

Fig. 2
figure 2

Excerpt from the uncertainty analysis, which illustrates the influence of the injector component tolerance on the injected quantity of the first pilot injection

3.2 Optimization problem formulation

As mentioned before, the engine calibration task is a constrained optimization problem. The aim is to minimize an objective function while making sure that some constraints are met for the entire vehicle fleet. A potential formulation for a nonlinear constrained optimization problem was given by equations 8a and 8b. This is a reasonable formulation for the combustion engine setting since it also considers the tails of the target distributions. For example, if a vehicle with a nominal ICE barely complies with emission limits, then a high number of vehicles will be surely non-compliant.

Definition 2

The value at risk of a multivariate random variable X given a confidence level \(\alpha \in (0,1)\) is given by

$$\begin{aligned} \text {VaR}_\alpha (X)=\inf \{x\in {\mathbb {R}}:\alpha \le P(X\le x)\}. \end{aligned}$$
(9)

We can reformulate equation 8b using the VaR and write the optimization problem as

$$\begin{aligned} \min _{x}&\text {VaR}_{0.50}(f(X))\ \end{aligned}$$
(10a)
$$\begin{aligned} \text {s.t.}&\text {VaR}_{\alpha _{i}}(f(X))\le c_{c,i},i=1,\dots ,n_{c}, \end{aligned}$$
(10b)

where \(\text {VaR}_{\alpha _{i}}\) is a probabilistic constraint function and \(c_{c,i}\) the corresponding constraint.

3.3 Surrogate model of the value at risk

As already described above, base calibration tasks are usually high dimensional MINLP problems. The associated uncertainty of the design parameters is not necessarily uniform or Gaussian and often correlated. This renders cheap approximations such as Most Probable Points [32] useless. Therefore, sampling is often the only viable way to approximate distribution statistics. However, base calibration problems usually require millions of function evaluations, making the problem extremely expensive in terms of computing time and impractical to solve. Generally, a viable way of reducing the computational burden resulting from expensive experiments or simulations is to train a surrogate model that reproduces the behavior of the experiment or simulation [33]. The idea is to create a black-box model mapping the parameters of the experiment or the simulation to the desired outputs. If such a model can be created with good model quality and using a limited number of measurements or simulation runs, then it can reduce the computational burden of optimization substantially. The resulting black-box model is relatively cheap to evaluate. In our case, we wish to train a model \(g:\mathbf {x}\mapsto \text {VaR}_{\alpha }(X)\) that maps the nominal input x to the corresponding VaR\(_\alpha\). Our implementation uses GPs to learn the underlying VaR. Given a set of n observations \({\mathcal {E}}=\{x_{i},\text {VaR}_{\alpha }(X)_{i}\}_{i=1}^n\), the expectation of the GP surrogate model is expressed as

$$\begin{aligned} \text {VaR}_{\alpha }(\mathbf {x})=k^\top (x)({\mathsf {K}}+\sigma _{\epsilon }^{2}{\mathsf {I}}_n)^{-1} \mathbf {\text {VaR}}_{\alpha }^\top , \end{aligned}$$
(11)

where \(\mathbf {\text {VaR}}_{\alpha }\) is the vector containing the empirical VaR values from the observations. The empirical VaR is estimated by drawing m independent samples from a Sobol sequence [34,35,36,37]. The assumption of continuous domain needed for GPs holds true if the VaR exists. Further intuition on the approximation of distributional statistics is given by [38].

3.4 Implementation details

We have implemented the framework as a plug-in for the above-mentioned commercial software ASCMO as Matlab code. This has the advantage that ASCMO already offers several possibilities for modeling, optimization, and analysis.

For this work, in particular, we have also implemented the methodology in Python. This is necessary to ensure a high level of transparency in the calculations. We train all models using the GPRegression module from the GPy library [39]. Maximum likelihood estimates are calculated using the implemented stochastic conjugate gradient algorithm. For creating Sobol quasirandom sequences we use an implementation written by [40], which is available at https://people.sc.fsu.edu/~jburkardt/py_src/sobol/sobol.html. We use the sequential least squares programming algorithm from the scientific computing library [41] to solve the MINLP problems. Correlated random variables are generated using Cholesky decomposition, see Appendix A.

4 Test subject

The ICE we use for our investigations serves as a platform for research purposes at Robert Bosch GmbH [42]. The test subject is an inline four-cylinder light-duty diesel engine for passenger cars that is equipped with a direct injection system. The ICEs main characteristics are given in Table 1.

Table 1 Technical data of the research engine

For monitoring purposes, thermodynamic quantities such as temperatures, pressures, and mass flows are measured. The exhaust gas is extracted upstream of the exhaust gas after-treatment system. Nitrogen oxide (NOx) emissions are measured via chemiluminescence. Soot is measured using a high-resolution smoke meter [44]. The combustion noise is calculated using an in-cylinder pressure-based method [45].

4.1 The engine model

In general, the combustion process in a diesel engine is determined by the OP and design parameters. The OP is defined by the engine speed and engine torque. The control or design parameters consist of air supply and fuel supply parameters. The air supply parameters are intake air mass, exhaust gas recirculation rate, swirl flap actuation, and boost pressure. The fuel supply parameters are common rail pressure, injected quantities of the pilot and post injections (PoI), the start of energizing (SOE) of the main injection (MI) in Crank angle degree after top dead center firing (CAD a. TDCf) and the distances between each injection in seconds, see Figure 3.

Fig. 3
figure 3

Qualitative representation of the crankshaft position of a diesel engine, illustrating the injection pattern for the investigated OP

For the investigations, we create models of the relevant outputs that are sensitive with respect to the parameters named above. For the sake of simplicity, the OP is fixed to \({2000}{\mathrm{min}}^{-1}\) engine speed and \({400}{\mathrm{kPa}}\) brake mean effective pressure. The model is based on stationary test bench measurements from a design of experiments (DoE) that we set up using the commercial software package ASCMO [46]. The DoE samples are drawn from a Sobol sequence on \((\mathbf {x}^{lower}, \mathbf {x}^{upper})\in {\mathbb {R}}^d\), with d being the number of input dimensions or design parameters and \(\mathbf {x}^{lower}\) and \({\mathbf {x}^{upper}}\) being vectors that represent the range within which the design variables can be varied, see Table 2. For this measurement campaign, the EGR rate was not actively varied during the experiments. This leads to maximum EGR rates and avoids parameter combinations producing high nitrogen oxide emissions. The low-pressure EGR valve and throttle valve were fully open during the experiments. The injection quantity of the MI is not listed, as it results from the other parameters and the load. It is important to note that we assume the test subject to consist of nearly nominal components since we aim to create models for the nominal engine.

Table 2 Boundaries of the design of experiments

We use \(n=675\) measurements to create a model of each target quantity. The outputs are modeled independently using zero-mean prior GP regression models with additive Gaussian noise and automatic relevance determination (ARD) SE kernel [6, 47]. The model hyperparameters are estimated by maximizing the log marginal likelihood. In order to make meaningful design decisions through optimization, the model quality must be high. This is especially true for a stochastic optimization scenario. A common issue with supervised learning algorithms such as GPs is overfitting [48, 49]. This is basically the phenomenon of the algorithm memorizing the training data and not generalizing properly. An overfit model makes perfect predictions for the training data. When confronted with unknown test data, though, the predictions will be arbitrary. Although GPs under the premise of large amounts of data are generally robust against overfitting [6], the model can still overfit when maximizing the log marginal likelihood. One way of ruling out overfitting phenomena is to validate the model on unseen test data using proper performance metrics such as the normalized root-mean-square error (NRMSE) and the coefficient of determination (\(\text {R}^2\)), see Appendix B. Since in our case we only have a limited amount of training samples, we need to validate the model without the use of test samples. A valid way to accomplish this is to use the leave-one-out cross-validation (LOOCV) method, see Appendix C.

Figure 4 shows two scatter plots derived from the nitrogen oxide model. The left plot shows the observation and prediction value pairs. The model quality is excellent and the error between prediction and observation is in the measurement error range. The plot to the right shows value pairs consisting of observed values and model predictions using the leave-one-out method. The correspondence is very good and indicates that the model generalizes well.

Fig. 4
figure 4

Scatter plot derived from the nitrogen oxide model showing predicted vs. observed value pairs for the training data (left) and the leave-one-out method (right)

Table 3 shows the NRMSE LOOCV, and \(\text {R}^2\) LOOCV results of the three engine models. We see that model quality is high for each of the three models with \(\text {R}^2\) values close to 1 and the error is within the measurement accuracy range. The RMSE LOOCV and \(\text {R}^2\) LOOCV are similar to the RMSE and \(\text {R}^2\), which indicates good generalization for each of the three models.

Table 3 NRMSE, \(\text {R}^2\), NRMSE LOOCV, and \(\text {R}^2\) LOOCV values of the engine models

4.2 System uncertainty

In this section, we show the results of the uncertainty analysis we have performed for the above-described test subject. Combustion engines are subject to a variety of uncertainties, ranging from environmental and operating conditions to manufacturing tolerances. In this article, we focus on manufacturing tolerances in the most relevant components since these are the main contributors to the overall uncertainty during stationary operation.

Table 4 Results of uncertainty analysis

Table 4 shows a list of the design parameters with the corresponding uncertainty distributions, which result from the uncertainty analysis we have performed for the engine.

5 Asymptotic properties of the VaR surrogates

In this section, we discuss the asymptotic properties of the VaR surrogate models. We investigate the performance of the GP surrogates with respect to the number of training samples n and the number of random samples m used to calculate the VaR. We further investigate which ratio of n and m is most efficient in terms of computation for a given engine out soot emission model. Figure 5 shows the performance of three GP surrogates with \(n=256\), 512, and 1024 training data points as a function of m. The surrogate models are trained assuming zero-mean prior ARD SE kernel and additive Gaussian noise. The input parameter uncertainty is given in Table 2 and the chosen confidence level is 0.95. The training data is randomly drawn from the multivariate uniform distribution given in Table 2. The plot also shows the performance of empirical estimates using pure MC and QMC. The predictive performance is measured on a separate test data set with \(n_{test}=100\) using the NRMSE. The box plots are the result of repeating this experiment with 50 random initializations. Since the training samples are drawn randomly, the trained model differs each time. Similarly, the MC and QMC results vary since the bootstrap is initialized randomly. The residual error \(e_{i}\) for each test sample is the difference of the prediction and an empirical VaR using \(m_{test}=10^{6}\) quasirandom samples that is close to the true VaR.

Fig. 5
figure 5

Performance of the GP surrogate with \(n=256\), 512, and \(n=1024\) training data points, of MC sampling and QMC sampling as a function over m. The performance is measured using the NRMSE on a set of \(n_{test}=100\) test samples. The failure probability is \(1-\alpha =0.05\)

MC sampling shows significant variance in performance. QMC sampling shows the best performance with a significantly lower variance than the MC method. The GP surrogate models show low variance in performance, even with small n and m. Given a sufficiently high amount of training samples, the GP surrogate shows good performance only second to the QMC method. [38] have already argued that under certain conditions for a given confidence level a GP-based quantile estimate is asymptotically consistent and efficient. These conditions being: \({\mathcal {X}}\) is a convex and compact subset of \({\mathbb {R}}^d\), the training data is independent and identically distributed in \({\mathcal {X}}\), the assumption of continuous domain for the VaR holds and there exists a ratio \(\frac{n}{m}\) that is efficient in terms of function evaluations. They established a bound under which as \(n,m \rightarrow \infty\) the mean of the GP posterior almost surely converges to the true VaR at the fastest rate possible in terms of function evaluations.

Figure 6 depicts the GP surrogate performance as a function of the ratio \(\frac{n}{m}\) for an increasing amount of function calls required for training. It is evident that model performance increases as \(n,m \rightarrow \infty\). For this model, \(\frac{n}{m}\approx 2\) seems to be a good choice to maximize model performance.

Fig. 6
figure 6

Performance of the GP surrogate with \(n\cdot m=2^{17}\) (black circle), \(n\cdot m=2^{18}\) (blue cross), \(n\cdot m=2^{19}\) (red plus) and \(n\cdot m=2^{20}\) (green diamond) function calls required for training over the ratio \(\frac{n}{m}\). The performance is measured using the NRMSE on a set of \(n_{test}=100\) test samples

6 Application to an engine calibration problem

To demonstrate the fitness and strength of the proposed framework, we apply it to the engine calibration problem described above. For the sake of simplicity, the problem is formulated for a single OP. In practice, it is common to optimize several interconnected OPs simultaneously [50]. For this OP, we aim to minimize nitrogen oxide emissions while ensuring that soot and noise emissions are kept below an upper bound for \({95}{\%}\) of all realizations. We further wish to find a solution inside the initial parameter space \((x^{lower}, x^{upper})\in {\mathbb {R}}^d\) described by Table 2, since the GPs we construct are not suited for extrapolation.

Using the engine models we constructed in Sect. 4.1, we train surrogate models to emulate the VaR. We sample \(n=500\) training data points from the above mentioned bounded parameter space and draw \(m=1000\) QMC samples from the distribution given by Table 4 to estimate the empirical VaR for each point. This equals \(\frac{n}{m}=0.5\) and is only slightly worse than \(\frac{n}{m}=2\) performance-wise. However, considering the time complexity of a standard GP of \({\mathcal {O}}(n^{2})\), the computation time needed for a prediction is reduced fourfold. The corresponding mean estimate of the emulator can be expressed according to equation 11. Figure 7 shows two scatter plots derived from the VaR surrogate of the soot model. The left plot shows observation and prediction value pairs for the training data. The right plot shows observation and prediction value pairs for \(n_{test}=100\) test data points. The correspondence is nearly perfect for the training and test samples.

Fig. 7
figure 7

Scatter plot derived from the soot model showing predicted vs. observed value pairs for the training data (left) and test data (right)

Table 5 shows the NRMSE and \(\text {R}^2\) results for all three surrogate models. The surrogates emulate the VaR perfectly, even for the unseen test samples, and generalize well.

Table 5 NRMSE, \(\text {R}^2\), NRMSE LOOCV, and \(\text {R}^2\) LOOCV values of the engine models

6.1 Problem formulation

To assess the benefit of our methodology, we contrast it to the deterministic setting using safety coefficients. Both tasks can be formulated as MINLP problems. The deterministic scenario is expressed as

$$\begin{aligned} \min _{x}&f^{\text { NO}_\mathrm{x}}(\mathbf {x}) \end{aligned}$$
(12a)
$$\begin{aligned} \text {s.t.}&l^{\text { Soot}}*s^{\text { Soot}}-f^{\text { Soot}}(\mathbf {x})\le 0, \end{aligned}$$
(12b)
$$\begin{aligned}&l^{\text { Noise}}-s^{\text { Noise}}-f^{\text { Noise}}(\mathbf {x})\le 0, \end{aligned}$$
(12c)
$$\begin{aligned}&x^{lower}_{i}\le x_{i} \le x^{upper}_{i}\forall i=1,\cdots ,d \end{aligned}$$
(12d)

where f is the predictive mean of the engine model, l is an upper limit for admissable emissions, and s is a safety coefficient. The safety coefficients are empirical values based on expert knowledge accounting for tolerance behavior. The settings we chose for this particular problem are given in Table 6. The given safety coefficients are the state of the art choice for the given calibration task.

Table 6 Constraints used for optimization

The corresponding problem formulation for the stochastic setting is given by

$$\begin{aligned} \min _{\mathbf {x}}&\text {VaR}_{0.50}^{\text { NO}_\mathrm{x}}(\mathbf {x}) \end{aligned}$$
(13a)
$$\begin{aligned} \text {s.t.}&l^{\text { Soot}}-\text {VaR}_{\alpha }^{\text { Soot}}(\mathbf {x})\le 0, \end{aligned}$$
(13b)
$$\begin{aligned}&l^{\text { Noise}}-\text {VaR}_{\alpha }^{\text { Noise}}(\mathbf {x})\le 0, \end{aligned}$$
(13c)
$$\begin{aligned}&x^{lower}_{i}\le x_{i} \le x^{upper}_{i}\forall i=1,\cdots ,d. \end{aligned}$$
(13d)

6.2 Optimization results

In this section, we present the optimization results. The Sequential Least-Squares Quadratic Programming algorithm [51] is used for solving the above-formulated problems. The method can be used to solve MINLP problems, which are subject to equality and inequality constraints and to lower and upper bounds on the model parameters. The optimization problem is solved for \(n_{init}=20\) quasirandom initialization in an attempt to find a global optimum. The function tolerance is set to \(10^{-5}\). For the remaining settings, the default values are selected. Analytical formulations for the GP derivatives are provided for the stochastic setting.

Table 7 Optimization results for the different optimization scenarios. The various statistics were calculated using \(10^{6}\) QMC samples

Figure 8, Figure 9 and Table 7 show the optimization results for both scenarios. Although the ultimate goal is the same, we see two distinct results. It is evident that the prior assumptions we made for the deterministic case are too conservative. The \(\text {VaR}_{0.95}\) for soot and noise is well below the actual upper bound we were aiming for. By selecting smaller safety coefficients, an improvement in nitrogen oxide emissions would have been possible for this particular setting. On the other hand, for a different choice of design parameters, a smaller value could also lead to non-compliance with the constraints. Without prior knowledge of the output variance, a sufficiently large margin must be chosen.

Fig. 8
figure 8

Nominal optimization results for nitrogen oxide and soot emission of the deterministic (black circle) and stochastic (black rectangle) solution. And 200 samples drawn from the target distribution at these settings for both the deterministic (blue circle) and stochastic solution (red rectangle) with corresponding probability density estimates

Fig. 9
figure 9

Nominal optimization results for nitrogen oxide and noise emission of the deterministic (black circle) and stochastic (black rectangle) solution. And 200 samples drawn from the target distribution at these settings for both the deterministic (blue circle) and stochastic solution (red rectangle) with corresponding probability density estimates

In contrast to the deterministic setting, our probabilistic approach renders a good estimate for the VaR. This way, we obtain a solution that meets the desired boundary conditions almost exactly. The expected NOx emissions are reduced from \({1.129}{\mathrm{g/kWh}}\) to \({0.281}{\mathrm{g/kWh}}\). This corresponds to a decrease of \({79.54}{\%}\). The expected soot and noise emissions are slightly higher at \({0.430}{\mathrm{FSN}}\) and \({85.841}{\mathrm{dBA}}\) but do comply with the limits for \({95}{\%}\) of the realizations.

6.3 Gain in computing time

In this section, we compare the computation time of our method with the deterministic setting and a stochastic QMC equivalent. The aim is to keep computing time as low as possible. The reason for this is that in practice multiple OPs need to be optimized simultaneously and a high number of constraints must be satisfied. At the same time, the cost function needs to be minimized for a high number of driving trajectories, which leads to high numbers of function evaluations [16]. As mentioned above, the optimization problem is solved for \(n_{init}=20\) quasirandom initialization in an attempt to find a global optimum. All computations are carried out on a machine with an Intel Xeon E3-1505M CPU, 32 GB Ram, and an Nvidia Quadro M2000M GPU. We first compare the time needed to compute the optimization task of the deterministic setting with our approach. The computation time for both methods is \({15.87}{\mathrm{s}}\) and \({9.26}{\mathrm{s}}\) respectively. The time difference is mainly due to the higher prediction costs of the original engine models. As mentioned above, the time complexity of a standard GP is of \({\mathcal {O}}(n^{2})\). We further assess the amount of computation time that is saved compared to the setting where the stochastic problem is solved numerically. We, therefore, compare our method with the QMC method. To ensure similar accuracy as with the surrogate model, 500 quasirandom samples are drawn for estimating the empirical VaR. The optimization results for both solutions are given in Table 7. The QMC method finds a similar optimum as our approach. Differences in the solutions result mainly from the termination criteria of the optimizer. Model inaccuracy and numerical errors account for slight differences only. The theoretical time complexity for both methods should be of \({\mathcal {O}}(n^{2}n_{e}m)\), where n is the number of training samples of the respective models, m is the number of draws needed to evaluate the VaR, and \(n_{e}\) is the number of total function calls needed for optimization. If we assume that both algorithms need the same amount of function evaluations to find the optimum, our approach in this scenario should be approximately 900 times faster than the QMC method. The measured computing time for our approach and the QMC method is \({9.26}\mathrm{s}\) and \({1691.76}\mathrm{s}\) respectively. Our surrogate-based method is \(\sim 183\) times faster than the QMC approach. The difference to the theoretically possible computational time gain is due to the higher impact of load times on the overall shorter computing times of our approach. For larger optimization tasks, the impact should be significantly lower and the computational time gain is expected to be closer to the theoretical time gain. The time gain we obtain for the simplified problem we have solved may seem noteworthy. As mentioned before though, real-world calibration problems involve significantly more function evaluations and optimization usually takes several hours. In view of this, we can achieve a considerable reduction in computing time and enable large-scale optimizations without the need for parallel computing on multiple machines.

6.4 Conclusions

Stochastic design optimization was performed for a representative engine calibration problem in a single OP using the proposed method. The performance of our method was evaluated against the corresponding deterministic optimization scenario using safety coefficients. Compared to the deterministic scenario, it was possible to reduce the expected nitrogen oxide emissions for the OP by \({79.54}{\%}\). The imposed soot and noise constraints and design space boundaries were met in both cases. The reduction in nitrogen oxide emissions is mainly due to the accurate estimation of emission distributions. Due to the analytical form of the GP surrogates, evaluating the VaR is extremely cheap and computing time is comparable to the deterministic method.

In a second step, we contrasted our method to the numerical equivalent solution using QMC sampling. The global optimum was found in both cases with slight numerical differences. Our method showed a decrease in computing time by a factor of 183. This reduction is likely to be even greater for large-scale problems. Due to the low computing time, the method can be applied to large-scale engine calibration problems without the need for parallel computing.

7 Summary and outlook

In this paper, we proposed and demonstrated a method for efficiently solving large-scale stochastic optimization problems subject to nonlinear constraints. Our main motivation for developing the method is to employ it in the field of base engine calibration. The premise of our method is that a model of the system under study exists, which can be evaluated cheaply. Furthermore, it is essential that the stochastic optimization problem can be formulated such that the objective function and the constraint functions can be expressed using the VaR. The main idea is to derive analytically closed formulations for the VaR, which are cheap to evaluate and thus reduce the computational effort of evaluating the objective and constraints. The VaR is therefore learned as a function of the input parameters of the initial model using a supervised learning algorithm. For this work, we employ Gaussian process regression models.

For demonstration purposes, we have employed the methodology on an engine base calibration problem given a realistic optimization scenario in a single OP. We also performed a state-of-the-art deterministic optimization given a realistic engineering target. For the given scenario the objective was to minimize nitrogen oxide emissions while complying with an upper limit for soot and noise emissions. The optimization results show that the consistent treatment of the problem as a statistical one helps to maximize performance compared to the deterministic approach. Our stochastic approach reduces nitrogen oxide emissions by almost \({80}{\%}\) compared to the deterministic scenario. The constraints on soot and noise emissions are complied with. This significant decrease shows a high potential for the reduction of overall raw emissions for practical application of the methodology.

In a second step, we demonstrated the computational efficiency of the method by comparing the computing time of the optimization process to the deterministic case and an equivalent numerical QMC solution. Our method performed on par with the deterministic scenario time-wise. It reduced computational effort by nearly 200 times compared to the QMC alternative while converging on the same global optimum.

In this article, we focused on base calibration problems. Another application of uncertainty quantification in the field of engine calibration could be to use uncertainty analysis to assess the overall system uncertainty. This would facilitate decision making in the selection of engine components.

An interesting topic for future research is the extension of the method to larger calibration problems, where the entire operating range of the engine consisting of more than one OP is considered and optimized simultaneously. This adds additional uncertain parameters such as route, traffic, and driving style to the probabilistic problem [17]. A major challenge for solving this kind of task is to create global engine models that are sensitive with regard to the design parameters and are able to capture engine dynamics at the same time. However, recent proposals seem to be promising [52, 53].