A stochastic design optimization methodology to reduce emission spread in combustion engines

This paper introduces a method for efficiently solving stochastic optimization problems in the field of engine calibration. The main objective is to make more conscious decisions during the base engine calibration process by considering the system uncertainty due to component tolerances and thus enabling more robust design, low emissions, and avoiding expensive recalibration steps that generate costs and possibly postpone the start of production. The main idea behind the approach is to optimize the design parameters of the engine control unit (ECU) that are subject to uncertainty by considering the resulting output uncertainty. The premise is that a model of the system under study exists, which can be evaluated cheaply, and the system tolerance is known. Furthermore, it is essential that the stochastic optimization problem can be formulated such that the objective function and the constraint functions can be expressed using proper metrics such as the value at risk (VaR). The main idea is to derive analytically closed formulations for the VaR, which are cheap to evaluate and thus reduce the computational effort of evaluating the objective and constraints. The VaR is therefore learned as a function of the input parameters of the initial model using a supervised learning algorithm. For this work, we employ Gaussian process regression models. To illustrate the benefits of the approach, it is applied to a representative engine calibration problem. The results show a significant improvement in emissions compared to the deterministic setting, where the optimization problem is constructed using safety coefficients. We also show that the computation time is comparable to the deterministic setting and is orders of magnitude less than solving the problem using the Monte-Carlo or quasi-Monte-Carlo method.


Introduction
In light of increasing awareness concerning the detrimental effects of the transportation sector on the environment, lawmakers and consumers demand stricter legal guidelines for motor vehicles. This obliges manufacturers to comply with demanding standards such as low carbon dioxide emissions, little pollutants, and extensive onboard diagnosis functions while expecting high performance and driving comfort [1]. European legislation also stipulates that each vehicle in the field must be compliant with emission regulations, which is verified by random tests on fleet vehicles randomly taken from customers [2]. For manufacturers, this means that their product must not only comply with emission limits but needs also to be robust with regard to them. A major contributor to the uncertainty of fleet emissions is the component tolerances of the internal combustion engine (ICE). To provide an intuition: deviations between the physical air mass flow and the measured value of the air mass flow meter may lead to a change in the volumetric efficiency of the ICE, which in turn leads to different combustion behavior and combustion products. Manufacturers usually rely on empirical evidence and conservative decisionmaking to account for this. Emission targets are artificially decreased by using safety coefficients to determine a so called enginnering target. Calibration is then performed on an ICE consisting of nominal parts and aims at meeting the engineering target. However, calibration design that is based on an engineering target lacks sound statistical evaluation. It can lead to overly conservative calibration results and leaves room for improvement in quantities of interest such as carbon dioxide, pollutants, or power output. On the contrary, it can also lead to a non-compliant calibration, which is associated with costs and even delays of the start of production. The quantification and consideration of component-related uncertainties, therefore, has the potential to better meet desired optimization targets and reduce costs.
The complex nature of engine calibration tasks [3] with usually 10 and more design parameters generally renders manual calibration suboptimal. Model-based calibration methods have proven to be a successful solution in the past and have become state of the art [1,3,4]. Since modeling the entire combustion process using physical models is complex and computationally intensive, statistical models turned out to be a good alternative [3][4][5]. In particular, Gaussian processes [6] in combination with adapted experimental designs [7] are established in the field.
Reliability-based design is of crucial importance in engineering design. Hence, numerous methods have been proposed to systematically treat uncertainties in product design and carry out stochastic [8][9][10][11] or reliability-based [12] design optimization [13][14][15]. Stochastic methodology has already been applied in powertrain development. [16] and [17] performed stochastic optimization aiming to quantify and minimize uncertainty resulting from different driving cycles. In [18] chance-constrained optimization was performed to improve thermal efficiency. [19] did some early work on stochastic optimization of spark ignition engines considering spark advance and injection time uncertainty. Given the resources at the time, they make use of many linear approximations, which can be problematic given the nonlinear nature of more complex calibration problems. To the authors' knowledge, there is no article addressing stochastic design optimization that considers component tolerances and is suitable for large-scale engine calibration problems. The objective of this paper is, therefore, to develop a methodology that solves this issue and incorporates it into a statistical framework.
The contribution of this article is a stochastic design optimization methodology for engine calibration problems. It is imperative to reduce the computational effort to a manageable level, taking into account the dimension of a typical calibration problem consisting of numerous operating points (OP) that need to be optimized. The proposed method aims to improve calibration results by replacing conventional safety coefficient-based calibration with uncertainty-based optimizations. The uncertainty of the design parameters is derived from the component tolerances employing a priori uncertainty analysis. The original optimization problem is reformulated to minimizing the expectation of a modified objective function taking uncertainties into account while limiting some high or low quantiles of the constraint functions. This is similar to the chance-constrained optimization setting in the literature [20,21]. Solving this type of stochastic optimization task using Monte Carlo (MC) methods [22] can be computationally expensive for state-of-the-art calibration tasks where the engine models need to be evaluated several times. To address this issue, we approximate the necessary distribution statistics of the quantities of interest using Gaussian Process (GP) regression models. Since the GP expression has an analytical form, we can simplify the stochastic problem into a deterministic one. Thus reducing the computational burden by orders of magnitude.
This article is structured as follows: In Sect. 2 we provide some necessary fundamentals. In Sect. 3 we describe the proposed stochastic design optimization framework. We then introduce the test subject in Sect. 4. In Sect. 5, we discuss the asymptotic properties of the VaR surrogate models. In Sect. 6, the proposed methodology is applied to an engine calibration problem. First, the performance of the method is contrasted against a state-of-the-art deterministic calibration setting followed by a comparison with the numerical alternative using the quasi-Monte Carlo (QMC) method to quantify the computational effort needed to solve the optimization problem. Finally, the results of the article are discussed in sect. 7.

Background
As described above, this section introduces some necessary fundamentals. First, we give some intuition on Gaussian Process regression. We then provide an overview of nonlinear constrained optimization problems.

Black-box modeling with Gaussian processes
Bayesian regression based on Gaussian processes is an approach for nonparametric modeling where an unknown system y = f ( ) + is modeled from some observed data [23][24][25]. For combustion engine models we usually assume our various system outputs to be uncorrelated. We, therefore, focus on the basic single output supervised learning setting where our goal is to approximate a function [26]. In the Bayesian framework, we assume a GP to be a collection of random variables so that every finite set of these has a joint Gaussian distribution and is completely specified by its mean ( ) and covariance function k( , j ) and can be written as f ( ) ∼ GP( ( ), k( , j )) . Given noisy observations and assuming the GP to have a zero-mean prior, its joint distribution for an arbitrary input is given as where ∈ ℝ n×n is the covariance matrix, 2 the noise variance, and n the n × n identity matrix. Given this joint distribution, the predictive distribution of the GP for an arbitrary input is specified by its mean and variance with where k( ) = (k( , 1 ), ..., k( , n )) ⊤ is a column vector containing the covariance of the input and the training samples.
(1) p(y | D) = N(y|0, + 2 n )., By definition, the covariance function is symmetric positive-definite, i.e. for any set of inputs the resulting covariance matrix is positive-definite. A popular choice for the covariance function is the squared exponential (SE) kernel given by where 2 s ∈ ℝ + 0 is the signal variance and ‖⋅‖ l designates the norm taking into account the characteristic length-scales l = {l i } with i = 1, ⋯ , d given by the sum of squares of the product ( p − q )l −1 i . The infinite differentiability of the SE kernel results in smooth predictive functions. Further intuition on covariance functions is given in [6].
The prior and likelihood of the GP encode our beliefs about the model we wish to learn. With , s , and l being the hyperparameters of the model. GP hyperparameters can be learned using different methods such as integration via Markov chain Monte Carlo [25,27]. Given the above assumptions, the hyperparameters can also be learned by maximizing the log marginal likelihood with = (y 1 , … , y n ) . For covariance functions with firstorder derivatives, this can be solved using gradient-based optimization.

Probabilistic nonlinear constrained optimization
Many industrial tasks such as base engine calibration are nonlinear constrained optimization problems. They aim at minimizing an objective f on a domain X subject to equality and inequality constraints. Since equality constraints are of no interest to this particular paper, they shall be disregarded from here on. The general mathematical formulation of a deterministic nonlinear constrained optimization problem is, therefore, given by where f is the cost function that needs to be minimized, which is subject to constraints c. This kind of problem classifies as a convex or non-convex mixed-integer nonlinear programming (MINLP) problem depending on the space X feasible ⊆ X of feasible solutions. More intuition on MINLP problems and an overview of methods for solving these are provided in [28,29]. Most real-world applications, however, are subject to uncertainties ranging from environmental and operating conditions to manufacturing tolerances. In this probabilistic setting, the system inputs are assumed to be random quantities.
Definition 1 Let X be a multivariate random variable on a probability space (Ω, Σ, P) with sample space Ω , sigmaalgebra Σ , and probability measure P. Then Y = f (X) is a transformation of X under f and again a random variable on Ω . Its cumulative distribution function is given by Given this, a possible probabilistic formulation of the nonlinear constrained optimization problem in equation 6a and 6bis given by the individual chance-constraint optimization setting where the new objective is to minimize the first statistical moment of the distribution f(X) subject to probabilistic constraints. Solving these kinds of stochastic MINLP problems involves the evaluation of multi-dimensional integrals for the objective and constraints, which can be computationally expensive even for low dimensional problems. Often a more feasible alternative, especially for high dimensional problems is, therefore, the utilization of sampling techniques such as MC sampling [22]. This involves drawing samples from the distribution X, evaluating these, and approximating the distribution f(X). However, evaluating a high number of samples for optimization tasks involving a large number of function evaluations is computationally infeasible. A popular approach to tackle this issue is to simplify the stochastic problem into a deterministic task [30].

The stochastic design optimization framework
In this section, we propose a framework for stochastic design optimization, that aims at quantifying and considering component tolerances during the model-based calibration process. The methodology aims to find optimal design parameters for a system subject to uncertainty by solving a probabilistic constrained optimization problem. An necessity for the methodology is short computing time. This results from the requirement to keep the powertrain development cycle as short as possible. The overall objective of the methodology is to replace conventional safety factors using a priori uncertainty analysis and improve the calibration results. Figure 1 depicts the general structure of the framework.

Uncertainty analysis
In general, there are two types of uncertainties [31]: While pure random effects such as geometric tolerances due to imprecisions during production are classified as aleatoric uncertainty, systematic effects like material aging are epistemic. Both effects need to be considered in an uncertainty analysis.
To take input uncertainty into account during optimization, it has to be determined beforehand. The combustion process of an engine is mainly determined by the design parameters or the set ECU parameters. Manufacturing tolerances lead to deviations from the set values and in turn to changes in quantities such as fuel consumption and emissions. Since the design parameters of our engine models are not directly subject to uncertainty, we have to perform an uncertainty analysis that maps relevant sources of uncertainty such as manufacturing tolerances to the design parameters. This is achieved by performing an uncertainty analysis, where uncertainty propagation is used to determine the design parameter uncertainty. Figure 2 depicts an excerpt from the uncertainty analysis we performed for the calibration problem in Sect. 6. To convey an intuition here: using the results of tolerance investigations on a set of injectors, we determine the uncertainty distribution of the injected quantity (IQ) of a single pilot injection (PI), which depends on the nominal rail pressure and nominal IQ. Since we cannot measure the emissions of each individual cylinder, we sample all cylinders and assume the sample mean to represent the final quantity that results in the measured emissions.

Optimization problem formulation
As mentioned before, the engine calibration task is a constrained optimization problem. The aim is to minimize an objective function while making sure that some constraints are met for the entire vehicle fleet. A potential formulation for a nonlinear constrained optimization problem was given by equations 8a and 8b. This is a reasonable formulation for the combustion engine setting since it also considers the tails of the target distributions. For example, if a vehicle with a nominal ICE barely complies with emission limits, then a high number of vehicles will be surely non-compliant.

Definition 2
The value at risk of a multivariate random variable X given a confidence level ∈ (0, 1) is given by We can reformulate equation 8b using the VaR and write the optimization problem as where VaR i is a probabilistic constraint function and c c,i the corresponding constraint.

Surrogate model of the value at risk
As already described above, base calibration tasks are usually high dimensional MINLP problems. The associated uncertainty of the design parameters is not necessarily uniform or Gaussian and often correlated. This renders cheap approximations such as Most Probable Points [32] useless. Therefore, sampling is often the only viable way to approximate distribution statistics. However, base calibration problems usually require millions of function evaluations, making the problem extremely expensive in terms of computing time and impractical to solve. Generally, a viable way of reducing the computational burden resulting from expensive experiments or simulations is to train a surrogate model that reproduces the behavior of the experiment or simulation [33]. The idea is to create a black-box model mapping the parameters of the experiment or the simulation to the desired outputs. If such a model can be created with good model quality and using a limited number of measurements or simulation runs, then it can reduce the computational burden of optimization substantially. The resulting black-box model is relatively cheap to evaluate. In our case, we wish to train a model g ∶ ↦ VaR (X) that maps the nominal input x to the corresponding VaR . Our implementation uses GPs to learn the underlying VaR. Given a set of n observations , the expectation of the GP surrogate model is expressed as where VaR is the vector containing the empirical VaR values from the observations. The empirical VaR is estimated by drawing m independent samples from a Sobol sequence [34][35][36][37]. The assumption of continuous domain needed for GPs holds true if the VaR exists. Further intuition on the approximation of distributional statistics is given by [38].

Implementation details
We have implemented the framework as a plug-in for the above-mentioned commercial software ASCMO as Matlab code. This has the advantage that ASCMO already offers several possibilities for modeling, optimization, and analysis.
For this work, in particular, we have also implemented the methodology in Python. This is necessary to ensure a high level of transparency in the calculations. We train all models using the GPRegression module from the GPy library [39]. Maximum likelihood estimates are calculated using the implemented stochastic conjugate gradient algorithm. For creating Sobol quasirandom sequences we use an implementation written by [40], which is available at https ://peopl e.sc.fsu. edu/~jburk ardt/py_src/sobol /sobol .html. We use the sequential least squares programming algorithm from the scientific computing library [41] to solve the MINLP problems. Correlated random variables are generated using Cholesky decomposition, see Appendix A.

Test subject
The ICE we use for our investigations serves as a platform for research purposes at Robert Bosch GmbH [42]. The test subject is an inline four-cylinder light-duty diesel engine for passenger cars that is equipped with a direct injection system. The ICEs main characteristics are given in Table 1.
For monitoring purposes, thermodynamic quantities such as temperatures, pressures, and mass flows are measured. The exhaust gas is extracted upstream of the exhaust gas after-treatment system. Nitrogen oxide (NO x ) emissions are measured via chemiluminescence. Soot is measured using a high-resolution smoke meter [44]. The combustion noise is calculated using an in-cylinder pressure-based method [45].

The engine model
In general, the combustion process in a diesel engine is determined by the OP and design parameters. The OP is defined by the engine speed and engine torque. The control or design parameters consist of air supply and fuel supply parameters. The air supply parameters are intake air mass, exhaust gas recirculation rate, swirl flap actuation, and boost pressure. The fuel supply parameters are common rail pressure, injected quantities of the pilot and post injections (PoI), the start of energizing (SOE) of the main injection (MI) in Crank angle degree after top dead center firing (CAD a. TDCf) and the distances between each injection in seconds, see Figure 3.
For the investigations, we create models of the relevant outputs that are sensitive with respect to the parameters named above. For the sake of simplicity, the OP is fixed to 2000min −1 engine speed and 400kPa brake mean effective pressure. The model is based on stationary test bench measurements from a design of experiments (DoE) that we set up using the commercial software package ASCMO [46]. The DoE samples are drawn from a Sobol sequence on ( lower , upper ) ∈ ℝ d , with d being the number of input dimensions or design parameters and lower and upper being vectors that represent the range within which the design variables can be varied, see Table 2. For this measurement campaign, the EGR rate was not actively varied during the experiments. This leads to maximum EGR rates and avoids parameter combinations producing high nitrogen oxide Turbocharger with variable geometry turbine Exhaust-gas recirculation Cooled high-pressure EGR + low-pressure EGR system (EGR) system Exhaust-gas after-treatment Diesel oxidation catalyst + close-coupled selective system catalytic reduction (SCR) catalyst on a diesel particulate filter + underfloor SCR catalyst with ammonium-slip coating  We use n = 675 measurements to create a model of each target quantity. The outputs are modeled independently using zero-mean prior GP regression models with additive Gaussian noise and automatic relevance determination (ARD) SE kernel [6,47]. The model hyperparameters are estimated by maximizing the log marginal likelihood. In order to make meaningful design decisions through optimization, the model quality must be high. This is especially true for a stochastic optimization scenario. A common issue with supervised learning algorithms such as GPs is overfitting [48,49]. This is basically the phenomenon of the algorithm memorizing the training data and not generalizing properly. An overfit model makes perfect predictions for the training data. When confronted with unknown test data, though, the predictions will be arbitrary. Although GPs under the premise of large amounts of data are generally robust against overfitting [6], the model can still overfit when maximizing the log marginal likelihood. One way of ruling out overfitting phenomena is to validate the model on unseen test data using proper performance metrics such as the normalized root-mean-square error (NRMSE) and the coefficient of determination ( R 2 ), see Appendix B. Since in our case we only have a limited amount of training samples, we need to validate the model without the use of test samples. A valid way to accomplish this is to use the leave-one-out crossvalidation (LOOCV) method, see Appendix C. Figure 4 shows two scatter plots derived from the nitrogen oxide model. The left plot shows the observation and prediction value pairs. The model quality is excellent and the error between prediction and observation is in the measurement error range. The plot to the right shows value pairs consisting of observed values and model predictions using the leave-one-out method. The correspondence is very good and indicates that the model generalizes well. Table 3 shows the NRMSE LOOCV, and R 2 LOOCV results of the three engine models. We see that model quality is high for each of the three models with R 2 values close to 1 and the error is within the measurement accuracy range. The RMSE LOOCV and R 2 LOOCV are similar to the RMSE and R 2 , which indicates good generalization for each of the three models.

System uncertainty
In this section, we show the results of the uncertainty analysis we have performed for the above-described test subject. Combustion engines are subject to a variety of uncertainties, ranging from environmental and operating conditions to manufacturing tolerances. In this article, we focus on manufacturing tolerances in the most relevant components since these are the main contributors to the overall uncertainty during stationary operation. Table 4 shows a list of the design parameters with the corresponding uncertainty distributions, which result from the uncertainty analysis we have performed for the engine.

Asymptotic properties of the VaR surrogates
In this section, we discuss the asymptotic properties of the VaR surrogate models. We investigate the performance of the GP surrogates with respect to the number of training samples n and the number of random samples m used to calculate the VaR. We further investigate which ratio of n and m is most efficient in terms of computation for a given engine out soot emission model. Figure 5 shows the performance of three GP surrogates with n = 256 , 512, and 1024 training data points as a function of m. The surrogate models are trained assuming zero-mean prior ARD SE kernel and additive Gaussian noise. The input parameter uncertainty is given in Table 2 and the chosen confidence level is 0.95. The training data is randomly drawn from the multivariate uniform distribution given in Table 2. The plot also shows the performance of empirical estimates using pure MC and QMC. The predictive performance is measured on a separate test data set with n test = 100 using the NRMSE. The box plots are the result of repeating this experiment with 50 random initializations. Since the training samples are drawn randomly, the trained model differs each time. Similarly, the MC and QMC results vary since the bootstrap is initialized randomly. The residual error e i for each test sample is the difference of the prediction and an empirical VaR using m test = 10 6 quasirandom samples that is close to the true VaR. MC sampling shows significant variance in performance. QMC sampling shows the best performance with a significantly lower variance than the MC method. The GP surrogate models show low variance in performance, even with small n and m. Given a sufficiently high amount of training samples, the GP surrogate shows good performance only second to the QMC method. [38] have already argued that under certain conditions for a given confidence level a GP-based quantile estimate is asymptotically consistent  and efficient. These conditions being: X is a convex and compact subset of ℝ d , the training data is independent and identically distributed in X , the assumption of continuous domain for the VaR holds and there exists a ratio n m that is efficient in terms of function evaluations. They established a bound under which as n, m → ∞ the mean of the GP posterior almost surely converges to the true VaR at the fastest rate possible in terms of function evaluations. Figure 6 depicts the GP surrogate performance as a function of the ratio n m for an increasing amount of function calls required for training. It is evident that model performance increases as n, m → ∞ . For this model, n m ≈ 2 seems to be a good choice to maximize model performance.

Application to an engine calibration problem
To demonstrate the fitness and strength of the proposed framework, we apply it to the engine calibration problem described above. For the sake of simplicity, the problem is formulated for a single OP. In practice, it is common to optimize several interconnected OPs simultaneously [50]. For this OP, we aim to minimize nitrogen oxide emissions while ensuring that soot and noise emissions are kept below an upper bound for 95% of all realizations. We further wish to find a solution inside the initial parameter space (x lower , x upper ) ∈ ℝ d described by Table 2, since the GPs we construct are not suited for extrapolation.
Using the engine models we constructed in Sect. 4.1, we train surrogate models to emulate the VaR. We sample n = 500 training data points from the above mentioned bounded parameter space and draw m = 1000 QMC samples from the distribution given by Table 4 to estimate the empirical VaR for each point. This equals n m = 0.5 and is only slightly worse than n m = 2 performance-wise. However, considering the time complexity of a standard GP of O(n 2 ) , the computation time needed for a prediction is reduced fourfold. The corresponding mean estimate of the emulator can be expressed according to equation 11. Figure 7 shows two scatter plots derived from the VaR surrogate of the soot model. The left plot shows observation and prediction value pairs for the training data. The right plot shows observation and prediction value pairs for n test = 100 test data points. The correspondence is nearly perfect for the training and test samples. Table 5 shows the NRMSE and R 2 results for all three surrogate models. The surrogates emulate the VaR perfectly, even for the unseen test samples, and generalize well.

Problem formulation
To assess the benefit of our methodology, we contrast it to the deterministic setting using safety coefficients. Both tasks can be formulated as MINLP problems. The deterministic scenario is expressed as  where f is the predictive mean of the engine model, l is an upper limit for admissable emissions, and s is a safety coefficient. The safety coefficients are empirical values based on expert knowledge accounting for tolerance behavior. The settings we chose for this particular problem are given in Table 6. The given safety coefficients are the state of the art choice for the given calibration task.
The corresponding problem formulation for the stochastic setting is given by

Optimization results
In this section, we present the optimization results. The Sequential Least-Squares Quadratic Programming algorithm [51] is used for solving the above-formulated problems. The method can be used to solve MINLP problems, which are subject to equality and inequality constraints and to lower and upper bounds on the model parameters. The optimization problem is solved for n init = 20 quasirandom initialization in an attempt to find a global optimum. The function tolerance is set to 10 −5 . For the remaining settings, the default values are selected. Analytical formulations for the GP derivatives are provided for the stochastic setting. Figure 8, Figure 9 and Table 7 show the optimization results for both scenarios. Although the ultimate goal is the same, we see two distinct results. It is evident that the prior assumptions we made for the deterministic case are too conservative. The VaR 0.95 for soot and noise is well below the actual upper bound we were aiming for. By selecting smaller safety coefficients, an improvement in nitrogen oxide emissions would have been possible for this particular setting. On the other hand, for a different choice of design parameters, a smaller value could also lead to non-compliance with the constraints. Without prior knowledge of the output variance, a sufficiently large margin must be chosen.
In contrast to the deterministic setting, our probabilistic approach renders a good estimate for the VaR. This way, we obtain a solution that meets the desired boundary conditions almost exactly. The expected NO x emissions are reduced from 1.129g/kWh to 0.281g/kWh . This corresponds to a decrease of 79.54% . The expected soot and noise emissions are slightly higher at 0.430FSN and 85.841dBA but do comply with the limits for 95% of the realizations.

Gain in computing time
In this section, we compare the computation time of our method with the deterministic setting and a stochastic QMC equivalent. The aim is to keep computing time as low as possible. The reason for this is that in practice multiple OPs need to be optimized simultaneously and a high number of constraints must be satisfied. At the same time, the cost function needs to be minimized for a high number of driving trajectories, which leads to high numbers of function evaluations [16]. As mentioned above, the optimization problem is solved for n init = 20 quasirandom initialization in an attempt to find a global optimum. All computations are carried out on a machine with an Intel Xeon E3-1505M CPU, 32 GB Ram, and an Nvidia Quadro M2000M GPU. We first compare the time needed to compute the optimization task of the deterministic setting with our approach. The computation time for both methods is 15.87s and 9.26s respectively. The time difference is mainly due to the higher prediction costs of the original engine models. As mentioned above, the time complexity of a standard GP is of O(n 2 ) . We further assess the amount of computation time that is saved compared to the setting where the stochastic problem is solved numerically. We, therefore, compare our method with the QMC method. To ensure similar accuracy as with the surrogate model, 500 quasirandom samples are drawn for estimating the empirical VaR. The optimization results for both solutions are given in Table 7. The QMC method finds a similar optimum as our approach. Differences in the solutions result mainly from the termination criteria of the optimizer. Model inaccuracy and numerical errors account for slight differences only. The theoretical time complexity for both methods should be of O(n 2 n e m) , where n is the number of training samples of the respective models, m is the number of draws needed to evaluate the VaR, and n e is the number of total function calls needed for optimization. If we assume that both algorithms need the same amount of function evaluations to find the optimum, our approach in this scenario should be approximately 900 times faster than the QMC method. The measured computing time for our approach and the QMC method is 9.26s and 1691.76s respectively. Our surrogate-based method is ∼ 183 times faster than the QMC approach. The difference to the theoretically possible computational time gain is due to the higher impact of load times on the overall shorter computing times of our approach. For larger optimization tasks, the impact should be significantly lower and the computational time gain is expected to be closer to the theoretical time gain. The time gain we obtain for the simplified problem we have solved may seem noteworthy. As mentioned before though, real-world calibration problems involve significantly more function evaluations and optimization usually takes several hours. In view of this, we can achieve a considerable reduction in computing time and enable large-scale optimizations without the need for parallel computing on multiple machines.

Conclusions
Stochastic design optimization was performed for a representative engine calibration problem in a single OP using the proposed method. The performance of our method was evaluated against the corresponding deterministic optimization scenario using safety coefficients. Compared to the deterministic scenario, it was possible to reduce the expected nitrogen oxide emissions for the OP by 79.54% . The imposed soot and noise constraints and design space boundaries were met in both cases. The reduction in nitrogen oxide emissions is mainly due to the accurate estimation of emission distributions. Due to the analytical form of the GP surrogates, evaluating the VaR is extremely cheap and computing time is comparable to the deterministic method.
In a second step, we contrasted our method to the numerical equivalent solution using QMC sampling. The global optimum was found in both cases with slight numerical differences. Our method showed a decrease in computing time by a factor of 183. This reduction is likely to be even greater for large-scale problems. Due to the low computing time, the method can be applied to large-scale engine calibration problems without the need for parallel computing.

Summary and outlook
In this paper, we proposed and demonstrated a method for efficiently solving large-scale stochastic optimization problems subject to nonlinear constraints. Our main motivation for developing the method is to employ it in the field of base engine calibration. The premise of our method is that a model of the system under study exists, which can be evaluated cheaply. Furthermore, it is essential that the stochastic optimization problem can be formulated such that the objective function and the constraint functions can be expressed using the VaR. The main idea is to derive analytically closed formulations for the VaR, which are cheap to evaluate and thus reduce the computational effort of evaluating the objective and constraints. The VaR is therefore learned as a function of the input parameters of the initial model using a supervised learning algorithm. For this work, we employ Gaussian process regression models.
For demonstration purposes, we have employed the methodology on an engine base calibration problem given a realistic optimization scenario in a single OP. We also performed a state-of-the-art deterministic optimization given a realistic engineering target. For the given scenario the objective was to minimize nitrogen oxide emissions while complying with an upper limit for soot and noise emissions. The optimization results show that the consistent treatment of the problem as a statistical one helps to maximize performance compared to the deterministic approach. Our stochastic approach reduces nitrogen oxide emissions by almost 80% compared to the deterministic scenario. The constraints on soot and noise emissions are complied with. This significant decrease shows a high potential for the reduction of overall raw emissions for practical application of the methodology.
In a second step, we demonstrated the computational efficiency of the method by comparing the computing time of the optimization process to the deterministic case and an equivalent numerical QMC solution. Our method performed on par with the deterministic scenario timewise. It reduced computational effort by nearly 200 times compared to the QMC alternative while converging on the same global optimum.
In this article, we focused on base calibration problems. Another application of uncertainty quantification in the field of engine calibration could be to use uncertainty analysis to assess the overall system uncertainty. This would facilitate decision making in the selection of engine components.
An interesting topic for future research is the extension of the method to larger calibration problems, where the entire operating range of the engine consisting of more than one OP is considered and optimized simultaneously. This adds additional uncertain parameters such as route, traffic, and driving style to the probabilistic problem [17]. A major challenge for solving this kind of task is to create global engine models that are sensitive with regard to the design parameters and are able to capture engine dynamics at the same time. However, recent proposals seem to be promising [52,53].
Funding Open Access funding enabled and organized by Projekt DEAL..
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.

On generating correlated random variables
Let ∶ Ω → ℝ n be a n × 1 normal random vector on a probability space (Ω, Σ, P) consisting of correlated random variables (Z 1 , ⋯ , Z n ) . Given the mean Z and the positive-definite covariance Z , we wish to generate samples for . Let ∶ Ω → ℝ n be a n × 1 random vector with (X 1 , ⋯ , X n ) being independent normal random variables with mean X = 0 and covariance X = n . With n being the n × n identity matrix. Then we can write as with yet to be determined and The covariance of is defined as If the covariance of Z is an n × n real-valued symmetric matrix, it can be expressed as where is the eigenmatrix of Z and  a diagonal matrix with the entries being the eigenvalues of Z . Using equation 16 and 17, we can write and compute it as the Cholesky decomposition of Z . Using and we can calculate given samples .

Performance metrics
A widely used metric that measures the difference between measured data points y and corresponding model predictions ŷ is the NRMSE. It measures the normalized average quadratic error of the model predictions. Since the error is squared, it is disproportionately large for high error values, making it sensitive to outliers. An NRMSE of 0 means a perfect model fit. We denote the NRMSE as where e =ŷ − y is the residual and y min and y max are the minimum and maximum values of the observations . The coefficient of determination is another commonly used metric. It measures the error of the model proportional to the variation in the data. The coefficient of determination can be described as A R 2 of 1 means a perfect model fit.

Leave-one-out cross-validation
A valid way of evaluating the performance of a model without the need for test samples is to use the leave-one-out cross-validation method. The idea is to first split the training inputs consisting of n samples into n sets { −1 , ⋯ , −n } , where −i is the set containing all training samples but the ith. In a second step the model is trained on ( −i , −i ) , with being the respective outputs. And a prediction is made for the ith training sample. This way, the error of the model for the ith training point can be determined for all i ∈ ℕ , with the predicted values not being used for estimation. The calculation cost of this method is usually prohibitive, but in the case of GP regression, there are suitable approximations. One way to compute a numerical approximation to this is described in [54] and is written as where w ii is the ith diagonal entry of the n × n Woodbury identity denoted as and v i is the ith entry of the 1 × n vector The ith residual of the LOOCV approximation is, therefore, given by (21)  (24) e i =ŷ −i − y i = −w −1 ii v i