1 Introduction

In science, technology, and economics, mathematical models are commonly used to describe physical phenomena, to solve design problems and to manage production processes. The employment of these models frequently entails uncertainty. It has been observed that the dominant uncertainties arise from our lack of knowledge about system parameters and from deficiencies in the modeling itself (Oden et al. 2010). We consider models to be mathematical constructs which describe the functional relations between inputs, internal variables, model parameters and outputs. All present knowledge about the technical system or phenomenon of interest is represented by such a model. Correspondingly, we mean by model uncertainty that some of these functional relations are imperfect, insufficient or simplified in comparison to observed reality. Thus, the present description of the system or phenomenon is incomplete in the sense that there are aspects which have been ignored. As a consequence, any simulated process or manufactured product that is based upon these models is impaired in its predictive quality or usage. Hence, it is important to develop tools and algorithms for the identification, quantification and control of model uncertainty.

In order to detect whether a model is inadequate, one has to compare the model output to actual experimental data. It is, however, difficult to derive a simple criterion for the model to be accurate, since the measurement data is imperfect and subject to uncertainty as well. Generally, data uncertainty arises from irreducible randomness, which is also referred to as aleatoric uncertainty (Lemaire 2014; Zang et al. 2002), and from systematic errors in the measurement process due to lack of knowledge or ignorance, also known as epistemic uncertainty (Roy and Oberkampf 2011; Vandepitte and Moens 2011). In the course of model calibration, the model parameters are adjusted so as to make the model output compatible with experimental observations. As a consequence, uncertainty is transferred from the experimental data to the model parameters.

In this paper we propose an algorithm to detect model uncertainty using parameter identification, the optimal design of experiments approach and statistical hypothesis testing. Here, we understand parameter identification to be the process of adjusting model parameters as described above and optimal design of experiments to be the best choice among experimental setups, e.g., sensor types and positions, such that the uncertainty in the estimated parameters is minimized (Alexanderian et al. 2016). Our methodology is able to distinguish between data uncertainty on the one hand and model uncertainty on the other hand. Particularly, we interpret any inconsistency in parameter estimates from different measurement series as an indicator that the underlying mathematical model is unable to describe all measurement series with the same set of parameter values. We assume neither an a priori distribution nor a specific form of model uncertainty in the mathematical equations.

The first step before estimating model parameters is to acquire measurements that capture the behavior of the system well. This step can be costly if many physical properties of the system have to be observed in each experiment. In some engineering applications, measurements are taken from a small-sized prototype rather than from the expensive product, which is often not yet available. It is therefore desirable to know beforehand the optimal sensor positions for the actual product by considering the experimental results from the prototype. Thus, it is valuable to reduce the number of sensors if this does not downgrade the reliability of the identified model parameters. Additionally, removing unreliable sensors may even improve the quality of the estimate. This can be done using the methodology from optimal design of experiments, i.e., by deciding which sensors are actually best suited for gathering data in order to minimize the posterior variance of the estimated parameters. Using these kinds of sensors and their optimal positions, measurements with maximum informational value can be obtained.

To determine model uncertainty based on measurements obtained from an optimally designed experiment, we split the experimental data into a calibration and a validation set. Then we solve the parameter identification problem for the calibration set. Furthermore, we compute a confidence ellipsoid for a given confidence level \(1 - \alpha\), where \(\alpha \in (0,1)\), and if the model is correct, then the solution of the parameter identification problem for the validation set should lie within this confidence ellipsoid. If the optimal parameters for the validation set are outside this confidence ellipsoid then we have an indication of model uncertainty. The splitting of the data and the testing is repeated until the number of desired test scenarios is reached.

In the literature, a variety of methods exist to detect, quantify and control model uncertainty. We generally distinguish between a non-probabilistic approach where uncertainty is treated in a rather deterministic way (Farajpour and Atamturktur 2012; Simani et al. 2003; Smith 2014), a probabilistic approach based on Bayesian inference to assess the prediction quality of a model (Farajpour and Atamturktur 2012; Gu and Wang 2018; Mallapur and Platz 2018, 2019; Sankararaman and Mahadevan 2011; Wang et al. 2009) and a probabilistic frequentist perspective (Liu et al. 2011; Wong et al. 2017; Zhao et al. 2017). In this paper we adopt a probabilistic frequentist point of view to deal with model uncertainty. In the following, we explain in more detail the main differences from other methods that are closely related to our approach.

Model uncertainty is especially discussed in model-based fault diagnosis of machines. Simani et al. (2003) treat uncertainty in the modeling by bounded error terms in the model equations and thus take a robust optimization point of view. This method assumes a priori information on the uncertainty in the mathematical equations. In our approach we do not need any assumptions upon the specific form of uncertainty.

Our methodology is similar to the idea of Körkel et al. (2004), Bauer et al. (2000) and Galvanin et al. (2007), who also combined optimal design of experiments with parameter identification. However, they only used this method to reliably find optimal parameter values. Asprey and Macchietto (2000) extended this methodology to choose between competing models by maximizing a measure of divergence between model predictions. In our approach no such measure is needed; we only employ the parameter estimates and their covariance matrices. Another difference is that we also consider higher order derivatives in the computation of the covariance matrix (Bard 1974), which is used to determine the confidence ellipsoid.

There is extensive literature on Bayesian parameter calibration and validation. However, there seem to be only a few references dealing with model uncertainty from a general viewpoint. Lima et al. (2017) describe a general method to select the best model based on Occam’s Plausibility Algorithm (Oden et al. 2017) and Bayesian calibration. In contrast, we do not adopt a Bayesian perspective but involve design of experiments instead to sharpen the parameter estimates.

Staying within this Bayesian framework, the question whether a set of measurements for a given model is adequately described by the same set of parameters is also addressed by Tuomi et al. (2011). Using a given prior distribution for the parameters, they derive an inequality to dismiss the veracity of a model: if the probability for the data to be obtained under different parameter sets is significantly higher, then the model is rejected. In this work we discuss the same question, but from a probabilistic frequentist point of view without any assumptions on the prior distribution of the parameters.

Another important approach to identify and control model uncertainty was introduced by Kennedy and O’Hagan (2001). This method is based on the assumption that the true values of the quantities of interest are the sum of the model output \(h(p, q)\), with input q and model parameters p, and the model discrepancy term \(\delta (\theta , q)\). Thus, the measurements z should satisfy the equation

$$\begin{aligned} z = h(p, q) + \delta (\theta , q) + \varepsilon \end{aligned}$$
(1)

with independent observational noise \(\varepsilon\). Then parameter identification can be performed for (1) to obtain best guesses for both the model parameters p as well as the parameters \(\theta\) of the model uncertainty \(\delta\). Arendt et al. (2012) use this approach for model updating and to distinguish between the effects of model calibration and model discrepancy. However, it has been shown by Brynjarsdóttir and O’Hagan (2014) that the success of this approach heavily depends on the incorporation of a priori knowledge about the specific form of model uncertainty into the representation of \(\delta\), which is often assumed to be a specific type of stochastic process, but is actually not known beforehand. In contrast, our approach does not need any assumptions about the specific form of model uncertainty.

Forming presses are one particular class of technical systems with a multitude of uncertain parameters and unknown physical effects that challenge the modeling process. They are highly loaded machines with kinematic degrees of freedom to perform a motion and to apply forces of high magnitude on a workpiece. During this motion, the workpiece is formed into a new shape. This can cause a considerable deflection of machine components, which is of high technical importance; therefore, we want to model this deformation accurately. In this paper, we consider a mechanical forming machine, the 3D Servo Press, which consists of a linkage mechanism. The kinematic chain is determined by multiple mechanical components with a large number of parameters. We approximate this chain by a lumped parameter system to reduce the number of parameters. When modeling a machine, we typically pursue one of two objectives that lead to different lumped parameter models: an accurate elastic behavior at low frequencies or an accurate frequency response (Dresig and Fidlin 2014). In the case at hand, we seek a model that represents the elastic behavior accurately at low excitation frequencies. To estimate the stiffness of components with non-uniform cross-sections, a finite element model is typically used. In a second step, the finite element model is reduced to the lumped parameter model. This model order reduction introduces inaccuracies, in addition to a variety of uncertain influencing variables such as material properties and inexact geometries. Hence, for some components it is necessary to identify the stiffnesses after the assembly of the machine. Due to the deflection, a relative movement of the components occurs, and as a result friction dissipates a portion of the kinetic energy. However, for the modeling of friction on a macroscopic level, multiple competing phenomenological models exist (Bertotti and Mayergoyz 2006).
In this work, three such friction models are considered as competing candidates to explain the load-displacement curve of the 3D Servo Press. We apply our methodology to identify uncertainty in these models and to select the most accurate of them.

The paper is organized as follows. First we introduce the parameter identification problem and its covariance estimation. Based on the resulting covariance matrix, we then formulate the problem of optimal experimental design to find optimal sensor positions which lead to the smallest variance of the resulting parameter estimates. In Sect. 4 we describe in more detail how parameter identification, optimal design of experiments and hypothesis testing can be used to detect model uncertainty. Afterwards we introduce the working principle and the mathematical models of the 3D Servo Press. The application of our proposed method to the models of the 3D Servo Press is done in Sect. 6, where we also present numerical results. We end the paper by giving some concluding remarks.

2 The parameter identification problem and its covariance estimation

In this section we present the parameter identification problem in a similar way as it is done by Körkel et al. (2004). We first introduce some basic notation and assumptions, formulate the problem and then deduce the covariance matrix as well as the considered confidence regions.

The mathematical model is given by the state equation

$$\begin{aligned} E(y, p, q) = 0, \end{aligned}$$
(2)

where \(E:{\mathbb {R}}^{d_{y}}\times {\mathbb {R}}^{n_p}\times {\mathbb {R}}^{d_q} \rightarrow {\mathbb {R}}^{d_E}\) is an operator coupling the state vector \(y \in {\mathbb {R}}^{d_y}\) and the parameters \(p \in {\mathbb {R}}^{n_p}\) for any input variable \(q \in {\mathbb {R}}^{d_{q}}\). This state equation may be a discretized form of a partial differential equation (PDE) with large dimensions \(d_y\) and \(d_E\). We assume that (2) has a unique solution y for any given p and q. In our modeling, the input variables represent external boundary or load forces which are applied to a mechanical system, see Sect. 5. Particularly, we have \(n_q\) inputs in a loading-unloading scenario and we write \(q_j \in {\mathbb {R}}^{d_{q}}\) for one input from such a scenario and \(y_j \in {\mathbb {R}}^{d_y}\) for the corresponding state.

The model parameters p are in general not known beforehand. Therefore, we need measurements to obtain appropriate estimates. Let \(n_S\) denote the number of allocated sensors for data collection. To be more precise, we define a measurement series \(z_i\), where \(i \in \left\{ 1, \ldots , n_M\right\}\), to be a set of data points \(z_{ijk}\) acquired for all input variables \(j = 1, \ldots , n_q\) and for all sensors \(k = 1, \ldots , n_S\). We collect \(n_M\) different measurement series in order to improve the information gain and accuracy. We assume that the measurements \(z_{ijk}\) are collected by prepositioned sensors where each sensor k has a constant standard deviation \(\sigma _k \in {\mathbb {R}}\) for each input \(q_j\) and in each measurement series i. The aim of the parameter identification problem is to find model parameters \(\overline{p}\in {\mathbb {R}}^{n_p}\) that best fit the model output to the measurements \(z \in {\mathbb {R}}^{n_M \times n_q \times n_S}\) for given inputs.

In general, it is not possible to measure all of the state components directly. Therefore, we introduce an observation operator \((y_j, p, q_j) \mapsto \overline{h}(y_j, p, q_j) \in {\mathbb {R}}^{n_S}\) that maps state, parameters and inputs to the actual quantity that is measured. Since we will later choose an optimal subset of all possible sensors, we introduce binary weights \(\omega \in \left\{ 0, 1\right\} ^{n_S}\) such that \(\omega _k = 1\) if and only if sensor k is used.

We apply the least-squares method to find the optimal parameter values which minimize the discrepancy between given measurements z and the model output weighted by the standard deviation of each sensor, respectively:

$$\begin{aligned} \begin{aligned} \min \limits _{(y, p)}&\quad \sum _{k = 1}^{n_S} \sum _{j = 1}^{n_q} \sum _{i = 1}^{n_M} \dfrac{\omega _k}{2} \left( \dfrac{z_{ijk} - \overline{h}_{k}(y_{j}, p, q_j)}{\sigma _{k}}\right) ^2 \\ \mathrm {s.t.}&\quad E(y_{j}, p, q_j) = 0, \quad \text {for } j \in \left\{ 1, \ldots , n_q\right\} . \end{aligned} \end{aligned}$$
(3)

Remark 1

Alternatively, we can also assume that each sensor k has a given standard deviation \(\sigma _{ijk}\) in each measurement series \(i \in \left\{ 1, \ldots , n_M\right\}\) and for each input \(q_j, j \in \left\{ 1, \ldots , n_q\right\}\). However, to keep the notation simple, we assume the working precision \(\sigma _{k}\) of each sensor to be constant over all measurement series and all inputs.

For convenience, we rewrite problem (3) in vector form with \(n = n_M n_q n_S\) being the specified dimension and eliminate the state equation by inserting the unique state solution

$$\begin{aligned} y(p) :=(y_1(p), y_2(p), \ldots , y_{n_q}(p)) = (y(p,q_1), y(p,q_2), \ldots , y(p,q_{n_q})) \end{aligned}$$

into the objective function leading to the optimization problem

$$\begin{aligned} \begin{aligned} \min \limits _{p} \; f(p, z, \varOmega ) :=\frac{1}{2} r(p, z)^\top \varOmega \, r(p, z) \end{aligned} \end{aligned}$$
(4)

with the notations

$$\begin{aligned} r(p, z)&:=\varSigma ^{-1}\left( z - h(y(p), p, q)\right) \in {\mathbb {R}}^{n}, \nonumber \\ h(y(p),p,q)&:=\mathrm {rep}\left( \left[ \overline{h}(y_j(p),p,q_j)\right] _{j = 1, \ldots , n_q}, n_M\right) \in {\mathbb {R}}^{n}, \nonumber \\ \varOmega&:=\mathrm {Diag} \left( \mathrm {rep} \left( \left[ \omega _k\right] _{k = 1, \ldots , n_S}, n_q n_M\right) \right) \in {\mathbb {R}}^{n \times n}, \nonumber \\ \varSigma&:=\mathrm {Diag} \left( \mathrm {rep} \left( \left[ \sigma _k\right] _{k = 1, \ldots , n_S}, n_q n_M\right) \right) \in {\mathbb {R}}^{n \times n}, \end{aligned}$$
(5)

where \(\mathrm {rep}(x, m)\) is the repetition function that produces m copies of the vector x. Thus, the vector h is an arrangement of \(\overline{h}(y_j(p),p,q_j)\) for all \(j = 1, \ldots , n_q\) in a row vector copied \(n_M\) times, while \(\varOmega\) and \(\varSigma\) are diagonal matrices consisting of \(n_q n_M\) copies of \(\omega _1,\ldots ,\omega _{n_S}\) and \(\sigma _1,\ldots ,\sigma _{n_S}\), respectively. The measurement tensor z is vectorized consistently with h, and for convenience we use the same symbol.
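As a concrete illustration, the block structure in (5) can be assembled with NumPy; the dimensions and per-sensor accuracies below are hypothetical, and `rep` mirrors the repetition function defined above.

```python
import numpy as np

def rep(x, m):
    """Repetition function from (5): m concatenated copies of the vector x."""
    return np.tile(np.asarray(x, dtype=float), m)

# Hypothetical dimensions: n_M = 2 measurement series, n_q = 3 inputs,
# n_S = 2 sensors, hence n = n_M * n_q * n_S = 12.
n_M, n_q, n_S = 2, 3, 2
sigma = np.array([0.1, 0.5])   # assumed per-sensor standard deviations
omega = np.array([1.0, 1.0])   # binary sensor weights (both sensors active)

# Diagonal matrices Sigma and Omega from (5).
Sigma = np.diag(rep(sigma, n_q * n_M))
Omega = np.diag(rep(omega, n_q * n_M))
```

The diagonal of \(\varSigma\) then alternates the sensor accuracies \(\sigma_1, \sigma_2\) over all \(n_q n_M\) blocks, matching the ordering of the vectorized measurements.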

Problem (4) can be (locally) solved using, e.g., an extended Gauss-Newton algorithm, see Dennis et al. (1981) for more details. We denote the (local) solution of this optimization problem by \(p(z, \varOmega )\) to emphasize its dependence on the measurements and on the weights.
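As a minimal sketch of such a solve, SciPy's `least_squares` (a trust-region relative of Gauss-Newton) can be applied to a toy instance of (4). The cubic model `h`, the inputs, and the noise level are illustrative stand-ins, not the press model from Sect. 5.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical stand-in for the reduced model output h(y(p), p, q):
# z ≈ p[0]*q + p[1]*q**3. Any smooth model with the same signature works.
def h(p, q):
    return p[0] * q + p[1] * q**3

q = np.linspace(0.5, 2.0, 8)                  # illustrative inputs (load levels)
p_true = np.array([2.0, 0.3])                 # "true" parameters for synthetic data
rng = np.random.default_rng(0)
sigma = 0.05                                  # single-sensor standard deviation
z = h(p_true, q) + rng.normal(0.0, sigma, q.size)

# Weighted residual r(p, z) = Sigma^{-1} (z - h(...)) as in (5); with all
# weights omega_k = 1, least_squares minimizes (1/2)||r||^2, i.e., problem (4).
def residual(p):
    return (z - h(p, q)) / sigma

sol = least_squares(residual, x0=np.array([1.0, 1.0]), method="lm")
p_hat = sol.x                                 # the estimate p(z, Omega)
```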

For the quantification of data uncertainty we assume the measurement errors to be normally and independently distributed, i.e.,

$$\begin{aligned} z_{ijk} = z^\star _{ijk} + \varepsilon _{k}, \quad \text { with } \; \varepsilon _{k} \sim {\mathcal {N}}\left( 0, \sigma _{k}^2\right) , \end{aligned}$$

where \(z^\star\) are the true (but unknown) values of the quantities that are measured. Since the measurement series \(z_i\) are realizations of the same random variable Z, the estimated parameters \(p(Z, \varOmega )\) are also random variables. Denote the (unknown) expected value of the distribution of \(p(Z, \varOmega )\) by \(p^\star\). We are now interested in how a perturbation of Z propagates to \(p(Z, \varOmega )\). Therefore, we linearize the solution operator \(Z \mapsto p(Z, \varOmega )\) of the parameter identification problem around some fixed \(\overline{z}\), which will be specified later, such that the linearized \(p(Z, \varOmega )\) is Gaussian distributed, compare, e.g., Proposition 3.2 in Ross (2010):

$$\begin{aligned} p(Z, \varOmega ) = p(\overline{z}, \varOmega ) + \partial _{z} p(\overline{z}, \varOmega ) \cdot (Z - \overline{z}) + o(\left\| Z - \overline{z}\right\| ). \end{aligned}$$

The covariance matrix of the linearized random variable is defined by

$$\begin{aligned} \begin{aligned} C(p^\star , \varOmega ) :=\mathbb {E}\Big [&\bigl (p(\overline{z}, \varOmega ) + \partial _{z} p(\overline{z}, \varOmega ) \cdot (Z - \overline{z}) - p^\star \bigr ) \\&\cdot \bigl ( p(\overline{z}, \varOmega ) + \partial _{z} p(\overline{z}, \varOmega ) \cdot (Z - \overline{z}) - p^\star \bigr )^\top \Big ]. \end{aligned} \end{aligned}$$
(6)

Thus, the approximated confidence ellipsoid for a certain confidence level \(1 - \alpha\), where \(\alpha \in (0,1)\), of the multivariate Gaussian distributed solution of the parameter identification is given by

$$\begin{aligned} \begin{aligned} G\left( \alpha , p^\star \!, C(p^\star \!,\varOmega )\right) = \left\{ p \in {\mathbb {R}}^{n_p} : (p - p^\star )^\top C(p^\star \!,\varOmega )^{-1} (p - p^\star ) \le \gamma ^2(\alpha )\right\} , \end{aligned} \end{aligned}$$
(7)

where \(\gamma ^2(\alpha ) :=\chi _{n_p}^2(1-\alpha )\) is the quantile of the \(\chi ^2\) distribution with \(n_p\) degrees of freedom. For more details on multivariate Gaussian distributions and confidence ellipsoids, see for example Scheffé (1959).
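A membership test for the ellipsoid (7) is straightforward to sketch; the quantile \(\gamma^2(\alpha)\) comes from `scipy.stats.chi2`, and the numbers for \(p^\star\) and C below are purely illustrative.

```python
import numpy as np
from scipy.stats import chi2

def in_confidence_ellipsoid(p, p_star, C, alpha):
    """Test p ∈ G(alpha, p*, C) from (7): (p - p*)^T C^{-1} (p - p*) <= gamma^2(alpha)."""
    d = p - p_star
    gamma2 = chi2.ppf(1.0 - alpha, df=p_star.size)  # chi^2 quantile with n_p dofs
    return float(d @ np.linalg.solve(C, d)) <= gamma2

# Illustrative numbers, not taken from the paper:
p_star = np.array([2.0, 0.3])
C = np.array([[0.04, 0.0],
              [0.0, 0.01]])
inside = in_confidence_ellipsoid(np.array([2.1, 0.25]), p_star, C, alpha=0.05)
```

In the algorithm of Sect. 4, a validation estimate failing this test is the indicator of model uncertainty.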

In order to derive an analytical expression of the covariance matrix C in (6), following Bard (1974), we use standard methods for the linearized version of the mapping \(Z \mapsto p(Z, \varOmega )\) around some \(\overline{z}\), such that \(p(\overline{z}, \varOmega )\) is a good approximation of \(p^\star\). Denote \(\overline{p}:=p(\overline{z}, \varOmega )\) for brevity. Then, the sensitivity \(\partial _{z} p(\overline{z}, \varOmega )\) can be determined using the first order optimality condition for the parameter identification problem (4), i.e.,

$$\begin{aligned} \partial _{p} f(\overline{p}, \overline{z}, \varOmega ) = 0. \end{aligned}$$
(8)

In order to use the implicit function theorem, we make the following assumption:

Assumption 1

  1. (i)

    \(f(\overline{p}, \overline{z}, \varOmega )\) is twice continuously differentiable with respect to p.

  2. (ii)

    \(\partial _{pp}^2 f(\overline{p}, \overline{z}, \varOmega )\) is invertible.

Remark 2

Note that Assumption 1 (i) is implied by the condition that the observation operator h is twice continuously differentiable with respect to p.

Using Assumption 1, we now can apply the implicit function theorem. Thus, equation (8) implicitly defines a mapping \(Z \mapsto p(Z, \varOmega )\) and its sensitivity \(\partial _{z} p(\overline{z}, \varOmega )\) is given by

$$\begin{aligned} \partial _{pp}^2 f(\overline{p}, \overline{z}, \varOmega )\, \partial _{z} p(\overline{z}, \varOmega ) \, \cdot \delta Z = - \partial _{pz}^2 f(\overline{p}, \overline{z}, \varOmega ) \, \cdot \delta Z \end{aligned}$$
(9)

in any direction \(\delta Z\). More precisely, we have

$$\begin{aligned} \partial _p f(\overline{p}, \overline{z}, \varOmega ) =&\ r(\overline{p}, \overline{z})^\top \varOmega \, \partial _p r(\overline{p}, \overline{z}), \\ \partial ^2_{p z} f(\overline{p}, \overline{z}, \varOmega ) =&\ \partial _{p} r(\overline{p}, \overline{z})^\top \varOmega \, \partial _{z}r(\overline{p}, \overline{z}) = \partial _{p}r(\overline{p}, \overline{z})^\top \varOmega \varSigma ^{-1} , \\ \partial ^2_{p p} f(\overline{p}, \overline{z}, \varOmega ) =&\ \partial _{p} r(\overline{p}, \overline{z})^\top \varOmega \, \partial _{p} r(\overline{p}, \overline{z}) + \sum _{i=1}^n r_i(\overline{p}, \overline{z}) \, \varOmega _{ii}\, \partial ^2_{p p} r_i(\overline{p}, \overline{z})\, . \end{aligned}$$

Let us define

$$\begin{aligned} H(\varOmega ) :=\partial ^2_{p p} f(\overline{p}, \overline{z}, \varOmega ) = J(\varOmega )^\top \varOmega J(\varOmega ) + S(\varOmega ) \end{aligned}$$

with \(J(\varOmega ) :=\partial _{p} r(\overline{p}, \overline{z})\) and

$$\begin{aligned} S(\varOmega )&:=\sum _{i=1}^n r_i(\overline{p}, \overline{z}) \, \varOmega _{ii}\, \partial ^2_{p p} r_i(\overline{p}, \overline{z}), \end{aligned}$$

where \(J(\varOmega ) \in {\mathbb {R}}^{n \times n_p}\) and \(S(\varOmega ) \in {\mathbb {R}}^{n_p \times n_p}\). The exact calculation of \(J(\varOmega )\) and \(S(\varOmega )\) is given in the appendix; it requires the following assumption to allow the use of the implicit function theorem:

Assumption 2

  1. (i)

    The state equation E is twice continuously differentiable in all arguments.

  2. (ii)

    \(\partial _{y} E(y(\overline{p}), \overline{p},q)\) is invertible.

We want to make sure that the principal part \(J(\varOmega )^\top \varOmega J(\varOmega )\) stays invertible when changing the values of the weights \(\varOmega\).

Assumption 3

The matrix \(\varOmega J(\varOmega )\) has full column rank, i.e.,

$$\begin{aligned} \mathrm {rank}(\varOmega J(\varOmega )) = n_p. \end{aligned}$$

From this assumption we can infer the invertibility of \(J(\varOmega )^\top \varOmega J(\varOmega )\), compare also Körkel et al. (2004). Notice that Assumption 3 cannot be satisfied if \(n_S < n_p\) and \(J(\varOmega )\) is independent of the inputs. Since the latter is often the case, we require the experimenter to employ at least as many sensors as the number of parameters to be estimated. This will become an important constraint later in the optimal experimental design problem in Sect. 3.

From (9) we obtain

$$\begin{aligned} \partial _{z} p(\overline{z}, \varOmega ) = - H(\varOmega )^{-1} J(\varOmega )^\top \varOmega \varSigma ^{-1}. \end{aligned}$$

Using the calculations from above, the approximated covariance matrix is given by

$$\begin{aligned} \begin{aligned} C(\overline{p},\varOmega )&= \mathbb {E}\left[ \partial _{z} p(\overline{z}, \varOmega ) \cdot (Z - \overline{z}) (Z - \overline{z})^\top \cdot \partial _{z} p(\overline{z}, \varOmega )^\top \right] \\&= \partial _{z} p(\overline{z}, \varOmega ) \cdot \mathbb {E}[\varepsilon \varepsilon ^\top ] \cdot \partial _{z} p(\overline{z}, \varOmega )^\top = \partial _{z} p(\overline{z}, \varOmega ) \ \varSigma ^2 \ \partial _{z} p(\overline{z}, \varOmega )^\top \\&= H(\varOmega )^{-1} J(\varOmega )^\top \varOmega \, \varSigma ^{-1} \varSigma ^2 \varSigma ^{-1} \varOmega J(\varOmega ) H(\varOmega )^{-\top } \\&= H(\varOmega )^{-1} J(\varOmega )^\top \varOmega ^2 \, J(\varOmega ) H(\varOmega )^{-\top }. \end{aligned} \end{aligned}$$
(10)
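The chain of identities in (10) can be checked numerically for a model that is linear in p, where the second-order term \(S(\varOmega)\) vanishes and \(H = J^\top \varOmega J\). The Jacobian below belongs to a hypothetical scalar observation \(h(p, q) = p_1 q + p_2 q^3\), not to the paper's application.

```python
import numpy as np

# Sketch of (10) for a model linear in p, so that S(Omega) = 0.
q = np.linspace(0.5, 2.0, 8)
sigma = 0.05 * np.ones(q.size)               # one reading per input, std 0.05
Sigma = np.diag(sigma)
Omega = np.eye(q.size)                       # all sensors switched on

# J = ∂_p r = -Sigma^{-1} ∂_p h for h(p, q) = p[0]*q + p[1]*q**3.
J = -np.linalg.solve(Sigma, np.column_stack([q, q**3]))

H = J.T @ Omega @ J                          # Hessian, since S = 0 here
Hinv = np.linalg.inv(H)
C = Hinv @ J.T @ Omega @ Omega @ J @ Hinv.T  # covariance as in (10)
```

With \(\varOmega = I\) this collapses to the classical Gauss-Newton covariance \((J^\top J)^{-1}\), which the sketch reproduces.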

3 Optimal design of experiments

The optimal design of experiments problem deals with the task of finding an optimal experimental configuration such that the reliability of the estimated model parameters is maximized. In the case at hand, this task simplifies to determining optimal sensor positions. Notice, however, that the reliability also depends on the accuracy of the sensors used for the measurements, where each sensor k has a given constant variance \(\sigma _{k}^2\). Often, the measurement error is composed of a variety of causes, e.g., the repetition error and internal approximation errors as specified by the manufacturer. While the experimenter is responsible for keeping the repetition error small during the experiment, the internal errors are fixed by the manufacturing of each sensor.

It is very common to measure the reliability of the parameter estimation by a single-valued design function \(\varPsi\), see Bauer et al. (2000) and Franceschini and Macchietto (2008). It is obvious that a small covariance leads to a high reliability of the parameter estimation. However, it is unclear what a small covariance means in terms of matrices. In general, there are different approaches to choosing the function \(\varPsi\). We list the most prominent ones according to Fedorov and Leonov (2013):

  • A-criterion: the trace of the covariance matrix, \(\varPsi _A(C) = \mathrm {trace}(C)\),

  • D-criterion: the determinant of the covariance matrix, \(\varPsi _D(C) = \det (C)\),

  • E-criterion: the maximal eigenvalue of the covariance matrix, \(\varPsi _E(C) = \lambda _\text {max}(C)\).

It seems natural to use the D-criterion due to its close connection to the volume of the confidence ellipsoid and its invariance under transformations of the model parameters. However, this criterion tends to emphasize the most sensitive parameter (Franceschini and Macchietto 2008). Use of the A- and E-criteria requires a careful scaling of the model parameters in order to achieve meaningful results. The major drawback of the A-criterion lies in its ignorance of the information contained in the off-diagonal elements of the covariance matrix, which is particularly inefficient when parameters are highly correlated. For the numerical example in this paper, we choose the E-criterion, even though E-optimality may lead to a tolerable increase in the volume of the confidence ellipsoid: the E-criterion effectively reduces the largest semi-axis of the confidence ellipsoid.
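All three criteria are one-line functions of a given covariance matrix; the 2×2 matrix below is illustrative.

```python
import numpy as np

def psi_A(C):
    return np.trace(C)                 # A-criterion: trace

def psi_D(C):
    return np.linalg.det(C)            # D-criterion: determinant (~ squared ellipsoid volume)

def psi_E(C):
    return np.linalg.eigvalsh(C)[-1]   # E-criterion: largest eigenvalue (largest semi-axis)

C = np.array([[4.0, 1.0],
              [1.0, 1.0]])
```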

We now formulate the optimal experimental design problem as follows:

$$\begin{aligned} \begin{aligned} \min \limits _{\omega }&\quad \varPsi \left( C(\overline{p},\varOmega )\right) \\ \mathrm {s.t.}&\quad \varOmega = \mathrm {Diag} \left( \mathrm {rep} \left( \left[ \omega _k\right] _{k = 1, \ldots , n_S}, n_q n_M\right) \right) , \\&\quad \overline{p}= p(\overline{z}, \varOmega ) \text { solution of } (4), \\&\quad g(\omega ) \le 0, \quad \omega \in \left\{ 0, 1\right\} ^{n_S}. \end{aligned} \end{aligned}$$
(11)

The possibly nonlinear constraint \(g(\omega ) \le 0\) describes further conditions on \(\omega\). In our case, to fulfill the rank condition in Assumption 3, we have to include the constraint \(n_p - \sum _{k = 1}^{n_S} \omega _{k} \le 0 \,\).

The optimal experimental design problem (11) is thus a non-convex mixed-integer nonlinear program (MINLP). Such problems can be solved via spatial branch-and-bound, see, e.g., Burer and Letchford (2012) for an overview. In a broader sense, many authors deal with optimal sensor placement using other objective functions that involve the Fisher Information matrix, Bayes factors, condition numbers or a modal analysis, see, e.g., Papadopoulos and Garcia (1998), Hiramoto et al. (2000), Flynn and Todd (2010), Castro-Triguero et al. (2013). Alexanderian et al. (2016) come from an infinite-dimensional Bayesian perspective performing a relaxation of the integrality condition on \(\omega\) and introducing a penalty term to achieve sparsity. They further apply a reiteration of the optimization problem with smooth penalty functions that converge to the \(\ell _0\)-“norm”. Thus, the final solution has almost integer values, see also Alexanderian et al. (2014) and Koval et al. (2020). Neitzel et al. (2019) develop a sparse optimal control approach for sensor placement problems using Dirac measures. For review articles and books on optimal experimental design we refer to Pukelsheim (2006), Franceschini and Macchietto (2008), Fedorov (2010), Fedorov and Leonov (2013).

Note, however, that for the correctness of our approach to detect model uncertainty, problem (11) does not necessarily need to be solved to optimality. Using a good but suboptimal sensor placement will not lead to any incorrect rejection of a model, since the variance of the parameter estimates becomes larger and therefore also the confidence ellipsoids increase. Thus, it is also possible to solve (11), which is the computationally most expensive step of the proposed approach, by heuristic methods. In our numerical example, the number of sensors is very small, so that a heuristic method may indeed provide satisfactory results.
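For a handful of sensors, such a heuristic can simply be an exhaustive enumeration of sensor subsets under a budget. In the sketch below, the Jacobian rows are random placeholders for the true model sensitivities, the covariance uses the linear-model simplification \(S(\varOmega) = 0\), and the budget of three out of five sensors is arbitrary.

```python
import numpy as np
from itertools import combinations

# Exhaustive enumeration over sensor subsets as a simple stand-in for a
# branch-and-bound solver for (11), under an arbitrary budget of 3 sensors.
rng = np.random.default_rng(1)
n_S, n_p, budget = 5, 2, 3
J = rng.normal(size=(n_S, n_p))          # one (scaled) Jacobian row per sensor

def psi_E_of(subset):
    """E-criterion of the covariance when only the sensors in `subset` are used."""
    Js = J[list(subset), :]
    C = np.linalg.inv(Js.T @ Js)         # linear-model covariance, S = 0
    return np.linalg.eigvalsh(C)[-1]

best = min(combinations(range(n_S), budget), key=psi_E_of)
```

The budget keeps the rank condition of Assumption 3 satisfiable, since `budget >= n_p`.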

We want to point out that our method to solve (11) is designed for problems with low parameter dimension. For instances with a large number of model parameters, a different approach to efficiently compute the design criterion \(\varPsi\) is necessary. As mentioned before, Alexanderian et al. (2016) start from an infinite-dimensional setting and develop, in the case of the A-criterion, randomized trace estimators where only matrix-vector products of the Hessian with a random vector are needed. Thus, the computational cost, measured in terms of PDE solves, stays independent of the number of parameters and sensors but grows linearly with the data dimension. For the D-criterion in high-dimensional linear inverse problems, we refer to Alexanderian and Saibaba (2018).

4 Detecting model uncertainty

In this section, we discuss how optimal design of experiments and parameter identification can be used to detect model uncertainty in a mathematical model \({\mathcal {M}}\). To do so, assume that all parameters of the model have a true physical meaning and that, in case the model is correct, the solution of the parameter identification problem is a good approximation of those real, physical values. Then repeated solutions of the parameter identification problem for different measurements with differing inputs should, up to the uncertainty of the measurements, deliver the same set of parameters. On the other hand, if one set of measurements leads to parameters which lie outside a given confidence set of the previous runs, then the model cannot replicate the results of all measurements reliably, i.e., the underlying model is inadequate.

Our approach to detect model uncertainty in a mathematical model \({\mathcal {M}}\) is depicted in Table 1. As already explained in the introduction, the first step before identifying model parameters by fitting the model output to a given set of measurements is to actually acquire these measurements, which can be extremely costly. Furthermore, the quality of the parameter estimation may even be improved by removing unreliable sensors. Therefore, we only acquire the minimal number of measurement series, or use artificial data, needed for the computation of the optimal design of experiments introduced in the previous section to determine optimal sensor positions (line 02). Here, we solve problem (11) with a restriction on the desired number of sensors to decide which sensors are actually essential to solve the parameter identification problem with minimal variance (line 03).

After the optimal experimental setup \(\omega _\mathrm {opt}\) has been used to acquire data, it must be verified whether the measurement errors are normally distributed (lines 04-05). We use the well-known Shapiro-Wilk goodness-of-fit test for this purpose, see D’Agostino (1986). We only consider experiments that produce data with Gaussian measurement errors; otherwise our algorithm is not applicable.
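As an illustration, the normality check of line 05 could be realized as follows. This is a minimal Python sketch using `scipy.stats.shapiro`; the helper name `errors_are_gaussian` is our own, hypothetical choice, not part of the algorithm in Table 1.

```python
from scipy.stats import shapiro

def errors_are_gaussian(residuals, alpha=0.05):
    """Shapiro-Wilk goodness-of-fit test: return True iff normality of the
    measurement errors cannot be rejected at test level alpha."""
    _, p_value = shapiro(residuals)
    return bool(p_value >= alpha)
```

Experiments whose residuals fail this check would be discarded before the hypothesis test.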

Table 1 Algorithm for the detection of uncertainty in a mathematical model

Assume that a test set z of measurements is given. Then split the test set into one calibration set \(z^\mathrm {cal}\) and one validation set \(z^\mathrm {val}\), see line 06. This split can either be done randomly, as in a Monte Carlo cross-validation (Dubitzky et al. 2007), or it can be chosen in a way to test whether a specific physical effect is sufficiently modeled. For example, the test set could be split according to the magnitude of the inputs to check if the results for both sets can be reproduced by the model for the same set of parameters. On the one hand, this approach can help to identify ranges of input variables for which the model works better or worse and on the other hand, to detect specific effects which are not yet sufficiently implemented in the model. In any case, the splitting must be reasonable in the sense that the measurement errors in each set are still normally distributed.

From line 07 onward, a classical hypothesis test with standard Bonferroni correction (Dunn 1961) is conducted. For this, the parameters \(p_\mathrm {cal}\) and their covariance \(C_\mathrm {cal}\) are computed from the calibration data set \(z^\mathrm {cal}\) using (4) and (10), respectively. Likewise, the parameters \(p_\mathrm {val}\) are computed from the validation data set \(z^\mathrm {val}\). Now, the following hypothesis is tested:

$$\begin{aligned} \mathrm {HYP}_0 \;&: \; p^\star = p_\mathrm {cal} \text { is the true parameter value for all inputs } q_j,\\ \mathrm {HYP}_1 \;&: \; p^\star \ne p_\mathrm {cal}. \end{aligned}$$

The corrected threshold \(\overline{\mathtt {TOL}}= \mathtt {TOL}/n_\mathrm {tests}\) is the test level used to decide whether the null hypothesis \(\mathrm {HYP}_0\) must be rejected: rejection occurs if \(p_\mathrm {val} \notin G(\overline{\mathtt {TOL}}, p_\mathrm {cal}, C_\mathrm {cal})\). Recall that

$$\begin{aligned} \begin{aligned} G(\overline{\mathtt {TOL}}, p_\mathrm {cal}, C_\mathrm {cal}) = \left\{ p \in {\mathbb {R}}^{n_p} : (p - p_\mathrm {cal})^\top C_\mathrm {cal}^{-1} (p - p_\mathrm {cal}) \le \chi _{n_p}^2\left( 1-\overline{\mathtt {TOL}}\right) \right\} . \end{aligned} \end{aligned}$$

The outcome of the statistical test can easily be determined by comparing its p-value, \(\alpha _\mathrm {min}\), with the threshold \(\overline{\mathtt {TOL}}\) (line 09). The p-value is the smallest test level under which the null hypothesis can only just be rejected. If \(\mathrm {HYP}_0\) fails the test, then model uncertainty has been detected. Otherwise another test is conducted by returning to line 06 until the number of desired test scenarios is reached.
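The membership check \(p_\mathrm {val} \in G(\overline{\mathtt {TOL}}, p_\mathrm {cal}, C_\mathrm {cal})\) and the computation of \(\alpha _\mathrm {min}\) can be sketched in a few lines. The following Python sketch uses scipy's \(\chi ^2\) distribution; it is an illustration of the test, not the MATLAB implementation used later.

```python
import numpy as np
from scipy.stats import chi2

def ellipsoid_test(p_val, p_cal, C_cal, tol_bar):
    """Return (reject, alpha_min): reject is True iff p_val lies outside the
    confidence ellipsoid G(tol_bar, p_cal, C_cal); alpha_min is the smallest
    test level at which HYP_0 can only just be rejected (the p-value)."""
    d = p_val - p_cal
    # (p - p_cal)^T C_cal^{-1} (p - p_cal) without forming the inverse
    stat = float(d @ np.linalg.solve(C_cal, d))
    n_p = len(p_cal)
    reject = stat > chi2.ppf(1.0 - tol_bar, df=n_p)
    alpha_min = 1.0 - chi2.cdf(stat, df=n_p)
    return bool(reject), alpha_min
```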

The Bonferroni correction accounts for the potential problem of multiple testing since we may perform the tests on dependent validation sets. Without addressing this issue we should expect \(\approx n_\mathrm {tests} \mathtt {TOL}\) hypotheses to be rejected, which necessitates the introduction of another (arbitrary) threshold to deduce model uncertainty. The very conservative Bonferroni correction controls the familywise error rate (FWER), which is the probability of rejecting at least one true null hypothesis. By performing \(n_\mathrm {tests}\) tests with the modified test level \(\overline{\mathtt {TOL}}\) we are able to achieve \(\mathtt {TOL}\) as a bound for the FWER, which is equivalent to the error of the first kind in multiple hypothesis testing. Since all individual test levels are drastically reduced we interpret any rejection of a null hypothesis as significant, i.e., then model uncertainty is detected and \({\mathcal {M}}\) needs to be rejected.
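The correction itself is a one-liner; as a sketch (the function name is our own):

```python
def bonferroni_reject(p_values, tol=0.05):
    """Reject hypothesis i iff its p-value alpha_min falls below the corrected
    test level tol / n_tests, which bounds the FWER by tol."""
    n_tests = len(p_values)
    return [p < tol / n_tests for p in p_values]
```

With \(\mathtt {TOL} = 5\%\) and three tests, only p-values below \(\approx 1.67\%\) lead to a rejection.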

In practical applications it may occur that an inaccurate model passes quite a few tests. Evidently, even an inaccurate model may be useful for a small range of input variables. However, a false model will always fail at least one test, provided that enough data from a variety of inputs is available and that the splitting into calibration and validation test sets is done intelligently. To catch the worst case in this splitting, it may be necessary, depending on the application, to consult expert judgment in order to properly exploit the special structure of the technical system.

5 The 3D servo press model

The method for detecting model uncertainty is demonstrated on a technical system, the 3D Servo Press (Scheitza 2010), a forming machine which transmits the torques and forces of its drives onto a part to be formed, e.g., a car body part. A forming machine is therefore subject to large external forces during its motion, which cause its mechanism to deflect. While a rigid body model is accurate in the unloaded state, it does not suffice during the forming operation (Groche et al. 2017). Especially for the closed-loop control of forming machines, an accurate model is crucial, as inaccuracies can cause the control to become unstable (Hoppe et al. 2019). However, the modeling of forming machines requires a high degree of abstraction, since elastic bodies are usually reduced to bars and beams in order to keep the model tractable. Furthermore, nonlinear bearing stiffnesses as well as friction have to be taken into account.

Figure 1 shows the 3D Servo Press that consists of three identical linkage mechanisms. We use a mechanical substitute model and describe it for one linkage mechanism. A variety of bars and beams are connected via joints that are designed as rotary joints. Each elastic component is represented by a spring or beam and each mass by a gray volume. The eccentric and spindle drives actuate the three degrees of freedom of one gear unit, \(\varphi _\mathrm {ecc}\), \(y_\mathrm {su}\), \(y_\mathrm {sl}\), which cause all joints in the kinematic chain to perform a desired movement. The output of the gear unit is point D, which leads down to the ram bearing R via a linear pressure bar. For the rigid-body model, the position of all points is defined by the angle of the eccentric drive \(\varphi _\mathrm {ecc}\) as well as the upper and lower spindle drive positions \(y_\mathrm {su}, y_\mathrm {sl}\).

Fig. 1
figure 1

Linkage mechanism of the 3D Servo Press

To model the elastic 3D Servo Press, the coupling links are interpreted as bars and beams, depending on their stress state under load. The bar and beam models are composed of masses and springs. The bearings are modeled as simple spring elements with either linear or non-linear spring characteristics. The equation of motion of the system is determined by the Lagrange equations of the second kind:

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}\left( \frac{\partial L}{\partial {\dot{y}}}\right) - \frac{\partial L}{\partial y} = q, \end{aligned}$$
(12)

where \(L=T-U\) is the Lagrangian consisting of the total kinetic energy T and the total potential energy U, y are the system states and q are the non-conservative forces. The non-conservative forces contain all external forces that are applied to the machine, i.e., the torque of the eccentric drive \(q_\mathrm {ecc}\), the forces of the upper and lower spindles \(q_\mathrm {su}\), \(q_\mathrm {sl}\) and the reacting process force \(q_\mathrm {P}\). In this application we want to evaluate the elastic model and therefore fix the drives positions. Thus, only \(q_\mathrm {P}\) is applied and all other non-conservative forces are zero.
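As a minimal illustration of (12), consider a single degree of freedom \(y\) with mass \(m\) and spring stiffness \(k\); the Lagrange formalism can then be carried out symbolically, e.g., with sympy. This is a sketch for one toy degree of freedom, not the press model itself.

```python
import sympy as sp

t = sp.symbols('t')
m, k = sp.symbols('m k', positive=True)
y = sp.Function('y')(t)

T = sp.Rational(1, 2) * m * sp.diff(y, t)**2   # kinetic energy
U = sp.Rational(1, 2) * k * y**2               # potential energy of the spring
L = T - U                                      # Lagrangian

# Left-hand side of the Lagrange equation of the second kind,
# d/dt (dL/d y') - dL/d y, which must equal the non-conservative force q;
# here it evaluates to the equation of motion m*y'' + k*y.
lhs = sp.diff(sp.diff(L, sp.diff(y, t)), t) - sp.diff(L, y)
```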

Solving the Lagrangian equation requires the potential and kinetic energy as a function of the states. These consist of the stored energy in each elastic and rigid body

$$\begin{aligned} T&=\sum _{i=1}^{5} T_{\text {bar},i} + \sum _{i=1}^{1} T_{\text {beam},i} + \sum _{i=1}^{2} T_{\text {body},i}, \\ U&=\sum _{i=1}^{5} U_{\text {bar},i} + \sum _{i=1}^{1} U_{\text {beam},i} + \sum _{i=1}^{10} U_{\text {joint},i}, \end{aligned}$$

whereby the energies of the individual elements are given as follows.

Bar model A direct approach to discretizing the bar while maintaining inertia and rigidity is the finite element method. It is based on the partial differential equation of the continuous bar and supplies the mass matrix \(M_i\) and the stiffness matrix \(K_i\) for an element of mass \(m_i\) and stiffness \(k_{\text {bar},i}\), which are given by

$$\begin{aligned} M_i = \begin{bmatrix} \frac{1}{2} m_i &{} \frac{1}{6} m_i\\ \frac{1}{6} m_i &{} \frac{1}{2} m_i \end{bmatrix}, \quad K_i = \begin{bmatrix} k_{\text {bar},i} &{} -k_{\text {bar},i}\\ -k_{\text {bar},i} &{} k_{\text {bar},i} \end{bmatrix}. \end{aligned}$$

As the actual elements do not have a uniform cross section, the stiffness is determined using a finite element simulation based on the ideal CAD model.
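For illustration, the element matrices above can be set up directly. The following is a small numpy sketch with a hypothetical function name, using the mass partition shown above (factor \(1/2\) on, \(1/6\) off the diagonal).

```python
import numpy as np

def bar_element(m, k):
    """Mass and stiffness matrices of a two-node bar element of mass m
    and stiffness k, as given above."""
    M = m * np.array([[1/2, 1/6],
                      [1/6, 1/2]])
    K = k * np.array([[ 1, -1],
                      [-1,  1]])
    return M, K
```

Note that \(K_i\) is singular by construction: a rigid-body translation of both nodes stores no elastic energy.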

Remark 3

The CAD model and finite element model are based on detailed knowledge of the elastic modulus and the geometry of the components. Due to natural fluctuations in material production, the elastic modulus may vary from part to part. In addition, manufacturing tolerances limit the geometric accuracy. Therefore, determining the stiffness by an a priori FEM simulation leads to an uncertain estimate of the actual stiffness and requires a parameter identification based on posterior measurements.

The kinetic energy of an individual bar shown in Fig. 2 sums up to

$$\begin{aligned} T_{\text {bar},i} = \frac{1}{2} \left( m_{i,1} v_{i,\text {S}1}^2 + m_{i,2} v_{i,\text {S}2}^2 \right) + \frac{1}{2} \varTheta _i {\dot{\varphi }}_i^2 \end{aligned}$$

with the translational velocities of the masses \(v_{i,\text {S}j}\), the mass moment of inertia \(\varTheta _i\) and the corresponding rotational velocity \({\dot{\varphi }}_i\). Its potential energy originates from the energy stored in the elasticity and the gravitational potential energy of the masses

$$\begin{aligned} U_{\text {bar},i} = \frac{1}{2} k_{\text {bar},i} \xi _i^2 + m_{i,1} g_0 y_{i,1} +m_{i,2} g_0 y_{i,2}, \end{aligned}$$

where \(\xi _i\) is the elongation of the element, \(g_0\) is the standard gravity of Earth and \(y_{i,j}\) is the relative distance of each mass to the ground.

Fig. 2
figure 2

Model of a bar consisting of two masses and a spring

Beam model All elements that experience bending moments are modeled as beams. This applies especially to the lever, which connects three points instead of two and is marked as a thick gray line in Fig. 1. Like the bar model, the beam model is based on the equations of the finite element method and serves as the basis for modeling the lever under bending load. Since the lever features three joints in total, the model can be seen as two flat beam elements arranged in a row. A lumped mass model is set up in which all entries outside the main diagonal of the mass matrix are neglected. The stiffness of each finite element results in a stiffness matrix

$$\begin{aligned} K_{\text {beam},i,\text {element}} = \begin{bmatrix} k_{i,\alpha } &{}\quad 0 &{}\quad 0 &{}\quad -k_{i,\alpha } &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad k_{i,\beta } &{}\quad k_{i,\beta } l_i &{}\quad 0 &{}\quad -k_{i,\beta } &{}\quad k_{i,\beta } l_i \\ 0 &{}\quad k_{i,\beta } l_i &{}\quad k_{i,\beta } l_i^2 &{}\quad 0 &{}\quad -k_{i,\beta } l_i &{}\quad k_{i,\beta } l_i^2 \\ -k_{i,\alpha } &{}\quad 0 &{}\quad 0 &{}\quad k_{i,\alpha } &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad -k_{i,\beta } &{}\quad -k_{i,\beta } l_i &{}\quad 0 &{}\quad k_{i,\beta } &{}\quad -k_{i,\beta } l_i \\ 0 &{}\quad k_{i,\beta } l_i &{}\quad k_{i,\beta } l_i^2 &{}\quad 0 &{}\quad -k_{i,\beta } l_i &{}\quad k_{i,\beta } l_i^2 \\ \end{bmatrix} \end{aligned}$$

using the simulated stiffnesses \(k_{i,\alpha }\), \(k_{i,\beta }\) and the length of the beam \(l_i\). Since the lever consists of two finite elements, two \(6 \times 6\) element matrices are joined together to form a \(9 \times 9\) stiffness matrix according to the finite element method. The result is the stiffness matrix \(K_{\text {beam},i}\). As shown in Fig. 3 the total mass of the lever is distributed to the model masses

$$\begin{aligned} m_{i,1}=\frac{m_i}{4} , \quad m_{i,2}= \frac{m_i}{2} \quad \text {and} \quad m_{i,3} =\frac{m_i}{4}. \end{aligned}$$
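The joining of the two \(6 \times 6\) element matrices into the \(9 \times 9\) matrix \(K_{\text {beam},i}\) follows the standard finite element assembly, overlapping the three degrees of freedom of the shared middle node. A numpy sketch (the actual DOF ordering of the press model may differ):

```python
import numpy as np

def assemble_lever_stiffness(K_el_1, K_el_2):
    """Overlap two 6x6 beam-element stiffness matrices on the three DOFs of
    the shared middle node, yielding a 9x9 matrix."""
    K = np.zeros((9, 9))
    K[0:6, 0:6] += K_el_1   # element 1: nodes 1 and 2 (DOFs 0..5)
    K[3:9, 3:9] += K_el_2   # element 2: nodes 2 and 3 (DOFs 3..8)
    return K
```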

As the kinetic energy of a beam is equivalent to the kinetic energy of a bar, this results in

$$\begin{aligned} T_{\text {beam},i} = \frac{1}{2}\sum _{j} m_{i,j} v_{i,\text {S}j}^2 + \frac{1}{2}\sum _{j} \varTheta _{i,j} {\dot{\varphi }}_{i}^2, \end{aligned}$$

where \({\dot{\varphi }}_{i}\) is the rotation of the complete beam. For the calculation of the potential energy, the sum of the positional energy of the masses and the elastic energy

$$\begin{aligned} U_{\text{beam},i} = y_{\text {beam},i}^\top K_{\text{beam},i} y_{\text{beam},i} + \sum _j m_{i,j} g_0 y_{i,j} \end{aligned}$$

is calculated where

$$\begin{aligned} y_{\text {beam},i} = \left[ x_{i,1}, y_{i,1}, \varphi _{i}, \xi _{i,1}, \eta _{i,1}, \psi _{i,1}, \xi _{i,2}, \eta _{i,2}, \psi _{i,2} \right] ^\top \end{aligned}$$

are the states of the beam.

Fig. 3
figure 3

Model of a beam consisting of three masses and two springs

Bearing model The bearings are modeled as spring elements between the joints of the couplers. Since the radial bearing force applied by the bearings is a function of deflection, the deflection must be described with the position coordinates of the bodies. Assuming a constant joint stiffness, the potential energy results in

$$\begin{aligned} U_{\text {joint},i} = \frac{1}{2} k_{\text {joint},i} \; \varDelta r_i^{2} \end{aligned}$$

with the joint’s stiffness \(k_{\text {joint},i}\) and its radial deflection \(\varDelta r_i\).

Friction model Friction occurs in all bearings in which a relative movement takes place and causes a hysteresis in the load-displacement curve. As the relative movements in the joints are small compared to the movement of the pressure bar that connects point D with point R (see Fig. 1), only the bearings guiding this bar are considered. A variety of modeling approaches for friction exists, however. In order to test which approach is closest to reality in this case, three rate-independent friction models of different complexity are pursued.

\({\mathcal {M}}_1\)::

Since friction is hard to model, it is often neglected, which leads to the model equation

$$\begin{aligned} q_{\text {fric}}(t) = 0. \end{aligned}$$
\({\mathcal {M}}_2\)::

The discontinuous Coulomb friction model

$$\begin{aligned} \begin{aligned} q_{\mathrm {fric}}(t) = q_{\mathrm {c}} \; \mathrm {sign} \left( \frac{\mathrm {d}R_x}{\mathrm {d}t} \right) = q_{\mathrm {c}} \; \mathrm {sign} \left( \frac{\mathrm {d}q_{\mathrm {P}}}{\mathrm {d}t} \right) \end{aligned} \end{aligned}$$
(13)

gives a more accurate description of friction, where \(q_\text {c}\) is a friction constant. As we can assume that the sign of \(\frac{\mathrm {d}R_x}{\mathrm {d}t}\) is the same as the sign of \({\dot{q}}_\text {P} = \frac{\mathrm {d}q_\mathrm {P}}{\mathrm {d}t}\), we can simplify the model to be discontinuous only in the input variables and not in the states.
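A direct implementation of (13), using the simplification \(\mathrm {sign}(\mathrm {d}R_x/\mathrm {d}t) = \mathrm {sign}({\dot{q}}_\mathrm {P})\) argued above, could look as follows (a sketch):

```python
import math

def coulomb_friction(dq_p_dt, q_c):
    """Discontinuous Coulomb model: q_fric = q_c * sign(dq_P/dt)."""
    if dq_p_dt == 0:
        return 0.0
    return q_c * math.copysign(1.0, dq_p_dt)
```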

\({\mathcal {M}}_3\)::

As a third model approach, we consider a continuous friction model with rate-independent memory that takes past force data into account (Bertotti and Mayergoyz 2006). Here, we consider the force at the current time step \(t_i\) and at the previous one \(t_{i-1}\):

$$\begin{aligned} q_\text {fric}(t_i)&= \mu \underbrace{\left( q_{\text {P}}(t_i), q_{\text {P}}(t_{i-1}), q_{\text {P,min}}(t_i), q_{\text {P,max}}(t_i) \right) }_{{\bar{u}}} , \end{aligned}$$

as well as the minimum and maximum force value during loading and unloading cycles

$$\begin{aligned} q_{\mathrm {P,min}}(t_i)&= {\left\{ \begin{array}{ll}\min (q_{\mathrm {P}}(t_i),q_{\mathrm {P,min}}(t_{i-1})) &{} \text {if } {\dot{q}}_\mathrm {P}(t_i)\ge 0\\ q_{\mathrm {P}}(t_i) &{} \text {if } {\dot{q}}_\mathrm {P}(t_i)<0 \end{array}\right. } \\ q_{\mathrm {P,max}}(t_i)&= {\left\{ \begin{array}{ll}q_{\mathrm {P}}(t_i) &{} \text {if } {\dot{q}}_\mathrm {P}(t_i) \ge 0\\ \max (q_{\mathrm {P}}(t_i),q_{\mathrm {P,max}}(t_{i-1})) &{} \text {if } {\dot{q}}_\mathrm {P}(t_i)<0 \end{array}\right. } \end{aligned}$$

that serve as internal variables and reduce the complexity of memorizing a large number of time steps. Based on the Preisach model (Preisach 1935), a discontinuous hysteresis model, we use an adapted continuous model whose structure is comparable to a neural network topology (Mayergoyz 2003). Figure 4 shows the topology of the model, where \(\rho _i = \arctan ({\bar{u}})\). To train the model, we have to determine the friction force, which is the difference between the actually measured process force and the force estimated by the inverse model. The inverse model describes the required force under a measured displacement z and contains the estimated stiffness parameters that have been determined without any friction model in a first step. Applying this to measurements of a loading cycle, the full hysteresis can be identified and used to train the friction model.
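The internal variables can be updated recursively. The following is a sketch of one update step, keeping the running minimum during loading (\({\dot{q}}_\mathrm {P} \ge 0\)) and the running maximum during unloading, under the assumption that the respective extremum is carried over between time steps:

```python
def update_memory(q_p, dq_p_dt, q_min_prev, q_max_prev):
    """One recursive update of the internal variables q_P,min and q_P,max."""
    if dq_p_dt >= 0:                        # loading
        q_min = min(q_p, q_min_prev)
        q_max = q_p
    else:                                   # unloading
        q_min = q_p
        q_max = max(q_p, q_max_prev)
    return q_min, q_max
```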

Fig. 4
figure 4

Model topology of the classical discontinuous (left) and the adapted Preisach model (right)

Synthesis of the press model The press model consists of 2 rigid bodies, 5 bars, 1 beam, 10 joints and the elasticity of the press frame which represents support points to the environment. This results in a 34-dimensional state vector y.

Equation (12) can now be written as

$$\begin{aligned} f_\text {kin}(y, {\dot{y}}, \ddot{y}) + f_\text {pot}(y) = q(t), \end{aligned}$$

with the contribution of the kinetic energy \(f_\text {kin} \left( y, \dot{y}, \ddot{y} \right)\) and of the potential energy \(f_\text{pot} \left( y \right)\) and the excitation forces

$$\begin{aligned} q(t) = q_\text {P}(t) - q_\text {fric}(t). \end{aligned}$$

In this case we are interested in the quasi-static model to identify uncertain stiffness parameters of two bars \(k_{\text {bar},7}\) and \(k_{\text {bar},5}\), in the following denoted as \(k_7\) and \(k_5\) as shown in Fig. 1. Thus, all derivatives of y are set to zero such that

$$\begin{aligned} f_\text {kin}(y, {\dot{y}}=0, \ddot{y}=0) + f_\text {pot}(y) = q(t) \end{aligned}$$

where the function \(f_\text {pot}\) contains the parameters \(k_7\) and \(k_5\).

To identify the model parameters, a process force \(q_\mathrm {P}\) is applied using an external pneumatic force source.

6 Numerical results for the 3D servo press

We implemented the described procedure to detect model uncertainty using MATLAB R2017b with the included lsqnonlin solver for the parameter identification problems and applied it to the gear mechanism model of the 3D Servo Press.

We use measurements for 29 different process forces (these are the input variables), whereby the first 15 forces describe loading and the last 14 describe unloading of the 3D Servo Press. For each process force we measure the vertical displacements in point D, the horizontal displacements in point F and the vertical displacements in point \(B_0\) when applying a vertical process load \(q_\mathrm {P}\) on the press, see Fig. 1. The displacements are measured in \(\upmu \mathrm {m}\) and the forces in \(\mathrm {N}\).

In this particular application we do not distinguish between initial data and actual measurements. Thus, line 04 in Table 1 is omitted. Each measurement is performed \(n_M = 6\) times on the prototype of the 3D Servo Press, albeit with slightly differing forces \(q^i_{j}\) for each measurement series \(i = 1, \ldots , 6\) due to variations in the pneumatic pressure when applying the force. We know the desired setpoint values \(q^\mathrm {d}_j\) of the applied forces for all \(j = 1, \ldots , 29\). However, the experimental setting makes it impossible to completely eliminate variations in the inputs; there always remains a degree of uncertainty. Since the deviations are very small, a linear interpolation between the measured data \(z_{ijk}\) and the desired inputs \(q^\mathrm {d}_j\) is justified. We need all inputs fixed to these setpoint values; otherwise, the definition of a measurement series is violated. In the following, we linearly interpolate the measurements \(z\in {\mathbb {R}}^{6 \times 29 \times 3}\) so as to make them comparable for each setpoint value \(q^\mathrm {d}_j\). More specifically, we apply the correction

$$\begin{aligned} z_{ijk} :=\dfrac{q^\mathrm {d}_j}{q^i_{j}}\cdot z_{ijk} \end{aligned}$$

for all \(i = 1, \ldots , 6\), \(j = 1, \ldots , 29\) and \(k = 1, \ldots , 3\). We work with these corrected measurements from now on.
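The rescaling to the setpoint forces can be written compactly with broadcasting; a numpy sketch of the correction above:

```python
import numpy as np

def correct_measurements(z, q_actual, q_setpoint):
    """Apply z_ijk := (q^d_j / q^i_j) * z_ijk.
    Shapes: z (n_series, n_forces, n_sensors), q_actual (n_series, n_forces),
    q_setpoint (n_forces,)."""
    factor = q_setpoint[None, :] / q_actual       # shape (n_series, n_forces)
    return z * factor[:, :, None]                 # broadcast over sensors
```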

In a first step, we analyze the experimental data. In our modeling we assumed that the measurements are normally distributed. Since the true values \(z^\star\) of the measured quantities are unknown to us, we check instead whether the measurement errors \(\varepsilon\) are normally distributed with zero mean. In order to verify this assumption, we perform a Shapiro-Wilk goodness-of-fit test (D’Agostino 1986) applied to the measurement errors

$$\begin{aligned} {\tilde{z}}_k :=\begin{pmatrix} z_{2jk}-z_{1jk} \\ z_{4jk}-z_{3jk} \\ z_{6jk}-z_{5jk} \end{pmatrix}_{j = 1, \ldots , 29} \end{aligned}$$

for each sensor \(k = 1, \ldots , 3\) with test level \(\alpha = 5 \%\). Evidently, \(z_{1jk}, \ldots , z_{6jk}\) are independent and identically distributed with the same mean and the same standard deviation. Hence, the rows in \({\tilde{z}}_k\) are independent and identically distributed with mean zero. The hypothesis that each \({\tilde{z}}_k\) is normally distributed with mean zero and variance estimated from \({\tilde{z}}_k\) is now tested and the results are shown in Table 2. We observe that, for each sensor, the hypothesis cannot be rejected with an error of the first kind below \(5 \%\).

Table 2 Analysis of the measurement data

Having experimental data available, the aim is to reduce the costs for obtaining new measurements in view of future experiments on the real press, i.e., we want to reduce the number of involved sensors. The parameters to be estimated, \(k_5\) and \(k_{7}\), describe the axial stiffness of elastic components of the 3D Servo Press, see Sect. 5. Since the number of involved sensors must be greater than or equal to the number of estimated parameters, compare Assumption 3 and the comments below it, we want to choose two of the three sensors such that the design criterion of the covariance matrix of the estimated parameters becomes minimal. Since the number of combinatorial possibilities is small, we solve problem (11) by enumeration. Thus, we compute all design criteria that are mentioned in Sect. 3 for the model \({\mathcal {M}}_3\) for all possible sensor combinations. The results are shown in Table 3.
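For such small instances, the enumeration can be sketched as follows, here for the E-criterion (largest eigenvalue of the parameter covariance). The sensitivity matrix `J` is a hypothetical stand-in for the Jacobian of the model outputs with respect to the parameters; the function name is our own.

```python
import itertools
import numpy as np

def best_sensor_subset(J, sigma, n_keep):
    """Enumerate all sensor subsets of size n_keep and return the one that
    minimizes the E-criterion, i.e., the largest eigenvalue of the
    parameter covariance matrix (J_s^T J_s)^{-1} of the whitened rows."""
    n_sensors, n_params = J.shape
    best, best_val = None, np.inf
    for subset in itertools.combinations(range(n_sensors), n_keep):
        idx = list(subset)
        Js = J[idx, :] / sigma[idx, None]          # noise-whitened sensitivities
        F = Js.T @ Js                              # Fisher information matrix
        if np.linalg.matrix_rank(F) < n_params:
            continue                               # parameters not identifiable
        val = np.max(np.linalg.eigvalsh(np.linalg.inv(F)))
        if val < best_val:
            best, best_val = subset, val
    return best, best_val
```

A nearly singular information matrix of a subset corresponds to the drastic increase of the design criteria observed below.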

Table 3 Outcome for the optimal design of experiments problem for model \({\mathcal {M}}_3\)

We observe that omitting the second sensor increases all design criteria by a factor of \(\approx \! 10^{+20}\) compared to the initial sensor configuration, which indicates that the covariance matrix becomes close to singular. Removing the first sensor, though, increases the maximal eigenvalue and the volume of the confidence ellipsoid only slightly. However, omitting the last sensor, i.e., the one measuring the vertical displacements in point \(B_0\), leads to the smallest maximal eigenvalue. We choose the E-criterion as design criterion for reasons explained in Sect. 3. Thus, we proceed with the optimal sensor combination 110, i.e., we choose to measure the vertical displacements in point D and the horizontal displacements in point F. We come to the same conclusion after investigating the results for the models \({\mathcal {M}}_1\) and \({\mathcal {M}}_2\).

Next, we want to test whether our algorithm recognizes the best out of three different models used to describe the data. Therefore, we recall the following friction models from Sect. 5:

$$\begin{aligned} {\mathcal {M}}_1 \;&: \; \text { simple linear model without hysteresis recognition}, \\ {\mathcal {M}}_2 \;&: \; \text { Coulomb's friction model for hysteresis}, \\ {\mathcal {M}}_3 \;&: \; \text { friction behavior learned by a neural network}. \end{aligned}$$

Figure 5 shows the different behavior of these models plotted together with the data.

Fig. 5
figure 5

Repeated measurements of the force-displacement curve of the linkage mechanism and comparison with the output of the models \({\mathcal {M}}_1, \; {\mathcal {M}}_2\) and \({\mathcal {M}}_3\)

For the assembly of \({\mathcal {M}}_3\) we need actual measurements to train the neural network as described in Sect. 5. For this purpose we employ four data series. The remaining two measurement series will be used for the application of our algorithm to detect model uncertainty in the press. In order to make the following test strategy fair, we only use these two measurement series for all models alike since the more data series are involved, the harder it is for a model to reproduce them all.

The variance of the sensors is crucial for the size of the confidence ellipsoid of the parameter estimates. We fix these values to be the sum of the variances of the repeated measurement process, see Table 2, and of other internal errors as specified by the manufacturer of the sensor. Thus, we take for the standard deviations the values

$$\begin{aligned} \sigma _1&= \sqrt{\left( 5.5147 \times {10^{-06}}\right) ^2 + \left( 1.4142 \times {10^{-05}}\right) ^2} \approx {1.518\times {10^{-05}}},\\ \sigma _2&= \sqrt{\left( 3.3108 \times {10^{-06}}\right) ^2 + \left( 3.6055 \times {10^{-06}}\right) ^2} \approx {4.895\times {10^{-06}}}, \\ \sigma _3&= \sqrt{\left( 1.4974 \times {10^{-06}}\right) ^2 + \left( 3.6055 \times {10^{-06}}\right) ^2} \approx {3.904\times {10^{-06}}}. \end{aligned}$$
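The combination of the two independent error sources amounts to adding their variances; as a minimal sketch:

```python
import math

def combined_sigma(sigma_repeat, sigma_internal):
    """Standard deviation of the sum of two independent error sources:
    sigma = sqrt(sigma_repeat^2 + sigma_internal^2)."""
    return math.hypot(sigma_repeat, sigma_internal)
```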

In order to investigate the validity of the models \({\mathcal {M}}_1, {\mathcal {M}}_2\) and \({\mathcal {M}}_3\), we generate calibration and validation sets of almost equal size, whereby we omit the first \(q^\mathrm {d}_1 = 0\) and last \(q^\mathrm {d}_{29} = 0\) “applied” force because they refer to the unloaded press. We first split the test set consisting of the two measurement series into one loading \({\mathcal {S}}^l\) and one unloading \({\mathcal {S}}^u\) test set and consider each set separately. Thus, for the loading set \({\mathcal {S}}^l\), we again split the test set into one calibration \({\mathcal {S}}^{l_1}_c\) and one validation \({\mathcal {S}}^{l_2}_v\) test set. We do the same for the unloading case. Next, we test loading versus unloading and again split the set into one calibration \({\mathcal {S}}^{l}_c\) and one validation \({\mathcal {S}}^{u}_v\) test set. Lastly, we test loading together with unloading and split the set into one calibration \({\mathcal {S}}^{lu}_c\) and one validation \({\mathcal {S}}^{lu}_v\) test set, compare Table 4. The splitting is done manually and in this particular way in order to catch the worst case in the coming hypothesis test, which we expect to occur for loading vs. unloading.

Table 4 Summary of the calibration and validation test sets for the different cases
Table 5 Test results for the 3D Servo Press models \({\mathcal {M}}_1, {\mathcal {M}}_2\) and \({\mathcal {M}}_3\)

For each of the three models and for each of the \(n_\mathrm {tests} = 4\) test scenarios we perform the hypothesis test as described in Table 1 starting from line 08. Table 5 lists the results. The last three columns show the minimal test level, i.e., the p-value, such that the null hypothesis can only just be rejected. We choose the common \(\mathtt {TOL} = 5 \%\) bound for the FWER and apply the Bonferroni correction, which reduces the individual test level to \(\mathtt {TOL} / n_\mathrm {tests} = 1.25 \%\). Comparing the values for \(\alpha _\mathrm {min}\), we clearly see that the model \({\mathcal {M}}_1\), which does not account for hysteresis, is rejected for all test scenarios. Thus, the data cannot be described by this simple linear model. We therefore require \({\mathcal {M}}_1\) to be updated so as to correctly represent hysteresis. This is done in a first attempt by the Coulomb friction model, see Eq. (13). We thus perform our algorithm on \({\mathcal {M}}_2\). While this model seems able to describe loading and unloading separately, it fails to describe both scenarios with the same set of parameters. Since hysteresis is a continuous effect, the discontinuous Coulomb friction model still fails to reproduce the fine nuances of the experimental data. Our proposed method detects this deficiency in the third and fourth test scenario, where the model is clearly rejected since \(\alpha _\mathrm {min}\) is very small. Hence, a neural network strategy has been employed to further improve the model output, as mentioned in Sect. 5. The last column of Table 5 shows that model \({\mathcal {M}}_3\) is well-suited to explain the hysteresis phenomenon.

We finally want to compare the numerical results of our approach with a well-known method to quantify the model discrepancy term \(\delta _{ik}(q_j) :=z_{ijk} - h_k(y_j, p, q_j)\) according to Kennedy and O’Hagan (2001), see Eq. (1). To do so, a Gaussian Process with quadratic basis functions for the mean and squared exponential covariance functions is employed. The \(95 \%\) confidence interval for the discrepancy term is computed analytically and the min-max values of these intervals over the inputs are plotted in Fig. 6 for the second sensor for the loading and unloading case.

Fig. 6
figure 6

Min–max values for the \(95 \%\) confidence intervals of the second sensor data that served as input for the Gaussian Process model of the discrepancy function for \({\mathcal {M}}_1\), \({\mathcal {M}}_2\) and \({\mathcal {M}}_3\) and separated according to the loading or unloading case

We observe that both for the loading and for the unloading case, model \({\mathcal {M}}_3\) has the smallest expansion in the confidence interval of the discrepancy function. This is an indication that \({\mathcal {M}}_3\) is best when compared to \({\mathcal {M}}_1\) and \({\mathcal {M}}_2\). However, the method only quantifies the extent of model uncertainty but does not provide a quantified error probability when opting for the best model. A threshold on the expansion of the discrepancy term would help, but this needs to be adjusted in every application and is thus unable to serve as a general criterion. Additionally, we cannot infer from Fig. 6 that model \({\mathcal {M}}_2\) performs better than model \({\mathcal {M}}_1\) as we did in our approach.

To sum up, we have seen that our algorithm is able to detect model uncertainty and to compute quantified error probabilities in the model selection process. By a suitable choice of the calibration and validation test sets, it can even help to identify (neglected) aspects of the 3D Servo Press model that need to be improved. Evidently, the modeling errors can also be seen directly in Fig. 5 or by analyzing the model discrepancy term as pointed out in Fig. 6. Our algorithm, though, provides an automated way to decide, based on quantified error probabilities, whether a model needs to be improved, regardless of the dimension of the model’s output.

7 Conclusion

In this paper we have seen how model uncertainty can be identified by combining the optimal design of experiments approach with parameter identification and statistical testing. Optimal design of experiments can be used to choose sensors which allow for parameter estimates with minimal variance. Using the covariance matrix we can then compute confidence ellipsoids which should include the parameter estimates with high probability. If some other test set leads to a solution of the parameter identification outside such a confidence ellipsoid, then we can conclude with a small error of the first kind that not all measurements can be explained by the same model with the same set of parameters. We then introduced the 3D Servo Press as an application and demonstrated our approach on mathematical models of the press. This allowed us to show that two simple press models are not valid, since specific effects like hysteresis are not sufficiently modeled. A sophisticated modeling of the hysteresis effect, though, led to a mathematical model that is well-suited to explain the data and thus to make predictions for future experiments.

It would be interesting to further test our method on models that depend on more than two parameters and with a larger number of possible sensor locations available. Furthermore, instead of choosing sensors only once in the beginning, it is also possible to re-solve the optimal experimental design problem using the parameters identified through some first experiments, in order to iteratively improve the quality of the parameter estimates, in a similar way as proposed by Körkel et al. (2004).