Identification of Model Uncertainty via Optimal Design of Experiments applied to a Mechanical Press

In engineering applications almost all processes are described with the aid of models. Especially forming machines heavily rely on mathematical models for control and condition monitoring. Inaccuracies during the modeling, manufacturing and assembly of these machines induce model uncertainty which impairs the controller's performance. In this paper we propose an approach to identify model uncertainty using parameter identification and optimal design of experiments. The experimental setup is characterized by optimal sensor positions such that specific model parameters can be determined with minimal variance. This allows for the computation of confidence regions, in which the real parameters or the parameter estimates from different test sets have to lie. We claim that inconsistencies in the estimated parameter values, considering their approximated confidence ellipsoids as well, cannot be explained by data or parameter uncertainty but are indicators of model uncertainty. The proposed method is demonstrated using a component of the 3D Servo Press, a multi-technology forming machine that combines spindles with eccentric servo drives.


Introduction
In science, technology and economics mathematical models are commonly used to describe physical phenomena, to solve design problems and to manage production processes. The employment of these models entails various uncertainties. It has been observed that the dominant uncertainties arise from our lack of knowledge about system parameters and from deficiencies in the modeling itself [21]. We understand models to be mathematical constructs consisting of variables, constants and their relations to each other. All present knowledge about the system or phenomenon of interest is described and represented by that model. We mean by model uncertainty that certain relations between model variables and parameters are simplified or insufficient to mirror reality. Thus, the present description of the system or phenomenon of interest is inaccurate or incomplete in the sense that there are aspects which have been ignored. As a consequence, any simulated process or manufactured product that is based upon these models is impaired in its predictive quality or usage. Hence, it is important to develop tools and algorithms for the identification and control of model uncertainty.
In order to identify whether a model is inadequate, one has to compare the model output to actual experimental data. It is, however, difficult to derive a simple criterion for the model to be accurate, since the measurement data is imperfect and subject to uncertainties as well. These data inaccuracies arise from irreducible randomness and from systematic errors in the measurement process. As a consequence, uncertainties are transferred from experimental data to the parameters. In the process of model calibration, the model parameters are adjusted so as to make them compatible with experimental observations. This is typically performed by solving inverse problems that are often ill-posed or under-determined. Thus, additional prior information is required, which is well known to be another source of uncertainty.
In this paper we propose an algorithm to detect model uncertainty using both parameter estimation and the optimal design of experiments approach. Here, we define parameter estimation to be the process of adjusting model parameters to the data and optimal design of experiments to be the best choice among experimental setups with reference to the reliability of the parameter estimates. Our methodology is able to distinguish between data and parameter uncertainties on the one hand and model uncertainty on the other hand. Particularly, we interpret inconsistencies in parameter estimates from different measurement series as an indicator that the underlying mathematical model is unable to describe all measurement series with the same set of parameter values. We assume neither an a priori distribution nor a specific form of model uncertainty in the mathematical equations.
The first step before estimating model parameters is to acquire measurements that capture the behavior of the system well. This step can be costly if many physical properties of the system have to be observed in each experiment. In certain engineering applications, measurements are taken from a small-sized prototype instead of the expensive product, which is often not yet available. It is therefore desirable to know beforehand the optimal sensor positions for the actual product by considering the results of experiments on the prototype. Thus, it is valuable to reduce the number of sensors if this does not degrade the reliability of the parameter identification. Additionally, removing unreliable sensors may even improve the quality of the estimate. This can be done using the concept of optimal design of experiments, i.e., by deciding which sensors are actually best for gathering data in order to solve the parameter identification problem with minimal variance. Using these kinds of sensors and their optimal positions, measurements with optimal informational value can be obtained.
To determine model uncertainty based on these measurements, there are two possible procedures which depend on the previous knowledge regarding the model parameters.
In the first case the real model parameters or good approximations thereof are known beforehand. In this case one can compute a confidence ellipsoid for a given confidence level 1 − α, with a fixed value α ∈ (0, 1), around the parameters estimated from data and check whether the real parameter values (from prior knowledge) are contained in this ellipsoid. Unfortunately, in reality the actual values of the parameters are not always known exactly. Therefore, the second approach circumvents these difficulties in the following way: instead of using all measurements for the parameter identification problem, we split the experimental data into a calibration and a validation set. The approach starts by solving the parameter identification problem for the calibration set. Then we compute a confidence ellipsoid for a given confidence level 1−α and if the model is correct, then the solution of the parameter identification problem for the validation set should lie within this confidence ellipsoid. If the optimal parameters for the validation set are outside the 1 − α confidence ellipsoid then this is an indication of model uncertainty.
In the literature, model uncertainty is especially discussed in model-based fault diagnosis of machines. Simani et al. [26] treat uncertainties in the modeling by bounded error terms in the model equations. This method presupposes a specific form of uncertainty, an assumption we do not make.
Our approach is similar to the idea of Körkel et al. [17], Bauer et al. [4] and Galvanin et al. [12], who also combined optimal design of experiments with parameter identification. However, they only used this method to reliably find optimal parameter values. Asprey and Macchietto [2] extended this methodology to choose between competing models by maximizing a measure of the divergence of model predictions. In our approach no such measure is needed; we only employ the parameter estimates and their covariance matrices. Another difference is that we also consider higher order derivatives in the computation of the covariance matrix which is used to determine the confidence ellipsoid, see Bard [3] for more details.
There is extensive literature on Bayesian parameter calibration and validation. However, there seem to be only a few references dealing with model uncertainty from a general viewpoint. Lima et al. [18] describe a general method to select the best model based on Occam's Plausibility Algorithm [20] and Bayesian calibration. However, we do not adopt a Bayesian perspective but involve design of experiments instead to sharpen the parameter estimates.
Staying within this Bayesian framework, the same question, namely whether a set of measurements for a given model is adequately described by the same set of parameters, is addressed by Tuomi et al. [27]. Using a given prior distribution for the parameters, they derive an inequality to dismiss the assumption in case the probability for the data to be obtained under different parameter sets is significantly higher. In this work we discuss the same question but from a probabilistic point of view without any assumptions on the prior distribution of the parameters.
Another important approach to identify and control model uncertainty was created by Kennedy and O'Hagan [16]. This method is based on the assumption that the true values of the quantities of interest are the sum of the model output h(q, p), with input q and parameters p, and the model discrepancy term δ(q), i.e., the measurements z should satisfy

z = h(q, p) + δ(q) + ε,    (1)

with independent observational noise ε. Then parameter identification can be performed for (1) to obtain best guesses for both the model parameters p as well as the parameters of the model uncertainty δ. Arendt et al. [1] use this approach for model updating and to distinguish between the effects of model calibration and model discrepancy. However, it has been shown by Brynjarsdóttir and O'Hagan [6] that the success of this approach heavily depends on incorporating a priori knowledge of the specific form of model uncertainty into the representation of δ, which is often assumed to be a specific type of stochastic process, but is actually not known beforehand. In contrast, our approach does not need any assumptions about the specific form of the model uncertainty.
One particular case of technical systems with a multitude of uncertain parameters and uncertain physical effects that challenge the modeling process are forming presses. Forming presses are highly loaded machines which have kinematic degrees of freedom to perform a motion and to apply large forces on a workpiece. During that motion, the workpiece is formed into a new shape. This can cause a considerable deflection of machine components, which is of high technical importance and which we therefore aim to model. In this application, we consider a mechanical forming machine which consists of a linkage mechanism. To this end, the kinematic chain consisting of multiple mechanical components has to be approximated by a lumped parameter system to reduce the number of parameters. When modeling a machine we typically pursue one of two objectives that lead to different lumped parameter models: an accurate elastic behavior at low frequencies or an accurate frequency response [9]. In this application we seek a model that represents an accurate elastic behavior at low excitation frequencies.
To estimate the stiffness of components with non-uniform cross-sections, a finite element model is typically used. In a second step, the finite element model is reduced to the lumped parameter model. This model order reduction entails uncertainty, besides a variety of uncertain influencing variables like material properties and inexact geometries. Hence, for some components it is necessary to identify the stiffnesses after the assembly of the machine. Due to the deflection, a relative movement of the components occurs and as a result friction dissipates a portion of this kinetic energy. For the modeling of friction on a macroscopic level, multiple phenomenological models exist [5].
The paper is organized as follows. First we introduce the parameter identification problem and its covariance estimation. Based on the resulting covariance matrix, we then formulate the problem of optimal experimental design to find optimal sensor positions which lead to the smallest variance of the resulting parameter identification. In Section 4 we describe in more detail how parameter identification and optimal design of experiments can be used to identify and control model uncertainty. Afterwards we introduce the working principle and the mathematical model of the 3D Servo Press as a practical example. The application of our proposed method to the 3D Servo Press is done in Section 6, where we also present numerical results. We end the paper by giving some concluding remarks.

The Parameter Identification Problem and its Covariance Estimation
In this section we present the parameter identification problem as it is done by Körkel et al. [17]. We first introduce some basic notation and assumptions, formulate the parameter estimation problem and then deduce the covariance matrix as well as the considered confidence regions.
The mathematical model is given by a state equation E : R^{d_y} × R^{n_p} × R^{d_q} → R^{d_E} coupling the state vector y_j ∈ R^{d_y} and the parameters p ∈ R^{n_p} for any control variable q_j ∈ R^{d_q}, i.e.,

E(y_j, p, q_j) = 0,  for j = 1, …, n_q.    (2)
This state equation may be a discretized form of a partial differential equation with large dimensions d_y and d_E. We assume that (2) has a unique solution y_j for any given p and q_j.
The model parameters p are in general not known beforehand. Therefore, we need measurement data to obtain appropriate estimates. Let n_S denote the number of allocated sensors for data collection and n_M the total number of performed measurement series. The control variables q ∈ R^{n_q × d_q} represent external boundary forces in our case. Now, the aim of the parameter identification problem is to find model parameters p ∈ R^{n_p} that best fit the measurements z ∈ R^{n_M × n_q × n_S} for given control q. We assume that the measurements z_ijk are collected by prepositioned sensors, where each sensor k is accurate up to a standard deviation σ_k ∈ R for each control q_j and in each measurement series i. In general, it is not possible to measure all of the state components directly. Therefore, we introduce an observation operator (y_j, p, q_j) ↦ h(y_j, p, q_j) ∈ R^{n_S} that maps state, parameters and controls to the actual quantity that is measured. Since we will later choose an optimal subset of all the possible sensors, we introduce binary weights ω ∈ {0, 1}^{n_S} such that ω_k = 1 if and only if sensor k is used.
We now apply the least-squares method to find the optimal parameter values which minimize the discrepancy between measurement data and model output, weighted by the respective sensor standard deviations:

min_p (1/2) Σ_{i=1}^{n_M} Σ_{j=1}^{n_q} Σ_{k=1}^{n_S} ω_k σ_k^{-2} (z_ijk − h_k(y_j, p, q_j))²  s.t.  E(y_j, p, q_j) = 0,  j = 1, …, n_q.    (3)
Remark 1. Alternatively, we can also assume that each sensor k has a given standard deviation σ_ijk in each measurement scenario i ∈ {1, …, n_M} and for each control q_j, j ∈ {1, …, n_q}. However, to keep notation simple, we assume the working precision σ_k of each sensor to be constant over all measurement series and all controls.
For convenience we rewrite problem (3) in vector form of dimension n = n_M n_q n_S and eliminate the state equation by inserting the unique state solution y_j(p) into the objective function, leading to the optimization problem

min_p f(p, z, Ω) := (1/2) r(p, z)^T Ω r(p, z),    (4)

with the weighted residual r(p, z) := Σ^{-1} (h(p) − z). Thus, the vector h consists of n_M copies of the vector (h(y_j(p), p, q_j))_{j ∈ {1, …, n_q}}, while Ω and Σ are diagonal matrices consisting of n_q · n_M copies of ω_1, …, ω_{n_S} and σ_1, …, σ_{n_S}, respectively. The measurement tensor z is vectorized as well and for convenience we use the same symbol.
The inverse problem (3) or rather (4) can be solved using, e.g., an extended Gauss-Newton method, see Dennis et al. [8] for more details. We denote the solution of this optimization problem by p(z, Ω).
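As a small illustration of this step, the following sketch solves a weighted least-squares problem of the form (4) with SciPy's trust-region least-squares solver. The observation operator h, the controls, the sensor accuracy and the "true" parameters are all invustrative assumptions, not the press model.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

q = np.linspace(1.0, 5.0, 29)        # controls, e.g. 29 applied process forces
p_true = np.array([2.0, 0.5])        # hypothetical "true" parameters
sigma = 0.05                         # assumed sensor standard deviation

def h(p, q):
    # toy observation operator (an assumption, not the press model):
    # measured displacement = q / p[0] + q**2 * p[1]
    return q / p[0] + q**2 * p[1]

z = h(p_true, q) + rng.normal(0.0, sigma, size=q.shape)  # noisy measurements

def residual(p):
    # weighted residual r(p, z) = Sigma^{-1} (h(p) - z); all weights omega_k = 1
    return (h(p, q) - z) / sigma

sol = least_squares(residual, x0=np.ones(2))  # Gauss-Newton-type iteration
p_hat = sol.x                                 # estimate p(z, Omega)
```

With this synthetic data, p_hat recovers p_true up to the noise-induced variance quantified in the next section.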
For the quantification of data uncertainty we now assume the measurement errors to be normally and independently distributed, i.e.,

z_ijk ∼ N(z*_ijk, σ_k²)  independently for all i, j, k,

where z* are the true but unknown values of the quantities that are measured. Since the measurement data z is a random variable, the estimated parameters are also random variables. Thus, we are interested in how a perturbation of z propagates to p(z, Ω).
Since z* is unknown in general, we take some given measurements z̄ as a best known approximation. To compute a distribution of p(z, Ω), we linearize the solution operator z ↦ p(z, Ω) of the parameter identification problem around z̄, such that the linearized parameter identification is again Gaussian distributed, compare, e.g., Proposition 3.2 in [23]. In order to characterize this distribution, we compute the covariance matrix C for the linearized parameter estimation, while we approximate the mean by the solution p(z̄, Ω) of the nonlinear parameter estimation problem (4) for the given set of measurements z̄. For convenience of the reader we write p instead of p(z̄, Ω). The covariance matrix C is then given by

C := E[(p(z, Ω) − E[p(z, Ω)]) (p(z, Ω) − E[p(z, Ω)])^T],    (5)

where we approximate the mean by p.
Thus, the approximated confidence ellipsoid for a certain level 1 − α of the multivariate Gaussian distributed solution of the parameter identification is given by

E(α) := {p̃ ∈ R^{n_p} : (p̃ − p)^T C^{-1} (p̃ − p) ≤ γ²(α)},    (6)

where γ²(α) := χ²_{n_p}(1 − α) is the quantile of the χ² distribution with n_p degrees of freedom. For more details on multivariate Gaussian distributions and confidence ellipsoids, see for example Scheffé [24].
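The membership test for such a confidence ellipsoid is a one-line quadratic-form check. The sketch below implements it with SciPy's χ² quantile; the covariance matrix and the candidate parameter vectors are made-up placeholders.

```python
import numpy as np
from scipy.stats import chi2

def in_confidence_ellipsoid(p_cand, p_hat, C, alpha=0.05):
    # candidate lies inside iff (p_cand - p_hat)^T C^{-1} (p_cand - p_hat)
    # is at most the chi-square quantile gamma^2(alpha) with n_p dofs
    d = p_cand - p_hat
    gamma2 = chi2.ppf(1.0 - alpha, df=len(p_hat))
    return float(d @ np.linalg.solve(C, d)) <= gamma2

p_hat = np.array([2.0, 0.5])                   # hypothetical estimate
C = np.array([[0.01, 0.002], [0.002, 0.005]])  # hypothetical covariance

inside = in_confidence_ellipsoid(np.array([2.05, 0.52]), p_hat, C)
outside = in_confidence_ellipsoid(np.array([3.0, 1.0]), p_hat, C)
```

Using `np.linalg.solve` instead of explicitly inverting C keeps the check numerically stable for ill-conditioned covariance matrices.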
To derive an analytical expression for the covariance matrix C in (5), following Bard [3], we use the linearized version of the mapping z ↦ p(z, Ω) at p. The sensitivity ∂_z p can then be determined using the first order optimality condition for the parameter identification problem (4), i.e.,

∂_p f(p(z, Ω), z, Ω) = 0.    (7)

In order to use the implicit function theorem, we make the following assumption:

Assumption 1. (i) The objective function f(·, z̄, Ω) is twice continuously differentiable with respect to p. (ii) The Hessian ∂²_pp f(p, z̄, Ω) is invertible.

Remark 2. Note that Assumption 1 (i) is implied by the condition that the observation operator h(y(p), p, q) is twice continuously differentiable with respect to p.
Using Assumption 1, we can now apply the implicit function theorem. Thus, equation (7) implicitly defines a mapping z ↦ p(z, Ω) and its derivative is given by

∂_z p(z̄, Ω) δz = −(∂²_pp f(p, z̄, Ω))^{-1} ∂²_pz f(p, z̄, Ω) δz    (8)

in any direction δz. More precisely, we have

H(Ω) := J(Ω)^T Ω J(Ω) + S(Ω)

with J(Ω) := ∂_p r(p, z̄) and the second order term S(Ω), where J(Ω) ∈ R^{n × n_p} and S(Ω) ∈ R^{n_p × n_p} are matrices. Thus, ∂²_pp f(p, z̄, Ω) = H(Ω). The exact calculation of J(Ω) and S(Ω) is given in the appendix, which requires the following assumption to allow the usage of the implicit function theorem. We want to make sure that the principal part J(Ω)^T Ω J(Ω) stays invertible when perturbing the sensor variables:

Assumption 3. The matrix Ω J(Ω) has full column rank, i.e., rank(Ω J(Ω)) = n_p, and n_S ≥ n_p.
From this assumption we can infer the invertibility of J(Ω)^T Ω J(Ω), compare Körkel et al. [17] for more details. In other words, it is always necessary to employ at least as many sensors as there are parameters to be estimated. This is an important constraint later in the optimization with respect to the sensor variables.
Thus, from (8) we obtain

∂_z p(z̄, Ω) δz = H(Ω)^{-1} J(Ω)^T Ω Σ^{-1} δz.

Using the calculations from above, the approximated covariance matrix is given by

C = H(Ω)^{-1} J(Ω)^T Ω J(Ω) H(Ω)^{-1}.    (9)
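Numerically, this sandwich formula reduces to a few matrix products. The sketch below evaluates it for an arbitrary illustrative Jacobian; the second-order term S is set to zero here, i.e., a Gauss-Newton-type approximation in which C collapses to (J^T Ω J)^{-1}.

```python
import numpy as np

def covariance(J, omega, S=None):
    # J: (n, n_p) Jacobian of the weighted residual r = Sigma^{-1}(h(p) - z)
    # omega: length-n 0/1 vector of sensor weights
    # S: optional (n_p, n_p) second-order term; None means Gauss-Newton approx.
    Om = np.diag(omega.astype(float))
    JtOJ = J.T @ Om @ J                  # principal part J^T Omega J
    H = JtOJ if S is None else JtOJ + S  # Hessian H(Omega)
    Hinv = np.linalg.inv(H)
    return Hinv @ JtOJ @ Hinv            # sandwich formula (9)

J = np.array([[1.0, 0.5], [0.2, 1.0], [0.3, 0.1], [0.8, 0.4]])
omega = np.array([1, 1, 1, 0])           # fourth sensor switched off

C = covariance(J, omega)
```

With S = 0 this reproduces the familiar (J^T Ω J)^{-1} of linear regression, restricted to the active sensors.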

Optimal Design of Experiments
The optimal design of experiments problem deals with the task of finding optimal sensor positions such that the reliability of the parameter estimation is maximized. The reliability also depends on the accuracy of the sensors used for the measurements, whereby each sensor k has a given variance σ_k². It is very common to measure the reliability of the parameter estimation by a single-valued design function Ψ, see Bauer et al. [4] and Franceschini and Macchietto [11]. It is obvious that a small covariance leads to a high reliability of the parameter estimation. However, it is unclear what a small covariance means in terms of matrices. In general, there are different approaches how to choose the function Ψ. We list the most prominent ones according to Fedorov and Leonov [10]:
• A-criterion: Ψ_A(C) = trace(C), the square of the half-length of the diagonal of the rectangle that circumscribes the confidence ellipsoid,
• D-criterion: Ψ_D(C) = det(C), the square of the product of the half-lengths of the confidence ellipsoid's principal axes,
• E-criterion: Ψ_E(C) = λ_max(C), the square of the half-length of the largest principal axis.
It seems natural to use the D-criterion due to its close connection to the volume of the confidence ellipsoid and its invariance with respect to transformations applied to the model parameters. However, this criterion tends to emphasize the most sensitive parameter [11]. The A-criterion ignores the amount of information on the off-diagonal elements of the covariance matrix. This is particularly inefficient when there is a high correlation between parameters. For the numerical example in this paper, we choose the E-criterion even though E-optimality may lead to a tolerable increase in volume of the confidence ellipsoid. From an experimental point of view, the ideal confidence region would be a rectangle. This enables experimenters to give confidence intervals for the quantities of interest. The E-criterion may serve this purpose by reducing the largest expansion of the confidence ellipsoid.
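The three criteria are cheap to evaluate once C is available, as the following sketch shows for a hypothetical 2×2 covariance matrix; smaller values indicate a more informative design.

```python
import numpy as np

def psi_A(C):
    # A-criterion: trace of the covariance matrix
    return np.trace(C)

def psi_D(C):
    # D-criterion: determinant, proportional to the squared ellipsoid volume
    return np.linalg.det(C)

def psi_E(C):
    # E-criterion: largest eigenvalue, i.e., squared half-length of the
    # longest principal axis of the confidence ellipsoid
    return np.max(np.linalg.eigvalsh(C))

C = np.array([[0.01, 0.002], [0.002, 0.005]])  # hypothetical covariance
a, d, e = psi_A(C), psi_D(C), psi_E(C)
```

Note that `eigvalsh` exploits the symmetry of C; for the E-criterion only the extreme eigenvalue matters, not the full spectrum.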
We formulate the optimal design of experiments problem as follows:

min_Ω Ψ(C(Ω))  s.t.  ω ∈ {0, 1}^{n_S},  g(Ω) ≤ 0,  p solves the parameter identification problem (4).    (10)

The constraint g is a general, possibly nonlinear function which describes for example bounds on the number of used sensors. In our case, to fulfill the rank condition in Assumption 3, g must contain the inequality

n_p − Σ_{i=1}^{n_S} Ω_ii ≤ 0.

The optimal design of experiments problem (10) is thus a non-convex mixed-integer nonlinear program (MINLP). Such problems can be solved via spatial branch-and-bound, see, e.g., Burer and Letchford [7] for an overview.
Note, however, that for the correctness of the proposed approach, problem (10) does not necessarily need to be solved to optimality. Using a good but suboptimal sensor placement will not lead to any incorrect rejection of a model, since the variance of the parameter estimation becomes larger and therefore the confidence ellipsoids also increase. Thus, it is also possible to solve (10), which is the computationally most expensive step of the proposed approach, by heuristic methods. In our numerical example, the number of sensors is very small, so that a heuristic method may indeed provide satisfactory results.
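For such small sensor counts, problem (10) can even be solved by enumeration. The sketch below brute-forces all 0/1 weight vectors for a handful of candidate sensors under the E-criterion; the Jacobian and the bound on the number of active sensors are illustrative assumptions, and the covariance uses the Gauss-Newton approximation.

```python
import numpy as np
from itertools import product

def e_optimal_design(J, n_p, max_sensors):
    best, best_val = None, np.inf
    for bits in product([0, 1], repeat=J.shape[0]):
        omega = np.array(bits)
        if not (n_p <= omega.sum() <= max_sensors):
            continue                          # sensor-count constraints g
        Jw = J[omega == 1]
        if np.linalg.matrix_rank(Jw) < n_p:
            continue                          # rank condition of Assumption 3
        C = np.linalg.inv(Jw.T @ Jw)          # Gauss-Newton covariance approx.
        val = np.max(np.linalg.eigvalsh(C))   # E-criterion Psi_E
        if val < best_val:
            best, best_val = omega, val
    return best, best_val

# four candidate sensors, two parameters; rows are sensor sensitivities
J = np.array([[1.0, 0.1], [0.1, 1.0], [0.5, 0.5], [0.01, 0.02]])
omega_opt, psi_opt = e_optimal_design(J, n_p=2, max_sensors=2)
```

Here the nearly insensitive fourth sensor is never selected; for larger n_S the 2^{n_S} enumeration must be replaced by branch-and-bound or a greedy heuristic.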

Detecting Model Uncertainty
In this section, we discuss how optimal design of experiments and parameter identification can be used to detect model uncertainty. To do so, assume that all parameters of the model have a true physical meaning and that, in case the model is correct, the solution of the parameter identification problem, computed for a sufficiently large number of measurement series with zero noise, corresponds to these real parameters. Then repeated solutions of the parameter identification problem for different measurements with differing controls should, up to the uncertainty of the measurements, deliver the same set of parameters. On the other hand, if one set of measurements leads to parameters which lie outside a given confidence set of the previous runs, then this implies that the model cannot replicate the results of all measurements reliably, i.e., the underlying model is inadequate.
As already explained in the introduction, the first step of identifying parameters of the model by fitting them to a given set of measurements is to actually acquire these measurements, which can be extremely costly. Furthermore, the quality of the parameter estimation may even be improved by removing unreliable sensors. Therefore, we only acquire the minimal number of measurement series needed for the computation of the optimal design of experiments introduced in the previous section to determine optimal sensor positions. In this case, we solve problem (10) with a restriction on the desired minimal number of used sensors to decide which sensors are actually essential to solve the parameter identification problem with minimal variance.
Our approach to detect model uncertainty is depicted in Figure 1. If the parameter values are known by a prior information source, e.g., by previously performed experiments, then we directly solve the optimal design of experiments problem to generate additional measurements with minimal costs. New parameter values are estimated from these data sets together with their confidence ellipsoid. If the previously known parameter values are outside this confidence ellipsoid then this is an indication of model uncertainty and the loop starts again at the deduction of a mathematical model. Now assume that a sufficiently large set S of measurements for variance minimal sensors is given and assume that the real values of the parameters which should be estimated are not known beforehand. Then split the test set into a calibration set S c and a validation set S v . This split can either be done in a homogeneous way to test whether the model is accurate in general, or it can be chosen in a way to test whether a specific physical effect is sufficiently modeled. For example, the test set could be split according to the magnitude of the controls to check if the results for both sets can be reproduced by the model for the same set of parameters. On the one hand, this approach can help to identify ranges of control variables for which the model works better or worse, and on the other hand, to detect specific effects which are not yet sufficiently implemented in the model.
The main part of the algorithm starts by solving the parameter identification problem for the calibration set S c leading to a solution p c = p(z Sc , Ω opt ). From the solution p c we can then compute the corresponding covariance matrix of the parameter identification as given in (9). This covariance matrix allows us to compute a confidence ellipsoid for a given confidence level 1−α using Equation (6). If the model is correct, then the solution of the parameter identification problem for the validation set S v should lie within this confidence ellipsoid. If the optimal parameters for the validation set are outside the 1 − α confidence ellipsoid, this is an indication of model uncertainty and the loop starts again at the deduction of a mathematical model, see Figure 1.
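The calibration/validation loop just described can be condensed into a few lines. The following end-to-end sketch runs it on synthetic data: the toy model, its "true" parameters and the deliberately unmodeled effect are all assumptions for illustration, and the covariance uses the Gauss-Newton approximation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import chi2

sigma = 0.02                                   # assumed sensor accuracy

def fit(h, q, z, p0):
    res = least_squares(lambda p: (h(p, q) - z) / sigma, p0)
    C = np.linalg.inv(res.jac.T @ res.jac)     # Gauss-Newton covariance approx.
    return res.x, C

def uncertainty_detected(h, q_c, z_c, q_v, z_v, p0, alpha=0.05):
    p_c, C = fit(h, q_c, z_c, p0)              # calibrate on S_c
    p_v, _ = fit(h, q_v, z_v, p0)              # re-estimate on S_v
    d = p_v - p_c                              # leave the 1-alpha ellipsoid?
    return float(d @ np.linalg.solve(C, d)) > chi2.ppf(1 - alpha, df=len(p0))

h = lambda p, q: p[0] * q + p[1] * q**2        # assumed model structure
q_c, q_v = np.linspace(0.1, 1.0, 20), np.linspace(1.0, 2.0, 20)
p0 = np.ones(2)

# Case 1: reality matches the model; both estimates agree, no flag is raised
# (measurement noise is omitted here so the outcome is deterministic).
z_c, z_v = 1.5 * q_c + 0.3 * q_c**2, 1.5 * q_v + 0.3 * q_v**2
flag_ok = uncertainty_detected(h, q_c, z_c, q_v, z_v, p0)

# Case 2: reality contains an unmodeled effect; the validation estimate
# leaves the calibration ellipsoid and model uncertainty is flagged.
z_c2 = 1.5 * q_c + 0.3 * q_c**2 + 0.8 * np.sin(3 * q_c)
z_v2 = 1.5 * q_v + 0.3 * q_v**2 + 0.8 * np.sin(3 * q_v)
flag_bad = uncertainty_detected(h, q_c, z_c2, q_v, z_v2, p0)
```

Splitting the controls by magnitude, as done here, is exactly the kind of split that exposes effects the model cannot reproduce across operating ranges.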

The 3D Servo Press Model
The method for detecting uncertainty is demonstrated on a technical system, the 3D Servo Press [25], a forming machine which transmits the torques and forces of its drives onto a part to be formed, e.g. a car body part. A forming machine is thus subject to high external forces during its motion, which cause its mechanism to deflect. While a rigid body model is accurate during the unloaded state, it does not suffice during the forming operation [14]. Especially for the closed-loop control of forming machines, an accurate model is crucial, as inaccuracies can cause the control to become unstable [15]. However, the modeling of forming machines requires a high degree of abstraction, since elastic bodies need to be reduced to bars and beams. Furthermore, nonlinear bearing stiffnesses as well as friction have to be taken into account.
The equations of motion are derived from Lagrange's equations

d/dt (∂L/∂ẏ) − ∂L/∂y = q,    (11)

where L = T − U is the Lagrangian consisting of the total kinetic energy T and the total potential energy U, y are the system states and q are the non-conservative forces. The non-conservative forces contain all external forces that are applied to the machine, i.e., the torque of the eccentric drive q_ecc, the forces of the upper and lower spindles q_su, q_sl and the reacting process force q_P. In this application we want to evaluate the elastic model and therefore fix the drive positions. Thus, only q_P is applied and all other non-conservative forces are zero.
The total energies T and U are composed of the energies of the individual elements, which are given as follows.
• Bar model. A direct approach to discretizing the bar while maintaining inertia and rigidity is the finite element method. It is based on the partial differential equation of the continuous bar and supplies the mass matrix M_i and the stiffness matrix K_i for an element of mass m_i and stiffness k_bar,i. As the actual elements do not have a uniform cross-section, the stiffness is determined using a finite element simulation based on the ideal CAD model.
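The element matrices of the press model are not reproduced here; as an illustration, the standard two-node bar element of the finite element method with a lumped mass formulation can be assembled as follows (the values of m_i and k_bar,i are placeholders).

```python
import numpy as np

def bar_element(m_i, k_bar_i):
    # lumped mass matrix: half the element mass at each of the two nodes
    M_i = (m_i / 2.0) * np.eye(2)
    # standard bar stiffness matrix: k * [[1, -1], [-1, 1]]
    K_i = k_bar_i * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return M_i, K_i

M, K = bar_element(m_i=4.0, k_bar_i=2.0e6)   # placeholder values
```

The stiffness matrix is singular by construction, reflecting the rigid-body mode of an unconstrained element; boundary conditions remove this singularity during assembly.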

Remark 3.
However, the CAD model and the finite element model rely on detailed knowledge of the elastic modulus and the geometry of the components. Due to natural fluctuations in material production, the elastic modulus may vary from part to part. In addition, manufacturing limitations restrict the achievable geometric accuracy. Therefore, determining the stiffness by an a priori FEM simulation leads to an uncertain estimate of the actual stiffness and requires a parameter identification based on subsequent measurements.
The kinetic energy of an individual bar shown in Figure 3 is composed of the contributions of the translational velocities v_i,Sj of the masses and of the mass moment of inertia Θ_i with the corresponding rotational velocity φ̇_i. Its potential energy originates from the energy stored in the elasticity and the gravitational potential energy of the masses, where ξ_i is the elongation of the element, g_0 is the standard gravity of Earth and y_i,j is the relative distance of each mass to the ground.

• Beam model. All elements that experience bending moments are modeled as beams. This applies especially to the lever, which connects three points instead of two and is marked as a solid line in Figure 2. Like the bar model, the beam model is based on the equations of the finite element method and serves as the basis for modeling the lever under bending load. Since the lever features three joints in total, the model can be seen as two flat beam elements arranged in a row. A lumped mass model is set up in which all elements outside the main diagonal of the mass matrix are neglected. The stiffness of each finite element results in a stiffness matrix using the simulated stiffnesses k_i,α, k_i,β and the length l_i of the beam. Since the lever consists of two finite elements, two 6×6 element matrices are joined together to form a 9×9 stiffness matrix according to the finite element method. The result is the stiffness matrix K_beam,i. The total mass of the lever is distributed to the model masses. As the kinetic energy of a beam shown in Figure 4 is equivalent to the kinetic energy of a bar, the analogous expression results, where φ̇_i is the rotational velocity of the complete beam. The potential energy is the sum of the positional energy of the masses and the elastic energy of the beam.

The bearings are modeled as spring elements between the joints of the couplers.
Since the radial bearing force applied by the bearings is a function of the deflection, the deflection must be described with the position coordinates of the bodies. Assuming a constant joint stiffness, the potential energy results in

U_joint,i = (1/2) k_joint,i ∆r_i²,

with the joint's stiffness k_joint,i and its radial deflection ∆r_i.

• Friction model
Friction occurs in all bearings in which a relative movement takes place and causes a hysteresis in the load-displacement curve. As the relative movements in the joints are small compared to the movement of the pressure bar that connects point D with point R (see Figure 2), only the bearings guiding that bar are considered. Nevertheless, a variety of modeling approaches for friction exists. In order to test which approach is closest to reality in this case, three rate-independent friction models of different complexity are pursued.
1) Since friction is hard to model, it is often neglected, which leads to the model q_fric(t) = 0.
2) The discontinuous Coulomb friction model gives a more accurate description of friction,

q_fric(t) = −q_c sign(dR_x/dt),

in which q_c is a friction constant.
As we can assume that the sign of dR_x/dt is the same as the sign of q̇_P = dq_P/dt, we can simplify the model to be discontinuous only in the input variables and not in the states.
3) As a third model approach, a continuous friction model with rate-independent memory that takes into account past force data is considered [5]. Here, we take into account the force of the current and the last time step,

q_fric(t) = µ(q_P(t), q_P(t − 1), q_P,min(t), q_P,max(t)) ū,

as well as the minimum and maximum force values during loading and unloading cycles, e.g.

q_P,min(t) = min(q_P(t), q_P,min(t − 1))  if q̇_P(t) ≥ 0,

which are internal variables and reduce the complexity of memorizing an infinite number of time steps [0, t]. Based on the Preisach model [22], which is a discontinuous hysteresis model, we use an adapted continuous model which is comparable to a neural network topology [19]. Figure 5 shows the topology of the used model, where ρ_i = arctan(ū). To train the model, we have to determine the friction force, which is the difference between the actually measured process force and the force estimated by the inverse model. The inverse model describes the required force under a measured displacement z and contains the estimated stiffness parameters that have been determined without any friction model in a first step. Applying this to all 29 measurements of a loading cycle, the full hysteresis can be identified and used to train the friction model.
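The role of the internal memory variables can be sketched as follows: q_P,min and q_P,max track the extreme force values of the current loading/unloading cycle so that the full load history need not be stored. The update rule for the unloading branch is an assumption for illustration, since only the loading branch is stated above.

```python
def update_memory(q_p, dq_p, q_min, q_max):
    # internal variables of the rate-independent hysteresis memory:
    # track the running extrema instead of the whole history [0, t]
    if dq_p >= 0:                      # loading: update the minimum
        q_min = min(q_p, q_min)
    else:                              # unloading: update the maximum (assumed)
        q_max = max(q_p, q_max)
    return q_min, q_max

# walk through a small loading/unloading force sequence
forces = [0.0, 2.0, 5.0, 3.0, 1.0, 4.0]
q_min = q_max = forces[0]
prev = forces[0]
for f in forces[1:]:
    q_min, q_max = update_memory(f, f - prev, q_min, q_max)
    prev = f
```

Only two scalars per cycle survive, which is what makes the memory rate-independent and cheap compared to storing the complete force trajectory.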
Synthesis of the press model. The press model consists of 2 rigid bodies, 5 bars, 1 beam, 10 joints and the elasticity of the press frame, which provides the support points to the environment. This results in a 34-dimensional state vector $y$.
Equation (11) can now be written as
$$M(y)\,\ddot{y} = f_{\text{kin}}(y, \dot{y}, t) + f_{\text{pot}}(y, t) + q(t)$$
with the mass matrix $M(y)$, the contribution of the kinetic energy $f_{\text{kin}}(y,\dot{y},t)$ and of the potential energy $f_{\text{pot}}(y,t)$, and the excitation forces $q(t)$. In this case we are interested in the quasi-static model to identify the uncertain stiffness parameters of two bars, $k_{\text{bar},7}$ and $k_{\text{bar},5}$, in the following denoted as $k_7$ and $k_5$ as shown in Figure 2. Thus, all derivatives of $y$ are set to zero such that
$$0 = f_{\text{pot}}(y, t) + q(t),$$
where the function $f_{\text{pot}}$ contains the parameters $k_7$ and $k_5$.
To identify the model parameters, a process force q P is applied using an external pneumatic force source.

Numerical Results for the 3D Servo Press
We implemented the described procedure for detecting model uncertainty in MATLAB R2017b, using the included lsqnonlin solver for the parameter identification problems, and applied it to the gear mechanism model of the 3D Servo Press.
We use measurements for 29 different process forces (these are the control variables), whereby the first 15 forces describe loading and the last 14 describe unloading of the 3D Servo Press. For each force variable we measure the vertical displacements in point D, the horizontal displacements in point F and the vertical displacements in point B 0 when applying a vertical process load q P on the press, see Figure 2. The displacements are measured in µm and the forces in N. In this section we adopt again the notation from Section 2.
Each measurement is performed n M = 7 times although with slightly differing forces due to variations in the pneumatic pressure when applying the force.
In a first step, we analyze the measurement data. In our modeling we assumed that the measurements $z \in \mathbb{R}^{3 \times 29 \times 7}$ are normally distributed. Since the true values $z^*$ of the measured quantities are unknown to us, we check instead whether the measurement errors $\varepsilon$ are normally distributed. To verify this assumption, we perform a chi-squared goodness-of-fit test [13] on the measurement errors with significance level $\alpha = 5\%$. To do so, we calculate the standard deviation of each sensor $k$ using all 116 values of the differenced random variables $\bar{z}_k$ introduced below.
If $z_{kj1}, z_{kj2}, \ldots, z_{kj7}$ are independent and identically distributed with the same mean and standard deviation, then each row of $\bar{z}_k$, formed from the differences of consecutive repetitions, is independent and identically distributed with mean zero and $\operatorname{Var}(z_{n+1,j,k} - z_{n,j,k}) = 2\operatorname{Var}(z_{n,j,k})$ for $n = 1, 3, 5$ and $j = 1, \ldots, 29$, respectively, and in the same way for the last row of $\bar{z}$. Now we test the hypothesis that each $\bar{z}_k$ is normally distributed with mean zero and variance estimated from $\bar{z}_k$ for each sensor $k \in \{1, 2, 3\}$. The results of the hypothesis tests are shown in Table 2: the hypothesis is accepted for sensors 1 and 2. For sensor 3 the hypothesis is formally rejected. However, one has to consider that the magnitudes of the applied load forces could not be kept constant; they differ slightly in each measurement series. This is an unavoidable limiting factor in our experimental setup. Therefore, one cannot conclude that sensor 3 captures non-Gaussian experimental data. We nevertheless assume that all sensors return data that is normally and independently distributed.
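A minimal sketch of such a chi-squared goodness-of-fit test for zero-mean normality, with equiprobable bins under the fitted normal distribution (the binning choice and the synthetic data are our assumptions; the paper's actual test setup may differ):

```python
import numpy as np
from scipy import stats


def chi2_normality_test(eps, n_bins=8):
    # chi-squared goodness-of-fit test of differenced measurement
    # errors against N(0, sigma^2), with sigma estimated from the data;
    # bins are chosen equiprobable under the hypothesized normal law
    sigma = eps.std(ddof=1)
    inner_edges = stats.norm.ppf(np.arange(1, n_bins) / n_bins, scale=sigma)
    idx = np.searchsorted(inner_edges, eps)       # bin index per sample
    observed = np.bincount(idx, minlength=n_bins)
    expected = len(eps) / n_bins
    chi2 = ((observed - expected) ** 2 / expected).sum()
    # degrees of freedom: n_bins - 1, minus 1 for the estimated sigma
    return chi2, stats.chi2.sf(chi2, df=n_bins - 2)


rng = np.random.default_rng(0)
gauss = rng.normal(0.0, 1.0, 116)     # 116 differenced values, as in the text
skewed = rng.exponential(1.0, 116)    # clearly non-Gaussian errors
_, p_gauss = chi2_normality_test(gauss)
_, p_skewed = chi2_normality_test(skewed)
```

The Gaussian sample yields a large p-value (hypothesis accepted), while the skewed sample leaves the negative-valued bins empty and is rejected decisively.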
As explained above, the 29 control forces could not be kept constant over all measurement series. However, the deviations are small and the computed standard deviations for the sensors include this effect. Thus, it is justifiable to work with the means from now on. Having the initial experimental data available, the aim is now to reduce the cost of obtaining these measurements, i.e., we want to reduce the number of involved sensors. We first restrict the set of parameters to $p_1 = k_5$ and $p_2 = k_7$ shown in Figure 2, while the remaining model parameters are fixed to values computed by a past parameter identification. The parameters $k_5$ and $k_7$ describe the axial stiffness of elastic components of the 3D Servo Press, see Section 5. Since the number of involved sensors must be greater than or equal to the number of estimated parameters, compare Assumption 3, we want to choose the two of the three sensors which together allow the most reliable parameter identification, i.e., the smallest value of the E-criterion.
For comparison, we compute all three design criteria mentioned in Section 3 using optimal parameters obtained from the solution of the parameter identification problem. We observe that omitting the second sensor increases all design criteria by a factor of $\approx 10^{20}$ compared to the initial sensor configuration, which indicates that the covariance matrix became close to singular. Removing the first sensor instead increases the maximal eigenvalue and the volume of the confidence ellipsoid only slightly. However, omitting the last sensor, i.e., the one measuring the vertical displacements in point $B_0$, leads to the smallest maximal eigenvalue. We choose the E-criterion as design criterion for the reasons explained in Section 3. Thus, we proceed with the optimal sensor combination 110, i.e., we choose to measure the vertical displacements in point $D$ and the horizontal displacements in point $F$.
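The comparison of sensor subsets can be sketched as follows, assuming the linearized covariance $C = \sigma^2 (J^\top J)^{-1}$ of the parameter estimate; the sensitivity matrix `J` below is purely hypothetical, not the press model's actual Jacobian:

```python
import numpy as np


def covariance(J, sigma):
    # linearized covariance of the parameter estimate,
    # C = sigma^2 (J^T J)^{-1}, for a sensitivity (Jacobian) matrix J
    return sigma ** 2 * np.linalg.inv(J.T @ J)


def design_criteria(C):
    # A-criterion: trace; D-criterion: determinant (proportional to the
    # squared volume of the confidence ellipsoid); E-criterion: largest
    # eigenvalue, i.e. the longest semi-axis direction of the ellipsoid
    return {"A": np.trace(C),
            "D": np.linalg.det(C),
            "E": np.linalg.eigvalsh(C).max()}


# hypothetical sensitivities of the three sensors w.r.t. (k5, k7)
J = np.array([[1.0, 0.5],
              [0.4, 1.2],
              [0.9, 0.8]])
crit_all = design_criteria(covariance(J, 1.0e-5))      # sensors 111
crit_110 = design_criteria(covariance(J[:2], 1.0e-5))  # drop sensor 3
```

Since removing a sensor row can only remove information, every criterion of the reduced configuration is at least as large as that of the full configuration; the design task is to pick the two-sensor subset for which the E-criterion grows least.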
The experiments revealed a hysteresis behavior, which contributes significantly to model uncertainty. We want to test whether our algorithm recognizes the best of three different models used to describe the data. Therefore, we introduce the following models; compare Section 5 for a description of the last two. Figure 6 shows the different behavior of these models plotted together with the data. For the assembly of $M_3$ we need real measurements to train the neural network. For this purpose we remove three data series from our available data such that only $n_M = 4$ measurement series remain for the application of our algorithm to detect model uncertainty. In order to make the following test strategy fair, we use only these four measurement series for all models alike, since the more data series are involved, the harder it is for a model to reproduce them all.
The standard deviation of the sensors is crucial for the size of the confidence ellipsoids of the parameter estimates. We fix these values to be the sum of the standard deviation of the repeated measurement process (repetition accuracy) and the standard deviation of the linearization error and other internal errors as specified by the manufacturer of the sensors. Thus, we take the values $\sigma_1 = 1.890 \cdot 10^{-5}$, $\sigma_2 = 9.241 \cdot 10^{-6}$ and $\sigma_3 = 5.184 \cdot 10^{-6}$ for the total accuracy of the sensors.
In order to investigate the validity of the models $M_1$, $M_2$ and $M_3$, we generate homogeneous calibration and validation sets. We first split the test set consisting of the four measurement series into a loading test set $S^l$ and an unloading test set $S^u$ and consider each set separately. Thus, for the loading set $S^l$, we again split the test set homogeneously into a calibration and a validation test set, and analogously for $S^u$. Next, we test loading versus unloading and again split the set homogeneously into calibration $S^{lu-}_c$ and validation $S^{lu-}_v$. Lastly, we test loading together with unloading and split the set homogeneously into calibration $S^{lu+}_c$ and validation $S^{lu+}_v$, compare Table 3. Now, for each of the three models and for each of the four test scenarios we test the hypothesis whether the estimated parameters from the corresponding validation set lie inside the confidence ellipsoid around the parameter estimate of the corresponding calibration set. Table 4 lists the results. The last three columns show the minimal confidence level, i.e., the p-value of the test statistic, such that the parameters of the corresponding validation set lie on the boundary of the confidence ellipsoid of the corresponding calibration set.
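A sketch of this hypothesis test under a common linearized approximation, in which the Mahalanobis distance of the validation estimate from the calibration estimate is compared against a chi-squared law (the estimates and covariance below are illustrative numbers, not the press data):

```python
import numpy as np
from scipy import stats


def ellipsoid_p_value(p_cal, C_cal, p_val):
    # minimal confidence level at which the validation estimate p_val
    # lies on the boundary of the confidence ellipsoid around the
    # calibration estimate p_cal; the quadratic-form statistic is
    # approximately chi-squared with n_p degrees of freedom under a
    # standard linearization assumption
    d = p_val - p_cal
    t = float(d @ np.linalg.solve(C_cal, d))
    return stats.chi2.sf(t, df=len(p_cal))


# hypothetical calibration estimate and covariance (illustrative scale)
p_cal = np.array([1.0, 2.0])
C_cal = np.array([[4.0, 1.0],
                  [1.0, 2.0]]) * 1e-4

p_consistent = ellipsoid_p_value(p_cal, C_cal, p_cal + np.array([0.01, 0.0]))
p_rejected = ellipsoid_p_value(p_cal, C_cal, p_cal + np.array([0.2, -0.1]))
```

At significance level $\alpha = 5\%$, the first validation estimate lies inside the ellipsoid (hypothesis accepted), while the second is far outside and the common-parameter hypothesis is rejected, which is exactly the pattern used in Table 4 to flag model uncertainty.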
For the usual choice of the significance level $\alpha = 5\%$, comparing the p-values, we clearly see that model $M_1$, which does not account for hysteresis, is rejected in all test scenarios. Thus, we have detected that $M_1$ is uncertain: the data cannot be described by a simple linear model. We demand $M_1$ to be updated so as to correctly represent hysteresis. In a first attempt this is done by the Coulomb friction model, see equation (12). We thus apply our algorithm to $M_2$. While this model seems to be able to describe loading and unloading separately, it fails to describe both scenarios with the same set of parameters. Since hysteresis is a continuous effect, the discontinuous Coulomb friction model still fails to reproduce the fine nuances of the experimental data. Our proposed method identifies this in the third and fourth test scenarios, where the model is clearly rejected since the p-value is close to zero. Hence, a neural network has been employed to improve the model output, as mentioned in Section 5. The last column of Table 4 shows that model $M_3$ is well suited to explain the hysteresis phenomenon.
To sum up, we have seen that the method is able to detect model uncertainty and that, by a suitable choice of the calibration and validation test sets, it can even help to identify particular aspects of the model that need to be improved.

Conclusion
In this paper we have seen how model uncertainty can be identified by combining the optimal design of experiments approach with parameter identification. Optimal design of experiments can be used to choose sensors which allow the parameter identification problem to be solved with minimal variance. Using the covariance matrix of the parameter estimate, we can then compute confidence ellipsoids which should include the parameter estimates with high probability. If some other test set leads to a solution of the parameter identification outside such a confidence ellipsoid, then we can conclude with high probability that not all of the measurements can be explained by the same model with the same set of parameters. We then introduced the 3D Servo Press as an application and demonstrated our approach on mathematical models of the press. This allowed us to show that two simple models were not valid, since specific effects like hysteresis are not sufficiently modeled. A sophisticated mirroring of the hysteresis effect led to a model for which no more model uncertainty could be detected. Thus, the goal of identifying model uncertainty for the 3D Servo Press was achieved.
It would be interesting to further test our method with models that depend on more than two parameters and to have a larger number of possible sensor locations available. Furthermore, instead of only choosing sensors once in the beginning, it would also be possible to re-solve the optimal design of experiments problem using the parameters identified through some first experiments to iteratively strengthen the quality of the parameter identifications, in a similar way as proposed by Körkel et al. [17].
The exact characterization of the vector-tensor and matrix-tensor products in equations (13) and (15) above is given by the following, whereby $\otimes$ denotes the standard tensor product and the dependencies on $h = h(y(p), p, q)$ and $E = E(y(p), p, q)$ are dropped for the sake of clarity. Altogether, $H(\Omega)$, $J(\Omega)$ and therefore also $C(p, \Omega)$ can be determined using the expressions given in (13)-(15).