Real-time parameter updating for nonlinear digital twins using inverse mapping models and transient-based features

In the context of digital twins, it is essential that a model gives an accurate description of the (controlled) dynamic behavior of a physical system during the system's entire operational life. Therefore, model updating techniques are required that enable real-time updating of physically interpretable parameter values and are applicable to a wide range of (nonlinear) dynamical systems. As traditional, iterative parameter updating methods may be computationally too expensive for real-time updating, the inverse mapping parameter updating (IMPU) method is proposed as an alternative. For this method, first, an artificial neural network (ANN) is trained offline using novel features of simulated transient response data. Then, in the online phase, this ANN maps, at little computational cost, a set of measured output response features to parameter estimates, enabling real-time model updating. In this paper, various types of transient response features are introduced to update parameter values of nonlinear dynamical systems with increased computational efficiency and accuracy. To analyze the efficacy of these features, the IMPU method is applied to a (simulated) nonlinear multibody system. It is shown that a


Introduction
As part of the fourth industrial revolution, digital twins (DTs) hold the promise of increasing the efficiency and lowering the costs and throughput times of, among others, high-tech engineering systems and manufacturing processes [1], such as wire bonder machines, lithography systems, or steel manufacturing plants. As the name 'digital twin' suggests, this potential is achieved by having a digital representation of a physical system.
Using this DT, the behavior of the 'physical twin' is predicted risk-free by simulating and analyzing the digital copy of the system [2]. In the design process of future generations of the system, this provides engineers with valuable insight into the dynamics of the current system, enabling improved design of the next generation. Additionally, the measured behavior of the actual system can be compared to the behavior predicted by a DT of the healthy system for the benefit of structural health monitoring (SHM) [3], enabling efficient maintenance management, minimizing maintenance costs, and leading to an increase in the lifetime of the system. Furthermore, during the operation of a system, a DT is able to suggest different operation modes or adapt mission planning if it senses that the current state of the system is not suited for an originally planned operation mode or mission [4]. This application can also be extended to using a dedicated DT for each machine in a fleet of akin machines, based on its most recent measured state, to account for machine-to-machine variations originating from, e.g., manufacturing tolerances. The availability of such dedicated DTs allows for adapting model-based (feedforward) controllers to enhance the performance of each individual machine.
Coined in 2014 by Grieves [5], the concept of digital twins is relatively novel and still faces multiple broad challenges, such as managing heterogeneous models across varying disciplines and combining models and big data for, among others, fault detection and performance optimization [6]. In this paper, these challenges are regarded as out of scope, however, and the assumption is made that a DT is given by a parametric (dynamic) model, derived using one of the following approaches: (1) analytical approaches based on first principles (FP), such as Lagrange's equations of motion or Hamilton's principle, (2) (multiphysical) modeling and analysis software packages, such as finite element (FE) or multibody dynamics packages, or (3) software tools that combine techniques from multiple engineering disciplines, e.g., dynamic modeling and motion control, such as the Matlab-based Simulink or Simscape packages. Note that, although these three modeling approaches are all based on FPs, only the first approach provides the user with direct access to the closed-form equations of motion (EoMs). Models from the second and third approaches are also referred to as simulation models; in these two approaches, however, the user still has access to the parameter values. Another challenge is related to the ever-present mismatch between measurements of a physical system and the corresponding predictions of its DT. For the DT to reach its full potential, this mismatch should be minimal throughout the system's operational life [1,4,7]. This challenge of giving DTs life-long learning capabilities is referred to as autonomous model updating and is considered in this paper.
In Mottershead et al. [8,9], the aforementioned mismatch between measurements and model predictions is attributed to three distinct model error sources: incorrect model structure [10,11], discretization errors [12], and/or errors due to incorrect parameter values. This paper focuses on model parameter updating to minimize the last error source; model structure and discretization errors are therefore assumed negligible. Furthermore, in this work, experimental errors caused by, e.g., sensing or calibration errors, are assumed to be corrected, negligible, or at least much smaller than the modeling errors, which is the case in many situations.
Given the intended application in a general digital twin context, key requirements for an autonomous model updating procedure are defined as follows: (1) although control and SHM will not be considered in this paper yet, we aim to develop a model updating methodology applicable in the context of model-based control and/or SHM; therefore, updating should be performed in real-time, (2) parameter updates should be physically interpretable to enable insightful SHM, (3) updating should be applicable to (complex) nonlinear dynamical models, and (4) the updating method should be suitable for parametric models (not necessarily of dynamical systems) derived using FPs, FE/multibody software, or multidisciplinary software tools. The presentation of an autonomous model updating method simultaneously fulfilling these four requirements is the main contribution of the current paper. The corresponding technical contributions will be discussed in more detail later in this section.
Approaches for model parameter updating found in the literature can be divided into four types. The first type concerns updating methods originating from the field of structural dynamics, see, e.g., [9,13] for an overview. Although some exceptions exist, see [14–17], these updating methods are typically limited to linear(ized) systems [8,18]. Furthermore, these methods are based on expensive iterative sensitivity methods [16,17,19–25] or physically non-interpretable direct methods [21,26–28]. Consequently, these techniques do not meet the four requirements for model updating for DTs mentioned above and are thus not regarded as a viable solution to the problem considered here.
Alternatively, system identification techniques are used to find a model that describes measured data; for an overview of system identification techniques, see, for example, [29–31]. Note that, in contrast to model updating in the field of (typically linear(ized)) structural dynamics, system identification has been developed for nonlinear systems [32,33]. However, because system identification fits generic model classes to data, the identified models are difficult to interpret from a physical perspective [34] and thus do not satisfy the above requirements.
A third type of parameter updating technique concerns filter-based (or observer-based) methods. Examples are the vanilla [35], extended [36,37], and unscented [38–40] Kalman filters, Gaussian filters [41,42], and particle filters [43]. Although these approaches are often used for DTs, for complex systems (i.e., systems with many states and/or parameters), filter methods may suffer from relatively high computational costs, prohibiting online updating (especially if the required sampling frequency is relatively high) [43]. Furthermore, filter-based methods require an initial guess for the parameter estimates. If this information is not available or sudden changes in the parameter values occur, filter methods may end up in local minima. To avoid this, initial parameter values may be estimated using a coarse global parameter estimation method [38] at a significant cost in computation time. Finally, filter-based updating methods typically require that the user has direct access to the closed-form EoMs of the dynamical model. Consequently, filter-based methods cannot be directly applied to simulation models (as obtained from, e.g., FE/multibody software packages, or Matlab-based packages such as Simulink or Simscape) that may be used to constitute a digital twin.
The fourth approach to model updating employs machine learning (ML). Here, again, a distinction can be made between, firstly, identification, i.e., identifying a system without using a FP reference model, and, secondly, updating (parameters of) a predefined reference model. An example of the former is the work of Li on identifying nonlinear normal modes using physics-integrated deep learning [44].
Another example of ML-based identification is the work by Karniadakis et al. [45–47] on physics-informed neural networks (PINNs), which is also adopted by other researchers [48]. In this method, the coefficients of the (partial) differential equations governing the measured system are learned using a neural network. In general, however, the learned coefficients of these (partial) differential equations do not refer to directly interpretable physical quantities, e.g., masses, damping constants, and stiffness constants. Additionally, to update a set of coefficients using measured data, a neural network needs to be trained, which can take significant computation time and, hence, impede real-time updating. This holds especially for models consisting of a large number of states and/or parameters. Another ML-based identification methodology is the sparse identification of nonlinear dynamics (SINDy) algorithm developed by Brunton et al. [49–52]. Similar to PINNs, this method finds (a sparse set of) coefficients of differential equations, which may not correspond to physical quantities, again hindering physical interpretation. Note that this argument holds for most system identification methods (ML-based or not). An ML-based method more closely related to (physically interpretable) model updating is found in the work of Willcox et al. [53–57]. Here, a (FP) model is selected from a predefined library of models, representing systems with various fault types, using an optimal decision tree. Because a predefined library of models is required, the 'updated' model is, however, limited to a discrete set of models that are manually defined by the user and, consequently, is of limited accuracy. Other ML-based methodologies aim at finding, online, the (global) parameter values corresponding to a measured signal (in a continuous domain) using, for example, iterative genetic algorithms [58,59].
In this case, however, the (genetic) search over the parameter space takes considerable computational effort, rendering real-time updating impossible [38].
As an alternative, the inverse problem of estimating physically interpretable parameter values in a continuous domain using measurement data can be solved using an inverse mapping model (IMM). Here, the IMM directly maps, at small computational cost, a set of measured features describing the (dynamical) behavior of a system to a set of corresponding physically interpretable parameter values of the reference model. In the literature, (trained) artificial neural networks (ANNs) have been found suitable to constitute these IMMs due to their generic function approximation characteristics [18,60–63]. Consequently, this methodology was coined the neural network updating method (NNUM) by Atalla [18]. Note that, since other generic function approximation algorithms may also be used to constitute IMMs, the authors prefer to refer to this methodology as the inverse mapping parameter updating (IMPU) method. For example, in recent work of the authors, Gaussian processes are used as IMMs, with the added benefit of enabling uncertainty quantification for the estimated parameter values [64]. The main qualitative advantage of this method with respect to the previously discussed parameter updating methods is that online parameter updating is enabled even for large-scale systems with high sampling frequencies. This is achieved by shifting the significant majority of the computation time to an offline (training) phase, whereas the required online (inference) computations are extremely efficient and scale marginally with the complexity of the system. Although the IMPU methodology thus satisfies requirements 1 (real-time updating) and 2 (physically interpretable updating) for a broad range of systems and their DTs, it has, to the best of the authors' knowledge, only been applied to linear(ized) dynamical systems.
Also note that the IMPU method can be applied to parametric models derived using any of the aforementioned model derivation options (first principles, FE/multibody software, or multidisciplinary software packages), yielding valuable flexibility in the modeling of the DT. In preliminary work of the authors [65], the IMPU method has been applied to update a nonlinear dynamical model. However, this extended abstract only discussed the use of a single, straightforward time-domain-based feature type (samples of measured time-series data as directly obtained from measurements). With respect to this extended abstract, one of the important novelties of this paper lies in the proposal of different transient-based (both time-domain and frequency-domain) feature types that aim at further improving the accuracy of the parameter estimates using engineering knowledge. The current paper expands upon [65] with the following contributions: (1) a full-fledged description of the IMPU method for nonlinear systems, (2) a more complete application and analysis on two simulated multibody systems, one of them including a comparison with the well-known Kalman filtering technique, (3) an elaborate analysis of the generalization capabilities of the IMPU method, and (4) the definition, analysis, and comparison of various transient-based feature types.
The outline of this paper is as follows. In Sect. 2, a formal definition of the model parameter updating problem is given. Moreover, the usage of transient-based output features to update, online, interpretable parameter values of nonlinear models at low computational cost is introduced. Subsequently, in Sect. 3, different transient-based output features are introduced. In Sect. 4, the introduced technique is applied to a (simulated) rigid double beam model and the results are analyzed. Additionally, this section analyzes the generalization capabilities of the aforementioned transient-based features. Then, in Sect. 5, the IMPU method is applied to a roll plane model, and a brief (quantitative) comparison is made with the application of an extended Kalman filter to update parameter values. Finally, conclusions are discussed and recommendations for future work are given in Sect. 6.

Problem definition
It is assumed that a DT of a dynamical system is available in the form of a (nonlinear) reference model of which the dynamics may be given in first-order State Space (SS) form:

\dot{x}(t) = g(x(t), u(t), p), \qquad y(t) = h(x(t), u(t), p). \tag{1}

Here, x = [q^T, q̇^T]^T ∈ R^{2n_DoF} represents the state vector of the dynamical system, which generally consists of translation and/or rotation degrees of freedom (DoFs), q, and their time derivatives, q̇, where n_DoF represents the number of DoFs in the model. Furthermore, y ∈ R^{n_out} is a vector containing n_out outputs and u ∈ R^{n_in} contains n_in inputs. Finally, the parameter values (to be updated) are collected in p ∈ R^{n_p}, where n_p denotes the number of updating parameters. Note that, in this work, physically interpretable updating parameters directly available from the EoMs, e.g., spring and damping constants (e.g., representing mechanical connections), are used. Although mechanical systems are considered in this paper, the presented methodology is, in fact, applicable to general nonlinear dynamical systems and is therefore suited for use in the DT field, where models are often multidisciplinary. Note that the introduced methodology is, in principle, applicable to any (traditional) dynamical model, i.e., models that are not necessarily used within the DT context (in which, among others, control, SHM, and visualization are also incorporated). The benefits of the proposed methodology are, however, especially relevant in the DT context, as discussed in Sect. 1.
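To make the state-space form in (1) concrete, the sketch below simulates a hypothetical single-DoF mass-spring-damper with a cubic (Duffing-type) stiffness. The dynamics, parameter values, and the fixed-step forward-Euler integrator are illustrative assumptions, not taken from this paper.

```python
import numpy as np

# Illustrative g and h for a single-DoF mass-spring-damper with cubic
# stiffness, written in the first-order form x_dot = g(x, u, p), y = h(x, u, p).
def g(x, u, p):
    m, d, k, k3 = p                      # physically interpretable parameters
    q, qdot = x
    qddot = (u - d * qdot - k * q - k3 * q**3) / m
    return np.array([qdot, qddot])

def h(x, u, p):
    return np.array([x[0]])              # measure the displacement DoF only

def simulate(p, u_of_t, x0, t):
    """Fixed-step forward-Euler integration of the reference model."""
    x = np.asarray(x0, dtype=float)
    y = []
    for i in range(len(t) - 1):
        dt = t[i + 1] - t[i]
        y.append(h(x, u_of_t(t[i]), p))
        x = x + dt * g(x, u_of_t(t[i]), p)
    y.append(h(x, u_of_t(t[-1]), p))
    return np.array(y)                   # shape (N, n_out)

t = np.linspace(0.0, 1.0, 201)
y = simulate(p=(1.0, 0.4, 20.0, 5.0), u_of_t=lambda t: np.sin(10 * t),
             x0=[0.0, 0.0], t=t)
```

In practice a higher-order integrator would be preferred; the point here is only the separation between the state equation g, the output equation h, and the updating parameters p.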
Values of updating parameters initially used in the reference model may be inaccurate and/or may be subject to change over time with respect to the measured physical system, leading to a DT that inaccurately represents its physical counterpart. Therefore, we define a unique, unknown set of true parameter values, p̄ ∈ R^{n_p}, that, when used to parameterize the reference model in (1), exactly replicates the physical system. Here, it is assumed that the structure of the reference model is correct, i.e., the functions g(·) and h(·) fully capture the dynamics of the physical system, and that discretization errors are negligible. These true dynamics, in which unbiased measurement noise w̄(t) ∈ R^{n_out} is present, are then given by

\dot{\bar{x}}(t) = g(\bar{x}(t), u(t), \bar{p}), \qquad \bar{y}(t) = h(\bar{x}(t), u(t), \bar{p}) + \bar{w}(t), \tag{2}

where the bars indicate that variables are related to the true/physical system.

Remark 1
Since, in practice, a correct model structure may prove difficult to achieve, ongoing research by the authors is concerned with (simultaneously) updating parameter values and model structure.
Since nonlinear models are considered, output features based on transient responses are required to compare the model and the physical system. Therefore, an experiment (for a proof of principle and/or a robustness study of the method, simulated experiments can also be used) is performed with known excitation signals u(t) and initial conditions x̄(t = 0) = x̄_0. Measuring the output of the physical system in (2) yields discrete time series data ȳ_i for the ith output signal ȳ_i(t):

\bar{y}_i = \left[\bar{y}_i(t_1), \ldots, \bar{y}_i(t_N)\right]^\top, \tag{3}

where N indicates the number of equidistant time points and i = 1, …, n_out. Simulating this experimental scenario using the reference model in (1), parameterized with an arbitrary set of parameter values p, yields time series data y_i(p), defined similarly to (3). If p ≠ p̄, there exists a mismatch between y_i(p) and ȳ_i, even if the measurement noise w̄ in (2) is negligible. In the following, inverse mapping models will be used to reduce this mismatch by accurately estimating p̄ in a computationally fast manner.
To improve performance of the updating procedure, i.e., to make it computationally more efficient and/or to improve the accuracy of the updating parameter estimates, a set of n_{ψ,i} ≤ N features ψ̄_i ∈ R^{n_{ψ,i}} is defined that captures important characteristics of the time series data:

\bar{\psi}_i = L(\bar{y}_i), \tag{4}

where L represents a feature extraction function. Using these features, the goal is to estimate, in real-time, a set of parameters p̂ that approximates p̄ by solving

\hat{p} = \underset{p}{\arg\min}\; E\!\left(\psi(p) - \bar{\psi}\right), \tag{5}

where, with some abuse of notation, ψ_i(p) := ψ_i(y_i(p)). Furthermore, E : R^{n_ψ} → R is a cost functional and ψ̄ contains the concatenated measured features of all n_out output signals:

\bar{\psi} = \left[\bar{\psi}_1^\top, \ldots, \bar{\psi}_{n_{\mathrm{out}}}^\top\right]^\top, \tag{6}

such that ψ̄ ∈ R^{n_ψ}, with n_ψ = Σ_{i=1}^{n_out} n_{ψ,i}. Furthermore, note that ψ(p) is the set of features obtained equivalently from a simulation of the reference model (1) parameterized with the parameter set p.
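To illustrate why solving (5) directly is expensive, the sketch below implements a squared-error cost functional and a naive grid search that must re-simulate the model for every candidate parameter value. The simulate-and-extract function `features_of` is a hypothetical stand-in; all names are illustrative.

```python
import numpy as np

# Squared-error instance of the cost functional E in (5).
def cost(psi_p, psi_meas):
    return float(np.sum((psi_p - psi_meas) ** 2))

def grid_search(features_of, psi_meas, candidates):
    # One full model simulation (feature evaluation) per candidate p:
    # this is what makes iterative/searching solvers too slow for real-time use.
    costs = [cost(features_of(p), psi_meas) for p in candidates]
    return candidates[int(np.argmin(costs))]

# Toy stand-in: "features" are [p, p**2]; the true parameter value is 0.7.
features_of = lambda p: np.array([p, p**2])
p_hat = grid_search(features_of, features_of(0.7), np.linspace(0.0, 1.0, 101))
```

The inverse mapping model introduced next avoids this repeated simulation by moving all model evaluations to an offline phase.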
Since the objective is to solve (5) in (near) real-time, conventional, iterative methodologies that require multiple model evaluations or simulations are usually computationally too inefficient for a DT application. Therefore, in the following section, an alternative method is presented to update parameter values in (near) real-time using an IMM, which simultaneously meets the other requirements defined in Sect. 1, i.e., updating interpretable parameter values and updating nonlinear dynamical models.

Inverse mapping model
To solve the minimization problem in (5) with low computational effort, inverse mapping models are employed, as proposed in [60]. In contrast to forward FP models, such as the reference model in (1), the input to an IMM is a set of (measured) output features and its output is the set of corresponding inferred (i.e., estimated) parameter values p̂. Mathematically, the IMM is represented by the function I : R^{n_ψ} → R^{n_p}:

\hat{p} = I(\bar{\psi}), \tag{7}

where it is emphasized that, in general, I(ψ̄) ≠ p̄ due to the influence of noise and the approximate nature of the IMM. To constitute the function I, a feed-forward artificial neural network (abbreviated as ANN and also known as the multilayer perceptron) is trained in an offline phase. ANNs are renowned for their universal function approximation capabilities [66]. Moreover, in the online phase, inferring parameter values using ANNs is computationally efficient, as is also discussed in [59], where ANNs are used as surrogates for FP models. Note that, in contrast to parameter updating methods such as PINNs or filter-based techniques (see Sect. 1), known physics are not explicitly used in the ANN. However, since the training data are generated using the FP model, the physics governing this model are implicitly incorporated in, and learned by, the ANN.

Remark 2
In the case that the model structure is incorrect, the IMM method will try to find suitable parameter values such that the measured features are mimicked as closely as possible (but inevitably with loss of accuracy) using the model parameterized with the updated parameter values, i.e., the updated model. Note that in this case, the identified parameter values do not fully represent their physical counterparts as intended in the reference model and should, therefore, be handled with care for, e.g., SHM.
Next, Sect. 2.2.1 gives a general introduction to feed-forward ANNs. Afterward, the offline and online phases of the IMPU method are discussed in Sects. 2.2.2 and 2.2.3, respectively.

Feed-forward ANN
As indicated by (7), a (feed-forward) ANN maps a vector containing features ψ̄ to a vector containing parameter values p̂. The ANN consists of multiple layers of neurons that are interconnected between neighboring layers, see Fig. 1. In the input layer, each entry of the feature vector ψ̄[j], j ∈ {1, …, n_ψ}, is assigned its own neuron. In the output layer, the parameter vector p̂ is used likewise. Between the in- and output layers, n_r hidden layers are located, with n_z^{(r)} neurons in hidden layer r, r ∈ {1, …, n_r}. Here, n_r and n_z^{(r)} should be chosen large enough to capture the complexity of the inverse mapping, but not so large as to cause, e.g., improper training and poor generalization. It is therefore advisable to optimize these values by employing hyperparameter tuning [69].
The output value of the jth neuron in (hidden) layer r is given by

z_j^{(r)} = \alpha\!\left( \left(w_j^{(r)}\right)^{\!\top} z^{(r-1)} + b_j^{(r)} \right), \tag{8}

where α(·) represents an activation function that allows nonlinearities to be introduced in the mapping. Furthermore, w_j^{(r)} ∈ R^{n_z^{(r−1)}} represents a vector containing the trainable weights between neuron j in layer r and all neurons in layer r − 1, and b_j^{(r)} represents a trainable bias that allows for an affine input to α(·). Here, r = 0 and r = n_r + 1 denote the input and output layer, respectively. Since a bias can also be regarded as a weight, from here on, the term 'weight' refers to both the weights w and the biases b. For a more detailed discussion of ANNs, the reader is referred to [66].
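A minimal numpy sketch of the layer computation in (8): each neuron applies an activation function to an affine combination of the previous layer's outputs. The tanh activation and the layer sizes below are illustrative assumptions.

```python
import numpy as np

# Forward pass through one (hidden) layer: stacking all neurons j of layer r,
# W holds one weight vector w_j per row and b stacks the biases b_j.
def layer_forward(z_prev, W, b, activation=np.tanh):
    return activation(W @ z_prev + b)

rng = np.random.default_rng(0)
z0 = rng.standard_normal(4)              # outputs of the previous layer
W = rng.standard_normal((8, 4))          # 8 neurons, each connected to 4 inputs
b = rng.standard_normal(8)
z1 = layer_forward(z0, W, b)             # outputs of the current hidden layer
```

Chaining such calls, with an identity activation on the output layer, yields the full mapping from ψ̄ to p̂.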

Offline phase: training of the ANN
In the offline phase, all weights in the ANN are trained such that the resulting ANN approximates the correct mapping and is consequently able to infer parameter values with high accuracy. To train the weights, supervised learning is employed for which n s simulated pairs (or samples) of parameter values and their corresponding (output) features are employed.
As shown schematically in Fig. 2, depicting the offline phase of the IMPU method, the first step is training/validation/testing data generation, where these data types are distinguished by three distinct color-coded lines. Note that, although the testing data are not used in the offline phase, in Sects. 4 and 5, these data are used to test the performance of the trained ANN. In Fig. 2, data generation (area in light green) is performed for each sample s separately. First, a set of updating parameter values p_s ∈ P ⊆ R^{n_p} is selected (based on, e.g., Latin hypercube (LHC) sampling [70]). Here, P denotes a predefined admissible set of updating parameter values in which the true updating parameter values are expected to lie throughout the functional life of the physical system. Then, for each sample s, the forward reference model, i.e., (1) with p = p_s, is used to perform a simulation of an experiment with predefined initial conditions x_0 and excitations u(t). This results in transient data y(p_s), from which a set of features ψ(p_s) is extracted (for which different feature types may be used) using (4). Note that the feature extraction is performed for each output signal individually, after which the features extracted from the output signals are concatenated using (6). In Fig. 2, four feature types are extracted (see Sect. 3 for their definitions), from which features are selected. Furthermore, it is remarked that we assume that the true parameter values lie in P. Therefore, provided that sufficient training samples are used, all (nonlinear) responses in P are captured in the training data (note that normalized response features still represent a nonlinear dependency of the response features on the parameters if it is present in the original data). Consequently, such responses encountered in the testing data/online will be recognizable for the trained ANN.
Furthermore, note that the lack of hats and bars for p_s and ψ(p_s) indicates that these variables correspond to training data. After data generation, to increase the robustness of the ANN [71], the inputs and outputs of the training data, i.e., ψ(p_s) and p_s for s ∈ {1, …, n_s}, are normalized such that, per entry in both ψ(p_s) and p_s, the minimum and maximum values across all n_s training samples equal 0 and 1, respectively. For more details about normalization, see Appendix A.
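The per-entry min-max normalization described above can be sketched as follows; the function names are illustrative, and the fitted bounds are the quantities that must be stored for reuse in the online phase (cf. Appendix A).

```python
import numpy as np

# Per-entry min-max normalization over all n_s training samples: each column
# (feature or parameter entry) is mapped so its min/max become 0 and 1.
def fit_normalizer(X):                    # X: (n_s, dim) training matrix
    return X.min(axis=0), X.max(axis=0)

def normalize(X, x_min, x_max):
    return (X - x_min) / (x_max - x_min)

def denormalize(Xn, x_min, x_max):
    return Xn * (x_max - x_min) + x_min
```

Online, `normalize` is applied to measured features with the *training* bounds, and `denormalize` maps the ANN output back to physical parameter values.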

Remark 3
To make the (training) data highly informative, the excitation signals u(t) employed to generate the output responses should have relevant frequency content for the system under study, have a proper magnitude (to guarantee a good signal-to-noise ratio), excite the relevant nonlinearities, and take the bounds of the physical system into consideration, see, e.g., [31].
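One common way to construct an excitation meeting the requirements of Remark 3 is a random-phase multisine, sketched below; the frequency grid and amplitude are illustrative assumptions, not prescriptions from this paper.

```python
import numpy as np

# Random-phase multisine: the chosen frequency lines set the relevant
# frequency content, the amplitude sets the magnitude, and random phases
# spread the energy over time (limiting the crest factor).
def multisine(t, freqs_hz, amplitude=1.0, seed=0):
    rng = np.random.default_rng(seed)
    phases = rng.uniform(0.0, 2 * np.pi, size=len(freqs_hz))
    u = sum(np.sin(2 * np.pi * f * t + ph) for f, ph in zip(freqs_hz, phases))
    return amplitude * u / len(freqs_hz)   # keep the peak magnitude bounded

t = np.linspace(0.0, 2.0, 2001)
u = multisine(t, freqs_hz=[1.0, 3.0, 7.0, 15.0], amplitude=2.0)
```

The frequency lines and amplitude would, in practice, be chosen from knowledge of the system's dynamics and actuator limits, as discussed in [31].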
Having generated the normalized training data, the weights of the ANN with a user-defined structure (i.e., hyperparameters such as activation functions and the numbers of layers and nodes) are tuned. First, the weights are initialized with random values. Subsequently, these weights are tuned during the iterative application of n_e epochs, where in one epoch all training data are used once. At the start of each epoch, all training data are randomly divided over n_b batches (subsets of the training data containing multiple training samples). Then, for each sample s in a batch, the following sequence of steps is performed:

1. The ANN (parameterized using the current values of the weights) is used to estimate the parameter values given the features of sample s, i.e., Ĭ_k(ψ(p_s)), where the breve indicates that the ANN weights are still being tuned.
2. The cost function value is calculated using p_s and its current estimate Ĭ_k(ψ(p_s)). In this paper, the squared error is used as the cost function:

\left\| p_s - \breve{I}_k\!\left(\psi(p_s)\right) \right\|^2. \tag{9}

Note that, in contrast to (5), (9) minimizes the difference between the estimated and known training parameter values instead of features, due to the inverse nature of the mapping.
3. Using backpropagation, the gradients of (9) with respect to all weights are calculated.
Using the gradients obtained in step 3 for all samples in the batch, the ANN weights are updated such that the mean squared error (MSE) in the parameter values over all samples in the batch is minimized. After this update, steps 1–3 are repeated for a new batch, until all batches have been processed. Then, the validation data are used to calculate the validation loss (the MSE in the parameter values). Here, the validation data are obtained in the same manner as the training data (with distinctly sampled parameter values), see Fig. 2, and are used to evaluate the performance of the ANN without being biased towards the training data. If the validation loss has not decreased for n_es consecutive epochs or all n_e epochs have been processed, training is stopped and the ANN with the lowest validation loss is saved; otherwise, a new epoch is started. This is referred to as patience-based early stopping and is employed to prevent overfitting [72]. For more information about backpropagation and ANN training in general, the reader is referred to [66].
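The training loop described above (mini-batches, MSE loss on the training parameter values, and patience-based early stopping on the validation loss) can be sketched compactly for a single hidden layer; all sizes, the learning rate, and the tanh activation are illustrative choices, not the paper's settings.

```python
import numpy as np

# Compact sketch of the offline training loop of the IMPU method.
def train(Psi, P, Psi_val, P_val, n_hidden=16, lr=0.05, n_e=500, n_b=4, n_es=20, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((n_hidden, Psi.shape[1])) * 0.5
    b1 = np.zeros(n_hidden)
    W2 = rng.standard_normal((P.shape[1], n_hidden)) * 0.5
    b2 = np.zeros(P.shape[1])

    def forward(X):
        Z = np.tanh(X @ W1.T + b1)        # hidden layer, cf. (8)
        return Z, Z @ W2.T + b2           # linear output layer

    best_loss, best_weights, stall = np.inf, None, 0
    for epoch in range(n_e):
        for batch in np.array_split(rng.permutation(len(Psi)), n_b):
            Xb, Yb = Psi[batch], P[batch]
            Z, Yhat = forward(Xb)
            dY = 2.0 * (Yhat - Yb) / len(batch)   # gradient of the batch MSE
            dZ = (dY @ W2) * (1.0 - Z ** 2)       # backpropagate through tanh
            W2 -= lr * dY.T @ Z
            b2 -= lr * dY.sum(axis=0)
            W1 -= lr * dZ.T @ Xb
            b1 -= lr * dZ.sum(axis=0)
        val_loss = float(np.mean((forward(Psi_val)[1] - P_val) ** 2))
        if val_loss < best_loss:                  # keep the best ANN so far
            best_loss, best_weights, stall = val_loss, (W1.copy(), b1.copy(), W2.copy(), b2.copy()), 0
        else:
            stall += 1
            if stall >= n_es:                     # patience-based early stop
                break
    return best_loss, best_weights
```

With a few dozen training samples of a smooth toy mapping, this loop typically reaches a small validation loss within a few hundred epochs; in practice an established ML framework would replace this hand-written backpropagation.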

Online phase
After the ANN has been trained offline with simulated data, leading to the trained inverse mapping in (7), in the online phase, this trained ANN is used to infer parameter values p̂ from measured data using (7), where the hat indicates that this is an inferred estimate. As shown in Fig. 3, this is achieved by feeding normalized output features ψ̄_nor to the ANN, see (7). Here, ψ̄_nor is obtained by normalizing (similarly to the normalization of the training data) the features extracted from a measurement of the physical system. Recall that an overbar is used to indicate that these features relate to the physical system. Since the ANN has been trained using normalized features and parameter values to increase robustness, normalized features and parameter values are again utilized in the online phase. To maintain consistency, normalization in the online phase is performed using the bounds ψ_min and ψ_max obtained during training, see Appendix A.
The inferred (or updated) normalized physical parameter values are then denormalized and substituted in the parameterized reference model (1), resulting in the updated model, such that the DT better mimics the true system. Furthermore, although the model constituting a DT should in general be computationally efficient (e.g., for control purposes), the IMPU method does not require online evaluation of the parameterized model (1), since in the online phase only measurements are used and the ANN is evaluated for the measured features to generate a parameter estimate. Defining parametric models with low online model evaluation times, i.e., real-time simulation, which can be achieved using, e.g., parametric model order reduction techniques, is therefore regarded as out of scope for this paper. Due to the use of transient data to accommodate nonlinear systems, in contrast to the (modal) characteristics of linear systems as previously used in [60,62,63], the measured experiment in the online phase should be identical to the experiments simulated in the offline phase. Therefore, the initial conditions and excitation signals used in both phases are required to be equal and known in advance. Consequently, this updating method is only suited for applications with known forcing, which is relevant, e.g., in high-performance motion stage systems with known reference trajectory inputs.

Remark 4
In a practical setting, a homing procedure should be used to achieve the desired initial conditions. In applications where, during operation of the physical system, a certain action needs to be performed repetitively, the corresponding initial conditions and excitation signals can be used to constitute the aforementioned experiment, enabling on-the-fly parameter updating.
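The online phase then reduces to a handful of cheap operations, sketched below; `ann` stands in for the trained inverse mapping of (7), `extract` for the feature extraction function, and all names are illustrative.

```python
import numpy as np

# End-to-end online inference: extract features from measured data, normalize
# them with the bounds stored during training, evaluate the trained ANN, and
# denormalize the inferred parameter values.
def infer_parameters(y_meas, extract, ann, psi_min, psi_max, p_min, p_max):
    psi = extract(y_meas)                                  # measured features
    psi_nor = (psi - psi_min) / (psi_max - psi_min)        # training bounds!
    p_nor = ann(psi_nor)                                   # fast ANN inference
    return p_nor * (p_max - p_min) + p_min                 # denormalized p_hat
```

Since no model simulation appears in this pipeline, its cost is dominated by a few small matrix-vector products inside `ann`, which is what enables real-time updating.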

Definition of transient-based output features
In this section, different definitions are given for the (not yet normalized) feature vector ψ_i(p), containing a set of n_ψ (a number chosen by the user) features that are used as inputs to the ANN. Here, these features are extracted from sampled transient data y_i(p) as obtained from a simulated (for a model parameterized with p) or measured experiment.
In other words, different definitions for the function L in (4) are introduced. Here, the aim is to define highly informative features, i.e., features that are sensitive to the variations in the output signals resulting from variations in the updating parameter values. We aim to do so by exploiting engineering knowledge, such that the accuracy and computational efficiency of the IMPU method are improved. For the remainder of this section, we use the notation for the offline simulated training data (without overbar). The discussed feature extraction methodologies are applied identically to the validation/test and online measured data.
The introduced feature types are divided into time- and frequency-domain features, which are discussed in Sects. 3.1 and 3.2, respectively.

Time-domain features
Here, feature types are introduced that directly use the sampled transient data y. In Figs. 2 and 3, these feature types are also indicated in the two left 'extract features' blocks in the feature extraction part (marked in yellow).

Time samples
Time Samples (TS) features simply refer to the case in which a selection of n_TS entries of the sampled output signal y is used as features, with v_TS ∈ N^{n_TS} denoting a vector containing the indices of the selected entries in y. In this work, v_TS is chosen such that the feature vector consists of time samples measured at equidistant moments in time. Note that, as the initial conditions are identical for all simulations and experiments, the measurements at initial time t = 0 are not included in the feature vector. Using TS features, the ANN is provided with the raw time series data as directly obtained from the measurements. Therefore, without exploiting engineering knowledge, the ANN has to implicitly extract 'hidden features of the underlying system' from the time series data in its hidden layer(s). In contrast, we will later introduce feature extraction methods that exploit engineering knowledge to aid the ANN by directly providing more informative data about the system. Specifically, we aspire to extract (smaller) sets of features with a higher information density, thereby improving the accuracy and computational efficiency of both the offline training of the ANN and the online inference of parameter estimates using the ANN.
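TS feature extraction thus reduces to indexing the sampled signal. A minimal sketch, assuming equidistant sample selection that skips the (identical) initial sample; the helper name `ts_features` is hypothetical:

```python
import numpy as np

def ts_features(y, n_ts):
    """Select n_ts equidistant time samples from the sampled output y
    as TS features, skipping the (identical) initial sample at t = 0."""
    N = len(y)
    # equidistant (0-based) indices in 1..N-1, excluding index 0
    v_ts = np.linspace(1, N - 1, n_ts).round().astype(int)
    return y[v_ts]
```

For example, with N = 256 samples and n_ts = 34, this yields the sparse TS variant used later in the case study.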

Time extrema
For Time extrema (TE) features, we use the magnitude at, and the temporal location of, a selection of n_TE local extrema in the time-domain signal. Here, t = [t_1, . . . , t_N] consists of all time instances at which the measured signal is sampled, and v_TE ∈ N^{n_TE} contains the (ascending) indices of the first n_TE extrema that are identified using some extrema-finding algorithm. Note here that local extrema, i.e., local minima and local maxima, are defined as the extrema near the peaks and/or troughs of the signals, such that, despite the influence of noise, only one extremum is found per peak/trough. In practice, this can be achieved by using various options of the findpeaks algorithm in Matlab, such as the minimum peak prominence (a user-specified minimum change in signal value required on either side of the peak before the signal attains a value beyond the peak itself [73]). As the measured signals may not have the same number of extrema for samples obtained for distinct sets of parameter values, n_TE should be equal to or smaller than the lowest number of extrema encountered in the output signal across all training samples. An advantage of TE features is that, especially if the transient signal is similar to a free response, the heights and temporal locations of extrema in the transient signal are related to dynamic characteristics such as the amplitude decay of eigenmodes and the eigenfrequencies, respectively. For example, the heights of consecutive peaks in an impulse response (which decrease in magnitude) are indicative of the amount of damping in a system. Consequently, only a few of these features can already contain a lot of information about the transient response that is closely related to physical parameters such as mass, damping, and stiffness parameters.
Therefore, compared to, e.g., TS features, the ANN only needs to learn a relatively simple IMM, potentially boosting its accuracy and simultaneously lowering training and inference time due to the smaller set of features. A disadvantage, however, is that the temporal location of an extremum is relatively sensitive to noise, potentially causing the true location of the extremum to be identified at one of the neighboring entries in t. Due to the discrete nature of t, this may cause relatively large variations in the resulting feature values.
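The extrema-finding step can be sketched with SciPy's `find_peaks` (the Python counterpart of Matlab's `findpeaks`), using a prominence threshold to suppress noise-induced extrema; the helper name `te_features` is hypothetical:

```python
import numpy as np
from scipy.signal import find_peaks

def te_features(y, t, n_te, mpp):
    """Extract TE features: magnitudes at, and temporal locations of,
    the first n_te local extrema whose prominence exceeds mpp
    (counterpart of Matlab findpeaks' 'MinPeakProminence' option)."""
    peaks, _ = find_peaks(y, prominence=mpp)      # local maxima
    troughs, _ = find_peaks(-y, prominence=mpp)   # local minima
    v_te = np.sort(np.concatenate([peaks, troughs]))[:n_te]
    return np.concatenate([y[v_te], t[v_te]])
```

For a damped free response, the extremum magnitudes returned by this sketch decay with the modal damping, illustrating why few TE features can be highly informative.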

Frequency-domain features
Alternatively, features can be extracted from the frequency-domain data Y ∈ C^{N/2+1}, where N is an even integer, such that we obtain N/2 − 1 complex numbers and 2 real numbers (for f = 0 and the Nyquist frequency, i.e., half the sample frequency). For this, as indicated in Figs. 2 and 3, the measured/simulated time-domain data y are first transformed to the frequency domain, where t_k denotes time step k (with k = 1, . . . , N), f_j denotes frequency bin j (with j = 1, . . . , N/2 + 1), and F represents the Discrete Fourier Transform (DFT); in practice, the Fast Fourier Transform (FFT) algorithm is usually used.
Since the discrete frequency-domain data are represented by a set of complex numbers, there are two options to pass these data to the ANN (which requires real-valued data): 1) using the Magnitude and Phase information (MP), see (13), or 2) using the Real and Imaginary components of the frequency-domain data (RI), see (14). Here, |·|, ∠(·), Re(·), and Im(·) represent the magnitude, phase, real part, and imaginary part of a complex number, respectively. Since the phase and imaginary part at f_1 = 0 and at the folding frequency (half of the sample frequency) of real-valued time signals are always zero, these numbers do not convey any information and are therefore excluded from the feature vector.
Although the discrete frequency-domain signal contains, by definition, exactly the same information as the discrete time-domain signal, the relevant information may be more condensed in the frequency-domain signal. For example, frequencies at which the measured signal has only small magnitudes are likely less informative about the system's dynamics and response than frequencies with high magnitudes. Indices of the frequency bins that are deemed informative, e.g., located near eigenfrequencies of the linearized system or as encountered in exploratory experiments, are therefore collected in the index set v_FT ∈ N^{n_FT}. Only using the features belonging to these more relevant frequency bins, referred to as Frequency Bins of Interest (FBoIs), leads to a more compact set of features, raising potential for improved performance of the ANN. For example, at bins corresponding to high frequencies, the signals may be dominated by noise. Excluding these bins from v_FT can improve the inverse mapping learned by the ANN. In this paper, FT-based features that are limited to the information in some FBoIs are indicated by the suffix (FBoI).
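A sketch of the frequency-domain feature extraction, including the optional restriction to FBoIs. For brevity, this sketch does not prune the uninformative zero-valued phase/imaginary entries at f = 0 and the folding frequency; the helper name `freq_features` is hypothetical:

```python
import numpy as np

def freq_features(y, fs, f_max=None, kind="RI"):
    """Frequency-domain features from a real sampled signal y.
    kind="RI" returns real/imaginary parts, kind="MP" magnitude/phase.
    If f_max (Hz) is given, only the Frequency Bins of Interest (FBoI)
    up to f_max are retained."""
    Y = np.fft.rfft(y)                       # N/2 + 1 complex bins
    f = np.fft.rfftfreq(len(y), d=1.0 / fs)  # bin frequencies in Hz
    if f_max is not None:                    # restrict to FBoI
        keep = f <= f_max
        Y = Y[keep]
    if kind == "MP":
        return np.concatenate([np.abs(Y), np.angle(Y)])
    return np.concatenate([Y.real, Y.imag])
```

With, e.g., a 4 Hz FBoI cutoff (as in the case study below), the RI (FBoI) feature vector is far smaller than the full-spectrum variant while retaining the bins near the eigenfrequencies.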

Illustrative case study on double beam system
In this section, the inverse mapping parameter updating method and the various feature types are applied to a case study of a nonlinear multibody rigid double beam (RDB) system. Here, we employ simulated experiments, where noise is added to the response signals. First, the system is introduced. The translational and rotational joints have stiffness and damping constants k_1 and d_1, and k_2 and d_2, respectively. The rest length/rotation of springs k_1 and k_2 are l_{k1} and θ_{k2}, respectively. Furthermore, the system is subject to gravity, with gravitational acceleration constant g. For reference, the EoMs for this system are presented in Appendix B.1.
In the following case study, the values of parameters d_1, d_2, k_1, and k_2 of the model/DT are assumed to be uncertain and, therefore, will be updated, i.e., p = [d_1, d_2, k_1, k_2]^T. Here, we assume that the reference model is parameterized with an initial guess p_c for these updating parameters. Additionally, for this case study, it is assumed that the true parameter values may vary, by equal amounts in both the negative and positive direction, around this initial guess. Consequently, the upper and lower bounds of the updating parameter space P, tabulated in Table 1, are chosen such that p_c lies in the center of P. The values for all other parameters remain constant and are given in Table 2.
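Training samples are later drawn from this four-dimensional parameter space using Latin hypercube sampling. A sketch with SciPy's quasi-Monte Carlo module, where the bound values below are placeholders, not the actual values of Table 1:

```python
import numpy as np
from scipy.stats import qmc

# Placeholder bounds for p = [d1, d2, k1, k2]; the actual bounds of the
# updating parameter space P are listed in Table 1 of the paper.
l_bounds = np.array([0.5, 0.05, 500.0, 50.0])
u_bounds = np.array([1.5, 0.15, 1500.0, 150.0])

# Latin hypercube sampling of 1000 parameter sets from P
sampler = qmc.LatinHypercube(d=4, seed=0)
samples = qmc.scale(sampler.random(n=1000), l_bounds, u_bounds)
```

Each row of `samples` parameterizes one simulated experiment for the offline training data generation.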
Beams 1 and 2 are excited, in open loop, by force u_1(t) and torque u_2(t), respectively. Here, impulse-like excitations are used since these yield free vibration responses that vary significantly with the parameter values, such that information with respect to the static equilibrium of the system is also obtained. In future work, the excitation signals may be optimized to increase the sensitivity of the response with respect to parameter changes, see, e.g., [74,75]. Furthermore, as initial conditions, the static equilibrium of the system is used, which is parameterized with updating parameter values in the center of the parameter space P (referred to as p_c), i.e., y(t = 0) = [0.05, −0.1898]^T and ẏ(t = 0) = [0, 0]^T.
Using numerical integration, the system is simulated for a time interval of 2 s, in which N = 256 equidistant samples of y_1(t_k) and y_2(t_k), k ∈ {1, . . . , N}, are obtained as outputs. Here, the length of the simulated experiment is chosen long enough such that sufficient relevant dynamics is captured, but not too long, to prevent excessive data generation times. Furthermore, N is chosen as a power of 2 to make the Fast Fourier Transform (FFT) algorithm as efficient as possible. Note that, in practice, the simulated signal may require down-sampling such that it matches the measurement sampling rate. To mimic the (simulated) measurements, zero-mean Gaussian noise is added to these outputs with standard deviations σ_y1 = 2 × 10^−4 m (approximately 0.5% of the maximum magnitude of y_1) and σ_y2 = 2 × 10^−2 rad (approximately 2% of the maximum magnitude of y_2). In practice, these noise properties should be chosen corresponding to the (expected) noise observed in measurements performed on the physical system. Note that, as explained in [76], the addition of noise to training data improves the robustness of the ANN. Examples of possible responses using the above experiment design are provided in Sect. 4.1.2.
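The noise injection used to mimic measurements can be sketched as follows, with the standard deviations stated above:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def add_measurement_noise(y, sigma):
    """Mimic a measurement by adding zero-mean Gaussian noise with
    standard deviation sigma to a simulated output signal y."""
    return y + rng.normal(0.0, sigma, size=y.shape)

# e.g., sigma = 2e-4 m for y1 and 2e-2 rad for y2, as stated above
```

The same routine is applied both when generating noisy training data (to improve ANN robustness) and when mimicking online measurements for testing.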
To provide the reader with some insight into the dynamic characteristics of the system, properties of the linearized system for five different combinations of updating parameter values, also referred to as parameter cases and tabulated in Table 3, are presented. Note that parameter case 1 refers to p_c and that the values of cases 2, 3, 4, and 5 are located at the boundaries of P, as defined in Table 1, such that the 'most extreme' behavior of the system is presented here. Each model, parameterized using one of the parameter cases, is linearized around its static equilibrium point. The resulting eigenvalues (with positive imaginary part) λ_i, the (damped) eigenfrequencies f_{i,eig}, and the corresponding modal damping coefficients ξ_i, with i = 1, 2 indicating the first and second eigenmodes, are listed in Table 3. More information about the linearized systems, e.g., eigenmodes and Frequency Response Functions (FRFs), is provided in Appendix C.

Output features for exploratory simulations
In this section, exploratory simulation results are shown for the five updating parameter cases, as defined in Table 3. These simulations are performed according to the simulated experiment design (excitation signals and initial conditions) introduced in Sect. 4.1.1. Furthermore, features are extracted, conforming to the theory in Sect. 3, and plotted for additional insight.
Some settings used to extract the features are as follows. For the time samples features, two variants are used: 1) dense, with n_ψ = N = 256, and 2) sparse, with n_ψ = 34. For the time extrema features, the minimum peak prominence mpp(y) (as used in the findpeaks Matlab function [73]) is defined in terms of the sampled output signal y. As a result, a peak is only defined if the value of y between that peak and the closest peak with a higher magnitude drops by at least mpp [73]. Note that the definition of mpp(y) should be tailored to the encountered responses using engineering insight. Due to the different dominating frequencies of both output signals, the numbers of peaks in output signals y_1(t) and y_2(t) are n_TE,1 = 1 and n_TE,2 = 5, respectively. Furthermore, the FBoIs are defined as the frequency bins within the frequency interval from 0 Hz to 4 Hz, such that the (damped) eigenfrequencies of the linearized models, as given in Table 3, lie within this range.

Artificial neural network
As mentioned in Sect. 2.2, the IMM is constituted by a (trained) feed-forward ANN, see Fig. 1. In this case study, a 4-layer fully connected feed-forward ANN, i.e., n_r = 2, is used, of which the number of neurons and activation function per layer are listed in Table 4. Here, the Rectified Linear Unit (ReLU) activation function is used, defined as α(a) = max(0, a). In the output layer, linear activation functions (α(a) = a) are used to enable inferring parameter values outside the training parameter space P, which is, if the bounds of P are used for normalization, impossible using, e.g., sigmoid activation functions.
Note that a single multi-output ANN is used, as this is typically more efficient than having a single-output ANN per updating parameter [77]. The ANN is trained using a collection of n_s = 1000 training samples, of which the corresponding updating parameter values are sampled from P using a Latin hypercube. Optimization of the weights and biases of all neurons is done using the Keras Adam algorithm [78], where the mean squared error is minimized over n_e = 200 epochs in 20 batches of n_b = 50 samples. Additionally, patience-based early stopping is included, meaning that training is stopped if the validation loss, calculated for 1000 validation samples, has not decreased in n_es = 40 successive epochs. Here, the validation samples are randomly selected from P using a uniform probability distribution. Note that the ANN hyperparameters, i.e., the ANN structure and training settings, used here have been chosen empirically. Since ANN hyperparameter tuning is not trivial, multiple ANNs with different combinations of empirically chosen hyperparameters have been trained and evaluated on the test data. The ANN presented here was chosen due to its favorable combination of parameter estimation accuracy and complexity (number of neurons and layers, and training time). This being said, considering the large hyperparameter space, these hyperparameters are not expected to be optimal. Therefore, optimization of the ANN hyperparameters is part of current research by the authors.
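The paper's implementation uses Keras; for illustration, a self-contained NumPy sketch of the forward pass of such a fully connected feed-forward ANN, with ReLU hidden layers and a linear output layer (the layer widths and weights here are placeholders, not those of Table 4):

```python
import numpy as np

def relu(a):
    """Rectified Linear Unit: alpha(a) = max(0, a)."""
    return np.maximum(0.0, a)

def ann_forward(psi, weights, biases):
    """Forward pass of a fully connected feed-forward ANN: ReLU on all
    hidden layers, linear activation on the output layer (so inferred
    parameter values can also lie outside the normalized range)."""
    h = psi
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return h @ weights[-1] + biases[-1]   # linear output layer
```

The linear output layer is what allows the network to extrapolate, i.e., to return normalized parameter estimates below 0 or above 1, unlike a sigmoid output.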
Using a single core 2.60 GHz CPU, the offline generation of the training data, i.e., simulating the reference model for all n s = 1000 training samples, takes

Inferred parameter accuracy
To analyze the accuracy of the inferred parameter values p̂, the reference model is simulated for n̄_s = 1000 test samples. These test samples are obtained similarly to the training and validation samples, see Fig. 2. Note that, since measurements are mimicked for testing, we employ the overbar notation to indicate (the number of) test samples. To ensure an unbiased evaluation of the ANN performance, the test samples should be distinct from the training and validation samples. Therefore, for each test sample s, the model is parameterized with a set of true parameters p̄_s that is randomly selected from P, again based on a uniform probability distribution. To mimic actual measurements, noise with the same properties as used to generate the training data is added to the simulated outputs. The features extracted from these simulated measurements are denoted as ψ̄_s and fed into the ANN, see (7), to obtain p̂_s. The accuracy of the inferred parameter values of the test samples is evaluated based on the relative error ε_s = (p̂_s − p̄_s) ⊘ p̄_s, where ⊘ denotes the element-wise division operator. Before comparing results for different output features, we first focus on the case in which only the RI (FBoI) features are used as inputs to the ANN. Figure 9 displays the distribution of these relative errors (in %) per updating parameter in a histogram. Indeed, the ANN is able to infer parameter values from the provided features. The accuracy of the estimates varies per parameter. It is remarked that, despite a relatively larger contribution of noise in y_2 compared to y_1 (as mentioned in Sect. 4.1.1), the parameters most closely related to this signal, i.e., k_2 and d_2, are estimated with slightly higher accuracy than k_1 and d_1.
Here, please note that, as explained in Appendix C, the responses of the system are largely uncoupled. The observation that k_2 and d_2 are estimated more accurately is most likely a result of the relatively low number of oscillations in signal y_1 compared to y_2, as seen in Figs. 5 and 8. Nevertheless, even for d_1, the worst encountered relative error is only approximately 10%, whereas the majority lies below 5%. Therefore, it can be concluded that the parameter values are inferred with good accuracy. Additionally, in Fig. 10, the (absolute) relative error for each test sample is plotted (in %) in the subspace of P spanned by d_1 and d_2. Note that the values of k_1 and k_2 vary randomly, within P, for each plotted sample. This figure shows that the accuracy at or near the edges of the d_1-d_2 subspace is slightly worse than in the middle. Specifically, for d_1 and d_2, there are relatively many high relative errors at the left and bottom edges, respectively. This is explained by the fact that, in these regions, there are fewer training samples (and none beyond the bounds of the training space) that allow the ANN to properly learn the mapping between ψ and p in this part of the parameter space. To alleviate this phenomenon, the training space may be extended beyond the parameter values that are expected to be encountered on the physical system.
Having analyzed the results for the RI (FBoI) features only, in the following, different output feature types are compared. For ease of comparison, the performance of the IMPU method (for any feature type) is summarized using three relative error metrics: the bias of the relative error, μ_ε = (1/n̄_s) Σ_s ε_s, the (unbiased sample [79]) standard deviation, σ_ε = sqrt( (1/(n̄_s − 1)) Σ_s (ε_s − μ_ε) ⊗ (ε_s − μ_ε) ), and the mean absolute relative error, μ_|ε| = (1/n̄_s) Σ_s |ε_s|, where ⊗ denotes the element-wise multiplication operator.
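These three metrics can be computed per updating parameter as sketched below (in %, over a matrix of test samples); the helper name `error_metrics` is hypothetical:

```python
import numpy as np

def error_metrics(p_hat, p_true):
    """Relative-error metrics over test samples (rows: samples,
    columns: updating parameters): bias, unbiased sample standard
    deviation, and mean absolute relative error, all in %."""
    eps = (p_hat - p_true) / p_true * 100.0   # element-wise relative error
    mu = eps.mean(axis=0)                     # bias of the relative error
    sigma = eps.std(axis=0, ddof=1)           # unbiased sample std. dev.
    mu_abs = np.abs(eps).mean(axis=0)         # mean absolute rel. error
    return mu, sigma, mu_abs
```

Note that the bias can be near zero even when individual estimates are off, which is why the standard deviation and mean absolute error are reported alongside it.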
In Table 5, these metrics are listed (in %) per parameter for the output feature types introduced in Sect. 3.
Firstly, it is observed that, in terms of σ_ε and μ_|ε|, the dense and sparse time samples feature vectors perform roughly equally well for the stiffness parameters. However, the sparse variant performs worse for the damping parameters. This might be attributed to the fact that the dense TS features are more likely to contain the extremum values of the transient signal, which provide valuable information about damping, see Fig. 5. Secondly, the results obtained using time extrema features are comparatively inaccurate for d_1 and k_1, which can be ascribed to the low number of features and the relatively large influence of noise. As there are more TE features for signal y_2(t), d_2 and k_2 are estimated with reasonable accuracy despite the low number of features, owing to the smart choice of features.
Thirdly, for the frequency-domain-based features, omitting the (uninformative and noisy) features outside the FBoIs significantly improves performance with respect to the frequency-domain-based features where all frequency bins are used, as was hypothesized in Sect. 3.2. Fourthly, MP features are less accurate than RI features, both when using features in all frequency bins and when restricting them to the FBoIs. As this difference holds primarily for d_1 and k_1, to explain this observation, we focus our attention on the first Fourier-transformed output signal Y_1 (recall the largely uncoupled nature of this particular system, see Appendix C). This difference in performance may be explained by the fact that some of the (normalized) MP features do not deviate significantly for distinct parameter value combinations. To illustrate this, in Figs. 11 and 12, the MP (FBoI) and RI (FBoI) features are normalized between 0 and 1 such that the maximum and minimum feature values across the five parameter cases defined in Table 3 equal one and zero, respectively. In Fig. 11, it is observed that the normalized features originating from Y_1, i.e., part of the (normalized) MP features, are relatively similar for parameter combinations 2 and 4, and for combinations 3 and 5. Therefore, the ANN has difficulty distinguishing between each of these similar parameter cases; e.g., the normalized features for parameter combinations 2 and 4 are both approximately 1. In contrast, this is only the case to a lesser extent for the normalized features originating from Re(Y_1) and Im(Y_1) in Fig. 12.
Consequently, distinguishing between these parameter cases when using MP features is mostly done based on the |Y_1| features, whereas for the RI features, both Re(Y_1) and Im(Y_1) can distinguish between these cases. Finally, comparing the TS features with the RI (FBoI) features, we see that the RI (FBoI) features yield better performance than the sparse set of TS features (with an identical number of features). Alternatively, the dense TS features yield comparable performance at the cost of 15 times as many features. Note, however, that the RI (FBoI) features still perform slightly better. This, again, leads to the conclusion that the individual RI (FBoI) features have relatively high informativeness.
It should be remarked that the conclusions drawn here hold for a specific system and (simulated) experiment design. Different systems or designs, e.g., systems with higher modal density, (highly) coupled DoFs, or insufficient actuation, will influence the accuracy of the individual feature types. For example, time extrema features are expected to be less effective for estimating damping properties related to higher-order (or less dominant) modes. Therefore, automated feature selection techniques have high potential to select the most informative features for any system and experiment design [80].
Nevertheless, some general conclusions are drawn. Firstly, by using transient-based output response features as input to an IMM, physically interpretable parameters of nonlinear dynamical models can be updated online with little computational effort while achieving acceptable levels of accuracy. Furthermore, it is shown that, at least for mechanical systems, defining features using engineering knowledge can increase performance of the ANN. For example, only using frequency-domain-based features in the frequency ranges that are relevant for the dynamics of the system can result in higher accuracy of the inferred updating parameter values.

Updated model accuracy
As shown in the previous section, the IMPU method is able to infer parameter values with mean absolute relative errors of approximately 2% or lower. However, one of the main purposes of parameter updating is to obtain a more accurate model, such that the Digital Twin accurately represents the true physical system. Therefore, the accuracy of the updated model, i.e., the reference model parametrized with the inferred updating parameter values, is briefly discussed here.
For this purpose, a single test sample is selected, of which the true parameter values p̄ are listed in Table 6. Simulating a (noise-injected) measurement for the system parameterized with these true parameter values yields the measured response shown in Fig. 13. For this simulation, we have used the same (simulated) experiment settings as discussed in Sect. 4.1.1. Extracting RI (FBoI) features from this signal and using the computationally cheap IMPU method to infer 'updated' parameter values p̂ yields the values listed in Table 6. Simulating the updated DT, i.e., the reference model parameterized with these updated parameter values, yields the dashed orange line in Fig. 13. Here, we see that, for both DoFs, there is high agreement between the measured and updated responses. Furthermore, the response of the reference model (with the initial guess p = p_c for the updating parameter values) is plotted (dashed purple line), clearly demonstrating the benefit of updating the model parameters.
Due to the retained model structure, the updated model is also accurate for (simulated) experiments with different initial conditions and excitation signals (as may be expected). For example, Fig. 14 shows the response of, among others, the updated model when using harmonic excitation signals, i.e., u_1(t_k) = 0.15 cos(ω_1(p_c) t_k) and u_2(t_k) = 0.01 cos(ω_2(p_c) t_k), and deviating initial conditions, i.e., y(t_1 = 0) = [0.055, 0.5]^T and ẏ(t_1 = 0) = [−0.01, 2]^T. Here, ω_1(p_c) and ω_2(p_c) represent the damped angular eigenfrequencies of the linearized (reference) model parameterized with p = p_c. This shows that a digital twin that is updated using some updating experiment can be employed to simulate operating scenarios that have not been used for the updating procedure.

Generalization capabilities
Since the IMM is learned using training data, it is important to obtain insight into how accurate the IMM is for data it has not seen before. Note that the analysis in Sect. 4.2.1, which is similar to the analysis in [18], gives hardly any insight into the generalization capabilities of the ANN. Therefore, in this section, generalization is analyzed by 1) varying the number of training samples, and 2) excluding parts of the training samples.

Number of training samples
The impact of the number of training samples, n_s, on the accuracy of the inferred parameters is analyzed, such that insight into the generalization capabilities of the discussed output features is obtained. Here, the training and testing spaces are equivalent and as defined in Table 1. The mean absolute relative error for each updating parameter is plotted versus n_s in Fig. 15, for the different output feature types. In general, for n_s < 100, the accuracy of the parameter estimates is observed to be relatively low, showing that such low numbers of training samples do not suffice to properly learn the inverse relation between output features and parameter values. It should be noted, however, that the updating parameter space is, in this case, four-dimensional. Consequently, this small number of training samples does not sufficiently cover P and, hence, performance is poor. Overall, it seems that, for almost all analyzed values of n_s, the RI (FBoI) features perform best or approximately as well as the next best alternative(s). Furthermore, when using all frequency bins for the MP and RI features, significantly more training samples are required to achieve the same accuracy.

Excluded training zones
In practice, the true parameters of the DT may lie (far) outside the expected parameter space, e.g., when the system is (suddenly) badly damaged. In this case, the IMPU method should recognize this such that the digital twin can be updated for the benefit of SHM.

Remark 5
The IMPU method can theoretically detect both gradual and sudden degradation/changes to the true parameter values. In case of gradual degradation/changes, SHM using the updated DT can be employed to plan timely maintenance. However, sudden degradation/changes might already damage the system before this change has been detected by the (inverse mapping) parameter updating method. This will, however, be a problem for any model updating strategy.
Therefore, in this section, generalization is analyzed by deliberately omitting parts of the training space and testing how accurate test samples are in distinct 'testzones.' In the following, changes are only made to the subspace of P spanned by d_1 and d_2. Specifically, for both d_1 and d_2, the test subspace is extended on both sides of the training space by 50% of its width. Furthermore, for both damping parameters, a band, centered in the d_1-d_2 subspace, with a width of 20% of the training subspace width, is excluded from the training data. In Fig. 16, the absolute relative error (in %) of each test sample, |ε| (see (18) for the definition of ε), is plotted in the d_1-d_2 subspace of P per updating parameter, as obtained by using the RI (FBoI) output features with n_s = 645; red plus signs (+) indicate the locations of training data samples, the testzones are indicated by color-coded rectangles, and all absolute relative errors larger than 10% are colored orange. All test samples are allocated to six testzones, where all retained training samples are located in testzone 1. A higher testzone number indicates a decreasing number and proximity of adjacent boxes with training data (or zone 1 boxes). For example, each box in zone 2 shares two edges with adjacent zone 1 boxes, whereas for zone 6, only one of the corners of each box coincides with the corner of a zone 1 box.
To acquire the training data, the LHC-sampled training data used in Sect. 4 are reused, from which the samples located in testzones 2 and 3 are excluded. Consequently, as we started with 1000 training samples, n_s = 645 training samples remain. To analyze the effect of these changes in the training and testing parameter subspaces, we first focus on the case with n_s = 645 obtained using RI (FBoI) features. The absolute relative error per parameter is plotted for each test sample in the d_1-d_2 subspace in Fig. 16. In the upper left plot of Fig. 16, we observe that, for low and high values of d_1 (near the edges of the test subspace), the absolute relative error of d_1 increases. This is simply explained by the fact that the true values for d_1 are far away from the training data, forcing the ANN to extrapolate with respect to d_1. Similar observations are made for d_2. Please also note the similarity of this finding with the observation that test samples near the edge of the training space yield relatively poor results, as discussed in Sect. 4.2.1. In contrast, the accuracy of k_1 and k_2 is hardly affected by changes in the training and test d_1-d_2 subspace (apart from some test samples in zone 6), as is also seen in Fig. 17. This is explained by the fact that the alterations in the training and testing data are restricted to the d_1-d_2 subspace. In Table 7, the mean absolute relative errors μ_|ε| for the test samples in testzone 1 are listed (as calculated using (21), but summed over the test samples in testzone 1 only), for the different feature types. Since the density of training samples per test sample is roughly equivalent, we can compare these metrics to those found in Table 5. Indeed, proportionate errors are found: due to the lower total number of training samples, μ_|ε| in Table 7 is, however, typically somewhat higher.
In Fig. 17, the mean absolute relative errors μ_|ε| for the different testzones (as calculated using (21), but summed over the test samples per evaluated testzone), for both the RI and RI (FBoI) features, are plotted as a function of n_s. Focusing on μ_|ε| for d_1 and d_2, we observe that, for most values of n_s, testzone 6 is least accurate, followed, in order, by zones 5, 4, 1, 2, and 3, with testzone 3 being the most accurate. Although this result might initially be counterintuitive (one would expect testzone 1 to be the most accurate), this observation is, again, closely connected to an aforementioned observation in Sect. 4.2.1: the accuracy within a zone is affected by the relative number of test samples that lie close to the edges of the training data. For (approximately) n_s > 2500, this order of accuracy no longer holds, since this phenomenon is then alleviated by the overall large number of training samples. Furthermore, as is to be expected, Fig. 17 shows the trend that μ_|ε| decreases with n_s for all testzones. Consequently, as each application requires a different level of accuracy, the number of training samples may be used to tune the accuracy of the IMPU method in an a posteriori manner. Note that, although μ_|ε| seems to approach 0 for large n_s, the influence of measurement noise and the inexact nature of the trained IMM will always cause some error in the inferred parameter values.
Additionally, comparing the results obtained using the RI and RI (FBoI) features, we see that, again, the RI (FBoI) features outperform the full-frequency-content RI features, where the latter require relatively many training samples to yield comparable accuracy. Note that, for, e.g., n_s = 645, the RI (FBoI) features are better at extrapolating outside the outer ranges of the training space, i.e., in zones 4, 5, and 6, than the RI features are at interpolating, i.e., in zones 1, 2, and 3. Furthermore, the qualitative behavior of the results, i.e., high errors for low n_s that decrease for increasing values of n_s, is similar for RI and RI (FBoI) features. To conclude, in general, if sufficient training data are used, even test samples in zones outside the training subspace can be approximated relatively accurately when using RI (FBoI) features. For example, the mean absolute relative errors calculated for testzones 1, 3, and 6 are listed for d_1 and d_2 in Tables 8 and 9, respectively. Here, we observe that the values of μ_|ε| in testzone 6 are, with some exceptions for the TE and MP features, approximately two times higher than the values of μ_|ε| in testzones 1 and 3. Given the fact that only few training samples lie somewhat close to testzone 6, this is an acceptable result. Due to the already high relative errors for the TE, MP, and RI features in testzones 1 and 3, their relative errors in testzone 6 are, however, regarded as too high. Nevertheless, as estimates worsen when the true parameter values lie farther from the training space, it is advised to treat parameter estimates outside the training space with caution. Incorrect parameter values that are regarded as true lead to an inaccurate DT, which may lead to faulty decisions. For this reason, explicitly quantifying the uncertainty in the inferred parameter estimates is part of ongoing research by the authors.
Since the ANN does extrapolate parameter values beyond the training range, one can, however, often recognize from the inferred values that the true system lies outside the training space.
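A simple such check can be sketched as follows: flag any inferred parameter whose value lies outside the per-parameter training range, signalling that the estimate should be treated with caution. The bounds and values below are illustrative only:

```python
import numpy as np

def outside_training_space(p_est, p_train_min, p_train_max):
    # boolean flag per parameter: True if the inferred value lies
    # outside the range spanned by the training samples
    p_est = np.asarray(p_est, float)
    return (p_est < p_train_min) | (p_est > p_train_max)

# example: training space spans [0.8, 1.2] for both parameters
p_min, p_max = np.array([0.8, 0.8]), np.array([1.2, 1.2])
flags = outside_training_space([1.3, 1.0], p_min, p_max)
```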

Case study on roll plane model
In this section, the IMPU method is applied to a different use case. The analyzed model is a four-DoF Roll Plane Model (RPM) of a vehicle with nonlinear suspension driving over a speed bump with constant velocity, as considered in [37], see Fig. 18. The EoMs for this model are given in Appendix B.2, and its parameter values are listed in Table 10. The two parameters considered for updating are k_t1 and k_t2 (representing the stiffnesses of the tires), for both of which P ranges from 80 to 120% of their nominal value of 96319.76 N/m. As outputs, the relative displacement and relative velocity between the roll bar and both tires, i.e., y_1 = x_1 − x_3, y_2 = x_2 − x_4, y_3 = ẋ_1 − ẋ_3, and y_4 = ẋ_2 − ẋ_4, are measured every 0.3 s for a duration of 3 s. To simulate a real measurement, zero-mean Gaussian noise with a standard deviation of 1% of the value of y_i(t_k) is added to y_i(t_k). As inputs, prescribed displacements of the road surface profile, u_1(t) and u_2(t), are used, see Appendix B.2. Note that all these settings are equivalent to those in [37].
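The simulated measurement setup can be sketched as follows; the clean response used here is a placeholder signal, not the actual RPM solution:

```python
import numpy as np

rng = np.random.default_rng(42)

# sample every 0.3 s for a duration of 3 s (11 sample instants)
t_k = np.arange(0.0, 3.0 + 1e-9, 0.3)

# placeholder clean output y_i(t_k); in the paper this comes from
# simulating the RPM over the speed bump
y_clean = np.sin(2 * np.pi * t_k)

# zero-mean Gaussian noise with std. dev. 1% of each sample's magnitude
y_meas = y_clean + rng.normal(0.0, 0.01 * np.abs(y_clean))
```

Scaling the noise with the instantaneous output magnitude mimics a relative (percentage-level) sensor error rather than a fixed noise floor.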
An ANN (with the structure listed in Table 11) is trained for various numbers of training samples and 1000 validation samples. As features, TS, MP, and RI features are used; the resulting mean absolute relative errors are plotted in Fig. 19, where we see that a larger n_s leads to a smaller μ_|ε|. Furthermore, the different feature types show comparable results. Note that, in this case, there is no point in defining FBoIs due to the low number of time samples and the high frequency density of the Fourier-transformed output responses. Additionally, as [37] used an extended Kalman filter (EKF) to update the parameter values, a comparison is made between the accuracy of the EKF and the IMPU method. In [37], the accuracy of only a single parameter estimate is evaluated (with true parameter values p_ref = [k_t1,ref, k_t2,ref]^T = [100800, 88855]^T N/m), of which the absolute relative error is plotted in Fig. 19. To compare, individual IMPU-based estimates of p_ref are indicated in Fig. 19 as well. Here, we observe that, already for a relatively low number of training samples (order 100), the IMPU method is competitive with, and can even outperform (for sufficiently high n_s), the EKF in terms of accuracy. Most likely, this is caused by the fact that the IMPU method simultaneously processes multiple data points (features), yielding a complete overview of all dynamics present in the system, and, simultaneously, (partly) filters out the effects of (measurement) noise. Also, it should be noted that estimating these parameter values takes only 1.5 ms using the IMPU method. Although EKF computation times are not specified in [37], the computation time of the EKF scales significantly with the complexity of the system, in contrast to the inference time of the IMPU method. It is therefore stressed again that the main advantages of the IMPU method over filter-based parameter updating methods (such as EKFs) in the DT context (i.e., online computational efficiency, a nonobligatory initial guess, and applicability to simulation models) are of a qualitative nature, see Sect. 1.

Conclusions and future work
To minimize the mismatch between a physical system and its digital twin (DT), which is represented by a model, this paper proposes to update physically interpretable parameter values of nonlinear dynamical models in real time using an inverse mapping parameter updating (IMPU) approach. This approach consists of an offline and an online phase. In the offline phase, an artificial neural network (ANN) is trained using training samples consisting of output response features, as obtained from a simulated user-defined experiment (with known initial conditions and excitations), and corresponding (known) parameter values. The resulting trained ANN then constitutes a direct inverse mapping between response features and parameter values. In the online phase, this experiment is performed on the real system. Then, the trained ANN is used to map the measured transient-based response features to inferred parameter estimates. These estimates are then used to update the DT. Although data generation and training can take significant time in the offline phase, computation times in the online phase are very small, allowing for (near) real-time parameter updating. For this purpose, novel transient-based feature types have been introduced and evaluated on a (simulated) 2-DoF dynamical multibody system. Engineering knowledge about the system is used to define informative output features, resulting in improved accuracy and computational efficiency (the latter both in the learning and in the inference phase) of the IMPU approach. Additionally, generalization capabilities are shown and improved by properly choosing output features such that relatively few training samples already achieve satisfactory accuracy, or such that parameter values outside the training space are estimated with reduced but still acceptable accuracy.
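The offline/online split described above can be sketched compactly as follows; for brevity, a linear least-squares model stands in for the trained ANN, and the "experiment" is a toy forward model (both are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)

def forward_features(p):
    # toy simulated experiment: maps parameters p to response features
    p = np.atleast_2d(p)
    return np.column_stack([p[:, 0] + p[:, 1], p[:, 0] - 2 * p[:, 1]])

# --- offline phase: simulate training data, fit the inverse mapping ---
p_train = rng.uniform(0.8, 1.2, size=(200, 2))
psi_train = forward_features(p_train)
A = np.column_stack([psi_train, np.ones(len(psi_train))])
W, *_ = np.linalg.lstsq(A, p_train, rcond=None)

# --- online phase: map measured features to parameter estimates ---
def infer(psi_measured):
    psi_measured = np.atleast_2d(psi_measured)
    A = np.column_stack([psi_measured, np.ones(len(psi_measured))])
    return A @ W

p_true = np.array([1.1, 0.9])
p_hat = infer(forward_features(p_true))[0]
```

The expensive work (data generation and fitting) happens offline; online inference is a single cheap evaluation of the fitted mapping, which is what enables (near) real-time updating.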
Furthermore, parameter updating using the IMPU method is compared to parameter updating using an extended Kalman filter (EKF), where it is observed that, in a quantitative sense, the IMPU method is competitive with, or even outperforms, the EKF.
As estimates of true parameter values that lie beyond the training range become less accurate, quantifying the uncertainty in the estimated parameter values is ongoing research. Additionally, the choice of ANN structure and training settings is far from trivial; therefore, systematically optimizing the ANN hyperparameters is ongoing research as well. Furthermore, as the IMPU method requires a highly accurate model structure, future work should focus on simultaneous (online) updating of model structure and parameter values. Finally, instead of discussing simulated use cases as done in the current paper, the authors intend to apply the introduced methodology to a physical system in the future.

Appendix A: Normalization

The normalization bounds for the parameters are computed per entry j over the n_s training samples:

p_min[j] = min(p_1[j], ..., p_{n_s}[j]), (A2)
p_max[j] = max(p_1[j], ..., p_{n_s}[j]). (A3)

To calculate the denormalized parameter values from the normalized parameter values as inferred in the online phase by the ANN, we use the normalization bounds calculated in the training phase:

p[j] = p̃[j] (p_max[j] − p_min[j]) + p_min[j]. (A4)

Equivalently, for the normalization of measured features in the online phase, these (measured) features are normalized using the normalization bounds ψ_min and ψ_max, as obtained using equivalent variants of (A2) and (A3) for the feature vectors:

ψ̃[j] = (ψ[j] − ψ_min[j]) / (ψ_max[j] − ψ_min[j]). (A5)
Here it is emphasized that ψ_min and ψ_max are thus obtained using training data only. Note that validation samples (features and parameter values) are normalized in an equivalent manner as the measured (or testing) samples, i.e., with normalization bounds calculated in the training phase.
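A minimal sketch of this min-max scheme: the bounds are fitted on the training data only and reused, unchanged, for normalizing validation/measured samples and for denormalizing inferred parameter values (the numbers below are illustrative):

```python
import numpy as np

def fit_bounds(x_train):
    # per-entry min/max computed over the training samples only
    return x_train.min(axis=0), x_train.max(axis=0)

def normalize(x, lo, hi):
    return (x - lo) / (hi - lo)

def denormalize(x_norm, lo, hi):
    return x_norm * (hi - lo) + lo

psi_train = np.array([[0.0, 10.0], [2.0, 30.0], [1.0, 20.0]])
psi_min, psi_max = fit_bounds(psi_train)

# a measured feature vector is normalized with the *training* bounds
psi_meas_norm = normalize(np.array([1.0, 25.0]), psi_min, psi_max)
```

Reusing the training bounds everywhere prevents information from the test or measurement data leaking into the learned mapping.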

B.2 Roll plane model
The EoMs of the RPM analyzed in Sect. 5 are of the form

M(p) ẍ + f_K(x, p) + f_C(ẋ, p) = f_U(t, p),

where the explicit dependency on t and p is omitted for brevity. Here, M is the mass matrix, f_K is the vector containing the (linear and nonlinear) spring forces, f_C is the vector with the (nonlinear) damper forces, and f_U is the vector with forces due to the prescribed displacements. For i = 1, 2, the nonlinear stiffness and damper terms are given by

F_{K,i}(q) = k_i q + k_{i,3} q^3,   F_{C,i}(q̇) = c_i q̇ + c_{i,2} tanh(10 q̇). (B5)

Furthermore, the prescribed displacements u_1(t) and u_2(t), representing a speed bump over which the vehicle drives with constant velocity, are shown in Fig. 20. Note that the FRF from u_2 to y_1 is equal to the FRF from u_1 to y_2.
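The nonlinear force terms in (B5) can be evaluated as sketched below; note that (B5) is reconstructed from a garbled source, and the coefficient values used here are placeholders, not the values from Table 10:

```python
import numpy as np

def spring_force(q, k, k3):
    # linear spring plus cubic hardening term: k*q + k3*q^3
    return k * q + k3 * q**3

def damper_force(qdot, c, c2):
    # linear damping plus a smooth, friction-like tanh term
    return c * qdot + c2 * np.tanh(10.0 * qdot)

# placeholder coefficients and states, for illustration only
f_k = spring_force(0.1, 1000.0, 5000.0)   # = 100 + 5
f_c = damper_force(0.2, 50.0, 2.0)
```

The tanh term gives a bounded, smooth approximation of dry-friction-like damping, which keeps the EoMs differentiable for time integration.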

Appendix C: Linearized system properties of RDB model
The frequency response functions (FRFs) of the RDB model are plotted in Fig. 21 for each parameter case, as defined in Table 3. From the FRFs, it is observed that, for this system, the DoFs are largely uncoupled, i.e., y_i is mostly influenced by u_i, with i indicating the DoF. This is also apparent from the eigenmodes φ_i of parameter case 1, where j denotes the imaginary unit.