Low-dimensional data-based surrogate model of a continuum-mechanical musculoskeletal system based on non-intrusive model order reduction

Over the last decades, computer modeling has evolved from a supporting tool for engineering prototype design to a ubiquitous instrument in non-traditional fields such as medical rehabilitation. This area comes with unique challenges, e.g. the complex modeling of soft tissue or the analysis of musculoskeletal systems. Conventional modeling approaches like the finite element (FE) method are computationally costly when dealing with such models, limiting their usability for real-time simulation or deployment on low-end hardware if the model at hand cannot be simplified without losing its expressiveness. Non-traditional approaches such as surrogate modeling using data-driven model order reduction are used to make complex high-fidelity models more widely available regardless. They often involve a dimensionality reduction step, in which the high-dimensional system state is transformed onto a low-dimensional subspace or manifold, and a regression approach to capture the reduced system behavior. While most publications focus on a single dimensionality reduction technique, such as principal component analysis (PCA) (linear) or autoencoders (nonlinear), we consider and compare PCA, kernel PCA, autoencoders, as well as variational autoencoders for the approximation of a continuum-mechanical system. In detail, we demonstrate the benefits of the surrogate modeling approach on a complex musculoskeletal system of a human upper arm with severe nonlinearities and physiological geometry. We consider both the model's deformation and the internal stress as the two main quantities of interest in an FE context. By doing so, we are able to create computationally low-cost surrogate models which capture the system behavior with high approximation quality and fast evaluations.


Introduction
With the growing availability of virtual and augmented reality (AR/VR) devices, interest in scientific applications for this type of hardware is also increasing across different research fields, e.g. immersive data and simulation analysis or computer-aided surgery [4,14]. Placing the user in an environment that allows for direct, intuitive interaction with data and models, having them properly represented in the same space as the user, or providing crucial information directly in a person's field of vision to aid them in the task they are performing, holds great potential for disciplines which are, so far, primarily experienced through regular screens. However, as comfort of the wearer, and thus weight and size of the device, are primary concerns for the manufacturers, the computational resources an AR/VR headset can provide are quite limited.
Musculoskeletal biomechanics simulations intended to run on resource-poor systems, or those with real-time performance requirements, usually rely on multibody systems using so-called lumped parameter models like Hill-type muscles for the muscle tissue [25]. Lumped parameter models capture the macroscopic behavior of the muscle tissue in an arrangement of discrete elastic, damping, and contractile elements. Modelling the skeleton as a multibody system, they can be used for forward simulation up to the scale of full-body models. While very performant and able to exhibit some key characteristics of muscles, they only offer a one-dimensional representation [44]. Realistic muscle geometries, like the three-dimensional distribution of stress and deformation or contact mechanics, cannot be addressed with this type of model. For these purposes, a continuum-mechanical material description in combination with the finite element (FE) method is well suited for dealing with the large deformations encountered in human soft tissues like skeletal muscles. These capabilities do, however, come at a considerable computational cost, making them undesirable for any type of time-sensitive simulation or for evaluation-heavy applications like inverse problems in musculoskeletal biomechanics [42].

Surrogate modelling
To reduce evaluation times, surrogate modelling techniques can be applied to capture the fidelity of the original FE model w.r.t. specific aspects, thus decreasing the overall computational cost per evaluation. A common concept for this purpose is the use of response surfaces, i.e. approximating the dependency of an output quantity of interest on the model's inputs by interpolating it based on a number of known states, using a basis function of choice [5]. One such method, which has already been applied successfully to continuum-mechanical data, is a sparse grid approach in combination with B-spline interpolation [48]. The name refers to building the support for the response surface as sparsely as possible, to maintain the method's usefulness for either particularly expensive models or models with high input dimensionality. Another technique involves a coarse model, with lower fidelity, to explore the parameter space it shares with the original, fine-grained or high-fidelity model, which is then only evaluated for states identified as useful or interesting using the surrogate [27]. Variants of this type of approach, which have been popularized with the more wide-spread use of machine learning over the last decade, attempt to bypass the full model entirely by learning the difference between the surrogate and the original model, which can be called discrepancy learning [12,22]. Intuitive examples are deep-learning-based smart upscaling techniques to increase visual fidelity, e.g. in computer graphics [50]. A similar approach does, however, also exist for simulations.

Data-based Model Order Reduction
In addition to classical machine learning, the field of model order reduction (MOR) provides a broad library of methods for the efficient approximation of high-dimensional systems. Besides classical linear techniques like modal or Krylov reduction and parametrized model reduction, as well as classical nonlinear techniques such as collocation-based methods or optimized global cubature methods [3,13], data-driven approaches have become a popular solution and close the gap to machine learning. Their benefits include their capability to handle black- and gray-box models and their ability to capture nonlinear behavior. Furthermore, the disadvantage of having to manipulate simulation code is avoided. Unfortunately, this generally requires a costly offline phase, and physical behavior can often no longer be guaranteed.
A concept which many of the data-driven MOR approaches share is the idea of transforming a high-dimensional state onto a low-dimensional space (reduction), where the relationship between input variables and a low-dimensional representation of the state is learned (regression). Many publications in this context focus on a specific choice of reduction and regression algorithms, but obviously there are endless possible combinations. To account for all of them and to understand this idea as a general and modular framework, we will refer to this type of method generally as low-rank regression (LRR). An example for this can be found in [24], where proper orthogonal decomposition (POD) is combined with a neural network to first extract a low-dimensional subspace from data and second predict the reduced dynamics of a parametrized partial differential equation. The authors refer to this specific approach as POD-NN.
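The two-step structure of such an LRR method can be illustrated with a minimal, hypothetical sketch: PCA serves as the reduction step and a Gaussian process as the regression step, applied to synthetic low-rank data (all names, dimensions, and the data itself are illustrative assumptions, not the model considered in this work):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical snapshot data: 50 states of dimension 1000, driven by a scalar parameter mu
rng = np.random.default_rng(0)
mu = rng.uniform(0.0, 1.0, size=(50, 1))
modes = rng.standard_normal((3, 1000))
Z = np.hstack([mu, mu**2, mu**3]) @ modes   # low-rank data by construction

# Reduction: map the high-dimensional states onto a low-dimensional subspace
pca = PCA(n_components=3).fit(Z)
Z_red = pca.transform(Z)

# Regression: learn the mapping mu -> reduced coordinates
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-6)
gp.fit(mu, Z_red)

# Surrogate evaluation: predict reduced coordinates, then reconstruct the full state
mu_new = np.array([[0.5]])
z_tilde = pca.inverse_transform(gp.predict(mu_new))
```

Note that the expensive full model is only needed offline to generate the snapshots; the online evaluation consists of one regression call and one back transformation.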
In the field of structural dynamics, crash simulation models in particular have been the object of previous LRR research [20,28,30,31]. In these publications, the quantity of interest has been the displacement of the considered models, and the results have proven that it is possible to approximate it well using linear reduction techniques like the POD or the CUR decomposition [20]. Other physical quantities such as stress, which proved more difficult to capture in our research, have been neglected so far or were only calculated in a post-processing step using standard FEM tools as in [16]. Consequently, with this work we apply the LRR framework to both displacement and stress in order to develop more comprehensive surrogate models. Furthermore, we show that this kind of surrogate model is well-applicable to human body models, which are rarely considered in data-based MOR of structural dynamical problems.
When linear reduction techniques reach their limits regarding approximation quality, nonlinear techniques can provide better dimensionality reduction capabilities. In [43], kernel proper orthogonal decomposition is used as a nonlinear extension of POD in combination with neural networks to approximate parametrized partial differential equations. Another popular nonlinear dimensionality reduction approach is the use of neural networks in the form of autoencoders (AE) [33]. They are composed of an encoder, which maps the high-dimensional data onto a low-dimensional representation, and a decoder, which is tasked with reconstructing the full data from this reduced representation. In [15], for example, deep convolutional autoencoders are used along with feed-forward neural networks as surrogate models in cardiac electrophysiology. Other applications of convolutional autoencoders in the context of MOR can be found in [18,34]. Convolutional neural networks are a popular choice for the task of reduction as they possess a lower number of trainable parameters compared to fully connected neural networks. Moreover, for structured, grid-like data, they are able to detect local patterns with filters. Unfortunately, they are not suited for finite element models, which in general are not spatially discretized on a grid. This can be remedied by using graph convolutions [11], which are compared to a fully connected neural network in the context of MOR in [19]. Graph convolutional autoencoders have been applied in the field of computer vision in combination with mesh sampling operations to reconstruct 3D faces [39] from a low-dimensional space. Moreover, many combinations of different reduction methods are possible: autoencoders which complement a linear subspace [46], a series of linear reduction methods to reduce the system to a certain degree with subsequent autoencoders to reduce it even further, see, for example, [10], etc.
In contrast to the aforementioned methods, which try to approximate the reduced system using a black-box model, other approaches aim to discover low-dimensional governing equations. In [9], an autoencoder is used to find a low-dimensional space in which the governing equations of the system are identified using sparse regression on a library of suitable function candidates. This approach can also be extended to discover hidden variables by adding time-delay embedding, see [6]. In the context of structural dynamics, this kind of approach was used to discover low-dimensional equations of a beam, which can then be used for continuation [10].

Scope and Overview
In this work, we use the LRR framework to approximate the behavior of a human arm, i.e., its motion in the form of node displacements and the resulting internal stress. Thereby, we compare multiple reduction techniques, namely the frequently used principal component analysis (PCA), its nonlinear extension kernel principal component analysis (KPCA), as well as fully-connected classical autoencoders (AE) and variational autoencoders (VAE) as further nonlinear alternatives. For this purpose, reference data is obtained from a high-fidelity FE model for different muscle activation levels. With this data, the reduction algorithms are fitted and used to create a low-dimensional representation of the data. The relation between muscle activation and reduced coordinates is approximated using a Gaussian process (GP) regression algorithm. For the approximation of the arm's full behavior, the GP's predictions are transformed back into the original space.
The theoretical background can be found in Section 2, the resulting algorithm and its implementation in Section 3, while the results for the arm model are presented and discussed in Section 4 followed by the conclusion in Section 5.
The contributions of our work can be summarized as follows:
1. we generate real-time capable, yet accurate holistic surrogate models for a complex human arm model based on a 5-dimensional input,
2. we apply the LRR framework to data with nonlinear structures such as stress,
3. we introduce KPCA as a nonlinear alternative to PCA in LRR,
4. we compare PCA, KPCA, AE, and VAE regarding their reconstruction and generalization qualities.

Methods
The underlying approximation method used in this paper relies on a coordinate transformation to find a low-dimensional description of the system and on a regression algorithm to learn the behavior of the system. Although we start by introducing the specific human upper-arm model, the used methodology generalizes well to other problems and is explained in general terms.

Model
Muscles are the driving organs, i.e. the actuators, of the human body. Together with soft tissues such as tendons, they are essential for the proper interaction and functioning of the human organism. For example, muscle activation has a great impact in car crashes, see [36]. It can stiffen extremities, reduce the load on certain parts of the body, and has a huge impact on the likelihood of injuries.
The considered human upper-arm model consists of the radius and ulna bones for the lower arm and the humerus for the upper arm, see Fig. 1. The elbow joint connecting them is modelled as a simple hinge joint. The model contains five muscles actuating this joint, two extensors and three flexors. Namely, these are the m. triceps brachii and the m. anconeus, as well as the m. biceps brachii, the m. brachialis, and the m. brachioradialis. The geometry of all these components is modelled after the Visible Human Male dataset [2]. In [42], a previous iteration of this upper limb model is described, and details regarding the constitutive and material modelling are pointed out. Since the model used here heavily relates to the aforementioned one, we will primarily describe parts of the model that differ from the previous version and those which are most relevant to this work. The upper head of the humerus bone, normally connected to the shoulder, is fixed in an upright position, making the elbow joint the only mechanical degree of freedom in the model.
Input to the model consists of a vector of activation parameters µ ∈ M ⊆ R^5. Each activation µ_j ∈ [0, 1] with j = 1, ..., 5 is associated with one muscle in the model, representing the percentage of the maximum possible active stress currently output by the muscle. The overall stress in the j-th muscle can be written as a sum of isotropic and anisotropic contributions σ_iso and σ_aniso, of which the latter can be subdivided further into active and passive stresses σ_active and σ_passive. The variables γ_M, γ_ST ∈ [0, 1] are scale factors describing the tissue, with γ_M, γ_ST > 0 being muscle, γ_M = 0, 0 < γ_ST ≤ 1 tendon, and γ_M = γ_ST = 0 soft tissues, e.g. fat. In the definition of the active stress, σ_max is the maximum possible active stress for the j-th muscle, which occurs at the optimal fibre length λ_f^opt. More information on this, as well as on all other material parameters of the model, can be found in Table A1 in the appendix.
The steepness of the force-length relation of the muscle fibre is given by Γ_asc and ν_asc for the ascending arm of the relation, and Γ_desc and ν_desc for the descending arm. λ_f is the muscle fibre stretch in the current configuration. The structure tensor M = a_0 ⊗ a_0 is defined by the fibre orientation in the reference configuration a_0, with ⊗ being the dyadic product.
Furthermore, the passive anisotropic stress depends on the material parameters c_3 and c_4. On top of these anisotropic contributions, the muscles also exhibit some isotropic behavior, which is governed by the material parameters c_1 and c_2. The invariants I_1, I_2, I_3, I_4 can be derived from the right Cauchy-Green deformation tensor C. For the purposes of the approximation methods applied in this work, the model described above serves as the high-fidelity model and is mathematically defined as F : M → Z ⊆ R^N. Its internal dynamics are unknown since they are not accessible. The quantities of interest in this work, on the contrary, are accessible as simulation results. They include the motion of the arm in the form of its node-wise displacements q ∈ R^{N_disp}, where q_m = [q_{m,x}, q_{m,y}, q_{m,z}]^T represents the x-, y-, and z-displacement of the m-th node. The simulation results furthermore contain the element-wise stress load of the muscle tissue.
For the latter, we consider the equivalent stress values σ ∈ G ⊆ R^{N_stress} according to von Mises, as it offers the advantage of combining all stress components into a single value. The dimension of the displacements N_disp = 48212 · 3 = 144636 corresponds to the product of the number of nodes n_i = 48212 and the number of features per node n_f = 3, while the dimension of the von Mises stress simply equals the number of elements, N_stress = 169278. In the following explanations, we will not distinguish whether the displacements q or the stress σ is the quantity of interest, as it makes no difference for the underlying approach. Correspondingly, we introduce the variable z ∈ Z ⊆ R^N as a general state vector, which can describe an arbitrary physical quantity.
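The element-wise von Mises equivalent stress can be computed from the six independent Cauchy stress components with a few lines of code; the following sketch (function name and data layout are our own assumptions) illustrates the standard formula:

```python
import numpy as np

def von_mises(sig):
    """Von Mises equivalent stress from the six independent Cauchy stress
    components (sxx, syy, szz, sxy, syz, szx), one row per element."""
    sxx, syy, szz, sxy, syz, szx = sig.T
    return np.sqrt(0.5 * ((sxx - syy)**2 + (syy - szz)**2 + (szz - sxx)**2)
                   + 3.0 * (sxy**2 + syz**2 + szx**2))

# Uniaxial tension of 100 (stress units) yields an equivalent stress of 100,
# while a purely hydrostatic state yields 0
sig = np.array([[100.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                [50.0, 50.0, 50.0, 0.0, 0.0, 0.0]])
```

As the formula is invariant under hydrostatic pressure, the resulting scalar field highlights the shear-driven loading of the muscle tissue.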

Problem Definition
Generally speaking, we pursue the goal of creating an efficient yet accurate surrogate model F̃ for a computationally expensive high-fidelity model F.
A major problem encountered in surrogate modeling of sophisticated models is their sheer dimensionality. In the case of finite element models, this high dimensionality arises from the underlying modeling method and not from the necessity to describe the system with so many degrees of freedom. Fortunately, this often results in high-dimensional systems that can be expressed with only a few intrinsic coordinates, instead of thousands or millions, by a suitable coordinate transformation. Consequently, our surrogate modeling approach relies on two fundamental steps. The first step involves the identification of a suitable coordinate transformation Ψ : Z → Z_r mapping the system state z onto its reduced low-dimensional equivalent z̄ ∈ Z_r ⊆ R^r, where r ≪ N. A reverse coordinate transformation from reduced to full space Ψ† : Z_r → Z ensures physical interpretability of the results.
The second step involves the approximation of the system behavior in reduced space, i.e., of the reduced coordinates z̄. Therefore, we seek a function Φ : M → Z_r that approximates the relationship z̄ ≈ Φ(µ) between quantities from the parameter space M and the reduced system behavior. The abstract workflow can be seen in Fig. 2. Taking all mentioned considerations into account, the overall goal can be formulated in a mathematical sense, resulting in the optimization problem

min_{Ψ, Ψ†, Φ} L( z̃(µ), z(µ) )  with  z̃(µ) = Ψ†(Φ(µ)),  µ ∈ M.  (1)

In this context, the function L is some measure of approximation quality comparing the results from the surrogate z̃ with the reference results z. The optimization problem (1) is minimized over Ψ and Ψ† to find the best low-dimensional representation of the system and over Φ to find the best approximation of the reduced system coordinates. Furthermore, the parameters µ ∈ M are taken into account to guarantee a satisfying approximation in the complete considered parameter space M.
In the following, Ψ will be called the reduction mapping and Ψ† accordingly the reconstruction mapping. Furthermore, an approximation of a quantity is always indicated by a tilde as in z̃, whereas a reduced quantity is represented by a bar, e.g., z̄, as listed in Table 1, where the most important variables of this work are summarized. Parameter dependencies z(µ) are only mentioned where this helps the comprehensibility and neglected otherwise.

Dimensionality Reduction
For an optimal low-dimensional representation of the system states z̄ = Ψ(z), a coordinate transformation (reduction mapping) Ψ : Z → Z_r and its reverse transformation (reconstruction mapping) Ψ† : Z_r → Z are sought. Their function composition z̃ = Ψ† ∘ Ψ(z) yields the reconstructed state.
There exist numerous possibilities to derive the reduction and reconstruction mapping in the field of model order reduction [8,21]. In the light of data-based approaches, snapshot-based methods are a popular choice. They assemble κ high-fidelity simulation samples, so-called snapshots, in the snapshot matrix Z = [z_1 ... z_κ], where z_k = z(µ_k). The individual snapshots rely on the discrete and finite parameter set M_κ = {µ_1, ..., µ_κ} ⊆ M. As long as there are enough samples, it is supposed that the resulting subspace S_{M_κ} = span{Z} approximates the solution manifold S_M = {z(µ) | µ ∈ M}. This, in turn, ensures that if Ψ and Ψ† are found using S_{M_κ}, they can be applied to S_M as well.
A widespread methodology to find the reduced coordinates are linear reduced-basis methods, see [38]. They rely on the mathematical principle of projection. In the case of a Galerkin projection with an orthogonal projector P = V V^T, the state is reconstructed by z̃ = P z = V V^T z. In this context, the matrix V = [v_1 ... v_r] is known as the reduction matrix and consists of the reduced basis vectors {v_1, ..., v_r} ⊆ Z. That means the full state is approximated by a linear combination z ≈ Σ_{l=1}^{r} z̄^{(l)} v_l of the reduced basis vectors with the reduced states z̄ = [z̄^{(1)} ... z̄^{(r)}]^T as coefficients. Using such methods can result in a poor reconstruction or a relatively large number of required modes r due to the restriction of describing the state as a linear combination. Hence, besides principal component analysis as a representative of linear projection-based reduction methods, we present nonlinear alternatives in the form of kernel principal component analysis as well as classical and variational autoencoders.

Principal Component Analysis
Principal component analysis, which is also known as proper orthogonal decomposition [49], works with centered data Ẑ = [ẑ_1 ẑ_2 ... ẑ_κ]. That means the mean z_mean = (1/κ) Σ_{n=1}^{κ} z_n is subtracted sample-wise, ẑ = z − z_mean, from the data. The purpose of PCA is to find orthonormal basis vectors, also referred to as principal components {v_l}_{l=1}^{r}, for which the variance maintained under projection remains maximal, see [26]. Accordingly, the reduced basis vectors solve the optimization problem

max_{v_1, ..., v_r} Σ_{l=1}^{r} v_l^T S v_l  subject to  v_k^T v_n = δ_kn,  (3)

where S = (1/κ) Ẑ Ẑ^T represents the covariance matrix and δ_kn the Kronecker delta. One possibility to determine the basis vectors is to solve an eigenvalue problem of the covariance matrix, S v_l = λ_l v_l. The reduced basis vectors then result as the r most dominant eigenvectors, i.e., those with the largest associated eigenvalues λ_l. Usually, the eigenvectors are ordered in descending order of their eigenvalues so that λ_1 ≥ λ_2 ≥ ... ≥ λ_r > λ_{r+1}, where we suppose that the r-th eigenvalue is truly greater than its successor. Instead of solving an eigenvalue problem of the covariance matrix S, a singular value decomposition of the (centered) snapshot matrix yields a solution to (3) as well.
The first r (most important) basis vectors are assembled in the projection matrix V_PCA = [v_1 ... v_r], so that the reduction mapping results in the projection Ψ_PCA(z) = V_PCA^T z and the reconstruction mapping in the corresponding back projection Ψ†_PCA(z̄) = V_PCA z̄.
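As an illustrative sketch (with synthetic, exactly low-rank snapshot data as an assumption), the PCA reduction and reconstruction mappings can be realized via a singular value decomposition of the centered snapshot matrix:

```python
import numpy as np

# Hypothetical snapshot matrix: N = 300 degrees of freedom, kappa = 40 snapshots (columns)
rng = np.random.default_rng(0)
Z = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 40))  # rank-5 data

z_mean = Z.mean(axis=1, keepdims=True)
Z_hat = Z - z_mean                       # centered snapshots

# The left singular vectors are the eigenvectors of the covariance matrix,
# ordered by decreasing singular (and thus eigen-) values
U, s, _ = np.linalg.svd(Z_hat, full_matrices=False)
r = 5
V = U[:, :r]                             # projection matrix V_PCA

z = Z[:, [0]]                            # some full state
z_red = V.T @ (z - z_mean)               # reduction mapping
z_rec = V @ z_red + z_mean               # reconstruction (back projection)
```

Since the synthetic data is exactly rank 5, the back projection with r = 5 modes recovers the state up to machine precision; for the FE data considered here, the truncated modes determine the reconstruction error.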

Kernel Principal Component Analysis
Kernel principal component analysis was introduced in [45] as a nonlinear extension of PCA. A more recent overview can be found in [17]. The underlying idea is to transform the state onto a higher-dimensional feature space F ⊆ R^η where it is more likely to obtain linear separability.
The transformation is done using an arbitrary nonlinear map Θ : Z → F with a very large dimension η ≫ N. Consequently, the transformed solution manifold S_F = Θ(S_M) = {Θ(z(µ)) | µ ∈ M} ⊆ R^η should be better approximable by a linear subspace than the original manifold was. All that remains to be accomplished is to perform the PCA on the transformed snapshot matrix Θ(Z) = [Θ(z_1) ... Θ(z_κ)]. However, the computational costs of working in the high-dimensional feature space F can be avoided by using the so-called kernel trick. It enables working in the original rather than in the feature space by replacing scalar products Θ(z_i)^T Θ(z_j) with a suitable kernel function k(z_i, z_j) = Θ(z_i)^T Θ(z_j). As it is possible to formulate the PCA in such a way that all vectors Θ(z) only appear within scalar products, no computations in the feature space have to be performed. Applying the kernel function to all samples yields the kernel matrix K, whose entries are defined as K_ij = k(z_i, z_j). The kernel matrix must usually be centered, as a selected kernel function k(·, ·) does, in general, not result in a centered matrix.
In contrast to PCA, the principal components themselves are not calculated in KPCA, but only the projections of the states onto those components, z̄^{(l)} = Σ_{i=1}^{κ} a_{il} Θ(z_i)^T Θ(z), where Θ(z_i)^T Θ(z) simply corresponds to the kernel evaluation k(z_i, z). The coefficients a_{il} can be obtained by solving an eigenvector equation of the kernel matrix K. With the prescribed approach, the reduction mapping of the KPCA yields Ψ_kPCA(z) = [z̄^{(1)} ... z̄^{(r)}]^T. The reconstruction mapping Ψ†_kPCA(z̄) is usually obtained via kernel ridge regression [7], as the principal components themselves are never calculated explicitly.
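A corresponding sketch using scikit-learn (the data and all parameter values are illustrative assumptions) shows both the RBF-kernel reduction and the kernel-ridge-based reconstruction:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Hypothetical snapshots lying on a nonlinear (quadratic) manifold, one row per sample
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=(100, 1))
Z = np.hstack([t, t**2]) @ rng.standard_normal((2, 50))

# RBF-kernel PCA; fit_inverse_transform=True learns the reconstruction
# mapping via kernel ridge regression, since the principal components in
# feature space are never computed explicitly
kpca = KernelPCA(n_components=4, kernel="rbf",
                 fit_inverse_transform=True, alpha=1e-3)
Z_red = kpca.fit_transform(Z)           # reduction mapping
Z_rec = kpca.inverse_transform(Z_red)   # reconstruction mapping
```

The ridge regularization parameter `alpha` of the learned inverse map trades reconstruction accuracy against smoothness and, like the kernel width, has to be tuned for the data at hand.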

Autoencoder
An autoencoder (AE) is a special type of artificial neural network (NN), which aims to learn the identity function under the constraint that it possesses a bottleneck, see e.g. [34]. Neural networks in general try to approximate a relationship between an input variable x and the corresponding output variable y based on an available dataset D = {(x_n, y_n)}_{n=1}^{κ}. In the simplest case, their architecture is based on successive layers of neurons. Each neuron is represented as a nonlinear transformation of a linear combination of its inputs. For the j-th neuron of the l-th layer, this can be written as

u_j^{(l)} = h( Σ_{i=1}^{n_u^{(l−1)}} w_{ij} u_i^{(l−1)} + w_{0j} ),

where n_u^{(l−1)} describes the number of units in the preceding layer and w_{ij}, i = 0, ..., n_u^{(l−1)}, represent the adjustable weights with the bias w_{0j}. Moreover, h(·) is a (nonlinear) activation function. Assembling multiple neurons in one layer, the output of the l-th layer itself results in u^{(l)} = h(W^{(l)} u^{(l−1)}), where the nonlinear activation function h(·) is applied element-wise and the adjustable weights are collected in W^{(l)}. A basic neural network Ξ can accordingly be seen as a series of successive (nonlinear) transformations of linear combinations of input variables.
During training, the estimated outputs ỹ are predicted in the forward pass with fixed parameters W, whereupon the errors are propagated back to adjust the weights with respect to a certain loss function like the mean squared error

L(W) = (1/κ) Σ_{n=1}^{κ} ||y_n − ỹ_n||².  (5)

In the case of gradient descent optimization, the weights are updated in the direction of their negative gradient, see [23].

Fully-connected Autoencoder
Autoencoders have already been introduced as a nonlinear alternative to PCA in the early 1990s [33].
As previously mentioned, they aim to learn the identity mapping Ξ : Z → Z with z̃ = Ξ(z), but with a bottleneck in their layer size, as exemplified in Fig. 3a. This architecture ensures that the network learns a reduced representation of the input variables within the network. Consequently, for the considered problem, the autoencoder's input x = z as well as the output y = z correspond to the system state.
An AE consists of two subnetworks: the encoder Ψ_ae and the decoder Ψ†_ae. In the encoder Ψ_ae : Z → Z_r, the state dimension is decreased from the original system size N to r, i.e., z̄ = Ψ_ae(z). Analogous to equation (4), an encoder with n_l layers can be represented by the function composition Ψ_ae = u^{(n_l)} ∘ ... ∘ u^{(1)}. The decoder in turn is a mapping from the reduced to the original space, Ψ†_ae : Z_r → Z with z̃ = Ψ†_ae(z̄), which again is a series of successive transformations. Consequently, the autoencoder results from the composition of the encoder and decoder networks, Ξ = Ψ†_ae ∘ Ψ_ae. The optimal trainable network weights W*_ae = {W*_enc, W*_dec} with respect to the loss function (5) result from W*_ae = arg min_{W_ae} L(W_ae). One problem with classical autoencoders is that they can lack regularization of the reduced/latent space. Hence, points that are close to each other in the reduced space can result in very different reconstructions. Moreover, the reduced space is not easy to interpret and cannot be assigned to any concrete effects in the full space. To circumvent these problems, variational autoencoders [29,41] can be used.
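A minimal fully-connected autoencoder can be sketched with off-the-shelf tools; the following illustration (data, layer sizes, and all hyperparameters are our own assumptions) trains a network with a three-unit bottleneck on the identity task z ≈ Ξ(z):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical data with 3 intrinsic coordinates embedded in 30 dimensions
rng = np.random.default_rng(0)
latent = rng.uniform(-1.0, 1.0, size=(500, 3))
Z = latent @ rng.standard_normal((3, 30))      # rank-3 snapshot data, rows = samples

# Encoder (30 -> 16 -> 3) and decoder (3 -> 16 -> 30) realized as one
# fully-connected network with a bottleneck, trained to reproduce its input
ae = MLPRegressor(hidden_layer_sizes=(16, 3, 16), activation="tanh",
                  max_iter=5000, random_state=0)
ae.fit(Z, Z)                                   # learn the identity with a bottleneck
Z_rec = ae.predict(Z)
```

In practice, the activations of the three-unit bottleneck layer would serve as reduced coordinates z̄; dedicated deep-learning frameworks expose the encoder and decoder halves explicitly, which this simple sketch does not.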

Variational Autoencoder
In contrast to regular autoencoders, the input of variational autoencoders (VAEs) is not transformed into a single reduced coordinate but into a distribution over the reduced space. Often, the prior probability distribution (prior) over the reduced/latent variables p(z̄) is assumed to be Gaussian, N(0, I), with zero mean and unit variance. Hence, the encoder network returns a mean value ν(z) and a variance δ(z) describing a normal distribution N(ν, δ), see Fig. 3b. Correspondingly, its output [ν^T δ^T]^T ∈ R^{2r} is twice the size r of the reduced coordinates.
To reconstruct the original data, a sample is drawn from the reduced/latent distribution z̄ ∼ N(ν, δ) and then transformed by the decoder network. As a result, the variables in the reduced space have continuity, i.e., close points in the reduced space also give similar reconstructions, and each point within the distribution should give a meaningful reconstruction. Consequently, VAEs are often used in generative tasks, where data samples are created based on random samples from the reduced/latent distribution. Nevertheless, they also provide a great tool for dimensionality reduction, as demonstrated in [35].
In detail, VAEs deal with the problem of variational inference, i.e., they aim to infer the posterior probability distribution (posterior) p(z̄|z) over a hidden (in our case reduced/latent) variable z̄ given representative data z. This is achieved by maximizing the evidence lower bound (ELBO)

ELBO = E_{b(z̄|z)}[ log p(z|z̄) ] − β KL( b(z̄|z) || p(z̄) ),  (6)

where the first part serves as reconstruction loss. It ensures that the log-probability of z given z̄ drawn from the approximated posterior b(z̄|z) is maximized, i.e., that the input is reconstructed well from a sample in the reduced space. The second part includes the Kullback-Leibler divergence KL, which is a distance measure defined on probability distributions. This ensures that the reduced coordinates respect the assumed prior over the latent variables p(z̄), i.e., that the approximated posterior b(z̄|z) is pushed closer to the prior. With the parameter β, the KL divergence can be weighted depending on the requirements to focus either more or less on the reconstruction ability of the VAE. A large value of β leads to the so-called disentanglement of the latent variables, so that a single unit of the variable reacts sensitively to changes in individual system characteristics. Thereby, the impact of this unit of the variable on the reconstructions can be better interpreted.
In VAEs, the variational inference problem is solved by parameterizing the likelihood and the posterior with neural networks. Specifically, the likelihood function is parameterized by the decoder, z̃ = Ψ†_ae(z̄) ∼ p(z|z̄), reconstructing the full state z given a reduced state z̄, and the approximated posterior b(z̄|z) is represented by the encoder network, z̄ = Ψ_ae(z) ∼ b(z̄|z). During training of the model, the so-called reparameterization trick is used to enable backpropagation of the resulting loss through the network, since the process of sampling from a distribution defined by the encoder Ψ_ae would normally not be differentiable, see [29,41]. Hence, a noise term ζ ∼ N(0, I), which is not parameterized by the network, is introduced, whereby a sample can be drawn following z̄ = ν(z) + ζ δ(z).
By doing so, the ELBO function (6) can be optimized using gradient descent. During evaluation of the model, we neglect the predicted variance δ and only use the predicted mean ν as the output of the encoder.
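The reparameterization trick and the closed-form KL term for a diagonal Gaussian posterior against the standard-normal prior can be sketched as follows (the batch values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for a batch of two samples: mean nu(z) and
# standard deviation delta(z) of the reduced/latent distribution
nu = np.array([[0.5, -1.0], [2.0, 0.0]])
delta = np.array([[0.1, 0.2], [0.05, 0.3]])

# Reparameterization trick: sampling z_bar ~ N(nu, delta^2) is rewritten as a
# deterministic function of nu and delta plus parameter-free noise zeta, so
# that gradients can flow through nu and delta during backpropagation
zeta = rng.standard_normal(nu.shape)       # zeta ~ N(0, I)
z_bar = nu + zeta * delta                  # differentiable w.r.t. nu and delta

def kl_to_standard_normal(nu, delta):
    """Closed-form KL( N(nu, diag(delta^2)) || N(0, I) ), one value per sample."""
    return 0.5 * np.sum(nu**2 + delta**2 - 1.0 - np.log(delta**2), axis=-1)
```

The KL term vanishes exactly when the approximated posterior equals the prior and grows as mean or spread deviate from it, which is what pushes the latent coordinates towards the assumed N(0, I) structure.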
One major problem that arises in the context of fully-connected AEs is the very high number of trainable parameters, i.e., weights W, compared to the previously presented methods. This leads to an expensive training phase and is usually mitigated by using convolutional autoencoders as in [18,34]. Convolutional neural networks (CNNs) reduce the number of required model parameters by parameter sharing and generalize well by detecting local patterns in spatially distributed data. Nevertheless, they are of limited use for the reduction of finite element models, as they are especially suited for data that is discretized in a grid-like structure. FE models, on the contrary, usually possess an irregular spatial discretization, and the generalization of CNNs to such data is not trivial. Therefore, in this work, we exclusively use fully-connected autoencoders, which are still admissible given the system size of the human upper-arm model. It should be noted, however, that one possible solution for applying CNNs to FE models may be found in graph convolutions [11].

Regression
All presented dimensionality reduction techniques, PCA, KPCA, AE, and VAE, offer the possibility to identify suitable transformations to represent the system states z in a low-dimensional subspace and thus form the foundation for the further procedure. The reduction not only retains the most important system features but also enables highly efficient training of regression algorithms. Thereby, the search for a function Φ : M → Z_r approximating the relationship µ → z̃ from simulation parameters to the reduced system behavior is simplified.
As we only rely on data in this work, any regression algorithm can be used for this task. However, data generation is often an expensive process, especially in the case of complex FE models. Hence, whenever the offline phase is resource-limited, an algorithm that can cope with less data is preferable. Since the dataset on which this work is based contains relatively few samples, Gaussian process regression is the algorithm of choice. It is well-known to perform well even in data-poor regimes and was successfully applied in other publications [28,31] in the context of LRR.

Gaussian Process
The basic idea of Gaussian processes (GPs) is to limit the space of functions fitting the given data by first imposing a prior on the functions and then conditioning it on the available observations to receive the posterior probability distribution, see [40]. This approach relies on multivariate Gaussian distributions, which are defined by a mean ν and a covariance matrix Σ. Accordingly, a Gaussian process is completely defined by a mean function m(x) and a covariance or kernel function k(x_i, x_j) for some inputs x, and can be written as f(x) ∼ GP(m(x), k(x_i, x_j)), with K denoting the kernel matrix obtained by applying the kernel function element-wise to all inputs. The selected kernel function implies a distribution over functions, giving a higher probability to those satisfying certain properties, such as smoothness, to a greater extent. In order to update the prior with training data, let x be training points with observed results y and mean function m_1, and x* test points with mean function m_2 for which the output y* shall be predicted. Suppose y and y* are jointly Gaussian,

[y, y*]^T ∼ N([m_1(x), m_2(x*)]^T, [[K, K*], [K*^T, K**]]),

where K is the kernel matrix for the variables in the training input space, K* between the training and test data, and K** among the test variables themselves. Then, the posterior can be calculated following

m_p = m_2(x*) + K*^T K^{-1} (y − m_1(x)),   Σ_p = K** − K*^T K^{-1} K*.

For regression, the GP's mean function m_p serves as the predicted output. During training, the kernel function's hyperparameters are tuned to obtain the final regression algorithm Φ. For further information on GPs, please refer to [40].
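The fit-then-predict workflow just described can be sketched with scikit-learn's `GaussianProcessRegressor`; the kernel choice, toy "activations", and target function below are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(1)

# Toy stand-ins: 2D parameters mapped to a single reduced coordinate.
mu_train = rng.uniform(0.0, 1.0, size=(40, 2))
z_train = np.sin(mu_train @ np.array([2.0, -1.0]))

# The prior is encoded by the kernel; its hyperparameters are tuned in fit().
kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(mu_train, z_train)  # condition the prior on the observations

mu_test = rng.uniform(0.0, 1.0, size=(5, 2))
z_mean, z_std = gp.predict(mu_test, return_std=True)  # posterior mean and std
```

The posterior mean `z_mean` is what serves as the regression output, while `z_std` quantifies the remaining uncertainty of the prediction.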

Implementation
The presented framework consists of two successive steps. The first is a coordinate transformation to obtain a low-dimensional representation from high-dimensional data, and the second is the approximation of the system behavior in the reduced space. For the corresponding tasks, individual algorithms have been presented. However, the entire framework is structured generically, so that a wide variety of further suitable algorithms exists. Therefore, in the following, the variables Ψ and Ψ† are used for reduction and reconstruction in general.
For the implementation of PCA, KPCA, and GP, we rely on the software library scikit-learn [37], whereas the autoencoders are implemented using TensorFlow [1]. The high-fidelity simulations were conducted using the commercial FEM tool LS-DYNA, and the resulting data is provided to the interested reader in [32]. This dataset contains the model's muscle activations, displacements, as well as processed von Mises stress. The toolbox to generate the presented surrogate models, however, is still in progress and will be published at a later date.
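As a minimal illustration of the scikit-learn side of this toolchain, reduction and reconstruction with PCA could look as follows; the snapshot data and dimensions are purely illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
# Toy snapshots of intrinsic rank 3 in a 40-dimensional state space.
Z = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 40))

pca = PCA(n_components=3)
Z_red = pca.fit_transform(Z)          # reduction: full -> reduced space
Z_rec = pca.inverse_transform(Z_red)  # reconstruction: reduced -> full space
```

Because the toy data has exactly rank 3, three components suffice for a loss-free reconstruction; for real FE data the number of components trades reconstruction quality against dimensionality.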

Algorithm
The algorithm can be divided into an offline and an online phase. While data creation and surrogate modeling take place in the former, the evaluation of the surrogate for new, unseen parameters is part of the latter.
In the offline phase, which is summarized in Algorithm 1, a finite set of simulation parameters M_κ = {µ_1, ..., µ_κ} from the parameter domain M is created using some sampling strategy. Based on the selected parameters, the high-fidelity model is evaluated and its results are assembled for further processing. The following steps of the offline phase involve (i) the calculation of the coordinate transformation for dimensionality reduction and subsequently (ii) the fitting of the regression algorithm to approximate the reduced system behavior.
To solve the task of dimensionality reduction, the high-fidelity simulation results are assembled in the dataset D_red = {(z_n, z_n)}_{n=1}^κ, which contains the full states as in- and output and can be rewritten as the snapshot matrix (2). Based on this data, the reduction mapping Ψ and the reconstruction mapping Ψ† are fitted. Once these mappings are found, the regression dataset can be composed by reducing the system states z̃ = Ψ(z), resulting in D_regr = {(µ_n, z̃_n)}_{n=1}^κ. It is composed of the individual simulation parameters, i.e., the muscle activations, as inputs and the corresponding reduced system states obtained by the reduction algorithms as outputs. Subsequently, the regression algorithm Φ is trained using D_regr as data basis. The composition of the fitted reconstruction mapping Ψ† with the regression model Φ yields the complete surrogate model Ψ† ∘ Φ. It is used in the online phase to predict the system behavior, see Algorithm 2. In other words, the regression model is evaluated on new (unseen) simulation parameters to receive an approximation of the respective reduced states, followed by a transformation back into the high-dimensional physical space. For an overview of the entire workflow, including the offline as well as the online phase of the algorithm, please refer once again to Fig. 2.
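The offline/online split can be sketched end-to-end as follows; PCA via SVD stands in for a general reduction mapping and a plain linear least-squares fit stands in for the Gaussian process, with all data and dimensions being toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)

# --- offline phase (the real snapshots would come from the FE model) ---
mu = rng.uniform(size=(50, 3))                  # sampled parameters
A = rng.standard_normal((3, 100))
Z = mu @ A                                      # toy snapshot matrix, 50 states

Z_mean = Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Z - Z_mean, full_matrices=False)
V_r = Vt[:3].T                                  # reduced basis, r = 3

reduce_ = lambda z: (z - Z_mean) @ V_r          # reduction mapping
reconstruct = lambda zr: zr @ V_r.T + Z_mean    # reconstruction mapping

Z_red = reduce_(Z)                              # regression targets
W, *_ = np.linalg.lstsq(np.c_[mu, np.ones(50)], Z_red, rcond=None)
regress = lambda m: np.c_[m, np.ones(len(m))] @ W  # stand-in for the regressor

# --- online phase: reconstruct(regress(mu)) for unseen parameters ---
mu_new = rng.uniform(size=(4, 3))
Z_approx = reconstruct(regress(mu_new))
```

Because the toy map is linear, the composition recovers the full states exactly; for the nonlinear arm model, the quality of this composition is what the scores in the following section quantify.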

Error Quantities
It is crucial to consult not only the complete surrogate model but also its components to validate its overall approximation quality and gain insights into the individual sources of error. Hence, besides the overall approximation error, the errors induced by the reduction and the regression algorithm are also investigated.
For this purpose, three relative performance scores are introduced which allow a reliable assessment. In addition, absolute error measurements serve as physically interpretable quantities.

Relative Performance Scores
In this paper, different reduction methods and their impact on the approximation are explored. A well-suited quantity to consider their impact isolated from the other sources of error is the reconstruction score s_rec. It compares a state z with its reconstruction ẑ = Ψ† ∘ Ψ(z). This score, as well as the two following ones, reaches its optimum with a value of 1, whereas a value of 0 corresponds to no prediction at all. It is to be noted that s_rec can become negative as the reconstruction can be arbitrarily poor.
Similar to the isolated reconstruction score, the regression score s_regr validates the performance of the regression algorithm alone by comparing the prediction Φ(µ) with the reduced reference state z̃. This is the only performance measurement taking place in the reduced space, while all other measurements are calculated in the full physical space. The composition of both parts, i.e., the overall approximation quality, is measured by the so-called approximation score s_approx, where ẑ = Ψ†(Φ(µ)) is the overall approximation of the reference z. To visualize the individual scores, the connection between the compared quantities is shown in Fig. 2. While the presented scores can all be assessed for a single simulation, their mean values over all simulations, ŝ_rec/regr/approx, are used to compare several simulations at once.
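The paper's exact score formulas are given in its numbered equations; a hedged sketch using the coefficient-of-determination form, which shares the stated properties (optimum 1, value 0 for a mean-only prediction, negative for arbitrarily poor approximations), reads:

```python
import numpy as np

def score(z_true, z_hat):
    # R^2-style score: 1 for a perfect match, 0 for predicting the mean,
    # negative for an arbitrarily poor approximation.
    ss_res = np.sum((z_true - z_hat) ** 2)
    ss_tot = np.sum((z_true - np.mean(z_true)) ** 2)
    return 1.0 - ss_res / ss_tot

z = np.array([0.0, 1.0, 2.0, 3.0])
s_perfect = score(z, z)                       # 1.0
s_mean = score(z, np.full_like(z, z.mean()))  # 0.0
s_poor = score(z, -z)                         # negative
```

Applied with ẑ = Ψ† ∘ Ψ(z), Φ(µ), or Ψ†(Φ(µ)) as the second argument, the same routine would yield the reconstruction, regression, or approximation score, respectively.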

Absolute Error Measures
In addition to the relative scores, absolute error measures are consulted for the overall approximation. To this end, the sample-wise 2-norm of the error between the features of the m-th node/element z_m and their approximation ẑ_m is used. Furthermore, its mean and maximum values over all nodes/elements are used for comparisons. Note that in the case of displacements, where z_m = q_m corresponds to the coordinates of a node, the 2-norm e_dist(q_m, q̂_m) = e_2(q_m, q̂_m) represents the Euclidean distance between the m-th node q_m and its approximation q̂_m.
For the element-wise, scalar-valued stress samples, e_stress(σ_m, σ̂_m) = e_2(σ_m, σ̂_m) simply equals the absolute difference between the von Mises stress of the m-th element and its approximation σ̂_m.
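The node-distance measure and its mean/maximum aggregation can be sketched as follows; the two example nodes are illustrative:

```python
import numpy as np

def node_distances(Q, Q_hat):
    # Euclidean distance per node for coordinate arrays of shape (n_nodes, 3).
    return np.linalg.norm(Q - Q_hat, axis=1)

# Two illustrative nodes: one displaced by (3, 4, 0), one matched exactly.
Q = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
Q_hat = np.array([[3.0, 4.0, 0.0], [1.0, 1.0, 1.0]])

e = node_distances(Q, Q_hat)       # [5.0, 0.0]
e_mean, e_max = e.mean(), e.max()  # 2.5 and 5.0
```

For the scalar von Mises stress, the same 2-norm degenerates to the absolute difference per element, so `np.abs(sigma - sigma_hat)` plays the analogous role.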
As a last performance measure, the computational time required for one sample, ∆T, is used to account for the resource savings achieved by the surrogate models.

Surrogates for Displacement and Stress
The implementation of a surrogate model that captures the displacements of the model and one that captures the stress is very similar and follows the presented approach in both cases. Nevertheless, the two quantities, i.e., q and σ, live in such different spaces that each requires its own reduction algorithm. Hence, for each quantity an individual dataset must be assembled following (7). This yields D^disp_red = {(q_n, q_n)}_{n=1}^κ for the displacements and D^stress_red = {(σ_n, σ_n)}_{n=1}^κ for the von Mises stress. For the subsequent regression, a single algorithm could be used to predict values of both quantities. However, we use one regression algorithm per variable for the sake of clarity. This, in turn, requires individual datasets D^disp_regr = {(µ_n, q̃_n)}_{n=1}^κ and D^stress_regr = {(µ_n, σ̃_n)}_{n=1}^κ for the approximation of the reduced displacements q̃ and the reduced stress σ̃.

Results
For the creation of the high-fidelity, i.e., finite element, simulation results, 1053 muscle activations µ ∈ R^5 are equidistantly sampled on the parameter domain as simulation parameters. A value of 1 corresponds to full muscle activation, whereas a value of 0 corresponds to no muscle activation. The number of samples was chosen to be rather small to reduce the overall number of FE model evaluations, since calculating one activation state can take minutes on a high-performance computing system with 64 processors in parallel. Based on these muscle activations, the finite element simulation was evaluated, resulting in 1053 samples for displacement q and stress σ, which are used to build the training datasets following (7) to fit the models. Another 100 parameters µ are randomly sampled on the parameter domain using a uniform distribution. These serve as a test set. A particular challenge in surrogate modeling for this simulation lies in the relatively poor data basis, owed to the prohibitive computational cost of the high-fidelity model. The following results (with the exception of the high-fidelity simulations) were produced on an Apple M1 Max with a 10-core CPU, 24-core GPU, and 64 GB of RAM.
In our previous studies [30,31], linear reduction methods have proven to be suitable for the low-dimensional representation of displacements obtained by elastic multibody and FE models. A similar conclusion can be reached for the arm model by considering the singular values ς, whose squares correspond to the eigenvalues of the covariance matrix S of the centered snapshot matrix Ẑ. The contribution of a reduced basis vector of the PCA to the total reconstruction of the original quantity is reflected by the magnitude of the corresponding singular value. That means if only a few singular values contribute significantly, it is possible to represent a quantity with only a few reduced basis vectors without losing much information. If this is not the case, the reduction either requires numerous reduced basis vectors or introduces a remarkable error. This is closely related to the Kolmogorov n-width [47,34], which quantifies the optimal linear trial subspace by describing the largest distance between any point in the solution manifold S_M over all parameters and all n-dimensional subspaces Z_n ⊆ Z. Supposing that the given system has unique solutions for all parameter samples, the intrinsic dimension of the solution space is at most equal to the number of parameters, or the number of parameters plus one for dynamical systems.
Consequently, if a linear subspace were sufficient to represent a given system, i.e., if the system possessed a fast decaying Kolmogorov n-width, we would expect that using as many PCA modes as the intrinsic dimension leads to a satisfying low-dimensional representation of the system. For the given human arm model, which is parameterized by a five-dimensional parameter space M ⊆ R^5, this means that having the most dominant behavior within the first five modes would justify the use of a linear subspace. The course of the singular values ς is considered in Fig. 4 to investigate the suitability of a linear subspace. For a suitable presentation, the singular values are scaled by the sum of all singular values, i.e., ς̄(l) = ς(l) / Σ_{i=1}^κ ς(i). On the one hand, the first five PCA modes for the displacements indeed capture almost 90% of the behavior of all data. On the other hand, considering the results of the stress data, it becomes apparent that the singular values do not decrease as fast as is the case for the displacements, and single singular values are not as dominant. Accordingly, the first five capture only about 60%. This already indicates that the approximation of the stress is more challenging.
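The scaled singular values and the share captured by the first modes can be computed as sketched below; the snapshot data is synthetic, mimicking a five-parameter model:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic snapshots with intrinsic rank 5, standing in for the FE data.
Z = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 80))
Z_centered = Z - Z.mean(axis=0)

sigma = np.linalg.svd(Z_centered, compute_uv=False)
sigma_scaled = sigma / sigma.sum()  # scaled singular values as in Fig. 4
captured = np.cumsum(sigma_scaled)  # share attributed to the first l modes
```

For this rank-5 toy data the first five modes capture essentially everything; for the arm model's stress data, the slower decay of `sigma_scaled` is exactly what signals the limits of a linear subspace.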
In this context, it is to be noted that this does not result from considering the von Mises stress instead of single stress components. The individual components σ_x, σ_y, σ_z, σ_xy, σ_yz, σ_zx possess a similar behavior, as shown in Fig. 4. Accordingly, it seems to be more difficult in general to represent the stress data in a low-dimensional linear subspace compared to the displacements. Furthermore, we expect that our results for the von Mises stress are transferable to the individual components.

Hyperparameters
Another noteworthy aspect in this context is that a reduced coordinate z̃(l) belonging to a reduced basis vector v_l with a relatively small corresponding singular value ς(l) tends to possess less assignable characteristics and relations. Thus, such coordinates are often more difficult to approximate by regression algorithms. This in turn results in a preference for a few meaningful coordinates over many less meaningful coordinates in the following investigation, not only because of the dimensionality but also because of the predictability. The reduced system size r of the surrogate models is set to the same value for PCA, KPCA, AE, and VAE to guarantee a fair comparison. In the case of the displacements, the reduced system size is chosen such that a reconstruction score of s_rec > 0.99 is achieved using PCA, resulting in a dimension of r_disp = 10. For the stress, on the contrary, it is set to r_stress = 13, resulting in a reconstruction score of s_rec > 0.95 using PCA.
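For PCA, such a preselection of r can be sketched via the retained share of squared singular values, which governs the reconstruction quality on the training data; the routine and the example values are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def choose_r(singular_values, threshold=0.99):
    # Smallest r such that the first r modes retain `threshold` of the
    # total energy (sum of squared singular values).
    energy = singular_values ** 2
    ratio = np.cumsum(energy) / energy.sum()
    return int(np.searchsorted(ratio, threshold) + 1)

s = np.array([10.0, 5.0, 1.0, 0.1])
r = choose_r(s, threshold=0.99)  # -> 2
```

Tightening the threshold increases r; in the paper, the analogous thresholds on s_rec lead to r_disp = 10 and r_stress = 13.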
Based on these preselected dimensions, the individual hyperparameters of the different reduction algorithms are tuned using a grid search on the k-fold cross-validated training dataset (7). As PCA has no hyperparameter other than the reduced system size r, only the hyperparameters for KPCA, AE, and VAE are listed in Table 2 and Table 3, respectively. Both autoencoders share the same underlying architecture to ensure a fair comparison.
For KPCA, a polynomial kernel function is chosen, and the scaled exponential linear unit (SELU) with β = 1.051 and α = 1.673 serves as activation function in the autoencoders.
For each fitted reduction algorithm, the dataset (8) is assembled and the regression algorithm is trained. Usually, in the case of autoencoders, the regression task is solved by a subnetwork. However, in the following investigation we want to focus on the comparison of the individual reduction methods and examine them as independently as possible from the regression algorithm. Hence, the same regression algorithm is used for PCA, KPCA, AE, and VAE to enable comparability of their results within the context of LRR. In detail, a Gaussian process regression algorithm with consistent hyperparameters for all four reduction algorithms is used, see Table 4.

Sampling the Reduced Space
As a first comparison of the reduction algorithms, we present the impact that variations in the reduced space have on the original space. Therefore, the first entry z̃(1) of the reduced state is varied around its mean value z̃(1),mean = (1/κ) Σ_{j=1}^κ z̃(1),j over the reduced states occurring in the training dataset. It is sampled within the range of the standard deviation z̃(1),std. Please note that the mean and standard deviation obviously differ for the different reduction algorithms. For an isolated view, all other entries z̃(2), ..., z̃(r) remain zero. The resulting reduced states are then transformed into a reconstructed arm z = Ψ†([z̃(1), 0, ..., 0]^T), which is shown in Fig. 5 for each reduction algorithm.
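This latent-space sampling can be sketched as follows; a random orthonormal matrix stands in for a fitted linear reconstruction mapping, and all sizes are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-ins: reduced training states and an orthonormal "basis" whose
# columns map reduced coordinates back to a 30-dimensional full state.
Z_red = rng.standard_normal((100, 4))
V_r, _ = np.linalg.qr(rng.standard_normal((30, 4)))

z1_mean = Z_red[:, 0].mean()
z1_std = Z_red[:, 0].std()

# Vary only the first reduced coordinate; all other entries remain zero.
samples = []
for z1 in np.linspace(z1_mean - z1_std, z1_mean + z1_std, 5):
    z_red = np.zeros(4)
    z_red[0] = z1
    samples.append(V_r @ z_red)  # reconstructed full state
samples = np.array(samples)
```

For a linear mapping, all reconstructions lie on a single line in the full space (they are multiples of one basis vector); for the nonlinear AE and VAE decoders, the same traversal produces a curved family of reconstructed arms, which is what Fig. 5 visualizes.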
In the case of PCA, this representation corresponds to the visualization of the first reduced basis vector. It can be observed that the first reduced basis vector of the PCA describes a motion during which the arm is raised. A similar arm raise can be observed for the KPCA, but simultaneously a non-physical muscle deformation occurs, especially in the brachioradialis. For the autoencoder, non-physical behavior can be observed especially in the lower arm, where bizarre deformations of the muscle and bone tissue occur. Comparing this to the results of the VAE, the benefits of regularizing the reduced space become apparent. The reconstructed arm of the VAE exhibits such non-physical behavior to a much lesser extent and is mainly characterized by a slight lowering of the lower arm. It is not surprising that the arm raising occurs for all reduction algorithms, as it is one of the most important motions. However, the individual modes of the PCA are sorted by importance and orthogonal, which leads to the first reduced basis vector being an isolated motion of the most important movement without any other effect occurring. For the AE, the modes are in general neither orthogonal nor sorted by importance, and thus different effects can occur simultaneously. Besides this rather abstract representation of the impact of reduced coordinates, concrete performance measurements are presented in the following section.

Performance Measurements
The fitted reduction mappings Ψ and regression algorithms Φ are used in the online Algorithm 2 to approximate the displacement q and stress σ of the arm on the test dataset, i.e., on unseen muscle activations µ. The results are summarized in Table 5, where s_rec is used for assessing the isolated reduction algorithms, s_regr for the isolated regression algorithm, and s_appr for an overall impression. Moreover, the mean errors averaged over all simulations, ê_mean, are listed. Table 5 confirms that the stress data is more difficult to reconstruct, given that s_rec is consistently worse for stress than for displacement. In this context, it is noticeable that the KPCA reaches the best reconstruction score s_rec and is accordingly most capable of identifying a low-dimensional representation of the data. Regarding the regression score s_regr, the GPs trained on the data reduced by PCA and KPCA perform very similarly. This suggests that the identified reduced coordinates are similarly learnable. Overall, the combination of an AE and a GP yields the best results regarding the approximation score s_appr as well as most of the physical error metrics. The variational autoencoder achieves worse results for s_regr, and consequently its reduced coordinates seem to be more difficult to approximate. However, the VAE is at the same time more robust against these errors in the reduced space and almost reaches the same quality as the original AE, but with a more interpretable and regularized reduced space.
Regarding the computational time, the surrogate models using autoencoders are fastest, followed by PCA, whereas KPCA is outperformed by far. The different computational times are only partially meaningful as the reduction algorithms are implemented using different software packages. While scikit-learn [37] is used for PCA and KPCA, TensorFlow [1] is used for the autoencoders. In an even comparison, PCA should be faster than the autoencoders as it contains fewer parameters. Nevertheless, all surrogate models can predict the arm behavior within milliseconds and are thus suitable for real-time applications. It should be noted that the computational times presented in this work are the averaged times for the approximation of a single sample. When several samples are calculated at once, the required time per sample decreases significantly.
For a more detailed elaboration of the results, the distributions of the performance scores of all test simulations are considered in the following paragraph. On the one hand, this is intended to illustrate the relationships between the reduction, regression, and overall approximation performance and, on the other hand, to depict the differences between the individual reduction methods. For this purpose, the distributions of the relative performance scores of the different reduction algorithms are shown in Fig. 6. A boxplot is used as representation, and a visual explanation is given in Fig. 6a. It consists of a box which is delimited by the quartiles, so that 25% of the values lie between the median and the upper quartile and 25% between the median and the lower quartile. Furthermore, the box is extended by lines, so-called whiskers, which are chosen to have a maximum length of 1.5 · IQR. In this context, the interquartile range IQR is the distance between the upper and lower quartiles. Outliers not covered inside the range of the whiskers are visualized as individual points.
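The boxplot statistics just described can be computed as sketched below; the score values are illustrative, with one deliberately poor simulation:

```python
import numpy as np

# Illustrative score sample with one poor simulation.
scores = np.array([0.90, 0.92, 0.93, 0.95, 0.96, 0.97, 0.99, 0.40])

q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1            # interquartile range
lower = q1 - 1.5 * iqr   # lower whisker limit
upper = q3 + 1.5 * iqr   # upper whisker limit
outliers = scores[(scores < lower) | (scores > upper)]
```

The single low score falls outside the whisker range and would be drawn as an individual point, exactly like the outlier markers in Fig. 6.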
Considering the results achieved for the node displacements shown in Fig. 6b, the isolated reconstruction score attests the KPCA's ability to represent the data in a low-dimensional space without much loss of information. Furthermore, the reduced coordinates resulting from PCA and KPCA seem to be similarly challenging to learn, as the distributions of the regression scores achieved by the GP are almost identical. The overall approximation quality mainly suffers from the error induced by the regression, as it deviates largely from the reconstruction score. Interestingly, the overall approximation score using KPCA is influenced to a larger extent than is the case for PCA, and KPCA accordingly seems to be less robust against disturbances in the reduced space with the chosen hyperparameters. For the autoencoders, a better regression score is achieved than for PCA and KPCA, and both perform comparably.
Considering the scores achieved for the approximation of the arm's stress, as done in Fig. 6c, the impression changes. The reduced coordinates obtained via PCA and KPCA seem to be easier to learn than those of the autoencoders and thus result in better regression scores, but at the same time they still contribute to the overall approximation error. Quite the opposite is shown by the results for the autoencoders, which reveal that their reduced coordinates are difficult to learn, but errors in the reduced space propagate to a much lesser extent to the overall approximation. Consequently, the approximation score s_appr stays close to the reconstruction score s_rec. This is especially evident for the VAE, for which the worst regression score but the best approximation is achieved. Hence, it seems to be very robust against errors introduced by the regression, which may be explained by the regularization of the reduced space. Nevertheless, the autoencoder-based surrogates still achieve the slightly better overall approximation for displacements as well as stress. The reason for this is that the errors of the individual categories do not affect the overall result to such an extent as is the case for KPCA, which is sensitive to errors in the regression.
Comparing the displacement and stress data, two things are noteworthy. As expected, the stress data poses the greater challenge for the reconstruction. Hence, the overall approximation is mainly limited by the reconstruction, whereas the regression is the decisive factor for the displacements. At the same time, however, it must be noted that although worse results are achieved for the stress on average, there are fewer outliers than is the case for the displacements. In addition to the distributions of the scores, the interested reader can view the performance scores for all individual test simulations in Appendix B in Fig. B2 for a more detailed view.
Besides the evaluation criteria already presented, physical errors are considered in the following in order to contextualize the results in real terms. In Fig. 7a, the mean (13) and the maximum reached Euclidean node distance (14) are shown for the different reduction algorithms. All of them manage to stay below a mean node distance of 1.75 mm for all 100 test simulations. PCA, AE, and VAE are within a similar range, but KPCA performs worst. The same applies to the maximum node distance. The rather large maximum values for the KPCA compared to its moderate mean values suggest significant outliers in the approximation of single nodes or areas.
PCA is not able to provide a low mean error for the stress data, as visualized in Fig. 7b. Here, it can be seen that all nonlinear reduction methods have a significant advantage. For the maximum error, on the contrary, PCA yields a similar accuracy, and all algorithms are within a similar range. This suggests that PCA is good at capturing local areas of high peaks but does not perform as well for the entire model.
In addition to the numeric criteria presented so far, a visual assessment is given in the following to provide information about the most challenging parts of the arm model. A random set of seven simulations is picked from the test set and shown in Fig. 8. The finite element simulation result, i.e., the supposed ground truth, is shown at the top, with the corresponding muscle activation of each simulation visualized as a bar plot above. The results achieved by the surrogate models are placed below, with the nodes colored by their individual node distance e_2. It becomes particularly apparent which arm regions are most difficult to reproduce. This includes the complete forearm with the bones radius and ulna as well as the brachioradialis muscle on the one hand, and the biceps brachii on the other hand. In particular, the forearm is the part with the largest displacement, so it is not surprising to find it among the worst approximated parts. Regarding the individual surrogate models, it is particularly noticeable that the ones relying on PCA and KPCA seem to have larger problems reproducing the triceps brachii. Nevertheless, the error in this region is low for all surrogates. The combination of KPCA and GP, furthermore, leads to large errors in the human arm for some muscle activations. For example, the complete biceps brachii is approximated worse in the last three visualized simulations. In contrast to the motion of the arm, the areas of high loading/large stress occur at different points.
As expected, the highest stress occurs primarily in the tendons, while only minor stress is found in the muscles, as visualized in Fig. 9, where the stress distributions obtained from the FE model are presented at the top. Below the reference solution, the approximations of the surrogate models, colored by the respective errors, are plotted. The largest discrepancies unsurprisingly occur in the regions of highest stress. However, it is remarkable that the PCA-based surrogate has larger areas of notable errors than the other surrogate models.

Discussion
The presented results demonstrate that the used methodology can create versatile surrogate models for a high-dimensional human arm model in a data-poor regime, even for variables that are difficult to capture. The modular LRR algorithm can be used to approximate stress as well as displacements. The surrogate models enable evaluation in real-time scenarios as they possess a very low computational time in the range of milliseconds. It is also possible to use them on resource-constrained hardware due to the savings in computational complexity.
Comparing the different methods for the dimensionality reduction of the FE data reveals interesting differences. As expected, PCA performs well for displacements and is the simplest in its application. The results for the KPCA, on the other hand, are contradictory. On the one hand, no other reduction algorithm reconstructed the data as well from the reduced/latent space. On the other hand, it is more sensitive to errors in the reduced/latent space introduced by the regression algorithm, so that the overall approximation suffers significantly. There may be other kernel functions and hyperparameters with which those hurdles can be surmounted, but none of the kernel function and hyperparameter combinations tested during the grid search yielded better results. The autoencoder delivers an overall satisfactory result except for the more expensive training phase. Although a fully-connected AE was used, contrary to expectations, we were able to succeed in a data-poor regime. Variational autoencoders have proven to perform almost as well as the classical autoencoder, but with a more interpretable reduced/latent space.
Even though slightly better results were obtained with the nonlinear reduction methods than with the linear PCA, the differences are marginal for the displacements. For the stress, on the contrary, it could be shown that PCA was outperformed considering the mean error. However, the nonlinear methods did not achieve the significant advantage over the linear method expected for the stress data. This could be due, among other things, to the fact that the autoencoders were trained with relatively little data.
Overall, it can be said that the LRR algorithm is suitable for approximating high-dimensional quantities that are difficult to represent. Depending on the intended use, the individual components must be carefully selected. The Gaussian process regression was successfully combined with all reduction techniques. As one scope of this work was to compare the reduction methods, the GP was not optimized for the individual combinations and consequently introduced a non-negligible error to the overall approximation. With further hyperparameter tuning, the overall error could be decreased further. Moreover, even though GPs worked well here and in previous literature, other regression methods such as neural networks have also shown satisfying results. For the dimensionality reduction, on the contrary, the results depend less on the chosen algorithm than on the underlying problem. While all algorithms worked well for data like displacements, all of them introduced a larger error when it comes to the reconstruction of stress data. The best results for stress were still achieved by the nonlinear methods, i.e., KPCA, AE, and VAE. However, since KPCA has been found to be prone to outliers, the autoencoder-based methods are generally preferable.
Besides the approximation quality measures, soft factors can also play a role in the choice of the reduction algorithm. Differences can be found especially during the offline phase, but also regarding the consumption of resources. While PCA usually works right away, KPCA, AE, and VAE require a more extensive hyperparameter tuning, including the choice of the kernel function for the KPCA and the network architecture for the AE and VAE. The autoencoders also require a computationally expensive training phase, while the KPCA consumes the most memory during evaluation.

Conclusion
While the real-time evaluation of complex structural dynamical problems has not been feasible for a long time, it is possible with today's resources and tools. Especially machine learning methods can serve as extremely fast enablers for surrogate models. With the combination of dimensionality reduction and a Gaussian process regressor, we were able to create accurate yet real-time capable surrogate models for a complex human arm. In this process, not only often considered physical quantities like displacements but also often neglected stress values have been approximated. The latter turned out to be more challenging to represent in a low-dimensional space without losing too much information. Accordingly, nonlinear reduction techniques in the form of kernel principal component analysis, autoencoders, and variational autoencoders were implemented besides the linear reduction in the form of principal component analysis. All methods were able to process the displacements satisfactorily. However, for the challenging reduction of the stress data, the nonlinear reduction methods only slightly outperformed the linear one, which may be owed to the comparatively small data basis used. Nonetheless, the stress behavior of the complex arm model could be approximated satisfyingly and quickly by the surrogate models. Differences between the reduction methods become apparent as soon as disturbances in the reduced coordinates occur. This means errors induced by the Gaussian process approximating the reduced states can lead to non-negligible errors in the overall approximation, especially for the KPCA-based surrogates. Overall, the autoencoder-based surrogate yielded the best results, while the one using PCA performs well while being particularly simple to apply.
We summarize the effects of the various reduction techniques: i) Principal component analysis: PCA stands out for its simplicity and is computationally the most favorable method in both the offline and online phases, as neither expensive hyperparameter tuning nor an expensive training phase is necessary. However, PCA can reach its limits when linear reduction, i.e., a linear combination of reduced basis vectors, is too great a restriction. ii) Kernel principal component analysis: KPCA yields the best reconstruction from coordinates transformed into the reduced space. However, it is more sensitive to errors in the reduced space and thus leads to larger overall errors. No hyperparameters were found that allowed for both accurate and robust reconstruction. The offline phase, including the hyperparameter tuning, is expensive, and the memory consumption is non-negligible. iii) Autoencoder: AEs provided the best overall approximation and are still fast. These advantages come at the cost of an expensive training phase. Furthermore, they require a lot of data, which may limit their applicability for certain problems. iv) Variational autoencoder: VAEs regularize the reduced space, leading to more interpretable reduced coordinates without losing much reconstruction capability compared to the normal autoencoder. Otherwise, they behave similarly.
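Point i), that a linear combination of basis vectors can be too restrictive, is easy to demonstrate: data lying on a one-parameter nonlinear curve needs two linear PCA modes even though a single nonlinear coordinate would suffice. A small sketch with an assumed toy parabola, not data from this work:

```python
import numpy as np

# Points on a one-parameter nonlinear curve embedded in R^2: one nonlinear
# latent coordinate (t) fully describes the data, but no single linear
# direction does -- the PCA limitation noted in i).
t = np.linspace(-1.0, 1.0, 200)
X = np.stack([t, t ** 2])                      # shape (2, 200)

Xc = X - X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

def pca_rel_error(r):
    """Relative reconstruction error of the best rank-r linear subspace."""
    Xr = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]
    return np.linalg.norm(Xc - Xr) / np.linalg.norm(Xc)

err1 = pca_rel_error(1)   # one linear mode cannot flatten the parabola
err2 = pca_rel_error(2)   # full rank reproduces the data exactly
```

Here `err1` remains substantial while `err2` vanishes to machine precision; a nonlinear method such as KPCA or an AE could in principle capture this curve with a single reduced coordinate.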
The network's output is represented by the output of the last layer, i.e., y = x^(n_l).
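The layer recursion behind this notation can be written out directly; the layer sizes and random weights below are purely illustrative, not the architecture used in this work:

```python
import numpy as np

def forward(x, layers):
    """Propagate x through the stack: x^(0) is the input, each layer computes
    x^(l) = act(W_l @ x^(l-1) + b_l), and the output is y = x^(n_l)."""
    for W, b, act in layers:
        x = act(W @ x + b)
    return x

rng = np.random.default_rng(0)
identity = lambda v: v
# Illustrative bottleneck shapes (6 -> 2 -> 6), e.g. an encoder/decoder pair.
layers = [
    (rng.standard_normal((2, 6)), np.zeros(2), np.tanh),
    (rng.standard_normal((6, 2)), np.zeros(6), identity),
]
y = forward(rng.standard_normal(6), layers)    # y = x^(n_l), here n_l = 2
```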

Fig. 4: Scaled singular values of displacement and stress data for PCA as indicator of the Kolmogorov n-width.

Fig. 5: Reconstructed arms for reduced coordinates sampled around the mean of the first entry for PCA, KPCA, AE, and VAE.

Fig. 6: Performance scores achieved by surrogate models using different reduction algorithms for displacement and stress data.
(a) Mean and maximum node distance of displacement data in mm. (b) Mean and maximum error of stress data.

Fig. 7: Mean and maximum errors for displacements (a) and stress (b). The plots are sorted by the error made by the surrogate using PCA.

Fig. 8: Visualization of the displacements of test simulations comparing the reference results (top) with their approximations using PCA, KPCA, AE, and VAE (from top to bottom). The approximations are colored with respect to the corresponding error of the individual nodes.

Fig. 9: Visualization of the stress of test simulations comparing the reference results (top) with their approximations using PCA, KPCA, AE, and VAE (from top to bottom). The reference is colored with respect to the stress occurring in the individual nodes, while the approximations are colored with respect to the corresponding error.

Table 1: Definitions of the most important variables. The goal of this work is to approximate z by z with suitable surrogate models.

Table 5: Performance measurements. Although the AE does in most cases not place first in either the reconstruction or the regression category, it