1 Introduction

Physical prototypes can be used in scenario-based testing applications, e.g. for safety considerations in crashworthiness investigations. However, such prototypes alone are not practical as they are prohibitively expensive, inflexible, and time-consuming to build and test. Thus, computer-aided engineering has emerged as a key pillar in product development, e.g. in the automotive industry [1]. In this context, explicit structural dynamical simulations are particularly relevant, e.g. for identifying weak points in the design of crash structures and optimizing the use of material.

Such simulations are used in a variety of applications including many-query evaluations, design optimizations, or deployment on low-budget hardware. While many of these applications require high accuracy, the accuracy of classic solvers for high-fidelity simulation models is limited by the resolution of the spatial discretization. Consequently, increased accuracy comes at the cost of increased dimensionality and thus computational effort. This makes modern high-fidelity simulation models prohibitive for certain applications such as large parameter studies, use on low-performance hardware, or real-time applications. Hence, there is an urgent need for surrogate models that retain the expressiveness of their high-fidelity counterparts while being more cost-effective to simulate.

Fortunately, the intrinsic dimension of a given problem is in many cases much smaller and the actual solution space lies on a low-dimensional manifold. This led to the emergence of data-based model order reduction (MOR) [2,3,4,5,6,7] as a viable solution to the task of creating efficient yet accurate surrogate models by identifying suitable low-dimensional descriptions. In this context, two primary challenges must be addressed: One challenge is to identify expressive coordinates that are simultaneously low-dimensional to ensure computational efficiency, yet still adequate for describing the system. The other is the approximation of the (parameter-dependent) system dynamics in the identified reduced coordinates. Widely used data-driven methods to construct a low-dimensional embedding include linear methods such as the proper orthogonal decomposition (POD) [8] (also known as principal component analysis (PCA)) and its nonlinear counterpart, the autoencoder (AE), which produces nonlinear manifolds. Such data-driven methods are particularly appropriate in the context of commercial simulation software with inaccessible source code, where all that is available is data and limited information about the model itself.

Most of the existing MOR approaches directly operate on the given high-dimensional discretization, i.e. the resulting surrogates always try to approximate a system at a fixed high resolution. However, as already mentioned, this resolution usually is driven neither by the underlying problem nor by the user’s intended application. In many cases, a coarser or adaptive resolution has advantages if the accuracy does not suffer as a result. For example, considerable computational power can be saved when visualizing complex three-dimensional systems by using coarser resolutions. Moreover, static resolutions cannot react to changes in computational environments, like dynamically changing memory restrictions, or to changes in the desired approximation quality. Consequently, the question arises why data-based surrogate models should be limited to these fixed original resolutions at all, given that they are not inherently bound by the spatial resolution. They can operate on a coarse subselection of a fine mesh with an accuracy only limited by their expressiveness and the high-fidelity data. One discretization-free approximation scheme for parametrized PDEs can be found in [9] and other mesh-free approaches are given in [10] and [11, 12].

In this work, we develop an approach that is fundamentally mesh-free, i.e., it is not restricted to the underlying high-resolution discretization. Instead, we take advantage of the fact that the surrogates do not require fine spatial resolution by first excluding large parts of the model during model creation while taking into account recent advances in graph convolutional neural networks (GCNNs) [13]. For this purpose, we transfer and adjust ideas from multiresolution autoencoders [14] to make them applicable to irregular data. In particular, we present a graph-convolutional hierarchical multiscale approximation scheme for a given system in which the global context is captured in coarse representations. By doing so, we can speed up the learning process, create multiple models with individual hardware requirements, and resolve multiscale issues that often arise in spatio-temporal dynamics of complex systems.

Our fundamental idea is (i) to represent the high-fidelity model in a graph-like structure, (ii) to apply mesh simplification to derive coarse representations, (iii) to fit a surrogate model on the coarsest representation, (iv) to refine the model, and (v) to fit another surrogate on the next finer level leveraging transfer learning. Steps (iv) and (v) can be repeated until a performance threshold is met or no more coarse representations are available. An abstracted visual impression of this workflow is given in Fig. 1, where it can be seen that our approach resembles U-Nets [15] in its structure. The individual surrogate models themselves are composed of graph convolutional autoencoders, which construct low-dimensional coordinates, and of multi-layer perceptrons (MLPs) that approximate those coordinates based on the time and given parameters.

Fig. 1

Multi-hierarchical surrogate modeling approach. Instead of learning on the full system discretization, we learn on a coarse representation of it and progressively refine this representation by learning on the residual error. First, different levels of discretization are generated for the original model. On the coarsest level, a surrogate model is trained. If the error is not within the tolerance, the learned model is upsampled to the next level, and an additional surrogate is learned to capture the inaccuracies of the first one. The process is repeated until the error is within the tolerance or no more discretizations are left

1.1 State-of-the-art

The proposed method is located in the research area of data-driven MOR under consideration of graph convolutional networks and hierarchical modeling. Reference is therefore made in the following to related work in these areas.

Data-driven MOR Even though data-driven MOR using POD-based surrogate models continues to be widely used and often produces satisfactory results for many problems [16,17,18,19,20], autoencoders [21,22,23] have been shown to outperform their linear counterpart for problems with slowly decaying Kolmogorov n-width [24]. There also exist combinations of POD and autoencoders [25]. Of the many different autoencoder architectures, convolutional autoencoders in particular have stood out [21, 22, 26, 27] as they exploit spatial information and can detect local patterns using filters. This makes convolutional neural networks also interesting for other applications in structural dynamics, see e.g. [28, 29]. Meanwhile, MOR methods such as POD are also used to improve convolutional neural networks, for example to reduce the number of layers [30].

Unfortunately, conventional convolutional neural networks (CNNs) face the significant limitation of only being applicable to regular grid-like data (e.g. images). Irregular data, by contrast, as present in complex three-dimensional discretized crash simulations, requires new techniques, which has led to an increased interest in geometric deep learning [31]. While some approaches map the irregular domain to a regular one [32] and apply convolutions there, others apply convolution-like operations on dynamically constructed graphs of point clouds [33], and still others apply generalizations of CNN architectures to non-Euclidean domains [34].

Graph convolutional neural networks Graph convolutional neural networks [13] can be directly applied to irregular data by transferring the principle of convolutions to geometric problems. They can extract information and relations about node features from their spatial connections. An early version of GCNNs can be found in [35] and an adaptation of it in [36]. GCNNs are found in the context of MOR in [37], where a graph convolutional autoencoder using gcn2 convolutions [38] is compared to a classic fully connected AE for the creation of reduced order models. Furthermore, in [39] a spatial graph convolutional autoencoder is used to derive reduced order models and [40] utilizes GCNNs for the approximation of time-dependent PDEs under geometric variability. In contrast to classic convolutions, graph convolutions themselves cannot automatically reduce the dimensionality of the data. To decrease the number of nodes that are processed in the layers, several pooling operations for irregular data have been developed [41]. A general overview of GCNNs can be found in [42] and a literature review focused on MOR in [39].

Graph Networks In addition to graph convolutional neural networks, many other exciting applications in geometric deep learning have emerged, such as graph networks [43]. Graph networks have been used to build physics-informed simulation models in which the graph network outputs state time derivatives that are then used in ODE integrators for future time predictions [44, 45]. Moreover, symbolic representations of a learned model are discovered by applying symbolic regression to components of its message passing function [46, 47]. Graph networks are also used in generative tasks where graphs are built sequentially based on learned distributions [48]. Another recent development is attention-based graph transformer models [49], which have been used, for example, as neural operators to capture the solutions of PDEs [50].

Hierarchical Structures in Graph Convolutional Networks Including a hierarchical structure in GCNNs is a natural way of proceeding. In Graph U-Nets [51], pooling layers are used to form smaller graphs using a trainable projection vector. Moreover, a multiscale MeshGraphNet that operates on different resolutions is introduced in [52]. A coarse resolution is used to propagate information further and overcome the issue of slow message propagation occurring in fine resolutions. Moreover, [53] use two MeshGraphNets to create surrogate models for FE simulations of latticed structures. The first one captures the dynamics on a reduced graph representation of the structure and the second one maps these results onto the full-scale displacements.

Mesh reduction is also applied in [54], where information of a fine graph is encoded in a coarse subset of the nodes. By doing so, they can evolve the latent dynamics efficiently in time using an attention model. Note that hierarchical approaches are also incorporated in other architectures like variational autoencoders [55] and are not only used in the spatial domain but also for the evolution of dynamics in time, as explained in [56].

1.2 Main contributions

Due to the hierarchical structure of our proposed approach, global dynamics can be captured on the coarsest surrogate, whereas finer details are captured in the refined versions. Thus, the framework is naturally suited for multiscale problems, where macro- and microscale dynamics occur at the same time. Modeling such systems poses a particular challenge and is therefore often approached with dedicated techniques, for example multigrid methods [57, 58], in which a coarse-grained model is gradually refined in areas of high inaccuracy in order to achieve the required accuracy. Such methods have also been unified with convolutional neural networks [59].

While other hierarchical approaches use hierarchical structures only to foster the learning process for an approximation of the finely discretized high-fidelity solution, we use coarse representations that are physically and visually interpretable and consequently directly useful. Moreover, we never perform costly training of a surrogate on the fine data and can reduce the latent representation to a vector of the intrinsic dimension instead of a small but still comparatively large graph. Additionally, we build the surrogates one after the other, enabling the learning process to be stopped if desired. In doing so, we still take advantage of already learned behavior by applying transfer learning from coarser to refined surrogates.

In addition, we consider a numerical example in the form of a simplified kart frontal impact simulation with a number of specific challenges: The considered scenarios encompass a multitude of parameter dependencies that determine the impact, and the model exhibits complex material properties. Furthermore, the simulations are subject to strong nonlinearities and contact with plastic deformation. In contrast to other structural dynamical systems often considered in the literature, transient dynamics without stationary system behavior must be approximated instead of the post-transient dynamics on an attracting submanifold. Moreover, the considered example is a finite element model and consequently shares the challenges of other similar systems, including (i) the sheer dimensionality of such systems, (ii) the inaccessibility of commercial software code, and (iii) the computationally intensive data generation. Other works considering crashworthiness or impact scenarios in the context of data-driven surrogate modeling can be found in [19, 60, 61].

The highlights of our work can be summarized as follows:

  1. We propose a multi-hierarchical nonlinear model reduction scheme that

     (a) creates adaptive models for different needs (resolution, memory, etc.) in visually interpretable domains,

     (b) leverages transfer learning to progressively refine surrogate models.

  2. The proposed surrogate architecture is especially suited for multiscale problems and complex discretized 3D structural dynamical problems.

  3. We provide accurate yet efficient data-driven surrogates for the deformations of a nonlinear finite element kart frame frontal impact simulation in a structured and efficient manner.

1.3 Structure

The paper is structured as follows: The proposed multi-hierarchical surrogate modeling approach is explained in Sect. 2 along with the required theory, followed by the presentation of the considered numerical example in the form of a kart frontal impact simulation in Sect. 3. The surrogate modeling application as well as the results and discussion are presented in Sect. 4. The paper ends with a conclusion in Sect. 5.

2 Multi-hierarchical surrogate modeling approach

A detailed explanation of the general problem setup, the theory required to follow the explanations, and, most importantly, the multi-hierarchical surrogate modeling approach itself is given in this section. The following explanations are not especially tailored to the kart example considered later, but are kept general so that interested readers can more easily transfer them to their individual problem classes.

2.1 Problem setup

Consider a nonlinear dynamical system

$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}t}\varvec{x}(t, \varvec{\mu })=\varvec{f}(\varvec{x}(t, \varvec{\mu }), t, \varvec{\mu }) \end{aligned}$$
(1)

which is determined by the time \(t\in {\mathcal {T}}\subseteq \mathbb {R}^{+}\), the system state \(\varvec{x}\in {\mathcal {X}}\subseteq \mathbb {R}^{N}\), and (simulation) parameter \(\varvec{\mu }\in {\mathcal {M}}\subseteq \mathbb {R}^{\ell }\). Usually, time-stepping schemes are used to approximate the discrete-time flow map

$$\begin{aligned} \varvec{x}(t, \varvec{\mu })=\varvec{F}(t,\varvec{\mu },\varvec{x}_{0}), \end{aligned}$$
(2)

i.e. the solution of system Eq. (1) at \(\eta \) discrete time points \(t\in \{t_0,...,t_{\eta -1}\}\). The flow map \(\varvec{F}:{\mathcal {T}}\times {\mathcal {M}}\times {\mathcal {X}}\rightarrow {\mathcal {X}}\) describes the mapping from the initial condition \(\varvec{x}_0\in \mathbb {R}^{N}\) and parameter \(\varvec{\mu }\) to the solution at a given time \(t\ge t_0\). In the course of this paper, our major goal is to find a surrogate model \(\varvec{\Sigma }\) that approximates the solution Eq. (2) of Eq. (1), i.e.

$$\begin{aligned} \tilde{\varvec{x}}(t, \varvec{\mu })=\varvec{\Sigma }(t,\varvec{\mu },\varvec{x}_{0})\approx \varvec{F}(t,\varvec{\mu },\varvec{x}_{0}) \end{aligned}$$

while significantly reducing computational requirements. This can often be expressed in terms of the computational time so that the computational time of the surrogate \(\Delta T_{\varvec{\Sigma }}\) is much smaller than that of the original system \(\Delta T_{\varvec{F}}\): \(\Delta T_{\varvec{\Sigma }} \ll \Delta T_{\varvec{F}}\).

If, however, not all states are of interest but only a subselection, which is the case when coarsening the original discretization of the system, the surrogate can operate on a downsampled state \(\varvec{x}_{\text {d}}(t, \varvec{\mu })= \mathcal {\varvec{D}}\varvec{x}\in {\mathcal {X}}_{\text {d}} \subseteq \mathbb {R}^{n_{\text {d}}}\), where \(\mathcal {\varvec{D}}\in \{0,1\}^{n_{\text {d}}\times N}: {\mathcal {X}}\rightarrow {\mathcal {X}}_\text {d}\) is a binary selection matrix with \(n_{\text {d}}< N\). Its entries are \(\mathcal {\varvec{D}}(p,q)=1\) if the \(q\)-th state is the \(p\)-th kept state, and \(\mathcal {\varvec{D}}(p,q)=0, \ \forall p\in \{1,\dots ,n_{\text {d}}\}\), if the \(q\)-th state is discarded. Consequently, the surrogate \(\varvec{\Sigma }_\text {d}\) only needs to approximate the selected states of the system’s solution

$$\begin{aligned} \tilde{\varvec{x}}_\text {d}(t, \varvec{\mu })=\varvec{\Sigma }_\text {d}(t,\varvec{\mu },\mathcal {\varvec{D}}\varvec{x}_{0})\approx \mathcal {\varvec{D}}\varvec{F}(t,\varvec{\mu },\varvec{x}_{0})=\varvec{x}_\text {d}(t, \varvec{\mu }). \end{aligned}$$

This can, on the one hand, ease the surrogate modeling process and is, on the other hand, especially useful in cases where the fine discretization of the model does not result from user requirements. In order to obtain a suitable downsampling operation, methods from the field of computer vision are useful.
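To make the selection operator concrete, the following minimal sketch (not taken from the paper's implementation; all names and dimensions are illustrative) assembles a sparse binary selection matrix \(\mathcal {\varvec{D}}\) from a list of retained state indices and applies it to a full state vector:

```python
# Hedged sketch: building a binary selection matrix D and downsampling a state.
import numpy as np
import scipy.sparse as sp

def selection_matrix(keep_idx, N):
    """Sparse D in {0,1}^(n_d x N) with D[p, keep_idx[p]] = 1."""
    n_d = len(keep_idx)
    return sp.csr_matrix(
        (np.ones(n_d), (np.arange(n_d), keep_idx)), shape=(n_d, N)
    )

N = 12                                 # illustrative full state dimension
keep_idx = np.array([0, 3, 7, 11])     # hypothetical indices of retained states
D = selection_matrix(keep_idx, N)
x = np.random.rand(N)                  # placeholder full state x
x_d = D @ x                            # downsampled state x_d = D x
```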

2.2 Down- and upsampling

For the sampling operations, we rely on surface simplification using quadric error metrics [62], a method that produces coarse representations of a given mesh while maintaining its shape, i.e., its geometrical characteristics. The method does not necessarily preserve the topology of the mesh, as topological holes can be closed and unconnected regions can be joined. Classic FE mesh simplification approaches focus on maintaining the topology, but as we are interested in coarse representations of the original system that have a similar visualization, the mentioned method is preferable. The same sampling approach is used as a pooling operation in the context of graph convolutional autoencoders in CoMA [63].

Specifically, we assume that the considered model can be interpreted as an undirected graph \({\mathcal {G}}=(\varvec{{\mathcal {N}}}, {\mathcal {E}}, \varvec{A})\) with a set of vertices (nodes) \(\varvec{{\mathcal {N}}}\in \mathbb {R}^{n\times 3}\) and edges \({\mathcal {E}}\in \mathbb {R}^{n_{\text {e}}\times 2}\) describing the node connectivity defined by the adjacency matrix \(\varvec{A}\in \{0,1\}^{n\times n}\). Note that the adjacency can also be weighted. The high-fidelity model used in this elaboration is a finite element (FE) model, so that the representation as a graph corresponds to the model formulation, as FE models are composed of elements that contain nodes and define neighborhoods through their edges.

Downsampling Operation The downsampling operation of the nodes is defined by

$$\begin{aligned} \varvec{{\mathcal {N}}}_\text {d}=\mathcal {\varvec{D}}\varvec{{\mathcal {N}}}\end{aligned}$$

with the downsampling matrix defined as previously but with dimensions aligned to the number of nodes, \(\mathcal {\varvec{D}}\in \{0,1\}^{n_{\text {d}}\times n}:\varvec{{\mathcal {N}}}\rightarrow \varvec{{\mathcal {N}}}_\text {d}\) with \(n_{\text {d}}< n\) and \(\varvec{{\mathcal {N}}}_\text {d}\subset \varvec{{\mathcal {N}}}\). The selection of the nodes to keep follows [62] using iterative vertex pair contraction. In general, for a given pair of nodes \((\varvec{\nu }_p, \varvec{\nu }_q)\), a vertex pair contraction \((\varvec{\nu }_p, \varvec{\nu }_q) \rightarrow \varvec{\nu }\) moves node \(\varvec{\nu }_q\) to a new position \(\varvec{\nu }\), connects the edges incident to \(\varvec{\nu }_p\) to \(\varvec{\nu }_q\), deletes the node \(\varvec{\nu }_p\), and removes all degenerate edges and faces. As we only consider selected nodes instead of adjusted nodes, the position of the kept node is not changed, and the contraction results in \((\varvec{\nu }_p, \varvec{\nu }_q) \rightarrow \varvec{\nu }_q\). To introduce a measure that determines which nodes are kept, each node \(\varvec{\nu }=[\nu _x, \nu _y, \nu _z, 1]^\intercal \) is associated with a quadratic error

$$\begin{aligned} \Delta (\varvec{\nu })=\varvec{\nu }^\intercal (\varvec{Q})\varvec{\nu } \end{aligned}$$
(3)

which is defined w.r.t. \(\varvec{Q}\in \mathbb {R}^{4 \times 4}\), a matrix describing the squared distance of a given point to the set of planes whose intersection defines the node's position. The procedure can be summarized as follows; a minimal sketch of the pair-cost evaluation is given after the list:

  1. Select valid vertex pairs (either connected nodes or nodes within a small distance of each other).

  2. For each valid pair, select the best surviving node \(\varvec{\nu }\) out of \(\{\varvec{\nu }_p, \varvec{\nu }_q\}\) based on the cost \(\varvec{\nu }^\intercal (\varvec{Q}_p+\varvec{Q}_q)\varvec{\nu }\).

  3. Iteratively contract the pair \((\varvec{\nu }_p, \varvec{\nu }_q)\) of least cost, i.e. remove the respective node, and update the costs.
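The sketch below illustrates the pair-cost evaluation, assuming the per-node quadrics \(\varvec{Q}_p\) and \(\varvec{Q}_q\) have already been accumulated from the planes of the incident faces; it is an illustrative reading of [62], not the exact implementation used here.

```python
# Hedged sketch: cost of a vertex pair contraction based on quadric errors, Eq. (3).
import numpy as np

def contraction_cost(nu_p, nu_q, Q_p, Q_q):
    """Return the quadric cost of keeping nu_p or nu_q for a valid pair."""
    Q = Q_p + Q_q                        # combined quadric of the pair
    costs = {}
    for key, nu in (("keep_p", nu_p), ("keep_q", nu_q)):
        v = np.append(nu, 1.0)           # homogeneous coordinates [x, y, z, 1]
        costs[key] = float(v @ Q @ v)    # Delta(nu) = nu^T Q nu
    return costs

# The pair with the smallest cost is contracted first, the surviving node keeps
# its position, and the costs of the affected pairs are updated afterwards.
```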

Upsampling Operation It is of interest to recover the original representation of the model from every coarse representation given. Unfortunately, a lossless reconstruction of the original mesh based on the simplified one is in general not possible. Consequently, we seek an upsampling matrix \(\mathcal {\varvec{U}}\in \mathbb {R}^{n\times n_{\text {d}}}\) with

$$\begin{aligned} \varvec{{\mathcal {N}}}_\text {u}=\mathcal {\varvec{U}}\varvec{{\mathcal {N}}}_\text {d} \end{aligned}$$
(4)

that approximates the original mesh. In this work, we follow the procedure of [63] and generate the upsampling matrix during the creation of the downsampling matrix. A node \(\varvec{\nu }_q\) that is kept in the downsampling process leads to an entry in the upsampling matrix with \(\mathcal {\varvec{U}}(p,q)=1\). A discarded node \(\varvec{\nu }_p\), on the contrary, is mapped onto the down-sampled mesh using barycentric coordinates by projecting it onto the closest triangle \((i,j,k)\) of that mesh

$$\begin{aligned} \tilde{\varvec{\nu }}_p={w}_i \varvec{\nu }_i+{w}_j \varvec{\nu }_j+{w}_k \varvec{\nu }_k \end{aligned}$$

with \(\varvec{\nu }_i,\varvec{\nu }_j,\varvec{\nu }_k \in \varvec{{\mathcal {N}}}_\text {d}\) and \({w}_i+{w}_j+{w}_k=1\). The upsampling matrix is then updated with the corresponding weighting factors so that \(\mathcal {\varvec{U}}(p,i)={w}_i, \ \mathcal {\varvec{U}}(p,j)={w}_j, \ \mathcal {\varvec{U}}(p,k)={w}_k\). Visual examples of the coarsened FE model are given in Fig. 2.
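A possible assembly of the sparse upsampling matrix of Eq. (4) is sketched below, assuming the kept nodes as well as the closest triangles and barycentric weights of the discarded nodes are already known from the simplification step; all names are illustrative.

```python
# Hedged sketch: assembling the sparse upsampling matrix U of Eq. (4).
import numpy as np
import scipy.sparse as sp

def upsampling_matrix(n, coarse_index, bary_tris, bary_weights):
    """
    n            : number of nodes in the fine mesh
    coarse_index : dict fine_idx -> column in the coarse mesh (kept nodes only)
    bary_tris    : dict fine_idx -> (i, j, k) coarse columns of the closest triangle
    bary_weights : dict fine_idx -> (w_i, w_j, w_k) with w_i + w_j + w_k = 1
    """
    rows, cols, vals = [], [], []
    for p, q in coarse_index.items():          # kept nodes: U(p, q) = 1
        rows.append(p); cols.append(q); vals.append(1.0)
    for p, tri in bary_tris.items():           # discarded nodes: barycentric weights
        for col, w in zip(tri, bary_weights[p]):
            rows.append(p); cols.append(col); vals.append(w)
    n_d = len(coarse_index)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n_d))
```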

Fig. 2

Differently resolved representations of the kart frame. The red dots represent the nodes

2.3 Surrogate modeling

Having the coarse representations of a model present enables the surrogate modeling process at the different levels. In this work, we rely on graph convolutional neural networks (GCNNs) to create a low-dimensional latent representation of the system state. Note that any other data-driven dimensionality reduction scheme, linear as well as nonlinear, can replace the GCNNs. Nevertheless, GCNNs profit from the proposed framework since the learning process is eased and accelerated, and they provide the best approximation quality among the tested methods.

2.3.1 Graph convolutional neural networks

Graph convolutional neural networks generalize convolutional neural networks to irregularly discretized domains as present in FE models. To understand the underlying principle of graph convolutions, recall a graph \({\mathcal {G}}=\{\varvec{{\mathcal {N}}},{\mathcal {E}},\varvec{A}\}\) as described in Sect. 2.2. A graph signal \(\varvec{x}\in \mathbb {R}^{n}\) is a feature vector over all \(n\) nodes in the graph. A convenient way to calculate the convolution between a filter \(\varvec{g}\in \mathbb {R}^{n}\) and the signal \(\varvec{x}\) exploits the fact that a convolution corresponds to a multiplication in Fourier space. To obtain the Fourier transform \(\hat{\varvec{x}}\) of the signal, a Fourier basis can be obtained from an eigenvalue factorization of the normalized Laplacian of the graph. The Laplacian is defined as \(\varvec{L}^*=\varvec{D}-\varvec{A}\) with the adjacency matrix \(\varvec{A}=\varvec{A}({\mathcal {G}})\) and the diagonal matrix of node degrees \(\varvec{D}\) with entries \(\varvec{D}_{i,i}=\sum _j \varvec{A}_{i,j}\). The normalized version of the Laplacian \(\varvec{L}=\varvec{I}_n-\varvec{D}^{-\frac{1}{2}}\varvec{A}\varvec{D}^{-\frac{1}{2}}\) is real, symmetric, and positive semidefinite. Hence, the factorization \(\varvec{L}=\varvec{U}\varvec{\Lambda }\varvec{U}^\intercal \) exists and the matrix \(\varvec{U}=\begin{bmatrix} \varvec{u}_1&\varvec{u}_2&\dots&\varvec{u}_n \end{bmatrix}\) contains the eigenvectors of the Laplacian ordered by their corresponding eigenvalues, which are stored in the diagonal matrix \(\varvec{\Lambda }\). The eigenvectors \(\varvec{u}\) are known as the Fourier modes of \({\mathcal {G}}\).

The Fourier transform of a signal \(\varvec{x}\) is then given by \(\hat{\varvec{x}}={\mathscr {F}}(\varvec{x})=\varvec{U}^\intercal \varvec{x}\) and the inverse Fourier transform by \(\varvec{x}={\mathscr {F}}^{-1}(\hat{\varvec{x}})=\varvec{U}\hat{\varvec{x}}\). Given those transformations, the graph convolution between the signal \(\varvec{x}\) and a filter \(\varvec{g}\) results in

$$\begin{aligned} \varvec{x}*\varvec{g}={\mathscr {F}}^{-1}({\mathscr {F}}(\varvec{x})\odot {\mathscr {F}}(\varvec{g}))=\varvec{U}(\varvec{U}^\intercal \varvec{x}\odot \varvec{U}^\intercal \varvec{g}), \end{aligned}$$
(5)

where \(\odot \) represents the Hadamard/elementwise product.

Denoting the filter as \(\varvec{g}_{\varvec{{w}}}=\text {diag}(\varvec{U}^\intercal \varvec{g})\) and using the conversion \(\varvec{a}\odot \varvec{b}=\text {diag}(\varvec{b})\varvec{a}\), Eq. (5) simplifies to

$$\begin{aligned} \varvec{x}*\varvec{g}_{\varvec{{w}}}=\varvec{U}\varvec{g}_{\varvec{{w}}}\varvec{U}^\intercal \varvec{x}, \end{aligned}$$
(6)

which is a formulation all spectral-based GCNNs follow. The idea in spectral convolutional neural networks is that the filters \(\varvec{g}_{\varvec{{w}}}=\varvec{W}_{c_i,c_j}^{(l)}=\text {diag}(\varvec{{w}}_{c_i,c_j}^{(l)})\) are the learnable weights \(\varvec{W}_{c_i,c_j}^{(l)}\) in convolutional layers

$$\begin{aligned} \varvec{X}_{:,c_j}^{(l+1)}=\varvec{h}^{(l)}\left( \sum _{c_i=1}^{n_{c}^{(l)}} \varvec{U}\varvec{W}_{c_i,c_j}^{(l)}\varvec{U}^\intercal \varvec{X}^{(l)}_{:,c_i} \right) . \end{aligned}$$
(7)

Here, \(l\) denotes the layer index, \(c_i\) and \(c_j\) are the channel indices, \(n_{c}^{(l)}\) is the number of channels in the \(l\)-th layer, \(\varvec{h}\) is the activation function, \(\varvec{W}_{c_i,c_j}^{(l)}\) is a diagonal matrix with learnable weights of the \(l\)-th layer, and \(\varvec{X}^{(l)}_{:,c_i}\) is the \(c_i\)-th channel of \(\varvec{X}^{(l)}\in \mathbb {R}^{n\times n_{c}^{(l)}}\), where \(\varvec{X}^{(0)}=\varvec{X}\in \mathbb {R}^{n\times n_{c}^{(0)}}\) represents the original signal of the graph, in our case the \(n\) nodes of the FE model with the coordinates stored in three channels, i.e. \(n_{c}^{(0)}=3\). The filter formulation of Eq. (7) is not localized in space and entails a high learning complexity. Hence, the use of polynomial filters \(\varvec{g}_{\varvec{{w}}}=\sum _{k}^{K}{w}_k\varvec{\Lambda }^k\) is considered in [35]. As such filters still require costly matrix multiplications with the non-sparse Fourier basis \(\varvec{U}\), the authors propose to use polynomials that can be recursively calculated from the Laplacian \(\varvec{L}\), resulting in ChebNet [35].

Chebyshev Spectral Convolutional Neural Networks In Chebyshev spectral convolutional neural networks [35], the filter \(\varvec{g}_{\varvec{{w}}}\) is approximated by Chebyshev polynomials \(T\) of order \(K\). By doing so, the costly multiplications with the non-sparse Fourier basis are replaced by \(K\) multiplications with the sparse Laplacian. In detail, the filter is represented by the Chebyshev polynomials \(\varvec{g}_{\varvec{{w}}}(\check{\varvec{\Lambda }})= \sum _{k=0}^{K} \varvec{{w}}_kT_k(\check{\varvec{\Lambda }})\) of the eigenvalue matrix \(\check{\varvec{\Lambda }}\) of the scaled Laplacian \(\check{\varvec{L}}=2\varvec{L}/ \lambda _{\max } - \varvec{I}_n\). Here, \(\varvec{{w}}_k\) are learnable polynomial coefficients, and the scaling ensures that all eigenvalues are within \([-1,1]\).

Substituting this filter into Eq. (6) and exploiting the transformation \(\varvec{g}_{\varvec{{w}}}(\check{\varvec{L}})=\varvec{U}\varvec{g}_{\varvec{{w}}}(\check{\varvec{\Lambda }})\varvec{U}^\intercal \) results in a graph convolution

$$\begin{aligned} \varvec{x}*\varvec{g}_{\varvec{{w}}}&= \varvec{U}\varvec{g}_{\varvec{{w}}}(\check{\varvec{\Lambda }})\varvec{U}^\intercal \varvec{x}= \varvec{U}\left( \sum _{k=0}^{K} \varvec{{w}}_kT_k(\check{\varvec{\Lambda }}) \right) \varvec{U}^\intercal \varvec{x}\nonumber \\ &= \sum _{k=0}^{K}\varvec{{w}}_{k} T_k(\check{\varvec{L}})\varvec{x} \end{aligned}$$
(8)

that avoids the multiplications with \(\varvec{U}\). The Chebyshev polynomials themselves can be calculated recursively as \(T_k(\varvec{a})=2\varvec{a}T_{k-1}(\varvec{a})-T_{k-2}(\varvec{a})\) with \(T_{1}(\varvec{a})=\varvec{a}\) and \(T_{0}(\varvec{a})=\varvec{I}\).
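As a minimal illustration of Eq. (8), the following PyTorch sketch implements a Chebyshev graph-convolution layer via the recursion above; the scaled Laplacian is assumed to be precomputed as a sparse tensor, and the layer is illustrative rather than the exact one used in our models.

```python
# Hedged sketch: Chebyshev spectral graph convolution, cf. Eq. (8).
import torch
import torch.nn as nn

class ChebConv(nn.Module):
    def __init__(self, in_channels, out_channels, K):
        super().__init__()
        # one learnable weight matrix per polynomial order k = 0, ..., K
        self.weight = nn.Parameter(0.01 * torch.randn(K + 1, in_channels, out_channels))

    def forward(self, x, L_scaled):
        # x: (n, in_channels), L_scaled: sparse (n, n) scaled Laplacian L_check
        Tx_prev = x                                   # T_0(L) x = x
        out = Tx_prev @ self.weight[0]
        if self.weight.shape[0] > 1:
            Tx_curr = torch.sparse.mm(L_scaled, x)    # T_1(L) x = L x
            out = out + Tx_curr @ self.weight[1]
            for k in range(2, self.weight.shape[0]):
                Tx_next = 2 * torch.sparse.mm(L_scaled, Tx_curr) - Tx_prev
                out = out + Tx_next @ self.weight[k]  # T_k = 2 L T_{k-1} - T_{k-2}
                Tx_prev, Tx_curr = Tx_curr, Tx_next
        return out
```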

In [36], the Graph Convolutional Network (GCN) is introduced which represents a first order approximation of ChebNet. Often GCNs face overfitting and oversmoothing, which was mitigated in GCN2 [38], where skip connections are used to propagate information over multiple layers. This approach is used in the context of MOR in [37]. Nevertheless, ChebNet yields better results for our example and is consequently used in the following.

2.3.2 Network architecture: a graph convolutional autoencoder with a multilayer perceptron

The network architecture that we use to create a surrogate model on the lowest level is shown in Fig. 3. It consists of (i) a (graph convolutional) autoencoder, which is used to learn a low-dimensional embedding \({\mathcal {Z}}\subseteq \mathbb {R}^{r}\) for the high-dimensional state space \({\mathcal {X}}\), and (ii) a multilayer perceptron (MLP) to capture the parameter- and time-dependencies in the identified low-dimensional latent manifold. The autoencoder consists of an encoder \(\varvec{\Psi }_{\text {enc}}: {\mathcal {X}}\rightarrow {\mathcal {Z}}\) with learnable weights \(\varvec{W}_{\text {enc}}\) mapping the high-dimensional state to a low-dimensional latent representation, i.e. \(\varvec{z}= \varvec{\Psi }_{\text {enc}}(\varvec{x}, \varvec{W}_{\text {enc}})\), and a decoder \(\varvec{\Psi }_{\text {dec}}: {\mathcal {Z}}\rightarrow {\mathcal {X}}\) with learnable weights \(\varvec{W}_{\text {dec}}\) reconstructing the high-dimensional state from the low-dimensional latent representation, i.e. \(\breve{\varvec{x}}= \varvec{\Psi }_{\text {dec}}(\varvec{z}, \varvec{W}_{\text {dec}})\). In the case of graph convolutional layers, \(\varvec{W}_{\text {enc}}\) and \(\varvec{W}_{\text {dec}}\) contain the trainable filters. The multilayer perceptron \(\varvec{\Phi }: {\mathcal {M}}\times {\mathcal {T}}\times {\mathcal {Z}}\rightarrow {\mathcal {Z}}\) maps the parameters, the time, and the encoded initial condition \(\varvec{z}_0\) to the corresponding latent state \(\tilde{\varvec{z}}= \varvec{\Phi }(\varvec{\mu }, t, \varvec{z}_0, \varvec{W}_{\text {mlp}})\) with trainable weights \(\varvec{W}_{\text {mlp}}\). As we only consider simulations starting from the same initial condition in our example, it is neglected in the following.

The complete autoencoder reconstructs a state following

$$\begin{aligned} \breve{\varvec{x}}=\varvec{\Psi }_{\text {dec}}\circ \varvec{\Psi }_{\text {enc}}(\varvec{x}) \end{aligned}$$
(9)

and the surrogate model that captures the (parametric) system dynamics is a function composition of the MLP and the decoder

$$\begin{aligned} \tilde{\varvec{x}}=\varvec{\Psi }_{\text {dec}}\circ \varvec{\Phi }(\varvec{\mu }, t). \end{aligned}$$
(10)

To adjust the weights \(\varvec{W}=\{\varvec{W}_{\text {enc}}, \varvec{W}_{\text {dec}}, \varvec{W}_{\text {mlp}}\}\) of the networks given some data, we minimize the loss function

$$\begin{aligned} {\mathcal {L}}=&\, \gamma _\text {approx}(\varvec{x}-\tilde{\varvec{x}})^2 + \gamma _\text {rec}(\varvec{x}-\breve{\varvec{x}})^2 \end{aligned}$$
(11a)
$$\begin{aligned} =&\,\underset{\text {Decoder and MLP}}{\underbrace{\gamma _\text {approx}(\varvec{x}-\varvec{\Psi }_{\text {dec}}(\varvec{\Phi }(\varvec{\mu }, t, \varvec{W}_{\text {mlp}}), \varvec{W}_{\text {dec}}))^2}} \end{aligned}$$
(11b)
$$\begin{aligned}&+ \underset{\text {Encoder and Decoder}}{\underbrace{ \gamma _\text {rec}(\varvec{x}-\varvec{\Psi }_{\text {dec}}(\varvec{\Psi }_{\text {enc}}(\varvec{x}, \varvec{W}_{\text {enc}}), \varvec{W}_{\text {dec}}))^2}} . \end{aligned}$$
(11c)

The first part of the loss, Eq. (11b), ensures that the surrogate captures the system behavior for given parameters, whereas the second part, Eq. (11c), ensures that the autoencoder is able to reconstruct the state from the latent space well.
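The forward passes of Eqs. (9)–(10) and the weighted loss of Eq. (11) can be summarized in a few lines; the sketch below assumes `encoder`, `decoder`, and `mlp` are callables (e.g. torch modules) and the weighting factors are user-chosen hyperparameters.

```python
# Hedged sketch: surrogate forward passes and the combined loss of Eq. (11).
import torch

def surrogate_loss(x, mu_t, encoder, decoder, mlp, gamma_approx=1.0, gamma_rec=1.0):
    x_breve = decoder(encoder(x))                  # Eq. (9): reconstruction
    x_tilde = decoder(mlp(mu_t))                   # Eq. (10): approximation from (mu, t)
    loss_approx = torch.mean((x - x_tilde) ** 2)   # Eq. (11b)
    loss_rec = torch.mean((x - x_breve) ** 2)      # Eq. (11c)
    return gamma_approx * loss_approx + gamma_rec * loss_rec
```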

Fig. 3

Graph convolutional autoencoder architecture with graph convolutional layers, which operate on the mesh without pooling and change the number of features per node, fully-connected layers in the middle to reduce the dimensionality, and an additional multilayer perceptron to capture the reduced dynamics

2.4 Transfer learning

Once a surrogate is found on the coarsest level, the surrogate modeling process can be repeated on the next level of refinement. However, instead of learning everything from scratch, the finer surrogate uses the output of the already trained coarser surrogate. To transfer the knowledge from one level to another, we connect the finer and the coarser graph representations of the system via down- and upsampling matrices, fix the already trained coarse surrogate, and add its outputs to the latent space and the reconstruction of the fine surrogate, as can be seen in Fig. 4.

Fig. 4

Transfer learning step inside the multi-hierarchical surrogate modeling approach. The weights of the coarse network are fixed, so that the finer surrogate only needs to capture aspects not yet covered

Consequently, the fine model only needs to capture inaccuracies and non-captured system behavior of the coarse one. The general architecture of the finer decoder, encoder, and MLP follow the previous definitions. The following section explains how to construct the multi-hierarchical model.

2.4.1 Multi-hierarchical model

The starting points for creating the multi-hierarchical model are the differently resolved discretizations of the original system. The multi-hierarchical modeling approach starts with creating a surrogate \(\varvec{\Sigma }_{\ell }\) on the deepest, i.e. coarsest level \(\ell \) and just follows the definitions given in Eq. (9) and Eq. (10) but operates on a downsampled state \(\varvec{x}_{\ell }=\mathcal {\varvec{D}}_{\ell } \varvec{x}\) instead of the original state description. Once the coarse surrogate is created, the surrogate modeling process continues to the next finer level.

The weights \(\varvec{W}_\ell \) of the already trained coarse surrogate are fixed to train the finer surrogate \(\varvec{\Sigma }_{\ell \text {-}1}\). The first adjustment compared to the presented standard modeling scheme takes place in the encoding of the system state. Instead of having a single encoder \(\varvec{z}_{\ell } = \varvec{\Psi }_{\text {enc}, \ell }(\varvec{x}_{\ell })\) as previously, the latent state is computed as the sum of two encoders

$$\begin{aligned} \begin{aligned} \varvec{z}_{\ell \text {-}1}&= \varvec{\Psi }_{\text {enc}, \ell \text {-}1} (\varvec{x}_{\ell \text {-}1}) \\&= \varvec{\Psi }_{\text {enc}, \ell }(\mathcal {\varvec{D}}_{\ell \text {-}1}^{\ell } \varvec{x}_{\ell \text {-}1} ) + \varvec{\Psi }_{\text {enc}, \ell \text {-}1}^{*}(\varvec{x}_{\ell \text {-}1} ) \\&= \varvec{\Psi }_{\text {enc}, \ell }(\varvec{x}_{\ell }) + \varvec{\Psi }_{\text {enc}, \ell \text {-}1}^{*}(\varvec{x}_{\ell \text {-}1})\\&= \varvec{z}_{\ell } + \varvec{\Psi }_{\text {enc}, \ell \text {-}1}^{*}(\varvec{x}_{\ell \text {-}1} ) \end{aligned} \end{aligned}$$
(12)

with \(\mathcal {\varvec{D}}_{\ell \text {-}1}^{\ell }\) being the downsampling matrix that maps a state from level \(\ell \text {-}1\) to  \(\ell \). In this context \(\varvec{\Psi }_{\text {enc}, \ell }\) is the trained and fixed encoder from the coarse level and \(\varvec{\Psi }_{\text {enc}, \ell \text {-}1}^*\) is a new trainable encoder.

A similar approach is chosen to reconstruct the state in the physical space. To this end, we rely on the sum of the already trained decoder \(\varvec{\Psi }_{\text {dec}, \ell }\) and a new trainable one \(\varvec{\Psi }_{\text {dec}, \ell \text {-}1}^{*}\), resulting in the refined decoder

$$\begin{aligned} \breve{\varvec{x}}_{\ell \text {-}1}&= \varvec{\Psi }_{\text {dec}, \ell \text {-}1}(\varvec{z}_{\ell \text {-}1}) \nonumber \\&= \mathcal {\varvec{U}}_{\ell }^{\ell \text {-}1}\varvec{\Psi }_{\text {dec}, \ell }(\varvec{z}_{\ell \text {-}1}) + \varvec{\Psi }_{\text {dec}, \ell \text {-}1}^{*}(\varvec{z}_{\ell \text {-}1}) \nonumber \\&= \mathcal {\varvec{U}}_{\ell }^{\ell \text {-}1}\breve{\varvec{x}}_{\ell } + \varvec{\Psi }_{\text {dec}, \ell \text {-}1}^{*}(\varvec{z}_{\ell \text {-}1}) \end{aligned}$$
(13)

where \(\mathcal {\varvec{U}}_{\ell }^{\ell \text {-}1}\) describes the upsampling matrix from level \(\ell \) to \(\ell \text {-}1\). This static and error-prone upsampling matrix can be replaced with an adaptive learnable upsampling scheme to further minimize the error. In this work, we decided to use a simple linear fully-connected layer of the form

$$\begin{aligned} \varvec{x}_{\ell \text {-}1} \approx \varvec{\varTheta }^{\ell \text {-}1}_{\ell }(\varvec{x}_{\ell }, \varvec{W}_{\text {up}, \ell \text {-}1}) :=\varvec{W}_{\text {up}, \ell \text {-}1}^{\setminus \varvec{0}}\varvec{x}_{\ell } + \varvec{W}_{\text {up}, \ell \text {-}1}^{\varvec{0}} \end{aligned}$$
(14)

which proved sufficient in experiments to significantly reduce the upsampling error while still maintaining limited computational effort. The trainable parameters consist of the weights \(\varvec{W}_{\text {up}, \ell \text {-}1}^{\setminus \varvec{0}}\) and the bias \(\varvec{W}_{\text {up}, \ell \text {-}1}^{\varvec{0}}\). Replacing the former upsampling matrix with Eq. (14) in Eq. (13) leads to the decoder formulation we are using in this work

$$\begin{aligned} \varvec{\Psi }_{\text {dec}, \ell \text {-}1}(\varvec{z}_{\ell \text {-}1}) = \varvec{\varTheta }^{\ell \text {-}1}_{\ell }(\varvec{\Psi }_{\text {dec}, \ell }(\varvec{z}_{\ell \text {-}1})) + \varvec{\Psi }_{\text {dec}, \ell \text {-}1}^{*}(\varvec{z}_{\ell \text {-}1}). \end{aligned}$$
(15)

The refined multilayer perceptron, in contrast, uses the previous one’s output as an additional input

$$\begin{aligned} \tilde{\varvec{z}}_{\ell \text {-}1}&= \varvec{\Phi }_{\ell \text {-}1}(\varvec{\mu }, t) = \varvec{\Phi }^{*}_{\ell \text {-}1}(\varvec{\mu }, t, \varvec{\Phi }_{\ell }(\varvec{\mu }, t))\nonumber \\&= \varvec{\Phi }^{*}_{\ell \text {-}1}(\varvec{\mu }, t, \tilde{\varvec{z}}_{\ell }(\varvec{\mu }, t)). \end{aligned}$$
(16)

To create the next finer surrogate model \(\varvec{\Sigma }_{\ell \text {-}2}\), the same procedure is repeated but this time with \(\varvec{\Sigma }_{\ell \text {-}1}\) serving as coarse model.
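The refinement step of Eqs. (12), (15), and (16) can be sketched as follows, assuming the coarse surrogate exposes `encode`, `decode`, and `latent_dynamics` methods; the learnable upsampling of Eq. (14) is realized here as a plain linear layer, and all module names are illustrative rather than taken from the paper's code.

```python
# Hedged sketch: transfer-learning refinement from level l to level l-1.
import torch
import torch.nn as nn

class RefinedSurrogate(nn.Module):
    def __init__(self, coarse, enc_new, dec_new, mlp_new, D_to_coarse, n_coarse, n_fine):
        super().__init__()
        self.coarse = coarse                      # trained surrogate of level l
        for p in self.coarse.parameters():        # freeze coarse weights
            p.requires_grad = False
        self.enc_new, self.dec_new, self.mlp_new = enc_new, dec_new, mlp_new
        self.register_buffer("D", D_to_coarse)    # dense {0,1} matrix, level l-1 -> l
        self.up = nn.Linear(n_coarse, n_fine)     # learnable upsampling, Eq. (14)

    def encode(self, x_fine):                     # Eq. (12)
        return self.coarse.encode(x_fine @ self.D.T) + self.enc_new(x_fine)

    def decode(self, z):                          # Eq. (15)
        return self.up(self.coarse.decode(z)) + self.dec_new(z)

    def latent_dynamics(self, mu_t):              # Eq. (16)
        z_coarse = self.coarse.latent_dynamics(mu_t)
        return self.mlp_new(torch.cat([mu_t, z_coarse], dim=-1))
```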

To enable a comparison among all levels, it is of interest to transform the approximations of the surrogate models back into the original finely discretized state space. For the upsampling of the coarse approximation to the original discretization, the static upsampling matrices \(\mathcal {\varvec{U}}_{l}^{0},\ 1\le l\le \ell \) are used

$$\begin{aligned} \tilde{\varvec{x}}_{l}^{0} = \mathcal {\varvec{U}}_{l}^{0}(\tilde{\varvec{x}}_{l}). \end{aligned}$$

The presented framework offers many adjustments to adapt it to one’s own needs, and the presented design choices represent only one suitable configuration. Some specific variations are mentioned in the following.

2.5 Alternative architectures

One point at which adjustments can be made to the presented architecture are the refined versions of the encoder Eq. (12), decoder Eq. (15), and MLP Eq. (16). Instead of using additive transfer learning for the encoder, the results from several layers could be concatenated, which leads to a higher latent dimension. This drawback, along with the absence of a performance boost in numerical experiments, has led to the abandonment of this idea. Furthermore, adding the output of the coarse decoder to the input of the fine one, as we did for the MLP, would significantly increase the number of input dimensions and is generally difficult for graph convolutions.

Another design modification can be made in the adaptive upsampling Eq. (14), either by replacing the proposed upsampling mapping, e.g. with another type of layer, by optimizing the sparse upsampling matrix \(\mathcal {\varvec{U}}_{l}^{m}\), or by carefully selecting the nodes that require a refinement.

2.5.1 Adaptive refinement

The idea of adaptive refinement is that only areas of certain interest or high error are refined in the surrogate modeling process. That means that only the coarsest surrogate is trained on the precomputed discretizations. All subsequent finer levels will only use the refined version for areas where it is desired.

A possible data-based approach to select those areas is to calculate a suitable error of the coarse surrogate on a validation dataset and then choose those nodes that have the highest error or that violate an error threshold. We refer to them as faulty nodes. For the selected nodes, all neighboring nodes in the next finer graph are added, see Fig. 5. In the next step, the adjacency matrix defining the resulting graph and suitable up- and downsampling matrices need to be computed.

Fig. 5

Adaptive selection of faulty nodes, i.e. of nodes with errors above a defined tolerance measured on validation data, to refine meshes in areas of high error

While this approach gives special consideration to areas of interest, e.g., areas with high variability, and is appealing due to the smaller resulting models and the possibility to include error tolerances (for validation data), it also leads to a complicated framework. In addition, it introduces several design decisions, such as how to define the neighborhood of a refined node. Furthermore, no notable performance boost could be observed in our numerical experiments compared to our vanilla version, and the reduction in computational costs is minimal. Consequently, we only present the results of the more comprehensive vanilla approach in the following.

2.5.2 Unified latent representation

The multi-hierarchical representations of the original system not only enable multiple surrogate models to be trained one after another but can also be used for simultaneous training. In such an approach, the different latent representations for every level \(\varvec{z}_l, \ 1\le l\le \ell \) could be exchanged for one unified description so that the model distinction only takes place in the decoders. This advantage comes at the cost of not being able to stop the refinement at an arbitrary level and of losing the ability to learn the global behavior easily and quickly in a simple representation.

3 Numerical example of a racing kart

Insights from physics-based high-fidelity models, like explicit structural dynamical FE simulations, are crucial for scenario-based testing and computer-aided engineering applications. One domain where scenario-based testing is particularly relevant is integral vehicle safety. While we do not aim to conduct an industry-relevant crash simulation in this paper, we aim to follow an approach in which the general procedure is similar to the one that usually occurs in such a setting, i.e. under closed-source commercial software using explicit time stepping schemes and with limited data. Accordingly, we do not consider a full-scale vehicle model but a simplified frame of a racing kart that still offers aspects such as scenario variations with multiple parameter dependencies, nonlinear material and contact behavior with plastic deformations, while being computationally tractable and easy to comprehend. In contrast to full-scale vehicle models, it lacks different zones like crumple zones for energy absorption, a safety cage around the occupants, or the occupants themselves. The considered model, consequently, represents a complex example that is closer to application than other frequently used ones in data-driven MOR and bridges the gap between classic academic examples and full-scale industrial models. Overall, the aim of this work is to develop a methodology that is capable of quickly and accurately approximating the dynamic behavior of a structural dynamical system in a hierarchical fashion. Please note that a complete investigation of crashworthiness or the creation of surrogate models for such an investigation is accordingly not the subject of this paper.

3.1 A racing kart frontal collision simulation

The high-fidelity model considered in the following experiments represents the frame of a racing kart, which is pictured in Fig. 6a. The frame itself is responsible for the essential dynamic behavior of a kart [64] and is therefore interesting for the investigation of crash behavior. The remaining parts of the kart, like its wheels, vehicle shell, engine, and driver, are replaced by point masses to render a more tractable model. Slight variations of this model have already been used in [19, 65]. The frame is realized as a finite element model in the commercial software tool LS-Dyna. It is constructed out of steel pipes which are modeled as thin-walled tubes using shell elements, resulting in \(n=9314\) nodes, each with \(n_{c}^{(0)}=3\) translational degrees of freedom and the same number of rotational ones.

For the task of creating a surrogate for the kart model, we are interested in approximating the kart’s behavior in a defined scenario parameterized by the simulation parameters \(\varvec{\mu }\) and the time \(t\). The considered scenario describes a frontal collision of the kart against a rigid wall under varying conditions. It should be noted that other scenarios, such as side- and rear-impact or grazing accidents, fall outside the scope of this work. However, they must be considered when conducting scenario-based testing and in-depth safety evaluations. Moreover, we limit the quantity of interest to the displacements as they define all occurring deformations and can serve as a starting point to generate other quantities of interest like stress (e.g. von Mises stress) using standard FEM tools, as is done in [66]. A direct approximation of stress values through data-driven surrogate models is possible as well and has been investigated in a previous study for a continuum-mechanical musculoskeletal system [23]. However, this is outside the scope of the current study, just like the approximation of other quantities such as decelerations, energy absorption, or forces acting on occupants that are of interest for a thorough crashworthiness investigation. The initial conditions are neglected in the modeling process of the surrogates as all simulations start from the same initial condition. The displacement of the \(p\)-th node of the \(s\)-th simulation at time \(t\) is denoted by \(\varvec{q}_{p}^{s}(t)=\left[ q_{p}^{s,x}(t), q_{p}^{s,y}(t), q_{p}^{s,z}(t)\right] \in \mathbb {R}^{3}\), where the superscripts \(x,\ y,\ z\) represent the corresponding coordinate direction.

The parameters defining the frontal collision are the impact speed \(\mu _1\in [5, 35]\,\hbox {m s}^{-1}\), impact angle \(\mu _2\in [-45, 45]\,^{\circ }\), and yield stress \(\mu _3\in [168, 758]\,\hbox {MPa}\). The impact angle describes the angle between the normal of the wall and the orientation of the kart, whereas the yield stress affects the effective plastic stress–strain curve of the kart’s material. The course of the curve corresponds to that of a typical steel for which the initial value is determined by the individual yield stress \(\mu _3\) of each simulation, see Fig. 6c. Each crash simulation covers a simulation time of 30 ms and the simulation results are exported with a sampling time of 0.3 ms, resulting in \(\eta =101\) samples per simulation, while the internal step size during simulation is much smaller and adaptively chosen. In total, \(n_s=128\) quasi-random parameter combinations are sampled using Halton sequences. From the resulting high-fidelity simulation results, \(n_s^{\text {train}}=96\) are used for the generation of the surrogate models and \(n_s^{\text {test}}=32\) serve as test data. The simulation results are concatenated in the data matrix \(\varvec{X}\in \mathbb {R}^{N\times n_s^{\text {train}}\eta }\) consisting of the system states \(\varvec{x}\in {\mathcal {X}}\subseteq \mathbb {R}^{N}\) at different times and simulations. Two example simulations are showcased in Fig. 6b. All simulation results as well as the kart’s source files are published and freely available under [67]. The major goal of our paper is to derive a surrogate model that can reproduce the high-fidelity simulation results of the kart in multiple resolutions with high accuracy and low computational times. Conventional approaches reach their limits in doing so due to the model’s complexity.

Fig. 6

Image of the considered racing kart (a) along with two example simulations of the modeled frame (b), the material’s used stress–strain curve (c), and the singular values for the simulated data (d)

Model complexity To showcase the complexity of the presented kart simulation model in the context of MOR, we consider the course of the normalized singular values of the high-fidelity simulation results \(\varvec{X}\) in Fig. 6d. The magnitude of each singular value reflects the importance of the corresponding reduced basis vector for describing the data. If a few singular values are dominant, then the data can be described well with a linear combination of only a few reduced basis vectors. If not, a non-negligible error is introduced or more basis vectors must be used. Accordingly, the singular values can serve as an indicator for the Kolmogorov n-width [21, 68]

$$\begin{aligned} d_n(\varvec{F}({\mathcal {M}})) :=\underset{{\mathcal {X}}_{n} \subseteq {\mathcal {X}}}{\inf } \ \underset{\varvec{x}\in {\mathcal {S}}_{{\mathcal {M}}}}{\sup } \ \underset{\tilde{\varvec{x}}_n\in {\mathcal {X}}_{n}}{\inf } \Vert \varvec{x}-\tilde{\varvec{x}}_n \Vert , \end{aligned}$$

which quantifies how well the solution manifold \({\mathcal {S}}_{{\mathcal {M}}}\) of all parameters can be approximated by an optimal n-dimensional linear trial subspace \({\mathcal {X}}_{n}\subseteq {\mathcal {X}}\), measured by the worst-case distance between any point of the manifold and its best approximation in that subspace. For the considered problem, the intrinsic dimension of the solution space is at most equal to the number of parameters plus one for the time, resulting in \(r=4\). However, since not only the first four but also the subsequent singular values make significant contributions, it can be assumed that linear reduction methods such as PCA lead to appreciable errors. Hence, we apply the proposed multi-hierarchical surrogate modeling scheme to the kart model, for which it needs to be represented as a graph.
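The singular-value indicator discussed above can be computed directly from the snapshot matrix; the following sketch uses a small random placeholder instead of the actual data \(\varvec{X}\) and is purely illustrative.

```python
# Hedged sketch: normalized singular-value decay of the snapshot matrix, cf. Fig. 6d.
import numpy as np

X = np.random.rand(300, 200)                 # placeholder for the (N x n_s*eta) snapshot matrix
s = np.linalg.svd(X, compute_uv=False)       # singular values in descending order
s_normalized = s / s[0]                      # normalized decay
energy = np.cumsum(s**2) / np.sum(s**2)      # relative information content
r_999 = int(np.searchsorted(energy, 0.999)) + 1   # basis size capturing 99.9 % of the energy
```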

Representation as a Graph To represent the kart as a graph, we directly work with its FE formulation. The nodes of the FE model serve as vertices \(\varvec{{\mathcal {N}}}\) of the graph, and the element definitions specify the node connectivity, i.e. the edges \({\mathcal {E}}\) of the graph and the adjacency matrix \(\varvec{A}\). The displacements serve as node features, i.e. they represent the system states \(\varvec{x}:=\varvec{q}\), for the graph convolutional based surrogates. For the other surrogates that do not operate on graphs, the displacements are vectorized, i.e. \(\varvec{x}:=\left[ \varvec{q}_{1}^{s}, \dots , \varvec{q}_{n}^{s}\right] ^\intercal \in \mathbb {R}^{3n}\). Consequently, the dataset

$$\begin{aligned} {\mathcal {D}}:=\begin{Bmatrix} \begin{bmatrix} t_0 & \dots & t_{\eta -1}\\ \varvec{\mu }_s& \dots & \varvec{\mu }_s\\ \end{bmatrix}, \begin{bmatrix} \varvec{x}^{s}(t_0)&\dots&\varvec{x}^{s}(t_{\eta -1}) \end{bmatrix} \end{Bmatrix}_{s=1}^{n_s^{\text {train}}}, \end{aligned}$$
(17)

contains the time \(t\) and the parameters \(\varvec{\mu }\) as input for the MLP and the corresponding displacements as target values.
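A possible assembly of the dataset in Eq. (17) from the exported simulation results is sketched below; the array names and shapes are assumptions about how the data might be stored, not the paper's actual data layout.

```python
# Hedged sketch: building MLP inputs (t, mu) and displacement targets, cf. Eq. (17).
import numpy as np

def build_dataset(times, params, displacements):
    # times: (eta,), params: (n_s, 3), displacements: (n_s, eta, N)
    n_s, eta, N = displacements.shape
    inputs, targets = [], []
    for s in range(n_s):
        for j in range(eta):
            inputs.append(np.concatenate(([times[j]], params[s])))
            targets.append(displacements[s, j])
    return np.asarray(inputs), np.asarray(targets)

# Example with small placeholder data:
inp, tgt = build_dataset(np.linspace(0.0, 0.03, 101),
                         np.random.rand(4, 3),
                         np.random.rand(4, 101, 300))
```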

4 Results & discussion

To highlight the performance of the proposed approach, we present numerical results for the aforementioned racing kart frontal collision simulation. The created surrogates are rated regarding their training phase, approximation quality in the coarse and the original representation, and their computing time. All following results, with the exception of the finite element simulations, were produced on an Apple M1 Max with a 10-Core CPU, 24-Core GPU, and 64 GB of RAM. To compare the proposed framework with more classic approaches, we generate surrogate models that follow the description of Sect. 2.3.2 but operate directly on the original (not downsampled) data and use either proper orthogonal decomposition, a fully connected autoencoder, or a graph convolutional autoencoder for the reduction step. We refer to them as PODNN, AENN, and GAENN, whereas the surrogates using the multi-hierarchical approach with graph convolutional autoencoders on the different levels are referred to as MH1, MH2, and MH3 (from finest to coarsest surrogate). The chosen architectures are listed in Table 1.

The MH encoders consist of several graph convolutional layers with ELU activation functions. Each graph convolution maintains the signal dimension \(n\) but changes the number of channels \(n_{c}\). The graph convolutions are followed by a dense layer with linear activation function to map the input to the latent dimension. The decoder follows the same architecture in reverse order. All dimensionality reduction networks (POD, AE, GAE, MH1, MH2, MH3) are combined with similar MLPs to predict the latent state based on the simulation parameters. Each one consists of several fully-connected layers with ELU activation function and a final dense layer with linear activation function. The surrogates are trained for 1500 epochs, and the weights with the lowest total loss Eq. (11a) are then used for subsequent predictions. Another noteworthy aspect showcased in Table 1 is that the graph convolutional networks possess far fewer parameters than a comparable multi-layer fully-connected architecture.

Table 1 Model Architectures

4.1 Training comparison between fine and coarse models

Before we evaluate the actual performance of the surrogate models, let us first take a look at the training phase. For an overview of the data on which the different surrogate models are trained and which variables are optimized in this process, please refer to Table 2. A drawback of the used graph convolution is the associated computational cost, which arises, among other things, from the recursive computation of the Chebyshev polynomials. Consequently, the time to train a graph convolutional surrogate model on the full model significantly exceeds the training time of a classic fully-connected autoencoder, as shown in Fig. 7a. If, on the contrary, the surrogate is created using the multi-hierarchical approach, the tide turns. The training time is reduced to such an extent that the model operating on the coarsest representation trains even faster than the classical autoencoder on the full model. Even when adding up the training time of all three levels used for the kart example, the total time is still of a comparable order of magnitude and remains more than ten times faster than training the GAENN on the full model.

Table 2 Overview of surrogate models showing which data is used and which parameters are optimized

Considering the computing time \(\Delta T\) required for one prediction of the surrogates, a similar picture emerges, as depicted in Fig. 7b. The GAENN requires by far the most time, but our approach can substantially mitigate this effect. A surrogate that simply uses POD for the dimensionality reduction outperforms the other models, as the reconstruction to the fine physical space only requires one matrix multiplication. Regarding the MH approach, the training and computational time logically increases with every level as the degree of resolution rises. Noteworthy in this context is that the time to obtain a prediction in the fine original representation is not much higher than that for the coarse representations. This is due to the fact that the upsampling follows Eq. (4) and consequently only requires a sparse matrix multiplication.
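As an illustration of why the lifting step is cheap, the following sketch performs the upsampling as one sparse matrix product; the operator used here is a random stand-in, not the actual matrix from Eq. (4):

```python
import numpy as np
from scipy import sparse

# Placeholder sketch: lift a coarse-level prediction to the original mesh with a
# single sparse matrix product. The operator below is a random stand-in (one
# coarse node assigned to each fine node), not the actual upsampling of Eq. (4).
n_fine, n_coarse = 10_000, 500
rows = np.arange(n_fine)
cols = np.random.randint(0, n_coarse, size=n_fine)        # hypothetical node assignment
U = sparse.csr_matrix((np.ones(n_fine), (rows, cols)), shape=(n_fine, n_coarse))

x_coarse = np.random.rand(n_coarse, 3)                    # coarse surrogate prediction
x_fine = U @ x_coarse                                      # one sparse product per time step
```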

Fig. 7
figure 7

Computational time required to train the models (a) and to compute a prediction (b) for the kart model on the fine and the coarse mesh using the proposed multi-hierarchical approach and standard approaches

In addition to computation times, training the MH models provides additional insights. For an evaluation of the transfer learning, we consider the progression of the loss during training in Fig. 8. It becomes apparent that all losses drop significantly lower with each refinement of the models. Consequently, the saved information of the coarser models helps the finer ones to improve their performance and prevents already known structures from having to be learned twice. In particular, the reconstruction benefits greatly from the transfer learning, see Fig. 8c, but the overall approximation improves with each level as well, see Fig. 8b.

Fig. 8
figure 8

Training history for the overall loss (a), the approximation loss (b), and the reconstruction loss (c) of the MH models for the racing kart example. The loss functions are smoothed for a clearer representation, whereas the true values are drawn with transparency in the background. It can be seen that the loss decreases significantly with each refinement compared to the previous coarser level

4.2 Evaluation on coarse levels

In the final comparison of the surrogate models, the performance metrics are always measured in the original discretization of the model so that a comparison between the MH models and models that are trained directly on the original data is possible. However, there are two reasons why it is worth considering the performance in the coarse discretizations first: on the one hand, the MH models are intended to be evaluated in the coarse representation; on the other hand, the error induced by the upsampling into the original space does not enter the evaluation this way. Furthermore, we can compare how the graph convolutional MH models perform in comparison to standard surrogate models on each level and thus justify their use.

For a comparison of the performance we utilize the averaged Euclidean distance between the nodes of the reference FE simulation and their approximation

$$\begin{aligned} e_{2}^{s}(t) = \frac{1}{n}\sum _{p=1}^{n} \left\Vert q_{p}^{s}(t) - \tilde{q}_{p}^{s}(t) \right\Vert _2 \end{aligned}$$

at time \(t\) of the \(s\)-th simulation as well as the maximum occurring Euclidean distance among all nodes

$$\begin{aligned} e_{2, \max }^{s}(t) = \max _{p\in \{1,\dots ,n\}} \left\Vert q_{p}^{s}(t) - \tilde{q}_{p}^{s}(t) \right\Vert _2 . \end{aligned}$$

Moreover, the mean value over the time and all test simulations

$$\begin{aligned} {\hat{e}}_{2} = \underset{s\in \{1, \dots , n_s^{\text {test}} \}}{\text {mean}}\, \underset{t\in {\mathcal {T}}}{\text {mean}}\, e_{2}^{s}(t) \end{aligned}$$

as well as the corresponding mean maximum error

$$\begin{aligned} {\hat{e}}_{2}^{\max } = \underset{s\in \{1, \dots , n_s^{\text {test}} \}}{\text {mean}}\, \underset{t\in {\mathcal {T}}}{\text {mean}}\, e_{2, \max }^{s}(t) \end{aligned}$$

are used to represent the approximation quality for the complete test data. Please note that capturing other quantities that are of interest in classic crashworthiness investigations is out of scope for this study; the mean node error serves as an easy-to-comprehend measure to compare our proposed method to other surrogate modeling techniques.
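For reference, the error measures defined above translate directly into a few lines of NumPy; the array shapes are an assumption of this sketch:

```python
import numpy as np

def node_distance_errors(q_ref, q_approx):
    """Euclidean node-distance errors for one test simulation.
    q_ref, q_approx: (n_nodes, 3, n_time) node positions (shape is an assumption)."""
    dist = np.linalg.norm(q_ref - q_approx, axis=1)   # (n_nodes, n_time)
    e2 = dist.mean(axis=0)                            # averaged error e_2^s(t)
    e2_max = dist.max(axis=0)                         # maximum error e_2,max^s(t)
    return e2, e2_max

def mean_over_tests(per_simulation_errors):
    """Mean over time and over all test simulations, cf. the hatted quantities."""
    return float(np.mean([e.mean() for e in per_simulation_errors]))
```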

The first investigation on the coarse meshes is conducted to support the hypothesis stated at the beginning: that the degrees of freedom result from the necessity of the modeling method. Therefore, finite element models that only differ in their discretization are generated, and the same scenario is simulated for all of them. As shown in Fig. 9, the node distance between the coarse FE models and the reference positions of the corresponding selected nodes in the original mesh is large. The results reveal qualitatively different dynamic behavior and confirm the need for a fine resolution when using the finite element method. Note that the coarse FE models are only produced with the presented downsampling approach and not with a proper mesh simplification method for finite element models. Nevertheless, the results show how much conventional methods rely on a fine resolution.

Fig. 9
figure 9

Mean (a) and maximum (b) averaged node displacement error of the kart model on different coarse meshes using the proposed multi-hierarchical approach and a standard PODNN approach. The error is measured once for the plain reconstruction of the state and once for the reconstruction of the approximated state. Note that the FEM errors are calculated for a slightly different reference model for technical reasons (connection of point masses to model)

Considering the node distance error of the MH models and PODNN surrogates on the different discretizations, two essential points are noteworthy. On the one hand, all MH models outperform the POD-based surrogate. Accordingly, the graph convolutional architecture works well on the coarsest representation and justifies its use compared to other similarly applicable architectures. On the other hand, the error decreases with every additional level for the MH models. When the error is viewed in the original fine resolution of the kart, this is not surprising, since the upsampling error decreases with each level. On the coarse discretizations, however, this clearly indicates that transfer learning helps to lower the error at each level.

It is important to emphasize which dynamic effects are learned at which level. For a visual illustration, the learned behavior at each level for an example simulation is given in Fig. 10. It showcases each model's approximation after subtracting the prediction already provided by the coarser levels, i.e.

$$\begin{aligned} \tilde{\varvec{x}}_{\ell \text {-}1}^{0} - \tilde{\varvec{x}}_{\ell }^{0}&= \\&\mathcal {\varvec{U}}_{\ell \text {-}1}^{0} \varvec{\Psi }_{\text {dec}, \ell \text {-}1}(\varvec{\Phi }_{\ell \text {-}1}(\varvec{\mu }, t)) - \mathcal {\varvec{U}}_{\ell \text {-}1}^{0}\mathcal {\varvec{U}}_{\ell }^{\ell \text {-}1} \varvec{\Psi }_{\text {dec}, \ell }(\varvec{\Phi }_{\ell }(\varvec{\mu }, t)) . \end{aligned}$$

Clearly, the global dynamic behavior is already captured by the coarsest surrogate (Level 3), where a strong deflection of the front fork and a rotation of the entire kart occur. In the finer ones (Level 2 and Level 1), minor deformations (especially in areas where the coarser levels lack degrees of freedom) are captured to compensate for local errors.
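Written out with hypothetical callables for the decoders \(\varvec{\Psi }_{\text {dec}}\), the latent regressors \(\varvec{\Phi }\), and the sparse upsampling operators, the level-wise contribution visualized in Fig. 10 could be computed as follows:

```python
# Hedged sketch (hypothetical callables and operators): contribution learned at
# level l-1 on top of the coarser level l, both lifted to the original mesh (level 0).
def level_contribution(mu, t, Phi_prev, Psi_dec_prev, U_prev_to_0,
                       Phi_curr, Psi_dec_curr, U_curr_to_prev):
    x_prev = U_prev_to_0 @ Psi_dec_prev(Phi_prev(mu, t))                      # level l-1
    x_curr = U_prev_to_0 @ (U_curr_to_prev @ Psi_dec_curr(Phi_curr(mu, t)))   # level l
    return x_prev - x_curr
```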

Fig. 10
figure 10

Learned behavior at the different levels. Global behavior is learned at the coarsest, i.e., deepest level. The learned behavior is then transferred to the finer levels, where only minor adjustments are made

4.3 Approximation quality

In a final comparison, we validate the surrogates' performance in the original model discretization. We refer to Fig. 11, where the mean as well as the maximum node distance error over time is shown for the different models. Interestingly, the graph convolutional autoencoder-based surrogate without multi-hierarchical structure fails to capture the dynamics and consequently has the largest error. The surrogate using linear reduction in the form of POD struggles to approximate the intervals of high dynamics, as only \(r=4\) reduced basis vectors are not expressive enough to describe all complex deformations occurring in the simulations.

The AENN surrogate model relying on a classic autoencoder already shows promising results, indicating the benefits of a nonlinear dimensionality reduction. Nevertheless, even the coarsest MH model beats it on average, and the mean Euclidean distance decreases with each subsequent finer level, although the performance increase subsides. For the maximum error, these observations change. The coarsest model is not able to beat the AENN surrogate model, and only the subsequent finer models lead to a superior performance. Interestingly, the performance boost does not stagnate as more levels are added; instead, the maximum error continues to decrease significantly. This suggests that even if the overall performance does not increase significantly after a certain level of detail at finer resolutions, highly error-prone areas still benefit greatly from adding more detail. The most important performance indicators are summarized in Table 3 to give the main results at a glance.

Fig. 11
figure 11

Mean (a) and maximum (b) node distance error over time for all test simulations achieved by different surrogate models. The individual test simulations are drawn transparently, while the mean value of all test simulations is shown opaquely

Table 3 Performance measurements

4.4 Discussion

Our results show that the proposed multi-hierarchical surrogate modeling scheme is suitable for creating various reduced order models for the considered kart simulation model. We captured the transient dynamics, including massive plastic deformations of the considered kart's frame resulting from nonlinear contact under multiple parameter dependencies. In particular, our method outperforms standard approaches regarding accuracy while still maintaining competitive computational costs and possessing fewer parameters. Moreover, the MH surrogates can directly operate on the coarse (less memory-demanding) representations of the system that are still visually interpretable, making them well suited for graphical applications in hardware-restricted use cases. Nevertheless, their predictions can be lifted into the original system description without adding much computational effort by a simple sparse matrix multiplication. We showed that the global dynamic behavior occurring in the investigated crash scenario is already captured by the coarsest surrogate and that the finer ones only need to learn microscale effects. Along with this, we were able to determine that the surrogates' accuracy increases with each refinement even in the coarse domains. This effect is also reflected in the course of the loss during the networks' training phase, where the loss dropped significantly faster for the finer models. These observations lead to the conclusion that the transfer learning helps the models converge closer to the reference solution. Furthermore, the proposed architecture offers multiple points for adjustments and extensions, as stated in Sect. 2.5. However, those benefits come at the expense of a few disadvantages and limitations.

Our approach requires knowledge about the internal (geometrical) structure of a given system. Consequently, data alone is not enough. Furthermore, the model simplification is performed based on the spatial properties of a given system, so that other quantities of interest might be lost or must be accounted for in the mesh simplification process. This process itself adds computational effort to the offline phase, which is negligible compared to the training effort for the networks. Additionally, many new design choices and hyperparameters are introduced by the multi-hierarchical architecture and the use of graph convolutions. This complicates the surrogate modeling process compared to more straightforward approaches like POD in combination with neural networks. However, even without extensive fine-tuning, the MH models are able to beat the conventional methods.

As the proposed framework is substantially built upon the available high-fidelity data, the results heavily depend on its quality, and extrapolation can lead to major challenges. To circumvent this issue, the consideration of low-fidelity data to improve the surrogate models is an interesting future research direction. In our current approach, all hierarchical models are derived from the high-fidelity data only. However, it is conceivable to use low-fidelity FE models for parameters outside the considered training data. Those low-fidelity models could be obtained from the coarse meshes, and their results could be incorporated into the surrogate models, similar to multi-fidelity approaches that use cheap low-fidelity models to improve high-fidelity predictions [69, 70] and learn the resulting residual [71], for example.

Another decision worth discussing is the choice of graph convolutions for dimensionality reduction. The multi-hierarchical framework itself works with arbitrary data-driven reduction methods, and consequently the GCNNs can be replaced with other methods as well. Nevertheless, as the mesh simplification already operates on graphs, it is an obvious choice to exploit this structure in the data. As shown in the results, the graph convolution-based surrogate on the coarse mesh beats a linear reduction technique by far, even though no transfer learning takes place in this coarsest representation. Furthermore, the graph convolutions are an architecture that benefits greatly from the multi-hierarchical approach: the computational time savings have a much greater impact, as the used convolutions are computationally expensive per se. As the convolutions use parameter sharing along the filters, the networks using them require fewer trainable parameters but still represent the data more expressively within our framework. Interestingly, a graph convolution-based surrogate operating on the original fine mesh failed to capture the system adequately, which may be caused by oversmoothing issues [72, 73], the difficulty of transporting information between distant nodes in such a fine mesh [74], and the spectral bias [75]. Consequently, the MH approach not only facilitates a successful learning process but makes it possible in the first place.

5 Conclusion

In this paper, we derived a structured surrogate modeling scheme producing efficient yet accurate models for a kart simulation model in a frontal collision scenario despite its complexity and inaccessible source code. The surrogates require only as many parameters as state-of-the-art linear counterparts, while outperforming even conventional nonlinear data-driven competitors. To achieve this, our scheme operates on various representations of the kart model with different resolutions instead of relying on a single high-resolution discretization. This naturally facilitates the approximation of multiscale effects, as global dynamics can be learned at coarse resolutions, while microscale dynamics are captured at finer resolutions. In addition, we use low-resolution approximations to ease the learning process and improve the accuracy of medium- and fine-resolution approximators by transferring knowledge across levels so that finer models only need to capture residuals. Sparse matrix multiplications or adaptive upsampling networks are used to switch between resolutions.

The surrogates on a single level are built of graph convolutional autoencoders for the discovery of suitable low-dimensional representations of the data and fully connected neural networks that cover the parameter-dependent latent dynamics. In doing so, the resulting surrogate models achieve satisfactory accuracy despite the comparably low number of parameters. The hierarchical approach also speeds up the learning process for the graph convolutional surrogates, as it eliminates the need to work with the original fine-resolution data and creates multiple models with varying memory and computational demands, all operating in visually and physically interpretable domains.

However, the involved mesh simplification process is based on spatial criteria, and thus other information may be lost in the process. Moreover, similar to other nonlinear reduction techniques, it shows its advantages especially when the system is reduced to its intrinsic size; for large latent spaces, conventional linear methods can still achieve competitive results. For a thorough investigation of crashworthiness, the suitability of the presented method for full-scale car crash simulation models must be investigated in the future, and at the same time attention must be paid to more holistic surrogates. This means that a wider range of scenarios as well as more quantities, such as decelerations and forces, must be considered in addition to the deformations. Another limitation it shares with data-driven reduced order models is the lack of extrapolation capability. To remedy this disadvantage, low-fidelity FE models, which may come directly from coarser discretizations, can be embedded in the future for parameter combinations outside the training data. This would eliminate the current framework's drawback of being based only on expensive high-fidelity data. Moreover, to continue this promising path, more recent graph convolutional architectures can be used, and all hierarchical models can be covered with a single latent variable.