1 Introduction

The recent advancements in high-performance computing and machine learning (ML) techniques have led to significant strides in neural network (NN) applications across various fields, including structural optimization [1, 2], flow prediction [3,4,5], additive manufacturing [6, 7], and the exploration of structure–property relations [8, 9]. Besides approximating the underlying physics via simulation data (data-driven NNs), various physics-informed NNs have been proposed that embed physical principles directly into the loss function, thereby alleviating the need for simulation data [10,11,12,13,14,15]. Trained NNs have been successfully used as efficient surrogate models in inverse design [1, 16] and design optimization [17, 18]. For a comprehensive overview of the application of NNs in computational mechanics, see [19].

NNs can be trained to approximate a particular solution, or they can be trained to approximate the underlying physics/mathematical operator for a class of problems in what is known as operator learning [20]. Previous researchers have proposed different architectures for operator learning, two notable ones being the Fourier neural operator (FNO) and the Deep Operator Network (DeepONet). FNO was first proposed by Li et al. [21] to solve partial differential equations with parametric inputs. Inspired by the use of the Fourier transform in solving differential equations, FNO passes the input function through multiple Fourier layers and maps the encoded information onto the output function space. Each Fourier layer takes the fast Fourier transform (FFT) of its input and filters out high-frequency modes. FNO and its improved versions have been successfully applied to Burgers' equation, Darcy flow, and the Navier–Stokes equations [21], as well as to elasticity and plasticity problems in solid mechanics [22]. Modified versions of FNO have been widely used in areas such as material modeling [23], seismic wave equations [24], and 3D turbulence for large eddy simulation [25]. However, since FNO relies on the FFT, it has difficulty handling complex geometries and intricate non-periodic boundary conditions, and Fourier-based operations are computationally expensive for high-dimensional problems. More recently, the DeepONet proposed by Lu et al. [26] emerged as another capable architecture for operator learning. It learns an operator from the governing differential equation of the system, so no retraining is necessary when the input to the system changes, be it parametric functions, boundary conditions, or even varying geometry. DeepONet consists of two separate neural networks, named the branch and the trunk. The branch network takes the parametric function as input, while the trunk network takes the domain geometry. In the original form suggested by Lu et al. [26], both the branch and the trunk were fully connected neural networks (FNNs), each encoding its respective input. The outputs of the branch and trunk are then combined via a dot product to map the parametric function to the target, the solution of the governing differential equation. Koric et al. [27] successfully predicted the stress field on a 2D domain with plastic deformation. Moreover, DeepONets have recently been applied to many science and engineering problems, such as the inverse design of nanoscale heat transport systems [28], prediction of elastic–plastic stress on complex topology-optimized domains [29], earthquake localization by Haghighat et al. [30], heat conduction by Koric and Abueidda [31], non-equilibrium thermodynamics of chemical mixtures combined with physics-informed neural networks by Li et al. [32], and a convection–diffusion reaction system by Kobayashi et al. [33]. A comprehensive comparison between FNO and DeepONet was performed by Lu et al. [34].

Many real-world engineering problems involve dynamic, time-dependent loading conditions caused by impact, vibrations, or cyclic loading, resulting in a time-dependent response of the system. Often the full flow field (for CFD) or the stress field (for solid mechanics) is needed to gain critical insights into different local regions. Obtaining a time-dependent response with classical techniques, including finite element analysis (FEA), topology optimization (TO), and sensitivity analysis, is computationally expensive and time-consuming. Hence, a scientific need exists for more robust data-driven surrogate models for time-dependent problems capable of predicting full-field contours for multiple time steps and vector components. Recurrent neural network (RNN) models like long short-term memory (LSTM) [35] and gated recurrent unit (GRU) [36] are typically used to capture the time-dependent and causal relationships between inputs and outputs. However, RNNs are typically used to predict a 1D time sequence rather than a full-field contour [8, 37,38,39]. Realizing the effectiveness of the DeepONet in predicting full-field solution contours with parametric inputs, He et al. [40] proposed a modification of the DeepONet model (termed S-DeepONet) that combines the temporal encoding capability of RNNs with the spatial encoding capability of DeepONet to predict the outcome at the last time step, and only for a scalar output. In this work, we propose an improvement over the original S-DeepONet architecture to simultaneously predict the full-field solution at different time steps and for different vector components, addressing the need to predict time-dependent vector fields in a single model. We tested our approach with two example problems: (1) a lid-driven cavity flow and (2) a path-dependent plasticity problem, both involving time dependence and representing real-world engineering use cases. We used the proposed S-DeepONet formulation to predict full-field solutions and compared its performance with the classical DeepONet formulation. Furthermore, we demonstrate the use of the trained NN in inverse parameter identification via a genetic algorithm.

This paper is organized as follows: Sect. 2 introduces the neural network architectures and provides details on the data generation methods. Section 3 presents and discusses the performance of the NN models. Section 4 summarizes the outcomes and limitations and highlights future work.

2 Methods

2.1 S-DeepONet with multiple output dimensions

The key idea in the original S-DeepONet architecture as proposed by the authors [40] is the separation of the temporal component (handled by a GRU branch network) and the spatial component (handled by the FNN trunk network) of the solution operator, and combining the components via a matrix–vector product and a bias:

$$\begin{aligned} \varvec{G}_{n} = \sum _{h=1}^{HD} \varvec{B}_{h} \varvec{T}_{nh} + \beta , \end{aligned}$$
(1)

where \(\varvec{G}\), \(\varvec{B}\), and \(\varvec{T}\) denote the outputs of the S-DeepONet, branch network, and trunk network, respectively. Dimension index h represents the hidden dimension (HD) of branch and trunk networks, and index n represents the flattened spatial dimension, which contains N nodes in the simulation domain. Finally, \(\beta \) is a bias added to the product. This structure allows the prediction of a full-field solution discretized by N nodes, where the time-dependent input load information is encoded in the branch network and the spatial geometry information is encoded in the trunk network.

Although the input load includes time-dependent information, the original S-DeepONet predicts only a scalar solution field at the end of the load. To extend the S-DeepONet architecture to predict vector solution fields at different time steps, we further exploit the idea of separating the spatial and temporal components into the trunk and branch networks. To this end, a novel S-DeepONet structure is proposed, and its schematic is shown in Fig. 1.

Fig. 1 Schematic of the improved S-DeepONet used in this work. \(m_V\), x, y, and \(\hat{G}\) denote the load magnitude, X coordinates, Y coordinates, and the approximated solution operator. Dimensions HD, S, N, and C are the number of hidden dimensions, time steps, nodes in the domain, and output vector components, respectively. The black GRU blocks return a sequence (2D outputs), while the green GRU block compresses the output into 1D

In the proposed architecture, we leverage the GRU branch network to produce encoded hidden outputs for all the output time steps as a tensor \(\varvec{B}\) of shape [HD, S], where S is the number of time steps. Similarly, when provided an [N, 2] input matrix containing the 2D coordinates of all nodes in the domain, the trunk network in the proposed S-DeepONet produces an encoded output tensor \(\varvec{T}\) of shape [N, HD, C], where C is the number of output vector components. \(\varvec{T}\) contains the encoded hidden outputs for all nodes and all vector components. To account for the new output dimensions, we combine the information from the branch and trunk via the following product:

$$\begin{aligned} \varvec{G}_{nsc} = \sum _{h=1}^{HD} \varvec{B}_{hs} \varvec{T}_{nhc} + \beta , \end{aligned}$$
(2)

where the lowercase indices s and c correspond to the time step and output component dimensions. The tensor product nature of this combination allows for efficient simultaneous generation of full-field vector outputs at multiple output time steps. This combination can be understood as a simultaneous identification of a set of “basis” shapes (from the trunk network) and corresponding weights (from the branch network), where the final solution contours are expressed as the weighted linear combination of those basis contours. This idea of basis and weight identification of the S-DeepONet architecture is illustrated further in Fig. 2.
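For concreteness, Eq. (2) is a single tensor contraction over the hidden dimension; a minimal NumPy sketch (with placeholder dimension values matching the CFD example in Sect. 3.1) is:

```python
import numpy as np

HD, S, N, C = 32, 25, 4961, 3  # hidden size, time steps, nodes, components

B = np.random.rand(HD, S)      # branch output: encoded load history
T = np.random.rand(N, HD, C)   # trunk output: encoded coordinates per component
beta = 0.1                     # scalar bias

# G[n, s, c] = sum_h B[h, s] * T[n, h, c] + beta, i.e., Eq. (2)
G = np.einsum('hs,nhc->nsc', B, T) + beta
print(G.shape)                 # (4961, 25, 3): all nodes, time steps, components
```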

Fig. 2 Basis and weight interpretation of the S-DeepONet combination operator. For each vector component c, the S-DeepONet identifies HD basis shapes from the trunk network; simultaneously, for each time step s, it identifies HD weights from the branch network. The solution contour is expressed as the linear combination of all basis shapes, including the data mean

By predicting C output vector components together, we are eliminating the need to train C different single-component S-DeepONets, each specialized in one output component, thus leading to tremendous computational time savings.

The branch network of the developed S-DeepONet consists of four GRU layers. The encoder–decoder structure of the original S-DeepONet is adopted, with the first two layers acting as the encoder and the last two as the decoder. All GRU layers use a tanh activation function. Finally, a time-distributed dense layer with linear activation outputs the branch results with a hidden dimension (HD) of 32 for all S time steps. The trunk network is an FNN. All NNs were implemented in the DeepXDE framework [41] with a TensorFlow backend [42]. All models were trained for 300,000 epochs with a batch size of 64. The Adam optimizer [43] was used, and the scaled mean squared error (MSE) served as the loss function, which is defined as:

$$\begin{aligned} \mathrm{{MSE}} = \frac{ 1 }{ N } \sum ^N_{i=1} (f_{FE} - f_{Pred})^2, \end{aligned}$$
(3)

where N, \(f_{FE}\), and \(f_{Pred}\) denote the number of data points, the FE-simulated field value, and the NN-predicted field value, respectively.

The choice of hyperparameters is crucial for the performance of NN models. Since the optimal set of hyperparameters depends on the specific problem at hand, instead of employing automatic hyperparameter optimization methods [44, 45], we chose a baseline model size by gradually increasing the number of trainable parameters and used the same baseline hyperparameters across the two example problems. Using the same model for different problems without any problem-specific fine-tuning highlights the robustness of the proposed architecture. In the process of identifying the baseline hyperparameters, we considered the following three model sizes (a code sketch of the branch and trunk networks follows the list):

1. Model 1: GRU \(=[64,32,32,64]\), FNN \(=[2,101, HD \times C]\)

2. Model 2: GRU \(=[128,64,64,128]\), FNN \(=[2,101,101,101, HD \times C]\)

3. Model 3: GRU \(=[256,128,128,256]\), FNN \(=[2,101,101,101,101,101, HD \times C]\)

where the numbers in square brackets denote neurons in each layer and C takes different values in the numerical examples presented in this work. The numbers of trainable parameters are 59,144, 220,932, and 800,384, respectively, when \(C=3\).
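For illustration, the following Keras sketch assembles a branch and trunk of the Model 1 sizes. This is a minimal stand-in, not the exact DeepXDE [41] implementation used in this work; the input shapes and the trunk activation functions are assumptions:

```python
import tensorflow as tf

S, HD, C = 25, 32, 3  # time steps, hidden dimension, output components (CFD example)

# Branch: 4 GRU layers (Model 1 sizes [64, 32, 32, 64]) + time-distributed dense.
# All GRU layers return sequences so the dense head emits HD features per time step.
branch = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(S, 1)),  # load amplitude at S time steps
    tf.keras.layers.GRU(64, activation='tanh', return_sequences=True),
    tf.keras.layers.GRU(32, activation='tanh', return_sequences=True),
    tf.keras.layers.GRU(32, activation='tanh', return_sequences=True),
    tf.keras.layers.GRU(64, activation='tanh', return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(HD, activation='linear')),
])  # output (batch, S, HD), used as B in Eq. (2)

# Trunk: FNN [2, 101, HD*C] mapping nodal (x, y) coordinates to encoded outputs
trunk = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(101, activation='tanh'),
    tf.keras.layers.Dense(HD * C),  # reshaped to (N, HD, C), used as T in Eq. (2)
])
```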

2.2 Data generation

In this work, we demonstrate the application of the proposed improved S-DeepONet to a time-dependent computational fluid dynamics (CFD) problem and a history-dependent plastic deformation problem. Using the trained S-DeepONet, inverse parameter identification is performed with a genetic algorithm.

2.2.1 Lid-driven cavity flow

In the first example, we consider a classical CFD benchmark problem: the lid-driven cavity flow. Consider a rectangular cavity of length 3 and height 1, discretized into a \(121 \times 41\) uniform grid with 4961 nodes. A lid located at \(y=1\) moves with a time-dependent velocity profile \(\bar{u}(t)\) in the X direction, driving the fluid motion in the cavity. Radial basis interpolation (RBI) was used to generate the smooth lid velocity profile, which was defined by six uniformly spaced control points. The starting velocity (i.e., at \(t=0\)) is 0, and the velocity at all other control points was sampled from the range \([-2,2]\). The fluid is assumed to be incompressible, and the system is governed by the Navier–Stokes equation:

$$\begin{aligned} \frac{ \partial \varvec{u} }{ \partial t } = \mu \nabla ^2 \varvec{u} - ( \varvec{u} \cdot \nabla ) \varvec{u} - \frac{1}{\rho } \nabla P, \end{aligned}$$
(4)

where \(\varvec{u}\), \(\mu \), \(\rho \), and P denote the velocity vector, viscosity, mass density, and pressure, respectively. In this example, a hypothetical fluid with \(\rho =1\) and \(\mu =0.1\) was used. In 2D, let the X and Y components of the velocity vector be denoted as u and v. No-slip boundary condition was used for all four boundaries:

$$\begin{aligned} \begin{aligned} u = \bar{u}(t), \, v = 0 \;\; \textrm{if} \; { {y}} = 1,\\ \varvec{ u } = \varvec{0}, \;\; \textrm{Otherwise}.\\ \end{aligned} \end{aligned}$$
(5)

For the pressure degree of freedom, the following boundary conditions were used:

$$\begin{aligned} \begin{aligned} P = 0 \;\; \textrm{if} \; { {y}} = 1,\\ \frac{\partial P}{\partial x} = 0 \;\; \textrm{if} \; { {x}} = 0,3,\\ \frac{\partial P}{\partial y} = 0 \;\; \textrm{if} \; { {y}} = 0.\\ \end{aligned} \end{aligned}$$
(6)

The trivial initial conditions \(\varvec{u}=\varvec{0}\) and \(P=0\) were used. For this simple geometry, the second-order central finite difference (FD) method was used to discretize the governing equation, and the pressure projection method with explicit time integration was used to evolve the system for 10,000 time steps with a time step size of \(2 \times 10^{-4}\). The Python solver was adapted from the work of Barba et al. [46]. A total of 4000 simulations were conducted with distinct lid velocity profiles. The primary variables P, u, and v were stored at 25 uniformly spaced output time steps over the simulation period. For stable NN training and better accuracy, the input and output data should be scaled to suitable ranges. A time-dependent scale factor for each output variable was first calculated to account for the change in solution scale as the fluid flow develops. For time step i, this scale is defined as the maximum absolute value of the field values (over all cases and all points) at that time step. The data at time step i were then divided by this scale, yielding a data range of \([-1,1]\).
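A sketch of this data-generation recipe is given below: the control-point sampling and RBI follow the description above (with a total simulated time of 2, i.e., 10,000 steps of \(2\times10^{-4}\)), while the function names and array layout are illustrative:

```python
import numpy as np
from scipy.interpolate import Rbf

rng = np.random.default_rng(0)

def sample_lid_velocity(n_out=25, t_end=2.0):
    """Smooth lid velocity from 6 uniform control points; zero at t = 0."""
    t_ctrl = np.linspace(0.0, t_end, 6)
    v_ctrl = np.concatenate(([0.0], rng.uniform(-2.0, 2.0, 5)))
    return Rbf(t_ctrl, v_ctrl)(np.linspace(0.0, t_end, n_out))

def scale_per_time_step(data):
    """Time-dependent scaling of one output variable to [-1, 1].

    data: array of shape (n_cases, n_steps, n_nodes).
    """
    scale = np.abs(data).max(axis=(0, 2), keepdims=True)  # max |value| per step
    scale[scale == 0.0] = 1.0  # guard the trivial initial condition at t = 0
    return data / scale, scale
```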

2.2.2 Dog bone axial loading

In the second example, we consider the plastic deformation of a dog bone specimen under time-dependent axial loads. The specimen has a length of 110 mm and a gauge width of 20 mm. A total of 4756 linear plane stress elements were used. The specimen is fixed at its left end, and the displacement is applied at the right end. A schematic of the specimen and boundary conditions is shown in Fig. 3a.

Fig. 3 a Schematic of the dog bone specimen and the mesh used in FE simulations. The axial displacement is along the global X direction. b A typical applied displacement in the plastic deformation problem

In the absence of any body and inertial forces, the equilibrium equations and boundary conditions are:

$$\begin{aligned} \begin{aligned} \nabla \cdot \varvec{\sigma } = \varvec{0}, \;\; \forall \varvec{X} \in \Omega ,\\ \varvec{ u } = \bar{\varvec{u}}, \;\; \forall \varvec{X} \in \partial \Omega _u,\\ \end{aligned} \end{aligned}$$
(7)

where \(\varvec{\sigma }\) and \(\bar{\varvec{u}}\) denote the Cauchy stress and prescribed displacement, respectively. The small-strain assumption is applied, which leads to the following definition of the total strain tensor:

$$\begin{aligned} \varvec{\epsilon } = \frac{1}{2} ( \nabla \varvec{u} + \nabla \varvec{u}^T ), \end{aligned}$$
(8)

as well as its additive decomposition into elastic and plastic strain parts:

$$\begin{aligned} \varvec{\epsilon } = \varvec{\epsilon }^e + \varvec{\epsilon }^p. \end{aligned}$$
(9)

For a linear elastic isotropic material in plane stress, the stress is obtained from the elastic strain via:

$$\begin{aligned} \begin{bmatrix} \sigma _{11} \\ \sigma _{22} \\ \sigma _{12} \end{bmatrix} = \frac{E}{1-\nu ^2} \begin{bmatrix} 1 &{}\quad \nu &{}\quad 0 \\ \nu &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1-\nu \end{bmatrix} \begin{bmatrix} \epsilon ^e_{11} \\ \epsilon ^e_{22} \\ \epsilon ^e_{12} \end{bmatrix}, \end{aligned}$$
(10)

where E and \(\nu \) are the Young's modulus and Poisson's ratio, respectively. Plasticity is modeled with \(J_2\) plasticity and a linear isotropic hardening law:

$$\begin{aligned} \sigma _y = \sigma _{y0} + H \bar{\epsilon }_p, \end{aligned}$$
(11)

where \(\sigma _y\), \(\bar{\epsilon }_p\), \(\sigma _{y0}\), and H denote the flow stress, equivalent plastic strain, initial yield stress, and hardening modulus, respectively. Relevant material properties are shown in Table 1. \(\bar{\epsilon }_p\) is an internal history variable that records the accumulation of plastic strain over the loading history. Due to its vital role in plasticity, it is one of the output variables of the S-DeepONet; the other is the von Mises stress:

$$\begin{aligned} \bar{\sigma } = \sqrt{ \sigma _{11}^2 + \sigma _{22}^2 - \sigma _{11}\sigma _{22} + 3\sigma _{12}^2 }. \end{aligned}$$
(12)
Table 1 Material properties of the elastic–plastic material model

The time-dependent displacement histories were generated similarly to those in Sect. 2.2.1, with six uniformly spaced control points. The applied displacement is 0 at \(t=0\), and the displacement magnitude at each control point was randomly selected such that the nominal axial strain magnitude remains below 5%. A typical example of the applied displacement is shown in Fig. 3b. A total of 4000 FE simulations were generated using Abaqus/Standard [47], and \(\bar{\sigma }\) and \(\bar{\epsilon }_p\) were stored at 40 uniformly spaced time steps as the ground truth labels for NN training. As in Sect. 2.2.1, data scaling is needed for best model performance. The min–max scaler in scikit-learn [48] was used for this example, with two different scalers for the two output components to account for the drastically different scales of stress and plastic strain.
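A minimal sketch of the two-scaler setup, with random placeholder arrays standing in for the Abaqus-generated labels:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Placeholder labels shaped (cases, time steps * nodes); real values come from FE
stress = 500.0 * np.random.rand(4000, 40 * 100)  # von Mises stress, O(1e2) MPa
peeq = 0.05 * np.random.rand(4000, 40 * 100)     # equivalent plastic strain, O(1e-2)

# Separate scalers so both outputs map to comparable ranges despite the scale gap
stress_scaler = MinMaxScaler().fit(stress)
peeq_scaler = MinMaxScaler().fit(peeq)
stress_scaled = stress_scaler.transform(stress)
peeq_scaled = peeq_scaler.transform(peeq)
```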

2.3 Inverse history identification with trained S-DeepONet

Once the improved S-DeepONet is trained, it can be used to infer time-dependent full-field solution contours efficiently. Take the example of predicting von Mises stress as described in Sect. 2.2.2. Further post-processing of the predicted fields yields a plot of the mean von Mises stress (over the entire dog bone specimen) as a function of load duration, which is a function of the input load history. Given a known curve of the mean stress over time, it is possible to infer the load history necessary to generate this stress history using the trained S-DeepONet and an optimizer. The outputs of this inverse identification are the five scalar displacement values at the control points, since the smooth load curve in Sect. 2.2.2 can be uniquely characterized by these values (the value at the first control point is 0). For this work, the genetic algorithm (GA) implementation in PyGAD [49] was used, which provides a gradient-free optimization framework to leverage the trained S-DeepONet as a black-box model. Using GA, 25 generations of optimization were performed with a population size of 100, and the number of parents mating was set to 10. The GA seeks to maximize a scalar fitness value, which in the current case is defined as the inverse of the mean absolute error (MAE) between the predicted and known stress histories. The process of evaluating the fitness value with a trained S-DeepONet is presented in the following algorithm:

Algorithm 1 Evaluation of the fitness value
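Since the original listing is rendered as an image, the sketch below reproduces its logic under stated assumptions: `predict_mean_stress` is a hypothetical stand-in for S-DeepONet inference followed by spatial averaging, and the normalized control-point times and target history are placeholders:

```python
import numpy as np
import pygad
from scipy.interpolate import Rbf

target_history = np.zeros(40)  # known mean-stress history (placeholder)

def predict_mean_stress(load_curve):
    """Hypothetical wrapper: S-DeepONet inference + averaging stress over the mesh."""
    return np.zeros(40)  # placeholder output

def fitness(ga_instance, solution, solution_idx):  # PyGAD >= 2.20 signature
    # Rebuild the smooth load curve from the 5 free control-point displacements
    t_ctrl = np.linspace(0.0, 1.0, 6)           # normalized control-point times
    d_ctrl = np.concatenate(([0.0], solution))  # first control point fixed at 0
    load_curve = Rbf(t_ctrl, d_ctrl)(np.linspace(0.0, 1.0, 40))
    mae = np.abs(predict_mean_stress(load_curve) - target_history).mean()
    return 1.0 / (mae + 1e-8)                   # fitness = inverse of MAE

ga = pygad.GA(num_generations=25, sol_per_pop=100, num_parents_mating=10,
              num_genes=5, fitness_func=fitness)
ga.run()
best_load, best_fitness, _ = ga.best_solution()
```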

3 Results and discussion

All simulations were conducted with six high-end AMD EPYC 7763 Milan CPU cores. All NN training and inference were conducted using a single Nvidia A100 GPU card on Delta, an HPC cluster hosted at the National Center for Supercomputing Applications (NCSA).

To evaluate the model performance on the test set, three quantitative metrics were used, namely the relative \(L_2\) error, the mean absolute error (MAE), and the \(R^2\) value:

$$\begin{aligned} \begin{aligned} \mathrm{{Relative \; L_2 \; error}} = \frac{ | f_{FE} - f_{Pred} |_2 }{ |f_{FE}|_2 + \epsilon } \times 100\%,\\ \mathrm{{MAE}} = \frac{1}{N_T} \sum _{i=1}^{N_T} \left| f_{FE} - f_{Pred} \right| ,\\ R^2 = 1 - \frac{ \sum _{i=1}^{N_T} \left( f_{FE} - f_{Pred} \right) ^2 }{ \sum _{i=1}^{N_T} \left( f_{FE} - \bar{f}_{FE} \right) ^2 } , \end{aligned} \end{aligned}$$
(13)

where \(f_{FE}\), \(f_{Pred}\), \(N_T\), and \(\bar{f}_{FE}\) denote the finite element (FE) simulated field value, NN-predicted field value, number of test cases, and the mean value of the FE-simulated field values, respectively. A small numerical offset \(\epsilon =1\times 10^{-8}\) was added to prevent division by 0 when the FE solution is 0.
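A NumPy sketch of the three metrics (applied here to a whole test array at once; in practice they can also be evaluated per case and averaged):

```python
import numpy as np

def evaluate(f_fe, f_pred, eps=1e-8):
    """Relative L2 error (%), MAE, and R^2 between FE and predicted fields."""
    rel_l2 = 100.0 * np.linalg.norm(f_fe - f_pred) / (np.linalg.norm(f_fe) + eps)
    mae = np.abs(f_fe - f_pred).mean()
    ss_res = ((f_fe - f_pred) ** 2).sum()
    ss_tot = ((f_fe - f_fe.mean()) ** 2).sum()
    return rel_l2, mae, 1.0 - ss_res / ss_tot
```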

3.1 Time-dependent fluid flow

To solve the lid-driven cavity problem with the improved S-DeepONet, we set \(C=3\) to predict the three output components P, u, and v, and \(S=25\) for the 25 output time steps. First, we studied the effect of model size using this example and identified the baseline model size among the three sets of hyperparameters listed in Sect. 2.1. The classical 80–20 data split was used when evaluating the different model sizes, and the prediction errors for the three components are shown in Fig. 4a.

Fig. 4 Relative \(L_2\) error of different solution variables for different a model sizes and b training data fractions

The results in Fig. 4a clearly show that the prediction errors of all components decreased with increasing model size. Therefore, among the three sets of hyperparameters, we chose the Model 3 configuration as the baseline and used it for all examples shown in this work. Then, with the baseline model size identified, different percentages of the entire dataset were used in training to investigate the data efficiency of the proposed architecture. In three different instances of the model, we tested training with 50%, 65%, and 80% of all available data; the relative \(L_2\) errors for the three solution components are compared in Fig. 4b. Figure 4b shows that the prediction error generally decreases with an increasing amount of training data. Therefore, the classical 80–20 split, with 80% of the data used for training and the rest for testing, was adopted for all examples in this work. To test the repeatability of the model performance, we trained the S-DeepONet three times with randomly divided training and testing data. The results show repeatable performance, with a mean (over all repetitions and components) relative \(L_2\) error of 5.891% and a standard deviation of 0.498%. Detailed performance metrics for the best-performing of the three runs are shown in Table 2. The average training and inference (per case) times over the three runs were 22,278 s and \(8\times 10^{-3}\) s, respectively. On average, each FD simulation took 18 s, making inference about 2240 times faster than direct numerical simulation.

Table 2 Performance metrics for the CFD example

The contour and quiver plots for pressure and velocity at different time steps are shown in Fig. 5. The cases are ranked by the percentile of mean relative \(L_2\) error in the predictions, and the 0th (best case), 80th, and 100th (worst case) percentiles are shown for representation. For all plots in Fig. 5, the filled contour is colored by pressure, and the velocity vectors are shown as arrows whose lengths are proportional to the velocity magnitude.

Fig. 5 Contour and quiver plots of the flow field at different times and percentiles of prediction accuracy. In each sub-figure, the top and bottom rows show the finite difference solution and the S-DeepONet prediction, respectively. The background contour is colored by pressure, and the flow velocity field is rendered as arrows with length proportional to the velocity magnitude

When predicting full-field solutions at multiple time steps, two types of errors are worth investigating: the time-averaged error (i.e., for each prediction case, the error averaged over all prediction time steps) and the frame-averaged error (i.e., for each prediction case at each time frame, the error averaged over all nodes in the domain). The time-averaged error provides a holistic measure of prediction accuracy over the entire load history, while the frame-averaged error provides a more instantaneous insight into how prediction accuracy changes with the current magnitude of the input function. To that end, scatter plots of the mean absolute error versus the instantaneous lid velocity are shown in Fig. 6.
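The two averages differ only in which axes are reduced; with an assumed error-array layout of (cases, time steps, nodes):

```python
import numpy as np

# Absolute prediction errors, assumed layout (cases, time steps, nodes)
err = np.abs(np.random.rand(800, 25, 4961) - np.random.rand(800, 25, 4961))

frame_avg = err.mean(axis=2)       # (cases, steps): error per case per time frame
time_avg = frame_avg.mean(axis=1)  # (cases,): error per case over the whole history
```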

Fig. 6 Scatter plots and trend lines of frame-averaged errors for a pressure (P), b x velocity (u), and c y velocity (v)

From the performance metrics in Table 2, we see that the S-DeepONet generates accurate predictions, with relative \(L_2\) errors of less than 10% and \(R^2\) values above 0.99 for all components, using only 3200 training data points. The contour plots in Fig. 5 further confirm the accuracy of the model: the S-DeepONet-predicted contours match closely with those computed from direct numerical simulations. In the worst-case scenario (the last row of Fig. 5), the predicted pressure contour differs significantly from the ground truth at step 1. However, for some time steps (e.g., step 17, third column), the S-DeepONet-predicted contours are similar to the ground truth even in the worst case. This observation is reasonable: the percentiles are ranked by time-averaged error, which does not necessarily mean that the predictions are inaccurate at all time steps (the errors can be concentrated in a few time steps while the predictions at other steps remain accurate). When inspecting the frame-averaged error versus the instantaneous lid velocity, as shown in Fig. 6, we see a consistent, generally increasing trend with increasing lid velocity. The coefficient of determination (\(R^2\)) of the trend line is around 0.4, indicating a moderate correlation, which we will further investigate and compare in Sect. 3.2.

3.2 History-dependent plastic deformation

For the plastic deformation problem, we set \(C=2\) to predict two output components, namely the von Mises stress and the equivalent plastic strain. The means and standard deviations of the metrics over three repetitions of the S-DeepONet training are summarized in Table 3.

Table 3 Result repeatability, plastic deformation
Fig. 7 Contour plots of the stress and plastic strain fields at different times and percentiles of prediction accuracy. In each sub-figure, the top and bottom rows show stress and plastic strain, respectively. In each contour, only the top half of the NN prediction is shown due to symmetry, with the FE solution shown in the bottom half

To show the statistical distribution of prediction error among the 800 test cases, stress contours corresponding to the 0th (best case), 80th, and 100th (worst case) percentile prediction errors are displayed in Fig. 7 for the improved S-DeepONet model. To rank the prediction errors, the relative \(L_2\) error averaged over all time steps and all output components was used as a single scalar measure of prediction accuracy. Due to the symmetry of the loading, only the top half of each NN prediction is shown in Fig. 7, with the bottom half reserved for (the flipped top half of) the FE simulation; the two are separated by a black dashed line.

To highlight the efficiency of the current vector output model, we trained two scalar S-DeepONets (i.e., by setting \(C=1\)), one for \(\bar{\sigma }\) and the other for \(\bar{\epsilon }_p\), and compared the computational efficiency and accuracy of the models. For a fair comparison, the three models were trained using identical training and testing data points. The model size, training time, and inference time are compared in Table 4.

Table 4 Vector model versus scalar models

The prediction quality metrics are summarized in Table 5.

Table 5 Vector model versus scalar models, performance metrics

The contour plots predicted by the vector S-DeepONet model and the two scalar S-DeepONet models at different time steps are shown in Fig. 8 for model comparison. The test case corresponding to the 50\(^{th}\) (median case) percentile prediction error is shown for representation.

Fig. 8 Contour plots of the stress and plastic strain fields predicted by the vector and scalar S-DeepONets. In each sub-figure, the top and bottom rows show stress and plastic strain, respectively. Due to symmetry, each contour combines two half-views: the top half shows the prediction from the vector S-DeepONet and the bottom half shows that from the scalar S-DeepONets

We compare the histograms of the time-averaged error for von Mises stress and equivalent plastic strain for the vector and scalar S-DeepONets in Fig. 9. The scatter plots and trend lines for the frame-averaged mean absolute error versus the magnitude of the applied displacement are shown in Fig. 10.

Fig. 9 Histograms of time-averaged errors for a von Mises stress and b equivalent plastic strain

Fig. 10 Scatter plots and trend lines of frame-averaged errors for a von Mises stress and b equivalent plastic strain

From the repeatability results in Table 3, we see that the performance of the proposed vector S-DeepONet is consistent, with minimal variation when trained on different data splits. For the von Mises stress, the model achieves a relative \(L_2\) error of about 5% and an \(R^2\) value of 0.997, which is accurate considering that only 3200 data points were used in training. The relative \(L_2\) error over all test data for the equivalent plastic strain is as high as 21%, but this is expected, as this metric is significantly inflated numerically when the specimen is elastic. Inspecting the mean absolute error in plastic strain, we see that the error is only about \(8.4\times 10^{-5}\), and the \(R^2\) value is 0.999, again indicating accurate model predictions. It is also worth highlighting that the proposed architecture solved two problems with completely different physics with only minor changes in the network structure to account for the different numbers of time steps and output vector components, thereby demonstrating the generalizability and versatility of the proposed framework. The contour plots at different percentiles and time steps, shown in Fig. 7, provide a more direct view of the prediction performance. In the best case (first row of Fig. 7), the S-DeepONet accurately captures the hot spots in stress and plastic strain at different time steps, showing close agreement with the corresponding FE simulations. At the 80\(^{th}\) percentile (second row of Fig. 7), the S-DeepONet was unable to predict the stress contour initially, when the stress is small and the response is fully elastic, but it still predicts similar contours of both quantities at later time steps, once the magnitudes of the stress and plastic strain increase. This observation is reasonable, since the relative \(L_2\) error is numerically inflated when the magnitude of the ground truth (i.e., the denominator of Eq. 13(a)) is small. A similar situation is observed for the worst case (last row of Fig. 7), where the specimen remains completely elastic throughout the loading history. In this case, the relative \(L_2\) error for plastic strain is numerically inflated throughout the entire loading history; hence, it is ranked as the worst case.

The key improvement of the current architecture is the ability to predict multiple vector components at multiple time steps with one model. For the case of two output components, Table 4 shows that the vector model has only 0.4% more trainable parameters than a scalar S-DeepONet model. Training the vector model took 15,547 s, which is 20.8% faster than the combined training time of the two scalar models, while inference is about 80% faster. On average, each FE simulation requires 48 s of CPU time to solve, making S-DeepONet inference about 11,900 times faster than running an FE simulation. Moreover, as revealed by the performance metrics presented in Table 5, the performance of the vector S-DeepONet is highly similar to (albeit slightly worse than) that of the individual scalar networks. Figure 8 shows that the predicted contour plots of the von Mises stress and equivalent plastic strain are also highly similar at different time steps. The exception is at the beginning of the loading history (first column of Fig. 8), where the specimen is fully elastic: there, the vector and scalar models predicted different contours for the equivalent plastic strain (which should be identically 0). Figure 9 compares the histograms of the time-averaged errors from the three models; the distributions of prediction accuracy are similar between the vector and scalar S-DeepONet models. Considering the significant savings in time and model size, training the proposed vector S-DeepONet instead of two scalar models individually is the superior option from an efficiency perspective.

Lastly, for a multi-step prediction, it is worth studying how the frame-averaged prediction error changes as a function of the instantaneous applied displacement magnitude. Figure 10 shows that there is little correlation (\(R^2=0.02\)) between the current displacement magnitude and the frame-averaged error. As a comparison, recall from Sect. 3.1 that all three components (P, u, and v) show a noticeable correlation, with an \(R^2\) of around 0.4. This difference is reasonable given the differences in the governing physics. Plasticity is path-dependent: the current stress magnitude depends not only on the current displacement magnitude but also on the past loading history, hence the lower correlation with the current displacement magnitude. On the other hand, the steady-state flow field is uniquely defined by the lid velocity; hence, the time-dependent flow field magnitude shows a stronger correlation with the current lid velocity.

3.3 Application of the trained S-DeepONet model

Three additional load curves were randomly generated following the procedure outlined in Sect. 2.2.2. The trained S-DeepONet model did not see these curves during training or testing. FE simulations were performed on these load curves to obtain the reference stress history curves, which served as the inputs to the inverse identification. To validate the results, FE simulations were performed using the load paths identified by S-DeepONet. The load curves and stress histories are compared in Fig. 11.

Fig. 11 Comparison of the load curves and stress histories identified by S-DeepONet and GA

With the efficient forward inference of the S-DeepONet model, the three GA cases were completed in an average of 24 s, which is fast considering that a single forward FE simulation of the dog bone specimen takes around 48 s. The results show that in all three cases, the S-DeepONet-predicted stress histories match closely with the given ground truths. Moreover, a comparison of the S-DeepONet predictions and the corresponding FE validation results (generated from the same load curves identified by GA) shows that the S-DeepONet predictions remain highly accurate for load curves outside of the original training and testing data points, again showing the high accuracy of the trained model. Despite the high similarity between the identified and given stress histories, the load curves identified by GA and S-DeepONet do not always match the known ground truths. We see two characteristic cases in Fig. 11: (1) matching load curves (Fig. 11a) and (2) load curves with equal but opposite displacements (Fig. 11b). It is also interesting to see a combination of both behaviors in Fig. 11c, where the second half of the predicted load curve matches the ground truth closely while the first half shows opposite displacements. This is reasonable from a mechanics perspective, since the von Mises stress is a positive quantity regardless of tensile or compressive loading, and the material considered in this work does not exhibit tension–compression asymmetry.

4 Conclusions, limitations, and future work

The sequential DeepONet (S-DeepONet) model previously proposed by the authors [40] is a variant of the DeepONet architecture [26] that uses a gated recurrent unit (GRU) in the branch network to capture the temporal information in time-dependent input functions. However, the original S-DeepONet architecture predicts only the last time frame of a time-dependent evolution, and the predicted field is a scalar. Recognizing this limitation, we introduced an improved version of the S-DeepONet capable of simultaneously predicting multiple time steps of solution fields with multiple vector components. This feature is made possible by the tensor product structure of the operation that combines the encoded temporal information from the branch network with the encoded spatial information (for all vector components) from the trunk network. We further showed that this tensor product combination can be viewed as the simultaneous identification of a set of basis shapes and weights, with which the final contour is expressed as a weighted linear combination. This architecture expands on the idea of exploiting the powerful temporal encoding capability of the GRU and the spatial encoding capability of the DeepONet architecture. To the best of the authors' knowledge, this is also the first time in the literature that DeepONet has been extended to predict a transient field with multiple vector components at different time steps.

To demonstrate the application of the improved S-DeepONet, we presented a lid-driven cavity flow example and a plastic deformation example, both subjected to time-dependent input loads and having multiple output components at many time steps. In both cases, the improved S-DeepONet provided accurate predictions for all output components using only 3200 training data points. For all components, the predictions achieved an \(R^2\) value of over 0.99 and a relative \(L_2\) error of less than 10% (except for the equivalent plastic strain in Sect. 3.2). We highlight that minimal architecture changes (only the numbers of output components and time steps) were needed to solve these two problems with totally different underlying physics, which shows that the proposed S-DeepONet is highly versatile. Once trained, the S-DeepONet can infer accurate full-field results at different time steps at least three orders of magnitude faster than direct numerical simulations. The trained model can also be used in conjunction with gradient-free optimizers such as the genetic algorithm to perform accurate inverse parameter identification. The plasticity example showed that the inverse identification is highly efficient (finished in about half the time of one FE simulation) and accurate. Using the plasticity example, we also demonstrated the effectiveness of predicting all output components at once with a single vector network instead of training a separate scalar network for each output component. With only 0.4% more parameters, the vector S-DeepONet trained 20.8% faster than the two scalar networks combined while maintaining a very similar level of prediction accuracy. Therefore, we recommend the vector S-DeepONet proposed in this work whenever multiple output components are of interest at multiple time steps.

Through the analysis of the mean error magnitude as a function of the mean input load magnitude, it was revealed that the current S-DeepONet is unable to accurately predict the field contours when the field magnitude is small, even with a time-dependent data scaling scheme. This is likely due to the use of the MSE loss function, in which data points with small magnitudes contribute only a small fraction of the loss.

With the capability to accurately and efficiently predict the solution history at multiple time steps for multiple components, the improved S-DeepONet architecture provides a versatile tool for the engineering community to build surrogate models of complex nonlinear numerical simulations. The versatility of this architecture is demonstrated in this work through its application to both solid and fluid mechanics problems, and it can be further employed in different engineering applications. With the weights and biases of the trained model, almost instant full-field forward predictions can be evaluated even on low-end platforms like laptops, providing a novel tool for rapid preliminary design, sensitivity analysis, uncertainty quantification, online control, and, as we have demonstrated, black-box surrogate-based optimization. In future work, we will investigate the use of transformer models [50] and attention mechanisms [51] in the DeepONet architecture to achieve higher prediction accuracy with lower computational costs.