Introduction

In almost every country in the world, dams are a vital part of the national infrastructure1. In terms of flood control, power generation, irrigation, water supply, and so on, dams have yielded significant social and economic benefits. As highly complex systems, dams require a large number of numerical simulations in the course of parameter back analysis, uncertainty analysis, sensitivity analysis, optimization design, and so on, and these simulations incur extremely high computational cost. To overcome this issue, a surrogate model is used in place of the high-fidelity simulation model.

The surrogate model, also known as a meta-model2, is a simplified model with a small computational scale whose results are close to those of a high-precision model. Commonly used surrogate models include the polynomial regression surface (PRS)3,4,5, the Kriging model6,7, neural network models8,9,10,11, the support vector regression (SVR) model12,13,14,15, the radial basis function model (RBFM)16,17,18,19, and so on. The RBFM is a function of the distance between data points, and is therefore dimension-independent and meshless. As a precise interpolation model, the RBFM not only performs well in interpolation accuracy but is also relatively simple, with clear principles, few parameters to specify, and easy operation. It is therefore widely acknowledged in the field of surrogate model research20,21,22,23.

In recent years, many scholars have applied surrogate models to dam research. Li et al.24 introduced a surrogate-assisted stochastic optimization inversion (SASOI) algorithm to identify static and dynamic parameters based on Latin hypercube sampling (LHS). Shahzadi et al.25 constructed a surrogate model by combining polynomial chaos expansion and deep learning networks to assess the effect of constitutive soil parameters on the behavior of a rockfill dam using Sobol sampling. By combining the genetic algorithm (GA) with an updated Kriging surrogate model (UKSM), Wang et al.26 presented a new optimization procedure, based on LHS, to reduce the computational cost of determining the optimal shape of a gravity dam. Zhang et al.27 took the multi-layer perceptron algorithm as a surrogate model for determining deformation monitoring indexes based on LHS. Rad et al.28 constructed a new surrogate model by combining a hybrid SVR based on generalized normal distribution optimization (GNDO) with Monte Carlo simulation (MCS) to solve the reliability-based design optimization (RBDO) problem of gravity dam design.

In current research, LHS, Sobol sampling, and other uniform sampling techniques are most frequently used to fill the design space: it is widely acknowledged that, in the absence of any a priori knowledge of the problem under consideration, uniform sampling over the whole design space is favourable. This kind of Design of Experiments (DoE), which samples the design space once, is called the one-shot sampling (OS) method. Methods such as central composite design29, uniform design30, LHS31, generalized LHS32, and optimal Latin hypercube sampling (OLHS)33 are frequently employed. However, these methods often leave high uncertainty in highly nonlinear regions, resulting in poor model performance34,35,36. In particular, when dealing with a complex, high-dimensional dam structure, the surrogate model contains numerous regions of high uncertainty. DoE based on the aforementioned strategies therefore frequently requires many samples to establish a sufficiently accurate surrogate model.

The adaptive sampling strategy is applied to DoE to obtain a sufficiently accurate surrogate model with a minimum number of samples. Under this strategy, the present surrogate model assists the selection of new samples, so as to improve either the local optimization or the global accuracy of the model. The key to the success of adaptive sampling is therefore the choice of infill criteria, which determine the location of the new samples in the sample space. Wang et al.37 presented a novel adaptive sampling method based on the hyper-volume iteration (HVI) strategy for constructing surrogate aerodynamic models. Applying an adaptive sampling strategy based on expected local errors, Xu et al.38 proposed an ensemble of adaptive surrogate models. Rosenbaum et al.39 compared different sampling strategies and new theoretical methods using a dense set of validation data to gain a deeper understanding of optimal sample distributions and lower error bounds. Based on Proper Orthogonal Decomposition (POD), Guenot et al.40 proposed a novel contribution to adaptive sampling strategies for non-intrusive reduced order models. In addition, adaptive sampling also applies infill criteria such as minimizing the predicted objective function (MP), maximizing the expected improvement function (EI)41, maximizing the probability of improvement function (PoI)2, minimizing the lower confidence bound (LCB)42, maximizing the mean squared error (MSE)2, and so on.

Currently, most adaptive sampling methods are designed for single-output problems. However, the surrogate model of a dam is usually a multi-outputs model with multiple related outputs. In general, multi-outputs problems comprise asymmetric43,44,45 and symmetric46 multi-outputs problems. In an asymmetric multi-outputs problem, simulation models with different computational costs and fidelities describe the same output. In contrast, in a symmetric multi-outputs problem, different outputs are calculated simultaneously for the same set of inputs. In establishing a dam surrogate model, the structural behavior at different locations is anisotropic: it is affected by the spatial location, while the input parameters are fixed. The establishment of a dam surrogate model can therefore be considered a symmetric multi-outputs problem.

This article starts from adaptive sampling based on the RBFM and proposes an adaptive sampling method with multi-outputs that combines three infill criteria with the entropy weight method. The method starts with a small initial sample set and establishes a surrogate model for each output. Three infill criteria are then used to search for infill samples, with MSE used for the global search and EI for the local search. The ideal infill group is determined using the entropy weight method. Under the combination of the three infill criteria and the entropy weight method, the adaptive sampling model can not only continue sampling to make the RBFM more accurate, but also solve the multi-outputs sampling problem and further improve computational efficiency.

The remaining sections of this paper are structured as follows. In section “The proposed approach”, the proposed method's procedure is described in detail. In section “Benchmark function test”, benchmark functions demonstrate the applicability of the proposed method to SIMO and MIMO problems. Using typical measuring points, Section "Numerical experiment" establishes a surrogate model for numerical simulation of dams. Section "Conclusion" draws some conclusions.

The proposed approach

This section introduces the surrogate model with multi-outputs based on the adaptive sampling strategy in detail, including the Design of Experiments (DoE), surrogate model, infill criteria, the determination of the ideal infill group with multi-outputs, stopping criteria, and the flow chart of the proposed method.

Design of Experiments (DoE)

DoE47 is used to generate samples in the design space to establish the initial surrogate model. In order to obtain an accurate surrogate model with limited initial samples, uniform sampling of the sample space is usually adopted. Sobol' quasi-random sequences, also known as "low-discrepancy" sequences, are uniformly distributed in space. Unlike random numbers, quasi-random points form a deterministic sequence that accounts for the positions of previously sampled points and is constructed to avoid clusters as much as possible. Moreover, Sobol' quasi-random sequences satisfy the following three main requirements: (1) best uniformity of distribution as \(N \to \infty\); (2) good distribution for fairly small initial sets; (3) a very fast computational algorithm. This paper therefore adopts the Sobol sequence for initial sampling.
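As a brief illustration, such an initial DoE can be generated with SciPy's quasi-Monte Carlo module; the bounds and sample size below are illustrative, not those used in the paper:

```python
import numpy as np
from scipy.stats import qmc

def sobol_doe(n_samples, lower, upper, seed=0):
    """Generate an initial DoE by scaling a Sobol' sequence to the bounds."""
    sampler = qmc.Sobol(d=len(lower), scramble=True, seed=seed)
    unit = sampler.random(n_samples)      # quasi-random points in [0, 1)^d
    return qmc.scale(unit, lower, upper)  # map to the design-space bounds

doe = sobol_doe(8, lower=[0.5, -3.0], upper=[2.5, 3.0])  # 8 points in 2-D
```

Scrambling keeps the low-discrepancy property while randomizing the point set, which is convenient when several independent DoE realizations are needed.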

Surrogate model

The RBFM is a nonlinear scattered-data interpolation model that fits a high-dimensional function through a combination of one-dimensional basis functions.

Suppose n samples are taken as the training set, denoted \(\left( {x_{i} ,y_{i} } \right)\). The RBFM can then be expressed as the linear combination shown in Eq. (1).

$${y\left( x \right)} = \sum\limits_{i = 1}^{n} {{\omega_{i}} \varphi \left( {\left\| {x - x_{i} } \right\|} \right)}$$
(1)

where \(\varphi\) is the kernel function, and \(r_{i} = \left\| {x - x_{i} } \right\|\) is the Euclidean distance between an arbitrary point \(x\) and the known sampling point \(x_{i}\). \(\omega_{i}\) is the weight coefficient of the kernel function at the different training points. Common radial basis kernel functions include the thin-plate spline, Gaussian, multiquadric (MQ), and inverse multiquadric (IMQ) functions, among others. In this paper, the multiquadric (MQ) function is taken as the kernel function; its expression is shown in Eq. (2).

$$\varphi \left( r \right) = \sqrt {r^{2} + c^{2} }$$
(2)

where c is the shape parameter of the radial basis function, whose value determines the specific shape of the kernel function.

Writing Eq. (1) at each of the n sampling points yields the system of equations shown in Eq. (3), from which the weight coefficients \(\omega_{i}\) can be obtained.

$$\left\{ \begin{gathered} \omega_{1} \varphi \left( {\left\| {x_{1} - x_{1} } \right\|} \right) + \omega_{2} \varphi \left( {\left\| {x_{1} - x_{2} } \right\|} \right) + \cdots + \omega_{n} \varphi \left( {\left\| {x_{1} - x_{n} } \right\|} \right) = y_{1} \hfill \\ \omega_{1} \varphi \left( {\left\| {x_{2} - x_{1} } \right\|} \right) + \omega_{2} \varphi \left( {\left\| {x_{2} - x_{2} } \right\|} \right) + \cdots + \omega_{n} \varphi \left( {\left\| {x_{2} - x_{n} } \right\|} \right) = y_{2} \hfill \\ \, \vdots \hfill \\ \omega_{1} \varphi \left( {\left\| {x_{n} - x_{1} } \right\|} \right) + \omega_{2} \varphi \left( {\left\| {x_{n} - x_{2} } \right\|} \right) + \cdots + \omega_{n} \varphi \left( {\left\| {x_{n} - x_{n} } \right\|} \right) = y_{n} \hfill \\ \end{gathered} \right.$$
(3)

Defining the coefficient vector \({\text{w}} = \left[ {\omega_{1} ,\omega_{2} , \ldots ,\omega_{n} } \right]^{T}\) and the matrix \({\Phi }_{i,j} = \varphi \left( {\left\| {x_{i} - x_{j} } \right\|} \right)\), where \(i = 1,2, \ldots ,n\) and \(j = 1,2, \ldots ,n\), this system can be written as \({\Phi w} = y^{T}\). Then, provided the inverse of \({\Phi }\) exists, the weights are given by \({\text{w}} = {\Phi }^{ - 1} y^{T}\). The predictor of the RBFM can be expressed as

$$\widehat{y}_{{N + 1}} = {\varvec{\upvarphi}} {\mathbf{w}} = {\varvec{\upvarphi}} {\varvec{\Phi}} ^{{ - 1}} y^{T}$$
(4)

where \(\varphi = \left[ {\varphi \left( {\left\| {x_{N + 1} - x_{1} } \right\|} \right),\varphi \left( {\left\| {x_{N + 1} - x_{2} } \right\|} \right), \ldots ,\varphi \left( {\left\| {x_{N + 1} - x_{N} } \right\|} \right)} \right]\).

The RBFM treats each deterministic response as the realization of a stochastic process (taken here to be a Gaussian random variable). Using the (Gaussian) distributions of the N responses \(y = \left[ {y_{1} ,y_{2} , \ldots ,y_{N} } \right]\) collected so far, it can be shown that the mean and the variance of the assumed stochastic process at \(x_{N + 1}\) are48

$$\widehat{y}_{{N + 1}} = {\varvec{\upvarphi}} {\varvec{\Phi}} ^{{ - 1}} y^{T}$$
(5)
$$\sigma _{{\widehat{y}_{{N + 1}} }}^{2} = 1 - {\varvec{\upvarphi}} {\varvec{\Phi}} ^{{ - 1}} {\varvec{\upvarphi}} ^{T}$$
(6)

The variance of the Gaussian distribution (Eq. 6) will be taken as a measure of the likely error at the prediction points.
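A minimal sketch of Eqs. (1)–(6), assuming the MQ kernel of Eq. (2) with an illustrative shape parameter c = 1 and toy 1-D data (both are assumptions for the example, not values from the paper):

```python
import numpy as np

def mq(r, c=1.0):
    """Multiquadric kernel, Eq. (2)."""
    return np.sqrt(r ** 2 + c ** 2)

def fit_rbf(X, y, c=1.0):
    """Build the Gram matrix and solve Phi w = y for the weights, Eq. (3)."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Phi = mq(r, c)
    return Phi, np.linalg.solve(Phi, y)

def predict_rbf(X, Phi, w, X_new, c=1.0):
    """Mean and variance of the RBFM at new points, Eqs. (5)-(6)."""
    r = np.linalg.norm(X_new[:, None, :] - X[None, :, :], axis=-1)
    phi = mq(r, c)                          # basis rows at the new points
    Phi_inv = np.linalg.inv(Phi)
    mean = phi @ w                          # Eq. (5)
    var = 1.0 - np.einsum('ij,jk,ik->i', phi, Phi_inv, phi)  # Eq. (6)
    return mean, var

X = np.linspace(0.5, 2.5, 8)[:, None]   # toy training inputs
y = np.sin(3.0 * X[:, 0])               # toy responses
Phi, w = fit_rbf(X, y)
mean, var = predict_rbf(X, Phi, w, X)   # evaluated back at the training points
```

At the training points the predictor interpolates exactly, and with c = 1 the variance of Eq. (6) vanishes there, consistent with its use as a sparsity measure.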

Infill criteria

The infill criteria are developed to evaluate the prediction uncertainty and the improvement on the current best value in global optimization, treating an unknown output as the realization of a stochastic process. In the proposed method, three infill criteria are used for self-adaptive sampling: (1) mean squared error (MSE), (2) expected improvement on the minimum (EImin), and (3) expected improvement on the maximum (EImax). Accordingly, three new points are obtained at each updating cycle. The new samples are added to the current DoE to update the surrogate models, which drives the DoE towards the global optimum. The infill criteria above are defined for a single output; how to choose the ideal infill group in the multi-outputs case is explained in section "Determine the ideal infill group with multi-outputs".

Maximizing the expected improvement (EI)

Expected improvement (EI) is an infill criterion that evaluates how much improvement of the current RBFM is expected if a new sample is obtained. We assume a random variable \(Y\sim N\left[ {\hat{y}\left( x \right),s^{2} \left( x \right)} \right]\), where \(\hat{y}\) is the RBFM predictor defined in Eq. (5) and s2 is the MSE defined in Eq. (6). Denote the best objective value among the samples evaluated so far by \(y_{\min } = \min \left[ {y_{1} ,y_{2} , \ldots ,y_{N} } \right]\). The improvement on the minimum can then be defined as \(I = y_{\min } - Y\left( x \right)\), where Y(x) follows the Gaussian distribution above, and the goal is to find a sample for which \(I > 0\). Integrating over this distribution gives the expectation of I, so the expected improvement is

$$E\left[ {I\left( x \right)} \right]_{\min } = \left\{ {\begin{array}{*{20}c} {\left( {y_{\min } - \hat{y}} \right)\Psi \left( {\frac{{y_{\min } - \hat{y}}}{s}} \right) + s\psi \left( {\frac{{y_{\min } - \hat{y}}}{s}} \right)} & {if\;s > 0} \\ {0\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \;\;\,} & {if\;s = 0} \\ \end{array} } \right.$$
(7)

where \(\Psi \left( \cdot \right)\) and \(\psi \left( \cdot \right)\) are the cumulative distribution and probability density function of a standard normal distribution, respectively.

Equation (7) shows the EI on minimum, while EI on maximum is also applied to the selection of the infill points. Expected improvement on maximum can be expressed as

$$E\left[ {I\left( x \right)} \right]_{\max } = \left\{ {\begin{array}{*{20}c} {\left( {\hat{y} - y_{\max } } \right)\Psi \left( {\frac{{\hat{y} - y_{\max } }}{s}} \right) + s\psi \left( {\frac{{\hat{y} - y_{\max } }}{s}} \right)} & {if\;s > 0} \\ {0\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \,} & {if\;s = 0} \\ \end{array} } \right.$$
(8)

Schonlau49 points out that the expected improvement tends to be large at a point whose predicted value is smaller than ymin and/or whose prediction carries much uncertainty. Expected improvement can therefore be considered a balance between seeking promising areas of the design space and reducing the uncertainty in the model.
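The two EI criteria of Eqs. (7)–(8) can be sketched as follows, assuming `yhat` and `s` come from the predictor and standard error of Eqs. (5)–(6); `ei_max` is obtained from `ei_min` by the sign symmetry of the improvement:

```python
import numpy as np
from scipy.stats import norm

def ei_min(yhat, s, y_min):
    """Expected improvement on the minimum, Eq. (7)."""
    yhat = np.asarray(yhat, float)
    s = np.asarray(s, float)
    safe_s = np.where(s > 0, s, 1.0)          # avoid division by zero
    z = (y_min - yhat) / safe_s
    ei = (y_min - yhat) * norm.cdf(z) + s * norm.pdf(z)
    return np.where(s > 0, ei, 0.0)           # EI is defined as 0 where s = 0

def ei_max(yhat, s, y_max):
    """Expected improvement on the maximum, Eq. (8), via symmetry."""
    return ei_min(-np.asarray(yhat, float), s, -y_max)
```

For yhat = ymin and s > 0 the first term vanishes and EI reduces to s·ψ(0), illustrating the balance noted above: points with large prediction uncertainty remain attractive even without a predicted improvement.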

Maximizing the mean squared error (MSE)

In regions of high uncertainty, the sparsity of training samples leads to greater uncertainty in the prediction of unknown samples. As a measure of the sparseness of the input space, the MSE can therefore be used as an infill criterion to search for the infill sample with the largest prediction uncertainty. The MSE criterion is the prediction variance (s2) of the RBFM, so the infill point \(\left( {x_{MSE}^{n + 1} } \right)\) is chosen by maximizing the MSE:

$$x_{MSE}^{n + 1} = \mathop {\arg \max }\limits_{x} \; s^{2} \left( x \right)$$
(9)
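In practice, Eq. (9) amounts to an argmax of the prediction variance over a candidate set. A sketch with a purely hypothetical stand-in for s²(x):

```python
import numpy as np

def mse_infill(candidates, s2):
    """Choose the candidate with the largest prediction variance, Eq. (9)."""
    return candidates[np.argmax(s2(candidates))]

# Illustrative variance stand-in: uncertainty grows with distance from x = 2.0,
# mimicking sparse sampling far from a cluster of training points.
s2_toy = lambda x: 1.0 - np.exp(-(x - 2.0) ** 2)
x_new = mse_infill(np.linspace(0.5, 2.5, 201), s2_toy)  # farthest candidate wins
```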

Determine the ideal infill group with multi-outputs

According to the three infill criteria in section "Infill criteria", adding a new infill group to the DoE improves the quality of the surrogate model to some extent. However, for a multi-outputs problem with m outputs, we obtain m infill groups, and how to determine the ideal infill group among them is the focus of this section. As an objective weighting method, the entropy weight method can calculate the weight of each output and thereby provide a reference for determining the ideal infill group. The detailed steps are as follows:

Step 1 Establish a standard decision matrix. For a problem with m alternatives (outputs) and n attribute values (infill criteria), the initial decision matrix is:

$$A = \left( \begin{gathered} y_{11} \cdots y_{1n} \hfill \\ \, \vdots \, \ddots \, \vdots \hfill \\ y_{m1} \cdots y_{mn} \hfill \\ \end{gathered} \right)$$
(10)

The initial matrix A is normalized:

$$B = \left( \begin{gathered} b_{11} \cdots b_{1n} \hfill \\ \, \vdots \, \ddots \, \vdots \hfill \\ b_{m1} \cdots b_{mn} \hfill \\ \end{gathered} \right)$$
(11)

where bij is the standardized value of the jth evaluation indicator for the ith alternative.

The formula for normalization is:

$$b_{ij} = \frac{{y_{ij} }}{{\sum\limits_{i = 1}^{m} {y_{ij} } }}$$
(12)

In the proposed method, the columns represent MSE, EImin and EImax, respectively.

Step 2 Calculate the weights based on the entropy weight method.

Define the entropy of the jth indicator as

$$e_{j} = - k\sum\limits_{i = 1}^{m} {p_{ij} \ln p_{ij} }$$
(13)

where

$$p_{ij} = \frac{{b_{ij} }}{{\sum\limits_{i = 1}^{m} {b_{ij} } }},\quad k = \frac{1}{\ln \left( m \right)}$$
(14)

Since a smaller entropy implies a larger weight, the weight of the jth indicator is:

$$w_{j} = \frac{{1 - e_{j} }}{{\sum\limits_{j = 1}^{n} {\left( {1 - e_{j} } \right)} }}$$
(15)

where \(w_{j} \in \left[ {0,1} \right]\), \(\sum\limits_{j = 1}^{n} {w_{j} = 1}\), \(j = 1,2, \ldots ,n\).

Step 3 Calculate the close degree Ti.

The weighting matrix Z is defined as:

$$Z = \left( \begin{gathered} z_{11} \cdots z_{1n} \hfill \\ \, \vdots \, \ddots \, \vdots \hfill \\ z_{m1} \cdots z_{mn} \hfill \\ \end{gathered} \right) = \left( \begin{gathered} w_{1} b_{11} \cdots w_{n} b_{1n} \hfill \\ \, \vdots \, \ddots \, \vdots \hfill \\ w_{1} b_{m1} \cdots w_{n} b_{mn} \hfill \\ \end{gathered} \right)$$
(16)

Determine the ideal solution matrix P:

$$P = \left[ {p_{1} ,p_{2} , \ldots ,p_{n} } \right]$$
(17)

where \(p_{j} = \mathop {\max }\limits_{i} z_{ij}\), \(j = 1,2, \ldots ,n\).

Calculate the close degree Ti:

$$T_{i} = 1 - \frac{{\sum\limits_{j = 1}^{n} {z_{ij} p_{j} } }}{{\sum\limits_{j = 1}^{n} {\left( {p_{j} } \right)^{2} } }}$$
(18)

The larger the value of Ti, the better the corresponding solution. The ideal infill group is therefore determined by the maximum value of Ti, and the selected infill group is added to the DoE.
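A compact sketch of Steps 1–3 (Eqs. (10)–(18)). The decision matrix below, with m = 3 candidate infill groups (rows) and the n = 3 criteria MSE, EImin, EImax (columns), is purely illustrative:

```python
import numpy as np

def ideal_infill_group(Y):
    """Entropy-weight selection of the ideal infill group, Eqs. (10)-(18)."""
    m, n = Y.shape
    B = Y / Y.sum(axis=0)                       # column normalization, Eq. (12)
    P = B / B.sum(axis=0)                       # p_ij, Eq. (14)
    plogp = np.where(P > 0, P * np.log(np.where(P > 0, P, 1.0)), 0.0)
    e = -plogp.sum(axis=0) / np.log(m)          # entropy of each criterion, Eq. (13)
    w = (1.0 - e) / (1.0 - e).sum()             # entropy weights, Eq. (15)
    Z = w * B                                   # weighted matrix, Eq. (16)
    p = Z.max(axis=0)                           # ideal solution, Eq. (17)
    T = 1.0 - (Z * p).sum(axis=1) / (p ** 2).sum()  # close degree, Eq. (18)
    return int(np.argmax(T)), w, T              # the paper selects the maximum T

Y = np.array([[0.80, 0.20, 0.30],    # output 1: MSE, EImin, EImax values
              [0.30, 0.90, 0.10],    # output 2
              [0.10, 0.40, 0.70]])   # output 3
best, w, T = ideal_infill_group(Y)
```

The row index `best` identifies which output's infill group is appended to the DoE at this stage.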

Stopping criteria

In order to evaluate the performance of the surrogate model and define the stopping criterion of adaptive sampling, an appropriate index must be selected.

The R-squared value (R2) and the root mean square error (RMSE) are the indexes most commonly used to evaluate the performance of a surrogate model. Their formulas are as follows:

$$R^{2} = 1 - \frac{SSE}{{SST}} = 1 - \frac{{\sum\limits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } }}{{\sum\limits_{i = 1}^{n} {\left( {y_{i} - \overline{y}} \right)^{2} } }}$$
(19)
$$RMSE = \sqrt {\frac{{\sum\limits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } }}{n}}$$
(20)

where SSE is the sum of squared residuals and SST is the total sum of squares; \(\hat{y}_{i}\) and yi denote the surrogate-model prediction and the true output at the inferred samples, and \(\overline{y}\) is the mean of y.

R2 is commonly used to measure how well a surrogate model explains variations in the data. It captures the linear correlation of the model well, but tends to be weak for complex nonlinear problems. RMSE, as the standard deviation of the residual between the response and the prediction, focuses more on how well the surrogate model predicts the absolute value of the response, which is the emphasis of this paper. However, large prediction errors strongly influence RMSE, especially for variables of large magnitude. Therefore, the normalized root mean square error (NRMSE) is used as the model accuracy index to verify the predictions of the surrogate model. NRMSE is defined as follows:

$$NRMSE = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {\frac{{y_{i} - \hat{y}_{i} }}{\max \left( y \right) - \min \left( y \right)}} \right)^{2} } }$$
(21)

For multi-outputs problems, each output corresponds to an NRMSE. According to the "short-board" (weakest-link) effect50, the overall accuracy of a multi-outputs problem is determined by the worst model accuracy, so the system accuracy index NRMSEsys is defined as:

$$NRMSE_{sys} = \mathop {\max }\limits_{1 \le j \le m} \left\{ {NRMSE_{j} } \right\}$$
(22)
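A sketch of Eqs. (21)–(22) on illustrative arrays, where `Y_true` and `Y_pred` hold one column per output:

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Normalized root mean square error, Eq. (21)."""
    span = y_true.max() - y_true.min()
    return float(np.sqrt(np.mean(((y_true - y_pred) / span) ** 2)))

def nrmse_sys(Y_true, Y_pred):
    """System index: the worst per-output NRMSE, Eq. (22)."""
    return max(nrmse(Y_true[:, j], Y_pred[:, j]) for j in range(Y_true.shape[1]))

Y_true = np.array([[0.0, 0.0], [2.0, 1.0]])
Y_pred = np.array([[1.0, 0.0], [2.0, 1.0]])  # first output off by 1 at one point
```

Normalizing by the response range makes the index comparable across outputs of different magnitudes, which is what allows the max in Eq. (22) to be meaningful.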

Steps for the proposed approach

The flow chart of the proposed method is shown in Fig. 1, and the specific implementation steps are as follows:

  1. Generate the initial DoE with the Sobol sequence.

  2. Calculate the real data by numerical simulation.

  3. Establish the surrogate models with multi-outputs.

  4. Use the three infill criteria to select infill groups for the different outputs, forming a multi-outputs infill-group matrix.

  5. Calculate the close degree of each infill group to determine the ideal infill group.

  6. Calculate the validation metric (NRMSE) on the validation set.

  7. Check whether the validation metric satisfies the stopping criterion. If it does, stop the calculation; otherwise, add the ideal infill group to the DoE and repeat from step (3) until the stopping criterion is satisfied.
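The steps above can be sketched end-to-end for a single toy output in 1-D. Only the MSE criterion is used for infill, and the EI criteria, the entropy-weight selection, and the true simulator are omitted or replaced by stand-ins, so this is a structural illustration of the loop rather than the full method; the kernel parameter, sample sizes, and toy function are all assumptions:

```python
import numpy as np
from scipy.stats import qmc

f = lambda x: np.sin(6.0 * x) + 0.5 * x          # toy stand-in for the simulator
mq = lambda r, c=0.2: np.sqrt(r * r + c * c)     # MQ kernel, illustrative c

def fit(X, y):
    Phi = mq(np.abs(X[:, None] - X[None, :]))
    return Phi, np.linalg.solve(Phi, y)

def predict(X, Phi, w, Xn):
    phi = mq(np.abs(Xn[:, None] - X[None, :]))
    Pinv = np.linalg.inv(Phi)
    return phi @ w, 1.0 - np.einsum('ij,jk,ik->i', phi, Pinv, phi)

X = qmc.scale(qmc.Sobol(d=1, seed=1).random(16), 0.5, 2.5)[:, 0]  # step (1)
y = f(X)                                                          # step (2)
grid = np.linspace(0.5, 2.5, 401)
errs = []
for stage in range(20):
    Phi, w = fit(X, y)                                            # step (3)
    yhat, _ = predict(X, Phi, w, grid)
    span = f(grid).max() - f(grid).min()
    errs.append(np.sqrt(np.mean(((f(grid) - yhat) / span) ** 2))) # step (6)
    if errs[-1] < 0.01:                                           # step (7)
        break
    cand = grid[~np.isin(grid, X)]                 # candidates not yet sampled
    _, s2 = predict(X, Phi, w, cand)
    x_new = cand[np.argmax(s2)]                    # steps (4)-(5), MSE only
    X, y = np.append(X, x_new), np.append(y, f(x_new))
```

In the full multi-outputs method, the single `x_new` above is replaced by one infill group per output (from MSE, EImin, and EImax), with the entropy weight method choosing which group enters the DoE.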

Figure 1

The flowchart of the proposed approach.

Benchmark function test

This section uses two kinds of benchmark functions to verify the applicability and stability of the adaptive sampling method with multi-outputs in single-input & multi-outputs (SIMO) and multi-inputs & multi-outputs (MIMO) problems. Meanwhile, the modeling accuracy of the surrogate model with multi-outputs is analyzed by NRMSE.

Single-input & multi-outputs (SIMO) problem

In this section, the proposed adaptive sampling method with multi-outputs is applied to a SIMO problem to investigate its applicability. A non-stationary benchmark function proposed by Gramacy et al.51 and a multi-modal benchmark function are used for verification. For benchmark function f1, the periodic term gives the function a certain fluctuation; meanwhile, once x grows sufficiently large (x ≥ 2), the quartic term increases greatly, gradually suppressing the fluctuation. For benchmark function f2, the fluctuation gradually increases with x owing to the linear amplitude factor. Thus f1 fluctuates strongly for small x, while f2 fluctuates strongly for large x, so together these two benchmark functions provide a good test of the proposed adaptive sampling method. The specific formulas are as follows:

$$f_{1} = \frac{{\sin \left( {10\pi x} \right)}}{2x} + \left( {x - 1} \right)^{4} ,x \in \left[ {0.5,2.5} \right]$$
(23)
$$f_{2} = - \left( {1.4 - 3x} \right)\sin \left( {18x} \right),x \in \left[ {0.5,2.5} \right]$$
(24)
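For reference, the two benchmarks of Eqs. (23)–(24) can be written directly as:

```python
import numpy as np

def f1(x):
    """Non-stationary Gramacy-type function, Eq. (23)."""
    return np.sin(10.0 * np.pi * x) / (2.0 * x) + (x - 1.0) ** 4

def f2(x):
    """Multi-modal function, Eq. (24)."""
    return -(1.4 - 3.0 * x) * np.sin(18.0 * x)
```

Both are defined on x ∈ [0.5, 2.5]; the oscillation of f1 is damped as the quartic term grows, while the oscillation amplitude of f2 grows linearly with x.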

For the DoE, 10 samples are generated by the Sobol sequence. Choosing the stopping-criterion value is difficult, and a suitable value prevents over- or under-fitting. Referring to the choices made in the literature52,53, the stopping criterion is set to 0.01 here. This section also presents the sampling process and the NRMSE after each adaptive sampling stage, shown in Fig. 2 and Table 1, respectively. The ideal infill group for each stage is shown in bold.

Figure 2

The application of adaptive sampling with multi-outputs to SIMO problem.

Table 1 NRMSE after each stage of adaptive sampling.

For the benchmark functions in the SIMO problem, the proposed adaptive sampling method runs for 8 stages and eventually samples a total of 34 points: 10 initial samples and 24 infill samples. Combining the analysis of Fig. 2 and Table 1, the following conclusions can be drawn:

  1. With the initial sample size, the NRMSE of f1 and f2 is 0.1075 and 0.2683, respectively. As the figure shows, owing to the insufficient initial sample size and the random sampling method, the initial RBFM cannot exactly reflect the maxima and minima of the original model, so the performance of the surrogate model is unsatisfactory.

  2. After 3 stages of adaptive sampling, the current DoE consists of the 10 initial samples and 9 infill samples. Compared with the initial model, performance is greatly improved: the NRMSE of f1 and f2 decreases from 0.1075 and 0.2683 to 0.0679 and 0.1408. The accuracy improvement of f2 is noticeably larger than that of f1, because the infill group corresponding to f2 is selected as the ideal infill group in the first 3 stages. The 9 newly added infill samples cluster mainly at x ≤ 1 and x ≥ 2, which significantly improves the surrogate model of f2. For f1, the model performance improves to a certain extent but still differs from the original model (Supplementary Information).

  3. After 6 stages of adaptive sampling, a total of 18 infill samples have been added. At this stage the surrogates of f1 and f2 are basically similar to the original models, with the NRMSE decreased to 0.0189 and 0.0107, respectively. In stage 4 the infill samples corresponding to f2 are selected as the ideal infill group, while those corresponding to f1 are selected in stages 5 and 6. Most of the 9 newly added infill samples lie at the maxima and minima of the benchmark functions, which brings the surrogate models closer to the original models.

  4. After 8 stages of adaptive sampling, the NRMSE of the two benchmark functions satisfies the stopping criterion, at 0.0014 and 0.0015, respectively. At this point the surrogate models are essentially identical to the original models. The model-error plot in Fig. 3 shows that the error of f2 is slightly larger than that of f1, with a maximum of 0.1 versus only 0.04; this is because the normalized MSE is used to measure model performance, and a further stage of adaptive sampling would reduce the error of f2 further. In stage 7, the infill samples corresponding to f2 are selected as the ideal infill group; since the NRMSE of f2 then satisfies the stopping criterion, f2 is no longer selected in stage 8.

  5. According to Table 2, when the Sobol sequence generates 34 samples, the NRMSE of f1 and f2 is 0.0015 and 0.0023, larger than the NRMSE obtained with adaptive sampling. When the sample size reaches 40, the surrogate model of f1 outperforms that of the proposed method, while that of f2 is not significantly improved. This is because the quasi-random Sobol samples are not targeted at the regions of large error, so many samples waste computational effort as invalid sampling. Only when the sample size reaches 47 does the Sobol sequence achieve the required accuracy. The proposed adaptive sampling method with multi-outputs therefore saves 27.66% of the sampling cost.

  6. In conclusion, the adaptive sampling method with multi-outputs performs well on the SIMO problem. It can place samples in a targeted manner according to the characteristics of the functions and greatly reduces the sample size required to train the surrogate model.

Figure 3

Model error corresponding to final surrogate model.

Table 2 Comparison of NRMSE of adaptive sampling with multi-outputs and Sobol sequence in SIMO problem.

Multi-inputs & multi-outputs (MIMO) problem

The previous section analyzed the application of the adaptive sampling method with multi-outputs to a SIMO problem; this section focuses on its application to a MIMO problem. In a MIMO problem, after a surrogate model is established for each output and its infill group is obtained, the ideal infill group is determined with the entropy weight method and used to update the DoE. On this basis, the surrogate model for each output is reconstructed, and this process is repeated until every surrogate model meets the stopping criterion. As the sample size increases, the accuracy of each output's surrogate model tends to increase, and the ideal infill group preferentially improves the accuracy of the surrogate model of the corresponding output. This section verifies the method using six benchmark functions from Liu et al.46. These six benchmark functions are plotted in Fig. 4, and their specific formulas are as follows:

Figure 4

Figure of benchmark functions in MIMO problem.

Benchmark Function 1 (BF1)

$$\begin{aligned} f_{1} = & 3\left( {1 - x_{1} } \right)^{2} \exp \left( { - x_{1}^{2} - \left( {x_{2} + 1} \right)^{2} } \right) \\ \, & - 10\left( {\frac{{x_{1} }}{5} - x_{1}^{3} - x_{2}^{5} } \right)\exp \left( { - x_{1}^{2} - x_{2}^{2} } \right) \\ \, & - \frac{1}{3}\exp \left( { - \left( {x_{1} + 1} \right)^{2} - x_{2}^{2} } \right),x_{1,2} \in \left[ { - 3,3} \right] \\ \end{aligned}$$
(25)

Benchmark Function 2 (BF2)

$$f_{2} = x_{1} x_{2} + 0.05,x_{1,2} \in \left[ { - 3,3} \right]$$
(26)

Benchmark Function 3 (BF3)

$$\begin{aligned} f_{3} = & \left( {1 + \left( {x_{1} + x_{2} + 1} \right)^{2} \left( {19 - 14x_{1} + 13x_{1}^{2} - 14x_{2} + 6x_{1} x_{2} + 3x_{2}^{2} } \right)} \right) \\ \, & \left( {30 + \left( {2x_{1} - 3x_{2} } \right)^{2} \left( {18 - 32x_{1} + 12x_{1}^{2} - 48x_{2} - 36x_{1} x_{2} + 27x_{2}^{2} } \right)} \right), \\ \, & x_{1,2} \in \left[ { - 3,3} \right] \\ \end{aligned}$$
(27)

Benchmark Function 4 (BF4)

$$\begin{aligned} f_{4} = & - \cos \left( {x_{1} } \right)\cos \left( {x_{2} } \right)\exp \left( { - \left( {x_{1} - \pi } \right)^{2} - \left( {x_{2} - \pi } \right)^{2} } \right), \\ \, & x_{1,2} \in \left[ { - 3,3} \right] \\ \end{aligned}$$
(28)

Benchmark Function 5 (BF5)

$$\begin{aligned} f_{5} = & \frac{1}{36}\left( {\left( {x_{1} + 3} \right)^{2} + \left( {x_{2} + 3} \right)^{2} } \right) \\ \, & - \left( {\cos \left( {3\left( {x_{1} + 3} \right) + \cos \left( {3\left( {x_{2} + 3} \right)} \right)} \right)} \right), \\ \, & x_{1,2} \in \left[ { - 3,3} \right] \\ \end{aligned}$$
(29)

Benchmark Function 6 (BF6)

$$\begin{aligned} f_{6} = & - 20\exp \left( { - \frac{{x_{1}^{2} + x_{2}^{2} }}{90}} \right) \\ & - \exp \left( {0.5\left( {\cos \left( {\frac{2\pi }{3}x_{1} } \right) + \cos \left( {\frac{2\pi }{3}x_{2} } \right)} \right)} \right) + 20 + \exp \left( 1 \right), \\ & x_{1,2} \in \left[ { - 3,3} \right] \\ \end{aligned}$$
(30)

As can be seen from Fig. 4, the six functions have different characteristics. Benchmark function 1 (BF1) contains multiple Gaussian and polynomial terms with different centers, and thus has a multi-modal region in the center and a flat boundary region. Benchmark function 2 (BF2) is a simple second-order polynomial. Compared with BF2, benchmark function 3 (BF3) has higher-order polynomial terms and exhibits highly nonlinear responses near the boundaries; besides, its output values are much larger than those of the other functions. Benchmark function 4 (BF4) has a Gaussian function centered at \(\left( {\pi ,\pi } \right)\) and some periodic terms, so apart from a single peak near the corner it produces almost zero output across the entire region. The output of benchmark function 5 (BF5) fluctuates uniformly within the region. Benchmark function 6 (BF6) has one Gaussian function located at the center and another Gaussian function with some periodic terms, so it produces a single central peak region and a boundary region where the output changes uniformly.

Similar to Sect. 3.1, 50 random samples are generated by the Sobol sequence for the DoE, and the stopping criterion is set to 0.01. For the benchmark functions in the MIMO problem, the proposed adaptive sampling method runs for 16 stages and eventually samples a total of 98 samples, including 50 initial samples and 48 infill samples. Table 3 shows the NRMSE calculated at each stage and the ideal infill group selected at each stage, where the ideal infill group is shown in bold. Table 4 compares the results of the adaptive sampling method with multi-outputs and the Sobol sequence. The following conclusions can be drawn by combining Tables 3 and 4:

  1. (1)

    Under the initial sample size, the NRMSE values of the benchmark functions are 0.026429, 0.000026, 0.009237, 0.013803, 0.062813, and 0.041767; only f2 and f3 satisfy the stopping criterion. Compared with the other four benchmark functions, f2 and f3 have simpler forms and therefore require smaller sample sizes. For the other four benchmark functions, the initial DoE is insufficient to cover all of their characteristics, so adaptive sampling is required.

  2. (2)

    During the 16 stages of adaptive sampling, the ideal infill groups mainly focus on f1, f5, and f6. After each infill, the NRMSE of the benchmark function corresponding to the ideal infill group is generally lower than that of the other benchmark functions. When the adaptive sampling reaches stage 8, the NRMSE of f1 satisfies the stopping criterion, so f1 is no longer considered in the subsequent selection of the ideal infill group. Similarly, at stage 13, the NRMSE of f6 also satisfies the stopping criterion, leaving only f5 unsatisfied. Therefore, the last 4 stages of adaptive sampling target f5.

  3. (3)

    For the six benchmark functions in the MIMO problem, this section also establishes a surrogate model using the Sobol sequence and compares the results with adaptive sampling with multi-outputs. The proposed method satisfies the stopping criterion at 98 samples, whereas the Sobol sequence does so only at 150 samples; the proposed method therefore saves more than 30% of the sampling cost. At 98 samples, the NRMSE values of the two methods are 0.006339, 0, 0.000277, 0.000769, 0.008146, 0.009856 and 0.026285, 0.000022, 0.007969, 0.013974, 0.059359, 0.040626, respectively; the NRMSE of the Sobol sequence is significantly higher than that of the proposed method. As in the SIMO problem, random sampling fails to sample the model in a targeted way, leading to the waste of a large number of samples.

  4. (4)

    In conclusion, the adaptive sampling method with multi-outputs performs well on the MIMO problem. Moreover, the sampling can be adjusted according to the form of each function, which greatly reduces the cost of model calculation.

Table 3 Calculation process of adaptive sampling with multi-outputs in MIMO problem.
Table 4 Comparison of NRMSE of adaptive sampling with multi-outputs and Sobol sequence in MIMO problem.
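As a point of reference, the NRMSE used as the stopping criterion above can be computed as follows. This is a minimal sketch; the normalization by the range of the true responses is an assumed convention, not restated from the paper:

```python
import numpy as np

def nrmse(y_true, y_pred):
    """RMSE normalized by the range of the true responses (one common convention)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())

# a perfect surrogate gives NRMSE = 0, which trivially satisfies a 0.01 threshold
print(nrmse([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]))  # -> 0.0
```

In the multi-output setting, one such value is computed per output, and sampling stops only when every output falls below the threshold.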

Numerical experiment

In this section, adaptive sampling with multi-outputs is applied to a gravity dam. The dam is located on the Brahmaputra River in Tibet, China. The controlled catchment area at the dam toe is 157,407 km², the annual average flow is 1010 m³/s, the total reservoir capacity is 57.89 million m³, the normal water level of the reservoir is 3477.00 m, and the corresponding reservoir capacity is about 55.28 million m³. The dam is an RCC gravity dam with a maximum height of 118.0 m and a total crest length of 389 m. It is divided into 17 sections, among which sections 6#–9# are the overflow sections and the rest are water-retaining sections.

In this paper, 17 typical measuring points are selected from the top of the dam, and the dam surrogate model for these 17 measuring points is established using adaptive sampling with multi-outputs. The layout of the measuring points is shown in Fig. 5.

Figure 5
figure 5

Layout of the measuring points.

For the DoE, 100 random samples are generated by the Sobol sequence in this section. The variable information is shown in Table 5. In engineering problems, the nonlinearity of the model is more complex than that of the benchmark functions: once the NRMSE of the model drops to about 0.05, every further improvement is slow and incurs a large calculation cost. Combined with previous engineering experience, a threshold of 0.05 meets the requirements of most engineering cases, so the stopping criterion of NRMSE is set to 0.05 here. For this practical engineering problem, the proposed adaptive sampling method runs for 7 stages and eventually samples a total of 121 samples, including 100 initial samples and 21 infill samples. Similar to section “Benchmark function test”, the NRMSE calculated at each stage and the ideal infill group selected at each stage are given in Table 6, where bold represents the ideal infill group.

Table 5 Variable information table for adaptive sampling of gravity dam.
Table 6 Calculation process of adaptive sampling with multi-outputs in practical engineering problem.
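The DoE step can be reproduced with a standard quasi-random generator. The sketch below uses SciPy's Sobol sampler with illustrative bounds for three hypothetical variables; the actual variable ranges are those listed in Table 5:

```python
import numpy as np
from scipy.stats import qmc

# illustrative lower/upper bounds for three hypothetical variables
# (the real ranges for the dam problem are given in Table 5)
lower = np.array([0.0, 10.0, 1.0])
upper = np.array([1.0, 25.0, 3.0])

sampler = qmc.Sobol(d=3, scramble=True, seed=0)
unit_samples = sampler.random(n=100)             # 100 points in the unit cube [0, 1)^3
samples = qmc.scale(unit_samples, lower, upper)  # rescale to the variable ranges

print(samples.shape)  # -> (100, 3)
```

SciPy warns that Sobol balance properties hold best for power-of-two sample sizes; for DoE purposes, 100 scrambled points are still well spread over the design space.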

It can be found that under the initial DoE, the NRMSE of each measuring point is less than 0.1, although for some measuring points it is already close to 0.05. Whether 100 initial samples are too many for the surrogate model of the dam based on typical measuring points is therefore discussed later. During the 7 stages of adaptive sampling, the ideal infill group is not always located at the measuring point with the maximum error at each stage, but it always selects a measuring point with a large error. After each stage of infill, the NRMSE of the measuring point corresponding to the ideal infill group tends to decrease significantly.
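The staged infill process described above can be sketched schematically. In the sketch below, the toy two-output simulator, the thin-plate-spline RBF surrogate, and the error-based infill rule are illustrative stand-ins for the paper's formulation, not its exact algorithm:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)

def simulate(x):
    # cheap stand-in for the expensive simulator; its two outputs play the
    # role of two measuring points
    return np.column_stack([np.sin(3 * x[:, 0]), x[:, 0] ** 2])

def nrmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / (y_true.max() - y_true.min())

X = rng.uniform(-3, 3, size=(10, 1))             # initial DoE
X_val = np.linspace(-3, 3, 200).reshape(-1, 1)   # validation grid
Y_val = simulate(X_val)
threshold = 0.05

for stage in range(20):
    model = RBFInterpolator(X, simulate(X))      # one vector-valued RBF surrogate
    Y_hat = model(X_val)
    errors = [nrmse(Y_val[:, j], Y_hat[:, j]) for j in range(Y_val.shape[1])]
    if max(errors) < threshold:                  # stop once every output satisfies it
        break
    # infill near the point where the worst-performing output errs most
    j = int(np.argmax(errors))
    worst = X_val[np.argmax(np.abs(Y_hat[:, j] - Y_val[:, j]))]
    X = np.vstack([X, worst])

print(X.shape[0], [round(e, 4) for e in errors])
```

As in the paper, the loop adds infill samples only where some output still violates the criterion, so easy outputs stop driving the sampling once they converge.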

To verify that this method requires fewer samples than uniform or random sampling for multi-output problems, 1000 samples are generated by random sampling and simulated. Of these, 200 samples are used as the verification set, while the other 800 samples are compared with the adaptive sampling method. The results are shown in Table 7.

Table 7 Comparison of NRMSE of adaptive sampling with multi-outputs and Sobol sequence in practical engineering problem.

It can be seen from Table 7 that when the stopping criterion is set to 0.05, 121 samples are selected by adaptive sampling with multi-outputs. When the Sobol sequence randomly draws 121 samples, the NRMSE of every measuring point is still greater than the stopping criterion, although the model performance is improved compared with 100 samples. The surrogate model does not satisfy the stopping criterion until the sample size of the Sobol sequence reaches 400. Earlier in the sampling process, all of the other measuring points already satisfied the stopping criterion, but owing to the untargeted nature of random sampling, the surrogate model of measuring point 2 is not well improved. When the sample size of the Sobol sequence reaches 800, the performance of the surrogate model improves somewhat further, but considering the high calculation cost, this part of the sampling can be considered redundant.

In the analysis of Table 6, the question was raised of whether 100 initial samples are too many for establishing a surrogate model of the dam based on typical measuring points. Therefore, to explore the influence of the initial sample size on the final sample size, adaptive sampling is applied with different initial sample sizes. The results are shown in Table 8.

Table 8 The influence of different initial sample size on the final sample size.

As can be seen from Table 8, only 95 samples are required when 2 initial samples are selected, whereas 121 samples are required when 100 initial samples are selected. As the initial sample size increases, the final sample size required by this method also increases gradually. Whether different initial sample sizes affect the distribution form of the final sampling results of each variable is studied in detail below.

Figure 6 shows the influence of different initial sample sizes on the distribution form of the final sampling results of each variable, taking 2, 30, and 100 initial samples as examples. As can be seen from the figure, when the initial sample size is 100, all variables tend to be uniformly distributed overall, with only slight fluctuations in some areas. However, when the initial sample size is 2 or 30, the sample density of some variables increases within certain value ranges, and the patterns for these two initial sample sizes are the same.

Figure 6
figure 6

The influence of different initial sample sizes on the distribution form of the final sampling results of each variable.

For the time variable, the distribution form after the final sampling is essentially uncorrelated with the initial sample size and remains uniform. This is reasonable, because the uniformity of time is important to the model. For the upstream water level, the sample size clearly increases when the water level is high; the elastic modulus and linear expansion coefficient of concrete exhibit similar patterns. For the thermal conductivity of concrete, more samples gather at low thermal conductivity. For the deformation modulus of rock, few samples fall in the range of 10–25. The remaining variables are not significantly influenced by the initial sample size. Based on the above analysis, different initial sample sizes have a certain impact on the final sample size, and the final sample size increases with the initial sample size, which also indirectly shows that the proposed method can sample in a targeted way according to different forms of the surrogate model.

Conclusion

This paper discusses the application of the adaptive sampling technique to establishing surrogate models with multi-outputs. SIMO and MIMO problems are first studied on the benchmark functions and compared with the Sobol sequence; the results show that the proposed method requires a smaller sample size. Then, by establishing a surrogate model of the dam numerical simulation with multi-outputs, the applicability of the method to practical engineering problems is explored. In addition, the influence of different initial sample sizes on the final sample size and on the distribution form of the final sampling results of each variable is analyzed. The following conclusions are obtained:

  1. (1)

    Since the adaptive sampling model with multi-outputs is developed within a Bayesian framework, samples can be placed in a targeted way according to the function form of the surrogate model. In addition, the comparison with the Sobol sequence shows that this method can effectively reduce the sample size required to establish the surrogate model.

  2. (2)

    Different initial sample sizes have a certain influence on the final sample size. In this practical engineering problem, the smaller the initial sample size, the smaller the final sample size: a smaller initial sample size means the adaptive sampling process starts earlier, so more targeted samples can be placed. Therefore, when the initial sample size is small, a clustering phenomenon occurs for some variables within certain value ranges.

  3. (3)

    To verify the applicability of the proposed method, it is applied to SIMO and MIMO problems, respectively. The results show that this method can adaptively select the weights of different outputs and determine the ideal infill group, which greatly reduces human intervention. In addition, compared with random sampling, this method establishes a more accurate surrogate model with fewer samples.

  4. (4)

    In addition to the SIMO and MIMO problems, this method is also applied to establish a surrogate model of the dam numerical simulation with multi-outputs. Adaptive sampling based on multiple measuring points can be effectively applied to the parameter back analysis of dam structures to improve the computational efficiency of the back analysis.

  5. (5)

    It should be noted that the proposed method has so far been applied to a problem with 17 dimensions; higher-dimensional problems have not been verified. When facing higher-dimensional problems, the curse of dimensionality remains challenging. Therefore, future research will focus on how to conduct adaptive sampling with multi-outputs for higher-dimensional problems.