Abstract
The coupling of Finite Element (FE) simulations with approximate optimization techniques is becoming increasingly popular in the forming industry. By doing so, it is implicitly assumed that the optimization objective and possible constraints are smooth functions of the design variables and, in case of robust optimization, design and noise variables. However, non-linear FE simulations are known to introduce numerical noise caused by the discrete nature of the simulation algorithms, e.g. errors caused by re-meshing, time-step adjustments or contact algorithms. The subsequent usage of metamodels based on such noisy data reduces the prediction quality of the optimization routine and is known to even magnify the numerical errors. This work provides an approach to handle noisy numerical data in approximate optimization of forming processes, covering several fundamental research questions in dealing with numerical noise. First, the deteriorating effect of numerical noise on the prediction quality of several well-known metamodeling techniques is demonstrated using an analytical test function. Next, numerical noise is quantified and its effect is minimized by the application of local approximation and regularization techniques. A general approximate optimization strategy is subsequently presented and coupling with a sequential update algorithm is proposed. The strategy is demonstrated by the sequential deterministic and robust optimization of two industrial metal forming processes, i.e. a V-bending application and a cup-stretching application. Although numerical noise is often neglected in practice, both applications in this work show that the general awareness of its presence is highly important to increase the overall accuracy of optimization results.
References
Barthelemy JFM, Haftka RT (1993) Approximation concepts for optimum structural design - a review. Struct Multidiscip Optim 5:129–144
Simpson TW, Toropov V, Balabanov V, Viana FAC (2008) Design and analysis of computer experiments in multidisciplinary design optimization: a review of how far we have come - or not. In 12th AIAA/ISSMO multidisciplinary analysis and optimization conference, MAO, art. no. 2008–5802
Wang H, Li G (2010) Sheet forming optimization based on least square support vector regression and intelligent sampling approach. Int J Mater Form 3:9–12
Jansson T, Andersson A, Nilsson L (2005) Optimization of draw-in for an automotive sheet metal part: An evaluation using surrogate models and response surfaces. J Mater Process Technol 159:426–434
Ejday M, Fourment L (2010) Metamodel assisted evolutionary algorithm for multi-objective optimization of non-steady metal forming problems. Int J Mater Form 3:5–8
Chenot J-L, Bouchard P-O, Fourment L, Lasne P, Roux E (2011) Optimization of metal forming processes for improving final mechanical strength. In: Computational Plasticity XI - Fundamentals and Applications, COMPLAS XI, pp 42–55
Clees T, Steffes-lai D, Helbig M, Sun D-Z (2010) Statistical analysis and robust optimization of forming processes and forming-to-crash process chains. Int J Mater Form 3:45–48
Li YQ, Cui ZS, Ruan XY, Zhang DJ (2006) Cae-based six sigma robust optimization for deep-drawing sheet metal process. Int J Adv Manuf Technol 30:631–637
Strano M (2008) A technique for fem optimization under reliability constraint of process variables in sheet metal forming. Int J Mater Form 1:13–20
Kleiber M, Knabel J, Rojek J (2004) Response surface method for probabilistic assessment of metal forming failures. Int J Numer Anal Model 60:51–67
Oden JT, Belytschko T, Fish J, Hughes TJR, Johnson C, Keyes D, Laub A, Petzold L, Srolovitz D, Yip S (2006) Simulation based engineering science. Technical report, National Science Foundation
Tekkaya AE, Martins PAF (2009) Accuracy, reliability and validity of finite element analysis in metal forming: a user’s perspective. J Eng Comput 26:1026–1056
van Keulen F, Toropov VV (1997) New developments in structural optimization using adaptive mesh refinement and multi-point approximations. Eng Optim 29:217–234
Giunta AA, Dudley JM, Narducci R, Grossman B, Haftka RT, Mason WH, Watson LT (1994) Noisy aerodynamic response and smooth approximations in HSCT design. AIAA J Proc 5th Symp Multidiscip Struct Optim 94–4376-CP:1117–1128
Papila M, Haftka RT (2000) Response surface approximations: Noise, error repair and modeling errors. AIAA J 38:2336–2343
Goel T, Haftka RT, Papila M, Shyy W (2006) Generalized pointwise bias error bounds for response surface approximations. Int J Numer Methods Eng 65:2035–2059
Sacks J, Welch WJ, Mitchell TJ, Wynn HP (1989) Design and analysis of computer experiments. Stat Sci 4:409–423
Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. Global Optim 13:455–492
Santner T, Williams B, Notz W (2003) The design and analysis of computer experiments. Springer–Verlag, New York
Toropov V, van Keulen F, Markine V, de Boer H (1996) Multipoint approximations for structural optimization problems with noisy response functions. AIAA J Proc Symp Multidiscip Struct Optim A96-38701:10–31
Siem AYD, den Hertog D (2007) Kriging models that are robust with respect to simulation errors. CentER discussion paper No. 200768, ISSN 0924-7815, Tilburg University, p 1–29
Bishop CM (2006) Pattern recognition and machine learning. Springer Science + Business Media, New York
Bonte MHA, Fourment L, Do TT, van den Boogaard AH, Huetink J (2010) Optimization of forging processes using finite element simulations : A comparison of sequential approximate optimization and other algorithms. J Struct Multidisc Optim 42(5):797–810
Wiebenga JH, van den Boogaard AH, Klaseboer G (2012) Sequential robust optimization of a v-bending process using numerical simulations. J Struct Multidisc Optim 46:137–153
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical recipes. Cambridge University Press, Cambridge
Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. Wiley, New York
Golub GH, Heath M, Wahba G (1979) Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21:215–223
Craven P, Wahba G (1979) Smoothing noisy data with spline functions. Numer Math 31:377–403
Savitzky A, Golay MJE (1964) Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 36:1627–1639
Schonlau M (1997) Computer experiments and global optimization. PhD thesis. University of Waterloo, Ontario, Canada
DiekA (2012) In-house finite element code for forming simulations of the University of Twente. http://www.utwente.nl/ctw/tm/research/NSM/software/dieka/. Enschede, the Netherlands
Myers RH, Montgomery DC (2002) Response surface methodology. Wiley, New York
MacKay DJC (1992) Bayesian interpolation. Neural Comput 4:415–447
Lophaven SN, Nielsen HB, Sondergaard J (2002) DACE: a Matlab Kriging toolbox, version 2.0. Technical Report IMM-TR-2002-12, Technical University of Denmark, Copenhagen, Denmark
Forrester A, Sobester A, Keane A (2008) Engineering design via surrogate modelling: a practical guide. Wiley, New York
Forrester A, Keane AJ, Bressloff NW (2006) Design and analysis of noisy computer experiments. AIAA J 44:2331–2339
Orr MJL (1996) Introduction to radial basis function networks. Technical report. University of Edinburgh, UK
de Veaux RDD, Schumi J, Schweinsberg J, Ungar LH (1998) Prediction intervals for neural networks via nonlinear regression. Technometrics 40:273–282
Matlab Version 7.13.0 (2012) The MathWorks Inc, Natick Massachusetts, USA
Acknowledgments
This research was carried out under the project number M22.1.08303 in the framework of the Research Program of the Materials innovation institute (www.m2i.nl). The industrial partners co-operating in this research are gratefully acknowledged for their useful contributions to this research.
Appendix: Metamodel types
The aim of a metamodel, denoted by \(\hat {y}(\mathbf {x})\), is to accurately predict the trend of the FE simulation response or true model \(y(\mathbf {x})\). Consider a nonlinear regression model, including a random error term \(\mathbf {\varepsilon }\), defined by:
What follows is a description of different types of metamodeling techniques used in this work to construct \(\hat {y}(\mathbf {x})\).
Response surface methodology
The Response Surface Methodology (RSM) is a well-known method for creating an approximate model of a response [32]. Although this method is generally used for constructing a response surface from physical experiments, many authors have applied it to numerical experiments as well. One of the reasons is its ability to filter out numerical noise [14–16].
Using RSM, a polynomial model is fitted through the n response measurements or observations \(\mathbf {y}\) allowing for a random error term \(\mathbf {\varepsilon }\). Equation 12 can now be written in matrix form as:
where
Now, \(\mathbf {X}\) is an \(n \times p\) matrix of the levels of independent variables with \(p = m + 1\), \(\mathbf {\beta }\) is a \(p \times 1\) vector of regression coefficients, and \(\mathbf {\varepsilon }\) is an \(n \times 1\) vector of random error terms. Note that the design matrix \(\mathbf {X}\) can incorporate non-linear terms with respect to the m design variables. The order of these terms is referred to as the order of the polynomial model. The metamodel is given by \(\hat {y}=\mathbf {X}\mathbf {\beta }\). The unknown regression coefficients \(\mathbf {\beta }\) are determined by minimizing the error sum of squares at the training points, also referred to as the quadratic loss function or \(L_{2}\)-norm:
Differentiating Eq. 14 with respect to \(\mathbf {\beta }\) and setting the results to zero yields the best estimation of \(\mathbf {\beta }\):
where \(\hat {\mathbf {\beta }}\) denotes the estimator of \(\mathbf {\beta }\). The response prediction \(\hat {y}_{0}\) at an unknown design variable setting \(\mathbf {x}_{0}\) is now given by the explicit function:
The variance at this location is given by:
The unbiased estimate of the error variance \(\sigma ^{2}\) is given by:
The prediction uncertainty of the metamodel is given by the square root of the variance as calculated in Eq. 17.
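The least-squares estimator and the prediction variance described above can be sketched in a few lines. The following minimal example assumes an illustrative one-dimensional quadratic test function and noise level, not data from the paper's applications:

```python
import numpy as np

# Least-squares fit of a quadratic response surface through noisy
# observations. The test function and noise level are illustrative.
x = np.linspace(0.0, 1.0, 9)
rng = np.random.default_rng(0)
y = 2.0 + 3.0 * x - 1.5 * x**2 + 0.01 * rng.standard_normal(x.size)

# Design matrix X (n x p) with p = m + 1: intercept, linear, quadratic.
X = np.column_stack([np.ones_like(x), x, x**2])

# Estimator beta_hat = (X^T X)^{-1} X^T y from the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Prediction at an untried setting x0 = 0.5.
x0 = np.array([1.0, 0.5, 0.25])
y0_hat = x0 @ beta_hat

# Unbiased error-variance estimate sigma^2 = SSE / (n - p), and the
# prediction variance sigma^2 * x0^T (X^T X)^{-1} x0 at that location.
res = y - X @ beta_hat
sigma2 = res @ res / (x.size - X.shape[1])
var_y0 = sigma2 * (x0 @ np.linalg.solve(X.T @ X, x0))
```

The square root of `var_y0` gives the pointwise prediction uncertainty of the fitted polynomial metamodel.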
Regression in response surface methodology
Instead of estimating the unknown regression coefficients based on the error sum of squares through Eq. 14, in case of ridge regression these are obtained by minimizing the regularized loss function:
where the regularization parameter \(\lambda \) governs the relative importance of the regularization term, penalizing large weights, compared with the error sum of squares term. The ridge regression formulation results in the solution:
where the optimal \(\lambda \) can be identified by generalized cross-validation. A modification of Eq. 18 in case of ridge regression is provided in MacKay [33].
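As a sketch of the ridge-regression formulation above, the following example solves the regularized normal equations for fixed, illustrative values of \(\lambda \); in practice the optimal \(\lambda \) would be identified by generalized cross-validation:

```python
import numpy as np

# Ridge regression: beta_ridge = (X^T X + lambda*I)^{-1} X^T y.
# Data and the lambda values are illustrative assumptions.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 9)
y = 2.0 + 3.0 * x + 0.05 * rng.standard_normal(x.size)
X = np.column_stack([np.ones_like(x), x])

def ridge(X, y, lam):
    # Regularized normal equations; lam = 0 recovers ordinary least squares.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta_ols = ridge(X, y, 0.0)
beta_reg = ridge(X, y, 1.0)   # lam > 0 penalizes (shrinks) large weights
```

The penalty term shrinks the coefficient vector relative to the ordinary least-squares solution, which is what stabilizes the fit in the presence of numerical noise.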
Kriging
Computer simulations are deterministic in nature, meaning that repeated runs for the same input parameters will yield exactly the same result. Therefore, the remaining error, denoted by \(\varepsilon \) in Eq. 12, should formally be zero [19]. In other words, the metamodel should interpolate through the response values at the training points.
The approach proposed in Sacks et al. [17] and Jones et al. [18] is referred to as Design and Analysis of Computer Experiments (DACE) where generally Kriging is used as interpolation technique. Kriging involves a defined base function or regression part, similar to fitting a RSM metamodel. The random error term \(\mathbf {\varepsilon }\) in Eq. 12 is replaced by basis functions or a stochastic part \(Z(\mathbf {x})\) to compute the exact predictions at the available training points:
where \(Z(\mathbf {x})\) is assumed to be a Gaussian stochastic process with mean zero, process variance \(\sigma _{z}^{2}\), and spatial covariance function given by:
where \(R(x_{i},x_{j})\) describes the correlation between the known measurement points \(x_{i}\) and \(x_{j}\). The correlation function R determines the shape of the metamodel between measurement points and is, in case of a Gaussian exponential correlation function, given by:
Now, in case m design variables are present, the correlation function depends on the m one-dimensional correlation functions as follows:
The entries of the vectors \(\mathbf {\theta }=\{\theta _{1},\theta _{2},\dots ,\theta _{m}\}^{\text {T}}\) and the distance between the known measurement points \(\mathbf {x}_{i}\) and \(\mathbf {x}_{j}\) determine the structure of \(R(\mathbf {\theta },\mathbf {x}_{i},\mathbf {x}_{j})\). Analogous to RSM, a Kriging metamodel is fitted in order to minimize the mean squared error between the Kriging metamodel \(\hat {y}(\mathbf {x})\) and the true but unknown response function \(y(\mathbf {x})\) [19, 34]:
In other words, the mean squared error is minimized subject to the unbiasedness constraint that ensures there is no systematic error between the metamodel and the true function. The Best Linear Unbiased Predictor (BLUP) \(\hat {y}_{0}\) at an untried design variable setting \(x_{0}\) is now given by:
where \(\mathbf {x}_{0}\) is the design matrix containing the settings of the untried point \(x_{0}\) and \(\mathbf {X}\) the design matrix containing the training points. The vector \(\mathbf {r}_{0}\) contains the correlations between the point \((x_{0},y_{0})\) and the known measurements \((x_{i},y_{i})\). \(\mathbf {R}\) is a matrix containing the correlations between the training points given by Eq. 23.
The Mean Squared Error (MSE) can be calculated at location \(x_{0}\) by:
The unknown Kriging parameters \(\mathbf {\beta }\), \(\sigma _{z}^{2}\), and \(\mathbf {\theta }\) can be estimated by Maximum Likelihood Estimation (MLE) [17]. Note that maximization of the likelihood function is equivalent to a minimization of the error sum of squares when the error can be assumed to be a Gaussian noise. This optimization procedure is solved using the DACE toolbox provided by Lophaven et al. [34].
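The BLUP with a constant regression part can be illustrated by a minimal ordinary-Kriging sketch. Here \(\theta \) is fixed for simplicity, whereas the text estimates it by MLE with the DACE toolbox; the test function is an illustrative assumption:

```python
import numpy as np

# Ordinary Kriging with the Gaussian correlation of Eq. 23 and a
# constant trend. theta is a fixed illustrative value, not an MLE.
def corr(xa, xb, theta=10.0):
    d = xa[:, None] - xb[None, :]
    return np.exp(-theta * d**2)

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.sin(2.0 * np.pi * x)

R = corr(x, x) + 1e-10 * np.eye(x.size)   # small jitter for conditioning
ones = np.ones(x.size)
# Generalized least-squares estimate of the constant trend beta.
beta_hat = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))

def predict(x0):
    # BLUP: trend plus correlation-weighted correction of the residuals.
    r0 = corr(np.atleast_1d(x0), x)[0]
    return beta_hat + r0 @ np.linalg.solve(R, y - beta_hat * ones)
```

Because the correlation of each training point with itself is exact, the predictor interpolates: evaluating `predict` at a training location reproduces the stored response.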
Regression in Kriging
In case data is contaminated with noise, it makes more sense to approximate the given data instead of interpolating it. The generalization capability of Kriging models can be improved by adding a regularization constant \(\lambda \) to the leading diagonal of the correlation matrix \(\mathbf {R}\) as \(\mathbf {R} + \lambda \mathbf {I}\) [35]. This enables a Kriging model to regress the data and approximate noisy functions. Without the regularization constant, each point is given an exact correlation with itself, forcing the metamodel to pass through the training points. The regularization constant thus provides control over the interpolation behavior of the Kriging model. The constant \(\lambda \) is now optimized along with the other unknown parameters in the MLE, providing the regression Kriging predictor:
A modification of Eq. 27 in case of regression Kriging is provided in Forrester et al. [36].
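A minimal sketch of this regularization idea: adding \(\lambda \) to the leading diagonal of \(\mathbf {R}\) turns the interpolating predictor into a regressing one. Here \(\theta \) and \(\lambda \) are fixed illustrative values rather than MLE estimates:

```python
import numpy as np

# Regression Kriging: the diagonal term R + lam*I lets the predictor
# smooth noisy data instead of interpolating it. theta, lam and the
# noisy test function are illustrative assumptions.
def corr(xa, xb, theta=10.0):
    d = xa[:, None] - xb[None, :]
    return np.exp(-theta * d**2)

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

lam = 0.05
R = corr(x, x) + lam * np.eye(x.size)
ones = np.ones(x.size)
beta_hat = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))

def predict(x0):
    r0 = corr(np.atleast_1d(x0), x)[0]
    return beta_hat + r0 @ np.linalg.solve(R, y - beta_hat * ones)

# With lam > 0 the prediction at a training point no longer reproduces
# the (noisy) observation exactly.
preds = np.array([predict(xi) for xi in x])
```

The residuals `preds - y` are now nonzero at the training points, which is exactly the smoothing behavior wanted for noisy FE responses.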
Radial basis functions
A function approximation constructed by the linear combination of basis functions \(h_{i}(\mathbf {x})\) takes the form:
where each basis function is weighted by an appropriate coefficient \(w_{i}\). The idea behind Radial Basis Functions (RBF) is that every known DOE point i 'influences' its surroundings the same way in all directions according to a basis function, so that \(h_{i}(\mathbf {x})\) = \(\phi (r)\) where r is the radial distance r = \(\|\mathbf {x}-\mathbf {x}_{i}\|_{2}\). Now the RBF approximation is a linear combination of the basis functions centered at all n DOE points:
A commonly used radial basis function is the Gaussian exponential function. Referring to Eq. 23 and composing the Gaussian with the radial distance r, the radial basis function is given by:
The weights \(w_{i}\) can be found by minimizing the error sum of squares at the training points. Evaluating Eq. 29 results in solving a linear system of equations of the form \(\mathbf {H}\mathbf {w} = \mathbf {y}\). The estimated mean response \(\hat {y}_{0}\) at \(\mathbf {x}_{0}\) is provided by:
The variance at this location is given by:
Similar to RSM, the unbiased estimate of the error variance \(\sigma ^{2}\) is given by Eq. 18.
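The RBF fit of Eq. 29 reduces to one linear solve, as the following sketch shows; the data and the Gaussian width parameter are illustrative assumptions:

```python
import numpy as np

# Gaussian RBF interpolation: the weights solve the linear system
# H w = y with H_ij = phi(|x_i - x_j|). theta and the data are
# illustrative assumptions.
def phi(r, theta=5.0):
    return np.exp(-theta * r**2)

x = np.linspace(0.0, 1.0, 6)
y = np.cos(np.pi * x)

H = phi(np.abs(x[:, None] - x[None, :]))
w = np.linalg.solve(H, y)

def predict(x0):
    # Linear combination of basis functions centered at all n DOE points.
    return phi(np.abs(x0 - x)) @ w
```

Since each row of `H` evaluates the basis functions at a DOE point, the approximation reproduces the observations at the training points.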
Regression in radial basis function approximation
The regularized loss function is formulated as:
Minimization of the loss function results in the best estimation of the regularized weight coefficients:
Also note the resemblance with Eq. 20. A modification of the error variance in case of ridge regression is provided in MacKay [33] and Orr [37].
Artificial neural networks
Neural Networks (NN) follow the same form as Eq. 29, where the choice of the letter h for the basis functions reflects the interest in NN, which have hidden units. In addition to the basis functions, the building blocks of NN are neurons and connections. Differences in the learning rules and the network topology result in different NN architectures or NN concepts. In this work, two-layer feedforward backpropagation NN are utilized.
A two-layer NN architecture is presented in Fig. 20. This architecture is referred to as feedforward since information only proceeds forward through the network and there are no feedback loops between the layers. Starting with the first layer of S neurons, the output \(\mathbf {a}\) of the so-called Hidden Layer (HL) is given by:
The layer includes a weight matrix \(\mathbf {W}_{\text {(HL)}} \in \mathbb {R}^{S \times m}\), an input vector \(\mathbf {x}\), a bias vector \(\mathbf {b}_{\text {(HL)}} = \{b_{\text {(HL)}1}, b_{\text {(HL)}2}, \dots ,b_{\text {(HL)}S}\}^{\text {T}}\), basis functions or activation functions \(\mathbf {G}\) and an output vector \(\mathbf {a} = \{a_{1}, a_{2},\dots ,a_{S}\}^{\text {T}}\). The basis functions used in this work are the tangent sigmoid and the linear basis functions. The tangent sigmoid basis function \(G^{(1)}(d)\) can take any arbitrary input value \(d \in \mathbb {R}\) and suppress the output into the range \((-1,1)\) by:
The output of the linear basis function \(G^{(2)}(d)\) equals its input:
The output of the hidden layer \(\mathbf {a}\) is the input for the next layer. This layer is referred to as the Output Layer (OL) since its output is also the output of the network. The basis function used in the hidden layer is the tangent sigmoid function, whereas the linear function is used in the output layer. These functions are preferred because of their differentiability, which enables determining the partial derivatives used in parameter estimation.
The predictor of a two layer architecture with a single network output is now given by:
In essence, Eq. 39 is the linear combination of the weighted tangent sigmoid basis functions. The unknown parameters in Eq. 39 are the bias term of the output layer \(b_{\text {(OL)}}\), the vector with output layer weights \(\mathbf {w}_{\text {(OL)}} = \{w_{\text {(OL)}1}, w_{\text {(OL)}2}, \dots , w_{\text {(OL)}S}\}^{\text {T}}\) and the hidden layer bias vector \(\mathbf {b}_{\text {(HL)}}\) and weight matrix \(\mathbf {W}_{\text {(HL)}}\). The unknown weight and bias parameters can be estimated by minimizing the error sum of squares at the training points. This unconstrained nonlinear optimization problem is solved using a Levenberg-Marquardt optimization algorithm. The procedure is also referred to as Bayesian regularization backpropagation [33].
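The forward pass of Eq. 39 amounts to a few matrix operations. The sketch below uses random illustrative weights rather than trained ones, since the Levenberg-Marquardt fitting itself is delegated to a toolbox in the text:

```python
import numpy as np

# Forward pass of the two-layer feedforward network:
# a = tanh(W_hl @ x + b_hl), y_hat = w_ol @ a + b_ol.
# All weights and biases here are illustrative random values,
# not parameters fitted by backpropagation.
rng = np.random.default_rng(2)
m, S = 3, 4                       # m inputs, S hidden neurons

W_hl = rng.standard_normal((S, m))
b_hl = rng.standard_normal(S)
w_ol = rng.standard_normal(S)
b_ol = 0.1

def predict(x):
    a = np.tanh(W_hl @ x + b_hl)  # tangent-sigmoid layer, outputs in (-1, 1)
    return w_ol @ a + b_ol        # linear output layer

x = rng.standard_normal(m)
y_hat = predict(x)
```

Because the hidden-layer outputs are bounded in \((-1,1)\), the network output is always within \(|b_{\text {(OL)}}| + \sum _{i}|w_{\text {(OL)}i}|\) of zero, regardless of the input.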
The variance estimation theory for nonlinear regression as in Eq. 17 and 33 also applies to NN [38]:
where \(\mathbf {J}\) is a matrix whose ijth entry is given by \(\partial \hat {y}(\mathbf {x}_{i})/\partial z_{j}\) and \(\mathbf {g}_{0}\) is a vector whose ith entry is \(\partial \hat {y}(\mathbf {x}_{0})/\partial z_{i}\), evaluated at the optimal parameter vector \(\hat {\mathbf {z}}\) where \(\mathbf {z}\) represents the collection of all unknown parameters. Note that for estimating the weights in NN, \(\mathbf {J}\) is already calculated as part of the optimization procedure. The unbiased estimate of the error variance \(\sigma ^{2}\) is given by Eq. 18. The procedure as described in this section is solved using the NN Matlab toolbox [39].
Regression in artificial neural networks
With many weight and bias parameters involved in NN, there is a considerable danger of overfitting. The generalization capability can be improved by minimizing the regularized loss function as in Eq. 34. Note that regularization assists both in avoiding overfitting due to a high number of hidden units S (and thus many weights and biases to be determined) and in handling the presence of numerical noise in the response data. The loss function is minimized using the Levenberg-Marquardt backpropagation algorithm as implemented in the NN Matlab toolbox [39]. A modification of the error variance in case of ridge regression is provided in [38].
Wiebenga, J.H., van den Boogaard, A.H. On the effect of numerical noise in approximate optimization of forming processes using numerical simulations. Int J Mater Form 7, 317–335 (2014). https://doi.org/10.1007/s12289-013-1130-2