
On the effect of numerical noise in approximate optimization of forming processes using numerical simulations

  • Original Research
  • Published in: International Journal of Material Forming

Abstract

The coupling of Finite Element (FE) simulations with approximate optimization techniques is becoming increasingly popular in the forming industry. By doing so, it is implicitly assumed that the optimization objective and possible constraints are smooth functions of the design variables and, in case of robust optimization, of the design and noise variables. However, non-linear FE simulations are known to introduce numerical noise caused by the discrete nature of the simulation algorithms, e.g. errors caused by re-meshing, time-step adjustments or contact algorithms. The subsequent use of metamodels based on such noisy data reduces the prediction quality of the optimization routine and is known to even magnify the numerical errors. This work provides an approach to handling noisy numerical data in the approximate optimization of forming processes, covering several fundamental research questions in dealing with numerical noise. First, the deteriorating effect of numerical noise on the prediction quality of several well-known metamodeling techniques is demonstrated using an analytical test function. Next, numerical noise is quantified and its effect is minimized by the application of local approximation and regularization techniques. A general approximate optimization strategy is subsequently presented and coupling with a sequential update algorithm is proposed. The strategy is demonstrated by the sequential deterministic and robust optimization of two industrial metal forming processes, i.e. a V-bending application and a cup-stretching application. Although numerical noise is often neglected in practice, both applications in this work show that a general awareness of its presence is highly important to increase the overall accuracy of optimization results.


References

  1. Barthelemy JFM, Haftka RT (1993) Approximation concepts for optimum structural design - a review. Struct Multidiscip Optim 5:129–144

  2. Simpson TW, Toropov V, Balabanov V, Viana FAC (2008) Design and analysis of computer experiments in multidisciplinary design optimization: a review of how far we have come - or not. In 12th AIAA/ISSMO multidisciplinary analysis and optimization conference, MAO, art. no. 2008–5802

  3. Wang H, Li G (2010) Sheet forming optimization based on least square support vector regression and intelligent sampling approach. Int J Mater Form 3:9–12

  4. Jansson T, Andersson A, Nilsson L (2005) Optimization of draw-in for an automotive sheet metal part: An evaluation using surrogate models and response surfaces. J Mater Process Technol 159:426–434

  5. Ejday M, Fourment L (2010) Metamodel assisted evolutionary algorithm for multi-objective optimization of non-steady metal forming problems. Int J Mater Form 3:5–8

  6. Chenot J-L, Bouchard P-O, Fourment L, Lasne P, Roux E (2011) Optimization of metal forming processes for improving final mechanical strength Computational Plasticity XI - Fundamentals and Applications, COMPLAS XI, pp 42–55

  7. Clees T, Steffes-lai D, Helbig M, Sun D-Z (2010) Statistical analysis and robust optimization of forming processes and forming-to-crash process chains. Int J Mater Form 3:45–48

  8. Li YQ, Cui ZS, Ruan XY, Zhang DJ (2006) Cae-based six sigma robust optimization for deep-drawing sheet metal process. Int J Adv Manuf Technol 30:631–637

  9. Strano M (2008) A technique for fem optimization under reliability constraint of process variables in sheet metal forming. Int J Mater Form 1:13–20

  10. Kleiber M, Knabel J, Rojek J (2004) Response surface method for probabilistic assessment of metal forming failures. Int J Numer Anal Model 60:51–67

  11. Oden JT, Belytschko T, Fish J, Hughes TJR, Johnson C, Keyes D, Laub A, Petzold L, Srolovitz D, Yip S (2006) Simulation based engineering science. Technical report, National Science Foundation

  12. Tekkaya AE, Martins PAF (2009) Accuracy, reliability and validity of finite element analysis in metal forming: a user’s perspective. J Eng Comput 26:1026–1056

  13. van Keulen F, Toropov VV (1997) New developments in structural optimization using adaptive mesh refinement and multi-point approximations. Eng Optim 29:217–234

  14. Giunta AA, Dudley JM, Narducci R, Grossman B, Haftka RT, Mason WH, Watson LT (1994) Noisy aerodynamic response and smooth approximations in HSCT design. AIAA J Proc 5th Symp Multidiscip Struct Optim 94–4376-CP:1117–1128

  15. Papila M, Haftka RT (2000) Response surface approximations: Noise, error repair and modeling errors. AIAA J 38:2336–2343

  16. Goel T, Haftka RT, Papila M, Shyy W (2006) Generalized pointwise bias error bounds for response surface approximations. Int J Numer Methods Eng 65:2035–2059

  17. Sacks J, Welch WJ, Mitchell TJ, Wynn HP (1989) Design and analysis of computer experiments. Stat Sci 4:409–423

  18. Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. Global Optim 13:455–492

  19. Santner T, Williams B, Notz W (2003) The design and analysis of computer experiments. Springer–Verlag, New York

  20. Toropov V, van Keulen F, Markine V, de Boer H (1996) Multipoint approximations for structural optimization problems with noisy response functions. AIAA J Proc Symp Multidiscip Struct Optim A96-38701:10–31

  21. Siem AYD, den Hertog D (2007) Kriging models that are robust with respect to simulation errors. CentER discussion paper No. 200768, ISSN 0924-7815, Tilburg University, p 1–29

  22. Bishop CM (2006) Pattern recognition and machine learning. Springer Science + Business Media, New York

  23. Bonte MHA, Fourment L, Do TT, van den Boogaard AH, Huetink J (2010) Optimization of forging processes using finite element simulations : A comparison of sequential approximate optimization and other algorithms. J Struct Multidisc Optim 42(5):797–810

  24. Wiebenga JH, van den Boogaard AH, Klaseboer G (2012) Sequential robust optimization of a v-bending process using numerical simulations. J Struct Multidisc Optim 46:137–153

  25. Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical recipes. Cambridge University Press, Cambridge

  26. Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. Wiley, New York

  27. Golub GH, Heath M, Wahba G (1979) Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21:215–223

  28. Craven P, Wahba G (1979) Smoothing noisy data with spline functions. Numer Math 31:377–403

  29. Savitzky A, Golay MJE (1964) Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 36:1627–1639

  30. Schonlau M (1997) Computer experiments and global optimization. PhD thesis. University of Waterloo, Ontario, Canada

  31. DiekA (2012) In-house finite element code for forming simulations of the University of Twente. http://www.utwente.nl/ctw/tm/research/NSM/software/dieka/. Enschede, the Netherlands

  32. Myers RH, Montgomery DC (2002) Response surface methodology. Wiley, New York

  33. MacKay DJC (1992) Bayesian interpolation. Neural Comput 4:415–447

  34. Lophaven SN, Nielsen HB, Sondergaard J (2002) DACE: a Matlab Kriging toolbox, version 2.0. Technical report IMM-TR-2002-12, Technical University of Denmark, Copenhagen, Denmark

  35. Forrester A, Sobester A, Keane A (2008) Engineering design via surrogate modelling: a practical guide. Wiley, New York

  36. Forrester A, Keane AJ, Bressloff NW (2006) Design and analysis of noisy computer experiments. AIAA J 44:2331–2339

  37. Orr MJL (1996) Introduction to radial basis function networks. Technical report. University of Edinburgh, UK

  38. de Veaux RDD, Schumi J, Schweinsberg J, Ungar LH (1998) Prediction intervals for neural networks via nonlinear regression. Technometrics 40:273–282

  39. Matlab Version 7.13.0 (2012) The MathWorks Inc, Natick Massachusetts, USA

Acknowledgments

This research was carried out under project number M22.1.08303 in the framework of the Research Program of the Materials innovation institute (www.m2i.nl). The industrial partners co-operating in this research are gratefully acknowledged for their useful contributions.

Author information

Corresponding author

Correspondence to J. H. Wiebenga.

Appendix: Metamodel types

The aim of a metamodel, denoted by \(\hat {y}(\mathbf {x})\), is to accurately predict the trend of the FE simulation response or true model \(y(\mathbf {x})\). Consider a nonlinear regression model, including a random error term \(\mathbf {\varepsilon }\), defined by:

$$ y(\mathbf{x}) = \hat{y}(\mathbf{x}) + \mathbf{\varepsilon} $$
(12)

What follows is a description of different types of metamodeling techniques used in this work to construct \(\hat {y}(\mathbf {x})\).

Response surface methodology

The Response Surface Methodology (RSM) is a well-known method for creating an approximate model of a response [32]. Although this method is generally used for constructing a response surface from physical experiments, many authors have applied it to numerical experiments as well. One of the reasons is its ability to filter out numerical noise [14–16].

Using RSM, a polynomial model is fitted through the n response measurements or observations \(\mathbf {y}\), allowing for a random error term \(\mathbf {\varepsilon }\). Equation 12 can now be written in matrix form as:

$$ \mathbf{y} = \mathbf{X}\mathbf{\beta}+\mathbf{\varepsilon} $$
(13)

where

$$ \begin{array}{rll}\mathbf{y} &=& \begin{bmatrix} y_{1}\\ y_{2}\\ \vdots \\ y_{n}\\ \end{bmatrix} , \quad\mathbf{X} = \left[ \begin{array}{ccccc} 1 & x_{11} & x_{12} & \cdots & x_{1m}\\ 1 & x_{21} & x_{22} & \cdots & x_{2m}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n1} & x_{n2} & \cdots & x_{nm} \end{array}\right], \\ \mathbf{\beta} &=& \begin{bmatrix} \beta_{0}\\ \beta_{1}\\ \vdots \\ \beta_{m}\\ \end{bmatrix} , \quad\text{and} \quad\mathbf{\varepsilon} = \begin{bmatrix} \varepsilon_{1}\\ \varepsilon_{2}\\ \vdots \\ \varepsilon_{n}\\ \end{bmatrix} \end{array} $$

Now, \(\mathbf {X}\) is an \(n \times p\) matrix of the levels of the independent variables with \(p = m + 1\), \(\mathbf {\beta }\) is a \(p \times 1\) vector of regression coefficients, and \(\mathbf {\varepsilon }\) is an \(n \times 1\) vector of random error terms. Note that the design matrix \(\mathbf {X}\) can incorporate non-linear terms with respect to the m design variables. The order of these terms is referred to as the order of the polynomial model. The metamodel is given by \(\hat {y}=\mathbf {X}\mathbf {\beta }\). The unknown regression coefficients \(\mathbf {\beta }\) are determined by minimizing the error sum of squares at the training points, also referred to as the quadratic loss function or \(L_{2}\)-norm:

$$ \mathbf{\varepsilon}^{\text{T}}\mathbf{\varepsilon} = (\mathbf{y}-\mathbf{X}\mathbf{\beta})^{\text{T}}(\mathbf{y}-\mathbf{X}\mathbf{\beta}) $$
(14)

Differentiating Eq. 14 with respect to \(\mathbf {\beta }\) and setting the result to zero yields the best estimate of \(\mathbf {\beta }\):

$$ \hat{\mathbf{\beta}}=(\mathbf{X}^{\text{T}}\mathbf{X})^{-1}\mathbf{X}^{\text{T}}\mathbf{y} $$
(15)

where \(\hat {\mathbf {\beta }}\) denotes the estimator of \(\mathbf {\beta }\). The response prediction \(\hat {y}_{0}\) at an unknown design variable setting \(\mathbf {x}_{0}\) is now given by the explicit function:

$$ \hat{y}_{0}=\mathbf{x}_{0}^{\text{T}}\hat{\mathbf{\beta}} $$
(16)

The variance at this location is given by:

$$ \text{var}[\hat{y}_{0}]=\sigma^{2}\mathbf{x}_{0}^{\text{T}} (\mathbf{X}^{\text{T}}\mathbf{X})^{-1} \mathbf{x}_{0} $$
(17)

The unbiased estimate of the error variance \(\sigma ^{2}\) is given by:

$$ \begin{array}{rll} \hat{\sigma}^{2} &=& \frac{\mathbf{\varepsilon}^{\text{T}}\mathbf{\varepsilon}}{n-p}\\ &=& \frac{\sum\limits_{i=1}^{n} \varepsilon_{i}^{2}}{n-p} \\ &=& \frac{\sum\limits_{i=1}^{n} (\hat{y_{i}}-y_{i})^{2}}{n-p} \end{array} $$
(18)

The prediction uncertainty of the metamodel is given by the square root of the variance as calculated in Eq. 17.
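
To make the preceding equations concrete, a minimal numerical sketch of the least squares fit and the prediction variance of Eqs. 13–18 is given below. The quadratic one-dimensional test function, the noise level and all variable names are illustrative assumptions only and do not correspond to the forming applications in this paper.

import numpy as np

# Hypothetical noisy observations of a 1-D response (illustration only)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 9)
y = 2.0 + 3.0 * x - 4.0 * x**2 + 0.05 * rng.standard_normal(x.size)

# Design matrix of a second-order polynomial RSM: columns [1, x, x^2]
X = np.column_stack([np.ones_like(x), x, x**2])
n, p = X.shape

# Eq. 15: least squares estimate of the regression coefficients
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Eq. 18: unbiased estimate of the error variance
eps = y - X @ beta_hat
sigma2_hat = (eps @ eps) / (n - p)

# Eqs. 16 and 17: prediction and prediction variance at an untried point x0
x0 = np.array([1.0, 0.35, 0.35**2])
y0_hat = x0 @ beta_hat
var_y0 = sigma2_hat * x0 @ np.linalg.solve(X.T @ X, x0)
print(y0_hat, np.sqrt(var_y0))  # prediction and its uncertainty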

Regression in response surface methodology

Instead of estimating the unknown regression coefficients by minimizing the error sum of squares of Eq. 14, in ridge regression they are obtained by minimizing the regularized loss function:

$$ \varepsilon^{\text{T}}\varepsilon + \lambda \mathbf{\beta}^{T}\mathbf{\beta} $$
(19)

where the regularization parameter \(\lambda \) governs the relative importance of the regularization term, penalizing large weights, compared with the error sum of squares term. The ridge regression formulation results in the solution:

$$ \hat{\mathbf{\beta}}=(\mathbf{X}^{\text{T}}\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^{\text{T}}\mathbf{y} $$
(20)

where the optimal \(\lambda \) can be identified by generalized cross-validation. A modification of Eq. 18 in case of ridge regression is provided in MacKay [33].
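
As a hedged sketch of Eqs. 19 and 20, the ridge estimate and a simple generalized cross-validation (GCV) search for \(\lambda \) [27] could be implemented as follows; the candidate grid of \(\lambda \) values is an arbitrary assumption and the code is not the implementation used in this work.

import numpy as np

def ridge_fit(X, y, lam):
    # Eq. 20: ridge estimate (X^T X + lam*I)^-1 X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def gcv_score(X, y, lam):
    # GCV score n*RSS / (n - trace(H))^2 for the smoother H(lam) = X (X^T X + lam*I)^-1 X^T
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    return n * (resid @ resid) / (n - np.trace(H)) ** 2

def ridge_with_gcv(X, y, lambdas=np.logspace(-6, 2, 50)):
    # Pick the lambda with the lowest GCV score, then refit
    best_lam = min(lambdas, key=lambda lam: gcv_score(X, y, lam))
    return ridge_fit(X, y, best_lam), best_lam

Reusing X and y from the previous sketch, beta_ridge, lam_opt = ridge_with_gcv(X, y) returns the regularized coefficients together with the selected regularization parameter.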

Kriging

Computer simulations are deterministic in nature, meaning that repeated runs for the same input parameters will yield exactly the same result. Therefore, the remaining error, denoted by \(\varepsilon \) in Eq. 12, should formally be zero [19]. In other words, the metamodel should interpolate through the response values at the training points.

The approach proposed in Sacks et al. [17] and Jones et al. [18] is referred to as Design and Analysis of Computer Experiments (DACE), where generally Kriging is used as the interpolation technique. Kriging involves a defined base function or regression part, similar to fitting an RSM metamodel. The random error term \(\mathbf {\varepsilon }\) in Eq. 12 is replaced by basis functions or a stochastic part \(Z(\mathbf {x})\) to obtain exact predictions at the available training points:

$$ \mathbf{y} = \mathbf{X}\mathbf{\beta}+Z(\mathbf{x}) $$
(21)

where \(Z(\mathbf {x})\) is assumed to be a Gaussian stochastic process with mean zero, process variance \(\sigma _{z}^{2}\), and spatial covariance function given by:

$$ \text{cov}(Z(x_{i}),Z(x_{j}))=\sigma_{z}^{2}R(x_{i},x_{j}) $$
(22)

where \(R(x_{i},x_{j})\) describes the correlation between the known measurement points \(x_{i}\) and \(x_{j}\). The correlation function R determines the shape of the metamodel between measurement points and is, in case of a Gaussian exponential correlation function, given by:

$$ R(\theta,x_{i},x_{j}) = \exp\left(-\theta(x_{i}-x_{j})^{2}\right) $$
(23)

Now, in case m design variables are present, the correlation function depends on the m one-dimensional correlation functions as follows:

$$ R(\mathbf{\theta},\mathbf{x}_{i},\mathbf{x}_{j}) = \prod\limits_{l=1}^{m} \exp\left(-\theta_{l}(x_{il}-x_{jl})^{2}\right) $$
(24)

The entries of the vector \(\mathbf {\theta }=\{\theta _{1},\theta _{2},\dots ,\theta _{m}\}^{\text {T}}\) and the distances between the known measurement points \(\mathbf {x}_{i}\) and \(\mathbf {x}_{j}\) determine the structure of \(R(\mathbf {\theta },\mathbf {x}_{i},\mathbf {x}_{j})\). Analogous to RSM, a Kriging metamodel is fitted in order to minimize the mean squared error between the Kriging metamodel \(\hat {y}(\mathbf {x})\) and the true but unknown response function \(y(\mathbf {x})\) [19, 34]:

$$\begin{array}{l} \mathrm{min} \; E(\hat{y}(\mathbf{x})-y(\mathbf{x}))^{2} \\ \mathrm{s.t.} \; E(\hat{y}(\mathbf{x})-y(\mathbf{x})) = 0 \end{array} $$
(25)

In other words, the mean squared error is minimized subject to the unbiasedness constraint that ensures there is no systematic error between the metamodel and the true function. The Best Linear Unbiased Predictor (BLUP) \(\hat {y}_{0}\) at an untried design variable setting \(x_{0}\) is now given by:

$$ \hat{y}_{0} = \mathbf{x}_{0}^{\text{T}}\mathbf{\beta}+\mathbf{r}_{0}^{\text{T}}\mathbf{R}^{-1}(\mathbf{y}-\mathbf{X}\mathbf{\beta}) $$
(26)

where \(\mathbf {x}_{0}\) is the design vector containing the settings of the untried point \(x_{0}\) and \(\mathbf {X}\) the design matrix containing the training points. The vector \(\mathbf {r}_{0}\) contains the correlations between the point \((x_{0},y_{0})\) and the known measurements \((x_{i},y_{i})\). \(\mathbf {R}\) is a matrix containing the correlations between the training points, given by Eq. 23.

The Mean Squared Error (MSE) can be calculated at location \(x_{0}\) by:

$$ \text{MSE}(y_{0})=\sigma_{z}^{2}\bigg(1-\begin{bmatrix} \mathbf{x}_{0}^{\text{T}} & \mathbf{r}_{0}^{\text{T}} \end{bmatrix} \begin{bmatrix} \mathbf{0} & \mathbf{X}^{\text{T}} \\ \mathbf{X} & \mathbf{R} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{x}_{0} \\ \mathbf{r}_{0} \end{bmatrix}\bigg)$$
(27)

The unknown Kriging parameters \(\mathbf {\beta }\), \(\sigma _{z}^{2}\), and \(\mathbf {\theta }\) can be estimated by Maximum Likelihood Estimation (MLE) [17]. Note that maximization of the likelihood function is equivalent to minimization of the error sum of squares when the error can be assumed to be Gaussian noise. This optimization procedure is solved using the DACE toolbox provided by Lophaven et al. [34].
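
The following minimal sketch illustrates the algebra of the predictor in Eq. 26 with a constant regression part; the correlation parameter \(\theta \) is simply assumed instead of being estimated by MLE, and the code is an illustration only, not the DACE toolbox [34]. Setting the nugget argument to a positive value corresponds to the regression Kriging predictor of Eq. 28 discussed below.

import numpy as np

def gauss_corr(Xa, Xb, theta):
    # Eq. 24: Gaussian correlation between all pairs of points in Xa and Xb
    d2 = (Xa[:, None, :] - Xb[None, :, :]) ** 2   # squared distance per dimension
    return np.exp(-(d2 * theta).sum(axis=-1))

def kriging_predict(X, y, x0, theta, nugget=0.0):
    # Eq. 26 (or Eq. 28 when nugget > 0) with a constant trend f(x) = 1
    n = X.shape[0]
    R = gauss_corr(X, X, theta) + nugget * np.eye(n)
    f = np.ones(n)
    # Generalized least squares estimate of the constant trend coefficient beta
    beta = (f @ np.linalg.solve(R, y)) / (f @ np.linalg.solve(R, f))
    r0 = gauss_corr(np.atleast_2d(x0), X, theta).ravel()
    return beta + r0 @ np.linalg.solve(R, y - beta * f)

# Illustrative 1-D example with an assumed correlation parameter theta
X = np.linspace(0.0, 1.0, 6)[:, None]
y = np.sin(2.0 * np.pi * X).ravel()
print(kriging_predict(X, y, np.array([0.25]), theta=np.array([10.0])))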

Regression in Kriging

When data is contaminated with noise, it makes more sense to approximate the data instead of interpolating it. The generalization capability of Kriging models can be improved by adding a regularization constant \(\lambda \) to the leading diagonal of the correlation matrix \(\mathbf {R}\), i.e. \(\mathbf {R} + \lambda \mathbf {I}\) [35]. This enables a Kriging model to regress the data and approximate noisy functions. Without the regularization constant, each point is given an exact correlation with itself, forcing the metamodel to pass through the training points. The regularization constant thus controls the degree of interpolation of the Kriging model. The constant \(\lambda \) is optimized along with the other unknown parameters in the MLE, providing the regression Kriging predictor:

$$ \hat{y}_{0} = \mathbf{x}_{0}^{\text{T}}\mathbf{\beta}+\mathbf{r}_{0}^{\text{T}}(\mathbf{R}+\lambda\mathbf{I})^{-1}(\mathbf{y}-\mathbf{X}\mathbf{\beta}) $$
(28)

A modification of Eq. 27 in case of regression Kriging is provided in Forrester et al. [36].
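
Under the assumptions of the sketch given above for Eq. 26, the regression Kriging predictor of Eq. 28 is obtained by simply calling the hypothetical kriging_predict with a positive nugget, e.g. kriging_predict(X, y, x0, theta, nugget=1e-2); in practice \(\lambda \) would be estimated together with \(\mathbf {\theta }\) and \(\sigma _{z}^{2}\) in the MLE.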

Radial basis functions

A function approximation constructed by the linear combination of basis functions \(h_{i}(\mathbf {x})\) takes the form:

$$ y(\mathbf{x})=\sum\limits_{i=1}^{n} w_{i} h_{i}(\mathbf{x}) $$
(29)

where each basis function is weighted by an appropriate coefficient \(w_{i}\). The idea behind Radial Basis Functions (RBF) is that every known DOE point i 'influences' its surroundings in the same way in all directions according to a basis function, so that \(h_{i}(\mathbf {x}) = \phi (r)\), where r is the radial distance \(r = \|\mathbf {x}-\mathbf {x}_{i}\|_{2}\). The RBF approximation is then a linear combination of the basis functions centered at all n DOE points:

$$ y(\mathbf{x})=\sum\limits_{i=1}^{n} w_{i} \phi(\|\mathbf{x}-\mathbf{x}_{i}\|_{2}) $$
(30)

A commonly used radial basis function is the Gaussian exponential function. Referring to Eq. 23 and composing the Gaussian with the radial distance r, the radial basis function is given by:

$$ \phi(r) = \exp\left(-(\theta r)^{2}\right) $$
(31)

The weights \(w_{i}\) can be found by minimizing the error sum of squares at the training points. Evaluating Eq. 29 at the training points results in a linear system of equations of the form \(\mathbf {H}\mathbf {w} = \mathbf {y}\). The estimated mean response \(\hat {y}_{0}\) at \(\mathbf {x}_{0}\) is provided by:

$$ \hat{y}_{0}=\mathbf{h}_{0}^{\text{T}}\hat{\mathbf{w}} $$
(32)

The variance at this location is given by:

$$ \text{var}[\hat{y}_{0}]=\sigma^{2}\mathbf{h}_{0}^{\text{T}} (\mathbf{H}^{\text{T}}\mathbf{H})^{-1} \mathbf{h}_{0} $$
(33)

Similar to RSM, the unbiased estimate of the error variance \(\sigma ^{2}\) is given by Eq. 18.
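
A minimal sketch of the Gaussian RBF approximation of Eqs. 30–32 is given below; the width parameter \(\theta \) and the one-dimensional example are assumptions for illustration only.

import numpy as np

def rbf_design(Xa, Xb, theta):
    # Eq. 31: Gaussian basis phi(r) = exp(-(theta*r)^2) for all point pairs
    r = np.linalg.norm(Xa[:, None, :] - Xb[None, :, :], axis=-1)
    return np.exp(-(theta * r) ** 2)

def rbf_fit(X, y, theta):
    # Solve H w = y for the weights of Eq. 30 (interpolating RBF)
    H = rbf_design(X, X, theta)
    return np.linalg.solve(H, y)

def rbf_predict(X, w, x0, theta):
    # Eq. 32: weighted sum of the basis functions centered at the DOE points
    h0 = rbf_design(np.atleast_2d(x0), X, theta).ravel()
    return h0 @ w

# Illustrative 1-D example with an assumed width parameter
X = np.linspace(0.0, 1.0, 7)[:, None]
y = np.sin(2.0 * np.pi * X).ravel()
w = rbf_fit(X, y, theta=3.0)
print(rbf_predict(X, w, np.array([0.4]), theta=3.0))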

Regression in radial basis function approximation

The regularized loss function is formulated as:

$$ \mathbf{\varepsilon}^{T}\mathbf{\varepsilon}+\lambda\mathbf{w}^{T}\mathbf{w} $$
(34)

Minimization of the loss function results in the best estimate of the regularized weight coefficients:

$$ \hat{\mathbf{w}}=(\mathbf{H}^{T}\mathbf{H}+ \lambda \mathbf{I})^{-1}\mathbf{H}^{T}\mathbf{y} $$
(35)

Also note the resemblance with Eq. 20. A modification of the error variance in case of ridge regression is provided in MacKay [33] and Orr [37].
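
Under the same illustrative assumptions, the regularized weights of Eq. 35 can be obtained by reusing the hypothetical ridge_fit sketch from the RSM section with the basis matrix \(\mathbf {H}\) in place of \(\mathbf {X}\), e.g. w_ridge = ridge_fit(rbf_design(X, X, theta), y, lam), where the optimal lam can again be selected by generalized cross-validation.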

Artificial neural networks

Neural Networks (NN) follow the same form as Eq. 29, where the choice of the letter h for the basis functions reflects the interest in NNs, which have hidden units. In addition to the basis functions, the building blocks of NNs are neurons and connections. Differences in the learning rules and the network topology result in different NN architectures or NN concepts. In this work, two-layer feedforward backpropagation NNs are utilized.

A two-layer NN architecture is presented in Fig. 20. This architecture is referred to as feedforward since information only proceeds forward through the network and there are no feedback loops between the layers. Starting with the first layer of S neurons, the output \(\mathbf {a}\) of the so-called Hidden Layer (HL) is given by:

$$ \mathbf{a} = \mathbf{G}^{(1)}(\mathbf{d}_{\text{(HL)}}), \;\;\; \mathbf{d}_{\text{(HL)}} = \mathbf{W}_{\text{(HL)}}\mathbf{x}+\mathbf{b}_{\text{(HL)}} $$
(36)
Fig. 20 Two-layer NN architecture

The layer includes a weight matrix \(\mathbf {W}_{\text {(HL)}} \in \mathbb {R}^{S \times m}\), an input vector \(\mathbf {x}\), a bias vector \(\mathbf {b}_{\text {(HL)}} = \{b_{\text {(HL)}1}, b_{\text {(HL)}2}, \dots ,b_{\text {(HL)}S}\}^{\text {T}}\), basis functions or activation functions \(\mathbf {G}\) and an output vector \(\mathbf {a} = \{a_{1}, a_{2},\dots ,a_{S}\}^{\text {T}}\). The basis functions used in this work are the tangent sigmoid and the linear basis functions. The tangent sigmoid basis function \(G^{(1)}(d)\) can take any arbitrary input value \(d \in \mathbb {R}\) and suppress the output into the range \((-1,1)\) by:

$$ G^{(1)}(d) = \frac{2}{1+\text{exp}(-2d)}-1 $$
(37)

The output of the linear basis function \(G^{(2)}(d)\) equals its input:

$$ G^{(2)}(d) = d $$
(38)

The output of the hidden layer \(\mathbf {a}\) is the input for the next layer. This layer is referred to as the Output Layer (OL) since its output is also the output of the network. The basis function used in the hidden layer is the tangent sigmoid function, whereas the linear function is used in the output layer. These functions are preferred because of their differentiability, which enables determining the partial derivatives used in parameter estimation.

The predictor of a two-layer architecture with a single network output is now given by:

$$ \hat{y}(\mathbf{x}) = \mathbf{G}^{(2)}(\mathbf{d}_{\text{(OL)}}) = \mathbf{d}_{\text{(OL)}}, \;\;\; \mathbf{d}_{\text{(OL)}} = \mathbf{w}_{\text{(OL)}}^{\text{T}}\mathbf{a}+b_{\text{(OL)}} $$
(39)

In essence, Eq. 39 is a linear combination of the weighted tangent sigmoid basis functions. The unknown parameters in Eq. 39 are the bias term of the output layer \(b_{\text {(OL)}}\), the vector with output layer weights \(\mathbf {w}_{\text {(OL)}} = \{w_{\text {(OL)}1}, w_{\text {(OL)}2}, \dots , w_{\text {(OL)}S}\}^{\text {T}}\), the hidden layer bias vector \(\mathbf {b}_{\text {(HL)}}\) and the weight matrix \(\mathbf {W}_{\text {(HL)}}\). The unknown weight and bias parameters can be estimated by minimizing the error sum of squares at the training points. This unconstrained nonlinear optimization problem is solved using a Levenberg-Marquardt optimization algorithm. The procedure is also referred to as Bayesian regularization backpropagation [33].

The variance estimation theory for nonlinear regression, as in Eqs. 17 and 33, also applies to NNs [38]:

$$ \text{var}[\hat{y}_{0}]=\sigma^{2}\mathbf{g}_{0}^{\text{T}} (\mathbf{J}^{\text{T}}\mathbf{J})^{-1} \mathbf{g}_{0} $$
(40)

where \(\mathbf {J}\) is a matrix whose ijth entry is given by \(\partial \hat {y}(\mathbf {x}_{i})/\partial z_{j}\) and \(\mathbf {g}_{0}\) is a vector whose ith entry is \(\partial \hat {y}(\mathbf {x}_{0})/\partial z_{i}\), evaluated at the optimal parameter vector \(\hat {\mathbf {z}}\), where \(\mathbf {z}\) represents the collection of all unknown parameters. Note that for estimating the weights in the NN, \(\mathbf {J}\) is already calculated as part of the optimization procedure. The unbiased estimate of the error variance \(\sigma ^{2}\) is given by Eq. 18. The procedure described in this section is solved using the NN Matlab toolbox [39].
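
To make the two-layer architecture of Eqs. 36–39 concrete, a minimal forward-pass sketch is given below; the network size S, the random parameter values and the input are purely illustrative, and the training step (the Levenberg-Marquardt minimization) is not shown.

import numpy as np

def tansig(d):
    # Eq. 37: tangent sigmoid activation, output in (-1, 1)
    return 2.0 / (1.0 + np.exp(-2.0 * d)) - 1.0

def nn_predict(x, W_hl, b_hl, w_ol, b_ol):
    # Eqs. 36 and 39: two-layer feedforward network with a single output
    a = tansig(W_hl @ x + b_hl)   # hidden layer output
    return w_ol @ a + b_ol        # linear output layer

# Illustrative dimensions: m = 2 design variables, S = 5 hidden neurons
rng = np.random.default_rng(0)
m, S = 2, 5
W_hl = rng.standard_normal((S, m))   # hidden layer weight matrix
b_hl = rng.standard_normal(S)        # hidden layer bias vector
w_ol = rng.standard_normal(S)        # output layer weight vector
b_ol = rng.standard_normal()         # output layer bias

print(nn_predict(np.array([0.3, -0.8]), W_hl, b_hl, w_ol, b_ol))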

Regression in artificial neural networks

With many weight and bias parameters involved in NNs, there is a considerable danger of overfitting. The generalization capability can be improved by minimizing the regularized loss function as in Eq. 34. Note that regularization both assists in avoiding overfitting due to a high number of hidden units S (and thus many weights and biases to be determined) and in coping with the presence of numerical noise in the response data. The loss function is minimized using the Levenberg-Marquardt backpropagation algorithm as implemented in the NN Matlab toolbox [39]. A modification of the error variance in case of ridge regression is provided in [38].
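
For completeness, the regularized loss of Eq. 34 applied to the network parameters could be evaluated as in the sketch below, reusing the hypothetical nn_predict function given earlier; the actual minimization by Levenberg-Marquardt backpropagation is left to the toolbox.

import numpy as np

def regularized_nn_loss(params, X, y, lam):
    # Error sum of squares plus lam times the sum of squared weights and biases (Eq. 34)
    W_hl, b_hl, w_ol, b_ol = params
    eps = np.array([y_i - nn_predict(x_i, W_hl, b_hl, w_ol, b_ol)
                    for x_i, y_i in zip(X, y)])
    penalty = (W_hl ** 2).sum() + (b_hl ** 2).sum() + (w_ol ** 2).sum() + b_ol ** 2
    return eps @ eps + lam * penalty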

About this article

Cite this article

Wiebenga, J.H., van den Boogaard, A.H. On the effect of numerical noise in approximate optimization of forming processes using numerical simulations. Int J Mater Form 7, 317–335 (2014). https://doi.org/10.1007/s12289-013-1130-2
