Multiple surrogates: how cross-validation errors can help us to obtain the best predictor

  • Research Paper
  • Published in: Structural and Multidisciplinary Optimization

Abstract

Surrogate models are commonly used to replace expensive simulations of engineering problems. Frequently, a single surrogate is chosen based on past experience. This approach has generated a collection of papers comparing the performance of individual surrogates. Previous work has also shown that fitting multiple surrogates and picking one based on cross-validation errors (the prediction sum of squares, PRESS, in particular) is a good strategy, and that cross-validation errors may also be used to create a weighted surrogate. In this paper, we discuss how PRESS (obtained either from the leave-one-out or from the k-fold strategy) can be employed to estimate the root mean square (RMS) error, and whether to use the best-PRESS solution or a weighted surrogate when a single surrogate is needed. We also study the minimization of the integrated square error as a way to compute the weights of the weighted-average surrogate. We found that it pays to generate a large set of different surrogates and then use PRESS as a criterion for selection. Specifically, (1) in general, PRESS is good at filtering out inaccurate surrogates; and (2) with a sufficient number of points, PRESS may identify the best surrogate of the set. Hence, the use of cross-validation errors for choosing a surrogate and for calculating the weights of weighted surrogates becomes more attractive in high dimensions (where a large number of points is naturally required). However, the potential gains from using weighted surrogates appear to diminish substantially in high dimensions. We also examined the utility of using all the surrogates to form the weighted surrogate versus using only a subset of the most accurate ones; this decision is shown to depend on the weighting scheme. Finally, we found that PRESS obtained through the k-fold strategy successfully estimates the RMS error.
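The selection strategy summarized above — fit several candidate surrogates, compute PRESS by k-fold cross-validation, and either pick the lowest-PRESS model or blend the candidates with PRESS-based weights — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the polynomial surrogates, the test function, and the inverse-PRESS weighting heuristic are assumptions made for the example (the paper itself also studies weights obtained by minimizing the integrated square error).

```python
import numpy as np

def kfold_press(X, y, fit, k=10, seed=0):
    """PRESS via k-fold cross-validation: sum of squared errors on each
    held-out fold, predicted by a model fitted to the remaining folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    press = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        model = fit(X[train], y[train])   # `fit` returns a predictor
        e = y[fold] - model(X[fold])
        press += float(e @ e)
    return press

# Two toy "surrogates": least-squares polynomial fits of different degree.
def poly_fit(degree):
    def fit(X, y):
        c = np.polyfit(X.ravel(), y, degree)
        return lambda Xq: np.polyval(c, Xq.ravel())
    return fit

# Hypothetical 1-D test problem: sample sin(2*pi*x) on [0, 1].
X = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X.ravel())

candidates = {"quadratic": poly_fit(2), "quintic": poly_fit(5)}
press = {name: kfold_press(X, y, f, k=5) for name, f in candidates.items()}
best = min(press, key=press.get)          # best-PRESS surrogate

# Inverse-PRESS weights for a weighted-average surrogate (one simple
# heuristic; low-PRESS models get large weights, and weights sum to one).
w = {name: 1.0 / p for name, p in press.items()}
total = sum(w.values())
w = {name: wi / total for name, wi in w.items()}
```

With this setup the quintic fit tracks the full sine period far better than the quadratic, so PRESS both filters out the inaccurate candidate and concentrates the weight on the accurate one — the two behaviors the abstract attributes to PRESS.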



Author information

Correspondence to Felipe A. C. Viana.


About this article

Cite this article

Viana, F.A.C., Haftka, R.T. & Steffen, V. Multiple surrogates: how cross-validation errors can help us to obtain the best predictor. Struct Multidisc Optim 39, 439–457 (2009). https://doi.org/10.1007/s00158-008-0338-0

