Abstract
Surrogate models are commonly used to replace expensive simulations of engineering problems. Frequently, a single surrogate is chosen based on past experience. This approach has generated a collection of papers comparing the performance of individual surrogates. Previous work has also shown that fitting multiple surrogates and picking one based on cross-validation errors (PRESS in particular) is a good strategy, and that cross-validation errors may also be used to create a weighted surrogate. In this paper, we discuss how PRESS (obtained either from the leave-one-out or from the k-fold strategy) is employed to estimate the RMS error, and whether to use the best PRESS solution or a weighted surrogate when a single surrogate is needed. We also study the minimization of the integrated square error as a way to compute the weights of the weighted average surrogate. We found that it pays to generate a large set of different surrogates and then use PRESS as a criterion for selection. We found that (1) in general, PRESS is good for filtering out inaccurate surrogates; and (2) with a sufficient number of points, PRESS may identify the best surrogate of the set. Hence, the use of cross-validation errors for choosing a surrogate and for calculating the weights of weighted surrogates becomes more attractive in high dimensions (where a large number of points is naturally required). However, it appears that the potential gains from using weighted surrogates diminish substantially in high dimensions. We also examined the utility of using all the surrogates to form the weighted surrogate versus using only a subset of the most accurate ones. This decision is shown to depend on the weighting scheme. Finally, we found that PRESS as obtained through the k-fold strategy successfully estimates the RMS error.
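The selection strategy described in the abstract can be sketched in a few lines. This is an illustrative assumption, not the paper's implementation: the candidate "surrogates" below are simple polynomial fits of different degrees, standing in for the polynomial response surfaces, kriging, RBF, and SVR models the paper actually compares, and the helper name `kfold_press_rms` is invented for this sketch. The idea is the same: estimate each candidate's RMS error via k-fold PRESS and keep the candidate with the lowest value.

```python
# Minimal sketch of k-fold PRESS as a surrogate-selection criterion.
# Candidate surrogates here are polynomial fits (degrees 1 and 3); the
# paper's actual surrogate set (PRS, kriging, RBF, SVR) is richer.
import numpy as np

def kfold_press_rms(x, y, fit, predict, k=5, seed=0):
    """PRESS expressed as an RMS error: split the data into k folds,
    fit on k-1 folds, and accumulate squared errors on the held-out fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    sq_err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # points not in this fold
        model = fit(x[train], y[train])
        e = y[fold] - predict(model, x[fold])    # held-out prediction errors
        sq_err += np.sum(e**2)
    return np.sqrt(sq_err / len(x))

# Synthetic 1-D data standing in for expensive simulation output.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

candidates = {"linear": 1, "cubic": 3}
scores = {
    name: kfold_press_rms(
        x, y,
        fit=lambda xt, yt, d=deg: np.polyfit(xt, yt, d),
        predict=np.polyval,
    )
    for name, deg in candidates.items()
}
best = min(scores, key=scores.get)  # surrogate with lowest PRESS-RMS wins
```

One common extension, in the spirit of the weighted-surrogate discussion above, is to turn the same PRESS values into heuristic weights (e.g., proportional to the inverse of each surrogate's PRESS) rather than discarding all but the best model.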
Viana, F.A.C., Haftka, R.T. & Steffen, V. Multiple surrogates: how cross-validation errors can help us to obtain the best predictor. Struct Multidisc Optim 39, 439–457 (2009). https://doi.org/10.1007/s00158-008-0338-0