Subset selection from large datasets for Kriging modeling

Rennen, Gijs

doi:10.1007/s00158-008-0306-8

Research Paper
Open access
Published: 24 September 2008

Volume 38, pages 545–569 (2009)

Structural and Multidisciplinary Optimization


Abstract

When building a Kriging model, the general intuition is that using more data will always result in a better model. However, we show that for a large non-uniform dataset, using a uniform subset can have several advantages: it can reduce the time needed to fit the model, avoid numerical inaccuracies, and improve robustness with respect to errors in the output data. We furthermore describe several new and existing methods for selecting a uniform subset. These methods are tested and compared on several artificial datasets and one real-life dataset. The comparison shows how the selected subsets affect different aspects of the resulting Kriging model. As none of the subset selection methods performs best on all criteria, the best method to choose depends on how the different aspects are valued. The comparison made in this paper can help the user make a good choice.
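To make the idea of a "uniform subset" concrete, the sketch below uses greedy farthest-point (maximin) selection, a common space-filling heuristic. It is an illustrative assumption, not necessarily one of the methods compared in the paper. The function names, the synthetic dataset, and the subset size are all hypothetical, and the points are assumed to be distinct.

```python
import math
import random


def greedy_maximin_subset(points, k, start=0):
    """Pick k indices by greedy farthest-point (maximin) selection."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # Add the point that is farthest from the current subset.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


def min_pairwise_distance(points, indices):
    """Smallest distance between any two selected points."""
    return min(
        math.dist(points[a], points[b])
        for pos, a in enumerate(indices)
        for b in indices[pos + 1:]
    )


# Hypothetical non-uniform dataset: a dense cluster plus a sparse background.
rng = random.Random(0)
cluster = [(rng.gauss(0.2, 0.02), rng.gauss(0.2, 0.02)) for _ in range(180)]
background = [(rng.random(), rng.random()) for _ in range(20)]
data = cluster + background

subset = greedy_maximin_subset(data, k=15)
first_k = list(range(15))  # naive alternative: take the first 15 points

print("greedy maximin:", min_pairwise_distance(data, subset))
print("first 15 points:", min_pairwise_distance(data, first_k))
```

A larger minimum pairwise distance means the selected points are more evenly spread. Nearly coincident points tend to make a Kriging correlation matrix ill-conditioned, which is one plausible reading of the numerical inaccuracies mentioned in the abstract.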



Author information

Authors and Affiliations

Department of Econometrics and Operations Research, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands
Gijs Rennen


Corresponding author

Correspondence to Gijs Rennen.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


About this article

Cite this article

Rennen, G. Subset selection from large datasets for Kriging modeling. Struct Multidisc Optim 38, 545–569 (2009). https://doi.org/10.1007/s00158-008-0306-8


Received: 06 February 2008
Revised: 06 June 2008
Accepted: 28 July 2008
Published: 24 September 2008
Issue Date: July 2009
DOI: https://doi.org/10.1007/s00158-008-0306-8
