Estimating training data boundaries in surrogate-based modeling

  • Research Paper
  • Structural and Multidisciplinary Optimization

Abstract

Using surrogate models outside the boundaries of their training data is risky and can lead to significant errors. This paper presents a computationally efficient approach to estimating the boundaries of training data inputs in surrogate modeling using the Mahalanobis distance (MD). The estimated boundary distance can then be used as a threshold for deciding whether a particular prediction site lies within the boundaries of the training data inputs, and it has the potential for a likelihood/probabilistic interpretation. The approach is evaluated using two- and four-dimensional analytical restricted input spaces and a complex six-dimensional biomechanical problem. The proposed approach: (i) gives good approximations of the boundaries of the restricted input spaces, (ii) exhibits reasonable error rates when classifying prediction sites as inside or outside known restricted input spaces, and (iii) reflects the expected error trends for increasing values of the MD, similar to those obtained using a computationally expensive convex hull approach.
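The thresholding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the preview does not reproduce their exact procedure, so the function names (`fit_md_boundary`, `mahalanobis`, `is_inside`), the use of the sample mean and sample covariance, and the synthetic Gaussian training inputs are all assumptions made for illustration. The two threshold choices loosely mirror the "largest Mahalanobis distance" and "median of top 20% largest Mahalanobis distances" entries listed under Abbreviations.

```python
import numpy as np


def mahalanobis(X, mu, cov_inv):
    """Mahalanobis distance of each row of X from mu, given the inverse covariance."""
    diff = np.atleast_2d(np.asarray(X, dtype=float)) - mu
    # Quadratic form diff_i^T C^{-1} diff_i for every row i
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))


def fit_md_boundary(X_train, top_fraction=None):
    """Estimate an MD threshold from an (m, n) array of training inputs.

    top_fraction=None  -> threshold is the largest training MD
    top_fraction=0.2   -> threshold is the median of the top 20% largest MDs
    (Hypothetical helper: the parameter name and defaults are assumptions.)
    """
    X_train = np.asarray(X_train, dtype=float)
    mu = X_train.mean(axis=0)
    cov = np.atleast_2d(np.cov(X_train, rowvar=False))  # sample covariance C
    cov_inv = np.linalg.inv(cov)
    d_train = mahalanobis(X_train, mu, cov_inv)
    if top_fraction is None:
        threshold = d_train.max()
    else:
        k = max(1, int(np.ceil(top_fraction * len(d_train))))
        threshold = np.median(np.sort(d_train)[-k:])
    return mu, cov_inv, threshold


def is_inside(X, mu, cov_inv, threshold):
    """True where a prediction site's MD does not exceed the threshold."""
    return mahalanobis(X, mu, cov_inv) <= threshold


# Synthetic example: 200 two-dimensional training inputs (assumed data)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))

mu, cov_inv, thr_bl = fit_md_boundary(X_train)               # largest MD
_, _, thr_b20 = fit_md_boundary(X_train, top_fraction=0.2)   # median of top 20%

sites = np.array([[0.0, 0.0], [10.0, 10.0]])
inside = is_inside(sites, mu, cov_inv, thr_bl)
print(inside)  # the site near the data centre is inside, the distant one is not
```

Using the largest training MD as the threshold encloses every training point, whereas the median of the top 20% gives a tighter and less outlier-sensitive boundary at the cost of leaving some training points outside it.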

Notes

  1. For example, the Matlab implementation of the Quickhull algorithm (convhulln) ran out of memory when computing the convex hull of two hundred (200) training data points in a ten (10) dimensional hyper-spherical restricted input space, on a computer with a 2.5 GHz Pentium IV processor, 2 GB of RAM, and 5 GB of virtual memory.

Abbreviations

  • BER: Balanced error rate
  • C: Covariance matrix
  • KS: Kolmogorov-Smirnov
  • LHS: Latin hypercube sampling
  • m: Number of training data
  • MD: Mahalanobis distance
  • n: Number of input variables
  • p: Probability of a prediction site being within the training data boundaries
  • \(R^{p}\): Set of real numbers of dimension p
  • S: Surrogate model
  • T: Training data
  • x: Input variables
  • y: Response variables
  • α: Statistical significance level
  • \(\chi_p^{2}\): Chi-square distribution with p degrees of freedom
  • Δ: Difference
  • ε: Relative error
  • μ: Mean
  • b: boundary estimation
  • b20: median of top 20% largest Mahalanobis distances
  • bl: largest Mahalanobis distance
  • T: training data

Acknowledgements

This work was supported in part by the National Science Foundation CBET Division under Grant No. 0602996 to B. J. Fregly and R. T. Haftka.

Author information

Corresponding author

Correspondence to Nestor V. Queipo.

Additional information

Part of this work was presented at the 8th World Congress on Structural and Multidisciplinary Optimization, June 1–5, 2009, Lisbon, Portugal.

About this article

Cite this article

Pineda, L.E., Fregly, B.J., Haftka, R.T. et al. Estimating training data boundaries in surrogate-based modeling. Struct Multidisc Optim 42, 811–821 (2010). https://doi.org/10.1007/s00158-010-0541-7

  • DOI: https://doi.org/10.1007/s00158-010-0541-7