Structural and Multidisciplinary Optimization, Volume 42, Issue 6, pp 811–821

Estimating training data boundaries in surrogate-based modeling

  • Luis E. Pineda
  • Benjamin J. Fregly
  • Raphael T. Haftka
  • Nestor V. Queipo (corresponding author)
Research Paper

Abstract

Using surrogate models outside the boundaries of their training data is risky and can lead to significant errors. This paper presents a computationally efficient approach to estimating the boundaries of training data inputs in surrogate modeling using the Mahalanobis distance (MD). A threshold on this distance can then be used to decide whether or not a particular prediction site is within the boundaries of the training data inputs, and the distance has the potential for a likelihood/probabilistic interpretation. The approach is evaluated using two- and four-dimensional analytical restricted input spaces and a complex six-dimensional biomechanical problem. The proposed approach: i) gives good approximations for the boundaries of the restricted input spaces, ii) exhibits reasonable error rates when classifying prediction sites as inside or outside known restricted input spaces, and iii) reflects expected error trends for increasing values of the MDs, similar to those obtained using a computationally expensive convex hull approach.
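The thresholding idea in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes MDs are measured from the training-data mean using the sample covariance matrix, and it takes the two thresholds from the nomenclature (the largest training MD, `bl`, and the median of the top 20% largest training MDs, `b20`). All function names and the synthetic band-shaped training data are hypothetical. Only the standard library is used so the sketch stays self-contained; a linear algebra library would normally handle the solve.

```python
import math
import random
import statistics

def mean_and_covariance(points):
    """Sample mean vector and covariance matrix of m points with n inputs."""
    m, n = len(points), len(points[0])
    mu = [sum(p[j] for p in points) / m for j in range(n)]
    cov = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / (m - 1)
            for j in range(n)] for i in range(n)]
    return mu, cov

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def mahalanobis(x, mu, cov):
    """MD of x from mu: sqrt((x - mu)^T C^{-1} (x - mu))."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, diff)  # z = C^{-1} (x - mu), without forming the inverse
    return math.sqrt(max(0.0, sum(d * zi for d, zi in zip(diff, z))))

def md_thresholds(points):
    """Estimate mean, covariance, and the two MD thresholds (bl, b20)."""
    mu, cov = mean_and_covariance(points)
    mds = sorted(mahalanobis(p, mu, cov) for p in points)
    top = mds[-max(1, len(mds) // 5):]  # the 20% largest training MDs
    return mu, cov, {"bl": mds[-1], "b20": statistics.median(top)}

def is_inside(x, mu, cov, threshold):
    """Classify a prediction site as inside the estimated boundary."""
    return mahalanobis(x, mu, cov) <= threshold

# Hypothetical restricted input space: a narrow band around x2 = x1.
random.seed(0)
training = []
for _ in range(200):
    x1 = random.uniform(0.0, 1.0)
    training.append((x1, x1 + random.uniform(-0.1, 0.1)))

mu, cov, thresholds = md_thresholds(training)
center_inside = is_inside((0.5, 0.5), mu, cov, thresholds["b20"])
corner_inside = is_inside((0.1, 0.9), mu, cov, thresholds["bl"])
```

In this sketch the site (0.5, 0.5) lies on the band and the site (0.1, 0.9) lies far from it, even though both fall inside the bounding box of the training inputs. Because the covariance matrix captures the correlation between inputs, the MD threshold can reject sites that a simple per-variable range check would accept.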

Keywords

Surrogate modeling, Training data boundaries, Mahalanobis distance

Nomenclature

BER: Balanced error rate
C: Covariance matrix
KS: Kolmogorov-Smirnov
LHS: Latin hypercube sampling
m: Number of training data
MD: Mahalanobis distance
n: Number of input variables
p: Probability of a prediction site being within the training data boundaries
\(\mathbb{R}^p\): Set of real numbers of dimension p
S: Surrogate model
T: Training data
x: Input variables
y: Response variables
α: Statistical significance level
\(\chi_p^{2}\): Chi-square distribution with p degrees of freedom
Δ: Difference
ε: Relative error
μ: Mean

Subindices

b: boundary estimation
b20: median of top 20% largest Mahalanobis distances
bl: largest Mahalanobis distance
T: training data
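
For orientation, the symbols listed in the nomenclature combine as follows in the standard definition of the Mahalanobis distance of a prediction site from the training data. The second line is the textbook result behind the chi-square entry: it holds when the inputs are multivariate normal with mean \(\mu_T\) and covariance C, and only approximately when both are estimated from a finite sample. The abstract does not say how the paper applies this result, so it is shown here as an assumption. Following the \(\chi_p^{2}\) entry, p denotes the dimension of the input space in this block, and d is a hypothetical threshold value.

```latex
\[
  MD(\mathbf{x}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu}_T)^{\mathsf{T}} \, C^{-1} \, (\mathbf{x} - \boldsymbol{\mu}_T)}
\]
\[
  MD^{2}(\mathbf{x}) \sim \chi_p^{2}
  \quad\Longrightarrow\quad
  \Pr\left[ MD(\mathbf{x}) \le d \right] = F_{\chi_p^{2}}\!\left(d^{2}\right)
\]
```

Here \(F_{\chi_p^{2}}\) is the cumulative distribution function of the chi-square distribution with p degrees of freedom, which is what would give an MD threshold a probabilistic reading.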

Notes

Acknowledgements

This work was supported in part by the National Science Foundation CBET Division under Grant No. 0602996 to B. J. Fregly and R. T. Haftka.


Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Luis E. Pineda (1)
  • Benjamin J. Fregly (2)
  • Raphael T. Haftka (2)
  • Nestor V. Queipo (1, corresponding author)

  1. Applied Computing Institute, University of Zulia, Maracaibo, Venezuela
  2. Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville, USA
