Kernel density estimation with bounded data

Kang, Young-Jin; Noh, Yoojeong; Lim, O-Kaung

doi:10.1007/s00158-017-1873-3

RESEARCH PAPER
Published: 08 December 2017

Volume 57, pages 95–113, (2018)
Structural and Multidisciplinary Optimization

Young-Jin Kang¹,
Yoojeong Noh¹ &
O-Kaung Lim¹

Abstract

For reliability analysis and reliability-based design optimization, the uncertainties of input variables are quantified as probability distribution functions using parametric or nonparametric statistical modeling methods. However, parametric methods such as the goodness-of-fit test and the model selection method are inaccurate when the sample size is very small or when the input variables do not follow parametric distributions. To address this problem, this study proposes kernel density estimation with bounded data (KDE-bd) and KDE with estimated bounded data (KDE-ebd), which randomly generate bounded data within given input variable intervals for the given data and use them to construct density functions. Because KDE-bd and KDE-ebd use input variable intervals, they converge to the population distribution better than the original KDE does, especially when few data are available. KDE-bd can even handle a problem with a single data point and known input variable bounds. To verify the proposed methods, statistical simulation tests were carried out for various sample sizes and multiple distribution types, and KDE-bd and KDE-ebd were compared with the original KDE. The results showed KDE-bd and KDE-ebd to be more accurate than the original KDE, especially when there are fewer than 10 data points. They are also more robust than the original KDE regardless of the quality of the given data, and are therefore more useful even when the data for the input variables are insufficient.

References

Agarwal H, Renaud JE, Preston EL, Padmanabhan D (2004) Uncertainty quantification using evidence theory in multidisciplinary design optimization. Reliab Eng Syst Saf 85(1):281–294
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723
Analytical Methods Committee (1989) Robust statistics-how not to reject outliers. Part 1. Basic concepts. Analyst 114(12):1693–1697
Anderson TW, Darling DA (1952) Asymptotic theory of certain goodness of fit criteria based on stochastic processes. Ann Math Stat 23(2):193–212
Ayyub BM, McCuen RH (2012) Probability, statistics, and reliability for engineers and scientists. CRC Press, Florida
Betrie GD, Sadiq R, Morin KA, Tesfamariam S (2014) Uncertainty quantification and integration of machine learning techniques for predicting acid rock drainage chemistry: a probability bounds approach. Sci Total Environ 490:182–190
Betrie GD, Sadiq R, Nichol C, Morin KA, Tesfamariam S (2016) Environmental risk assessment of acid rock drainage under uncertainty: the probability bounds and PHREEQC approach. J Hazard Mater 301:187–196
Burnham KP, Anderson DR (2004) Multimodel inference: understanding AIC and BIC in model selection. Sociol Methods Res 33(2):261–304
Chen S (2015) Optimal bandwidth selection for kernel density functionals estimation. J Probab Stat 2015:21
Cho SG, Jang J, Kim S, Park S, Lee TH, Lee M, Choi JS, Kim HW, Hong S (2016) Nonparametric approach for uncertainty-based multidisciplinary design optimization considering limited data. Struct Multidiscip Optim 54(6):1671–1688
Cowling A, Hall P (1996) On pseudodata methods for removing boundary effect in kernel density estimation. J R Stat Soc Ser B Methodol 58(3):551–563
Cox M, Harris P (2003) Up a GUM tree? Try the full monte! National Physical Laboratory, Teddington
Eldred MS, Agarwal H, Perez VM, Wojtkiewicz SF Jr, Renaud JE (2007) Investigation of reliability method formulations in DAKOTA/UQ. Struct Infrastruct Eng 3(3):199–213
Frigge M, Hoaglin DC, Iglewicz B (1989) Some implementations of the boxplot. Am Stat 43(1):50–54
Gabauer W (2000) Manual of codes of practice for the determination of uncertainties in mechanical tests on metallic materials, the determination of uncertainties in tensile testing. UNCERT COP7 report, Project SMT4-CT97-2165
Gasser T, Müller HG (1979) Kernel estimation of regression functions. Smoothing Techniques for Curve Estimation 757:23–68
Guidoum AC (2015) Kernel estimator and bandwidth selection for density and its derivatives. Department of Probabilities & Statistics, Faculty of Mathematics, University of Science and Technology Houari Boumediene, Algeria, https://cran.r-project.org/web/packages/kedd/vignettes/kedd.pdf
Hansen BE (2009) Lecture notes on nonparametrics. University of Wisconsin-Madison, WI, USA, http://www.ssc.wisc.edu/~bhansen/718/NonParametrics1.pdf
Härdle W, Marron JS, Wand MP (1990) Bandwidth choice for density derivatives. J R Stat Soc Ser B Methodol 52(1):223–232
Jang J, Cho SG, Lee SJ, Kim KS, Hong JP, Lee TH (2015) Reliability-based robust design optimization with kernel density estimation for electric power steering motor considering manufacturing uncertainties. IEEE Trans Magn 51(3):1–4
Jones MC, Kappenman RF (1992) On a class of kernel density estimate bandwidth selectors. Scand J Stat 19(4):337–349
Jung JH, Kang YJ, Lim OK, Noh Y (2017) A new method to determine the number of experimental data using statistical modeling methods. J Mech Sci Technol 31(6):2901–2910
Kang YJ, Lim OK, Noh Y (2016) Sequential statistical modeling for distribution type identification. Struct Multidiscip Optim 54(6):1587–1607
Kang YJ, Hong JM, Lim OK, Noh Y (2017) Reliability analysis using parametric and nonparametric input modeling methods. J Comput Struct Eng Inst Korea 30(1):87–94
Karanki DR, Kushwaha HS, Verma AK, Ajit S (2009) Uncertainty analysis based on probability bounds (P-box) approach in probabilistic safety assessment. Risk Anal 29(5):662–675
Karunamuni RJ, Alberts T (2005a) On boundary correction in kernel density estimation. Stat Methodol 2(3):191–212
Karunamuni RJ, Alberts T (2005b) A generalized reflection method of boundary correction in kernel density estimation. Can J Stat 33(4):497–509
Karunamuni RJ, Zhang S (2008) Some improvements on a boundary corrected kernel density estimator. Stat Probab Lett 78(5):499–507
Marron JS, Ruppert D (1994) Transformations to reduce boundary bias in kernel density estimation. J R Stat Soc Ser B Methodol 56(4):653–671
Montgomery DC, Runger GC (2003) Applied statistics and probability for engineers (3rd edition). Wiley, New York
Noh Y, Choi KK, Lee I (2010) Identification of marginal and joint CDFs using Bayesian method for RBDO. Struct Multidiscip Optim 40(1):35–51
Schindler A (2011) Bandwidth selection in nonparametric kernel estimation. PhD Thesis. Göttingen, Georg-August Universität, Diss
Schuster EF (1985) Incorporating support constraints into nonparametric estimators of densities. Commun Stat Theory Methods 14(5):1123–1136
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Scott DW, Terrell GR (1987) Biased and unbiased cross-validation in density estimation. J Am Stat Assoc 82(400):1131–1146
Shah H, Hosder S, Winter T (2015) Quantification of margins and mixed uncertainties using evidence theory and stochastic expansions. Reliab Eng Syst Saf 138:59–72
Sheather SJ (2004) Density estimation. Stat Sci 19(4):588–597
Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B Methodol 53(3):683–690
Silverman BW (1986) Density estimation for statistics and data analysis, vol 26. CRC press, London
Tucker WT, Ferson S (2003) Probability bounds analysis in environmental risk assessment. Applied Biomathematics, Setauket, New York, http://www.ramas.com/pbawhite.pdf
Tukey JW (1977) Exploratory data analysis. Pearson, New York
Verma AK, Srividya A, Karanki DR (2010) Reliability and safety engineering. Springer, London
Wand MP, Jones MC (1994) Kernel smoothing. CRC press, London
Yao W, Chen X, Ouyang Q, Van Tooren M (2013) A reliability-based multidisciplinary design optimization procedure based on combined probability and evidence theory. Struct Multidiscip Optim 48(2):339–354
Youn BD, Jung BC, Xi Z, Kim SB, Lee WR (2011) A hierarchical framework for statistical model calibration in engineering product development. Comput Methods Appl Mech Eng 200(13):1421–1431
Zhang Z, Jiang C, Han X, Hu D, Yu S (2014) A response surface approach for structural reliability analysis using evidence theory. Adv Eng Softw 69:37–45

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant, funded by the Korean Government (NRF-2015R1A1A3A04001351) and by the Technology Innovation Program (10048305, Launching Plug-in Digital Analysis Framework for Modular System Design) and the Human Resources Development program (No. 20164030201230) of the Korea Institute of Energy Technology Evaluation and Planning (KETEP) grant funded by the Ministry of Trade, Industry and Energy. This support is greatly appreciated.

Author information

Authors and Affiliations

School of Mechanical Engineering, Pusan National University, Pusan, 609-735, South Korea
Young-Jin Kang, Yoojeong Noh & O-Kaung Lim

Corresponding author

Correspondence to Yoojeong Noh.

Appendix 1: Silverman’s rule of thumb

Silverman’s rule of thumb selects the bandwidth by minimizing an objective function, the mean integrated squared error (MISE), and it is probably the most popular bandwidth selection method (Schindler 2011). It assumes that the true density is normal, so it yields a bandwidth close to optimal when the random variable X is reasonably close to normally distributed (Silverman 1986; Hansen 2009). For various kernel functions it is defined as follows (Hansen 2009).

$$ h = \widehat{\sigma}\, C_{\nu}(k)\, n^{-1/(2\nu + 1)} \tag{18} $$

where $\widehat{\sigma}$ is the sample standard deviation, $n$ is the number of data, $C_{\nu}(k)$ is the constant from Table 12, and $\nu$ is the order of the kernel.

Table 12 Constants of Silverman’s rule (Hansen 2009)
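
As a rough illustration of Eq. (18), the following Python sketch computes the rule-of-thumb bandwidth and evaluates a Gaussian kernel density estimate with it. The constant 1.06 is an assumption: it is the widely quoted normal-reference value for a second-order Gaussian kernel and is not taken from Table 12, whose entries are not reproduced here. The function names and the sample values are hypothetical, and the sketch covers only the original KDE, not the KDE-bd or KDE-ebd procedures.

```python
import math
import statistics

# Assumption: 1.06 is the commonly cited constant for a second-order
# Gaussian kernel; it is not read from the paper's Table 12.
GAUSSIAN_C2 = 1.06


def silverman_bandwidth(data, constant=GAUSSIAN_C2, order=2):
    """Return h = sigma_hat * C_nu(k) * n^(-1 / (2 * nu + 1))."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points to estimate sigma")
    sigma_hat = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return sigma_hat * constant * n ** (-1.0 / (2 * order + 1))


def gaussian_kde(x, data, h):
    """Evaluate a Gaussian kernel density estimate at x with bandwidth h."""
    n = len(data)
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
    return total / (n * h * math.sqrt(2.0 * math.pi))


# Hypothetical small sample, used only to exercise the functions.
sample = [4.1, 4.8, 5.0, 5.3, 6.2]
h = silverman_bandwidth(sample)
print("bandwidth:", h)
print("density at 5.0:", gaussian_kde(5.0, sample, h))
```

With ν = 2 the exponent reduces to −1/5, so the bandwidth shrinks slowly as n grows. The function refuses a single data point because the sample standard deviation is undefined in that case.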

About this article

Cite this article

Kang, YJ., Noh, Y. & Lim, OK. Kernel density estimation with bounded data. Struct Multidisc Optim 57, 95–113 (2018). https://doi.org/10.1007/s00158-017-1873-3

Received: 20 March 2017
Revised: 21 November 2017
Accepted: 22 November 2017
Published: 08 December 2017
Issue Date: January 2018
DOI: https://doi.org/10.1007/s00158-017-1873-3
