Computational Statistics, Volume 27, Issue 4, pp 757–777

Density estimation and comparison with a penalized mixture approach

Original Paper

Abstract

The paper presents smooth density estimation using penalized splines. The idea is to represent the unknown density as a convex mixture of basis densities whose weights are estimated under a penalty. The proposed method extends the work of Komárek and Lesaffre (Comput Stat Data Anal 52(7):3441–3458, 2008) and allows for general density estimation. In simulations, the method performs convincingly compared with existing density estimation routines. The idea is then extended to let the density depend on a (factorial) covariate. With a binary group indicator, for instance, we can test for equality of the densities in the two groups. This provides a smooth alternative to the classical Kolmogorov–Smirnov test or an analysis of variance, and it shows stable and powerful behaviour.
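The mixture idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian basis densities on equidistant knots, the softmax parameterization of the weights, the second-order difference penalty, the fixed penalty parameter `lam`, and the function names `fit_penalized_mixture` and `mixture_pdf` are all assumptions made here for illustration. The abstract does not specify these details, and in particular it does not say how the penalty parameter is chosen.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp, softmax
from scipy.stats import norm


def fit_penalized_mixture(x, n_basis=15, lam=10.0, diff_order=2):
    """Fit f(x) = sum_k c_k * phi_k(x) with penalized mixture weights c_k.

    Assumed setup (illustrative only): phi_k are Gaussian densities centred
    at equidistant knots, and c = softmax(beta) keeps the mixture convex.
    """
    x = np.asarray(x, dtype=float)
    pad = 0.1 * (x.max() - x.min())
    knots = np.linspace(x.min() - pad, x.max() + pad, n_basis)
    sigma = 2.0 / 3.0 * (knots[1] - knots[0])  # assumed basis width

    # log_phi[i, k] = log density of N(knots[k], sigma^2) evaluated at x[i]
    log_phi = norm.logpdf(x[:, None], loc=knots[None, :], scale=sigma)

    # Difference penalty on neighbouring coefficients (assumed form)
    D = np.diff(np.eye(n_basis), n=diff_order, axis=0)
    P = D.T @ D

    def objective(beta):
        log_c = beta - logsumexp(beta)  # log of the softmax weights
        loglik = logsumexp(log_phi + log_c[None, :], axis=1).sum()
        return -loglik + 0.5 * lam * beta @ P @ beta

    # beta is only identified up to an additive constant; the weights are not
    # affected by that constant, so the flat direction is harmless here.
    res = minimize(objective, np.zeros(n_basis), method="BFGS")
    return knots, sigma, softmax(res.x)


def mixture_pdf(grid, knots, sigma, weights):
    """Evaluate the fitted mixture density on a grid of points."""
    grid = np.asarray(grid, dtype=float)
    return norm.pdf(grid[:, None], loc=knots[None, :], scale=sigma) @ weights
```

Calling `fit_penalized_mixture(x)` on a one-dimensional sample returns the knots, the common basis width, and the estimated weights; `mixture_pdf` then evaluates the estimate. Larger values of `lam` pull neighbouring coefficients together and give a smoother density, while `lam = 0` reduces to an unpenalized mixture fit over the fixed basis.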

Keywords

Density estimation, Mixture density estimation, Penalized spline smoothing, ANOVA

References

  1. Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6): 716–723
  2. Babu GJ, Canty AJ, Chaubey YP (2002) Application of Bernstein polynomials for smooth estimation of a distribution and density function. J Stat Plan Infer 105(2): 377–392
  3. Bishop CM (2006) Pattern recognition and machine learning. Springer, New York, NY
  4. Boneva LI, Kendall D, Stefanov I (1971) Spline transformations: three new diagnostic aids for the statistical data-analyst. J R Stat Soc Ser B 33(1): 1–71
  5. Butterfield K (1976) The computation of all the derivatives of a B-spline basis. IMA J Appl Math 17(1): 15–25
  6. Celeux G, Soromenho G (1996) An entropy criterion for assessing the number of clusters in a mixture model. J Classif 13: 195–212. doi:10.1007/BF01246098
  7. Claeskens G, Krivobokova T, Opsomer J (2009) Asymptotic properties of penalized spline estimators. Biometrika 96(3): 529–544
  8. de Boor C (1978) A practical guide to splines. Springer, Berlin
  9. Dias R (1998) Density estimation via hybrid splines. J Stat Comput Simul 60(4): 277–293
  10. Efron B, Tibshirani R (1996) Using specially designed exponential families for density estimation. Ann Stat 24(6): 2431–2461
  11. Eilers PHC, Marx BD (1996) Flexible smoothing with B-splines and penalties. Stat Sci 11(2): 89–121
  12. Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458): 611–631
  13. Ghidey W, Lesaffre E, Eilers PHC (2004) Smooth random effects distribution in a linear mixed model. Biometrics 60(4): 945–953
  14. Good IJ, Gaskins RA (1971) Nonparametric roughness penalties for probability densities. Biometrika 58(2): 255–277
  15. Gu C (1993) Smoothing spline density estimation: A dimensionless automatic algorithm. J Am Stat Assoc 88(422): 495–504
  16. Gu C (2009) gss: general smoothing splines. R package version 1.0-5
  17. Gu C, Wang J (2003) Penalized likelihood density estimation: direct cross-validation and scalable approximation. Statistica Sinica 13(3): 811–826
  18. Hall P, Patil P (1995) Formulae for mean integrated squared error of nonlinear wavelet-based density estimators. Ann Stat 23(3): 905–928
  19. Kass RE, Steffey D (1989) Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). J Am Stat Assoc 84(407): 717–726
  20. Kauermann G (2005) A note on smoothing parameter selection for penalised spline smoothing. J Stat Plan Infer 127(1–2): 53–69
  21. Kauermann G, Krivobokova T, Fahrmeir L (2009) Some asymptotic results on generalized penalized spline smoothing. J R Stat Soc Ser B 71(2): 487–503
  22. Kauermann G, Opsomer J (2011) Data-driven selection of the spline dimension in penalized spline regression. Biometrika 98(1): 225–230
  23. Komárek A (2006) Accelerated failure time models for multivariate doubly-interval-censored data with flexible distributional assumptions. Ph.D. thesis, Leuven: Katholieke Universiteit Leuven, Faculteit Wetenschappen
  24. Komárek A, Lesaffre E (2008) Generalized linear mixed model with a penalized Gaussian mixture as a random-effects distribution. Comput Stat Data Anal 52(7): 3441–3458
  25. Komárek A, Lesaffre E, Hilton J (2005) Accelerated failure time model for arbitrarily censored data with smoothed error distribution. J Comput Graph Stat 14(3): 726–745
  26. Koo JY, Kooperberg C, Park J (1999) Logspline density estimation under censoring and truncation. Scand J Stat 26(1): 87–105
  27. Kooperberg C (2009) logspline: Logspline density estimation routines. R package version 2.1.3
  28. Li JQ, Barron AR (1999) Mixture density estimation. In: Advances in neural information processing systems 12. MIT Press, Cambridge, pp 279–285
  29. Li Y, Ruppert D (2008) On the asymptotics of penalized splines. Biometrika 95(2): 415–436
  30. Lindsey JK (1974) Comparison of probability distributions. J R Stat Soc Ser B 36(1): 38–47
  31. Lindsey JK (1974) Construction and comparison of statistical models. J R Stat Soc Ser B 36(3): 418–425
  32. Liu L, Levine M, Zhu Y (2009) A functional EM algorithm for mixing density estimation via nonparametric penalized likelihood maximization. J Comput Graph Stat 18(2): 481–504
  33. McLachlan G, Peel D (2000) Finite mixture models. Wiley, New York
  34. Müller P, Quintana F, Rosner G (2009) Bayesian Clustering with Regression. University of Texas M.D. Anderson Cancer Center, Houston, TX 77030 U.S.A.
  35. Nadaraya E (1974) On the integral mean square error of some nonparametric estimates for the density function. Theory Prob Appl 19(1): 133–141
  36. Nadaraya EA (1964) On estimating regression. Theory Prob Appl 9(1): 141–142
  37. Nason G (2010) wavethresh: Wavelets statistics and transforms. R package version 4.5
  38. Nason GP (2008) Wavelet methods in statistics with R. Springer, Berlin. ISBN 978-0-387-75960-9
  39. Nason GP, Silverman BW (1999) Wavelets for regression and other statistical problems. In: Schimek MG (ed) Smoothing and regression: approaches, computation, and application, series in probability and statistics. Wiley, New York
  40. O’Sullivan F (1986) A statistical perspective on ill-posed inverse problems. Stat Sci 1(4): 502–518
  41. Reiss T, Ogden R (2009) Smoothing parameter selection for a class of semiparametric linear models. J R Stat Soc Ser B 71(2): 505–523
  42. Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Ser B 71(2): 319–392
  43. Ruppert D (2002) Selecting the number of knots for penalized splines. J Comput Graph Stat 11(4): 735–757
  44. Ruppert D, Wand M, Carroll R (2003) Semiparametric regression. Cambridge University Press, Cambridge
  45. Ruppert D, Wand MP, Carroll RJ (2009) Semiparametric regression during 2003–2007. Electron J Stat 3: 1193–1256
  46. Schall R (1991) Estimation in generalized linear models with random effects. Biometrika 78(4): 719–727
  47. Schellhase C (2010) pendensity: density estimation with a penalized mixture approach. R package version 0.2.3
  48. Searle S, Casella G, McCulloch C (1992) Variance components. Wiley, New York
  49. Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B 53(3): 683–690
  50. Silverman BW (1982) On the estimation of a probability density function by the maximum penalized likelihood method. Ann Stat 10(3): 795–810
  51. Simonoff JS (1996) Smoothing methods in statistics. Springer, New York
  52. Wand M (2003) Smoothing and mixed models. Comput Stat 18(2): 223–249
  53. Wand M, Jones MC (1995) Kernel smoothing. Chapman and Hall, London
  54. Wand MP, Ormerod JT (2008) On semiparametric regression with O’Sullivan penalised splines. Aust N Z J Stat 50(2): 179–198
  55. Watson G (1964) Smooth regression analysis. Sankhya Ser A 26: 359–372
  56. Wood S (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J R Stat Soc Ser B 73(1): 3–36
  57. Wood SN (2006) Generalized additive models. Chapman and Hall/CRC, London
  58. Young D, Hunter D, Chauveau D, Benaglia T (2009) mixtools: an R package for analyzing mixture models. J Stat Softw 32(6): 1–29

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. Department for Business Administration and Economics, Centre for Statistics, Bielefeld University, Bielefeld, Germany
  2. Department of Statistics, Ludwig-Maximilians-University Munich, Munich, Germany
