Statistics and Computing, Volume 27, Issue 1, pp 39–51

Computationally tractable approximate and smoothed Polya trees

  • William Cipolli III
  • Timothy Hanson


A discrete approximation to the Polya tree prior suitable for latent data is proposed that enjoys surprisingly simple and efficient conjugate updating. This approximation is illustrated in two applied contexts: the implementation of a nonparametric meta-analysis involving studies on the relationship between alcohol consumption and breast cancer, and random intercept Poisson regression for Ache armadillo hunting treks. The discrete approximation is then smoothed with Gaussian kernels to provide a smooth density for use with continuous data; the smoothed approximation is illustrated on a classic dataset on galaxy velocities and on recent data involving breast cancer survival in Louisiana.
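
The two ingredients named in the abstract, conjugate updating on a partition tree and smoothing of the resulting discrete distribution with Gaussian kernels, can be illustrated with a small sketch. The code below is not the authors' approximation; it uses the standard finite Polya tree on dyadic subintervals of [0, 1] with Beta(c j^2, c j^2) branch priors at level j, and it assumes kernels centred at bin midpoints with a bandwidth of one bin width. The function names, the parameters `levels` and `c`, and the bandwidth rule are all hypothetical choices made for illustration.

```python
import math


def polya_tree_bin_probs(data, levels=4, c=1.0):
    """Posterior mean probabilities of the 2**levels dyadic bins on [0, 1].

    Assumes every observation lies in [0, 1]. Each branch probability has a
    Beta(alpha, alpha) prior with alpha = c * j**2 at level j (an assumed,
    commonly used choice), so the update only adds counts to the parameters.
    """
    probs = [1.0]
    for j in range(1, levels + 1):
        n_bins = 2 ** j
        counts = [0] * n_bins
        for x in data:
            # x == 1.0 is placed in the last bin.
            counts[min(int(x * n_bins), n_bins - 1)] += 1
        alpha = c * j ** 2
        new_probs = []
        for parent, p in enumerate(probs):
            n_left = counts[2 * parent]
            n_right = counts[2 * parent + 1]
            # Conjugate update: posterior is Beta(alpha + n_left, alpha + n_right);
            # its mean is the posterior expected probability of branching left.
            left = (alpha + n_left) / (2 * alpha + n_left + n_right)
            new_probs.append(p * left)
            new_probs.append(p * (1.0 - left))
        probs = new_probs
    return probs


def smoothed_density(x, probs, bandwidth=None):
    """Gaussian-kernel mixture with one kernel per bin, weighted by probs.

    Kernel centres (bin midpoints) and the default bandwidth (one bin width)
    are illustrative assumptions, not values taken from the paper.
    """
    n_bins = len(probs)
    h = bandwidth if bandwidth is not None else 1.0 / n_bins
    total = 0.0
    for k, p in enumerate(probs):
        mid = (k + 0.5) / n_bins
        z = (x - mid) / h
        total += p * math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))
    return total


# Hypothetical data on [0, 1], used only to exercise the functions.
example_data = [0.10, 0.15, 0.20, 0.80]
example_probs = polya_tree_bin_probs(example_data, levels=4)
example_density_at_015 = smoothed_density(0.15, example_probs)
```

Because the branch probabilities are independent under the posterior, the posterior mean of each bin probability is the product of the branch means along its path, which is why the update reduces to a single pass over the counts at each level.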


Keywords: Bayesian nonparametric, Density estimation, Generalized linear mixed model, Meta-analysis



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Statistics, University of South Carolina, Columbia, USA
