Abstract
The core of nonparametric and semiparametric Bayesian analysis is to relax particular parametric assumptions by treating the distributions of interest as unknown and random, and assigning them a prior. Selecting a suitable prior is therefore especially critical in nonparametric Bayesian fitting. As a distribution over distributions, the Dirichlet process (DP) is the most popular nonparametric prior owing to its attractive theoretical properties, modeling flexibility, and computational feasibility. In this paper, we review and summarize developments of the DP over the past decades. We focus mainly on its theoretical properties, its various extensions, statistical modeling, and applications to latent variable models.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant No. 11471161) and the Technological Innovation Item in Jiangsu Province (No. BK2008156).
Additional information
Translated from Advances in Mathematics (China), 2017, 46(5): 641–666
Cite this article
Xia, Y., Liu, Y. & Gou, J. Dirichlet process and its developments: a survey. Front. Math. China 17, 79–115 (2022). https://doi.org/10.1007/s11464-022-1004-3
DOI: https://doi.org/10.1007/s11464-022-1004-3
Keywords
- Nonparametric Bayes
- Dirichlet process
- Pólya urn prediction
- Sethuraman representation
- stick-breaking procedure
- Chinese restaurant rule
- mixture of Dirichlet processes
- dependent Dirichlet process
- Markov chain Monte Carlo
- blocked Gibbs sampler
- latent variable models