Unsupervised learning of Dirichlet process mixture models with missing data



This study presents a novel approach to unsupervised learning for clustering with missing data. We first extend a finite mixture model to the infinite case by using Dirichlet process mixtures, which can automatically determine the number of mixture components (clusters). We then treat the missing features as latent variables and compute the posterior distributions using the variational Bayesian expectation maximization algorithm, which optimizes the evidence lower bound on the complete-data log marginal likelihood. We evaluate the method on several artificial data sets with missing values, and the experimental results indicate that it outperforms several classic imputation methods. Finally, we present an application to the analysis of color images of seabed hydrothermal sulfides.




Author information



Corresponding author: Shiji Song.


About this article


Cite this article

Zhang, X., Song, S., Zhu, L. et al. Unsupervised learning of Dirichlet process mixture models with missing data. Sci. China Inf. Sci. 59, 1–14 (2016). https://doi.org/10.1007/s11432-015-5429-0



Keywords

  • Dirichlet processes
  • missing data
  • clustering
  • variational Bayesian
  • image analysis
  • 012201
