Knowledge and Information Systems, Volume 47, Issue 2, pp 463–488

Data clustering using side information dependent Chinese restaurant processes

  • Cheng Li
  • Santu Rana
  • Dinh Phung
  • Svetha Venkatesh
Regular Paper


Side information, or auxiliary information associated with documents or image content, provides hints for clustering. We propose a new model, the side information dependent Chinese restaurant process, which exploits side information in a Bayesian nonparametric model to improve data clustering. We introduce side information into the framework of the distance dependent Chinese restaurant process, using a robust decay function to handle noisy side information. The threshold parameter of the decay function is updated automatically during Gibbs sampling. We also propose a fast inference algorithm. We evaluate our approach on four datasets: Cora, 20 Newsgroups, NUS-WIDE and one medical dataset. The types of side information explored in this paper include citations, authors, tags, keywords and auxiliary clinical information. Comparison with state-of-the-art approaches on standard performance measures (NMI, F1) shows that our approach performs better.
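To make the role of the decay function concrete, the following is a minimal sketch of the distance dependent Chinese restaurant process prior over customer links, not the model of this paper. Under that prior, customer i links to itself with weight alpha and to another customer j with weight f(d_ij), and clusters are the connected components of the link graph. The abstract does not give the form of the decay function or of the distance, so the window decay, the Jaccard distance over tag sets, and all function names here are illustrative assumptions. The data likelihood and the automatic threshold update are left out.

```python
import random


def jaccard_distance(a, b):
    """Assumed distance in [0, 1] between two sets of side-information items."""
    union = a | b
    if not union:
        return 1.0  # no side information on either side: treat as maximally distant
    return 1.0 - len(a & b) / len(union)


def window_decay(distance, threshold):
    """Hypothetical threshold decay: weight 1 inside the window, 0 outside."""
    return 1.0 if distance <= threshold else 0.0


def link_probabilities(i, side_info, alpha, threshold):
    """ddCRP prior over the link of customer i.

    A self-link gets weight alpha; a link to any other customer j gets
    weight f(d_ij). The weights are then normalised.
    """
    weights = []
    for j in range(len(side_info)):
        if j == i:
            weights.append(alpha)
        else:
            d = jaccard_distance(side_info[i], side_info[j])
            weights.append(window_decay(d, threshold))
    total = sum(weights)  # positive whenever alpha > 0
    return [w / total for w in weights]


def sample_links(side_info, alpha, threshold, rng):
    """Draw one link per customer from the prior (no data likelihood)."""
    n = len(side_info)
    return [
        rng.choices(range(n), weights=link_probabilities(i, side_info, alpha, threshold))[0]
        for i in range(n)
    ]


def clusters_from_links(links):
    """Label customers by connected component of the undirected link graph."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)

    label_of = {}
    return [label_of.setdefault(find(i), len(label_of)) for i in range(len(links))]


if __name__ == "__main__":
    # Toy tag sets standing in for side information (made-up values).
    tags = [{"a", "b"}, {"a", "b", "c"}, {"x"}, {"x", "y"}]
    links = sample_links(tags, alpha=1.0, threshold=0.5, rng=random.Random(0))
    print(links, clusters_from_links(links))
```

With a threshold of 0.5 in this toy setup, items that share no tags get zero link weight, so they can never be linked directly. A larger threshold lets weaker side-information matches influence the clustering, which is why the choice of threshold matters when the side information is noisy.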


Keywords: Side information, Similarity, Data clustering, Bayesian nonparametric models



We thank the anonymous reviewers for their very useful comments and suggestions.



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Cheng Li (1, corresponding author)
  • Santu Rana (1)
  • Dinh Phung (1)
  • Svetha Venkatesh (1)
  1. Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong, Australia
