Computational Statistics, Volume 32, Issue 1, pp 145–177

Non-parametric clustering over user features and latent behavioral functions with dual-view mixture models

  • Alberto Lumbreras
  • Julien Velcin
  • Marie Guégan
  • Bertrand Jouve
Original Paper

Abstract

We present a dual-view mixture model to cluster users based on their features and latent behavioral functions. Every component of the mixture model represents a probability density over a feature view for observed user attributes and a behavior view for latent behavioral functions that are indirectly observed through user actions or behaviors. Our task is to infer the groups of users as well as their latent behavioral functions. We also propose a non-parametric version based on a Dirichlet Process to automatically infer the number of clusters. We test the properties and performance of the model on a synthetic dataset that represents the participation of users in the threads of an online forum. Experiments show that dual-view models outperform single-view ones when one of the views lacks information.
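The non-parametric version can be illustrated with the Chinese Restaurant Process listed among the keywords. The following is a minimal sketch, not the authors' implementation: it only draws cluster labels from a CRP prior, whereas the full model would also weigh each assignment by the likelihood of the user's feature view and behavior view. The function name `sample_crp` and the parameters `n_users`, `alpha`, and `seed` are assumptions made for illustration.

```python
import random


def sample_crp(n_users, alpha, seed=0):
    """Draw cluster labels for n_users from a CRP prior (illustrative sketch).

    alpha > 0 is the concentration parameter: larger values make it more
    likely that a user opens a new cluster.
    """
    rng = random.Random(seed)
    labels = []   # labels[i] = cluster index assigned to user i
    counts = []   # counts[k] = number of users already in cluster k
    for _ in range(n_users):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1        # join an existing cluster
        labels.append(k)
    return labels


labels = sample_crp(n_users=100, alpha=1.0)
n_clusters = len(set(labels))  # number of clusters is not fixed in advance
```

Because the number of occupied clusters grows with the data rather than being set beforehand, a sampler built on this prior can infer the number of user groups instead of requiring it as an input.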

Keywords

  • Multi-view clustering
  • Model-based clustering
  • Dirichlet Process (DP)
  • Chinese Restaurant Process (CRP)

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Technicolor, Cesson-Sévigné, France
  2. Laboratoire ERIC, Université de Lyon, Bron, France
  3. FRAMESPA - UMR 5136, CNRS, Université de Toulouse, Toulouse Cedex 9, France
  4. IMT - UMR 5219, CNRS, Université de Toulouse, Toulouse Cedex 9, France
