
Co-clustering of multi-view datasets

  • Regular Paper
Knowledge and Information Systems

Abstract

In many clustering problems, we have access to multiple sources of data representing different aspects of the problem. Each of these sources separately represents an association between entities. Multi-view clustering integrates clustering information from these heterogeneous sources and has been shown to improve results over single-view clustering. Co-clustering, on the other hand, has been widely used to improve clustering results on a single view by exploiting the duality between objects and their attributes. In this paper, we propose a multi-view clustering setting in the context of a co-clustering framework. Our underlying assumption is that similarity values computed from each individual view can be transferred from one view to the other(s), resulting in a better clustering of the data. We provide empirical evidence on different datasets showing that this framework achieves better clustering accuracy than that obtained from any of the single views.
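To make the idea of "transferring similarity values from one view to another" concrete, here is a minimal NumPy sketch. It is not the authors' algorithm. It rests on several assumptions:

- Every view is a matrix over the same objects (rows) with its own features (columns).
- Object and feature similarities are updated alternately, with feature similarity C = Mᵀ R M and object similarity R = M C Mᵀ.
- The normalization is a simple rescaling so that the largest entry is 1.

The function names `normalize` and `multiview_cosimilarity`, the iteration count and the update order are all hypothetical.

```python
import numpy as np

def normalize(S):
    # Assumed normalization (hypothetical): rescale so the largest entry is 1.
    peak = S.max()
    return S / peak if peak > 0 else S

def multiview_cosimilarity(views, n_iter=4):
    """Hypothetical sketch: each view is an (n_objects x n_features_v) matrix
    over the SAME objects. One object-similarity matrix R is carried from view
    to view, so each view's feature similarities are computed from object
    similarities learned on the other views."""
    n = views[0].shape[0]
    R = np.eye(n)  # start: each object is similar only to itself
    for _ in range(n_iter):
        for M in views:
            C = normalize(M.T @ R @ M)  # feature-feature similarity in this view
            R = normalize(M @ C @ M.T)  # object-object similarity, handed to the next view
            np.fill_diagonal(R, 1.0)
    return R

# Toy example: objects {0, 1} and {2, 3} form two groups in both views.
view_a = np.array([[1, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 1, 1],
                   [0, 0, 0, 1]], dtype=float)
view_b = np.array([[1, 0],
                   [1, 0],
                   [0, 1],
                   [0, 1]], dtype=float)
R = multiview_cosimilarity([view_a, view_b])
print(np.round(R, 3))
```

In this toy data the two groups never share a feature in either view. Their cross-similarities therefore stay at zero, while objects inside the same group end up with positive similarity. A clustering algorithm could then be run on the resulting R.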




Notes

  1. In practice, a sample of the entries is used to estimate the value corresponding to \(\rho \,\%\), rather than sorting all values in the entire matrix and removing the lowest ones.

  2. We ignore here the normalization factor for the sake of clarity.

  3. Values \(\rho =1\) and \(\lambda =0\) are omitted because they would turn R into all 0's (pruning 100 % of the similarity values in R) and M into all 1's (raising every value in M to the power 0), respectively.
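Notes 1 and 3 mention two preprocessing parameters. Below is a minimal NumPy sketch of both. It assumes that ρ is the fraction of the lowest entries of R that are set to zero, with the cut-off estimated from a sample of entries, and that λ is an elementwise exponent applied to M. The helper names, the sample size and the seed are hypothetical.

```python
import numpy as np

def prune_lowest(R, rho, sample_size=10_000, seed=0):
    """Hypothetical helper: set to zero (approximately) the lowest fraction rho
    of the entries of R. The cut-off is estimated from a random sample of
    entries instead of sorting the whole matrix (cf. note 1)."""
    if rho >= 1.0:
        return np.zeros_like(R)  # pruning 100 % of the values leaves all 0's (note 3)
    values = R.ravel()
    if values.size > sample_size:
        rng = np.random.default_rng(seed)
        values = rng.choice(values, size=sample_size, replace=False)
    cutoff = np.quantile(values, rho)  # estimated rho-quantile of the entries
    return np.where(R < cutoff, 0.0, R)

def power_transform(M, lam):
    """Hypothetical helper: raise every entry of M to the power lam.
    lam = 0 gives all 1's (note 3)."""
    return np.power(M, lam)

R = np.arange(1, 101, dtype=float).reshape(10, 10)
pruned = prune_lowest(R, 0.3)
print(np.count_nonzero(pruned == 0))
```

For matrices no larger than `sample_size`, the quantile is computed exactly. Sampling only matters for large matrices, which is the situation note 1 addresses.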


Author information

Correspondence to Syed Fawad Hussain.

Appendix

See Tables 8 and 9.

Table 8 NMI scores for the different experiments corresponding to Table 4
Table 9 NMI scores for the different experiments corresponding to Table 5
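Tables 8 and 9 report NMI (normalized mutual information) scores. This excerpt does not state which normalization the authors used. The standard-library sketch below assumes the geometric-mean form NMI = I(X;Y) / sqrt(H(X) H(Y)), as used by Strehl and Ghosh for cluster ensembles. The handling of the single-cluster case is a convention chosen here, not part of that definition.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # Sum over co-occurring label pairs of p(x,y) * log(p(x,y) / (p(x) p(y)))
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())

def nmi(labels_true, labels_pred):
    """NMI with geometric-mean normalization: I(X;Y) / sqrt(H(X) H(Y))."""
    h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    if h_true == 0.0 or h_pred == 0.0:
        # Degenerate case (a single cluster): a convention, not a definition.
        return 1.0 if h_true == h_pred else 0.0
    return mutual_information(labels_true, labels_pred) / math.sqrt(h_true * h_pred)

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabelled
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```

NMI does not depend on how the clusters are labelled. It is 1 for identical partitions and 0 for statistically independent ones.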


About this article


Cite this article

Hussain, S.F., Bashir, S. Co-clustering of multi-view datasets. Knowl Inf Syst 47, 545–570 (2016). https://doi.org/10.1007/s10115-015-0861-4



  • DOI: https://doi.org/10.1007/s10115-015-0861-4
