Co-clustering of multi-view datasets

Hussain, Syed Fawad; Bashir, Shariq

doi:10.1007/s10115-015-0861-4

Regular Paper
Published: 17 July 2015

Volume 47, pages 545–570 (2016)
Knowledge and Information Systems

Abstract

In many clustering problems, we have access to multiple sources of data representing different aspects of the problem. Each of these data sources separately represents an association between entities. Multi-view clustering integrates clustering information from these heterogeneous sources and has been shown to improve results over single-view clustering. Co-clustering, on the other hand, has been widely used to improve clustering results on a single view by exploiting the duality between objects and their attributes. In this paper, we propose a multi-view clustering setting in the context of a co-clustering framework. Our underlying assumption is that similarity values generated from the individual data sources can be transferred from one view to the other(s), resulting in a better clustering of the data. We provide empirical evidence, on several datasets, that this framework yields higher clustering accuracy than that obtained from any single view.

Notes

In practice, the value corresponding to \(\rho \,\%\) is estimated from a sample, rather than by sorting all values in the matrix and removing the lowest ones.
We ignore here the normalization factor for the sake of clarity.
Values for \(\rho =1\) and \(\lambda =0\) are omitted because they produce an R matrix of all 0's (100% of the similarity values in R are pruned) and an M matrix of all 1's (every value in M is raised to the power 0), respectively.
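
The two operations mentioned in these notes can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `prune_lowest` and `raise_to_power`, the parameters `sample_size` and `seed`, the choice to set pruned entries to 0, and the reading of the power as element-wise are all assumptions made here. Following the third note, \(\rho\) is treated as a fraction in [0, 1].

```python
import numpy as np


def prune_lowest(R, rho, sample_size=10_000, seed=0):
    """Zero out roughly the lowest `rho` fraction of the entries of R.

    The cut-off is estimated from a random sample of entries instead of
    sorting the whole matrix. `sample_size` and `seed` are illustrative
    parameters, not values taken from the paper.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    R = np.asarray(R, dtype=float)
    if rho == 1.0:
        # Degenerate case from the note: every similarity value is pruned.
        return np.zeros_like(R)
    rng = np.random.default_rng(seed)
    flat = R.ravel()
    n = min(sample_size, flat.size)
    sample = rng.choice(flat, size=n, replace=False)
    threshold = np.quantile(sample, rho)  # estimated rho-quantile
    # Keep entries at or above the estimated cut-off, zero out the rest.
    return np.where(R >= threshold, R, 0.0)


def raise_to_power(M, lam):
    """Element-wise power M ** lam (assumed reading of 'raising M to lambda')."""
    return np.power(np.asarray(M, dtype=float), lam)


R = np.arange(100, dtype=float).reshape(10, 10)
pruned_half = prune_lowest(R, 0.5)  # lower half of the values set to 0
pruned_all = prune_lowest(R, 1.0)   # rho = 1: all entries become 0
ones = raise_to_power(R, 0)         # lambda = 0: all entries become 1
```

Calling `prune_lowest` with `rho=1.0` and `raise_to_power` with `lam=0` reproduces the two degenerate cases that the note gives as the reason for omitting those parameter values. When the matrix has fewer entries than `sample_size`, the sample covers every entry and the estimated cut-off equals the exact quantile.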

Author information

Authors and Affiliations

Syed Fawad Hussain: Faculty of Computer Sciences and Engineering, GIK Institute, Topi, 23640, Pakistan
Shariq Bashir: Bahria University, Islamabad, Pakistan; New York University Abu Dhabi, Abu Dhabi, UAE

Corresponding author

Correspondence to Syed Fawad Hussain.

Appendix

See Tables 8 and 9.

Table 8 NMI scores for the different experiments corresponding to Table 4

Table 9 NMI scores for the different experiments corresponding to Table 5

About this article

Cite this article

Hussain, S.F., Bashir, S. Co-clustering of multi-view datasets. Knowl Inf Syst 47, 545–570 (2016). https://doi.org/10.1007/s10115-015-0861-4

Received: 26 November 2014
Revised: 06 April 2015
Accepted: 06 July 2015
Published: 17 July 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10115-015-0861-4
