Abstract
In many clustering problems, we have access to multiple sources of data, each representing a different aspect of the problem and each separately describing an association between entities. Multi-view clustering integrates clustering information from these heterogeneous sources and has been shown to improve results over single-view clustering. Co-clustering, on the other hand, has been widely used to improve clustering results on a single view by exploiting the duality between objects and their attributes. In this paper, we propose a multi-view clustering setting within a co-clustering framework. Our underlying assumption is that similarity values generated from the individual data sources can be transferred from one view to the other(s), resulting in a better clustering of the data. We provide empirical evidence, on several datasets, that this framework yields better clustering accuracy than that obtained from any of the single views.
Notes
In practice, a sample is used to estimate the value corresponding to \(\rho \,\%\) rather than sorting the values in the entire matrix and removing the lowest values.
We ignore here the normalization factor for the sake of clarity.
Values for \(\rho =1\) and \(\lambda =0\) are omitted since they will result in all 0’s (pruning 100 % of similarity values in R) and all 1’s (raising all values in M to the power 0) in the R and M matrices, respectively.
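The pruning and exponent steps referred to in these notes can be illustrated with a minimal sketch. It assumes that \(\rho\) is a fraction in \([0, 1]\), that pruning means setting the lowest-valued entries of R to zero, and that the power \(\lambda\) is applied element-wise to M. The function names, the sample size, and the explicit handling of \(\rho = 1\) are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def estimate_prune_threshold(R, rho, sample_size=1000, seed=0):
    """Estimate the value below which a fraction `rho` of R's entries fall.

    A random sample of entries is used instead of sorting the whole matrix
    (assumption: uniform sampling without replacement).
    """
    rng = np.random.default_rng(seed)
    flat = R.ravel()
    n = min(sample_size, flat.size)
    sample = rng.choice(flat, size=n, replace=False)
    return np.quantile(sample, rho)

def prune_lowest(R, rho, **kwargs):
    """Zero out (approximately) the lowest `rho` fraction of entries in R."""
    if rho >= 1.0:
        # Degenerate case from the note: pruning 100 % leaves all zeros.
        return np.zeros_like(R)
    threshold = estimate_prune_threshold(R, rho, **kwargs)
    return np.where(R < threshold, 0.0, R)

# Toy similarity matrices standing in for R and M (random values, not real data).
rng = np.random.default_rng(1)
R = rng.random((50, 50))
M = rng.random((50, 50))

R_pruned = prune_lowest(R, rho=0.5, sample_size=500)
print("fraction pruned:", np.mean(R_pruned == 0))

# Degenerate settings mentioned in the note.
print("rho = 1 gives all zeros:", np.all(prune_lowest(R, 1.0) == 0))
print("lambda = 0 gives all ones:", np.all(np.power(M, 0.0) == 1.0))
```

Because the threshold is estimated from a sample, the fraction actually pruned is only approximately \(\rho\); a larger sample tightens the estimate at the cost of more work.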
Cite this article
Hussain, S.F., Bashir, S. Co-clustering of multi-view datasets. Knowl Inf Syst 47, 545–570 (2016). https://doi.org/10.1007/s10115-015-0861-4