Multi-view document clustering via ensemble method

Hussain, Syed Fawad; Mushtaq, Muhammad; Halim, Zahid

doi:10.1007/s10844-014-0307-6

Published: 12 February 2014

Volume 43, pages 81–99, (2014)

Journal of Intelligent Information Systems

Abstract

Multi-view clustering has become an important extension of ensemble clustering. In multi-view clustering, clustering algorithms are applied to different views of the data to obtain different cluster labels for the same set of objects. These results are then combined so that the final clustering gives a better result than clustering each view individually. Multi-view clustering can be applied at various stages of the clustering paradigm. This paper proposes a novel multi-view clustering algorithm that combines different ensemble techniques. Our approach computes different similarity matrices on the individual datasets and aggregates them into a combined similarity matrix, which is then used to obtain the final clustering. We tested our approach on several datasets and compared it with other state-of-the-art algorithms. Our results show that the proposed algorithm outperforms several other methods in terms of accuracy while maintaining the overall complexity of the individual approaches.
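
The pipeline outlined in the abstract (one similarity matrix per view, aggregation into a combined matrix, clustering on the result) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state which similarity measures, aggregation rule, or final clustering method the paper uses. The sketch assumes cosine similarity for every view, an unweighted element-wise mean as the aggregation, and average-linkage agglomerative clustering as the final step. All function names and the toy data are hypothetical.

```python
import math

def cosine_similarity_matrix(rows):
    """Pairwise cosine similarity between the row vectors of one view."""
    norms = [math.sqrt(sum(x * x for x in r)) for r in rows]
    n = len(rows)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(rows[i], rows[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim

def combine_similarities(matrices):
    """Element-wise mean of the per-view matrices (assumed aggregation rule)."""
    n, m = len(matrices[0]), len(matrices)
    return [[sum(mat[i][j] for mat in matrices) / m for j in range(n)]
            for i in range(n)]

def agglomerative_clusters(sim, k):
    """Average-linkage agglomerative clustering on a similarity matrix.

    Assumed final clustering step; returns one integer label per object.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best, pair = -math.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean similarity over all cross-cluster pairs.
                link = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if link > best:
                    best, pair = link, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(sim)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical toy data: four documents described by two views.
view_terms = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 2], [0, 0, 2, 1]]
view_links = [[1, 0], [1, 0], [0, 1], [0, 1]]

combined = combine_similarities([
    cosine_similarity_matrix(view_terms),
    cosine_similarity_matrix(view_links),
])
labels = agglomerative_clusters(combined, k=2)
```

On this toy input both views agree, so documents 0 and 1 end up in one cluster and documents 2 and 3 in the other. With real data the views typically disagree on some pairs, and the choice of aggregation rule (mean, weighted mean, or something else) then determines how those disagreements are resolved.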

Author information

Authors and Affiliations

Faculty of Computer Science and Engineering, GIK Institute of Engineering Sciences and Technology, 23460, Topi, Pakistan
Syed Fawad Hussain, Muhammad Mushtaq & Zahid Halim

Corresponding author

Correspondence to Syed Fawad Hussain.

About this article

Cite this article

Hussain, S.F., Mushtaq, M. & Halim, Z. Multi-view document clustering via ensemble method. J Intell Inf Syst 43, 81–99 (2014). https://doi.org/10.1007/s10844-014-0307-6

Received: 09 September 2013
Revised: 16 December 2013
Accepted: 14 January 2014
Published: 12 February 2014
Issue Date: August 2014
DOI: https://doi.org/10.1007/s10844-014-0307-6
