Journal of Intelligent Information Systems, Volume 43, Issue 1, pp 81–99

Multi-view document clustering via ensemble method

  • Syed Fawad Hussain
  • Muhammad Mushtaq
  • Zahid Halim

Abstract

Multi-view clustering has become an important extension of ensemble clustering. In multi-view clustering, clustering algorithms are applied to different views of the data to obtain different cluster labels for the same set of objects. These results are then combined so that the final clustering gives a better result than clustering each view individually. Multi-view clustering can be applied at various stages of the clustering paradigm. This paper proposes a novel multi-view clustering algorithm that combines different ensemble techniques. Our approach computes different similarity matrices on the individual datasets and aggregates them into a combined similarity matrix, which is then used to obtain the final clustering. We tested our approach on several datasets and compared it with other state-of-the-art algorithms. Our results show that the proposed algorithm outperforms several other methods in terms of accuracy while maintaining the overall complexity of the individual approaches.
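
The general pipeline outlined in the abstract (per-view similarity matrices, aggregation into one combined matrix, final clustering on that matrix) can be illustrated with a minimal sketch. The abstract does not say which similarity measure, aggregation rule, or final clustering algorithm the paper uses, so cosine similarity, an unweighted mean, and average-link agglomerative clustering are assumptions made here for illustration only. All function names and the toy data are hypothetical.

```python
import math


def cosine_similarity_matrix(rows):
    """Pairwise cosine similarity between the rows of one view (assumed measure)."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in rows]
    n = len(rows)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(rows[i], rows[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def combine_similarities(matrices):
    """Element-wise mean of the per-view matrices (assumed aggregation rule)."""
    n = len(matrices[0])
    m = len(matrices)
    return [
        [sum(mat[i][j] for mat in matrices) / m for j in range(n)]
        for i in range(n)
    ]


def average_link_clustering(sim, k):
    """Merge the two most similar clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                score = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(sim)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def multi_view_cluster(views, k):
    """Similarity per view, then aggregation, then clustering on the combined matrix."""
    per_view = [cosine_similarity_matrix(view) for view in views]
    combined = combine_similarities(per_view)
    return average_link_clustering(combined, k)


# Toy data: four documents described by two hypothetical views.
view_terms = [[1, 1, 0, 0], [1, 0.8, 0, 0.1], [0, 0, 1, 1], [0.1, 0, 0.9, 1]]
view_links = [[1, 0], [0.9, 0.1], [0, 1], [0.2, 1]]

labels = multi_view_cluster([view_terms, view_links], k=2)
print(labels)
```

On this toy input the first two documents end up in one cluster and the last two in the other. Because the clustering step only sees the combined matrix, any algorithm that accepts a precomputed similarity matrix could be substituted for it.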

Keywords

Multi-view clustering · Ensemble clustering · Affinity matrix · Similarity matrices

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Syed Fawad Hussain (1)
  • Muhammad Mushtaq (1)
  • Zahid Halim (1)
  1. Faculty of Computer Science and Engineering, GIK Institute of Engineering Sciences and Technology, Topi, Pakistan
