MLG: Enhancing Multi-label Classification with Modularity-Based Label Grouping
Multi-label classification on data sets with a large number of labels is a practically relevant yet intractable problem. This paper presents a method for optimizing the multi-label classification process for data with a high number of labels. The proposed method starts by grouping labels: community detection is run on a label interconnectedness graph whose edges are based on the support size of every pair of labels. The grouping relies on modularity-oriented community detection methods. Next, the data instances are classified separately for each label community, and the resulting labellings are then merged. Both theoretical analysis and experimental results are provided. Experiments comparing common classification methods with the proposed Modularity-based Label Grouping (MLG) with embedded Binary Relevance, executed on diverse data sets, show a performance increase of 27-41% compared to standard Binary Relevance, of 72-81% compared to RAkEL, and of several dozen compared to ECOC-BR-BCH, with no or negligible difference in classification quality.
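The label grouping step described above can be illustrated with a minimal sketch. It assumes that the edge weight between two labels is the number of instances in which both appear (their pairwise support), and it uses a naive greedy merge that keeps joining the pair of groups giving the highest modularity. The function names (`cooccurrence_graph`, `modularity`, `greedy_label_groups`) and the toy data are hypothetical, and the greedy procedure is a stand-in for whichever modularity-oriented community detection method the paper actually employs. The per-group classification and merging steps are not shown.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(label_sets):
    """Edge weight = number of instances in which both labels occur (pair support)."""
    weights = Counter()
    for labels in label_sets:
        for a, b in combinations(sorted(set(labels)), 2):
            weights[(a, b)] += 1
    return weights


def modularity(weights, communities):
    """Weighted modularity: sum over groups of inside/m - (degree/2m)^2."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    member = {label: i for i, group in enumerate(communities) for label in group}
    inside, degree = Counter(), Counter()
    for (a, b), w in weights.items():
        degree[member[a]] += w
        degree[member[b]] += w
        if member[a] == member[b]:
            inside[member[a]] += w
    return sum(
        inside[c] / total - (degree[c] / (2 * total)) ** 2
        for c in range(len(communities))
    )


def greedy_label_groups(label_sets):
    """Assumed stand-in for community detection: merge groups while modularity rises."""
    weights = cooccurrence_graph(label_sets)
    labels = sorted({label for labels in label_sets for label in labels})
    communities = [frozenset([label]) for label in labels]
    best = modularity(weights, communities)
    while len(communities) > 1:
        candidates = []
        for i, j in combinations(range(len(communities)), 2):
            merged = [c for k, c in enumerate(communities) if k not in (i, j)]
            merged.append(communities[i] | communities[j])
            candidates.append((modularity(weights, merged), merged))
        score, merged = max(candidates, key=lambda item: item[0])
        if score <= best:  # no merge improves modularity: stop
            break
        best, communities = score, merged
    return communities, best


# Toy label assignments (hypothetical): two tightly co-occurring label sets
# joined by a single weak link between "c" and "x".
toy_label_sets = [
    ["a", "b", "c"], ["a", "b", "c"], ["a", "b"],
    ["x", "y", "z"], ["x", "y", "z"], ["y", "z"],
    ["c", "x"],
]
groups, score = greedy_label_groups(toy_label_sets)
print(sorted(sorted(group) for group in groups), round(score, 4))
```

Each resulting group would then get its own classifier (for example Binary Relevance restricted to that group's labels), and the per-group predictions would be combined into one label set per instance. The exhaustive pairwise search here is only practical for small label counts.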
Keywords: multi-label classification, modularity, label grouping, community detection, label co-occurrence, label interconnectedness