Active learning is an iterative supervised learning task in which the learning algorithm can actively query an oracle, i.e. a human annotator who understands the nature of the problem, to obtain the ground truth. The motivation behind this approach is to let the learner interactively choose the data it will learn from, which can lead to significantly lower annotation cost, faster training and improved performance. Active learning is appropriate for machine learning applications where labeled data is costly to obtain but unlabeled data is abundant. Most importantly, it permits a learning model to evolve and adapt to new data, unlike conventional supervised learning. Although active learning has been widely considered for single-label learning, applications to multi-label learning have been more limited. In this work, we present a general framework for applying active learning to multi-label data, discussing the key issues that need to be considered in pool-based multi-label active learning and how existing solutions in the literature deal with each of these issues. We further propose a novel aggregation method for evaluating which instances are to be annotated. Extensive experiments on 13 multi-label data sets with different characteristics and under two different application settings (transductive, inductive) show a consistent advantage of our proposed approach over the other approaches and, most importantly, over passive supervised learning. They also reveal interesting aspects related mainly to the properties of the data sets and secondarily to the application settings.
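As a rough illustration of the pool-based setting described above, the following minimal sketch scores each unlabeled instance by aggregating per-label uncertainties and sends the highest-scoring instances to the oracle, which then labels all of their labels (instance-wise annotation). Every name here (`label_uncertainty`, `aggregate`, `select_queries`, `active_learning_loop`, and the `fit`, `predict_proba` and `oracle` callables) is hypothetical, and the `avg` and `max` aggregations are generic stand-ins chosen for illustration. They are not the aggregation method proposed in this work.

```python
def label_uncertainty(p):
    """Uncertainty of one binary label: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def aggregate(scores, how="avg"):
    """Combine per-label scores into one instance score (illustrative choices only)."""
    if how == "avg":
        return sum(scores) / len(scores)
    if how == "max":
        return max(scores)
    raise ValueError(f"unknown aggregation: {how}")


def select_queries(pool_probs, batch_size=1, how="avg"):
    """Return the ids of the batch_size most uncertain instances.

    pool_probs maps an instance id to its list of per-label probabilities.
    """
    def score(instance_id):
        return aggregate([label_uncertainty(p) for p in pool_probs[instance_id]], how)

    ranked = sorted(pool_probs, key=score, reverse=True)
    return ranked[:batch_size]


def active_learning_loop(pool, oracle, fit, predict_proba, rounds, batch_size=1, how="avg"):
    """Pool-based loop: predict on the pool, query the oracle, retrain.

    oracle(x) returns the full label set of instance x; fit(labeled) trains a
    model from a dict of labeled instances; predict_proba(model, x) returns
    per-label probabilities. All three are assumed, caller-supplied callables.
    """
    pool = list(pool)
    labeled = {}
    model = fit(labeled)
    for _ in range(rounds):
        if not pool:
            break
        probs = {x: predict_proba(model, x) for x in pool}
        for x in select_queries(probs, batch_size, how):
            labeled[x] = oracle(x)  # instance-wise annotation of the chosen instance
            pool.remove(x)
        model = fit(labeled)
    return model, labeled
```

In practice `fit` and `predict_proba` would wrap an actual multi-label learner; the sketch leaves them abstract so that only the query-selection logic is shown.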
Instance-wise and label-wise annotation have been called global and local labeling respectively in Esuli and Sebastiani (2009).
AULC values for the Ranking-Loss measure were multiplied by 10 to consider the third decimal place in the comparison.
Aggarwal CC, Kong X, Gu Q, Han J, Yu PS (2014) Active learning: a survey. In: Aggarwal CC (ed) Data classification: algorithms and applications. CRC Press, Boca Raton, pp 571–606
Brinker K (2006) On active learning in multi-label classification. In: Spiliopoulou M, Kruse R, Borgelt C, Nürnberger A, Gaul W (eds) From data and information analysis to knowledge engineering, studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 206–213
Cherman EA, Tsoumakas G, Monard MC (2016) Active learning algorithms for multi-label data. In: Proceedings of the 12th IFIP international conference on artificial intelligence applications and innovations (AIAI 2016), Thessaloniki, pp 1–12
Demšar J (2006) Statistical comparison of classifiers over multiple data sets. J Mach Learn Res 7(1):1–30
Esuli A, Sebastiani F (2009) Active learning strategies for multi-label text classification. In: Proceedings of the 31st European conference on IR research, ECIR ’09. Springer, Berlin, pp 102–113
Gao N, Huang SJ, Chen S (2016) Multi-label active learning by model guided distribution matching. Front Comput Sci 10(5):845–855
Huang S, Chen S, Zhou Z (2015) Multi-label active learning: query type matters. In: Proceedings of the twenty-fourth international joint conference on artificial intelligence, IJCAI 2015, pp 946–952
Hung CW, Lin HT (2011) Multi-label active learning with auxiliary learner. In: Asian conference on machine learning, pp 315–332
McCallum AK, Nigam K (1998) Employing EM and pool-based active learning for text classification. In: Proceedings of the international conference on machine learning (ICML), Citeseer, pp 359–367
Nowak S, Nagel K, Liebetrau J (2011) The CLEF 2011 photo annotation and concept-based retrieval tasks. In: CLEF (notebook papers/labs/workshop), Amsterdam, Netherlands, pp 1–25
Rossi RG, de Andrade Lopes A, Rezende SO (2013) A parameter-free label propagation algorithm using bipartite heterogeneous networks for text classification. In: Proceedings of symposium on applied computing (ACM SAC’2014), New York, NY
Settles B (2010) Active learning literature survey. Tech. Rep. 1648. University of Wisconsin–Madison, Madison
Settles B, Craven M (2008) An analysis of active learning strategies for sequence labeling tasks. In: Proceedings of the conference on empirical methods in natural language processing, Association for Computational Linguistics, pp 1070–1079
Singh M, Brew A, Greene D, Cunningham P (2010) Score normalization and aggregation for active learning in multi-label classification. Tech. rep. University College Dublin, Dublin
Tong S, Koller D (2001) Support vector machine active learning with applications to text classification. J Mach Learn Res 2:45–66
Tsoumakas G, Katakis I, Vlahavas I (2009) Mining multi-label data. Data mining and knowledge discovery handbook, Springer, pp 1–19
Tsoumakas G, Spyromitros-Xioufis E, Vilcek J, Vlahavas I (2011) Mulan: a java library for multi-label learning. J Mach Learn Res 12:2411–2414
Tsoumakas G, Zhang ML, Zhou ZH (2012) Introduction to the special issue on learning from multi-label data. Mach Learn 88(1–2):1–4
Yang B, Sun JT, Wang T, Chen Z (2009) Effective multi-label active learning for text classification. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’09, ACM, New York, pp 917–926. doi:10.1145/1557019.1557119
Yang Y (2001) A study of thresholding strategies for text categorization. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval, ACM, New York, NY, pp 137–145
Ye C, Wu J, Sheng VS, Zhao S, Zhao P, Cui Z (2015) Multi-label active learning with chi-square statistics for image classification. In: Proceedings of the 5th ACM on international conference on multimedia retrieval—ICMR’15, Association for Computing Machinery (ACM), New York, NY, pp 583–586
Zhang B, Wang Y, Chen F (2014) Multilabel image classification via high-order label correlation driven active learning. IEEE Trans Image Process 23(3):1430–1441
Zliobaite I, Bifet A, Pfahringer B, Holmes G (2011) Active learning with evolving streaming data. In: Joint European conference on machine learning and knowledge discovery in databases, Springer, Berlin, pp 597–612
We would like to thank the anonymous reviewers for their constructive comments, which helped improve our paper. E.A. Cherman and M.C. Monard were supported by the São Paulo Research Foundation (FAPESP), Grants 2010/15992-0 and 2011/21723-5, and by the Brazilian National Council for Scientific and Technological Development (CNPq), Grant 644963.
About this article
Cite this article
Cherman, E.A., Papanikolaou, Y., Tsoumakas, G. et al. Multi-label active learning: key issues and a novel query strategy. Evolving Systems 10, 63–78 (2019). https://doi.org/10.1007/s12530-017-9202-z
- Supervised learning
- Multi-label learning
- Active learning
- Pool-based strategies
- Knowledge discovery