Multi-label active learning: key issues and a novel query strategy

Cherman, Everton Alvares; Papanikolaou, Yannis; Tsoumakas, Grigorios; Monard, Maria Carolina

doi:10.1007/s12530-017-9202-z

Original Paper
Published: 30 August 2017

Volume 10, pages 63–78 (2019)

Evolving Systems

Everton Alvares Cherman¹,
Yannis Papanikolaou ORCID: orcid.org/0000-0003-3498-3255²,
Grigorios Tsoumakas² &
…
Maria Carolina Monard¹


Abstract

Active learning is an iterative supervised learning task in which the learning algorithm actively queries an oracle, i.e. a human annotator who understands the nature of the problem, to obtain the ground truth. The motivation is to let the learner interactively choose the data it learns from, which can lead to significantly lower annotation cost, faster training and improved performance. Active learning suits machine learning applications where labeled data is costly to obtain but unlabeled data is abundant. Most importantly, unlike conventional supervised learning, it permits a learning model to evolve and adapt to new data. Although active learning has been widely studied for single-label learning, applications to multi-label learning have been more limited. In this work, we present a general framework for applying active learning to multi-label data, discussing the key issues that need to be considered in pool-based multi-label active learning and how existing solutions in the literature deal with each of them. We further propose a novel aggregation method for evaluating which instances are to be annotated. Extensive experiments on 13 multi-label data sets with different characteristics, under two application settings (transductive and inductive), show a consistent advantage of the proposed approach over the other approaches and, most importantly, over passive supervised learning. They also reveal interesting aspects related mainly to the properties of the data sets and secondarily to the application settings.
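
As a rough illustration of the pool-based setting described in the abstract, the Python sketch below scores each unlabeled instance by aggregating its per-label uncertainties and selects the highest-scoring instances for annotation. It is not the aggregation method proposed in the paper. The function names (`label_uncertainty`, `select_queries`), the uncertainty formula and the toy probabilities are all assumptions made for illustration.

```python
from statistics import mean
from typing import Callable, List, Sequence


def label_uncertainty(p: float) -> float:
    """Uncertainty of one binary label prediction (assumed formula).

    Equals 1.0 when p == 0.5 and 0.0 when p is exactly 0 or 1.
    """
    return 1.0 - abs(2.0 * p - 1.0)


def select_queries(
    pool_probs: Sequence[Sequence[float]],
    batch_size: int,
    aggregate: Callable[[List[float]], float] = mean,
) -> List[int]:
    """Return indices of the pool instances to send to the oracle.

    pool_probs[i][j] is the predicted probability that label j is
    relevant to unlabeled instance i. Per-label uncertainties are
    combined into one score per instance by `aggregate` (e.g. mean or
    max), and the `batch_size` highest-scoring instances are chosen.
    """
    scored = []
    for idx, probs in enumerate(pool_probs):
        per_label = [label_uncertainty(p) for p in probs]
        scored.append((aggregate(per_label), idx))
    # Highest score first; ties are broken by the lower index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored[:batch_size]]


# Hypothetical predictions for three unlabeled instances and three labels.
pool_probs = [
    [0.90, 0.10, 0.95],  # confident on every label
    [0.50, 0.90, 0.10],  # maximally uncertain on one label only
    [0.60, 0.40, 0.55],  # moderately uncertain on every label
]

by_mean = select_queries(pool_probs, 1, mean)
by_max = select_queries(pool_probs, 1, max)
```

With these toy values, averaging favors the instance that is uncertain across all labels, while taking the maximum favors the instance with a single highly uncertain label. This shows why the choice of aggregation function matters when whole instances, rather than individual instance-label pairs, are sent for annotation.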

Notes

Instance-wise and label-wise annotation have been called global and local labeling respectively in Esuli and Sebastiani (2009).
http://mulan.sourceforge.net.
http://www.cs.waikato.ac.nz/ml/weka.
http://mulan.sourceforge.net/datasets.html.
http://meka.sourceforge.net/.
http://www.csie.ntu.edu.tw/~cjlin/liblinear/.
https://www.dropbox.com/s/cxyf27wzp9xzlxr/appendix.pdf?dl=0.
https://en.wikipedia.org/wiki/Transduction_(machine_learning).
https://www.dropbox.com/s/cxyf27wzp9xzlxr/appendix.pdf?dl=0.
AULC values for the Ranking-Loss measure were multiplied by 10 to consider the third decimal place in the comparison.

References

Aggarwal CC, Kong X, Gu Q, Han J, Yu PS (2014) Active learning: a survey. In: Aggarwal CC (ed) Data classification: algorithms and applications. CRC Press, Boca Raton, pp 571–606
Brinker K (2006) On active learning in multi-label classification. In: Spiliopoulou M, Kruse R, Borgelt C, Nurnberger A, Gaul W (eds) From data and information analysis to knowledge engineering, studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 206–213
Cherman EA, Tsoumakas G, Monard MC (2016) Active learning algorithms for multi-label data. In: Proceedings of the 12th IFIP international conference on artificial intelligence applications and innovations (AIAI 2016), Thessaloniki, pp 1–12
Demšar J (2006) Statistical comparison of classifiers over multiple data sets. J Mach Learn Res 7(1):1–30
Esuli A, Sebastiani F (2009) Active learning strategies for multi-label text classification. In: Proceedings of the 31st European conference on IR research, ECIR ’09. Springer, Berlin, pp 102–113
Gao N, Huang SJ, Chen S (2016) Multi-label active learning by model guided distribution matching. Front Comput Sci 10(5):845–855
Huang S, Chen S, Zhou Z (2015) Multi-label active learning: query type matters. In: Proceedings of the twenty-fourth international joint conference on artificial intelligence, IJCAI 2015, pp 946–952
Hung CW, Lin HT (2011) Multi-label active learning with auxiliary learner. In: Asian conference on machine learning, pp 315–332
McCallum AK, Nigam K (1998) Employing EM and pool-based active learning for text classification. In: Proceedings of the international conference on machine learning (ICML), Citeseer, pp 359–367
Nowak S, Nagel K, Liebetrau J (2011) The CLEF 2011 photo annotation and concept-based retrieval tasks. In: CLEF (notebook papers/labs/workshop), Amsterdam, Netherlands, pp 1–25
Rossi RG, de Andrade Lopes A, Rezende SO (2013) A parameter-free label propagation algorithm using bipartite heterogeneous networks for text classification. In: Proceedings of symposium on applied computing (ACM SAC’2014), New York, NY
Settles B (2010) Active learning literature survey. Tech. Rep. 1648. University of Wisconsin–Madison, Madison
Settles B, Craven M (2008) An analysis of active learning strategies for sequence labeling tasks. In: Proceedings of the conference on empirical methods in natural language processing, Association for Computational Linguistics, pp 1070–1079
Singh M, Brew A, Greene D, Cunningham P (2010) Score normalization and aggregation for active learning in multi-label classification. Tech. rep. University College Dublin, Dublin
Tong S, Koller D (2001) Support vector machine active learning with applications to text classification. J Mach Learn Res 2:45–66
Tsoumakas G, Katakis I, Vlahavas I (2009) Mining multi-label data. Data mining and knowledge discovery handbook, Springer, pp 1–19
Tsoumakas G, Spyromitros-Xioufis E, Vilcek J, Vlahavas I (2011) Mulan: a java library for multi-label learning. J Mach Learn Res 12:2411–2414
Tsoumakas G, Zhang ML, Zhou ZH (2012) Introduction to the special issue on learning from multi-label data. Mach Learn 88(1–2):1–4
Yang B, Sun JT, Wang T, Chen Z (2009) Effective multi-label active learning for text classification. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’09, ACM, New York, pp 917–926. doi:10.1145/1557019.1557119
Yang Y (2001) A study of thresholding strategies for text categorization. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval, ACM, New York, NY, pp 137–145
Ye C, Wu J, Sheng VS, Zhao S, Zhao P, Cui Z (2015) Multi-label active learning with chi-square statistics for image classification. In: Proceedings of the 5th ACM on international conference on multimedia retrieval—ICMR’15, Association for Computing Machinery (ACM), New York, NY, pp 583–586
Zhang B, Wang Y, Chen F (2014) Multilabel image classification via high-order label correlation driven active learning. IEEE Trans Image Process 23(3):1430–1441
Zliobaite I, Bifet A, Pfahringer B, Holmes G (2011) Active learning with evolving streaming data. In: Joint European conference on machine learning and knowledge discovery in databases, Springer, Berlin, pp 597–612

Acknowledgements

We would like to thank the anonymous reviewers for their constructive comments that helped in improving our paper. E.A. Cherman and M.C. Monard were supported by the São Paulo Research Foundation (FAPESP), Grants 2010/15992-0 and 2011/21723-5, and Brazilian National Council for Scientific and Technological Development (CNPq), Grant 644963.

Author information

Authors and Affiliations

Institute of Mathematics and Computer Sciences, University of Sao Paulo, Sao Carlos, SP, Brazil
Everton Alvares Cherman & Maria Carolina Monard
Department of Informatics, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece
Yannis Papanikolaou & Grigorios Tsoumakas

Corresponding author

Correspondence to Yannis Papanikolaou.

About this article

Cite this article

Cherman, E.A., Papanikolaou, Y., Tsoumakas, G. et al. Multi-label active learning: key issues and a novel query strategy. Evolving Systems 10, 63–78 (2019). https://doi.org/10.1007/s12530-017-9202-z

Received: 24 January 2017
Accepted: 19 August 2017
Published: 30 August 2017
Issue Date: 01 March 2019
DOI: https://doi.org/10.1007/s12530-017-9202-z
