Active Learning Strategies for Multi-Label Text Classification

  • Andrea Esuli
  • Fabrizio Sebastiani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5478)

Abstract

Active learning refers to the task of devising a ranking function that, given a classifier trained from relatively few training examples, ranks a set of additional unlabeled examples in terms of how much further information they would carry, once manually labeled, for retraining a (hopefully) better classifier. Research on active learning in text classification has so far concentrated on single-label classification; active learning for multi-label classification, instead, has either been tackled in a simulated (and, we contend, non-realistic) way, or neglected tout court. In this paper we aim to fill this gap by examining a number of realistic strategies for tackling active learning for multi-label classification. Each such strategy consists of a rule for combining the outputs returned by the individual binary classifiers as a result of classifying a given unlabeled document. We present the results of extensive experiments in which we test these strategies on two standard text classification datasets.
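As a rough illustration of what a rule for combining the outputs of the individual binary classifiers can look like, the sketch below ranks unlabeled documents by one number derived from their per-label scores. Everything in it is an assumption made for illustration, since the abstract does not name the strategies tested in the paper: the function name `rank_unlabeled`, the use of signed confidence scores where a value near zero means the binary classifier is unsure, and the three combination rules (`max`, `min`, `avg`).

```python
from typing import List, Sequence


def rank_unlabeled(scores: Sequence[Sequence[float]], combine: str = "avg") -> List[int]:
    """Return document indices ordered from most to least informative.

    scores[i][j] is the signed confidence of the binary classifier for
    label j on document i (hypothetical convention: near 0 = uncertain).
    """

    def informativeness(row: Sequence[float]) -> float:
        # Per-label uncertainty: the closer the score is to 0, the higher.
        uncertainty = [-abs(s) for s in row]
        if combine == "max":   # driven by the single most uncertain label
            return max(uncertainty)
        if combine == "min":   # driven by the single most confident label
            return min(uncertainty)
        if combine == "avg":   # all labels contribute equally
            return sum(uncertainty) / len(uncertainty)
        raise ValueError(f"unknown combination rule: {combine}")

    return sorted(range(len(scores)),
                  key=lambda i: informativeness(scores[i]),
                  reverse=True)


# Three unlabeled documents scored by three binary classifiers.
example_scores = [
    [2.0, -1.5, 3.0],
    [0.1, -2.0, 2.5],
    [0.5, 0.4, -0.6],
]
ranking_by_max = rank_unlabeled(example_scores, "max")
ranking_by_avg = rank_unlabeled(example_scores, "avg")
```

The documents at the head of the returned ranking would be the ones handed to the human annotator first. Different rules can disagree: under `max` a document with one borderline label is promoted, while under `avg` a document that is moderately uncertain on every label wins.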

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Andrea Esuli (1)
  • Fabrizio Sebastiani (1)
  1. Istituto di Scienza e Tecnologia dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy