Active Learning with Bagging for NLP Tasks

  • Ruy Luiz Milidiú
  • Daniel Schwabe
  • Eduardo Motta
Part of the Advances in Intelligent Systems and Computing book series (volume 167)

Abstract

Supervised classifiers are limited by the annotated corpora available. Active learning is a way to circumvent this bottleneck by reducing the number of annotated examples required. In this paper, we analyze the benefits of active learning combined with bagging, applied to the Quotation Start, Noun Phrase Chunking, and Text Chunking tasks. We employ query-by-committee as the query strategy to actively select the examples to be annotated. With these techniques, we reduce the annotation effort by up to 62.50%, depending on the task, while obtaining the same quality as passive supervised learning.
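
As a rough illustration of the idea summarized above (not the authors' implementation), the following Python sketch shows one way query-by-committee with bagging can be organized: each committee member is trained on a bootstrap resample of the labeled data, and the unlabeled example on which the members disagree most is sent for annotation. All names here (train_centroid, bagged_committee, vote_entropy, select_query, active_learning) are hypothetical. The base learner is a toy nearest-centroid classifier over one-dimensional numbers, standing in for a real NLP tagger, and vote entropy is assumed as the disagreement measure, since the abstract does not state which one is used.

```python
import math
import random
from collections import Counter


def train_centroid(sample):
    """Toy base learner (assumption): one centroid per label, nearest wins."""
    sums, counts = {}, Counter()
    for x, y in sample:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] += 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))


def bagged_committee(labeled, n_members, rng):
    """Bagging: train each member on a bootstrap resample of the labeled set."""
    return [
        train_centroid([rng.choice(labeled) for _ in labeled])
        for _ in range(n_members)
    ]


def vote_entropy(committee, x):
    """Disagreement on x: entropy of the committee's label votes."""
    votes = Counter(member(x) for member in committee)
    total = len(committee)
    return -sum((c / total) * math.log(c / total) for c in votes.values())


def select_query(committee, pool):
    """Pick the unlabeled example with the highest committee disagreement."""
    return max(pool, key=lambda x: vote_entropy(committee, x))


def active_learning(labeled, pool, oracle, n_queries, n_members=5, seed=0):
    """Repeatedly query the oracle (the annotator) for the most disputed example."""
    rng = random.Random(seed)
    labeled, pool = list(labeled), list(pool)
    for _ in range(min(n_queries, len(pool))):
        committee = bagged_committee(labeled, n_members, rng)
        query = select_query(committee, pool)
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return labeled, pool


if __name__ == "__main__":
    seed_set = [(0.0, "A"), (1.0, "A"), (9.0, "B"), (10.0, "B")]
    unlabeled = [0.5, 4.0, 6.0, 9.5]
    annotator = lambda x: "A" if x < 5 else "B"  # stands in for a human
    print(active_learning(seed_set, unlabeled, annotator, n_queries=2))
```

In a passive setting the annotator would label examples in arbitrary order. Here the committee's disagreement decides the order, which is the mechanism through which annotation effort can be reduced.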

Keywords

Active Learning, Noun Phrase, Word Sense Disambiguation, Annotate Corpus, Human Language Technology
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  • Ruy Luiz Milidiú (1)
  • Daniel Schwabe (1)
  • Eduardo Motta (1)
  1. Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, Brazil
