
Multi-class Ensemble-Based Active Learning

  • Christine Körner
  • Stefan Wrobel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)

Abstract

Ensemble-based active learning has been shown to reduce the number of training instances efficiently, and thus the cost of data acquisition. To determine the utility of a candidate training instance, the disagreement about its class value among the ensemble members is used. While the disagreement for binary classification is easily determined using margins, the adaptation to multi-class problems is not straightforward and has been little studied in the literature. In this paper we consider four approaches to measuring ensemble disagreement, including margins, uncertainty sampling and entropy, and evaluate them empirically on various ensemble strategies for active learning. We show that margins outperform the other disagreement measures on three of four active learning strategies. Our experiments also show that some active learning strategies are more sensitive to the choice of disagreement measure than others.
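
The abstract names margins, uncertainty sampling and entropy as disagreement measures but does not give their formulas. The sketch below illustrates one common way to compute such measures from the class votes of an ensemble; it is an assumption for illustration, not the authors' definitions. All function names (`vote_distribution`, `margin`, `least_confidence`, `vote_entropy`, `select_query`) are hypothetical, and the least-confidence score stands in for uncertainty sampling.

```python
import math
from collections import Counter


def vote_distribution(votes, classes):
    """Fraction of ensemble members voting for each class (assumed hard votes)."""
    counts = Counter(votes)
    n = len(votes)
    return [counts.get(c, 0) / n for c in classes]


def margin(dist):
    """Difference between the two largest vote shares; small means high disagreement."""
    top = sorted(dist, reverse=True)
    return top[0] - top[1]


def least_confidence(dist):
    """One minus the largest vote share; large means high disagreement."""
    return 1.0 - max(dist)


def vote_entropy(dist):
    """Shannon entropy (natural log) of the vote shares; large means high disagreement."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def select_query(pool_votes, classes):
    """Index of the unlabeled instance with the smallest margin."""
    margins = [margin(vote_distribution(v, classes)) for v in pool_votes]
    return min(range(len(margins)), key=margins.__getitem__)


if __name__ == "__main__":
    classes = ["a", "b", "c"]
    # Each inner list holds the votes of four ensemble members for one instance.
    pool = [["a", "a", "a", "a"], ["a", "a", "b", "b"], ["a", "a", "a", "b"]]
    print(select_query(pool, classes))
```

In the binary case these three scores rank instances identically; with three or more classes they can disagree, which is the situation the paper investigates.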

Keywords

Disagreement Measure; Ensemble Member; Training Instance; Uncertainty Sampling; Active Learning Strategy


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Christine Körner (1)
  • Stefan Wrobel (1, 2)
  1. Fraunhofer Institut Intelligente Analyse- und Informationssysteme, Germany
  2. Dept. of Computer Science III, University of Bonn, Germany
