Efficient Coverage of Case Space with Active Learning

  • Nuno Filipe Escudeiro
  • Alípio Mário Jorge
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5816)

Abstract

Collecting and annotating exemplary cases is a costly but critical task required in the early stages of any classification process. Reducing labeling cost without degrading accuracy calls for a compromise, which active learning may provide. Common active learning approaches focus on accuracy and assume that a pre-labeled set of exemplary cases covering all the classes to be learned is available. This assumption does not necessarily hold. In this paper we study how rapidly a new active learning approach, d-Confidence, covers the case space compared with the traditional active learning confidence criterion when this representativeness assumption is not met. Experimental results show that d-Confidence reduces the number of queries required to achieve complete class coverage and tends to improve or maintain classification error.
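
To make the contrast in the abstract concrete, here is a minimal Python sketch of two query-selection rules. The first, `least_confident`, is the usual confidence criterion: query the case whose most likely class has the lowest posterior. The second, `distance_weighted`, is only an illustrative assumption of how distance to the known classes might be folded into the score; the abstract does not define d-Confidence, so this should not be read as the authors' formula. All function names, parameters and data below are hypothetical.

```python
import math


def least_confident(posteriors):
    """Confidence criterion: index of the case whose top posterior is lowest."""
    return min(range(len(posteriors)), key=lambda i: max(posteriors[i]))


def distance_weighted(posteriors, cases, labeled_by_class):
    """Illustrative variant (an assumption, not the paper's definition).

    Each class posterior is divided by the mean Euclidean distance from the
    case to the labeled members of that class. The case with the lowest best
    ratio is queried, which favors cases far from every known class.
    """
    def score(i):
        best = 0.0
        for k, members in enumerate(labeled_by_class):
            mean_dist = sum(math.dist(cases[i], m) for m in members) / len(members)
            best = max(best, posteriors[i][k] / max(mean_dist, 1e-12))
        return best

    return min(range(len(cases)), key=score)


# Hypothetical toy data: two known classes and three unlabeled cases.
posteriors = [[0.55, 0.45], [0.90, 0.10], [0.95, 0.05]]
cases = [(0.5, 0.5), (0.1, 0.0), (5.0, 5.0)]
labeled_by_class = [[(0.0, 0.0)], [(1.0, 1.0)]]

print(least_confident(posteriors))
print(distance_weighted(posteriors, cases, labeled_by_class))
```

On this toy data the confidence criterion picks the case lying between the two known classes, while the distance-weighted variant picks the remote case at (5, 5), even though the classifier is confident about it. A remote case of this kind is where a still-unseen class might be found when the initial labeled set does not cover all classes.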

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Nuno Filipe Escudeiro (1, 3)
  • Alípio Mário Jorge (2, 3)
  1. Instituto Superior de Engenharia do Porto, Portugal
  2. Faculdade de Ciências, Universidade do Porto, Portugal
  3. LIAAD INESC Porto L.A., Portugal
