A Selective Sampling Strategy for Label Ranking

  • Massih Amini
  • Nicolas Usunier
  • François Laviolette
  • Alexandre Lacasse
  • Patrick Gallinari
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)

Abstract

We propose a novel active learning strategy, based on the sample compression framework of [9], for label ranking functions, which, given an input instance, predict a total order over a predefined set of alternatives. Our approach is theoretically motivated by an extension to ranking and active learning of Kääriäinen's generalization bounds using unlabeled data [7], originally developed in the context of classification. The bounds we obtain suggest a selective sampling strategy, provided that a sufficiently large, yet reasonably sized, initial labeled dataset is available. Experiments on Information Retrieval corpora from automatic text summarization and question answering show that the proposed approach substantially reduces the labeling effort in comparison to random and heuristic-based sampling strategies.
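The paper itself gives no implementation. As a loose, hypothetical sketch of the general pool-based selective sampling idea for label ranking (using a simple score-margin uncertainty criterion rather than the authors' compression-based bound; the model, function names, and data are all invented for illustration):

import numpy as np

rng = np.random.default_rng(0)

class ToyLabelRanker:
    """Hypothetical linear label ranker: one weight column per
    alternative; the predicted total order sorts the scores x @ W."""
    def __init__(self, n_features, n_alternatives):
        self.W = rng.normal(size=(n_features, n_alternatives))

    def scores(self, x):
        return x @ self.W

    def fit(self, X, Y):
        # Least-squares stand-in for the paper's actual ranking learner.
        self.W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def ranking_margin(scores):
    # Smallest gap between consecutive sorted scores: a small gap means
    # the predicted total order flips easily, i.e. the instance is ambiguous.
    return np.min(np.diff(np.sort(scores)))

def selective_sampling(model, X_lab, Y_lab, X_pool, Y_pool, n_queries):
    # Pool-based loop: label the instance whose predicted ranking the
    # current model is least certain about, then refit.
    for _ in range(n_queries):
        i = int(np.argmin([ranking_margin(model.scores(x)) for x in X_pool]))
        X_lab = np.vstack([X_lab, X_pool[i]])   # "query" the oracle
        Y_lab = np.vstack([Y_lab, Y_pool[i]])
        X_pool = np.delete(X_pool, i, axis=0)
        Y_pool = np.delete(Y_pool, i, axis=0)
        model.fit(X_lab, Y_lab)
    return model

# Synthetic demo: 5 features, 4 alternatives to order per instance.
W_true = rng.normal(size=(5, 4))
X = rng.normal(size=(200, 5))
Y = X @ W_true + 0.1 * rng.normal(size=(200, 4))  # latent relevance scores
model = ToyLabelRanker(5, 4)
model.fit(X[:20], Y[:20])                         # initial labeled set
model = selective_sampling(model, X[:20], Y[:20], X[20:], Y[20:], 30)

The sketch keeps the paper's high-level structure (an initial labeled set, then greedy selection from an unlabeled pool) but the uncertainty criterion shown is a generic one, not the bound-driven criterion derived in the paper.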

References

  1. Amini, M., Usunier, N., Gallinari, P.: Automatic text summarization based on word clusters and ranking algorithms. In: Proc. of the 27th ECIR (2005)
  2. Brinker, K.: Active learning of label ranking functions. In: Proc. of the 21st International Conference on Machine Learning (2004)
  3. Brinker, K., Fürnkranz, J., Hüllermeier, E.: Label ranking by learning pairwise preferences. Journal of Machine Learning Research (2005)
  4. Chapelle, O.: Active learning for Parzen window classifier. In: AISTATS (2005)
  5. Crammer, K., Singer, Y.: A family of additive online algorithms for category ranking. Journal of Machine Learning Research 3(6), 1025–1058 (2003)
  6. Floyd, S., Warmuth, M.: Sample compression, learnability, and the Vapnik–Chervonenkis dimension. Machine Learning 21(3), 269–304 (1995)
  7. Kääriäinen, M.: Generalization error bounds using unlabeled data. In: Proceedings of the 18th Annual Conference on Learning Theory, pp. 127–142 (2005)
  8. Laviolette, F., Marchand, M., Shah, M.: Margin-sparsity trade-off for the set covering machine. In: Proc. of the 16th ECML, pp. 206–217 (2005)
  9. Littlestone, N., Warmuth, M.: Relating data compression and learnability. Technical Report, University of California (1986)
  10. Marcu, D.: The automatic construction of large-scale corpora for summarization research. In: Proceedings of the 22nd ACM SIGIR, pp. 137–144 (1999)
  11. Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. Journal of Machine Learning Research 2, 45–66 (2002)
  12. Usunier, N., Amini, M., Gallinari, P.: Boosting weak ranking functions to enhance passage retrieval for question answering. In: IR4QA Workshop, SIGIR (2004)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Massih Amini (1)
  • Nicolas Usunier (1)
  • François Laviolette (2)
  • Alexandre Lacasse (2)
  • Patrick Gallinari (1)
  1. Laboratoire d'Informatique de Paris 6, Université Pierre et Marie Curie, Paris, France
  2. Département IFT-GLO, Université Laval, Sainte-Foy (QC), Canada
