Information Retrieval, Volume 12, Issue 5, pp. 559–580

Preferential text classification: learning algorithms and evaluation measures

  • Fabio Aiolli
  • Riccardo Cardin
  • Fabrizio Sebastiani
  • Alessandro Sperduti

Abstract

In many application contexts in which textual documents are labelled with thematic categories, a distinction is made between the primary categories of a document, which represent the topics central to it, and its secondary categories, which represent topics the document only touches upon. We contend that this distinction, so far neglected in text categorization research, is important and deserves to be tackled explicitly. The contribution of this paper is threefold. First, we propose an evaluation measure for this preferential text categorization task, whereby different kinds of misclassification involving either primary or secondary categories have a different impact on effectiveness. Second, we establish several baseline results for this task on a well-known benchmark for patent classification in which the distinction between primary and secondary categories is present; these results are obtained by reformulating the preferential text categorization task in terms of well-established classification problems, such as single-label and/or multi-label multiclass classification, using state-of-the-art learning technology such as SVMs and kernel-based methods. Third, we improve on these results by using a recently proposed class of algorithms explicitly devised for learning from training data expressed in preferential form, i.e., in the form "for document d_i, category c′ is preferred to category c″"; this allows us to distinguish between primary and secondary categories not only in the classification phase but also in the learning phase, thus differentiating their impact on the classifiers to be generated.
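The preferential training information described in the abstract can be made concrete with a small sketch. The following is a hypothetical illustration, not the authors' actual code: it shows how a document's primary and secondary category labels might induce the pairwise preferences "primary ≻ secondary ≻ non-assigned" that a preference-based learner could train on.

```python
# Illustrative sketch (assumed formulation): derive pairwise category
# preferences for one document from its primary and secondary labels.
# Primary categories are preferred to secondary ones, and both are
# preferred to categories not assigned to the document at all.

def preference_pairs(primary, secondary, all_categories):
    """Return (preferred, dispreferred) category pairs for one document."""
    others = [c for c in all_categories
              if c not in primary and c not in secondary]
    pairs = []
    # Every primary category is preferred to every secondary one ...
    pairs += [(p, s) for p in primary for s in secondary]
    # ... and every assigned category is preferred to every unassigned one.
    pairs += [(p, o) for p in primary for o in others]
    pairs += [(s, o) for s in secondary for o in others]
    return pairs

pairs = preference_pairs(["A"], ["B"], ["A", "B", "C"])
# pairs == [("A", "B"), ("A", "C"), ("B", "C")]
```

Each such pair becomes one preference constraint of the form "for document d_i, category c′ is preferred to category c″", so primary and secondary labels already shape the training signal rather than only the final classification decision.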

Keywords

Preferential learning · Supervised learning · Text categorization · Text classification · Primary and secondary categories

Acknowledgements

We thank Tiziano Fagni for indexing the WIPO-alpha collection and Andrea Esuli for useful discussions on Kendall distance. Thanks also to Lijuan Cai, Shantanu Godbole, Juho Rousu, Sunita Sarawagi, Domonkos Tikk, and S. Vishwanathan for clarifying the details of their experiments. This work has been partially supported by the project “Tecniche di classificazione automatica per brevetti”, funded by the University of Padova.


Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Fabio Aiolli (1)
  • Riccardo Cardin (1)
  • Fabrizio Sebastiani (2)
  • Alessandro Sperduti (1)
  1. Dipartimento di Matematica Pura e Applicata, Università di Padova, Padova, Italy
  2. Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy
