Classification of Short Texts by Deploying Topical Annotations

  • Daniele Vitale
  • Paolo Ferragina
  • Ugo Scaiella
Conference paper

DOI: 10.1007/978-3-642-28997-2_32

Volume 7224 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Vitale D., Ferragina P., Scaiella U. (2012) Classification of Short Texts by Deploying Topical Annotations. In: Baeza-Yates R. et al. (eds) Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg

Abstract

We propose a novel approach to the classification of short texts based on two factors: the use of recently introduced Wikipedia-based annotators, which detect the main topics present in an input text and represent them as Wikipedia pages; and the design of a novel classification algorithm that measures the similarity between the input text and each output category using only their annotated topics and the Wikipedia link structure. Our approach forgoes the common practice of expanding the feature space with new dimensions derived from either explicit or latent semantic analysis. As a consequence, it is simple and maintains a compact, intelligible representation of the output categories. Our experiments show that it is efficient in both construction and query time, as accurate as state-of-the-art classifiers (see e.g. Phan et al., WWW '08), and robust with respect to concept drift and input sources.
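The abstract does not spell out the similarity measure. A minimal sketch of the general idea is shown below. It assumes that each text and each category is represented as a set of Wikipedia page titles weighted by the annotator, and that "link structure" is exploited through the Milne-Witten relatedness over in-link sets. That measure is a common choice for Wikipedia-based annotators and is swapped in here; it is not confirmed as the paper's exact formula. The names `milne_witten`, `category_score`, `classify`, `in_links` and `total_pages`, and the best-match scoring rule, are hypothetical illustrations.

```python
import math
from typing import Dict, Set

# Hypothetical representation: topic = Wikipedia page title, weight = annotator score.
TopicWeights = Dict[str, float]


def milne_witten(in_a: Set[str], in_b: Set[str], total_pages: int) -> float:
    """Milne-Witten relatedness of two Wikipedia pages, computed from their in-link sets.

    Assumes total_pages (number of pages in Wikipedia) exceeds either in-link set size.
    """
    common = len(in_a & in_b)
    if common == 0:
        return 0.0  # no shared in-links: treat the pages as unrelated
    big, small = max(len(in_a), len(in_b)), min(len(in_a), len(in_b))
    distance = (math.log(big) - math.log(common)) / (
        math.log(total_pages) - math.log(small)
    )
    return max(0.0, 1.0 - distance)


def category_score(
    text_topics: TopicWeights,
    category_topics: TopicWeights,
    in_links: Dict[str, Set[str]],
    total_pages: int,
) -> float:
    """Hypothetical scoring rule: each text topic contributes its weight times its
    best weighted relatedness to any topic of the category."""
    score = 0.0
    for t, w_t in text_topics.items():
        best = max(
            (
                w_c * milne_witten(in_links[t], in_links[c], total_pages)
                for c, w_c in category_topics.items()
            ),
            default=0.0,  # an empty category contributes nothing
        )
        score += w_t * best
    return score


def classify(
    text_topics: TopicWeights,
    categories: Dict[str, TopicWeights],
    in_links: Dict[str, Set[str]],
    total_pages: int,
) -> str:
    """Return the category whose topics score highest against the text's topics."""
    return max(
        categories,
        key=lambda name: category_score(
            text_topics, categories[name], in_links, total_pages
        ),
    )
```

For instance, with toy in-link sets in which `Goal` shares in-links with `Football` but none with `Parliament`, calling `classify({"Goal": 1.0}, {"sport": {"Football": 1.0}, "politics": {"Parliament": 1.0}}, in_links, 100)` picks `"sport"`. Note that each category is stored only as a small weighted set of topics, which matches the compact, intelligible category representation the abstract describes; no extra feature dimensions are introduced.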

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Daniele Vitale (1)
  • Paolo Ferragina (1)
  • Ugo Scaiella (1)
  1. Dipartimento di Informatica, University of Pisa, Italy