Advances in Information Retrieval

Volume 7224 of the series Lecture Notes in Computer Science pp 376-387

Classification of Short Texts by Deploying Topical Annotations

  • Daniele Vitale, Dipartimento di Informatica, University of Pisa
  • Paolo Ferragina, Dipartimento di Informatica, University of Pisa
  • Ugo Scaiella, Dipartimento di Informatica, University of Pisa

Abstract

We propose a novel approach to the classification of short texts based on two factors: the use of recently introduced Wikipedia-based annotators, which detect the main topics present in an input text and represent them via Wikipedia pages, and the design of a novel classification algorithm that measures the similarity between the input text and each output category by deploying only their annotated topics and the Wikipedia link-structure. Our approach dispenses with the common practice of expanding the feature space with new dimensions derived from either explicit or latent semantic analysis. As a consequence, it is simple and maintains a compact, intelligible representation of the output categories. Our experiments show that it is efficient in construction and query time, as accurate as state-of-the-art classifiers (see e.g. Phan et al., WWW ’08), and robust with respect to concept drift and input sources.
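
The general idea of comparing an input text with each category through annotated topics and link structure can be illustrated with a minimal sketch. It is not the algorithm of the chapter, which the abstract does not specify. The toy link graph `INLINKS`, the Jaccard-based `relatedness` function, the best-match averaging in `score`, and the category names are all assumptions made for illustration; a real system would obtain the topics from a Wikipedia-based annotator and the links from Wikipedia itself.

```python
from typing import Dict, Set

# Hypothetical toy link graph: Wikipedia page -> set of pages linking to it.
INLINKS: Dict[str, Set[str]] = {
    "Football": {"Sport", "FIFA", "Goal"},
    "FIFA": {"Sport", "Football", "Goal"},
    "Goalkeeper": {"Sport", "Goal", "Football"},
    "Stock market": {"Finance", "Bank", "Investor"},
    "Bank": {"Finance", "Investor", "Loan"},
}


def relatedness(a: str, b: str) -> float:
    """Assumed stand-in measure: Jaccard overlap of the two pages' in-links."""
    if a == b:
        return 1.0
    links_a = INLINKS.get(a, set())
    links_b = INLINKS.get(b, set())
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def score(text_topics: Set[str], category_topics: Set[str]) -> float:
    """Average, over the text's topics, of the best match among category topics."""
    if not text_topics or not category_topics:
        return 0.0
    best_matches = (
        max(relatedness(t, c) for c in category_topics) for t in text_topics
    )
    return sum(best_matches) / len(text_topics)


def classify(text_topics: Set[str], categories: Dict[str, Set[str]]) -> str:
    """Return the name of the category whose topics score highest."""
    return max(categories, key=lambda name: score(text_topics, categories[name]))


# Each category is represented only by a small set of topics (hypothetical).
CATEGORIES: Dict[str, Set[str]] = {
    "Sports": {"Football", "FIFA"},
    "Business": {"Stock market", "Bank"},
}

if __name__ == "__main__":
    # Topics an annotator might return for a short text about a goalkeeper.
    print(classify({"Goalkeeper"}, CATEGORIES))
```

In this sketch each category stays a short, readable list of Wikipedia pages, which mirrors the compact representation the abstract describes; no additional feature dimensions are introduced.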