Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization

  • Luigi Galavotti
  • Fabrizio Sebastiani
  • Maria Simi
Conference paper

DOI: 10.1007/3-540-45268-0_6

Part of the Lecture Notes in Computer Science book series (LNCS, volume 1923)
Cite this paper as:
Galavotti L., Sebastiani F., Simi M. (2000) Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization. In: Borbinha J., Baker T. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2000. Lecture Notes in Computer Science, vol 1923. Springer, Berlin, Heidelberg

Abstract

We tackle two different problems of text categorization (TC), namely feature selection and classifier induction. Feature selection (FS) refers to the activity of selecting, from the set of r distinct features (i.e. words) occurring in the collection, the subset of r′ ≪ r features that are most useful for compactly representing the meaning of the documents. We propose a novel FS technique, based on a simplified variant of the χ² statistic. Classifier induction refers instead to the problem of automatically building a text classifier by learning from a set of documents pre-classified under the categories of interest. We propose a novel variant, based on the exploitation of negative evidence, of the well-known k-NN method. We report the results of systematic experimentation with these two methods, performed on the standard Reuters-21578 benchmark.
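To make the feature selection step concrete, here is a minimal sketch of χ²-based selection of the r′ highest-scoring words for one category. The abstract does not spell out the simplified variant the authors propose, so this sketch assumes the standard χ² statistic computed from document counts; the function names, the data layout (each document as a set of words, each label as a set of categories), and the alphabetical tie-breaking are hypothetical choices made for illustration only.

```python
from collections import Counter


def chi_square(n_tc, n_t, n_c, n):
    """Standard chi-square score for a term t and a category c.

    n_tc: documents in c that contain t
    n_t:  documents that contain t
    n_c:  documents in c
    n:    all documents
    (Assumption: this is the textbook statistic, not the paper's variant.)
    """
    a = n_tc                    # in c, contains t
    b = n_t - n_tc              # not in c, contains t
    c = n_c - n_tc              # in c, lacks t
    d = n - n_t - n_c + n_tc    # not in c, lacks t
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0              # term or category is degenerate
    return n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, category, r_prime):
    """Return the r_prime words with the highest chi-square for `category`."""
    n = len(docs)
    n_c = sum(1 for cats in labels if category in cats)
    df = Counter()      # document frequency of each word
    df_c = Counter()    # document frequency within the category
    for words, cats in zip(docs, labels):
        for w in words:
            df[w] += 1
            if category in cats:
                df_c[w] += 1
    scores = {w: chi_square(df_c[w], df[w], n_c, n) for w in df}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:r_prime]


# Toy collection (invented for illustration, not Reuters-21578 data).
docs = [{"wheat", "price"}, {"wheat", "export"},
        {"oil", "price"}, {"oil", "export"}]
labels = [{"grain"}, {"grain"}, {"crude"}, {"crude"}]
selected = select_features(docs, labels, "grain", 2)
```

On this toy collection, "wheat" and "oil" receive the same score for the category "grain": because the statistic squares the difference, it rewards a word whose presence signals the category and a word whose absence signals it equally.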


Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Luigi Galavotti (1)
  • Fabrizio Sebastiani (2)
  • Maria Simi (3)
  1. AUTON S.R.L., Firenze, Italy
  2. Consiglio Nazionale delle Ricerche, Istituto di Elaborazione dell’Informazione, Pisa, Italy
  3. Dipartimento di Informatica, Università di Pisa, Pisa, Italy
