Skip to main content

Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization

  • Conference paper
  • First Online:
Research and Advanced Technology for Digital Libraries (ECDL 2000)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1923))

Included in the following conference series:

Abstract

We tackle two different problems of text categorization (TC), namely feature selection and classifier induction. Feature selection (FS) refers to the activity of selecting, from the set of r distinct features (i.e. words) occurring in the collection, the subset of r′ ≪ r features that are most useful for compactly representing the meaning of the documents. We propose a novel FS technique, based on a simplified variant of the X 2 statistics. Classifier induction refers instead to the problem of auto- matically building a text classifier by learning from a set of documents pre-classified under the categories of interest. We propose a novel variant, based on the exploitation of negative evidence, of the well-known k-NN method. We report the results of systematic experimentation of these two methods performed on the standard Reuters-21578 benchmark.

We here make the assumptions that a document d j can belong to zero, one or many of the categories in C; this assumption is verified in the Reuters-21578 benchmark we use for our experiments. All the techniques we discuss here can be straightforwardly adapted to the other case in which each document belongs to exactly one category.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. D. J._Ittner, D. D. Lewis, and D. D. Ahn. Text categorization of low quality images. In Proceedings of SDAIR-95, 4th Annual Symposium on Document Analysis and Information Retrieval, pages 301–315, Las Vegas, US, 1995.

    Google Scholar 

  2. T. Joachims. Text categorization with support vector machines: learning with many relevant features. In C. Nédellec and C. Rouveirol, editors, Proceedings of ECML-98, 10th European Conference on Machine Learning, number 1398 in Lecture Notes in Computer Science, pages 137–142, Chemnitz, DE, 1998. Springer Verlag, Heidelberg, DE.

    Google Scholar 

  3. D. D. Lewis. Representation and learning in information retrieval. PhD thesis, Department of Computer Science, University of Massachusetts, Amherst, US, 1992.

    Google Scholar 

  4. H. T. Ng, W. B. Goh, and K. L. Low. Feature selection, perceptron learning, and a usability case study for text categorization. In N. J. Belkin, A. D. Narasimhalu, and P. Willett, editors, Proceedings of SIGIR-97, 20th ACM International Conference on Research and Development in Information Retrieval, pages 67–73, Philadelphia, US, 1997. ACM Press, New York, US.

    Google Scholar 

  5. R. E. Schapire, Y. Singer, and A. Singhal. Boosting and Rocchio applied to text filtering. In W. B. Croft, A. Moffat, C. J. van Rijsbergen, R. Wilkinson, and J. Zobel, editors, Proceedings of SIGIR-98, 21st ACM International Conference on Research and Development in Information Retrieval, pages 215–223, Melbourne, AU, 1998. ACM Press, New York, US.

    Google Scholar 

  6. F. Sebastiani. Machine learning in automated text categorisation: a survey. Technical Report IEI-B4-31-1999, Istituto di Elaborazione dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, IT, 1999.

    Google Scholar 

  7. Y. Yang. Expert network: effective and efficient learning from human decisions in text categorisation and retrieval. In W. B. Croft and C. J. van Rijsbergen, editors, Proceedings of SIGIR-94, 17th ACM International Conference on Research and Development in Information Retrieval, pages 13–22, Dublin, IE, 1994. Springer Verlag, Heidelberg, DE.

    Chapter  Google Scholar 

  8. Y. Yang. An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1-2):69–90, 1999.

    Article  Google Scholar 

  9. Y. Yang and X. Liu. A re-examination of text categorization methods. In M. A. Hearst, F. Gey, and R. Tong, editors, Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in Information Retrieval, pages 42–49, Berkeley, US, 1999. ACM Press, New York, US.

    Google Scholar 

  10. Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In D. H. Fisher, editor, Proceedings of ICML-97, 14th International Conference on Machine Learning, pages 412–420, Nashville, US, 1997. Morgan Kaufmann Publishers, San Francisco, US.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Galavotti, L., Sebastiani, F., Simi, M. (2000). Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization. In: Borbinha, J., Baker, T. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2000. Lecture Notes in Computer Science, vol 1923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45268-0_6

Download citation

  • DOI: https://doi.org/10.1007/3-540-45268-0_6

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41023-2

  • Online ISBN: 978-3-540-45268-3

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics