A Hybrid Approach to Optimize Feature Selection Process in Text Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2175)

Abstract

Feature selection and weighting are primary activities of every learning algorithm for text classification. Traditionally, these tasks are carried out separately in two distinct phases: first, global feature selection during corpus pre-processing; second, application of the feature weighting model. This means that two (or more) different techniques are used to optimize performance, even though a single algorithm may have a better chance of making the right choices. When the complete feature set is available, the classifier learning algorithm can better relate complex features, such as linguistic ones (e.g. syntactic categories associated with words in the training material, or terminological expressions), to the suitable representation level. In [3] it was suggested that classifiers based on the generalized Rocchio formula can be used to weight features in category profiles in order to exploit the selectivity of linguistic information techniques in text classification. This paper describes a systematic study aimed at understanding the role of the Rocchio formula in the selection and weighting of linguistic features.
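
To make the idea of a single formula doing both selection and weighting concrete, the following is a minimal sketch of a Rocchio-style category profile. It assumes the commonly cited form of the formula, in which a feature's weight is the scaled mean of its weights in the category's documents minus the scaled mean of its weights in all other documents, clipped at zero. The function name `rocchio_profile`, the dictionary representation of documents, and the default values of `beta` and `gamma` are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List

# A document is represented as a mapping from feature to weight
# (an assumption made for this illustration).
Vector = Dict[str, float]


def rocchio_profile(positives: List[Vector], negatives: List[Vector],
                    beta: float = 1.0, gamma: float = 1.0) -> Vector:
    """Build a category profile with a Rocchio-style formula.

    weight(f) = max(0, beta * mean weight of f in positives
                       - gamma * mean weight of f in negatives)

    Features whose weight is clipped to zero are left out of the
    profile, so weighting also acts as feature selection.
    """
    features = set()
    for doc in positives + negatives:
        features.update(doc)

    profile: Vector = {}
    for f in features:
        pos = (sum(d.get(f, 0.0) for d in positives) / len(positives)
               if positives else 0.0)
        neg = (sum(d.get(f, 0.0) for d in negatives) / len(negatives)
               if negatives else 0.0)
        weight = max(0.0, beta * pos - gamma * neg)
        if weight > 0.0:
            profile[f] = weight
    return profile


# Toy data (hypothetical): "the" is equally frequent inside and outside
# the category, and "goal" occurs only outside it.
positives = [{"stock": 1.0, "the": 1.0}, {"stock": 3.0, "the": 1.0}]
negatives = [{"the": 1.0, "goal": 2.0}]
profile = rocchio_profile(positives, negatives)
print(profile)  # {'stock': 2.0}
```

In this toy case only `stock` survives: the negative term cancels `the` and removes `goal`, which is the sense in which one formula can both select and weight features. How `beta` and `gamma` should be set is left open here.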

References

  1. R. Basili, A. Moschitti, and M.T. Pazienza. Language sensitive text classification. In Proceedings of the 6th RIAO Conference, Collège de France, Paris, France, 2000.

  2. R. Basili, A. Moschitti, and M.T. Pazienza. Modeling terminological information in text classification. In Proceedings of the 7th TALN Conference, 2000.

  3. R. Basili, A. Moschitti, and M.T. Pazienza. NLP-driven IR: Evaluating performances over text classification task. In Proceedings of the 10th IJCAI Conference, Seattle, Washington, USA, 2001.

  4. David J. Ittner, David D. Lewis, and David D. Ahn. Text categorization of low quality images. In Proceedings of SDAIR-95, pages 301–315, Las Vegas, US, 1995.

  5. G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523, 1988.

  6. Y. Yang. An evaluation of statistical approaches to text categorization. Information Retrieval Journal, May, 1999.

  7. Y. Yang and Jan O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of ICML-97, pages 412–420, Nashville, US, 1997.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Basili, R., Moschitti, A., Pazienza, M.T. (2001). A Hybrid Approach to Optimize Feature Selection Process in Text Classification. In: Esposito, F. (eds) AI*IA 2001: Advances in Artificial Intelligence. AI*IA 2001. Lecture Notes in Computer Science, vol 2175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45411-X_33

  • DOI: https://doi.org/10.1007/3-540-45411-X_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42601-1

  • Online ISBN: 978-3-540-45411-3

  • eBook Packages: Springer Book Archive
