A Hybrid Approach to Optimize Feature Selection Process in Text Classification

Part of the Lecture Notes in Computer Science book series (LNAI, volume 2175)

Abstract

Feature selection and weighting are the primary activities of every learning algorithm for text classification. Traditionally, these tasks are carried out separately in two distinct phases: the first is global feature selection during corpus pre-processing, and the second is the application of the feature weighting model. This means that two (or more) different techniques are used to optimize performance, even though a single algorithm may have a better chance of making the right choices. When the complete feature set is available, the classifier's learning algorithm can better relate complex features, such as linguistic ones (e.g. syntactic categories associated with words in the training material, or terminological expressions), to the suitable representation level. In [3] it was suggested that classifiers based on the generalized Rocchio formula can be used to weight features in category profiles in order to exploit the selectivity of linguistic information techniques in text classification. This paper describes a systematic study aimed at understanding the role of the Rocchio formula in the selection and weighting of linguistic features.
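
The abstract does not spell out the formula, so the following is a minimal sketch assuming the commonly used form of a Rocchio category profile: for a category with positive training documents P and negative documents N, the weight of feature f is W_f = max(0, β/|P| · Σ_{d∈P} w_f^d − γ/|N| · Σ_{d∈N} w_f^d). The clipping at zero is what lets a single formula both weight and select features: a feature that is relatively more prominent outside the category gets weight 0 and drops out of the profile. The function name `rocchio_profile`, the defaults β = 16 and γ = 4, and the `word/POS` feature strings (standing in for words paired with syntactic categories) are illustrative assumptions, not the paper's actual settings.

```python
from collections import defaultdict
from typing import Dict, List

# One document: feature -> weight w_f^d (e.g. a normalized TF-IDF value).
Doc = Dict[str, float]


def rocchio_profile(pos: List[Doc], neg: List[Doc],
                    beta: float = 16.0, gamma: float = 4.0) -> Dict[str, float]:
    """Hypothetical Rocchio-style profile builder (assumed formula, see lead-in).

    W_f = max(0, beta/|pos| * sum_pos w_f^d - gamma/|neg| * sum_neg w_f^d)
    Features whose weight would be <= 0 are left out, so weighting and
    feature selection happen in the same step.
    """
    pos_sum: Dict[str, float] = defaultdict(float)
    neg_sum: Dict[str, float] = defaultdict(float)
    for d in pos:
        for f, w in d.items():
            pos_sum[f] += w
    for d in neg:
        for f, w in d.items():
            neg_sum[f] += w

    profile: Dict[str, float] = {}
    for f in set(pos_sum) | set(neg_sum):
        # Centroid terms; an empty document set contributes nothing.
        p = beta * pos_sum[f] / len(pos) if pos else 0.0
        n = gamma * neg_sum[f] / len(neg) if neg else 0.0
        w = p - n
        if w > 0:  # clipping at zero: non-positive features are discarded
            profile[f] = w
    return profile


# Toy data with word/POS features (hypothetical values).
pos_docs = [{"rate/N": 0.8, "bank/N": 0.5}, {"rate/N": 0.6, "loan/N": 0.7}]
neg_docs = [{"bank/N": 0.9, "river/N": 0.4}, {"rate/V": 0.3}]

profile = rocchio_profile(pos_docs, neg_docs)
print(sorted(profile))  # ['bank/N', 'loan/N', 'rate/N']
```

Raising γ relative to β prunes more aggressively: with the same toy data, `rocchio_profile(pos_docs, neg_docs, gamma=40.0)` also drops `bank/N`, keeping only `loan/N` and `rate/N`.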

Keywords

  • Feature Selection
  • Linguistic Feature
  • Proper Noun
  • Breakeven Point
  • Syntactic Role

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. R. Basili, A. Moschitti, and M.T. Pazienza. Language sensitive text classification. In Proceedings of the 6th RIAO Conference, Collège de France, Paris, France, 2000.

  2. R. Basili, A. Moschitti, and M.T. Pazienza. Modeling terminological information in text classification. In Proceedings of the 7th TALN Conference, 2000.

  3. R. Basili, A. Moschitti, and M.T. Pazienza. NLP-driven IR: Evaluating performances over text classification task. In Proceedings of the 17th IJCAI Conference, Seattle, Washington, USA, 2001.

  4. David J. Ittner, David D. Lewis, and David D. Ahn. Text categorization of low quality images. In Proceedings of SDAIR-95, pages 301–315, Las Vegas, US, 1995.

  5. G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523, 1988.

  6. Y. Yang. An evaluation of statistical approaches to text categorization. Information Retrieval Journal, May, 1999.

  7. Y. Yang and Jan O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of ICML-97, pages 412–420, Nashville, US, 1997.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Basili, R., Moschitti, A., Pazienza, M.T. (2001). A Hybrid Approach to Optimize Feature Selection Process in Text Classification. In: Esposito, F. (ed.) AI*IA 2001: Advances in Artificial Intelligence. AI*IA 2001. Lecture Notes in Computer Science, vol 2175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45411-X_33

  • DOI: https://doi.org/10.1007/3-540-45411-X_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42601-1

  • Online ISBN: 978-3-540-45411-3

  • eBook Packages: Springer Book Archive