Abstract
Feature selection and weighting are the primary activities of every learning algorithm for text classification. Traditionally, these tasks are carried out separately in two distinct phases: first, global feature selection during corpus pre-processing; second, application of the feature weighting model. This means that two (or more) different techniques are used to optimize performance, even though a single algorithm may have a better chance of making the right choices. When the complete feature set is available, the classifier learning algorithm can better relate complex features, such as linguistic ones (e.g. syntactic categories associated with words in the training material, or terminological expressions), to the suitable representation level. In [3] it was suggested that classifiers based on the generalized Rocchio formula can be used to weight features in category profiles in order to exploit the selectivity of linguistic information techniques in text classification. This paper describes a systematic study aimed at understanding the role of the Rocchio formula in the selection and weighting of linguistic features.
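The abstract does not state the formula itself. As a rough illustration only, the sketch below assumes the commonly used Rocchio profile form, in which the weight of a feature in a category profile is the scaled average weight in the category's documents minus the scaled average weight in all other documents, clipped at zero. The function name `rocchio_profile`, the dictionary-based document representation, and the default values of `beta` and `gamma` are assumptions made for this example, not details taken from the paper. Features whose weight is clipped to zero drop out of the profile, which is how a single formula can act as both a weighting and a selection step.

```python
from collections import defaultdict


def rocchio_profile(docs, labels, category, beta=1.0, gamma=1.0):
    """Build a Rocchio-style profile for one category (illustrative sketch).

    docs     : list of dicts mapping feature -> weight (e.g. tf-idf values)
    labels   : list of category labels, one per document
    category : the category whose profile is built
    beta     : scale of the positive (in-category) average; assumed default
    gamma    : scale of the negative (out-of-category) average; assumed default
    """
    positives = [d for d, y in zip(docs, labels) if y == category]
    negatives = [d for d, y in zip(docs, labels) if y != category]

    # Sum feature weights separately over positive and negative documents.
    pos_sum = defaultdict(float)
    neg_sum = defaultdict(float)
    for doc in positives:
        for feature, weight in doc.items():
            pos_sum[feature] += weight
    for doc in negatives:
        for feature, weight in doc.items():
            neg_sum[feature] += weight

    profile = {}
    for feature in set(pos_sum) | set(neg_sum):
        pos_avg = pos_sum[feature] / len(positives) if positives else 0.0
        neg_avg = neg_sum[feature] / len(negatives) if negatives else 0.0
        weight = max(0.0, beta * pos_avg - gamma * neg_avg)
        # A weight of zero means the feature is effectively deselected.
        if weight > 0.0:
            profile[feature] = weight
    return profile


# Toy data: feature names and weights are made up for illustration.
docs = [
    {"bank": 1.0, "loan": 2.0},
    {"bank": 1.0, "match": 3.0},
]
labels = ["finance", "sport"]
finance_profile = rocchio_profile(docs, labels, "finance")
```

In this toy example, "bank" appears with equal weight inside and outside the category and "match" appears only outside it, so both are removed from the finance profile, while "loan" is retained. Raising `gamma` relative to `beta` makes the selection more aggressive.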
References
R. Basili, A. Moschitti, and M.T. Pazienza. Language sensitive text classification. In Proceedings of the 6th RIAO Conference, Collège de France, Paris, France, 2000.
R. Basili, A. Moschitti, and M.T. Pazienza. Modeling terminological information in text classification. In Proceedings of the 7th TALN Conference, 2000.
R. Basili, A. Moschitti, and M.T. Pazienza. NLP-driven IR: Evaluating performances over text classification task. In Proceedings of the 10th IJCAI Conference, Seattle, Washington, USA, 2001.
David J. Ittner, David D. Lewis, and David D. Ahn. Text categorization of low quality images. In Proceedings of SDAIR-95, pages 301–315, Las Vegas, US, 1995.
G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523, 1988.
Y. Yang. An evaluation of statistical approaches to text categorization. Information Retrieval Journal, May, 1999.
Y. Yang and Jan O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of ICML-97, pages 412–420, Nashville, US, 1997.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Basili, R., Moschitti, A., Pazienza, M.T. (2001). A Hybrid Approach to Optimize Feature Selection Process in Text Classification. In: Esposito, F. (eds) AI*IA 2001: Advances in Artificial Intelligence. AI*IA 2001. Lecture Notes in Computer Science, vol 2175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45411-X_33
DOI: https://doi.org/10.1007/3-540-45411-X_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42601-1
Online ISBN: 978-3-540-45411-3
eBook Packages: Springer Book Archive