A Wrapper Approach with Support Vector Machines for Text Categorization

Montañés, E.; Quevedo, J.R.; Díaz, I.

doi:10.1007/3-540-44868-3_30

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2686)

Included in the following conference series:

International Work-Conference on Artificial Neural Networks

Abstract

Text Categorization (TC), the assignment of predefined categories to the documents of a corpus, plays an important role in a wide variety of information organization and management tasks in Information Retrieval (IR). It involves handling a large amount of information, some of which may be noisy or irrelevant; hence, reducing the feature set beforehand could improve classification performance. In this paper we propose a wrapper approach. Wrappers are usually time-consuming and can even be infeasible, but our wrapper explores only a reduced number of feature subsets and uses Support Vector Machines (SVM) as the evaluation system. These two properties make it fast enough to deal with the large number of features present in text domains. Using the Reuters-21578 corpus, we also compare this wrapper with the feature-reduction approach commonly applied in TC, which filters features according to scoring measures.
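The abstract does not say how the wrapper chooses which subsets to explore. The sketch below shows one possible way to combine a filter ranking with an SVM-evaluated wrapper. It is not the authors' algorithm, and it rests on these assumptions:

* Features are ranked once by a filter score. Here that score is a hypothetical |df_pos - df_neg| measure, standing in for the scoring measures usually applied in TC.
* Only the nested top-k subsets for a few values of k are evaluated.
* Each subset is scored by the validation accuracy of a linear SVM. That SVM is trained with a Pegasos-style subgradient method, a swapped-in stand-in for whatever SVM implementation the paper used.

All function names (`score_features`, `train_linear_svm`, `wrapper_select`, ...) are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch: filter ranking + wrapper over nested top-k subsets,
# each evaluated with a linear SVM. Not the authors' published algorithm.
Matrix = Sequence[Sequence[float]]


def score_features(X: Matrix, y: Sequence[int]) -> List[float]:
    """Assumed filter score: |df_pos - df_neg| per feature, where df is the
    fraction of documents of a class in which the feature is non-zero."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    scores = []
    for j in range(len(X[0])):
        df_pos = sum(1 for x in pos if x[j] != 0) / max(len(pos), 1)
        df_neg = sum(1 for x in neg if x[j] != 0) / max(len(neg), 1)
        scores.append(abs(df_pos - df_neg))
    return scores


def train_linear_svm(X: Matrix, y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50) -> List[float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.
    A constant 1.0 is appended to every vector so the last weight is a bias."""
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(data, y):  # cyclic order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regularizer step)
            if margin < 1.0:  # hinge loss active: move towards y * x
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def accuracy(w: List[float], X: Matrix, y: Sequence[int]) -> float:
    return sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)


def project(X: Matrix, cols: List[int]) -> List[List[float]]:
    """Keep only the selected feature columns."""
    return [[x[j] for j in cols] for x in X]


def wrapper_select(X_train: Matrix, y_train: Sequence[int],
                   X_val: Matrix, y_val: Sequence[int],
                   sizes: Sequence[int]) -> Tuple[List[int], float]:
    """Evaluate only the nested top-k subsets (one per k in `sizes`, assumed
    ascending) and return the subset with the best validation accuracy."""
    scores = score_features(X_train, y_train)
    ranking = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    best_cols: List[int] = []
    best_acc = -1.0
    for k in sizes:
        cols = sorted(ranking[:k])
        w = train_linear_svm(project(X_train, cols), y_train)
        acc = accuracy(w, project(X_val, cols), y_val)
        if acc > best_acc:  # strict '>' keeps the smaller subset on ties
            best_cols, best_acc = cols, acc
    return best_cols, best_acc
```

With `sizes=[1, 2, 4]`, only three SVMs are trained. An exhaustive wrapper over n features would face 2^n candidate subsets. Restricting the search to a handful of nested subsets is the kind of saving the abstract alludes to, although the paper's actual search strategy may differ.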

The research reported in this paper has been supported in part under MCyT and Feder grant TIC2001-3579 and FICYT grant BP01-114.

Author information

Authors and Affiliations

Artificial Intelligence Center, University of Oviedo, Campus de Viesques, Gijón (Asturias), Spain
E. Montañés, J.R. Quevedo & I. Díaz

Editor information

Editors and Affiliations

E.T.S. de Ingeniería Informática, Departamento de Inteligencia Artificial, Universidad Nacional de Educación a Distancia, Juan del Rosal, 16, 28040, Madrid, Spain
José Mira & José R. Álvarez

About this paper

Cite this paper

Montañés, E., Quevedo, J., Díaz, I. (2003). A Wrapper Approach with Support Vector Machines for Text Categorization. In: Mira, J., Álvarez, J.R. (eds) Computational Methods in Neural Modeling. IWANN 2003. Lecture Notes in Computer Science, vol 2686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44868-3_30

DOI: https://doi.org/10.1007/3-540-44868-3_30
Published: 18 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40210-7
Online ISBN: 978-3-540-44868-6
eBook Packages: Springer Book Archive
