A comparison of the performance of SVM and ARNI on Text Categorization with new filtering measures on an unbalanced collection

Combarro, Elías F.; Montañés, Elena; Ranilla, José; Fernández, Javier

doi:10.1007/3-540-44869-1_94

Conference paper
First Online: 01 January 2003

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2687)

Abstract

Text Categorization (TC) is the process of assigning documents to a set of previously fixed categories. Given the great amount of information available, much research aims to automate this time-consuming task, and Machine Learning (ML) algorithms have recently been applied for this purpose. In this paper, we compare the performance of two of these algorithms (SVM and ARNI) on a collection whose documents are unevenly distributed among categories. Feature reduction is applied beforehand using both classical measures (information gain and term frequency) and three new measures that we propose here for the first time. We also compare the performance of these measures.
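
As an illustration of the classical filtering step mentioned in the abstract, the following minimal sketch scores terms by information gain for a single binary category and keeps the top-ranked ones. It uses the standard textbook definition of information gain, not the three new measures proposed in the paper (which the abstract does not describe). The function names `information_gain` and `select_terms`, the representation of documents as sets of terms, and the alphabetical tie-breaking are assumptions made for this example.

```python
import math
from collections import Counter


def information_gain(docs, labels, term):
    """Information gain (in bits) of `term` for one binary category.

    docs   -- non-empty list of sets of terms, one set per document (assumed)
    labels -- list of booleans: True if the document belongs to the category
    """
    n = len(docs)
    # Count documents for each (term present?, in category?) combination.
    counts = Counter((term in d, bool(y)) for d, y in zip(docs, labels))
    ig = 0.0
    for t in (True, False):
        p_t = (counts[(t, True)] + counts[(t, False)]) / n
        for c in (True, False):
            p_c = (counts[(True, c)] + counts[(False, c)]) / n
            p_tc = counts[(t, c)] / n
            if p_tc > 0:  # empty cells contribute nothing to the sum
                ig += p_tc * math.log2(p_tc / (p_t * p_c))
    return ig


def select_terms(docs, labels, keep):
    """Return the `keep` terms with the highest information gain."""
    vocab = set().union(*docs)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda w: (-information_gain(docs, labels, w), w))
    return ranked[:keep]
```

On an unbalanced collection, a category with few positive documents yields small joint probabilities for its characteristic terms, which is one reason alternative filtering measures can be worth comparing against this baseline.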

The research reported in this paper has been supported in part under MCyT and Feder grant TIC2001-3579 and FICYT grant BP01-114.

Author information

Authors and Affiliations

Artificial Intelligence Center, University of Oviedo, Campus de Viesques, Gijón (Asturias), Spain
Elena Montañés, José Ranilla & Javier Fernández
Computer Science Department, University of Oviedo, Campus de Viesques, Gijón (Asturias), Spain
Elías F. Combarro

Editor information

Editors and Affiliations

E.T.S. de Ingeniería Informática Departamento de Inteligencia Artificial, Universidad Nacional de Educación a Distancia, Juan del Rosal, 16, 28040, Madrid, Spain
José Mira & José R. Álvarez

Cite this paper

Combarro, E.F., Montañés, E., Ranilla, J., Fernández, J. (2003). A comparison of the performance of SVM and ARNI on Text Categorization with new filtering measures on an unbalanced collection. In: Mira, J., Álvarez, J.R. (eds) Artificial Neural Nets Problem Solving Methods. IWANN 2003. Lecture Notes in Computer Science, vol 2687. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44869-1_94

DOI: https://doi.org/10.1007/3-540-44869-1_94
Published: 18 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40211-4
Online ISBN: 978-3-540-44869-3
eBook Packages: Springer Book Archive
