Evolutionary learning of document categories

Serrano, J. I.; Castillo, M. D. del

doi:10.1007/s10791-006-9012-6

Published: 22 August 2006

Volume 10, pages 69–83 (2007)

Information Retrieval

J. I. Serrano¹ & M. D. del Castillo¹

Abstract

This paper deals with a supervised learning method devoted to producing categorization models of text documents. The goal of the method is to use a suitable numerical measurement of example similarity to find centroids describing different categories of examples. The centroids are not abstract or statistical models, but rather consist of bits of examples. The centroid-learning method is based on a Genetic Algorithm for Texts (GAT). The categorization system using this genetic algorithm infers a model by applying the genetic algorithm to each set of preclassified documents belonging to a category. The models thus obtained are the category centroids that are used to predict the category of a test document. The experimental results validate the utility of this approach for classifying incoming documents.
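
The abstract does not specify GAT's document representation, genetic operators, or similarity measure, so the following is only a toy sketch of the general idea under assumed choices: a centroid is a fixed-size set of terms taken from a category's training documents, similarity is Jaccard overlap, fitness is the mean similarity to the category's documents, and the genetic algorithm uses truncation selection, a simple set-based crossover, and single-term mutation. All function names, parameters, and the sample data are hypothetical and are not taken from the paper.

```python
import random

def jaccard(a, b):
    """Jaccard overlap between two term collections (an assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def fitness(centroid, docs):
    """Mean similarity of a candidate centroid to the category's documents."""
    return sum(jaccard(centroid, d) for d in docs) / len(docs)

def evolve_centroid(docs, size=3, pop_size=20, generations=30, seed=0):
    """Evolve one centroid (a list of distinct terms drawn from `docs`) for a category."""
    rng = random.Random(seed)
    vocab = sorted({t for d in docs for t in d})
    size = min(size, len(vocab))
    population = [rng.sample(vocab, size) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, docs), reverse=True)
        parents = population[: pop_size // 2]          # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            pool = sorted(set(p1) | set(p2))
            child = rng.sample(pool, size)             # crossover: terms from both parents
            if rng.random() < 0.2:                     # mutation: swap in an unused term
                unused = [t for t in vocab if t not in child]
                if unused:
                    child[rng.randrange(size)] = rng.choice(unused)
            children.append(child)
        population = parents + children
    return max(population, key=lambda c: fitness(c, docs))

def classify(doc, centroids):
    """Predict the category whose centroid is most similar to `doc`."""
    return max(centroids, key=lambda cat: jaccard(centroids[cat], doc))

# Hypothetical preclassified documents, already tokenized.
train = {
    "sports": [["match", "goal", "team", "league"],
               ["team", "goal", "coach"],
               ["league", "match", "team", "score"]],
    "finance": [["market", "stock", "bank"],
                ["bank", "loan", "market", "rate"],
                ["stock", "market", "rate"]],
}

# One genetic run per category, as the abstract describes.
centroids = {cat: evolve_centroid(docs) for cat, docs in train.items()}

sports_guess = classify(["team", "match", "league", "goal", "win"], centroids)
finance_guess = classify(["market", "loan", "stock"], centroids)
print(centroids, sports_guess, finance_guess)
```

Because each centroid is built only from terms that occur in its own category's examples, it stays a concrete "bit of examples" rather than a statistical summary, which is the property the abstract emphasizes. A realistic system would likely need term weighting, stemming, and a more carefully chosen similarity measure.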

Author information

Authors and Affiliations

Instituto de Automática Industrial, CSIC, Madrid, Spain
J. I. Serrano & M. D. del Castillo

Corresponding author

Correspondence to M. D. del Castillo.

About this article

Cite this article

Serrano, J.I., Castillo, M.D.d. Evolutionary learning of document categories. Inf Retrieval 10, 69–83 (2007). https://doi.org/10.1007/s10791-006-9012-6

Received: 03 December 2004
Accepted: 17 July 2006
Published: 22 August 2006
Issue Date: January 2007
DOI: https://doi.org/10.1007/s10791-006-9012-6
