Abstract
This paper presents a cooperative evolutionary approach to the problem of instance selection for instance-based learning. The model takes advantage of one of the recent paradigms in the field of evolutionary computation: cooperative coevolution. This paradigm is based on an approach similar to the philosophy of divide and conquer. In our method, the training set is divided into several subsets that are searched independently. A population of global solutions relates the searches in the different subsets and keeps track of the best combinations obtained. The proposed model has the advantage over standard methods that it does not rely on any specific distance metric or classifier algorithm. Additionally, the fitness function of the individuals considers both storage requirements and classification accuracy, and the user can balance these two objectives according to their specific needs by assigning a different weight to each term. The method also shows good scalability when applied to large datasets.
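As an illustration of the weighted fitness described above, the sketch below combines classification accuracy with a storage-reduction term. The 1-NN classifier, the default weight values, and the exact form of the reduction term are assumptions for the example, not the paper's exact formulation:

```python
import numpy as np

def nn1_classify(X_ref, y_ref, X_query):
    """Plain 1-nearest-neighbour classifier (Euclidean distance)."""
    d = ((X_query[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    return y_ref[d.argmin(axis=1)]

def fitness(selection, X, y, w_acc=0.75, w_red=0.25):
    """Weighted fitness of an instance-selection candidate.

    selection    : boolean mask over the training set (True = instance kept).
    w_acc, w_red : illustrative weights balancing accuracy vs. storage.
    """
    kept = np.asarray(selection, dtype=bool)
    if not kept.any():
        return 0.0  # an empty subset cannot classify anything
    # Accuracy of the retained subset when classifying the full training set.
    preds = nn1_classify(X[kept], y[kept], X)
    accuracy = (preds == y).mean()
    # Reduction: fraction of instances discarded (higher = less storage).
    reduction = 1.0 - kept.mean()
    return w_acc * accuracy + w_red * reduction
```

With this formulation, a candidate that discards half of a redundant training set while keeping accuracy intact scores higher than one that keeps everything, which is exactly the trade-off the user-set weights control.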
The proposed model is favorably compared with some of the most successful standard algorithms, IB3, ICF and DROP3, with a genetic algorithm using the CHC method, and with four recent methods of instance selection: MSS, entropy-based instance selection, IMOEA and LVQPRU. The comparison shows a clear advantage of the proposed algorithm in terms of storage requirements, and it is at least as good as any of the other methods in terms of testing error. A large set of 50 problems from the UCI Machine Learning Repository is used for the comparison. Additionally, a study of the effect of instance label noise is carried out, showing the robustness of the proposed algorithm.
The major contribution of our work is showing that cooperative coevolution can be used to tackle large problems taking advantage of its inherently modular nature. We show that a combination of cooperative coevolution together with the principle of divide-and-conquer can be very effective both in terms of improving performance and in reducing computational cost.
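In broad strokes, the combination of divide-and-conquer and cooperative coevolution can be sketched as a loop in which each subset's subpopulation is evaluated in the context of the current best selectors of the other subsets. Everything concrete here (the function name `cooperative_search`, the subset count, and the naive truncation-and-mutation variation) is an illustrative assumption, not the paper's actual operators:

```python
import numpy as np

def cooperative_search(n_instances, fitness, n_subsets=4, pop_size=10,
                       generations=30, seed=0):
    """Sketch of a cooperative scheme: the training set is partitioned into
    disjoint subsets, each searched by its own subpopulation of binary
    selectors; a global candidate is assembled from one selector per subset.
    `fitness` maps a full boolean mask to a score to be maximised."""
    rng = np.random.default_rng(seed)
    # Divide and conquer: partition the instance indices into disjoint subsets.
    parts = np.array_split(rng.permutation(n_instances), n_subsets)
    # One subpopulation of random binary selectors per subset.
    pops = [rng.random((pop_size, len(p))) < 0.5 for p in parts]
    champions = [pop[0].copy() for pop in pops]  # current best per subset

    def assemble(choice):
        """Combine one selector per subset into a global selection mask."""
        mask = np.zeros(n_instances, dtype=bool)
        for part, sel in zip(parts, choice):
            mask[part] = sel
        return mask

    for _ in range(generations):
        for s in range(n_subsets):
            # Evaluate each individual together with the other champions.
            scores = []
            for ind in pops[s]:
                trial = list(champions)
                trial[s] = ind
                scores.append(fitness(assemble(trial)))
            order = np.argsort(scores)[::-1]
            champions[s] = pops[s][order[0]].copy()
            pops[s] = pops[s][order]
            # Naive variation: mutated copies of the best half replace the rest.
            half = pop_size // 2
            for i in range(half, pop_size):
                child = pops[s][i - half].copy()
                flip = rng.random(child.size) < 0.1
                child[flip] = ~child[flip]
                pops[s][i] = child

    best_mask = assemble(champions)
    return best_mask, fitness(best_mask)
```

The point of the decomposition is that each subpopulation searches a space of size 2^(n/k) instead of 2^n, while the assembled global candidate keeps the subsets' searches coordinated.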
References
Aha, D. W., Kibler, D., & Albert, M. K. (1991). Instance-based learning algorithms. Machine Learning, 6, 37–66.
Anderson, T. W. (1984). An introduction to multivariate statistical analysis (2nd edn.). Wiley series in probability and mathematical statistics. New York: Wiley.
Baluja, S. (1994). Population-based incremental learning (Technical Report CMU-CS-94-163). Carnegie Mellon University, Pittsburgh.
Barandela, R., Ferri, F. J., & Sánchez, J. S. (2005). Decision boundary preserving prototype selection for nearest neighbor classification. International Journal of Pattern Recognition and Artificial Intelligence, 19(6), 787–806.
Blum, A., & Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence, 97, 245–271.
Bongard, J., & Lipson, H. (2005a). Active coevolutionary learning of deterministic finite automata. Journal of Machine Learning Research, 6, 1651–1678.
Bongard, J. C., & Lipson, H. (2005b). Nonlinear system identification using coevolution of models and tests. IEEE Transactions on Evolutionary Computation, 9(4), 361–384.
Brighton, H., & Mellish, C. (2002). Advances in instance selection for instance-based learning algorithms. Data Mining and Knowledge Discovery, 6, 153–172.
Brodley, C. E. (1995). Recursive automatic bias selection for classifier construction. Machine Learning, 20(1/2), 63–94.
Cano, J. R., Herrera, F., & Lozano, M. (2003). Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study. IEEE Transactions on Evolutionary Computation, 7(6), 561–575.
Cano, J. R., Herrera, F., & Lozano, M. (2005). Stratification for scaling up evolutionary prototype selection. Pattern Recognition Letters, 26(7), 953–963.
Cano, J. R., Herrera, F., & Lozano, M. (2007). Evolutionary stratified training set selection for extracting classification rules with trade off precision-interpretability. Data & Knowledge Engineering, 60(1), 90–108.
Chaudhuri, S., Motwani, R., & Narasayya, V. (1998). Random sampling for histogram construction: how much is enough? In L. Haas & A. Tiwary (Eds.), Proceedings of ACM SIGMOD, international conference on management of data (pp. 436–447). New York, USA.
Chellapilla, K., & Fogel, D. B. (1999). Evolving neural networks to play checkers without relying on expert knowledge. IEEE Transactions on Neural Networks, 10(6), 1382–1391.
Chen, J. H., Chen, H. M., & Ho, S. Y. (2005). Design of nearest neighbor classifiers: multi-objective approach. International Journal of Approximate Reasoning, 40(1–2), 3–22.
Cochran, W. (1977). Sampling techniques. New York: Wiley.
Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. Wiley series in telecommunication. New York: Wiley.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7), 1895–1923.
Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Machine Learning, 40, 139–157.
Eshelman, L. J. (1990). The CHC adaptive search algorithm: how to have safe search when engaging in nontraditional genetic recombination. San Mateo: Morgan Kauffman.
García-Pedrajas, N., & Fyfe, C. (2007). Immune network based ensembles. Neurocomputing, 1155–1166.
García-Pedrajas, N., & Ortiz-Boyer, D. (2007). A cooperative constructive method for neural networks for pattern recognition. Pattern Recognition, 40(1), 80–99.
García-Pedrajas, N., Hervás-Martínez, C., & Muñoz-Pérez, J. (2002). Multiobjective cooperative coevolution of artificial neural networks (multi-objective cooperative networks). Neural Networks, 15(10), 1255–1274.
García-Pedrajas, N., Hervás-Martínez, C., & Ortiz-Boyer, D. (2005). Cooperative coevolution of artificial neural network ensembles for pattern classification. IEEE Transactions on Evolutionary Computation, 9(3), 271–302.
Gates, G. W. (1972). The reduced nearest neighbor rule. IEEE Transactions on Information Theory, 18(3), 431–433.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization and machine learning. Reading: Addison–Wesley.
Grefenstette, J. J. (1987). Incorporating problem specific knowledge into genetic algorithms. In L. Davis (Ed.), Genetic algorithms and simulated annealing (pp. 42–60). San Mateo: Morgan Kaufmann.
Hettich, S., Blake, C., & Merz, C. (1998). UCI Repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.
Hillis, W. D. (1991). Co-evolving parasites improve simulated evolution as an optimization procedure. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial life II (pp. 313–384).
Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor: The University of Michigan Press.
Hussain, F., Liu, H., Tan, C., & Dash, M. (1999). Discretization: an enabling technique (Technical Report TRC6/99). School of Computing, National University of Singapore.
Ishibuchi, H., & Nakashima, T. (2000). Pattern and feature selection by genetic algorithms in nearest neighbor classification. Journal of Advanced Computational Intelligence and Intelligent Informatics, 4(2), 138–145.
Kivinen, J., & Mannila, H. (1994). The power of sampling in knowledge discovery. In Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems (pp. 77–85). Minneapolis, Minnesota, USA.
Kuncheva, L. (1995). Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recognition Letters, 16, 809–814.
Leung, Y. W., & Wang, Y. P. (2001). An orthogonal genetic algorithm with quantization for global numerical optimization. IEEE Transactions on Evolutionary Computation, 5(1), 41–53.
Li, J., Manry, M. T., Yu, C., & Wilson, D. R. (2005). Prototype classifier design with pruning. International Journal of Artificial Intelligence Tools, 14(1–2), 261–280.
Liu, H., & Motoda, H. (1998). Feature selection for knowledge discovery and data mining. Norwell: Kluwer.
Liu, H., & Motoda, H. (2002). On issues of instance selection. Data Mining and Knowledge Discovery, 6, 115–130.
Louis, S. J., & Li, G. (1997). Combining robot control strategies using genetic algorithms with memory. In Lecture notes in computer science: Vol. 1213. Evolutionary programming VI (pp. 431–442). Berlin: Springer.
Michalewicz, Z. (1994). Genetic algorithms + Data structures = Evolution programs. New York: Springer.
Moriarty, D. E., & Miikkulainen, R. (1995). Discovering complex Othello strategies through evolutionary neural networks. Connection Science, 7(3), 195–209.
Moriarty, D. E., & Miikkulainen, R. (1996). Efficient reinforcement learning through symbiotic evolution. Machine Learning, 22, 11–32.
Potter, M., & De Jong, K. A. (1994). A cooperative coevolutionary approach to function optimization. In Proceedings of the third conference on parallel problem solving from nature (pp. 249–257).
Potter, M. A., & De Jong, K. A. (2000). Cooperative coevolution: an architecture for evolving coadapted subcomponents. Evolutionary Computation, 8(1), 1–29.
Provost, F. J., & Kolluri, V. (1999). A survey of methods for scaling up inductive learning algorithms. Data Mining and Knowledge Discovery, 2, 131–169.
Reeves, C. R., & Bush, D. R. (2001). Using genetic algorithms for training data selection in RBF networks. In H. Liu & H. Motoda (Eds.), Instances selection and construction for data mining (pp. 339–356). Norwell: Kluwer.
Ritter, G. L., Woodruff, H. B., Lowry, S. R., & Isenhour, T. L. (1975). An algorithm for a selective nearest neighbor decision rule. IEEE Transactions on Information Theory, 21(6), 665–669.
Rosin, C. D., & Belew, R. K. (1997). New methods for competitive coevolution. Evolutionary Computation, 5(1), 1–29.
Smith, P. (1998). Into statistics. Singapore: Springer.
Smyth, B., & Keane, M. T. (1995). Remembering to forget. In C. S. Mellish (Ed.), Proceedings of the fourteenth international joint conference on artificial intelligence (Vol. 1, pp. 377–382).
Son, S. H., & Kim, J. Y. (2006). Data reduction for instance-based learning using entropy-based partitioning. In Lecture notes in computer science: Vol. 3982. Proceedings of the international conference on computational science and its applications—ICCSA 2006 (pp. 590–599). Berlin: Springer.
Syswerda, G. (1991). A study of reproduction in generational and steady-state genetic algorithms. In G. Rawlins (Ed.), Foundations of genetic algorithms (pp. 94–101). San Mateo: Morgan Kaufmann.
Tomek, I. (1976). Two modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics, SMC-6, 769–772.
Whitley, D. (1989). The GENITOR algorithm and selective pressure. In Proceedings of the third international conference on genetic algorithms (pp. 116–121). San Mateo: Morgan Kaufmann.
Whitley, D., & Kauth, J. (1988). GENITOR: a different genetic algorithm. In Proceedings of the Rocky mountain conference on artificial intelligence (pp. 118–130). Denver, CO.
Whitley, D., & Starkweather, T. (1990). GENITOR II: a distributed genetic algorithm. Journal of Experimental Theoretical Artificial Intelligence, 2, 189–214.
Wilson, D. L. (1972). Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, 2(3), 408–421.
Wilson, D. R., & Martinez, T. R. (1997). Instance pruning techniques. In D. Fisher (Ed.), Proceedings of the fourteenth international conference on machine learning (pp. 404–411). San Francisco, CA, USA.
Wilson, D. R., & Martinez, T. R. (2000). Reduction techniques for instance-based learning algorithms. Machine Learning, 38, 257–286.
Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C. M., & Grunert da Fonseca, V. (2003). Performance assessment of multiobjective optimizers: an analysis and review. IEEE Transactions on Evolutionary Computation, 7(2), 117–132.
Editor: Risto Miikkulainen.
García-Pedrajas, N., Romero del Castillo, J.A. & Ortiz-Boyer, D. A cooperative coevolutionary algorithm for instance selection for instance-based learning. Mach Learn 78, 381–420 (2010). https://doi.org/10.1007/s10994-009-5161-3