Abstract
This paper proposes an improvement to REGAL, a concept learning system for classification tasks. REGAL is based on a distributed genetic algorithm and learns multi-modal concept descriptions in first-order logic. It was a pioneering system and has inspired several others. After studying REGAL's design philosophy and experimental behaviour, we propose a set of improvements, based principally on a new treatment of counterexamples, that build on the system's inherent strengths. The aim is to achieve better accuracy, interpretability and scalability, so that the resulting system, REGAL-TC, meets the main requirements for extracting classification rules in data mining. The experimental study shows valuable improvements over both the REGAL and G-Net distributed genetic algorithms, as well as interesting results in comparison with several representative state-of-the-art algorithms in this field.
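To make the notion of counterexamples concrete, here is a minimal, hypothetical sketch. It is not the REGAL-TC method, whose details are not given here. It shows how a single conjunctive rule over nominal attributes can be assessed by two counts: the positive examples of the target concept it covers, and the counterexamples (negative examples) it wrongly covers. The rule format, the function names `covers` and `coverage`, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rule format (an assumption, not REGAL's actual encoding):
# each attribute maps to the set of values it accepts (an internal disjunction);
# attributes not mentioned in the rule are unconstrained.
Rule = Dict[str, Set[str]]
Example = Dict[str, str]


def covers(rule: Rule, example: Example) -> bool:
    """A rule covers an example if every constrained attribute takes an accepted value."""
    return all(example.get(attr) in allowed for attr, allowed in rule.items())


def coverage(rule: Rule, positives: List[Example], negatives: List[Example]) -> Tuple[int, int]:
    """Return (positive examples covered, counterexamples covered) for the target concept."""
    pos = sum(covers(rule, e) for e in positives)
    neg = sum(covers(rule, e) for e in negatives)
    return pos, neg


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    rule = {"outlook": {"sunny", "overcast"}, "wind": {"weak"}}
    positives = [
        {"outlook": "sunny", "wind": "weak"},
        {"outlook": "overcast", "wind": "weak"},
        {"outlook": "rain", "wind": "weak"},
    ]
    negatives = [
        {"outlook": "sunny", "wind": "strong"},
        {"outlook": "overcast", "wind": "weak"},
    ]
    print(coverage(rule, positives, negatives))  # (2, 1)
```

A rule with a non-zero second count is inconsistent, because it covers at least one counterexample. How such counterexamples are penalised or otherwise handled during evolution is where the abstract places REGAL-TC's main contribution. This sketch makes no claim about that treatment.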
Acknowledgments
This paper was supported in part by the Spanish Ministry of Education and Science under grant no. TIN2008-06681-C06-06 and the Andalusian government under grant no. P07-TIC-03179.
About this article
Cite this article
Lopez, L.I., Bardallo, J.M., De Vega, M.A. et al. REGAL-TC: a distributed genetic algorithm for concept learning based on REGAL and the treatment of counterexamples. Soft Comput 15, 1389–1403 (2011). https://doi.org/10.1007/s00500-010-0678-8