Abstract
Algorithmic scalability is a major concern for any machine learning strategy in this age of ‘big data’. A large number of potentially predictive attributes is emblematic of problems in bioinformatics, genetic epidemiology, and many other fields. Previously, ExSTraCS was introduced as an extended Michigan-style supervised learning classifier system that combined a set of powerful heuristics to successfully tackle the challenges of classification, prediction, and knowledge discovery in complex, noisy, and heterogeneous problem domains. While Michigan-style learning classifier systems are powerful and flexible learners, they are not considered to be particularly scalable. For the first time, this paper presents a complete description of the ExSTraCS algorithm and introduces an effective strategy to dramatically improve learning classifier system scalability. ExSTraCS 2.0 addresses scalability with (1) a rule specificity limit, (2) new approaches to expert knowledge guided covering and mutation mechanisms, and (3) the implementation and utilization of the TuRF algorithm for improving the quality of expert knowledge discovery in larger datasets. Performance over a complex spectrum of simulated genetic datasets demonstrated that these new mechanisms dramatically improve nearly every performance metric on datasets with 20 attributes and made it possible for ExSTraCS to reliably scale up to perform on related 200 and 2000-attribute datasets. ExSTraCS 2.0 was also able to reliably solve the 6, 11, 20, 37, 70, and 135 multiplexer problems, and did so in similar or fewer learning iterations than previously reported, with smaller finite training sets, and without using building blocks discovered from simpler multiplexer problems. Furthermore, ExSTraCS usability was made simpler through the elimination of previously critical run parameters.
Similar content being viewed by others
References
Bacardit J, Burke E, Krasnogor N (2009) Improving the scalability of rule-based evolutionary learning. Memet Comput 1(1):55–67
Bacardit J, Goldberg D, Butz M, Llora X, Garrell J (2004) Speeding-up pittsburgh lcss: modeling time and accuracy. In: Parallel problem solving from nature-PPSN VIII. Springer, New York, , pp 1021–1031
Bacardit J, Krasnogor N (2009) A mixed discrete-continuous attribute list representation for large scale classification domains. In: Proceedings of the 11th annual conference on genetic and evolutionary computation. ACM, New York, pp 1155–1162
Bacardit J, Krasnogor N (2009) Performance and efficiency of memetic pittsburgh learning classifier systems. Evol comput 17(3):307–342
Bacardit J, Llorà X (2013) Large-scale data mining using genetics-based machine learning. Wiley Interdiscip Rev Data Min Knowl Discov 3(1):37–61
Bacardit J, Stout M, Hirst J, Sastry K, Llorà X, Krasnogor N (2007) Automated alphabet reduction method with evolutionary algorithms for protein structure prediction. In: Proceedings of the 9th annual conference on genetic and evolutionary computation. ACM, New York, pp 346–353
Bagallo G, Haussler D (1990) Boolean feature discovery in empirical learning. Machine Learn 5(1):71–99
Barto AG, Anandan P, Anderson CW (1986) Cooperativity in networks of pattern recognizing stochastic learning automata. In: Adaptive and learning systems. Springer, New York, pp 235–246
Bernadó-Mansilla E, Garrell-Guiu J (2003) Accuracy-based learning classifier system: models, analysis and applications to classification tasks. Evol Comput 11(3):209–238
Bull L, Studley M, Bagnall A, Whittley I (2007) Learning classifier system ensembles with rule-sharing. IEEE Trans Evol Comput 11(4):496–502
Butz MV (2006) Xcs in binary classification problems. In: Rule-based evolutionary online learning systems. Springer, New York, pp 147–156
Butz MV, Goldberg DE, Tharakunnel K (2003) Analysis and improvement of fitness exploitation in xcs: bounding models, tournament selection, and bilateral accuracy. Evol Comput 11(3):239–277
Butz MV, Kovacs T, Lanzi PL, Wilson SW (2004) Toward a theory of generalization and learning in xcs. IEEE Trans Evol Comput 8(1):28–46
Butz MV, Pelikan M (2006) Studying xcs/boa learning in boolean functions: structure encoding and random boolean functions. In: Proceedings of the 8th annual conference on genetic and evolutionary computation. ACM, New York, pp 1449–1456
Butz MV, Sastry K, Goldberg DE (2003) Tournament selection: stable fitness pressure in xcs. In: Genetic and evolutionary computation GECCO 2003. Springer, New York, pp 1857–1869
DeJong KA, Spears WM (1990) Learning concept classification rules using genetic algorithms. Tech. rep, DTIC document
Franco M, Krasnogor N, Bacardit J (2010) Speeding up the evaluation of evolutionary learning systems using gpgpus. In: Proceedings of the 12th annual conference on genetic and evolutionary computation. ACM, New York, pp 1039–1046
Granizo-Mackenzie D, Moore JH (2013) Multiple threshold spatially uniform relieff for the genetic analysis of complex human diseases. In: Evolutionary computation, machine learning and data mining in bioinformatics. Springer, New York, pp 1–10
Greene C, Penrod N, Kiralis J, Moore J (2009) Spatially uniform relieff (surf) for computationally-efficient filtering of gene-gene interactions. BioData Mining 2(1):1–9
Greene CS, Himmelstein DS, Kiralis J, Moore JH (2010) The informative extremes: using both nearest and farthest individuals can improve relief algorithms in the domain of human genetics. In: Evolutionary computation, machine learning and data mining in bioinformatics. Springer, New York, pp 182–193
Higuchi T, Niwa T, Tanaka T, Iba H, de Garis H, Furuya T (1993) Evolvable hardware-genetic-based generation of electric circuitry at gate and hardware description language (hdl) levels. Electrotechnical Laboratory, Tsukuba, Japan, pp 93–94
Iba H, De Garis H, Sato T (1994) Genetic programming using a minimum description length principle. Adv Genet Progr 1:265–284
Iqbal M, Browne WN, Zhang M (2012) Reusing building blocks of extracted knowledge to solve complex, large-scale boolean problems
Iqbal M, Browne WN, Zhang M (2013) Extending learning classifier system with cyclic graphs for scalability on complex, large-scale boolean problems. In: Proceeding of the 15th annual conference on genetic and evolutionary computation conference. ACM, New York, pp 1045–1052
Kononenko I (1994) Estimating attributes: analysis and extensions of relief. In: Machine learning: ECML-94. Springer, New York, pp 171–182
Lanzi PL, Loiacono D (2010) Speeding up matching in learning classifier systems using cuda. In: Learning classifier systems. Springer, New York, pp 1–20
Llorà X, Sastry K (2006) Fast rule matching for lcss via vector instructions. In: Proceedings of the 8th annual conference on genetic and evolutionary computation. ACM, New York, pp 1513–1520
Moore J, White B (2007) Tuning relieff for genome-wide genetic analysis. Evolutionary computation, machine learning and data mining in bioinformatics, pp 166–175
Moore JH (2015) Epistasis analysis using relieff. In: Epistasis. Springer, New York, pp 315–325
Moore JH, Asselbergs FW, Williams SM (2010) Bioinformatics challenges for genome-wide association studies. Bioinformatics 26(4):445–455
Moore JH, Ritchie MD (2004) The challenges of whole-genome approaches to common diseases. JAMA 291(13):1642–1643
Quinlan JR (1988) An empirical comparison of genetic and decision-tree classifiers. In: ML, pp 135–141
Rudd J, Moore JH, Urbanowicz RJ (2013) A multi-core parallelization strategy for statistical significance testing in learning classifier systems. Evol Intell 6(2):127–134
Smith RE, Cribbs HB III (1994) Is a learning classifier system a type of neural network? Evol Comput 2(1):19–36
Tan J, Moore J, Urbanowicz R (2013) Rapid rule compaction strategies for global knowledge discovery in a supervised learning classifier system. Adv Artif Life, ECAL 12:110–117
Urbanowicz R (2014) ExSTraCS 2.0. Accessed 24 August 2014. URL http://sourceforge.net/projects/exstracs/
Urbanowicz R, Granizo-Mackenzie A, Moore J (2012) Instance-linked attribute tracking and feedback for michigan-style supervised learning classifier systems. In: Proceedings of the fourteenth international conference on genetic and evolutionary computation conference. ACM, New York, pp 927–934
Urbanowicz R, Moore J (2010) The application of michigan-style learning classifier systems to address genetic heterogeneity and epistasis in association studies. In: Proceedings of the 12th annual conference on genetic and evolutionary computation. ACM, New York, pp 195–202
Urbanowicz R, Moore J (2011) The application of pittsburgh-style lcs to address genetic heterogeneity and epistasis in association studies. Parallel problem solving from nature-PPSN XI, pp 404–413
Urbanowicz RJ, Andrew AS, Karagas MR, Moore JH (2013) Role of genetic heterogeneity and epistasis in bladder cancer susceptibility and outcome: a learning classifier system approach. J Am Med Inf Assoc
Urbanowicz RJ, Bertasius G, Moore JH (2014) An extended michigan-style learning classifier system for flexible supervised learning, classification, and data mining. In: Parallel problem solving from nature-PPSN XIII. Springer, New York
Urbanowicz RJ, Granizo-Mackenzie A, Moore JH (2012) An analysis pipeline with statistical and visualization-guided knowledge discovery for michigan-style learning classifier systems. IEEE Comput Intell Mag 7(4):35–45
Urbanowicz RJ, Granizo-Mackenzie D, Moore JH (2012) Using expert knowledge to guide covering and mutation in a michigan style learning classifier system to detect epistasis and heterogeneity. In: Parallel problem solving from nature-PPSN XII. Springer, New York, pp 266–275
Urbanowicz RJ, Kiralis J, Fisher JM, Moore JH (2012) Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection. BioData Min 5(1):1–13
Urbanowicz RJ, Kiralis J, Sinnott-Armstrong NA, Heberling T, Fisher JM, Moore JH (2012) Gametes: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures. BioData Min 5(1):16
Urbanowicz RJ, Moore JH (2009) Learning classifier systems: a complete introduction, review, and roadmap. J Artif Evol Appl
Velez DR, White BC, Motsinger AA, Bush WS, Ritchie MD, Williams SM, Moore JH (2007) A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet Epidemiol 31(4):306–315
Wilson S (1995) Classifier fitness based on accuracy. Evol Comput 3(2):149–175
Wilson SW (1987) Classifier systems and the animat problem. Mach Learn 2(3):199–228
Acknowledgments
This work was supported by NIH grants AI59694, LM009012, LM010098, EY022300, LM011360, CA134286, and GM103534.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Urbanowicz, R.J., Moore, J.H. ExSTraCS 2.0: description and evaluation of a scalable learning classifier system. Evol. Intel. 8, 89–116 (2015). https://doi.org/10.1007/s12065-015-0128-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12065-015-0128-8