The state of the art in associative classification includes interesting approaches for building accurate and interpretable classifiers. These approaches generally work in four phases (data discretization, pattern mining, rule mining, and classifier building), some of which are computationally expensive. The aim of this work is to propose a novel evolutionary algorithm for efficiently building associative classifiers in Big Data. The proposed model works in only two phases, each of which runs a grammar-guided genetic programming framework: (1) mining reliable association rules; (2) building an accurate classifier by ranking and combining the previously mined rules. The proposal has been implemented on different architectures (multi-thread, Apache Spark, and Apache Flink) to take advantage of distributed computing. The experimental results were obtained on 40 well-known datasets and analyzed through non-parametric tests. Results were compared to multiple approaches in the field and analyzed in three ways: quality of the predictions, level of interpretability, and efficiency. The proposed method obtained accurate and interpretable classifiers efficiently, even on high-dimensional data, and outperformed the state-of-the-art algorithms on all three criteria.
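To make the two phases concrete, the following is a minimal sketch of generic associative classification on a toy dataset. It is not the grammar-guided genetic programming algorithm proposed in the article: the candidate rules are written by hand instead of being evolved, and the ranking uses plain confidence and support, in the spirit of classic ranked-rule classifiers. All names (`Rule`, `support_confidence`, `build_classifier`, `predict`), the `min_conf` threshold, and the data are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple

Record = Dict[str, str]
Dataset = List[Tuple[Record, str]]  # (attribute values, class label)


class Rule(NamedTuple):
    antecedent: Dict[str, str]  # attribute -> required value
    consequent: str             # predicted class label


def matches(rule: Rule, record: Record) -> bool:
    """True if the record satisfies every condition of the antecedent."""
    return all(record.get(attr) == val for attr, val in rule.antecedent.items())


def support_confidence(rule: Rule, data: Dataset) -> Tuple[float, float]:
    """Support = fraction of all records matching antecedent and class;
    confidence = fraction of covered records that have the rule's class."""
    covered = [label for rec, label in data if matches(rule, rec)]
    hits = sum(1 for label in covered if label == rule.consequent)
    support = hits / len(data)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def build_classifier(rules: List[Rule], data: Dataset,
                     min_conf: float = 0.6) -> List[Rule]:
    """Keep rules above a confidence threshold (hypothetical value) and
    rank them by confidence, then support, highest first."""
    scored = [(rule, *support_confidence(rule, data)) for rule in rules]
    kept = [item for item in scored if item[2] >= min_conf]
    kept.sort(key=lambda item: (item[2], item[1]), reverse=True)
    return [rule for rule, _, _ in kept]


def predict(classifier: List[Rule], record: Record, default: str) -> str:
    """Return the class of the first ranked rule that covers the record."""
    for rule in classifier:
        if matches(rule, record):
            return rule.consequent
    return default


# Toy data and hand-written candidate rules (illustrative only).
data: Dataset = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
]
candidates = [
    Rule({"outlook": "sunny"}, "play"),
    Rule({"windy": "yes"}, "stay"),
    Rule({"outlook": "rainy", "windy": "yes"}, "stay"),
    Rule({"windy": "no"}, "play"),
]

classifier = build_classifier(candidates, data)
label = predict(classifier, {"outlook": "rainy", "windy": "yes"}, default="play")
```

In this sketch the rule `windy = yes -> stay` is discarded because only half of the records it covers belong to the class `stay`, while the more specific rule on `outlook` and `windy` is kept. An ordered rule list of this kind is what makes such classifiers easy to inspect: each prediction can be traced back to a single rule.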
This research was financially supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund, project TIN2017-83445-P.
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Padillo, F., Luna, J.M. & Ventura, S. A Grammar-Guided Genetic Programing Algorithm for Associative Classification in Big Data. Cogn Comput 11, 331–346 (2019). https://doi.org/10.1007/s12559-018-9617-2