Fuzzy rule based classification systems for big data with MapReduce: granularity analysis

Fernández, Alberto; del Río, Sara; Bawakid, Abdullah; Herrera, Francisco

doi:10.1007/s11634-016-0260-z

Regular Article
Published: 06 June 2016

Volume 11, pages 711–730, (2017)
Advances in Data Analysis and Classification

Alberto Fernández¹, Sara del Río¹, Abdullah Bawakid² & Francisco Herrera¹,²

Abstract

Due to the vast amount of information available nowadays, and the advantages related to the processing of this data, the topics of big data and data science have acquired great importance in current research. Big data applications are mainly about scalability, which can be achieved via the MapReduce programming model. It is designed to divide the data into several chunks or groups that are processed in parallel, and whose results are “assembled” to provide a single solution. Among the different classification paradigms adapted to this new framework, fuzzy rule based classification systems have shown interesting results with a MapReduce approach for big data. It is well known that the performance of these types of systems depends strongly on the selection of a good granularity level for the Data Base. However, in the context of MapReduce this parameter is even harder to determine, as it can also be related to the number of Maps chosen for the processing stage. In this paper, we aim at analyzing the interrelation between the number of labels of the fuzzy variables and the scarcity of the data due to the data sampling in MapReduce. Specifically, we consider that as the partitioning of the initial instance set grows, the level of granularity necessary to achieve good performance also becomes higher. The experimental results, carried out for several big data problems using the Chi-FRBCS-BigData algorithms, support our claims.
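
To make the two parameters discussed in the abstract concrete, the following is a minimal sketch of a MapReduce-style fuzzy rule learner, written for illustration only. It is not the Chi-FRBCS-BigData algorithm: it assumes features scaled to [0, 1], a uniform triangular partition per variable, and plain class counts in place of rule weights. The names `membership`, `best_label`, `map_chunk`, `reduce_rules`, `learn` and the toy `data` are hypothetical.

```python
from collections import Counter, defaultdict


def membership(x, k, n_labels):
    """Triangular membership of x in [0, 1] to label k of a uniform partition.

    Assumes n_labels >= 2; label k peaks at k / (n_labels - 1).
    """
    width = 1.0 / (n_labels - 1)
    return max(0.0, 1.0 - abs(x - k * width) / width)


def best_label(x, n_labels):
    """Index of the label with the highest membership for x."""
    return max(range(n_labels), key=lambda k: membership(x, k, n_labels))


def map_chunk(chunk, n_labels):
    """Map step: derive one candidate rule per example and count class votes."""
    votes = defaultdict(Counter)
    for features, label in chunk:
        antecedent = tuple(best_label(x, n_labels) for x in features)
        votes[antecedent][label] += 1
    return votes


def reduce_rules(partials):
    """Reduce step: merge per-chunk votes, keep the majority class per antecedent.

    Simplification: real systems fuse weighted rules rather than raw counts.
    """
    merged = defaultdict(Counter)
    for votes in partials:
        for antecedent, counter in votes.items():
            merged[antecedent].update(counter)
    return {a: c.most_common(1)[0][0] for a, c in merged.items()}


def learn(data, n_maps, n_labels):
    """Split the data into n_maps chunks, map each chunk, then reduce."""
    chunks = [data[i::n_maps] for i in range(n_maps)]
    return reduce_rules(map_chunk(chunk, n_labels) for chunk in chunks)


# Toy data set (hypothetical): two features in [0, 1], two classes.
data = [
    ((0.10, 0.20), "a"), ((0.15, 0.10), "a"),
    ((0.90, 0.80), "b"), ((0.85, 0.95), "b"),
    ((0.50, 0.50), "a"), ((0.45, 0.60), "a"),
]
rules = learn(data, n_maps=2, n_labels=3)
```

In this sketch `n_labels` plays the role of the granularity level and `n_maps` the number of Maps. With d features there are up to `n_labels ** d` possible antecedents, while each Map sees only about `len(data) / n_maps` examples, which is the interplay between granularity and data scarcity that the paper studies.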

Author information

Authors and Affiliations

Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
Alberto Fernández, Sara del Río & Francisco Herrera
Faculty of Computing and Information Technology, King Abdulaziz University (KAU), Jeddah, Saudi Arabia
Abdullah Bawakid & Francisco Herrera

Corresponding author

Correspondence to Alberto Fernández.

About this article

Cite this article

Fernández, A., del Río, S., Bawakid, A. et al. Fuzzy rule based classification systems for big data with MapReduce: granularity analysis. Adv Data Anal Classif 11, 711–730 (2017). https://doi.org/10.1007/s11634-016-0260-z

Received: 08 November 2015
Revised: 24 May 2016
Accepted: 25 May 2016
Published: 06 June 2016
Issue Date: December 2017
DOI: https://doi.org/10.1007/s11634-016-0260-z
