A View on Fuzzy Systems for Big Data: Progress and Opportunities

Fernández, Alberto; Carmona, Cristobal José; del Jesus, María José; Herrera, Francisco

doi:10.1080/18756891.2016.1180820

Research Article
Open access
Published: 01 April 2016

Volume 9, pages 69–80 (2016)
International Journal of Computational Intelligence Systems

Alberto Fernández¹,
Cristobal José Carmona²,
María José del Jesus¹ &
Francisco Herrera³,⁴

Abstract

Currently, we are witnessing a growing trend in the study and application of problems in the framework of Big Data. This is mainly due to the great advantages that come from extracting knowledge from high volumes of information. For this reason, we observe a migration of standard Data Mining systems towards a new functional paradigm that allows working with Big Data. By means of the MapReduce model and its different extensions, scalability can be successfully addressed while maintaining good fault tolerance during the execution of the algorithms. Among the different approaches used in Data Mining, models based on fuzzy systems stand out in many applications. Among their advantages, we must stress the use of a representation close to natural language. Additionally, they use an inference model that adapts well to different scenarios, especially those with a given degree of uncertainty. Despite the success of this type of system, its migration to the Big Data environment in the different learning areas is still at a preliminary stage. In this paper, we carry out an overview of the main existing proposals on the topic, analyzing the design of these models. Additionally, we discuss the problems related to data distribution and the parallelization of current algorithms, as well as their relationship with the fuzzy representation of the information. Finally, we provide our view on future expectations in this framework with respect to the design of methods based on fuzzy sets, as well as the open challenges on the topic.
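To make the MapReduce idea mentioned above more concrete, the following Python sketch simulates, in a single process, how a simple fuzzy rule-based classifier could be learned in map and reduce phases. It is an illustrative assumption, loosely inspired by Chi-style rule generation, and not the method of any specific proposal reviewed in the article. Features are assumed to be scaled to [0, 1] and covered by uniformly spaced triangular fuzzy sets, each map task emits one candidate rule per example keyed by its antecedent, and the reduce task resolves conflicting consequents by summed matching degree. All function names (`memberships`, `map_phase`, `shuffle`, `reduce_phase`, `learn`, `classify`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Sequence, Tuple

Antecedent = Tuple[int, ...]           # one linguistic-label index per feature
Example = Tuple[Sequence[float], int]  # (features scaled to [0, 1], class label)
Rule = Tuple[Antecedent, int, float]   # (antecedent, consequent class, rule weight)


def memberships(x: float, n_labels: int) -> List[float]:
    """Degrees of x in n_labels uniformly spaced triangular fuzzy sets on [0, 1]."""
    step = 1.0 / (n_labels - 1)
    return [max(0.0, 1.0 - abs(x - k * step) / step) for k in range(n_labels)]


def map_phase(partition: Iterable[Example], n_labels: int) -> Iterator[Tuple[Antecedent, Tuple[int, float]]]:
    """Map: for every example emit (antecedent, (class, matching degree))."""
    for features, label in partition:
        degrees = [memberships(x, n_labels) for x in features]
        # The best-matching linguistic label for each feature forms the antecedent.
        antecedent = tuple(max(range(n_labels), key=d.__getitem__) for d in degrees)
        match = 1.0
        for d, k in zip(degrees, antecedent):
            match *= d[k]  # product t-norm
        yield antecedent, (label, match)


def shuffle(pairs: Iterable[Tuple[Antecedent, Tuple[int, float]]]) -> Dict[Antecedent, List[Tuple[int, float]]]:
    """Group intermediate values by key, as the MapReduce runtime would."""
    groups: Dict[Antecedent, List[Tuple[int, float]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(antecedent: Antecedent, values: List[Tuple[int, float]]) -> Rule:
    """Reduce: the class with the largest summed matching degree becomes the
    consequent; its share of the total is used as the rule weight."""
    totals: Dict[int, float] = defaultdict(float)
    for label, match in values:
        totals[label] += match
    best = max(totals, key=totals.get)
    total = sum(totals.values()) or 1.0  # guard against all-zero degrees
    return antecedent, best, totals[best] / total


def learn(partitions: Iterable[Iterable[Example]], n_labels: int = 3) -> List[Rule]:
    """Run map over every partition, shuffle, then reduce each key."""
    pairs = (pair for part in partitions for pair in map_phase(part, n_labels))
    return [reduce_phase(key, vals) for key, vals in shuffle(pairs).items()]


def classify(rules: List[Rule], features: Sequence[float], n_labels: int = 3) -> Optional[int]:
    """Single-winner inference: class of the rule with the highest
    (matching degree x rule weight); None if no rule fires."""
    degrees = [memberships(x, n_labels) for x in features]
    best_label: Optional[int] = None
    best_score = 0.0
    for antecedent, label, weight in rules:
        score = weight
        for d, k in zip(degrees, antecedent):
            score *= d[k]
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, with the two partitions `[[([0.1, 0.1], 0), ([0.9, 0.8], 1)], [([0.0, 0.2], 0), ([1.0, 0.9], 1)]]`, `learn` yields two rules, and `classify(rules, [0.05, 0.15])` returns class 0. Because the reduce step sums matching degrees per antecedent across all partitions, the rule base in this sketch does not depend on how the data are split; designs that instead learn a full rule base per partition and fuse the results afterwards can be sensitive to the partitioning, which relates to the data-distribution issues raised in the abstract.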

Author information

Authors and Affiliations

Department of Computer Science, University of Jaén, 23071 Jaén, Spain
Alberto Fernández & María José del Jesus
Department of Civil Engineering, University of Burgos, 09006 Burgos, Spain
Cristobal José Carmona
Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), University of Granada, 18071 Granada, Spain
Francisco Herrera
Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Francisco Herrera

Corresponding author

Correspondence to Alberto Fernández.

Rights and permissions

This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

About this article

Cite this article

Fernández, A., Carmona, C.J., del Jesus, M.J. et al. A View on Fuzzy Systems for Big Data: Progress and Opportunities. Int J Comput Intell Syst 9 (Suppl 1), 69–80 (2016). https://doi.org/10.1080/18756891.2016.1180820

Received: 08 February 2016
Accepted: 12 March 2016
Issue Date: January 2016
DOI: https://doi.org/10.1080/18756891.2016.1180820
