Abstract
Currently, we are witnessing a growing trend in the study and application of problems in the framework of Big Data. This is mainly due to the great advantages that come from extracting knowledge from high volumes of information. For this reason, we observe a migration of standard Data Mining systems towards a new functional paradigm that allows working with Big Data. By means of the MapReduce model and its different extensions, scalability can be successfully addressed while maintaining good fault tolerance during the execution of the algorithms. Among the different approaches used in Data Mining, models based on fuzzy systems stand out for many applications. Among their advantages, we must stress the use of a representation close to natural language. Additionally, they use an inference model that allows a good adaptation to different scenarios, especially those with a given degree of uncertainty. Despite the success of these systems, their migration to the Big Data environment in the different learning areas is still at a preliminary stage. In this paper, we will carry out an overview of the main existing proposals on the topic, analyzing the design of these models. Additionally, we will discuss the problems related to data distribution and the parallelization of current algorithms, as well as their relationship with the fuzzy representation of the information. Finally, we will provide our view on future expectations in this framework regarding the design of methods based on fuzzy sets, as well as the open challenges on the topic.
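To make the interaction between data distribution and fuzzy representation more concrete, the following Python sketch simulates, in a single process, a Chi-style fuzzy rule learner split into a map phase (each data partition learns its own linguistic rules) and a reduce phase (rules sharing the same antecedent are fused). It is only an illustration under stated assumptions, not the method of this paper or the API of Hadoop or Spark: the three uniform triangular labels on [0, 1], the certainty-factor rule weight, the weight-averaging fusion, and all function names (`map_phase`, `reduce_phase`, `classify`) are hypothetical choices made for this example.

```python
from collections import defaultdict

# Assumption: every feature is normalized to [0, 1] and covered by
# N_LABELS uniform triangular fuzzy sets (e.g. low / medium / high).
N_LABELS = 3


def membership(x, label, n_labels=N_LABELS):
    """Triangular membership of x in a linguistic label of a uniform partition."""
    width = 1.0 / (n_labels - 1)
    center = label * width
    return max(0.0, 1.0 - abs(x - center) / width)


def antecedent_of(x, n_labels=N_LABELS):
    """Chi-style antecedent: the label with the highest membership for each feature."""
    return tuple(max(range(n_labels), key=lambda lab: membership(v, lab, n_labels))
                 for v in x)


def matching(x, antecedent, n_labels=N_LABELS):
    """Matching degree of x with an antecedent, using the product t-norm."""
    degree = 1.0
    for v, label in zip(x, antecedent):
        degree *= membership(v, label, n_labels)
    return degree


def map_phase(chunk):
    """Learn a local rule base from one partition: antecedent -> (class, weight)."""
    votes = defaultdict(lambda: defaultdict(float))
    for x, y in chunk:
        ant = antecedent_of(x)
        votes[ant][y] += matching(x, ant)
    rules = {}
    for ant, per_class in votes.items():
        total = sum(per_class.values())
        best = max(per_class, key=per_class.get)
        rules[ant] = (best, per_class[best] / total)  # simple certainty-factor weight
    return rules


def reduce_phase(local_rule_bases):
    """Fuse local rule bases: for each antecedent, average the weights of each
    class over the partitions that proposed it, then keep the best class."""
    collected = defaultdict(lambda: defaultdict(list))
    for rule_base in local_rule_bases:
        for ant, (cls, w) in rule_base.items():
            collected[ant][cls].append(w)
    fused = {}
    for ant, per_class in collected.items():
        averaged = {cls: sum(ws) / len(ws) for cls, ws in per_class.items()}
        best = max(averaged, key=averaged.get)
        fused[ant] = (best, averaged[best])
    return fused


def classify(rule_base, x):
    """Single-winner inference: the rule maximizing matching * weight decides."""
    best_cls, best_score = None, 0.0
    for ant, (cls, w) in rule_base.items():
        score = matching(x, ant) * w
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


# Usage: two simulated "map" partitions of a toy two-feature dataset.
data = [((0.1, 0.2), "A"), ((0.15, 0.1), "A"), ((0.9, 0.8), "B"),
        ((0.85, 0.95), "B"), ((0.2, 0.05), "A"), ((0.95, 0.9), "B")]
chunks = [data[i::2] for i in range(2)]
local_bases = list(map(map_phase, chunks))
rb = reduce_phase(local_bases)
print(classify(rb, (0.12, 0.15)))  # A
print(classify(rb, (0.9, 0.9)))    # B
```

Because each map task sees only part of the data, the same antecedent may receive different consequents or weights in different partitions, and the reduce step must resolve these conflicts. Averaging is just one possible fusion policy; the choice affects the final rule base, which is one concrete way in which data distribution interacts with the fuzzy representation.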
Rights and permissions
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
About this article
Cite this article
Fernández, A., Carmona, C.J., del Jesus, M.J. et al. A View on Fuzzy Systems for Big Data: Progress and Opportunities. Int J Comput Intell Syst 9 (Suppl 1), 69–80 (2016). https://doi.org/10.1080/18756891.2016.1180820
DOI: https://doi.org/10.1080/18756891.2016.1180820