Abstract
Big data mining is an excellent source of information and knowledge, from systems to end users. However, managing such volumes of data and knowledge requires automation, which calls for serious consideration of machine learning algorithms. Machine learning supports decision-making when no established way of solving a problem can be found in previous knowledge bases, and it is also one of the most widely used analysis and modeling tools for this purpose. In this work, we present an in-depth study that helps choose the best machine learning algorithms for processing big data and extracting knowledge from it, so that this processing remains flexible, whether in a simple system with sequential computing or in a distributed system with parallel computing. To achieve this, we first test the accuracy of the results produced by the classifiers, by which we mean the robustness and flexibility of a classifier when dealing with big data mining. Second, we test the execution speed of each classifier in complex cases, that is, cases where a classifier may not be sufficient to solve a particular big data mining problem, especially when every case must be handled quickly and efficiently. The results show that certain classifiers outperform others in some cases and fail in others, depending on the nature of the dataset, in particular the number of instances, the number of attributes, and the number of classes.
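The two-part evaluation described in the abstract (accuracy first, then execution speed, on datasets characterised by their numbers of instances, attributes and classes) can be sketched as a small benchmarking harness. This is a minimal sketch under our own assumptions, not the authors' experimental setup: the synthetic data generator `make_dataset`, the hold-out split in `evaluate`, and the two toy classifiers (nearest centroid and 1-nearest-neighbour) are hypothetical illustrations, and timing is measured with `time.perf_counter` in a single sequential process.

```python
import random
import time
from collections import defaultdict


def make_dataset(n_instances, n_attributes, n_classes, seed=0):
    """Hypothetical synthetic data: one Gaussian blob per class around a random centre."""
    rng = random.Random(seed)
    centres = [[rng.uniform(-5, 5) for _ in range(n_attributes)] for _ in range(n_classes)]
    X, y = [], []
    for i in range(n_instances):
        label = i % n_classes  # round-robin so every class is represented
        X.append([m + rng.gauss(0, 1) for m in centres[label]])
        y.append(label)
    return X, y


def sq_dist(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_centroid(X_train, y_train, X_test):
    """Toy classifier: predict the class whose mean training vector is closest."""
    groups = defaultdict(list)
    for row, label in zip(X_train, y_train):
        groups[label].append(row)
    centroids = {lab: [sum(col) / len(rows) for col in zip(*rows)] for lab, rows in groups.items()}
    return [min(centroids, key=lambda lab: sq_dist(x, centroids[lab])) for x in X_test]


def one_nn(X_train, y_train, X_test):
    """Toy classifier: predict the label of the single closest training instance (1-NN)."""
    return [y_train[min(range(len(X_train)), key=lambda i: sq_dist(x, X_train[i]))] for x in X_test]


def evaluate(classifiers, X, y, test_fraction=0.3, seed=0):
    """Hold-out evaluation: return {name: (accuracy, seconds)} for each classifier."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut = len(idx) - max(1, round(len(idx) * test_fraction))  # keep at least one test row
    X_tr, y_tr = [X[i] for i in idx[:cut]], [y[i] for i in idx[:cut]]
    X_te, y_te = [X[i] for i in idx[cut:]], [y[i] for i in idx[cut:]]
    results = {}
    for name, clf in classifiers.items():
        start = time.perf_counter()
        predictions = clf(X_tr, y_tr, X_te)  # training and prediction are timed together
        elapsed = time.perf_counter() - start
        accuracy = sum(p == t for p, t in zip(predictions, y_te)) / len(y_te)
        results[name] = (accuracy, elapsed)
    return results


if __name__ == "__main__":
    X, y = make_dataset(n_instances=600, n_attributes=8, n_classes=3)
    report = evaluate({"nearest-centroid": nearest_centroid, "1-NN": one_nn}, X, y)
    for name, (acc, secs) in report.items():
        print(f"{name:18s} accuracy={acc:.3f} time={secs:.4f}s")
```

Varying `n_instances`, `n_attributes` and `n_classes` in `make_dataset` is one way to observe how the ranking of classifiers can change with the shape of the dataset; the 1-NN sketch, for instance, scans the whole training set for every prediction, so its running time grows much faster with the number of instances than that of the nearest-centroid sketch.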
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Djafri, L., Gafour, Y. (2022). Machine Learning Algorithms for Big Data Mining Processing: A Review. In: Lejdel, B., Clementini, E., Alarabi, L. (eds) Artificial Intelligence and Its Applications. AIAP 2021. Lecture Notes in Networks and Systems, vol 413. Springer, Cham. https://doi.org/10.1007/978-3-030-96311-8_5
DOI: https://doi.org/10.1007/978-3-030-96311-8_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96310-1
Online ISBN: 978-3-030-96311-8
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)