Machine Learning Algorithms for Big Data Mining Processing: A Review

  • Conference paper
Artificial Intelligence and Its Applications (AIAP 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 413)

Abstract

Big data mining is a rich source of information and knowledge that flows from systems to end users. However, managing data and knowledge at this scale requires automation, which makes machine learning algorithms a serious candidate. Machine learning supports decision-making when existing knowledge bases offer no established way to solve a problem, and it is one of the most widely used analysis and modeling tools for this purpose. In this work, we present an in-depth study to help choose the most suitable machine learning algorithms for processing big data and extracting knowledge from it, so that the processing remains flexible both on a simple system with sequential computing and on a distributed system with parallel computing. To this end, we first test the accuracy of the results produced by each classifier, by which we mean the strength and flexibility of the classifier when dealing with big data mining. Second, we test the execution speed of each classifier in complex cases, that is, cases in which a classifier may not be sufficient to solve a particular big data mining problem, especially when all cases must be handled quickly and efficiently. The results show that certain classifiers outperform others in some cases and fail in others, depending on the nature of the dataset, in particular the number of instances, the number of attributes, and the number of classes.
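
The two criteria named in the abstract, accuracy and execution speed, can be illustrated with a small benchmarking harness. The following is only a minimal sketch under stated assumptions: the synthetic data generator, the two toy classifiers (`MajorityClass` and `NearestCentroid`), and the helper names `make_dataset`, `evaluate`, and `benchmark` are all hypothetical stand-ins chosen for illustration. They are not the classifiers, datasets, or tooling evaluated in the paper.

```python
import random
import time
from collections import Counter


def make_dataset(n_instances, n_attributes, n_classes, seed=0):
    """Synthetic data (assumption): class k is Gaussian noise around 3*k."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_instances):
        label = i % n_classes  # labels cycle, so every class appears in any split
        centre = 3.0 * label
        X.append([centre + rng.gauss(0.0, 1.0) for _ in range(n_attributes)])
        y.append(label)
    return X, y


class MajorityClass:
    """Baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_] * len(X)


class NearestCentroid:
    """Predicts the class whose mean training vector is closest."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def closest(row):
            return min(
                self.centroids_,
                key=lambda label: sum(
                    (a - b) ** 2 for a, b in zip(row, self.centroids_[label])
                ),
            )

        return [closest(row) for row in X]


def evaluate(model, X_train, y_train, X_test, y_test):
    """Return (accuracy, seconds spent on fit + predict) for one classifier."""
    start = time.perf_counter()
    predictions = model.fit(X_train, y_train).predict(X_test)
    elapsed = time.perf_counter() - start
    correct = sum(p == t for p, t in zip(predictions, y_test))
    return correct / len(y_test), elapsed


def benchmark(n_instances, n_attributes, n_classes):
    """Compare classifiers on one dataset shape using an 80/20 split."""
    X, y = make_dataset(n_instances, n_attributes, n_classes)
    split = int(0.8 * n_instances)
    models = [MajorityClass(), NearestCentroid()]
    return {
        type(m).__name__: evaluate(m, X[:split], y[:split], X[split:], y[split:])
        for m in models
    }


if __name__ == "__main__":
    for name, (accuracy, seconds) in benchmark(1000, 4, 3).items():
        print(f"{name}: accuracy={accuracy:.3f}, time={seconds:.4f}s")
```

Calling `benchmark` with different values for the number of instances, attributes, and classes is one way to observe how both measurements shift with the shape of the dataset, which is the dependence the abstract points to.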

Notes

  1. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html

  2. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Djafri, L., Gafour, Y. (2022). Machine Learning Algorithms for Big Data Mining Processing: A Review. In: Lejdel, B., Clementini, E., Alarabi, L. (eds) Artificial Intelligence and Its Applications. AIAP 2021. Lecture Notes in Networks and Systems, vol 413. Springer, Cham. https://doi.org/10.1007/978-3-030-96311-8_5
