Machine Learning Techniques for Big Data Analytics in Healthcare: Current Scenario and Future Prospects

Telemedicine: The Computer Transformation of Healthcare

Part of the book series: TELe-Health (TEHE)

Abstract

Big data analytics (BDA) using machine learning plays an important role in deriving actionable insights from huge volumes of data for different organizations, especially in healthcare. Healthcare data is growing rapidly owing to advances in technology. Acquiring, storing, analyzing, and visualizing this data is challenging, mainly because of the diversity of its sources: clinical records, electronic health records, sensor data, social media data, medical images, omics data, and so on. The complexity and heterogeneity of healthcare data call for proper management and analytical procedures to provide better solutions to real-world healthcare problems. This chapter discusses various platforms that can handle healthcare data, along with different algorithms based on machine learning (ML) techniques. It reviews big data lifecycle challenges such as storage, processing, data sharing, security, and privacy. The study also presents and implements a framework for big data analytics that can be explored for real-time disease prediction, using diabetes as a showcase. The model can also be validated on other disease-prediction tasks, such as cancer detection, heart disease prediction, and Alzheimer’s detection, whose data share common characteristics with the diabetes data used here.

Author information

Correspondence to Majid Bashir Malik.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Ganie, S.M., Malik, M.B., Arif, T. (2022). Machine Learning Techniques for Big Data Analytics in Healthcare: Current Scenario and Future Prospects. In: Choudhury, T., Katal, A., Um, JS., Rana, A., Al-Akaidi, M. (eds) Telemedicine: The Computer Transformation of Healthcare. TELe-Health. Springer, Cham. https://doi.org/10.1007/978-3-030-99457-0_6

  • DOI: https://doi.org/10.1007/978-3-030-99457-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99456-3

  • Online ISBN: 978-3-030-99457-0

  • eBook Packages: Medicine, Medicine (R0)
