Abstract
Big data analytics (BDA) using machine learning plays an important role in deriving actionable insights from huge volumes of data for different organizations, especially in healthcare. Healthcare data is growing significantly due to advances in technology. Acquiring, storing, analyzing, and visualizing healthcare data is challenging, mainly because of the diversity of data sources such as clinical records, electronic health records, sensing data, social media data, medical image data, and omics data. The complexity and heterogeneity of healthcare data call for proper management and analytical procedures to provide better solutions to real-world healthcare problems. This chapter discusses various platforms that can handle healthcare data, along with different algorithms based on machine learning (ML) techniques. It reviews big data lifecycle challenges such as storage, processing, data sharing, security, and privacy. The study also presents and implements a framework for big data analytics that can be explored for real-time disease prediction, using diabetes as the showcase. The model can also be validated on other disease-prediction tasks, such as cancer detection, heart disease prediction, and Alzheimer’s detection, which share commonalities in data with diabetes.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Ganie, S.M., Malik, M.B., Arif, T. (2022). Machine Learning Techniques for Big Data Analytics in Healthcare: Current Scenario and Future Prospects. In: Choudhury, T., Katal, A., Um, JS., Rana, A., Al-Akaidi, M. (eds) Telemedicine: The Computer Transformation of Healthcare. TELe-Health. Springer, Cham. https://doi.org/10.1007/978-3-030-99457-0_6
DOI: https://doi.org/10.1007/978-3-030-99457-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99456-3
Online ISBN: 978-3-030-99457-0
eBook Packages: Medicine, Medicine (R0)