Big data analytics enhanced healthcare systems: a review

  • Sarah Shafqat
  • Saira Kishwer
  • Raihan Ur Rasool
  • Junaid Qadir
  • Tehmina Amjad
  • Hafiz Farooq Ahmad


There is growing interest in deploying big data technology in the healthcare industry to manage massive collections of heterogeneous health data, such as electronic health records and sensor data, which are growing in volume and variety as digital devices such as mobile phones and wireless sensors become commodities. Traditional healthcare software and hardware paradigms are ill-equipped to cope with the volume and diversity of modern health data, so the modern healthcare system must overhaul them and augment them with new “big data” computing and analysis capabilities. For researchers, healthcare data analytics offers an opportunity to study this vast amount of data, find patterns and trends within it, and provide solutions for improving healthcare, thereby reducing costs, democratizing health access, and saving valuable human lives. In this paper, we present a comprehensive survey of healthcare systems that integrate big data analytics and describe the applicable healthcare data analytics algorithms, techniques, and tools that may be deployed in wireless, cloud, and Internet of Things settings. Finally, as our contribution, we propose SmartHealth as a convergence point of all these platforms, which could contribute to a unified standard learning healthcare system in the future.


Healthcare analytics, Big data, Cloud computing, Knowledge management, Learning healthcare system



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. International Islamic University, IIUI, Islamabad, Pakistan
  2. Victoria University, Melbourne, Australia
  3. Information Technology University (ITU), Lahore, Pakistan
  4. College of Computer Sciences and Information Technology (CCSIT), King Faisal University, Alahssa, Saudi Arabia
