Scalable Architecture for Personalized Healthcare Service Recommendation Using Big Data Lake

Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 234)

Abstract

Personalized healthcare services use relational patient data and big data analytics to tailor medication recommendations. However, most healthcare data are unstructured, and converting them into relational form takes considerable time and effort. This study proposes a novel data lake architecture to reduce data ingestion time and improve the precision of healthcare analytics. The architecture also removes data silos and enhances analytics by allowing connectivity to third-party data providers (such as clinical labs, chemists, and insurance companies). It uses the Hadoop Distributed File System (HDFS) to store both structured and unstructured data. The study uses the K-means clustering algorithm to find clusters of patients with similar health conditions. It then employs a support vector machine to find the most successful healthcare recommendations for each cluster. Our experimental results demonstrate that the data lake reduces the time needed to ingest data from various data vendors, regardless of format. Moreover, the results indicate that the data lake has the potential to generate patient clusters more precisely than existing approaches. The data lake provides a unified storage location for data in its native format, and by removing data silos it can also improve personalized medication recommendations.
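
The two-stage analytics step described in the abstract (K-means to group patients, then a support vector machine per cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature values, the outcome labels (+1 meaning a recommendation succeeded, -1 meaning it did not), the farthest-first centroid initialisation, the sub-gradient training of a linear SVM, and all function names are assumptions made for illustration, and the sketch runs in plain Python rather than on HDFS.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's K-means; returns (centroids, labels)."""
    # Assumption: farthest-first initialisation, chosen only to keep the sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: each patient goes to the nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Linear SVM trained by sub-gradient descent on the regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # L2 shrinkage
            if margin < 1:                          # hinge loss is active
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

# Hypothetical patients: two numeric features each (values invented for illustration).
patients = [(0.0, 1.0), (1.0, 1.0), (3.0, 1.0), (4.0, 1.0),
            (20.0, 1.0), (21.0, 1.0), (23.0, 1.0), (24.0, 1.0)]
# Hypothetical outcomes of one recommendation: +1 success, -1 no success.
outcomes = [-1, -1, 1, 1, 1, 1, -1, -1]

centroids, labels = kmeans(patients, k=2)

# One SVM per cluster, trained on features centred on the cluster centroid.
models = {}
for c, centre in enumerate(centroids):
    idx = [i for i, l in enumerate(labels) if l == c]
    X = [tuple(a - m for a, m in zip(patients[i], centre)) for i in idx]
    models[c] = train_linear_svm(X, [outcomes[i] for i in idx])

def predict(patient):
    """Return (cluster, predicted outcome) for a new patient."""
    c = min(range(len(centroids)), key=lambda j: sq_dist(patient, centroids[j]))
    w, b = models[c]
    score = sum(wj * (a - m) for wj, a, m in zip(w, patient, centroids[c])) + b
    return c, (1 if score >= 0 else -1)
```

Calling `predict((3.5, 1.0))` assigns the new patient to the first cluster and applies that cluster's model. Training a separate classifier per cluster lets the same feature value lead to different predictions in different patient groups, which is the point of clustering before recommending.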

Keywords

  • Electronic Health Record
  • EHR
  • Data lake
  • Big data
  • Personalized medication


Corresponding author

Correspondence to Sarathkumar Rangarajan.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Rangarajan, S., Liu, H., Wang, H., Wang, CL. (2018). Scalable Architecture for Personalized Healthcare Service Recommendation Using Big Data Lake. In: Beheshti, A., Hashmi, M., Dong, H., Zhang, W. (eds) Service Research and Innovation. ASSRI 2015, ASSRI 2017. Lecture Notes in Business Information Processing, vol 234. Springer, Cham. https://doi.org/10.1007/978-3-319-76587-7_5

  • DOI: https://doi.org/10.1007/978-3-319-76587-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76586-0

  • Online ISBN: 978-3-319-76587-7

  • eBook Packages: Computer Science, Computer Science (R0)