Abstract
Personalized healthcare services use relational patient data and big data analytics to tailor medication recommendations. However, most healthcare data are unstructured, and converting them into relational form takes considerable time and effort. This study proposes a novel data lake architecture that reduces data ingestion time and improves the precision of healthcare analytics. The architecture also removes data silos and enhances analytics by allowing connectivity to third-party data providers (such as clinical laboratories, chemists, and insurance companies). It uses the Hadoop Distributed File System (HDFS) to store both structured and unstructured data. The study uses the K-means clustering algorithm to find clusters of patients with similar health conditions, and then employs a support vector machine to find the most successful healthcare recommendations for each cluster. Our experimental results demonstrate that the data lake reduces the time needed to ingest data from various data vendors, regardless of format. Moreover, the results indicate that the data lake has the potential to generate patient clusters more precisely than existing approaches. The data lake provides a unified storage location for data in its native format, and by removing data silos it can also improve personalized medication recommendations.
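The clustering step named in the abstract can be illustrated with a minimal sketch of K-means (Lloyd's algorithm) in plain Python. This is not the authors' implementation: the `kmeans` helper, the two features (an HbA1c value and a systolic blood pressure reading), and the six patient records are all hypothetical and chosen only to show how patients with similar measurements end up in the same cluster. The subsequent support vector machine step is not shown.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Hypothetical patient records: (HbA1c in %, systolic blood pressure in mmHg).
patients = [
    (5.4, 118), (5.6, 122), (5.5, 120),   # made-up lower-risk profiles
    (8.9, 150), (9.2, 155), (9.0, 148),   # made-up higher-risk profiles
]

centroids, labels = kmeans(patients, k=2)
for patient, label in zip(patients, labels):
    print(patient, "-> cluster", label)
```

In practice the features would be scaled before clustering, because unscaled K-means lets the feature with the largest numeric range dominate the distance.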
Keywords
- Electronic Health Record
- EHR
- Data lake
- Big data
- Personalized medication
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Rangarajan, S., Liu, H., Wang, H., Wang, CL. (2018). Scalable Architecture for Personalized Healthcare Service Recommendation Using Big Data Lake. In: Beheshti, A., Hashmi, M., Dong, H., Zhang, W. (eds) Service Research and Innovation. ASSRI 2015, ASSRI 2017. Lecture Notes in Business Information Processing, vol 234. Springer, Cham. https://doi.org/10.1007/978-3-319-76587-7_5
DOI: https://doi.org/10.1007/978-3-319-76587-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76586-0
Online ISBN: 978-3-319-76587-7
eBook Packages: Computer Science, Computer Science (R0)