Tools for Healthcare Data Lake Infrastructure Benchmarking

Dolci, Tommaso; Amata, Lorenzo; Manco, Carlo; Azzalini, Fabio; Gribaudo, Marco; Tanca, Letizia

doi:10.1007/s10796-023-10468-5

Published: 17 January 2024

Information Systems Frontiers (2024)

Tommaso Dolci (ORCID: orcid.org/0000-0002-1403-7766)¹, Lorenzo Amata¹, Carlo Manco¹, Fabio Azzalini¹, Marco Gribaudo¹ & Letizia Tanca¹


Abstract

Vast amounts of medical data are generated every day and constitute a crucial asset for improving therapy outcomes and medical treatments and for reducing healthcare costs. Data lakes are a valuable solution for managing and analyzing such varied and abundant data, yet to date there is no data lake architecture specifically designed for the healthcare domain. Moreover, benchmarking the underlying infrastructure of data lakes is fundamental for optimizing resource allocation and performance, increasing the potential of this kind of data platform. This work describes a data lake architecture to ingest, store, process, and analyze heterogeneous medical data. We also present a benchmark for infrastructures supporting healthcare data lakes, focusing on a variety of analysis tasks, from relational analysis to machine learning. The benchmark is tested on a virtualized implementation of our data lake architecture and on two external cloud-based infrastructures. Our results highlight distinctions between infrastructures and tasks of different nature, according to the machine learning techniques, data sizes and formats involved.

Data Availability

MIMIC-III Dataset is available with credentialed access on the PhysioNet website: https://physionet.org/content/mimiciii, and the MIMIC-III Waveform Database alone at: https://physionet.org/content/mimic3wdb-matched. The remaining datasets are freely available online. Stroke Prediction Dataset at: https://kaggle.com/datasets/fedesoriano/stroke-prediction-dataset. ICU Patients Mortality Prediction Dataset at: https://kaggle.com/datasets/msafi04/predict-mortality-of-icu-patients-physionet and from PhysioNet: https://physionet.org/content/challenge-2012. Brain MRI Images Dataset at: https://kaggle.com/datasets/navoneel/brain-mri-images-for-brain-tumor-detection. MIT-BIH Arrhythmia Database at: https://physionet.org/physiobank/database/mitdb. MIT-BIH Normal Sinus Rhythm Database at: https://physionet.org/physiobank/database/nsrdb. BIDMC Congestive Heart Failure Database at: https://physionet.org/physiobank/database/chfdb.

Code Availability

Code regarding the tasks included in the benchmark is available at: https://github.com/TommasoD/SEASHELL. The proof-of-concept implementation of the data lake architecture is available at: https://github.com/MancoCarlo/healer-prototype.

Notes

https://hadoop.apache.org
https://nifi.apache.org
https://kafka.apache.org
https://pypi.org/project/hdfs
https://atlas.apache.org
https://ranger.apache.org
https://www.docker.com
https://kubernetes.io
The proof-of-concept implementation of the data lake architecture is available at: https://github.com/MancoCarlo/healer-prototype.
Code from the benchmark tasks is available at: https://github.com/TommasoD/SEASHELL.
https://physionet.org/content/mimiciii
https://kaggle.com/datasets/fedesoriano/stroke-prediction-dataset
https://physionet.org/content/challenge-2012
https://physionet.org/physiobank/database/mitdb
https://physionet.org/physiobank/database/nsrdb
https://physionet.org/physiobank/database/chfdb
https://kaggle.com/datasets/navoneel/brain-mri-images-for-brain-tumor-detection
https://spark.apache.org/sql
https://keras.io/api
https://tensorflow.org
https://www.databricks.com/product/faq/community-edition
https://colab.research.google.com
https://www.netdata.cloud
While a computer has system RAM, most contemporary graphics cards have access to a dedicated set of memory known as Video RAM, or VRAM.

Acknowledgements

We are grateful to Enrico Barbierato and Giuseppe Serazzi for their advice during the definition and realization of this work, and for their support in revising the paper.

Funding

This work has been partially supported by the Health Big Data Project (CCR-2018-23669122), funded by the Italian Ministry of Economy and Finance and coordinated by the Italian Ministry of Health and the network Alleanza Contro il Cancro.

Author information

Authors and Affiliations

Dep. of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, Milano, 20133, Italy
Tommaso Dolci, Lorenzo Amata, Carlo Manco, Fabio Azzalini, Marco Gribaudo & Letizia Tanca

Contributions

All authors contributed to the definition and the design of this research. The data lake architecture was mainly created and implemented by Carlo Manco, with contributions from Tommaso Dolci, Fabio Azzalini, Marco Gribaudo and Letizia Tanca. The implementation and testing of the benchmark were mainly conducted by Lorenzo Amata, with contributions from Tommaso Dolci, Fabio Azzalini, Marco Gribaudo and Letizia Tanca. The first draft of the manuscript was written by Tommaso Dolci, Carlo Manco and Lorenzo Amata, and all authors contributed to the revision and improvement of the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tommaso Dolci.

Ethics declarations

Ethics Approval and Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Conflict of Interest

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dolci, T., Amata, L., Manco, C. et al. Tools for Healthcare Data Lake Infrastructure Benchmarking. Inf Syst Front (2024). https://doi.org/10.1007/s10796-023-10468-5

Accepted: 21 December 2023
Published: 17 January 2024
DOI: https://doi.org/10.1007/s10796-023-10468-5
