Design and Development of a Medical Big Data Processing System Based on Hadoop

  • Transactional Processing Systems
Journal of Medical Systems

Abstract

Secondary use of medical big data is increasingly popular in healthcare services and clinical research. Understanding the logic behind medical big data reveals trends in hospital information technology and is of great significance for hospital information systems that are designing and expanding services. Big data has four characteristics (Volume, Variety, Velocity and Value, the 4 Vs) that make traditional standalone systems incapable of processing these data. Apache Hadoop MapReduce is a promising software framework for developing applications that process vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. With the Hadoop framework and the MapReduce application program interface (API), we can more easily develop our own MapReduce applications that scale from a single node to thousands of machines. This paper investigates a practical case of a Hadoop-based medical big data processing system. We developed this system to intelligently process medical big data and uncover features of hospital information system user behaviors. This paper studies user behaviors across the various data that different hospital information systems produce in daily work. We also built a five-node Hadoop cluster to execute distributed MapReduce algorithms. Compared with a single node, our distributed algorithms show promise in facilitating efficient processing of medical big data in healthcare services and clinical research. Additionally, with medical big data analytics, we can design our hospital information systems to be much more intelligent and easier to use by making personalized recommendations.
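
To make the MapReduce model mentioned in the abstract concrete, the following is a minimal in-process sketch in Python of the map, shuffle and reduce steps applied to user behavior counting. It is not the system described in this paper and does not use the Hadoop API: the tab-separated log format, the sample records and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical log format (an assumption, not taken from the paper):
# "user_id<TAB>system<TAB>action"
SAMPLE_LOG = [
    "u01\tEMR\topen_record",
    "u02\tLIS\tview_result",
    "u01\tEMR\topen_record",
    "u01\tCPOE\tplace_order",
    "u02\tLIS\tview_result",
]

Key = tuple[str, str]


def map_phase(line: str) -> Iterator[tuple[Key, int]]:
    """Map step: emit ((user, action), 1) for one log line."""
    user, _system, action = line.split("\t")
    yield (user, action), 1


def shuffle(pairs: Iterable[tuple[Key, int]]) -> dict[Key, list[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: dict[Key, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: Key, values: list[int]) -> tuple[Key, int]:
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[Key, int]:
    """Run map, shuffle and reduce sequentially in a single process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(SAMPLE_LOG)
for (user, action), n in sorted(counts.items()):
    print(user, action, n)
```

On a real cluster the framework, rather than `run_job`, would distribute the map and reduce calls across nodes and perform the shuffle over the network; the per-record logic keeps the same shape.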

Acknowledgments

This work was supported by the National Natural Science Foundation (Grant No. 61173127), National High-tech R&D Program (No. 2013AA041201), and the Zhejiang University Top Disciplinary Partnership Program.

Author information

Corresponding author

Correspondence to Jing-Song Li.

Additional information

This article is part of the Topical Collection on Transactional Processing Systems

About this article

Cite this article

Yao, Q., Tian, Y., Li, PF. et al. Design and Development of a Medical Big Data Processing System Based on Hadoop. J Med Syst 39, 23 (2015). https://doi.org/10.1007/s10916-015-0220-8

  • DOI: https://doi.org/10.1007/s10916-015-0220-8
