Big Data: What Is It and What Does It Mean for Cardiovascular Research and Prevention Policy

  • A. R. Pah
  • L. J. Rasmussen-Torvik
  • S. Goel
  • P. Greenland
  • A. N. Kho
Cardiovascular Risk Health Policy (W Rosamond, Section Editor)
Part of the Topical Collection on Cardiovascular Risk Health Policy


Over the past decade, there has been explosive growth both in the amount of healthcare-related data generated and in interest in harnessing this data for research and for informing public policy. Outside of healthcare, specialized software has been developed to tackle the problems that voluminous data creates, and these techniques could be applicable in several areas of cardiovascular research. Cardiovascular risk analysis may benefit from the inclusion of patient genetic and health record data, while cardiovascular epidemiology could benefit from crowd-sourced environmental data. Some of the most significant advances may come from the ability to predict and respond to events in real time, such as assessing the impact of new public policy at the community level on a weekly basis through electronic health records or monitoring a patient’s cardiovascular health remotely with a smartphone.


Keywords: Big data; Health information technology (HIT); Electronic health records (EHR); Medical informatics; Expert systems; Cardiovascular diseases; Epidemiology; Health sensors; Genome-wide association study (GWAS); Natural language processing (NLP); Personalized medicine


Compliance with Ethics Guidelines

Conflict of Interest

Satyender Goel, Laura Rasmussen-Torvik, Adam Pah, Abel Kho, and Philip Greenland have no conflicts of interest.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • A. R. Pah (1, 3)
  • L. J. Rasmussen-Torvik (2)
  • S. Goel (3)
  • P. Greenland (2, 3)
  • A. N. Kho (2, 3)

  1. Department of Chemical and Biological Engineering, Northwestern University, Chicago, USA
  2. Department of Preventive Medicine, Northwestern University, Chicago, USA
  3. Department of Medicine, Northwestern University, Chicago, USA
