Big Data: What Is It and What Does It Mean for Cardiovascular Research and Prevention Policy
Over the past decade, the amount of healthcare-related data generated has grown explosively, as has interest in harnessing these data for research and for informing public policy. Outside of healthcare, specialized software has been developed to tackle the problems that voluminous data create, and these techniques could be applied in several areas of cardiovascular research. Cardiovascular risk analysis may benefit from the inclusion of patient genetic and health record data, while cardiovascular epidemiology could benefit from crowd-sourced environmental data. Some of the most significant advances may come from the ability to predict and respond to events in real time, for example by assessing the community-level impact of a new public policy on a weekly basis through electronic health records, or by monitoring a patient’s cardiovascular health remotely with a smartphone.
Keywords: Big data; Health information technology (HIT); Electronic health records (EHR); Medical informatics; Expert systems; Cardiovascular diseases; Epidemiology; Health sensors; Genome-wide association study (GWAS); Natural language processing (NLP); Personalized medicine
Compliance with Ethics Guidelines
Conflict of Interest
Satyender Goel, Laura Rasmussen-Torvik, Adam Pah, Abel Kho, and Philip Greenland have no conflicts of interest.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.