Health Services Data: Big Data Analytics for Deriving Predictive Healthcare Insights

Agrawal, Ankit; Choudhary, Alok

doi:10.1007/978-1-4939-8715-3_2

Part of the book series: Health Services Research (HEALTHSR)

Abstract

This chapter describes the application of big data analytics in healthcare, particularly to electronic health records, to build predictive models of healthcare outcomes and to discover interesting insights. A typical predictive analytics workflow consists of data collection, data transformation, predictive modeling, evaluation, and deployment, with each step tailored to the end goals of the project. To illustrate these steps, we use recent advances in predictive analytics on lung cancer data from the Surveillance, Epidemiology, and End Results (SEER) program: the construction of accurate predictive models for lung cancer survival, the development of a lung cancer outcome calculator that deploys those models, and association rule mining on the same data for bottom-up discovery of interesting insights. The lung cancer outcome calculator described here is available at http://info.eecs.northwestern.edu/LungCancerOutcomeCalculator.
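
The association rule mining step mentioned above can be illustrated with a small sketch. This is not the authors' method or their SEER pipeline; it is a brute-force enumeration of short rules of the form antecedent → outcome (a simplified stand-in for Apriori-style mining), scored by support, confidence, and lift. The patient records, attribute names, outcome label, and thresholds below are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy records: each patient is a set of "attribute=value" items
# plus one outcome item. Illustrative only; this is NOT SEER data.
RECORDS = [
    {"stage=distant", "age=75+", "surgery=no", "survived_1yr=no"},
    {"stage=distant", "age=60-74", "surgery=no", "survived_1yr=no"},
    {"stage=localized", "age=60-74", "surgery=yes", "survived_1yr=yes"},
    {"stage=localized", "age=75+", "surgery=yes", "survived_1yr=yes"},
    {"stage=regional", "age=60-74", "surgery=yes", "survived_1yr=yes"},
    {"stage=distant", "age=75+", "surgery=yes", "survived_1yr=no"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= r for r in records) / len(records)


def mine_outcome_rules(records, outcome, max_len=2, min_support=0.2, min_confidence=0.8):
    """Brute-force search for rules antecedent -> outcome with small antecedents.

    Returns (antecedent, support, confidence, lift) tuples, sorted by
    descending confidence, then descending support, then antecedent items.
    Thresholds are hypothetical defaults for this toy example.
    """
    outcome_set = frozenset([outcome])
    outcome_support = support(outcome_set, records)
    # Exclude every item of the outcome attribute (e.g. both survived_1yr values)
    # so the outcome cannot appear in its own antecedent.
    prefix = outcome.split("=")[0] + "="
    items = sorted({i for r in records for i in r if not i.startswith(prefix)})
    rules = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            antecedent = frozenset(combo)
            ante_sup = support(antecedent, records)
            if ante_sup == 0:
                continue
            rule_sup = support(antecedent | outcome_set, records)
            if rule_sup < min_support:
                continue
            confidence = rule_sup / ante_sup  # P(outcome | antecedent)
            if confidence < min_confidence:
                continue
            lift = confidence / outcome_support  # > 1 means outcome is over-represented
            rules.append((antecedent, rule_sup, confidence, lift))
    rules.sort(key=lambda r: (-r[2], -r[1], sorted(r[0])))
    return rules


if __name__ == "__main__":
    for ante, sup, conf, lift in mine_outcome_rules(RECORDS, "survived_1yr=no"):
        print(sorted(ante), f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Rules with high confidence and a lift above 1 flag combinations of patient attributes under which the chosen outcome is more frequent than its overall rate, which is the kind of bottom-up insight the abstract refers to. A real analysis would need far more records, careful discretization of continuous attributes, and a scalable mining algorithm rather than exhaustive enumeration.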

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, USA
Ankit Agrawal & Alok Choudhary

Corresponding author

Correspondence to Ankit Agrawal.

Editor information

Editors and Affiliations

Community Health and Epidemiology, Dalhousie University, Halifax, NS, Canada
Adrian Levy
ICON plc, Vancouver, BC, Canada
Sarah Goring
Department of Biostatistics, Brown University, Providence, RI, USA
Constantine Gatsonis
University of British Columbia, Vancouver, BC, Canada
Boris Sobolev
European Observatory on Health Systems and Policies, Department of Health Care Management, Berlin University of Technology, Berlin, Germany
Ewout van Ginneken
Department of Health Care Management, Faculty of Economics and Management, Technische Universität Berlin, Berlin, Germany
Reinhard Busse

About this entry

Cite this entry

Agrawal, A., Choudhary, A. (2019). Health Services Data: Big Data Analytics for Deriving Predictive Healthcare Insights. In: Levy, A., Goring, S., Gatsonis, C., Sobolev, B., van Ginneken, E., Busse, R. (eds) Health Services Evaluation. Health Services Research. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-8715-3_2

DOI: https://doi.org/10.1007/978-1-4939-8715-3_2
Published: 12 February 2019
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-8714-6
Online ISBN: 978-1-4939-8715-3
eBook Packages: Medicine, Reference Module Medicine
