Health Services Data: Big Data Analytics for Deriving Predictive Healthcare Insights

Part of the Health Services Research book series (HEALTHSR)

Abstract

This chapter describes the application of big data analytics in healthcare, particularly to electronic health records, in order to build predictive models of healthcare outcomes and to discover interesting insights. A typical workflow for such predictive analytics involves data collection, data transformation, predictive modeling, evaluation, and deployment, with each step tailored to the end goals of the project. To illustrate each of these steps, we take as an example recent advances in predictive analytics on lung cancer data from the Surveillance, Epidemiology, and End Results (SEER) program. These include the construction of accurate predictive models for lung cancer survival, the development of a lung cancer outcome calculator that deploys those models, and association rule mining on the same data for bottom-up discovery of interesting insights. The lung cancer outcome calculator illustrated here is available at http://info.eecs.northwestern.edu/LungCancerOutcomeCalculator.
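
As a rough illustration of the association rule mining mentioned in the abstract, the sketch below scores rules of the form "patient attributes → outcome" by support, confidence, and lift. It is a generic brute-force example and is not taken from the chapter: the records are synthetic toy data (not SEER data), and the attribute names, the thresholds, and the helper functions `support`, `rule_metrics`, and `mine_rules` are all assumptions made for illustration.

```python
from itertools import combinations

# Synthetic toy records (NOT SEER data); attribute names are hypothetical.
records = [
    {"stage=localized", "surgery=yes", "survived_5y=yes"},
    {"stage=localized", "surgery=yes", "survived_5y=yes"},
    {"stage=localized", "surgery=no", "survived_5y=no"},
    {"stage=distant", "surgery=no", "survived_5y=no"},
    {"stage=distant", "surgery=no", "survived_5y=no"},
    {"stage=distant", "surgery=yes", "survived_5y=no"},
    {"stage=regional", "surgery=yes", "survived_5y=yes"},
    {"stage=regional", "surgery=no", "survived_5y=no"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= row for row in rows) / len(rows)


def rule_metrics(antecedent, consequent, rows):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    s_a = support(antecedent, rows)
    s_c = support(consequent, rows)
    s_ac = support(set(antecedent) | set(consequent), rows)
    confidence = s_ac / s_a if s_a else 0.0
    lift = confidence / s_c if s_c else 0.0
    return {"support": s_ac, "confidence": confidence, "lift": lift}


def mine_rules(rows, target, min_support=0.2, min_confidence=0.8, max_len=2):
    """Enumerate antecedents of up to max_len items that predict target."""
    outcome_attr = target.split("=")[0] + "="
    # Antecedents may not include any value of the outcome attribute itself.
    items = sorted(i for i in set().union(*rows) if not i.startswith(outcome_attr))
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            m = rule_metrics(antecedent, [target], rows)
            if m["support"] >= min_support and m["confidence"] >= min_confidence:
                rules.append((antecedent, m))
    # Highest-lift rules first: these deviate most from the baseline outcome rate.
    return sorted(rules, key=lambda r: r[1]["lift"], reverse=True)


if __name__ == "__main__":
    for antecedent, m in mine_rules(records, "survived_5y=yes"):
        print(antecedent, m)
```

Exhaustive enumeration is only workable for a handful of attributes. On a registry-scale dataset one would normally use a pruning algorithm such as Apriori or FP-growth, and any rule found this way is a hypothesis to be checked, not a clinical finding.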

  • DOI: 10.1007/978-1-4939-8715-3_2
  • Chapter length: 16 pages

Author information

Corresponding author

Correspondence to Ankit Agrawal.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Agrawal, A., Choudhary, A. (2019). Health Services Data: Big Data Analytics for Deriving Predictive Healthcare Insights. In: Levy, A., Goring, S., Gatsonis, C., Sobolev, B., van Ginneken, E., Busse, R. (eds) Health Services Evaluation. Health Services Research. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-8715-3_2

  • DOI: https://doi.org/10.1007/978-1-4939-8715-3_2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-8714-6

  • Online ISBN: 978-1-4939-8715-3

  • eBook Packages: Medicine, Reference Module Medicine