Healthcare Analytics

Algorithms for Data Science

Abstract

Healthcare analytics refers to data analytic methods applied in the healthcare domain. Healthcare analytics is becoming a prominent data science domain because of the societal and economic burden of disease and the opportunities to better understand the healthcare system through the analysis of data. This chapter introduces the reader to the domain through the analysis of diabetes prevalence and incidence. The data are drawn from the Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System.

Notes

  1.

    An accountable care organization (ACO) is a network of physicians and hospitals that provide patient care. ACOs have a responsibility to ensure quality care and limit expenditures while allowing patients some freedom in selecting specific medical services.

  2.

    The tutorials of this chapter will reveal substantial geographic differences in prevalence and incidence across the United States.

  3.

    It’s controllable in the sense that related conditions such as retinopathy can be avoided or delayed.

  4.

    We’ve discussed and used BRFSS data in Chap. 3.

  5.

    The sampling weights reflect the likelihood of selecting a particular respondent but are not the probability of selecting the respondent.

  6.

    If incidence is approximately constant over the interval, \(\widehat{\beta }_{0,i}\) is a more precise estimator of prevalence at the midpoint of the time span.

  7.

    The value labels for a specific question are usually the same from year to year.

  8.

    This is the question asked in the year 2004 survey. The exact phrasing has changed over time.

  9.

    You may already have some of these from having worked on the tutorial of Chap. 3, Sect. 3.6.

  10.

    Federal Information Processing Standards

  11.

    Chapter 3, Sect. 3.6 discusses the creation of the functions.py module.

  12.

    The number of pairs, m, will be 15 except for Louisiana and some U.S. territories.

  13.

    If the BRFSS samples were random samples, then we would call the probability estimate an empirical probability.

  14.

    The precision of a prediction is directly related to the variance of the estimator, and the variance depends on the number of observations used to compute the estimate.

  15.

    Other variables are potentially useful for prediction (race and exercise level).

  16.

    Not every possible profile was observed. The number of observed profiles was 14,270, slightly less than 14,784.

  17.

    We could define the event of interest more rigorously as metabolic syndrome, a set of medical conditions that are considered to be precursors to type 2 diabetes.

  18.

    functions.py should reside in a directory below parent. For instance, the full path might be /home/HealthCare/PythonScripts/functions.py, in which case parent is /home/HealthCare.

  19.

    The algorithm is essentially an implementation of the one-nearest neighbor prediction function.

  20.

    In the tutorial of Sect. 7.5, the predictor variables are age, education, income, and body mass index and so p = 4.

  21.

    A cohort is a population subgroup with similar characteristics.

  22.

    In the unlikely event that the target profile is not in the dictionary, we find a set of most similar profiles in the dictionary.
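Several of the notes above (sampling weights, profiles built from p = 4 predictors, the one-nearest-neighbor prediction function, and the fallback to the most similar profiles) can be tied together in a small sketch. What follows is an illustration under stated assumptions, not the chapter's implementation: the function names `build_profile_dict` and `predict`, the record layout, the use of a weighted proportion as the estimate, the Manhattan distance used to rank similar profiles, and the toy records are all hypothetical.

```python
from collections import defaultdict


def build_profile_dict(records):
    """Map each profile to [weighted positive count, sum of weights].

    Each record is (profile, has_diabetes, weight), where profile is a
    tuple of p predictor codes. This layout is an assumption made for
    illustration only.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for profile, has_diabetes, weight in records:
        totals[profile][0] += weight * has_diabetes
        totals[profile][1] += weight
    return dict(totals)


def predict(profile_dict, target):
    """Return the weighted proportion of positives for the target profile.

    If the target was never observed, pool the observed profiles that are
    closest to it (Manhattan distance, an assumed similarity measure).
    """
    if target in profile_dict:
        matches = [target]
    else:
        def distance(profile):
            return sum(abs(a - b) for a, b in zip(profile, target))

        nearest = min(distance(p) for p in profile_dict)
        matches = [p for p in profile_dict if distance(p) == nearest]
    positive = sum(profile_dict[p][0] for p in matches)
    total = sum(profile_dict[p][1] for p in matches)
    return positive / total


# Toy records, invented for this sketch: (age, education, income, BMI) codes.
records = [
    ((3, 4, 5, 27), 0, 100.0),
    ((3, 4, 5, 27), 1, 300.0),
    ((3, 4, 5, 29), 0, 400.0),
    ((9, 2, 1, 35), 1, 50.0),
]
profile_dict = build_profile_dict(records)

observed = predict(profile_dict, (3, 4, 5, 27))    # 300 / 400 = 0.75
unobserved = predict(profile_dict, (3, 4, 5, 28))  # two tied neighbors: 300 / 800 = 0.375
```

Storing only two running sums per profile keeps the dictionary small however many respondents share a profile, and the fallback scans the dictionary only when the target profile is missing, which the notes describe as unlikely.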

References

  1. C.C. Aggarwal, Data Mining - The Textbook (Springer, New York, 2015)

  2. American Diabetes Association, http://www.diabetes.org/diabetes-basics/statistics/. Accessed 15 June 2016

  3. Centers for Disease Control and Prevention, Behavioral Risk Factor Surveillance System Weighting BRFSS Data (2013). http://www.cdc.gov/brfss/annual_data/2013/pdf/Weighting_Data.pdf

  4. Centers for Disease Control and Prevention, The BRFSS Data User Guide (2013). http://www.cdc.gov/brfss/data_documentation/pdf/userguidejune2013.pdf

  5. E.L. Korn, B.I. Graubard, Examples of differing weighted and unweighted estimates from a sample survey. Am. Stat. 49 (3), 291–295 (1995)

  6. X. Zhuo, P. Zhang, T.J. Hoerger, Lifetime direct medical costs of treating type 2 diabetes and diabetic complications. Am. J. Prev. Med. 45 (3), 253–256 (2013)

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Steele, B., Chandler, J., Reddy, S. (2016). Healthcare Analytics. In: Algorithms for Data Science. Springer, Cham. https://doi.org/10.1007/978-3-319-45797-0_7

  • DOI: https://doi.org/10.1007/978-3-319-45797-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45795-6

  • Online ISBN: 978-3-319-45797-0

  • eBook Packages: Computer Science, Computer Science (R0)
