Development and Validation of an Algorithm to Identify Nonalcoholic Fatty Liver Disease in the Electronic Medical Record


Background and Aims

Nonalcoholic fatty liver disease (NAFLD) is the most common cause of chronic liver disease worldwide. Risk factors for NAFLD progression and liver-related outcomes remain incompletely understood because computational methods for identifying patients with NAFLD are lacking. The present study sought to design a classification algorithm for NAFLD within the electronic medical record (EMR) to support the development of large-scale longitudinal cohorts.


Methods

We implemented feature selection using logistic regression with adaptive LASSO. A training set of 620 patients was randomly selected from the Research Patient Data Registry at Partners Healthcare. To establish the reference diagnosis of NAFLD, we reviewed charts and accepted either documentation of a biopsy or a clinical diagnosis of NAFLD. Candidate model variables included laboratory measurements, diagnosis codes, and concepts extracted from medical notes. Variables with P < 0.05 were included in the multivariable analysis.
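The adaptive LASSO step described above can be illustrated with a minimal sketch. This is not the study's implementation: the proximal gradient solver, the ridge-penalized initial estimate, the values of `lam`, `gamma`, and `ridge`, and the deterministic toy data are all assumptions made for illustration. The sketch first fits an initial model, then penalizes each coefficient in inverse proportion to its initial magnitude, so that weak predictors are driven to exactly zero.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink toward zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_logistic(X, y, penalties, ridge=0.0, step=0.1, iters=1000):
    # Proximal gradient descent for logistic regression.
    # penalties[j] is the L1 penalty on coefficient j; ridge is an optional
    # L2 penalty. The intercept is never penalized.
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(iters):
        grad0, grad = 0.0, [0.0] * p
        for row, label in zip(X, y):
            linear = intercept + sum(b * x for b, x in zip(beta, row))
            err = sigmoid(linear) - label
            grad0 += err / n
            for j in range(p):
                grad[j] += err * row[j] / n
        intercept -= step * grad0
        beta = [
            soft_threshold(b - step * (g + ridge * b), step * lam_j)
            for b, g, lam_j in zip(beta, grad, penalties)
        ]
    return intercept, beta


def adaptive_lasso(X, y, lam=0.05, gamma=1.0, eps=1e-6):
    p = len(X[0])
    # Step 1: initial estimate with a small ridge penalty (assumed choice).
    _, initial = fit_logistic(X, y, [0.0] * p, ridge=0.01)
    # Step 2: adaptive weights; weak initial effects get larger penalties.
    weights = [1.0 / (abs(b) ** gamma + eps) for b in initial]
    # Step 3: weighted L1 fit.
    return fit_logistic(X, y, [lam * w for w in weights])


def toy_data():
    # Synthetic, deterministic data (not from the study): feature 0 drives
    # the label, feature 1 is uninformative by construction.
    X, y = [], []
    for s in (-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0):
        for t in (-1.0, 1.0):
            for copy in range(4):
                label = 1 if s > 0 else 0
                if abs(s) == 0.5 and copy == 0:
                    label = 1 - label  # a few mislabeled rows keep the fit finite
                X.append([s, t])
                y.append(label)
    return X, y


X, y = toy_data()
intercept, beta = adaptive_lasso(X, y)
print("coefficients:", beta)
```

On this toy input the informative feature keeps a positive coefficient while the uninformative one is removed from the model, which is the selection behavior the method relies on. In practice, `lam` would be chosen by cross-validation or an information criterion rather than fixed by hand.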


Results

The NAFLD classification algorithm included the number of natural language mentions of NAFLD in the EMR, the lifetime number of ICD-9 codes for NAFLD, and triglyceride level. This algorithm was superior to one using ICD-9 data alone, with an AUC of 0.85 versus 0.75 (P < 0.0001), and led to the creation of a new independent cohort of 8458 individuals with a high probability of NAFLD.
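To make the comparison above concrete, the following sketch scores patients with a three-predictor logistic model and computes the AUC for that score and for the ICD-9 count alone. The coefficients in `HYPOTHETICAL_COEF`, the use of raw (untransformed) predictors, and the eight toy patient records are invented for illustration. They are not the fitted model or data from the study, so the AUC values the sketch produces do not correspond to the reported 0.85 and 0.75.

```python
import math

# Hypothetical coefficients, NOT the published model.
HYPOTHETICAL_COEF = {
    "intercept": -6.0,
    "nlp_mentions": 0.3,
    "icd9_count": 0.5,
    "triglycerides": 0.02,
}


def nafld_probability(nlp_mentions, icd9_count, triglycerides,
                      coef=HYPOTHETICAL_COEF):
    # Logistic model over the three predictors named in the abstract.
    z = (coef["intercept"]
         + coef["nlp_mentions"] * nlp_mentions
         + coef["icd9_count"] * icd9_count
         + coef["triglycerides"] * triglycerides)
    return 1.0 / (1.0 + math.exp(-z))


def auc(scores, labels):
    # AUC as the probability that a randomly chosen case outscores a
    # randomly chosen control; ties count as one half.
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Synthetic records: (NLP mentions, ICD-9 code count, triglycerides mg/dL, label)
toy_patients = [
    (12, 3, 210, 1),
    (5, 0, 180, 1),
    (0, 1, 250, 1),
    (8, 2, 160, 1),
    (0, 0, 90, 0),
    (1, 1, 110, 0),
    (0, 0, 140, 0),
    (2, 0, 100, 0),
]

labels = [p[3] for p in toy_patients]
combined_scores = [nafld_probability(p[0], p[1], p[2]) for p in toy_patients]
icd9_scores = [p[1] for p in toy_patients]

auc_combined = auc(combined_scores, labels)
auc_icd9 = auc(icd9_scores, labels)
print("AUC, combined model:", auc_combined)
print("AUC, ICD-9 count alone:", auc_icd9)
```

The toy records include cases with no ICD-9 code for NAFLD but several note mentions, which is the situation where a billing-code-only classifier loses discrimination and a combined score can recover it.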


Conclusions

The NAFLD classification algorithm is superior to ICD-9 billing data alone. This approach is simple to develop and deploy and can be applied across institutions to create EMR-based cohorts of individuals with NAFLD.




Acknowledgments

The authors would like to acknowledge Dr. Ashwin N. Ananthakrishnan, MBBS, MPH, who provided feedback and critical comments.

Financial Support

This study was funded in part by grants from the NIH K23 DK099422 (KEC) and NIH U54 LM008748 (SYS).

Corresponding author

Correspondence to Kathleen E. Corey.


Cite this article

Corey, K.E., Kartoun, U., Zheng, H. et al. Development and Validation of an Algorithm to Identify Nonalcoholic Fatty Liver Disease in the Electronic Medical Record. Dig Dis Sci 61, 913–919 (2016).


Keywords

  • Nonalcoholic fatty liver disease
  • Nonalcoholic steatohepatitis
  • Electronic medical records
  • Triglycerides