Non-alcoholic Fatty Liver and Liver Fibrosis Predictive Analytics: Risk Prediction and Machine Learning Techniques for Improved Preventive Medicine

  • Systems-Level Quality Improvement
Journal of Medical Systems

Abstract

Non-alcoholic fatty liver disease (NAFLD) is the most common liver disease worldwide, with a prevalence of 20%–30% in the general population. NAFLD is associated with an increased risk of cardiovascular disease and may progress to cirrhosis over time. The purpose of this study was to predict the risk of NAFLD and of advanced fibrosis, as measured by the Fatty Liver Index (FLI) and the ‘NAFLD fibrosis 4’ calculator (FIB-4), to enable physicians to make better preventive medical decisions. The study used a prospective cohort of apparently healthy volunteers from the Tel Aviv Medical Center Inflammation Survey (TAMCIS) who were admitted for their routine annual health check-up. Data from the TAMCIS database were subjected to machine learning classification models to predict individual risk after extensive data preparation that included the computation of independent variables over several time points. After incorporating the time covariates and other key variables, this technique outperformed the predictive power of current popular methods (an improvement in AUC above 0.82). New powerful predictive factors were identified during the modeling process. The findings can be used for risk stratification and for planning future preventive strategies based on lifestyle modifications and medical treatment to reduce the disease burden. Interventions to prevent chronic disease can substantially reduce medical complications and the costs of the disease. The findings highlight the value of predictive analytic tools in health care environments. NAFLD constitutes a growing burden on the health system; thus, identifying the factors related to its incidence can make a strong contribution to preventive medicine.

Data availability

Not applicable.

References

  1. Loomba R, Sanyal AJ (2013) The global NAFLD epidemic. Nat Rev Gastroenterol Hepatol 10 (11):686-690. https://doi.org/10.1038/nrgastro.2013.171

  2. Zelber-Sagi S, Nitzan-Kaluski D, Halpern Z, Oren R (2006) Prevalence of primary non-alcoholic fatty liver disease in a population-based study and its association with biochemical and anthropometric measures. Liver Int 26 (7):856-863. https://doi.org/10.1111/j.1478-3231.2006.01311.x

  3. Younossi ZM, Koenig AB, Abdelatif D, Fazel Y, Henry L, Wymer M (2016) Global epidemiology of nonalcoholic fatty liver disease-Meta-analytic assessment of prevalence, incidence, and outcomes. Hepatology 64 (1):73-84. https://doi.org/10.1002/hep.28431

  4. Estes C, Razavi H, Loomba R, Younossi Z, Sanyal AJ (2018) Modeling the epidemic of nonalcoholic fatty liver disease demonstrates an exponential increase in burden of disease. Hepatology 67 (1):123-133

  5. European Association for the Study of the Liver (EASL), European Association for the Study of Diabetes (EASD), European Association for the Study of Obesity (EASO) (2016) EASL-EASD-EASO Clinical Practice Guidelines for the management of non-alcoholic fatty liver disease. J Hepatol 64 (6):1388-1402. https://doi.org/10.1016/j.jhep.2015.11.004

  6. Hagstrom H, Nasr P, Ekstedt M, Hammar U, Stal P, Hultcrantz R, Kechagias S (2017) Fibrosis stage but not NASH predicts mortality and time to development of severe liver disease in biopsy-proven NAFLD. J Hepatol 67 (6):1265-1273. https://doi.org/10.1016/j.jhep.2017.07.027

  7. Ekstedt M, Hagstrom H, Nasr P, Fredrikson M, Stal P, Kechagias S, Hultcrantz R (2015) Fibrosis stage is the strongest predictor for disease-specific mortality in NAFLD after up to 33 years of follow-up. Hepatology 61 (5):1547-1554. https://doi.org/10.1002/hep.27368

  8. Motamed N, Sohrabi M, Ajdarkosh H, Hemmasi G, Maadi M, Sayeedian FS, Pirzad R, Abedi K, Aghapour S, Fallahnezhad M (2016) Fatty liver index vs waist circumference for predicting non-alcoholic fatty liver disease. World journal of gastroenterology 22 (10):3023

  9. European Association for the Study of the Liver, Asociación Latinoamericana para el Estudio del Hígado (2015) EASL-ALEH Clinical Practice Guidelines: Non-invasive tests for evaluation of liver disease severity and prognosis. J Hepatol 63 (1):237-264. https://doi.org/10.1016/j.jhep.2015.04.006

  10. Sun W, Cui H, Li N, Wei Y, Lai S, Yang Y, Yin X, Chen DF (2016) Comparison of FIB-4 index, NAFLD fibrosis score and BARD score for prediction of advanced fibrosis in adult patients with non-alcoholic fatty liver disease: a meta-analysis study. Hepatology Research 46 (9):862-870

  11. Stefan N, Kantartzis K, Haring HU (2008) Causes and metabolic consequences of fatty liver. Endocr Rev 29 (7):939-960. https://doi.org/10.1210/er.2008-0009

  12. Chalasani N, Younossi Z, Lavine JE, Charlton M, Cusi K, Rinella M, Harrison SA, Brunt EM, Sanyal AJ (2018) The diagnosis and management of nonalcoholic fatty liver disease: Practice guidance from the American Association for the Study of Liver Diseases. Hepatology 67 (1):328-357. https://doi.org/10.1002/hep.29367

  13. Anstee QM, Lawitz EJ, Alkhouri N, Wong VW, Romero-Gomez M, Okanoue T, Trauner M, Kersey K, Li G, Han L, Jia C, Wang L, Chen G, Subramanian GM, Myers RP, Djedjos CS, Kohli A, Bzowej N, Younes Z, Sarin S, Shiffman ML, Harrison SA, Afdhal NH, Goodman Z, Younossi ZM (2019) Noninvasive Tests Accurately Identify Advanced Fibrosis due to NASH: Baseline Data From the STELLAR Trials. Hepatology 70 (5):1521-1530. https://doi.org/10.1002/hep.30842

  14. Bedogni G, Bellentani S, Miglioli L, Masutti F, Passalacqua M, Castiglione A, Tiribelli C (2006) The Fatty Liver Index: a simple and accurate predictor of hepatic steatosis in the general population. BMC gastroenterology 6 (1):33

  15. Poynard T, Ratziu V, Naveau S, Thabut D, Charlotte F, Messous D, Capron D, Abella A, Massard J, Ngo Y, Munteanu M, Mercadier A, Manns M, Albrecht J (2005) The diagnostic value of biomarkers (SteatoTest) for the prediction of liver steatosis. Comp Hepatol 4:10. https://doi.org/10.1186/1476-5926-4-10

  16. Hsu CL, Wu FZ, Lin KH, Chen YH, Wu PC, Chen YH, Chen CS, Wang WH, Mar GY, Yu HC (2019) Role of Fatty Liver Index and Metabolic Factors in the Prediction of Nonalcoholic Fatty Liver Disease in a Lean Population Receiving Health Checkup. Clin Transl Gastroenterol 10 (5):1-8. https://doi.org/10.14309/ctg.0000000000000042

  17. Wargny M, Smati S, Pichelin M, Bigot-Corbel E, Authier C, Dierry V, Zair Y, Jacquin V, Hadjadj S, Boursier J, Cariou B (2019) Fatty liver index is a strong predictor of changes in glycemic status in people with prediabetes: The IT-DIAB study. PLoS One 14 (8):e0221524. https://doi.org/10.1371/journal.pone.0221524

  18. Leutner M, Gobl C, Schlager O, Charwat-Resl S, Wielandner A, Howorka E, Prunner M, Bozkurt L, Maruszczak K, Geyik H, Prosch H, Pacini G, Kautzky-Willer A (2017) The Fatty Liver Index (FLI) Relates to Diabetes-Specific Parameters and an Adverse Lipid Profile in a Cohort of Nondiabetic, Dyslipidemic Patients. J Am Coll Nutr 36 (4):287-294. https://doi.org/10.1080/07315724.2016.1262802

  19. Kozakova M, Palombo C, Eng MP, Dekker J, Flyvbjerg A, Mitrakou A, Gastaldelli A, Ferrannini E, Investigators R (2012) Fatty liver index, gamma-glutamyltransferase, and early carotid plaques. Hepatology 55 (5):1406-1415. https://doi.org/10.1002/hep.25555

  20. Jung CH, Lee WJ, Hwang JY, Yu JH, Shin MS, Lee MJ, Jang JE, Leem J, Park JY, Kim HK (2013) Assessment of the fatty liver index as an indicator of hepatic steatosis for predicting incident diabetes independently of insulin resistance in a Korean population. Diabet Med 30 (4):428-435. https://doi.org/10.1111/dme.12104

  21. Huh JH, Ahn SV, Koh SB, Choi E, Kim JY, Sung KC, Kim EJ, Park JB (2015) A Prospective Study of Fatty Liver Index and Incident Hypertension: The KoGES-ARIRANG Study. PLoS One 10 (11):e0143560. https://doi.org/10.1371/journal.pone.0143560

  22. Drinda S, Grundler F, Neumann T, Lehmann T, Steckhan N, Michalsen A, Wilhelmi de Toledo F (2019) Effects of Periodic Fasting on Fatty Liver Index-A Prospective Observational Study. Nutrients 11 (11). https://doi.org/10.3390/nu11112601

  23. Zelber-Sagi S, Webb M, Assy N, Blendis L, Yeshua H, Leshno M, Ratziu V, Halpern Z, Oren R, Santo E (2013) Comparison of fatty liver index with noninvasive methods for steatosis detection and quantification. World J Gastroenterol: WJG 19 (1):57

  24. Romero-Gomez M, Zelber-Sagi S, Trenell M (2017) Treatment of NAFLD with diet, physical activity and exercise. J Hepatol 67 (4):829-846. https://doi.org/10.1016/j.jhep.2017.05.016

  25. Kass GV (1980) An exploratory technique for investigating large quantities of categorical data. J Royal Statist Soc: Series C (Applied Statistics) 29 (2):119-127

  26. Schwimmer JB, Johnson JS, Angeles JE, Behling C, Belt PH, Borecki I, Bross C, Durelle J, Goyal NP, Hamilton G (2019) Microbiome signatures associated with steatohepatitis and moderate to severe fibrosis in children with nonalcoholic fatty liver disease. Gastroenterology 157 (4):1109-1122

  27. Perveen S, Shahbaz M, Keshavjee K, Guergachi A (2018) A systematic machine learning based approach for the diagnosis of non-alcoholic fatty liver disease risk and progression. Sci Reports 8 (1):1-12

  28. Yip TF, Ma A, Wong VS, Tse YK, Chan HY, Yuen PC, Wong GH (2017) Laboratory parameter-based machine learning model for excluding non-alcoholic fatty liver disease (NAFLD) in the general population. Alimentary Pharmacol Therapeutics 46 (4):447-456

  29. Birjandi M, Ayatollahi SMT, Pourahmad S, Safarpour AR (2016) Prediction and diagnosis of non-alcoholic fatty liver disease (NAFLD) and identification of its associated factors using the classification tree method. Iranian Red Crescent Medical Journal 18 (11)

  30. Kawaguchi T, Tokushige K, Hyogo H, Aikata H, Nakajima T, Ono M, Kawanaka M, Sawada K, Imajo K, Honda K (2018) A data mining-based prognostic algorithm for NAFLD-related hepatoma patients: A nationwide study by the Japan study group of NAFLD. Sci Reports 8 (1):1-13

  31. Sowa J-P, Heider D, Bechmann LP, Gerken G, Hoffmann D, Canbay A (2013) Novel algorithm for non-invasive assessment of fibrosis in NAFLD. PloS One 8 (4):e62439

  32. Ma H, Xu C-F, Shen Z, Yu C-H, Li Y-M (2018) Application of machine learning techniques for clinical predictive modeling: a cross-sectional study on nonalcoholic fatty liver disease in China. BioMed Res Int 2018

  33. Rogulj D, Konjevoda P, Milić M, Mladinić M, Domijan A-M (2012) Fatty liver index as an indicator of metabolic syndrome. Clin Biochem 45 (1-2):68-71

  34. Eslam M, Hashem AM, Romero-Gomez M, Berg T, Dore GJ, Mangia A, Chan HLY, Irving WL, Sheridan D, Abate ML (2016) FibroGENE: a gene-based model for staging liver fibrosis. J Hepatol 64 (2):390-398

  35. Eddowes PJ, McDonald N, Davies N, Semple S, Kendall T, Hodson J, Newsome P, Flintham R, Wesolowski R, Blake L (2018) Utility and cost evaluation of multiparametric magnetic resonance imaging for the assessment of non-alcoholic fatty liver disease. Aliment Pharmacol Ther 47 (5):631-644

  36. Arregle F, Gouriet F, Amphoux B, Edouard S, Chaudet H, Jean-Paul C, Habib G, Fournier P-E, Raoult D (2019) Western Immunoblotting for the diagnosis of Enterococcus faecalis and Streptococcus gallolyticus infective endocarditis. Front Cell Infect Microbiol 9:314

  37. Modeler IS, Guide A (2010) IBM Corporation. NY, USA

  38. Saltelli A, Ratto M, Andres T, Campolongo F, Cariboni J, Gatelli D, Saisana M, Tarantola S (2008) Global sensitivity analysis: the primer. John Wiley & Sons

  39. Huang X, Xu M, Chen Y, Peng K, Huang Y, Wang P, Ding L, Lin L, Xu Y, Chen Y (2015) Validation of the fatty liver index for nonalcoholic fatty liver disease in middle-aged and elderly Chinese. Medicine 94 (40)

  40. Ruhl C, Everhart J (2015) Fatty liver indices in the multiethnic United States National Health and Nutrition Examination Survey. Aliment Pharmacol Ther 41 (1):65-76

  41. Sterling RK, Lissen E, Clumeck N, Sola R, Correa MC, Montaner J, Sulkowski MS, Torriani FJ, Dieterich DT, Thomas DL (2006) Development of a simple noninvasive index to predict significant fibrosis in patients with HIV/HCV coinfection. Hepatology 43 (6):1317-1325

  42. Alberti K, Eckel RH, Grundy SM, Zimmet PZ, Cleeman JI, Donato KA, Fruchart J-C, James WPT, Loria CM, Smith Jr SC (2009) Harmonizing the metabolic syndrome: a joint interim statement of the international diabetes federation task force on epidemiology and prevention; national heart, lung, and blood institute; American heart association; world heart federation; international atherosclerosis society; and international association for the study of obesity. Circulation 120 (16):1640-1645

  43. American Diabetes Association (2017) Standards of medical care in diabetes-2017. Diabetes Care 40 (Suppl 1). [Internet] Jan 2017 [cited 2017 Feb 23]

  44. Nedovic L, Mihailovic B, Ralevic NM Some properties of pseudo-measures and pseudo-probability. In: 2007 5th International Symposium on Intelligent Systems and Informatics, 2007. IEEE, pp 155-159

  45. Al Ghoson AM (2010) Decision Tree Induction & Clustering Techniques In SAS Enterprise Miner, SPSS Clementine, and IBM Intelligent Miner A Comparative Analysis. International Journal of Management & Information Systems (IJMIS) 14 (3)

  46. Brandusoiu IB, Toderean G (2013) Predicting Churn in Mobile Telecommunications Industry. Acta Technica Napocensis 54 (3):7

Acknowledgments

This work was funded by the Israel National Institute For Health Policy Research, Grant# 2018/52/א.

Funding

This work was funded by The Israel National Institute For Health Policy Research, Grant Number: 2018/52/א.

Author information

Contributions

Orit Goldman = OG.

Ofir Ben-Assuli = OB.

Ori Rogowski = OR.

David Zeltser = DZ.

Itzhak Shapira = IS.

Shlomo Berliner = SB.

Shira Zelber-Sagi = SZ.

Shani Shenhar-Tsarfaty = SS.

Conceptualization: SZ + SS + OB; Data curation: SS + OR + DZ + IS + SB; Formal analysis: OB + OG + SS; Data acquisition: SS + OR + DZ + IS + SB; Methodology: OG + OB + SZ + SS; Software: OG; Validation: OB; Visualization: OG + OB; Writing - original draft: OG + OB + SZ + SS; Writing - review & editing: OG + OB + OR + DZ + IS + SB + SZ + SS. All authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Orit Goldman.

Ethics declarations

Conflict of interest

None of the authors have conflicts to declare.

Code availability

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Systems-Level Quality Improvement

Appendices

Appendix A: Independent Variables

The independent variables comprised demographics, athletic habits and exercise test results, metabolic components (number of components of the metabolic syndrome, waist-hip ratio, BMI, glucose and hemoglobin A1c blood tests), routine blood counts, chemistry and lipid profile, history of hypertension, dyslipidemia and cardiovascular disease, and medication intake. Lifestyle and subjective stress questionnaire scores were also obtained.

Blood samples were drawn upon the participants’ arrival at the center after a 12-h overnight fast. All samples were obtained in the fasting state and analyzed in routine laboratories working under the ISO 9001:2008 quality assurance stipulations. The concentrations of glycated hemoglobin (A1C) and of total hemoglobin were measured, and their ratio was reported as % A1C. A1C levels were categorized into three states according to the American Diabetes Association (ADA) guidelines: healthy (<5.7%), pre-diabetic (5.7–6.4%) and diabetic (≥6.5%).

Evaluation and diagnosis of metabolic syndrome (and its components) were performed with reference to the joint interim statement of the International Diabetes Federation Task Force on Epidemiology and Prevention, the National Heart, Lung, and Blood Institute, the American Heart Association, the World Heart Federation, the International Atherosclerosis Society and the International Association for the Study of Obesity [42]. Briefly, an elevated waist circumference was defined as ≥94 cm (37 in.) in men and ≥80 cm (31.5 in.) in women, as recommended for individuals of European and Middle Eastern descent. Elevated triglycerides were defined as ≥150 mg/dl (1.7 mmol/l) or drug treatment for elevated triglycerides. Reduced high-density lipoprotein-cholesterol (HDL) was defined as <40 mg/dl (1.0 mmol/l) in men and <50 mg/dl (1.3 mmol/l) in women. Elevated blood pressure was defined as ≥130 mmHg for systolic blood pressure or ≥85 mmHg for diastolic blood pressure, or antihypertensive drug treatment in a patient with a history of hypertension. Elevated fasting glucose was defined as ≥100 mg/dl (5.55 mmol/l). The diagnosis of metabolic syndrome was based on the existence of at least three abnormal findings out of the five mentioned above [43].

The exercise ECG stress test was performed according to the Bruce protocol. ECG results were manually reviewed on the spot by a cardiologist. As part of their medical history questionnaire, all participants were asked whether they had received any previous diagnosis of liver or cardiovascular disease or diabetes.
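The thresholds described in this appendix can be illustrated with a minimal Python sketch. It is not the authors' preprocessing code. The `Visit` record and its field names are hypothetical, because the TAMCIS schema is not given here. The sketch also assumes that an A1C of exactly 6.5% counts as diabetic (the usual ADA cut-off) and that values between 6.4% and 6.5% count as pre-diabetic.

```python
from dataclasses import dataclass


def a1c_category(a1c_percent: float) -> str:
    """Map % A1C to the three states used in this appendix.

    Assumption: exactly 6.5% is treated as diabetic, and values between
    6.4% and 6.5% are treated as pre-diabetic.
    """
    if a1c_percent < 5.7:
        return "healthy"
    if a1c_percent < 6.5:
        return "pre-diabetic"
    return "diabetic"


@dataclass
class Visit:
    # Hypothetical field names chosen for illustration only.
    sex: str                       # "M" or "F"
    waist_cm: float
    triglycerides_mgdl: float
    hdl_mgdl: float
    systolic_mmhg: float
    diastolic_mmhg: float
    fasting_glucose_mgdl: float
    on_triglyceride_drug: bool = False
    on_bp_drug_with_htn_history: bool = False


def metabolic_components(v: Visit) -> int:
    """Count how many of the five metabolic syndrome criteria are met."""
    male = v.sex == "M"
    flags = [
        v.waist_cm >= (94 if male else 80),                  # elevated waist
        v.triglycerides_mgdl >= 150 or v.on_triglyceride_drug,
        v.hdl_mgdl < (40 if male else 50),                   # reduced HDL
        v.systolic_mmhg >= 130
        or v.diastolic_mmhg >= 85
        or v.on_bp_drug_with_htn_history,
        v.fasting_glucose_mgdl >= 100,                       # elevated glucose
    ]
    return sum(flags)


def has_metabolic_syndrome(v: Visit) -> bool:
    """Metabolic syndrome requires at least three of the five criteria."""
    return metabolic_components(v) >= 3
```

For example, a woman with an 82 cm waist, HDL of 48 mg/dl and systolic pressure of 132 mmHg meets three criteria and would be classified as having metabolic syndrome under these rules.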

Appendix B: Definition of Gains and Lift

At a specified percentile level, the lift is the ratio of the incident frequency among the records up to this percentile to the incident frequency in the full dataset. To build the lift chart, the predicted probabilities [44] are sorted in descending order and the lift is plotted as a curve. The chart depicts how much the model improves the prediction rate compared to random expectation. The X axis shows the percentage of records (e.g., 10% refers to the top decile) and the Y axis shows the value of the lift. For a model with good prediction, the lift values start above 1 and gradually decrease to 1. If the model does not improve the prediction, the lift values stay around 1.

Gains at a specified percentile level are the ratio of the cumulative number of incidents up to this percentile to the total number of incidents in the full dataset. The gains chart shows how widely the net needs to be cast to capture a given percentage of all the incidents in the dataset. For a model with good prediction, the gains values rise steeply toward 100% and then level off. If the model does not improve the prediction, the gains values increase gradually, roughly along a straight line, from 0 to 100%. Visualization of predictive analytic outcomes has become increasingly important for interpretation, first through graphs such as the ROC curve and more recently through lift and gains charts [45, 46].
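The two definitions can be made concrete with a short Python sketch. It computes cumulative gains and lift from binary outcomes and predicted probabilities, and it is not the procedure of the modeling software used in the study. The function name `gains_and_lift` and its parameters are illustrative, and the sketch assumes the dataset contains at least one incident.

```python
def gains_and_lift(y_true, y_score, percentiles=(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)):
    """Return a list of (percentile, gain, lift) tuples.

    y_true  : 0/1 outcomes (1 = incident)
    y_score : predicted probabilities of an incident
    """
    if len(y_true) != len(y_score) or len(y_true) == 0:
        raise ValueError("y_true and y_score must be non-empty and of equal length")

    # Sort records by predicted probability, highest first.
    ranked = [label for _, label in sorted(zip(y_score, y_true), key=lambda pair: -pair[0])]
    n = len(ranked)
    total_incidents = sum(ranked)
    if total_incidents == 0:
        raise ValueError("at least one incident is required")
    base_rate = total_incidents / n  # incident frequency in the full dataset

    rows = []
    for p in percentiles:
        k = max(1, int(round(n * p / 100)))   # number of records in the top p%
        captured = sum(ranked[:k])            # incidents found in the top p%
        gain = captured / total_incidents     # share of all incidents captured
        lift = (captured / k) / base_rate     # incident rate relative to the base rate
        rows.append((p, gain, lift))
    return rows
```

At the 100th percentile the gain is always 1 and the lift is always 1, which matches the description that lift curves of a good model decrease toward 1.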

Appendix C: Decision tree for FLI-CHAID model

Fig. 7 Decision tree for FLI-CHAID model

The figure shows part of the decision tree developed on the full dataset to predict FLI (the entire tree is very spread out and contains many branches). The dataset contained 7581 records. The first split (the tree root) was made on the BMI at the last visit (BMI-last) and produced several branches. The branch we selected represents obese individuals at their last visit (BMI > 28.7, n = 760). Of these, 373 (49%) had a FLI < 60 and 387 (51%) had a FLI ≥ 60. For this subpopulation, the next split was based on the annual change in triglycerides between the first and the last visit (triglyceride velocity). For individuals with an annual change of less than 6.4 (227 individuals) the next split was by gender, and for the rest (533 individuals) the next split was by weight at the first visit. The next splits continued as depicted. For those with a weight > 87.9 kg (257 individuals), the following split was made according to the last uric acid value. At each stage of the tree’s growth, the distribution of FLI ≥ 60 versus FLI < 60 changed. Notably, at the starting point (the root of the tree) the distribution was 15.01%.

In the gender branch, the 94 women were predicted to have a FLI < 60 (82% had a FLI < 60 and only 18% had a FLI ≥ 60), as were the 133 men (58% had a FLI < 60 and 42% had a FLI ≥ 60). In the weight branch, the 276 individuals weighing less than 87.9 kg were predicted to have a FLI < 60 (51% had a FLI < 60 and 49% had a FLI ≥ 60). Among those whose weight exceeded 87.9 kg, the 174 individuals whose last uric acid value was ≤ 6.3 were predicted to have a FLI ≥ 60 (63% had a FLI ≥ 60), as were the 83 individuals whose last uric acid value was > 6.3 (85% had a FLI ≥ 60).
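The branch described above can be written out as nested rules. The Python sketch below is a hedged illustration and not the fitted CHAID model. It covers only the BMI > 28.7 branch and treats the nodes named in the text as terminal, although the full tree may split further. It also assumes that the lower uric acid group means values ≤ 6.3, and its choice of which side each boundary value falls on is a guess. The function names and the `"F"`/`"M"` coding are hypothetical.

```python
def annual_velocity(first_value, last_value, years_between):
    """Annual change between the first and last visit (e.g. triglyceride velocity)."""
    if years_between <= 0:
        raise ValueError("years_between must be positive")
    return (last_value - first_value) / years_between


def predict_fli_branch(bmi_last, tg_velocity, sex, weight_first_kg, uric_acid_last):
    """Follow the BMI > 28.7 branch of the tree described in the text.

    Returns (predicted_class, share_with_fli_ge_60, node_size), where the
    predicted class is the majority class reported for that node. Returns
    None for records outside this branch, since the rest of the tree is
    not described in the text.
    """
    if bmi_last <= 28.7:
        return None
    if tg_velocity < 6.4:                      # 227 individuals, split by gender
        if sex == "F":
            return ("FLI < 60", 0.18, 94)
        return ("FLI < 60", 0.42, 133)
    # Remaining 533 individuals, split by weight at the first visit.
    if weight_first_kg <= 87.9:
        return ("FLI < 60", 0.49, 276)
    # Weight > 87.9 kg (257 individuals), split by last uric acid value.
    if uric_acid_last <= 6.3:                  # assumed boundary, see lead-in
        return ("FLI >= 60", 0.63, 174)
    return ("FLI >= 60", 0.85, 83)
```

The five node sizes in the sketch (94, 133, 276, 174 and 83) add up to the 760 individuals reported for the BMI > 28.7 branch.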

Appendix D: Lift and gains curves for the FIB4-CHAID models

Fig. 8 Lift chart for FIB4-CHAID

Fig. 9 Gains chart for FIB4-CHAID

About this article

Cite this article

Goldman, O., Ben-Assuli, O., Rogowski, O. et al. Non-alcoholic Fatty Liver and Liver Fibrosis Predictive Analytics: Risk Prediction and Machine Learning Techniques for Improved Preventive Medicine. J Med Syst 45, 22 (2021). https://doi.org/10.1007/s10916-020-01693-5

  • DOI: https://doi.org/10.1007/s10916-020-01693-5
