Background

Non-alcoholic fatty liver disease (NAFLD) represents a spectrum of progressive liver disease, ranging from simple steatosis to non-alcoholic steatohepatitis (NASH), fibrosis, and cirrhosis, in the absence of excessive alcohol consumption. NAFLD is regarded as a hepatic manifestation of metabolic syndrome (MetS) [1]; therefore, NAFLD is strongly associated not only with liver-related mortality but also with MetS-related diseases such as diabetes and cardiovascular disease [2],[3]. Because NAFLD is highly prevalent, affecting up to 30% of the general adult population [4], screening for and diagnosing NAFLD has become an important public health issue, both to prevent NAFLD-related complications and to reduce healthcare costs.

Liver biopsy remains the “gold standard” for NAFLD diagnosis; however, it is invasive, which makes it impractical for widespread use. Ultrasonography is therefore the recommended first-line imaging technique in clinical practice, although its sensitivity is known to be limited [3]. Other non-invasive tools for diagnosing NAFLD, such as computed tomography and proton magnetic resonance spectroscopy (1H-MRS), are expensive and time-consuming, and are not considered cost-effective for large-scale NAFLD screening. Recently, five biomarker-based non-invasive prediction scores for NAFLD have been developed: SteatoTest [5], fatty liver index (FLI) [6], NAFLD liver fat score (LFS) [7], lipid accumulation product (LAP) [8], and hepatic steatosis index (HSI) [9]. Because these scores are derived from simple clinical risk factors and biomarkers, they could potentially be used for large-scale NAFLD screening. However, the original studies used different definitions and techniques to diagnose NAFLD, and the performance of these scores has not been validated or directly compared in a large general population. In addition, whether these non-invasive scores can predict clinical outcomes remains largely unknown.

In this study, we aimed to validate and compare the performance of these non-invasive scores in predicting ultrasonography-diagnosed NAFLD in a representative sample of the general adult population of the USA (cross-sectional NAFLD prediction cohort), and to test whether the best-performing score can predict mortality in the general population (prospective mortality prediction cohort).

Methods

Participant recruitment

Data from the third National Health and Nutrition Examination Survey (NHANES III) were used [10]. NHANES III was conducted by the National Center for Health Statistics (NCHS) from 1988 to 1994, using a stratified multistage probability sample that represented the civilian non-institutionalized population in the USA. Participants gave written consent before participation, and ethics approval was obtained from the Human Subjects Committee of the US Department of Health and Human Services.

We studied people aged 20 to 74 years who participated in the NHANES III survey. Laboratory tests were carried out in a mobile examination center (n = 14,797) (see Additional file 1: Figure S1). Because all non-invasive score formulae require fasting blood biomarker levels, we included participants whose blood was drawn after fasting for at least 8 hours (n = 9,268). Of these, participants with factors that can confound the diagnosis of NAFLD were excluded (n = 1,089): excessive alcohol consumption (defined as >21 drinks/week in men and >14 drinks/week in women [4]); viral hepatitis (defined as positive serum hepatitis B surface antigen or positive serum hepatitis C antibody); iron overload (defined as transferrin saturation ≥50%); or pregnancy. Because the LFS formula includes fasting insulin level, we further excluded participants who were using insulin or other medications for diabetes (n = 315). This left 7,864 participants. After applying the exclusion criteria specific to each aim (see Additional file 1: Figure S1), 5,184 and 5,892 participants were included in the analyses for Aim 1 (evaluation of the performance of non-invasive prediction scores of NAFLD in predicting ultrasound-diagnosed NAFLD) and Aim 2 (evaluation of the relationship between non-invasive prediction scores of NAFLD and mortality), respectively.
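As a rough illustration, the shared inclusion and exclusion steps above can be expressed as a single filter. This is a minimal sketch, not the study's code: the `Participant` fields and function names are hypothetical (NHANES III uses its own variable codes), viral hepatitis is assumed to mean that either marker is positive, and the aim-specific exclusions shown in Additional file 1: Figure S1 are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical field names; NHANES III uses its own variable codes.
    age: int
    sex: str                        # "M" or "F"
    fasting_hours: float
    drinks_per_week: float
    hbsag_positive: bool            # hepatitis B surface antigen
    anti_hcv_positive: bool         # hepatitis C antibody
    transferrin_saturation: float   # percent
    pregnant: bool
    on_diabetes_medication: bool    # insulin or other glucose-lowering drugs


def excessive_alcohol(p: Participant) -> bool:
    """>21 drinks/week in men, >14 drinks/week in women."""
    limit = 21 if p.sex == "M" else 14
    return p.drinks_per_week > limit


def eligible_for_analysis(p: Participant) -> bool:
    """Apply the shared inclusion/exclusion steps in the order described."""
    if not 20 <= p.age <= 74:
        return False
    if p.fasting_hours < 8:            # all score formulae need fasting blood
        return False
    # Confounders of the NAFLD diagnosis; viral hepatitis is read here as
    # either marker being positive (an assumption).
    if excessive_alcohol(p) or p.hbsag_positive or p.anti_hcv_positive:
        return False
    if p.transferrin_saturation >= 50 or p.pregnant:
        return False
    # LFS uses fasting insulin, so drug-treated diabetes is excluded.
    return not p.on_diabetes_medication
```

The participant counts in the text are internally consistent: 9,268 − 1,089 − 315 = 7,864.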

Definition of NAFLD

In the original NHANES III (1988 to 1994), gallbladder ultrasonography video images were recorded using a Toshiba Sonolayer SSA-90A and a Toshiba video recorder. Between 2009 and 2010, these archived video images were re-reviewed for hepatic steatosis (fatty liver) by three ultrasonography readers, trained by a board-certified radiologist specializing in hepatic imaging, who graded the presence of fat within the hepatic parenchyma.

The following information was recorded on a standard paper collection form: 1) presence of liver-to-kidney contrast; 2) degree of brightness of the liver parenchyma; 3) presence of deep beam attenuation; 4) presence of echogenic walls in the small intrahepatic vessels; and 5) definition of the gallbladder walls. An overall primary finding was then given based on the presence or absence of each of the five parameters, and the liver was graded as having no, mild, moderate, or severe hepatic steatosis. Of the 13,983 participants with hepatic imaging records, 13,856 could be graded [7]. Detailed descriptions and procedures have been provided previously [7],[11]. In the absence of a standard definition, we defined NAFLD as moderate or severe hepatic steatosis, and non-NAFLD as no or mild hepatic steatosis [12]. The overall intra-rater and inter-rater κ statistics for the reliability of the dichotomized outcome (“no or mild” vs “moderate or severe”) were 0.77 (95% CI 0.73 to 0.82) and 0.70 (95% CI 0.64 to 0.76), respectively [11].
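The dichotomization used here, and the kind of agreement statistic reported for it, can be sketched as follows. This assumes an unweighted Cohen's kappa between two sets of readings; the text does not state which kappa variant or reliability design underlies the reported values, and the function names are illustrative only.

```python
from collections import Counter

GRADES = ("none", "mild", "moderate", "severe")


def is_nafld(grade: str) -> bool:
    """Study definition: moderate or severe steatosis counts as NAFLD."""
    if grade not in GRADES:
        raise ValueError(f"ungradable or unknown grade: {grade!r}")
    return grade in ("moderate", "severe")


def cohen_kappa(rater_a, rater_b) -> float:
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one label")
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw agreement for the agreement expected by chance from each reader's label frequencies, which matters when most participants fall into one category.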

Non-invasive markers of NAFLD

The non-invasive markers of NAFLD were calculated using the equations reported in the literature [6]-[9]. In brief, FLI includes body mass index (BMI), γ-glutamyltranspeptidase, triglycerides, and waist circumference; HSI includes the alanine aminotransferase (ALT)/aspartate aminotransferase (AST) ratio, BMI, diabetes, and sex; LAP includes sex, triglycerides, and waist circumference; and LFS includes the AST/ALT ratio, diabetes, fasting AST level, fasting insulin level, and MetS. The SteatoTest was not included in the current study because it is a commercial test whose calculation formula has not been disclosed. The thresholds used in the current study were the cutoff points suggested in the literature: the high/low cutoff points were ≥1.257/≤−1.413, >36/<30, and ≥60/<30 for LFS, HSI, and FLI, respectively [6],[7],[9].
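For concreteness, the four scores can be written out as below. This is a sketch based on our reading of the original publications [6]-[9], not code from this study. The coefficients, the coding (for example, diabetes coded as 2/0 in LFS), and the units (triglycerides in mg/dl for FLI but mmol/l for LAP, insulin in mU/l, enzymes in U/l) are assumptions that should be checked against those papers before any use. The function names are hypothetical.

```python
import math


def fatty_liver_index(tg_mg_dl, bmi, ggt_u_l, waist_cm):
    """FLI on a 0-100 scale (logistic transform of a linear predictor)."""
    y = (0.953 * math.log(tg_mg_dl) + 0.139 * bmi
         + 0.718 * math.log(ggt_u_l) + 0.053 * waist_cm - 15.745)
    return 100.0 / (1.0 + math.exp(-y))


def hepatic_steatosis_index(alt, ast, bmi, female, diabetes):
    """HSI = 8 x ALT/AST + BMI, +2 if female, +2 if diabetes."""
    return 8 * alt / ast + bmi + (2 if female else 0) + (2 if diabetes else 0)


def lipid_accumulation_product(waist_cm, tg_mmol_l, male):
    """LAP = (waist - 65) x TG in men, (waist - 58) x TG in women."""
    return (waist_cm - (65 if male else 58)) * tg_mmol_l


def nafld_liver_fat_score(mets, diabetes, insulin_mu_l, ast, alt):
    """LFS; MetS coded 1/0 and diabetes coded 2/0 (assumed coding)."""
    return (-2.89 + 1.18 * (1 if mets else 0) + 0.45 * (2 if diabetes else 0)
            + 0.15 * insulin_mu_l + 0.04 * ast - 0.94 * ast / alt)


def lfs_category(lfs):
    """Low/intermediate/high groups from the cutoffs quoted in the text."""
    if lfs <= -1.413:
        return "low"
    if lfs >= 1.257:
        return "high"
    return "intermediate"
```

Writing FLI as 100/(1 + e^−y) is algebraically the same as 100·e^y/(1 + e^y), but it avoids overflow for large y.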

Mortality follow-up

In NHANES III, cause of death was coded using the International Classification of Diseases, 10th Revision (ICD-10). As in our previous studies [13],[14], ICD-10 codes I00 to I78 and E10 to E14 were used to identify cardiovascular and diabetes mortality, respectively. As in the literature [15], malignancy and liver mortality were defined by Underlying Cause of Death (UCOD)_113 codes 20 to 23, 25 to 26, and 43, and UCOD_113 codes 15, 24, and 93 to 95, respectively. Follow-up time was measured from the study examination date to the date of death or December 31, 2006, whichever came first.
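The outcome coding and follow-up rule above can be sketched as follows. The sketch assumes that each decedent record provides an ICD-10 underlying-cause string (such as "I21.9") and an integer UCOD_113 recode. These field formats and the function names are assumptions, and the 365.25-day year is a simplification.

```python
from datetime import date
from typing import Optional, Set, Tuple

CENSOR_DATE = date(2006, 12, 31)

# UCOD_113 recodes listed in the text (malignancy: 20-23, 25-26, 43;
# liver: 15, 24, 93-95).
MALIGNANCY_UCOD113 = {20, 21, 22, 23, 25, 26, 43}
LIVER_UCOD113 = {15, 24, 93, 94, 95}


def cause_groups(icd10: str, ucod113: int) -> Set[str]:
    """Map an underlying cause of death to the study's outcome groups."""
    code = icd10.strip().upper()
    letter, number = code[0], int(code[1:3])
    groups = set()
    if letter == "I" and number <= 78:          # I00-I78
        groups.add("cardiovascular")
    if letter == "E" and 10 <= number <= 14:    # E10-E14
        groups.add("diabetes")
    if ucod113 in MALIGNANCY_UCOD113:
        groups.add("malignancy")
    if ucod113 in LIVER_UCOD113:
        groups.add("liver")
    return groups


def follow_up(exam: date, death: Optional[date]) -> Tuple[float, bool]:
    """Years from examination to death or 31 Dec 2006, whichever is earlier."""
    died = death is not None and death <= CENSOR_DATE
    end = death if died else CENSOR_DATE
    return (end - exam).days / 365.25, died
```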

Definition of diabetes, hypertension, and MetS

Diabetes was defined according to the latest American Diabetes Association (ADA) guideline as fasting glucose ≥126 mg/dl, random plasma glucose ≥200 mg/dl, or A1C ≥6.5%. Participants were considered to have hypertension if they had systolic blood pressure (SBP) ≥140 mmHg or diastolic blood pressure (DBP) ≥90 mmHg, or if they were receiving anti-hypertensive drug therapy. MetS was defined according to the joint scientific statement on harmonizing MetS [16] as having three or more of the following factors: 1) elevated blood pressure (SBP ≥130 mmHg and/or DBP ≥85 mmHg and/or receipt of anti-hypertensive drug therapy); 2) elevated triglycerides (≥150 mg/dl (1.7 mmol/l) and/or receipt of drug treatment for elevated triglycerides); 3) reduced high-density lipoprotein (HDL) cholesterol (<40 mg/dl (1.0 mmol/l) in men and <50 mg/dl (1.3 mmol/l) in women and/or receipt of drug treatment for reduced HDL cholesterol); 4) elevated fasting glucose (≥100 mg/dl (5.6 mmol/l) and/or receipt of treatment for elevated glucose); and 5) large waist circumference (>102 cm in men and >88 cm in women of European descent). Liver fat percentage was estimated using the equation reported alongside the LFS [7]; this equation uses the same variables as the LFS but combines them differently.
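The diabetes, hypertension, and harmonized MetS definitions translate directly into Boolean rules. The sketch below assumes glucose and lipids in mg/dl, blood pressure in mmHg, A1C in percent, and the European-descent waist thresholds quoted above. The `Measures` record and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measures:
    # Hypothetical record; units as noted.
    male: bool
    sbp: float                      # mmHg
    dbp: float                      # mmHg
    on_antihypertensive: bool
    tg_mg_dl: float
    on_tg_drug: bool
    hdl_mg_dl: float
    on_hdl_drug: bool
    fasting_glucose_mg_dl: float
    on_glucose_drug: bool
    waist_cm: float


def metabolic_syndrome(m: Measures) -> bool:
    """Harmonized definition: three or more of the five criteria."""
    criteria = [
        m.sbp >= 130 or m.dbp >= 85 or m.on_antihypertensive,
        m.tg_mg_dl >= 150 or m.on_tg_drug,
        m.hdl_mg_dl < (40 if m.male else 50) or m.on_hdl_drug,
        m.fasting_glucose_mg_dl >= 100 or m.on_glucose_drug,
        m.waist_cm > (102 if m.male else 88),
    ]
    return sum(criteria) >= 3


def hypertension(sbp: float, dbp: float, on_antihypertensive: bool) -> bool:
    return sbp >= 140 or dbp >= 90 or on_antihypertensive


def diabetes(fasting_glucose: Optional[float] = None,
             random_glucose: Optional[float] = None,
             a1c_percent: Optional[float] = None) -> bool:
    """ADA criteria; a missing measurement simply does not count."""
    return ((fasting_glucose is not None and fasting_glucose >= 126)
            or (random_glucose is not None and random_glucose >= 200)
            or (a1c_percent is not None and a1c_percent >= 6.5))
```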

Statistical analysis

To assess model discrimination, we calculated the area under the receiver operating characteristic (ROC) curve (AUC) for each non-invasive score of NAFLD. Differences between AUCs were tested using the maximum likelihood estimation method [17], as implemented in ROCKIT [18]. Sensitivity, specificity, positive likelihood ratio (+LR), negative likelihood ratio (−LR), and corresponding 95% CIs were also calculated. The non-invasive score with the best performance (in terms of AUC) was selected and evaluated for its association with mortality.
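The discrimination and accuracy measures can be illustrated with a small, unweighted sketch. It uses the empirical (Mann-Whitney) AUC, which is a different estimator from the binormal maximum likelihood approach implemented in ROCKIT, and it ignores the NHANES sample weights and CIs. The function names are illustrative only.

```python
def empirical_auc(scores, labels):
    """Probability that a random case scores above a random non-case."""
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    if not cases or not controls:
        raise ValueError("need both cases and non-cases")
    wins = 0.0
    for c in cases:
        for n in controls:
            wins += 1.0 if c > n else 0.5 if c == n else 0.0  # ties count half
    return wins / (len(cases) * len(controls))


def diagnostic_accuracy(scores, labels, cutoff):
    """Sensitivity, specificity and likelihood ratios at one cutoff."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return {"sensitivity": sens, "specificity": spec, "LR+": lr_pos, "LR-": lr_neg}
```

Here a score at or above the cutoff is treated as test-positive. For each score, the comparison direction would need to match the convention of its published high and low cutoffs.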

In the Cox proportional hazards regression models, the non-invasive score was modeled both as a categorical variable (defined by the thresholds) and as a continuous variable. Hazard ratios (HRs) and 95% CIs for the higher-score categories, relative to the lower-score reference categories, were calculated using simple and fully adjusted Cox regression models. In the simple model, we adjusted for age and sex. In the full model, we adopted the adjustment set of a recent study of NAFLD fibrosis and mortality [15], which includes age, sex, race/ethnicity, income, education, diabetes, hypertension, use of lipid-lowering medication, smoking, drinking, history of cardiovascular disease (CVD), waist circumference, dietary caffeine intake, HDL cholesterol, triglycerides, transferrin saturation, and C-reactive protein. P ≤ 0.05 was considered significant. The proportional hazards assumption was evaluated by including time-dependent covariates in the full regression model; the overall test was not significant (P > 0.05), suggesting that the assumption was valid. To gain additional insight into potential nonlinearity in the effect of LFS, we fitted Cox regression models with a penalized spline for LFS. Two degrees of freedom (df) were used for the spline because the model with df = 2 had the lowest Akaike information criterion (AIC), indicating the best fit. Sample weights accounting for unequal probabilities of selection, oversampling, and non-response were applied in all analyses using the complex sampling module in SPSS (V18.0; SPSS Inc, Chicago, IL, USA) or R software (V2.15.0) [19]. All values presented were weighted to represent the civilian non-institutionalized population of the USA.

We also evaluated improvement in risk reclassification using the integrated discrimination improvement (IDI) [20] and the category-less net reclassification improvement (NRI) [20]. The IDI compares the discrimination slopes of two models [21]. The category-less NRI counts the net proportion of participants whose predicted risk moves in the correct direction under the new model (upward for those who died, downward for those who did not), calculated separately for each outcome group. The estimated risk of death for each model was calculated as 1/(1 + exp(−Xβ)), where Xβ is the model's linear predictor. Analyses were performed using R software (V2.15.0) [19].
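The two reclassification measures can be sketched as follows, taking predicted risks from the old and new models for the same participants. This is an unweighted illustration with hypothetical function names. It gives point estimates only, without the CIs and P values reported in the Results.

```python
import math


def predicted_risk(linear_predictor: float) -> float:
    """Logistic transform 1 / (1 + exp(-Xbeta)) of a linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def _split(values, events):
    cases = [v for v, e in zip(values, events) if e]
    controls = [v for v, e in zip(values, events) if not e]
    return cases, controls


def idi(p_old, p_new, events) -> float:
    """Integrated discrimination improvement: change in discrimination slope."""
    def slope(p):
        cases, controls = _split(p, events)
        return sum(cases) / len(cases) - sum(controls) / len(controls)
    return slope(p_new) - slope(p_old)


def category_free_nri(p_old, p_new, events) -> float:
    """Net upward moves among cases plus net downward moves among controls."""
    diffs = [new - old for old, new in zip(p_old, p_new)]
    cases, controls = _split(diffs, events)
    up_cases = sum(d > 0 for d in cases) / len(cases)
    down_cases = sum(d < 0 for d in cases) / len(cases)
    up_controls = sum(d > 0 for d in controls) / len(controls)
    down_controls = sum(d < 0 for d in controls) / len(controls)
    return (up_cases - down_cases) + (down_controls - up_controls)
```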

Results

Cross-sectional NAFLD prediction cohort

Of the 5,184 participants included in the AUC analysis, 18.4% (95% CI 16.5 to 20.4%) had NAFLD. The characteristics of this cohort are provided in Table 1. LFS performed best in predicting NAFLD, with an AUC of 0.771 (P < 0.001), whereas the lowest AUC (0.732) was observed for HSI (Table 2). Using maximum likelihood estimation, the differences between the AUC of LFS and those of the other markers (FLI, LAP, and HSI) were statistically significant (all P < 0.01). Interestingly, the diagnostic accuracy of these markers differed by race/ethnicity (Table 2). The sensitivity and specificity, and the +LR and −LR, of the suggested high and low cutoff points for including/excluding NAFLD are provided in Table 3 and Additional file 1: Table S1, respectively. The raw numbers used to calculate diagnostic accuracy, and the characteristics of true and false positives and true and false negatives (based on the LFS thresholds), are provided in Additional file 1: Tables S2 and S3.

Table 1 Characteristics of participants in the cross-sectional NAFLD prediction cohort according to NAFLD status a
Table 2 Quality of prediction scores in predicting NAFLD
Table 3 Sensitivity and specificity of exclusion/inclusion cutoff points of LFS, FLI, and HSI in the cross-sectional NAFLD prediction cohort and literature

Prospective mortality prediction cohort

Of the NAFLD non-invasive prediction scores tested, LFS performed best. Because our second aim was to investigate the relationship of the best non-invasive score with outcome, we tested whether LFS was associated with mortality. Table 4 shows the characteristics of the participants. During a median follow-up of 14.7 years (range 0.1 to 18.2 years; 83,830.5 person-years), 793 participants died from all causes, including 311, 209, 58, and 17 deaths from cardiovascular-, malignancy-, diabetes-, and liver-related causes, respectively. Higher LFS was associated with higher mortality for all causes tested except malignancy-related causes.
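As a simple arithmetic check on these event counts, crude mortality rates per 1,000 person-years can be obtained by dividing the number of deaths by the 83,830.5 person-years of follow-up. For all-cause death, this gives roughly 9.5 per 1,000 person-years. These are unweighted crude rates, not the survey-weighted estimates used in the analyses, and cause-specific deaths are subsets of all-cause deaths.

```python
PERSON_YEARS = 83_830.5
DEATHS = {
    "all-cause": 793,
    "cardiovascular": 311,
    "malignancy": 209,
    "diabetes": 58,
    "liver": 17,
}


def rate_per_1000(deaths: int, person_years: float) -> float:
    """Crude event rate per 1,000 person-years."""
    return 1000.0 * deaths / person_years


for cause, n in DEATHS.items():
    print(f"{cause}: {rate_per_1000(n, PERSON_YEARS):.2f} per 1,000 person-years")
```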

Table 4 Characteristics of participants in the prospective mortality prediction cohort according to different LFS thresholds

The results of the Cox regression analysis are shown in Table 5. Participants in the high LFS group had a 60% higher risk of all-cause mortality than those in the intermediate LFS group (HR = 1.6; 95% CI 1.01 to 2.54; P = 0.048 in the full model). For cardiovascular mortality, a high LFS was associated with a 2.24-fold (95% CI 1.03 to 4.88; P = 0.042 in the full model) and 2.3-fold (95% CI 1.19 to 4.48; P = 0.015 in the full model) higher risk of death compared with the low and intermediate LFS groups, respectively. For liver mortality, participants in the high LFS group had a 31.25-fold (95% CI 3.13 to 333.33; P = 0.004 in the full model) and 30.3-fold (95% CI 4 to 250; P = 0.001 in the full model) higher risk of death compared with the low and intermediate LFS groups, respectively. The Kaplan-Meier curves for cardiovascular and liver-related mortality are provided in Figure 1. When LFS was treated as a continuous variable, each one-unit increase in LFS was associated with higher all-cause, cardiovascular-, liver-, and diabetes-related mortality after full adjustment, with HRs of 1.09 (95% CI 1.01 to 1.19; P = 0.039), 1.11 (95% CI 1.03 to 1.19; P = 0.006), 1.32 (95% CI 1.12 to 1.55; P = 0.001), and 1.21 (95% CI 1.02 to 1.44; P = 0.034), respectively (Table 5). Similar results were obtained after further adjustment for the NAFLD fibrosis score (NFS). The relationship between LFS and cardiovascular mortality, examined using a penalized regression spline, is shown in Figure 2.
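The Kaplan-Meier curves in Figure 1 are based on the product-limit estimator, sketched below for a single group. The sketch assumes right-censored data and ignores sample weights. When deaths and censorings share a time, it counts the deaths first and then removes the censored participants from the risk set. The plotted mortality curve corresponds to 1 − S(t). The function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate; returns [(t, S(t))] at event times."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all records sharing this time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= deaths + censored
    return curve
```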

Table 5 Association between LFS and mortality
Figure 1

Kaplan-Meier mortality curves according to liver fat score (LFS) thresholds. (a) Cardiovascular-related and (b) liver-related mortality.

Figure 2

Association between liver fat score (LFS) and cardiovascular mortality via penalized regression splines.

Age, sex, hypertension, and diabetes are commonly used to assess mortality risk. We therefore evaluated whether adding LFS (categorical: low/intermediate/high risk of NAFLD) to a basic clinical model composed of these traditional risk factors could improve risk prediction. Adding LFS to the basic clinical model produced a modest improvement in risk reclassification as measured by the IDI (IDI: 0.0131; 95% CI 0.009 to 0.017; P < 0.001). A similar result was observed using the category-less NRI (NRI: 0.133; 95% CI 0.054 to 0.211; P < 0.001).

Discussion

Using a large, nationally representative cohort with more than 10 years of follow-up and ultrasonographic data, we have demonstrated that LFS is the best prediction score for ultrasonography-diagnosed NAFLD, and can predict mortality, including cardiovascular- and liver-related mortality.

It is important to find an easy and cost-effective way to screen for NAFLD. Of the non-invasive scores we tested, LFS showed the best performance in identifying ultrasonography-diagnosed NAFLD. Notably, LFS was derived using 1H-MRS-diagnosed NAFLD, whereas FLI, HSI, and LAP were derived using ultrasonography-diagnosed NAFLD, and even among these, the definitions of NAFLD differed somewhat (see Additional file 1: Table S4). Ultrasonography is a semi-quantitative imaging technique, whereas 1H-MRS is by far the most sensitive and quantitative imaging tool for identifying hepatic steatosis. This may explain why the 1H-MRS-derived score (LFS) performed better and more robustly in identifying cases in the current study. Because there is no standard definition of ultrasonography-diagnosed NAFLD, we used three additional definitions to test the performance of the non-invasive indices (see Additional file 1: Table S5), and LFS still performed best. We also evaluated whether combining all prediction scores (combined score) could improve NAFLD prediction. The AUC of the combined score increased to 0.782 (95% CI 0.766 to 0.798), suggesting that the different prediction scores capture unique NAFLD-predicting components.

Interestingly, the diagnostic accuracy of the non-invasive scores differed by race/ethnicity, with the lowest accuracy observed in black participants for all tested scores. This suggests that the clinical risk factors for NAFLD could be ethnicity-specific, and particularly different in black populations. As with prediction models for other diseases [22], an ethnicity- or population-specific model may be required to achieve high accuracy in NAFLD prediction. Notably, although the diagnostic accuracy of LFS for NAFLD differed by race/ethnicity, there was no significant interaction (P > 0.05) between LFS and race/ethnicity on mortality; therefore, no subgroup analysis was performed in the subsequent analyses.

LFS is calculated from the AST/ALT ratio, diabetes, fasting AST level, fasting insulin level, and MetS. Because diabetes and MetS are known to be associated with mortality, the association between LFS and mortality could be attributable to these factors. However, the components of MetS were adjusted for in the full model, suggesting that the association of LFS with mortality may be independent of these factors. Different organizations recommend different waist circumference thresholds for abdominal obesity in defining MetS. In addition to the threshold suggested by ATPIII [16], we also applied the population-specific threshold suggested by the International Diabetes Federation, and the findings remained unchanged (data not shown).

In the literature on FLI, LFS, and HSI, high and low cutoff points have been proposed to include and exclude NAFLD, respectively [6],[7],[9]. The high cutoff point should have a high specificity and +LR, while the low cutoff point should have a high sensitivity and a low −LR. In general, the diagnostic performance of these cutoff points in the current study differed from that originally reported in the literature, because the study populations were different (Table 3). The high cutoff point of LFS had a slightly higher specificity in the current study (96.4%) than that reported in the literature (95%), meaning that participants with a high LFS were very likely to have NAFLD.

Two previous studies have validated non-invasive prediction scores in adults [23],[24]. Koehler et al. validated FLI and LAP in 2,652 participants in the Rotterdam Study, reporting AUCs of 0.813 and 0.786, respectively, for predicting ultrasonography-diagnosed NAFLD. FLI thus had a higher AUC in the Rotterdam Study than in the current study. Interestingly, the Rotterdam Study used the scoring protocol of Hamaguchi et al. [25], and the NHANES algorithm was also derived from that publication. However, the Rotterdam Study was a population-based cohort of elderly inhabitants of one district of Rotterdam, whereas NHANES is a nationally representative population-based study with participants of different races/ethnicities and ages. LFS was previously validated in a study of 40 non-diabetic patients with biopsy-proven NAFLD and 85 healthy controls [23], in which LFS had an AUC of 0.86. Although AUCs from different validation studies cannot be compared directly, our results agree with previous validation studies in showing that LFS had the highest AUC, followed by FLI and LAP. Although no validation study has been performed for HSI, our study suggests that HSI is a better predictor of NAFLD than LAP.

Although identifying people with NAFLD is important, identifying people at risk of adverse clinical outcomes is even more important, because NAFLD encompasses a wide spectrum of conditions, from simple steatosis to cirrhosis, with varying prognoses. In concordance with previous NHANES reports [12],[15], our study did not reveal any significant association between ultrasonography-diagnosed NAFLD and mortality (data not shown). This finding is intriguing: ultrasonography-diagnosed NAFLD is not associated with mortality, whereas LFS, a marker of NAFLD, is. This may reflect the point raised earlier, namely that LFS was derived using the sensitive and quantitative imaging tool 1H-MRS, whereas the other NAFLD prediction scores were derived using the less sensitive, semi-quantitative ultrasonography. Indeed, we found no association between the other markers of NAFLD and mortality (see Additional file 1: Table S5), further suggesting that the 1H-MRS-derived LFS may be superior in identifying NAFLD and predicting clinical outcome.

We then investigated which individual components of LFS were associated with CVD mortality in the multivariable model, and found that the only significant association was with fasting serum insulin (estimate 1.02; 95% CI 1.01 to 1.03; P < 0.001), suggesting that fasting serum insulin may be the main driver of the observed association.

Notably, high LFS was also associated with low transferrin saturation (Table 4). In our previous study, we showed that low transferrin saturation was robustly associated with pre-diabetes [26]. These observations suggest that elevated insulin resistance might be the key factor leading to mortality in people with NAFLD, which may also explain why LFS predicts mortality whereas ultrasonography-diagnosed NAFLD does not. Another possibility is that high LFS indicates the presence of other NAFLD-related conditions, such as NASH and fibrosis. In participants with NAFLD, high LFS was associated with high NFS, a prediction score for NAFLD fibrosis [27] (data not shown), although NFS neither predicted NAFLD (data not shown) nor was significantly associated with mortality in the general population (see Additional file 1: Table S5). However, further adjustment for NFS showed that the effect of LFS was independent of NFS (Table 5), suggesting that the association between LFS and mortality may be independent of NAFLD fibrosis. Future studies are required to confirm our observations and to examine the underlying mechanisms.

Age, sex, and presence of diabetes or hypertension are simple risk factors commonly used by clinicians to evaluate mortality risk. We showed that LFS has an independent role in predicting mortality and improves risk reclassification beyond these factors. Similarly, although the Framingham Risk Score (FRS) was not intended for mortality prediction, we found that the associations of LFS with cardiometabolic disease-related mortality were independent of FRS (see Additional file 1: Table S6). These findings suggest that abnormal liver function may play a role in determining mortality, independently of traditional risk factors.

Our study has several strengths. The study population is large, multiethnic, nationally representative, and well-characterized, with data on ultrasonography-diagnosed NAFLD, multiple risk factors, and potential confounders. The long follow-up and the large number of events provided ample statistical power. The wide range of collected data from NHANES III allowed construction of four different non-invasive prediction scores simultaneously, so that they could be compared in parallel and with different definitions of NAFLD.

Nevertheless, there are limitations. The major limitation of the current study is the use of ultrasonography-diagnosed NAFLD, which can lead to misclassification. In the absence of a standard definition, we defined NAFLD as moderate or severe hepatic steatosis and non-NAFLD as no or mild hepatic steatosis, as in a previous study [12]. This case definition, especially the classification of mild hepatic steatosis as non-NAFLD, could be a source of bias. It could have led to underestimation of NAFLD prevalence in the current study relative to the 20% to 33% reported in the general population [28], although the lower prevalence observed could also reflect the lower prevalence of obesity in the current study [29]. Ultrasonography is acknowledged to have limited sensitivity and specificity in diagnosing NAFLD, especially when less than 33% of the liver parenchyma is infiltrated by fat [30],[31], and in the presence of liver cirrhosis, which may be accompanied by decreased hepatic steatosis. To test the robustness of our findings, we defined NAFLD in different ways and still found that LFS was the best marker of NAFLD (Table 4; see Additional file 1: Table S7); in addition, participants with mild hepatic steatosis did not have increased mortality compared with those without hepatic steatosis (see Additional file 1: Table S8).

Although liver biopsy is considered the gold standard for diagnosing NAFLD, performing liver biopsy in large numbers of asymptomatic individuals is not justifiable; ultrasonography therefore remains an acceptable first-line screening procedure for NAFLD in clinical practice [32]. However, ultrasonography cannot distinguish between NASH, fibrosis, and cirrhosis.

The SteatoTest was not included in the current study; whether it is superior to LFS requires further study.

In the Cox regression analysis, there were few liver-related deaths (n = 17), which may have led to unreliable estimates, as reflected in the very wide CIs, and could also be a source of bias; these results therefore require cautious interpretation.

Conclusion

In conclusion, we found that the 1H-MRS-derived NAFLD prediction score LFS was the most robust non-invasive score for identifying NAFLD in this US population, and that it predicted mortality. NAFLD is highly prevalent and can be associated with morbidity and mortality if left unidentified. If confirmed in future studies, LFS may be a useful tool for large-scale NAFLD screening and for predicting long-term clinical outcomes.

Authors' contributions

CLC was responsible for study conception and design, data generation and analysis, and manuscript drafting. CLC, KSLL, ICKW, and BMYC were responsible for data interpretation. All authors reviewed, commented upon, and approved the final submission.

Authors' information

CLC is the article guarantor.

Additional file