Introduction

Nonalcoholic fatty liver disease (NAFLD) has reached pandemic proportions, currently affecting 25–30% of the United States population [1]. An estimated 2 million Veterans (30%) have NAFLD, with the prevalence nearly tripling in recent years [2]. NAFLD encompasses a spectrum of liver disease including hepatic steatosis (HS) and its more severe form, nonalcoholic steatohepatitis (NASH), which can progress to cirrhosis and hepatocellular carcinoma (HCC). Patients with NAFLD are also at an increased risk of non-hepatic complications, including cardiovascular events [3, 4], extrahepatic malignancies [5], and overall poor quality of life [6]. As NASH-related cirrhosis is becoming the leading indication for liver transplantation [7, 8], early disease detection is critical.

However, diagnosing NAFLD in its early stages remains a challenge. In routine clinical care, HS found incidentally on imaging, with or without elevated transaminases, rather than recognition of metabolic comorbidities as part of a dedicated evaluation, prompts referral to hepatology/gastroenterology clinics for further evaluation of NAFLD. Liver biopsy (LB) remains the diagnostic gold standard but is limited by cost, sampling error, and procedural risks [9], preventing its implementation at a population scale and restricting its use primarily to clinical trials [10]. To address this challenge, in 2021, the American Gastroenterological Association (AGA) developed guidelines that encourage primary care providers to recognize and screen patients at risk of NAFLD, defined as having (1) at least 2 metabolic risk factors (obesity, hypertriglyceridemia, low high-density lipoprotein [HDL], and hypertension or prediabetes), (2) type 2 diabetes (diabetes), or (3) elevated transaminases [11]. However, given the large burden of NAFLD coupled with competing clinical responsibilities during primary care visits, developing easily applicable, automated algorithms embedded in electronic health records (EHR) to identify at-risk patients can facilitate early detection.

Vibration-controlled transient elastography (VCTE, FibroScan®) can estimate liver stiffness and HS noninvasively [12]. The VCTE controlled attenuation parameter (CAP) has been validated in HS diagnosis in several prospective biopsy-controlled studies that report excellent diagnostic accuracy for HS detection [13,14,15,16,17]. It is therefore gaining traction as a non-invasive method of risk-stratifying chronic liver disease. Further, HS as measured by the CAP score is associated with increased overall mortality in the United States, independent of hepatic fibrosis [18]. Despite its advantages, implementation of VCTE has challenges: it is not available in all clinics; it requires additional training in its interpretation; and most importantly, it has not been shown to be cost effective in an initial assessment of patients at risk for NAFLD [19].

The Veterans Affairs Healthcare System (VAHCS) is the largest integrated health system in the United States with long-term longitudinal clinical and demographic data. Despite the growing burden of NAFLD among Veterans, Veterans Health Administration (VHA)-based NAFLD predictive algorithms are lacking [20]. Given that VCTE is becoming the standard of care in clinical practice [11, 21] and given the unique data that the VHA EHR offers, we aimed to develop a clinical predictive algorithm using the VCTE CAP score to identify Veterans at risk for NAFLD.

Methods

Data Source

The study was approved by the Veterans Affairs Institutional Review Board. VCTE was introduced to the VA Greater Los Angeles Healthcare System (VAGLAHS) in 03/2016 for assessment of hepatic fibrosis. In 01/2017, the CAP was introduced for HS measurement. We queried the Corporate Data Warehouse, a repository of all clinical health data at VAGLAHS, from 01/2017 to 06/2020.

Discovery Cohort

At VAGLAHS, 1,767 VCTE records were available between 01/2016 and 06/2020, of which 1,403 had a CAP score. As VCTE is operator dependent and requires training for proficiency [22], we selected cases from 2018 to 2020 to allow enough time for operators to become proficient. Three trained navigators (S.B., R.S., A.L.) abstracted clinical data from the EHR using a common data abstraction form after being trained by the same person (S.B.). Near the end of data abstraction, one primary clinician abstractor (S.B.) reviewed a 20% random sample of each abstractor’s data to ensure accuracy. The study team discussed and resolved any discrepancies by consensus. This resulted in 740 unique patients with VCTE reports. Exclusion criteria were as follows: high-risk ethanol use (any AUDIT-C > 3 in women or > 4 in men [20], or a diagnosis of alcohol-associated liver disease (ALD) made by a hepatologist); chronic hepatitis C virus (HCV) genotype 3 (n = 4), given that it is a competing cause of HS; a history of primary or secondary liver malignancy or of liver transplantation; and VCTE reports with poor technical quality (liver stiffness measurement interquartile range (IQR) > 30% [23]). After these exclusions, 434 patients remained. Patients lacking any radiographic or LB data were also excluded, yielding a final discovery cohort of 403 patients (Fig. 1).
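The exclusion rules above can be written as a simple filter. The following is a minimal sketch, not the study's abstraction logic: the `Patient` record and all of its field names are hypothetical, and the quality criterion is assumed to mean an IQR-to-median ratio of liver stiffness above 30%.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All field names are hypothetical, chosen for illustration only.
    sex: str                 # "F" or "M"
    audit_c: int             # highest recorded AUDIT-C score
    ald_diagnosis: bool      # ALD diagnosed by a hepatologist
    hcv_genotype3: bool      # chronic HCV genotype 3
    liver_malignancy: bool   # primary or secondary liver malignancy
    liver_transplant: bool
    lsm_iqr_pct: float       # assumed: IQR / median of LSM, in percent


def is_excluded(p: Patient) -> bool:
    """Return True if the patient meets any exclusion criterion."""
    audit_cutoff = 3 if p.sex == "F" else 4
    high_risk_alcohol = p.audit_c > audit_cutoff or p.ald_diagnosis
    poor_quality = p.lsm_iqr_pct > 30.0
    return (
        high_risk_alcohol
        or p.hcv_genotype3
        or p.liver_malignancy
        or p.liver_transplant
        or poor_quality
    )
```

Note that the AUDIT-C cut-offs are strict inequalities, so a score of 3 in a woman or 4 in a man does not trigger exclusion on its own.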

Fig. 1 Study population

Data Collection

Socio-demographic and clinical information was collected from the EHR, including age, sex, race, ethnicity, hepatitis B surface antigen, hepatitis C antibody, hepatitis C RNA, human immunodeficiency virus (HIV) antibody, and antiviral treatment history (hepatitis B and C); body mass index (BMI) and systolic and diastolic blood pressures on the date of VCTE; and whether patients had active prescriptions for anti-hypertensives, statins, or diabetes medications at the time of VCTE. Laboratory parameters including white blood cell count, platelet count, alanine aminotransferase (ALT), aspartate aminotransferase (AST), alkaline phosphatase (AP), total bilirubin, lipid panel, hemoglobin A1c, and random glucose were also obtained. All laboratory parameters were collected on the dates closest to the VCTE date and no more than 1 year from the VCTE date. All VCTE (FibroScan® 502 touch) reports were reviewed. CAP score, liver stiffness measurement (LSM) with IQR, and probe size (M or XL) were obtained from each VCTE report.

Selection of Covariables

Covariables were selected a priori based on previous NAFLD algorithms, considering both biological plausibility and clinical availability. We included age, sex, race, and ethnicity because prior studies have described their associations with NAFLD [24]. Given that many of the clinical and laboratory variables were part of the metabolic syndrome, multicollinearity was evaluated prior to final covariable selection (Supplementary Fig. 2). Patients who were prescribed statin medications exhibited a significantly lower low-density lipoprotein (LDL) (p < 0.001; Supplementary Fig. 3), so we avoided including both LDL and statin use because of the anticipated causal relationship. We assessed the effect of anti-hypertensive medication prescription rather than blood pressure measurement, given that a single random blood pressure on the day of the VCTE may not accurately capture a clinical diagnosis of hypertension. For diabetes, because having an active prescription for diabetes medication did not have the predicted relationship with hemoglobin A1c (being on treatment was associated with higher hemoglobin A1c, p < 0.001; Supplementary Fig. 4), we defined diabetes as a hemoglobin A1c > 6.5% or having a diabetes medication prescription. We considered including HDL, triglycerides, and BMI as components of metabolic syndrome, consistent with the National Heart, Lung, and Blood Institute (NHLBI) definition and given the AGA Clinical Care Pathway guidance statement (see Introduction). Given that triglycerides are commonly included in the NHLBI metabolic syndrome definition, they were selected for inclusion over HDL. ALT and AST demonstrated a high correlation (r = 0.81), and given the association between ALT and NAFLD in previous work [20], ALT alone was selected for inclusion.

Validation Cohort

To validate the predicted NAFLD clinical definition and CAP score, we identified 40 unique Veterans in 2017–2021. VCTE reports and either imaging (ultrasound [US] or magnetic resonance imaging [MRI]) or LB were available for all 40 patients (see Statistical Analysis for power calculation).

Imaging Data

Abdominal US and MRI were reviewed centrally for the presence or absence of HS by a single expert abdominal radiologist (F.H.) who was blinded to the remainder of the data and statistical analysis. Of note, only 7 MRI with elastography (MRE) with PDFF to quantitate HS were available and included in our cohort (MRE was introduced after VCTE at VAGLAHS). HS on MRI was identified using T1-weighted gradient-echo in-phase and out-of-phase sequences. If there was signal drop-out in the out-of-phase sequence compared to the in-phase sequence, then HS was diagnosed. Computerized tomography imaging data were excluded because of their limited accuracy in HS diagnosis [25]. All imaging data were obtained at the time closest to the VCTE date, with a median time of 8.2 weeks for US and 2.1 weeks for MRI. Of note, 3 abdominal USs were conducted outside our local VA, including through community care, and could not be reviewed locally.

Liver Biopsy

LB pathology reports were reviewed when available. The presence or absence of HS was recorded as a binary variable and included any grade of steatosis. Of note, NAFLD activity score was not clinically utilized.

Statistical Analysis

Covariable Descriptions

Patient demographics and clinical parameters were summarized by mean ± standard deviation (SD) for continuous variables and frequency (%) for categorical variables.

Power Calculation for Validation Cohort

To test the fit of our 6-predictor regression, assuming the tests for total regression with \(R^{2}\) = 0.35 (\(f^{2}\) = 0.55), n = 29 patients were needed for a power of 80% and a significance of 0.05.
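For reference, the effect size quoted above follows from Cohen’s standard conversion between the coefficient of determination and \(f^{2}\). This is shown as a worked step under that assumption and is not taken from the authors’ own calculation:

$$f^{2} = \frac{R^{2}}{1 - R^{2}} = \frac{0.35}{1 - 0.35} \approx 0.54,$$

which is close to the reported value of 0.55.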

Development and Performance of NAFLD Definition

We performed multivariable linear regression modeling, using CAP score as a continuous outcome and backward elimination of covariables, to determine significant predictors of HS. Specifically, the “caret” library in R was used to implement the leave-one-out cross-validation approach, which offers more stable estimates in high-dimensional, small-sample data. To evaluate the performance of our NAFLD definition, we determined the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). We also calculated the correlation between predicted and actual CAP scores.
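To illustrate the leave-one-out idea described above, the sketch below refits a model once per patient, each time predicting the single held-out observation. It is a deliberately simplified stand-in: it uses one predictor and plain Python, whereas the study used a multivariable model fitted through the R “caret” library. The function names and the toy data are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loocv_rmse(xs, ys):
    """Leave-one-out cross-validated root mean squared error."""
    squared_errors = []
    for i in range(len(xs)):
        # Fit on every observation except the i-th one.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_simple_ols(train_x, train_y)
        # Score the model on the held-out observation only.
        prediction = intercept + slope * xs[i]
        squared_errors.append((ys[i] - prediction) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Toy data (made up): BMI as predictor, CAP in dB/m as outcome.
toy_bmi = [22.0, 25.0, 28.0, 31.0, 34.0, 37.0]
toy_cap = [215.0, 240.0, 262.0, 281.0, 300.0, 330.0]
toy_rmse = loocv_rmse(toy_bmi, toy_cap)
```

Because every observation serves as the test case exactly once, the error estimate uses all of the data without ever scoring a model on a point it was trained on.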

Determination of CAP Threshold

Given the heterogeneity in steatosis thresholds across different populations [26], we developed an internally validated CAP threshold that distinguishes the presence or absence of HS in Veterans. We defined HS using a hierarchical composite algorithm in which all available LBs were selected first (classified as positive or negative for any grade of HS), followed by all unique MRIs (classified as positive or negative for HS), and finally unique US (classified as positive or negative for HS). Agreement between the test results for LB, MRI, and US was explored using Cohen’s kappa. Associations between CAP score and each of LB, MRI, and US were explored using unpaired t tests. Receiver operating characteristic (ROC) analysis was used to determine the Veteran-specific CAP cut-off using our HS definition as a binary variable. When results conflicted, the result from the more accurate test was used, in the order LB, then MRI, then US. Each patient was counted only once. We identified 31 patients without a LB, MRI, or US. Since CAP scores of the 31 patients ranged from no HS to severe steatosis [27] (Supplementary Fig. 1), they were excluded from our final cohort (Fig. 1). The optimal CAP threshold for HS detection was selected using the Youden method. The ROC curve was plotted using the R 4.1 library pROC. The optimal cut-off was identified using the cutpointr library function “oc_youden_kernel,” which maximizes the sum of sensitivity and specificity on the ROC curve after smoothing of the predictor by a binned kernel density estimate. p values are from 2-sided tests, with p < 0.05 considered statistically significant. Analyses were completed in R 4.1.2.
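The hierarchical composite reference standard described above amounts to taking the first available result in the order LB, MRI, US. A minimal sketch follows; the function name and the use of `None` for a missing test are assumptions made for illustration.

```python
from typing import Optional


def composite_hs(
    lb: Optional[bool],
    mri: Optional[bool],
    us: Optional[bool],
) -> Optional[bool]:
    """Return the HS status from the highest-ranked available test.

    Each argument is True (HS present), False (HS absent), or None
    (test not performed). Returns None when no test is available,
    which corresponds to exclusion from the final cohort.
    """
    for result in (lb, mri, us):
        if result is not None:
            return result
    return None
```

For example, a patient with a negative LB but a positive US is classified as negative, because the LB result takes precedence.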

Results

Demographic and Clinical Characteristics

Four hundred and thirty-four patients met our inclusion criteria (Fig. 1). Of these, 403 had at least one of LB, MRI, or US available for inclusion in CAP threshold development (Table 1). All 403 patients had a complete set of pre-selected predictors. The mean age was 60.1 ± 13.4 years. The cohort was predominantly male (96.8%, n = 390) and was racially and ethnically diverse, with African American/Black race representing 26.1% (n = 105) and Hispanic ethnicity constituting 19.5% (n = 78) of the cohort. Most patients had features of the metabolic syndrome: 53.5% (n = 215) were on anti-hypertensive medications; 34.0% (n = 137) were on diabetes treatment; and 51.0% (n = 205) were prescribed a statin. The median CAP score in our cohort was 286.5 dB/m (Fig. 2a). The median LSM was 6.6 kPa, and the majority of patients (n = 295, 73.2%) had early-stage (F0–F2) fibrosis and were therefore less likely to have “burnt-out” NASH [28] (Fig. 2b). Of the patients with LSM ≥ 20 kPa (n = 39), 51% had CAP > 274 dB/m (mean 280.5 dB/m). This was consistent with largely preserved synthetic liver function, evidenced by mean total bilirubin of 0.8 ± 0.4 mg/dL and mean platelet count of 210.6 × \(10^{9}\)/L, suggestive of no clinically significant portal hypertension.

Table 1 Patient and clinical demographics of cohort (n = 403)

Fig. 2 CAP and fibrosis distribution of cohort based on VCTE (n = 403)

NAFLD Predictors and Model Performance

Model Development

Our regression analysis included the following covariables: age, sex, race, ethnicity, statin prescription, anti-hypertensive prescription, ALT, triglycerides, cholesterol, diabetes, and BMI. After stepwise variable selection and cross-validation, the optimal multivariable linear model had 6 predictors (F(6, 387) = 37.8, p < 0.001, \({R}_{\mathrm{adj}}^{2}\) = 0.36; Supplementary Fig. 5). The 6 predictors were African American/Black race, Hispanic ethnicity, cholesterol, diabetes, BMI, and triglycerides (Table 2). Consistent with previous work [24], we found a significant positive association between CAP scores and self-identified Hispanic ethnicity (p < 0.001; Supplementary Fig. 6) and a significant negative association between CAP scores and self-identified African American/Black race (p < 0.001; Supplementary Fig. 6). The final NAFLD definition derived from the multivariable model was as follows:

$$\begin{aligned} \widehat{\mathrm{CAP}} = {} & 75.67 + \left( -26.6 \times \text{Black} \right) + \left( 13.7 \times \text{Hispanic} \right) + \left( 0.15 \times \text{Cholesterol} \right) + \left( 5.3 \times \text{BMI} \right) \\ & + \left( 32.0 \times \text{Diabetes (prescription or high A1c)} \right) + \left( 0.06 \times \text{Triglycerides} \right) \end{aligned}$$
Table 2 Significant clinical predictors of CAP scores
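The predictive equation above is straightforward to compute from routinely collected values. The sketch below transcribes its coefficients; the function and parameter names are hypothetical, the race, ethnicity, and diabetes terms are assumed to be 0/1 indicators, and cholesterol and triglycerides are assumed to be in mg/dL, since units are not stated alongside the equation.

```python
def has_diabetes(a1c_pct: float, on_diabetes_medication: bool) -> bool:
    """Diabetes as defined in Methods: A1c > 6.5% or a diabetes prescription."""
    return a1c_pct > 6.5 or on_diabetes_medication


def predicted_cap(
    black: bool,
    hispanic: bool,
    cholesterol: float,    # assumed mg/dL
    bmi: float,            # kg/m^2
    diabetes: bool,
    triglycerides: float,  # assumed mg/dL
) -> float:
    """Predicted CAP score (dB/m) from the published coefficients."""
    return (
        75.67
        - 26.6 * int(black)
        + 13.7 * int(hispanic)
        + 0.15 * cholesterol
        + 5.3 * bmi
        + 32.0 * int(diabetes)
        + 0.06 * triglycerides
    )


# Hypothetical patient: non-Black, non-Hispanic, cholesterol 200,
# BMI 30, A1c 7.1% without medication, triglycerides 150.
example_cap = predicted_cap(
    black=False,
    hispanic=False,
    cholesterol=200.0,
    bmi=30.0,
    diabetes=has_diabetes(7.1, False),
    triglycerides=150.0,
)
```

For this hypothetical patient the terms sum to 75.67 + 30 + 159 + 32 + 9, or about 305.7 dB/m.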

Model Performance in Discovery Cohort

We used our NAFLD predictive equation to calculate a predicted CAP (\(\widehat{\mathrm{CAP}}\)) score for each patient. \(\widehat{\mathrm{CAP}}\) and CAP were significantly correlated (p < 0.001, 95% CI 0.54–0.67; Fig. 3). We further evaluated the performance of our predictive model within our Veteran cohort using our internally developed CAP threshold of 273.5 dB/m. \(\widehat{\mathrm{CAP}}\) was classified as above or below 273.5 dB/m (see above) and compared to actual CAP as above or below 273.5 dB/m. The sensitivity, specificity, NPV, and PPV for our model to identify HS as defined by CAP > 273.5 dB/m were 77%, 76%, 72%, and 80%, respectively.
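The comparison described above dichotomizes both the predicted and the measured CAP at 273.5 dB/m and tabulates the agreement. A minimal sketch of that calculation follows; the function name and the toy values are hypothetical and are not study data.

```python
CAP_THRESHOLD = 273.5  # dB/m, the internally developed cut-off


def diagnostic_metrics(predicted, measured, threshold=CAP_THRESHOLD):
    """Sensitivity, specificity, PPV, and NPV of predicted vs. measured CAP.

    Measured CAP above the threshold is treated as the reference for HS.
    Assumes each cell combination needed for a denominator is non-empty.
    """
    tp = fp = tn = fn = 0
    for p, m in zip(predicted, measured):
        predicted_positive = p > threshold
        reference_positive = m > threshold
        if predicted_positive and reference_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif reference_positive:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Toy values (made up) for six patients.
toy_predicted = [300.0, 280.0, 250.0, 260.0, 290.0, 240.0]
toy_measured = [310.0, 260.0, 240.0, 280.0, 300.0, 250.0]
toy_metrics = diagnostic_metrics(toy_predicted, toy_measured)
```

In the toy data there are 2 true positives, 1 false positive, 1 false negative, and 2 true negatives, so all four metrics equal 2/3.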

Fig. 3 Relationship of CAP predicted by formula and measured CAP

Model Performance in Validation Cohort

We used our NAFLD predictive equation to calculate a predicted CAP score for each patient in the validation cohort (n = 40) (Supplementary Table 3). CAP and \(\widehat{\mathrm{CAP}}\) were significantly correlated (p < 0.001, 95% CI 0.46–0.81; see Supplementary Figs. 7 and 10). We again classified \(\widehat{\mathrm{CAP}}\) and actual CAP for each patient as above or below 273.5 dB/m and compared them. The sensitivity, specificity, NPV, and PPV for our model to identify HS as defined by CAP > 273.5 dB/m were 82%, 83%, 86%, and 78%, respectively.

Development of Internally Validated CAP Threshold

Associations Between Diagnostic Modalities

We first verified that the CAP score was significantly higher when steatosis was present for LB (t(41) = 4.9, 95% CI 45.9–110.9), MRI (t(109) = 6.4, 95% CI 50.4–95.3), and US (t(366) = 9.5, 95% CI 51.4–78.2) (Supplementary Fig. 8). Several patients had more than one US, MRI, and LB available (Supplementary Table 2). When a patient had multiple studies available, we compared imaging and LB data for the presence or absence of HS. US and LB (κ(50) = 0.47, 95% CI 0.47–0.72) and US and MRI (κ(104) = 0.45, 95% CI 0.6–0.75) had moderate concordance, while MRI and LB exhibited fair/low concordance (κ(50) = 0.22, 95% CI − 0.25 to 0.69), attributable to a low count (n = 14) (Supplementary Fig. 9). As expected and as is typical of a real-world cohort, the majority of our data had HS arbitrated by radiographic evidence (either MRI or US, n = 349; 87%) and fewer by LB (n = 54; 13%).
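Cohen’s kappa, used above to compare modalities, corrects the observed agreement between two binary tests for the agreement expected by chance. The sketch below implements the standard two-rater formula; the function name and example values are hypothetical.

```python
def cohens_kappa(test_a, test_b):
    """Cohen's kappa for two binary tests on the same patients.

    test_a and test_b are equal-length sequences of 0/1 (or bool)
    results. Undefined (division by zero) when chance agreement is 1,
    i.e. when both tests give the same single result for everyone.
    """
    n = len(test_a)
    observed = sum(a == b for a, b in zip(test_a, test_b)) / n
    rate_a = sum(test_a) / n  # proportion positive on test A
    rate_b = sum(test_b) / n  # proportion positive on test B
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    return (observed - expected) / (1 - expected)


# Toy example (made up): four patients, one disagreement.
toy_kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

In the toy example the observed agreement is 0.75 and the chance agreement is 0.5, giving a kappa of 0.5.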

ROC Analysis and CAP Threshold Selection

The area under the curve for the CAP scores predicting NAFLD as defined by HS on US, MRI, or LB was 75.5% (95% CI 70.7–80.3%) (Fig. 4). The Youden cut-off was 273.5 dB/m, resulting in sensitivity of 74.8%, specificity of 68.9%, PPV of 75.4%, and NPV of 68.2%. We additionally assessed the CAP threshold using only the sub-group of patients who had a LB or MRI (without US), given the higher sensitivity of these tests in detecting HS. A total of 155 patients had either a LB or MRI. In this sub-group, the optimal CAP threshold was 272 dB/m, similar to that of the full cohort including US (Supplementary Fig. 10). Alternate CAP cut-offs and their resulting sensitivity, specificity, PPV, and NPV values are provided in Supplementary Table 4.
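The Youden cut-off reported above maximizes J = sensitivity + specificity − 1 over candidate thresholds. The sketch below shows the plain empirical version of that search; it is not the kernel-smoothed “oc_youden_kernel” method the study used, and the function name and toy data are hypothetical.

```python
def youden_cutoff(scores, labels):
    """Empirical Youden-optimal cut-off for a 'score > cut-off' rule.

    scores: CAP values; labels: 1 if HS present by the reference
    standard, else 0. Requires both classes and at least two distinct
    scores. Ties in J are resolved in favor of the lowest cut-off.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    distinct = sorted(set(scores))
    # Candidate cut-offs: midpoints between consecutive distinct scores.
    candidates = [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]
    best_cut, best_j = None, -1.0
    for cut in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if y and s > cut)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s <= cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy data (made up): eight CAP scores with reference HS status.
toy_scores = [200, 220, 240, 260, 280, 300, 320, 340]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_cut, toy_j = youden_cutoff(toy_scores, toy_labels)
```

With small samples the empirical optimum can jump between neighboring cut-offs, which is one motivation for smoothing the predictor distribution before choosing the threshold.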

Fig. 4 Receiver operating characteristic curve replicating CAP scores that suggest tests positive for hepatic steatosis (n = 403)

Discussion

In this study, we developed a novel HS clinical predictive algorithm that can be used to screen Veterans in primary care and endocrinology clinics. Our model has key features that are worth emphasizing: (1) it uses clinically reasonable and readily available predictors, allowing ease of implementation; (2) it is the first to utilize the VCTE CAP score in its development in a Veteran population; (3) it is validated in an independent Veteran cohort; and (4) it is in keeping with growing efforts to develop clinical care pathways that can capture NAFLD patients early, before they develop late complications, especially as most cirrhosis related to NAFLD is found incidentally [29].

In the context of linkage to care, we envision embedding our predictive model in the EHR to identify patients at risk for NAFLD in primary care clinics. Patients who are predicted to have NAFLD based on the predicted CAP score can subsequently be risk stratified by FIB-4, which can guide downstream assessment by VCTE and referral to sub-specialty care. FIB-4 has been shown to have low accuracy when applied to the general population and has the potential to overestimate the risk of liver disease and cause over-referrals to sub-specialty care [30, 31]. While VCTE is more accurate, its general use would hinder clinics without this technology and would likely drive up health care costs, given that it has been shown not to be cost effective as an initial non-invasive test for NAFLD assessment [19]. There is therefore a need for simple clinical tools, such as ours, to bridge non-invasive testing where VCTE may not be available, as recommended by the society guidelines [11, 32].

Variable selection was methodical and informed by both NAFLD biologic plausibility and ease of clinical use. Noureddin and colleagues predicted the prevalence of NAFLD based on CAP scores using machine learning methods in the National Health and Nutrition Examination Survey cohort [33]. In their agnostic approach, they identified all of our predefined covariables, affirming the validity of our model [33]. However, they also identified waist circumference as a significant predictor. Waist circumference is not collected in routine clinical care, as evidenced by the lack of waist circumference measurements in our cohort, which limits the clinical applicability of their model. Previous algorithms, such as the fatty liver index, the NAFLD-Liver Fat Score, the NAFLD Screening score, and the ALD/NAFLD Index, are similarly limited because they utilize variables such as waist circumference, insulin level, uric acid, and haptoglobin, which are not routinely collected in primary care [34, 35]. By contrast, our \(\widehat{\mathrm{CAP}}\) model can be calculated for any Veteran evaluated in primary or sub-specialty clinics without requiring additional dedicated bloodwork or clinic encounters.

In addition to predicting a CAP score, identifying a CAP threshold that diagnoses HS can simplify clinical decision-making. We noted heterogeneity in previously published CAP thresholds that may be attributable to population-specific factors or the method by which HS was detected [33, 36, 37]. We therefore opted to determine an internal VA CAP threshold rather than using historical cut-offs. Although LB remains the gold standard in the spectrum of NAFLD and NASH, few are conducted clinically. The use of LB to select NAFLD cases also introduces selection bias because only a small, narrow subset of patients with NAFLD are, in fact, biopsied. To achieve more real-world applicability, our study incorporated imaging modalities to arbitrate the diagnosis of steatosis. Previous investigation into the sensitivity and specificity of US and MRI in HS detection guided our approach [25]. While US is the most commonly used method in real-world practice, as evidenced by our findings, its sensitivity for HS detection is lower when steatosis is < 30% [25]. We found that our CAP score findings in Veterans are concordant with recent VHA NAFLD Consensus Guidelines [38], which were guided by expert opinion.

In our proposed score, self-identified Black race was a negative predictor, while Hispanic ethnicity was a positive predictor of NAFLD. This is consistent with prior large genetic and epidemiologic studies on population-specific prevalence of NAFLD [24, 39]. It is increasingly recognized that differences in epidemiology, natural history, and outcomes according to race and ethnicity may at least partially reflect underlying unmeasured social and structural determinants of health [40]. The inclusion of race and ethnicity in our clinical score can serve to indirectly measure these influences in addition to any potentially unknown hereditary factors which contribute to NAFLD risk, and as our data show, can improve prediction of CAP score and NAFLD risk overall.

Our study is not without limitations, including its retrospective nature and single-center study population. Our cohort is subject to selection bias, as it mainly consists of patients evaluated in sub-specialty care, where the vast majority of VCTE are ordered. We also recognize that patients with “burnt-out” NASH may not have steatosis, potentially precluding them from being identified, although we found that 51% of our cohort with LSM ≥ 20 kPa still had elevated CAP scores. Finally, the population is predominantly male, and the model may not be generalizable to non-VA populations, but it performs well within Veterans, where VA-specific scores are needed.

NAFLD remains underdiagnosed [41,42,43], limiting our appreciation of the true disease burden and, therefore, our ability to tackle it most effectively: at an early stage, when primary prevention can avert or delay complications. In line with the recent NAFLD Consensus Statement and society guidelines advocating for early identification [32, 44], we propose that incorporating our predictive algorithm in the VHA EHR would simplify the identification of at-risk Veterans in primary care and endocrinology clinics, who can subsequently be risk stratified for advanced fibrosis by non-invasive tests [11, 44, 45]. This would provide an opportunity to implement preventative interventions, including weight loss counseling, nutrition referral, and initiation of structured exercise programs, which are available in VAHCS [46]. Future work will focus on validating the predictive model prospectively in Veterans, while quality improvement measures in NAFLD clinical care pathways and engagement of key stakeholders are ongoing.