Background

Liver disease is a major cause of death in many countries [1], and patients with liver fibrosis and cirrhosis are at high risk of developing liver cancer. Liver cancer is ranked among the top three cancers in 46 countries. In 2020, 905,700 persons worldwide were diagnosed with liver cancer, and 830,200 died from the disease. The global burden of primary liver cancer is estimated to increase by more than 55% from 2020 to 2040 [1]. Earlier diagnosis of liver disease can improve outcomes and reduce the risk of progression to malignancy.

Current clinical practice guidelines include liver biopsy as the gold standard for the diagnosis of liver fibrosis and cirrhosis [2]; however, liver biopsy is invasive and is prone to sampling error. Noninvasive tests of liver function and fibrosis provide additional diagnostic information, and diagnostic accuracy can be improved by combining information into algorithms that include patient factors and clinical biomarkers. Current diagnostic algorithms include Fib-4, the enhanced liver fibrosis (ELF) test, the aspartate aminotransferase-to-platelet ratio index (APRI), Fibrotest (CE)/FibroSure (US), and Fibrometer (see Additional File 1). Imaging tests include Fibroscan and magnetic resonance elastography (MRE). While these tests and algorithms help guide clinical decision making, each of them has drawbacks and limitations. Additional biomarkers are needed to improve the accuracy of current methods, particularly for the earlier detection of liver fibrosis and cirrhosis.

GP73 was initially described as a novel Golgi-localized protein that is upregulated in viral and nonviral liver disease [3]. A subsequent study found that GP73 protein levels were minimal or undetected in normal liver but significantly elevated in hepatitis B virus (HBV)-, hepatitis C virus (HCV)-, and alcohol-induced liver disease, as well as autoimmune liver disease [4]. Several groups have since developed immunoassays to study serum GP73 levels for the diagnosis of liver disease [5,6,7,8,9,10,11,12,13,14] including chronic hepatitis, cirrhosis, and hepatocellular carcinoma (HCC). One study found that GP73 detected by enzyme-linked immunosorbent assay (ELISA) was a more accurate biomarker of liver fibrosis compared to Fib-4 (area under the curve [AUC] 0.751 for Fib-4 versus 0.898 for GP73) [14]. However, we previously showed that GP73 was not significantly elevated in HCC compared to the biomarker PIVKA-II (protein induced by the absence of vitamin K or antagonist-II) and AFP [15].

Laminin-332 (Ln-332) is abundant in HCC tissue, where it has been reported to support proliferation, migration, and invasion of tumor cells [16,17,18,19]. More specifically, a component of Ln-332, the Laminin-gamma 2 monomer (LG2m), is frequently expressed in several types of malignant cancer cells and tissues [20] and promotes the adhesion, migration, and scattering of HCC cells [21]. LG2m may play a crucial role in cancer invasion and metastasis as it is deposited in high concentrations at the invading edge of solid tumors, where it mediates the migration and invasion of transformed cells [20, 22, 23]. A previous study reported that serum LG2m levels were significantly elevated in patients with HCC and the presence of both LG2m and PIVKA-II was more sensitive for diagnosis of HCC than existing liver tumor markers [24]. In addition, the level of LG2m was found to predict extrahepatic spread in patients with HCC and the development of HCC in patients with chronic hepatitis C who achieved a sustained virological response [25]. Therefore, LG2m in human serum may be useful as a biomarker for HCC surveillance and risk stratification for HCC development and metastasis in patients with chronic liver disease.

In previous work, we developed and validated a new algorithm for the early detection of HCC that included age, sex, alpha fetoprotein (AFP), and PIVKA-II (ASAP) [26]. Other studies have since validated the same four biomarkers using different statistical approaches (e.g., GAAD/GALAD and GAAP models) [27, 28] and have developed an online calculator for detecting HCC in patients with HBV [29]. Applying the same approach here, we conducted a pilot study to evaluate the utility of combining GP73, LG2m, age, and sex (GLAS algorithm) for the detection of liver fibrosis and cirrhosis. GP73 and LG2m were measured using newly developed, robust chemiluminescent immunoassays run on the Abbott ARCHITECT system. A pilot study of the GLAS algorithm was performed at Johns Hopkins University School of Medicine (JHU) and a validation study was performed at Peking Union Medical College Hospital (PUMCH) in China.

Methods

GP73 antibodies and immunoassay development

Mice were immunized with recombinant GP73 and fusions were performed with an NS0 myeloma cell line to produce monoclonal antibodies. Twelve IgG antibodies were produced in-house and were screened for use in a prototype GP73 ARCHITECT immunoassay, for a total of approximately 144 antibody pairs. The best antibody pairs were initially selected based on sensitivity, range, and reagent stability. The capture IgG1 (kappa) antibody was coated on magnetic microparticles. The conjugate antibody was murine human chimeric Fab (muhuFab) produced in CHO cells. This conjugate design minimized interference by human anti-mouse antibodies (HAMA) because it lacks the Fc region of the antibody, and it provided better sensitivity [30]. Microparticle reagent bulk stability was tested under heat stress for 3 days at 45 °C compared to controls at 2–8 °C (see Additional File 2). The conjugate reagent bulk stability was tested with heat stress at 30 and 37 °C for 7 and 14 days compared to the 2–8 °C control condition (see Additional File 3). Further prototype verification studies with the selected antibody pair included reagent stability, limit of blank, limit of detection, and limit of quantitation (LoBDQ), dilution linearity, 20-day precision, range, and interference testing.

LG2m assay development

A hybridoma of an anti-LG2m monoclonal antibody (Clone 1) used for capture was originally developed by Koshikawa et al. [31]. A hybridoma of an anti-LG2m monoclonal antibody (Clone 2) used for detection was developed by Abbott Laboratories (Lake County, IL, USA). Clone 1 detected only the LG2m monomer, and not LG2m as a component of Ln-332. Monoclonal antibodies were prepared and purified by Abbott Laboratories on a protein A column. The capture antibody (Clone 1) was produced in CHO cells and coated on magnetic microparticles. The conjugate antibody (Clone 2) was labeled with acridinium for the detection of LG2m. Assay prototype verification studies including LoBDQ, dilution linearity, 20-day precision, range, auto-dilution, and interference testing were performed.

Study design and serum samples

The prototype GP73 and LG2m ARCHITECT immunoassays were used to measure GP73 and LG2m concentrations in residual serum samples collected between 2003 and 2016 at JHU in Baltimore, MD, from patients with fibrosis or cirrhosis with viral or non-viral etiology, and healthy controls (n = 147) [26]. All fibrosis samples had a known stage determined by traditional methods of biopsy and standard liver enzyme tests. Additional residual serum samples were analyzed at JHU that had been collected after obtaining informed consent from patients with liver cirrhosis at the University of Texas Southwestern Medical Center (UTSMC) in Dallas, TX. For each residual serum sample, the following de-identified data were collected: age, sex, race/ethnicity, and etiology of liver disease. An additional set of serum samples (collectively referred to as the Western Vendor Cohort, WVC; n = 246) was purchased from BioIVT (Westbury, NY), Biomex GmbH (Heidelberg, Germany), Discovery Life Sciences (Huntsville, AL), and ProMedDx (Norton, MA). These three sample sets (JHU/UTSMC/WVC) were combined to develop and train the liver fibrosis and cirrhosis diagnostic algorithm (development cohort). The study was approved by the JHU IRB (#IRB00196747).

For the validation cohort, samples were obtained from PUMCH (Beijing, China), from patients with liver disease (fibrosis, cirrhosis, or chronic liver disease) and healthy subjects (n = 501). This cohort was used to validate the model derived from the development cohort. Patients with chronic liver disease included those with fatty liver disease, HBV-induced liver disease, and/or autoimmune hepatitis. For each serum sample, the following de-identified data were collected: age, sex, race/ethnicity, and etiology of liver disease. The study was approved by the PUMCH IRB (HS-2386).

Sample storage and assays

Serum samples were stored at approximately −80 °C prior to analysis. GP73 and LG2m levels were measured using the prototype GP73 and LG2m ARCHITECT immunoassays on an ARCHITECT i2000SR analyzer (Abbott Laboratories, North Chicago, IL). Each two-step sandwich immunoassay utilizes paramagnetic microparticles coated with either anti-GP73 [32] or anti-LG2m [33] antibodies and produces a chemiluminescent signal for the quantitative measurement of GP73 or LG2m in human serum and plasma. The performance characteristics for the prototype ARCHITECT GP73 and LG2m assays are described in Table 1. Both assays were analytically robust, with performance similar to that of other automated in vitro diagnostic immunoassays.

Table 1 Performance characteristics of the prototype GP73 and LG2m assays

Statistical analysis

Biomarker concentrations were stratified by disease category. The ability of each biomarker to detect non-cancer liver disease (chronic liver disease, fibrosis, and/or cirrhosis) was determined, and logistic regression (LR) classification models were used to explore the best combination of biomarkers for the detection of non-cancer liver disease. For biomarkers with a skewed distribution, a logarithmic transformation was applied prior to modeling. Wilcoxon tests were used for significance testing.
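The form of such an LR classifier can be illustrated with a minimal sketch. It assumes that both biomarkers are natural-log transformed and that sex is coded as 1 for male and 0 for female; the coefficient values and the names `EXAMPLE_COEFS`, `predicted_probability`, and `classify` are hypothetical placeholders chosen for illustration and are not the fitted values from this study.

```python
import math

# Hypothetical placeholder coefficients for illustration only; the fitted
# model coefficients are not reported in this text.
EXAMPLE_COEFS = {
    "intercept": -12.0,
    "log_gp73": 1.5,   # per unit of ln(GP73 in ng/mL)
    "log_lg2m": 1.0,   # per unit of ln(LG2m in pg/mL)
    "age": 0.05,       # per year of age
    "male": 0.5,       # assumed coding: 1 = male, 0 = female
}


def predicted_probability(gp73_ng_ml, lg2m_pg_ml, age_years, is_male,
                          coefs=EXAMPLE_COEFS):
    """Return the logistic-regression probability of disease (illustrative)."""
    if gp73_ng_ml <= 0 or lg2m_pg_ml <= 0:
        raise ValueError("biomarker concentrations must be positive")
    # Linear predictor on the log-odds scale.
    linear = (coefs["intercept"]
              + coefs["log_gp73"] * math.log(gp73_ng_ml)
              + coefs["log_lg2m"] * math.log(lg2m_pg_ml)
              + coefs["age"] * age_years
              + coefs["male"] * (1 if is_male else 0))
    # Inverse logit maps the log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


def classify(probability, cutoff=0.5):
    """Call a sample positive when the probability reaches the cutoff."""
    return probability >= cutoff
```

With positive biomarker coefficients, as assumed here, a higher GP73 or LG2m concentration raises the predicted probability; the actual direction and size of each effect depend on the fitted model.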

All of the samples from the development cohort (JHU/UTSMC/WVC) were used to train the models. The response variable for the models was the binary liver disease status (fibrosis and/or cirrhosis versus healthy). Multiple LR models were developed by selecting different combinations of age, sex, and the two biomarkers as the classifiers. The best model was selected based on the combination of classifiers with the highest ROC AUC. Confidence intervals for AUCs were calculated by taking 2000 stratified bootstrapped replicates. The sensitivities (SEs) and specificities (SPs) were reported at the default cutoff of 0.5 for LR. Additionally, sensitivity at a fixed specificity of 90% was reported as the median value across 2000 stratified bootstrapped replicates. Median values were also calculated for specificity at fixed sensitivities of 90% and 75%. AUCs were compared using the method for paired ROC curves described by DeLong et al. [34].
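The stratified bootstrap for AUC confidence intervals can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study (which was run in R): it assumes a percentile-type interval, computes the AUC as the proportion of diseased/healthy pairs that are correctly ordered, and uses hypothetical function names throughout.

```python
import random


def auc(pos_scores, neg_scores):
    """AUC as the fraction of (diseased, healthy) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def percentile(sorted_values, q):
    """Linear-interpolation percentile for q in [0, 1] on a sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def stratified_bootstrap_auc_ci(pos_scores, neg_scores, n_boot=2000,
                                alpha=0.05, seed=0):
    """Percentile confidence interval for the AUC (assumed interval type)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Stratified: resample with replacement within each class, so the
        # numbers of diseased and healthy samples stay fixed.
        pos = rng.choices(pos_scores, k=len(pos_scores))
        neg = rng.choices(neg_scores, k=len(neg_scores))
        stats.append(auc(pos, neg))
    stats.sort()
    return percentile(stats, alpha / 2), percentile(stats, 1 - alpha / 2)
```

Resampling within each class keeps the case-to-control ratio constant across replicates, which is what distinguishes a stratified bootstrap from an ordinary one.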

The best model selected from the development cohort was further assessed. To evaluate the generalizability of the best model in a different population, an independent validation cohort was used to validate model performance. For this assessment, chronic liver disease (including fatty liver disease, HBV-induced liver disease, and/or autoimmune hepatitis) was added to the non-cancer liver disease categories to better evaluate performance in clinical practice.

All statistical analyses were performed using R 4.1.1 (The R Foundation for Statistical Computing).

Results

Algorithm development cohort demographics

The development cohort consisted of serum samples from 78 patients with fibrosis (with or without hepatitis), 182 patients with cirrhosis (with or without hepatitis), and 133 healthy subjects (Table 2; N = 393). The median ages for patients in the fibrosis, cirrhosis, and healthy control groups were 54 (interquartile range [IQR] 45–63), 56 (IQR 48–63), and 40 (IQR 33–56) years, respectively, with the majority of patients being White males (fibrosis/cirrhosis) or Black males (healthy).

Table 2 Development Cohort Demographics (N = 393)

Algorithm validation cohort demographics

The validation cohort included 503 individuals; of these, 501 were included in this analysis and 2 were excluded for missing assay results as these specimens had been depleted prior to the study. The validation cohort included 119 patients with cirrhosis, 129 patients with fibrosis, 147 patients with chronic liver disease, and 106 healthy subjects (Table 3). Patient age ranged from 19 to 88 years, with a greater proportion of men in the chronic liver disease and fibrosis groups and a greater proportion of women in the healthy and cirrhosis groups. All individuals in the validation cohort were Asian.

Table 3 Validation Cohort Demographics (n = 501)

Biomarker concentrations

In the development cohort, median GP73 and LG2m concentrations were higher in patients with fibrosis/cirrhosis (GP73 121.27 ng/mL; LG2m 29.16 pg/mL) than in healthy controls (GP73 52.99 ng/mL; LG2m 9.01 pg/mL). Patients with cirrhosis had higher median biomarker concentrations (GP73 152.49 ng/mL; LG2m 48.39 pg/mL) than patients with fibrosis (GP73 72.12 ng/mL; LG2m 10.46 pg/mL; Fig. 1A, B). In the validation cohort, median GP73 and LG2m levels were also higher overall in patients with fibrosis/cirrhosis (GP73 105.14 ng/mL; LG2m 31.70 pg/mL) compared to healthy controls (GP73 51.80 ng/mL; LG2m 10.40 pg/mL; Fig. 1C, D). However, both GP73 and LG2m median concentrations in the validation cohort were higher for patients with fibrosis (GP73 115.10 ng/mL; LG2m 42.30 pg/mL) than for patients with cirrhosis (GP73 91.11 ng/mL; LG2m 22.65 pg/mL; Fig. 1C, D). Median biomarker levels for patients with chronic liver disease fell between those in the fibrosis/cirrhosis and healthy groups, at 57.76 ng/mL for GP73 and 12.75 pg/mL for LG2m (Fig. 1C, D).

Fig. 1 Distribution of GP73 and LG2m by disease state in the development (A, B) and validation cohorts (C, D). Biomarker concentrations were significantly different between healthy and disease (fibrosis/cirrhosis) samples (Wilcoxon test)

Model performance in the development cohort

Five models were created and compared from the development cohort data using four potential variables: age, sex, GP73, and/or LG2m (Table 4). The AUC for differentiating fibrosis/cirrhosis from healthy controls was slightly higher for GP73 alone (Model 1: 0.86, 95% CI: 0.82–0.89) compared to LG2m alone (Model 2: 0.83, 95% CI: 0.79–0.87), but the difference was not statistically significant. The addition of age and sex to either the GP73 or LG2m models increased AUCs (Model 3: 0.91, 95% CI: 0.89–0.94 and Model 4: 0.88, 95% CI: 0.85–0.92, respectively). The AUCs of both updated models were significantly higher than those of the corresponding individual biomarkers alone (p < 0.0001 and p = 0.0003, respectively).

Table 4 Diagnostic Performance of Biomarkers Alone and in Combination with Clinical Factors in the Development Cohort (JHU/UTSMC/WVC) and Model 5 in the Validation Cohort (PUMCH)

The best model included all four variables, GP73, LG2m, age, and sex (the GLAS algorithm), and increased the AUC to 0.92 (Model 5: 95% CI: 0.90–0.95), with a sensitivity of 88.8% and a specificity of 75.9% (Fig. 2A). The increase was statistically significant compared to GP73 or LG2m alone (Models 1 and 2) and the model with LG2m, age, and sex (Model 4; all p-values < 0.0001). The increase was not statistically significant compared to Model 3 with GP73, age, and sex (p = 0.0621).

Fig. 2 ROC curves for the GLAS algorithm (Model 5) in the development (A) and validation (B) cohorts for fibrosis/cirrhosis versus healthy subjects

Model validation in an independent cohort

The best model from the development cohort (Model 5, the GLAS algorithm) was evaluated using the validation cohort data as an independent assessment of clinical performance (Table 4). The GLAS algorithm had an estimated AUC for fibrosis/cirrhosis of 0.93 (95% CI: 0.90–0.95) in the validation cohort (Fig. 2B). It had an estimated sensitivity of 91.1% and a specificity of 80.2%; when specificity was held to 90%, the median sensitivity was estimated to be 81.0%. The GLAS algorithm was further assessed using the validation cohort data set after stratification of fibrosis and cirrhosis etiology. AUCs were comparable for viral and non-viral liver disease, with an AUC of 0.91 (95% CI: 0.86–0.96) for viral induced fibrosis/cirrhosis and 0.94 (95% CI: 0.91–0.97) for non-viral induced fibrosis/cirrhosis compared to healthy subjects. As an exploratory analysis, the GLAS algorithm was also applied to the validation cohort to discriminate patients with chronic liver disease from healthy subjects. For this application, the model had an estimated AUC of 0.65 (95% CI: 0.58–0.71), with a sensitivity of 42.9% and specificity of 80.2%.
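The sensitivity at a fixed specificity reported above can be computed from predicted probabilities as in the following minimal sketch. It returns a point estimate for a single sample, whereas the study reports the median across bootstrap replicates; the function name and the rule of calling a sample positive when its score is at or above the cutoff are assumptions made for illustration.

```python
def sensitivity_at_specificity(pos_scores, neg_scores,
                               target_specificity=0.90):
    """Highest sensitivity among cutoffs with specificity >= the target.

    pos_scores: predicted probabilities for diseased subjects.
    neg_scores: predicted probabilities for healthy subjects.
    """
    best = 0.0
    # Candidate cutoffs: every observed score, plus one above the maximum
    # so that "call nothing positive" is always considered.
    candidates = sorted(set(pos_scores) | set(neg_scores))
    candidates.append(candidates[-1] + 1.0)
    for cutoff in candidates:
        # Assumed rule: a sample is called positive when score >= cutoff.
        specificity = sum(s < cutoff for s in neg_scores) / len(neg_scores)
        sensitivity = sum(s >= cutoff for s in pos_scores) / len(pos_scores)
        if specificity >= target_specificity and sensitivity > best:
            best = sensitivity
    return best
```

Specificity at a fixed sensitivity follows by swapping the roles of the two quantities in the selection step.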

GLAS algorithm performance by disease state

The performance of the GLAS algorithm was assessed by disease state in both the development and validation cohorts (Fig. 3A, B). In the development cohort, the median GLAS algorithm predicted probability was 0.959 in the fibrosis/cirrhosis group compared to 0.232 in healthy controls, and patients with cirrhosis had a median GLAS prediction of 0.983 compared to 0.671 in patients with fibrosis (Fig. 3A). In the validation cohort, the GLAS algorithm predicted probability was also higher overall in the fibrosis/cirrhosis group (median 0.949) versus healthy controls (median 0.249; Fig. 3B). In the validation cohort, the median predicted probability using the GLAS algorithm was 0.950 in patients with fibrosis versus 0.944 in patients with cirrhosis, which matches the trend observed for GP73 and LG2m biomarker concentrations in these groups (Figs. 1C, D and 3B). Additionally, the median GLAS prediction value for patients with chronic liver disease was 0.434, falling between that of the healthy controls and patients with fibrosis/cirrhosis, which matches the trend seen in biomarker concentrations in each group (Figs. 1C, D and 3B).

Fig. 3 Distribution of GLAS algorithm predicted probabilities by disease state in the development (A) and validation (B) cohorts. Dashed line at 0.50 represents the cutoff for the algorithm

Discussion

In this pilot study, we demonstrated that serum biomarkers GP73 and LG2m, when combined with age and sex to create the GLAS algorithm, detected liver fibrosis and cirrhosis with higher accuracy than either biomarker alone. Analysis of the GLAS algorithm in an independent validation cohort showed similar clinical performance, with a slightly higher AUC, sensitivity, and specificity. Differences in the demographics and disease classifications and etiologies between the development and validation cohorts may account for differences in biomarker levels and model performance; for example, the development cohort had a larger number of patients with cirrhosis and a more diverse patient population than the validation cohort. Nevertheless, the GLAS algorithm remained robust for distinguishing between healthy subjects and patients with fibrosis or cirrhosis in these two highly different cohorts.

The new GLAS algorithm achieved higher AUCs than those reported in the literature for other algorithms used for diagnosing liver disease, although these comparisons are not head-to-head. In our study, Model 5 had an AUC of 0.92, sensitivity of 88.8%, and specificity of 75.9% in the development cohort and an AUC of 0.93, sensitivity of 91.1%, and specificity of 80.2% in the validation cohort for detection of fibrosis or cirrhosis. These AUC values are higher than those reported for the Fib-4 (AUC 0.751) and APRI algorithms (AUC 0.737) for significant fibrosis [14] (see Additional File 1). The diagnostic accuracy of the ELF algorithm, which combines detection of hyaluronic acid, type III procollagen peptide (PIIINP), and tissue inhibitor of metalloproteinase-1 (TIMP1), was evaluated in a recent meta-analysis of studies including nearly 20,000 individuals with or at risk of developing a wide variety of viral and non-viral liver diseases [35]. The analysis reported an AUC of 0.811 for detecting fibrosis, 0.812 for advanced fibrosis, and 0.810 for advanced cirrhosis. In patients with chronic HCV, the Fibrotest/FibroSure algorithm, which includes α2-macroglobulin, haptoglobin, gamma-glutamyltransferase, gamma-globulin, total bilirubin, and apolipoprotein A1, had an AUC of 0.74, sensitivity of 75.4%, and specificity of 71.4% for detection of fibrosis [36]. In a meta-analysis, the algorithm was found to have suboptimal diagnostic accuracy for fibrosis and cirrhosis in patients with HBV (AUC 0.84, sensitivity 61%, and specificity 80%) [37]. Fibrometer, which combines age, weight, platelet count, AST, ALT, ferritin, and glucose, has been evaluated for the detection of fibrosis in NAFLD. In a recent meta-analysis of 7 studies including 1616 patients with NAFLD, Van Dijk et al. reported an AUC of 0.82 (sensitivity 83.5%, specificity 91.1%) for Fibrometer in detecting advanced fibrosis, and lower accuracy (0.62–0.78) for detecting significant fibrosis in 3 studies [38]. In a study of 134 patients with various autoimmune liver diseases, Fibrometer was found to have an AUC of 0.66 for severe fibrosis, which increased to 0.77 when combined with liver stiffness measured by transient elastography [39].

Differences between our findings and those of others may be related to the use of the LR model for statistical analysis, as well as differences in the patient populations. A limitation of this study was that the control groups in the development and validation cohorts were different in composition, given the retrospective nature of the analysis, which limits assessment of specific confounding variables.

This pilot study resulted in an algorithm consisting of two biomarkers (LG2m, GP73) and two demographic variables (age and sex) that demonstrated promise for detecting liver disease (fibrosis and cirrhosis). The GLAS algorithm will be further tested in cohorts with staged fibrosis and cirrhosis to evaluate its performance in early detection of liver disease. The GLAS algorithm will also be compared in head-to-head studies with other leading blood-based algorithms such as Fib-4 or APRI, as well as in larger cohorts with greater geographic and racial/ethnic diversity and various liver disease etiologies.

Conclusions

Compared to individual biomarkers, the combination of GP73 and LG2m with age and sex significantly improved the accuracy of detecting liver fibrosis and cirrhosis in two large and diverse patient cohorts. Further refinement of the GLAS algorithm may yield a highly accurate clinical tool to aid in the evaluation of signs of disease progression in patients with advanced liver disease. Larger longitudinal studies are needed to validate the GLAS algorithm in detecting stable versus advancing liver disease in patients over time.