Study population
This prospective study was approved by the Institutional Review Board of our hospital, and informed consent was obtained. From October 2013 to April 2015, a total of 144 patients with hepatitis B virus (HBV) infection who met the eligibility criteria were included. The inclusion and exclusion criteria are detailed in the Supplementary Materials. For each patient, blood tests (aspartate aminotransferase (AST), alanine aminotransferase (ALT), serum albumin, γ-glutamyltransferase, total bilirubin and platelet count) were performed no more than 3 days before surgery or biopsy. Composite indices based on these simple markers, namely the AST-to-platelet ratio index (APRI) and the fibrosis-4 index (FIB-4), were calculated; the formulas are provided in the Supplementary Materials [15].
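The APRI and FIB-4 formulas are given only in the Supplementary Materials; the minimal sketch below uses the widely published definitions, which we assume match them. The AST upper limit of normal (ULN) is laboratory-specific and is an assumed input, and the patient values are hypothetical.

```python
import math

def apri(ast_iu_per_l: float, ast_uln_iu_per_l: float, platelets_1e9_per_l: float) -> float:
    """APRI = (AST / AST upper limit of normal) / platelet count (10^9/L) x 100."""
    return (ast_iu_per_l / ast_uln_iu_per_l) / platelets_1e9_per_l * 100.0

def fib4(age_years: float, ast_iu_per_l: float, alt_iu_per_l: float,
         platelets_1e9_per_l: float) -> float:
    """FIB-4 = (age x AST) / (platelet count (10^9/L) x sqrt(ALT))."""
    return (age_years * ast_iu_per_l) / (platelets_1e9_per_l * math.sqrt(alt_iu_per_l))

# Hypothetical patient: age 45, AST 80 IU/L (ULN assumed to be 40 IU/L),
# ALT 64 IU/L, platelets 160 x 10^9/L
print(f"APRI = {apri(80, 40, 160):.2f}")       # APRI = 1.25
print(f"FIB-4 = {fib4(45, 80, 64, 160):.2f}")  # FIB-4 = 2.81
```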
Liver histology analysis
All patients included in the study underwent partial liver resection (n = 61) or liver biopsy (n = 83). Resected liver specimens approximately 10 mm × 10 mm in size were preserved intraoperatively. US-guided percutaneous biopsy of the right liver lobe was performed with an 18-gauge needle (Bard) within 3 days after ultrasonography. All specimens were fixed in formalin, embedded in paraffin and stained with hematoxylin-eosin (H&E) and Masson's trichrome. Two liver pathologists with > 10 years of experience, who were blinded to the imaging results but not to the patients' clinical and biochemical data, analyzed the specimens. Liver fibrosis was staged according to the METAVIR scoring system: F0, no fibrosis; F1, portal fibrosis without septa; F2, portal fibrosis with few septa; F3, numerous septa without cirrhosis; F4, cirrhosis. Significant fibrosis was defined as F2 or greater. Necroinflammatory activity (mainly necrosis) was also graded with the METAVIR system: A0, no activity; A1, mild activity; A2, moderate activity; A3, severe activity [4]. Steatosis was graded from S0 to S4 according to the percentage of hepatocytes containing visible macrovesicular steatosis: S0, no steatosis; S1, mild (1–5%); S2, moderate (6–32%); S3, marked (33–66%); S4, severe (67–100%) [4, 16, 17].
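To turn the histological scores into modeling targets, the steatosis cut-offs and the significant-fibrosis definition above can be encoded as simple label functions. This is a sketch: treating < 1% as S0 and assigning percentages that fall between the published integer ranges (e.g. 5.5%) to the higher grade are assumptions.

```python
METAVIR_STAGES = {"F0": 0, "F1": 1, "F2": 2, "F3": 3, "F4": 4}

def steatosis_grade(percent_hepatocytes: float) -> str:
    """Map % of hepatocytes with macrovesicular steatosis to S0-S4.

    Cut-offs follow the text; handling of non-integer values between
    ranges (e.g. 5.5% -> S2) is an assumption.
    """
    if not 0 <= percent_hepatocytes <= 100:
        raise ValueError("percentage must lie within 0-100")
    if percent_hepatocytes < 1:
        return "S0"
    if percent_hepatocytes <= 5:
        return "S1"
    if percent_hepatocytes <= 32:
        return "S2"
    if percent_hepatocytes <= 66:
        return "S3"
    return "S4"

def significant_fibrosis(metavir_stage: str) -> int:
    """Binary target used for modeling: 1 if METAVIR stage >= F2, else 0."""
    return int(METAVIR_STAGES[metavir_stage] >= 2)

print([steatosis_grade(p) for p in (0, 3, 20, 50, 80)])
print([significant_fibrosis(s) for s in ("F0", "F1", "F2", "F4")])
```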
Multiparametric ultrasomics acquisition and feature extraction
All US examinations were performed with an Aplio 500 scanner (Canon Medical Systems) equipped with a 375BT convex transducer (frequency, 3.5 MHz). Examinations were performed by one of two radiologists (X.Y.X. and W.W.), each with at least 10 years of experience in routine US. Three types of data were acquired:
1. B-mode images in digital imaging and communications in medicine (DICOM) format. Images were obtained with intercostal oblique scanning and were expected to show the liver parenchyma from the right intercostal space to the segment 6 region of the right hepatic lobe. Display depth and transmit focus were fixed at 6 cm and 4 cm, respectively, and the receive gain was set to 80. Large vessels (diameter > 2.0 mm) were avoided. Other settings, including time-gain compensation, dynamic range, focal length and mechanical index, were optimized for each examination. Conventional images in DICOM format were stored on the Canon Medical Systems platform. Conventional radiomics features, including first-order intensity statistics, texture and wavelet features, were extracted from the DICOM images using A. K. software (Ultrasomics Kit, version 1.0, ZhiXing-Tech). The mathematical definitions of all radiomics features have been described previously [18] and are detailed in the Supplementary Materials.
2. Radiofrequency (RF) raw data. The same scanning method and imaging planes as for conventional radiomics were used, but the data were stored as raw data. Post-beamformed original radiofrequency (ORF) data, which retain intact frequency information, were used to extract statistical features of the echo amplitude.
3. Dynamic contrast-enhanced micro-flow (CEMF) images. Contrast harmonic imaging was performed with a mechanical index of 0.08, a transmission frequency of 3.5 MHz and a frame rate of 15–18 frames per second. Images were obtained with intercostal oblique scanning and were expected to show the liver parenchyma from the right intercostal space to the segment 6 region of the right hepatic lobe, together with the right kidney, on a single screen. The focus was set at a depth of 6 to 8 cm to visualize the kidney. After the contrast harmonic imaging mode was activated, a 2.4 ml bolus of SonoVue (Bracco) was injected intravenously via an antecubital vein, immediately followed by a 5 ml saline flush. Patients were instructed to hold their breath 7–8 s after the injection (1–2 s before the renal artery became visible), and the CEMF mode was then initiated. Image acquisition continued until the liver was fully enhanced. Clips of approximately 15–20 s acquired immediately after the SonoVue injection were saved in DICOM format. Dynamic CEMF features were extracted off-line with our in-house model. The model was based on the dual blood supply of the liver from the hepatic artery and the portal vein: the renal blood supply was taken to represent the “hepatic artery supply” and was used as a reference to reduce the influence of interindividual differences in circulation.
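As a generic illustration of the kinds of quantities described in items 1 and 2, the sketch below computes a few textbook first-order intensity statistics from a B-mode region of interest (ROI) and the echo-amplitude envelope of an RF line. It is not the A. K. (Ultrasomics Kit) or ORF feature implementation, whose definitions are in the Supplementary Materials; the FFT-based Hilbert envelope, the histogram bin count and the Nakagami m moment estimator are assumptions chosen for illustration, and the data are synthetic.

```python
import numpy as np

def first_order_features(roi: np.ndarray, bins: int = 64) -> dict:
    """Common first-order statistics of the grey levels in a B-mode ROI."""
    x = roi.astype(float).ravel()
    mean, std = x.mean(), x.std()
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    hist, _ = np.histogram(x, bins=bins)
    p = hist[hist > 0] / hist.sum()          # non-empty bins as probabilities
    return {
        "mean": float(mean),
        "variance": float(std ** 2),
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),  # Pearson (non-excess) kurtosis
        "entropy": float(-np.sum(p * np.log2(p))),
    }

def echo_envelope(rf_line: np.ndarray) -> np.ndarray:
    """Echo amplitude of one RF A-line: magnitude of its FFT-based analytic signal."""
    n = rf_line.size
    h = np.zeros(n)                          # spectral weights for the analytic signal
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(rf_line) * h))

def nakagami_m(envelope: np.ndarray) -> float:
    """Moment estimator of the Nakagami shape parameter: E[R^2]^2 / Var(R^2)."""
    r2 = envelope.astype(float) ** 2
    return float(r2.mean() ** 2 / r2.var())

# Usage on synthetic data only
rng = np.random.default_rng(0)
roi = rng.integers(0, 256, size=(32, 32))    # synthetic 8-bit ROI
features = first_order_features(roi)
rf_noise = rng.normal(size=4096)             # speckle-like synthetic RF line
m = nakagami_m(echo_envelope(rf_noise))      # expected to be near 1 (Rayleigh statistics)
```

For Gaussian (fully developed speckle) RF data the envelope is Rayleigh distributed, so m is close to 1; departures from 1 are one way echo-amplitude statistics can reflect tissue structure.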
Based on our preliminary work and previously reported studies [10, 11, 19–21], the three categories of parameters (conventional radiomics, ORF and CEMF features) acquired from the three types of images were expected to carry information relevant to liver fibrosis staging; they are detailed in the Supplementary Materials. In total, 472 features were obtained: 396 conventional radiomics, 54 ORF and 22 CEMF features. These quantitative features constituted the ultrasomics, i.e., the “omics” data of ultrasound, in this study (Fig. 1).
Multiparameter-based ultrasomic analysis of liver fibrosis using machine learning
Feature selection and analysis of multiparametric ultrasomics
Spearman’s correlation coefficient (R) was used to assess correlations between features within each parameter category. Feature pairs with |R| > 0.90 were considered highly correlated and likely to provide redundant rather than complementary information. Each group of highly correlated features was collapsed into one representative feature, usually the one with the greatest variability or the highest dynamic range. This procedure yielded a set of non-redundant features for each of the conventional radiomics, ORF and CEMF categories.
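One possible implementation of this redundancy filter is sketched below: Spearman's ρ is computed as the Pearson correlation of average ranks, features are visited from most to least variable, and a feature is kept only if its |ρ| with every already-kept feature is ≤ 0.90. The greedy visiting order and the use of raw-scale variance as the "variability" criterion are assumptions; the text does not specify how groups were formed or representatives chosen.

```python
import numpy as np

def rankdata_avg(x: np.ndarray) -> np.ndarray:
    """1-based ranks with tied values given their average rank (as in Spearman's rho)."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                                    # extend the run of ties
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_matrix(X: np.ndarray) -> np.ndarray:
    """Spearman correlation matrix of the columns of X (rows = patients)."""
    R = np.column_stack([rankdata_avg(X[:, k]) for k in range(X.shape[1])])
    return np.corrcoef(R, rowvar=False)

def drop_redundant(X: np.ndarray, threshold: float = 0.90) -> list:
    """Indices of columns kept after greedy |rho| > threshold filtering."""
    rho = np.abs(spearman_matrix(X))
    kept = []
    for k in np.argsort(-X.var(axis=0)):              # most variable feature first
        if all(rho[k, j] <= threshold for j in kept):
            kept.append(int(k))
    return sorted(kept)

a = np.arange(1.0, 11.0)
b = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
X = np.column_stack([a, 10 * a, b])   # column 1 is a monotone copy of column 0
print(drop_redundant(X))              # [1, 2]
```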
We then explored associations among the three categories of parameters with a pairwise correlation map. Hierarchical clustering of all quantitative features was plotted against fibrosis stage, activity grade and steatosis grade. The discriminative performance of each feature was further quantified by the area under the receiver operating characteristic curve (AUC) for fibrosis stage, activity grade and steatosis grade.
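A per-feature AUC can be computed without fitting a model, using the Mann-Whitney interpretation of the empirical ROC AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half. For an ordinal outcome such as METAVIR stage, the sketch assumes one AUC per dichotomy (≥ F1, ≥ F2, ≥ F3, F4); features that decrease with stage yield AUCs below 0.5. The study used the R package pROC; this plain-Python version and its hypothetical inputs are only illustrative.

```python
from typing import Dict, Sequence

def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical ROC AUC = P(score_pos > score_neg) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def auc_per_cutoff(feature: Sequence[float], stages: Sequence[int],
                   cutoffs=(1, 2, 3, 4)) -> Dict[int, float]:
    """AUC of one feature for each dichotomy 'stage >= k'."""
    return {k: auc_mann_whitney(feature, [int(s >= k) for s in stages]) for k in cutoffs}

stages = [0, 1, 1, 2, 2, 3, 4, 4]                     # hypothetical METAVIR stages
feature = [0.8, 1.1, 0.9, 1.5, 1.2, 1.9, 2.4, 2.0]   # hypothetical feature values
print(auc_per_cutoff(feature, stages))
```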
Multiparametric ultrasomic-based models for significant fibrosis using machine learning
Features from the three categories with AUCs > 0.6 for all fibrosis stages were selected for subsequent analysis. To identify the optimal machine-learning method, the selected features from all three categories were used for model construction. Six machine-learning algorithms were tested: adaptive boosting (AdaBoost), decision tree (DT), logistic regression (LR), neural network (NN), random forest (RF) and support vector machine (SVM). These algorithms were chosen because of their reported performance in classification tasks [13]. Brief descriptions of each classifier are provided in the Supplementary Materials.
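Read literally, the screening rule keeps a feature only if its AUC exceeds 0.6 at every fibrosis cut-off. A minimal sketch of that reading follows; the feature names and AUC values are made up for illustration and are not study results.

```python
def screen_features(auc_table: dict, threshold: float = 0.6) -> list:
    """Keep features whose AUC exceeds `threshold` at every fibrosis cut-off."""
    return [name for name, aucs in auc_table.items()
            if all(a > threshold for a in aucs.values())]

# Made-up AUCs for three hypothetical features at cut-offs >=F1, >=F2, >=F3, F4
toy = {
    "feature_a": {1: 0.72, 2: 0.75, 3: 0.78, 4: 0.81},
    "feature_b": {1: 0.58, 2: 0.66, 3: 0.70, 4: 0.74},
    "feature_c": {1: 0.64, 2: 0.61, 3: 0.63, 4: 0.69},
}
print(screen_features(toy))  # ['feature_a', 'feature_c']
```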
The entire cohort was randomly divided into a training data set (100 cases) and a validation data set (44 cases). Six models, one for each machine-learning method, were built on the training data set, and the performance of each model was then assessed on the validation data set. To ensure that the classifiers were robust to the choice of training and validation data, we adopted a ten-fold cross-validation method to calculate the diagnostic value for significant fibrosis (≥ F2): the whole process was repeated ten times with different random seeds, yielding ten different pairs of training and validation data sets. In each repetition, a model was built on the training data set and evaluated on the validation data set, and the model showing the best classification performance was chosen as the best model. Classification performance for significant fibrosis was assessed by the AUC in the validation data sets.
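The repeated random-split scheme can be sketched as follows, using logistic regression (one of the six algorithms) as the classifier. This is a NumPy stand-in for the R packages actually used: the gradient-descent fit, learning rate, L2 penalty, seeds 0 to 9 and the non-stratified split are assumptions, and the cohort is synthetic. Standardization statistics come from the training split only, so no validation information leaks into the model.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """L2-penalised logistic regression fitted by batch gradient descent."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (p - y) / len(y)
        grad[1:] += l2 * w[1:]                       # intercept is not penalised
        w -= lr * grad
    return w

def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-(w[0] + X @ w[1:])))

def auc(scores, y):
    """Empirical ROC AUC (Mann-Whitney form, ties count 0.5)."""
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def repeated_holdout(X, y, n_train=100, n_repeats=10):
    """Repeat a random train/validation split; return all validation AUCs and the best run."""
    results = []
    for seed in range(n_repeats):
        idx = np.random.default_rng(seed).permutation(len(y))
        tr, va = idx[:n_train], idx[n_train:]
        mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0) + 1e-12  # training-only scaling
        w = fit_logistic((X[tr] - mu) / sd, y[tr])
        val_auc = auc(predict_proba(w, (X[va] - mu) / sd), y[va])
        results.append((val_auc, seed, w))
    best = max(results, key=lambda r: r[0])          # best validation AUC, as in the text
    return [r[0] for r in results], best

# Synthetic cohort: 144 cases, 5 features, only feature 0 informative
rng = np.random.default_rng(42)
X = rng.normal(size=(144, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=144) > 0).astype(int)
val_aucs, (best_auc, best_seed, best_w) = repeated_holdout(X, y)
```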
Comparison of multiparametric ultrasomics models
Multiparametric ultrasomics models built with the optimal machine-learning method were compared across seven parameter combinations: all three categories combined, (1) conventional radiomics, ORF and CEMF; two categories combined, (2) conventional radiomics and CEMF, (3) ORF and CEMF and (4) conventional radiomics and ORF; and single categories, (5) conventional radiomics, (6) ORF and (7) CEMF. Classifier performance was assessed by computing accuracy, sensitivity and specificity and by plotting receiver operating characteristic (ROC) curves. The AUCs for significant fibrosis in the validation data sets were obtained with the ten-fold cross-validation method described above.
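The seven parameter subsets are all non-empty combinations of the three categories, and the threshold-based metrics follow from the confusion matrix. The sketch below shows both. How predicted probabilities were dichotomized is not stated in the text, so `y_pred` is assumed to be already binary, and the generated subset order differs from the numbering (1) to (7) above.

```python
from itertools import combinations

CATEGORIES = ("conventional radiomics", "ORF", "CEMF")

def parameter_subsets(categories=CATEGORIES):
    """All non-empty combinations of feature categories, largest first (1 + 3 + 3 = 7)."""
    return [combo for r in range(len(categories), 0, -1)
            for combo in combinations(categories, r)]

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels (1 = significant fibrosis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

for subset in parameter_subsets():
    print(" + ".join(subset))
```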
Statistical analysis
Statistical analyses were performed with the open-source statistical computing environment R (version 3.3.1; R Foundation for Statistical Computing). Features of the three modalities were filtered for independence from other features (intraclass Pearson correlation, |r| > 0.9). Heat maps of interclass Pearson correlations among the three categories of parameters were computed and plotted with the R package “corrplot”. The six machine-learning algorithms were implemented with the R packages “rpart”, “ada”, “randomForest”, “kernlab”, “rms” and “nnet”. AUCs for staging fibrosis, activity and steatosis were computed for the three categories of features with the R package “pROC”. Differences in model performance across machine-learning methods and parameter subgroups were evaluated with a permutation test using the R package “Deducer”. Coefficients of variation (CVs) were calculated to compare the dispersion of the AUCs. All statistical tests were two-sided, and p values < 0.05 were considered statistically significant.
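Deducer's exact test statistic is not reproduced here. As an assumed, generic analogue, the sketch applies a two-sided Monte Carlo permutation test for a difference in mean AUC between two models (for example, their ten per-repetition validation AUCs) and computes the CV as the sample standard deviation divided by the mean; using ddof = 1 is an assumption. The AUC lists are hypothetical.

```python
import numpy as np

def permutation_test_mean_diff(a, b, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in group means."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)                 # random relabelling of groups
        stat = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        extreme += int(stat >= observed - 1e-12)           # tolerance for float ties
    return (extreme + 1) / (n_perm + 1)                    # add-one: p is never exactly 0

def coefficient_of_variation(values):
    """Sample SD (ddof=1) divided by the mean."""
    v = np.asarray(values, dtype=float)
    return float(v.std(ddof=1) / v.mean())

auc_model_a = [0.88, 0.90, 0.86, 0.91, 0.89, 0.87, 0.92, 0.88, 0.90, 0.89]  # hypothetical
auc_model_b = [0.80, 0.83, 0.78, 0.82, 0.81, 0.79, 0.84, 0.80, 0.82, 0.81]  # hypothetical
p_value = permutation_test_mean_diff(auc_model_a, auc_model_b)
cv_a = coefficient_of_variation(auc_model_a)
```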