Repeatability and reproducibility of deep-learning-based liver volume and Couinaud segment volume measurement tool

Purpose: Volumetric and health assessment of the liver is crucial to avoid poor post-operative outcomes following liver resection surgery. No current method allows concurrent and accurate measurement of both Couinaud segmental volumes, for future liver remnant estimation, and liver health using non-invasive imaging. In this study, we demonstrate the accuracy and precision of segmental volume measurements using new medical software, Hepatica™.

Methods: MRI scans from 48 volunteers from three previous studies were used in this analysis. Measurements obtained with Hepatica™ were compared with those from OsiriX, as was the time required per case with each software. The performance of technicians and experienced radiologists, as well as repeatability and reproducibility, were compared using Bland–Altman plots and limits of agreement.

Results: Liver volume measurements from Hepatica™ showed high agreement with existing methods for liver volumetry (mean Dice score 0.947 ± 0.010), with lower inter-operator variability. Volumetry with the device was highly consistent between technicians and experienced radiologists (± 3.5% of total liver volume), with low inter- and intra-observer variability. Tight limits of agreement were shown for repeated measurements of Couinaud segment volume (± 3.4% of whole liver), segmental liver fibroinflammation and segmental liver fat in the same participant, both on the same scanner and between different scanners. Whole-liver volume was underestimated on three non-reference scanners compared with the reference scanner.

Conclusion: Hepatica™ produces accurate and precise whole-liver and Couinaud segment volume and liver tissue characteristic measurements. Measurements are consistent between trained technicians and experienced radiologists.

Supplementary Information: The online version contains supplementary material available at 10.1007/s00261-021-03262-x.


Introduction
Hepatocellular carcinoma is the fastest growing cause of cancer-related deaths in the USA [1], with 18,000 men and 9,000 women dying each year. Metastasis to the liver remains a substantial problem [2], with 50,000 deaths each year [3,4]. Guidelines from AASLD [5], EASL [6] and APASL [7] for the management of primary liver cancer are generally congruent, and patient outcomes are consistently most favourable when early-stage tumours are treated with surgical resection. Similarly, for secondary liver cancer, surgical resection [8] currently represents the main curative therapy, often preceded by neoadjuvant chemotherapy or regional radiotherapy to suppress the spread and growth of the tumour. However, chemotherapy-associated steatohepatitis has implications for the functional reserve of the liver in patients undergoing surgery [9].
It is widely understood that estimating the volume and health of the remnant liver parenchyma [10,11] is key to assessing a resection plan. Inaccurate estimation of these factors can lead to post-operative liver failure [12][13][14][15]. As such, there is a converging consensus amongst surgeons that a safe lower limit for the future liver remnant (FLR) is 20% for patients with normal, healthy liver parenchyma, 30% for those with steatosis and 40% for those with fibrosis or cirrhosis [16]. There is a lack of clear evidence to guide adjustment of the safe lower FLR limit for an individual patient, often owing to a failure to detect parenchymal liver disease before surgery. Interpreting liver function tests can be complex [17], and liver biopsy, from which pathology scores are derived, carries a risk of haemorrhage and is prone to sampling bias [18].
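The consensus limits above amount to a simple threshold check on the FLR expressed as a percentage of total liver volume. The sketch below is illustrative only: the helper names and parenchyma category keys are hypothetical, and it encodes just the three fixed limits quoted in the text, not any patient-specific adjustment.

```python
# Hypothetical helpers: consensus lower limits for the future liver remnant (FLR),
# as a percentage of total liver volume, keyed by parenchymal status [16].
MIN_SAFE_FLR_PERCENT = {
    "normal": 20.0,
    "steatosis": 30.0,
    "fibrosis_or_cirrhosis": 40.0,
}


def flr_percent(flr_volume_ml, total_liver_volume_ml):
    """Return the FLR as a percentage of total liver volume."""
    if total_liver_volume_ml <= 0:
        raise ValueError("total liver volume must be positive")
    return 100.0 * flr_volume_ml / total_liver_volume_ml


def meets_flr_threshold(flr_volume_ml, total_liver_volume_ml, parenchyma):
    """True if the FLR reaches the consensus lower limit for this parenchymal status."""
    limit = MIN_SAFE_FLR_PERCENT[parenchyma]  # KeyError for an unknown category
    return flr_percent(flr_volume_ml, total_liver_volume_ml) >= limit
```

For example, an FLR of 450 mL in a 1,500 mL liver is 30%, which meets the limit for steatosis but not the limit for fibrosis or cirrhosis.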
Formal anatomical resection techniques, such as the extended right hepatectomy in which Couinaud segments 1, 4, 5, 6, 7 and 8 are removed, are widely applied to patients with multiple or deep-lying lesions. An extended right hepatectomy often leaves an FLR of 25–30%, making it unsuitable for patients with an inflamed or cirrhotic liver [19]. In such cases, alternative management strategies may be considered, including two-stage approaches such as pre-operative portal vein embolization or associating liver partition and portal vein ligation for staged hepatectomy (ALPPS), which encourage hypertrophy of the FLR and have been shown to improve the safety of major hepatectomy [20][21][22][23][24][25]. These complex strategies are being adopted more widely to reduce the risk of post-operative morbidity and long hospital stays [26].
Liver volume measurements and subsequent FLR estimations are typically performed by radiologists using time-consuming manual techniques (up to 40 min per case [27,28]) that are subject to intra-operator variability [28,29]. In this study, we evaluate the performance of new medical software, Hepatica™, a medical device that performs automatic liver volumetry followed by semi-automatic delineation of the Couinaud segments, in comparison with the clinical gold standard of experienced radiologists specialising in hepatic imaging. In addition to volumetry, the software reports validated biomarkers of liver health, corrected T1 (cT1) and proton-density fat fraction (PDFF) [30], increasing the information available for pre-operative liver assessment to improve surgical decision-making.

Patients
Participants were selected from volunteers who had taken part in two previous ethically approved studies with informed consent. Liver MRI data from 48 volunteers across the two studies were used to evaluate the performance of the device. Eighteen participants (Group A) were used to verify volumetry accuracy, and 30 healthy volunteers (Group B) were used to evaluate repeatability, reproducibility, and intra- and inter-operator variability (see Supplementary Material for further details). These studies were approved by their respective ethics committees (IRAS IDs 241312 and 226607). Subject demographics are shown in Table 1. Data used in this analysis had not previously been used as training or test data for algorithmic validation or any other development of the device.

Image acquisition
Multiple scanners were used in this study to evaluate the reproducibility (same participant, different scanner) and repeatability (same participant, same scanner) of the new device: Siemens Prisma 3T, Siemens Avanto fit 1.5T, GE Discovery MR750 3T, GE Optima MR450w 1.5T, Philips Achieva dStream 3T and Philips Ingenia 1.5T. Fat-saturated T1-weighted gradient recalled echo (GRE) images acquired without contrast agent were used for volumetric analysis. cT1, PDFF and T2* maps were acquired as previously described using multislice shortened MOLLI and IDEAL sequences [31,32], which have been extensively validated. Imaging data are transferred as DICOM files via a secure online portal for analysis by an operator, and summary results are then reviewed and returned to the referring clinician as a report.

Liver volume delineation
Hepatica™ delineates the liver from a 3D T1-weighted MR image. The liver volume, including the caudate lobe, is segmented automatically using a convolutional neural network (CNN). A technician can then refine the CNN output with manual edits using paintbrush tools, and liver lesions are excluded from the segmentation and quantification. Liver volume data were also analysed by two trained radiologists (with 9 and 12 years of training) using OsiriX software.
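Once a binary liver mask is available, whole-liver volume follows from counting the voxels in the mask and multiplying by the voxel volume. The sketch below is a minimal illustration assuming NumPy boolean arrays; the function names are hypothetical and it is not the device's implementation. Lesion exclusion is modelled, as an assumption, by removing a separate lesion mask from the liver mask.

```python
import numpy as np


def mask_volume_ml(mask, voxel_size_mm):
    """Volume of a binary mask in millilitres (1 mL = 1000 mm^3)."""
    voxel_volume_mm3 = float(np.prod(voxel_size_mm))
    return np.count_nonzero(mask) * voxel_volume_mm3 / 1000.0


def liver_volume_excluding_lesions_ml(liver_mask, lesion_mask, voxel_size_mm):
    """Liver volume after removing voxels that also lie in a (hypothetical) lesion mask."""
    parenchyma = np.logical_and(liver_mask.astype(bool), ~lesion_mask.astype(bool))
    return mask_volume_ml(parenchyma, voxel_size_mm)


# Toy example at a typical reconstructed resolution of 1.2 x 1.2 x 3.0 mm
# (4.32 mm^3 per voxel).
voxel_size = (1.2, 1.2, 3.0)
liver = np.zeros((20, 20, 20), dtype=bool)
liver[:10, :10, :10] = True   # 1,000 voxels, about 4.32 mL
lesion = np.zeros_like(liver)
lesion[:5, :10, :10] = True   # 500 of those voxels marked as lesion
whole = mask_volume_ml(liver, voxel_size)
without_lesion = liver_volume_excluding_lesions_ml(liver, lesion, voxel_size)
```

In practice the voxel size would be read from the image header rather than hard-coded.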

Couinaud segmentation
Couinaud classification of liver anatomy divides the organ into nine segments based on its vasculature [33]. In the new device, a technician positions eight landmark points in an interactive 3D visualisation: inferior vena cava (superior zone), inferior vena cava (inferior zone), middle hepatic vein, gallbladder fossa, right hepatic vein, umbilical fissure, right portal vein and left portal vein (Fig. 1). Hepatica™ then uses combinations of these landmarks to define the planes that divide the liver into Couinaud segments.
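The text does not specify which landmark combinations define each plane, so the following is only a generic sketch of the underlying geometry, assuming each dividing plane passes through three landmarks (for instance, the two inferior vena cava points and the middle hepatic vein point would roughly describe the fissure separating the right and left hemilivers). All function names are hypothetical, and voxel coordinates are assumed to be axis-aligned with the origin at voxel (0, 0, 0).

```python
import numpy as np


def plane_from_points(p1, p2, p3):
    """Return (unit normal, point on plane) for the plane through three landmarks (mm)."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    normal = np.cross(p2 - p1, p3 - p1)
    norm = np.linalg.norm(normal)
    if norm == 0:
        raise ValueError("landmarks are collinear; the plane is undefined")
    return normal / norm, p1


def signed_distance_mm(coords_mm, normal, point):
    """Signed distance of each (N x 3) coordinate from the plane; the sign gives the side."""
    return (np.asarray(coords_mm, dtype=float) - point) @ normal


def split_mask_by_plane(mask, voxel_size_mm, normal, point):
    """Split a binary liver mask into the parts on the non-negative and negative sides."""
    idx = np.argwhere(mask)                    # (N, 3) voxel indices inside the mask
    coords = idx * np.asarray(voxel_size_mm)   # indices -> mm (no rotation or offset)
    positive = signed_distance_mm(coords, normal, point) >= 0
    part_a = np.zeros(mask.shape, dtype=bool)
    part_b = np.zeros(mask.shape, dtype=bool)
    part_a[tuple(idx[positive].T)] = True
    part_b[tuple(idx[~positive].T)] = True
    return part_a, part_b
```

Applying several such planes in turn, and labelling each voxel by the combination of sides it falls on, yields a segment label map; real scanner geometry would also need the image orientation and origin from the DICOM header.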

Quantitative liver tissue characteristics: cT1 and PDFF
Multislice cT1 from LiverMultiScan™ (Perspectum Ltd., UK) has been shown to be an accurate MRI biomarker of hepatic fibroinflammation [34], and its combination with PDFF allows objective evaluation of future liver health. The multislice PDFF and cT1 data are aligned with the volumetric MRI data so that liver tissue characteristics can be reported within the volume of each individual Couinaud segment.
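Once the parametric maps and the segment label map share a voxel grid, reporting per-segment tissue characteristics reduces to summarising map values within each label. The sketch below assumes the maps have already been resampled onto the volumetric grid, that NaN marks voxels outside the multislice coverage, and that the median is the summary statistic; all three are assumptions, and the function name is hypothetical.

```python
import numpy as np


def per_segment_summary(label_map, parametric_map, labels=range(1, 9)):
    """Median of a parametric map (e.g. cT1 in ms or PDFF in %) within each segment label.

    label_map: integer array, 0 = background, other values = segment labels.
    parametric_map: float array on the same grid, NaN where no data were acquired.
    Segments with no sampled voxels map to None, since sparse multislice
    coverage may miss a segment entirely.
    """
    summary = {}
    for label in labels:
        in_segment = (label_map == label) & np.isfinite(parametric_map)
        values = parametric_map[in_segment]
        summary[label] = float(np.median(values)) if values.size else None
    return summary
```

Returning None rather than a number for unsampled segments keeps a missing measurement distinct from a genuine value.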

Imaging parameters
cT1 maps are generated from five axial shMOLLI T1 map slices, each 8 mm thick with a 12 mm gap, and are corrected for the presence of hepatic iron using a T2* map (DIXON) [35]. PDFF is measured from five 20-mm-thick slices from the IDEAL acquisition [32]. 3D T1-weighted images are acquired with vendor-standard sequences within a single expiratory breath-hold, typically with a reconstructed resolution of 1.2 × 1.2 × 3.0 mm.
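Because cT1 is acquired on a few separated slices, it samples only part of the liver along the slice direction, which is why some segments may contain few or no cT1 voxels. The arithmetic below is a small sketch assuming the 12 mm gap is measured edge to edge between adjacent slices; the helper names are hypothetical.

```python
def multislice_extent_mm(n_slices, thickness_mm, gap_mm):
    """Distance from the outer edge of the first slice to the outer edge of the last.

    Assumes gap_mm is the edge-to-edge spacing between adjacent slices.
    """
    if n_slices < 1:
        raise ValueError("need at least one slice")
    return n_slices * thickness_mm + (n_slices - 1) * gap_mm


def sampled_fraction(n_slices, thickness_mm, gap_mm):
    """Fraction of that extent that is actually covered by slice thickness."""
    return n_slices * thickness_mm / multislice_extent_mm(n_slices, thickness_mm, gap_mm)


# cT1 protocol from the text: five 8-mm slices with a 12-mm gap.
ct1_extent = multislice_extent_mm(5, 8, 12)   # 5*8 + 4*12 = 88 mm
ct1_fraction = sampled_fraction(5, 8, 12)     # 40 / 88, roughly 0.45
```

Under this assumption the five cT1 slices span 88 mm but directly sample only 40 mm of it.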

Repeatability, reproducibility and intra-/inter-operator variability
Pairs of measurements were compared using Bland–Altman plots, and 95% limits of agreement (LOA) were calculated. The accuracy of whole-liver volumetry reported by Hepatica™ was evaluated against OsiriX, with both operated by two trained radiologists. Additionally, the accuracy of a trained technician reporting whole-liver and Couinaud segment volumetry, PDFF and cT1 was evaluated by comparing the technician's results with the average of two experienced radiologists. The time spent per case with each device was also measured. Precision is defined in terms of repeatability and reproducibility. Reproducibility is the difference in metrics for the same participant between the reference scanner (Siemens Prisma 3T, selected as the de facto reference owing to availability) and the non-reference scanners (Siemens Avanto fit 1.5T, GE Discovery MR750 3T, GE Optima MR450w 1.5T, Philips Achieva dStream 3T and Philips Ingenia 1.5T). Repeatability, assessed on each of the six scanners, was measured as the difference between two acquisitions of the same participant on the same scanner, roughly 10 min apart; the participant was scanned, removed from the scanner, then returned and rescanned to induce realistic positional variation.
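The Bland–Altman limits of agreement are the mean paired difference (bias) plus or minus 1.96 standard deviations of the differences. The sketch below is a standard formulation, not the authors' analysis code; the optional normalisation of differences to a percentage of a reference quantity (such as whole-liver volume) is an assumption about how the percentage LOA might be computed, and the function name is hypothetical.

```python
import numpy as np


def bland_altman_loa(a, b, as_percent_of=None):
    """Bias and 95% limits of agreement for paired measurements a and b.

    Differences are a - b. If as_percent_of is given (e.g. the whole-liver volume
    for each case), differences are expressed as a percentage of it.
    Returns (bias, lower_limit, upper_limit).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a - b
    if as_percent_of is not None:
        diff = 100.0 * diff / np.asarray(as_percent_of, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

The same function covers accuracy (device vs. reference), repeatability (scan vs. rescan), reproducibility (reference vs. non-reference scanner) and operator variability, by changing which pairs of measurements are passed in.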
Intra-operator variability was assessed by having one technician analyse the same dataset twice, and inter-operator variability by having two technicians analyse the same dataset.

Accuracy of device compared to current gold standard
The whole-liver segmentations produced by two experienced radiologists using Hepatica™ and OsiriX were highly similar (n = 36 cases: 18 patients analysed separately by each radiologist; mean Dice score 0.947 ± 0.010), and the resulting volume measurements from the two devices were in strong agreement (Fig. 2a–c). Generating the whole-liver segmentation masks took significantly less time with Hepatica™ (median 17 min per case) than with OsiriX (median 34 min per case) (n = 7 matched cases, p = 0.0033, Wilcoxon test; Fig. 3).
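The Dice score quoted above measures overlap between two segmentations as twice the intersection divided by the sum of the two mask sizes, ranging from 0 (no overlap) to 1 (identical masks). A minimal NumPy sketch follows; treating two empty masks as perfect agreement is a convention assumed here, and the function name is hypothetical.

```python
import numpy as np


def dice_score(mask_a, mask_b):
    """Dice similarity coefficient: 2|A and B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2.0 * np.logical_and(a, b).sum() / denom
```

A per-case score would be computed for each pair of masks, with the mean ± SD taken across cases.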
Trained technicians using the new medical device showed consistently high agreement with experienced radiologists, with an average segment variability of ± 3.5% and a whole-liver volume LOA of [− 4.2%, 0.5%] (Table 2). Segmental cT1 and segmental PDFF also showed high agreement between technician and radiologists (Table 3), with average segmental LOA of ± 1.1% and ± 0.2%, respectively.

Repeatability
The consistency of volumetric measurements from the device on each of the six scanners was evaluated with repeat scans of ten participants scanned on/off/on, simulating a follow-up visit with no volumetric change. Repeatability was high on every scanner; the broadest whole-liver volumetry LOA, [− 7.9%, 4.8%], was observed on the Philips 1.5T scanner (Table 4). The average liver segment volume LOA was within ± 3.4% for all scanners.

Reproducibility
Volumetric measurements from the Siemens 3T reference scanner were compared against five other major scanner models and field strengths to measure variability across MRI scanner types (

Discussion
The aim of this study was to evaluate the accuracy, reproducibility, repeatability, intra- and inter-observer variability and time savings of Hepatica™ compared with currently available software. The results indicate that the new medical device can delineate livers accurately and divide the liver volume into Couinaud segments based on anatomical landmarks. It does so in substantially less time than the current gold standard, manual segmentation by an experienced radiologist. Automating several steps reduces subjectivity and increases the robustness of volumetry, as shown by the reduced variability of liver volume measurements. The robustness of the software was demonstrated in three ways. Firstly, in the repeatability study, volumetric measurements were highly consistent on six major MRI scanner models despite the participant leaving the scanner and returning for a second scan. Secondly, Couinaud segment volume measurements were highly reproducible when the same participant was scanned on different scanner models. Finally, inter- and intra-operator variability was low, indicating that results are not substantially affected by operator subjectivity. Liver volumetry has been examined extensively in the context of liver resection surgery, transplant surgery and image analysis; to our knowledge, this is the first report of a highly robust deep-learning-based method that could be widely deployed, allowing less experienced professionals to obtain accurate outputs.

Reproducibility tests for the GE Discovery MR750 3T, Philips Achieva dStream 3T and Philips Ingenia 1.5T showed greater disagreement with the reference scanner. These disagreements may stem from differences in the contrast generated at the liver boundary by different scanner models.
This effect may apply to any currently available volumetry software, but we are not aware of any between-scanner reproducibility studies in liver volumetry.
Beyond saving time, the clinical utility of this software for pre-operative planning extends to quantifying liver health with cT1 and PDFF, both within the whole liver and within the segments that make up the future liver remnant (FLR). Additionally, when the FLR of a particular operation is requested, radiologists typically must delineate each surgical option manually; the device examined here instead generates the volume of each Couinaud segment (Fig. 4), so the FLR of each possible surgical option can be measured quickly (e.g. right hepatectomy versus segmentectomy of segments 5 and 8). Study limitations include the relatively small sample size (n = 48) and the correspondingly low number of cases with underlying liver disease (n = 15). As this device continues to be used in more patients being considered for liver surgery, we intend to monitor the accuracy of its liver tissue characterisation.
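With a volume for every Couinaud segment, the FLR of any resection is simply the sum of the segments left behind, divided by the total. The sketch below illustrates this with made-up segment volumes (not study data), keyed by segment number 1 to 8 for simplicity; the helper name and option labels are hypothetical, and right hepatectomy is taken as removal of segments 5 to 8.

```python
def flr_for_resection(segment_volumes_ml, resected_segments):
    """Return (remnant volume in mL, remnant as % of total) for a given resection."""
    resected = set(resected_segments)
    unknown = resected - set(segment_volumes_ml)
    if unknown:
        raise ValueError(f"unknown segments: {sorted(unknown)}")
    total = sum(segment_volumes_ml.values())
    remnant = sum(v for seg, v in segment_volumes_ml.items() if seg not in resected)
    return remnant, 100.0 * remnant / total


# Illustrative, made-up per-segment volumes in mL (total 1,500 mL).
example_volumes = {1: 30, 2: 110, 3: 140, 4: 200, 5: 220, 6: 200, 7: 250, 8: 350}

options = {
    "right hepatectomy (segments 5-8)": {5, 6, 7, 8},
    "segmentectomy of segments 5 and 8": {5, 8},
}
results = {name: flr_for_resection(example_volumes, segs) for name, segs in options.items()}
```

With these toy volumes, right hepatectomy leaves 480 mL (32%) and the segmentectomy leaves 930 mL (62%); each percentage could then be compared with the consensus limits for the patient's parenchymal status.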
In conclusion, Hepatica™ is a new medical device that provides robust whole-liver and Couinaud segment volume and liver tissue characteristic measurements to support treatment decision-making. This enables surgeons to assess each patient individually, based on the volume and health of the liver remnant, before resection for liver cancer.

Declarations
Conflict of interest Perspectum Ltd. is a privately funded commercial enterprise that develops medical devices to address unmet clinical needs, including Hepatica®. RB is the CEO and founder of Perspectum. LN, JC, AF, RN, AB, MM, AP, ZA, CF, MK and JMB are employees of Perspectum. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.