Towards liver segmentation in the wild via contrastive distillation

Purpose Automatic liver segmentation is a key component of computer-assisted hepatic procedures. The task is challenging due to the high variability in organ appearance, the number of imaging modalities, and the limited availability of labels. Moreover, real-world scenarios require strong generalization. However, existing supervised methods generalize poorly and therefore cannot be reliably applied to data not seen during training (i.e., in the wild).
Methods We propose to distill knowledge from a powerful model with a novel contrastive distillation scheme: a large pre-trained neural network is used to train our smaller model. A key novelty is to map neighboring slices close together in the latent representation while mapping distant slices far apart. Then, we use ground-truth labels to learn a U-Net-style upsampling path that recovers the segmentation map.
Results The pipeline proves robust enough to achieve state-of-the-art performance on unseen target domains. We carried out an extensive experimental validation on six common abdominal datasets covering multiple modalities, as well as on 18 patient datasets from the Innsbruck University Hospital. Sub-second inference time and a data-efficient training pipeline make it possible to scale our method to real-world conditions.
Conclusion We propose a novel contrastive distillation scheme for automatic liver segmentation. A limited set of assumptions and performance superior to state-of-the-art techniques make our method a candidate for real-world applications.
Supplementary Information The online version contains supplementary material available at 10.1007/s11548-023-02912-3.
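The slice-proximity idea from the Methods paragraph (neighboring slices close in latent space, distant slices far apart) can be illustrated with a generic InfoNCE-style objective. This is a sketch under stated assumptions, not the loss used in the paper: slices within a hypothetical `window` of the anchor are treated as positives, all other slices of the same scan as negatives, `tau` is a hypothetical temperature, the teacher-student distillation part is omitted, and `slice_contrastive_loss` is an illustrative name.

```python
import numpy as np


def slice_contrastive_loss(z: np.ndarray, window: int = 1, tau: float = 0.1) -> float:
    """InfoNCE-style loss over the slice embeddings of a single scan (illustrative).

    z: array of shape (n_slices, dim), ordered by slice index.
    Slices at most `window` positions away from the anchor are positives;
    every other slice of the scan is a negative. Requires window >= 1.
    """
    n = z.shape[0]
    # L2-normalize so that dot products are cosine similarities
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = (z @ z.T) / tau
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])  # distance in slice index
    pos_mask = (dist > 0) & (dist <= window)
    # exclude the anchor itself from the denominator
    logits = np.where(dist == 0, -np.inf, logits)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    expl = np.exp(logits)
    pos = (expl * pos_mask).sum(axis=1)
    return float(np.mean(-np.log(pos / expl.sum(axis=1))))


if __name__ == "__main__":
    theta = np.arange(8) * np.pi / 8
    smooth = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # neighbors are close
    shuffled = smooth[[0, 4, 1, 5, 2, 6, 3, 7]]                # neighbors are far apart
    print(slice_contrastive_loss(smooth) < slice_contrastive_loss(shuffled))  # True
```

Embeddings that vary smoothly along the slice axis yield a low loss, while shuffling the slice order so that index-neighbors land far apart in latent space raises it; this is the behavior the abstract describes, expressed in its simplest generic form.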


A.1 Experiments
Data pre-processing We clipped the intensities to [0, 500] Hounsfield units (HU, the standardized intensity scale of CT scans) for CT data and to [0, 1.5] for MR T2-SPIR data. For each scan, we then removed outliers by clipping at the 2nd and 98th percentiles, rescaled the intensities to [0, 1], and standardized them to zero mean and unit variance. In the single-DG experiments, following related works, we preprocessed the CT data by thresholding the inputs at 125 HU. To extract the preliminary features with the pre-trained backbone, we resized each slice to 224 × 224.
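Assuming that thresholding to a range means clipping the intensities to that window, and that all statistics are computed per scan, the intensity pre-processing can be sketched in NumPy as follows. The function name and default arguments are illustrative; the single-DG 125 HU threshold and the resizing to 224 × 224 are not shown.

```python
import numpy as np


def preprocess_scan(volume: np.ndarray, window=(0.0, 500.0), tail_pct: float = 2.0) -> np.ndarray:
    """Per-scan intensity pre-processing (illustrative sketch).

    window: intensity range to clip to, e.g. (0, 500) HU for CT or (0, 1.5) for MR T2-SPIR.
    tail_pct: percentage trimmed from each tail of the intensity distribution.
    """
    v = np.clip(volume.astype(np.float64), *window)          # 1. intensity window
    lo, hi = np.percentile(v, [tail_pct, 100.0 - tail_pct])  # 2. outlier bounds
    v = np.clip(v, lo, hi)
    v = (v - v.min()) / max(v.max() - v.min(), 1e-8)         # 3. rescale to [0, 1]
    return (v - v.mean()) / max(v.std(), 1e-8)               # 4. zero mean, unit variance


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ct = rng.normal(100.0, 300.0, size=(4, 64, 64))  # synthetic stand-in for a CT volume
    out = preprocess_scan(ct)
    print(out.shape, round(float(out.mean()), 6), round(float(out.std()), 6))
```

Note that step 4 undoes the scale set by step 3; the [0, 1] rescaling only matters if intermediate values are reused (for example, by intensity augmentations that expect inputs in [0, 1]).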
Labeled data To promote dataset consistency (the CHAOS dataset is the only one containing healthy livers), pixels belonging to liver tumors are treated as part of the liver in all experiments. This label treatment has no significant effect on performance: a paired t-test on the Dice scores yields p = 0.910.
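A minimal sketch of the label treatment, assuming a LiTS-style label map (0 = background, 1 = liver, 2 = liver tumor); other datasets may use different label values, so the constants below are hypothetical.

```python
import numpy as np

# Assumed LiTS-style label convention (hypothetical for other datasets):
# 0 = background, 1 = liver, 2 = liver tumor.
LIVER, TUMOR = 1, 2


def merge_tumor_into_liver(label: np.ndarray) -> np.ndarray:
    """Return a binary liver mask in which tumor pixels count as liver."""
    return np.isin(label, (LIVER, TUMOR)).astype(np.uint8)


if __name__ == "__main__":
    label = np.array([[0, 1, 2], [2, 1, 0]])
    print(merge_tumor_into_liver(label))
```

The significance check reported above corresponds to a paired test over matched per-scan Dice scores obtained with and without this merging; in SciPy it can be run as `scipy.stats.ttest_rel(dice_a, dice_b)`, where `dice_a` and `dice_b` are hypothetical arrays of paired scores.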
Training During training, the data were augmented with standard techniques: elastic deformation, blurring, noise, and gamma intensity transformation. Training time depends on the backbone: with a ViT-small/16 backbone and a dataset of ≈ 10,000 slices, training converges in ≈ 6 hours.
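The intensity augmentations named above (gamma transformation, blurring, additive noise) can be sketched in NumPy as follows, assuming slices scaled to [0, 1]. All ranges and the 3-tap blur kernel are hypothetical choices, not the values used in the paper. Elastic deformation is omitted because it needs an interpolation routine (for example `scipy.ndimage.map_coordinates`) and must be applied identically to the image and its label mask, whereas the transforms below modify the image only.

```python
import numpy as np


def augment_intensity(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random gamma, blur and noise for a 2D slice in [0, 1] (hypothetical parameters)."""
    # gamma transformation: out = img ** gamma, with gamma drawn around 1
    gamma = rng.uniform(0.7, 1.5)
    out = np.clip(img, 0.0, 1.0) ** gamma

    # blurring: separable 3-tap binomial kernel along rows, then columns
    kernel = np.array([0.25, 0.5, 0.25])

    def blur(v: np.ndarray) -> np.ndarray:
        return np.convolve(v, kernel, mode="same")

    out = np.apply_along_axis(blur, 1, out)
    out = np.apply_along_axis(blur, 0, out)

    # additive Gaussian noise
    return out + rng.normal(0.0, 0.02, size=out.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    slice_2d = rng.uniform(0.0, 1.0, size=(224, 224))
    print(augment_intensity(slice_2d, rng).shape)  # (224, 224)
```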
Inference time Partly thanks to the relatively small number of parameters of our model (2.5 · 10^5, compared to 3 · 10^6 for the traditional U-Net architecture [18]), the inference time per scan (126 slices on average, at a resolution of 224 × 224) is 0.591 ± 0.082 seconds with a ViT-small/16 backbone.
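As a back-of-the-envelope reading of these figures, assuming the per-scan time is spread evenly over the slices (which ignores batching and data-loading overheads), the reported numbers translate into a per-slice cost and a parameter ratio as follows.

```python
scan_time_s = 0.591      # mean inference time per scan (ViT-small/16 backbone)
slices_per_scan = 126    # average number of slices per scan
params_ours = 2.5e5      # parameters of our model, as reported
params_unet = 3e6        # parameters of the traditional U-Net, as reported

per_slice_ms = 1000 * scan_time_s / slices_per_scan
ratio = params_unet / params_ours

print(f"{per_slice_ms:.2f} ms per slice")  # 4.69 ms per slice
print(f"{ratio:.0f}x fewer parameters")    # 12x fewer parameters
```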
Prediction post-processing Following related works, we binarized the predicted segmentation masks with a threshold of 0.5 and, in the evaluation phase, resized each scan slice to 128 × 128 (192 × 192 in the single-DG experiments). Table A1 lists the hyperparameter values not reported in the main paper.
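The post-processing and evaluation steps above can be sketched as follows, assuming that predictions and ground-truth masks are both resized with nearest-neighbour interpolation and that the Dice coefficient is computed on the resized binary masks. The helper names are hypothetical, and pixels with a probability of exactly 0.5 are assigned to the background here.

```python
import numpy as np


def binarize(prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn a predicted probability map into a binary mask."""
    return (prob > threshold).astype(np.uint8)


def resize_nearest(mask: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of a 2D mask to size x size."""
    h, w = mask.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return mask[rows[:, None], cols[None, :]]


def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))


if __name__ == "__main__":
    prob = np.zeros((224, 224))
    prob[56:168, 56:168] = 0.9
    gt = np.zeros((224, 224), dtype=np.uint8)
    gt[56:168, 56:168] = 1
    pred = resize_nearest(binarize(prob), 128)
    print(dice(pred, resize_nearest(gt, 128)))  # 1.0
```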

Method comparisons
In the multi-DG experiments, we did not perform inference on the 3D-IRCADb-01 dataset because the LiTS dataset already includes it.

Data availability
The study used the publicly available datasets BTCV, CHAOS ([22], https://chaos.grand-challenge.org), 3D-IRCADb-01 ([23], https://www.ircad.fr/research/data-sets/liver-segmentation-3d-ircadb-01/), …
The stereotactic radiofrequency ablation (SRFA) procedure entails thermal ablation of liver tumors with a multiple-needle stereotactic approach. The first two steps of the procedure are precise 3D planning on multi-modal pre-procedural scans and the insertion of coaxial needles into the patient. Needle placement is verified by fusing pre-procedural and intra-procedural control scans. Next, alternating current passes through the ablation probes, and thermal energy is transmitted to the target tissue. Once the target temperature is reached (e.g., 60 °C), the tumor tissue near the needle is irreversibly destroyed. Numerous tumoral areas, residual ablation zones, and the presence of ablation probes are among the difficulties of liver segmentation on SRFA scans. Moreover, intra-procedural CT scans feature various artifacts due to the uncontrolled image acquisition environment (varying dose, type of contrast agent, and patient position).
Comparison to commercial systems Two of the most successful commercial systems for automatic liver analysis are Siemens syngo.via and Ablation-fit. Since Ablation-fit requires two different contrast-enhanced liver phases, we report a comparison with the Siemens system only. We opened the patient scans in the automatic "CT Liver Analysis" application of Siemens syngo.via. Liver segmentation takes 13.966 ± 2.377 seconds on a Windows 10 workstation with 32 GB RAM, 8 × Intel Core i7-10700K CPU @ 2.90 GHz, and a 2 GB NVIDIA GeForce GT 1030. Because Siemens syngo.via exports segmentation results as dotted contours, we applied dilation followed by skeletonization to recover a continuous contour. For space reasons, we show only three cases, chosen because they were acquired in the hepatic arterial and portal venous phases, which are commonly used for planning and verifying the procedures. An expert interventional radiologist confirmed that our predictions are of at least comparable quality to the results of Siemens syngo.via. For a more in-depth analysis, refer to the supplementary videos, which show two further cases acquired in the hepatic arterial and portal venous phases.
Figure A1: Liver segmentation prediction on a planning CT scan (100 mL Visipaque 320, arterial phase) of a 45-year-old male patient. The prediction of our method is shown in red and the result of the Siemens syngo.via software in green. The Siemens system mistakes the inferior vena cava for liver tissue, while our method fails to segment the inferior part of the liver.
Figure A2: Liver segmentation prediction on a planning CT scan (74 mL Ultravist 370, arterial phase) of a 45-year-old male patient. The prediction of our method is shown in red and the result of the Siemens syngo.via software in green. The Siemens system includes skin in the liver segmentation (second row). Note that our method successfully avoids segmenting the SRFA ablation zone (second slice).
Figure A3: Liver segmentation prediction on a planning CT scan (100 mL Jopamiro 300, portal venous phase) of a 61-year-old male patient. The prediction of our method is shown in red and the result of the Siemens syngo.via software in green. The Siemens system mistakes the inferior vena cava and part of the gallbladder for liver tissue (second row, left), while our method shows some inaccuracies in the lower part of the liver close to the ribs.
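The contour-recovery step applied to the syngo.via exports (dilation followed by skeletonization) can be illustrated with the NumPy-only sketch below. The 3 × 3 structuring element and the use of Zhang-Suen thinning as the skeletonization algorithm are assumptions, since the text does not specify which variants were used. In practice, library routines such as `scipy.ndimage.binary_dilation` and `skimage.morphology.skeletonize` would be the idiomatic choice.

```python
import numpy as np


def dilate3x3(mask: np.ndarray) -> np.ndarray:
    """Binary dilation with a 3x3 square structuring element (zero padding)."""
    m = mask.astype(bool)
    h, w = m.shape
    p = np.pad(m, 1)
    out = np.zeros_like(m)
    for dr in range(3):
        for dc in range(3):
            out |= p[dr:dr + h, dc:dc + w]
    return out


def zhang_suen_skeleton(mask: np.ndarray) -> np.ndarray:
    """Thin a binary mask to a one-pixel-wide skeleton (Zhang-Suen thinning)."""
    img = np.pad(mask.astype(np.uint8), 1)
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r, c in zip(*np.nonzero(img)):
                # neighbours P2..P9, clockwise starting from north
                p = [img[r - 1, c], img[r - 1, c + 1], img[r, c + 1], img[r + 1, c + 1],
                     img[r + 1, c], img[r + 1, c - 1], img[r, c - 1], img[r - 1, c - 1]]
                b = sum(p)  # number of foreground neighbours
                a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))  # 0->1 transitions
                if step == 0:
                    cond = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                else:
                    cond = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                if 2 <= b <= 6 and a == 1 and cond:
                    to_delete.append((r, c))
            for r, c in to_delete:  # deletions are applied after each sub-iteration
                img[r, c] = 0
            changed = changed or bool(to_delete)
    return img[1:-1, 1:-1].astype(bool)


if __name__ == "__main__":
    dots = np.zeros((11, 13), dtype=np.uint8)
    dots[5, 2:11:2] = 1                    # dotted contour segment with 1-pixel gaps
    thick = dilate3x3(dots)                # gaps closed, contour now 3 pixels thick
    skeleton = zhang_suen_skeleton(thick)  # thinned back to a continuous thin line
    print(int(thick.sum()), int(skeleton.sum()))
```

Dilation closes gaps of up to two pixels between dots, and thinning then brings the thickened contour back to a line of roughly one pixel width.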