CACTUSS: Common Anatomical CT-US Space for US examinations

Purpose: The detection and treatment of abdominal aortic aneurysm (AAA), a vascular disorder with life-threatening consequences, is challenging due to its lack of symptoms until it reaches a critical size. Abdominal ultrasound (US) is utilized for diagnosis; however, its inherent low image quality and reliance on operator expertise make computed tomography (CT) the preferred choice for monitoring and treatment. Moreover, CT datasets have been effectively used for training deep neural networks for aorta segmentation. In this work, we demonstrate how CT labels can be leveraged to improve segmentation in ultrasound and hence save manual annotations. Methods: We introduce CACTUSS: a common anatomical CT-US space that inherits properties from both the CT and ultrasound modalities to produce images in an intermediate representation (IR) space. CACTUSS acts as a virtual third modality between CT and US to address the scarcity of annotated ultrasound training data. The generation of IR images is facilitated by re-parametrizing a physics-based US simulator. In CACTUSS, we use IR images as training data for ultrasound segmentation, eliminating the need for manual labeling. In addition, an image-to-image translation network is employed to apply the model to real B-modes. Results: The model's performance is evaluated quantitatively for the task of aorta segmentation by comparison against a fully supervised method in terms of Dice Score and diagnostic metrics. CACTUSS outperforms the fully supervised network in segmentation and meets clinical requirements for AAA screening and diagnosis. Conclusion: CACTUSS provides a promising approach to improve US segmentation accuracy by leveraging CT labels, reducing the need for manual annotations. We generate IRs that inherit properties from both modalities while preserving the anatomical structure and are optimized for the task of aorta segmentation. Future work involves integrating CACTUSS into robotic ultrasound platforms for automated screening and conducting clinical feasibility studies.


Introduction
Abdominal aortic aneurysm (AAA) is a life-threatening disease of the aorta, the main blood vessel in the human body, in which an aneurysm, or local expansion, forms and weakens the aortic wall. AAA carries a high risk of aortic rupture, with an overall incidence rate of 1.9% to 18.5% in males aged 60 years and older and an average subsequent mortality rate of 60% [19].
Abdominal ultrasound has been recommended as an initial examination modality for asymptomatic patients with a high risk of AAA. There is evidence of a significant reduction of premature death from AAA in men aged 65 and above who undergo ultrasound screening [19]. Per definition, the aorta is considered aneurysmatic when the absolute anterior-posterior diameter exceeds 3 cm, independently of the relative body size of the patient. However, because the interpretation of the US image relies heavily on the sonographer's experience, the resulting diagnosis is largely operator-dependent, as reported in [15].

To overcome the challenge of reproducible ultrasound screening, robotic ultrasound (RUS) imaging has been proposed to offer reproducible scans independent of operator skill [11,7,8]. Specifically for AAA screening, existing approaches have required an external camera and an MRI atlas to locate and track the trajectory of the aorta, which reduces the usability and subsequent acceptance of the methods [20,9]. Furthermore, ultrasound image quality has been criticized for not offering the resolution needed to make an accurate measurement [4].
Computed tomography (CT) scans are used in clinical practice to assess, manage, and monitor AAA after an initial discovery during screening [3]. In recent years, segmentation models based on deep neural networks have demonstrated strong performance for automated CT aorta segmentation, and numerous studies have trained on large, expert-annotated, publicly available datasets [17,1,10,2]. This opens the possibility of automatic screening and monitoring of AAA in CT imaging using deep learning [21]. However, acquiring a CT scan exposes the patient to ionizing radiation.
Ultrasound imaging can serve as a viable alternative to CT and help reduce patient exposure to ionizing radiation. However, the application of deep learning to US image segmentation has been hampered by the complexity of the modality and the lack of annotated training data, which is required for good DNN performance. To facilitate US segmentation for automated AAA scanning without the use of external imaging, an intermediate representation (IR) is required between US and CT, so that CT labels and pretrained networks can be applied to the task of ultrasound image segmentation.

Contributions
We propose the Common Anatomical CT-US Space (CACTUSS), an anatomical IR that is modality agnostic. The proposed method allows for: 1) real-time inference and segmentation of live ultrasound acquisitions, 2) training a deep neural network without the use of manually labeled ultrasound images, and 3) reproducible and interpretable screening and monitoring of AAA.
We evaluate the proposed approach by comparing it to a fully supervised segmentation network and investigate its use for measuring the anterior-posterior aortic diameter compared to the current clinical workflow. In total, the proposed method meets the clinical requirements associated with AAA screening and diagnosis. The source code for our method is publicly available5.

Ultrasound Simulation Parametrization: Ultrasound simulation has been an active research topic over the last two decades [18,6,16]. Since ultrasound data is limited in quantity and difficult to acquire, simulated ultrasound images are used to define an intermediate ultrasound representation. To help define a common anatomical space, we take advantage of a hybrid US simulator introduced by [16], implemented in ImFusion6. In CACTUSS, the hybrid raytracing convolutional ultrasound simulator is used to define an anatomical IR with anisotropic properties, preserving the direction-dependent nature of US imaging while exhibiting modality-specific artifacts of ultrasound imaging together with the well-defined contrast and resolution of CT. This anatomical IR should reside on the joint domain boundary of the US and CT distributions, which has the benefit that a large number of IR samples can be created from a single CT scan. The simulation parameters, listed in Table 1 and Table 2, map the CT domain to the ultrasound domain. Input to the simulator is a three-dimensional label map where each voxel is assigned six acoustic parameters that describe the tissue characteristics, among them the speed of sound c, acoustic impedance Z, and attenuation coefficient α, which are used to compute the acoustic intensity at each point along the travel path of a ray. In this way, we create a virtual modality that provides important characteristics of ultrasound, such as tissue interfaces, while learning from annotated CT.
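The per-ray intensity computation can be illustrated with a minimal sketch. This is a simplified, hypothetical model for intuition only (normal incidence, no scattering or speckle terms); the actual hybrid simulator in [16] uses the full set of per-voxel parameters. The tissue values below are illustrative placeholders, not the values from Tables 1 and 2.

```python
def ray_intensity(segments, f_mhz=5.0):
    """Trace one ray through a list of (tissue, length_cm) segments.

    Each tissue is a dict with acoustic impedance Z (MRayl) and
    attenuation coefficient alpha (dB/cm/MHz). Returns the remaining
    transmitted intensity and the echo intensity generated at each
    tissue interface (simplified: normal incidence, no scattering).
    """
    intensity = 1.0
    echoes = []
    prev_z = None
    for tissue, length_cm in segments:
        if prev_z is not None:
            # Intensity reflection coefficient at the tissue interface.
            r = ((tissue["Z"] - prev_z) / (tissue["Z"] + prev_z)) ** 2
            echoes.append(intensity * r)
            intensity *= 1.0 - r
        # Beer-Lambert style attenuation along the segment.
        db_loss = tissue["alpha"] * length_cm * f_mhz
        intensity *= 10.0 ** (-db_loss / 10.0)
        prev_z = tissue["Z"]
    return intensity, echoes

# Illustrative (not physically calibrated) tissue parameters.
fat = {"Z": 1.33, "alpha": 0.48}
blood = {"Z": 1.61, "alpha": 0.2}
I, echoes = ray_intensity([(fat, 2.0), (blood, 1.5)])
```

The interface echoes are what give the IR its tissue boundaries, while the attenuation term reproduces the depth-dependent intensity loss characteristic of ultrasound.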

Domain Adaptation
Since there is a domain shift between the IR and real ultrasound B-modes, we learn a mapping between them while preserving the patient-specific anatomical characteristics of each image. To translate real ultrasound images into the IR, we employ the recent Contrastive Learning for Unpaired image-to-image translation network (CUT) [13]. The CUT network assumes maximum correlation between the content of a patch of the target image and the spatially corresponding patch of the source image, relative to any other patch in the source image. The generator function G : X → Y translates input domain images X to look like output domain images Y, with unpaired samples x ∈ X from the source and y ∈ Y from the target, respectively. The generator G is composed of an encoder G_enc and a decoder G_dec, which are applied consecutively: y = G(x) = G_dec(G_enc(x)). G_enc is restricted to extracting content characteristics, while G_dec learns to create the desired appearance using a patch contrastive loss [13]. The generated sample y is stylized while preserving the structure of the input x. Thus, samples can have the appearance of the IR while maintaining the anatomical content of the US image.
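The patchwise contrastive objective can be sketched as an InfoNCE loss over patch features. This is a didactic NumPy simplification; the actual CUT implementation [13] operates on multi-layer encoder features passed through a small MLP projection head, which is omitted here.

```python
import numpy as np

def patch_nce_loss(query, keys, tau=0.07):
    """InfoNCE loss over patch features.

    query : (N, D) features of N patches from the generated image.
    keys  : (N, D) features of the spatially corresponding patches of
            the input image; keys[i] is the positive for query[i], and
            all other rows act as negatives.
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / tau                       # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the positive pair on the diagonal.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))
loss_matched = patch_nce_loss(q, q)                       # content preserved
loss_random = patch_nce_loss(q, rng.normal(size=(8, 16))) # content destroyed
```

Minimizing this loss pushes each generated patch to stay maximally similar to its spatially corresponding input patch, which is exactly what preserves the anatomical content during style transfer.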
Aorta Segmentation: In the last phase, a segmentation network is trained on the samples from phase 1 to perform aorta segmentation on intermediate space images. The corresponding labels can be directly extracted from the CT slices, saving manual labelling. Critically, no ultrasound image segmentation is required for CACTUSS.

Data
Two domains of images are utilized in this work, as shown in Figure 2. Intermediate Space: Eight partially labeled CT volumes of men and women were downloaded from the publicly available Synapse dataset7. These labels were augmented with labels of bones, fat, skin, and lungs to complete the label map. The CTs were used to generate 5000 simulated intermediate space samples with a size of 256x256 pixels. From these simulated data, a subset of 500 images was used as domain Y for the CUT network training. This dataset will be referred to as the intermediate space set (ISS). In-vivo images: Ten abdominal US sweeps of the aorta were acquired from ten volunteers (m:6/f:4, age 26 ± 3) with a convex probe (CPCA19234r55) on a cQuest Cicada US scanner (Cephasonics, Santa Clara, CA, US). Per sweep, 50 frames of size 256x256 pixels were randomly sampled, for a total of 500 samples used as domain X of the CUT network. For testing the segmentation network, which was trained only on IRs, a subset of 100 frames (10 random frames per volunteer) was labelled by a medical expert and used as a test set. For comparison against a supervised approach, additional images were annotated to train a patient-wise 8-fold cross-validation network with 50 images per fold from 8 subjects. Additionally, 23 images from a volunteer not in the existing datasets were acquired on an ACUSON Juniper (Siemens Healthineers, Erlangen, Germany) with a 5C1 convex probe and annotated for further evaluation.

Training
For phase 2, we train the CUT network for 70 epochs with a learning rate of 10^-5 and default hyperparameters. For phase 3, we train a U-Net [14] for 50 epochs with a learning rate of 10^-3, a batch size of 64, the Adam optimizer, and a DSC loss. Both models were implemented in PyTorch 1.8.1 and trained on an Nvidia GeForce RTX 3090 using Polyaxon8. Phase 3 training includes augmentations with rotation, translation, scaling, and noise; the data is randomly split in an 80-20% ratio for training and validation, respectively. For testing, the test set of 100 in-vivo images is passed through the CUT network and translated into the common anatomical representation before being inferred with the phase 3 network.
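The DSC loss used in phase 3 can be sketched as a soft-Dice formulation. This is a common variant written in NumPy for clarity; the exact implementation details (smoothing constant, reduction) are assumptions.

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for binary segmentation.

    pred   : predicted foreground probabilities in [0, 1].
    target : binary ground-truth mask of the same shape.
    Returns 1 - DSC, so a perfect prediction yields a loss of 0.
    """
    pred = np.asarray(pred, dtype=float).reshape(-1)
    target = np.asarray(target, dtype=float).reshape(-1)
    intersection = (pred * target).sum()
    dice = (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return 1.0 - dice

# Toy example: a 2x2 foreground square inside a 4x4 image.
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
perfect = soft_dice_loss(mask, mask)          # near 0
miss = soft_dice_loss(np.zeros((4, 4)), mask) # near 1
```

Optimizing 1 - DSC directly targets the overlap metric reported in the evaluation, which is why it is a common choice for segmentation with small foreground structures such as the aorta.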

Evaluation Metrics
We use the following metrics to quantitatively evaluate our method. For CUT, we use the Fréchet inception distance (FID) [5] for performance evaluation and early-stopping regularization. FID quantifies the difference in feature distribution between two sets of images, e.g. real and IR, using feature vectors from the Inception network. As proposed in [5], we use the second layer for the FID calculation, consider the epochs with the top 3 FID scores, and qualitatively select among them based on the desired appearance. For the segmentation model, we report the average Dice Score (DSC) and the mean absolute error (MAE) of the diameter of the resulting segmentation, as proposed in [12].
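Given pre-extracted Inception feature vectors for the two image sets, the FID is the Fréchet distance between two Gaussians fitted to those features. A sketch (the feature extraction itself is omitted; random vectors stand in for Inception features):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a, feats_b):
    """Frechet inception distance between two feature sets of shape (N, D).

    Models each set as a Gaussian and computes
    ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2)).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))                 # stand-in for real features
shifted = rng.normal(loc=3.0, size=(200, 4))     # stand-in for IR features
```

A lower FID indicates that the translated images are closer to the IR distribution, which is why the epochs with the lowest scores are short-listed for the final model selection.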

Experiments
We test the proposed framework quantitatively with the following experiments. Imaging Metrics: We evaluate the accuracy of the proposed method by comparing it to a supervised network. For this, we train an 8-fold cross-validation U-Net, where each fold contains 50 in-vivo images from one subject. We test on 3 hold-out subjects and report the average DSC. Clinical applicability: We measure the anterior-posterior diameter of the aorta in mm, according to current clinical practice [4], and report the MAE and standard deviation compared to ground-truth labels for both CACTUSS and the supervised segmentation. Clinically, an error of less than 8 mm is considered acceptable for a medical diagnosis of AAA [4]. Robustness: We evaluate against images of a patient scanned with a second US machine, as described in Section 2.1, showing how robust the method is to domain shift, and again compare against the supervised network. Different Intermediate Representation: We replace the proposed common anatomical IR with two alternatives to test the sensitivity of the method to the choice and specification of the IR. The first alternative processes CT slices with a Canny edge detector and a bilateral filter, followed by a convex mask with the shape of a convex US probe. The second is a realistic ultrasound simulation from the same label map as the ISS. These alternative IRs were evaluated by the DSC on 100 in-vivo frames passed through the trained model, with expert annotations as ground truth.
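Extracting the anterior-posterior diameter from a binary segmentation can be sketched as follows. This is a hypothetical helper, not the paper's measurement code; it assumes the image rows run along the anterior-posterior (depth) axis and that the pixel spacing in mm is known from the scanner geometry.

```python
import numpy as np

def ap_diameter_mm(mask, pixel_spacing_mm=0.5):
    """Anterior-posterior diameter of a segmented aorta in mm.

    mask : 2D binary array; rows are assumed to run along the
           anterior-posterior (depth) axis of the ultrasound image.
    Returns the largest per-column vertical extent of the mask,
    converted to millimetres.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0
    best = 0
    for c in np.where(mask.any(axis=0))[0]:
        rows = np.where(mask[:, c])[0]
        best = max(best, rows[-1] - rows[0] + 1)
    return best * pixel_spacing_mm

# Toy example: a circular "aorta" of radius 10 px in a 64x64 mask.
yy, xx = np.mgrid[:64, :64]
circle = ((yy - 32) ** 2 + (xx - 32) ** 2) <= 100
diameter = ap_diameter_mm(circle, pixel_spacing_mm=0.5)
```

Comparing this per-image measurement against expert annotations yields the MAE reported for the clinical applicability experiment.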

Results and Discussion
In Tables 3 and 4, we present the DSC and MAE values of CACTUSS and a supervised U-Net when evaluated on the Cephasonics and Siemens scanners, respectively. This evaluation is performed on two real-world scanners, while the CACTUSS phase 3 segmentation network has only been trained on synthetic samples from the ISS. Remarkably, on the Cephasonics scanner, CACTUSS achieves a higher DSC in aortic segmentation and a lower mean absolute error of the aortic diameter measurement, a key metric in AAA diagnosis. On the Siemens scanner, CACTUSS has a slightly higher MAE but still exhibits a lower standard deviation. Furthermore, the CACTUSS diameter measurements remain within the clinically accepted range. In this experiment, four out of 23 images were wrongly predicted, and only by the supervised method. Alternative Intermediate Representations: Results from evaluating the alternative IRs are reported in Table 5. The proposed IR in CACTUSS outperforms both edge detection and realistically simulated ultrasound images.

Discussion
CACTUSS not only successfully segments real B-mode images while being trained only on IR data, but also surpasses the supervised U-Net, as shown in Table 3. Furthermore, CACTUSS achieves an aortic diameter measurement accuracy of 2.9 ± 1.9 mm, compared to 4.3 ± 1.9 mm for the supervised U-Net on the Cephasonics machine. For both ultrasound devices, the diameter accuracy is within the accuracy required for a clinical diagnosis of AAA [4]. The results show that the method performs well independently of the machine used; however, performance may vary due to the different preprocessing steps and filters of each US machine.
Our testing of alternative intermediate representations highlights the unique advantage that CACTUSS offers. Embedding the anatomical layout in a space between US and CT, i.e. with the contrast of CT and the attenuation and reflectivity of ultrasound, yielded the greatest segmentation performance. By testing ultrasound simulation representations and a high-contrast edge detection representation, we show that the choice of representation is not arbitrary. In the case of the US simulations, the reduction in performance is likely due to the increased complexity and lower SNR of the image, caused by the addition of ultrasound-specific features such as shadows, reflections, and speckle. Alternative representations are also possible, but our testing shows that including fundamental physical assumptions of both modalities enhances model performance.
One challenge of using CUT for domain adaptation is the possibility of hallucinations. Such networks are prone to hallucinating incorrect characteristics as the number of training iterations increases. We mitigate this issue by using the FID to select the most performant CUT model. This approach can remain challenging for complex outputs; however, the CACTUSS IR is simplified in structure and cleared of features such as speckle noise and reflections, thus improving trainability.
The reproducibility of diagnostic measurements between sonographers, which depends heavily on their expertise, can lead to large inter- as well as intra-observer variability. In particular, the differences in measurements between sonographers lie between 0.30-0.42 cm, and the mean repeatability among technicians is 0.20 cm [4]. Neural network-based methods provide a standardized computer-aided diagnostic approach that improves the reproducibility of results, since the models are deterministic. Accordingly, CACTUSS produces reproducible, deterministic results within the clinically accepted ranges and shows stability across evaluations.
Additionally, CACTUSS shows reproducible results on images from different US machines, a positive indication that the algorithm can be machine agnostic. Moreover, CACTUSS can also be considered modality agnostic, since intermediate representation images can be generated from other medical modalities such as MRI. Initial experimental results on AAA sample images show that CACTUSS successfully generates an IR for AAA B-mode images independently of anatomical size and shape9. The desired segmentation performance can be achieved by re-training the segmentation network on any in-distribution data. This shows that CACTUSS is applicable to AAA cases and has the potential to generalize to other applications and anatomies, demonstrating the adaptivity of the proposed method.

Conclusion
In this work, we presented CACTUSS, a common anatomical CT-US space generated by a physics-based simulation system to better address the task of aorta segmentation for AAA screening and monitoring. We show that US segmentation networks can be trained with existing labeled data from other modalities and effectively solve clinical problems. By better utilizing medical data, we show that aorta segmentation for AAA screening can be performed within the standards of current medical practice. Furthermore, we demonstrate the robustness of this work by evaluating CACTUSS on data from multiple scanners. Future work includes the integration of CACTUSS into robotic ultrasound platforms for automatic AAA screening and clinical feasibility studies of the method.

Fig. 1 .
Fig. 1. Overview of the proposed framework. In phase one, an established ultrasound simulator is re-purposed and parameterized to define an intermediate representation between the ultrasound and CT spaces. In phase two, an unsupervised network is trained separately, in an isolated fashion, to translate clinical ultrasound images to the intermediate representation defined in phase one. In phase three, a segmentation network is trained on the segmentation task using only samples generated with the ultrasound simulator. At inference time, real ultrasound images are passed to the image-to-image network, translated to the intermediate representation, and segmented. This is the first time that the segmentation network sees the intermediate representation of real ultrasound images.

Fig. 3 .
Fig. 3. Examples of B-mode images after inference through the CUT network to the IR. Top row: input B-mode. Bottom row: result after inference.

Table 3 .
Evaluation of CACTUSS on Cicada samples.

Table 4 .
Evaluation of CACTUSS on Juniper samples.

Table 5 .
Comparison of DSC of segmentation given alternative IRs.