1 Introduction

Accurate contouring of both the clinical target volume (CTV) and critical organs at risk (OARs), including the bladder, rectum, right femoral head (RFH), and left femoral head (LFH), is crucial for successful radiation therapy of prostate cancer [1, 2]. Manual contouring of the prostate region in CT images is a time-consuming process [3, 4], which can delay the start of radiotherapy, particularly in clinics with limited resources [5]. Reports have demonstrated significant variations in contouring results among different experts [6]. Furthermore, the low contrast of soft tissue in male pelvic CT images often leads to unclear boundaries between the prostate region and surrounding organs [7], making accurate contouring challenging. Additional complexity arises from the considerable variability in the shapes and sizes of male pelvic organs [7, 8].

To address these challenges, many automatic contouring methods have been proposed. Ma et al. [9] proposed a hybrid approach that combined deep learning with an atlas model to automatically contour the prostate on 2D CT images. They obtained a preliminary contour using a convolutional neural network (CNN) and subsequently refined the CNN-derived result with the atlas method. The proposed method yielded a Dice similarity coefficient (DSC) of 86.8%.

Kazemifar et al. [8] developed an automatic approach to contour the prostate, rectum, bladder, and femoral heads in CT images. They designed a 2D U-Net that received CT images slice by slice and output the corresponding segmented image. In another study [10], they used a 2D U-Net for organ localization followed by a 3D U-Net for precise contouring. Combining the 2D localization network with the 3D contouring network improved the DSC for the prostate from 88% to 90%.

He et al. [11] developed a two-step framework for CT prostate segmentation using fully convolutional networks. The first stage localizes the prostate region, while the second stage precisely segments it using a multi-task U-Net architecture. The proposed network uses voxel-wise sampling in a multi-task learning module, enhancing the quality of the learned feature space.

Wang et al. [12] introduced an automatic deep learning-based prostate segmentation method for 313 male pelvic CT scans. Their segmentation framework includes an organ localization model, a boundary-sensitive representation model, and a multi-label cross-entropy loss function. This approach outperforms baseline fully convolutional networks.

Pan et al. proposed a token-based transformer network for multi-organ segmentation using CT images. Their hybrid architecture combines a ResNet-like encoder, a transformer module for capturing global dependencies, and a mirroring decoder for detailed segmentations. The network's performance was evaluated using several metrics. Dice scores for the prostate, rectum, bladder, left femoral head, and right femoral head reached 0.84, 0.89, 0.94, 0.95, and 0.95, respectively. Hausdorff distances ranged from 2.56 mm to 6.59 mm, while mean surface distances varied from 0.91 mm to 4.97 mm, and residual mean square distances from 1.24 mm to 2.03 mm [13].

Kawula et al. investigated the efficacy of a 3D U-Net model for segmenting the prostate, bladder, and rectum in CT images. Geometric accuracy was assessed using the DSC and the 95% HD. The DSC values for the prostate, bladder, and rectum were 0.87, 0.97, and 0.89, respectively. The average HD and 95% HD were below 1.6 mm and 4 mm, below 0.95 mm and 2.5 mm, and below 1.4 mm and 5 mm for these organs, respectively [14].

Shen et al. proposed CUNet, a convolutional network for automated contouring of the CTV and OARs in prostate cancer radiotherapy. CUNet leverages a 3D U-Net architecture with an attention center block that enhances feature refinement by selectively emphasizing informative features while suppressing less relevant ones. The model's performance was evaluated using the DSC and the 95th percentile Hausdorff distance (95HD) for CTV and OAR delineation. The mean DSC and 95HD values for the defined CTVs were 0.84 ± 0.05 and 5.04 ± 2.15 mm, respectively. For OARs, the DSC values ranged from 0.783 to 0.913, with corresponding 95HD values spanning 1.424 to 6.278 mm [15].

Mofid et al. investigated a 3D nnU-Net architecture for automatic segmentation in prostate cancer patients. The nnU-Net follows a 3D U-Net pattern, incorporating an encoder-decoder structure with skip connections. The algorithm demonstrated high performance, achieving DSCs of 0.97 (bladder), 0.96 (right femoral head), 0.9 (rectum), 0.82 (prostate), 0.77 (lymph nodes), and 0.69 (seminal vesicles). The corresponding HDs were 4.13, 3.58, 10.04, 3.68, 15.5, and 10.95 mm, respectively [16].

Although these studies have demonstrated promising results achieved by CNNs in male pelvic multi-organ contouring, precise delineation of the prostate region on CT images using CNNs remains challenging. One notable drawback of employing CNNs for medical image segmentation is their limited ability to capture global dependencies [17]. CNNs typically have localized receptive fields, which means they focus on small regions of the input image at a time. In medical imaging, where global context and spatial relationships are critical for accurate segmentation, this limitation can negatively impact the performance of CNN-based models [18].

Vision transformers (ViTs) apply transformer-based models, which have shown great success in natural language processing, to medical image segmentation [18, 19]. Unlike CNNs, ViTs operate on the entire image rather than on localized regions, which allows them to capture long-range dependencies and contextual information across the image [18, 20]. However, directly applying up-sampling techniques to ViT features is insufficient to restore fine-grained information, often resulting in coarse segmentation outcomes [18].

Many studies have therefore focused on hybrid CNN-ViT architectures [18, 20] to maximize the advantages offered by both models. In TransUNet [18], the feature tensor obtained from the ViT is combined in the decoder module with hierarchical, deconvolved CNN features of matching resolution.

This study proposes hybrid CNN-ViT networks for male pelvic multi-organ contouring in prostate cancer patients. We implemented a novel approach that combines the ViT and CNN architectures to capture detailed features together with long-range dependencies. Our main objective was to propose and employ an attention-based fusion mechanism that merges the detailed features extracted by the convolutional model with the global features obtained by the transformer model. We used 104 radiotherapy planning CT volumes to train and evaluate two CNN and two hybrid CNN-ViT networks.

2 Materials and methods

2.1 Patient data

For this study, we used retrospective data from 104 localized prostate cancer patients. An attending radiation oncologist delineated the target organ (prostate) and critical OARs (bladder, rectum, RFH, and LFH) on CT images using the ISOgray radiotherapy treatment planning software (DOSIsoft SA, France).

Data were collected from multiple centers where CT images were acquired using scanners from different manufacturers: the Research and Treatment Center of Imam Reza Hospital, Reza Radiotherapy and Oncology Center, and Razavi Hospital, three radiotherapy centers in Mashhad, Iran. Radiotherapy planning CT images of prostate cancer patients were obtained using different CT scanners, including NeuVis (PNMS), LightSpeed (GE), and Somatom Sensation Open (Siemens).

2.2 Data preprocessing

The planning CT images were cropped to exclude non-pelvic regions, as these lack relevant information for network training and only increase computational time. Data normalization and standardization were then applied, rescaling the images to a mean of 0 and a standard deviation (SD) of 1. Finally, the pre-processed CT images were shuffled and split into three sets: 70% for training, 10% for validation, and 20% for testing. To enhance data variability, we applied online data augmentation to the training set. This approach generates augmented data dynamically during training, which offers flexibility and minimizes storage needs. Augmentation techniques included rotations, flipping (horizontal and vertical), cropping, shifting, zooming, random local rotations, and shearing.
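A minimal sketch of this preprocessing pipeline is given below, assuming the cropped axial slices are available as NumPy arrays. The array sizes, random seed, and augmentation parameter ranges are illustrative assumptions, not values reported in this study.

```python
import numpy as np
import torch
from torchvision import transforms

# Illustrative stand-ins for the cropped pelvic CT slices and label masks
# (in the study these come from 104 planning CT volumes).
num_slices = 100
images = np.random.randn(num_slices, 1, 256, 256).astype(np.float32)
masks = np.random.randint(0, 6, size=(num_slices, 256, 256))  # background + 5 organs

# Z-score normalization: rescale each slice to zero mean and unit SD.
images = (images - images.mean(axis=(2, 3), keepdims=True)) / (
    images.std(axis=(2, 3), keepdims=True) + 1e-8)

# Shuffle, then split into 70% training, 10% validation, and 20% testing.
rng = np.random.default_rng(seed=0)
order = rng.permutation(num_slices)
n_train, n_val = int(0.7 * num_slices), int(0.1 * num_slices)
train_idx = order[:n_train]
val_idx = order[n_train:n_train + n_val]
test_idx = order[n_train + n_val:]

# Online augmentation: transforms are sampled anew each time a training
# slice is drawn, so no augmented copies are stored on disk.
train_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1),
                            scale=(0.9, 1.1), shear=10),
])

sample = torch.from_numpy(images[train_idx[0]])   # (1, 256, 256)
augmented = train_augment(sample)  # image and mask must share one transform in practice
```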

2.3 ViT model implementation

The research procedures were performed on Linux Ubuntu 18.04.4 LTS, using a system with an NVIDIA GeForce RTX 2070 SUPER GPU and 8 GB of VRAM. The code was implemented with PyTorch 2.4.3 in Python 3.11.5.

2.4 Network architecture

We implemented hybrid CNN-ViT networks that combine convolutional and transformer techniques. Figure 1 illustrates the architecture of the hybrid CNN-ViT networks, which consists of two parallel parts: CNN encoding and transformer encoding. These components process the input data simultaneously, and a fusion part integrates and combines their outputs.

Fig. 1
figure 1

Hybrid CNN-ViT network architecture

2.4.1 CNN encoding part

The input image passes through a CNN encoder, which extracts local features and captures spatial information. This results in a set of CNN features. The CNN encoder is composed of multiple convolutional layers, pooling layers, and activation functions. In this study, we used two transfer learning network architectures, VGG16-UNet and ResNet50-UNet, for the CNN encoding part.
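The exact layer configuration of the VGG16-UNet and ResNet50-UNet encoders is not detailed in the text; as a hedged illustration, the sketch below extracts multi-resolution feature maps from an ImageNet-pretrained VGG16 backbone as the CNN encoding branch. The block boundaries and input size are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class VGG16Encoder(nn.Module):
    """CNN encoding branch: reuse an ImageNet-pretrained VGG16 feature
    extractor and return feature maps at successively lower resolutions."""

    def __init__(self):
        super().__init__()
        features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features
        # Split VGG16 into blocks; each later block starts with a max-pooling layer.
        self.block1 = features[:4]     # 64 channels,  full resolution
        self.block2 = features[4:9]    # 128 channels, 1/2 resolution
        self.block3 = features[9:16]   # 256 channels, 1/4 resolution
        self.block4 = features[16:23]  # 512 channels, 1/8 resolution

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(f1)
        f3 = self.block3(f2)
        f4 = self.block4(f3)
        return f1, f2, f3, f4

encoder = VGG16Encoder()
ct_batch = torch.randn(2, 3, 256, 256)  # CT slices replicated to 3 channels
print([f.shape for f in encoder(ct_batch)])
```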

2.4.2 Transformer encoding part

The structure of the transformer part is based on the conventional encoder-decoder architecture. This part begins with global self-attention and gradually restores local details. The input image is first divided into patches of equal size; these patches are then flattened and passed to a linear embedding layer.
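The patch-splitting and linear-embedding step can be sketched as follows; the patch size, embedding dimension, and input resolution are assumptions, since the paper does not report them.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each flattened
    patch into an embedding vector (the input tokens of the transformer)."""

    def __init__(self, img_size=256, patch_size=16, in_channels=1, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to flattening non-overlapping
        # patches and applying a shared linear embedding layer.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches, embed_dim))

    def forward(self, x):
        x = self.proj(x)                  # (B, embed_dim, H/16, W/16)
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)
        return x + self.pos_embed

tokens = PatchEmbedding()(torch.randn(2, 1, 256, 256))
print(tokens.shape)  # torch.Size([2, 256, 768])
```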

2.4.3 Fusion block

Both parts extract features of the same resolution, which are then fed into our proposed attention-based fusion module. As illustrated in Fig. 2, the two tensors from the CNN and ViT branches are weighted using a global attention unit and then concatenated and processed by the middle spatial attention branch. With this technique, the model leverages both global and spatial attention units to fuse the features extracted by the two models.

Fig. 2
figure 2

The proposed fusion module
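The precise internals of the global and spatial attention units are given only in Fig. 2; the sketch below is one plausible reading of the description, not the study's exact design. It assumes a depthwise convolution plus channel-gating layout for the global attention unit (whose kernel size is the quantity varied later in Table 2) and a channel-statistics gate for the spatial attention branch.

```python
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    """Global attention unit (assumed design): a k x k depthwise convolution
    gathers context, global average pooling collapses it to a channel
    descriptor, and a sigmoid gate re-weights the input channels."""
    def __init__(self, channels, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2, groups=channels)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        return x * torch.sigmoid(self.pool(self.conv(x)))

class SpatialAttention(nn.Module):
    """Spatial attention (assumed design): per-pixel gate from channel statistics."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(stats))

class FusionBlock(nn.Module):
    """Fuse same-resolution CNN and ViT feature maps: weight each branch with
    a global attention unit, concatenate, apply spatial attention, and merge."""
    def __init__(self, channels):
        super().__init__()
        self.cnn_att = GlobalAttention(channels)
        self.vit_att = GlobalAttention(channels)
        self.spatial = SpatialAttention()
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_cnn, f_vit):
        fused = torch.cat([self.cnn_att(f_cnn), self.vit_att(f_vit)], dim=1)
        return self.merge(self.spatial(fused))

out = FusionBlock(256)(torch.randn(2, 256, 32, 32), torch.randn(2, 256, 32, 32))
print(out.shape)  # torch.Size([2, 256, 32, 32])
```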

2.5 Model training strategy

We trained our models using transfer learning, fine-tuning encoders pretrained on the ImageNet dataset. The ResNet50-UNet-ViT and VGG16-UNet-ViT were trained for 50 epochs using the Adam optimizer (learning rate = \(10^{-4}\)) with a batch size of 5.
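A minimal training-loop sketch under the stated hyper-parameters (Adam, learning rate \(10^{-4}\), batch size 5, 50 epochs) is shown below. The model, loss, and dataset here are placeholders standing in for the networks, the loss of Eq. (1), and the pre-processed slices described above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholders standing in for the hybrid CNN-ViT network, the weighted
# CE + IoU loss of Eq. (1), and the pre-processed training slices.
model = torch.nn.Conv2d(1, 6, kernel_size=1)          # stand-in network
criterion = torch.nn.CrossEntropyLoss()               # stand-in loss
dataset = TensorDataset(torch.randn(50, 1, 256, 256),
                        torch.randint(0, 6, (50, 256, 256)))

loader = DataLoader(dataset, batch_size=5, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(50):
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        logits = model(images)            # (B, num_classes, H, W)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
```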

2.6 Model loss function

We employed a multi-class weighted cross-entropy (CE) plus intersection-over-union (IOU) loss function to train the networks. The class weights in the loss function serve to penalize false positives or false negatives more heavily:

$$Loss = Loss_{IOU}^{w} + Loss_{CE}^{w}$$
(1)

During the training process, the weight vector \(w\) for each organ is calculated by considering the class frequency associated with that specific organ.
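A hedged implementation of Eq. (1) is sketched below: the per-class weights enter both the cross-entropy term and a soft (differentiable) IoU term. The way the weights are derived from class voxel counts is our reading of the description; the exact weighting scheme used in the study is not specified.

```python
import torch
import torch.nn.functional as F

def weighted_ce_iou_loss(logits, target, class_weights):
    """Loss = Loss_IOU^w + Loss_CE^w (Eq. 1).

    logits: (B, C, H, W) raw network outputs
    target: (B, H, W) integer labels in [0, C)
    class_weights: (C,) tensor, e.g. inverse class frequencies
    """
    # Weighted multi-class cross-entropy.
    ce = F.cross_entropy(logits, target, weight=class_weights)

    # Soft IoU per class, weighted by the same class weights.
    probs = logits.softmax(dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    intersection = (probs * one_hot).sum(dims)
    union = (probs + one_hot - probs * one_hot).sum(dims)
    iou = (intersection + 1e-6) / (union + 1e-6)
    iou_loss = ((1.0 - iou) * class_weights).sum() / class_weights.sum()

    return ce + iou_loss

# Example: weights inversely proportional to class voxel counts (illustrative).
counts = torch.tensor([1e6, 4e4, 6e4, 3e4, 5e4, 5e4])   # background + 5 organs
weights = counts.sum() / (len(counts) * counts)
loss = weighted_ce_iou_loss(torch.randn(2, 6, 64, 64),
                            torch.randint(0, 6, (2, 64, 64)), weights)
```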

2.7 Network evaluation criteria

Automatic contours are compared with manual contours in terms of geometry. To evaluate the geometry, we used spatial overlap-based metrics, volume-based metrics, and spatial distance-based metrics.

The DSC is a spatial overlap index that quantifies how much the reference contour and the result contour overlap; a higher DSC indicates greater overlap between the result and reference contours.

$$Dice\ Coefficient = \frac{2\left( Reference \cap Result \right)}{Reference + Result}$$
(2)

Volume-based metrics include the volume overlap error (VOE) and the relative volume difference (RVD).

$$VOE = \left( 1 - \frac{Reference \cap Result}{Reference + Result} \right) \times 100$$
(3)

RVD calculates the relative difference in volume between the binary objects in the two images. It is calculated according to the following formula.

$$RVD = \frac{Result - Reference}{Reference} \times 100$$
(4)

The Hausdorff distance (HD) is a spatial distance-based metric that indicates the maximum distance from one set (automatic contour) to the closest point in another set (manual contour), measured in millimeters. A smaller HD value corresponds to a better result. Because the HD can be strongly affected by outliers, we used the 95th percentile HD (HD95).

The HD between X and Y is:

$$HD\left( X, Y \right) = \max\left( hd\left( X, Y \right),\, hd\left( Y, X \right) \right)$$
(5)

where \(hd\left( X, Y \right)\) is the one-sided HD from X to Y, which measures the maximum distance from any point in X to its closest point in Y:

$$hd\left( X, Y \right) = \max_{x \in X}\, \min_{y \in Y} \left\| x - y \right\|_{2}$$
(6)

where \(\left\| \cdot \right\|_{2}\) denotes the Euclidean distance. The one-sided HD from Y to X is calculated similarly:

$$hd\left( Y, X \right) = \max_{y \in Y}\, \min_{x \in X} \left\| y - x \right\|_{2}$$
(7)

Another spatial distance-based metric is the average surface distance (ASD), which represents the mean distance between the boundary points of an automatically segmented region and the boundary points of the ground truth.

The ASD is calculated as follows:

$$ASD\left( A, B \right) = \frac{1}{\left| S\left( A \right) \right| + \left| S\left( B \right) \right|}\left( \sum_{s_{A} \in S\left( A \right)} d\left( s_{A}, S\left( B \right) \right) + \sum_{s_{B} \in S\left( B \right)} d\left( s_{B}, S\left( A \right) \right) \right)$$
(8)

where \(d\left(v,S\left(A\right)\right)\) is the shortest distance from an arbitrary voxel \(v\) to the set of surface voxels \(S(A)\), and is defined as follows:

$$d\left( v, S\left( A \right) \right) = \min_{s_{A} \in S\left( A \right)} \left\| v - s_{A} \right\|$$
(9)

where \(\left\| \cdot \right\|\) denotes the Euclidean distance.
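These metrics can be computed directly from binary masks. The sketch below uses NumPy/SciPy to obtain the DSC, HD95, and ASD from directed surface distances; it assumes isotropic 1 mm spacing for simplicity and is an illustration, not the evaluation code used in the study.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(ref, res):
    """Dice similarity coefficient of two boolean masks (Eq. 2)."""
    inter = np.logical_and(ref, res).sum()
    return 2.0 * inter / (ref.sum() + res.sum())

def surface(mask):
    """Boundary voxels of a binary mask."""
    return np.logical_and(mask, ~binary_erosion(mask))

def surface_distances(ref, res, spacing=(1.0, 1.0)):
    """Distances from every result-surface voxel to the reference surface."""
    dist_to_ref_surface = distance_transform_edt(~surface(ref), sampling=spacing)
    return dist_to_ref_surface[surface(res)]

def hd95(ref, res):
    """95th percentile of the symmetric surface distances (Eqs. 5-7)."""
    d1, d2 = surface_distances(ref, res), surface_distances(res, ref)
    return max(np.percentile(d1, 95), np.percentile(d2, 95))

def asd(ref, res):
    """Average symmetric surface distance (Eqs. 8-9)."""
    d1, d2 = surface_distances(ref, res), surface_distances(res, ref)
    return (d1.sum() + d2.sum()) / (len(d1) + len(d2))

# Toy 2D example with two overlapping squares.
ref = np.zeros((64, 64), bool); ref[20:40, 20:40] = True
res = np.zeros((64, 64), bool); res[22:42, 21:41] = True
print(dice(ref, res), hd95(ref, res), asd(ref, res))
```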

3 Results

3.1 Ablation study

We assessed the performance of models using only the CNN components (ablating the ViT). Training a purely ViT-based model (ablating the CNN) was not feasible because our hybrid CNN-ViT architecture relies on convolutional operations for decoding. To evaluate a purely transformer-based approach, we therefore trained an additional model, Swin-UNet [21] (denoted ViT), a fully transformer-based network.

3.2 Quantitative results

We successfully segmented five pelvic organs (prostate, bladder, rectum, and the left and right femoral heads) of prostate cancer patients using our proposed 2D hybrid CNN-ViT segmentation networks. To validate their efficacy, we compared the hybrid CNN-ViT networks with the corresponding pure CNN models, trained and tested on the same patient dataset. The CNN models comprise ResNet50-UNet and VGG16-UNet, which use ResNet50 and VGG16 backbones as CNN encoders. Notably, all five classes were trained simultaneously using a single network configuration, forward propagation, and loss function.

Table 1 summarizes the quantitative analysis, presenting the mean and standard deviation (SD) of various metrics. We evaluated the impact of ablating each component on the model's performance, as measured by these metrics. As shown in the table, both ResNet50-UNet-ViT and VGG16-UNet-ViT achieve more precise segmentation compared to their corresponding pure convolutional and ViT networks. Furthermore, VGG16-UNet-ViT outperforms ResNet50-UNet-ViT in all five classes.

Table 1 Quantitative evaluation of the hybrid CNN-ViT networks compared to the corresponding pure CNN and transformer networks. Negative RVD values indicate a predicted volume smaller than the reference volume, whereas positive RVD values indicate a predicted volume larger than the reference volume

Additionally, we conducted paired t-tests to compare the results of each pure CNN method with our proposed hybrid CNN-ViT segmentation networks. P-values were calculated by comparing each hybrid CNN-ViT model with its corresponding pure CNN model, and statistically significant improvements (P-value < 0.05) are marked with an asterisk (*). The analysis demonstrates that ResNet50-UNet-ViT achieves statistically significant improvements over ResNet50-UNet in contouring the prostate, bladder, rectum, and femoral heads.
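For reference, a paired t-test over per-case DSC values can be run with SciPy as sketched below; the arrays shown are illustrative placeholders, not the study's data.

```python
import numpy as np
from scipy import stats

# Illustrative per-test-case DSC values for one organ (not the study's data).
dsc_cnn = np.array([0.85, 0.87, 0.83, 0.88, 0.86, 0.84, 0.89, 0.85])
dsc_hybrid = np.array([0.88, 0.89, 0.86, 0.90, 0.88, 0.87, 0.91, 0.88])

t_stat, p_value = stats.ttest_rel(dsc_hybrid, dsc_cnn)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.4f}")
significant = p_value < 0.05   # asterisk threshold used in Table 1
```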

Table 2 summarizes the impact of the kernel size used in the global attention unit of the fusion module on the model's performance, as measured by DSC. As shown in the table, both ResNet50-UNet-ViT and VGG16-UNet-ViT generally achieve more precise segmentation with a 7 × 7 convolutional kernel than with other kernel sizes.

Table 2 Impact of kernel size of the global attention unit on model performance

3.3 Qualitative results

The predicted contours of the five classes for the five networks are presented in Fig. 3. The contours produced by the hybrid CNN-ViT segmentation networks exhibit a high degree of similarity to the ground truth contours.

Fig. 3
figure 3

The overlay segmentations of the prostate (green), bladder (yellow), rectum (purple), RFH (red), and LFH (blue) achieved by the ViT, ResNet50-UNet, ResNet50-UNet-ViT, VGG16-UNet, and VGG16-UNet-ViT segmentation networks are shown in the axial view. The DSC values of the prostate, bladder, rectum, RFH, and LFH are given in the quintuple array at the bottom of each image

Figure 4 displays the reference organ boundaries and segmentation results for a randomly selected slice from the testing dataset. Our proposed approach accurately contours the organ boundaries, as indicated by the substantial overlap between the automated and reference segmentations.

Fig. 4
figure 4

The overlay segmentation of the reference (green) and automated contours (red)

3.4 Comparison with the state-of-the-art techniques

Table 3 provides a comparison between our proposed method's performance and other state-of-the-art methods in the literature.

Table 3 Comparison of the present study with the state-of-the-art studies (‘-’ denotes that the metric is not reported)

4 Discussion

In this study, our objective was to investigate automated male pelvic multi-organ contouring from multi-center and diverse planning CT images using hybrid CNN-ViT networks that combine convolution and transformer techniques. We introduced a novel attention-based fusion module that merges the detailed features extracted through convolution with the global features obtained through the transformer.

Experiments conducted on multicenter planning CT images indicate that combining the ViT structure with the CNN network resulted in superior performance for all organs compared to pure CNN and transformer architectures, except for the LFH in the ResNet50-UNet network. As evidenced by the p-values reported in Table 1, VGG16-UNet-ViT demonstrated statistically superior accuracy compared to VGG16-UNet and ViT for all structures in terms of DSC.

According to Table 3, our DSC for the prostate was superior compared to other similar studies. This superiority can be primarily attributed to the utilization of a combination of convolution and transformer techniques.

In our proposed method, the mean DSC for the bladder is 95.54%, which ranks second after the study of Zhang et al. [22]. Although they achieved a higher DSC for the bladder (97%), their reliance on a single observer and a single CT device as a reference introduces potential bias, particularly when compared with a multicenter study.

In our study, the DSC for the rectum is 86.8%, which is comparatively lower than the results reported in certain similar studies. Among similar studies, Kazemifar et al.’s method [10] achieved the best segmentation result for the rectum. Our study is not directly comparable to their study for the rectum because they used patients with endo-rectal balloon insertion. Endo-rectal balloons are commonly used in the radiotherapy of prostate cancer patients to spare the rectum [23].

Our findings, based on private and diverse datasets, are consistent with the results of studies by Kazemifar et al. [8], He et al. [11], Zhang et al. [22], Kearney et al. [24], and Wang et al. [12]. All of these methods obtained satisfactory results for the RFH, LFH, and bladder. Contouring the RFH and LFH is easy for networks because of their high contrast [25]. Similarly, delineating bladder boundaries is relatively easy due to its distinct wall structure and large size [12]. However, accurately delineating the boundaries of the prostate and rectum is more challenging because of their smaller size and lower contrast [7], especially in regions where these two organs are in close proximity.

Sensitivity is a commonly used metric in image analysis [12]. The hybrid CNN-ViT networks exhibit superior sensitivity compared to the corresponding pure CNN networks.

We evaluated the models using different metrics (spatial overlap-based metrics, volume-based metrics, and spatial distance-based metrics), as shown in Table 1, to ensure result consistency. In general, the hybrid CNN-ViT networks exhibit lower HD95 and ASD values compared to the corresponding pure CNN networks. As expected, the rectum demonstrates the highest RVD (-1.84%) among all structures in the VGG16-UNet-ViT network. This observation is consistent with its lower DSC value (86.8%). This means that the predicted volume is 1.84% smaller than the reference volume. This small volume difference is likely to have a negligible effect on the dose-volume metrics used in routine radiotherapy treatment planning optimization.

VGG16-UNet-ViT for all organs, and ResNet50-UNet-ViT for all organs except the LFH and rectum, achieve more precise segmentation with a 7 × 7 convolutional kernel than with other kernel sizes. This improved performance with a 7 × 7 kernel is likely attributable to its larger receptive field and better ability to capture contextual information.

5 Conclusion and future work

This paper introduces a segmentation network that uses a novel attention-based fusion method to combine the ViT and CNN architectures for male pelvic multi-organ contouring on planning CT images. Our findings demonstrate that integrating convolutional and transformer techniques resulted in superior segmentation performance compared to solely relying on either convolutional or transformer networks.

Additionally, the proposed method achieves more precise contours compared to state-of-the-art techniques. The results show promise as a reliable and efficient tool to aid in prostate radiotherapy treatment planning. Automatic contouring is a valuable tool in radiotherapy treatment planning; however, it cannot be solely relied upon as the definitive treatment contours. It is imperative that a qualified physician evaluates the contours and makes any required modifications to ensure accuracy and precision. Incorporating automated contouring methods in clinics provides several benefits, such as minimizing variability between different observers and accelerating the segmentation process.

Our work has certain limitations. We used a limited test set consisting of only 20 cases, which may not fully represent the diverse range of male pelvic CT images. To address this, we plan to validate our proposed method on a larger dataset to demonstrate its applicability and generalizability. Additionally, in the future, we aim to investigate the dosimetry impact of deep learning-based auto-contoured structures compared to manual contours for radiotherapy treatment planning.