Abstract
The detection of prostate cancer is an important challenge for medical personnel. To improve the medical system’s ability to process increasing numbers of oncological patients, demand for automation systems is growing. At the National Information Processing Institute, such systems are undergoing active development. In this work, the authors present the results of a pilot study whose goal is to analyze possible directions in the development of new, advanced deep learning systems using a high quality dataset that is currently in development.
1 Introduction
Prostate cancer is one of the most common neoplasms in men [6]. This indicates the importance of developing systems for its efficient detection, treatment, and monitoring. The gold standard of cancer diagnosis is the study of histopathology; however, due to high variability in the structure of the prostate gland, particularly among older patients, the selection of optimal sites for biopsy remains challenging. This explains the necessity of medical imaging. The most established imaging modality for prostate cancer detection is multimodal magnetic resonance imaging (MRI). However, the interpretation of the multimodal 3D images requires time and expertise from radiologists. The increasing average age of patients and the rising prevalence of cancers place intense pressure on medical organizations to supply enough skilled personnel to meet growing demand. One possible solution for alleviating this problem lies in the design of automated systems for cancer detection. This, in turn, has led to growing demand for high quality datasets and deep learning algorithms. Both solutions are undergoing active development at the National Information Processing Institute.
The selection of architecture is one of the most crucial decisions that influences a model’s performance. Until recently, most research in computer vision was based on convolutional neural networks, while natural language processing witnessed an explosion of transformer-based architectures. However, according to new research in computer vision, transformer-based architectures promise performance that is consistently better than that of convolutional neural networks [8]. One of the main characteristics of convolutional neural networks is that they force models to encode information on the local co-occurrence of image features, which has proven to be a significant inductive bias. Pure transformers do not share this characteristic; they learn the spatial correlations between image features via attention mechanisms. This adds a number of degrees of freedom to the models that enables them to learn nonlocal, long-range dependencies in images, at the cost of requiring larger datasets to achieve the same performance. Moreover, the newest research [19] tackles the high memory requirements of unmodified transformer architectures and the technical problems of training larger models on graphics processing units. One solution involves fusing convolutional and transformer-based architectures to take advantage of both: a hybrid transformer. This can be achieved by inserting a transformer into different layers of a U-shaped architecture, composing architectures, or using attention mechanisms on features calculated by convolutional neural networks [8]. The authors of this article concentrated on the first type of hybrid architecture, as it has already proven efficient in multimodal MRI settings [7] and specifically in prostate cancer detection [17].
This points to the necessity of further research and experimentation, the preliminary results of which are presented below.
2 Material and Methods
The data used to train and validate the model was accessed from Artificial Intelligence and Radiologists at Prostate Cancer Detection in MRI: The PI-CAI Challenge [1]. The data encompasses 1,500 partially labelled cases of prostate biparametric MRI (bpMRI). The labels, when present, indicate the locations of prostate cancer. The algorithm described below utilized T2-weighted imaging (T2W), axially computed high b-value (\(\ge \) 1400 s/mm\(^2\)) diffusion-weighted imaging (DWI), and axial apparent diffusion coefficient maps (ADCs). The labels were annotated manually by human experts; changes of International Society of Urological Pathology (ISUP) grade of at least two were considered significant. The main library used in the work was MONAI [4], a PyTorch-based [14] framework for deep learning in the medical imaging domain. To improve the code structure and training time, the code was refactored for use with PyTorch Lightning [5]. Image preprocessing followed the algorithm proposed in the PI-CAI Challenge [1], based on the nnU-Net [9] architecture. All preprocessing steps were implemented as MONAI transforms. Image augmentations were performed using the batchgenerators library [10]. To improve the reproducibility of the algorithm, training and inference were conducted in Docker containers [13]. All experiments were performed on a Google Cloud cluster using a server with an NVIDIA A100 GPU with 40 GB of RAM.
2.1 Preprocessing
The MRI data was normalized in each channel using z-score normalization. The image shape was set to (256, 256, 32) so that it was a multiple of sixteen in each axis, as the chosen architecture required. The spacing of the dataset was highly inhomogeneous; for this reason, all images were resampled to a (0.5, 0.5, 3.0) mm voxel size. Image augmentations were performed using the batchgenerators library [10] and encompassed Gaussian noise, elastic deformations, Gaussian blur, brightness modifications, contrast augmentations, simulations of low resolution, and mirroring. All labels were converted to binary masks and included in augmentations that led to spatial deformations of the original images.
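The per-channel z-score normalization described above can be sketched as follows; this is a minimal numpy example with a hypothetical, reduced-size multichannel volume, not the MONAI transform used in the study:

```python
import numpy as np

def zscore_per_channel(volume, eps=1e-8):
    """Apply z-score normalization independently to each channel of a (C, H, W, D) volume."""
    out = np.empty_like(volume, dtype=np.float64)
    for c in range(volume.shape[0]):
        channel = volume[c]
        out[c] = (channel - channel.mean()) / (channel.std() + eps)
    return out

# Hypothetical 3-channel bpMRI volume (T2W, DWI, ADC) at a reduced size
rng = np.random.default_rng(0)
vol = rng.normal(loc=100.0, scale=20.0, size=(3, 64, 64, 8))
normed = zscore_per_channel(vol)
```

After normalization, each channel has approximately zero mean and unit variance, which puts the differently scaled modalities on a common footing before they enter the network.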
2.2 Deep Learning Architecture
We selected Swin UNETR [7] as the architecture because it demonstrates characteristics that are crucial for the further development and fine-tuning of the algorithm on the new dataset in development. The neural network architecture is based on transformers, which has multiple advantages over traditional, convolution-based architectures. Primarily, it increases the receptive field, which enables the learning of long-range image dependencies. It also partially avoids the translation invariance of convolutions, which, in the context of medical imaging, can lead to the loss of relevant location-based information. Transformer-based architectures generally have higher expressive power due to their less pronounced inductive bias. However, such architectures also cause difficulties due to their high memory footprint and relatively poor performance on small datasets (because of the reduced inductive bias). The architecture is summarised in Fig. 1.
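The windowed attention that keeps Swin-based models memory-efficient relies on partitioning the feature map into non-overlapping windows, inside which self-attention is computed. The following numpy sketch shows a 2D analogue of that partitioning step (a conceptual illustration, not the MONAI implementation); it also makes clear why input dimensions must be multiples of the window size:

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows."""
    H, W, C = x.shape
    # Requires H and W to be multiples of ws, hence the padded input shapes.
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    # -> (num_windows, ws, ws, C); attention is then computed per window.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws, ws, C)

feat = np.arange(8 * 8 * 2, dtype=np.float32).reshape(8, 8, 2)
windows = window_partition(feat, 4)
```

In the Swin transformer, alternating layers shift the window grid so that information also flows between neighboring windows, which is how long-range dependencies accumulate across depth.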
Fig. 1. A simplified schematic diagram of Swin UNETR, based on Fig. 1 from Hatamizadeh et al. [7]. The input comprises four channels (whole-gland segmentation, ADC, HBV, and T2W values); the output is the segmentation.
For the current work and the dataset in development, the Swin UNETR architecture has additional crucial characteristics that are well suited to modelling multimodal images. As a transformer architecture, it is possible to extend Swin UNETR to incorporate clinical data in tabular form.
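One conceivable way to realize such an extension is to project the tabular clinical features into the model dimension and append the result as an extra token to the transformer's token sequence. The toy numpy sketch below illustrates only this idea; all names, dimensions, and features are illustrative assumptions, not the study's implementation:

```python
import numpy as np

def append_clinical_token(tokens, clinical, w):
    """Project tabular clinical features to d_model and append them as one extra token."""
    extra = clinical @ w                 # (d_model,) embedding of the tabular data
    return np.vstack([tokens, extra[None, :]])

rng = np.random.default_rng(1)
tokens = rng.normal(size=(128, 48))      # hypothetical bottleneck token sequence
clinical = np.array([67.0, 8.4])         # hypothetical features, e.g. age and PSA level
w = rng.normal(size=(2, 48))             # hypothetical learned projection matrix
fused = append_clinical_token(tokens, clinical, w)
```

Because attention operates on sets of tokens, the image tokens can attend to the clinical token in subsequent layers without any change to the attention mechanism itself.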
2.3 Optimization
The model’s optimization used the PyTorch AdamW [12] optimizer. Cosine annealing with warm restarts [11] was used as the learning rate scheduler, and the initial learning rate was established by the learning rate finder [18] implemented in PyTorch Lightning.
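The schedule from [11] can be reproduced in a few lines. The sketch below computes the per-step learning rates for illustrative values of the maximum rate, floor, and cycle length (the actual values in the study came from the learning rate finder and hyperparameter search):

```python
import math

def cosine_warm_restarts(eta_max, eta_min, t0, t_mult, steps):
    """Per-step learning rates for cosine annealing with warm restarts (SGDR [11])."""
    lrs, t_cur, t_i = [], 0, t0
    for _ in range(steps):
        lrs.append(eta_min + 0.5 * (eta_max - eta_min)
                   * (1 + math.cos(math.pi * t_cur / t_i)))
        t_cur += 1
        if t_cur >= t_i:            # restart: reset the phase and lengthen the cycle
            t_cur, t_i = 0, t_i * t_mult
    return lrs

# Illustrative values: decay from 1e-3 to 1e-6 over a 10-step cycle, doubling each restart
lrs = cosine_warm_restarts(1e-3, 1e-6, t0=10, t_mult=2, steps=30)
```

Each restart resets the rate to its maximum, which periodically re-injects exploration into the optimization while the lengthening cycles let the model settle in later training.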
2.4 Hyperparameter Selection
Hyperparameter tuning was performed using a genetic algorithm implemented in the Optuna [2] library. It covered the selection of the optimizer, the architecture, and optimizer-related settings, such as the learning rate scheduler.
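As a conceptual illustration of genetic hyperparameter search (the study used Optuna's implementation; this toy, pure-Python sketch with a made-up fitness function only shows the selection, crossover, and mutation loop):

```python
import random

def evolve(fitness, space, pop_size=8, generations=10, seed=0):
    """Minimal genetic search over a discrete hyperparameter space (lower fitness wins)."""
    rng = random.Random(seed)
    def sample():
        return {k: rng.choice(v) for k, v in space.items()}
    pop = [sample() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]                 # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)              # crossover of two parents
            child = {k: rng.choice([a[k], b[k]]) for k in space}
            if rng.random() < 0.2:                     # occasional mutation
                key = rng.choice(list(space))
                child[key] = rng.choice(space[key])
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

# Hypothetical search space and a made-up "validation loss" standing in for a training run
space = {"lr": [1e-4, 3e-4, 1e-3], "feature_size": [24, 48], "depth": [2, 4]}
best = evolve(lambda h: abs(h["lr"] - 3e-4) * 1e4 + (48 - h["feature_size"]) / 48, space)
```

In practice, each fitness evaluation corresponds to a full (or truncated) training run, which is why the population and generation counts must be kept small for 3D segmentation models.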
2.5 Postprocessing
The training was conducted as a five-fold cross-validation using splits provided by the contest organizers, and the outputs of the folds were combined by a mean ensemble algorithm. The model’s output was passed through a sigmoid activation function before lesion candidates were extracted using the report guided annotation library [3]. The proposed lesions were analyzed further by assessment of simple radiomic characteristics that are important for the task at hand; this can help increase the model’s precision by filtering out some false positive results. Proposed lesions were assessed for their:

- size, where lesions that were too large or too small were filtered out;
- elongation and roundness, where highly elongated changes were filtered out, as they typically represented the obturator internus muscle or some of the large vessels in the pelvis;
- hypointensity on the ADC map and hyperintensity on the high b-value DW image, defined as the difference of the mean value of complementary modalities with respect to a lesion’s neighborhood. As the presence of hyperintense lesions on a high b-value DW image with a related hypointense signal on the ADC map is typical of prostate cancer, lesions that failed to meet this criterion were filtered out.
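The shape-based part of this filtering can be sketched minimally. The numpy example below uses one simple elongation proxy (the ratio of principal-axis lengths derived from the covariance of lesion voxel coordinates) and illustrative thresholds; these are assumptions for demonstration, not the exact criteria or values used in the study:

```python
import numpy as np

def elongation(mask):
    """Ratio of the two largest principal axes of a binary lesion mask (>= 1)."""
    coords = np.argwhere(mask)
    cov = np.cov(coords.T)
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]       # descending eigenvalues
    return float(np.sqrt(eig[0] / max(eig[1], 1e-12)))

def keep_lesion(mask, min_vox=10, max_vox=10_000, max_elong=3.0):
    """Keep a candidate only if its size and elongation are plausible for a lesion."""
    n = int(mask.sum())
    return min_vox <= n <= max_vox and elongation(mask) <= max_elong

# A round blob (lesion-like) versus a thin streak (vessel- or muscle-like)
yy, xx = np.mgrid[:64, :64]
blob = (yy - 32) ** 2 + (xx - 32) ** 2 <= 8 ** 2
streak = np.zeros((64, 64), dtype=bool)
streak[30:32, 4:60] = True
```

The streak is rejected purely on elongation, mirroring how highly elongated candidates in the study tended to be the obturator internus muscle or pelvic vessels rather than lesions.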
Figure 2 presents an example of the algorithm output, before and after the changes are filtered out by their radiomic features.
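The sigmoid-plus-mean-ensemble step described above can be sketched as follows; the per-voxel logits here are hypothetical, and the code is pure Python for clarity:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean_ensemble(fold_logits):
    """Average sigmoid probabilities across cross-validation folds, per voxel."""
    probs = [[sigmoid(v) for v in fold] for fold in fold_logits]
    return [sum(col) / len(col) for col in zip(*probs)]

# Hypothetical logits for three voxels from each of the five folds
folds = [[2.0, -1.0, 0.0] for _ in range(5)]
avg = mean_ensemble(folds)
```

Averaging probabilities rather than logits keeps each fold's contribution bounded in [0, 1], so a single overconfident fold cannot dominate the ensemble.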
3 Results and Discussion
Validation of the algorithm was performed using the PI-CAI evaluation library [15] on the validation dataset provided by the contest organizers. Preliminary results for the model give a Ranking Score of 0.531, an Area Under the Receiver Operating Characteristic curve of 0.686, and an Average Precision of 0.376. An analysis of simple radiomic characteristics was performed and is summarized in Table 1. For each measured quantity (elongation, physical size, and roundness), the incorrectly segmented instances presented approximately two times higher standard deviations, which indicates far higher variability. This also suggests a far wider distribution of the aforementioned quantities and the possibility of identifying suitable thresholds that mark some of the segmented instances as false positives with high probability. As an example, in Fig. 3, one can observe that in the dataset, all segmented instances with roundness lower than 0.4 were false positives. A similar analysis can be performed for all other quantities. However, final conclusions regarding increases in model specificity using radiomic-based postprocessing require further study.
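For reference, the reported AUROC metric can be computed from per-lesion scores via the rank-sum (Mann-Whitney U) formulation; the example below uses made-up scores and labels, not the study's outputs:

```python
def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: one of the two positives is scored below a negative
scores = [0.9, 0.4, 0.35, 0.3]
labels = [1, 0, 1, 0]
```

This rank-based view makes clear why AUROC is insensitive to monotone rescaling of the model's outputs, such as the sigmoid applied in postprocessing.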
The results suggest that the model performs comparably to the state-of-the-art non-transformer-based baseline architectures provided by the contest organizers. However, a significant number of the top-ranking results that are presented on the contest leaderboard are based on transformer architectures. This demonstrates their impressive ability to learn the presented task and the presence of further opportunities for optimization.
4 Conclusions
This study indicates the usefulness of new transformer-based architectures in multimodal three-dimensional medical imaging. An additional feature considered necessary for analyzing the dataset is the proven ability of transformer-based architectures to incorporate data from different sources [16]. This provides a strong base for incorporating clinical data directly into the neural network architecture. The radiomic analysis performed in the postprocessing step proved helpful in the study by increasing the model’s specificity; work on more advanced radiomic analysis is fully justified. The use of modern PyTorch-based libraries enabled efficient training, which supplies further proof of their utility. Such tools can serve as the basis for additional work on the algorithm’s development.
References
Artificial intelligence and radiologists at prostate cancer detection in MRI: The PI-CAI challenge. https://pi-cai.grand-challenge.org/PI-CAI/
Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
Bosma, J., Saha, A., Hosseinzadeh, M., Slootweg, I., de Rooij, M., Huisman, H.: Report-guided automatic lesion annotation for deep learning-based prostate cancer detection in bpMRI (2021)
Diaz-Pinto, A., et al.: MONAI Label: a framework for AI-assisted interactive labeling of 3D medical images (2022)
Falcon, W., et al.: PyTorchLightning/pytorch-lightning: 0.7.6 release (2020). https://doi.org/10.5281/zenodo.3828935
Grönberg, H.: Prostate cancer epidemiology. Lancet 361(9360), 859–864 (2003). https://doi.org/10.1016/S0140-6736(03)12713-4, https://www.sciencedirect.com/science/article/pii/S0140673603127134
Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H., Xu, D.: Swin UNETR: swin transformers for semantic segmentation of brain tumors in MRI images. In: Crimi, A., Bakas, S. (eds.) BrainLes 2021. LNCS, vol. 12962, pp. 272–284. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-08999-2_22
He, K., et al.: Transformers in medical image analysis: a review (2022). https://doi.org/10.48550/ARXIV.2202.12165, https://arxiv.org/abs/2202.12165
Isensee, F., Jaeger, P., Kohl, S., Petersen, J., Maier-Hein, K.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 1–9 (2021). https://doi.org/10.1038/s41592-020-01008-z
Isensee, F., et al.: batchgenerators - a python framework for data augmentation (2020). https://doi.org/10.5281/zenodo.3632567
Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts (2016). https://doi.org/10.48550/ARXIV.1608.03983, https://arxiv.org/abs/1608.03983
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization (2017). https://doi.org/10.48550/ARXIV.1711.05101, https://arxiv.org/abs/1711.05101
Merkel, D.: Docker: lightweight Linux containers for consistent development and deployment. Linux J. 2014(239), 2 (2014)
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, vol. 32, pp. 8024–8035. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
Saha, A., et al.: Artificial intelligence and radiologists at prostate cancer detection in MRI: the PI-CAI challenge (study protocol) (2022). https://doi.org/10.5281/zenodo.6667655
Serdyuk, D., Braga, O., Siohan, O.: Transformer-based video front-ends for audio-visual speech recognition for single and multi-person video (2022). https://doi.org/10.48550/ARXIV.2201.10439, https://arxiv.org/abs/2201.10439
Singla, D., Cimen, F., Aluganti, C.: Novel artificial intelligent transformer u-net for better identification and management of prostate cancer. Mol. Cell. Biochem. 478, 1439–1445 (2022). https://doi.org/10.1007/s11010-022-04600-3
Smith, L.N.: Cyclical learning rates for training neural networks (2015). https://doi.org/10.48550/ARXIV.1506.01186, https://arxiv.org/abs/1506.01186
Wu, C.Y., et al.: MeMViT: memory-augmented multiscale vision transformer for efficient long-term video recognition. In: CVPR, pp. 13577–13587 (2022). https://doi.org/10.1109/CVPR52688.2022.01322
Acknowledgements
This work has been funded by the Polish National Centre for Research and Development as part of the program, INFOSTRATEG I, project INFOSTRATEG-I/0036/2021 “AI-augmented radiology - detection, reporting and clinical decision making in prostate cancer diagnosis”.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Copyright information
© 2023 The Author(s)
Cite this paper
Mitura, J. et al. (2023). Prostate Cancer Detection Using a Transformer-Based Architecture and Radiomic-Based Postprocessing. In: Biele, C., Kacprzyk, J., Kopeć, W., Owsiński, J.W., Romanowski, A., Sikorski, M. (eds) Digital Interaction and Machine Intelligence. MIDI 2022. Lecture Notes in Networks and Systems, vol 710. Springer, Cham. https://doi.org/10.1007/978-3-031-37649-8_11
DOI: https://doi.org/10.1007/978-3-031-37649-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37648-1
Online ISBN: 978-3-031-37649-8
eBook Packages: Intelligent Technologies and Robotics (R0)