Prostate Cancer Detection Using a Transformer-Based Architecture and Radiomic-Based Postprocessing

Mitura, Jakub; Jóźwiak, Rafał; Mykhalevych, Ihor; Gorbenko, Iryna; Sobecki, Piotr; Lorenc, Tomasz; Tupikowski, Krzysztof

doi:10.1007/978-3-031-37649-8_11

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 710)

Included in the following conference series:

Machine Intelligence and Digital Interaction Conference

Abstract

The detection of prostate cancer is an important challenge for medical personnel. To improve the medical system’s ability to process increasing numbers of oncological patients, demand for automation systems is growing. At the National Information Processing Institute, such systems are undergoing active development. In this work, the authors present the results of a pilot study whose goal is to analyze possible directions in the development of new, advanced deep learning systems using a high quality dataset that is currently in development.

1 Introduction

Prostate cancer is one of the most common neoplasms in men [6]. This indicates the importance of developing systems for its efficient detection, treatment, and monitoring. The gold standard of cancer diagnosis is the study of histopathology; however, due to high variability in the structure of the prostate gland, particularly among older patients, the selection of optimal sites for biopsy remains challenging. This explains the necessity of medical imaging. The most established imaging modality for prostate cancer detection is multimodal magnetic resonance imaging (MRI). However, the interpretation of the multimodal 3D images requires time and expertise from radiologists. The increasing average age of patients and the rising prevalence of cancers place intense pressure on medical organizations to supply enough skilled personnel to meet growing demand. One possible solution for alleviating this problem lies in the design of automated systems for cancer detection. This, in turn, has led to growing demand for high quality datasets and deep learning algorithms. Both solutions are undergoing active development at the National Information Processing Institute.

The selection of architecture is one of the most crucial decisions influencing a model’s performance. Until recently, most research in computer vision was based on convolutional neural networks, while natural language processing witnessed an explosion of transformer-based architectures. However, according to recent research in computer vision, transformer-based architectures promise performance that is consistently better than that of convolutional neural networks [8]. One of the main characteristics of convolutional neural networks is that they force models to encode the local co-occurrence of image features, which has proven to be a significant inductive bias. Pure transformers do not share this characteristic; they learn the spatial correlations between image features via attention mechanisms. This adds degrees of freedom that enable the models to learn nonlocal, long-range dependencies in images, at the cost of requiring larger datasets to achieve the same performance. Moreover, recent research [19] tackles the high memory requirements of unmodified transformer architectures and the technical problems of training larger models on graphics processing units. One solution involves fusing convolutional and transformer-based architectures into a hybrid transformer that takes advantage of both. This can be achieved by inserting a transformer into different layers of a U-shaped architecture, by composing architectures, or by applying attention mechanisms to features calculated by convolutional neural networks [8]. The authors of this article concentrated on the first type of hybrid architecture, as such architectures have already proven to be efficient in multimodal MRI settings [7] and specifically in prostate cancer detection [17]. At the time of writing, no consensus exists on the best available transformer-based architecture for prostate cancer detection and segmentation. This points to the necessity of further research and experimentation, the preliminary results of which are presented below.

2 Material and Methods

The data used to train and validate the model was obtained from Artificial Intelligence and Radiologists at Prostate Cancer Detection in MRI: The PI-CAI Challenge [1]. The data encompasses 1,500 partially labelled cases of prostate biparametric MRI (bpMRI). The labels, when present, indicate the locations of prostate cancer. The algorithm described below utilized T2-weighted imaging (T2W), axial computed high b-value (\(\ge\) 1400 s/mm\(^2\)) diffusion-weighted imaging (DWI), and axial apparent diffusion coefficient (ADC) maps. The labels were annotated manually by human experts, and lesions with an International Society of Urological Pathology (ISUP) grade of at least two were considered significant. The main library used in the work was MONAI [4], which is a PyTorch-based [14] framework for deep learning in the medical imaging domain. To improve the code structure and training time, the code was refactored for use with PyTorch Lightning [5]. Image preprocessing followed the algorithm proposed in the PI-CAI Challenge [1], which is based on the nnU-Net [9] architecture. All preprocessing steps were implemented as MONAI transforms. Image augmentations were performed using the batchgenerators library [10]. To improve the reproducibility of the algorithm, training and inference were conducted using Docker containers [13]. All experiments were performed in a Google Cloud cluster using a server with an NVIDIA A100 GPU with 40 GB of memory.

2.1 Preprocessing

The MRI data was normalized in each channel using z-score normalization. The image shape was set to (256, 256, 32) so that it is a multiple of sixteen along each axis, as the chosen architecture requires. The voxel spacing of the dataset was highly inhomogeneous; for this reason, all images were resampled to a voxel size of (0.5, 0.5, 3.0) mm. Image augmentations were performed using the batchgenerators library [10] and encompassed Gaussian noise, elastic deformations, Gaussian blur, brightness modifications, contrast augmentations, simulations of low resolution, and mirroring. All of the labels were converted to binary masks and were included in those augmentations that spatially deform the original images.
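The normalization and resampling arithmetic described above can be illustrated with a minimal sketch in plain Python. It is not the authors' MONAI pipeline: the function names, the flat-list representation of a channel, and the example acquisition geometry are assumptions made for illustration only.

```python
import math
from statistics import fmean, pstdev


def zscore(values):
    """Z-score normalize the voxel intensities of one channel (flat list)."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant channel carries no contrast; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def resampled_shape(shape, spacing, target_spacing=(0.5, 0.5, 3.0)):
    """Voxel counts after resampling, keeping the physical extent unchanged."""
    return tuple(
        max(1, round(n * s / t)) for n, s, t in zip(shape, spacing, target_spacing)
    )


def pad_to_multiple(shape, multiple=16):
    """Smallest shape >= `shape` whose every axis is a multiple of `multiple`."""
    return tuple(math.ceil(n / multiple) * multiple for n in shape)


# Hypothetical acquisition: 384 x 384 x 19 voxels at 0.3 x 0.3 x 3.6 mm.
new_shape = resampled_shape((384, 384, 19), (0.3, 0.3, 3.6))
padded_shape = pad_to_multiple(new_shape)
```

In the study itself a single fixed shape of (256, 256, 32) was used for every case; the padding helper only shows why a shape divisible by sixteen is always reachable.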

2.2 Deep Learning Architecture

We selected Swin UNETR [7] as the architecture because it demonstrates characteristics that are crucial for the further development and finetuning of the algorithm on the new dataset in development. The neural network architecture is based on transformers. This has multiple advantages over traditional, convolution-based architectures. Primarily, it increases the receptive field, which enables the learning of long-range image dependencies. It partially avoids translation invariance of convolutions, which, in the context of medical imaging, can lead to the loss of relevant location-based information. Transformer-based architectures also have generally higher expressive power due to their less pronounced inductive bias. However, such architectures also cause difficulties due to their high memory footprint and relatively poor performance on small datasets (because of reduced inductive bias). The architecture is summarised in Fig. 1.

For the current work and the dataset in development, the Swin UNETR architecture has additional crucial characteristics that are well suited to modelling multimodal images. Because it is a transformer architecture, Swin UNETR can be extended to incorporate clinical data in tabular form.

2.3 Optimization

The model’s optimization was implemented using the PyTorch AdamW [12] optimizer. Cosine annealing with warm restarts [11] was used for the Learning Rate Scheduler, and the initial learning rate was established by the Learning Rate Finder [18] implemented in PyTorch Lightning.
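As an illustration of the schedule named above, the following sketch computes the learning rate under cosine annealing with warm restarts [11] for an integer epoch. The parameter names mirror those of PyTorch's CosineAnnealingWarmRestarts scheduler (T_0, T_mult, eta_min); the numeric values are hypothetical, because the chapter does not report the cycle length or the learning rate bounds.

```python
import math


def sgdr_learning_rate(epoch, base_lr, t_0, t_mult=1, eta_min=0.0):
    """Learning rate at `epoch` for cosine annealing with warm restarts.

    t_0 is the length of the first cycle in epochs and t_mult multiplies
    the cycle length after every restart.
    """
    t_i = t_0      # length of the current cycle
    t_cur = epoch  # epochs elapsed since the last restart
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    cosine = math.cos(math.pi * t_cur / t_i)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


# Hypothetical settings: initial rate 1e-3, first cycle of 10 epochs.
schedule = [sgdr_learning_rate(e, base_lr=1e-3, t_0=10) for e in range(25)]
```

Within each cycle the rate decays from base_lr towards eta_min along a half cosine and then jumps back to base_lr at the restart.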

2.4 Hyperparameter Selection

Hyperparameter tuning was achieved using a genetic algorithm implemented in the Optuna [2] library. The search covered the choice of optimizer, the architecture, and optimizer-related decisions such as the Learning Rate Scheduler.
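The idea of a genetic hyperparameter search can be sketched in a few lines of plain Python. This is a hand-written toy and not Optuna's API or the authors' configuration: the function evolve, the search bounds, and the quadratic toy_loss (a stand-in for a validation loss) are all hypothetical.

```python
import random


def evolve(fitness, bounds, population_size=12, generations=15, seed=0):
    """Minimise `fitness` over real-valued hyperparameters within `bounds`.

    Returns the best candidate and the best fitness seen at the start of
    each generation.
    """
    rng = random.Random(seed)

    def random_candidate():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def crossover(a, b):
        # Uniform crossover: each gene comes from either parent.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def mutate(candidate):
        child = []
        for value, (lo, hi) in zip(candidate, bounds):
            value += rng.gauss(0.0, 0.1 * (hi - lo))
            child.append(min(hi, max(lo, value)))  # clip back into bounds
        return child

    population = [random_candidate() for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        parents = population[: population_size // 2]  # keep the better half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    population.sort(key=fitness)
    return population[0], history


def toy_loss(params):
    """Hypothetical loss surface over (log10 learning rate, weight decay)."""
    log_lr, weight_decay = params
    return (log_lr + 3.0) ** 2 + (weight_decay - 0.01) ** 2


best, history = evolve(toy_loss, bounds=[(-5.0, -1.0), (0.0, 0.1)])
```

Because the better half of each generation survives unchanged, the best loss never gets worse from one generation to the next; in a real search each fitness evaluation would be a full training run.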

2.5 Postprocessing

The training was conducted as a five-fold cross-validation using splits provided by the contest organizers, and the outputs of each fold were combined by a mean ensemble algorithm. The model’s output was passed through a sigmoid activation function before lesion candidates were extracted using the report-guided annotation library [3]. The proposed lesions were then assessed using simple radiomic characteristics relevant to the task; this can help increase the model’s precision by filtering out some false positive results. Proposed lesions were assessed for their:

size, where lesions that were too large or too small were filtered out;
elongation and roundness, where highly elongated candidates were filtered out, as they typically represented the obturator internus muscle or some of the large vessels in the pelvis;
hypointensity on the ADC map and hyperintensity on the high b-value DW image, each defined as the difference between the mean value within the lesion and the mean value in the lesion’s neighborhood on the respective modality. As the presence of hyperintense lesions on a high b-value DW image with related hypointense signal intensity on the ADC map is typical for prostate cancer, lesions that failed to meet this criterion were filtered out.
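The ensembling and filtering logic described in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the LesionCandidate fields, the elongation and roundness conventions, and every threshold value are hypothetical (the roundness cut-off of 0.4 merely echoes the observation reported in the Results section), and the sketch assumes that the sigmoid is applied per fold before averaging, which the text does not specify.

```python
import math
from dataclasses import dataclass


def ensemble_probability(fold_logits):
    """Mean of per-fold sigmoid probabilities for a single voxel."""
    probabilities = [1.0 / (1.0 + math.exp(-z)) for z in fold_logits]
    return sum(probabilities) / len(probabilities)


@dataclass
class LesionCandidate:
    volume_mm3: float
    elongation: float             # assumed: >= 1, larger is more elongated
    roundness: float              # assumed: in [0, 1], 1 is a perfect sphere
    adc_mean: float               # mean ADC inside the candidate
    adc_neighborhood_mean: float  # mean ADC in the surrounding tissue
    dwi_mean: float               # mean high b-value DWI inside the candidate
    dwi_neighborhood_mean: float  # mean high b-value DWI around it


def passes_radiomic_filter(
    candidate,
    min_volume=50.0,      # hypothetical threshold, mm^3
    max_volume=20000.0,   # hypothetical threshold, mm^3
    max_elongation=3.0,   # hypothetical threshold
    min_roundness=0.4,    # hypothetical threshold
):
    """Return True if the candidate survives all three criteria."""
    if not (min_volume <= candidate.volume_mm3 <= max_volume):
        return False
    if candidate.elongation > max_elongation:
        return False
    if candidate.roundness < min_roundness:
        return False
    adc_hypointense = candidate.adc_mean < candidate.adc_neighborhood_mean
    dwi_hyperintense = candidate.dwi_mean > candidate.dwi_neighborhood_mean
    return adc_hypointense and dwi_hyperintense
```

A candidate is kept only when it is plausibly sized, compact, darker than its surroundings on the ADC map, and brighter than its surroundings on the high b-value DW image.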

Figure 2 presents an example of the algorithm output, before and after the changes are filtered out by their radiomic features.

Table 1. A summary of the simple shape statistics of segmented instances

3 Results and Discussion

Validation of the algorithm was performed using the PI-CAI evaluation library [15] on the validation dataset provided by the contest organizers. Preliminary results for the model give a Ranking Score of 0.531, an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.686, and an Average Precision (AP) of 0.376. An analysis of simple radiomic characteristics was performed and is summarized in Table 1. For each measured quantity (elongation, physical size, and roundness), the incorrectly segmented instances presented approximately two times higher standard deviations, which indicates far higher variability. This wider distribution suggests that suitable thresholds can be identified which classify some of the segmented instances as false positives with high probability. As an example, in Fig. 3, one can observe that in the dataset, all segmented instances with roundness lower than 0.4 were false positives. A similar analysis can be performed for all other quantities. However, final conclusions regarding increases in model specificity using radiomic-based postprocessing require further study.
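Two pieces of arithmetic behind this paragraph can be checked with a short sketch. The PI-CAI ranking score is the mean of AUROC and AP, which reproduces the reported 0.531. The threshold helper and its toy instance list are hypothetical and only illustrate how a "false positives only" cut-off such as the roundness value of 0.4 could be read off labelled instances.

```python
def ranking_score(auroc, average_precision):
    """PI-CAI ranking score: the mean of AUROC and Average Precision."""
    return (auroc + average_precision) / 2


def false_positive_only_threshold(instances):
    """Smallest feature value among true positives.

    `instances` is an iterable of (feature_value, is_true_positive) pairs.
    Every instance with a value below the returned threshold is a false
    positive. Returns None when there are no true positives.
    """
    true_positive_values = [value for value, is_tp in instances if is_tp]
    return min(true_positive_values) if true_positive_values else None


reported = ranking_score(0.686, 0.376)

# Hypothetical (roundness, is_true_positive) pairs, not data from the study.
toy_instances = [(0.20, False), (0.35, False), (0.45, True), (0.50, False), (0.80, True)]
cutoff = false_positive_only_threshold(toy_instances)
```

A threshold chosen this way removes false positives without discarding any true positive on the data it was derived from; whether it generalizes must be verified on held-out cases.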

The results suggest that the model performs comparably to the state-of-the-art non-transformer-based baseline architectures provided by the contest organizers. However, a significant number of the top-ranking results that are presented on the contest leaderboard are based on transformer architectures. This demonstrates their impressive ability to learn the presented task and the presence of further opportunities for optimization.

4 Conclusions

This study indicates the usefulness of new transformer-based architectures in multimodal three-dimensional medical imaging. An additional feature considered necessary for analyzing the dataset is the proven ability of transformer-based architectures to incorporate data from different sources [16]. This provides a strong base for incorporating clinical data directly into the neural network architecture. Radiomic analysis performed in the postprocessing step proved helpful in the study by increasing the model’s specificity; work on more advanced radiomic analysis is fully justified. The use of PyTorch-based libraries enabled efficient training, which supplies further evidence of their efficiency. Such tools can serve as the basis for additional work on the algorithm’s development.

References

1. Artificial intelligence and radiologists at prostate cancer detection in MRI: The PI-CAI challenge. https://pi-cai.grand-challenge.org/PI-CAI/
2. Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
3. Bosma, J., Saha, A., Hosseinzadeh, M., Slootweg, I., de Rooij, M., Huisman, H.: Report-guided automatic lesion annotation for deep learning-based prostate cancer detection in bpMRI (2021)
4. Diaz-Pinto, A., et al.: MONAI Label: a framework for AI-assisted interactive labeling of 3D medical images (2022)
5. Falcon, W., et al.: Pytorchlightning/pytorch-lightning: 0.7.6 release (2020). https://doi.org/10.5281/zenodo.3828935
6. Grönberg, H.: Prostate cancer epidemiology. Lancet 361(9360), 859–864 (2003). https://doi.org/10.1016/S0140-6736(03)12713-4, https://www.sciencedirect.com/science/article/pii/S0140673603127134
7. Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H., Xu, D.: Swin UNETR: swin transformers for semantic segmentation of brain tumors in MRI images. In: Crimi, A., Bakas, S. (eds.) BrainLes 2021. LNCS, vol. 12962, pp. 272–284. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-08999-2_22
8. He, K., et al.: Transformers in medical image analysis: a review (2022). https://doi.org/10.48550/ARXIV.2202.12165, https://arxiv.org/abs/2202.12165
9. Isensee, F., Jaeger, P., Kohl, S., Petersen, J., Maier-Hein, K.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 1–9 (2021). https://doi.org/10.1038/s41592-020-01008-z
10. Isensee, F., et al.: batchgenerators - a python framework for data augmentation (2020). https://doi.org/10.5281/zenodo.3632567
11. Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts (2016). https://doi.org/10.48550/ARXIV.1608.03983, https://arxiv.org/abs/1608.03983
12. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization (2017). https://doi.org/10.48550/ARXIV.1711.05101, https://arxiv.org/abs/1711.05101
13. Merkel, D.: Docker: lightweight Linux containers for consistent development and deployment. Linux J. 2014(239), 2 (2014)
14. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, vol. 32, pp. 8024–8035. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
15. Saha, A., et al.: Artificial intelligence and radiologists at prostate cancer detection in MRI: the PI-CAI challenge (study protocol) (2022). https://doi.org/10.5281/zenodo.6667655
16. Serdyuk, D., Braga, O., Siohan, O.: Transformer-based video front-ends for audio-visual speech recognition for single and multi-person video (2022). https://doi.org/10.48550/ARXIV.2201.10439, https://arxiv.org/abs/2201.10439
17. Singla, D., Cimen, F., Aluganti, C.: Novel artificial intelligent transformer U-Net for better identification and management of prostate cancer. Mol. Cell. Biochem. 478, 1439–1445 (2022). https://doi.org/10.1007/s11010-022-04600-3
18. Smith, L.N.: Cyclical learning rates for training neural networks (2015). https://doi.org/10.48550/ARXIV.1506.01186, https://arxiv.org/abs/1506.01186
19. Wu, C.Y., et al.: MeMViT: memory-augmented multiscale vision transformer for efficient long-term video recognition, pp. 13577–13587 (2022). https://doi.org/10.1109/CVPR52688.2022.01322

Acknowledgements

This work has been funded by the Polish National Centre for Research and Development as part of the program, INFOSTRATEG I, project INFOSTRATEG-I/0036/2021 “AI-augmented radiology - detection, reporting and clinical decision making in prostate cancer diagnosis”.

Author information

Authors and Affiliations

Laboratory of Applied Artificial Intelligence, National Information Processing Institute, Warsaw, Poland
Jakub Mitura, Rafał Jóźwiak, Ihor Mykhalevych, Iryna Gorbenko & Piotr Sobecki
Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
Rafał Jóźwiak & Ihor Mykhalevych
I Department of Clinical Radiology, Medical University of Warsaw, Warsaw, Poland
Tomasz Lorenc
Lower Silesian Oncology, Pulmonology and Hematology Center, Wrocław, Poland
Krzysztof Tupikowski
Medical University Lublin, Lublin, Poland
Jakub Mitura

Corresponding author

Correspondence to Rafał Jóźwiak.

Editor information

Editors and Affiliations

National Research Institute, National Information Processing Institute, Warsaw, Poland
Cezary Biele
Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland
Janusz Kacprzyk
Polish-Japanese Academy of Information T, Warsaw, Poland
Wiesław Kopeć
Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland
Jan W. Owsiński
Institute of Applied Computer Science, Łódź University of Technology, Łódź, Poland
Andrzej Romanowski
Department of Informatics in Management, Faculty of Management and Economics, Gdańsk University of Technology, Gdańsk, Poland
Marcin Sikorski

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this paper

Cite this paper

Mitura, J. et al. (2023). Prostate Cancer Detection Using a Transformer-Based Architecture and Radiomic-Based Postprocessing. In: Biele, C., Kacprzyk, J., Kopeć, W., Owsiński, J.W., Romanowski, A., Sikorski, M. (eds) Digital Interaction and Machine Intelligence. MIDI 2022. Lecture Notes in Networks and Systems, vol 710. Springer, Cham. https://doi.org/10.1007/978-3-031-37649-8_11

DOI: https://doi.org/10.1007/978-3-031-37649-8_11
Published: 25 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37648-1
Online ISBN: 978-3-031-37649-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
