Application of artificial neural networks for automated analysis of cystoscopic images: a review of the current status and future prospects
Optimal detection and surveillance of bladder cancer (BCa) rely primarily on the cystoscopic visualization of bladder lesions. AI-assisted cystoscopy may improve image recognition and accelerate data acquisition.
To provide a comprehensive review of machine learning (ML), deep learning (DL) and convolutional neural network (CNN) applications in cystoscopic image recognition.
A detailed search of original articles was performed using the PubMed-MEDLINE database to identify recent English literature relevant to ML, DL and CNN applications in cystoscopic image recognition.
In total, two articles and one conference abstract were identified addressing the application of AI methods in cystoscopic image recognition. These investigations reported accuracies exceeding 90% for tumor detection; however, future work is necessary to incorporate these methods into AI-aided cystoscopy and to compare them with other tumor visualization tools. Furthermore, we present results from the RaVeNNA-4pi consortium initiative, which has extracted 4200 frames from 62 videos, analyzed them with the U-Net network and achieved an average Dice score of 0.67. Its precision can be improved by augmenting the video/frame database.
AI-aided cystoscopy has the potential to outperform urologists at recognizing and classifying bladder lesions. To ensure their real-life implementation, however, these algorithms require external validation to generalize their results across other data sets.
Keywords: Neural networks · Deep learning · Cystoscopic images · Medical image analysis
Bladder cancer (BCa) accounts for approximately 7% of all newly diagnosed cancers in the USA, being the fourth most common cancer and the eighth most lethal in men [1, 2]. In Germany, BCa represents 4.7% of all new cancer cases and 3.2% of all cancer-related deaths. Even at a low stage and grade, recurrence rates range from 50 to 75% at 20 years, putting patients at risk of progression to muscle-invasive disease. Periodic cystoscopic examinations remain the cornerstone of patient follow-up to detect early recurrences and reduce the risk of progression. However, cystoscopic findings are diverse and often challenging to recognize and classify, ranging from healthy tissue to urothelial carcinoma. Precise recognition of these features currently depends on the examiner's skill and experience, leading to wide inter-observer variability in the interpretation of cystoscopic findings.
Artificial intelligence (AI) is becoming more prevalent in the field of medicine as a critical component of computer-aided diagnosis (CAD) and subsequent treatment. This has been notably shown in the successful application of deep learning (DL) approaches in various medical image analysis tasks. Such work includes image classification, segmentation, detection and registration of data obtained from radiology, pathology and endoscopy. The emergence of AI-assisted endoscopy, by training DL algorithms with large data sets of images or videos, has shown promising results, providing CAD tools that improve lesion detection and achieve diagnosis with high sensitivity and specificity [9, 10]. AI-assisted endoscopy offers the potential to radically change surgical practice by assisting physicians in identifying areas of malignancy given the heterogeneous appearance of lesions. In the field of urology, cystoscopy in particular can benefit from this methodology. The applicability of CAD in cystoscopic examinations, however, has not been extensively evaluated.
This work provides a brief overview of new methods for tumor visualization and current techniques of automated image evaluation, and finally discusses novel AI-based methods for the automatic evaluation of cystoscopic images.
A comprehensive review of the current literature was performed in the PubMed-MEDLINE database up to October 2019 using the term “cystoscopy” combined with one of the following terms: “machine learning”, “deep learning” and “convolutional neural network”. To capture recent trends in machine learning (ML), deep learning (DL) and convolutional neural network (CNN) applications, the search was limited to articles and abstracts published within the last 5 years, originally published in English. Review articles and editorials were excluded. Publications relevant to the subject and their cited references were retrieved and appraised independently by two authors (M.N. and A.R.). After full-text evaluation, data were independently extracted by the authors for further assessment of qualitative and quantitative evidence synthesis. The following information was extracted from each study: name of author, journal and year of publication, AI method, number of participants per study, and outcome prediction accuracy.
State of the art of urological endoscopy
Although white light cystoscopy (WLC) is the current standard of care for the initial evaluation of BCa, it has several shortcomings, such as difficulty detecting small or flat lesions, including carcinoma in situ (Cis). Current data suggest that early recurrence in BCa patients may result from lesions left undetected during cystoscopy and transurethral resection of bladder tumors (TURB). The application of advanced optical techniques is becoming more widespread to improve the diagnostic sensitivity for inconspicuous lesions and prevent recurrent non-muscle-invasive bladder cancer (NMIBC).
In photodynamic diagnosis (PDD), or fluorescence cystoscopy, an optical imaging agent is instilled into the bladder preoperatively, resulting in the accumulation of protoporphyrins in rapidly proliferating cells such as malignant bladder tumors. These are subsequently converted to photoactive porphyrins that emit red fluorescence under blue light. Studies have demonstrated that PDD improves Cis detection and reduces recurrences in patients with known or suspected NMIBC, thereby also reducing health-care costs. Its role in surveillance is emerging for patients with high-risk tumor lesions.
Narrow band imaging (NBI) has similarly been designed to improve BCa detection compared to WLC. This technology augments the visualization of tumoral blood vessels by narrowing the bandwidth of the light output to 415 and 540 nm, wavelengths that are strongly absorbed by hemoglobin.
Confocal laser endomicroscopy (CLE) couples microscopic imaging with a fiber-optic bundle transmitting 488 nm laser light, providing real-time, in vivo histopathologic information. The CLE probe has a diameter of 0.85–2.6 mm and a penetration depth of 240 μm, and can pass through the working channel of standard cystoscopes. It uses fluorescein as a contrast agent and can obtain a thorough evaluation of tissue structures to distinguish between low- and high-grade BCa. Unfortunately, a small field of view and reduced tissue penetration limit its ability to survey the entire bladder. Moreover, large-scale studies examining the diagnostic accuracy of CLE, including sensitivity and specificity, are lacking.
Furthermore, new techniques and methods are currently under development. Optical coherence tomography (OCT) is a non-invasive, real-time microscopic imaging technique using near-infrared light (890–1300 nm) with a 2 mm penetration depth that produces an ultrasound-like image. OCT uses an interferometer to measure changes in backscattered light caused by asymmetric structures within the tissue and can be applied through standard rigid cystoscopes given its 2.7 mm diameter probe. Huang et al. recently performed a meta-analysis to evaluate the diagnostic accuracy of OCT: the sensitivity and specificity of the included publications ranged from 76 to 100% and from 62 to 97%, yielding a pooled sensitivity and specificity of 0.96 and 0.82, respectively. OCT is limited by a small area of analysis, making examination of the entire bladder impractical; therefore, its use alongside other ancillary methods is encouraged.
Raman spectroscopy (RS) is an endomicroscopic technology that does not require photoactive substances and can be used to examine molecular components in tissue. RS uses infrared light (785–845 nm) to analyze the inelastic scattering of photons after their interaction with intramolecular bonds. RS can determine the integrity of bladder wall layers, assess penetration depth and recognize low- or high-grade BCa. Chen et al. recently applied a fiber-optic Raman sensing probe to evaluate the diagnostic potential of RS in identifying various bladder pathologies ex vivo. The sensitivities for normal bladder tissue, low-grade and high-grade bladder tumors were 88.5%, 95.1% and 90.3%, and the specificities 98%, 97.5% and 96.4%, respectively. Nevertheless, RS has limitations such as a restricted field of view, which makes screening the entire bladder surface impractical; it should therefore be targeted at suspicious lesions identified by PDD or NBI.
Multiphoton microscopy (MPM) uses a laser-scanning microscope in which intrinsic tissue fluorophores, such as flavin adenine dinucleotide (FAD) and nicotinamide adenine dinucleotide (NADH), simultaneously absorb two near-infrared photons (700–800 nm); the resulting autofluorescence of cells and extracellular components provides information on cellular metabolic activity. Pradère et al. recently assessed an optical multimodal technique on samples from patients suspected of having BCa and found that it was able to discriminate tumor from healthy tissue and determine tumor grade. However, when MPM is used alone, it is limited by its shallow penetration and the difficulty of recognizing intranuclear modifications.
As opposed to the previously mentioned optical techniques, which require interpretation by a human professional, AI methods rely on training CNNs with images or videos so that they learn to recognize complex patterns. Although research in this area has increased dramatically in recent years, there is presently no real-time application of AI-assisted cystoscopy in the clinical setting that accurately and reproducibly classifies cystoscopic findings.
Artificial neural networks for analysis of images: an overview
In image analysis, the term object recognition refers to the detection or classification of objects in image or video data. It describes whether, or to what extent, input data are similar to previously seen objects. Methods for object class recognition are usually based on a learning process. Classical image-processing methods such as SIFT determine, from a training data set, a suitable selection of features that can be used to reliably identify instances of the object class. In recent years, object recognition methods using deep learning (DL) or artificial neural networks (ANNs) have proven superior to these classical methods. ANNs have strong representational power that enables them to learn complex hierarchical representations of images, thus increasingly capturing abstract concepts. They are powerful at generalizing to previously unseen data. This is especially the case when recognizing a multitude of different objects whose appearance varies greatly. ANNs are also more robust against variations, e.g., masking, fading of colors, damage or soiling. As early as the end of the 1990s, ANNs were used to solve simple tasks such as handwriting recognition (the so-called LeNet). DL differs from classical object recognition methods primarily in that the objects or object classes to be recognized are not described by a set of manually predefined features; instead, the feature representation of an object is learned by the neural network during training.
Artificial neural networks
In a multilayer perceptron (MLP), each neuron computes a weighted sum of its inputs and passes the result through an activation function. The selection of the activation function depends on the application and is one of the hyperparameters that influence the performance of the neural network. Typical activation functions are, e.g., the rectified linear unit (ReLU), the logistic sigmoid and the hyperbolic tangent (tanh). The activation function must be differentiable so that the derivative, or gradient, can be determined during training. The activations in subsequent hidden layers are calculated analogously; in summary, the MLP computes its output values from its input values layer by layer.
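This forward computation can be illustrated with a minimal NumPy sketch; the layer sizes, random weights and function names below are purely illustrative, not drawn from any network discussed in this review.

```python
import numpy as np

# Typical activation functions; each is differentiable (almost
# everywhere, in the case of ReLU), as required for gradient-based training.
def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def mlp_forward(x, weights, biases, activation=relu):
    """Forward pass of an MLP: each layer computes activation(W @ a + b),
    and the activations of one layer become the inputs of the next."""
    a = x
    for W, b in zip(weights, biases):
        a = activation(W @ a + b)
    return a

# Toy example: 3 inputs -> 4 hidden units -> 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
out = mlp_forward(np.array([0.5, -1.0, 2.0]), weights, biases)
```

Swapping `activation=relu` for `sigmoid` or `tanh` changes only the nonlinearity, which is exactly the hyperparameter choice described above.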
Although the MLP seems simple in structure, many tasks can be solved with it: any bounded continuous function can be approximated to arbitrary accuracy by an MLP with only one hidden layer. Based on this simple network structure, a variety of network architectures for different applications have been developed.
Convolutional neural networks
In the field of image processing, the so-called convolutional neural networks (CNNs) are of particular interest. These networks take advantage of the fact that information in images is mostly local, i.e., limited to a smaller part of the image. They exploit this spatial layout by using local connectivity and parameter sharing. Filters can therefore be used for the analysis of the image content that only look at a small part of the image at a time. This means that a much smaller number of neurons is required than in fully connected networks, which results in a considerable reduction in computing time and memory requirements. However, it is only in recent years that it has become possible to develop networks that can handle huge amounts of input data for the detection of more complex objects, due in particular to the availability of powerful and highly parallel graphics processors (graphical processing units, GPUs). CNNs have played a major role in the success of DL for image analysis in a wide range of applications, including image classification [26, 33, 34], image segmentation, instance segmentation and object detection [35, 36].
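The principle of local connectivity and parameter sharing can be sketched in a few lines of NumPy. This is a naive, illustrative convolution, not an excerpt from any of the networks discussed; real CNN libraries implement the same operation far more efficiently.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (no padding): the same small kernel is slid
    over every image position, so a single set of weights (parameter
    sharing) inspects only a local neighborhood (local connectivity)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 filter has 9 weights regardless of image size, whereas a fully
# connected layer would need a separate weight per input-output pair.
image = np.arange(16.0).reshape(4, 4)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # vertical-edge-style filter
feature_map = conv2d(image, kernel)        # shape (2, 2)
```

Because the example image has a constant horizontal gradient, every output position of this edge filter yields the same response, illustrating how one shared filter responds uniformly to the same local pattern anywhere in the image.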
Instance segmentation. A segmentation approach that combines object detection and semantic segmentation, so-called instance segmentation, is a rapidly growing application area of CNNs. The idea is to localize objects in an image and precisely segment each instance. For medical image analysis, instance segmentation models such as Mask-RCNN are reported to be powerful at detecting instances accurately but generate less accurate segmentation masks compared to U-Net.
An essential requirement for training neural networks is a sufficient amount of annotated training data. Manual annotation is time-consuming; in image segmentation, for instance, each object must be traced exactly. Ideally, the training data and the test data should come from the same distribution. While it is possible to classify objects in similar scenes with networks pre-trained on data from related domains, good-quality training data are crucial. Training on large amounts of data is computationally intensive and can take days to weeks even on powerful graphics cards [47, 48]. During inference, however, fully trained networks usually analyze an image quickly: typical processing times range from 5 ms per image for simple tasks to approximately 500 ms for more complex analyses on a current GPU, depending on the degree of optimization of the algorithms, the training procedures and the performance of the hardware used [35, 47].
Another challenge in training DL models is their sensitivity to many hyperparameters that have to be set properly to achieve optimal results. Tuning these hyperparameters manually requires domain knowledge of their effect on the generalization of the network and is time intensive. Significant progress has been made in the automatic tuning of hyperparameters for DL models; however, the process is still computationally intensive [49–52].
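As a simple point of reference for the automated methods cited above (e.g., BOHB, Hyperband), hyperparameter search can be as basic as random sampling over a configuration space. The search space and scoring function below are entirely hypothetical stand-ins for a real training run.

```python
import random

def random_search(train_and_eval, space, n_trials=20, seed=0):
    """Random hyperparameter search: sample configurations from `space`
    and keep the one with the best validation score. Each trial would
    normally involve a full (expensive) training run, which is why the
    process remains computationally intensive."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = train_and_eval(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical stand-in for a real training run; returns a mock
# validation score that peaks at lr=1e-3 with batch_size != 8.
def mock_eval(cfg):
    return -abs(cfg["lr"] - 1e-3) - 0.01 * (cfg["batch_size"] == 8)

space = {"lr": [1e-4, 1e-3, 1e-2], "batch_size": [8, 16, 32]}
best_cfg, best_score = random_search(mock_eval, space)
```

Methods such as Hyperband improve on this baseline by terminating unpromising configurations early rather than training each one to completion.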
Application of ANNs for urological endoscopy
The field of urology has benefitted from recent advances in DL. Although still at an early stage, significant work has been done toward applying DL to areas such as image classification, object detection and image segmentation. Ikeda et al. trained a CNN with 177 images of histologically confirmed bladder tumors and 133 images of healthy urothelium. The network was pre-trained with 1.2 million images from the ImageNet data set, classified into 1000 categories. The output revealed 93.0% sensitivity and 83.7% specificity for differentiating between tumoral and healthy urothelium.
Eminaga et al. studied 479 cystoscopy videos from patients with 44 urologic findings. The authors used data augmentation to enlarge their data set by up to a factor of 40, resulting in 18,681 images. They evaluated several CNN models and found that the Xception-based model achieved the highest accuracy (99.52%) for the diagnostic classification of cystoscopic images. Nonetheless, 7.86% of bladder stone images and 1.43% of images containing indwelling catheters were misclassified.
Summary of current AI approaches in cystoscopic images

Artificial neural networks for the analysis of cystoscopic images:

- Eminaga et al. 2018: image classification, 44 classes; 18,681 training images (with data augmentation); Xception-based model: F1 score 99.52%, other models showed a 0.04–4.05 difference in performance relative to the Xception model
- Shkolyar et al. 2015: instance segmentation, 2 classes (cancer and benign); training with 417 cancer and 2335 normal frames, validation with 211 cancer and 1002 normal frames; per-frame sensitivity 88%, per-frame specificity 99%, per-tumor sensitivity 90%
- Ikeda et al. 2018: image classification, binary classes (tumoral and healthy urothelium); data from 177 tumor lesion images and 133 normal images; 93.0% sensitivity and 83.7% specificity
The impact of AI in interpreting cystoscopic images is currently under study in a recent collaborative research project, RaVeNNA-4pi. The goal of the project is to develop a digital platform with 4PI Endoimaging for 3D-reconstruction, semantic segmentation and visualization of the bladder for BCa documentation and improved patient follow-up.
The procedure uses U-Net, an encoder-decoder network that has been applied successfully in various medical image analysis tasks [8, 38, 42]. The network architecture was slightly modified by adding batch normalization after each convolutional layer. To tackle class imbalance in the data, we used a weighted categorical cross-entropy loss. Data augmentation methods such as image rotation and zoom were used to make the network more robust.
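The exact weighting scheme used in the project is not detailed here; the following NumPy sketch shows only the general form of a pixel-wise weighted categorical cross-entropy, with illustrative shapes and weights. Upweighting a rare class (e.g., tumor pixels) makes errors on that class cost more, counteracting class imbalance.

```python
import numpy as np

def weighted_categorical_crossentropy(y_true, y_pred, class_weights, eps=1e-7):
    """Pixel-wise weighted categorical cross-entropy.
    y_true: one-hot labels, shape (N, H, W, C)
    y_pred: softmax probabilities, shape (N, H, W, C)
    class_weights: shape (C,), larger weights for rarer classes."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Per-pixel loss: -sum_c w_c * t_c * log(p_c), then averaged.
    loss = -np.sum(class_weights * y_true * np.log(y_pred), axis=-1)
    return loss.mean()

# Toy "image" with two pixels and two classes; class 1 plays the role
# of the rare class and is upweighted in the second call.
y_true = np.array([[[[1.0, 0.0], [0.0, 1.0]]]])   # shape (1, 1, 2, 2)
y_pred = np.array([[[[0.9, 0.1], [0.4, 0.6]]]])
uniform = weighted_categorical_crossentropy(y_true, y_pred, np.array([1.0, 1.0]))
weighted = weighted_categorical_crossentropy(y_true, y_pred, np.array([1.0, 5.0]))
```

With uniform weights both pixels contribute equally; with the class-1 weight raised to 5, the uncertain prediction on the rare class dominates the loss, which is the intended effect during training.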
We found that certain classes, such as papillary tumors and suspected Cis, have high inter-class similarity and are therefore challenging for pixel-wise classification, degrading the average Dice similarity coefficient (DSC) over all classes. Current research focuses on developing methods to distinguish these classes with higher precision/recall.
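The DSC used to evaluate these segmentations measures the overlap between a predicted mask and the ground-truth annotation; a minimal sketch for binary masks (per-class scores are averaged to obtain figures like the 0.67 reported above):

```python
import numpy as np

def dice_score(pred_mask, true_mask, eps=1e-7):
    """Dice similarity coefficient between two binary masks:
    DSC = 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1
    (perfect overlap). eps avoids division by zero for empty masks."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

# Two masks sharing one of their two foreground pixels: DSC = 2*1/(2+2) = 0.5
pred = np.array([[1, 1, 0, 0]])
true = np.array([[0, 1, 1, 0]])
score = dice_score(pred, true)
```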
Medical care is one of the most important areas of a modern society. The contributions made in this field in clinics, research and teaching have continuously improved the standard of living and increased life expectancy worldwide. In the recent past, medical research has been strongly shaped by the incorporation of IT methods and various microsystem technologies alongside diagnostic and therapeutic tools. Machine learning (ML) enables highly efficient investigation of large data sets, in which regularities and patterns are searched for. Cystoscopy in particular generates image-based data that have not yet been adequately researched and exploited. In this work, the outlines and potential of these technologies for application in urology were discussed. The coming years will certainly be shaped by the incorporation of AI methods into result interpretation and reporting of endoscopic examinations. Scientific activities in this field should therefore be intensified.
Open Access funding provided by Projekt DEAL. The authors thank Grigor Andreev and Manuel Hecht for their valuable work in data annotation.
MN: data management, data analysis and interpretation, manuscript writing/editing. SH: manuscript writing/editing. RSI: data collection, manuscript writing/editing. AM: project development, data and result evaluation, manuscript writing/editing. AR: project development, data and result evaluation, manuscript writing/editing.
This study was funded by the German Federal Ministry of Education and Research (grant number 13GW0203A).
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
All procedures performed in this study were approved by the local Ethical Committee of the University of Freiburg, Germany. This article does not contain any studies with animals performed by any of the authors.
Retrospective subject data were handled anonymously; therefore, informed consent was not obtained from the individual participants included in the study.
- 1. Boslaugh SE (2007) American Cancer Society. In: Colditz G (ed) Encyclopedia of cancer and society. SAGE Publications, Thousand Oaks
- 3. Robert Koch Institute (2018) Cancer in Germany 2013/2014. German Centre for Cancer Registry Data, 11th edition
- 8. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. Springer, Berlin
- 12. Fradet Y, Grossman HB, Gomella L et al (2007) A comparison of hexaminolevulinate fluorescence cystoscopy and white light cystoscopy for the detection of carcinoma in situ in patients with bladder cancer: a phase III, multicenter study. J Urol 178(1):68–73. https://doi.org/10.1016/j.juro.2007.03.028
- 13. Hermann GG, Mogensen K, Carlsson S et al (2011) Fluorescence-guided transurethral resection of bladder tumours reduces bladder tumour recurrence due to less residual tumour tissue in Ta/T1 patients: a randomized two-centre study. BJU Int 108(8 Pt 2):E297–E303. https://doi.org/10.1111/j.1464-410X.2011.10090.x
- 15. Burger M, Grossman HB, Droller M et al (2013) Photodynamic diagnosis of non-muscle-invasive bladder cancer with hexaminolevulinate cystoscopy: a meta-analysis of detection and recurrence based on raw data. Eur Urol 64(5):846–854. https://doi.org/10.1016/j.eururo.2013.03.059
- 16. Babjuk M, Burger M, Compérat E et al (2018) EAU guidelines on non-muscle-invasive bladder cancer (TaT1 and CIS). European Association of Urology
- 26. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
- 27. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
- 28. Bishop CM (2006) Pattern recognition and machine learning. Springer (Information Science and Statistics)
- 31. Zeiler MD, Fergus R (2013) Visualizing and understanding convolutional networks. https://arxiv.org/pdf/1311.2901
- 32. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge
- 33. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. https://arxiv.org/pdf/1409.1556
- 34. He K, Zhang X, Ren S et al (2015) Deep residual learning for image recognition. https://arxiv.org/pdf/1512.03385
- 35. Ren S, He K, Girshick R et al (2015) Faster R-CNN: towards real-time object detection with region proposal networks
- 37. Long J, Shelhamer E, Darrell T (2014) Fully convolutional networks for semantic segmentation
- 38. Çiçek Ö, Abdulkadir A, Lienkamp SS et al (2016) 3D U-Net: learning dense volumetric segmentation from sparse annotation. https://arxiv.org/pdf/1606.06650
- 40. Oktay O, Schlemper J, Le Folgoc L et al (2018) Attention U-Net: learning where to look for the pancreas
- 41. Chen W, Liu B, Peng S et al (2019) S3D-UNet: separable 3D U-Net for brain tumor segmentation. In: Crimi A (ed) Brain lesion: glioma, multiple sclerosis, stroke and traumatic brain injuries. BrainLes 2018, held in conjunction with MICCAI 2018, Granada, Spain, Revised Selected Papers, vol 11384. Springer, Cham, pp 358–368
- 42. Isensee F, Petersen J, Klein A et al (2018) nnU-Net: self-adapting framework for U-Net-based medical image segmentation
- 43. Gordienko Y, Gang P, Hui J et al (2019) Deep learning with lung segmentation and bone shadow exclusion techniques for chest X-ray analysis of lung cancer. 754(1):638–647. https://doi.org/10.1007/978-3-319-91008-6_63
- 44. Ma X, Hadjiiski L, Wei J et al (2019) 2D and 3D bladder segmentation using U-Net-based deep learning. In: International Society for Optics and Photonics, 109500Y
- 45. He K, Gkioxari G, Dollár P et al (2018) Mask R-CNN. https://arxiv.org/pdf/1703.06870
- 46. Vuola AO, Akram SU, Kannala J (2019) Mask-RCNN and U-Net ensembled for nuclei segmentation. https://arxiv.org/pdf/1901.10170
- 47. Coleman CA, Narayanan D, Kang D et al (2017) DAWNBench: an end-to-end deep learning benchmark and competition
- 49. Falkner S, Klein A, Hutter F (2018) BOHB: robust and efficient hyperparameter optimization at scale
- 50. Domhan T, Springenberg JT, Hutter F (2015) Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves
- 51. Ilievski I, Akhtar T, Feng J et al (2016) Efficient hyperparameter optimization of deep learning algorithms using deterministic RBF surrogates
- 52. Li L, Jamieson KG, DeSalvo G et al (2017) Hyperband: bandit-based configuration evaluation for hyperparameter optimization. ICLR
- 54. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database
Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.