1 Introduction

The volume of medical images handled by medical care and management systems has increased substantially. Imaging modalities include ultrasound, mammography (MG), X-ray, Computed Tomography (CT), Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), Magnetic Resonance Angiography (MRA), and images from pathological tests. Analysing medical images is often difficult and time-consuming [1].

Deep learning (DL) models can address the problem of medical image analysis. Deep learning is a branch of Artificial Intelligence in which models learn from input data and make decisions or predictions based on the training data. There are three learning methods: unsupervised learning, supervised learning, and semi-supervised learning. Traditional machine learning algorithms require explicit feature extraction, and selecting problem-specific features requires the help of a domain expert. Deep learning algorithms are a subset of machine learning that automatically extract the necessary features from the input data [2]. Most review papers cover the capabilities of deep learning algorithms in the medical fields of radiology [3], MRI [4], neurology [5], and cardiology [6]. For object detection, segmentation, and classification of medical images, Convolutional Neural Networks (CNN) are used in deep learning [7, 22].

Collecting medical images requires a lot of effort. Even once the data have been collected, labelling and annotating them requires the help of doctors. The unavailability of a large collection of images of the same disease is another problem. Recently, Generative Adversarial Networks (GANs) have been extensively used for the synthesis of medical images. Synthetic images from GANs help to overcome the problems of privacy, small data set size, imbalanced data sets, etc. Rotation, scaling, flipping, and translation are traditional augmentation methods. These methods only change the shape, location, and size of existing images. GANs generate realistic new images and have been used to augment training images with good outcomes in medical applications.

The main objective of this work is to review the recent use of deep learning models and GANs in different medical image analysis tasks. The paper is organised as follows: Sect. 2 deals with different applications of deep learning models in the medical field; Sect. 3 deals with deep learning-based generative models and their applications in the medical field; Sect. 4 presents datasets and evaluation indicators; the discussion and conclusion follow in Sects. 5 and 6.

2 Different applications of DL model in medical image analysis

In the development of modern deep learning, architectures like LeNet and AlexNet were frequently used. Subsequent network architectures are substantially more complicated, with each generation building on ideas and insights from prior systems and producing state-of-the-art improvements. The prominent basic CNN architectures are described below:

AlexNet [34] employed an eight-layer network structure with five convolutional layers and three fully connected layers. Max pooling is applied after some of the convolutional layers to reduce the amount of data passed on. The input size for AlexNet is 227 × 227 pixels. Notable aspects are the use of ReLUs, dropout regularization, splitting the computation across multiple GPUs, and data augmentation during training.

VGG16 was first proposed by the Visual Geometry Group (VGG) at Oxford University. The larger convolution kernels in AlexNet, such as 11 × 11 and 5 × 5, are replaced by a series of sequential 3 × 3 kernels in this network. For a given receptive field, stacking several small convolution kernels works better than using one larger kernel: the additional nonlinear layers increase network depth so that more complex patterns can be learned, and the computational cost is also lower.
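The trade-off between stacked 3 × 3 kernels and a single large kernel can be checked with a short calculation. The sketch below is illustrative only: it assumes stride-1 convolutions with the same number of input and output channels in every layer, ignores biases, and uses a hypothetical channel count of 64; the function names are not taken from any library.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int, num_layers: int = 1) -> int:
    """Weight count (no biases) of `num_layers` kernel x kernel convolutions
    that each map `channels` input channels to `channels` output channels."""
    return num_layers * kernel * kernel * channels * channels


if __name__ == "__main__":
    channels = 64  # hypothetical channel count, chosen only for illustration
    # (number of stacked 3 x 3 layers, equivalent single kernel size)
    for num_layers, big_kernel in [(2, 5), (5, 11)]:
        field = stacked_receptive_field(num_layers)
        stacked = conv_weights(3, channels, num_layers)
        single = conv_weights(big_kernel, channels)
        print(f"{num_layers} x (3x3): field {field}x{field}, "
              f"{stacked} weights vs {single} for one {big_kernel}x{big_kernel}")
```

Under these assumptions, two 3 × 3 layers cover the same 5 × 5 field with 18C² instead of 25C² weights, and five 3 × 3 layers cover an 11 × 11 field with 45C² instead of 121C² weights, where C is the channel count.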

GoogLeNet, introduced in the same year as VGGNet [35], had similar success. In contrast to VGGNet [35], GoogLeNet is built around a module called inception. To reduce computation, the module uses convolutional layers with a 1 × 1 kernel size.

ResNet [40] introduced skip connections, allowing for the training of considerably deeper networks. The network has the option to simply transfer the activations from layer to layer (more specifically, from ResNet block to ResNet block), maintaining information while data moves across the layers, by having skip connections in addition to the usual path. Some features are better extracted in the shallow networks, while others require deeper networks. The simultaneous capability of both is provided via skip connections, enhancing the network’s adaptability to input data.

DenseNet [39] was developed on the principles of ResNet, but concatenates the activations produced by one layer with those of later layers rather than adding them. Each layer (or block of layers) therefore keeps the original inputs in addition to the activations from previous layers, maintaining a kind of global state. This promotes feature reuse and reduces the number of parameters required for a given depth, which makes DenseNets better suited to smaller datasets.
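The difference between adding and concatenating activations can be illustrated on plain Python lists. This is a toy sketch, not either network's actual implementation: `layer_fn` stands in for an arbitrary learned layer, and the channel counts in the usage note are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, layer_fn: Callable[[Vector], Vector]) -> Vector:
    """ResNet-style skip connection: element-wise sum of input and layer output."""
    fx = layer_fn(x)
    return [a + b for a, b in zip(x, fx)]


def dense_layer(x: Vector, layer_fn: Callable[[Vector], Vector]) -> Vector:
    """DenseNet-style connection: input and layer output are concatenated,
    so the original features stay available to all later layers."""
    return x + layer_fn(x)  # `+` on lists is concatenation, not addition


def dense_block_channels(initial: int, growth_rate: int, num_layers: int) -> int:
    """Channel count after a dense block in which every layer appends
    `growth_rate` new feature maps to its input."""
    return initial + growth_rate * num_layers


if __name__ == "__main__":
    halve = lambda v: [0.5 * a for a in v]  # toy stand-in for a learned layer
    print(residual_block([1.0, 2.0], halve))  # same length as the input
    print(dense_layer([1.0, 2.0], halve))     # length grows by the layer output
    print(dense_block_channels(64, 32, 6))
```

With addition the feature width stays constant, whereas with concatenation it grows linearly with depth, which is why each dense layer can be narrow.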

YOLO [37] introduced a novel, streamlined method for detecting objects in images and classifying them. It employs a single CNN that processes the image directly and outputs bounding boxes and class probabilities. It incorporates a number of components from the aforementioned networks, such as inception modules and pretraining with a smaller network. It is fast enough to allow real-time processing. By reducing the size of the model, YOLO makes it simple to trade accuracy for speed. On a common benchmark data set, YOLOv3-tiny was able to process images at over 200 frames per second while still delivering accurate predictions.

UNet [55] is a well-known and effective network for segmenting 2D images. A traditional CNN downscales the input image, which is then upscaled using transpose convolutions until it reaches its original size. Additionally, similar to the skip connections in ResNet, skip connections combine features from the down-sampling path with those of the up-sampling path.

VNet is a three-dimensional version of U-Net that uses volumetric convolutions and ResNet-style skip connections.

2.1 Image classification

Image classification plays an important role in computer-aided diagnosis systems. Image classification methods assign input images to classes such as fracture or no fracture, or disease or no disease [23, 24]. Typical uses of image classification in clinical applications include glaucoma diagnosis [25], skin disease detection [26, 27], retinopathy-related eye disease detection [28, 29], corneal disease detection [30], brain cancer [31] and breast cancer [32] detection using pathological images, eye disease detection [33] using OCT, and spine fracture classification [34] using CT images. A frequently used framework for medical image classification and analysis is the convolutional neural network (CNN) [35]. The CNN framework has improved continuously as deep learning models have evolved. AlexNet [36] was the pioneering CNN architecture, comprising repeated convolutions with ReLU activation and max pooling. The performance of CNN architectures improved with increasing depth: VGGNet [37] uses convolution kernels of size 3 × 3 and max pooling of size 2 × 2, while the inception network [38] and its variants [39, 40] stack convolution kernels of sizes 1 × 1, 3 × 3, and 5 × 5 with pooling of size 3 × 3. Skip connections were used in DenseNet [41] and ResNet [42] to mitigate the vanishing gradient problem. Apart from image classification, CNNs can be used for other computer vision applications like segmentation and detection. For the evaluation of binary classification algorithms, commonly used metrics are recall, precision, accuracy, F1-score, and the ROC curve with its AUC. For multiclass classification, commonly used metrics are accuracy and the kappa coefficient.

The design of computer-aided decision support systems for fracture detection, lesion detection, cancer detection, and others is an evolving area of research. A computer-aided decision support system in the medical field requires classification of the data. Compared to traditional methods of data augmentation, deep learning-based generative models (GANs) can be a more effective method of augmentation. A GAN-augmented data set helps to reduce biased results and overfitting, and it can improve the performance of CNN classification.

2.2 Object detection

Object detection algorithms involve both localization and identification tasks. Deciding the classes of the objects that appear in the region of interest is the identification task, whereas precisely localising the object position in the image is the localization task. Object detection in medical images aims to detect an abnormality or fracture. Typical detection tasks in clinical applications include using chest X-ray or CT images to detect lung nodules [43, 44], mammogram detection using CT [45], and lesion detection on CT images [46, 47]. Object detection algorithms are divided into anchor-based and anchor-free methods. Anchor-based methods are further classified as single-stage and two- or multistage methods. Single-stage anchor-based methods are computationally efficient, whereas two- or multistage anchor-based methods achieve better detection performance. Widely used single-stage detectors are the Single Shot MultiBox Detector (SSD) [48] and the YOLO family [49]. Both SSD and YOLO are based on a feed-forward CNN. These architectures produce a fixed number of bounding boxes, together with a score for the presence of each object class in each box. Final predictions are obtained by a non-maximum suppression step. SSD produces better detection performance because it makes use of multiscale feature maps, in contrast to the YOLO architecture, which makes use of single-scale feature maps. Single-stage architectures offer high inference speed, whereas two-stage architectures offer high object recognition and localization performance. Faster-RCNN [50] and Mask-RCNN [51] are popular two-stage object detection architectures that generate a set of ROIs. In Faster-RCNN and Mask-RCNN, a Region Proposal Network (RPN) generates candidate bounding boxes in the first stage, and classification is done in the second stage. CornerNet [52] is a popular anchor-free technique. It is a single CNN that uses paired key points instead of anchor boxes; the bounding box is defined by its top-left and bottom-right corners. To evaluate the performance of detection methods, two main metrics are used: false positives per image and mean average precision.
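The non-maximum suppression step used by single-stage detectors can be sketched in a few lines. The version below is a minimal greedy variant for a single class, assuming boxes given as (x1, y1, x2, y2) corner coordinates and a hypothetical overlap threshold of 0.5; production detectors typically use optimised, per-class implementations.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box by more than the threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    confidences = [0.9, 0.8, 0.7]
    print(non_max_suppression(candidates, confidences))
```

In the usage example the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the distant third box is kept.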

2.3 Image segmentation

Image segmentation is a major research area in deep learning. It is a pixel labelling method in which images are separated into regions with similar properties. In medical images, segmentation techniques determine the outline of a body structure or organ. In clinical applications, segmentation is used to delineate different organs like the liver [53], pancreas [54], and whole heart [55] in CT imaging modalities. The rapid development of deep learning has led to very good semantic segmentation methods. The Fully Convolutional Network (FCN) [56], one of the first CNNs to perform segmentation tasks, has attained great success in image segmentation. Medical image segmentation falls into two categories, 2D and 3D, depending on the dimensions of the input image. For the segmentation of medical images, the U-Net architecture [57] is extensively used. U-Net comprises a downsampling side and an upsampling side. The downsampling path consists of repeated convolutions, each followed by a Rectified Linear Unit (ReLU), and strided max pooling. The number of feature channels is doubled in each step. The upsampling path consists of feature map upsampling followed by an up-convolution that halves the number of feature channels. Different types of U-Net-based frameworks have been developed. For the segmentation of medical images, no new U-Net (nnU-Net) was proposed by Isensee et al. [58]. The nnU-Net achieved excellent performance in segmenting tumours, lesions, and different organs in different imaging modalities across 19 different datasets with 49 segmentation tasks. Medical image segmentation applications include polycystic kidney segmentation [59], segmentation of brain tumours [60], striatum segmentation [61], deformable prostate segmentation [62], segmentation of acute ischemic lesions [63], segmentation of organs at risk in the head and neck region using CT images [64], 3D multiscale FCN segmentation of the spine using MR images [65], kidney segmentation by Mask R-CNN [66], and liver segmentation [67,68,69,70,71]. The metrics used to evaluate the performance of the segmentation task are Intersection over Union (IoU) and the Dice similarity coefficient (DSC).
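Both segmentation metrics can be computed directly from a predicted and a reference binary mask. The sketch below assumes the masks are flattened to equal-length sequences of 0 and 1; returning 1.0 when both masks are empty is a convention chosen here for illustration, not one stated in the reviewed works.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Dice similarity coefficient and IoU of two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    pred_area = sum(pred)
    target_area = sum(target)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: treated here as perfect agreement.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    dice, iou = dice_and_iou(predicted, reference)
    print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

The two metrics are monotonically related by Dice = 2·IoU / (1 + IoU), so they rank segmentations of a single case identically, although averages over many cases can differ.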

2.4 Image restoration

For many years, denoising MR images and estimating noise in MRI have been important research areas [72, 73]. Recently, deep learning approaches have been extensively used for denoising medical images. Bermudez et al. [74] used deep learning for implicit brain MRI manifold learning, implementing autoencoders with skip connections for image denoising. Benou et al. [75] addressed spatiotemporal denoising of brain dynamic contrast-enhanced MR images with bolus injections of contrast agent (CA). Their quantitative and qualitative denoising results were superior to those of spatiotemporal Beltrami, stacked denoising autoencoders [76], and the dynamic Non-Local Means method [77]. Deep learning techniques are also used in filtering the artefacts in spectroscopic MRI [78], automated reference-free detection of patient motion artefacts in MRI [79], and detection and removal of ghosting artefacts in MR spectroscopy [80].

2.5 Image registration

Image registration is also known as image fusion or image warping. It is the process of overlaying two or more images that are captured from different imaging modalities or different angles. The main aim of medical image registration is to establish optimal correspondence between images captured by different imaging modalities like CT, X-Ray, MRI, and ultrasound, at different times in longitudinal studies, or from distinct viewpoints like axial, sagittal, and coronal, in order to collect valuable information. In many medical applications like image-assisted surgery [81], computer-assisted intervention, and treatment planning [82], image registration is a very important pre-processing technique. Overlaying anatomical images like CT or MRI with functional images like PET scans or functional MRI is very helpful in disease monitoring and diagnosis [83]. State-of-the-art performance is achieved by image registration methods that are based on deep learning [84]. Abdominal MRI registration was done in [85] by applying a CNN to compensate for respiration deformation. Despite the success of supervised deep learning-based techniques, obtaining reliable ground truth is a challenging task. Unsupervised techniques can effectively compensate for the absence of training datasets and ground truth. A fully convolutional network was trained to perform deformable registration of 3D brain MRI by self-supervision [86]. Motivated by the Spatial Transformer Network (STN) [87], Kuang et al. [88] implemented an STN-based CNN to perform deformable registration of MRI brain volumes. Lately, Reinforcement Learning and Generative Adversarial Network (GAN)-based techniques have attracted attention. 3D ultrasound and MRI registration was performed by Yan et al. [89]. In that work, a generator estimated the rigid transformation, and a discriminator network was trained to differentiate between images aligned using the ground truth and images aligned using the prediction. Robust nonrigid 2D–3D deformable registration of prostate MRI was done by a reinforcement learning method [90]. Retinal imaging is crucial for diagnosing eye pathologies and systemic disorders. Deep learning approaches for registering retinal images are presented in [91,92,93,94,95]. Depending on the imaging modalities, image registration can be categorised into two types: multimodal or monomodal. For evaluating the performance of image registration, two of the most commonly used metrics are mean square error and the Dice coefficient.

3 Different applications of Generative Adversarial Networks (GANs) in medical image analysis

A Generative Adversarial Network (GAN) comprises a generator (G) and a discriminator (D) network. The generator learns the input data distribution and uses noise to generate realistic images, while the discriminator determines whether an image is real or synthetic. The input data x of the discriminator follow the probability distribution \({p}_{data}\). The generator \(G\left(z,{\theta }_{g}\right)\), with parameters \({\theta }_{g}\), maps input noise z drawn from the distribution \({p}_{z}\) to the data space, inducing the distribution \({p}_{g}\left(x\right)\). Similarly, the discriminator D, with parameters \({\theta }_{d}\), takes real and generated data and outputs a single scalar probability value. The GAN plays a minimax game in which D tries to maximise, and G tries to minimise, the probability that D classifies real and generated samples correctly, as represented by Eq. (1) [20].

$$\underset{G}{\mathrm{min}}\,\underset{D}{\mathrm{max}}\,V\left(G,D\right)={E}_{x\sim {p}_{data}}\left[\mathrm{ln}\,D\left(x\right)\right]+{E}_{z\sim {p}_{z}}\left[\mathrm{ln}\left(1-D\left(G\left(z\right)\right)\right)\right]$$
(1)
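To make Eq. (1) concrete, the value function can be estimated from a batch of discriminator outputs by replacing the two expectations with sample means. The sketch below is a toy illustration under that assumption; `d_real` and `d_fake` are hypothetical lists of discriminator probabilities for real samples and for generated samples, not outputs of any trained model.

```python
import math
from typing import Sequence


def gan_value(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Sample estimate of V(G, D) in Eq. (1).

    d_real: discriminator outputs D(x) for real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) for generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


if __name__ == "__main__":
    # A discriminator that separates real from generated samples well
    # yields a value close to 0, which is what D tries to achieve.
    print(gan_value([0.9, 0.95], [0.1, 0.05]))
    # A discriminator that cannot tell them apart outputs 0.5 everywhere,
    # giving ln(0.5) + ln(0.5) = -ln(4); G pushes the value towards this.
    print(gan_value([0.5, 0.5], [0.5, 0.5]), -math.log(4.0))
```

The value is bounded above by 0, and the case where D outputs 0.5 for every sample corresponds to a generator whose samples are indistinguishable from real data.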

For medical applications, GANs can be applied in two ways. The first is in the generative direction, where the generator produces new, realistic-looking synthetic images. The second is in the discriminative direction, where the trained discriminator (D) is employed as a detector. The main applications of GANs in the medical field are detection, segmentation, classification, reconstruction, registration, and image synthesis.

Deep Convolutional GAN (DCGAN) [96] was proposed in 2015. Both the generator and discriminator in DCGAN use deep convolutional networks and make use of hierarchical feature learning. The architecture is fully convolutional, without any max pooling. Batch normalization and the LeakyReLU activation function are used in this GAN architecture to enhance training.

Wasserstein-GAN (WGAN) [97] was proposed in 2017. It measures the divergence between the real and generated distributions using the Wasserstein distance. It is a GAN extension that uses an alternative training technique for better approximation. Although WGAN is quite simple to implement in practice, it suffers from slow optimisation.

PGGAN [98] can produce realistic, high-quality images. The basic process in a PGGAN is to start training at a very low resolution of 4 × 4 and then grow the model slowly and iteratively, adding layers and fine-tuning at resolutions that double at each step. To produce the lower-resolution training images, the input image is first centre cropped to the proper input resolution. Because the networks grow progressively, they can learn various image styles more easily. Instead of having to learn directly how to map a random noise latent vector to a high-resolution image of, say, 512 × 512, the networks pick up this mapping gradually by starting with small-scale images such as 4 × 4, 8 × 8, 16 × 16, and so on.
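The resolution schedule of progressive growing is easy to enumerate. The helper below is a hypothetical illustration that only lists the training resolutions, assuming the target is the starting resolution multiplied by a power of 2; it does not model the layer fade-in or any other part of PGGAN training.

```python
from typing import List


def progressive_resolutions(start: int = 4, target: int = 512) -> List[int]:
    """Resolutions visited when the image side length doubles at every stage."""
    if start <= 0 or target < start:
        raise ValueError("require 0 < start <= target")
    resolutions = []
    side = start
    while side <= target:
        resolutions.append(side)
        side *= 2
    if resolutions[-1] != target:
        raise ValueError("target must equal start times a power of 2")
    return resolutions


if __name__ == "__main__":
    schedule = progressive_resolutions(4, 512)
    print(schedule)
    print(f"{len(schedule)} training stages")
```

Going from 4 × 4 to 512 × 512 therefore takes eight stages, each of which only has to refine the detail added by one doubling of the resolution.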

Super-resolution GAN (SRGAN) [99] generates higher-resolution images using a deep network together with an adversary network. In comparison to a similar design without a GAN, SRGAN generates more visually appealing images with more detail. The generator upsamples low-resolution images to produce super-resolution (SR) images. The discriminator differentiates between real high-resolution (HR) images and generated images, and the GAN loss is backpropagated to train the generator.

Conditional GAN [100] was proposed in the year 2014. Since no explicit control over the data generation is provided in the original GAN, the conditional GAN (cGAN) includes extra information like class labels in the synthesis process. In the cGAN, the generator is given some prior knowledge c along with random noise z. Along with the corresponding real, generated data, the discriminator also receives the prior knowledge c. If a class label is given, it can be used to conditionally generate images of a specific type or class.

To transform images between two domains, a model should be able to extract distinctive features from each domain and identify the underlying relationship between them. CycleGAN [101] provides these mappings. To learn a mapping from domain X to domain Y and vice versa, the system essentially combines two GANs. It consists of a generator G: X → Y and a generator F: Y → X, trained against discriminators DY and DX, respectively. A cyclic loss function constrains the range of potential mapping functions of the two chained GANs: it minimises the difference between the original image and the reconstruction produced by the chained generators.
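The cyclic loss can be illustrated with toy generators acting on lists of pixel values. The sketch below assumes a mean absolute (L1) difference between an image and its reconstruction, summed over both directions; the two callables are hypothetical stand-ins for the generators G: X → Y and F: Y → X, and no adversarial term or loss weighting is included.

```python
from typing import Callable, List

Image = List[float]
Generator = Callable[[Image], Image]


def mean_abs_difference(a: Image, b: Image) -> float:
    """Mean absolute (L1) difference between two equally sized images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x: Image, y: Image,
                           g_xy: Generator, f_yx: Generator) -> float:
    """L1 reconstruction error of both cycles: x -> G -> F and y -> F -> G."""
    forward_cycle = mean_abs_difference(f_yx(g_xy(x)), x)
    backward_cycle = mean_abs_difference(g_xy(f_yx(y)), y)
    return forward_cycle + backward_cycle


if __name__ == "__main__":
    x_image, y_image = [0.0, 1.0], [2.0, 4.0]
    # Perfectly inverse toy mappings reconstruct both images exactly.
    print(cycle_consistency_loss(x_image, y_image,
                                 lambda im: [2.0 * v for v in im],
                                 lambda im: [v / 2.0 for v in im]))
    # Mappings that are not inverses of each other are penalised.
    print(cycle_consistency_loss(x_image, y_image,
                                 lambda im: [v + 1.0 for v in im],
                                 lambda im: [v - 0.5 for v in im]))
```

The loss is zero only when each generator undoes the other, which is what restricts the otherwise unconstrained set of mappings between the two domains.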

Pix2Pix is a highly effective cGAN variant for high-resolution image-to-image translation. The Pix2Pix [101] generator follows the U-Net architecture, while the discriminator uses a fully convolutional architecture to distinguish between real and generated high-resolution data. The skip connections inside the U-Net generator benefit the overall coherence of the synthesised images. Unlike the original GAN framework, Pix2Pix needs pairs of corresponding input and target output images. This makes it possible to stabilise training by using the L1 loss between the generator output and the ground-truth image.

3.1 Image synthesis for data augmentation

In image synthesis, there are two main categories: unconditional image synthesis and cross-modality image synthesis [102]. DCGAN, PGGAN, and WGAN are used for unconditional synthesis, where only the random noise vector is input to the generator and no condition vector is provided. DCGAN and WGAN can generally handle images of 256 × 256 resolution, whereas PGGAN generates high-resolution images. Table 1 shows unconditional image synthesis work in different imaging modalities. In cross-modality image synthesis, the images of one modality are generated from another modality (e.g., CT from MRI or vice versa). Pix2Pix GAN and CycleGAN are extensively used cross-modal image generators. Table 2 shows cross-modality medical image synthesis work in different imaging modalities, and Table 3 shows GAN-based segmentation in different imaging modalities of medical images.

Table 1 Unconditional medical image synthesis
Table 2 Cross modality image synthesis
Table 3 GAN based segmentation in different imaging modalities

3.2 Reconstruction

Radiation exposure is a main limitation of imaging modalities such as CT and X-ray. To limit it, the radiation dose is decreased, which amplifies noise and degrades the diagnostic details in the images [198]. Capturing a high-resolution MR image requires a long acquisition time [199], and limited spatial coverage results in lower-quality medical images. Image reconstruction is therefore needed. A GAN, which generates realistic-looking images, can be used for this purpose. Table 4 describes some of the reconstruction work done with GANs in medical applications.

Table 4 Reconstruction work by GAN in medical applications

3.3 Detection

Supervised deep learning algorithms for anomaly detection in medical images need a large set of annotated or labelled training images. For medical applications, such large labelled data sets are not readily accessible. Relying only on annotated data whose appearance does not vary during training limits the ability of supervised DL methods to detect anomalies. The new paradigm is GAN-based unsupervised anomaly detection. The pioneering work on AnoGAN [217] suggested that this idea could be helpful for anomaly detection in retinal OCT. Brain anomaly detection in MR images was implemented in [218] and [219]. Similarly, Alzheimer’s disease detection using VA-GAN (visual attribution GAN) was implemented in [220]. GANs have also been used to detect prostate cancer [221] and skin lesions [228], with the generator based on U-Net and CGAN, respectively. Table 5 summarises anomaly detection work.

Table 5 Anomaly detection by GAN in medical applications

3.4 Registration

Heavy optimisation load and parameter dependency are the drawbacks of traditional registration methods [230]. CNNs successfully align medical images in a single forward pass. Because of their strong image transformation ability, Generative Adversarial Networks are considered a candidate for extracting the optimal registration mapping. An unsupervised GAN was implemented for structural pattern registration in brain images; it does not require specific similarity metrics or ground-truth deformations [231]. An adversarial image registration framework was implemented for the registration of MRI and transrectal ultrasound, and the resulting image fusion helps in prostate interventions [232]. In the same way, [233] implemented GANs for deformation regularisation, which helps in training image registration.

3.5 Super-resolution (SR) methods

The main purpose of SR methods is to generate high-resolution images from low-resolution (LR) images. GAN-based techniques for improving the resolution of LR images learn patterns from the same region of paired low- and high-resolution training images. Low-resolution images are given as input to the generator, which generates synthetic SR images as output. The generated images and real high-resolution images are then given as input to the discriminator, which distinguishes their authenticity [234]. The Meta-SRGAN implemented in [235], a network that combines a Meta-Upscale Module with SRGAN, generates arbitrary-scale SR images of brain 2D MRI and performs well when compared to traditional methods. Rather than a single GAN, [236] implemented an ensemble model for SR knee MR image synthesis by training multiple GANs and merging their outputs into one final output. The ensemble model performed well in terms of peak signal-to-noise ratio (PSNR) and structural similarity index. SR methods have also been implemented for 3D image generation. An SRGAN-based network with enhanced up-sampling techniques is able to generate realistic synthetic images. The 3D-SRGAN implemented in [237] generates high-resolution images from low-resolution MR images of the brain. A multi-scale GAN with patch-wise learning was implemented to generate synthetic high-resolution 2D and 3D CT and X-ray thorax images. The GAN suppressed the artefacts that occur in patch-wise training and generated realistic 3D thorax CT images of 512 × 512 and thorax X-ray images of 2048 × 2048 [238]. SRGAN can also be used to generate high-dose CT images from low-dose images and to enhance brain MRIs.
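Peak signal-to-noise ratio, one of the two measures mentioned above, follows directly from the mean squared error between a reference image and a generated image. The sketch below assumes images flattened to equal-length lists and a maximum pixel value of 255 (8-bit data); both are illustrative assumptions rather than settings taken from the cited works.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], generated: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    high_res = [52.0, 55.0, 61.0, 59.0]
    off_by_one = [v + 1.0 for v in high_res]   # small reconstruction error
    off_by_ten = [v + 10.0 for v in high_res]  # larger reconstruction error
    print(f"{psnr(high_res, off_by_one):.2f} dB")
    print(f"{psnr(high_res, off_by_ten):.2f} dB")
```

Higher values indicate a closer match to the reference; because PSNR depends only on pixel-wise error, it is usually reported together with a structural measure such as the structural similarity index.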

3.6 De-noising

To reduce radiation exposure in CT and shorten acquisition time in MR imaging, Generative Adversarial Networks (GANs) have been implemented to reduce noise in CT and MR images captured under low-dose conditions. De-noising of low-dose single-photon emission computed tomography (SPECT) images was done using GANs [239]. CT images are expected to provide anatomic information, so it is very important to remove noise while preserving contrast and the shape of organs. To meet this need, GANs with perceptual loss have been implemented to generate high-dose abdominal CT images from normal dose and simulated four-dose images, with the perceptual loss evaluated using a pre-trained VGG network [240]. In another modified type of GAN, a sharpness detection network is added to calculate the sharpness of the denoised image [241]. The models were trained with paired high- and low-dose CT images and generate reduced-noise versions of the images. Jelmer and team [242] trained the model with paired low-dose and routine-dose CT images to generate synthetic noise-reduced images based on the low-dose images. GAN-based noise reduction helps in accurately quantifying coronary artery calcification from low-dose cardiac CT.

4 The datasets and evaluation indicators for various medical applications

Deep learning models have shown remarkable promise in healthcare and other domains, demonstrating that they are capable of performing tasks that humans could. But there are obstacles on the path to success. Large datasets are necessary for the training of deep learning algorithms. Deep learning’s applicability to medical image analysis has been limited by the lack of data. The expense of acquiring, annotating, and analysing medical images is high, and ethical restrictions limit their use. This makes it challenging for researchers who are not in the medical field to obtain huge amounts of relevant medical data. Thus, in an attempt to be as thorough as possible, this section of the paper presents a selection of medical imaging datasets for deep learning research (Table 6).

Table 6 Dataset for medical applications

During classifier training, the evaluation metric is essential to obtaining the best classifier. Choosing an appropriate metric is therefore crucial for discriminating between candidate classifiers and selecting the best one. Commonly used evaluation metrics that are particularly intended for classifier optimization [278] are:

  • Accuracy: The accuracy metric quantifies the proportion of accurate predictions to all instances examined.

  • Error Rate: The ratio of inaccurate predictions to the total number of instances evaluated is known as the misclassification error.

  • Sensitivity: Sensitivity quantifies the percentage of positive patterns that are appropriately classified.

  • Specificity: Specificity quantifies the percentage of negative patterns that are appropriately classified.

  • Precision: Precision measures the proportion of patterns predicted as positive that are actually positive.

  • Recall: Recall quantifies the percentage of positive patterns that are appropriately classified.

  • F1-score: The harmonic mean of the recall and precision values is represented by F1 score.

  • Geometric-mean: This measure is used to maintain a somewhat balanced true positive and true negative rate while optimizing both rates.

  • Averaged Accuracy, Averaged Error Rate: Average accuracy and error of all classes.

  • Averaged Precision, Averaged Recall, Averaged F1-Measure: Average of per-class precision, Recall, F1-score.
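The binary metrics in the list above can all be derived from the four entries of a confusion matrix. The sketch below assumes counts of true positives (tp), false positives (fp), true negatives (tn), and false negatives (fn); returning 0.0 for an undefined ratio is a convention chosen here for illustration.

```python
import math
from typing import Dict


def safe_ratio(numerator: float, denominator: float) -> float:
    """Ratio that falls back to 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Classification metrics computed from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = safe_ratio(tp + tn, total)
    sensitivity = safe_ratio(tp, tp + fn)   # identical to recall
    specificity = safe_ratio(tn, tn + fp)
    precision = safe_ratio(tp, tp + fp)
    f1 = safe_ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "error_rate": safe_ratio(fp + fn, total),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f1": f1,
        "g_mean": math.sqrt(sensitivity * specificity),
    }


if __name__ == "__main__":
    # Hypothetical counts for a 100-image test set.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name:12s} {value:.3f}")
```

The averaged multiclass variants in the last two bullets are obtained by computing these quantities per class, treating each class in turn as the positive class, and averaging the results.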

Artificial intelligence research has grown rapidly over the past years due to deep learning models, particularly in the area of medical image segmentation. Commonly used evaluation metrics for segmentation [279] are: Dice Similarity Coefficient (DSC), Intersection-over-Union (IoU), Sensitivity, Specificity, Accuracy, Receiver Operating Characteristic (ROC), Area Under the ROC Curve (AUC), Cohen’s Kappa (Kap), and Average Hausdorff Distance (AHD).

5 Discussion

The main purpose of this work is to review deep learning model applications in classification, segmentation, detection, restoration, and registration of medical images, together with GAN applications such as data augmentation, segmentation, reconstruction, detection, denoising, and registration.

Deep learning models are most widely used for medical image classification and segmentation, and many works have been published in this area. For example, breast lesion segmentation and classification by an automated CNN approach were successfully implemented in [280]. Similarly, segmentation of cone-beam CT for oral lesion detection by a DL model was implemented in [281]. In classification applications, CNN-based DL models have made steady progress, and their success has led researchers to explore them across a range of medical image classification tasks. For instance, automatic CNN-based classification of anatomical location and medical image modality achieved very good results [282]. Similarly, lung nodule classification using a DL model [283], breast cancer classification [284], MRI brain tumour classification [285], shoulder fracture detection [286], COVID-19 detection [287], and cardiomyopathy classification in MRI [288] have also been implemented successfully. The remaining applications of DL models in the medical field, relating to detection, restoration, and registration, are also evolving areas.

Lately, the number of medical applications implementing GANs has increased remarkably. A major portion of GAN work addresses medical image synthesis, both within a modality and across modalities, indicating that image synthesis is the most important use of GANs in medical applications. The literature shows that, among all imaging modalities, MR imaging is the one most often explored with GANs. MRI acquisition requires a large amount of time, which may be the main reason for the remarkable interest in using GANs for MRI: GANs generate synthetic MRI sequences from acquired images, which reduces image acquisition time. Other popular medical applications of GANs include segmentation and reconstruction frameworks. Strong texture and shape regularisation is imposed on the generator output, which results in promising performance in both tasks. For instance, an adversarial loss improves 3D liver segmentation on non-contrast CT more than CRF and graph cut do [289]. Further, the applications that used GANs to augment data for classification focused mostly on generating synthetic objects such as fractures, lesions, nodules, and cells. Training a convolutional neural network (CNN) relies on a large data set to improve the generalisation of the network and reduce overfitting. Traditional data augmentation techniques such as rotation, flipping, and colour jittering are not as effective as data augmentation by GANs, possibly because of the smaller distribution variation in the synthetically generated images compared to real ones. For example, GAN-generated chest X-rays [290] have been used in the detection of pneumonia and COVID-19. The remaining applications of GANs in the medical field, relating to registration, reconstruction, detection, denoising, and SR, are so limited that it is difficult to draw any conclusions.

6 Conclusion

A clinical decision support system for medical image analysis is the need of the hour. This paper presents the details and strategies of Deep Learning and Generative Adversarial Networks for medical image analysis in CADs. It has two main objectives: the first is to review deep learning models for medical image analysis, and the second is to review generative adversarial networks in medical image analysis. Successful DL models were reviewed in different medical image applications, namely classification, segmentation, detection, restoration, and registration. DL-based models achieved good results in classification, segmentation, and detection, which are their most common uses in medical image applications. Various solutions exist for medical challenges. Although some issues in medical image applications still need to be addressed with DL models, numerous current DL implementations, including supervised, semi-supervised, and unsupervised models, are slowly developing towards handling real data without manual labelling. DL models aim to help radiologists make clinical decisions, and automating the radiologist workflow with DL models can ease decision-making. DL models can also aid physicians by automatically classifying and identifying lesions, minimising medical errors, and reducing interpretation time. In the next few decades, DL-based CADs utilising medical images will be widely used for patient care. Hence, scientists, radiologists, and physicians are looking for ways to provide good patient care with the aid of DL models. Because labelled data sets are of limited availability, weakly supervised and unsupervised techniques are emerging areas of research in DL-based medical image analysis.

Similarly, different Generative Adversarial Network (GAN) architectures were implemented as powerful tools for medical imaging applications. GANs have been applied to data augmentation, segmentation, reconstruction, detection, denoising, and registration of medical images. Short image acquisition time, low-dose imaging, and preserved image quality were marked as clinically important achievements. Domain adaptation that reuses available expertise is needed as a quick solution to emerging issues. Further advances in network models and computational power will permit new applications that deal with higher-dimensional images, such as temporal and volumetric imaging. Overall, deep learning and generative adversarial networks are novel, fast-developing fields in medical image analysis that present many challenges, opportunities, and solutions.