1 Introduction

Deep Learning (DL) has exhibited exceptional performance in medical image analysis tasks in recent years [1], for instance in classifying lung and colon cancer [2], predicting cervical cancer prognosis [2], improving hepatocellular carcinoma fatality prognosis [3], and grading invasive ductal carcinoma of the breast [4]. However, all DL models require extensive datasets with diverse images and accurate annotations. Acquiring and annotating fundus images is time-consuming, tedious, expert-dependent, and expensive. Therefore, researchers opt to use generative models such as Generative Adversarial Networks (GAN) [5] and Variational Autoencoders (VAE) [6] to artificially generate new synthetic images. These models form a distinct area of DL research and have demonstrated immense potential for integrating medical domain knowledge into DL models [7].

A GAN consists of two neural networks, a generator (G) and a discriminator (D), trained simultaneously in a min–max game: G learns to map noise (z) to realistic images that follow the distribution of the real images and can deceive D, while D is a binary classifier that distinguishes real (1) from fake (0) samples. The two networks improve until they reach a convergence point (Nash equilibrium), where neither can improve further.
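As an illustration of this min–max game, the following minimal sketch (written in TensorFlow/Keras purely for exposition, not taken from any cited implementation) shows one adversarial training step in which D is updated to separate real from fake samples and G is updated to fool D:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def gan_train_step(generator, discriminator, g_opt, d_opt, real_images, batch_size, z_dim=100):
    z = tf.random.normal([batch_size, z_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(z, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # D: real -> 1, fake -> 0; G: make D output 1 for its fakes.
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    return g_loss, d_loss
```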

A VAE consists of an encoder and a decoder neural network. The encoder maps an input image (x) into a latent representation (z), while the decoder reconstructs from the latent representation an image that should be as close as possible to the input. GAN and VAE have been combined to enhance the vanilla VAE [8], and both combined and single models have been used to address the shortage of medical images. However, these models have limitations, as shown in Fig. 1 and discussed here: 1) Synthetic images lack vessel abundance and exhibit extreme tortuosity, with vessels emerging from nowhere [9]. 2) Vessels are gathered in two main arcades and weaken as they extend further [10]. 3) Inability to produce extremely thin vessels [11]. 4) Missing or duplicated optic disc in the synthetic images [10, 11]. 5) The produced images include hazy disc boundaries [10, 12, 13]. 6) Limited number of generated/synthesized images lacking anatomical characteristics [9, 10]. 7) Synthetic image quality inferior to genuine images [9,10,11,12,13].

Fig. 1
figure 1

Samples of synthetic images by [10, 11] and [9] ordered from left to right, respectively

Models in literature are categorized into two groups based on their architectural design. The first group utilizes two-stage pipelines that combine VAE-GAN or GAN-GAN architectures for generating fundus images, as seen in [9,10,11, 13]. The second group uses a single GAN architecture to synthesize retinal images, as demonstrated in [14,15,16,17]. Despite their exceptional performance, there is still scope for improvement, as mentioned earlier. The primary reasons behind these limitations are:

First, to generate high-quality synthetic images with complex structures like vessels, macula, and optic disc, [18] recommends using the same number of iterations (t) for both G and D, with t > 4. Increasing t to train D more than G can also improve synthetic image quality, as observed in [18]. A competitive D is crucial for a robust GAN model, as emphasized by [19]. [17] proposes a different training strategy, with G using double the iterations of D to balance the learning flow. In contrast, the vanilla GAN [5] trains D for double the iterations of G during learning. Balancing the learning process between the two players is critical and can lead to collapse if not handled properly, as mentioned in [20].

Balancing the training process becomes even more challenging in two-stage pipelines, where two models learn simultaneously and the second model depends entirely on the output of the first [16]. If the first model produces low-quality images, the performance of the second model suffers, reducing the performance of the entire pipeline. [10, 21] depend on the segmentation efficiency of the first model, which leads to visual artifacts when vessel structures are under-segmented.

Second, VAEs produce blurry images during reconstruction [13]. Recent studies such as [10, 22, 23] suggested a hybrid VAE-GAN method, with the GAN discriminator replacing the VAE's decoder to improve the loss function calculation. However, the images produced by the modified VAE may still exhibit blurriness and dotted structures (Fig. 2), requiring pre-processing before being fed into the GAN architecture.

Fig. 2
figure 2

The blurriness and dotted structures in images generated using the hybrid VAE-GAN methods; real images at the top and reconstructed images below

Third, some studies [11, 21] rely on pre-existing images for model training, whereas the generator should learn from a regularized latent space to produce an unlimited number of synthesized images, independent of a specific image count.

Fourth, using a single GAN model, as in [15, 16], is challenging because a GAN's latent space is more difficult to control than a VAE's. Additionally, GANs lack the continuity properties required for certain applications [24].

Lastly, some existing methods [11, 13, 17, 18] use a single mask for synthesizing images, causing the models to prioritize the mask and ignore other fundoscopic characteristics. This leads to blurry shapes, unclear optic disc boundaries, and a lack of anatomical detail in the resulting images.

This paper aims to combine VAE and GAN architectures to address the limitations of the individual generative methods. The advantage of combining the two architectures lies in the capability of VAEs to represent images in a latent space with large variability, and in the capability of GANs to produce sharp images with high resolution and good perceptual quality [25, 26]. These two strengths are employed in a complementary manner to address the blurred, low-quality images generated by VAE architectures [13], the difficulty GANs have in capturing the full data distribution [27], and the complexity of controlling the latent space for better generality [28]. Therefore, in this work we propose multiple VAEs to synthesize the vessel structure and optic disc separately, fuse the generated masks and manipulate them with the SVV function before feeding them to an image-to-image translation network [29]; lastly, we compare the synthesized pairs with real pairs using a GAN architecture.

The significant contributions of this work are summarized as follows:

  • A new SVV layer assists in sharpening and varying the vessels' abundance in the reconstructed tubular structure.

  • To the best of our knowledge, this work is the first to synthesize fundus images while controlling the complexity of the vessel structure.

  • Multiple landmarks are involved in the synthesis process.

  • Extensive experiments demonstrate the effectiveness of the proposed method, including visual assessment, qualitative evaluation using the SSIM similarity index, and quantitative evaluation using downstream segmentation and classification tasks.

The paper is organized as follows: Sect. 2 provides a review of related literature, Sect. 3 details the proposed method, and Sect. 4 explains the environment setup, datasets, framework architecture, training strategy, and evaluation metrics. Section 5 presents the qualitative and quantitative evaluations, and Sect. 6 discusses the study's limitations.

2 Related Literature

This section focuses on the generative models used to synthesize retinal images and discusses their architectures, datasets, and evaluation metrics, in addition to the pros and cons of their methodologies.

[30] proposed a vessel generation method producing high-quality images, validated qualitatively by experts' visual perception scores on 12 synthetic retinal fundus images and quantitatively using the VAMPIRE segmentation algorithm with 10 synthetic and real images from the HRF dataset. They were able to generate retinal images without employing deep learning methods; however, complex computations were required due to the large image resolution. Furthermore, because the proposed method lacks a convolutional backbone, the model has limited capability to extract deep and complex features, and as a result the quality of the synthetic images is inferior to realistic images.

[17] proposed the MI-GAN framework for two tasks: synthesizing retinal vessels from only a few samples and segmenting real/fake images. They modified the generator objective by replacing the L1 function with cross-entropy loss plus the sum of style loss, content loss, and total variation loss. They validated their method by incorporating different discriminators (i.e., patch GAN and image GAN) in a segmentation task. Their experiments on the DRIVE and STARE datasets showed that their method outperformed existing methods and surpassed expert ability. Although they were able to produce an unlimited number of synthetic images from the same input using a small training set (only tens of images), their method increases the rate of false negatives at vessel edges and endpoints, as it tends to assign low probabilities to pixels within uncertain areas.

Similarly, [15] developed Tub-sGAN, a GAN framework that synthesizes multiple images from a single binary vessel segmentation input using style transfer (including style loss, content loss, and total variation loss). Tub-sGAN can learn from small training sets of fewer than ten images and was trained on four datasets (DRIVE, STARE, HRF, and NeuB1). Downstream segmentation tasks and SSIM [31] were used to validate the synthetic images. Although their synthetic images excel in preserving proper connectivity of the vessel trees, and their model can generate different outputs from the same tubular structured annotations, it is relatively weak at synthesizing local details, such as exudate regions. Furthermore, some anatomical details are less than perfect: the boundaries of the optic disc are often not as clear as those of the real images, and the macular region is sometimes not entirely accurate.

[16] proposed a glaucoma assessment method using a retinal image synthesizer and semi-supervised learning with DCGAN [32]. Their method was trained on a small glaucoma-labelled dataset and a large unlabeled dataset comprising 86,926 cropped retinal images from 14 datasets. To validate their approach, they performed a quantitative evaluation by comparing the pixel proportions of optic disc and vessel network structures in real and synthetic images, as well as the 2D histogram and mean squared error between the two. This work is the first to use a semi-supervised learning method and a retinal image synthesizer to generate an unlimited number of glaucoma-labeled images. Their method can generate images synthetically and provide the labels automatically. Although the number of retinal images used during training is significantly greater than in any other work in the literature, they were unable to generate synthetic images better than DCGAN [32] or Costa's method [10].

[21] introduced a novel image synthesis method based on image-to-image translation [29] and adversarial learning. They used a U-net architecture to extract a binary vessel tree from the actual fundus image and trained a pix2pix network on image pairs to map the binary vessel map to a retinal image using global L1 and GAN adversarial losses. The Messidor-DB1 dataset was used for training, and evaluation metrics included Qv [33] and Image Structure Clustering (ISC) [34] scores. The synthetic images exhibit noticeable diversity in prominent visual characteristics, such as colour, tone, and illumination. However, the primary drawback of their method is its reliance on an existing vessel tree to generate a new image. Additionally, if the vessel tree is obtained through a segmentation technique applied to the original image, any weaknesses inherent in the segmentation algorithm are carried over to the synthesized image.

Moreover, [10] developed an end-to-end retinal image synthesizer using Adversarial AE and GAN architecture. The model can generate fundus images by sampling the latent space from a probability distribution. The model was trained on the Messidor-1 dataset and validated visually and quantitatively using segmentation models and specific evaluation metrics, including Mutual Information (MI) to measure information overlap between real and synthetic blood vessels, and the Image Structure Clustering (ISC) metric to assess relevant retinal anatomical structures. Despite the ability of their method to generate realistic synthetic images that significantly deviate from the examples in the training set, with smooth variations in color and texture, and accurately placed optic discs, the resulting images still exhibit artifacts like broken tubular structures and chessboard patterns.

[9] proposed a two-stage pipeline method using DCGAN and pix2pix GAN architectures to synthesize vasculature and retinal images. The first architecture was trained on the DRIVE dataset to generate vasculature images from noise, while the second architecture used the Messidor-1 dataset to generate corresponding retinal images. Synthetic images were validated using a U-Net segmentation model trained on real images from DRIVE and on synthetic images, with the F1-score used to assess segmentation results. The Kullback–Leibler divergence (KL) score was also used to show that the synthetic images were distinct from the actual images and that the model did not memorize the training data. Their method produced large quantities of images that are publicly available for use in data-driven machine learning tasks. However, their synthetic images exhibit artifacts, extremely tortuous vessels, missing optic discs or maculae, hazy optic discs with unclear boundaries, and vessels that emerge from nowhere.

Other studies, such as [35], have worked on synthesizing digital camera noise to generate realistic images. This approach is based on a conditional GAN training scheme that uses style loss to supervise generator training and injects Gaussian noise into each decoder block. The approach looks attractive; however, it has not yet been adopted in ophthalmology to generate retinal images. Furthermore, [36] presented a latent diffusion model that synthesizes high-resolution images quickly and efficiently by applying the diffusion model in the latent space of powerful pre-trained autoencoders. This approach preserves image details and reduces training time complexity. Notably, it has not been exploited in retinal image generation.

In conclusion, there is no specific evaluation metric for synthesized retinal images, but most studies evaluate them through downstream segmentation tasks and specific metrics. Other methods, such as similarity measurements or expert assessment, can also be used. Figure 3 shows a taxonomy of evaluation methods used in related literature and specifies the evaluation metrics followed in this study.

Fig. 3
figure 3

A taxonomy of image quality assessment methods used in the literature. Entries in red are review/survey studies that recommend these methods, while entries in blue are studies that applied them. The green ticks indicate the methods used in this work

3 Method

We propose a multi-stage pipeline for retinal image synthesis. The framework consists of two VAE architectures and a GAN architecture to generate realistic fundus images with vessel tree and optic disc masks. The synthetic images were evaluated qualitatively and quantitatively to demonstrate the usefulness of the proposed method. The framework architecture, presented in Fig. 4, is divided into three parts: blood vessel and optic disc synthesis, masks-to-retinal-image translation, and latent space to retinal image synthesis.

Fig. 4
figure 4

The proposed framework consists of three generative models. VAEBV and VAEOD (purple rectangles) generate the vessel tree and optic disc masks, respectively. The pix2pix GAN architecture (blue rectangle) produces the synthetic image, and the SVV layer (red rectangle) sharpens the generated vessels and controls their diversity

3.1 Blood Vessel and Optic Disc Synthesis

Generating realistic vessel trees and optic disc masks in addition to the synthesized fundus image is a fundamental aspect of an end-to-end fundus image synthesis system. In this section, the VAEs generate unlimited blood vessels and optic disc masks with plausible anatomical structures and high variability.

The VAE architecture encodes training images into a latent representation, z ∼ Q(x) = q(z|x), using an encoder network (Q) and a decoder network (P). The encoder network's parameters (θ1) are trained to produce a distribution Q(x, θ1) from which a latent variable z can be sampled. The decoder network's parameters (θ2) are trained to reconstruct the latent variable back to an image, ẍ ∼ P(z) = p(x|z), belonging to the input data distribution. However, to overcome the difficulty VAEs have in generating new samples close to the real ones, caused by the uncontrollable latent representation space, we applied a modification proposed by [37], which combines VAE with GAN by using the VAE encoder as the generator component in the adversarial game. This better regularizes the generative model and enforces the generator network to follow the pre-specified prior distribution when producing latent representations. The discriminator is trained to distinguish whether a sample comes from the latent representation or from the true normal distribution. Figure 5 shows the modified VAE architecture.
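The following sketch illustrates this idea under illustrative layer sizes (the 32-dimensional latent space matches the value reported later in Sect. 4.2); it is not the released implementation. The encoder outputs the mean and log-variance of q(z|x), a reparameterized sample is drawn, and a small discriminator judges whether a latent vector comes from the encoder or from the prior:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

LATENT_DIM = 32  # 32-dimensional latent space, as reported for the framework

def build_encoder(img_shape=(256, 256, 1)):
    x_in = layers.Input(shape=img_shape)
    h = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x_in)
    h = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(h)
    h = layers.Flatten()(h)
    mu = layers.Dense(LATENT_DIM)(h)          # mean of q(z|x)
    log_var = layers.Dense(LATENT_DIM)(h)     # log-variance of q(z|x)
    return Model(x_in, [mu, log_var], name="encoder")

def build_latent_discriminator():
    z_in = layers.Input(shape=(LATENT_DIM,))
    h = layers.Dense(64)(z_in)
    h = layers.LeakyReLU()(h)
    out = layers.Dense(1, activation="sigmoid")(h)  # prior sample vs. encoder code
    return Model(z_in, out, name="latent_discriminator")

def sample_z(mu, log_var):
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps   # reparameterization trick
```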

Fig. 5
figure 5

Combination of VAE and GAN to generate vessel tree and optic disc

Therefore, inspired by the original GAN objective in Eq. (1), the modified GAN equation replaces G with the encoder q(z|x) of the VAE, as in Eqs. (2) and (3):

$$\min_{G}\max_{D} V(D,G) = E_{x\sim p_{data}(x)}\left[\log D(x)\right] + E_{z\sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
(1)
$$L_{BV}(D_{BV}, q) = E_{z\sim p(z)}\left[\log D_{BV}(z)\right] + E_{x\sim p_{data}(x)}\left[\log\left(1 - D_{BV}(q(z|x))\right)\right]$$
(2)
$$L_{OD}(D_{OD}, q) = E_{z\sim p(z)}\left[\log D_{OD}(z)\right] + E_{x\sim p_{data}(x)}\left[\log\left(1 - D_{OD}(q(z|x))\right)\right]$$
(3)

Both q and p weights are updated to minimize the reconstruction error while maximizing the error rate of D. By adding the reconstruction loss and the Kullback–Leibler divergence loss (L_KL) specified in Eq. (4) to Eqs. (2) and (3), the final loss function of the adversarial autoencoder VAE_BV/VAE_OD that controls the learning process becomes a combination of \({L}_{BV}/{L}_{OD}\), the reconstruction loss, and \({L}_{KL}\), as follows:

$$L_{KL} = D_{KL}\left(q(z|x)\,\|\,p(z)\right),$$
(4)
$$L_{VAE\_BV}(D_{BV}, q, p) = L_{BV}(D_{BV}, q) + \yen\, L_{Rec}(q, p) + L_{KL},$$
(5)
$$L_{VAE\_OD}(D_{OD}, q, p) = L_{OD}(D_{OD}, q) + \yen\, L_{Rec}(q, p) + L_{KL},$$
(6)

As in the work by [10], \(\yen\) is set to 100 to balance the two loss terms. The training mechanism mimics the min–max game of the vanilla GAN: both \(q\) and \(p\) aim to minimize the overall loss, while DBV/DOD maximize it. Once Nash equilibrium is achieved in Eqs. (5) and (6), the decoder (\(p\)) can synthesize a vessel tree/optic disc mask from the latent distribution.
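A hedged sketch of the per-branch objective in Eqs. (2)–(6) is given below; the function and argument names, and the L1 choice for the reconstruction term, are illustrative assumptions, while the reconstruction weight of 100 follows the text above:

```python
import tensorflow as tf

RECON_WEIGHT = 100.0  # the weight applied to the reconstruction term in Eqs. (5)-(6)

def vae_branch_loss(x, x_recon, mu, log_var, d_prior_score, d_code_score, eps=1e-8):
    # Adversarial term of Eqs. (2)/(3): D_BV / D_OD scores for prior samples
    # and for encoder codes q(z|x); D maximizes it, encoder/decoder minimize it.
    adversarial = tf.reduce_mean(tf.math.log(d_prior_score + eps)) + \
                  tf.reduce_mean(tf.math.log(1.0 - d_code_score + eps))
    # Reconstruction term L_Rec (L1 chosen here for illustration).
    reconstruction = tf.reduce_mean(tf.abs(x - x_recon))
    # L_KL of Eq. (4), pushing q(z|x) toward the standard normal prior.
    kl = -0.5 * tf.reduce_mean(1.0 + log_var - tf.square(mu) - tf.exp(log_var))
    return adversarial + RECON_WEIGHT * reconstruction + kl
```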

For sharper and cleaner synthesized vessels, an additional layer named sharpening and varying vessels (SVV) is placed right after the modified VAEBV. The SVV layer sharpens blurred pixels and varies the abundance of vessels in the reconstructed vessel tree; Fig. 6 shows the SVV layer attached to VAEBV. The SVV layer includes a \((9\times 9)\) filter that sharpens the input image through a convolution, retaining the highest pixel values and enhancing spatial resolution by emphasizing the boundaries of each pixel. The sharpening is followed by batch normalization and a sigmoid activation function. The resulting image is then passed through a lambda function, in which a factor Ұ is randomly assigned at each iteration to regulate the number of generated vessels and introduce variability in the output, as follows:

$$\ddot{x}(i,j)=\begin{cases}\ddot{x}(i,j), & \text{if } \ddot{x}(i,j) \ge \text{Ұ}, \\ 0, & \text{otherwise},\end{cases}$$
(7)

where Ұ is randomly set between 0.2 and 0.4.

Fig. 6
figure 6

Sharpening and varying layer attached to the VAEBV

In this equation, ẍ is the generated vessel tree, (i, j) are the pixel coordinates, and Ұ controls the complexity of the generated vessel structure; its value is randomly drawn between 0.2 and 0.4. As shown in Fig. 11, if Ұ is set to 0.2, fewer vessels are generated, while if Ұ is set to 0.4, too many vessels are generated. The optimal Ұ value for achieving a realistic appearance of the vessel structures is 0.3. The threshold value of Ұ therefore controls the diversity of the vessel structures: a low threshold lets only a few vessel pixels pass, resulting in a sparse vessel structure, whereas a high threshold lets a larger number of vessel pixels pass, resulting in a more abundant vessel structure.
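The following sketch illustrates the SVV idea of sharpening followed by random thresholding (Eq. 7); the kernel size and layer ordering follow the description above, while the learned convolution weights and exact composition are assumptions rather than the released implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers

def svv_block(x, y_min=0.2, y_max=0.4):
    """x: reconstructed vessel mask of shape (batch, H, W, 1), values in [0, 1]."""
    # 9x9 sharpening convolution followed by batch normalization and a sigmoid,
    # as described above (a learned kernel is used here for illustration).
    x = layers.Conv2D(1, kernel_size=9, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("sigmoid")(x)

    # Lambda thresholding of Eq. (7): a random factor drawn in [y_min, y_max]
    # keeps strong vessel pixels and zeroes the rest, varying vessel abundance.
    def threshold(t):
        y = tf.random.uniform([], y_min, y_max)
        return tf.where(t >= y, t, tf.zeros_like(t))

    return layers.Lambda(threshold)(x)
```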

To evaluate the effectiveness of the proposed SVV layer, a visual comparison was performed with various image processing techniques (see Fig. 7). Considering that a single layer within a model typically performs less effectively than a dedicated deep learning model designed specifically for image enhancement, we compared the proposed layer with traditional sharpening techniques rather than with other deep learning-based methods that focus primarily on image sharpening. The primary goal of this study is to generate realistic fundus images, not to enhance noisy or blurred images.

Fig. 7
figure 7

Performance of SVV layer for sharpening vessels compared to other sharpening filters. SE stands for Sharpening Estimation value

Furthermore, we utilized the sharpness estimation equation proposed by [38] to evaluate the sharpness of the images. As depicted in Fig. 7, the reconstructed vessels display some blurriness and fuzzy texture, primarily attributed to the inherent limitations of the VAE architecture. Taking the score of the reconstructed vessels as the reference for sharpness estimation, we observed that the SVV layer yielded the closest score to the reference images while maintaining the appearance and continuity of the vessels. Although other sharpening filters produced higher estimation scores than the SVV layer, they often resulted in corrupted vessel structures or influenced pixel contrast, forming halos around the vessels.

3.2 Masks to Retinal Image Translation

This section involves training the model to generate a retinal image from existing vessel tree and optic disc masks. This is done through an image-to-image translation process based on the method proposed in [29]. The approach involves two adversarial neural networks that emulate the GAN competition. G is trained to map the merged vessel tree and optic disc masks (x) to a new representation (r) while maximizing the misclassification error to deceive D. D, in turn, aims to distinguish real from generated images (Eq. 1). The adversarial loss is then formulated accordingly.

$$L_{adv}(G, D) = E_{x,r\sim p_{data}(x,r)}\left[\log D(x,r)\right] + E_{x\sim p_{data}(x)}\left[\log\left(1 - D(x, G(x))\right)\right]$$
(8)

The term "pdata(x)" refers to the distribution of real vessel trees, while "Ex,r∼pdata(x,r)" denotes the expectation over pairs (x, r) that are sampled from the joint data distribution "pdata(v, r)".

Recent research [14, 29] has shown that combining the global L1 loss function with Eq. (8) results in visually sharp images. The adversarial loss penalizes smoothed regions and promotes sharp images, while the L1 loss preserves global consistency. To achieve this, Eq. (8) is extended with the L1 loss as follows:

$$L_{Pix2Pix}(G, D) = L_{adv}(G, D) + \delta\, E_{x,r\sim p_{data}(x,r)}\left[\left\| r - G(x)\right\|_{1}\right],$$
(9)

The variable "δ" balances the two losses and is set to 100, as in the original paper by [10]. The G aims to maintain global regularity and consistency in visual features with the help of the L1 loss function. Meanwhile, the D is trained to differentiate between real and generated N x N image regions. The image-to-image translation problem is a part of the overall framework, and its architecture is depicted in Fig. 8.

Fig. 8
figure 8

G converts the vessel tree to a colored retinal image, while D identifies synthetic and real pairs

3.3 Latent Space to Retinal Image Synthesis

This section combines the autoencoder models with the image-to-image translation model to create the proposed framework, which generates a retinal image with vessel tree and optic disc masks from a random sample. To produce a realistic retinal image, balancing the training of these models is critical. We train all models simultaneously and sum their loss functions. The adversarial loss of the GAN, defined in Eq. (8), is used with a fake image input generated by the autoencoders instead of the usual GAN setting. As all the models' tasks are interconnected, they act together as the G component and are trained to deceive D by generating a plausible retinal image (r) that implicitly contains a plausible vessel tree and optic disc, as follows:

$$\acute{L}_{adv}(G, D) = E_{x,r\sim p_{data}(x,r)}\left[\log D(x,r)\right] + E_{x\sim p_{data}(x)}\left[\log\left(1 - D(x, p(q(v)))\right)\right]$$
(10)

Then the modified loss function of the image-to-image translation model defined in Eq. (9) is updated as follows:

$$\acute{L}_{Pix2Pix}(G, D) = \acute{L}_{adv}(G, D) + \delta\, E_{x,r\sim p_{data}(x,r)}\left[\left\| r - p(q(v))\right\|_{1}\right]$$
(11)

Lastly, the modified loss functions in Eqs. (10) and (11) are combined with the VAE loss functions defined in Eqs. (5) and (6) to form the global loss of the entire framework, as follows:

$$L_{Global}(G, D, D_{BV}, q_{BV}, p_{BV}, D_{OD}, q_{OD}, p_{OD}) = \acute{L}_{Pix2Pix}(G, D) + L_{VAE\_BV}(D_{BV}, q, p) + L_{VAE\_OD}(D_{OD}, q, p)$$
(12)

In this equation, D, DBV, and DOD aim to maximize the loss, while G, qBV, pBV, qOD, and pOD aim to minimize it. Joint training helps D improve VAEBV and VAEOD, and G benefits both VAEs by producing a realistic fundus image with the SVV layer, which maximizes D's classification error. Figure 9 depicts the model combination.

Fig. 9
figure 9

The combined models and the SVV layer. VAEBV, VAEOD, and the GAN aim to minimize the loss between (x, r) and their outputs while maximizing the classification error of D, DBV, and DOD with the SVV layer. DBV/DOD distinguish p(z) samples from the encoders' latent representations, while D distinguishes synthetic from real pairs

4 Implementation and Training

4.1 Dataset

The proposed framework was trained on the publicly available Messidor-1 dataset [39], which contains 1200 fundus images with four grades of diabetic retinopathy. As there is no ground truth for the blood vessels, a U-net model trained on the DRIVE dataset [40] was used to extract the vessel trees. 254 images from Messidor-1 were excluded due to advanced diabetic retinopathy. The remaining 946 retinal images were downscaled to 256 × 256 and randomly split into training (614), validation (155), and testing (177) sets.
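A minimal sketch of this preparation step is shown below; the file layout and random seed are placeholders, while the 256 × 256 resizing and the 614/155/177 split follow the description above:

```python
import glob
import random
import numpy as np
from PIL import Image

paths = sorted(glob.glob("messidor1_filtered/*.png"))  # the 946 retained images (placeholder path)
random.seed(42)                                         # illustrative seed
random.shuffle(paths)

def load(path):
    return np.asarray(Image.open(path).resize((256, 256)), dtype=np.float32) / 255.0

train = [load(p) for p in paths[:614]]
val   = [load(p) for p in paths[614:769]]   # 155 validation images
test  = [load(p) for p in paths[769:]]      # 177 test images
```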

4.2 Models' Architecture

The proposed framework has the same architecture as [10], with eight blocks in the encoders of both VAEBV and VAEOD. Each block has two convolutional layers with the same kernel size and different strides, except for the last block, which has only one convolutional layer. Figure 10 shows the block architecture for both VAEs. Dropouts with 0.5 were used in the 5th, 6th, and 7th layers after every activation function. Each encoder outputs two fully connected layers for mean μ(x) and standard deviation σ(x) with 32 units. The decoder has the same architecture as the encoder but with upsampling layers and a fully connected input layer to receive the encoder's output.
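One encoder block can be sketched as follows; the kernel size, filter counts, and activation are assumptions made for illustration, while the two-convolutions-per-block design, the stride difference, and the 0.5 dropout follow the description above:

```python
from tensorflow.keras import layers

def encoder_block(x, filters, use_dropout=False, kernel_size=3):
    # Two convolutions with the same kernel size but different strides,
    # with 0.5 dropout after the activations in the blocks where it applies.
    x = layers.Conv2D(filters, kernel_size, strides=1, padding="same")(x)
    x = layers.LeakyReLU()(x)
    if use_dropout:
        x = layers.Dropout(0.5)(x)
    x = layers.Conv2D(filters, kernel_size, strides=2, padding="same")(x)
    x = layers.LeakyReLU()(x)
    if use_dropout:
        x = layers.Dropout(0.5)(x)
    return x
```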

Fig. 10
figure 10

Encoder and decoder architectures for each VAE. Letters ‘C’ and ‘Up’ denote convolutional and upsampling processes, respectively

The GAN architecture in this work is based on [29], which uses a U-net with 8 blocks for both the encoder and decoder. Each block includes a (3 × 3) convolutional layer, followed by batch normalization and LeakyReLU activation. Dropouts were used in the first three blocks, and a sigmoid activation function was used in the final block. The discriminator D has the same architecture as the G encoder and is used to classify 16 × 16 patches. The DBV/DOD consists of two fully connected layers with LeakyReLU and sigmoid activation functions, respectively.

Properly balancing the training of the GAN and VAE architectures is crucial. Poor tuning may result in noisy images lacking complex structures and in a discriminator unable to learn distinguishable features. Additionally, the discriminator could become stronger than the generator and remain unaffected by the small changes made to synthetic images. The typical training method for the vanilla GAN [5] did not produce satisfactory results in our case, as it was intended for generating digit images that lack the intricacies present in our images, such as vessels, optic disc, and macula. Therefore, we employed a distinct training approach for our models, presented in Table 1.

Table 1 Presents the training strategy of the proposed model

After the model is trained, neither an input vessel mask nor a disc mask is needed as a prior requirement to produce an image; the advantage of an end-to-end framework is that it generates a complete retinal image from a few specific features. In our case, the trained VAEs reconstruct the vessel structure and optic disc features from a sample point randomly picked from the regularized latent space. The reconstructed masks are then sharpened and fused, and converted into a complete retinal image through image-to-image translation performed by the generator (G) of the GAN architecture, while the discriminator (D) distinguishes between real and artificially created images.
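The inference procedure described above can be summarized by the following sketch, in which the model handles are placeholders and the mask-fusion operator is an assumption:

```python
import numpy as np

def synthesize_fundus(decoder_bv, decoder_od, svv_layer, translator, latent_dim=32):
    """Generate one synthetic fundus image from random latent samples."""
    z_bv = np.random.normal(size=(1, latent_dim)).astype("float32")
    z_od = np.random.normal(size=(1, latent_dim)).astype("float32")
    vessel_mask = np.asarray(svv_layer(decoder_bv(z_bv)))  # VAE_BV decoder + SVV sharpening
    disc_mask = np.asarray(decoder_od(z_od))               # VAE_OD decoder
    fused = np.maximum(vessel_mask, disc_mask)             # mask fusion (assumption)
    return translator(fused)                               # pix2pix-style translation to a fundus image
```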

4.3 Evaluation Metrics

For each application, a suitable quality measure should be employed [41,42,43,44]. In line with the previous studies discussed in the literature and the suggestions of [45], we used the Structural SIMilarity index (SSIM) [31] to evaluate the quality of the synthesized images, as it is commonly used in medical image applications [14]. SSIM is a perceptual metric that measures the loss in image quality due to processing by comparing two images of the same scene. A higher SSIM value (close to 1) indicates greater similarity between real and synthesized images. Additionally, we evaluated downstream tasks such as segmentation and classification, as recommended by [15, 50]. The classifier's precision, recall, and F1-score were calculated from the reported false positives (FP), false negatives (FN), and true positives (TP), using the following equations:

$$\mathrm{Precision }= \frac{TP}{TP + FP},$$
(13)
$$\mathrm{Recall }= \frac{TP}{TP + FN},$$
(14)
$$\mathrm{F1\text{-}score} = 2 \cdot \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$
(15)

Furthermore, histograms are used to evaluate the similarity between the real and artificial datasets, and the KL-divergence score is calculated to estimate the difference between them.
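The classification metrics of Eqs. (13)–(15) and the histogram-based KL-divergence score can be computed as in the following sketch, which assumes pixel intensities normalized to [0, 1]:

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def histogram_kl(real_images, synth_images, bins=256, eps=1e-10):
    # Pixel-intensity histograms of the two image sets, compared with KL divergence.
    p, _ = np.histogram(np.concatenate([im.ravel() for im in real_images]),
                        bins=bins, range=(0.0, 1.0))
    q, _ = np.histogram(np.concatenate([im.ravel() for im in synth_images]),
                        bins=bins, range=(0.0, 1.0))
    p = p.astype(np.float64) + eps
    q = q.astype(np.float64) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```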

5 Qualitative and Quantitative Evaluation

To ensure fair assessment, real and synthetic images were randomly chosen for both quantitative and qualitative experiments. To match the size of the synthesized images, real images were downscaled to 256 × 256, as the model generates this resolution. Low retinal image resolution can still achieve state-of-the-art performance, as shown in [46] for diabetic retinopathy classification. All evaluation experiments were conducted on the Python 3.7.13 platform, using the open-source Keras library (version 2.1.1) and TensorFlow-gpu (version 1.15.0). The experiments were performed on an MS Windows 11 environment running on an Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz (12 CPUs), 16 GB RAM, an NVIDIA GeForce RTX 2060-16GB GPU, and CUDA Toolkit 10.0.130.

For the training parameters, the model was trained with a learning rate of 0.0002 using the Adam optimizer with a beta_1 value of 0.5. The training iteration was set to 500, and the batch size was 20. Additionally, the dimension of the latent space was configured to 32.

5.1 Visual Image Evaluation

In this section, we present synthesized images generated by our model with their corresponding vessel tree and optic disc for visual evaluation (see Fig. 11). The retinal contents were correctly placed within the Field of View, and the coloring and illumination were visually acceptable, indicating the model's robustness. The SVV layer in our proposed model improved the generator network's capability to synthesize retinal images efficiently with varying vessel abundance, as demonstrated in Fig. 11.

Fig. 11
figure 11

Synthesized images and vessels’ masks with different Ұ values

Note that all images in Fig. 11 were generated without the need for pre-existing vessel trees or fundus images, demonstrating the model's ability to produce an unlimited number of unique images. To ensure the model did not memorize the training images, we analyzed the distance between the synthetic and training images using the visual information fidelity (VIF) method proposed by [47]. This method calculates mutual information obtained by the human visual system (HVS) channel for both training and synthetic images separately for later comparison, which is widely used in medical image registration [48]. The VIF analysis confirmed that our model can generate realistic images that are visually different from the training set, indicating its excellent generalization capability. Equation (16) shows the calculation of VIF:

$$VIF = \frac{\sum_{j \in \text{subbands}} I\left(\overline{C}^{N,j}; \overline{F}^{N,j} \mid S^{N,j} = s^{N,j}\right)}{\sum_{j \in \text{subbands}} I\left(\overline{C}^{N,j}; \overline{E}^{N,j} \mid S^{N,j} = s^{N,j}\right)},$$
(16)

where \(I(\overline{C}^{N}; \overline{F}^{N} \mid s^{N})\) and \(I(\overline{C}^{N}; \overline{E}^{N} \mid s^{N})\) represent the information that the human brain can extract from the training and the synthetic images, respectively.

Figure 12 compares our synthetic retinal images, which achieved the highest VIF relative to the training set, with existing studies that focused on synthesizing Messidor-1 images. According to the review study by [45] and our extensive search, only [10, 21] and [9] have explored retinal image synthesis on the Messidor-1 dataset, making them the natural baselines for the visual comparison shown in Fig. 12.

Fig. 12
figure 12

A comparison between our synthetic images, which achieved the highest VIF (Visual Information Fidelity), and other literature methods that were applied to the Messidor-1 dataset. The rows, from top to bottom, display the following: (A) a real fundus image, (B) our synthetic images, (C) synthetic images generated by [10], (D) synthetic images produced by [21], and (E) synthetic images created by [9]

Our synthetic images exhibit a distinct overall appearance in the second row compared to the real images in the first row. This indicates that our model did not simply memorize the training set but possesses a strong generalization capability to generate realistic-looking images. Our proposed method and the method introduced by [10] are the only end-to-end retinal image synthesis approaches in the literature. This means that the vessels' structure is initially synthesized in the first phase, followed by the generation of synthetic images based on the generated vessel structure in the second phase. In contrast, other studies [9, 21] synthesized their retinal images using an existing vessel tree instead of random sampling from a latent space.

The vessels in the synthetic images generated by [9, 21] appear sharper and more abundant than those generated by [10]. However, despite being produced by a segmentation model rather than synthesized, their vessels are only comparable to those in our synthetic images. Additionally, extreme tortuosity is observed in the images generated by [9], and missing optic discs are reported in some images generated by [9, 10]. In contrast, the characteristics of our synthetic images appear more realistic compared to [9, 10] and exhibit a closer resemblance to the real images.

We conducted further visual evaluation to verify the accuracy of anatomical features in our synthetic images, including the optic disc. Preserving the precise geometrical shape and clear boundaries of the optic disc is challenging in other studies [10, 15]. This is crucial in medical image generation, as accurate representation is necessary for proper diagnosis. Medical images have extreme variations in patterns, colors, and illumination that hinder the ability of unary GAN-based methods to generate complex image structures, according to [9]. However, our proposed framework breaks the synthesis task down into multiple generative models, each trained on a specific part of an image. Starting with the VAE models, each is weighted according to the significance of its task, as expressed in Eqs. (5) and (6). In the first stage, the VAEs are trained to generate unique segmentation geometry; in other words, they focus on the low-dimensional problem and ignore photorealism. In the second stage, the GAN is responsible for generating textures, lighting, and colors for the given geometry and then comparing the result with real images. Such a pipeline framework allows the model to converge faster and to perform better than a single GAN-based method in generating image geometries and textures. Figure 13 illustrates the optic disc generated by our model and other literary works.

Fig. 13
figure 13

Zoomed-in optic disc that is generated by our model as well as other literature works. The arrangement from top to bottom is as follows: (A) Optic disc images from the Messidor-1 dataset, (B) our synthetic disc, (C) synthetic disc produced by [10], (D) synthetic disc generated by [21], and (E) synthetic disc created by [9]

Although our artificial optic discs in Fig. 13 are not identical to the actual ones, they are closer in size and shape to real optic discs than those generated by [9, 10, 21]. Additionally, our model generates sharper disc boundaries, allowing a clear distinction between optic disc pixels and background pixels in our synthetic images, unlike the hazy and indistinguishable boundaries in other literature works [9, 10, 21].

5.2 Qualitative Image Evaluation

Assessing the quality of synthetic images objectively remains a challenge due to the lack of available references [20, 29, 49]. For the qualitative evaluation, we used SSIM. Three datasets containing a similar number of retinal images were used for a fair evaluation: (1) a Real dataset of unseen retinal images randomly taken from the Messidor-DB1 test set, (2) a Baseline dataset containing retinal images generated by the baseline method, and (3) the dataset generated by our proposed method.

To assess the similarity and variability of the synthesized images, we conducted multiple experimental comparisons. The first experiment involved estimating image variability within the same dataset. This was done by dividing each of the three datasets into two parts, performing self-comparison, and examining the standard deviation (std) for all images in the dataset. Each reported std value in Table 2 represents how much the SSIM values of all images deviate from their mean, allowing us to estimate the variability; higher std values indicate greater variability among images.
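The self-comparison experiment can be sketched as follows, assuming grayscale images normalized to [0, 1] and using the scikit-image SSIM implementation:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def ssim_stats(images_a, images_b):
    """Mean and standard deviation of pairwise SSIM between two image sets."""
    scores = [ssim(a, b, data_range=1.0) for a, b in zip(images_a, images_b)]
    return float(np.mean(scores)), float(np.std(scores))

# Self-comparison: split one dataset into two halves and compare them.
# mean_ssim, std_ssim = ssim_stats(dataset[:len(dataset) // 2], dataset[len(dataset) // 2:])
```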

Table 2 SSIM measures the similarity and variability of our synthetic images compared to baseline and real images

Table 2 presents the comparison results between two subsets of the same dataset, with the first three columns displaying these results. As a standard of comparison, we used the std value of the Real-Real column, since it was calculated from real images. The self-comparison of our synthetic images gave a std value closer to this gold standard than the self-comparison of the baseline method. Moreover, our synthesized images exhibited greater variability in content than those generated by the baseline method, primarily because of the SVV layer, which controls vessel abundance and increases the variability of the synthesized images.

On the other hand, if we compare the synthetic datasets with the actual dataset, not with each other, a lower std value indicates that the variance of the synthetic images is similar to that of the actual images (the gold standard). As shown in Table 2, our synthetic images have structural information that is much closer to the actual images since we reported a lower std value (0.0238) compared to the baseline method's std value (0.0373). The std value obtained from comparing our synthetic images to the actual images is almost the same as the std value obtained from comparing real-to-real images, with only a 0.03 margin, indicating that our images have nearly the same variance as the actual images.

The second experiment aimed to evaluate how similar the first half of the synthetic datasets was to both the second half and the real datasets. The mean SSIM value was used as a measure of similarity, with a higher value indicating greater similarity. Initially, each of the three datasets was self-compared to estimate the mean SSIM between the two sets within each dataset. In terms of the self-comparison test, the dataset generated by our model had a lower mean SSIM than the baseline method. This result indicates that our model had higher generalization capabilities than the baseline method since the images generated by our model were less similar to each other.

In contrast, our artificial images had a higher mean SSIM value of 0.8765 compared to the baseline method, which had a mean SSIM value of 0.8402, when compared to the actual dataset. This suggests that the content and overall layout of our synthetic images resemble real images more closely than those produced by the baseline method.

Our proposed model demonstrated superior performance by assigning each task to a specific generative model and weighting them, particularly when generating retinal structures that comprise the complete retinal image. This approach encourages the generative models to prioritize improving their outputs to closely resemble real images. Additionally, Fig. 14 provides a detailed summary of the comparison using a boxplot.

Fig. 14
figure 14

Shows SSIM measurements for real (R) and synthetic (S) datasets, with the orange line indicating the median values

Figure 15 displays the visual disparities between the distribution of our synthetic images and two distributions of real images randomly selected from the same dataset. Finally, we obtained a KL divergence score of 7.5199 when comparing our synthetic dataset with the first distribution of the real dataset, which is very close to the KL divergence obtained by comparing the two distributions of the real dataset, with a margin of only 0.9.

Fig. 15
figure 15

Pixel-intensity distributions of our synthetic images and the real images sets

5.3 Quantitative Image Evaluation

Researchers suggest assessing the usefulness of synthetic images for medical applications by evaluating their impact on the segmentation performance of a model [9, 15], or by training a classifier to distinguish between real and synthetic images instead of recruiting domain experts [50]. This paper used both segmentation and classification approaches to evaluate the reliability of the synthesized images. The segmentation task employed a state-of-the-art model from [51] to verify whether the synthetic images can train a model to segment vessel trees from retinal fundus images. The model was first trained on 20 real images from the training set of the publicly available DRIVE dataset [40] and then on 20 synthetic images randomly selected from our dataset, and both models were tested on the remaining 20 images from the DRIVE test set.

In Fig. 16, we compare the performance of a model trained on real images with a model trained on our synthetic images using ROC curves. The AUC score for the model trained on real images is 0.974, while the AUC score for the model trained on our synthetic images is 0.943. These results show promise, as the model trained on synthetic images performs similarly to state-of-the-art models trained on real images like [52,53,54,55].
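The ROC comparison can be reproduced with a sketch such as the following, which flattens the pixel-wise vessel probability maps and ground-truth masks before computing the curve and the AUC; the same computation is run for the real-trained and the synthetic-trained models:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def vessel_roc(prob_maps, gt_masks):
    """ROC curve and AUC from pixel-wise vessel probabilities and binary ground truth."""
    y_score = np.concatenate([p.ravel() for p in prob_maps])
    y_true = np.concatenate([g.ravel().astype(int) for g in gt_masks])
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return fpr, tpr, roc_auc_score(y_true, y_score)
```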

Fig. 16
figure 16

ROC curves for models trained on real (blue) and synthetic images (red)

In the classification task, images from the real Messidor-1 dataset and the synthetic dataset were randomly selected and labelled as 1 (real) or 0 (fake). The images were then mixed, shuffled, and split into 80% training and 20% testing sets. The trained classifier had difficulty distinguishing real from synthetic images during testing, with an accuracy of only 0.6216. Table 3 displays the reduced per-class scores in terms of precision, F1-score, and recall.
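The setup for this experiment can be sketched as follows; the classifier itself is not specified in the text and is therefore left abstract:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_real_fake_split(real_images, synth_images, seed=0):
    X = np.concatenate([real_images, synth_images])
    y = np.concatenate([np.ones(len(real_images)),     # 1 = real
                        np.zeros(len(synth_images))])   # 0 = fake
    return train_test_split(X, y, test_size=0.2, shuffle=True,
                            stratify=y, random_state=seed)
```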

Table 3 Classifier's Performance in Precision, F1-Score, and Recall Metrics to Distinguish Real and Synthetic Images

The classifier's performance is summarized in Fig. 17, which shows the Precision-Recall curve (left) and the confusion matrix (right).

Fig. 17
figure 17

The Precision-Recall plot (left) and Confusion matrix (right) display the classifier's performance in distinguishing real and synthetic images

6 Model's Limitations

Although the proposed model can generate realistic images with consistent optic disc geometry, it has some limitations. First, the size of the generated images is not the same as that of the input images: limited hardware resources, together with the large size of the proposed architecture, forced us to use a \((256\times 256)\) image size instead of \((512\times 512)\); as a result, the generated images lack high resolution. Furthermore, the reliability of the generated images still requires further validation. Although multiple quality assessments were performed, ophthalmologists must be involved in the appraisal of synthetic images; as recommended by [56], clinical assessment is necessary to verify the reliability of the images. Lastly, the SVV layer's threshold may affect the generation of the vessel structure by producing either too many or too few vessels, which may be unacceptable to experts. Also, giving the same generation priority to veins and arteries makes it difficult to distinguish between them. Therefore, further investigation is needed in future research.

7 Conclusion

In this study, multiple VAEs and GANs were trained on Messidor-DB1, along with the proposed SVV layer to sharpen and vary the vessel structure morphology. Unlike other generative models, the proposed model does not require vessel masks to synthesize images; instead, it samples from a predefined Gaussian distribution to generate unlimited images. We also followed a new training strategy that uses 70% of the batch to train G and the remaining 30% to train D. The method produced more realistic image texture, sharper optic disc boundaries, and controlled overall vessel morphology. Qualitative and quantitative results showed that our synthesized images can train a segmentation model comparably to real images, which is promising for fulfilling the increased demand for annotated data in medical applications.

In the future, we aim to reduce the large size of the proposed model by incorporating transfer learning, such as a pre-trained VGG16 architecture, instead of training the VAE models from scratch, in order to minimize the number of trainable parameters and reduce the computational complexity. Furthermore, exploiting the wide availability of unlabeled image data during training may further improve the quality of the generated images.

Funding Statement

This work was funded by the Ministry of Higher Education, Malaysia, via research grant FRGS-1-2019-ICT02-UKM-02-9. Ethical approval was obtained from the UKM medical hospital under reference UKM PPI/111/8/JEP-2021-718, dated 1 November 2021.