Abstract
In optical coherence tomography (OCT), there is a trade-off between scanning time and image quality, leading to a scarcity of high quality data. OCT platforms provide different scanning presets that produce visually distinct images, limiting their compatibility. In this work, a fully automatic methodology for the unpaired visual conversion between the two most prevalent scanning presets is proposed. Using contrastive unpaired translation generative adversarial architectures, low quality images acquired with the faster Macular Cube preset can be converted to the visual style of high visibility Seven Lines scans and vice versa. This modifies the visual appearance of the OCT images generated by each preset while preserving the natural tissue structure. The quality of the original and synthetically generated images was compared using brisque, with the synthetically generated images achieving scores very similar to those of original images of their target preset. The generative models were validated in automatic and expert separability tests, demonstrating that they are able to replicate the genuine look of the original images. This methodology has the potential to create multi-preset datasets with which to train robust computer-aided diagnosis systems by exposing them to the visual features of the different presets they may encounter in real clinical scenarios, without having to obtain additional data.
1 Introduction
Optical coherence tomography (OCT) is a non-invasive medical imaging technique that can generate cross-sectional, 3-dimensional images of ocular tissue at a micrometre resolution [1]. Thanks to advances in signal processing, optics and electronics, the quality and resolution of the images obtained through OCT have steadily improved over the years, leading to its widespread use as a diagnostic tool [2], with around 30 million OCT scanning procedures performed every year worldwide [3]. For reference, OCT imaging has been used to study relevant ocular pathologies such as diabetic macular edema, the most common cause of blindness in patients with diabetes mellitus [4,5,6]; glaucoma, the leading cause of irreversible blindness worldwide [7,8,9]; and age-related macular degeneration, the leading cause of blindness in people over 50 in the developed world [10,11,12]; as well as to study the vascular structure of the eye [13,14,15,16].
In an OCT scanning session, a low coherence optical beam is swept across the retina of the patient, generating two-dimensional images or B-scans. Due to the forward and backward scattering of light waves, these B-scans present speckle noise, the main quality-affecting factor in OCT images [17]. OCT scanners can be configured to combine and average several B-scans over the same location, leading to a reduction in noise and an overall increase in tissue visibility and quality of details. Nonetheless, this method requires that the tissue does not move throughout the scanning process, so that readings taken at the same point can be averaged. This requirement limits the overall number of scans that can be performed in a session due to involuntary eye movements, which translates into a trade-off between the total surface of tissue that can be analysed in a given time and the quality of the obtained tomograms.
OCT scanner platforms typically provide a series of configuration presets. This way, specialists can choose which type of scan to perform according to whether they need to sacrifice quality in order to scan a broader surface of tissue, or whether they can afford the extra time required to perform a higher quality scan. While many presets exist, with different sweeping patterns such as radial or annular scanning, the two most widely used by expert clinicians in medical services are volume scan patterns. The first one scans a square-shaped section of the eye fundus, averaging a small number of B-scans per final tomogram or image slice. This results in a large number of slices covering a wide patch of the retina, each presenting speckle noise. This scanning preset is known as Fast Scan in Heidelberg spectralis\(^{{{\circledR }}}\) platforms or Macular Cube in Carl Zeiss cirrus-hd\(^{{{\circledR }}}\) models, the two most common OCT imaging platforms in clinical settings, and will be referred to as Macular Cube in this manuscript. The second one, known as Seven Lines in spectralis\(^{{{\circledR }}}\) platforms or Five Line in cirrus-hd\(^{{{\circledR }}}\), is a more intensive scan. In this configuration, a thinner band of the retina is scanned, averaging many B-scans per slice. While this scan produces only a few slices over a narrow strip of tissue, these provide much higher visibility, texture detail and image resolution, resulting in images that are much clearer and easier to analyse than those of the aforementioned preset. We will henceforth refer to this type of scanning preset as Seven Lines. An example of the sweeping pattern and a resulting slice for each of these two most representative scanning presets is presented in Fig. 1.
While these presets are the two most prevalent in medical services, this trade-off between scanning time and image quality affects any other configuration, constituting a common paradigm in OCT imaging. Consequently, this leads to a forced choice between the quality of the produced images and the amount of tissue that can be analysed, which therefore results in a shortage of high quality OCT images. Furthermore, the images that are produced by using different presets present visual differences that complicate the development of automatic computer-aided diagnosis (CAD) systems.
Recent years have seen advancements both in medical imaging techniques and in computational architectures and algorithms, leading to the development of new and improved CAD systems based on deep learning [18, 19]. These systems can aid in the detection of several relevant pathologies while achieving equal or better results than board-certified specialists [20,21,22,23]. Nonetheless, the development of CAD systems based on machine learning requires well-curated data for training. The images in the training datasets should cover all the possible variabilities in imaging platforms, acquisition conditions, presets and configurations present in the clinical setting that the system is intended to work in, so that it can perform its diagnostic functions adequately under such conditions. The labour and economic costs associated with the acquisition of these images, combined with their sensitive nature, aggravate the problem of data scarcity, which affects all domains of application of deep learning [24] and is especially prevalent in medical imaging [20].
Given the relevance of this issue, some works have addressed the problem of improving the quality of OCT scans to facilitate the visual inspection of the images and their clinical diagnosis. In 2004, Adler et al. [25] first proposed the use of wavelet filters to reduce the speckle noise in OCT images. Similarly, other authors continued to improve image denoising by applying different statistical models such as Bayesian inference [26, 27], non-local means [28] or Huber total variation regularisation [29]. Other approaches expanded upon the wavelet filtering method by applying 3-D block matching techniques [30] or dictionary learning [31]. More recently, some works have approached this task using deep learning. Apostolopoulos et al. [32] presented a study where an artificial neural network is employed to increase the contrast and reduce the noise in OCT images. Similarly, in the work of Xu et al. [33], a non-linear mapping convolutional neural network is used to perceptually enhance the images, resulting in a reduction of the speckle noise. In the work of Seeböck et al. [34], the authors use a generative adversarial network (GAN) to reduce the variability between OCT images acquired with two devices, demonstrating performance gains when using the transformed images for the segmentation of retinal fluid. Lastly, the approach proposed by Huang et al. [35] uses a GAN to both remove the speckle noise present in the images and increase the image resolution. These works offer promising results in terms of speckle noise reduction, perceptual quality enhancement and resolution increase for the visual inspection of OCT images. Nevertheless, none of them has addressed the visual differences that exist between images acquired with different scanning presets, or their relation to the problem of data scarcity for machine learning-based OCT CAD systems.
The problem of training CAD systems with images from different presets thus remains open: while speckle noise can be considered the main quality-affecting factor in OCT images, it is not the only visual difference between images acquired with different presets.
To mitigate this problem, this work presents a fully automatic methodology for the mutual unpaired conversion of OCT images acquired with different scanning presets. To do this, contrastive unpaired translation architectures are employed for the target conversion. The first approach presented in this work consists of training a model to translate the more numerous images acquired with a low-quality extensive scanning preset, such as Macular Cube, into the style of the higher visibility Seven Lines preset, helping to mitigate the issue of high quality data scarcity in OCT datasets. A second, complementary approach was designed to perform the opposite translation, transferring the visual features of images obtained with the Macular Cube preset to original Seven Lines images. The images generated by these two approaches are assessed based on their quality in a complete methodology aimed at determining the optimal point at which the generative models are able to confer the intended visual features of the target preset. This methodology can not only increase the total number of available images by means of oversampling, helping to mitigate the problem of data scarcity that is so prevalent in medical imaging, but also has the potential to create multi-preset datasets that can be used to train CAD systems in a robust, variability-tolerant manner. While this methodology is exemplified here through the two most common scanning presets, it is extensible to any other scanning preset or OCT imaging device, as all are affected by this compromise between the area scanned in a given time and image quality. This way, deep learning models can be trained with the visual features of the various presets and acquisition conditions that they may encounter in a clinical environment, without the need to procure the otherwise scarce original images acquired with such presets.
Preliminary results of this strategy were obtained in [36], demonstrating that this conversion approach can be suitable to address this problem of data scarcity.
2 Methods
In this section, the materials and resources that were used for the implementation of this work are covered, along with a description of the OCT image translation methodology. Specifically, the reader can find information on the dataset that was used (Section 2.1), the software and hardware resources (Section 2.2), a description of the image translation methodology (Section 2.3) and an explanation of the experiments conducted to validate the synthetic generated images (Section 2.4).
2.1 Dataset
Regarding the dataset, a total of 1034 OCT images, acquired using a Heidelberg spectralis\(^{{{\circledR }}}\) platform, were used. These images were obtained from 56 different patients participating in a study of diabetic macular edema in accordance with the Declaration of Helsinki, approved by the local Ethics Committee of Investigation from A Coruña/Ferrol (2014/437) on the 24th of November, 2014. OCT image resolutions ranged from 511 × 495 to 1535 × 495 pixels. In total, 517 of the images were obtained using the Fast, or Macular Cube, preset. In this preset, the scanner sweeps a 20° × 20° patch of the eye fundus, averaging 9 B-scans to form every tomogram and obtaining 25 slices per scan. The remaining 517 images were acquired with the Seven Lines preset. This preset involves scanning a longer and thinner, 30° × 5° eye fundus strip and using the average of 25 B-scans for each of the 7 produced tomograms per session. A representative example of the OCT images produced by these presets can be found in Fig. 1.
2.2 Software and hardware resources
In this work, the PyTorch [37] machine learning library (version 1.7.1) under Python (version 3.7.7) was employed for convolutional neural network training and validation. OpenCV [38] (version 3.4.8) and NumPy (version 1.15.0) were used for all image manipulation and processing requirements. Regarding the hardware, the training and validation of the models was performed on a computer with an NVIDIA GeForce GTX TITAN X GPU, an Intel Xeon E5-2640 CPU and 64 GB of RAM.
2.3 Methodology
To perform the conversion between OCT images, the process was modelled as a “style transfer” approach, in which a neural network attempts to confer the visual features of a target class to an original image while preserving the original structure. In particular, a contrastive unpaired translation generative adversarial network (CUT-GAN) [39] architecture was used for this purpose.
The typical GAN training method involves the use of a generative network G and an additional discriminative network D whose task is to determine if images belong to the target class from the training set y ∈ Y or were synthesised by the generative network \(\hat {y} = G(x)\), while the generative network trains to maximise the probability that D makes a mistake [40]. This way, the training procedure pushes the generator G to synthesise images that resemble the target class from the training set, by using an adversarial loss:

$$\mathcal{L}_{\text{GAN}}\left(G, D, X, Y\right) = \mathbb{E}_{y \sim Y}\left[\log D(y)\right] + \mathbb{E}_{x \sim X}\left[\log\left(1 - D\left(G(x)\right)\right)\right]$$
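To make the two sides of this objective concrete, the discriminator and generator terms can be written out numerically. The following is a minimal numpy sketch of the standard adversarial loss, for illustration only, not the authors' implementation:

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator loss: maximise log D(y) + log(1 - D(G(x))).
    d_real, d_fake are discriminator outputs in (0, 1) for real
    and synthesised images respectively."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    """Generator loss: maximise the probability that D mistakes
    synthetic images for images of the target class."""
    return -np.mean(np.log(d_fake))
```

A discriminator that separates real from synthetic images well yields a low `d_loss`, while a generator that fools the discriminator yields a low `g_loss`; training pushes the two networks against each other.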
A CUT-GAN architecture is intended for the unpaired translation of images from one domain to another. As such, the generative network G consists of an encoder Genc and a decoder part Gdec, and its task is to transfer the characterising features of the target domain to original images x ∈ X without modifying that which is common to both domains. This is achieved by adding a patchwise noise contrastive estimation loss [41] that takes advantage of the ability of the encoder part of the generative network to capture domain-invariant features such as the location of the inner limiting membrane and the choroid, as well as that of the decoder part, which has the means to synthesise domain-specific features like tissue texture as well as speckle noise, or lack thereof.
To calculate this contrastive loss, a set of features is extracted from the output from a series of layers l ∈{1,2,...,L} of the generative encoder. These features are obtained by applying the encoder to patches of both the original \(\{\mathbf {z}_{l}\}_{L} = \{H_{l}\left (G^{l}_{\text {enc}}\left (x\right )\right )\}_{L}\) and the synthesised images \(\{\hat {\mathbf {z}}_{l}\}_{L} = \{H_{l}\left (G^{l}_{\text {enc}}\left (G\left (x\right )\right )\right )\}_{L}\), passing them through a two-layer MLP network Hl. Specifically, a patch is extracted from a location s in the original image, along with a patch extracted from the same location in the synthesised image, and a series of patches from other locations S ∖ s in the original image. This patchwise loss is then calculated as a cross-entropy loss between the positive and negative examples:

$$\ell\left(\hat{\mathbf{z}}, \mathbf{z}^{+}, \mathbf{z}^{-}\right) = -\log\left[\frac{\exp\left(\hat{\mathbf{z}} \cdot \mathbf{z}^{+} / \tau\right)}{\exp\left(\hat{\mathbf{z}} \cdot \mathbf{z}^{+} / \tau\right) + \sum_{n=1}^{N} \exp\left(\hat{\mathbf{z}} \cdot \mathbf{z}^{-}_{n} / \tau\right)}\right]$$
where τ = 0.07 serves as a temperature with which the distances between the query and the other examples are scaled, and {0,4,8,12,16} are the layers selected for the contrastive loss. By penalising differences in the inner representation of the same image patch in both images, as well as similarities between patches extracted from different regions, the network is trained to preserve the anatomical structure of the eye while changing the visual appearance of the OCT images according to the target preset, as illustrated in Fig. 2.
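The patchwise contrastive objective described above can be sketched as a classification over similarity scores. The following is a minimal numpy illustration using the τ = 0.07 temperature from the text; cosine-similarity logits are an assumption of this sketch, not necessarily the exact formulation in the training code:

```python
import numpy as np

def patch_nce_loss(query, positive, negatives, tau=0.07):
    """Cross-entropy over scaled similarities: the query patch feature
    (from the synthesised image) should match the positive (same
    location s in the original image) and not the negatives (patches
    from other locations S \\ s).
    Shapes: query (d,), positive (d,), negatives (N, d)."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(query, positive)] +
                      [cos(query, n) for n in negatives]) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive is class 0
```

Penalising a low probability for the positive example enforces that the same patch location keeps the same inner representation across the translation, which is what preserves the anatomical structure.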
In order to address the problem of the visual variability between images acquired with different OCT scanning processes, two complementary approaches were taken, which are detailed below:
1. First approach: Macular Cube to Seven Lines conversion: The first approach was designed to address the issue of high quality data scarcity in OCT by perceptually increasing the quality of the more numerous Macular Cube preset scans. A CUT model was trained to transfer the higher visibility style of Seven Lines images to samples acquired with the Macular Cube preset, effectively conferring on them the noise reduction of the more intensive scanning preset, along with its more precise visual features. Through this approach, high quality data scarcity can be compensated by converting the more readily available images into the higher quality visual style of Seven Lines scans. An example of this conversion is illustrated in Fig. 3.
2. Second approach: Seven Lines to Macular Cube conversion: The second approach has the complementary purpose of converting higher quality Seven Lines scans into the style of the extensive Macular Cube preset. Therefore, a second CUT model was trained on the same data to perform the opposite translation. This results in an overall increase in the number of available images. Moreover, these additional images can be used to train machine learning-based CAD systems in a preset variability-tolerant fashion by exposing them to the different presets that the system may find in a real clinical setting. An example of the visual features of this conversion can be found in Fig. 4.
For both of these approaches, a residual network [42] backbone with nine residual blocks was used as a base architecture for the CUT generator. The original images were resized to 286 × 286 pixels, with random crops of 256 × 256 being used as training inputs. During the training, the models were optimised using Adam [43] with β1 = 0.5, β2 = 0.999 and a learning rate of 2e−4. The training process lasted for a maximum of 400 epochs, linearly decaying the learning rate for the last 200. Finally, both of these models were used to generate the synthetic counterpart for every original image, as illustrated in Fig. 5.
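The learning rate schedule described above (constant for the first 200 epochs, then linear decay to zero over the last 200) can be sketched as a simple function; this is an illustrative sketch of the schedule, not the training code itself:

```python
def lr_at_epoch(epoch, base_lr=2e-4, n_epochs=400, decay_start=200):
    """Learning rate at a given epoch: constant at base_lr until
    decay_start, then linearly decayed to zero at n_epochs."""
    if epoch < decay_start:
        return base_lr
    frac = (epoch - decay_start) / (n_epochs - decay_start)
    return base_lr * (1.0 - frac)
```

In PyTorch this kind of piecewise-linear schedule is typically attached to the Adam optimiser through a per-epoch multiplier (e.g. `torch.optim.lr_scheduler.LambdaLR`).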
2.4 Evaluation
A series of experiments were conducted to evaluate the images that were generated by both of the approaches described above. This subsection describes how these experiments were carried out.
In order to validate the perceptual quality of the synthetic images, an experiment using the Blind/Referenceless Image Spatial Quality Evaluator (brisque) [44] was performed. brisque is an image quality evaluator which, unlike other measures such as Peak Signal to Noise Ratio or Structural Similarity Index Measure [45, 46], does not require a reference image to compare against. Instead, it returns a score indicative of the perceptual quality of the processed image, with lower brisque scores indicating higher image quality. To compute this score, a series of luminance coefficients are used to measure distortions and their orientations in the image. These are used to compute a series of features at multiple scales, which are then classified and quantified by support vector machines. This way, the different distortions and their effect on image quality and perception can be measured. Along these lines, brisque has been previously used to assess the quality of medical images with favourable results [47,48,49]. In this experiment, the brisque score of each set of the original Macular Cube and Seven Lines images was calculated and compared to those corresponding to the synthetic images.
Complementarily, a set of experiments aimed at measuring the perceptual quality of the synthetically generated images was conducted. The equivalent number of looks (ENL) and the contrast-to-noise ratio (CNR), calculated as \(\text {ENL}=\frac {\bar {x}_{\text {BG}}^{2}}{s_{\text {BG}}^{2}}\) and \(\text {CNR}=\frac {\bar {x}_{\text {ROI}}-\bar {x}_{\text {BG}}}{s_{\text {BG}}}\), where \(\bar {x}\) denotes the arithmetic mean and s denotes the standard deviation of the intensity values in the images, were used as referenceless image quality estimators. A random representative subset of 100 images of each class was annotated with the location of a homogeneous region of interest (ROI) and the background (BG) containing no tissue. This subset was employed to calculate these estimators. Furthermore, the referenceless Blind Image Quality Index (BIQI) [50] and the Natural Image Quality Evaluator (NIQE) [51] were measured for both the original and synthetically generated images. Section 3.2 covers the results of these experiments.
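Given the annotated ROI and background regions, these two estimators follow directly from the definitions above. A minimal numpy sketch (the sample-variance convention, `ddof=1`, is an assumption of this sketch):

```python
import numpy as np

def enl(bg):
    """Equivalent number of looks, computed over a background
    region containing no tissue: mean(BG)^2 / var(BG)."""
    return bg.mean() ** 2 / bg.var(ddof=1)

def cnr(roi, bg):
    """Contrast-to-noise ratio between a homogeneous tissue ROI
    and the background: (mean(ROI) - mean(BG)) / std(BG)."""
    return (roi.mean() - bg.mean()) / bg.std(ddof=1)
```

A flatter (less noisy) background yields a higher ENL, and a brighter ROI against the same background yields a higher CNR, so both increase with perceptual quality.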
Subsequently, the separability of the generated images was assessed. While the training process of a GAN uses a discriminator network D to enforce the similarity between original and synthetic images of the target class, this is an intentionally simple architecture in order to avoid overpowering the generator, and it tends to be biased by the GAN training process. To validate whether the images converted between the two presets display the visual features of their target presets, an automatic separability experiment was conducted. In this experiment, an external classifier model was trained to discriminate between images acquired with the Macular Cube preset and images acquired with Seven Lines, using a subset of the original images. This network was then tested separately with the remaining original images and with the synthetically generated images, and the results produced by the original images were compared with those of the synthetic images. The aim of this experiment is to determine whether the synthetically generated images are classified as their original class or their target class. A densely connected convolutional network [52] architecture was chosen to serve as the external classifier model. This architecture has seen extensive use in medical image classification and screening, surpassing other convolutional neural network architectures [53,54,55,56]. Figure 6 illustrates the structure of this model.
Since two separate generative models were trained, one for each approach, each class was tested separately. The Accuracy, calculated as \(\text {Accuracy} = \frac {\mathrm {TP + TN}}{\mathrm {TP + TN + FP + FN}}\), was used to evaluate this classification. When testing the two classes separately, positives are considered to be those images of the target class. Due to the absence of true negatives and false positives, the specificity cannot be computed for this test. The results of this experiment can be found in Section 3.3.
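The accuracy computation, including the degenerate case described above in which a test set contains only images of the positive class, can be sketched as follows (an illustrative helper, not part of the original code):

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN).
    When a test set contains only positives (TN = FP = 0),
    this reduces to the sensitivity TP / (TP + FN), which is
    why specificity cannot be computed for that test."""
    return (tp + tn) / (tp + tn + fp + fn)
```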
Finally, with the purpose of further assessing the perceptual validity of these images, a test was conducted to ascertain whether specialist clinicians are able to discern between the synthetic and original images of both classes. The motivation behind this test was to assess whether these models can preserve the natural tissue structure of the eye, as well as to verify that artificial artefacts are not introduced into the synthetically generated images. The clinicians were asked to classify a random set of 200 images according to whether they were acquired with the Macular Cube or the Seven Lines preset, and whether they were original images obtained with an OCT platform or generated by the network. The results of this experiment are discussed in Section 3.4.
3 Results
In this section, the results produced by the aforementioned synthetic image generation methodology are presented, along with those of the tests that were previously described.
3.1 Generative model training
The curves for both of the GAN losses, along with the contrastive losses for the training of the models, are displayed in Fig. 7. These show the loss pattern that is often apparent in GAN training, where both discriminator and generator losses tend to converge to a relatively stable value as the training progresses. This mutually dependent stability, however, complicates the task of determining the optimal stopping point of the training process. To work around this problem, both generative models were trained for up to 400 epochs, which was found to be a sufficient length for them to produce satisfactory results. During the training, a checkpoint copy of the generative network state was saved every 20 epochs, for a total of 20 checkpoints per model. At each of these checkpoints, the complete set of images was generated. Inference time at this stage was measured at 250 milliseconds per generated image. Then, the images generated by each checkpoint were evaluated using brisque to determine the training checkpoint which produced the best results for each model. This process is explained in the next subsection.
3.2 Image quality assessment
A test was conducted using the brisque score to validate the perceptual quality distribution of the synthetic images. The aim of this experiment was to determine whether the quality distributions of the synthetic generated images are similar to those of the original images. In this experiment, every OCT image for each of the classes in the original dataset was evaluated using brisque. Then, the brisque score for each image generated by every checkpoint of the generative models was calculated and compared to the original images. Figure 8 shows the evolution of the brisque score for both computational approaches for different epochs.
These graphs are indicative of the ability of the generative models to approach the image quality of the target classes. The highest brisque score (corresponding to the lowest perceptual quality) is achieved by the original Macular Cube images, represented in red. These images also show the highest variability, with their standard deviation being indicated in red dashed lines. Original Seven Lines images, represented in green, show an overall lower brisque score, also having a lower variability than their Macular Cube counterparts. Overall, all the synthetic images show a reduced variability in image quality, with synthetic Seven Lines images displaying a considerably higher quality than their original Macular Cube counterparts and stabilising around the lower fringes of the original Seven Lines distribution. In accordance with these results, the lowest scoring of the Seven Lines-generating model checkpoints and the highest of the Macular Cube-generating model checkpoints were selected, and the brisque score distributions of their generated images were studied. The histograms of the indicated distributions of the original and the synthetic images can be found in Fig. 9.
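The checkpoint selection rule described above amounts to an argmin/argmax over the per-checkpoint mean brisque scores: lowest mean score (best perceptual quality) for the Seven Lines-generating model, highest (noisiest, i.e. most Macular Cube-like) for the Macular Cube-generating model. A minimal sketch of this selection, not the original evaluation code:

```python
import numpy as np

def select_checkpoints(mean_scores_sl, mean_scores_mc):
    """Given per-checkpoint mean brisque scores for each generative
    model, return the index of the selected checkpoint for the
    Seven Lines-generating model (lowest score) and for the
    Macular Cube-generating model (highest score)."""
    return int(np.argmin(mean_scores_sl)), int(np.argmax(mean_scores_mc))
```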
Complementarily, a representative subset of the original Macular Cube images achieved an ENL and a CNR of 94.34 ± 44.41 and 143.35 ± 15.38 respectively, while the synthetically generated Macular Cube images were measured at 64.81 ± 27.73 and 144.26 ± 10.88. On the other hand, the higher-quality original Seven Lines images achieved an ENL and CNR of 152.56 ± 81.08 and 146.88 ± 25.59, while their synthetic counterparts were rated at 188.28 ± 111.08 and 147.68 ± 23.32. In terms of the automated image quality scores BIQI and NIQE, the original Macular Cube images achieved scores of 25.79 ± 4.20 and 6.42 ± 2.53 respectively, while synthetically generated Macular Cube images were rated at 35.62 ± 4.52 and 8.72 ± 1.11. Regarding the Seven Lines images, the originals were rated at scores of 24.76 ± 1.84 and 6.05 ± 0.80, while their synthetic counterparts achieved a very similar 24.71 ± 1.84 and 6.12 ± 1.90.
3.3 Separability test
As previously mentioned, a test was performed to validate the separability of these synthetic images by training a densely connected convolutional network to classify original images between those obtained with the Macular Cube preset and those acquired with Seven Lines. The original dataset was randomly partitioned in balanced sets, with 60% (622 images) forming a training set, 20% (206 images) making up a validation set to prevent overfitting and the remaining 206 images being used to test the network. Training and validation losses and accuracy values for this model are presented in Fig. 10.
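The balanced 60/20/20 partition described above can be sketched as a per-class split; the per-class shuffling and the random seed are illustrative assumptions of this sketch, chosen to be consistent with the 622/206/206 figures in the text:

```python
import random

def split_balanced(cube, seven, seed=0):
    """Randomly partition the two classes into balanced training,
    validation and test sets (roughly 60/20/20 per class)."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for imgs in (list(cube), list(seven)):
        rng.shuffle(imgs)
        n = len(imgs)
        n_val = n_test = round(0.2 * n)
        n_train = n - n_val - n_test
        train += imgs[:n_train]
        val += imgs[n_train:n_train + n_val]
        test += imgs[n_train + n_val:]
    return train, val, test
```

With 517 images per class, this yields 311 training, 103 validation and 103 test images per class, i.e. the 622/206/206 split reported above.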
The synthetic images converted by both of the generative models at the checkpoints that were selected in the previous subsection based on their brisque score were then tested with this network. The accuracy obtained for the original and synthetic dataset is represented in Table 1. It should be clarified that, for the Macular Cube to Seven Lines model, the positives are the images of the Seven Lines class, while the opposite is true for the inverse model.
3.4 Validation by clinical specialists
Complementarily, a final test was conducted in order to assess whether medical specialists are able to detect the synthetic images. A random subset of 200 images which are representative of the four classes was created, with 72 of them being original Macular Cube, 31 synthetic Macular Cube, 64 original Seven Lines and 33 synthetic Seven Lines. The synthetic images were generated by the models that were selected as described in Section 3.2. Two ophthalmologists from the Hospital Clínico San Carlos in Madrid were asked to determine whether each of the OCT images was of the Macular Cube or Seven Lines type, and whether they were original or synthetic. One of the clinicians is a medical resident, while the other is an expert specialist with extensive medical experience. The two confusion matrices representing the final results of the test are displayed in Fig. 11.
4 Discussion
In this section, the results obtained from the generative methodology, as well as those of the tests performed to evaluate the synthetically generated images, are discussed.
4.1 Image quality assessment
The image quality assessment results obtained from evaluating the synthetically generated images using brisque (Fig. 9) show that all the sets are positively skewed, with most images having a lower brisque score than the mean and a long tail of samples with increasingly higher scores, formed by images with progressively lower quality. This is especially apparent for the original Macular Cube samples, which is to be expected of the set with the highest variability in noise and tissue visibility, being the fastest scanning preset considered. Conversely, while the synthetic images also present a similar positive skewness, there is a significant reduction in the number of unusually highly scored images. These histograms also show the previously mentioned decrease in variability for the synthetically generated images. Overall, the generative models show great consistency and stability in generating the synthetic images, producing score distributions coherently centred around their respective target quality.
The images were also visually inspected to ensure that the measured changes in brisque score correspond to changes in actual perceptual quality. The synthetic Seven Lines images show a significant reduction in speckle noise and an increase in tissue visibility in the retina and choroid (Fig. 12). Conversely, regarding synthetic Macular Cube, the generated images reproduce the original tissue with added speckle noise and visual features that bear resemblance to the original Macular Cube samples (see Fig. 13). In both figures, the values of the brisque score correlate with perceptual changes in image visibility, with noisier images being assigned higher scores, and images showing greater retinal and choroidal visibility achieving lower scores.
Aside from this, the results for the GAN evolution (Fig. 8) show that, for some epochs, synthetic Macular Cube images seem to be rated at a similar or higher quality than their original Seven Lines counterparts. While this behaviour is not necessarily unusual when training a GAN, due to the oscillatory nature of both the discriminator and generative networks, these images were also inspected. A sample of images from training epoch 260 of the Seven Lines to Macular Cube translating model, in which the network perceptually increased the quality of the Seven Lines images, can be found in Fig. 14. This behaviour, combined with the absence of an absolute indicator of GAN training progression, is what motivates the use of an external quality evaluator such as brisque to assess image quality and determine a satisfactory epoch at which to stop the training, such that the generated images present the desired visual features.
4.2 Separability test
The results obtained from the automatic separability test (Table 1) show that the synthetic images are able to mimic the visual features of their target class: every original Macular Cube image converted to Seven Lines was correctly recognised as a Seven Lines image, and only 6 synthetic Macular Cube images were confused with Seven Lines. The synthetic test set contains those images that the network correctly recognised as their original class, converted to the opposite class. These synthetically generated images were classified by the model as their target class instead of their original one, indicating that they display the intended visual features.
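The counts reported in the separability test can be tallied from a confusion of true and predicted preset labels. A minimal sketch, using hypothetical labels rather than the actual test set:

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count (true, predicted) label pairs, as in the separability test."""
    return Counter(zip(y_true, y_pred))

# Hypothetical preset labels: MC = Macular Cube, SL = Seven Lines
y_true = ["MC", "MC", "MC", "SL", "SL", "SL"]
y_pred = ["MC", "MC", "SL", "SL", "SL", "SL"]  # one MC confused as SL
counts = confusion_counts(y_true, y_pred)
print(counts[("MC", "SL")])  # -> 1 misclassified Macular Cube image
```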
4.3 Validation by clinical specialists
Regarding the ability of the ophthalmology specialists to discern the original and synthetic images of the two presets, the test results (Fig. 11) highlight the difference in experience between the two specialists and its relevance for a problem of this complexity: the resident achieved an overall accuracy of 24%, while the expert correctly identified more than half of the samples. This complexity is even more evident in the Macular Cube versus Seven Lines separability task, in which the resident attained an accuracy of 44% while the expert reached 93%. This indicates that, while OCT images are clearly visually separable according to their acquisition preset, the task is by no means trivial, and the ability to perform it is acquired with experience. The absence or presence of speckle noise alone is not enough to differentiate between the images; other visual features are necessary to distinguish them. Regarding original versus synthetic separability, the results show that the synthetic images are able to deceive even the expert specialist. The expert was not able to determine whether images were originally acquired with a scanner or converted by the generative networks, while still correctly identifying the visual features that characterise Macular Cube and Seven Lines images. Most of the synthetic Seven Lines samples, presenting a clearer visibility, were incorrectly classified as original images while, conversely, most of the original Macular Cube images, which display more noise and a perceptually worse appearance, were mistakenly identified as synthetic. These results show that the synthetically generated images are effectively indiscernible from the original ones while preserving the distinctive visual features of their target classes. They are also indicative of the absence of visual artefacts introduced in the synthetically generated images.
Therefore, the tests conducted with the specialists demonstrate the strong performance of the generative models, showing that they are able to generate images that are interchangeable with the original ones even to the expert eye. All of the obtained results indicate that these models are suitable for supplying datasets with images converted to the style of different configurations, which can be used just as if they had been acquired with their target presets.
It should also be highlighted that, while other approaches exist in the literature for the denoising or resolution enhancement of low quality images, this proposal is the first to address the problem of data scarcity in OCT through the mutual conversion of images between scanning presets. Owing to this shift in focus from the enhancement of low quality images to the mutual conversion of visual features within a domain, no direct comparison with existing methods can be drawn.
5 Conclusions
OCT is a relevant medical imaging technique that can be used in conjunction with CAD systems to diagnose relevant ocular pathologies and to study the eye tissue. The overall quality and visibility of OCT images is considerably affected by light scattering. To overcome this, OCT scanning platforms typically sample each point multiple times and average the signals to obtain a clearer image. Due to involuntary eye movements, there is a limit to the number of samples that can be taken in a scanning session. OCT platforms provide a number of scanning presets that determine the number of scans averaged per OCT image, balancing the amount of tissue that is scanned against the quality of the produced images. This compromise between sampled area and image quality leads to a scarcity of high quality data. Moreover, the visual differences between images obtained with different presets limit the compatibility of datasets according to the scanning preset used to acquire their images.
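The averaging trade-off described above can be illustrated numerically: averaging N independent noisy samples of the same point reduces the noise standard deviation by roughly the square root of N. The toy simulation below uses additive Gaussian noise as a stand-in for speckle (which, in reality, is not additive Gaussian); it is purely illustrative.

```python
import random

random.seed(0)

TRUE_VALUE = 5.0  # the underlying tissue reflectivity at each point

def std(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def noisy_scan(n_points=2000):
    # one scan: the true value plus unit-variance noise at each point
    return [TRUE_VALUE + random.gauss(0, 1) for _ in range(n_points)]

single = noisy_scan()
# average 16 repeated scans point by point, as an intensive preset would
averaged = [sum(vals) / 16 for vals in zip(*(noisy_scan() for _ in range(16)))]
print(std(single), std(averaged))  # noise std drops by roughly 4x
```

This is the cost an intensive preset such as Seven Lines pays: each averaged B-scan takes many repeated acquisitions, limiting how much tissue can be covered in a session.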
In this work, a complete methodology for the automatic mutual conversion of OCT images has been presented. These OCT images were acquired with the two scanning presets most representative of those used by clinical specialists in medical services: Macular Cube, a fast preset that produces 25 eye slices over a square patch of the retina with considerable speckle noise, and Seven Lines, an intensive preset that creates cleaner images at the cost of producing only 7 B-scans over a narrow band per session. Together, they represent the compromise between image quality and quantity that is widespread in many medical imaging areas. The mutual conversion is achieved by training a contrastive unpaired translation GAN model to translate the more numerous Macular Cube images into the higher-visibility style of the intensive Seven Lines preset, and a second model to perform the complementary conversion. The quality of the synthetic images generated by these models is assessed and compared to the originals in order to determine the optimal training checkpoint, with the aim of validating the quality of these images and of solving the problem of when to stop the GAN training process.
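At the core of contrastive unpaired translation is a patchwise contrastive (InfoNCE-style) objective: a feature of an output patch is pulled towards the feature of the same-location input patch (the positive) and pushed away from other patches (the negatives). The following is a minimal, dependency-free sketch on toy feature vectors; the vectors and temperature are illustrative and do not reflect the actual CUT configuration used in this work.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def info_nce(query, positive, negatives, tau=0.07):
    """Cross-entropy of the positive patch against positive + negatives."""
    logits = [cosine(query, positive) / tau]
    logits += [cosine(query, n) / tau for n in negatives]
    m = max(logits)  # subtract the max to stabilise the softmax
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

# A query patch close to its positive incurs a low loss...
low = info_nce([1.0, 0.0], [0.9, 0.1], negatives=[[0.0, 1.0]])
# ...and a high loss when it is closer to a negative patch instead
high = info_nce([1.0, 0.0], [0.0, 1.0], negatives=[[1.0, 0.0]])
print(low < high)  # -> True
```

In the full method this loss is computed over multilayer patch features of the generator's encoder, alongside the adversarial loss.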
The experiments conducted to validate the synthetically generated images show that they are able to display the visual features of images acquired with their target preset. Quantitatively, the BRISQUE scores of original and synthetic images of each preset are very similar, with synthetic images remaining consistently stable around their target quality distributions. In a validation experiment using a dense convolutional network trained to classify the original images by acquisition preset, the synthetically generated images demonstrated clear separability, being classified as if they were originals of their target preset.
Complementarily, as a way to assess the visual and perceptual qualities of the synthetic images, two ophthalmology specialists with different levels of experience were tasked with classifying images according to whether they were original or synthetic, and according to their acquisition presets. In this experiment, the clinicians were unable to discern between original and synthetic images, while the expert was clearly able to identify the presets of the originals and the intended target presets of the synthetically generated images. Overall, the generative models demonstrated their ability to provide synthetically generated images that are exceptionally similar to original images of their target classes, even to the expert eye.
From the obtained experimental results, it can be concluded that this methodology is able to replicate the visual features of each preset in images acquired with another. The synthetic images were validated in terms of perceptual quality, automatic separability and expert separability, with results showing that they resemble their target presets in all of these respects. These generative models can be used to supply OCT datasets limited by their acquisition presets with quality synthetically generated images that display the visual features of any other preset.
Plans for future work include assessing the possible benefits of paired image translation in terms of tissue preservation, as well as the application of this methodology to produce multi-preset datasets that can be used to train CAD systems more robustly, allowing them to train with all the presets they may encounter in a real setting without the need to procure these images. Furthermore, a more elaborate analysis of how these models perform when trained with images belonging to patients of different ages and sexes and affected by different pathologies is considered for future work. Lastly, this methodology should be considered for exploring this context of data scarcity related to image quality and acquisition conditions in other fields of medical imaging, where it is equally widespread.
References
Huang D, Swanson E, Lin C, Schuman J, Stinson W, Chang W, Hee M, Flotte T, Gregory K, Puliafito C, Fujimoto J (1991) Optical coherence tomography. Science 254(5035):1178–1181. https://doi.org/10.1126/science.1957169
Drexler W, Fujimoto JG (2008) State-of-the-art retinal optical coherence tomography. Prog Retin Eye Res 27(1):45–88. https://doi.org/10.1016/j.preteyeres.2007.07.005
Swanson EA, Fujimoto JG (2017) The ecosystem that powered the translation of OCT from fundamental research to clinical and commercial impact [Invited]. Biomed Opt Express 8(3):1638–1664. https://doi.org/10.1364/BOE.8.001638
Hee M (1995) Quantitative assessment of macular edema with optical coherence tomography. Arch Ophthalmol 113(8):1019. https://doi.org/10.1001/archopht.1995.01100080071031
de Moura J, Samagaio G, Novo J, Almuina P, Fernández MI, Ortega M (2020) Joint diabetic macular edema segmentation and characterization in OCT images. J Digit Imaging 33(5):1335–1351. https://doi.org/10.1007/s10278-020-00360-y
Mookiah MRK, Acharya UR, Chua CK, Lim CM, Ng E, Laude A (2013) Computer-aided diagnosis of diabetic retinopathy: a review. Comput Biol Med 43(12):2136–2155. https://doi.org/10.1016/j.compbiomed.2013.10.007
Jaffe GJ, Caprioli J (2004) Optical coherence tomography to detect and manage retinal disease and glaucoma. Am J Ophthalmol 137(1):156–169. https://doi.org/10.1016/s0002-9394(03)00792-x
Tan O, Chopra V, Lu ATH, Schuman J, Ishikawa H, Wollstein G, Varma R, Huang D (2009) Detection of macular ganglion cell loss in glaucoma by fourier-domain optical coherence tomography. Ophthalmology 116(12):2305–2314.e2. https://doi.org/10.1016/j.ophtha.2009.05.025
Hood DC (2017) Improving our understanding and detection, of glaucomatous damage: an approach based upon optical coherence tomography (OCT). Prog Retin Eye Res 57:46–75. https://doi.org/10.1016/j.preteyeres.2016.12.002
Mitchell P, Liew G, Gopinath B, Wong TY (2018) Age-related macular degeneration. The Lancet 392(10153):1147–1159. https://doi.org/10.1016/s0140-6736(18)31550-2
Vidal PL, de Moura J, Novo J, Penedo MG, Ortega M (2018) Intraretinal fluid identification via enhanced maps using optical coherence tomography images. Biomed Opt Express 9(10):4730. https://doi.org/10.1364/boe.9.004730
Borrelli E, Sarraf D, Freund KB, Sadda SR (2018) OCT angiography and evaluation of the choroid and choroidal vascular disorders. Prog Retin Eye Res 67:30–55. https://doi.org/10.1016/j.preteyeres.2018.07.002
Spaide RF, Fujimoto JG, Waheed NK, Sadda SR, Staurenghi G (2018) Optical coherence tomography angiography. Prog Retin Eye Res 64:1–55. https://doi.org/10.1016/j.preteyeres.2017.11.003
de Moura J, Novo J, Rouco J, Penedo MG, Ortega M (2017) Automatic detection of blood vessels in retinal OCT images. In: International work-conference on the interplay between natural and artificial computation. Springer, pp 3–10
Kashani AH, Chen CL, Gahm JK, Zheng F, Richter GM, Rosenfeld PJ, Shi Y, Wang RK (2017) Optical coherence tomography angiography: a comprehensive review of current methods and clinical applications. Prog Retin Eye Res 60:66–100. https://doi.org/10.1016/j.preteyeres.2017.07.002
de Moura J, Novo J, Charlón P, Barreira N, Ortega M (2017) Enhanced visualization of the retinal vasculature using depth information in OCT. Med Biol Eng Comput 55(12):2209–2225. https://doi.org/10.1007/s11517-017-1660-8
Schmitt JM, Xiang SH, Yung KM (1999) Speckle in optical coherence tomography. J Biomed Opt 4(1):95. https://doi.org/10.1117/1.429925
Chan HP, Hadjiiski LM, Samala RK (2020) Computer-aided diagnosis in the era of deep learning. Med Phys 47(5):e218–e227. https://doi.org/10.1002/mp.13764
Singh LK, Pooja, Garg M, Khanna M, Bhadoria RS (2021) An enhanced deep image model for glaucoma diagnosis using feature-based detection in retinal fundus. Med Biol Eng Comput 59(2):333–353. https://doi.org/10.1007/s11517-020-02307-5
Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak JA, van Ginneken B, Sánchez CI (2017) A survey on deep learning in medical image analysis. Med Image Anal 42:60–88. https://doi.org/10.1016/j.media.2017.07.005
Lee JH, Kim YT, Lee JB, Jeong SN (2020) A performance comparison between automated deep learning and dental professionals in classification of dental implant systems from dental imaging: a multi-center study. Diagnostics 10(11):910. https://doi.org/10.3390/diagnostics10110910
Ting DSW, Cheung CYL, Lim G, Tan GSW, Quang ND, Gan A, Hamzah H, Garcia-Franco R, Yeo IYS, Lee SY, Wong EYM, Sabanayagam C, Baskaran M, Ibrahim F, Tan NC, Finkelstein EA, Lamoureux EL, Wong IY, Bressler NM, Sivaprasad S, Varma R, Jonas JB, He MG, Cheng CY, Cheung GCM, Aung T, Hsu W, Lee ML, Wong TY (2017) Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 318(22):2211. https://doi.org/10.1001/jama.2017.18152
Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, Venugopalan S, Widner K, Madams T, Cuadros J, Kim R, Raman R, Nelson PC, Mega JL, Webster DR (2016) Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316(22):2402. https://doi.org/10.1001/jama.2016.17216
Shorten C, Khoshgoftaar TM (2019) A survey on image data augmentation for deep learning. J Big Data 6(1). https://doi.org/10.1186/s40537-019-0197-0
Adler DC, Ko TH, Fujimoto JG (2004) Speckle reduction in optical coherence tomography images by use of a spatially adaptive wavelet filter. Opt Lett 29(24):2878. https://doi.org/10.1364/ol.29.002878
Wong A, Mishra A, Bizheva K, Clausi DA (2010) General Bayesian estimation for speckle noise reduction in optical coherence tomography retinal imagery. Opt Express 18(8):8338–8352. https://doi.org/10.1364/OE.18.008338
Cameron A, Lui D, Boroomand A, Glaister J, Wong A, Bizheva K (2013) Stochastic speckle noise compensation in optical coherence tomography using non-stationary spline-based speckle noise modelling. Biomed Opt Express 4(9):1769–1785. https://doi.org/10.1364/BOE.4.001769
Aum J, Kim Jh, Jeong J (2015) Effective speckle noise suppression in optical coherence tomography images using nonlocal means denoising filter with double Gaussian anisotropic kernels. Appl Opt 54(13):D43. https://doi.org/10.1364/ao.54.000d43
Li M, Idoughi R, Choudhury B, Heidrich W (2017) Statistical model for OCT image denoising. Biomed Opt Express 8(9):3903–3917. https://doi.org/10.1364/BOE.8.003903
Chong B, Zhu YK (2013) Speckle reduction in optical coherence tomography images of human finger skin by wavelet modified BM3D filter. Opt Commun 291:461–469. https://doi.org/10.1016/j.optcom.2012.10.053
Kafieh R, Rabbani H, Selesnick I (2015) Three dimensional data-driven multi scale atomic representation of optical coherence tomography. IEEE Trans Med Imaging 34(5):1042–1062. https://doi.org/10.1109/TMI.2014.2374354
Apostolopoulos S, Salas J, Ordóñez JLP, Tan SS, Ciller C, Ebneter A, Zinkernagel M, Sznitman R, Wolf S, Zanet SD, Munk MR (2020) Automatically enhanced OCT scans of the retina: a proof of concept study. Sci Rep 10(1). https://doi.org/10.1038/s41598-020-64724-8
Xu M, Tang C, Hao F, Chen M, Lei Z (2020) Texture preservation and speckle reduction in poor optical coherence tomography using the convolutional neural network. Med Image Anal 64:101727. https://doi.org/10.1016/j.media.2020.101727
Seeböck P, Romo-Bucheli D, Waldstein S, Bogunovic H, Orlando JI, Gerendas BS, Langs G, Schmidt-Erfurth U (2019) Using CycleGANs for effectively reducing image variability across OCT devices and improving retinal fluid segmentation. In: 2019 IEEE 16th international symposium on biomedical imaging (ISBI 2019), pp 605–609. https://doi.org/10.1109/ISBI.2019.8759158
Huang Y, Lu Z, Shao Z, Ran M, Zhou J, Fang L, Zhang Y (2019) Simultaneous denoising and super-resolution of optical coherence tomography images based on generative adversarial network. Opt Express 27(9):12289–12307. https://doi.org/10.1364/OE.27.012289
Gende M, de Moura J, Novo J, Ortega M (2022) High/low quality style transfer for mutual conversion of OCT images using contrastive unpaired translation generative adversarial networks. In: Image analysis and processing – ICIAP 2022, Lecture notes in computer science. Springer International Publishing, Cham, pp 210–220
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Kopf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) PyTorch: an imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R (eds) Advances in neural information processing systems, vol 32. Curran Associates, Inc, pp 8026–8037. https://proceedings.neurips.cc/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf
Bradski G (2000) The OpenCV library. Dr Dobb's J Softw Tools 25:120–125
Park T, Efros AA, Zhang R, Zhu JY (2020) Contrastive learning for unpaired image-to-image translation. In: Computer vision – ECCV 2020. Springer International Publishing, Cham, pp 319–345. https://doi.org/10.1007/978-3-030-58545-7_19
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2020) Generative adversarial networks. Commun ACM 63(11):139–144. https://doi.org/10.1145/3422622
Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: III HD, Singh A (eds) Proceedings of the 37th international conference on machine learning, Proceedings of Machine Learning Research. https://proceedings.mlr.press/v119/chen20j.html, vol 119. PMLR, pp 1597–1607
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2016.90, pp 770–778
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: Bengio Y, LeCun Y (eds) 3rd international conference on learning representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015 Conference Track Proceedings. arXiv:1412.6980
Mittal A, Moorthy AK, Bovik AC (2012) No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 21(12):4695–4708. https://doi.org/10.1109/tip.2012.2214050
Horé A, Ziou D (2010) Image quality metrics: PSNR vs. SSIM. In: 2010 20th international conference on pattern recognition. https://doi.org/10.1109/ICPR.2010.579, pp 2366–2369
Wang Z, Bovik A, Sheikh H, Simoncelli E (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612. https://doi.org/10.1109/TIP.2003.819861
Zhang Z, Dai G, Liang X, Yu S, Li L, Xie Y (2018) Can signal-to-noise ratio perform as a baseline indicator for medical image quality assessment. IEEE Access 6:11534–11543. https://doi.org/10.1109/access.2018.2796632
Yu S, Dai G, Wang Z, Li L, Wei X, Xie Y (2018) A consistency evaluation of signal-to-noise ratio in the quality assessment of human brain magnetic resonance images. BMC Med Imaging 18(1). https://doi.org/10.1186/s12880-018-0256-6
Chaabouni A, Gaudeau Y, Lambert J, Moureaux JM, Gallet P (2014) Subjective and objective quality assessment for H264 compressed medical video sequences. In: 2014 4th international conference on image processing theory, tools and applications (IPTA). https://doi.org/10.1109/IPTA.2014.7001922, pp 1–5
Moorthy AK, Bovik AC (2010) A two-step framework for constructing blind image quality indices. IEEE Sig Process Lett 17(5):513–516
Mittal A, Soundararajan R, Bovik AC (2013) Making a “completely blind” image quality analyzer. IEEE Sig Process Lett 20(3):209–212
Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 2261–2269. https://doi.org/10.1109/cvpr.2017.243
Nugroho KA (2018) A comparison of handcrafted and deep neural network feature extraction for classifying optical coherence tomography (OCT) images. In: 2018 2nd international conference on informatics and computational sciences (ICICoS). https://doi.org/10.1109/ICICOS.2018.8621687, pp 1–6
Al-Bander B, Williams BM, Al-Nuaimy W, Al-Taee MA, Pratt H, Zheng Y (2018) Dense fully convolutional segmentation of the optic disc and cup in colour fundus for glaucoma diagnosis. Symmetry 10(4). https://doi.org/10.3390/sym10040087, https://www.mdpi.com/2073-8994/10/4/87
Wang S, Tang C, Sun J, Zhang Y (2019) Cerebral micro-bleeding detection based on densely connected neural network. Front Neurosci 13. https://doi.org/10.3389/fnins.2019.00422
Yildirim O, Talo M, Ay B, Baloglu UB, Aydin G, Acharya UR (2019) Automated detection of diabetic subject using pre-trained 2D-CNN models with frequency spectrum images extracted from heart rate signals. Comput Biol Med 113:103387. https://doi.org/10.1016/j.compbiomed.2019.103387
Funding
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research was funded by Instituto de Salud Carlos III, Government of Spain (research project DTS18/00136); Ministerio de Ciencia e Innovación, Government of Spain (research projects RTI2018-095894-B-I00, PID2019-108435RB-I00, TED2021-131201B-I00 and PDC2022-133132-I00); Consellería de Cultura, Consellería de Cultura, Educación e Universidade, Xunta de Galicia, through Grupos de Referencia Competitiva (grant number ED431C 2020/24), predoctoral grant (grant number ED481A 2021/161) and postdoctoral grant (grant number ED481B 2021/059); Axencia Galega de Innovación (GAIN), Xunta de Galicia (grant number IN845D 2020/38); CITIC, Centro de Investigación de Galicia (grant number ED431G 2019/01), receives financial support from Consellería de Educación, Universidade e Formación Profesional, Xunta de Galicia, through the ERDF (80%) and Secretaría Xeral de Universidades (20%). Funding for open access charge: Universidade da Coruña/CISUG.
Ethics declarations
This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of Universidade da Coruña/Ferrol (2014/437) on 24 November 2014.
Conflict of interest
The authors declare no competing interests.
Disclaimer
The funding sources had no role in the development of this work.
Gende, M., de Moura, J., Novo, J. et al. A new generative approach for optical coherence tomography data scarcity: unpaired mutual conversion between scanning presets. Med Biol Eng Comput 61, 1093–1112 (2023). https://doi.org/10.1007/s11517-022-02742-6