1 Introduction

Phase contrast microscopy [19] allows researchers to image hundreds of live cells under different treatments over days or weeks without invasive staining (Fig. 1(a)). Such high-throughput experiments need a wide view to monitor the entire cell population, so the magnification of the phase contrast microscope is set low. However, low magnification loses cell details and yields low-resolution (\(\mathbf {LR}\)) images of individual cells (Fig. 1(b)). Obtaining high-resolution (\(\mathbf {HR}\)) images with cell details requires increasing the magnification, which limits the view to a small number of cells (Fig. 1(c)) and is therefore unsuitable for monitoring large-scale cell populations.

Fig. 1. (a) Monitoring cell populations with a low magnification; (b) A zoomed-in image shows the low resolution on individual cells; (c) A high magnification visualizes cell details but with a limited view.

To achieve high magnification and a wide view simultaneously, we propose a novel super-resolution approach that increases the image resolution of cell details while maintaining the wide view of the cell population.

2 Related Work and Our Proposal

The single image super-resolution problem was first tackled by filtering approaches such as linear, bicubic [8], and Lanczos [4] interpolation. However, these filtering methods produce overly smooth textures in the recovered HR images.

To preserve the edges in the recovered HR images, example-based approaches [17, 18], which learn a mapping between low- and high-resolution image patches, were proposed. The example-based (or patch-based) methods exploit the self-similarity property and construct high-resolution image patches from the input image [5]. However, they suffer from the heavy computational cost of patch search. Moreover, the low- and high-resolution patch pairs that are found may not be sufficient to represent the large textural variations in the testing images.

To overcome the drawbacks of the example-based approaches, deep learning algorithms [3, 9] were proposed to super-resolve an image. Instead of modeling the low- to high-resolution mapping in the patch space, these algorithms learn the nonlinear mapping in the image space. Dong \(et\ al.\) upscale an input LR image and train a three-layer deep convolutional network to learn an end-to-end mapping between low- and high-resolution images [3]. Ledig \(et\ al.\) present a Generative Adversarial Network (GAN) for image super-resolution, which is capable of inferring photo-realistic natural images for \(4\times \) upscaling factors [12]. Lai \(et\ al.\) propose a Laplacian Pyramid Super-Resolution Network to progressively reconstruct the sub-band residuals of the high-resolution images [11]. However, these deep learning algorithms mainly focus on natural image super-resolution and may not be suitable for super-resolving microscopy images, as they do not take the optical properties of microscopy imaging into account.

In this paper, we propose a cascaded refinement GAN for phase contrast microscopy image super-resolution. The contributions of this study are threefold. \(\underline{First}\), to the best of our knowledge, this is the first framework capable of super-resolving phase contrast microscopy images for \(8 \times \) upscaling factors. \(\underline{Second}\), we design a perceptual loss consisting of an adversarial loss and a content loss for our algorithm and achieve the best performance on phase contrast microscopy image super-resolution. \(\underline{Third}\), we develop an optics-related data enhancement to improve the performance of our algorithm on phase contrast microscopy image super-resolution.

3 Preliminaries

3.1 Generative Adversarial Networks

A Generative Adversarial Network (GAN) [6] consists of one generator network G and one discriminator network D. The two networks are trained alternately to compete in a two-player min-max game. The generator G is optimized to simulate the true data distribution by synthesizing images that are difficult for the discriminator D to tell apart from real ones. The discriminator D is optimized to avoid being fooled by the generator G, that is, to correctly distinguish the real images from the synthetic ones. The two networks play the min-max game with the following objective function:

$$\begin{aligned} \min _{G}\max _{D} V(D, G) = \mathbb {E}_{x\sim p_{data}}[\log D(x)] + \mathbb {E}_{z\sim p_{z}(z)}[\log (1 - D(G(z)))], \end{aligned}$$
(1)

where x is a real image sampled from the real data distribution \(p_{data}\), and z is a noise vector drawn from a distribution \(p_{z}\) (e.g., a Gaussian or uniform distribution).
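To make Eq. 1 concrete, the following minimal sketch estimates \(V(D, G)\) from a handful of discriminator outputs. The function name `gan_value` and the small constant `eps` are illustrative assumptions and not part of the original formulation.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of V(D, G) in Eq. 1.

    d_real: discriminator outputs D(x) on real samples, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1)
    eps:    small constant (an assumption here) that guards against log(0)
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# A discriminator that cannot tell real from fake outputs 0.5 everywhere,
# which gives V = 2 * log(0.5) = -log(4).
v_confused = gan_value([0.5, 0.5], [0.5, 0.5])

# A confident and correct discriminator pushes V toward its maximum of 0.
v_confident = gan_value([0.99, 0.98], [0.01, 0.02])
```

The discriminator tries to increase this value, while the generator tries to decrease it by raising \(D(G(z))\).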

3.2 Optics-Related Data Enhancement

The properties of phase contrast microscopy images (e.g., halo artifacts and low image contrast) motivated researchers to derive optics-related imaging models for microscopy image restoration [16], which include a series of Diffraction Pattern Filters (DPFs, Fig. 2(a1–a6)). Rather than solving the inverse problem of image restoration, we leverage the DPFs to enrich phase contrast microscopy images (Fig. 2(b0)) by convolving them with the DPFs. As shown in Fig. 2(b1–b6), the convolution generates a set of images sensitive to different types of cell regions, which serves as an optics-related data enhancement of the original input.

Fig. 2. DPFs and the enriched phase contrast microscopy images.
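The enrichment step amounts to convolving one image with a bank of filters and stacking the responses. The sketch below illustrates this with NumPy. The actual DPF kernels come from the imaging model in [16] and are not reproduced here; `placeholder_filter_bank` is a hypothetical stand-in (an impulse minus a Gaussian of varying width), and whether the original image is also kept in the stack is not specified in the text.

```python
import numpy as np

def conv2d_same(img, kernel):
    """Zero-padded 2D convolution that keeps the input size (odd kernel sizes)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def placeholder_filter_bank(num_filters=6, size=7):
    """Hypothetical stand-in kernels; NOT the DPFs derived in [16]."""
    ax = np.arange(size) - size // 2
    yy, xx = np.meshgrid(ax, ax, indexing="ij")
    delta = np.zeros((size, size))
    delta[size // 2, size // 2] = 1.0
    bank = []
    for n in range(num_filters):
        sigma = 0.5 + 0.5 * n
        g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
        bank.append(delta - g / g.sum())
    return bank

def enrich(img, bank):
    """Stack the filter responses along a new leading channel axis."""
    return np.stack([conv2d_same(img, k) for k in bank], axis=0)

rng = np.random.default_rng(0)
lr = rng.random((32, 32))           # stand-in for an LR phase contrast image
stack = enrich(lr, placeholder_filter_bank())
print(stack.shape)                  # (6, 32, 32)
```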

4 A Cascaded Refinement GAN for Super Resolution

In this section, we describe the network structure of the proposed cascaded refinement generative adversarial network (GAN) for phase contrast microscopy image super resolution, the loss function for network optimization, and the network implementation and training details.

4.1 Network Architecture

Generator Architecture: As shown in Fig. 3, the LR phase contrast image is first enhanced by convolving it with a set of DPFs. The proposed generator takes the enhanced image stack as input and refines it with cascaded refinement modules. Each module operates at a certain resolution, and the resolution is doubled between two consecutive modules.

All the modules \(M_{s} (s \in \{1,2,3\})\) are structurally identical and consist of three layers: the input layer, the upsampling layer, and the convolutional layer (Fig. 3). The input layer of the first module is the enriched image stack of the LR image, while the input layer of each subsequent module is the convolutional layer of the previous module. The upsampling layer is obtained by bilinearly upsampling the input layer. Because upconvolution is prone to introducing characteristic artifacts into the output image [2, 14], we discard it and use bilinear upsampling instead. The convolutional layer is obtained by applying \(3 \times 3\) convolutions, layer normalization [1], and a Leaky ReLU nonlinearity [13] to the upsampling layer. A linear projection (\(1 \times 1\) convolution) is applied to the convolutional layer to project the feature tensor to the output image. Note that the output image at each upsampling level is not used as the input of the next module.
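The resolution doubling across the three modules can be traced with a simplified single-channel sketch. It keeps the bilinear upsampling, layer normalization, and Leaky ReLU steps but omits the learned \(3 \times 3\) convolutions and the \(1 \times 1\) projection, so it is an illustration of the data flow rather than the trained generator. The half-pixel sampling convention in `bilinear_upsample2x` and the parameter-free `layer_norm` are assumptions.

```python
import numpy as np

def bilinear_upsample2x(img):
    """Bilinear 2x upsampling (half-pixel centers, edges clamped)."""
    h, w = img.shape
    ys = (np.arange(2 * h) + 0.5) / 2 - 0.5
    xs = (np.arange(2 * w) + 0.5) / 2 - 0.5
    # Interpolate along the x axis for every row, then along the y axis.
    tmp = np.stack([np.interp(xs, np.arange(w), row) for row in img], axis=0)
    return np.stack([np.interp(ys, np.arange(h), col) for col in tmp.T], axis=1)

def layer_norm(x, eps=1e-5):
    """Normalize over the whole feature map (no learned gain or bias)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def refinement_module(feat):
    up = bilinear_upsample2x(feat)
    # The learned 3x3 convolutions would be applied here; omitted in this sketch.
    return leaky_relu(layer_norm(up))

feat = np.random.default_rng(0).random((32, 32))
sizes = [feat.shape]
for _ in range(3):                  # three cascaded modules for 8x upscaling
    feat = refinement_module(feat)
    sizes.append(feat.shape)
print(sizes)  # [(32, 32), (64, 64), (128, 128), (256, 256)]
```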

Fig. 3. Architecture of the Generator and Discriminator in our model.

Discriminator Architecture: The discriminator contains four downsampling modules, each consisting of \(3 \times 3\) convolutions, layer normalization, a Leaky ReLU nonlinearity, and \(2 \times 2\) max-pooling. The discriminator classifies whether an input image is real or fake.
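As a rough illustration of the spatial sizes inside the discriminator, the sketch below applies four \(2 \times 2\) max-pooling steps to a single-channel map. It assumes a \(256 \times 256\) input and size-preserving (zero-padded) convolutions, and it leaves out the convolutions, normalization, and nonlinearity themselves.

```python
import numpy as np

def max_pool2x2(x):
    """Non-overlapping 2x2 max-pooling on a 2D array."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]   # drop a trailing odd row or column
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feat = np.random.default_rng(0).random((256, 256))
size_trace = [feat.shape[0]]
for _ in range(4):                  # four downsampling modules
    feat = max_pool2x2(feat)
    size_trace.append(feat.shape[0])
print(size_trace)  # [256, 128, 64, 32, 16]
```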

4.2 Loss Function

Let x be the input LR image and y be the corresponding HR counterpart. Our ultimate goal is to learn a generator G with parameters \(\theta _{G}\) that generates an HR image \(\hat{y} = G(x;\theta _{G})\) as similar as possible to the ground truth HR image y. To learn the parameters (\(\theta _{G}\) in the generator and \(\theta _{D}\) in the discriminator), we formulate a perceptual loss function as the weighted sum of an adversarial loss and a content loss:

$$\begin{aligned} L(y, \hat{y}) = L_{adv}(y, \hat{y}) + \alpha L_{con}(y, \hat{y}). \end{aligned}$$
(2)

Adversarial Loss: The adversarial loss in the GAN encourages the generator to produce images residing on the manifold of the ground truth images by trying to fool the discriminator. From Eq. 1, the adversarial loss is formulated as \(\mathbb {E}_{z\sim p_{z}(z)}[\log (1 - D(G(z)))]\). In our case, the generator takes an LR image instead of a noise vector as input. Accordingly, the adversarial loss \(L_{adv}(y, \hat{y})\) over K training samples can be defined as:

$$\begin{aligned} L_{adv}(y, \hat{y}) = \frac{1}{K} \sum _{k = 1}^{K} [\log (1 - D(G(x_{k}; \theta _{G}); \theta _{D}))]. \end{aligned}$$
(3)

Early in training, we can minimize \(-\log D(G(x_{k}; \theta _{G}); \theta _{D})\) instead of \(\log (1 - D(G(x_{k}; \theta _{G}); \theta _{D}))\) to mitigate gradient saturation [6].
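The effect of the non-saturating alternative from [6] can be checked numerically. The sketch below assumes the discriminator output is a sigmoid of a scalar logit and compares the gradient of both generator losses with respect to that logit when the discriminator confidently rejects a generated image. The helper names are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def saturating_loss(logit):
    """log(1 - D(G(x))), the generator term taken directly from Eq. 1."""
    return np.log(1.0 - sigmoid(logit))

def non_saturating_loss(logit):
    """-log D(G(x)), the alternative used early in training."""
    return -np.log(sigmoid(logit))

def numerical_grad(f, a, h=1e-5):
    """Central finite difference of f at a."""
    return (f(a + h) - f(a - h)) / (2 * h)

logit = -6.0  # D(G(x)) is about 0.0025: the fake is confidently rejected
g_sat = numerical_grad(saturating_loss, logit)
g_ns = numerical_grad(non_saturating_loss, logit)
print(abs(g_sat) < 0.01, abs(g_ns) > 0.9)  # True True
```

Analytically, the two gradients are \(-D\) and \(-(1 - D)\), so the first vanishes and the second stays close to \(-1\) when \(D(G(x))\) is near zero.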

Content Loss: In addition to the adversarial loss in the GAN, we add a content loss \(L_{con}(y, \hat{y})\) to our perceptual loss function. Many previous state-of-the-art approaches rely on the widely used pixel-wise mean square error (MSE) loss to learn the parameters of the LR-HR mapping functions [3, 15]. However, solutions of MSE optimization problems often fail to capture high-frequency content and result in overly smooth textures [7, 12]. Instead, we propose a new content loss that uses the DPFs to emphasize the loss on different types of cell regions:

$$\begin{aligned} L_{con}(y, \hat{y}) = \frac{1}{K} \sum _{k = 1}^{K} \sum _{s = 1}^{S} \sum _{n = 0}^{N-1} \Vert y_{s} *DPF_{n} - \hat{y}_{s} *DPF_{n} \Vert _{1}, \end{aligned}$$
(4)

where \(y_{s}\) is the downsampled ground truth image at level s, \(\hat{y}_{s}\) is the generated upscaled LR image at level s (the output layer of the \(s^{th}\) module), S is the number of refinement modules in the generator, N is the number of DPFs, and \(\Vert \cdot \Vert _{1}\) is the \(\ell _{1}\) distance.
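For a single training sample, Eq. 4 can be sketched as below. Two points are assumptions, since the text does not fix them: the ground truth is downsampled to each level by block averaging, and the two kernels in the usage example (an impulse and a discrete Laplacian) are placeholders rather than the DPFs of [16].

```python
import numpy as np

def conv2d_same(img, kernel):
    """Zero-padded 2D convolution that keeps the input size (odd kernel sizes)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def avg_downsample(img, factor):
    """Block-average downsampling (assumed; the paper does not name the method)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def content_loss(y_hr, y_hats, kernels):
    """Sum over levels s and filters n of the l1 distance in Eq. 4 (one sample).

    y_hats: list of the S module outputs, coarsest first; the last matches y_hr.
    """
    total = 0.0
    for y_hat in y_hats:
        factor = y_hr.shape[0] // y_hat.shape[0]
        y_s = avg_downsample(y_hr, factor)
        for k in kernels:
            total += np.abs(conv2d_same(y_s, k) - conv2d_same(y_hat, k)).sum()
    return total

rng = np.random.default_rng(0)
y_hr = rng.random((32, 32))
kernels = [np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float),
           np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)]
perfect = [avg_downsample(y_hr, f) for f in (4, 2, 1)]
loss_perfect = content_loss(y_hr, perfect, kernels)                # exactly 0.0
loss_noisy = content_loss(y_hr, [p + 0.1 for p in perfect], kernels)
```

Averaging this quantity over the K samples of a batch gives the full loss.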

4.3 Implementation and Training Details

The number of refinement modules in our generator is determined by the upscaling factor. In our experiments with an upscaling factor of \(8 \times \), there are 3 refinement modules, corresponding to a resolution increase from \(32 \times 32\) to \(256 \times 256\) in the generator. The Leaky ReLU in the generator and discriminator uses a negative slope of 0.2. Zero-padding is applied before each convolution to keep the size of all feature maps unchanged within each module. The number of DPFs in our experiment is 12.

To train our networks, we alternate between one gradient descent step on the discriminator and one step on the generator. We use minibatch stochastic gradient descent (SGD) with the Adam solver [10] and a momentum parameter of 0.9. The batch size is 1, the learning rate is \(1 \times 10^{-4}\), and the algorithm is trained for 100 iterations. The weight \(\alpha \) in the loss function is 1.
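The alternating schedule can be sketched with a plain NumPy Adam update. The learning rate and first-moment coefficient follow the values stated above. The second-moment coefficient `beta2=0.999` and `eps=1e-8` are the common Adam defaults and are assumptions here, and `toy_grad` is a hypothetical stand-in for the real discriminator and generator gradients.

```python
import numpy as np

def new_state(theta):
    return {"t": 0, "m": np.zeros_like(theta), "v": np.zeros_like(theta)}

def adam_step(theta, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad**2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

def toy_grad(theta):
    """Gradient of theta**2; a placeholder for the true loss gradients."""
    return 2.0 * theta

theta_d, theta_g = np.array([1.0]), np.array([-1.0])
state_d, state_g = new_state(theta_d), new_state(theta_g)
for _ in range(100):
    theta_d = adam_step(theta_d, toy_grad(theta_d), state_d)  # discriminator step
    theta_g = adam_step(theta_g, toy_grad(theta_g), state_g)  # then generator step
```

With a gradient of nearly constant sign, each Adam step moves a parameter by roughly the learning rate, so both toy parameters shift by about 0.01 after 100 iterations.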

5 Experimental Results

Dataset: We capture 11,500 high-resolution phase contrast microscopy images with different cell densities at a resolution of \(256 \times 256\) pixels and obtain the low-resolution images by downsampling the high-resolution ones. From the resulting high-resolution and low-resolution image pairs, we randomly select 10,000 pairs as the training set, 1,000 as the validation set, and 500 as the testing set.
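A possible way to build the LR/HR pairs and the random split is sketched below. The downsampling method is not stated in the text, so the \(8 \times 8\) block averaging and the fixed random seed are assumptions made only for illustration.

```python
import numpy as np

def downsample_by_averaging(hr, factor=8):
    """Block-average an HR image; an assumed stand-in for the unspecified method."""
    h, w = hr.shape
    return hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

rng = np.random.default_rng(0)

# Random, non-overlapping split of the 11,500 image pairs.
order = rng.permutation(11500)
train_idx, val_idx, test_idx = order[:10000], order[10000:11000], order[11000:]

hr = rng.random((256, 256))         # stand-in for one HR image
lr = downsample_by_averaging(hr)
print(lr.shape)                     # (32, 32)
```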

Evaluation Metrics: We evaluate the proposed super-resolution algorithm with a widely-used image quality metric: Peak Signal-to-Noise Ratio (PSNR).
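For reference, PSNR is defined as \(10 \log _{10}(MAX^{2} / MSE)\), where MAX is the largest possible pixel value. A minimal sketch follows; the function name and the default `max_val` are illustrative choices.

```python
import numpy as np

def psnr(reference, estimate, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")         # identical images
    return 10.0 * np.log10(max_val**2 / mse)

# A uniform error of 0.1 on images in [0, 1] gives MSE = 0.01, i.e. 20 dB.
ref = np.zeros((8, 8))
value = psnr(ref, ref + 0.1, max_val=1.0)
```

Higher PSNR indicates a smaller pixel-wise error relative to the ground truth.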

Fig. 4. Upsampling phase contrast images by different algorithms (\(8 \times \) upsampling).

Evaluation: We compare our algorithm with the baseline bicubic [8] and the current state-of-the-art LapSRN [11] algorithms. Figure 4 presents a visual comparison on some randomly picked images. Bicubic filtering gives very blurry super-resolution results. LapSRN generates much sharper and clearer high-resolution images than bicubic filtering, but the generated images are over-smoothed, mainly because the loss function of LapSRN is not designed for microscopy image super-resolution. By taking the optical properties of phase contrast microscopy imaging into consideration and designing a perceptual loss, our proposed algorithm generates HR images with clear cell details.

Fig. 5. Upsampling an input phase contrast image with a wide-view. Please zoom-in the online version to observe cell details.

Given an input phase contrast image with a wide view (e.g., Fig. 5(a)), we can divide the image into \(32 \times 32\) patches, super-resolve each patch, and then combine the predicted patches to generate an HR image with megapixel resolution (Fig. 5(b)).
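The tiling procedure can be sketched as follows. The argument `sr_fn` stands for the trained generator; here it is replaced by a hypothetical nearest-neighbor upscaler so that the example runs on its own. The sketch assumes non-overlapping patches and image sizes that are multiples of the patch size.

```python
import numpy as np

def super_resolve_tiled(image, sr_fn, patch=32, scale=8):
    """Split into patch x patch tiles, super-resolve each tile, and stitch."""
    h, w = image.shape
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    out = np.zeros((h * scale, w * scale))
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tile = image[r:r + patch, c:c + patch]
            out[r * scale:(r + patch) * scale,
                c * scale:(c + patch) * scale] = sr_fn(tile)
    return out

def nearest_neighbor_sr(tile, scale=8):
    """Placeholder for the generator: repeats every pixel scale x scale times."""
    return np.kron(tile, np.ones((scale, scale)))

wide = np.random.default_rng(0).random((64, 96))   # stand-in wide-view image
hr = super_resolve_tiled(wide, nearest_neighbor_sr)
print(hr.shape)  # (512, 768)
```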

Ablation Study: First, we investigate the effect of different loss functions. As shown in Fig. 6(b1, b2), using only the adversarial loss cannot generate HR images with clear overall content, although some cell details are exhibited. The images generated using only the content loss (Fig. 6(c1, c2)) are not sharp enough, and the cell structures are not well presented.

Second, we investigate the effect of the optics-related data enhancement. If the original image is used as the input to the generator in Fig. 3 (that is, the input image is not enriched by DPFs, and the content loss is defined as \(\frac{1}{K}\sum _{k = 1}^{K} \sum _{s = 1}^{S}\Vert y_{s} - \hat{y}_{s}\Vert \) accordingly), the generated images (Fig. 6(d1, d2)) are sharp but miss some cell details compared to the images generated using the enriched input (Fig. 6(e1, e2)). The enriched phase contrast images present more feature information about cells, especially when the original input phase contrast image has low contrast and few textures.

Fig. 6. The effect of different loss functions and the optics-related data enhancement.

The quantitative results of the evaluation and ablation study are summarized in Table 1, which shows that our cascaded refinement GAN with optics-related data enhancement and perceptual loss (adversarial loss plus content loss) achieves the best performance.

Table 1. Quantitative evaluation.

6 Conclusion

In this paper, we investigate a super-resolution algorithm that generates high-resolution phase contrast microscopy images from low-resolution ones. Instead of upscaling the input image to the desired resolution directly, the proposed algorithm predicts the high-resolution image in a coarse-to-fine fashion, i.e., it increases the spatial resolution of the input image progressively. A new loss function, consisting of a content loss and an adversarial loss, is designed for the proposed algorithm. The content loss forces the predicted high-resolution image to be similar to the real image in the feature domain enriched by optics-related data enhancement, while the adversarial loss encourages the prediction to be sharp and to contain more cell details.

The experiments demonstrate that our algorithm is effective at recovering high-resolution phase contrast microscopy images from low-resolution ones and that it outperforms the current state-of-the-art super-resolution algorithm. The research outcome provides a computational solution for achieving high magnification of individual cells’ details and a wide view of cell populations at the same time, which will benefit the microscopy community.