1 Introduction

The demand for fine processing has led to interest in the optical detection of micro/nano defects over large areas. Large fine optics for high-power lasers require surfaces that are free from sub-micron defects [1,2,3]. Zero-defect manufacturing for realizing specular surfaces has been extensively studied [4]. Recent studies in the fields of additive manufacturing and semiconductor processing have shown that the detection of sub-micron defects or particles is critical for maintaining fine processing conditions [5]. However, sub-micron defects are difficult to detect because the intensity of scattered light decreases substantially with decreasing defect size [6].

For the optical detection of weakly scattered light from small defects, studies have applied physical and machine vision approaches. For the physical approach, many studies have considered the dark-field condition. Light structured in the spatial or spectral domain offers advantages for detecting oriented sub-micron defects [7, 8]. A line-scanning laser has been applied to improve detection speed [9, 10]. Both the intensity and phase information of scattered light can be utilized with good robustness [11,12,13]. Localized physical phenomena involving evanescent waves or Raman scattering allow the detection of sub-micron defects [14, 15]. However, the above methods have poor sensitivity and efficiency. For the machine vision approach, methods that use a microscope have been developed to detect the position of small surface defects and classify them [16,17,18,19]. In this approach, the numerical aperture of the optical system determines the defect detection limit.

To overcome these limitations, we adopt ghost imaging (GI), which is a single-pixel imaging method based on the correlation between the intensity distribution of the illumination light and the light intensities detected by a bucket detector [20,21,22]. The high-sensitivity detector and correlation-based detection of GI make its sensitivity higher than that of general methods. GI is thus attractive for detecting weak light [23]. However, obtaining ghost images is inefficient because many measurements are required to acquire clear images from the correlation calculation. Therefore, GI has been limited to static objects. To address this inefficiency, we apply deep learning (DL) to reduce the number of measurements required for correlation. DL can predict true values from limited information [24, 25]. In the imaging field specifically, DL can reconstruct high-resolution images from lower-resolution images. Although the image quality of GI with few measurements is low, such a GI image still contains information about the sample. A few studies have combined DL with GI [26, 27]. However, these studies were based on an analytical method called compressive sensing rather than on correlation, which makes them difficult to apply to the detection of weak light. Accordingly, to recover an image from weak light, we apply DL to GI (DLGI) with few measurements. In this paper, we develop a GI system and combine it with a convolutional neural network (CNN) designed on the basis of a noise analysis of GI, with the aim of fast defect-position mapping of weakly scattered light sources.

2 Concept of Sub-Micron Defect Mapping Using Ghost Imaging with Deep Learning

The concept of sub-micron defect mapping using GI with DL is shown in Fig. 1. Computer-generated random patterns are projected onto a sample with small defects. The light scattered by the defects is then detected by a single-pixel detector. GI is based on the second-order correlation between the fluctuations of the illumination patterns and those of the detected signals [20]. The distribution of the correlation function \( G\left( {x,y} \right) \) is expressed as

$$ G\left( {x,y} \right) = \left\langle {\Delta I_{n}\left( {x,y} \right) \Delta B_{n}} \right\rangle , $$
(1)

where \( \Delta I_{n}\left( {x,y} \right) \) is the fluctuation of the spatial distribution of the illumination patterns, \( \Delta B_{n} \) is the intensity fluctuation of the time-dependent signals detected by the single-pixel detector, and the angle brackets denote the average over the n measurements.

Fig. 1 Configuration of sub-micron defect mapping using ghost imaging with DL
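As an illustration of Eq. (1), the following is a minimal NumPy sketch that reads the correlation as an average of the product of the two fluctuations over the n measurements. The function name `ghost_image`, the random binary patterns, the noise-free detector model, and the 16 × 16 test object with one bright pixel are assumptions made for this example and are not taken from the experimental system.

```python
import numpy as np

def ghost_image(patterns, bucket):
    """Correlate illumination patterns with bucket signals, following Eq. (1).

    patterns: array of shape (n, H, W), one illumination pattern per measurement
    bucket:   array of shape (n,), one single-pixel detector reading per measurement
    """
    patterns = np.asarray(patterns, dtype=float)
    bucket = np.asarray(bucket, dtype=float)
    delta_i = patterns - patterns.mean(axis=0)  # fluctuation of each pattern pixel
    delta_b = bucket - bucket.mean()            # fluctuation of the detected signal
    # Average of the product of the fluctuations over the n measurements
    return np.tensordot(delta_b, delta_i, axes=(0, 0)) / len(bucket)

# Hypothetical test object: 16 x 16 pixels with one bright pixel in the centre
rng = np.random.default_rng(0)
obj = np.zeros((16, 16))
obj[8, 8] = 1.0

n = 10_000
patterns = rng.integers(0, 2, size=(n, 16, 16)).astype(float)  # random binary patterns
bucket = (patterns * obj).sum(axis=(1, 2))                     # ideal, noise-free detector

g = ghost_image(patterns, bucket)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(g), g.shape))
print(peak)
```

With this many measurements, the correlation at the bright pixel is far above the residual correlation at the other pixels, so the brightest pixel of `g` marks the object position.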

Figure 2 shows examples of GI images. A commercial projector was used as the illumination source (wavelength: 532 nm) and a photodiode was used as the single-pixel detector to image a ϕ10-μm pinhole. As shown in Fig. 2, although the difference in illumination power was large (12 vs. 0.3 mW), the reconstructed images had almost the same quality. The correlation-based imaging automatically reduced the noise signals in the imaging process. However, GI requires a sufficient number of measurements to obtain clear images, which makes the detection time very long. To overcome this problem, a CNN was employed to improve ghost image quality with fewer measurements.

Fig. 2 Advantage of GI for detecting weak light, shown for 10,000 measurements. Illumination power is (a) 12 mW and (b) 0.3 mW. The top graphs show the time-dependent behavior of the illumination light and the bottom images show the reconstructed images of the ϕ10-μm pinhole

3 Design and Numerical Analysis of Convolutional Neural Network for Ghost Imaging

A CNN generally consists of convolutional layers and dense layers [25, 26]. Before designing the CNN for GI, we analyzed the degradation of reconstructed GI images. Figure 3 shows the causes of GI degradation, which can be divided into local and global causes. Local causes include the presence of an alternative hypothesis and a blurred pattern. GI is analogous to testing a null hypothesis, and GI with few measurements does not satisfy the condition of the null hypothesis. Therefore, some pixels have a high correlation value regardless of whether a defect exists. The alternative hypothesis is independent for each pixel. Furthermore, the point spread function (PSF) of the optical setup depends on the projection setup, so the projected pattern is blurred on the sample surface. The blur error does not depend on the pixel. A convolutional layer can reduce the effect of these local causes. Global causes include global noise around the GI system, which affects all pixels in the image. Environmental noise, such as atmospheric turbulence and stray light, and electrical noise, such as shot noise, thermal noise, and dark current, are error sources that can degrade GI images when few measurements are used. The signal of scattered light affects the detected intensity. A dense layer can reduce the effect of these global causes.
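The local error described above, in which defect-free pixels show a high correlation value when few measurements are used, can be illustrated with a small simulation. This is a sketch under stated assumptions: random binary patterns, a noise-free detector, one bright pixel in the center of a 16 × 16 sample, and hypothetical helper names (`ghost_image`, `count_false_peaks`). It counts how many defect-free pixels reach a correlation value at least as high as that of the true bright pixel.

```python
import numpy as np

def ghost_image(patterns, bucket):
    """Average of the product of pattern and bucket fluctuations (Eq. (1))."""
    delta_i = patterns - patterns.mean(axis=0)
    delta_b = bucket - bucket.mean()
    return np.tensordot(delta_b, delta_i, axes=(0, 0)) / len(bucket)

def count_false_peaks(n, trials=200, size=16, seed=0):
    """Average number of defect-free pixels whose correlation value is at
    least as high as that of the true bright pixel, for n measurements."""
    rng = np.random.default_rng(seed)
    c = size // 2
    total = 0
    for _ in range(trials):
        patterns = rng.integers(0, 2, size=(n, size, size)).astype(float)
        bucket = patterns[:, c, c]              # only the centre pixel scatters light
        g = ghost_image(patterns, bucket)
        # Count pixels that tie with or exceed the true pixel, excluding itself
        total += np.count_nonzero(g >= g[c, c] - 1e-12) - 1
    return total / trials

few = count_false_peaks(4)      # few measurements: many spurious candidates
many = count_false_peaks(256)   # many measurements: spurious candidates vanish
print(few, many)
```

With only four binary patterns, several pixels receive exactly the same illumination sequence as the bright pixel, so they cannot be distinguished from it by correlation alone. This is consistent with the motivation for supplying additional information to the CNN.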

Fig. 3 Analysis of reconstructed image degradation for GI

Figure 4 shows the architecture of the CNN for GI. The CNN consists of an input layer, several convolutional and dense layers, and a softmax function for deriving probability distributions. The input layer takes a reconstructed image together with its illumination patterns to avoid the alternative hypothesis. First, we reduced the effects of the local and global causes using convolutional and dense layers, respectively. Then, we obtained the existence probability of light at all pixels as an image. All layers used a rectified linear unit (ReLU) function as the activation function. A dataset of 4000 images, each of which was 16 × 16 pixels and generated by numerical simulation, was used for training.
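The data flow described above (GI image plus illumination patterns as input, convolutional layer, dense layer, softmax over all pixels) can be sketched as a NumPy forward pass. This is not the architecture of Fig. 4: the number of layers, the 3 × 3 kernel, the 8 filters, the 128 hidden units, and the random untrained weights are all assumptions chosen only to keep the example short, and the function names are hypothetical.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Zero-padded convolution. x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            window = xp[:, dy:dy + h, dx:dx + wd]            # shifted view of the input
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], window)
    return out + b[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max())   # subtract the maximum for numerical stability
    return e / e.sum()

def init_params(rng, n_patterns=4, size=16, n_filters=8, n_hidden=128):
    """Random, untrained weights; all layer sizes are illustrative assumptions."""
    c_in = 1 + n_patterns     # GI image plus its illumination patterns
    return {
        "w_conv": rng.normal(0.0, 0.1, (n_filters, c_in, 3, 3)),
        "b_conv": np.zeros(n_filters),
        "w_dense": rng.normal(0.0, 0.01, (n_hidden, n_filters * size * size)),
        "b_dense": np.zeros(n_hidden),
        "w_out": rng.normal(0.0, 0.1, (size * size, n_hidden)),
        "b_out": np.zeros(size * size),
    }

def forward(gi_image, patterns, params):
    """Return the existence probability of light at every pixel as an image."""
    x = np.concatenate([gi_image[None], patterns], axis=0)          # input layer
    x = relu(conv2d_same(x, params["w_conv"], params["b_conv"]))    # local causes
    x = relu(params["w_dense"] @ x.ravel() + params["b_dense"])     # global causes
    logits = params["w_out"] @ x + params["b_out"]
    return softmax(logits).reshape(gi_image.shape)

rng = np.random.default_rng(0)
params = init_params(rng)
patterns = rng.integers(0, 2, size=(4, 16, 16)).astype(float)
gi_image = rng.normal(size=(16, 16))   # stand-in for a four-measurement GI image
prob = forward(gi_image, patterns, params)
print(prob.shape, prob.sum())
```

The output is a 16 × 16 map that sums to one, so its largest entry can be read as the predicted position of the light source. In this sketch, ReLU is applied to the convolutional and hidden dense layers, and the softmax replaces it at the output.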

Fig. 4 Architecture of the CNN for GI. The input layer consists of a reconstructed GI image and its illumination patterns

We evaluated the performance of the proposed CNN using numerical calculations. Figures 5 and 6 show the simulation results for a CNN with convolutional layers whose input layer was without and with illumination patterns, respectively. The sample size was 16 × 16 pixels with one bright pixel in the center. The prediction accuracy initially increased with the number of epochs and then remained at a constant value. The accuracy of the CNN with convolutional layers was higher than that without convolutional layers. When illumination patterns were included in the input layer, the difference in accuracy between the CNNs with and without convolutional layers was larger than in the case without illumination patterns. This large difference is attributed to the image degradation contributing less to the null hypothesis condition.

Fig. 5 Numerical results of prediction accuracy obtained using convolutional layers without illumination patterns in the input layer

Fig. 6 Numerical results of prediction accuracy obtained using a convolutional layer with four illumination patterns in the input layer

To confirm the validity of the constructed CNN, we compared the position predicted by conventional GI with that predicted by the proposed method. Figure 7 shows the simulated residuals from the set pixel for GI and DLGI. The number of measurements for GI and DLGI ranged from 0 to 16. The sample size was 16 × 16 pixels with one bright pixel in the center. It took less than a second to derive the predicted images from the original GI images. Each residual was the average of the difference between the set position and the predicted position over 1000 repeated measurements. For DLGI, the residuals decreased as the number of measurements increased. In particular, the residuals were smaller than one pixel with four or more measurements. In the proposed method, it is important to improve not only the detection speed but also the residuals. To meet both requirements, we set the number of measurements to four. With conventional GI, the residuals also decreased as the number of measurements increased. However, the residual for four measurements was 5.61, which was higher than the value of 0.61 obtained using DLGI. From these results, we determined the number of measurements and confirmed the effectiveness of the constructed CNN.
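The residual for conventional GI can be estimated with a short simulation along the lines described above. The sketch below makes several assumptions that are not stated in the text: the predicted position is taken as the brightest pixel of the GI image, the difference between positions is measured as the Euclidean distance in pixels, the patterns are random and binary, and the detector is noise-free. The helper names are hypothetical, and the values it produces are not intended to reproduce those in Fig. 7.

```python
import numpy as np

def ghost_image(patterns, bucket):
    """Average of the product of pattern and bucket fluctuations (Eq. (1))."""
    delta_i = patterns - patterns.mean(axis=0)
    delta_b = bucket - bucket.mean()
    return np.tensordot(delta_b, delta_i, axes=(0, 0)) / len(bucket)

def gi_residual(n, trials=1000, size=16, seed=0):
    """Mean distance (in pixels) between the set bright pixel and the
    brightest pixel of a conventional GI image from n measurements."""
    rng = np.random.default_rng(seed)
    c = size // 2
    distances = []
    for _ in range(trials):
        patterns = rng.integers(0, 2, size=(n, size, size)).astype(float)
        bucket = patterns[:, c, c]                     # one bright pixel in the centre
        g = ghost_image(patterns, bucket)
        # np.argmax returns the first maximum when several pixels tie
        y, x = np.unravel_index(np.argmax(g), g.shape)
        distances.append(np.hypot(y - c, x - c))
    return float(np.mean(distances))

residuals = {n: gi_residual(n) for n in (1, 2, 4, 8, 16, 64, 256)}
for n, r in residuals.items():
    print(n, round(r, 2))
```

In this simplified model the residual of conventional GI is several pixels at four measurements and approaches zero only when the number of measurements is much larger, which is the gap that DLGI is intended to close.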

Fig. 7 Comparison of the simulated residuals from the set pixel between GI and DLGI. The number of measurements for GI and DLGI ranges from 0 to 16

4 Experimental Results

The efficiency of GI with the CNN was experimentally evaluated. We used a digital micromirror device (DMD) (Texas Instruments: DLP 3000) as a projector, as shown in Fig. 8. The illumination wavelength (532 nm) was selected using an interference filter. Computer-generated random patterns (16 × 16 pixels) were projected onto a sample. The illumination light was scattered by the sample and detected by a photodiode (Hamamatsu: S6967). The detected signals were digitized by an analogue-to-digital converter and recorded on a personal computer. A GI image was obtained by calculating the correlation between the illumination patterns and the detected signals. Figure 9 shows the experimental GI images obtained without the CNN. To confirm the proposed principle, we used a simple sample made of a black piece of paper with a square hole in the center. As shown in Fig. 9, the quality of the conventional GI images improved as the number of measurements increased. With 256 measurements, the position of the hole was clear, although the detection time was long, whereas with four measurements, the reconstructed image was noisy.

Fig. 8 Experimental setup of GI for detecting scattered light

Fig. 9 Experimental results of GI images without CNN

Figure 10 shows the experimental results of reconstructed DLGI images. The trained CNN improved the quality of the reconstructed images. However, as shown in Fig. 10a, the quality of some images was not improved. A histogram of 50 predicted hole positions is shown in Fig. 10b. The standard deviation of the predicted values was 7.11 pixels. This spread is most likely due to the imperfect reduction of global noise.

Fig. 10 Experimental results of GI images (n = 4) reconstructed using the CNN. (a) Reconstructed GI image and (b) histogram of 50 predicted positions

To overcome this problem, we analyzed the noise dependency of the predicted positions. A small number of measurements is not sufficient for noise reduction. Under our experimental conditions, sources other than the detector, such as fluctuations in the air or in the light source intensity, can also generate noise. Most of these fluctuations have a Gaussian distribution. Therefore, Gaussian noise was used to improve the CNN. Figure 11 shows the simulated residuals from the set pixel for GI and DLGI with a 10% Gaussian noise signal derived from the experimental environment. The number of measurements ranged from 0 to 32 for GI and was 4 for DLGI. The sample size was 16 × 16 pixels with one bright pixel in the center. Before the calculation of Eq. (1), 10% Gaussian noise was added to the detected signal B. The CNN was also retrained for 200 epochs with a new dataset whose signals contained 10% Gaussian noise. Each residual was the average of the difference between the set position and the predicted position over 1000 repeated measurements. With conventional GI, the residuals decreased as the number of measurements increased. However, the residual for four measurements was 5.61, which was higher than the value of 3.66 obtained using DLGI. From these results, we confirmed the effectiveness of the retrained CNN with 10% Gaussian noise signals.
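The noise-loading step, in which Gaussian noise is added to the detected signal B before Eq. (1) is evaluated, can be sketched as follows. The text does not define what the 10% refers to, so this example assumes that the noise standard deviation is 10% of the largest detected signal value. The function names, the random binary patterns, and the single bright pixel are likewise assumptions for illustration.

```python
import numpy as np

def ghost_image(patterns, bucket):
    """Average of the product of pattern and bucket fluctuations (Eq. (1))."""
    delta_i = patterns - patterns.mean(axis=0)
    delta_b = bucket - bucket.mean()
    return np.tensordot(delta_b, delta_i, axes=(0, 0)) / len(bucket)

def add_gaussian_noise(bucket, level, rng):
    """Add zero-mean Gaussian noise to the detected signal B.

    Assumption: `level` (0.10 for 10%) is the noise standard deviation
    expressed as a fraction of the largest detected signal value.
    """
    bucket = np.asarray(bucket, dtype=float)
    sigma = level * np.max(np.abs(bucket))
    return bucket + rng.normal(0.0, sigma, size=bucket.shape)

rng = np.random.default_rng(0)
n = 4
patterns = rng.integers(0, 2, size=(n, 16, 16)).astype(float)
bucket = patterns[:, 8, 8]                             # one bright pixel in the centre
noisy_bucket = add_gaussian_noise(bucket, 0.10, rng)   # noise is added before Eq. (1)
g_noisy = ghost_image(patterns, noisy_bucket)
print(noisy_bucket)
```

Generating many such pairs of noisy GI images and known bright-pixel positions would give a noise-loaded training set of the kind described above.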

Fig. 11 Comparison of the simulated residuals from the set pixel between GI and DLGI using a detected signal with 10% noise. The number of measurements ranges from 0 to 32 for GI and is 4 for DLGI

The dependence of the residuals on the amount of noise added to the detection signal used for training was experimentally investigated. The sample was the same as the one shown in Fig. 9. The amount of added noise was varied from 0 to 10%. We retrained the CNN at each noise level. Each residual was the average of the difference between the set position and the predicted position over 100 repeated measurements. As shown in Fig. 12, the residuals decreased significantly with increasing added noise.

Fig. 12 Experimental results of the dependence of the residual from the set pixel on the amount of noise used for CNN training

Figure 13 shows the experimental results of reconstructed DLGI images obtained with the CNN trained with 10% Gaussian noise. As shown in Fig. 13a, the quality of the DLGI images obtained with four measurements was improved. A histogram of 50 predicted hole positions is shown in Fig. 13b. The standard deviation of the predicted values was 3.12 pixels. Although the cause of this Gaussian effect is not well understood at this time, the proposed CNN was demonstrated to increase the quality of GI images.

Fig. 13 Experimental results of GI images (n = 4) reconstructed using the CNN trained with 10% Gaussian noise added to the intensity in the dataset. (a) Reconstructed GI image and (b) histogram of 50 predicted positions

5 Conclusions

We proposed a method that uses GI combined with DL for fast imaging. We introduced a CNN to increase the quality and speed of GI detection. The CNN had convolutional and dense layers, and the obtained image and the projected random patterns were applied to its input layer. In a feasibility experiment that used a DMD projector and a photodiode, the proposed method was found to improve the quality of GI images. Experimental investigation showed that a noise-loaded dataset was effective in reducing global noise. The results show that GI with a noise-trained CNN is suitable for detecting the position of scattered light. In future work, we will attempt to improve the prediction accuracy by reconsidering the CNN architecture and apply the method to the detection of sub-micron defects or particles on a Si wafer with a large inspection area.