1 Introduction

Light control through scattering media is a long-standing problem in optics and photonics, with applications such as biomedical imaging and optical communication [1, 2]. Various methods have been proposed to address it, and they fall mainly into feedback-based and inversion-based approaches. The first approach uses an iterative feedback process to optimize a light pattern behind or inside scattering media [3, 4]. The second takes the inverse of the transmission matrix for non-iterative wavefront shaping through scattering media [5, 6].

Machine learning has recently been introduced to the inversion-based approach for light control through scattering media to simplify the optical setup and extend the range of its applications [7, 8]. In this method, the relationship between the input and output of light through scattering media is regressed from a large number of training input-output pairs. The previous studies used a coherent light source and a spatial light modulator, both of which are costly. Here we show that the machine learning approach can be applied to control incoherent light through scattering media. This reduces the hardware cost and extends the impact of light control through scattering media to various fields. As an example application, we demonstrate a multiview stereo display (MSD) with scattering media based on the machine learning approach described in this study.

An MSD is one category of three-dimensional display, which presents a three-dimensional scene to viewers. MSDs are promising for glasses-free viewing and commercial products [9, 10]. This display technique controls the angles of light rays toward observers by using either an optical modulation element on a two-dimensional display or a stack of multiple displays [10, 11, 12]. In this study, we took the first approach because it uses only a single display and is cost-effective compared to the second one. Parallax barriers, cylindrical lens arrays, and microlens arrays are typically used as the optical modulation elements in MSDs based on the first approach. In this approach, the image on the display is calculated geometrically from a pinhole array model to produce arbitrary parallax images. Misalignments and/or aberrations of the optical modulation elements (especially when the viewing angle is large) degrade the reproduced image quality because of the mismatch between the model and the actual optical system [13, 14].

In our experimental demonstration, a diffuser was used as a novel optical modulation element of the MSD to show the versatility of this approach, although the approach is also readily applicable to conventional MSDs, where machine learning can compensate for the model mismatch. For simplicity, our display reproduced different images at just two viewing positions, but it is straightforward to extend the system to MSDs with more viewing positions. Our study relaxes the requirements for optical control through scattering media and its applications. The demonstrated display technique may also reduce the cost of the optical modulation element, for example by allowing the use of a low-quality lens array or diffuser, and may improve the performance of MSDs.

2 Methods

Fig. 1 Schematic diagram illustrating the proposed method

A schematic diagram of the proposed method is shown in Fig. 1. A diffuser is attached to a liquid crystal display to serve as the optical modulation element. The scattered light field is captured as a parallax pair by a stereo camera with two viewing positions. In the training stage, randomly generated input patterns are shown on the display, and their output images are captured by the camera through the diffuser. The inverse function from the output to the input is regressed with an artificial neural network. The optical process is linear in intensity because the liquid crystal display emits incoherent light. Thus, we employ a simple perceptron without any hidden layers or activation functions [15]. After the training stage, a target image is provided to the network, and an input pattern is calculated. The calculated input pattern is shown on the display, and the target image is reproduced on the camera through the diffuser.
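
The regression described above can be illustrated with a minimal NumPy sketch under several assumptions that are not part of the experiment: the diffuser is modeled by a hypothetical random nonnegative matrix `A` rather than measured data, the images are 8 x 8 pixels, and the inverse map is fitted in closed form by least squares instead of by iterative training. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

side = 8                   # toy image side (assumption; smaller than the experiment)
n_pix = side * side
n_train = 2000             # number of random training patterns (assumption)

# Hypothetical intensity transmission matrix of the diffuser. It is nonnegative
# because incoherent intensities add without interference.
A = rng.random((n_pix, n_pix))

# Training stage: random display patterns X and simulated captures Y = X A^T.
X = rng.random((n_train, n_pix))
Y = X @ A.T

# Regress the inverse map (captured image -> display pattern) as one linear
# layer with a bias term: X ~ Y W + b, solved here by least squares.
Y1 = np.hstack([Y, np.ones((n_train, 1))])
Wb, *_ = np.linalg.lstsq(Y1, X, rcond=None)
W, b = Wb[:-1], Wb[-1]

# After training: compute the display pattern for a target image and check
# what the modeled diffuser would deliver to the camera.
target = np.zeros((side, side))
target[2:6, 3:5] = 1.0
pattern = target.ravel() @ W + b
reproduced = (A @ pattern).reshape(side, side)
error = np.abs(reproduced - target).max()
print(f"max reproduction error in the toy model: {error:.2e}")
```

In this noise-free toy model the reproduction is essentially exact. A real display cannot show negative or out-of-range values, so a computed pattern would have to be clipped or rescaled before use, which this sketch does not model.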

3 Experimental demonstration

The proposed method was experimentally demonstrated. A diffuser composed of five acrylic ground plates (Template acry manufactured by Acrysunday, thickness: 3 mm, color number: 811, diffusion angle of the five stacked plates: \(26^\circ\)) was directly attached to the liquid crystal display (XSP-04 manufactured by Luckyster, pixel count: \(1920 \times 1080\), screen size: 13.3 in.). The stereo camera was implemented with a camera lens (Ai Nikkor 85 mm F2 manufactured by Nikon), a monochrome image sensor (PL-B953 manufactured by PixeLink, pixel count: \(768\times 1024\), pixel pitch: \(4.65~\upmu \hbox {m}\)), and an iris stop (diameter: 1 cm) placed in front of the camera lens on a motorized stage (OSMS20-85 manufactured by OptoSigma). The exposure time was set to 700 ms. The left view of the parallax pair was captured by moving the iris stop to the left edge of the camera lens, and the right view was captured by moving the iris stop to the right edge. The distance between the diffuser and the camera lens was 54.9 cm. The distance between the left and right viewing positions was 3.2 cm. This distance may be controllable through the choice of the scattering angle of the diffuser.
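
As a rough check of the viewing geometry, the angle subtended at the diffuser by the two viewing positions can be estimated from the two distances given above. The sketch assumes that the viewing positions lie symmetrically about the optical axis, which the text does not state explicitly.

```python
import math

viewing_distance_cm = 54.9   # diffuser to camera lens (from the text)
view_separation_cm = 3.2     # left to right viewing position (from the text)

# Full angle between the two viewing directions, assuming they are placed
# symmetrically about the optical axis (an assumption of this sketch).
half_angle_rad = math.atan((view_separation_cm / 2) / viewing_distance_cm)
separation_deg = math.degrees(2 * half_angle_rad)
print(f"angular separation: {separation_deg:.2f} deg")
```

Under this assumption the separation is roughly \(3.3^\circ\), which is small compared with the \(26^\circ\) diffusion angle of the stacked plates.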

Fig. 2 Examples of the random input patterns and their parallax pairs

Examples of the random input patterns and their parallax pairs are shown in Fig. 2. The pixel count of the images was \(64\times 64\). The image size was \(1.5~\mathrm {cm}\times 1.5~\mathrm {cm}\) on the surface of the diffuser. The simple perceptron was trained with 100,000 random input patterns and their parallax pairs. The perceptron was composed of two layers (input and output layers) without any hidden layer or activation function. The network was optimized using the Adam optimizer with an initial learning rate of 0.001, a batch size of 32, and 20 epochs [16].
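
The training configuration can be sketched as follows. The software framework used in the experiment is not stated, so this is a hand-written Adam update in NumPy applied to a single linear layer, using the learning rate, batch size, and number of epochs given above. The synthetic data, the toy sizes, the mean-squared-error loss, and the Adam constants `beta1`, `beta2`, and `eps` (common default values) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_train = 16, 640          # toy sizes (assumptions, far below the experiment)

# Hypothetical, well-conditioned stand-in for the diffuser response.
A = np.eye(n_pix) + 0.1 * rng.random((n_pix, n_pix))
X = rng.random((n_train, n_pix))  # random display patterns
Y = X @ A.T                       # simulated captured images

W = np.zeros((n_pix, n_pix))      # weights of the single linear layer
b = np.zeros(n_pix)               # bias
lr, batch_size, n_epochs = 1e-3, 32, 20    # values from the text
beta1, beta2, eps = 0.9, 0.999, 1e-8       # assumed Adam constants
m = [np.zeros_like(W), np.zeros_like(b)]   # first-moment estimates
v = [np.zeros_like(W), np.zeros_like(b)]   # second-moment estimates
step = 0
losses = []

for epoch in range(n_epochs):
    order = rng.permutation(n_train)
    epoch_loss = 0.0
    for start in range(0, n_train, batch_size):
        idx = order[start:start + batch_size]
        yb, xb = Y[idx], X[idx]
        resid = yb @ W + b - xb                 # prediction error on the batch
        epoch_loss += np.mean(resid ** 2)
        grads = [2 * yb.T @ resid / resid.size,      # d(MSE)/dW
                 2 * resid.sum(axis=0) / resid.size]  # d(MSE)/db
        step += 1
        for p, g, m_i, v_i in zip((W, b), grads, m, v):
            m_i *= beta1
            m_i += (1 - beta1) * g
            v_i *= beta2
            v_i += (1 - beta2) * g * g
            m_hat = m_i / (1 - beta1 ** step)   # bias-corrected moments
            v_hat = v_i / (1 - beta2 ** step)
            p -= lr * m_hat / (np.sqrt(v_hat) + eps)  # in-place parameter update
    losses.append(epoch_loss / (n_train // batch_size))

print(f"first epoch loss {losses[0]:.4f}, last epoch loss {losses[-1]:.4f}")
```

Because the layer has no hidden units or nonlinearity, the loss is convex in `W` and `b`, so the choice of optimizer mainly affects convergence speed rather than the solution that is eventually reached.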

Fig. 3 Experimental results of image transmission through scattering media with the right viewing position

First, incoherent light control through the diffuser was demonstrated with only the right viewing position. In this case, the input and output layers of the neural network each had \(64\times 64\) nodes. The inputs to the network in the training stage were the images captured at the right viewing position. After training, the network calculated an input pattern from the target image. The calculated input pattern was then shown on the display, and the image was reproduced through the diffuser. The reproduced image was captured at the right viewing position of the stereo camera. The experimental result is shown in Fig. 3, where the target images were handwritten numbers randomly selected from the MNIST database [17]. The target images were reproduced through the diffuser. The reproduced images were evaluated using structural similarity (SSIM) [18]. The average SSIM of five hundred reproduced images was 0.32.
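
For readers unfamiliar with SSIM, the following sketch computes a simplified, single-window variant of the index over the whole image. The standard definition in [18] averages the same expression over local windows, and the window and constants used in the experiment are not stated, so the function `global_ssim`, its default constants, and the synthetic test images are assumptions for illustration only.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM over whole images (simplified; not the windowed form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2      # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2      # stabilizes the contrast/structure term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

rng = np.random.default_rng(0)
target = np.zeros((64, 64))
target[16:48, 28:36] = 1.0           # synthetic bar standing in for a target image
noisy = np.clip(target + 0.3 * rng.standard_normal(target.shape), 0.0, 1.0)

score_same = global_ssim(target, target)
score_noisy = global_ssim(target, noisy)
print(f"identical: {score_same:.3f}, degraded: {score_noisy:.3f}")
```

An average score over a test set, as reported above, would be obtained by applying such a function to each target and reproduced image and taking the mean.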

Fig. 4 Experimental results of reproduction of parallax pairs through scattering media

Next, reproduction of the parallax pairs through the diffuser was demonstrated. In this case, images at both the left and right viewing positions were provided to the network, where the numbers of nodes in the input and output layers were \(64\times 64\times 2\) and \(64\times 64\), respectively. The target parallax pair consisted of two different handwritten numbers randomly selected from the MNIST database. The network calculated an input pattern from the target parallax pair. The calculated input pattern was then shown on the display, and the parallax pair was reproduced through the diffuser. The reproduced parallax pair was captured by the stereo camera. Examples of the target parallax pairs, the calculated input patterns, the reproduced parallax pairs, and the reproduced images at the center viewing position are shown in Fig. 4. The handwritten numbers were reproduced at the left and right viewing positions. The average SSIM of five hundred parallax pairs was 0.15. The crosstalk between the left and right viewing positions is visible in the reproduced images at the center viewing position.
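
The change in layer sizes for the two-view case can be sketched with a toy linear model. The matrices `A_left` and `A_right` are hypothetical random stand-ins for the diffuser response toward each viewing position, the images are 8 x 8 pixels, and the layer is fitted by least squares without a bias term. None of these choices come from the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
side = 8                       # toy image side (the experiment used 64)
n_pix = side * side

# Hypothetical intensity responses toward the two viewing positions.
A_left = rng.random((n_pix, n_pix))
A_right = rng.random((n_pix, n_pix))
B = np.hstack([A_left.T, A_right.T])    # pattern -> stacked pair, (n_pix, 2*n_pix)

X = rng.random((4000, n_pix))           # random display patterns
pairs = X @ B                           # stacked [left | right] captures

# Single linear layer with 2*n_pix inputs and n_pix outputs. The stacked
# captures span only n_pix dimensions, so small singular values are cut off.
W, _, rank, _ = np.linalg.lstsq(pairs, X, rcond=1e-8)

# A pair that a real pattern produced is inverted essentially exactly ...
x_true = rng.random(n_pix)
recover_error = np.abs((x_true @ B) @ W - x_true).max()

# ... whereas two independently chosen target views are only approximated.
target_pair = rng.random(2 * n_pix)
pattern = target_pair @ W
residual = np.linalg.norm(pattern @ B - target_pair)
print(W.shape, rank, f"{recover_error:.1e}", f"{residual:.2f}")
```

In this toy model a single pattern of `n_pix` values generally cannot match `2 * n_pix` independent target values exactly, so the layer returns a least-squares compromise between the two views. Whether this accounts for the measured crosstalk or SSIM in the experiment is not established here.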

4 Conclusion

We presented an extension of machine-learning-based light control through scattering media to the incoherent case. The relationship between an image on the liquid crystal display and that captured by the camera was learned using the perceptron with randomly generated training patterns. The image on the display was calculated with the network to reproduce a target image on the camera. We showed a potential application of the concept to MSDs. In this case, the network was trained with parallax pairs captured by a stereo camera. In the experimental demonstration, target images were reproduced in the single-view case and target parallax pairs were reproduced in the multiview case.

Our method simplifies the optical setup for light control through scattering media by using a conventional display. The application to MSDs demonstrated here may be used to compensate for optical processes, such as aberrations of lens arrays, that cannot be expressed with the pinhole array model used in conventional MSDs, and can also be extended to secure information displays [19, 20]. The crosstalk and the viewing zone of the scattering-based MSD should be investigated further in future work.