Introduction

Flat panel displays have become ubiquitous, with applications spanning microdisplays, smartphones, tablets, notebook and desktop monitors, and large-screen TVs, just to name a few. While we enjoy the rapid advances of two-dimensional (2D) displays, the pursuit of ideal 3D displays [1,2,3] has never stopped, since they have the potential to offer a natural viewing experience similar to the real world. More recently, the fast growth of head-mounted displays (HMDs) [4] for virtual reality (VR) [5] and augmented reality (AR) [6] has further accelerated 3D display development. The current industry standard for providing a 3D sensation is still conventional stereoscopy, where each eye is presented with a distinct 2D image rendered with binocular disparity. In this fashion, the eyes can converge on different 3D objects with correct angles, meaning that one of the critical depth cues [7], the vergence cue, is correctly provided to give the observer a 3D sensation. However, the apparent distance between the image and the viewer is usually fixed by the physical screen and the corresponding viewing optics. Consequently, the adaptive lens in each eye remains focused at a single depth even when the viewer is observing a 3D scene with continuous depths. This inability to display retinal blur results in the loss of correct eye focusing, which is the other critical depth cue, the accommodation cue. Since these depth cues are matched when we observe the real world [Fig. 1(a)], their mismatch in stereoscopic displays [Fig. 1(b)], also referred to as the vergence-accommodation conflict (VAC) [8, 9], can slow down the binocular fusion process and lower the depth-perception precision of the human visual system (HVS) [10]. VAC has also been reported to cause visual fatigue, including but not limited to eye strain, blurred vision and headache, during and even after viewing [11, 12].

Fig. 1

Stimuli to accommodation and vergence. The vergence-accommodation conflict occurs because the accommodation distance in the stereoscopic displays is fixed, unlike that in the real world. The multifocal methods can provide effective near-correct depth cues with a stack of 2D focal planes, mitigating the mismatch between vergence and accommodation cues

A variety of methods have been proposed to address VAC [13], especially in HMDs, because of the rapidly increasing interest in AR and VR during this decade. Since the vergence cues in stereoscopic displays can already be rendered similar to natural viewing, the majority of these methods provide correct accommodation cues to mitigate their conflict with the vergence cues. Representative methods of this kind include volumetric displays [14,15,16,17], which directly depict the information in a physical 3D space, and holographic displays [18,19,20,21], which reconstruct the real wavefront of a 3D scene. However, so far these technologies are still not ideal for practical applications due to their demanding hardware requirements. On the other hand, there are also solutions with extended depth of focus (DOF), such as Maxwellian-view retinal projection [22,23,24,25] and varifocal displays [26,27,28], offering blur-free images independent of the human eye's focusing state, i.e., accommodation. Since these approaches cannot provide natural optical blur, they require additional computational rendering, guided by eye-tracking devices [29], to include retinal blur in the digital images for a realistic viewing experience.

To offer acceptable accommodation cues and alleviate the heavy burden on hardware, multifocal displays [Fig. 1(c)] with approximate depth blur have been proposed and studied intensively since the beginning of this century [30]. As the name implies, multifocal displays generate multiple 2D virtual images spanning the desired range of focal depths. This approach takes advantage of the HVS's limited depth sensitivity and of fast-developing flat panel displays to provide a high-resolution, fatigue-free 3D viewing experience. Both varifocal and multifocal displays require the ability to show images on different focal planes, and therefore they may share the same optical architecture in some cases. The major difference between them is that varifocal displays actively track the eye's accommodation in real time and provide a correct image at the depth of interest, while multifocal displays passively offer multiple depths to approximate a 3D scene without considering where the eye focuses. In this paper, we provide a systematic overview of the current development of multifocal displays, mainly for AR and VR HMDs. This review aims to provide a comprehensive perspective on the field and its challenges for those who are interested in this fast-developing area.

Method classification

Although multifocal display technology is rarely implemented in current commercial display products, numerous design proposals and prototype solutions have been reported since the 1990s [8]. The multifocal solutions can be classified along two dimensions: the depth-generation approach and the information-multiplexing channel. Figure 2 offers a schematic overview of this taxonomy.

Fig. 2

Classification of methods to enable multifocal displays. The multifocal enabling methods can be categorized into power-based and distance-based ones, depending on the focal plane generating mechanism. The other classification standard is how the 2D information is multiplexed into 3D scenes. The current multiplexing channels include space, time, polarization, wavelength and their hybrids

Displays are essentially imaging systems that use the retina of the HVS as the final screen and therefore follow the paraxial imaging equation [31]:

$$ \frac{n_i}{L_i}+\frac{n_o}{L_o}=K, $$
(1)

where $n_i$ and $n_o$ are the refractive indices of the image and object space; $L_o$ and $L_i$ are the object and image distances; and $K$ is the optical power of the system, including the eye lens. The object and image distances are measured from the primary and secondary principal planes, respectively. Assuming the refractive index of the image space inside the eyeball is constant, there are two options for changing the image depth $L_i$: varying the optical distance of the object (display), $L_o/n_o$, or varying the system optical power $K$. Distance-based and power-based systems can be distinguished in this manner. Distance-based systems usually consist of several display panels or projection screens that are optically located at different depths with different optical path lengths. In contrast, in power-based designs, the imaging planes are shifted by special optical elements that change their optical power in response to external stimuli or to the properties of the incoming light. It is worth noting that there are multifocal display designs that tune the optical distance $L_o/n_o$ and the system optical power $K$ simultaneously, such as spatially cascaded freeform optical combiners with different optical powers [32, 33].
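
To make the two tuning options concrete, the minimal sketch below (with hypothetical values for the optical power and display distance, and simplified sign conventions) solves Eq. (1) for the image distance and shows how shifting either the object distance or the optical power moves the focal plane:

```python
def image_distance(L_o, K, n_i=1.0, n_o=1.0):
    """Solve Eq. (1), n_i/L_i + n_o/L_o = K, for the image distance L_i.

    L_o: object (display) distance in meters; K: optical power in diopters.
    """
    return n_i / (K - n_o / L_o)

# Distance-based tuning: fixed power, two object distances (hypothetical values).
for L_o in (0.05, 0.06):  # display shifted from 50 mm to 60 mm
    print(f"L_o = {L_o:.2f} m -> L_i = {image_distance(L_o, K=21.0):.3f} m")

# Power-based tuning: fixed object distance, two optical powers.
for K in (21.0, 20.5):  # total power of tunable optics plus eye, in diopters
    print(f"K = {K:.1f} D -> L_i = {image_distance(0.05, K):.3f} m")
```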

In multifocal displays, a high information flow rate is usually necessary, since expanding a 2D display to a 3D volume adds another spatial dimension. This is usually realized with multiplexing processes similar to those in communication technology [34]. Four types of information channels have been reported so far for stacking virtual image planes: space, time, polarization and wavelength.

Space-multiplexed displays are a traditional type of multifocal architecture, as they spatially stack physical display panels to enrich the display information flow. This is the most straightforward and convenient method, since most developed flat panel displays, such as liquid crystal displays (LCDs), organic light-emitting diode (OLED) displays and light-emitting diode (LED) displays, are applicable for this purpose with appropriate optical combiners. However, trading space for focal planes usually results in an enlarged volume or a reduced resolution, both of which are critical sacrifices for practical HMD products. On the other hand, ideal transparent displays with high pixel density, compact form factor and decent see-through quality have been under intensive development in recent years, and they have the potential to help space-multiplexing stand out against the other information channels.

In time-multiplexed designs, each content frame is divided temporally into several sub-frames, displaying images at different depths faster than the flicker fusion threshold. This kind of method necessitates high-refresh-rate display panels and adaptive optical components with fast switching times. A significant merit of time-multiplexing is the compact form factor compared with spatially multiplexed designs. However, a high refresh rate is already desired in HMDs to minimize motion blur [35, 36]; implementing time-multiplexing in HMDs renders this target even more challenging.
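
As a rough budget (an illustrative calculation, not a figure from any specific prototype), the required panel refresh rate is simply the per-plane flicker-free rate multiplied by the number of focal planes:

```python
def required_panel_rate(num_planes, flicker_free_hz=60):
    """Panel refresh rate needed so every focal plane updates at flicker_free_hz."""
    return num_planes * flicker_free_hz

for n in (2, 4, 6):
    print(f"{n} planes -> {required_panel_rate(n)} Hz panel")
# 2 planes -> 120 Hz, 4 planes -> 240 Hz, 6 planes -> 360 Hz
```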

Polarization, as an inherent property of light itself, can be utilized as the multiplexing channel for multifocal displays. Conventional LCDs can be considered as amplitude-only spatial light modulators (SLMs), while in polarization-multiplexed multifocal displays, the light emitted from the screen must be modulated in both amplitude and polarization. The light from each pixel can then be routed to a distinct focal depth according to its encoded polarization state. On the other hand, polarization-sensitive optical devices are vital in these designs, since they work as the demultiplexer that separates the multiplexed display light into the desired virtual image planes. An intrinsic limitation of polarization-multiplexed multifocal displays is the restricted number of focal planes: since there are only two orthogonal polarization states, i.e., s- and p-polarization, or right- and left-handed circular polarization, a polarization-multiplexed display can generate only two independent focal depths.

Last but not least, the wavelength-multiplexing approach, recently developed in [37], employs another type of information channel embedded in the light. The tristimulus nature of the HVS's color response renders the fine visible spectrum considerably redundant for display applications, and it is this redundancy that forms the cornerstone of spectrally multiplexed designs. Emissive display panels with narrow spectral bandwidth are preferred in these systems, especially displays with laser light sources. Like polarization multiplexing, wavelength multiplexing enables multifocal functionality without sacrificing display frame rate or spatial resolution. Unlike polarization multiplexing, it is not theoretically restricted to two focal planes, as long as optical elements with fine spectral sensitivity are available together with narrowband light sources. However, this normally comes at the cost of color gamut, because only the intersection of the color gamuts from all depths can be displayed if a uniform color appearance across the entire focal range is desired.

System designs

This section introduces and discusses specific multifocal designs, together with their theoretical principles and enabling technologies; it is divided into four sub-sections according to the information-multiplexing channel. Since each method has its own pros and cons, there are also hybrid approaches taking advantage of more than one principle or technology.

Space-multiplexing

As the most conventional approach, space-multiplexing offers a direct way to build multifocal displays. This section summarizes representative designs based on the spatially multiplexed channel, including transparent display/screen stacks, optical combiner stacks, and optical space-to-depth mapping.

Transparent display/screen stack

Rolland et al. [30] proposed a multifocal HMD with a thick stack of transparent displays [Fig. 3(a)] at the end of the last century. Based on the visual acuity, stereoacuity and pupil size of the HVS, an optimal focal plane arrangement was proposed, where virtual planes are linearly spaced within the DOF from 0 to 2 diopters with a 1/7-diopter spacing. Although transparent display panels would enable the simplest implementation of distance-based multifocal displays, it is still challenging at this time for transparent displays to combine high transmittance with image quality at or close to that of conventional flat panel displays [38]. Most transparent displays are realized by reducing the aperture ratio of the light-emitting region [39], leaving more non-emitting area to achieve semi-transparency. Therefore, there is an intrinsic trade-off between transparency and brightness in this type of transparent display. A direct consequence of low transmittance is low light efficiency and therefore high power consumption. Additionally, since there is a significant permittivity contrast between the light-emitting and transparent areas, diffraction of the light passing through a panel may degrade the image quality of the panels behind it [40]. This common issue for transparent displays becomes more acute, and potentially intolerable, with the thick panel stack of a multifocal display. Moreover, due to the similar spatial periodicity of the pixels in cascaded panels, the Moiré effect [41] may arise across the multiple panels; the resulting Moiré fringes would degrade the quality and resolution of the displayed images.
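
This dioptric arrangement is easy to reproduce numerically. The short sketch below lists the plane positions and illustrates why linear spacing in diopters is highly nonuniform in metric distance:

```python
# Focal planes linearly spaced in diopters, following the arrangement in [30]:
# 0 to 2 diopters with 1/7-diopter spacing (15 planes including infinity).
spacing = 1.0 / 7.0
planes_diopters = [i * spacing for i in range(15)]  # 0, 1/7, ..., 2
planes_meters = [float("inf") if d == 0 else 1.0 / d for d in planes_diopters]

for d, m in zip(planes_diopters, planes_meters):
    print(f"{d:.3f} D -> {m:.2f} m")
# The metric spacing is uneven: the first step from infinity lands at 7 m,
# while the step between the last two planes near 2 D is only ~4 cm.
```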

Fig. 3

Multifocal display designs based on transparent display/screen stacks. The distance-based spatially multiplexed multifocal displays can be built based on (a) transparent display stack, (b) transparent dynamic fog screens, and (c) holographic scattering screen stack

Lee et al. [42] demonstrated a multifocal display prototype by projecting 2D images onto multiple immaterial scattering FogScreens [Fig. 3(b)], each consisting of a thin sheet of fog protected by surrounding non-turbulent airflow. The FogScreens [43] can be arranged in either a stacked or an L-shaped configuration to extend 2D screens to a 3D display. Since the projection screens are made of fog, the audience can directly walk through the 3D scene and manipulate the 3D objects. Although their experiments verified the 3D effect of the FogScreen projection display with an approximate accommodation cue, the vergence cue was not enabled in this design. The images projected on the FogScreens were rendered with a depth-fused algorithm for the midpoint between the viewer's two eyes, so there is always some error in the images when both eyes are open. Also, the fog flow is not an ideal projection screen, since complicated turbulence exists within it and it tends to break up at the margins of the screen. Later, Barnum et al. [44] presented a similar projection-type multi-plane design using water droplets instead of fog as the non-solid screen. Rakkolainen and Palovuori [45] further extended this concept to fluorescent dye screens, which work by photoluminescence rather than scattering under ultraviolet projectors and manifest higher transparency than the standard FogScreen. The fluorescence of dye screens is emitted omnidirectionally, unlike the Mie scattering from FogScreens, where bright images can be observed only within a small range of viewing angles.
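
The depth-fused rendering mentioned above is commonly implemented as a linear luminance split between the two planes adjacent to a point, in proportion to its dioptric distance from each; a minimal sketch of this standard weighting (not necessarily the authors' exact implementation) is:

```python
def depth_fused_weights(depth_d, near_d, far_d):
    """Linear luminance weights for a point between two planes (all in diopters).

    Returns (w_near, w_far); the fused percept sits at depth_d when the two
    planes show the same image scaled by these weights.
    """
    assert far_d <= depth_d <= near_d
    w_near = (depth_d - far_d) / (near_d - far_d)
    return w_near, 1.0 - w_near

# A point at 1.5 D between planes at 2.0 D (near) and 1.0 D (far):
print(depth_fused_weights(1.5, near_d=2.0, far_d=1.0))  # (0.5, 0.5)
```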

Recently, Lee et al. [46] demonstrated a dual-focal projection-type see-through display based on holographic optical elements (HOEs) [Fig. 3(c)]. Each holographic screen only diffuses light arriving from a distinct direction that satisfies the Bragg condition. Thus, with a proper spatial configuration, each HOE screen, with ~ 90% transmittance, can work as a see-through additive 2D focal plane. Compared with dynamic fog screens, the static holographic screens offer more stable and sharper images with simpler hardware. In general, at this stage, multifocal displays using a projection screen stack can avoid the limitations and artifacts of those using transparent displays, at the cost of an enlarged footprint.

Optical combiner stack

Instead of directly stacking displays or projection screens, distance-based multifocal displays can be built by stacking multiple optical combiners, including beam splitters, freeform prisms and lightguides. Akeley et al. [47] designed and presented a multifocal display prototype with three focal planes [Fig. 4(a)], utilizing stacked beam splitters to divide an LCD panel into three sub-panels. The flat panel had a width of 47.8 cm and a height of 29.9 cm. The resulting virtual planes were equally separated by 0.67 diopters and aligned on-axis in front of the viewer. This prototype is more a proof of concept than a practical implementation due to its large volume, which would increase further if a wider field of view (FOV) were desired. Beam splitters can be found in numerous designs, not only for cascading focal planes but also for enabling see-through functionality. Suyama et al. [48] also built a dual-focal depth-fused display using a beam splitter to combine two LCD panels. Afterwards, as the most readily available optical combiner, beam splitters were utilized in many later space-multiplexed designs [49,50,51].

Fig. 4

Multifocal display designs based on optical combiner stacks. The space-multiplexed multifocal displays can be developed by stacking optical combiners, including (a) beam splitters and (b) freeform prisms

As a distinct type of optical combiner for augmented reality displays, freeform optics enables off-axis operation and offers more degrees of freedom in the design of HMDs. Cheng et al. [32] proposed a space-multiplexed dual-focal near-eye display by stacking two freeform prisms, each equipped with a micro-display [Fig. 4(b)]. According to our taxonomy, this custom-designed freeform layout is both distance- and power-based. Beyond the demanding optical design required to maintain decent contrast and resolution for both the see-through and virtual images, another apparent problem of this freeform prism stack is its considerably large footprint, which is quite challenging to miniaturize. The thickness of just two stacked freeform prisms is already around 20 mm, which would be much bulkier and heavier than lightguide combiners in HMD applications. In the proposed design, the two focal planes are located at 0.2 and 0.8 diopters; namely, the viewing distance ranges from 1.25 m to 5 m. To support 3D objects closer than 1.25 m, more prisms would need to be stacked on the optical axis, rendering the freeform combiner even thicker. In this regard, among the various optical combiners, geometric lightguides [52] may make a better candidate for space-multiplexed multifocal displays due to their compact form factor, although no prototype has been reported yet.

Optical space-to-depth mapping

Cui and Gao [53] designed and demonstrated a power-based space-multiplexed multifocal display by dividing a display panel into four subpanels and optically mapping them to different depths with a liquid-crystal-on-silicon (LCoS) SLM located at the Fourier plane of a 4f system [Fig. 5(a)]. The SLM presents a static phase profile functioning as a multifocal off-axis diffractive lens, comprising quadratic phases that image the subpanels to different depths and linear phases that shift the center of each subpanel onto the optical axis. By changing the configuration of the subpanels and the phase pattern on the LCoS SLM, this design can dynamically change the number of focal planes, trading lateral resolution against depth density. The LCoS SLM can be replaced by a diffractive optical element (DOE) [54] if the arrangement of the focal planes is fixed. This prototype can only display monochromatic content, since the display panel was covered with a narrowband color filter centered at 550 nm with 10 nm bandwidth in order to reduce the chromatic aberrations originating from the wavelength-dependent effective focal lengths of the LCoS SLM. Field-sequential color could enable full-color operation, but at the expense of frame rate. Another problem is that this kind of design is prone to stray light and resolution loss due to the phase quantization and phase resets of SLMs.
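
Conceptually, the phase assigned to each subpanel is the sum of a quadratic lens term and a linear tilt term, wrapped to 2π. The sketch below generates such an off-axis diffractive lens phase with hypothetical SLM parameters; it is an illustration of the principle, not the authors' code:

```python
import numpy as np

def offaxis_lens_phase(nx, ny, pitch, wavelength, focal_length, tilt_xy):
    """Wrapped phase of an off-axis diffractive lens: a quadratic term that
    refocuses a subpanel plus a linear term that recenters it on axis.

    pitch: SLM pixel pitch (m); tilt_xy: (sin(theta_x), sin(theta_y)).
    """
    x = (np.arange(nx) - nx / 2) * pitch
    y = (np.arange(ny) - ny / 2) * pitch
    X, Y = np.meshgrid(x, y)
    k = 2 * np.pi / wavelength
    quadratic = -k * (X**2 + Y**2) / (2 * focal_length)  # lens term
    linear = k * (tilt_xy[0] * X + tilt_xy[1] * Y)       # off-axis shift
    return np.mod(quadratic + linear, 2 * np.pi)

# Hypothetical numbers: 8-um pitch, 550 nm light, 0.5 m focal length, 1 deg tilt.
phase = offaxis_lens_phase(1024, 1024, 8e-6, 550e-9, 0.5,
                           (np.sin(np.deg2rad(1.0)), 0.0))
```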

Fig. 5

Multifocal display designs using optical space-to-depth mapping. The space-multiplexed multifocal displays can be established by mapping spatial locations to (a) discrete and (b) continuous focal depths

In the meantime, Matsuda et al. [55] proposed and demonstrated a full-color focal surface display with an 18° FOV. In this design, the pixels at different spatial locations on the 2D display are optically mapped to different depths using an SLM-based programmable lens with a spatially varying focal length [Fig. 5(b)]. Although time-multiplexing was implemented to present three focal surfaces in the prototype, the novel optical feature of this work is the generation of a 3D focal surface from a 2D panel by space-multiplexing. The focal surface display has a smaller footprint than the design of Cui and Gao [53] due to the absence of the 4f system, and it is able to support arbitrary depth maps. Moreover, the key advantage of the focal surface display over conventional multifocal displays is more accurate depth blur with fewer multiplexed images. In the prototype, the primary concerns of using an SLM, such as chromatic aberrations and stray light, were identified and mitigated. The transverse chromatic aberrations were digitally corrected by pre-warping the displayed images, while the average axial chromatic aberration was measured as 0.25 diopters within the supported DOF (0.75–4 diopters), which is less than that of typical human eyes. Also, a circular polarizer was placed on the display panel to suppress stray-light reflections. As a result, high-resolution imagery was achieved within the supported focal range, according to the measured modulation transfer function (MTF). Beyond the FOV limitation imposed by the beam splitter and the size of the SLM, a limitation of these optical mapping designs is increased stray light when the SLM manifests a shorter focal length, as the SLM phase accuracy is relatively lower when synthesizing high spatial frequencies. Another practical concern is computation speed, similar to holographic displays: the whole task of generating correct depth is placed on computation, rendering real-time global optimization of the phase patterns on the SLM very challenging.

Time-multiplexing

Time-multiplexed multifocal display designs temporally divide each frame of the 3D content into multiple sub-frames with distinct depths and present the 2D images sequentially through the DOF. In this dynamic type of solution, active components are utilized to avoid the difficulty of stacking multiple physical displays in a compact way, as in space-multiplexed designs. However, the information added to expand 2D to 3D always comes at a cost: temporally multiplexed systems necessitate not only fast-response tunable devices but also high-refresh-rate displays to attain flicker-free performance. The following parts also cover some optical layouts of varifocal displays that could be adapted to multifocal designs with updated hardware.

Mechanical sweeping

Mechanically sweeping a screen or optics along the optical axis is a typical time-multiplexed, distance-based method, which works by actively changing the optical path lengths in the display system. Shiwa et al. [56] demonstrated the first mechanically swept 3D display [Fig. 6(a)] with a 48° FOV in 1996. In their proof-of-concept prototype, a 20-in. cathode-ray-tube (CRT) display was split horizontally into left and right sub-screens, each displaying distinct image content for one eye. The light emitted from each sub-screen passes through a relay lens and an eyepiece before reaching the observer's eye. The relay lenses, which can be mechanically displaced along the optical axis, produce a real intermediate image of the corresponding sub-screen in the vicinity of the eyepiece's focal point. A stepper motor can displace the relay lens by 4 mm within 0.3 s, sweeping the virtual image from 20 cm to 10 m. The proposed design detects the observer's gaze point and moves the virtual image to the matching depth, illustrating the concept and requirements of a varifocal display; the same hardware layout could also serve a multifocal display based on mechanical sweeping. This layout was further adapted by Sugihara and Miyasato [57] for a lightweight HMD.

Fig. 6

Multifocal display designs enabled by mechanical sweep. The virtual image distance can be shifted by mechanically sweeping the viewing optics, like the relay lens in (a) and the reflective combiner in (b), or (c) the display panel

Akşit et al. [58] built a varifocal HMD with a holographic see-through projection screen and a movable curved half-mirror combiner [Fig. 6(b)]. Since their reflective combiner is placed in front of the screen rather than between the viewer's eye and the screen, the eye relief distance remains unchanged when the display depth is varied. In their demonstration, a holographic rear-projection screen is placed in front of the eye as an intermediate image plane to display the information offered by an off-axis projector. The forward-diffused light is reflected and collimated by a curved beam combiner, which is essentially a custom spherical concave mirror with 80% reflectance and an f-number of f/0.6. They demonstrated the varifocal capability by translating the curved beam combiner back and forth by up to 5 mm, covering a depth range of 1 to 4 diopters.

Shibata et al. [59] designed a varifocal display based on mechanical translation of the display panel instead of the viewing optics [Fig. 6(c)]. The implementation includes a 6-in. LCD panel and a custom-designed telecentric optical system, which keeps the size of the virtual image unchanged while the LCD panel is translated mechanically, offering a depth range spanning from 30 cm to 200 cm. These varifocal layouts can provide a dynamic accommodation cue with eye tracking, but not authentic optical depth blur, which requires displaying virtual images at multiple depths at a flicker-free rate.

Voxon Photonics [60] demonstrated a time-multiplexed multiplane, or volumetric, display, the VX1, which consists of a high-speed projection system with a fast reciprocating screen. The rear-projection diffuser screen is driven back and forth at 15 cycles per second, sweeping an 18 cm × 18 cm × 8 cm volume. Since the projector can offer 2D images with 1000 × 1000 resolution at 4000 frames per second, this product can display 3D scenes with ~ 200 depths at 30 frames per second. Such a dense focal plane stack offers near-correct accommodation support and a vivid 3D experience. Despite its wide applications in multiuser interactive displays, this design is so far not adaptable to mobile systems such as HMDs due to the challenges in miniaturization.

Switchable screen stack

As an alternative to swept screens, active projection screens that can switch between transparent and diffusive states have also been developed for time-multiplexed multifocal displays, utilizing liquid crystal (LC) technologies. Stacking switchable screens addressed sequentially by a high-speed projector results in a multifocal display without moving parts. The switchable screens, or shutters, demand not only a high contrast ratio between the two states, to avoid crosstalk between distinct depths, but also ultra-fast switching for supporting a dense depth stack at a flicker-free rate.

In 2004, Sullivan [61] from LightSpace Technologies integrated a custom 3-chip DLP projector employing digital micromirror devices (DMDs) with an air-spaced stack of LC scattering shutters into a multiplanar display called DepthCube [Fig. 7(a)]. As a compromise for achieving a higher frame rate, the color depth of the projector was restricted to 5 bits per color, in which case the maximum frame rate supported by the DLP projector exceeds 1500 frames per second. Since it takes time for a screen to switch between the transparent and scattering states, a blanking interval was inserted between consecutive 2D images of a 3D scene, which reduced the frame rate to 1000 frames per second. The 20 stacked switchable screens in the system were made of polymer-stabilized cholesteric texture (PSCT) [62], which manifests 88% and 2% transmittance, without anti-reflection coating, in the transparent and scattering states, respectively. The custom PSCT screens can switch from scattering to transparent in 0.08 ms and back in 0.39 ms. These submillisecond switchable screens and the fast projector enabled 5-bit full-color 2D images with 1024 × 768 resolution displayed at 20 depths at 50 Hz. An issue of this system is that the 2D image intensity gradually drops for screens further from the projector. Even with a proper anti-reflection coating, the total transmittance of the stack of 20 PSCT screens decreases to 44%. Thus, it is still challenging to make full use of the light from the projection engine and minimize the crosstalk between screens.
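
Two of the quoted figures can be checked with back-of-envelope arithmetic, assuming identical screens:

```python
# Back-of-envelope checks against the numbers quoted for DepthCube [61].
num_screens = 20

# 1000 projected frames per second shared by 20 depths -> 50 volumes/s.
volume_rate = 1000 / num_screens
print(volume_rate)  # 50.0 Hz

# If the whole 20-screen stack transmits 44%, the implied per-screen
# transmittance (assuming identical, AR-coated screens) is:
per_screen = 0.44 ** (1 / num_screens)
print(round(per_screen, 3))  # ~0.96
```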

Fig. 7

Multifocal display designs with switchable screen stack. The time-multiplexed multifocal displays for both (a) direct-view display and (b) HMD can be constructed by high-speed projectors and switchable screen stacks

Recently, Zabels et al. [63] (also from LightSpace) demonstrated a multifocal HMD using a similar architecture; they miniaturized the design and added an eyepiece for generating virtual images. This prototype supports six depths, linearly spaced by 0.58 diopters, at 60 Hz, but with a relatively low resolution of 480 × 800 over a 72° horizontal FOV. They improved the maximum transparency of the screens in the transparent state to 93.6% over the visible spectrum, from 420 nm to 700 nm. However, the response time is ~ 0.5 ms, slower than that reported by Sullivan 15 years earlier, as mentioned above.

Liu et al. [64,65,66] developed a series of multifocal HMD benchtop systems [Fig. 7(b)] using polymer-stabilized liquid crystal (PSLC) instead of PSCT. Both PSLC and PSCT usually consist of low-molar-mass LCs and a high-molar-mass polymer; the main difference between them is that chiral dopants are added to PSCT but not to PSLC. Compared with polymer-dispersed liquid crystals (PDLCs), where ~ 50 wt% LCs are dispersed as droplets in a polymer matrix, the polymer concentration is much lower in both PSLC and PSCT, usually on the order of 3% or less [67]. Their first demonstration [64] is a single-color dual-focal display at 30 Hz, where the projector is an amplitude SLM with a 60 Hz refresh rate and the two virtual image planes are located at 1.25 and 5 diopters. The PSLC shutter exhibits 6% and 86% transmittance in the scattering and clear states, respectively, with rise and decay times of ~ 0.3 ms and ~ 0.35 ms. The second demonstration [65] is a monochromatic multifocal system utilizing reverse-mode PSLC screens, which stay transparent when undriven and become diffusive when a voltage beyond the threshold is applied; this is the opposite of conventional PSLC and PSCT, which are switched from the diffusive to the transparent state by an applied voltage. If N screens are employed in this type of multifocal system, a voltage must be applied to N−1 conventional PSLC or PSCT screens in each sub-frame, but to only one reverse-mode PSLC screen, as illustrated below. Thus, reverse-mode PSLC screens reduce the power consumption for screen driving to 1/(N−1) of that with conventional LC switchable screens. In this design, they increased the number of focal planes to four and replaced the amplitude SLM with a 120 Hz DMD projector, such that four focal planes (located at 0.2, 1.25, 2.5 and 5 diopters) can be displayed at a 30 Hz refresh rate. Afterward, in [66], they built a binocular full-color dual-focal HMD benchtop demo using conventional PSLC and a 360 Hz DMD projector to reach a flicker-free refresh rate of 60 Hz.
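
A toy schedule makes the driving argument explicit (the screen count and indices are hypothetical):

```python
def driven_screens(active, num_screens, reverse_mode):
    """Indices of screens that need a driving voltage in one sub-frame."""
    if reverse_mode:
        return [active]  # only the diffusing screen is driven
    return [i for i in range(num_screens) if i != active]  # N-1 screens driven

N = 4
print(driven_screens(active=2, num_screens=N, reverse_mode=False))  # [0, 1, 3]
print(driven_screens(active=2, num_screens=N, reverse_mode=True))   # [2]
# Hence reverse-mode PSLC cuts the driven-screen count from N-1 to 1.
```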

Polarization-dependent optical distance

Another approach to distance-based multifocal displays with time-multiplexing creates polarization-dependent optical distances in the system. The optical path between the display and the viewing optics is switched by a high-speed polarization rotator, such that two or more virtual image planes can be displayed in sequence.

In 2016, Lee et al. [68] from our group demonstrated a proof-of-concept dual-focal near-eye display that is temporally multiplexed and based on polarization-dependent optical distance [Fig. 8(a)]. They used a broadband twisted-nematic (TN) LC cell as the polarization rotator, manifesting rise and decay times of 4.3 ms and 1.0 ms, respectively. In this design, the optical path difference is produced by placing two mirrors at different distances from a polarizing beam splitter.

Fig. 8

Multifocal display designs based on polarization-dependent optical distances. (a) A polarizing beam splitter (PBS) separates the paths of the two linear polarizations; the different distances between the PBS and the two mirrors, together with quarter-wave plates (QWPs), create the distinct optical distances. (b) The modified Savart plate provides different optical distances for ordinary and extraordinary light. (c) Scattering polarizers only diffuse one linear polarization and transmit the other. (d) CLC cells with opposite handedness reflect one circular polarization and transmit the orthogonal one. In all configurations, a polarization rotator (PR) is employed to switch the incoming polarization states. A linear polarizer (LP) is usually applied for displays with unpolarized light

Later that year, Lee et al. [69] from Seoul National University reported a temporally multiplexed dual-focal HMD prototype [Fig. 8(b)], also based on switching the polarization state of the display light. They took advantage of a polarization-dependent Savart plate made of two anisotropic crystal plates, which is placed in front of the display panel and presents different effective refractive indices to ordinary and extraordinary light. Thus, for light with different polarization states, the Savart plate manifests different optical path lengths. In their prototype, a 60 Hz, 1666 pixel-per-inch micro-OLED provides high-resolution, 30 Hz content for each of the two depths, which are placed 230 mm and 640 mm in front of the eyebox. They also made efforts to reduce the aberrations of the imaging system. Sub-pixel shifting is included in the rendering process as a digital correction of the transverse chromatic aberrations. Meanwhile, a half-wave plate is inserted between the two plane-parallel calcite plates, forming a modified Savart plate that optically compensates for astigmatism. In addition to the dedicated anisotropic optical design, another merit of the prototype is the fast response of the LC polarization rotator, which can work at a refresh rate of 540 Hz and support up to 9 focal planes without flickering, given a fast enough display.

In 2017, Moon et al. [70] built a projection-type dual-focal prototype utilizing polarization-dependent scattering polarizers as the projection screens [Fig. 8(c)]. These screens, from Teijin DuPont Films, diffuse light of one linear polarization and transmit the orthogonal polarization. Their system consists of two screens and a 60 Hz projector synchronized with a polarization rotator from LC-Tec Displays AB with an ultra-fast 30 μs response time. As in [69], the system refresh rate is therefore display-limited rather than polarization-rotator-limited. An issue of this design is that the diffusing angle of the scattering polarizers is around 10°, resulting in apparent vignetting in the virtual images.

Recently, Chen et al. [71] demonstrated another HMD design employing reflective cholesteric liquid crystal (CLC) cells with opposite handedness to differentiate the optical distances for right- and left-handed circularly polarized (RCP and LCP) light [Fig. 8(d)]. The polarization rotator is again a TN LC cell, with rise and decay times of 3.52 ms and 0.52 ms, respectively. However, it is quite challenging for CLC cells to support full-color operation: in their system, even with a large-birefringence (Δn ~ 0.4) LC material, the reflection bands of the two CLC cells provide a high extinction ratio only over the green and red spectral ranges.

Most of the reported multifocal systems based on polarization-dependent optical distances demonstrate only two focal planes, since there are only two orthogonal polarization states, either s/p or RCP/LCP. In general, adding more focal planes requires extra polarization-dependent components, with a corresponding increase in system footprint. This form-factor issue is more severe for distance-based systems than for power-based ones.

Polarization-dependent lens

Optical elements with polarization-dependent focal lengths can offer the same functionality as systems built on polarization-dependent optical distances, usually in a more compact size. Figure 9 illustrates the optical behavior of several such lenses, including anisotropic crystal lenses [72], LC Fresnel lenses [73,74,75,76], Pancharatnam-Berry phase lenses (PBLs) [77,78,79,80] and CLC lenses [81]. Even though only some of them have been implemented in multifocal displays, the others also hold great potential for this application.

Fig. 9

Optical elements with polarization-dependent focal length. Multifocal displays can be enabled by polarization-sensitive lenses, including (a) birefringent crystal lens, (b) Pancharatnam-Berry phase lens, (c) LC Fresnel lens, and (d) CLC lens. (a) and (c) work for linear polarization, while (b) and (d) function with circular polarization

Love et al. [72] reported a temporally multiplexed multifocal display prototype with four focal planes generated by two anisotropic crystal lenses [Fig. 9(a)]. These lenses are made of calcite crystals and assembled in such a way that their extraordinary axis is vertical and ordinary axis horizontal, both perpendicular to the system optical axis. In this manner, s- and p-polarized light experience different refractive indices and therefore different focal lengths. Each plano-convex calcite lens provides a 0.6-diopter optical power difference between the extraordinary and ordinary polarizations, so the working focal range of the system with two calcite lenses is 1.8 diopters. Additionally, the polarization rotator they employed is made of ferroelectric LC, which can switch the polarization state very quickly (< 1 ms). In their prototype, the four focal planes are presented at 45 Hz, as determined by the 180-Hz CRT display. Equipped with a high-frame-rate display panel, this prototype could be improved to display flicker-free multifocal 3D scenes.
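
Treating the thin lenses as additive in optical power, the two binary polarization choices enumerate the four focal planes; a quick sketch using the 0.6-diopter figure from [72]:

```python
from itertools import product

# Each calcite lens adds 0.6 D more power for one linear polarization than
# for the orthogonal one; two lenses with independent polarization switching
# give four combinations (thin-lens powers assumed additive).
delta = 0.6  # diopters per lens
powers = sorted(a + b for a, b in product((0.0, delta), repeat=2))
print(powers)  # [0.0, 0.6, 1.2, 1.8] -> a 1.8-diopter working range
```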

PBLs [Fig. 9(b)], which work by spatially varying optical anisotropy, were proposed to enable multifocal displays in [82]. More recently, Yoo et al. [83] built a dual-focal see-through near-eye display with two PBLs. This kind of lens can be considered as a polarization-sensitive DOE or a dielectric metasurface, manifesting opposite optical powers for RCP and LCP light in a diffractive fashion. Unlike the calcite lens, which functions by the dynamic phase, PBLs work by the Pancharatnam-Berry phase, also known as the geometric phase. Thus, PBLs made of liquid crystal polymer are very thin, usually with a thickness < 5 μm, delivering an attractive form factor for cascading more focal planes. A practical concern of PBLs is that the spectral dispersion of the focal length is much more severe than that of a calcite lens, due to their diffractive nature. Hence, PBLs with large optical power may lead to significant chromatic aberrations, as discussed by Yousefzadeh et al. [84]. Although adding a refractive element can help correct the chromatic aberrations for one polarization, as discussed in [79], it is still challenging to simultaneously correct the orthogonal polarization. As a compromise, the transverse chromatic aberrations can be corrected digitally by image warping, but the longitudinal ones remain. The birefringent crystal lenses and PBLs mentioned above work with both orthogonal polarization states, but with different optical powers.
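
The magnitude of this dispersion can be estimated from the ideal diffractive-lens scaling, where optical power grows linearly with wavelength (f ∝ 1/λ); the numbers below are illustrative, not from the cited papers:

```python
def diffractive_power(P0, lam0, lam):
    """Optical power of an ideal diffractive lens: power scales linearly with
    wavelength (focal length ~ 1/wavelength), so P(lam) = P0 * lam / lam0."""
    return P0 * lam / lam0

P0, lam0 = 2.0, 550e-9  # a 2-diopter PBL designed at 550 nm (hypothetical)
for lam in (450e-9, 550e-9, 650e-9):
    print(f"{lam*1e9:.0f} nm -> {diffractive_power(P0, lam0, lam):.2f} D")
# 450 nm -> 1.64 D, 650 nm -> 2.36 D: a ~0.7-diopter longitudinal spread
# across the visible range, far larger than refractive-lens dispersion.
```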

There are also optical elements that have optical power for only a single polarization state. The well-developed LC Fresnel lenses [Fig. 9(c)] can continuously tune their focal length, but only for linearly polarized light whose polarization direction is parallel to the LC alignment direction. Another example is the reflective DOE lens with patterned CLC reported by Kobashi et al. [81]. This type of CLC lens [Fig. 9(d)] focuses or defocuses light of one circular polarization in a reflective fashion, while letting the orthogonal polarization pass through. Although the CLC lenses reported so far work only within a limited spectral bandwidth, it is possible to enlarge the spectral range by cascading several thin-film lenses with different working wavelengths [85]. LC Fresnel lenses and CLC lenses can also function like the calcite lens and PBL if they are cascaded with another element working for the orthogonal polarization, or integrated with a polarization-independent bias lens such as a standard refractive lens.

Continuously tunable lens

Various types of optical components with tunable or switchable optical power have been developed for a wide range of applications. These components can be utilized to build power-based multifocal displays if they can be manufactured at an appropriate size and can vary their optical power fast enough. The following sections cover designs using transmissive tunable lenses, including liquid crystal lenses, liquid lenses and freeform lenses, as well as reflective optical components such as LCoS SLMs and deformable membrane mirrors.

To our knowledge, Suyama et al. [86] demonstrated the first LC-lens-based multifocal display system in 2000. They fabricated an actively addressable Fresnel lens with a dual-frequency liquid crystal (DFLC) mixture, whose dielectric anisotropy changes sign from positive to negative as the electric-field frequency increases. The DFLC is injected into a cell formed by a Fresnel lens with a surface alignment layer. In this way, by alternating the driving signal frequency, the optical power of the DFLC Fresnel lens can oscillate between −1.2 and +1.5 diopters at 60 Hz. In the prototype, the DFLC lens is placed between two static lenses as part of the viewing optics; to maintain a constant FOV when the optical power changes, the DFLC lens is placed at the focal point of the eyepiece. An issue with the demonstrated tunable lens is its limited imaging quality while the optical power is oscillating between the two stable states. Since the cell gap is not uniform and the alignment at the Fresnel surface is not well defined, the effective refractive index of the LC mixture may vary considerably across the volume during the transitional states; the disturbed phase profile degrades the imaging ability of the DFLC lens.

Liu et al. [87] reported a see-through dual-focal HMD prototype [Fig. 10(a)] enabled by a tunable liquid lens from Varioptic™. This design adapts the conventional birdbath architecture by adding an electrowetting liquid lens in front of the beam splitter. The employed tunable lens varies its optical power from −5 to 20 diopters when driven by an alternating electric field with a root-mean-square voltage from 32 Vrms to 60 Vrms. The response time of the liquid lens used in this work, an Arctic 320, was ~ 75 ms. The prototype also utilized an OLED microdisplay with an 85 Hz refresh rate and a graphics card supporting 75 Hz rendering; the system refresh rate is thus limited to ~ 7 Hz by the liquid lens. In their continued work [88], a faster liquid lens with a 9-ms response time was adopted, boosting the dual-focal refresh rate to 37.5 Hz, limited by the graphics card. Since an empty frame was also added after each content frame to hide the transitional states of the liquid lens, the final refresh rate drops to 18.75 Hz if accurate focus cues are desired. They further improved the system [89] in 2010 by upgrading the graphics card to support 240 Hz SVGA (800 × 600) content; as a result, the refresh rate of the dual-focal display with empty frames increased to 21.25 Hz, at which point it became microdisplay-limited. However, even if the micro-OLED were replaced by a fast-response DMD display, the system still could not achieve a 60 Hz rate for two depths, because the response of the tunable lens is not fast enough. As electrowetting lenses function by electrically changing the contact angle of a fluid droplet, inertial effects intrinsically limit the response time of these lenses, especially those with large apertures.
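
The bottleneck analysis running through [87,88,89] can be captured in a small model that takes the slowest of the display, the graphics card and the lens, then divides by the frames needed per volume (a simplified model of the papers' arithmetic, not their published code):

```python
def multifocal_rate(display_hz, gpu_hz, lens_switch_ms, planes, blank_frames=0):
    """Per-volume refresh rate limited by the slowest component (rough model).

    lens_switch_ms: lens response per focal change; blank_frames: empty frames
    inserted per image frame to hide lens transitions.
    """
    lens_hz = 1000.0 / lens_switch_ms           # focal switches per second
    frames_per_volume = planes * (1 + blank_frames)
    return min(display_hz, gpu_hz, lens_hz) / frames_per_volume

# Numbers quoted for the dual-focal prototypes:
print(round(multifocal_rate(85, 75, 75.0, 2), 1))                  # ~6.7 Hz [87]
print(round(multifocal_rate(85, 240, 9.0, 2, blank_frames=1), 2))  # 21.25 Hz [89]
```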

Fig. 10

Multifocal display structures employing continuously tunable lenses. Configuration (a) is a modified birdbath design using a beam splitter (BS) with an electrowetting liquid lens. Designs (b), (c) and (d) achieve flicker-free performance with ultra-fast DMD displays and fluid-membrane lenses. In (b) the DMD is directly synchronized with the tunable lens, while in (c) the lens is constantly driven, and its focal length is tracked in real time by an infrared (IR) optical path constructed with dichroic mirrors (DMs) and a position-sensing detector. In configuration (d), the DMD synchronized with the tunable lens is employed as a fast-response pixelated backlight for the LCD panel

In 2015, Llull et al. [90] realized a flicker-free multifocal display [Fig. 10(b)] employing an ultra-high-speed DMD display and a fast-response tunable lens from Optotune™, based on a combination of optical fluids and a polymer membrane. The DMD display offers 400 Hz 6-bit grayscale imagery, and the liquid lens can switch its focal length between 2 and 8 diopters within 2 ms, albeit with a 5 ms settling time. In this benchtop binocular prototype, they successfully offered six focal planes (0.6-diopter spacing) with a 31° FOV at a rate of 60 Hz. From the same company, Wu et al. [91] later presented a multifocal display with content-adaptive depth arrangement to improve the perceived 3D image quality, which appears to be a hybrid of the varifocal and multifocal concepts.

In 2018, Chang et al. [92] further increased the number of focal planes while lowering the frame rate, demonstrating a proof-of-concept grayscale near-eye display [Fig. 10(c)] with a dense collection of 40 depths at a 40 Hz refresh rate. The optical layout in this work is similar to that reported by Llull et al. [90], employing a DMD display for large bandwidth and a fast-response liquid lens for depth changing. An essential feature of their design is the focal-length tracking system, which acquires the real-time depth with microsecond time resolution, allowing rapid and precise synchronization between the constantly sweeping tunable lens and the DMD display. In the meantime, another independent 60 Hz multifocal display that supports full-color operation was reported by Rathinavel et al. [93], featuring a denser depth stack (280 depths) and using the same liquid lens. The novel part of this work is the decomposition of colored 3D scenes into 280 binary patterns, each carrying a single color channel and placed at one depth. The design offers a dense focal stack spanning from 0.25 to 6.7 diopters, which is adequate for most 3D image content.

Lee et al. [94] from Seoul National University proposed the tomographic display [Fig. 10(d)], featuring a 60 Hz LCD with a fast DMD backlight and a tunable liquid lens. In the prototype, 80 depths occupy the focal range from 0 to 5.5 diopters with a 0.07-diopter spacing. When the scanning virtual image plane arrives at one depth, only the 3D content in the vicinity of this depth is illuminated by the DMD. Thus, in principle, correct synchronization of the liquid lens and the DMD backlight can map each pixel of the 2D image displayed on the LCD to an arbitrary depth within the working focal range, generating an accurate but discretized focal surface [55]. A limitation of the tomographic display is that each pixel on the LCD is fixed at one depth, making it impossible to provide accurate depth for 3D scenes with semi-transparent objects, where the viewer should observe multiple depths along a single direction.
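
The synchronization principle can be sketched as follows: for each sub-frame, the DMD backlight mask selects exactly those pixels whose target depth is nearest the currently scanned plane (a conceptual model, not the authors' implementation):

```python
import numpy as np

def backlight_masks(depth_map_d, num_planes=80, d_min=0.0, d_max=5.5):
    """Binary DMD backlight masks for a tomographic display (conceptual).

    depth_map_d: per-pixel target depth in diopters. Each pixel is lit only
    during the sub-frame whose scanned focal plane is nearest its depth.
    """
    planes = np.linspace(d_min, d_max, num_planes)
    nearest = np.abs(depth_map_d[..., None] - planes).argmin(axis=-1)
    return [(nearest == i) for i in range(num_planes)]

depth_map = np.random.uniform(0.0, 5.5, size=(4, 4))  # toy depth map
masks = backlight_masks(depth_map)
assert sum(m.sum() for m in masks) == depth_map.size  # each pixel lit once
```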

Switchable lens

Aside from continuously tunable lenses, addressable lenses that switch between discrete optical powers can also be used to build multifocal display systems in a time-sequential manner.

In early 2018, Zhan et al. [95] fabricated switchable LC lenses employing the Pancharatnam-Berry phase and built a 4-depth multifocal HMD prototype based on them [Fig. 11(a)]. These actively addressable PBLs are made of LC instead of LC polymer, so their polarization-dependent optical power vanishes when a large enough electric field is applied to the electrodes. Since the cell gap of PBLs is usually less than 2 μm, the response time can be as fast as 0.5 ms if a low-viscosity LC material is employed. In the prototype, two LC PBLs with 0.5 and 1.5 diopters of optical power are attached together and sandwiched between two plano-convex lenses, forming a compact, switchable viewing-optics assembly with four evenly spaced optical powers. A fast 240-Hz LCD was employed to provide 60 Hz flicker-free content for the four depths. After synchronization of the LCD and the two PBLs, a multifocal HMD prototype was constructed with the same form factor as commercial VR HMDs. Thanks to the bistable nature of the switchable PBLs, this design is free from the transitional longitudinal focal sweep that occurs when switching between focal planes with continuously tunable lenses. A general limitation of this design, and of others using switchable lenses, is that it cannot realize the dynamic multifocal scheme proposed by Wu et al. [91], because the focal plane arrangement is fixed.
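
A simple model reproduces the evenly spaced powers, under the assumptions that an active PBL adds its power with a sign set by the incident circular handedness and flips that handedness, that a driven PBL is optically inert, and that thin-lens powers add; the lens ordering here is our assumption, not taken from [95]:

```python
def stack_power(active_flags, lens_powers):
    """Net power of cascaded switchable PBLs (a conceptual model).

    An active PBL contributes +P for one circular handedness and -P for the
    other, and flips the handedness; a voltage-driven PBL does neither.
    Thin-lens powers are assumed to add.
    """
    total, sign = 0.0, +1  # start with, say, right-handed circular light
    for active, P in zip(active_flags, lens_powers):
        if active:
            total += sign * P
            sign = -sign  # handedness flips after an active PBL
    return total

# Hypothetical ordering: the 1.5 D PBL before the 0.5 D PBL.
lenses = (1.5, 0.5)
states = sorted(stack_power(f, lenses)
                for f in [(0, 0), (0, 1), (1, 0), (1, 1)])
print(states)  # [0.0, 0.5, 1.0, 1.5] -> four evenly spaced powers
```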

Fig. 11

Multifocal display structures enabled by switchable lenses. Multifocal displays can be enabled by switchable lenses that have discrete focal lengths, including (a) addressable PBLs made of LCs and (b) multifocal freeform lenses with a patterned shutter. It is possible to cascade multiple switchable lenses to achieve a dense focal stack

Later, Wang et al. [96] reported another multifocal switchable lens using freeform optics and a patterned LC shutter [Fig. 11(b)]. They designed and fabricated a freeform singlet consisting of four concentric zones, each with a distinct optical power. A custom-designed LC shutter, with four corresponding concentric electrode patterns that can be controlled independently, is attached to the freeform surface. This combination realizes a switchable lens with four focal lengths. The LC shutters can switch between the transparent and dark states within 2.5 ms, making it possible for the switchable freeform singlet to scan through its four focal lengths at an overall rate of about 400 Hz. Such a fast-response freeform lens could potentially support a multifocal display with six depths at 60 Hz if combined and synchronized with a high-speed display panel such as a DMD.

Tunable reflector

In addition to transmissive active optics, reflective optical elements have also been developed and exploited for multifocal display systems. Although employing reflective elements usually enlarges the footprint of a display system, decent optical see-through functionality can be obtained if the tunable reflector also serves as the see-through combiner.

Traub [97] was a forerunner in using varifocal mirrors for multifocal displays. The tunable mirror he employed in 1967 was a metalized Mylar membrane reported earlier, in 1960, by Muirhead [98]. The membrane was stretched taut over a circular frame as a deformable curved mirror, driven by a loudspeaker to change its curvature and hence its focal length [Fig. 12(a)]. Traub noted that the mirror surface was mostly spherical when driven at a single frequency. In the demonstration, an oscilloscope screen was placed at 45° to the normal of the mirror surface, and the viewer could directly observe the multifocal display volume in the mirror. This pioneering work verified the feasibility of multifocal displays, even though deformable membranes driven by loudspeakers are not suitable for practical applications.

Fig. 12

Multifocal display structures employing tunable reflectors. The deformable mirrors in (a), (c) and (d) and the FLC-SLM in (b) work as tunable reflective optical elements for multifocal systems. The deformable mirror in (a) vibrates with the air pressure from a loudspeaker, while those in (c) and (d) can be driven electrically

In 1997, Neil et al. [99] demonstrated a multifocal display using a programmable reflective Fresnel zone plate generated by a ferroelectric liquid crystal (FLC) SLM that could support refresh rates of several kHz. This design includes two FLC SLMs: one generates the 2D images, and the other functions as the active Fresnel zone plate [Fig. 12(b)]. As a proof of concept, a monochrome, two-level grayscale 3D scene was displayed with three focal planes located at 45 cm, 90 cm and infinity at 60 Hz. However, as shown in the paper, the contrast of the displayed images was limited. This problem could be caused by imperfect amplitude modulation of the first SLM and by unwanted diffraction orders from the Fresnel zone plate phase on the second SLM. Nevertheless, this work remains a precursor of multifocal display systems using SLMs as active optics.

McQuaide et al. [100] proved the feasibility of offering multi/vari-focal planes with an electrically driven deformable membrane mirror in a retinal scanning display [Fig. 12(c)]. The membrane mirror (from Flexible Optical B.V.) utilized in their monocular prototype was made of a thin silicon nitride membrane coated with aluminum and suspended over an electrode. The membrane surface deforms into a parabolic shape when a voltage is applied to the electrode; by varying the voltage from 0 to 300 V, the optical power of the membrane can be continuously tuned from 0 to 1 diopter. They achieved a working focal range from 0 to 3 diopters and verified its accuracy by simultaneously measuring the eye's accommodation response with an autorefractor. In 2006, Schowengerdt and Seibel [101] further improved this retinal scanning design, demonstrating a binocular prototype with an extended focal range from 0 to 16 diopters.

In addition to retinal scanning displays, electromechanically deformable membranes have also been applied in panel-based designs, as reported by Hu and Hua [102,103,104]. Their first demonstration [102] in 2011 features a DMD display and a deformable membrane mirror in a birdbath arrangement [Fig. 12(d)]. The illuminated DMD is imaged by the active membrane mirror to an intermediate image plane in front of the eyepiece. Even though the addressable mirror, with its 1-kHz rate, can support up to 16 focal planes with 60 Hz content, they demonstrated six focal planes evenly occupying the working focal range from 0 to 3 diopters. As they claimed, this was the first display system offering 2D images with decent image quality at six depths without flicker. In [104], they miniaturized this design and adapted it as the light engine for a see-through freeform combiner. With the help of custom-designed freeform optics, their six-depth AR HMD prototype offers decent imaging (1.8 arcmin resolution) and see-through quality within a 40° diagonal FOV.

Polarization-multiplexing

To our knowledge, the concept of multiplexing 2D images by polarization was first mentioned in early 2016 by Lee et al. [68], who noted that a pixelated LC panel can add depth information to a 2D image by polarization modulation at the pixel level. Later, in 2018, Zhu et al. [105] proposed a detailed optical design of a multifocal display with both space- and polarization-multiplexing. In this configuration, two LCDs are cascaded to create two independent depths; another LC panel, serving as the polarization modulator, then defines a spatially varying polarization pattern on the two LCDs, doubling the number of focal planes with a birefringent crystal lens. However, to our knowledge, no implementation of this design has been published yet.

Tan et al. [106] built and reported the first polarization-multiplexed dual-focal HMD prototype [Fig. 13(a)]. They used a non-switchable PBL made of LC polymer to provide distinct optical powers for the RCP and LCP light. The prototype consists of two 60-Hz LC panels (one as the display and the other as the polarization modulator), a quarter-wave plate, a PBL, and an eyepiece. Before implementation, they characterized the polarization modulation of the LC panel to obtain a detailed mapping between input grey level and output polarization state at RGB wavelengths. In this manner, a dual-focal full-color near-eye display was demonstrated with a ~70° FOV. Moreover, the ghost images caused by the zero-order leakage of the single-layer PBL in this work can be considerably reduced by employing an ultra-broadband PBL with a spectral response tailored for display applications [79, 107]. Compared with commercial VR devices, this design adds only three flat optical components to achieve the bifocal functionality, so a small footprint is well maintained. If the PBL is replaced by a birefringent crystal lens as designed by Zhu et al. [105], the quarter-wave plate is no longer needed [Fig. 13(b)]. In general, most of the previously mentioned multifocal designs that operate by polarization switching can incorporate the polarization-multiplexing method to share the burden of the high frame rate.
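To make the polarization routing concrete, the short Jones-calculus sketch below traces the modulator's two extreme pixel states through a quarter-wave plate at 45°; the matrices and sign conventions are standard textbook forms, and all values are illustrative rather than the parameters of the actual prototype (intermediate grey levels would produce elliptical states that split power between the two planes):

```python
import numpy as np

# A minimal Jones-calculus sketch of the polarization-multiplexing idea.
# Conventions for circular-polarization handedness vary between texts.

def qwp(theta):
    """Jones matrix of a quarter-wave plate with fast axis at angle theta
    (radians), up to an unimportant global phase."""
    c, s = np.cos(theta), np.sin(theta)
    return np.exp(-1j * np.pi / 4) * np.array(
        [[c**2 + 1j * s**2, (1 - 1j) * s * c],
         [(1 - 1j) * s * c, s**2 + 1j * c**2]])

def s3(E):
    """Normalized third Stokes parameter; its sign distinguishes the two
    circular polarization states."""
    Ex, Ey = E
    return 2 * np.imag(np.conj(Ex) * Ey) / (np.abs(Ex)**2 + np.abs(Ey)**2)

# The LC polarization modulator outputs, per pixel, horizontal or vertical
# linear polarization depending on the addressed grey level.
H = np.array([1, 0], dtype=complex)   # pixel routed to focal plane 1
V = np.array([0, 1], dtype=complex)   # pixel routed to focal plane 2

Q = qwp(np.pi / 4)                    # quarter-wave plate at 45 degrees
for name, E in [("H-pixel", H), ("V-pixel", V)]:
    print(name, "-> S3 =", round(s3(Q @ E), 3))
# Prints S3 = -1 and +1: the two linear inputs map to opposite circular
# states, which the PBL then images with two different optical powers.
```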

Fig. 13
figure 13

Multifocal display structures with polarization multiplexing. Polarization-multiplexed multifocal displays usually consist of a display panel, a polarization modulator and polarization-dependent optics, such as (a) PBLs and (b) birefringent crystal lenses

Wavelength-multiplexing

A wavelength-multiplexed multifocal display system was not developed until recently, by Zhan et al. [37]. Their dual-focal benchtop demonstration [Fig. 14] is essentially a distance-based dual-focal birdbath design, in which a spectral notch filter serves as the wavelength-sensitive component to generate distinct depths. The two laser beam scanning projectors employed in the system have close but different green primary wavelengths, 532 nm and 517 nm. Due to the sharp spectral stopband of the notch filter, the 517 nm laser light passes through with negligible reflection, while the 532 nm light is totally reflected. In this manner, the notch-filter-based layout creates an optical path length difference, and therefore distinct focal planes, for the two green wavelengths. Although this simple design verified the feasibility of offering multiple focal planes by wavelength multiplexing, many challenges remain for practical applications. Firstly, mixing wavelengths in each color channel directly affects the overall color performance, since only the colors shared by all focal planes can be displayed if a uniform system color appearance is desired. Also, the stopband of the notch filter usually shifts under different angles of incidence, which becomes a critical limitation for achieving a larger FOV.
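Assuming a thin-film interference filter, the magnitude of this angular blue-shift can be estimated with the common effective-index formula \( \lambda \left(\theta \right)={\lambda}_0\sqrt{1-{\left(\sin \theta /{n}_{eff}\right)}^2} \); the sketch below uses assumed values for the normal-incidence center wavelength and the effective index:

```python
import numpy as np

# Estimated blue-shift of a thin-film notch filter's stopband with incidence
# angle. lam0 and n_eff are assumed illustrative values; real filters differ.
lam0 = 532.0   # nm, normal-incidence center of the stopband
n_eff = 2.0    # effective index of the filter stack (assumed)

for deg in [0, 10, 20, 30]:
    theta = np.radians(deg)
    lam = lam0 * np.sqrt(1 - (np.sin(theta) / n_eff) ** 2)
    print(f"{deg:2d} deg -> {lam:6.1f} nm (shift {lam - lam0:+.1f} nm)")
# Around 20-30 deg of incidence the stopband shifts by roughly 8-17 nm,
# comparable to the 15 nm separation between the two green primaries --
# hence the FOV limitation noted above.
```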

Fig. 14
figure 14

Multifocal display designs using wavelength multiplexing. Two laser projectors with distinct primary wavelengths project two images on the screen. These two images are separated into two depths by the notch filter

After the detailed discussion of multifocal display optical designs, Fig. 15 offers a short summary of all the approaches discussed in this section and their relations.

Fig. 15
figure 15

Multifocal display designs. Various multifocal-enabling technologies are categorized by their key optical features and classified by their depth-generation and information-multiplexing approaches

Image rendering

The general goal of rendering is to approximate the continuous 3D scene with discrete display planes. The planes usually have equal diopter spacing, although optimizing the display plane placement for different 3D scenes has also been investigated [91], at the cost of an increased computational burden. In addition, a multiplane display system with continuously tunable plane placement is difficult to achieve optically. Therefore, we only review the cases with fixed display planes, where the displayed content is optimized.

Direct blending

In the direct blending method, the virtual objects are segmented by the display planes into several parts [Fig. 16(a)]. For a part that falls exactly on a display plane (A in Fig. 16(a)), the content is directly displayed by the corresponding pixels on that plane. A part lying between the planes (B in Fig. 16(a)), on the other hand, is rendered by distributing the light intensities over the pixels of the neighboring planes (B1 and B2) with different weight ratios (w1 and w2). The blending scheme determining the weight ratios should induce the viewer's eye to accommodate to the desired depth. Figure 16(b) depicts different blending methods. The most commonly used one is linear blending [47, 72, 91], where w1 = d2/(d1 + d2) and w2 = 1 − w1; a sketch of this rule follows the figure below. A more complex nonlinear blending scheme was later proposed by Liu and Hua [108]. However, it was subsequently reported that linear blending may still be preferable when specific biological factors and natural-image statistics are considered [109]. As shown in Fig. 16(b), the blending schemes that maximize the MTFs of targets at various resolutions (dotted lines) agree well with linear blending.

Fig. 16
figure 16

Direct blending method. (a) Schematic plot. A and B are points of the 3D virtual scene. B1 and B2 are actual pixels on the display planes. (b) Different blending schemes. The black and red curves stand for linear and non-linear blending, respectively. The dotted lines represent the combinations of intensity ratio and accommodation distance, under stimuli with different spatial frequencies in cycles per degree (cpd), that maximize the MTF area considering the biological aspects of the HVS
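As a concrete illustration of the linear rule, the following minimal sketch splits a scene between two neighboring focal planes using a per-pixel depth map; the depth values, plane placements and intensities are illustrative assumptions:

```python
import numpy as np

def linear_blend(intensity, depth, plane1_d, plane2_d):
    """Split `intensity` between two focal planes with linear blending:
    w1 = d2 / (d1 + d2), w2 = 1 - w1, where d1 and d2 are the dioptric
    distances from each pixel's depth to the two planes."""
    d1 = np.abs(depth - plane1_d)
    d2 = np.abs(depth - plane2_d)
    w1 = d2 / (d1 + d2)              # pixel exactly on plane 1 -> w1 = 1
    return w1 * intensity, (1.0 - w1) * intensity

# Example: a 2x2 patch with depths between the 1.0 D and 2.0 D planes.
intensity = np.full((2, 2), 0.8)
depth = np.array([[1.0, 1.25], [1.5, 2.0]])     # diopters (assumed values)
img1, img2 = linear_blend(intensity, depth, 1.0, 2.0)
print(img1, img2, sep="\n")   # the two plane images sum back to `intensity`
```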

While the direct blending method has the fastest processing speed thanks to its simple rendering rule, it produces artifacts around occlusion boundaries and in other non-Lambertian cases such as reflection and refraction. To better imitate the 3D virtual scene, more advanced decomposition schemes involving optimization are required.

Optimal decomposition

The sketch of the scene rendering setup is depicted in Fig. 17. A total of N display planes are used for rendering. Because the viewer can accommodate to different depths in the scene, the optimization process needs to consider all these cases with equal weights. Therefore, the space is segmented into M measurement focal planes to evaluate the rendering at different depths.

Fig. 17
figure 17

Sketch of the optimal decomposition. The red solid lines stand for display planes (p1...pN) and the black dashed lines represent measurement focal planes (f1...fM). The red curves show the PSF from point F on different display planes

When the viewer accommodates to the i-th focal plane (z = fi), the final image on the retina is the direct summation of the images formed by the N display planes. If we denote the point spread function (PSF) of the j-th display plane as H(dj, fi) and the displayed image as uj, the total image Si is expressed as

$$ S_i=\sum \limits_j H\left(d_j,f_i\right)\ast u_j, $$
(2)

where ∗ represents convolution. For the virtual scene, the ideal image vi formed on the retina can be rendered by assigning the camera in the virtual scene a focal distance fi. The ultimate goal is to minimize the total error at the different focal depths under the constraint on the display pixel values (normalized to 1), which can be expressed as

$$ \underset{u}{\mathrm{argmin}}\sum \limits_i{\left\Vert v_i-\sum \limits_j H\left(d_j,f_i\right)\ast u_j\right\Vert}^2,\quad 0<u_j<1. $$
(3)

Because the system is discretized by the pixelated display, the pixels of each image can be arranged into a single vector, such as uj and vi. Then, Eq. (3) can be rewritten as

$$ \underset{u}{\mathrm{argmin}}{\left\Vert \left(\begin{array}{c}v_1\\ \vdots \\ v_M\end{array}\right)-\left(\begin{array}{ccc}H_{11}& \cdots & H_{1N}\\ \vdots & \ddots & \vdots \\ H_{M1}& \cdots & H_{MN}\end{array}\right)\left(\begin{array}{c}u_1\\ \vdots \\ u_N\end{array}\right)\right\Vert}^2. $$
(4)

Although, in theory, the above problem can be approached by solving the normal equation \( \left({H}^TH\right)u={H}^Tv \), explicitly deriving the matrix H is difficult; moreover, \( {H}^TH \) is too large and dense for practical computation.
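To make the structure of Eqs. (3) and (4) concrete, the following toy 1D sketch forms H explicitly and solves the box-constrained least squares directly; the Gaussian PSF model, the plane placements and the blur scale are all illustrative assumptions, and this brute-force route is feasible only at such a small scale:

```python
import numpy as np
from scipy.optimize import lsq_linear

# Toy 1D illustration of Eqs. (3)-(4): explicitly build the blur matrix H
# and solve the constrained least squares. All sizes below are assumptions.
npix = 64
d = np.array([1.0, 2.0])        # display planes in diopters (N = 2)
f = np.array([1.0, 1.5, 2.0])   # measurement planes in diopters (M = 3)
k = 4.0                          # assumed defocus-blur scale, pixels/diopter

def blur(sigma):
    """Dense matrix applying a 1D Gaussian blur of width sigma (pixels)."""
    x = np.arange(npix)
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / max(sigma, 1e-3)) ** 2)
    return K / K.sum(axis=1, keepdims=True)   # rows sum to 1

# Stack the M x N blocks H(dj, fi) into the big system matrix of Eq. (4).
H = np.block([[blur(k * abs(dj - fi)) for dj in d] for fi in f])

# Target retinal images v_i: an edge placed at 1.5 D, seen with each focus.
scene = (np.arange(npix) > npix // 2).astype(float)
v = np.concatenate([blur(k * abs(1.5 - fi)) @ scene for fi in f])

u = lsq_linear(H, v, bounds=(0.0, 1.0)).x   # enforce 0 <= u_j <= 1
u1, u2 = u[:npix], u[npix:]                 # optimized display images
print(np.linalg.norm(H @ u - v))            # residual rendering error
```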

To avoid this, Narain et al. [110] proposed to tackle the problem in the frequency domain, where the convolution becomes a product and the matrix can be explicitly formed by Fourier-transforming the PSFs. However, this brings the difficulty of handling the constraints, because the constraints on pixel intensities live in the space domain. The authors therefore used a primal-dual hybrid gradient algorithm, switching between the space and frequency domains in each iteration to satisfy the constraints. The algorithm is faster than the algebraic reconstruction techniques commonly used in tomography, and the rendering results show good fidelity at the boundaries of occlusion, reflection and refraction compared to direct blending. However, because of the frequent Fourier transforms and the relatively complex iteration process, the computation is still not fast enough for real-time frame rates.

A further improvement was demonstrated by Mercier et al. [111], who approximated the PSF by a disk function, which applies well to near-eye displays where the eye focal distance is significantly larger than the eye pupil diameter. In this case, the convolution is greatly simplified to an element-wise multiplication of matrices, enabling fast computation of the kernel matrix. The size of the kernel matrix can also be greatly reduced because the PSF is translation-invariant. With the kernel matrix thus simplified, the normal equation \( \left({H}^TH\right)u={H}^Tv \) can be conveniently solved by Jacobi iterations with the pixel-value constraints. The method shows high accuracy and a speed sufficient for real-time rendering.
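The structure of such a constrained Jacobi solver can be sketched on the same toy system as above; this is a generic damped, projected Jacobi iteration with assumed parameters, not a reproduction of the authors' implementation:

```python
import numpy as np

# Solve (H^T H) u = H^T v by damped Jacobi iterations, clipping the pixel
# values to [0, 1] after each step. Reuses H and v from the previous sketch.
A = H.T @ H
b = H.T @ v
D_inv = 1.0 / np.diag(A)             # inverse diagonal of A
omega = 0.2                           # conservative damping factor (assumed)

u = np.full(A.shape[0], 0.5)          # start from mid-grey
for _ in range(500):                  # fixed iteration budget (assumed)
    u += omega * D_inv * (b - A @ u)  # damped Jacobi update
    np.clip(u, 0.0, 1.0, out=u)       # enforce the display constraint
print(np.linalg.norm(H @ u - v))      # residual approaches the lsq_linear one
```

In the actual method, the disk-PSF simplification makes applying A and extracting its diagonal cheap enough for this iteration to run in real time.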

Additive light field rendering

The above-mentioned methods are all based on the imaging of the 3D scene on the retina. Alternatively, some works [46, 95] adopt additive light field rendering, which directly approximates the physical light field of the 3D scene. The light field represents the light rays emitted from all spatial points; it can be parameterized by a point Q on a 2D plane and the spatial angle θ as L(Qx,Qy,θx,θy), as plotted in Fig. 18.

Fig. 18
figure 18

Illustration of the additive light field rendering. The black dashed lines represent sampling rays. Q1 and Q2 are sampling points and θ is the sampling angle. P1 (green) and P2 (red) are the display image planes. The green and red arrows are light rays coming from image planes and the orange ones are light rays coming from the 3D virtual scene

In one sampling, the ray (black dashed line) passes through the stacked display images. The intensity of the ray is thus the sum of the pixel intensities ui(x,y) on the image planes, written as

$$ L\left(Q_x,Q_y,{\theta}_x,{\theta}_y\right)=\sum \limits_i u_i\left(Q_x+h_i\tan {\theta}_x,\;Q_y+h_i\tan {\theta}_y\right), $$
(5)

where hi is the axial distance of the i-th image plane from the first one (h1 defined as 0). If we concatenate the images into a single vector u, the light field L can also be expressed in matrix form

$$ \left(\begin{array}{c}L_1\\ \vdots \\ L_M\end{array}\right)=\left(\begin{array}{c}{\delta}_{P_1,j}^1+{\delta}_{P_2,j}^1+\dots +{\delta}_{P_N,j}^1\\ \vdots \\ {\delta}_{P_1,j}^M+{\delta}_{P_2,j}^M+\dots +{\delta}_{P_N,j}^M\end{array}\right)\left(\begin{array}{c}u_1\\ \vdots \\ u_N\end{array}\right), $$
(6)

where j is the column number and \( {\delta}_{P_i,j}^m \) is the Kronecker delta for the i-th image plane in the m-th sampling \( \left({Q}_x^m,{Q}_y^m,{\theta}_x^m,{\theta}_y^m\right) \), defined as \( {\delta}_{P_i,j}^m=1 \) if \( j \) corresponds to pixel \( \left(\left[{Q}_x^m+{h}_i\tan {\theta}_x^m\right],\left[{Q}_y^m+{h}_i\tan {\theta}_y^m\right]\right) \) of the i-th image, and 0 otherwise. The goal is to approach the target light field Lt, given by the 3D virtual scene, by optimizing the display content u, which is expressed as

$$ \underset{u}{\mathrm{argmin}}\sum \limits_{Q_x,{Q}_y,{\theta}_x,{\theta}_y}{\left\Vert \left(\begin{array}{c}L_{t1}\\ \vdots \\ L_{tM}\end{array}\right)-\left(\begin{array}{c}{\delta}_{P_1,j}^1+{\delta}_{P_2,j}^1+\dots +{\delta}_{P_N,j}^1\\ \vdots \\ {\delta}_{P_1,j}^M+{\delta}_{P_2,j}^M+\dots +{\delta}_{P_N,j}^M\end{array}\right)\left(\begin{array}{c}u_1\\ \vdots \\ u_N\end{array}\right)\right\Vert}^2. $$
(7)

The objective again turns into a constrained least-squares problem. Unlike in the optimal decomposition, where the correlation between images makes the kernel matrix large and dense, the ray-optics approximation of the light field makes the kernel very sparse. The problem can be solved with a trust-region algorithm [112], although other methods, including the previously mentioned ones, are also applicable.
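As a concrete illustration, the sketch below builds the sparse delta matrix of Eq. (6) for a 1D light field and solves the constrained least squares with scipy's trust-region reflective solver; the ray sampling, plane separations and target scene are illustrative assumptions:

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.optimize import lsq_linear

# 1D sketch of additive light field rendering (Eqs. 5-7). Geometry assumed.
npix = 32
h = [0.0, 1.0]                          # image-plane separations h_i (h1 = 0)
N = len(h)

# Sample rays (Qx, tan(theta_x)): each pixel position with five shear values.
Q = np.repeat(np.arange(npix, dtype=float), 5)
t = np.tile(np.array([-2.0, -1.0, 0.0, 1.0, 2.0]), npix)
M = len(Q)

D = lil_matrix((M, N * npix))           # sparse delta matrix of Eq. (6)
for m in range(M):
    for i, hi in enumerate(h):
        j = int(round(Q[m] + hi * t[m]))       # pixel hit on plane i
        if 0 <= j < npix:
            D[m, i * npix + j] = 1.0
D = D.tocsr()

# Target light field: a diffuse 1D emitter halfway between the two planes.
scene = np.exp(-0.5 * ((np.arange(npix) - npix / 2) / 3.0) ** 2)
x_hit = np.clip(np.round(Q + 0.5 * t).astype(int), 0, npix - 1)
Lt = scene[x_hit]

u = lsq_linear(D, Lt, bounds=(0.0, 1.0), method="trf").x
u1, u2 = u[:npix], u[npix:]             # optimized images for the two planes
print(np.linalg.norm(D @ u - Lt))       # residual light-field error
```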

Conclusion

We have presented a systematic review of state-of-the-art multifocal displays, mainly focusing on the optical architectures and their enabling technologies. Firstly, we provided a classification of the numerous multifocal designs according to their depth generation methods and information multiplexing channels. Multiple depths in a display system can be created by changing the optical path length (distance-based) and/or the optical power of the imaging components (power-based). Assembling conventional 2D displays into a 3D scene requires much more information to be presented at a faster rate, which makes information multiplexing necessary; we therefore further divided previous designs into four types: space-, time-, polarization- and wavelength-multiplexing. Then, we dove into the detailed designs in this field of research and development as a potential solution to the VAC. So far, the majority of designs operate by time-multiplexing, trading display frame rate for more focal planes, including those with the densest focal stacks reported so far. We expect this trend to continue in the near future due to the compact form factor and simple layout it enables. On one hand, with the rapid development of display panels, driving electronics, and tunable/switchable optics, more powerful enabling components with faster refresh rates and higher image quality will become available for time-multiplexed applications. On the other hand, the recently developed polarization- and wavelength-multiplexing methods could help alleviate the heavy hardware burden on both space- and time-multiplexing. Notably, most of the polarization-dependent time-multiplexed designs can utilize polarization-multiplexing to halve the required frame rate. In theory, wavelength-multiplexing can offer an even larger improvement by displaying three or more depths at the same moment, although this requires optical components with sharper spectral sensitivity.