1 Introduction

In recent years, there has been increasing interest in exploring light field techniques for 3D visualisation. These techniques have been used extensively to address eye fatigue and the uncomfortable viewing experience caused by wearing 3D display glasses. Due to recent advances in theory and microlens manufacturing, holoscopic 3D (H3D) technology is becoming a practical, prospective 3D display technology and is thus attracting much interest in the 3D visualisation sector.

The main concept of H3D, also known as integral imaging, is based on the fly's-eye imaging system and goes back to the proposal of Gabriel Lippmann [1]. He proposed an analogue multi-lens photographic plate that is reminiscent of the insect's compound eye, in which a large number of crystalline lenses are arranged hexagonally.

At the capturing stage, a conventional 2D camera loses the angular information of the light rays incident on its sensor [2]. To acquire this angular information, the optical arrangement of the fly's eye is emulated by a 2D sheet containing a large number of macrolenses with lenticular or hexagonal shapes [2]. A 3D holoscopic image is recorded using an array of closely packed lenslets placed in front of a recording device. Each lenslet views the scene at a slightly different angle from its neighbour, so the scene is captured from different viewpoints and parallax information is recorded [3].

Stereoscopy is a visualisation technique that provides the illusion of depth by presenting two images with a horizontal offset, one to the viewer's left eye and one to the right eye.

This technique relies on channelling the light rays to the viewer's left and right eyes using dedicated 3D glasses.

A major problem with this commonly used technique is that it causes headaches and eye strain for viewers, in addition to requiring intensive post-production work [4].

Moreover, producing stereoscopic visual content with dual cameras is complicated in terms of encoding and decoding, and costly due to the manual effort required in post-production. This may make stereoscopy unsuitable for current market requirements. In addition, stereoscopic displays require the viewer to wear 3D glasses in order to perceive the 3D effect. Further research on lenticular sheets has therefore been carried out to provide 3D visualisation with higher motion parallax continuity for end-users. In the 2000s, Sharp was the first company to produce a 3D LCD screen implemented in a laptop device [5].

In 2007, Philips promoted lenticular-lens-based 3D displays that convey more depth information. Such displays provide greater 3D depth with higher texture detail at the cost of limited motion parallax continuity [6]. In contrast to conventional 2D displays or stereoscopic displays, which provide only a monoscopic or a stereoscopic view, 3D displays offer various numbers of viewpoints, viewing distances, and display resolutions [7].

The autostereoscopic display, also known as a glasses-free display, was first developed by Reinhard Boerner [8]. This type of display not only provides higher 3D depth perception than stereoscopic displays but also visualises full motion parallax continuity. It therefore accommodates motion parallax and supports multiple simultaneous viewers without requiring them to wear glasses.

These 3D experiences are achieved without the need to wear 3D glasses [9]. The main autostereoscopic principle is the use of a flat panel covered by a large number of macrolenses. The first practical prototype of this display type was manufactured in 1985 by Boerner [8].

Because autostereoscopic displays provide 3D viewing perception without special glasses or any head-mounted device, this technology may drive a wider variety of next-generation 3DTV services [10]. Autostereoscopic displays can be categorised into three groups: regular two-view displays, head-tracked displays, and multiview displays with three or more views [11].

Nowadays, there is a great deal of research interest in developing content adaptation techniques for autostereoscopic displays [12,13,14]; these studies aim to provide display viewers with richer 3D viewing experiences.

To create a stereoscopic image, two images are projected onto the same screen through different polarising filters. The viewer wears 3D eyewear that also contains a pair of polarising filters. Each filter allows only similarly polarised light to reach the corresponding eye, ensuring that each eye sees a different image. This produces the 3D effect by projecting the same scene into both eyes from slightly different viewing perspectives. In contrast, autostereoscopic imaging systems use a large number of viewing regions that are recorded and presented on displays that do not require glasses for 3D viewing. A number of such systems are commercially available [15].

Manufacturing of autostereoscopic displays can be grouped into two types: head-tracked displays and multiview displays.

In the first type, the virtual scene cameras provide different viewpoints with limited motion parallax compared to multiview displays [9]. This type of display requires a precise tracking mechanism to switch between viewpoints according to the direction of head movement. In the second type, a multiview display presents a fixed number of viewpoint images, which depends on the pitch size of the manufactured lenticular sheet. In contrast to the first type, several viewers can simultaneously experience higher motion parallax continuity because of the larger number of viewpoints provided by the display. The perceived depth information arises because each eye sees the scene from a slightly different viewing angle, owing to the disparity between the viewpoint images. As a result, higher motion parallax can also be perceived with this type of display [9].

Although some progress has been made to improve 3D visualisation systems in terms of user experience, further research is needed.

The need arises from issues with current, commercially available 3D visualisation systems, which tend to cause eye strain, fatigue, and headaches after a short period of use. These visual discomfort issues are attributed to the fact that viewers must accommodate their eyes on the display screen plane while converging them to a point in space on a different plane (the accommodation-convergence conflict), producing an unnatural 3D viewing experience [16]. In this case, the viewer's brain still has to work to interpret the 3D visual content.

The potential advantages of H3D imaging over computer-generated 3D visualisation techniques relate to acquiring more viewpoints and parallax information at both the acquisition and visualisation stages.

The H3D data are recorded using a regularly spaced array of closely packed lenslets placed in front of a recording device. Each lenslet views the scene at a slightly different angle from its neighbour, so the scene is captured from many viewpoints and parallax information is recorded. With H3D imaging, this process is more intuitive because the images are captured by recording additional angular and natural light information directly, without having to construct a mental 3D image or use 3D eyewear [17].

The performance of such light field systems, however, has been limited by the camera sensor size (number of pixels) and by the characteristics of the microlens array [18, 19]. These two factors are chosen according to the sampling theorem to optimise the trade-off between the spatial and angular resolution of the recorded light field image [18].

The motion parallax continuity depends on how many viewpoints the display can generate at the same time. The 3D visual content produced by a multiview capturing system is acquired using either a multi-camera rig set-up or a single moving 2D imaging system. Several issues arise from the requirements of multiview capturing systems, such as camera installation and set-up, colour mismatch, and viewpoint image misalignment, which are difficult to correct at the post-processing stage.

A single-aperture holoscopic camera eliminates the calibration issues of multiview set-ups and thereby reduces the implementation cost.

The holoscopic 3D imaging system is designed after the fly's visual system; this is made possible by the introduction of a macrolens array (MLA) placed directly in front of the imaging sensor, making it the key component that differentiates an H3D imaging system from a 2D imaging system. This macrolens array gives the H3D imaging system the ability to record the angular and spatial information of any given scene. The MLA can be grouped into two categories: omnidirectional (OH3D) and unidirectional H3D types. Figure 1 shows a practical example of both MLA types. The type of elemental image generated in an H3D system depends on the MLA characteristics used at the acquisition stage.

Fig. 1
figure 1

Macrolens array types

Acquisition with an omnidirectional MLA provides more directional information (recording the full parallax of a scene) than a unidirectional MLA, acquiring more angular information at the cost of lower spatial information. A unidirectional MLA, with its vertically aligned cylindrical lenses, can capture only the horizontal parallax of the scene. Therefore, by using an omnidirectional MLA, we obtain more angular information in each elemental image across different directions, together with full motion parallax continuity. In an omnidirectional H3D image, each part of the scene is represented by a group of replicated elemental images captured at slightly different angles.

One of the main holoscopic 3D applications is transforming raw H3D images into multiview images. The calibration time required at the multiview capturing stage can be reduced significantly by using an H3D imaging system, with higher reconstruction accuracy. Additionally, using the H3D acquisition concept in an autostereoscopic visualisation system improves performance by reducing the viewpoint extraction time required at the post-processing stage.

To overcome the problem of the low resolution of viewpoint images generated from H3D images, we evaluated and optimised computer-generated images following the H3D capturing concept with different MLA sizes. The evaluation was conducted at the autostereoscopic visualisation stage. Furthermore, an H3D content adaptation framework for autostereoscopic displays is presented and evaluated with two upsampling techniques (bicubic and nearest neighbour interpolation) in terms of subjective and objective quality assessments.

The rest of this paper is organised as follows. Section 2 provides an overview of the autostereoscopic display and the H3D imaging system. In Sect. 3, a holoscopic adaptation framework for multiview visualisation is described. Section 4 presents the experimental evaluation set-up for the proposed work and the achieved results, including the characteristics of the computer-generated H3D test images and the evaluation methodology in terms of image quality assessment, objective metric, and computational complexity. Section 5 concludes the work with recommendations for future work.

2 Hardware specifications and H3D imaging properties

This section is divided into two subsections. In the first subsection, construction of the autostereoscopic 3D display system is described. In the second subsection, an H3D capturing imaging system and its properties are presented, highlighting the process of generating holoscopic 3D computer-generated images.

2.1 Autostereoscopic display properties

The main feature of the autostereoscopic display is that it can visualise the generated 3D visual content without the need to wear display glasses or other devices at the viewer's end. This type of display can be divided into four viewing scenarios: (1) electro-holographic, where wave-front reconstruction is used to visualise the image on the autostereoscopic display; (2) volumetric, where the image is formed within a volume of space without the use of light interference; (3) multiple-image, where the viewing field displays many 2D images at the same time; and (4) light field, where the projector and macrolens arrays form the final 3D image [20]. A simple schematic diagram showing the main components of an autostereoscopic display is depicted in Fig. 2.

Fig. 2
figure 2

Main components of an autostereoscopic display

The viewing range of the autostereoscopic display depends on the maximum number of viewpoint images the display can generate simultaneously. This capability enables viewers to perceive different levels of depth and motion parallax within a fixed viewing range.

2.2 Holoscopic 3D imaging system, 3D image properties and dataset

This section presents the components of the holoscopic 3D image acquisition process, its image properties and the synthetic data, in addition to the macrolens array specifications and their effects on the reconstructed autostereoscopic image quality. A prototype holoscopic 3D camera is shown in Fig. 3a.

Fig. 3
figure 3

Holoscopic 3D capturing

The main components of the holoscopic 3D camera are the macrolens array, the relay lens and the digital camera sensor. Its schematic diagram is shown in Fig. 3b. The macrolens array (rectangular white layout) is mounted at a predefined distance from the camera sensor, as shown in Fig. 3b. The main lens image plane is placed in front of the macrolens array, which allows the macrolens array to capture positions in the scene from different perspectives. The combination of lens layouts integrated with the Sony Alpha Mark II camera sensor is shown in Fig. 3a. The total weight of the prototype H3D camera is 1.57 kg, making it lighter than other multiview capturing systems and more portable than most stereo imaging rigs [21].

The synthetic holoscopic 3D data presented in this paper is recorded with a virtual holoscopic 3D imaging system with the same system design demonstrated in Fig. 3, with the only exception of different macrolens array sizes.

The acquisition process in a holoscopic 3D camera records light rays with different angular information on the camera sensor, owing to the diffraction or splitting of the viewpoint light rays. The light rays diffracted by the prime lens pass through the macrolens array, making their way to the relay lens before being recorded on the imaging sensor. Light rays from the same viewpoint are captured on the imaging sensor with equal step-size intervals between viewpoints. The final holoscopic 3D image is represented by the group of replicated macroimages captured at slightly different angles (Fig. 5). The H3D capturing concept is illustrated in Fig. 4.

Fig. 4
figure 4

Holoscopic 3D image registration concept

Thus, the synthetic holoscopic 3D data presented in this paper are recorded with a virtual holoscopic 3D imaging system with the specifications mentioned above and demonstrated in Fig. 4. The main components of the holoscopic 3D imaging system are the macrolens array, camera sensor, square aperture and relay lens array. The square aperture controls the amount of light allowed into the camera and helps to avoid severe vignetting in the corners of the generated image. Figure 5 shows a sample of the final computer-generated image, with a resolution of 7952 × 4472 pixels.

Fig. 5
figure 5

Holoscopic 3D synthetic image data sample

In the experimental work, the properties of the generated synthetic test images are reported in Table 1. The synthetic images are generated from an original image of size 7952 × 4472 pixels.

Table 1 Synthetic H3D image resolution vs different MLA sizes

The viewpoint images are extracted from elemental images with resolutions ranging from 10 × 10 pixels to 100 × 100 pixels. The camera sensor size is 35.9 × 24 mm with a pixel pitch of 0.0041 mm.

The macrolens array records multiple viewpoint images; however, the initial H3D data are recorded with inverse depth before being corrected by the relay lens for registration at the camera sensor. Depending on the implemented macrolens size, the elemental holoscopic images are recorded with different resolutions. In other words, the elemental image size equals the total number of viewpoint images recorded in that single image.
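As a simple numerical illustration of this relationship (our own sketch, using the sensor resolution reported above and hypothetical elemental-image sizes, not the authors' code), the number of viewpoints per elemental image and the native resolution of each extracted viewpoint can be estimated as follows:

```python
# Minimal sketch: relate the elemental-image (EI) size to the number of
# viewpoints and to the native resolution of each extracted viewpoint image,
# using the synthetic sensor resolution reported above.

SENSOR_W, SENSOR_H = 7952, 4472          # synthetic H3D image resolution (pixels)

for ei in (10, 30, 60, 100):             # hypothetical EI sizes (pixels per side)
    viewpoints = ei * ei                 # one viewpoint per pixel position in the EI
    vp_w, vp_h = SENSOR_W // ei, SENSOR_H // ei   # one pixel per EI in each viewpoint
    print(f"EI {ei}x{ei}: {viewpoints} viewpoints, "
          f"native viewpoint resolution {vp_w}x{vp_h} pixels")
```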

Figure 6 presents the synthetic holoscopic 3D viewpoint extraction results achieved with different MLA sizes. As can be seen from the figure, the resolution of the extracted viewpoint images increases with the MLA size, as indicated in Table 1.

Fig. 6
figure 6

H3D viewpoints images resolution without interpolation

In the experimental work, the standard disparity between viewpoint images is 0.02 cm, which is equivalent to 5 pixels on the H3D imaging sensor used for this paper. The viewing position in front of the multiview display plays an important role in experiencing 3D depth and motion parallax continuity. Furthermore, the pitch size of the lenticular sheet used on the multiview display affects the depth scaling perceived from horizontal disparities.

3 Proposed holoscopic 3D content adaptation framework

The holoscopic 3D content adaptation workflow for multiview displays is divided into three major steps, namely (1) holoscopic 3D content recording, (2) selective H3D viewpoint image extraction, and (3) viewpoint image interlacing, as shown in Fig. 7. This section describes the processing pipeline for H3D contents adaptation on multiview display.

Fig. 7
figure 7

Holoscopic 3D content adaptation framework

3.1 Holoscopic 3D content capturing

The first step in the content adaptation workflow is the capturing of good quality holoscopic data.

At this stage, the nonlinear distortion arising from the optical design of the holoscopic camera lens (the MLA alignment with the camera sensor) is corrected. The correction includes the H3D camera calibration process presented in [21]: a white background is imaged, and the implemented MLA produces a grid of circular dots that is used for calibration [21].
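As an illustrative sketch only (the actual calibration procedure is the one described in [21]), the centres of such a dot grid could be located with a simple blob-detection step, for example using OpenCV, before fitting the grid to estimate the MLA pitch and its rotation relative to the sensor:

```python
import cv2

# Illustrative sketch only (the real calibration pipeline is described in [21]):
# locate the circular dots that the MLA projects onto the sensor when imaging a
# plain white background, as a first step towards estimating MLA alignment.

img = cv2.imread("white_calibration_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name

# Assume the dots appear brighter than the gaps between lenses; invert the
# threshold type if the polarity is the other way round for a given set-up.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# One centroid per dot via connected components.
num_labels, _, stats, centroids = cv2.connectedComponentsWithStats(binary)
dot_centres = centroids[1:]            # label 0 is the background
dot_areas = stats[1:, cv2.CC_STAT_AREA]

print(f"Detected {len(dot_centres)} dots, mean dot area {dot_areas.mean():.1f} px^2")
# The dot centres can then be fitted to a regular grid to recover the MLA
# pitch and any rotation relative to the sensor before distortion correction.
```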

In order for a holoscopic 3D dataset to be classified as being of sufficiently good quality, the following conditions must be met:

  • Resolution should be at least 1080p.

  • The square aperture must be properly aligned to produce a fine square grid of elemental images.

  • There should be no overlapping or ghosting between elemental images.

  • Image noise should be kept to a minimum.

  • The image should be in full-frame focus.

Once all the above requirements are met, the likelihood of capturing high-quality 3D images is greatly improved; a lightweight screening sketch is given below. Figure 8 shows a single frame extracted from a holoscopic 3D video captured under normal room lighting conditions. A total of 90 × 64 active macrolenses (a fully framed MLA) is used in the capturing process.
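The sketch below is our own minimal illustration of how a candidate frame could be screened against some of these criteria; the focus and noise checks use common proxies (Laplacian variance and the standard deviation of an assumed near-uniform patch), and the thresholds are illustrative assumptions rather than values from the original pipeline.

```python
import cv2
import numpy as np

# Minimal screening sketch, not part of the original pipeline: checks a candidate
# H3D frame against some of the quality criteria listed above. Thresholds and the
# choice of proxies are illustrative assumptions.

def screen_frame(path, min_height=1080, focus_thresh=100.0, noise_thresh=8.0):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        return False, "frame could not be read"

    h, w = img.shape
    if h < min_height:                                  # resolution criterion
        return False, f"resolution too low ({w}x{h})"

    focus = cv2.Laplacian(img, cv2.CV_64F).var()        # full-frame focus proxy
    if focus < focus_thresh:
        return False, f"frame appears out of focus (Laplacian var {focus:.1f})"

    corner = img[:64, :64].astype(np.float64)           # assumed near-uniform patch
    if corner.std() > noise_thresh:                     # crude noise proxy
        return False, f"noise level too high (std {corner.std():.1f})"

    return True, "frame passed the basic quality screen"

ok, message = screen_frame("h3d_frame_0001.png")         # hypothetical file name
print(ok, message)
```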

Fig. 8
figure 8

Extracted frame from holoscopic 3D video data

There are two capturing methods: linear and angular. The linear method of recording target objects is mostly applied when the dataset is image-based or static. It allows viewers mainly to see the depth information of the scene, without being able to experience different views of the scene when they change position or viewing angle.

In the angular method of recording, the aim is to record slightly different viewing angles of the scene, so that display viewers are able to experience motion parallax and depth. This results in the extraction of viewpoint images that are closely packed together, providing higher motion parallax continuity as the viewer moves slightly between viewing positions. Figure 9 illustrates the two capturing methods used in recording 3D data for the autostereoscopic 3D display.

Fig. 9
figure 9

Autostereoscopic capturing approaches. a Angular capturing. b Linear capturing

When the holoscopic 3D camera is moved linearly while recording, only autographic information of the scene is captured, resulting in the visualisation of the same viewpoint at different locations. When angular motion is applied, however, different viewpoints can be seen along with depth information, depending on the viewer's location.

3.2 Selective H3D viewpoint extrapolation

Figure 10 demonstrates the implemented process workflow for viewpoint extraction from raw holoscopic 3D data. The pixel selection for the viewpoint extraction process depends on achieving the best trade-off between perceived visual quality and perceived 3D depth.

Fig. 10
figure 10

Viewpoint extraction workflow for holoscopic 3D data

The difference between the two data types used for viewpoint extrapolation is that holoscopic video data requires two extra steps before viewpoint image extraction, while H3D image data does not. The first step for H3D video data is frame separation and selection, where keyframes are selected whenever there is a significant activity change from the last selected frame. In the second step, the H3D video data goes through a frame-rate smoothing process to ensure there is no skipping from frame to frame. The selection and smoothing stages are introduced into the viewpoint extrapolation workflow to reduce the computational expense of dealing with H3D video data: depending on the video duration and frame rate, one could end up with tens of thousands of H3D image frames, requiring a large amount of computational power to complete the task in near real time. A minimal sketch of the keyframe-selection step is given below. The following step is the extraction of viewpoint images.
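A simple way to realise the keyframe-selection step just described is sketched below; it is our own illustration, assuming an activity measure based on the mean absolute luminance difference between frames, and the threshold value is an assumption rather than the value used in the framework.

```python
import numpy as np

# Illustrative sketch of the keyframe-selection step: a frame is kept only when
# its activity (mean absolute luminance difference) relative to the last kept
# frame exceeds a threshold. The threshold value is an assumption.

def select_keyframes(frames, activity_thresh=6.0):
    """frames: iterable of 2D numpy arrays (luminance); returns indices of kept frames."""
    kept = []
    last = None
    for idx, frame in enumerate(frames):
        f = frame.astype(np.float32)
        if last is None or np.mean(np.abs(f - last)) > activity_thresh:
            kept.append(idx)
            last = f
    return kept

# Example with synthetic data: 100 identical frames with a change at frame 50.
frames = [np.zeros((64, 64), dtype=np.uint8) for _ in range(100)]
for f in frames[50:]:
    f[:] = 128
print(select_keyframes(frames))   # -> [0, 50]
```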

To extract VPIs from the H3DI, the computational reconstruction algorithm generates VPIs independently by superimposing the pixels from all EIs, as shown in Fig. 11.

Fig. 11
figure 11

Basic viewpoint (VP) image extraction from an OH3DI consisting of nine elemental images (EI)

The following viewpoint extraction process can be defined as in Eq. (1):

$$\text{VPI}(k, l) = \sum_{k = 1}^{K - 1} \sum_{l = 1}^{L - 1} \text{EI}_{k,l}\left( x - kS,\; y - lS \right),$$
(1)

where VPI(k, l) is the extracted viewpoint image, EI_{k,l} denotes the elemental image at coordinates (k, l) making up the holoscopic image, and S is the shift step size used to extract all corresponding pixels that make up a particular viewpoint image.
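For concreteness, a minimal NumPy sketch of this extraction (our own illustration of Eq. (1), assuming an omnidirectional H3D image whose elemental images are S × S pixels on a regular, axis-aligned grid) is given below: the viewpoint at offset (k, l) is formed by collecting the pixel at that offset from every elemental image.

```python
import numpy as np

# Minimal sketch of Eq. (1): extract one viewpoint image from a holoscopic image
# by collecting the pixel at offset (k, l) inside every S x S elemental image.
# Assumes the elemental images lie on a regular grid aligned with the array axes.

def extract_viewpoint(h3d, s, k, l):
    """h3d: H x W (x C) array, s: elemental image size, (k, l): pixel offset inside each EI."""
    return h3d[k::s, l::s]          # strided sampling picks one pixel per EI

def extract_all_viewpoints(h3d, s):
    """Return an s x s grid of viewpoint images, one per offset inside the EI."""
    return [[extract_viewpoint(h3d, s, k, l) for l in range(s)] for k in range(s)]

# Example: a synthetic holoscopic image with 90 x 90 elemental images of 10 x 10 pixels
h3d = np.random.randint(0, 256, size=(900, 900), dtype=np.uint8)
vp = extract_viewpoint(h3d, s=10, k=0, l=0)
print(vp.shape)                     # -> (90, 90): one pixel per elemental image
```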

As discussed earlier in the previous sections, the corresponding viewpoint image size is always the same as the number of elemental images, as illustrated in Fig. 11. As for this paper, the viewpoint extraction process presented in Eq. (1) is further extended to accommodate a patch-based interpolation technique that reconstructs high-definition viewpoint images (HDVPI) by superimposing neighbouring viewpoint images with the shift and integrate Eq. (2) [22].

$$\text{HDVPI}_{ij} = \sum_{k = 1}^{N} \sum_{p = 1}^{N} V_{i \pm \Delta(1 \mp k),\, k,\; j \pm \Delta(1 \mp p),\, p},$$
(2)

where HDVPI_{ij} is the reconstructed high-definition viewpoint image at coordinates (i, j), and k, p are the viewpoint indices ranging from 1 to N.
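As a rough illustration of the shift-and-integrate idea behind Eq. (2) (a simplified sketch under our own assumptions, not the exact implementation of [22]), neighbouring low-resolution viewpoints can be placed onto a finer grid according to their relative disparity and accumulated:

```python
import numpy as np

# Simplified sketch of the shift-and-integrate idea behind Eq. (2), under the
# assumption that the disparity between neighbouring viewpoints corresponds to
# exactly one pixel on the high-definition grid. Each of the N x N neighbouring
# low-resolution viewpoints then fills one phase of the upsampled image.

def shift_and_integrate(viewpoints):
    """viewpoints: N x N nested list of equally sized 2D arrays (neighbouring VPs)."""
    n = len(viewpoints)
    h, w = viewpoints[0][0].shape
    hd = np.zeros((h * n, w * n), dtype=np.float64)
    for k in range(n):
        for p in range(n):
            hd[k::n, p::n] = viewpoints[k][p]    # interleave VP (k, p) at its phase
    return hd

# Example: 3 x 3 neighbouring viewpoints of 88 x 58 pixels -> 264 x 174 HD image
vps = [[np.random.rand(88, 58) for _ in range(3)] for _ in range(3)]
print(shift_and_integrate(vps).shape)            # -> (264, 174)
```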

On the other hand, the image quality of the extracted viewpoint images can be improved without introducing noise by using different MLA characteristics. Increasing the number of macrolenses in the MLA automatically increases the default spatial resolution of the extracted image. Accordingly, the patch size, i.e. the number of neighbouring viewpoint images required for interpolation, is reduced, decreasing the amount of noise in the final constructed multiview image.

However, this increase in spatial resolution comes at the cost of the disparity range of multiviewpoint images. Therefore, finding the trade-off between image quality and image perceived depth is important and must be considered in autostereoscopic visualisation.

3.3 Multiviewpoint interlacing

Multiviewpoint interlacing consists of remapping and interlacing the extracted viewpoint images so that they can be visualised on the autostereoscopic display. Figure 12 illustrates the principle behind the viewpoint interlacing technique for autostereoscopic displays.

Fig. 12
figure 12

Viewpoint image interpolation process of a multiview display

The extracted viewpoint images (n) are remapped in such a way that each lenticular strip refracts the pixels of a given viewpoint image towards a specific direction, so that, as the viewer moves, other viewpoint images become visible.
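A deliberately simplified sketch of this remapping is given below; it assumes vertical, unslanted lenticules and a whole pixel column per view, which ignores the subpixel-level, slanted-lens interlacing used by many commercial panels.

```python
import numpy as np

# Simplified interlacing sketch: assumes vertical, unslanted lenticules and a
# whole pixel column per viewpoint. Real multiview panels typically interlace
# at subpixel level along a slanted lenticular sheet, which is not modelled here.

def interlace_columns(viewpoints):
    """viewpoints: list of n equally sized H x W x 3 images -> H x W x 3 interlaced image."""
    n = len(viewpoints)
    h, w, c = viewpoints[0].shape
    out = np.empty((h, w, c), dtype=viewpoints[0].dtype)
    for col in range(w):
        out[:, col, :] = viewpoints[col % n][:, col, :]  # column col shows view (col mod n)
    return out

# Example with 8 viewpoints at the display's native resolution
views = [np.full((1080, 1920, 3), v * 30, dtype=np.uint8) for v in range(8)]
print(interlace_columns(views).shape)   # -> (1080, 1920, 3)
```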

The overall process of the proposed adaptation framework for H3D content is demonstrated in Fig. 12.

In the following sections, we present the evaluation work results and supportive narrative analysing the obtained viewpoint extracted images at different MLA sizes, computational complexity, and interpolation techniques (Fig. 13).

Fig. 13
figure 13

Holoscopic 3D to multiview conversion process

4 Experimental results

This section presents the analysis of the trade-off between holoscopic depth and viewpoint image resolution. Moreover, a brief introduction to the evaluation platform and its metrics is given, in order to facilitate a better understanding of the nature of the errors that occur during the viewpoint extrapolation process.

An experimental study is conducted to quantify how much unwanted artefact is introduced into a viewpoint image. The objective image quality evaluation uses the conventional PSNR metric. The section concludes with the best trade-off specifications.

4.1 Evaluation methodology

The platform used to evaluate different interpolation techniques on the extracted H3D images is shown in Fig. 14. The extracted 2D image is compared across different image interpolation methods. The evaluation is applied to all extracted viewpoints; their number depends on how many viewpoints the multiview display can show at one time. The objective evaluation is applied to the extracted viewpoint images (N − 1), where N is the maximum number of viewpoints of the implemented multiview display.

Fig. 14
figure 14

Holoscopic 3D evaluation platform

The most widely used objective quality metric is the peak signal-to-noise ratio (PSNR), which can emulate the perceived video quality as observed by the human visual system; it has been used widely in image and video processing measurements because it is relatively simple [23]. Furthermore, the PSNR metric is considered one of the most reliable indicators of visual quality variations in image/video processing algorithms in both industry and academia, and a reference benchmark for video quality evaluations in the context of developing video coding standards [24].

The reference images are generated from the POV-Ray software: the viewpoint images are used as reference pictures without any form of image interpolation. The processed images are then upsampled with the nearest neighbour and bicubic techniques, respectively.

The PSNR calculation is based on the ratio between the signal energy and the noise energy. For each frame, each pixel of the luminance (Y) component of the reference frame (signal) is compared with the corresponding pixel of the processed frame (noise). PSNR is expressed on a logarithmic scale as shown in Eq. (3) [23]:

$${\text{PSNR}} = 20\log_{10} \left[ \frac{2^{n} - 1}{\sqrt{\dfrac{1}{X}\sum_{i} \sum_{j} \left( Y_{\text{ref}}(i,j) - Y_{\text{prc}}(i,j) \right)^{2}}} \right],$$
(3)

where \(2^{n} - 1\) is the peak signal value, n is the number of bits per pixel in the luminance component, \(Y_{\text{ref}}(i,j)\) and \(Y_{\text{prc}}(i,j)\) are the pixel values of the reference and processed frames, respectively, and X is the total number of pixels in the frame [23].
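Equation (3) translates directly into a short routine; the sketch below is our own illustration and assumes 8-bit luminance frames stored as NumPy arrays.

```python
import numpy as np

# Direct implementation of Eq. (3) for 8-bit luminance frames (n = 8, peak = 255).

def psnr(y_ref, y_prc, bits=8):
    y_ref = y_ref.astype(np.float64)
    y_prc = y_prc.astype(np.float64)
    mse = np.mean((y_ref - y_prc) ** 2)          # (1/X) * sum of squared differences
    if mse == 0:
        return float("inf")                      # identical frames
    peak = 2 ** bits - 1
    return 20.0 * np.log10(peak / np.sqrt(mse))

# Example: a reference frame against a noisy copy
ref = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
noisy = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR = {psnr(ref, noisy):.2f} dB")
```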

4.2 Evaluation of interpolation-based upsampling techniques

Holoscopic 3D imaging technology has the ability to record the full parallax of any given scene; the work associated with this paper takes advantage of that unique ability to aid the creation of multiple viewpoints for multiview displays, since this serves as a cost-effective mechanism for capturing multiview content. Conventional multiview camera rig set-ups are fragile and prone to acquisition errors such as image misalignment and colour mismatch, which are extremely difficult to fix in post-production; however, the inability of H3D imaging to produce high-resolution viewpoint images poses a serious challenge of its own.

The analysis of the trade-off between holoscopic depth and viewpoint image resolution is presented in this section. A brief introduction to the interpolation-based image upsampling technique is presented. This aims at providing the readers with an appreciation of the nature of errors that occur during the viewpoint extrapolation process.

An experimental study is conducted to quantify how much unwanted artefact is introduced into the generated viewpoints. The image quality is evaluated using the objective PSNR metric. The experiments are conducted using three preselected synthetic images, namely Cone, Dice and Bird (Fig. 15).

Fig. 15
figure 15

Synthetic test images

The nearest neighbour, bilinear and bicubic techniques are the most common upsampling techniques used in today's image processing applications. The nearest neighbour technique upsamples images by predicting the value of each new pixel from its neighbours; this can lead to pixelated images, particularly when the upscaling factor is large. The bilinear technique upscales images by linearly weighting neighbouring pixels, resulting in a more consistent transition during upscaling; however, it tends to wash out feature edges [25]. The bicubic technique, the most accurate of the three, uses a sixteen-coefficient weight matrix to derive the pixel values introduced during upsampling [26]. However, depending on the step size, these techniques can still produce unsatisfactory results. Tables 2, 3 and 4 report the objective quality results obtained with the nearest neighbour and bicubic interpolation techniques at different image resolutions. The overall PSNR levels of the three processed images are shown in Fig. 16.
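For reference, the two upsampling techniques evaluated here can be compared directly using OpenCV's resize interpolation flags; the sketch below is our own illustration with a hypothetical reference viewpoint image and scale factor, where the downsampled input stands in for an extracted H3D viewpoint.

```python
import cv2
import numpy as np

# Illustrative comparison of the two upsampling techniques evaluated in this
# section, using a hypothetical reference viewpoint image and scale factor.
# The low-resolution input stands in for an extracted H3D viewpoint image.

reference = cv2.imread("reference_viewpoint.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
h, w = reference.shape
low_res = cv2.resize(reference, (w // 4, h // 4), interpolation=cv2.INTER_AREA)

for name, flag in (("nearest neighbour", cv2.INTER_NEAREST),
                   ("bicubic", cv2.INTER_CUBIC)):
    upsampled = cv2.resize(low_res, (w, h), interpolation=flag)
    mse = np.mean((reference.astype(np.float64) - upsampled) ** 2)
    score = 20 * np.log10(255 / np.sqrt(mse)) if mse > 0 else float("inf")
    print(f"{name:>18}: {score:.2f} dB")
```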

Table 2 Effects of varying interpolation step sizes on (PSNR) levels for (cone image)
Table 3 Effects of varying interpolation step sizes on (PSNR) levels for (Dice image)
Table 4 Effects of varying interpolation step sizes with PSNR values for (bird image)
Fig. 16
figure 16

Objective quality results using nearest neighbour and bicubic interpolation techniques with different MLA sizes

From the obtained PSNR results, it is clear that bicubic interpolation preserves far more detail than nearest neighbour upscaling. However, both interpolation techniques introduce additional image artefacts. In view of the overall PSNR results shown in Fig. 17, the quality of the extracted viewpoint images increases as the step size is reduced.

Fig. 17
figure 17

Average PSNR of different extracted viewpoints

This is because the starting resolution of the extracted viewpoint image is higher than that of current standard viewpoint images. However, depending on the pitch size of the lenticular sheet used in the autostereoscopic display, this value can change. For the autostereoscopic display used in this work, the disparity is estimated to be 5 pixels, meaning that if all the first pixels of the elemental image array are used to reconstruct the first viewpoint image, the following pixels are skipped according to this disparity and the sixth pixel is used to reconstruct the second viewpoint image, and so on. With this in mind, the disparity range needed for creating content suitable for the autostereoscopic display falls within the range of 30 × 30 pixels to 60 × 60 pixels per elemental image. With fewer pixels to be interpolated, the introduction of unwanted artefacts is reduced, leaving content creators with the option of using the upsampling technique most suitable to their needs and requirements.

4.3 Computational complexity

The computational complexity of the holoscopic content adaptation framework is measured as the processing time required to extract the viewpoint images, since there is a direct correlation between the step size and the undesired image artefacts generated. This section highlights the effect of increasing the extracted viewpoint image resolution on the computational complexity of the framework.

Table 5 reports the processing time for different patch sizes. The experiments were run on a PC with an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz.

Table 5 Execution time for the viewpoint extraction process

The viewpoint images are all extracted from the selected H3D images with an elemental image size of \(90 \times 90\) pixels. As a result, the default viewpoint resolution is 88 × 58 pixels, and a larger patch size is used to upsample the extracted viewpoint image to high-definition quality. Using a larger patch size increases the computational complexity as well as the exposure to noise. However, by increasing the number of macrolenses, the required step size can be reduced from \(21 \times 21\) pixels to 9 × 9 pixels for the best quality and computational complexity results.
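A simple way to reproduce this kind of timing comparison is to wrap the reconstruction call with a wall-clock timer; the sketch below is illustrative only, and reconstruct_hd_viewpoint() is a stand-in for the framework's actual patch-based reconstruction routine rather than its implementation.

```python
import time
import numpy as np

# Illustrative timing sketch only: the patch sizes and viewpoint resolution come
# from this section, but the reconstruction routine is a simplified stand-in.

def reconstruct_hd_viewpoint(base_vp, patch):
    """Stand-in: interleave patch x patch copies of a viewpoint onto an HD grid."""
    h, w = base_vp.shape
    hd = np.zeros((h * patch, w * patch))
    for k in range(patch):
        for p in range(patch):
            hd[k::patch, p::patch] = base_vp
    return hd

base_vp = np.random.rand(88, 58)              # default viewpoint resolution reported above
for patch in (9, 15, 21):
    start = time.perf_counter()
    reconstruct_hd_viewpoint(base_vp, patch)
    elapsed = time.perf_counter() - start
    print(f"patch size {patch}x{patch}: {elapsed * 1000:.1f} ms")
```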

5 Conclusion

The unique feature of the holoscopic 3D imaging technique is that it can acquire more angular information than 2D imaging. However, this advantage comes at the cost of lower spatial resolution.

In this paper, we presented a holoscopic 3D content adaptation framework that is optimised for autostereoscopic displays with low-delay constraints. The proposed adaptation framework is validated with computer-generated holoscopic 3D images. During the design stage, the best trade-off between image resolution and disparity range at the lowest computational complexity is taken into consideration, based on the activity differences between elemental images.

Moreover, the nearest neighbour and bicubic interpolation techniques are selected to improve the spatial resolution of the extracted viewpoints. This pre-processing stage is essential to reduce the generated errors during the viewpoint extraction process. The evaluation work includes testing the performance of the viewpoint extraction process in terms of the perceived spatial resolution, interpolation step size, macrolens size, and the required processing time.

From the obtained results, we conclude that the proposed adaptation framework works efficiently with holoscopic 3D computer-generated content. The spatial resolution of the extracted multiview images increases with the step size of the viewpoint extraction process.

Moreover, the spatial resolution of the extracted viewpoint images increases as the number of pixels in each macrolens decreases.

When employing bicubic interpolation with the adaptation framework, the objective quality gain is 1.56 dB compared with nearest neighbour interpolation. For future work, the adaptation framework will be further optimised and evaluated with real captured holoscopic 3D content using both image and video datasets. It is also recommended to combine a generative adversarial network (GAN) for image super-resolution (SR) with the proposed holoscopic 3D content adaptation framework in order to improve the spatial resolution of the extracted viewpoint images.