1 Introduction

Image data compression technologies, such as the JPEG 2000 international standard [1,2], allow high quality images to be transmitted via worldwide digital communication networks. Digital cinema and 4K images are remarkable examples of such technology [3,4]. These images require a huge number of pixels to express fine textures at high spatial resolutions.

Recently, high dynamic range (HDR) images have attracted considerable attention [5]. These images have a high resolution of pixel values, i.e., numerous pixel tones. Compared with the current standard for low dynamic range (LDR) images, which are expressed in 8 bits, HDR images have a far greater bit depth and a far wider dynamic range of pixel values. To fully utilize this dynamic range within limited memory, the pixel values are expressed as floating-point data, such as in the OpenEXR or RGBE format [6,7]. This paper focuses on the compression of HDR images in these data formats. Moreover, the proposed method, referred to as bit-depth scalable coding, is backward compatible with a standard coding method for LDR images.

Bit-depth scalable coding outputs compressed data in two layers, a base layer and an enhancement layer. From the bit stream in the base layer, the LDR image is decoded with a standard lossy decoder. By adding the bit stream in the enhancement layer, the original HDR image can be decoded without any loss. This scalable coding system has the advantage that it can directly accommodate both HDR and LDR users. Therefore, the system has attracted many researchers, and a number of variations have been reported [8-15].

Ward et al. [8] proposed a backward compatible bit-depth scalable coding method in which the original HDR color image is tone mapped in the base layer to produce an LDR image that is compressed by the JPEG international standard encoder. The enhancement layer then embeds the luminance ratio of the LDR and HDR images. The original HDR color image is decoded by multiplying the luminance ratio in the enhancement layer and the LDR color image in the base layer. This method has been extended to video signals and has attracted attention as a bit-depth scalable video coding method in international standardization activities [9-11,16]. For still images, Khan introduced a piecewise linear model of a tone mapping [12]. Jinno et al. improved the coding efficiency in the enhancement layer by replacing the ratio with a low-pass-filtered HDR image [15]. However, these reports focused on ‘lossy’ coding for HDR images.

Unlike these previous reports, we discuss the ‘lossless’ coding of HDR images under a scalable coding scheme that is compatible with lossy LDR image coding. The lossless coding of HDR images is especially important for storing and archiving original visual data such as medical, artistic, and astrograph images. Such data can be used for diagnosis based on medical images, analysis of astrograph images, art preservation, and bio-medical detections [17,18].

First, we discuss a baseline method [19] that was simply extended to lossless scalable coding from a non-scalable HDR image coding method [20]. Although the baseline method is straightforward and easy to implement, the coding efficiency in the enhancement layer is not satisfactory. To cope with this problem, we introduced a reversible logarithmic mapping and reduced the dynamic range of the HDR images [19,21]. This approach was shown to be effective for compressing data in the enhancement layer. However, the method was limited to the OpenEXR format [6]. Another representative format, referred to as RGBE [7], has been ignored.

In this paper, we improve on our previous conference papers [19,21] and add some theoretical analysis. First, we show that a simple extension of the reversible logarithmic mapping (Rev) to the RGBE format degrades the visual quality of the decoded LDR images. To avoid this problem, we introduce a format conversion (Cnv) to the system. We demonstrate that simply extending Rev magnifies the quantization error added by the lossy coding in the base layer. Second, we analyze the theoretical basis for why our method improves the coding efficiency of the system. We estimate how the bit depth of the residual image to be encoded in the enhancement layer is reduced by Rev. We also explain why the simply extended Rev degrades the LDR images, and why Cnv improves their quality in the RGBE format.

This paper is organized as follows. In Section 2, we describe two floating-point data formats and a non-scalable HDR image coding method. A baseline scalable coding method that simply extends the non-scalable coding approach is then summarized in Section 3, and the concept and implementation of the proposed method are introduced in Section 4. The theoretical analysis is described in Section 5, and our experimental results are summarized in Section 6. Finally, we present our conclusions in Section 7.

2 Data format and non-scalable coding

We first describe two floating-point data formats for HDR images. A non-scalable lossy coding method, which is extended to scalable lossless coding of HDR images in the next section, is also summarized.

2.1 Type A format of HDR images

To date, there are two well-known representative data formats for HDR images. One is the OpenEXR floating-point data format [6] and the other is the RGBE data format [7].

In the OpenEXR data format, a pixel value x H,c of an HDR image is described by an exponent value x E,c , mantissa value x M,c , and sign value x S,c as

$$ \left\{ \begin{array}{l} x_{H,c}=(-1)^{x_{S,c}}\left(1+2^{-10}x_{M,c}\right)2^{-15+x_{E,c}} \\ \text{if} \;\; x_{E,c}\neq 0 \end{array} \right. $$
((1))

and

$$ \left\{ \begin{array}{l} x_{H,c}=(-1)^{x_{S,c}}\left(0+2^{-10}x_{M,c}\right)2^{-14} \\ \text{if} \;\; x_{E,c}=0 \end{array} \right. $$
((2))

for a color component c∈{R,G,B}. The exponent, mantissa, and sign values are given as integers in the ranges

$${} x_{M,c}\in \left[0, 2^{10}-1\right],\; x_{E,c}\in \left[0, 2^{5}-1\right], \; x_{S,c}\in \left[0, 1\right]. $$
((3))

The mantissa, exponent, and sign have depths of 10 bits, 5 bits, and 1 bit, respectively. Therefore, a pixel value of an HDR image is expressed in 10+5+1=16 bits for each color component. Note that the exponent value x E,c =31 is reserved for special cases (infinity and not-a-number) [6].

In the remainder of this paper, we denote the exponent, mantissa, and sign of each color component as a vector

$$ \left\{ \begin{array}{ll} \textbf{x}_{E}&= \left[ x_{E,R},\; x_{E,G},\;x_{E,B} \right]^{T}\\ \textbf{x}_{M} &= \left[x_{M,R},\;x_{M,G},\;x_{M,B}\right]^{T}\\ \textbf{x}_{S}& = \left[ x_{S,R},\; x_{S,G},\; x_{S,B} \right]^{T} \end{array} \right. $$
((4))

and define the HDR image data x D as

$$ \textbf{x}_{D}=\left[ \textbf{x}_{E},\;\textbf{x}_{M},\;\textbf{x}_{S} \right]. $$
((5))

Using these vectors, we denote Equations 1 and 2 as

$$ \textbf{x}_{H}=\text{Flt}_{A}\left(\textbf{x}_{D}\right), $$
((6))

where the pixel value of the HDR image x H is

$$ \textbf{x}_{H}=\left[ x_{H,R},\;x_{H,G},\;x_{H,B} \right]^{T}. $$
((7))

Hereafter, we refer to OpenEXR as the ‘type A’ format.
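As a quick check of Equations 1 and 2, the following Python sketch decodes the three integer fields into a pixel value and compares the result with the half-precision decoder in Python's `struct` module. The names `flt_a` and `half_from_fields`, and the bit packing used for the comparison, are illustrative assumptions. The reserved exponent value 31 is not handled.

```python
import struct

def flt_a(x_s, x_e, x_m):
    """Decode one type A component from its sign, exponent, and mantissa fields."""
    sign = -1.0 if x_s else 1.0
    if x_e != 0:
        # Equation 1: normalized numbers with an implicit leading one.
        return sign * (1 + x_m * 2.0 ** -10) * 2.0 ** (x_e - 15)
    # Equation 2: denormalized numbers with a fixed exponent of -14.
    return sign * (x_m * 2.0 ** -10) * 2.0 ** -14

def half_from_fields(x_s, x_e, x_m):
    """Reference decoder (assumed layout: 1 sign, 5 exponent, 10 mantissa bits)."""
    word = (x_s << 15) | (x_e << 10) | x_m
    return struct.unpack("<e", struct.pack("<H", word))[0]

if __name__ == "__main__":
    print(flt_a(0, 15, 0), half_from_fields(0, 15, 0))
```

Both decoders should agree for every exponent below 31, which is what makes the integer fields a lossless representation of the pixel value.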

2.2 Type B format of HDR images

In the RGBE data format, a pixel value of an HDR image x H,c is given as

$$ x_{H,c}=\left\{ \begin{array}{ll} \frac{x_{M,c}+0.5}{256}2^{x_{E,0}-128} & \text{if } x_{E,0} \neq 0 \\ 0 & \text{if } x_{E,0} = 0 \end{array} \right. $$
((8))

for a color component c∈{R,G,B}. Both the mantissa and exponent have depths of 8 bits, i.e.,

$$ x_{M,c}\in \left[0,2^{8}-1\right], \; x_{E,0}\in \left[0,2^{8}-1\right]. $$
((9))

The exponent x E,0 is commonly used among three color components. In this format, a pixel value is expressed with a total of 32 bits [7]. Using the vectors, we denote Equation 8 as

$$ \textbf{x}_{H}=\text{Flt}_{B}\left(\textbf{x}_{D} \right). $$
((10))

Hereafter, we refer to RGBE as the ‘type B’ format. Note that, for a type B image, x H in Equation 10 is non-negative. In contrast, x H in Equation 6 for a type A image can be negative, zero, or positive.
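Equation 8 can be sketched in Python as follows. The function name `flt_b` and the tuple-based interface are assumptions made for illustration only.

```python
def flt_b(x_m, x_e0):
    """Decode a type B pixel.

    x_m  : (R, G, B) mantissas, each in [0, 255]
    x_e0 : exponent shared by the three color components, in [0, 255]
    """
    if x_e0 == 0:
        # Equation 8, second case: a zero exponent encodes a zero pixel.
        return (0.0, 0.0, 0.0)
    scale = 2.0 ** (x_e0 - 128)
    return tuple((m + 0.5) / 256 * scale for m in x_m)

if __name__ == "__main__":
    print(flt_b((0, 127, 255), 128))
```

Because the exponent is shared, the three components of one pixel can differ only through their 8-bit mantissas, and every decoded value is non-negative.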

2.3 Non-scalable lossy coding

Figure 1 illustrates the ‘HDR image coding in JPEG 2000’ reported in [20]. At the encoder, the HDR image data x D is converted into the pixel value x H by Flt, where Flt denotes Flt A in Equation 6 for type A images and Flt B in Equation 10 for type B images. The natural logarithm \(\log_{e}\) is applied to each color component of x H . Note that pixel values that are less than or equal to zero are first clipped to the minimum positive pixel value in the image. In terms of the signal-to-noise ratio (SNR) of the variances, the effect of this clipping on the LDR images is almost zero. At worst, of the nine test images considered in this paper, the SNR is less than \(10^{-2}\) [%] for a type A ‘still life’ image. The effect on HDR images is also limited, with an SNR of less than \(10^{-10}\) [%] for the same input image.

Figure 1

HDR image coding in JPEG 2000. The logarithmic function is applied and normalized to 8-bit depth before lossy encoding.

The pixel values are normalized to the range [0, 255] by

$$ \text{Nrm}(x)=(x-\text{min}X) \cdot \frac{255}{\text{max}X-\text{min}X} $$
((11))

for \(X=\{x \mid x\in \text{image}\}\), where minX and maxX are the minimum and maximum pixel values in the set X, respectively. Because the input values to the encoder must be integers, the results are rounded to integers. Namely,

$$ \textbf{x}_{B}= \text{Rnd}(\text{Nrm}(\log_{e}(\text{Clp}(\textbf{x}_{H})))) $$
((12))

is fed into the encoder, where Rnd and Clp are the rounding and clipping operations, respectively. In the decoder, the HDR pixel values y H are recovered from the decoded image y B by applying the inverses of Nrm and \(\log_{e}\).
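The chain in Equation 12 and its approximate inverse can be sketched as below for a single color component. The function names, the list-based interface, and the choice to return the normalization bounds alongside the 8-bit codes are assumptions for illustration; the lossy encoder itself is omitted.

```python
import math

def to_base_layer(pixels):
    """Apply Clp, log_e, Nrm (Equation 11), and Rnd to a list of pixel values."""
    smallest_positive = min(p for p in pixels if p > 0)
    clipped = [p if p > 0 else smallest_positive for p in pixels]   # Clp
    logs = [math.log(p) for p in clipped]                           # log_e
    lo, hi = min(logs), max(logs)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    normalized = [(v - lo) * scale for v in logs]                   # Nrm
    codes = [int(math.floor(v + 0.5)) for v in normalized]          # Rnd
    return codes, lo, hi

def from_base_layer(codes, lo, hi):
    """Apply the inverses of Nrm and log_e to 8-bit codes."""
    return [math.exp(lo + c * (hi - lo) / 255.0) for c in codes]

if __name__ == "__main__":
    codes, lo, hi = to_base_layer([0.0, -1.0, 0.01, 1.0, 100.0])
    print(codes, from_base_layer(codes, lo, hi))
```

The recovered values match the originals only at the extremes of the range; everywhere else the rounding step introduces an error, which is why this chain alone is lossy.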

In this paper, we extend this method to the scalable lossless coding of HDR images. The tone mapping operator Tmo described in Section 2.4 is added to this procedure as ‘part A’ to display color LDR images with better quality.

2.4 Tone mapping operation

We now summarize the tone mapping operator for color images based on the Hill function [5]. A pixel value of the HDR image y H,c is tone mapped to y L,c of the LDR image as

$$ y_{L,c}= \text{Rnd}\left(255y_{H,c} \cdot y_{L,Y}/y_{H,Y} \right) $$
((13))

for c∈{R,G,B}, where

$$ \left\{ \begin{array}{rl} y_{H,Y}&=0.27y_{H,R}+0.67y_{H,G}+0.06y_{H,B}\\ y_{L,Y}&=\text{Hill}\left(y_{H,Y}/\bar{Y}_{H,Y}\right), \end{array} \right. $$
((14))

and the Hill function is defined as

$$ \text{Hill}(x)=\frac{x^{a}}{x^{a}+b^{a}}. $$
((15))

In Equation 14, \({\bar Y}_{H,Y}\) is defined as

$$ {\bar Y}_{H,Y}=\exp\left(\text{Ens}\left(\log_{e}(y_{H,Y}) \right) \right), $$
((16))

where Ens(·) denotes the ensemble average over all positive values of y H,Y in the image. The parameters a and b are set by the user; in our experiments, we use (a,b)=(1,1). We denote the tone mapping in Equation 13 as

$$ \textbf{y}_{L}= \text{Rnd}\left(\text{Clp'} \left(\text{Tmo}\left(\textbf{y}_{H}\right) \right) \right), $$
((17))

where

$$ \left\{ \begin{array}{rl} \textbf{y}_{H} &= \left[ y_{H,R},\;y_{H,G},\;y_{H,B} \right]^{T}\\ \textbf{y}_{L} & = \left[ y_{L,R},\;y_{L,G},\;y_{L,B} \right]^{T} \end{array} \right. $$
((18))

for color components. Because the output values of Tmo can exceed the 8-bit range for color images, we clip them to the range [0, 255] with Clp’.
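The tone mapping of Equations 13 to 17 can be sketched in Python as follows. The function names, the list-of-tuples interface, and the handling of pixels with non-positive luminance (mapped to black here) are assumptions for illustration and are not specified in the text.

```python
import math

def hill(x, a=1.0, b=1.0):
    """Hill function of Equation 15."""
    return x ** a / (x ** a + b ** a)

def tone_map(pixels, a=1.0, b=1.0):
    """Map a list of HDR (R, G, B) tuples to clipped 8-bit LDR tuples."""
    lum = [0.27 * r + 0.67 * g + 0.06 * bl for r, g, bl in pixels]   # Equation 14
    positive = [y for y in lum if y > 0]
    # Equation 16: geometric mean of the positive luminance values.
    mean_lum = math.exp(sum(math.log(y) for y in positive) / len(positive))
    out = []
    for rgb, y in zip(pixels, lum):
        if y <= 0:
            out.append((0, 0, 0))   # assumption: undefined ratio maps to black
            continue
        ratio = hill(y / mean_lum, a, b) / y                         # y_L,Y / y_H,Y
        mapped = (255 * c * ratio for c in rgb)                      # Equation 13
        # Clp' to [0, 255], then Rnd (Equation 17).
        out.append(tuple(int(math.floor(min(255.0, max(0.0, v)) + 0.5)) for v in mapped))
    return out

if __name__ == "__main__":
    print(tone_map([(1.0, 1.0, 1.0), (4.0, 4.0, 4.0)]))
    print(tone_map([(10.0, 0.0, 0.0)]))
```

The second call shows why the clipping Clp’ is needed: a saturated color component can map well above 255 even though the luminance itself is mapped into the unit interval.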

3 Baseline method

The baseline scalable lossless coding method is simply an extension of the non-scalable lossy coding method. We now summarize this baseline method, as well as the problem considered in this paper.

3.1 Scalable lossless coding

Figure 2 illustrates the baseline method, which we use as a reference in this paper. This is a simple extension of the non-scalable lossy coding in Figure 1 to the scalable lossless coding of HDR images. ‘Part B’ denotes the processes that have been added.

Figure 2

The baseline method. The HDR image is reconstructed without any loss.

To achieve the lossless coding of HDR images, x H is converted into the integer value x I by the reversible integer mapping Int detailed in Section 3.2. Note that the inverse mapping \(\text{Int}^{-1}\) reconstructs the original value without any loss. The procedure for generating the LDR images is almost the same as the method in Figure 1. The bit stream needed to reconstruct the LDR image is embedded in the base layer. In the enhancement layer, the integer value y I is reconstructed from the decoded LDR image y B with the inverse normalization \(\text{Nrm}^{-1}\), the exponential function exp, and the rounding operation

$$ \mathbf{y}_{I}= \mathrm{Rnd }\left(\exp \left(\text{Nrm}^{-1}(\mathbf{y}_{B}) \right) \right). $$
((19))

Finally, the residual

$$ \textbf{e}_{I}=\textbf{y}_{I} - \textbf{x}_{I} $$
((20))

is encoded with a lossless coding method to generate the bit stream in the enhancement layer.

3.2 Reversible integer mapping

The reversible integer mapping Int from the real value x H to the integer value x I was introduced in [19]. It is defined as

$$ \begin{array}{ll} x_{I,c}= & \left\{ \begin{array}{ll} (-1)^{x_{S,c}} \left(x_{H,c}2^{25-\text{min}X_{E}}-2^{10} \right) & \text{if}\hspace{+4pt} \text{min}X_{E}\neq0 \\ (-1)^{x_{S,c}} x_{H,c}2^{24} & \text{if}\hspace{+4pt} \text{min}X_{E}=0 \end{array} \right. \end{array} $$
((21))

where c∈{R,G,B} and X E ={x E,c |x E,c ∈image} for type A images. This is a simple scaling applied to the rational number x H,c in Equation 1 so that it becomes an integer. In other words, we shift the decimal point to the right. Note that the minimum min X E of all the pixel values x E in the image is stored and embedded into the bit stream. We denote the mapping in Equation 21 as

$$ \textbf{x}_{I}=\text{Int}_{A}(\textbf{x}_{H}) $$
((22))

for

$$ \textbf{x}_{I}= \left[x_{I,R},\;x_{I,G},\;x_{I,B}\right]^{T}. $$
((23))

Similarly, a mapping for type B images can be defined as

$$ x_{I,c} = \left(256 x_{H,c}2^{128-\text{min}X_{E}^{+}}-0.5 \right) 2 +1 $$
((24))

where c∈{R,G,B} and \(X_{E}^{+}=\left \{ x_{E,c} | x_{E,c}>0 \right \}\). Note that the minimum min\(X_{E}^{+}\) of all the positive pixel values x E >0 in the image is stored and embedded into the bit stream. We denote this mapping as

$$ \textbf{x}_{I}=\text{Int}_{B}(\textbf{x}_{H}) $$
((25))

for type B images.

Note that the inverse of this mapping recovers the original value without any loss. Therefore, the baseline method becomes lossless for the original HDR image.
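The reversibility of the type B mapping in Equation 24 can be checked with exact rational arithmetic, as in the sketch below. The function names are hypothetical. The inverse shown here rests on an observation that is not stated in the text: substituting Equation 8 into Equation 24 gives an odd number, 2x M,c +1, shifted left by the exponent offset, so the two fields can be separated by counting trailing zero bits. Treat it as one possible inverse under that assumption.

```python
from fractions import Fraction

def flt_b_exact(x_m, x_e0):
    """Equation 8 for one component, evaluated without rounding error."""
    if x_e0 == 0:
        return Fraction(0)
    return Fraction(2 * x_m + 1, 512) * Fraction(2) ** (x_e0 - 128)

def int_b(x_h, e_min_pos):
    """Equation 24; e_min_pos is the smallest positive exponent in the image."""
    value = (256 * x_h * Fraction(2) ** (128 - e_min_pos) - Fraction(1, 2)) * 2 + 1
    if value.denominator != 1:
        raise ValueError("input is not a valid type B pixel value")
    return int(value)

def int_b_inverse(x_i, e_min_pos):
    """Recover (mantissa, exponent); zero maps back to the zero exponent."""
    if x_i == 0:
        return (0, 0)
    shift = (x_i & -x_i).bit_length() - 1    # number of trailing zero bits
    odd = x_i >> shift                       # equals 2 * mantissa + 1
    return ((odd - 1) // 2, e_min_pos + shift)

if __name__ == "__main__":
    x_i = int_b(flt_b_exact(200, 123), 120)
    print(x_i, int_b_inverse(x_i, 120))
```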

3.3 Problem setting

In this paper, we tackle the following limitation of the baseline method. As a result of the reversible integer mapping, the residual e I in Equation 20 requires a very large bit depth. It is difficult to compress this data volume when the LDR image is coded at a high bit rate. This is because e I is a magnified version of the coding noise \(\textbf{e}_{B}=\textbf{y}_{B}-\textbf{x}_{B}\). In lossy coding, the noise e B is added in the base layer and is magnified by \(\text{Nrm}^{-1}\) and exp as indicated in Equation 19. Because this noise tends to have a weak correlation, the difference e I also has a weak correlation. Therefore, the data volume of the enhancement layer becomes huge. Note that the correlation of e I increases when the LDR image is coded at a low bit rate. This is investigated in Section 6.

To cope with this problem, we previously introduced the reversible logarithmic mapping (Rev) to reduce the bit depth of the residual image [19]. However, in this previous report, we only presented experimental results without any theoretical justification. In this paper, we theoretically compare Int and Rev with respect to the bit depth of the residual image in the enhancement layer.

In addition, Rev has been limited to the type A format, ignoring type B. In this paper, we show that a simple extension of Rev to the type B format degrades the LDR images. To avoid this problem, we introduce a format conversion (Cnv) from type B to type A in the base layer. We present a theoretical justification for why the simply extended Rev degrades the LDR images and Cnv improves its quality for type B images.

4 Proposed method

The reversible logarithmic mapping (Rev) is introduced to reduce the data volume of the enhancement layer. In particular, for type B format images, the format conversion (Cnv) is introduced to maintain the visual quality of the LDR images.

4.1 Type I method for type A format images

Figure 3 illustrates the proposed type I method. Instead of Flt and Int in the baseline method (Figure 2), the reversible logarithmic mapping Rev defined in Section 4.2 is applied to the HDR data x D to produce x R . This is converted to an 8-bit depth integer x B as

$$ \textbf{x}_{B}= \text{Rnd}(\text{Nrm}(\text{Clp}(\textbf{x}_{R}))) $$
((26))

and fed into the lossy encoder, which outputs the bit stream in the base layer.

Figure 3

The proposed type I method. The bit depth of the residual e R is reduced by the reversible logarithmic mapping Rev.

The reconstructed pixel y B given by the decoder is inversely normalized and rounded to an integer as

$$ \mathbf{y}_{R}= \text{Rnd} \left(\text{Nrm}^{-1} \left(\boldsymbol{y_{B}} \right) \right). $$
((27))

Then, the difference

$$ \textbf{e}_{R}=\textbf{x}_{R}-\textbf{y}_{R} $$
((28))

is encoded with the lossless encoder to generate the bit stream in the enhancement layer. In the decoder, y R is added to e R to recover x R . Applying the inverse of Rev, the original HDR data x D are retrieved without any loss. Namely, they are recovered as

$$ \textbf{x}_{D}= \text{Rev}^{-1} (\textbf{e}_{R} +\textbf{y}_{R}). $$
((29))

The LDR image y L is reconstructed from the decoded image y B in a similar way to the baseline method with a compensation factor (Cmp). This recovers the HDR pixel value y H , and then applies the tone mapping operation Tmo as

$$ \begin{array}{rl} \textbf{y}_{L} &= \text{Rnd}\left(\text{Tmo}\left(\text{Flt}\left(\text{Rev}^{-1}\left(\textbf{y}_{R} \right)\right)\right)\right) \\ &= \text{Rnd}\left(\text{Cmp}\left(\textbf{y}_{R}\right)\right). \end{array} $$
((30))

It is also possible to display y B as an LDR image without using Cmp. In this case, y B in the proposed method is almost the same as that of the baseline method, as illustrated in Figure 4. There are two approaches that use the Hill function in Equation 15 to generate the LDR image y L illustrated in Figure 5. The first introduces Cmp in the encoding process, and the second introduces Cmp in the decoding process. The former case is convenient for data receivers, because it is not necessary to add Cmp to a standard decoder. However, this increases the data volume of the enhancement layer. In this paper, we employ the latter approach.

Figure 4

Images decoded in the base layer.

Figure 5

LDR images tone mapped with the Hill function.

4.2 Reversible logarithmic mapping

In the proposed type I method illustrated in Figure 3, the reversible logarithmic mapping is applied to generate the integer value x R . This technique was originally introduced in [22]. The mapping for type A images is defined as

$$ x_{R,c} = (-1)^{x_{S,c}} \left(\left(x_{E,c}-\text{min}X_{E} \right) 2^{10}+x_{M,c} \right) $$
((31))

for c∈{R,G,B}. We denote this mapping as

$$ \textbf{x}_{R}=\text{Rev}_{A}(\textbf{x}_{D}) $$
((32))

for

$$ \textbf{x}_{R} = \left[x_{R,R},\;x_{R,G},\;x_{R,B}\right]^{T}. $$
((33))

This mapping approximates the logarithm of an HDR image x H . Substituting x E,c from Equation 1, i.e.,

$$ x_{E,c} = \log_{2} x_{H,c} -\log_{2} \left(1+2^{-10}x_{M,c}\right) +15, $$
((34))

for positive values in Equation 31, we have

$$ x_{R,c}= \left(\log_{2} x_{H,c} +15 -\text{min }X_{E} -\epsilon_{A} \right) 2^{10} $$
((35))

where

$$ \begin{aligned} \begin{array}{rl} \epsilon_{A} &= \log_{2} \left(2^{-10}x_{M,c} +1\right) -2^{-10}x_{M,c} \\ &= \log_{2} \frac{\delta_{A}+1}{2^{\delta_{A}}} \\ \end{array} \end{aligned} $$
((36))

for \(\delta_{A}=2^{-10}x_{M,c}\). As indicated in Equation 35, Rev A generates a good approximation of the logarithm of the HDR image [23,24]. The approximation error is relatively small, as ε A fluctuates around 0.06 depending on the mantissa. Therefore, Nrm(x R ) becomes close to x B in Equation 12. This is encoded with a standard lossy encoder to generate the bit stream in the base layer.

The reversible logarithmic mapping is suitable for lossless scalable coding because it maps integers to integers one-to-one. Therefore, its inverse mapping reconstructs the original integer values without any loss, i.e., Rev is ‘reversible’. This mapping also reduces the dynamic range of the mapped integer values. We have experimentally confirmed [21] that the residual in the enhancement layer e R has a lower bit depth than that of e I in the baseline method. We provide the theoretical basis for this observation in Section 5.1.
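The mapping in Equation 31, its inverse, and the approximation error in Equation 36 can be sketched in Python as follows. The function names are hypothetical, and the inverse is written here directly from Equation 31 rather than taken from the text. One caveat of this sketch: a zero magnitude carries no sign, so the sign bit of such a value is not recoverable.

```python
import math

def rev_a(x_s, x_e, x_m, e_min):
    """Equation 31: concatenate the offset exponent and the mantissa."""
    magnitude = ((x_e - e_min) << 10) + x_m
    return -magnitude if x_s else magnitude

def rev_a_inverse(x_r, e_min):
    """Split the integer back into (sign, exponent, mantissa)."""
    x_s = 1 if x_r < 0 else 0
    magnitude = abs(x_r)
    return x_s, (magnitude >> 10) + e_min, magnitude & 0x3FF

def eps_a(x_m):
    """Equation 36: gap between the true logarithm and the linear mantissa term."""
    delta = x_m / 1024
    return math.log2(1 + delta) - delta

if __name__ == "__main__":
    errors = [eps_a(m) for m in range(1024)]
    print(max(errors), sum(errors) / len(errors))
```

Running the sketch gives a maximum error of roughly 0.086 and an average close to 0.06 over all mantissa values, which is small next to the exponent range that x R spans.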

4.3 Type I method for type B format images

For type A images, Rev A in Equation 32 is applied in Figure 3. For type B images, a direct extension of Rev A can be defined as

$$ \begin{array}{ll} x_{R,c} = \left \{ \begin{array}{ll} x_{E,0}^{*} 2^{8} + x_{M,c}+1 & \text{if}\hspace{+4pt} x_{E,0}\neq0 \\ 0 & \text{if}\hspace{+4pt} x_{E,0}=0 \end{array} \right. \end{array} $$
((37))

for c∈{R,G,B} and

$$ x_{E,0}^{*}=x_{E,0} - \text{min}X_{E}^{+} +1. $$
((38))

We denote this mapping as

$$ \textbf{x}_{R}=\text{Rev}_{B}\left(\textbf{x}_{D}\right), $$
((39))

and apply this to type B images in the type I method.
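Equations 37 and 38 can be sketched in Python as below. The function names and the tuple interface are hypothetical, and the inverse is derived here from Equation 37 rather than quoted from the text.

```python
def rev_b(x_m, x_e0, e_min_pos):
    """Equations 37 and 38 for one pixel.

    x_m       : (R, G, B) mantissas
    x_e0      : shared exponent
    e_min_pos : smallest positive exponent in the image
    """
    if x_e0 == 0:
        return (0, 0, 0)
    e_star = x_e0 - e_min_pos + 1                    # Equation 38, always >= 1
    return tuple(e_star * 256 + m + 1 for m in x_m)  # Equation 37

def rev_b_inverse(x_r, e_min_pos):
    """Recover the mantissas and the shared exponent from the mapped integers."""
    if all(v == 0 for v in x_r):
        return (0, 0, 0), 0
    e_star = (x_r[0] - 1) >> 8
    return tuple((v - 1) & 0xFF for v in x_r), e_star + e_min_pos - 1

if __name__ == "__main__":
    mapped = rev_b((0, 128, 255), 130, 125)
    print(mapped, rev_b_inverse(mapped, 125))
```

Since the offset exponent is at least one, every non-zero pixel maps to a value of 257 or more, so the zero code stays unambiguous.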

Note that the depth of x R from Rev B is a maximum of 16 bits for each color component. Therefore, it costs 48 bits for all the color components, which exceeds the original 32-bit data. However, using the reversible color transform (RCT) in JPEG 2000 lossless coding reduces the cost by 16 bits. The RCT is defined as

$$ \left \{ \begin{array}{ll} x_{1}=\lfloor \left(x_{R}+2x_{G}+x_{B} \right)/4 \rfloor \\ x_{2}=x_{B}-x_{G} \\ x_{3}=x_{R}-x_{G} \end{array} \right. $$
((40))

Because the second and third row of this RCT take the difference between the color components, the exponent term \(x_{E,0}^{*}\) in Equation 37 disappears. As a result, the bit depth becomes 48−16=32 bits in total. Furthermore, the exponent term is less than 5 bits in the type B images tested in our experiment. Therefore, in practice, the system can compress the data volume.
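The cancellation of the exponent term can be seen in a small Python sketch of the RCT in Equation 40. The function names are illustrative, the inverse is the usual integer inverse of this transform, and the sample pixel is made up: three values of the form in Equation 37 with an offset exponent of 6.

```python
def rct_forward(r, g, b):
    """Equation 40; >> 2 is a floor division by 4 for Python integers."""
    return (r + 2 * g + b) >> 2, b - g, r - g

def rct_inverse(x1, x2, x3):
    """Integer inverse: recover G first, then R and B from the differences."""
    g = x1 - ((x2 + x3) >> 2)
    return x3 + g, g, x2 + g

if __name__ == "__main__":
    # Hypothetical pixel: offset exponent 6, mantissas 0, 128, and 255.
    pixel = (6 * 256 + 0 + 1, 6 * 256 + 128 + 1, 6 * 256 + 255 + 1)
    transformed = rct_forward(*pixel)
    print(transformed, rct_inverse(*transformed))
```

The two difference components depend only on the mantissas and therefore stay within [-255, 255], regardless of the shared exponent.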

In this paper, we show that the quality of LDR images is degraded in this directly extended Rev B for type B images. Figure 6 shows some LDR images produced by the proposed type I and type II methods. The former has lower quality than the latter, with a peak SNR (PSNR) of 20.79 dB compared with 29.08 dB at the same bit rate of 5.23 bppc in the base layer. The reason for this is analyzed in Section 5.2. To solve this problem, we introduce the following format conversion.

Figure 6

LDR images given by the proposed type I and type II methods. The LDR image from the type I method is degraded in type B format.

4.4 Type II method for type B format images

Figure 7 illustrates the modified method for type B images. We refer to this as the type II method. The type II approach includes the format conversion (Cnv). First, Rev B converts the HDR data x D into type B x R . In the figure, this is denoted as \(\textbf {x}_{R}^{(B)}\) to clearly indicate the type. The conversion introduced in this paper is defined as

$$ \left \{ \begin{array}{ll} \text{Cnv}(\textbf{x}) = \text{Rev}_{A}\left(\text{Flt}_{A}^{-1}\left(\text{Flt}_{B}\left(\text{Rev}_{B}^{-1}(\textbf{x})\right)\right)\right) \\ \textbf{x} = \text{Clp}\left(\textbf{x}_{R}^{(B)}\right) \end{array} \right. $$
((41))

as illustrated in Figure 8. In the proposed type II method,

$$ \textbf{x}_{B}=\text{Rnd}\left(\text{Nrm}\left(\text{Cnv}\left(\text{Clp}\left(\textbf{x}_{R}^{(B)} \right)\right)\right)\right) $$
((42))

is encoded with the lossy encoder.

Figure 7

The proposed type II method. Cnv converts type B to type A in the base layer.

Figure 8

The format conversion Cnv in the proposed type II method.

As a result of this conversion, the type B image is temporarily converted into a type A image in the base layer. Therefore, the problem caused by Rev B can be avoided, and the quality of the LDR image is improved compared with that given by applying the type I method to type B images. This assertion is theoretically justified in Section 5.2. This conversion is reversible as long as a large enough bit depth is assigned to the intermediate values. However, reversibility is not always necessary: the system remains lossless for HDR images as long as y R is exactly the same in the encoder and the decoder, even if Cnv is not perfectly reversible.

5 Theoretical analysis

We now present a theoretical analysis of why the proposed method reduces the bit depth of the enhancement layer. The rationale for introducing the format conversion is also explained.

5.1 Bit depth of the enhancement layer

We estimate the bit depth of the residual e I in the baseline method and e R in the proposed method and theoretically demonstrate that the bit depth of the proposed method is smaller than that of the baseline method.

The bit depth of pixel values in an image x is defined as

$$ B_{dp}\left(\textbf{x}\right) = \log_{2} (\text{max}X-\text{min}X+1), $$
((43))

where maxX and minX denote the maximum and minimum pixel values in the image. We must calculate the maximum of e I in the baseline method to estimate its bit depth. In Figure 2, the relations

$$ \left\{ \begin{array}{l} \textbf{x}_{B}=\text{Rnd}\left(\text{Nrm}\left(\log_{e}\text{Clp}\left(\mathbf{x}_{I}\right)\right)\right) \\ \textbf{y}_{I} =\text{Rnd}\left(\exp(\text{Nrm}^{-1}(\textbf{y}_{B}))\right) \end{array} \right. $$
((44))

are modeled as

$$ \left\{ \begin{array}{l} \textbf{x}_{B}=\text{Nrm}(\log_{e}\textbf{x}_{I}) +e_{1} \\ \textbf{y}_{I} =\exp(\text{Nrm}^{-1}(\textbf{y}_{B})) +e_{2} \end{array} \right. $$
((45))

for positive values of x I , where e 1,e 2∈[ −0.5,0.5] are rounding errors due to Rnd in Equation 44. Therefore, the maximum of

$$ \begin{aligned} \textbf{e}_{I} &=\textbf{y}_{I}-\textbf{x}_{I}\\ &=\exp \left(\text{Nrm}^{-1} \left(\textbf{y}_{B}\right)\right) -\textbf{x}_{I} +e_{2}\\ &=\exp \left(\text{Nrm}^{-1}\left(\textbf{x}_{B}+\textbf{e}_{B}\right)\right) -\textbf{x}_{I} +e_{2}\\ &=\exp \left(\text{Nrm}^{-1}\left(\text{Nrm}\left(\log_{e}\textbf{x}_{I}\right) \right.\right.\\ &\quad+\left.\left.\textbf{e}_{B} +e_{1}\right)\right) -\textbf{x}_{I} +e_{2} \end{aligned} $$
((46))

is estimated as

$$ \text{max}E_{I} = \text{max}E_{B} \cdot \text{max}X_{I} \cdot C/255 $$
((47))

for

$$ C= \log_{e}(\text{max}X_{I}) - \log_{e}\left(\text{min}X_{I}^{+}\right), $$
((48))

as detailed in Appendix A. Substituting

$$ \text{min}E_{I} = -\text{max}E_{I} $$
((49))

and Equation 47 into Equation 43, the bit depth of e I is estimated as

$$ \begin{array}{ll} B_{dp}(\textbf{e}_{I}) &=\log_{2}(2 \cdot \text{max}E_{I}+1) \\ &\approx \log_{2}(\text{max}E_{B} \cdot \text{max}X_{I} \cdot C/255)+1, \end{array} $$
((50))

giving the bit depth of the residual of the baseline method. Similarly, using the model

$$ \left\{ \begin{array}{rl} \textbf{x}_{B} & =\text{Nrm'}(\textbf{x}_{R}) +e'_{1} \\ \textbf{y}_{R} & =\text{Nrm'}^{-1}(\textbf{y}_{B}) +e'_{2} \end{array} \right. $$
((51))

in the proposed method, the maximum of

$$ \begin{aligned} \textbf{e}_{R} & =\textbf{y}_{R} -\textbf{x}_{R}\\ &=\text{Nrm'}^{-1} \left(\textbf{x}_{B}+\textbf{e}_{B}\right) -\textbf{x}_{R} +e_{2}' \\ &=\text{Nrm'}^{-1} \left(\text{Nrm'}(\textbf{x}_{R}) +\textbf{e}_{B} +e'_{1}\right) \\ &\quad - \textbf{x}_{R} +e_{2}' \\ \end{aligned} $$
((52))

is given as

$$ \text{max}E_{R}=\text{max}E_{B} \cdot \text{max}X_{R} /255, $$
((53))

as shown in Appendix . Substituting

$$ \text{min}E_{R}=-\text{max}E_{R} $$
((54))

and Equation 53 into Equation 43, the bit depth of e R can be estimated as

$$ \begin{array}{ll} B_{dp}\left(\textbf{e}_{R}\right) &= \log_{2}(2 \cdot \text{max}E_{R}+1) \\ &\approx \log_{2}\left(\text{max}E_{B} \cdot \text{max}X_{R}/255\right)+1, \end{array} $$
((55))

giving the bit depth of the residual of the proposed method.

We can now compare e I and e R in terms of bit depth. The error in the base layer e B is composed of errors due to the rounding before applying the lossy encoder, as well as quantization errors added by the lossy coding. Therefore, the maximum and minimum of

$$ \textbf{e}_{B} = \textbf{y}_{B} - \textbf{x}_{B} $$
((56))

are

$$ \text{max}E_{B}=-\text{min}E_{B}=Q, $$
((57))

where Q is determined by the quantization step size of the lossy coding in the base layer. Taking the difference between Equation 50 and Equation 55, we have

$$ \begin{aligned} \Delta B_{dp} &=B_{dp}\left(\textbf{e}_{I}\right)-B_{dp}\left(\textbf{e}_{R}\right)\\ &=\log_{2}\left(\text{max}E_{B} \cdot \text{max}X_{I} /255 \cdot C\right) \\ &\quad -\log_{2}\left(\text{max}E_{B} \cdot \text{max}X_{R} /255 \right), \end{aligned} $$
((58))

and therefore

$$ \Delta B_{dp} =\log_{2} \frac{\text{max}X_{I} \cdot C}{\text{max}X_{R}} $$
((59))

is the difference in bit depth. From Equations 1, 21, and 31, the maxima of x I and x R are expressed as

$$ \left\{ \begin{array}{ll} \text{max}X_{I} = \left(2^{C^{\ast}} -1\right) 2^{10} \approx 2^{C^{\ast}} \cdot 2^{10} \\ \text{max}X_{R} = C^{\ast} \cdot 2^{10} \\ \end{array} \right. $$
((60))

for

$$ C^{\ast} =\text{max}X_{E}-\text{min}X_{E} +\gamma, $$
((61))

where \(\gamma \in [0,1)\) is determined by the mantissa, which lies in \([0,2^{10})\). Substituting Equation 60 into Equation 59, we have the difference as

$$ \Delta B_{dp} =\log_{2} \frac{2^{C^{\ast}} \cdot C}{C^{\ast}} >0, $$
((62))

which is always a positive value. This indicates that the bit depth of the proposed method is smaller than that of the baseline method. Thus, we have theoretically shown that the proposed method achieves bit-depth reduction in the enhancement layer.
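To get a feel for the size of the reduction, Equations 48 and 62 can be evaluated numerically. The sketch below uses made-up values: an exponent span of 10, a maximum integer value of 2^20 (the approximation in Equation 60), and a smallest positive integer value of 1. The function name is hypothetical.

```python
import math

def delta_bit_depth(c_star, max_x_i, min_pos_x_i):
    """Bit-depth difference between the baseline and proposed residuals."""
    c = math.log(max_x_i) - math.log(min_pos_x_i)   # Equation 48
    return math.log2(2.0 ** c_star * c / c_star)    # Equation 62

if __name__ == "__main__":
    # Hypothetical image statistics, chosen only for illustration.
    print(delta_bit_depth(10, 2 ** 20, 1))
```

Under these assumed statistics the difference is about 10.5 bits, so it is dominated by the exponent span, with the ratio between C and the exponent span contributing less than one bit.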

5.2 Difference between type I and type II for type B format

Next, we show that the format conversion introduced in Section 4.4 alleviates the degradation of LDR images in the base layer. The output LDR image y L is tone mapped from the decoded HDR image y H , which is generated from y R in the proposed method. Therefore, we analyze the relation between y H and y R for the type I and type II methods.

As illustrated in Figure 3, the proposed type I method produces y H as

$$ \textbf{y}_{H}=\text{Flt}_{B}\left(\text{Rev}^{-1}_{B}\left(\textbf{y}_{R}\right) \right) $$
((63))

for type B images. For example, the exponent of the type B image data becomes

$$ x_{E,0}^{*} = \left(x_{R,c}-x_{M,c} -1 \right) /256 $$
((64))

from the inverse of Equation 37. Substituting this equation into Equation 8, we have

$$ x_{H,c} = f_{Ia}(x_{R,c}) \cdot f_{Ib}(x_{M,c}) \cdot 2^{\text{min}X_{E}^{+} -129}, $$
((65))

where

$$\left\{ \begin{array}{rl} f_{Ia}(x_{R,c})& = ~ \exp(x_{R,c} \cdot 2^{-8}\log_{e}2), \\ \\ f_{Ib}(x_{M,c})& = ~ \frac{\delta_{B}+2^{-9}}{2^{\delta_{B}+2^{-8}}} \in [\!0.002, 0.499], \\ \\ \delta_{B}& = ~ \frac{x_{M,c}}{256} \in [\!0, 1). \end{array} \right. $$

This is the relation between x H and x R . It includes a function f Ia that grows exponentially with x R . However, note that this exponential is multiplied by the function f Ib , which depends on the mantissa and breaks the mapping into discontinuous pieces. This is confirmed by Figure 9, which shows the relation between x B and x H for the type B ‘tree’ image. Note that x B is a scaled version of x R . The points ‘o’ indicate where the mapping given by the type I method becomes discontinuous.

Figure 9

Mapping in the systems for the type B image ‘tree’.

In contrast, the proposed type II method in Figure 7 produces y H as

$$ \begin{array}{ll} \textbf{y}_{H} & =\text{Flt}_{B} \left(\text{Rev}_{B}^{-1} \left(\text{Rnd}\left(\text{Cnv}^{-1} \left(\textbf{y}_{R}^{(A)}\right)\right)\right)\right) \\ & \approx \text{Flt}_{A} \left(\text{Rev}^{-1}_{A}\left(\textbf{y}_{R}^{(A)} \right) \right) \end{array} $$
((66))

neglecting the effect of Rnd. This means that the image is converted to type A in the base layer. Therefore, taking the inverse of Equation 35, we have

$$ x_{H,c} = f_{IIa}(x_{R,c}) \cdot f_{IIb}(x_{M,c}) \cdot 2^{\text{min}X_{E} -15}, $$
((67))

where

$$\left\{ \begin{array}{rl} f_{IIa}(x_{R,c}) & = ~ \exp(x_{R,c} \cdot 2^{-10} \log_{e}2), \\ f_{IIb}(x_{M,c}) & = ~ \frac{\delta_{A}+1}{2^{\delta_{A}}} \in [\!1, 1.06), \\ \delta_{A} & = ~ \frac{x_{M,c}}{1024} \in [\!0, 1). \end{array} \right. $$

Similar to the type I method, the function f IIa is an exponential function of x R . Note that the function f IIb is close to one. Therefore, unlike the type I method, the type II method gives an HDR image x H that is approximately an exponential function of x R . This is confirmed by the points marked ‘x’ in Figure 9.
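The claim that f IIb stays close to one can be checked numerically. The short Python sketch below evaluates f IIb from Equation 67 over all 10-bit mantissa values and reports the observed extremes; the function and variable names are illustrative only.

```python
def f_IIb(x_M):
    # Mantissa-dependent factor of Equation 67 with delta_A = x_M / 1024
    delta_A = x_M / 1024.0
    return (delta_A + 1.0) / 2.0 ** delta_A

# Evaluate over all 10-bit mantissa values
values = [f_IIb(m) for m in range(1024)]
f_IIb_min, f_IIb_max = min(values), max(values)
print("f_IIb range:", f_IIb_min, f_IIb_max)
```

The minimum is attained at x M = 0 and the maximum is roughly 1.06, so this factor perturbs the exponential f IIa by only a few percent.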

Next, we investigate how the mappings in Equations 65 and 67 magnify the quantization error e B . Denoting the mapping as x H =f(x B ), the error is magnified as

$$ {}\textbf{y}_{H}-\textbf{x}_{H} = f(\textbf{x}_{B}+\textbf{e}_{B})-f(\textbf{x}_{B}) $$
((68))
$$ {} \approx \frac{\partial f(\textbf{x}_{B})}{\partial \textbf{x}_{B}} \cdot \textbf{e}_{B}. $$
((69))

Figure 10 illustrates the absolute value of

$$ \frac{\partial f(\textbf{x}_{B})} {\partial \textbf{x}_{B}} \approx \frac{\textbf{y}_{H}-\textbf{x}_{H}}{\textbf{y}_{B}-\textbf{x}_{B}} = \frac{\Delta \textbf{x}_{H}}{\Delta \textbf{x}_{B}} $$
((70))
for the type I method (marked ‘o’) and the type II method (marked ‘x’). In the figure, a larger value signifies greater amplification of the error. We can see that the type I method has larger values, especially at the jump points of Figure 10, which come from the discontinuous points of Figure 9. This implies that the degradation in LDR image quality produced by the type I method is alleviated by the type II method, which uses the format conversion in Section 4.4.

Figure 10: Magnification of coding errors in the systems for the type B image ‘tree’.
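The first-order error model of Equations 68 to 70 can be illustrated with a small Python sketch. The mapping f used here is a hypothetical smooth exponential chosen only for illustration; it is not the actual mapping of either method, and all names are assumptions.

```python
def f(x_B):
    # Hypothetical smooth mapping of exponential form (illustration only)
    return 2.0 ** (x_B / 32.0)

def magnification(func, x_B, e_B):
    # Finite-difference slope (Delta x_H / Delta x_B), as in Equation 70
    return (func(x_B + e_B) - func(x_B)) / e_B

def predicted_error(func, x_B, e_B, h=1e-6):
    # First-order prediction of Equation 69: derivative times e_B,
    # with the derivative estimated by a central difference
    slope = (func(x_B + h) - func(x_B - h)) / (2.0 * h)
    return slope * e_B

x_B, e_B = 200.0, 1.0
actual = f(x_B + e_B) - f(x_B)          # Equation 68
approx = predicted_error(f, x_B, e_B)   # Equation 69
rel_gap = abs(actual - approx) / abs(actual)
print("actual:", actual, "first-order:", approx, "relative gap:", rel_gap)

# For an exponential mapping the slope grows with x_B
print(magnification(f, 100.0, e_B), magnification(f, 250.0, e_B))
```

For a smooth mapping the two values agree closely. At a discontinuity such as those of the type I mapping, the finite-difference slope becomes very large, which corresponds to the jump points in Figure 10.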

6 Experiments

We now describe a series of experiments that tested nine HDR images, including five type A images and four type B images. For the lossy coding in the base layer and the lossless coding in the enhancement layer, we used the JPEG 2000 international standard [1] in lossy mode and lossless mode, respectively.

6.1 Base layer

We compared the coding performance in the base layer of the baseline method and the proposed method. In this section, the proposed method denotes the type I procedure in Figure 3 for type A images and the type II procedure in Figure 7 for type B images. Figure 11 compares the baseline and proposed methods for the type A ‘cannon’ image. The horizontal axis records the data volume of the base layer in bits per pixel per color component (bppc). The vertical axis indicates the LDR image quality in terms of PSNR, which is defined by

$$ \text{PSNR}=10\log_{10} \frac{255^{2}}{\text{Ens} \left((\textbf{y}_{L}-\textbf{x}_{L})^{2}\right)} \hspace{+8pt}[\!dB] $$
((71))
for

$$ \textbf{x}_{L} =\text{Rnd}\left(\text{Clp'}\left(\text{Tmo}(\textbf{x}_{H})\right) \right), $$
((72))

where Ens(·) denotes the ensemble average over all pixels in the image. For this image, the proposed method is slightly worse (by 0.46 dB at 3.1 bpp) than the baseline approach. Figure 12 shows the rate-distortion curves for the type B ‘Belgium’ image; the results are very similar to those for ‘cannon’. The ‘tree’ image was investigated in both formats: Figures 13 and 14 show the curves for this image in type A and type B formats, respectively. Figure 15 summarizes the PSNR at 1.5 bppc in the base layer and indicates that, at this rate, the proposed method is slightly better than the baseline technique. It can be concluded that the proposed method is comparable to or slightly better than the baseline method. This is attributed to the similarity of x B in the baseline method and the proposed method: in both, x B represents the logarithm of the original HDR image x H .

Figure 11: Coding performance in the base layer for type A image ‘cannon’.
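The PSNR of Equation 71 can be computed as in the following minimal Python sketch. It treats the images as flat sequences of 8-bit pixel values; the function name `psnr` and the `peak` parameter are illustrative choices, not part of the paper.

```python
import math

def psnr(y_L, x_L, peak=255.0):
    # y_L: decoded LDR pixels, x_L: reference LDR pixels (flat sequences)
    if len(y_L) != len(x_L) or len(y_L) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Ens((y_L - x_L)^2): average squared error over all pixels
    mse = sum((y - x) ** 2 for y, x in zip(y_L, x_L)) / len(y_L)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Example: every pixel is off by one tone
print(psnr([0, 0, 0, 0], [1, 1, 1, 1]))
```

A multi-component image can be handled by flattening all color components into one sequence before calling the function.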

Figure 12: Coding performance in the base layer for type B image ‘Belgium’.

Figure 13: Coding performance in the base layer for type A image ‘tree’.

Figure 14: Coding performance in the base layer for type B image ‘tree’.

Figure 15: Image quality of LDR images for various images at 1.5 bppc enhancement layer bit stream.

6.2 Enhancement layer

Figure 16 compares the output from the proposed and baseline methods for the type A ‘cannon’ image. The horizontal axis indicates the PSNR of the reconstructed LDR images, and the vertical axis indicates the bit rate of the bit stream in the enhancement layer. Note that the PSNR of the decoded HDR images is infinite because they are reconstructed without loss. This figure indicates that the proposed method reduces the bit rate by more than 3.4 bppc for this image. As indicated in the figure, the bit rate in the enhancement layer decreases as the PSNR increases. However, the bit rate in the base layer increases with PSNR, which means that there is a trade-off between the bit rates of the two layers.

Figure 16: Bit rate of the enhancement layer for type A image ‘cannon’.

Figure 17 shows the results for the type B ‘Belgium’ image. We can observe that the bit rate decreases by 8.03 bppc at 35 dB LDR image quality. Unlike the case in Figure 16, the bit rate increases with the PSNR. This is because the correlation among neighboring pixels in e I increases in low PSNR (low bit rate) coding of the LDR image as indicated in Figure 18. For this input image, the correlation is observed to be 0.14 at a PSNR of 36.9 dB. The correlation monotonically increases as PSNR decreases, reaching 0.80 at 18.8 dB. Because e I is encoded with a transform that uses this correlation, a higher correlation serves to lower the bit rate. This is why the curve of the baseline method in the figure increases monotonically. The bit depth of the enhancement layer decreases monotonically from 26.7 bits at a PSNR of 18.8 dB to 24.1 bits at 36.9 dB as indicated in Figure 19. Furthermore, the logarithm of the variance of e I is also monotonically decreasing as indicated in Figure 20.
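The paper does not state how the correlation among neighboring pixels was measured. One plausible choice, sketched below in Python purely as an assumption, is the Pearson correlation between each pixel of the difference image and its right-hand neighbor. The function name and the row-list image layout are illustrative.

```python
import math

def lag1_correlation(rows):
    # rows: difference image as a list of rows, each a list of numbers.
    # Pairs every pixel with its right-hand neighbor (assumed definition).
    left = [r[i] for r in rows for i in range(len(r) - 1)]
    right = [r[i + 1] for r in rows for i in range(len(r) - 1)]
    n = len(left)
    if n == 0:
        raise ValueError("need at least two pixels per row")
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((a - mean_l) * (b - mean_r) for a, b in zip(left, right))
    var_l = sum((a - mean_l) ** 2 for a in left)
    var_r = sum((b - mean_r) ** 2 for b in right)
    if var_l == 0 or var_r == 0:
        return 0.0  # constant signal: correlation undefined, report zero
    return cov / math.sqrt(var_l * var_r)

# A smooth ramp is highly correlated; an alternating pattern is not
print(lag1_correlation([[0, 1, 2, 3, 4]]))
print(lag1_correlation([[1, -1, 1, -1, 1]]))
```

A smooth residual yields a value near one, which a decorrelating transform can exploit, whereas a noise-like residual yields a value near zero or below.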

Figure 17: Bit rate of the enhancement layer for type B image ‘Belgium’.

Figure 18: Correlation of the difference for type B image ‘Belgium’.

Figure 19: Bit depth of the difference for type B image ‘Belgium’.

Figure 20: Log of variance of the difference for type B image ‘Belgium’.

The ‘tree’ image was again investigated in different formats. Figures 21 and 22 show the bit rate for the type A and type B image formats, respectively. We can see that better PSNR in the LDR images brings about a lower bit rate in the enhancement layer. This suggests that a higher data volume in the base layer will lead to a lower volume in the enhancement layer. Figure 23 summarizes the bit rate at 35 dB LDR image quality. This figure indicates that the proposed method reduces the data volume of type A images by a minimum of 3.82 bppc (for the ‘cannon’ image) and by a maximum of 8.82 bppc (for ‘still life’). For type B images, the data volume is reduced by a minimum of 7.8 bppc for ‘desk’. It was confirmed that the proposed method significantly reduces the data volume of the enhancement layer for both type A and type B format images.

Figure 21: Bit rate of the enhancement layer for type A image ‘tree’.

Figure 22: Bit rate of the enhancement layer for type B image ‘tree’.

Figure 23: Bit rate of the enhancement layer for various images at 35 dB LDR image quality.

7 Conclusions

In this paper, we have presented a bit-depth scalable lossless coding for HDR images in floating-point data formats. Unlike most conventional scalable coding methods, the proposed method reconstructs the original HDR image without any loss. Introducing a reversible logarithmic mapping and format conversion technique, it was confirmed that the proposed method reduces the bit depth as well as the bit rate in the enhancement layer. It was also confirmed that the proposed method maintains the LDR image quality and coding performance of the baseline method in the base layer for both the OpenEXR and RGBE formats.

Our investigation has been limited to a difference-based approach. Extending it to ratio-based approaches, such as [8], remains as future work.

8 Appendix A

Substituting

$$\left\{ \begin{array}{ll} \text{Nrm}^{-1}\left(\textbf{x}_{B}\right) = \textbf{x}_{B} \cdot C/255 + C_{1} \\ C = C_{2} - C_{1} \\ C_{1} =\log_{e}\left(\text{min}X_{I}^{+}\right), \; C_{2}=\log_{e}\left(\text{max}X_{I}\right) \end{array} \right. $$

into

$$\begin{aligned} \textbf{e}_{I} &= \exp\left(\text{Nrm}^{-1}\left(\text{Nrm}\left(\log_{e}\textbf{x}_{I}\right) \right.\right.\\ &\quad+\left.\left. \textbf{e}_{B} + e_{1}\right)\right) - \textbf{x}_{I} +e_{2}, \end{aligned} $$

we have

$$\begin{array}{rl} \textbf{e}_{I} = \exp(x +\epsilon)-\textbf{x}_{I}+e_{2} \end{array} $$

where

$$\left\{ \begin{aligned} x &= \log_{e}\textbf{x}_{I}, \\ \epsilon &= \left(\textbf{e}_{B}+e_{1}\right) \cdot C/255. \end{aligned} \right. $$

When x takes its maximum value, ε ≪ x holds. For example, the value of ε/x for all images tested in this paper is less than 10⁻⁴. In this case,

$$\begin{array}{rl} \textbf{e}_{I} & = ~ \exp(x+\epsilon)-\textbf{x}_{I} +e_{2} \vspace{+2pt}\\ & \approx ~ \frac{\partial \exp (x)}{\partial x}\epsilon +\exp (x)-\textbf{x}_{I} +e_{2} \end{array} $$

holds. Therefore, we have

$$\begin{aligned} \textbf{e}_{I} &= \exp (x) \epsilon +\exp (x)-\textbf{x}_{I} +e_{2}\\ &= \exp \left(\log_{e}\textbf{x}_{I}\right) \left(\textbf{e}_{B}+e_{1}\right) \cdot C/255 \\ &\quad+ \exp \left(\log_{e}\textbf{x}_{I}\right)-\textbf{x}_{I} +e_{2} \\ &= \textbf{x}_{I} \left(\textbf{e}_{B}+e_{1}\right) \cdot C/255 + \textbf{x}_{I} -\textbf{x}_{I} +e_{2} \\ &= \textbf{x}_{I} \left(\textbf{e}_{B}+e_{1}\right) \cdot C/255 +e_{2} \\ \end{aligned} $$

namely,

$$\text{max}E_{I} = \text{max}X_{I} \left(\text{max}E_{B} +e_{1}\right) \cdot C/255 +e_{2}. $$

According to our experiments, the smallest value of max E I among the tested images is 7.23×10³ (for ‘cannon’). Therefore, e 2 is negligible compared to max E I , since the magnitude of e 2 is at most 0.5. Similarly, max E B takes a value between 3 and approximately 27, depending on the bit rate of the base layer. Therefore, e 1 is negligible compared to max E B at low bit rates, and we have

$$\text{max}E_{I} = \text{max}X_{I} \cdot \text{max}E_{B} \cdot C/255. $$

Note that the precision of this estimate decreases slightly in high bit rate coding, in which max E B takes a small value such as 3.
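The linearization used in this appendix can be checked numerically. The Python sketch below compares the exact error exp(x + ε) − x I with the first-order estimate x I · e B · C/255, ignoring the rounding terms e 1 and e 2 . The values of min X I⁺, max X I and e B are hypothetical and chosen only for illustration.

```python
import math

def exact_error(x_I, e_B, C):
    # exp(log(x_I) + epsilon) - x_I with epsilon = e_B * C / 255
    # (rounding terms e_1 and e_2 are ignored in this sketch)
    return math.exp(math.log(x_I) + e_B * C / 255.0) - x_I

def approx_error(x_I, e_B, C):
    # First-order estimate derived in this appendix
    return x_I * e_B * C / 255.0

# Hypothetical dynamic range: min X_I^+ = 1, max X_I = 1000
min_X, max_X = 1.0, 1000.0
C = math.log(max_X) - math.log(min_X)
e_B = 3.0  # hypothetical base-layer error

exact = exact_error(max_X, e_B, C)
approx = approx_error(max_X, e_B, C)
rel_gap = abs(exact - approx) / exact
print("exact:", exact, "approx:", approx, "relative gap:", rel_gap)
```

Because the expansion drops terms of order ε² and higher, the gap between the two values widens as ε = e B · C/255 grows.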

9 Appendix B

Substituting

$$\left\{ \begin{array}{ll} \text{Nrm'}^{-1}(\textbf{x}_{B}) = \textbf{x}_{B} \cdot C'/255 + C'_{1} \\ \text{Nrm'} (\textbf{x}_{R}) = \left(\textbf{x}_{R} - C'_{1}\right) \cdot 255/C' \\ C'=C'_{2}-C'_{1} \\ C'_{1} ={\text{min}X_{R}}, \;C'_{2}={\text{max}X_{R}} \end{array} \right. $$

into

$$ \textbf{e}_{R} = \text{Nrm'}^{-1} \left(\text{Nrm'}(\textbf{x}_{R}) +\textbf{e}_{B} +e'_{1}\right) - \textbf{x}_{R} +e_{2}', $$

we have

$$\begin{aligned} \textbf{e}_{R}&= \text{Nrm'}^{-1} \left(\left(\textbf{x}_{R} - C'_{1}\right) \cdot 255/C' +\textbf{e}_{B} +e'_{1}\right)\\ &\quad- \textbf{x}_{R} +e_{2}', \vspace{+2pt}\\ &= \textbf{x}_{R} -C'_{1} +\left(\textbf{e}_{B} +e'_{1}\right) \cdot C'/255 +C'_{1} \\ &\quad- \textbf{x}_{R} +e_{2}', \vspace{+2pt}\\ &= \left(\textbf{e}_{B} +e'_{1}\right) \cdot C'/255 +e_{2}'. \vspace{+2pt}\\ \end{aligned} $$

According to our experiments, the smallest value of max E R among the tested images is 94 (for ‘cannon’). Therefore, e2′ ∈ [−0.5, 0.5] is negligible compared to max E R . As in Appendix A, e1′ is negligible compared to max E B . As a result, we have

$$\text{max}E_{R}=\text{max}E_{B} \cdot \text{max}X_{R}/255. $$
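Unlike Appendix A, the derivation in this appendix involves no approximation before the rounding terms are dropped, because Nrm′ is linear. The Python sketch below verifies this for hypothetical values. Choosing min X R = 0 is an assumption made for this example only; with that choice C′ equals max X R , as in the final expression.

```python
def nrm(x_R, c1, c2):
    # Nrm'(x_R) = (x_R - C'_1) * 255 / C'
    return (x_R - c1) * 255.0 / (c2 - c1)

def nrm_inv(x_B, c1, c2):
    # Nrm'^{-1}(x_B) = x_B * C' / 255 + C'_1
    return x_B * (c2 - c1) / 255.0 + c1

# Hypothetical range of x_R (min X_R = 0 is an assumption for this example)
c1, c2 = 0.0, 4096.0
x_R, e_B = 1234.0, 5.0

# Error in x_R caused by an error e_B in the base layer (e'_1, e'_2 ignored)
e_R = nrm_inv(nrm(x_R, c1, c2) + e_B, c1, c2) - x_R
expected = e_B * (c2 - c1) / 255.0
print("e_R:", e_R, "e_B * C'/255:", expected)
```

The two printed values agree up to floating-point rounding, independently of the chosen x R .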