In this section, we provide a general description of the proposed universal distortion function UNIWARD and explain how it can be used to embed in the JPEG and the side-informed JPEG domains. The distortion depends on the choice of a directional filter bank and one scalar parameter whose purpose is stabilizing the numerical computations. The distortion design is finished in Section 5, which investigates the effect of the filter bank and the stabilizing constant on empirical security.
Since rich models[18, 20–22] currently used in steganalysis are capable of detecting changes along ‘clean edges’ that can be well fitted using locally polynomial models, whenever possible the embedding algorithm should embed into textured/noisy areas that are not easily modellable in any direction. We quantify this using outputs of a directional filter bank and construct the distortion function in this manner.
3.1 Directional filter bank
By a directional filter bank, we understand a set of three linear shift-invariant filters represented with their kernels $K^{(1)}, K^{(2)}, K^{(3)}$. They are used to evaluate the smoothness of a given image X along the horizontal, vertical, and diagonal directions by computing the so-called directional residuals $W^{(k)} = K^{(k)} \star X$, $k = 1,2,3$, where ‘$\star$’ is a mirror-padded convolution so that $W^{(k)}$ again has $n_1 \times n_2$ elements. The mirror padding prevents introducing embedding artifacts at the image boundary.
While it is possible to use arbitrary filter banks, we will exclusively use kernels built from one-dimensional low-pass (and high-pass) wavelet decomposition filters h (and g):
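$$K^{(1)} = h \cdot g^{\mathrm{T}}, \quad K^{(2)} = g \cdot h^{\mathrm{T}}, \quad K^{(3)} = g \cdot g^{\mathrm{T}}. \tag{2}$$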
In this case, the filters correspond, respectively, to two-dimensional LH, HL, and HH wavelet directional high-pass filters, and the residuals coincide with the first-level undecimated wavelet LH, HL, and HH directional decomposition of X. We constrained ourselves to wavelet filter banks because wavelet representations are known to provide good decorrelation and energy compaction for images of natural scenes (see, e.g., Chapter 7 in [23]).
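As an illustration of the residual computation, the following minimal Python sketch builds the three kernels from a pair of one-dimensional filters and applies a mirror-padded convolution. Several details are assumptions made only for this example: the Haar filters stand in for whatever wavelet is eventually selected, the padding repeats the edge sample, the kernel origin is placed at index `s // 2`, and all function names are hypothetical.

```python
import math

def outer(col, row):
    """Outer product col * row^T as a list of rows."""
    return [[c * r for r in row] for c in col]

def mirror_index(i, n):
    """Reflect an arbitrary integer index into 0..n-1 (edge sample repeated)."""
    i %= 2 * n
    return i if i < n else 2 * n - 1 - i

def mirror_conv2(image, kernel):
    """2D convolution with mirror padding; the output has the size of `image`."""
    n1, n2 = len(image), len(image[0])
    s1, s2 = len(kernel), len(kernel[0])
    c1, c2 = s1 // 2, s2 // 2  # assumed position of the kernel origin
    out = [[0.0] * n2 for _ in range(n1)]
    for u in range(n1):
        for v in range(n2):
            acc = 0.0
            for a in range(s1):
                for b in range(s2):
                    r = mirror_index(u + c1 - a, n1)
                    c = mirror_index(v + c2 - b, n2)
                    acc += kernel[a][b] * image[r][c]
            out[u][v] = acc
    return out

def directional_residuals(image, h, g):
    """Return [W1, W2, W3] for the kernels h*g^T, g*h^T and g*g^T."""
    kernels = [outer(h, g), outer(g, h), outer(g, g)]
    return [mirror_conv2(image, k) for k in kernels]

# Haar decomposition filters, used here only as a small stand-in.
S = 1.0 / math.sqrt(2.0)
HAAR_H = [S, S]    # low-pass
HAAR_G = [S, -S]   # high-pass (sign convention is an assumption)

# A horizontal ramp: every row is 0, 1, 2, 3.
ramp = [[float(c) for c in range(4)] for _ in range(3)]
W = directional_residuals(ramp, HAAR_H, HAAR_G)
```

On this ramp, only the first residual responds in the interior of the image; the second and third vanish up to floating-point error because the content does not vary from row to row.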
3.2 Distortion function (non-side-informed embedding)
We are now ready to describe the universal distortion function. We do so first for embedding that does not use any precover. Given a pair of cover and stego images, X and Y, represented in the spatial (pixel) domain, we will denote with $W_{uv}^{(k)}(X)$ and $W_{uv}^{(k)}(Y)$, $k = 1,2,3$, $u \in \{1,\ldots,n_1\}$, $v \in \{1,\ldots,n_2\}$, their corresponding uv-th wavelet coefficient in the k-th subband of the first decomposition level. The UNIWARD distortion function is the sum of relative changes of all wavelet coefficients with respect to the cover image:
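$$D(X,Y) \triangleq \sum_{k=1}^{3} \sum_{u=1}^{n_1} \sum_{v=1}^{n_2} \frac{\left| W_{uv}^{(k)}(X) - W_{uv}^{(k)}(Y) \right|}{\sigma + \left| W_{uv}^{(k)}(X) \right|}, \tag{3}$$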
where σ > 0 is a constant stabilizing the numerical calculations.
The ratio in (3) is smaller when a large cover wavelet coefficient is changed (where texture and edges appear). Embedding changes are discouraged in regions where $|W_{uv}^{(k)}(X)|$ is small for at least one k, which corresponds to a direction along which the content is modellable.
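The behavior of the ratio in (3) can be checked with a small Python sketch that evaluates the sum for precomputed residuals. The function name and the nested-list layout (one 2D list per subband) are assumptions made for illustration; how the residuals are obtained is left outside the example.

```python
def uniward_distortion(w_cover, w_stego, sigma):
    """Sum over subbands and positions of |W(X) - W(Y)| / (sigma + |W(X)|)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for band_x, band_y in zip(w_cover, w_stego):
        for row_x, row_y in zip(band_x, band_y):
            for wx, wy in zip(row_x, row_y):
                total += abs(wx - wy) / (sigma + abs(wx))
    return total

# One subband with a large (textured) and a zero (smooth) cover coefficient.
cover = [[[10.0, 0.0]]]
change_textured = [[[11.0, 0.0]]]   # same unit change, applied to the large coefficient
change_smooth = [[[10.0, 1.0]]]     # same unit change, applied to the zero coefficient

d_textured = uniward_distortion(cover, change_textured, sigma=1.0)
d_smooth = uniward_distortion(cover, change_smooth, sigma=1.0)
```

With these toy numbers, the unit change of the large coefficient contributes 1/11 while the same change of the zero coefficient contributes 1, which is the content-adaptive preference the text describes.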
For JPEG images, the distortion between the two arrays of quantized DCT coefficients, X and Y, is computed by first decompressing the JPEG files to the spatial domain and then evaluating the distortion between the decompressed images, $J^{-1}(X)$ and $J^{-1}(Y)$, in the same manner as in (3):
$$D(X,Y) \triangleq D\left(J^{-1}(X), J^{-1}(Y)\right). \tag{4}$$
Note that the distortion (3) is non-additive because changing pixel $X_{ij}$ will affect $s \times s$ wavelet coefficients, where $s \times s$ is the size of the 2D wavelet support. Also, changing a JPEG coefficient $X_{ij}$ will affect a block of $8 \times 8$ pixels and therefore a block of $(8 + s - 1) \times (8 + s - 1)$ wavelet coefficients. It is thus apparent that when changing neighboring pixels (or DCT coefficients), the embedding changes ‘interact,’ hence the non-additivity of D.
3.3 Distortion function (JPEG side-informed embedding)
By side-informed embedding in the JPEG domain, we understand the following general principle. Given the raw DCT coefficient $D_{ij}$ obtained from the precover P, the embedder has the choice of rounding $D_{ij}$ up or down to modulate its parity (usually the least significant bit of the rounded value). We denote with $e_{ij} = |D_{ij} - X_{ij}|$, $e_{ij} \in [0, 0.5]$, the rounding error for the ij-th coefficient when compressing the precover P to the cover image X. Rounding ‘to the other side’ leads to an embedding change, $Y_{ij} = X_{ij} + \mathrm{sign}(D_{ij} - X_{ij})$, which corresponds to a ‘rounding error’ $1 - e_{ij}$. Thus, every embedding change increases the distortion with respect to the precover by the difference between both rounding errors: $|D_{ij} - Y_{ij}| - |D_{ij} - X_{ij}| = 1 - 2e_{ij}$. For the side-informed embedding in the JPEG domain, we therefore define the distortion as the difference:
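$$D^{(\mathrm{SI})}(X,Y) \triangleq D\left(P, J^{-1}(Y)\right) - D\left(P, J^{-1}(X)\right) = \sum_{k=1}^{3} \sum_{u=1}^{n_1} \sum_{v=1}^{n_2} \frac{\left| W_{uv}^{(k)}(P) - W_{uv}^{(k)}\left(J^{-1}(Y)\right) \right| - \left| W_{uv}^{(k)}(P) - W_{uv}^{(k)}\left(J^{-1}(X)\right) \right|}{\sigma + \left| W_{uv}^{(k)}(P) \right|}. \tag{5}$$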
Note that the linearity of the DCT and the wavelet transforms guarantees that $D^{(\mathrm{SI})}(X,Y) \geq 0$. This is because rounding a DCT coefficient (to obtain X) corresponds to adding a certain pattern (that depends on the modified DCT mode) in the wavelet domain. Rounding to the other side (to obtain Y) corresponds to subtracting the same pattern but with a larger amplitude. This is why $\left| W_{uv}^{(k)}(P) - W_{uv}^{(k)}\left(J^{-1}(Y)\right) \right| - \left| W_{uv}^{(k)}(P) - W_{uv}^{(k)}\left(J^{-1}(X)\right) \right| \geq 0$ for all k, u, v.
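The rounding rule described at the beginning of this subsection can be sketched in a few lines of Python. The function names are hypothetical, and two conventions are assumptions of this example: ties are rounded up when forming the cover value, and the change goes upward when the raw coefficient is already an integer (where sign(0) would otherwise give no change).

```python
import math

def side_informed_change(d):
    """Return (x, y, e): cover value, changed value, and rounding error for raw d."""
    x = math.floor(d + 0.5)          # round to nearest; ties go up (assumption)
    e = abs(d - x)                   # rounding error, always in [0, 0.5]
    step = 1 if d >= x else -1       # sign(d - x); +1 chosen when d == x (assumption)
    return x, x + step, e

def distortion_increase(d):
    """|d - y| - |d - x|, the extra rounding error caused by the embedding change."""
    x, y, _ = side_informed_change(d)
    return abs(d - y) - abs(d - x)

x, y, e = side_informed_change(-1.75)   # rounds to -2; the other side is -1
```

For any raw value d, `distortion_increase(d)` should agree with 1 - 2e up to floating-point error; in particular it is zero for a coefficient with e = 1/2, which is the case examined in Section 3.3.1.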
We note at this point that (5) bears some similarity to the distortion used in Normalized Perturbed Quantization (NPQ) [11, 12], where the authors also proposed the distortion as a relative change of cover DCT coefficients. The main difference is that we compute the distortion using a directional filter bank, thus allowing directional sensitivity and potentially better content adaptability. Furthermore, we do not eliminate DCT coefficients that are zeros in the cover. Finally, and most importantly, in contrast to NPQ, our design naturally incorporates the effect of the quantization step because the wavelet coefficients are computed from the decompressed JPEG image.
3.3.1 Technical issues with zero embedding costs
When running experiments with any side-informed JPEG steganography in which the embedding cost is zero when $e_{ij} = 1/2$, we discovered a technical problem that, to the best knowledge of the authors, has not been disclosed elsewhere. The problem is connected to the fact that when $e_{ij} = 1/2$, the cost of rounding $D_{ij}$ ‘down’ instead of ‘up’ should not be zero because, after all, this does constitute an embedding change. This does not affect the security much when the number of such DCT coefficients is small. With an increasing number of coefficients with $e_{ij} = 1/2$ (we will call them 1/2-coefficients), however, $1 - 2e_{ij}$ is no longer a good measure of statistical detectability and one starts observing a rather pathological behavior: with payload approaching zero, the detection error does not saturate at 50% (random guessing) but rather at a lower value and only reaches 50% for payloads nearly equal to zero (endnote e). The strength with which this phenomenon manifests depends on how many 1/2-coefficients are in the image, which in turn depends on two factors: the implementation of the DCT used to compute the costs and the JPEG quality factor. When using the slow DCT (implemented using ‘dct2’ in Matlab), the number of 1/2-coefficients is small and does not affect security, at least for low quality factors. However, in the fast-integer implementation of DCT (e.g., Matlab’s imwrite), all $D_{ij}$ are multiples of 1/8. Thus, with decreasing quantization step (increasing JPEG quality factor), the number of 1/2-coefficients increases.
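The dependence on the quantization step can be illustrated with a simple counting sketch in Python. It rests on assumptions that are not taken from the text: the unquantized DCT value is modelled as c/8 for an integer c, the raw coefficient is that value divided by an integer quantization step q, and every c in a symmetric range is treated as equally likely. The function names are hypothetical, and the numbers are not meant to predict counts in real images.

```python
import math
from fractions import Fraction

def is_half_coefficient(d):
    """True when the fractional part of d is exactly 1/2, i.e. e = 1/2."""
    return d - math.floor(d) == Fraction(1, 2)

def half_coefficient_share(q, grid=8, numerators=range(-4096, 4096)):
    """Share of values c / (grid * q) that are 1/2-coefficients (toy model)."""
    hits = sum(1 for c in numerators if is_half_coefficient(Fraction(c, grid * q)))
    return Fraction(hits, len(numerators))

# Smaller quantization steps correspond to higher JPEG quality factors.
shares = {q: half_coefficient_share(q) for q in (1, 2, 4, 8)}
```

Under this toy model the share equals 1/(8q), so it doubles each time the quantization step is halved. Exact rational arithmetic is used so that the test for e = 1/2 is not disturbed by floating-point rounding.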
To avoid dealing with this issue in this paper, we used the slow DCT implemented using Matlab’s dct2 as explained in Section 2.2 to obtain the costs. Even with the slow DCT, however, 1/2-coefficients do cause problems when the quality factor is high. As one can easily verify from the formula for the DCT (1), when $k,l \in \{0,4\}$, the value of $D_{kl}$ is always a rational number because the cosines are either 1 or $\pm\sqrt{2}/2$, which, together with the multiplicative weights w, gives again a rational number. In particular, the DC coefficient (mode 00) is always a multiple of 1/4, the coefficients of modes 04 and 40 are multiples of 1/8, and the coefficients corresponding to mode 44 are multiples of 1/16. For all other combinations of $k,l \in \{0,\ldots,7\}$, $D_{kl}$ is an irrational number. In practice, any embedding whose costs are zero for 1/2-coefficients will thus strongly prefer these four DCT modes, causing a highly uneven distribution of embedding changes among the DCT coefficients. Because rich JPEG models [24] utilize statistics collected for each mode separately, they are capable of detecting this statistical peculiarity even at low payloads. This problem becomes more serious with increasing quality factor.
The above embedding artifacts can be largely suppressed by prohibiting embedding changes in all 1/2-coefficients in modes 00, 04, 40, and 44 (endnote f). In Figure 1, where we show the comparison of various side-informed embedding methods for quality factor 95, we intentionally included the detection errors for all tested schemes where this measure was not enforced to prove the validity of the above arguments.
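One possible way to enforce this measure is sketched below in Python. The sketch makes several assumptions: prohibition is realized by assigning an infinite cost, the DCT mode of a coefficient at array position (i, j) is taken to be (i mod 8, j mod 8), and an exact comparison with 1/2 is used by default. The function name and the tolerance parameter are hypothetical.

```python
import math

# Modes for which the raw DCT coefficient is always rational.
RATIONAL_MODES = {(0, 0), (0, 4), (4, 0), (4, 4)}

def forbid_half_coefficients(costs, errors, tol=0.0):
    """Return a copy of `costs` with infinite cost wherever e = 1/2 in modes 00, 04, 40, 44.

    costs, errors: 2D lists of equal size holding the embedding costs and the
    rounding errors e_ij, laid out as the array of DCT coefficients.
    """
    out = [row[:] for row in costs]
    for i, row in enumerate(errors):
        for j, e in enumerate(row):
            if (i % 8, j % 8) in RATIONAL_MODES and abs(e - 0.5) <= tol:
                out[i][j] = math.inf   # the embedder will never change this coefficient
    return out

# Toy 8 x 8 block in which every coefficient happens to have e = 1/2.
costs = [[1.0] * 8 for _ in range(8)]
errors = [[0.5] * 8 for _ in range(8)]
safe_costs = forbid_half_coefficients(costs, errors)
```

Only the four listed modes are affected; 1/2-coefficients in the remaining modes keep their original costs, in line with the measure described above.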
The solution of the problem with 1/2-coefficients, which is clearly not optimal, is related to the more fundamental problem, which is how exactly the side information in the form of an uncompressed image should be utilized for the design of steganographic distortion functions. The authors postpone a detailed study of this quite intriguing problem to a separate paper.
3.4 Additive approximation of UNIWARD
Any distortion function D(X,Y) can be used for embedding in its additive approximation [4] by using D to compute the cost $\rho_{ij}$ of changing each pixel/DCT coefficient $X_{ij}$. A significant advantage of using an additive approximation is the simplicity of the overall design. The embedding can be implemented in a straightforward manner by applying what is now a standard tool in steganography, the Syndrome-Trellis Codes (STCs) [3]. All experiments in this paper are carried out with additive approximations of UNIWARD.
The cost of changing $X_{ij}$ to $Y_{ij}$ and leaving all other cover elements unchanged is
$$\rho_{ij}(X, Y_{ij}) \triangleq D\left(X, X_{\sim ij}Y_{ij}\right), \tag{6}$$
where $X_{\sim ij}Y_{ij}$ is the cover image X with only its ij-th element changed: $X_{ij} \to Y_{ij}$ (endnote g). Note that $\rho_{ij} = 0$ when X = Y. The additive approximation to (3) and (5) will be denoted as $D_A(X,Y)$ and $D_A^{(\mathrm{SI})}(X,Y)$, respectively. For example,
$$D_A(X,Y) = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \rho_{ij}(X, Y_{ij}) \, [X_{ij} \neq Y_{ij}], \tag{7}$$
where [S] is the Iverson bracket equal to 1 when the statement S is true and 0 when S is false.
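The construction of the costs and of the additive approximation can be sketched in Python for an arbitrary distortion function passed in as a callable. The toy distortion used below (relative change of horizontal first differences) is only a stand-in chosen to keep the example short; it is not UNIWARD, and all function names are hypothetical.

```python
def embedding_cost(distortion, cover, i, j, new_value):
    """rho_ij: distortion between the cover and the cover with only element (i, j) changed."""
    changed = [row[:] for row in cover]
    changed[i][j] = new_value
    return distortion(cover, changed)

def additive_distortion(distortion, cover, stego):
    """Sum of rho_ij over all positions where cover and stego differ (Iverson bracket)."""
    total = 0.0
    for i, (row_x, row_y) in enumerate(zip(cover, stego)):
        for j, (x, y) in enumerate(zip(row_x, row_y)):
            if x != y:
                total += embedding_cost(distortion, cover, i, j, y)
    return total

def toy_distortion(cover, stego, sigma=1.0):
    """Stand-in for D: relative change of horizontal first differences."""
    total = 0.0
    for row_x, row_y in zip(cover, stego):
        for v in range(len(row_x) - 1):
            rx = row_x[v + 1] - row_x[v]
            ry = row_y[v + 1] - row_y[v]
            total += abs(rx - ry) / (sigma + abs(rx))
    return total

cover = [[10, 10, 10]]
stego = [[11, 11, 10]]            # two neighboring elements changed by +1
d_full = toy_distortion(cover, stego)
d_additive = additive_distortion(toy_distortion, cover, stego)
```

In this toy case the two neighboring changes interact: the full distortion is 1, whereas the sum of the individual costs is 3, which shows that the additive approximation generally differs from the distortion it approximates.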
Note that, due to the absolute values in D(X,Y) (3), $\rho_{ij}(X, X_{ij} + 1) = \rho_{ij}(X, X_{ij} - 1)$, which permits us to use a ternary embedding operation for the spatial and JPEG domains (endnote h). Practical embedding algorithms can be constructed using the ternary multi-layered version of STCs (Section 4 in [3]).
On the other hand, for the side-informed JPEG steganography, $D_A^{(\mathrm{SI})}(X,Y)$ is inherently limited to a binary embedding operation because $D_{ij}$ is either rounded up or down.
The embedding methods that use the additive approximation of UNIWARD for the spatial, JPEG, and side-informed JPEG domain will be called S-UNIWARD, J-UNIWARD, and SI-UNIWARD, respectively.
3.5 Relationship of UNIWARD to WOW
The distortion function of WOW bears some similarity to UNIWARD in the sense that the embedding costs are also computed from three directional residuals. The WOW embedding costs are, however, computed in a different way that makes it rather difficult to use them for embedding in other domains, such as the JPEG domain (endnote i).
To obtain a cost of changing pixel $X_{ij} \to Y_{ij}$, WOW first computes the embedding distortion in the wavelet domain weighted by the wavelet coefficients of the cover. This is implemented as a convolution (see Equation 2 in [17]). These so-called embedding suitabilities are then aggregated over all three subbands using the reciprocal Hölder norm, to give WOW the proper content adaptivity in the spatial domain.
In principle, this approach could be used for embedding in the JPEG (or some other) domain in a similar way as in UNIWARD. However, notice that the suitabilities increase with increasing JPEG quantization step (increasing spatial frequency), giving the high-frequency DCT coefficients smaller costs and thus a higher embedding probability than the low-frequency coefficients. This creates both visible and statistically detectable artifacts. In contrast, the embedding costs in UNIWARD are higher for high-frequency DCT coefficients, desirably discouraging embedding changes in coefficients which are largely zeros.