1 Introduction

The goal of steganography in our study is to hide information imperceptibly in a cover image so that the presence of hidden data cannot be detected by visual appearance. Images are a good carrier for transmitting secret messages over the internet, due to the redundant information in images and visual resilience to small changes in the original pixel values. In many applications, the most important requirement for steganography is the undetectability of the hidden data. This means that the image that contains the hidden data, the stego-image, should be visually and statistically similar to the cover image [1, 2]. Many digital steganography techniques have been proposed in recent years. All of them share the fundamental concept of injecting secret information into a cover image to generate a stego-image output as shown in Fig. 1.

Image steganography can be categorized into two embedding domains: the spatial domain [3] and the frequency domain [1]. In spatial domain techniques, secret information is embedded directly into pixel intensity values. In frequency domain techniques, a discrete frequency transform (mainly DCT or DWT) is applied and the secret information is embedded into the frequency coefficients of the cover image [1, 2]; the inverse transform then generates the stego-image. In both embedding domains, the process introduces distortion in the cover image, which could lead to steganographic detectability. The objective is to preserve the visual and statistical properties of the cover image while embedding the message at a high embedding rate. The invisibility of any steganography technique in the spatial domain depends on the selection of pixels for embedding the secret message [4]. Due to the masking phenomenon of the human visual system [5], small distortions in smooth areas are much more noticeable than distortions in high-frequency texture areas, as illustrated in Fig. 2. To maintain the visual properties of the image, the secret message should be embedded along the edges of the cover image, where the visual impairment is low [1, 6].

In this paper, we will introduce a novel image steganography technique, which embeds the secret message in the spatial domain. The proposed steganography technique is found to have excellent invisibility and high capacity. The paper is organized as follows: Section 2 discusses some well-known spatial domain image steganographic techniques. Section 3 introduces our novel steganography technique, which embeds the secret message in high-energy areas in the image. Section 4 contains experimental results and Section 5 concludes our work and elaborates on directions for future work.

Fig. 1 General steganographic system

Fig. 2 Effect of embedding data in the cover image: (a) cover image, (b) smooth area, (c) high-texture area

2 Related work

There are several steganographic techniques to embed data securely in an image and some tools to detect the presence of a secret message in a steganogram. Image steganography can be divided into two main types: spatial or frequency domain steganography.

Spatial domain steganography changes some bits in the image pixels during data hiding. When hiding data in a pixel, the physical location of the pixel is considered and then the binary representation of the pixel value is used to hide the data. The most common methods are based on least-significant-bit (LSB) substitution [7]. There are several sophisticated LSB approaches that embed secret data by replacing k LSBs of a pixel with k secret bits [4]. In some of the LSB approaches, the choice of embedding positions within a cover image depends on a pseudo-random number generator, without considering the relationship between the image content itself and the visual impairment caused by the secret message [8]. Several variants of wet paper codes, which did not consider visual impairment, were proposed as a tool for constructing steganographic schemes with an arbitrary selection channel that is not shared between the sender and the recipient [9, 10]. Other methods use the fact that human vision can tolerate severe changes in the edge region to increase the quality of the stego-images [3, 6, 11]. These methods can embed most of the secret data along sharper edges and can achieve more visually imperceptible stego-images.

Random location algorithms do not take into account the human visual system (HVS), so degradations might be more noticeable. In our scheme, however, we place the embedded bits at the locations that cause the minimum reduction of the perceived image quality. One more advantage of our scheme in comparison with other schemes is that we do not need to send the locations of the affected pixels, whereas in some of the other schemes the locations of those pixels must be signaled to the other side.

When using a cover image in the spatial domain, the main issue is to select the best location in the image to hide the secret information. Pixels within high-frequency texture areas are a better choice for embedding the secret information since the visual impairment is low. Image texture is highly dependent on image content. To make the perceived degradation of the original image low, our proposed scheme embeds the secret bits into high-texture areas while keeping smooth regions as they are. Several visual texture measures are considered for defining the energy of an image. For simplification, a simple image operator called Max Energy Seam (MES) is introduced in this paper. The operator is based on the idea of seam carving that supports content-aware image resizing [12]. There have been several recent studies that locate and preserve the key visual elements in the image [13, 14]. Those studies locate the best-connected seam or area of low-energy pixels crossing the image from top to bottom or from left to right. In our study, we locate the connected seams or area of high-energy pixels crossing the image from left to right. By inserting the secret message in an image along the MES, we could hide a large amount of data that could fit a given image size. Embedding the information in a location of high energy texture will be less noticeable by the HVS compared to smooth areas where the sensitivity of the HVS is more dominant.

3 Methodology

3.1 The encoder-steganography

A color image \(I_{RGB}\), which uses three color planes, R, G, and B, is represented by the intensity image I as in (1). To ensure synchronization between the decoder and the encoder during the energy calculation, the encoder and the decoder use the same method. The k least significant bits of the three image planes, R, G, and B, are reset to generate the saliency map, without any influence on the original image. The RGB image (after the reset of the LSBs) is converted to grayscale values by forming a weighted sum of the R, G, and B components as in (1), where k represents the number of hidden bits that can be inserted into each channel.

$$\begin{aligned} I(i)= 0.2989*2^k \lfloor \frac{R(i)}{2^k} \rfloor + 0.5870 *2^k \lfloor \frac{G(i)}{2^k} \rfloor + 0.1140 *2^k \lfloor \frac{B(i)}{2^k} \rfloor \end{aligned}$$
(1)

where k=1,2,3,4,5.
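As a minimal sketch of (1), assuming 8-bit channel values, the following Python fragment clears the k LSBs of each channel before forming the weighted sum. The function names `reset_lsbs` and `intensity` are illustrative and do not come from our implementation. The example shows why the reset matters: a cover pixel and a stego pixel that differ only in their LSBs map to the same intensity, so the encoder and the decoder see the same energy.

```python
def reset_lsbs(value, k):
    """Clear the k least significant bits of an integer channel value."""
    return (value >> k) << k  # same as 2**k * floor(value / 2**k)


def intensity(r, g, b, k=1):
    """Weighted grayscale value of one pixel after the LSB reset, as in Eq. (1)."""
    return (0.2989 * reset_lsbs(r, k)
            + 0.5870 * reset_lsbs(g, k)
            + 0.1140 * reset_lsbs(b, k))


# A cover pixel and a copy whose LSBs were overwritten by secret bits
cover = (200, 117, 54)
stego = (201, 116, 55)
same_intensity = intensity(*cover, k=1) == intensity(*stego, k=1)
```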

Several image importance measures found in the literature can serve as the energy function that guides the selection of our best-connected MES [12, 14]. A good and simple example of the energy function e(\(\cdot \)) is the gradient magnitude of the image I, which usually indicates an edge. The edges are the part of the image that is potentially suitable for message embedding. This example of e(\(\cdot \)) for an image can be represented as in (2):

$$\begin{aligned} e(I)= \mid {\frac{\partial I}{\partial x}}\mid + \mid \frac{\partial I}{\partial y}\mid \end{aligned}$$
(2)
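The gradient energy in (2) can be sketched in Python as follows. The discretization is an assumption of this sketch: forward differences are used for both partial derivatives, and the difference is taken as zero on the last row and the last column. The function name `gradient_energy` is illustrative.

```python
def gradient_energy(img):
    """e(I) = |dI/dx| + |dI/dy| with forward differences.

    img is a list of rows of numbers. On the last row and column the
    missing difference is treated as zero (assumed border handling).
    """
    rows, cols = len(img), len(img[0])
    energy = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dx = img[r][c + 1] - img[r][c] if c + 1 < cols else 0
            dy = img[r + 1][c] - img[r][c] if r + 1 < rows else 0
            energy[r][c] = abs(dx) + abs(dy)
    return energy


# A flat region next to a vertical edge, above a brighter bottom row
img = [[10, 10, 50],
       [10, 10, 50],
       [90, 90, 90]]
e = gradient_energy(img)
```

In this small example the largest energy appears at the pixel where the horizontal and the vertical intensity steps meet, while the flat top-left corner has zero energy.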

Each pixel p in an image I has a certain amount of energy given by the gradient function e(\(\cdot \)). Pixels with higher energy in e(\(\cdot \)) are more salient, and they are good candidates for embedding the secret message, because changes to them are less noticeable by the HVS than changes in smooth areas. Given a gradient function e(\(\cdot \)) of an image, the encoder selects the pixels with the highest energy in the gradient image to carry the secret message, while maintaining the possibility of decoding the message on the decoder side. This leads to our strategy of selecting a seam in the image that has the maximum energy in the gradient image. A seam is defined as an eight-connected path of pixels in the image from left to right. Strict eight-connectivity is not essential, but exactly one pixel per column is selected. From [12], the formal mathematical definition of the horizontal seam \(S^y\) (from left to right) in an n\(\times \)m image I is:

$$\begin{aligned} S^y = \{{S^y_{j}}\}_{j=1}^m = \{(j, y(j))\}_{j=1}^m , \hspace{0.3cm} s.t. \hspace{0.4cm} \forall j , \mid y(j) - y(j-1)\mid \le 1 \end{aligned}$$
(3)

where y is a mapping \(y: [1... m] \xrightarrow {} [1... n]\). The pixels of the path of the horizontal seam \(S^y\) will therefore be:

$$\begin{aligned} I^s_{y} = \{I{(S_j^y })\}_{j=1}^m = \{ I (j, y(j))\}_{j=1}^m \end{aligned}$$
(4)

Given an energy function e, we can define the cost of horizontal seams as:

$$\begin{aligned} {E(S)}= E(I_{S}^y )=\sum _{j=1}^{m} e(I(S_j^y )) \end{aligned}$$
(5)

The optimal horizontal seam S* is the seam which maximizes the energy function:

$$\begin{aligned} S_{y}^* = \arg \max _{\textbf{s}} E(s) =\arg \max _{\textbf{s}} \sum _{j=1}^{m} e(I(S_j^y)) \end{aligned}$$
(6)

In the same way, we could use vertical seams or both types of seams. In this study, for simplicity, we use only horizontal seams. Using both types of seams could allow a greater amount of hidden information to be embedded in the image.

The optimal horizontal seam can be found using dynamic programming. The first step is to traverse the gradient image from the second column to the last column and compute the cumulative maximum energy M for all possible connected seams for each entry (i, j) as in (7):

$$\begin{aligned} M(i, j) = e(i, j) + max (M(i-1, j-1),M(i-1, j),M(i-1, j + 1)) \end{aligned}$$
(7)

For example, consider the energy function e(\(\cdot \)) that represents the gradient magnitude of the image I, as in Fig. 3.

Fig. 3 An example of the gradient function e(\(\cdot \)), which represents the energy

From the gradient function e(\(\cdot \)) we compute the cumulative maximum energy M for all possible connected seams for each entry (i, j), as shown in Fig. 4, where the red arrow represents the selected value.

Fig. 4 The process of generating the cumulative maximum energy M

At the end of this process, the maximum value of the last column in M will indicate the end of the maximal connected horizontal seam. Hence, in the second step, we backtrack from this maximum entry on M to find the path of the horizontal Maximal Energy Seam (MES) as in Fig. 5. The definition of M for vertical seams is similar.
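The two steps, accumulation and backtracking, can be sketched in Python as follows. This is an illustrative sketch rather than our implementation: it indexes the energy as `energy[row][column]`, takes the three predecessors of an entry from the previous column, and breaks ties in favor of the upper row. These conventions, and the name `max_energy_seam`, are assumptions of the sketch.

```python
def max_energy_seam(energy):
    """Return, for every column, the row index of the maximal horizontal seam."""
    rows, cols = len(energy), len(energy[0])
    # M[r][c]: best cumulative energy of a connected seam ending at (r, c)
    M = [row[:] for row in energy]
    for c in range(1, cols):
        for r in range(rows):
            prev = [M[p][c - 1] for p in (r - 1, r, r + 1) if 0 <= p < rows]
            M[r][c] = energy[r][c] + max(prev)
    # Step 2: backtrack from the largest entry in the last column
    r = max(range(rows), key=lambda p: M[p][cols - 1])
    seam = [r]
    for c in range(cols - 1, 0, -1):
        candidates = [p for p in (r - 1, r, r + 1) if 0 <= p < rows]
        r = max(candidates, key=lambda p: M[p][c - 1])
        seam.append(r)
    seam.reverse()
    return seam


energy = [[1, 9, 1, 1],
          [8, 1, 2, 1],
          [1, 1, 7, 9]]
seam = max_energy_seam(energy)
seam_energy = sum(energy[r][c] for c, r in enumerate(seam))
```

For this 3 x 4 energy image the seam visits rows 1, 0, 1, 2 from left to right and collects a total energy of 28, which is the largest value in the last column of M.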

Fig. 5 Backtrack from the maximum entry on M to find the path of the best MES

The process of generating the first MES is illustrated in Fig. 6. Figure 6(a) illustrates the first selected seam on top of the gradient image of the Golden Gate Bridge. The highlighted MES is the one that contains the most energy among all possible seams. In Fig. 6, the first MES is drawn on top of the bridge image to emphasize the location of the MES in the image.

Fig. 6 The green line indicates the location of the first optimal MES with the max energy

This first MES is the most salient edge in the gradient image e(\(\cdot \)) and it is a good candidate for secret message embedding. Within the MES, the secret data are embedded only in pixels whose energy is above a threshold value T, which is calculated in the energy plane.

The threshold value T is the smallest intensity level such that the desired fraction k of the image pixels lies at or below this level (this k is unrelated to the number of LSBs k in (1)). It is extracted from the cumulative histogram of the gradient image e(\(\cdot \)), where h(\(\cdot \)) is the histogram, normalized by the number of pixels mn as in (8). The threshold is recalculated at each iteration of the algorithm, and each iteration selects one MES.

$$\begin{aligned} \sum _{j=1}^{T} \frac{h(j)}{mn} \ge k \end{aligned}$$
(8)

where T is the threshold.
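A minimal sketch of the threshold selection in (8) is given below. It assumes integer energy levels (fractional energies are truncated) and starts the cumulative sum at the lowest level present in the image, including level 0. The name `energy_threshold` is illustrative.

```python
from collections import Counter


def energy_threshold(energy, k=0.9):
    """Smallest level T whose normalized cumulative histogram reaches k."""
    levels = [int(v) for row in energy for v in row]
    hist = Counter(levels)          # h(j): number of pixels at level j
    total = len(levels)             # mn: number of pixels
    cumulative = 0
    for level in sorted(hist):
        cumulative += hist[level]
        if cumulative / total >= k:
            return level
    return max(levels)              # only reached if k > 1


energy = [[0, 1, 2, 3, 4],
          [5, 6, 7, 8, 9]]
T_high = energy_threshold(energy, k=0.9)
T_half = energy_threshold(energy, k=0.5)
```

With ten distinct levels, k = 0.9 gives T = 8, so only the single pixel with energy 9 lies above the threshold; a lower k admits more pixels at the cost of embedding in smoother regions.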

The secret message bits are inserted into the LSBs of each of the RGB channels of the pixels that belong to the MES. Figure 7 and (9) represent the case where each channel carries one secret bit. It is possible to increase the amount of secret data carried by the cover image by using two or more LSBs for each of the RGB channels, as described in (1) [4, 7, 15]. Using two or more bits increases the embedding capacity but could reduce the image quality.

The embedding operation of 1-LSB steganography may be described by the following equation:

$$\begin{aligned} R(i)= 2\lfloor \frac{R(i)}{2} \rfloor + S(j) ; G(i) = 2\lfloor \frac{G(i)}{2} \rfloor + S(j+1) ; B(i) = 2\lfloor \frac{B(i)}{2} \rfloor + S(j+2) \end{aligned}$$
(9)

where R(i), G(i) and B(i) belong to the \(i\)-th selected pixel along the MES and S(j) is the \(j\)-th bit of the secret message.
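The 1-LSB case of (9) can be illustrated for a single pixel as follows. The helper names `embed_pixel` and `extract_pixel` are hypothetical and serve only to show the arithmetic on integer channel values.

```python
def embed_pixel(pixel, bits):
    """Write three secret bits into the LSBs of an (R, G, B) pixel, as in Eq. (9)."""
    return tuple(2 * (channel // 2) + bit for channel, bit in zip(pixel, bits))


def extract_pixel(pixel):
    """Read back the LSB of each channel."""
    return tuple(channel % 2 for channel in pixel)


cover_pixel = (200, 117, 54)
secret_bits = (1, 0, 1)
stego_pixel = embed_pixel(cover_pixel, secret_bits)
recovered = extract_pixel(stego_pixel)
```

Each channel changes by at most one intensity level, and a channel whose LSB already equals the secret bit does not change at all.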

Fig. 7 Three bits of secret message, embedded into the LSBs of one RGB pixel of the cover image

To carve the n-th MES from the energy image, the energy of the (n-1)-th MES is reset and the cumulative maximum energy M of the image is recalculated. Using dynamic programming, it is verified that there is no collision between data-carrying pixels in different MES’s. In case of a collision, the partial MES is reset and the process restarts. The secret message is inserted into the LSBs of the pixels along the selected MES’s in the cover image to create the stego-image.
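The iteration can be sketched as follows. The sketch makes two assumptions that are not fixed by the description above: "reset" is modeled by setting the energy of the used pixels to zero, and the explicit collision check and restart are omitted. The names `best_seam` and `carve_seams` are illustrative.

```python
def best_seam(energy):
    """Row index per column of the maximal horizontal seam (dynamic programming)."""
    rows, cols = len(energy), len(energy[0])
    M = [row[:] for row in energy]
    for c in range(1, cols):
        for r in range(rows):
            M[r][c] += max(M[p][c - 1] for p in (r - 1, r, r + 1) if 0 <= p < rows)
    r = max(range(rows), key=lambda p: M[p][-1])
    seam = [r]
    for c in range(cols - 1, 0, -1):
        r = max((p for p in (r - 1, r, r + 1) if 0 <= p < rows),
                key=lambda p: M[p][c - 1])
        seam.append(r)
    return seam[::-1]


def carve_seams(energy, count):
    """Carve `count` seams, zeroing the energy of each seam before the next pass."""
    energy = [row[:] for row in energy]      # work on a copy
    seams = []
    for _ in range(count):
        seam = best_seam(energy)
        seams.append(seam)
        for c, r in enumerate(seam):
            energy[r][c] = 0                 # assumed meaning of "reset"
    return seams


energy = [[1, 9, 1, 1],
          [8, 1, 2, 1],
          [1, 1, 7, 9]]
first, second = carve_seams(energy, 2)
overlap = [c for c, (a, b) in enumerate(zip(first, second)) if a == b]
```

In this example the second seam avoids every pixel of the first one, but zeroing alone does not guarantee this in general, which is why a collision check is still needed.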

3.2 The decoder-steganalysis

During decoding, the same process is applied. From the stego-image, the LSBs of the three image planes R, G, and B are reset. The RGB image (after the LSB reset) is converted to grayscale values by forming a weighted sum of the R, G, and B components as in (1). From the grayscale image, the gradient image is generated as in (2). At each iteration, a new cumulative maximum energy M is generated with a new threshold, and the MES’s are created from it one by one. The LSBs of the RGB pixels along the MES’s carry the secret data. Extracting and reordering the bits yields the secret message.
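The last step of the decoder, reading the LSBs along a seam and regrouping them into bytes, can be sketched as below. The sketch assumes that the seam is already known (one row index per column), that every seam pixel carries data (the threshold test is omitted), that bits are packed most significant bit first, and it does not parse the 24-bit header. The names `extract_bits` and `bits_to_bytes` are hypothetical.

```python
def extract_bits(image, seam):
    """Read the LSBs of R, G, B for each seam pixel, column by column."""
    bits = []
    for col, row in enumerate(seam):
        bits.extend(channel & 1 for channel in image[row][col])
    return bits


def bits_to_bytes(bits):
    """Pack bits (most significant first) into bytes, dropping an incomplete tail."""
    usable = len(bits) - len(bits) % 8
    return bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, usable, 8)
    )


# 2 x 3 image of (R, G, B) tuples; the seam passes through rows 0, 1, 0
image = [[(10, 11, 12), (0, 0, 0), (30, 31, 32)],
         [(1, 1, 1), (20, 22, 24), (1, 1, 1)]]
seam = [0, 1, 0]
bits = extract_bits(image, seam)
message = bits_to_bytes(bits)
```

Here the three seam pixels yield nine bits; the first eight form the byte 0x41 (the character "A") and the incomplete ninth bit is dropped.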

4 Experimental results

To demonstrate the quality of the process, we emulated the algorithm using a MATLAB script. Several experiments were conducted:

4.1 Lena image as a stego-data

In the first experiment, a small gray-level image of Lena was hidden in the RGB cover image of the Golden Gate Bridge (Fig. 8(a) and (b)). The size of the Lena image is 38 x 38 x 8 = 11,552 bits, plus a 24-bit header that indicates the data type and the image size, yielding 11,576 bits of hidden data. As a cover image we use the Golden Gate Bridge RGB image, at a size of 488 x 664 = 297,472 pixels. The ratio between the hidden message size in bits and the number of pixels in the cover image is about 3.8%. Figure 8(b) represents the iterative process that selects the best MES’s to carry the hidden data in the cover image. The 11,576 secret bits are inserted into eight MES’s. The red pixels along the MES represent pixels which carry the secret data. The blue pixels are not used to carry data, since they are under the calculated threshold (Fig. 8(c)). The length of each MES in the Golden Gate Bridge cover image is 664 pixels, the width of the image. Each pixel can carry three bits, so the maximum capacity of each MES is 1992 bits. Figure 8 shows that the MES lines are formed mainly on the edge lines. In the process of creating the MES’s, the constraint of an eight-connected path may cause intermediate transitions through smooth areas. Using the threshold T specified in (8) restricts the embedding of the hidden information to the desired areas. In our tests, we use a high dynamic threshold with k = 0.9 in (8), which is updated at each iteration, to ensure that the quality of the original image is not degraded. We repeat the same experiment using a Carriage image as the cover image (Fig. 8(d) and (e)).
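The payload and capacity figures above can be checked with a few lines of arithmetic. The variable names are illustrative; the lower bound on the number of seams assumes that every pixel of a seam carries data, which the threshold does not allow in practice.

```python
from math import ceil

payload_bits = 38 * 38 * 8 + 24      # Lena pixels plus the 24-bit header
seam_capacity = 664 * 3              # one pixel per column, three bits per pixel
min_seams = ceil(payload_bits / seam_capacity)
```

The payload would fit into six fully used seams; eight are needed in the experiment because pixels under the threshold are skipped.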

Fig. 8 Embedding Lena image [8(a)] into Golden Gate cover image [8(b)], and into Carriage cover image [8(d)]. The red part of the lines demonstrates the MES’s that carry the data [8(c),8(e)]

Table 1 The amount of data that each MES is carrying

The amount of data that each MES carries for each image is listed in Table 1. The selected threshold influences the number of bits that each MES carries. Since each RGB pixel carries three bits of the stego-data and the secret message contains 11,576 bits, the message occupies 3,859 pixels, each of which changes with probability 7/8. Assuming a binomial distribution, the probability of at least one changed bit in a pixel is given by the cumulative mass function in (10), where n=3 and p=0.5.

$$\begin{aligned} Pr (1\le X \le 3) = \sum _{i=1}^{3} \binom{n}{i} p^i (1- p)^{(n-i)} = \frac{7}{8} \end{aligned}$$
(10)
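The value of (10) and the resulting expected number of changed pixels can be verified numerically. This is a small check under the same binomial assumption (n = 3 independent bits per pixel, each flipping with p = 0.5); the function name is illustrative.

```python
from math import comb


def prob_at_least_one_change(n=3, p=0.5):
    """Probability that at least one of n embedded bits differs from the cover LSB."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(1, n + 1))


prob = prob_at_least_one_change()
expected_changed = prob * 3859       # pixels occupied by the 11,576-bit message
```

Under this model, about 3,377 of the 3,859 occupied pixels would be expected to change.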

The experimental results in Table 2 show that 2218 out of 3,859 pixels were changed in the Golden Gate Bridge image, and 2305 out of 3,859 pixels were changed in the Carriage image, which in both cases is a smaller fraction than the calculated probability of 7/8. Table 2 summarizes the number of bits that changed in each image. The visual impairment was so low that it was impossible for the HVS to distinguish between the original image and the stego-image.

To quantify the difference between the original image and the stego-image, we use three objective image quality assessments.

Table 2 The number of bits that were modified in the pixels that belong to the MES
Table 3 The measures when using Lena image as stego-data

Mean Squared Error (MSE) between images x and y is given by:

$$\begin{aligned} MSE= \frac{1}{N} \sum _{i=1}^{N} (x_i-y_i)^2 \end{aligned}$$
(11)

where N is the number of pixels in the image.

Peak Signal-to-Noise ratio (PSNR) [16, 17] is given by:

$$\begin{aligned} PSNR= 10 \log _{10} \frac{\max (x^2)}{MSE} \end{aligned}$$
(12)

where \(\max (x^2)\) is the square of the maximum pixel value of the image x.

The Structural SIMilarity (SSIM) index [16, 18, 19] between images x and y is given by [20]:

$$\begin{aligned} SSIM(x,y) = \frac{((2 \mu _x \mu _y + C_1)(2 \sigma _{xy} + C_2))}{((\mu _x^2+ \mu _y^2 + C_1) (\sigma _x^2 + \sigma _y^2 + C_2))} \end{aligned}$$
(13)

where \(\mu _x\), \(\mu _y\), \(\sigma _x\), \(\sigma _y\), and \(\sigma _{xy}\) are the local means, standard deviations, and cross-covariance for images x, y. The constants \(C_1\) = \((K_1L)^2\) and \(C_2\) = \((K_2L)^2\) are included to avoid unstable results when either \((\mu _x^2+ \mu _y^2 )\) or \((\sigma _x^2 +\sigma _y^2)\) is very close to zero. L is the dynamic range of the pixel values and \(K_1 \ll 1\), \(K_2 \ll 1\) are small constants.

The MSE and the PSNR have clear physical meanings, but they are not well matched to the perceived visual quality, whereas the structural similarity (SSIM) predicts the perceived image quality [20]. For the Golden Gate image and the Carriage image, the SSIM calculation in this case yields the maximum possible value of 1, which means complete visual similarity between the original image and the stego-image. Table 3 shows the results of the PSNR, MSE and SSIM calculations.
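The two pixel-wise measures, (11) and (12), can be sketched for flattened pixel sequences as follows. The peak value of 255 is an assumption for 8-bit data and stands in for the maximum pixel value of x; the function names are illustrative. SSIM is not included because it requires local window statistics.

```python
from math import inf, log10


def mse(x, y):
    """Mean squared error between two equally long pixel sequences, Eq. (11)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def psnr(x, y, peak=255):
    """Peak signal-to-noise ratio in dB, Eq. (12); infinite for identical inputs."""
    err = mse(x, y)
    return inf if err == 0 else 10 * log10(peak ** 2 / err)


cover = [100, 150, 200, 250]
stego = [101, 150, 199, 250]         # two pixels changed by one level
error = mse(cover, stego)
quality_db = psnr(cover, stego)
```

Changing half of the pixels by a single intensity level gives an MSE of 0.5 and a PSNR slightly above 51 dB, which illustrates why LSB changes confined to a few seams lead to high PSNR values.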

Table 4 The measures when using text as stego-data

The results indicate that the differences between the cover image and the stego-image are not large. The amount of information that can be embedded in the picture at a high level of concealment depends on the target image. The higher the texture, the more information can be embedded in the image. In our study, we used images with large smooth areas and could embed information at a density of about 5% of the image size, at a high level of concealment.

4.2 Text data as a stego-data

The second experiment was conducted with text as the stego-data. The English pangram “The quick brown fox jumps over the lazy dog” is a sentence using every letter of the English alphabet. The 43-character sentence was repeated 33 times and used as stego-data. The data size was 43 chars x 33 lines x 8 bits = 11,352 bits, plus a 24-bit header, yielding 11,376 bits of hidden data that could influence 3,792 pixels in the cover image. (This is about the same amount of information as in the previous experiment with the Lena image.) As cover images we use the same Golden Gate Bridge and Carriage images, as described in Section 4.1. The experimental results show that out of 3,792 pixels, 2,102 pixels were changed. As in Section 4.1, the maximum capacity of each MES is 1992 bits. We ensured visual similarity between the original image and the stego-image by using k=0.9 as a threshold in (8). Table 4 lists the measures between the stego-image and the original image: the MSE, PSNR and SSIM. The SSIM is 1, consistent with the original image and the stego-image being indistinguishable to the human eye.

The results indicate that the differences between the cover image and the stego-image are not large. The higher the texture of the cover image, the more information can be embedded in the image. In this experiment, the hidden information density is about 5% of the image size, with a high level of concealment.

4.3 Iterative process to carve the MES’s

During the iterative algorithm, the MES’s were generated without overlap between the pixels which carry data. For the Golden Gate Bridge image, the algorithm generated 57 possible MES’s. Eight of them fulfill the conditions of maximum energy and continuity along the horizontal axis, and also have values higher than the threshold in (8). These eight MES’s were selected to carry the secret data. Figure 9(a) shows the 57 candidate MES’s on top of the energy image. Figure 9(b) shows the same for the Carriage image. Here the algorithm found, from among 26 possible MES’s, eight which fulfill the conditions and could carry the secret data.

Fig. 9 The dynamic selection of MES’s by generating valid options

Table 5 Quality measures of stego-image with the same stego-data, but with different number of hidden bits per pixel

4.4 Increasing the number of hidden bits per pixel

In order to test the strength of the algorithm and the sensitivity of human vision to the areas in which the hidden information was written, the following two experiments were conducted. In the first experiment, we use the fixed-size image of Lena that was hidden in the RGB cover image of the Golden Gate Bridge (as in Section 4.1). The hidden data of 11,576 bits was embedded into the cover image of 488 x 664 pixels with 24 bits per pixel. We vary the number of hidden bits written per pixel, in order to measure the stego-image quality with different numbers of MES’s. In Table 5, the first row indicates the number of bits that were embedded per pixel. The other rows are the quality measures of the stego-image in each case. The last row is the number of MES’s that contain the hidden data. From the quality measures (PSNR, MSE, SSIM), we can see that a large change in fewer MES’s is more significant than small changes in more MES’s. Since the MES’s are located on the image edges, due to the Mach bands effect, a large change in the values of the MES’s is not perceived by the HVS. Together with a fine threshold of k=0.9, in all the illustrated cases no visible anomaly could be observed in the region of the MES’s.

In the second experiment we used Lena images of different sizes, which were hidden in the fixed-size (488 x 664 pixels) Golden Gate Bridge cover image. In Table 6, the first and the second rows indicate the size of the hidden data that was embedded into the cover image. Because the number of MES’s is kept the same (the last row), increasing the size of the hidden data increases the number of hidden bits written per pixel. The other rows contain the quality measures of the stego-image in each of the cases. From the results, we can see that the quality measures are affected by the amount of hidden information. But even with a large amount of hidden information, the objective measures (PSNR and MSE) are quite good, and the SSIM index also gives a good result. As in the previous experiment, due to the location of the MES’s on the edges as well as the Mach bands effect, a large change in the values of the MES’s is not perceived by the HVS. Together with a fine threshold of k=0.9, in all the illustrated cases no visible anomaly could be observed in the region of the MES’s.

Table 6 Quality measures of stego-image with the different amounts of stego-data and with different numbers of hidden bits per pixel

5 Conclusion and future work

In this paper, we propose a new method for image steganography in the spatial domain. The method is based on LSB substitution, embedding the secret data into RGB images without creating a perceptible distortion. The method uses an energy function to define the saliency map of the image. From the saliency image a cumulative maximum energy matrix is created. The max energy horizontal seams are selected from the cumulative matrix and the secret message is embedded along the seams. The experimental results show that the algorithm has a fair capacity and good invisibility. Some open issues that can be addressed in our future work include: diverse options for generating the saliency image; different approaches for selecting the MES’s; and approaches to defend against attacks intended to destroy or detect the embedded information. Finally, the method presented here could be extended to video and audio signals.