Robust color image watermarking using multi-core Raspberry Pi cluster

Image authentication approaches have attracted considerable interest recently as a way to safeguard transmitted images, and watermarking is one of the most widely used. Digital image watermarking secures digital images and protects their copyright by embedding hidden information that proves ownership. Existing watermarking systems are PC-based, which limits their portability and makes them difficult to use in harsh environments such as military applications. We employ embedded devices such as the Raspberry Pi to overcome the PC’s mobility limitations. In this article, a parallel robust color image watermarking algorithm based on Quaternion Legendre-Fourier Moments (QLFM) in polar coordinates is implemented on the Raspberry Pi (RPi) platform using parallel computing and the C++ programming language. A binary watermark image scrambled with the Arnold transform is embedded in the host image. The watermarking algorithm is implemented and tested on the Raspberry Pi model 4B. Several Raspberry Pis can be combined into a ‘cluster’ (many computers working together as one) for high-performance computation. Message Passing Interface (MPI) and OpenMP are used for parallel programming to reduce the execution time of the color image watermarking algorithm on the Raspberry Pi cluster.


Introduction
Watermarking is an essential image processing application due to the spread of forged sensitive images over the internet. The role of watermarking is to prove the integrity and the copyright of images [28,44].
Invisible watermarking methods fall into three main groups: fragile, semi-fragile, and robust. Robust watermarking is used for copyright protection and ownership identification because it is designed to resist attacks that attempt to erase or damage the mark [33].
Robust image watermarking algorithms have three stages: embedding, detection, and extraction of the watermark [9]. In the embedding stage, a secret invisible signature is embedded in the host image. In the detection stage, the hidden watermark information is detected. Finally, if the invisible signature can be extracted from the watermarked image, the transmitted image is confirmed to be original. How correctly the embedded watermark is extracted from the watermarked host image reflects the quality of the watermarking algorithm used [27].
Natural images are color images, and color information plays a vital role in most computer-based systems; color image feature extraction is the foundation of many image processing applications, including color image watermarking. Color images carry more information than gray-level images. Traditional color image watermarking methods ignore the correlation between the color channels, whereas quaternion orthogonal moments represent color images holistically [14,15].
Moment-based watermarking algorithms are used successfully in robust color image watermarking because quaternion moments can be computed exactly, quickly, and with numerical stability. According to an extensive study [16], quaternion Legendre-Fourier moments outperform the other quaternion moments in robustness against geometric attacks (rotation, scaling, and translation) and common signal processing attacks [17].
Despite the high accuracy of CPU-based watermarking algorithms, their execution time remains an issue in real-time applications. Computing color image moments and reconstructing the image are computationally intensive, especially for higher moment orders and large image sizes. The key approach to this problem is to compute the image moments in parallel. Several attempts have been made to parallelize image moment computations using GPUs, multicore CPUs, and GPU clusters [19,20,32,45,46]. For real-time applications, these approaches are not helpful.
Although many techniques have been used to accelerate watermarking algorithms, they still run on ordinary PCs, whose size and weight limit portability. The large size of PCs rules out many smart-city applications based on watermarking. To overcome these limits, we use embedded systems such as the Raspberry Pi, a small portable platform. The significant advantages of the Raspberry Pi are its small size and low cost compared with a PC [34]. The Raspberry Pi is a single-board computer available in many models, with an open-source platform that can be used in many different projects [10]. The Raspberry Pi model 4B has a Broadcom BCM2711 quad-core Cortex-A72 (ARM v8) 1.5 GHz processor with 1 GB, 2 GB, or 4 GB of LPDDR4-3200 SDRAM. Moreover, the Raspberry Pi has an SD card slot for storage and I/O interfaces such as USB, Ethernet, and HDMI. The low-cost single-board Raspberry Pi is used to reduce system complexity in real-time applications [35].
Recently, much research has been carried out using the credit-card-sized, open-source Raspberry Pi platform in various fields, such as image processing [22,31,39], the Internet of Things (IoT) [2,4,25], home automation [3,21], and many other applications. It is portable and can be controlled via the internet, but the most important reason for its popularity among researchers is its low cost. Power consumption has become an important metric when constructing massive supercomputing clusters. Low-power embedded processors can be used instead of the more common high-end server CPUs to minimize power consumption in large clusters [6].
Because of its low cost and low power consumption, a single Raspberry Pi is easily affordable, which makes it possible to build an inexpensive and energy-efficient cluster; by combining several Raspberry Pis into a cluster, we gain the ability to do more at once [1]. Much research has used the Raspberry Pi platform in fields such as image processing because it is portable, can be controlled via the internet, and is inexpensive [30]. The Raspberry Pi is also used in real-time applications to reduce system complexity [36].
The need for portable, cheap, small, and powerful computers has increased. An effective way to increase computational power is parallel computing with programming models such as OpenMP and Message Passing Interface (MPI) [26]. A Raspberry Pi cluster has many advantages, such as the low cost and portability needed in smart-city applications, and its open-source operating system makes it a preferred choice. The contributions of this paper are summarized as:
The rest of the article is arranged as follows: Section 2 provides a brief description of the color image watermarking algorithm using QLFM, Section 3 presents the implementation of the watermarking algorithm on the Raspberry Pi, Section 4 presents the performance and experimental results of our watermarking scheme, and the conclusion is provided in Section 5.

Quaternion color image
The quaternion, defined by Hamilton [11] as a generalization of complex numbers, is represented as:
$$q = a + b\,i + c\,j + d\,k$$
where i, j, and k are imaginary units and a, b, c, and d are real numbers.
A color image is represented as a pure quaternion:
$$f(x, y) = f_R(x, y)\,i + f_G(x, y)\,j + f_B(x, y)\,k$$
where $f_R(x, y)$, $f_G(x, y)$, and $f_B(x, y)$ refer to the red, green, and blue components of the color pixel, respectively.
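As a minimal sketch of the representation above, the following C++ fragment stores a quaternion as four real numbers and encodes an RGB pixel as a pure quaternion. The type `Quaternion` and the functions `multiply` and `fromRgb` are hypothetical names chosen for illustration; they are not taken from the authors' implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not from the paper): q = a + b*i + c*j + d*k.
struct Quaternion {
    double a, b, c, d;
};

// Hamilton product. It is not commutative: i * j = k, but j * i = -k.
Quaternion multiply(const Quaternion& p, const Quaternion& q) {
    return {
        p.a * q.a - p.b * q.b - p.c * q.c - p.d * q.d,
        p.a * q.b + p.b * q.a + p.c * q.d - p.d * q.c,
        p.a * q.c - p.b * q.d + p.c * q.a + p.d * q.b,
        p.a * q.d + p.b * q.c - p.c * q.b + p.d * q.a
    };
}

// Encode one RGB pixel as a pure quaternion: real part 0, (R, G, B) on (i, j, k).
Quaternion fromRgb(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return {0.0, static_cast<double>(r), static_cast<double>(g), static_cast<double>(b)};
}
```

Treating the three channels as one quaternion value is what lets a moment be computed on the whole pixel at once instead of on each channel separately.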

Host image
The input host image of size N × N in Cartesian coordinates is first converted to the polar domain using cubic interpolation [41].
The idea of the polar conversion is to split the unit disk into M circular rings; each circular ring is divided into several non-overlapping circular sectors ($K_x$) of equal area, because the Cartesian image has a uniform pixel size, as shown in Fig. 1 [18].
The number of circular sectors ($K_x$) in the x-th ring and the radial distance ($R_x$) of the x-th circular ring are:
where the value of M is subject to the constraint $N/2 \le M \le N$, N × N is the color image size, the innermost circular ring is at x = 0, and S is the number of circular sectors in the innermost circle. The value of $\theta_{xy}$ can be calculated as:
The LFM moments in polar coordinates are:
where $M_{pq}$ is the LFM moment, $\hat{f}(r_x, \theta_{xy})$ is the interpolated image, and $I_p(r_x)$ and $I_q(\theta_{xy})$ are the image kernels.
The upper and lower limits of the radial integral are:
The Legendre polynomial $P_p(r)$ can be computed as follows:
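For orientation only, the sketch below evaluates the standard Legendre polynomials on [-1, 1] with Bonnet's three-term recurrence. This is the textbook recurrence, not necessarily the exact form used by the authors: the radial kernel of the LFM may apply a shifted or substituted argument of r, and that mapping is not assumed here. The function name `legendreUpTo` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Returns P_0(x) ... P_maxOrder(x) for the standard Legendre polynomials,
// using Bonnet's recurrence:
//   (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x),  P_0 = 1, P_1 = x.
std::vector<double> legendreUpTo(int maxOrder, double x) {
    assert(maxOrder >= 0);
    std::vector<double> p(maxOrder + 1);
    p[0] = 1.0;
    if (maxOrder >= 1) {
        p[1] = x;
    }
    for (int n = 1; n < maxOrder; ++n) {
        p[n + 1] = ((2.0 * n + 1.0) * x * p[n] - n * p[n - 1]) / (n + 1.0);
    }
    return p;
}
```

Computing all orders up to the maximum in one pass avoids re-evaluating lower orders for every moment, which matters at moment orders such as 40.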

Binary watermark image
The binary 2D watermark image is first scrambled using the Arnold scrambling algorithm and converted to a 1D sequence b2(l). This 1D sequence is then embedded in the interpolated host image; scrambling improves the robustness of the watermark embedding process.
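A minimal sketch of Arnold scrambling follows, assuming the common form of the Arnold cat map, (x, y) → (x + y, x + 2y) mod n, on a square n × n image. The number of iterations is treated here as a secret parameter, which is an assumption; the text does not state how many iterations the authors use. All names (`BinaryImage`, `scramble`, `unscramble`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using BinaryImage = std::vector<std::uint8_t>;  // row-major n x n image, one value per pixel

// One forward pass of the Arnold cat map: (x, y) -> (x + y, x + 2y) mod n.
BinaryImage arnoldStep(const BinaryImage& in, int n) {
    assert(n > 0 && static_cast<int>(in.size()) == n * n);
    BinaryImage out(in.size());
    for (int y = 0; y < n; ++y) {
        for (int x = 0; x < n; ++x) {
            const int nx = (x + y) % n;
            const int ny = (x + 2 * y) % n;
            out[ny * n + nx] = in[y * n + x];
        }
    }
    return out;
}

// One inverse pass: (x, y) -> (2x - y, y - x) mod n.
BinaryImage arnoldInverseStep(const BinaryImage& in, int n) {
    assert(n > 0 && static_cast<int>(in.size()) == n * n);
    BinaryImage out(in.size());
    for (int y = 0; y < n; ++y) {
        for (int x = 0; x < n; ++x) {
            const int ox = ((2 * x - y) % n + n) % n;
            const int oy = ((y - x) % n + n) % n;
            out[oy * n + ox] = in[y * n + x];
        }
    }
    return out;
}

// Apply the map `iterations` times; the row-major result is the 1D sequence.
BinaryImage scramble(BinaryImage img, int n, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        img = arnoldStep(img, n);
    }
    return img;
}

BinaryImage unscramble(BinaryImage img, int n, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        img = arnoldInverseStep(img, n);
    }
    return img;
}
```

Because the map is periodic for a given n, the iteration count should not be a multiple of that period, or the scrambled image equals the original.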

Watermark embedding
After computing the QLFM moments of the host color image, we select moments suitable for embedding the watermark according to two criteria. First, QLFM moments with q = 4m, where m is an integer, are ignored. Second, only moments with positive repetition q are selected, to avoid information redundancy.
After selecting the suitable moments, the binary watermark sequence b2(l) is embedded in them by changing their magnitudes using the dither modulation function [42]:

Watermark extraction
The original image is not needed to extract the binary watermark image from the watermarked image. We follow the same steps as in the embedding process to select the QLFM moments and, with the same quantization step, quantize the moments $M'(l)$. Each binary value is determined to be 0 or 1 by comparing the distance between $M'(l)$ and the quantized $M'(l)$. The resulting binary image is descrambled using the inverse Arnold scrambling algorithm to form the binary watermark image. All the steps of the QLFM watermarking algorithm are summarized in Fig. 2.
On the Raspberry Pi, the color image is first converted to the polar domain using cubic interpolation. Then the QLFM moments of the polar color image are calculated. Suitable QLFM moments are selected for embedding based on the two selection criteria (positive repetition q and q ≠ 4m). Finally, the scrambled binary watermark image is embedded in the selected moments by modifying their magnitudes using the dither modulation function.
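The selection rule and the quantization-based embedding and extraction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' exact function: the dither offsets (0 for bit 0 and half a step for bit 1) are assumed, since the modulation formula is not reproduced in the text, and the names `isSelectedRepetition`, `quantizeForBit`, and `extractBit` are hypothetical. Only moment magnitudes are handled; rescaling the quaternion moment to the new magnitude is omitted.

```cpp
#include <cassert>
#include <cmath>

// Selection rule from the text: keep repetitions q that are positive
// and not multiples of 4.
bool isSelectedRepetition(int q) {
    return q > 0 && q % 4 != 0;
}

// Embedding: move a moment magnitude onto the quantization lattice that
// belongs to `bit`. ASSUMPTION: lattice for bit 0 has offset 0, lattice
// for bit 1 has offset step / 2.
double quantizeForBit(double magnitude, int bit, double step) {
    assert(step > 0.0 && magnitude >= 0.0 && (bit == 0 || bit == 1));
    const double dither = (bit == 0) ? 0.0 : step / 2.0;
    double q = std::round((magnitude - dither) / step) * step + dither;
    if (q < 0.0) {
        q += step;  // a magnitude cannot be negative; use the next lattice point
    }
    return q;
}

// Extraction: quantize the received magnitude onto both lattices and pick
// the bit whose lattice point is closer. The original image is not needed.
int extractBit(double magnitude, double step) {
    const double d0 = std::fabs(magnitude - quantizeForBit(magnitude, 0, step));
    const double d1 = std::fabs(magnitude - quantizeForBit(magnitude, 1, step));
    return (d1 < d0) ? 1 : 0;
}
```

With these offsets, a bit survives any magnitude perturbation smaller than a quarter of the step, which illustrates why a larger quantization step improves robustness at the cost of a lower PSNR.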

Message Passing Interface and OpenMP
Message Passing Interface (MPI) is a standard, implemented as a library, for writing parallel programs that reduce computing time by using a cluster instead of a single computer; a cluster consists of at least two nodes. MPI allows the head node (master node) to distribute the computing task among all the slave nodes in the cluster so that they work in parallel. The programmer must specify the number of processes for any MPI program. Each process runs an identical copy of the program but executes only a specific part of the work; the processes send and receive data using MPI.
MPI supports parallel computation with distributed memory, where each node has its own memory and the nodes are independent of each other. MPI therefore provides the communication between the computing nodes. OpenMP is a programming extension for shared-memory parallel computing. The program forks into several parallel threads that operate on shared memory. When the parallel work is finished, the forked threads rejoin the master thread [40].
OpenMP is an API of compiler directives and library routines for C, C++, and Fortran that supports shared-memory programming on multi-core and multi-processor systems. OpenMP also offers methods for synchronizing the running threads and scheduling tasks [8].
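The fork-join model described above can be illustrated with a small OpenMP loop. The sketch is generic and does not reproduce the authors' code; the function `sumOfSquares` is a hypothetical stand-in for any loop whose iterations are independent, such as accumulating a moment over pixels.

```cpp
#include <cassert>
#include <vector>

// Sum of squares of a vector. When compiled with OpenMP enabled
// (for example -fopenmp with GCC), the loop iterations are shared among
// threads and the per-thread partial sums are combined by the reduction
// clause. Without OpenMP the pragma is ignored and the loop runs sequentially.
double sumOfSquares(const std::vector<double>& v) {
    double total = 0.0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += v[i] * v[i];
    }
    return total;
}
```

The reduction clause gives each thread a private copy of `total`, so the threads do not race on the shared variable; the copies are added together when the threads rejoin.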

Raspberry Pi cluster
Because of its low cost, a single Raspberry Pi is easily affordable and can handle a light workload with low power consumption. To increase performance, we can use a cluster of Raspberry Pis [7].
The Raspberry Pi cluster has several advantages, including portability due to its light weight and small size, low cost, and low power consumption.
For smart-city watermarking applications that need portability, a Raspberry Pi cluster can perform watermarking in a few seconds.
Lightweight, portable clusters are useful in trains, planes, ships, and cars. A mobile cluster can be configured to continue running despite the failure of any cluster node, which matters where repairing the cluster would be difficult, such as in military use.
There is a continually increasing demand for small, portable, low-cost computers with high computational power. Computational power can be increased by parallel computing, and parallel computing on a Raspberry Pi cluster can meet the needs of applications where time is a critical metric; the low cost of the Raspberry Pi cluster makes it a better choice.
By integrating many Raspberry Pis into a cluster, we gain the ability to do more at once. A cluster consists of at least two nodes; to support parallel computing with distributed memory, the Message Passing Interface (MPI) is required. MPI provides communication between the nodes and distributes the computing task among them. The number of processes must be specified. Each process has the same copy of the program, but each node executes only the specific part of the program determined by the master node.
The following items are required (as displayed in Fig. 4) to build our Raspberry Pi cluster:

Experiments results and discussion
The parallel robust QLFM color image watermarking algorithm is implemented in C++ on a cluster of four Raspberry Pi model 4 boards. The RPi cluster reduces the time needed for the color image watermarking algorithm. Several experiments were carried out with different numbers of RPi cluster cores and different image sizes, at moment order 40. The QLFM watermarking algorithm is tested on a set of different host color images, as displayed in Fig. 5.

QLFMs watermarking performance and numerical experiments
This section presents a set of numerical experiments to evaluate the performance of the QLFM watermarking algorithm. We evaluated the algorithm in terms of visual imperceptibility and watermark robustness. The invisibility of the watermark is measured by the peak signal-to-noise ratio (PSNR), whereas the robustness of the watermark is measured against multiple attacks using the bit error rate (BER) of the extracted watermark. The QLFM watermarking algorithm is tested on a set of different host color images and binary watermark images, as displayed in Figs. 5 and 6, respectively.

Watermark invisibility
The invisibility of the watermark in the watermarked image (Fig. 7) is evaluated in Table 1 using the peak signal-to-noise ratio (PSNR) on a color image of size 256 × 256 with the quantization step ranging from 0.1 to 1; increasing the quantization step decreases the average PSNR [29]. The structural similarity index (SSIM) is also calculated to measure the quality of the watermarked image and its similarity to the original [12,37].
The PSNR can be calculated as follows:
where MSE (mean square error) is defined as:
The extracted watermark images are very similar to the original binary watermark image, as displayed in Fig. 8; these results demonstrate the robustness of the QLFM watermarking algorithm against the standard attacks.
The obtained BER values approach zero, which confirms the robustness of the watermarking algorithm.
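As a hedged illustration of the invisibility metric used in this subsection, the sketch below computes the MSE and PSNR with the commonly used definitions, PSNR = 10 log10(255² / MSE) for 8-bit samples. For a color image it assumes the three channels are stored in one interleaved buffer and averaged together; the authors' exact formula for color images is not reproduced in the text, so this averaging is an assumption. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Mean squared error over all samples of two equally sized 8-bit buffers.
double meanSquaredError(const std::vector<std::uint8_t>& original,
                        const std::vector<std::uint8_t>& watermarked) {
    assert(original.size() == watermarked.size() && !original.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < original.size(); ++i) {
        const double diff = static_cast<double>(original[i]) -
                            static_cast<double>(watermarked[i]);
        sum += diff * diff;
    }
    return sum / static_cast<double>(original.size());
}

// PSNR in dB with peak value 255; identical images give infinity.
double psnr(const std::vector<std::uint8_t>& original,
            const std::vector<std::uint8_t>& watermarked) {
    const double mse = meanSquaredError(original, watermarked);
    if (mse == 0.0) {
        return std::numeric_limits<double>::infinity();
    }
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

A higher PSNR means the watermarked image is closer to the host image, which is consistent with the observation that a larger quantization step lowers the average PSNR.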

Bit Error Rate (BER)
The bit error rate (BER) is the ratio between the number of wrongly extracted bits and the length of the embedded bit sequence. Therefore, the lower the BER, the more effective the embedding scheme [23]. The BER of the extracted watermark bits measures watermark robustness:
$$\mathrm{BER} = \frac{B_{error}}{n_{bits}}$$
where $n_{bits}$ is the total number of bits embedded in the watermarked image and $B_{error}$ is the number of incorrectly extracted bits. Applying the BER to our algorithm shows that the extracted watermark image matches the original embedded binary watermark image, without extraction errors.
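The BER definition above translates directly into a few lines of C++. This is a minimal sketch with a hypothetical function name, assuming both bit sequences are stored one bit per element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// BER = (number of incorrectly extracted bits) / (number of embedded bits).
double bitErrorRate(const std::vector<std::uint8_t>& embedded,
                    const std::vector<std::uint8_t>& extracted) {
    assert(embedded.size() == extracted.size() && !embedded.empty());
    std::size_t errors = 0;
    for (std::size_t i = 0; i < embedded.size(); ++i) {
        if (embedded[i] != extracted[i]) {
            ++errors;
        }
    }
    return static_cast<double>(errors) / static_cast<double>(embedded.size());
}
```

A BER of 0 means every bit was recovered, and a BER near 0.5 means the extracted watermark is no better than random guessing.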

Reconstruction accuracy
Numerical experiments were performed on color images to show the reconstruction capabilities for the QLFM watermarking scheme, as shown in Fig. 9. Image reconstruction is an excellent way to assess the utilized algorithm's accuracy, especially for higher-order moments.

Comparison of QLFMs and other quaternion moments
The efficiency of the QLFMs watermarking algorithm is compared with that of the other quaternion moments using numerical tests. After embedding and extracting a 128-bit watermark using QLFMs and the other quaternion moments, the average BER values are obtained, as shown in Table 2.
The average PSNR and SSIM are also computed, as depicted in Table 3. The comparisons explicitly reveal that QLFMs outperform all other quaternion moments. These observations and results convinced us to use this proposed method in our work.

Implementation on a single Raspberry Pi
The first step in parallelizing the watermarking algorithm is profiling the sequential execution time of its stages to identify the time-consuming stages that must be parallelized. The sequential execution time of each step of the QLFM watermarking algorithm on the Raspberry Pi is listed in Tables 4 and 5; these experiments used moment order 40, color images of size 512 × 512 and 256 × 256, and a binary watermark image of size 32 × 32.

Implementation on a four-node Raspberry Pi cluster
As shown in Tables 4 and 5, the two most computationally intensive steps of the QLFM watermarking algorithm are the moment computation and the watermarked image reconstruction. These two steps can be parallelized using Message Passing Interface (MPI) on the Raspberry Pi cluster to reduce the execution time of the watermarking algorithm. The intensive steps are divided into portions distributed over the cluster nodes, and the portion on each node is parallelized over the cores available in that Raspberry Pi. Each node executes only the specific part determined by the master node; each node has its own memory, and the nodes are independent of each other. MPI provides the communication between the computing nodes; the nodes send and receive data using MPI. The parallelized execution time of the intensive computing steps of the QLFM watermarking algorithm on the Raspberry Pi cluster is listed in Tables
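One simple way to divide work into per-node portions, as described above, is a block partition. The sketch below is an assumption about how such a split could be done, not the authors' scheme: the work items might be, for example, the moment orders p = 0..40, and the helper `blockRange` is hypothetical. In an MPI program the `rank` and `workers` arguments would typically come from `MPI_Comm_rank` and `MPI_Comm_size`; MPI itself is left out so the fragment stays self-contained.

```cpp
#include <cassert>
#include <utility>

// Half-open range [first, second) of work items owned by worker `rank`
// out of `workers`. Items are split as evenly as possible; the first
// (total % workers) workers receive one extra item.
std::pair<int, int> blockRange(int total, int workers, int rank) {
    assert(total >= 0 && workers > 0 && rank >= 0 && rank < workers);
    const int base = total / workers;
    const int extra = total % workers;
    const int begin = rank * base + (rank < extra ? rank : extra);
    const int end = begin + base + (rank < extra ? 1 : 0);
    return {begin, end};
}
```

Each node could then loop over its own range and share the iterations among its four cores, for instance with an OpenMP parallel loop, before the partial results are gathered on the master node.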

Speedup and improvement ratio
To measure the efficiency of the watermarking algorithm implementation on the Raspberry pi cluster, we calculate the speedup and the execution time improvement ratio.
1. The speedup is the efficiency metric that determines how much faster parallel execution is than serial execution. It is the ratio of the algorithm's sequential runtime to the time taken by the parallel algorithm to solve the same problem on n processors:
$$S_n = \frac{T_{sequential}}{T_{parallel}(n)}$$
The optimal speedup is 16×, 8×, and 4× for the 16-, 8-, and 4-core clusters, respectively. In practice, this value cannot be reached.
2. The execution time improvement ratio (ETIR) compares the two execution times. It can be calculated as [13]:
Tables 8 and 9 show the speedup and ETIR of the parallelized QLFM watermarking algorithm on the four-node Raspberry Pi cluster. The speedup and ETIR are calculated for different numbers of cores with moment order 40 and color image sizes of 512 × 512 and 256 × 256. Based on the speedups and ETIR values listed in Tables 8 and 9, the parallelized QLFM watermarking algorithm is evidently much faster on the Raspberry Pi cluster. This technique is applicable to smart-city watermarking applications that need portability. Because clusters can be configured to continue running even after one of the cluster nodes fails, the approach also suits situations where repairing the cluster is difficult.

Conclusion
The transmission of sensitive images over the internet is still an issue and a challenge in the face of attackers, and the limited portability of PCs restricts new smart-city applications. Watermarking has become an essential step in securing sensitive transmitted images, and many algorithms exist. One such robust watermarking scheme for color images is based on quaternion Legendre-Fourier moments (QLFM); a detailed analysis and the steps of the watermarking scheme have been presented. To obtain high performance, lower power consumption, portability, and lower cost simultaneously, a cluster of Raspberry Pis can perform the watermarking task in a few seconds. For smart-city watermarking applications that need portability, this technique uses the C++ programming language and parallel computing on a Raspberry Pi cluster, which shows excellent performance compared with an ordinary, expensive PC, given the Raspberry Pi's modest specifications, low cost, and small size.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.