Distributed video coding scheme of multimedia data compression algorithm for wireless sensor networks
The emergence of multimedia data has enriched people's lives and work and has penetrated into education, finance, medicine, the military, communications, and other industries. Text data occupies little space and is transmitted quickly over the network. Multimedia data, however, is rich and therefore occupies ample space; some high-definition multimedia content reaches the GB level, and its network transmission is relatively slow. Compared with traditional scalar data, multimedia data describes the characteristics of events better, but its large volume means it must be compressed. Nodes of wireless multimedia sensor networks have limited ability to process data; traditional compression schemes demand high processing power from nodes and are therefore unsuitable for sensor networks. Distributed video coding, which uses a simple encoder and shifts complexity to the decoder, has thus become one of the hot topics in multimedia sensor network research in recent years. Based on a study of distributed video coding and its related theory, this paper presents a distributed video coding scheme and its improvements. Aiming at the problem that traditional distributed video coding schemes cannot accurately decode regions of severe motion and boundary regions, a distributed video coding algorithm based on gradient-domain ROI is proposed. The algorithm enhances the coding efficiency of severe-motion regions and improves decoded image quality while reducing the code rate, ultimately reducing sensor node energy consumption.
Keywords: Wireless multimedia sensor networks; Data compression; Region of interest; Distributed video coding
Modern science and technology have driven the rapid development of information technology. Since the birth of the first computer in the 1950s, information technology has experienced three typical leaps. In the 1960s, computer systems were adopted in various fields, which promoted progress in computer hardware and software, information processing, and storage technology. The second leap was based on the emergence and widespread application of the Internet. The third leap is marked by the introduction of the concept of wireless sensor networks and the promotion of its applications. Wireless sensor networks (WSNs) [1, 2] appeared in the 1970s, and intensive research on them began only in recent years. Related technical research has attracted wide attention from scholars, and with growing public interest, WSN technology has become closely connected to everyday life. WSNs play an important role in industrial manufacturing [3, 4], health care [5, 6], environmental monitoring [7, 8], intelligent transportation, smart homes [10, 11], military reconnaissance [12, 13], and disaster relief and rescue. Depending on the application environment and the purpose of the system, WSNs employ a variety of different technologies. WSNs collect information about the monitored area through sensor nodes, usually in the form of environmental parameters such as humidity, temperature, and light intensity. With the continuous advancement of human life, people increasingly demand comprehensive multimedia data (including images, sounds, and videos). Obviously, the simple scalar data provided by traditional WSNs can no longer meet the requirements of modern life.
With the development of multimedia information acquisition devices such as camera nodes, portable audio and video equipment, and other related hardware devices, and the urgent need for multimedia information, traditional WSNs have gradually evolved into wireless multimedia sensor networks (WMSNs) [15, 16].
Multimedia data is abundant, and some items reach the GB level. Without effective compression, transmission is prolonged and storage consumes huge space, which is wasteful. Compression of multimedia data has therefore increasingly become a focus of attention. With the continuous development of computer multimedia technology, many high-performance image compression techniques have emerged. Multimedia data compression is one of the key technologies for network development and one of the fastest-developing technologies. So-called multimedia image compression refers to representing as much of the original image information as possible with a minimum amount of data. There are various kinds of redundancy in an image; for example, a picture background contains a large number of identical pixel values, so it is entirely possible to use a suitable strategy and algorithm to store a large background with a small amount of space, achieving compression. In multimedia applications, the amount of digital information is very large, placing high demands on memory capacity, network bandwidth, and computer processing speed. These requirements cannot be met merely by adding hardware; effective compression technology is essential. Data compression technology [18, 19] can be divided into two development stages. The first, from 1977 to 1984, was the stage of theoretical creation and research, during which fundamental theory developed significantly. After 1985, the field entered the practical stage, the second period. During the development of the basic theory, the Lempel-Ziv (LZ77) compression technique was the main focus of research and development.
This technique finds redundant strings and replaces them with shorter symbol marks. The string-based compression technique has been experimentally validated and is a major milestone in data compression technology.
With the continuous development of wireless multimedia communication technology, more and more video applications have emerged, such as wireless multimedia sensor networks, mobile videophones, wireless video surveillance, and wireless PC cameras. Traditional video coding standards (such as MPEG and H.26x) use asymmetric coding: the encoder uses motion estimation and motion compensation to fully exploit the temporal and spatial correlation of video sequences for predictive coding, which occupies a large amount of resources. In general, the coding complexity is 5 to 10 times the decoding complexity. In these new video applications, traditional video coding standards are no longer applicable because of limits on node computing power or node energy. In recent years, a new video codec framework, distributed video coding (DVC) [20, 21, 22], has received wide attention from scholars. At present, the more classical distributed codec solutions mainly include the Wyner-Ziv video coding proposed by Girod and Aaron at Stanford University [10, 11, 12, 13, 14, 15, 16, 17, 18] and the PRISM video coding proposed by Ramchandran et al. at the University of California, Berkeley [19, 20, 21]. Other schemes include layered Wyner-Ziv video coding, the state-free distributed video coding proposed by Umar et al., distributed video coding based on wavelet coding, and multi-view distributed video coding.
Although many classic schemes have superior performance, there is still room for research into reducing the code rate at the encoding end and fully exploiting the source correlation to improve coding efficiency. Many schemes, such as spatial-domain DVC schemes, fail to fully mine the correlation between image frames, and most schemes encode the full frame, which places a large burden on the encoding end. Based on the Slepian-Wolf and Wyner-Ziv theories, the present study proposes a distributed video coding algorithm based on gradient-domain ROI to reduce the code rate and improve the decoding quality.
2 Proposed method
2.1 Multimedia sensor network
Sensor network technology has existed for only a few decades since its birth. Recently, it has received attention from countries and regions including Europe, China, Japan, and South Korea, and its research frontier is constantly expanding. In general, the development trend can be roughly divided into two categories: one is to design wireless sensor networks for special tasks, such as wireless multimedia sensor networks and wireless sensor-actuator networks; the other is to design wireless sensor networks for special working environments, such as underwater and underground environments. Broadly speaking, "wireless multimedia sensor networks" include "traditional scalar sensor networks," "image sensor networks," "audio and video sensor networks," "visual sensor networks," and mixtures of the above types. A WMSN is a new type of sensor network equipped with cameras, microphones, and other sensors that collect multimedia information. These multimedia sensor nodes form a distributed sensing network through self-organization; the nodes cooperatively perceive, collect, process, and transmit multimedia information such as audio, video, still images, and numerical data within the network coverage area, and then deliver the data through multi-hop transmission to the aggregation node for comprehensive and effective environmental monitoring. As a special kind of sensor network, WMSNs share the common features of sensor networks, such as limited node processing capability and limited network resources (computing and communication capability, energy, storage, and bandwidth), large scale, self-organization, multi-hop communication, strong dynamics, application dependence, and data-centric operation. They also have distinct characteristics: (1) multimedia information processing capability, covering audio, video, and images, with enhanced node and network processing power; (2) rich perceived media, with a variety of heterogeneous data; (3) complex processing tasks; and (4) full and effective perception of the environment. When designing wireless multimedia sensor networks, the following must be considered: QoS requirements, high bandwidth requirements, energy consumption, multimedia information fusion, multimedia source coding technology, synchronization and positioning technology, multimedia coverage, network security, and integration with the Internet (IP) and other wireless networks.
Compared with other traditional networks, wireless sensor networks have the following characteristics. (1) Large scale and high density. To obtain complete and accurate information, wireless sensor networks usually deploy a large number of sensor nodes in the monitored area. Many nodes working together improve the monitoring quality of the system, increase the coverage of the monitored area, and give the system strong fault tolerance, ensuring the fault tolerance and invulnerability of the entire system. (2) Limited node and network resources. Because of cost and volume constraints, the power, computing, storage, and communication capabilities of sensor nodes are very limited. At the same time, due to the physical characteristics of the wireless channel, the network bandwidth is low compared with wired networks. Wireless sensor networks therefore have limited node processing power and network resources. (3) Self-organizing and distributed. The deployment and establishment of wireless sensor network nodes does not depend on fixed preset network facilities; each node has self-organizing capability and can configure and manage itself automatically, forming a multi-hop wireless network system through distributed network protocols and algorithms. There is generally no strict control center in the formed network; each node can join or leave the network at any time, and the failure of any single node will not affect the normal operation of the entire network.
WMSNs are roughly composed of three levels. The first level is the bottom layer, mainly composed of multiple nodes that sense and collect multimedia information. Level 1 is usually located in the actual environmental monitoring area, and its main function is to collect the required scalar and multimedia information about the environment. Level 2 consists of aggregation nodes, which aggregate and process the information transmitted by the bottom layer and then forward it to level 3. The main function of level 3, composed of nodes with control capabilities, is to control the whole network according to the received information. These three levels work in harmony, and each has important research significance. Our research is mainly oriented to level 1, with particular attention to reducing its energy consumption.
2.2 Data compression
The carrier of information is data, which is used to record and transmit information. What we actually use is the information carried by the data, not the data itself. Information theory is the theoretical basis of data compression, and information entropy, one of its basic concepts, sets the limit of lossless data compression. Data compression technology continues to develop, and coding methods adapted to various applications are constantly being produced; different compression methods target different types of redundancy in multimedia data. Data compression must follow two basic principles: based on the physiological characteristics of human vision and hearing, the compression-encoded audio-visual signal should still have satisfactory subjective quality when reproduced; and the original source data contains great redundancy, so removing the redundancy does not reduce the amount of information, and the data can still be recovered. In fact, data compression accepts a certain quality loss and derives a simplified expression of a given source in a certain way; it reduces the signal space, such as physical volume, time, and spectrum, so that the signal can be arranged into a given set of information data samples.
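As a small illustration of the entropy limit mentioned above, the following Python sketch (our own minimal example, not from the original paper) computes the Shannon entropy of a byte string. A highly redundant source carries far fewer bits of information per symbol than a uniform one, which is exactly why it compresses well:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per symbol: the lossless compression limit."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A redundant source (mostly one symbol) vs. a uniform source (8 equiprobable symbols).
redundant = b"aaaaaaab"
uniform = bytes(range(8))
print(shannon_entropy(redundant))  # about 0.54 bits/symbol
print(shannon_entropy(uniform))    # 3.0 bits/symbol
```

No lossless code can use fewer bits per symbol on average than this entropy value, which is the sense in which entropy "paves" the theoretical limit of compression.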
Multimedia data can be compressed because three forms of redundancy exist in the original data: coding redundancy, such as coding pixels of very different frequencies with codes of the same length; inter-pixel redundancy, such as temporal or spatial correlation between adjacent pixels; and visual redundancy, which means that human vision is insensitive to sharp changes at image edges, weak in color resolution, sensitive mainly to brightness, and little affected by the distortion introduced by compression and decompression. This data redundancy and these human sensory characteristics constitute the basis of multimedia data compression and determine the direction of compression research. Taking the commonly used bitmap image format as an example, this form of image data has strong correlation between pixels in both the row and column directions, so the overall data redundancy is very large; under the premise of allowing a certain degree of distortion, the image data can be compressed to a large extent. The compression of data is actually an encoding process that encodes and compresses the original data, and decompression is its inverse, restoring the compressed code to the original data; the data compression method is therefore also called the encoding method. Data compression re-encodes the original data to eliminate the extra data in it and reduce the data volume to a minimum, so that images, audio, and video media data can be stored and transmitted efficiently on the computer.
The two basic principles of data compression bear repeating: based on the physiology of human vision and hearing, compression-encoded audiovisual signals should still have satisfactory subjective quality when reproduced; and the original source data has great redundancy, so eliminating the redundancy does not reduce the amount of information, and the data can still be recovered. In fact, data compression accepts a certain quality loss and derives a simplified expression of a given source according to a certain method, organizing signals into a given set of information data samples by reducing the signal space, such as physical volume, time, and spectrum. The indicators that measure the quality of a data compression technique are as follows: (1) the compression ratio, i.e., the ratio of storage required before and after compression, should be large; (2) the recovery effect should be good, restoring the original data as faithfully as possible; and (3) the hardware overhead of implementing the compression should be small. Data compression has developed together with communication and computer technology, and coding methods adapted to different applications are continuously generated. Currently, the commonly used coding methods fall into two categories: redundancy compression, also called lossless compression, and entropy compression, also known as lossy compression. Redundancy compression removes or reduces the redundancy in the data, but this redundancy can be re-inserted, so there is no distortion. This method is generally used to compress text data, where the complete original data must be restored; its disadvantage is that the achievable compression is relatively small. Entropy compression compresses the entropy itself, so there is a certain degree of distortion. It is mainly used to compress data such as sound, images, and dynamic video, where the achievable compression is relatively high.
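As a concrete example of the redundancy (lossless) category, the following Python sketch (illustrative only, not part of the original scheme) implements run-length encoding, which exploits runs of identical symbols and restores the original data exactly:

```python
def rle_encode(data: str) -> list:
    """Run-length encode: store each run of identical symbols as (symbol, count)."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def rle_decode(runs: list) -> str:
    """Inverse of rle_encode: expand each (symbol, count) pair."""
    return "".join(ch * n for ch, n in runs)

sample = "AAAABBBCCD"                 # long runs compress well
encoded = rle_encode(sample)          # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
assert rle_decode(encoded) == sample  # lossless: the original is restored exactly
```

The same idea at a larger scale is what makes a picture with a uniform background, as described earlier, so compressible.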
2.3 Region of interest
In an image of a complex scene, the places that attract the human eye are considered the regions of interest, following the visual habits of human eye physiology. Generally speaking, the human eye is more sensitive to areas with stronger color contrast, more obvious image features, and more obvious gradients and textures. For a given image, the middle area also attracts more attention than the area at the image edges. In the field of security video surveillance, faces and moving objects are often taken as regions of interest. In the field of intelligent transportation, the most interesting area is the vehicle license plate, especially in crowded cities such as Beijing and Shanghai, where regulations impose daily driving restrictions. If police had to check the license plate of every car every day, the task would be extremely heavy; handing this cumbersome task to a camera acquisition system makes it much simpler. Nowadays, monitoring the driver through the car window has also become a region of interest: the system determines whether the driver is making a phone call, and facial analysis is used to detect fatigue and issue automatic reminders. Vehicle detection is likewise a region of interest for intelligent traffic monitoring; many cities already analyze road congestion by detecting and analyzing vehicles. In the medical field, lesions such as tumors are detected as regions of interest, as are thrombi in vessel walls related to myocardial infarction. The selection of regions of interest varies widely across application fields and requires case-by-case analysis. The development of automatic region-of-interest extraction will certainly promote economic development and social progress.
Currently, there are several main methods for extracting regions of interest. The first is manual calibration, whose extraction accuracy is high but highly susceptible to subjective factors; for today's massive image data, manual processing is unrealistic, so this method is rarely used. The second is traditional image segmentation, which is simple and fast and works well for extracting the region of interest in specific settings; its downside is that it only works for specific scenes and lacks versatility. The third is extraction based on image features, which is suitable for complex scenes but requires training image feature templates beforehand, an extremely complicated process; currently, mature templates are limited to human faces and eyes. In recent years, extraction methods based on the human visual mechanism have become more popular: following the visual habits of the human eye, they automatically find the areas of an image to which the eye is sensitive. This approach suits region-of-interest extraction in complex scenes, but its extraction accuracy still needs improvement.
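The gradient-based idea behind such automatic extraction can be sketched in a few lines of Python with NumPy. This is a minimal sketch, not the paper's actual gradient-domain ROI algorithm: the finite-difference gradient and the threshold value are our own illustrative assumptions.

```python
import numpy as np

def gradient_roi_mask(frame: np.ndarray, threshold: float = 30.0) -> np.ndarray:
    """Mark pixels whose gradient magnitude exceeds a threshold as ROI."""
    f = frame.astype(np.float64)
    gy, gx = np.gradient(f)        # finite-difference gradients along rows, columns
    magnitude = np.hypot(gx, gy)   # per-pixel gradient magnitude
    return magnitude > threshold   # boolean ROI mask

# Flat background with one bright square: only the square's edges have large
# gradients, so only they are flagged as region of interest.
frame = np.zeros((16, 16))
frame[4:12, 4:12] = 255.0
mask = gradient_roi_mask(frame)
assert mask.any() and not mask.all()
```

A real system would follow this with morphological cleanup and temporal consistency checks, but the core selection criterion, large local gradient, is the same.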
2.4 Distributed video coding
Distributed video coding is based on the Slepian-Wolf and Wyner-Ziv theories: two or more correlated sources are encoded independently, and a single decoder then jointly decodes all encoded sources by exploiting the correlation between them. The difference from traditional video coding is that traditional standards fully exploit the redundant information of the video signal at the encoding end; their coding complexity is generally 5 to 10 times the decoding complexity, which suits cases where a video is encoded once and decoded many times (such as video broadcasting, video on demand, and video disc storage). Distributed video coding, by contrast, has low encoder complexity, low power consumption at the coding end, and good fault tolerance, which suits wireless video terminals with limited computing power, memory capacity, and power budget (such as wireless video surveillance systems and video sensor networks).
The distributed coding method can be viewed as the source information passing through a virtual noisy channel to produce the side information; the source is then recovered from information correlated with it (for example, from the transmitted index together with the side information, through channel decoding). This leads to a coding method based on two-step quantization (fine quantization and coarse quantization): the idea of coset coding is to decompose the entire coding space into multiple coding sets (cosets) whose element spacing is at least twice the maximum distortion of the side information. The encoder first determines the coset of the finely quantized source value, encodes the index of that coset, and transmits it to the decoding end. The decoder, using the side information value, restores the coset indicated by the index and selects the element of that coset closest to the side information as the decoded value, which can be shown to equal the quantized source value when the side information error is within the design bound.
Fine quantization and coarse quantization pull coding performance in opposite directions. The smaller the fine quantization step, the smaller the resulting quantization noise and the higher the signal-to-noise ratio; but a smaller step produces a larger set of code words, so, for a given maximum distortion, more cosets are required and a higher data rate must be transmitted. Conversely, the larger the fine quantization step, the greater the distortion of the decoded result. With the set generated by fine quantization fixed, the larger the coarse quantization step, the more cosets are generated and the greater the distance between the elements of each coset, so the probability of correct decoding is greater, but the required rate also increases correspondingly. The smaller the coarse quantization step, the fewer the cosets and the lower the required transmission rate, but the spacing within a coset decreases and may fall below twice the maximum distortion, leading to decoding errors.
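The two-step quantization and coset decoding described above can be illustrated with a toy scalar example in Python. The step size, coset count, and search window here are illustrative assumptions; a practical Wyner-Ziv codec would realize the same idea with channel codes such as LDPC rather than explicit scalar cosets.

```python
def encode_coset(x: float, step: float, num_cosets: int) -> int:
    """Fine-quantize x, then transmit only its coset index (fewer bits than the bin)."""
    q = round(x / step)        # fine quantization bin
    return q % num_cosets      # coarse coset index actually sent

def decode_coset(coset: int, side_info: float, step: float, num_cosets: int) -> float:
    """Pick the member of the transmitted coset closest to the side information."""
    q_near = round(side_info / step)
    # Search fine bins around the side information that belong to the coset.
    best = min(
        (q for q in range(q_near - num_cosets, q_near + num_cosets + 1)
         if q % num_cosets == coset),
        key=lambda q: abs(q * step - side_info),
    )
    return best * step

x = 10.2   # source value at the encoder
y = 9.6    # correlated side information available only at the decoder
idx = encode_coset(x, step=1.0, num_cosets=4)
x_hat = decode_coset(idx, y, step=1.0, num_cosets=4)
# Decoding succeeds because |x - y| is well within half the coset spacing.
assert abs(x_hat - 10.0) < 1e-9
```

If the side information error exceeded half the coset spacing (here, 2 fine steps), the decoder would snap to the wrong coset member, which is exactly the decoding-error regime the paragraph above describes.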
3 Experimental results
In this simulation experiment, key frames are encoded and decoded with JPEG; Wyner-Ziv frames (W frames) are encoded with an LDPC code at the encoding end and decoded with EM iterative decoding and LDPC decoding for the different regions at the decoding end; finally, a spatial-domain deblocking filtering algorithm is applied to the decoded image. The check matrix of the LDPC code is generated by the PEG method, and the code rate of the LDPC code in this algorithm is 7/8. Two standard video sequences, salesman and foreman, are used. The image format is QCIF (176 × 144), and 100 frames are encoded (30 frames/s). The gradient-domain ROI discrimination algorithm with Wyner-Ziv coding is compared against the H.263+ intraframe coding ("I + I + I + I") algorithm, the H.263+ interframe coding ("I + P + I + P") algorithm, and the traditional JPEG coding algorithm. The H.263+ encoder uses TMN8, and the JPEG coder uses Annex K of the standard.
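Decoded image quality in such rate-distortion comparisons is commonly reported as PSNR. The following is a minimal sketch of the metric itself (not the paper's evaluation code); the flat test frame and noise level are illustrative assumptions.

```python
import numpy as np

def psnr(original: np.ndarray, decoded: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a decoded frame."""
    mse = np.mean((original.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

# QCIF-sized (176 x 144) flat frame with mild Gaussian noise as a stand-in decode.
orig = np.full((144, 176), 128.0)
noisy = orig + np.random.default_rng(0).normal(0.0, 2.0, orig.shape)
print(round(psnr(orig, noisy), 1))   # around 42 dB for noise sigma = 2
```

Higher PSNR at the same code rate indicates a better-performing codec, which is how the comparison above is judged.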
Compared with a Wyner-Ziv video coding system that uses only LDPC decoding, the gradient-domain ROI discrimination algorithm increases the average decoded quality, because it accurately extracts the regions of vigorous motion; applying lossless entropy coding to these regions not only reduces the code rate but also strengthens their encoding, greatly reducing the effect of inaccurate motion estimation at the decoding end. On this basis, in order to make full use of the redundant information within frames and improve decoding accuracy, the decoding end adopts LDPC iterative decoding with the BP algorithm to improve the decoded image quality.
With the continuous development of video surveillance applications, distributed video coding has attracted more and more attention. It minimizes the complexity of the encoder while maintaining compression performance. Although the performance of distributed video coding still falls somewhat short of traditional coding under the same operating conditions, this has not reduced researchers' confidence. Researchers continue to look for new coding frameworks, and they have improved the various modules of the original coding algorithms so that these improvements together yield a significant gain in performance.
The data compression performance of a multimedia video compression system depends largely on the compression method. The distributed video codec is a special coding scheme applied to WMSNs; it not only has the advantage of simple encoding but also retains the general characteristics of traditional data compression. As a new type of wireless sensor network, wireless multimedia sensor networks still face many unsolved problems. In current wireless multimedia sensor networks, not only node capabilities but also network resources are limited. The traditional distributed video coding scheme has a simple encoding end, which makes it well suited to wireless sensor networks, but it does not distinguish motion and boundary regions at the encoder and uniformly encodes all regions, causing distortion at the decoder when estimating motion and boundary regions.
Although current data compression technology has developed dramatically and is widely used in real life, multimedia data compression still needs further research; in particular, the principles and performance of typical algorithms must be analyzed and compared continuously in pursuit of a better compression effect, so that the technology serves its main purpose of serving people. Breakthrough progress in this technology enables the rapid development of multimedia technology, which in turn is of great significance for promoting the development of network technology and further expanding the application space of multimedia technology. Against the research background of wireless multimedia sensor networks, this paper focuses on the multimedia data processing carried out by multimedia sensor networks and takes the compression of multimedia data (audio and video) as its research direction. The main research content is distributed video codec technology, based on the Slepian-Wolf and Wyner-Ziv theories. Building on the traditional distributed codec scheme, this study proposes a distributed video coding algorithm based on gradient-domain ROI and demonstrates the effectiveness of the algorithm.
The author would like to thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.
About the author
Ning Ma was born in Liaoyuan, Jilin, People's Republic of China, in 1977. She received her bachelor's degree from Shandong University, People's Republic of China. She now works in the College of Science, China University of Petroleum. Her research interests include the numerical solution of PDEs, computer software and simulation, and big data algorithm analysis. E-mail: firstname.lastname@example.org
The author NM wrote the first version of the paper. The author read and approved the final manuscript.
This work was supported by the Basic Subjects Fund of China University of Petroleum (Beijing).
The author declares that there are no competing interests.
- 6.W.Y. Chung, S.C. Lee, S.H. Toh, WSN based mobile u-healthcare system with ECG, blood pressure measurement function. Conf. Proc. IEEE. Eng. Med. Biol. Soc 2008, 1533–1536 (2008)Google Scholar
- 11.A. Alaiad, L. Zhou, Patients' behavioral intentions toward using WSN-based smart home healthcare systems: an empirical investigation, in 2015 48th Hawaii International Conference on System Sciences (IEEE, 2015), pp. 824–833Google Scholar
- 13.P. Zhang, F.M.J. Willems, L. Huang, Investigations of noncoherent OOK based schemes with soft and hard decisions for WSNs, in 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (IEEE, 2011), pp. 1702–1709Google Scholar
- 14.X. Cui, X. Chen, S. Wei, et al., The application of robot localization and navigation based on WSN in the disaster relief. Commun. Comput. Inf. Sci 320, 621–628 (2013)Google Scholar
- 15.I. Lee, W. Shaw, X. Fan, Wireless multimedia sensor networks. J. Comput. Res. Dev 46(3), 425–433 (2009)Google Scholar
- 19.J. Rajski, J. Tyszer, Test data compression and compaction for embedded test of nanometer technology designs, in Proceedings 21st International Conference on Computer Design (IEEE, 2003), pp. 331–336Google Scholar
- 24.I.A. Umar, Z.M. Hanapi, A. Sali, et al., Towards overhead mitigation in state-free geographic forwarding protocols for wireless sensor networks. Wirel. Netw 6, 1–14 (2018)Google Scholar
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.