5.1 Status of Research on Emergency Interaction Technologies

As an island nation, Japan experiences many earthquakes, typhoons, tsunamis, and other natural disasters every year, which has made it one of the countries that place the greatest emphasis on emergency management. To respond effectively to various disasters, Japan has built a series of emergency communication systems to serve disaster prevention and mitigation and has achieved remarkable results. Besides the central disaster prevention wireless network and the fire prevention wireless network, which are operated mainly by the government, there are also disaster prevention wireless networks run mainly by autonomous bodies and local residents, and the emergency communication network as a whole is very well developed.

Some developed countries started earlier in the field of emergency communication and have formed their own distinctive emergency management mechanisms and technical support systems, and their mature, full-featured emergency communication products have been proven through long periods of actual use. However, the command and control models of communication systems at home and abroad differ, so such systems cannot simply be introduced and applied directly. In addition, imported equipment is not suitable for nationwide deployment, since it is expensive and has high maintenance costs.

Emergency communications in China began in the 1970s and were initially used for strategic reserve work, serving as a communications backup for the government and the military in special situations. As the national focus shifted to economic construction, special communications gradually evolved into emergency communications under the policy of combining civilian and military applications, and developed rapidly in the 1990s.

At present, China's emergency communication business is at a stage of rapid development. Since 2006, governments at all levels nationwide have been comprehensively building emergency platform systems at the request of the State Council, with the provincial and municipal levels establishing emergency command centers and the county level deploying small mobile emergency platforms. Some regions have completed the construction and deployment of large, medium, and small emergency platforms, which played an important role in dealing with the Wenchuan earthquake, the Yushu earthquake, the Zhouqu mudslide, and other public emergencies. Fig. 5.1 shows the current power emergency communication interaction system in China.

Fig. 5.1
An illustration of power emergency communication between power source, satellite, device, emergency repair site, electricity repair vehicle, and emergency repair center.

Power emergency communication interaction system diagram

However, the small mobile emergency equipment currently available in the country is either poorly integrated and requires extensive cabling work, or too large and heavy to be portable, or equipped with only a single, inflexible communication interface that cannot cope with complex on-site situations. Given these shortcomings, there is still considerable room for research and improvement.

5.2 Power Emergency Site Real-Time Interaction Technology

5.2.1 Audio and Video Codec Technology

Given the maturity of audio and video codec technology, we plan to use the H.264 video codec algorithm and the G.729 audio codec algorithm for efficient real-time encoding and decoding of remote-assistance audio and video streams. By adapting the codec algorithms to different technical architectures, we can support cross-platform terminals.

  1. (1)

    H.264 video stream encoding and decoding technology

The H.264 video sequence is divided into groups of pictures (GOP), pictures, groups of slices, slices, macroblocks, sub-blocks, and blocks based on its coding hierarchical structure. The encoding and decoding process is as follows:

  1. 1)

    Image group

The video sequence is first divided into a series of picture groups, each of which contains multiple images. The number of images can be fixed or set according to actual needs, and the arrangement of frame types within each picture group can be the same or different. The structure of the picture group is shown in Fig. 5.2.

Fig. 5.2
A 3-D diagram of picture group structure. It consists of G O P 1, G O P 2, and so on. Each group of pictures has only one I-frame, several B-frames, and P-frames.

Image group structure diagram

Each picture group contains only one I-frame and several B-frames and P-frames. The I-frame uses intra-frame prediction coding, which has a comparatively low compression efficiency, while the B-frames and P-frames mainly use inter-frame coding, which has a comparatively high compression ratio. Although the compression ratio of B-frames and P-frames is high, too many inserted B-frames and P-frames will lead to error accumulation and affect the encoding quality, especially in the case of video sequences with intense motion and scene changes, which are more likely to cause mosaic images. Therefore, the length of the picture group needs to be set according to the actual application, and the number and arrangement of inserted B-frames and P-frames need to be determined.

  1. 2)

    Image

The H.264 standard supports frame, field, and frame-field adaptive coding. When field coding is used, a frame is divided into two parts, the top field and the bottom field, which are encoded separately. A picture is therefore not just a single physical image but may be a frame or a top or bottom field. Since both unidirectional prediction (as in P frames) and bidirectional prediction (as in B frames) are used in predictive coding, the coding order and the display order of pictures differ. In display order, P frames appear after the reference I frame, while B frames appear between the reference I frame and the P frames. In coding order, the P frames follow the I frame used as their reference, while the B frames follow the I and P frames used as their references, as shown in Fig. 5.3.

Fig. 5.3
A 3-D diagram of reference picture sequence. It consists of G O P 1, G O P 2, and so on. Each group of pictures has an I-frame followed by P-frames and then B-frames.

Reference image sequence

  1. 3)

    Slice group

The concept of image type is related to encoding, while the slice is the largest coding unit in H.264. An image can be divided into one or more slices, each containing several macroblocks; slices can be of type I, P, B, SP, or SI. An I slice uses only intra-frame prediction, a P slice can use intra-frame and forward inter-frame prediction, and a B slice can use intra-frame, forward, and bidirectional inter-frame prediction. The SP and SI types are extensions used for switching between different bitstreams. Each slice group contains one or more slices and defines the mapping between slices and the image; the decoder restores the decoded macroblocks into a complete image according to the slice-group mapping.

  1. 4)

    Macroblock

The macroblock is the basic coding unit of the H.264 standard, with a size of 16 × 16 pixels. The H.264 encoding process can be simplified to the process of encoding each macroblock in the image. Each macroblock belongs to a slice, and the slice type determines the prediction coding method used for its macroblocks.

According to the intra-frame and inter-frame coding methods, a 16 × 16 macroblock can be subdivided into several sub-blocks, each of which corresponds to a different prediction mode. Intra-frame prediction can use two different sub-blocks of 16 × 16 and 4 × 4, and four prediction modes can be used in the 16 × 16 mode, while nine prediction modes can be used in the 4 × 4 mode. During encoding, all predictions need to be calculated, and the optimal prediction mode for the macroblock is selected based on the error calculation standard. Compared with intra-frame prediction, inter-frame prediction is more complex. There are not only four modes of 16 × 16, 16 × 8, 8 × 16, and 8 × 8, but also three modes of 8 × 4, 4 × 8, and 4 × 4 for each 8 × 8 sub-block mode. During encoding, all partitions of each 16 × 16 macroblock are calculated for error, and the partition with the least error is selected for encoding.
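As a simplified illustration of the mode-selection idea just described, the following Python sketch compares a few intra 4 × 4 prediction modes for one block (vertical, horizontal, and DC only, not the full set of nine H.264 modes) and picks the one with the smallest sum of absolute differences (SAD); the block data and neighboring reference pixels are hypothetical.

```python
import numpy as np

def intra4x4_predictions(top, left):
    """Build a few simplified 4x4 intra predictions from neighboring pixels.

    top  : the 4 reconstructed pixels directly above the block
    left : the 4 reconstructed pixels directly to the left of the block
    Only three of H.264's nine 4x4 intra modes are sketched here.
    """
    vertical   = np.tile(top, (4, 1))                   # copy the top row downwards
    horizontal = np.tile(left.reshape(4, 1), (1, 4))    # copy the left column rightwards
    dc         = np.full((4, 4), round((top.sum() + left.sum()) / 8))
    return {"vertical": vertical, "horizontal": horizontal, "dc": dc}

def best_mode(block, top, left):
    """Pick the candidate mode with the smallest SAD against the actual block."""
    candidates = intra4x4_predictions(top, left)
    sads = {name: int(np.abs(block - pred).sum()) for name, pred in candidates.items()}
    mode = min(sads, key=sads.get)
    return mode, sads[mode]

# Hypothetical data: a smooth block whose rows resemble the pixels above it.
top   = np.array([100, 102, 104, 106])
left  = np.array([100, 101, 102, 103])
block = np.tile(top, (4, 1)) + np.random.randint(-1, 2, (4, 4))
print(best_mode(block, top, left))   # typically ('vertical', <small SAD>)
```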

The H.264 codec is composed of several functional modules, including prediction, transformation, quantization, and others, and its process structure is shown in Figs. 5.4 and 5.5.

Fig. 5.4
A block diagram. It has F n current, F prime n minus 1 current reference, F prime n reconstruction followed by connections of 5 blocks and 2 summing points that give D n to T to Q to reordering to entropy to give N A L. X from Q goes to Q inverse to T inverse and then to summing point 2.

H.264 encoder structure

Fig. 5.5
A block diagram. F prime n minus reference to M C followed by a switch from which D prime n goes to the summing point. N A L goes to entropy to reordering which gives X to Q inverse followed by T inverse to the summing point. It gives u F prime n to frame prediction and filtering to reconstruction.

H.264 decoder structure

It can be seen from the functional module structure of the encoder that H.264 also uses a hybrid coding method combining transformation and prediction. It processes the images of the input frame or field in macroblocks. If intra-frame prediction is used, the predicted value is formed from samples that have already been encoded and reconstructed within the current slice; if inter-frame prediction is used, the predicted value is obtained by motion compensation from reference images that have already been encoded. To improve prediction accuracy and compression ratio, the reference image actually used can be selected from past or future frames (in display order) that have been encoded, decoded, reconstructed, and filtered. In other words, frames that take part in encoding may also go through decoding, filtering, and other reconstruction processes, and then serve as reference frames for subsequent encoding.

The H.264 decoder essentially implements the reconstruction chain of the encoder: the input NAL data stream is processed by entropy decoding and related operations to obtain a set of transform coefficients X, which are then inverse quantized and inverse transformed, and operations similar to the reconstruction chain are performed to output the decoded image. A key to reconstructing the decoded image correctly is passing the parameters used during encoding to the decoder, so the sequence parameter set (SPS) and the picture parameter set (PPS) are essential for correct decoding.

  1. (2)

    G.729 Audio Stream Encoding and Decoding Technology

G.729 is an 8 kbit/s speech coding standard that uses Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP). It was published by the ITU Telecommunication Standardization Sector (ITU-T) in 1996.

The G.729 encoder is based on the Code-Excited Linear Prediction (CELP) coding model. It operates on 10 ms speech frames, each corresponding to 80 samples at a sampling rate of 8000 samples per second. For each 10 ms frame, the speech signal is analyzed to extract the CELP model parameters (linear prediction filter coefficients, adaptive and fixed codebook indices, and gains), which are encoded and transmitted. At the decoder, these parameters are used to recover the excitation and the synthesis filter parameters, and the speech is reconstructed by passing the excitation through a short-term synthesis filter. The short-term synthesis filter is based on a 10th-order linear prediction (LP) filter, and the long-term, or pitch, synthesis filter is implemented using a so-called adaptive codebook. After the reconstructed speech has been computed, it is further enhanced by a post-filter.
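The frame arithmetic above is easy to check; the short sketch below (purely illustrative) computes the sample and bit counts that G.729 works with, assuming the standard 8 kHz sampling rate, 10 ms frames, and 5 ms subframes.

```python
SAMPLE_RATE_HZ = 8000    # G.729 input sampling rate
BIT_RATE_BPS   = 8000    # 8 kbit/s
FRAME_MS       = 10      # analysis frame length
SUBFRAME_MS    = 5       # each frame is split into two subframes

samples_per_frame    = SAMPLE_RATE_HZ * FRAME_MS // 1000      # 80 samples
samples_per_subframe = SAMPLE_RATE_HZ * SUBFRAME_MS // 1000   # 40 samples
bits_per_frame       = BIT_RATE_BPS * FRAME_MS // 1000        # 80 bits per coded frame

print(samples_per_frame, samples_per_subframe, bits_per_frame)   # 80 40 80
```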

The audio coding principle is shown in Fig. 5.6. As the figure shows, the process of turning speech into transmission parameters is as follows: linear prediction coefficients are obtained from the preprocessed speech signal. In the encoder, the perceptually weighted speech signal serves as the target signal for the adaptive and fixed codebook searches. The target signal for pitch analysis is computed to determine the pitch period, and the fixed codebook search is then applied to the residual target obtained after the pitch contribution has been removed. The adaptive and fixed codebook excitations are passed through the synthesis filter to reconstruct the target signal; this process is the same as the speech synthesis process in the decoder. Finally, the parameters are encoded and transmitted.

Fig. 5.6
A block diagram. Input voice to pre-processing, L P analysis, quantification, interpolation, synthesis filter, summing point, perceptual weighing, pitch analysis, fixed codebook search, gain quantization, and parameter coding gives transferred Bitstream. It has 2 more codebooks and a summing point.

Encoding principle of G.729 encoder

During decoding, the decoder first extracts the parameter indices from the received bitstream. These indices are decoded to obtain the coding parameters for a 10 ms speech frame, including the LP coefficients, two fractional pitch delays, two fixed codebook vectors, and two sets of adaptive and fixed codebook gains. The LP coefficients are interpolated for each subframe and converted to LP filter coefficients. Then the following steps are performed for each subframe (a simplified sketch of these steps is given after the list):

  1. 1)

    Construct the excitation by adding the adaptive codebook vector and the fixed codebook vector which are scaled by their respective gains.

  2. 2)

    Reconstruct speech by filtering the above excitation through the LP synthesis filter.

  3. 3)

    The reconstructed speech signal goes through a post-processing stage which includes an adaptive post-filter based on long-term and short-term synthesis filters, a high-pass filter, and a scaling operation.
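As a simplified, illustrative sketch of steps 1) and 2) above (gain decoding details and the post-filter of step 3) are omitted), the following Python fragment builds the excitation from hypothetical codebook vectors and gains and passes it through a 10th-order LP synthesis filter; all numeric values are made up for demonstration.

```python
import numpy as np

SUBFRAME = 40   # samples per 5 ms subframe at 8 kHz

def lp_synthesis(excitation, a, memory):
    """All-pole synthesis: s[n] = e[n] - sum_{k=1..10} a_k * s[n-k], with a[0] holding a_1."""
    out = []
    hist = list(memory)                 # last 10 synthesized samples, most recent last
    for e in excitation:
        s = e - sum(a[k] * hist[-1 - k] for k in range(10))
        out.append(s)
        hist = hist[1:] + [s]
    return np.array(out), hist

# Step 1): excitation = gain_p * adaptive codevector + gain_c * fixed codevector
adaptive = np.random.randn(SUBFRAME)            # hypothetical adaptive codebook vector
fixed = np.zeros(SUBFRAME)
fixed[[3, 17, 29, 35]] = [1, -1, 1, -1]         # sparse, ACELP-like pulse pattern
gain_p, gain_c = 0.8, 1.5                       # hypothetical decoded gains
excitation = gain_p * adaptive + gain_c * fixed

# Step 2): filter the excitation through the 10th-order LP synthesis filter
a = 0.1 * np.random.randn(10)                   # hypothetical LP coefficients a1..a10
speech, memory = lp_synthesis(excitation, a, memory=[0.0] * 10)
```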

5.2.2 Audio–Video Synchronization Technology

How to fairly provide streaming media services and traditional data services on TCP/IP networks is a core issue that needs to be considered by the network transmission control protocol. In addition, the delay, jitter, network congestion and other factors in TCP/IP network transmission cause a mismatch between the sending speed of the streaming media sender and the receiving speed of the receiver, which has an important impact on the quality of real-time media services, leading to media jitter (discontinuity) and poor matching between media (asynchrony) during media playback at the receiving end. Research on real-time multimedia transmission and synchronization is an important step to ensure the quality of multimedia services and is one of the key technologies in multimedia research.

The multi-party audio and video synchronization technology based on relative timestamps processes the related media separately, that is, the audio and video streams are handled independently. The usual method is to use the initial playback time and the RTP timestamp to coordinate the playback of audio and video data at the receiving end. When the receiving end receives a frame of data, it calculates the difference between that frame's timestamp and the initial RTP timestamp, as well as the difference between the current time and the initial playback time, to determine whether the frame should be played and whether it is synchronized. However, as new data keeps arriving, the receiver must continuously calculate and compare the time difference for every audio and video frame. In this method there is no direct synchronization relationship between audio and video; each stream merely references its own timestamps and controls its own playback against the same local clock. Moreover, when the playback of the audio or the video falls behind, multiple RTP packets must be processed in succession to find a new synchronization point, that is, a frame that can be matched to the local clock. A method is therefore needed that directly reflects the synchronization relationship between audio and video and that can quickly find a new synchronized playback point when their playback rates diverge.

Video capture and audio capture theoretically start at the same time, but because the program executes sequentially, their actual start times inevitably differ, and so do their encoding start times. The approach is as follows: first, start the video capture thread and then immediately start the audio capture thread, recording the time difference between the two capture start times; next, calculate the number of video frames to discard based on the video and audio capture rates, discard those frames, and then start the video and audio encoding threads. In this system the video frame rate is 29.97 frames per second and the audio frame length is 10 ms. The encapsulated video RTP packets and audio RTP packets are stored in the sender's video and audio send buffers respectively, and video and audio packets with the same sequence number are sent at the same time.

If a pure timestamp method is used to transmit the data, there is no need to alter the data stream or attach a separate synchronization channel. The drawbacks are that choosing the relative timestamp and defining the timestamp operations are rather complex, and the synchronization operations introduce overhead. In addition, when the transmission of video or audio packets lags, multiple RTP packets must be processed in succession to find a new synchronization point. The proposed method instead directly reflects the synchronization relationship between the video and the audio, and when their transmission speeds differ, a new synchronized playback point can be found quickly: an index to the video packet is carried in the RTP packet header of each audio frame. This index indicates the sequence number of the first RTP packet of the video frame to be played simultaneously with that audio frame, and it can be implemented through the extension structure of the RTP packet header.
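A minimal sketch of how such an index might be carried, assuming the generic RTP header-extension layout of RFC 3550 (a 16-bit profile-specific identifier, a 16-bit length in 32-bit words, then the extension data); the profile identifier 0x1000 used below is a hypothetical, locally defined value, and the payload bytes are placeholders.

```python
import struct

def build_audio_rtp_packet(seq, timestamp, ssrc, payload, video_index,
                           payload_type=18):    # 18 = static RTP payload type for G.729
    """Pack an RTP packet for one audio frame whose header extension carries the
    sequence number of the first RTP packet of the video frame to be played with it."""
    version, padding, ext, cc, marker = 2, 0, 1, 0, 0
    byte0 = (version << 6) | (padding << 5) | (ext << 4) | cc
    byte1 = (marker << 7) | payload_type
    header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

    # Header extension: 16-bit profile id, 16-bit length in 32-bit words,
    # then one 32-bit word holding the 16-bit video packet index (zero-padded).
    ext_data = struct.pack("!HH", video_index, 0)
    extension = struct.pack("!HH", 0x1000, len(ext_data) // 4) + ext_data
    return header + extension + payload

# Example: audio frame 3 points at video RTP packet 6 (cf. Fig. 5.7).
pkt = build_audio_rtp_packet(seq=3, timestamp=2400, ssrc=0x12345678,
                             payload=b"\x00" * 10, video_index=6)
```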

The index relationship between the audio frames and the video frames is shown in Fig. 5.7. The number above the audio data packet is its own RTP sequence number, and the number below is the RTP sequence number of the corresponding video data packet. The reason for adding the index number to the audio packet is that on the one hand, people are more sensitive to changes in sound, and on the other hand, the data volume of the video is larger than that of the audio, and usually the lack of synchronization is caused by the lagging of the video.

Fig. 5.7
A diagram presents indexes of the audio frame and video frame. The indexes of the video frame are 1, 2, 3, 4, 5, 6, 7, 8, and 9. The indexes of the audio frame are 1 1, 2 3, 3 6, and 4 9 over video frame indexes 1, 3, 6, and 9.

Index relationship between the audio frames and the video frames

Using this synchronization scheme has three advantages. First, when the audio and the video at the destination are not synchronized, there is no need to process multiple RTP packets in succession until a new synchronization point is found; this reduces the processing burden on the destination and speeds up resynchronization. Second, it helps the source adjust its data transmission rate to changes in network bandwidth: when congestion occurs and the sender must reduce its rate, a frame-dropping strategy can be used, sending only the key frames referenced by the audio data index, which adapts to network jitter more quickly. Finally, the index number is carried directly in the extension structure of the RTP packet header; the method is simple to implement, has low overhead, works in real time, does not require a synchronization clock, and has good compatibility.

5.2.3 Audio and Video Transmission Technology

  1. (1)

    Transmission control and congestion control technology

The system uses the G.729A audio codec, the H.264 video codec, and the RTP real-time transport protocol, and is divided into four modules: audio and video capture, encoding and decoding, transmission control, and playback, in order to solve the bandwidth management problem of audio and video streams during transmission. Fig. 5.8 shows the hierarchical structure of the system. At the sending end, the audio and video are sampled and encoded, then packetized and sent via RTP. Based on network feedback information, the sender estimates the available transmission bandwidth and adaptively adjusts the codec's output rate (including both the source code rate and the channel code rate) so that the audio and video bitstreams fit within the current network transmission bandwidth. At the receiving end, the received audio and video streams are decoded to reconstruct the signals, the current network transmission parameters (such as the packet loss rate) are calculated, and feedback control information is sent back to the sender.

Fig. 5.8
A block diagram. The sending end has an audio and video acquisition module followed by an encoding module, rate adjustment, rate control, R T P layer, U D P layer, and network. Network to U D P layer followed by R T P layer, Q o S monitoring, feedback control, decoding, and playback modules at receiving end.

Block diagram of audio and video real-time transmission system

In practical applications, network-based congestion control is quite challenging because the entire Internet architecture cannot be transformed overnight. Under existing, limited network conditions, it is therefore particularly important to improve the quality of video transmission services and make the most efficient use possible of network resources. Terminal-based congestion control treats the entire network as a black box and obtains congestion information from end-to-end measurements. It then adjusts the terminal's sending rate according to a chosen strategy and reduces the load on the congested path, thereby controlling network congestion. In this sense, end-to-end congestion control belongs to the domain of congestion recovery.

Real-time multimedia systems often use UDP for transmission, and UDP has no congestion control mechanism. When TCP detects network congestion, it uses the AIMD (Additive Increase, Multiplicative Decrease) algorithm to halve its sending rate; if congestion persists, the rate keeps being reduced until transmission stops. Meanwhile, UDP applications take up the freed bandwidth, squeezing out the TCP applications. One solution is to replace the First-In-First-Out (FIFO) scheduling algorithm in Internet routers with algorithms similar to Random Early Detection (RED) to provide differentiated services for different types of data. For example, by using Class-Based Queueing (CBQ) in routers, different bandwidths can be dynamically allocated to different types of applications to ensure Quality of Service (QoS). Another method is to add a congestion control mechanism to UDP-based applications, so that UDP and TCP data streams can coexist in a friendly manner (TCP friendliness). This requires adding a congestion control mechanism on top of UDP, which mainly involves adjusting the encoding rate of the video stream to match the network bandwidth. Because the network bandwidth is variable and unknown, a fixed transmission rate cannot simply be set in advance; two methods are usually used for real-time adjustment. One is the window-based method, which gradually increases the transmission rate and then reduces it when packet loss is detected. The other is the rate-based method, which first estimates the available network bandwidth and then adjusts the target encoding rate to match the network status. Window-based solutions introduce retransmissions similar to TCP, which is unacceptable for real-time media. There are usually three schemes: sender-based rate control, receiver-based rate control, and hybrid control.
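As an illustration of the rate-based idea (not the system's actual algorithm), the sketch below adjusts a target encoder bitrate from receiver-reported loss: additive increase while the path looks clear, multiplicative decrease when loss exceeds a threshold; all constants are hypothetical.

```python
def adjust_rate(current_kbps, loss_rate,
                min_kbps=64, max_kbps=2000,
                step_kbps=32, loss_threshold=0.02, backoff=0.75):
    """Rate-based adjustment driven by receiver feedback.

    loss_rate is the packet-loss fraction reported for the last feedback
    interval; the thresholds and step sizes here are hypothetical.
    """
    if loss_rate > loss_threshold:
        new_rate = current_kbps * backoff       # multiplicative decrease on congestion
    else:
        new_rate = current_kbps + step_kbps     # additive increase to probe for bandwidth
    return max(min_kbps, min(max_kbps, new_rate))

# Example feedback loop: the encoder target rate follows the reported loss.
rate = 512
for loss in (0.0, 0.0, 0.05, 0.01, 0.0):
    rate = adjust_rate(rate, loss)
    print(round(rate))    # 544, 576, 432, 464, 496
```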

  1. 1)

    Rate control based on the sender

By adjusting the transmission rate at the sending end to adapt to the network, the packet loss rate can be greatly reduced when the transmission rate matches the network bandwidth. In implementation, a feedback channel is typically required to transmit network status information detected from the receiving end back to the sending end, which adjusts the transmission rate based on the network status information.

  1. 2)

    Receiver-based rate control

One typical representative of this type of rate control method is the Receiver-Driven Layered Multicast (RLM) algorithm. The RLM algorithm is the first practical adaptive algorithm based on layered video data transmission over the Internet and driven by the receiver. Its main idea is that the sender divides the video data into multiple layers and uses independent multicast groups to send each layer. When the data starts to be transmitted, the receiver only needs to join the first layer of data. Then, it periodically joins higher layers of multicast to probe the network bandwidth. If the receiver does not experience congestion for a period of time, it will continue to join higher layers of multicast groups. If packet loss is detected, the receiver will exit from the highest layer of data it is currently receiving and receive data at lower layers.

RLM is considered a promising direction for adaptive video transmission: first, it is compatible with the current “best-effort” Internet architecture; second, because its adaptive strategy is implemented at the receiver, it scales well and can handle receiver heterogeneity. However, RLM also has problems: it does not consider fairness between data streams, nor does it coordinate the joins and leaves of different receivers, and a failed join experiment can cause congestion for other data streams.
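The following sketch captures the receiver-driven join/leave logic described above in simplified form (join-experiment timers, shared learning between receivers, and other details of the real RLM protocol are omitted; the probe interval is arbitrary).

```python
import time

class RLMReceiver:
    """Simplified receiver-driven layered multicast (RLM) logic."""

    def __init__(self, num_layers, probe_interval=10.0):
        self.num_layers = num_layers            # layers offered by the sender
        self.subscribed = 1                     # always receive the base layer
        self.probe_interval = probe_interval    # seconds between join experiments
        self.last_probe = time.monotonic()

    def on_timer(self, congested):
        """Call periodically with the current congestion observation."""
        now = time.monotonic()
        if congested and self.subscribed > 1:
            self.subscribed -= 1                # drop the highest enhancement layer
            self.last_probe = now               # back off before probing again
        elif (not congested
              and self.subscribed < self.num_layers
              and now - self.last_probe >= self.probe_interval):
            self.subscribed += 1                # join experiment: try one more layer
            self.last_probe = now
        return self.subscribed                  # number of layers currently joined
```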

  1. 3)

    Hybrid Rate Control

In the hybrid method of rate adjustment, the sender adjusts its rate based on feedback received over the reverse channel, while the receiver adds or removes channels carrying layered or non-layered video in the multicast session. Unlike the sender-based method, the hybrid method uses multiple channels, and the rate of each channel is not fixed but can be adjusted according to the congestion state of the network.

We plan to adopt a host-based, network-adaptive sending-rate control method for congestion control in audio and video transmission. The method is built on an improved version of UDP designed to ensure real-time transmission of multimedia signals over the network. The improved protocol adds control bits to the packet header to describe the packet in more detail, enriching the information carried by UDP. A sequence number is also added to the header so that the receiving end can reassemble data transmitted across multiple UDP packets and unpack it correctly. The approach is to encapsulate additional header information in UDP datagrams, as shown in Fig. 5.9.

Fig. 5.9
A diagram presents 3 layers. The bottom layer represents the timestamp followed by the sequence number in the middle. The top layer has 5 sections: packet type, frame type, congestion type, reserved, and packet length of 2, 2, 2, 10, and 16 bits, respectively.

Diagram of improved header information

  • Packet Type (PT): 2 bits, indicating whether the transmitted packet is a video data packet or a feedback packet.

  • Frame Type (FT): 2 bits, indicating which frame is to be transmitted first.

  • Congestion Indication (CIn): 2 bits, indicating the congestion identifier of the feedback packet (e.g., 1 is used to indicate packet loss).

  • Reserved: 10 bits, indicating whether it is a timestamp or RTO.

  • Timestamp: 32 bits, used for RTT calculation.

End-to-end congestion control for video is implemented jointly by the sender and the receiver, and control packets that carry no data can be identified. Since video streams can tolerate some packet loss but are sensitive to delay, feedback packets do not trigger retransmissions. This method is very effective at detecting available bandwidth and estimating the RTT, and can adjust the transmission rate of video streams in a timely manner.
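A packing sketch of the extended header in Fig. 5.9, assuming the field order shown there (packet type, frame type, congestion indication, reserved, and packet length in the first 32-bit word, followed by the sequence number and the 32-bit timestamp); the 32-bit width assumed for the sequence number is not stated in the text.

```python
import struct

def pack_header(packet_type, frame_type, congestion, packet_length,
                sequence_number, timestamp):
    """Pack the improved UDP payload header of Fig. 5.9.

    First word, most significant bits first:
      packet_type(2) | frame_type(2) | congestion(2) | reserved(10) | packet_length(16)
    """
    word1 = ((packet_type & 0x3) << 30) | ((frame_type & 0x3) << 28) \
            | ((congestion & 0x3) << 26) | (packet_length & 0xFFFF)
    return struct.pack("!III", word1, sequence_number & 0xFFFFFFFF, timestamp & 0xFFFFFFFF)

def unpack_header(data):
    word1, seq, ts = struct.unpack("!III", data[:12])
    return {"packet_type": (word1 >> 30) & 0x3,
            "frame_type": (word1 >> 28) & 0x3,
            "congestion": (word1 >> 26) & 0x3,
            "packet_length": word1 & 0xFFFF,
            "sequence_number": seq,
            "timestamp": ts}

hdr = pack_header(packet_type=0, frame_type=1, congestion=0,
                  packet_length=1300, sequence_number=42, timestamp=123456)
assert unpack_header(hdr)["packet_length"] == 1300
```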

  1. (2)

    Audio and video concurrent recording and storage technology

As for remote audio and video servers, a specific storage plan is required for the storage of media data. From the perspective of the file system, media data should be stored in a file-based manner. From a human perspective, what people are concerned with are segments of media data, such as recording data triggered by a specific event or recording data scheduled at a specific time, which can be summarized as a recording “segment”.

The most intuitive approach is to use one “file” to represent one “segment,” but this is not practical because the size of a segment is difficult to determine. For example, in the case of continuous recording for 24 h, a segment can grow arbitrarily large, while the file system usually imposes a maximum file size. Putting an entire segment into one file is not feasible, and oversized files are not conducive to system management or to playback and backup functions. For timed or triggered recordings that each generate one “segment,” the length of the time period is hard to predict, so file sizes may vary greatly, which complicates the management of recording data; too many small files may also reduce the efficiency of the file system. Therefore, it is necessary to consider merging small “segments” into one “file” for management.

Based on the discussion above, a compromise approach is adopted for storing media data on the remote audio and video servers. First, the size of each recording file is limited to 200 MB. Several recording segments can then be stored within one such file. When media data needs to be stored, a new segment is created in the recording file currently in use and the media data is written into that segment. If the recording file would exceed the 200 MB limit during writing, the current recording segment is closed, a new recording file is created, a new segment is created in the new file, and storage continues.
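A minimal sketch of this rollover logic (file naming, the on-disk segment format of Fig. 5.10, and the database index are omitted; the 200 MB limit follows the text, everything else is hypothetical).

```python
import os

MAX_FILE_SIZE = 200 * 1024 * 1024   # 200 MB per recording file, as described above

class SegmentWriter:
    """Write recording segments into size-limited recording files."""

    def __init__(self, directory):
        self.directory = directory
        self.file_index = 0
        self.fp = None
        self._open_new_file()

    def _open_new_file(self):
        if self.fp:
            self.fp.close()
        self.file_index += 1
        path = os.path.join(self.directory, f"record_{self.file_index:06d}.dat")
        self.fp = open(path, "ab")

    def begin_segment(self):
        # A real implementation would write the segment header ("SEGH", ...) here
        # and remember the offset so the segment length can be updated later.
        self.segment_start = self.fp.tell()

    def write(self, packet: bytes):
        # Roll over to a new file before the 200 MB limit would be exceeded.
        if self.fp.tell() + len(packet) > MAX_FILE_SIZE:
            self._open_new_file()
            self.begin_segment()    # close the old segment and start a new one here
        self.fp.write(packet)
```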

With this design, the size of each video file is limited to 200 MB, which facilitates file management in the system. Inevitably, however, video segments that should have been stored continuously will be truncated into two or more segments because of the file size limit, which is particularly noticeable in the case of 24-h uninterrupted storage. To address this, the system provides a search function for stored video files; the search criteria can be the recording time, triggering events, or a combination of conditions, and the resulting file records can be used for playback, video conferencing, downloading, and other functions. Considering that the system can manage a large number of encoding devices and store massive amounts of media data, users need to process media data not only according to customized recording plans but also by specific times or triggered events. By providing search by time and by event, all video segments of interest can be retrieved. Although video segments may be truncated into multiple segments because of the limit on the size of the file, overall this does not significantly affect the user experience when processing video data. The search function is implemented by other modules in the system.

When the storage space is insufficient, users can choose two strategies: stop storage, which means the system stops storing the video data of that media source; or overwrite storage, which means the system finds the earliest recorded file of that media source, deletes it, and continues storing video data in a new file.

According to the above storage scheme, the media file storage structure adopted by this system is shown in Fig. 5.10.

Fig. 5.10
A diagram presents the structure of files and the data segment. The file has segment header information, segment data, segment header information, segment data, and so on. The data segment has packet header information, data content, packet header information, data content, and so on.

The structure of media data files

The segment header information includes a start code (4-byte “SEGH”), device type, stream mode, video size, frame rate, keyframe interval, and segment length. Following the segment header is the segment's data content. When media data files are being analyzed, the next segment header can be located from the current segment header, thus traversing the entire media data file.

Within the segment data, the media are stored as individual data packets, each preceded by packet header information. The packet header includes the arrival time of the data and its length.
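An illustrative parser for this layout, assuming fixed-width little-endian fields after the 4-byte "SEGH" start code; the field widths in the struct format string below are assumptions, since the text does not specify them, and the video size is split into width and height for convenience.

```python
import struct

# Hypothetical layout: "SEGH" + device type, stream mode, video width, video height,
# frame rate, keyframe interval (16-bit each) + 32-bit segment data length.
SEG_HEADER = struct.Struct("<4sHHHHHHI")

def iter_segments(path):
    """Walk a media data file segment by segment using the segment length field."""
    with open(path, "rb") as f:
        while True:
            raw = f.read(SEG_HEADER.size)
            if len(raw) < SEG_HEADER.size:
                break
            magic, dev, mode, w, h, fps, key_ivl, seg_len = SEG_HEADER.unpack(raw)
            if magic != b"SEGH":
                raise ValueError("lost segment alignment")
            yield {"device_type": dev, "stream_mode": mode, "size": (w, h),
                   "frame_rate": fps, "keyframe_interval": key_ivl,
                   "data_length": seg_len}
            f.seek(seg_len, 1)    # skip the segment data to reach the next header
```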

We need to implement a search function for media data recording segments, so we have established an index table for recording segments in the database to facilitate data retrieval and speed up the search process.

As shown in Table 5.1, the server ID and media source ID are UUIDs, used to identify the server where the recording segment is located and the media source from which it originated.

Table 5.1 Index table for media data recording segments

The specific storage process is as follows. The storage object maintains a video data buffer; when stream sending starts, the working-state class registers a receive callback with the SDK and then pushes each received video data packet into the session buffer. Each write to disk is kept at around 512 KB in order to reduce hard disk head movement in a multi-threaded environment: the larger the amount of data written each time, the lower the frequency of head movement. Before finishing the write of a batch of data packets and waiting for the next write, the recording segment index in the database and the segment data length in the recording file are updated. This keeps the database index and the recording file records relatively fresh, so that little data is lost in abnormal situations such as a power failure. Using the search function and file analysis, the video data stored before the abnormal event can still be found.

5.2.4 Audio–Video Processing Algorithm Model

  1. (1)

    Audio noise reduction and video enhancement technology

  2. 1)

    Electric Power Field Audio Noise Reduction Technology

In view of the loud noise produced by transformers and other power equipment in the field, we study key technologies for audio collection and noise reduction at power sites, focusing on active noise reduction algorithms suited to the noise environment of power grid operations. We study the adaptive VSSLMS algorithm and the effect that adjusting parameters such as the step-size genetic factor, the weight of the instantaneous error energy, and the maximum and minimum values of the convergence step size has on reducing noise at the work site. The specific design scheme is as follows:

There are many methods for audio noise reduction, which can be divided into active noise reduction and passive noise reduction according to whether there is a reference signal. According to the frequency band processed and whether the useful signal and noise are at the same frequency, noise reduction can be divided into classical filtering noise reduction and modern filtering noise reduction. Regardless of the method used, noise reduction still relies on filters. Adaptive filters are an important part of modern filters. Adaptive noise cancellation methods belong to active noise reduction methods, which utilize adaptive optimal filtering theory. The signal to be processed consists of a useful signal and background noise, and the background noise is correlated with the noise in the reference signal. The purpose of the adaptive noise cancellation method is to remove the background noise in the signal to be processed. Therefore, adaptive noise cancellation technology mainly uses the reference noise signal obtained to process the background noise in the signal.

Adaptive filters have been widely used in channel equalization, echo cancellation, antenna reception, linear prediction, image recognition, and other fields. According to Wiener filter theory, the adaptive filter for noise cancellation requires an infinite number of taps to minimize the output error. In practice, it is impossible to make a filter with an infinite number of taps, so a finite impulse response (FIR) filter that meets the Wiener filter theory must be used, which means that the adaptive filter must have a finite length.

Passive noise reduction reduces noise in the input signal using an existing noise reduction model, without a reference noise signal. Active noise reduction, on the other hand, uses adaptive algorithms, with the aid of a reference noise signal, to cancel the noise mixed with the useful signal. The main difference between the two is that active noise reduction adjusts its parameters according to the input signal to remove noise, whereas passive noise reduction does not. Active noise reduction has lower algorithmic complexity than passive noise reduction, and its output signal has a higher signal-to-noise ratio, especially when the background noise changes dramatically.

The basic principle of active noise reduction using the VSSLMS adaptive algorithm is shown in Fig. 5.11. In the figure, s(n) is the useful signal, v(n) is the background noise, and the expected signal d(n) is the signal to be processed, which is composed of the useful signal s(n) and the background noise v(n), and s(n) and v(n) are uncorrelated.

Fig. 5.11
A block diagram. The signal source and noise source go to points 1 and 2. D of n is equal to s of n plus v of n from point 1 to the summing point. X of n equal to v prime of n from point 2 to adaptive filter gives y of n to summing point. The summing point gives e of n equal to s prime of n.

Block diagram of adaptive algorithm for noise cancellation

The input of an audio active noise reduction system consists of two signals: the first is the useful signal mixed with noise, and the second is the reference noise. Based on the VSSLMS adaptive algorithm, the noise reduction system utilizes the reference noise from the second signal to cancel out the noise in the first signal and output the useful signal. Compared to passive noise reduction, active noise reduction can adjust its parameters according to the changes in the input signal, and thus effectively remove noise. Compared to classical filters (high-pass, low-pass, etc.), the adaptive filter of the active noise reduction system can remove noise at the same frequency as the useful signal.
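A minimal sketch of this cancellation structure with a variable-step-size LMS filter; the step-size update μ(n+1) = α·μ(n) + γ·e²(n), clipped to [μ_min, μ_max], is one common VSSLMS variant and may differ in detail from the algorithm actually implemented, and all signals and constants below are hypothetical.

```python
import numpy as np

def vss_lms_cancel(d, x, taps=32, mu0=0.005, alpha=0.97, gamma=1e-4,
                   mu_min=1e-4, mu_max=0.01):
    """Adaptive noise cancellation with a variable-step-size LMS filter.

    d : signal to be processed, d(n) = s(n) + v(n)
    x : reference noise x(n) = v'(n), correlated with v(n)
    Returns e(n) ~ s(n), the error output approximating the useful signal.
    """
    w = np.zeros(taps)
    mu = mu0
    e = np.zeros(len(d))
    for n in range(taps, len(d)):
        x_vec = x[n - taps:n][::-1]              # most recent reference samples first
        y = w @ x_vec                            # filter output: estimate of v(n)
        e[n] = d[n] - y                          # error = useful-signal estimate
        mu = np.clip(alpha * mu + gamma * e[n] ** 2, mu_min, mu_max)
        w += 2 * mu * e[n] * x_vec               # LMS weight update with variable step
    return e

# Hypothetical test: a sine "speech" buried in noise, with a filtered copy of the
# same noise available as the reference signal.
rng = np.random.default_rng(0)
s = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
v_ref = rng.standard_normal(8000)
v = np.convolve(v_ref, [0.6, 0.3, 0.1], mode="same")   # noise as it reaches the main mic
recovered = vss_lms_cancel(s + v, v_ref)
```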

The hardware platform for active audio noise reduction is shown in Fig. 5.12. The active noise reduction system is composed of the hardware platform and a Verilog HDL program. The main microphone collects the useful signal mixed with environmental noise, while the reference microphone collects the environmental noise; these correspond to d(n) and x(n) in the algorithm, and the audio output corresponds to e(n).

Fig. 5.12
A block diagram. F P G A cyclone 2 to audio chip W n 8 7 3 1 via data bus. Audio chip W n 8 7 3 1 gives G U X to the audio output device. Main M I C and reference M I C to an amplifier that gives A U X to audio chip W n 8 7 3 1.

Hardware platform diagram of active noise reduction system

  1. 2)

    Enhancing video quality in jittery mobile environments

The mobile video terminals used in this technical solution include wearable and handheld devices for audio and video interaction. For tasks such as object detection and tracking, camera shake during movement superimposes an additional motion component on the moving background, so the image coordinate systems of adjacent frames become inconsistent. This lack of stability in the output image leads to errors in subsequent observation of the video and makes it difficult to extract valid and accurate information. Therefore, in order to output stable video in real time, motion estimation algorithms, motion filtering, and image compensation techniques must be used to enhance mobile video images captured against complex, shaking backgrounds.

  1. 3)

    Motion estimation algorithm

  2. a)

    Block matching motion estimation algorithm

Block matching is the most common method of translational motion estimation, and the full-search block matching algorithm is considered the most accurate. The algorithm is easy to understand and easy to implement in large-scale integrated circuits, so block matching is often used in engineering to estimate translational motion. The algorithm divides the image into non-overlapping sub-blocks of equal size. When the sub-blocks are formed, the pixels in each block should have similar motion trends, so the sub-blocks should not be too large; a typical sub-block size is 16 × 16. The position of each sub-block in the next frame is found according to a matching criterion. Since most of the image is background, the motion trend of the majority of sub-blocks can be taken as the motion trend of the background.

Assume that the reference frame is of size M × N and is affected by translational jitter within the amplitude range (±dxmax, ±dymax); the search range in the current frame then extends to (M ± dxmax) and (N ± dymax). The search block traverses each pixel position within the dashed box, and the pixel values covered by the search block are processed using a two-dimensional mean. The best offset position satisfying the search criterion is found by calculation, and the coordinate difference between the initial and final positions of the search block gives the motion vector (U, V) of the current video frame, where U is the horizontal offset and V is the vertical offset. With an image of size M × N, and fk−1 and fk denoting the image blocks containing the matching block and the search block, the following criteria can be used to match image blocks: the Normalized Cross-Correlation Function (NCCF), the Mean Square Error (MSE), and the Sum of Absolute Differences (SAD).

Of the three criteria above, the normalized cross-correlation function (NCCF) is used as the matching criterion here. When the two sub-blocks are the best match, NCCF(i, j) reaches its maximum value, which can be as large as 1. When the mean squared error (MSE) or the sum of absolute differences (SAD) is used as the matching criterion, the value is smallest, down to 0, when the two sub-blocks are the best match. The block matching process shows that with a full search, the search block must traverse all pixel positions in the search area to determine the position of the optimal matching sub-block, which makes the search very time-consuming and computationally intensive and real-time processing difficult to achieve. Many improved search methods have been proposed to reduce matching time, such as the three-step search, diamond search, and four-step search, but the gain in speed comes at a cost in motion estimation accuracy. To improve the speed of translational motion vector estimation, the grayscale projection algorithm was proposed, which greatly accelerates translational motion estimation.
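A compact full-search sketch using the SAD criterion (illustrative only; the 16 × 16 block size follows the text, while the ±8-pixel search range is an assumption).

```python
import numpy as np

def full_search_sad(ref_block, cur_frame, top_left, search=8):
    """Full-search block matching with the SAD criterion.

    ref_block : 16x16 block taken from the reference frame
    cur_frame : current frame as a 2-D array
    top_left  : (row, col) of the block's position in the reference frame
    search    : assumed maximum jitter amplitude in pixels
    Returns the motion vector (U, V) = (horizontal offset, vertical offset).
    """
    r0, c0 = top_left
    b = ref_block.shape[0]
    best, best_uv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = r0 + dy, c0 + dx
            if r < 0 or c < 0 or r + b > cur_frame.shape[0] or c + b > cur_frame.shape[1]:
                continue
            sad = np.abs(cur_frame[r:r + b, c:c + b].astype(int) - ref_block.astype(int)).sum()
            if best is None or sad < best:
                best, best_uv = sad, (dx, dy)
    return best_uv
```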

  1. b)

    Grayscale projection motion estimation algorithm

The grayscale projection motion estimation algorithm is essentially an image dimensionality-reduction matching algorithm. By accumulating the image data along each row and each column, the two-dimensional image is mapped to two one-dimensional curves of row and column data. Cross-correlation of the reduced-dimension row and column curves yields the offset at which the correlation measure reaches its extremum, from which the global motion estimation offset is obtained. Because the grayscale projection algorithm performs cross-correlation only on the reduced-dimension row and column data, rather than on every pixel of the image, it reduces computation while improving matching speed, making it a fast matching method.

Cosine filters are often used to filter the projection curves to reduce interference from edge information. From the theory of the gray projection algorithm it can be inferred that, if the image contrast is insufficient, the accumulated values of the rows and columns will be very close to one another, and the cross-correlation calculation may show no distinct “valley.” Therefore, histogram equalization is often applied to the image before the gray projection algorithm is used, in order to enhance contrast. The gray projection algorithm accumulates the whole image along each row and each column to achieve dimensionality reduction, so it cannot avoid the influence of local motion. Based on the assumption that a foreground moving object usually lies in the middle of the field of view, the middle area of the image is generally excluded from the projection to avoid local motion interference. Although the gray projection algorithm solves the problem of fast translational motion estimation, it places high demands on illumination and image similarity, and its robustness is poor. The development of the phase correlation algorithm has largely solved the robustness problem of translational motion estimation.
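An illustrative sketch of the projection-and-correlation step, using the mean squared difference between shifted projection curves as the matching measure (one common formulation); the maximum shift is arbitrary.

```python
import numpy as np

def gray_projection_offset(prev, cur, max_shift=16):
    """Estimate the global translational offset (dx, dy) between two frames by
    matching their column and row grayscale projection curves."""

    def best_shift(p_ref, p_cur):
        # Compare the overlapping parts of the two 1-D projection curves for each
        # candidate shift and keep the shift with the smallest mismatch.
        best, best_s = None, 0
        for s in range(-max_shift, max_shift + 1):
            if s >= 0:
                a, b = p_ref[s:], p_cur[:len(p_cur) - s]
            else:
                a, b = p_ref[:s], p_cur[-s:]
            score = np.mean((a - b) ** 2)
            if best is None or score < best:
                best, best_s = score, s
        return best_s

    col_prev, col_cur = prev.mean(axis=0), cur.mean(axis=0)   # column projections
    row_prev, row_cur = prev.mean(axis=1), cur.mean(axis=1)   # row projections
    return best_shift(col_prev, col_cur), best_shift(row_prev, row_cur)
```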

  1. c)

    Phase Correlation Motion Estimation Algorithm

The phase correlation algorithm is a robust frequency-domain processing algorithm that not only tolerates noise well but is also insensitive to changes in brightness. It has low requirements on the similarity of image content: as long as roughly one third of the content is similar, it can accurately estimate the motion between two frames.

The phase correlation algorithm is the only translational motion estimation algorithm based on frequency-domain processing. It not only has strong noise resistance and insensitivity to illumination changes, but also performs well under occlusion and can estimate large translational motion. The fast Fourier transform can now be implemented in both hardware and software, greatly improving the computational efficiency of phase correlation and making it practical for engineering applications.

This system uses the phase correlation motion estimation algorithm to estimate motion between images. The approach retains the advantage of the grayscale projection algorithm, namely fast translational motion estimation, while also handling video images with low contrast and low inter-frame similarity effectively, so the algorithm has good robustness.
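A standard phase-correlation sketch with NumPy FFTs (integer-pixel peak only; windowing and subpixel refinement are omitted).

```python
import numpy as np

def phase_correlation(prev, cur):
    """Estimate the integer translation (dy, dx) of cur relative to prev from the
    peak of the inverse FFT of the normalized cross-power spectrum."""
    F1, F2 = np.fft.fft2(prev), np.fft.fft2(cur)
    cross_power = F2 * np.conj(F1)
    cross_power /= np.abs(cross_power) + 1e-12     # keep only the phase information
    corr = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peaks in the upper half of each axis to negative shifts.
    dy = peak[0] if peak[0] <= prev.shape[0] // 2 else peak[0] - prev.shape[0]
    dx = peak[1] if peak[1] <= prev.shape[1] // 2 else peak[1] - prev.shape[1]
    return dy, dx

# Quick check with a synthetic circular shift of (3, -5) pixels.
img = np.random.rand(128, 128)
shifted = np.roll(img, shift=(3, -5), axis=(0, 1))
print(phase_correlation(img, shifted))   # (3, -5)
```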

  1. d)

    Motion filtering and image compensation

To obtain a stable and enhanced video, we need to separate the jitter from the motion of the camera device and only compensate for the jitter while preserving the original subjective motion.

There are two main methods for separating jitter motion from subjective motion of the camera device: one method uses filtering based on the frequency difference between jitter motion and subjective motion, while the other method simulates the subjective motion of the camera device through curve fitting. In practice, due to the diversity and complexity of camera device motion, it is difficult to simulate the real camera device motion trajectory, and curve fitting is rarely used. In most image stabilization algorithms, filtering is still used.

In the filtering method, the mean filter is the simplest filter, but in this system, we will use a recursive filter combined with bilinear interpolation to compensate for the image.

The recursive filter is an optimal recursive data processing algorithm widely used in engineering applications. In this system, it is used for motion filtering to separate the camera's intended motion from the jitter, preserving the expected camera motion and treating the jitter as noise. Based on a state equation and an observation equation, the recursive filter neither simply takes a weighted average nor arbitrarily selects either the estimate from the state equation or the prediction from the observation equation as its output; instead, it determines the final output according to the minimum mean-square-error criterion.

Image compensation transforms the original image according to the difference between the actual motion trajectory and the trajectory after motion filtering, thereby producing a stable video that follows the smoothed trajectory. When the new pixel coordinates are computed, they are often floating-point numbers, in which case interpolation is commonly used.

The recursive filter is used to distinguish the camera's intended motion from high-frequency jitter noise. Bilinear interpolation meets the stabilization algorithm's requirements for preserving edge information at acceptable computational cost, and is therefore chosen to perform the reverse shift compensation on the image.
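An illustrative sketch of the compensation step: the frame is shifted by the negative of the residual jitter (actual motion minus the filtered motion), with bilinear interpolation handling fractional pixel coordinates; the clamping used at the borders is an implementation choice, not from the text.

```python
import numpy as np

def bilinear_shift(frame, dx, dy):
    """Shift a grayscale frame by (dx, dy) pixels using bilinear interpolation.

    For jitter compensation, call this with the negative of the residual motion
    (actual motion minus the smoothed motion from the recursive filter).
    """
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs - dx, 0, w - 1.001)      # fractional source coordinates, clamped
    sy = np.clip(ys - dy, 0, h - 1.001)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    fx, fy = sx - x0, sy - y0
    # Weighted average of the four neighbouring pixels.
    return ((1 - fx) * (1 - fy) * frame[y0, x0]
            + fx * (1 - fy) * frame[y0, x0 + 1]
            + (1 - fx) * fy * frame[y0 + 1, x0]
            + fx * fy * frame[y0 + 1, x0 + 1])

# Compensate a residual jitter of (+1.3, -0.7) pixels estimated for this frame.
frame = np.random.rand(240, 320)
stabilized = bilinear_shift(frame, dx=-1.3, dy=+0.7)
```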

  1. (2)

    Audio and video adaptive adjustment optimization

Due to the instability of mobile network environments such as 3G, 4G, WiFi, etc., real-time data transmission over the Internet can result in: (1) Delay (the time it takes for audio and video data packets to be transmitted from the sender to the receiver. The larger the delay, the more serious the damage to the quality of communication); (2) Delay jitter (i.e., the difference in delay between different data packets. Delay jitter can cause discontinuous audio and affect the quality of communication).

Therefore, various strategies must be adopted regularly on mobile terminals to optimize audio and video processing. The specific measures are as follows (a simplified sketch of the threshold-based adaptation appears after this list):

  1. 1)

    According to the timing algorithm, the network environment is probed at regular intervals, and when the network bandwidth drops below certain threshold values, the following measures are taken.

  2. 2)

    The video resolution and audio sampling rate can be dynamically allocated based on certain threshold ranges, and adaptively processed according to the bandwidth status.

  3. 3)

    Recovery measures for packet loss include requesting the remote end to resend the lost packet by using TCP packets with the NACK (Negative Acknowledgement) feature, which meets the standard.

    If the request is not received by the remote end due to delay, the TCP packet should be considered lost.

    If the request is received by the remote end, the TCP packet will be recovered.

    Otherwise, the previous link will be disconnected and the attempt to decode the video will result in frame loss until the next video frame arrives. Meanwhile, increasing the TCP packet buffer size can partially solve this problem.

  4. 4)

    Jitter buffer processing measures. Although the jitter buffer can cause a delay of approximately 100 ms, it can improve the quality of the video. After caching TCP packets, the NACK packet is used to request lost packets, and the lost TCP packets are recovered. The jitter buffer also records the cached frames based on the TCP sequence number to achieve frame compensation. This may cause some delay, but it will result in a smooth video at the frame rate, such as 24 or 25 frames per second, which is calculated based on the TCP timestamp.

  5. 5)

    When a TCP packet is lost, it may result in frame loss in the video. The video system tries to avoid packet loss, but sometimes it is inevitable and results in frame loss. These lost frames will be transmitted to all endpoints when the video stream is split. When reconstruction technology is being used, the system will render the stream on the terminal and send a request for new packets to repair the video frames. Only the lost TCP packets will interrupt the display, and they will be immediately recovered.

  6. 6)

    Using a circular dynamic queue to cache the audio and video frames can achieve the goal of optimizing the audio and video on mobile devices in unstable network conditions.
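As an illustration of measures 1) and 2) above, the following sketch periodically maps a measured bandwidth estimate to a video resolution and audio sampling rate through threshold ranges; all thresholds and profile values are hypothetical.

```python
# Hypothetical thresholds (kbit/s) -> (video resolution, audio sampling rate in Hz)
PROFILES = [
    (1000, ((1280, 720), 16000)),
    (500,  ((640, 480),  16000)),
    (250,  ((320, 240),  8000)),
    (0,    ((176, 144),  8000)),
]

def select_profile(bandwidth_kbps):
    """Pick the highest profile whose threshold the measured bandwidth reaches."""
    for threshold, profile in PROFILES:
        if bandwidth_kbps >= threshold:
            return profile
    return PROFILES[-1][1]

def on_timer(measure_bandwidth, apply_settings):
    """Called at regular intervals (measure 1): probe the network and adapt (measure 2)."""
    bw = measure_bandwidth()                      # e.g. derived from recent feedback reports
    resolution, sample_rate = select_profile(bw)  # threshold-based choice
    apply_settings(resolution, sample_rate)
```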

  1. (3)

    Video target tracking and labeling technology

Tracking of moving targets in a mobile environment relies on continuous learning of the locked target to obtain its latest appearance features, thus refining the tracking in a timely manner to achieve optimal state tracking.

At the initial stage, a comprehensive scanning is performed on each frame image based on a static target image provided, to find all appearances similar to the target object. Positive and negative samples are generated from the results of detection. As the target moves continuously, the system can continuously detect and obtain changes in the target's angle, distance, depth of field, and other aspects, and identify them in real-time. After a period of learning and training, the target can be accurately captured and labeled. The specific processing techniques include:

  1. 1)

    Motion Object Detection Based on Feature Classification

Motion object detection based on feature classification includes two processing steps, namely the learning process and the decision process, as shown in Fig. 5.13.

Fig. 5.13
A flow diagram presents 2 processes. The training process has a sample image set with feature extraction followed by feature sample set, classification model, and detection result. The decision-making process has an image to be detected, a scanning window, a feature vector, a classification model, and a detection result.

Diagram of Feature-based Object Detection using Classification

The basic idea of the learning process in feature-based motion object detection is to select or construct a type of image feature that describes the target of interest well. Through a feature extraction algorithm, a set of labeled image samples is mapped into a feature space to form a feature sample set. This sample set is then used as input to train the corresponding pattern recognition classifier in a supervised manner, ultimately yielding a trained detection classifier.

The basic idea of the decision-making process is to first identify all the regions in the current image that may contain the object of interest, and then use the trained classifier to quantify the possibility of the existence of the object in these regions. Finally, a decision strategy is used to evaluate the output of the classifier, and ultimately achieve the detection of the object.

The two key points of feature-based classification for motion object detection are image features and classification models, where the construction of the classification model is closely related to the dimension of the feature vector.

For image features with small dimensions, such as color histograms, color moments, HOG, LBP, etc., a distance-based decision method is generally used. That is, the optimal linear decision threshold is calculated using the intra-class distance of the training target samples, and the detection of the target in the scene is achieved by comparing the distance between the feature of the target image and the average feature of the target sample. Common distance measures for features include Euclidean distance, Minkowski distance, Chebyshev distance, Mahalanobis distance, Canberra distance, and cosine distance. The use of distance-based classification methods for low-dimensional image features has the advantages of high computational efficiency and ease of implementation, but the classification performance is generally not ideal due to the simple decision-making method. Another way to classify low-dimensional features is to use pattern recognition classifiers such as k-nearest neighbor (K-NN) classifiers, Bayesian classifiers, decision tree classifiers, support vector machines (SVM), and neural network classifiers.
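To make the distance-based decision concrete, the sketch below (in Python, with NumPy assumed) learns a mean feature and a linear threshold from the intra-class distances of the target samples and then accepts candidate regions whose Euclidean distance to the mean falls below that threshold; the two-sigma threshold rule is an illustrative choice, not the one prescribed here.

```python
import numpy as np

def train_distance_classifier(target_features):
    """Learn a mean feature and a decision threshold from target samples.

    target_features: array of shape (n_samples, n_dims), e.g. HOG or
    color-histogram vectors of the object of interest (illustrative).
    """
    mean_feat = target_features.mean(axis=0)
    # Intra-class distances of the training samples to their mean feature.
    dists = np.linalg.norm(target_features - mean_feat, axis=1)
    threshold = dists.mean() + 2.0 * dists.std()   # simple linear threshold
    return mean_feat, threshold

def detect(candidate_features, mean_feat, threshold):
    """Accept a candidate region if its feature is close enough to the mean."""
    d = np.linalg.norm(candidate_features - mean_feat, axis=1)
    return d < threshold        # boolean mask over the candidate regions
```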

For high-dimensional image features, specific methods need to be adopted to select the components with strong discriminatory ability from the feature vector, in order to reduce the computational complexity of the training and decision-making processes. Generally, there are two types of methods that can be used to select feature components. One is the feature extraction method, which reduces the dimensionality of the feature vector through linear transformations by employing decorrelation strategies. Feature extraction methods generally include principal component analysis, linear discriminant analysis, and Sammon projection, among others. The other type is ensemble learning models in machine learning, which achieve decision-making on the features by combining weighted weak classifiers corresponding to the feature components. The weight of the weak classifier reflects the importance of the feature component to the decision. Common ensemble learning methods include Boosting algorithm, Bagging algorithm, and AdaBoost algorithm.
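A minimal sketch of the two routes described above is given below, assuming scikit-learn (the chapter does not name a specific library): principal component analysis reduces the feature dimensionality, and an AdaBoost ensemble of weak classifiers makes the decision. The data here is random and purely illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier

# Illustrative high-dimensional features: X has shape (n_samples, n_dims),
# y holds binary labels (1 = target, 0 = background).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))
y = rng.integers(0, 2, size=200)

# Route 1: feature extraction - decorrelate and reduce dimensionality.
pca = PCA(n_components=32)
X_low = pca.fit_transform(X)

# Route 2: ensemble learning - a weighted combination of weak classifiers.
clf = AdaBoostClassifier(n_estimators=50)
clf.fit(X_low, y)
print(clf.predict(X_low[:5]))
```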

2) Object tracking based on classification

Motion object tracking based on classification, also known as detection-based motion object tracking, is a popular object tracking technique in recent years. The core of such algorithms is to find a boundary between the scene background and the motion object image to separate the foreground object from the local background. Specifically, an online learning feature classifier is used to quantize and classify the feature vectors extracted from the current region of interest, and the region with the best classification output is selected as the target region. This type of algorithm can continuously update the classifier using the feature vectors obtained from the current frame's best classification region, which greatly improves the algorithm's adaptability to the changes of the target shape. The process of motion object tracking based on online learning classification is shown in Fig. 5.14.

Fig. 5.14
A flow diagram with 6 steps. Video sequence frames are followed by scanning window image set, feature extraction, and online classification model to give tracking result. The tracking result leads to a local search strategy to scanning window image set. Online learning from classification to classification model.

Flowchart of classification-based motion object tracking

Typical classification-based object tracking methods include On-line AdaBoost, Online MilTrack, and TLD (Tracking Learning Detection).
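The following Python sketch illustrates the general loop of detection-based tracking with an online-updated appearance model. A running-average feature template scored by distance stands in for an online-learning classifier such as On-line AdaBoost; `extract_feature` and `search_windows` are hypothetical helpers, and the learning rate is an illustrative parameter.

```python
import numpy as np

def track(frames, init_box, extract_feature, search_windows, lr=0.1):
    """Sketch of detection-based tracking with an online-updated model.

    extract_feature(frame, box) -> 1-D feature vector (hypothetical helper);
    search_windows(box) -> candidate boxes around the previous position.
    """
    template = extract_feature(frames[0], init_box)
    box = init_box
    for frame in frames[1:]:
        candidates = search_windows(box)                 # local search strategy
        feats = np.stack([extract_feature(frame, b) for b in candidates])
        scores = -np.linalg.norm(feats - template, axis=1)
        best = int(np.argmax(scores))
        box = candidates[best]                           # best classification region
        # Online update: blend the best region's feature into the model.
        template = (1 - lr) * template + lr * feats[best]
        yield box
```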

In this study, multiple feature fusion methods were used to establish a directed scene motion pattern model for mobile operations in the power grid based on the characteristics of power grid equipment operations. A target detection method based on scene motion pattern was employed to process input images. Combined with video annotation technology, dynamic video features were extracted between consecutive image frames to ensure the consistency of annotation and video timing.

(4) Audio–video assisted graphic guidance information overlay technology

The system's technical solution supports an online annotation editing function that can add text, pen strokes, 2D vector graphics, and other elements. Backend technical experts can add text or annotated vector graphics to any position on the video screen in real time according to their needs, and on-site maintenance personnel see these annotations synchronously, which facilitates remote command. Layer editing does not affect the synchronous playback of live video and audio or the quality of the image.

There are three common methods for displaying text: dot matrix display, font image display, and real-time generation of text images for loading and display. They are suitable for different languages and different application scenarios. The three display methods are introduced below.

1) Text dot matrix display

The information of computerized characters is stored in font patterns, which record the shape of each character using binary bits. Each bit in each byte of the font pattern corresponds to a dot in the character grid; for example, the way the letter ‘A’ is recorded in a font pattern is shown in Fig. 5.15. The dot matrix display method reads the font pattern information and sets the corresponding values at the corresponding positions in the image according to the 0s and 1s of the font dot matrix. The advantage of this method is that it is simple and convenient and supports multiple languages and development platforms; the disadvantage is that it requires the font pattern information to be read in advance, and the displayed result is plain, since it cannot use the rendering effects provided by the operating system.

Fig. 5.15
A list of dot-matrix codes and the corresponding character pattern information. Each code row has a pattern of 0s and 1s, and the set bits in the pattern trace out the shape of the character.

Record mode in font module
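The following sketch shows, in Python, how a dot-matrix glyph can be expanded onto an image: each set bit in a row byte turns on the corresponding pixel. The 8 × 8 pattern for the letter ‘A’ is illustrative; real font modules typically use 16 × 16 or larger grids.

```python
import numpy as np

# Illustrative 8x8 dot-matrix pattern for the letter 'A' (one byte per row).
GLYPH_A = [0x18, 0x24, 0x42, 0x42, 0x7E, 0x42, 0x42, 0x00]

def draw_glyph(image, glyph, top, left, value=255):
    """Set image pixels wherever the corresponding font bit is 1."""
    for row, byte in enumerate(glyph):
        for col in range(8):
            if byte & (0x80 >> col):          # test bit 'col' of this row
                image[top + row, left + col] = value
    return image

canvas = np.zeros((16, 16), dtype=np.uint8)   # a tiny grayscale image
draw_glyph(canvas, GLYPH_A, top=4, left=4)
```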

2) Using font images for display

This method involves first creating a font resource image and then establishing an index table for quick positioning of the displayed text. When text needs to be displayed, the corresponding image region can be pasted directly. The advantages of this method are that the font resources are relatively small, the display is convenient, and multiple languages are supported. The disadvantage is that it is not suitable for displaying a large number of Chinese characters.

3) Real-time generation of text images for loading and display

This method involves first extracting the TextMetrics and ABC structures of the characters to obtain their display size, and then creating an image (bitmap) based on this size. The text is then output, and finally the contour information of the character is dynamically superimposed onto the image at the corresponding position. The advantage of this method is that it does not require separate font resources and can directly use the fonts supported by the operating system, namely the fonts that have already been installed. The drawback is that real-time image generation consumes CPU time and memory, resulting in lower execution efficiency.

From the above analysis, each of the three methods has its own advantages and disadvantages and suits different application scenarios. Considering the special nature of wearable device technology and the complexity of Chinese encoding, the specific implementation of text overlay on the multi-channel distribution controller generally uses the third method, that is, real-time generation and loading of text images.

The specific approach for SFU to overlay text is as follows in the technical scheme: SFU first decodes the terminal video data received from the network into YUV format (or RGB format), then uses the text display method mentioned earlier to overlay text on the YUV (or RGB) image data, encodes the overlaid YUV (or RGB) data, and finally sends the encoded video data with text display to the network, so that the terminal can display the image with text after receiving it. The specific operational flow is shown in Fig. 5.16.

Fig. 5.16
A flow diagram. Encoded data from the network is decoded into a YUV format image, which is input to the subtitle image overlay. Displayed text information is turned into dynamically generated text images and fed to the subtitle image overlay. The overlay output, as a YUV format image, is encoded and sent to the network.

Flowchart of Text Overlay on Multi-channel Distribution Controller

From the above analysis and design, it can be seen that the most complex part of text display in this technical solution is the process of dynamically generating text images; once the text images are generated, text display can be accomplished through image overlay. This method produces text with good compatibility and smooth edges, resulting in a clean and polished appearance thanks to the rendering effects supported by mobile operating systems. However, overlaying text at the SFU requires a certain amount of CPU time and memory, which is its drawback. If standardization organizations define a standardized protocol that allows text overlay to be performed by the video terminals themselves, this problem can be avoided.
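As a rough illustration of the dynamic text-image generation and overlay steps, the Python sketch below uses the Pillow imaging library (an assumption; the chapter does not specify one) to render a text image and composite it onto a decoded frame. `decode_frame`, `encode_frame`, and `send` are hypothetical hooks for the codec and network stages; a real SFU would operate on YUV data, and the YUV/RGB conversion is omitted here.

```python
from PIL import Image, ImageDraw, ImageFont

def render_text_image(text, size=(160, 24)):
    """Dynamically generate a small transparent image containing the text."""
    img = Image.new("RGBA", size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(img)
    draw.text((2, 2), text, font=ImageFont.load_default(),
              fill=(255, 255, 255, 255))
    return img

def overlay_text(rgb_frame, text, position=(16, 16)):
    """Paste the generated text image onto a decoded frame (RGB assumed)."""
    frame = rgb_frame.convert("RGBA")
    frame.alpha_composite(render_text_image(text), dest=position)
    return frame.convert("RGB")

def sfu_loop(packets, decode_frame, encode_frame, send, caption):
    """Decode -> overlay -> re-encode -> forward, per received frame."""
    for pkt in packets:
        frame = decode_frame(pkt)              # hypothetical H.264 decoder hook
        frame = overlay_text(frame, caption)   # subtitle/annotation overlay
        send(encode_frame(frame))              # hypothetical encoder + network send
```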

The collected images are combined into a continuous sequence of frames to create video content that can be viewed with the naked eye. The image collection process is mainly done by devices such as cameras, which capture raw data in YUV format that is then compressed and encoded into formats such as H.264 for distribution. Common video container formats include MP4, 3GP, AVI, MKV, WMV, MPG, VOB, FLV, SWF, MOV, RMVB, and WebM.

Images, with their strong visual impact and relatively large size, constitute the main part of video content. The main challenges in image acquisition and encoding are poor device compatibility, sensitivity to delay and stutter, and the various video image processing operations required, such as adding watermark logos and drawing vector graphics for annotations.

The raw data is obtained after the video or audio has been captured. To enhance on-site effects or add extra elements, the data is generally processed before being encoded and compressed, for example by adding watermarks such as timestamps or company logos and by editing vector graphic annotations.

Vector graphics are typically represented in a vector structure. Vector structures precisely represent geographic entities such as points, lines, and areas by recording coordinates. The coordinate space is continuous, allowing for precise definitions of any position, length, and area. Its precision is only limited by the accuracy of digital devices and the digitization bit length. In general, vector structures are much more precise than raster structures.

The characteristics of the vector structure are explicit positioning and implicit attributes: positions are stored directly as coordinates, while attributes are generally stored in the header or at specific locations within the data structure. This makes the overall graphics algorithms more complex than for raster structures, and some algorithms are even difficult to implement. There are also advantages, however: in computing length, area, and shape, and in graphic editing and geometric transformation operations, the vector structure offers high efficiency and accuracy. On the other hand, vector structures are relatively difficult to use for operations such as overlaying and domain searches.

Vector data is composed of point objects, line objects, and polygon objects. Each type of object has its own internal structure, which can be further categorized as point data, line data, and polygon data.

Points, also known as point entities, correspond to the point objects of geographic features. A point is an object with a specific location and a dimension of 0. Points are the simplest type of vector data: a 2D point can be represented by an (x, y) coordinate pair, while a 3D point can be represented by (x, y, z).

Line data, also known as line entities, corresponds to the line objects in video broadcasting technology. Lines are spatial components commonly used in one dimension, representing the spatial properties of objects and their boundaries. At least two points determine a straight line, where each point is called an endpoint or node, and lines can have multiple nodes as needed. Line data is commonly used to represent static phenomena such as roads and rivers. Line data can also be used as a data layer to help display dynamic data such as car travel routes, driving directions between two addresses, and flight paths. Depending on the level of detail required, the number of points that make up a line can be densified or generalized.

Polygon data, also known as area entities, correspond to polygon objects in video streaming technology solutions. Polygon entities describe phenomena such as lakes, islands, and land parcels. They usually record the boundaries of area features and are therefore also called polygon data. A polygon is composed of multiple lines. One difference between lines and polygons is that the former is open while the latter is closed.
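A minimal sketch of these three vector data types, with the length and area computations mentioned earlier (the area uses the shoelace formula), might look as follows in Python; the class and field names are illustrative.

```python
from dataclasses import dataclass
from math import hypot
from typing import List

@dataclass
class Point:                      # 0-dimensional entity
    x: float
    y: float

@dataclass
class Line:                       # open polyline: two or more nodes
    nodes: List[Point]

    def length(self) -> float:
        return sum(hypot(b.x - a.x, b.y - a.y)
                   for a, b in zip(self.nodes, self.nodes[1:]))

@dataclass
class Polygon:                    # closed boundary of an area entity
    boundary: List[Point]

    def area(self) -> float:
        # Shoelace formula over the closed ring of boundary coordinates.
        pts = self.boundary + self.boundary[:1]
        s = sum(a.x * b.y - b.x * a.y for a, b in zip(pts, pts[1:]))
        return abs(s) / 2.0

route = Line([Point(0, 0), Point(3, 4), Point(6, 4)])
parcel = Polygon([Point(0, 0), Point(4, 0), Point(4, 3), Point(0, 3)])
print(route.length(), parcel.area())   # 8.0 12.0
```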

The technology solution has conducted a detailed analysis of the requirements for graphic modeling. The demand for graphic modeling in the power industry differs from that of ordinary drawing software, and an understanding of the power industry modeling process is required for its requirements analysis. The drawing of basic graphic elements and the business graphic element editor have been analyzed for their requirements. The analysis of editing graphic functions mainly includes the editing of graphic elements and canvas, layer operations, and attribute editing.

The technical solution is aimed at providing a robust graphics drawing platform that supports both basic graphics editor functions and specific requirements of the electric power industry, such as topological connections, topology coloring, and power industry business graphic editors. By using the graphic editor, various graphics, such as main circuit diagrams and protection circuit diagrams, can be edited for different substations based on their specific situations. Additionally, the graphics platform should have an intuitive user interface. When drawing and controlling graphics, a separation between rendering and selection is used. The main idea is to draw the rendering part and selection part separately when creating a graphic element. The rendering part refers to the specific basic graphics that can be seen with the naked eye, including the outline, fill image, color, and so on. The selection part is used to control the graphic element, such as dragging the control point to change the size of the graphic element or dragging the graphic element to change its position. This technical solution involves using multithreading to control this process. A separate rendering thread is responsible for rendering the rendering part, and another selection thread is responsible for rendering the selection part. After creating a graphic element, the user can see the work completed by the rendering thread. However, the selection thread also draws a graphic element at the same location in a different way, and this element is specially processed so that the user cannot see it. When the user selects or drags an already drawn graphic element with the mouse, the work done by the selection thread is used. The selection thread senses the position of the mouse when it is moved over a graphic element, based on the color of the pixel point where the mouse is located, which is the color drawn by the selection thread.

From a functional perspective, both parts are necessary. If only the rendering part were used, it would be impossible to distinguish individual elements: since this technical solution distinguishes elements by the pixel color at the mouse position, and different elements may legitimately be rendered in the same color, a single rendering thread cannot rely on color alone to tell them apart.
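The color-based selection idea can be sketched as follows in Python with NumPy: each element is drawn into an off-screen selection buffer in a unique color derived from its ID, and the element under the mouse is recovered by decoding the pixel color at that position. The buffer layout and ID encoding are illustrative assumptions, not the solution's actual implementation.

```python
import numpy as np

class SelectionBuffer:
    """Off-screen 'selection' layer: each element is drawn in a unique,
    ID-derived color that the user never sees (illustrative sketch)."""

    def __init__(self, width, height):
        self.buf = np.zeros((height, width, 3), dtype=np.uint8)

    @staticmethod
    def id_to_color(element_id):
        # Encode a 24-bit element ID into an RGB triple.
        return ((element_id >> 16) & 0xFF, (element_id >> 8) & 0xFF,
                element_id & 0xFF)

    def draw_rect(self, element_id, x, y, w, h):
        self.buf[y:y + h, x:x + w] = self.id_to_color(element_id)

    def pick(self, mouse_x, mouse_y):
        r, g, b = self.buf[mouse_y, mouse_x]
        element_id = (int(r) << 16) | (int(g) << 8) | int(b)
        return element_id or None     # 0 means empty background

sel = SelectionBuffer(640, 480)
sel.draw_rect(element_id=7, x=100, y=120, w=50, h=30)
print(sel.pick(110, 130))   # -> 7
```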

5.2.5 Friendly Neighborhood Mutual View and Interaction Technology

Mobile emergency neighbor communication refers to the method of using mobile emergency terminals, wireless communication networks, embedded GIS, etc., to establish a mobile emergency location service network, and share location, information, and files within the network, as well as visualize the neighbor locations during emergency situations.

(1) Implementation of neighbor-to-neighbor visual communication technology

Terminal networking: Relevant emergency terminals are incorporated into the same mobile emergency location service network and corresponding networking rules are set.

Information reporting: Emergency terminals obtain their own location, multimedia information, and custom text information, establish a connection with the central server through a wireless communication network, and upload them to the central server.

Information Push: The central server pushes the locations and information of other emergency resources, as well as central dispatch instructions, to the relevant intelligent PDA terminals within the network based on the pre-set networking rules.

Neighbor Expression: Based on embedded GIS, intelligent PDAs locate and display the positions of other emergency resources, measure their relative position and distance, and display received files, messages, and instructions.

Mobile monitoring and command: As a result, a mobile emergency location service network has been initially constructed, which can realize functions such as information sharing within the network, neighbor location visualization, and mobile command.

(2) Neighbor position and information interaction

After various emergency resources are included in the mobile emergency location service network, their positioning and communication terminals establish a data link with the central server via GPRS or Beidou communication. Each terminal reports its own location and information to the central server, requests neighbor viewing, and obtains or updates the neighbor list from the central server. The central server pushes the location and information of other emergency resources in the network to the intelligent PDAs according to the neighbor rules, the application period, and the positioning frequency.

The PDA establishes a heartbeat connection with the central server and reports its own location to the server at set time intervals. The heartbeat program formulates satellite search and program running strategies based on GPS satellite visibility and emergency working hours, and effectively manages the power consumption of the PDA.
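A minimal sketch of such a heartbeat loop is shown below in Python; the UDP transport, server address, payload fields, and `get_gps_fix` helper are all illustrative assumptions rather than the system's actual protocol.

```python
import json
import socket
import time

def heartbeat_loop(server=("192.0.2.10", 9000), interval_s=30, get_gps_fix=None):
    """Report the terminal's position to the central server every interval_s
    seconds (the transport and payload fields are illustrative)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        fix = get_gps_fix() if get_gps_fix else None   # hypothetical GPS helper
        payload = {
            "terminal_id": "PDA-001",
            "timestamp": int(time.time()),
            "lat": fix[0] if fix else None,
            "lon": fix[1] if fix else None,
        }
        sock.sendto(json.dumps(payload).encode("utf-8"), server)
        time.sleep(interval_s)      # longer intervals save PDA power
```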

After obtaining its own position, a Beidou terminal reports its location through the Beidou communication network to the Beidou vehicle-mounted terminal connected to the central server via a serial port, and thus enters the mobile emergency location service network through the communication server. For small and medium-sized emergency departments, vehicle-mounted Beidou terminals can replace command-type Beidou terminals, effectively saving construction costs.

The emergency command center can send location and short messages to emergency personnel holding smart PDAs in the mobile emergency location service network through GPRS or BeiDou communication link. Emergency personnel with smart PDAs can also establish voice and text communication with friends through the mobile emergency mutual viewing common software. In addition, based on the friend list, group members can choose to send messages to multiple friend groups.

Developing mobile emergency neighbor-view software based on embedded GIS components is crucial for locating and displaying the positions and information of emergency terminals and neighboring terminals. The key technologies for expressing the positions of neighboring terminals are power-saving management and effective management of the PDA's limited memory under high-frequency location refresh to prevent memory overflow.
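One simple way to bound memory under high-frequency refreshes is to keep only the most recent position per neighbor in a size-limited cache, as in the Python sketch below; the capacity and eviction policy are illustrative assumptions.

```python
from collections import OrderedDict

class NeighborCache:
    """Keep only the most recent position per neighbor, bounded in size,
    so that high-frequency refreshes cannot exhaust PDA memory (sketch)."""

    def __init__(self, max_neighbors=256):
        self.max_neighbors = max_neighbors
        self.positions = OrderedDict()     # terminal_id -> (lat, lon, timestamp)

    def update(self, terminal_id, lat, lon, ts):
        if terminal_id in self.positions:
            self.positions.pop(terminal_id)
        elif len(self.positions) >= self.max_neighbors:
            self.positions.popitem(last=False)   # evict the stalest neighbor
        self.positions[terminal_id] = (lat, lon, ts)

    def latest(self, terminal_id):
        return self.positions.get(terminal_id)
```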

5.3 Real-Time Interaction Technology Between Emergency Site and Emergency Command Center

5.3.1 Power Emergency Communication Vehicle Technology

This chapter focuses on the construction of the electric power emergency communication vehicle system. Based on the overall requirements of the system, it presents the system's main functions and completes the design of the carrier platform subsystem, the electronic information subsystem, and the comprehensive security subsystem.

With the continuous development of power emergency communication technology, State Grid Zhejiang Electric Power Co., Ltd. (hereinafter “Zhejiang Electric Power Company”) has put forward new requirements for power emergency communication technology: (1) Based on the existing power emergency network in Zhejiang Province, improve the structure of the power emergency network and establish a well-developed electric power emergency communication platform suitable for Zhejiang Province. (2) Support multi-service functions, including access for image, voice, video, and other information in power emergency communication. (3) The emergency communication platform should meet the technical requirements while keeping construction costs under control. (4) High system adaptability: system capacity and functionality should have a certain degree of scalability to meet future needs.

The services carried by the power emergency communication system in Zhejiang Province include data services, voice services, and multimedia services. (1) Data services: mainly real-time monitoring services for data transmission and monitoring between the fault site and the provincial emergency communication center, and between the provincial emergency communication center and the national emergency communication center; because the bandwidth of the data channel is not large, a distributed transmission method is used. (2) Voice services: provide voice communication between Zhejiang Electric Power Company, substations, and power plants to ensure smooth communication between departments and to ensure that voice can be used for command. The transmission rate required for voice service is not high, but real-time performance and reliability are required. (3) Multimedia services: provide video communication between the provincial company and the municipal bureaus in emergency situations, as well as between substations and power plants, and ensure video communication between the various departments and the fault site.

As a special network for dealing with emergency events, the power emergency communication network must offer fast response, a robust network architecture, and economy. The unexpectedness of accidents often results in the lack of a complete, ready-to-use communication system on site, so the power emergency communication network provides real-time and reliable transmission of data, voice, and multimedia services, giving emergency center staff a scientific basis for their work.

The security requirements of the power emergency communication network cover network security, equipment security, and data security. As a special network for handling power emergencies, the power emergency communication network needs to be highly reliable and self-healing to ensure stable and reliable operation. Equipment safety is an important part of the safety of the power emergency communication system: key equipment such as the power supply, main control, and switches must be suitably stable. Data security is an important guarantee of information security, so the system should ensure the security and integrity of the data in the operating system and the database while being able to resist malicious attacks and information theft.

The design goal of this system is to establish emergency communication in the case of natural disasters and emergencies and to ensure smooth communication between the emergency center and the scene. The system consists of a fixed satellite ground station, an emergency command vehicle, an emergency communication vehicle, and a single-soldier system, as shown in Fig. 5.17.

Fig. 5.17
An illustration presents the system network. The emergency control center connects over a rented dedicated line to the satellite earth station, which is linked via satellite transponders and satellite links to portable clusters, emergency command and communication vehicles, and the single-soldier system.

System network topology diagram

This report focuses on the application of 4G technology in electric power emergency communication. The emergency communication vehicle mainly uses a combination of 4G and satellite communication technology, while the command vehicle uses satellite communication technology, so the following focuses on the design and research of the emergency communication vehicle. A broadband wireless network emergency mobile command vehicle system integrates a broadband wireless base station and related equipment into a carefully modified communication vehicle chassis, and the base station system can provide wide-area mobile coverage. It can form an emergency broadband wireless communication system with a coverage radius of 2–5 km for emergency command, connecting the many data communication devices within the coverage area (including computers, laptops, PDAs, IP phones or cell phones, video equipment, etc.) into an emergency interoperable data communication network; that is, the emergency broadband wireless network is realized on the basis of vehicle-mounted broadband wireless base stations. The system can be connected to the higher-level communication network through the vehicle-mounted microwave or satellite communication system to link the mobile broadband wireless network with the private network, forming an integrated network with the higher-level or neighboring command center to quickly complete data transmission and command-and-control communication tasks.

A 4G-satellite communication-on-the-move system is formed by combining 4G communication technology, satellite communication technology, multimedia technology, vehicle modification, and other technologies. Based on the combination of 4G and satellite communication, the command communication vehicle can communicate with single-soldier systems within 4G signal coverage and relay the signals back to the general command through the satellite station on the communication vehicle. An emergency command platform intended to collect real-time information and handle the power emergency site can thus be realized, which requires a technologically advanced, highly reliable, and practical command vehicle. As the command communication system of the emergency center of the Dongguan Power Supply Bureau, the emergency command vehicle performs the functions of emergency command, information collection, information transmission, video conferencing, and so on. The command vehicle system realizes audio, data, and video communication with the emergency command center and is divided by function into the communication subsystem, audio subsystem, and auxiliary system. In terms of overall layout, the command vehicle is divided by function into the driving area, command area, and operation area. Driving area: modified on the basis of the original vehicle design to ensure driving convenience and comfort. Command area: the area used by commanding staff to direct operations on site and to issue instructions during emergencies. The command vehicle can accommodate 6 staff members, including 4 commanders and 2 operators. Operation area: the area used by staff to execute commands and monitor equipment, carrying out tasks such as transmitting command messages, reading command cases, and issuing commands. The matching electrical equipment is safe and reliable, able to cope with environmental factors including high temperature, strong wind, rain, and snow, able to drive and work normally under different road conditions, and able to support command work at fire and earthquake scenes. The command vehicle should be attractive, comfortable, and practical after modification.

The construction principles of the 4G command and communication vehicle are as follows:

Advancedness: Make full use of the combination of 4G technology and satellite communication technology to ensure the advanced nature of the command system and guarantee the life cycle of the system. The system is flexible, easy to operate, and has a certain degree of fault tolerance.

Expandability: The system is scalable and upgradeable, with a modular design that reserves room for system upgrades and transformation, and it is able to support different interfaces.

Applicability: Ensure that the command vehicle has mobility, flexibility and the ability to guarantee staff safety in different environments, and realize the interaction between information and the site and the general command.

Economy: The hardware and software of the command vehicle achieve maximum economy and a rational allocation of resources.

Security: The security of the system includes information security and electrical security. Information security covers the setting of user privileges to prevent illegal and over-privileged operations and the provision of information backup and recovery functions. The vehicle system should provide a safe and reliable power supply.

The main functions of the 4G Emergency Command Vehicle include:

Command and dispatch: Using 4G and satellite communication means to receive instructions from the general command, command the emergency operation at the site, dynamically monitor the site, ensure the orderliness and rationality of the site command, and realize video and voice communication in the process of emergency command.

Data function: supports a variety of data terminals in wired and wireless ways. Basic data services: support for VLAN, and for real-time and non-real-time data. Voice function: the emergency communication vehicle adopts satellite technology to communicate with the command center, ensuring the smoothness and reliability of the communication link; synchronization between voice and video images without delay is also required. Video function: needs to support a variety of video services such as video surveillance, conversation, and conferencing. Video monitoring: allows command center staff to observe the site in real time; the video information can also be sent to terminal equipment so that terminals can view the on-site video. Video conversation: a terminal can have a direct video conversation with the dispatcher or with other terminals. Video conference: multi-party video conferences can be realized through the video exchange function.

GPS function

GPS real-time positioning: The location of the terminal can be grasped in real time through GPS, and the location information is then sent to the communication network over the wireless network.

Historical track playback: The system saves time and position information and forms an accurate movement track that can be played back, so that time and position can be reviewed at any moment.

Mileage statistics: The mileage traveled can be calculated for statistical purposes.

(4) Instant messaging function

The system carries the communication service and supports dialogue between correspondents, mainly in text form. In addition, point-to-point and point-to-multipoint text message sending and receiving are implemented. The system should also be able to retrieve the online status of contacts using different query criteria.

(5) Cluster scheduling function

The cluster scheduling service is divided into voice services and data services, and voice and data can also be scheduled jointly. Data services need to include real-time data such as video and non-real-time data such as file transfer.

(6) Communication security

The system provides both 4G and satellite communication. The command communication vehicle provides 4G coverage, and the information received over 4G is relayed to the general command via satellite communication to ensure long-distance communication.

The command operation subsystem monitors, stores, and plays audio and video information, provides decision support and a basis for commanders, and offers hardware and software support for on-site meetings. It consists of the audio system, the video system, and the centralized control system, which are introduced as follows. Audio system: when the audio system receives an audio signal, the signal is switched, amplified, stored, and otherwise processed, and is then sent to the conference video system to meet the needs of command and video conferencing. The conference table in the communication vehicle is equipped with a MIC interface for easy access by microphones and computers. The audio signal inside the vehicle can be connected to other vehicles and can also receive the audio signal from the conference system. The audio matrix switcher and the mixer are the core of the audio system: the switcher routes audio signals to the output ports, while the mixer amplifies, mixes, and otherwise processes the signals. Audio storage in the emergency communication vehicle is accomplished with a hard disk recorder that can store 4 channels of audio. Video system: the video system receives signals from the cameras installed inside the vehicle and on the roof, as well as from the video conferencing system, and then switches and displays the video signals to meet the needs of command and video conferencing. The emergency communication vehicle is equipped with a VGA video interface that can be connected to the output interface of a computer, and the video conferencing system can receive video signals from the scene. The video matrix and the VGA matrix are the keys to the video system, through which the video signal is transmitted over multiple channels and displayed on multiple screens. Video is stored on a hard disk recorder that can store up to 4 channels.

The centralized control system is designed for the control of the display equipment and the audio and video equipment, which mainly realizes the centralized control of the hardware of the emergency communication vehicle and provides technical support for the intelligent command system. The centralized control system concentrates the operation contents on one screen, and the staff can complete the control of related equipment according to the options, which makes the work more efficient. The interactive interface of the centralized control system needs to be simple in design. The centralized control system of the emergency communication vehicle in Dongguan City is mainly composed of the host, the touch screen, the relay, the infrared emission and the software. The connection diagram of the centralized control system is shown in Fig. 5.18, which realizes the control of the power supply of the equipment in the emergency communication vehicle, the control of the camera platform outside/inside the vehicle, and the switching and recording of the audio/video system.

Fig. 5.18
A block diagram of the centralized control system. It has connections between strong-current electrical equipment, relays, the central control unit, touch screen, large screen, DVD, VGA matrix, audio/video matrix, in-car camera, rooftop camera, and video conferencing.

Centralized control system connection diagram

The computer subsystem is the basis for receiving, transmitting, and processing information, and it provides a hardware platform for the safe and stable operation of emergency command. The computer subsystem of the Dongguan emergency communication vehicle uses two Gigabit switches, one for the classified command network and one for the unclassified public network. The two switched networks are independent of each other to ensure the security of the system. The classified command network is the core of command communication; it is composed of computer terminals, switches, firewalls, and printers, and the emergency command system and information database can be connected to it directly. The classified command network carries most of the signal transmission and is tethered to the command station by satellite systems or fiber optics after passing through the firewall. The unclassified public network is used for querying and processing public information and offers access via the public wireless network. A wireless router is set up in the emergency communication vehicle, and portable devices inside or outside the vehicle can be connected to the unclassified network through the public network. The dual-network design, composed of the classified and the unclassified networks, ensures the security of the command system: the classified network is used for command, and the firewall isolates malicious attacks and viruses. Moreover, the computer subsystem is equipped with reliable devices, each with a rich set of interfaces to support system scalability. To reduce the space occupied by the computer system, integrated host-and-monitor units are used in this design. The system design also includes an encryption device that encrypts the transmitted signals and prevents unauthorized access to and modification of them.

The communication subsystem is an important part of the emergency communication vehicle. It can achieve communication between the site and the command center through cable, satellite, 4G, shortwave, etc., and achieve interoperability between provinces and municipalities through the transfer between the command networks. It can transmit real-time information on the emergency site to the destination, providing guarantee for the emergency command.

5.3.2 Power Emergency Communication Hardware System

(1) The basic characteristics of shortwave communication

Shortwave communication frequency: Shortwave communication refers to radio communication with a wavelength of 100–10 m (a frequency of 3–30 MHz), sometimes called high-frequency communication, and it occupies an important position in traditional radio communication. Military shortwave radios on the battlefield can be used to transmit telegraph messages, voice, data, and so on, and are an important means of military communication. Shortwave is the only way to achieve ultra-long-range communication with simple, low-power, personal portable radios. Shortwave communication has unparalleled resistance to destruction and autonomous communication capability, and communication sites can be established quickly. Shortwave radio waves can cover mountainous areas, deserts, oceans, and other special areas, which general mobile communication systems cannot do; compared with satellite communication, its cost is very low and it is easy to maintain, so it is highly competitive in the field of emergency communications. Although modern mobile communication facilities keep emerging and cellular networks, 4G networks, and other base-station-centric networks are prevalent, shortwave is still alive and well in aerospace, military, and maritime communications, and it continues to grow rapidly.

The propagation of short waves: There are two basic propagation paths for short waves: the ground wave and the sky wave. The ground wave propagates along the earth's surface, and its propagation distance depends on the characteristics of the surface medium. The conductivity of the sea surface is most favorable for wave propagation, and a shortwave ground-wave signal can propagate along the sea surface for about 1000 km; the conductivity of the land surface is poorer and the wave attenuation is larger, so a shortwave signal can propagate along the ground for at most a few tens of kilometers. The main propagation path of short waves, however, is the sky wave, which propagates by reflection from the ionosphere. When the atmosphere is irradiated by sunlight, a layer of charged air called the ionosphere is formed, extending from about 60 km to about 2000 km above the ground. The ionosphere is divided into four layers, namely D, E, F1, and F2. The D layer is 60–90 km high and reflects frequencies of 2–9 MHz during the day. The E layer is 85–150 km high and has little reflective effect on short waves. The F layer has the greatest reflective effect on short waves and is divided into two layers, F1 and F2: the F1 layer is 150–200 km high and exists only during the daytime, while the F2 layer lies above 200 km and is the main body of the F layer, supporting shortwave propagation both day and night. When a short wave enters the ionosphere, it is bent by refraction; after penetrating to a certain depth it turns around, propagates downward, and finally returns to the ground. The wave returning to the ground is reflected back to the sky and then back to the ground again, and so on; through repeated reflection by the ionosphere, short waves can reach places thousands of kilometers away.

The current status of shortwave communication: Because shortwave communication is irreplaceable in military communication, it has been receiving renewed attention since the 1960s. With the application of a variety of new technologies, such as adaptive channel technology, differential frequency hopping, wideband direct-sequence spread spectrum, channel coding, channel equalization, and shortwave networking technology, many problems of shortwave communication have been solved; and with the rapid development of microcomputers, mobile communications, microelectronics, microprocessors, and digital signal processing, the quality and data rate of shortwave communication have improved continuously, and shortwave communication equipment has developed greatly.

Problems faced by shortwave communications:

1) The reliability of communication needs to be improved. The sky-wave propagation of shortwave communication is extremely unstable because of ionospheric changes and multipath propagation. On the one hand, the height and electron concentration of the ionosphere change with region, season, time, sunspot activity, and other natural factors; on the other hand, they are also affected by human factors such as ground nuclear tests, high-altitude nuclear tests, and high-power radars, which means the operating frequency of shortwave communication must be changed accordingly. Especially at dawn and dusk, the ionospheric electron density changes greatly and the frequency must be changed in time, otherwise communication will be interrupted.

2) The data transmission rate is not high enough. Limited by the carrier frequency, traditional shortwave communication has a low data transmission rate (no more than 600 bit/s), and the modulation method is generally frequency keying. Modern communication carries more and more information, including images, data, audio, and video, which the limited channel capacity of shortwave cannot support, so the data rate that can be modulated onto the limited carrier frequency needs to be further improved.

3) Anti-interference capability. Because of its importance in field communication and battlefield applications, shortwave communication plays an irreplaceable role in communication command. However, it is also very vulnerable to enemy jamming and countermeasures, so improving the system's anti-jamming capability is a major challenge.

4) Networking. As communication becomes increasingly network-based, shortwave communication is gradually developing in the direction of networking. However, compared with the microwave band, its channel capacity is more limited. How to achieve efficient system networking, optimize the network structure, and improve network response speed given the limited data rate of shortwave is the key issue.

The current situation of shortwave networks: According to the network form, shortwave networks can be divided into centrally controlled networks, distributed control networks, and hybrid networks that combine the two. In a centrally controlled network, certain special network nodes serve as the central base station while the other nodes occupy subordinate positions, and all information exchange is completed through the unified resource allocation of the base station. The advantages of this network form are simple and reliable control, convenient management, and high channel utilization; the disadvantage is poor resistance to destruction: once the base station node is destroyed, the whole network is paralyzed. Because of the central role of the base station, the network coverage area depends on the communication range of the base station, and expanding the network is very difficult. A distributed control network, also called a centerless network, places every node on an equal footing; its advantage lies in the network's self-organizing and self-restoring ability, so that the destruction of any single node does not cause the failure of the whole network. The disadvantages are that network management is difficult and the network protocol is very complex. A hybrid network adopts both structures, and the network layout is designed according to the specific situation.

(2) System solutions for emergency communication networks

1) The choice of the shortwave frequency band: Considering that emergency communication often involves complex terrain, inconvenient transportation, and long-distance communication, one of the most direct solutions is to extend the distance of direct communication. In addition, because of the particular conditions after a disaster, the distance between two neighboring nodes is often several kilometers to tens of kilometers, and the path may even be blocked by mountains or buildings. It is therefore very important to ensure that nodes can communicate over long distances and across obstacles. Based on this consideration, we choose shortwave as the wireless communication band. Compared with microwave communication, the shortwave wavelength is longer and has excellent bypass and diffraction capabilities, so communication across obstacles can easily be achieved. Because shortwave transmission depends mainly on ionospheric reflection, in clear daylight it can reach communication distances of hundreds of kilometers with relatively little power, which is beyond the reach of microwave communication, which relies on straight-line, line-of-sight transmission. Compared with long-wave communication, the shortwave transceiver antenna is relatively small, and it is easy to make simple, efficient, and portable antennas, whereas long-wave communication generally requires building a huge antenna tower to improve efficiency, which is extremely inconvenient; in addition, because the long-wave frequency is so low, its data transmission rate is very low. The biggest disadvantage of shortwave communication is that its frequency is much lower than that of microwave communication, so the achievable data transmission bit rate is lower.

2) The choice of the centerless network form: Conventional communication methods, including cellular telephony, the Internet, and the telephone network, belong to centrally controlled networks. Although these communication networks work well and deliver high performance under normal circumstances, they invariably require information exchange control centers and data-forwarding base stations. The structure of a base station or control center is generally very complex and must handle all the data of the whole area or the whole network. Once a base station fails, information in the whole area is cut off; once a control center fails, all the base stations associated with it fail and a larger area is isolated, which is the biggest drawback of a centralized network. In contrast, a distributed network, i.e., a centerless ad hoc network, is established without relying on centralized data forwarding by base stations and control centers: any node can send and receive information directly, and nodes that cannot reach each other directly can select intermediate nodes to forward information for them. At the same time, the nodes of a centerless network do not require the special, complex design of base stations, so they can be powered up and used immediately, and the network is not disabled by a power failure. Considering the difficulty of establishing base stations and control centers in disaster situations, the centerless network is the first choice for post-disaster emergency communication networks. The centerless network has the following characteristics: (1) No central node. The network is a peer-to-peer network; each node has the same hardware structure and equal network status, nodes can join and leave the network at any time, and the failure of any node does not affect the operation of the whole network, giving strong resistance to destruction. (2) Self-organization. Once a node is powered on, it can search for surrounding networks and join them, quickly setting up an independent and complete network. (3) Multi-hop routing. Any two nodes in the network can communicate, and communication between two distant nodes usually has to be forwarded by a number of intermediate nodes, commonly known as “multi-hop”; this is essentially different from the structure of a traditional centralized network, in which nodes communicate only through the routing infrastructure. (4) Dynamic topology. The network has a highly dynamic topology and self-healing ability: as nodes move, signal strength changes, and nodes join and leave, the network adjusts itself according to the self-organization protocol and updates routing information to adapt to changes in the network structure.

3) The proposed reference model of the shortwave networking system: In 1981, the International Organization for Standardization recommended a network architecture known as the Open Systems Interconnection reference model, referred to as OSI. Since the establishment of this standard, network standards have generally been aligned with OSI. The network system model proposed in this report draws on the IEEE 802.15.4 standard and the architectural ideas of the ZigBee protocol stack defined by the ZigBee Alliance, and divides the shortwave networking system into four layers: the physical layer, the media access control layer, the network layer, and the application layer.

The physical layer includes hardware facilities such as wireless transceivers and hardware codecs. The media access control layer mainly defines the types of data frames, the frame structure, and the basic methods for sending, receiving, and acknowledging data. The network layer is mainly responsible for the topology of the network. The application layer defines the commands, network maintenance methods, path selection methods, exception handling methods, and so on, according to the user's functional requirements.

4) Implementation of the network node device structure: Each standard network node of the centerless network designed in this system has equal status; apart from differences in ID number and upper-layer attribute settings, the underlying devices are identical and can be replicated directly, and each node is ready for use as soon as it is powered on. A standard centerless network node in this design consists of the following eight parts: a transceiver antenna, a transceiver, a hardware codec, a USB interface converter, an embedded platform, application software, a display and keyboard, and a backup battery.

(3) Emergency communication network hardware design scheme

There are six main components in the overall physical-layer hardware solution: the transceiver antenna, the receiver module, the transmitter module, the FPGA codec module, the USB communication module, and the power module. The antenna is the basis of radio signal transmission and reception; the receiver module is responsible for signal reception, conditioning, and demodulation; the transmitter module is responsible for signal modulation, conditioning, and transmission; the FPGA module is responsible for signal encoding and decoding, error detection, and error correction; and the USB communication module is responsible for communication between the underlying circuit and the host computer. Together, these modules provide the hardware environment for proper system communication.

In the modulation and demodulation part, the FSK modulation circuit of the modulator is mainly based on the Analog Devices AD9954. The AD9954 is a highly integrated DDS device developed with advanced DDS technology and designed specifically for signal modulation. With a core conversion rate of up to 400 MSPS and a built-in high-speed, high-performance D/A converter and ultra-high-speed comparator, it can be used as a digitally programmable frequency synthesizer capable of generating analog sine waves up to 200 MHz. The AD9954 contains 1024 × 32 bits of static RAM, which can be used for high-speed modulation and supports several sweep modes. It provides a customizable linear sweep mode of operation with fast frequency conversion and good frequency resolution via control words input through its serial port. Applications include frequency synthesizers, programmable clock generators, FM modulation sources for radar and scanning systems, and test and measurement equipment. Its main features include a built-in 400 MSPS clock; a 14-bit DAC; programmable phase and amplitude; a 32-bit frequency tuning word; serial control; a built-in ultra-high-speed analog comparator; automatic linear and non-linear sweep; 1024 × 32 bits of internal RAM; a 1.8 V power supply; a 4–20× frequency multiplier; support for 5 V levels on most digital inputs; and multi-chip synchronization.

For digital signal transmission, high-speed modulation and demodulation is a challenge. A conventional signal source is generated by a phase-locked loop, but phase locking is a slow stabilization process, so high-speed frequency hopping is impossible, and phase-locked-loop signal sources are mostly used for analog modulation such as FM. DDS devices solve the problem of high-speed modulation at shortwave frequencies: since a DDS is essentially a digital device whose output waveform is synthesized directly by a D/A converter, the modulation speed is not a bottleneck, and the required modulation speed can be achieved in the shortwave band.
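Since the device uses a 32-bit frequency tuning word, the carrier frequency for each FSK symbol can be expressed as a tuning word via f_out = FTW × f_sysclk / 2^32. The Python sketch below computes illustrative tuning words for a 2FSK scheme in the shortwave band; the mark/space frequencies and the 1 kHz shift are assumptions for illustration only.

```python
SYSCLK_HZ = 400_000_000          # DDS core clock (up to 400 MSPS)

def freq_to_ftw(f_out_hz, f_sysclk_hz=SYSCLK_HZ):
    """32-bit DDS frequency tuning word: f_out = FTW * f_sysclk / 2**32."""
    return round(f_out_hz * (1 << 32) / f_sysclk_hz) & 0xFFFFFFFF

# Illustrative 2FSK in the shortwave band: one tuning word per symbol value.
F_SPACE = 10_000_000             # '0' -> 10.000 MHz carrier
F_MARK = 10_001_000              # '1' -> 10.001 MHz carrier (1 kHz shift)
FTW_TABLE = {0: freq_to_ftw(F_SPACE), 1: freq_to_ftw(F_MARK)}

def modulate(bits):
    """Yield the tuning word to load into the DDS for each data bit."""
    for b in bits:
        yield FTW_TABLE[b]

print([hex(w) for w in modulate([0, 1, 1, 0])])
```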

5.3.3 Communication Monitoring Software System

This chapter introduces the software design and implementation of the wireless self-organizing network communication/monitoring system for emergency and disaster relief. It first describes the construction of the system software platform environment, including setting up the cross-compilation environment, bootloader porting and Linux kernel porting, which lays the foundation for the software design and implementation; it then describes the design and implementation of the self-organizing terminal software, including the video, voice, text, file and picture modules; finally, it introduces the routing protocols used in the network, the principle of the dynamic source routing (DSR) protocol adopted by this system, and the process of implementing it on the wireless self-organizing terminal. The parts of the wireless self-organizing communication/monitoring system software are shown in Fig. 5.19.

Fig. 5.19
A framework has 3 parts. The embedded environment build has a cross-compiling environment build, bootloader migration, and Linux kernel porting. The self-organizing terminal system software has video, audio, text, file, and image. The routing protocol is an IP-based multi-hop self-organizing dynamic routing protocol.

Software block diagram of wireless self-organizing communication/monitoring system

  1. (1)

    System software platform environment construction

Linux is an open-source, Unix-like operating system that includes a kernel, system tools, and a complete development environment. An embedded Linux system is a dedicated computer system designed for a specific application by combining embedded technology and computer technology, exploiting the features of Linux and applying them in an embedded environment. The embedded Linux system structure is divided into the user level, kernel level and hardware level. A user application invokes Linux drivers and the file system through the Linux system call interface; the invoked device driver locates the corresponding hardware module on the embedded development board and operates on the hardware level, thereby realizing the functions required by the user. The embedded Linux system architecture is shown in Fig. 5.20.

Fig. 5.20
A framework of the embedded Linux system. User applications go through the C library to the Linux system call interface; below it are the Linux file system and sockets, character device drivers, the hard disk or flash file system, TCP/IP, block device drivers, and network device drivers, down to the hardware.

Embedded Linux system architecture

  1. (2)

    Establishment of cross-compilation environment

Since embedded targets generally lack native development capability, a development environment must first be set up for embedded software development. This report studies a wireless self-organizing network communication/monitoring system for emergency and disaster relief; the ARM Linux platform that runs the resulting programs is called the target machine, while the programs are written, compiled and debugged on an x86 Linux platform, called the host. As shown in Fig. 5.21, programs for the ARM platform are generated on the host with the help of the host's cross-compilation toolchain, a compilation environment that includes a compiler, a linker and related tools.

Fig. 5.21
A 3-D diagram presents the functions of the host computer and target machine. The host computer has a cross-compilation toolchain for the cross-platform development environment. The target machine holds the bootloader, kernel, root file system and application program.

Functional structure of the host and target machine

The cross-compilation toolchain used by the host computer is arm-linux-gcc-4.5.1, and the target machine is connected to the host by a serial cable to support cross-compilation development of the embedded Linux software. arm-linux-gcc-4.5.1 is installed as follows:

  1. 1)

    Copy the arm-linux-gcc-4.5.1.tgz archive to the specified folder: #mv arm-linux-gcc-4.5.1.tgz /home/cai/

  2. 2)

    Change into that folder: #cd /home/cai/

  3. 3)

    Unpack the archive: #tar zxvf arm-linux-gcc-4.5.1.tgz -C

  4. 4)

    After adding the cross-compiler toolchain path to the PATH environment variable, enter the command arm-linux-gcc -v in a terminal to view the version, default path and other details of the installed cross-compilation tool.

  1. (3)

    Bootloader porting

In a traditional computer architecture, the system is booted by the BIOS (Basic Input Output System) together with the operating system bootloader stored in the MBR, whereas in an embedded Linux system the bootloader alone boots the system. The bootloader is a relatively small program burned into the embedded system's non-volatile memory; its main role is to bring the operating system and user programs up into normal operation.

The bootloader is closely tied to the embedded hardware platform: the processor architecture and the peripheral devices on the development board all affect its implementation. Common bootloaders for embedded Linux systems include RedBoot, ARMboot, U-Boot and Superboot, and users normally choose a suitable bootloader according to the device in use.

The bootloader generally starts in two stages. The first stage is written in assembly and is mainly responsible for initializing the CPU-architecture-dependent hardware and then jumping to the second-stage code. The second stage is written in C; its main tasks are to detect the system memory map, set the kernel boot parameters, and load the kernel image and root file system.

  1. (4)

    Linux kernel porting

The main advantage of an embedded Linux system is that its kernel can be tailored, which allows a properly trimmed and ported Linux system to exploit the hardware to the fullest. The embedded Linux kernel is divided into five functional parts: memory management, process scheduling, the virtual file system, the network interface, and inter-process communication. The functions of each part are described below.

Memory management: controls and coordinates the processes so that they share main memory safely, manages the physical memory of the whole Linux system effectively, and responds promptly to memory-allocation requests from each subsystem. Linux supports virtual memory, so additional memory can be obtained by using the disk: temporarily inactive program data is swapped out to disk, and when memory runs short the memory manager performs the translation between each process's virtual memory and physical memory.

Process scheduling: controls access to the CPU by the processes in the Linux system. When processes are ready to run, the scheduler orders them by priority and starts the most important ones first.

Virtual File System: The virtual file system uses a file model to represent different file systems. This model shields the specific differences between file systems and allows the Linux system kernel to support multiple file systems.

Network Interface: The network interface provides Linux system support for multiple network standards and hardware devices, and the parameters of the network interface can be configured to enable information communication between different network devices.

Inter-process communication: there are various ways for processes to communicate, including pipes, sockets, semaphores and shared memory; these mechanisms enable synchronization, data exchange and sharing between different user-space processes.
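As a minimal illustration of one of the IPC mechanisms just listed, the sketch below passes a short message from a child process to its parent through an anonymous pipe. It is generic POSIX/Linux code for illustration only, not taken from the system's own software.

  // Minimal sketch: inter-process communication over an anonymous pipe.
  #include <unistd.h>
  #include <sys/wait.h>
  #include <cstdio>

  int main() {
      int fd[2];
      if (pipe(fd) == -1) { perror("pipe"); return 1; }
      pid_t pid = fork();
      if (pid == 0) {                       // child: writes a message into the pipe
          close(fd[0]);
          const char msg[] = "status: node online";
          write(fd[1], msg, sizeof(msg));
          close(fd[1]);
          return 0;
      }
      close(fd[1]);                         // parent: reads the message from the pipe
      char buf[64] = {0};
      read(fd[0], buf, sizeof(buf) - 1);
      std::printf("received: %s\n", buf);
      close(fd[0]);
      wait(nullptr);                        // reap the child process
  }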

For the embedded Linux system, corresponding drivers must be provided for every hardware device of the self-organizing network system, linking the kernel to the hardware so that the system operates normally. The system uses Linux kernel version 3.5.0; the specific porting steps are as follows:

  1. 1)

    Kernel preparation

    Download the Linux-3.5.0.tgz source tarball, unpack it, and enter the unpacked folder: #tar zxvf linux-3.5.0.tgz. Then modify the Makefile so that ARCH = arm and CROSS_COMPILE = arm-linux-, i.e., select the ARM CPU architecture and the arm-linux-gcc cross-compilation toolchain.

  2. 2)

    Kernel configuration

    Go to the unpacked Linux-3.5.0 folder and execute the make menuconfig command to configure the embedded Linux kernel.

  3. 3)

    Kernel compilation

    Once the embedded Linux kernel parameters are configured, the kernel is compiled by executing the make command at the command line of the SecureCRT software, and the zImage kernel image is generated under the default path.

  4. (5)

    System software interface design

The design and implementation of the wireless self-organizing network communication/monitoring system software for disaster relief mainly covers the software interface, communication signaling, and the video, voice, text, file and picture modules. The implementation of each part is described below.

The self-organizing terminal interface of the wireless self-organizing communication/monitoring system for emergency and disaster relief is designed and implemented with Qt, a C++ graphical user interface development framework from The Qt Company that supports Linux, OS X, Windows, Android, BlackBerry, iOS and other platforms.

Qt uses a modular code base to make porting easier for users, who only need to port the appropriate modules. Qt divides all functional modules into three parts: Qt Essentials, Qt Add-Ons, and Qt tools. The module framework is shown in Fig. 5.22.

Fig. 5.22
A framework of Q t. It consists of Q t test, Q t S Q L, Q t web kit, Q t multimedia, Q t Q m l, Q t quick, Q t network, Q t G U I, and Q t core.

Qt framework

Base Module: Defines the basic functionality of Qt for various platforms and is the core of Qt.

Extension modules: additional modules designed for special features and available only on certain platforms, such as Bluetooth, Qt Graphical Effects, and serial-port communication (Qt Serial Port).

Development tools module: the development tools needed during interface design, such as Qt Designer. As a C++ program development framework, Qt includes GUI toolkits and development modules for networking, OpenGL, sensors, communication protocols, databases, Web technologies, XML and JSON, providing users with a convenient development environment.

Signals and slots are used in Qt instead of callback techniques. When a specific event occurs, a signal is emitted; a slot is a function called in response to a specific signal. Qt provides a number of predefined slots, and classes can be subclassed to add their own slots for handling the signals of interest. Signals and slots form a type-safe mechanism: the parameter types of a signal must match those of the receiving slot for the connection to work. They are also loosely coupled: the class that emits a signal does not need to know which class receives it.
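A minimal sketch of this mechanism is shown below, assuming Qt 5 connect syntax; the class, signal and slot names are illustrative only and are not part of the system's actual code.

  // Minimal sketch of Qt signals and slots (requires moc, as in any Qt build).
  #include <QObject>
  #include <QDebug>

  class LinkMonitor : public QObject {
      Q_OBJECT
  signals:
      void packetLost(int count);           // emitted when loss is detected
  };

  class Logger : public QObject {
      Q_OBJECT
  public slots:
      void onPacketLost(int count) {        // slot with a matching parameter type
          qDebug() << "packets lost:" << count;
      }
  };

  // Typical use elsewhere in the application:
  //   LinkMonitor monitor;
  //   Logger logger;
  //   QObject::connect(&monitor, &LinkMonitor::packetLost,
  //                    &logger,  &Logger::onPacketLost);
  //   emit monitor.packetLost(3);          // the connected slot is then invoked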

Qt's signal-slot mechanism ensures that a slot receives exactly the parameters carried by the signal and is invoked at the right time. Compared with other development tools, Qt has some unique advantages:

  1. 1)

    With good cross-platform support, Qt runs on Windows 95/98, Windows NT, Linux, SunOS, Digital UNIX, SCO and other operating systems; the same code can be compiled and run on all supported platforms, and the interface takes on the native style of each platform.

  2. 2)

    Qt is object-oriented and very convenient for developers; its good encapsulation makes the framework highly modular and reusable.

  3. 3)

    Qt has a rich API, with at least 240 C++ classes in its library, as well as template-based file, serialization, I/O device, and date/time classes.

  4. 4)

    Qt supports rendering of 2D/3D graphics in the interface and supports OpenGL, providing users with good visual effects.

  5. 5)

    Qt comes with extensive documentation.

The interface of the self-organizing terminal of the wireless self-organizing network communication/monitoring system for emergency and disaster relief is implemented with Qt. Considering the usage scenarios of the terminal, the interface is designed to be simple and easy to operate. It is divided into six major parts: information, video, voice, camera, file and attributes, which respectively realize multi-hop wireless transmission of text messages, video data, voice signals, pictures and files, while the attributes page shows the configuration of each parameter of the terminal. Each part is described in detail below.

  1. (6)

    H.264 based video transmission

The video transmission of the wireless self-assembling communication/monitoring system for emergency and disaster relief uses H.264 encoding and the V4L2 programming framework to provide good video service for emergency and disaster relief personnel.

  1. 1)

    H.264 encoding

H.264 is a new-generation digital video compression coding standard jointly developed by the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU) on the basis of earlier MPEG-4 coding methods, and it delivers better image quality. Its advantages are mainly reflected in the following aspects. Low bit rate: compared with other coding technologies such as MPEG-2 and MPEG-4, H.264 produces roughly 1/8 of the data volume of MPEG-2 for the same image quality. High-quality video: H.264 can provide high-definition picture quality even at low bit rates, giving smooth HD video transmission over lower bandwidths. Strong network adaptability: H.264 itself provides network abstraction layer services, so encoded streams can be transmitted smoothly over various networks (e.g. the Internet, GPRS, WCDMA, CDMA2000).

Hybrid coding structure: H.264 adopts a hybrid coding structure that mixes DCT transform coding with DPCM differential coding to improve coding efficiency. Error recovery: H.264 effectively alleviates the problem of packet loss during network transmission, which makes H.264-encoded image data suitable for wireless networks. H.264 uses a layered design, namely the video coding layer (VCL) and the network abstraction layer (NAL), to separate video-stream compression from the network transmission of packets. The video coding layer compresses and encodes the video stream data and is the core of H.264, while the network abstraction layer encapsulates the encoded data according to network standards to ensure that the data can be transmitted over multiple networks. The H.264 layered model is shown in Fig. 5.23.

Fig. 5.23
A layered model of H dot 264. Control data is connected 2-way to the video coding layer and segmentation data. V C L is connected to segmentation data. Control data and segmentation data go to the Network Abstraction Layer with H dot 320, M P 4 F F, H dot 324, and so on.

H.264 layered model

The video coding layer mainly consists of the video encoder and video decoder. Different methods, such as inter-frame or intra-frame coding, can be chosen for efficient compression of the video stream, and the encoding process consists of five parts: inter-frame and intra-frame prediction, transform and inverse transform, quantization and inverse quantization, loop filtering, and entropy coding. The network abstraction layer encapsulates the data in a network-oriented segmentation format, including frame grouping, logical signaling and end-of-sequence signals. The basic unit of the NAL layer is the NALU, composed of a fixed syntax structure whose byte length varies with the transmitted data; a NALU mainly contains the NAL header and the RBSP byte stream.
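To make the NALU structure concrete, the following sketch walks an Annex-B H.264 byte stream, finds the start codes and prints the nal_ref_idc and nal_unit_type fields of each one-byte NAL header. It is a generic illustration, not code from this system.

  // Minimal sketch: listing NAL units in an H.264 Annex-B byte stream.
  #include <cstdint>
  #include <cstdio>
  #include <vector>

  // Each NAL unit follows a 0x000001 (or 0x00000001) start code. The one-byte
  // NAL header carries forbidden_zero_bit, nal_ref_idc and nal_unit_type.
  void list_nal_units(const std::vector<uint8_t>& stream) {
      for (size_t i = 0; i + 3 < stream.size(); ++i) {
          bool start_code = stream[i] == 0 && stream[i + 1] == 0 && stream[i + 2] == 1;
          if (!start_code) continue;
          uint8_t header = stream[i + 3];
          int nal_ref_idc   = (header >> 5) & 0x03;
          int nal_unit_type =  header       & 0x1F;   // 5 = IDR slice, 7 = SPS, 8 = PPS, ...
          std::printf("NAL at offset %zu: ref_idc=%d type=%d\n", i, nal_ref_idc, nal_unit_type);
      }
  }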

  1. 2)

    V4L2 Framework

The wireless self-organizing communication/monitoring system uses the V4L2 framework to implement video capture from the USB camera. V4L2 is the dedicated video framework of Linux, providing driver support for video capture on Linux systems; users can call its API functions directly to operate the on-board multimedia devices. V4L2 is compatible with most existing drivers and is widely used in embedded devices, personal computers, and mobile terminals.

V4L2 implements video data acquisition as a pipeline. The acquisition program works by calling the ioctl() function, whose three main parameters are the device descriptor, the control command, and the command argument. Function interfaces such as open, read, mmap and write are commonly used in the capture process, and the common control commands are listed below.

Fig. 5.24
A flowchart with 14 steps. The first step is to turn on the device and the last step is to turn off the device. The second step is to get driver information, followed by 9 steps and a decision box that asks if the collection is finished; if yes, stop collecting, if no, go back to step 10.

Video capture process

  • VIDIOC_REQBUFS: request and allocate buffer memory

  • VIDIOC_QUERYCAP: query the driver's capabilities

  • VIDIOC_S_FMT: set the video capture format

  • VIDIOC_QBUF: place an empty buffer into the driver's incoming queue

  • VIDIOC_DQBUF: take a filled buffer of captured data out of the outgoing queue

  • VIDIOC_STREAMON: start video capture streaming

The V4L2 video data acquisition process is shown in Fig. 5.24.
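The sketch below condenses that acquisition sequence into code, issuing the ioctl commands listed above in order (query, set format, request buffers, queue, stream on, dequeue). The device path, resolution and buffer count are assumptions, and the mmap step and most error handling are omitted for brevity.

  // Condensed V4L2 capture sketch (illustrative only; buffers are not mmap'ed).
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/videodev2.h>
  #include <cstdio>

  int main() {
      int fd = open("/dev/video0", O_RDWR);            // open the capture device (assumed path)

      v4l2_capability cap{};
      ioctl(fd, VIDIOC_QUERYCAP, &cap);                // query driver capabilities

      v4l2_format fmt{};
      fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
      fmt.fmt.pix.width = 640;                         // assumed resolution
      fmt.fmt.pix.height = 480;
      fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;     // YUV 4:2:2, as delivered by the camera
      ioctl(fd, VIDIOC_S_FMT, &fmt);                   // set the capture format

      v4l2_requestbuffers req{};
      req.count = 4;
      req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
      req.memory = V4L2_MEMORY_MMAP;
      ioctl(fd, VIDIOC_REQBUFS, &req);                 // allocate kernel buffers

      for (unsigned i = 0; i < req.count; ++i) {
          v4l2_buffer buf{};
          buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
          buf.memory = V4L2_MEMORY_MMAP;
          buf.index = i;
          ioctl(fd, VIDIOC_QBUF, &buf);                // queue each empty buffer
      }

      int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
      ioctl(fd, VIDIOC_STREAMON, &type);               // start streaming

      v4l2_buffer buf{};
      buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
      buf.memory = V4L2_MEMORY_MMAP;
      ioctl(fd, VIDIOC_DQBUF, &buf);                   // dequeue one captured frame
      std::printf("captured %u bytes\n", buf.bytesused);
      ioctl(fd, VIDIOC_QBUF, &buf);                    // return the buffer to the queue

      ioctl(fd, VIDIOC_STREAMOFF, &type);
      close(fd);
  }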

  1. 3)

    H.264 encoding implementation

The video coding of the wireless self-organizing network communication/monitoring system for emergency and disaster relief uses hardware codecs and is implemented on a Samsung-series processor. In the acquisition and transmission stage, the USB camera captures images in YUV422 format; after analog-to-digital conversion they form a digital stream, which the FIMC (the camera controller integrated in the main control chip) converts into NV12-format images and the MFC (Samsung's multimedia hardware codec) finally encodes into an H.264 stream.

The Tiny4412 core board used in this system integrates Samsung's MFC multimedia hardware codec module, which supports MPEG-2, MPEG-4, H.263, H.264 and other formats and is a low-power, high-performance video codec. A traditional software codec relies on the CPU for encoding and decoding; it supports many formats but increases the CPU load and affects the normal operation of other components. With the hardware codec, a dedicated processing unit rather than the CPU takes over the codec tasks and occupies very few CPU resources, greatly reducing the CPU workload and improving both codec speed and the smooth operation of the whole system. The coding flow of the MFC module is shown in Fig. 5.25.

Fig. 5.25
A flowchart with 11 steps. Step 1 is to turn on the MFC device. The last steps are error handling and clearing the MFC module. The flow ends if the previous steps are unsuccessful, as indicated by 3 decision boxes. The module is cleared if there is no next frame, and the flow loops back to encoding while frames remain.

MFC coding flow

  1. 4)

    RTP transmission

Streaming media is a format in which audio and video data are compressed and encoded for real-time transmission over a network; the files delivered by streaming technology are generally video and audio files. Audio and video can be delivered over a network in two ways: downloading and streaming. Given current bandwidth limits, downloading a video file takes considerable time and introduces a large delay, whereas with streaming the playback of audio and video can begin after only a few seconds of start-up buffering, which greatly improves the user experience. The video transmission of the wireless self-organizing communication/monitoring system for emergency and disaster relief is implemented with RTP, the Real-time Transport Protocol. RTP is commonly used in streaming media systems to provide real-time end-to-end transport for the system's streaming data. The RTP standard contains two sub-protocols, RTP and RTCP. The data transfer protocol RTP is responsible for the real-time transmission of data over the network and provides timestamps, sequence numbers and payload-format information to enable synchronization and packet-order detection; the control protocol RTCP provides quality-of-service (QoS) feedback for RTP, collecting statistics such as bytes transferred, packets lost and network delay during the streaming session. RTP and RTCP work in tandem to provide a reliable mechanism for real-time delivery of streaming media. An RTP datagram contains two main parts: the RTP header and the RTP payload. The header has a fixed format and carries the information needed for streaming transmission, while the payload carries the audio and video data to be transmitted.
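As an illustration of the fixed header just described, the following sketch lays out the 12-byte RTP header fields defined in RFC 3550 and serializes them into network byte order; the default payload type and field values are placeholders, not the system's actual configuration.

  // Sketch of the fixed 12-byte RTP header (RFC 3550), serialized field by field
  // so the result is independent of the machine's byte order.
  #include <cstdint>

  struct RtpHeader {
      uint8_t  version = 2;        // always 2
      bool     padding = false;
      bool     extension = false;
      uint8_t  csrc_count = 0;
      bool     marker = false;
      uint8_t  payload_type = 96;  // dynamic payload type, e.g. H.264 per RFC 6184 (assumed)
      uint16_t sequence_number = 0;
      uint32_t timestamp = 0;      // 90 kHz clock for video
      uint32_t ssrc = 0;           // synchronization source identifier
  };

  // Writes the 12-byte header into 'out' in network byte order.
  void serialize(const RtpHeader& h, uint8_t out[12]) {
      out[0]  = (h.version << 6) | (h.padding << 5) | (h.extension << 4) | h.csrc_count;
      out[1]  = (h.marker << 7) | h.payload_type;
      out[2]  = h.sequence_number >> 8;        out[3]  = h.sequence_number & 0xFF;
      out[4]  = h.timestamp >> 24;             out[5]  = (h.timestamp >> 16) & 0xFF;
      out[6]  = (h.timestamp >> 8) & 0xFF;     out[7]  = h.timestamp & 0xFF;
      out[8]  = h.ssrc >> 24;                  out[9]  = (h.ssrc >> 16) & 0xFF;
      out[10] = (h.ssrc >> 8) & 0xFF;          out[11] = h.ssrc & 0xFF;
  }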

  1. 5)

    Video transmission interface

The video transmission of the system's self-organizing terminal is full-duplex: both sides send and receive video simultaneously during a call, which ensures that rescue and relief personnel keep each other informed of the situation when a disaster occurs. The right side of the interface is divided into four parts. The first part shows the network technical indicators that reflect network conditions during transmission: throughput, delay and jitter. Throughput is the amount of data successfully received per unit time, delay is the time needed to transmit a message or packet from one end of the network to the other, and jitter is the degree of variation in that delay. The second part is the short-code input box for the target terminal node; entering the short code of the other terminal and clicking the video button issues the communication signaling request. The third part is the last-hop address display box, which shows the last-hop address of the packets received during the call. The system supports multi-hop transmission of services and adopts the basic idea of the Dynamic Source Routing (DSR) protocol, implemented in embedded Linux as an IP-based multi-hop self-organizing dynamic routing protocol, so the displayed last-hop address reveals the path along which data has travelled. The fourth part holds the send and exit buttons for video: after entering the short code of the target terminal node, clicking the video button performs the signaling request and establishes the connection, and clicking exit quits the current task. The video frame on the left is the display area, divided into two parts: the upper-right corner shows the picture captured and sent by this terminal's camera, and the remaining area shows the video stream received by this terminal. A video service supporting full-duplex communication and multi-hop transmission provides a powerful communication guarantee for rescue operations in the disaster area.
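One plausible way to produce the jitter figure shown on this page is the interarrival-jitter estimator of RFC 3550, sketched below; the structure and variable names are illustrative assumptions rather than the system's actual implementation.

  // Sketch of an RFC 3550 interarrival-jitter estimator: J = J + (|D| - J)/16.
  struct JitterEstimator {
      double    jitter = 0.0;        // smoothed estimate, in the same units as the timestamps
      long long last_transit = 0;
      bool      first = true;

      // 'arrival' and 'rtp_timestamp' must be expressed in the same clock units.
      void onPacket(long long arrival, long long rtp_timestamp) {
          long long transit = arrival - rtp_timestamp;
          if (first) { last_transit = transit; first = false; return; }
          long long d = transit - last_transit;
          if (d < 0) d = -d;                          // |D(i-1, i)|
          last_transit = transit;
          jitter += (static_cast<double>(d) - jitter) / 16.0;
      }
  };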

  1. (7)

    G.729-based voice intercom

Voice intercom is an important means of information exchange, and the wireless self-organizing network communication/monitoring system for emergency and disaster relief realizes a voice intercom function between terminals. Good voice quality has always been the goal; as technology develops, compressing the signal's transmission bandwidth and increasing the channel's transmission rate both play important roles in voice communication. Speech coding encodes the original speech signal captured by the hardware: the analog signal is converted into a digital signal, which compresses the voice signal, maintains communication quality and eases transmission. There are three common speech coding methods: waveform coding, parametric coding and hybrid coding. Waveform coding converts the time-domain waveform of the speech signal directly into a digital code sequence; the collected analog waveform is sampled at a certain frequency and quantized, and at the receiving end the digital signal is decoded and restored to an analog signal whose waveform matches the original. Waveform coding gives high call quality at a relatively high coding rate; its disadvantage is that compression is inefficient and occupies a larger transmission bandwidth. Parametric coding, also known as source coding, abstracts the human vocal tract into a fixed model, extracts the model's characteristic parameters from the speech signal and encodes them, and reconstructs the speech at the receiving end using the same mathematical model. Parametric coding achieves a high compression rate and occupies little transmission bandwidth; its disadvantage is that the reconstructed speech quality is comparatively low and easily affected by ambient noise. Hybrid coding combines waveform coding and parametric coding, drawing on the advantages of both to compensate for their respective shortcomings: waveform-coding techniques are added on top of parametric coding so that compression efficiency and call quality improve together, which works well for voice transmission in practice. Voice communication in mobile networks is generally implemented with hybrid coding techniques.

In view of the application scenario of the wireless self-organizing communication/monitoring system and the development platform used, and considering the limited resources and processing capability of the embedded system together with the requirements of low bit rate, low complexity and real-time operation, G.729 speech coding was finally chosen for this system after a comprehensive comparison of the speech coding techniques discussed above. G.729 is a linear predictive coder based on conjugate-structure algebraic-code-excited linear prediction. At a sampling frequency of 8 kHz it codes speech in frames of 10 ms, each containing 80 samples. The speech samples are analysed to extract the CELP model parameters, which consist of linear prediction coefficients, adaptive codebook indices and gains. These parameters are encoded and transmitted to the receiver, where the speech signal is reconstructed by decoding.
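The framing numbers above can be checked with a few lines of arithmetic, assuming G.729's standard rate of 8 kbit/s:

  // Worked numbers for G.729 framing: 8 kHz sampling and 10 ms frames give
  // 80 samples per frame; at 8 kbit/s each frame is encoded into 10 bytes.
  #include <cstdio>

  int main() {
      constexpr int sample_rate_hz    = 8000;   // G.729 input sampling rate
      constexpr int frame_ms          = 10;     // one G.729 frame
      constexpr int samples_per_frame = sample_rate_hz * frame_ms / 1000;   // = 80
      constexpr int bitrate_bps       = 8000;   // standard G.729 rate of 8 kbit/s
      constexpr int bytes_per_frame   = bitrate_bps * frame_ms / 1000 / 8;  // = 10
      std::printf("%d samples in, %d bytes out per %d ms frame\n",
                  samples_per_frame, bytes_per_frame, frame_ms);
  }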

The system uses the ALSA architecture to capture the original sound signal. ALSA, the Advanced Linux Sound Architecture, provides audio support on Linux systems; its modular design efficiently supports audio devices from entry level to professional level. Through ALSA's functions the system can easily drive the on-board devices to acquire the voice signal. On the left side of the interface are the input box for the other party's short code and the configuration information of this terminal. Enter the short code of the self-organizing terminal with which voice communication is required and click the call button to issue the voice signaling request and establish the voice connection; voice is captured and played back through a standard 3.5 mm headset and microphone attached to the wireless self-organizing terminal. Both parties then enter the voice communication interface: the wavy line on the right indicates that the voice signal is being captured, the timer below records the duration of the call, and the exit button at the bottom ends the current call and closes the connection. The voice communication interface of the wireless self-organizing network communication/monitoring system for emergency and disaster relief is friendly, simple to operate and functional, providing a strong guarantee for information exchange in complex geographical environments.
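A minimal sketch of such an ALSA capture path is given below; the device name "default", the 500 ms latency value and the single-frame read are assumptions made for illustration, and production code would add full error handling.

  // Minimal ALSA capture sketch: 8 kHz, mono, 16-bit PCM, one 10 ms frame.
  #include <alsa/asoundlib.h>
  #include <cstdio>

  int main() {
      snd_pcm_t* pcm = nullptr;
      if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_CAPTURE, 0) < 0) {
          std::fprintf(stderr, "cannot open capture device\n");
          return 1;
      }
      // 16-bit little-endian, interleaved access, 1 channel, 8000 Hz,
      // software resampling allowed, 500 ms of internal buffering.
      snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE, SND_PCM_ACCESS_RW_INTERLEAVED,
                         1, 8000, 1, 500000);

      short frame[80];                                      // one 10 ms frame at 8 kHz
      snd_pcm_sframes_t got = snd_pcm_readi(pcm, frame, 80);
      std::printf("captured %ld samples\n", (long)got);
      // The captured frame would then be handed to the G.729 encoder and sent
      // over the network by the intercom module.
      snd_pcm_close(pcm);
  }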