Fuzzy-based video compression using bilinear fuzzy relation equations

Cardone, Barbara; Di Martino, Ferdinando

doi:10.1007/s12652-023-04748-w


Original Research
Open access
Published: 25 January 2024

Volume 15, pages 2215–2225, (2024)
Journal of Ambient Intelligence and Humanized Computing


Abstract

We present a novel color video compression method that uses the greatest solution of a system of bilinear fuzzy relation equations (BFRE) to assess the similarity between frames. The frames in each band are treated separately, and each frame is classified as an Intra frame or a Predictive frame. A frame is labelled as a Predictive frame, and compressed more strongly than an Intra frame, if its similarity with the previous Intra frame is higher than a selected threshold; a pre-processing step is performed to select the optimal threshold value of the similarity between frames. The proposed method yields high-quality reconstructed frames and does not require high CPU time or memory storage. It was tested on color videos of the Fast-Moving Objects dataset; the results show that it performs better than the Lukasiewicz similarity-based video compression method and comparably to MPEG-4 and the deep learning video compression method DVC_Pro. In particular, the quality of the reconstructed frames obtained with BFRE is comparable with that of DVC_Pro, while its computational complexity is lower, giving better video encoding speed.


1 Introduction

The exponential evolution of digital technologies and the need to use and archive video streams in numerous applications have led to the development of the standard video compression methods and approaches used today in devices and video communication applications (Abomhara et al. 2010; Beach and Owen 2018).

Recently, several authors have presented new video compression algorithms that aim to balance information loss against computational speed. In particular, deep neural network-based autoencoders for video compression have been proposed. These methods minimize the rate distortion of frames, setting or maximizing the number of bits (or bit rate) for the compression, in order to obtain a higher quality of the reconstructed frame. Their main benefits are their adaptiveness to the input and their non-linearity, which makes them flexible in detecting spatial variations in frames. However, although deep learning video compression algorithms improve the rate distortion and provide high-quality reconstructed frames, they are slower than traditional video compression algorithms and require a large memory capacity in the learning phase.

For this purpose, in this research we propose a new video compression algorithm based on bilinear fuzzy relation equations (for short, BFRE), which are used to measure the similarity between frames. The algorithm improves the quality of the reconstructed frames compared with traditional algorithms and requires less CPU time and storage capacity than deep learning video compression algorithms.

The main goal of this research is to build an efficient and computationally fast video compression method that measures the similarity between a frame and the previous frames by applying solutions of BFRE systems. We decompose the video frames into two types, called Intra-frames (for short, I-frames) and Predictive frames (for short, P-frames), as proposed in (Nobuhara et al. 2006). A P-frame is a frame whose similarity with the previous I-frame is higher than a prefixed threshold; it can therefore be compressed at a stronger compression rate than that used for an I-frame.

We apply a similarity measure based on the bilinear fuzzy relation equations described in (Di Martino and Sessa 2017, 2018). The authors show that the BFRE similarity index is more robust to noise than similarity measures based on the Lukasiewicz t-norm.

The iterative frame classification process proposed in (Nobuhara et al. 2006; Di Martino et al. 2010) is used to classify frames. We treat the three bands Red (R), Green (G) and Blue (B) separately; for each band we apply the BFRE similarity measure to compare a frame with the previous I-frame, where each frame is normalized in [0,1] to represent a fuzzy relation. Instead of setting the similarity threshold a priori, we execute a pre-processing phase in which the first k frames are analyzed: the first frame is marked as an Intra frame (I-frame) and the following frames as Predictive frames (P-frames). Each of the (k − 1) P-frames is then reconstructed, and the trend of the PSNR is analyzed as the BFRE similarity index changes; the similarity threshold is chosen as the value of the similarity index below which the PSNR decreases rapidly.

Fig. 1 schematizes the pre-processing phase.

The pre-processing phase ends once the similarity threshold value has been selected. The BFRE compression process is then executed: each frame is labelled and compressed as an I-frame or a P-frame depending on whether its BFRE similarity with the previous I-frame is less than the selected threshold or not. The BFRE decompression process reconstructs the original frames; finally, the three reconstructed frames in the three bands are merged to obtain the frame of the reconstructed color video.

The main performance benefits that characterize the proposed color video compression method are the following:

it uses an algorithm based on the BFRE similarity between images to label frames, and it reconstructs color videos with less information loss than the method in (Di Martino et al. 2010), in which a similarity measure based on the Lukasiewicz triangular norm is used. Indeed, (Di Martino and Sessa 2017, 2018) showed that the BFRE similarity is more robust to noise than the Lukasiewicz t-norm similarity;
the adopted pre-processing phase determines, for each band, an optimal value of the similarity threshold, guaranteeing a lower loss of information in the reconstructed frames than that obtained with other threshold values;
the proposed method is independent of the image compression algorithm used to compress the frames;
unlike deep learning video compression algorithms, it does not require high CPU time and memory storage.

1.1 Related work

In recent years, the need to improve the quality of the reconstructed frames has prompted many researchers to propose video compression methods that minimize the rate distortion (RDO) of the reconstructed frames.

In (Jridi and Meher 2017) a video compression technique based on the Discrete Cosine Transform (for short, DCT) is applied; it performs better than other DCT-based video compression techniques but has high processing times. In (Wang et al. 2017) a video compression method based on a quadtree plus binary tree block partition structure is proposed; this approach is a trade-off between encoding complexity and quality performance, but it causes a non-negligible decrease in the quality of the reconstructed videos.

In (Haitham 2018) a new video-compression technique based on a Contourlet Transform used with a motion compensation technique (MC) is proposed.

Recently, deep neural network-based autoencoders for video compression have been proposed. These methods minimize the rate distortion (RDO), setting or maximizing the number of bits (or bit rate) for the compression, in order to obtain a higher quality of the reconstructed frame. Their main benefits are their adaptiveness to the input and their non-linearity, which makes them flexible in detecting spatial variations in frames.

A neural video compression algorithm is proposed in (Chen et al. 2020) to predict the spatiotemporal relationship between blocks in video frames, minimizing the residual block error. In (Honggui and Trocan 2018) a deep neural framework is proposed in which multiple deep neural networks are trained to predict pixels of intra-frames based on pixels of previous frames.

An end-to-end neural network scheme is proposed in (Lu et al. 2018) to replace the standard codec structure. In (Lu et al. 2021) a hybrid variation of this end-to-end neural video compression framework is proposed to improve the compression efficiency and optimize the rate distortion. The results show that this hybrid video compression method provides better-quality reconstructed frames than those obtained using traditional algorithms, but at the expense of high computational complexity.

A Deep Convolutional Neural Network with three hidden layers is used in (Putra et al. 2022) for intra-frame-based video compression; test results performed on color videos show that the quality of the reconstructed frames is better than that obtained using discrete cosine transform and fractal video-compression algorithms.

Surveys of deep learning video compression algorithms are provided in (Zhang et al. 2020; Bidwe et al. 2022). These studies show that deep learning video compression approaches improve the rate-distortion; however, they are slower than traditional video compression algorithms and require higher memory consumption.

To reach a trade-off between the quality of the reconstructed frames and CPU time, we propose a new color video compression method based on BFRE, in which the solutions of bilinear fuzzy relation equations are applied to measure the similarity between frames.

(Loia and Sessa 2005) applied fuzzy relation equations in an iterative video compression method in which the Lukasiewicz fuzzy triangular norm measures the similarity between frames. The first frame is labelled as an I-frame; then, a t-norm based similarity measure is applied to compare the next frame with the previous I-frame. If this similarity value is higher than the selected threshold, the current frame is labelled as a P-frame; otherwise it is labelled as an I-frame and becomes the I-frame to be compared with the next frame. The frames are compressed by applying the fuzzy relation equations image compression method (Di Martino et al. 2003a, b; Nobuhara et al. 2006). In (Di Martino et al. 2010) the video compression method proposed in (Loia and Sessa 2005) is applied using the Fuzzy transform image compression algorithm (for short, F-transform) (Perfilieva 2006; Di Martino and Sessa 2007; Di Martino et al. 2008) to compress each frame. The authors compare their method with the video compression method in (Loia and Sessa 2005), showing that better results in terms of quality of the reconstructed video are obtained by using the F-transform algorithm.

(Di Martino and Sessa 2013) proposed a new video compression method in which the frames are classified into I-frames, P-frames, and Bi-directional frames (for short, B-frames), following the MPEG-4 algorithm classification (Pereira and Ebrahimi 2002; Richardson 2010). A B-frame is reconstructed by using either the previous or the successive I-frame. The authors compare their method with the MPEG algorithm and the video compression method of (Loia and Sessa 2005), using the Peak Signal to Noise Ratio (for short, PSNR) index to measure the quality of the reconstructed frames. They show that the performances of the three algorithms are similar.

One of the main critical issues of fuzzy-based video compression algorithms is the selection of the similarity threshold between frames. In (Nobuhara et al. 2000; Loia and Sessa 2005) the Lukasiewicz t-norm operator is used to calculate the similarity between two frames considered as fuzzy relations; the similarity threshold is experimentally set at 0.95, as the quality of the reconstructed frames decreases exponentially when the similarity measure is below this value. However, this threshold value could vary according to the characteristics and type of frames. In the proposed method a pre-processing phase is adopted to optimize the choice of the similarity threshold in each band.

In Sect. 2 we introduce the BFRE image similarity measure, summarizing the method for measuring the similarity between images based on the BFRE greatest solution. In Sect. 3 the proposed BFRE color video compression method is presented. The results of experimental tests on color videos are shown and discussed in Sect. 4. Final considerations are discussed in Sect. 5.

2 BFRE Similarity between images

Let F and G be two M × N grey images. We denote by F_p,q and G_p,q the values of the pixels of F and G in the p-th row and q-th column, with p = 1,…,M and q = 1,…,N. Let L be the number of grey levels of the two images.

We construct a 3 × 3 window around the pixel F_p,q with components:

$$ a_{i,j} = \frac{F_{p+i-2,\,q+j-2}}{L-1}, \quad i = 1,\ldots,3,\; j = 1,\ldots,3 $$

(1)

and a corresponding 3 × 3 window around the pixel G_p,q with components:

$$ b_{i,j} = \frac{G_{p+i-2,\,q+j-2}}{L-1}, \quad i = 1,\ldots,3,\; j = 1,\ldots,3 $$

(2)

Let A = [a_ij] and B = [b_ij], where i = 1,…,3 and j = 1,…,3, be two 3 × 3 fuzzy relations; we construct the 3 × 3 BFRE system:

$$ \bigcup_{j=1}^{3} \left( a_{ij} \wedge x_{j} \right) = \bigcup_{j=1}^{3} \left( b_{ij} \wedge x_{j} \right), \quad i = 1,\ldots,3 $$

(3)

where x = (x₁,x₂,x₃) is a vector of three fuzzy variables and the union and intersection operators are given, respectively, by the max s-norm and by the min t-norm.

(Li 1992) proposes an algorithm, called the BFRE algorithm, to find the greatest solution of a BFRE system of the form (3).

The BFRE algorithm is applied in (Di Martino and Sessa 2018) to find the greatest solution of (3), given by $\hat{\mathbf{x}}^{pq} = (\hat{x}_{1}^{pq}, \hat{x}_{2}^{pq}, \hat{x}_{3}^{pq})^{T}$.

Since $\mathbf{x}_{s}^{pq} = (0,0,0)^{T}$ is the smallest solution of (3), (Di Martino and Sessa 2018) defines the following measure of similarity between the two fuzzy relations A and B:

$$ \mathrm{Sim}_{\mathrm{BFRE}}^{pq} = \frac{1}{3}\sqrt{\sum_{j=1}^{3} \left( \hat{x}_{j}^{pq} - x_{sj}^{pq} \right)^{2}} = \frac{1}{3}\sqrt{\sum_{j=1}^{3} \left( \hat{x}_{j}^{pq} \right)^{2}} $$

(4)

The index $\mathrm{Sim}_{\mathrm{BFRE}}^{pq}$ varies between 0 and $\frac{\sqrt{3}}{3}$. Its value is 0 if a_ij = 0 and b_ij = 1 (or a_ij = 1 and b_ij = 0) for each i = 1,2,3 and j = 1,2,3 (maximum difference between the two windows), and it is $\frac{\sqrt{3}}{3}$ ≈ 0.577 if the greatest solution of (3) is $\hat{\mathbf{x}}^{pq} = (1,1,1)^{T}$ (for instance, when the two 3 × 3 windows are identical).
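As a small worked example of (3) and (4), take two hypothetical windows chosen only for illustration (they do not come from the test videos):

$$ A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 0.5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} $$

The first equation of (3) reduces to $1 \wedge x_{1} = 0.5 \wedge x_{1}$, which forces $x_{1} \le 0.5$, while the second and third equations hold for every $x_{2}$ and $x_{3}$. The greatest solution is therefore $\hat{\mathbf{x}} = (0.5, 1, 1)^{T}$ and

$$ \mathrm{Sim}_{\mathrm{BFRE}} = \frac{1}{3}\sqrt{0.5^{2} + 1^{2} + 1^{2}} = \frac{1}{3}\sqrt{2.25} = 0.5 $$

which lies between the two extreme values 0 and $\frac{\sqrt{3}}{3}$ ≈ 0.577.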

(Di Martino and Sessa 2018) defines the following similarity index between the images F and G:

$$ \mathrm{Sim}_{\mathrm{BFRE}}(F,G) = \frac{1}{M \times N}\sum_{p=1}^{M}\sum_{q=1}^{N} \mathrm{Sim}_{\mathrm{BFRE}}^{pq} $$

(5)

where M and N are the number of rows and columns of the two images.

If F and G are color images, in (Di Martino and Sessa 2018) the similarity between the two images is given by the average of the similarities in the three bands Red (R), Green (G) and Blue (B):

$$ \mathrm{Sim}_{\mathrm{BFRE}}(F,G) = \frac{1}{3}\left( \mathrm{Sim}_{\mathrm{BFRE}}^{R}(F,G) + \mathrm{Sim}_{\mathrm{BFRE}}^{G}(F,G) + \mathrm{Sim}_{\mathrm{BFRE}}^{B}(F,G) \right) $$

(6)

The similarity measure (6) is compared in (Di Martino and Sessa 2018) with other image similarity measures based on fuzzy relations. The authors measure the similarity between images distorted by Gaussian noise, showing that the similarity measure (6) is more robust to noise than other image similarity indices.

The algorithm used to calculate the similarity between two grey-level images, in which formula (5) is used, is schematized in pseudocode below.
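As an illustration of formulas (1)-(5), the following is a minimal Python sketch rather than the authors' pseudocode. Two points are assumptions of the sketch: the greatest solution of (3) is obtained with a simple monotone iteration that starts from (1,1,1) and only lowers components (it is not claimed to be the algorithm of Li 1992), and border pixels are handled by replicating the nearest pixel, which the text does not specify. The function names are hypothetical.

```python
import math

def greatest_solution(A, B):
    """Greatest x in [0,1]^n with max_j min(a_ij, x_j) == max_j min(b_ij, x_j) for all i.

    Assumption: plain monotone iteration, not necessarily Li's (1992) algorithm.
    """
    x = [1.0] * len(A[0])              # start from the top element, only decrease
    changed = True
    while changed:
        changed = False
        for a_row, b_row in zip(A, B):
            lhs = max(min(a, xj) for a, xj in zip(a_row, x))
            rhs = max(min(b, xj) for b, xj in zip(b_row, x))
            if lhs == rhs:
                continue
            # The larger side must come down to the value of the smaller side.
            big_row, target = (a_row, rhs) if lhs > rhs else (b_row, lhs)
            for j in range(len(x)):
                if min(big_row[j], x[j]) > target:
                    x[j] = target
                    changed = True
    return x

def window(img, p, q, levels):
    """Normalised 3x3 window around pixel (p, q), as in (1) and (2).

    Assumption: out-of-range indices are clamped (edge replication).
    """
    rows, cols = len(img), len(img[0])
    return [[img[min(max(p + i, 0), rows - 1)][min(max(q + j, 0), cols - 1)] / (levels - 1)
             for j in (-1, 0, 1)]
            for i in (-1, 0, 1)]

def sim_bfre(F, G, levels=256):
    """Image similarity (5): mean over all pixels of the window similarity (4)."""
    rows, cols = len(F), len(F[0])
    total = 0.0
    for p in range(rows):
        for q in range(cols):
            x_hat = greatest_solution(window(F, p, q, levels), window(G, p, q, levels))
            total += math.sqrt(sum(v * v for v in x_hat)) / 3.0
    return total / (rows * cols)
```

Because the iteration only copies existing values with min and max, the equality test between the two sides is exact, and each pass strictly lowers at least one component, so the loop terminates. For a color frame the same function would be called once per band.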

3 The proposed video compression method

We propose a method based on the BFRE similarity measure (6) to compress color videos. We use the BFRE similarity measure to classify individual frames as I-frames or P-frames, treating videos in the three color bands R, G and B separately.

The first frame is always labeled as an I-frame; then, the similarity (6) between the I-frame and the next frame is measured. If this similarity value is greater than a fixed threshold, the next frame is labeled as a P-frame; otherwise it is labeled as an I-frame and is used as the reference in the comparisons with the following frames. This process is iterated until the last frame of the video has been analyzed.

Following (Di Martino et al. 2010), we apply the Fuzzy Transform image compression method to compress the frames. The I-frames are compressed with a weak compression rate ρ_I and the P-frames with a strong compression rate ρ_P.
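The labelling loop just described can be sketched in Python as follows. The function name and the `similarity` callable are hypothetical: any function returning the similarity of two single-band frames can be plugged in, and the toy similarity in the usage note is only a stand-in, not the BFRE measure.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")

def classify_frames(frames: Sequence[Frame],
                    similarity: Callable[[Frame, Frame], float],
                    threshold: float) -> List[str]:
    """Label each frame 'I' or 'P' by comparing it with the most recent I-frame."""
    labels: List[str] = []
    reference_index = -1                      # index of the current I-frame
    for k, frame in enumerate(frames):
        if reference_index >= 0 and similarity(frames[reference_index], frame) > threshold:
            labels.append("P")                # similar enough: predictive frame
        else:
            labels.append("I")                # first frame, or similarity too low
            reference_index = k               # this frame becomes the new reference
    return labels
```

For example, with scalar "frames" and the toy similarity `lambda a, b: 1 - abs(a - b)`, the sequence `[0.0, 0.01, 0.02, 0.5, 0.51, 0.9]` with threshold 0.95 is labelled `I P P I P I`: a new I-frame is started each time the content moves away from the current reference.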

To compress the P-frames with a strong compression rate, (Loia and Sessa 2005; Di Martino et al. 2010) use an approach in which the P-frame G is transformed, with respect to the previous I-frame F, into a difference frame D, called a Δ-frame, where:

$$ D(i,j) = \frac{1}{2}\left[ F(i,j) - G(i,j) + 1 \right], \quad i = 1,\ldots,M,\; j = 1,\ldots,N $$

(7)

As the Δ-frame D has a low quantity of information it can be compressed with a strong compression rate. If $\widehat{F}$ is the decoded I-frame and $\widehat{D}$ the decoded Δ-frame, the P-frame G can be reconstructed by the following formula:

$$ \hat{G}(i,j) = \frac{\max\left\{ 0,\ \hat{F}(i,j) - 2\hat{D}(i,j) + 1 \right\}}{\max\left\{ 1,\ \hat{F}(i,j) - 2\hat{D}(i,j) + 1 \right\}}, \quad i = 1,\ldots,M,\; j = 1,\ldots,N $$

(8)
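Formula (8) clamps the value F̂ − 2D̂ + 1 to the interval [0,1]: the ratio is 0 when that value is negative, the value itself when it lies in [0,1], and 1 when it exceeds 1. The sketch below illustrates this on frames normalized in [0,1]. It takes the Δ-frame in the offset form D = (F − G + 1)/2, which is the form that (8) inverts; treat that offset as an assumption of the sketch. The function names are hypothetical.

```python
def delta_frame(F, G):
    """Delta-frame of P-frame G w.r.t. I-frame F (assumed offset form, values in [0,1])."""
    return [[(f - g + 1.0) / 2.0 for f, g in zip(f_row, g_row)]
            for f_row, g_row in zip(F, G)]

def reconstruct_p_frame(F_hat, D_hat):
    """Reconstruct a P-frame from the decoded I-frame and decoded delta-frame via (8)."""
    G_hat = []
    for f_row, d_row in zip(F_hat, D_hat):
        row = []
        for f, d in zip(f_row, d_row):
            v = f - 2.0 * d + 1.0
            # max{0, v} / max{1, v} clamps v to [0, 1]
            row.append(max(0.0, v) / max(1.0, v))
        G_hat.append(row)
    return G_hat
```

With lossless coding of F and D the round trip returns G exactly; with lossy coding the clamp keeps the reconstructed values inside [0,1] even when the decoded frames overshoot.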

The proposed video compression algorithm is schematized below.

The following algorithm schematizes the decompression process, in which a P-frame is reconstructed by (8).

To fix the similarity threshold Sim_th, a pre-processing phase is executed in which the first k frames of the single-band video are analyzed; in this phase the first frame is marked as an I-frame and the following k − 1 frames are considered P-frames. Each compressed P-frame is then reconstructed, and the PSNR is calculated to evaluate the quality of the reconstructed frame $\widehat{G}$ compared with the corresponding original frame G. Then, the trend of the PSNR index with respect to the BFRE similarity between each frame and the first frame is constructed. The PSNR index between the original frame G and the reconstructed frame $\widehat{G}$ is given by

$$ \mathrm{PSNR}(G,\hat{G}) = 20\log_{10}\frac{L-1}{\mathrm{RMSE}} $$

(9)

where:

$$ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{M}\sum_{j=1}^{N}\left[ G(i,j) - \hat{G}(i,j) \right]^{2}}{M \times N}} $$

(10)
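Formulas (9) and (10) translate directly into code. The sketch below assumes frames stored as lists of rows of grey levels in 0,…,L − 1 and returns infinity for identical frames, a convention chosen here because (9) is undefined when the RMSE is zero. The function names are hypothetical.

```python
import math

def rmse(G, G_hat):
    """Root mean square error (10) between an original and a reconstructed frame."""
    n_pixels = sum(len(row) for row in G)
    squared = sum((g - h) ** 2
                  for g_row, h_row in zip(G, G_hat)
                  for g, h in zip(g_row, h_row))
    return math.sqrt(squared / n_pixels)

def psnr(G, G_hat, levels=256):
    """PSNR (9) for frames with `levels` grey levels."""
    error = rmse(G, G_hat)
    if error == 0:
        return math.inf          # identical frames: convention of this sketch
    return 20.0 * math.log10((levels - 1) / error)
```

For instance, a frame whose pixels are all off by 10 grey levels has RMSE = 10 and, with L = 256, PSNR = 20 log10(25.5), roughly 28.1.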

The threshold Sim_th is selected as the value of the similarity index below which the PSNR trend decreases rapidly.

Algorithm 4 schematizes in pseudocode the pre-processing method applied to select the similarity threshold.

We tested our method by applying it to color videos of different sizes. To measure the performance of the BFRE video compression algorithm, we compared the results with those obtained using the MPEG-4 video compression method, the video compression method based on the fuzzy transform (Di Martino et al. 2010) and the end-to-end Deep Video Compression DVC_Pro in (Lu et al. 2021). The results are shown and discussed in the next section.
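The text selects the threshold by inspecting the PSNR trend and does not give a numerical rule for "decreases rapidly". One hypothetical way to automate that choice, shown here only as a sketch, is to sort the (similarity, PSNR) pairs measured in the pre-processing phase and take the similarity value at the upper end of the steepest PSNR drop. Both the rule and the function name are assumptions, not the authors' Algorithm 4.

```python
def select_threshold(points):
    """Pick a similarity threshold from (similarity, psnr) pairs.

    Hypothetical heuristic: return the similarity at the upper end of the
    interval where PSNR falls fastest as similarity decreases.
    """
    pts = sorted(points)                       # ascending similarity
    best_slope, threshold = None, None
    for (s_lo, p_lo), (s_hi, p_hi) in zip(pts, pts[1:]):
        if s_hi == s_lo:
            continue                           # no slope between equal similarities
        slope = (p_hi - p_lo) / (s_hi - s_lo)  # PSNR lost per unit of similarity
        if best_slope is None or slope > best_slope:
            best_slope, threshold = slope, s_hi
    return threshold
```

With the made-up measurements `[(0.56, 31.0), (0.55, 30.8), (0.54, 30.6), (0.535, 30.5), (0.52, 27.0), (0.50, 26.0)]`, which are synthetic and not taken from the experiments, the steepest drop lies between 0.535 and 0.52, so the function returns 0.535.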

4 Results

To test the BFRE video compression algorithm, we have executed it on a set of 16 color videos extracted from the Fast-Moving Object (FMO) web page http://cmp.felk.cvut.cz/fmo. The video frames have sizes 1920 × 1080, 1280 × 820 and 960 × 540 pixels. An Intel(R) Core(TM) i7 processor with a CPU clock speed of 2.90 GHz was used for all tests.

For the sake of brevity, only the results obtained on the color videos darts_window1, made up of 179 frames, and ping_pong_side, made up of 445 frames, are shown in detail. Fig. 2a and b show the first frames of the two color videos.

All the frames were decomposed into the three bands Red, Green and Blue, and the frames in each band were treated separately. In these tests the values ρ_I = 0.25 and ρ_P = 0.0625 were selected for the compression rates of I-frames and P-frames, respectively.
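As a rough illustration of what these rates mean, assume (as is usual for fuzzy transform compression, although it is not stated explicitly here) that the compression rate is the ratio between the size of the compressed frame and the size of the original M × N frame. For one band of a 1920 × 1080 frame this gives:

$$ \rho_{I}\, M N = 0.25 \times 1920 \times 1080 = 518{,}400, \qquad \rho_{P}\, M N = 0.0625 \times 1920 \times 1080 = 129{,}600 $$

so, under this reading, a compressed P-frame keeps a quarter of the components kept for a compressed I-frame.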

The results obtained for the color video darts_window1 are shown first. The first 40 frames are used in the pre-processing phase to fix the similarity threshold. The first frame is considered an I-frame and the other frames are all considered P-frames. Fig. 3 shows the trend of the PSNR index with respect to the BFRE similarity in the three bands.

All three trends in Fig. 3 show a rapid decrease below the similarity value 0.535, so the selected similarity threshold is Sim_th = 0.535 in every band.

After the similarity threshold was selected, the BFRE video compression and BFRE video decompression algorithms were executed to obtain the reconstructed video. The first two reconstructed frames are shown in Fig. 4a–b.

The plot in Fig. 5 shows the trend of the PSNR index with respect to the BFRE similarity measure obtained by executing the BFRE video compression and decompression algorithms. The PSNR value computed for each frame is the average of the PSNR measured in the three bands. The PSNR index rises as the similarity increases, ranging between 29.6 and 30.9.

To analyze the performance of the BFRE video compression method, we compared its results with those obtained by executing the fuzzy transform-based video compression algorithm in (Di Martino et al. 2010) (FTR), the ISO/IEC standard MPEG-4 (Pereira and Ebrahimi 2002) (MPEG) and the end-to-end Deep Video Compression DVC_Pro in (Lu et al. 2021). In the comparison, FTR is run with the same compression rates ρ_I and ρ_P selected for I-frames and P-frames. A bit rate of 10 Mbps was selected for the MPEG-4 compression. The learning-based optical flow model Spynet (Ranjan and Black 2017) is applied in DVC_Pro to assess motion information. Table 1 shows the mean, standard deviation, min and max values of the PSNR index measured over all the reconstructed frames by executing BFRE, FTR, MPEG and DVC_Pro.

Table 1 darts_window1 – PSNR comparison results


These results show that BFRE provides better results than FTR and PSNR values similar to those obtained by running MPEG and DVC_Pro. Moreover, BFRE shows better stability than FTR, MPEG and DVC_Pro, with the lowest standard deviation of the PSNR over all frames.

The results obtained for the color video ping_pong_side are shown next. Fig. 6a–c show the trend of the PSNR index with respect to the BFRE similarity measure in the three bands.

The trends of the PSNR in the G and B bands (Fig. 6b and c) show a rapid decrease below the similarity value 0.500, so we select a similarity threshold Sim_th = 0.500 in these two bands. The trend in the R band shows a rapid decrease below a BFRE similarity of 0.540; this value is selected as the similarity threshold in the R band.
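The per-frame PSNR (average over the three bands) and the summary statistics reported in the comparison tables can be computed along the following lines. The sketch uses the population standard deviation; whether the population or the sample form was used in the experiments is not stated, so that choice is an assumption, as are the function names.

```python
import statistics

def frame_psnr(band_psnrs):
    """PSNR of a color frame as the average of the PSNR in the R, G and B bands."""
    return sum(band_psnrs) / len(band_psnrs)

def summarize(per_frame_psnr):
    """Mean, standard deviation, min and max of the per-frame PSNR values."""
    return {
        "mean": statistics.fmean(per_frame_psnr),
        "std": statistics.pstdev(per_frame_psnr),   # population form (assumption)
        "min": min(per_frame_psnr),
        "max": max(per_frame_psnr),
    }
```

A lower standard deviation means that the reconstruction quality fluctuates less from frame to frame, which is the sense in which stability is compared across methods.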

The reconstructed first two frames are shown, respectively, in Fig. 7a and b.

Table 2 shows the mean, standard deviation, min and max values of the PSNR index measured over all the reconstructed frames by executing BFRE, FTR and MPEG.

Table 2 ping_pong_side – PSNR comparison results


The results in Table 2 confirm that the BFRE algorithm provides better performance than FTR in terms of mean PSNR. Moreover, the mean PSNR is comparable with that measured by executing MPEG-4 and DVC_Pro. In addition, BFRE shows better stability than FTR, MPEG and DVC_Pro, with the lowest standard deviation.

Table 3 shows the mean and the standard deviation of the PSNR values obtained for all the FMO videos by executing BFRE, FTR, MPEG-4 and DVC_Pro. These measures are the averages of the mean and standard deviation of the PSNR obtained in each band. The best results are highlighted in bold.

Table 3 Mean PSNR and standard deviation obtained for the color videos in the FMO dataset


These results confirm that the quality performance of BFRE is better than that obtained by executing FTR and comparable with that of MPEG-4 and DVC_Pro. In addition, for all videos the lowest standard deviation is obtained by executing the BFRE algorithm, showing that BFRE provides greater stability than FTR, MPEG-4 and DVC_Pro in the oscillation of the PSNR measured for each frame.

We compare the computational complexity of the four video compression methods by measuring the encoding speed in frames per second (fps). Table 4 shows the encoding speed measured by executing the four methods on each color video.

Table 4 Encoding speed measured for the color videos in the FMO dataset


These results highlight that BFRE provides better encoding speed than MPEG and DVC_Pro; the encoding speeds obtained by executing BFRE are similar to those obtained by executing FTR.
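Encoding speed in frames per second can be measured with a wall-clock timer around the encoder, as in the sketch below. The `encode` callable is a hypothetical stand-in for any of the four encoders, and the exact timing protocol used in the experiments (warm-up runs, I/O included or not) is not described, so this is only one plausible setup.

```python
import time

def encoding_speed_fps(encode, frames):
    """Frames encoded per second of wall-clock time."""
    start = time.perf_counter()
    for frame in frames:
        encode(frame)                       # result discarded: only time is measured
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")
```

Running all methods on the same machine and the same frames, as done here with a single Intel Core i7 at 2.90 GHz, is what makes the resulting fps values comparable.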

5 Conclusions

We propose a new color video compression method based on the BFRE greatest solution algorithm. Following the frame type classification in (Loia and Sessa 2005; Nobuhara et al. 2006; Di Martino et al. 2010), we partition the set of frames into I-frames and P-frames. To classify a frame, a new similarity measure based on BFRE is applied; a frame is labelled as a P-frame if its similarity with the previous I-frame is greater than a fixed threshold. The source color video is treated separately in the three bands Red, Green and Blue. A pre-processing phase is executed in each band to select the optimal value of the similarity threshold in that band; in this phase the first k frames are analyzed, labelling the first frame as an I-frame and every other frame as a P-frame. The similarity threshold is given by the value of the similarity below which the PSNR trend decreases rapidly.

We carried out comparative tests of the proposed method against the method based on F-transforms proposed in (Di Martino et al. 2010), MPEG-4 and the learning-based video compression method DVC_Pro proposed in (Lu et al. 2021). The results show that our method performs better than the F-transform and MPEG video compression methods and comparably to DVC_Pro. In addition, the PSNR values measured for each frame obtained by performing BFRE show a lower standard deviation than those obtained by performing the other three video compression methods; this result suggests that the proposed method may be more robust than FTR, MPEG-4 and DVC_Pro. Finally, encoding speed comparisons were performed to analyze the computational complexity of the four methods. The results show that BFRE provides encoding speeds higher than those provided by DVC_Pro and comparable to those provided by FTR. Thus, among the methods tested, BFRE provides the best overall balance of reconstructed frame quality and video encoding speed.

In the future we intend to carry out further experimental tests on video datasets of different sizes, frame rates and resolutions. Furthermore, we intend to test our method on videos with noisy frames and sequences of frames to measure its robustness.

Data availability

The data that support the findings of this study are available on request from the corresponding author, [F.D.M].

References

Abomhara M, Khalifa OO, Zakaria O, Zaidan AA, Zaidan BB, Rame A (2010) Video compression techniques: an overview. J Appl Sci 10:1834–1840. https://doi.org/10.3923/jas.2010.1834.1840
Beach A, Owen A (2018) Video compression Handbook 2nd Edition, Peachpit Press, 336 pp., ISBN: 978–0134866215
Bidwe RV, Mishra S, Patil S, Shaw K, Vora DR, Kotecha K, Zope B (2022) Deep learning approaches for video compression: a bibliometric analysis. Big Data Cogn Comput 6:44. https://doi.org/10.3390/bdcc6020044
Chen Z, He T, Jin X, Wu F (2020) Learning for video compression. IEEE Trans Circuits Syst Video Technol 30(2):566–576. https://doi.org/10.1109/TCSVT.2019.2892608
Di Martino F, Sessa S (2007) Compression and decompression of images with discrete fuzzy transforms. Inf Sci 177:2349–2362. https://doi.org/10.1016/j.ins.2006.12.027
Di Martino F, Sessa S (2018) Comparison between images via bilinear fuzzy relation equations. J Ambient Intell Humaniz Comput 9:1517–1525. https://doi.org/10.1007/s12652-017-0576-3
Di Martino F, Loia V, Perfilieva I, Sessa S (2008) An image coding/decoding method based on direct and inverse fuzzy transforms. Int J Approxim Reason 47:110–131. https://doi.org/10.1016/j.ijar.2007.06.008
Di Martino F, Loia V, Sessa S (2010) Fuzzy transforms for compression and decompression of color videos. Inf Sci 180(20):3914–3931. https://doi.org/10.1016/j.fss.2009.08.002
Haitham A-A (2018) Video compression based on motion compensation and contourlet transform. Third Scientific Conference of Electrical Engineering (SCEE) 2018:90–94. https://doi.org/10.1109/SCEE.2018.868407
Honggui L, Trocan M (2018) Deep neural network based single pixel prediction for unified video coding. Neurocomputing 272:558–570. https://doi.org/10.1016/j.neucom.2017.07.037
Jridi M, Meher PK (2017) A scalable approximate DCT architectures for efficient HEVC compliant video coding. IEEE Trans Circuits Syst Video Technol 27(8):1815–1825. https://doi.org/10.1109/TCSVT.2016.2556578
Li JX (1992) A new algorithm for the greatest solution of fuzzy bilinear equations. Fuzzy Sets Syst 46:193–210. https://doi.org/10.1016/0165-0114(92)90132-n
Loia V, Sessa S (2005) Fuzzy relation equations for coding/decoding processes of images and videos. Inf Sci 171:145–172. https://doi.org/10.1016/j.ins.2004.04.003
Lu G, Zhang X, Ouyang W, Chen L, Gao Z, Xu D (2021) An end-to-end learning framework for video compression. IEEE Trans Pattern Anal Mach Intell 43(10):3292–3308. https://doi.org/10.1109/TPAMI.2020.2988453
Lu G, Ouyang W, Xu D, Zhang X, Cai C, Gao Z (2018) DVC: an end-to-end deep video compression framework. arXiv preprint arXiv:1812.00101
Di Martino F, Sessa S (2013) Coding B-frames of color videos with fuzzy transforms. Adv Fuzzy Syst, Article ID 652429, 9 pp. https://doi.org/10.1155/2013/652429
Di Martino F, Sessa S (2017) Bilinear equations and fuzzy image comparison. In: 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 9–12 July 2017, Naples, Italy, pp 1–6. https://doi.org/10.1109/FUZZ-IEEE.2017.8015397
Di Martino F, Loia V, Sessa S (2003a) A method in the compression/decompression of images using fuzzy equations and fuzzy similarities. In: Bilgic T, De Baets B, Kaynak O (eds) Proceedings of Conference IFSA 2003, Istanbul, Turkey, 29 June–2 July 2003. Lecture Notes in Artificial Intelligence, vol 2715. Springer, Berlin, Germany, pp 524–527
Di Martino F, Loia V, Sessa S (2003b) A method for coding/decoding images by using fuzzy relation equations. In: Bilgic T, De Baets B, Kaynak O (eds) Selected papers from IFSA, Istanbul, Turkey, 29 June–2 July 2003. Lecture Notes in Artificial Intelligence, vol 2715. Springer, Berlin, Germany, pp 436–441. https://doi.org/10.1007/3-540-44967-1_52
Nobuhara H, Hirota K, Pedrycz W (2000) Fast solving method of fuzzy relational equations and its application to lossy image compression. IEEE Trans Fuzzy Syst 8(3):325–334. https://doi.org/10.1109/91.855920
Nobuhara H, Hirota K, Pedrycz W, Sessa S (2006) A motion compression/reconstruction method based on max T-norm composite fuzzy relational equations. Inf Sci 176(17):2526–2552. https://doi.org/10.1016/j.ins.2005.12.004
Pereira F, Ebrahimi T (2002) The MPEG-4 book. Prentice Hall Professional, Upper Saddle River, NJ, USA, 849 pp, ISBN 978-0-13-061621-0
Perfilieva I (2006) Fuzzy transforms: theory and applications. Fuzzy Sets Syst 157:993–1023. https://doi.org/10.1016/j.fss.2005.11.012
Putra ABW, Gaffar A, Sumadi M, Setiawati L (2022) Intra-frame based video compression using Deep Convolutional Neural Network (DCNN). JOIV: Int J Inform Vis 6(3):650–659. https://doi.org/10.30630/joiv.6.3.1012
Ranjan A, Black MJ (2017) Optical flow estimation using a spatial pyramid network. In: Proc IEEE Conf Comput Vis Pattern Recognit, Art. no. 2, pp 4161–4170
Richardson IE (2010) The H.264 advanced video compression standard. John Wiley & Sons, Hoboken, NJ, USA, 316 pp, ISBN 978-0470516928
Wang Z, Wang S, Zhang J, Wang S, Ma S (2017) Effective quadtree plus binary tree block partition decision for future video coding. In: 2017 Data Compression Conference (DCC), 4–7 April 2017, Snowbird, UT, USA, pp 23–32. https://doi.org/10.1109/DCC.2017.70
Zhang Y, Kwong S, Wang S (2020) Machine learning based video coding optimizations: a survey. Inf Sci 506:395–423. https://doi.org/10.1016/j.ins.2019.07.096


Funding

Open access funding provided by Università degli Studi di Napoli Federico II within the CRUI-CARE Agreement.

Author information

Authors and Affiliations

Dipartimento di Architettura, Università degli Studi di Napoli Federico II, Via Toledo 402, 80134, Naples, Italy
Barbara Cardone & Ferdinando Di Martino
Università degli Studi di Napoli Federico II, Centro Interdipartimentale di Ricerca “A. Calza Bini”, Via Toledo 402, Naples, Italy
Ferdinando Di Martino


Contributions

Conceptualization, B.C. and F.D.M.; methodology, B.C. and F.D.M.; software, B.C. and F.D.M.; validation, B.C. and F.D.M.; formal analysis, B.C. and F.D.M.; investigation, B.C. and F.D.M.; resources, B.C. and F.D.M.; data curation, B.C. and F.D.M.; writing—original draft preparation, B.C. and F.D.M.; writing—review and editing, B.C. and F.D.M.; visualization, B.C. and F.D.M.; supervision, B.C. and F.D.M. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Ferdinando Di Martino.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

This research does not contain any studies involving human participants performed by any of the authors.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Cardone, B., Di Martino, F. Fuzzy-based video compression using bilinear fuzzy relation equations. J Ambient Intell Human Comput 15, 2215–2225 (2024). https://doi.org/10.1007/s12652-023-04748-w


Received: 20 January 2023
Accepted: 16 December 2023
Published: 25 January 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s12652-023-04748-w
