Probabilistic Approach Versus Machine Learning for One-Shot Quad-Tree Prediction in an Intra HEVC Encoder
Abstract
Evolutions of the Internet of Things (IoT) in the coming years are likely to boost mobile video demand to an unprecedented level. A large number of battery-powered systems will integrate an HEVC video codec, implementing the latest MPEG encoding standard, and these systems will need to be energy efficient. Constraining the energy consumption of HEVC encoders is a challenging task, especially for embedded applications based on software encoders. The most efficient approach to reduce the energy consumption of an HEVC encoder consists in optimizing the quadtree block partitioning of the image, trading off compression efficiency against energy consumption by efficiently choosing near-optimal pixel block sizes. To reduce the energy consumption of a real-time HEVC Intra encoder, this paper proposes and compares two methods that predict the quadtree partitioning in "one-shot", i.e. without iterating. These methods drastically limit the computational cost of the recursive Rate-Distortion Optimization (RDO) process. The first proposed method uses a probabilistic approach whereas the second is based on a Machine Learning approach. Experimental results show that both methods are capable of reducing the energy consumption of an embedded HEVC encoder by 58% for bit rate increases of 3.93% and 3.60%, respectively.
Keywords
HEVC · Intra · Quadtree prediction · One-shot · Machine learning · Energy · Real-time

1 Introduction
With the progress of microelectronics, many embedded applications now encode and stream live video contents. HEVC [33, 34, 42] is the state-of-the-art video coding standard. Compared with the previous ISO/IEC Moving Picture Experts Group (MPEG) Advanced Video Coding (AVC) standard, the HEVC Main profile reduces the bit rate by 50% on average for a similar objective video quality [35, 37]. This gain reduces the energy needed for transmitting video. On the other hand, the computational complexity of the encoders has increased significantly. The additional complexity brought by HEVC is mostly due to the new quadtree block partitioning structure of Coding Tree Units (CTUs) and the increase in the number of Intra prediction modes, which exponentially impact the Rate-Distortion Optimization (RDO) process [20].
The main limitation of recent embedded systems, particularly in terms of computational performance, comes from the bounded energy density of batteries. This limitation is a major constraint for image and video applications, video encoding and decoding being, for instance, the most energy-consuming algorithms on smartphones [3]. A large share of systems are likely to integrate the HEVC codec in the long run and will need to be energy efficient, or even energy aware. As a consequence, energy consumption represents a serious challenge for embedded real-time HEVC encoders. For both hardware and software codecs, a solution to reduce energy consumption is to decrease the computational complexity while controlling compression quality losses.
To reduce the computational complexity of HEVC encoders, several algorithmic solutions have been proposed at the level of quadtree partitioning. Indeed, choosing the right encoding block sizes is necessary to obtain a good compression ratio, but this choice is difficult and usually results from a costly RDO process. The exhaustive search partitioning solution is the optimal one, obtained by testing all possible partitioning configurations and selecting the one that minimizes the Rate-Distortion (RD) cost. This process is the most time-consuming operation in an HEVC encoder and thus offers the biggest opportunity for complexity reduction (up to 78% in the considered embedded encoder) [20]. Complexity reduction solutions at the quadtree level consist in predicting, without encoding, the adequate level of partitioning that offers the lowest RD cost.
As examples of related works, the authors of [31] and [4] propose to use the correlation between the minimum depths of the co-located CTUs in the current and previous frames to skip computing some depth levels during the RDO process. The authors of [1, 9, 15, 25, 40] use CTU texture complexities to predict the quadtree partitioning. All these solutions are based on reducing the complexity of a costly offline (i.e. non-real-time) reference encoder called the HEVC Test Model (HM). In this paper, we target energy reduction in the real-time context of an optimized software encoder. A real-time encoder such as Kvazaar is up to 10 times faster than HM [39]. The complexity reduction performance of state-of-the-art solutions based on HM is biased, since it is measured with respect to a large compression time. The complexity overhead of state-of-the-art solutions is thus comparatively higher in the context of a real-time encoder.
We propose in this paper two energy reduction methods for HEVC Intra encoders based on a CTU partitioning prediction technique, and we compare these methods, which drastically limit the recursive RDO process. The first method exploits the correlation between the CTU partitioning and the variance of the CTU luminance samples to predict the quadtree decomposition in one-shot. The second method uses a Machine Learning approach to perform the same prediction. Machine Learning is an interdisciplinary subfield of computer science that aims to replace manually engineered solutions for extracting information from sensed data in all application fields. The two methods are compared in terms of prediction accuracy of the quadtree partitioning as well as in terms of compression performance.
The rest of this paper is organized as follows. Section 2 presents an overview of the HEVC Intra encoder and goes through state-of-the-art complexity reduction techniques. Section 3 details the first proposed probabilistic algorithm for quadtree partitioning prediction, based on variance studies. Section 4 presents the second proposed, Machine Learning-based, algorithm for quadtree partitioning prediction. The two proposed energy reduction schemes are then compared in terms of quadtree partitioning accuracy and performance in Section 5. Finally, Section 6 concludes the paper.
2 Related works
2.1 HEVC Encoding and its Rate-Distortion Optimization
2.2 Software Real-Time HEVC Encoder
For embedded applications, hardware encoding solutions [27] consume much less energy than software solutions. However, when the considered system does not embed a hardware coprocessor, a software HEVC encoder [13, 24, 36, 38] can be used, for instance the HEVC reference software model (HM). HM is widely used, as it has been designed to achieve optimal coding efficiency (in terms of RD). However, the computational complexity of HM is high and not adapted to embedded applications. To fill this gap, the x265 [24], f265 [38] and Kvazaar [36] HEVC software encoders provide real-time encoding solutions, leveraging parallel processing and low-level Single Instruction Multiple Data (SIMD) optimizations for specific platforms.
This study is based on the Kvazaar HEVC encoder [36] for its real-time encoding capacity on Ultra High Definition (UHD) videos. The conclusions of this study can however be extended to other real-time software or hardware encoders, as they all depend on a classical RDO process to reach high compression performance.
2.3 Complexity Reduction of the Quad-Tree Partitioning
As shown in [20], in a real-time software HEVC Intra encoder, two specific parts of the encoding algorithm provide the highest opportunities for energy reduction: the Intra prediction (IP) level offers at best 30% of energy reduction, whereas the CTU quadtree partitioning level has a potential energy reduction of up to 78%. Previous studies on low-complexity CTU quadtree partitioning can be classified into two categories: early termination techniques, which are applied during the RDO process to dynamically terminate it when further gains are unlikely, and prediction-based techniques, which are applied before starting the RDO process and predict the quadtree partitioning with lower-complexity processing. In this paper, we focus on prediction-based complexity reduction techniques.
The authors of [4, 31, 44] propose to reduce the complexity of the HEVC encoder by skipping some depth levels of the quadtree partitioning. The skipped depths are selected based on the correlation between the minimum depths of the co-located CTUs in the current and previous frames. Results in [4] show an average time saving of 45% for a Bjøntegaard Delta Bit Rate (BD-BR) increase of 1.9%. For the algorithm from [31], results show an average complexity reduction of 21%. Concerning [44], experimental results show that the method can save about 48% of encoding time for a BD-BR increase of 2.9%. In this paper, the objective of the study is to demonstrate a drastic energy reduction in a real-time encoding setup by predicting the CTU partitioning. As a consequence, higher energy reductions are obtained at the expense of higher BD-BR increases.
The works in [1, 9, 15, 25, 40] use CTU texture complexities to predict the quadtree partitioning. Min et al. [1] propose to decide whether a CU has to be split, non-split or is undetermined, using the global and local edge complexities in four different directions (horizontal, vertical, 45° and 135° diagonals) of the CU and sub-CUs. This method provides a computational complexity reduction of 52% (in the non-real-time HM) for a BD-BR increase of 0.8%. Feng et al. [9] use the information entropy of CU and sub-CU saliency maps to predict the CU sizes. This method reduces the complexity by 37.9% (in HM) for a BD-BR increase of 0.62%.
Khan et al. [15] propose a method using texture variance to efficiently predict the CTU quadtree decomposition. The authors model the Probability Density Function (PDF) of the variance populations by a Rayleigh distribution to estimate variance thresholds and determine the quadtree partitioning. This method reduces the complexity by 44% (in HM) with a BD-BR increase of 1.27%. Our experiments have shown that the assumption of a Rayleigh distribution is not verified in many cases. For this reason, our proposed probabilistic method, although based on the variance, does not assume a Rayleigh distribution and thus differs from [15].
In [26], Penny et al. propose the first Pareto-based energy controller for an HEVC encoder. The following results extracted from [26] are averaged over one sequence of each video class (A, B, C, D and E). For an energy reduction from 49% to 71% (in HM), the authors report a BD-BR increase between 6.84% and 25%, respectively.
Several works use Machine Learning-based optimization to reduce the complexity of the HEVC encoding process. The authors of [29, 30] present an Intra CU size classifier based on data mining with offline classifier training. The classifier is a three-node decision tree using the means and variances of CUs and sub-CUs as characteristics. This algorithm reduces the coding time by 52% (in HM) at the expense of a BD-BR increase of 2%. Duanmu et al. [7] present a fast CU partitioning using Machine Learning for screen content video compression. The authors use several characteristics such as the CU luma variance and the color and gradient kurtosis of the CU. Shen and Yu [32] propose a CU splitting early termination algorithm based on a Support Vector Machine (SVM). The RD cost losses due to misclassification are used as features (weights) during the SVM training. In [43], the authors model the coding tree determination in HEVC as a three-level hierarchical decision problem using SVM predictors.
These studies are all based on complexity reduction of the HM software encoder and their performance cannot be directly translated to real-time encoders. The two methods proposed in the next sections are studied within a real-time optimized encoder and demonstrate high prediction efficiency.
3 Probabilistic Approach for Predicting an HEVC Quad-Tree Partitioning
3.1 Variance-Based Decision for Quad-Tree Partitioning
To study how to predict the quadtree partitioning from the variance values of CU luminance samples, two populations of CUs at a current depth d are defined: Merged (M) and Non-Merged (NM). A CU belongs to the Non-Merged population when the full RDO process chooses to encode the CU at the current depth d, while it belongs to the Merged population when the RDO process chooses to encode the CU at a new depth d′ with d′ < d. With a bottom-up approach (i.e. d from 4 to 1), all CUs of the quadtree decomposition of all CTUs can be classified into one of these two populations.
The Cumulative Distribution Function (CDF) of the Non-Merged population can be used to decide whether a CU has to be merged or not. In our case, the CDF gives the probability that the variance of a CU of a given size is less than or equal to a given value.
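As an illustration, a per-depth threshold υ_th(Δ,d) can be read off the empirical CDF as the Δ-quantile of the Non-Merged variance population. The sketch below assumes this quantile interpretation; the function name and the sample population are hypothetical.

```python
import numpy as np

def variance_threshold(nm_variances, delta):
    """Delta-quantile of the Non-Merged variance population: the variance
    value v such that CDF(v) = delta (empirical, linearly interpolated)."""
    return float(np.quantile(np.asarray(nm_variances, dtype=float), delta))

# Hypothetical Non-Merged variance population for one depth d
population = [12.0, 20.0, 35.0, 50.0, 80.0, 120.0, 150.0, 200.0]
threshold = variance_threshold(population, 0.6)
```

A CU whose variance falls below this threshold is likely to belong to the Merged population at that depth.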
Variance thresholds υ_{th}(Δ,d) of the 50th frame of two example sequences versus d and Δ.
| Sequence name | Δ | d = 1 | d = 2 | d = 3 | d = 4 |
|---|---|---|---|---|---|
| PeopleOnStreet | 0.3 | 31.8 | 31.1 | 51.4 | 97.0 |
| | 0.5 | 40.8 | 40.1 | 66.4 | 127.0 |
| | 0.7 | 49.8 | 50.1 | 84.4 | 166.0 |
| | 0.9 | 58.8 | 61.1 | 109.4 | 219.0 |
| ParkScene | 0.3 | 41.2 | 24.3 | 29.1 | 53.5 |
| | 0.5 | 51.2 | 31.3 | 37.1 | 70.5 |
| | 0.7 | 62.2 | 40.3 | 48.1 | 90.5 |
| | 0.9 | 74.2 | 50.3 | 62.1 | 117.5 |
3.2 Variance Threshold Modelling

- Thresholds from CDFs of variances can be predicted from a reference Learning Frame (F_L).
- Look-Up Tables (LUTs) require light computation and memory overheads for the determination of the threshold.
- The prediction of thresholds is independent from the QP value (Fig. 5).
- Threshold modelling is accurate, with a mean R² of 0.86 for the different depths.
- Thresholds can be precomputed according to the Δ value as a parameter.
The next section describes our first proposed algorithm to predict the CTU partitioning using a variance criterion and the obtained thresholds υ_{th}(Δ,d).
3.3 Probabilistic Prediction of the CTU Partitioning
The following section describes the proposed algorithm and the prediction scheme that predicts the CTU partitioning in one-shot, using a variance criterion and the thresholds υ_th(Δ,d) described in Section 3.1.
3.3.1 Computing the CTU Depth Map
3.3.2 Refining the CTU Depth Map
To increase the accuracy of the one-shot depth map prediction with a limited impact on the complexity, a second algorithm is designed that refines the Cdm.
The algorithm, described by Algorithm (2), takes as input a Cdm matrix from Algorithm (1) and generates a second Cdm called the Rcdm. The Rcdm is the result of merging all groups of four neighboring blocks (in the Z-scan order) having the same depth in the input Cdm. Algorithm (2) proceeds as follows.
The first step checks whether the input Cdm depth is equal to 0; if so, no merge can be applied and the Rcdm is also set to 0 (line 2). If not, the Cdm is analysed element by element (lines 4–5). Since a depth of 4 in a Cdm corresponds to four 4 × 4 CUs, these are always merged to depth 3 and thus the value in the Rcdm is automatically set to 3 (line 7). For the general case (i.e. d ∈ {1,2,3}), if the evaluated element of the matrix corresponds to the fourth block (in the Z-scan order) of the given depth d (line 11) and if the three other blocks have depth d (line 12), then the algorithm fills the corresponding blocks of the Rcdm with the upper depth d − 1 (line 13).
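As a sketch of this refinement step, assume the Cdm is stored as an 8 × 8 matrix of depths covering one 64 × 64 CTU (one element per 8 × 8 pixel area, a hypothetical layout; the function name is also hypothetical):

```python
import numpy as np

def refine_cdm(cdm):
    """Sketch of Algorithm (2): build the Rcdm by merging every quad of
    sibling blocks that share the same depth in the input Cdm."""
    base = np.asarray(cdm).copy()
    if (base == 0).all():          # whole CTU is already a single 64x64 CU
        return base
    base[base == 4] = 3            # four 4x4 CUs always merge to one 8x8 CU
    rcdm = base.copy()
    for d in (3, 2, 1):
        quad = 2 ** (4 - d)        # side of a sibling quad, in map units
        for y in range(0, base.shape[0], quad):
            for x in range(0, base.shape[1], quad):
                if (base[y:y + quad, x:x + quad] == d).all():
                    rcdm[y:y + quad, x:x + quad] = d - 1  # merge to depth d-1
    return rcdm
```

Each merge is decided against the input Cdm, so merges do not cascade within one refinement pass, matching the single-level merge of the example in Fig. 6.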
Figure 6 shows an example of a Cdm (Fig. 6a) and its associated Rcdm (Fig. 6b). The grey blocks in the Rcdm (Fig. 6b) represent the merged blocks. The next section describes our probabilistic energy reduction scheme.
3.3.3 Resulting Probabilistic CTU Prediction Method
Based on the previous elements, we propose to limit the recursive search of the RDO process over the CTU quadtree decomposition by predicting the coding-tree partitioning from video frame content properties. We introduce a probabilistic variance-aware quadtree partitioning prediction method, illustrated in Fig. 9. First, the video sequence is split into Groups of Frames (GOF). The first frame of a GOF, called the Learning Frame (F_L), is encoded with a full RDO process. From this encoding are extracted the variances υ^d according to the depth d ∈ {1,2,3,4} selected during the full RDO process. Then, two statistical moments are computed per depth d: the means \(\mu _{\upsilon ^{d}}\) and the standard deviations \(\sigma _{\upsilon ^{d}}\) of the variance populations υ^d. According to the parameter Δ, the set of thresholds υ_th(d) is calculated using Eq. 2 and the LUT of the coefficients a(Δ,d), b(Δ,d) and c(Δ,d) computed offline (cf. Section 3.2).
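The per-GOF threshold computation can be sketched as follows. The linear form a·μ + b·σ + c is an assumption standing in for Eq. 2, and the LUT values below are hypothetical placeholders, not the offline coefficients of the paper.

```python
def predict_thresholds(mu, sigma, lut, delta, depths=(1, 2, 3, 4)):
    """Per-depth thresholds from the Learning Frame moments (mu, sigma)
    and offline coefficients (a, b, c) stored in a LUT keyed by (delta, d).
    The linear model a*mu + b*sigma + c is an assumed stand-in for Eq. 2."""
    thresholds = {}
    for d in depths:
        a, b, c = lut[(delta, d)]
        thresholds[d] = a * mu[d] + b * sigma[d] + c
    return thresholds

# Hypothetical moments and coefficients for one Learning Frame
mu = {1: 40.0, 2: 35.0, 3: 55.0, 4: 100.0}
sigma = {1: 10.0, 2: 12.0, 3: 20.0, 4: 40.0}
lut = {(0.6, d): (1.0, 0.5, 2.0) for d in (1, 2, 3, 4)}
th = predict_thresholds(mu, sigma, lut, 0.6)
```

Only the moments are computed per GOF at run time; the coefficients are read from the offline LUT, which keeps the overhead light.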
To conclude this section, our proposed probabilistic energy reduction scheme takes as input the parameter Δ to generate the Cdms and Rcdms. The HEVC encoder is then forced to apply the RDO process only between the Cdm and the Rcdm. The next section details the competing Machine Learning method.
4 Machine Learning Approach for Predicting an HEVC Quad-Tree Partitioning
This section presents our second quadtree prediction method, based on Machine Learning. This quadtree prediction is then used to drastically simplify the brute-force algorithm usually employed in HEVC encoders.
4.1 Machine Learning Based Decision
As in the probabilistic method (Section 3), the Machine Learning-based quadtree prediction follows a bottom-up approach (from CU 4 × 4 to CU 32 × 32). The classification problem remains to determine whether the CUs of depth d have to be merged into a CU of depth d − 1, as illustrated in Fig. 2. The next section details the training setup of the learning algorithm.
4.1.1 Training Set-Up for the Coding Tree Structure Determination
Machine Learning efficiency strongly depends on the diversity of the training data. The video sequences used to train the Machine Learning framework are chosen to cover a vast space of content types. To select a training data set covering a large range of video contents and complexities, the Spatial Information (SI) and Temporal Information (TI) metrics [12] are used to characterize video sequences. The TI and SI give respectively the degrees of motion and of spatial detail in a video sequence. Since compression complexity is highly linked to these two spatio-temporal parameters, the set of training sequences for the Machine Learning feature evaluation should span a large range of both SI and TI.
Overfitting, i.e. over-specializing a model to a training set, constitutes one of the main risks for the quality of a Machine Learning-based model [6]. Thus, the dataset used for training should result in a low bias. In our case, due to the broad range of resolutions and frame rates across the training sequences, the total number of CTUs per class is not equally distributed. For instance, sequences with high resolution contain a high number of CTUs with low texture complexity when compared to sequences with low resolution. To avoid such bias, the datasets used for training are forced to be composed of a fixed number of CTUs from each class. To avoid temporal bias, which would lead to redundant information, the sampled CTUs come from frames uniformly distributed throughout the sequences: 13 frames for class A, 25 frames for class B, 55 frames for class E, 125 frames for class C and 500 frames for class D. For each depth d, 80,000 instances are randomly sampled from the previously defined data pool, composed of 40,000 instances of each prediction decision at each depth d.
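The class-balanced sampling described above could be sketched as follows (the pool construction and names are hypothetical; the paper samples 40,000 instances per decision and depth):

```python
import random

def balanced_sample(pools, n_per_class, seed=0):
    """Draw the same number of instances from every class pool so the
    training set is not biased toward the most frequent class."""
    rng = random.Random(seed)
    dataset = []
    for label, pool in pools.items():
        dataset += [(label, inst) for inst in rng.sample(pool, n_per_class)]
    rng.shuffle(dataset)               # mix classes before training
    return dataset

# Toy pools standing in for the per-decision CTU feature pools
pools = {"merge": list(range(1000)), "no_merge": list(range(1000))}
train_set = balanced_sample(pools, 400)
```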
The open-source Waikato Environment for Knowledge Analysis (WEKA) Machine Learning framework is used for the training process [11]. WEKA is chosen for its popularity and extensive documentation. It includes a large number of Machine Learning algorithms for data mining tasks, such as REPTree, LMT, RandomForest, BFTree and C4.5, among others. WEKA also provides several useful tools for feature evaluation that rank the features by usefulness according to a search strategy. For the current study, features have been selected using the information gain provided by the WEKA software. Information gain is based on the Kullback-Leibler Divergence (KLD) [18], also called relative entropy, which measures the divergence between two probability distributions.
4.1.2 Decision Tree-Based Partitioning Decisions
The state-of-the-art studies described in Section 2.3 gather many characteristics used to predict the coding tree decomposition of a CTU. To predict the coding tree in one-shot, only characteristics that are independent of the encoding process and have a limited computational overhead are considered:

- CU var [7, 15, 21, 25, 29, 30]: the variance of the CU luminance samples at depth d (1 feature).
- LowerCU var [7, 15, 21, 29, 30]: the variances of the 4 sub-CU luminance samples at depth d + 1 (4 features).
- UpperCU var [15, 21, 25, 29, 30]: the variance of the upper CU luminance samples at depth d − 1 (1 feature).
- NhbrCU var [7, 15]: the variances of the neighbouring CU luminance samples at depth d in the Z-scan order (3 features).
- Var of lowerCU mean [29, 30]: the variance of the means of the 4 sub-CU luminance samples at depth d + 1 (1 feature).
- Var of lowerCU var [29, 30]: the variance of the variances of the 4 sub-CU luminance samples at depth d + 1 (1 feature).
- QP: the QP of the frame (1 feature).
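Gathering these twelve features for one CU could look like the following sketch (the block layout and function name are hypothetical; the CU is an N × N luma array and its parent is 2N × 2N):

```python
import numpy as np

def cu_features(upper, cu, neighbours, qp):
    """Assemble the 12-dimensional feature vector for one CU of depth d:
    `cu` is its NxN luma block, `upper` its 2Nx2N parent block, and
    `neighbours` the three Z-scan neighbour blocks of the same depth."""
    n = cu.shape[0] // 2
    subs = [cu[i:i + n, j:j + n] for i in (0, n) for j in (0, n)]
    sub_vars = [float(s.var()) for s in subs]
    sub_means = [float(s.mean()) for s in subs]
    feats = [float(cu.var())]                       # CU var
    feats += sub_vars                               # 4 lower-CU vars
    feats += [float(upper.var())]                   # upper-CU var
    feats += [float(b.var()) for b in neighbours]   # 3 neighbour vars
    feats += [float(np.var(sub_means)),             # var of lower-CU means
              float(np.var(sub_vars)),              # var of lower-CU vars
              float(qp)]                            # frame QP
    return feats
```

All inputs come from raw luminance samples and the frame QP, so the vector is available before any RDO encoding pass.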
The training of the decision trees is performed with the C4.5 algorithm [28] because the trees it generates are lightweight. In terms of information gain, the C4.5 algorithm uses the KLD to select the best feature for each decision. The C4.5 algorithm iterates over all training instances and searches, for each feature, for the threshold that achieves the best classification, i.e. the highest information gain. The feature and its corresponding threshold are then used to divide the training instances into two subsets. Finally, the process is recursively iterated on the two subsets of training instances.
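The split criterion can be illustrated with a small sketch. Note that the text above describes plain information gain; standard C4.5 normalises it into a gain ratio, so this is the simpler of the two:

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the instances at `threshold`;
    tree induction keeps the feature/threshold pair that maximises it."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:          # degenerate split: nothing gained
        return 0.0
    n = len(labels)
    return entropy(labels) - len(left) / n * entropy(left) \
                           - len(right) / n * entropy(right)
```

A threshold that perfectly separates the Merge and Non-Merge instances yields a gain of 1 bit for a balanced binary problem.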
To measure the accuracy of the decision trees, a 10-fold cross-validation is performed on the training instances. The cross-validation technique evaluates a predictive model by partitioning the original instances into a training set to train the model and a test set to evaluate it. In 10-fold cross-validation, the original instances are randomly split into 10 equally sized subsets. Among the 10 subsets, one subset is used as the validation instances for testing the model, and the remaining 9 subsets are used as training instances. The cross-validation process is then repeated 10 times (the folds), with each of the 10 subsets used exactly once as the validation instances. The Percentage of Correctly Classified Instances (PCCI) given by the 10-fold cross-validation is taken as the accuracy of the decision trees.
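A minimal sketch of the PCCI computation by k-fold cross-validation (the `train_fn` interface, returning a predictor callable, is a hypothetical stand-in for the WEKA tooling):

```python
import random

def pcci_kfold(instances, labels, train_fn, k=10, seed=0):
    """Percentage of Correctly Classified Instances under k-fold CV:
    each fold is used exactly once as the validation set."""
    idx = list(range(len(instances)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        model = train_fn([instances[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(model(instances[i]) == labels[i] for i in fold)
    return 100.0 * correct / len(instances)
```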
Two types of classifiers are defined for each depth d: the Merge and Split decision trees. These two decision trees solve the same classification problem, illustrated in Section 3 by Fig. 2, but differ in their input features. The Merge decision trees use the features linked to the CUs of depth d to predict whether the 4 CUs of depth d have to be merged into the CU of depth d − 1. The Split decision trees use the features linked to the CU of depth d − 1 to predict whether the current CU of depth d − 1 has to be split into 4 CUs of depth d.
Decision tree dimensions and accuracy (PCCI) according to the depth d. The accuracy of both the Merge and Split decision trees is over 80% of good decisions.
Merge decision trees

| Depth | d = 4 | d = 3 | d = 2 | d = 1 |
|---|---|---|---|---|
| Nb leaves | 18 | 15 | 10 | 9 |
| Size | 15 | 13 | 13 | 15 |
| PCCI | 81.39% | 80.52% | 80.19% | 81.26% |

Split decision trees

| Depth | d = 3 | d = 2 | d = 1 | d = 0 |
|---|---|---|---|---|
| Nb leaves | 18 | 15 | 10 | 9 |
| Size | 35 | 29 | 19 | 17 |
| PCCI | 82.24% | 80.89% | 80.87% | 80.83% |
The next sections describe how we use decision trees to predict the CTU partitioning.
4.2 Formalisation of the CTU Partitioning Decisions
4.2.1 Machine Learning Prediction Algorithm for CTU Partitioning
Algorithm (3) describes our proposed bottom-up algorithm that predicts the CTU partitioning using a Machine Learning approach. The algorithm takes as inputs all the previously computed features \({\mathcal {F}}_{x,y}^{d}\) defined in Section 4.1.2 to generate the Cdm associated with the input CTU.
4.2.2 Resulting Machine Learning CTU Prediction Method
5 Probabilistic Approach versus Machine Learning for One-Shot Quad-Tree Prediction
This section gives the experimental setup and the results obtained for the two proposed energy reduction schemes on the real-time HEVC encoder Kvazaar [36].
5.1 Experimental Set-Up and Metrics to Evaluate the Quad-Tree Partitioning Predictions
5.1.1 Experimental Set-Up and Parameters
To conduct the experiments, 18 video sequences [2] that strongly differ from one another in terms of frame rate, motion, texture and spatial resolution were used. All experiments are performed on one core of the EmETXe-i87M0 platform from Arbor Technologies, based on an Intel Core i5-4402E processor at 1.6 GHz. The HEVC software encoder used is the real-time Kvazaar [16, 17, 39] in All Intra (AI) configuration. Since the configuration aims at real-time operation, following [20], the Rate-Distortion Optimized Quantization (RDOQ) [14] and Intra transform skipping [19] features are disabled. Each sequence is encoded with 4 different QP values: 22, 27, 32, 37 [2]. For the probabilistic approach, previous experiments showed that the best prediction is obtained with Δ ∈ [0.6, 0.7] [21]. For the following experiments, Δ is fixed to 0.6 and the GOF size is fixed to 50, which is shown in [22] to be an appropriate value for drastic energy reductions.
The Bjøntegaard Delta Bit Rate (BD-BR) and Bjøntegaard Delta PSNR (BD-PSNR) [41] are used to measure the compression efficiency difference between two encoding configurations. The BD-BR reports the average bit rate difference in percent between two encodings at the same quality in terms of Peak Signal-to-Noise Ratio (PSNR). Similarly, the BD-PSNR measures the average PSNR difference in decibels (dB) between two different encoding algorithms at the same bit rate.
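As a reference point, the Bjøntegaard computation fits a cubic polynomial to each rate-distortion curve (log-rate versus PSNR for the BD-BR) and integrates the gap over the overlapping PSNR range. A sketch, assuming four rate points per configuration:

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjøntegaard Delta Bit Rate sketch: cubic fit of log-rate versus
    PSNR, integrated over the overlapping PSNR interval (in percent)."""
    p_ref = np.polyfit(psnr_ref, np.log(rates_ref), 3)
    p_test = np.polyfit(psnr_test, np.log(rates_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref, int_test = np.polyint(p_ref), np.polyint(p_test)
    avg_diff = (np.polyval(int_test, hi) - np.polyval(int_test, lo)
                - np.polyval(int_ref, hi) + np.polyval(int_ref, lo)) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0  # average bit rate overhead in %
```

A positive BD-BR means the test configuration needs more bits than the reference for the same PSNR.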
To measure the energy consumed by the platform, the Intel Running Average Power Limit (RAPL) interfaces are used to obtain the energy consumption of the CPU package, which includes the cores, I/Os, DRAM and integrated graphics chipset. As shown in [10], RAPL power measurements are coherent with external measurements, and [8] proves the reliability of this internal measure across various applications. In this work, the power gap between the IDLE state and video encoding is measured. The CPU is considered to be in the IDLE state when it spends more than 90% of its time in the C7 C-state. The C7 state is the deepest C-state of the CPU, characterized by all core caches being flushed and the PLL, core clock and all uncore domains being turned off. The power of the board is measured at 16.7 W when the CPU is in idle mode and rises to 31 W on average during video encoding. RAPL shows that 72% of this gap is due to the CPU package, the rest of the power going to the external memory, the voltage regulators and other elements of the board.
5.1.2 Experimental Metrics
The recall ρ(A,B) represents the share of correct quadtree decomposition, in terms of pixel area, between predicted CTUs A and reference CTUs B. Using Fig. 6 as an example, the recall between the Cdm of Fig. 6a (considered as predicted) and that of Fig. 6b (considered as reference) is equal to \(\rho = 43\times \frac {100}{64} = 67.19\%\).
The recall ρ(P,R) and the distance Γ(P,R) are used in the following sections to evaluate the accuracy of the prediction, with P being the predicted Cdm and R the reference Cdm,^1 generated by a full RDO process (optimal). The average of the ρ(P,R) measurements gives the percentage of good predictions in terms of pixel area; it falls between 0% and 100%, and the closer ρ(P,R) is to 100%, the more accurately the predicted Cdms fit the reference Cdms. The average distance Γ(P,R) represents the mean error, in terms of depth, between the predicted Cdms and the reference ones; the closer Γ(P,R) is to 0, the more precise the predicted Cdm P.
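Assuming each Cdm is stored as a matrix with one element per equal pixel area (a hypothetical 8 × 8 layout for a 64 × 64 CTU), the two metrics can be sketched as:

```python
import numpy as np

def recall_and_distance(pred, ref):
    """rho: percentage of depth-map elements (equal pixel areas) where the
    predicted depth matches the reference; Gamma: mean absolute depth error."""
    pred, ref = np.asarray(pred), np.asarray(ref)
    rho = 100.0 * float((pred == ref).mean())
    gamma = float(np.abs(pred - ref).mean())
    return rho, gamma
```

With 43 matching elements out of 64, this reproduces the 67.19% recall of the Fig. 6 example.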
5.2 Comparison of the Probabilistic and Machine Learning Approaches for Predicting an HEVC Quad-Tree Partitioning
The recall ρ(P,R), distance Γ(P,R), BD-BR, BD-PSNR and Energy Reduction (ER) of the probabilistic and Machine Learning drastic energy reduction schemes for each sequence. For the same energy reduction, the Machine Learning technique achieves better results than the probabilistic technique for both quadtree prediction accuracy and encoding degradation.
(Columns 2–6: Probabilistic; columns 7–11: Machine Learning.)

| Sequence | ρ (%) | Γ (d) | BD-BR (%) | BD-PSNR (dB) | ER (%) | ρ (%) | Γ (d) | BD-BR (%) | BD-PSNR (dB) | ER (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Traffic | 44.76 | 0.86 | 4.59 | 0.24 | 60.16 | 46.96 | 0.79 | 4.05 | 0.21 | 58.69 |
| PeopleOnStreet | 51.92 | 0.70 | 4.28 | 0.24 | 59.06 | 51.34 | 0.72 | 3.73 | 0.21 | 57.09 |
| Kimono | 18.87 | 2.06 | 13.28 | 0.43 | 52.59 | 40.82 | 0.94 | 9.51 | 0.31 | 61.33 |
| ParkScene | 38.66 | 1.24 | 4.29 | 0.19 | 55.84 | 47.09 | 0.84 | 3.86 | 0.17 | 60.28 |
| BasketballDrive | 47.68 | 0.76 | 3.69 | 0.11 | 60.75 | 50.74 | 0.69 | 4.65 | 0.13 | 60.16 |
| Cactus | 43.86 | 0.95 | 3.56 | 0.13 | 60.40 | 50.03 | 0.73 | 3.79 | 0.14 | 61.34 |
| BQTerrace | 51.83 | 0.66 | 2.25 | 0.14 | 62.06 | 50.35 | 0.69 | 1.99 | 0.13 | 58.93 |
| RaceHorses480 | 46.86 | 0.91 | 3.11 | 0.18 | 59.68 | 54.59 | 0.64 | 2.95 | 0.17 | 60.37 |
| PartyScene | 55.38 | 0.56 | 1.88 | 0.13 | 59.48 | 52.54 | 0.63 | 1.10 | 0.08 | 57.10 |
| BasketballDrill | 51.40 | 0.64 | 2.74 | 0.13 | 58.33 | 43.62 | 0.85 | 4.97 | 0.24 | 62.26 |
| BQMall | 53.78 | 0.66 | 3.46 | 0.19 | 57.54 | 52.62 | 0.66 | 3.16 | 0.17 | 57.13 |
| RaceHorses240 | 52.01 | 0.69 | 2.47 | 0.15 | 59.07 | 57.88 | 0.54 | 1.93 | 0.12 | 58.57 |
| BQSquare | 65.92 | 0.41 | 2.73 | 0.21 | 57.81 | 67.81 | 0.42 | 1.00 | 0.08 | 55.51 |
| BlowingBubbles | 55.70 | 0.55 | 1.45 | 0.10 | 55.25 | 51.90 | 0.64 | 1.24 | 0.08 | 53.54 |
| BasketballPass | 57.55 | 0.55 | 2.39 | 0.14 | 59.20 | 57.46 | 0.56 | 2.31 | 0.14 | 58.01 |
| FourPeople | 49.80 | 0.74 | 4.78 | 0.27 | 55.70 | 50.03 | 0.73 | 4.51 | 0.25 | 55.13 |
| Johnny | 53.61 | 0.67 | 5.76 | 0.23 | 51.63 | 60.17 | 0.55 | 5.49 | 0.22 | 52.26 |
| KristenAndSara | 58.50 | 0.56 | 4.10 | 0.21 | 54.24 | 61.32 | 0.50 | 4.62 | 0.23 | 53.39 |
| Average | 49.89 | 0.79 | 3.93 | 0.19 | 57.71 | 52.63 | 0.67 | 3.60 | 0.17 | 57.84 |
In terms of one-shot quadtree prediction accuracy, Table 3 shows that the Machine Learning technique achieves better results (around 53% recall ρ(P,R) for a distance Γ(P,R) of 0.67 depth levels) than the probabilistic technique (around 50% recall ρ(P,R) for a distance Γ(P,R) of 0.79 depth levels).
The results show that both energy reduction techniques achieve an average energy reduction of 58%. In fact, the overhead due to the unconstrained Learning Frame (F_L) and the variance computations of the probabilistic approach is approximately equal to the overhead of the feature computations of the Machine Learning approach. However, even though the probabilistic approach does not constrain all the frames (only 49 out of every 50), it causes more encoding degradation than the Machine Learning approach: +0.33% in BD-BR and 0.02 dB in BD-PSNR. These results show that the two quadtree prediction accuracy metrics ρ(P,R) and Γ(P,R) correlate well with the encoding degradation.
It is noticeable in Table 3 that the Kimono sequence suffers more degradation than the other sequences: a BD-BR increase of 13.28% with the probabilistic approach and of 9.51% with the Machine Learning approach. This can be explained by the texture specificity of the Kimono sequence, which is composed of a traveling shot with trees and vegetation in the background. This video sequence has the highest Spatial Information (54.1) due to its level of detail. Nevertheless, the results show that the Machine Learning approach reduces the degradation by 3.77% of BD-BR compared to the probabilistic approach.
The performance of state-of-the-art solutions (cf. Section 2.3) based on HM cannot be directly compared to these results. Indeed, their gains are measured relative to a large compression time, far from real-time, so their complexity overhead is comparatively higher in the context of a real-time encoder. Previously published results can thus not be directly applied to reduce the energy consumption of a real-time encoder, as the two methods developed here are.
To conclude, the Machine Learning approach achieves better results on average than the Probabilistic approach and does not require unconstrained learning frames to predict the quadtree partitioning. These two points make the proposed Machine Learning approach a good candidate for building energy reduction methods for real-time Hevc encoders.
6 Conclusion
This paper proposes and compares two energy reduction methods for real-time Hevc Intra encoders. These methods are based on CTU partitioning prediction techniques that drastically limit the recursive RDO process. The first proposed method exploits the correlation between a CTU partitioning and the variance of the CTU luminance samples to predict the quadtree decomposition in one-shot. The second method uses a Machine Learning technique to predict the quadtree decomposition in one-shot.
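The variance-driven idea behind the first method can be sketched as follows. This is an illustrative simplification, not the paper's actual predictor: the per-depth thresholds are hypothetical, and a real encoder would recurse on the four spatial sub-blocks rather than on quarters of a flattened sample list.

```python
def variance(samples):
    """Population variance of a flat list of luma samples."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((s - mean) ** 2 for s in samples) / n

# Hypothetical per-depth split thresholds (illustrative values only).
THRESHOLDS = {0: 900.0, 1: 400.0, 2: 100.0}

def predict_depth(block, depth=0, max_depth=3):
    """One-shot depth prediction: split while the variance stays high."""
    if depth == max_depth or variance(block) <= THRESHOLDS.get(depth, 0.0):
        return depth
    # Recurse on the four sub-blocks (quarters of the flattened samples here).
    quarter = len(block) // 4
    subs = [block[i * quarter:(i + 1) * quarter] for i in range(4)]
    return max(predict_depth(s, depth + 1, max_depth) for s in subs)

smooth = [100] * 64        # uniform block: keep the large CU
textured = [0, 255] * 32   # high-variance block: split all the way down
print(predict_depth(smooth))    # 0
print(predict_depth(textured))  # 3
```

The point of the one-shot formulation is that these depths are decided from the samples alone, before encoding, so the RDO never explores the pruned branches of the quadtree.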
Experimental results show that the Machine Learning method has a slight edge over the Probabilistic method in prediction accuracy, and that this edge translates directly into lower encoding degradation. Both energy reduction techniques are capable of reducing the energy consumption of the Hevc encoder by 58%, including the additional algorithmic overhead in a real-time encoder, for bit rate increases of respectively 3.93% and 3.6%. The obtained energy gain is substantial and close to the theoretical maximum gain of 78% that would be obtained if the perfect quadtree decomposition were known in advance. Future work will use one-shot quadtree partitioning prediction to control the energy consumption of an Hevc Intra encoder under a given energy consumption budget.
Footnotes
 1. Exhaustive search leading to the optimal solution.
Acknowledgments
This work is partially supported by the French ANR ARTEFaCT project, by the COVIBE project funded by the Brittany region, and by the European Celtic-Plus project 4KREPROSYS funded by Finland, Flanders, France, and Switzerland.
References
 1. Biao, M., & Cheung, R.C.C. (2015). A fast CU size decision algorithm for the HEVC intra encoder. IEEE Transactions on Circuits and Systems for Video Technology, 25(5), 892–896. https://doi.org/10.1109/TCSVT.2014.2363739.
 2. Bossen, F. (2013). Common HM test conditions and software reference configurations. JCTVC-L1100, Geneva, Switzerland.
 3. Carroll, A., & Heiser, G. (2010). An analysis of power consumption in a smartphone. In USENIX annual technical conference, Boston, MA (Vol. 14, p. 21).
 4. Cassa, M.B., Naccari, M., Pereira, F. (2012). Fast rate distortion optimization for the emerging HEVC standard. In Picture coding symposium (PCS), 2012 (pp. 493–496). IEEE.
 5. Chan, T.F., Golub, G.H., LeVeque, R.J. (1982). Updating formulae and a pairwise algorithm for computing sample variances. In COMPSTAT 1982 5th symposium held at Toulouse 1982 (p. 30). Springer Science & Business Media.
 6. Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78–87. https://doi.org/10.1145/2347736.2347755.
 7. Duanmu, F., Ma, Z., Wang, Y. (2015). Fast CU partition decision using machine learning for screen content compression. In 2015 IEEE international conference on image processing (ICIP) (pp. 4972–4976). IEEE.
 8. Rotem, E., Naveh, A., Rajwan, D., Ananthakrishnan, A., Weissmann, E. (2012). Power-management architecture of the Intel microarchitecture code-named Sandy Bridge. IEEE Micro, 32(2), 20–27.
 9. Feng, L., Dai, M., Zhao, C.l., Xiong, J.y. (2016). Fast prediction unit selection method for HEVC intra prediction based on salient regions. Optoelectronics Letters, 12(4), 316–320. https://doi.org/10.1007/s1180101660648.
 10. Hackenberg, D., Schöne, R., Ilsche, T., Molka, D., Schuchart, J., Geyer, R. (2015). An energy efficiency feature survey of the Intel Haswell processor. In 2015 IEEE international parallel and distributed processing symposium workshop (IPDPSW) (pp. 896–904). IEEE. https://doi.org/10.1109/IPDPSW.2015.70.
 11. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H. (2009). The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1), 10–18.
 12. ITU (1999). Recommendation ITU-T P.910: Subjective video quality assessment methods for multimedia applications. Geneva.
 13. JCTVC (2016). HEVC reference software. https://hevc.hhi.fraunhofer.de/.
 14. Karczewicz, M., Ye, Y., Chong, I. (2008). Rate distortion optimized quantization. VCEG-AH21, Antalya, Turkey.
 15. Khan, M.U.K., Shafique, M., Henkel, J. (2013). An adaptive complexity reduction scheme with fast prediction unit decision for HEVC intra encoding. In 2013 20th IEEE international conference on image processing (ICIP) (pp. 1578–1582). IEEE.
 16. Koivula, A., Viitanen, M., Lemmetti, A., Vanne, J., Hämäläinen, T.D. (2015). Performance evaluation of Kvazaar HEVC intra encoder on Xeon Phi manycore processor. In 2015 IEEE global conference on signal and information processing (GlobalSIP) (pp. 1250–1254). IEEE.
 17. Koivula, A., Viitanen, M., Vanne, J., Hämäläinen, T.D., Fasnacht, L. (2015). Parallelization of Kvazaar HEVC intra encoder for multicore processors. In 2015 IEEE workshop on signal processing systems (SiPS) (pp. 1–6). IEEE.
 18. Kullback, S., & Leibler, R.A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79–86.
 19. Lan, C., Xu, J., Sullivan, G.J., Wu, F. (2012). Intra transform skipping. JCTVC-I0408, Geneva, CH.
 20. Mercat, A., Arrestier, F., Hamidouche, W., Pelcat, M., Menard, D. (2017). Energy reduction opportunities in an HEVC real-time encoder. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 1158–1162). IEEE.
 21. Mercat, A., Arrestier, F., Pelcat, M., Hamidouche, W., Menard, D. (2017). Prediction of quadtree partitioning for budgeted energy HEVC encoding. In 2017 IEEE international conference on signal processing systems (SiPS) (pp. 1–6). IEEE.
 22. Mercat, A., Arrestier, F., Pelcat, M., Hamidouche, W., Menard, D. (2018). Machine learning based choice of characteristics for the one-shot determination of the HEVC intra coding tree. In 2018 picture coding symposium (PCS) (pp. 263–267). IEEE.
 23. Mercat, A., Arrestier, F., Pelcat, M., Hamidouche, W., Menard, D. (2018). Machine learning based choice of characteristics for the one-shot determination of the HEVC intra coding tree (pp. 263–267). IEEE.
 24. MulticoreWare (2017). x265 HEVC Encoder / H.265 Video Codec. http://x265.org/.
 25. Peng, K.K., Chiang, J.C., Lie, W.N. (2016). Low complexity depth intra coding combining fast intra mode and fast CU size decision in 3D-HEVC (pp. 1126–1130). IEEE.
 26. Penny, W., Machado, I., Porto, M., Agostini, L., Zatt, B. (2016). Pareto-based energy control for the HEVC encoder. In 2016 IEEE international conference on image processing (ICIP) (pp. 814–818). IEEE.
 27. Qualcomm (2014). Snapdragon 810 processor product brief. https://www.qualcomm.com/documents/snapdragon810processorproductbrief.
 28. Quinlan, J.R. (2014). C4.5: Programs for machine learning. Amsterdam: Elsevier.
 29. Ruiz, D., Fernández-Escribano, G., Adzic, V., Kalva, H., Martínez, J.L., Cuenca, P. (2015). Fast CU partitioning algorithm for HEVC intra coding using data mining. Multimedia Tools and Applications, 861–894. https://doi.org/10.1007/s1104201530146.
 30. Ruiz-Coll, D., Adzic, V., Fernández-Escribano, G., Kalva, H., Martínez, J.L., Cuenca, P. (2014). Fast partitioning algorithm for HEVC Intra frame coding using machine learning. In 2014 IEEE international conference on image processing (ICIP) (pp. 4112–4116). IEEE.
 31. Shen, L., Zhang, Z., An, P. (2013). Fast CU size decision and mode decision algorithm for HEVC intra coding. IEEE Transactions on Consumer Electronics, 59(1), 207–213.
 32. Shen, X., & Yu, L. (2013). CU splitting early termination based on weighted SVM. EURASIP Journal on Image and Video Processing, 2013(1), 4.
 33. Sullivan, G.J., Ohm, J.R., Han, W.J., Wiegand, T. (2012). Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology, 22(12), 1649–1668. https://doi.org/10.1109/TCSVT.2012.2221191.
 34. Sze, V., Budagavi, M., Sullivan, G.J. (Eds.) (2014). High efficiency video coding (HEVC): integrated circuits and systems. Cham: Springer.
 35. Tan, T.K., Weerakkody, R., Mrak, M., Ramzan, N., Baroncini, V., Ohm, J.R., Sullivan, G.J. (2016). Video quality evaluation methodology and verification testing of HEVC compression performance. IEEE Transactions on Circuits and Systems for Video Technology, 26(1), 76–90. https://doi.org/10.1109/TCSVT.2015.2477916.
 36. UltraVideoGroup (2017). Kvazaar HEVC Encoder. http://ultravideo.cs.tut.fi/#encoder.
 37. Vanne, J., Viitanen, M., Hämäläinen, T.D., Hallapuro, A. (2012). Comparative rate-distortion-complexity analysis of HEVC and AVC video codecs. IEEE Transactions on Circuits and Systems for Video Technology, 22(12), 1885–1898. https://doi.org/10.1109/TCSVT.2012.2223013.
 38. Vantrix (2017). F265 Open Source HEVC/H.265 Project. http://vantrix.com/f2652/.
 39. Viitanen, M., Koivula, A., Lemmetti, A., Vanne, J., Hämäläinen, T.D. (2015). Kvazaar HEVC encoder for efficient intra coding. In 2015 IEEE international symposium on circuits and systems (ISCAS) (pp. 1662–1665). IEEE.
 40. Wang, X., & Xue, Y. (2016). Fast HEVC intra coding algorithm based on Otsu’s method and gradient. In 2016 IEEE international symposium on broadband multimedia systems and broadcasting (BMSB) (pp. 1–5). IEEE.
 41. Wiegand, T., Sullivan, G., Bjontegaard, G., Luthra, A. (2003). Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology, 13(7), 560–576. https://doi.org/10.1109/TCSVT.2003.815165.
 42. Wien, M. (2015). High efficiency video coding. Signals and Communication Technology. Berlin: Springer.
 43. Zhang, Y., Kwong, S., Wang, X., Yuan, H., Pan, Z., Xu, L. (2015). Machine learning-based coding unit depth decisions for flexible complexity allocation in high efficiency video coding. IEEE Transactions on Image Processing, 24(7), 2225–2238. https://doi.org/10.1109/TIP.2015.2417498.
 44. Zhang, J., Li, B., Li, H. (2015). An efficient fast mode decision method for inter prediction in HEVC. IEEE Transactions on Circuits and Systems for Video Technology, 26(8), 1502–1515. https://doi.org/10.1109/TCSVT.2015.2461991.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.