1 Introduction

Within the fields of economics, finance, computer science and decision theory, there is increasing interest in the problem of how to replace the additivity property of probability measures with the weaker property of monotonicity, that is, with a non-additive measure. Several types of integrals with respect to non-additive measures, also known as fuzzy measures [30], have been developed for different purposes in various works [2, 14, 21, 32, 37]. The Choquet integral, as a popular representation of the non-additive measure, has been successfully used in many applications such as information fusion [7], multiple regression [33], classification [15, 38], multicriteria decision making [9], image and pattern recognition [20, 39], and data modeling [12].

In general, the basic approach to solving the non-additive model based on the Choquet integral is a two-step procedure. The first step reduces the non-linear multi-regression model to a traditional linear multi-regression model by converting each n-dimensional vector into a \(2^{n}\)-dimensional one defined over the powerset of attributes; the second step then solves the linear model using various numerical indices and optimization methods [27]. So far, numerous related works have been developed [16, 19, 22, 23, 26].

However, the number of variables involved in solving the non-additive model increases exponentially with n, and so does the computational time. The use of the non-additive measure in practical applications is therefore clearly curbed by this exponential complexity [24]. Several approaches to this problem are known. The notion of k-additivity proposed by Grabisch [6, 10, 11, 13] makes it possible to trade off the complexity of the representation against the richness of the modeling. In [40], Yan developed a hierarchical Choquet integral to model the data. These techniques, however, all attack the complexity problem by converting the \(2^{n}\)-dimensional space into a smaller but less precise one; the results they produce are approximate solutions that cannot be mapped back to the original \(2^{n}\)-dimensional space, in which the practical Choquet integral problem therefore remains unsolved.

In this paper, we propose a novel polynomial-time method to solve the parameter estimation problem for the Choquet integral. The basic idea is to first regard the problem as a sequential one, and then to solve it by Bayesian inference: the parameters of the Choquet integral are updated after each data sample is presented, and each update is carried out in a low-dimensional space mapped from the original one. We provide statistical performance guarantees for the proposed method, and the experiments show that its performance is better than that of traditional methods. A special benefit of our method is that the \(2^{n}\) variables are retained, so the semantic information of the \(2^{n}\) variables is not lost. Building on our previous works [19, 25, 35, 36], this sequential method is well suited to real-world applications.

The main contributions of the paper can be summarized as follows. (1) A novel polynomial-time method is proposed to solve the parameter estimation problem for the Choquet integral; using our method, the computational complexity for the non-additive measure is reduced from \(O((n+K)\cdot 2^{2n})\) to \(K\cdot O(n^{2}\log n)\). (2) As a combination of the sequential concept and the Choquet integral, the method can be applied to real-time applications. (3) As a real-world application, the cross-layer design of wireless multimedia networks is optimized by the proposed method.

The remainder of the paper is organized as follows. Section 2 introduces the preliminaries, which provide a mathematical setting for fuzzy measures. Section 3 gives the Bayesian inference based algorithm for estimating the parameters of a large-scale Choquet integral. A benchmark dataset test and a real-world application are provided to illustrate the proposed algorithms in Sect. 4. Finally, a summary and future work are given in Sect. 5.

2 Choquet Integral

2.1 Basic Definitions

Let us consider a multi-feature problem described by a finite set of alternatives \(A =\{a_{1},a_{2},\ldots \}\) and a finite set of features \(F = \{f_{1},f_{2},\ldots \}\). Each alternative \(a_{i} \in A\) can be associated with a profile \((\mu _{1},\ldots ,\mu _{n}) \in \mathcal {R}^{n}\), where, for any \(f_{j} \in F\), \(\mu _{j} \in \mathcal {R}\) represents the utility of \(a_{i}\) with respect to feature \(f_{j}\).

In general, one can compute an aggregation score \(H(f,w)\) by a linear basis function model that takes into account the weights of importance of the features:

$$\begin{aligned} H\left( {f,w} \right) = \sum \limits _{j = 1}^n {w_j \phi _j \left( f \right) } = w^T \phi \left( f \right) \end{aligned}$$
(1)

where \(w = (w_{1}, \ldots , w_{n})^{T}\) and \(\phi = (\phi _{1}, \ldots ,\phi _{n})^{T}\). The weight w can also be regarded as a Lebesgue measure on the singletons of f. However, the assumption of independence among features is rarely satisfied. To obtain a flexible representation of the complex interaction phenomena among features, it is useful to replace the weight vector w with a non-additive set function \(\mu \) on \(N=\{1,\ldots ,n\}\), called a Choquet capacity [34], which defines a weight not only on each individual feature but also on each subset of features. This model is clearly more powerful than the Lebesgue (weighted-sum) model, since the latter becomes a special case of the Choquet integral model. The Choquet integral is defined as follows [4, 7].
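To make the additive baseline concrete, the following minimal Python sketch evaluates eq. (1); the utilities and weights are made-up illustrative values, not taken from the paper.

```python
import numpy as np

def linear_score(phi, w):
    """Additive aggregation H(f, w) = w^T phi(f) from eq. (1).
    Each feature contributes only through its own weight, so
    interactions among features cannot be expressed."""
    return float(np.dot(w, phi))

# Hypothetical profile of one alternative over n = 3 features.
phi = np.array([0.6, 0.2, 0.9])   # feature utilities (made-up values)
w = np.array([0.5, 0.3, 0.2])     # feature weights, summing to 1
print(linear_score(phi, w))       # 0.6*0.5 + 0.2*0.3 + 0.9*0.2 = 0.54
```

The Choquet integral defined below replaces the weight vector with a capacity on subsets of features; the additive model above is then recovered as the special case of an additive capacity.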

Definition 1

For every space \(\varOmega \) and algebra \(\mathcal {A}\) of subsets of \(\varOmega \), a set-function \(\mu :\mathcal {A}\rightarrow \mathcal {R}\) is called a capacity if it satisfies the following:

  1. (i)

    \(\mu \left( \emptyset \right) = 0,\mu \left( \varOmega \right) = 1\),

  2. (ii)

    \(\forall A,B \in \mathrm{\mathcal {A}}:A \subseteq B \Rightarrow \mu \left( A \right) \le \mu \left( B \right) \)

    Furthermore, a capacity \(\mu \) on \(\varOmega \) is said to be additive if

  3. (iii)

\(\forall A,B \in \mathrm{\mathcal {A}}:\mu \left( {A \cup B} \right) = \mu \left( A \right) + \mu \left( B \right) - \mu \left( {A \cap B} \right) \)

For each subset of features \(A \subseteq \varOmega \), the number \(\mu (A)\) can be regarded as the weight, or importance, of A.

Definition 2

If \(\phi : \varOmega \rightarrow \mathcal {R}\) is a bounded \(\mathcal {A}\)-measurable function and \(\mu \) is any capacity on \(\varOmega \), we define the Choquet integral of \(\phi \) with respect to \(\mu \) to be the number

$$\begin{aligned} \int _\varOmega {\phi \left( f \right) d\mu \left( f \right) } = \int _0^\infty {\mu \left( {\left\{ {f \in \varOmega :\phi \left( f \right) \ge \alpha } \right\} } \right) d\alpha } \nonumber \\ + \int _{ - \infty }^0 {\left[ {\mu \left( {\left\{ {f \in \varOmega :\phi \left( f \right) \ge \alpha } \right\} } \right) - 1} \right] d\alpha } \end{aligned}$$
(2)

where the integrals are taken in the sense of Riemann. In particular, if \(\varOmega \) is finite and \(\phi (f_{1})\ge \phi (f_{2})\ge \ldots \ge \phi (f_{n})\), then

$$\begin{aligned} \int _\varOmega {\phi \left( f \right) d\mu \left( f \right) } = \sum \limits _{i = 1}^{n - 1} \left( {\phi \left( {f _i } \right) - \phi \left( {f _{i + 1} } \right) } \right) \mu \left( {\left\{ {f _1 ,\ldots ,f _i } \right\} } \right) \nonumber \\ + \phi \left( {f _n } \right) \end{aligned}$$
(3)

Definition 3

Let \(\phi \) be a function from \(X=\{x_{1},\ldots ,x_{n}\}\) to [0, 1]. Let \(\{x_{\sigma (1)},\ldots ,x_{\sigma (n)}\}\) denote a reordering of the set X such that \(0\le \phi (x_{\sigma (1)}) \le \ldots \le \phi (x_{\sigma (n)})\), and let \(A_{(i)}\) be the subset defined by \(A_{(i)}=\{x_{\sigma (i)},\ldots ,x_{\sigma (n)}\}\). Then, the discrete Choquet integral of \(\phi \) with respect to a fuzzy measure \(\mu \) on X is defined as

$$\begin{aligned} C_\mu \left( \phi \right)= & {} \sum \limits _{i = 1}^n {\mu \left( {A_{\left( i \right) } } \right) \left( {\phi \left( {x_{\left( i \right) } } \right) - \phi \left( {x_{\left( {i - 1} \right) } } \right) } \right) } \nonumber \\= & {} \sum \limits _{i = 1}^n {\phi \left( {x_{\left( i \right) } } \right) \left( {\mu \left( {A_{\left( i \right) } } \right) - \mu \left( {A_{\left( {i + 1} \right) } } \right) } \right) } \end{aligned}$$
(4)

where we take \(\phi (x_{(0)})=0\), \(A_{(n + 1)}=\emptyset \) (so that \(\mu (A_{(n+1)})=0\)), and \(x_{(i)} =x_{\sigma (i)}\).
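As an illustration of Definition 3, the following sketch (our own; the frozenset encoding of subsets is an implementation choice, not the paper's) evaluates eq. (4) for a toy non-additive capacity:

```python
import numpy as np

def choquet_integral(phi, mu):
    """Discrete Choquet integral of Definition 3 (eq. (4)).

    phi : array of n feature utilities in [0, 1].
    mu  : dict mapping frozensets of feature indices to capacity values,
          with mu[empty set] = 0 and mu[full set] = 1.
    """
    n = len(phi)
    sigma = np.argsort(phi)        # phi(x_sigma(1)) <= ... <= phi(x_sigma(n))
    result, prev = 0.0, 0.0        # phi(x_(0)) = 0 by convention
    for i in range(n):
        A_i = frozenset(int(j) for j in sigma[i:])  # A_(i) = {x_sigma(i),...,x_sigma(n)}
        result += mu[A_i] * (phi[sigma[i]] - prev)
        prev = phi[sigma[i]]
    return result

# Toy capacity for n = 2: mu({0}) + mu({1}) != mu({0,1}), i.e. non-additive.
mu = {frozenset(): 0.0, frozenset({0}): 0.4,
      frozenset({1}): 0.3, frozenset({0, 1}): 1.0}
print(choquet_integral(np.array([0.6, 0.2]), mu))  # 0.2*1.0 + 0.4*0.4 = 0.36
```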

2.2 Existing Method

Based on Definitions 1–3, the relationship between the Choquet integral of \(\phi \) and the capacity \(\mu \) can also be described by a new nonlinear multi-feature regression model [5]:

$$\begin{aligned} C_\mu \left( \phi \right) = e + \int _\varOmega {\phi \left( f \right) d\mu \left( f \right) } + \mathcal {N}\left( {0,\delta ^2 } \right) \end{aligned}$$
(5)

where e is a regression constant, \(\phi \) is an observation of \(X=\{x_{1},x_{2},\ldots ,x_{n}\}\), \(\mathcal {N}(0,\delta ^{2})\) is a normally distributed random perturbation with expectation 0 and variance \(\delta ^{2}\), and \(\delta ^{2}\) is the regression residual error. The Choquet integral problem is thus reduced to a traditional linear multi-regression model. At the same time, specifying a general fuzzy measure requires \(2^{n}-1\) parameters, which is clearly exponential in n.

To solve this linear multi-regression model, we usually define a loss function; minimizing it yields the parameters of the linear model. Given classes \(C_{1},\ldots ,C_{n}\), Grabisch [17, 18] proposed a mean squared error (MSE) criterion, in which the difference between the desired outputs \(t_{i}\) for \(i=1,\ldots ,n\) and the actual outputs \(C_{\mu }(\phi )\) is minimized under constraints [1]. The loss function L is

$$\begin{aligned} L^2 = \sum \limits _{\phi \in C_1 } {\left( {C_\mu \left( {\phi } \right) - t _1 } \right) ^2 } + \cdots + \sum \limits _{\phi \in C_n } {\left( {C_\mu \left( {\phi } \right) - t _n } \right) ^2 } \end{aligned}$$
(6)

Given the observation data, the optimal regression coefficients \(\mu \) can be determined by regression methods. For example, to minimize the loss function, the least squares method is used to determine \(\mu _{k}\) \((k=1,2,\ldots , 2^{n}-1)\) [28].

We use maximum likelihood estimation (MLE) [8] to carry out the least squares fit. Specifically, we define the likelihood function using a Gaussian distribution:

$$\begin{aligned} p\left( {t\left| {\phi ,\mu ,\beta } \right. } \right) = \mathrm{\mathcal {N}}\left( {t\left| {C_\mu \left( \phi \right) ,\beta ^{ - 1} } \right. } \right) \end{aligned}$$
(7)

where t is the desired output given by the deterministic function \(C_\mu \left( \phi \right) \), and \(\beta \) is the precision of the Gaussian random variable. Considering an observation set of inputs \(\phi \) with corresponding desired outputs \(t_{1},\ldots ,t_{N}\), we have

$$\begin{aligned} p\left( {t\left| {\phi ,\mu ,\beta } \right. } \right) = \prod \limits _{n = 1}^N {\mathrm{\mathcal {N}}\left( {t_n \left| {\mathrm{{\mu }}^T \phi _n ,\beta ^{ - 1} } \right. } \right) } \end{aligned}$$
(8)

where t and \(\mu \) are column vectors.

Solving for \(\mu \) with the MLE method, we obtain

$$\begin{aligned} \mathrm{{\mu }}_{ML} = \left( {\varPhi ^T \varPhi } \right) ^{ - 1} \varPhi ^T \mathrm{{t}} \end{aligned}$$
(9)

Here \(\varPhi \) is an \(N \times M\) matrix whose elements are given by \(\varPhi _{nj} = \phi _{j}(f_{n})\) [31], so that

$$\begin{aligned} \varPhi = \begin{pmatrix} \phi _0 \left( {f_1 } \right) &{} \phi _1 \left( {f_1 } \right) &{} \cdots &{} \phi _{M - 1} \left( {f_1 } \right) \\ \phi _0 \left( {f_2 } \right) &{} \phi _1 \left( {f_2 } \right) &{} \cdots &{} \phi _{M - 1} \left( {f_2 } \right) \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \phi _0 \left( {f_N } \right) &{} \phi _1 \left( {f_N } \right) &{} \cdots &{} \phi _{M - 1} \left( {f_N } \right) \end{pmatrix} \end{aligned}$$
(10)

To summarize, the algorithm employed by the MLE method is as follows:

figure a
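Since the algorithm box is not reproduced here, the following Python sketch outlines the batch procedure under our own conventions, namely a bitmask indexing of subsets and a pseudo-inverse for numerical stability; it illustrates eqs. (9) and (10) rather than reproducing the authors' code.

```python
import numpy as np

def choquet_row(phi):
    """Map an n-dim sample to its (2^n - 1)-dim row over the powerset so
    that C_mu(phi) = row . mu (cf. eq. (4)).  Only the n nested subsets
    A_(i) receive the nonzero coefficients phi(x_(i)) - phi(x_(i-1))."""
    n = len(phi)
    row = np.zeros(2 ** n - 1)
    sigma = np.argsort(phi)
    prev = 0.0
    for i in range(n):
        idx = sum(1 << int(j) for j in sigma[i:]) - 1  # bitmask index of A_(i)
        row[idx] = phi[sigma[i]] - prev
        prev = phi[sigma[i]]
    return row

def mle_capacity(samples, targets):
    """Batch least-squares / MLE fit of the capacity vector, eq. (9):
    mu_ML = (Phi^T Phi)^{-1} Phi^T t, with Phi built as in eq. (10)."""
    Phi = np.vstack([choquet_row(phi) for phi in samples])
    return np.linalg.pinv(Phi.T @ Phi) @ Phi.T @ np.asarray(targets)
```

Building and inverting the \(2^{n}\times 2^{n}\) matrix \(\varPhi ^{T}\varPhi \) is what drives the exponential cost analyzed below.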

For the time-complexity analysis, we assume that there are n features and K samples, so that the Choquet integral space is \(2^{n}\)-dimensional. In addition, we assume that the complexity of inverting an \(n\times n\) matrix is \(\varOmega (n^{2}\log n)\) [29]. We present the time complexity using pseudo-code analysis.

figure b

We can rewrite this time complexity as

$$O(K\cdot 2^{2n}+K\cdot 2^{n}+ n\log 2\cdot 2^{2n}+2^{2n})$$

Thus, the time complexity of the MLE-based method is \(O((n+K)\cdot 2^{2n})\); the dominant term comes from inverting the \(2^{n}\times 2^{n}\) matrix \(\varPhi ^T\varPhi \), which costs \((2^{n})^{2}\log 2^{n} = n\log 2\cdot 2^{2n}\). Obviously, the number of variables involved in solving the non-additive model increases exponentially with n, and so does the computational time. In fact, the use of the non-additive measure in practical applications is clearly curbed by this exponential complexity [24]; the practical application of the non-additive measure has long been plagued by this real-time computational barrier.

On the other hand, a traditional method such as the MLE-based one, which processes the entire sample set in one pass, can be inappropriate for real-time applications in which the observations arrive in a continuous stream and predictions must be made before all of the samples are acquired. For a large-scale dataset, it is preferable to use a sequential algorithm in which the data samples are considered one at a time and the model parameters are updated after each presentation.

3 Our Method

A general Choquet integral is defined by \(2^{n}-1\) coefficients. In order to reduce the computational complexity, we develop a Bayesian inference based Choquet integral (BIBCI) method.

We now introduce some standard results needed in this work; they can also be found in [3].

Given a marginal Gaussian distribution for \(\phi \) and a conditional Gaussian distribution for \(C_\mu \left( \phi \right) \) given \(\phi \) in the form

$$\begin{aligned} p\left( \phi \right) = \mathrm{\mathcal {N}}\left( {\phi \left| {\varphi ,P ^{ - 1}} \right. } \right) \end{aligned}$$
(11)
$$\begin{aligned} p\left( {\left. C_\mu \left( \phi \right) \right| \phi } \right) = \mathrm{\mathcal {N}}\left( {C_\mu \left( \phi \right) \left| {\mu \phi + b,Q^{ - 1}} \right. } \right) \end{aligned}$$
(12)

where \(\varphi , \mu \), and b are parameters governing the means, and P and Q are precision matrices. Then the marginal distribution of \(C_\mu \left( \phi \right) \) and the conditional distribution of \(\phi \) given \(C_\mu \left( \phi \right) \) are given by

$$\begin{aligned} p\left( C_\mu \left( \phi \right) \right) = \mathrm{\mathcal {N}}\left( {C_\mu \left( \phi \right) \left| {\mu \varphi + b,Q^{ - 1} + \mu P^{- 1}\mu ^T} \right. } \right) \end{aligned}$$
(13)
$$\begin{aligned} p\left( {\left. \phi \right| C_\mu \left( \phi \right) } \right) = \mathrm{\mathcal {N}}\left( {\phi \left| {\varSigma \left\{ {\mu ^TQ\left( {C_\mu \left( \phi \right) - b} \right) + P \varphi } \right\} } \right. ,\varSigma } \right) \end{aligned}$$
(14)

where

$$\begin{aligned} \varSigma = \left( {P + \mu ^TQ\mu } \right) ^{ - 1} \end{aligned}$$
(15)
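As a quick one-dimensional sanity check of (13), the following Monte-Carlo sketch (with arbitrary made-up parameters) confirms the marginal mean and variance:

```python
import numpy as np

# With p(phi) = N(varphi, 1/P) and p(C | phi) = N(mu*phi + b, 1/Q),
# eq. (13) says the marginal of C is N(mu*varphi + b, 1/Q + mu^2/P).
rng = np.random.default_rng(0)
varphi, P, mu, b, Q = 0.5, 4.0, 2.0, 1.0, 10.0
phi = rng.normal(varphi, 1 / np.sqrt(P), 100_000)
C = rng.normal(mu * phi + b, 1 / np.sqrt(Q))
print(C.mean(), C.var())   # ~2.0 and ~1.1, matching mu*varphi+b and 1/Q + mu^2/P
```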

Based on (3)–(7), we can derive the posterior distribution of \(\mu \) as

$$\begin{aligned} p\left( {\mu \left| t \right. } \right) = \mathrm{\mathcal {N}}\left( {\mu \left| {\varphi _N ,S_N } \right. } \right) \end{aligned}$$
(16)

where

$$\begin{aligned} \varphi _N = S_N \left( {S_0^{ - 1} \varphi _0 + \beta \varPhi ^Tt} \right) \end{aligned}$$
(17)
$$\begin{aligned} S_N^{ - 1} = S_0^{ - 1} + \beta \varPhi ^T\varPhi \end{aligned}$$
(18)

where \(\varphi _{N}\) is the posterior mean, \(S_{N}\) is the posterior covariance, and \(\beta \) is the noise precision. Because the posterior distribution is Gaussian, its mode coincides with its mean; thus the maximum a posteriori weight vector is simply given by \(\mu _{MAP} = \varphi _{N}\).

Suppose that we have already observed N data points, so that the posterior distribution over \(\mu \) is given. This posterior can be regarded as the prior for the next observation. Considering an additional data point \((\phi _{N + 1}, t_{N + 1})\), the resulting posterior distribution can be derived based on (8)–(10):

$$\begin{aligned} p\left( {\mu \left| {t_{N + 1} ,\phi _{N + 1} ,\varphi _N ,S_N } \right. } \right) = \mathrm{\mathcal {N}}\left( {\mu \left| {\varphi _{N + 1} ,S_{N + 1} } \right. } \right) \end{aligned}$$
(19)

with

$$\begin{aligned} S_{N + 1}^{ - 1} = S_N^{ - 1} + \beta \phi _{N + 1} \phi _{N + 1}^T \end{aligned}$$
(20)
$$\begin{aligned} \varphi _{N + 1} = S_{N + 1} \left( {S_N^{ - 1} \varphi _N + \beta \phi _{N + 1} t_{N + 1} } \right) \end{aligned}$$
(21)
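A minimal sketch of the recursion (19)–(21), assuming a zero prior mean \(\varphi _{0}=0\) and an isotropic prior precision \(S_0^{-1}=\alpha I\) (our choices; the paper does not fix them):

```python
import numpy as np

class SequentialBayesChoquet:
    """Sequential Bayesian update of the capacity vector, eqs. (19)-(21).

    A sketch, not the authors' code: we assume varphi_0 = 0 and
    S_0^{-1} = alpha * I, and keep the full parameter vector."""
    def __init__(self, dim, beta=25.0, alpha=1.0):
        self.beta = beta                     # noise precision
        self.S_inv = alpha * np.eye(dim)     # running S_N^{-1}
        self.b = np.zeros(dim)               # running S_N^{-1} varphi_N
    def update(self, phi_row, t):
        # eq. (20): S_{N+1}^{-1} = S_N^{-1} + beta * phi phi^T
        self.S_inv += self.beta * np.outer(phi_row, phi_row)
        # eq. (21): varphi_{N+1} = S_{N+1} (S_N^{-1} varphi_N + beta * phi * t)
        self.b += self.beta * phi_row * t
        return np.linalg.solve(self.S_inv, self.b)   # posterior mean = mu_MAP
```

In the proposed method only the n coefficients supported by each sample are actually touched, which is what keeps the per-sample cost polynomial; the sketch keeps the full vector for clarity.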

Since the non-additive measure in the Choquet model is defined over the powerset of \(\varOmega \), the reduction step basically aggregates the observed data of individual features into observations on sets. It is clear that there are only \(n\) \((n\ll 2^n)\) non-zero elements in each \(2^{n}\)-dimensional data vector. On the other hand, it is not at all necessary to use all \(2^{n}\) coefficients to build a Choquet integral model: in most cases, only a very small part of these coefficients is needed to compute the Choquet integral, and some coefficients may never be used at all in many applications. With the traditional Choquet integral method, however, the model cannot be built unless all the coefficients are determined.

We now develop a new approximation algorithm to reduce the computational complexity. Because most of the information about the relationships among features is contained in only a small fraction of all the coefficients, we can use the n non-zero elements of each sample to adjust the related parameters. For each sample, the parameters of the Choquet integral associated with its non-zero elements are updated through Bayesian inference. The basic idea behind the proposed method is that it is far easier to consider a sequence of conditional distributions than to obtain the marginal by integrating the joint density. Based on (19)–(21), we can update the parameters of the Choquet integral model by Bayesian inference. Although only n parameters are modified at each training step, the \(2^{n}\)-dimensional parameter vector tends to become reasonable after a certain number of training rounds. Meanwhile, the algorithm only updates the coefficients that the training data support, so it is possible to obtain a reasonable model with far fewer coefficients than \(2^{n}\). It then executes the following algorithm; a hypothetical usage sketch of the update loop is shown below.
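Combining the two sketches above (choquet_row from Sect. 2.2 and SequentialBayesChoquet), a hypothetical streaming loop looks as follows; `stream` is an assumed data source, not from the paper:

```python
# Hypothetical streaming loop for n = 3 features (dim = 2^3 - 1 = 7).
model = SequentialBayesChoquet(dim=7)
for phi, t in stream:
    mu_map = model.update(choquet_row(phi), t)   # posterior mean after each sample
```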

We again present the time complexity using pseudo-code analysis. The total cost can be written as

$$K\cdot O(n+n^{2}+n^{2}+n^{2}\log n+n^{2}+n+n+n+n)$$

Thus, the time complexity of our method is \(K\cdot O(n^{2}\log n)\).

figure c

Our method uses the idea of mapping distributions from the high-dimensional space to a lower-dimensional one in order to update the utility parameters.

Given an n-dimensional vector in the original non-additive space, it can be mapped to a \(2^{n}\)-dimensional linear space. If our method uses K samples to achieve convergence, and we define a learning rate \(\lambda _{p}\) under a given precision p, then the number of updates received by each element of the \(2^{n}\)-dimensional parameter vector is generally proportional to n:

$$\begin{aligned} \frac{nK}{2^n} \approx \lambda _p n \end{aligned}$$
(22)

Usually, our method needs K samples to achieve the precision p, where K can be estimated by:

$$\begin{aligned} K \approx 2^n\lambda _p \end{aligned}$$
(23)
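For a rough illustration (the value of \(\lambda _{p}\) is application dependent and must be measured): with n = 10 features and \(\lambda _{p} = 2\), (23) gives \(K \approx 2\cdot 2^{10} = 2048\) samples, and by (22) each of the \(2^{n}\) coefficients then receives about \(nK/2^{n} = 20\) updates, while the per-sample cost stays at \(O(n^{2}\log n)\) instead of \(O(2^{2n})\).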

4 An Application Example

For wireless multimedia networks, cross-layer design has been regarded as one of the most effective and efficient ways to provide quality of service (QoS). At the application layer, the prediction mode and the quantization parameter (QP) in video encoding are two critical design variables [H.264 2005]. At the physical layer, modulation and coding schemes (MCS) have been adopted to achieve a good tradeoff between transmission rate and transmission reliability. The peak signal-to-noise ratio (PSNR) indicates the performance of wireless multimedia networks, and the channel signal-to-interference-plus-noise ratio (SINR) reflects the channel conditions.

Using the non-additive measure, we can estimate the parameters \(\mu _{k}\) \((k=1,2,\ldots ,n)\); in fact, \(\mu _{k}\) indicates the importance of coefficient k. In previous works [19, 25, 35, 36], we have shown that the parameters of the model are stable under a given scenario, and that optimization using the non-additive measure for cross-layer design is effective when the channel condition is known.

Let us further consider this issue for real-world applications. In fact, it is very hard to acquire real-time channel condition information in wireless multimedia networks; in other words, the system is usually unaware of the state transitions. For example, we may only be able to obtain the coefficients of QP and MCS for a wireless multimedia network system. The optimal configuration of QP and MCS is largely determined by the SINR under the current scenario; however, the system is not able to observe the current SINR.

For the cross-layer design optimization of wireless multimedia networks, the three problems mentioned above (real-time constraints, exponential complexity, and a hidden channel state) all exist. The traditional non-additive measure method is not able to handle this multi-modal, real-time, high-complexity problem.

4.1 The Performance of the Application

The MLE method and our method were applied to the real-time transmission of an individual video bitstream across a multi-hop 802.11a/e wireless network. We first discuss the regression experiments and then the real-time application experiments.

Regression. In this part, the data set contained 8064 3-D samples, each pairing a PSNR value with the three variables (Mac_length, QP and AMC) used in the cross-layer optimization problem. Two algorithms, the MLE and Bayesian inference based Choquet integral (BIBCI) methods, were considered for the non-additive measure.

To evaluate the methods as regression algorithms, three runs of 10-fold cross-validation (30 simulations in total) were applied for the MLE method. As a real-time method, ours was applied directly to real-time prediction. The overall performance is evaluated by the mean absolute deviation (MAD) and the root mean square error (RMSE). We also conducted paired t-tests against the true PSNR values, which confirmed the statistical significance of each method.

The results are shown in Table 1. Under the MAD and RMSE criteria, the BIBCI method outperforms the MLE method. In terms of time consumption, the MLE method is better than the BIBCI method. This is an interesting outcome, since in the low-dimensional setting the matrix of variables can be computed directly, with no need for accumulated calculations. From the MAD and RMSE points of view, however, the best option is still our method.

Table 1. The experiment results in terms of the MAD errors and t-test

Real-Time Application. In this part, we simulate a real-time wireless multimedia dataset in which the channel condition is unstable and constantly changing. We draw samples to simulate the real-life situation where the channel condition changes from bad to good or vice versa. There are 1300 samples in this dataset, covering two periods of channel-condition change.

For better illustration, we process the dataset using kernel smoothing, as shown in Figs. 1 and 2. The channel condition is used only to determine the optimal PSNR; in a real-life situation, the channel condition is hidden and unknown.
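For reproducibility, smoothing of this kind can be sketched as a Nadaraya-Watson smoother over the sample index; the Gaussian kernel and the bandwidth below are our assumptions, since the paper does not specify its smoothing procedure:

```python
import numpy as np

def kernel_smooth(y, bandwidth=25.0):
    """Nadaraya-Watson smoothing of a 1-D series with a Gaussian kernel
    over the sample index (assumed parameters, for illustration only)."""
    x = np.arange(len(y), dtype=float)
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ np.asarray(y, dtype=float)) / w.sum(axis=1)
```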

Fig. 1.
figure 1

The kernel smoothing of the real-time wireless multimedia dataset

Fig. 2.
figure 2

The SNR of the real-time wireless multimedia dataset

For the MLE method, the utility of each variable can be calculated by traversing all the options. For our dataset, the utility of Mac_length equals 28.56, the utility of QP equals 35.73 and the utility of AMC equals 38.40, so the distortion performance is more sensitive to AMC than to the other variables. In this test, the system therefore optimizes the AMC configuration all the time under the MLE method.

For our method, we calculate the utilities for each slice and consider the variable with the largest utility value to be the most important one. For each slice, the system always optimizes the most important variable. In order to avoid local optima, we update the utilities with 8 random samples before dealing with each slice.

The results are shown in Fig. 3. We can see clearly that the performance of our method is closer to the optimal than that of the traditional Choquet integral. We can also see that the accuracy over the first 200 samples is not very satisfactory; this phenomenon does not recur in the second period, which means that our algorithm becomes stable within 9 × 200 updates on this dataset.

Fig. 3.
figure 3

The simulation of the real-life situation. The performance of our method is closer to the optimal than that of the traditional Choquet integral.

The performances of the MLE method and ours are shown in Table 2; larger values indicate a better multimedia system. To illustrate the quality of the reconstructed videos, we show some sample video frames produced under the optimal configuration, the MLE-based method and our method, respectively. As shown in Table 3, the performance achieved by our method is better than that of the MLE-based method.

Table 2. The mean performance of MLE and BIBCI methods
Table 3. Some sample video frames dealing with the optimal configuration, the MLE-based method and BIBCI method.

5 Conclusion

In this paper, we have proposed a novel polynomial-time method to solve the parameter estimation problem for the Choquet integral. The proposed approach trains the parameters of a Choquet integral without requiring the entire input sample set at once. The computational complexity of the proposed algorithm is low: the number of parameters updated per sample is linear rather than exponential in the number of inputs, and the overall complexity for the non-additive measure is reduced from \(O((n+K)\cdot 2^{2n})\) to \(K\cdot O(n^{2}\log n)\). Notably, the semantic information of the \(2^{n}\) variables is not lost. The method can be applied to real-time applications; as a real-world application, the cross-layer design of wireless multimedia networks was optimized by the proposed method. The experiments show that the performance achieved by this method is better than that of traditional methods.