Robust M-Shaped Error-Weighted Algorithms for Censored Regression

In many sensor systems, the range of the sensor response is limited by the saturation characteristics of the sensor; that is, values exceeding the sensor response range are not observed. Using traditional adaptive algorithms to identify such systems may lead to performance degradation. To address this problem, censored regression algorithms have been proposed. However, these algorithms may fail when mixed sub-Gaussian and super-Gaussian/impulsive noises occur. To overcome this drawback, an algorithm based on a family of robust M-shaped (FRMS) functions for censored regression (CR-FRMS) is proposed in this paper. When the system to be identified exhibits a certain degree of sparsity, the CR-FRMS algorithm cannot fully exploit the characteristics of the sparse system. Therefore, a proportionate FRMS algorithm with an l_0-norm constraint for censored regression (l_0-CRPFRMS) is also proposed. Simulations using Gaussian white noise as the input signal and non-Gaussian mixed noise as the background noise show that the proposed algorithms perform better than existing algorithms.


Introduction
Linear regression models are widely used in signal processing, communication, and many other fields, under the assumption that the observed data are fully available. To solve this class of problems, researchers have proposed a large number of adaptive algorithms, such as the least mean square (LMS) algorithm [13], the normalized LMS (NLMS) algorithm, the affine projection algorithm (APA) [14,30,33,42], and so on.
Unfortunately, the requirement of a fully observed linear regression may not be met in many practical applications. In general, output data whose values exceed the limit of the recording device cannot be observed [3,15,29]; in other words, only output data whose values lie in a certain range are available. This situation occurs in economics [4], statistics [11,27], engineering applications [28,26], and medical research [7,31]. In some systems, due to the saturation characteristics of sensors [1,9,43], complete data cannot be collected efficiently. For example, in microphone array signal processing [2,16], when the amplitude of the speech signal exceeds a certain threshold of the microphones, it is cut off, and the waveform may be distorted to be flat because the positive and negative peaks exceeding the threshold are lost, or censored. Actually, censored regression can be seen as a nonlinear regression model consisting of a saturation nonlinearity and a linear system [23,38]. Since the output data of censored regression may lose significant information, using traditional algorithms to identify this type of model may result in biased and wrong estimates [22]. Recently, in an attempt to deal with the censored regression problem, numerous algorithms have been proposed, such as maximum likelihood (ML) methods [8], the two-step estimator [15], and least absolute deviation [29]. To solve online censored regression problems, Liu et al. [24] proposed the adaptive Heckman two-step algorithm (TSA), which significantly outperforms conventional adaptive algorithms when the output data are censored.
As is well known, most adaptive algorithms are designed for Gaussian noise environments. However, real-world signals often exhibit non-Gaussian properties. For example, in beamforming, sub-Gaussian (light-tailed) signals are frequently encountered [17]. In some active noise control (ANC) applications, mechanical friction, vibration noise, and speech signals are super-Gaussian/impulsive (heavy-tailed) [39]. In blind source separation (BSS), adaptive receivers with multiple antennas, and image denoising, the signal may be a mixture of sub-Gaussian and super-Gaussian/impulsive signals [18-20,32,35,40,46].
In an attempt to improve the robustness of adaptive algorithms in mixed non-Gaussian background noise environments, a family of robust M-shaped (FRMS) functions has been applied to adaptive algorithms [45].
Furthermore, when the system exhibits a certain degree of sparsity, the aforementioned algorithms cannot exploit this property. To deal with this issue, proportionate algorithms [10] and zero-attraction algorithms based on various norms, such as the l_0-norm, l_1-norm, and l_p-norm [5,12,25,34,37,39,46], have been applied to this type of system. In fact, the ideal sparsity measure is the l_0-norm, which counts the number of nonzero components. Therefore, the l_0-norm constraint is adopted in this paper.
The contributions of this paper are as follows: (i) For the first time, a family of robust M-shaped algorithms for mixed non-Gaussian background noises under the censored regression model is proposed. The algorithm not only effectively compensates the output signal but also handles the effects of sub-Gaussian and super-Gaussian/impulsive noises. (ii) To exploit the characteristics of sparse systems, the l_0-norm proportionate FRMS algorithm under the censored regression model (l_0-CRPFRMS) is also proposed for the first time. (iii) Simulation examples are given to demonstrate the performance of the proposed algorithms in non-Gaussian background noise environments.

Problem Formulation
Consider the following linear regression model
d̄_n = u_n^T w^o + η_n + ξ_n, (1)
where w^o is the unknown column vector of length L, u_n is the input vector, η_n denotes the background noise with zero mean and variance σ_n^2, and ξ_n represents the impulsive noise with zero mean. In this paper, sub-Gaussian and super-Gaussian noises are involved. When the data d̄_n and u_n are completely observed, the unknown vector w^o can be identified easily using traditional methods. However, d̄_n may not be completely observed in some practical applications. The censored output d_n can be formulated as
d_n = (d̄_n)_+, (2)
where (d̄_n)_+ ≜ max{0, d̄_n}.
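The censoring mechanism in (1)-(2) can be sketched as follows, assuming the standard linear model of Eq. (1) with the impulsive term omitted for brevity; all names and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

L = 8                               # number of taps (as in the experiments)
w_o = rng.standard_normal(L)        # unknown system w^o
N = 1000

U = rng.standard_normal((N, L))     # input regressors u_n
eta = 0.1 * rng.standard_normal(N)  # zero-mean background noise eta_n
d_bar = U @ w_o + eta               # uncensored output, Eq. (1) (xi_n omitted)

d = np.maximum(0.0, d_bar)          # left-censored observations, Eq. (2)

# negative outputs are lost; only d_n = 0 is recorded for them
censored_fraction = float(np.mean(d_bar < 0))
```

Roughly half the samples are censored here, since the noiseless output is symmetric about zero; this is exactly the information loss that biases traditional adaptive algorithms.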

Remark 1
In this paper, only left-censoring to 0 is applied; however, the right-censored and both-sides-censored cases can be handled by the algorithm in a similar way. In particular, if the output d̄_n is left-censored or right-censored to a constant c, the censored output d_n can be expressed as
d_n = max{c, d̄_n}
and
d_n = min{c, d̄_n},
respectively.

Review of FRMS Algorithm
In [23], the authors divide the existing error-nonlinearity adaptive algorithms based on LMS into three categories: V-shaped, Λ-shaped, and M-shaped algorithms. After comparing the advantages and disadvantages of each category, the FRMS function was proposed, with a weighted error function parameterized by ς > 0. For instance, when ς → 0, it reduces to a Λ-shaped algorithm (which is more suitable for super-Gaussian noise than for sub-Gaussian noise). When ς → ∞, it becomes a V-shaped algorithm (which is more suitable for sub-Gaussian noise). In the proposed robust M-shaped function, the denominator is one order higher than the numerator, so the weight decays for large errors. Using gradient descent, this algorithm essentially minimizes a stochastic cost function which is positive semi-definite for ς > 0 and p > 0. Thus, the robust M-shaped algorithm does not suffer from the local-minimum problem.
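The limiting behavior described above can be sketched numerically. The exact FRMS weighting is given in the cited reference; the function below is only an illustrative stand-in with the stated properties (numerator of order p, denominator one order higher), so `m_shaped_weight` and its form are assumptions, not the paper's definition.

```python
import numpy as np

def m_shaped_weight(e, p=2.0, zeta=1.0):
    # Hypothetical M-shaped error weighting: numerator of order p, denominator
    # one order higher, so the weight rises near zero and redescends for large
    # |e| (robust to impulses).  Large zeta pushes the peak outward (V-shaped
    # limit); small zeta pulls it toward zero (Lambda-shaped limit).
    return np.sign(e) * np.abs(e) ** p / (zeta + np.abs(e) ** (p + 1))

e = np.linspace(-10.0, 10.0, 2001)
psi = m_shaped_weight(e, p=2.0, zeta=1.0)
# the weight applied to a huge (impulsive) error is small, which is the
# property that gives M-shaped algorithms their robustness
```

Because the weight redescends, a single impulsive sample contributes almost nothing to the update, unlike the linear weighting of LMS.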

Proposed CR-FRMS Algorithm
Since the observations in this paper are assumed to be censored, the FRMS algorithm would obtain a biased estimate; this effect is known as sample selection bias [34,44]. In other words, when d̄_n < 0, the data d̄_n is missing, which leads to the bias E[d_n | u_n] ≠ u_n^T w^o. To compensate for the bias, it is first necessary to correct the sample selection bias of FRMS under censored observations, inspired by the Heckman two-stage approach [15]. Due to the left-censored property of d_n, only positive values of d̄_n can be correctly observed. Recalling (1) and noting that the background noise η_n and the impulsive noise ξ_n are both zero-mean, the expectation of d_n under the condition d_n > 0 can be expressed accordingly. Since the impulsive noise occurs with very low probability, its contribution to this conditional expectation can reasonably be neglected. Before calculating the last term of (5), the following lemma is introduced.
Proof See "Appendix" for details.
Using the lemma, we obtain the conditional expectation with ϕ(·) and Φ(·) given in Table 1; in addition, the vector α is defined accordingly. Then, using (9) and probability theory yields (12), where the second equality follows from the fact that the probability of d_n > 0 equals Φ(u_n^T α), i.e., Pr(d_n > 0) = Φ(u_n^T α). According to (12), the censored regression model (2) can be expressed as (13), where v_n is a zero-mean random variable. Since an algorithm based on the MSE criterion cannot estimate w^o correctly, a cost function based on FRMS is adopted. Obviously, to estimate w^o, estimates of α and σ_n must also be available. In the sequel, an indicator variable a_n is introduced to estimate α, with Pr(a_n = 0) = Φ(−u_n^T α).
Then, the following optimization problem is considered to estimate α [22], where Γ_n(α) = log(Pr(d_n | u_n, α)) (22). In the sequel, using the steepest-ascent principle yields the update (24) [22]. Next, the estimation of σ_n is considered: applying the descent method to the cost function f(ε_n(w, σ_n, α)) with respect to σ_{n,i−1} gives the estimate σ̂_n. Similarly, the weight update formula can be obtained by the gradient descent method.
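The first-stage estimation can be sketched as follows. This is a minimal illustration of the Heckman-style ingredients only: a stochastic probit update for α driven by the indicator a_n, and the inverse Mills ratio correction ϕ/Φ. The function names, step size, and numerical safeguards are assumptions, not the paper's exact updates (24)-(25).

```python
import numpy as np
from math import erf, exp, pi, sqrt

def Phi(z):  # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):  # standard normal PDF
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def probit_step(alpha, u, a, mu=0.05, eps=1e-8):
    # One stochastic-ascent step on the probit log-likelihood
    # log Pr(a_n | u_n, alpha), with a_n = 1{d_n > 0}.
    z = float(u @ alpha)
    grad = phi(z) * (a - Phi(z)) / max(Phi(z) * (1.0 - Phi(z)), eps) * u
    return alpha + mu * grad

def mills_correction(u, alpha, sigma):
    # Bias-compensation term sigma * phi(z)/Phi(z) (inverse Mills ratio),
    # added to the regression for samples with d_n > 0.
    z = float(u @ alpha)
    return sigma * phi(z) / max(Phi(z), 1e-12)

alpha = np.zeros(2)
u = np.array([1.0, 0.0])
alpha = probit_step(alpha, u, a=1)   # an uncensored sample pushes u^T alpha up
lam = mills_correction(u, alpha, sigma=1.0)
```

Once α has converged, the corrected regression target includes the Mills-ratio term, which is what removes the sample selection bias before the robust weight update runs.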

Proposed l_0-CRPFRMS Algorithm
In many practical applications, such as digital TV transmission channels and echo paths, the systems to be identified are sparse. However, the algorithm in Sect. 3.1 does not make full use of the characteristics of a sparse system. Therefore, this section proposes an l_0-norm proportionate FRMS algorithm based on the censored regression model (l_0-CRPFRMS). A sparse system is one whose impulse response contains many near-zero coefficients and few large ones. In [36], the proportionate idea was first proposed in the proportionate NLMS (PNLMS) algorithm, whose iteration is given in (31)-(34), where x(·) is the input signal, ŷ(·) is the output signal, y(·) is the desired signal, and ĥ_n(·) is the tap-weight vector. The parameters ρ and δ affect small-signal regularization: in (32), ρ prevents ĥ_n(k+1) from stalling when it is much smaller than the largest coefficient, and δ regularizes the update when all coefficients are zero at initialization. It can be seen from (31)-(34) that when a tap weight is closer to zero, the term g_n(k)/ḡ(k) becomes smaller, whereas when a tap weight is far from zero, it gains more energy in the iteration. In this way, the aim of the proportionate algorithm is achieved: the convergence speed is accelerated without changing the steady-state mean square deviation (MSD).
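The proportionate gain described above can be sketched with the standard PNLMS rule; `pnlms_gain` and the default ρ and δ values are illustrative, not necessarily the exact constants of (31)-(34).

```python
import numpy as np

def pnlms_gain(w, rho=5.0 / 64, delta=0.01):
    # Standard PNLMS proportionate gains: large taps receive proportionally
    # larger step sizes.  rho floors the gain of near-zero taps so they do
    # not stall, and delta regularizes the all-zero initialization.
    gamma = np.maximum(rho * max(delta, float(np.max(np.abs(w)))), np.abs(w))
    return gamma / np.mean(gamma)   # diagonal of G_n, normalized to mean 1

w = np.zeros(64)
w[[3, 40]] = [1.0, -0.5]            # a sparse estimate with two active taps
g = pnlms_gain(w)
# the two active taps get much larger gains than the zero taps
```

Normalizing the gains to average 1 keeps the overall adaptation energy comparable to the non-proportionate update, so only its distribution across taps changes.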
To further accelerate convergence for sparse systems, this section also adds an l_0-norm constraint [12,34] on top of the proportionate scheme; that is, a penalty γ||w||_0 is added to the original cost function, where γ > 0 is a factor balancing the new penalty against the estimation error. In [12], to reduce the computational complexity, the author first approximates ||w_n||_0 by Σ_{i=0}^{L−1}(1 − e^{−β|w_n(i)|}), then derives the gradient-descent update of the cost function, applies a first-order Taylor expansion to the zero-attracting term, and finally obtains a weight update of lower computational complexity:
w_n = w_{n−1} + gradient correction + zero attraction, (38)
where the zero-attraction term is κ f_β(w_{n−1}), κ = μγ is a positive constant, and f_β(·) is defined in (39). The parameter β is set to 5 in this paper. From (39), it can be seen that when a coefficient lies in the range (−1/β, 1/β), it is constantly attracted toward zero, while coefficients outside this range receive no additional attraction. This improves the convergence speed of the coefficients close to zero, and hence the overall convergence speed.
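The zero-attraction term can be sketched using the standard first-order approximation from the l_0-LMS literature; the exact form of f_β in (39) is assumed to match this construction.

```python
import numpy as np

def f_beta(w, beta=5.0):
    # First-order approximation of the l0-penalty gradient (l0-LMS style).
    # Nonzero only inside (-1/beta, 1/beta); the update subtracts kappa *
    # f_beta(w), pulling small coefficients toward zero and leaving large
    # coefficients untouched.
    out = beta * np.sign(w) * (1.0 - beta * np.abs(w))
    out[np.abs(w) >= 1.0 / beta] = 0.0
    return out

w = np.array([-0.5, -0.1, -0.01, 0.0, 0.01, 0.1, 0.5])
attraction = f_beta(w)
# coefficients with |w| >= 1/beta = 0.2 receive no attraction
```

With β = 5, only coefficients smaller in magnitude than 0.2 are attracted, which is why the steady-state MSD of the large taps is unaffected.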
As in Sect. 3.1, the output signal is first processed to obtain Eq. (13), and the error can then be expressed as in Eq. (16). Next, α is estimated via the steepest-ascent update of Eq. (24), and the parameter σ_n is then estimated from the cost function of l_0-CRPFRMS. Similarly, the weight update formula can be obtained, where
w_n = (w_n(0), . . . , w_n(L − 1))^T (42)
and the diagonal elements of G_{n−1} are calculated following the proportionate rule with ρ = 5/L.

The Convergence Analysis of the l_0-CRPFRMS Algorithm
The steady-state performance of the l_0-CRPFRMS algorithm is analyzed in this section. To make the analysis tractable, the following assumptions, commonly used in the analysis of adaptive filtering algorithms [3,13], are made.

Assumption 1
The noise η_n is independent of the input signal. The impulsive noise ξ_n does not occur.

Assumption 2
The weight error vector ŵ_n ≜ w^o − w_n is independent of the input signal.
Then, the Taylor expansion of Γ_n'(α) at the point α̂_{n−1} is given by (46), where Z_n = Γ_n''(α̂_{n−1}). Letting α̃_{n−1} = α − α̂_{n−1}, subtracting α from both sides of (46) and using (47) yields (48). Using Assumptions 1 and 2 and taking the expectation of both sides of (48) leads to (49). According to [24], E[Z_n] is the Hessian matrix of Γ_n(α̂_{n−1}) and is negative definite. In the sequel, taking the expectation of Γ_n'(α) with respect to v_n only results in (50). Using (50) and Assumptions 1 and 2 yields (52) (see [24] for details). Substituting (52) into (50) yields (53). To guarantee stability in the mean sense, the resulting iteration matrix should be stable. Note that a negative-definite matrix has negative eigenvalues, and hence the step size should be selected according to (54). Under this condition, the estimation error α̃_n converges to zero in the mean; in other words, Eq. (24) yields an unbiased estimate of α if the proposed algorithm is stable. Then, (40) and (41) can be rewritten as (55) and (56), respectively. Combining (55) and (56) gives (57). The desired signal d_n can then be written as (60), and using (60), Eq. (16) implies (64). Inserting (64) into (57) yields (65), with the associated quantities defined therein. Taking the expectation of both sides of (65) gives (67). To ensure convergence, each eigenvalue of the iteration matrix in (67) must lie in (−1, 1); therefore, the range of μ is given by (69). Combining (53) and (69), the overall stability condition on the step size is obtained.

Verify the Superiority of CR-FRMS
Firstly, simulations in the context of system identification are carried out to illustrate the advantage of the proposed algorithm. The randomly generated unknown system has 8 taps, and the adaptive filter is assumed to have the same length. Figure 1a, b depicts the performance of the CR-FRMS, FRMS [24], TSA [11], and LMS [13] algorithms under two background noise distributions, where p = 2 and κ = 8 × 10^{−6}. The step sizes of the four algorithms are set to 0.03 in both mixed noise environments. Figure 1a shows the simulation results in a mixture of sub-Gaussian and super-Gaussian noises, and Fig. 1b shows the results in a mixture of sub-Gaussian noise and impulsive noise [the uniform and Laplace distributions are independent and identically distributed (i.i.d.) over time with zero mean]. The Bernoulli-Gaussian (BG) process [41] is frequently used to model the impulsive noise ξ_n, formulated as ξ_n = τ_n υ_n, where τ_n is a Bernoulli process with Pr{τ_n = 1} = 0.01 and υ_n is an i.i.d. zero-mean Gaussian sequence with variance σ^2 = 10000. For a fair comparison, the parameters are set so that the algorithms have comparable initial convergence speeds. As observed, the proposed CR-FRMS achieves a lower steady-state MSD than the other algorithms.
The second experiment tests the convergence performance of CR-FRMS with different step sizes. Figure 2a, b illustrates the MSD learning curves for different step sizes with Gaussian white noise input. As expected, with a fixed step size, the CR-FRMS algorithm exhibits a trade-off between the steady-state MSD and the convergence rate: a small step size yields a lower steady-state error but slows down convergence, while a large step size within the stable range provides faster convergence at the cost of a larger steady-state error.
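The Bernoulli-Gaussian impulsive noise used in this experiment can be generated as follows (a minimal sketch; the seed and sample count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

def bernoulli_gaussian(N, p=0.01, var=10000.0, rng=rng):
    # xi_n = tau_n * v_n: tau_n is a Bernoulli occurrence process with
    # Pr{tau_n = 1} = 0.01 and v_n is i.i.d. zero-mean Gaussian with
    # variance 10000, as specified in the experiment.
    tau = rng.random(N) < p
    v = rng.normal(0.0, np.sqrt(var), N)
    return tau * v

xi = bernoulli_gaussian(100000)
impulse_rate = float(np.mean(xi != 0))   # close to p = 0.01
```

Most samples are exactly zero, but the rare impulses have a standard deviation of 100, which is what destabilizes MSE-based algorithms such as LMS.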
The third experiment tests the convergence performance of CR-FRMS with different values of the parameter p. Suppose the unknown system has 8 coefficients. The input signal is white Gaussian with unit variance, and the filter length is 8. The step size is fixed to 3 × 10^{−2}, while p is set to different values. After one hundred independent runs, it can be seen from Fig. 3a that, in the case of uniform and Laplace distributions with variances 1 and 9, respectively, a larger p yields a smaller MSD, although the effect is not significant. However, as can be seen from Fig. 3b, in the case of mixed background noise consisting of impulsive noise and sub-Gaussian noise, the choice of p has a more noticeable effect.

Verify the Superiority of l_0-CRPFRMS
The proposed l_0-CRPFRMS is compared with the CR-FRMS, PNLMS [9], and l_0-LMS [5] algorithms in two different background noise environments. The first is a mixed noise composed of a uniform distribution with variance σ_u = 2.5 × 10^{−5} and a Laplace distribution with variance σ_L = 0.01. The second is a mixed noise composed of sub-Gaussian noise (uniform distribution) with a signal-to-noise ratio of 30 dB and impulsive noise. In both environments, p = 2. Suppose the unknown system has 64 coefficients, two of which are nonzero (their locations and values are randomly selected). The input signal is white Gaussian with unit variance, and the filter length is 64. After fifty independent runs, the MSD curves are shown in Fig. 4a, b. It is evident that the l_0-CRPFRMS algorithm converges faster than its predecessor.
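The sparse test system of this experiment can be set up as follows (a minimal sketch; `msd_db` is an illustrative implementation of the MSD metric, and the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

L = 64
w_o = np.zeros(L)
active = rng.choice(L, size=2, replace=False)  # two nonzero taps: locations
w_o[active] = rng.standard_normal(2)           # and values chosen at random

def msd_db(w_hat, w_true=w_o):
    # Mean square deviation in dB, the metric plotted in the MSD curves.
    # The tiny offset only guards the log for a perfect estimate.
    return 10.0 * np.log10(float(np.sum((w_hat - w_true) ** 2)) + 1e-300)
```

With 62 of 64 coefficients exactly zero, this is precisely the regime where the proportionate gains and the zero attractor of l_0-CRPFRMS pay off.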
It is also necessary to study the effect of different values of the parameter κ experimentally. Theoretically, a larger κ pulls the weights toward zero faster, which accelerates convergence but increases the MSD of the algorithm. In this paper, the range of κ is relatively small. Therefore, as can be seen from Fig. 5a, the convergence speed is almost the same for different κ, but the MSD differs noticeably: the smaller the κ, the smaller the MSD.

Conclusion
In this paper, two algorithms based on censored regression, namely CR-FRMS and l_0-CRPFRMS, are proposed for two different mixed background noises. CR-FRMS shows superiority over LMS, TSA, and FRMS in terms of MSD. Meanwhile, when the unknown system is sparse, l_0-CRPFRMS exhibits a faster convergence speed than CR-FRMS without changing the steady-state MSD. The two algorithms offer different advantages: CR-FRMS has lower computational complexity, while l_0-CRPFRMS converges faster. Therefore, in practice, the algorithm should be chosen according to the actual requirements.