Analysis of the LMS and NLMS algorithms using the misalignment norm

This work describes the convergence of the misalignment square norm (MSN) of the NLMS and LMS algorithms. It is shown that the decrease of the MSN is nearly proportional to the mean square error (MSE). This yields simple expressions for the steady-state MSE, a limit on the amount of time during which the MSE takes large values, and a curve that bounds the MSE of the LMS at any given time, independently of the properties of the input and background-noise signals. Finally, it is also shown that many of the complications in the analysis of the LMS and NLMS algorithms stem from fluctuations of the input vector square norm; the proposed analysis becomes very simple for long filters or constant-power signals.


Introduction
The theory on the least mean square (LMS) algorithm [1,2] dates back to the 1960s [3,4], but comprehensive analyses of the algorithm using the independence assumption seem to have been published only in the 1980s [5,6]. The independence assumption [1,7] states that successive input data vectors are statistically independent, or that the filter weights are independent of the input data vectors. It has been shown that, under this assumption and for moderate step sizes, the theoretical results agree very well with experiments. The convergence of the NLMS was also studied at that time [8]. Later, results were obtained for non-Gaussian signals [9], and an exact analysis was published [10]. The exact analysis uses mathematical software to obtain complex expressions for the convergence of the algorithm, which agree with experiments even for large step sizes. The work in [11] shows that the LMS and NLMS algorithms are H∞ optimal and bound the power of the error signal given the disturbances.

The LMS and NLMS algorithms
Given an input signal x(n) and a desired response signal d(n), the LMS algorithm can be used to adapt a size-N finite impulse response filter w(n) = [w_0, w_1, ..., w_{N−1}]^T to minimize the error between the filter output and the desired response,

e(n) = d(n) − w^T(n)x(n),

where x(n) = [x(n), x(n−1), ..., x(n−N+1)]^T. The LMS algorithm is then described by [1,2]

w(n+1) = w(n) + μ e(n) x(n),

where μ is the step size and e(n) is the error signal. Let w_o be the filter that minimizes the mean square error (MSE) E[e²(n)], given by the Wiener filter [1]; then e(n) can also be written as

e(n) = v(n) − (w(n) − w_o)^T x(n),

where v(n) is the background noise, uncorrelated with x(n). In the NLMS algorithm [1,2], μ is replaced by μ/(x^T(n)x(n) + q), where q is a small regularization factor [23], resulting in

w(n+1) = w(n) + (μ/(x^T(n)x(n) + q)) e(n) x(n).

The NLMS algorithm is stable for μ < 2 [1]. In the following analysis, q is taken to be small or zero.
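As an illustration, the two updates above can be sketched in a few lines of NumPy. The signal setup below (filter length, noise level, step size, and variable names) is illustrative only, not taken from the paper's experiments:

```python
import numpy as np

def lms_step(w, x, d, mu):
    """One LMS iteration: w(n+1) = w(n) + mu * e(n) * x(n)."""
    e = d - w @ x                     # error between desired response and filter output
    return w + mu * e * x, e

def nlms_step(w, x, d, mu, q=1e-8):
    """One NLMS iteration: step size normalized by x^T x + q."""
    e = d - w @ x
    return w + (mu / (x @ x + q)) * e * x, e

# Identify an unknown filter w_o from noisy observations.
rng = np.random.default_rng(0)
N = 10
w_o = rng.standard_normal(N)          # "true" (Wiener) filter
w = np.zeros(N)
for _ in range(5000):
    x = rng.standard_normal(N)
    d = w_o @ x + 0.01 * rng.standard_normal()   # background noise v(n)
    w, e = nlms_step(w, x, d, mu=0.5)
```

After adaptation, w is close to w_o, with a residual misalignment set by the noise power and the step size.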

Variation of the misalignment square norm of the LMS
Let the misalignment be the vector formed by the errors in the filter weights, ε(n) = w(n) − w_o. Then, for both the LMS and NLMS algorithms,

ε(n+1) = ε(n) + μ_x(n) e(n) x(n),

where μ_x(n) = μ for the LMS and μ_x(n) = μ/(x^T(n)x(n)) for the NLMS. Taking the square norm of both sides and letting P_T(n) = x^T(n)x(n) results in

‖ε(n+1)‖² − ‖ε(n)‖² = 2μ_x(n)e(n)x^T(n)ε(n) + μ_x²(n)e²(n)P_T(n).   (7)

This equation, although simple, is the main result of this work and will be used to obtain the other results. Making e(n) = v(n) − x^T(n)ε(n), it can also be written as

‖ε(n+1)‖² − ‖ε(n)‖² = −μ_x(n)[e²(n)(2 − μ_x(n)P_T(n)) − 2e(n)v(n)].   (8)

From this equation, it is possible to obtain a first result: it is not possible to guarantee the stability of the LMS and NLMS algorithms for every input signal. In fact, if x(n) is selected orthogonal to ε(n), so that x^T(n)ε(n) = 0 and e(n) = v(n), both algorithms diverge, since ‖ε(n)‖² will in general increase. Of course, this condition is very unlikely to happen in practice since, generally, x(n) is independent of ε(n).
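The per-sample identity ‖ε(n+1)‖² − ‖ε(n)‖² = −μ_x(n)[e²(n)(2 − μ_x(n)P_T(n)) − 2e(n)v(n)] is exact and requires no statistical assumptions, so it can be checked numerically for a single update. The sketch below uses arbitrary illustrative values and verifies it for one LMS step (μ_x = μ):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
w_o = rng.standard_normal(N)          # optimal (Wiener) filter
w = rng.standard_normal(N)            # arbitrary current weights
x = rng.standard_normal(N)
v = 0.1 * rng.standard_normal()       # background noise sample
d = w_o @ x + v
e = d - w @ x                         # e(n) = v(n) - x^T(n) eps(n)
P_T = x @ x                           # input vector square norm

eps = w - w_o                         # misalignment eps(n)
mu = 0.05                             # LMS step, so mu_x = mu
eps_next = (w + mu * e * x) - w_o     # misalignment after one LMS update
lhs = eps_next @ eps_next - eps @ eps
rhs = -mu * (e**2 * (2 - mu * P_T) - 2 * e * v)
# lhs and rhs agree to machine precision.
```

The same check works for the NLMS by replacing mu with mu / P_T.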

LMS with Gaussian signals
The following assumptions are used in this section: 1. all signals are Gaussian; 2. x(n) is a stationary process; 3. x(n) is independent of ε(n), which corresponds to the independence assumption of the LMS [1,5,7]. Let δ(n) = E[‖ε(n)‖²], K(n) = E[ε(n)ε^T(n)], R = E[x(n)x^T(n)], q_v = E[v²(n)] and let q_ε(n) = tr{K(n)R} be the excess MSE. Taking expected values on both sides of (8) and using the Gaussian moment factoring theorem [1], noting that E[e²(n)] = q_v + q_ε(n) and P̄_T = E[P_T(n)] = tr{R}, results in

δ(n+1) − δ(n) = −μ(2 − μβ(n)P̄_T)q_ε(n) + μ²q_v P̄_T,   (12)

where

β(n) = 1 + 2 tr{K(n)R²}/(tr{K(n)R} P̄_T).

For the case of white noise, R = q_x I, this results in β = 1 + 2/N. Also, note that β(n) is always less than three.
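The white-noise value β = 1 + 2/N can be checked by Monte Carlo. The sketch below uses an illustrative setup (q_x = 1, a fixed misalignment vector, no background noise, so that E[e²(n)P_T(n)] = βE[e²(n)]P̄_T with β = 1 + 2/N) and estimates β from sample averages:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 10, 200_000
eps = rng.standard_normal(N)          # a fixed misalignment vector (q_v = 0)
X = rng.standard_normal((K, N))       # white Gaussian input, q_x = 1
e = X @ eps                           # excess error x^T(n) eps
P_T = np.sum(X * X, axis=1)           # P_T(n) = ||x(n)||^2 per sample

P_bar = N                             # E[P_T(n)] = N * q_x
beta = np.mean(e**2 * P_T) / (np.mean(e**2) * P_bar)
# Theory predicts beta = 1 + 2/N = 1.2 for N = 10.
```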

Steady-state mean square error and stability
It is known from LMS theory [1] that the convergence of δ(n) does not exhibit oscillations. So, at steady state, δ(n+1) − δ(n) = 0, and from (12) it results that

q_ε(∞) = μ q_v P̄_T/(2 − μβP̄_T);

this result agrees with the results from [5,6] for the Gaussian white-noise case. Since, for stability, δ(n+1) − δ(n) ≤ 0 is required whenever q_ε(n) is large, the following limit for the step results:

μ < 2/(βP̄_T).
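The steady-state expression q_ε(∞) = μq_vP̄_T/(2 − μβP̄_T) can be sanity-checked by simulating the LMS with white Gaussian input. The parameters below are illustrative, and the 30% tolerance in the final comparison accounts for Monte Carlo noise and the independence-assumption approximation:

```python
import numpy as np

rng = np.random.default_rng(3)
N, q_v, mu = 10, 1e-2, 0.01
P_bar = N                             # tr{R} for white input with q_x = 1
beta = 1 + 2 / N                      # white-Gaussian value of beta
q_eps_theory = mu * q_v * P_bar / (2 - mu * beta * P_bar)

w_o = rng.standard_normal(N)
w = np.zeros(N)
acc, count = 0.0, 0
for n in range(100_000):
    x = rng.standard_normal(N)
    ea = (w_o - w) @ x                # a priori excess error, -x^T eps
    e = ea + np.sqrt(q_v) * rng.standard_normal()
    w = w + mu * e * x                # LMS update
    if n >= 20_000:                   # discard the transient
        acc += ea * ea
        count += 1
q_eps_sim = acc / count               # simulated excess MSE
```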

Limit on convergence time
Given a value q̄ for the excess noise power, one can bound the amount of time during which q_ε(n) ≥ q̄ by the time it would take δ(n) to become zero using (12). Namely, while q_ε(n) ≥ q̄,

δ(n+1) − δ(n) ≤ −μ(2 − μβ_M P̄_T)q̄ + μ²q_v P̄_T,

where β_M is the maximal value of β(n), resulting in

n_max ≤ δ(0)/(μ[(2 − μβ_M P̄_T)q̄ − μq_v P̄_T]),

and the maximum excess noise at time n is obtained by merely rewriting the same equation as

q̄_max(n) = (δ(0)/(μn) + μq_v P̄_T)/(2 − μβ_M P̄_T).

NLMS with Gaussian signals
This section analyzes the NLMS using the same assumptions as in the previous section for the LMS. For the NLMS, taking expected values on both sides of (8) results in

δ(n+1) − δ(n) = −(μ/P̄_T)[(2 − μ)γ_0(n)q_ε(n) − μγ_1 q_v],

where γ_0(n) is such that

E[(e²(n) − e(n)v(n))/P_T(n)] = γ_0(n)q_ε(n)/P̄_T,

and γ_1 is such that

E[v²(n)/P_T(n)] = γ_1 q_v/P̄_T.

Closed-form expressions for γ_0(n) and γ_1 can be obtained for white input signals. Using the independence assumption and diagonalizing K(n) = Q(n)Λ(n)Q^T(n), where Λ(n) is a diagonal matrix with entries λ_i(n), let u(n) = Q^T(n)x(n). Since x(n) is Gaussian and white, u(n) is also Gaussian and white, and the expectation E[u_i²(n)/P_T(n)] does not depend on i; summing over i gives E[u_i²(n)/P_T(n)] = 1/N. This implies γ_0(n) = 1, since q_ε(n) = q_x tr{K(n)} and P̄_T = q_x N. To calculate γ_1, note that (for white noise) P_T(n)/q_x has a chi-square distribution with N degrees of freedom, so that E[1/P_T(n)] = 1/((N − 2)q_x) and γ_1 = N/(N − 2). For non-Gaussian signals, [9] obtains the approximation E[1/P_T(n)] = 1/((N + 1 − κ_x)q_x), where κ_x is the kurtosis of x(n). It is also possible to obtain similar limits using the results in [11], for instance (65), but these are looser bounds.
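The chi-square step of the argument is easy to verify numerically: for white Gaussian input with q_x = 1, P_T(n) is chi-square with N degrees of freedom, so P̄_T E[1/P_T(n)] should approach N/(N − 2). A minimal sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 10, 500_000
X = rng.standard_normal((K, N))       # white Gaussian input, q_x = 1
P_T = np.sum(X * X, axis=1)           # chi-square with N degrees of freedom

gamma1 = N * np.mean(1.0 / P_T)       # P_bar * E[1/P_T], theory: N/(N-2) = 1.25
```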

Steady-state mean square error and stability
Proceeding in the same way as for the LMS algorithm, one gets for the NLMS

q_ε(∞) = (γ_1/γ_0) μ q_v/(2 − μ).

Moreover, the algorithm is stable for μ < 2. For the white-noise case, γ_1/γ_0 = N/(N − 2). This is the same result as obtained in [8].
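This steady-state prediction can be sanity-checked by simulating the NLMS with white Gaussian input. The sketch below uses illustrative parameters and the white-noise prediction q_ε(∞) = (N/(N−2))μq_v/(2 − μ); the loose tolerance covers Monte Carlo noise and the independence approximation:

```python
import numpy as np

rng = np.random.default_rng(7)
N, q_v, mu = 10, 1e-2, 0.5
q_eps_theory = (N / (N - 2)) * mu * q_v / (2 - mu)   # gamma_1/gamma_0 = N/(N-2)

w_o = rng.standard_normal(N)
w = np.zeros(N)
acc, count = 0.0, 0
for n in range(50_000):
    x = rng.standard_normal(N)
    ea = (w_o - w) @ x                # a priori excess error
    e = ea + np.sqrt(q_v) * rng.standard_normal()
    w = w + (mu / (x @ x)) * e * x    # NLMS update with q = 0
    if n >= 5_000:                    # discard the transient
        acc += ea * ea
        count += 1
q_eps_sim = acc / count               # simulated excess MSE
```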

Limit on convergence time
For the NLMS, while q_ε(n) ≥ q̄,

δ(n+1) − δ(n) ≤ −(μ/P̄_T)[(2 − μ)γ_0m q̄ − μγ_1 q_v],

where γ_0m is the minimal value of γ_0(n). This gives the maximal time to reach an excess noise of q̄,

n_max ≤ δ(0)P̄_T/(μ[(2 − μ)γ_0m q̄ − μγ_1 q_v]),

and, for the maximum excess noise at time n,

q̄_max(n) = (δ(0)P̄_T/(μn) + μγ_1 q_v)/((2 − μ)γ_0m).

Similar but less general results appear in [9].

Constant input vector square norm signals
Most of the complications in the previous sections' calculations come from the dependence between e²(n) and P_T(n). If these quantities are independent, the analysis simplifies considerably. One such case is when P_T(n) is essentially constant. This can happen if ‖x(n)‖² is constant, for instance for communications signals, or if the filter size N is large, making the fluctuations in P_T(n) small. In the previous sections, it was already shown that, in the white-noise case, the values of β(n), γ_0(n) and γ_1 become close to one for large N. This section presents the same results as the previous sections but for constant P_T(n) = P_T. These results are the same for the LMS and NLMS (with μ_x = μ_LMS = μ_NLMS/P_T) and correspond to setting β(n), γ_0(n) and γ_1 equal to one in the previous expressions. The previous results show that this approximation is fair even for moderate N.
Taking expected values on both sides of (7), and writing μ for the normalized step μ_NLMS = μ_LMS P_T, results simply in

δ(n+1) − δ(n) = −(μ/P_T)[(2 − μ)q_ε(n) − μq_v].

The steady-state MSE is

q_ε(∞) = μq_v/(2 − μ),

and the maximal excess MSE at time n is

q̄_max(n) = (δ(0)P_T/(μn) + μq_v)/(2 − μ).
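A constant input vector square norm is easy to realize with binary ±1 inputs, for which ‖x(n)‖² = N exactly. The sketch below (illustrative parameters) checks the simplified steady-state prediction q_ε(∞) = μq_v/(2 − μ) against a simulation; the tolerance covers Monte Carlo noise and the independence approximation:

```python
import numpy as np

rng = np.random.default_rng(5)
N, q_v, mu = 10, 1e-2, 0.2
P_T = N                                   # ||x(n)||^2 is exactly N for +-1 inputs
q_eps_theory = mu * q_v / (2 - mu)        # constant-norm steady-state excess MSE

w_o = rng.standard_normal(N)
w = np.zeros(N)
acc, count = 0.0, 0
for n in range(60_000):
    x = rng.choice([-1.0, 1.0], size=N)   # constant-modulus white input
    ea = (w_o - w) @ x                    # a priori excess error
    e = ea + np.sqrt(q_v) * rng.standard_normal()
    w = w + (mu / P_T) * e * x            # normalized step: mu_LMS = mu / P_T
    if n >= 10_000:                       # discard the transient
        acc += ea * ea
        count += 1
q_eps_sim = acc / count
```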

LMS with the Cauchy-Schwarz inequality
In the case of the LMS algorithm, obtaining (9) requires v(n) to be i.i.d. However, it does not require the independence assumption, and it is possible to obtain a maximal value for E[e²(n)P_T(n)] without using it. Namely, using the Cauchy-Schwarz inequality,

E[e²(n)P_T(n)] ≤ √(E[e⁴(n)]E[P_T²(n)]),

so that β(n) in (12) can be bounded by

β = √(κ_x κ_e),

where κ_x = E[P_T²(n)]/P̄_T² and κ_e = E[e⁴(n)]/E²[e²(n)] is the kurtosis of e(n). When x(n) and e(n) are Gaussian signals, κ_x ≤ 3, with κ_x = 1 + 2/N for white Gaussian signals, and κ_e = 3. Note, however, that in general e(n) is not Gaussian, even with Gaussian input and background noise v(n). It is straightforward to obtain expressions for the maximum MSE and other quantities using these values. A value for the step that assures stability is μ ≤ 2/(P̄_T β) ≤ 2/(P̄_T √(κ_x κ_e)).
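The Cauchy-Schwarz bound can be illustrated numerically: for empirical averages, mean(e²P_T) ≤ √(mean(e⁴)mean(P_T²)) holds by construction, so the sample estimate of β never exceeds √(κ_x κ_e). A sketch with a fixed illustrative misalignment and noise level:

```python
import numpy as np

rng = np.random.default_rng(6)
N, K = 10, 200_000
eps = rng.standard_normal(N)              # fixed illustrative misalignment
X = rng.standard_normal((K, N))           # white Gaussian input, q_x = 1
e = X @ eps + 0.1 * rng.standard_normal(K)   # error including background noise
P_T = np.sum(X * X, axis=1)               # input vector square norm

P_bar = np.mean(P_T)
kappa_x = np.mean(P_T**2) / P_bar**2      # close to 1 + 2/N for white Gaussian input
kappa_e = np.mean(e**4) / np.mean(e**2)**2   # kurtosis of e(n)
beta = np.mean(e**2 * P_T) / (np.mean(e**2) * P_bar)
bound = np.sqrt(kappa_x * kappa_e)        # Cauchy-Schwarz bound on beta
```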

Simulations
This section presents simulation results that confirm the theoretical findings of the previous sections. Simulations were run for the LMS and NLMS algorithms with a filter size of N = 10 and a Gaussian input signal with unit power. Unless stated otherwise, the charts result from 100 Monte Carlo runs of 1000 samples each, with different optimal filters and signals. The simulation of Fig. 1 shows the variation of the LMS misalignment square norm ensemble average δ(n) = E[‖ε(n)‖²] against the squared error signal, more precisely E[e²(n)(2 − μP_T(n))]. The measurement noise power was q_v = 0.1. It can be seen that the two quantities are linearly dependent, as predicted by (7). The chart shows three lines for three values of the step size μ. Figures 2 and 4 compare theoretical and experimental curves for the MSE of the LMS and NLMS algorithms as a function of the step size for different values of the measurement noise power q_v. Both charts use a colored input signal obtained by filtering a white noise signal with a size-ten filter with impulse response s(n) = sin(2πn/5). The theoretical curves were obtained assuming a white input signal. The actual input is colored, but the difference in the theoretical curves is small for this power spectrum; they are all close to the long-filter case, even for N = 10. The curves differ for very small steps because those experiments did not have time to reach steady state.
In the LMS algorithm, the experimental curves agree with theory up to about μ = 0.04, while the theoretical maximum step is 0.2. Beyond this point, the independence assumption is no longer valid. It is worth noting that the validity range of the theoretical curves increases as the signal becomes less colored and decreases as it becomes more colored, in agreement with [13]. In the NLMS algorithm, the experimental curves are very close to theory for all values of the step size. Figure 3 shows the same plot as Fig. 2 but for a long filter with N = 100. It can be seen that the range of step values over which the experimental values agree with the theory is much larger: in Fig. 2 this range is about 20% (0.04/0.2) of the theoretical maximum, while in Fig. 3 it is 40%. For N = 1000, this range grows to 70%.
Finally, Fig. 5 compares theoretical values for the maximal MSE as a function of time n with simulation results showing the worst observed values of the expected value of e²(n) over 10,000 Monte Carlo runs. Each run used a random optimal filter and a random input-signal coloring filter of size 10. The figure confirms that the theoretical curves limit the MSE at any given time.

Conclusion
This work shows that the mean variation of the LMS misalignment square norm is nearly proportional to the error-signal power. This allows obtaining, in a simple way, formulas for the steady-state MSE of the LMS and NLMS. It also allows obtaining formulas for the maximum MSE of the LMS and NLMS at any given time and for any input signal. Some difficulties remain in the calculations due to the dependence between the error signal and the input vector square norm. Simulation results confirm the theoretical findings.
Author Contributions Not applicable.
Funding Open access funding provided by FCT|FCCN (b-on). This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UIDB/50021/2020.

Availability of data and materials
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Conflict of interest
The authors have no financial or proprietary interests in any material discussed in this article.

Ethical approval Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.