1 Introduction

Impulsive noise (IN) is ubiquitous and is particularly harmful in orthogonal frequency division multiplexing (OFDM) systems such as digital subscriber lines [1], power line communication (PLC) [2], wireless communication [3], and underwater acoustic communication [4]. IN models such as the Gaussian mixture (GM), Middleton Class A [5], and Bernoulli-Gaussian [6] models cover most scenarios in which IN occurs in PLC systems, and the \(\alpha\)-stable distribution model can be used to describe the short-term spikes that occur in shallow water acoustic channels [7].

Although OFDM is inherently more resistant to IN than single-carrier modulation, the influence of IN spreads over all subcarriers, and if the IN power exceeds a certain threshold, system performance declines sharply [8]. In practical applications, the IN should therefore be suppressed at the receiver before detection.

Numerous IN mitigation algorithms have been proposed. One commonly used type is nonlinear filtering, in which received signal samples contaminated by IN are adjusted by nonlinear blanking or clipping. The drawback of such algorithms is that they not only suppress the interference but also distort the original signal. Moreover, they need prior statistics of the IN to derive the optimal threshold; these statistics are very difficult to obtain in practice, so a suboptimal threshold is usually set empirically.

Another kind of suppression algorithm is based on compressed sensing (CS) theory. These algorithms exploit the sparsity of the IN and of the channel impulse response (CIR) [9,10,11,12,13,14], and they outperform traditional blanking and clipping algorithms [15]. However, they have high computational complexity. In [16], the generalized approximate message passing (AMP) algorithm is used to jointly estimate the channel coefficients, IN, and data symbols. However, that algorithm relies on prior information about the CIR and IN, which is difficult to obtain. In [17], the distributions of the CIR and IN are modeled as Gaussian mixtures (GM), which serve as the prior information of the AMP algorithm. Nevertheless, this assumption does not apply to other channels. In [18], a sparse Bayesian learning (SBL) algorithm combined with a Kalman filter is proposed to solve the joint channel and IN estimation problem in OFDM systems. Compared with the SBL algorithm alone, it improves the estimation performance. However, its performance is limited by the number of pilots, and increasing the number of pilots reduces the spectral efficiency of the OFDM system.

This paper addresses joint channel and IN estimation based on all subcarriers for OFDM systems in the presence of IN. We first regard the CIR and IN as a single unknown sparse vector and use an SBL framework that employs all subcarriers to estimate it. SBL is applied based on the prior distribution of the variables, and a forward-backward (FB) joint system is then established, in which data detection is carried out simultaneously [19]. In addition, we develop an FB-Kalman implementation that uses expectation-maximization (EM) updates to iteratively estimate the unknown parameters. In the E-step, explicit expressions for the mean and covariance matrix of the posterior distribution are derived. In the M-step, the unknown parameters are updated iteratively. Simulation results show that the proposed algorithm improves the channel estimation performance and the bit error rate (BER) performance of OFDM systems in IN environments.

2 System model

Consider a coded, frame-based OFDM system with N subcarriers, comprising \(N_d\) data subcarriers, \(N_p\) pilot subcarriers, and \(N_n\) null subcarriers. At the transmitter, an OFDM frame is composed of \(N_b\) OFDM symbols. At the receiver, the \(m{\text {th}}\) OFDM symbol \({\varvec{r}}_m \in \mathbb {C}^{N\times 1}\) of the received frame in the frequency domain can be written in vector form as

$$\begin{aligned} {\varvec{r}}_m =\textbf{D}_m{\varvec{h}}_m+{\varvec{i}}_m+{\varvec{w}}_m \end{aligned}$$
(1)

where \(\textbf{D}_m=\textrm{diag}({\varvec{d}}_m)\) is a diagonal matrix whose main diagonal is \({\varvec{d}}_m\in \mathbb {C}^{N\times 1}\), the \(m{\text {th}}\) transmitted OFDM symbol. \({\varvec{h}}_m\in \mathbb {C}^{N\times 1}\) denotes the frequency-domain channel response. \({\varvec{i}}_m\in \mathbb {C}^{N\times 1}\) and \({\varvec{w}}_m\in \mathbb {C}^{N\times 1}\) are the frequency-domain IN and background Gaussian noise, respectively. The subscript \(m\ (m=1,2,...,N_b)\) is the index of the OFDM symbol within a frame.

By introducing \({\varvec{h}}_m=\sqrt{N}\textbf{F}_L{\varvec{h}}_{t,m}\) and \({\varvec{i}}_m=\textbf{F}{\varvec{i}}_{t,m}\), where \(\textbf{F}\in \mathbb {C}^{N\times N}\) is the normalized discrete Fourier transform (DFT) matrix, \(\textbf{F}_L\in \mathbb {C}^{N\times L}\) is the submatrix formed by the first L columns of \(\textbf{F}\), L is the channel length, and \({\varvec{h}}_{t,m}\in \mathbb {C}^{L\times 1}\) and \({\varvec{i}}_{t,m}\in \mathbb {C}^{N\times 1}\) denote the time-domain CIR and IN, respectively, Eq. (1) can be rewritten as

$$\begin{aligned} {\varvec{r}}_m =\sqrt{N}\textbf{D}_m\textbf{F}_L{\varvec{h}}_{t,m}+\textbf{F}{\varvec{i}}_{t,m}+{\varvec{w}}_m \end{aligned}$$
(2)
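As an illustration, the equivalence of Eqs. (1) and (2) can be checked numerically. The following is a minimal NumPy sketch under assumed toy sizes (N = 16, L = 4), an assumed sparse CIR and a single assumed impulse; the background noise is omitted so that both forms can be compared exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 16, 4  # toy sizes (assumed), not the N = 256 used in the simulations

# Normalized (unitary) DFT matrix F and its first L columns F_L.
n = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)
F_L = F[:, :L]

# Sparse time-domain CIR h_t (2 active taps) and IN i_t (1 impulse).
h_t = np.zeros(L, dtype=complex)
h_t[[0, 2]] = rng.standard_normal(2) + 1j * rng.standard_normal(2)
i_t = np.zeros(N, dtype=complex)
i_t[5] = 10.0 * (rng.standard_normal() + 1j * rng.standard_normal())

# 4-QAM symbols on every subcarrier, collected in the diagonal matrix D.
d = (rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)) / np.sqrt(2)
D = np.diag(d)

# Eq. (1): frequency-domain form (noise-free here).
h = np.sqrt(N) * F_L @ h_t
i_f = F @ i_t
r_eq1 = D @ h + i_f

# Eq. (2): the same observation written in the time-domain unknowns.
r_eq2 = np.sqrt(N) * D @ F_L @ h_t + F @ i_t

# The frequency response equals the unnormalized DFT of the zero-padded CIR.
h_fft = np.fft.fft(h_t, N)
```

The factor \(\sqrt{N}\) compensates for the normalization of \(\textbf{F}\), so `h` coincides with the ordinary FFT of the zero-padded CIR.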

3 Joint channel and impulsive noise estimation

In this section, we propose a new algorithm, based on the SBL framework and the Kalman filter, to jointly estimate the impulsive noise and the channel. Letting \(\varvec{\Phi }_m=[\sqrt{N}\textbf{D}_m\textbf{F}_L,\textbf{F}] \in \mathbb {C}^{N\times (L+N)}\) and \(\varvec{\theta }_m=({\varvec{h}}_{t,m}^{T},{\varvec{i}}_{t,m}^{T})^T \in \mathbb {C}^{L+N}\), Eq. (2) can be expressed as

$$\begin{aligned} {\varvec{r}}_m = \varvec{\Phi }_m\varvec{\theta }_m+{\varvec{w}}_m \end{aligned}$$
(3)

Because the CIR \({\varvec{h}}_{t,m}\) and the IN \({\varvec{i}}_{t,m}\) are regarded as unknown sparse vectors, the vector \(\varvec{\theta }_m\) formed by stacking them can also be regarded as a sparse vector. Since \(\varvec{\Phi }_m\) has more columns than rows, the linear system (3) is underdetermined, and estimating the unknown vector from Eq. (3) becomes a CS problem.
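To make the underdetermined structure concrete, the sketch below stacks \(\varvec{\Phi }_m\) for assumed toy sizes and counts the dimension of its null space. The helper name `build_phi` and the placeholder symbols are illustrative assumptions.

```python
import numpy as np

def build_phi(d, L):
    """Stack Phi = [sqrt(N) D F_L, F] for one OFDM symbol with symbol vector d."""
    N = d.shape[0]
    n = np.arange(N)
    F = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)
    # d[:, None] * F[:, :L] scales row i by d[i], i.e. it equals D @ F_L.
    return np.hstack([np.sqrt(N) * d[:, None] * F[:, :L], F])

N, L = 16, 4                                  # toy sizes (assumed)
d = np.exp(1j * np.pi / 4) * np.ones(N)       # placeholder unit-modulus symbols
Phi = build_phi(d, L)

rank = np.linalg.matrix_rank(Phi)             # F is unitary, so the rank is N
null_dim = Phi.shape[1] - rank                # L directions are unobservable
```

With N equations and L + N unknowns, the observation alone leaves an L-dimensional ambiguity, which is why a sparsity-promoting prior is needed to single out a solution.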

We assume that \(\varvec{\theta }_m\) is constant within one OFDM symbol but changes from symbol to symbol according to a state-space model. Taking Eq. (3) as the observation equation, the state equation can be expressed as

$$\begin{aligned} \varvec{\theta }_m = \textbf{A}\varvec{\theta }_{m-1}+{\varvec{v}}_m \end{aligned}$$
(4)

where \(\textbf{A}\triangleq \textrm{diag}(\rho \textbf{1}_L,\varvec{0}_N)\), \(\textbf{1}_L\) is a length-L vector of ones, and \(\varvec{0}_N\) is a length-N vector of zeros. The state vector follows \(\varvec{\theta }_m\sim \mathcal{C}\mathcal{N}(0,\varvec{\Gamma }_m)\), where \(\varvec{\Gamma }_m=\textrm{diag}(\varvec{\Gamma }_{({\varvec{h}})},\varvec{\Gamma }_{({\varvec{i}})_m})\), and \(\varvec{\Gamma }_{({\varvec{h}})}=\textrm{diag}(\gamma _0,\gamma _1,...,\gamma _{L-1})\) and \(\varvec{\Gamma }_{({\varvec{i}})}=\textrm{diag}(\gamma _L,\gamma _{L+1},...,\gamma _{L+N-1})\) are the covariance matrices of the CIR and IN, respectively. The excitation noise \({\varvec{v}}_m\) and the observation noise \({\varvec{w}}_m\) are independent, zero-mean, white Gaussian noises, i.e., \({\varvec{v}}_m\sim \mathcal{C}\mathcal{N}(0,\textbf{B}\varvec{\Gamma }_m)\) with \(\textbf{B}\triangleq \textrm{diag}((1-{\rho }^2)\textbf{I}_L,\textbf{I}_N)\), and \({\varvec{w}}_m\sim \mathcal{C}\mathcal{N}(0,\beta _m\textbf{I}_N)\), where \(\textbf{I}_N\) is the \(N\times N\) identity matrix. \(\rho\) is the correlation coefficient of the state transition, and \(\beta _m\) is the variance of the background noise.
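The role of \(\textbf{B}\) can be checked with a short calculation: if the previous state has covariance \(\varvec{\Gamma }\), one step of Eq. (4) returns a state with the same covariance. The sketch below verifies this for assumed toy values of L, N, \(\rho\) and the diagonal of \(\varvec{\Gamma }\), taking \(\varvec{\Gamma }\) to be the same for both symbols.

```python
import numpy as np

L, N, rho = 3, 5, 0.9   # toy sizes and correlation coefficient (assumed)
# Assumed diagonal of Gamma: 2 active channel taps and 1 active IN sample.
gamma = np.array([1.0, 0.5, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0])

a = np.concatenate([rho * np.ones(L), np.zeros(N)])            # diagonal of A
b = np.concatenate([(1 - rho**2) * np.ones(L), np.ones(N)])    # diagonal of B
A, B, Gamma = np.diag(a), np.diag(b), np.diag(gamma)

# Covariance after one step of Eq. (4): A Gamma A^T + B Gamma.
cov_next = A @ Gamma @ A.T + B @ Gamma
```

For the channel taps the two terms add up as \(\rho ^2\gamma +(1-\rho ^2)\gamma =\gamma\), while the zero block of \(\textbf{A}\) makes the IN part memoryless, so each symbol draws a fresh IN realization.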

We assume the following forward and backward systems

$$\begin{aligned} {\varvec{r}}_m^f= & {} \varvec{\Phi }_m^f\varvec{\theta }_m+{\varvec{w}}_m^f \end{aligned}$$
(5)
$$\begin{aligned} {\varvec{r}}_m^b= & {} \varvec{\Phi }_m^b\varvec{\theta }_m+{\varvec{w}}_m^b \end{aligned}$$
(6)

where the superscripts f and b denote variables in the forward and backward systems, respectively. The forward Kalman filter is initialized as \(\varvec{\theta }_{0|0}^f=0\), \(\varvec{\Sigma }_{0|0}^f={{{\textbf {I}}}_{L+N}}\).

In the prediction step, we have

$$\begin{aligned} \varvec{\theta }_{m|m-1}^f= & {} \textbf{A}\varvec{\theta }_{m-1|m-1}^f \end{aligned}$$
(7)
$$\begin{aligned} \varvec{\Sigma }_{m|m-1}^f= & {} \textbf{A}\varvec{\Sigma }_{m-1|m-1}^f\textbf{A}^{\textrm{T}}+\textbf{B}\varvec{\Gamma }_m^f \end{aligned}$$
(8)
$$\begin{aligned} {\varvec{e}}_m^f= & {} {\varvec{r}}_m^f-{\varvec{r}}_{m|m-1}^f \end{aligned}$$
(9)
$$\begin{aligned} {\varvec{r}}_{m|m-1}^f= & {} \varvec{\Phi }_m^f\varvec{\theta }_{m|m-1}^f \end{aligned}$$
(10)

where \(\varvec{\theta }_{m|m-1}^f\) denotes the \(m{\text {th}}\) state prediction obtained from the \((m-1){\text {th}}\) optimal state estimate \(\varvec{\theta }_{m-1|m-1}^f\), \({\varvec{e}}_m^f\) denotes the difference between the \(m{\text {th}}\) measured and predicted values, and \(\varvec{\Sigma }_{m|m-1}^f\) and \(\varvec{\Sigma }_{m-1|m-1}^f\) are the \(m{\text {th}}\) prior and the \((m-1){\text {th}}\) posterior covariance matrices of the estimation errors, respectively.

In the update step, we can compute

$$\begin{aligned} \mathbf {K_m^f}= & {} \varvec{\Sigma }_{m|m-1}^f(\varvec{\Phi }_m^f)^H\left( (\beta _m^f)\mathbf {I_N}+\varvec{\Phi }_m^f\varvec{\Sigma }_{m|m-1}^f(\varvec{\Phi }_m^f)^H\right) ^{-1} \end{aligned}$$
(11)
$$\begin{aligned} \varvec{\theta }_{m|m}^f= & {} \varvec{\theta }_{m|m-1}^f+\mathbf {K_m^f}{\varvec{e}}_m^f \end{aligned}$$
(12)
$$\begin{aligned} \varvec{\Sigma }_{m|m}^f= & {} (\textbf{I}-\mathbf {K_m^f}\varvec{\Phi }_m^f)\varvec{\Sigma }_{m|m-1}^f \end{aligned}$$
(13)

where \(\varvec{\theta }_{m|m}^f\) is the best estimate of the \({m}{\text {th}}\) state, \(\mathbf {K_m^f}\) is the Kalman gain, and \({\varvec{e}}_m^f\) is given by (9).
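The forward recursion (7)–(13) can be sketched in a few lines of NumPy. The function name `kalman_step`, the random observation matrix, and all numerical values below are assumptions chosen for illustration; in the actual algorithm \(\varvec{\Phi }_m\) has the structure defined before Eq. (3).

```python
import numpy as np

def kalman_step(theta_prev, Sigma_prev, r, Phi, A, BGamma, beta):
    """One forward recursion of Eqs. (7)-(13); returns (theta_post, Sigma_post)."""
    N, M = Phi.shape
    theta_pred = A @ theta_prev                      # Eq. (7)
    Sigma_pred = A @ Sigma_prev @ A.T + BGamma       # Eq. (8)
    e = r - Phi @ theta_pred                         # Eqs. (9)-(10)
    S = beta * np.eye(N) + Phi @ Sigma_pred @ Phi.conj().T
    # Eq. (11): K = Sigma_pred Phi^H S^{-1}. S and Sigma_pred are Hermitian,
    # so K is the conjugate transpose of solve(S, Phi Sigma_pred).
    K = np.linalg.solve(S, Phi @ Sigma_pred).conj().T
    theta_post = theta_pred + K @ e                  # Eq. (12)
    Sigma_post = (np.eye(M) - K @ Phi) @ Sigma_pred  # Eq. (13)
    return theta_post, Sigma_post

# Toy usage (all sizes and values are assumptions).
rng = np.random.default_rng(1)
L, N, rho, beta = 2, 4, 0.9, 0.1
M = L + N
A = np.diag([rho] * L + [0.0] * N)
B = np.diag([1 - rho**2] * L + [1.0] * N)
Gamma = np.diag(np.arange(1.0, M + 1))
Phi = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
r = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Initialization theta_{0|0} = 0 and Sigma_{0|0} = I, as stated above.
theta_post, Sigma_post = kalman_step(np.zeros(M, dtype=complex), np.eye(M),
                                     r, Phi, A, B @ Gamma, beta)
```

Solving a linear system instead of forming the explicit inverse in (11) is a numerical design choice of this sketch, not something prescribed by the derivation.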

Backward Kalman filtering is similar to the forward Kalman filtering in (7)–(13), except that it runs from time \(m=N_b\) to \(m=1\).

For (5) and (6), we can apply the linear minimum mean-squared error (LMMSE) estimator to each linear system to estimate \(\varvec{\theta }_m\). For the forward system, the estimated value of \(\varvec{\theta }_m\) is

$$\begin{aligned} \hat{\varvec{\theta }}_m^f=\varvec{\Sigma }_m^f(\varvec{\Phi }_m^f)^H\mathbf {R_{{\varvec{w}}_m^f}^{-1}}{\varvec{r}}_m^f \end{aligned}$$
(14)

where \(\varvec{\Sigma }_m^f=(\mathbf {R_{\varvec{\theta }_m}^{-1}}+(\varvec{\Phi }_m^f)^H\mathbf {R_{{\varvec{w}}_m^f}^{-1}}\varvec{\Phi }_m^f)^{-1}\) is the \({m}{\text {th}}\) estimation error matrix, and \(\mathbf {R_{\varvec{\theta }_m}}\) and \(\mathbf {R_{{\varvec{w}}_m^f}}\) are the covariance matrices of \(\varvec{\theta }_m\) and \({\varvec{w}}_m^f\), respectively. Similarly, the estimated value of \(\varvec{\theta }_m\) from the backward system is

$$\begin{aligned} \hat{\varvec{\theta }}_m^b=\varvec{\Sigma }_m^b(\varvec{\Phi }_m^b)^H\mathbf {R_{{\varvec{w}}_m^b}^{-1}}{\varvec{r}}_m^b \end{aligned}$$
(15)

where \(\varvec{\Sigma }_m^b=(\mathbf {R_{\varvec{\theta }_m}^{-1}}+(\varvec{\Phi }_m^b)^H\mathbf {R_{{\varvec{w}}_m^b}^{-1}}\varvec{\Phi }_m^b)^{-1}\) is the \({m}{\text {th}}\) estimation error matrix, and \(\mathbf {R_{{\varvec{w}}_m^b}}\) is the covariance matrix of \({\varvec{w}}_m^b\).

Combining (5) and (6), we have

$$\begin{aligned} \genfrac(){0.0pt}0{{\varvec{r}}_m^f}{{\varvec{r}}_m^b}=\left( {\begin{array}{c}\varvec{\Phi }_m^f\\ \varvec{\Phi }_m^b\end{array}}\right) \varvec{\theta }_m+\genfrac(){0.0pt}0{{\varvec{w}}_m^f}{{\varvec{w}}_m^b} \end{aligned}$$
(16)

The LMMSE estimate of \(\varvec{\theta }_m\) from (16) is given by

$$\begin{aligned} \hat{\varvec{\theta }}_m^{fb}= & {} \bigg ( \begin{bmatrix} (\varvec{\Phi }_m^f)^H&(\varvec{\Phi }_m^b)^H \end{bmatrix} \begin{bmatrix} \mathbf {R_{{\varvec{w}}_m^f}^{-1}} &{} 0 \\ 0 &{} \mathbf {R_{{\varvec{w}}_m^b}^{-1}} \end{bmatrix} \begin{bmatrix} \varvec{\Phi }_m^f \\ \varvec{\Phi }_m^b \end{bmatrix}+\mathbf {R_{\varvec{\theta }_m}^{-1}}\bigg )^{-1}\nonumber \\{} & {} \times \begin{bmatrix} (\varvec{\Phi }_m^f)^H&(\varvec{\Phi }_m^b)^H \end{bmatrix} \begin{bmatrix} \mathbf {R_{{\varvec{w}}_m^f}^{-1}} &{} 0 \\ 0 &{} \mathbf {R_{{\varvec{w}}_m^b}^{-1}} \end{bmatrix} \begin{bmatrix} {\varvec{r}}_m^f \\ {\varvec{r}}_m^b \end{bmatrix}\nonumber \\= & {} ((\varvec{\Phi }_m^f)^H\mathbf {R_{{\varvec{w}}_m^f}^{-1}}\varvec{\Phi }_m^f+(\varvec{\Phi }_m^b)^H\mathbf {R_{{\varvec{w}}_m^b}^{-1}}\varvec{\Phi }_m^b+\mathbf {R_{\varvec{\theta }_m}^{-1}})^{-1}\nonumber \\{} & {} \times ((\varvec{\Phi }_m^f)^H\mathbf {R_{{\varvec{w}}_m^f}^{-1}}{\varvec{r}}_m^f+(\varvec{\Phi }_m^b)^H\mathbf {R_{{\varvec{w}}_m^b}^{-1}}{\varvec{r}}_m^b) \end{aligned}$$
(17)

The estimation error matrix is

$$\begin{aligned} \varvec{\Sigma }_{\hat{\varvec{\theta }}_m^{fb}}= & {} \bigg ( \begin{bmatrix} (\varvec{\Phi }_m^f)^H&(\varvec{\Phi }_m^b)^H \end{bmatrix} \begin{bmatrix} \mathbf {R_{{\varvec{w}}_m^f}^{-1}} &{} 0 \\ 0 &{} \mathbf {R_{{\varvec{w}}_m^b}^{-1}} \end{bmatrix} \begin{bmatrix} \varvec{\Phi }_m^f \\ \varvec{\Phi }_m^b \end{bmatrix}+\mathbf {R_{\varvec{\theta }_m}^{-1}}\bigg )^{-1}\nonumber \\= & {} ((\varvec{\Phi }_m^f)^H\mathbf {R_{{\varvec{w}}_m^f}^{-1}}\varvec{\Phi }_m^f+(\varvec{\Phi }_m^b)^H\mathbf {R_{{\varvec{w}}_m^b}^{-1}}\varvec{\Phi }_m^b+\mathbf {R_{\varvec{\theta }_m}^{-1}})^{-1} \end{aligned}$$
(18)

Using (14) and (15), (17) and (18) can be rewritten, respectively, as

$$\begin{aligned} \hat{\varvec{\theta }}_m^{fb}= & {} ((\varvec{\Sigma }_m^f)^{-1}+(\varvec{\Sigma }_m^b)^{-1}-\mathbf {R_{\varvec{\theta }_m}^{-1}})^{-1}\nonumber \\{} & {} ((\varvec{\Sigma }_m^f)^{-1}\hat{\varvec{\theta }}_m^f+(\varvec{\Sigma }_m^b)^{-1}\hat{\varvec{\theta }}_m^b) \end{aligned}$$
(19)
$$\begin{aligned} \varvec{\Sigma }_{\hat{\varvec{\theta }}_m^{fb}}= & {} ((\varvec{\Sigma }_m^f)^{-1}+(\varvec{\Sigma }_m^b)^{-1} -\mathbf {R_{{\varvec{\theta }}_m}^{-1}})^{-1} \end{aligned}$$
(20)
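The intermediate step can be made explicit. Left-multiplying (14) and (15) by the inverse error matrices, and using the definitions of \(\varvec{\Sigma }_m^f\) and \(\varvec{\Sigma }_m^b\) given after (14) and (15), gives the following identities, written here as a short worked derivation with the symbols already defined:

$$\begin{aligned} (\varvec{\Sigma }_m^f)^{-1}\hat{\varvec{\theta }}_m^f&=(\varvec{\Phi }_m^f)^H\mathbf {R_{{\varvec{w}}_m^f}^{-1}}{\varvec{r}}_m^f,\qquad (\varvec{\Sigma }_m^b)^{-1}\hat{\varvec{\theta }}_m^b=(\varvec{\Phi }_m^b)^H\mathbf {R_{{\varvec{w}}_m^b}^{-1}}{\varvec{r}}_m^b,\\ (\varvec{\Sigma }_m^f)^{-1}+(\varvec{\Sigma }_m^b)^{-1}-\mathbf {R_{\varvec{\theta }_m}^{-1}}&=(\varvec{\Phi }_m^f)^H\mathbf {R_{{\varvec{w}}_m^f}^{-1}}\varvec{\Phi }_m^f+(\varvec{\Phi }_m^b)^H\mathbf {R_{{\varvec{w}}_m^b}^{-1}}\varvec{\Phi }_m^b+\mathbf {R_{\varvec{\theta }_m}^{-1}}. \end{aligned}$$

The prior term \(\mathbf {R_{\varvec{\theta }_m}^{-1}}\) appears once in each of \((\varvec{\Sigma }_m^f)^{-1}\) and \((\varvec{\Sigma }_m^b)^{-1}\), so it is counted twice in their sum and must be subtracted once. Substituting these identities into (17) and (18) yields (19) and (20).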

As Kalman filtering is an extension of sequential LMMSE estimation, the forward and backward estimations can be combined following (19) and (20), respectively, as

$$\begin{aligned} \varvec{\theta }_{m|m}= & {} \big ((\varvec{\Sigma }_{m|m}^f)^{-1}+(\varvec{\Sigma }_{m|m}^b)^{-1}\big )^{-1}\nonumber \\{} & {} \big ((\varvec{\Sigma }_{m|m}^f)^{-1}\varvec{\theta }_{m|m}^f +(\varvec{\Sigma }_{m|m}^b)^{-1}\varvec{\theta }_{m|m}^b\big ) \end{aligned}$$
(21)
$$\begin{aligned} \varvec{\Sigma }_{m|m}= & {} \big ((\varvec{\Sigma }_{m|m}^f)^{-1}+(\varvec{\Sigma }_{m|m}^b)^{-1}\big )^{-1} \end{aligned}$$
(22)

where \(\varvec{\theta }_{m|m}^f\) and \(\varvec{\Sigma }_{m|m}^f\) are given by (12) and (13), respectively, and \(\varvec{\theta }_{m|m}^b\) and \(\varvec{\Sigma }_{m|m}^b\) are obtained in a similar way. Since there is no prior auto-correlation information about \(\varvec{\theta }\), we set \(\mathbf {R_{\varvec{\theta }}^{-1}}=0\) in (19) and (20).
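The combination rule (21)–(22) is an inverse-covariance weighted average of the forward and backward estimates. A minimal sketch follows, with a hypothetical helper `fuse` and a one-dimensional example whose values are assumptions.

```python
import numpy as np

def fuse(theta_f, Sigma_f, theta_b, Sigma_b):
    """Combine forward and backward estimates as in Eqs. (21)-(22)."""
    info_f = np.linalg.inv(Sigma_f)      # forward information matrix
    info_b = np.linalg.inv(Sigma_b)      # backward information matrix
    Sigma = np.linalg.inv(info_f + info_b)                   # Eq. (22)
    theta = Sigma @ (info_f @ theta_f + info_b @ theta_b)    # Eq. (21)
    return theta, Sigma

# One-dimensional example: forward estimate 1 with variance 1,
# backward estimate 3 with variance 3.
theta, Sigma = fuse(np.array([1.0 + 0j]), np.array([[1.0]]),
                    np.array([3.0 + 0j]), np.array([[3.0]]))
# theta is 1.5 and Sigma is 0.75: the fused value lies closer to the more
# certain forward estimate, and the fused variance is below both inputs.
```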

In (7)–(13), in addition to the unknown parameters \(\varvec{\Gamma }_m\) and \(\beta\), some elements of the matrix \(\varvec{\Phi }\) are unknown because they depend on the unknown data symbols \(\textbf{D}\).

Let \(\varvec{\xi }\triangleq \big \{\{\varvec{\Gamma }_m\},\beta ,\mathbf {D_m}\big \}\) denote the set of unknown parameters. The steps of the EM algorithm are as follows.

In the E-step, the expectation of the logarithm of the joint probability density \(p\big (\{{\varvec{r}}_m\},\{\varvec{\theta }_m\},\{\varvec{\Gamma }_m\},\beta \big )\) under the posterior distribution \(p(\varvec{\theta }|{\varvec{r}})\) is computed as

$$\begin{aligned} Q\big (\varvec{\xi }|\varvec{\xi }^{(k)}\big )=E_{\{\varvec{\theta }\}|\{{\varvec{r}}\}}\Big \{\log p\big (\{{\varvec{r}}\},\{\varvec{\theta }\},\{\varvec{\Gamma }_m^{(k)}\},\beta ^{(k)}\big )\Big \} \end{aligned}$$
(23)

where k represents the number of iterations.

In the M-step, the parameters are updated alternately, fixing all parameters but one and solving for the remaining one, so as to obtain

$$\begin{aligned} \varvec{\xi }^{(k+1)}={\mathop {\mathrm{arg\,max}}\limits _{\varvec{\xi }}}\,Q\big (\varvec{\xi }|\varvec{\xi }^{(k)}\big ) \end{aligned}$$
(24)

The likelihood function can be written as

$$\begin{aligned} p\big (\{{\varvec{r}}\},\{\varvec{\theta }\},\{\varvec{\Gamma }_m\},\beta \big )= & {} p(\varvec{\theta }_1;\varvec{\Gamma }_1) \prod _{m=2}^{N_b}p(\varvec{\theta }_m|\varvec{\theta }_{m-1};\varvec{\Gamma }_m)\times \prod _{m=1}^{N_b}p({\varvec{r}}_m|\varvec{\theta }_m;\beta ) \end{aligned}$$
(25)

From (25), we obtain the log-likelihood function as

$$\begin{aligned}{} & {} \log p\big (\{{\varvec{r}}\},\{\varvec{\theta }\}\big )\propto -\sum _{m=1}^{N_b} \frac{\Vert {\varvec{r}}_m-\varvec{\Phi }_m\varvec{\theta }_m\Vert _2^2}{\beta } \nonumber \\{} & {} \quad -NN_b\log \beta -\sum _{m=1}^{N_b} \log |\varvec{\Gamma }_m| \nonumber \\{} & {} \quad -\sum _{m=2}^{N_b} (\varvec{\theta }_m-\textbf{A}\varvec{\theta }_{m-1})^H(\textbf{B}\varvec{\Gamma }_m)^{-1} (\varvec{\theta }_m-\textbf{A}\varvec{\theta }_{m-1})-\varvec{\theta }_1^H\varvec{\Gamma }_1^{-1}\varvec{\theta }_1 \end{aligned}$$
(26)

From (26), the optimal \(\varvec{\Gamma }_m\) can be obtained as

$$\begin{aligned} \big \{\varvec{\Gamma }_m^{(k+1)}\big \}= & {} \mathop {\arg \min }_{\{\varvec{\Gamma }_m\}} E_{\{\varvec{\theta }\}|\{{\varvec{r}}\}} \Bigg \{\sum _{m=1}^{N_b}\log |\varvec{\Gamma }_m|+\varvec{\theta }_1^H\varvec{\Gamma }_1^{-1}\varvec{\theta }_1 \nonumber \\{} & {} + \sum _{m=2}^{N_b}(\varvec{\theta }_m-\textbf{A}\varvec{\theta }_{m-1})^H (\textbf{B}\varvec{\Gamma }_m)^{-1}(\varvec{\theta }_m-\textbf{A}\varvec{\theta }_{m-1})\Bigg \}\nonumber \\= & {} \mathop {\arg \min }_{\{\varvec{\Gamma }_m\}}\Bigg \{\sum _{m=1}^{N_b}\log |\varvec{\Gamma }_m| +Tr(\varvec{\Gamma }_1^{-1}\mathbf {M_{1|1}}) +\sum _{m=2}^{N_b}Tr(\varvec{\Gamma }_m^{-1}\textbf{B}^{-1}\mathbf {M_{m|m}})\Bigg \} \end{aligned}$$
(27)

where \(\mathbf {M_{m|m}}=\varvec{\Sigma }_{m|m}+\hat{\varvec{\theta }}_{m|m}\hat{\varvec{\theta }}_{m|m}^H\).

Similarly, \(\beta\) can be updated as

$$\begin{aligned} \beta ^{(k+1)}=\mathop {\arg \min }_{\beta } E_{\{\varvec{\theta }\}|\{{\varvec{r}}\}}\Bigg \{\sum _{m=1}^{N_b} \frac{\Vert {\varvec{r}}_m-\varvec{\Phi }_m\varvec{\theta }_{m|m} \Vert _2^2}{\beta }+NN_b\log \beta \Bigg \} \end{aligned}$$
(28)

By setting the derivative of (28) with respect to \(\beta\) to zero, we obtain

$$\begin{aligned} \beta ^{(k+1)}=\frac{1}{NN_b}Tr\Bigg [\sum _{m=1}^{N_b}\Big (({\varvec{r}}_m-\varvec{\Phi }_m\varvec{\theta }_{m|m})({\varvec{r}}_m-\varvec{\Phi }_m\varvec{\theta }_{m|m})^H +\varvec{\Phi }_m\varvec{\Sigma }_{m|m}\varvec{\Phi }_m^H\Big )\Bigg ] \end{aligned}$$
(29)
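The update (29) averages, over the frame, the squared residual plus a term that accounts for the posterior uncertainty of \(\varvec{\theta }_m\). A minimal sketch, assuming that the sum over m covers both terms and using a hypothetical helper `update_beta` with toy inputs:

```python
import numpy as np

def update_beta(r_list, Phi_list, theta_list, Sigma_list):
    """Noise-variance update of Eq. (29) over the N_b symbols of one frame."""
    N = r_list[0].shape[0]
    total = 0.0
    for r, Phi, theta, Sigma in zip(r_list, Phi_list, theta_list, Sigma_list):
        resid = r - Phi @ theta
        # Tr(resid resid^H) equals the squared norm of the residual.
        total += np.vdot(resid, resid).real
        # Posterior-uncertainty term Tr(Phi Sigma Phi^H).
        total += np.trace(Phi @ Sigma @ Phi.conj().T).real
    return total / (N * len(r_list))

# Toy frame with a single symbol (N_b = 1, N = 2); values are assumptions.
r = np.array([1.0, 1.0j])
beta_new = update_beta([r], [np.eye(2)], [np.zeros(2)], [0.5 * np.eye(2)])
# Residual energy 2 plus trace 1, divided by N * N_b = 2, gives 1.5.
```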

After obtaining the estimates \(\varvec{\Gamma }_m^{(k+1)}\) and \(\beta ^{(k+1)}\), the estimate of the data symbols \(\textbf{D}_m\) is derived as follows.

$$\begin{aligned} \textbf{D}_m^{(k+1)}= & {} {\mathop {\mathrm{arg\,max}}\limits _{\textbf{D}}}\,Q\Big (\textbf{D};\varvec{\Phi }^{(k)},\big \{\varvec{\Gamma }_m^{(k+1)}\big \},\beta ^{(k+1)}\Big )\nonumber \\= & {} {\mathop {\mathrm{arg\,max}}\limits _{\textbf{D}}}\,\Bigg \{c-E_{\{\varvec{\theta }\}|\{{\varvec{r}}\}}\Bigg \{\frac{\Vert {\varvec{r}}_m-\varvec{\Phi }_m\varvec{\theta }_m \Vert _2^2}{\beta }\Bigg \}\Bigg \}\nonumber \\= & {} {\mathop {\mathrm{arg\,max}}\limits _{\textbf{D}}}\,\Big \{c-\beta ^{-1}\big [\Vert {\varvec{r}}_m-\varvec{\Phi }_m\varvec{\theta }_m \Vert _2^2+Tr(\varvec{\Phi }_m\varvec{\Sigma }_m\varvec{\Phi }_m^H)\big ]\Big \}\nonumber \\= & {} \mathop {\arg \min }_{\textbf{D}}\Vert {\varvec{r}}_m-\varvec{\Phi }_m\varvec{\theta }_m \Vert _2^2 +Tr(\varvec{\Phi }_m\varvec{\Sigma }_m\varvec{\Phi }_m^H) \end{aligned}$$
(30)

where c is a constant independent of \(\textbf{D}\), and \(Tr(\cdot )\) denotes the matrix trace. We have

$$\begin{aligned}{} & {} \Vert {\varvec{r}}_m-\varvec{\Phi }_m\varvec{\theta }_m \Vert _2^2+Tr(\varvec{\Phi }_m\varvec{\Sigma }_m\varvec{\Phi }_m^H)\nonumber \\{} & {} \quad =\Vert {\varvec{r}}_m-\varvec{\Phi }_m\varvec{\theta }_m \Vert _2^2+Tr(N \textbf{D}_m\textbf{F}_L\varvec{\Sigma }_{1,1}\textbf{F}_L^H\textbf{D}_m^H\nonumber \\{} & {} \qquad +\sqrt{N}\textbf{F}\varvec{\Sigma }_{2,1}\textbf{F}_L^H\textbf{D}_m^H+\sqrt{N}\textbf{D}_m\textbf{F}_L\varvec{\Sigma }_{1,2}\textbf{F}^H+\textbf{F}\varvec{\Sigma }_{2,2}\textbf{F}^H) \end{aligned}$$
(31)

It is reasonable to assume that the CIR vector and the IN vector are independent of each other, so that the covariance submatrices \(\varvec{\Sigma }_{2,1}\) and \(\varvec{\Sigma }_{1,2}\) in Eq. (31) can be set to zero matrices.

Denote by \({\varvec{I}}_d\) the set of indices of the subcarriers that carry data symbols. Substituting (31) into (30), the iterative update of each data symbol \(d_m[i]\), i.e., the \({i}{\text {th}}\) diagonal element of \(\textbf{D}_m\), \(i\in {\varvec{I}}_d\), is obtained by solving the following problem

$$\begin{aligned} \big (d_m[i]\big )^{(k+1)}= & {} \mathop {\arg \min }_{d_m[i]}\big |r_m[i]-\sqrt{N}d_m[i]\textbf{F}_L[i,:]\varvec{\theta }_{m|m,1} \nonumber \\{} & {} -\textbf{F}[i,:]\varvec{\theta }_{m|m,2}\big |^2+\mathbf {C_{m,1}}[i,i]\big |d_m[i]\big |^2 \end{aligned}$$
(32)

where

$$\begin{aligned} \mathbf {C_{m,1}}= & {} N \textbf{F}_L\varvec{\Sigma }_{m|m,1,1}\textbf{F}_L^H \end{aligned}$$
(33)
$$\begin{aligned} \varvec{\Sigma }_{m|m}= & {} \begin{pmatrix} \varvec{\Sigma }_{m|m,1,1} &{} 0 \\ 0 &{} \varvec{\Sigma }_{m|m,2,2} \end{pmatrix} \end{aligned}$$
(34)
$$\begin{aligned} \varvec{\theta }_{m|m}= & {} \big [\varvec{\theta }_{m|m,1}^T,\varvec{\theta }_{m|m,2}^T\big ]^T \end{aligned}$$
(35)

and \(\mathbf {C_{m,1}}[i,i]\) is the \({i}{\text {th}}\) diagonal element of \(\mathbf {C_{m,1}}\), \(\textbf{F}[i,:]\) is the \({i}{\text {th}}\) row of \(\textbf{F}\), and \(\varvec{\theta }_{m|m}\) and \(\varvec{\Sigma }_{m|m}\) are obtained from (21) and (22), respectively.
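Problem (32) is a scalar minimization per data subcarrier. One simple way to solve it, assumed here for illustration and not stated in the derivation above, is an exhaustive search over the 4-QAM constellation. The helper `detect_symbol` and its argument names are hypothetical.

```python
import math

# Unit-energy 4-QAM constellation.
QAM4 = [complex(a, b) / math.sqrt(2) for a in (-1, 1) for b in (-1, 1)]

def detect_symbol(r_i, g_i, in_i, c_i, constellation=QAM4):
    """Pick the constellation point that minimizes the cost in Eq. (32).

    r_i  : received sample on subcarrier i
    g_i  : sqrt(N) * F_L[i, :] @ theta_{m|m,1}  (estimated channel gain)
    in_i : F[i, :] @ theta_{m|m,2}              (estimated frequency-domain IN)
    c_i  : C_{m,1}[i, i]                        (channel-uncertainty weight)
    """
    def cost(d):
        return abs(r_i - d * g_i - in_i) ** 2 + c_i * abs(d) ** 2
    return min(constellation, key=cost)

# Toy example (all values are assumptions): a known symbol, gain and IN term,
# plus a small perturbation standing in for background noise.
d_true = complex(1, -1) / math.sqrt(2)
g, in_est = 0.8 + 0.3j, 2.0 - 1.0j
r_i = d_true * g + in_est + 0.01
d_hat = detect_symbol(r_i, g, in_est, c_i=0.1)
```

For constant-modulus constellations such as 4-QAM the penalty term is the same for every candidate, so it only affects the decision for constellations with unequal symbol energies.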

The entire algorithm of FB-Kalman is summarized in Algorithm 1.

figure a

4 Complexity analysis

The FB-Kalman algorithm includes three major steps: forward filtering, backward filtering, and EM parameter estimation. In the forward filtering step, most of the computation is spent on the Kalman gain \(\textbf{K}_m\), which has a complexity of \({O}({N^{3}})\) for each OFDM symbol. The backward filtering step also has a computational complexity of \({O}({N^{3}})\) for each OFDM symbol. Moreover, the complexity of the EM step is \({O}(I_1{N^{3}})\), where \(I_1\) is the average number of EM iterations until convergence. Thus, for each OFDM symbol, the computational complexity of the FB-Kalman algorithm is of order \({O}((I_1+2){N^{3}})\).

The JCI-Kalman algorithm uses only pilot subcarriers. The complexity of its filtering step is \({O}({N_p^{3}})\) for each OFDM symbol, and the complexity of its EM step is \({O}(I_2{N_p^{3}})\) for each OFDM symbol, where \(I_2\) is the average number of EM iterations until convergence. The computational complexity of this algorithm is therefore \({O}((I_2+1){N_p^{3}})\) per OFDM symbol.

The SBL-LS algorithm uses the SBL method with null subcarriers to remove the IN and estimates the CIR by the least squares (LS) method. The EM step has a computational complexity of \({O}(I_3{N_n^{3}})\) for each OFDM symbol, where \(I_3\) is the average number of EM iterations until convergence, and the complexity of the LS step is \({O}({N_p^{3}})\) for each OFDM symbol. Thus, the SBL-LS algorithm has a computational complexity of \({O}(I_3{N_n^{3}}+{N_p^{3}})\).

From the above analysis, the FB-Kalman algorithm, which uses all subcarriers, has higher complexity than the JCI-Kalman algorithm, which uses only pilot subcarriers, and the SBL-LS algorithm, which uses pilot and null subcarriers. However, the simulation results show that the proposed FB-Kalman algorithm provides the best performance in terms of NMSE and BER. Similarly, compared with the original SBL-based algorithm [10], the proposed FB-Kalman algorithm has higher computational complexity but better performance, because it uses filtering and smoothing operations to track the time-varying channel and IN. This trade-off between performance and complexity is meaningful for practical OFDM systems.
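To give a feeling for the size of this gap, the leading-order terms can be evaluated for the subcarrier counts used in the simulations of the next section. The iteration counts below are hypothetical placeholders, and constant factors hidden by the O-notation are ignored, so the ratios are only indicative.

```python
# Subcarrier counts taken from the simulation setup.
N, N_p, N_n = 256, 44, 50
# Assumed (hypothetical) average EM iteration counts for the three algorithms.
I1 = I2 = I3 = 40

fb_kalman = (I1 + 2) * N**3           # O((I1 + 2) N^3)
jci_kalman = (I2 + 1) * N_p**3        # O((I2 + 1) N_p^3)
sbl_ls = I3 * N_n**3 + N_p**3         # O(I3 N_n^3 + N_p^3)

ratio_jci = fb_kalman / jci_kalman    # roughly 200 under these assumptions
ratio_sbl = fb_kalman / sbl_ls        # roughly 140 under these assumptions
```

Under these assumptions the leading-order cost of FB-Kalman is about two orders of magnitude above that of the two reference algorithms.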

5 Results and discussion

In this section, we demonstrate the performance of the proposed joint channel and IN estimation algorithms. An OFDM system with \(N=256\), \(N_p=44\), \(N_d=162\), and \(N_n=50\) is simulated. The Rayleigh-fading uncorrelated-scattering model with sparse impulse response [20] is adopted. Each OFDM frame is composed of \(N_b=7\) OFDM symbols.

The noise, which includes IN and background noise, is generated with publicly available software [21] that uses a Gaussian mixture model to simulate the IN distribution. The probabilities of the three mixture components are 0.9, 0.07, and 0.03, and the corresponding powers are 1, 100, and 1000. In the noise generated in the simulation, about \(7\%\) of the IN components exceed the background noise power by 20 dB, and about \(3\%\) of the IN components exceed the background noise power by 30 dB.
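For readers without access to the software in [21], a three-component mixture with the stated probabilities and powers can be sketched as follows. This is an independent illustration, not the cited software, and the function name `gm_noise` and the sample count are assumptions.

```python
import numpy as np

def gm_noise(n, rng, probs=(0.9, 0.07, 0.03), powers=(1.0, 100.0, 1000.0)):
    """Draw n complex samples from a zero-mean Gaussian mixture."""
    comp = rng.choice(len(probs), size=n, p=probs)   # component index per sample
    var = np.asarray(powers)[comp]                   # per-sample variance
    # Unit-variance circularly symmetric complex Gaussian samples.
    z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return np.sqrt(var) * z, comp

rng = np.random.default_rng(0)
x, comp = gm_noise(200_000, rng)

# Powers of components 2 and 3 relative to component 1: 20 dB and 30 dB.
powers_db = 10 * np.log10(np.array([100.0, 1000.0]))
# Average noise power; the theoretical value is 0.9 + 7 + 30 = 37.9.
mean_power = np.mean(np.abs(x) ** 2)
```

Although the two impulsive components are active in only about 10% of the samples, they carry almost all of the noise power.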

In addition, we also consider IN environment with \(\alpha\)-stable distribution. To verify the performance of channel and IN estimation, \(\alpha\)-stable IN model with characteristic exponent \(\alpha =1\), skewness parameter \(\beta =0\), scale parameter \(\gamma =0.05\), and location parameter \(\delta =0\) is used in the simulation.
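With \(\alpha =1\) and zero skewness, the \(\alpha\)-stable distribution reduces to the Cauchy distribution with scale \(\gamma\) and location \(\delta\), which can be sampled by inverting its distribution function. The sketch below draws real-valued samples; whether the simulation uses real or complex samples is not stated here, so this choice, the helper name `cauchy_in`, and the sample count are assumptions.

```python
import math
import random
import statistics

def cauchy_in(n, scale=0.05, loc=0.0, seed=0):
    """Real-valued symmetric alpha-stable samples for alpha = 1 (Cauchy)."""
    rnd = random.Random(seed)
    # Inverse-CDF sampling: loc + scale * tan(pi * (U - 1/2)), U ~ Uniform(0, 1).
    return [loc + scale * math.tan(math.pi * (rnd.random() - 0.5))
            for _ in range(n)]

samples = cauchy_in(100_000)
# Half of the samples satisfy |x| < scale, so this median is close to 0.05.
median_abs = statistics.median(abs(x) for x in samples)
# Heavy tails: the largest spike is far above the typical amplitude.
peak = max(abs(x) for x in samples)
```

The Cauchy distribution has no finite variance, which is why the scale parameter rather than a noise power characterizes this IN model.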

The performance of the FB-Kalman algorithm is compared with that of the following two algorithms:

  • SBL-LS: the IN is mitigated by using the SBL method with null subcarriers [13], and the tap-aware LS method is used to estimate the CIR.

  • JCI-Kalman: the IN and CIR are jointly estimated by using the JCI-Kalman method with pilot subcarriers [18].

Figure 1 compares the channel estimation performance of the three mitigation algorithms for rate-1/2 convolutionally coded systems using 4-QAM. The normalized mean square error (NMSE) of all three algorithms decreases as the signal-to-noise ratio (SNR) increases. The proposed algorithm performs better than SBL-LS and JCI-Kalman: its SNR improvement is more than 10 dB over SBL-LS and about 7 dB over JCI-Kalman. JCI-Kalman in turn has a gain of about 4 dB in SNR over SBL-LS, which indicates that joint channel and IN estimation performs better than separate estimation. The proposed algorithm performs better because the forward and backward messages together provide more useful information.

Fig. 1
figure 1

Channel estimation performance comparison for various mitigation methods in coded 4-QAM system

Figure 2 shows the BER performance for convolutionally coded systems using 4-QAM. The BER of all three algorithms decreases as the SNR increases. The proposed algorithm is always better than the other two; as the SNR approaches 10 dB, the BER curves of JCI-Kalman and FB-Kalman move closer together and then drop toward zero. At BER \(={10^{-3}}\), the proposed algorithm achieves a gain of about 2 dB in SNR over JCI-Kalman and about 7 dB over SBL-LS.

Fig. 2
figure 2

BER performance comparison for various mitigation methods in coded 4-QAM system

Figure 3 compares the channel estimation performance of the three mitigation algorithms for uncoded systems using 4-QAM. As in the coded system, the NMSE of all three algorithms decreases as the SNR increases, and the proposed algorithm performs better than the other two. Its SNR improvement is more than 8 dB over SBL-LS and more than 5 dB over JCI-Kalman, while JCI-Kalman has a gain of about 3 dB in SNR over SBL-LS. The uncoded system has a higher NMSE than the coded system, which shows that channel coding can effectively improve the channel estimation performance.

Fig. 3
figure 3

Channel estimation performance comparison for various mitigation methods in uncoded 4-QAM system

Figure 4 provides the BER performance for uncoded systems using 4-QAM. Similar to the coded system, the figure also shows that the BER of three algorithms decreases with the increase in SNR. It can be seen that the proposed algorithm performs better than the other two algorithms. The uncoded system exhibits higher BER compared to the coded system. It shows that channel coding can effectively improve system performance.

Fig. 4
figure 4

BER performance comparison for various mitigation methods in uncoded 4-QAM system

In the IN environment with \(\alpha\)-stable distribution, the NMSE and BER results for the coded system are shown in Figs. 5 and 6, respectively, and those for the uncoded system are shown in Figs. 7 and 8, respectively. For the coded system, Fig. 5 compares the channel estimation performance of the three algorithms. The proposed algorithm clearly performs better than SBL-LS and JCI-Kalman: its SNR improvement is more than 10 dB over SBL-LS and more than 8 dB over JCI-Kalman. Figure 6 shows the BER performance of the three algorithms, in which the proposed algorithm again outperforms the other two. At BER \(={10^{ - 3}}\), the proposed algorithm achieves a gain of about 5 dB in SNR over JCI-Kalman and about 6 dB over SBL-LS. For the uncoded system, Fig. 7 compares the channel estimation performance of the three algorithms. As in the coded system, the proposed algorithm performs better than the other algorithms: its SNR improvement is more than 12 dB over SBL-LS and about 10 dB over JCI-Kalman. Figure 8 shows the BER performance of the three algorithms. As in the coded system, the proposed algorithm outperforms the other two. At BER \(={10^{ - 3}}\), it achieves a gain of about 5 dB in SNR over JCI-Kalman and about 9 dB over SBL-LS.

Fig. 5
figure 5

Channel estimation performance comparison of various mitigation methods for coded 4-QAM system with \(\alpha\)-stable distribution model

Fig. 6
figure 6

BER performance comparison of various mitigation methods for coded 4-QAM system with \(\alpha\)-stable distribution model

Fig. 7
figure 7

Channel estimation performance comparison of various mitigation methods for uncoded 4-QAM system with \(\alpha\)-stable distribution model

Fig. 8
figure 8

BER performance comparison of various mitigation methods for uncoded 4-QAM system with \(\alpha\)-stable distribution model

Fig. 9
figure 9

Channel estimation performance variation of three algorithms with respect to the number of iterations at an SNR of 5 dB

Fig. 10
figure 10

Channel estimation performance variation of three algorithms with respect to the number of iterations at an SNR of 20 dB

Figures 9 and 10 show the NMSE of the three algorithms versus the number of iterations in the uncoded system at SNRs of 5 dB and 20 dB, respectively. Both the FB-Kalman and JCI-Kalman algorithms converge rapidly during the first 15 iterations, with a significant decrease in NMSE; the convergence then slows down and the NMSE approaches a stable value. After 40 iterations, both algorithms have almost reached a steady state. The SBL-LS algorithm converges fast, but its NMSE is the largest among the three algorithms.

The FB-Kalman algorithm achieves better and more reliable performance than the JCI-Kalman and SBL-LS algorithms because it jointly estimates the channel, IN, and data symbols using all subcarriers in the received OFDM symbol. Moreover, the FB-Kalman algorithm captures the time correlation of sparse time-varying channels, so it has better BER performance.

6 Conclusion

In this paper, we address the problems of joint sparse channel estimation, IN mitigation, and data detection for OFDM systems in the presence of IN. A joint channel estimation and IN suppression algorithm that uses all subcarriers and is based on SBL and FB-Kalman filtering is proposed. An efficient EM-based implementation is used to estimate the unknown parameters. Simulation results verify the effectiveness of the proposed algorithm.

Although the proposed algorithm improves the estimation performance of OFDM systems in IN environments, this improvement comes at the cost of higher computational complexity. Our future work includes fast implementations with robust performance.