1 Introduction

Recent research has identified the major targets for the next generation of mobile communications, the so-called fifth generation of mobile communications: 1000 times the system capacity, 10 times the spectral efficiency, energy efficiency and data rate, and 25 times the average cell throughput [1]. From a high-level perspective, massive multiple-input multiple-output (MIMO) is a promising technology for reaching these fifth generation targets. A massive MIMO system uses a large number of antennas at the base station; accordingly, a significant beamforming gain can be achieved and the system can serve a large number of users [2].

Compared with conventional MIMO systems, massive MIMO has several advantages. Firstly, as the number of antennas at the base station grows large, the simplest coherent combiner and linear precoder turn out to be optimal. Secondly, by exploiting channel reciprocity, additional antennas increase the network capacity significantly without the need for additional feedback overhead. Thirdly, the transmit power can be reduced in both the uplink and the downlink, which offers the potential for shrinking the cell size [3].

The major limiting factor in massive MIMO is the availability of accurate, instantaneous channel state information (CSI) at the base station. The CSI is typically acquired by transmitting predefined pilot signals and estimating the channel coefficients from the received signals by applying an appropriate estimation algorithm [13].

Channel estimation accuracy depends on having perfectly orthogonal pilots allocated to the users; however, to achieve high spectral efficiency, the same carrier frequency should be used in the neighbouring cells by following a specific reuse pattern. This creates spatially correlated inter-cell interference, known as pilot contamination, which reduces the estimation performance and the spectral efficiency [13].

The pilot contamination problem was analyzed in [4], where it was shown that the precoded downlink signal of the base station in the serving cell contaminates the received signal of the users roaming in other cells. The authors of [5] analyzed the pilot contamination problem in multi-cell massive MIMO systems relying on a large number of antennas at the base station, and demonstrated that the pilot contamination problem persists in large-scale MIMO [6].

However, pilot contamination could be reduced by reducing the number of pilots. A multi-user scenario therefore needs to reduce the number of pilots without affecting the quality of the channel impulse response (CIR) estimate. Hence, the development of efficient channel estimation techniques for massive MIMO that are computationally less complex and require fewer pilots is a challenge that should be thoroughly addressed [7].

Recently, compressed sensing (CS) techniques have received attention since they can recover unknown signals from only a small number of measurements, using far fewer samples than conventional Nyquist-rate sampling requires. CS is a signal recovery framework that exploits the sparse nature of signals (that is, only a small number of components in a signal vector are non-zero). CS allows for accurate system parameter estimation with fewer pilots, thereby addressing the pilot contamination problem and improving the bandwidth efficiency [8, 9]. However, classical CS algorithms require prior knowledge of the channel sparsity, which is usually unknown in practical scenarios. In addition, to apply CS algorithms, the sampling matrix must satisfy the restricted isometry property (RIP) to guarantee reliable estimation. Such a condition cannot be easily verified because doing so is computationally demanding [10, 11].

To overcome the shortcomings of CS-based channel estimation in massive MIMO systems, in this paper we propose an improved channel estimation scheme based on the theory of Bayesian CS (BCS), which introduces relevance vector machines (RVM) and statistical learning information (SLI) into standard CS. In this way, probabilistic a priori information regarding the channel sparsity can be exploited for more reliable channel recovery to mitigate the pilot contamination problem. Also, the sampling matrix condition is efficiently overcome based on the probabilistic formulation [12–14].

Compared with the classical scheme, our simulation results indicate that the proposed channel estimation methods provide improved estimation accuracy and can address the pilot contamination problem.

Furthermore, by exploiting the common statistical sparsity inherent in different multipath signals, we extend the BCS algorithms to a multi-task version for simultaneously reconstructing multiple signals, thus leading to MT-BCS [15, 16].

The main contributions of this paper are summarised as follows:

  • The BCS-based channel estimation algorithm has been proposed for massive MIMO to address the pilot contamination problem.

  • We have also proposed to enhance the performance of the BCS-based estimator through the principle of thresholding to select the most significant taps to improve the channel estimation accuracy.

  • In addition, we have exploited the common statistical sparsity distribution to enhance the estimation accuracy performance through the proposed MT-BCS-based estimator.

  • To provide a benchmark for the minimum estimation error of BCS and MT-BCS, the Cramer-Rao bound (CRB) has been plotted for BCS, and it has been derived and plotted for MT-BCS.

The remainder of this paper is organized as follows. The multi-cell massive MIMO system model is presented in Section 2. The BCS-based and the MT-BCS-based channel estimation details are reviewed in Sections 3 and 4, respectively. In Section 5, we provide the Cramer-Rao bound analysis. Section 6 presents the simulation results. Finally, conclusions are drawn in Section 7.

The following notation is adopted throughout the paper: \(\mathbb {C}\) denotes the complex number field. For \({A} \in \mathbb {C}\), we have \(A=A_{R}+jA_{I}\), where \(j=\sqrt {-1}\), while \(A_{R}\) and \(A_{I}\) are the real and imaginary parts of A, respectively. For any matrix \(\mathbf{A}\), \(A_{i,j}\) denotes the (i,j)th element. The transpose, inverse and Hermitian transpose operators are denoted by \((\cdot)^{T}\), \((\cdot)^{-1}\) and \((\cdot)^{H}\), respectively. Uppercase bold font is used to denote matrices while lowercase font is used to denote vectors; lowercase and uppercase represent the time domain and the frequency domain, respectively. \(\mathbf{I}\) denotes an identity matrix, \(\text{diag}\{\mathbf {x}\}\) denotes the diagonal matrix with diagonal entries equal to the elements of \(\mathbf{x}\), and \(\hat {X}\) represents the estimate of X. The Frobenius and spectral norms of a matrix \(\mathbf{X}\) are denoted by \(\|\mathbf{X}\|_{F}\) and \(\|\mathbf{X}\|_{2}\), respectively. \(E\{\cdot\}\) denotes expectation with respect to all random variables within the brackets. A Gaussian random variable o is denoted by \(o \sim N(r,q)\), where r is the mean and q is the variance. Also, a random vector \(\mathbf{x}\) having the proper complex Gaussian distribution with mean \(\boldsymbol{\mu}\) and covariance \(\boldsymbol{\Sigma}\) is indicated by \(\mathbf{x} \sim CN(\mathbf {x};\boldsymbol {\mu },\boldsymbol {\Sigma })\), where \(CN(\mathbf {x};\boldsymbol {\mu },\boldsymbol {\Sigma })=\frac {1}{\det(\pi \boldsymbol {\Sigma })} e^{-(\mathbf {x}-\boldsymbol {\mu })^{H}\boldsymbol {\Sigma }^{-1}(\mathbf {x}-\boldsymbol {\mu })}\); for simplicity, we write \(\mathbf{x} \sim CN(\boldsymbol {\mu },\boldsymbol {\Sigma })\).

2 Massive MIMO system model

We consider a time division duplexing (TDD) multi-cell massive MIMO system with C cells as shown in Fig. 1. Each cell comprises M antennas at the base station (BS) and N single-antenna users. To improve the spectral efficiency, orthogonal frequency division multiplexing (OFDM) is adopted [17, 18].

Fig. 1 Illustration of the system model of a multi-cell multi-user massive MIMO

At the beginning of the transmission, all mobile stations in all cells synchronously transmit OFDM pilot symbols to their serving base stations. Let the OFDM pilot symbol of user n in the c-th cell be denoted by \(\mathbf {x}^{n}_{c}=[{X}^{n}_{c}[1]\ {X}^{n}_{c}[2] \cdots {X}^{n}_{c}[K]]^{T}\), where K is the number of subcarriers. The OFDM transmission partitions the multipath channel between the user and each antenna of the BS into K parallel independent additive white Gaussian noise (AWGN) sub-channels in the frequency domain, each associated with a subcarrier. Let \({H}^{n}_{c^{*},c,i}[k]\) denote the k-th sub-channel coefficient between the n-th user in the c-th cell and the i-th antenna of the BS of cell \(c^{*}\) in the uplink.

The signal \({Y}_{c^{*},i}[k]\) received by the i-th antenna element of the BS in cell \(c^{*}\) at the k-th subcarrier can be expressed as

$$\begin{array}{*{20}l} {Y}_{c^{*},i}[k]&= \sum_{n=1}^{N}{H}^{n}_{c^{*},c^{*},i}[k] {X}^{n}_{c^{*}}[k] \\ &+\sum_{c=1, c\neq{c^{*}}}^{C}\sum_{n=1}^{N}{H}^{n}_{c^{*},c,i}[k] {X}^{n}_{c}[k]+V_{c^{*},i}[k], \end{array} $$
(1)

for all \(1\leq i\leq M\) and \(1\leq c^{*}\leq C\), where \({V}_{c^{*},i}[k]\) is the AWGN at the i-th antenna of the BS in cell \(c^{*}\) at the k-th subcarrier. Letting \(\mathbf {y}_{c^{*},i}=[Y_{c^{*},i}[1]\cdots Y_{c^{*},i}[K]]^{T}\), we can write (1) for all subcarriers at the i-th antenna of the BS in cell \(c^{*}\) in the compact form

$$\begin{array}{*{20}l} \mathbf{y}_{c^{*},i}&= \sum_{n=1}^{N}\mathbf{X}^{n}_{c^{*}} \mathbf{h}^{n}_{c^{*},c^{*},i}+ \sum_{c=1, c\neq{c^{*}}}^{C} \sum_{n=1}^{N}\mathbf{X}^{n}_{c} \mathbf{h}^{n}_{c^{*},c,i} \\ &+\mathbf{v}_{c^{*},i}, \end{array} $$
(2)

where \(\mathbf {X}^{n}_{c^{*}}=\text {diag}\{\mathbf {x}^{n}_{c^{*}}\}\), \(\mathbf {h}^{n}_{c^{*},c,i}=[{H}^{n}_{c^{*},c,i}[1]\cdots {H}^{n}_{c^{*},c,i}[K]]^{T}\) and \(\mathbf {v}_{c^{*},i}=[{V}_{c^{*},i}[1]\cdots {V}_{c^{*},i}[K]]^{T} \sim CN(0,{\sigma }_{v}^{2})\). Let \(\mathbf {g}^{n}_{c^{*},c,i}=[g^{n}_{c^{*},c,i}[1] \cdots g^{n}_{c^{*},c,i}[\ell ] \cdots g^{n}_{c^{*},c,i}[L]]^{T}\) collect the samples of the sampled multipath CIR between the n-th user of the c-th cell and the i-th antenna of the BS in cell \(c^{*}\), where L is the number of channel taps and \(g^{n}_{c^{*},c,i}[\ell ]\) corresponds to the \(\ell\)-th channel tap. The K frequency-domain channel coefficients, i.e., \(\mathbf {h}^{n}_{c^{*},c,i}\), can be calculated as the K-point DFT of the CIR samples \(\mathbf {g}^{n}_{c^{*},c,i} \in \mathbb {C}^{L \times 1}\); see, e.g., [18].

Hence,

$$ \mathbf{h}^{n}_{c^{*},c,i}= \mathbf{F} \mathbf{g}^{\prime n}_{c^{*},c,i}, $$
(3)

where \(\mathbf {F} \in \mathbb {C}^{K \times K}\) represents the discrete Fourier transform (DFT) matrix, whose element in row s and column r is given by \(\frac {1}{\sqrt {K}}e^{{-j2 \pi (K-r)(K-s)}/{K}}\), with \(1\leq r\leq K\) and \(1\leq s\leq K\), and \(\mathbf {g}^{\prime n}_{c^{*},c,i}\in \mathbb {C}^{K \times 1}\) is \(\mathbf {g}^{n}_{c^{*},c,i}\in \mathbb {C}^{L \times 1}\) augmented with \(K-L\) zeros. Using (3) in (2), we get

$$\begin{array}{*{20}l} \mathbf{y}_{c^{*},i}&= \sum_{n=1}^{N}\mathbf{X}^{n}_{c^{*}}\mathbf{F} \mathbf{g}^{\prime n}_{c^{*},c^{*},i} +\sum_{c=1, c\neq{c^{*}}}^{C}\sum_{n=1}^{N}\mathbf{X}^{n}_{c} \mathbf{F} \mathbf{g}^{\prime n}_{c^{*},c,i} \\&+\mathbf{v}_{c^{*},i}. \end{array} $$
(4)
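The relation in (3) between the zero-padded CIR and the frequency-domain channel can be illustrated with a small numerical sketch. It assumes a unitary DFT with the usual zero-based index convention (which may differ from the paper's indexing by a permutation of rows and columns); the sizes `K` and `L` and the helper names `unitary_dft_matrix` and `cir_to_frequency_response` are hypothetical and chosen only for illustration.

```python
import numpy as np

def unitary_dft_matrix(K):
    """K x K unitary DFT matrix with entries exp(-j*2*pi*r*s/K)/sqrt(K), r, s = 0..K-1 (assumed convention)."""
    idx = np.arange(K)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / K) / np.sqrt(K)

def cir_to_frequency_response(g, K):
    """Zero-pad an L-tap CIR to length K and multiply by the DFT matrix, in the spirit of (3)."""
    g_padded = np.zeros(K, dtype=complex)
    g_padded[: len(g)] = g              # g' = [g; 0, ..., 0] with K - L zeros
    return unitary_dft_matrix(K) @ g_padded

rng = np.random.default_rng(0)
K, L = 16, 4                            # hypothetical sizes
g = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)

h = cir_to_frequency_response(g, K)     # frequency-domain coefficients
h_fft = np.fft.fft(g, n=K) / np.sqrt(K) # same result via a zero-padded FFT
```

Because the DFT matrix is unitary, the frequency-domain vector carries the same energy as the CIR, and in practice the matrix product would normally be replaced by the FFT call shown in the last line.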

The channel coefficient is modelled as \(g^{n}_{c^{*},c,i}[\ell ]=\sqrt {{\phi }_{c^{*},c,i}[\ell ]}\, {\psi }_{c^{*},c,i}[\ell ]\) for \(1\leq \ell \leq L\), where \({\phi }_{c^{*},c,i}\) models the path loss and shadowing (large-scale fading), while the terms \({\psi }_{c^{*},c,i}[\ell ]\) are assumed to be independent and identically distributed (i.i.d.) unknown random variables with distribution CN(0,1) (small-scale fading) [3].

The received signal of (4) can be re-written as

$$\begin{array}{*{20}l} \mathbf{y}_{c^{*},i}&= \sum_{n=1}^{N}\mathbf{X}^{n}_{c^{*}}\mathbf{F} \mathbf{g}^{\prime n}_{c^{*},c^{*},i} +\mathbf{z}_{c^{*},i}, \end{array} $$
(5)

where the term \(\mathbf {z}_{c^{*},i}= \sum _{c=1, c\neq {c^{*}}}^{C}\sum _{n=1}^{N}\mathbf {X}^{n}_{c} \mathbf {F} \mathbf {g}^{\prime n}_{c^{*},c,i}+\mathbf {v}_{c^{*},i}\) in (5) represents the sum of the inter-cell interference and the receiver noise. The variance \({{\sigma }_{I}^{2}}\) of the inter-cell interference caused during pilot transmission can be expressed as

$$\begin{array}{*{20}l} {\sigma}_{I}^{2}&= E \left\{ \left(\sum_{c=1, c\neq{c^{*}}}^{C}\sum_{n=1}^{N}\mathbf{X}^{n}_{c} \mathbf{F} \mathbf{g}^{\prime n}_{c^{*},c,i}\right) \right.\\ & \left.\quad\times\left(\sum_{c=1, c\neq{c^{*}}}^{C}\sum_{n=1}^{N}\mathbf{X}^{n}_{c} \mathbf{F} \mathbf{g}^{\prime n}_{c^{*},c,i}\right)^{H} \right\}. \end{array} $$
(6)
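As a rough numerical illustration of the interference term in (6), the following sketch estimates the average per-subcarrier interference power by Monte Carlo simulation. It is only a sketch under stated assumptions: unit-modulus random pilots, a unitary DFT, i.i.d. CN(0, φ) taps for every interfering user, and hypothetical values for `K`, `L`, `C`, `N` and `phi_cross`. Under these assumptions the average power per subcarrier works out to (C−1)·N·L·φ/K, which is the value stored in `predicted`.

```python
import numpy as np

rng = np.random.default_rng(2)
K, L = 32, 4          # subcarriers and CIR taps (hypothetical sizes)
C, N = 3, 2           # cells and users per cell (hypothetical)
phi_cross = 0.5       # cross-cell large-scale gain (hypothetical)
trials = 2000

idx = np.arange(K)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / K) / np.sqrt(K)
F_L = F[:, :L]        # only the first L columns matter because g' is zero-padded

power = 0.0
for _trial in range(trials):
    z = np.zeros(K, dtype=complex)
    for _user in range((C - 1) * N):                  # all users of the interfering cells
        x = np.exp(2j * np.pi * rng.random(K))        # unit-modulus pilot (assumption)
        g = np.sqrt(phi_cross / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
        z += x * (F_L @ g)                            # X F g' with X = diag(x)
    power += np.mean(np.abs(z) ** 2)                  # average over subcarriers

empirical = power / trials
predicted = (C - 1) * N * L * phi_cross / K           # value implied by the assumptions above
```

The sketch makes the scaling explicit: the interference power grows with the number of interfering users and with the cross-cell gain, which is the quantity varied in the pilot contamination scenarios of Section 6.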

Defining the measurement matrix \(\mathbf {A}^{n}_{c^{*}}= \mathbf {X}^{n}_{c^{*}}\mathbf {F}\), (5) can be rewritten as

$$ {\mathbf{y}}_{c^{*},i}= \sum_{n=1}^{N}{\mathbf{A}}^{n}_{c^{*}} {\mathbf{g}}^{\prime n}_{c^{*},c^{*},i}+\mathbf{z}_{c^{*},i}. $$
(7)

Based on the physical properties of outdoor electromagnetic propagation, the CIR in wireless communications usually contains only a few significant channel taps, as shown in Fig. 2, i.e., the CIR is sparse. Hence, the number of non-zero taps of the channel is much smaller than the channel length, and CS techniques can be applied for sparse channel estimation. This sparsity can be exploited to reduce the number of channel parameters to be estimated. In this case, we can address the pilot contamination problem by using fewer pilots than unknown channel coefficients [7, 19, 20].
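The sparse measurement model can be sketched numerically for a single user as follows. This is a minimal illustration rather than the simulation setup of the paper: the sizes, the unit-modulus pilots, the random choice of pilot subcarriers (`pilot_rows`) and the lumping of interference and noise into one Gaussian term `z` are all assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(1)
K, L, S, P = 64, 16, 3, 24     # subcarriers, CIR length, non-zero taps, pilots (all hypothetical)

# Sparse CIR: S non-zero taps among the first L entries, zero-padded to length K
g = np.zeros(K, dtype=complex)
support = rng.choice(L, size=S, replace=False)
g[support] = (rng.standard_normal(S) + 1j * rng.standard_normal(S)) / np.sqrt(2)

# Measurement matrix A = X F, with X = diag(x) and F the unitary DFT matrix
idx = np.arange(K)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / K) / np.sqrt(K)
x = np.exp(2j * np.pi * rng.random(K))          # unit-modulus pilot symbols (assumption)
A_full = np.diag(x) @ F

# Using fewer pilots than unknowns: keep only P of the K rows (assumed pilot placement)
pilot_rows = np.sort(rng.choice(K, size=P, replace=False))
A = A_full[pilot_rows, :]

# Interference plus noise lumped into one complex Gaussian term z (assumption)
noise_var = 1e-3
z = np.sqrt(noise_var / 2) * (rng.standard_normal(P) + 1j * rng.standard_normal(P))
y = A @ g + z                                   # P observations of K unknowns, S of them non-zero
```

The resulting system is underdetermined (P < K), so a least squares solution is not unique; it is the sparsity of `g` that makes recovery from `y` feasible.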

Fig. 2 Illustration of a wireless channel with rich scatterers and the resulting sparse channel impulse response

3 BCS-based channel estimation

In the literature, channel estimation methods are commonly classified into parametric and Bayesian approaches. A standard parametric approach is the best linear unbiased estimator, which is often referred to as least squares channel estimation. In contrast to parametric methods, the Bayesian approach treats the desired parameters as random variables with a priori known statistics. That is, the a priori probability density function (PDF) of the channel is assumed to be perfectly known at the receiver [21, 22]. Following the Bayesian channel estimation philosophy, the estimate of the unknown parameters is the expectation of the posterior distribution, which is proportional to the product of the prior probability and the likelihood of the unknown parameters.

In this section, BCS-based channel estimation is presented in the context of massive MIMO channel estimation. Following the general procedure of BCS in [23] and [24], the full posterior distribution over unknown parameters of interest for the problem at hand can be given as

$${} \begin{aligned} P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i},\boldsymbol{\beta},{\sigma}^{2}|\mathbf{y}_{c^{*},i}\right)\,=\,\frac{P\left(\mathbf{y}_{c^{*},i}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i},\boldsymbol{\beta},{\sigma}^{2}\right)P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i},\boldsymbol{\beta},{\sigma}^{2}\right)}{P(\mathbf{y}_{c^{*},i}) }, \end{aligned} $$
(8)

where \(\boldsymbol{\beta}\) represents the hyperparameters that control the sparsity of the channel, while \(\sigma^{2}\) is the sum of the noise variance and the interference variance.

However, the probability of the observation vector, \(P(\mathbf {y}_{c^{*},i})\), which is defined by

$$\begin{array}{*{20}l} P(\mathbf{y}_{c^{*},i})&=\int\int\int P(\mathbf{y}_{c^{*},i}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i},{\sigma}^{2},\boldsymbol{\beta}) \\*& P(\mathbf{g}^{\prime n}_{c^{*},c^{*},i},\boldsymbol{\beta},{\sigma}^{2}) d\mathbf{g}^{\prime} \ d\boldsymbol{\beta} \ d{\sigma}^{2}, \end{array} $$
(9)

cannot be computed analytically. So, the posterior distribution can be decomposed as

$$\begin{array}{*{20}l} P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i},\boldsymbol{\beta},{\sigma}^{2}|\mathbf{y}_{c^{*},i}\right)& \equiv P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i}|\mathbf{y}_{c^{*},i},\boldsymbol{\beta},{\sigma}^{2}\right) \\& P\left(\boldsymbol{\beta},{\sigma}^{2}|\mathbf{y}_{c^{*},i}\right). \end{array} $$
(10)

The first term of (10), \(P\left (\mathbf {g}^{\prime n}_{c^{*},c^{*},i}|\mathbf {y}_{c^{*},i},\boldsymbol {\beta },{\sigma }^{2}\right)\), is the posterior distribution over the channel coefficients and can be expressed based on Bayes’ rule as

$$ P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i}|\mathbf{y}_{c^{*},i},\boldsymbol{\beta},{\sigma}^{2}\right)=\frac{P\left(\mathbf{y}_{c^{*},i}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i},{\sigma}^{2} \right)P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i}|\boldsymbol{\beta}\right)}{P\left(\mathbf{y}_{c^{*},i}|\boldsymbol{\beta},{\sigma}^{2} \right) }. $$
(11)

The posterior distribution given above is Gaussian, with mean \(\boldsymbol {\mu }^{n}_{c^{*},c^{*},i}\) and covariance \(\boldsymbol {\Sigma }^{n}_{c^{*},c^{*},i}\) given by

$$ \boldsymbol{\mu}^{n}_{c^{*},c^{*},i}= {\sigma}^{-2} \boldsymbol{\Sigma}^{n}_{c^{*},c^{*},i} \left(\mathbf{A}^{n}_{c^{*}}\right)^{H} \mathbf{y}_{c^{*},i}, $$
(12)
$$ \boldsymbol{\Sigma}^{n}_{c^{*},c^{*},i}=\left(\boldsymbol{\zeta}+{\sigma}^{-2} \left(\mathbf{A}^{n}_{c^{*}}\right)^{H} \mathbf{A}^{n}_{c^{*}}\right)^{-1}, $$
(13)

where \(\boldsymbol{\zeta}=\text{diag}\{\beta_{1},\beta_{2},\ldots,\beta_{K}\}\).

The Bayesian channel estimate that minimizes the mean square error (MSE) is the mean of the posterior \(P\left (\mathbf {g}^{\prime n}_{c^{*},c^{*},i}|\mathbf {y}_{c^{*},i},\boldsymbol {\beta },{\sigma }^{2}\right)\), so the estimated channel can be expressed as

$$ \hat{\mathbf{g}}^{\prime n}_{c^{*},c^{*},i}=E\left(P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i}|\mathbf{y}_{c^{*},i},\boldsymbol{\beta},{\sigma}^{2}\right)\right)=\boldsymbol{\mu}^{n}_{c^{*},c^{*},i}. $$
(14)
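For fixed hyperparameters, the posterior mean and covariance are obtained from one linear solve. The sketch below illustrates this step; it uses the Hermitian transpose of the measurement matrix in the mean so that the dimensions agree, which is an assumption made here in line with the standard RVM formulas, and the function name `posterior_moments` and all test values are hypothetical.

```python
import numpy as np

def posterior_moments(A, y, beta, sigma2):
    """Gaussian posterior of the channel for fixed hyperparameters.

    Sigma = (diag(beta) + A^H A / sigma2)^(-1) and mu = Sigma A^H y / sigma2.
    """
    AH = A.conj().T
    Sigma = np.linalg.inv(np.diag(beta) + AH @ A / sigma2)
    mu = Sigma @ (AH @ y) / sigma2
    return mu, Sigma

rng = np.random.default_rng(3)
K = 8
idx = np.arange(K)
A = np.exp(-2j * np.pi * np.outer(idx, idx) / K) / np.sqrt(K)   # unitary toy measurement matrix
g_true = rng.standard_normal(K) + 1j * rng.standard_normal(K)
y = A @ g_true                                                   # noiseless observations

# Nearly flat prior and tiny noise variance: the posterior mean follows the data
mu_flat, _ = posterior_moments(A, y, beta=np.full(K, 1e-8), sigma2=1e-6)
# Very large beta (strong prior towards zero): the posterior mean is shrunk to almost zero
mu_tight, _ = posterior_moments(A, y, beta=np.full(K, 1e8), sigma2=1.0)
```

The two calls show the role of the hyperparameters: a large value of β for a tap forces its estimate towards zero, which is how the method enforces sparsity.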

Now, to obtain the estimated channel \(\hat {\mathbf {g}}^{\prime n}_{c^{*},c^{*},i}\), we need to find the hyperparameters \(\sigma^{2}\) and \(\boldsymbol{\beta}\). They can be obtained from the second term on the right-hand side of (10) by applying a type-II maximum likelihood procedure using an RVM.

Based on Bayes’ theorem, the posterior distribution \(P\left (\boldsymbol {\beta },{\sigma }^{2}|\mathbf {y}_{c^{*},i}\right)\) is proportional to \(P\left (\mathbf {y}_{c^{*},i}|\boldsymbol {\beta },{\sigma }^{2}\right)\) [23]. Then, the type-II maximum likelihood procedure is applied to the logarithm of the marginal likelihood, which is given by

$$ P(\mathbf{y}_{c^{*},i}|\boldsymbol{\beta},{\sigma}^{2})= \int\limits_{-\infty}^{\infty} {P\left(\mathbf{y}_{c^{*},i}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i},{\sigma}^{2}\right)P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i}|\boldsymbol{\beta}\right)} d\mathbf{g}^{\prime}. $$
(15)

Based on the assumption of the RVM approach in [23], the term \(P(\mathbf {g}^{\prime n}_{c^{*},c^{*},i}|\boldsymbol {\beta })\) follows a zero-mean Gaussian distribution and can be expressed as

$$\begin{array}{*{20}l} P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i}|\boldsymbol{\beta}\right) &= (2\pi)^{\frac{-K}{2}} \prod_{k=1}^{K} \beta_{k}^{\frac{1}{2}} \\ & \quad\times exp\left[\frac{-1}{2} \left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i}\right)^{H} \boldsymbol{\zeta}\, \mathbf{g}^{\prime n}_{c^{*},c^{*},i}\right], \end{array} $$
(16)

while the Gaussian likelihood function of \(\mathbf {y}_{c^{*},i}\) can be written as

$${} {\begin{aligned} P\!\left(\mathbf{y}_{c^{*},i}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i},{\sigma}^{2} \right)\,=\, \left(2\pi{\sigma}^{2}\right)^{\frac{-K}{2}}\! exp\left(\!\frac{-1}{2{\sigma}^{2}} ||\mathbf{y}_{c^{*},i}-\mathbf{A}^{n}_{c^{*}}\mathbf{g}^{\prime n}_{c^{*},c^{*},i}||_{2}^{2}\right). \end{aligned}} $$
(17)

By substituting (16) and (17) into (15), the log marginal likelihood can be expressed as

$$\begin{array}{*{20}l} \log P(\mathbf{y}_{c^{*},i}|\boldsymbol{\beta},{\sigma}^{2})= & \log \left\{\left(\frac{1}{2\pi{\sigma}^{2}}\right)^{\frac{K}{2}} \left(\frac{1}{2\pi}\right)^{\frac{K}{2}} \prod_{k=1}^{K} \beta_{k}^{\frac{1}{2}} \right.\\ &\quad\int\limits_{-\infty}^{\infty} exp\left(\frac{-1}{2{\sigma}^{2}}||\mathbf{y}_{c^{*},i}-\mathbf{A}^{n}_{c^{*}}\mathbf{g}^{\prime n}_{c^{*},c^{*},i}||_{2}^{2}\right. \\ &\quad \left.\left. - \frac{1}{2}\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i}\right)^{H} \boldsymbol{\zeta}\, \mathbf{g}^{\prime n}_{c^{*},c^{*},i}\right) d\mathbf{g}^{\prime}\right\}, \end{array} $$
(18)

The update for \(\boldsymbol{\beta}\) is obtained by differentiating the log marginal likelihood with respect to \(\boldsymbol{\beta}\) and equating the derivative to zero, which gives

$$ (\beta_{k})^{ii}=\frac{1-\beta_{k} \left(\Sigma^{n}_{c^{*},c^{*},i}\right)_{kk}}{\left(\mu^{n}_{c^{*},c^{*},i}\right)_{k}^{2}}. $$
(19)

Similarly, \(\sigma^{2}\) is obtained by differentiating the log marginal likelihood with respect to \(\sigma^{2}\) and setting the derivative to zero, which gives

$$ (\sigma^{2})^{ii}=\frac {||\mathbf{y}_{c^{*},i}-\mathbf{A}^{n}_{c^{*}} \mathbf{g}^{\prime n}_{c^{*},c^{*},i}||_{2}^{2}} {(M-I+\sum_{k=1}^{K} \beta_{k})}. $$
(20)

The values of \(\beta_{k}\) and \(\sigma^{2}\) that maximize the log marginal likelihood are then found iteratively: \(\boldsymbol{\beta}\) and \(\sigma^{2}\) are set to initial values, and \(\boldsymbol {\mu }^{n}_{c^{*},c^{*},i}\) and \(\boldsymbol {\Sigma }^{n}_{c^{*},c^{*},i}\) are computed from (12) and (13). These values are then used to calculate new estimates of \(\beta_{k}\) and \(\sigma^{2}\), and the procedure is repeated until a convergence criterion is met.

Further details of the BCS algorithm can be found in [23, 24]. The procedure for implementation of the proposed technique is summarized in Algorithm 1.
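The iterative procedure described above can be sketched as follows. This is not the paper's Algorithm 1 but a minimal illustration that follows the standard RVM re-estimation rules of [23] for a single user: the noise update `sigma2 = ||y - A mu||^2 / (P - sum(gamma))`, the pruning threshold `prune`, the initial values and all problem sizes are assumptions made here, and the function name `bcs_estimate` is hypothetical.

```python
import numpy as np

def bcs_estimate(A, y, n_iter=300, tol=1e-6, prune=1e6):
    """Sparse Bayesian estimate of g from y = A g + z using RVM-style updates (sketch)."""
    P, K = A.shape
    beta = np.ones(K)                               # sparsity hyperparameters (assumed initial value)
    sigma2 = 0.1 * np.mean(np.abs(y) ** 2)          # noise-plus-interference variance (assumed start)
    mu = np.zeros(K, dtype=complex)
    for _ in range(n_iter):
        keep = beta < prune                         # taps with very large beta are fixed to zero
        Ak = A[:, keep]
        AkH = Ak.conj().T
        Sigma = np.linalg.inv(np.diag(beta[keep]) + AkH @ Ak / sigma2)   # posterior covariance
        mu_new = np.zeros(K, dtype=complex)
        mu_new[keep] = Sigma @ (AkH @ y) / sigma2                        # posterior mean
        gamma = np.clip(1.0 - beta[keep] * np.real(np.diag(Sigma)), 1e-12, 1.0)
        beta[keep] = gamma / np.maximum(np.abs(mu_new[keep]) ** 2, 1e-30)
        resid = np.linalg.norm(y - A @ mu_new) ** 2
        sigma2 = max(resid / max(P - gamma.sum(), 1e-12), 1e-12)         # standard RVM noise update
        converged = np.linalg.norm(mu_new - mu) <= tol * max(np.linalg.norm(mu_new), 1e-30)
        mu = mu_new
        if converged:
            break
    return mu, beta, sigma2

# Toy example: 3 non-zero taps, 48 pilots, 64 unknowns (all values hypothetical)
rng = np.random.default_rng(4)
K, S, P = 64, 3, 48
g = np.zeros(K, dtype=complex)
g[rng.choice(16, size=S, replace=False)] = (
    rng.standard_normal(S) + 1j * rng.standard_normal(S)) / np.sqrt(2)
idx = np.arange(K)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / K) / np.sqrt(K)
x = np.exp(2j * np.pi * rng.random(K))
A = (np.diag(x) @ F)[np.sort(rng.choice(K, size=P, replace=False)), :]
y = A @ g + np.sqrt(1e-4 / 2) * (rng.standard_normal(P) + 1j * rng.standard_normal(P))

g_hat, beta_hat, sigma2_hat = bcs_estimate(A, y)
nmse = np.linalg.norm(g_hat - g) ** 2 / np.linalg.norm(g) ** 2
```

Pruning taps whose hyperparameter exceeds a threshold is a common implementation device for keeping the matrix inversion well conditioned; it is a design choice of this sketch, not something stated in the text.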

The performance of the conventional BCS-based estimator can be further improved through the principle of thresholding, which keeps only the most significant taps. The proposed algorithm retains the channel taps whose energy is above a threshold value ϱ and sets the other taps to zero. The value of ϱ is the energy of the channel impulse response.
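The thresholding step can be sketched in a few lines. The function name `threshold_taps`, the example estimate and the numerical threshold are hypothetical; in particular, how the threshold is derived from the CIR energy is not specified here and is left as an input.

```python
import numpy as np

def threshold_taps(g_hat, rho):
    """Keep the taps whose energy |g|^2 exceeds rho and set the others to zero."""
    energy = np.abs(g_hat) ** 2
    return np.where(energy > rho, g_hat, 0)

# Hypothetical estimate with two significant taps and two weak, noise-like taps
g_hat = np.array([1.0 + 0.0j, 0.05j, -0.8 + 0.0j, 0.01 + 0.0j])
g_thr = threshold_taps(g_hat, rho=0.01)   # tap energies: 1.0, 0.0025, 0.64, 0.0001
```

With this threshold only the first and third taps survive, so the noise and interference carried by the weak taps are removed from the estimate.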

4 Multi-task BCS based channel estimation

With a high probability of user movements, the massive MIMO system channel may vary. Consequently, the channels at different time instants/locations are different but share the same common statistical property. As a result, to estimate the current channel, we can exploit the previous compressive vectors in addition to the current compressive vector [15].

Given the system model in Section 2, the received signal in (7) can be formulated as

$$ \mathbf{y}_{c^{*},i,j}= \sum_{n=1}^{N}\mathbf{A}^{n}_{c^{*},j} \mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}+ \mathbf{z}_{c^{*},i,j}, $$
(21)

for \(j=1,2,\ldots,J\), where J is the number of tasks, and \(\mathbf {A}^{n}_{c^{*},j}\), \(\mathbf {g}^{\prime n}_{c^{*},c^{*},i,j}\) and \(\mathbf {z}_{c^{*},i,j}\) represent the j-th measurement matrix, channel vector and noise vector, respectively [15].

The main target is to estimate the channel \(\mathbf {g}^{\prime n}_{c^{*},c^{*},i,j}\) which can be computed based on Bayesian channel estimation philosophy as the mean of the channel posterior distribution that can be represented as

$$ \hat{\mathbf{g}}^{\prime n}_{c^{*},c^{*},i,j}=E(P(\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}|\mathbf{y}_{c^{*},i,j},\boldsymbol{\Xi}_{j},{\xi}_{0})), $$
(22)

where \(\xi_{0}\) represents the inverse of the sum of the noise variance and the interference variance, while \(\boldsymbol{\Xi}_{j}\) represents the hyperparameters that control the sparsity of the channel. Based on Bayes’ rule, the posterior distribution can be given as

$$\begin{array}{*{20}l} &P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}|\mathbf{y}_{c^{*},i,j},\boldsymbol{\Xi}_{j},{\xi}_{0}\right) \\ &\quad=\frac{P\left(\mathbf{y}_{c^{*},i,j}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j},{\xi}_{0}\right) P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}|\boldsymbol{\Xi}_{j}\right)}{{\int} P\left(\mathbf{y}_{c^{*},i,j}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j},{\xi}_{0}\right) P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}|\boldsymbol{\Xi}_{j}\right)d\mathbf{g}^{\prime} } \\ &\quad\sim N\left(\boldsymbol{\mu}^{n}_{c^{*},i,j},\boldsymbol{\Sigma}^{n}_{c^{*},i,j}\right), \end{array} $$
(23)

where the mean and covariance are given by

$$ \boldsymbol{\mu}^{n}_{c^{*},i,j}= {\xi}_{0} \boldsymbol{\Sigma}^{n}_{c^{*},i,j} \left(\mathbf{A}^{n}_{c^{*},j}\right)^{H} \mathbf{y}_{c^{*},i,j}, $$
(24)
$$ \boldsymbol{\Sigma}^{n}_{c^{*},i,j}=\left(\boldsymbol{\psi}+{\xi}_{0} (\mathbf{A}^{n}_{c^{*},j})^{H} \mathbf{A}^{n}_{c^{*},j}\right)^{-1}, $$
(25)

where \(\boldsymbol{\psi}=\text{diag}(\psi_{0},\psi_{1},\psi_{2},\ldots,\psi_{K})\).

The likelihood function of the parameters \(\mathbf {g}^{\prime n}_{c^{*},c^{*},i,j}\) and \(\xi_{0}\) given the received signal \(\mathbf {y}_{c^{*},i,j}\) can be expressed as

$$\begin{array}{*{20}l}{} P\left(\mathbf{y}_{c^{*},i,j}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j},{\xi}_{0}\right)&= \left(\frac{2\pi}{{\xi}_{0}}\right)^{\frac{-N}{2}}\\ &exp\left(\frac{-{\xi}_{0}}{2}||\mathbf{y}_{c^{*},i,j}-\mathbf{A}^{n}_{c^{*},j}\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}||_{2}^{2}\right). \end{array} $$
(26)

The channel coefficients \(\mathbf {g}^{\prime n}_{c^{*},c^{*},i,j}\) are assumed to be drawn from a product of zero-mean Gaussian distributions that are shared by all tasks as follows

$$\begin{array}{*{20}l} P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}|\boldsymbol{\Xi}_{j}\right)&= \prod_{i=1}^{N} N{\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}|0,\boldsymbol{\Xi}_{j}^{-1}\right)} \\& =(2\pi)^{\frac{-N}{2}} \prod_{i=1}^{N} \boldsymbol{\Xi}_{j}^{\frac{1}{2}} \\& \quad\times exp\left[{\frac{-1}{2} \left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}\right)^{H} \boldsymbol{\Xi}_{j}\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}}\right]. \end{array} $$
(27)

To obtain the estimated channel, we need to estimate \(\boldsymbol{\Xi}_{j}\) and \(\xi_{0}\) by applying the same procedure as in Section 3 to the marginal likelihood \(P\left (\mathbf {y}_{c^{*},i,j}|\boldsymbol {\Xi }_{j},{\xi }_{0}\right)\), which can be inferred as [16]

$$\begin{array}{*{20}l} P\left(\mathbf{y}_{c^{*},i,j}|\boldsymbol{\Xi}_{j},{\xi}_{0}\right) & = \int P\left(\mathbf{y}_{c^{*},i,j}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j},{\xi}_{0}\right) \\*& \quad\times P\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}|\boldsymbol{\Xi}_{j}\right) d\mathbf{g}^{\prime}. \end{array} $$
(28)

Now, differentiating the log marginal likelihood with respect to \(\boldsymbol{\Xi}_{j}\) and \(\xi_{0}\) and setting the derivatives to zero yields

$$ (\boldsymbol{\Xi}_{j})^{new}=\frac{J-\boldsymbol{\Xi}_{j} \sum_{j=1}^{J} \boldsymbol{\Sigma}^{n}_{c^{*},c^{*},i,j}}{\sum_{j=1}^{J}\left(\boldsymbol{\mu}^{n}_{c^{*},c^{*},i,j}\right)^{2}}, $$
(29)
$$ ({\xi}_{0})^{new}=\frac{\sum_{j=1}^{J}\left(K-J+\sum_{i=1}^{J} \boldsymbol{\Sigma}^{n}_{c^{*},c^{*},i,j} \boldsymbol{\Xi}_{j}\right)}{\sum_{j=1}^{J}||\mathbf{y}_{c^{*},i,j}-\mathbf{A}^{n}_{c^{*},j} \mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}||_{2}^{2}}. $$
(30)

Further information on MT-BCS can be found in [16].
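The multi-task updates can be sketched in the same style as the single-task case. The sketch below is an illustration under assumptions, not the authors' implementation: it treats the shared hyperparameters as a diagonal prior precision and ξ0 as the noise precision, in the manner of the multi-task CS formulation referred to in [16], follows the structure of (24), (25), (29) and (30), and adds a pruning threshold for numerical stability. The function name `mt_bcs_estimate`, the initial values and all sizes are hypothetical.

```python
import numpy as np

def mt_bcs_estimate(A_list, y_list, n_iter=300, prune=1e6):
    """Joint sparse Bayesian estimation of J channels sharing one sparsity pattern (sketch)."""
    J = len(A_list)
    K = A_list[0].shape[1]
    xi = np.ones(K)                                               # shared sparsity hyperparameters
    xi0 = 10.0 / np.mean([np.mean(np.abs(y) ** 2) for y in y_list])  # noise precision (assumed start)
    mus = [np.zeros(K, dtype=complex) for _ in range(J)]
    for _ in range(n_iter):
        keep = xi < prune                                         # pruned taps stay at zero
        n_keep = int(keep.sum())
        sum_diag = np.zeros(n_keep)
        sum_mu2 = np.zeros(n_keep)
        resid, dof = 0.0, 0.0
        for j, (A, y) in enumerate(zip(A_list, y_list)):
            Ak = A[:, keep]
            AkH = Ak.conj().T
            Sigma = np.linalg.inv(np.diag(xi[keep]) + xi0 * (AkH @ Ak))   # cf. (25)
            mu = xi0 * (Sigma @ (AkH @ y))                                # cf. (24)
            d = np.real(np.diag(Sigma))
            sum_diag += d
            sum_mu2 += np.abs(mu) ** 2
            resid += np.linalg.norm(y - Ak @ mu) ** 2
            dof += A.shape[0] - n_keep + np.sum(xi[keep] * d)
            mus[j] = np.zeros(K, dtype=complex)
            mus[j][keep] = mu
        num = np.clip(J - xi[keep] * sum_diag, 1e-12, J)
        xi[keep] = num / np.maximum(sum_mu2, 1e-30)                       # cf. (29)
        xi0 = min(max(dof, 1e-12) / max(resid, 1e-30), 1e12)              # cf. (30)
    return mus, xi, xi0

# Toy example: J = 3 tasks with a common support but different tap values (hypothetical sizes)
rng = np.random.default_rng(5)
K, S, P, J = 64, 3, 32, 3
idx = np.arange(K)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / K) / np.sqrt(K)
support = rng.choice(16, size=S, replace=False)
A_list, y_list, g_list = [], [], []
for _ in range(J):
    g = np.zeros(K, dtype=complex)
    g[support] = (rng.standard_normal(S) + 1j * rng.standard_normal(S)) / np.sqrt(2)
    x = np.exp(2j * np.pi * rng.random(K))
    A = (np.diag(x) @ F)[np.sort(rng.choice(K, size=P, replace=False)), :]
    noise = np.sqrt(1e-4 / 2) * (rng.standard_normal(P) + 1j * rng.standard_normal(P))
    A_list.append(A)
    y_list.append(A @ g + noise)
    g_list.append(g)

g_hats, xi_hat, xi0_hat = mt_bcs_estimate(A_list, y_list)
nmse_per_task = [np.linalg.norm(gh - g) ** 2 / np.linalg.norm(g) ** 2
                 for gh, g in zip(g_hats, g_list)]
```

Because the hyperparameters are shared, evidence for a tap in any one task helps to keep that tap active in all tasks, which is the mechanism by which the multi-task formulation exploits the common sparsity.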

5 CRB for BCS-based estimator

In this section, we analyse the CRB for the proposed BCS and MT-BCS based channel estimation techniques to provide a benchmark for the minimum estimation error that can be achieved by the proposed algorithm. The CRB on the covariance of any estimator \(\hat {\boldsymbol \theta }\) can be given as

$$\begin{array}{*{20}l} E\left\{(\hat{\boldsymbol {\theta}}-\boldsymbol \theta)(\hat{\boldsymbol \theta}-\boldsymbol \theta)^{H}\right\} \geq J^{-1}(\boldsymbol \theta), \end{array} $$
(31)

where J(θ) is the Fisher information matrix (FIM) corresponding to the observation f, and can be given as

$$\begin{array}{*{20}l} J(\boldsymbol \theta)= E\left\{\left(\frac{\partial }{\partial {\boldsymbol \theta}} log l(\boldsymbol \theta,f)\right)\left(\frac{\partial }{\partial {\boldsymbol \theta}} log l(\boldsymbol \theta,f)\right)^{T}\right\}, \end{array} $$
(32)

where l(θ,f) is the likelihood function corresponding to the observation f, parameterized by θ [25].

Therefore, given the system model in Section 2, the closed-form expression of the Bayesian CRB (BCRB) for the proposed BCS can be given as

$$\begin{array}{*{20}l} J(\mathbf{g}^{\prime n}_{c^{*},c^{*},i})\geq \left(\frac{1}{\boldsymbol{\beta}}+\frac{\mathbf{A}^{n}_{c^{*}}(\mathbf{A}^{n}_{c^{*}})^{H}}{{\sigma}^{2}}\right)^{-1}. \end{array} $$
(33)

Theorem 1

Given (28), the closed form expression of the BCRB for the proposed MT-BCS can be given as

$$\begin{array}{*{20}l} J\left(\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}\right)\geq \left(\frac{1}{\boldsymbol{\Xi}_{j}}+\frac{\mathbf{A}^{n}_{c^{*},j}\left(\mathbf{A}^{n}_{c^{*},j}\right)^{H}}{{{\xi}_{0}}}\right)^{-1}. \end{array} $$
(34)

Proof

See Appendix 1. □

6 Simulation results

The simulation parameters used to verify our analytical results are as follows: the number of antennas is 100, the number of users is 100, the number of channel taps is 500, the number of subcarriers K is 4096 and the convergence threshold δ is \(10^{-6}\). The simulation results are obtained by averaging over 1000 realizations.

To compare the accuracy of the channel estimation techniques, the normalized mean square error (MSE) is used for performance evaluation and is computed as

$$ MSE= \frac{||\hat{\mathbf{g}}^{\prime n}_{c^{*},c,i,j}-\mathbf{g}^{\prime n}_{c^{*},c,i,j}||_{2}^{2}}{||\mathbf{g}^{\prime n}_{c^{*},c,i,j}||^{2}_{2}}. $$
(35)
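The normalized MSE in (35) is straightforward to compute; a minimal sketch is given below, with the function name `normalized_mse` and the example vectors chosen here for illustration only.

```python
import numpy as np

def normalized_mse(g_hat, g):
    """Normalized MSE: ||g_hat - g||_2^2 / ||g||_2^2."""
    return np.linalg.norm(g_hat - g) ** 2 / np.linalg.norm(g) ** 2

g = np.array([1.0 + 1.0j, 0.0, 0.5j, 0.0])    # hypothetical true channel
perfect = normalized_mse(g, g)                # a perfect estimate gives 0
all_zero = normalized_mse(np.zeros_like(g), g)  # the all-zero estimate gives 1
```

An all-zero estimate yields a normalized MSE of one, which is a convenient reference level: any useful estimator should fall below it.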

Figure 3 compares the MSE performance of the BCS-based channel estimator in three scenarios: small pilot contamination (\({\phi _{c^{*},c^{*},i}}=1\) and \({\phi _{c^{*},c,i}}=0.1\)), strong pilot contamination (\({\phi _{c^{*},c^{*},i}}=1\) and \({\phi _{c^{*},c,i}}=0.5\)) and very strong pilot contamination (\({\phi _{c^{*},c^{*},i}}=1\) and \({\phi _{c^{*},c,i}}=0.9\)). The regularized least squares (R-LS) estimator with no pilot contamination is included as a benchmark and the BCRB for BCS as a reference line. The results show that, compared with R-LS, the proposed technique significantly improves the estimation accuracy and addresses the pilot contamination problem for SNR values from −40 to 40 dB. This is a result of exploiting the prior statistical information on the channel sparsity. Furthermore, the results still show enhanced estimation performance at high SNR.

Fig. 3 MSE performance comparison between BCS, BCRB for \(\phi _{c^{*},c,i}=\{0.1,0.5,0.9\}\) and R-LS versus SNR

Figure 4 shows the MSE performance versus SNR for different numbers of subcarriers, K={100, 200, 300}, so that the compression ratio (CR), i.e., L/K, is CR={0.2, 0.1, 0.06}, respectively. The experiment is run under small pilot contamination (\({\phi _{c^{*},c^{*},i}}=1\) and \({\phi _{c^{*},c,i}}=0.1\)). The results show that the estimation accuracy improves as the number of subcarriers decreases, i.e., as the CR increases.

Fig. 4 MSE of BCS for K={100, 200, 300} and CR={0.2, 0.1, 0.06}, respectively

Figure 5 shows the MSE of the BCS-based channel estimation versus SNR for three different numbers of antennas at the base station, M={100, 200, 300}, with the system under strong pilot contamination (\({\phi _{c^{*},c^{*},i}}=1\) and \({\phi _{c^{*},c,i}}=0.7\)). The results show that the estimation accuracy of the proposed algorithm is enhanced by increasing the number of antennas. Thus, according to the law of large numbers, more coordinated BS antennas could provide more accurate support estimation.

Fig. 5 MSE of BCS for M={100, 200, 300} versus SNR

Figure 6 shows the MSE performance versus SNR for BCS with different numbers of pilots: 1000, 500, 100, 50 and 25, where the number of subcarriers K is 4096. The number of CIR paths is 500 and the experiments are run under strong pilot contamination. When the number of pilots is at least the number of channel taps (i.e., 1000 and 500), BCS provides poor estimation accuracy, whereas when the number of pilots is less than 500 (i.e., 100, 50 and 25), the estimation accuracy is enhanced significantly. In addition, there is no significant difference between the cases of 100, 50 and 25 pilots. In these cases, we can address pilot contamination by employing a small number of pilots, e.g., 25.

Fig. 6

MSE performance comparison of the BCS-based estimator versus SNR for different numbers of pilots (100, 50, and 10)

Figure 7 compares the MSE performance versus SNR of BCS, thresholded BCS, MT-BCS, LS, OMP, and bilinear approximate message passing (Bi-AMP) [26]. The number of subcarriers K is 1024 and the number of CIR paths is 100. The results show that the proposed MT-BCS performs significantly better than all the other estimators because it exploits the statistical prior information on a large scale. However, this advantage comes at the expense of a relatively high complexity of BCS and MT-BCS compared with the other estimators, as shown in Table 1, which compares the computational complexity of Bi-AMP [26], BCS [23], OMP [27], LS [28], and MT-BCS [16]. The results also show that thresholding improves the estimation accuracy of conventional BCS, because the CIR contains many taps with no significant energy. Setting a threshold and neglecting these taps eliminates a large part of the noise and of the interference from pilot contamination.

Fig. 7

MSE performance comparison of BCS, thresholded BCS, LS, MT-BCS, OMP, and Bi-AMP-based estimators versus SNR
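
The thresholding step discussed for Fig. 7 can be sketched as follows. The function names `threshold_cir` and `mse`, the threshold value, and the toy five-tap channel are assumptions for illustration; the text does not specify how the threshold is chosen.

```python
def threshold_cir(taps, threshold):
    """Zero every CIR tap whose magnitude is below `threshold`."""
    return [tap if abs(tap) >= threshold else 0j for tap in taps]


def mse(estimate, reference):
    """Mean squared error between two equal-length complex sequences."""
    return sum(abs(e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)


# ASSUMPTION: hypothetical sparse channel with two significant taps, plus a
# hand-picked noisy estimate of it (values are illustrative, not from the paper).
true_cir = [1 + 0j, 0j, 0j, 0.5j, 0j]
noisy_cir = [1.05 + 0.02j, 0.03 - 0.04j, -0.02 + 0.01j, 0.02 + 0.48j, 0.04j]

cleaned_cir = threshold_cir(noisy_cir, threshold=0.1)  # threshold value is an assumption
support = [i for i, tap in enumerate(cleaned_cir) if tap != 0]

print("support:", support)
print("MSE before thresholding:", mse(noisy_cir, true_cir))
print("MSE after thresholding: ", mse(cleaned_cir, true_cir))
```

In this toy case the taps that carry no significant energy contribute only noise, so zeroing them lowers the MSE while the two significant taps are kept unchanged.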

Table 1 Complexity analysis

7 Conclusions

To address the pilot contamination problem in massive MIMO systems, we proposed a BCS-based channel estimation algorithm for multi-cell multi-user massive MIMO. The simulation results show that the BCS-based algorithm improves significantly on conventional channel estimation algorithms and can mitigate pilot contamination. Furthermore, the proposed technique can be enhanced by thresholding the CIR at a certain value and by exploiting the common sparsity inherent in the system channel. In addition, the number of antennas and the compression ratio should be selected carefully to achieve the best estimation accuracy.

8 Appendix 1: Proof of Theorem 1

Following Section 5, we can write the FIM as

$$ J(\mathbf{y}_{c^{*},i,j})\geq - E\left(\frac{\partial^{2} \log (P_{\mathbf{y}_{c^{*},i,j}|\boldsymbol{\Xi}_{j},{\xi}_{0}}(P(\mathbf{y}_{c^{*},i,j}|\boldsymbol{\Xi}_{j},{\xi}_{0}))) }{\partial^{2}{\mathbf{g}^{\prime}}}\right)^{-1} $$
(36)

Based on Bayes’ rule in (32), the FIM can be decomposed into two terms

$$\begin{array}{@{}rcl@{}} & - E\left(\frac{\partial^{2} \log (P_{\mathbf{y}_{c^{*},i,j}|\boldsymbol{\Xi}_{j},{\xi}_{0}}(P(\mathbf{y}_{c^{*},i,j}|\boldsymbol{\Xi}_{j},{\xi}_{0}))) }{{\partial^{2}{\mathbf{g}^{\prime}}}}\right)= \\ & -E\left(\frac{\partial^{2} \log(P_{\mathbf{y}_{c^{*},i,j}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j},{\xi}_{0}}(P(\mathbf{y}_{c^{*},i,j}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j},{\xi}_{0}))) }{\partial^{2}{\mathbf{g}^{\prime}}}\right) \\ & -E\left(\frac{\partial^{2} \log(P_{\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}|\boldsymbol{\Xi}_{j}}(P(\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}|\boldsymbol{\Xi}_{j})))}{\partial^{2}{\mathbf{g}^{\prime}}}\right), \end{array} $$
(37)

using (28), the first term can be computed as follows

$$\begin{array}{*{20}l} - \frac{\partial^{2} \log(P_{\mathbf{y}_{c^{*},i,j}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j},{\xi}_{0}}(P(\mathbf{y}_{c^{*},i,j}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j},{\xi}_{0})))} {\partial^{2}{\mathbf{g}^{\prime}}}= \\ \frac{\partial}{{\partial{\mathbf{g}^{\prime}}}} \left[-\log(2\pi)^{\frac{1}{2}}{\xi}_{0}^{-1}- \frac{\xi_{0}}{2}\|\mathbf{y}_{c^{*},i,j}-\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}\mathbf{A}^{n}_{c^{*},j}\|_{2}^{2}\right], \end{array} $$
(38)
$$ \frac{\partial^{2} \log(P_{\mathbf{y}_{c^{*},i,j}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j},{\xi}_{0}}(P(\mathbf{y}_{c^{*},i,j}|\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j},{\xi}_{0}))) }{\partial^{2}{\mathbf{g}^{\prime}}}= \frac{\mathbf{A}^{n}_{c^{*},j}(\mathbf{A}^{n}_{c^{*},j})^{H}}{{\xi}_{0}}. $$
(39)

Applying the same procedure as in (38) and (39) to the second term of (37) gives

$$\begin{array}{*{20}l} E\left(\frac{\partial^{2} \log(P_{\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}|\boldsymbol{\Xi}_{j}}(P(\mathbf{g}^{\prime n}_{c^{*},c^{*},i,j}|\boldsymbol{\Xi}_{j})))}{\partial^{2}{\mathbf{g}^{\prime}}}\right)=(\boldsymbol{\Xi}_{j})^{-1}. \end{array} $$
(40)