1 Introduction

In 5G cellular communication, millimeter wave (mmWave) technology has been drawing enormous attention from academia, industry, and government due to its wide available spectrum [1, 2]. However, challenges also exist in the application of mmWave [3, 4]. As the frequency increases, the various attenuation losses of the mmWave channel, for example, the path loss and shadowing effects, become more hostile. One solution to the power loss problem in the channel is to use large-scale MIMO antenna arrays: both the base station (BS) and the mobile station (MS) are equipped with large arrays to obtain sufficient transceiver gain. However, another challenge is that the conventional training overhead for channel state information (CSI) acquisition grows proportionally with the BS and MS antenna sizes. Fortunately, the massive MIMO channel exhibits high correlation due to the finite number of scatterers in the propagation path, so its effective dimension is much smaller than its actual dimension [5]. Hence, under the uniform linear array (ULA) assumption, the mmWave channel can be approximately transformed into a sparse representation under the discrete Fourier transform (DFT) basis when large-scale antenna arrays are employed [6,7,8]. Compressive sensing (CS) theory, developed by Candes et al. [9] and Donoho [10], is very useful when signals are sparse or compressible, and several algorithms have been presented in [5, 11,12,13] to exploit such hidden sparsity. These methods achieve a significant reduction in the training overhead, but they still have limitations. For example, in the expectation–maximization (EM) sparse Bayesian learning (SBL) algorithm, the computational complexity increases rapidly with the signal dimension [14], which poses great challenges for practical implementation. Another common problem is that although sparse algorithms can achieve excellent estimation performance for sparse signals, a grid mismatch issue remains.
Some super-resolution (off-grid) compressed sensing methods and two-stage channel estimation algorithms have been applied to improve channel estimation accuracy [14,15,16,17,18,19,20], aiming to overcome the grid mismatch caused by conventional on-grid compressed sensing techniques. However, super-resolution dictionary learning algorithms have two main shortcomings: the convergence of the results cannot be guaranteed, and a large amount of prior training is required for the sparse dictionary. These two shortcomings limit their practical design and implementation.

In this paper, we adopt a two-step algorithm. The first step is a coarse channel estimate obtained by the SBL algorithm, whose operations are simplified using damped generalized approximate message passing (DGAMP). The second step is an accurate angle estimate obtained by interpolating three 2-D DFT spectral lines. The contributions of this paper are summarized as follows.

  • We formulate a sparse recovery problem and develop a DGAMP-SBL algorithm for coarse channel estimation of the hybrid MIMO system. Based on this algorithm, we accelerate the computation of the original EM sparse Bayesian learning algorithm for mmWave channel estimation. The complexity analysis shows that the operation time is significantly reduced compared with the original SBL algorithm when large-scale transmit and receive antenna arrays are employed.

  • To address the off-grid problem from which sparse-reconstruction-based channel estimation schemes suffer, we develop a fast refinement algorithm that requires no iteration. In the discrete Fourier transform domain, the influence of Gaussian noise decreases as the number of antennas increases, which helps obtain a more accurate angle estimate.

The rest of the paper is organized as follows. In Sect. 2, we outline our algorithm and briefly explain the operation steps. The system and channel model are discussed in Sect. 3. We formulate the sparse recovery-based channel estimation with the proposed DGAMP-SBL algorithm and obtain a more accurate channel estimate using a refinement algorithm in Sect. 4. The performance of the proposed algorithm is compared with that of existing algorithms, and its superiority is demonstrated in Sect. 5. Finally, a conclusion is given in Sect. 6.

The notations related to this paper are shown in Table 1.

Table 1 Related notations

2 Method

In this section, an off-grid channel model is employed for the sparse representation of massive mmWave MIMO systems, and we propose a computationally efficient and accurate mmWave channel estimation algorithm. Generalized approximate message passing (GAMP) is an extension of the approximate message passing algorithm [21] to more general scenarios involving arbitrary priors and observation noise [22]. We use a GAMP-based low-complexity SBL algorithm to avoid high-dimensional matrix inversion. This algorithm can be embedded in the expectation–maximization (EM) framework to approximate the true posterior distribution of the sparse channel vector. We then develop a fast refinement algorithm that utilizes the interpolation of three 2-D DFT spectral lines to improve the accuracy of the estimate. Notably, the refined algorithm obtains accurate estimates of the angles of arrival/departure (AoAs/AoDs) without iteration. Consequently, we obtain both computational efficiency and estimation precision. Finally, the lower bound on the normalized mean-squared error of the proposed algorithm is derived analytically: the bound is obtained by the least squares (LS) algorithm under the assumption that the AoAs/AoDs are known, and the channel is then estimated using LS. The results indicate that our proposed algorithm approaches this lower bound at high signal-to-noise ratios.

3 System model

In this paper, we consider a typical hybrid mmWave MIMO system consisting of a BS and an MS, where the BS and MS are equipped with \({N_t}\) and \({N_r}\) antennas, respectively. Let \(N_t^{\mathrm{RF}}\) and \(N_r^{\mathrm{RF}}\) be the numbers of transmitter radio frequency (RF) chains and receiver RF chains, respectively. We consider a hybrid analog–digital communication system that has inherent sparse characteristics, as shown in Fig. 1. For mmWave massive MIMO with hybrid precoding, the number of RF chains is far smaller than the number of antennas, i.e., \(N_t^{\mathrm{RF}}< {N_t},\;N_r^{\mathrm{RF}} < {N_r}\) [23,24,25]. We assume that \({{\mathbf {F}}} = {{{\mathbf {F}}}_{\mathrm{RF}}}{{{\mathbf {F}}}_{\mathrm{BB}}}\) is the \({N_t} \times N_s^t\) hybrid precoding matrix at the BS, where \({{\mathbf {F}}_{\mathrm{RF}}} \in {{\mathbb {C}}^{{N_t} \times N_t^{\mathrm{RF}}}}\) and \({{{\mathbf {F}}}_{\mathrm{BB}}} \in {{\mathbb {C}}^{N_t^{\mathrm{RF}} \times N_s^t}}\) denote the analog and digital precoders, respectively. \({{\mathbf {W}}} = {{{\mathbf {W}}}_{\mathrm{RF}}}{{{\mathbf {W}}}_{\mathrm{BB}}} \in {{\mathbb {C}}^{{N_r} \times N_s^r}}\) is the hybrid combiner, where \({{\mathbf {W}}_{\mathrm{RF}}} \in {{\mathbb {C}}^{{N_r} \times N_r^{\mathrm{RF}}}}\) and \({{{\mathbf {W}}}_{\mathrm{BB}}} \in {{\mathbb {C}}^{N_r^{\mathrm{RF}} \times N_s^r}}\) denote the analog and digital combiners, respectively. The received signal \({{\mathbf {r}}} \in {{\mathbb {C}}^{N_s^r \times 1}}\) can be written as

$$\begin{aligned} \begin{aligned} {{\mathbf {r}}}&= {{\mathbf {W}}}_{\mathrm{BB}}^H{{\mathbf {W}}}_{\mathrm{RF}}^H{{\mathbf {H}}}{{{\mathbf {F}}}_{\mathrm{RF}}}{{{\mathbf {F}}}_{\mathrm{BB}}}{{\mathbf {s}}} \\&\quad + {{\mathbf {W}}}_{\mathrm{BB}}^H{{\mathbf {W}}}_{\mathrm{RF}}^H{{\mathbf {n}}} \\&= {{{\mathbf {W}}}^H}{\mathbf {HFs}} + {{{\mathbf {W}}}^H}{{\mathbf {n}}} , \\ \end{aligned} \end{aligned}$$

where \({{\mathbf {H}}} \in {{\mathbb {C}}^{{N_r} \times {N_t}}}\) is the channel matrix, \({{\mathbf {s}}} \in {{\mathbb {C}}^{N_s^t \times 1}}\) is the transmitted symbol vector, and \({{\mathbf {n}}} \in {{\mathbb {C}}^{{N_r} \times 1}}\) is the Gaussian noise vector.
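As a dimensional sanity check on the model above, the chain \({{\mathbf {r}}} = {{\mathbf {W}}}^H{\mathbf {HFs}} + {{\mathbf {W}}}^H{{\mathbf {n}}}\) can be sketched in a few lines of NumPy. All sizes, the random seed, and the variable names below are illustrative assumptions, not the paper's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not the paper's settings)
Nt, Nr = 32, 16          # BS / MS antennas
Nrf_t, Nrf_r = 4, 4      # RF chains (far fewer than antennas)
Ns_t, Ns_r = 2, 2        # transmit / receive data streams

# Analog stages use unit-modulus phase shifters; digital stages are unconstrained
F_RF = np.exp(1j * rng.uniform(0, 2 * np.pi, (Nt, Nrf_t))) / np.sqrt(Nt)
F_BB = rng.standard_normal((Nrf_t, Ns_t)) + 1j * rng.standard_normal((Nrf_t, Ns_t))
W_RF = np.exp(1j * rng.uniform(0, 2 * np.pi, (Nr, Nrf_r))) / np.sqrt(Nr)
W_BB = rng.standard_normal((Nrf_r, Ns_r)) + 1j * rng.standard_normal((Nrf_r, Ns_r))

F = F_RF @ F_BB          # Nt x Ns_t hybrid precoder
W = W_RF @ W_BB          # Nr x Ns_r hybrid combiner

H = rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))
s = rng.standard_normal(Ns_t) + 1j * rng.standard_normal(Ns_t)
n = rng.standard_normal(Nr) + 1j * rng.standard_normal(Nr)

# r = W^H H F s + W^H n
r = W.conj().T @ H @ F @ s + W.conj().T @ n
```

Applying the two-stage combiner \({{\mathbf {W}}}_{\mathrm{BB}}^H{{\mathbf {W}}}_{\mathrm{RF}}^H\) or the merged \({{\mathbf {W}}}^H\) yields the same received vector, which is exactly the equivalence the model exploits.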

Fig. 1
figure 1

A mmWave system employing hybrid analog–digital precoding

Let \({{\mathbf {x}}} = {\mathbf {Fs}} \in {C^{{N_t} \times 1}}\) denote the transmitted signal, whose elements correspond to the transmit antenna positions. Channel estimation is considered over M time slots, during which the channel matrix remains unchanged. \({N_x}\) \(({N_x} < {N_t})\) pilot sequences \({{{\mathbf {x}}}_1},{{{\mathbf {x}}}_2}, \ldots ,{{{\mathbf {x}}}_{{N_x}}}\) are sent by the transmitter, and for each transmitted pilot sequence \({{{\mathbf {x}}}_p}\left( {1 \leqslant p \leqslant {N_x}} \right)\) the receiving antennas collect an \({N_y}\)-dimensional received pilot \({{{\mathbf {y}}}_p}\):

$$\begin{aligned} \begin{aligned} {{{\mathbf {y}}}_p} = {{\mathbf {W}}}_{\mathrm{r}}^H{{\mathbf {H}}}{{{\mathbf {x}}}_p} + {{\mathbf {W}}}_{\mathrm{r}}^H{{{\mathbf {n}}}_p}, \end{aligned} \end{aligned}$$

where \({{{\mathbf {W}}}_{\mathrm{r}}} = [{{{\mathbf {W}}}_1},{{{\mathbf {W}}}_2}, \ldots ,{{{\mathbf {W}}}_M}] \in {{\mathbb {C}}^{{N_r} \times {N_y}}}\) and \({{{\mathbf {n}}}_p} \in {{\mathbb {C}}^{{N_r} \times 1}}\) is the noise vector. Denoting \({{\mathbf {Y}}} = [{{{\mathbf {y}}}_1},{{{\mathbf {y}}}_2}, \ldots ,{{{\mathbf {y}}}_{{N_x}}}]\), \({{\mathbf {X}}} = [{{{\mathbf {x}}}_1},{{{\mathbf {x}}}_2}, \ldots ,{{{\mathbf {x}}}_{{N_x}}}]\), and \({{\mathbf {N}}} = \left[ {{{{\mathbf {n}}}_1},{{{\mathbf {n}}}_2}, \ldots ,{{{\mathbf {n}}}_{{N_x}}}} \right]\), we have

$$\begin{aligned} \begin{aligned} {{\mathbf {Y}}} = {{\mathbf {W}}}_{\mathrm{r}}^H{\mathbf {HX}} + \breve{{\mathbf {N}}}, \end{aligned} \end{aligned}$$

where \(\breve{{\mathbf {N}}} = {{\mathbf {W}}}_{\mathrm{r}}^H{{\mathbf {N}}}\) and \({{\mathbf {N}}}\) represents the noise matrix whose entries follow the distribution \({\mathcal {C}}{\mathcal {N}}(0,{\sigma _n ^2})\). Without loss of generality, we assume that the matrices \({{{\mathbf {W}}}_{\mathrm{r}}}\) and \({{\mathbf {X}}}\) are known a priori at the receiver. As mentioned in [16], the entries of \({{{\mathbf {W}}}_{\mathrm{r}}}\) and \({{\mathbf {X}}}\) are optimally chosen as \({w_{i,j}} = \sqrt{{1 \big / {{N_t}}}} {\mathrm{{e}}^{j{\beta _{i,j}}}}\) and \({x_{i,j}} = \sqrt{{\rho \big / {{N_t}}}} {\mathrm{{e}}^{j{\beta _{i,j}}}}\), where \({\beta _{i,j}}\) is a random phase uniformly distributed in \([ 0, {2\pi } )\). This assumption is also employed in [26, 27]. Further details on how to design \({{{\mathbf {W}}}_{\mathrm{r}}}\) and \({{{\mathbf {X}}}}\) can be found in [28].
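A minimal sketch of this training design follows: every entry of \({{{\mathbf {W}}}_{\mathrm{r}}}\) and \({{\mathbf {X}}}\) has constant modulus and an i.i.d. uniform random phase, exactly as the sets above prescribe. The dimensions, seed, and power \(\rho\) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

Nt, Nr = 32, 16
Ny, Nx = 8, 20        # received-pilot dimension and number of pilot sequences
rho = 1.0             # transmit power (assumption)

# Constant-modulus entries with uniform random phases, per the sets in the text
beta_w = rng.uniform(0, 2 * np.pi, (Nr, Ny))
beta_x = rng.uniform(0, 2 * np.pi, (Nt, Nx))
W_r = np.sqrt(1 / Nt) * np.exp(1j * beta_w)    # Nr x Ny, scaling as in the text
X = np.sqrt(rho / Nt) * np.exp(1j * beta_x)    # Nt x Nx

# One realization of the observation model Y = W_r^H H X + noise
H = rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))
N = (rng.standard_normal((Nr, Nx)) + 1j * rng.standard_normal((Nr, Nx))) / np.sqrt(2)
Y = W_r.conj().T @ H @ X + W_r.conj().T @ N
```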

The channel matrix \({\mathbf {H}}\) cannot be observed directly. Instead, a noisy version of \({{\mathbf {W}}}_{\mathrm{r}}^H{\mathbf {HX}}\) is observed at the receiver. This is referred to as channel subspace sampling, and a detailed analysis can be found in [29, 30]. Fortunately, as mentioned above, the channel estimation problem can be formulated as sparse signal recovery. The mmWave channel is typically modeled as [31]

$$\begin{aligned} \begin{aligned} {{\mathbf {H}}} = \sum \limits _{l = 1}^{{\tilde{L}}} {{\alpha _l}} {{{\mathbf {a}}}_{\mathrm{r}}}\left( {{\theta _l}} \right) {{\mathbf {a}}}_{\mathrm{t}}^H\left( {{\phi _l}} \right) , \end{aligned} \end{aligned}$$

where \({\tilde{L}}\) denotes the number of propagation paths. There are usually only a few reflected path clusters in the mmWave multipath propagation channel [26,27,28]; therefore, we usually have \({\tilde{L}} \ll {min}\left\{ {{N_r},{N_t}} \right\}\). \({\alpha _l}\) is the complex-valued gain of the l-th path. \({\theta _l} \in [-\pi ,\pi ]\) and \({\phi _l} \in [-\pi ,\pi ]\) are the corresponding azimuth AoA and AoD [32,33,34], respectively, and \({{{\mathbf {a}}}_{\mathrm{r}}} \in {{\mathbb {C}}^{{N_r \times 1}}}\left( {{{{\mathbf {a}}}_{\mathrm{t}}} \in {{\mathbb {C}}^{{N_t \times 1}}}} \right)\) is the array response vector at the receiver (transmitter).

In this paper, we employ ULAs at both the transmitter and the receiver. The steering vectors at the BS and MS can then be written as

$$\begin{aligned}&{{{\mathbf {a}}}_{\mathrm{r}}}({\theta _l}) = {\left[ {1,{\mathrm{{e}}^{ - j2\pi \frac{d}{\lambda }\sin ({\theta _l})}}, \ldots ,{\mathrm{{e}}^{ - j2\pi \frac{{({N_r} - 1)d}}{\lambda }\sin ({\theta _l})}}} \right] ^T}, \end{aligned}$$
$$\begin{aligned}&{{{\mathbf {a}}}_{\mathrm{t}}}({\phi _l}) = {\left[ {1,{\mathrm{{e}}^{ - j2\pi \frac{d}{\lambda }\sin ({\phi _l})}}, \ldots ,{\mathrm{{e}}^{ - j2\pi \frac{{({N_t} - 1)d}}{\lambda }\sin ({\phi _l})}}} \right] ^T}, \end{aligned}$$

where \(\lambda\) is the signal wavelength and d denotes the spacing between adjacent antenna elements, with \(d = \lambda /2\). For convenience, each scatterer is assumed to contribute one propagation path, and the channel gains are modeled as independent identically distributed random variables with the distribution \({\mathcal {C}}{\mathcal {N}}(0,\sigma _\alpha ^2)\). By defining the steering matrices \({{\mathbf \Phi }_{\mathrm{r}}} = [{{{\mathbf {a}}}_{\mathrm{r}}}({\theta _1}), \ldots ,{{{\mathbf {a}}}_{\mathrm{r}}}({\theta _{{\tilde{L}}}})] \in {{\mathbb {C}}^{{N_r} \times {\tilde{L}}}}\) and \({{\mathbf \Phi }_{\mathrm{t}}}=[{{{\mathbf {a}}}_{\mathrm{t}}}({\phi _1}), \ldots ,{{{\mathbf {a}}}_{\mathrm{t}}}\left( {{\phi _{{\tilde{L}}}}} \right) ] \in {{\mathbb {C}}^{{N_t} \times {\tilde{L}}}}\), the mmWave channel matrix \({{\mathbf {H}}}\) in (4) can be rewritten in matrix form as follows

$$\begin{aligned} \begin{aligned} {{\mathbf {H}}} = {{{\varvec{\Phi }} }_{\mathrm{r}}}{{{\mathbf {H}}}_{\mathrm{v}}}{{\varvec{\Phi }} }_{\mathrm{t}}^H, \end{aligned} \end{aligned}$$

where \({{{\mathbf {H}}}_{\mathrm{v}}}\) is a diagonal matrix with diagonal elements \({\alpha } = {[{\alpha _1}, \ldots ,{\alpha _{{\tilde{L}}}}]^T}\). To transform the channel estimation problem into a sparse recovery problem, we first express the channel in a sparse matrix form

$$\begin{aligned} {{\mathbf {H}}} = {{{\mathbf {A}}}_{\mathrm{r}}}{\mathbf {CA}}_{\mathrm{t}}^H, \end{aligned}$$

where both \({{{\mathbf {A}}}_{{\mathbf {r}}}} \triangleq \frac{1}{{\sqrt{{N_r}} }}[{{{\mathbf {a}}}_{\mathrm{r}}}\left( {{\psi _1}} \right) \cdots {{{\mathbf {a}}}_{\mathrm{r}}}\left( {{\psi _{N_1}}} \right) ] \in {{\mathbb {C}}^{^{{N_r} \times {N_1}}}}\) and \({{{\mathbf {A}}}_{\mathrm{t}}} \triangleq \frac{1}{{\sqrt{{N_t}} }}[{{{\mathbf {a}}}_{\mathrm{t}}}\left( {{\omega _1}} \right) \cdots {{{\mathbf {a}}}_{\mathrm{t}}}\left( {{\omega _{N_2}}} \right) ] \in {{\mathbb {C}}^{^{N_t \times N_2}}}\) are overcomplete beamforming matrices \(\left( {{N_1} \geqslant {N_r},{N_2} \geqslant {N_t}} \right)\), assumed to be channel-invariant unitary DFT matrices satisfying \({{{\mathbf {A}}}_{\mathrm{r}}}{{\mathbf {A}}}_{\mathrm{r}}^H = {{{\mathbf {I}}}_{{N_1}}}\) and \({{{\mathbf {A}}}_{\mathrm{t}}}{{\mathbf {A}}}_{\mathrm{t}}^H = {{{\mathbf {I}}}_{N_2}}\), respectively. \({{\mathbf {C}}} \in {{\mathbb {C}}^{{N_1} \times {N_2}}}\) is a sparse representation matrix of the mmWave channel whose \({\tilde{L}}\) nonzero elements correspond to the scatterers, under the condition that each AoA/AoD lies exactly on the quantization grid. Note that the true directions generally do not lie on the predefined grid, so a quantization error actually exists in the sparse channel matrix. Hence, a refined channel estimation scheme is given in the later sections to solve this off-grid problem. Moreover, it is worth noting that \({{\mathbf {C}}}\) is no longer diagonal.
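The physical and beamspace representations above can be sketched and cross-checked numerically. The sketch below assumes square unitary DFT dictionaries (\(N_1 = N_r\), \(N_2 = N_t\)) and restricts angles to \((-\pi /2, \pi /2)\) so that \(\sin (\cdot )\) is one-to-one; sizes and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

Nr, Nt = 16, 32
L = 3                     # number of paths; L << min(Nr, Nt), per the text

def steer(n_ant, angle):
    # ULA steering vector with d = lambda/2
    return np.exp(-1j * np.pi * np.arange(n_ant) * np.sin(angle))

# Physical channel: H = sum_l alpha_l a_r(theta_l) a_t(phi_l)^H
alpha = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
theta = rng.uniform(-np.pi / 2, np.pi / 2, L)
phi = rng.uniform(-np.pi / 2, np.pi / 2, L)
H = sum(a * np.outer(steer(Nr, t), steer(Nt, p).conj())
        for a, t, p in zip(alpha, theta, phi))

# Unitary DFT dictionaries A_r, A_t (taking N1 = Nr, N2 = Nt for simplicity)
A_r = np.fft.fft(np.eye(Nr)) / np.sqrt(Nr)
A_t = np.fft.fft(np.eye(Nt)) / np.sqrt(Nt)

# Beamspace representation C = A_r^H H A_t, so that H = A_r C A_t^H
C = A_r.conj().T @ H @ A_t
H_rec = A_r @ C @ A_t.conj().T
```

For off-grid angles, C is only approximately sparse: its energy concentrates around the grid points nearest each true AoA/AoD, which is precisely the leakage the refinement step later addresses.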

The signal model in (3) can be further written as

$$\begin{aligned} \begin{aligned} {{\mathbf {y}}}&= \sqrt{P} {\text {vec}}({{\mathbf {W}}}_{\mathrm{r}}^H{{\mathbf {H}}}{{{\mathbf {X}}}}) + {\text {vec}}(\breve{{\mathbf {N}}}) \\&= \sqrt{P} \left( {{{\mathbf {X}}}^T \otimes {{\mathbf {W}}}_{\mathrm{r}}^H} \right) {\text {vec}}\left( {{\mathbf {H}}} \right) + \breve{{\mathbf {n}}} \\&= \sqrt{P} \left( {{{\mathbf {X}}}^T{{\mathbf {A}}}_{\mathrm{t}}^* \otimes {{\mathbf {W}}}_{\mathrm{r}}^H{{{\mathbf {A}}}_{\mathrm{r}}}} \right) {{\mathbf {c}}} + \breve{{\mathbf {n}}} , \\ \end{aligned} \end{aligned}$$

where \({{\mathbf {c}}} = {\text {vec}}({{\mathbf {C}}})\), \({{\mathbf {y}}} = {\text {vec}} ({{\mathbf {Y}}}) \in {{\mathbb {C}}^{{N_x}{N_y} \times 1}}\) and \(\breve{{\mathbf {n}}} = {\text {vec}} (\breve{{\mathbf {N}}})\). The estimation of \({{\mathbf {c}}}\) can be handled as a sparse signal recovery problem with the 2D dictionary matrix \({\left( {{\mathbf {A}}}_{\mathrm{t}}^* \otimes {{\mathbf {A}}}_{\mathrm{r}} \right) }\), and the sensing matrix can be denoted as \({{\varvec{\Phi }} } = {{\mathbf {X}}}^T{{\mathbf {A}}}_{\mathrm{t}}^* \otimes {{\mathbf {W}}}_{\mathrm{r}}^H{{{\mathbf {A}}}_{\mathrm{r}}}\). We consider \(N_1 = N_2\) grid points for simplicity, and the channel estimation problem can be formulated as the following compressed sensing problem

$$\begin{aligned} \begin{aligned} \mathop {\arg \min {{\left\| {{\mathbf {c}}} \right\| }_0}}\limits _{{\mathbf {c}}} \quad {\mathrm{subject \; to }}{\left\| {{{\mathbf {y}}} - \sqrt{P} {\varvec{\Phi }} {\mathbf {c}}} \right\| _2} < \zeta , \end{aligned} \end{aligned}$$

where \(\zeta\) is a tolerance constant determined by the noise power. Since signal recovery via \(l_0\)-norm minimization is an NP-hard problem, we replace the \(l_0\)-norm with the \(l_1\)-norm as follows

$$\begin{aligned} \begin{aligned} \mathop {\arg \min {{\left\| {{\mathbf {c}}} \right\| }_1}}\limits _{{\mathbf {c}}} \quad {\mathrm{subject \; to }}{\left\| {{{\mathbf {y}}} - \sqrt{P} {\varvec{\Phi }} {\mathbf {c}}} \right\| _2} < \zeta . \end{aligned} \end{aligned}$$
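The Kronecker structure of the sensing matrix follows from the identity \({\text {vec}}({\mathbf {ABC}}) = ({{\mathbf {C}}}^T \otimes {{\mathbf {A}}}){\text {vec}}({{\mathbf {B}}})\) and can be verified numerically on a toy on-grid channel; the sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

Nr, Nt, Ny, Nx = 8, 12, 4, 6     # small illustrative sizes

W_r = rng.standard_normal((Nr, Ny)) + 1j * rng.standard_normal((Nr, Ny))
X = rng.standard_normal((Nt, Nx)) + 1j * rng.standard_normal((Nt, Nx))
A_r = np.fft.fft(np.eye(Nr)) / np.sqrt(Nr)   # unitary DFT dictionaries
A_t = np.fft.fft(np.eye(Nt)) / np.sqrt(Nt)

C = np.zeros((Nr, Nt), complex)
C[2, 5] = 1.3 - 0.4j             # a single exactly on-grid path, for illustration
H = A_r @ C @ A_t.conj().T       # channel synthesized from its sparse representation

# vec() is column-major stacking, hence order='F'
vec = lambda M: M.reshape(-1, order="F")

# Sensing matrix Phi = (X^T A_t^*) kron (W_r^H A_r)
Phi = np.kron(X.T @ A_t.conj(), W_r.conj().T @ A_r)

y_direct = vec(W_r.conj().T @ H @ X)   # noiseless observation, vectorized
```

The check below confirms that applying \({{\varvec{\Phi }}}\) to the sparse vector \({{\mathbf {c}}}\) reproduces the vectorized noiseless observation, i.e., that the vectorization step introduces no modeling error.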

4 Proposed GAMP-SBL-based high precision channel estimation

This section describes the proposed SBL scheme for sparse beamspace channel vector estimation. We begin by introducing the hierarchical prior model. Then, we use DGAMP-SBL to estimate the mmWave channel. After that, we propose a refinement algorithm to estimate the AoAs and AoDs accurately. Finally, we analyze the computational complexity of the proposed algorithm and compare it with that of SBL.

4.1 Hierarchical prior model

Under the assumption of circularly symmetric complex Gaussian noise, we have

$$\begin{aligned} \begin{aligned} p({{\mathbf {y}}}|{{\mathbf {c}}},\lambda ) = \mathcal {C}\mathcal {N}\left( {{{\mathbf {y}}}|{\varvec{\Phi }} {\mathbf {c}},{\lambda ^{ - 1}}{{\mathbf {I}}}} \right) , \end{aligned} \end{aligned}$$

where \(\lambda = {\sigma ^{-2}}\) denotes the noise precision. To leverage the sparse structure of the underlying mmWave channel, we enforce sparsity constraints on the channel vector, as is common in sparse Bayesian models. The sparse channel vector \({{\mathbf {c}}}\) is assumed to follow a two-layer hierarchical Gaussian prior. In the first layer, the entries of \({{\mathbf {c}}}\) are assumed to be independent and identically distributed, i.e.,

$$\begin{aligned} \begin{aligned} p({{\mathbf {c}}}|{\gamma })\sim \prod \limits _{i = 1}^{{N_1}{N_2}} \mathcal {C} \mathcal {N}\left( {{c_i};0,\gamma _i^{ - 1}} \right) , \end{aligned} \end{aligned}$$

where \({\gamma _i}\) denotes the inverse variance, and \({ \gamma } = [{\gamma _1},{\gamma _2}, \ldots ,{\gamma _{{N_1}N_2}}]\). Meanwhile, a Gamma prior is employed for the inverse variance vector

$$\begin{aligned} \begin{aligned} p({{{\gamma }} })&= \prod \limits _{i = 1}^{{N_1}{N_2}} {{\text {Gamma}} } \left( {{\gamma _i}\mid a + 1,b} \right) \\&= \prod \limits _{i = 1}^{{N_1}{N_2}} {{\Gamma ^{ - 1}}(a + 1){b^a}\gamma _i^a{\mathrm{{e}}^{ - b{\gamma _i}}}}, \end{aligned} \end{aligned}$$

where a, b are parameters of the above distribution, and \(\Gamma (a) = \int _0^\infty {{t^{a - 1}}} {\mathrm{{e}}^{ - t}}\,dt\) is the Gamma function. Besides, \(\breve{{\mathbf {n}}}\) is Gaussian noise with zero mean and covariance matrix \(\left( {1/\lambda } \right) {{\mathbf {I}}}\). We set a Gamma hyperprior over \(\lambda\): \(p(\lambda ) = {\text {Gamma}} (\lambda \mid c+1,d) = \Gamma {(c+1)^{ - 1}}{d^{\left( c+1 \right) }}{\lambda ^{c}}{\mathrm{{e}}^{ - d\lambda }}\).
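To make the two-layer prior concrete, the sketch below draws samples from it. The Gamma shape/rate convention follows the density above (shape \(a+1\), rate \(b\)); the sample size and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 20000
a, b = 1e-4, 1e-4    # small a, b give a broad, nearly non-informative hyperprior

# Layer 2: precisions gamma_i ~ Gamma(shape = a + 1, rate = b)
# (NumPy parameterizes by scale, so scale = 1 / rate)
gamma = rng.gamma(shape=a + 1.0, scale=1.0 / b, size=n)

# Layer 1: c_i | gamma_i ~ CN(0, 1 / gamma_i)
c = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) * np.sqrt(0.5 / gamma)
```

Marginally the \(c_i\) follow a heavy-tailed scale mixture of Gaussians, which is what favors sparse solutions; conditionally, \(E[|c_i|^2 \mid \gamma _i] = 1/\gamma _i\) holds exactly, which the check below verifies empirically.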

To obtain a broad hyperprior, we set \(a,b \rightarrow 0\) [35, 36]. This two-stage hierarchical prior gives

$$\begin{aligned} \begin{aligned} p({c_i}) = \int _0^\infty {p({c_i}\mid \gamma _i^{ - 1})p({\gamma _i})} d{\gamma _i}, \end{aligned} \end{aligned}$$

which helps obtain sparse solutions due to its sharp peak and heavy tails for small a and b. In fact, according to [37], the maximum a posteriori (MAP) estimate of \({{\mathbf {c}}}\) is consistent with the \({l_0}\)-norm solution of (10) obtained by FOCUSS with \(p \rightarrow 0\). To update the parameters \(\theta = \left\{ {{{\gamma }}, \lambda } \right\}\), we can also use MAP estimation to obtain their most probable values, i.e.,

$$\begin{aligned} \begin{aligned} \left( {{{{{\gamma }} }^*},{\lambda ^*}} \right) = \arg {\max _{{{{\gamma }} },\lambda }} \; p({{{\gamma }} ,}\lambda |{{\mathbf {y}}}), \end{aligned} \end{aligned}$$

or, equivalently,

$$\begin{aligned} \begin{aligned} \left( {{{{{{\gamma }}} }^*},{\lambda ^*}} \right) = \arg {\max _{{{{\gamma }} },\lambda }}\ln p({{{\gamma }} ,}\lambda ,{{\mathbf {y}}}). \end{aligned} \end{aligned}$$

Then, the EM-SBL algorithm is employed to learn the sparse vector \({\mathbf {c}}\) and iteratively update the hyper-parameters \({{\theta }} = \left\{ {{{\gamma }}, \lambda } \right\}\). Note that the key step in the updating phase of EM-SBL is to maximize the posterior probability \(p({{\gamma ,}}\lambda \mid {{\mathbf {y}}})\) over the hyper-parameters. In the E-step, the likelihood function can be written as follows

$$\begin{aligned} \begin{aligned} p\left( {{{\mathbf {y}}}|{{\mathbf {c}}};{\sigma ^2}} \right) = \frac{1}{{{{\left( {2\pi {\sigma ^2}} \right) }^{{N_xN_y}}}}}\exp \left( { - \frac{1}{{2{\sigma ^2}}}\left\| {{{\mathbf {y}}} - {{\varvec{\Phi }} }{{\mathbf {c}}}} \right\| ^2 } \right) . \end{aligned} \end{aligned}$$

As noted before, the conditional density \(p({{\mathbf {c}}}\mid {{\mathbf {y}}})\) shows that \({\mathbf {c}}\) is Gaussian distributed. Treating \({{\mathbf {c}}}\) as a hidden variable, the posterior distribution that we need to compute, conditioned on the observed vector \({{\mathbf {y}}}\) and the updated hyper-parameters \(\theta\), is a complex Gaussian [35]:

$$\begin{aligned} p\left( {{\mathbf {c}}}\mid {{\mathbf {y}}},{{{\gamma }}^{(t)}},{\lambda ^{(t)}} \right)&= \frac{{p\left( {{{\mathbf {c}}}\mid {{{\gamma }}^{(t)}}} \right) p\left( {{{\mathbf {y}}}\mid {{\mathbf {c}}},{\lambda ^{(t)}}} \right) }}{{p({{\mathbf {y}}}\mid {{{\gamma }}^{(t)}},{\lambda ^{(t)}})}} \\&= {(2\pi )^{ - \frac{{{N_1}{N_2} + 1}}{2}}}{\left| {{{{\varvec{\Sigma }}}_c}} \right| ^{ - \frac{1}{2}}}\exp \left\{ { - \frac{1}{2}{{({{\mathbf {c}}} - {{{\mu }}_c})}^{T}}{{\varvec{\Sigma }}}_c^{ - 1}({{\mathbf {c}}} - {{{\mu }}_c})} \right\} , \end{aligned}$$

where the posterior covariance \({{{\varvec{\Sigma }}}_c}\) and mean \({{{\mu }}_c}\) are given, respectively,

$$\begin{aligned} {{{\varvec{\Sigma }}}_c}&= {\left( {\lambda {{{\varvec{\Phi }}}^T}{{\varvec{\Phi }}} + {{\mathbf {D}}}} \right) ^{ - 1}}, \\ {{{\mu }}_c}&= \lambda {{{\varvec{\Sigma }}}_c}{{{\varvec{\Phi }}}^T}{{\mathbf {y}}}, \end{aligned}$$

where \({{\mathbf {D}}} = {\text {diag}} \left( {{\gamma _1},{\gamma _2}, \ldots ,{\gamma _{{N_1}{N_2}}}} \right)\), and \({{{\varvec{\Sigma }} }_c}\) and \({{{{\mu }} }_c}\) are the posterior covariance and mean associated with \(p\left( {{{\mathbf {c}}}\mid {{\mathbf {y}}},{{{{\gamma }} }^{(t)}},{\lambda ^{(t)}}} \right)\), respectively. Let \({{{\tau }}_c}\) denote the vector whose elements are the diagonal entries of the covariance matrix \({{{\varvec{\Sigma }}}_c}\). As mentioned above, in the EM iterative process, the hyper-parameters are updated by iteratively maximizing the R-function, i.e.,

$$\begin{aligned} \begin{aligned} {{{{\theta }} }^{(t + 1)}}&\triangleq \arg {\max _{{{\theta }} }} \; R\left( {{{{\theta }} }|{{{{\theta }} }^{(t)}}} \right) \\&\triangleq \arg {\max _{{{\theta }} }}{\; E_{{{\mathbf {c}}}|{{\mathbf {y}}},{{{{\theta }} }^{(t)}}}}[\log p({\theta }|{{\mathbf {c}}},{{\mathbf {y}}})] . \\ \end{aligned} \end{aligned}$$

Using Bayes' rule, (21) can be rewritten, ignoring the part unrelated to \({{\theta }}\), as follows

$$\begin{aligned}&{E_{{{\mathbf {c}}}|{{\mathbf {y}}};{{{\gamma }}^{(t)}},{\lambda ^{(t)}}}}\left[ {\log p({{\gamma }},\lambda \mid {{\mathbf {c}}},{{\mathbf {y}}})} \right] \\&\quad = - {E_{{{\mathbf {c}}}|{{\mathbf {y}}};{{{\gamma }}^{(t)}},{\lambda ^{(t)}}}}\left[ { - \log p({{\mathbf {y}}}\mid {{\mathbf {c}}};\lambda ) - \log p({{\mathbf {c}}}\mid {{\gamma }}) - \log p({{\gamma }})} \right] + {\mathrm{const}}. \end{aligned}$$

First, the algorithm carries out the M-step for the hyper-parameters \(\left\{ {{\gamma _n}} \right\}\). Since the first term in (22) does not depend on \({{{\gamma }} }\), it can be ignored in this M-step. The objective function in (22) becomes

$$\begin{aligned}&{E_{{{\mathbf {c}}}|{{\mathbf {y}}},{{{\theta }}^{(t)}}}}\left[ { - \log p({{\mathbf {c}}}\mid {{\gamma }}) - \log p({{\gamma }})} \right] \\&\quad = \sum \limits _{n = 1}^{{N_1}{N_2}} {\left( {\frac{{{\gamma _n}\left( {{\hat{c}}_n^2 + {\tau _{{c_n}}}} \right) }}{2} - \frac{1}{2}\log {\gamma _n} - \log p({\gamma _n})} \right) } \\&\quad = \sum \limits _{n = 1}^{{N_1}{N_2}} {\left( {\frac{{{\gamma _n}\left( {{\hat{c}}_n^2 + {\tau _{{c_n}}}} \right) }}{2} - \frac{1}{2}\log {\gamma _n} - a\log {\gamma _n} + b{\gamma _n}} \right) } . \end{aligned}$$

We take the partial derivative of the R-function with respect to \({\gamma _n}\), eliminating independent terms, and the iteration for \({\gamma _n}\) can be written as

$$\begin{aligned} \begin{aligned} \gamma _n^{i + 1} = \mathop {\arg \min }\limits _{{\gamma _n}} \;\left( {\frac{{{\gamma _n}\left( {{\hat{c}}_n^2 + {\tau _{{c_n}}}} \right) }}{2} - \frac{1}{2}\log {\gamma _n} - \log p({\gamma _n}) } \right) . \end{aligned} \end{aligned}$$

Since the hyperprior \(p({\gamma _n})\) becomes non-informative as the parameters a and b tend to zero, the update simplifies to

$$\begin{aligned} \begin{aligned} \gamma _n^{i + 1} = \frac{1}{{{\hat{c}}_n^2 + {\tau _{{c_n}}}}}. \end{aligned} \end{aligned}$$

Similarly, the scalar hyper-parameter \(\lambda\) is estimated and updated as

$$\begin{aligned} \begin{aligned} {\lambda ^{i + 1}}&= \arg \max_{\lambda} \; {E_{{{\mathbf {c}}}|{{\mathbf {y}}};{{\gamma }},{\lambda ^{i}}}}[\log p({{\mathbf {y}}},{{\mathbf {c}}},{{\gamma }};\lambda )], \\ {\left( {{\lambda ^{i + 1}}} \right) ^{ - 1}}&= \frac{{{{\left\| {{{\mathbf {y}}} - {\varvec{\Phi }} {\hat{{\mathbf {c}}}}} \right\| }^2} + {{\left( {{\lambda ^{i}}} \right) }^{ - 1}}\sum \limits _{n = 1}^{{N_1}{N_2}} {\left( {1 - {\gamma _n}{\tau _{{c_n}}}} \right) } }}{N} , \\ \end{aligned} \end{aligned}$$

where \(N = {N_x}{N_y}\) is the dimension of \({{\mathbf {y}}}\). Following the references and practical implementation experience, we consistently set the constants \(a=b=c=d=0.0001\), since this initialization has little effect on the estimation performance.
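The E-step and the \(\gamma\)-update above can be sketched on a synthetic sparse problem. The sketch below is illustrative, not the paper's implementation: conjugate transposes are used since the data are complex, the noise precision \(\lambda\) is held fixed at its true value for brevity, and all sizes, seeds, and support indices are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy sparse recovery problem y = Phi c + noise
M, N = 60, 120
Phi = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
c_true = np.zeros(N, complex)
c_true[[7, 42, 99]] = [1.0 + 0.5j, -0.8 + 1.0j, 0.9 - 0.7j]
sigma = 0.01
noise = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
y = Phi @ c_true + sigma * noise

gamma = np.ones(N)            # precisions gamma_n
lam = 1.0 / sigma ** 2        # noise precision, held fixed here for simplicity

for _ in range(50):
    # E-step: posterior covariance and mean (conjugate transpose for complex data)
    Sigma_c = np.linalg.inv(lam * Phi.conj().T @ Phi + np.diag(gamma))
    mu_c = lam * Sigma_c @ Phi.conj().T @ y
    tau_c = np.real(np.diag(Sigma_c))
    # M-step: gamma_n = 1 / (|c_n|^2 + tau_{c_n}), as derived above
    gamma = 1.0 / (np.abs(mu_c) ** 2 + tau_c)

c_hat = mu_c
```

The explicit matrix inversion in the E-step is exactly the bottleneck motivating the DGAMP replacement in the next subsection.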

4.2 Update by DGAMP

As mentioned earlier, the calculation of the posterior mean and covariance, which involves the inversion of high-dimensional matrices, is computationally expensive. Hence, the high computational complexity of the EM-SBL algorithm makes it impractical for massive MIMO channel estimation. To simplify the calculation, we replace the posterior computation with the GAMP algorithm, a very-low-complexity iterative Bayesian technique. Note that the hyper-parameters \(\{ { {\gamma },\lambda }\}\) are treated as known constants during the iterative process of the GAMP algorithm.

GAMP is a fast heuristic algorithm that can be utilized to avoid matrix inversion within the SBL framework [38, 39]. The GAMP algorithm obtains the maximum a posteriori estimate of \({\mathbf {c}}\) by Taylor-series approximation. Specifically, the marginal posteriors \(p\left( {{c_n}\mid {{\mathbf {y}}},{\gamma },\lambda } \right)\) are computed iteratively by message passing on the GAMP factor graph. Because all posteriors are Gaussian, the process can be simplified by propagating only the means and variances of the sparse variables \(\left\{ {{c_n}} \right\}\) and the mixture variables \(\left\{ {{z_m}} \right\}\), where \({{\mathbf {z}}} = {{{\varvec{\Phi }}} {\mathbf {c}}}\). Since the convergence of GAMP is guaranteed only when the measurement matrix has independent Gaussian entries, paper [40] proposes a DGAMP algorithm that improves robustness to general measurement matrices by introducing damping factors \({\rho _s}\), \({\rho _c} \in \left( {0,1} \right]\), at the cost of a slower convergence speed. Nevertheless, the process remains computationally efficient since it involves only scalar operations. We summarize the key steps of DGAMP in Algorithm 1; readers can refer to [22] for more details of the derivation of DGAMP.

figure a

There are two versions of the damped GAMP algorithm: the max-sum version and the sum-product version. The output and input functions \({g_s}\left( {{{\mathbf {p}}},{{{\tau }}_p}} \right)\) and \({g_c}\left( {{\mathbf {r}},\tau _\mathbf {r}} \right)\) in Algorithm 1 differ according to whether the max-sum or the sum-product version of GAMP is used; coincidentally, both versions simplify to the same equations. We only introduce the input and output functions of the sum-product version; readers can refer to [41] for further details. The intermediate variables \({{\mathbf {r}}}\) and \({{\mathbf {p}}}\) can be interpreted as Gaussian-noise-corrupted approximations of \({{\mathbf {c}}}\) and \({{\mathbf {z}}} = {{{\varvec{\Phi }}} {\mathbf {c}}}\) with noise levels \({{{\tau }}_r}\) and \({{{\tau }}_p}\), respectively. The difference between the sum-product and max-sum versions lies in the estimation strategy: the sum-product version uses vector minimum mean-squared error (MMSE) estimation, which reduces to a sequence of scalar MMSE estimations. The output and input functions are given as follows

$$\begin{aligned}&{\left[ {{g_s}\left( {{{\mathbf {p}}},{{{\tau }}_p}} \right) } \right] _m} = \frac{{\int {{z_m}p({y_m}|{z_m})\mathcal {N}\left( {{z_m};\frac{{{p_m}}}{{{\tau _{{p_m}}}}},\frac{1}{{{\tau _{{p_m}}}}}} \right) } d{z_m}}}{{\int {p({y_m}|{z_m})\mathcal {N}\left( {{z_m};\frac{{{p_m}}}{{{\tau _{{p_m}}}}},\frac{1}{{{\tau _{{p_m}}}}}} \right) d{z_m}} }}, \end{aligned}$$
$$\begin{aligned}&{\left[ {{g_c}\left( {{\mathbf {r}},{\tau _{\mathbf {r}}}} \right) } \right] _n} = \frac{{\int {{c_n}p({c_n})\mathcal {N}\left( {{c_n};{r_n},{\tau _{{r_n}}}} \right) } d{c_n}}}{{\int {p({c_n})\mathcal {N}\left( {{c_n};{r_n},{\tau _{{r_n}}}} \right) d{c_n}} }}. \end{aligned}$$

According to (12) and (15), which specify the prior imposed on \({{\mathbf {c}}}\) and \(p\left( {{{\mathbf {y}}}|{{\mathbf {c}}}} \right)\), respectively, the input function and its derivative can be rewritten as follows

$$\begin{aligned}&{g_c}({{\mathbf {r}}},{{{\tau }}_r}) = \frac{{{\gamma }}}{{{{\gamma }} + {{{\tau }}_r}}}{{\mathbf {r}}}, \end{aligned}$$
$$\begin{aligned}&g{'_c}({{\mathbf {r}}},{{{\tau }}_r}) = \frac{{{\gamma }}}{{{{\gamma }} + {{{\tau }}_r}}}. \end{aligned}$$

Similarly, the output function and its derivative can be rewritten as follows

$$\begin{aligned}&{g_s}({{\mathbf {p}}},{{{\tau }}_p}) = \frac{{\left( {{{{\mathbf {p}}} \big / {{{{\tau }}_p}}} - {{\mathbf {y}}}} \right) }}{{\left( {\lambda + {1 \big / {{{{\tau }}_p}}}} \right) }}, \end{aligned}$$
$$\begin{aligned}&g{'_s}({{\mathbf {p}}},{{{\tau }}_p}) = \frac{{{\lambda ^{ - 1}}}}{{{\lambda ^{ - 1}} + {{{\tau }}_p}}}. \end{aligned}$$
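Under these Gaussian assumptions, the four closed-form functions displayed above are simple affine maps, so a direct numerical sketch is possible. The following Python snippet is illustrative only: the names `gamma` and `lam` stand in for the paper's \(\gamma\) and \(\lambda\), and the bodies transcribe the displayed equations as written.

```python
import numpy as np

# Closed-form sum-product input/output functions for the Gaussian prior on c
# and the Gaussian likelihood parameterized by lam, transcribed from the
# equations above.  All operations are element-wise, in line with GAMP
# reducing vector MMSE estimation to a sequence of scalar estimations.

def g_c(r, tau_r, gamma):
    """Input function: scalar MMSE denoiser (Wiener-style shrinkage to zero)."""
    return gamma / (gamma + tau_r) * r

def g_c_prime(tau_r, gamma):
    """Derivative of g_c with respect to r; a shrinkage factor in (0, 1)."""
    return gamma / (gamma + tau_r)

def g_s(p, tau_p, y, lam):
    """Output function for the Gaussian likelihood."""
    return (p / tau_p - y) / (lam + 1.0 / tau_p)

def g_s_prime(tau_p, lam):
    """Derivative of g_s with respect to p."""
    return (1.0 / lam) / (1.0 / lam + tau_p)
```

Note that `g_s_prime` is consistent with `g_s`: differentiating \((p/\tau_p - y)/(\lambda + 1/\tau_p)\) with respect to \(p\) gives \(1/(1+\lambda\tau_p)\), which equals \(\lambda^{-1}/(\lambda^{-1}+\tau_p)\).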

Upon convergence, DGAMP-SBL yields a sparse estimate of \({{\mathbf {c}}}\), given by the posterior mean under the imposed prior. This also provides a coarse estimate of the channel matrix, which can be represented by paired frequencies of AoAs and AoDs through multiplication with the steering matrices. The initial channel estimate is obtained as

$$\begin{aligned} \begin{aligned} {{{\hat{\mathbf {H}}}}^{(0)}} = {{{\mathbf {A}}}_r}{\hat{\mathbf {C}}}{{\mathbf {A}}}_t^H, \end{aligned} \end{aligned}$$

where \({{\hat{{\mathbf {C}}}}}\) is obtained by reshaping the estimate of \({{\mathbf {c}}}\) column-wise, in the same ordering as \({{\mathbf {H}}}\).

It is worth noting that GAMP is a low-complexity algorithm that transforms vector estimation into scalar estimation; therefore, in (27), (28) and the operations of Algorithm 1, all vector squares, divisions and multiplications are taken element-wise. Figure 2 shows the sparse channel matrix under the discrete Fourier basis. Figure 3 shows the estimate produced by the SBL-GAMP algorithm without considering the off-grid problem, when the number of grids is chosen as \({N_1} = {N_2} = 180\) for comparison.

4.3 Refined estimation

In the following, we propose an exact and fast 2D frequency estimation method based on the interpolation of three 2D DFT spectral lines [42]. The frequency estimate produced by the SBL-DGAMP algorithm, \({\hat{f}} = \left\{ {{{{\hat{f}}}_1},{{{\hat{f}}}_2}, \ldots ,{{{\hat{f}}}_{{\tilde{L}}}}} \right\}\) with \({{\hat{f}}_i} = \{ {f_{i1}},{f_{i2}}\} \quad (i = 1,2, \ldots ,{\tilde{L}})\), is only a coarse estimate due to grid mismatch.

Fig. 2

Instance diagram, \({\tilde{L}}=3\), \({N_1} = {N_2} = 180\). The sparse representation of the channel matrix under the DFT basis is shown in this figure

Fig. 3

Instance diagram, SNR=20dB, \({\tilde{L}}=3\), \({N_1} = {N_2} = 180\); the “peaks” correspond to paths with larger gains, and their locations correspond to the AoAs and AoDs

Let \((k,j) \in [({k_1},{j_1}),({k_2},{j_2}), \ldots ,({k_{{\tilde{L}}}},{j_{{\tilde{L}}}})]\) denote the index set corresponding to the coarse frequency estimates. We then apply an interpolation on the fractional Fourier coefficients. The fractional Fourier transform can be formulated as follows

$$\begin{aligned}&D(k,j) = \sum \limits _{m = 0}^{{N_r}-1} {\sum \limits _{n = 0}^{{N_t}-1} {[{\tilde{H}}} } {]_{n,m}}{\mathrm{{e}}^{ - \mathrm{j}2\pi \left( {jm + kn} \right) /{N_{DFT}}}}, \end{aligned}$$
$$\begin{aligned}&D(k \pm \delta ,j) = \sum \limits _{m = 0}^{{N_r}-1} {\sum \limits _{n = 0}^{{N_t}-1} {[{\tilde{H}}} } {]_{n,m}}{\mathrm{{e}}^{ - \mathrm{j}2\pi \left( {jm + (k \pm \delta )n} \right) /{N_{DFT}}}}, \end{aligned}$$
$$\begin{aligned}&D(k ,j \pm \delta ) = \sum \limits _{m = 0}^{{N_r}-1} {\sum \limits _{n = 0}^{{N_t}-1} {[{\tilde{H}}} } {]_{n,m}}{\mathrm{{e}}^{ - \mathrm{j}2\pi \left( {(j \pm \delta )m + kn} \right) /{N_{DFT}}}}, \end{aligned}$$

where \(\delta\) is a real number, and we denote \(D(k,j)\), \(D(k+\delta ,j)\), \(D(k-\delta ,j)\), \(D(k,j+\delta )\) and \(D(k,j-\delta )\) as \(D\), \({D_{K+\delta }}\), \({D_{K-\delta }}\), \({D_{J+\delta }}\), \({D_{J-\delta }}\), respectively. Then, the interpolation algorithm is proposed as follows

$$\begin{aligned}&{{\hat{\lambda }} _x} = \frac{{{N_r}}}{\pi }{\tan ^{ - 1}} \\&\quad \left( {{\text {Re}} \left\{ {\frac{{\left( {{D_{K + \delta }}E_r^ + - {D_{K - \delta }}E_r^ - } \right) \cdot \sin (\pi i/{N_r})}}{{\left( {{D_{K + \delta }}E_r^ + + {D_{K - \delta }}E_r^ - } \right) \cdot \cos (\pi i/{N_r}) - 2D\cos (\pi i)}}} \right\} } \right) , \end{aligned}$$
$$\begin{aligned}&{{\hat{\lambda }} _y} = \frac{{{N_t}}}{\pi }{\tan ^{ - 1}} \\&\quad \left( {{\text {Re}} \left\{ {\frac{{\left( {{D_{J + \delta }}E_t^ + - {D_{J - \delta }}E_t^ - } \right) \cdot \sin (\pi i/{N_t})}}{{\left( {{D_{J + \delta }}E_t^ + + {D_{J - \delta }}E_t^ - } \right) \cdot \cos (\pi i/{N_t}) - 2D\cos (\pi i)}}} \right\} } \right) , \end{aligned}$$

where \({{\hat{\lambda }} _x},{{\hat{\lambda }} _y}\) are the frequency deviations normalized by the grid distance \(\Delta f\), which is defined by \(\Delta f = 1/\left( {{N_{DFT}}/2} \right)\).
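The fractional 2-D Fourier coefficients defined above can be evaluated directly from the double sum, without extra FFTs. A minimal sketch, in which the row/column index convention for \([\tilde{H}]_{n,m}\) is an assumption made for illustration:

```python
import numpy as np

def frac_dft2(H, k, j, delta_k=0.0, delta_j=0.0, n_dft=16):
    """Fractional 2-D Fourier coefficient D(k + delta_k, j + delta_j) of the
    matrix H, evaluated by the explicit double sum (assumed convention:
    the first axis of H is the n index, the second axis the m index)."""
    n_rows, n_cols = H.shape
    n_idx = np.arange(n_rows)[:, None]   # pairs with the bin index k
    m_idx = np.arange(n_cols)[None, :]   # pairs with the bin index j
    ker = np.exp(-2j * np.pi
                 * ((k + delta_k) * n_idx + (j + delta_j) * m_idx) / n_dft)
    return np.sum(H * ker)
```

At integer bins (\(\delta = 0\)) this coincides with a zero-padded 2-D DFT of length \(N_{DFT}\), which serves as a sanity check.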

The frequency estimates can then be obtained as

$$\begin{aligned} \begin{aligned} {{\hat{f}}_x} = \left\{ {\begin{array}{*{20}{c}} { - ({\hat{\lambda }} _x + k)/({N_{DFT}}/2)\quad \quad \quad k < {N_{DFT}}/2} \\ {2 - ({{{\hat{\lambda }} }_x} + k)/({N_{DFT}}/2)\quad \quad k \geqslant {N_{DFT}}/2} \end{array}} \right. , \end{aligned} \end{aligned}$$


$$\begin{aligned} \begin{aligned} {{\hat{f}}_y} = \left\{ {\begin{array}{*{20}{c}} {({{{\hat{\lambda }} }_y} + j)/({N_{DFT}}/2)\quad \quad \quad \quad j < {N_{DFT}}/2} \\ {-(2-({{{\hat{\lambda }} }_y} + j)/({N_{DFT}}/2))\quad j \geqslant {N_{DFT}}/2} \end{array}} \right. . \end{aligned} \end{aligned}$$
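The bin-to-frequency mappings (39) and (40) can be sketched as follows; the sign and wrap-around conventions are copied from the equations, and `n_dft` stands for \(N_{DFT}\).

```python
def bin_to_freq_x(lam_hat, k, n_dft):
    """Map the refined bin position (lam_hat + k) to the normalized spatial
    frequency f_x per (39), with wrap-around at N_DFT/2."""
    half = n_dft / 2
    if k < half:
        return -(lam_hat + k) / half
    return 2 - (lam_hat + k) / half

def bin_to_freq_y(lam_hat, j, n_dft):
    """Counterpart of (40) for the f_y dimension (opposite sign convention)."""
    half = n_dft / 2
    if j < half:
        return (lam_hat + j) / half
    return -(2 - (lam_hat + j) / half)
```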

According to [43], in order to obtain a more accurate frequency estimate, we adopt a parabolic interpolation algorithm. For each frequency dimension \(d \in \left\{ {x,y} \right\}\), we calculate three periodogram samples \({D_{d1}}\), \({D_{d2}}\) and \({D_{d3}}\) at the frequencies \({\theta _{d1}} = {{\hat{f}}_d} - {\Delta _d}\), \({\theta _{d2}} = {{\hat{f}}_d}\) and \({\theta _{d3}} = {{\hat{f}}_d} + {\Delta _d}\). The middle frequency \({{\hat{f}}_d}\) is given by (39) and (40), and the side frequencies \({\theta _{d1}}\) and \({\theta _{d3}}\) are offset by \({\Delta _d}\), which is chosen to satisfy \({\Delta _d} \in (0,\frac{1}{{2{N_{DFT}}}})\) so as to maintain high estimation accuracy. The last step of frequency estimation along the \(d\)th dimension is achieved by calculating the vertex of a parabola fitted through the points \(\left( {{\theta _{d1}},{D_{d1}}} \right) ,\left( {{\theta _{d2}},{D_{d2}}} \right) {\text { and }}\left( {{\theta _{d3}},{D_{d3}}} \right)\), i.e.

$$\begin{aligned} \begin{aligned} \theta _d^{fin} = \frac{1}{2}\frac{{\theta _{d3}^2\left( {{\Delta _{12}}} \right) + \theta _{d2}^2\left( {{\Delta _{31}}} \right) + \theta _{d1}^2\left( {{\Delta _{23}}} \right) }}{{{\theta _{d3}}\left( {{\Delta _{12}}} \right) + {\theta _{d2}}\left( {{\Delta _{31}}} \right) + {\theta _{d1}}\left( {{\Delta _{23}}} \right) }}, \end{aligned} \end{aligned}$$

where \(d = x,y\), \({\Delta _{12}} = {D_{d1}} - {D_{d2}}\), \({\Delta _{31}} = {D_{d3}} - {D_{d1}}\) and \({\Delta _{23}} = {D_{d2}} - {D_{d3}}\). Eventually, from the final frequency estimates in (41), the accurate AoAs and AoDs can be obtained, i.e., \(\left\{ {{{{\hat{\theta }} }_l}} \right\} _{l = 1}^{{\tilde{L}}}\) and \(\left\{ {{{{\hat{\phi }} }_l}} \right\} _{l = 1}^{{\tilde{L}}}\).
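Equation (41) is the classical three-point parabolic peak interpolation, and can be transcribed directly:

```python
def parabola_vertex(theta, D):
    """Vertex abscissa of the parabola through (theta[i], D[i]), i = 1..3,
    i.e., Eq. (41) with Delta_12 = D1-D2, Delta_31 = D3-D1, Delta_23 = D2-D3."""
    t1, t2, t3 = theta
    d12, d31, d23 = D[0] - D[1], D[2] - D[0], D[1] - D[2]
    num = t3 ** 2 * d12 + t2 ** 2 * d31 + t1 ** 2 * d23
    den = t3 * d12 + t2 * d31 + t1 * d23
    return 0.5 * num / den
```

When the three samples lie exactly on a parabola, the formula recovers the vertex exactly, which is why \(\Delta_d\) is kept small: near its peak, the periodogram is well approximated by a parabola.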


4.4 Reconstruct mmWave MIMO channel

In this subsection, the mmWave MIMO channel is reconstructed from the obtained AoAs \(\left\{ {{{{\hat{\theta }} }_l}} \right\} _{l = 1}^{{\tilde{L}}}\) and AoDs \(\left\{ {{{{\hat{\phi }} }_l}} \right\} _{l = 1}^{{\tilde{L}}}\) as follows. Firstly, we recover the steering vectors \({{{\mathbf {a}}}_{\mathrm{r}}}({{\hat{\theta }} _l})\) and \({{{\mathbf {a}}}_{\mathrm{t}}}({{\hat{\phi }} _l})\) in (5) and (6) using the estimated AoAs and AoDs. Secondly, we rewrite expression (6) as follows

$$\begin{aligned} \begin{aligned} {{\mathbf {H}}} = ({{{\varvec{\Phi }}}}_{\mathrm{t}}^* \odot {{{{\varvec{\Phi }} }}_{\mathrm{r}}})vecd({{{\mathbf {H}}}_{\mathrm{v}}}). \end{aligned} \end{aligned}$$

Thus, the received signal can also be rewritten in vector form as follows

$$\begin{aligned} \begin{aligned} {{\mathbf {y}}}&= \sqrt{P} {{\mathbf {W}}}_{\mathrm{r}}^H(({{{\varvec{\Phi }} }}_{\mathrm{t}}^* \odot {{{{\varvec{\Phi }} }}_{\mathrm{r}}})vecd({{{\mathbf {H}}}_{\mathrm{v}}})){{{\mathbf {X}}}} + {\overset{\frown }{{\mathbf {n}}}} \\&= \sqrt{P} ({{{\mathbf {X}}}}^T{{{\varvec{\Phi }} }}_{\mathrm{t}}^* \odot {{\mathbf {W}}}_{\mathrm{r}}^H{{{{\varvec{\Phi }} }}_{\mathrm{r}}}){{{\mathbf {h}}}_{\mathrm{v}}} + {\overset{\frown }{{\mathbf {n}}}} \\&= \sqrt{P} {{{\mathbf {Q}}}^o}{{{\mathbf {h}}}_{\mathrm{v}}} + {\overset{\frown }{{\mathbf {n}}}}, \\ \end{aligned} \end{aligned}$$

where \({{{\mathbf {h}}}_{\mathrm{v}}} = vecd({{{\mathbf {H}}}_{\mathrm{v}}})\) and \({{{\mathbf {Q}}}^{\mathrm{o}}} = {{{\mathbf {X}}}}^T{{{\varvec{\Phi }} }}_{\mathrm{t}}^* \odot {{\mathbf {W}}}_{\mathrm{r}}^H{{{{\varvec{\Phi }} }}_{\mathrm{r}}}\). We estimate \({{{\mathbf {h}}}_{\mathrm{v}}}\) in the least-squares (LS) sense. From (35), the LS estimate of \({{{\mathbf {h}}}_{\mathrm{v}}}\), denoted as \({{{\hat{\mathbf {h}}}}_{\mathrm{v}}}\), is given as follows

$$\begin{aligned} \begin{aligned} {{{\hat{\mathbf {h}}}}_{\mathrm{v}}} = \tfrac{1}{{\sqrt{P} }}{\left( {{{\left( {{{{\mathbf {Q}}}^{\mathrm{o}}}} \right) }^H}{{{\mathbf {Q}}}^{\mathrm{o}}}} \right) ^{ - 1}}{\left( {{{{\mathbf {Q}}}^{\mathrm{o}}}} \right) ^H}{{\mathbf {y}}}. \end{aligned} \end{aligned}$$

Finally, using the obtained accurate angle estimates and the path gains \({{{\hat{\mathbf {h}}}}_{\mathrm{v}}}\) above, we can recover the high-dimensional mmWave MIMO channel as \({{\hat{\mathbf {H}}}} = {{{{\varvec{\Phi }}}}_{\mathrm{r}}}diag({{{\hat{\mathbf {h}}}}_{\mathrm{v}}}){{{\varvec{\Phi }} }}_{\mathrm{t}}^H\).
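The steps of this subsection can be sketched end-to-end on a toy example. The sketch below makes several assumptions not fixed by the text: half-wavelength ULA steering vectors, a particular column-wise Khatri–Rao ordering, and `W`, `X` as the combiner and pilot matrices; it is self-consistent rather than a definitive implementation.

```python
import numpy as np

def ula_steering(n_ant, angle):
    """Half-wavelength ULA steering vector (assumed form of (5)-(6))."""
    return np.exp(1j * np.pi * np.arange(n_ant) * np.sin(angle)) / np.sqrt(n_ant)

def reconstruct_channel(aoas, aods, W, X, y, P=1.0):
    """Rebuild steering matrices from the refined angles, solve for the path
    gains h_v in the LS sense (cf. (37)), and reconstruct H."""
    Nr, Nt = W.shape[0], X.shape[0]
    Phi_r = np.stack([ula_steering(Nr, th) for th in aoas], axis=1)
    Phi_t = np.stack([ula_steering(Nt, ph) for ph in aods], axis=1)
    # column-wise Khatri-Rao product: Q^o = (X^T Phi_t^*) "odot" (W^H Phi_r)
    A, B = X.T @ Phi_t.conj(), W.conj().T @ Phi_r
    Q = np.concatenate([np.kron(A[:, [l]], B[:, [l]])
                        for l in range(A.shape[1])], axis=1)
    # LS estimate of the path gains, as in (37)
    h_v = np.linalg.lstsq(np.sqrt(P) * Q, y, rcond=None)[0]
    return Phi_r @ np.diag(h_v) @ Phi_t.conj().T
```

With noiseless observations and more measurements than paths, the LS step recovers the gains exactly, so the reconstruction matches the true channel.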

4.5 Analysis of computational complexity

Evidently, the complexity of the DGAMP-SBL algorithm is dominated by the E-step, and a large part of it comes from the matrix multiplications by \({{\mathbf {S}}}\), \({{{\mathbf {S}}}^T}\), \({{{\varvec{\Phi }} }}\) and \({{{{\varvec{\Phi }} }}^T}\) at each iteration. The complexity of each iteration is \(O(4 \cdot {N_1}{N_2}({N_x}{N_y}))\), since the complex signal must be converted to a real signal. It is worth noting that separating the operations can reduce the single-pass time, so the coefficient 4 can also be neglected. Moreover, as mentioned above, the multiplication operations in Algorithm 1 are taken element-wise. This complexity is much smaller than \(O\left( {{N_1}{N_2}{{({N_x}{N_y})}^2}} \right)\), the complexity of an SBL iteration, when the dimension \({N_x}{N_y}\) is large. The refinement stage requires no iteration, so its complexity can be neglected. For the OMP scheme, the main computational complexity is \(O( {N_1}{N_2}({N_x}{N_y}))\), which lies in the correlation operation. According to [16], a preprocessing step is proposed to greatly reduce the computational cost of each iteration. For the iterative reweight (IR) scheme, the main computational cost, contributed by the gradient operation, is \(O(L^2 \cdot ({N_x}{N_y}({N_r+N_t})))\). Actually, according to the authors' description in [16], the first L maximum-correlation angles extracted by preprocessing alone are likely to miss the real angles, especially when L is large. This means the first 2L maximum-correlation angles are often taken, so that the computational complexity can be written as \(O(4L^2 \cdot ({N_x}{N_y}({N_r+N_t})))\). Finally, the total complexity of the least-squares (LS) algorithm is \(O\left( ({N_x}{N_y})^2({N_r}{N_t})+({N_x}{N_y})^3\right)\). Due to the large antenna dimension of the channel, it is difficult to use LS for mmWave channel estimation.

From the above analysis, the computational complexity of DGAMP-SBL is proportional to \({{N_1}{N_2}{{({N_x}{N_y})}}}\), which is similar to that of the OMP algorithm. Obviously, for large \(N_xN_y\), the computational cost of DGAMP-SBL is much smaller than that of SBL. When L is small, the computational complexity of IR is the smallest, but as L becomes larger, the complexity of IR approaches that of DGAMP-SBL and OMP.
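For concreteness, the leading-order cost expressions above can be evaluated at the simulation sizes used in Sect. 5; the path count `L` is an assumed value, and constants and lower-order terms are dropped, so these are order-of-magnitude figures only.

```python
# Leading-order per-iteration operation counts from the complexity analysis,
# at N1 = N2 = 120, Nx = Ny = 32, Nr = Nt = 64 (the values used in Sect. 5).
N1 = N2 = 120
Nx = Ny = 32
Nr = Nt = 64
L = 4  # an assumed number of paths

cost = {
    "DGAMP-SBL": N1 * N2 * (Nx * Ny),
    "SBL":       N1 * N2 * (Nx * Ny) ** 2,
    "OMP":       N1 * N2 * (Nx * Ny),
    "IR":        4 * L ** 2 * (Nx * Ny) * (Nr + Nt),
    "LS":        (Nx * Ny) ** 2 * (Nr * Nt) + (Nx * Ny) ** 3,
}
for name, c in sorted(cost.items(), key=lambda kv: kv[1]):
    print(f"{name:>9}: {c:.2e}")
```

At these sizes the SBL count exceeds the DGAMP-SBL count by exactly a factor of \(N_xN_y\), and IR at small L is indeed the cheapest, consistent with the discussion above.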

5 Results and discussion

In this section, we demonstrate the performance superiority of the proposed channel estimation scheme through MATLAB simulations with the following parameters. The transmitter and the receiver are equipped with \({N_r} = {N_t} \in \left\{ {32,64} \right\}\), \(N_{\mathrm{RF}}=N_t^{\mathrm{RF}}=N_r^{\mathrm{RF}} = 2\), \({N_x} = {N_y} \in \left\{ {24,32} \right\}\) and \({N_1} = {N_2} = 120\). Each entry of the transmitted pilots \({{\mathbf {X}}}\) is defined as \({x_{i,j}} = \sqrt{{\rho \big / {{N_t}}}} {\mathrm{{e}}^{j{\beta _{i,j}}}}\), where \(\rho\) is the transmitted power and \({\beta _{i,j}}\) is a random phase uniformly distributed in \(\left[ {0,\left. {2\pi } \right) } \right.\). The signal-to-noise ratio (SNR) is defined by SNR\(= \frac{{\rho \sigma _\alpha ^2}}{{\sigma ^2}}\). We consider the ULA geometry, and we take the following algorithms for comparison: the OMP-based channel estimation [44], the IR-based super-resolution channel estimation scheme [16], the Oracle estimator in [44] and the LS estimator. The normalized mean-squared error (NMSE), denoted as \(E\left[ {{{\left\| {{{\mathbf {H}}} - {{\hat{\mathbf {H}}}}} \right\| _F^2} \big / {\left\| {{\mathbf {H}}} \right\| _F^2}}} \right]\), is used for performance comparison. The achievable spectral efficiency (ASE) for the channel estimate \({\hat{\mathbf {H}}}\) is defined as \({\log _2}\left| {{{{\mathbf {I}}}_{{N_{_{\mathrm{RF}}}}}} + \tfrac{{ P}}{{{N_{_{\mathrm{RF}}}}}}R_n^{ - 1}{{{{\tilde{\mathbf {W}}}}}^H}{{{\mathbf {H}}\tilde{{\mathbf {F}}}}}{{{{\tilde{{\mathbf {F}}}}}}^H}{{{\mathbf {H}}}^H}{{\tilde{\mathbf {W}}}}} \right|\), where \(\tilde{{\mathbf {F}}}\) and \(\tilde{{\mathbf {W}}}\) are the optimal precoder and combiner, respectively, designed via the singular value decomposition of \({\hat{\mathbf {H}}}\), denoted as \({\hat{ {{\mathbf {H}}} }= {\hat{{\mathbf {U}}}}{\hat{\mathbf \Sigma }} {\hat{{\mathbf {V}}}}^H}\).
Here the diagonal entries of the singular value matrix \({\hat{{\varvec{\Sigma }}}}\) are arranged in decreasing order, and \(\tilde{{\mathbf {F}}}\) and \(\tilde{{\mathbf {W}}}\) consist of the first \(N_{\mathrm{RF}}\) columns of \({\hat{{\mathbf {V}}}}\) and \({\hat{{\mathbf {U}}}}\), respectively. The noise covariance \(\mathbf {R}_{\mathbf {n}}\) is defined as \(\mathbf {R}_{\mathbf {n}}\doteq \sigma ^2 \tilde{{\mathbf {W}}}^H \tilde{{\mathbf {W}}}\). The results in this simulation are all obtained through 200 Monte Carlo experiments [45, 46].
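The two performance metrics can be sketched as follows; this is a single-realization version (the paper averages over Monte Carlo trials), and `P` and `sigma2` are assumed parameter names for the transmit power and noise variance.

```python
import numpy as np

def nmse(H, H_hat):
    """Normalized MSE ||H - H_hat||_F^2 / ||H||_F^2 for one realization."""
    return np.linalg.norm(H - H_hat, "fro") ** 2 / np.linalg.norm(H, "fro") ** 2

def ase(H, H_hat, n_rf=2, P=1.0, sigma2=0.1):
    """Achievable spectral efficiency with the SVD-based precoder/combiner
    taken from the estimate H_hat, evaluated on the true channel H."""
    U, _, Vh = np.linalg.svd(H_hat)
    W = U[:, :n_rf]                  # combiner: first n_rf left singular vecs
    F = Vh.conj().T[:, :n_rf]        # precoder: first n_rf right singular vecs
    Rn = sigma2 * W.conj().T @ W     # effective noise covariance
    G = W.conj().T @ H @ F           # effective channel after precoding
    M = np.eye(n_rf) + (P / n_rf) * np.linalg.inv(Rn) @ G @ G.conj().T
    return float(np.real(np.log2(np.linalg.det(M))))
```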

Fig. 4

NMSE performance comparison of different schemes against SNR, when \({N_x} ={N_y}=32\) and \({N_t} = {N_r} = 64\)

Fig. 5

NMSE performance comparison of different schemes against SNR, when \({N_x} ={N_y}=24\) and \({N_t} = {N_r} = 32\)

Fig. 6

NMSE performance against the number of training beams when \({N_y} = 32\) and \({N_t} = {N_r} = 64\)

Fig. 7

Running time against the number of grids when \({N_y} = 32\) and \({N_t} = {N_r} = 64\)

Fig. 8

NMSE performance against the number of grids when \({N_x} = {N_y} = 32\) and \({N_t} = {N_r} = 64\)

Fig. 9

NMSE performance against L when \({N_x} = {N_y} = 32\), \({N_t} = {N_r} = 64\) and \(N_r^{\mathrm{RF}}=N_t^{\mathrm{RF}}=2\)

Fig. 10

ASE performance against SNR when \({N_x} = {N_y} = 32\), \({N_t} = {N_r} = 64\) and \(N_r^{\mathrm{RF}}=N_t^{\mathrm{RF}}=2\)

Figures 4 and 5 compare the NMSE performance against SNR with \({N_r} = {N_t} = 32\), \({N_x} = 24\) and \({N_r} = {N_t} = 64\), \({N_x} = 32\), respectively. In both scenarios, the curves of the proposed algorithm lie below those of the comparison algorithms under most circumstances, indicating a remarkable performance improvement. Note that our proposed scheme almost coincides with the Oracle curve at high SNR. Under the same SNR, our algorithm achieves a large advantage because it accounts for the off-grid influence and can therefore estimate more accurate angles. As shown in Fig. 4, our proposed algorithm achieves a clear advantage for all SNRs, and as the SNR increases, its performance advantage over OMP grows. Moreover, our proposed algorithm is superior to the IR algorithm and almost coincides with the Oracle curve. The NMSE performance degrades at low SNR levels because of noise interference: at low SNR, the coarse estimation stage of our proposed algorithm is challenged by both noise and off-grid error, so the performance does not meet expectations. Even when the numbers of antennas and pilots decrease, our proposed algorithm still has superior performance. Taking SNRs of 10dB, 15dB and 20dB as examples, our proposed algorithm remains very close to the Oracle curve. It can be seen that the channel estimation accuracy of our proposed algorithm is better than that of the comparison algorithms, which verifies the advantages of the proposed scheme. To summarize, the proposed scheme is superior to the other schemes in channel estimation accuracy regardless of the numbers of antennas and pilots.

Figure 6 compares the NMSE performance against the number of training beams when SNR=10dB and \({N_t} = {N_r}=64\). As the number of training beams \({N_x}\) increases, the NMSE curves of all algorithms decrease monotonically. It is obvious that the curves of the comparison algorithms tend to decrease slowly. Even so, our proposed algorithm still achieves a clear advantage under different numbers of pilots. When the SNR is 10dB, our proposed algorithm achieves high estimation accuracy and is very close to the Oracle curve. We have also observed that our algorithm may obtain an accurate result when \(N_x\) is very small, although a probability of failure exists compared with large \(N_x\). Again, among the compared algorithms, the proposed algorithm still outperforms the others.

Figure 7 compares the difference in runtime between the original SBL and the proposed scheme. To facilitate comparison, we fix the number of iterations at 200. Note that although we apply 200 iterations, the algorithm typically converges within 100 iterations or even fewer in most cases. It is obvious that the proposed algorithm needs much less time than the original SBL algorithm as the dimension increases, which suggests that the proposed algorithm is more practical for large-size sparse matrix recovery.

Figure 8 shows the NMSE curves against the number of grids \(N_1=N_2\) when the SNR is 10 dB. We refine the grid and compare the performance of the different algorithms at different grid spacings. To show the advantage of the second-stage refinement, we add the unrefined DGAMP-SBL to the comparison. It is obvious that the performance of the refined results is greatly improved at any grid spacing. The proposed algorithm outperforms IR when \(N_1=N_2\ge 70\); beyond this point, the NMSE of the proposed algorithm decreases only slightly. The NMSE curves of the grid-based sparse algorithms decrease monotonically as the number of grids increases. Interestingly, the number of grids only needs to be set to about 80 to balance computational complexity and estimation accuracy. For comparison, the NMSE curves of the Oracle, LS and IR algorithms remain unchanged for different numbers of grids since all of them are off-grid methods.

Figure 9 compares the NMSE curves against the number of paths. When L varies from 2 to 12, the curve of the IR scheme is basically stable since it can still find the most relevant columns. Nevertheless, the number of iterations increases greatly, and as mentioned in Sect. 4.5, the calculation cost of the IR scheme grows in proportion to \(4L^2\). When L is large, the computational cost of a single IR calculation is much larger than that of our algorithm at the same estimation accuracy. For our proposed algorithm, the NMSE curve rises slightly. This is because when L is large, the small adjacent angular distance may lead to ambiguity between adjacent peaks, which affects the performance of the second-stage estimation. It is worth noting that our algorithm still has good estimation ability and performs much better than the comparison algorithm OMP.

Figure 10 compares the ASE against SNR when different channel estimation schemes are used with \({N_r} = {N_t} = 64\) and \({N_x} = 32\). We introduce perfect CSI as a reference upper-bound curve. Although the performance of our proposed algorithm is slightly worse than that of the IR scheme at -5dB, the ASE curve of the proposed scheme almost overlaps that of perfect CSI at high SNR values.

6 Conclusion

In this paper, we have proposed a high-precision channel estimation scheme for mmWave massive MIMO with hybrid precoding. Specifically, a two-stage strategy is used to estimate the channel. In the first stage, the DGAMP-SBL algorithm, which simplifies the operations, is used to estimate the mmWave MIMO channel coarsely. In the second stage, the previously obtained channel estimate, which suffers from off-grid error, is refined by a 2D-DFT interpolation algorithm. Experimental results have indicated that the proposed high-precision channel estimation strategy outperforms state-of-the-art schemes in estimation precision. For future work, we will focus on dimension reduction of the 2D high-dimensional matrix in sparse channel estimation, since the sparse recovery problem is still limited by the huge dimension of the 2D sparse dictionary matrix.