1 Introduction

Many researchers regard multiple input multiple output (MIMO) as a candidate technology for fifth-generation mobile networks (5G). With spatial multiplexing, antennas transmit (TX) independent data streams simultaneously to improve the system throughput [1, 2]. With orthogonal frequency-division multiplexing (OFDM), overlapping but orthogonal subchannels are introduced to convert a frequency-selective fading channel into a set of frequency-flat fading channels for easier equalization [3]. From another perspective, OFDM can resist inter-symbol interference (ISI) by appending cyclic prefix (CP) samples in front of OFDM samples.

In single-antenna OFDM systems and other multicarrier systems, three tasks must generally be handled: frequency synchronization, timing synchronization, and channel estimation. MIMO combined with OFDM (MIMO-OFDM) is extremely sensitive to frequency synchronization and channel estimation errors [4]. Carrier frequency offsets (CFOs) are induced by local oscillator mismatches between transmitters and receivers as well as by Doppler shifts [5]. CFOs break subcarrier orthogonality, which causes intercarrier interference and can degrade system performance [6]. Compared with frequency synchronization and channel estimation, timing synchronization is less critical because CP insertion provides some tolerance to timing errors [7]. In this paper, we assume that the timing offsets are equal to zero and focus on frequency synchronization and channel estimation.

To achieve high transmission quality, the frequency offset must be estimated and compensated at the receiver [8]. Joint estimation of the channel and the frequency offset is difficult in MIMO-OFDM systems because of the large number of unknown parameters. In some estimation algorithms, CFO and channel estimation are treated separately by using different training sequences [9,10,11,12]. In these schemes, channel estimation is performed assuming that the CFO is zero or that frequency synchronization has already been achieved. However, such assumptions are rarely valid in practice in the presence of noise [9, 10].

In other schemes, to save bandwidth, joint estimation of the channel and frequency offset is attempted. A prohibitively high computational complexity is required to obtain maximum likelihood (ML) solutions for both the frequency offset and channel impulse response (CIR) [13]. These estimators in [3, 5,6,7, 9,10,11,12,13] are designed for single-input single-output OFDM systems and not MIMO-OFDM systems. In [14, 15], the channel estimation of MIMO-OFDM systems was performed assuming no CFO imbalance or perfect frequency synchronization with the training sequences. In [16, 17], the proposed algorithms for CFO estimation were executed assuming an estimated channel and negligible channel effect. In [18], a method based on the alternating projection algorithm was proposed for ML synchronization and channel estimation in orthogonal frequency-division multiple access uplink transmission. In [19,20,21,22,23], joint estimation algorithms have been proposed for time–frequency synchronization with channel identification in MIMO-OFDM systems. However, all these algorithms have rather high complexity. In [24,25,26], an iterative method was used to reduce the complexity. However, in these schemes in [24,25,26], the CFO estimator must use point search, which may assume a search range with a small interval to achieve low complexity. Moreover, the methods presented in [18,19,20,21,22,23,24,25,26] do not provide the ability to track time-varying CFOs and time-varying channels simultaneously.

This paper addresses the problem of joint CFO and channel estimation in MIMO-OFDM systems for mobile users, in which all subcarriers can be utilized simultaneously. The complexity of joint ML estimation through a grid search procedure motivated us to develop a new iterative algorithm [27] that provides ML solutions to the joint estimation problem in an iterative manner. The proposed algorithm first obtains an initial CFO estimate by employing a derotation algorithm. It then uses this initial value to apply the frequency-domain equalizer and performs the derotation operation again for better estimation. Subsequently, an iterative method is adopted to improve estimation accuracy. To benchmark the performance of the proposed scheme, Cramér-Rao lower bounds (CRBs) were derived for the CFO and the CIR. Finally, the CFO was estimated through a proposed parameterized adaptive process that tracks channels and CFOs simultaneously. The simulation results indicated the suitable performance of the proposed adaptive iteration estimator. The computational complexity of the proposed algorithm is lower than that of the grid search-based method. Our contributions are as follows: (a) joint CFO and channel estimation in MIMO-OFDM systems for mobile users, (b) a new iterative algorithm with lower complexity than the grid search-based method, and (c) a newly designed mechanism with an adaptive mode to simultaneously track the time-varying CFOs and time-varying channels. At present, the proposed adaptive iteration algorithm is a state-of-the-art approach for achieving near-optimal performance for this joint CFO and channel estimation problem. To the best of our knowledge, no previous study, including [18,19,20,21,22,23,24,25,26], has investigated the joint CFO and channel estimation problem with simultaneous tracking of the time-varying CFOs and time-varying channels.

The remainder of this paper is organized as follows. Section 2 presents the system model. Section 3 describes the development of the adaptive iterative CFO estimation algorithm. Section 4 presents the simulation results, which indicate that the performance of the proposed algorithm is close to the CRBs. Finally, Section 5 reports the conclusions.

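Notation: The superscripts \( {\left(\cdot \right)}^{\mathrm{T}} \), \( {\left(\cdot \right)}^{-1} \), and \( {\left(\cdot \right)}^H \) represent the transpose, inverse, and Hermitian transpose, respectively, and \( \operatorname{diag}{\left(\cdot \right)}_{K\times K} \) denotes a K × K diagonal matrix. The K × K identity matrix is denoted by \( {\mathbf{I}}_{K\times K} \).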

2 System model

Consider a MIMO-OFDM communication system with K subcarriers as well as N transmitting (TX) and M receiving (RX) antennas (Fig. 1).

Fig. 1 Transmitter and receiver structure of N × M MIMO-OFDM system

The model of each transmitter structure of the MIMO-OFDM system is illustrated in Fig. 2 [28].

Fig. 2 Each transmitter structure of preamble MIMO-OFDM system

At the jth transmitter, the preamble OFDM signal in the baseband time domain obtained after performing inverse discrete Fourier transform and CP insertion can be denoted as follows [29]:

$$ {x}_j(k)=\left\{\begin{array}{ll}\frac{1}{\sqrt{K}}\sum \limits_{n=0}^{K-1}{d}_j(n){e}^{j2\pi nk/K},& \mathrm{if}\ -{N}_g\le k\le K-1\\ {}0,& \mathrm{otherwise}\end{array}\right. $$
(1)

where \( {d}_j(n) \) is the frequency-domain data symbol of the jth transmitter on the nth subcarrier (n = 0, 1, …, K − 1) and \( {N}_g \) is the number of CP samples of the OFDM symbol, which are used for ISI resistance. The discrete-time composite CIR between the jth transmitter and the ith RX antenna with L multipaths is represented as follows:

$$ {\mathbf{h}}_{i,j}={\left[{h}_{i,j}(0),{h}_{i,j}(1),\dots, {h}_{i,j}\left(L-1\right)\right]}^{\mathrm{T}},\kern0.5em 1\le i\le M,1\le j\le N $$
(2)

where \( {h}_{i,j}(l) \), l = 0, 1, …, L − 1, is the complex Gaussian gain of the lth multipath. We assume the complex CIR is time-invariant over one OFDM symbol. Thus, the superposition of signals from all TX antennas plus noise is received at each RX antenna. Assume that all the received signals are down-converted to baseband with the same local oscillator centered at \( {f}_c \). In the presence of CFO ε, the samples at the ith RX antenna can be represented as follows:

$$ {y}_i(m)={e}^{j\left(2\pi \varepsilon /K\right)m}\sum \limits_{j=1}^N\sum \limits_{l=0}^{L-1}{h}_{i,j}(l)\cdot {x}_j\left(m-l\right)+{v}_i(m),i=1,\dots, M $$
(3)

where \( {v}_i(m) \) is additive white complex Gaussian noise with a mean of 0 and variance of \( {\sigma}_{v_i}^2 \). Moreover, ε denotes the CFO normalized to the subcarrier spacing [5]. To avoid ISI, we assume that the timing offset is equal to 0 and that the number of multipaths (L) is smaller than the CP length.

According to (3), the received signal vector \( {\mathbf{r}}_i={\left[{y}_i(0),{y}_i(1),\dots, {y}_i\left(K-1\right)\right]}^{\mathrm{T}} \) at the ith RX antenna can be used to rewrite the input–output relationship as follows:

$$ {\mathbf{r}}_i=\mathbf{D}\left(\varepsilon \right)\sum \limits_{j=1}^N{\mathbf{X}}_j{\mathbf{h}}_{i,j}+{\mathbf{v}}_i=\mathbf{D}\left(\varepsilon \right){\mathbf{X}\mathbf{h}}_i+{\mathbf{v}}_i,\kern0.5em i=1,\dots, M $$
(4)

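where \( \mathbf{D}\left(\varepsilon \right)=\operatorname{diag}{\left(1,{e}^{j2\pi \varepsilon /K},\dots, {e}^{j2\pi \left(K-1\right)\varepsilon /K}\right)}_{K\times K} \) is a diagonal CFO matrix, \( {\mathbf{X}}_j \) is a K × L circulant matrix formed by the jth transmitted signal \( {x}_j(k) \), and \( {\mathbf{v}}_i \) is a K × 1 noise vector with covariance matrix \( {\sigma}_{v_i}^2{\mathbf{I}}_{K\times K} \). The matrices \( {\mathbf{X}}_j \) and \( \mathbf{X} \) and the vector \( {\mathbf{h}}_i \) are defined as follows: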

$$ {\mathbf{X}}_j={\left[\begin{array}{cccc}{x}_j(0)& {x}_j\left(K-1\right)& \cdots & {x}_j\left(1-L\right)\\ {}{x}_j(1)& {x}_j(0)& \cdots & {x}_j\left(2-L\right)\\ {}{x}_j(2)& {x}_j(1)& \cdots & {x}_j\left(3-L\right)\\ {}\vdots & \vdots & \ddots & \vdots \\ {}{x}_j\left(K-1\right)& {x}_j\left(K-2\right)& \cdots & {x}_j\left(K-L\right)\end{array}\right]}_{K\times L},j=1,\dots, N\;\boldsymbol{X}={\left[{\boldsymbol{X}}_{1\kern0.5em }\cdots \kern0.5em {\boldsymbol{X}}_{N\kern0.5em }\;\right]}_{K\times N\cdot L};{\mathbf{h}}_i={\left[{\mathbf{h}}_{i,1}^{\mathrm{T}},{\mathbf{h}}_{i,2}^{\mathrm{T}},\dots, {\mathbf{h}}_{i,N}^{\mathrm{T}}\right]}_{N\cdot L\times 1}^{\mathrm{T}}. $$
(5)
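The construction in (5) can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names `circulant_block`, `stack_tx`, and `cfo_matrix` and the small random test signal are assumptions introduced here, and the cyclic indexing relies on the CP making negative sample indices wrap around.

```python
import numpy as np

def circulant_block(x_j: np.ndarray, L: int) -> np.ndarray:
    """K x L matrix X_j of (5): entry (m, l) is x_j((m - l) mod K)."""
    K = len(x_j)
    rows = np.arange(K)[:, None]
    cols = np.arange(L)[None, :]
    return x_j[(rows - cols) % K]

def stack_tx(x: np.ndarray, L: int) -> np.ndarray:
    """x has shape (N, K); returns X = [X_1 ... X_N] with shape (K, N*L)."""
    return np.hstack([circulant_block(x_j, L) for x_j in x])

def cfo_matrix(eps: float, K: int) -> np.ndarray:
    """Diagonal CFO matrix D(eps) of (4)."""
    return np.diag(np.exp(2j * np.pi * eps * np.arange(K) / K))

# Hypothetical small example: N = 2 transmitters, K = 8 samples, L = 3 taps.
rng = np.random.default_rng(0)
N, K, L = 2, 8, 3
x = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
h = rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L))
X = stack_tx(x, L)
# Noise-free received vector of (4); h.reshape(-1) stacks h_{i,1}, ..., h_{i,N}.
r_clean = cfo_matrix(0.1, K) @ X @ h.reshape(-1)
```

With this layout, the product of `X` and the stacked channel vector equals the sum over transmitters of the circular convolution of each transmitted signal with its channel, which is the noise-free, zero-CFO part of (3).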

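Let \( {\mathbf{s}}_i=\mathbf{D}\left(\varepsilon \right){\mathbf{Xh}}_i \) with \( {\mathbf{s}}_i={\left[{s}_i(0),\dots, {s}_i\left(K-1\right)\right]}^{\mathrm{T}} \), so that the signal-to-noise ratio (SNR) is defined [10] as follows: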

$$ \mathrm{SNR}={\sigma}_{s_i}^2/{\sigma}_{v_i}^2 $$
(6)

where

$$ {\sigma}_{{\mathbf{s}}_i}^2=\frac{1}{K}\sum \limits_{k=0}^{K-1}{\left|{s}_i(k)\right|}^2\kern1.5em ,\mathrm{k}=0,1,\dots, K-1. $$
(7)
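As a rough illustration of how (6) and (7) fix the noise level for a target SNR, the sketch below computes the noise variance from the noise-free samples and draws complex Gaussian noise with that variance. Expressing the SNR in decibels and the helper names `noise_variance` and `add_noise` are assumptions made for this example.

```python
import numpy as np

def noise_variance(s_i: np.ndarray, snr_db: float) -> float:
    """Noise variance that yields the requested SNR for the samples s_i."""
    signal_power = np.mean(np.abs(s_i) ** 2)           # sigma_s^2 from (7)
    return float(signal_power / 10 ** (snr_db / 10))   # (6) solved for sigma_v^2

def add_noise(s_i: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add circularly symmetric complex Gaussian noise at the requested SNR."""
    var = noise_variance(s_i, snr_db)
    # Real and imaginary parts each carry half of the total variance.
    v = np.sqrt(var / 2) * (rng.standard_normal(s_i.shape)
                            + 1j * rng.standard_normal(s_i.shape))
    return s_i + v
```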

3 Proposed joint CFO and channel iterative estimation algorithm

3.1 Receiver design

The processing diagram at the receiver for the proposed algorithm is displayed in Fig. 3.

Fig. 3 Processing diagram of the proposed MIMO-OFDM receiver structure

In the common RX structure used in OFDM transmission, a frequency-domain equalizer is adopted to eliminate multipath interference. The coefficients of the equalizer are obtained through channel estimation. The equalized results are fed back to the time domain, and the estimation steps are run again to obtain more accurate CFO estimates. The main idea is that the proposed algorithm uses the CFO estimated in the previous stages as a rough estimate for the following iteration. Through small-step iterative searching, we can find the maximum peak near the actual CFO as the fine-tuned estimate. Finally, the CFO adaptation process is performed using the final fine-tuned CFO estimate to improve the performance of the proposed estimator in tracking time-varying parameters. These steps are described in detail in the following subsections.

3.2 Initial CFO estimation

According to (4) and the Gaussian probability density function, the likelihood function can be written as follows:

$$ \Lambda \left({\mathbf{h}}_i,\varepsilon \right)=\frac{1}{{\left({\pi \sigma}_{v_i}^2\right)}^K}{e}^{-\frac{1}{\sigma_{v_i}^2}{\left\Vert {\mathbf{r}}_i-\mathbf{D}\left(\varepsilon \right){\mathbf{Xh}}_i\right\Vert}^2}. $$
(8)

Then, the log-likelihood function for two unknown parameters can be expressed as follows:

$$ {L}_i\left(\varepsilon, {\mathbf{h}}_i\right)=-K\ln \left({\pi \sigma}_{{\mathbf{v}}_i}^2\right)-\frac{1}{\sigma_{{\mathbf{v}}_i}^2}{\left\Vert {\mathbf{r}}_i-\mathbf{D}\left(\varepsilon \right){\mathbf{Xh}}_i\right\Vert}^2. $$
(9)

Because the first term of (9) is a constant, maximizing the log-likelihood function is equivalent to minimizing the following objective function, which is the squared norm in the second term of (9):

$$ \phi \left(\varepsilon, {\mathbf{h}}_i\right)\overset{\varDelta }{=}{\left\Vert {\mathbf{r}}_i-\mathbf{D}\left(\varepsilon \right){\mathbf{Xh}}_i\right\Vert}^2={\left({\mathbf{r}}_i-\mathbf{D}\left(\varepsilon \right){\mathbf{Xh}}_i\right)}^H\left({\mathbf{r}}_i-\mathbf{D}\left(\varepsilon \right){\mathbf{Xh}}_i\right). $$
(10)

We expand (10), take the partial derivative with respect to \( {\mathbf{h}}_i \), and set the first-order derivative equal to 0 to obtain \( {\tilde{\mathbf{h}}}_i \) as follows:

$$ {\tilde{\mathbf{h}}}_i={\left({\mathbf{X}}^H\mathbf{X}\right)}^{-1}{\mathbf{X}}^H{\mathbf{D}}^H\left(\varepsilon \right){\mathbf{r}}_i. $$
(11)
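A minimal NumPy sketch of the least-squares channel estimate in (11) is given below. The function name `ls_channel` and the random stand-in for the preamble matrix are assumptions for illustration; `numpy.linalg.lstsq` is used in place of forming the inverse explicitly, which gives the same solution when X has full column rank.

```python
import numpy as np

def ls_channel(r_i: np.ndarray, X: np.ndarray, eps: float) -> np.ndarray:
    """Channel estimate of (11) for a given normalized CFO eps."""
    K = len(r_i)
    # D^H(eps) r_i: remove the CFO-induced phase ramp from the received samples.
    derotated = np.exp(-2j * np.pi * eps * np.arange(K) / K) * r_i
    # Least-squares solution, equal to (X^H X)^{-1} X^H D^H(eps) r_i.
    h_hat, *_ = np.linalg.lstsq(X, derotated, rcond=None)
    return h_hat

# Hypothetical noise-free check with a random K x (N*L) stand-in for X.
rng = np.random.default_rng(1)
K, NL, eps = 16, 4, 0.2
X = rng.standard_normal((K, NL)) + 1j * rng.standard_normal((K, NL))
h_true = rng.standard_normal(NL) + 1j * rng.standard_normal(NL)
r = np.exp(2j * np.pi * eps * np.arange(K) / K) * (X @ h_true)
h_hat = ls_channel(r, X, eps)
```

In this noise-free setting with the correct CFO, `h_hat` reproduces `h_true` up to floating-point error; with a wrong CFO the residual phase ramp leaks into the estimate, which is why the CFO must be estimated jointly.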

Substituting (11) into (10) shows that minimizing the objective function (10) is equivalent to maximizing

$$ {\overline{\phi}}_1\left(\varepsilon \right)={{\mathbf{r}}_i}^H\mathbf{D}\left(\varepsilon \right)\mathbf{X}{\left({\mathbf{X}}^H\mathbf{X}\right)}^{-1}{\mathbf{X}}^H{\mathbf{D}}^H\left(\varepsilon \right){\mathbf{r}}_i. $$
(12)

The estimation of ε for the ith RX antenna can be obtained as follows:

$$ {\hat{\varepsilon}}_i=\underset{\varepsilon }{\arg}\left\{\max \left[{{\mathbf{r}}_i}^H\mathbf{D}\left(\varepsilon \right)\mathbf{X}{\left({\mathbf{X}}^H\mathbf{X}\right)}^{-1}{\mathbf{X}}^H{\mathbf{D}}^H\left(\varepsilon \right){\mathbf{r}}_i\right]\right\}. $$
(13)
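For reference, the maximization in (13) can be sketched as a brute-force grid search. This is only an illustration of the baseline that the proposed method tries to avoid; the names `cfo_objective` and `grid_search_cfo`, the grid resolution, and the random stand-in for X are assumptions.

```python
import numpy as np

def cfo_objective(r_i: np.ndarray, P: np.ndarray, eps: float) -> float:
    """Objective (12): r^H D(eps) P D^H(eps) r, with P the projection matrix."""
    K = len(r_i)
    z = np.exp(-2j * np.pi * eps * np.arange(K) / K) * r_i   # D^H(eps) r_i
    return float(np.real(np.vdot(z, P @ z)))                 # z^H P z

def grid_search_cfo(r_i: np.ndarray, X: np.ndarray, grid: np.ndarray) -> float:
    """Return the grid point that maximizes (12), i.e. a coarse version of (13)."""
    # For full-column-rank X, X @ pinv(X) equals X (X^H X)^{-1} X^H.
    P = X @ np.linalg.pinv(X)
    values = [cfo_objective(r_i, P, e) for e in grid]
    return float(grid[int(np.argmax(values))])

# Hypothetical noise-free example with a true CFO that lies on the grid.
rng = np.random.default_rng(0)
K, NL, eps_true = 16, 4, 0.13
X = rng.standard_normal((K, NL)) + 1j * rng.standard_normal((K, NL))
h = rng.standard_normal(NL) + 1j * rng.standard_normal(NL)
r = np.exp(2j * np.pi * eps_true * np.arange(K) / K) * (X @ h)
eps_hat = grid_search_cfo(r, X, np.linspace(-0.5, 0.5, 101))
```

Each grid point costs one K × K quadratic form, so a fine grid quickly becomes expensive.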

Equation (13) can be solved by performing a grid search over ε to find the optimal CFO. However, this process requires many computations and is thus difficult to implement. The projection matrix \( \mathbf{X}{\left({\mathbf{X}}^H\mathbf{X}\right)}^{-1}{\mathbf{X}}^H \) is determined by the preamble signal transmitted in each frame. Its inner Gram matrix \( {\mathbf{X}}^H\mathbf{X} \) can be represented as follows:

$$ {\displaystyle \begin{array}{l}{\mathbf{X}}^H\mathbf{X}=\\ {}{\left[\begin{array}{ccccc}{R}_1& {C}_{1,2}& \dots & {C}_{1,N\cdot L-1}& {C}_{1,N\cdot L}\\ {}{C}_{2,1}& {R}_2& \dots & {C}_{2,N\cdot L-1}& {C}_{2,N\cdot L}\\ {}\vdots & \vdots & \ddots & \vdots & \vdots \\ {}{C}_{N\cdot L-1,1}& {C}_{N\cdot L-1,2}& \dots & {R}_{N\cdot L-1}& {C}_{N\cdot L-1,N\cdot L}\\ {}{C}_{N\cdot L,1}& {C}_{N\cdot L,2}& \dots & {C}_{N\cdot L,N\cdot L-1}& {R}_{N\cdot L}\end{array}\right]}_{N\cdot L\times N\cdot L}.\end{array}} $$
(14)

The main-diagonal entries of \( {\mathbf{X}}^H\mathbf{X} \) are real-valued and are denoted as \( {R}_i \), i = 1, …, N·L. The off-diagonal entries are denoted as \( {C}_{i,j} \), i = 1, …, N·L, j = 1, …, N·L, i ≠ j, and are conjugate symmetric about the main diagonal, that is, \( {C}_{i,j}={C}_{j,i}^{\ast } \). The matrix \( \mathbf{X}{\left({\mathbf{X}}^H\mathbf{X}\right)}^{-1}{\mathbf{X}}^H \) has a similar structure:

$$ {\displaystyle \begin{array}{l}\mathbf{X}{\left({\mathbf{X}}^H\mathbf{X}\right)}^{-1}{\mathbf{X}}^H=\\ {}{\left[\begin{array}{ccccc}{W}_1& {E}_{1,2}& \dots & {E}_{1,K-1}& {E}_{1,K}\\ {}{E}_{2,1}& {W}_2& \dots & {E}_{2,K-1}& {E}_{2,K}\\ {}\vdots & \vdots & \ddots & \vdots & \vdots \\ {}{E}_{K-1,1}& {E}_{K-1,2}& \dots & {W}_{K-1}& {E}_{K-1,K}\\ {}{E}_{K,1}& {E}_{K,2}& \dots & {E}_{K,K-1}& {W}_K\end{array}\right]}_{K\times K}.\end{array}} $$
(15)

where \( {W}_i \), i = 1, …, K, are real-valued and \( {E}_{i,j} \), i = 1, …, K, j = 1, …, K, i ≠ j, are complex-valued with \( {E}_{i,j}={E}_{j,i}^{\ast } \).
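The structure claimed for (15) can be checked numerically. The short sketch below uses a random full-column-rank matrix as a stand-in for the preamble matrix X (an assumption) and verifies that the projection matrix is Hermitian, has a real diagonal, and is idempotent.

```python
import numpy as np

rng = np.random.default_rng(2)
K, NL = 12, 4
X = rng.standard_normal((K, NL)) + 1j * rng.standard_normal((K, NL))

# P = X (X^H X)^{-1} X^H, computed with a linear solve instead of an explicit inverse.
P = X @ np.linalg.solve(X.conj().T @ X, X.conj().T)

hermitian = np.allclose(P, P.conj().T)           # E_{i,j} = conj(E_{j,i})
real_diag = np.allclose(P.diagonal().imag, 0.0)  # W_i are real-valued
idempotent = np.allclose(P @ P, P)               # defining property of a projection
```

Because the matrix depends only on the preamble, it can be computed once and stored at the receiver.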

By substituting (15) into (13), the estimation of ε for the ith RX antenna can be expressed as follows:

$$ {\displaystyle \begin{array}{l}{\hat{\varepsilon}}_i=\underset{\varepsilon }{\arg \max}\left[{{\mathbf{r}}_i}^H\mathbf{D}\left(\varepsilon \right)\mathbf{X}{\left({\mathbf{X}}^H\mathbf{X}\right)}^{-1}{\mathbf{X}}^H{\mathbf{D}}^H\left(\varepsilon \right){\mathbf{r}}_i\right]\\ {}=\underset{\varepsilon }{\arg \max}\left\{\begin{array}{l}\sum \limits_{k=0}^{K-1}{\left|{r}_i(k)\right|}^2{W}_{k+1}+\\ {}2\operatorname{Re}\left[\begin{array}{l}\sum \limits_{k=0}^{K-2}{r_i}^{\ast}\left(k+1\right){r}_i(k){e}^{j2\pi \varepsilon /K}{E}_{k+2,k+1}+\\ {}\sum \limits_{k=0}^{K-3}{r_i}^{\ast}\left(k+2\right){r}_i(k){e}^{j2\pi (2)\varepsilon /K}{E}_{k+3,k+1}+\cdots +\\ {}\sum \limits_{k=0}^1{r_i}^{\ast}\left(K-2+k\right){r}_i(k){e}^{j2\pi \left(K-2\right)\varepsilon /K}{E}_{K-1+k,k+1}\\ {}+{r_i}^{\ast}\left(K-1\right){r}_i(0){e}^{j2\pi \left(K-1\right)\varepsilon /K}{E}_{K,1}\end{array}\right]\end{array}\right\}.\end{array}} $$
(16)

To find the maximizing value of ε, (16) can be separated into multiple terms. Because the first term in (16) is a fixed real value that does not depend on ε, we only consider the second part. Each term in the second part is a complex value whose phase is rotated by the frequency offset; we can derotate its phase to the real axis so that the frequency offset is compensated and the real part is maximized. The frequency offsets of the K − 1 terms for the ith RX antenna can be obtained as follows:

$$ {\tilde{\varepsilon}}_i(k)=\mathrm{angle}\left(\sum \limits_{a=0}^{K-k-1}{E}_{a+k+1,a+1}{r}_i^{\ast}\left(a+k\right){r}_i(a)\right)\frac{K}{2 k\pi },\kern1em k=1,\dots, K-1. $$
(17)

After finding the phase of each term of (17) associated with the maximum real value, we can average the phases of all the terms for each RX antenna and then use the calculated average values to obtain the initial CFO as follows:

$$ \tilde{\varepsilon}=\frac{1}{M}\sum \limits_{i=1}^M\frac{1}{\left(K-1\right)}\sum \limits_{k=1}^{K-1}{\tilde{\varepsilon}}_i(k). $$
(18)

3.3 Frequency-domain equalizer

The initial CFO performance degrades due to multipath interference and may not be accurate. Therefore, a frequency-domain equalizer is applied to overcome this problem and obtain a more accurate CFO. First, we utilize the initial CFO (18) to find the channel response (11). Second, we use the initial CFO to compensate the received signal, which is then transformed into the frequency domain. A frequency-domain equalizer is employed using the estimated channel response to cancel the multipath interference [30]. The aforementioned operations are expressed as follows:

$$ {\tilde{\mathbf{y}}}_i={\left(\mathbf{D}\left(\tilde{\varepsilon}\right)\right)}^{\mathrm{H}}{\mathbf{r}}_i={\left(\mathbf{D}\left(\tilde{\varepsilon}\right)\right)}^{\mathrm{H}}\left(\mathbf{D}\left(\varepsilon \right)\sum \limits_{j=1}^N{\mathbf{X}}_j{\mathbf{h}}_{i,j}+{\mathbf{v}}_i\right). $$
(19)

If \( \tilde{\varepsilon}=\varepsilon \), we can rewrite the received signal (19) and transform it into the frequency domain as follows:

$$ {\tilde{\boldsymbol{\upgamma}}}_i=\mathbf{F}{\tilde{\mathbf{y}}}_i=\sum \limits_{j=1}^N{\mathbf{H}}_{i,j}{\mathbf{s}}_j+{\mathbf{v}}_i^{\prime } $$
(20)

where \( {\mathbf{H}}_{i,j} \) is the diagonal frequency-domain channel matrix. Specifically, \( {\mathbf{H}}_{i,j}=\mathit{\operatorname{diag}}\left({\overline{\mathbf{h}}}_{i,j}\right) \), where \( {\overline{\mathbf{h}}}_{i,j}={\left[{\overline{h}}_{i,j}(0),{\overline{h}}_{i,j}(1),\dots, {\overline{h}}_{i,j}\left(K-1\right)\right]}^{\mathrm{H}} \) is a K × 1 vector that contains the samples of the K-point discrete Fourier transform of the channel response, and \( {\mathbf{s}}_j \) is the frequency-domain signal of the jth TX antenna. We can rewrite the signal received at all antennas over the kth subcarrier as follows:

$$ {\tilde{\boldsymbol{\upgamma}}}_k={{\left[{\tilde{\boldsymbol{\upgamma}}}_1(k),{\tilde{\boldsymbol{\upgamma}}}_2(k),\dots, {\tilde{\boldsymbol{\upgamma}}}_M(k)\right]}^{\mathrm{T}}}_{M\times 1} $$
$$ {\tilde{\boldsymbol{\upgamma}}}_k={\tilde{\mathbf{H}}}_k{\tilde{\mathbf{s}}}_k+{\mathbf{v}}_k,\kern0.5em k=0,1,\dots, K-1 $$
(21)

where \( {\tilde{\mathbf{H}}}_k\in {\mathrm{\mathbb{C}}}^{M\times N} \) is the MIMO channel matrix over the kth subcarrier and \( {\tilde{\mathbf{s}}}_k\in {\mathrm{\mathbb{C}}}^{N\times 1} \) is the signal transmitted over the kth subcarrier. According to the signal at each subcarrier, we can equalize the compensated received signals as follows:

$$ {\hat{\tilde{\mathbf{s}}}}_k={\left({\tilde{\mathbf{H}}}_k^H{\tilde{\mathbf{H}}}_k\right)}^{-1}{\tilde{\mathbf{H}}}_k^H{\tilde{\gamma}}_k,\kern0.5em k=0,1,\dots, K-1 $$
(22)
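The per-subcarrier equalization in (22) can be sketched as follows. The array layout (one M × N channel matrix per subcarrier), the function name `zf_equalize`, and the random test data are assumptions for this illustration; the pseudo-inverse equals the expression in (22) when each per-subcarrier channel matrix has full column rank.

```python
import numpy as np

def zf_equalize(H: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """Least-squares equalizer of (22), applied to all subcarriers at once.

    H has shape (K, M, N) and gamma has shape (K, M); the result has shape (K, N).
    """
    H_pinv = np.linalg.pinv(H)                   # stacked pseudo-inverses, shape (K, N, M)
    return (H_pinv @ gamma[..., None])[..., 0]   # one matrix-vector product per subcarrier

# Hypothetical noise-free example: K = 4 subcarriers, M = 3 RX and N = 2 TX antennas.
rng = np.random.default_rng(5)
K, M, N = 4, 3, 2
H = rng.standard_normal((K, M, N)) + 1j * rng.standard_normal((K, M, N))
s = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
gamma = (H @ s[..., None])[..., 0]               # model (21) without noise
s_hat = zf_equalize(H, gamma)
```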

Subsequently, we can collect K consecutive samples and transform the equalized signals back into the time domain. The derotation method is then used again to calculate the more accurate CFO \( {\tilde{\varepsilon}}_e \).

In the following section, an iterative method is described. This method provides a refined estimate close to the CRB.

3.4 Small-step iterative searching

The CFO estimate is refined using an iterative method. The CFO \( {\tilde{\varepsilon}}_e \) derived in the previous section is used as the initial main frequency estimate. Then, adjacent frequencies near the initial main frequency are selected as candidate frequencies. The next step is to substitute the main and candidate frequencies into (16) to evaluate the CFO objective function. The frequency with the largest function value determines the search direction (refer to Fig. 4, step 1). Subsequently, the frequency selected in the previous step becomes the new main frequency \( {\tilde{\varepsilon}}_m \). The fixed step frequency Δf is added to the new main frequency \( {\tilde{\varepsilon}}_m \) to obtain its adjacent frequency, and the comparison is performed as before (refer to Fig. 4, steps 2-1 and 2-2). This iterative process continues until the function value of the main frequency is larger than that of its adjacent frequency (refer to Fig. 4, steps 2 and 3; e.g., the total number of iterations is 3). Finally, the maximum frequency peak \( {\varepsilon}_f \) is located in the region between the last main frequency and the last adjacent frequency (refer to Fig. 4, step 3).

Fig. 4 Iterative small-step frequency searching

However, the global maximum frequency peak may not be in the search region if the setting of a small-step frequency is inappropriate. For instance, the CFO estimated with the iterative method would not be at the true frequency peak if the true global maximum frequency is in the previous range between the iterative main frequency and the iterative adjacent frequency (refer to Fig. 5, steps 2 and 3). Because the iterative adjacent frequency \( {\tilde{\varepsilon}}_{m_i}+\Delta f \) is larger than the iterative main frequency \( {\tilde{\varepsilon}}_{m_i} \), the maximum frequency is decided to be located at the right-hand side of the iterative adjacent frequency. Therefore, the iteration continues until the result of the main frequency is larger than that of the adjacent frequency (refer to Fig. 5, steps 2–4). However, the peak of the maximum frequency is outside the final search range.

Fig. 5 Reverse iterative searching process

In this circumstance, reverse iterative searching is adopted. When the maximum value is at the boundary frequency, the search direction is reversed (refer to Fig. 5, step 3), because the maximum peak frequency is on the other side of the boundary frequency. The reverse iterative search finally identifies the global maximum value (refer to Fig. 5, step 3). Illustrations of the iterative small-step frequency search process and the reverse iterative search process are provided in Figs. 4 and 5, respectively. A large Δf reduces the search time, whereas a small Δf provides high estimation accuracy. Thus, a trade-off is involved in the selection of the Δf value. In practice, an appropriate value is selected by trial and error in the operational environment and is correlated with the SNR.
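The search logic of this subsection can be approximated by a simple fixed-step hill climb. The sketch below is a simplified stand-in, not the exact procedure of Figs. 4 and 5: it picks the initial direction by comparing both neighbours of the starting point (which plays the role of the reverse search) and then steps until the objective stops increasing. The function name `small_step_search`, the `max_iter` guard, and the quadratic test objective are assumptions.

```python
from typing import Callable

def small_step_search(f: Callable[[float], float], eps0: float, step: float,
                      max_iter: int = 10_000) -> float:
    """Fixed-step climb of f starting from the rough estimate eps0."""
    best, f_best = eps0, f(eps0)
    f_right, f_left = f(eps0 + step), f(eps0 - step)
    if max(f_right, f_left) <= f_best:
        return best                      # eps0 is already a local peak at this resolution
    # Move toward the better neighbour; this covers a peak on either side of eps0.
    direction = 1.0 if f_right >= f_left else -1.0
    for _ in range(max_iter):
        candidate = best + direction * step
        f_candidate = f(candidate)
        if f_candidate <= f_best:
            break                        # the main frequency beats its adjacent frequency
        best, f_best = candidate, f_candidate
    return best

# Hypothetical smooth objective with a single peak at 0.1234.
peak = 0.1234
estimate = small_step_search(lambda e: -(e - peak) ** 2, eps0=0.10, step=1e-3)
```

The returned value lies within one step of the peak, so the step size directly sets the resolution, while the number of objective evaluations grows with the distance between the rough estimate and the peak divided by the step.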

3.5 Computational complexity and the procedure of the proposed method

We adopt big-O notation, a well-accepted approach for analyzing the computational complexity of algorithms. Note that K is the number of subcarriers and N and M are the numbers of TX and RX antennas, respectively. The proposed scheme comprises five major steps, which are analyzed as follows.

(1) In step 1, Eqs. (17) and (18) are involved. The phase/angle operation may be implemented with a CORDIC IP core configured in arctan mode, which requires constant time; its complexity is therefore assumed to be \( O(1) \). The above steps are performed MK times in Eq. (18). Therefore, the computational complexity in step 1 is \( O\left( MK\left(K+1\right)\right)=O\left(M{K}^2\right) \).

(2) In step 2, Eq. (11) is computed. The matrix \( {\left({\mathbf{X}}^H\mathbf{X}\right)}^{-1}{\mathbf{X}}^H \) in Eq. (11) is formed by the preamble signal of each frame, so it is known to both TX and RX and its entries can be stored at RX. Therefore, obtaining this matrix requires \( O(1) \) in Eq. (11). Here, we consider the worst-case scenario, in which L equals the number of cyclic prefix (CP) samples of the OFDM symbol. The term \( {\mathbf{D}}^H\left(\varepsilon \right){\mathbf{r}}_i \) has a computational complexity of \( O(K) \). The product of \( {\left({\mathbf{X}}^H\mathbf{X}\right)}^{-1}{\mathbf{X}}^H \) and \( {\mathbf{D}}^H\left(\varepsilon \right){\mathbf{r}}_i \) has a computational complexity of \( O(KNL) \). These operations are performed M times in Eq. (11). Therefore, the computational complexity in step 2 is \( O\left(M\left(K+ KNL\right)\right) \).

(3) In step 3, the discrete Fourier transform of the channel response, computed with the fast Fourier transform (FFT) to form \( {\tilde{\mathbf{H}}}_k \), has a computational complexity of \( O\left( MNK\log K\right) \). Forming the matrix \( {\tilde{\mathbf{H}}}_k^H{\tilde{\mathbf{H}}}_k \) and inverting it have computational complexities of \( O\left(M{N}^2\right) \) and \( O\left({N}^3\right) \), respectively. The calculation of the term \( {\tilde{\mathbf{H}}}_k^H{\tilde{\gamma}}_k \) has a computational complexity of \( O(NM) \). In addition, the product \( {\left({\tilde{\mathbf{H}}}_k^H{\tilde{\mathbf{H}}}_k\right)}^{-1}{\tilde{\mathbf{H}}}_k^H{\tilde{\gamma}}_k \) has a computational complexity of \( O\left({N}^2\right) \). These operations are performed K times in Eq. (22). The operations of step 1 then follow, with a computational complexity of \( O\left(M{K}^2\right) \). Therefore, the computational complexity in step 3 is \( O\left( MNK\log K+K\left(M{N}^2+{N}^3+ NM+{N}^2\right)+M{K}^2\right) \).

(4) In steps 4 and 5, the computational complexity of the carrier frequency offset searching process is evaluated by using (16). Let \( {\tilde{N}}_{\mathrm{iter}} \) be the total number of iterations for the convergence of the step frequency search process. The computational complexity in steps 4 and 5 is \( O\left(M{K}^2{\tilde{N}}_{\mathrm{iter}}\right) \). Similarly, let \( {\tilde{N}}_{\mathrm{grid}} \) be the number of grid search points in the possible CFO range. The total computational complexity of the grid search-based method is \( O\left(M{K}^2{\tilde{N}}_{\mathrm{grid}}\right) \). Note that \( {\tilde{N}}_{\mathrm{grid}}\gg K \) and \( {\tilde{N}}_{\mathrm{grid}}\gg {\tilde{N}}_{\mathrm{iter}} \). For instance, if the normalized CFOs are located over [−0.5, 0.5] and the resolution is set to \( {10}^{-5} \), then the number of grid search points in this range is \( {\tilde{N}}_{\mathrm{grid}}={10}^5 \).

The procedure of the proposed algorithms and the computational complexity are summarized in Tables 1 and 2, respectively.

Table 1 Summary of the proposed algorithms
Table 2 Summary of the computational complexity analysis

3.6 Adaptive mode for tracking the time variations of parameters

Using the process described above, the CFO \( {\tilde{\varepsilon}}_{T=t} \) and the CFO \( {\tilde{\varepsilon}}_{T=t+\Delta t} \) of the next time unit are obtained (Δt can be the symbol time, slot duration, frame duration, or any other time unit in the transmission system). The final CFO is updated with the coefficient μ as follows:

$$ {\tilde{\varepsilon}}_{e,T=t+\Delta t}=\mu {\tilde{\varepsilon}}_{T=t}+\left(1-\mu \right){\tilde{\varepsilon}}_{T=t+\Delta t}. $$
(23)

By substituting (23) into (11), the channel estimation in this adaptive mode is rewritten as follows:

$$ {\displaystyle \begin{array}{l}{\tilde{\mathbf{h}}}_{i,T=t+\Delta t}=\\ {}\kern1.5em {\left({{\mathbf{X}}_{T=t+\Delta t}}^H{\mathbf{X}}_{T=t+\Delta t}\right)}^{-1}{{\mathbf{X}}_{T=t+\Delta t}}^H{\mathbf{D}}^H\left({\tilde{\varepsilon}}_{e,T=t+\Delta t}\right){\mathbf{r}}_{i,T=t+\Delta t}\end{array}} $$
(24)
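The adaptive update of (23) followed by the channel re-estimation of (24) can be sketched as below. The function names `smooth_cfo` and `adaptive_channel` and the numerical values are illustrative assumptions, and the sketch applies (23) for a single time step only.

```python
import numpy as np

def smooth_cfo(eps_prev: float, eps_new: float, mu: float) -> float:
    """Weighted CFO update of (23)."""
    return mu * eps_prev + (1.0 - mu) * eps_new

def adaptive_channel(r_i: np.ndarray, X: np.ndarray, eps_smoothed: float) -> np.ndarray:
    """Channel estimate of (24): least squares on the derotated samples."""
    K = len(r_i)
    derotated = np.exp(-2j * np.pi * eps_smoothed * np.arange(K) / K) * r_i
    return np.linalg.lstsq(X, derotated, rcond=None)[0]

# Hypothetical values: previous estimate 0.10, new estimate 0.20, mu = 0.25.
eps_e = smooth_cfo(0.10, 0.20, mu=0.25)
```

A value of μ close to 1 keeps the update close to the previous estimate and smooths out noise, whereas a value close to 0 follows the newest estimate and reacts faster to CFO changes.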

4 Simulation

This section presents the simulation results and discusses the efficacy of the proposed schemes, including the iterative mode and adaptive tracking mode. The parameters common to both the iterative mode and adaptive tracking mode were as follows: M = 2, N = 2, K = 64, and the CP size \( {N}_g=16 \). A preamble was inserted at the beginning of each transmitted signal per frame. The transmitted OFDM data symbols were modulated through quaternary phase-shift keying and transmitted over a three-multipath channel model, and estimations were made on the basis of the preamble. The delay profile for the multipaths is presented in Table 3. The channel gain of each multipath was randomly generated with a Gaussian distribution of variance 1. Equations (6) and (7) were used for the SNR calculation in the simulation. After the channel gain coefficients were generated, the noise power was set relative to the received signal power to obtain a particular SNR. A Δf value of \( {10}^{-5} \) was adopted in the simulation. The results of 5000 Monte Carlo simulations were averaged for each SNR.

Table 3 Multipath delay profile

For the iterative mode, the normalized CFOs were drawn independently from a uniform distribution over [−0.5, 0.5]. The coefficients of the CIR were complex valued and generated independently from normally distributed random numbers.

Regarding the adaptive tracking mode, the coefficients of the CIR were generated using Jakes’ model with a carrier frequency of 3.5 GHz, which is to be adopted in 5G. Various velocities (v) produced a time-varying Doppler shift. The channel variations in Jakes’ model rely on the maximum Doppler shift, which is determined by the carrier frequency and the mobile speed. Therefore, examining a higher carrier frequency f in the 5G band is equivalent to using a larger mobile speed at the specified carrier frequency of 3.5 GHz. Simulations were performed for various mobile speeds.

In the case of channel estimation errors, \( \mathrm{E}\left({\left\Vert \hat{\mathbf{h}}-\mathbf{h}\right\Vert}^2\right) \) was calculated and plotted using the sample averages. To obtain benchmarks for the performance evaluation, the lower bounds for the CFO and CIR in the iterative mode were derived using the Fisher information matrix based on (4), as in [31].

$$ {\displaystyle \begin{array}{l}{\left[\begin{array}{c}{\mathbf{r}}_1\\ {}\vdots \\ {}{\mathbf{r}}_M\end{array}\right]}_{M\times 1}={\left[\begin{array}{ccc}\mathbf{D}\left(\varepsilon \right)\mathbf{X}& & \mathbf{O}\\ {}& \ddots & \\ {}\mathbf{O}& & \mathbf{D}\left(\varepsilon \right)\mathbf{X}\end{array}\right]}_{M\times M}{\left[\begin{array}{c}{\mathbf{h}}_1\\ {}\vdots \\ {}{\mathbf{h}}_M\end{array}\right]}_{M\times 1}+{\left[\begin{array}{c}{\mathbf{v}}_1\\ {}\vdots \\ {}{\mathbf{v}}_M\end{array}\right]}_{M\times 1}\\ {}\Rightarrow \mathbf{r}=\left({\mathbf{I}}_M\otimes \mathbf{D}\left(\varepsilon \right)\mathbf{X}\right)\mathbf{h}+\mathbf{v}\end{array}} $$
(25)

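where \( \mathbf{r}={\left[{\mathbf{r}}_1,\dots, {\mathbf{r}}_M\right]}^{\mathrm{T}} \) and \( \mathbf{h}={\left[{\mathbf{h}}_1,\dots, {\mathbf{h}}_M\right]}^{\mathrm{T}} \);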

$$ \mathrm{CRB}\left(\varepsilon \right)=\frac{K^2{\sigma}_{\mathbf{v}}^2}{8{\pi}^2}{\left[{\mathbf{h}}^H\left[{\mathbf{I}}_M\otimes {\mathbf{X}}^H{\mathbf{M}\boldsymbol{\Pi}}^{\perp}\mathbf{MX}\right]\mathbf{h}\right]}^{-1} $$
(26)

where ⊗ denotes the Kronecker product. Moreover, the following equation is obtained:

$$ {\displaystyle \begin{array}{l}{\boldsymbol{\Pi}}^{\perp }={\mathbf{I}}_K-\mathbf{X}{\left({\mathbf{X}}^H\mathbf{X}\right)}^{-1}{\mathbf{X}}^H\\ {}\mathbf{M}=\operatorname{diag}\left(0,1,2,\cdots, K-1\right)\end{array}} $$
(27)
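As an illustration, (26) and (27) can be evaluated numerically as sketched below. This is a hedged example rather than the code used for the figures: the function name, the variable `num_blocks` (standing for the size M of the identity matrix in the Kronecker product), and the assumption that X is a K × P training matrix with full column rank are introduced here only for illustration.

```python
import numpy as np


def crb_cfo(X: np.ndarray, h: np.ndarray, noise_var: float) -> float:
    """Evaluate CRB(eps) from (26)-(27).

    X is assumed to be a (K, P) training matrix and h the stacked channel
    vector of length num_blocks * P (both shapes are assumptions).
    """
    K, P = X.shape
    num_blocks = h.size // P
    M_diag = np.diag(np.arange(K))                      # M = diag(0, 1, ..., K-1)
    gram = X.conj().T @ X                               # X^H X
    proj_perp = np.eye(K) - X @ np.linalg.solve(gram, X.conj().T)
    core = X.conj().T @ M_diag @ proj_perp @ M_diag @ X  # X^H M Pi_perp M X
    quad = np.real(h.conj() @ np.kron(np.eye(num_blocks), core) @ h)
    return float(K**2 * noise_var / (8.0 * np.pi**2 * quad))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, P, num_blocks = 64, 4, 2                         # illustrative sizes only
    X = (rng.standard_normal((K, P)) + 1j * rng.standard_normal((K, P))) / np.sqrt(2)
    h = (rng.standard_normal(num_blocks * P)
         + 1j * rng.standard_normal(num_blocks * P)) / np.sqrt(2)
    for snr_db in (0, 10, 20, 30):
        noise_var = 10.0 ** (-snr_db / 10.0)
        print(f"SNR = {snr_db:2d} dB -> CRB(eps) = {crb_cfo(X, h, noise_var):.3e}")
```

The bound is linear in the noise variance, so on a logarithmic MSE axis it appears as a straight line that falls by one decade per 10 dB of SNR.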

Similarly, the CRB was obtained for the estimated \( \hat{\mathbf{h}} \) as follows:

$$ {\displaystyle \begin{array}{c}\mathrm{E}\left({\left\Vert \hat{\mathbf{h}}-\mathbf{h}\right\Vert}^2\right)=\mathrm{E}\left[{\left(\hat{\mathbf{h}}-\mathbf{h}\right)\left(\hat{\mathbf{h}}-\mathbf{h}\right)}^{\mathrm{H}}\right]\ge \frac{\sigma_{\mathbf{v}}^2}{2}\left(2\lambda +{\gamma}^{-1}{\beta \beta}^{\mathrm{H}}\right)\\ {}= CRB\left(\mathbf{h}\right)\end{array}} $$
(28)

where

$$ {\displaystyle \begin{array}{l}\lambda ={\mathbf{I}}_M\otimes {\left({\mathbf{X}}^{\mathrm{H}}\mathbf{X}\right)}^{-1}\\ {}\beta ={\mathbf{I}}_M\otimes \left[{\left({\mathbf{X}}^{\mathrm{H}}\mathbf{X}\right)}^{-1}{\mathbf{X}}^{\mathrm{H}}\mathbf{D}\left(\varepsilon \right)\mathbf{X}\right]\mathbf{h}\\ {}\gamma ={\mathbf{h}}^{\mathrm{H}}{\mathbf{I}}_M\otimes \left[{\mathbf{X}}^H{\mathbf{M}\boldsymbol{\Pi}}^{\perp}\mathbf{MX}\right]\mathbf{h}\end{array}} $$
(29)
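The channel bound in (28) and (29) can be evaluated in the same way. The sketch below makes two assumptions that are not stated in this section: \( \mathbf{D}\left(\varepsilon \right) \) is taken as \( \operatorname{diag}\left({e}^{j2\pi \varepsilon k/K}\right) \) for \( k=0,\dots, K-1 \), which is consistent with the factor \( {K}^2/8{\pi}^2 \) in (26), and the trace of the matrix bound is used as the scalar MSE benchmark. The function name and `num_blocks` are hypothetical.

```python
import numpy as np


def crb_channel_trace(X: np.ndarray, h: np.ndarray, eps: float,
                      noise_var: float) -> float:
    """Trace of the matrix bound in (28), with lambda, beta, gamma from (29)."""
    K, P = X.shape
    num_blocks = h.size // P
    eye_blocks = np.eye(num_blocks)
    k = np.arange(K)
    D = np.diag(np.exp(2j * np.pi * eps * k / K))       # assumed form of D(eps)
    M_diag = np.diag(k)
    gram_inv = np.linalg.inv(X.conj().T @ X)            # (X^H X)^{-1}
    proj_perp = np.eye(K) - X @ gram_inv @ X.conj().T
    lam = np.kron(eye_blocks, gram_inv)
    beta = np.kron(eye_blocks, gram_inv @ X.conj().T @ D @ X) @ h
    core = X.conj().T @ M_diag @ proj_perp @ M_diag @ X
    gamma = np.real(h.conj() @ np.kron(eye_blocks, core) @ h)
    bound = 0.5 * noise_var * (2.0 * lam + np.outer(beta, beta.conj()) / gamma)
    return float(np.real(np.trace(bound)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, P, num_blocks = 64, 4, 2                         # illustrative sizes only
    X = (rng.standard_normal((K, P)) + 1j * rng.standard_normal((K, P))) / np.sqrt(2)
    h = (rng.standard_normal(num_blocks * P)
         + 1j * rng.standard_normal(num_blocks * P)) / np.sqrt(2)
    print(crb_channel_trace(X, h, eps=0.1, noise_var=0.01))
```

In this form, the first term \( {\sigma}_{\mathbf{v}}^2\lambda \) is the bound that would apply if the CFO were known, and the second term is the additional penalty incurred because the CFO must be estimated jointly with the channel.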

4.1 Algorithm performance in the iterative mode

The mean squared error (MSE) performance of the proposed algorithm for CFO estimation in the iterative mode at different SNRs is shown in Fig. 6. The performance of the proposed algorithm in the iterative mode was satisfactory and close to the CRB [refer to (26)].

Fig. 6 The MSE performance for CFO estimation of the proposed algorithm compared with CRB

The MSE performance of the proposed algorithm for channel estimation at different SNRs is illustrated in Fig. 7. A comparison of the channel estimate with the CRB [refer to (28)] and with the estimate obtained in the “perfect CFO estimation” condition, in which the CFOs are perfectly known, indicated that the proposed joint estimation algorithm provided satisfactory channel estimation performance.

Fig. 7 The MSE performance for channel estimation compared with CRB and the ideal case

Figure 8 illustrates the bit error rates (BERs) of the following three schemes: (a) the “ideal channel” condition, which has perfect channel estimation and perfect CFO synchronization; (b) the “perfect CFO estimation with joint channel estimation” condition, in which CFOs are assumed to be known for estimating the CIR; and (c) the “proposed joint CFO and channel estimation.” The results indicate that the BER performance of the joint estimation is nearly identical to that obtained with ideal assumptions. The proposed algorithm not only has a considerably lower computational complexity than the grid search-based method but also provides satisfactory performance, as shown in Figs. 6, 7, and 8.

Fig. 8 The BER performance comparison for the joint estimation

4.2 Algorithm performance in the adaptive tracking mode

The MSE performance of the proposed algorithm for CFO estimation in the adaptive mode at different SNRs is illustrated in Figs. 9, 10, and 11. Based on extensive simulations, the value of μ that gave the best MSE performance was selected for various mobile speeds as follows: (a) μ = 0.5 at v = 0 km/h in Fig. 9, (b) μ = 0.4998 at v = 60 km/h (the speed limit for automobiles in suburban areas) in Fig. 10, and (c) μ = 0.18 at v = 500 km/h (the average speed of high-speed rail, as defined for 5G) in Fig. 11. The μ values for various other mobile speeds are presented in Table 4. The adaptive mode was executed on a per-frame basis, and the frame duration \( \Delta t \) was 1 ms. As expected, the simulation results indicated that the coefficient μ should be decreased as the velocity v increases. When the time variation is faster, the estimated CFO \( {\tilde{\varepsilon}}_{T=t+\Delta t} \) of the next frame should have more weight than the estimated CFO \( {\tilde{\varepsilon}}_{T=t} \). Therefore, the weighting of the parameterized equation [refer to (23)] shifts toward the second term. As illustrated in Figs. 9 and 10, the performance of the proposed algorithm in the adaptive mode was superior to its performance in the iterative mode. As depicted in Fig. 11, the MSE performance of the proposed adaptive mode began to degrade at SNR = 24 dB due to the high time variations and the approximate selection of μ. This phenomenon did not occur at low mobile speeds.
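The role of μ can be illustrated with a small numerical sketch. Equation (23) is not reproduced in this section, so the code below assumes a convex combination in which μ weights the estimate at T = t and 1 − μ weights the estimate at T = t + Δt; this form is inferred from the description above and may differ from (23) in detail. The function name and the noise level are hypothetical.

```python
import numpy as np


def weighted_cfo(eps_prev, eps_next, mu: float):
    """Assumed weighting: mu * eps(T = t) + (1 - mu) * eps(T = t + dt)."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return mu * eps_prev + (1.0 - mu) * eps_next


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_cfo, sigma, trials = 0.1, 0.01, 100_000
    # Static scenario (v = 0 km/h): both frames observe the same CFO with
    # independent estimation noise of standard deviation sigma (assumed).
    prev = true_cfo + sigma * rng.standard_normal(trials)
    nxt = true_cfo + sigma * rng.standard_normal(trials)
    print(f"single frame: MSE = {np.mean((nxt - true_cfo) ** 2):.3e}")
    for mu in (0.5, 0.18):
        mse = np.mean((weighted_cfo(prev, nxt, mu) - true_cfo) ** 2)
        print(f"mu = {mu:4.2f}:    MSE = {mse:.3e}")
```

Under this assumed model with a static CFO, the error variance of the weighted estimate is proportional to μ² + (1 − μ)², which is smallest at μ = 0.5. When the CFO changes between frames, a smaller μ reduces the bias contributed by the older estimate at the cost of less noise averaging.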

Fig. 9 The MSE performance for CFO estimation in the adaptive mode at v = 0 km/h

Fig. 10 The MSE performance for CFO estimation in the adaptive mode at v = 60 km/h

Fig. 11 The MSE performance for CFO estimation in the adaptive mode at v = 500 km/h

Table 4 The parameter μ for various other mobile speeds

The MSE performance of the proposed algorithm for channel estimation in the adaptive mode is displayed in Figs. 12, 13, and 14. For the approximate best MSE performance, the value of the parameter μ was selected for various mobile speeds as follows: (a) μ = 0.5 at v = 0 km/h in Fig. 12, (b) μ = 0.4998 at v = 60 km/h in Fig. 13, and (c) μ = 0.18 at v = 500 km/h in Fig. 14. The results were compared with those obtained with the proposed algorithm in the iterative mode and those obtained in the “perfect CFO estimation” condition. The simulation results revealed that the performance obtained in the proposed adaptive mode was competitive with that obtained in the “perfect CFO estimation” condition (Figs. 12, 13, and 14).

Fig. 12 The MSE performance for channel estimation in the adaptive mode at v = 0 km/h

Fig. 13 The MSE performance for channel estimation in the adaptive mode at v = 60 km/h

Fig. 14 The MSE performance for channel estimation in the adaptive mode at v = 500 km/h

Figures 15 and 16 display the BERs of the following three schemes: (1) the “ideal channel” condition, in which perfect channel estimation and CFO estimation are assumed; (2) the “perfect CFO estimation and joint channel estimation” condition, in which the CFOs are assumed to be known to estimate the CIR; and (3) the “proposed joint CFO and channel estimation” in the adaptive mode. To obtain the best BER performance, the values of μ were selected as follows: (a) μ = 0.4998 at v = 60 km/h in Fig. 15 and (b) μ = 0.18 at v = 500 km/h in Fig. 16. The results indicated that the proposed adaptive mode estimation offers a BER performance nearly identical to that obtained with ideal assumptions.

Fig. 15 The BER performance of the joint estimation in the adaptive mode at v = 60 km/h

Fig. 16 The BER performance of the joint estimation in the adaptive mode at v = 500 km/h

The tracking ability of the proposed adaptive mode was examined at v = 120 km/h (Fig. 17). The estimate was compared with the actual time-varying CFO. The comparison indicated that the proposed algorithm can track a time-varying CFO in the adaptive mode. For the same vehicle velocity of 120 km/h, the tracking results for \( {h}_{11} \), one of the channel gains selected for illustration, are plotted in Fig. 18. The time variations of the channel were accurately tracked.

Fig. 17 Normalized real CFO vs. the estimated CFO for the time-varying scenario

Fig. 18 Channel tracking performance for the time-varying scenario

5 Conclusion

In this paper, we proposed ML-based algorithms for joint CFO and channel estimation. The proposed methods achieved performance close to the CRBs. Moreover, they have a lower computational complexity than the grid search-based method. In addition, an adaptive mode was proposed to improve the algorithm performance. The adaptive mode uses a weighted CFO estimate; it thereby enhances the performance of the original algorithm and provides tracking capability for time-varying parameters. The parameter μ should be adjusted according to the operating environment.