1 Introduction

Orthogonal frequency division multiple-access (OFDMA) is a multi-carrier multi-user transmission technology used in various wireless communication systems nowadays, e.g.[13]. However, high peak-to-average-power ratio (PAPR) and sensitivity to carrier frequency offsets (CFOs) are the two main challenges in implementation of OFDMA systems. High PAPR forces the power amplifier to operate with a back-off, reducing the energy efficiency of the system. Energy efficiency is a major concern in uplink (UL) transmissions for battery-operated mobile users (MUs) due to limited power resources. To resolve the high PAPR issue, a variant of OFDMA termed as single-carrier frequency division multiple-access (SC-FDMA) has been proposed[4], which employs extra discrete Fourier transform (DFT) precoding at the transmitter (TX) to reduce the PAPR of the transmitted signal. Having the advantage of lower PAPR, SC-FDMA is employed in long-term evolution (LTE) UL transmissions as opposed to OFDMA[3]. However, the reduction in PAPR is also influenced by the choice of sub-carrier allocation scheme in UL transmissions[5]. Sub-carrier allocation schemes are broadly classified into three types: grouped, interleaved, and generalized sub-carrier allocations[6]. Among the three schemes, interleaved allocation scheme (IAS), in which the sub-carriers of different users are interleaved into each other, provides the lowest PAPR[5]. In addition, in a perfectly synchronized system, SC-FDMA with IAS outperforms OFDMA with minimum mean squared error (MMSE) equalization[7], which motivates the use of IAS in SC-FDMA UL transmissions.

However, being a variant of OFDMA, SC-FDMA transmissions are still sensitive to CFOs. CFOs arise in UL transmissions due to the residual errors in the downlink (DL) synchronization or Doppler shift induced by the mobility of MUs. The orthogonality of the sub-carriers is violated by CFOs, introducing inter-carrier interference (ICI). In addition, as different MUs have independent CFOs, multiple UL CFOs also introduce multi-user interference (MUI). Without precise CFO estimation and compensation at the base station (BS), corresponding to each MU, ICI, and MUI severely degrade the system performance. Thus, frequency synchronization is a critical step in SC-FDMA transmissions just as in OFDMA. As sub-carriers of different users as interleaved into each other, CFOs generate more MUI in IAS as compared to grouped or generalized allocation schemes. The superior performance of SC-FDMA with IAS in a perfectly synchronized system is lost in the case of non-zero CFOs[8] implying that SC-FDMA is more sensitive to CFOs as compared to OFDMA and motivating the need for an efficient frequency synchronization algorithm for SC-FDMA with IAS.

Due to its similarities with OFDMA, frequency synchronization schemes proposed for OFDMA are applicable to SC-FDMA as well. CFO estimation schemes proposed in the literature can be classified based on sub-carrier allocation schemes, like[911] for grouped allocation,[1217] for generalized allocation, and[1823] for IAS. These schemes can also be classified as data-aided[12, 1517] or non-data-aided/blind schemes[911, 13, 18, 19, 2123]. Data-aided schemes use specially designed training symbols[12, 16, 17] or pilot sub-carriers[15] to perform CFO estimation while blind schemes use null sub-carriers[9, 13], the inherent structure of the transmitted symbol[18, 19, 21, 22] or its statistical characteristics[10].

For CFO estimation in SC-FDMA with IAS, multiple signal classification (MUSIC) algorithm is employed in[18] for blind CFO estimation using a line search. A closed-form solution with better performance at lower signal-to-noise ratios (SNR) is proposed in[19] employing the estimation of signal parameters via rotational invariance technique (ESPRIT). As both MUSIC and ESPRIT are sub-space-based techniques, the CFO estimation schemes in[18, 19] require a non-empty noise sub-space for CFO estimation. This condition implies that the system cannot operate on full load or an extended cyclic prefix (CP) must be employed, which reduces the overall spectral efficiency. Low-complexity blind CFO estimation algorithm has been proposed in[22] while two CFO estimation methods for multi-antenna BSs, based on rank reduction and alternating projection methods are proposed in[21]. Although the schemes in[21] support fully loaded systems, the number of receive antennas must be greater than the channel order, which restricts its applicability in broadband systems. Data-aided maximum likelihood (ML) CFO estimation has been proposed in[16]. However, the complexity of the proposed scheme is too high for practical implementations. A sub-optimal algorithm with lower computational complexity but degraded performance is proposed in[17].

In addition to CFO estimation, various CFO compensation algorithms have also been proposed in the literature[2429]. An interference cancelation scheme based on circular convolution is proposed in[24] while parallel interference cancelation is used in[25] for CFO compensation. A linear decorrelator detector is proposed in[26]. However, the method is computationally intensive as it requires inversion of a large matrix for satisfactory performance. A joint CFO compensation and channel estimation algorithm is proposed in[27] for multi-antenna SC-FDMA systems. Several time-domain CFO compensation schemes paired with successive interference cancelation and MU ordering are proposed in[24]. A low-complexity CFO compensation scheme for OFDMA transmissions has been proposed in[29] for interleaved OFDMA transmissions.

In this paper, we propose a joint CFO estimation and compensation scheme for SC-FDMA UL transmissions with IAS using parallel factor (PARAFAC) analysis method[30]. PARAFAC, an extension of matrix decomposition to tensors, has found numerous applications in signal processing including multi-user detection in CDMA systems[31, 32], CFO estimation for DL OFDM systems[33], channel equalization[34], and array signal processing[35, 36]. However, the scheme proposed in[33] is for single-user OFDM transmissions and it is not applicable to multi-user UL SC-FDMA transmissions, discussed in this paper. Employing identical training symbols and exploiting the structure of transmitted SC-FDMA signals with IAS, we show that in the presence of CFOs, the signal received at the BS can be formulated as the PARAFAC model. This key observation allows us to apply tensor decomposition methods to jointly retrieve the estimates of CFO and the MUI-free received training symbol corresponding to each user. The formulation also allows us to perform a low-complexity CFO compensation for subsequent data symbols. The proposed algorithm is semi-blind in the sense that it requires identical training symbols but the actual knowledge of the training symbols is not required. Thus, the training signals can be designed to optimize other receiver operations like timing synchronization or channel estimation. We prove that the proposed algorithm guarantees the identifiability of CFO estimation while allowing the system to operate on full load as opposed to[18, 19, 21]. In addition, the proposed algorithm can be readily extended to multi-antenna BS reception and offers better performance as compared to the existing approaches in the literature.

The rest of the paper has been organized as follows. Section 2 presents the system model and the structure of transmitted and received signals in SC-FDMA uplink transmissions. The proposed joint CFO estimation and compensation method is presented in Section 3 along with the discussion on identifiability. Section 4 presents the extension of the proposed algorithm to multi-antenna BS case while tensor decomposition algorithms and the associated computational complexity are discussed in Section 5. Section 6 presents the simulation results followed by conclusions in Section 7.

2 System model

We consider a baseband-equivalent system model of SC-FDMA UL transmissions where M active users transmit to a BS using their allocated sub-carriers. The maximum number of users allowed in the system is denoted as M u . The total number of sub-carriers is denoted as N while the number of sub-carriers allocated to each user isR= N M u . The transmission takes place in the form of frames where each frame contains a block of Q identical SC-FDMA training symbols followed by data symbols. The block diagram of the SC-FDMA system is shown in Figure1.

Figure 1
figure 1

Block diagram of SC-FDMA uplink transmissions.

2.1 Transmitted signal structure

As IAS is employed, the exclusive set of sub-carriers allocated to the m th user, 0 ≤ m ≤ M - 1 is denoted as I m := v m + rM u 0 r R - 1 , where v m is the unique starting index of the m th user. The q th SC-FDMA training symbol contains R training points, which are precoded by an R point DFT and then mapped to the sub-carriers allocated to the m th user. Thus, the k th sub-carrier, 0 ≤ k ≤ N - 1, of the q th training symbol transmitted by the m th user is given as

X m q , k = 1 R r = 0 R - 1 x m r e - j 2 π R r k - v m M u k I m , 0 k I m .
(1)

As the Q training symbols are identical, we drop the index q in x m [r]. A N point inverse discrete Fourier transform (IDFT) of X m [q,k], 0 ≤ k ≤ N - 1, is then computed to generate the q th SC-FDMA training symbol in time domain with n th sample given as

s m q , n = 1 N k = 0 N - 1 X m q , k e j 2 π N kn , 0 n N - 1 .
(2)

As X m [q,k] is non-zero only fork I m ,

s m q , n = 1 N r = 0 R - 1 X m q , v m + rM u e j 2 π N v m + rM u n , 0 n N - 1 .
(3)

Using Equation 1, Equation 3 can be written as

s m q , n = 1 NR e j 2 π N v m n r = 0 R - 1 x m [ r ] r = 0 R - 1 e j 2 π R ( n - r ) r = 1 M u e j 2 π N v m n r = 0 R - 1 x m [ r ] δ ( n ) R - r = 1 M u e j 2 π N v m n x m [ ( n ) R ] , 0 n N - 1 ,
(4)

where δ(·) is the Kronecker delta function, and (·) R represents the modulo R operation. Equation 4 shows that the transmitted SC-FDMA signal of the m th user contains M u copies of the original signal x m [r], 0 ≤ r ≤ R - 1 scaled by the phase term corresponding to the starting index of the user. A CP of N g samples is then appended at the start of each SC-FDMA symbol to combat inter-symbol interference (ISI) due to multi-path channel.

2.2 Received signal structure

The channel between the m th user and the BS is assumed to be of length P m and denoted as h ~ m [p],0p P m -1. We assume that the channel remains constant for the Q training symbols, and N g  ≥ P - 1 where P = max m {P m  + d m }, and d m is the timing offset (TO) of the m th user. Under this assumption, the TO of the m th user can be incorporated into its channel and the effective channel is given as

h m p = 0 0 p d m - 1 , h ~ m p - d m d m p P m + d m - 1 , 0 P m + d m p P - 1 .
(5)

Such a system is termed as quasi-synchronous system[6]. Once the TO estimate of each user is estimated at the BS, it is fed back to the user to advance its timing correspondingly and thus, the length of the CP of following symbols can be reduced.

After CP removal, the qth received symbol at the BS is the sum of the M SC-FDMA symbols transmitted by the MUs and given as

y [ q , n ] = m = 0 M - 1 e j 2 π N f m ( qN t + N g + n ) p = 0 P - 1 h m [ p ] s m q , ( n - p ) N + w [ q , n ] ,
(6)

where f m is the CFO between the m th user and the BS normalized by the sub-carrier spacing of the SC-FDMA system, N t  = N + N g , and w[q,n] represents the corresponding additive white Gaussian noise (AWGN) sample. We assume that |f m | < 0.5, ∀m implying that only fractional CFOs appear in the UL transmission. It is a reasonable assumption since UL transmissions initiate after DL synchronization, which implies that the UL CFOs can only be caused by the Doppler effect or residual DL synchronization errors. Using Equation 4, Equation 6 can be written as

y [ q , n ] = 1 M u m = 0 M - 1 e j 2 π N f m ( qN t + N g + n ) e j 2 π N v m n p = 0 P - 1 h m [ p ] e - j 2 π N v m p x m ( n - p ) R + w [ q , n ] .
(7)

If the effective channel length P ≤ R, the inner sum in Equation 7 is also periodic with period R. If we denote

y m [ r ] = 1 M u p = 0 P - 1 h m [ p ] e - j 2 π N v m p x m ( r - p ) R ,
(8)

which is the circular convolution of the scaled channel, i.e., h m [p] e - j 2 π N v m p , and the training symbol x m [r],0 ≤ r ≤ R - 1, then Equation 7 can be written as

y [ q , n ] = m = 0 M - 1 e j 2 π N qf m N t e j 2 π N ψ m e j 2 π N ϕ m n y m [ ( n ) R ] + w [ q , n ] ,
(9)

where ϕ m  = f m  + v m and ψ m  = f m N g . Thus, the received signal is also periodic except for the phase terms introduced by the CFOs and the starting indices. As non-zeros CFOs introduce ICI and MUI, which severely degrade the overall system performance, CFO estimation and compensation must be performed before channel estimation, equalization, and subsequent receiver processes as shown in Figure1.

3 CFO estimation and compensation for SC-FDMA UL with IAS

In this section, we present the proposed CFO estimation and compensation algorithms for SC-FDMA UL transmissions. As the proposed scheme is based on PARAFAC analysis, we start by showing that the received signal in Equation 9 can be formulated as a PARAFAC model, which will allow us to apply tensor decomposition algorithms to estimate the CFO and the MUI-free signal of each user. For an introduction to PARAFAC, refer to[30, 31].

3.1 SC-FDMA UL as a PARAFAC model

If we stack the periods of the q th received signal in (9) in an M u  × R matrix

Y ( q ) = y q , 0 y q , 1 y q , R - 1 y q , R y q , R + 1 y q , 2 R - 1 · · · · · · y q , N - R y q , N - R + 1 y q , N - 1 M u × R ,

then Y(q) can be expressed as

Y ( q ) = A D M ( q ) C T + W ( q ) ,
(10)

where

A = 1 1 1 e j 2 π M u ϕ 0 e j 2 π M u ϕ 1 e j 2 π M u ϕ M - 1 · · · · · · e j 2 π M u ϕ 0 ( M u - 1 ) e j 2 π M u ϕ 1 ( M u - 1 ) e j 2 π M u ϕ M - 1 ( M u - 1 ) M u × M ,
(11)
C = e j 2 π N ψ 0 y 0 [ 0 ] e j 2 π N ψ M - 1 y M - 1 [ 0 ] e j 2 π N ( ϕ 0 + ψ 0 ) y 0 [ 1 ] e j 2 π N ( ϕ M - 1 + ψ M - 1 ) y M - 1 [ 1 ] · · · · e j 2 π N ( ( R - 1 ) ϕ 0 + ψ 0 ) y 0 [ R - 1 ] e j 2 π N ( ( R - 1 ) ϕ M - 1 + ψ M - 1 ) y M - 1 [ R - 1 ] R × M ,
(12)

and D M (q) is a M × M diagonal matrix with diagonal entries e j 2 π N f m qN t 0mM-1, and W(q) is an M u  × R matrix containing the corresponding AWGN samples.

As the Q training symbols are identical, if we stack all Y(q) matrices corresponding to the Q received training symbols, the resulting Q M u  × R matrix can be expressed as

Y Q = Y ( 0 ) Y ( 1 ) · · Y ( Q - 1 ) = A D M ( 0 ) A D M ( 1 ) · · A D M ( Q - 1 ) C T + W ( 0 ) W ( 1 ) · · W ( Q - 1 ) ,
(13)

or

Y Q = ( B A ) C T + W Q ,
(14)

where ⊙ represents the Khatri-Rao product[30] and

B = 1 1 e j 2 π N f 0 N t e j 2 π N f M - 1 N t · · · · e j 2 π N f 0 Q - 1 N t e j 2 π N f M - 1 Q - 1 N t
(15)

is a Q × M matrix with q th row containing the diagonal entries of D M (q). Similarly it can be shown that

Y R = C B A T + W R ,
(16)

where YR is a R Q × M u matrix, which is a concatenation of R matrices with r th matrix given as

y [ 0 , r ] y [ 0 , r + R ] y [ 0 , r + ( M u - 1 ) R ] y [ 1 , r ] y [ 1 , r + R ] y [ 1 , r + r + ( M u - 1 ) R ] · · · · · · y [ Q - 1 , r ] y [ Q - 1 , r + R ] y [ Q - 1 , r + ( M u - 1 ) R ] Q × M u ,

and

Y M u = A C B T + W M u ,
(17)

where Y M u is an M u R × Q matrix with Y M u [n,q]=y[q,n]. In the absence of noise, Equations 14, 16, and 17 form the PARAFAC model of three-way tensors with rank M[30], which allows us to apply tensor decomposition algorithms to obtain estimates of A, B, and C, denoted as A ˆ , B ˆ , and C ˆ , respectively. The details of the algorithms along with the discussion on computational requirements are deferred to Section 5. The CFO estimate and the MUI-free received signal estimate corresponding to each user can then be extracted from A ˆ , B ˆ , and C ˆ as shown in the following sections. We refer to Equations 14, 16, and 17 collectively as the PARAFAC model. A key advantage of employing the PARAFAC model is that the uniqueness of the signal decomposition in the PARAFAC model and thus, identifiability of CFO estimation can be guaranteed.

3.2 Identifiability of CFO estimation

We now discuss the conditions, which guarantee the identifiability of CFO estimation by using the uniqueness properties of the PARAFAC model. Identifiability refers to the uniqueness of the CFO estimates. Guaranteeing the identifiability of CFO estimation is an important requirement for channel frequency selectivity as well as MUI can affect the identifiability of CFO estimation[13, 37].

A key advantage of applying the PARAFAC model is that its uniqueness can be guaranteed. A sufficient condition is presented in[38], which says that the decomposition in Equation 14 is unique up to scaling and column permutation if

k A + k B + k C 2 ( M + 1 ) ,
(18)

where k A denotes the k-rank of A. Specifically, if rank (A) = M and every l ≤ M columns of A are linearly independent, then the k-rank of A is l[31]. We now investigate the k-rank of the matrices involved in the proposed PARAFAC model for the SC-FDMA UL in Equation 14. Recall that ϕ m  = v m  + f m in Equation 11. As the CFO of each user is fractional, i.e., f m  < 0.5 as mentioned in Section 2, and the starting index v m of each user is unique, v m -0.5 < ϕ m  < v m  + 0.5 in Equation 11 lies in a unique interval for each user even if two or more users have CFO values close to each other. Therefore, using the fact that A is a Vandermonde matrix, it has full k-rank, i.e., k A  = min(M u ,M) = M. Moreover, as the CFO is a continuous random variable, the probability that the CFOs of two different users are exactly the same is zero, or in other words, the event that the CFOs of two different users are the same occurs almost never ([39], p. 232). Thus, B, being a Vandermonde matrix as well, has a full k-rank, i.e., k B  = min(Q,M). Similarly, it can be shown that the matrix C in Equation 12 has full k-rank if the training symbols x m (r),0 ≤ r ≤ R - 1 and channel of different users are un-correlated from each other. In that case, k C  = min(R,M). If R ≥ M, which is usually the case in broadband wireless systems, the condition in Equation 18 can be satisfied for Q ≥ 2. Thus, provided R ≥ M, A, B , and C can be uniquely estimated up to scaling and column permutation by transmission of only two identical training symbols. For BSs equipped with multiple antennas, the condition R ≥ M can be relaxed as discussed in Section 4.

As the PARAFAC decomposition is unique only up to scaling and column permutations, matching the columns to the respective users is required once the estimates of the matrices are available. Also, the scaling of C ˆ must be identified and corrected. As each entry of A and B is a complex exponential, any scaling of columns of A ˆ and B ˆ can be easily identified. The scaling of the columns of C can then be rectified by dividing each column with the product of the scaling of respective columns of A and B. As ϕ m  = v m  + f m lies in a unique interval for each user, as discussed earlier, any column permutation can also be easily identified through A ˆ . Specifically, the index of the column of A ˆ corresponding to the m th user, denoted as c m is given as (see Appendix A)

c m = arg min m M u 2 π 1 M u - 1 u = 1 M u - 1 Â [ u , m ] Â [ u - 1 , m ] - v m .
(19)

3.3 CFO estimation using PARAFAC

As both A and B depend on CFO, A ˆ and/or B ˆ can be employed for CFO estimation after correction of scaling and column permutation. We use the entries of B ˆ , and the CFO of the m th user is estimated as (see Appendix B)

f ˆ m = N 2 π N t 1 Q - 1 q = 1 Q - 1 B ˆ [ q , m ] B ˆ [ q - 1 , m ] .
(20)

Thus, the acquisition range of CFO estimation is f max = N 2 N t sub-carrier spacings.

Remark 1. The CFO estimation schemes proposed in[18, 19, 21] cannot operate on fully loaded systems and require either CP extension or multiple antenna BSs to support full load. However, the proposed CFO estimation is independent of the number of active user and thus, provides better spectral efficiency as compared to existing algorithms.

Remark 2. It is now clear that the proposed CFO estimation only requires the transmissions of at least two identical training symbols. The idea of identical training signals has been employed before in timing and carrier synchronization for OFDM system, e.g., see[40]. Also, two out of four preambles employed in random access in LTE-Advanced contain two identical symbols[3]. However, it is important to note that the knowledge of actual data transmitted on the sub-carriers of the training signals is not required for CFO estimation. Thus, the proposed scheme can be referred as a semi-blind CFO estimation, and the training signal can be designed to optimize other receiver tasks such as timing synchronization and/or channel estimation.

3.4 CFO compensation for the training symbol

Apart from CFO estimates obtained through A ˆ and B ˆ , the PARAFAC model also provides the estimate of C, whose columns contain MUI-free received signals corresponding to each user. Thus, another advantage of PARAFAC decomposition is that it automatically decouples user’s received training signals, and CFO compensation can be performed for each user separately to recover y m [r] 0 ≤ r ≤ R - 1 in Equation 8. Specifically,

ŷ m [ r ] = e - j 2 π N r ϕ m + ψ m Ĉ [ r , m ] .
(21)

Once the CFO-compensated training signal of each user is available, it can be employed for channel estimation. Standard channel estimation techniques, like least squares (LS) or MMSE channel estimation[41] are applicable to the estimate of the received training signal in Equation 21.

3.5 CFO compensation for the data symbols

In this section, we present the CFO compensation scheme for SC-FDMA data symbols following the training symbols. As the proposed PARAFAC model requires multiple identical training symbols, PARAFAC decomposition algorithms cannot be employed for CFO compensation of data symbols following the training, for data symbols cannot be identical. However, the received data symbols still satisfy Equation 10 with C replaced by a symbol-dependent matrix C(q) whose columns contain the MUI-free received data symbols of the users. As the receiver has already estimated the CFOs, it has the knowledge of A ˆ and D ˆ M (q) in Equation 10, and C(q) for the q th data SC-FDMA symbol can estimated as

C ˆ ( q ) = D M - 1 ( q ) A ˆ Y ( q ) T ,
(22)

where A ˆ denotes the pseudo-inverse of A. CFO compensation for each user can then be performed separately using Equation 21. Thus, the formulation of the received signal in Equation 10, leading to the PARAFAC model, also allows us to design a time-domain CFO compensation scheme for data symbols using Equations 22 and 21. The CFO compensation scheme in[29] also formulates the received periodic signal in interleaved OFDMA transmissions as a low-dimensional matrix form similar to Y(q) for CFO compensation. However, the scheme in[29] is based on OFDMA transmissions, and thus, the scheme in[29] is not directly applicable to SC-FDMA transmissions.

3.5.1 Complexity analysis of CFO compensation

We now calculate the computational requirements of the proposed CFO compensation scheme under full load and compare it with some of the existing schemes in the literature. Complexity analysis of PARAFAC decomposition for CFO estimation is discussed in Section 5. Calculation of C ˆ (q) in Equation 22 requires inversion of A ˆ , which can be efficiently implemented using the fact that A is Vandermonde matrix and a closed-form solution for its inverse exists with a complexity of onlyO M u 2 operations[42]. Thus, the total computational load of Equations 22 and 21 isO ( N + 1 ) M u + M u 2 . In contrast, the time-domain CFO compensation scheme in[28] requires the inversion of an N × N block diagonal matrix withO M u 2 N + N 2 operations while the linear decorrelation-based frequency-domain CFO compensation scheme in[26] requires inversion of an N × N matrix with O(N3) operations. The complexity of CFO compensation in[26] can be lowered by a banded matrix approximation at the cost of performance degradation. Thus, compared with the existing schemes, the proposed CFO compensation has the advantage of significantly reduced computational complexity.

This completes the discussion on the proposed CFO estimation and compensation techniques, and the block diagram of the proposed algorithms is shown in Figure2. We now discuss the extension of the proposed algorithm to multi-antenna BS reception.

Figure 2
figure 2

Block diagram of the proposed SC-FDMA UL receiver.

4 Extension to multi-antenna BS case

In this section, we show that the proposed synchronization schemes can be readily extended to the case where the BS has multiple antennas. Specifically, if the BS has N r receive antennas, we denote the parameters associated with the t th antenna by using the subscript ‘t’. Thus, y[q,n], y m [r], and h m [p] in Equations 8 and 9 are replaced by y t [q,n], yt,m[r], and ht,m[p], respectively. We assume that the channels between a user and different receive antennas are independent. However, receive antennas are collocated, and all receive chains are assumed to employ a common clock, which implies that the CFO across receiver chains remains constant. This allows us to employ the PARAFAC model in Equation 14 for the multi-antenna BS case with A and B unchanged as in Equations 11 and 15, respectively, while YQ in Equation 14 is updated as

Y Q = Y 0 ( 0 ) Y 1 ( 0 ) Y N r - 1 ( 0 ) Y 0 ( 1 ) Y 1 ( 1 ) Y N r - 1 ( 1 ) · · · · · · Y 0 ( Q - 1 ) Y 1 ( Q - 1 ) Y N r - 1 ( Q - 1 ) M u × N r R ,
(23)

which is a concatenation of Y t (q) in Equation 10 corresponding to different receiver antennas. Correspondingly, C in Equation 12 is updated as

C = C 0 C 1 · · C N r - 1 N r R × M ,
(24)

with the element in r th row and m th column of C t given as

C t [ r , m ] = e j 2 π N ( ( r - 1 ) ϕ m - 1 + ψ m - 1 ) y t , m - 1 [ r - 1 ] .

Thus, the PARAFAC model is still applicable to the multi-antenna BS reception, and tensor decomposition can still be applied to jointly estimate the CFOs and MUI-free received signal corresponding to different users and receive antennas. A better CFO estimation performance is expected in this case due to more information brought by the additional receive chains at the cost of increased computational complexity.

Remark 3. As the dimensions of C in Equation 24 grow to N r R × M for multi-antenna BS as opposed to R × M in Equation 12 for single-antenna BS, the k-rank of C is given as k C  = min(N r R,M). Thus, according to the discussion in Section 3.2, identifiability of CFO estimation in multi-antenna BS case is guaranteed if N r R ≥ M, which is a less restrictive condition as compared to R ≥ M for single-antenna BS. Moreover, the CFO compensation scheme presented in Section 3.5 is also applicable to multi-antenna reception. Since A and B remain un-changed, A ˆ in Equation 22 needs to be calculated only once for CFO compensation corresponding to all receive antennas. Thus, the complexity of CFO compensation increases only linearly with respect to the number of receive antennas.

5 Tensor decomposition algorithms and the associated computational complexity

In this section, we discuss how to estimate the matrices involved in the proposed PARAFAC model in Section 3.1. Various tensor decomposition algorithms proposed in the literature can be employed for estimation of A, B , and C matrices of the proposed PARAFAC model. The examples include alternating least squares (ALS), gradient descent algorithm, conjugate gradient algorithm, and the Levenberg-Marquardt algorithm. For a description of these algorithms in the context of PARAFAC, please refer to[30].

5.1 PARAFAC decomposition using alternating least squares

ALS is one of the most popular algorithms for PARAFAC, which we briefly describe here in the context of SC-FDMA CFO estimation.

Given the three equivalent forms of PARAFAC model in Equations 14, 16, and 17, A, B , and C matrices can be estimated by minimizing either of the following cost functions

J Q = Y Q - ( B A ) C T 2 , J M u = Y M u - ( A C ) B T 2 , J R = Y R - ( C B ) A T 2 .

The idea of ALS is to iteratively solve these three cost functions and update one of the three matrices while keeping the other two constant. Thus, for iteration i, given A ˆ ( i - 1 ) and B ˆ ( i - 1 ) , C ˆ i is calculated as the LS solution of JQ given as

C ˆ ( i ) = B ˆ ( i - 1 ) A ˆ ( i - 1 ) Y Q T .
(25)

Similarly, A ˆ ( i ) and B ˆ ( i ) are then updated as

B ˆ ( i ) = A ˆ ( i - 1 ) C ˆ ( i ) Y M u T ,
(26)

and

A ˆ ( i ) = C ˆ ( i ) B ˆ ( i ) Y R T ,
(27)

respectively, and the algorithm moves on to the next iteration. The initial values, i.e., A ˆ ( 0 ) and B ˆ ( 0 ) required for starting the algorithm are computed by setting the CFO of each user to zero, i.e., f m  = 0 ∀m in Equations 11 and 15, respectively. The effect of number of iterations on the convergence of the algorithm will be shown through simulation results in Section 6.

5.2 Computational complexity of PARAFAC decomposition

The computational complexity of ALS is dominated by the pseudo-inverse calculations required in Equations 25, 26, and 27 for each iteration. Thus, the computational complexity is given by O(N i ((Q M u  + N + Q R)(M2 + M) + N Q M)) operations[30], where N i is the number of iterations of ALS. As a comparison, the complexity of data-aided ML CFO estimation in[16] is O(M N c N i (P3 + P N2)) where N c is the number of candidate CFO values used in the line search, which is significantly higher than the proposed algorithm. On the other hand, the complexity of blind ESPRIT-based CFO estimation in[19] isO M u 3 + M u 2 N , which is about the same complexity order as the proposed algorithm. However, the performance of the proposed algorithm is significantly better than[19] as shown in Section 6.

6 Simulation results

In this section, we evaluate the performance of the proposed algorithms through Monte Carlo simulations and compare it with some of the existing schemes in the literature. For CFO estimation, we compare with the algorithms in[16, 18, 19], and for CFO compensation, we compare with the CFO compensation scheme in[26]. The simulation parameters are as follows: the total number of sub-carriers N = 128, length of CP N g  = 16, maximum number of users M u  = 8, active users M = 4, number of sub-carrier allocated to each userR= N 8 =16, and channel length for each user P m  = 6. For each iteration of the simulation, the CFO of each user is modeled as a random variable uniformly distributed between [-fmax,fmax] with fmax = 0.4. The modulation scheme is quadrature phase shift keying (QPSK). The transmission is modeled in the form of frames, where each frame contains Q identical training symbols in the beginning followed by 20 SC-FDMA symbols carrying data. CFO and channel estimation is carried out using the training symbols, and the estimates are employed to decode all the data symbols in that frame. For comparison, Q symbols are also employed for CFO estimation using[16, 18, 19].

6.1 CFO estimation performance with increasing fmax

Figure3 shows the performance of the proposed CFO estimation in terms of mean squared error (MSE) of CFO estimation with increasing value of fmax, i.e., the maximum absolute CFO at an SNR of 20 dB. The rest of the simulation parameters is given in the beginning of this section. As the maximum CFO increases, ICI and MUI increase, and we expect to see degradation in CFO estimation performance. This trend is visible in Figure3 for both values of simulated SNR as the MSE of CFO estimation increases slightly as fmax increases.

Figure 3
figure 3

Performance of proposed CFO estimation with increasing maximum CFO value.

6.2 Effect of number of training symbols Q on CFO estimation

We now evaluate the effect of number of training symbols Q and the number of iterations of ALS on the MSE of CFO estimation at an SNR of 20 dB. The values of Q range from 2 to 5, and the results are shown in Figure4. As shown, the MSE of CFO estimation decreases as more training symbols and/or more iterations of ALS are employed for CFO estimation. However, the price for the increase in CFO estimation accuracy is decreased spectral efficiency due to more training symbols and/or increased complexity and latency of CFO estimation due to more iteration of ALS. Thus, there exists a trade-off between CFO estimation performance and spectral efficiency/complexity. Figure4 also shows that the improvement in performance of CFO estimation is negligible after about 4 to 6 iterations of ALS. Thus, in the following simulations, only 5 iterations of ALS are employed.

Figure 4
figure 4

Performance of proposed CFO estimation with increasing number of training symbols ( Q ) and iterations of ALS algorithm.

6.3 CFO estimation performance for multi-antenna BS

In this section, we evaluate the performance of the CFO estimation for multi-antenna BS reception. As the CFO remains constant across receiver antennas at the BS, the performance of CFO estimation is expected to improve as more antennas are employed at the BS. This trend is clearly visible in Figure5, which shows the MSE of CFO estimation for multi-antenna BS reception. As the number of receive antennas N r is increased from 1 to 4, about 7 dB improvement in performance is observed.

Figure 5
figure 5

Performance of proposed CFO estimation with increasing number of receive antennas ( N r ) at the BS.

6.4 Comparison of CFO estimation performance with existing schemes

In this section, we compare the performance of the proposed CFO estimation with the schemes in[16, 18, 19] denoted as ‘MUSIC’ and ‘ESPRIT’ and ‘alternating-projection frequency estimation (APFE)’ with single-antenna BS. The same number of symbols Q = 2 is employed for CFO estimation in all schemes although both schemes in[18, 19] are truly blind schemes and do not require knowledge of the training symbols. The proposed scheme is semi-blind as discussed in Section 3 because it does not require the knowledge of the training symbols and only requires that the Q = 2 training symbols are identical. The scheme in[16] is data-aided and requires the knowledge of all training symbols. We also plot the modified Cramer-Rao bound (MCRB) for single-user OFDM ([43], Equation 7) as an absolute reference for CFO estimation performance. The number of candidate CFO values employed in the line search used in MUSIC and APFE is 1,000. The comparison, given in Figure6, shows that the performance of the proposed scheme is significantly better (8 to 10 dB improvement) than the MUSIC and ESPRIT schemes. The performance of ESPRIT is better than MUSIC for lower SNR while the opposite is true for higher SNR. As both MUSIC and ESPRIT are blind CFO estimation algorithms, CFO estimation accuracy can be increased by employing data symbols for CFO estimation. However, it will increase the complexity and latency of CFO estimation. Compared to APFE, the proposed algorithm offers about 1 dB performance improvement for SNR >5 dB. However, it should be noted that the complexity of the proposed algorithm is significantly less than APFE as discussed in Section 5. The complexity of CFO estimation in[16] can be lowered by using sub-optimal algorithms at the cost of degradation of CFO estimation accuracy. Compared to the MCRB, the performance of the proposed algorithm is consistently close to the absolute reference provided by MCRB.

Figure 6
figure 6

Comparison of proposed CFO estimation with CFO estimation schemes in[18](MUSIC),[19](ESPRIT), and[16](APFE).

6.5 Comparison of CFO estimation performance with increasing number of users

In this section, we evaluate the performance of the proposed algorithm with increasing number of active users (M) and compare it with the existing schemes in the literature. The number of users varies from a single user (M = 1) to full load (M = 8) while the operating SNR is 20 dB. The results are shown in Figure7.

Figure 7
figure 7

Performance evaluation of proposed CFO estimation for different number of active users (SNR = 20 dB).

As the number of users increases, MUI increases, and the performance of each algorithm decreases. As the sub-space-based algorithms cannot operate on full load, the performance of MUSIC and ESPRIT algorithms is severely degraded on full load. The performance of the proposed algorithm is consistently better than the other algorithms while it also supports full-load transmissions.

6.6 Performance of CFO compensation and channel estimation

We now compare the performance of the proposed CFO compensation algorithm with the one proposed in[26]. The performance is evaluated in terms of MSE of channel estimation performed after CFO compensation. LS channel estimation[41] is employed in each case. A better CFO compensation scheme will result in less residual ICI and MUI, which will improve the channel estimation accuracy. We compare the performance of the proposed CFO estimation and compensation with the three different combinations of CFO estimation and compensation. CFO estimation algorithms in[18, 19], combined with the CFO compensation in[26] are denoted as ‘MUSIC +[26]’ and ‘ESPRIT +[26]’. As the CFO estimation accuracy for MUSIC and ESPRIT in worse than the proposed algorithm as shown in the previous section, we also evaluate the performance of CFO compensation in[26] combined with the proposed CFO estimation scheme for fair comparison of the proposed and[26]-based CFO compensation schemes. Figure8 shows the MSE for channel estimation for all scenarios. As shown, the performance of MUSIC +[26] and ESPRIT +[26] is worse than the proposed schemes. Moreover, the performance of the proposed CFO estimation and compensation is better (about 2 dB improvement in performance) than the proposed CFO estimation combined with[26], which shows that the proposed CFO compensation schemes do a better job of extracting the interference-free received signal.

Figure 8
figure 8

Comparison of channel estimation using proposed CFO compensation method and the one in[26].

6.7 Bit error rate performance

Figure9 shows the bit error rate (BER) performance of the proposed CFO estimation and compensation algorithms and their comparison with CFO estimation schemes in[18, 19] combined with CFO compensation scheme in[26]. It is clear from the figure that the proposed CFO estimation and compensation schemes outperform both MUSIC and ESPRIT-based CFO estimation combined with CFO compensation in[26], which shows the effectiveness of the proposed schemes. These simulation results show that the proposed CFO estimation and compensation schemes offer significant performance improvements in both CFO estimation accuracy and estimation of MUI-free received signals corresponding to different users.

Figure 9
figure 9

BER comparison of the proposed CFO estimation and compensation methods with MUSIC[18]and ESPRIT[19].

7 Conclusions

We have presented CFO estimation and compensation algorithms for SC-FDMA UL transmissions with interleaved sub-carrier allocation scheme. The proposed algorithm is based on PARAFAC and guarantees the identifiability of CFO estimation. Moreover, the proposed scheme allows the system to operate on full load gaining better spectral efficiency. Comparisons with the existing schemes in the literature demonstrate that the proposed algorithms have lower complexity and offer considerable performance improvements for both CFO estimation and compensation in SC-FDMA UL receivers.

Appendix

A Derivation of Equation 19

The column permutation in PARAFAC decomposition can be identified through entries of A ˆ . As the entries in a certain column of A ˆ are complex exponentials corresponding to ϕ m for a specific 0 ≤ m ≤ M - 1, we can estimate ϕ m for each user by calculating

ϕ ˆ m = M u 2 π 1 M u - 1 u = 1 M u - 1 Â [ u , m ] Â [ u - 1 , m ] , 0 m M - 1 ,
(28)

similar to the derivation of f m explained in Appendix B. However, as the column of A ˆ may be permuted, the calculation of Equation 28 using m th column of A ˆ may not correspond to the m th user. As ϕ m  = f m  + v m for each m lies in a unique interval as explained in Section 3.2, the column permutation can be identified by identifying which interval does the ϕ ˆ m for a particular m belongs to. This can be done by calculating the following metric:

arg min m ϕ ˆ m - v m
(29)

that identifies, which ϕ ˆ m 0 m M-1 corresponds to the m th user. Plugging in the value of ϕ ˆ m from Equation 28 in the above metric concludes to Equation 19.

B Derivation of Equation 20

The matrix B in Equation 15 is a Q × M matrix and m th, 0 ≤ m ≤ M - 1 column of B contains complex exponentials corresponding to the CFO of the m th user. Therefore, dividing the q th entry of the m th column of B ˆ by the q-1 entry gives an estimate of

B q , m B q - 1 , m = e j 2 π N f m N t .
(30)

In order to improve the estimation accuracy, we average the estimate of e j 2 π N f m N t by calculating

q = 1 Q - 1 B ˆ q , m B ˆ q - 1 , m .
(31)

The CFO of the m th user can then be estimated by calculating the angle of the estimate in Equation 31 and scaling it by N 2 π N t , which concludes to the Equation 20.