Coherent Detection of Turbo-Coded OFDM Signals Transmitted through Frequency Selective Rayleigh Fading Channels with Receiver Diversity and Increased Throughput

In this work, we discuss techniques for coherently detecting turbo-coded orthogonal frequency division multiplexed (OFDM) signals transmitted through frequency selective Rayleigh fading channels (the magnitude of each channel tap is Rayleigh distributed) having a uniform power delay profile. The channel output is further distorted by a carrier frequency and phase offset, besides additive white Gaussian noise (AWGN). A new frame structure for OFDM, consisting of a known preamble, cyclic prefix, data and known postamble, is proposed, which has a higher throughput than the earlier work. A robust turbo decoder is proposed, which functions effectively over a wide range of signal-to-noise ratio (SNR). The good performance of the practical coherent receiver is chiefly due to the use of a long preamble (512 QPSK symbols), which is perhaps not specified in any of the current wireless communication standards. We also show from computer simulations that it is possible to obtain even better BER performance using a better code. A simple and approximate Cramér-Rao bound on the variance of the frequency offset estimation error for coherent detection is derived. The proposed algorithms are well suited for implementation on a DSP platform.


I. INTRODUCTION
Future wireless communication standards aim to push the existing data rates higher. This can only be achieved with the help of coherent communications, since they give the lowest bit-error-rate (BER) performance for a given signal-to-noise ratio (SNR). Conversely, they require the lowest SNR to attain a given BER, resulting in enhanced battery life. A mobile typically indicates a received signal strength equal to −100 dBm ($10^{-10}$ mW). However, this is not the signal-to-noise ratio! Therefore, the question is: What is the operating SNR of the mobiles? Would it be possible to achieve the same performance by transmitting at a lower power? The recent advances in cooperative communications have resulted in low complexity solutions that are not necessarily power efficient [1], [2]. In fact, it is worth quoting the following from [3]:
1) The Myth: Sixty years of research following Shannon's pioneering paper has led to telecommunications solutions operating arbitrarily close to the channel capacity - "flawless telepresence" with zero error is available to anyone, anywhere, anytime across the globe.
2) The Reality: Once we leave home or the office, even top of the range iPhones and tablet computers fail to maintain "flawless telepresence" quality. They also fail to approach the theoretical performance predictions. The 1000-fold throughput increase of the best third-generation (3G) phones over second-generation (2G) GSM phones and the 1000-fold increased teletraffic predictions of the next decade require substantial further bandwidth expansion toward ever increasing carrier frequencies, expanding beyond the radio-frequency (RF) band to optical frequencies, where substantial bandwidths are available.
The transmitter and receiver algorithms proposed in this paper and in [4], [5] are well suited for implementation on a DSP processor or in hardwired logic, and may perhaps not require quantum computers, as mentioned in [3]. The reader is also referred to the brief commentary on channel estimation and synchronization on page 1351 and to the noncoherent schemes on page 1353 of [1], which clearly state that cooperative communications avoid coherent receivers due to complexity.

The author is with the Dept. of EE, IIT Kanpur. Email: vasu@iitk.ac.in
Broadly speaking, the wireless communication device needs to have the following features:
1) maximize the bit-rate
2) minimize the bit-error-rate
3) minimize transmit power
4) minimize transmission bandwidth
A rather disturbing trend in present day wireless communication systems is to make the physical layer very simple and implement it in hardware, and to allot most of the computing resources to the application layer, e.g., for internet surfing, video conferencing etc. Hardware implementation of the physical layer is not an issue; in fact, it may even be preferred over software implementation in some situations. The real cause for concern is the tendency to make the physical layer "simple" at the cost of BER performance. Therefore, the questions are:
1) Was signal processing for coherent communications given a chance to prove itself, or was it ignored straightaway, due to "complexity" reasons?
2) Are the present day single antenna wireless transceivers, let alone multi-antenna systems, performing anywhere near channel capacity?
This paper demonstrates that coherent receivers need not be restricted to textbooks alone; in fact, they can be implemented with linear (not exponential) complexity. The need of the hour is a paradigm shift in the way wireless communication systems are implemented.
In this article, we dwell on coherent receivers based on orthogonal frequency division multiplexing (OFDM), since it has the ability to mitigate intersymbol interference (ISI) introduced by the frequency selective fading channel [6]-[8]. The "complexity" of coherent detection can be overcome by means of parallel processing, for which there is a large scope. We wish to emphasize that this article presents a proof-of-concept, and is hence not constrained by the existing standards in wireless communication. We begin by first outlining the tasks of a coherent receiver. Next, we scan the literature on each of these tasks to find out the state-of-the-art, and finally end this section with our contributions.
The basic tasks of the coherent receiver would be:
1) To correctly identify the start of the (OFDM) frame (SoF), such that the probability of false alarm (detecting an OFDM frame when it is not present) or equivalently the probability of erasure/miss (not detecting the OFDM frame when it is present) is minimized. We refer to this step as timing synchronization.
2) To estimate and compensate the carrier frequency offset (CFO), since OFDM is known to be sensitive to CFO. This task is referred to as carrier synchronization.
3) To estimate the channel impulse/frequency response.
4) To perform (coherent) turbo decoding and recover the data.
To summarize, a coherent receiver at the physical layer ensures that the medium access control (MAC) is not burdened by frequent requests for retransmissions.
A robust timing and frequency synchronization for OFDM signals transmitted through frequency selective AWGN channels is presented in [9]. Timing synchronization in OFDM is addressed in [10]-[14]. Various methods of carrier frequency synchronization for OFDM are given in [15]-[21]. Joint timing and CFO estimation is discussed in [22]-[27].
Decision directed coherent detection of OFDM in the presence of Rayleigh fading is treated in [28]. A factor graph approach to the iterative (coherent) detection of OFDM in the presence of carrier frequency offset and phase noise is presented in [29]. OFDM detection in the presence of intercarrier interference (ICI) using block whitening is discussed in [30]. In [31], a turbo receiver is proposed for detecting OFDM signals in the presence of ICI and inter antenna interference.
Most of the channel estimation techniques discussed in the literature operate in the frequency domain, using pilot symbols at regular intervals in the time/frequency grid [32]-[36]. Iterative joint channel estimation and multi-user detection for multi-antenna OFDM is discussed in [37]. Noncoherent detection of coded OFDM, in the absence of frequency offset and assuming the channel frequency response to be constant over a block of symbols, is considered in [38]. Expectation maximization (EM)-based joint channel estimation and exploitation of the diversity gain from IQ imbalances is addressed in [39].
Detection of OFDM signals, in the context of spectrum sensing for cognitive radio, is considered in [40], [41]. However, in both these papers, the probability of false alarm is quite high (5%).
In [42], discrete cosine transform (DCT) based OFDM is studied in the presence of frequency offset and noise, and its performance is compared with the discrete Fourier transform (DFT) based OFDM. It is further shown in [42] that the performance of DFT-OFDM is as good as DCT-OFDM, for small frequency offsets.
A low-power OFDM implementation for wireless local area networks (WLAN) is addressed in [43]. OFDM is a suggested modulation technique for digital video broadcasting [44], [45]. It has also been proposed for optical communications [46].
The novelty of this work lies in the use of a filter that is matched to the preamble, to acquire timing synchronization [47], [48] (start-of-frame (SoF) detection). Maximum likelihood (ML) channel estimation using the preamble is performed. This approach does not require any knowledge of the channel and noise statistics.
The main contributions of this paper are the following:
1) It is shown that, for a sufficiently long preamble, the variance of the channel estimator proposed in eq. (28) of [4] approaches zero.
2) A known postamble is used to accurately estimate the residual frequency offset for large data lengths, thereby increasing the throughput compared to [4], [5].
3) Turbo codes are used to attain BER performance closer to channel capacity compared to any other earlier work in the open literature, for channels having a uniform power delay profile (to the best of the author's knowledge, there is no similar work on the topic of this paper, other than [4], [5]).
4) A robust turbo decoder is proposed, which performs effectively over a wide range of SNR (0 to 30 dB).
5) While most papers in the literature try to attain the channel capacity for a given SNR, this work tries to attain the minimum SNR for error-free transmission, for a given channel capacity.
In a multiuser scenario, the suggested technique is OFDM-TDMA. The uplink and downlink may be implemented using time division duplex (TDD) or frequency division duplex (FDD) modes.
This paper is organized as follows. Section II describes the system model. The receiver algorithms are presented in Section III. The bit-error-rate (BER) results from computer simulations are given in Section IV. Finally, in Section V, we discuss the conclusions and future work.

II. SYSTEM MODEL
We assume that the data to be transmitted is organized into frames, as depicted in Figure 1. The frame consists of a known preamble of length $L_p$ symbols, a cyclic prefix of length $L_{cp}$, followed by data of length $L_d$ symbols. Thus, the total length of the frame is
$$L = L_p + L_{cp} + L_d.$$
Let us assume a channel span equal to $L_h$. The channel span assumed by the receiver is $L_{hr}$ ($> L_h$). The length of the cyclic prefix is [7]:
$$L_{cp} = L_{hr} - 1.$$
Throughout the manuscript, we use a tilde to denote complex quantities. However, complex (QPSK) symbols are denoted without a tilde, e.g. $S_{1,n}$. Boldface letters denote vectors or matrices. The channel coefficients $\tilde{h}_{k,n}$ for the $k$th frame are $\mathcal{CN}(0, 2\sigma_f^2)$ and independent over time ($n$), that is:
$$\frac{1}{2} E\left[\tilde{h}_{k,n}\, \tilde{h}^*_{k,n-m}\right] = \sigma_f^2\, \delta_K(m)$$
where "*" denotes complex conjugate and $\delta_K(\cdot)$ is the Kronecker delta function. This implies a uniform channel power delay profile. The channel is assumed to be quasi-static, that is, $\tilde{h}_{k,n}$ is time-invariant over one frame and varies independently from frame to frame, that is
$$\frac{1}{2} E\left[\tilde{h}_{k,n}\, \tilde{h}^*_{j,n}\right] = \sigma_f^2\, \delta_K(k-j)$$
where $k$ and $j$ denote the frame indexes. The AWGN noise samples $\tilde{w}_{k,n}$ for the $k$th frame at time $n$ are $\mathcal{CN}(0, 2\sigma_w^2)$. The frequency offset $\omega_k$ for the $k$th frame is uniformly distributed over $[-0.04, 0.04]$ radian [23]. The phase offset $\theta_k$ for the $k$th frame is uniformly distributed over $[0, 2\pi)$.
Both $\omega_k$ and $\theta_k$ are fixed for a frame and vary randomly from frame to frame.
Note that: We assume S k, 3, i ∈ ±1 ± j. Since we require: we must have S 1, i ∈ L p /L d (±1 ± j). In other words, the average power of the preamble part must be equal to the average power of the data part.
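The received signal for the $k$th frame can be written as (for $0 \le n \le L + L_h - 2$):
$$\tilde{r}_{k,n} = \left(\tilde{s}_{k,n} \star \tilde{h}_{k,n}\right) e^{\,j(\omega_k n + \theta_k)} + \tilde{w}_{k,n} = \tilde{y}_{k,n}\, e^{\,j(\omega_k n + \theta_k)} + \tilde{w}_{k,n} \tag{7}$$
where "$\star$" denotes convolution and $\tilde{y}_{k,n} = \tilde{s}_{k,n} \star \tilde{h}_{k,n}$.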
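The signal model above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the helper names (`complex_gaussian`, `convolve`, `received_frame`) are hypothetical, the transmitted frame is a stand-in sequence of random QPSK symbols rather than the IFFT output of Figure 1, and the noise variance in the usage lines is an arbitrary choice.

```python
import cmath
import math
import random


def complex_gaussian(rng, var_per_dim):
    """Draw one CN(0, 2*var_per_dim) sample, i.e. variance var_per_dim per real dimension."""
    s = math.sqrt(var_per_dim)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def convolve(x, h):
    """Full linear convolution of two complex sequences."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def received_frame(s, L_h, sigma_f2, sigma_w2, rng):
    """Simulate one frame: Rayleigh taps, frequency and phase offset, AWGN."""
    # L_h independent taps of equal variance: uniform power delay profile.
    h = [complex_gaussian(rng, sigma_f2) for _ in range(L_h)]
    omega = rng.uniform(-0.04, 0.04)          # frequency offset, radian per sample
    theta = rng.uniform(0.0, 2.0 * math.pi)   # phase offset, radian
    y = convolve(s, h)                        # channel output before rotation
    r = [y_n * cmath.exp(1j * (omega * n + theta)) + complex_gaussian(rng, sigma_w2)
         for n, y_n in enumerate(y)]
    return r, h, omega, theta


# Usage with illustrative (assumed) parameters.
rng = random.Random(1)
qpsk = [complex(a, b) for a in (1, -1) for b in (1, -1)]
s = [rng.choice(qpsk) for _ in range(64)]     # stand-in for one transmitted frame
r, h, omega, theta = received_frame(s, L_h=10, sigma_f2=0.5, sigma_w2=0.01, rng=rng)
```

The returned sequence has $L + L_h - 1$ samples, matching the index range $0 \le n \le L + L_h - 2$ stated for the received signal.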
The set of received samples can be denoted by the vector:
$$\tilde{\mathbf{r}}_k = \left[\tilde{r}_{k,0} \;\; \ldots \;\; \tilde{r}_{k,L+L_h-2}\right].$$

III. RECEIVER
In this section we discuss the key receiver algorithms, namely, start of frame (SoF) detection, coarse/fine frequency offset estimation, channel and noise variance estimation, and finally data detection.

A. Start of Frame and Coarse Frequency Offset Estimation
Let us assume that for the k th frame, the channel impulse response is known at the receiver. The channel length assumed by the receiver is L hr (> L h ) such that the first L h coefficients are identical to the channel coefficients and the remaining L hr − L h coefficients are zeros. Define the m th (0 ≤ m ≤ L cp + L d + L h + L hr − 2) received vector as: The steady-state 1 preamble part of the transmitted signal appearing at the channel output can be represented by a vector: The non-coherent maximum likelihood (ML) rule for frame detection can be stated as [7]: Choose that time as the start of frame and that frequencyω k , which jointly maximize the conditional pdf: substituting for the joint pdf and p(θ k ) and defining we get: where incorporates the phase accumulated by the frequency offset over the first L hr − 1 samples, besides the initial phase θ k .
One of the terms in the exponent is: is approximately proportional to the average received signal power, for large values of $L_p$ and $L_p \gg L_{hr}$, and is hence (approximately) independent of $m$ and $\theta$. The other exponential term is clearly independent of $m$ and $\theta$. Therefore we are only left with (ignoring constants): which simplifies to [7]: where $I_0(\cdot)$ is the modified Bessel function of the zeroth order and Noting that $I_0(x)$ is a monotonic function of $x$ and ignoring constants, the maximization in (19) simplifies to: Observe that (21) resembles the operation of demodulation and matched filtering. The ideal outcome of (21) to estimate the SoF and frequency offset is: In practice, the receiver has only the estimate of the channel ($\hat{h}_{k,n}$), hence $\tilde{y}_{k,n}$ must be replaced by $\hat{y}_{k,n}$, where is the preamble convolved with the channel estimate. When $\tilde{h}_{k,n}$ is not available, we propose a heuristic method of frame detection as follows: where again $\tilde{s}_{1,i}$ denotes the preamble as shown in Figure 1.
The ideal outcome of (24) is: depending on which channel coefficient has the maximum magnitude. In practical situations, one also needs to look at the ratio of the peak-to-average power of (24) to estimate the SoF [48]. When m lies outside the range in (25), the frame is declared as erased (lost). The probability of frame erasure as a function of the preamble length is shown in Figure 2.
Observe that for $L_p = 512$, the probability of erasure is less than $10^{-6}$ and is hence not plotted. The coarse frequency offset estimate $\hat{\omega}_k$ is obtained by dividing the interval $[-0.04, 0.04]$ radian into $B_1$ frequency bins and selecting the bin that maximizes (24).
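The joint search over time and frequency bins can be illustrated with a small sketch. It assumes that the heuristic metric is the magnitude of the correlation between the received samples and the preamble derotated by a trial frequency offset; the exact form of (24) is not reproduced here, so this is a stand-in rather than the paper's expression. The function name `coarse_search`, the single-tap noise-free channel and all numerical values are assumptions chosen for illustration.

```python
import cmath
import random


def coarse_search(r, preamble, num_bins, max_offset=0.04):
    """Return (m, omega) maximising |sum_i r[m+i] * conj(preamble[i]) * exp(-j*omega*i)|."""
    L_p = len(preamble)
    best = (0.0, 0, 0.0)                      # (metric, start index, trial offset)
    for b in range(num_bins):
        # Centre of the b-th frequency bin in [-max_offset, max_offset].
        omega = -max_offset + (b + 0.5) * 2.0 * max_offset / num_bins
        # Preamble rotated by the trial offset, conjugated once per bin.
        ref = [(p * cmath.exp(1j * omega * i)).conjugate() for i, p in enumerate(preamble)]
        for m in range(len(r) - L_p + 1):
            metric = abs(sum(r[m + i] * ref[i] for i in range(L_p)))
            if metric > best[0]:
                best = (metric, m, omega)
    return best[1], best[2]


# Usage: short QPSK preamble, single-tap channel, no noise (all assumed values).
rng = random.Random(7)
qpsk = [complex(a, b) for a in (1, -1) for b in (1, -1)]
preamble = [rng.choice(qpsk) for _ in range(64)]
data = [rng.choice(qpsk) for _ in range(16)]
true_omega, true_theta, delay = 0.0123, 1.0, 5
tx = [0j] * delay + preamble + data
r = [x * cmath.exp(1j * (true_omega * n + true_theta)) for n, x in enumerate(tx)]
m_hat, omega_hat = coarse_search(r, preamble, num_bins=64)
```

Every (time, bin) pair is evaluated independently of the others, which is why a search of this kind lends itself to parallel processing.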

B. Channel Estimation
Here, we focus on maximum likelihood (ML) channel estimation. We assume that the SoF has been estimated using (24) with outcome m 0 (0 ≤ m 0 ≤ L h − 1) and the frequency offset has been perfectly canceled. Define The steady-state, preamble part of the received signal for the k th frame can be written as: where again L hr (> L h ) is the channel length assumed by the receiver. The statement of the ML channel estimation is as follows: findĥ k (the estimate ofh k ) such that: is minimized. Differentiating with respect toĥ * k and setting the result to zero yields [7], [49]: To see the effect of noise on the channel estimate in (30), considerũ Sinces 1, n is a zero-mean random sequence with good autocorrelation properties, it is reasonable to expect where σ 2 s is defined in (6), L 1 is defined in (13), and I L hr is an L hr × L hr identity matrix. It can be shown that Therefore, the variance of the ML channel estimate (σ 2 u ) tends to zero as L 1 → ∞ and L d is kept fixed. Conversely, when L d is increased keeping L 1 fixed, there is noise enhancement.
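A minimal sketch of least-squares (ML under white Gaussian noise) channel estimation from the steady-state preamble response follows. It assumes the usual normal-equation form $\hat{\mathbf{h}} = (\mathbf{S}^H \mathbf{S})^{-1} \mathbf{S}^H \mathbf{r}$ with $\mathbf{S}$ the preamble convolution matrix; the helper names `solve` and `ml_channel_estimate`, the random QPSK stand-in for the time-domain preamble, and the three-tap test channel are hypothetical choices, not taken from the paper.

```python
import random


def solve(A, b):
    """Solve the square complex system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= f * M[col][c]
    x = [0j] * n
    for i in reversed(range(n)):                          # back substitution
        acc = M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = acc / M[i][i]
    return x


def ml_channel_estimate(preamble, r_steady, L_hr):
    """Least-squares estimate of L_hr taps; r_steady[k] is the sample at time L_hr - 1 + k."""
    L_p = len(preamble)
    # Row for time n holds preamble[n], preamble[n-1], ..., preamble[n-L_hr+1].
    rows = [[preamble[n - j] for j in range(L_hr)] for n in range(L_hr - 1, L_p)]
    # Normal equations: (S^H S) h = S^H r.
    G = [[sum(row[a].conjugate() * row[b] for row in rows) for b in range(L_hr)]
         for a in range(L_hr)]
    v = [sum(row[a].conjugate() * rk for row, rk in zip(rows, r_steady))
         for a in range(L_hr)]
    return solve(G, v)


# Usage: noise-free check with an assumed 3-tap channel and L_hr = 5.
rng = random.Random(3)
qpsk = [complex(a, b) for a in (1, -1) for b in (1, -1)]
preamble = [rng.choice(qpsk) for _ in range(64)]
h_true = [0.8 + 0.1j, -0.3 + 0.5j, 0.2 - 0.4j]
L_hr = 5
r_steady = [sum(h_true[j] * preamble[n - j] for j in range(len(h_true)))
            for n in range(L_hr - 1, len(preamble))]
h_hat = ml_channel_estimate(preamble, r_steady, L_hr)
```

In the noise-free case the estimate returns the true taps followed by zeros for the extra $L_{hr} - L_h$ positions, consistent with the receiver assuming a longer channel than the actual one.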
At this point, it must be mentioned that in the absence of noise, the channel estimate obtained from (30) depends on the SoF estimate m 0 obtained from (24). When m 0 = L h − 1, the channel estimate in the absence of noise would be: When m 0 = 0, the channel estimate (in the absence of noise) is :ĥ Thus we get: Observe that the channel estimation matrixs 1 in (28) remains the same, independent of m 0 . Therefore, the pseudoinverse of s 1 given in (30)

C. Fine Frequency Offset Estimation
For the purpose of fine frequency offset estimation, we propose to use (21) with $\tilde{y}_{k,n}$ replaced by $\hat{y}_{k,n}$ as given in (23). Moreover, since the initial estimate of the frequency offset ($\hat{\omega}_k$) is already available, (21) must be modified as follows: where Observe that the span of $\hat{y}_{k,i}$ is $L_2$. The fine frequency offset estimate ($\hat{\omega}_{k,f}$) is obtained by dividing the interval $[\hat{\omega}_k - 0.005, \hat{\omega}_k + 0.005]$ radian into $B_2$ frequency bins [48]. The reason for choosing 0.005 radian can be traced to Figure 5. We find that the maximum error in the coarse estimate of the frequency offset is approximately 0.004 radian over $10^4$ frames. Thus the probability that the maximum error exceeds 0.005 radian is less than $10^{-4}$. In Figure 5, the coarse frequency offset estimate is obtained from (24), the fine frequency offset estimate from (38), the coherent frequency offset estimate ("RMS coho") from (83) and the approximate Cramér-Rao bound from (94).
The two-stage approach (coarse and fine) for frequency offset estimation [48] is illustrated in Table I. The complexity of the two-stage approach is $B_1 + B_2 = 128$ frequency bins. The resolution of the two-stage approach is $2 \times 0.005/B_2 = 0.00015625$ radian. To obtain the same resolution, the single-stage approach requires $2 \times 0.04/0.00015625 = 512$ frequency bins. Therefore, the two-stage approach is four times more efficient than the single-stage approach.
At this point, a note on the implementation of the SoF and frequency offset estimation algorithm is in order. Observe that a 2-D search over both frequency and time is required and there is a large scope for parallel processing. Hence, this algorithm is well suited for hardware implementation.
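The bin-count arithmetic above can be checked with a short sketch. The values $B_2 = 64$ and $B_1 = 64$ are inferred here from the stated resolution and the stated total of 128 bins, since Table I is not reproduced in this text; the function name `bins_for_resolution` is hypothetical.

```python
def bins_for_resolution(half_width, resolution):
    """Number of equal-width bins needed to cover [-half_width, +half_width]."""
    return round(2 * half_width / resolution)


B_TOTAL = 128                 # B1 + B2, as stated in the text
COARSE_HALF_WIDTH = 0.04      # radian
FINE_HALF_WIDTH = 0.005       # radian
RESOLUTION = 0.00015625       # radian, as stated in the text

B2 = bins_for_resolution(FINE_HALF_WIDTH, RESOLUTION)       # inferred fine-stage bins
B1 = B_TOTAL - B2                                           # inferred coarse-stage bins
single_stage = bins_for_resolution(COARSE_HALF_WIDTH, RESOLUTION)
ratio = single_stage / (B1 + B2)                            # saving of the two-stage search
```

The saving comes from the fine stage only having to cover the uncertainty left by the coarse stage, not the whole offset range.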

D. Noise Variance Estimation
It is necessary to estimate the noise variance for the purpose of turbo decoding [7]. After the channel has been estimated using (30), the noise variance is estimated as follows: wheres 1 is defined in (28) and L 1 is defined in (13).

E. Turbo Decoding
The encoder block diagram is shown in Figure 7. The overall rate of the encoder is 1/2, since $L_{d1}$ data bits generate $2L_{d1}$ coded QPSK symbols. The generating matrix of each of the constituent encoders is given by: Let where $m_1$ is defined in (26). Define as the data part of the received signal for the $k$th frame. The receiver block diagram, after SoF detection, frequency offset compensation and channel estimation, is depicted in Figure 8.
The output of the FFT can be written as (for 0 ≤ i ≤ L d − 1): Note thatĤ k, i andW k, i in Figure 8 are the L d -point DFT of the estimated channelĥ k in (30) andw k, n in (7) respectively, taken over the time interval specified in (43), and S k, 3, i denotes the data symbols for the k th frame, for 0 ≤ i ≤ L d −1.
The variance ofW k, i is and the variance ofĤ k, i is (assuming perfect channel estimates, that isĤ k, i =H k, i ): Note that due to multiplication by the channel DFT (Ĥ k, i ) in (44), the data and parity bits of the QPSK symbol cannot be separated, and the BCJR algorithm is slightly different from the one given in [7]. This is explained below. Observe also that dividing (44) byH k, i results in interference (W k, i /Ĥ k, i ) having a complex ratio distribution [50], [51], which is undesirable.
Corresponding to the transition from state m to state n, at decoder 1, for the k th frame, at time i define (for 0 ≤ i ≤ L d1 − 1, L d1 is defined in Figure 7): where S m, n denotes the QPSK symbol corresponding to the transition from state m to state n in the trellis. We assume that the data bit maps to the real part and the parity bit maps to the imaginary part of the QPSK symbol. We also assume that bit 0 maps to +1 and bit 1 maps to −1. Observe thatσ 2 w is the estimate of σ 2 w obtained from (40). Similarly, for the transition from state m to state n, at decoder 2, for the k th frame, at time i define (for 0 ≤ i ≤ L d1 − 1): Let S denote the number of states in the encoder trellis. Let D n denote the set of states that diverge from state n. For example implies that states 0 and 3 can be reached from state 0.
Similarly, let C n denote the set of states that converge to state n. Let α i, n denote the alpha value at time Then the alpha values for decoder 1 can be recursively computed as follows (forward recursion): where denotes the a priori probability of the systematic bit corresponding to the transition from state m to state n, at decoder 1, at time i, obtained from the 2 nd decoder at time l, after deinterleaving (that is, i = π −1 (l) for some 0 ≤ l ≤ L d1 − 1). The terms F 2, i+ and F 2, i− are defined similar to (54) given below. The normalization step in the last equation of (50) is done to prevent numerical instabilities [7], [52]. Similarly, let β i, n denote the beta values at time i (1 ≤ i ≤ L d1 − 1) at state n (0 ≤ n ≤ S − 1). Then the recursion for beta (backward recursion) at decoder 1 can be written as: Once again, the normalization step in the last equation of (52) is done to prevent numerical instabilities. Let ρ + (n) denote the state that is reached from state n when the input symbol is +1. Similarly let ρ − (n) denote the state that can be reached from state n when the input symbol is −1. Then (for 0 ≤ i ≤ L d1 − 1) Finally, the extrinsic information that is to be fed as a priori probabilities to the second decoder after interleaving, is computed as: Equations (50), (52), (53) and (54) constitute the MAP recursions for the first decoder. The MAP recursions for the second decoder are similar. After a few iterations, (one iteration involves both decoder 1 and 2) the final a posteriori probabilities of i th bit of the k th frame at the output of decoder 1 is given by: followed by When puncturing is used to increase the overall rate, e.g. if the QPSK symbol occurring at odd instants of time in both encoders are not transmitted, then the corresponding gamma values in (47) and (48) are set to unity. For the even time instants, the corresponding gamma values are computed according to (47) and (48).

F. Robust Turbo Decoding
At high SNR, the exponent in (47) and (48) ($b$ is the exponent of $e^b$) becomes very large in magnitude (typically $|b| > 100$) and it becomes infeasible for the DSP processor or even a computer to calculate the gammas. We propose to solve this problem by normalizing the exponents. Observe that the exponents are real-valued and negative. Let $b_{1,j,i}$ denote an exponent at decoder 1 due to the $j$th symbol in the constellation ($1 \le j \le 4$ for QPSK) at time $i$. Let $\mathbf{b}_1$ denote the matrix of exponents for decoder 1, with entries $b_{1,j,i}$. Let $b_{1,\max,i}$ denote the maximum exponent at time $i$, that is
$$b_{1,\max,i} = \max_j\, b_{1,j,i}$$
and let $\mathbf{b}_{1,\max}$ denote the vector containing the maximum exponents. Compute:
$$\mathbf{b}'_1 = \mathbf{b}_1 - \mathbf{b}_{1,\max}. \tag{60}$$
Note that in (60), the vector $\mathbf{b}_{1,\max}$ has to be repeated as many times as the number of symbols in the constellation. If any element of $\mathbf{b}'_1$ is less than, say, $-30$, then set it to $-30$. Thus we get a normalized exponent matrix $\mathbf{b}_{1,\mathrm{norm}}$, whose elements lie in the range $[-30, 0]$. It has been found from simulations that normalizing the exponents does not lead to any degradation in BER performance; on the contrary, it increases the operating SNR range of the turbo receiver. In practice, we could divide the range $[-30, 0]$ into a large number (e.g. 3000) of levels, and the exponentials ($e^b$) could be precomputed and stored in the DSP processor, so that they need not be computed in real time. The choice of the minimum exponent (e.g. $-30$) would depend on the precision of the DSP processor or the computer.
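The normalization and table lookup described above can be sketched as follows. The floor of $-30$ and the 3000 levels are the example values from the text; the function names (`normalise_exponents`, `table_exp`) and the test exponents are hypothetical. Subtracting a per-time maximum scales all gammas at that time by a common factor, which cancels in the normalized alpha and beta recursions.

```python
import math

FLOOR = -30.0                      # minimum normalized exponent (example value from the text)
LEVELS = 3000                      # number of quantization levels (example value from the text)
STEP = -FLOOR / LEVELS             # spacing between table entries
EXP_TABLE = [math.exp(-k * STEP) for k in range(LEVELS + 1)]   # precomputed e^b for b in [FLOOR, 0]


def normalise_exponents(b):
    """b[j][i] is the exponent for constellation symbol j at time i (all <= 0).

    Subtract the per-time maximum over symbols, then clip at FLOOR.
    """
    n_sym, n_time = len(b), len(b[0])
    b_max = [max(b[j][i] for j in range(n_sym)) for i in range(n_time)]
    return [[max(b[j][i] - b_max[i], FLOOR) for i in range(n_time)]
            for j in range(n_sym)]


def table_exp(x):
    """Look up e^x for x in [FLOOR, 0] from the precomputed table (nearest level)."""
    return EXP_TABLE[round(-x / STEP)]


# Usage: exponents this negative underflow to zero if exponentiated directly.
b = [[-5000.0, -3.0], [-4980.0, -1000.0], [-5030.0, -2.5], [-4981.0, -40.0]]
norm = normalise_exponents(b)
gammas = [[table_exp(x) for x in row] for row in norm]
```

With direct evaluation, every gamma in the first column of this example would be zero in double precision; after normalization the most likely symbol has gamma equal to one and the others keep their relative weights down to the floor.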

G. Data Interleaving
Assuming ideal channel estimates, the autocorrelation of the channel DFT at the receiver is: It has been found from simulations that the performance of the turbo decoder gets adversely affected due to the correlation iñ H k, i . To overcome this problem, we interleave the data before the IFFT operation at the transmitter and deinterleave the data after the FFT operation at the receiver, before turbo decoding. This process essentially removes any correlation inH k, i [38].
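The interleave/deinterleave pair can be sketched as below. The text does not specify the interleaver, so a seeded pseudo-random permutation known to both transmitter and receiver is assumed here; the function names are hypothetical.

```python
import random


def make_interleaver(n, seed=0):
    """Return a pseudo-random permutation of range(n), reproducible from the seed."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def interleave(x, perm):
    """Transmitter side: output position k carries input symbol perm[k]."""
    return [x[p] for p in perm]


def deinterleave(y, perm):
    """Receiver side: put each symbol back at its original position."""
    x = [None] * len(y)
    for k, p in enumerate(perm):
        x[p] = y[k]
    return x


# Usage: symbols that are adjacent before interleaving land on scattered subcarriers.
perm = make_interleaver(16, seed=42)
symbols = list(range(16))                 # stand-in for data symbols before the IFFT
scattered = interleave(symbols, perm)
restored = deinterleave(scattered, perm)
```

Because neighbouring coded symbols are mapped to widely separated subcarriers, the channel gains seen by consecutive symbols at the decoder input are far less correlated than the gains of adjacent subcarriers.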

H. Enhanced Frame Structure
The accuracy of the frequency offset estimate depends on the length of the preamble $L_p$. Increasing the number of frequency bins $B_1$ and $B_2$ in Figure 6, for a given $L_p$, does not improve the accuracy. From Figure 5 it can be seen that the RMS value of the fine frequency offset estimation error is about $2 \times 10^{-4}$ radian, at an SNR per bit equal to 8 dB. The subcarrier spacing with data length $L_d = 4096$ is equal to $2\pi/4096 = 1.534 \times 10^{-3}$ radians. Therefore, the residual frequency offset is $0.0002 \times 100/0.001534 = 13\%$ of the subcarrier spacing, which is quite high and causes severe intercarrier interference (ICI). Note that the RMS frequency offset estimation error can be reduced by increasing the preamble length ($L_p$), keeping the data length ($L_d$) fixed, which in turn reduces the throughput given by:
$$T = \frac{L_{d1}}{L_p + L_{cp} + L_d}.$$
Note that for a rate-1/2 turbo code $L_d = 2L_{d1}$, whereas for a rate-1 turbo code, $L_d = L_{d1}$. This motivates us to look for an alternate frame structure which not only solves the frequency offset estimation problem, but also maintains the throughput at a reasonable value.
Consider the frame in Figure 9(a). In addition to the preamble, prefix and data, it contains "buffer" (dummy) symbols of length $B$ and a postamble of length $L_o$, all drawn from the QPSK constellation. In Figure 9(b) we illustrate the processing of $L_d$ symbols at the transmitter. Observe that only the data and postamble symbols are interleaved before the IFFT operation. After interleaving, the postamble gets randomly dispersed between the data symbols. The buffer symbols are sent directly to the IFFT, without interleaving. The preamble and the cyclic prefix continue to be processed according to Figure 1 and (5). We now explain the reason behind using this frame structure. In what follows, we assume that the SoF has been detected, fine frequency offset correction has been performed and the channel has been estimated.
We proceed by making the following observations:
1) Modulation in the time domain results in a shift in the frequency domain. Therefore, any residual frequency offset after fine frequency offset correction results in a frequency shift at the output of the FFT operation at the receiver. Moreover, due to the presence of a cyclic prefix, the frequency shift is circular. Therefore, without the buffer symbols, there is a possibility that the first data symbol would be circularly shifted to the last data symbol or vice versa. This explains the use of buffer symbols at both ends in Figure 9. In order to compute the number of buffer symbols ($B$), we have to know the maximum residual frequency offset after fine frequency offset correction. Referring to Figure 5, we find that the maximum error in fine frequency offset estimation at 0 dB SNR per bit is about $\pm 2 \times 10^{-3}$ radians. With $L_d = 4096$, the subcarrier spacing is $2\pi/4096 = 1.534 \times 10^{-3}$ radians. Hence, the residual frequency error would result in a shift of $\pm 2/1.534 = \pm 1.3$ subcarrier spacings. Therefore, while $B = 2$ would suffice, we have taken $B = 4$, to be on the safe side.
2) Since the frequency shift is not an integer multiple of the subcarrier spacing, we need to interpolate in between the subcarriers to accurately estimate the shift. Interpolation can be achieved by zero-padding the data before the FFT operation. Thus we get a $2L_d$-point FFT corresponding to an interpolation factor of 2, and so on. Other methods of interpolation between subcarriers are discussed in [53].
3) After the FFT operation, postamble matched filtering has to be done, since the postamble and $\hat{H}_k \approx \tilde{H}_k$ (in (44)) are available. The procedure for constructing the postamble matched filter is illustrated in Figure 10.
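The subcarrier-spacing arithmetic used above to size the buffer can be reproduced with a short sketch. The function names are hypothetical; the numerical inputs ($L_d = 4096$, an RMS error of $2 \times 10^{-4}$ radian and a maximum error of $2 \times 10^{-3}$ radian) are the values quoted in the text.

```python
import math


def subcarrier_spacing(L_d):
    """Spacing between adjacent subcarriers, in radian, for an L_d-point FFT."""
    return 2.0 * math.pi / L_d


def offset_in_subcarriers(offset_rad, L_d):
    """Residual frequency offset expressed as a multiple of the subcarrier spacing."""
    return offset_rad / subcarrier_spacing(L_d)


def min_buffer_symbols(max_offset_rad, L_d):
    """Smallest whole number of subcarriers that covers the worst-case circular shift."""
    return math.ceil(offset_in_subcarriers(max_offset_rad, L_d))


L_d = 4096
rms_fraction = offset_in_subcarriers(2e-4, L_d)   # about 0.13 of the spacing (RMS error at 8 dB)
worst_shift = offset_in_subcarriers(2e-3, L_d)    # about 1.3 spacings (maximum error at 0 dB)
B_min = min_buffer_symbols(2e-3, L_d)             # the text doubles this to B = 4 for margin
```

Since the spacing shrinks as $L_d$ grows, the same residual offset in radians spans more subcarriers for longer data blocks, which is why the buffer has to be sized for the largest $L_d$ in use.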
The throughput comparison of various frame structures is summarized in Table II.
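For the basic frame of Figure 1, the throughput can be sketched as the ratio of data bits to the total frame length in symbols. This form, and the cyclic prefix length $L_{cp} = L_{hr} - 1$, are assumptions made here; they reproduce the 32.95% and 39.72% figures quoted in Section IV for $L_p = 512$ and $L_{hr} = 19$. The enhanced frame of Figure 9 is not covered, since the postamble length is not given in this text. The function name `throughput` is hypothetical.

```python
def throughput(L_d1, L_p, L_hr, code_rate):
    """Data bits per transmitted symbol for the preamble + cyclic prefix + data frame.

    Assumes L_cp = L_hr - 1 and L_d = L_d1 / code_rate coded QPSK symbols.
    """
    L_d = round(L_d1 / code_rate)     # number of QPSK symbols carrying the data
    L_cp = L_hr - 1                   # assumed cyclic prefix length
    return L_d1 / (L_p + L_cp + L_d)


# Usage with the simulation parameters quoted in Section IV (rate-1/2 code).
t_512 = throughput(512, L_p=512, L_hr=19, code_rate=0.5)
t_1024 = throughput(1024, L_p=512, L_hr=19, code_rate=0.5)
```

The preamble and cyclic prefix are a fixed overhead per frame, so the throughput rises with $L_{d1}$ and approaches the code rate for very long data blocks.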

I. Receiver Diversity
In the presence of receiver diversity, the signal in each diversity arm ($l$) can be expressed as (see (7)):
$$\tilde{r}_{k,n,l} = \left(\tilde{s}_{k,n} \star \tilde{h}_{k,n,l}\right) e^{\,j(\omega_k n + \theta_{k,l})} + \tilde{w}_{k,n,l} = \tilde{y}_{k,n,l}\, e^{\,j(\omega_k n + \theta_{k,l})} + \tilde{w}_{k,n,l}$$
for $1 \le l \le N$. The frequency offset is assumed to be identical for all the diversity arms, whereas the carrier phase and noise are assumed to be independent. The noise variance is the same for all the diversity arms. Two extreme scenarios are considered in the simulations: (a) identical channel and (b) independent channel in each diversity arm. The output of the FFT can be written as (for $0 \le i \le L_d - 1$): In the turbo decoding operation (for decoder 1, $N$ diversity arms, rate-1/2 turbo code, the enhanced frame structure in Figure 9 and $0 \le i \le L_{d2}/2 - 1$), we have from (47): where $\gamma_{1,k,i,m,n,l}$ = exp where $\hat{\sigma}_w^2$ is the average estimate of the noise variance over all the diversity arms. Similarly at decoder 2, for $0 \le i \le L_{d2}/2 - 1$, we have from (48): where $\gamma_{2,k,i,m,n,l}$ = exp where For a rate-1 turbo code, alternate gammas have to be set to unity, as explained in the last paragraph of Section III-E.

J. The Channel Capacity
The communication system model under consideration is given by (65). The channel capacity is given by [54]: per dimension (real-valued signals occupy a single dimension, complex-valued signals occupy two dimensions). The "SNR" in (71) denotes the minimum average signal-to-noise ratio per dimension, for error-free transmission. Observe that: 1) The sphere packing derivation of the channel capacity formula [54], does not require noise to be Gaussian. The only requirements are that the noise samples have to be independent, the signal and noise have to be independent, and both the signal and noise must have zero mean.
2) The channel capacity depends only on the SNR.
3) The average SNR per dimension in (71) is different from the average SNR per bit (or E b /N 0 ), which is widely used in the literature. In fact, it can be shown that [7], [54]: 4) It is customary to define the average SNR per bit (E b /N 0 ) over two dimensions (complex signals). When the signal and noise statistics over both dimensions are identical, the average SNR per bit over two dimensions is identical to the average SNR per bit over one dimension. Therefore (72) is valid, even though the SNR is defined over one dimension and the SNR per bit is defined over two dimensions. 5) The notation E b /N 0 is usually used for continuous-time, passband analog signals [54]- [56], whereas SNR per bit is used for discrete-time signals [7]. However, both definitions are equivalent. Note that passband signals are capable of carrying information over two dimensions, using sine and cosine carriers, inspite of the fact that passband signals are real-valued. 6) Each dimension corresponds to a separate and independent path between the transmitter and receiver. 7) The channel capacity is additive with respect to the number of dimensions. Thus, the total capacity over 2N real dimensions is equal to the sum of the capacity over each real dimension. 8) Each S k, 3, i in (65) corresponds to one transmission (over two dimensions, since S k, 3, i is complex-valued). 9) Transmission of L d2 data bits in Figure 9 (for a rate-1 turbo code), results in N L d2 complex samples (2N L d2 real-valued samples) ofR k, i, l in (65), for N th -order receive diversity. Therefore, the channel capacity is per dimension. In other words, (73) implies that in each transmission, one data bit is transmitted over 2N dimensions. Similarly, for a rate-1/2 turbo code with N th -order receive diversity, transmission of L d2 /2 data bits results in N L d2 complex samples ofR k, i, l in (65), and the channel capacity becomes: per dimension. 
Substituting (73) and (74) in (71), and using (72), we get the minimum (threshold) average SNR per bit required for error-free transmission, for a given channel capacity. The minimum SNR per bit for various code rates and receiver diversity is presented in Table III. Note that [54] the minimum $E_b/N_0$ for error-free transmission is −1.6 dB only when $C \to 0$.
10) In the case of fading channels, it may not be possible to achieve the minimum possible SNR per bit. This is because the SNR per bit of a given frame may be less than the threshold average SNR per bit. Such frames are said to be in outage. The frame SNR per bit can be defined as (for the $k$th frame and the $l$th diversity arm):
where $\langle \cdot \rangle$ denotes time average over the $L_{d2}$ data symbols. Note that the frame SNR is different from the average SNR per bit, which is defined as:
The $k$th OFDM frame is said to be in outage when
$$\mathrm{SNR}_{k,l,\mathrm{bit}} < \text{minimum average SNR per bit} \tag{77}$$
for all $l$. The outage probability is given by
$$P_{\mathrm{out}} = \frac{\text{number of frames in outage}}{\text{total number of frames transmitted}}. \tag{78}$$
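As a rough numerical check of the threshold values discussed above, the following Python sketch evaluates the minimum SNR per bit for the two code rates and for 1st- and 2nd-order receive diversity. It assumes the capacity formula $C = \frac{1}{2}\log_2(1+\mathrm{SNR})$ per dimension and the standard relation SNR = 2C × (SNR per bit); the function names and the generalization of the capacity to (code rate)/(2N) are illustrative assumptions and are not taken from the original work.

```python
import math


def capacity_per_dim(code_rate: float, diversity_order: int) -> float:
    """Capacity in bits per transmission per dimension.

    Assumption: rate-1 gives 1/(2N) and rate-1/2 gives 1/(4N),
    written here as code_rate / (2N) for N-th order receive diversity.
    """
    return code_rate / (2.0 * diversity_order)


def min_snr_per_bit_db(capacity: float) -> float:
    """Threshold average SNR per bit (dB) for error-free transmission.

    Assumes C = 0.5*log2(1 + SNR) and SNR = 2*C*(SNR per bit).
    """
    snr = 2.0 ** (2.0 * capacity) - 1.0      # invert the capacity formula
    snr_per_bit = snr / (2.0 * capacity)     # convert SNR to SNR per bit
    return 10.0 * math.log10(snr_per_bit)


if __name__ == "__main__":
    for rate in (1.0, 0.5):
        for n in (1, 2):
            c = capacity_per_dim(rate, n)
            print(f"rate={rate}, N={n}, C={c:.4f}, "
                  f"min SNR per bit = {min_snr_per_bit_db(c):.3f} dB")
```

Under these assumptions, the rate-1 code with 1st-order diversity ($C = 1/2$) has a threshold of exactly 0 dB, and the threshold approaches $10\log_{10}(\ln 2) \approx -1.59$ dB as $C \to 0$.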

IV. SIMULATION RESULTS
In this section, we present the simulation results for turbo-coded OFDM. In the simulations, the channel length $L_h$ is equal to 10, hence $L_{hr} = 19$. The fade variance is $\sigma_f^2 = 0.5$. The simulation results are presented in Figure 11, for the frame structure in Figure 1(a) with $L_p = 512$ and different values of $L_d$. The term "UC" denotes uncoded, "TC" denotes turbo coded, "data" denotes $L_{d1}$, "Pr" denotes practical receiver (with acquired synchronization and channel estimates) and "Id" denotes ideal receiver (ideal synchronization and channel estimates).
Fig. 11. Simulation results without data interleaving, frame structure in Figure 1(a), rate-1/2 turbo code. © 2013 IEEE. Reprinted, with permission, from [4].
We find that for $L_{d1} = 512$, the practical receiver has a performance that is less than 1 dB inferior to the ideal receiver. However, the throughput of this system is just 32.95%, since the data length is equal to the preamble length. Next, for $L_{d1} = 1024$, the practical receiver is about 1 dB inferior to the ideal receiver and the throughput has improved to 39.72%. When $L_{d1} = 4096$, the performance of the practical receiver is no better than uncoded transmission. This is due to the fact that the residual RMS frequency offset estimation error (fine) in Figure 5 is about $2 \times 10^{-4}$ radian, which is a significant fraction of the subcarrier spacing ($2\pi/L_d = 0.000767$ radian). Note that the frequency offset estimation error depends only on $L_p$, and the performance of the ideal receiver is independent of the data length $L_{d1}$.
Fig. 12. Simulation results with data interleaving, frame structure in Figure 1(a), rate-1/2 turbo code. © 2013 IEEE. Reprinted, with permission, from [4].
In Figure 12, we present the simulation results with data interleaving, as discussed in Section III-G. Again, the performance of the ideal receiver is independent of $L_{d1}$. We see that the practical receiver exhibits more than two orders of magnitude improvement in the BER (compared to the case where there is no data interleaving), at an SNR of 8 dB and $L_{d1} = 512$. When $L_{d1}$ is increased, the performance of the practical receiver deteriorates.
Fig. 13. Simulation results with data interleaving, enhanced frame structure in Figure 9(a) and rate-1 turbo code.
In Figure 13, we present simulation results for the rate-1 turbo code, with enhanced frame structure, 1st-order receiver diversity and interpolation factors (ip) equal to 2, 4, 8, 16 and 32. We find that the performance of the practical receiver is as good as the ideal receiver. However, there is a 4 dB degradation in performance of the ideal receiver for the rate-1 turbo code, with respect to the ideal receiver for the rate-1/2 turbo code in Figure 12, at a BER of $10^{-5}$. This degradation in performance can be compensated by using receiver diversity, which is presented next.
Fig. 14. Simulation results with data interleaving, enhanced frame structure in Figure 9(a) and rate-1 turbo code with 2nd-order receive diversity. Identical channel on both diversity arms.
In Figure 14, we present simulation results for the rate-1 turbo code, with enhanced frame structure and 2nd-order receiver diversity. The channel in both diversity arms is assumed to be identical. However, noise in both the diversity arms is assumed to be independent. Comparing Figure 13 and Figure 14, we find that the ideal receiver with 2nd-order diversity is just 2 dB better than the one with 1st-order diversity, at a BER of $10^{-5}$. Moreover, the practical receivers with ip = 32 have nearly identical performance. This is to be expected, since it is well known that diversity advantage is obtained only when the channels are independent.
Fig. 15. Simulation results with data interleaving, enhanced frame structure in Figure 9(a) and rate-1 turbo code with 2nd-order receive diversity. Independent channel on both diversity arms.
In Figure 15, we present simulation results for the rate-1 turbo code, with enhanced frame structure and 2nd-order receiver diversity. The channel and noise in both diversity arms are assumed to be independent. Comparing Figure 13 and Figure 15, we find that the ideal receiver with 2nd-order diversity exhibits about 5 dB improvement over the one with 1st-order diversity, at a BER of $10^{-5}$. Moreover, the practical receiver with ip = 16, 32 is just 1 dB inferior to the ideal receiver, at a BER of $10^{-5}$.
Fig. 16. Simulation results for outage probability with data interleaving, enhanced frame structure in Figure 9(a) and rate-1 turbo code with 1st- and 2nd-order receive diversity. Independent channel on both diversity arms.
Finally, in Figure 16 we present the outage probability for the rate-1 turbo code with 1st- and 2nd-order receive diversity. The outage probability for 1st-order receive diversity, at 6 dB SNR per bit, is $3 \times 10^{-4}$. In other words, 3 out of $10^4$ frames are in outage (no error correcting code can correct errors in such frames). Therefore, in the worst case, the number of bit errors for the frames in outage would be $0.5 \times 3 \times 3832$ (assuming the probability of error is 0.5). Let us also assume that for the remaining $10000 - 3 = 9997$ frames, all errors are corrected, using a sufficiently powerful error correcting code. Therefore, in the best case situation, the overall BER at 6 dB SNR per bit, with 1st-order diversity, would be $0.5 \times 3 \times 3832/(10000 \times 3832) = 1.5 \times 10^{-4}$. However, from Figure 13, even the ideal coherent receiver exhibits a BER as high as $10^{-2}$ at 6 dB SNR per bit. Therefore, there is large scope for improvement, using perhaps a more powerful error correcting code.
Similarly, we observe from Figure 16 that, with 2nd-order receive diversity, the outage probability is $10^{-4}$ at 3 dB SNR per bit. This implies that 1 out of $10^4$ frames is in outage. Using similar arguments, the best case overall BER at 3 dB SNR per bit would be $0.5 \times 3832/(10000 \times 3832) = 0.5 \times 10^{-4}$. From Figure 15, the ideal coherent receiver gives a BER of $2 \times 10^{-2}$ at 3 dB SNR per bit, once again suggesting that there is large scope for improvement, using a better code.
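The best-case BER argument used for both diversity orders can be written as a short calculation. The Python sketch below is only an illustration of that arithmetic; the function names are hypothetical, and it adopts the same assumptions as the text (frames in outage have an error probability of 0.5, all other frames are decoded without error, and each frame carries 3832 data bits).

```python
def outage_probability(frames_in_outage: int, total_frames: int) -> float:
    """Fraction of transmitted frames that are in outage."""
    return frames_in_outage / total_frames


def best_case_ber(frames_in_outage: int,
                  total_frames: int,
                  bits_per_frame: int = 3832,
                  p_err_outage: float = 0.5) -> float:
    """Best-case overall BER when only the frames in outage contain errors.

    Assumption: each bit of a frame in outage is wrong with probability
    p_err_outage, and every other frame is corrected perfectly.
    """
    bit_errors = p_err_outage * frames_in_outage * bits_per_frame
    total_bits = total_frames * bits_per_frame
    return bit_errors / total_bits


if __name__ == "__main__":
    # 1st-order diversity at 6 dB SNR per bit: 3 frames in outage out of 10^4
    print(outage_probability(3, 10_000), best_case_ber(3, 10_000))
    # 2nd-order diversity at 3 dB SNR per bit: 1 frame in outage out of 10^4
    print(outage_probability(1, 10_000), best_case_ber(1, 10_000))
```

Since the number of bits per frame cancels, the best-case BER under these assumptions is simply one half of the outage probability.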

V. CONCLUSIONS AND FUTURE WORK
This paper deals with linear complexity coherent detectors for turbo-coded OFDM signals transmitted over frequency selective Rayleigh fading channels. Simulation results show that it is possible to achieve a BER of $10^{-5}$ at an SNR per bit of 8 dB and throughput equal to 82.84%, using a single transmit and two receive antennas.
With the rapid advances in VLSI technology, it is expected that coherent transceivers would drive the future wireless telecommunication systems.
It may be possible to further improve the performance, using a better code.

A. An Approximate and Simple Cramér-Rao Bound on the Variance of the Frequency Offset Estimation Error
Consider the signal model in (7), which is repeated here for convenience (for notational simplicity, we drop the subscript $k$, assume $\theta_k = 0$ and $N - 1 = L_p - L_h + 1$):
$$r_n = \tilde{y}_n e^{j\omega n} + w_n \quad \text{for } 0 \le n \le N - 1. \tag{79}$$
We assume that the channel is known, and hence $\tilde{y}_n$ is known at the receiver. Moreover, we consider only the steady-state preamble part of the received signal (note that time is suitably re-indexed, such that the first steady-state sample is considered as time zero, whereas the first steady-state sample actually occurs at time $L_h - 1$). Define
The coherent maximum likelihood (ML) estimate of the frequency offset is obtained as follows: choose that value of $\omega$ which maximizes the joint conditional pdf
where $\omega_{\max}$ denotes the maximum possible frequency offset in radians. Substituting for the joint conditional pdf in (81), we obtain
Observe that (38) is the non-coherent ML frequency offset (and timing) estimator, whereas (83) is the coherent ML frequency offset estimator assuming timing is known. Since ML estimators are unbiased, the variance of the frequency offset estimate is lower bounded by the Cramér-Rao bound (CRB):
since $\tilde{y}$ is assumed to be known. It can be shown that
Substituting (85) in (84) and assuming independent noise (the real and imaginary parts of noise are also assumed independent), we obtain:
and hence
when $\tilde{y}_n$ is known. When $\tilde{y}_n$ is a random variable, which is true in our case, then the right hand side of (87) needs to be further averaged over $\tilde{y}$ [57], [58]. In other words, we need to compute
Therefore
where we have assumed
1) $h_n$ and $s_n$ to be independent
2) $s_n$ (the preamble) varies randomly from frame to frame and is not a constant.
Hence (91) can be rewritten as:
where $\sigma_f^2$ is defined in (3), $\sigma_s^2$ is defined in (6) and $\delta_K(\cdot)$ is the Kronecker delta function. With these developments (88) becomes