1 Introduction

Future wireless communication standards aim to push the existing data-rates higher. This can only be achieved with the help of coherent communications, since they give the lowest bit-error-rate (BER) performance for a given signal-to-noise ratio (SNR). Conversely, they require the lowest SNR to attain a given BER, resulting in enhanced battery life. A mobile typically indicates a received signal strength of about \(-100\) dBm (\(10^{-10}\) mW). However, this is not the signal-to-noise ratio! Therefore, the question is: What is the operating SNR of the mobiles? Would it be possible to achieve the same performance by transmitting at a lower power? The recent advances in cooperative communications have resulted in low complexity solutions that are not necessarily power efficient [1, 2]. In fact, it is worth quoting the following from [3]:

  1. The Myth: Sixty years of research following Shannon’s pioneering paper has led to telecommunications solutions operating arbitrarily close to the channel capacity: “flawless telepresence” with zero error is available to anyone, anywhere, anytime across the globe.

  2. The Reality: Once we leave home or the office, even top of the range iPhones and tablet computers fail to maintain “flawless telepresence” quality. They also fail to approach the theoretical performance predictions. The 1,000-fold throughput increase of the best third-generation (3G) phones over second-generation (2G) GSM phones and the 1,000-fold increased teletraffic predictions of the next decade require substantial further bandwidth expansion toward ever increasing carrier frequencies, expanding beyond the radio-frequency (RF) band to optical frequencies, where substantial bandwidths are available.

The transmitter and receiver algorithms proposed in this paper and in [4, 5] are well suited for implementation on a DSP processor or in hardwired form, and may perhaps not require quantum computers, as mentioned in [3]. The reader is also referred to the brief commentary on channel estimation and synchronization on page 1351 and to the noncoherent schemes on page 1353 of [1], which clearly state that cooperative communications avoid coherent receivers due to complexity.

Broadly speaking, the wireless communication device needs to have the following features:

  1. maximize the bit-rate

  2. minimize the bit-error-rate

  3. minimize transmit power

  4. minimize transmission bandwidth

A rather disturbing trend in the present day wireless communication systems is to make the physical layer very simple and implement it in hardware, and allot most of the computing resources to the application layer, e.g., for internet surfing, video conferencing etc. Hardware implementation of the physical layer is not an issue in itself; in fact, it may even be preferred over software implementation in some situations. The real cause for concern is the tendency to make it “simple”, at the cost of BER performance. Therefore, the questions are:

  1. Was signal processing for coherent communications given a chance to prove itself, or was it ignored straightaway, due to “complexity” reasons?

  2. Are the present day single antenna wireless transceivers, let alone multi-antenna systems, performing anywhere near channel capacity?

This paper demonstrates that coherent receivers need not be restricted to textbooks alone; in fact, they can be implemented with linear (not exponential) complexity. The need of the hour is a paradigm shift in the way wireless communication systems are implemented.

In this article, we dwell on coherent receivers based on orthogonal frequency division multiplexing (OFDM), since it has the ability to mitigate intersymbol interference (ISI) introduced by the frequency selective fading channel [6–8]. The “complexity” of coherent detection can be overcome by means of parallel processing, for which there is a large scope. We wish to emphasize that this article presents a proof-of-concept, and is hence not constrained by the existing standards in wireless communication. We begin by first outlining the tasks of a coherent receiver. Next, we scan the literature on each of these tasks to find out the state-of-the-art, and finally end this section with our contributions.

The basic tasks of the coherent receiver would be:

  1. To correctly identify the start of the (OFDM) frame (SoF), such that the probability of false alarm (detecting an OFDM frame when it is not present) or equivalently the probability of erasure/miss (not detecting the OFDM frame when it is present) is minimized. We refer to this step as timing synchronization.

  2. To estimate and compensate the carrier frequency offset (CFO), since OFDM is known to be sensitive to CFO. This task is referred to as carrier synchronization.

  3. To estimate the channel impulse/frequency response.

  4. To perform (coherent) turbo decoding and recover the data.

To summarize, a coherent receiver at the physical layer ensures that the medium access control (MAC) is not burdened by frequent requests for retransmissions.

A robust timing and frequency synchronization for OFDM signals transmitted through frequency selective additive white Gaussian noise (AWGN) channels is presented in [9]. Timing synchronization in OFDM is addressed in [10–14]. Various methods of carrier frequency synchronization for OFDM are given in [15–21]. Joint timing and CFO estimation is discussed in [22–27].

Decision directed coherent detection of OFDM in the presence of Rayleigh fading is treated in [28]. A factor graph approach to the iterative (coherent) detection of OFDM in the presence of CFO and phase noise is presented in [29]. OFDM detection in the presence of intercarrier interference (ICI) using block whitening is discussed in [30]. In [31], a turbo receiver is proposed for detecting OFDM signals in the presence of ICI and inter antenna interference.

Most flavors of the channel estimation techniques discussed in the literature operate in the frequency domain, using pilot symbols at regular intervals in the time/frequency grid [32–36]. Iterative joint channel estimation and multi-user detection for multi-antenna OFDM is discussed in [37]. Noncoherent detection of coded OFDM, in the absence of frequency offset and assuming the channel frequency response to be constant over a block of symbols, is considered in [38]. Expectation maximization (EM)-based joint channel estimation and exploitation of the diversity gain from IQ imbalances is addressed in [39].

Detection of OFDM signals, in the context of spectrum sensing for cognitive radio, is considered in [40, 41]. However, in both these papers, the probability of false alarm is quite high (5 %).

In [42], discrete cosine transform (DCT) based OFDM is studied in the presence of frequency offset and noise, and its performance is compared with the discrete Fourier transform (DFT) based OFDM. It is further shown in [42] that the performance of DFT-OFDM is as good as DCT-OFDM, for small frequency offsets.

A low-power OFDM implementation for wireless local area networks (WLANs) is addressed in [43]. OFDM is a suggested modulation technique for digital video broadcasting [44, 45]. It has also been proposed for optical communications [46].

The novelty of this work lies in the use of a filter that is matched to the preamble, to acquire timing synchronization [47, 48] (start-of-frame (SoF) detection). Maximum likelihood (ML) channel estimation using the preamble is performed. This approach does not require any knowledge of the channel and noise statistics.
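
The idea of preamble matched filtering for SoF detection can be illustrated with a minimal Python sketch. It is not the receiver of [4, 5]: the QPSK preamble, its length, the frame offset and the noise level are all hypothetical, and the channel is assumed to be ideal (no fading and no frequency offset).

```python
import numpy as np

def detect_sof(received, preamble):
    """Return the lag of the largest matched filter output and the output itself."""
    # np.correlate conjugates its second argument, so this correlates the
    # received samples against the known preamble (a matched filter).
    mf_out = np.correlate(received, preamble, mode="valid")
    return int(np.argmax(np.abs(mf_out))), mf_out

rng = np.random.default_rng(0)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
preamble = rng.choice(qpsk, size=64)          # hypothetical known preamble
true_start = 37                               # hypothetical start of frame
received = np.zeros(200, dtype=complex)
received[true_start:true_start + preamble.size] = preamble
received += 0.1 * (rng.standard_normal(200) + 1j * rng.standard_normal(200))

sof, mf_out = detect_sof(received, preamble)
print("estimated SoF:", sof, "peak magnitude:", abs(mf_out[sof]))
```

The lag with the largest output magnitude is taken as the SoF estimate. A practical detector would additionally compare the peak against a threshold in order to control the probability of false alarm.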

The main contributions of this paper are the following:

  1. It is shown that, for a sufficiently long preamble, the variance of the channel estimator proposed in eq. (28) of [4] approaches zero.

  2. A known postamble is used to accurately estimate the residual frequency offset for large data lengths, thereby increasing the throughput compared to [4, 5].

  3. Turbo codes are used to attain BER performance closer to channel capacity than any earlier work in the open literature, for channels having a uniform power delay profile (to the best of the authors' knowledge, there is no similar work on the topic of this paper, other than [4, 5]).

  4. A robust turbo decoder is proposed, which performs effectively over a wide range of SNR (0–30 dB).

In a multiuser scenario, the suggested technique is OFDM-TDMA. The uplink and downlink may be implemented using time division duplex (TDD) or frequency division duplex (FDD) modes.

This paper is organized as follows. Section 2 describes the system model. The enhanced frame structure is described in Sect. 3. The modifications in the turbo decoder in the presence of receive diversity and the variance of the channel estimation error, are presented in Sect. 4. The channel capacity is discussed in Sect. 5. The BER results from computer simulations are given in Sect. 6. Finally, in Sect. 7, we discuss the conclusions and future work.

2 System Model

We assume 1st-order transmit diversity and \(N\)th-order receive diversity. The data is organized into frames, as depicted in Fig. 1. The earlier frame structure considered in [4] is given in Fig. 2.

Fig. 1 (a) Enhanced frame structure. (b) Processing of the data part at the transmitter

Fig. 2 (a) The frame structure in [4]. (b) System model. \(k\) denotes the frame index and \(n\) denotes the time index in a given frame. © 2013 IEEE. Reprinted, with permission, from [4]

The received signal in each diversity arm (\(l\)) can be expressed as (see also eq. (5) in [4]):

$$\begin{aligned} \tilde{r}_{k,\, n,\, l}&= \left( \tilde{s}_{k,\, n} \star \tilde{h}_{k,\, n,\, l} \right) \, \mathrm {e}^{\,\mathrm {j} (\omega _k n+\theta _{k,\, l})} + \tilde{w}_{k,\, n,\, l} \nonumber \\&= \tilde{y}_{k,\, n,\, l} \mathrm {e}^{\,\mathrm {j} (\omega _k n+\theta _{k,\, l})} + \tilde{w}_{k,\, n,\, l} \end{aligned}$$
(1)

for \(1 \le l \le N\). The frequency offset is assumed to be identical for all the diversity arms, whereas the carrier phase and noise are assumed to be independent. The noise variance is the same for all the diversity arms. Two extreme scenarios are considered in the simulations: (a) an identical channel and (b) an independent channel in each diversity arm. The channel span is \(L_h\) and the channel is assumed to have a uniform power delay profile.
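
The signal model in (1) can be made concrete with a short Python sketch for scenario (b), i.e. an independent channel in each diversity arm. This is only an illustration under stated assumptions: the transmitted sequence is a hypothetical block of QPSK symbols rather than the actual time-domain frame, the frequency offset and noise level are arbitrary, and the tap variance of 0.5 is interpreted here as a per-dimension variance.

```python
import numpy as np

def received_signal(s, h, omega, theta, sigma_w, rng):
    """One diversity arm of (1): (s * h) exp(j(omega n + theta)) + w."""
    y = np.convolve(s, h)                      # linear convolution of s and h
    n = np.arange(y.size)
    w = sigma_w * (rng.standard_normal(y.size) + 1j * rng.standard_normal(y.size))
    return y * np.exp(1j * (omega * n + theta)) + w

rng = np.random.default_rng(1)
L_h, N = 10, 2                                 # channel span, diversity order
s = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]), size=32)
omega = 0.01                                   # common frequency offset (hypothetical)
arms = []
for l in range(N):
    # Independent channel per arm with a uniform power delay profile.
    h = np.sqrt(0.5) * (rng.standard_normal(L_h) + 1j * rng.standard_normal(L_h))
    theta = rng.uniform(0.0, 2.0 * np.pi)      # independent carrier phase per arm
    arms.append(received_signal(s, h, omega, theta, sigma_w=0.1, rng=rng))
r = np.stack(arms)                             # shape (N, 32 + L_h - 1)
print(r.shape)
```

For scenario (a), the same `h` would simply be reused in every arm while the phase and noise remain independent.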

The output of the FFT can be written as (for \(0 \le i \le L_d-1\)):

$$\begin{aligned} \tilde{R}_{k,\, i,\, l} = \tilde{H}_{k,\, i,\, l} S_{k,\, 3,\, i} + \tilde{W}_{k,\, i,\, l} \end{aligned}$$
(2)

for \(1 \le l \le N\) diversity arms. The notation in (2) is self explanatory and is based on eq. (36) in [4].

3 Enhanced Frame Structure

The basic motivation behind the enhanced frame structure is to increase the throughput, which in turn depends on \(L_d\). The accuracy of the frequency offset estimate depends on the length of the preamble \(L_p\). Increasing the number of frequency bins \(B_1\) and \(B_2\) [4] for a given \(L_p\) does not improve the accuracy. From Fig. 3 it can be seen that the RMS value of the fine frequency offset estimation error is about \(2\times 10^{-4}\), at an SNR per bit equal to 8 dB. The subcarrier spacing with data length \(L_d=4{,}096\) is equal to \(2\pi /4{,}096=1.534\times 10^{-3}\) radians. Therefore, the residual frequency offset is \(0.0002\times 100/0.001534=13\,\%\) of the subcarrier spacing, which is quite high and causes severe ICI. Note that the RMS frequency offset estimation error can be reduced by increasing the preamble length (\(L_p\)), keeping the data length (\(L_d\)) fixed, which in turn reduces the throughput given by (for the frame structure in Fig. 2):

$$\begin{aligned} \mathcal{T} = \frac{L_{d1}}{L_p+L_{cp}+L_d} \end{aligned}$$
(3)

where \(L_{d1}\) is defined in Fig. 4, for the frame structure given in Fig. 2. Note that for a rate-\(1/2\) turbo code \(L_d=2L_{d1}\), whereas for a rate-1 turbo code, \(L_d=L_{d1}\). This motivates us to look for an alternate frame structure which not only solves the frequency offset estimation problem, but also maintains the throughput at a reasonable value.
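
The arithmetic above can be checked with a few lines of Python. The sketch assumes the values \(L_p=L_{d1}=512\) and \(L_{cp}=18\) quoted in the caption of Table 1, and the RMS error read off Fig. 3; the function name `throughput` is introduced here only for illustration.

```python
import math

L_d = 4096
rms_offset_error = 2e-4                       # radians, read off Fig. 3 at 8 dB
subcarrier_spacing = 2 * math.pi / L_d        # radians
residual_percent = 100 * rms_offset_error / subcarrier_spacing

def throughput(L_data, L_p, L_cp, L_d):
    """Throughput of a frame as in (3)."""
    return L_data / (L_p + L_cp + L_d)

L_p, L_cp, L_d1 = 512, 18, 512                # assumed values (Table 1 caption)
t_rate_half = throughput(L_d1, L_p, L_cp, 2 * L_d1)   # rate-1/2: L_d = 2 L_d1
t_rate_one = throughput(L_d1, L_p, L_cp, L_d1)        # rate-1:   L_d = L_d1

print(f"residual offset: {residual_percent:.1f} % of the subcarrier spacing")
print(f"throughput, rate-1/2: {100 * t_rate_half:.2f} %")
print(f"throughput, rate-1:   {100 * t_rate_one:.2f} %")
```

Under these assumptions, the frame structure of Fig. 2 yields a throughput below 50 % even with the rate-1 code, which illustrates why a longer data part is desirable.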

Fig. 3 RMS and maximum frequency offset estimation error for \(L_p=512\)

Fig. 4 Encoder block diagram for the frame structure in Fig. 2. © 2013 IEEE. Reprinted, with permission, from [4]

Consider the frame in Fig. 1a. In addition to the preamble, prefix and data, it contains “buffer” (dummy) symbols of length \(B\) and postamble of length \(L_o\), all drawn from the QPSK constellation. In Fig. 1b we illustrate the processing of \(L_d\) symbols at the transmitter. Observe that only the data and postamble symbols are interleaved before the IFFT operation. After interleaving, the postamble gets randomly dispersed between the data symbols. The buffer symbols are sent directly to the IFFT, without interleaving. The preamble and the cyclic prefix continue to be processed according to Figure 1 in [4] and eq. (3) in [4]. We now explain the reason behind using this frame structure. In what follows, we assume that the SoF has been detected, fine frequency offset correction has been performed and the channel has been estimated.

We proceed by making the following observations:

  1. Modulation in the time domain results in a shift in the frequency domain. Therefore, any residual frequency offset after fine frequency offset correction, results in a frequency shift at the output of the FFT operation at the receiver. Moreover, due to the presence of a cyclic prefix, the frequency shift is circular. Therefore, without the buffer symbols, there is a possibility that the first data symbol would be circularly shifted to the last data symbol or vice versa. This explains the use of buffer symbols at both ends in Fig. 1. In order to compute the number of buffer symbols (\(B\)), we have to know the maximum residual frequency offset, after fine frequency offset correction. Referring to Fig. 3, we find that the maximum error in fine frequency offset estimation at 0 dB SNR per bit is about \(\pm 2\times 10^{-3}\) radians. With \(L_d=4{,}096\), the subcarrier spacing is \(2\pi /4{,}096=1.534\times 10^{-3}\) radians. Hence, the residual frequency error would result in a shift of \(\pm 2/1.534=\pm 1.3\) subcarrier spacings. Therefore, while \(B=2\) would suffice, we have taken \(B=4\), to be on the safe side.

  2. Since the frequency shift is not an integer multiple of the subcarrier spacing, we need to interpolate in between the subcarriers, to accurately estimate the shift. Interpolation can be achieved by zero-padding the data before the FFT operation. Thus we get a \(2L_d\)-point FFT corresponding to an interpolation factor of two and so on. Other methods of interpolation between subcarriers are discussed in [49].

  3. After the FFT operation, postamble matched filtering has to be done, since the postamble and \(\hat{H}_k\approx \tilde{H}_k\) are known. The procedure for constructing the postamble matched filter is illustrated in Fig. 5. From simulations, it has been found that a postamble length \(L_o=128\) results in false peaks at the postamble matched filter output at 0 dB SNR per bit. Therefore we have taken \(L_o=256\). With these calculations, the length of the data works out as \(L_{d2}=L_d-2B-L_o=4{,}096-8-256=3{,}832\) QPSK symbols. The throughput of the enhanced frame structure (with rate-1 turbo code) is

    $$\begin{aligned} \mathcal{T}&= \frac{L_{d2}}{L_p+L_{cp}+L_d} \nonumber \\&= \frac{3{,}832}{512+18+4{,}096} \nonumber \\&= 82.84\,\%. \end{aligned}$$
    (4)

    The throughput comparison of various frame structures is summarized in Table 1.
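
The numbers used in the observations above (buffer length, data length and throughput) can be reproduced with the following short Python sketch. All values are taken from the text; only the variable names are introduced here.

```python
import math

L_d = 4096
max_offset_error = 2e-3                       # radians at 0 dB, read off Fig. 3
subcarrier_spacing = 2 * math.pi / L_d
shift = max_offset_error / subcarrier_spacing # shift in subcarrier spacings
B_min = math.ceil(shift)                      # smallest sufficient buffer length

B, L_o, L_p, L_cp = 4, 256, 512, 18           # values chosen in the text
L_d2 = L_d - 2 * B - L_o                      # data symbols per frame
T = L_d2 / (L_p + L_cp + L_d)                 # throughput as in (4)

print(f"shift = {shift:.2f} subcarrier spacings, B_min = {B_min}")
print(f"L_d2 = {L_d2}, throughput = {100 * T:.2f} %")
```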

Fig. 5 Obtaining the postamble matched filter for \(L_d=8\). Buffer symbols are not shown. The frequency offset (\(\pi /L_d\)) is half the subcarrier spacing (\(2\pi /L_d\)). \(H_k\) and \(S_k\) are assumed to be real-valued. Noise is absent. (a) Output of the \(L_d\)-point FFT in the absence of frequency offset. The red lines represent postamble and the blue lines represent data symbols. (b) Output of the \(2L_d\)-point FFT in the presence of frequency offset. Observe that the red and blue lines have shifted to the right by \(\pi /L_d\). Green lines denote the output of the \(L_d\)-point FFT in the presence of frequency offset. (c) The postamble matched filter
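
The effect of zero-padding described for Fig. 5 can be reproduced numerically. The Python sketch below uses \(L_d=8\) and a frequency offset of \(\pi /L_d\), as in the figure, but with a single occupied subcarrier (index `m`, chosen arbitrarily) instead of the postamble and data symbols, and with an ideal channel.

```python
import numpy as np

L_d = 8
m = 2                                   # hypothetical occupied subcarrier
n = np.arange(L_d)
offset = np.pi / L_d                    # half the subcarrier spacing
x = np.exp(1j * 2 * np.pi * m * n / L_d) * np.exp(1j * offset * n)

X1 = np.fft.fft(x)                      # L_d-point FFT: no bin at the shifted tone
X2 = np.fft.fft(x, 2 * L_d)             # zero-padded 2 L_d-point FFT
peak_bin = int(np.argmax(np.abs(X2)))
shift_in_spacings = peak_bin / 2 - m    # bins of X2 are half a spacing apart

print("peak bin of the 2L_d-point FFT:", peak_bin)
print("estimated shift (subcarrier spacings):", shift_in_spacings)
print("largest L_d-point FFT magnitude:", np.max(np.abs(X1)))
```

With an interpolation factor of two, the shifted tone falls exactly on an odd-numbered bin of the \(2L_d\)-point FFT, whereas in the \(L_d\)-point FFT its energy is split between two adjacent bins. Smaller fractional shifts would call for larger interpolation factors.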

Table 1 Throughput comparison of various frame structures with \(L_p=L_{d1}=512\), \(L_{d2}=3{,}832\), \(L_{cp}=18\)

4 Receiver

The receiver algorithms for start-of-frame (SoF) detection, frequency offset, channel and noise variance estimation are already discussed in [4, 5], and apply also to the enhanced frame structure given above and receive diversity. In what follows, we describe the modifications required in the turbo decoder in the presence of receive diversity.

4.1 Turbo Decoding

In the turbo decoding operation, (for decoder 1, transition from state \(m\) to \(n\), \(k\)th frame, \(N\) diversity arms, rate-\(1/2\) turbo code, the enhanced frame structure in Fig. 1 and \(0 \le i \le L_{d2}/2 -1\)), we have (assuming independent noise in all the diversity arms)

$$\begin{aligned} \gamma _{1,\, k,\, i,\, m,\, n} = \prod _{l=1}^{N} \gamma _{1,\, k,\, i,\, m,\, n,\, l} \end{aligned}$$
(5)

where

$$\begin{aligned} \gamma _{1,\, k,\, i,\, m,\, n,\, l} = \exp \left[ - \frac{ \left| \tilde{R}_{k,\, i,\, l}- \hat{H}_{k,\, i,\, l} S_{m,\, n} \right| ^2}{2L_d\hat{\sigma }_w^2} \right] \end{aligned}$$
(6)

where \(\hat{\sigma }_w^2\) is the average estimate of the noise variance over all the diversity arms and \(S_{m,\, n}\) is the QPSK symbol corresponding to the transition from state \(m\) to \(n\).

Similarly at decoder 2, for \(0 \le i \le L_{d2}/2-1\), we have:

$$\begin{aligned} \gamma _{2,\, k,\, i,\, m,\, n} = \prod _{l=1}^{N} \gamma _{2,\, k,\, i,\, m,\, n,\, l} \end{aligned}$$
(7)

where

$$\begin{aligned} \gamma _{2,\, k,\, i,\, m,\, n,\, l} = \exp \left[ - \frac{ \left| \tilde{R}_{k,\, j,\, l}- \hat{H}_{k,\, j,\, l} S_{m,\, n} \right| ^2}{2L_d\hat{\sigma }_w^2} \right] \end{aligned}$$
(8)

where

$$\begin{aligned} j&= L_{d3}+ i \nonumber \\ L_{d3}&= L_{d2}/2. \end{aligned}$$
(9)

For a rate-1 turbo code obtained by puncturing, alternate gammas have to be set to unity [5, 7]. The rest of the BCJR algorithm is described in [5, 7].
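
A minimal Python sketch of how the gammas in (5) and (6) combine over the diversity arms is given below. It is an illustration only and not the decoder of [5, 7]: the exponent is taken as the squared magnitude of the complex error, all four QPSK symbols are enumerated instead of the trellis transitions, and the channel estimates, noise variance and symbol scaling are hypothetical. The product over arms in (5) becomes a sum of exponents.

```python
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])   # assumed constellation

def branch_exponents(R, H_hat, L_d, sigma_w2, symbols=QPSK):
    """Exponents of (6), summed over the diversity arms.

    R, H_hat: arrays of shape (N, L), indexed by (arm, subcarrier).
    Returns an array of shape (len(symbols), L).
    """
    err = R[None, :, :] - H_hat[None, :, :] * symbols[:, None, None]
    return -np.sum(np.abs(err) ** 2, axis=1) / (2 * L_d * sigma_w2)

rng = np.random.default_rng(2)
N, L = 2, 6                                   # arms, subcarriers (hypothetical)
H_hat = rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L))
tx_idx = rng.integers(0, 4, size=L)           # transmitted symbol indices
R = H_hat * QPSK[tx_idx]                      # noise-free, perfect channel estimate
b = branch_exponents(R, H_hat, L_d=L, sigma_w2=0.5)
gamma = np.exp(b)                             # product over arms, as in (5)
print(np.argmax(b, axis=0), tx_idx)
```

In this noise-free example the exponent of the transmitted symbol is zero and all others are negative, so the largest gamma at each subcarrier identifies the transmitted symbol.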

4.2 Robust Turbo Decoding

At high SNR, the exponent \(b\) of \(\mathrm {e}^b\) in (6) and (8) becomes very large in magnitude (typically \(|b|>100\)) and it becomes infeasible for the DSP processor or even a computer to calculate the gammas. We propose to solve this problem by normalizing the exponents. Observe that the exponents are real-valued and negative. Let \(b_{1,\, j,\, i}\) denote an exponent at decoder 1 due to the \(j\)th symbol in the constellation (\(1\le j\le 4\) for QPSK) at time \(i\). For notational convenience, we again assume a rate-\(1/2\) turbo code. Let

$$\begin{aligned} \mathbf {b}_1 = \left[ \begin{array}{ccc} b_{1,\, 1,\, 0} &{} \ldots &{} b_{1,\, 1,\, L_{d3}-1}\\ \vdots &{} \vdots &{} \vdots \\ b_{1,\, 4,\, 0} &{} \ldots &{} b_{1,\, 4,\, L_{d3}-1} \end{array} \right] \end{aligned}$$
(10)

denote the matrix of exponents for decoder 1. Let \(b_{1,\, \mathrm {max},\, i}\) denote the maximum exponent at time \(i\), that is

$$\begin{aligned} b_{1,\,\mathrm {max},\, i} = \max \left[ \begin{array}{c} b_{1,\, 1,\, i}\\ \vdots \\ b_{1,\, 4,\, i} \end{array} \right] . \end{aligned}$$
(11)

Let

$$\begin{aligned} \mathbf {b}_{1,\,\mathrm {max}} = \left[ \begin{array}{ccc} b_{1,\, \mathrm {max},\, 0}&\ldots&b_{1,\,\mathrm {max},\, L_{d3}-1} \end{array} \right] \end{aligned}$$
(12)

denote the vector containing the maximum exponents. Compute:

$$\begin{aligned} \mathbf {b}_1' = \mathbf {b}_1 - \left[ \begin{array}{c} \mathbf {b}_{1,\,\mathrm {max}}\\ \vdots \\ \mathbf {b}_{1,\,\mathrm {max}} \end{array} \right] . \end{aligned}$$
(13)

Note that in (13), the vector \(\mathbf {b}_{1,\,\mathrm {max}}\) has to be repeated as many times as the number of symbols in the constellation.

If any element of \(\mathbf {b}_1'\) is less than, say, \(-30\), then set it to \(-30\). Thus we get a normalized exponent matrix \(\mathbf {b}_{1,\,\mathrm {norm}}\), whose elements lie in the range \([-30,\, 0]\). It has been found from simulations that normalizing the exponents does not lead to any degradation in BER performance; on the contrary, it increases the operating SNR range of the turbo receiver. In practice, we could divide the range \([-30,\, 0]\) into a large number (e.g. 3,000) of levels, so that the exponentials (\(\mathrm {e}^b\)) could be precomputed and stored in the DSP processor rather than computed in real-time. The choice of the minimum exponent (e.g. \(-30\)) would depend on the precision of the DSP processor or the computer.
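
The normalization in (10) to (13), followed by the clipping step, can be sketched in Python as follows. The exponent values are made up for illustration, and the function name `normalize_exponents` is not from the original work.

```python
import numpy as np

def normalize_exponents(b, floor=-30.0):
    """Subtract the per-column maximum as in (13), then clip at `floor`."""
    b_shifted = b - b.max(axis=0, keepdims=True)   # largest entry per time is 0
    return np.maximum(b_shifted, floor)

# Rows: the four QPSK symbols; columns: time instants (hypothetical exponents).
b_1 = np.array([[-1200.0,  -5.0],
                [-1250.0,  -3.0],
                [-1201.0, -90.0],
                [-2000.0,  -4.0]])

naive = np.exp(b_1)                    # first column underflows to exactly 0
b_norm = normalize_exponents(b_1)
gammas = np.exp(b_norm)                # all entries lie in [exp(-30), 1]
print(naive[:, 0])
print(b_norm)
```

Since the same quantity is subtracted from all exponents at a given time instant, the gammas at that instant are scaled by a common factor, which does not affect their relative magnitudes.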

4.3 Variance of the Channel Estimation Error

To see the effect of noise on the channel estimate in eq. (28) of [4], consider

$$\begin{aligned} \tilde{\mathbf {u}} = \left( \tilde{\mathbf {s}}_1^H \tilde{\mathbf {s}}_1 \right) ^{-1} \tilde{\mathbf {s}}_1^H \tilde{\mathbf {w}}_{k,\, m_1}. \end{aligned}$$
(14)

When \(m_0=L_h-1\) (eq. (24) of [4]), observe that

$$\begin{aligned} \hat{\mathbf {h}}_k = \tilde{\mathbf {h}}_k + \tilde{\mathbf {u}}. \end{aligned}$$
(15)

Since \(\tilde{s}_{1,\, n}\) is a zero-mean random sequence with good autocorrelation properties (approximately a Kronecker delta function weighted by \(L_1\sigma ^2_s\)), it is reasonable to expect

$$\begin{aligned}&\tilde{\mathbf {s}}_1^H \tilde{\mathbf {s}}_1 = L_1 \sigma ^2_s \mathbf {I}_{L_{hr}} \quad \text{ for } L_p \gg L_{hr} \nonumber \\&\quad \Rightarrow \left( \tilde{\mathbf {s}}_1^H \tilde{\mathbf {s}}_1 \right) ^{-1} = 1/(L_1 \sigma ^2_s) \mathbf {I}_{L_{hr}} \nonumber \\&\quad \Rightarrow \tilde{\mathbf {u}} = 1/(L_1 \sigma ^2_s) \tilde{\mathbf {s}}_1^H \tilde{\mathbf {w}}_{k,\, m_1} \end{aligned}$$
(16)

where \(\sigma ^2_s\) is defined in eq. (4) of [4], \(L_1\) is defined in eq. (11) of [4], and \(\mathbf {I}_{L_{hr}}\) is an \(L_{hr}\times L_{hr}\) identity matrix. It can be shown that

$$\begin{aligned} E \left[ \tilde{\mathbf {u}} \tilde{\mathbf {u}}^H \right] = \frac{2\sigma ^2_w}{L_1\sigma ^2_s} {\mathbf {I}}_{L_{hr}} = \frac{\sigma ^2_w L_d}{L_1} {\mathbf {I}}_{L_{hr}} \mathop {=}\limits ^{\Delta } 2 \sigma ^2_u {\mathbf {I}}_{L_{hr}}. \end{aligned}$$
(17)

Therefore, the variance of the ML channel estimation error (\(\sigma ^2_u\)) tends to zero as \(L_1\rightarrow \infty \) with \(L_d\) kept fixed. Conversely, when \(L_d\) is increased keeping \(L_1\) fixed, there is noise enhancement.
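
The approximation in (16) and the resulting decay of the error variance with the preamble length can be checked numerically. The Python sketch below makes several assumptions that are not taken from [4]: the preamble is a random unit-energy QPSK sequence (so that \(\sigma ^2_s=1\) here), \(L_1\) is taken as \(L_p-L_{hr}+1\), and the noise has variance \(\sigma ^2_w\) per dimension. It compares the exact covariance \(2\sigma ^2_w(\tilde{\mathbf {s}}_1^H\tilde{\mathbf {s}}_1)^{-1}\) with the approximation \(2\sigma ^2_w/(L_1\sigma ^2_s)\).

```python
import numpy as np

def error_variance(L_p, L_hr, sigma_w2, rng):
    """Return (exact mean error variance, approximation as in (17))."""
    qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    s = rng.choice(qpsk, size=L_p)            # hypothetical preamble, |s|^2 = 1
    L_1 = L_p - L_hr + 1                      # assumed steady-state length
    # Row n of the convolution matrix holds s[n + L_hr - 1], ..., s[n].
    S = np.array([s[n:n + L_hr][::-1] for n in range(L_1)])
    cov = 2 * sigma_w2 * np.linalg.inv(S.conj().T @ S)   # exact E[u u^H]
    exact = float(np.real(np.mean(np.diag(cov))))
    approx = 2 * sigma_w2 / L_1               # sigma_s^2 = 1 in this sketch
    return exact, approx

rng = np.random.default_rng(3)
L_hr, sigma_w2 = 19, 0.1
exact_short, approx_short = error_variance(64, L_hr, sigma_w2, rng)
exact_long, approx_long = error_variance(512, L_hr, sigma_w2, rng)
print(exact_short, approx_short)
print(exact_long, approx_long)
```

For the longer preamble the exact value is both smaller and closer to the approximation, in line with the requirement \(L_p \gg L_{hr}\) in (16).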

5 The Channel Capacity

The communication system model under consideration is given by (2). The channel capacity is given by [50]:

$$\begin{aligned} C = \frac{1}{2} \log _2(1+\text{ SNR }) \qquad \text{ bits/transmission } \end{aligned}$$
(18)

per dimension (real-valued signals occupy a single dimension, complex-valued signals occupy two dimensions). The “SNR” in (18) denotes the minimum average signal-to-noise ratio per dimension, for error-free transmission. Observe that:

  1. The sphere packing derivation of the channel capacity formula [50], does not require noise to be Gaussian. The only requirements are that the noise samples have to be independent, the signal and noise have to be independent, and both the signal and noise must have zero mean.

  2. The channel capacity depends only on the SNR.

  3. The average SNR per dimension in (18) is different from the average SNR per bit (or \(E_b/N_0\)), which is widely used in the literature. In fact, it can be shown that [7, 50]:

    $$\begin{aligned} \text{ SNR }\,=\,2C \times \text{ SNR } \text{ per } \text{ bit. } \end{aligned}$$
    (19)
  4. It is customary to define the average SNR per bit (\(E_b/N_0\)) over two dimensions (complex signals). When the signal and noise statistics over both dimensions are identical, the average SNR per bit over two dimensions is identical to the average SNR per bit over one dimension. Therefore (19) is valid, even though the SNR is defined over one dimension and the SNR per bit is defined over two dimensions.

  5. The notation \(E_b/N_0\) is usually used for continuous-time, passband analog signals [50–52], whereas SNR per bit is used for discrete-time signals [7]. However, both definitions are equivalent. Note that passband signals are capable of carrying information over two dimensions, using sine and cosine carriers, in spite of the fact that passband signals are real-valued.

  6. Each dimension corresponds to a separate and independent path between the transmitter and receiver.

  7. The channel capacity is additive with respect to the number of dimensions. Thus, the total capacity over \(2N\) real dimensions is equal to the sum of the capacity over each real dimension.

  8. Each \(S_{k,\, 3,\, i}\) in (2) corresponds to one transmission (over two dimensions, since \(S_{k,\, 3,\, i}\) is complex-valued).

  9. Transmission of \(L_{d2}\) data bits in Fig. 1 (for a rate-1 turbo code), results in \(NL_{d2}\) complex samples (\(2NL_{d2}\) real-valued samples) of \(\tilde{R}_{k,\, i,\, l}\) in (2), for \(N\)th-order receive diversity. Therefore, the channel capacity is

    $$\begin{aligned} C&= \frac{L_{d2}}{2NL_{d2}} \nonumber \\&= \frac{1}{2N} \qquad \text{ bits/transmission } \end{aligned}$$
    (20)

    per dimension. In other words, (20) implies that in each transmission, one data bit is transmitted over \(2N\) dimensions. Similarly, for a rate-\(1/2\) turbo code with \(N\)th-order receive diversity, transmission of \(L_{d2}/2\) data bits results in \(NL_{d2}\) complex samples of \(\tilde{R}_{k,\, i,\, l}\) in (2), and the channel capacity becomes:

    $$\begin{aligned} C&= \frac{L_{d2}}{4NL_{d2}} \nonumber \\&= \frac{1}{4N} \qquad \text{ bits/transmission } \end{aligned}$$
    (21)

    per dimension. Substituting (20) and (21) in (18), and using (19) we get the minimum (threshold) average SNR per bit required for error-free transmission, for a given channel capacity. The minimum SNR per bit for various code rates and receiver diversity is presented in Table 2. Note that [50] the minimum \(E_b/N_0\) for error-free transmission is \(-1.6\) dB only when \(C\rightarrow 0\).

  10. In the case of fading channels, it may not be possible to achieve the minimum possible SNR per bit. This is because the SNR per bit of a given frame may be less than the threshold average SNR per bit. Such frames are said to be in outage. The frame SNR per bit can be defined as (for the \(k\)th frame and the \(l\)th diversity arm):

    $$\begin{aligned} \text{ SNR }_{k,\, l,\,\mathrm {bit}} = \frac{1}{2C} \frac{\langle |\tilde{H}_{k,\, i,\, l} S_{k,\, 3,\, i}|^2\rangle }{\langle |\tilde{W}_{k,\, i,\, l}|^2\rangle } \end{aligned}$$
    (22)

    where \(\langle \cdot \rangle \) denotes time average over the \(L_{d2}\) data symbols. Note that the frame SNR is different from the average SNR per bit, which is defined as [4]:

    $$\begin{aligned} \text{ SNR } \text{ per } \text{ bit }&= \frac{1}{2C} \frac{E\left[ \left| \tilde{H}_{k,\, i,\, l} S_{k,\, 3,\, i} \right| ^2\right] }{E\left[ \left| \tilde{W}_{k,\, i,\, l}\right| ^2\right] } \nonumber \\&= \frac{1}{2C} \frac{2 L_h \sigma ^2_f}{L_d \sigma ^2_w}. \end{aligned}$$
    (23)

    The \(k\)th OFDM frame is said to be in outage when:

    $$\begin{aligned} \text{ SNR }_{k,\, l,\,\mathrm {bit}} < \text{ minimum } \text{ average } \text{ SNR } \text{ per } \text{ bit } \end{aligned}$$
    (24)

    for all \(l\). The outage probability is given by:

    $$\begin{aligned} P_{\mathrm {out}} = \frac{\text{ number } \text{ of } \text{ frames } \text{ in } \text{ outage }}{\text{ total } \text{ number } \text{ of } \text{ frames } \text{ transmitted }}. \end{aligned}$$
    (25)
Table 2 The minimum SNR per bit for different code rates and receiver diversity
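
The procedure of substituting (20) and (21) in (18) and then using (19) can be written out as a short Python sketch. The values it prints follow directly from these equations; they are a worked example and are not copied from Table 2. The function name is introduced here for illustration.

```python
import math

def min_snr_per_bit_db(code_rate, N):
    """Threshold SNR per bit (dB) for a given code rate and diversity order N."""
    C = code_rate / (2 * N)              # (20) for rate 1, (21) for rate 1/2
    snr = 2 ** (2 * C) - 1               # minimum SNR per dimension, from (18)
    return 10 * math.log10(snr / (2 * C))   # SNR per bit, from (19)

for code_rate in (1.0, 0.5):
    for N in (1, 2):
        print(f"rate {code_rate}, N = {N}: "
              f"{min_snr_per_bit_db(code_rate, N):.2f} dB")

# As C tends to zero, the threshold approaches 10 log10(ln 2), about -1.6 dB.
print(f"limit: {10 * math.log10(math.log(2)):.2f} dB")
```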

6 Simulation Results

In this section, we present the simulation results for turbo-coded OFDM. In the simulations, the channel length \(L_h\) is equal to 10, hence \(L_{hr}=19\) [4]. The fade variance \(\sigma ^2_f=0.5\) [4]. Figures 6 and 7 depict the simulation results without and with data interleaving respectively, for the frame structure in Fig. 2a and the rate-1/2 turbo code.

Fig. 6 Simulation results without data interleaving, frame structure in Fig. 2a, rate-\(1/2\) turbo code. © 2013 IEEE. Reprinted, with permission, from [4]

Fig. 7 Simulation results with data interleaving, frame structure in Fig. 2a, rate-\(1/2\) turbo code. © 2013 IEEE. Reprinted, with permission, from [4]

In Fig. 8, we present simulation results for the rate-1 turbo code, with enhanced frame structure, 1st-order receiver diversity and interpolation factors (ip) equal to 2, 4, 8, 16 and 32. We find that the performance of the practical receiver is as good as the ideal receiver for ip equal to 16 and 32. However, there is a 4 dB degradation in performance of the ideal receiver for the rate-1 turbo code, with respect to the ideal receiver for the rate-\(1/2\) turbo code in Fig. 7, at a BER of \(10^{-5}\). This degradation in performance can be compensated by using receiver diversity, which is presented next.

Fig. 8 Simulation results with data interleaving, enhanced frame structure in Fig. 1a and rate-1 turbo code

In Fig. 9, we present simulation results for the rate-1 turbo code, with enhanced frame structure and 2nd-order receiver diversity. The channel in both diversity arms is assumed to be identical. However, noise in both the diversity arms is assumed to be independent. Comparing Figs. 8 and 9, we find that the ideal receiver with 2nd-order diversity is just 2 dB better than the one with 1st-order diversity, at a BER of \(10^{-5}\). Moreover, the practical receivers with ip \(=\) 32 have nearly identical performance. This is to be expected, since it is well known that diversity advantage is obtained only when the channels are independent.

Fig. 9 Simulation results with data interleaving, enhanced frame structure in Fig. 1a and rate-1 turbo code with 2nd order receive diversity. Identical channel on both diversity arms

In Fig. 10, we present simulation results for the rate-1 turbo code, with enhanced frame structure and 2nd-order receiver diversity. The channel and noise in both diversity arms are assumed to be independent. Comparing Figs. 8 and 10, we find that the ideal receiver with 2nd order diversity exhibits about 5 dB improvement over the one with 1st order diversity, at a BER of \(10^{-5}\). Moreover, the practical receiver with ip \(=\) 16, 32 is just 1 dB inferior to the ideal receiver, at a BER of \(10^{-5}\).

Fig. 10 Simulation results with data interleaving, enhanced frame structure in Fig. 1a and rate-1 turbo code with 2nd order receive diversity. Independent channel on both diversity arms

Finally, in Fig. 11 we present the outage probability for the rate-1 turbo code with 1st and 2nd order receive diversity. The outage probability for 1st order receive diversity, at 6 dB SNR per bit, is \(3\times 10^{-4}\). In other words, 3 out of \(10^4\) frames are in outage (no error correcting code can correct errors in such frames). Therefore, in the worst case, the number of bit errors for the frames in outage would be \(0.5\times 3\times 3{,}832\) (assuming probability of error is 0.5). Let us also assume that for the remaining (\(10{,}000-3=9{,}997\)) frames, all errors are corrected, using a sufficiently powerful error correcting code. Therefore, in the best case situation, the overall BER at 6 dB SNR per bit, with 1st order diversity would be \(0.5\times 3\times 3{,}832/(10{,}000\times 3{,}832)=1.5\times 10^{-4}\). However, from Fig. 8, even the ideal coherent receiver exhibits a BER as high as \(10^{-2}\) at 6 dB SNR per bit. Therefore, there is large scope for improvement, using perhaps a more powerful error correcting code. However, it has been found from simulations that increasing the number of trellis states does not result in a significant improvement in the BER performance. This is probably due to the fact that puncturing leads to loss of information.

Fig. 11 Simulation results for outage probability with data interleaving, enhanced frame structure in Fig. 1a and rate-1 turbo code with 1st and 2nd order receive diversity. Independent channel on both diversity arms

Similarly we observe from Fig. 11 that, with 2nd order receive diversity, the outage probability is \(10^{-4}\) at 3 dB SNR per bit. This implies that 1 out of \(10^{4}\) frames is in outage. Using similar arguments, the best case overall BER at 3 dB SNR per bit would be \(0.5\times 3{,}832/(10{,}000\times 3{,}832)=0.5\times 10^{-4}\). From Fig. 10, the ideal coherent receiver gives a BER of \(2\times 10^{-2}\), at 3 dB SNR per bit, once again suggesting that there is large scope for improvement.
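
The best case BER arithmetic used in this section can be summarized by the following Python sketch. It simply restates the assumptions made in the text (a bit error probability of 0.5 in frames that are in outage and no errors elsewhere); the function name is introduced here for illustration.

```python
def best_case_ber(frames_in_outage, total_frames, bits_per_frame=3832, p_err=0.5):
    """Overall BER when only the frames in outage contain bit errors."""
    bit_errors = p_err * frames_in_outage * bits_per_frame
    return bit_errors / (total_frames * bits_per_frame)

ber_1st_order = best_case_ber(3, 10_000)    # 1st order diversity, 6 dB SNR per bit
ber_2nd_order = best_case_ber(1, 10_000)    # 2nd order diversity, 3 dB SNR per bit
print(ber_1st_order, ber_2nd_order)
```

Note that the number of bits per frame cancels, so the best case BER under these assumptions is just the outage probability multiplied by 0.5.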

7 Conclusions and Future Work

This paper deals with linear complexity coherent detectors for turbo-coded OFDM signals transmitted over frequency selective Rayleigh fading channels. Simulation results show that it is possible to achieve a BER of \(10^{-5}\) at an SNR per bit of 8 dB and throughput equal to 82.84 %, using a single transmit and two receive antennas.

With the rapid advances in VLSI technology, it is expected that coherent transceivers would drive the future wireless telecommunication systems.

It may be possible to further improve the performance, using a better code.