1. Introduction

In recent decades, precoding techniques that allow spatial multiplexing of several users have been proposed to improve the spectral efficiency of multiuser multiple-input multiple-output (MU-MIMO) communication systems. Dirty paper coding (DPC) [1] is a theoretical scheme that allows the transmitter to precancel noncausally known interference without incurring a power penalty. For a given user ordering, DPC is applied serially over the users, presubtracting the interference caused by users with lower indices [2]. Although it has been proved that DPC achieves the whole capacity region of the MIMO broadcast channel [2, 3], it is too complex to implement in practical systems. For this reason, precoding schemes of lower complexity are usually employed. Linear schemes, such as zero-forcing (ZF) [4], and non-linear schemes, such as Tomlinson-Harashima precoding (THP) [5] or lattice-reduction Tomlinson-Harashima precoding (LRTHP) [6], are mostly used for single-antenna receivers. In interference-dominated scenarios, non-linear techniques achieve better performance at the cost of higher complexity [7]. For multiple-antenna receivers, a linear technique called block diagonalization has been proposed, which performs well when optimizations such as those in [8] are considered.

Multiuser MIMO precoding and scheduling techniques require accurate channel state information (CSI) at the transmitter to achieve the full multiuser multiplexing gain [3, 9, 10]. In frequency division duplex (FDD) systems, each receiver obtains the CSI by estimating the channel from reference signals (RS) and subsequently sends it back to the transmitter via a low-rate feedback channel. Thus, designing limited feedback schemes that reduce the amount of necessary feedback information plays an important role in achieving efficient communication systems. MIMO techniques can also enhance the performance of orthogonal frequency division multiplexing (OFDM) by exploiting the spatial domain. These systems are known as MIMO-OFDM systems. OFDM is a technique used to mitigate the effects of inter-symbol interference in frequency selective channels, turning a broadband frequency selective channel into a set of parallel narrowband frequency flat subchannels [11]. For these systems, multiuser precoding techniques can be carried out independently in each of the subchannels.

In MIMO-OFDM systems, the amount of CSI that the user equipments (UEs) need to feed back to the transmitter is related to the number of subcarriers or the length of the channel impulse response (CIR). For instance, long term evolution (LTE) Rel. 8 supports a scalable bandwidth of up to 20 MHz [12], but it does not satisfy the International Mobile Telecommunications-Advanced (IMT-Advanced) requirements defined by the International Telecommunication Union. For this reason, LTE-Advanced introduces new radio features [13], such as carrier aggregation (CA), to improve the peak data rate. CA allows a contiguous or non-contiguous aggregation of bandwidth up to 100 MHz [14], which corresponds to 6,000 modulated LTE subcarriers. CA provides not only higher user data rates but also a more flexible and efficient utilization of frequency resources. However, since the UE uses a larger number of subcarriers, the amount of information that needs to be fed back is larger too.

In terms of feedback, the simplest generalization of MIMO systems to MIMO-OFDM systems would require feeding back independent CSI per subcarrier. However, this solution is inefficient, since it neglects the frequency correlation between subcarriers. In systems allowing CA, it would entail a large feedback overhead. To reduce the feedback information, some frequency-domain techniques take advantage of the channel frequency correlation by grouping adjacent subcarriers. This approach has been adopted for LTE and LTE-Advanced, where groups of 12 adjacent subcarriers are known as resource blocks (RBs) [13]. A common approach in limited feedback schemes consists of assuming that the channel is constant for the subcarriers within an RB. This assumption holds under some conditions based on the channel coherence bandwidth and the feedback rate [15].
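As a rough illustration of the overhead argument, the following sketch counts how many CSI reports per antenna link would be needed with per-subcarrier and with per-RB feedback. The split into five aggregated carriers with 1,200 modulated subcarriers each is an assumption chosen to match the 6,000 subcarriers quoted above, and the function name is hypothetical.

```python
def feedback_units(num_subcarriers: int, group_size: int = 1) -> int:
    """Number of CSI reports when one report covers `group_size` adjacent subcarriers."""
    # Ceiling division, so that a partially filled last group still gets a report.
    return -(-num_subcarriers // group_size)

# Assumed configuration: 5 aggregated 20 MHz carriers, 1,200 modulated subcarriers each.
K_TOTAL = 5 * 1200

per_subcarrier = feedback_units(K_TOTAL)           # one report per subcarrier
per_rb = feedback_units(K_TOTAL, group_size=12)    # one report per resource block
```

Grouping by RB reduces the number of reports by the group size, which is why the constant-channel-per-RB assumption is attractive whenever the coherence bandwidth allows it.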

Limited feedback schemes for MIMO-OFDM systems have been widely proposed in the literature, and we comment here on the most representative ones [16–19]. A frequency-domain limited feedback scheme is presented in [16]. The beamforming matrix for the pilot subcarrier within each RB is calculated, quantized through random vector quantization (RVQ) and fed back by the receiver. The beamforming matrices for non-pilot subcarriers are obtained through a spherical interpolation at the transmitter. In [17], the frequency correlation is exploited by dividing the channel frequency response (CFR) into smaller vectors and performing an RVQ over them. The length of these vectors is related to the frequency correlation properties of the channel (i.e., the channel coherence bandwidth). However, correlation between subcarriers can be difficult to exploit and computationally expensive. In order to avoid complex frequency interpolation operations, a time-domain channel quantized feedback scheme is presented in [18] and compared with two frequency-domain channel quantization schemes: an analog feedback scheme and a direction quantized feedback scheme. It is shown that the scheme based on time-domain channel quantization outperforms the frequency-domain schemes in terms of system sum-rate while requiring lower complexity. In [19], the amount of information to feed back is reduced by exploiting temporal and spatial correlation through rank reduction. Statistical channel information, such as the channel covariance matrix, has to be estimated and fed back as well, although in return it allows for a robust precoder design at the transmitter.

Spectral efficiency is one of the targets of IMT-Advanced. High spectral efficiency can be achieved by means of high or full frequency reuse. However, frequency reuse increases the intercell interference (ICI), which limits the system throughput, especially at the cell edge. In LTE-Advanced, coordinated multipoint (CoMP) transmission/reception has been considered as a key technique to mitigate ICI and, thus, to improve the spectral efficiency [20–22]. Joint processing (JP), also known as network MIMO, is one of the techniques falling under the umbrella of CoMP. This technique consists of several coordinated cells acting as a single distributed antenna array, simultaneously transmitting to the different UEs. With JP, ICI can be reduced by applying MU-MIMO techniques in the distributed antenna array. However, one of its drawbacks is the large amount of required feedback information, since users need to send back the CSI of every coordinated cell. In addition, a large signaling overhead is required for the inter-cell information exchange [23]. In order to alleviate these requirements, the system is usually divided into clusters of cells (coordinated clusters) and JP is performed by the cells within each cluster [21]. In this framework, limited feedback schemes could further reduce the feedback overhead, bringing CoMP techniques closer to practical systems.

In this article, we propose a low-complexity limited feedback scheme based on time-domain channel quantization for a cluster allowing JP. The limited feedback scheme exploits the spatial correlation between the different antennas of each base station (BS) without requiring any previous statistical knowledge of the channel. In our system, UEs are assumed to perfectly estimate their channels. The reduction of feedback information is achieved by means of a differential quantization (DQ) of the CIR coefficients. The contributions of this article can be summarized as follows:

  • A proper pilot symbol allocation grid based on LTE-Advanced allowing the pilot channel estimation in the cluster under consideration has been proposed.

  • Different strategies regarding feedback bit allocation for the proposed feedback scheme have been analyzed. A practical expression of the error introduced by this scheme has been obtained and compared to the error of the standard quantization scheme.

  • The effect of imperfect CSI on some multiuser precoding techniques at the downlink, such as ZF, THP and LRTHP, has been investigated. An expression that relates the achieved sum-rate and the amount of feedback information needed has also been obtained for a general case.

The article is organized as follows. In Section 2, the system model and the pilot symbol allocation scheme for the cluster layout under consideration are presented. The main contribution of this article is presented in Section 3, where the limited feedback scheme is described. The impact of the limited feedback scheme on the downlink with different precoding techniques is evaluated in Section 4. The simulation environment and numerical results are presented in Section 5. Finally, conclusions are stated in Section 6.

The following notation is used throughout the article: boldface upper-case letters denote matrices, $\mathbf{A}$; boldface lower-case letters denote vectors, $\mathbf{a}$; and italics denote scalars, $a$. Superscripts $(\cdot)^{T}$, $(\cdot)^{H}$, $(\cdot)^{-1}$ and $(\cdot)^{\dagger}$ stand for the matrix transpose, Hermitian transpose, inversion and pseudo-inverse operations, respectively. The Frobenius norm of a matrix is denoted by $\|\cdot\|_F$, and $\|\cdot\|$ stands for the Euclidean norm of a vector. We use $|\cdot|$, $\angle(\cdot)$, $\mathcal{R}\{\cdot\}$ and $\mathcal{I}\{\cdot\}$ to refer to the absolute value, phase, real part and imaginary part of a complex value, respectively. We use $\mathbb{C}^{m \times n}$ to denote the set of $m \times n$ complex matrices. Regarding quantization, $Q_{B}^{X}(\cdot)$, with $X$ being $G$, $U$ or $L$, denotes a scalar quantization using $B$ bits and an optimal non-uniform codebook for an input signal with a Gaussian, uniform or Laplacian probability density function (PDF), respectively. For simplicity, we will refer to them as Gaussian, uniform or Laplacian quantization, respectively. The remaining calligraphic letters denote sets, and $|\mathcal{A}|$ denotes the cardinality of the set $\mathcal{A}$. Finally, $\mathbb{E}[\cdot]$ denotes the expectation operator.

2. System model

We consider the downlink of a cluster formed by sectors of different BSs. Note that a static cluster is assumed for simplicity, although a dynamic one could also be considered [24]. Each sector is equipped with a linear array of $N_t$ antennas (see Figure 1). The system serves $J \le B N_t$ single-antenna UEs, which share all the available OFDM subcarriers in the cluster under consideration through spatial multiplexing.

Figure 1. Example of a coordinated cluster of $B = 3$ cooperating sectors and $N_t$ transmit antennas per sector.

Assuming a JP system with synchronous reception, a cyclic prefix longer than the maximum delay of any channel path, a flat fading channel per subcarrier (OFDM turns a broadband frequency selective channel into a set of narrowband frequency flat subchannels) and neglecting the interference from outside the cluster, we can express the received signal at the $j$th UE for the $k$th subcarrier as:

$$y_j[k] = \sum_{b=1}^{B} \mathbf{h}_{j,b}^{T}[k]\,\mathbf{x}_b[k] + n_j[k], \tag{1}$$

where vector $\mathbf{h}_{j,b}[k] \in \mathbb{C}^{N_t \times 1}$ represents the channel vector between the $b$th sector of the cluster and the $j$th UE, vector $\mathbf{x}_b[k] \in \mathbb{C}^{N_t \times 1}$ includes the precoded information symbols transmitted by the $b$th sector and $n_j[k]$ is the noise component at the $j$th UE, all at the $k$th subcarrier. Assuming all cooperating sectors are interconnected and connected to a central unit, global CSI and user data are available at the transmitter side. Thus, the cluster can be seen as a distributed antenna array and JP can be performed. Aggregating the channel matrices of the cooperating sectors, the received signal for the $J$ users can be expressed as:

$$\mathbf{y}[k] = \mathbf{H}[k]\,\mathbf{x}[k] + \mathbf{n}[k], \tag{2}$$

where $\mathbf{H}[k] = [\mathbf{H}_1[k], \ldots, \mathbf{H}_B[k]] \in \mathbb{C}^{J \times (B N_t)}$ is the aggregated channel matrix and $\mathbf{H}_b[k] = [\mathbf{h}_{1,b}[k] \cdots \mathbf{h}_{J,b}[k]]^{T} \in \mathbb{C}^{J \times N_t}$. Vectors $\mathbf{y}[k] \in \mathbb{C}^{J \times 1}$ and $\mathbf{n}[k] \in \mathbb{C}^{J \times 1}$ collect the received symbols and the noise components, respectively, for the $J$ UEs in the system. Vector $\mathbf{x}[k] = [\mathbf{x}_1^{T}[k], \ldots, \mathbf{x}_B^{T}[k]]^{T} \in \mathbb{C}^{(B N_t) \times 1}$ collects the precoded signals of the different sectors, which are obtained from the precoding techniques analyzed in Section 4. Vector $\mathbf{n}[k]$ is the received circular complex additive white Gaussian noise with zero mean and variance $\sigma_n^2$.
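The equivalence between the per-sector form of Equation (1) and the aggregated form of Equation (2) can be checked numerically. The sketch below is only illustrative: the i.i.d. Gaussian channel entries, the noise variance and the helper name `crandn` are assumptions and not part of the system model of this article.

```python
import numpy as np

rng = np.random.default_rng(0)
B, N_t, J = 3, 4, 12      # sectors, antennas per sector, single-antenna UEs (J <= B * N_t)
sigma_n2 = 0.1            # assumed noise variance

def crandn(*shape):
    """Circular complex Gaussian samples with unit variance (hypothetical helper)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Per-sector channel matrices H_b[k] (J x N_t); i.i.d. entries are a placeholder model.
H_b = [crandn(J, N_t) for _ in range(B)]
x_b = [crandn(N_t) for _ in range(B)]     # precoded symbols of each sector
n = np.sqrt(sigma_n2) * crandn(J)         # noise at the J UEs

# Equation (1), stacked over the users: sum of the sector contributions plus noise.
y_per_sector = sum(H_b[b] @ x_b[b] for b in range(B)) + n

# Equation (2): aggregated channel H = [H_1, ..., H_B] and x = [x_1^T, ..., x_B^T]^T.
H = np.hstack(H_b)          # J x (B * N_t)
x = np.concatenate(x_b)     # (B * N_t,)
y = H @ x + n
```

Both forms give the same received vector, which is what allows the cluster to be treated as a single distributed array of $B N_t$ antennas.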

2.1. Pilot symbol allocation

The LTE slot, also used in LTE-Advanced, has a duration of 0.5 ms and is composed of seven OFDM symbols [12], whereas the LTE subframe consists of two LTE slots. Each OFDM symbol contains $N_{\mathrm{IFFT}}$ subcarriers. The subcarrier spacing is $\Delta f = 15$ kHz and remains constant for the different bandwidth configurations. The sampling frequency $f_s$ is proportional to $N_{\mathrm{IFFT}}$. However, not all the subcarriers are modulated: only $K$ of the $N_{\mathrm{IFFT}}$ subcarriers are used, and they are placed around the zero frequency in the baseband spectrum. Unmodulated subcarriers are placed at the edges as a guard band. In LTE-Advanced, channel state information reference signals (CSI-RSs) have been introduced to support up to eight transmit antennas [25]. However, for backward compatibility, the CSI-RSs must be placed in resource elements (REs) that do not contain cell-specific reference signals (CRSs) or user equipment specific reference signals (UE-RSs) [25].
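The relation between the sampling frequency and the IFFT size can be made concrete with a short calculation. The values $N_{\mathrm{IFFT}} = 2048$ and $K = 1200$ used below are an assumption corresponding to a typical 20 MHz LTE configuration; only the 15 kHz spacing comes from the text.

```python
DELTA_F_HZ = 15e3   # subcarrier spacing stated in the text

def sampling_frequency(n_ifft: int) -> float:
    """Sampling frequency f_s = N_IFFT * delta_f, in Hz."""
    return n_ifft * DELTA_F_HZ

# Assumed 20 MHz configuration: N_IFFT = 2048 points, K = 1200 modulated subcarriers.
N_IFFT, K = 2048, 1200
f_s = sampling_frequency(N_IFFT)        # 30.72 MHz
unmodulated = N_IFFT - K                # guard-band (and DC) subcarriers left unused
occupied_bandwidth_hz = K * DELTA_F_HZ  # 18 MHz of the 20 MHz channel
```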

In this article we consider a coordinated cluster of $B = 3$ cooperating 120° sectors, indexed by $\mathcal{B} = \{1, 2, 3\}$, with $N_t = 4$ transmit antennas per sector. In this case, there are $B N_t = 12$ transmit antennas in the cluster, so the LTE-Advanced CSI-RS pattern must be extended in order to allocate enough pilot symbols to perform the composite channel estimation. Figure 2 shows the pilot symbol allocation grid proposed for the coordinated cluster under consideration. In this figure, the positions of the CSI-RSs of the different transmit antennas within the set of used subcarriers are depicted. A frequency-division multiplexing (FDM) scheme is used to transmit the CSI-RSs of the different pairs of transmit antennas (1-2 and 3-4 in each sector) in the coordinated cluster. To separate the RSs of the two antennas of each pair, either code-division multiplexing (CDM) or time-division multiplexing (TDM) could be used. In particular, CDM with a code length spanning two resource elements in the time domain is proposed in Rel. 10 [25].

Figure 2. Proposed pilot symbol allocation in an RB of LTE-Advanced [25] for a cluster of $B = 3$ sectors with $N_t = 4$ transmit antennas per sector.

More advanced pilot allocation schemes using combinations of FDM, TDM and CDM are presented in [26–28]. However, the evaluation of the different pilot allocation schemes is out of the scope of this article, since the main objective is the design of a limited feedback scheme for the coordinated cluster presented in Section 2. In the remainder of the article, we assume that the UE obtains an error-free channel estimate through a simple least squares (LS) estimation [29]. It should be noted that the presence of a guard band with unmodulated subcarriers causes an ill-conditioning problem in the LS estimation. Thus, solutions such as the ones presented in [30, 31] need to be applied in order to achieve an accurate estimation.

3. Limited feedback scheme

As stated in Section 1, reliable CSI plays an important role in wireless communication systems. Limited feedback schemes for MIMO and MU-MIMO systems have been extensively studied in the literature [32]. However, although the MU-MIMO and CoMP MU-MIMO channel representations are quite similar, some important differences between them should be pointed out [21]. In coordinated clusters, users can experience different path loss coefficients in the channels from the different BSs. For this reason, different per-cell codebooks are used in [33]. Another important difference is that the channel information in CoMP systems is usually larger, since there can be up to $B N_t$ transmit antennas instead of $N_t$. In [24, 34], clustering techniques are proposed to reduce the overhead requirements. The slow variation of the channel within the coherence time can also be exploited in limited feedback schemes. In [35], a hierarchical codebook design method that makes use of the temporal correlation is proposed to reduce the feedback overhead in coordinated clusters.

The limited feedback scheme presented in this article exploits the spatial correlation of the antennas in each sector array. Since this scheme reduces the feedback overhead of one BS, it can be independently applied to the different BSs of the coordinated cluster. An analysis of the relation between the multipath channels of the different antennas in the spatial channel model (SCM) [36] from the Third Generation Partnership Project (3GPP) has been performed. This stochastic channel model, which can be classified in the category of parametric stochastic models, characterizes the MIMO channel from parameters such as angle of departure (AoD), as seen in Figure 3.

Figure 3. Angular parameters in SCM specifications (see [37] for more details).

Following [37], the coefficients of the $N$-path multipath channel between each single-antenna UE and the $s$th antenna of the sector array are given by:

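$$h_{s,n} = \sqrt{\frac{P_n\,\sigma_{PL}\,\sigma_{SF}}{M}} \sum_{m=1}^{M} \sqrt{G_{n,m}^{BS}}\, \exp\!\left(j\left[k d_s \sin(\theta_{n,m,\mathrm{AoD}}) + \Phi_{n,m}\right]\right), \quad s = 1, \ldots, 4, \tag{3}$$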

where $P_n$ is the normalized power of the $n$th path (the total power of the $N$ paths is equal to one, $\sum_{n=1}^{N} P_n = 1$), and $\sigma_{PL}$ and $\sigma_{SF}$ are the parameters related to the path loss and the lognormal shadow fading, respectively. Constants $N$ and $M$ are the total number of paths and of subpaths per path, respectively, and $d_s$ is the distance in meters from the $s$th antenna to the first antenna ($s = 1$). Parameters $\theta_{n,m,\mathrm{AoD}}$ and $\Phi_{n,m}$ are the AoD and the phase of the $m$th subpath of the $n$th path, respectively, $k = 2\pi/\lambda$ is the wave number and $\lambda$ is the carrier wavelength in meters. To simplify the analysis, we assume that the antenna gain of the sector array is the same for the different subpaths, since the angular spread at the BS is only 2° for macrocell environments and good candidates for CoMP are users located close to the cell edge (or around the cluster center):

$$G_{n,m}^{BS} = G_{BS}(\theta_{n,m,\mathrm{AoD}}) = G_{BS}(\theta_{BS} + \delta_{n,\mathrm{AoD}}) = G_{n}^{BS}. \tag{4}$$

One of the characteristics of the SCM channel is that the channel is generated without explicitly setting any spatial correlation parameter. A more detailed analysis of the spatial correlation in the SCM channel can be found in [38]. This study shows that the spatial cross-correlation function of the SCM is related to the joint distribution of the angle of arrival (AoA) and the AoD through the different paths and subpaths.

According to expression (3), the ratio between the coefficients of two different antennas $s$ and $s'$ in the same sector array for the $n$th path can be expressed as:

$$\frac{h_{s,n}}{h_{s',n}} = \frac{\sum_{m=1}^{M} \exp\!\left(j\left[k d_s \sin(\theta_{n,m,\mathrm{AoD}}) + \Phi_{n,m}\right]\right)}{\sum_{m=1}^{M} \exp\!\left(j\left[k d_{s'} \sin(\theta_{n,m,\mathrm{AoD}}) + \Phi_{n,m}\right]\right)}. \tag{5}$$

Note that the SCM assumes the same path loss and shadow fading for the channels of the antennas in the same sector array. Particularizing this expression to the case of $M = 1$ subpath as an illustration, the coefficient ratio becomes:

$$\frac{h_{s,n}}{h_{s',n}} = \exp\!\left(j\left[k\,(d_s - d_{s'}) \sin(\theta_{n,\mathrm{AoD}})\right]\right), \quad s \neq s'. \tag{6}$$

Analyzing this expression, we can observe that $|h_{s,n}/h_{s',n}| = 1$ for any $s \neq s'$. Therefore, for the case of $M = 1$, the feedback information could be reduced, since only the magnitude of one antenna would need to be fed back. However, $M = 20$ is the only value supported in the SCM specification [39]. From expression (5), we can see that the magnitude of the ratio between channel coefficients, $|h_{s,n}/h_{s',n}|$, cannot be considered equal to 1 in this case. As stated previously, the correlation between $h_{s,n}$ and $h_{s',n}$ is due to the joint distribution of AoA and AoD, and it cannot be explicitly specified. However, it can be observed that, for $M = 20$, this correlation implies that the magnitude of the ratio can be approximated by a random variable following a Laplacian distribution centered at 1, as seen in Figure 4. This figure and the following results have been obtained from a Monte Carlo simulation with 5,000 generated channels for the cluster under consideration in a suburban macro scenario [20] with an antenna spacing of $\lambda/2$ [37]. Note that the estimated PDF shows a higher variance as the distance between the $s$th antenna and the reference antenna $s'$ increases. Thus, the most suitable reference antennas are the central ones, $s' = 2, 3$.

Figure 4. PDF of $|h_{s,n}/h_{s',n}|$ for $M = 20$ subpaths and reference antenna $s' = 2$.
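The behaviour of the ratio in Equations (5) and (6) can be explored with a small Monte Carlo sketch. This is a strongly simplified stand-in for the SCM and does not reproduce the reported statistics: the subpath AoDs are drawn as Gaussian offsets with a 2° spread around a mean AoD that is uniform over the sector, the antenna gain, path loss and shadowing are dropped, and the function name and parameters are hypothetical.

```python
import numpy as np

def ratio_magnitudes(M, trials=5000, s=0, s_ref=1, spread_deg=2.0, seed=0):
    """Samples of |h_s / h_s'| for one path with M subpaths (simplified model).

    Antennas are indexed from 0 and spaced lambda/2 apart, so k * d_s = pi * s.
    """
    rng = np.random.default_rng(seed)
    # Assumed angle model: mean AoD uniform in +/-60 degrees, Gaussian subpath offsets.
    mean_aod = rng.uniform(-np.pi / 3, np.pi / 3, size=(trials, 1))
    theta = mean_aod + np.deg2rad(spread_deg) * rng.standard_normal((trials, M))
    phi = rng.uniform(-np.pi, np.pi, size=(trials, M))   # subpath phases

    def coeff(index):
        # Sum over subpaths as in Equation (3), with unit gain and unit path power.
        return np.exp(1j * (np.pi * index * np.sin(theta) + phi)).sum(axis=1) / np.sqrt(M)

    return np.abs(coeff(s) / coeff(s_ref))

single_subpath = ratio_magnitudes(M=1)     # Equation (6): magnitude is exactly one
twenty_subpaths = ratio_magnitudes(M=20)   # Equation (5): magnitude spreads around one
```

With one subpath the magnitude of the ratio is identically one, whereas with twenty subpaths it fluctuates, which is the effect that the Laplacian approximation in the text is meant to capture.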

The estimated PDFs of the different parameters of the channel coefficients are shown in Figure 5. For the sake of simplicity, the effects of path loss and shadow fading have not been taken into account, since they are usually quantized and fed back separately. It can be seen that the real and imaginary parts of each channel coefficient show a Gaussian distribution centered at 0 with variance:

$$\mathrm{var}(\mathcal{R}\{h_{s,n}\}) = \mathrm{var}(\mathcal{I}\{h_{s,n}\}) = \sigma_{R-I}^2 \approx 0.085. \tag{7}$$

Figure 5. Estimated PDF of the different CIR parameters for $M = 20$ subpaths, $s' = 2$ and $s = 1, 3, 4$.

Component $|h_{s,n}|$ is the magnitude of a complex value whose real and imaginary parts are normally distributed with the same variance. Since the maximum variation of $\theta_{n,m,\mathrm{AoD}}$ in each sector is 120°, the real and imaginary parts are not completely uncorrelated. Thus, $|h_{s,n}|$ does not follow a strict Rayleigh distribution, but it can be approximated by a Rayleigh distribution with mean $\sqrt{\pi/2}\,\sigma_{R-I} \approx 0.363$ and variance:

$$\mathrm{var}(|h_{s,n}|) \approx 0.049. \tag{8}$$

The phase of the CIR coefficients shows a uniform distribution in [-π, π), therefore its variance can be expressed as:

$$\mathrm{var}(\angle(h_{s,n})) = \pi^2/3 \approx 3.290. \tag{9}$$
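The closed-form values quoted above can be cross-checked with a few lines of arithmetic. The sketch uses the ideal Rayleigh and uniform formulas, so small deviations from the simulation-based estimates in the text are expected; the variable names are illustrative only.

```python
import math

sigma2_ri = 0.085                  # variance of the real/imaginary parts, Equation (7)
sigma_ri = math.sqrt(sigma2_ri)

# Ideal Rayleigh magnitude built from two independent N(0, sigma^2) components.
rayleigh_mean = math.sqrt(math.pi / 2) * sigma_ri
rayleigh_var = (4 - math.pi) / 2 * sigma2_ri

# Uniform phase on [-pi, pi): variance = (interval width)^2 / 12 = pi^2 / 3.
phase_var = (2 * math.pi) ** 2 / 12
```

The ideal Rayleigh variance obtained this way is somewhat below the estimate in Equation (8), which is consistent with the remark that the magnitude does not follow a strict Rayleigh distribution.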

Finally, the magnitude of the ratio between non-reference and reference channel coefficients, $|h_{s,n}/h_{s',n}|$, can be approximated by a random variable with a Laplacian distribution centered at one, which has the lowest variance among the analyzed parameters:

$$\mathrm{var}(|h_{s,n}/h_{s',n}|) \approx 0.016. \tag{10}$$

The proposed limited feedback scheme takes advantage of the reduced variance of the parameter $|h_{s,n}/h_{s',n}|$ in comparison to $\sigma_{R-I}^2$, which is due to the spatial correlation. The steps of the scheme are summarized in Table 1. First, a Gaussian quantization [40] is performed over the real and imaginary parts of the reference antenna CIR using $B_R$ bits (Equation (1.A) in Table 1). Since $h_{s',n}$ is the reference component, a higher number of bits must be used to quantize it in order to reduce its quantization error as much as possible. Then, instead of quantizing the real and imaginary parts of the coefficients of the remaining antennas of the sector array, $|h_{s,n}/\tilde{h}_{s',n}|$ and $\angle(h_{s,n})$ are quantized, as shown in Equations (1.B) and (1.C) in Table 1. In these equations, $Q_{B_M}^{L}$ denotes a Laplacian quantization [41] with $B_M$ bits, whereas $Q_{B_P}^{U}$ stands for a uniform quantization with $B_P$ bits. The number of bits used to represent the quantized value $\tilde{h}_{s,n}^{\mathrm{dif}}$, $B_M$, can be lower than the number of bits used to quantize $\mathcal{R}\{h_{s',n}\}$ and $\mathcal{I}\{h_{s',n}\}$, $B_R$, due to the significantly lower variance of $|h_{s,n}/\tilde{h}_{s',n}|$. We refer to this scheme as DQ.

Table 1 DQ feedback scheme

One important point to note here is that, in order to reduce the quantization error, $\tilde{h}_{s,n}^{\mathrm{dif}}$ is obtained from the quantized version of $h_{s',n}$, that is, $\tilde{h}_{s',n}$. Since the BS only knows $\tilde{h}_{s',n}$, it can then reconstruct the coefficient with a lower quantization error. On the other hand, the parameter $\angle(h_{s,n})$ is uniformly distributed in $[-\pi, \pi)$, hence it is quantized through uniform quantization. Quantizing $\angle(h_{s,n}/h_{s',n})$ instead of $\angle(h_{s,n})$ brings no benefit, since both variances are similar. Therefore, additional mathematical operations can be avoided by quantizing $\angle(h_{s,n})$ directly. The coefficients reconstructed at the BS once $\tilde{h}_{s,n}^{\mathrm{dif}}$ and $\tilde{\gamma}$ have been received are expressed in Equation (1.D) in Table 1.

Finally, in order to improve the stability of the quantization of $|h_{s,n}/\tilde{h}_{s',n}|$, the UE can choose as reference the central antenna ($s' = 2, 3$) with the greater $|\tilde{h}_{s',n}|$, using only one additional bit to feed back this choice.
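The DQ steps described above can be sketched as follows. Since Table 1 is not reproduced here, the four steps marked (1.A) to (1.D) are an interpretation of the prose and not the authors' exact equations. The codebooks are trained with Lloyd's algorithm on synthetic Gaussian and Laplacian samples as a stand-in for the optimal codebooks of [40, 41], and the bit allocation, function names and toy channel taps are all assumptions.

```python
import numpy as np

def lloyd_codebook(samples, bits, iters=30):
    """Train a 2**bits-level scalar codebook on 1-D training samples (Lloyd's algorithm)."""
    levels = 2 ** bits
    # Start from evenly spaced quantiles so that every cell is initially populated.
    codebook = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        nearest = np.abs(samples[:, None] - codebook[None, :]).argmin(axis=1)
        for i in range(levels):
            cell = samples[nearest == i]
            if cell.size:                      # keep the old level if a cell empties
                codebook[i] = cell.mean()
    return np.sort(codebook)

def quantize(x, codebook):
    """Map each entry of x to the nearest codebook level."""
    x = np.asarray(x, dtype=float)
    return codebook[np.abs(x[..., None] - codebook).argmin(axis=-1)]

def quantize_phase(phase, bits):
    """Uniform quantization of a phase in [-pi, pi] to the centre of its cell."""
    levels = 2 ** bits
    step = 2 * np.pi / levels
    idx = np.floor((phase + np.pi) / step).astype(int) % levels
    return -np.pi + (idx + 0.5) * step

def dq_reconstruct(h, s_ref, cb_ref, cb_ratio, bits_phase):
    """Quantize the taps h (antennas x paths) as the UE would, rebuild them as the BS would."""
    ref_q = quantize(h[s_ref].real, cb_ref) + 1j * quantize(h[s_ref].imag, cb_ref)  # assumed (1.A)
    h_hat = np.empty_like(h)
    h_hat[s_ref] = ref_q
    for s in range(h.shape[0]):
        if s == s_ref:
            continue
        ratio_q = quantize(np.abs(h[s] / ref_q), cb_ratio)           # assumed (1.B)
        phase_q = quantize_phase(np.angle(h[s]), bits_phase)         # assumed (1.C)
        h_hat[s] = ratio_q * np.abs(ref_q) * np.exp(1j * phase_q)    # assumed (1.D)
    return h_hat

# Toy usage with training statistics taken from Equations (7) and (10).
rng = np.random.default_rng(0)
B_R, B_M, B_P = 5, 3, 4      # assumed bit allocation
cb_ref = lloyd_codebook(rng.normal(0.0, np.sqrt(0.085), 10000), B_R)
cb_ratio = lloyd_codebook(rng.laplace(1.0, np.sqrt(0.016 / 2), 10000), B_M)

# Placeholder taps, i.i.d. across antennas (no spatial correlation is modelled here).
h = np.sqrt(0.085) * (rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6)))
h_hat = dq_reconstruct(h, s_ref=1, cb_ref=cb_ref, cb_ratio=cb_ratio, bits_phase=B_P)
```

Because the toy taps are independent across antennas, the ratio quantizer is poorly matched in this example; the benefit described in the text relies on the spatial correlation of a real sector array.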

4. Impact on precoding techniques

In the cluster under consideration, users share all $K$ available subcarriers. Since users have only one antenna and cannot cooperate, multiuser interference must be canceled at the transmitter side. This is the task of precoding techniques. If the number of users is bounded by the total number of transmit antennas in the cluster, $J \le B N_t$, and perfect CSI is available at the BSs, multiuser interference can be completely canceled in transmission. However, the feedback scheme introduces a quantization error, so only imperfect CSI is available at the transmitter side and the interference cannot be fully canceled. Through the discrete Fourier transform of the reconstructed CIR at the BS (Equation (1.D) in Table 1), and applying the central limit theorem since the different paths are independent, the error in the MIMO CFR matrix of each BS can be modeled as an additive Gaussian error matrix. Assuming the same quantization error for the different UEs and BSs, the reconstructed CFR matrix can be expressed as:

$$\tilde{\mathbf{H}}[k] = \mathbf{H}[k] + \mathbf{E}[k], \tag{11}$$

where $\tilde{\mathbf{H}}[k]$ and $\mathbf{H}[k]$ are the estimated aggregated channel matrix at the BS and the true aggregated channel, respectively, for the $k$th subcarrier. Matrix $\mathbf{E}[k]$, whose entries are i.i.d. and follow a $\mathcal{CN}(0, \sigma_e^2)$ distribution, represents the additive error in the channel matrix due to the channel quantization. Thus, the precoder at the BS is designed from $\tilde{\mathbf{H}}[k]$ instead of $\mathbf{H}[k]$.
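The additive error model of Equation (11) is straightforward to simulate. In the sketch below, the matrix size, the error variance and the function name are illustrative assumptions.

```python
import numpy as np

def add_quantization_error(H, sigma_e2, rng):
    """Return (H_tilde, E) with H_tilde = H + E and i.i.d. CN(0, sigma_e2) entries in E."""
    noise = rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape)
    E = np.sqrt(sigma_e2 / 2.0) * noise     # half of the variance per real dimension
    return H + E, E

rng = np.random.default_rng(1)
# Placeholder channel; a large matrix is used only to obtain a stable variance estimate.
H = (rng.standard_normal((200, 200)) + 1j * rng.standard_normal((200, 200))) / np.sqrt(2)
H_tilde, E = add_quantization_error(H, sigma_e2=0.01, rng=rng)
measured_sigma_e2 = np.mean(np.abs(E) ** 2)
```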

In the following subsections, different precoding techniques, such as ZF, THP and LRTHP, are described and the effect of quantization error on them is analyzed. For the sake of simplicity, the frequency indexes k are omitted since the precoding process is performed over each subcarrier separately.

4.1. Zero-forcing

Channel inversion is known as ZF precoding when it is performed at the transmitter. This technique was proposed for UEs with a single receive antenna and suffers from a power enhancement when the channel matrix is ill-conditioned [4]. In ZF precoding, the beamforming matrix is obtained from the Moore-Penrose pseudo-inverse:

$$\mathbf{W} = \tilde{\mathbf{H}}^{\dagger} = \tilde{\mathbf{H}}^{H}\left(\tilde{\mathbf{H}}\tilde{\mathbf{H}}^{H}\right)^{-1}. \tag{12}$$

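Figure 6. ZF precoding scheme.

The precoded signal can be expressed as (Figure 6):

$$\mathbf{x} = \beta\,\mathbf{W}\mathbf{s} = \beta\,\tilde{\mathbf{H}}^{\dagger}\mathbf{s} = \beta\,\tilde{\mathbf{H}}^{H}\left(\tilde{\mathbf{H}}\tilde{\mathbf{H}}^{H}\right)^{-1}\mathbf{s}, \tag{13}$$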

where $\mathbf{s}$ is a $J \times 1$ vector that contains the original data symbols of the $J$ users and $\beta$ limits the sum-power of all the sector arrays, $\sum_{b=1}^{B} \|\mathbf{x}_b\|^2 = \|\mathbf{x}\|^2$. Since the actual transmit power depends on the data $\mathbf{s}$, $\beta$ is chosen to ensure a certain average sum-power, $\mathbb{E}[\|\mathbf{x}\|^2] = P$. Its value is determined by:

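$$\beta = \sqrt{\frac{P/\sigma_s^2}{\operatorname{tr}\left(\left(\tilde{\mathbf{H}}\tilde{\mathbf{H}}^{H}\right)^{-1}\right)}} = \sqrt{\frac{P/\sigma_s^2}{\sum_{j=1}^{J} 1/\lambda_j^2}}, \tag{14}$$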

where $\lambda_j$ is the $j$th singular value of $\tilde{\mathbf{H}}$ and $\sigma_s^2$ is the power of the original signal, $\mathbf{s}$. In the case of an $M$-QAM modulated signal with odd integer components, its mean power is given by:

$$\sigma_s^2 = \frac{2(M-1)}{3}. \tag{15}$$

More realistic power constraints, such as per-BS power constraints [42] or per-antenna power constraints [43], are needed in the implementation of real systems. However, since this is not the main objective of this article, a sum-power constraint is imposed and $\beta$ follows Equation (14). Thus, the signal received by the users can be expressed as:

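$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} = \beta\,\mathbf{H}\tilde{\mathbf{H}}^{\dagger}\mathbf{s} + \mathbf{n} = \beta\,(\tilde{\mathbf{H}} - \mathbf{E})\,\tilde{\mathbf{H}}^{\dagger}\mathbf{s} + \mathbf{n} = \beta\,\mathbf{s} - \beta\,\mathbf{E}\tilde{\mathbf{H}}^{\dagger}\mathbf{s} + \mathbf{n}. \tag{16}$$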

The autocorrelation matrix of the interference-plus-noise term is given by:

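$$\mathbf{R}_{i+n} = \mathbb{E}\left[\left(-\beta\,\mathbf{E}\tilde{\mathbf{H}}^{\dagger}\mathbf{s} + \mathbf{n}\right)\left(-\beta\,\mathbf{E}\tilde{\mathbf{H}}^{\dagger}\mathbf{s} + \mathbf{n}\right)^{H}\right] = \beta^2\sigma_s^2\,\mathbb{E}\left[\mathbf{E}\tilde{\mathbf{H}}^{\dagger}(\tilde{\mathbf{H}}^{\dagger})^{H}\mathbf{E}^{H}\right] + \sigma_n^2\,\mathbf{I} = \beta^2\sigma_s^2\sigma_e^2\sum_{j=1}^{J}\frac{1}{\lambda_j^2}\,\mathbf{I} + \sigma_n^2\,\mathbf{I} = \left(P\sigma_e^2 + \sigma_n^2\right)\mathbf{I}, \tag{17}$$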

where the singular value decomposition (SVD) of $\tilde{\mathbf{H}}$ has been used, i.e., $\tilde{\mathbf{H}} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{H}$, together with the property that the statistics of matrix $\mathbf{E}$ do not change when it is multiplied by a unitary matrix such as $\mathbf{V}$. Thus, the signal to interference-plus-noise ratio (SINR) of each user and the sum-rate of the cluster can be expressed respectively as:

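$$\mathrm{SINR}_{ZF} = \frac{\beta^2\sigma_s^2}{P\sigma_e^2 + \sigma_n^2} = \frac{1}{\left(\sigma_e^2 + \sigma_n^2/P\right)\sum_{j=1}^{J} 1/\lambda_j^2}, \tag{18}$$

$$R_{ZF} = J \log_2\left(1 + \frac{1}{\left(\sigma_e^2 + \sigma_n^2/P\right)\sum_{j=1}^{J} 1/\lambda_j^2}\right). \tag{19}$$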
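Equations (12), (14), (18) and (19) translate almost directly into NumPy. The following is a minimal sketch under assumed parameters: the random channel, the 16-QAM power, the values of $P$, $\sigma_e^2$ and $\sigma_n^2$, and the function names are illustrative and not taken from the simulations of this article.

```python
import numpy as np

def zf_precoder(H_tilde, P, sigma_s2):
    """Beamforming matrix W of Equation (12) and power scaling beta of Equation (14)."""
    W = H_tilde.conj().T @ np.linalg.inv(H_tilde @ H_tilde.conj().T)
    sv = np.linalg.svd(H_tilde, compute_uv=False)       # singular values lambda_j
    beta = np.sqrt((P / sigma_s2) / np.sum(1.0 / sv ** 2))
    return W, beta

def zf_sum_rate(H_tilde, P, sigma_e2, sigma_n2):
    """Sum-rate of Equation (19) and per-user SINR of Equation (18)."""
    sv = np.linalg.svd(H_tilde, compute_uv=False)
    sinr = 1.0 / ((sigma_e2 + sigma_n2 / P) * np.sum(1.0 / sv ** 2))
    return H_tilde.shape[0] * np.log2(1.0 + sinr), sinr

rng = np.random.default_rng(2)
J, N = 4, 12                                   # users and total transmit antennas (assumed)
H_tilde = (rng.standard_normal((J, N)) + 1j * rng.standard_normal((J, N))) / np.sqrt(2)
M_qam = 16
sigma_s2 = 2 * (M_qam - 1) / 3                 # Equation (15)
P, sigma_e2, sigma_n2 = 1.0, 0.01, 0.1         # assumed power, error and noise levels

W, beta = zf_precoder(H_tilde, P, sigma_s2)
rate, sinr = zf_sum_rate(H_tilde, P, sigma_e2, sigma_n2)
```

The product `H_tilde @ W` is the identity, so the interference is removed with respect to the quantized channel; the residual term in Equation (16) appears only because the true channel differs from `H_tilde`.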

4.2. Tomlinson-Harashima precoding

The main drawback of linear precoding techniques is that they need a high transmit power to ensure a certain quality of service when the channel matrix is ill-conditioned. Figure 7 shows the scheme of THP, which makes use of the modulo operator to reduce the power of the transmitted signal compared to a linear precoding scheme [5].

Figure 7. Tomlinson-Harashima precoding scheme.

This technique can be seen as a scalar integer perturbation of the transmitted signal that reduces its power and allows cancelation of the interference when the modulo operator is applied at the receiver. The modulo operation makes it possible to reduce the transmit power by folding the symbols back into the boundary region of the $M$-QAM constellation. It can be modeled as the addition of integer multiples of $2\sqrt{M}$ to the real and imaginary parts of the original signal before the linear filtering:

$$\tilde{\mathbf{x}} = \mathbf{G}\mathbf{L}^{-1}(\mathbf{s} + \mathbf{p}), \tag{20}$$

where matrix $\mathbf{L}$ is a $J \times J$ lower triangular matrix obtained through an LQ-decomposition of the channel matrix $\tilde{\mathbf{H}}$, that is, $\tilde{\mathbf{H}} = \mathbf{L}\mathbf{Q}$. Matrix $\mathbf{G} = \mathrm{diag}[l_{11}, \ldots, l_{JJ}]$ is a diagonal matrix that contains the elements of the diagonal of $\mathbf{L}$. Vector $\mathbf{p}$ is the perturbation vector, with components $p_j = 2\sqrt{M}\,(p_R + j\,p_I)$, $p_R, p_I \in \mathbb{Z}$. The transmitted signal can be expressed as:

$$\mathbf{x} = \beta\,\mathbf{Q}^{H}\mathbf{L}^{-1}(\mathbf{s} + \mathbf{p}), \tag{21}$$

where $\mathbf{Q}$ is a $J \times B N_t$ unitary matrix obtained in the previous LQ-decomposition. Constant $\beta$, as in the ZF case, is used to ensure a certain sum-power constraint, $\mathbb{E}[\|\mathbf{x}\|^2] = P$, and its value is now determined by:

$$\beta = \sqrt{\frac{M-1}{M}\,\frac{P/\sigma_s^2}{\sum_{j=1}^{J} 1/l_{jj}^2}}. \tag{22}$$

The first factor compensates for the slight power increase of $\tilde{\mathbf{x}}$ with respect to $\mathbf{s}$, since signal $\tilde{\mathbf{x}}$ is uniformly distributed over the boundary region of an $M$-QAM modulated signal [44]. However, this power increase is not very significant for high-order modulations. The signal received by the users can be expressed as:

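$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} = (\tilde{\mathbf{H}} - \mathbf{E})\,\mathbf{x} + \mathbf{n} = \beta\,(\mathbf{s} + \mathbf{p}) - \beta\,\mathbf{E}\mathbf{Q}^{H}\mathbf{G}^{-1}\tilde{\mathbf{x}} + \mathbf{n}. \tag{23}$$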

Disregarding the perturbation vector $\mathbf{p}$, since it is removed at the receiver by the modulo operation, and considering that $\tilde{\mathbf{H}}^{\dagger} = \mathbf{Q}^{H}\mathbf{L}^{-1}$, it is straightforward to show that the autocorrelation matrix of the interference-plus-noise term has the same expression as in Equation (17), that is, $\mathbf{R}_{i+n} = (P\sigma_e^2 + \sigma_n^2)\,\mathbf{I}$. Therefore, the SINR of each user and the sum-rate of the cluster can be expressed respectively as:

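$$\mathrm{SINR}_{THP} = \frac{\beta^2\sigma_s^2}{P\sigma_e^2 + \sigma_n^2} = \frac{(M-1)/M}{\left(\sigma_e^2 + \sigma_n^2/P\right)\sum_{j=1}^{J} 1/l_{jj}^2}, \tag{24}$$

$$R_{THP} = J \log_2\left(1 + \frac{(M-1)/M}{\left(\sigma_e^2 + \sigma_n^2/P\right)\sum_{j=1}^{J} 1/l_{jj}^2}\right). \tag{25}$$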
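The THP chain of Equations (20) to (25) can be sketched with a successive pre-subtraction loop. This is a minimal illustration rather than the implementation used for the results: the LQ-decomposition is obtained from a QR factorization of $\tilde{\mathbf{H}}^{H}$, the modulo base is $2\sqrt{M}$, and the channel, the symbols and all function names are assumptions.

```python
import numpy as np

def lq_decomposition(H):
    """Return (L, Q) with H = L @ Q, L lower triangular and Q with orthonormal rows."""
    Q1, R1 = np.linalg.qr(H.conj().T)          # H^H = Q1 R1
    return R1.conj().T, Q1.conj().T

def modulo(z, M):
    """Fold real and imaginary parts into [-sqrt(M), sqrt(M))."""
    tau = 2 * np.sqrt(M)
    fold = lambda v: v - tau * np.floor(v / tau + 0.5)
    return fold(np.real(z)) + 1j * fold(np.imag(z))

def thp_precode(H_tilde, s, M, P):
    """Transmit vector x of Equation (21) and scaling beta of Equation (22)."""
    L, Q = lq_decomposition(H_tilde)
    g = np.diag(L)
    feedback = L / g[None, :]                  # L G^{-1}, unit-diagonal lower triangular
    x_tilde = np.zeros(len(s), dtype=complex)
    for j in range(len(s)):
        # Pre-subtract the interference of the already encoded users, then fold.
        x_tilde[j] = modulo(s[j] - feedback[j, :j] @ x_tilde[:j], M)
    sigma_s2 = 2 * (M - 1) / 3                 # Equation (15)
    beta = np.sqrt((M - 1) / M * (P / sigma_s2) / np.sum(1.0 / np.abs(g) ** 2))
    return beta * Q.conj().T @ (x_tilde / g), beta

def thp_sum_rate(H_tilde, M, P, sigma_e2, sigma_n2):
    """Sum-rate of Equation (25)."""
    L, _ = lq_decomposition(H_tilde)
    inv_l2 = np.sum(1.0 / np.abs(np.diag(L)) ** 2)
    sinr = ((M - 1) / M) / ((sigma_e2 + sigma_n2 / P) * inv_l2)
    return H_tilde.shape[0] * np.log2(1.0 + sinr)

rng = np.random.default_rng(4)
J, N, M = 4, 12, 16                            # assumed sizes and 16-QAM
H = (rng.standard_normal((J, N)) + 1j * rng.standard_normal((J, N))) / np.sqrt(2)
pam = np.array([-3, -1, 1, 3])                 # odd-integer 16-QAM components
s = rng.choice(pam, J) + 1j * rng.choice(pam, J)

x, beta = thp_precode(H, s, M, P=1.0)
s_hat = modulo(H @ x / beta, M)                # receiver side, perfect CSI and no noise
```

With perfect CSI and no noise, the receiver-side modulo returns the original symbols, which corresponds to the term $\beta(\mathbf{s} + \mathbf{p})$ in Equation (23) once $\mathbf{p}$ is removed.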

4.3. Lattice-reduction Tomlinson-Harashima precoding

The THP technique can also be performed after a lattice-reduction of the channel matrix [6]. In this case, the Lenstra-Lenstra-Lovász (LLL) reduction algorithm [45] is employed to obtain the lattice-reduced channel matrix, W, and the transformation matrix, T:

$$\tilde{\mathbf{H}} = \mathbf{T}\mathbf{W}, \tag{26}$$

where matrix $\mathbf{T}$ is a $J \times J$ unimodular matrix with integer elements and $\mathbf{W}$ is a matrix with the same dimensions as the original channel matrix $\tilde{\mathbf{H}}$ but with better orthogonality properties. Since the traditional LLL algorithm was originally defined for a real lattice basis, most authors use the real-valued equivalent of the complex-valued channel matrix. However, this approach doubles the channel matrix dimensions and can be avoided by using a complex version of the LLL algorithm [46].

As seen in Figure 8, the original signal $\mathbf{s}$ is replaced by $\mathbf{a} = \mathbf{T}^{-1}\mathbf{s}$ in the THP scheme, so signal $\mathbf{a}$ suffers from a power increase with respect to $\mathbf{s}$. However, as in THP, the modulo operator constrains the power, and thus matrix $\mathbf{T}^{-1}$ causes no power increase in $\tilde{\mathbf{x}}$.

Figure 8 Lattice-reduction Tomlinson-Harashima precoding scheme.

In Figure 8, $\beta$ follows the same expression as in Equation (22). Therefore, the SINR of each user and the sum-rate of the cluster with this scheme can be expressed as:

$$\mathrm{SINR}_{\mathrm{LRTHP}} = \frac{\beta^2 \sigma_s^2}{P\sigma_e^2 + \sigma_n^2} = \frac{(M-1)/M}{(\sigma_e^2 + \sigma_n^2/P) \sum_{j=1}^{J} (1/l_{jj}^2)},$$
(27)
$$R_{\mathrm{LRTHP}} = J \log_2\left(1 + \frac{(M-1)/M}{(\sigma_e^2 + \sigma_n^2/P) \sum_{j=1}^{J} (1/l_{jj}^2)}\right).$$
(28)

These expressions are the same as Equations (24) and (25), except that $l_{jj}$ now comes from the matrix $\mathbf{L}$ obtained through an LQ decomposition of the reduced channel $\mathbf{W}$ instead of the original channel $\tilde{\mathbf{H}}$. Since $\mathbf{W}$ shows better orthogonality properties than $\tilde{\mathbf{H}}$, a better performance is obtained with this scheme [6]. It should be noted that the overall computational cost of this scheme increases considerably due to the lattice reduction process. Efficient algorithms that reduce this cost can be found in the literature, for example, in [47].

5. Numerical results

In this section we present simulation results comparing the performance of the limited feedback scheme based on DQ with another time-based feedback scheme, and the different precoding algorithms presented in Section 4. We consider a macrocell deployment model whose parameters are specified in [20, 37] and collected in Table 2. Therefore, the statistical analysis carried out in Section 3 is valid for this channel. The channel follows a block fading model, remaining constant between one channel estimation period and the next one. The sum-power constraint has been equally allocated over all subcarriers and users are randomly distributed over the coordinated cluster area (see Figure 1). The transmit sum-power for the different system signal-to-noise ratio (SNR) values has been calculated assuming a user placed at the cell-edge, taking into account the propagation characteristics of the channel and the thermal noise.

Table 2 Channel and system parameters

5.1. Performance of limited feedback schemes

In this subsection, we discuss the relation between the number of bits employed to quantize the different parameters in our DQ limited feedback scheme and its performance. Gaussian quantization is used for the real and imaginary parts of $h_{s'n}$ (Equation (1.A)), and separate modulus and phase quantization is performed for $h_{sn}$ with $s \neq s'$ (Equations (1.B) and (1.C)).

The performance of the different configurations of the quantizer regarding the number of bits has been evaluated through Monte Carlo simulations. A feedback scheme based on Gaussian quantization (GQ) has also been evaluated for comparison. In this scheme, $B_G$ bits are used by each UE to quantize the real and imaginary parts of each coefficient of the CIR. A similar scheme using uniform quantization instead of Gaussian quantization is proposed in [18]. It is important to note that the optimizations applied to the scheme in [18], such as different bit allocation across the different paths, could also be applied to the DQ and GQ schemes. However, our comparison has been performed with equal bit allocation across the different paths. The metric used to compare both feedback schemes is the cost, which indicates the total number of bits that each UE uses to quantize the CIR of all the transmit antennas in the cluster. The cost for both schemes can be expressed as:

$$C_{\mathrm{DQ}} = 2 B_R \sum_{b=1}^{B} L_b + (B_M + B_P)(N_t - 1) \sum_{b=1}^{B} L_b + N_D \sum_{b=1}^{B} (L_b - 1),$$
(29)
$$C_{\mathrm{GQ}} = 2 B_G N_t \sum_{b=1}^{B} L_b + N_D \sum_{b=1}^{B} (L_b - 1),$$
(30)

where $L_b$ is the number of resolvable paths of the channel between each user and the $b$th sector array and $N_D$ is the number of bits dedicated to quantizing the discrete delay for paths $n = 2, \ldots, L_b$. The terms preceding the last one in Equations (29) and (30) collect the cost of quantizing the gain of the paths, whereas the last term collects the cost of quantizing the discrete delays and is the same for both schemes. In our case, $N_D = 5$ bits and $B_G$ = 5-8 bits, which gives a cost ranging approximately from 675 to 1,025 bits per UE for the GQ scheme. Since there is no standard specifying the number of bits that should be used to quantize explicit feedback, the explicit feedback scheme in IEEE 802.11n [32] has been taken as a reference. In this scheme, 4-8 bits are used to quantize the real and imaginary parts of the entries of the CSI matrices. This range of costs would be affordable in a system such as the CoMP field testbed presented in [22]. On the other hand, different configurations of the DQ scheme, varying the number of bits given to $B_R$, $B_M$ and $B_P$, have been employed, as shown in Table 3. The costs of these configurations lie approximately within the same range. The reference antenna in each sector array is chosen between $s' = 2$ and $s' = 3$ depending on the magnitude of $\tilde{h}_{sn}$. The elements of the mean square error (MSE) column follow the expression:

Table 3 Configurations of the differential quantizer
$$\mathrm{MSE} = E\left[\|\tilde{\mathbf{H}}[k] - \mathbf{H}[k]\|_F^2\right] = E\left[\|\mathbf{E}[k]\|_F^2\right] = B N_t J \sigma_e^2,$$
(31)

where $\sigma_e^2$ was introduced in Equation (11).
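The cost expressions in Equations (29) and (30) are simple enough to evaluate directly. The sketch below is a direct transcription of both formulas; the numbers of paths per sector and of antennas per array are hypothetical placeholders, since the actual values belong to Table 2.

```python
def cost_dq(paths_per_sector, n_t, b_r, b_m, b_p, n_d):
    """Feedback cost in bits per UE of the DQ scheme, Eq. (29)."""
    paths = sum(paths_per_sector)
    delays = sum(l_b - 1 for l_b in paths_per_sector)
    return 2 * b_r * paths + (b_m + b_p) * (n_t - 1) * paths + n_d * delays

def cost_gq(paths_per_sector, n_t, b_g, n_d):
    """Feedback cost in bits per UE of the GQ scheme, Eq. (30)."""
    paths = sum(paths_per_sector)
    delays = sum(l_b - 1 for l_b in paths_per_sector)
    return 2 * b_g * n_t * paths + n_d * delays

# Hypothetical deployment: 3 sector arrays, 6 resolvable paths each, 4 antennas per array.
paths_per_sector = [6, 6, 6]
n_t, n_d = 4, 5

c_gq = cost_gq(paths_per_sector, n_t, b_g=7, n_d=n_d)
c_dq = cost_dq(paths_per_sector, n_t, b_r=7, b_m=6, b_p=8, n_d=n_d)
print(c_gq, c_dq)
```

One property visible from the formulas is that both costs coincide whenever $B_R = B_G$ and $B_M + B_P = 2B_G$, so in that case DQ only redistributes the same bit budget between modulus and phase.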

In order to compare the performance of the GQ and DQ limited feedback schemes, Figure 9 shows the MSE obtained with both schemes for the configurations stated before. It can be seen that DQ offers more flexibility regarding the bit allocation, which allows for a larger range of cost-MSE combinations than GQ. It should be noted that most of the simulated DQ configurations outperform GQ. The dashed line is a least-squares approximation of the curve that collects the most suitable DQ configurations in terms of cost and MSE, which are in boldface type in Table 3. The MSE of those DQ configurations is about half that of the GQ scheme. For a given MSE target, the figure shows that around 75 bits/UE can be saved every feedback period by using DQ instead of GQ. This reduction can amount to up to 10% of the total feedback information.

Figure 9 MSE vs. cost for GQ and DQ feedback schemes. The dashed line collects the most suitable configurations of the DQ in terms of Cost and MSE, which are highlighted in boldface type in Table 3.

Regarding the configurations in boldface type in Table 3, it is important to identify their common characteristics. Better performance is obtained by configurations that dedicate more bits to quantizing the phase, $B_P$, than to quantizing the ratio between magnitudes, $B_M$. In fact, most of these configurations dedicate one or two more bits to the phase than to the magnitude. This result is consistent with the PDFs shown in Figure 5, where the variance of $\angle h_{sn}$ is higher than the variance of $|h_{sn}/h_{s'n}|$. In order to explain this behavior, Figure 10 compares the different errors due to the GQ and DQ schemes. The mean square quantization error (MSQE) of the quantization process $Q_B(X)$ can be expressed as $\mathrm{MSQE} = E[|X - Q_B(X)|^2]$. Trace 1 shows the MSQE obtained through a Gaussian quantization of the real part of any coefficient of the CIR, $\Re\{h_{sn}\}$ (the MSQE for the imaginary part is the same). Traces 2 and 3 show the MSQE of the modulus ratio and the phase of the coefficients of the CIR with a Laplacian and a uniform quantizer, respectively. Trace 4 shows the mean square distortion (MSD) obtained in the reconstructed modulus of $h_{sn}$, $E[(|h_{sn}| - |\tilde{h}_{sn}|)^2]$. It is important to note that $\tilde{h}_{sn}$ has been obtained after a Gaussian quantization of the real and imaginary parts of $h_{sn}$ using $B$ bits for each part. Trace 5 represents the MSD of the phase of $h_{sn}$ for the same quantization as for trace 4. Generally speaking, DQ needs about 1.5 more bits for the phase than for the modulus to achieve a similar MSQE in both reconstructed quantities. For instance, an MSQE of $10^{-3}$ is achieved in DQ by allocating $B_P = 6$ bits to the phase and $B_M$ = 4-5 bits to the modulus. Regarding GQ, $B_G = 6$ bits are needed to obtain an MSD of $10^{-3}$ in the reconstructed phase $\angle h_{sn}$, although an improved modulus MSD of $10^{-4}$ would be achieved in this case. However, the total number of bits required would be 12 bits, 6 for the real part and 6 for the imaginary part.

Figure 10 MSE vs. number of bits for different quantized parameters.

From a practical point of view, precoding techniques are much more sensitive to phase errors than to magnitude errors. Therefore, for a given total number of bits, for instance $B_T = 12$, GQ must use $B_G = 6$ bits to quantize each of the real and imaginary parts of $h_{sn}$, obtaining an MSD of $10^{-3}$ for the phase, whereas DQ can allocate $B_P = 7$ bits to the phase and $B_M = 5$ bits to the modulus ratio, achieving a reduced MSD in the reconstructed phase of $2 \cdot 10^{-4}$. It should be noted that the feedback bit allocation strategy has been evaluated through the simulation of different configurations of the DQ scheme. However, rate-distortion theory could provide a framework for deriving the optimal feedback bit allocation.
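The phase figures quoted above agree with the classical step-size rule for a uniform quantizer, MSQE $\approx \Delta^2/12$ with $\Delta = 2\pi/2^{B_P}$. The following sketch checks this by simulation, assuming a phase uniformly distributed over $[-\pi, \pi)$ and a mid-rise quantizer; both are simplifying assumptions for illustration and not necessarily the exact quantizer used in the simulations.

```python
import numpy as np

def uniform_phase_quantizer(phi, bits):
    """Mid-rise uniform quantizer over [-pi, pi) with 2**bits cells."""
    step = 2.0 * np.pi / 2 ** bits
    idx = np.clip(np.floor((phi + np.pi) / step), 0, 2 ** bits - 1)
    return -np.pi + (idx + 0.5) * step      # reconstruction at the cell centre

def msqe_closed_form(bits):
    """High-resolution approximation: step**2 / 12."""
    step = 2.0 * np.pi / 2 ** bits
    return step ** 2 / 12.0

rng = np.random.default_rng(1)
phi = rng.uniform(-np.pi, np.pi, 500_000)

msqe_sim = {}
for bits in (5, 6, 7):
    err = phi - uniform_phase_quantizer(phi, bits)
    msqe_sim[bits] = np.mean(err ** 2)
    print(bits, msqe_sim[bits], msqe_closed_form(bits))
```

Under these assumptions each extra phase bit divides the MSQE by four, which is why moving one bit from the modulus to the phase can pay off when the precoder is more sensitive to phase errors.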

Summarizing, DQ outperforms GQ in terms of MSE with respect to the same number of bits (Figure 9) and shows a higher flexibility regarding feedback bit allocation.

5.2. Performance of precoding techniques

In the previous subsection we have analyzed different feedback bit allocations for the DQ scheme, carrying out a performance comparison between this scheme and GQ feedback schemes in terms of MSE. However, from a practical point of view, it is more interesting to evaluate the cluster performance in terms of bit error rate (BER) and sum-rate. In this subsection, we compare how the different precoding techniques can deal with imperfect channel information due to quantization when using different limited feedback schemes. Note that either BER or sum-rate could be further improved by means of power allocation like loading strategies or water-filling. Nevertheless, these techniques may result in some users being dropped due to their channel condition. In this article, we are interested in comparing both feedback schemes and the performance of precoding techniques under quantized channels. Thus, we do not use any particular power allocation technique.

As we have seen in Section 4, BER and sum-rate depend on the variance of the additive error of the channel, $\sigma_e^2$, among other parameters. Using Equation (31), the configurations in boldface type of Table 3 for the DQ scheme and Figure 9, we can state that $\sigma_e^2$ in both GQ and DQ feedback schemes can be approximated through least-squares fitting by the following expression:

$$\sigma_e^2 = k \, 10^{pC},$$
(32)

where $C$ represents the cost expressed as the number of bits per UE and $k$ and $p$ are fitting parameters. For the GQ scheme, $k \approx 1.860$ and $p \approx -3.712 \cdot 10^{-3}$, whereas for the DQ scheme, $k \approx 1.465$ and $p \approx -4.013 \cdot 10^{-3}$. From this result, we can state that both feedback schemes have a quite similar slope and, therefore, the difference between them remains almost constant.

As seen in Section 4, the sum-rate performance achieved in the cluster depends on two effects: the accuracy of the channel information and the ability of each precoding technique to deal with ill-conditioned channels. To separate these two effects, we assume that the aggregated channel matrix for a given subcarrier is a unitary matrix, $\mathbf{H}\mathbf{H}^H = \mathbf{I}$. Then:

$$\sum_{j=1}^{J} \frac{1}{\lambda_j^2} = \sum_{j=1}^{J} \frac{1}{l_{jj}^2} = J,$$
(33)

and the cluster sum-rate achieved by any of the precoding techniques can be expressed as:

$$R = J \log_2\left(1 + \frac{1/J}{\sigma_e^2 + \sigma_n^2/P}\right).$$
(34)

Substituting Equation (32) in Equation (34), we can obtain the sum-rate achieved by each feedback scheme as a function of the feedback cost:

$$R = J \log_2\left(1 + \frac{1/J}{k \, 10^{pC} + \sigma_n^2/P}\right),$$
(35)

where $k$ and $p$ take the values previously discussed for each feedback scheme. Figure 11 shows the average sum-rate per subcarrier achieved by the GQ and DQ feedback schemes for a system SNR of 30 dB, and compares them with the case of perfect CSI at the transmitter (P-CSIT). The results show that DQ achieves significantly higher sum-rates than GQ. The difference between sum-rates is around 8 bps/Hz and remains almost constant for the range of analyzed costs.
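Equations (32) and (35) can be evaluated directly from the fitted parameters given in the text. In the sketch below, the number of users `J` and the ratio `noise_to_power` (standing for $\sigma_n^2/P$) are hypothetical values chosen for illustration; in particular, equating the system SNR with $P/\sigma_n^2$ is an assumption, so the printed numbers are not expected to reproduce Figure 11.

```python
import numpy as np

# Fitting parameters (k, p) reported in the text for each feedback scheme.
FIT = {"GQ": (1.860, -3.712e-3), "DQ": (1.465, -4.013e-3)}

def sigma_e2(cost, scheme):
    """Channel error variance as a function of the feedback cost, Eq. (32)."""
    k, p = FIT[scheme]
    return k * 10.0 ** (p * cost)

def sum_rate(cost, scheme, J, noise_to_power):
    """Cluster sum-rate for a unitary aggregated channel, Eq. (35)."""
    return J * np.log2(1.0 + (1.0 / J) / (sigma_e2(cost, scheme) + noise_to_power))

def sum_rate_perfect_csi(J, noise_to_power):
    """Eq. (34) with a zero channel error variance (P-CSIT reference)."""
    return J * np.log2(1.0 + (1.0 / J) / noise_to_power)

J = 6                    # hypothetical number of users in the cluster
noise_to_power = 1e-3    # hypothetical sigma_n^2 / P
costs = np.arange(675, 1026, 50)

for c in costs:
    print(c, sum_rate(c, "GQ", J, noise_to_power), sum_rate(c, "DQ", J, noise_to_power))
print("P-CSIT", sum_rate_perfect_csi(J, noise_to_power))
```

Because the DQ fit has both a smaller $k$ and a more negative $p$, its error variance stays below the GQ one for every positive cost, which is the mechanism behind the sum-rate gap discussed above.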

Figure 11 Average sum-rate per subcarrier for GQ, DQ and P-CSIT, SNR = 30 dB and unitary aggregated channel matrix.

Figure 12 compares the BER performance of the precoding techniques presented in Section 4 (ZF, THP and LRTHP) with limited feedback schemes for a system SNR of 30 dB and channels generated according to Table 2. The results show an almost linear relation between $\log_{10}(\mathrm{BER})$ and the cost for GQ. However, the traces for DQ present some fluctuations. This is because the MSE and $\sigma_e^2$ do not strictly follow a line for the configurations in boldface type in Table 3 (see Figure 9). Nevertheless, it is important to note that these configurations outperform the GQ scheme for any given cost, regardless of the precoding technique. Using a DQ scheme instead of a GQ scheme saves up to 75 bits/UE for configurations with a BER between $10^{-2}$ and $10^{-3}$.

Figure 12 Average uncoded BER per subcarrier for ZF, THP and LRTHP with GQ and DQ feedback schemes and SNR = 30 dB.

Figure 12 also shows an interesting trade-off between processing complexity and the amount of feedback information. Given a certain DQ configuration with a certain precoding technique, there are two choices to reduce the BER. The first one would be to increase the amount of feedback information per user, that is, to increase the cost. This choice would involve an increase of the signaling overhead and would reduce the system efficiency. The second choice would be to increase the complexity of the precoding technique. Substituting ZF precoding by THP or LRTHP, the BER would decrease at the cost of increasing the computational cost of the precoding stage.

Figure 13 shows the results in terms of average cluster sum-rate. As in Figure 12, we can see that the sum-rate obtained in DQ schemes is not completely linear with the cost. For a given sum-rate and precoding technique, around 50 bits/UE can be saved by using a DQ scheme instead of the GQ scheme. This figure also shows a trade-off between processing complexity and feedback information to increase the sum-rate.

Figure 13 Average sum-rate per subcarrier for ZF, THP and LRTHP with GQ, DQ and P-CSIT and SNR = 30 dB.

Figures 14 and 15 show the BER and sum-rate achieved by the cluster under consideration for GQ and DQ schemes and different system SNRs, respectively. The first scheme uses GQ with B G = 7 bits whereas the second scheme uses DQ with B R = 7, B M = 6 and B P = 8 bits. These two configurations have been chosen due to their similar cost (around 910 bits/UE).

Figure 14 Average uncoded BER per subcarrier for ZF, THP and LRTHP with GQ ($B_G = 7$), DQ ($B_R = 7$, $B_M = 6$, $B_P = 8$) and P-CSIT.

In Figure 14, we can see that the DQ scheme achieves a lower BER than the GQ scheme for all SNR values, providing a gain between 2 and 4 dB for system SNRs ranging from 10 to 20 dB. For SNRs higher than 20 dB, the BER remains constant even though the system SNR increases. Taking into account that $P\sigma_e^2 = \sigma_n^2$ (Equations (18), (24) and (27)) for SNR ≈ 14.3 dB and SNR ≈ 17.3 dB in the GQ and DQ schemes respectively, $P\sigma_e^2 \gg \sigma_n^2$ for SNRs > 20 dB. Therefore, the system is limited by the interference introduced by the imperfect channel information. We can observe the same behavior in Figure 15, where the growth of the sum-rate slows down for SNRs higher than 20 dB. Here, DQ schemes can achieve a gain of up to 5 dB over GQ.
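The saturation described above follows directly from Equation (24). As a short worked step, written here for THP only and in the notation of Section 4, letting the noise term vanish gives a ceiling that depends only on the channel error variance, and equating the two denominator terms gives the crossover point:

```latex
\lim_{\sigma_n^2/P \,\to\, 0} \mathrm{SINR}_{\mathrm{THP}}
  = \lim_{\sigma_n^2/P \,\to\, 0}
    \frac{(M-1)/M}{\left(\sigma_e^2 + \sigma_n^2/P\right)\sum_{j=1}^{J} 1/l_{jj}^2}
  = \frac{(M-1)/M}{\sigma_e^2 \sum_{j=1}^{J} 1/l_{jj}^2},
\qquad
P\sigma_e^2 = \sigma_n^2 \;\Longleftrightarrow\; \frac{P}{\sigma_n^2} = \frac{1}{\sigma_e^2}.
```

At the crossover the denominator equals $2\sigma_e^2 \sum_j 1/l_{jj}^2$, so the SINR sits 3 dB below its ceiling. A feedback scheme with a smaller $\sigma_e^2$ therefore reaches the crossover at a higher $P/\sigma_n^2$ and saturates at a higher SINR.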

Figure 15 Average sum-rate per subcarrier for ZF, THP and LRTHP with GQ ($B_G = 7$), DQ ($B_R = 7$, $B_M = 6$, $B_P = 8$) and P-CSIT.

Regarding the different precoding schemes, Figure 14 shows that LRTHP can provide a gain of around 4 dB over THP, whereas THP outperforms ZF with a gain of around 2 dB. It is also interesting to point out that the different precoding techniques reach different error floors for SNRs higher than 20 dB. In Figure 15, we can see that LRTHP is the best precoding technique to deal with the noise and interference due to the quantized channel information. However, it is important to realize that THP performs closer to LRTHP than to ZF. In the band of SNRs that is not completely limited by the interference (from 15 to 25 dB), LRTHP provides a gain of around 3 dB over THP, whereas the gain of THP over ZF increases up to more than 5 dB. The trade-off stated before can also be observed in this figure. A system using a GQ scheme and THP precoding with a system SNR of 30 dB provides a sum-rate of approximately 6.3 bps/Hz. To increase the sum-rate, we could use DQ instead of GQ or LRTHP precoding instead of THP. Both choices offer the same sum-rate (see Figure 15). The first option slightly increases the complexity of quantization at the UE, whereas the second option considerably increases the computational complexity at the BS [7].

6. Conclusion

In this article, a low-complexity limited feedback scheme based on time-domain channel quantization for a coordinated cluster allowing JP has been presented. The channel estimation is performed using the proposed pilot symbol allocation grid for a coordinated cluster and the CSI is fed back through the proposed scheme. This scheme takes advantage of the spatial correlation between antennas without requiring a statistical knowledge of the channel or a higher computational complexity, carrying out a DQ over the CIR. Its performance has been compared with the standard quantization of the CIR in a CoMP scenario. The simulation results show that the proposed scheme outperforms the scheme based on standard quantization in terms of MSE, offering a higher flexibility regarding feedback bit allocation.

The effect of imperfect CSI due to the limited feedback scheme has been evaluated on different precoding schemes: ZF, THP and LRTHP. An expression that relates the sum-rate with the number of feedback bits for a general precoding case has been obtained. The proposed scheme achieves a higher sum-rate than the scheme based on standard quantization for the same number of feedback bits. Simulation results also show that the proposed scheme achieves a better performance in terms of sum-rate and BER when ZF, THP or LRTHP techniques are used.

Among the evaluated precoding techniques, numerical results show that the highest robustness against imperfect CSI is achieved with LRTHP at the cost of a higher complexity. An interesting trade-off between the precoding technique complexity and the amount of feedback information has been stated. Given a performance requirement, the amount of feedback information can be reduced by means of using a higher complexity precoding technique and vice versa.