Low-complexity signal detection and precoding algorithms for multiuser massive MIMO systems

In this study, we present efficient detection and precoding algorithms for massive multi-user multiple-input multiple-output (MIMO) wireless systems. To reduce the computational complexity caused by large matrix inversion, the proposed algorithms use an enhanced version of the zero-forcing scheme based on QR matrix decomposition for both uplink and downlink systems. Through extensive numerical experiments, we demonstrate that the proposed algorithms outperform recently published ones in terms of error-rate performance and complexity.


Introduction
Massive multiple-input multiple-output (MIMO) systems combined with space-division multiple access (SDMA) techniques are expected to play a major role in next-generation wireless communication systems (5G), where the capability of serving a large number of users in the same time-frequency resource with higher data rates and improved spectral efficiency becomes an important requirement. Future wireless communication systems require a large number of antenna elements at the base station (BS) to increase the data rate and achieve high channel capacity [1]. Multi-user MIMO (MU-MIMO) systems must adopt advanced technologies to serve a large number of users simultaneously, but the high computational complexity of data processing at both the transmitter and the receiver is a major drawback and a significant challenge [2][3][4]. In MU-MIMO systems, multi-user interference (MUI) is the most serious problem affecting signal detection at the receiver. Thus, the main objective of previously published works was to cancel the MUI using more sophisticated algorithms for data detection in the uplink and precoding in the downlink.
In uplink systems, spatial demultiplexing or signal detection at the receiver is a challenging task for spatially multiplexed multiuser MIMO systems. The optimal data detection techniques in such systems are non-linear. Various algorithms based on the maximum likelihood (ML) criterion have been proposed that achieve optimal solutions [5]. However, their complexity increases exponentially as the modulation order or the number of transmit antennas increases [6]. There are many nonlinear data detection algorithms with reduced complexity, such as the sphere decoder (SD) [7], which enables ML detection for MIMO systems, and tabu search (TS) based data detectors [8]. Unfortunately, for massive MIMO systems with a large number of antennas and higher-order modulation schemes, such methods still require high computational complexity [9,10]. To trade off performance against complexity, linear signal detection algorithms such as the zero-forcing and the minimum mean square error (MMSE) techniques have been proposed [5,11]. However, both of these methods rely on inverting a large-dimensional matrix, so their performance comes at the expense of high computational complexity.
On the other hand, for downlink transmission, a multi-antenna transmitter can communicate simultaneously with single- or multiple-antenna users, where the pre-processing, which is called precoding, is performed at the base station to facilitate detection on the receiver side. In the literature, the proposed precoding algorithms for downlink systems can also be sub-divided into linear and nonlinear types. Nonlinear algorithms achieve sum-capacity [12], whereas linear precoding approaches usually achieve good performance with much lower complexity. The best-known linear precoding schemes are maximum ratio transmission (MRT) [13], zero-forcing [14] and minimum mean squared error [15].
The design and optimization of an optimal linear precoder is achieved using a weighted MMSE criterion where the precoding weights are computed by maximizing the ratio between the useful signal of each user and the multiuser interference plus noise [16].
Recently, several works have been published to improve the performance of both signal detection and precoding algorithms for multi-user massive MIMO mobile communication systems. In order to reduce the high complexity due to the explicit inverse of the channel matrix, Yin et al. [17] proposed an algorithm based on the conjugate gradient (CG) method for both data detection and precoding, whereas Zhang et al. [18] presented an efficient soft-output data detection algorithm based on the Gauss-Seidel method. For large-scale MIMO systems, the Gram matrix is diagonally dominant, and thus the initial solution of the Gauss-Seidel method is chosen as a 2-term Neumann series expansion, which efficiently speeds up the convergence of the algorithm.
An efficient detection algorithm based on the alternating direction method of multipliers (ADMM) is proposed in [19]. It performs infinity-norm or box-constrained equalization, which outperforms classical linear MMSE detectors in terms of packet error rate (PER). Another work, based on optimized coordinate descent (OCD), performs approximate MMSE detection [20], while a near-optimal, low-complexity detection algorithm based on the Richardson method is proposed in [21]. These methods have lower time complexity than algorithms based on exact matrix inversion and deliver near-optimal results, particularly when the ratio between BS antennas and user terminals is large.
In this paper, we propose an enhanced version of QR-based zero-forcing (ZF) algorithms for both detection and precoding. In small-scale MIMO systems with a small number of antennas, the ZF scheme performs poorly due to noise amplification, especially when the channel matrix is ill-conditioned. However, the columns of the channel matrix are asymptotically orthogonal in massive MIMO systems and the Gram matrix $\mathbf{H}^H\mathbf{H}$ is Hermitian positive definite [21,22]. Consequently, the channel matrix is well conditioned and the Gram matrix is invertible, so simple linear detection schemes can perform well using fast algorithms for either exact or approximate matrix inversion.
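The conditioning argument above can be illustrated numerically. The following is a minimal sketch, assuming an i.i.d. complex Gaussian (Rayleigh) channel with unit-variance entries; the function name `gram_condition_number` and the random seed are hypothetical choices made for this illustration, not part of the original study.

```python
import numpy as np

def gram_condition_number(num_bs_antennas, num_users, rng):
    """Condition number of the Gram matrix G = H^H H.

    Assumption: H has i.i.d. complex Gaussian entries with unit variance.
    """
    shape = (num_bs_antennas, num_users)
    H = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    G = H.conj().T @ H  # U x U Hermitian Gram matrix
    return np.linalg.cond(G)

rng = np.random.default_rng(0)
cond_massive = gram_condition_number(128, 16, rng)  # B much larger than U
cond_square = gram_condition_number(16, 16, rng)    # B equal to U

print(f"B = 128, U = 16: cond(G) = {cond_massive:.1f}")
print(f"B = 16,  U = 16: cond(G) = {cond_square:.1f}")
```

With many more BS antennas than users, the Gram matrix is expected to be far better conditioned than in the square case, which is what makes a plain ZF solution viable in the massive MIMO regime.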
The rest of this paper is organized as follows. Section 2 details the massive MU-MIMO model for both uplink and downlink systems and describes the proposed algorithms using QR decomposition. Section 3 provides a complexity analysis and error-rate performance comparison with recently published works. Section 4 concludes the paper.

Multiuser massive MIMO systems
Consider a multiuser massive MIMO wireless system in which a base station equipped with B antennas communicates with U single-antenna users using the space-division multiple access (SDMA) technique. The received vector is expressed by the following equation [10]:
$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n} \tag{1}$$
where $\mathbf{s}$ corresponds to the transmit vector, $\mathbf{H}$ is the channel matrix, and $\mathbf{n}$ stands for the independent identically distributed (i.i.d.) Gaussian noise vector with zero mean and variance $N_0$ per entry. The optimal detection for the MIMO system is ML detection, in which the receiver finds the vector $\mathbf{s}$ that minimizes $\|\mathbf{y} - \mathbf{H}\mathbf{s}\|$.

Uplink system model and data detection
In the uplink, the symbols of all users are mapped onto constellation points in the set O. The uplink system model is expressed as follows:
$$\mathbf{y}_u = \mathbf{H}_u\mathbf{s}_u + \mathbf{n}_u \tag{2}$$
where $\mathbf{s}_u \in O^U$ is the transmit vector with modulated symbols from all users, $\mathbf{y}_u \in \mathbb{C}^{B}$ is the receive vector and $\mathbf{H}_u \in \mathbb{C}^{B\times U}$ is the uplink channel matrix. Figure 1a illustrates the uplink channel model, where the Gaussian noise vector $\mathbf{n}_u$ is introduced as additive noise.
When channel state information (CSI) is available, we can use matched filtering (MF) to maximize the SNR, which is known as the maximal ratio transmission (MRT) scheme in the context of multiuser MIMO systems. The output of the matched filter has the following expression:
$$\mathbf{y}_{MF} = \mathbf{H}_u^H\mathbf{y}_u \tag{3}$$
To detect the vector $\mathbf{s}_u$, the least squares solution can be reduced to solving a system of normal equations:
$$\mathbf{H}_u^H\mathbf{H}_u\,\hat{\mathbf{s}} = \mathbf{H}_u^H\mathbf{y}_u \tag{4}$$
As stated above, in massive MIMO systems the Gram matrix $\mathbf{H}_u^H\mathbf{H}_u$ is Hermitian positive definite [21,22]. Consequently, the Gram matrix is invertible and the problem has a unique solution. A straightforward way to obtain an estimate $\hat{\mathbf{s}}$ is to compute $\hat{\mathbf{s}} = (\mathbf{H}_u^H\mathbf{H}_u)^{-1}\mathbf{H}_u^H\mathbf{y}_u$, which is exactly equivalent to the ZF scheme, where the interference caused by $\mathbf{H}_u$ is forced to zero.
To avoid directly inverting $\mathbf{H}_u^H\mathbf{H}_u$, we can factorize the Gram matrix using a simple and fast QR decomposition [23], $\mathbf{H}_u^H\mathbf{H}_u = \mathbf{Q}\mathbf{R}$, where $\mathbf{R} = [r_{kj}]$ is an upper triangular matrix and $\mathbf{Q}$ is an orthogonal (unitary) matrix. Thus, Eq. (4) can be rewritten as follows:
$$\mathbf{R}\,\hat{\mathbf{s}} = \mathbf{Q}^H\mathbf{H}_u^H\mathbf{y}_u \tag{5}$$
We set $\mathbf{z} = \mathbf{Q}^H\mathbf{H}_u^H\mathbf{y}_u$, so that Eq. (5) takes the form $\mathbf{R}\mathbf{s} = \mathbf{z}$, which is very easy to solve by an iterative process called back substitution for upper triangular matrices. The system of equations solver works according to the following equation:
$$s_k = \frac{1}{r_{kk}}\left(z_k - \sum_{j=k+1}^{N} r_{kj}\,s_j\right) \tag{6}$$
where $s_N = z_N / r_{NN}$ and $k = N-1, \ldots, 1$ (here N = U). Figure 1b depicts a schematic illustration of data detection at the receiver using the above solver as an alternative way to detect the transmitted data without inverting the Gram matrix.
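The detection chain described above (matched filter, QR factorization of the Gram matrix, back substitution) can be sketched as follows. This is an illustrative sketch only: `np.linalg.qr` stands in for the fast QR algorithm of [23], the function names are hypothetical, and a noiseless QPSK transmission is assumed for brevity instead of the 64-QAM setup used in the simulations.

```python
import numpy as np

def back_substitution(R, z):
    """Solve R s = z for an upper triangular R, from the last row upwards."""
    n = R.shape[0]
    s = np.zeros(n, dtype=complex)
    for k in range(n - 1, -1, -1):
        # s_k = (z_k - sum_{j>k} r_kj * s_j) / r_kk
        s[k] = (z[k] - R[k, k + 1:] @ s[k + 1:]) / R[k, k]
    return s

def zf_detect_qr(H_u, y_u):
    """ZF estimate from the normal equations without an explicit inverse."""
    G = H_u.conj().T @ H_u       # Gram matrix, U x U
    y_mf = H_u.conj().T @ y_u    # matched-filter output
    Q, R = np.linalg.qr(G)       # preprocessing step: G = Q R
    z = Q.conj().T @ y_mf
    return back_substitution(R, z)

rng = np.random.default_rng(1)
B, U = 128, 16
# Assumed i.i.d. complex Gaussian uplink channel
H_u = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)
# QPSK symbols, chosen only to keep the example short
s = rng.choice([-1.0, 1.0], size=U) + 1j * rng.choice([-1.0, 1.0], size=U)
y_u = H_u @ s                    # noiseless receive vector

s_hat = zf_detect_qr(H_u, y_u)
print("max detection error:", np.max(np.abs(s_hat - s)))
```

Because the QR factors depend only on the channel, they can be computed once per channel realization and reused for every received vector; only the product with the conjugate transpose of Q and the back substitution are repeated per symbol vector.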

Downlink system model and precoding
In the downlink, precoding is applied as pre-processing at the transmitter to facilitate detection on the receiver side without further processing. To cancel the multiuser interference, the ZF scheme is introduced as a pre-equalizer at the transmitter instead of the receiver. The downlink channel $\mathbf{H}_d \in \mathbb{C}^{U\times B}$ satisfies $\mathbf{H}_d = \mathbf{H}_u^H$ due to the reciprocity between uplink and downlink. Here, the BS encodes the bit streams for each user and then maps them to constellation points in O. The transmit vector $\mathbf{s} \in O^U$ containing the modulated symbols for all U users is precoded using the following linear precoder:
$$\mathbf{p} = \beta\,\mathbf{W}\mathbf{s}, \qquad \mathbf{W} = \mathbf{H}_d^H(\mathbf{H}_d\mathbf{H}_d^H)^{-1}$$
where $\mathbf{W}$ is the ZF precoding matrix that cancels the inter-user interference and $\beta$ is a constant that meets the total transmitted power constraint after precoding; its expression is given in [5].
The received signal $\mathbf{H}_d\mathbf{p} + \mathbf{n}$ must be divided by $\beta$ to compensate for the effect of amplification at the transmitter. Thus, the received vector becomes:
$$\mathbf{y}_d = \frac{1}{\beta}\left(\mathbf{H}_d\mathbf{p} + \mathbf{n}\right) = \mathbf{s} + \frac{\mathbf{n}}{\beta}$$
Consequently, the transmit vector $\mathbf{s}$ can be recovered directly from the received vector. Figure 2a shows the block diagram of the MU-MIMO downlink model, where the precoding is done on the transmitter side.
To avoid matrix inversion, we adopt the same QR-based technique proposed above for the uplink system. Since $\mathbf{H}_d\mathbf{H}_d^H = \mathbf{H}_u^H\mathbf{H}_u = \mathbf{Q}\mathbf{R}$, to compute $\mathbf{p}$ we can set $\mathbf{R}\mathbf{x} = \mathbf{Q}^H\mathbf{s}$, use the same system of equations solver described for the uplink to obtain $\mathbf{x}$, and then form $\mathbf{p} = \beta\,\mathbf{H}_d^H\mathbf{x}$. Thus, instead of transmitting the original symbol vector $\mathbf{s}$, we transmit the precoded vector $\mathbf{p}$, which is the output of the precoder as shown in Fig. 2b.
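A QR-based ZF precoder along these lines can be sketched as below. The scaling used here is an assumption made for illustration: the precoded vector is normalized to a hypothetical total power `total_power`, which is not necessarily the power constraint of [5]. The function names are likewise hypothetical, and `np.linalg.qr` again stands in for the fast QR algorithm of [23].

```python
import numpy as np

def back_substitution(R, z):
    """Solve R x = z for an upper triangular R."""
    n = R.shape[0]
    x = np.zeros(n, dtype=complex)
    for k in range(n - 1, -1, -1):
        x[k] = (z[k] - R[k, k + 1:] @ x[k + 1:]) / R[k, k]
    return x

def zf_precode_qr(H_d, s, total_power=1.0):
    """Return the precoded vector p and the scaling beta (assumed normalization)."""
    G = H_d @ H_d.conj().T                    # U x U Gram matrix
    Q, R = np.linalg.qr(G)                    # preprocessing step
    x = back_substitution(R, Q.conj().T @ s)  # solves G x = s
    v = H_d.conj().T @ x                      # unnormalized ZF transmit vector
    beta = np.sqrt(total_power) / np.linalg.norm(v)  # assumption: per-vector power scaling
    return beta * v, beta

rng = np.random.default_rng(2)
B, U = 128, 16
H_u = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2)
H_d = H_u.conj().T                            # reciprocity: downlink = uplink Hermitian
s = rng.choice([-1.0, 1.0], size=U) + 1j * rng.choice([-1.0, 1.0], size=U)

p, beta = zf_precode_qr(H_d, s)
received = H_d @ p / beta                     # noiseless users, after dividing by beta
print("max recovery error:", np.max(np.abs(received - s)))
```

In the noiseless case each user recovers its own symbol after dividing by the scaling factor, which is the interference-free behaviour expected from ZF precoding.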

Simulation results
To evaluate the performance of the proposed algorithm, we carried out several simulation experiments for single-carrier systems using a high-order modulation scheme (64-QAM). We assume perfect knowledge of the channel state information at the base station in order to ensure multiuser detection in the uplink and precoding in the downlink. In our study, the estimate vector ŝ and the precoded vector p are the exact solutions of a system of normal equations obtained using QR decomposition. The QR decomposition is done during the preprocessing step, so the detection and precoding steps are greatly simplified. For QR decomposition, we have used a simple and fast algorithm recently proposed in [23] that reduces the time complexity from $O(n^3)$ to $O(n^{2.529})$ using the fastest known matrix multiplication. The first part of the simulation focuses on signal detection in the uplink channel. To analyze the computational complexity of the proposed algorithm, the evaluation is based on the number of complex-valued multiplications needed to detect a single transmitted symbol vector. Table 1 compares the computational complexity of the proposed algorithm with that of other recently published massive MIMO data detectors, namely the conjugate gradient (CG) based detector [17], Improved GS [18], the ADMIN detector [19], OCDBOX [20], and the Richardson detector [21].
In this table, the second column represents the computational complexity of the pre-processing step, whereas the third column represents the number of complex-valued multiplications needed for k iterations in the detection step, where U is the number of users and B the number of BS antennas. Note that for ADMIN, the LDL decomposition is used with a time complexity of $O(U^{2.529})$ instead of $O(U^3)$ in this comparison; to the best of our knowledge, it is the first time that this algorithm is used in such applications.
In terms of number of operations, our algorithm provides an exact solution using a simple back substitution algorithm that costs only U(U − 1)/2 complex-valued multiplications; consequently, it is the fastest of the compared algorithms, even when the others run a single iteration (k = 1). Table 2 shows the computational complexity of the preprocessing and detection steps for two iterations (k = 2) and four iterations (k = 4) with B = 128 and U = 16, which are used in the next part to evaluate the bit error rate (BER) performance.
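As a quick arithmetic check of the detection-step costs for B = 128 and U = 16, the short script below evaluates the back-substitution count U(U − 1)/2 together with the ADMIN and OCD detection formulas as they appear in the surviving rows of Table 1, namely 2k(U² + U) and k(2BU + U). The function names are hypothetical and serve only this illustration.

```python
def proposed_detection_mults(U):
    """Back substitution cost of the proposed detector: U(U - 1)/2."""
    return U * (U - 1) // 2

def admin_detection_mults(U, k):
    """ADMIN detection cost for k iterations, as listed in Table 1."""
    return 2 * k * (U**2 + U)

def ocd_detection_mults(B, U, k):
    """OCD detection cost for k iterations, as listed in Table 1."""
    return k * (2 * B * U + U)

B, U = 128, 16
print("proposed:", proposed_detection_mults(U))
for k in (2, 4):
    print(f"k = {k}: ADMIN = {admin_detection_mults(U, k)}, "
          f"OCD = {ocd_detection_mults(B, U, k)}")
```

For this configuration the proposed detector needs 120 multiplications regardless of k, while the iterative detectors scale linearly with the number of iterations.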
The BER performance of the proposed algorithm is first compared with that of the well-known MMSE signal detection algorithm, which is near-optimal for multi-user MIMO systems but requires statistical information about the noise and a high-complexity matrix inversion. In Fig. 3, we compare the BER of MIMO systems based on the MMSE algorithm and on the proposed algorithm. The modulation constellation is 64-QAM, with B = 128 and U = 16 or U = 8. Both algorithms attain practically the same BER, as shown in Fig. 3a. However, the zoomed-in view of the BER curves in Fig. 3b shows that the proposed algorithm presents a slight improvement in BER performance in both cases.
Second, the simulation results of the BER performance with respect to the signal-to-noise ratio (SNR) are analyzed to compare the proposed signal detection algorithm with the recently published ones whose computational complexity is analyzed above. Figure 4 illustrates the BER performance of the proposed algorithm and the recently published ones for two iterations (k = 2) with B = 128, U = 16 and 64-QAM modulation scheme.

Table 1 Computational complexity in number of complex-valued multiplications (recovered rows)
Algorithm | Pre-processing | Detection (k iterations)
ADMIN [19] | 2BU + U^2.529 + U(U − 1)/2 | 2k(U^2 + U)
OCD [20] | BU + U | k(2BU + U)
Richardson [21] | 2BU |

From Fig. 4, one can observe that the proposed detector has better error performance than all the others over the whole SNR range for two iterations (k = 2), except the ADMIN detector, which performs similarly to our algorithm in this scenario but at the cost of 1088 complex-valued multiplications in the detection step, whereas the proposed algorithm requires only 120 operations. The pre-processing step needs practically the same number of operations for both detectors, as illustrated in Table 2.
Another experiment is performed for four iterations (k = 4), and the results of the aforementioned detectors are compared in Fig. 5. This figure shows that the proposed algorithm achieves better BER performance than the detectors that approximate the MMSE detector, consistent with Fig. 3. However, the algorithms based on constrained optimization problems, such as the ADMIN [19] and OCD [20] detectors, perform slightly better in terms of error-rate performance but at an enormous cost: their computational complexity is 2176 and 16,448 complex-valued multiplications, respectively, in the detection phase (Table 2).
Finally, the performance of the proposed QR decomposition based ZF precoder is compared with that of MMSE precoding. To avoid an explicit matrix inversion in the downlink, most linear precoders approximate the MMSE precoding matrix using low-complexity iterative algorithms that solve systems of linear equations. Consequently, we compare the proposed precoder with the MMSE precoder based on exact matrix inversion. As the total number of arithmetic operations required is the same as in the detection algorithm, the comparison is made only in terms of bit error rate performance. As in the configuration above, the modulation constellation is 64-QAM, with B = 128 and U = 16 or U = 8, as illustrated in Fig. 6. The proposed precoder outperforms the MMSE precoder in BER at a lower cost in terms of time complexity.

Conclusion
We have proposed efficient low-complexity near-optimal detection and precoding algorithms for multi-user large-scale MIMO systems. By exploiting the property that the Gram matrix in massive MIMO systems is Hermitian positive definite, we adopted an enhanced version of the zero-forcing scheme based on QR decomposition. The proposed scheme outperforms the linear MMSE algorithm, which is near-optimal for multi-user MIMO systems but requires statistical information about the noise. Simulation results also show that the proposed algorithms outperform recently published ones in terms of computational complexity and BER performance. Thus, this approach can allow the implementation of realistic data detection and precoding in massive MU-MIMO systems, especially if a faster algorithm for QR matrix factorisation can be developed. Although the computational complexity is significantly reduced, the proposed algorithm remains near-optimal. The optimal solution, which is obtained by solving the maximum likelihood criterion, may be achieved by combining the sphere decoder with the proposed scheme in the detection phase.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.