1 Introduction

The goal of wireless communication technology is to provide universal personal and multimedia communication at extremely high data rates, irrespective of mobility and location. The wireless channel, however, introduces many impairments to the transmitted signal, including additive noise, multipath fading and inter-symbol interference (ISI). The fifth generation (5G) mobile cellular system aims to support higher data rates and provide seamless services across a multitude of wireless systems and networks. 5G is mostly a multiple-input multiple-output (MIMO) system, which is composed of multiple single-input single-output (SISO) channels between the different antennas, so the MIMO channel can be estimated from multiple SISO channel estimations. The complexity of deep learning (DL)-based channel estimation (CE) in MIMO is expected to rise exponentially with the number of antennas, which leads to a large increase in complexity in both the online and offline phases and calls for suboptimal methods that approximate MIMO CE based on SISO estimations. In this paper we restrict our study to the SISO channel; in future work we will extend the method developed here to the MIMO channel while considering complexity aspects. One of 5G's innovative strategies for reducing ISI and fading in multipath environments is orthogonal frequency division multiplexing (OFDM) [1].

OFDM is known for its high spectral efficiency and inherent immunity against ISI and multipath fading. High data rates are possible by adjusting the modulation order and/or the transmitted power individually for each sub-band according to its channel response and noise background. Nevertheless, for proper bit loading and power adjustment of each sub-band, an accurate CE process should be carried out before and during the data transfer. Among the widely used CE techniques is pilot-assisted channel estimation (PACE), where a known training sequence modulates an agreed-upon subset of OFDM carriers (pilots). The received pilots are processed to estimate the channel parameters. Two standard CE methods are commonly used: Minimum Mean-Square Error (MMSE) and Least-Square (LS) [2].

Lately, DL has attracted a lot of attention in communication systems [3,4,5]. Several DL-based approaches have been proposed to enhance the performance of conventional algorithms, including modulation recognition [6], signal detection [7], channel equalization [8], channel state information (CSI) feedback [9] and CE [10, 11].

In this paper, we discuss DL channel estimation for OFDM 5G Systems with different channel models. The main contributions of this paper are to:

  • Evaluate and compare the MMSE, LS and DL methods in estimating the impulse response of different channels for next-generation 5G cellular mobile communications based on OFDM modulation. The three methods are evaluated using Tapped Delay Line (TDL) and Clustered Delay Line (CDL) channel models. Both channel models are specified in the European Telecommunications Standards Institute (ETSI) technical report [12,13,14] as prominent channel models for LTE and 5G systems working above 6 GHz. The International Telecommunication Union (ITU) model for 5G systems has been used for all three methods as a basis of comparison [12,13,14].

  • Use the long short-term memory (LSTM) network model [15, 16] to create the DNN for symbol classification at the OFDM receiver. LSTM designs can store data over a period of time, which is extremely useful when dealing with time-series or sequential data. An LSTM can process not only single data points (e.g., images), but also entire data sequences (such as speech or video inputs). The LSTM-based neural network is trained on a generated OFDM signal, and the bit error rate (BER) is calculated and compared with that of the LS and MMSE estimations. In this preliminary investigation, the wireless channel is presumed to be fixed during the offline training and online deployment stages. A random phase shift is applied to each transmitted OFDM packet to assess the effectiveness of the neural network.

The remainder of the paper is structured as follows. Section 2 surveys previous work related to DL-based CE algorithms in communication systems. Section 3 discusses the background of CE in OFDM systems. The system architecture and model training for the proposed LSTM are described in Sect. 4. Simulation results are presented in Sect. 5 and, finally, Sect. 6 concludes the paper.

2 Literature Review

The new requirements for ultra-high-capacity and high-reliability wireless communication have led to comprehensive research on 5G communications. In [5], the authors reviewed the development of DL solutions for 5G communications and then suggested successful DL-based schemes for 5G scenarios. Specifically, the key ideas of many relevant DL-based communication approaches are discussed along with the research opportunities and challenges. In [10], the author proposed a novel DL-based CE technique for high-dimensional communication signals that needs no training. By fitting the parameters of a specially built Deep Neural Network (DNN), the deep channel estimator produces a less noisy signal from the received signal. This generated signal is then divided element-wise by the pilot symbols to estimate the channel, just as in an LS estimate.

The authors in [11] attempted to apply effective DL techniques to learn the features of frequency-selective wireless channels and to tackle nonlinear distortion and interference in OFDM systems. Unlike conventional OFDM receivers, the proposed DL approach estimates the CSI implicitly and recovers the transmitted symbols directly. A DL model is first trained offline to address channel distortion using data generated by simulation based on channel statistics, and is then used directly to recover the data transmitted online [11].

In [17], a DL-based estimation algorithm for doubly selective channels is proposed using a DNN. The numerical results show that the proposed DL-based algorithm performs better than the existing estimator in both efficiency and robustness, especially when the channel statistics are time-varying. In [18], the authors introduced a DL-based CE algorithm for communication systems that finds the full channel state from the pilot values. It regards the time–frequency response of a fading channel as a 2D image and applies image super-resolution and denoising image restoration algorithms.

In [19], the authors propose a DL-based CE network called ChanEstNet. ChanEstNet uses a convolutional neural network (CNN) to extract channel response feature vectors and a recurrent neural network (RNN) for CE. The authors use a large amount of high-speed channel data to train the learning network offline, fully exploit the channel information in the training samples, make the network learn the characteristics of fast time-varying and non-stationary channels, and better track the features of channels changing in high-speed environments. By examining the time and frequency correlation of wireless fading channels, the authors of [20] offer a novel DL-based approach to channel estimation and signal detection in OFDM systems. In a PACE setting, a CE network (CENet) is proposed to replace the traditional interpolation procedure. The transmitted signal is then recovered using a Channel Conditioned Recovery Network (CCRNet), which is constructed based on the CENet results.

The authors of [21] suggest a fully-connected deep neural network (FC-DNN) to handle time-varying channels in OFDM systems. By treating the OFDM receiver as a black box, this technique simplifies the architecture of OFDM systems. Various approaches to applying DL to the physical layer are discussed in [22, 23]. In [24], the author applies PACE in OFDM systems using DL LSTM neural networks, referred to as a DLLSTM-based channel state estimator (CSE), and evaluates the performance of the proposed estimator with three different DL optimization algorithms: SGDm, RMSProp, and Adam. In the current study, our simulation is based on different channel models: Rayleigh fading, TDL, CDL and 3GPP TR38.901.

3 Background

3.1 Channel Estimation in OFDM Systems

CE is one of the key techniques used in the OFDM framework. Channel estimation is defined as characterizing a mathematically modeled channel. CE algorithms are usually used to find the channel impulse response or the channel frequency response. Figure 1 illustrates the general concept of CE. The design requirements for a channel estimator are to minimize the Mean Squared Error (MSE) and the computational complexity [25].

Fig. 1 General channel estimation

CE can be classified into PACE, blind channel estimation (BCE), and Decision-Directed Channel Estimation (DDCE), as shown in Fig. 2 [25]. PACE is the most common method. It transmits a known signal from the transmitter, where the pilot is the reference signal known to both the transmitter and the receiver. It can be applied to any wireless communication system and has very low computational complexity. The key drawback, however, is the decrease in the transmission rate, because non-data symbols (pilots) are added. One design challenge for PACE is thus to minimize the number of pilots while still estimating the channel precisely [25].

Fig. 2 Types of channel estimation algorithms

Pilot assignment methods can mainly be split into block type and comb type, as shown in Fig. 3 [1, 26].

Fig. 3 Block-type pilot and comb-type pilot
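The two pilot arrangements can be illustrated with a small sketch. It assumes the usual definitions: in the block-type arrangement, every subcarrier of a periodically recurring OFDM symbol carries a pilot, while in the comb-type arrangement every OFDM symbol carries pilots on a regularly spaced subset of subcarriers. The function names, grid sizes and spacings below are hypothetical and chosen only for illustration.

```python
def block_type_grid(num_symbols, num_subcarriers, symbol_period):
    """Return a time x frequency grid; True marks a pilot.

    Every `symbol_period`-th OFDM symbol consists entirely of pilots.
    """
    return [[t % symbol_period == 0 for _ in range(num_subcarriers)]
            for t in range(num_symbols)]


def comb_type_grid(num_symbols, num_subcarriers, subcarrier_spacing):
    """Return a time x frequency grid; True marks a pilot.

    Every OFDM symbol carries a pilot on every `subcarrier_spacing`-th subcarrier.
    """
    return [[k % subcarrier_spacing == 0 for k in range(num_subcarriers)]
            for _ in range(num_symbols)]


def render(grid):
    """Draw the grid with 'P' for pilots and 'd' for data (one row per OFDM symbol)."""
    return "\n".join("".join("P" if cell else "d" for cell in row) for row in grid)


if __name__ == "__main__":
    print("Block type:")
    print(render(block_type_grid(num_symbols=4, num_subcarriers=8, symbol_period=2)))
    print("Comb type:")
    print(render(comb_type_grid(num_symbols=4, num_subcarriers=8, subcarrier_spacing=4)))
```

In this sketch, rows correspond to OFDM symbols (time) and columns to subcarriers (frequency).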

Three considerations are needed in designing PACE for an OFDM system. First, an appropriate pilot pattern must be chosen. Second, PACE algorithms with low complexity must be identified. Third, a proper channel equalization method must be established for successful CE [1].

The current research focuses on evaluating CE techniques, namely MMSE channel estimation, LS channel estimation, and DL channel estimation, for 5G channel models using PACE [27].

3.2 Conventional Channel Estimation Methods

The conventional CE techniques used in this paper are:

  1. Least Square (LS) Estimator: we derive the LS estimate of h as [1]

    $${\widehat{h}}_{LS}={F}^{-1}{X}^{-1}Y$$
    (1)

where \(Y={[{Y}_{0},{Y}_{1},{Y}_{2},\dots ,{Y}_{N-1}]}^{T}\) is the output signal vector after the OFDM demodulator, \(T\) denotes the transpose, \(X=\mathrm{diag}[{X}_{0},{X}_{1},\dots ,{X}_{N-1}]\) is the diagonal matrix of pilots, N is the number of pilots in one OFDM symbol, and \(\widehat{h}\) is the estimated channel impulse response at the pilots of one OFDM symbol. Also, F is the Fourier transform matrix given in Eq. (2),

$$F=\left[\begin{array}{ccc}{W}_{N}^{00}& \cdots & {W}_{N}^{0(N-1)}\\ \vdots & \ddots & \vdots \\ {W}_{N}^{(N-1)0}& \cdots & {W}_{N}^{(N-1)(N-1)}\end{array}\right]$$
(2)

where \({W}_{N}^{ik}=\frac{1}{\sqrt{N}}{e}^{-j2\pi \frac{ik}{N}}\). Because \({\widehat{H}}_{LS}=F {\widehat{h}}_{LS}\), where \({\widehat{H}}_{LS}\) is the LS estimate of the channel frequency response [1],

$${\widehat{H}}_{LS}={X}^{-1}Y={\left[\frac{{Y}_{0}}{{X}_{0}}\ \frac{{Y}_{1}}{{X}_{1}}\ \frac{{Y}_{2}}{{X}_{2}}\ \cdots \ \frac{{Y}_{N-1}}{{X}_{N-1}}\right]}^{T}$$
(3)

The main benefit of the LS algorithm is its simplicity, because noise and ICI are not considered. Thus, the LS estimate is computed with very low complexity and without any knowledge of the channel statistics, but it suffers from a high MSE. Generally, the LS method is used to obtain initial channel estimates at the pilot subcarriers, which are then further improved by various methods [1].
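As a minimal sketch of Eq. (3), the LS estimate at the pilot subcarriers reduces to an element-wise division of the received pilots by the known pilots. The function name and the four-pilot noiseless example are hypothetical and serve only to illustrate the formula.

```python
def ls_estimate(received, pilots):
    """Per-subcarrier LS estimate H_LS[k] = Y[k] / X[k], as in Eq. (3)."""
    if len(received) != len(pilots):
        raise ValueError("received and pilots must have the same length")
    return [y / x for y, x in zip(received, pilots)]


# Hypothetical example: four QPSK-like pilots passed through a noiseless channel.
true_channel = [1 + 0j, 0.5 - 0.5j, -0.25j, 0.8 + 0.1j]
pilots = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
received = [h * x for h, x in zip(true_channel, pilots)]

estimate = ls_estimate(received, pilots)

if __name__ == "__main__":
    for k, (h, h_hat) in enumerate(zip(true_channel, estimate)):
        print(f"subcarrier {k}: true {h:.3f}, LS estimate {h_hat:.3f}")
```

Without noise the estimate matches the true channel up to rounding. With noise, the term W(k)/X(k) is added to each estimate, which is the source of the high MSE mentioned above.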

  2. Minimum Mean Square Error (MMSE) Estimator: to minimize the MSE, the MMSE estimator uses the second-order statistics of the channel conditions. The general form of the MMSE estimator can be expressed as [1]

$${\widehat{H}}_{MMSE}={R}_{HY}{R}_{YY}^{-1}Y={R}_{HH}{({R}_{HH}+{ \sigma }_{N}^{2}{({X}^{H}X)}^{-1})}^{-1}{\widehat{H}}_{LS}$$
(4)

where \({R}_{HH}\) and \({R}_{YY}\) denote the auto-covariance matrices of H and Y, respectively, \({R}_{HY}\) is the cross-covariance matrix between H and Y, \({\sigma }_{N}^{2}\) is the noise variance, and the superscript H denotes the conjugate transpose. Especially at low \(\frac{{E}_{b}}{{N}_{o}}\), the MMSE estimator performs much better than the LS estimator and can achieve a gain of 10–15 dB over LS. However, because of the required matrix inversions, the computation becomes very complex as the number of subcarriers in the OFDM system increases. Hence, high computational complexity is an important drawback of the MMSE estimator [1].
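The smoothing effect of Eq. (4) can be seen in a simplified special case. The sketch below assumes uncorrelated subcarriers with equal channel variance, i.e. \({R}_{HH}={\sigma }_{h}^{2}I\), which is an assumption made here for illustration and not the general case treated in the paper. Under that assumption, and with diagonal X, the matrix in Eq. (4) becomes diagonal and the MMSE estimate is a per-subcarrier scaling of the LS estimate. The function name and all numeric values are hypothetical.

```python
def mmse_estimate_uncorrelated(h_ls, pilots, channel_var, noise_var):
    """MMSE estimate from Eq. (4) for the special case R_HH = channel_var * I.

    Each LS estimate is scaled by
        channel_var / (channel_var + noise_var / |X_k|^2),
    so no matrix inversion is needed in this simplified setting.
    """
    estimate = []
    for h, x in zip(h_ls, pilots):
        weight = channel_var / (channel_var + noise_var / abs(x) ** 2)
        estimate.append(weight * h)
    return estimate


if __name__ == "__main__":
    h_ls = [2 + 2j, -1 + 0.5j, 0.3 - 0.7j]   # hypothetical LS estimates
    pilots = [1, -1, 1j]                      # unit-power pilots
    for noise_var in (0.0, 0.1, 1.0):
        print(noise_var, mmse_estimate_uncorrelated(h_ls, pilots, 1.0, noise_var))
```

As the noise variance grows, the weight shrinks and the estimate is pulled toward zero, which is the prior mean assumed for the channel. With a general \({R}_{HH}\), the full matrix inversion in Eq. (4) is required.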

3.3 Deep Neural Network Based Channel Estimation

A DNN is an artificial neural network (ANN) with several hidden layers between the input and output layers. Each hidden layer contains multiple neurons. The output of a neuron is the weighted sum of its inputs passed through a nonlinear function [15]. In general, the nonlinear functions used are the Sigmoid function and the Rectified Linear Unit (ReLU) function. The ReLU function is expressed as \({f}_{R}(x)=\max (0,x)\) and the Sigmoid function as \({f}_{S}(x)=\frac{1}{1+{e}^{-x}}\) [28]. Each connection between neurons carries a weight, which determines the influence of the input value. Neurons use an activation function to "standardize" the neuron output. A large data set is needed to train a neural network. Iterating through the data set and comparing the network outputs with the expected outputs produces a cost function, which shows how far the network is from the actual output. Using gradient descent, the weights between neurons are updated after every iteration through the data set to reduce the cost function [29]. A DNN is characterized by a larger number of hidden layers in the neural network. As the number of hidden layers increases, the ability of the network to recognize and estimate improves [28].

The proposed DNN architecture for CE is shown in Fig. 4 [15]. Figure 4 shows a fully connected feedforward DNN with \(\mathcal{L}\) layers: an input layer, \(\mathcal{L}-2\) hidden layers, and an output layer. \({W}_{\iota }\) is the \({n}_{\iota }\times {n}_{\iota -1}\) weight matrix associated with the \((\iota -1)\)th and \(\iota\)th layers, where \({n}_{\iota }\) is the number of neurons in the \(\iota\)th layer, and \({b}_{\iota }\) is the bias vector for the \(\iota\)th layer. Since a single execution of the DL algorithm operates on a batch of data, we denote \(V\) as the batch size and \(v\ (0\le v\le V-1)\) as the serial index. Let \(x(v)\) represent the input and \(y(v)\) the label of the DNN at index \(v\). The DNN output is an estimate of \(y(v)\), which can be expressed mathematically as in Eq. (5)

Fig. 4 The DNN structure, where the circles labeled with “+ 1” are the bias units

$$y(v)=f(x(v),W)={f}^{(\mathcal{L}-1)}\left({f}^{(\mathcal{L}-2)}\left(\cdots {f}^{(1)}(x(v))\right)\right)$$
(5)
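The nested composition in Eq. (5) can be sketched as a loop over layers, each applying an affine map followed by a nonlinearity. This is a minimal illustration rather than the network used in the paper: the layer sizes, weights and the choice of ReLU for the hidden layer and Sigmoid for the output layer are hypothetical.

```python
import math


def relu(x):
    """Rectified Linear Unit: f_R(x) = max(0, x)."""
    return max(0.0, x)


def sigmoid(x):
    """Sigmoid: f_S(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One layer: activation(W @ inputs + b), with one weight row per output neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Apply the layers in order, i.e. f^(L-1)(...f^(1)(x)) as in Eq. (5)."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical 2-2-1 network: one ReLU hidden layer and a sigmoid output layer.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [-1.5], sigmoid),
]

if __name__ == "__main__":
    print(forward([1.0, 2.0], layers))
```

For the input [1.0, 2.0], the hidden layer produces [0.0, 1.5], and the output neuron then evaluates the Sigmoid at 0.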

4 LSTM Based Channel Estimation

In this paper, we use the LSTM model to implement the DNN-based CE. LSTM is a Recurrent Neural Network (RNN) architecture that effectively overcomes the vanishing gradient problem of a naively designed RNN [15]. An RNN is a type of ANN in which the outputs of specific nodes are fed back recursively to affect the input to the same nodes. Because its current output is based on prior computations, an RNN can retain memory. However, RNNs are known to suffer from the "vanishing gradient" problem, where the derivative of the loss function with respect to the weight parameters becomes very small. LSTMs were designed to address this issue by including gates that allow for improved gradient control and the preservation of long-range dependencies. Figure 5 shows the memory cell and the gates, which are the essential parts of the LSTM [30].

Fig. 5 Information flow in a long short-term memory (LSTM) block of the RNN

The LSTM network contains three layers: an input layer, a hidden layer and an output layer. The input layer interfaces with the input data, while the LSTM cells perform the function of the hidden layer. The prediction results are produced at the output layer.

Each LSTM cell contains an input gate \({i}_{t}\), a memory cell \({C}_{t}\), a forget gate \({f}_{t}\), and an output gate \({o}_{t}\) [16]. The cell stores values over arbitrary time intervals, and the gates regulate the flow of information into and out of the cell [15]. The RNN models its input sequence \(\{{x}_{1},{x}_{2},...,{x}_{n}\}\) by using the recurrence:

$${y}_{t}=f({y}_{t-1},{x}_{t})$$
(6)

where \({x}_{t}\) and \({y}_{t}\) are the input and output at time t, respectively. To overcome the vanishing or exploding gradient problem, gates are introduced into the recurrence function \(f\). The states of the LSTM cell are determined as follows:

$${i}_{t}=\sigma ({W}_{ix}{x}_{t}+{W}_{iy}{y}_{t-1}+{b}_{i})$$
(7)
$${f}_{t}=\sigma ({W}_{fx}{x}_{t}+{W}_{fy}{y}_{t-1}+{b}_{f})$$
(8)
$${o}_{t}=\sigma ({W}_{ox}{x}_{t}+{W}_{oy}{y}_{t-1}+{b}_{o})$$
(9)
$${\widetilde{C}}_{t}=\tanh ({W}_{cx}{x}_{t}+{W}_{cy}{y}_{t-1}+{b}_{c})$$
(10)
$${C}_{t}={f}_{t}\otimes {C}_{t-1}+{i}_{t}\otimes {\widetilde{C}}_{t}$$
(11)
$${y}_{t}={o}_{t}\otimes \tanh ({C}_{t})$$
(12)

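where the W's and b's are the weight matrices and bias vectors, \({C}_{t}\) is the current cell state and \({\widetilde{C}}_{t}\) contains the new candidate values for the cell state [30]. Here \(\sigma\) is the sigmoid function and \(\otimes\) denotes element-wise multiplication. The flowchart of the proposed LSTM neural network is shown in Fig. 6.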
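A single time step of Eqs. (7)–(12) can be sketched with plain lists. This is an illustrative implementation of the standard LSTM update and not the network trained in the paper. The parameter layout (a dictionary mapping each gate name to its input weights, recurrent weights and bias) and the all-zero example parameters are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def affine(w_x, x, w_y, y_prev, b):
    """Compute W_x x_t + W_y y_{t-1} + b, the argument shared by Eqs. (7)-(10)."""
    return [p + q + r for p, q, r in zip(matvec(w_x, x), matvec(w_y, y_prev), b)]


def lstm_step(x_t, y_prev, c_prev, params):
    """One LSTM step; params[name] = (W_x, W_y, b) for name in 'i', 'f', 'o', 'c'."""
    def gate(name, activation):
        w_x, w_y, b = params[name]
        return [activation(z) for z in affine(w_x, x_t, w_y, y_prev, b)]

    i_t = gate("i", sigmoid)        # input gate, Eq. (7)
    f_t = gate("f", sigmoid)        # forget gate, Eq. (8)
    o_t = gate("o", sigmoid)        # output gate, Eq. (9)
    c_cand = gate("c", math.tanh)   # candidate cell state, Eq. (10)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]  # Eq. (11)
    y_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                     # Eq. (12)
    return y_t, c_t


if __name__ == "__main__":
    # Hypothetical cell with 2 inputs and 1 hidden unit, all parameters zero.
    zero = ([[0.0, 0.0]], [[0.0]], [0.0])
    params = {"i": zero, "f": zero, "o": zero, "c": zero}
    print(lstm_step([1.0, -1.0], [0.0], [2.0], params))
```

With all parameters set to zero, every gate equals 0.5 and the candidate is 0, so the cell state is simply halved. This makes the role of the forget gate in Eq. (11) easy to check by hand.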

Fig. 6 The flowchart of the proposed LSTM neural network

4.1 System Architecture for DL Channel Estimation

The architecture of the OFDM system using DL-based CE and signal detection is shown in Fig. 7 [11]. The transmitted data, with pilots inserted, is converted into a data stream, and the IFFT is then used to convert the message from the frequency domain into the time domain at the transmitter. A cyclic prefix (CP) is then inserted to reduce ISI. The CP length must be greater than the maximum channel delay spread [28]. Let the multipath channel be represented by the complex random variables \({\left\{h(n)\right\}}_{n=0}^{N-1}\). The received signal is given by

$$y(n)=x(n)*h(n)+w(n)$$
(13)

where * indicates circular convolution, while x(n) and w(n) denote the transmitted signal and the noise, respectively. Finally, after the CP is removed and the FFT is performed, the received signal in the frequency domain is [11]

$$Y(k) = X(k) \cdot H(k) + W(k)$$
(14)

where Y (k), X(k), H(k), and W(k) are the DFT of y(n), x(n), h(n) and w(n), respectively.
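The step from Eq. (13) to Eq. (14) relies on the property that circular convolution in time corresponds to element-wise multiplication of the DFTs. The following noiseless sketch checks this numerically. It assumes an unnormalized DFT (with the \(1/\sqrt{N}\) normalization of Eq. (2), a constant factor \(\sqrt{N}\) would appear), and the signal and channel values are hypothetical.

```python
import cmath


def dft(seq):
    """Unnormalized DFT: S(k) = sum_n s(n) * exp(-j*2*pi*k*n/N)."""
    n = len(seq)
    return [sum(seq[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def circular_convolve(x, h):
    """Circular convolution y(n) = sum_m x(m) * h((n - m) mod N)."""
    n = len(x)
    return [sum(x[m] * h[(i - m) % n] for m in range(n)) for i in range(n)]


# Hypothetical transmitted block and 3-tap channel, zero-padded to the block length.
x = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j, 1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
h = [0.9, 0.4 - 0.2j, 0.1j, 0, 0, 0, 0, 0]

y = circular_convolve(x, h)            # Eq. (13) without noise
lhs = dft(y)                           # Y(k)
rhs = [a * b for a, b in zip(dft(x), dft(h))]   # X(k) * H(k), Eq. (14) without noise
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))

if __name__ == "__main__":
    print("largest |Y(k) - X(k)H(k)|:", max_error)
```

The cyclic prefix is what makes the linear convolution performed by the physical channel appear circular over one OFDM block, provided the CP is longer than the channel delay spread.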

Fig. 7 System model of DL-based channel estimation

First, we assume that the first OFDM block contains the pilot symbols, while the following OFDM blocks consist of the transmitted data. Together, the pilot block and the data blocks form a frame. The channel is assumed to be constant over one frame, but it varies from one frame to another. Obtaining an effective DNN model for CE involves two steps. In the first step, offline training, the model is trained with received OFDM samples that are generated with different data sequences and under various channel conditions with certain statistical properties. In the second step, online deployment, the model produces an output that recovers the transmitted data without explicitly estimating the wireless channel [28].

4.2 Model Training

The model is trained by viewing the OFDM modulation and the wireless channels as black boxes. Over the years, researchers have established various channel models that describe the channel in terms of channel statistics. For each OFDM frame, the received OFDM signal is obtained after the frame undergoes channel distortion and noise. The received signal and the original transmitted data are collected as the training data. One pilot block and one data block are given as the input of the DL model. In both the training and testing phases, we prepare what we call a "feature vector" by taking the real and imaginary parts of the complex input vector and combining them into a real-valued vector of twice the size, as shown in Fig. 8 [2]. In the training phase, these feature vectors are fed to the LSTM model in batches together with the corresponding target symbols. In the testing/prediction phase, the trained model predicts the corresponding symbol for the extracted input feature vector.
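The feature-vector construction described above can be sketched as follows. The text only states that the real and imaginary parts are combined into a real vector of double size, so the interleaved ordering (real part followed by imaginary part for each sample) is an assumption made here; concatenating all real parts followed by all imaginary parts would be an equally valid reading. The function name is hypothetical.

```python
def to_feature_vector(symbols):
    """Map N complex samples to 2N real features: [Re s0, Im s0, Re s1, Im s1, ...].

    The interleaved ordering is an assumption; the source only specifies
    that real and imaginary parts are combined into a double-size real vector.
    """
    features = []
    for s in symbols:
        features.extend([s.real, s.imag])
    return features


if __name__ == "__main__":
    received = [1 + 2j, 3 - 4j, -0.5 + 0.25j]   # hypothetical received samples
    print(to_feature_vector(received))
```

Whatever ordering is chosen, it has to be identical in the training and testing phases so that each network input always carries the same quantity.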

Fig. 8 The proposed deep neural network for estimating complex input vector

The model is trained to minimize the difference between the neural network output and the original message. The L2 loss function is given below.

$${L}_{2}=\frac{1}{N}{\sum }_{k}{(\widehat{X}(k)-X(k))}^{2},$$
(15)

where \(\widehat{X}(k)\) is the estimated data and \(X(k)\) is the original data [28]. The LSTM layer employs the sigmoid function as the gate activation function and the 'tanh' function as the state activation function. The model uses the 'adam' solver and is trained for 100 epochs. The learning rate is set to 0.01, while the gradient threshold is set to 1 to prevent gradient explosion. The LSTM network is trained with the specified training options using the "trainNetwork" function [31]. During training, the "trainNetwork" function generates a plot and shows the training metrics at each iteration. Each iteration involves a gradient estimation and a network parameter update.

Figure 9 shows the training procedure when the LSTM network is fed with the channel impulse response of the 3GPP TR38.901 channel model and the transmitted symbols.
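Two ingredients of the training setup above can be written out in a few lines: the L2 loss of Eq. (15) and gradient thresholding. The sketch is illustrative only and does not reproduce the MATLAB training routine. In particular, rescaling the gradient when its L2 norm exceeds the threshold is one common way to apply a gradient threshold and is assumed here; the function names are hypothetical.

```python
import math


def l2_loss(estimated, original):
    """Mean squared error between estimate and target, as in Eq. (15)."""
    if len(estimated) != len(original) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((e - o) ** 2 for e, o in zip(estimated, original)) / len(original)


def clip_by_norm(gradient, threshold=1.0):
    """Rescale the gradient so that its L2 norm does not exceed `threshold`.

    Norm-based rescaling is an assumption about how the threshold is applied.
    """
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= threshold:
        return list(gradient)
    scale = threshold / norm
    return [g * scale for g in gradient]


if __name__ == "__main__":
    print(l2_loss([1.0, 2.0], [1.0, 4.0]))
    print(clip_by_norm([3.0, 4.0], threshold=1.0))
```

Clipping keeps the direction of the update and limits only its size, which is why it helps against exploding gradients without otherwise changing the optimization.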

Fig. 9 The LSTM network model training

5 Simulation

5.1 Description of Simulation

  1. System parameters: Table 1 lists the OFDM system parameters used in the simulation. We assume perfect synchronization because the purpose is to observe the channel estimation performance. In addition, to prevent ISI, we chose a guard interval larger than the maximum delay spread. Different signal-to-noise ratios (SNR) and channel models are used in the simulations. In our simulation, SNR represents the ratio of energy per symbol to noise power spectral density (Es/No).

  2. Channel model: Developing and testing a wireless communication system requires knowledge of the channel model. For link-level assessments in 5G, two channel models are currently being proposed: Tapped Delay Line (TDL) and Clustered Delay Line (CDL). In the simulation, we have used the Rayleigh fading, TDL, CDL and 3GPP TR38.901 channel models.

  3. Channel estimation: The channel coefficients are estimated at the pilot subcarriers using the LS, MMSE and DL algorithms.

Table 1 Simulation parameters

5.2 Simulation Results

Three MATLAB script files are written to evaluate the performance of the DL methods for joint CE and symbol detection in OFDM wireless communication systems. The first script shows how to generate training and validation data for the DL model in a single-user OFDM system. Training and validation data are collected for a single selected subcarrier based on a pre-defined metric. Each transmitted OFDM packet contains one pilot symbol and one data symbol. Data symbols might be interleaved in the pilot sequence. Each training sample contains all symbols in a received OFDM packet and is represented by a feature vector with a data structure similar to that of the MATLAB sequence classification example using an LSTM network. The second script configures the DNN training settings. The DNN is trained on the training data for the specified subcarrier. The third script handles the model testing. For each SNR point, it generates testing data and calculates the BER for the DL, LS and MMSE methods.

In Fig. 10, an LSTM model is trained with simulation data and compared to traditional methods under various signal-to-noise ratios (SNRs). As shown in Fig. 10, in the offline training stage, the proposed LSTM estimator is trained on the collected simulation data sets (transmitted symbols and channel models). In the online deployment stage, after the training is done and the optimum weights for the LSTM are determined, the same LSTM is used for joint CE and symbol detection. The LSTM model generates an output that recovers the transmitted data without explicitly estimating the wireless channel.
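The BER computation performed for each SNR point in the testing stage amounts to counting the bit positions in which the detected bits differ from the transmitted ones. The sketch below is a generic illustration in Python and not the authors' MATLAB script. The QPSK hard-decision rule assumes a hypothetical mapping in which a negative in-phase or quadrature component corresponds to bit 1.

```python
def qpsk_demap(symbol):
    """Hard decision for one QPSK symbol under a hypothetical mapping:
    the first bit is 1 if the real part is negative, the second if the imaginary part is."""
    return [int(symbol.real < 0), int(symbol.imag < 0)]


def bit_error_rate(sent_bits, detected_bits):
    """Fraction of bit positions in which the detected bits differ from the sent bits."""
    if len(sent_bits) != len(detected_bits) or not sent_bits:
        raise ValueError("bit sequences must be non-empty and of equal length")
    errors = sum(s != d for s, d in zip(sent_bits, detected_bits))
    return errors / len(sent_bits)


if __name__ == "__main__":
    sent_bits = [0, 0, 0, 1, 1, 0, 1, 1]
    # Hypothetical equalized symbols; the third one was pushed across the real axis.
    equalized = [0.9 + 1.1j, 1.2 - 0.8j, -0.7 - 0.1j, -1.0 - 0.9j]
    detected_bits = [bit for s in equalized for bit in qpsk_demap(s)]
    print(bit_error_rate(sent_bits, detected_bits))
```

Repeating this count over many packets at each SNR value gives the points of a BER curve.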

Fig. 10 Proposed LSTM-based channel estimation method

5.2.1 Simulation Result When Using CDL Channel Model

Figures 11 and 12 show the behavior of the estimation methods when using the CDL channel model. The gain compared to DL estimation is clear (2 dB) for large pilot numbers (64). Nevertheless, the significant gain of the DL technique appears when using small pilot numbers (Fig. 11), where the other methods fail to provide a usable BER. It is worth noting that DL proves robust to fewer pilots for both the TDL and CDL channel models, which reflects the power of this method when targeting high bandwidth efficiency. The reason for the superior DL performance is that the CSI is not uniformly distributed, and the characteristics of the wireless channels can be learned from the training data generated from the model.

Fig. 11 BER curves of different channel estimation algorithms in the CDL channel model with different numbers of pilots

Fig. 12 BER curves of different channel estimation algorithms in the CDL channel model, showing the effect of the cyclic prefix

Figure 12 shows the impact on the different CE methods when the CP is removed from the system. The CP costs time and energy and is inserted into OFDM symbols to remove ISI. Since the proposed DL scheme performs joint CE and detection, we test the impact of reducing the CP length at the transmitter given that DL is used at the receiver. As Fig. 12 shows, neither MMSE nor LS can estimate the channel effectively. The accuracy tends to saturate when the SNR is over 30 dB. Although the DL method works better than MMSE and LS with a reduced CP, saturation with increasing SNR occurs as well, which indicates the importance of using the CP for all three CE methods. Again, this result indicates that the time-dispersion features of the wireless channel can be learned by the DL method in the training stage.

5.2.2 Simulation Result When Using TDL Channel Model

Figures 13 and 14 show the BER performance observed for QPSK-modulated OFDM signals in the TDL channel model. Figure 13 shows the impact of the number of pilots on the different CE methods. When 64 pilots are used for CE, the LS method closely approaches the MMSE algorithm. The DL method has the best performance and achieves a lower BER than the traditional methods above 20 dB. When 8 pilots are used, the BER curves of the LS and MMSE methods saturate when the SNR is above 20 dB, while the DL method is still able to decrease its BER with increasing SNR, which shows that DL is robust to the number of pilots used for CE. The reason for the superior DL performance is that the CSI is not uniformly distributed, and the characteristics of the wireless channels can be learned from the training data generated from the model. Figure 14 shows the impact of the CP on the different CE methods. The DL method works better in the range between 15 and 35 dB when the CP is omitted.

Fig. 13 BER curves of different channel estimation algorithms in the TDL channel model with different numbers of pilots

Fig. 14 BER curves of different channel estimation algorithms in the TDL channel model, showing the effect of the cyclic prefix

5.2.3 Simulation Result When Using 3GPP TR38.901 Channel Model

Figure 15 compares the BER of the three estimation methods when using different numbers of pilots. It is clear that the performance of the DL method is much more robust when fewer pilots are used. This feature is crucial for the real-time implementation of DL CE, as the same performance could be achieved using significantly fewer calculations.

Fig. 15 BER curves of different channel estimation algorithms in the 3GPP TR38.901 channel model with different numbers of pilots

Figure 16 shows the effect of omitting the CP from the OFDM signal for the three estimation methods. It is clear that the performance of all three methods degrades when the CP is not used.

Fig. 16 BER curves of different channel estimation algorithms in the 3GPP TR38.901 channel model, showing the effect of the cyclic prefix

5.2.4 Simulation Result When Using Rayleigh Fading Channel Model

Figures 17 and 18 show the BER performance observed for QPSK-modulated OFDM signals in the Rayleigh fading channel. Figure 17 compares the BER performance of all the methods when 64 pilots are inserted into the OFDM signal and the CP length is 16. The results show that the DL method performs very similarly to the MMSE algorithm. Figure 18 compares the BER performance of all the methods when 64 pilots are inserted into the OFDM signal and the CP is omitted. In this case, the performance of the three algorithms is approximately the same.

Fig. 17 BER curves of different channel estimation algorithms in the Rayleigh fading channel using 64 pilots

Fig. 18 BER curves of different channel estimation algorithms in the Rayleigh fading channel, showing the effect of the cyclic prefix

In summary, the simulation results in the above figures show that the DL estimator is better than the LS and MMSE estimators at estimating the channel and detecting the transmitted symbols. Furthermore, the DL estimator is robust to different pilot densities and to omission of the CP.

5.3 Computational Complexity Analysis

One of the most critical aspects of estimator efficiency is computational complexity. The computational complexities of all estimators are evaluated in this section. The computational complexity is calculated in terms of real-valued mathematical operations such as multiplication/division, summation/subtraction, and other operations needed to estimate the channel for the obtained OFDM symbol [32].

  1. Computational complexity for LSTM estimator:

Figure 19 shows the layer parameters of the proposed LSTM neural network.

Fig. 19 LSTM architecture

When working with LSTM, we evaluate the online computational complexity by the number of multiplications needed to compute the activations of all neurons (vector products) in all network layers. The transition between the \({(\iota -1)}\)th and \({\iota }\)th layers requires \({\mathrm{J}}_{\iota -1}{\mathrm{J}}_{\iota }\) multiplications for the linear transform, where \({\mathrm{J}}_{\iota }\) is the number of neurons in the \({\iota }\)th layer. The additional operations in the DNN are simple and include the addition of the bias and the evaluation of the activation functions. Therefore, the total number of real-valued multiplications and summations in the DNN is given by [32]:

$${N}_{mul/sum}=2\sum_{\iota =2}^{L}{J}_{\iota -1}{J}_{\iota }$$
(16)

As shown in Fig. 19, we use a frame size of 128 samples representing 64 subcarriers. Every subcarrier contains 2 symbols, so each OFDM frame contains 128 bits. Therefore, the total number of multiplications required to estimate the channel with the DL method is 17,572 per OFDM symbol.
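Equation (16) is straightforward to evaluate once the layer widths are known. The sketch below uses hypothetical layer sizes to show the calculation. It is not intended to reproduce the figure of 17,572, since the layer widths of Fig. 19 are not listed in the text.

```python
def dnn_operation_count(layer_sizes):
    """Evaluate Eq. (16): 2 * sum over consecutive layers of J_{l-1} * J_l."""
    return 2 * sum(prev * cur for prev, cur in zip(layer_sizes, layer_sizes[1:]))


if __name__ == "__main__":
    # Hypothetical widths: 4 inputs, 3 hidden neurons, 2 outputs.
    print(dnn_operation_count([4, 3, 2]))  # 2 * (4*3 + 3*2) = 36
```

Because each term is a product of neighboring layer widths, the count grows quadratically when all layers are widened together, but only linearly with the number of layers.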

  2. Computational complexity for LS estimator:

From Eq. (3), we obtain

$$O({\widehat{H}}_{LS})=O({X}^{-1}Y)=n+ {n}^{2}$$
(17)

where the \(O\) notation represents the computational complexity and \(n\) represents the matrix size. Equation (17) consists of one inverse operation and one multiplication operation. The inverse requires \(n\) operations because the diagonal matrix X has n nonzero entries, and the multiplication requires \({n}^{2}\) operations. The computational complexity of the \({\widehat{H}}_{LS}\) estimator is therefore \(n+ {n}^{2}\). Here, n = 64, so the total number of multiplications required to estimate the channel with the LS method is 4,160 per OFDM symbol [33].

  3. Computational complexity for MMSE estimator:

From Eq. (4), we obtain

$$O({\widehat{H}}_{MMSE})=O({R}_{HY}{R}_{YY}^{-1}Y)={n}^{3}+{n}^{2}+{n}^{3}$$
(18)

In Eq. (18), the multiplication of two square matrices, (n × n) ∗ (n × n), requires O(\({\mathrm{n}}^{3}\)) operations, the matrix–vector multiplication (n × n) ∗ (n × 1) requires O(\({\mathrm{n}}^{2}\)) operations, and the inverse operation requires \({\mathrm{n}}^{3}\) operations. The computational complexity of the \({\widehat{H}}_{\mathrm{MMSE}}\) estimator is therefore \({\mathrm{n}}^{3}+{\mathrm{n}}^{2}+{\mathrm{n}}^{3}\). Here, n = 64, so the total number of multiplications required to estimate the channel with the MMSE method is 528,384 per OFDM symbol [33].
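The operation counts quoted for n = 64 follow directly from Eqs. (17) and (18) and can be checked with a few lines of arithmetic. The function names are hypothetical.

```python
def ls_operations(n):
    """Eq. (17): n operations for the diagonal inverse plus n^2 for the product."""
    return n + n ** 2


def mmse_operations(n):
    """Eq. (18): n^3 (matrix product) + n^2 (matrix-vector product) + n^3 (inverse)."""
    return n ** 3 + n ** 2 + n ** 3


if __name__ == "__main__":
    n = 64
    print(ls_operations(n))    # 4160
    print(mmse_operations(n))  # 528384
```

For n = 64 the cubic terms dominate: each contributes 262,144 operations, compared with 4,096 for the quadratic term.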

The difference in computational complexity between the estimation methods is substantial. The LS estimator has the lowest computational complexity because it consists of only one multiplication and one inverse operation, but it is the least accurate estimation method. The LSTM estimator has lower computational complexity than the MMSE estimator because it avoids the large number of matrix multiplications used in MMSE, while providing better performance in terms of SER.

Note that computational complexity is related to computational delay: when the computational complexity increases, the computational delay also increases, and vice versa. The computational delay of the proposed LSTM and of the conventional systems is related to the number of operations involved in executing each system. The LSTM estimator operating on a 256-sample input requires the Simulink software to perform a certain number of multiplications for each input frame. The actual amount of time that these operations consume depends heavily on the performance of both the computer hardware and the underlying software layers, such as the MATLAB environment and the operating system. Therefore, the computational delay for a particular model can vary from one platform to another. As computed above, the number of multiplications required to estimate the channel per OFDM symbol is 17,572 for the DL method, 4,160 for the LS method and 528,384 for the MMSE method. The MMSE estimator therefore incurs more delay than the LS and LSTM estimators.

6 Conclusion

In this paper, a DL channel estimation technique based on the LSTM training algorithm has been developed for OFDM systems over different channel models. The effectiveness of different CE techniques for next-generation 5G cellular mobile communications based on the OFDM modulation scheme has been evaluated by computer simulation in MATLAB for different pilot arrangements, CE schemes and operating parameters. The BER of the DL-based estimator is obtained and compared with that of the LS and MMSE estimation techniques. Three channel models have been used in the evaluation, namely the TR38.901, CDL and TDL channel models. The model is trained offline on simulated data and views OFDM and the wireless channels as black boxes. DL proved noticeably more robust when a small number of pilots is used to estimate the channel. Also, both the DL and MMSE methods prove to be more robust to shorter CPs than LS.

DL is therefore a promising CE method for cellular systems. Reducing the computational power required for estimation and the time required to train the model are two crucial steps towards the use of DL in real-time wireless communications. In our simulation results, DL detects the transmitted symbols with a performance comparable to that of the MMSE estimator. In addition, when fewer training pilots are used or the CP is omitted, DL becomes more reliable than the traditional methods. In short, DL is a promising method for CE and signal detection in wireless communications.