1 Introduction

Over the past few years, providing customers with access to broadband wireless communication services has become a top priority for businesses. As a result, researchers have focused on developing new wireless technologies that can handle high data rates while remaining robust to radio frequency (RF) impairments. In recent years, multi-carrier orthogonal frequency division multiple access (OFDMA) schemes have emerged as the dominant principle for broadband wireless applications due to their high spectral efficiency, obtained by selecting a special set of overlapping orthogonal subcarriers [1].

It is challenging to perfectly recover the transmitted data at the receiver side because of the significant inter-symbol interference (ISI) that arises between successively transmitted symbols in the multi-path environment of wireless communication channels. As a result, it is crucial for wireless communication systems to address the ISI issue. Hence, strong channel equalization techniques are indispensable for mitigating the detrimental consequences of ISI.

The objective of channel equalization is to produce a nearly flat frequency domain (FD) response from the cascade of the channel and the equalizer, thereby minimizing or eliminating the negative effects of ISI in multi-path fading channels. Various types of equalizers, including linear and nonlinear equalizers, are used in digital broadband wireless communication receivers [2].

Furthermore, channel equalization can be viewed as a classification problem in which an equalizer is built as a decision-making device to reconstruct the symbol sequence with the highest possible accuracy [3]. Complex classification tasks are within the capabilities of artificial neural networks (ANNs) because they can form arbitrary nonlinear decision boundaries [3, 4]. In general, ANN equalizers are superior to linear and nonlinear equalizers in terms of equalizer performance and symbol error rate (SER) [5,6,7,8].

Machine learning (ML) [9, 10] techniques, especially deep learning (DL) ANN-based methods, have been significantly developed to aid in the resolution of numerous challenging issues, including face recognition [11, 12], image synthesis and semantic manipulation [13], sentiment classification [14], image recovery [15], digital image augmentation [16], and many other areas. DL uses different kinds of neural networks, such as convolutional neural networks (CNNs) [17, 18], multilayer perceptrons (MLPs) [19], and recurrent neural networks (RNNs) [20], to learn abstract features from data. Additionally, the availability of high-speed computational power as well as the effectiveness of DL in different fields have prompted its utilization in the development of strong broadband wireless communication systems [21, 22]. Numerous researchers have proposed the use of DL in the design of broadband wireless communication systems and exhibited enhanced bit error rate (BER) results. In this regard, deep ANNs have recently received a lot of attention in the field of channel equalization because of their ability to accomplish nonlinear mappings between input and output domains [3, 23, 24].

In this context, the deep ANN approach is a good choice among the available channel equalization options. However, some concerns and questions still require answers, such as the following:

  1. Is it possible to improve the performance gain of the equalization process of the DL model in terms of BER by changing the activation functions (AFs)?

  2. Is it possible to improve the learning process by varying the loss functions, and how does this affect the robustness and efficiency of the proposed DL model?

1.1 Motivations and contributions

Hochreiter and Schmidhuber [25] introduced long short-term memory (LSTM), an RNN architecture that has been proven to work efficiently for different learning problems, particularly those involving sequential data [26]. The LSTM structure contains blocks, which are sets of recurrently interconnected nodes. In RNNs, the gradient of the error function can decay (or grow) exponentially with time, which is known as the vanishing (or exploding) gradient problem. LSTMs reconfigure their network units to address this issue. Each LSTM block is made up of one or more self-connected memory cells, as well as input, forget, and output multiplicative gates. The gates improve performance by giving the memory cells more time to store and retrieve data [26].

LSTMs and bidirectional LSTMs have considerable impacts in a wide range of applications, particularly classification ones. For example, these networks can be used in online mode detection [27], sound classification [28, 29], and handwriting recognition [30, 31]. Additionally, LSTMs are utilized for speech synthesis [32], acoustic modeling [33], emotion identification [34], and speech translation [35]. Moreover, these networks are used for protein structure prediction [36, 37], language modeling [38], human activity analysis [39], video and audio data processing [40], and have been successfully utilized in 5G wireless communication systems [41,42,43].

In general, a neural network's performance depends on a variety of aspects, including the network's structure, the learning algorithm, and the activation functions (AFs) used in each node. The importance of AFs has not received as much attention in neural network research as learning algorithms and architectures have [44,45,46], although AFs are essential to NNs because they assist in learning abstract features through nonlinear transformations [46]. The value of the AFs determines the decision boundaries as well as the total input and output signal strength of the node. Choosing the right AFs can affect how well networks perform, how complex they are, and how well the learning algorithms converge [45, 47].

Throughout this work, we formulate the channel equalization problem as a DL task in the modified version of orthogonal frequency division multiple access (OFDMA) known as single-carrier FDMA (SC-FDMA), which offers a moderate peak-to-average power ratio (PAPR) compared to OFDMA and has been adopted in the long-term evolution (LTE) standard for uplink (UL) transmission. In the DL model, the channel equalization and signal detection processes are treated as a black box, and their functions are approximated by a DNN model based on the recurrent feedback LSTM-NN. This model performs equalization and symbol decoding simultaneously, even though it has no knowledge of the channel state information (CSI). The DL model takes features from the SC-FDMA system's received messages and labels them based on the constellation map used at the transmitter.

In this study, we evaluate the performance of several AFs that improve the learning process of the DL model by mitigating the vanishing gradient issue and leading to more accurate classifications than the traditional ones. These AFs are used at the LSTM block's input and output in place of the currently employed "tanh" AF, which is known as the state activation function (SAF). Thus, we build a reliable SC-FDMA wireless communication system using the modified LSTM DNNs. Finally, simulation findings demonstrate that our proposed scheme outperforms other widely employed signal equalization schemes in terms of bit error rate (BER). This effective illustration demonstrates the value of DL in SC-FDMA systems.

In summary, our contributions are:

  1. We construct a novel LSTM network with different SAFs in the equalization and symbol detection process as an alternative to the conventional hyperbolic tangent (tanh) function.

  2. We construct a reliable and efficient SC-FDMA receiver that implicitly performs combined channel equalization and symbol detection.

  3. We evaluate the influence of alternative optimization algorithms, such as Adam, RMSProp, and SGdm, on the learning stage of the proposed network, and consequently on the equalization and symbol detection performance of the deep network, in order to produce the most efficient and reliable model.

  4. We assess the effects that varying loss functions, e.g., cross-entropy and sum of squared errors, have on the learning process and how this affects the robustness and efficiency of the proposed model.

  5. We compare the performance of the proposed framework with that of linear equalizers (LEs) such as zero-forcing (ZF) and minimum mean squared error (MMSE).

  6. To determine how well the proposed DL model works, we compare its BER performance with that of other existing NN-based blind equalization algorithms, namely the convolutional neural network-based (CNN-based) blind equalization algorithm described in [48] and the Bi-LSTM-based equalization algorithm described in [24].

The remainder of the paper is organized as follows: Sect. 2 describes the methods, including the system description, the DL model, and the activation functions subsections. Sect. 3 introduces the offline training of the suggested scheme. The results and discussions are presented in Sect. 4. Finally, Sect. 5 concludes the study.

2 Methods

2.1 System model

Figure 1 shows the proposed SC-FDMA system according to [49]. The system has M subcarriers in total. Each user, out of the Nu users, is assigned N subcarriers, where M = Nu × N. All of this is achieved just after the N-point FFT transformation. Following the M-point IFFT, a cyclic prefix of length Lcp, equal to or greater than the channel impulse response length Lch, is inserted. The formula \({g}_{k}={F}_{M}^{H}{T}_{k}{F}_{N}{s}_{k}\) represents the time domain (TD) transmitted signal that corresponds to the kth user in vector form, without the Lcp, where sk is the kth user's N × 1 symbol vector, Tk is an M × N subcarrier mapping matrix, and \({F}_{N}\) and \({F}_{M}^{H}\) are the FFT and IFFT matrices, respectively, with dimensions N × N and M × M. Assume that hk is the (Lch × 1) impulse response of the channel between the kth user and the base station, with maximum delay spread Lch smaller than Lcp to completely eliminate the ISI. At the receiving side, the process is reversed. The CP is first removed, after which the SC-FDMA symbols are transformed into the FD by an M-point FFT, along with subcarrier demapping to extract the FD received signal for the kth user. The FD received signal is then equalized using a conventional technique, such as in [49], to mitigate the effects of the ISI. Finally, an N-point IFFT transforms the signal back to the TD, and demodulation recovers the kth user's original transmitted symbols.
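The following minimal sketch illustrates the transmit-side equation \({g}_{k}={F}_{M}^{H}{T}_{k}{F}_{N}{s}_{k}\) and the CP insertion numerically. The dimensions, normalization factors, and the localized subcarrier mapping used here are illustrative assumptions, not the exact settings of Table 2.

```python
import numpy as np

N, Nu = 4, 16          # subcarriers per user and number of users (assumed values)
M = N * Nu             # total subcarriers
Lcp = 8                # assumed cyclic-prefix length (>= channel length Lch)
k = 0                  # user index

# QPSK symbol vector s_k (N x 1)
bits = np.random.randint(0, 2, (N, 2))
s_k = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

# N-point DFT (F_N) spreads the user's symbols over the frequency domain
S_k = np.fft.fft(s_k) / np.sqrt(N)

# Subcarrier mapping T_k: here a localized mapping onto subcarriers k*N .. (k+1)*N - 1
X = np.zeros(M, dtype=complex)
X[k * N:(k + 1) * N] = S_k

# M-point IDFT (F_M^H) returns the signal to the time domain
g_k = np.fft.ifft(X) * np.sqrt(M)

# Cyclic-prefix insertion: copy the last Lcp samples to the front of the frame
tx_frame = np.concatenate([g_k[-Lcp:], g_k])
```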

Fig. 1
figure 1

The proposed SC-FDMA scheme

Instead of using traditional channel equalization techniques, the proposed method uses a DNN model. This creates an end-to-end approach that can recover the original information directly from the received signal, without explicitly modeling the channel equalization and symbol detection stages.

2.2 DL model

The LSTM NN structure is covered in this section as a DL model for combined channel equalization and symbol detection. The proposed DL LSTM-based channel equalizer is trained offline using simulated data.

The LSTM network is a type of recurrent neural network that has the ability to learn long-term correlations among time step sequences [25]. Various LSTM-based systems have been designed to tackle issues such as speech recognition, handwriting recognition, and others [50,51,52,53]. In Fig. 2, we see the single-cell LSTM block, which is a collection of recurrently interconnected nodes.

Fig. 2
figure 2

LSTM neural network architecture

At time \(t\), the input vector \({x}_{t}\) is inserted in the network and the mathematical model for the LSTM-NN setup is given by the following six equations as in [54].

$$i_{t} = \sigma_{g} \left( {w_{i} x_{t} + R_{i} h_{t - 1} + b_{i} } \right)$$
(1)
$$o_{t} = \sigma_{g} \left( {w_{o} x_{t} + R_{o} h_{t - 1} + b_{o} } \right)$$
(2)
$$g_{t} = \sigma_{c} \left( {w_{g} x_{t} + R_{g} h_{t - 1} + b_{g} } \right)$$
(3)
$$f_{t} = \sigma_{g} \left( {w_{f} x_{t} + R_{f} h_{t - 1} + b_{f} } \right)$$
(4)
$$c_{t} = f_{t} \odot c_{t - 1} + i_{t} \odot g_{t}$$
(5)
$$h_{t} = o_{t} \odot \sigma_{c} \left( {c_{t} } \right)$$
(6)

where \(i, o,\) and \(f\) represent the input, output, and forget gates, respectively. The forget and input gates enable the LSTM NN to effectively store long-term memory. The input gate determines the information that will be combined with the previous LSTM cell state \({c}_{t-1}\) to obtain a new cell state \({c}_{t}\), based on the current cell input \({x}_{t}\) and the previous cell output \({h}_{t-1}\). The output gate determines the current cell output \({h}_{t}\) by using the previous cell output \({h}_{t-1}\), the current cell state \({c}_{t}\), and the input \({x}_{t}\). The forget gate allows information to be forgotten and discarded based on the current input \({x}_{t}\) and the cell output \({h}_{t-1}\) of the previous step. Using the forget and input gates, the LSTM decides which information is discarded and which is retained. \({g}_{t}\), defined in Eq. 3, is the block input (cell candidate) at time \(t\); it is a tanh layer and, together with the input gate in Eq. 5, decides on the new information that should be stored in the cell state. \({c}_{t}\) is the cell state at time \(t\), which is updated from the old cell state according to Eq. 5. Finally, \({h}_{t}\) is the cell (block) output at time \(t\).

The block output \({h}_{t}\) is recurrently connected back to the block input \({g}_{t}\) and to all of the gates (\(i, o,\) and \(f\)). \({\sigma }_{g}\) and \({\sigma }_{c}\) represent the gate activation function (sigmoid function) and the state activation function (tanh function), respectively. \(\odot\) denotes the Hadamard product (element-wise multiplication). \(W={[{w}_{i}\,{w}_{f}\,{w}_{g}\,{w}_{o}]}^{T}\), \(b={[{b}_{i}\,{b}_{f}\,{b}_{g}\,{b}_{o}]}^{T}\), and \(R={[{R}_{i}\,{R}_{f}\,{R}_{g}\,{R}_{o}]}^{T}\) are the input weights, the biases, and the recurrent weights, respectively.
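As a minimal sketch, one step of the LSTM block in Eqs. (1)–(6) can be written as follows, with the state activation function \({\sigma }_{c}\) left as a pluggable argument so that tanh can later be swapped for any alternative AF. The shapes and the parameter layout are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, R, b, saf=np.tanh):
    """One LSTM time step following Eqs. (1)-(6); `saf` is the state
    activation function sigma_c (tanh by default)."""
    w_i, w_f, w_g, w_o = W          # input weights for the i, f, g, o paths
    R_i, R_f, R_g, R_o = R          # recurrent weights
    b_i, b_f, b_g, b_o = b          # biases

    i_t = sigmoid(w_i @ x_t + R_i @ h_prev + b_i)   # input gate, Eq. (1)
    o_t = sigmoid(w_o @ x_t + R_o @ h_prev + b_o)   # output gate, Eq. (2)
    g_t = saf(w_g @ x_t + R_g @ h_prev + b_g)       # cell candidate, Eq. (3)
    f_t = sigmoid(w_f @ x_t + R_f @ h_prev + b_f)   # forget gate, Eq. (4)
    c_t = f_t * c_prev + i_t * g_t                  # cell state update, Eq. (5)
    h_t = o_t * saf(c_t)                            # block output, Eq. (6)
    return h_t, c_t
```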

2.3 Activation functions

The sigmoid and hyperbolic tangent functions are the most frequently used activation functions in neural networks. However, a number of separate studies have looked into other activation functions [44,45,46].

In this article, we examine how well the DNN LSTM works when these activation functions are used instead of the state activation function [the hyperbolic tangent function (tanh)] of the basic LSTM block, to effectively combine channel equalization and symbol detection in SC-FDMA wireless communication systems. Table 1 lists the candidate activation functions considered in this work: tanh, Gaussian, GELU, Cloglogm, Modified Elliott, Elliott, Bi-tanh1, Bi-tanh2, Rootsig, Softsign, Wave, and Aranda [44,45,46,47, 54,55,56,57,58,59].

Table 1 Label, definition, and corresponding derivative, for each activation function
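For orientation, a few of the better-known candidates can be written as follows. This is a minimal sketch; the exact definitions and derivatives used in this work are those listed in Table 1.

```python
import numpy as np

def softsign(x):
    # Softsign: a smooth, bounded curve similar to tanh but with polynomial tails
    return x / (1.0 + np.abs(x))

def gaussian(x):
    # Gaussian activation: bell-shaped, bounded in (0, 1]
    return np.exp(-x ** 2)

def gelu(x):
    # GELU (tanh approximation): x weighted by an approximate Gaussian CDF
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))
```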

3 Offline training of the suggested DL model

Due to the lengthy training period required for the proposed model and the large number of parameters that must be tuned during training, e.g., weights and biases, training must be conducted offline. The trained model is then utilized to extract the transmitted data during online implementation.

For the bulk of machine learning tasks, obtaining a huge amount of labeled data for training is a challenge. In contrast, training data for channel equalization problems can easily be generated by simply running a simulation. Obtaining the training data is straightforward once the channel parameters and model are known.

Offline training of the neural networks is carried out using simulated data. In each simulation run, a random message s is generated and the SC-FDMA frames are sent to the receiving end through a simulated channel model. Each frame contains one SC-FDMA symbol. To obtain the received SC-FDMA signal, SC-FDMA frames with varying channel impairments are used. After undergoing the distortion of the channel and removing the CP, the incoming signals y are gathered as training samples. As shown in Fig. 1, the network's input data are the received signals y, while the actual information messages s act as the supervision labels.
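A minimal sketch of this data-collection loop is given below. The channel taps, the SNR handling, and the helper sc_fdma_modulate(), which stands in for the transmitter chain of Fig. 1, are hypothetical and shown only to illustrate how the (y, s) pairs could be assembled.

```python
import numpy as np

def collect_training_samples(num_frames, N, M, Lcp, h, snr_db, sc_fdma_modulate):
    """Simulate SC-FDMA frames over a multipath channel and collect
    (received signal, transmitted symbols) training pairs."""
    features, labels = [], []
    for _ in range(num_frames):
        bits = np.random.randint(0, 2, 2 * N)                     # QPSK bits, one user
        s = ((2 * bits[0::2] - 1) + 1j * (2 * bits[1::2] - 1)) / np.sqrt(2)
        tx = sc_fdma_modulate(s)                                  # TD frame with CP
        rx = np.convolve(tx, h)[:tx.size]                         # multipath distortion
        noise_var = 10 ** (-snr_db / 10)
        rx = rx + np.sqrt(noise_var / 2) * (np.random.randn(rx.size)
                                            + 1j * np.random.randn(rx.size))
        y = rx[Lcp:Lcp + M]                                       # CP removal
        features.append(np.concatenate([y.real, y.imag]))         # NN input features
        labels.append(s)                                          # supervision label
    return np.array(features), np.array(labels)
```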

The same dataset is used for training and testing all equalizers, whether they are CNN-based, Bi-LSTM-based, or LSTM-based with modified loss and SAFs.

Once the proposed modified DL loss and SAFs LSTM-based channel equalizer and symbol detector is constructed as shown in Fig. 3, its weights and biases are adjusted (tuned) before deployment using an appropriate optimization algorithm.

Fig. 3
figure 3

DL LSTM-NN framework for the proposed joint channel equalizer and symbol detector

A number of different optimization algorithms are used to get the best possible DL channel equalization and symbol detection model for the SC-FDMA wireless communication system. Some of them are adaptive moment estimation (Adam), root mean square propagation (RMSProp), and stochastic gradient descent with momentum (SGdm).
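For reference, the per-parameter update rules of these three optimizers can be sketched as follows; the hyperparameter values shown are common defaults, not necessarily those used in the experiments.

```python
import numpy as np

def sgdm_step(w, grad, v, lr=0.01, momentum=0.9):
    # Stochastic gradient descent with momentum
    v = momentum * v - lr * grad
    return w + v, v

def rmsprop_step(w, grad, s, lr=0.001, rho=0.9, eps=1e-8):
    # RMSProp: scale the step by a running average of squared gradients
    s = rho * s + (1 - rho) * grad ** 2
    return w - lr * grad / (np.sqrt(s) + eps), s

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: bias-corrected first- and second-moment estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```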

To find the best parameters (weights and biases), a loss function is used to measure how far the network output is from the desired output; by minimizing the loss function and updating the weights and biases, the optimization algorithms train the model and reach the optimal network parameters.

The loss function, in its simplest form, is the difference between the network's output and the original messages, and it can be expressed in a variety of ways. The loss functions used in our experiments are the cross-entropy and the sum of squared errors (SSE), which can be expressed as follows:

$${\text{Loss}}_{{{\text{crossentropyex}}}} = - \mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{c} s_{ij} \left( k \right)\log \left( {\hat{s}_{ij} \left( k \right)} \right),$$
(7)
$${\text{Loss}}_{{{\text{SSE}}}} = \mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{c} \left( {s_{ij} \left( k \right) - \hat{s}_{ij} \left( k \right)} \right)^{2} ,$$
(8)

where \(c\) is the class number, \(N\) is the sample number, \({s}_{ij}\) is the \(i\mathrm{th}\) transmitted data sample for the \(j\mathrm{th}\) class and \({\widehat{s}}_{ij}\) is the modified DL SAF LSTM-based model response for sample \(i\) class \(j\).
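As a direct sketch of Eqs. (7) and (8), assuming s holds one-hot class labels and s_hat the network's softmax output (both of shape N × c):

```python
import numpy as np

def cross_entropy_loss(s, s_hat, eps=1e-12):
    # Eq. (7): categorical cross-entropy; eps guards against log(0)
    return -np.sum(s * np.log(s_hat + eps))

def sse_loss(s, s_hat):
    # Eq. (8): sum of squared errors between labels and network outputs
    return np.sum((s - s_hat) ** 2)
```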

During the offline training period, we replace the SAF [the hyperbolic tangent function (tanh)] with the alternatives from Table 1 to see how this affects the performance of our DL model during the online implementation.

Finally, after the offline training, the model is capable of recovering data automatically, without the need for explicit channel estimation and symbol detection processes; these processes are accomplished jointly. Figure 4 illustrates the offline training procedure that produces the learned DL model based on the LSTM-NN.

Fig. 4
figure 4

Offline training of the DL LSTM-NN

The most important limitations and challenges of the proposed system are as follows. Each user in the system is allocated four subcarriers, and each subcarrier can carry one of four QPSK constellation points. In the training, the number of labels is Ms^N, where Ms represents the constellation (modulation) order and N is the number of subcarriers exclusively allocated to a single user. Consequently, there are 256 classes, since there are 4^4 = 256 labels in the training set. For the LSTM-NN, this means that the fully connected layer size needs to be 256 in order to match the number of classes. The number of labels increases if higher-order modulations are used or if more subcarriers are allocated to each user. An increase in the number of labels leads to an increase in the number of classes and in the size of the LSTM-NN fully connected layer. Such an approach requires a very large amount of data for good or effective training and leads to an increase in training time and decreased usability, ultimately rendering the system impractical. We therefore advise the utilization of QPSK.
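A minimal sketch of one possible label-to-class mapping is shown below; the specific symbol-index ordering is an assumption made only for illustration.

```python
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)  # assumed ordering

def frame_to_class(s, Ms=4):
    """Map a vector of N QPSK symbols to one of Ms**N classes (256 for N = 4)."""
    idx = np.argmin(np.abs(s[:, None] - QPSK[None, :]), axis=1)   # per-symbol index
    return int(sum(i * Ms ** k for k, i in enumerate(idx)))       # base-Ms encoding
```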

4 Results and discussions

Several experiments were carried out to demonstrate the efficiency of the proposed modified loss and state activation function (SAF) (Table 1) LSTM-based configurations for the channel equalization and symbol detection techniques in the SC-FDMA wireless communication system. The proposed DLNN-based equalizer was trained offline with several learning optimizers, namely SGdm, RMSProp, and Adam [60], and compared with the conventional zero-forcing (ZF) and minimum mean square error (MMSE) linear equalizers and the DL CNN-based and Bi-LSTM-based equalization algorithms [24, 48], in terms of bit error rates (BERs) at different signal-to-noise ratios (SNRs) using the collected data sets. The training dataset is gathered for four subcarriers. The transmitter sends the SC-FDMA packets to the receiver, each containing one SC-FDMA data symbol. The SC-FDMA system and channel specifications are listed in Table 2. The employed DL LSTM NN architecture parameters and training settings are summarized in Table 3.

Table 2 SC-FDMA system architecture and channel specifications
Table 3 DL model architecture

In these simulations, we also examined how well the proposed equalizer performed with two different loss functions: the default (cross-entropy) and the sum of squared errors (SSE).

Instead of using curves, which produce a cluttered picture because of their overlap, we used heatmap visualizations, as shown in Fig. 5. A heatmap (or heat map) is a graphical representation of data that uses colors to represent values. Using a heatmap, even a large amount of data can be visualized and understood quickly. Heatmaps make it easier to combine quantitative and qualitative data for analysis and provide a quick overview of a model's performance. As a visual tool, heatmaps help make informed, data-based decisions. Heatmap charts have been used in a similar manner in [42].
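A minimal plotting sketch of this kind of heatmap is given below; the BER values here are placeholders only, and the actual results are those reported in Figs. 5, 6, 7, 8, 9, and 10.

```python
import numpy as np
import matplotlib.pyplot as plt

equalizers = ["ZF", "MMSE", "LSTM-Tanh", "LSTM-Softsign", "LSTM-Gaussian"]
snrs_db = np.arange(0, 22, 2)
ber = np.random.uniform(1e-4, 1e-1, (len(equalizers), snrs_db.size))  # placeholder data

fig, ax = plt.subplots(figsize=(8, 3))
im = ax.imshow(np.log10(ber), aspect="auto", cmap="viridis")
ax.set_xticks(range(snrs_db.size))
ax.set_xticklabels(snrs_db)
ax.set_yticks(range(len(equalizers)))
ax.set_yticklabels(equalizers)
ax.set_xlabel("SNR (dB)")
cbar = fig.colorbar(im, ax=ax)
cbar.set_label("log10(BER)")
plt.tight_layout()
plt.show()
```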

Fig. 5
figure 5

BERs of the proposed modified DL loss and SAFs LSTM-based equalizers, the traditional linear equalizers, Bi-LSTM-based equalizer, and the CNN-based equalizer using the Adam learning algorithm, and the default (cross-entropy) loss function

First, we discuss the default (cross-entropy) loss function. In the case of deep-fading channels, it is well known that linear equalization may amplify the noise at spectral nulls, which has a negative impact on the performance of the SC-FDMA system. It is clear from Fig. 5 that all the proposed modified DL SAFs LSTM-based equalizers using the Adam learning algorithm and the cross-entropy loss function outperform both the ZF and the MMSE equalizers at SNRs ranging from 10 to 20 dB, while at 8 dB all the proposed SAFs LSTM-based equalizers outperform both the ZF and the MMSE equalizers except the proposed GELU SAF, which outperforms the ZF only.

It is also clear from Fig. 5 that most of the proposed modified DL SAFs LSTM-based equalizers show promising results compared to the one using the default (Tanh) SAF. Furthermore, it should be noted that most of the proposed modified DL SAFs LSTM-based models demonstrated exceptional signal detection capabilities when the SNR exceeded 12 dB. In this case, the BER is zero, which serves as an indication of the models' capabilities.

In contrast to alternative DL-based channel equalization systems, such as those based on CNN and Bi-LSTM [24, 48], the modified DL SAFs LSTM-based equalizers that have been proposed exhibit encouraging performance across the majority of SNR levels, as shown in Fig. 5.

Figure 6 also shows that the proposed modified DL Aranda, Gaussian, and Wave SAFs LSTM-based equalizers using the RMSProp learning algorithm and the default (cross-entropy) loss function perform better than both linear equalizers (ZF and MMSE) and the DL CNN-based equalizer at SNRs between 8 and 20 dB, better than the DL LSTM-based model with the default SAF (Tanh) at SNRs ranging from 4 to 20 dB, and better than the DL Bi-LSTM-based equalizer at low SNRs ranging from 0 to 10 dB. Furthermore, Fig. 5 demonstrates that the proposed modified Aranda, Gaussian, Wave, Elliott, Modified Elliott, and Softsign SAFs LSTM-based equalizers outperform the state-of-the-art CNN approach [48] over the entire range of SNR.

Fig. 6
figure 6

BERs of the proposed modified DL loss and SAFs LSTM-based equalizers, the traditional linear equalizers, Bi-LSTM-based equalizer, and the CNN-based equalizer using the RMSProp learning algorithm, and the default(cross-entropy) loss function

Besides, it is obvious from Fig. 7 that the proposed modified DL SAFs LSTM-based equalizers (Bitanh1, Cloglogm, Bitanh2, Rootsig, Softsign, Gaussian, Wave, and Elliott SAFs) using the SGdm learning algorithm and the default (cross-entropy) loss function outperform the linear equalizers (ZF and MMSE) and the DL model with the default SAF (Tanh) at SNRs ranging from 10 to 20 dB, and the DL CNN-based equalizer over the entire SNR range. On the other hand, the DL Bi-LSTM-based equalizer produces performance approximately comparable to that of the proposed DL Bitanh2 SAF LSTM-based equalizer. The proposed Aranda SAF has the worst BER at all SNRs ranging from 10 to 20 dB.

Fig. 7
figure 7

BERs of the proposed modified DL loss and SAFs LSTM-based equalizers, the traditional linear equalizers, Bi-LSTM-based equalizer, and the CNN-based equalizer using the SGdm learning algorithm, and the default (cross-entropy) loss function

Secondly, in the case of the sum of squared errors loss function, we can observe from Fig. 8 that all of the proposed modified DL Cloglogm, Bitanh2, Modified Elliott, Wave, Softsign, Rootsig, Bitanh1, Elliott, and Aranda SAFs LSTM-based equalizers using the Adam learning algorithm outperform both the ZF and the MMSE equalizers at SNRs ranging from 10 to 20 dB, while at an SNR of 8 dB, the proposed Cloglogm, Modified Elliott, Bitanh2, Softsign, Bitanh1, Rootsig, and Elliott SAFs provide better performance than the other proposed SAFs and the linear equalizers. On the other hand, the proposed Modified Elliott, Bitanh2, Softsign, and Rootsig SAFs LSTM-based equalizers have superior performance to the DL LSTM-based model that uses the default SAF (Tanh) over the entire SNR range.

Fig. 8
figure 8

BERs of the proposed modified DL loss and SAFs LSTM-based equalizers, the traditional linear equalizers, Bi-LSTM-based equalizer, and the CNN-based equalizer using the Adam learning algorithm, and the sum of squared errors loss function

In contrast to the other DL-based channel equalization systems, the CNN-based and the Bi-LSTM-based approaches [24, 48] in this case have the worst BER over the entire range of SNR, as shown in Fig. 8.

In addition, as shown in Fig. 9, the proposed modified DL Rootsig, Elliott, Cloglogm, Bitanh2, Softsign, Bitanh1, Gaussian, and Modified Elliott SAFs LSTM-based equalizers trained with the RMSProp learning algorithm and the Sum of Squared Errors loss function outperform the linear equalizers (ZF and MMSE equalizers) and the DL LSTM-based model that uses the default SAF (Tanh) at SNRs ranging from 8 to 20 dB, and the CNN-based or the Bi-LSTM-based DL equalizers [24, 48] over all the SNR ranges.

Fig. 9
figure 9

BERs of the proposed modified DL loss and SAFs LSTM-based equalizers, the traditional linear equalizers, Bi-LSTM-based equalizer, and the CNN-based equalizer using the RMSProp learning algorithm, and the sum squared errors loss function

Figure 10 shows that all the proposed modified DL SAFs LSTM-based equalizers trained with the SGdm learning algorithm and the sum of squared errors loss function perform better than the traditional ZF and MMSE linear equalizers at SNRs ranging from 10 to 20 dB, and better than the CNN-based equalizer over the entire SNR range. Also, the proposed Rootsig, Bitanh2, Softsign, Gaussian, Wave, and Cloglogm SAFs LSTM-based equalizers have superior performance to the DL LSTM-based model that uses the default SAF (Tanh) at SNRs ranging from 6 to 20 dB. In addition, the proposed Gaussian and Cloglogm SAFs LSTM-based equalizers outperform the Bi-LSTM-based equalizer at SNRs ranging from 8 to 20 dB.

Fig. 10
figure 10

BERs of the proposed modified DL loss and SAFs LSTM-based equalizers, the traditional linear equalizers, Bi-LSTM-based equalizer, and the CNN-based equalizer using the SGdm learning algorithm, and the sum squared Errors loss function

The default choice for the LSTM-NN SAF is the hyperbolic tangent function (Tanh) because it is a smooth and symmetric AF, which helps keep the output values centered around zero. This aids the backpropagation process and decreases the likelihood of vanishing gradients, which can be challenging for deep learning networks [61]. In addition, the Tanh function squashes its output values between − 1 and 1, which is beneficial in applications such as normalizing the output of a linear layer [62].

The Tanh function also has several drawbacks, such as its inability to completely eliminate the vanishing gradient problem, its computational complexity, and the fact that it only attains a gradient of 1 when the input is zero (x = 0); as a result, the function can produce some dead neurons during the computation process [62, 63]. These limitations of the Tanh function necessitated additional research into alternative AFs capable of addressing these issues. Moreover, the loss function, which computes the error between the actual and desired outputs, controls convergence and the optimum performance of the model [64].
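This gradient limitation follows directly from the derivative of the Tanh function:

$$\frac{d}{{dx}}\tanh \left( x \right) = 1 - \tanh^{2} \left( x \right) \le 1,$$

with equality only at x = 0, so the gradient shrinks quickly for inputs of large magnitude.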

In the scientific community, there is a significant interest in identifying and defining AFs and loss functions that can enhance the performance of neural networks [47, 54, 56, 64, 65].

Figures 5, 6, 7, 8, 9, and 10 show that the LSTM-based equalizer performed better when different SAFs were used instead of the default Tanh SAF, and when SSE was used instead of the default (cross-entropy) loss function. Our research showed that using SSE instead of the default (cross-entropy) loss function, and some lesser-known AFs instead of the default Tanh, has a positive effect on the performance of the LSTM network, which is reflected in the better performance of the DL LSTM-based equalizers.

We may conclude from Figs. 5, 6, 7, 8, 9, and 10 that the best-proposed state activation functions, which give the best performance in the modified loss and SAFs LSTM-based equalizer and symbol detector under the previous system settings, are those listed in Table 4.

Optimization techniques are critical for the improvement of DL systems. DNN training can be viewed as an optimization problem, with the objective of achieving a global optimum via a trustworthy training trajectory and rapid convergence via gradient descent techniques [60]. The goal of the DL method is to develop a model that produces more accurate and faster outcomes by modifying the biases and weights to minimize the loss function. Selecting the best optimizer for a given scientific problem is a difficult task. By selecting an inadequate optimizer, the network may remain in a local minimum (stay in the same place) during training, resulting in little progress in the learning process. It is therefore necessary to investigate how different optimizers perform on the model and dataset used in order to obtain the best DL model.

This section compares the performance of the three optimization algorithms: Adam, RMSProp, and SGdm, using an experimental approach. We can use Table 4 to select the best SAFs that give the best performance, each with its own optimization algorithm.

Table 4 The best-proposed state activation functions (SAFs)

In the case of the cross-entropy loss function, Fig. 11, clearly shows that the proposed modified DL SAF Softsign LSTM-based equalizer using the Adam learning algorithm outperforms all of the other proposed modified SAFs LSTM-based equalizers at all SNRs.

Fig. 11
figure 11

Performance comparison of the best-proposed modified DL SAFs LSTM-based equalizers using different optimization algorithms and cross-entropy loss function

On the other hand, in the case of the sum of squared errors loss function, as shown in Fig. 12, the proposed modified DL SAF Elliott LSTM-based equalizer using the RMSProp learning algorithm gives the best performance over all the SNR ranges.

Fig. 12
figure 12

Performance comparison of the best-proposed modified DL SAFs LSTM-based equalizers using different optimization algorithms and sum squared errors loss function

Also from Fig. 13, we can say that the best proposed modified DL SAF LSTM-based equalizer is the modified DL SAF Elliott using the RMSProp learning algorithm and the sum of squared errors loss function.

Fig. 13
figure 13

Performance comparison of the best-proposed DL SAFs LSTM-based equalizers using different, optimization algorithms and loss functions

It is beneficial to monitor the training processes of the DL equalizers by investigating the loss and accuracy curves. These curves provide details about how the training process is progressing, allowing the user to decide whether to continue or stop the training.

The Adam, RMSProp, and SGdm optimization loss and accuracy curves for our proposed best modified loss and SAFs LSTM-based equalizers in Figs. 14, 15, 17, and 18 highlight the outcomes shown in Figs. 11 and 12. Furthermore, the Adam, RMSProp, and SGdm optimization loss and accuracy curves for the CNN-based and Bi-LSTM-based approaches in Figs. 14, 15, 16, 17, and 18 emphasize the findings seen in Figs. 5, 6, 7, 8, 9, and 10, where the CNN and Bi-LSTM can provide improvements over the linear equalizers with the cross-entropy loss function and any of the learning algorithms (Adam, RMSProp, and SGdm), while little or no improvement is achieved in the case of the sum of squared errors.

Fig. 14
figure 14

Loss function comparison of the DL equalizers using different optimization algorithms and cross-entropy

Fig. 15
figure 15

Loss function comparison of the best proposed modified DL SAFs LSTM-based equalizers and Bi-LSTM-based equalizers using different optimization algorithms and the sum squared errors

Fig. 16
figure 16

Loss function comparison of DL CNN-based equalizers using different optimization algorithms and the sum of squared errors

Fig. 17
figure 17

Accuracy curves comparison of the DL equalizers using different optimization algorithms and cross-entropy loss function

Fig. 18
figure 18

Accuracy curves comparison of the DL equalizers using different optimization algorithms and the sum of squared errors loss Function

4.1 Computational complexity of the proposed modified DL loss and SAFs LSTM-based equalizers

The computational complexity of the proposed modified loss and SAFs LSTM-based channel equalization and symbol detection DL models in the SC-FDMA is provided empirically in terms of the training time which is performed offline. Training time can be defined as the amount of time expended to get the best NN parameters (e.g., weights and biases) that will minimize the error using a training dataset. Because it involves continually evaluating the loss function with multiple parameter values, the training procedure is computationally complex.

Table 5 lists the consumed training time for the modified SAFs LSTM-based channel equalization and symbol detection DL models. The used computer is equipped with Windows 10 operating system and an Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz, and 8 GB of RAM.

Table 5 Training time comparison between the investigated SAFs LSTM-based channel equalizers

From Table 5, the best proposed DL SAF Softsign LSTM-based CE-SD trained with the Adam optimizer and the cross-entropy loss function consumes a large amount of time compared to the best proposed DL SAF Cloglogm LSTM-based CE-SD trained with the Adam optimizer and the sum of squared errors loss function. Also, the best DL SAF Gaussian LSTM-based CE-SD trained with the RMSProp optimizer and the cross-entropy loss function consumes a large amount of time compared to the best DL SAF Elliott LSTM-based CE-SD trained with the RMSProp optimizer and the sum of squared errors loss function. On the other hand, the best proposed DL SAF Bitanh2 LSTM-based CE-SD trained with the SGdm optimizer and the cross-entropy loss function consumes a small amount of time compared to the best proposed DL SAF Gaussian LSTM-based CE-SD trained with the SGdm optimizer and the sum of squared errors loss function. Also, from Table 5 and Fig. 13, we can say that the best proposed SAF, which gives the best performance while consuming the least amount of time, is the DL SAF Elliott LSTM-based CE-SD trained with the RMSProp optimizer and the sum of squared errors loss function. The shortest SAF training time indicates the lowest computational complexity in comparison to its peers.

Also from Table 6, we can observe that the Bi-LSTM-based approach requires a large amount of training time for all of the training scenarios (Adam, SGdm, and RMSProp) compared to the proposed modified DL loss and SAFs LSTM-based equalizers, which is an indication of its increased computational complexity due to the fact that the Bi-LSTM network uses two distinct hidden layers to analyze data in both directions (first, from the past to the future, and second, from the future to the past) before feeding the results into a single output layer [24].

Table 6 Training time comparison between the Bi-LSTM-based channel equalizers

In contrast, from Table 7, we can say that the CNN-based approach requires the largest training time for all of the training scenarios (Adam, SGdm, and RMSProp), which is an indication of its increased computational complexity compared to our proposed modified DL SAFs LSTM-based equalizers.

Table 7 Training time comparison between the CNN-based channel equalizers

4.2 Generalization ability and robustness of the proposed models

Several practical channel models have been adopted. By using other practical channel models, we can provide additional analysis for comparing the efficacy of the proposed models and AFs. These channel models have been established based on extensive measurements (such as the indoor and vehicular models) released by the ITU [66, 67].

Figures 5, 6, 7, 19, 20, and 21 depict the BERs of the proposed modified DL loss and SAFs LSTM-based equalizers, the conventional linear equalizers, the Bi-LSTM-based equalizer, and the CNN-based equalizer under two distinct ITU channel models. In all investigated channel models, the proposed modified DL SAFs LSTM-based model outperforms the other equalizers in terms of stability and performance. We trained the model with the ITU Vehicular channel model and then tested it under two distinct ITU channel models (the Vehicular and Indoor ITU channel models). The obtained results highlight the generalization ability and the robustness of the proposed equalizer, as it was evaluated using datasets (corrupted by two distinct ITU channel models) that were not utilized in the training process.

Fig. 19
figure 19

BERs of the proposed modified DL loss and SAFs LSTM-based equalizers, the traditional linear equalizers, Bi-LSTM-based equalizer, and the CNN-based equalizer using the Adam learning algorithm, and the cross-entropy loss function under the ITU Indoor channel model

Fig. 20
figure 20

BERs of the proposed modified DL loss and SAFs LSTM-based equalizers, the traditional linear equalizers, Bi-LSTM-based equalizer, and the CNN-based equalizer using the RMSProp learning algorithm, and the cross-entropy loss function under the ITU Indoor channel model

Fig. 21
figure 21

BERs of the proposed modified DL loss and SAFs LSTM-based equalizers, the traditional linear equalizers, Bi-LSTM-based equalizer, and the CNN-based equalizer using the SGdm learning algorithm, and the cross-entropy loss function under the ITU Indoor channel model

5 Conclusion

In conclusion, a modified DL LSTM-based channel equalization and symbol detection method based on changing the default state activation function [the hyperbolic tangent function (tanh)] and the default loss function (cross-entropy) was investigated in this study. The effectiveness of the suggested modified DL model has been examined, and its results have been contrasted with those of other common linear equalizers, such as ZF and MMSE, and other DL models, such as CNN-based and Bi-LSTM equalizers. The internal weights and biases of the proposed modified DL model were adjusted during the training process with different loss functions [the default (cross-entropy) and the sum of squared errors (SSE)] and different optimization algorithms (Adam, RMSProp, and SGdm). In our results, we found that the presented modified loss and SAFs LSTM-based channel equalizer and symbol detector achieved higher performance in terms of BER than the conventionally used non-DL algorithms, such as the linear (ZF and MMSE) equalizers, and the other DL algorithms, such as the CNN-based and Bi-LSTM equalizers, in SC-FDMA wireless communication systems. Additionally, the outcomes demonstrated that under various DL model settings (i.e., training algorithm, initial learning rate, learning rate drop factor, etc.), some lesser-known activation functions, including GELU, Wave, Bitanh1, Bitanh2, Modified Elliott, Elliott, Gaussian, Cloglogm, Aranda, Softsign, and Rootsig, can outperform the frequently employed "tanh" state activation function in terms of channel equalization accuracy. Consequently, our comparison revealed that, among the proposed activation functions, the functions summarized in Table 4 (Softsign, Gaussian, Bitanh2, Cloglogm, and Elliott) outperformed the others. Furthermore, the findings showed that using the SSE loss function instead of the default loss function (cross-entropy) greatly improved the accuracy of the modified DL LSTM-based channel equalizer and symbol detector. Finally, the computational complexity of the proposed modified DL loss and SAFs LSTM-based equalizers was investigated, and we found that the proposed model provides moderate computational complexity compared to the existing Bi-LSTM and CNN-based approaches, a cost that becomes increasingly affordable in light of the rapid technological advancements in the design and production of high-speed GPUs. As a result of the proposed DL model's extraordinary learning and generalization properties, the suggested equalizer appears promising for channel equalization, particularly under poor channel conditions.

The following ideas are suggested for future research:

  • Mining for new activation functions and studying the other parts of an LSTM, such as changing the gate activation functions (GAFs).

  • Studying the performance of the proposed modified SAFs LSTM-based channel equalizer and symbol detector systems with other loss functions.