Abstract
Long short-term memory (LSTM) based time series forecasting methods suffer from several limitations, such as accumulated error, diminishing temporal correlation, and a lack of interpretability, which compromise prediction performance. To overcome these shortcomings, a fuzzy inference-based LSTM with an embedded fuzzy system is proposed to enhance the accuracy and interpretability of LSTM for long-term time series prediction. First, a fast and complete fuzzy rule construction method based on the Wang–Mendel (WM) model is proposed, which enhances the computational efficiency and completeness of the WM model through fuzzy rule simplification and complement strategies. Then, a fuzzy prediction model is constructed to capture the fuzzy logic in the data. Finally, the fuzzy inference-based LSTM is obtained by integrating fuzzy prediction fusion, a strengthening memory layer, and a parameter segmentation sharing strategy into the LSTM network. Fuzzy prediction fusion increases the network’s reasoning capability and interpretability, the strengthening memory layer strengthens long-term memory and alleviates the gradient dispersion problem, and the parameter segmentation sharing strategy balances processing efficiency and architecture discrimination. Experiments on publicly available time series demonstrate that the proposed method achieves better performance than existing models for long-term time series prediction.
Introduction
Time series forecasting (TSF) is the process of analyzing time series data using statistics and modeling to make predictions and inform strategic decision-making. TSF plays a vital role in various domains, especially in the fields of financial management1, 2, social networks3,4,5, medical science6,7,8, and industrial engineering9, 10. Therefore, there is a growing consensus that it is of great importance to enhance the accuracy and interpretability of TSF due to its widespread application.
Conventional techniques for time series analysis assume a linear relationship between past and future values. This kind of model is represented by linear regression approaches, such as ARIMA-based models11, and has shown good results for short-term prediction. However, the performance of these models deteriorates significantly when the parameters are not selected properly, and ARIMA-based models are not suitable for time series with weak periodic characteristics6, 8. Chen et al.5 propose a dynamic linear model that extracts systematic time-series dynamics and volatility features to achieve more accurate prediction, with some success. Today, deep learning-based models have achieved great success, especially in long-term time series prediction tasks, where they significantly outperform conventional linear models. The artificial neural network (ANN) has been a powerful tool for TSF by virtue of its universal approximation and nonlinear modeling capabilities12, 13. The recurrent neural network (RNN)14 uses internal memory cells to handle temporal data, and is employed to model historical and future states15,16,17. However, the vanishing gradient problem limits the performance of the RNN. To overcome this problem, long short-term memory (LSTM) uses “gates” to determine which information to remember and which to forget, and thereby obtains a long-term memory of historical data. Ma et al.18 combine LSTM and bidirectional LSTM networks to perform transportation prediction. Bandara et al.19 propose a decomposition-based unified network architecture (LSTM-MSNet) to predict time series with multiple seasonal patterns. LSTM-based models, however, are known for their limited ability to utilize the latest data effectively and to model long sequences and cycles in time series accurately. Additionally, the performance of LSTM for long-term prediction is hindered by the amplification of small errors inherent in the model.
Transformer-based time series modeling is currently a popular research direction, and its modeling ability surpasses that of traditional neural networks. Its inherent advantages in processing and predicting long sequences make it perform excellently in most temporal tasks20, 21. However, its \(O(n^2)\) complexity leads to an explosive increase in memory usage on long sequences. Therefore, current research mostly focuses on improving algorithm efficiency and reducing complexity22,23,24.
A fuzzy system is characterized by universal approximation capability and outstanding interpretability, and provides an effective paradigm for handling uncertain data, representing latent knowledge, and exhibiting the inference process25. Some models attempt to enhance deep learning-based models by embedding fuzzy set theory, such as the fuzzy deep convolutional neural network26, the deep fuzzy echo state network27, and the fuzzy recurrent neural network28. For time series forecasting in particular, related works integrate the fuzzy system with LSTM to combine the advantages of fuzzy logic and deep learning. Li et al.29 propose a Type-2 fuzzy LSTM neural network to perform traffic volume prediction. Tang et al.30 propose a granule time series forecasting model by integrating the trend fuzzy granule and the LSTM network. The innovations of these models are mainly reflected in taking fuzzy information as input data and training the network’s parameters with a fuzzy system. However, the overall structure of LSTM is unchanged, and the interpretability of the fuzzy system is reduced to some extent.
The extraction of fuzzy rules from the training data is a crucial component in modeling a fuzzy system. The Wang–Mendel (WM) model is a powerful tool for directly extracting fuzzy rules using only one pass over the training data31, 32. However, the effectiveness of the WM model can be greatly degraded by the excessive generation of fuzzy rules. To overcome this issue, an improved WM model utilizing fuzzy c-means is proposed33, but determining the number of clusters remains a challenging task. Zhai et al.34 propose an on-line WM fuzzy inference model, which can adaptively acquire fuzzy rules from training data. However, the performance of the model can be limited by redundant rules and by missing rules for regions not covered by the training data.
Taking all the above observations into consideration, this paper proposes a fuzzy inference-based LSTM for time series forecasting, which enhances the accuracy and interpretability of LSTM by embedding a fuzzy system. To improve the computational efficiency and completeness of the WM model, a fuzzy rule base construction method based on the WM model is proposed. Then, a fuzzy prediction model based on the improved WM model is constructed. Finally, the fuzzy inference-based LSTM carries out prediction by integrating fuzzy prediction fusion, the strengthening memory layer, and the parameter segmentation sharing strategy into the LSTM network. In summary, the main contributions of this paper are as follows:
(i) A fast and complete fuzzy rule construction method based on the WM model is proposed, which enhances the computational efficiency and completeness of the WM model through fuzzy rule simplification and complement strategies.
(ii) A strengthening memory layer is constructed by integrating the current output with the cell state, which strengthens the long-term memory and alleviates the gradient dispersion problem of LSTM.
(iii) A parameter segmentation sharing strategy that divides the overall output layer into different parts is proposed, which balances processing efficiency and architecture discrimination.
(iv) A fuzzy inference-based LSTM with an embedded fuzzy system is proposed, which enhances the accuracy and interpretability of LSTM for long-term time series prediction.
(v) Extensive experiments demonstrate the better performance of the proposed method in comparison with related models.
Prerequisites
This paper focuses on improving the interpretability and accuracy of deep neural networks with a fuzzy inference model for the time series prediction problem. This section introduces two related methods, LSTM and the WM model.
Long Short-Term Memory Neural Network (LSTM)
RNN has achieved good performance in processing and learning time series information, but it cannot successfully learn long-term dependencies due to exploding or vanishing gradients. LSTM is an extension of the RNN that introduces “gate” cells to retain and learn long-term dependencies. The LSTM network can capture important features from inputs and store the information over a long period of time, and has thus achieved good results in long-term forecasting. In general, the critical components of the LSTM architecture are three gates: the forget, input, and output gates, denoted by f, i, and o, respectively. The calculation procedure for each gate is as follows:
(1) Forget Gate. Determine what information needs to be retained in the memory cell with the help of sigmoid function. The output is expressed as follows:
where \(x_t\) and \(h_{t-1}\) represent input and hidden state at time step t and \(t-1\), respectively. W represents weight matrices, \(b_f\) represents a constant bias, and \(\sigma (\cdot )\) represents sigmoid function.
(2) Input Gate. Determine whether the new information should be saved to the memory cell by the sigmoid layer and tanh layer. The outputs of the two layers are computed in the following form:
The update of the memory cell is achieved by the combination of these two layers, where the current memory is obtained by retaining previous information and introducing new cell state information. The mathematical equation is expressed in the following form:
where \(c_{t}\) represents the cell state at time step t, and \(\circ\) denotes the Hadamard product.
(3) Output Gate. Determine what part of the memory contributes to the current output and map the output between \(-1\) and 1 by the tanh function. The outputs can be computed by the following equations:
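The three gate computations described above follow the widely used textbook LSTM formulation, which can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the parameter names (`W_f`, `W_i`, `W_c`, `W_o` and the matching biases), the concatenated input `[h_{t-1}, x_t]`, and the helper `init_params` are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function used by all three gates."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a standard LSTM cell (textbook form, assumed here)."""
    z = np.concatenate([h_prev, x_t])             # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])        # (1) forget gate
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])        # (2) input gate, sigmoid layer
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])    # (2) input gate, tanh layer
    c_t = f_t * c_prev + i_t * c_tilde            # memory cell update (Hadamard products)
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])        # (3) output gate
    h_t = o_t * np.tanh(c_t)                      # hidden state, bounded in (-1, 1)
    return h_t, c_t

def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical helper: small random weights and zero biases."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "fico":
        p[f"W_{g}"] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim + input_dim))
        p[f"b_{g}"] = np.zeros(hidden_dim)
    return p
```

Calling `lstm_step` repeatedly over a window, feeding each returned `(h_t, c_t)` into the next call, gives the recurrent computation that the later sections modify.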
Wang–Mendel model
The WM model is a simple and powerful tool for generating a fuzzy rule base from sample data. However, its effectiveness can be greatly degraded when the amount of data is huge: each training sample generates a fuzzy rule, so the rule extraction strategy is not efficient enough. Improving the rule generation mechanism therefore becomes crucial, and a fast and complete fuzzy rule base construction method based on the WM model is proposed in this paper. A simplification strategy for redundant and conflict rules simplifies the fuzzy rule base, and a complement strategy completes it. The original WM model consists of the following steps.
Fuzzy rules extraction. Given the time series, define the lengths of the antecedents and consequents of the fuzzy rule, and partition the domain of each variable into several fuzzy subsets to extract input-output sample pairs. Each feature of an input-output sample pair is assigned to the fuzzy set with the highest membership degree, these membership degrees are used to calculate the weight of the fuzzy rule, and finally an unorganized fuzzy rule base is generated.
Fuzzy rule arrangement. When the sample size is large, redundant rules are easily generated. To solve this problem, when adding a rule to the rule base, first check whether the antecedents of the rule already exist in the rule base. If they do not, add the rule to the rule base; otherwise, keep only the rule with the highest weight.
Fuzzy rule-based prediction. A centroid defuzzification inference engine is used to organize the rules with the same antecedents in the fuzzy rule base, obtain the consequent of the rules for fuzzy inference, and produce the final fuzzy rule inference base.
The drawback of this model is that the generated fuzzy rule base lacks completeness and robustness, which results in low model accuracy. Therefore, to improve the accuracy, the construction method must be optimized so that a complete fuzzy rule inference system can be built quickly.
The proposed fuzzy prediction model
The construction of the fuzzy rule base is crucial for a fuzzy rule-based prediction model. A fuzzy rule base constructed with the WM model may contain redundant rules and may lack matching rules for newly available samples, because some fuzzy regions are not covered by the training data. To improve the computational efficiency and completeness of the WM model, a fast and complete fuzzy rule base construction method based on the WM model is proposed, and prediction is then performed based on the resulting fuzzy rule base. The framework of the proposed fuzzy prediction model is shown in Fig. 1. In what follows, we explain the detailed steps of the proposed model.
Fuzzy rules extraction
Given the time series \(T=\{x_{1},x_{2},\ldots , x_{n}\}\), each input-output sample pair for training can be constructed as \(\{x_{i},x_{i+1},\ldots , x_{i+h-1}, y_{i}\}\), \(i=1, 2, \ldots , n-h\), where \(\{x_{i},x_{i+1}, \ldots , x_{i+h-1}\}\) is the input sample, h is the length of the input sample, and \(y_{i}=x_{i+h}\) is the output sample. The domain of discourse is divided into q regions, and the triangular fuzzy sets \(A_{1}, A_{2},\ldots , A_{q}\) are defined on these regions as shown in Fig. 2.
Each feature of an input-output sample pair is assigned to the fuzzy set with the highest membership degree, e.g., \(x_{i}\) is fuzzified into \(A_{1,i}\) with the membership degree \(U_{1,i}\). The fuzzy rules can be extracted using the WM method as follows:
where \(A_{j,i}\) is the jth antecedent, and \(A_{y,i}\) is the consequence. Rules generated from the training data are called data-generated rules, and the fuzzy rule base is denoted as \(R=\{R_1, R_2, \ldots , R_{n}\}\).
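The extraction step can be illustrated with a short NumPy sketch. It assumes q evenly spaced triangular fuzzy sets (q at least 2) whose neighbors overlap at membership 0.5, a series that lies inside the chosen domain, and a rule weight equal to the product of the antecedent membership degrees. The function names are hypothetical, and the exact partition in Fig. 2 may differ.

```python
import numpy as np

def triangular_centers(lo, hi, q):
    """Centers of q evenly spaced triangular fuzzy sets on [lo, hi] (assumed layout)."""
    return np.linspace(lo, hi, q)

def memberships(x, centers):
    """Membership degree of scalar x in each triangular set."""
    width = centers[1] - centers[0]
    return np.clip(1.0 - np.abs(x - centers) / width, 0.0, 1.0)

def fuzzify(x, centers):
    """Index of the fuzzy set with the highest membership, and that degree."""
    mu = memberships(x, centers)
    j = int(np.argmax(mu))
    return j, float(mu[j])

def extract_rules(series, h, centers):
    """Slide a window of length h over the series; one rule per sample pair.

    Each rule is (antecedent labels, consequent label, weight).
    """
    rules = []
    for i in range(len(series) - h):
        window, target = series[i:i + h], series[i + h]
        labels, degrees = zip(*(fuzzify(x, centers) for x in window))
        y_label, _ = fuzzify(target, centers)
        rules.append((labels, y_label, float(np.prod(degrees))))
    return rules
```

For the series 0, 1, 2, 3, 4 with five sets centered on the integers and h = 2, this yields three rules, the first reading "if x1 is A1 and x2 is A2 then y is A3" (labels 0, 1, 2 in zero-based indexing).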
Fuzzy rules simplification
When the sample set is massive, a large number of fuzzy rules are generated. Many of these rules share the same characteristics, in the form of redundant rules and conflict rules. Redundant rules are rules with the same antecedents and the same consequence, and conflict rules are rules with the same antecedents but different consequences. To simplify the fuzzy rule base, the simplification strategy for redundant rules and conflict rules is proposed as follows:
(1) Redundant rules simplification. Find the group of data-generated rules that have the same antecedents and consequences, keep only one fuzzy rule among them, and delete the rest of the group from the fuzzy rule base.
(2) Conflict rules simplification. Find the group of data-generated rules that have the same antecedents but different consequences, and then integrate the information of all fuzzy rules in the group to generate a new fuzzy rule. Delete the group from the fuzzy rule base and add the new fuzzy rule to the fuzzy rule base.
The process of conflict rules simplification is explained as follows. Assume that the group found is \(R_{1}^{\prime }, R_{2}^{\prime }, \ldots , R_{m}^{\prime }\); the fuzzy rule \(R_i^{\prime }\) can be expressed as:
The weight of each fuzzy rule \(R_i^{\prime }\) can be computed by the product of membership function values for each antecedent:
where \(U_{j,i}\) is the membership degree of \(x_{ij}\) to \(A_{ij}\). Then the value \(\hat{y}\) can be obtained by using the center-average defuzzification mechanism:
where \(\bar{y}_{i}\) is the central value of fuzzy set \(A_{y_i}\). Assuming that \(A_{\hat{y}}\) is the fuzzy set on which \(\hat{y}\) achieves the maximum membership, the new fuzzy rule is generated as follows:
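A possible reading of the simplification strategy is sketched below. It assumes the usual center-average form, i.e. the weighted mean of the consequent centers with the rule weights as coefficients, and evenly spaced triangular sets, for which the set of maximum membership is the one with the nearest center. The rule representation `(antecedents, consequent_label, weight)` and the function name are hypothetical.

```python
import numpy as np
from collections import defaultdict

def simplify_rules(rules, centers):
    """Merge redundant and conflict rules into one rule per antecedent tuple.

    rules:   iterable of (antecedents, consequent_label, weight)
    centers: array of fuzzy-set centers (evenly spaced, assumed)
    Returns a dict mapping antecedent tuple -> consequent label.
    """
    centers = np.asarray(centers, dtype=float)
    groups = defaultdict(list)
    for antecedents, y_label, weight in rules:
        groups[tuple(antecedents)].append((y_label, weight))

    simplified = {}
    for antecedents, members in groups.items():
        labels = {y for y, _ in members}
        if len(labels) == 1:
            # Redundant rules: same antecedents and consequence, keep one.
            simplified[antecedents] = labels.pop()
            continue
        # Conflict rules: center-average defuzzification over the group.
        w = np.array([wt for _, wt in members])
        y_bar = np.array([centers[y] for y, _ in members])
        y_hat = float(np.sum(w * y_bar) / np.sum(w))
        # Fuzzy set on which y_hat has maximum membership (nearest center).
        simplified[antecedents] = int(np.argmin(np.abs(centers - y_hat)))
    return simplified
```

With centers 0 to 4, two conflicting rules for the same antecedents with consequents A2 (weight 0.9) and A5 (weight 0.1) give a defuzzified value of 1.3, so the merged rule keeps A2.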
Fuzzy rules complement
The fuzzy rules are extracted from the fuzzy regions that contain sample data, thus the data-generated fuzzy rule base is in general not complete. To extrapolate the data-generated fuzzy rule base over the regions not covered by these obtained rules, the fuzzy rule base should be complemented to cover the whole domain of discourse. Especially for the forecasting problem, a complete fuzzy rule base is crucial because the rules should be well-defined at all samples in the domain of discourse. To complement the fuzzy rule base, the complement strategy is proposed as the following three steps.
Step 1) For each combination of antecedents that does not appear in the fuzzy rule base, find the group of data-generated fuzzy rules that differ from the combination in only i antecedents, and call this group the i-group. Starting from \(i=1\), determine the first group that is not empty, i.e., the t-group.
Step 2) For all fuzzy rules in t-group, compute:
where \(n_{t}\) is the number of fuzzy rules in t-group, \(y^{i}\) is the central value of fuzzy set that is the consequence of ith fuzzy rule in t-group.
Step 3) Find the fuzzy set \(A_{\hat{y}}\) on which \(\hat{y}\) achieves the maximum membership. Assuming that the combination of antecedents is “\(x_{i1}\ \textrm{is} \ A_{i1} \ \textrm{and} \ \cdots \ \textrm{and}\) \(\ x_{ih} \ \textrm{is} \ A_{ih}\)”, the extrapolating rule is generated as:
The process is repeated until all the extrapolating rules are constructed. The complete fuzzy rule base can be obtained by integrating the extrapolating rules and data-generated rules.
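The three complement steps can be sketched as follows. The sketch assumes that the value computed in Step 2 is the plain average of the consequent centers over the t-group, which is what the definitions of \(n_{t}\) and \(y^{i}\) suggest, and that the fuzzy sets are evenly spaced so that the set of maximum membership is the one with the nearest center. The function name and the dictionary representation of the rule base are hypothetical, and the loop enumerates all \(q^{h}\) antecedent combinations, so it is only practical for small q and h.

```python
import itertools
import numpy as np

def complement_rules(rule_base, centers, h):
    """Add an extrapolating rule for every antecedent combination not yet covered.

    rule_base: non-empty dict, antecedent tuple (length h) -> consequent label
    centers:   array of fuzzy-set centers (evenly spaced, assumed)
    """
    centers = np.asarray(centers, dtype=float)
    q = len(centers)
    completed = dict(rule_base)
    for combo in itertools.product(range(q), repeat=h):
        if combo in rule_base:
            continue
        # Step 1: group data-generated rules by the number of differing antecedents.
        by_distance = {}
        for antecedents, y_label in rule_base.items():
            d = sum(a != b for a, b in zip(combo, antecedents))
            by_distance.setdefault(d, []).append(y_label)
        t = min(by_distance)                       # first non-empty i-group
        # Step 2: average the consequent centers of the t-group (assumed form).
        y_hat = float(np.mean([centers[y] for y in by_distance[t]]))
        # Step 3: fuzzy set with maximum membership at y_hat (nearest center).
        completed[combo] = int(np.argmin(np.abs(centers - y_hat)))
    return completed
```

Only data-generated rules are used as neighbors, so the result does not depend on the order in which the missing combinations are visited.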
Fuzzy rule-based prediction
Let \(\{x_{n-h+1}, x_{n-h+2}, \ldots , x_{n}\}\) be the testing sample, in which each feature \(x_{i}^{\prime }\) is fuzzified into a fuzzy set \(A_{i}^{\prime }\). The antecedents of the fuzzy rule are obtained as “\(x_{n-h+1}\ \textrm{is} \ A_{1}^{\prime } \ \textrm{and} \ \cdots \ \textrm{and} \ x_{n} \ \textrm{is} \ A_{h}^{\prime }\)”, and the matching fuzzy rule can be retrieved from the fuzzy rule base as:
The predicted value is obtained as \(\hat{y}=y^{\prime }\), where \(y^{\prime }\) is the center of the fuzzy set \(A_{y}^{\prime }\).
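Prediction with a complete rule base then reduces to a lookup, as in the following sketch. It assumes evenly spaced triangular sets, so the set with the highest membership is the one with the nearest center, and it uses a plain dictionary lookup to find the matching rule. The names `nearest_set` and `predict_next` are hypothetical.

```python
import numpy as np

def nearest_set(x, centers):
    """Label of the fuzzy set with the highest membership for x (nearest center)."""
    return int(np.argmin(np.abs(np.asarray(centers, dtype=float) - x)))

def predict_next(window, rule_base, centers):
    """Fuzzify the last h observations, match the rule, return the consequent center.

    rule_base must be complete, i.e. contain every antecedent combination;
    otherwise the lookup raises KeyError.
    """
    antecedents = tuple(nearest_set(x, centers) for x in window)
    y_label = rule_base[antecedents]
    return float(centers[y_label])
```

For example, with centers 0, 1, 2 and a rule "(A1, A2) then A3", the window (0.1, 1.2) is fuzzified to (A1, A2) and the prediction is the center of A3, i.e. 2.0.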
In this section, an improved Wang–Mendel model for the rapid construction of fuzzy inference systems is proposed. It addresses, in a simple way, the incompleteness of the fuzzy rule base that the Wang–Mendel model may generate, so that a complete fuzzy rule base is built. Building this fuzzy inference system incurs little extra time and computational overhead.
The addition of the fuzzy prediction module affects the computational efficiency of the model. In the experiments, the computational cost of the fuzzy prediction module is reduced as much as possible through the following measures.
1) The data in the input part of the experiment are fixed. To reduce the computational cost of the fuzzy rule module, the fuzzy rule base is constructed offline in advance.
2) For the data in the prediction part of the experiment, a branch-and-bound search algorithm is used to reduce the computational cost of finding the matching rules in the fuzzy rule base.
Fuzzy inference-based LSTM for time series prediction
In this section, the fuzzy inference-based LSTM (FLSTM) for time series forecasting is proposed. The proposed method incorporates the fuzzy prediction fusion, the strengthening memory layer, and the parameter segment sharing strategy into the LSTM network. Fuzzy prediction fusion model combines the fuzzy prediction with the three gates in LSTM to enhance the fuzzy reasoning capacity of the network. Strengthening memory layer integrates the hidden state and the cell state to strengthen the long-term memory. Parameter segment sharing strategy divides the overall output layer into different parts to balance processing efficiency and architecture discrimination. The proposed forecasting model is shown in Fig. 3, and described in detail in the following section.
Fuzzy prediction fusion
The fuzzy prediction model is embedded in the LSTM to enhance the network’s reasoning capability and interpretability. Fuzzy rules capture the dynamic characteristics of data change, and the reasoning relationship between the latest information and the historical information is extracted in the form of rules. The fuzzy prediction model can take full advantage of the latest information to predict future behavior. Therefore, combining the LSTM with the fuzzy prediction model can effectively overcome the LSTM’s insufficient utilization of the latest data.
LSTM utilizes gates to control information flow in recurrent computations. Therefore, the input gate, forget gate, and output gate are each combined with the fuzzy prediction to produce a new output, which integrates the fuzzy prediction information into the recurrent computations. The gates are computed as follows:
where \(r_{t}\) is the output of the fuzzy prediction model at time step t, and \(W_{ff}, W_{if}, W_{of}\) are the weight matrices of \(r_{t}\) for the forget gate, input gate, and output gate, respectively.
After training, these weights represent the strengths of the fuzzy rules in the different gates, so the proposed input, forget, and output gates make the results more interpretable. Meanwhile, the fusion of fuzzy prediction information in the recurrent process increases the convergence speed of training.
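One plausible form of the fused gates is sketched below in NumPy. The sketch assumes that the fuzzy prediction \(r_{t}\) enters each gate additively inside the sigmoid through its own weight matrix; the text names \(W_{ff}, W_{if}, W_{of}\) but the exact expression is not reproduced here, so this additive form, the concatenated input `[h_{t-1}, x_t]`, and the remaining parameter names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fused_lstm_step(x_t, h_prev, c_prev, r_t, p):
    """LSTM step whose three gates also see the fuzzy prediction r_t (assumed additive)."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(p["W_f"] @ z + p["W_ff"] @ r_t + p["b_f"])   # forget gate + fuzzy term
    i_t = sigmoid(p["W_i"] @ z + p["W_if"] @ r_t + p["b_i"])   # input gate + fuzzy term
    o_t = sigmoid(p["W_o"] @ z + p["W_of"] @ r_t + p["b_o"])   # output gate + fuzzy term
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])                 # candidate state, unchanged
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```

Under this reading, a large entry in `W_ff` means that the fuzzy prediction strongly drives how much of the previous cell state is retained, which is the sense in which the trained weights can be inspected as rule strengths.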
Strengthening memory layer
LSTM can learn long-term dependencies through deliberate design, and the critical component is the memory cell. To strengthen the long-term memory and alleviate the gradient dispersion problem of LSTM, the output needs to be determined by the current output and the cell state, thus the strengthening memory layer is proposed. In the strengthening memory layer, the current output and the cell state are combined to form a new unit. Then, the convolution Conv1d and tanh functions are used to extract more effective features to form the new memory cell. Finally, the output is generated by adding the current and new cell states, and it can be computed as follows:
Due to the addition of the new state, the latest information can be strengthened, and through the addition of new features, more information can be saved. The results of two kinds of feature information are combined in a summation way, which can strengthen the impact of the new state on the final result to a certain extent and make the results more comprehensive.
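The layer can be illustrated with the following NumPy sketch. It assumes that the current output and the cell state are combined by summation, that a single-channel 1-D convolution over the hidden dimension stands in for the Conv1d layer, and that the kernel is given rather than learned. The function name is hypothetical, and the channel layout of the actual layer is not specified here.

```python
import numpy as np

def strengthening_memory(h_t, c_t, kernel):
    """Combine output and cell state, extract features, and add them back.

    h_t, c_t: 1-D arrays of equal length (at least len(kernel))
    kernel:   1-D convolution kernel (stand-in for a learned Conv1d filter)
    """
    u = h_t + c_t                                       # new unit: output + cell state
    # np.convolve flips the kernel (true convolution); for a learned filter
    # this differs from cross-correlation only by a re-indexing of the weights.
    v = np.tanh(np.convolve(u, kernel, mode="same"))    # Conv1d + tanh features
    return u + v                                        # strengthened memory state
```

With the identity kernel `[0, 1, 0]` the convolution leaves `u` unchanged, so the output is `u + tanh(u)`; a learned kernel would instead mix neighboring components before the tanh.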
Parameter segment sharing strategy
Parameter sharing is a necessary method for controlling the number of parameters that the model has to learn, which makes model processing more efficient. However, it also results in coupled optimization among different candidates, making architectures less discriminative. Therefore, a parameter segment sharing strategy is proposed for LSTM to achieve a better trade-off between processing efficiency and architecture discrimination. Let the prediction length be L and the number of shared parameters be s; then \(k=L/s\) output layers are constructed for prediction. Different output layers can capture temporal features from different time periods, which improves the architecture discrimination. Meanwhile, each output layer with s shared parameters guarantees the processing efficiency of the model. Finally, the output layer can be expressed in the form:
where \(y_{t}\) is the forecast result, \(W_{yk}\) is the weight matrix of the kth output layer, \(\hat{h}_t\) is the output of the strengthening memory layer, and \(b_y\) is the bias.
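One way to read the segmented output is sketched below: the L forecast steps are split into k = L/s consecutive segments, each produced by its own linear output layer from the same strengthened state, with a bias shared across segments. This interpretation, the requirement that s divides L, and the helper names are assumptions made for illustration.

```python
import numpy as np

def make_heads(L, s, hidden_dim, seed=0):
    """Hypothetical helper: k = L / s output layers of shape (s, hidden_dim)."""
    if L % s != 0:
        raise ValueError("prediction length L must be divisible by segment size s")
    k = L // s
    rng = np.random.default_rng(seed)
    W_heads = [rng.normal(0.0, 0.1, (s, hidden_dim)) for _ in range(k)]
    b_y = np.zeros(s)                      # bias shared by all segments (assumed)
    return W_heads, b_y

def segmented_output(h_hat, W_heads, b_y):
    """Concatenate the s-step forecasts of the k output layers into L steps."""
    return np.concatenate([W_k @ h_hat + b_y for W_k in W_heads])
```

For L = 6 and s = 2, three output layers each predict two consecutive steps. Choosing s = L recovers a single fully shared output layer, while s = 1 gives one layer per forecast step.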
FLSTM model
FLSTM is based on the LSTM model and integrates the fuzzy system to leverage the advantages of both fuzzy logic and deep learning. FLSTM combines the fuzzy prediction fusion, the strengthening memory layer, and the parameter segment sharing strategy to enhance the accuracy and interpretability of LSTM for long-term time series prediction. First, the proposed fuzzy prediction model is utilized to obtain the fuzzy rule-based prediction value. This information is fused into the input gate, forget gate, and output gate of LSTM. Then, the strengthening memory layer sums the hidden state and the cell state to form a new state, and extracts more effective features using convolution and tanh functions. The new state and its feature-extracted version are added to generate the strengthening memory state. The parameter segment sharing strategy can be flexibly adjusted for different datasets, prediction cycles, and prediction lengths, which improves the model’s ability to extract periodic features from time series data and effectively limits the growth in the number of network parameters. Algorithm 1 shows the details of the FLSTM model.
Experimental study
To verify the prediction performance of FLSTM, it is compared with twenty-two prediction methods on seven real-world datasets. The compared methods include three classical prediction methods, ARIMA11, SVR35, and naive36; six deep learning-based prediction methods, GRU37, DRNN38, LSTM39, Reformer22, LogSparse self-attention23, and Efficient attention40; seven LSTM-based fuzzy inference methods, FD-LSTM41, FIS-LSTM42, SEIT2FNN43, RIT2NFS-WB44, MclT2FIS-UM45, MclT2TIS-US45, and eIT2FNN-LSTM46; an LSTM-based fuzzy Gaussian prediction method, LFIGLSTM30; a fuzzy Gaussian based fuzzy inference prediction method, LFIGFIS47; a fuzzy prediction method, FPFTS48; a hybrid method, MLP-Arima8; and a nonlinear autoregressive neural network, NAR49. The seven real-world datasets cover crucial indicators in electric power deployment, air quality assessment, Covid-19 case counts, sunspot activity, and temperature, among others: Electricity Transformer Temperature (ETT)50, PM2.551, daily Covid-19 cases52, monthly sunspot numbers53, daily maximum temperatures54, abalone age51, and miles per gallon51. To evaluate the prediction effectiveness of the proposed method, six performance indexes, MSE, MAE, RMSE, SMAPE, MAPE, and MASE, are adopted8, 40, 48. For the sake of fairness, the prediction lengths for each dataset are consistent with those in the original papers of the compared models, and the results of the compared models are taken from those papers.
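For reference, the six indexes can be computed as in the following NumPy sketch. The definitions are the commonly used ones and may differ in detail from those in the cited works: SMAPE is taken in its "2|e| / (|y| + |ŷ|)" form, MAPE and SMAPE are reported in percent, and MASE is scaled by the in-sample one-step naive forecast error. All function names are chosen for the example.

```python
import numpy as np

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    return float(np.sqrt(mse(y, y_hat)))

def mape(y, y_hat):
    """Mean absolute percentage error; undefined when y contains zeros."""
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))

def smape(y, y_hat):
    """Symmetric MAPE (one common variant, bounded by 200%)."""
    return float(100.0 * np.mean(2.0 * np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat))))

def mase(y, y_hat, y_train):
    """MAE scaled by the in-sample MAE of the one-step naive forecast."""
    scale = np.mean(np.abs(np.diff(y_train)))
    return float(mae(y, y_hat) / scale)
```

The relative improvements quoted in the experiments correspond to `(baseline - proposed) / baseline` for the chosen index.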
Experiment 1: Electricity Transformer Temperature time series
These time series are collected from the Electricity Transformer Temperature (ETT) dataset50, shown in Fig. 4, where \(\textrm{ETTh}_{1}\) and \(\textrm{ETTh}_{2}\) are 1-hour-level series covering 2 years of data from two separate counties in China, and \(\textrm{ETTm}_{1}\) and \(\textrm{ETTm}_{2}\) are 15-minute-level series from the same sources. Experimental parameters are set as follows: the parameters are updated with the Adam optimizer, the batch size is 32, the learning rate is 0.001, the number of training epochs is 100, each experiment is repeated 6 times, the dimension of the hidden layer is 64, and the input and output channels of the Conv1d in the strengthening memory layer are both 64.
For the \(\textrm{ETTh}_{1}\) and \(\textrm{ETTh}_{2}\) time series, the prediction lengths are set to 3, 6, 12, 18, 24, 36, 48, and 168. For the \(\textrm{ETTm}_{1}\) and \(\textrm{ETTm}_{2}\) time series, the prediction lengths are set to 4, 8, 12, 16, 24, 32, 48, 96, and 288. The prediction performance of ARIMA11, GRU37, DRNN38, LSTM39, FD-LSTM41, FIS42, Reformer22, LogTrans23, Efficient-att40, and the proposed method with different prediction lengths on the 4 time series is listed in Tables 1, 2, 3 and 4. The best results are highlighted in boldface and the winning-counts are listed in the last column.
From Tables 1 and 2, we can see that FLSTM achieves better results than LSTM on MSE, with decreases of 19.9% (at 48) and 29.0% (at 168) on average. This reveals that FLSTM significantly improves the performance of LSTM. In comparison with ARIMA, GRU, DRNN, Reformer, and LogTrans, FLSTM achieves better prediction performance across all datasets. FLSTM beats Efficient-att in most winning-counts, i.e. \(14>3\) and \(15>3\), and surpasses Efficient-att on longer prediction lengths (\(\ge 36\)). From Tables 3 and 4, we can see that FLSTM achieves better results than LSTM on MSE, with decreases of 28.9% (at 96) and 32.4% (at 288) on average. This demonstrates that FLSTM significantly improves the performance of LSTM. In comparison with ARIMA, GRU, DRNN, LSTM, FD-LSTM, FIS and Reformer, FLSTM achieves better prediction performance across all datasets. FLSTM beats LogTrans and Efficient-att in most winning-counts, i.e. \(12>8\) and \(12>1\) for \(\textrm{ETTm}_{1}\), \(12>4\) and \(12>4\) for \(\textrm{ETTm}_{2}\). The experiment shows that FLSTM succeeds in enhancing prediction performance in the long-term prediction problem.
Experiment 2: PM2.5 time series
Currently, research on PM2.5 data has generated great enthusiasm, and more and more deep learning based models have been proposed and applied to long-term PM2.5 generation55, 56. Therefore, we also evaluate time series prediction on PM2.5 data. These time series are collected from PM2.5 data51, shown in Fig. 5, where BeijingPM and ShanghaiPM are the PM2.5 data of Beijing and Shanghai in China from 2010 to 2015, including 50387 and 51892 observations, respectively. Experimental parameters are set as follows: the parameters are updated with the Adam optimizer, the batch size is 32, the learning rate is 0.001, the number of training epochs is 100, each experiment is repeated 6 times, the dimension of the hidden layer is 64, and the input and output channels of the Conv1d in the strengthening memory layer are both 64. The prediction lengths are set to 200, 400, and 600. Tables 5 and 6 summarize the evaluation results of ARIMA11, LSTM39, FPFTS48, and FLSTM with the three long-term prediction lengths. The best results are highlighted in boldface and the winning counts are listed in the last row.
Table 5 demonstrates that FLSTM outperforms the other methods for the PM2.5 time series of Beijing in terms of all evaluation metrics, except that FPFTS has the smallest RMSE when the prediction length is 600. The proposed method surpasses FPFTS in most winning-counts, i.e. \(5>1\). In comparison with LSTM, the proposed method has an RMSE decrease of 7.0% (at 200), 8.0% (at 400), and 8.1% (at 600). This demonstrates that FLSTM achieves better prediction performance than LSTM. From Table 6, we can see that FLSTM outperforms ARIMA, LSTM, and WM on both evaluation metrics at all prediction lengths for the PM2.5 time series of Shanghai. FLSTM surpasses FPFTS in most winning-counts, i.e. \(4>2\). In comparison with LSTM, FLSTM has an RMSE decrease of 38.2% (at 200), 26.6% (at 400), and 16.7% (at 600). This demonstrates that FLSTM achieves better prediction performance than LSTM. The experiment shows that FLSTM succeeds in improving the prediction capacity for long-term prediction.
Experiment 3: Daily number of Covid-19 cases time series
This time series is taken from the daily Covid-19 cases database maintained by the organization Our World In Data (OWID)52, and consists of the number of daily cases worldwide until April 25th, 2021 (Fig. 6). The experimental parameters are set as follows: the parameters are updated with the Adam optimizer, the batch size is set to 32, the learning rate is set to 0.001, the number of training epochs is set to 100, the number of experiment runs is set to 6, the dimension of the hidden layer is set to 64, and the input and output channels of the Conv1d in the strengthening memory layer are both set to 64. The prediction lengths are set to 7, 14, and 28, the same as in the literature8. The prediction performance of ARIMA(2,0,4)(0,1,2), MLP(14,5,1), MLP-Arima8, and FLSTM for short (7 days), medium (14 days), and long (28 days) prediction lengths is listed in Table 7. The best results are highlighted in boldface and the winning counts are listed in the last row.
Table 7 shows that FLSTM outperforms the other methods on MASE and SMAPE at all prediction lengths for the daily number of Covid-19 cases time series. Compared with the state-of-the-art method MLP-Arima8, FLSTM reduces MASE by 6.6% (at 7), 79.5% (at 14), and 19.1% (at 28), which indicates better prediction performance than MLP-Arima. FLSTM also surpasses the comparative methods in all winning counts. These results indicate that FLSTM improves prediction capacity across different prediction lengths.
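For reference, MASE and SMAPE can be computed as in the following minimal sketch. The exact definitions used in this paper are not restated in this section, so the sketch assumes common conventions: SMAPE in percent on the 0 to 200 scale, and MASE scaled by the in-sample mean absolute error of a naive forecast with lag `season`. The function names and the toy values are illustrative only.

```python
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent (assumed 0-200% convention)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # Assumption: a term with a == f == 0 contributes no error.
        total += 0.0 if denom == 0 else abs(a - f) / denom
    return 100.0 * total / len(actual)


def mase(train: Sequence[float], actual: Sequence[float],
         forecast: Sequence[float], season: int = 1) -> float:
    """Forecast MAE divided by the in-sample MAE of a lag-`season` naive forecast."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if len(train) <= season:
        raise ValueError("training series is too short for the chosen lag")
    naive_mae = sum(
        abs(train[i] - train[i - season]) for i in range(season, len(train))
    ) / (len(train) - season)
    if naive_mae == 0:
        raise ValueError("naive forecast error is zero; MASE is undefined")
    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return forecast_mae / naive_mae


# Toy values, not data from the paper.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
actual = [6.0, 7.0]
forecast = [6.5, 6.5]
print(mase(train, actual, forecast))  # 0.5
print(smape(actual, forecast))
```

A MASE below 1 means the forecast has a smaller mean absolute error than the in-sample naive forecast, which makes the metric comparable across series of different scale.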
Experiment 4: Monthly sunspot numbers time series
This time series is the sunspot data shown in Fig. 7, where SUNSPOT53 is the series of Zurich monthly sunspot numbers from 1749 to 1983, comprising 2819 observations. The experimental parameters are set as follows: the parameters are updated with the Adam optimizer, the batch size is set to 32, the learning rate is set to 0.001, the number of training epochs is set to 100, the number of experiment runs is set to 6, the dimension of the hidden layer is set to 64, and the input and output channels of the Conv1d in the strengthening memory layer are both set to 64. The prediction lengths are set to 1, 55, 110, and 165. Table 8 summarizes the evaluation results of LFIGLSTM30, LFIGFIS47, LSTM39, NAR49, ARIMA11, SVR35, naive36, and FLSTM at the four prediction lengths. The best results are highlighted in boldface and the winning counts are listed in the last column.
Table 8 shows that FLSTM outperforms the other methods on RMSE, MAPE, and MAE at all prediction lengths for the monthly sunspot numbers time series. Compared with the state-of-the-art method LFIGLSTM, FLSTM reduces RMSE by 85.2% (at 1), 50.5% (at 55), 34.8% (at 110), and 27.2% (at 165), which indicates better prediction performance than LFIGLSTM. FLSTM also surpasses the comparative methods in all winning counts. These results indicate that FLSTM has advantages over classical prediction models, deep learning prediction models, and hybrid prediction models in both short-term and long-term prediction tasks.
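The three metrics reported in this comparison follow their standard definitions, sketched below. MAPE is undefined when an actual value is zero; how the paper handles that case is not stated, so this sketch simply raises an error, which is an assumption. The function names and toy values are illustrative.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], forecast: Sequence[float]) -> None:
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error."""
    _check(actual, forecast)
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(actual, forecast)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    _check(actual, forecast)
    if any(a == 0 for a in actual):
        # Assumption: zero actual values are rejected rather than skipped.
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(
        abs((a - f) / a) for a, f in zip(actual, forecast)
    ) / len(actual)


# Toy values, not data from the paper.
actual = [2.0, 4.0]
forecast = [1.0, 6.0]
print(rmse(actual, forecast))  # sqrt(2.5), about 1.5811
print(mae(actual, forecast))   # 1.5
print(mape(actual, forecast))  # 50.0
```

RMSE penalizes large errors more heavily than MAE, which is one reason a method can lead on one metric and trail on the other at the same prediction length.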
Experiment 5: Daily maximum temperatures time series
This time series is the temperature data shown in Fig. 8, where Tmax54 is the series of daily maximum temperatures in Melbourne from 1981 to 1990, comprising 3649 observations. The experimental parameters are set as follows: the parameters are updated with the Adam optimizer, the batch size is set to 32, the learning rate is set to 0.001, the number of training epochs is set to 100, the number of experiment runs is set to 6, the dimension of the hidden layer is set to 64, and the input and output channels of the Conv1d in the strengthening memory layer are both set to 64. The prediction lengths are set to 1, 178, 356, and 534. Table 9 summarizes the evaluation results of LFIGLSTM30, LFIGFIS47, LSTM39, NAR49, ARIMA11, SVR35, naive36, and FLSTM at the four prediction lengths. The best results are highlighted in boldface and the winning counts are listed in the last column.
Table 9 shows that FLSTM outperforms the other methods on the daily maximum temperatures time series in terms of all evaluation metrics, except that LFIGLSTM has the smallest MAE at prediction lengths of 356 and 534. Compared with the state-of-the-art method LFIGLSTM, FLSTM reduces RMSE by 18.9% (at 1), 13.8% (at 178), 15.4% (at 356), and 9.5% (at 534). These results indicate that FLSTM has significant advantages over these prediction models in both short-term and long-term prediction tasks.
Experiment 6: Abalone age time series
The abalone age (ABALONE) time series51, shown in Fig. 9, is collected from the UCI machine learning repository and includes 4177 observations. The experimental parameters are set as follows: the parameters are updated with the Adam optimizer, the learning rate is set to 0.001, the number of training epochs is set to 200, the number of experiment runs is set to 6, the dimension of the hidden layer is set to 64, and the input and output channels of the Conv1d in the strengthening memory layer are both set to 64. The prediction length is set to 835. Table 10 summarizes the evaluation results of SEIT2FNN43, RIT2NFS-WB44, MclT2FIS-UM45, MclT2TIS-US45, eIT2FNN-LSTM46, and FLSTM. The best result is highlighted in boldface.
Table 10 shows that FLSTM outperforms the other methods on the abalone age prediction problem in terms of RMSE. Compared with the state-of-the-art method eIT2FNN-LSTM, FLSTM reduces RMSE by 4.5%. These results indicate that FLSTM has significant advantages over these prediction models.
Experiment 7: Miles-Per-Gallon time series
The Miles-Per-Gallon (MPG)51 time series, shown in Fig. 10, is collected from the UCI machine learning repository and includes 392 observations. The experimental parameters are set as follows: the parameters are updated with the Adam optimizer, the learning rate is set to 0.001, the number of training epochs is set to 200, the number of experiment runs is set to 6, the dimension of the hidden layer is set to 64, and the input and output channels of the Conv1d in the strengthening memory layer are both set to 64. The prediction length is set to 120. Table 11 summarizes the evaluation results of SEIT2FNN43, RIT2NFS-WB44, MclT2FIS-UM45, MclT2TIS-US45, eIT2FNN-LSTM46, and FLSTM. The best result is highlighted in boldface.
Table 11 shows that FLSTM outperforms the other methods on the Miles-Per-Gallon prediction problem in terms of RMSE. Compared with the state-of-the-art method eIT2FNN-LSTM, FLSTM reduces RMSE by 9.7%. These results indicate that FLSTM has significant advantages over these prediction models.
Ablation study
To demonstrate the respective roles of the components of the proposed method, namely the fuzzy prediction fusion (FPF), the strengthening memory layer (SML), and the parameter segmentation sharing (PSS) strategy, an ablation study is carried out on the \(\textrm{ETTh}_1\) dataset. For a finer analysis, the experimental results for different combinations of LSTM, FPF, SML, and PSS at different prediction lengths are shown in Tables 12 and 13.
The results in Tables 12 and 13 reveal that the proposed method FLSTM outperforms all other combinations of LSTM, FPF, SML, and PSS for short-term and long-term predictions in terms of MSE and MAE. The three combinations with different components all improve on the accuracy of LSTM, which verifies the respective roles of FPF, SML, and PSS. Although some two-component combinations, such as LSTM+SML+PSS at short-term prediction lengths, match the best results of the proposed method, performance drops at all long-term prediction lengths when any one component is removed from the proposed method. This is attributed to the fact that each component has a positive impact on prediction capacity. The proposed method combines the benefits of the three components and achieves the best performance at all prediction lengths.
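The space of ablation variants discussed here can be enumerated mechanically. The sketch below lists every subset of the three components added to the LSTM baseline, using the LSTM+SML+PSS naming style from the text; the function name is hypothetical, and whether Tables 12 and 13 report all eight variants is not stated in this section.

```python
from itertools import combinations
from typing import List, Sequence

COMPONENTS = ("FPF", "SML", "PSS")


def ablation_variants(base: str = "LSTM",
                      components: Sequence[str] = COMPONENTS) -> List[str]:
    """Return the base model plus every subset of components, smallest first."""
    variants = []
    for k in range(len(components) + 1):
        for subset in combinations(components, k):
            variants.append("+".join((base,) + tuple(subset)))
    return variants


for name in ablation_variants():
    print(name)
# LSTM
# LSTM+FPF
# LSTM+SML
# LSTM+PSS
# LSTM+FPF+SML
# LSTM+FPF+PSS
# LSTM+SML+PSS
# LSTM+FPF+SML+PSS
```

The last variant, with all three components, corresponds to the full proposed method, and each two-component variant corresponds to removing one component from it.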
Ethics declarations
No experiments on humans and/or animals were involved in this study.
Conclusion
LSTM-based models have achieved great success in time series forecasting research, yet these methods share general drawbacks: accumulated error, diminishing temporal correlation, and lack of interpretability. This research designs a time series prediction model by integrating a linear model, the Wang–Mendel fuzzy inference prediction method, with the LSTM network, which makes the model parameters more scientific and interpretable and improves performance in short-term time series prediction tasks. This study also aims to address LSTM’s poor performance in long-term time series prediction tasks. We strengthen the long-term memory with the strengthening memory layer, and balance the processing efficiency and structural discrimination of the model with the parameter segmentation sharing strategy, which addresses LSTM’s poor performance in long-term time series prediction caused by the gradient dispersion problem.
Seven publicly available time series are used to compare the prediction performance of the proposed method with eight categories of methods: three classical prediction methods (ARIMA, SVR, and naive); six deep learning-based prediction methods (GRU, DRNN, LSTM, Reformer, LogSparse self-attention, and Efficient attention); seven LSTM-based fuzzy inference methods (FD-LSTM, FIS-LSTM, SEIT2FNN, RIT2NFS-WB, MclT2FIS-UM, MclT2TIS-US, and eIT2FNN-LSTM); an LSTM-based fuzzy Gaussian prediction method (LFIGLSTM); a fuzzy Gaussian-based fuzzy inference prediction method (LFIGFIS); a fuzzy prediction method (FPFTS); a hybrid method (MLP-Arima); and a nonlinear autoregressive neural network (NAR). FLSTM outperforms the classical prediction methods across all datasets, achieves better prediction performance than the hybrid method at all prediction lengths, and beats both the deep learning-based prediction methods and the fuzzy prediction method in winning counts. The experiments show that FLSTM improves prediction capacity for long-term prediction. FLSTM also has limitations. Its computational complexity is a disadvantage: FLSTM can only predict one step at a time, so the time cost grows as the prediction length increases. The fixed fuzzy rule generation mechanism also limits the flexibility of prediction. These limitations suggest directions for future research.
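The cost of one-step-at-a-time prediction can be made concrete with a generic recursive forecasting loop, sketched below. This is not the FLSTM implementation: the stand-in model simply averages its input window, and the assumption that each prediction is fed back as input for the next step is ours. All names are hypothetical.

```python
from typing import Callable, List, Sequence


def recursive_forecast(history: Sequence[float],
                       one_step_model: Callable[[List[float]], float],
                       horizon: int,
                       window: int) -> List[float]:
    """Forecast `horizon` steps by calling a one-step model repeatedly.

    Each prediction is appended to the input window, so the model is
    called exactly `horizon` times and any error it makes is fed forward.
    """
    if horizon < 0 or window < 1 or len(history) < window:
        raise ValueError("need horizon >= 0 and at least `window` past values")
    buffer = list(history[-window:])
    predictions: List[float] = []
    for _ in range(horizon):
        next_value = one_step_model(buffer)
        predictions.append(next_value)
        buffer = buffer[1:] + [next_value]  # slide the window forward
    return predictions


class CountingMeanModel:
    """Stand-in one-step model: predicts the window mean and counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, window: List[float]) -> float:
        self.calls += 1
        return sum(window) / len(window)


model = CountingMeanModel()
preds = recursive_forecast([1.0, 2.0, 3.0, 4.0], model, horizon=5, window=2)
print(len(preds), model.calls)  # 5 5
```

The number of model calls equals the prediction length, so inference time grows linearly with the horizon, and fed-back predictions are one way small errors can accumulate over long horizons.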
Future research will include the following: (1) supporting multi-step prediction in a single pass; (2) providing fuzzy reasoning with different cycle lengths; (3) extending the LSTM network to more complex data; (4) applying the proposed method to other appealing directions.
Data availability
The datasets used in this study are publicly available.
References
Liu, G., Xiao, F. & Lin, C. T. A fuzzy interval time-series energy and financial forecasting model using network-based multiple time-frequency spaces and the induced-ordered weighted averaging aggregation operation. IEEE Trans. Fuzzy Syst. 28, 2677–2690 (2020).
Bala, R. & Singh, R. P. A dual-stage advanced deep learning algorithm for long-term and long-sequence prediction for multivariate financial time series. Appl. Soft Comput. 126, 109317 (2022).
Gao, X., Cao, Z. & Li, S. Taxonomy and evaluation for microblog popularity prediction. ACM Trans. Knowl. Discov. Data (TKDD) 13, 1–40 (2019).
Cao, Q., Shen, H. & Gao, J. Popularity prediction on social platforms with coupled graph neural networks. In Proceedings of the 13th International Conference on Web Search and Data Mining, 70–78 (2020).
Chen, X., Lan, X. & Wan, J. Evolutionary prediction of nonstationary event popularity dynamics of Weibo social network using time-series characteristics. Discret. Dyn. Nat. Soc. 2021, 1–19 (2021).
Sharma, R. R., Kumar, M. & Maheshwari, S. EVDHM-ARIMA-based time series forecasting model and its application for COVID-19 cases. IEEE Trans. Instrum. Meas. 70, 1–10 (2020).
Shen, F., Liu, J. & Wu, K. Multivariate time series forecasting based on elastic net and high-order fuzzy cognitive maps: A case study on human action prediction through EEG signals. IEEE Trans. Fuzzy Syst. 29, 2336–2348 (2020).
de Araújo Morais, L. R. & da Silva Gomes, G. S. Forecasting daily Covid-19 cases in the world with a hybrid ARIMA and neural network model. Appl. Soft Comput. 126, 109315 (2022).
Dudek, G., Pełka, P. & Smyl, S. A hybrid residual dilated LSTM and exponential smoothing model for midterm electric load forecasting. IEEE Trans. Neural Netw. Learn. Syst. 33, 2879–2891 (2021).
Soda, P., Sicilia, R. & Acciai, L. Grasping inter-attribute and temporal variability in multivariate time series. IEEE Trans. Big Data 7, 885–892 (2019).
Ariyo, A. A., Adewumi, A. O. & Ayo, C. K. Stock price prediction using the ARIMA model. In 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, 106–112 (2014).
Panigrahi, S. & Behera, H. S. A hybrid ETS-ANN model for time series forecasting. Eng. Appl. Artif. Intell. 66, 49–59 (2017).
Geng, X., Li, H. & Yao, Z. Potential of ANN for prolonging remote sensing-based soil moisture products for long-term time series analysis. IEEE Geosci. Remote Sens. Lett. 19, 1–5 (2022).
Elman, J. L. Finding structure in time. Cogn. Sci. 14, 179–211 (1990).
Canizo, M., Triguero, I. & Conde, A. Multi-head CNN-RNN for multi-time series anomaly detection: An industrial case study. Neurocomputing 363, 246–260 (2019).
Ni, Q. & Cao, X. MBGAN: An improved generative adversarial network with multi-head self-attention and bidirectional RNN for time series imputation. Eng. Appl. Artif. Intell. 115, 105232 (2022).
Hu, M., Jiang, K. & Nie, Z. You only align once: Bidirectional interaction for spatial-temporal video super-resolution. In Proceedings of the 30th ACM International Conference on Multimedia, 847–855 (2022).
Ma, C., Dai, G. & Zhou, J. Short-term traffic flow prediction for urban road sections based on time series analysis and LSTM_BILSTM method. IEEE Trans. Intell. Transp. Syst. 23, 5615–5624 (2021).
Bandara, K., Bergmeir, C. & Hewamalage, H. LSTM-MSNet: Leveraging forecasts on sets of related time series with multiple seasonal patterns. IEEE Trans. Neural Netw. Learn. Syst. 32, 1586–1599 (2020).
Vaswani, A., Shazeer, N. & Parmar, N. Attention is all you need. Adv. Neural. Inf. Process. Syst. 30, 1–15 (2017).
Xiao, Y., Yuan, Q. & He, J. Space-time super-resolution for satellite video: A joint framework based on multi-scale spatial-temporal transformer. Int. J. Appl. Earth Obs. Geoinf. 108, 102731 (2022).
Kitaev, N., Kaiser, Ł. & Levskaya, A. Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451 (2020).
Li, S., Jin, X. & Xuan, Y. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Adv. Neural. Inf. Process. Syst. 32, 1–14 (2019).
Zhou, H., Zhang, S. & Peng, J. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence 35, 11106–11115 (2021).
Garibaldi, J. M. The need for fuzzy AI. IEEE/CAA J. Autom. Sin. 6, 610–622 (2019).
Yeganejou, M., Dick, S. & Miller, J. Interpretable deep convolutional fuzzy classifier. IEEE Trans. Fuzzy Syst. 28, 1407–1419 (2019).
Zhang, S., Sun, Z. & Wang, M. Deep fuzzy echo state networks for machinery fault diagnosis. IEEE Trans. Fuzzy Syst. 28, 1205–1218 (2019).
Zhang, Z. & Yan, Z. An adaptive fuzzy recurrent neural network for solving the nonrepetitive motion problem of redundant robot manipulators. IEEE Trans. Fuzzy Syst. 28, 684–691 (2019).
Li, R., Hu, Y. & Liang, Q. T2F-LSTM method for long-term traffic volume prediction. IEEE Trans. Fuzzy Syst. 28, 3256–3264 (2020).
Tang, Y., Yu, F. & Pedrycz, W. Building trend fuzzy granulation-based LSTM recurrent neural network for long-term time-series forecasting. IEEE Trans. Fuzzy Syst. 30, 1599–1613 (2021).
Wang, L. X. The WM method completed: A flexible fuzzy system approach to data mining. IEEE Trans. Fuzzy Syst. 11, 768–782 (2003).
Wang, L. X. & Mendel, J. M. Generating fuzzy rules by learning from examples. IEEE Trans. Syst. Man Cybern. 22, 1414–1427 (1992).
Gou, J., Hou, F. & Chen, W. Improving Wang-Mendel method performance in fuzzy rules generation using the fuzzy C-means clustering algorithm. Neurocomputing 151, 1293–1304 (2015).
Zhai, Y., Lv, Z. & Zhao, J. Data-driven inference modeling based on an on-line Wang–Mendel fuzzy approach. Inf. Sci. 551, 113–127 (2021).
Cortes, C. & Vapnik, V. Support vector machine. Mach. Learn. 20, 273–297 (1995).
Webb, G. I., Keogh, E. & Miikkulainen, R. Naïve bayes. Encycl. Mach. Learn. 15, 713–714 (2010).
Cho, K., Van Merriënboer, B. & Bahdanau, D. On the properties of neural machine translation: Encoder-Decoder approaches. arXiv preprint arXiv:1409.1259 (2014).
Graves, A., Mohamed, A. R. & Hinton, G. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 6645–6649 (2013).
Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
Shen, Z., Zhang, M. & Zhao, H. Efficient attention: Attention with linear complexities. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 3531–3539 (2021).
Langeroudi, M. K., Yamaghani, M. R. & Khodaparast, S. FD-LSTM: A fuzzy LSTM model for chaotic time-series prediction. IEEE Intell. Syst. 37, 70–78 (2022).
Suppiah, R., Kim, N. & Sharma, A. Fuzzy inference system (FIS)-long short-term memory (LSTM) network for electromyography (EMG) signal analysis. Biomed. Phys. Eng. Express 8, 065032 (2022).
Juang, C. & Tsao, Y. A self-evolving interval type-2 fuzzy neural network with online structure and parameter learning. IEEE Trans. Fuzzy Syst. 16, 1411–1424 (2008).
Juang, C. & Juang, K. Reduced interval type-2 neural fuzzy system using weighted bound-set boundary operation for computation speedup and chip implementation. IEEE Trans. Fuzzy Syst. 21, 477–491 (2012).
Das, A. K., Subramanian, K. & Sundaram, S. An evolving interval type-2 neurofuzzy inference system and its metacognitive sequential learning algorithm. IEEE Trans. Fuzzy Syst. 23, 2080–2093 (2015).
Wang, H., Luo, C. & Wang, X. Synchronization and identification of nonlinear systems by using a novel self-evolving interval type-2 fuzzy LSTM-neural network. Eng. Appl. Artif. Intell. 81, 79–93 (2019).
Yang, X., Yu, F. & Pedrycz, W. Long-term forecasting of time series based on linear fuzzy information granules and fuzzy inference system. Int. J. Approx. Reason. 81, 1–27 (2017).
Wang, W., Liu, W. & Chen, H. Time series forecasting via fuzzy-probabilistic approach with evolving clustering-based granulation. IEEE Trans. Fuzzy Syst. 30, 5324–5336 (2022).
Padilla, C., Hashemi, R. & Mahmood, N. H. A nonlinear autoregressive neural network for interference prediction and resource allocation in URLLC scenarios. In 2021 International Conference on Information and Communication Technology Convergence (ICTC), 184–189 (2021).
ETT dataset. https://github.com/zhouhaoyi/ETDataset .
UCI Machine Learning Repository: Data Sets. http://archive.ics.uci.edu/ml/datasets.php.
Coronavirus pandemic (covid-19). https://ourworldindata.org/coronavirus .
Zurich monthly sunspot number. https://github.com/PacktPublishing/Practical-Time-Series-Analysis .
Melbourne daily max temperatures. https://github.com/jbrownlee/Datasets .
Alexeeff, S. E., Liao, N. S. & Liu, X. Long-term pm2.5 exposure and risks of ischemic heart disease and stroke events: review and meta-analysis. J. Am. Heart Assoc. 10, e016890 (2021).
Xiao, Y., Wang, Y. & Yuan, Q. Generating a long-term (2003–2020) hourly 0.25 global PM2.5 dataset via spatiotemporal downscaling of CAMS with deep learning (DeepCAMS). Sci. Total Environ. 848, 157747 (2022).
Acknowledgements
This work was supported in part by the Natural Science Foundation of China under Grant 62266046, the Natural Science Foundation of Jilin Province, China, under Grant YDZJ202201ZYTS603, and the Natural Science Foundation of Jilin Provincial Department of Education, China, under Grant JJKH20230281KJ.
Author information
Contributions
All the authors contributed extensively to the manuscript. W.W. wrote the main manuscript, and helped with the formatting review and editing of the paper. J.S. designed the experiments and wrote the main manuscript. H.J. reviewed and edited the original document. All authors have read and agreed to the publication of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, W., Shao, J. & Jumahong, H. Fuzzy inference-based LSTM for long-term time series prediction. Sci Rep 13, 20359 (2023). https://doi.org/10.1038/s41598-023-47812-3
DOI: https://doi.org/10.1038/s41598-023-47812-3