Abstract
Machine learning algorithms can improve time series analysis compared with traditional methods such as moving averages or auto-regressive approaches. This advancement has helped unlock several challenging problems, since machine learning not only forecasts the overall trend of the data but also tracks historical changes in the factors influencing that trend. Such predictions play a pivotal role in almost every area of research with time-dependent observations, ranging from finance to public health, environmental and climate change challenges. A key difficulty in these domains is the large number of attributes and predictors, since managing and manipulating data from many attributes is itself a significant challenge for forecasting. These challenges can be addressed with recursive Long Short-Term Memory (LSTM) models. The application of such models is crucial, and their efficacy is further amplified by transfer learning. This paper presents a detailed and comprehensive description of such models. Their practical application is illustrated through an example, emphasizing that these models, when transferred to complex and large datasets via transfer learning, hold great promise.
1 Introduction
Time series prediction is the process of finding patterns in historical data and using them to forecast future events. It plays a crucial role in data science for predicting trends, planning for future occurrences, and making decisions based on expected outcomes. Achieving correct predictions requires a profound understanding of the underlying patterns in the data and how they may evolve over time.
Time series prediction is a process that demands a deep comprehension of the data and its inherent patterns. It is essential to account for the impacts of seasonality, trend, and noise within the data, along with any external factors that might influence it. Additionally, considering the length of the data set is vital, as it directly affects the accuracy of the predictions. A larger dataset enhances generalization [1], reduces overfitting, and ensures statistical significance. However, collecting extensive data isn’t always feasible, so techniques like optimization [2], data augmentation and transfer learning can enhance model performance even with limited data.
The applications of time series data analysis tools are not only limited to engineering research data repositories, where the performance of various engineering devices and tools can be evaluated over time for improvement and accuracy. These approaches are now employed in almost all disciplines of sciences, including bio-medicine [3,4,5,6,7], finance [8,9,10,11], agriculture, industry [12] and most importantly in the domain of climate change that is an evolving area of research [13]. In these fields, the time-dependent data of multiple variables can be efficiently managed and assessed for future predictions [14].
It is important to note that, due to the diverse nature of climate change challenges, it is difficult to retrieve detailed information on all attributes and predictors; time series forecasting therefore often depends on augmentation techniques. For example, in events such as floods [15], wildfires [16], earthquakes [17], and even disasters like pandemics [18], qualitative data (survivor interviews, audio and visual records) are very limited, and analysis is challenging because of missing or incomplete data. In this manuscript, we discuss another approach that is comparatively efficient for complex datasets, known as transfer learning [19].
Machine learning tools have likewise gained recognition in domains such as finance and business, owing to the strengths of novel techniques in terms of data and operating characteristics. Over the years, researchers have demonstrated the strengths of artificial intelligence (AI) tools through business case studies [11, 12, 20,21,22]. For secure online business [23,24,25], algorithms have been developed to identify malicious websites, serving as an initial protective measure.
In all the applications discussed above and in general practice, the statistical datasets exhibit three main qualities: (a) dimension, (b) sparsity, and (c) resolution.
The "dimensionality" of a dataset refers to the total number of characteristics and measurements for each object in the dataset. When a dataset has samples with numerous descriptions, known as "high dimensionality," it can become challenging to discern the meaning of the data. This challenge is often referred to as the "curse of dimensionality."
When most of an object’s features are zero, a highly skewed distribution is observed; in many instances, fewer than 1 percent of the entries have non-zero values. AI tools treat such data as sparse [26], emphasizing the scattered nature of the dataset [27].
The third quality relates to the resolution of the data. If the resolution is too fine, a pattern may not be visible or may be buried in noise (see Table 1 for the climate change datasets with noise); conversely, if the resolution is too coarse, patterns can be obscured. For instance, the motion of storms and other weather phenomena can be observed through hourly changes in atmospheric pressure, but on a timescale of months such patterns may not be discernible.
In particular, as the number of dimensions increases, the data become increasingly sparse in the space they occupy. For classification, this implies there may not be enough data objects to build a model that reliably assigns objects to every possible class. For clustering, the definitions of distance and density between points, crucial for clustering, become less meaningful.
1.1 Types of Datasets
The data of the research problems addressed above can be categorized into (a) ordered data, (b) record data, and (c) graph-based data.
In most cases, data mining work relies on record data, a collection of records (data objects). In basic record data (data stored in a table), there are typically no explicit links between data fields and records, and each record (object) shares the same set of features. Records are often stored in flat files or relational databases (tables containing rows and columns).
In ordered data, the relationships among attributes are based on an order in time or space. There are four types:
1. Sequential information, also known as temporal data, is an extended form of record data in which each record is associated with a specific time. Consider a retail-transaction dataset that includes both the time and the type of each transaction.
2. Sequence data comprise a collection of items listed in order, such as a sequence of words or letters. They resemble sequential data but differ in that positions follow a distinct ordering rather than being marked with timestamps. For instance, genetic instructions in plants and animals are represented as nucleotide sequences, known as genes. For the analysis of such datasets, readers may consult works such as [34, 36].
3. Time series data are a specific category of sequential information in which each record serves as a point in a time series, for example economic series of daily stock prices. In this scenario, time series data could be collected over several months, recording daily prices for various stocks and reflecting daily price fluctuations, which are especially common in underdeveloped countries where grocery prices can vary daily. Useful references in this domain are [36, 37].
4. Some objects possess spatial attributes, such as location or size, along with other attributes. An example of spatial data is weather information (precipitation, temperature, and pressure) collected for various locations worldwide. Useful approaches in this domain are listed in [38, 39].
A clear understanding of the data type is therefore important for selecting a relevant machine learning tool. In the next section, we discuss machine learning approaches for analyzing time series datasets. We address the challenges with the help of an example and extend the idea with the aid of the transfer learning approach.
2 Materials and Methods
With machine learning networks, complex datasets can be explored more efficiently. These networks can further help to address problems such as time series forecasting and risk management.
2.1 Recurrent Neural Networks
Recurrent neural networks (RNNs) differ from basic feed-forward networks. RNNs can analyze temporal dynamic behaviour by forming a directed graph along a temporal sequence and by using internal memory to process such sequences, making them well suited to sequential data such as time series, financial data, audio, video, speech, and weather. RNNs originated in the 1980s, but their full strength has only recently emerged.
2.2 Long Short-Term Memory Models
Hochreiter and Schmidhuber [40] introduced Long Short-Term Memory (LSTM) networks in 1996 to address the vanishing gradient problem in traditional RNNs. The LSTM architecture mitigates the issue where gradients become extremely small during backpropagation, impeding or halting the learning process.

To resolve the vanishing gradient problem, the LSTM employs a memory cell capable of choosing to forget or remember information over time. Three gating units control the cell, deciding how much information to forget, how much to remember, and how much new information to add.
Over time, LSTM architecture has gained acceptance as a superior choice compared to traditional RNNs. The specific design of LSTM has been enhanced and successfully applied to various problems in finance, accounting, and other research and technology fields. Its applications extend to tasks such as speech recognition, natural language processing, medical imaging, bio-medicine, smart energy, and other time-series prediction tasks [41, 42]. With numerous extensions and variations tailored to address diverse challenges, the LSTM approach has become one of the most trusted methods for time series data analysis, particularly with datasets featuring higher frequencies and different attributes [43, 44].
2.2.1 Methodology
These networks include a unique set of memory cells that replace the neurons of the hidden layer of an RNN, and the state of these memory cells is central. An LSTM filters information through gate structures to preserve and regularly update the state of its memory cells; the gate structure comprises input, output, and forget gates. Each memory cell consists of 3 sigmoid layers and 1 \(\tanh \) layer. Figure 2 shows how LSTM memory cells are put together.
The forget gate within the LSTM unit decides which information about the state of a cell is left out of the model. As shown in Fig. 2, the memory cells take as inputs the previous output, \(h_{t-1}\), and the current moment’s external information, \(x_t\), unified in a long vector, \(\textbf{v} = [h_{t-1}, x_t]\), using the sigmoid function:

\(f_t = \sigma (W_f \cdot [h_{t-1}, x_t] + b_f)\)   (1)
\(W_f\) and \(b_f\) are the forget gate’s weight matrix and bias, and \(\sigma \) is the sigmoid function. The primary purpose of the forget gate is to determine how much of the previous cell state \(C_{t-1}\) is retained in the current cell state \(C_t\). Based on \(h_{t-1}\) and \(x_t\), the gate outputs a number between 0 and 1, where 1 means fully retained and 0 means fully discarded.
The input gate determines how much of the current network input \(x_t\) is stored in the cell state \(C_t\), preventing irrelevant data from entering the memory cells. A sigmoid layer selects the values to be updated. In Eq. 2, this is presented as:

\(i_t = \sigma (W_i \cdot [h_{t-1}, x_t] + b_i)\)   (2)
A \(\tanh \) layer then processes the same inputs to create a candidate vector \(\hat{C_t}\), which controls the amount of newly added information, as in Eq. 3:

\(\hat{C_t} = \tanh (W_C \cdot [h_{t-1}, x_t] + b_C)\)   (3)
The cell state in the memory is then updated from \(C_{t-1}\) to \(C_{t}\), as shown in Eq. 4:

\(C_t = f_t * C_{t-1} + i_t * \hat{C_t}\)   (4)
The output gate controls how much of the current cell state is passed on as output. A sigmoid layer first decides which information to output:

\(o_t = \sigma (W_o \cdot [h_{t-1}, x_t] + b_o)\)   (5)

The cell state is then passed through \(\tanh \) and multiplied by the output of the sigmoid layer, giving the final value of the cell output:

\(h_t = o_t * \tanh (C_t)\)   (6)
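The gate computations above can be sketched as a single forward step in plain NumPy. This is an illustrative toy implementation, not the trained model from the case study; the dimensions, random weights, and zero biases are arbitrary assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM forward step: forget, input, candidate, state update, output."""
    v = np.concatenate([h_prev, x_t])                   # unified vector [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ v + params["b_f"])    # forget gate
    i_t = sigmoid(params["W_i"] @ v + params["b_i"])    # input gate
    C_hat = np.tanh(params["W_C"] @ v + params["b_C"])  # candidate vector
    C_t = f_t * C_prev + i_t * C_hat                    # cell-state update
    o_t = sigmoid(params["W_o"] @ v + params["b_o"])    # output gate
    h_t = o_t * np.tanh(C_t)                            # cell output
    return h_t, C_t

# toy dimensions: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
n_in, n_h = 3, 4
params = {k: rng.standard_normal((n_h, n_h + n_in)) * 0.1
          for k in ("W_f", "W_i", "W_C", "W_O" if False else "W_o")}
params.update({b: np.zeros(n_h) for b in ("b_f", "b_i", "b_C", "b_o")})

h, C = np.zeros(n_h), np.zeros(n_h)
for x in rng.standard_normal((5, n_in)):  # run five time steps
    h, C = lstm_step(x, h, C, params)
```

Because the output gate lies in (0, 1) and \(\tanh \) lies in (-1, 1), every component of the output \(h_t\) is bounded in magnitude by 1, regardless of the input scale.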
2.3 Recursive Long Short-Term Memory Models
Recursive LSTMs offer distinct advantages over alternative methods in time series prediction. Primarily, they excel in capturing long-term dependencies in data, making them well-suited for forecasting future events. Moreover, their capacity to learn from their own predictions contributes to continual improvement in accuracy. Lastly, they effectively capture patterns across multiple time steps, a crucial aspect in the realm of time series prediction.
3 Results and Discussion
3.1 Case Study
Although LSTM has many applications, here we illustrate its significance with a stock price prediction example. In this case study, we apply recursive LSTM models to historical Microsoft Corporation (MSFT) stock data. The process involves data preprocessing, model architecture design, and training for future stock price prediction.
The dataset, sourced from Yahoo Finance (1986-03-14 to 2022-10-07), undergoes normalization, noise removal, and transformation. The model architecture is tailored, specifying layers, neurons, optimizer, and loss function. Training involves feeding, weight adjustments, and accuracy evaluation on a test set.
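The preprocessing step can be sketched as follows: a price series is min-max normalized and sliced into fixed-length windows, each paired with the next value as the prediction target. The short synthetic series and window length below are illustrative assumptions, not the actual MSFT data.

```python
import numpy as np

def make_windows(series, window=3):
    """Pair each window of `window` past values with the next value as target."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

# min-max normalization, analogous to the scaling applied to the closing prices
prices = np.array([10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0])
norm = (prices - prices.min()) / (prices.max() - prices.min())

X, y = make_windows(norm, window=3)  # X: (4, 3) windows, y: (4,) targets
```

Normalizing before windowing keeps all model inputs on the [0, 1] scale, which stabilizes LSTM training; the scaling parameters must be kept to invert the transform when reporting predicted prices.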
The dataset is split into training, validation, and test sets, indicated by ‘q80’ and ‘q90.’ Figure 3 visually represents this split, with colors denoting percentages (70%, 15%, and 15%). The legend distinguishes each set in the plot.
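The chronological split can be sketched as below. Here ‘q80’ and ‘q90’ are taken to be the indices at 80% and 90% of the series length, an assumption suggested by their names; the exact proportions used in the study are those shown in Fig. 3.

```python
import numpy as np

n = 100                                # number of samples after windowing (illustrative)
idx = np.arange(n)                     # chronological sample indices

q80, q90 = int(n * 0.8), int(n * 0.9)  # assumed meaning of the 'q80'/'q90' markers
train, val, test = idx[:q80], idx[q80:q90], idx[q90:]
```

Note that the split is by position, not by random shuffling: with time series data, the validation and test sets must come strictly after the training period to avoid look-ahead leakage.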
This case study demonstrates a systematic approach to recursive LSTM models for stock price prediction, highlighting key steps from preprocessing to model evaluation. Figure 3 provides a concise visual of the dataset split, crucial for assessing model performance.
A plot comparing the model’s training predictions to the actual target values for the training set is provided, making it possible to visually assess the performance of the model on the training data. In Fig. 4, we have provided the model training results.
The trained model is then used to generate predictions on the testing data, and the predictions are flattened into a 1D array. Figure 5 provides the model validation results.
The test results are presented in Fig. 6.
The tool will generate a plot that compares the model’s predicted values to the true target values across all three data sets, making it possible to visually assess the model’s overall performance.
Results are presented in Figs. 7 and 8, showcasing composite and recursive prediction outcomes. The recursive prediction process generates predictions with the trained LSTM model on the validation and testing sets. Predicted values are stored in the ‘recursive predictions’ list and visually illustrated when plotted against the target dates. The method employs a recursive approach, updating predictions iteratively by replacing the last element in each window with the predicted value.
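The recursive scheme can be sketched as follows. The `mean_model` below is a hypothetical stand-in for the trained LSTM, and the sliding-window update (drop the oldest value, append the prediction) is one standard variant of the iterative window update described above.

```python
import numpy as np

def recursive_forecast(model, last_window, steps):
    """Roll the model forward, feeding each prediction back into the input window."""
    window = list(last_window)
    preds = []
    for _ in range(steps):
        y_hat = model(np.array(window))
        preds.append(y_hat)
        window = window[1:] + [y_hat]  # slide: drop oldest value, append prediction
    return preds

# hypothetical stand-in for the trained LSTM: predicts the mean of the window
mean_model = lambda w: float(w.mean())
preds = recursive_forecast(mean_model, [1.0, 2.0, 3.0], steps=4)
```

Because each step consumes the model's own previous output rather than observed data, errors compound with the horizon; this is why recursive forecasts are typically evaluated against held-out validation and test segments, as in Figs. 7 and 8.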
3.2 Applications to Support Transfer Learning
Developing layers, networks, and classifiers poses significant challenges within the field of machine learning. A substantial subbranch is dedicated to the “reuse” of developed classifiers, often involving the transfer of knowledge gained from training on one dataset to a new problem. These steps have shown promise in developing models in a cost-effective manner.
Over time, several transfer learning approaches have been developed and successfully applied to complex problems [45,46,47]. Building on [48], an algorithm can be designed for transfer learning with LSTM to analyze datasets from multiple sources, whether finance or climate change data. For example, researchers [49] used transfer learning with a weighted combination of the available predictors to guarantee convergence to the best weighted predictor, focusing on an online transfer learning framework for improved temperature prediction in residential buildings. Similarly, for flood management, researchers have proposed transfer learning models [50] for better forecasting. Another fascinating application is proposed in [51], where transfer learning and LSTM were combined in a bidirectional manner to address missing data problems in building energy datasets.
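The pre-train-then-fine-tune idea can be illustrated with a deliberately minimal NumPy sketch: a one-parameter model is trained on an abundant source task, and the learned weight initializes brief fine-tuning on a small related target task. All data, targets, and learning rates are toy assumptions; a real application would transfer LSTM layer weights in the same spirit.

```python
import numpy as np

rng = np.random.default_rng(1)

# source task: abundant data for y = 2x
X_src = rng.standard_normal(200)
y_src = 2.0 * X_src
w = 0.0
for _ in range(200):  # pretrain by gradient descent on the mean squared error
    w -= 0.01 * 2 * np.mean((w * X_src - y_src) * X_src)

# target task: only a handful of samples for the related mapping y = 2.2x
X_tgt = rng.standard_normal(5)
y_tgt = 2.2 * X_tgt
w_ft = w              # transfer: initialize from the pretrained weight
for _ in range(20):   # brief fine-tuning on the small target set
    w_ft -= 0.01 * 2 * np.mean((w_ft * X_tgt - y_tgt) * X_tgt)
```

Starting fine-tuning from the pretrained weight rather than from scratch means the few target samples only need to correct a small discrepancy, which is exactly the benefit transfer learning offers an LSTM on a limited dataset.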
For more complex situations, such as cross-domain knowledge transfer [52] and diverse data sources [53, 54], an improved algorithm can provide a robust framework for efficient analysis and model adaptation. A schematic description (Fig. 3) is presented below to illustrate the steps.
The methods reviewed and their implementations described in this work provide readers with insights to develop advanced algorithms.
4 Conclusions
Time series forecasting tools have advanced with the requirements of current research strategies and pathways. The challenges of time series are not limited to stochastic perturbations; the series are also greatly influenced by underlying sources and stressors. Smart programming tools can memorize patterns and trends and utilize them when forecasting the fate of the open problems under consideration. In this manuscript, we have provided a comprehensive overview of machine learning tools for time series data analysis. Transfer learning approaches have proved promising in this domain. We conclude that LSTM and transfer learning benefit each other. An LSTM network can be pre-trained on a large dataset for a specific task; its learned representations or weights can then be transferred and fine-tuned on a smaller dataset for a related task. This transfer of knowledge from the pre-trained LSTM to the target task can improve performance, especially when the target dataset is limited. Conversely, transfer learning gives LSTM networks a way to shift knowledge from one task to another, so the LSTM benefits from the general features learned in the source task. These tools can help not only with time series data in business and finance but also in other emerging research areas such as climate change and public health, fields that face significant challenges in data management and processing.
Data Availability
The authors declare that the data generated during simulations are based on optimized parametric values and the literature review. The data are available online in the Yahoo Finance database. The climate change studies reviewed in this manuscript are clearly acknowledged and cited.
Code Availability
All the codes used during this research can be accessed on request.
References
Shi Y (2022) Advances in big data analytics. In: Adv Big Data Anal
Shi Y et al (2011) Optimization based data mining: theory and applications. Springer, Berlin
Jiang Y et al (2022) Artificial intelligence to deal with the post COVID-19 fractal dynamics linked with economy. In: Fractals
Yu Z et al (2022) Explainability of neural network clustering in interpreting the COVID-19 emergency data. Fractals 30(05):2240122
Yu Z et al (2021) Forecasting the impact of environmental stresses on the frequent waves of COVID19. Nonlinear Dyn 106:1509–1523
Sohail A (2023) Genetic algorithms in the fields of artificial intelligence and data sciences. Ann Data Sci 10(4):1007–1018
Sohail A, Arif F (2020) Supervised and unsupervised algorithms for bioinformatics and data science. Prog Biophys Mol Biol 151:14–22
Sohail A, Ashiq U (2023) Quantum inspired improved AI computing for the sensors of cardiac mechano-biology. Sens Int 4:100212
Al-Utaibi KA et al (2022) Neural networks to understand the physics of oncological medical imaging. Biomed Eng Appl Basis Commun 34(06):2250036
Olson DL, Shi Y, Shi Y (2007) Introduction to business data mining, vol 10. McGraw-Hill/Irwin, New York
Yu Z, Sohail A (2024) Machine learning to explore the stochastic perturbations in revenue of pandemic-influenced small businesses. Nonlinear Dyn 112(2):1549–1558
Sohail A, Yu Z, Nutini A (2023) COVID-19 variants and transfer learning for the emerging stringency indices. Neur Proc Lett 55(3):2359–2368
Kamir E, Waldner F, Hochman Z (2020) Estimating wheat yields in Australia using climate records, satellite image time series and machine learning methods. ISPRS J Photogramm Remote Sens 160:124–135
Yu Z et al (2022) Hybrid algorithm for the classification of fractal designs and images. In: Fractals
Yari A et al (2019) Underlying factors affecting death due to flood in Iran: a qualitative content analysis. Int J Disaster Risk Reduct 40:101258
Dupuy J et al (2020) Climate change impact on future wildfire danger and activity in southern Europe: a review. Ann For Sci 77:1–24
Sadhukhan B, Chakraborty S, Mukherjee S (2022) Investigating the relationship between earthquake occurrences and climate change using RNN-based deep learning approach. Arab J Geosci 15(1):31
Cooper DH, Nagel J (2022) Lessons from the pandemic: climate change and COVID-19. Int J Sociol Soc Policy 42(3/4):332–347
Sohail A (2024) Transfer learning for bridging the gap between data sciences and the deep learning. Ann Data Sci 11(1):337–345
Bose I, Mahapatra RK (2001) Business data mining—a machine learning perspective. Inf Manag 39(3):211–225
Duan Y et al (2022) Assessing machine learning for forecasting economic risk: evidence from an expanded Chinese financial information set. Financ Res Lett 46:102273
Dowling M, Lucey B (2023) ChatGPT for (finance) research: the Bananarama conjecture. Financ Res Lett 53:103662
Bang J, Ryu D, Yu J (2023) ESG controversies and investor trading behavior in the Korean market. Finance Res Lett 103750
Hossain E et al (2019) Application of big data and machine learning in smart grid, and associated security concerns: a review. IEEE Access 7:13960–13988
Tien JM (2017) Internet of things, real-time decision making, and artificial intelligence. Ann Data Sci 4:149–178
Shi Y et al (2024) Sparse optimization guided pruning for neural networks. Neurocomputing 574:127280
Shepperd M, Cartwright M (2001) Predicting with sparse data. IEEE Trans Softw Eng 27(11):987–998
Parey S, Hoang TTH, Dacunha-Castelle D (2014) Validation of a stochastic temperature generator focusing on extremes, and an example of use for climate change. Clim Res 59(1):61–75
Franzke CLE et al (2015) Stochastic climate theory and modeling. Wiley Interdiscip Rev Clim Change 6(1):63–78
Ocana V, Zorita E, Heimbach P (2016) Stochastic secular trends in sea level rise. J Geophys Res Oceans 121(4):2183–2202
Roberto Tomasicchio G et al (2018) A direct scaling analysis for the sea level rise. Stoch Env Res Risk Assess 32:3397–3408
Karydas C, Xepapadeas A (2019) Pricing climate change risks: CAPM with rare disasters and stochastic probabilities. In: CER-ETH working paper series working paper 19, p 311
Benz E, Trück S (2009) Modeling the price dynamics of CO2 emission allowances. Energy Econ 31(1):4–15
Graves A et al (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd international conference on Machine learning, pp 369–376
Camacho C et al (2009) BLAST+: architecture and applications. BMC Bioinf 10:1–9
Brockwell PJ, Davis RA (1991) Time series: theory and methods. Springer, Berlin
Fuller WA (2009) Introduction to statistical time series. Wiley, New York
Ishikawa T, Nakamura U (2012) Landmark selection in the environment: relationships with object characteristics and sense of direction. Spat Cognit Comput 12(1):1–22
Perry JN et al (2002) Illustrations and guidelines for selecting statistical methods for quantifying spatial pattern in ecological data. Ecography 25(5):578–600
Hochreiter S, Schmidhuber J (1996) LSTM can solve hard long time lag problems. In: Advances in neural information processing systems 9
Chimmula VKR, Zhang L (2020) Time series forecasting of COVID-19 transmission in Canada using LSTM networks. Chaos Solitons Fractals 135:109864
Siami-Namini S, Namin AS (2018) Forecasting economics and financial time series: ARIMA vs. LSTM. arXiv:1803.06386
Borovkova S, Tsiamas I (2019) An ensemble of LSTM neural networks for high-frequency stock market classification. J Forecast 38(6):600–619
Song X et al (2020) Time-series well performance prediction based on long short-term memory (LSTM) neural network model. J Petrol Sci Eng 186:106682
Yang Q et al (2020) Transfer learning. Cambridge University Press, Cambridge
Egan TM, Yang B, Bartlett KR (2004) The effects of organizational learning culture and job satisfaction on motivation to transfer learning and turnover intention. Hum Resour Dev Q 15(3):279–301
Lu J et al (2015) Transfer learning using computational intelligence: a survey. Knowl-Based Syst 80:14–23
Giel A, Diaz R (2015) Recurrent neural networks and transfer learning for action recognition
Grubinger T, Chasparis GC, Natschläger T (2017) Generalized online transfer learning for climate control in residential buildings. Energy Build 139:63–71
Zhao G et al (2021) Improving urban flood susceptibility mapping using transfer learning. J Hydrol 602:126777
Ma J et al (2020) A bi-directional missing data imputation scheme based on LSTM and transfer learning for building energy data. Energy Build 216:109941
Gartzke E, Lindsay JR (2019) Cross-domain deterrence: strategy in an era of complexity. Oxford University Press, Oxford
Rindfuss RR et al (2008) Land use change: complexity and comparisons. J Land Use Sci 3(1):1–10
Wei Y et al (2018) A review of data-driven approaches for prediction and classification of building energy consumption. Renew Sustain Energy Rev 82:1027–1047
Gifford R, Kormos C, McIntyre A (2011) Behavioral dimensions of climate change: drivers, responses, barriers, and interventions. Wiley Interdiscip Rev Clim Change 2(6):801–827
Acknowledgements
The authors would like to acknowledge the data repositories of Yahoo Finance and the financial stock price data of Microsoft Corporation (MSFT).
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions.
Author information
Authors and Affiliations
Contributions
MT, SA, AS, YZ and XJ equally contributed to the manuscript. MT and SA did the programming, AS did the supervision, YZ and XJ did the analysis and literature review.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Tahir, M., Ali, S., Sohail, A. et al. Unlocking Online Insights: LSTM Exploration and Transfer Learning Prospects. Ann. Data. Sci. (2024). https://doi.org/10.1007/s40745-024-00551-2
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s40745-024-00551-2