1 Introduction

Stock price time series are highly unpredictable: they exhibit irregular movements and possess dynamic, highly volatile, and complex nonlinear behavior (Fama 1970). The economic growth of countries is affected by various financial activities, and the stock market plays a critical role in a country's domestic as well as global economic conditions (Lin et al. 2012). The stock market is influenced by several factors such as industrial growth, political instability, government policies, global economic position, the psychology of investors, and many others (Menkhoff 1997). A stock price time series, defined as \(X = \{x_{t} \in \mathbb{R} \mid t = 1, 2, 3, \ldots, N\}\), is the set of prices collected at regular time intervals \(t\) (Pal and Kar 2022). The key focus of stock price time series forecasting is to forecast future values of the stock based on past patterns present in the series itself. According to National Securities Depository Limited (NSDL) (NSDL 2021) and Central Depository Services (India) Limited (CDSL) (CDSL 2021), only approximately 5.2% of the Indian population invests in the stock market, owing to fear of loss, lack of knowledge, poor services from advisories and brokers, and, most importantly, the uncertainty that drives the market. Hence, there is a growing need to develop an intelligent system to model the uncertainty and nonlinearity in stock price time series. Forecasting stock prices is thus an important task for reducing investor risk as well as for the economic development of the country.

In the past few years, computational intelligence (CI) paradigms, namely artificial neural networks (ANNs) (Shahvaroughi Farahani and Razavi Hajiagha 2021), genetic algorithm (GA), fuzzy logic (FL), and numerous nature-inspired optimization techniques (Mahajan et al. 2021, 2022a) such as particle swarm optimization (PSO), differential evolution (DE), artificial bee colony (ABC), and bacterial foraging optimization (BFO), have been combined to develop hybrid models for share price forecasting (Kumar et al. 2021a). CI is a growing computing approach that mimics the human brain's ability to generalize and memorize under imprecision and uncertainty to solve difficult problems (Ibrahim 2016; Mahajan et al. 2022b). ANNs can approximate the nonlinear relations present in data without making any assumptions, and they are therefore successfully used for stock price forecasting (Guresen et al. 2011). The most widely utilized artificial neural network techniques for share price forecasting are recurrent neural networks (RNN) and feed-forward neural networks (FFNN), owing to their ability to model nonlinearity and uncertainty (Atsalakis and Valavanis 2009; Liu and Wang 2012). Gong et al. (2018) applied a multiobjective evolutionary algorithm to obtain the optimal architecture of an RNN for time series classification and estimation. Yan (2012) used a combination of generalized regression neural network models to automate the search for ANN parameters for time series forecasting. Gao et al. (2018) used various nature-inspired methods, including PSO and GA, to train the dendritic neuron model to solve multiple tasks. In recent studies, however, deep learning models have been widely used for stock price time series forecasting utilizing historical stock prices and technical indicators (Li and Bastos 2020). Convolutional neural networks (CNN) (Chen et al. 2020), recurrent neural networks (RNN) (Berradi and Lazaar 2019), and deep belief networks (DBN) (Zhang and Ci 2020) are three main paradigms of deep learning. Cao and Wang (2019) combined a CNN with a support vector machine (SVM) to foresee the market trend and found that deep neural networks can obtain good forecasting results in the financial time series prediction domain. Balaji et al. (2018) designed fourteen deep learning models based on CNN, long short-term memory (LSTM), gated recurrent unit (GRU), and extreme learning machine (ELM) to forecast the 1-day-ahead and 4-days-ahead close prices of all the stocks in S&P BSE-BANKEX and concluded that deep learning models are effective in producing highly accurate forecasts. Rather et al. (2015) created a hybrid method by combining linear models, namely the auto-regressive moving average and exponential smoothing, with a nonlinear model, viz. RNN, to predict stock returns; the parameters of their model were optimized using GA. Lin et al. (2018) created a hybrid technique by merging a DBN with a derivative-free optimization technique, namely negative-correlation search (NCS), to optimize the hyperparameters of the DBN for equity price forecasting, obtaining efficient results compared with traditional neural networks.

However, in the last few years, the long short-term memory (LSTM) neural network has become a widely accepted method for time series forecasting because it can process highly nonlinear and uncertain data. LSTM is a class of recurrent neural networks with memory cells and is extensively used for solving time series forecasting problems. Wang et al. (2018) used the LSTM network to forecast the Chinese stock market and attempted to improve the traditional gradient descent algorithms to enhance the learning process in LSTM. Fischer and Krauss (2018) deployed the LSTM model to predict the trend in various stocks of the S&P 500 index, compared its forecasting accuracy with three memory-free models, namely random forest (RF), deep neural network (DNN), and logistic regression (LR), and demonstrated the superior performance of the LSTM. Pang et al. (2020) presented the concept of a stock vector and developed two hybrid models, a deep LSTM network with an embedded layer and an LSTM with an automatic encoder, to vectorize the input data fed to the LSTM and achieve better forecasting results. Bao et al. (2017) presented a novel multi-stage deep learning approach for forecasting stock prices: in the first stage, the stock price time series is filtered using the wavelet transform (WT) to remove noise; in the second, high-level features are extracted using a stacked autoencoder (SAE); and finally, LSTM is applied to forecast the 1-day-ahead close price. The authors showed that the proposed model performs significantly better than other approaches such as RNN, LSTM, and WT-LSTM.

Although the LSTM shows good performance in time series forecasting problems, the biggest challenge in the LSTM neural network is tuning its hyperparameters, namely the number of hidden layers, the number of hidden nodes in each layer, the batch size, the number of epochs, and the learning rate, and optimizing the connection weights as well as the biases of the network. The automatic search for neural network architectures has been attempted by numerous authors (Elsken et al. 2019). Sakshi and Kumar (2019) used genetic algorithms to evolve the parameters of an ANN and found that the suggested model has a short training period, rapid convergence, and a greater success rate. Peng et al. (2018) applied differential evolution (DE) to obtain the optimal values of various hyperparameters, such as window length, number of hidden nodes, batch size, and number of epochs, in LSTM for electricity price prediction. Liu and Liu (2019) incorporated a modified GA for selecting the optimal feature subset and hidden neurons of LSTM neural networks for house price prediction in China.

In this study, one of the challenging problems considered is optimizing the initial parameters (weights and biases) of the LSTM and fully connected layer (FCL) in a deep neural network (DNN) model for stock price forecasting. Gradient descent (GD) (Ruder 2016), stochastic gradient descent (SGD) (Schaul et al. 2013), stochastic gradient descent with momentum (SGDM) (Yazan and Talu 2017), root-mean-square propagation (RMSProp) (Hinton et al. 2012), and adaptive moment estimation (Adam) (Kingma and Ba 2014) are the most popular algorithms for training LSTM neural networks. Among these learning algorithms, the Adam optimizer is the most frequently used for training ANNs because it combines the advantages of two optimization techniques, viz. SGDM and RMSProp. Despite its good performance, the Adam optimizer has two limitations: firstly, if the objective function is non-differentiable or multi-modal, it might become stuck in local minima (Bock and Weiß 2019); secondly, it suffers from the vanishing/exploding gradient problem due to the use of the sigmoid activation function in LSTM (Roodschild et al. 2020). This weakness of the Adam optimizer, together with the ability of nature-inspired optimization methods to handle continuous optimization problems, has motivated us to use them to find the optimal set of initial weights for the DNN.

Because Adam converges locally, the optimized weights and biases of an ANN are heavily reliant on their initial values (Roodschild et al. 2020). If the starting values are situated in a local region of the search space, the network may become stuck there. This issue can be addressed by using global search strategies to train ANNs. Accordingly, to reduce the search space, PSO is first used to find optimal values for the initial weights and biases of the LSTM network and fully connected layer in the DNN model, and the resulting weights and biases are then used as the LSTM neural network's starting parameters.

PSO is a swarm intelligence-based stochastic optimization method inspired by the social behavior of animals. In PSO, the inertia coefficient maintains the balance between the exploration and exploitation properties and avoids premature convergence (Nickabadi et al. 2011). Eberhart and Shi (2000) introduced a time-varying inertia coefficient to enhance the exploration and exploitation features of PSO and obtained efficient results. Panigrahi et al. (2008) presented an adaptive PSO in which distinct inertia coefficients are given to particles based on their rank. In this study, we propose an Adaptive PSO that improves the inertia coefficient of standard PSO to avoid becoming stuck in a local optimum during iteration. We assign an inertia coefficient to each particle during iteration based on the velocities of the global best and personal best particles in the swarm.

In this study, we propose a swarm intelligence-based hybrid deep learning approach for short-term (1-day, 1-week, and 2-weeks ahead) and long-term (4-weeks, 6-weeks, and 12-weeks ahead) forecasting of the price of equity market indices using technical indicators. The proposed technique merges the Adaptive PSO and the Adam optimizer for training the LSTM neural network. First, we employ the adaptive PSO to determine the initial weights and biases of the LSTM units and fully connected layer (FCL) in the DNN model. The LSTM and FCL are then initialized with the weights and biases obtained by adaptive PSO, and the configured model is further trained by the Adam optimizer. The proposed model is named Adaptive PSO-LSTM. This article also compares the model's forecasting performance with that of conventional LSTM, the Elman neural network (ENN) (Ren et al. 2018), and a hybrid approach created by incorporating evolutionary techniques such as GA with LSTM. The proposed method's forecasting performance is tested by predicting the close price of three stock market indices: S&P 500, Sensex, and Nifty 50.

The following are the key contributions of this work:

  1. (1)

    Since the only data available in the stock market are the daily open, high, low, and close prices and the number of shares traded, we developed a pool of technical indicators.

  2. (2)

    To handle the premature convergence issue of the standard PSO, we propose an Adaptive PSO that improves the inertia coefficient, enhancing the global search capability of the particles, preventing them from being trapped in local optima, and balancing the exploration and exploitation properties of PSO.

  3. (3)

    To deal with the issues of overfitting/underfitting and vanishing/exploding gradients in the LSTM neural network, we apply the Adaptive PSO to obtain optimal values for the initial input weights, recurrent weights, and biases of the LSTM and the input weights and biases of the FCL in the DNN model, in order to achieve high forecasting accuracy in the domain of stock price forecasting. To the best of our knowledge, this is the first study in which PSO is used to determine the initial weights and biases of the LSTM and FCL in a DNN.

The remainder of this article is structured as follows: Sect. 2 provides an outline of previous studies. Section 3 describes the framework for developing the proposed technique. The evaluation metrics and experimental setting are described in Sect. 4. The experimental findings and analyses are presented in Sect. 5. The conclusion is given in Sect. 6.

2 Related work

Stock price time series forecasting is a challenging area of research due to the various unknown factors that influence stock price movements. In the past several years, machine learning (Ulke et al. 2018; Sands et al. 2015; Kumar et al. 2020), deep learning (Livieris et al. 2020; Long et al. 2020; Nabipour et al. 2020; Ding and Qin 2020; Chong et al. 2017; Baldominos et al. 2020), and metaheuristic optimization (Mahajan et al. 2022c, d)-based approaches have gained popularity in the time series forecasting and optimization domains. Kumar et al. (2021a) presented a systematic review of computational intelligence-based approaches for stock price forecasting and observed that nature-inspired hybrid neural network models are extensively accepted approaches. Li and Bastos (2020) conducted a comprehensive survey focusing on deep learning and technical analysis-based models for stock price forecasting and showed that LSTM neural network-based hybrid models are the most frequently applied approaches in the time series forecasting domain. Van Houdt et al. (2020) reviewed the application of LSTM neural networks in various domains, including time series prediction, and noted that the LSTM model is well suited for processing temporal sequences and for dealing with time series prediction problems. A comprehensive review of hybrid models based on nature-inspired optimization for adjusting the hyperparameters and topology of deep neural networks was recently presented by Darwish et al. (2020), who stated that, in the area of data analytics, hybrid ANN models are adaptable and effective.

In very recent work, Jamous et al. (2021) developed a novel PSO called PSO with center of gravity (PSOCoG) to choose the optimal hyperparameters of an ANN and evaluated the proposed model for stock price forecasting under the effect of the COVID-19 pandemic. Kamalov (2020) employed two deep neural network models, CNN and LSTM, one shallow neural network, viz. the multilayer perceptron (MLP), and random forest (RF) to forecast significant changes in the close price of the stock market index. Experimental results confirmed that the deep learning models achieved a higher degree of accuracy than the machine learning models under consideration. Ji et al. (2021) designed a novel improved PSO and combined it with LSTM, optimizing the hyperparameters of LSTM to create a hybrid DNN model for forecasting stock prices. They improved the inertia weight in PSO using a nonlinear approach and then applied the improved PSO (IPSO) to choose the optimal values for the number of epochs, the learning rate, and the number of nodes in two hidden layers. Peng et al. (2020) applied the fruit fly optimization algorithm (FOA) to select the optimal hyperparameters of LSTM for time series forecasting; the hyperparameters considered were the window length of the time series, the batch size, the number of hidden neurons, and the number of training iterations. Huang et al. (2021) proposed a hybrid approach based on variational mode decomposition (VMD), LSTM, GA, and a back-propagation neural network (BPNN) in the domain of financial time series forecasting: they employed VMD to decompose the time series into short-term and long-term trends, applied GA to tune the parameters of VMD to minimize the loss, and then fed the decomposed data as input to the LSTM. Liu and Long (2020) developed a hybrid deep learning framework combining the empirical wavelet transform (EWT) and outlier-robust extreme learning machine (ORELM) for pre- and post-processing of financial time series, with LSTM as the forecasting model, a dropout strategy for optimizing the training process, and PSO for determining the optimal hyperparameters, such as the number of hidden neurons, the number of training epochs, and the learning rate of the LSTM. Chung and Shin (2018) integrated GA with the LSTM network to develop a hybrid deep learning approach that determines the optimal window length and architecture of LSTM for stock market prediction and showed that the proposed model outperforms the benchmark models.

Kumar et al. (2021d) integrated sentiment analysis and technical analysis and developed a hybrid model by merging ABC with LSTM for forecasting time series data. The ABC is employed to tune six parameters of LSTM, namely window length, learning rate, number of hidden units, number of epochs, batch size, and dropout rate, and obtained enhanced forecasting accuracy. Hu et al. (2020) proposed a hybrid model that predicts two major stock indexes by merging an extreme learning machine (ELM) with improved Harris hawks optimization (IHHO); IHHO is utilized to find the best values for the weights and biases of the ELM. Sahoo and Mohanty (2020) used a hybrid technique that combines an ANN with the grey wolf optimization (GWO) approach, employing GWO to fine-tune the parameters of the ANN for forecasting stock prices over different periods. Kumar et al. (2021c) presented a hybrid evolutionary ANN model aggregating principal component analysis (PCA), PSO, and a feed-forward neural network (FFNN) to foresee the close price of stock indices using 20 indicators; PCA is utilized for dimensionality reduction, PSO is applied to determine the optimal values of the initial weights and biases of the FFNN, and promising results were obtained.

ANN and nature-inspired algorithm (NIA)-based hybrid models have proven to be promising techniques for stock price time series forecasting. From the literature, it is evident that most researchers have attempted to optimize various hyperparameters of LSTM, such as time window length, number of hidden layers, batch size, number of hidden neurons, learning rate, dropout rate, and number of epochs. To the best of our knowledge, none of these studies has considered the optimization of the initial weights and biases of the LSTM neural network. Hence, we attempt to optimize the initial weights and biases of the LSTM and FCL in the DNN model using adaptive PSO for stock price forecasting.

3 Methodology

To predict stock prices, this article presents a deep learning framework comprising four phases to develop a PSO-based hybrid deep learning model. The first phase consists of calculating technical indicators from daily stock prices and data pre-processing, which uses the min–max normalization approach to map the technical indicators to a narrow range. In the second phase, PSO is utilized to obtain the initial weights and biases of the LSTM and FCL in the DNN. In the following phase, we assign the optimal weights and biases acquired in the preceding phase to the DNN and train the LSTM with the Adam optimizer. In the final phase, we analyze the forecasting efficiency of the model based on five performance measures and then output the forecasted prices. The general framework of the proposed technique is depicted in Fig. 1. Each phase is described in detail in the subsequent sections.

  1. A.

    Technical indicators and pre-processing

Fig. 1
figure 1

The general framework of proposed work

Initially, a pool of technical indicators is created using daily OHLC stock market data, which includes the open (OP) price, high (HI) price, low (LO) price, and closing (CL) price, as well as the volume (V). The technical indicators considered in this study are selected from prior research by Kumar et al. (2021b, c). Table 1 provides an overview of the mathematical formulations of the technical indicators.

  1. B.

    Data normalization

Table 1 Technical indicators

To convert the data values into a narrow range, we use the Min–Max normalization technique. Let \({X}_{\mathrm{min}}\) and \({X}_{\mathrm{max}}\) be the lowest and highest values of a feature \(X\). This method then maps a value \(x\) of \(X\) to \({x}^{*}\), from [\({X}_{\mathrm{min}}, {X}_{\mathrm{max}}\)] to a small range [\({X}_{\mathrm{min}}^{*},{X}_{\mathrm{max}}^{*}\)], by Eq. (1):

$${x}^{*}=\left(\frac{x-{X}_{\mathrm{min}}}{{X}_{\mathrm{max}}-{X}_{\mathrm{min}}}\right)\left({X}_{\mathrm{max}}^{*}-{X}_{\mathrm{min}}^{*}\right)+{X}_{\mathrm{min}}^{*}$$
(1)
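For illustration, a minimal Python/NumPy sketch of Eq. (1) is given below; the function name and the default target range [−1, 1] (the range used later in Sect. 4) are our own choices.

```python
import numpy as np

def min_max_scale(x, new_min=-1.0, new_max=1.0):
    """Map a feature vector x from [x.min(), x.max()] to [new_min, new_max] (Eq. 1)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min) * (new_max - new_min) + new_min

# Example: scale a short closing-price series to [-1, 1]
prices = np.array([101.2, 103.5, 99.8, 104.1])
scaled = min_max_scale(prices)  # all values now lie in [-1, 1]
```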
  1. C.

    Long short-term memory (LSTM)

The LSTM neural network, developed by Hochreiter and Schmidhuber (1997a), is a type of recurrent neural network (RNN) with nonlinear gated units and memory cells that enable it to capture long-term dependencies in sequential data (Greff et al. 2016). The LSTM can deal with the vanishing/exploding gradient problems that typically occur in RNNs and restrict their ability to capture long-term dependencies in time series data (Hochreiter and Schmidhuber 1997b). The LSTM overcomes this problem by selectively reading, writing, and forgetting information so that only important information remains in the memory cells.

The LSTM network is made up of a memory cell and three gates, namely the input gate, forget gate, and output gate. The three gates manage the transfer of information through the memory cell by deciding which information to forget, retain, and pass (output) to the next state. Figure 2 displays the typical structure of an LSTM network with a single hidden layer. The core part of the LSTM unit contained in the hidden layer(s) is the cell state.

Fig. 2
figure 2

Long short-term memory (LSTM) neural network

Let \({X}_{t}=[{x}_{t}^{1},{x}_{t}^{2},{x}_{t}^{3},\ldots ,{x}_{t}^{N}]\) be the \(N\) inputs, \({h}_{t}=[{h}_{t}^{1},{h}_{t}^{2},{h}_{t}^{3},\ldots ,{h}_{t}^{K}]\) be the \(K\) hidden units or hidden states, and \({C}_{t}=[{c}_{t}^{1},{c}_{t}^{2},{c}_{t}^{3},\ldots ,{c}_{t}^{K}]\) be the cell state of the LSTM network at time \(t\), and let \({f}_{t}\), \({i}_{t}\), and \({O}_{t}\) represent the forget gate, input gate, and output gate, respectively. At each timestamp \(t\), the input \({X}_{t}\) along with the previous hidden state \({h}_{t-1}\) is presented to the three gates to compute the next hidden state \({h}_{t}\) and to update the previous cell state \({C}_{t-1}\) in order to compute the new cell state \({C}_{t}\). The mathematical formulation of the various operations performed in LSTM is given below:

  1. (1)

    In the first stage, the forget gate \({f}_{t}\) in the LSTM layer decides which information is to be discarded from the prior cell state \({C}_{t-1}\), which can be calculated as:

    $${f}_{t}=\sigma ({W}_{f}{X}_{t}+{U}_{f}{h}_{t-1}+{b}_{f})$$
    (2)
  2. (2)

    In the next stage, the LSTM units specify which information is to be included in the cell state \({C}_{t}\). This process involves two operations: first, the candidate vector \({\widehat{C}}_{t}\) captures all the information from the last state \({h}_{t-1}\) and the current input \({X}_{t}\); second, the input gate \({i}_{t}\) selectively reads the information to be inserted in the cell state \({C}_{t}\). The two operations are computed as:

    $${\widehat{C}}_{t}=\mathrm{tanh}({W}_{c}{X}_{t}+{U}_{c}{h}_{t-1}+{b}_{c})$$
    (3)
    $${i}_{t}=\sigma ({W}_{i}{X}_{t}+{U}_{i}{h}_{t-1}+{b}_{i})$$
    (4)
  3. (3)

    In the third step, a new cell state \({C}_{t}\) is computed on the basis of results of previous steps as:

    $${C}_{t}={f}_{t}\times {C}_{t-1}+{i}_{t}\times {\widehat{C}}_{t}$$
    (5)
  4. (4)

    In the final step, the output gate \({O}_{t}\) determines how much information is to be passed on to compute the output \({h}_{t}\) at the next timestamp:

    $${O}_{t}=\sigma ({W}_{O}{X}_{t}+{U}_{O}{h}_{t-1}+{b}_{O})$$
    (6)
    $${h}_{t}={O}_{t}\times \mathrm{tanh}\left({C}_{t}\right)$$
    (7)

where \({W}_{f}\), \({W}_{i}\), \({W}_{C}\), \({W}_{O}\) are the weight matrices corresponding to the input \({X}_{t}\); \({U}_{f}\), \({U}_{i}\), \({U}_{C}\), \({U}_{O}\) are the recurrent weight matrices associated with the previous hidden state \({h}_{t-1}\); and \({b}_{f}\), \({b}_{i}\), \({b}_{C}\), and \({b}_{O}\) are the bias vectors for the forget gate, input gate, candidate vector, and output gate, respectively. \(\sigma \left(x\right)=1/\left(1+{e}^{-x}\right)\) is the log-sigmoid activation function and \(\mathrm{tanh}(x)=({e}^{x}-{e}^{-x})/({e}^{x}+{e}^{-x})\) is the hyperbolic tangent activation function. The log-sigmoid activation function generates values in the range 0 to 1, describing how much of each value passes through: the value “0” specifies that nothing passes through, the value “1” specifies that everything passes through, and a value between 0 and 1 determines what fraction passes through.
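To make Eqs. (2)–(7) concrete, the following is a minimal NumPy sketch of one LSTM time step; the dictionary-based parameter layout is our own convention, not part of the original model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step implementing Eqs. (2)-(7). W, U, b are dicts of
    gate parameters, e.g. W['f'] is the (K x N) input-weight matrix of the
    forget gate."""
    f_t   = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # Eq. (2): forget gate
    c_hat = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])   # Eq. (3): candidate vector
    i_t   = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # Eq. (4): input gate
    c_t   = f_t * c_prev + i_t * c_hat                         # Eq. (5): new cell state
    o_t   = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # Eq. (6): output gate
    h_t   = o_t * np.tanh(c_t)                                 # Eq. (7): new hidden state
    return h_t, c_t
```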

  1. D.

    Proposed adaptive PSO-LSTM approach

In this work, our goal is to optimize the weights and bias of LSTM by applying Adaptive particle swarm optimization (PSO).

Kennedy and Eberhart (1995) introduced PSO, a population-based stochastic optimization approach inspired by the social behavior of fish schooling and bird flocking, for addressing optimization problems in the continuous domain. The search for the best solution in PSO begins by assigning the population (swarm) random solutions. In the D-dimensional search area, each particle represents a potential solution. Every particle in the swarm is evaluated and updated repeatedly during the search process. Guided by its own best experience and the intelligence of the entire swarm, each particle travels to a new best position by adjusting its velocity.

When designing an LSTM, various hyperparameters need to be considered, and there are several ways to develop a DNN. In this study, we consider a basic LSTM with one hidden layer connected to the input layer and a fully connected layer.

PSO is utilized to evolve the initial weights and biases of forget gate, input gate, candidate solution vector, and output gate of LSTM as well as weights and bias between fully connected layer and output layer. The main framework of the proposed adaptive PSO-based hybrid LSTM model is shown in Fig. 3. PSO comprises many phases, as illustrated in the diagram: encoding, initialization, evaluation, and update, which are explained as follows:

  1. (1)

    Encoding

Fig. 3
figure 3

Flowchart of adaptive PSO-based hybrid LSTM neural network

In swarm intelligence-based methods, the most essential step is the representation of potential solutions (particles). In this work, potential solutions are represented by encoding the weights and biases of the LSTM and FCL as swarm particles. The learnable weights in the LSTM network are the input weights (\(\mathrm{IW}\)), recurrent weights (\(\mathrm{RW}\)), and bias (\(B\)). The matrices \(\mathrm{IW}=[{W}_{i},{W}_{f},{W}_{C},{W}_{O}]\), \(\mathrm{RW}=[{U}_{i},{U}_{f},{U}_{C},{U}_{O}]\), and \(B=[{b}_{i},{b}_{f},{b}_{C},{b}_{O}]\) are the concatenations of the input gate, forget gate, cell candidate, and output gate parameters, respectively. The weight matrix “W” and bias “b” represent the parameters of the FCL. Figure 4 shows how each particle (potential solution) in the PSO denotes the parameters of the LSTM and FCL to be optimized.
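A minimal sketch of this encoding in Python follows; the helper names and the flat concatenation order (IW, RW, B, W, b) are our own illustrative choices, consistent with Fig. 4.

```python
import numpy as np

def encode_particle(IW, RW, B, W, b):
    """Flatten the LSTM parameters (IW, RW, B) and FCL parameters (W, b)
    into a single 1-D particle vector."""
    return np.concatenate([m.ravel() for m in (IW, RW, B, W, b)])

def decode_particle(p, N, K, O):
    """Recover the parameter matrices from a particle vector:
    IW is (4K x N), RW is (4K x K), B is (4K,), W is (O x K), b is (O,)."""
    sizes  = [4*K*N, 4*K*K, 4*K, O*K, O]
    shapes = [(4*K, N), (4*K, K), (4*K,), (O, K), (O,)]
    parts, start = [], 0
    for size, shape in zip(sizes, shapes):
        parts.append(p[start:start + size].reshape(shape))
        start += size
    return parts  # [IW, RW, B, W, b]
```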

  1. (2)

    Initialization

Fig. 4
figure 4

PSO particle (solution) representation

In PSO, the dimensionality \(D\) of the search region is equal to the number of learnable parameters (weights and biases) in the LSTM and FCL to be tuned, given by \(P=4*(K\left(K+N\right)+K)\), where \(K\) is the number of hidden units and \(N\) is the total number of input variables, plus the number of parameters in the FCL, \(Q=O*K+O\), where \(O\) is the number of nodes in the output layer. Thus, the total number of parameters to be optimized is \(D=P+Q\). The input weights (\(\mathrm{IW}\)) matrix in the particle is initialized with the Glorot initializer (Glorot and Bengio 2010), which samples the weights from the uniform distribution with mean zero and variance \(2/(N+4K)\), i.e., in the range \([-\sqrt{6/(N+4K)}, \sqrt{6/(N+4K)}]\). The recurrent weights (\(\mathrm{RW}\)) and weight matrix (\(W\)) in the particle are initialized with the orthogonal matrix obtained by QR decomposition of a random matrix drawn from the standard normal distribution (Saxe et al. 2013). Finally, all the bias values are initialized to zero.
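The initialization just described can be sketched as follows; the sign-correction step in the orthogonal initializer and the function names are our own, and the exact layout used by the authors' MATLAB implementation may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(K, N):
    """Glorot initializer for the LSTM input weights IW (4K x N):
    uniform samples in [-sqrt(6/(N + 4K)), sqrt(6/(N + 4K))]."""
    limit = np.sqrt(6.0 / (N + 4 * K))
    return rng.uniform(-limit, limit, size=(4 * K, N))

def orthogonal(rows, cols):
    """Orthogonal initializer (Saxe et al. 2013): Q factor of the QR
    decomposition of a standard-normal random matrix."""
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    q = q * np.sign(np.diag(r))   # fix column signs
    return (q.T if rows < cols else q)[:rows, :cols]

K, N, O = 100, 20, 1              # hidden units, inputs, outputs (Sect. 4)
IW = glorot_uniform(K, N)         # input weights
RW = orthogonal(4 * K, K)         # recurrent weights
B  = np.zeros(4 * K)              # all LSTM biases start at zero
W  = orthogonal(O, K)             # FCL weights
b  = np.zeros(O)                  # FCL bias
```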

Initially, each particle’s velocity \({V}_{0}^{i}\) is assigned a zero value. The swarm or population size (S), the maximum number of iterations (maxGen), the personal coefficient (\({k}_{1}\)), the social coefficient (\({k}_{2}\)), and the inertia coefficient (\(\omega \)) are parameters of PSO that need to be set appropriately before execution. The inertia coefficient \(\omega \) (Xiong et al. 2015), which varies linearly from \({\omega }_{\mathrm{max}}\) to \({\omega }_{\mathrm{min}}\) over time, is used in traditional PSO for comparison with the adaptive PSO proposed in this study and is defined as:

$$\omega ={\omega }_{\mathrm{max}}-\frac{({\omega }_{\mathrm{max}}-{\omega }_{\mathrm{min}})t}{T}$$
(8)

where \(t\) and \(T\) represent the current iteration and maximum iterations, respectively.

  1. (3)

    Evaluation

The efficiency of every solution in the population is determined by calculating the cost function, i.e., the mean squared error between the real and predicted values, as described in Fig. 5. Given the data samples \(D={\left\{{X}_{i},{T}_{i}\right\}}_{i=1}^{N}\), where \({T}_{i}\) is the target value corresponding to the input feature \({X}_{i}\), the fitness (cost) function is given as follows:

$$f(\mathrm{IW},\mathrm{RW},B,W,b,D)=\frac{1}{N}\sum\limits_{i=1}^{N}{({T}_{i}-{Y}_{i})}^{2}$$
(9)

where \(N\) is the sample size and \({Y}_{i}\) is the forecasted value for input \({X}_{i}\), obtained from the output layer of the DNN after assigning the input weights (\(\mathrm{IW}\)), recurrent weights (\(\mathrm{RW}\)), and bias (\(B\)) generated by the PSO at iteration \(t\) to the LSTM, and the weights (\(W\)) and bias (\(b\)) to the FCL.
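Putting the pieces together, the fitness evaluation of Eq. (9) might look like the following sketch, which reuses the hypothetical decode_particle() and lstm_step() helpers from the earlier snippets and assumes one prediction per time step (the paper's exact windowing may differ).

```python
import numpy as np

def fitness(particle, X, T, N_in, K, O):
    """MSE between targets T and the network output when the particle's
    weights are loaded into the LSTM + FCL (Eq. 9)."""
    IW, RW, B, W, b = decode_particle(particle, N_in, K, O)
    # split the concatenated gate parameters in the order [i, f, c, o]
    Wd = {g: IW[j*K:(j+1)*K] for j, g in enumerate('ifco')}
    Ud = {g: RW[j*K:(j+1)*K] for j, g in enumerate('ifco')}
    bd = {g:  B[j*K:(j+1)*K] for j, g in enumerate('ifco')}
    h, c, preds = np.zeros(K), np.zeros(K), []
    for x_t in X:                    # X: sequence of input feature vectors
        h, c = lstm_step(x_t, h, c, Wd, Ud, bd)
        preds.append(W @ h + b)      # fully connected output layer
    Y = np.array(preds).ravel()
    return np.mean((T - Y) ** 2)     # Eq. (9)
```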

Fig. 5
figure 5

PSO-LSTM iterative process for evaluating the fitness of network parameters of LSTM

  1. (4)

    Update

The best solution obtained by each particle is known as the personal best (\({Pbest}\)), and the optimal position occupied by any particle within the swarm is known as the global best (\({Gbest}\)); both are updated at every iteration on the basis of each particle's fitness. Following \({Pbest}\) and \({Gbest}\), the positions and velocities of the particles are modified during each iteration.

Let \({V}_{t}^{i}\) and \({P}_{t}^{i}\) denote the velocity and position of the \(i\)th particle at epoch \(t\), respectively. Then the velocity \({V}_{t+1}^{i}\) and the position \({P}_{t+1}^{i}\) of the \(i\)th particle at epoch \(t+1\) are updated by Eqs. (10) and (11), respectively:

$${V}_{t+1}^{i}={\omega \times {V}_{t}^{i}}+{k}_{1}{r}_{1}\times ({Pbest}^{i}-{P}_{t}^{i})+{k}_{2}{r}_{2}\times ({Gbest}^{i}-{P}_{t}^{i})$$
(10)
$${P}_{t+1}^{i}={P}_{t}^{i}+{V}_{t+1}^{i}$$
(11)

where \({Pbest}^{i}\) and \({Gbest}^{i}\) represent the particle’s personal best and the swarm’s global best positions, respectively, and \({r}_{1}\) and \({r}_{2}\) are two random numbers uniformly distributed in \(U(0,1)\).

The inertia coefficient (\(\omega \)) is one of the important parameters of PSO, maintaining the balance between its exploitation and exploration features (Shi and Eberhart 1998). Exploitation is the ability of PSO to refine the best solution achieved so far within a small neighborhood of the current best solution, whereas exploration is its ability to search for new solutions by finding regions with potentially better ones. A large inertia coefficient may cause divergence from the global best solution, while a small one may lead to slow convergence and high complexity; an optimal value of the inertia coefficient is therefore desired. In the literature, authors have attempted various strategies, such as constant, time-varying, and adaptive inertia coefficients, to maintain the balance between the exploitation and exploration properties of PSO (Nickabadi et al. 2011; Jamous et al. 2021; Ji et al. 2021). In this work, we use an adaptive mechanism that dynamically determines the value of the inertia coefficient by utilizing the fitness function (MSE) as a feedback parameter. To maintain the exploitation and exploration features of PSO, we use the velocities of the personal best and global best particles in the swarm. The adaptive inertia coefficient is given as:

$${\omega }_{i}^{t}={\omega }_{\mathrm{max}}-({\omega }_{\mathrm{max}}-{\omega }_{\mathrm{min}})\times ({V}_{{Gbest}_{i}}^{t}-{V}_{{Pbest}_{i}}^{t})$$
(12)

where \({\omega }_{i}^{t}\) is the adaptive inertia coefficient, \({V}_{{Gbest}_{i}}^{t}\) is the velocity of the global best (\({Gbest}^{i}\)), \({V}_{{Pbest}_{i}}^{t}\) is the velocity of the personal best (\({Pbest}^{i}\)) of particle \(i\) at iteration \(t\), and \({\omega }_{\mathrm{max}}\) and \({\omega }_{\mathrm{min}}\) are the maximum and minimum values of the inertia coefficient. Figure 6 shows the search mechanism of PSO when the adaptive inertia coefficient \({\omega }_{i}^{t}\) given by Eq. (12) replaces \(\omega \) in Eq. (10), the velocity and position of the particles being computed by Eqs. (10) and (11), respectively.
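One adaptive-PSO iteration over the swarm (Eqs. 10–12) can be sketched as below. How the velocities of the personal and global best particles are measured is not fully specified in the text, so here V_pbest and V_gbest are simply assumed to be per-particle scalar feedback values.

```python
import numpy as np

def adaptive_pso_step(P, V, pbest, gbest, V_pbest, V_gbest,
                      k1=1.2, k2=1.2, w_max=0.9, w_min=0.4):
    """P, V: (S x D) positions and velocities; pbest: (S x D) personal
    bests; gbest: (D,) global best; V_pbest, V_gbest: (S,) velocity
    feedback of the personal/global best particles."""
    S, D = P.shape
    r1, r2 = np.random.rand(S, D), np.random.rand(S, D)
    # Eq. (12): per-particle adaptive inertia coefficient
    w = w_max - (w_max - w_min) * (V_gbest - V_pbest)          # shape (S,)
    # Eq. (10): velocity update with the adaptive inertia in place of w
    V = w[:, None] * V + k1 * r1 * (pbest - P) + k2 * r2 * (gbest - P)
    # Eq. (11): position update
    return P + V, V
```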

Fig. 6
figure 6

Searching scheme of adaptive PSO

The process of updating \({Pbest}\), \({Gbest}\), and the positions and velocities of the particles continues until the iteration count reaches the maximum number of iterations (maxGen) or the tolerance reaches a preset value. The final \({Gbest}\) is the optimal solution, comprising the tuned weights and biases of the LSTM and FCL. Finally, the weights and biases generated by the Adaptive PSO are assigned as the initial parameters of the LSTM and FCL, which are then further trained by the Adam optimizer.

4 Performance metrics and implementation of forecasting model

The performance metrics utilized to assess the proposed model’s predictive abilities, as well as the experimental setup, are discussed in this section.

  1. A.

    Evaluation measures

We employed five measures, namely mean squared error (MSE), mean arctangent absolute percentage error (MAAPE), root-mean-square error (RMSE) (Botchkarev 2018; Kim and Kim 2016), symmetric mean absolute percentage error (SMAPE) (Hyndman and Koehler 2006), and Theil's inequality coefficient (Theil's U) (Theil 1966), to examine and compare the models' forecasting accuracy and robustness. The mathematical definitions of these metrics are as follows:

$$\mathrm{MSE}=\frac{1}{N}\sum \limits _{i=1}^{N}{({A}_{i}-{P}_{i})}^{2}$$
(13)
$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum \limits_{i=1}^{N}{({A}_{i}-{P}_{i})}^{2}}$$
(14)
$$\mathrm{MAAPE}=100\times \frac{1}{N} \sum \limits_{i=1}^{N}arctan\left(\left|\frac{{A}_{i}-{P}_{i}}{{A}_{i}}\right|\right)$$
(15)
$$\mathrm{SMAPE}=100\times \frac{1}{N}\sum \limits _{i=1}^{N}\left(\frac{\left|{A}_{i}-{P}_{i}\right|}{\left(\left|{A}_{i}\right|+\left|{P}_{i}\right|\right)/2}\right)$$
(16)
$$\mathrm{Theil's\,U}=\frac{\sqrt{\frac{1}{N}\sum\limits_{i=1}^{N}{({A}_{i}-{P}_{i})}^{2}}}{\sqrt{\frac{1}{N}\sum\limits_{i=1}^{N}{{A}_{i}}^{2}}+\sqrt{\frac{1}{N}\sum\limits_{i=1}^{N}{{P}_{i}}^{2}}}$$
(17)

where \({A}_{i}\) is the true value and \({P}_{i}\) is the predicted value derived from the proposed model for the ith observation of the dataset and N is the sample size in the dataset.
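For reference, Eqs. (13)–(17) can be computed in a few lines; this is a straightforward sketch, not the authors' implementation.

```python
import numpy as np

def evaluation_metrics(A, P):
    """MSE, RMSE, MAAPE, SMAPE and Theil's U (Eqs. 13-17) for actual
    values A and predicted values P."""
    A, P = np.asarray(A, float), np.asarray(P, float)
    e = A - P
    mse   = np.mean(e ** 2)                                           # Eq. (13)
    rmse  = np.sqrt(mse)                                              # Eq. (14)
    maape = 100 * np.mean(np.arctan(np.abs(e / A)))                   # Eq. (15)
    smape = 100 * np.mean(np.abs(e) / ((np.abs(A) + np.abs(P)) / 2))  # Eq. (16)
    theil = rmse / (np.sqrt(np.mean(A ** 2)) + np.sqrt(np.mean(P ** 2)))  # Eq. (17)
    return {'MSE': mse, 'RMSE': rmse, 'MAAPE': maape,
            'SMAPE': smape, "Theil's U": theil}
```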

  1. B.

    Implementation of the proposed model

The forecasting models are implemented in MATLAB 2021a installed on a system with a 2.10 GHz i3 CPU and 4 GB RAM.

  1. (1)

    Datasets

To demonstrate and evaluate the forecasting performance of the model, we utilized stock price time-series data from three stock market indexes, S&P 500, Sensex, and Nifty 50, over the durations and with the numbers of samples presented in Table 2. The datasets used in this work were taken from Yahoo Finance. Each dataset is split into three parts: training data (80%), validation data (10%), and testing data (10%). For Nifty 50, the first 1237 samples, from January 1, 2015, to January 3, 2020, form the training dataset; the next 155 samples, from January 6, 2020, to August 18, 2020, the validation dataset; and the last 155 samples, from August 19, 2020, to March 31, 2021, the test dataset. For Sensex, the first 1232 samples, from January 1, 2015, to January 6, 2020, form the training dataset; the next 155 samples, from January 7, 2020, to August 18, 2020, the validation dataset; and the last 154 samples, from August 19, 2020, to March 31, 2021, the test dataset. For the S&P 500, the first 1257 samples, from January 2, 2015, to December 30, 2019, form the training dataset; the next 158 samples, from December 31, 2019, to August 13, 2020, the validation dataset; and the last 157 samples, from August 14, 2020, to March 31, 2021, the test dataset.
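Since shuffling would leak future information into training, the 80/10/10 split is chronological; a minimal sketch is given below (the function name is ours).

```python
def chronological_split(data, train=0.8, val=0.1):
    """Split a time-ordered sequence into train/validation/test parts
    without shuffling: oldest samples train, newest samples test."""
    n = len(data)
    i, j = int(n * train), int(n * (train + val))
    return data[:i], data[i:j], data[j:]

# e.g. for the 1547 Nifty 50 samples this yields the 1237/155/155
# partition described above
```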

Table 2 Description of stock price time-series data
  1. (2)

    Implementation of adaptive PSO-based LSTM neural network

In the first stage, we used the OHLC data and the number of shares traded to build the pool of technical indicators. The data pre-processing module is then given a feature set of 20 indicators, comprising 16 technical indicators and 4 daily stock prices. The TA Python library (Technical indicators 2020) has been used to generate these indicators, and the Min–Max normalization approach is then used to translate the data into the range [−1, 1].

In the next phase, we implement adaptive PSO to evolve the initial weights and biases of the LSTM and FCL in the DNN. The LSTM network considered in this study comprises an input layer with 20 nodes, 1 hidden layer with 100 neurons, and an output layer with 1 neuron. In the third phase, we employ the Adam optimizer to train the LSTM created by assigning the initial weights and biases evolved in the preceding step. Table 3 shows the parameters required to implement the forecasting model. The values of the personal coefficient (\({k}_{1}\)) and social coefficient (\({k}_{2}\)) of PSO are set to 1.2 (Ji et al. 2021). The maximum inertia coefficient (\({\omega }_{\mathrm{max}}\)) and minimum inertia coefficient (\({\omega }_{\mathrm{min}}\)) are set to 0.9 and 0.4, respectively (Wang et al. 2020).
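For concreteness, the size of the PSO search space implied by this architecture can be checked with the formulas from Sect. 3; this short calculation is ours, and the exact parameter count of the authors' MATLAB implementation may differ slightly.

```python
# N = 20 inputs, K = 100 hidden units, O = 1 output node
N, K, O = 20, 100, 1
P = 4 * (K * (K + N) + K)   # LSTM parameters: 4 * (100*120 + 100) = 48,400
Q = O * K + O               # FCL parameters: 100 + 1              = 101
D = P + Q                   # search-space dimensionality          = 48,501
```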

Table 3 Parameters of the model

5 Experimental results and discussion

To evaluate the short-term (1-day, 1-week, and 2-weeks) and long-term (4-weeks, 6-weeks, and 12-weeks) forecasting efficiency of the proposed model, entitled Adaptive PSO-LSTM, we calculated the five measures described in Sect. 4 and compared its forecasting efficiency with ENN, regular LSTM, GA-LSTM, and PSO-LSTM. Table 4 shows the experimental results for forecasting the 1-day-ahead close price of three major stock market indices: Nifty 50, Sensex, and S&P 500. The MSE, MAAPE, RMSE, SMAPE, and Theil's U values of the Adaptive PSO-LSTM model in each case show that the model developed in this study performs better than the other forecasting techniques considered. Furthermore, in each dataset, the performance measure values for LSTM are all higher than those for Adaptive PSO-LSTM, PSO-LSTM, and GA-LSTM. Therefore, we can infer that the proposed adaptive PSO has the capability to improve the parameters of a conventional LSTM network for time series forecasting.

Table 4 1-Day ahead forecasting performance

The 1-day-ahead forecasting results for the three indices on the test dataset produced by the five models are visualized in Figs. 7, 8, and 9. The graphs of the three indices show that the true prices (blue line) are very close to the predicted prices (red line) obtained by Adaptive PSO-LSTM; hence the forecasting performance of the proposed model is higher than that of the others.

Fig. 7
figure 7

1-Day ahead forecasting results of Nifty 50

Fig. 8
figure 8

1-Day ahead forecasting results of Sensex

Fig. 9
figure 9

1-Day ahead forecasting results of S&P 500

Table 5 shows the 1-week-ahead forecasting results for the three stock indices using the five models, and Figs. 10, 11, and 12 show the plots of actual versus estimated prices for 1-week-ahead forecasting of the three stock market indices. In the same way, the findings lead us to conclude that the proposed technique performs better than the others for 1-week-ahead forecasting.

Table 5 1-Week ahead forecasting performance
Fig. 10
figure 10

1-Week ahead forecasting results of Nifty 50

Fig. 11
figure 11

1-Week ahead forecasting results of Sensex

Fig. 12
figure 12

1-Week ahead forecasting results of S&P 500

Table 6 presents the performance metric values obtained by applying the five models to forecast the 2-week-ahead close price of the three indices, and Figs. 13, 14, and 15 show the closeness between the actual and predicted prices for the three datasets. It is clear from the plots that the prices predicted by Adaptive PSO-LSTM are very close to the actual prices.

Table 6 2-Week ahead forecasting performance of the models
Fig. 13
figure 13

2-Week ahead forecasting results of Nifty 50

Fig. 14
figure 14

2-Week ahead forecasting results of Sensex

Fig. 15
figure 15

2-Week ahead forecasting results of S&P 500

Table 7 presents the performance of the proposed Adaptive PSO-LSTM model in comparison to PSO-LSTM, GA-LSTM, LSTM, and ENN corresponding to the 4-week ahead forecasting results of three indices. It is worth mentioning that Adaptive PSO-LSTM gives the best forecasting performance in three datasets according to all performance metrics considered in this study.

Table 7 4-Week ahead forecasting performance

The 4-week-ahead forecasted values of the normalized closing prices of the three indices are portrayed in Figs. 16, 17, and 18, respectively. From the figures, we can see that the close prices forecasted by LSTM and ENN deviate far more from the true prices than those of GA-LSTM, PSO-LSTM, and Adaptive PSO-LSTM. Furthermore, the values predicted by Adaptive PSO-LSTM are very close to the actual values, which indicates the superiority of the proposed model over the others.

Fig. 16
figure 16

4-Week ahead forecasting results of Nifty 50

Fig. 17
figure 17

4-Week ahead forecasting results of Sensex

Fig. 18
figure 18

4-Week ahead forecasting results of S&P 500

As reported in Table 8, the proposed Adaptive PSO-LSTM model shows better forecasting performance than PSO-LSTM, GA-LSTM, LSTM, and ENN for 6-week-ahead forecasting of Nifty 50, Sensex, and S&P 500. The actual versus predicted prices plotted in Figs. 19, 20, and 21 for the three datasets show that the proposed model performs better than the others.

Table 8 6-Week ahead forecasting performance
Fig. 19
figure 19

6-Week ahead forecasting results of Nifty 50

Fig. 20
figure 20

6-Week ahead forecasting results of Sensex

Fig. 21
figure 21

6-Week ahead forecasting results of S&P 500

The comparison of the 12-week-ahead forecasting performance of the proposed Adaptive PSO-LSTM model with PSO-LSTM, GA-LSTM, LSTM, and ENN, based on the five metrics for the three datasets, is reported in Table 9. Our model demonstrates the best overall performance in terms of the MSE, RMSE, MAAPE, SMAPE, and Theil's U metrics. Furthermore, the graphs of actual versus predicted prices shown in Figs. 22, 23, and 24, obtained by applying Adaptive PSO-LSTM and the other models, show that the close price predicted by the proposed model almost overlaps with the actual values, confirming the reliability of the model.

Table 9 12-Week ahead forecasting performance of the models
Fig. 22
figure 22

12-Week ahead forecasting results of Nifty 50

Fig. 23
figure 23

12-Week ahead forecasting results of Sensex

Fig. 24
figure 24

12-Week ahead forecasting results of S&P 500

In general, the experimental findings obtained by implementing the proposed model on three datasets show that the forecasted stock prices are very close to the actual prices, signifying the generality and robustness of the proposed Adaptive PSO-LSTM model for short-term (1-day, 1-week, and 2-weeks) and long-term (4-weeks, 6-weeks, and 12-weeks) forecasting.

Figures 25 and 26 show the graphical representations of training versus validation loss (MSE) with respect to the epochs for standard LSTM and Adaptive PSO-LSTM, respectively. It is evident from Fig. 25 that overfitting has occurred in the standard LSTM, as there is a large disparity between the training and validation losses. In contrast, the proposed Adaptive PSO-LSTM has effectively avoided the overfitting issue, as its training and validation errors are almost identical.

Fig. 25
figure 25

Training versus validation loss for standard LSTM

Fig. 26
figure 26

Training versus validation loss for adaptive PSO-LSTM

Furthermore, to verify the robustness and reliability of Adaptive PSO-LSTM compared with PSO-LSTM, we plot the best fitness value (MSE) versus iterations in Figs. 27, 28, and 29 for the Nifty 50, Sensex, and S&P 500 datasets, corresponding to 1-day-ahead forecasting.

Fig. 27
figure 27

Best fitness value (MSE) for Nifty 50

Fig. 28
figure 28

Best fitness value (MSE) for Sensex

Fig. 29
figure 29

Best fitness value (MSE) for S&P 500

Our objective is to use an adaptive inertia weight in PSO to improve its exploration and exploitation features. Exploitation refers to the PSO's ability to improve the best solution it has obtained so far within a limited region of the current best solution, whereas exploration refers to its ability to identify new solutions by locating regions with possibly superior ones. Figures 27, 28, and 29 make clear how the fitness (MSE) of the particles is further improved in each iteration by Adaptive PSO compared with PSO. The robustness and reliability of Adaptive PSO-LSTM can be further verified by comparing its MSE, RMSE, MAAPE, SMAPE, and Theil's U values with those of PSO-LSTM for the short-term (1-day, 1-week, and 2-weeks) and long-term (4-weeks, 6-weeks, and 12-weeks) forecasting results.

6 Conclusion

This work proposed a nature-inspired optimization-based hybrid deep learning approach, named Adaptive PSO-LSTM, built on an adaptive PSO technique that automatically searches for the initial input weights, recurrent weights, and biases of the long short-term memory (LSTM) network, as well as the weights and biases of the fully connected layer (FCL), for forecasting stock price time series. We evaluated the forecasting ability of the proposed Adaptive PSO-LSTM model on three stock market indices, namely S&P 500, Sensex, and Nifty 50, for short-term (1-day, 1-week, and 2-weeks) and long-term (4-weeks, 6-weeks, and 12-weeks) forecasting, and compared its forecasting efficacy with PSO-LSTM, GA-LSTM, LSTM, and ENN. The following is a summary of the major findings of this study:

  1. (1)

    The proposed hybrid model is effective in obtaining the optimal weights and biases of the LSTM and FCL; its adaptive inertia coefficient improves the exploration and exploitation capability of PSO, which increases the accuracy of the model.

  2. (2)

    The proposed technique successfully combines the global search potential of adaptive PSO with the local search ability of Adam optimizer, decreasing the chance of getting trapped in local minima and overcoming the underfitting/overfitting issues.

  3. (3)

    Experimental findings attained by employing the proposed method on three stock indices for short-term (1-day, 1-week, and 2-weeks) and long-term (4-weeks, 6-weeks, and 12-weeks) forecasting show that the proposed Adaptive PSO-LSTM performs better than PSO-LSTM, GA-LSTM, LSTM, and ENN. Hence, Adaptive PSO-LSTM proves to be a promising approach for short-term and long-term forecasting of stock price time-series data.

In the future, this work can be extended by taking into account additional hyperparameters of LSTM, viz. the learning rate, number of epochs, number of hidden layers and hidden neurons, batch size, window length of the time series, dropout rate, etc. Further nature-inspired and evolutionary techniques for tuning the model's hyperparameters will also be investigated in other financial time series forecasting domains.