1 Introduction

People are always looking for ways to invest their capital, and the stock market is one of the main venues for doing so. However, stock markets are exposed to various risks, so investors need to forecast stock prices, which depend on several psychological, economic and other factors. Accordingly, numerous methods have been developed to predict stock prices, aiming to forecast index values or the prices of individual stocks (Lah et al. 2019). These methods require different considerations depending on the quality and quantity of the available data. Technical analysis, fundamental analysis and statistical methods are all used for stock price prediction. One of the main hypotheses that should be considered, and preferably tested, is the efficient market hypothesis (EMH) (Malkiel 1989, 2003). The EMH states that information has a strong impact on stock prices and that prices adjust themselves according to this information (Greco et al. 2019). An efficient market assures investors that they all have access to the same information (Naseer and Bin Tariq 2015). It rests on the assumption that no system can beat the market: if such a system became widely known, everybody would use it, which would negate its potential profitability.

Time series analysis is a principal method used for predicting share prices. It deals with analyzing a series of data gathered over time. Time series are common in fields such as economics, finance and healthcare (Bisgaard and Kulahci 2011). The method tries to forecast the future by assuming that previously observed patterns form the foundation for extracting future behavior (Shin 2017). Heuristic algorithms are another set of methods used for prediction. They are often employed as an alternative to exact optimization methods and usually aim to find a good feasible solution without any assurance of optimality (Kaveh and Ghazaan 2018). Heuristic algorithms are applicable to decision problems with complex structures whose characteristics take a long time to identify. A further class is the metaheuristic algorithms: higher-level strategies applied on top of heuristic algorithms, which allow heuristics to be used on a large number of problems. A metaheuristic does not depend on the characteristics of a particular model and is compatible with many models and solution representations (Osman and Kelly 1996; Talbi 2009). In cases where the set of solutions is too large to be sampled completely, metaheuristics examine a subset of these solutions. Since metaheuristics are usually developed based on a limited set of assumptions, they can be applied to a variety of problems (Blum and Roli 2003).

Compared with exact methods, there is no guarantee that metaheuristics will find the global optimum of an optimization problem (Blum and Roli 2003; Khosravanian et al. 2018). Metaheuristic algorithms are applied to solve difficult and complicated problems in affordable time, and they usually find acceptable rather than optimal solutions for such problems (Talbi 2009). Gogna and Tayal (2013), Abdel-Basset et al. (2018) and Wong and Ming (2019) are examples of studies reviewing the applications of metaheuristic algorithms in different fields.

Another method is the artificial neural network (ANN), which is inspired by the functioning of the human brain. ANN is a subset of artificial intelligence (AI) and is applicable in contexts such as pattern recognition, classification and regression. Because most financial data are nonlinear and asymmetric, an ANN can capture the underlying relationships well.

This paper aims to predict stock prices using an ANN. The developed ANN is trained with metaheuristic algorithms, namely social spider optimization (SSO) and the bat algorithm (BA). A group of technical indicators is used as input variables. A genetic algorithm (GA) is employed for feature selection, i.e. choosing the most relevant indicators. Several loss functions are used as error assessment criteria.

To evaluate the performance of the proposed hybrid algorithms, the obtained results are compared with those of ARIMA as a time series model for stock price prediction. This evaluation and comparison are carried out on five major international indices: S&P500, DAX, FTSE100, NASDAQ and DJI. The paper is structured as follows: Section 2 reviews the available literature. Section 3 describes the ANN structure and the proposed algorithms. In Sect. 4, ARIMA is used for time series forecasting. Sections 5 and 6 present the experimental process and the results. Finally, Sect. 7 concludes the paper. Additional results are given in Appendices A and B.

2 Literature review

The stock market is a place where investors can buy or sell parts of companies' assets in the form of shares (Preethi and Santhi 2012). The market can be seen as a pulse of the economic activities of a country, and it can offer high profits to investors seeking to grow their capital and wealth. Stock markets are characterized by nonlinearity, discontinuity and volatile multifaceted elements, because many factors affect them, such as general economic conditions, political actions and brokers' expectations (Hadavandi et al. 2010). Considering the amount of fluctuation in this market, a rapid decision-making process is required; it is therefore very important that transactions be completed in the shortest possible time (Barakat et al. 2016). Obtaining maximum profit is the ultimate goal of investors. As a result, many researchers have sought market forecasting capabilities in a variety of ways (Prasanna and Ezhilmaran 2013). According to previous studies, the ANN appears to be a reasonably well-validated method for stock price prediction (Idris et al. 2015). The three most popular ANNs for stock prediction are the recurrent neural network (RNN) (Saad et al. 1998), the radial basis function (RBF) network (Han et al. 2001) and the multilayer perceptron (MLP). There are many methods for training an ANN, and some are better than others at capturing linear and nonlinear relationships. An ANN uses two threshold (activation) functions to capture linear and nonlinear characteristics. The number of neurons per layer is very important for predictability: with too many, the network becomes overly complicated and may fail to find the fittest solution, while with too few, it may be unable to capture nonlinear relationships and find a global solution. Researchers have therefore tried to discover methods that combine high speed with high accuracy and low error; for this reason, metaheuristic algorithms are used. These methods serve to optimize the network and to find the best number of input variables and hidden neurons. In forecasting stock prices, stock returns, exchange rates, inflation and imports, ANN models work better than traditional statistical models (Yim and Mitchell 2002).

Gupta and Wang (2010) used feed-forward neural networks to forecast and trade the futures index prices of the Standard and Poor's 500 (S&P 500). The effect of training the network with the most recent data, together with gradually subsampled past index data, was studied in this research. They also studied the effect of past NASDAQ 100 data on the prediction of the future S&P 500. A daily trading strategy was used to buy or sell according to the predicted prices and hence to calculate the directional efficiency and the rate of return for different periods. They obtained significantly higher returns compared to earlier work. Numerous exchange-traded funds (ETFs) attempt to replicate the performance of the S&P 500 by holding the same stocks in the same proportions as the index, thereby giving the same percentage returns as the S&P 500.

Zhu and Wang (2010) proposed an intelligent trading system using support vector regression optimized by genetic algorithms (SVR-GA) and a multilayer perceptron optimized with GA (MLP-GA). Experimental results showed that both approaches outperformed conventional trading systems without prediction, as well as a recent fuzzy trading system, in terms of final equity and maximum drawdown for the Hong Kong Hang Seng stock index.

He et al. (2013) conducted research on the principles and theories of financial markets, studying and practicing basic technical analysis methodologies for the stock market with the help of feature selection algorithms. They used data of the Shanghai Stock Exchange Composite Index (SSECI) from 24 March 1997 to 23 August 2006 to compute twelve technical indicators for later research. The twelve chosen technical indicators were calculated, and the results were taken as the input of the feature selection algorithms. Three kinds of feature selection algorithms were studied: principal component analysis (PCA), the genetic algorithm (GA) and sequential forward selection (SFS). According to the results and analysis, PCA was the most reliable but might be time-consuming if the input has very high dimensionality. The genetic algorithm performed better in such situations, since it takes advantage of randomness. SFS could generate locally optimal solutions, but with a risk of the "nesting problem".

Dong et al. (2013) first reproduced the one-step-ahead prediction system of Phua et al. for stock price prediction. They then made some modifications and successfully outperformed the original prediction system in terms of MSE, hit rate and absolute error. They also explored a more difficult multi-step prediction problem: they first reproduced a multi-step prediction system using a simple recursive algorithm and then proposed an error-constraint algorithm to obtain better weights and biases, as well as smaller accumulated errors. The results outperformed the simple recursive algorithm by observation.

Zheng et al. (2013) explored the application to stock prediction of a wavelet neural network (WNN), whose hidden layer comprised neurons with adjustable wavelets as activation functions. They discussed some basic rationales behind technical analysis, based on which the inputs of the prediction system were carefully selected. The system was tested on the Istanbul Stock Exchange National 100 Index and compared with traditional neural networks. The results showed that the WNN could achieve very good prediction accuracy.

Fang et al. (2014) improved stock market prediction based on genetic algorithms (GA) and wavelet neural networks (WNN) and reported significantly better accuracies compared to existing approaches to stock market prediction, including the hierarchical GA (HGA) WNN. Specifically, they added information such as trading volume as inputs and they used the Morlet wavelet function instead of Morlet–Gaussian wavelet function in their prediction model. They also employed a smaller number of hidden nodes in WNN compared to other research work. The prediction system was tested using Shenzhen Composite Index data.

Lim et al. (2016) used delayed neural network models to predict public housing prices in Singapore. The delayed neural networks are used to estimate the trend of the resale price index (RPI) of Singapore housing from the Singapore Housing Development Board (HDB), with nine independent economic and demographic variables. The results show that the delayed neural network model is able to produce a good fit and predictions.

Göçken et al. (2016) predicted the Turkish stock price index using technical indicators and hybrid ANNs based on GA and harmony search (HS). The results showed that the error of the hybrid metaheuristic algorithms is lower than that of the plain ANN. Comparing the hybrid ANN-HS and ANN-GA models, they found that the error of ANN-HS is lower than that of ANN-GA.

To address the problem of features with similar contributions, the feature-weighted SVM (FWSVM) and feature-weighted K-nearest neighbor (FWKNN) were proposed to forecast stock market indices by assigning different weights to different features (Chen and Hao 2017). The model was tested on two stock markets, and the comparison showed that FWSVM and FWKNN perform better than the non-weighted models.

Ghasemiyeh et al. (2017) optimized an artificial neural network with metaheuristic algorithms. In their research, cuckoo search, improved cuckoo search, enhanced cuckoo search, the genetic algorithm and particle swarm optimization (PSO) were examined. Testing these hybrid algorithms with 28 input variables, the results showed that PSO outperformed the other algorithms in their study.

Goli et al. (2018) used various metaheuristic algorithms to improve demand prediction in the dairy industry. Their study employed two well-known metaheuristic algorithms, GA and PSO, together with two more recent ones, invasive weed optimization (IWO) and the cultural algorithm (CA), for feature selection and demand forecasting. According to the results, PSO showed the best performance in feature selection, while IWO significantly improved the prediction error.

Sin and Wang (2017) explored the relationship between the features of Bitcoin and the next-day change in its price using an artificial neural network ensemble approach called the genetic algorithm-based selective neural network ensemble, constructed using a multilayer perceptron as the base model for each neural network in the ensemble. To assess its practicality and effectiveness in real-world application, the ensemble was used to predict the next-day direction of the Bitcoin price given approximately 200 features of the cryptocurrency over a span of 2 years. Over a span of 50 days, a trading strategy based on the ensemble was compared through back-testing against a "previous day trend following" trading strategy. The ensemble-based strategy generated almost 85% returns, outperforming the trend-following strategy, which produced approximately 38% returns, and a strategy following the single best MLP model in the ensemble, which generated approximately 53% returns.

Chong et al. (2017) applied three methods, PCA, the restricted Boltzmann machine (RBM) and the autoencoder, for feature extraction in a deep learning network, with loss functions including root-mean-squared error (RMSE), mean absolute error (MAE), mutual information (MI) and normalized mean squared error (NMSE), to predict future market trends in South Korea. Sezer et al. (2017) employed GA in a stock trading system based on a deep neural network (DNN) to generate buy–sell–hold signals; GA was used for feature selection and to generate the buy–sell points in the system. Later, Dixon (2018) used a long short-term memory (LSTM) network to forecast short-term price movements.

Zhang et al. (2018) designed a system for predicting stock price trends that could forecast stock price movement and its increase or decrease interval over predetermined periods. They trained a random forest model on historical data from the Chinese market to categorize multiple stock clips into four major groups according to their closing prices. The results indicate improved prediction of market volatility, together with merits such as precision and return per trade.

Baek and Kim (2018) proposed a framework, entitled ModAugNet, consisting of two LSTM-based modules: one for overfitting prevention and one for prediction. The framework was tested on two Korean stock datasets, and the obtained results show an improvement across different error measures.

Ahmed et al. (2019) used ant colony optimization (ACO) to forecast stock prices on the Nigerian stock exchange. They compared ACO with three other techniques, the price momentum oscillator, the stochastic oscillator and the moving average, and concluded that ACO achieves higher accuracy and lower error than the other methods. Ghanbari and Arian (2019) used support vector regression (SVR) and the butterfly optimization algorithm (BOA) for stock market forecasting. They presented a new BOA-SVR model and compared it with 11 metaheuristic algorithms on NASDAQ data. The results indicated that the model improves the results by optimizing the SVR parameters; moreover, it performed very well, with higher accuracy and lower time consumption than the other models. Chandana (2019) used a novel approach based on least squares support vector regression (LSSVR) and machine learning, designing an expert system for stock price prediction intended to strengthen the forecast by improving accuracy. The system was successful because it required fewer computations and simpler calculations. Rajesh et al. (2019) used ensemble learning techniques for stock trend prediction, concentrating on the stock price change percentage. They predicted the S&P500 and its future trend with ensemble learning, considering two forecasting tools: ensemble learning and heat maps. The evidence suggests that the support vector machine (SVM), random forest and K-nearest neighbor classifiers give more promising results than other methods. The accuracy of the forecast model is above 51%, which represents a 23% increase in prediction accuracy.

Pal and Kar (2019) used a hybrid approach to forecast stock price time series, employing data discretization based on fuzzistics, where a cumulative probability distribution approach (CPDA) is used to obtain the intervals for the linguistic values. First-order fuzzy rule generation and reduction of the rule sets by rough set theory were performed. Thereafter, the time series forecast is computed by defuzzification using the reduced rule base and its historical evidence. The proposed approach was applied to the closing prices of three stock index time series (BSE, NYSE, and TAIEX) as experimental data sets, and the results show that the method is more effective than its counterparts.

Liu and Wang (2019), in order to address the profit bias in model evaluation, proposed a new effective metric, the mean profit rate (MPR). The effectiveness of the metric was measured by the correlation between the metric value and the profit of the model. Experiments on daily data of five stock indices from four countries showed that MPR outperformed the classification metrics in correlating with profit. In view of these findings, they suggested that MPR is a more effective metric than the classification metrics for stock trend prediction.

Lv et al. (2019) assessed different types of machine learning algorithms with respect to trading cost, comparing traditional algorithms and advanced DNN models on data for different index component stocks over the period 2010–2017. The traditional machine learning algorithms were random forest, naïve Bayes, logistic regression, classification and regression tree (CART), SVM and extreme gradient boosting, while the DNN architectures included the deep belief network (DBN), multilayer perceptron (MLP), RNN, stacked autoencoders (SAE), LSTM and the gated recurrent unit (GRU). Their results indicated that which algorithm is superior depends on transaction cost: ignoring transaction cost, traditional machine learning algorithms perform better on many directional assessment indices, whereas DNN models perform better once transaction cost is taken into account.

Zaman (2019) examined the efficiency of Bangladesh's largest stock markets by conducting parametric and nonparametric tests on DSE and CSE data from 2013 to 2017. The results showed that the two stock exchanges are not even weak-form efficient.

Zhou et al. (2020) investigated the power of SVM in predicting the direction of stock price changes. They used five different data sources, including technical indices, stock posts, transaction data records, news and the Baidu index, and concluded that there are different ideal data sources for forecasting active and inactive stocks. They also found that more active stocks can be predicted with higher accuracy over different periods of time.

Sahoo and Mohanty (2020) proposed a combination of ANN and the gray wolf optimization (GWO) technique and compared the hybrid ANN-GWO with a plain ANN. They compared these models on a dataset from the Bombay stock exchange covering 2004 to 2018. The performance of ANN-GWO and ANN was evaluated according to different error measures, and the comparison shows that the hybrid method yields better results than the ANN model.

Kumar et al. (2020) reviewed and organized the published papers on stock market prediction using computational intelligence. The papers are organized according to the datasets used, input variables, pre-processing methods, techniques used for feature selection, forecasting methods and the performance metrics used to evaluate the models.

From the papers reviewed above, it can be inferred that stock market prediction remains an active research topic. It also appears that hybrid methods are the prevalent approach in different studies. Given the acceptance of ANN-based methods, the focus here is to enhance the performance of the ANN through metaheuristics. Limitations of the previous methods are summarized in Table 1 (Obthong et al. 2020).

Table 1 Limitations of the previous methods

3 Hybrid metaheuristic ANN for stock price prediction

3.1 Technical indicators

An ANN consists of three layers, the first of which is the input layer. Here, some important technical indices are used as the input variables of the network. Indicators are mathematical functions based on specific formulas for analyzing stock prices or market indices, often with graphical tools. Investors and managers can use them to analyze the stock market. Choosing the best and most relevant technical indicators is a controversial issue; to deal with this challenge, GA is used for feature selection. The considered technical indicators are listed in Table 2; a sketch of two representative indicator computations follows the table.

Table 2 Important technical indicators
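As an illustration of how such indicators are computed, the sketch below implements two widely used ones (RSI and MACD) in Python, assuming `close` is a pandas Series of daily closing prices; the exact formulas and parameter choices used in this study may differ from these common defaults.

```python
# Illustrative computation of two indicators of the kind listed in Table 2,
# assuming `close` is a pandas Series of daily closing prices.
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index over a simple rolling window."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()
    loss = (-delta.clip(upper=0)).rolling(period).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)

def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9):
    """MACD line and its signal line from exponential moving averages."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line
```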

3.2 Artificial neural network (ANN)

Today, ANNs are used for many different problems. Well-known applications include function approximation, classification and clustering, data storage and retrieval, and optimization (Versace et al. 2004). ANNs can be used for a variety of tasks, including time series forecasting. Because stock price data are not normally distributed and exhibit characteristics such as skewness, kurtosis, fat tails and nonlinearity, an ANN can be used to capture these qualities. As mentioned earlier, a typical ANN includes three layers: (1) input, (2) hidden and (3) output.

The number of neurons in each layer is important, because changing it makes the network react differently. Thus, GA is applied for choosing the important variables. GA is used for feature selection for several reasons: (1) conceptual simplicity; (2) it searches a wide area of the solution space instead of examining a single point; (3) it supports multi-objective optimization; (4) it is a stochastic process and robust to local minima/maxima; and (5) it is easily parallelized (Oreski et al. 2012). In this way, the calculation speed is increased and the network is also prevented from getting trapped in local minima or maxima.

A neural network is based on learning, meaning that it iteratively tries to reduce its error through trial and error. The network has three phases: (1) training, (2) validation and (3) testing. This study includes two main parts. The first involves calculating the technical indicators and selecting the most informative ones using GA. The second involves forecasting the closing price using different hybrid ANN models and comparing their prediction errors. Two metaheuristic algorithms, SSO and BA, are used, since they have produced successful results in various fields such as stock price and interest rate prediction; moreover, they have useful properties, including their approximate and usually non-deterministic nature, and they are flexible and not problem-specific (Beheshti and Shamsuddin 2013). Stock price data from 2013 to 2018 are split into two sections, training and testing, and are then analyzed with the artificial intelligence algorithms to forecast the next day's closing stock price. Following Göçken et al. (2016), 70% of the observations are used for training and the remaining 30% for testing and validation. Models are compared based on eight prediction error criteria. Different algorithms can be used for training an ANN, e.g. gradient descent backpropagation (Mozer et al. 1995), Levenberg–Marquardt (LM) backpropagation (Hao and Wilamowski 2011), BFGS quasi-Newton backpropagation (Fahad et al. 2018) and Bayesian regularization backpropagation (Burden and Winkler 2008).

In this study, the number of hidden-layer neurons of the plain neural network is determined by trial and error and is not fixed. Owing to a feature of the MATLAB software used, the number of hidden layers is fixed at 1, which can be considered a limitation. To this end, 1–32 neurons are examined in the hidden layer, and the number of neurons giving the highest accuracy is chosen for the ANN model. Error backpropagation is used for training the ANN, with the LM algorithm as the minimization algorithm (Haddad and Haghighat Monfared 2012). The number of training epochs is one thousand, increased to 2000 to improve the results, and the initial learning rate is set to 0.01 and decreased to 0.001 to improve the accuracy of the results. The ANN has two threshold functions for capturing the linear and nonlinear characteristics of the model. The activation function of the hidden layer is the tangent sigmoid (tanh), a mathematically shifted version of the sigmoid that combines features of both functions, while the threshold function of the output layer is the pure linear function. We used the tanh function for several reasons: (1) the range of our normalized values lies within [−1, 1]; (2) it almost always works better than the sigmoid function; and (3) it is able to learn and perform more complex, nonlinear tasks. Its hidden-layer outputs have a mean of 0 or very close to it, which helps center the data and makes learning for the next layer much easier.
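The following is a minimal sketch of this trial-and-error search over 1–32 hidden neurons, using scikit-learn's MLPRegressor with a tanh hidden layer and a linear output. Note that scikit-learn offers L-BFGS and Adam rather than the Levenberg–Marquardt optimizer used in MATLAB, so this is analogous to, not identical with, the setup described above.

```python
# A sketch of the hidden-layer-size search; assumes NumPy arrays for the data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

def fit_best_ann(X_train, y_train, X_val, y_val, max_neurons=32):
    """Try 1..max_neurons tanh neurons in one hidden layer; keep the lowest-MSE net."""
    best_net, best_mse = None, np.inf
    for n in range(1, max_neurons + 1):
        net = MLPRegressor(hidden_layer_sizes=(n,), activation="tanh",
                           solver="lbfgs", max_iter=2000, random_state=0)
        net.fit(X_train, y_train)
        mse = mean_squared_error(y_val, net.predict(X_val))
        if mse < best_mse:
            best_net, best_mse = net, mse
    return best_net, best_mse
```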

The architecture of the proposed neural network is represented in Fig. 1.

Fig. 1
figure 1

The structure of the desired artificial neural network (Ghasemiyeh 2017)

Here, input variables are illustrated with 20 technical variables. These variables are normalized to be used as input variables using Eq. (1).

$$ \widetilde{{S_{i} }} = \frac{{S_{i} - S_{min} }}{{S_{max} - S_{min} }},\quad i = 1, \ldots ,N $$
(1)

where \({S}_{i}\) is the ith observation and \({S}_{min}\) and \({S}_{max}\) are the minimum and maximum of the series. The goal of normalization is to bring the values of the dataset to a common scale without distorting differences in their ranges; it generally speeds up learning and leads to faster convergence. Figure 2 represents the research methodology.
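A one-line realization of Eq. (1), assuming `S` is a NumPy array of raw values for one indicator:

```python
# Min-max normalization of Eq. (1); `S` is a NumPy array of raw indicator values.
import numpy as np

def min_max_normalize(S: np.ndarray) -> np.ndarray:
    """Rescale a series to a common [0, 1] scale, preserving relative differences."""
    return (S - S.min()) / (S.max() - S.min())
```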

Fig. 2
figure 2

Research methodology

3.3 GA-ANN forecasting model

To select the input variables, GA is used. GA is a stochastic search algorithm inspired by natural evolution (Kuo and Han 2011; Saber et al. 2013). Generally, GA seeks an approximately optimal solution by coding and decoding a population of solutions and reproducing them through crossover and mutation, its main operators. In this study, inputs are coded using binary variables. The chromosomes are defined to contain 26 bits. Of these, 21 bits are associated with the existence (bit value 1) or nonexistence (bit value 0) of input variables (technical indicators), and 5 additional bits encode the number of neurons in the hidden layer (\(2^{5} = 32\) possible values). The population size of the GA is 20 (Davallou and Azizi 2017; Kai and Wenhua 1997). The initial population is formed stochastically. The technical indicators and the hidden-layer size are passed to the GA, which uses the ANN as its fitness function and returns the resulting MSE as output. The fittest individual is the one with the lowest MSE. To increase the training speed, the number of epochs is set to 100. As mentioned, 70% of the data are employed for training and 30% for testing and validation. Table 3 lists the parameters of the genetic algorithm; a decoding sketch follows the table.

Table 3 Parameters of GA
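The sketch below shows how such a 26-bit chromosome can be generated and decoded; the exact bit layout (feature bits first, size bits last) is our assumption for illustration.

```python
# Decoding a GA chromosome as described above: 21 feature bits plus 5 bits
# encoding the hidden-layer size (1-32). The bit ordering is assumed.
import random

N_FEATURES, N_SIZE_BITS = 21, 5

def random_chromosome():
    """A random 26-bit individual."""
    return [random.randint(0, 1) for _ in range(N_FEATURES + N_SIZE_BITS)]

def decode(chrom):
    """Return the indices of the selected indicators and the hidden-layer size."""
    selected = [i for i, bit in enumerate(chrom[:N_FEATURES]) if bit == 1]
    # Interpret the last 5 bits as a binary number in 0..31, shifted to 1..32.
    n_hidden = int("".join(map(str, chrom[N_FEATURES:])), 2) + 1
    return selected, n_hidden
```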

Figure 3 illustrates the proposed GA-ANN algorithm.

Fig. 3
figure 3

Considered GA flow chart for training ANN (Liu and Wang 2019)

Following Göçken et al. (2016), roulette wheel selection is used for choosing parents, and the crossover rate is set to 80% with one-point crossover. A mutation rate of 20% with binary (bit-flip) mutation is used. Selecting the best chromosomes among parents and children, a new generation is formed, and the algorithm repeats until a termination condition is satisfied. The two termination conditions used are (1) the best individual remaining unchanged for 100 generations, and (2) reaching the maximum generation limit, i.e. 2000 generations. Parameters such as the mutation rate, crossover rate and population size have been set following Göçken et al. (2016). Although different problems have different properties (e.g. scalability or dimension dependence), there are common recommendations for these parameter ranges in the literature: for example, a population size between 20 and 50, a crossover rate between 80 and 95%, and a mutation rate between 0.5 and 1% (Hassanat et al. 2019).

The GA pseudo-code (i.e. the steps and how the parameters are obtained) is illustrated in Table 4; a code sketch of the main loop follows the table.

Table 4 GA-ANN pseudo-code
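Below is a minimal sketch of the loop in Table 4, assuming `fitness(chromosome)` trains the ANN on the decoded chromosome and returns its MSE; the 100-generation stagnation criterion is omitted for brevity.

```python
# A sketch of the GA loop: roulette-wheel selection, one-point crossover
# (rate 0.8) and bit-flip mutation (rate 0.2), with elitist replacement.
import random

def roulette(pop, fits):
    # Lower MSE is better, so invert the error for fitness-proportional sampling.
    weights = [1.0 / (1e-9 + f) for f in fits]
    return random.choices(pop, weights=weights, k=2)

def crossover(p1, p2, rate=0.8):
    if random.random() < rate:                      # one-point crossover
        cut = random.randint(1, len(p1) - 1)
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return p1[:], p2[:]

def mutate(chrom, rate=0.2):
    return [1 - b if random.random() < rate else b for b in chrom]

def run_ga(fitness, pop_size=20, chrom_len=26, generations=2000):
    pop = [[random.randint(0, 1) for _ in range(chrom_len)] for _ in range(pop_size)]
    fits = [fitness(c) for c in pop]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            c1, c2 = crossover(*roulette(pop, fits))
            children += [mutate(c1), mutate(c2)]
        child_fits = [fitness(c) for c in children]
        # Elitist replacement: keep the best individuals among parents and children.
        ranked = sorted(zip(pop + children, fits + child_fits), key=lambda t: t[1])
        pop = [c for c, _ in ranked[:pop_size]]
        fits = [f for _, f in ranked[:pop_size]]
    return pop[0], fits[0]   # best chromosome and its MSE
```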

3.4 Bat algorithm (BA)

Inspired by the echolocation behavior of microbats, the bat algorithm (BA) was proposed as a metaheuristic optimization algorithm (Iglesiasa et al. 2020). Mirjalili et al. (2014) demonstrated the superiority of BA over some other algorithms, such as GA and PSO. The echolocation of microbats is simulated through several parameters, such as velocity, location, frequency and loudness (Gàlvez and Iglesias 2016): every virtual bat flies stochastically with velocity \(\upsilon_i\) at location (solution) \(x_i\), with a frequency between \(f_{min}\) and \(f_{max}\), varying wavelength \(\lambda\) and loudness \(A_0\). While searching for prey, depending on the proximity of the target, the frequency and loudness are changed and the pulse emission rate r is adjusted (Yang 2010). Exploration is strengthened by a local random walk, and selection of the best solutions continues until the termination criteria are reached (Nawi et al. 2014). A frequency-tuning technique is used to control the dynamic behavior of the swarm of bats, and tuning the algorithm parameters balances exploration and exploitation (Yang 2010).

The loudness may change in different ways; it can be assumed that it decreases from a large positive value \(A_0\) to a fixed minimum value \(A_{min}\). BA starts with a random population of bats, and the following formulas are then used at each step to update the location of each bat:

$$ \upsilon_{i}^{new} = \upsilon_{i}^{old} + \left( {x_{i} - x_{best} } \right) \times f_{i} $$
(2)
$$ x_{i}^{new} = x_{i}^{old} + \upsilon_{i}^{new} $$
(3)
$$ f_{i} = f_{min} + \varphi_{1} \times \left( {f_{max} - f_{min} } \right) $$
(4)

where \({x}_{best}\) is the position of the best bat, \({\varphi }_{1}\) is a random value in [0, 1], and \({f}_{max}\) and \({f}_{min}\) are the maximum and minimum frequency values, here assumed to be 1 and 0, respectively. The initial frequency of each bat is drawn from the range \([{f}_{min}, {f}_{max}]\). \({f}_{i}\) is applied to control the velocity and the bats' movement range (Nawi et al. 2014).

Afterwards, in the local search, each bat uses a random walk to create a new candidate solution. To this end, each bat draws a random number \(\beta \). If \(\beta \) is greater than the pulse emission rate, the new solution is generated by the local random walk of Eq. (5); otherwise it is generated by Eqs. (6)–(8) (Tsai et al. 2014; Chou and Nguyen 2018).

$$ x_{i}^{new} = x_{i}^{old} + e\overline{A}^{old} $$
(5)

where e is a random value in [−1, 1] and \(\overline{A}^{old}\) denotes the mean loudness of all bats. To improve the generated solution in the case where \(\beta \) is not greater than the pulse emission rate, a modification method is presented.

The main objective of this modification is to increase the diversity of the bat population through mutation and crossover, which helps enhance the search efficiency. Thus, for each bat \({x}_{i}\), three bats \(\left({x}_{k1}, {x}_{k2}, {x}_{k3}\right)\) are selected randomly such that \(i\ne k1\ne k2\ne k3\). Using the mutation and crossover operators, the two improved solutions below are produced:

$$ X_{opt1} = X_{k1} + a_{1} \left( {X_{k2} - X_{k3} } \right) $$
(6)
$$ X_{opt1} = \left[ {X_{opt1,1} , X_{opt1,2} , \ldots , X_{opt1,n} } \right] $$
(7)

n is the dimension of this problem.

$$ X_{opt2} = \left\{ {\begin{array}{*{20}c} {x_{best,i} } & {\quad {\text{if}}\,a_{2} < a_{3} } \\ {x_{i} } & {\quad {\text{otherwise}}} \\ \end{array} } \right. $$
(8)
$$ X_{best} = [X_{best,1} , X_{best,2} , \ldots , X_{best,n} ] $$
(9)

where \({a}_{1}, {a}_{2}\) and \({a}_{3}\) are random numbers in the [0, 1] interval. The best among \({X}_{opt1}, {X}_{opt2}\) and \({X}_{i}\) replaces \({X}_{i}\). If \(\beta <{A}_{i}\) and the new solution has a better fitness value, it is accepted. Upon accepting the new solution, the loudness and the pulse emission rate are updated as follows:

$$ A_{i}^{new} = a \cdot A_{i}^{old} $$
(10)
$$ r_{i}^{new} = r_{i}^{0} \cdot \left[ {1 - exp\left( { - \gamma *t} \right)} \right] $$
(11)

Here, \(a\) and \(\gamma \) are constants, \({r}_{i}^{0}\) is the initial pulse emission rate, and t is the iteration number. In this study, the BA described above is used to optimize the weight matrix of the ANN. In BAT-ANN, the initial population of bats first forms the initial weight matrix, which is then passed to the ANN to start the training phase (Hafezi et al. 2015). BA then identifies the best solution based on the neural network results. A local search is performed to discover new solutions, and the replacement of newly accepted solutions for the best known solution is repeated until the termination criteria are satisfied (Yang 2010). Finally, the optimal values of the weight matrix are determined. Figure 2 shows the flowchart of BAT-ANN.
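The following is a minimal sketch of the core BA updates (Eqs. (2)–(5), (10)–(11)), minimizing a generic objective in place of the ANN training error; the parameter values (30 bats, initial loudness 1.0, initial pulse rate 0.5, a = γ = 0.9) are illustrative assumptions.

```python
# A sketch of the bat algorithm; `objective` maps a 1-D NumPy array to a float.
import numpy as np

def bat_algorithm(objective, dim, n_bats=30, iters=1000,
                  f_min=0.0, f_max=1.0, a=0.9, gamma=0.9):
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, (n_bats, dim))       # positions (candidate solutions)
    v = np.zeros((n_bats, dim))                 # velocities
    A = np.ones(n_bats)                         # loudness A_i
    r0 = np.full(n_bats, 0.5)                   # initial pulse emission rates
    r = r0.copy()
    fit = np.apply_along_axis(objective, 1, x)
    best = x[int(fit.argmin())].copy()
    for t in range(1, iters + 1):
        for i in range(n_bats):
            f_i = f_min + rng.random() * (f_max - f_min)    # Eq. (4)
            v[i] += (x[i] - best) * f_i                     # Eq. (2)
            cand = x[i] + v[i]                              # Eq. (3)
            if rng.random() > r[i]:
                # Local random walk scaled by the mean loudness, cf. Eq. (5).
                cand = best + rng.uniform(-1, 1, dim) * A.mean()
            f_cand = objective(cand)
            if rng.random() < A[i] and f_cand < fit[i]:     # accept new solution
                x[i], fit[i] = cand, f_cand
                A[i] *= a                                   # Eq. (10)
                r[i] = r0[i] * (1 - np.exp(-gamma * t))     # Eq. (11)
        best = x[int(fit.argmin())].copy()
    return best, float(fit.min())
```

For instance, `bat_algorithm(lambda z: float((z ** 2).sum()), dim=10)` minimizes a sphere function; in BAT-ANN the objective would unpack the vector into the ANN weight matrix and return the training MSE.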

It should be noted that the calculation method is adapted from Yang (2010), Golmaryami et al. (2015), and Jantan et al. (2017).

Table 5 summarizes the notation used for parameters of BA.

Table 5 Bat algorithm parameters

As in the GA-ANN algorithm, parameters such as the pulse rate and velocity have been set based on previous research, such as Golmaryami et al. (2015) and Hafezi et al. (2015).

The process steps of the bat algorithm are shown in Table 6.

Table 6 BA pseudo-code

3.5 Social spider optimization (SSO)

Social spider optimization (SSO) belongs to the family of metaheuristic, evolutionary and swarm intelligence algorithms and models the lifestyle of social spiders, male and female (Mirjalili et al. 2015). Depending on their gender, spiders perform different tasks, such as mating, preying, web design and social interaction (Luque-Chang et al. 2018). A problem may have several candidate solutions that must be sought in a search space; in this algorithm, the communal web can be regarded as the search space, and the spiders' positions play the role of solutions (Evangeline and Abirami 2019). The web and its vibrations are very important for the spiders: through vibrations transmitted along the thin strings of the web, they sense when prey is trapped and exchange details about mating (Reddy et al. 2019). Two things determine a vibration: weight and distance. Spiders update their weights according to a fitness value and execute operations such as mating accordingly. As in the genetic algorithm, which is based on the superiority of better individuals, an offspring with a better weight replaces a weaker one; otherwise the population does not change. At the end of all iterations, the best spider with the best fitness is taken as the optimal solution (Yeh 2012). In training the ANN with the SSO algorithm, the best spider plays the role of the optimal solution. Here, to determine the fitness value of a spider, the minimization of MSE is considered as the objective function.

Like other metaheuristic algorithms, SSO has several steps and parameters.

3.5.1 Initialization

Like any other swarm intelligence and evolutionary algorithm, the SSO algorithm begins by assigning initial values to the population and the spider locations. The population comprises two kinds of individuals: female \({f}_{i}\) and male \({m}_{i}\) spiders. The number of female spiders \({N}_{f}\), which typically lies in the range of 65–90% of the population, is selected randomly by Eq. (12), and the number of male spiders \({N}_{m}\) is then determined by Eq. (13):

$$ N_{f} = floor\left[ {\left( {0.9 - rand\left( {0,1} \right) \cdot 0.25} \right) \cdot N} \right] $$
(12)
$$ N_{m} = N - N_{f} $$
(13)

In the SSO algorithm, the position of each \({f}_{i}\) is important. Therefore, lower and upper bounds are considered, between which \({f}_{i}\) is generated randomly. Denoting the lower and upper bound parameters by \({P}^{low}\) and \({P}^{high}\), the initial positions are given by:

$$ f_{i,j}^{0} = P_{j}^{low} + rand\left( {0,1} \right) \cdot \left( {P_{j}^{high} - P_{j}^{low} } \right) $$
(14)

where \(i = 1,2, \ldots ,N_{f}\) and \(j = 1,2, \ldots ,n\). Each \(m_{i}\) is also randomly created as:

$$ m_{i,j}^{0} = P_{j}^{low} + rand\left( {0,1} \right) \cdot \left( {P_{j}^{high} - P_{j}^{low} } \right) $$
(15)

where \(i\, = \,1,2, \ldots ,N_{m}\) and \(j\, = \,1,2, \ldots ,n\).

3.5.2 Fitness assignation

It should be noted that the weight of each spider is very important, as it affects the improvement of the solutions, the optimization of the network and, ultimately, the achievement of the main goal. In the presented model, a weight \({W}_{i}\) is assigned to the ith spider (irrespective of gender), indicating its quality within the population S. The weight of each spider is calculated as follows:

$$ w_{i} = \frac{{J\left( {s_{i} } \right) - worst_{s} }}{{best_{s} - worst_{s} }} $$
(16)

where \(J\left({s}_{i}\right)\) is the fitness value of spider \({s}_{i}\). Equation (17) defines the values of \({best}_{s}\) and \({worst}_{s}\) as:

$$ \begin{aligned} best_{s} & = max_{{k \in \left[ {1,2, \ldots ,N} \right]}} \left( {J\left( {s_{k} } \right)} \right) \\ worst_{s} & = min_{{k \in \left[ {1,2, \ldots ,N} \right]}} \left( {J\left( {s_{k} } \right)} \right) \\ \end{aligned} $$
(17)

3.5.3 Vibration modeling

The communal web is vital for the spiders because of what it makes possible, for example communication between spiders and sensing their distances from one another. The magnitude of a vibration carries meaning: a stronger vibration means a closer distance, and vice versa. To model the exchange of information between members i and j of the colony, the vibration is defined mathematically as follows:

$$ Vib_{ij} = w_{j} e^{{ - d_{ij}^{2} }} $$
(18)

where \({d}_{ij}\) is the Euclidean distance between members i and j of the colony. Spiders use these vibrations to perceive distance and to transfer information from member i to member j. Three types of vibration are considered, denoted \(\mathrm{Vib}{c}_{i}\), \(\mathrm{Vib}{b}_{i}\) and \(\mathrm{Vib}{f}_{i}\).

The individual i (\(s_i\)) receives the vibration \(\mathrm{Vib}{c}_{i}\) from the member c (\(s_c\)) that is nearest to i and has a higher weight than i (\(w_c > w_i\)).

$$ {\text{Vib}}c_{i} = w_{c} e^{{ - d_{i,c}^{2} }} $$
(19)

The individual i receives the vibration \(\mathrm{Vib}{b}_{i}\) from the member b (\(s_b\)), which has the best weight (best fitness value) in the whole population S.

$$ {\text{Vib}}b_{i} = w_{b} e^{{ - d_{i,b}^{2} }} $$
(20)

Finally, the vibration that member i receives from the nearest female individual \({s}_{f}\) is defined by \(\mathrm{Vib}{f}_{i}\) as:

$$ {\text{Vib}}f_{i} = w_{f} e^{{ - d_{i,f}^{2} }} $$
(21)

3.5.4 Female cooperative operator

Female spiders move by attraction or repulsion, governed by several random criteria; in this article a female spider is denoted \({f}_{i}\), irrespective of the movement type. A random number \({r}_{m}\) is generated uniformly in the range [0, 1]. When \({r}_{m}\) is smaller than a predetermined threshold PF, an attraction movement is performed; otherwise a repulsion movement is performed, as shown in Eq. (22).

$$ f_{i}^{t + 1} = \left\{ {\begin{array}{*{20}l} {f_{i}^{t} + \alpha \cdot {\text{Vib}}c_{i} \cdot \left( {s_{c} - f_{i}^{t} } \right) + \beta \cdot {\text{Vib}}b_{i} \cdot \left( {s_{b} - f_{i}^{t} } \right) + \delta \cdot \left( {rand - 0.5} \right)} & {\quad {\text{with probability }}pf} \\ {f_{i}^{t} - \alpha \cdot {\text{Vib}}c_{i} \cdot \left( {s_{c} - f_{i}^{t} } \right) - \beta \cdot {\text{Vib}}b_{i} \cdot \left( {s_{b} - f_{i}^{t} } \right) + \delta \cdot \left( {rand - 0.5} \right)} & {\quad {\text{with probability }}1 - pf} \\ \end{array} } \right. $$
(22)

where \(\alpha , \beta , \delta\) and rand are random numbers in [0, 1], t is the iteration number, and the individuals \({s}_{c}\) and \({s}_{b}\) denote, respectively, the nearest spider with a higher weight than \({f}_{i}^{t}\) and the best spider in the communal web.

3.5.5 Male cooperative operator

According to their weights, the male spiders fall into two groups: those with weights above the median weight of the male population (dominant, D) and those with weights below it (non-dominant, ND). The median male weight is denoted \({w}_{{N}_{f}+m}\). The position of \({m}_{i}\) is updated as:

$$ m_{i}^{t + 1} = \left\{ {\begin{array}{*{20}l} {m_{i}^{t} + \alpha \cdot {\text{Vib}}f_{i} \cdot \left( {s_{f} - m_{i}^{t} } \right) + \delta \cdot \left( {rand - 0.5} \right)} & {\quad {\text{if }}w_{{N_{f} + i}} > w_{{N_{f} + m}} } \\ {m_{i}^{t} + \alpha \cdot \left( {\frac{{\mathop \sum \nolimits_{h = 1}^{{N_{m} }} m_{h}^{t} \cdot w_{{N_{f} + h}} }}{{\mathop \sum \nolimits_{h = 1}^{{N_{m} }} w_{{N_{f} + h}} }} - m_{i}^{t} } \right)} & {\quad {\text{otherwise}}} \\ \end{array} } \right. $$
(23)

3.5.6 Mating operator

Mating takes place within a specific range between dominant males (D) and female spiders \({f}_{i}\). The mating radius is given by:

$$ r = \frac{{\mathop \sum \nolimits_{j = 1}^{n} \left( {P_{j}^{high} - P_{j}^{low} } \right)}}{2*n} $$
(24)

A spider's weight is directly related to its chance of producing offspring: the heavier the spider, the more likely it is to reproduce, and vice versa. Table 7 lists the parameters of the SSO algorithm.

Table 7 SSO Algorithm Parameters

The calculation method used in this paper is adapted from (Luque-Chang et al. 2018; Saravanan et al. 2019; Gülmez and Kulluk 2019).

The steps of the social spider algorithm are as follows (a code sketch follows the list):

  1. Consider N as the total size of the colony population; define the numbers of male (Nm) and female (Nf) spiders in the entire population S.

  2. Initialize the male and female members randomly and calculate the mating radius.

  3. Calculate the weight of every spider in S.

  4. Move the female spiders according to the female cooperative operator.

  5. Move the male spiders according to the male cooperative operator.

  6. Perform the mating operation.

  7. If the stop criterion is met, the process ends; otherwise, go back to step 3.
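Below is a highly simplified sketch of this loop: the mating step is omitted and all males simply move toward the best spider, so it should be read as an illustration of the mechanics (Eqs. (12), (14)–(16), (18), (22)) rather than a faithful implementation.

```python
# A simplified SSO sketch; `objective` maps a 1-D NumPy array to a float (MSE).
import numpy as np

def vib(w_j, d):                                   # Eq. (18): vibration strength
    return w_j * np.exp(-d ** 2)

def sso(objective, dim, N=50, iters=500, pf=0.7, lo=-1.0, hi=1.0):
    rng = np.random.default_rng(0)
    Nf = int(np.floor((0.9 - rng.random() * 0.25) * N))   # Eq. (12): females
    S = rng.uniform(lo, hi, (N, dim))                     # Eqs. (14)-(15)
    for _ in range(iters):
        J = np.apply_along_axis(objective, 1, S)
        # Eq. (16), adapted for minimization: lower error -> larger weight.
        w = (J.max() - J) / (J.max() - J.min() + 1e-12)
        b = int(J.argmin())                               # best spider s_b
        new_S = S.copy()
        for i in range(Nf):                               # female operator, Eq. (22)
            d = np.linalg.norm(S - S[i], axis=1)
            heavier = np.where(w > w[i])[0]
            c = int(heavier[d[heavier].argmin()]) if heavier.size else b
            pull = (rng.random() * vib(w[c], d[c]) * (S[c] - S[i])
                    + rng.random() * vib(w[b], d[b]) * (S[b] - S[i]))
            noise = rng.random() * (rng.random(dim) - 0.5)
            new_S[i] = S[i] + pull + noise if rng.random() < pf else S[i] - pull + noise
        for i in range(Nf, N):                            # male operator, simplified:
            d = np.linalg.norm(S - S[i], axis=1)          # move toward the best spider
            new_S[i] = S[i] + rng.random() * vib(w[b], d[b]) * (S[b] - S[i])
        S = np.clip(new_S, lo, hi)                        # keep positions in bounds
    J = np.apply_along_axis(objective, 1, S)
    return S[int(J.argmin())], float(J.min())
```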

4 Time series forecasting (ARMA and ARIMA)

A time series arises when an event or sequence is observed over successive time intervals (Hamilton 1994). Events can be examined at different frequencies, such as yearly, monthly, weekly, daily, hourly, or even by minutes or seconds. Using past data of a single variable to predict its future values is univariate time series forecasting, while predicting more than one series simultaneously is called multivariate time series forecasting (Granger et al. 1974; Reinsel 2003).

In the autoregressive integrated moving average (ARIMA) model, subsequent values of the variable are assumed to be a linear function of past observations and random errors (white noise), so the same equation can be used to predict future values (Zhang 2003). ARIMA can be used to model time series that are not stationary and do not show an obvious pattern.

An ARIMA model is identified by three components: (p, d, q) (Sowell 1992).

First of all, the time series should be made stationary, because the term "autoregressive" in ARIMA conveys that the model uses its own lags as predictors in a linear regression. We must also check whether the predictors are independent of each other, since correlation among them can affect the model.

Many methods exist for making a time series stationary; differencing is the most common (Clements and Hendry 2000). That is, the previous value is subtracted from the current value. Owing to the complexity of some series, more than one difference is sometimes needed. Thus, the value of d is the minimum number of differences required to make the series stationary. If the time series is already stationary, then d = 0 and no differencing is needed.

When particular lagged values of Yt are used as predictor variables, the model is called an autoregressive model, AR(p). Lags arise when the results of one time period affect the following periods.

The "p" value indicates the order. For instance, a first-order autoregressive process is written AR(1); its output variable at time t depends on the value of the previous time period (t − 1). The same holds for second- or third-order AR processes, which depend on data from two or three periods back.

An AR model is one where \({Y}_{t}\) is related only to its own lags, and it is written as (Tseng et al. 2001; Akaike 1998):

$$ Y_{t} = \alpha + \beta_{1} Y_{t - 1} + \beta_{2} Y_{t - 2} + \cdots + \beta_{p} Y_{t - p} + \varepsilon_{t} $$
(25)

where \(({Y}_{t-1}, {Y}_{t-2}, \ldots , {Y}_{t-p})\) are the past values of the series (lags), \({(\beta }_{1}, {\beta }_{2}, \ldots , {\beta }_{p})\) are the lag coefficients estimated by the model, and \(\alpha \) is the intercept term, also estimated by the model.

Similarly, a moving average model of order q, MA(q), is one where \({Y}_{t}\) depends only on lagged forecast errors (Said and Dickey 1984):

$$ Y_{t} = \alpha + \varepsilon_{t} + \emptyset_{1} \varepsilon_{t - 1} + \emptyset_{2} \varepsilon_{t - 2} + \cdots + \emptyset_{q} \varepsilon_{t - q} $$
(26)

where the errors \({\varepsilon }_{t}\) and \({\varepsilon }_{t-1}\) come from autoregressive models of the respective lags, as in Eqs. (27)–(28):

$$ Y_{t} = \beta_{1} Y_{t - 1} + \beta_{2} Y_{t - 2} + \cdots + \beta_{0} Y_{0} + \varepsilon_{t} $$
(27)
$$ Y_{t - 1} = \beta_{1} Y_{t - 2} + \beta_{2} Y_{t - 3} + \cdots + \beta_{0} Y_{t - n} + \varepsilon_{t - 1} $$
(28)

Both are obtained from the respective autoregressive models.

By combining AR and MA with at least one differencing, an ARIMA model is produced (Pai and Lin 2005), so the equation becomes:

$$ Y_{t} = \alpha + \beta_{1} Y_{t - 1} + \beta_{2} Y_{t - 2} + \cdots + \beta_{p} Y_{t - p} + \varepsilon_{t} + \emptyset_{1} \varepsilon_{t - 1} + \emptyset_{2} \varepsilon_{t - 2} + \cdots + \emptyset_{q} \varepsilon_{t - q} $$
(29)

The following diagram shows the flowchart of ARIMA model (Fig. 4).

Fig. 4
figure 4

ARIMA flowchart (Ma et al. 2018)

Additional explanations and more details are as follows (a code sketch follows the list):

  • Step 1 Check stationarity: if a time series has a trend or seasonality component, it must be made stationary before we can use ARIMA to forecast.

  • Step 2 Difference: if the time series is not stationary, it needs to be stationarized through differencing. Take the first difference, and then check for stationarity. Take as many differences as it takes. Make sure you check seasonal differencing as well.

  • Step 3 Filter out a validation sample: this will be used to validate how accurate our model is. Use a train/test split to achieve this.

  • Step 4 Select AR and MA terms: use the ACF and PACF to decide whether to include an AR term(s), MA term(s), or both.

  • Step 5 Build the model: build the model and set the number of periods to forecast to N (depends on your needs).

  • Step 6 Validate model: compare the predicted values to the actuals in the validation sample.
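A minimal sketch of these steps using the statsmodels library is shown below, assuming `close` is a pandas Series of daily closing prices; the variable names and hold-out length are illustrative.

```python
# A sketch of Steps 1-6: ADF stationarity check, hold-out split, ARIMA fit
# (d=1 differencing handled via `order`) and validation against actuals.
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

def fit_arima(close: pd.Series, order=(4, 1, 3), n_forecast=30):
    # Step 1: ADF test; a p-value above 0.05 suggests a unit root (non-stationary).
    p_value = adfuller(close.dropna())[1]
    print(f"ADF p-value on levels: {p_value:.4f}")
    # Step 3: hold out the last n_forecast points as a validation sample.
    train, valid = close[:-n_forecast], close[-n_forecast:]
    # Steps 2, 4-5: build the model; d=1 in `order` applies one difference.
    model = ARIMA(train, order=order).fit()
    # Step 6: compare the predicted values to the actuals in the hold-out.
    forecast = model.forecast(steps=n_forecast)
    mse = ((forecast.values - valid.values) ** 2).mean()
    return model, forecast, mse
```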

5 Experimental results and findings

The main goal of this study is to forecast stock prices by hybridizing an ANN with GA for feature selection and with two metaheuristic algorithms, BA and SSO, for improving the network. The five major indices DAX, S&P500, FTSE100, DJI and NASDAQ are studied in this research. The time interval considered is from 4 July 2018 to 4 July 2020, about 2 years. Important technical indicators such as RSI and MACD are employed as input variables and are then reduced to an optimal subset. In total, 20 technical indicators are used to predict the stock price, of which 19 variables serve as inputs and 1 variable is the output (target), determining the next day's price.

The first step, as described in Sect. 3, is data normalization. Data are normalized to [−1, 1] to be ready as input variables. Table 8 gives a general description of the indices, the timeframe and the number of observations used in this study.

Table 8 Statistical description of data

5.1 Artificial neural network (ANN)

As mentioned before, the ANN includes three layers. The features of the ANN used in this study are defined in Sect. 3.2. In summary, the input layer has 20 nodes, the output layer has 1, and the hidden-layer size varies by trial and error. The hidden layer uses the tangent sigmoid as its activation function and the output layer uses the simple linear function. The data set is divided into two sections: (1) training (70%) and (2) validation (30%). The LM algorithm is used for training, and the mean squared error (MSE) is adopted as the loss function. Information about the architecture, training and testing for each index is presented in Table 9.

Table 9 Training, validation and testing (T.V.T) error and network architecture

More information about training, validation and testing for the DJI index, for instance, is represented in Table 10 and Fig. 5. Other indices are presented in "Appendix A".

Table 10 The DJI index details (T.V.T)
Fig. 5
figure 5

Actual vs. output (testing) for DJI

5.2 GA-ANN algorithm

The GA is used for choosing the fittest set of input variables and the hidden-layer size for the ANN. The related parameters, including the population size, the number of generations, and the mutation and crossover rates, are described in Sect. 3.3. Using GA, the training, validation and testing errors, along with the network architecture, are determined as reported in Table 11.

Table 11 T.V.T error and network architecture after using GA

Accordingly, using GA the number of input variables can be decreased to 8, while R-squared is increased. The best individual corresponds to the best technical indicators the network could recognize. As is evident, a different number of input variables is selected for each index, owing to the difference in the importance and role of each technical indicator in the final price or target output (index). Details about the selected technical indicators are represented in the appendix (Table A5).

5.3 Bat algorithm (BA)

In this section, the parameters are optimized and the network is improved using the bat algorithm. The obtained results are illustrated in Table 12.

Table 12 Bat-ANN optimum parameters and error

5.4 SSO (social spider optimization) algorithm

In this part, the global best fitness and global best solution are checked after 1000 iterations, and the error is thus improved using SSO. First, the parameters are set to predetermined values and the network then optimizes them for minimum error. Table 13 indicates the optimal error and parameters.

Table 13 SSO-ANN optimum parameters and error

\(\alpha , \beta , \delta\) are random numbers in [0, 1]. The classical SSO requires the random selection of the parameters \(\alpha , \beta , \delta\) (Eqs. (22) and (23)) to control the movement of the spiders, which can affect the exploration–exploitation balance and lead the algorithm to premature convergence. Other details, including the ANN structure (i.e. the number of neurons in the input, hidden and output layers), the estimation error and the average optimal solutions, are also available.

According to this table, it can easily be seen that the error is considerably lower than that of the ANN and GA-ANN networks.

6 Time series forecasting (ARIMA)

Financial time series are usually not stationary; they have characteristics such as skewness and kurtosis with fat tails. Before anything else, it is necessary to check the stationarity of the series. In this research, the augmented Dickey–Fuller (ADF) test is used for this purpose. First, the stationarity of each index is checked separately. The correlogram of the DJI is shown in Fig. 6, and Table 14 shows the unit root test without differencing for the DJI.

Fig. 6
figure 6

Correlogram of closing price (DJI)

From Table 14, the t-statistic, i.e. −2.001110, is greater than the critical values at the various significance levels (1%, 5% and 10%). Thus, the series has a unit root and does not appear stationary. This problem is addressed by differencing the series.

Table 14 Unit root test without differencing (DJI)

After differencing, the series is stationary (Fig. 7); more details are given in Table 15.

Table 15 ADF test after differencing
Fig. 7
figure 7

Correlogram of closing price after differencing (DJI)

Now the series can be forecasted using ARIMA. Using EViews 10, the order of the ARIMA model is estimated. Table 16 shows the best model estimate, and the models used for criteria selection are summarized in Table 17. Figure 8 illustrates the Akaike information criterion, while the ARIMA forecasting summary is given in Table 18.

Table 16 ARIMA forecasting
Table 17 The models used to select criteria
Fig. 8
figure 8

Akaike information criteria (top 20 models)

Table 18 ARIMA forecasting summary

As is clear, the best selected ARIMA model is (4, 1, 3), with an AIC value of −2.695. The above process is carried out for all the indices, and the results are presented in "Appendix B".
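The order selection above was performed in EViews; the sketch below shows an analogous AIC-based grid search in Python with statsmodels, as an assumed reconstruction rather than the authors' exact procedure. `train` is a pandas Series of closing prices.

```python
# Rank ARIMA(p, 1, q) candidates by AIC, mirroring the "top 20 models" of Fig. 8.
import itertools
from statsmodels.tsa.arima.model import ARIMA

def select_order(train, max_p=4, max_q=4):
    results = []
    for p, q in itertools.product(range(max_p + 1), range(max_q + 1)):
        try:
            fit = ARIMA(train, order=(p, 1, q)).fit()
            results.append(((p, 1, q), fit.aic))
        except Exception:
            continue                    # some candidate orders may fail to converge
    results.sort(key=lambda t: t[1])    # lowest AIC first
    return results[:20]
```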

7 Comparing results

In this part, some similar studies are reviewed and the obtained results are compared with them, as illustrated in Table 19.

Table 19 Comparative study

It can be seen that the lowest loss function values and the highest R-squared are obtained using social spider optimization (SSO) and the bat algorithm (BA); these algorithms performed well.

8 Conclusions

Today, the speed of decision making has increased, and the stock market has experienced many fluctuations and much volatility. Different factors intensify the severity of these fluctuations, among them major economic, political and social changes. Moreover, with the coronavirus outbreak in late 2019, great fluctuations are expected in the stock market. Thus, using improved and well-equipped methodologies to confront these fluctuations is a necessity. One of the main tools that can help investors is artificial intelligence (AI), which has many applications, such as pattern recognition, regression and classification.

In the current study, the application of a standard ANN in forecasting stock prices is compared with a hybrid metaheuristic-based ANN. To forecast stock prices, a data set is employed to train and test an ANN. Then, a hybrid ANN is developed: a genetic algorithm is used for feature selection, and the bat algorithm and social spider optimization are then used separately to optimize the ANN parameters.

In this paper, five major indices, including DJI and DAX, are forecasted using an ANN, a subfield of AI. We used 20 main technical indicators as input variables. Many methods are used today to optimize such networks; one of them is evolutionary algorithms. We used GA as an evolutionary algorithm for feature selection and observed that it reduced the number of input variables significantly, so the speed of calculation, the accuracy of the network and the coefficient of determination increased. In addition, two recent metaheuristic algorithms, the social spider algorithm and the bat algorithm, were used to improve the results. The main advantages of using metaheuristic algorithms are as follows:

  • Speed up calculations

  • Reduce model complexity

  • Increase the network accuracy

  • Ease of using models

  • High robustness

  • Intelligent.

    On the other hand, they have some limitations:

  • In GA, there is no guarantee that the best and most related technical indicators have been selected.

  • We have tried to overcome the local optima trap but it is still possible.

Compared with the previous methods, SSO and BA produced the lowest errors, respectively, and could predict the stock price better. Although the error of the social spider algorithm was lower, this does not necessarily mean that the algorithm is better: given differences in computation time, computational complexity, required parameters and so on, we cannot say with certainty which one is superior. If error is taken as the measure of superiority, however, the social spider algorithm performed better. We also used a time series model, ARIMA, for stock price prediction. Because of the nonlinearity and asymmetry of stock price data, the ANN predicted the stock price better than the ARIMA time series model. The experiments show that hybrid models explain the data better, with lower error. Therefore, the main recommendation is to use different new metaheuristic algorithms to train the network.