1 Introduction

In time series forecasting, the past movements and realisations of a series are used to explain and predict its future behaviour through statistical models. Classical forecasting methods are usually based on linear models of the lagged variables of the time series. Artificial neural networks can successfully solve the forecasting problem by combining flexible nonlinear functions and using lagged variables as inputs. The multilayer perceptron (MLP) proposed by Rumelhart et al. (1986) is the most frequently used shallow artificial neural network in the literature for forecasting and classification problems. In recent studies, Borhani and Wong (2023) used an MLP artificial neural network to predict students' achievement. Shams et al. (2023) used an MLP artificial neural network to estimate air quality indexes. Park et al. (2023) used MLP to predict arsenate toxicity. Arumugam et al. (2024) trained an MLP artificial neural network with a crossover smell agent algorithm and used it with a convolutional neural network to detect brain tumours. Kumar et al. (2024) used MLP for the detection of vector-borne diseases. Shafiq et al. (2024) used MLP models to predict Darcy–Forchheimer tangent hyperbolic flow parameters. Chen et al. (2024b) proposed an MLP-based model for grey gas emissivity and absorptivity. In Chen et al. (2024a), MLP is one of the machine learning methods used for breast cancer diagnosis. Mariia (2024) used a three-layer MLP for yield forecasting and control of microclimate parameters. Jiang et al. (2024) presented a method that produces interval predictions by using MLP in combination with a deep neural network built with CNN.

While MLP is based only on the additive aggregation function, artificial neural networks based on the multiplicative aggregation function have also been proposed in the literature. The single multiplicative neuron (SMN) model artificial neural network proposed by Yadav et al. (2007) can solve the prediction problem with a single neuron as successfully as MLP. This finding motivated further investigation of artificial neural networks based on the multiplicative aggregation function, and many such networks have been proposed in the literature. Zhao and Yang (2009) used the particle swarm optimization algorithm; Burse et al. (2011) used an improved backpropagation algorithm; Worasucheep (2012) used the harmony search algorithm; Chatterjee et al. (2013) used standard backpropagation; Wu et al. (2013b) used online training algorithms; Cui et al. (2015) used an improved glowworm swarm optimization algorithm; Gundogdu et al. (2016) used PSO; Bas (2016) used the differential evolution algorithm; Nigam (2019) used the standard backpropagation learning algorithm; Kolay (2019) used the sine cosine algorithm; Yu et al. (2020) used the spherical search algorithm; Bas et al. (2020) used a hybrid algorithm based on artificial bat and backpropagation algorithms; and Egrioglu et al. (2023c) used a new genetic algorithm based on statistical replacement in the training of SMNM-ANN. Aladag (2013) used a multiplicative neuron model to establish fuzzy logic relationships. Wu et al. (2013a) proposed novel techniques based on the SMN model with iterated nonlinear filtering online training algorithms for engine system reliability prediction. Velásquez et al. (2013) proposed a hybrid model based on SARIMA and a multiplicative neuron model for electricity demand forecasting. Wu et al. (2015) used a single multiplicative neuron model with nonlinear filters for hourly wind speed prediction. Basiouny et al. (2017) proposed a Wi-Fi fingerprinting indoor positioning system which utilizes the single multiplicative neuron. Yildirim et al. (2021) proposed a threshold single multiplicative neuron artificial neural network based on PSO and the harmony search algorithm. Wu et al. (2021) used an online nonlinear state space forecasting model based on the SMN model. Pan et al. (2021) used a modified double multiplicative neuron network for time series interval prediction. Nigam and Bhatt (2023) proposed a single multiplicative neuron model for predicting crude oil prices and analyzing lag effects. Egrioglu and Bas (2023a) proposed a hybrid neural network combining simple exponential smoothing and the single multiplicative neuron model. Egrioglu et al. (2023a) proposed a new nonlinear causality test based on a single multiplicative neuron model artificial neural network. Kolay and Tunç (2023) proposed a new hybrid neural network classifier based on adaptive neurons and multiplicative neurons.

Shin and Ghosh (1991) proposed the pi-sigma artificial neural network, which is similar to MLP but uses a multiplicative aggregation function in the output layer. In the pi-sigma neural network, the weights between the hidden layer and the output layer are fixed, whereas in Egrioglu and Bas (2023b), where these weights are variable, it is shown that this modification improves forecasting performance compared with the pi-sigma ANN. Another artificial neural network that uses the multiplicative aggregation function is the sigma-pi artificial neural network, developed in Rumelhart and McClelland (1988) and Gurney (1989). Sarıkaya et al. (2023) trained the sigma-pi artificial neural network with the grey wolf optimisation algorithm and applied it to the forecasting problem. Nie and Deng (2008) proposed a hybrid genetic learning algorithm for the pi-sigma neural network. Hussain et al. (2008) proposed a recurrent pi-sigma neural network for physical time series prediction. Ghazali and Al-Jumeily (2009) applied pi-sigma neural networks to financial time series prediction. Husaini et al. (2011) used a backpropagation algorithm on historical temperature data of Batu Pahat. Husaini et al. (2012) studied the effects of parameters on the pi-sigma neural network for temperature forecasting. Panigrahi et al. (2013) used a modified differential evolution algorithm in the training of a pi-sigma neural network for pattern classification. Nayak et al. (2014) used a hybrid training algorithm based on PSO and GA for the pi-sigma neural network. Nayak et al. (2015) used gradient descent and genetic algorithm methods in the training of pi-sigma artificial neural networks. Akdeniz et al. (2018) proposed a new recurrent architecture for pi-sigma artificial neural networks. Egrioglu et al. (2019) proposed a new intuitionistic fuzzy time series method based on pi-sigma artificial neural networks trained by an artificial bee colony. Nayak (2020) used a fireworks algorithm in the training of PS-ANN. Panda and Majhi (2020) used an improved spotted hyena optimizer algorithm in the training of the pi-sigma neural network. Pattanayak et al. (2020) used hybrid chemical reaction optimization in the training of the pi-sigma neural network. Panda and Majhi (2021) used the salp swarm algorithm in the training of the pi-sigma neural network. Bas et al. (2021) used a sine cosine optimization algorithm in the training of a pi-sigma artificial neural network. Yılmaz et al. (2021) used a differential evolution algorithm in the training of pi-sigma artificial neural networks for forecasting. Kumar (2022) used a Lyapunov-stability-based context-layered recurrent pi-sigma neural network for the identification of nonlinear systems. Dash et al. (2023) used a shuffled differential evolution algorithm in the training of PS-ANN. Fan et al. (2023) proposed a new algorithm for pi-sigma neural networks with entropy error functions based on \(L_0\) regularization. Arslan and Cagcag Yolcu (2022) proposed an intuitionistic fuzzy time series forecasting model based on a hybrid sigma-pi neural network. Bas et al. (2023) proposed a robust algorithm based on PSO in the training of pi-sigma artificial neural networks for forecasting problems. Bas and Egrioglu (2023) proposed a new recurrent pi-sigma artificial neural network inspired by an exponential smoothing feedback mechanism for forecasting.

Another artificial neural network using the multiplicative neuron model is the dendritic neuron model artificial neural network (DNM-ANN) proposed in Todo et al. (2014), which takes different nonlinear transformations of all raw inputs and works on the transformed inputs, thereby embedding a data augmentation step in the model. The dendritic neuron model has been used in many studies in the literature for time series forecasting. Yu et al. (2016) used a dendritic neuron model for forecasting the house price index of China. Zhou et al. (2016) used a dendritic neuron model for time series forecasting. Chen et al. (2017) proposed a novel dendritic neuron model for tourism demand forecasting. Gao et al. (2018) used several popular artificial intelligence optimization algorithms in DNM-ANN for classification, approximation and prediction. Song et al. (2020) used a dendritic neuron model for wind speed time series forecasting. Jia et al. (2018) proposed a flexible forecasting methodology by combining a dendritic neuron model with a statistical test. Qian et al. (2019) proposed a novel mutual information-based dendritic neuron model for classification. Song et al. (2019) used a social learning particle swarm optimization algorithm in the training of the dendritic neuron model. Jia et al. (2020) used backpropagation, biogeography-based optimization and a competitive swarm optimizer in the training of DNM-ANN for classification. Han et al. (2020) used a whale optimization algorithm in the training of the dendritic neuron model for classification. Wang et al. (2020b) proposed a dendritic neuron model with adaptive synapses trained by a differential evolution algorithm. Wang et al. (2020a) used states of matter search in their proposed median dendritic neuron model for forecasting. Yu et al. (2021) used a dynamic scale-free network-based differential evolution in the training of DNM-ANN. Luo et al. (2021) proposed a decision-tree-initialized dendritic neuron model for classification. Xu et al. (2021) used an information feedback-enhanced differential evolution algorithm in the training of a dendritic neuron model. In He et al. (2021), the time series were decomposed by the seasonal-trend decomposition method and forecasts were obtained over the decomposed series with DNM-ANN. Tang et al. (2021) used an artificial immune system algorithm in the training of DNM-ANN. Nayak et al. (2022a) used chemical reaction optimization in the training of DNM-ANN for forecasting. Al-Qaness et al. (2022b) used a seagull optimization algorithm in the training of DNM-ANN for forecasting. Yilmaz and Yolcu (2022) used modified particle swarm optimization in the training of DNM-ANN for forecasting. He et al. (2022) used a coyote optimization algorithm in the training of DNM-ANN. Wang et al. (2022) proposed a novel dendritic convolutional neural network which considers the nonlinear information-processing functions of dendrites in a single neuron. In Al-Qaness et al. (2022a), DNM-ANN was used for crude oil production forecasting. In Nayak et al. (2022b), an improved chemical reaction optimisation algorithm-based dendritic neuron model was proposed for financial time series forecasting. In Egrioglu et al. (2022), a recurrent dendritic neuron model artificial neural network was proposed for the first time in the literature; the proposed network has a structure in which the error of the network is fed back.
Although the network of Egrioglu et al. (2022) produces successful prediction results, it is not a deep neural network and lacks the advantage of additional hidden layers for more flexible modelling. Deep artificial neural networks, which have proved very useful, especially in the field of image processing, have started to be preferred for solving the forecasting problem in recent years. Wang et al. (2023) used the Levenberg–Marquardt algorithm with error selection in the training of DNM-ANN. Yılmaz and Yolcu (2023) proposed a robust algorithm based on Huber's loss function in the training of the dendritic neuron model artificial neural network for forecasting problems. Egrioglu et al. (2023b) proposed a robust algorithm with Tukey's weight loss function based on particle swarm optimization (PSO) in the training of a winsorized dendritic neuron model artificial neural network for forecasting problems. Olmez et al. (2023) proposed a bootstrapped dendritic neuron model artificial neural network based on PSO for forecasting problems. Gul et al. (2023) proposed statistical learning algorithms for dendritic neuron model artificial neural networks for forecasting. Zhang et al. (2023) proposed a dendritic neuron model optimized by meta-heuristics for financial time series forecasting. Yuan et al. (2023) proposed a dendritic neuron model trained by an improved state-of-matter heuristic algorithm for forecasting. Cao et al. (2023) used an improved Adam optimizer to train a dendritic neuron model for water quality prediction. Bas et al. (2024) proposed a robust training algorithm for median dendritic artificial neural networks for time series forecasting.

Recurrent deep neural networks such as long short-term memory (LSTM) and gated recurrent unit (GRU) networks have been the most frequently used deep neural networks in the field of forecasting thanks to their time-step-based structure. A summary of the literature on deep neural networks is as follows. Jiang and Hu (2018) used an LSTM model for day-ahead price forecasting in the electricity market. Chung and Shin (2018) used LSTM based on a genetic algorithm for stock market prediction. Tian et al. (2018) used LSTM and convolutional neural network (CNN) methods for load forecasting. In Bendali et al. (2020), the GRU-GA model was proposed for the estimation of photovoltaic energy production. Veeramsetty et al. (2021) carried out a study on load forecasting using factor analysis and LSTM. Liu et al. (2021) combined the LSTM model with online social networks for stock price prediction. In Guo and Mao (2020), the GRU-GA model was proposed for the charging estimation of electric vehicles. Gundu and Simon (2021) used LSTM based on PSO for the short-term forecasting of heterogeneous time series electricity prices. Inteha (2021) performed day-ahead short-term load forecasting with the GRU-GA model. Ning et al. (2022) compared the performance of ARIMA, LSTM and Prophet methods for oil production forecasting. Karasu and Altan (2022) used the LSTM method for oil time series prediction. Bilgili et al. (2022) performed electricity consumption forecasting using LSTM. Liu et al. (2022) proposed a new deep learning forecasting method for satellite network traffic prediction by developing the GRU artificial neural network. Du et al. (2022) used LSTM based on particle swarm optimization for urban water demand. Gong et al. (2022) proposed an improved LSTM for the state-of-health estimation of lithium-ion batteries. Huang et al. (2022) used an LSTM model for well-performance prediction. In Liu et al. (2022), satellite network traffic prediction was performed using GRU-PSO. In Song et al. (2022), GRU-PSO was used for terminal cooling load estimation. Li et al. (2022) proposed a novel ensemble method based on a bidirectional GRU and a sparrow search algorithm to forecast production. Lin et al. (2022) used gated recurrent unit deep neural networks for time series-based groundwater level forecasting.

The literature shows that the application and development of both shallow and deep artificial neural networks for the forecasting problem continues. Deep ANNs have found wide application, especially since the modular implementations of networks such as CNN, LSTM and GRU in ready-made packages and libraries offer convenience to practitioners. However, in recent years it has been observed that shallow artificial neural networks based on different neuron models and architectures can produce more successful forecasts than deep neural networks. Consequently, the introduction of deep artificial neural networks built from different neuron models appears to be a promising subject of future studies for researchers developing artificial neural network architectures and models.

The motivation of this study is to contribute to the solution of the forecasting problem by proposing a deep recurrent artificial neural network based on the dendritic neuron model, a model that has achieved successful forecasting results in recent years. The contributions of the work are as follows. The proposed new deep recurrent artificial neural network is named the deep dendritic artificial neural network (DeepDenT). To create DeepDenT, a new "dendritic cell" structure is introduced. Dendritic cells, just like LSTM and GRU cells, work like mini neural networks that can receive many inputs and produce many outputs. To construct the dendritic cell, the DNM-ANN proposed by Todo et al. (2014) is modified into a multivariate dendritic neuron model (MDNM). DeepDenT incorporates a new architecture consisting of a hierarchical arrangement of dendritic cells. A training algorithm based on the differential evolution optimisation method proposed by Storn and Price (1997) is presented for training the DeepDenT neural network. The proposed training algorithm can escape local optima more easily thanks to its restart strategy and can mitigate overfitting thanks to its early stopping condition. Since the proposed training algorithm does not require derivatives of the objective function, it avoids the exploding and vanishing gradient problems encountered in networks such as LSTM.

The rest of the paper is organized as follows. The second section introduces the newly proposed MDNM artificial neural network. The third section introduces the "dendritic cell". The fourth section presents the DeepDenT artificial neural network and its training algorithm. The fifth section presents applications to stock market time series and comparison results with other methods in the literature. The last section discusses the advantages, possible improvements and limitations of the DeepDenT artificial neural network in light of the findings obtained in the applications.

2 MDNM artificial neural network

DNM-ANN was proposed by Todo et al. (2014) in a multi-input, single-output structure and can be used to obtain forecasts of a single time series. In this section, the structure of the DNM-ANN is extended to multiple outputs so that it can form a cell structure in a deep neural network, and the formulas for calculating the output of the network and the architecture of the MDNM are introduced. The architecture of the MDNM artificial neural network is given in Fig. 1. As can be seen from the figure, since more than one neuron is used in the output layer of the network, each additional output adds two parameters to the network. In addition, the training of such a network must take into account the total error over all outputs. The training problem of this network is beyond the scope of this study because the MDNM is proposed here solely as a building block for the "dendritic cell" structure.

Fig. 1
figure 1

The architecture of MDNM artificial neural network

What matters for this study is how the outputs of the MDNM neural network are generated for a given input set. The output of MDNM is given by the following equations. Synaptic functions for an MDNM with \(p\) inputs, \(m\) dendrites and \(n\) outputs are calculated as in Eq. (1).

$$Y_{ij}=\frac{1}{1+\exp\left(-k\left(w_{ij}\,Input_{i}+\theta_{ij}\right)\right)},\quad i=1,2,\dots,p;\ j=1,2,\dots,m$$
(1)

In Eq. (1), \(w_{ij}\) and \(\theta_{ij}\) denote the weights and biases, respectively, and \(k\) is the slope parameter of the synaptic function.

Dendritic functions are calculated by multiplying synaptic functions as in Eq. (2). The values of the dendrite function are products of different nonlinear transformations of the inputs.

$$Z_{j}=\prod_{i=1}^{p}Y_{ij},\quad j=1,2,\dots,m$$
(2)

Membrane functions are calculated by summing the dendritic functions as in Eq. (3). Finally, the output of the network is calculated as in Eq. (4).

$$V=\sum_{j=1}^{m}{Z}_{j}$$
(3)
$$Output_{l}=\frac{1}{1+\exp\left(-k_{soma}^{l}\left(V-\theta_{soma}^{l}\right)\right)},\quad l=1,2,\dots,n$$
(4)

In Eq. (4), \(k_{soma}^{l}\) and \(\theta_{soma}^{l}\) denote the slope and centralization parameters, respectively. The membrane value \(V\) is the same input signal for all outputs, but since different parameter values are used in the activation function of each output, different output values are obtained. Hence, the activation function parameters are essential for the network to produce distinct outputs. The total number of parameters in the MDNM artificial neural network is \(2pm+2n+1\), and the parameters of the network, together with their numbers of elements, are given in Table 1.
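
As a hedged illustration of Eqs. (1)–(4), the following Matlab sketch computes the output of an MDNM for a single input vector. The variable names and random initialisation are ours, chosen for this minimal example; this is not the authors' published code.

p = 4; m = 3; n = 2;                    % inputs, dendrites, outputs (example sizes)
Input = rand(p, 1);                     % a single input vector
W = randn(p, m); Theta = randn(p, m);   % synaptic weights w_ij and biases theta_ij
k = 5;                                  % slope of the synaptic function
k_soma = randn(n, 1); theta_soma = randn(n, 1);        % output-layer parameters

Y = 1 ./ (1 + exp(-k * (W .* Input + Theta)));         % Eq. (1), p x m (implicit expansion)
Z = prod(Y, 1);                                        % Eq. (2), 1 x m
V = sum(Z);                                            % Eq. (3), scalar
Output = 1 ./ (1 + exp(-k_soma .* (V - theta_soma)));  % Eq. (4), n x 1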

Table 1 The parameters (weights and biases) of MDNM

3 Dendritic cell

To create the DeepDenT deep artificial neural network, a new "dendritic cell (DnC)" structure is introduced. A DnC, just like an LSTM or GRU cell, works like a mini neural network that can receive many inputs and produce many outputs. The architectural structure of the DnC is given in Fig. 2.

Fig. 2
figure 2

The architecture of DnC

In a DnC with \(p\) features, \(h\) hidden layer units and \(m\) dendrites, the output is calculated by the following equations.

$$Y_{ij}^{1}=\frac{1}{1+\exp\left(-k\left(w_{ij}x_{t-i}+\theta_{ij}\right)\right)},\quad i=1,2,\dots,p;\ j=1,2,\dots,m$$
(5)
$$Y^{\left(1\right)}=\sigma \left(k\left(W_{x}\odot X_{t-1}+\theta_{x}\right)\right)$$
(6)

In Eq. (6), \(Y^{\left(1\right)}\) represents the matrix of synaptic function values calculated for the inputs, and \(\sigma(\cdot)\) represents the logistic activation function.

$$Y_{kj}^{2}=\frac{1}{1+\exp\left(-k\left(w_{kj}H_{t-1,k}+\theta_{kj}\right)\right)},\quad k=1,2,\dots,h;\ j=1,2,\dots,m$$
(7)
$$Y^{\left(2\right)}=\sigma \left(k\left(W_{h}\odot H_{t-1}+\theta_{h}\right)\right)$$
(8)

In Eq. (8), \({Y}^{\left(2\right)}\) represents the synaptic function values calculated for the recurrent connections.

In Eqs. (6) and (8), \(\odot\) represents the elementwise multiplication operation (Hadamard product), applied with broadcasting when the operands' dimensions differ. An example is given as follows for this product:

$$\left[\begin{array}{cc}a& d\\ b& e\\ c& f\end{array}\right]\odot \left[\begin{array}{cc}g& h\end{array}\right]=\left[\begin{array}{cc}a\times g& d\times h\\ b\times g& e\times h\\ c\times g& f\times h\end{array}\right]$$
(9)

The dendritic function values are calculated by using Eq. (10) or, in matrix form, Eq. (11).

$$Z_{j}=\left(\prod_{i=1}^{p}Y_{ij}^{1}\right)\left(\prod_{k=1}^{h}Y_{kj}^{2}\right),\quad j=1,2,\dots,m$$
(10)
$$Z=Y^{\left(1\right)}\circledast Y^{\left(2\right)}$$
(11)

\(\circledast\) denotes the column-product operation: each element of the result is the product of all elements in the corresponding columns of both operands. An example is given as follows for this product:

$$\begin{bmatrix}a&d\\b&e\\c&f\end{bmatrix}\circledast\begin{bmatrix}g&h\\i&j\end{bmatrix}=\begin{bmatrix}a\times b\times c\times g\times i&d\times e\times f\times h\times j\end{bmatrix}$$
(12)
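
Both matrix operations can be realised in a few lines of Matlab. This is a minimal sketch of ours with illustrative variable names, not code from the authors' repository:

A = [1 4; 2 5; 3 6]; r = [10 20];   % 3 x 2 matrix and 1 x 2 row vector
H = A .* r;                         % Eq. (9): r is broadcast over the rows of A, 3 x 2
B = [7 9; 8 10];                    % 2 x 2 matrix
Zc = prod(A, 1) .* prod(B, 1);      % Eq. (12): product of all elements in matching columns, 1 x 2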

The membrane function value is calculated as follows:

$$V=\sum_{j=1}^{m}{Z}_{j}$$
(13)

The elements of the output of the DnC are calculated by using (14).

$$Output_{l}=\frac{1}{1+\exp\left(-k_{soma}^{l}\left(V-\theta_{soma}^{l}\right)\right)},\quad l=1,2,\dots,n$$
(14)
$${H}_{t}=[{Output}_{1} {Output}_{2} \dots {Output}_{n}]$$
(15)
$$H_{t}=\sigma \left(k_{soma}\odot \left(V-\theta_{soma}\right)\right)$$
(16)

Counting the weights and biases defined in Eqs. (5)–(16), the number of parameters in a DnC is \(2pm+2hm+2n+1\). The dimensions of the weights and biases in a dendritic cell are listed in Table 2.
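
The following Matlab sketch traces one DnC forward pass through Eqs. (5)–(16) with randomly initialised parameters; the variable names are illustrative assumptions made for this sketch, not the authors' code.

p = 4; h = 3; m = 5; n = 3;          % features, recurrent units, dendrites, outputs
sigma = @(x) 1 ./ (1 + exp(-x));     % logistic activation
X  = rand(1, p);                     % current inputs X_{t-1}
Hp = rand(1, h);                     % previous cell output H_{t-1}
Wx = randn(p, m); Tx = randn(p, m);  % input weights W_x and biases theta_x
Wh = randn(h, m); Th = randn(h, m);  % recurrent weights W_h and biases theta_h
k = 5; k_soma = randn(1, n); theta_soma = randn(1, n);

Y1 = sigma(k * (Wx .* X' + Tx));     % Eq. (6): p x m (X transposed to broadcast over dendrites)
Y2 = sigma(k * (Wh .* Hp' + Th));    % Eq. (8): h x m
Z  = prod(Y1, 1) .* prod(Y2, 1);     % Eq. (11): 1 x m
V  = sum(Z);                         % Eq. (13): scalar
Ht = sigma(k_soma .* (V - theta_soma));   % Eqs. (14)-(16): 1 x n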

Table 2 The parameters (weights and biases) of a DnC

The dimensions of calculated vectors and matrices in a DnC are given in Table 3.

  • \({X}_{t-1}:1\times p\)

  • \({H}_{t}:1\times n\)

  • \({Y}^{\left(1\right)}:p\times m\)

  • \({Y}^{\left(2\right)}:h\times m\)

  • \(Z:1\times m\)

  • \(V:1\times 1\)

Table 3 The dimensions of calculated vectors and matrices

4 DeepDenT artificial neural network and its training algorithm

The DeepDenT artificial neural network is a deep recurrent artificial neural network that combines DnCs. In the output layer of DeepDenT, there is a classical fully connected (FC) layer based on the additive aggregation function. DeepDenT is a partially connected artificial neural network with DnCs in a sequential and hierarchical structure. The architectural structure of DeepDenT is given in Fig. 3.

Fig. 3
figure 3

The architecture of DeepDenT

The DnC drawn with a dark cell border is the last cell computed before the output. The input of a DnC in DeepDenT is a lagged-variable vector \(x_{t}=({y}_{t},{y}_{t-1},\dots,{y}_{t-p+1})\), positioned according to the hidden layer node in Fig. 3. For example, the input of the DnC in the lower left corner of the architecture is \(x_{t-h}=({y}_{t-h-1},{y}_{t-h-2},\dots,{y}_{t-h-p})\). The output of the DeepDenT deep recurrent artificial neural network is the one-step-ahead forecast of the time series. The architecture in Fig. 3 has \(h\) time steps, \(q\) hidden layers, \(m\) dendrites and \(p\) inputs or features. Each hidden layer contains \(h\) cells, one per time step. The weight and bias values of all DnCs in the same hidden layer are shared. This parameter sharing reduces the number of parameters and yields a common DnC that represents the same mathematical model at all time steps. The weights and biases differ between hidden layers; that is, increasing the number of hidden layers increases the number of parameters of the network, while, as in LSTM and GRU, the number of time steps does not affect the number of parameters. The output of DeepDenT is calculated with the following formulas. For ease of presentation, the parameters of DeepDenT for a hidden layer are combined into the single parameter set given in Eq. (17). As can be seen in Eq. (17), the parameters change from one hidden layer to another but are shared across the units (time steps) of the same hidden layer.

$${\Theta }^{j}=\left\{W_{x}^{j},\theta_{x}^{j},W_{h}^{j},\theta_{h}^{j},k_{soma}^{j},\theta_{soma}^{j},k^{j}\right\},\quad j=1,2,\dots,q$$
(17)

The output of the first hidden layer of DeepDenT is calculated by Eq. (18). Here, \(f\) is a shorthand for the DnC computations given in Sect. 3.

$$h_{t-k}^{1}=f\left(h_{t-k-1}^{1},x_{t-k},{\Theta }^{1}\right),\quad k=1,2,\dots,h$$
(18)

In Eq. (18), \(h_{t-k}^{1}\) denotes the output obtained in the first hidden layer for the \(k\)th time step at time \(t\). From the second hidden layer onwards, the calculations are performed with Eq. (19) until the output of DeepDenT is obtained.

$$h_{t-k}^{j}=f\left(h_{t-k-1}^{j},h_{t-k}^{j-1},{\Theta }^{j}\right),\quad k=1,2,\dots,h;\ j=2,\dots,q$$
(19)

The computation of DnCs in DeepDenT proceeds hidden layer by hidden layer and, within the same hidden layer, from left to right. The output of the DnC shown in dark colour in Fig. 3 is \(h_{t-1}^{q}\). The final output of DeepDenT is the output of the FC layer, given by Eq. (20).

$${\widehat{y}}_{t}=\sigma \left({W}_{FC}{h}_{t-1}^{q}+{b}_{FC}\right)$$
(20)
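
A hedged Matlab sketch of the full forward pass in Eqs. (17)–(20) is given below. The helper function dnc is a name we introduce for the DnC computations of Sect. 3, the parameters are randomly initialised, and the recurrent state dimension is assumed equal to the cell output size \(n\); this is an illustrative reconstruction, not the authors' implementation.

p = 4; h = 3; q = 2; m = 5; n = 3;   % features, time steps, layers, dendrites, cell outputs
sigma = @(x) 1 ./ (1 + exp(-x));
X = rand(h, p);                      % row k holds the lagged input vector x_{t-k}
Theta = cell(1, q);
for j = 1:q                          % shared layer parameters, Eq. (17)
    d = p; if j > 1, d = n; end      % input size of layer j
    Theta{j} = struct('Wx', randn(d, m), 'Tx', randn(d, m), ...
        'Wh', randn(n, m), 'Th', randn(n, m), ...
        'k', 5, 'ks', randn(1, n), 'ts', randn(1, n));
end
W_FC = randn(1, n); b_FC = randn;

H = cell(q, h + 1);
for j = 1:q, H{j, h + 1} = zeros(1, n); end   % zero initial states h^j_{t-h-1}
for j = 1:q                                   % layer by layer
    for s = h:-1:1                            % earliest time step first
        inp = X(s, :);
        if j > 1, inp = H{j - 1, s}; end      % Eq. (19): previous layer's output
        H{j, s} = dnc(H{j, s + 1}, inp, Theta{j}, sigma);   % Eqs. (18)-(19)
    end
end
y_hat = sigma(W_FC * H{q, 1}' + b_FC);        % Eq. (20): one-step-ahead forecast

function Ht = dnc(Hprev, x, P, sigma)         % one DnC, Eqs. (6)-(16)
Y1 = sigma(P.k * (P.Wx .* x' + P.Tx));
Y2 = sigma(P.k * (P.Wh .* Hprev' + P.Th));
V  = sum(prod(Y1, 1) .* prod(Y2, 1));
Ht = sigma(P.ks .* (V - P.ts));
end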

Having presented the computational formulas of DeepDenT, the remaining problem is the training algorithm for this network. The training algorithm of DeepDenT based on the differential evolution optimisation (DEO) method is given step by step in the following Algorithm.

Algorithm
figure a

DeepDenT's training algorithm based on the differential evolution optimisation method.
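
Since the Algorithm figure only outlines the procedure, the following Matlab sketch shows a generic DE/rand/1/bin loop with a restart strategy of the kind described above. It is an illustrative reconstruction under our own assumptions, not the exact published algorithm (see the GitHub repository for that); loss is assumed to map a parameter vector to the training RMSE of DeepDenT.

function best = de_train(loss, dim, N, G, F, CR, patience)
% Generic differential evolution with restarts; illustrative sketch only.
pop = 2 * rand(N, dim) - 1;                 % initial population in [-1, 1]
fit = zeros(1, N);
for i = 1:N, fit(i) = loss(pop(i, :)); end
[bestf, bi] = min(fit); best = pop(bi, :); stall = 0;
for g = 1:G
    for i = 1:N
        r = randperm(N, 4); r(r == i) = []; r = r(1:3);        % three other individuals
        v = pop(r(1), :) + F * (pop(r(2), :) - pop(r(3), :));  % mutation
        mask = rand(1, dim) < CR; mask(randi(dim)) = true;     % binomial crossover
        u = pop(i, :); u(mask) = v(mask);
        fu = loss(u);
        if fu < fit(i), pop(i, :) = u; fit(i) = fu; end        % greedy selection
    end
    [mf, bi] = min(fit);
    if mf < bestf, bestf = mf; best = pop(bi, :); stall = 0;
    else, stall = stall + 1; end
    if stall >= patience                    % restart: keep the best individual,
        pop = 2 * rand(N, dim) - 1;         % re-randomise the rest of the population
        pop(1, :) = best;
        for i = 1:N, fit(i) = loss(pop(i, :)); end
        stall = 0;
    end
    % an early stopping check on validation error would be placed here (omitted)
end
end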

The algorithm of the proposed method was coded in Matlab and shared publicly on GitHub at https://github.com/erole1977/DeepDenT. This code can be used to reproduce the reported results. Numerical differences may arise when the results are recomputed, since the initial random weights depend on the system clock of the computer at the time the code is run; however, the ranking of the methods is not expected to change.

For a single architecture, the computation time for a time series with 250 observations varies between 2.07 and 5.02 s and can be affected by the random initial values. Together with hyperparameter optimization, the total computation time for a time series with 250 observations is between 20 and 22 min. A personal computer (12th Gen Intel(R) Core(TM) i5-12500H 2.50 GHz processor with 16 GB RAM) was used in the calculations.

5 Applications

In the application, the performance of the proposed method is investigated on a total of 20 time series from two stock market indices, one from Turkey and one from the USA. The first time series analysed is the S&P500 index (S&P 500 (GSPC), SNP—SNP Real-Time Price, currency in USD). The series given in Table 4, consisting of opening values between 2014 and 2018, were randomly selected for use in the application. The lengths of the time series were taken as 250 and 500 observations, covering approximately one and two years, respectively. Thanks to the random selection, opening values from different periods of the year are used as test data in the comparison.

Table 4 The random S&P500 time series observation dates and numbers

The performance of the proposed method is compared with some popular and recent ANN methods and some classical forecasting methods. The LSTM proposed in Hochreiter and Schmidhuber (1997), the pi-sigma ANN (PSGM) proposed in Shin and Ghosh (1991) and the bootstrapped hybrid ANN (B-HANN) proposed in Egrioglu and Fildes (2022) were used in the comparison. As classical forecasting methods, the random walk and Holt's linear trend exponential smoothing method were used.

For all time series used in the application, the data set is divided into three blocks: training, validation and test data. The parameters of each method were estimated on the training data, and forecasting performance was calculated on the validation and test data. For all methods, similar candidate hyperparameter values were considered and the best hyperparameter values were selected on the validation set. With the best hyperparameter values, each method was trained on the data outside the test set using 30 different random initialisations, yielding 30 test set performances. The statistics of the resulting RMSE values, defined in Eq. (24), are presented in the tables.

$$RMSE=\sqrt{\frac{1}{n_{test}}\sum_{t=1}^{n_{test}}\left(y_{t}-\widehat{y}_{t}\right)^{2}}$$
(24)

The following direction accuracy (DA) criterion is used to measure the directional accuracy of the methods. The DA criterion is calculated for the best architecture and given alongside the RMSE statistics in the tables.

$$DA=\frac{1}{n_{test}}\sum_{t=1}^{n_{test}}a_{t},\quad a_{t}=\begin{cases}1,& \left(x_{t+1}-x_{t}\right)\left(\widehat{x}_{t+1}-x_{t}\right)>0\\ 0,& \text{otherwise}\end{cases}$$
(25)

where \({x}_{t}\) is the value of the time series at time t and \({\widehat{x}}_{t}\) is the forecast value at time t.
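
For concreteness, both criteria can be computed in a few lines of Matlab. In this sketch (our variable names, not the authors' code), y holds the observed test values and yhat the corresponding one-step-ahead forecasts:

rmse = sqrt(mean((y - yhat).^2));                            % Eq. (24)
up = (y(2:end) - y(1:end-1)) .* (yhat(2:end) - y(1:end-1));  % direction agreement terms
da = mean(up > 0);                                           % Eq. (25)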

The RMSE statistics for the test data forecasting performance obtained for the time series given in Table 4 are presented in Table 5, and the best hyperparameter values of all methods are given in Table 6.

Table 5 Statistics of RMSEs calculated for the test set from the S&P500 time series
Table 6 Best hyperparameter values of the methods applied for the S&P500 time series
Table 7 The random BIST100 time series observation dates and numbers

Table 5 shows that DeepDenT has superior forecasting performance compared to all other methods, achieving the lowest average RMSE value in 8 of the 10 time series (80%). In particular, it produces RMSE results with both a lower mean and a lower standard deviation on the test set than the LSTM method, the most popular deep ANN for the forecasting problem. It can be concluded that DeepDenT is more successful than the other methods for the S&P500 time series and should be preferred.

When the LSTM, PSGM, B-HANN and DeepDenT methods are compared according to the DA criterion, no method is clearly superior; in general, the methods have a directional accuracy between 50 and 65%. DeepDenT has the highest directional accuracy in 30% of the S&P500 series and is the second-best method after LSTM in terms of directional accuracy. It should be noted, however, that the DA criterion is not meaningful on its own, as it only measures the direction of the forecast.

The second application was carried out on the opening values of the Borsa Istanbul 100 index (BIST100) between 01/02/2014 and 09/02/2018. The random series obtained for BIST100 are given in Table 7.

The RMSE statistics for the test data forecasting performance obtained for the time series given in Table 7 are given in Table 8 and the best hyperparameter values of all methods are given in Table 9.

Table 8 Statistics of RMSEs calculated for the test set from the BIST100 time series
Table 9 Best hyperparameter values of the methods applied for the BIST100 time series

Table 8 shows that DeepDenT has superior forecasting performance compared to all other methods, achieving the lowest average RMSE value in 5 of the 10 time series (50%). In particular, it produces RMSE results with both a lower mean and a lower standard deviation on the test set than the LSTM method. It can be concluded that DeepDenT is, overall, more successful than the other methods for the BIST100 time series and should be preferred.

When the LSTM, PSGM, B-HANN and DeepDenT methods are compared according to the DA criterion, no method is clearly superior; in general, the methods have a directional accuracy between 50 and 65%. DeepDenT has the highest directional accuracy in 40% of the BIST100 series and, together with LSTM, is the best method in terms of directional accuracy.

In Fig. 4, box plots of the RMSE values calculated over the test sets of both stock market data sets are given for all methods. Although the RMSE values of the methods are not normally distributed, DeepDenT clearly has the lowest median.

Fig. 4
figure 4

Box-plot graph of RMSE for test sets from S&P500 and BIST 100 for all methods

6 Conclusions and discussion

In this study, a new deep artificial neural network, DeepDenT, is proposed to solve the forecasting problem. In addition, a training algorithm based on the differential evolution algorithm is proposed for DeepDenT. Since the proposed training algorithm includes a restart strategy and an early stopping condition, it produces successful training results for DeepDenT. The performance of DeepDenT is investigated on 20 time series obtained from two stock exchanges. The applications show that the proposed method produces successful results compared with both popular and recent artificial neural networks and classical forecasting methods. Investigating the performance of the proposed network under training algorithms based on different artificial intelligence optimization methods is one topic for future work. Another future study is to transform the proposed network into a fully automatic forecasting method; for this purpose, input significance tests and other statistical tools are planned to be used.