1 Introduction 

Predicting stock market performance is an important facet of financial markets due to the large amounts of money involved. This predictability has been studied extensively, and various hypotheses have been applied, such as the efficient market hypothesis (EMH), which states that the stock market is not predictable. Instead, EMH suggests that the stock market is efficient and, therefore, that it is not possible to outperform the market [1]. Fama, an American economist, studied EMH and conducted empirical work on it in the 1970s [2]. After Fama presented his study, “Efficient capital markets: A review of theory and empirical work,” many responded with research papers either supporting or opposing EMH [16]. Those who criticize EMH believe that stock market prices are unreliable and mispriced as a result of noise caused by momentum traders, speculators, and insiders. Fundamental and technical analysis are the two primary approaches to making investment decisions in stock markets.

1.1 Fundamental Analysis

Fundamental analysis assumes the stock market is not perfectly efficient; it therefore aims to determine the intrinsic value of a security and compare it against the actual present value, so that mispriced securities are detected [7]. Fundamental analysis supports the idea that, over time, stock prices will correct themselves. As a result, many investors go long on undervalued securities while going short on overvalued securities to balance out the prices. Fundamentalists use financial statements to conduct their analysis using ratios such as the debt-to-equity ratio (DER), market-to-book ratio (MBR), gross profit margin (GPM), and quick ratio (QR) [8]. Furthermore, the financial ratios of companies within the same sector are weighed against each other to arrive at an adequate measure of the financial status of a company. Fundamental analysis is also affected by macroeconomic variables, such as gross domestic product, the unemployment rate, and the inflation rate. If the financial ratios are conditioned on macroeconomic variables, then the effect of macroeconomic variables on the intrinsic values of securities can be investigated [7]. The state of the economy, whether expanding or contracting, also affects investors’ analyses [8]. Overall, fundamental analysis derives its numbers from firm-specific, industry-specific, and economy-related measures to determine the intrinsic value of a security. All three of these sources help examine the static values of stock prices.

In a fundamental analysis based on the financial ratios of companies, a simple regression model for predicting prices using the four financial ratios DER, MBR, GPM, and QR can be written as follows:

$${P}_{t+1}=\alpha +{\beta }_{1}{\mathrm{DER}}_{t}+{\beta }_{2}{\mathrm{MBR}}_{t}+{\beta }_{3}{\mathrm{GPM}}_{t}+{\beta }_{4}{\mathrm{QR}}_{t}+{\theta }_{t}$$
(1)

where \(P_{t+1}\) represents the firm’s share price at the next time step; \(\alpha\), \(\beta_{1}\), \(\beta_{2}\), \(\beta_{3}\), and \(\beta_{4}\) are the regression parameters to be estimated; and \(\theta_{t}\) is an error term. The effect of different financial ratios on prices can then be examined. More common financial ratios used in the literature for the purpose of fundamental analysis can be found in [9–11].
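For illustration, the following sketch estimates the parameters of Eq. (1) by ordinary least squares; the ratio values, coefficients, and sample size are synthetic placeholders, not the data used in this study.

```python
import numpy as np

# Hypothetical data: four financial ratios at time t and the share
# price at t+1. All names and values here are illustrative only.
rng = np.random.default_rng(0)
n = 40
DER, MBR, GPM, QR = rng.random((4, n))
price_next = 5 + 0.8 * DER - 1.2 * MBR + 2.0 * GPM + 0.5 * QR + rng.normal(0, 0.1, n)

# Design matrix with an intercept column for alpha; solve Eq. (1)
# by ordinary least squares.
X = np.column_stack([np.ones(n), DER, MBR, GPM, QR])
coef, *_ = np.linalg.lstsq(X, price_next, rcond=None)
alpha, b1, b2, b3, b4 = coef
print(f"alpha={alpha:.3f}, betas={b1:.3f}, {b2:.3f}, {b3:.3f}, {b4:.3f}")
```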

It should be noted that the financial ratios utilized in fundamental analysis are collected from the annual statements of the companies. As shown by Devine [12], quarterly financial data may contain information inconsistent with that derived from annual financial data, and financial decision makers are advised to base their decisions on annual rather than quarterly reported financial ratios. Moreover, annual financial statements report some financial ratios which are not necessarily reported in a quarterly statement, including the annual capital ratio factor and annual leverage.

1.2 Technical Analysis

Technical analysis is unlike fundamental analysis in predicting stock prices, since it is based on the belief that stock prices already reflect the information in financial statements. As a result, the intrinsic value of a security is taken to be the same as its market value. Technical analysis uses pattern recognition to project short-term stock prices from previous data. Technicians study active changes in stock prices as well as chart trends to exploit short-term trading opportunities, anywhere from a minute-by-minute basis to a weekly basis. They review past data since history is inclined to repeat itself [5]. Nonlinear patterns are extracted from the noise to locate the regularities and symmetries in the time series of stock prices [4, 13]. Much of the research on pattern recognition within stock prices has examined historical patterns tied to calendar periods, such as the weekend effect, the turn of the month, holidays, and the first month of a new year [5]. Not only are historical prices studied, but also the volume of trades as the second market action; historical trading volumes can assist in predicting stock prices [14].

Even though technical analysis is as old as the stock market itself, it has been criticized for its subjective nature. Comparisons between the visuals of technical analysis and human cognition suggest that pattern recognition is an area in which computers hold no particular advantage over the visual ability of a person [15]. In return, technical analysis is beneficial in that it can assess the dynamics of stock prices, which is challenging for humans [15]. Technical analysis encapsulates the trading strategies created by practitioners from their own experiences in trading.

The predictability of stock prices has been the subject of considerable debate among researchers and analysts in the past few decades [2, 16, 17]. However, this analysis is complex since the stock market is dynamic and influenced by many factors. The stochastic process of stock prices is noisy, dynamic, nonlinear, nonparametric, nonstationary, and chaotic in nature [18]. The characteristics of the stochastic process of stock prices, as well as their definitions in mathematical terms, are presented in Table 1.

Table 1 Complexity of the stochastic process of stock prices and definitions

Moreover, the stochastic process of stock prices is nonstationary. Stochastic processes may be strongly stationary, weakly stationary, or nonstationary. Let \(\{X_{t}; t\in \mathbb{Z}\}\) be a stochastic process, and let \(\{t_{1},t_{2},\dots ,t_{s}\}\subset \mathbb{Z}\), \(s\in {\mathbb{Z}}^{+}\), be a finite set of time indices. Define the two vectors (1) \((X_{t_{1}},X_{t_{2}},\dots ,X_{t_{s}})\) and (2) \((X_{t_{1}+k},X_{t_{2}+k},\dots ,X_{t_{s}+k})\), where \(k\in \mathbb{Z}\). The stochastic process \(X_{t}\) is strongly stationary if the joint distributions of (1) and (2) are equal. It is weakly stationary if \(\mathsf{E}{|X_{t}|}^{2}<\infty\ \forall t\), \(\mathsf{E}(X_{t})=\mu\ \forall t\), and \(\mathrm{cov}(X_{t_{1}},X_{t_{2}})=\mathrm{cov}(X_{t_{1}+k},X_{t_{2}+k})\ \forall t_{1},t_{2},k\). The stochastic process of stock prices is nonstationary, since the statistical properties of the time series of stock prices change over time. The random walk model has been tested in the literature for modeling the time series of the stochastic process of stock prices [19] as follows:

$$X_{t}=\alpha +X_{t-1}+{\upvarepsilon }_{t}$$
(2)

where the error term \(\varepsilon_{t}\) is white noise with mean 0 and variance \(\sigma^{2}\), \(\varepsilon_{t}\sim \mathrm{WN}(0,\sigma^{2})\). The statistical properties of the error term are \(\mathrm{E}(\varepsilon_{t})=0\), \(\mathrm{var}(\varepsilon_{t})=\sigma^{2}<\infty\), and \(\mathrm{cov}(\varepsilon_{t},\varepsilon_{t-k})=0\ \forall t,k\) (\(k\neq 0\)). It should be pointed out that, for the purpose of prediction, the volatility of financial time series, as one of the characteristics of their complexity, needs to be taken into consideration. In a recent work, Mantalos et al. [20] study skewness as one of the causes of deviation from normality in financial time series. They propose a test procedure for skewness in autoregressive conditional volatility models which is applicable to financial datasets and capable of detecting the volatility of these datasets.
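A minimal simulation of the random walk of Eq. (2) makes the nonstationarity concrete; the sketch below (assuming NumPy and statsmodels are available) applies an augmented Dickey–Fuller test to the simulated levels and to their first differences.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
alpha, sigma = 0.05, 1.0

# Simulate Eq. (2): X_t = alpha + X_{t-1} + eps_t, eps_t ~ WN(0, sigma^2).
eps = rng.normal(0.0, sigma, size=1000)
x = np.cumsum(alpha + eps)

# The levels of a random walk should look nonstationary (large p-value),
# while the first differences should look stationary (small p-value).
print("p-value, levels:     ", adfuller(x)[1])
print("p-value, differences:", adfuller(np.diff(x))[1])
```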

Neural network-based methods, including the multilayer perceptron, have become popular techniques for forecasting stock movements due to their ability to learn tasks from data through a learning algorithm [21]. The multilayer feedforward perceptron neural network (MLP) is appropriate for predicting stock prices and outperforms statistical techniques such as regression for the following four reasons. First, it is numeric and fits financial datasets. Second, no distributional assumption on the input data is required. Third, no explicit model formulation is required. Fourth, it can predict unseen data without reprocessing the training data. However, determining an MLP architecture, including the learning algorithm, the number of layers, the number of neurons in each layer, and the (nonlinear) activation functions between the layers, is complicated. Different network architectures result in different generalization errors; therefore, generalization error can be used as a measure to evaluate the performance of the network. An appropriate network architecture can be achieved through hyper-parameter optimization [22].

In this paper, we use a multilayer feedforward perceptron neural network (MLP) to predict short-term stock trends based on fundamental and technical analysis in isolation. Next, we design a hybrid model using a nonlinear autoregressive network with exogenous input (NARX) that integrates the firm’s financial ratios and historical stock prices to predict future stock movements. The main goal of this study is to explore the predictability of short-term stock trends of Technology companies listed on NASDAQ based on fundamental and technical analysis using MLPs, as well as an approach for combining fundamental and technical analysis using a NARX neural network. It is shown that the hybrid model is beneficial for stock market participants, in the sense of producing returns greater than a buy-and-hold (B&H) strategy.

The remainder of this paper is organized as follows. In Sect. 2, we conduct a brief review of related research. Section 3 introduces the multilayer feedforward perceptron neural network (MLP); Sect. 4 defines the input and output datasets as well as the data preprocessing methods; Sect. 5 proposes our fundamental, technical, and hybrid models; Sect. 6 presents the results and discusses the findings; Sect. 7 compares the proposed hybrid model with state-of-the-art models; and Sect. 8 provides the conclusions to the work and offers future directions.

2 Literature Review

A voluminous literature has examined the ability of soft computing techniques, including neural networks [23–25], fuzzy systems [26], neuro-fuzzy systems [27], genetic algorithms [24, 28], and particle swarm optimization [29, 30], to predict future stock movements, as follows:

Roman and Jameel [25] use backpropagation and recurrent neural networks to predict stock returns in five stock markets: Canada, Hong Kong, Japan, the UK, and the USA. They measure the prediction accuracies as well as the stock returns for each market using the two types of neural networks. They conclude that their proposed method for building portfolios across international stock markets successfully predicts future stock market movements. Leigh et al. [24] test conventional pattern recognition techniques, neural networks, genetic algorithms, and cross-validation for forecasting price changes of the New York Stock Exchange (NYSE) Composite Index. The results of their study reveal the effectiveness of bull flag patterns and the volume pattern heuristic. Ang and Quek [31] examine the predictability of stock price differences based on the stock trading rough set-based pseudo outer-product (RSPOP) model. Their trading model uses a pseudo outer-product-based fuzzy neural network using the compositional rule of inference (POPFNN-CRI) and simple moving average trading rules. They evaluate the performance of their model against stock trading with a dynamic evolving neural-fuzzy inference system (DENFIS), stock trading without a forecast model, and stock trading with an ideal forecast model. They find that RSPOP performs better than DENFIS and stock trading without a forecast model in terms of identifying rules with greater interpretability and producing higher profits.

Chavarnakul and Enke [32] examine the profitability of stock trading using generalized regression neural networks (GRNN), the volume-adjusted moving average (VAMA), and ease of movement (EMV) indicators, utilizing data from the S&P 500 index. They use equivolume charts and show that their model outperforms the VAMA, EMV, and MA in isolation. In addition, they find that their model performs better than a B&H strategy. Atsalakis and Valavanis [27] use an adaptive neuro-fuzzy inference system (ANFIS) to predict short-term stock prices based on historical data from the Athens Stock Exchange and the NYSE. They find that the proposed model outperforms the simple B&H strategy and 13 other existing models in terms of accuracy. Chang et al. [26] present a Takagi–Sugeno fuzzy rule-based model for trading stocks. Their model uses a set of trading indicators to generate trading signals learned by a support vector regression (SVR). Additionally, they compare the performance of their model against a conventional linear regression model and artificial neural networks and report that their model yields more profit than the other models. Boyacioglu and Avci [33] use an adaptive network-based fuzzy inference system (ANFIS) to predict the stock market return on the Istanbul Stock Exchange (ISE). They use six macroeconomic variables and three indices as input data. Their experimental results indicate that their model successfully predicts the monthly return of the ISE National 100 Index with acceptable accuracy.

A summary of related work including the markets, methods, and the results is shown in Table 2.

Table 2 An overview of different approaches to stock market price prediction

From the literature review, it can be observed that a variety of techniques have been used in both fundamental and technical analyses. In contrast to the voluminous research on fundamental and technical analysis in isolation, there has been only very limited research aimed at developing an integrated approach based on both technical and fundamental analyses. An integrated model of fundamental and technical analysis using multilayer feedforward perceptron neural networks (MLP) and a nonlinear autoregressive structure with exogenous input (NARX), which is the subject of this study, has not previously been tested in the literature. Figure 1 presents a comparison between fundamental and technical analyses as well as a schematic of our approach to predicting short-term stock trends using our neural networks.

Fig. 1

Comparison between fundamental and technical analyses as well as the schematic of the proposed approach to predicting short-term stock trends

3 Multilayer Feedforward Perceptron Neural Networks (MLP)

The multilayer perceptron (MLP) is a class of feedforward artificial neural network (ANN) that uses backpropagation as its training technique. In multilayer perceptron neural networks, the output of each layer forms the input of the next layer. In this study, we use a three-hidden-layer MLP in order to obtain optimal results. \(P.H_{1}.H_{2}.H_{3}.Q\) describes the architecture of an MLP with three hidden layers, where \(H_{1}\), \(H_{2}\), and \(H_{3}\) are the hidden layers, and \(P\) and \(Q\) are the input and output layers, respectively. This network can be translated into matrix form as a function \(f:{\mathbb{R}}^{D}\to {\mathbb{R}}^{L}\), where \(D\) is the size of the input vector \(x\), and \(L\) is the size of the output vector \(f(x)\) [36].

Assuming a training dataset \(\mathcal{D}={\{x_{n},t_{n}\}}_{n=1}^{N}\), we can form a data matrix \(X\) containing the training data as follows:

$$\underset{\mathrm{N}\times (\mathrm{P}+1)}{\mathrm{X}}=\left(\begin{array}{c}{\mathrm{x}}_{1}^{\uptau }\\ \vdots \\ {\mathrm{x}}_{\mathrm{N}}^{\uptau }\end{array}\right)$$
(3)

In order for the MLP to learn the relationships between the variables based on the training data, the first stage is to build the product that connects the input layer to the first hidden layer:

$$\underset{{\mathrm{H}}_{1}\times 1}{{\mathrm{y}}_{1}}=\underset{{\mathrm{H}}_{1}\times (\mathrm{P}+1)}{{\Omega }_{1}}\ \underset{(\mathrm{P}+1)\times 1}{\mathrm{x}}$$
(4)

where \({\Omega }_{1}\) is the first weight matrix. Similarly, the first and second hidden layers, the second and third hidden layers, and the last hidden layer and the output layer are connected as follows, with a constant bias element prepended to each hidden-layer output (so that, e.g., the input to Eq. (5) has dimension \(({H}_{1}+1)\times 1\)):

$$\underset{{\mathrm{H}}_{2}\times 1}{{\mathrm{y}}_{2}}=\underset{{\mathrm{H}}_{2}\times ({\mathrm{H}}_{1}+1)}{{\Omega }_{2}}\ \underset{({\mathrm{H}}_{1}+1)\times 1}{{\mathrm{y}}_{1}}$$
(5)
$$\underset{{\mathrm{H}}_{3}\times 1}{{\mathrm{y}}_{3}}=\underset{{\mathrm{H}}_{3}\times ({\mathrm{H}}_{2}+1)}{{\Omega }_{3}}\ \underset{({\mathrm{H}}_{2}+1)\times 1}{{\mathrm{y}}_{2}}$$
(6)
$$\underset{\mathrm{Q}\times 1}{\mathrm{z}}=\underset{\mathrm{Q}\times ({\mathrm{H}}_{3}+1)}{\upgamma }\ \underset{({\mathrm{H}}_{3}+1)\times 1}{{\mathrm{y}}_{3}}$$
(7)

where \({\Omega }_{2}\), \({\Omega }_{3}\), and \(\upgamma\) are the second, third, and last weight matrices, respectively. The dimension of each matrix/vector in Eqs. (4) to (7) depends on the number of neurons in the input, hidden, and output layers of the MLP. For instance, Eq. (4) describes the relationship between the input layer and the first hidden layer; therefore, the corresponding weight matrix \({\Omega }_{1}\) has one row per neuron of the first hidden layer and one column per input plus the bias term, \({\mathrm{H}}_{1}\times (\mathrm{P}+1)\). The dimensions of the other matrices/vectors are determined in a similar manner. Therefore, the MLP fitted model can be defined as follows:

$$\mathrm{mlp}\left(\mathrm{x}\right)={\mathrm{f}}_{\mathrm{Q}}\left(\mathrm{z}\right)={\mathrm{f}}_{\mathrm{Q}}\left(\upgamma {\left[1,{\mathrm{f}}_{{\mathrm{H}}_{3}}{\left({\Omega }_{3}{\left[1,{\mathrm{f}}_{{\mathrm{H}}_{2}}{\left({\Omega }_{2}{\left[1,{\mathrm{f}}_{{\mathrm{H}}_{1}}{\left({\Omega }_{1}\mathrm{x}\right)}^{\uptau }\right]}^{\uptau }\right)}^{\uptau }\right]}^{\uptau }\right)}^{\uptau }\right]}^{\uptau }\right)$$
(8)

The MLP’s neurons use nonlinear activation functions. The two most common activation functions are both sigmoids: the tan-sigmoid activation function (tansig) and the log-sigmoid activation function (logsig). The tan-sigmoid activation function is as follows:

$$\mathrm{tansig}\left(\mathrm{n}\right)=\frac{2}{1+{\mathrm{e}}^{-2\mathrm{n}}}-1$$
(9)

which ranges between −1 and 1, while the log-sigmoid activation function is as follows:

$$\mathrm{logsig}\left(\mathrm{n}\right)=\frac{1}{1+{\mathrm{e}}^{-\mathrm{n}}}$$
(10)

which ranges between 0 and 1.
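To make the layered structure of Eqs. (3) to (10) concrete, the following NumPy sketch implements a generic forward pass with a bias element prepended to each layer input, as in Eq. (8). The architecture sizes and weights are illustrative assumptions, not the trained networks of Sect. 5.

```python
import numpy as np

def tansig(n):  # Eq. (9): ranges between -1 and 1
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def logsig(n):  # Eq. (10): ranges between 0 and 1
    return 1.0 / (1.0 + np.exp(-n))

def mlp_forward(x, weights, activations):
    """Forward pass through a P.H1.H2.H3.Q network.

    weights[k] has shape (units_k, prev_units + 1); a constant 1 is
    prepended to each layer input as the bias term, matching Eq. (8).
    """
    y = x
    for W, f in zip(weights, activations):
        y = f(W @ np.concatenate(([1.0], y)))
    return y

# Hypothetical 12.60.51.1.1 architecture (cf. the fundamental model, Sect. 5).
rng = np.random.default_rng(2)
sizes = [12, 60, 51, 1, 1]
weights = [rng.normal(0, 0.1, (sizes[k + 1], sizes[k] + 1)) for k in range(4)]
out = mlp_forward(rng.random(12), weights, [logsig, logsig, logsig, logsig])
print(out)
```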

The MLP is made up of three or more layers: an input layer, an output layer, and one or more hidden layers. Every neuron in one layer connects with a specific weight, \({\mathrm{w}}_{\mathrm{ij}}\), to a neuron in the next layer. Learning occurs in the perceptron after each data point is processed, by altering the connection weights according to the amount of error in the output relative to the expected result. The error in output neuron \(\mathrm{j}\) for the \(\mathrm{n}\)th data point is calculated by

$${\mathrm{e}}_{\mathrm{j}}\left(\mathrm{n}\right)={\mathrm{d}}_{\mathrm{j}}\left(\mathrm{n}\right)-{\mathrm{y}}_{\mathrm{j}}\left(\mathrm{n}\right)$$
(11)

where \(\mathrm{d}\) is the target value and \(\mathrm{y}\) is the value produced by the perceptron. The weights are adjusted according to the corrections that minimize the error in the entire output, given by

$$\mathcal{E}\left(\mathrm{n}\right)=\frac{1}{2}{\sum }_{\mathrm{j}}{\mathrm{e}}_{\mathrm{j}}^{2}(\mathrm{n})$$
(12)

Using gradient descent, the change in each weight is

$${\Delta \mathrm{w}}_{\mathrm{ji}}\left(\mathrm{n}\right)=-\upeta \frac{\partial \mathcal{E}\left(\mathrm{n}\right)}{\partial {\upupsilon }_{\mathrm{j}}\left(\mathrm{n}\right)}{\mathrm{y}}_{\mathrm{i}}\left(\mathrm{n}\right)$$
(13)

where \({\mathrm{y}}_{\mathrm{i}}\) is the output of the previous neuron, and \(\upeta\) is the learning rate, which is selected in order to ensure that the weights converge quickly to a response, without any oscillation.

The derivative calculation depends on the induced local field \({\upupsilon }_{\mathrm{j}}\), which itself varies. It can be proved that for an output neuron, this derivative can be simplified to

$$-\frac{\partial \mathcal{E}\left(\mathrm{n}\right)}{\partial {\upupsilon }_{\mathrm{j}}\left(\mathrm{n}\right)}={\mathrm{e}}_{\mathrm{j}}\left(\mathrm{n}\right){\upphi }^{{^{\prime}}}\left({\upupsilon }_{\mathrm{j}}\left(\mathrm{n}\right)\right)$$
(14)

where \({\upphi }^{{^{\prime}}}\) is the derivative of the activation function as described previously. The analysis is more complicated for the change in weights with respect to a hidden neuron, but it can be shown that the relevant derivative is

$$-\frac{\partial \mathcal{E}\left(\mathrm{n}\right)}{\partial {\upupsilon }_{\mathrm{j}}\left(\mathrm{n}\right)}={\upphi }^{{^{\prime}}}\left({\upupsilon }_{\mathrm{j}}\left(\mathrm{n}\right)\right){\sum}_{\mathrm{k}}-\frac{\partial \mathcal{E}\left(\mathrm{n}\right)}{\partial {\upupsilon }_{\mathrm{k}}\left(\mathrm{n}\right)}{\mathrm{w}}_{\mathrm{kj}}\left(\mathrm{n}\right)$$
(15)

This depends on the error terms of the \(\mathrm{k}\)th neurons, which belong to the output layer. Therefore, the hidden-layer weights are changed after the output-layer error terms have been computed, with the error propagated backward through the derivative of the activation function.
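The following scalar sketch traces Eqs. (11) to (15) for a log-sigmoid network; all numeric values are hypothetical and serve only to show the order of the computations.

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def logsig_prime(v):
    s = logsig(v)
    return s * (1.0 - s)

# One output neuron j and one hidden neuron, scalar example of Eqs. (11)-(15).
eta = 0.1                         # learning rate
v_out, y_prev = 0.4, 0.7          # induced local field and previous-layer output
d, y = 1.0, logsig(0.4)           # target and actual output

e = d - y                                  # Eq. (11): output error
delta_out = e * logsig_prime(v_out)        # Eq. (14): output-layer local gradient
dw_out = eta * delta_out * y_prev          # Eq. (13): weight correction

# Hidden neuron: its local gradient sums the downstream gradients
# weighted by the connecting weights w_kj, Eq. (15).
v_hidden, w_kj = 0.2, 0.5
delta_hidden = logsig_prime(v_hidden) * delta_out * w_kj
print(dw_out, delta_hidden)
```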

The performance of the multilayer perceptron can be evaluated based on two measures: accuracy and generalization error. Accuracy represents the learning ability of the model. Generalization error shows how accurately the model is capable of predicting unseen data, and it is therefore an appropriate indicator of the performance of neural networks. Hyper-parameter optimization is the process of selecting the network configuration, such as the number of layers and neurons and the activation functions, in order to improve the performance of the learning algorithm and obtain optimal solutions. Hyper-parameter optimization can be used to find the network architecture that results in the least generalization error. In this study, we use MLP hyper-parameter optimization to obtain the best network architecture with respect to generalization error and accuracy.

4 Data Preprocessing Methods

Finance, energy, healthcare, and technology are among the popular sectors of stock markets. In this paper, in order to lower the noise level, we aim to study firms that belong to the same sector. Hence, we chose 578 Technology companies whose data is available on the NASDAQ website. The financial ratios of the companies are the main input variables in the fundamental analysis of the stock market. This study focuses on twenty-four financial ratios compiled from the financial statements of the Technology companies in the 2017 fiscal year, selected on the basis of the previous literature and for reasons of importance and tractability.

4.1 Feature Selection

Twenty-four financial ratios for the fundamental analysis of stock movements are studied in this subsection. These financial ratios are selected for their significance in terms of their effect on stock movements and are used as input variables in the multilayer feedforward perceptron model for predicting stock movements. These predictors are listed as \({\left\{{\mathrm{fr}}_{\mathrm{n}}\right\}}_{\mathrm{n}=1}^{24}\), where \(\mathrm{fr}\) denotes a financial ratio and \(\mathrm{n}\) is an index ranging between \(1\) and \(24\). The set of financial ratios used as predictors in the multilayer feedforward perceptron model is shown below:

$${\left\{{\mathrm{fr}}_{\mathrm{n}}\right\}}_{\mathrm{n}=1}^{24}=\left\{{\mathrm{fr}}_{1}, {\mathrm{fr}}_{2}, {\mathrm{fr}}_{3}, ..., {\mathrm{fr}}_{24}\right\}$$
(16)

4.1.1 Determination of the Optimal Feature Set

The approach to the determination of the optimal set of features used in this study attempts to minimize the mean squared errors (MSEs) between the actual and predicted values as shown below:

$$\mathrm{MSE}=\frac{1}{\mathrm{N}}{\sum}_{\mathrm{t}=1}^{\mathrm{N}}{({\mathrm{y}}_{\mathrm{t}}-{\widehat{\mathrm{y}}}_{\mathrm{t}})}^{2}$$
(17)

where \({\mathrm{y}}_{\mathrm{t}}\) is the actual value, \({\widehat{\mathrm{y}}}_{\mathrm{t}}\) is the predicted value, \(\mathrm{t}\) is time, and \(N\) is the total number of days for computing MSEs. The multilayer feedforward perceptron models are built for various combinations of the financial ratios as the predictors, and the resulting MSEs are computed. The process of determination of the optimal feature set stops when the following condition is satisfied:

$$\text{The subset of the features is ``PURE'', i.e., }{\mathrm{MSE}}_{\mathrm{l}}\le {\mathrm{Err}}_{\mathrm{tol}}\cdot {\mathrm{MSE}}_{\mathrm{total}}$$
(18)

where \({\mathrm{MSE}}_{\mathrm{l}}\) indicates the MSE of the predictive model built based on the corresponding subset of features, \({\mathrm{MSE}}_{\mathrm{total}}\) indicates the MSE of the predictive model built based on the original set of twenty-four financial ratios, and \({\mathrm{Err}}_{\mathrm{tol}}\) is a predefined acceptable tolerance (i.e., \({\mathrm{Err}}_{\mathrm{tol}}=0.8\) in this study).
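A minimal sketch of this stopping rule is given below. Here `mse_of` is an assumed placeholder for a routine that trains an MLP on a feature subset and returns its MSE, and the subset search order is a design choice rather than the exact procedure prescribed in [37].

```python
from itertools import combinations

def optimal_feature_set(features, mse_of, err_tol=0.8):
    """Return the first subset of `features` that is 'PURE' per Eq. (18),
    i.e., whose model MSE is at most err_tol times the full-set MSE.

    `mse_of` is an assumed placeholder: it should train the MLP on a
    feature subset and return the resulting MSE.
    """
    mse_total = mse_of(tuple(features))
    # Exhaustive subset enumeration is exponential in the number of
    # features; in practice a greedy or heuristic search would be used.
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            if mse_of(subset) <= err_tol * mse_total:
                return subset
    return tuple(features)
```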

Twelve financial ratios are selected as the most significant variables through the Feature Selection preprocess [37] by conducting the hyper-parameter optimizations. The selected financial ratios are as follows: current ratio (\({\mathrm{fr}}_{1}\)), inventory/current assets (\({\mathrm{fr}}_{3}\)), inventory day sales (\({\mathrm{fr}}_{5}\)), net income/common equity (\({\mathrm{fr}}_{6}\)), net income/net sales (\({\mathrm{fr}}_{7}\)), net income/total assets (\({\mathrm{fr}}_{11}\)), net sales/property-plant and equipment (\({\mathrm{fr}}_{12}\)), net sales/work capitalization (\({\mathrm{fr}}_{17}\)), pretax income/net sales (\({\mathrm{fr}}_{19}\)), quick ratio (\({\mathrm{fr}}_{21}\)), total liability/common equity (\({\mathrm{fr}}_{23}\)), and total liability/total assets (\({\mathrm{fr}}_{24}\)), as reported in Table 3.

Table 3 List of financial ratios, variables, and discretization numbers determined through feature selection and data discretization preprocesses

4.2 Data Discretization

Next, the financial ratios, which are continuous variables, are discretized into discrete counterparts through a data discretization preprocess by conducting hyper-parameter optimizations. The twelve financial ratios are discretized as shown in Table 3. For instance, inventory/current assets is discretized into the following five classes: 0.1, 0.3, 0.5, 0.7, and 0.9. The combination of the twelve financial ratios, discretized into these classes, yields the least mean squared error (MSE) when the simulated results from the MLP neural networks are compared with the actual data.

In our technical analysis, we study stock prices of the same companies to forecast future stock movements. We normalize the input data to reduce data redundancy and improve data integrity. The number of input data points may vary depending on how much of an impact the historical stock prices have on future prices [38]. We use IXIC daily stock prices from 2013-03 to 2018-06. In order to enhance the comparison between the stock prices, we restructure the time series of stock prices to achieve stationarity. First, we convert the stock prices into logarithmic form, since stock prices are based on returns, and returns are based on percentages. Next, we calculate the differences between consecutive points of the time series and multiply them by 100. This yields the compound returns, \(\mathrm{r}\). The time series of adjusted closing prices has a time-varying mean, while the time series of compound returns has a constant mean, variance, and autocorrelation structure, which indicates that the transformed data is stationary.
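The transformation from prices to compound returns amounts to two lines; the prices below are hypothetical placeholders.

```python
import numpy as np

# Hypothetical daily adjusted closing prices.
prices = np.array([100.0, 101.5, 100.8, 102.3, 103.0])

# Compound (log) returns in percent: r_t = 100 * (ln P_t - ln P_{t-1}).
r = 100.0 * np.diff(np.log(prices))
print(r)
```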

In addition, we determine the autocorrelations in order to further examine the time series of compound returns. The autocorrelation between any two data points \({\mathrm{r}}_{\mathrm{t}}\) and \({\mathrm{r}}_{\mathrm{t}-\mathrm{h}}\) only depends on the time lag \(\mathrm{h}\) between them. Lag-\(\mathrm{h}\) covariance is defined as:

$${\upgamma }_{\mathrm{h}}=\mathrm{cov}\left({\mathrm{r}}_{\mathrm{t}},{\mathrm{r}}_{\mathrm{t}-\mathrm{h}}\right)$$
(19)

Therefore, the theoretical lag-\(\mathrm{h}\) autocorrelation is

$${\uprho }_{\mathrm{h}}=\mathrm{corr}\left({\mathrm{r}}_{\mathrm{t}},{\mathrm{r}}_{\mathrm{t}-\mathrm{h}}\right)=\frac{{\upgamma }_{\mathrm{h}}}{{\upgamma }_{0}}$$
(20)

where \({\upgamma }_{0}\) is the lag-\(0\) covariance.

The theoretical partial autocorrelation \({\upphi }_{\mathrm{h},\mathrm{h}}\) is the autocorrelation between \({\mathrm{r}}_{\mathrm{t}}\) and \({\mathrm{r}}_{\mathrm{t}-\mathrm{h}}\) after removing the effect of confounders. The theoretical autocorrelation and partial autocorrelation can be estimated using sample autocorrelation and sample partial autocorrelation. For the data points \({\mathrm{r}}_{1}, {\mathrm{r}}_{2},\dots ,{\mathrm{r}}_{\mathrm{T}}\), the sample lag-\(\mathrm{h}\) autocorrelation is:

$${\widehat{\uprho }}_{\mathrm{h}}=\frac{\sum_{\mathrm{t}=\mathrm{h}+1}^{\mathrm{T}}\left({\mathrm{r}}_{\mathrm{t}}-\overline{\mathrm{r} }\right)\left({\mathrm{r}}_{\mathrm{t}-\mathrm{h}}-\overline{\mathrm{r} }\right)}{\sum_{\mathrm{t}=1}^{\mathrm{T}}{\left({\mathrm{r}}_{\mathrm{t}}-\overline{\mathrm{r} }\right)}^{2}}$$
(21)

where \(\overline{\mathrm{r} }\) is the sample mean. Similarly, \({\widehat{\upphi }}_{\mathrm{h},\mathrm{h}}\) denotes the sample partial autocorrelation. The autocorrelation function (ACF) and partial autocorrelation function (PACF) are illustrated in Fig. 2. Note that the PACF is the sequence \({\upphi }_{\mathrm{h},\mathrm{h}}\), \(\mathrm{h}=\mathrm{1,2},\dots ,\mathrm{N}-1\), for the time series \({\mathrm{r}}_{\mathrm{t}}\), \(\mathrm{t}=\mathrm{1,2},\dots ,\mathrm{N}\), after removing the linear dependence on the intermediate values \({\mathrm{r}}_{\mathrm{t}-1}, {\mathrm{r}}_{\mathrm{t}-2},\dots ,{\mathrm{r}}_{\mathrm{t}-\mathrm{h}+1}\).
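The sample autocorrelation of Eq. (21) can be computed directly; the sketch below uses synthetic white-noise returns as a stand-in for the IXIC return series.

```python
import numpy as np

def sample_acf(r, max_lag):
    """Sample lag-h autocorrelation of Eq. (21)."""
    r = np.asarray(r, dtype=float)
    rbar = r.mean()
    denom = np.sum((r - rbar) ** 2)
    return np.array([
        np.sum((r[h:] - rbar) * (r[:len(r) - h] - rbar)) / denom
        for h in range(1, max_lag + 1)
    ])

rng = np.random.default_rng(3)
returns = rng.normal(0, 1, 500)        # stationary white noise for illustration
print(sample_acf(returns, max_lag=5))  # should be near zero at all lags
```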

Fig. 2

Correlogram of sample autocorrelation function (ACF) and partial autocorrelation function (PACF)

Each vertical line in both graphs of Fig. 2 represents the correlation at a given lag. The ACF in the upper graph shows a very gradual decay across the lags, whereas the PACF in the lower graph drops off immediately after the first lags. From the autocorrelations demonstrated in this correlogram, we can infer that our data is stationary.

4.3 Self-Organizing Maps

Self-organizing maps (SOM) provide a data visualization method that assists in understanding high-dimensional data by condensing the dimensions of the data into a map. This method groups similar data together, which is a clustering concept. As a result, a SOM not only shows similarities between data, but also decreases the data dimensions. The SOM algorithm linearly or randomly assigns initial values to the weight vectors of each output neuron. Then, the Euclidean distance between each presented data point and the weight vectors of all the neurons is computed; the neuron with the closest distance wins and is moved closer to the data point. The SOM displays the information that is learned: it maps an input value onto a point on the map such that surrounding points have similar functions [39, 40]. The SOM is used to associate the historical stock price data points and illustrate the results.

In the technical analysis, the input to the SOM can be written as \({\mathrm{I}}_{\mathrm{SOM}}=\left\{{\mathrm{X}}_{\mathrm{Disc}},{\mathrm{Y}}_{\mathrm{Obs}}\right\}\), where \({\mathrm{X}}_{\mathrm{Disc}}\) is the series of discretized past stock prices and \({\mathrm{Y}}_{\mathrm{Obs}}\) is the observed stock price. In this paper, as the \(\mathrm{a}\times \mathrm{b}\) cluster configuration and the optimal number of clusters are unknown, the training data is used to generate various cluster combinations. The localization is performed by modifying the learning rate \({\alpha }(\mathrm{t})\) and the neighborhood size \(\mathrm{R}(\mathrm{t})\). The learning amount for each neuron can be computed as \(\alpha \left(t\right)\,{e}^{-d/R\left(t\right)}\), where the distance \(\mathrm{d}=\Vert \mathrm{y}-\mathrm{w}\Vert\), given \(\mathrm{y}\) as the input and \(\mathrm{w}\) as the corresponding weight vector. It should be noted that, as \({\alpha }(\mathrm{t})\) and \(\mathrm{R}(\mathrm{t})\) decrease with time, the learning amount decreases as well, and after sufficient iterations, the updating reaches saturation. These results are shown in Fig. 3.
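The update rule described above can be sketched as follows; the exponential decay schedules for \(\alpha(t)\) and \(R(t)\), and all sizes, are assumptions for illustration. Following the formula above, every neuron is pulled toward the input with a learning amount that decays with its distance \(d\), so the closest (winning) neuron moves the most.

```python
import numpy as np

def som_step(w, y, t, alpha0=0.5, r0=3.0, tau=200.0):
    """One SOM update: pull each neuron toward the input y by the
    learning amount alpha(t) * exp(-d / R(t)) described above.

    The decay schedules for alpha(t) and R(t) are assumptions.
    """
    alpha_t = alpha0 * np.exp(-t / tau)        # decaying learning rate
    r_t = r0 * np.exp(-t / tau)                # decaying neighborhood size
    d = np.linalg.norm(y - w, axis=1)          # distances ||y - w|| per neuron
    rate = alpha_t * np.exp(-d / r_t)          # learning amount per neuron
    return w + rate[:, None] * (y - w)

rng = np.random.default_rng(4)
w = rng.random((100, 20))                      # 10x10 grid, 20-dim inputs
for t, y in enumerate(rng.random((500, 20))):  # stream of input vectors
    w = som_step(w, y, t)
```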

Fig. 3

The results of the self-optimizing map for clustering the input data in technical analysis: hits (left), SOM neighbor weight distances (middle), and SOM weight positions (right)

The hits plot displays the neuron locations in the topology along with the number of training data points associated with each neuron. There are 100 neurons due to the 10×10 grid. It can be seen that the maximum number of hits for any individual neuron is 20, meaning there are 20 input vectors in that cluster. In the SOM neighbor weight distances plot, the blue hexagons represent the neurons, while the red lines connect neighboring neurons. The colors within the regions bounded by the red lines show the distances between the neurons: larger distances are shown by darker colors, and smaller distances by lighter colors. A dark band of segments runs from the lower-center region into the upper-right region, showing that the SOM network grouped the discretized representation of the input space into four apparent groups. The SOM weight positions plot displays the locations of the data points and the weight vectors. As shown, the map is well dispersed throughout the input space. The determination of the optimal cluster combination from the possible configurations is discussed in the following subsection.

4.3.1 Determination of the Optimal Number of SOM Clusters

In this paper, the optimal number of SOM clusters is determined through the Silhouettes methodology introduced in [41]. Assume the SOM configuration \(\mathrm{a}\times \mathrm{b}\); the quantity of clusters for this configuration can be calculated as \(\mathrm{k}=\mathrm{a}+\mathrm{b}\). The Silhouettes value \({\mathrm{S}}_{\mathrm{j}}(\mathrm{i})\) for the \(\mathrm{i}\)th point in the \(\mathrm{j}\)th cluster \({\mathrm{C}}_{\mathrm{j}}\), denoted by \({\mathrm{p}}_{\mathrm{j}}(\mathrm{i})\), can be written as:

$${\mathrm{S}}_{{\mathrm{C}}_{\mathrm{j}}}\left(\mathrm{i}\right)=\frac{{\mathrm{b}}_{\mathrm{j}}\left(\mathrm{i}\right)-{\mathrm{a}}_{\mathrm{j}}\left(\mathrm{i}\right)}{\mathrm{max}\left\{{\mathrm{a}}_{\mathrm{j}}\left(\mathrm{i}\right),{\mathrm{b}}_{\mathrm{j}}\left(\mathrm{i}\right)\right\}}$$
(22)

where

$${\mathrm{a}}_{\mathrm{j}}\left(\mathrm{i}\right)=\frac{1}{\left|{\mathrm{C}}_{\mathrm{j}}\right|-1}{\sum }_{\mathrm{h}\ne \mathrm{i},\mathrm{ h}\in \mathrm{j}}\Vert {\mathrm{p}}_{\mathrm{j}}(\mathrm{i})-{\mathrm{p}}_{\mathrm{j}}(\mathrm{h})\Vert$$
(23)

and

$${\mathrm{b}}_{\mathrm{j}}\left(\mathrm{i}\right)=\underset{{\mathrm{C}}_{\mathrm{i}}}{\mathrm{min}}\left\{{\sum }_{\mathrm{h}=1}^{\left|{\mathrm{C}}_{\mathrm{i}}-1\right|}\Vert {\mathrm{p}}_{\mathrm{j}}(\mathrm{i})-{\mathrm{p}}_{\mathrm{C}}(\mathrm{h})\Vert \right\},\mathrm{ i}=1, 2, ...,\mathrm{ k}$$
(24)

Larger value of \({\mathrm{S}}_{{\mathrm{C}}_{\mathrm{j}}}\left(\mathrm{i}\right)\) means the point \({\mathrm{p}}_{\mathrm{j}}(\mathrm{i})\) is well clustered and that point belongs to the cluster \({\mathrm{C}}_{\mathrm{j}}\). Next, we calculate the average Silhouettes value for each cluster \({\mathrm{C}}_{\mathrm{i}}\) in all the configurations as:

$${\upmu }_{{\mathrm{C}}_{\mathrm{i}}}=\frac{1}{\left|{\mathrm{C}}_{\mathrm{i}}\right|}{\sum }_{\mathrm{j}=1}^{\left|{\mathrm{C}}_{\mathrm{i}}\right|}{\mathrm{S}}_{{\mathrm{C}}_{\mathrm{i}}}\left(\mathrm{j}\right),\mathrm{ i}=1, 2, ...,\mathrm{ L}$$
(25)

where \(\mathrm{L}\) is the total number of clusters across all the combinations. Once the Silhouettes values for all the points in all the clusters are determined, the median Silhouettes value is computed as

$${\mathrm{median}}_{{\upmu }_{\mathrm{C}}}=\mathrm{median}\left\{{\upmu }_{{\mathrm{C}}_{1}}, {\upmu }_{{\mathrm{C}}_{2}}, ..., {\upmu }_{{\mathrm{C}}_{L}}\right\}$$
(26)

Then, the SOM cluster combination with all the clusters having \({\upmu }_{{\mathrm{C}}_{\mathrm{i}}}\ge {\mathrm{median}}_{{\upmu }_{\mathrm{C}}}\) will be the optimal cluster combination.
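As a simplified stand-in for the median-based rule of Eqs. (22) to (26), the sketch below scores candidate cluster counts with the average Silhouettes value using scikit-learn, with KMeans substituted for the SOM clusters; the synthetic data and the choice of the maximizing \(k\) are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three well-separated synthetic blobs stand in for the price clusters.
rng = np.random.default_rng(5)
points = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in (0, 3, 6)])

# Score each candidate cluster count by its average Silhouettes value.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```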

5 Model Development

In this section, two separate models are presented in order to study the predictability of stock movements using fundamental and technical analysis in isolation. Then, a hybrid model is examined and compared to the first two models. The number of layers, the number of neurons in each layer, and the (nonlinear) activation functions between the layers are determined through MLP hyper-parameter optimization. The results of the optimized network reveal that the (nonlinear) activation functions between the input layer and the 1st hidden layer, as well as between the 3rd hidden layer and the output layer, are log-sigmoid. Interestingly, the same type of (nonlinear) activation function also connects the hidden layers. There are 60, 51, and 1 neurons in the 1st, 2nd, and 3rd hidden layers of the optimized network, respectively. These numbers are selected based on the optimum results of the hyper-parameter optimization. The neural network depicted in Fig. 4 represents the fundamental analysis of the stock market. According to the results of the fundamental analysis, this model yields the least MSE among all the trials of the hyper-parameter optimization process.

Fig. 4

Multilayer perceptron (MLP) network implemented for the prediction of stock through fundamental analysis

Similar to the fundamental analysis, we conduct MLP hyper-parameter optimizations for the technical analysis to obtain the optimum results. There are 30, 32, and 1 neurons in the 1st, 2nd, and 3rd hidden layers, respectively. The (nonlinear) activation functions between the input layer and the 1st hidden layer, between the 1st and 2nd hidden layers, and between the 2nd and 3rd hidden layers are all log-sigmoid. However, the last hidden layer, which contains only one neuron, is connected to the output layer via a tan-sigmoid activation function. The optimization results indicate that this activation function results in smaller MSEs compared to the log-sigmoid. Figure 5 displays the architecture of the neural network in the technical analysis, which results in the least MSE among all the trials.

Fig. 5

Multilayer perceptron (MLP) network implemented for the prediction of stock prices through technical analysis

Next, we present a hybrid model that combines the fundamental and technical analyses. We combine the models in order to integrate financial ratios and historical prices and then examine how this affects the simulated results. Figure 6 illustrates the prediction system used to predict future stock movements. The inputs of the hybrid model are the same financial ratios and stock prices examined by the first two models, and its output is the same as the output of those models.

Fig. 6

Prediction system of short-term stock trends based on the hybrid model of fundamental and technical analysis

The three-hidden-layer MLP neural network learns the relationship between the stock prices, the financial ratios, and the future stock movements. The hybrid model computes a predicted value \(\mathrm{y}\left(\mathrm{t}+1\right)\) as depicted below, given past values of \(\mathrm{y}\left(\mathrm{t}\right)\), which represent the stock price time series, and another input dataset \(\mathrm{x}\left(\mathrm{t}\right)\), the financial ratios of the companies selected using the feature selection preprocess:

$$\mathrm{y}\left(\mathrm{t}+1\right)=\mathrm{f}\left(\mathrm{x}\left(\mathrm{t}\right),\dots ,\mathrm{x}\left(\mathrm{t}-\mathrm{d}+1\right),\mathrm{y}\left(\mathrm{t}\right),\dots ,\mathrm{y}\left(\mathrm{t}-\mathrm{d}+1\right)\right)$$
(27)

In addition to the past values of the same series and the driving (exogenous) variable, the model contains an error term. The error term indicates that information about the other terms does not enable the future value of the time series to be predicted exactly. The architecture of the neural network representing the hybrid model of the fundamental and technical analysis is illustrated in Fig. 7.

Fig. 7

Nonlinear autoregressive with external (exogenous) input (NARX) implemented for the prediction of stock prices through hybrid model

In order to predict the one-step-ahead stock price, \(\mathrm{y}(1)\), the hybrid model uses 12 financial ratios from the previous year's financial statements as well as 20 past prices as input variables. More specifically, the number 0 in the hidden layer of the NARX model relates to \(\mathrm{x}(0)\), which indicates that we use financial ratios from the previous year in our hybrid model. In a similar manner, the range 0:19 in the hidden layer of the NARX model relates to the series of variables \(\mathrm{y}\left(0\right),\mathrm{ y}\left(-1\right), \dots ,\mathrm{ y}(-19)\), which indicates that we use 20 past stock prices in our hybrid model.
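A minimal sketch of the NARX input construction of Eq. (27) is shown below, with an off-the-shelf MLP regressor standing in for the network of Fig. 7; the synthetic series, the 12 placeholder ratios, and the layer sizes are assumptions for illustration only.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def narx_design(y, x, d):
    """Build the input matrix of Eq. (27): d lags of the price series y
    plus the exogenous feature vector x (here constant per firm-year)."""
    rows = [np.concatenate([y[t - d + 1:t + 1], x])
            for t in range(d - 1, len(y) - 1)]
    targets = y[d:]
    return np.array(rows), targets

rng = np.random.default_rng(6)
prices = np.cumsum(rng.normal(0, 1, 300)) + 100   # synthetic price series
ratios = rng.random(12)                           # 12 selected financial ratios

X, t = narx_design(prices, ratios, d=20)          # 20 past prices per sample
model = MLPRegressor(hidden_layer_sizes=(30, 32), max_iter=2000, random_state=0)
model.fit(X, t)
print(model.predict(X[-1:]))                      # one-step-ahead prediction
```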

6 Results and Findings

In this section, first, we evaluate the performance of our models through a training/test split and cross-validation test. Next, we examine the directional accuracies of our simulated results using the three different models.

6.1 Training/Test Split and Cross-Validation Tests

In order to validate the fundamental, technical, and hybrid models, we test the accuracy and generalization error of each model. We use the MSE as the risk function and compare the simulated results with actual data obtained from NASDAQ. First, we split our data into two subsets: training and test data. The process of training a neural network is a very complex task which may have a significant effect on the ultimate results. We use 80% of our normalized data, which includes historical stock prices from January 1992 to February 2013, for training the neural networks. The specifications of the training and testing datasets are shown in Table 4.

Table 4 Training and testing datasets

The MSE of a model on the training dataset captures its accuracy, while the MSE on the test dataset represents its generalization error. From the MSEs on the training and test datasets for the fundamental, technical, and hybrid models, the following items are readily noticeable:

  1. Fundamental analysis has the best generalization ability but results in the worst accuracy.

  2. Technical analysis results in the best accuracy but has the worst generalization ability.

  3. The MSE of the hybrid model on the training data is 0.0058, which indicates the accuracy of the hybrid model.

  4. The hybrid model contributes to the best results due to its overall performance on the training and test datasets.

Furthermore, we split our data into three subsets (training, validation, and test data) and use cross-validation in order to prevent overfitting and underfitting in the hybrid model. Overfitting and underfitting affect the predictability of the hybrid model: if overfitting happens, the hybrid model learns the noise instead of the trends in the data, while underfitting happens when the model misses the relationships between the input and output variables. The relation between the targets and the estimated stock prices is shown in Fig. 8. The results are shown for all the data used as well as for the training, validation, and test datasets separately. These plots show the networks’ outputs with respect to the targets for the training, testing, and validation datasets. For a perfect fit, the correlation coefficients (R values) would be equal to one, meaning that outputs = targets; this can be achieved only if the data falls along a 45-degree line.

Fig. 8

The hybrid model performance for the training, validation, test, and all datasets

As can be seen in Fig. 8, the correlation coefficients for the training, validation, test, and all datasets are 0.99915, 0.998, 0.99708, and 0.99868, respectively. The cross-validation test thus verifies that the hybrid model is able to accurately reproduce the stock prices. If more accurate results were required, the model could be retrained to modify the initial weights and biases of the networks and obtain an improved neural network. The error histogram with 20 bins, shown in Fig. 9, is also examined to identify outliers and obtain additional verification of the neural network performance. The blue, green, and red bars represent the training, validation, and test datasets, respectively. From Fig. 9, the errors are distributed within a reasonably good range around zero. It can be concluded that the data fitting of the hybrid model is quite precise.

Fig. 9

Error histogram with 20 bins for the hybrid model

The time-series response of the model is illustrated in Fig. 10, from which it can be seen that the hybrid model successfully predicts the future short-term stock trends. The time-series response plot displays the inputs, targets, and errors versus time, and it also indicates which time points were selected for training, testing, and validation.

Fig. 10

Time-series response of the hybrid model

6.2 Directional Accuracy

In order to examine our hybrid model in terms of the directional accuracy of the simulated results, we predict the future stock returns and compare them with actual data obtained from NASDAQ. The outputs of both the fundamental and technical analyses are discretized into five classes, similar to the classes of the input data in the technical analysis. The discretization numbers 1, 2, 3, 4, and 5 represent a sharp fall, a downward trend, non-trending, an upward trend, and a sharp rise in stock returns, respectively. The numerical results for the directional accuracies are shown in Fig. 11.

Fig. 11

Directional accuracies of the fundamental, technical, and hybrid models categorized into sharp fall, downward trend, non-trending, upward trend, and sharp rise

In all cases, the directional accuracies obtained from the fundamental analysis are better than those reported by the technical analysis. The reason is the ability of the MLP for fundamental analysis to generalize the results. In all cases, the hybrid model performs better than the other two models in terms of directional accuracy, with the exception of stocks with a downward trend. The average directional accuracy obtained from the hybrid model is 70.36%, which is sizable relative to the results of a B&H strategy. These results are consistent with the previous study by Bettman et al. [9], who showed that fundamental and technical analyses are complements rather than substitutes. Moreover, the MLP model presented in this paper can be applied to many stochastic processes with uncertainties, including financial time series and time series of processes in physics, physiology, and biology. Our proposed model is capable of learning to predict one time series given the historical values of the same time series (the feedback) and another time series known as the external or exogenous time series.

7 Comparisons with State-of-the-Art Models

The state-of-the-art predictive models are presented in the following subsections.

7.1 Regression, ANN, SVM

In this subsection, we illustrate comparisons between the correlation coefficients resulting from our hybrid model and three other models: Regression model [21], ANN [21, 42], and support vector machines (SVM) [21, 43].

A multivariate linear regression model for predicting stock returns as in [21] can be written as:

$$\widehat{\mathrm{y}}={\mathrm{a}}_{0}+{\sum}_{\mathrm{i}=1}^{27}{\mathrm{a}}_{\mathrm{i}}{\mathrm{x}}_{\mathrm{i}}$$
(28)

where \(\mathrm{y}\) is the stock return and \({\mathrm{x}}_{\mathrm{i}}\) is the \(\mathrm{i}\)th independent variable, including the return with a 1-day lag, the return with a 2-day lag, etc. The model parameters \({\mathrm{a}}_{0}\) and \({\mathrm{a}}_{\mathrm{i}}\) are estimated through least-squares estimation. The fitted model for the predicted stock return \(\widehat{\mathrm{y}}\) is as follows:

$$\begin{aligned}{\widehat{\mathrm{y}}} & =6.9312-0.0234*{\mathrm{x}}_{1}+0.13*{\mathrm{x}}_{2}+0.021*{\mathrm{x}}_{3}+0.021*{\mathrm{x}}_{4} \\& - 0.021*{\mathrm{x}}_{5}-10.303*{\mathrm{x}}_{6}+6.0031*{\mathrm{x}}_{7}+0.7738*{\mathrm{x}}_{8} \\& +0.2779*{\mathrm{x}}_{9}- 0.43916{\mathrm{x}}_{10}-0.27754*{\mathrm{x}}_{11} +0.12733*{\mathrm{x}}_{12} \\&- 0.058638*{\mathrm{x}}_{13}+13.646*{\mathrm{x}}_{14} +9.5224*{\mathrm{x}}_{15}-0.0003*{\mathrm{x}}_{16}\\&+0.24856*{\mathrm{x}}_{17}- 0.0016*{\mathrm{x}}_{18} +0*{\mathrm{x}}_{19}- 2.334 \times {10}^{-9}*{\mathrm{x}}_{20}\\&+0.16257*{\mathrm{x}}_{21} +0.63767*{\mathrm{x}}_{22}- 0.14301*{\mathrm{x}}_{23}\\&+0.08*{\mathrm{x}}_{24}+0.074*{\mathrm{x}}_{25} -0.0002*{\mathrm{x}}_{26}+0.026301*{\mathrm{x}}_{27}\end{aligned}$$
(29)

The neural network model of [21], with sigmoid activation functions, 27 input nodes, and one output node representing the stock return, has the model settings and parameters presented in Table 5.

Table 5 The settings and parameters of the neural network model

In the support vector machine (SVM) as in [21], the function \(\mathrm{f}\left(\mathrm{x}\right)\) which estimates the stock return in terms of the dot products between the data is written as:

$$\mathrm{f}\left(\mathrm{x}\right)={\sum}_{\mathrm{i}=1}^{\mathrm{l}}\left({{\alpha }}_{\mathrm{i}}-{{\alpha }}_{\mathrm{i}}^{*}\right)\mathrm{k}\left({\mathrm{x}}_{\mathrm{i}},\mathrm{x}\right)+\mathrm{b}$$
(30)

where \({{\alpha }}_{\mathrm{i}}\) and \({{\alpha }}_{\mathrm{i}}^{*}\) are the dual variables, \(\mathrm{b}\) is bias, and \(\mathrm{k}\) is the kernel function. We can write the objective functions for the SVM algorithm in terms of the dot products between vectors as follows:

$$\mathrm{max }-\frac{1}{2}{\sum}_{\mathrm{i}=1}^{\mathrm{l}}{\sum}_{\mathrm{j}=1}^{\mathrm{l}}\left({{\alpha }}_{\mathrm{i}}-{{\alpha }}_{\mathrm{i}}^{*}\right)\left({{\alpha }}_{\mathrm{j}}-{{\alpha }}_{\mathrm{j}}^{*}\right)\mathrm{k}\left({\mathrm{x}}_{\mathrm{i}},{\mathrm{x}}_{\mathrm{j}}\right)-\upvarepsilon {\sum}_{\mathrm{i}=1}^{\mathrm{l}}\left({{\alpha }}_{\mathrm{i}}+{{\alpha }}_{\mathrm{i}}^{*}\right)+{\sum}_{\mathrm{i}=1}^{\mathrm{l}}{\mathrm{y}}_{\mathrm{i}}\left({{\alpha }}_{\mathrm{i}}-{{\alpha }}_{\mathrm{i}}^{*}\right)$$
(31)
$$\mathrm{s}.\mathrm{t}. {\sum}_{\mathrm{i}=1}^{\mathrm{l}}\left({{\alpha }}_{\mathrm{i}}-{{\alpha }}_{\mathrm{i}}^{*}\right)=0$$
(32)

where the kernel function is the dot product between the feature maps, \(\mathrm{k}\left({\mathrm{x}}_{\mathrm{i}},{\mathrm{x}}_{\mathrm{j}}\right)=\langle \Phi \left({\mathrm{x}}_{\mathrm{i}}\right),\Phi \left({\mathrm{x}}_{\mathrm{j}}\right)\rangle\). The common kernel functions (Gaussian, polynomial, and hyperbolic tangent) are shown in Table 6.

Table 6 The common kernel functions in SVM

In this SVM model, the RBF kernel is used. The parameters of the kernel function, the other SVM model parameters, and the dual variables are estimated for the prediction of each stock return, and then the model is updated by adding the next stock return to the existing data. This process is repeated until the last stock return is predicted by the SVM.
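The retrain-after-each-observation procedure can be sketched as follows with scikit-learn's SVR; the lag structure, window split, and synthetic returns are assumptions for illustration, not the settings of [21].

```python
import numpy as np
from sklearn.svm import SVR

# Expanding-window one-step-ahead prediction with an RBF-kernel SVR,
# retraining after each new observation as described above.
rng = np.random.default_rng(7)
returns = rng.normal(0, 1, 120)
lags = 5
X = np.array([returns[i:i + lags] for i in range(len(returns) - lags)])
y = returns[lags:]

preds = []
for split in range(80, len(y)):
    model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X[:split], y[:split])
    preds.append(model.predict(X[split:split + 1])[0])  # next-step forecast
print(np.corrcoef(preds, y[80:])[0, 1])
```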

The results of the training and testing datasets are displayed in Table 7.

Table 7 A comparison between the correlation coefficients of the proposed hybrid model, regression, ANN, and SVM

From Table 7, it can be seen that the proposed hybrid model outperforms the regression, ANN, and SVM models in terms of the correlation coefficients on the testing datasets, since it yields the greatest value. In terms of the correlation coefficients on the training datasets, our proposed hybrid model results in a greater value compared to the regression and ANN models. However, the proposed hybrid model and the SVM model presented by Sheta et al. [21] have similar performance in terms of the correlation coefficients on the training datasets, as there is a relatively small difference between their R values (0.99915 and 0.9995, respectively).

7.2 Long Short-Term Memory

Long short-term memory (LSTM), which belongs to the family of recurrent neural networks, is designed for learning long-term dependencies and is capable of handling the vanishing and exploding gradient problems that arise when training on financial data, as shown in [34]. In an LSTM, the primary characteristic is contained in the hidden layer, whose units are known as memory cells. Each memory cell includes three gates: a forget gate \({\mathrm{f}}_{\mathrm{t}}\), an input gate \({\mathrm{i}}_{\mathrm{t}}\), and an output gate \({\mathrm{o}}_{\mathrm{t}}\). The schematic of a memory cell with three gates is depicted in Fig. 12.

Fig. 12

Schematic of the LSTM model for financial data prediction presented by Fischer and Krauss

As can be seen from Fig. 12, each gate receives the element of the input sequence at the current time, \({\mathrm{x}}_{\mathrm{t}}\), and the output at the previous time, \({\mathrm{h}}_{\mathrm{t}-1}\). In this LSTM, the forget gate specifies which information is removed from the cell state, as written below:

$${\mathrm{f}}_{\mathrm{t}}=\mathrm{sigmoid}\left({\mathrm{W}}_{\mathrm{f},\mathrm{x}}{\mathrm{x}}_{\mathrm{t}}+{\mathrm{W}}_{\mathrm{f},\mathrm{h}}{\mathrm{h}}_{\mathrm{t}-1}+{\mathrm{b}}_{\mathrm{f}}\right)$$
(33)

Next, the input gate specifies which information is added to the cell state, which includes the operations for the computation of the candidate values \({\stackrel{\sim }{\mathrm{s}}}_{\mathrm{t}}\) and the computation of the activation values \({\mathrm{i}}_{\mathrm{t}}\) as written below:

$${\widetilde{\mathrm{s}}}_{\mathrm{t}}=\mathrm{tanh}\left({\mathrm{W}}_{\widetilde{\mathrm{s}},\mathrm{x}}{\mathrm{x}}_{\mathrm{t}}+{\mathrm{W}}_{\widetilde{\mathrm{s}},\mathrm{h}}{\mathrm{h}}_{\mathrm{t}-1}+{\mathrm{b}}_{\widetilde{\mathrm{s}},}\right)$$
(34)
$${\mathrm{i}}_{\mathrm{t}}=\mathrm{sigmoid}\left({\mathrm{W}}_{\mathrm{i},\mathrm{x}}{\mathrm{x}}_{\mathrm{t}}+{\mathrm{W}}_{\mathrm{i},\mathrm{h}}{\mathrm{h}}_{\mathrm{t}-1}+{\mathrm{b}}_{\mathrm{i}}\right)$$
(35)

Next, the new cell states are determined according to the results from the previous two steps as written below:

$${\mathrm{s}}_{\mathrm{t}}={\mathrm{f}}_{\mathrm{t}}\otimes {\mathrm{s}}_{\mathrm{t}-1}+{\mathrm{i}}_{\mathrm{t}}\otimes {\widetilde{\mathrm{s}}}_{\mathrm{t}}$$
(36)

where \(\otimes\) is the element-by-element multiplication. Then, the output gate specifies which information from the cell state is used as output as written below:

$${\mathrm{o}}_{\mathrm{t}}=\mathrm{sigmoid}\left({\mathrm{W}}_{\mathrm{o},\mathrm{x}}{\mathrm{x}}_{\mathrm{t}}+{\mathrm{W}}_{\mathrm{o},\mathrm{h}}{\mathrm{h}}_{\mathrm{t}-1}+{\mathrm{b}}_{\mathrm{o}}\right)$$
(37)
$${\mathrm{h}}_{\mathrm{t}}={\mathrm{o}}_{\mathrm{t}}\otimes \mathrm{tanh}\left({\mathrm{s}}_{\mathrm{t}}\right)$$
(38)

In Eqs. (33) to (38), \({\mathrm{W}}_{\mathrm{f},\mathrm{x}}\), \({\mathrm{W}}_{\mathrm{f},\mathrm{h}}\), \({\mathrm{W}}_{\widetilde{\mathrm{s}},\mathrm{x}}\), \({\mathrm{W}}_{\widetilde{\mathrm{s}},\mathrm{h}}\), \({\mathrm{W}}_{\mathrm{i},\mathrm{x}}\), \({\mathrm{W}}_{\mathrm{i},\mathrm{h}}\), \({\mathrm{W}}_{\mathrm{o},\mathrm{x}}\), and \({\mathrm{W}}_{\mathrm{o},\mathrm{h}}\) are the weight matrices, and \({\mathrm{b}}_{\mathrm{f}}\), \({\mathrm{b}}_{\widetilde{\mathrm{s}}}\), \({\mathrm{b}}_{\mathrm{i}}\), and \({\mathrm{b}}_{\mathrm{o}}\) are the bias vectors.
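
A minimal NumPy sketch of one forward step through a memory cell, following Eqs. (33) to (38); the dictionary-based parameter layout and the illustrative shapes are our own conventions, not those of any particular LSTM library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, W, b):
    """One forward step through an LSTM memory cell, Eqs. (33)-(38).
    W maps gate names to weight matrices, b maps gate names to bias vectors."""
    f_t = sigmoid(W["f,x"] @ x_t + W["f,h"] @ h_prev + b["f"])      # Eq. (33): forget gate
    s_tilde = np.tanh(W["s,x"] @ x_t + W["s,h"] @ h_prev + b["s"])  # Eq. (34): candidate values
    i_t = sigmoid(W["i,x"] @ x_t + W["i,h"] @ h_prev + b["i"])      # Eq. (35): input gate
    s_t = f_t * s_prev + i_t * s_tilde                              # Eq. (36): new cell state
    o_t = sigmoid(W["o,x"] @ x_t + W["o,h"] @ h_prev + b["o"])      # Eq. (37): output gate
    h_t = o_t * np.tanh(s_t)                                        # Eq. (38): cell output
    return h_t, s_t

# Illustrative shapes: a 1-dimensional input and 8 memory cells.
d, m = 1, 8
rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.1, size=(m, d if k.endswith("x") else m))
     for k in ["f,x", "f,h", "s,x", "s,h", "i,x", "i,h", "o,x", "o,h"]}
b = {k: np.zeros(m) for k in ["f", "s", "i", "o"]}
h_t, s_t = lstm_step(np.array([0.5]), np.zeros(m), np.zeros(m), W, b)
```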

Fischer and Krauss in [34] use the S&P 500 index from 1989 to 2015, divided into training and trading datasets, to train the LSTM and predict unseen data. They build portfolios of various sizes, trade according to their proposed model, and evaluate the performance of their LSTM model with metrics including daily return, Sharpe ratio, and accuracy. The current study differs from the study by Fischer and Krauss in that we do not generate trading rules from our model, so performance measures such as daily return and Sharpe ratio are not suitable metrics for comparison in this paper. However, the hybrid model presented in the current study can be compared to the model of Fischer and Krauss in terms of accuracy. They report that their proposed LSTM model achieves an accuracy of 54.3%, whereas the hybrid model integrating fundamental and technical analysis presented in this study yields an average accuracy of 70.36%.

Moreover, it should be noted that we use data from Technology companies because the hybrid model presented in this study integrates the fundamental and technical analysis of the stock market. Combining data from different sectors (e.g., Technology, Financials, Oil & Gas) adds noise to our data, since companies from different sectors may report different financial ratios in their financial statements. Therefore, we use data from one sector only in our hybrid model to reduce this noise. The LSTM model of Fischer and Krauss, in contrast, performs technical analysis only (past historical prices) and does not take the fundamental analysis of the stocks into account; hence, they can use data from the S&P 500 regardless of the sector a company belongs to without increasing the noise. Nevertheless, for comparison purposes, we also run our model on the same data with the same settings as in [34] and obtain an average directional accuracy of approximately 59.92%, which is still greater than the accuracy of the LSTM model in [34] (i.e., 54.3%).
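
Directional accuracy is the metric shared across these comparisons; a minimal sketch, under the convention (our assumption) that a prediction counts as correct when the predicted and realized returns have the same sign:

```python
import numpy as np

def directional_accuracy(y_true, y_pred):
    # Fraction of periods where the predicted and realized returns move in
    # the same direction (both up or both down).
    return float(np.mean(np.sign(y_pred) == np.sign(y_true)))
```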

7.3 Hierarchical Graph Attention Network

Prior to the methodology presented by Kim et al. in [35], no existing work had incorporated the impact of utilizing various types of relations on stock movement prediction or determined an effective way of selectively aggregating information over various relation types. Kim et al. present a hierarchical attention network for stock movement prediction (HATS) to address this gap in the literature. Their methodology includes a feature extraction module based on the LSTM described in the previous subsection and presented in [34], a relational modeling module based on a graph neural network (GNN), and a prediction layer which predicts the future stock movements.

In the hierarchical attention network, the first, state attention layer of HATS extracts significant information from a set of neighboring nodes. Different weights are then calculated based on the current state of each neighboring node, taking into account the relation-type embedding vector \({\mathrm{e}}_{{\mathrm{r}}_{\mathrm{m}}}\) and the node representations. The attention score is calculated as:

$${\mathrm{u}}_{\mathrm{ij}}={\mathrm{x}}_{\mathrm{ij}}^{\mathrm{rm}}{\mathrm{W}}_{\mathrm{s}}+{\mathrm{b}}_{\mathrm{s}}$$
(39)
$${\mathrm{a}}_{\mathrm{ij}}^{\mathrm{rm}}=\frac{\mathrm{exp}\left({\mathrm{u}}_{\mathrm{ij}}\right)}{\sum_{\mathrm{k}}\mathrm{exp}\left({\mathrm{u}}_{\mathrm{ik}}\right)},\mathrm{ k}\in {\mathrm{N}}_{\mathrm{i}}^{\mathrm{rm}}$$
(40)

where \({\mathrm{x}}_{\mathrm{ij}}^{\mathrm{rm}}\in {\mathrm{R}}^{\left(2\mathrm{f}+\mathrm{d}\right)}\) is the concatenated vector, and \({\mathrm{W}}_{\mathrm{s}}\in {\mathrm{R}}^{\left(2\mathrm{f}+\mathrm{d}\right)}\) and \({\mathrm{b}}_{\mathrm{s}}\in \mathrm{R}\) are learnable parameters. The representation of relation \(\mathrm{m}\) for company \(\mathrm{i}\) can be written as:

$${\mathrm{s}}_{\mathrm{i}}^{\mathrm{rm}}={\sum}_{\mathrm{j}\in {\mathrm{N}}^{\mathrm{rm}}}{\mathrm{a}}_{\mathrm{ij}}^{\mathrm{rm}}{\mathrm{e}}_{\mathrm{j}}$$
(41)

where \({\mathrm{e}}_{\mathrm{j}}\) represents the embedding of the \(\mathrm{j}\)th neighboring node. \({\mathrm{s}}_{\mathrm{i}}^{\mathrm{rm}}\), \({\mathrm{e}}_{\mathrm{j}}\), and \({\mathrm{e}}_{{\mathrm{r}}_{\mathrm{m}}}\) are concatenated to form \({\widetilde{\mathrm{x}}}_{\mathrm{i}}^{\mathrm{rm}}\), which is used as input to the relation attention layer:

$${\widetilde{\mathrm{u}}}_{\mathrm{i}}^{\mathrm{rm}}={\widetilde{\mathrm{x}}}_{\mathrm{i}}^{\mathrm{rm}}{\mathrm{W}}_{\mathrm{r}}+{\mathrm{b}}_{\mathrm{r}}$$
(42)
$${\widetilde{\mathrm{a}}}_{\mathrm{i}}^{\mathrm{rm}}=\frac{\mathrm{exp}\left({\widetilde{\mathrm{u}}}_{\mathrm{i}}^{\mathrm{rm}}\right)}{\sum_{\mathrm{k}}\mathrm{exp}\left({\widetilde{\mathrm{u}}}_{\mathrm{i}}^{\mathrm{rk}}\right)}$$
(43)

where \({\mathrm{W}}_{\mathrm{r}}\) and \({\mathrm{b}}_{\mathrm{r}}\) are the parameters. Then, the weighted vectors of each relation type are added to build an aggregated relation representation:

$${\mathrm{e}}_{\mathrm{i}}^{\mathrm{r}}={\sum}_{\mathrm{k}}{\widetilde{\mathrm{a}}}_{\mathrm{i}}^{\mathrm{rk}}{\mathrm{s}}_{\mathrm{i}}^{\mathrm{rk}}$$
(44)

Finally, the representation of a node is obtained by adding the aggregated relation representation to the node's own embedding:

$${\overline{\mathrm{e}} }_{\mathrm{i}}={\mathrm{e}}_{\mathrm{i}}^{\mathrm{r}}+{\mathrm{e}}_{\mathrm{i}}$$
(45)

Once the HATS is developed, it is used for predicting the stock movements of the companies. The performance metrics used by Kim et al. in [35] for evaluating their proposed model, both on portfolios of stocks from different sectors and on individual companies, are return, Sharpe ratio, and accuracy. Since the current study neither builds a portfolio for evaluating the proposed hybrid model nor generates buy/sell trading strategies, performance measures such as return and Sharpe ratio are not appropriate metrics for the comparison between our hybrid model and HATS [35]. However, accuracy can be used to compare the performance of the two models.

Moreover, the other main difference between our methodology and that of Kim et al. is the type of analysis used. In [35], only the historical stock prices (after feature extraction) are used as input to the predictive model, whereas the hybrid model presented in the current study uses the financial ratios of the companies in addition to the historical stock prices as the primary input variables. Collecting data from companies belonging to different sectors would therefore add noise to our data, so we collect data from companies within the same sector to reduce this noise. For comparison purposes, however, as mentioned in the previous subsection, we use the same experimental protocol as in [35]. The numerical results reveal that our proposed hybrid model, based on integrating the technical and fundamental analysis of the stock market, yields under this protocol an accuracy of approximately 48.36%, which is still greater than that of the HATS model (approximately 40% on average). This indicates that the combination of technical and fundamental analysis carries more information than individual models based solely on technical or fundamental analysis.

8 Conclusions

Predicting future stock trends is important for investors who seek a return on the capital invested. This is a complicated and difficult task because the stock market is affected by many factors. AI-based techniques, including neural networks, have been successfully applied to stock market prediction problems. In this study, a multilayer feedforward perceptron neural network (MLP) is adopted to perform fundamental and technical analyses and predict short-term stock trends. In addition, a nonlinear autoregressive network with exogenous inputs (NARX) is used to integrate the two primary stock market analyses into a hybrid model. Twelve significant financial ratios are selected through a feature selection preprocess, and the selected ratios are transformed into discretized counterparts by repeating the MLP hyper-parameter optimizations. The number of lagged stock prices used as input is determined through an autocorrelation preprocess. Moreover, the historical stock prices are grouped into clusters, and the dimensionality of the input space is reduced using a self-organizing map.

A numerical comparison of the hybrid model with regression, ANN, and SVM models is performed, and it is shown that our hybrid model outperforms the other three models in terms of the correlation coefficients on the training and testing datasets. This study shows that the fundamental and technical models are capable of predicting short-term stock trends with accuracies of 64.38% and 62.85%, respectively. Stock price prediction can be enhanced by a hybrid model integrating fundamental and technical analysis, which yields an average directional accuracy of 70.36%, greater than those of the state-of-the-art models. These accuracy values are satisfactory given the difficulty of predicting market data. Cross-validation verifies the quality of our hybrid model, and the data fitting of this model is precise. The numerical results support the overall efficiency of the hybrid model; it achieves the best performance except on stocks with a downward trend. The advantages of the hybrid model are that (1) it is easy to implement and operate and (2) it is computationally inexpensive.

In future work, a supervised clustering method can be studied and compared to the existing (unsupervised) classification systems. The prediction accuracy of our model can also be examined in a portfolio scheme built from stocks selected from the supervised clusters. The proposed model can also be applied to time series in other disciplines and branches of science, including physics and biology.