seMLP: Self-evolving Multi-layer Perceptron in Stock Trading Decision Making

There is a growing interest in automatic crafting of neural network architectures as opposed to expert tuning to find the best architecture. On the other hand, the problem of stock trading is considered one of the most dynamic systems that heavily depends on complex trends of the individual company. This paper proposes a novel self-evolving neural network system called self-evolving Multi-Layer Perceptron (seMLP) which can abstract the data and produce an optimum neural network architecture without expert tuning. seMLP incorporates the human cognitive ability of concept abstraction into the architecture of the neural network. Genetic algorithm (GA) is used to determine the best neural network architecture that is capable of knowledge abstraction of the data. After determining the architecture of the neural network with the minimum width, seMLP prunes the network to remove the redundant neurons in the network, thus decreasing the density of the network and achieving conciseness. seMLP is evaluated on three stock market data sets. The optimized models obtained from seMLP are compared and benchmarked against state-of-the-art methods. The results show that seMLP can automatically choose best performing models.


Introduction
Stock markets are one of the most dynamic systems where a large number of parameters influences the changes in the market [1][2][3]. Traditionally, two methods are popular for predicting trends of markets, (i) analysis of the financial condition, current trends of economic, several national/international issues, and so on (ii) analysis of historical data such as the movement of the stock prices. Historical data can be analyzed using a variety of indexes such as moving average, golden cross, dead cross, etc. Researchers proposed several modeling approaches such as rule-based [4,5], trading pattern [6], classifier-based [7], etc. Recent growing nature of Artificial Intelligence (AI) also opened up new opportunities in computational stock trading. Recently, several soft computing and machine learning-based algorithms such as ANN-based [8,9], Optimized Network-based [10], machine learning fusion [11], have been proposed. Due to the complex nature of the stock market, it is very difficult to design a static intelligent system for stock trading [12].
Artificial neural networks (ANN) have garnered a lot of attention these days due to its accurate predictions in various diverse fields such as medical [13], natural language processing [14] and image recognition [15] among many others. However, it often requires expert tuning of the hyperparameters to be able to generate accurate predictions [16]. Architectures often have to be crafted and subsequently trained and tuned by experts and different permutations have to be tested manually to determine the best topology of the neural network that gives the best results. With a predefined topology, the whole process becomes an adaption of weights to the data and there is no real learning of the topology to best represent knowledge abstraction of the data. In stock trading, if accurate predictions can be obtained using ANNs, investors will be able to detect trend reversals earlier which will allow them to make more informed decisions to take a long or short position in the market. These days, statistical methods such as the ARIMA method [17] are often used for stock prediction. Since ANNs are a universal function approximator [18], it can be used to model stock data better than such statistical methods which are based on moving averages [19,20].
Neural network compression is a way to reduce number of parameters without compromising the accuracy (see Fig. 1). With the daily increasing demands for on-device computation, the demand of optimized neural networks are also increasing. Such compression is useful in IoT devices, drones, self driving cars, and mobile devices. In [21] authors used heuristic search based on threshold for compression. Baysian sparsity based weigh estimation is proposed in [22]. Yu et al. [23] proposed to use a weight pruning method for compression. Recently, Ma et al. [24] proposed network comparison using Baysian optimization. Low-rank decomposition based method replace set of parameters by the lowrank approximation for network reduction [25]. In this types of compression Singular Value Decomposition (SVD) [26] and Canonical Polyadic (CP) decomposition [27] are among popular choices. In this paper, we have proposed a selfevolving intelligent framework based on multi-layer perceptron for decision making in stock trading. The proposed method can automatically design the optimum neural network from the data itself.
The main contributions of the paper are: (a) A self-evolving ANN is proposed that use the human cognitive ability of concept abstraction and GA for automatic creation of neural network for a given data set. (b) A link pruning module is employed for removing unnecessary links in the network. The module allows automatic design of a compressed network without compromising the performance. (c) The method is applied on three publicly available stock trading data set and achieved state-of-the-art performance.
The rest of the paper is organized as follows. Sect. 2 describes the proposed seMLP and the components. Section 3 demonstrates the results and discussion of the results and finally, Sect. 4 concludes the article.

Proposed seMLP
The proposed self-evolving Multi-Layer Perceptron (seMLP) network is a novel neural network that builds its architecture based on the data and hence is capable of adapting its architecture to the data set for better accuracy. seMLP consists of two primary modules. First, it adopts the genetic algorithm (GA) approach to determine the optimal number of hidden layers and neurons per layer to allow the network to better generalize to the data. By allowing the network to discover the best architecture for itself, seMLP can learn the best topology for the data presented without any expert tuning or input. The main goal is to allow the network to be constructed such that the minimum number of hidden layers and neurons are used to derive the most compact and concise model for any data that is fed into the network. Second, a link pruning module is employed to shrink the size of the network by removing links which are redundant and to achieve conciseness and the least density. The framework is depicted in Fig. 2. There are two primary modules (marked with a circle) to produce a compressed neural network from the historical data.
Let D is a data consists of a set of features (x) and an outcome (y). D can be divide into train and test (D t , D v ) any number of times for cross validation. Let S is a search space contains nodes and layers that can form a neural network by taking any number of nodes and layers. Performance   a neural network (p( )) is defined by the mean squared error (MSD) and calculated as 1 2 ∑ n i=1 (ŷ − y) 2 and is cross validated over D . Weight of a neural network ( ( )) of the search space S , ∈ S is defined by the number of nodes, layers, and links present in the network. In this paper Eq. 1 is solved to find the best network for a given data.

GA-based Network Evaluation
Using a genetic algorithm, seMLP is able to identify an optimal architecture for the network. The parameters to be tuned by the genetic algorithm are: (a) number of hidden layers and (b) number of nodes per hidden layer. A constraint is also set such that a subsequent hidden layer should have a smaller number of neurons for the concept of abstraction and this is realized in the fitness function to force the genetic algorithm to favor networks with such characteristics.
Chromosome Encoding: seMLP employs binary encoding for its chromosome. In the chromosome, it contains a layer cost ( ) and also another variable represents the binary form of layers ( ) . This chromosome is the backbone of the individual candidate solution in the population. All the genetic material is contained in this chromosome thus it means the candidate solution is encoded inside the chromosome. From this chromosome, we can derive the architecture of the neural network. For example, the number of hidden layers of a network is equal to the length of or . The number of neurons ( ) in i th hidden layer is defined as (2). Where the value of is the integer value of the binary form of the layer.

Population Initialisation:
The population is initialized randomly to allow for a random distribution of individuals with different candidate solutions. This gives the genetic algorithm a large heterogeneous pool of genes to choose from to ensure that it is more likely to be able to break out of local optima to get closer to the optimal solution. Second, the population initialized must also be big enough so that there are enough representative candidate solutions contained within the initial population and prevent it from getting too homogenous after a few evolutions.

Selection:
The selection operator is used to select individuals to breed for the next generation of network. seMLP has adopted a hybrid approach to the selection where a portion of the fittest candidate solutions from the current generation will be kept and the whole population will also be used for (1) min ∈S (p( ), ( )).
breeding. Thus, the top 10% of the network structure of current generation is always kept and allowed to stay on in the next generation without any modifications. To ensure that the population will get fitter over time, seMLP employs rank weighting. All the individuals in the population is ranked based on the value of their fitness function and fitter with a higher probability. Using rank weighting, it ensures that networks with higher values of fitness function will have a higher probability of being selected for breeding. In contrast, solutions with lower values have a lower chance of being selected. Hence, by selecting fitter individuals for breeding, the next generation will have a higher probability of being fitter than the current generation. The probability of choosing any candidate solution (S) in a population N is calculated as N−rank ∑ n i i . Where rank is the position of S in the population according to fitness.

Crossover:
The crossover operation is one of the most important functions as it decides the future of the population. After the selection phase, the crossover operation chooses two parents and breeds two new children by recombining the chromosomes of the parents. After two parents are chosen, there must be checks to make sure that they are not the same parent as it would mean the children will turn out to be similar. Since the crossover operation is used to create a new children chromosome in GA and the chromosome in seMLP consists of the and , the crossover operation has to be performed on both these values. In seMLP, the singlepoint crossover is used because it is fast, easy to calculate and it generally gives a good candidate solution as a result.
Integer Crossover for : The values of are integers in the chromosome. Hence, the integer crossover [28] is used. A random weight ( ) between [0, 1] is chosen and multiplied with the . The of the two children are re-combined values of the parents. The crossover is defined as: where p1, p2 are parent and c1, c2 are children.
Single Point Crossover for : The in the chromosome is binary encoded. Thus, we can use the single point crossover for binary encoded values. A random point, k, is chosen as the crossover point in the string of binary digits. This ensures that the child carries genetic information from both parents. The single point crossover are defined as: where k is the maximum length of the layers.
Choosing Layers for Crossover: If the number of layers of the chosen parents differ, there is a need to choose exactly which layers are to be involved in the crossover because there is positional significance to the layers. For example, if the first parent has five layers and the second parent has only two layers, there might potentially be information loss if the fourth layer of p1 is crossed with the first layer of p2. This is because the first layer of p2 is likely to be a more generalized layer as it is the first hidden layer while the fourth layer of p1 is likely to be a more concrete layer. Hence, during crossover, the position of the layers selected for crossover should be kept relatively similar between the two chosen parents thus preserving the positional significance of the layer. Therefore, for cases where the number of layers differ between the two parents, we propose a structure for choosing which layers are to be chosen for crossover. First, for the two resultant children, the number of layers for them will be the same as the parents. For example, if parent 1 has 5 layers and parent 2 has 3 layers, child 1 and child 2 will then have five and three layers too. The structure for the crossover is defined by a partition point (p). p is defined in 7. Provided parent p1 always have more than or an equal number of layers compared to parent p2: Subsequently, for each layer in p2, the layers in p1 that could be selected for crossover with the layer in p2 is restricted to the range { i , i + p} . Within that range, the layer to be crossovered with p2 will be randomly selected with equal probability. The child with the same number of layers as p1 will then inherit the rest of the missing layers from p1. If the crossover produces any layer with zero nodes or = 0 then the process is repeated.

Mutation:
The purpose of the mutation operator is to ensure that the population does not get too homogenous which might result in candidate solutions getting stuck in local minima which decreases the possibility of getting a more optimal solution. In seMLP, the mutation is achieved by a parameter called mutation probability ( ) . A higher probability will result in more mutations which will introduce more new genetic material back into the population at a higher rate while a lower probability will introduce new genes at a slower rate. seMLP does not mutate individuals who are in the top 10% of the population in terms of fitness and these are the fittest individuals who are selected to be ⌋ kept without any changes. As there are two components to the chromosome of seMLP which is the and , the mutation rate is also defined for both of them ( , ) and these are in the range [0,1]. In each mutation , is chosen randomly and must be less or equal to the restriction of the mutation probability.

Fitness Function:
The fitness function is the most important parameter for the success of any genetic algorithm. It affects not just the convergence of the genetic algorithm but also the quality of the final solution; whether it is near the global optimum. The fitness function must thus be crafted according to what the desired outcome of the genetic algorithm is. For seMLP it is to discover the best architecture for the input data. Hence, there will be three aspects to the fitness function: (i) The network must hold the property of knowledge abstraction i.e. proceeding hidden layers must have lower or equal number of nodes, (ii) The network must be minimized in terms of number of layers and nodes, and (iii) The network performance must be higher in terms of RMSE or R 2 . Therefore, these goals have to be implicitly encoded into the fitness function for the genetic algorithm to discover the best solution.
First, we define abstraction penalty ( ) to ensure the knowledge abstraction. is defined in (8).
Next, a hidden layer penalty ( ) is introduced to ensure minimum network depth. is calculated as n × c h , where n is number of hidden layer and c h is hidden layer cost, we have taken c h = 0.1 empirically and found suitable in our case. The normalized constant (c) is used to normalized penalty. It is defined in Eq. (9), where is the lower limit of the credit in a generation.
Credit ( ) is the cost of a network, higher cost represents a complex network. For a given network is calculated as ∑ n i=1 i , where n is the number of layers present in the network. Finally, a credit ratio ( ) is calculated as Eq. (10). is normalized between 0 and 1 over a generation.
Outliers' Penalty: Outlier values for the quality metric can be defined as the values which are very far from the distribution of the other values. Mathematically, if the population is fairly normally distributed, it can be defined as values which lie outside more than three standard deviations away from the mean. As seMLP employs feature scaling, any outliers might skew the distribution in favor of the outliers which means that all the other non-outlier values might have a very small range after scaling. This is not ideal as the majority of the values are "squashed" and might affect the accuracy of the fitness evaluation as the credit ratio ( ) might outweigh the quality metric. Therefore, seMLP employs an outlier labeling-based technique using the boxand-whisker plot or more commonly known as the boxplot. The advantage of using a boxplot is that it is computationally fast, does not require any tuning parameters and also uses robust summary statistics that are located at actual data points [29]. Furthermore, it does not make any distributional assumptions and does not depend on any standard deviation or mean. Hence, we can apply the boxplot to detect and remove outliers. Tukey et al. [30] defined the inter-quartile range (IQR) as the interval between the lower quartile (q 1 ) and upper quartile (q 3 ) . The lower quartile (q 1 ) is the 25th percentile while the upper quartile (q 3 ) is the 75th percentile.
The IQR can then be used to identify outliers according to the illustration below: In seMLP, there is only a need to detect outliers at one of the extreme points. For example, if RMSE is the quality metric used, seMLP will only utilize the boxplot method to catch outliers above the upper extreme as the lower the RMSE, the better the quality of the candidate solution. After the outliers are found, they will be removed from the normalization of the quality metric as they will skew the normalization as described above. Although these outliers represent a very bad candidate solution, they are not eliminated from the population immediately by forceful removal. Instead, they are allowed to remain in the population in the last few positions in terms of ranking to maintain the variety in the gene pool. Thus, their fitness function will be calculated in a slightly different manner from the rest of the population. The difference will be in the calculation of the quality metric as the outliers only affect the calculation of that. Hence, the calculation of the quality metric of the outliers must give Lower Extreme = q 1 − (3 × iqr), a worse result compared to the non-outliers, such that the outliers will remain the last few in the population.
First, the minimum fitness of the entire population based on the non-outliers has to be calculated as such based on for RMSE and R 2 respectively. Next, the fitness of the outlier is calculated and an outlier penalty ( ) is introduced as in Eq. (16).
is used to normalize the fitness score. High value of puts lower weight of outliers when we normalize the fitness score. Finally, the fitness score ( ) is calculated using

Link Pruning
After the best architecture of the neural network has been found by the genetic algorithm, seMLP will proceed to prune the links/weights in the network to remove the largest number of links possible without affecting the accuracy of the MLP. In seMLP, pruning of links is done based on the threshold principle. If links that have weights above a certain threshold, they are kept as they are deemed to be important while links at or below the threshold are removed by setting them to zero. To find the threshold, a binary search is employed. The threshold will be found using a binary search based on the RMSE error returned after the threshold is applied. The thresholding is based on percentile ranking so the threshold is set in the range from (0, 100) non-inclusive. For example, if the threshold set is 60%, any links below the 60th percentile of all weights for that layer will be removed. Linear interpolation is also used when the desired percentile lies between two data points (i, j) and i < j . Binary search works for the pruning of links because there is a direct correlation between the RMSE of the neural network against the links. If the threshold is set too high, the RMSE of the neural network increases as too many useful links have been pruned. On the other hand, if the threshold is set too low, the RMSE of the neural network does not change because no useful links have been pruned and thus the accuracy of the network has not been affected. Therefore, this fits the requirements of using a binary search algorithm where the threshold responds proportionally to its performance. This can be illustrated according to the Fig. 3.
After the thresholds for each layer has been found by binary search, the whole network can be pruned. The links between each layer can be removed according to the respective thresholds found. The last layer to the output layer is fully connected and not pruned because it would affect (16) = min(fitness non outliers ) − fitness outliers , the error too drastically. This is similar to the CNN architecture [31], where the output layer will always be fully connected to ensure that all the information flow and the knowledge learned in the network will reach the output thus giving a more accurate result. Any links removed from the preceding layer will not affect the subsequent layer because those links are below the threshold to make any meaningful contribution to the node that it is connected to. Hence, removing it will not affect the performance of the network and the whole network can be pruned.

Results and Discussion
This section, the performance of seMLP for optimizing the network and also performance in stock trading is analyzed.
The experiments are carried out on three different unique stock indexes: (a) S&P 500 Index, (b) NASDAQ Composite, and (c) Korea Composite Stock Price Index (KOSPI). The first two are chosen as they are two of the most heavily followed stock indices in the US stock market and most investors around the world will be concerned about its movement and trends. The KOSPI is the representative stock market index of South Korea and it is chosen because it one of the more popular stock market index in Asia and to give some contrast to the S&P500 and NASDAQ Composite which are stock market indices based in the western world. The details of the data sets are given in Table 1.

Experimental Set-up and Hyperparameter
For the experimental setup of the genetic algorithm portion of seMLP, it requires a definition of the parameters for both the genetic algorithm portion and also for the neural network training in the genetic algorithm. The genetic algorithm portion of seMLP requires certain parameters to be set such as the limits for the number of hidden layers and the number of nodes per hidden layer. These are defined in Table 2.
As the genetic algorithm portion of seMLP takes in neural networks as candidate solutions, there is a need to keep the hyperparameters of the neural networks constant across the whole population and generations. The hyperparameters of the neural network are defined in Table 3.
We have taken input feature length 39 empirically. First, to construct the input vector, the difference between the prices of consecutive days have been calculated using (18), where y(t) is stock price at t.
A sliding window of size 39 is then used over y with a stride of 1 to determine the range of input vectors for the neural network. Thus, each input vector will now contain the delta changes in prices between consecutive days. Second, after preparing the input vectors, each input vectors have to be normalized. This is one of the most important preprocessing steps such that the neural network will be able to learn the trend and changes in prices. Each input vector and corresponding output will be normalized within their own window.      All the experiments are carried out in Intel(R) Xeon(R) Gold 6154 CPU with 128 GB of RAM and NVIDIA Quadro RTX 600 GPU of capacity 64 GB. The data set is divided into two parts. First 80% data is used for training and rest for test and performance evaluation.

Results and Analysis of GA-Based Evolving
This section we present the evaluation of the networks and their performance against generation. We have analyzed the best R 2 and average error in each generation. The average error is defined as the mean of the error between the predicted and correct prices over all individuals in the population of that generation. Figs. 4, 5, and 6 presents such results on three public data sets. It is observed that the average R 2 is gained as 0.00223% , 0.108% , and 0.0720% respectively in S&P, NASDAQ, and KOSPI data set.
It can be seen that for all the three data sets, the best R 2 value improves from generation 0 before converging to its final value. Before it converges, it is often able to break out of local optima and to continue increasing. This demonstrates the ability of the genetic algorithm to find better candidate solutions through mutation and crossovers. The average error for each generation is also decreasing which confirms that the R 2 value is reporting the correct trend of improvement where the predicted prices are getting more accurate.

Optimum Model Selection:
The main motivation of the uses of GA is to find the best network structure for the prediction of the various stock market indices. The best architectures for the three different stock market is chosen based on the performance of the network in terms of R 2 score. The best results of prediction in first and final generation is reported in Table 4. The models producing best R 2 are selected. It is observed that a network with only two hidden layers is sufficient for all the cases but the number of nodes in each layer is different. The best network found by GA as {41, 34} nodes (S&P 500), {8, 3} (NASDAQ), and {20, 12} number of nodes (KOSPI). Thus, drawing conclusions from these genetic algorithm experiments, it can be concluded that in general for stock market indices data sets, only two hidden layers are needed for the depth of the network while the nodes per hidden layer can be kept below 50. This should be sufficient to model most stock market indices and give a reasonably good prediction.

Pruning of Links
After the genetic algorithm in seMLP finds the most compact architecture with the minimum width for the data set involved, seMLP can prune the links in the network so that the network will be the most concise. As all the neural networks built in the genetic algorithm are fully-connected   multi-layer perceptrons, they are the densest possible and this section aims to cull the redundant links in the network. Only the links from the input layer to the last hidden layer is retrieved. The links from the last hidden layer to the output layer is not retrieved and not pruned because experimentally, we have found that culling the links from the last layer substantially decreases the accuracy and thus that last layer should stay fully connected. We have used three neural networks of the same architecture for each data set as three neural networks to predict prices at t + 1 , t + 2 and t + 3 . We have observed that the links of the networks are not similar during pruning. The results of such pruning are depicted in Fig. 7. From the pruning results, typically about 17-94% of links can be pruned between consecutive layers in a neural network. For a complete neural network with all the layers, 14.29-84.90% of total weights can be pruned from the network. This shows that there are actually a lot of redundant links in a neural network as these links are too low in weight to affect the final output of the network. Hence, they can be deleted from the network without affecting the accuracy thus saving space and computation complexity. It is also observed that the S&P500 data set has the highest number of redundant links at 80.35%, followed by the KOSPI at 43.40% and lastly the NASDAQ Composite at 28.77%. This could be because the neural network for the S&P500 data set is the most complex and thus have the largest number of links, therefore, increasing the chances of redundancy. On the other hand, the neural network for the NASDAQ Composite is much less complex with only eight nodes in hidden layer 1 and three nodes in hidden layer 2. Thus, the links in the network will be significantly more important as there are much lesser links in the network. Hence, each link is more likely to play a significant role in the output so the threshold for pruning will be lower.

Benchmark Experiment
To evaluate the results of the genetic algorithm with other neural networks, benchmark experiments were performed using the S&P 500 data set. For this benchmark experiment, the genetic algorithm was used to determine the best topology for the S&P 500 data set. The seMLP neural network generated by the genetic algorithm uses the same hyperparameters as described earlier. Two other multi-layer perceptrons (MLP) with different hidden layer sizes are also designed to use as benchmark comparisons with the seMLP network. The accuracy metrics Non-Dimensional Error Index (NDEI) is used and defined as NDEI = RMSE , is the standard deviation of the correct output values. The benchmark experiments have been carried out on S&P 500 data set using evolving Mamdani fuzzy inference system (eMFIS) [32], Self-reorganizing Fuzzy Associative Machine (SeroFAM) [33], Evolving Fuzzy Neural Network (EFuNN) [34], SeroFAM using An extended Takagi-Sugeno-Kang inference system (SeroFAM-TSK) [35] and dynamic evolving neural-fuzzy inference system (DEN-FIS) [36] are fuzzy neural network (FNN) models which are split into Mamdani and TSK models. All these FNN models and the two ANN models are used as benchmarks against the seMLP model. An increase in NDEI represents an increase in error hence it means the accuracy is worse. The results are summarized in Table 5.
For the two ANN models, the NDEI is the average over ten different trainings selection to ensure an accurate representation of the error. The results indicate that seMLP compares favorably with fuzzy neural network models. It outperformed Mamdani type models which are focused on interpretability but it could not match the TSK models which are the most accurate type of fuzzy neural networks. These results show that the genetic algorithm portion of seMLP is strong and is able to determine a good topology for an MLP network that can hold its own against another more advanced type of neural network architectures such as FNNs. Therefore, since the genetic algorithm is modular, it can potentially be used on other architectures such as on fuzzy   [32] Mamdani 0.0241 SeroFAM [33] Mamdani 0.0272 EFuNN [34] Mamdani 0.1544 SeroFAM-TSK [35] TSK 0.0199 DENFIS [36] TSK 0.0200 seMLP {41, 34} Layers MLP 0.0168

SN Computer Science
neural networks to find the best topology for the data set. This would give a better result for those networks. From the results, it can also be seen that the architecture derived from seMLP outperformed the other artificial neural networks (ANNs) that do not have abstraction built into its topology. The seMLP network has hidden layers of size 13 and 2 for the first and second layers respectively which shows that the hidden layers decrease in size for knowledge abstraction to happen in the network. Benchmarking this with two other ANNs that has the same hidden layer size and another with an increasing hidden layer size, the results show that a multi-layer perceptron (MLP) network with abstraction outperforms other MLP networks that are not constrained for abstraction as evident in the lower NDEI value for the seMLP network. Therefore, it can be concluded that for an MLP network to achieve the best accuracy, knowledge abstraction should happen in the neural network thus designing an architecture with decreasing hidden layer size will aid this abstraction of knowledge and give predictions with higher accuracy.

Adaptability of seMLP
Instead of using fully connected multi-layer perceptrons, seMLP can also be adapted to use other forms of neural networks such as LSTMs. For seMLP, it is a very modular algorithm as it is derived from the genetic algorithm. Hence, any form of the neural network can be fed into the genetic algorithm portion of seMLP and it will be able to find the best architecture of that particular network. If an LSTM is used instead of an MLP, the genetic algorithm will be able to find the best architecture for the number of LSTM cells and best number of layers within each LSTM cell to give the highest accuracy of prediction from the LSTM network. The link pruning module is not applicable here. We have also experimented with an LSTM based model selection applicable to S&P 00 data set. The parameters of the GA is presented in Table 6. The parameters of LSTM based training is reported in Table 7.
The optimizer ends with a LSTM containing three layers with [200,184,125] nodes in the corresponding layers (excluding the input and output layers). The model achieved best R 2 of 0.9882254, and NDEI of 0.0144.

Ablation study
Here, we present the ablation study of the proposed seMLP. We have used S&P500 data set and evaluation metric discussed in Sect. 3.4. The results are summarized in Table 8. We have considered four experimental conditions. First, we use an ANN with two layers and ten nodes in each layer. Next, we use maximum possible layers (6) with maximum possible nodes (255) according to our previous experimental setup. We note that both networks perform poorly. In the third case, we use seMLP produced network without pruning the links. We note that the optimized network is compressed with high accuracy (lower NDEI). Finally, we show that the network can be further compressed without compromising accuracy.

Conclusion
Due to the increasing growth and popularity of artificial neural networks for applications in many industries today, there is a need for neural networks that can adapt its architecture to fit the data thus giving more accurate predictions. Many ANNs today require expert tuning and are unable to represent an abstraction in the data. To address these issues, a novel self-evolving Multi-Layer Perceptron (seMLP) system is proposed that consists of a genetic algorithm that is capable of learning the best architecture for the data to best represent an abstraction of the data. The genetic algorithm in seMLP prioritizes the selection of models with better knowledge abstraction thus it is able to build architectures that have an inherent ability to abstract knowledge from the data. This genetic algorithm proposed is also able to work  with deep neural networks with multiple hidden layers and no pre-determined architecture is required at the start. After finding the best architecture, seMLP then proceeds to prune the network using a thresholding function that prunes the redundant links to produce a final architecture that is the most compact and concise. The results have been encouraging where the network discovered by seMLP outperformed other neural networks without abstraction. Furthermore, this network has been pruned to significantly to leave only the important links. Lastly, the seMLP network is applied to stock trading. The results have shown that using the proposed seMLP a data specific artificial neural network can be generated using GA and also beneficial for a highly dynamic system such as stock trading.
Funding Open access funding provided by UiT The Arctic University of Norway (incl University Hospital of North Norway). This study is not Funded from anywhere.

Compliance with Ethical Standards
Conflict of Interest The authors declare that there is no conflict of interest regarding the publication of this paper.
Ethical Approval This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent
Informed consent was obtained from all individual participants included in the study.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.