This section derives a framework that portfolio managers can use to invest in stocks over different time horizons so as to maximize investors' gains. The analysis spans multiple sectors so that the portfolio is diversified and investors' risk is reduced.
As discussed earlier, investors take different perspectives when investing their funds for profit. Share markets are highly sensitive, and numerous factors, directly or indirectly, shape market sentiment (Gottschlich and Hinz 2014). One group of investors foresees an appreciation in stock valuation, so they buy stocks at the current market price, expecting to sell them later and book a profit. At the same time, another group assumes that stock prices will fall, so they sell at the current market price and may buy again later at dips. In reality, many sellers and buyers coexist in the market because they perceive future stock valuations differently. Eventually, one group of investors benefits and the other may incur a loss. Many statistical techniques and time-series analyses have been applied to this problem with mixed success. We use well-tested regression techniques as a starting tool to screen individual stocks, together with domain knowledge to bucket company stocks into industrial sectors. The objective, therefore, is to propose an analytical approach that predicts the share prices of different companies and invests the total capital across diversified sectors, earning better profit percentages while mitigating overall risk. Moreover, we aim to maximize profit over time, since investors prefer higher returns on long-term investments.
Analysis of different statistical methods and selection of a suitable one
Statistics being the body of methods for studying numerical data, the first step in any statistical inquiry is the collection of relevant numerical data. Once the data are collected, a knowledge-discovery process can analyze their behavior. However, because datasets differ, a statistical method suited to the particular application must be identified before the data are processed.
In the present work, one of the primary challenges is finding a method that helps forecast the companies' share values. In reality, it is impossible to predict share values accurately, because many issues control the movement of the share market. Many parameters are not directly related to the rise or fall of the market but affect it indirectly, through the parameters that do change share values directly. New events (e.g., COVID-19, terrorist attacks) that are unknown at the time of analysis may also arise in the future. The proposed prediction model is therefore based on historical share prices, as these reflect the effects of all events and parameters that determine stock prices.
Several statistical methods are useful for prediction, such as standard deviation, linear regression, non-linear regression, correlation, and time-series analysis. Standard deviation is the simplest of these, and its result is used only for basic calculations. It gives a high error rate on complex data, so it is not suitable when higher precision is required.
In correlation analysis, the concern is the mutual relationship between two variables. It measures their interdependence through a quantity known as the correlation coefficient.
Because of the close connection between the correlation coefficient and linear regression, the former is a satisfactory measure of the strength of the relationship between two variables only when that relationship is linear. Hence, a low correlation coefficient does not rule out the possibility that the variables are related in some other way. In the capital market, the correlation coefficient may well be small, and with real financial time-series data it cannot be guaranteed to stay above any particular value at all times. Because of this limitation, correlation is not used as a prediction tool in the present work.
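The claim that a low correlation coefficient does not rule out a strong non-linear relationship can be checked with a small numerical example. The sketch below is a plain-Python Pearson coefficient written for illustration (the function name `pearson_r` is hypothetical). It uses values where y is completely determined by x through y = x², yet the coefficient is exactly zero, because the relationship is symmetric rather than linear.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]  # y is fully determined by x, but not linearly
print(pearson_r(xs, ys))  # 0.0
print(pearson_r(xs, [2 * x + 1 for x in xs]))  # close to 1.0 for a linear relation
```

The same x values give a coefficient of essentially 1 once the dependence is made linear, which is exactly the asymmetry the paragraph above describes.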
In the share market, company stock prices change over time without following any specific pattern. Different companies follow different patterns of change, and a particular company may also show varying patterns over time. It is therefore necessary to understand the pattern of change in the share price. When share values are plotted against time, different types of curves are generated. Mathematical analysis recognizes several standard forms of curves, and we look for the one that best fits the given pattern.
Several forms of curves are used in mathematical analysis; the commonly used ones are listed in Table 1. The best-fitting form is identified by comparing the actual values with the values generated by solving the regression equation associated with each type of curve. In this problem domain, the independent variable is time (x) and the dependent variable is the closing stock price (y).
Table 1 Different well-known curves with their general form (equation)
This comparison yields an error value for each curve in Table 1. The curve with the minimum error value is identified as the best-fitted curve to represent the share movement of that particular company.
Regression analysis is very useful here, because it supports prediction and yields the fitted curves from which the error values are computed. In regression analysis, one of the two variables (say x) is treated as the independent variable and the other (y) as the dependent variable, and the objective is to investigate how y depends on x. The main task in linear regression is to express the relationship between y and x through a mathematical function such as a linear equation; only then can the resulting equation be used to predict y in terms of x.
Non-linear regression is another variant of regression. Unlike linear regression, it has no hard and fast rule requiring the relationship between the two variables to follow a linear pattern; the relationship it models is non-linear.
Methodology
One of the main objectives is to predict the share price of an individual company. Actual share prices are plotted at 1-month intervals up to the current year. The present-day stock price of a company is estimated by regression on the historical price data (say, the last p monthly closing prices). The regression method is applied to each type of curve-fitting model, and the resulting estimates are compared with the current (actual) market price. The difference between the estimated price and the actual price is then computed, and this difference expressed as a percentage gives the error rate for each type of curve. The curve with the minimum squared error percentage is identified as the best-fitted curve, and its estimated share value is used for further processing. The best-fit curve then predicts the company's stock price for the next time period. The overall flow of the methodology is depicted in Fig. 1.
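The curve-selection loop described above can be sketched in plain Python. Table 1's exact list of curves is not reproduced here, so the four candidate forms below (linear, exponential, logarithmic, power) are an assumption, chosen because each becomes a straight line after transforming x and/or y, which lets ordinary least squares fit all of them. Fitting the exponential and power forms on log-transformed prices minimizes the error in log space, a common simplification rather than a true non-linear least-squares fit. Months are numbered 1 to p from oldest to newest so that logarithms are defined, and candidates are ranked by in-sample RMSE only; the R-squared check and the percentage-error variant are omitted for brevity. All function names are hypothetical.

```python
import math

# Hypothetical candidate curves (Table 1 is not reproduced here). Each entry is
# (transform of x, transform of y, inverse of the y transform); the curve is fitted
# as the straight line  T_y(y) = a + b * T_x(x)  in the transformed space.
CANDIDATES = {
    "linear": (lambda x: x, lambda y: y, lambda z: z),    # y = a + b*x
    "exponential": (lambda x: x, math.log, math.exp),     # y = A*exp(b*x)
    "logarithmic": (math.log, lambda y: y, lambda z: z),  # y = a + b*ln(x)
    "power": (math.log, math.log, math.exp),              # y = A*x**b
}


def ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_curve(kind, months, prices):
    """Fit one candidate curve; return it as a function month -> price."""
    tx, ty, inv = CANDIDATES[kind]
    a, b = ols([tx(m) for m in months], [ty(p) for p in prices])
    return lambda m: inv(a + b * tx(m))


def rmse(model, months, prices):
    """Root mean square error of the fitted curve over the observed months."""
    sq = [(model(m) - p) ** 2 for m, p in zip(months, prices)]
    return math.sqrt(sum(sq) / len(sq))


def best_fit(prices):
    """Return (name, model, RMSE) of the lowest-RMSE candidate.

    prices[0] is the oldest monthly close; months are numbered 1..p so that
    logarithms of the time variable are defined.
    """
    months = list(range(1, len(prices) + 1))
    scored = []
    for kind in CANDIDATES:
        model = fit_curve(kind, months, prices)
        scored.append((rmse(model, months, prices), kind, model))
    err, kind, model = min(scored, key=lambda r: r[0])
    return kind, model, err


# Hypothetical series growing by exactly 2% per month over 12 months.
prices = [100 * 1.02 ** t for t in range(1, 13)]
kind, model, err = best_fit(prices)
print(kind)                 # exponential
print(round(model(13), 2))  # 129.36, the forecast for the next month
```

Extrapolating the chosen model to a later month (for example p + 3 or p + 6) gives the kind of horizon-specific forecast the methodology calls for.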
To improve the prediction, the value generated so far by the best-fit regression curve is refined by an error-estimation technique, discussed below, that removes part of its error.
In the share market, share prices from recent years are more influential than those from earlier years. However, a long-term analysis is still required to understand the trend of a particular stock. With these considerations in mind, the proposed formulation uses data over a longer period but gives a higher impact to recent prices: it puts more weight on recent periods and gradually decreases the weight for earlier ones. After an error measure is calculated, the predicted value is adjusted by this error for a better estimate.
This produces a set of predicted values for different companies, which forms the input dataset for the subsequent steps. The fund must then be allocated across the market so that the net return is comparatively high. Investing the total fund in a single company does not always maximize returns, because a company showing a high growth rate might not sustain it in the future. Conversely, companies with a low profit percentage at present might show better returns in the future owing to factors such as new product launches, new investment from investors, or acquisitions. Therefore, the capital should be spread across many companies (diversified investment) belonging to diverse industry sectors, to maintain good returns while reducing risk. The proposed strategy is to diversify across many sectors for better portfolio management.
After the share values are predicted, the companies are clustered sector-wise for diversified fund allocation. Because different sectors grow at different rates, we propose a mathematical approach to allocate funds sector-wise.
Before funds are allocated, the growth rate of every company in a sector must be calculated. Growth rates are computed over the same time periods for all companies, with more emphasis on recent growth than on older periods. A weighting factor is set for each previous period and multiplied by the corresponding growth rate; the sum of these products gives the overall growth of that company. This calculation is repeated for all companies within a sector. The mean growth rate of all the companies in the sector, excluding those with negative growth, then reflects the overall growth rate of that industry sector, and the net growth rate of every sector is calculated in the same way. The philosophy is to allocate a bigger share of the fund to high-growth sectors and a smaller share to moderately growing ones. The same logic is applied when selecting candidate companies within an industry sector.
Algorithm for diversified fund allocation across companies and sectors to maximize the net return
In the proposed methodology, the current share value is predicted from the data of the previous p months. The month-wise weight \(Y_i\) is used at several points in the algorithm to weight the impact of price rises and falls over consecutive periods: when computing the error values and when computing the growth rate of an individual company. For the ith month it is calculated using Eq. (1):
$$\begin{aligned} Y_i = \frac{2(p-i+1)}{p(p+1)}, \qquad i = 1, \ldots , p. \end{aligned} \tag{1}$$
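Eq. (1) assigns linearly decreasing weights, from \(Y_1\) for the most recent month down to \(Y_p\) for the oldest, and because \(\sum_{i=1}^{p}(p-i+1) = p(p+1)/2\) the weights always sum to 1. The short sketch below (the function name `month_weights` is hypothetical) computes them with exact fractions so this property can be checked directly; for p = 4 it gives 0.4, 0.3, 0.2 and 0.1.

```python
from fractions import Fraction


def month_weights(p):
    """Return Y_1..Y_p from Eq. (1); Y_1 belongs to the most recent month."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    return [Fraction(2 * (p - i + 1), p * (p + 1)) for i in range(1, p + 1)]


weights = month_weights(4)
print([float(w) for w in weights])  # [0.4, 0.3, 0.2, 0.1]
print(sum(weights))                 # 1
```

Because the weights sum to 1, any weighted sum built from them (such as the error and growth measures in the algorithm) stays on the same scale as the individual monthly values.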
Different industry sectors are identified by consulting financial news sources and the NSE and BSE web portals, which also list the companies that belong to each sector. Historical stock prices of those listed companies are likewise available from the NSE and BSE portals (BSE 2019b).
STEP 1: [PREDICTION OF STOCK PRICE OF SELECTED COMPANIES]
1.A: [IDENTIFYING THE BEST-FITTED CURVE AND GENERATING THE DATA SET]
- 1.A.i: Collect the historical stock prices of a company for the last p months. This constitutes the initial dataset for the model.
- 1.A.ii: Solve the different curve-fitting models by a regression method to predict the stock price at different time periods between p months ago and 0 months (the present), generating a series of predicted share values for that company across those periods.
- 1.A.iii: Calculate the percentage deviation of the forecast values from the actual values.
- 1.A.iv: Choose as best fitted the curve with the lowest deviation (Root Mean Square Error, RMSE) and the highest R-squared value, then put the predicted values of that curve into the dataset.
- 1.A.v: Using the best-fitted curve, predict the share value after a given period, i.e., when the fund would need to be withdrawn (e.g., after 3, 6, or 12 months), using historical data for which the actual prices are also known. Insert the predicted share value into the dataset.
- 1.A.vi: Repeat steps 1.A.i to 1.A.v for every company to generate the dataset.
1.B: [FINE-TUNING OF PREDICTED STOCK PRICE BY ERROR ESTIMATION]
- 1.B.i: Calculate the deviation of the predicted share value from the actual value at each considered time period.
- 1.B.ii: Suppose the deviations at the p different time periods are \(d_1, d_2, \ldots , d_p\), where \(d_k\) is the deviation of the predicted share value from the actual share value k periods (months) earlier.
- 1.B.iii: Calculate the Company Net Error (CNE) by the following formula:
$$\begin{aligned} CNE_j= Y_1d_1 + Y_2d_2 + \cdots + Y_kd_k + \cdots +Y_pd_p, \end{aligned}$$
where \(CNE_j\) is the Company Net Error of the \(j\)th company (\(j=1, \ldots , m\)), and the weight \(Y_k\) for each time period is calculated using Eq. (1).
- 1.B.iv: Add \(CNE_j\) to the corresponding predicted share value of the company to obtain the FPV (Final Predicted Value), and insert the FPV into the dataset.
- 1.B.v: Repeat steps 1.B.i to 1.B.iv until the FPV of every company has been calculated.
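Steps 1.B.i to 1.B.iv can be sketched as follows. The text does not fix the sign of the deviation, so this sketch assumes \(d_k = \text{actual} - \text{predicted}\); under that convention, adding \(CNE_j\) shifts the prediction in the direction in which the model has recently been wrong. The function names and the sample numbers are hypothetical.

```python
def company_net_error(deviations):
    """CNE_j = Y_1 d_1 + ... + Y_p d_p, with deviations[0] = d_1 (most recent).

    Assumption: each d_k is (actual price - predicted price) k months ago.
    """
    p = len(deviations)
    weights = [2 * (p - k + 1) / (p * (p + 1)) for k in range(1, p + 1)]  # Y_k, Eq. (1)
    return sum(w * d for w, d in zip(weights, deviations))


def final_predicted_value(predicted, deviations):
    """FPV = regression prediction + CNE (step 1.B.iv)."""
    return predicted + company_net_error(deviations)


# Hypothetical example: the model under-predicted in the two most recent months.
deviations = [4.0, 2.0, -2.0, 0.0]  # d_1 .. d_4
print(company_net_error(deviations))             # approximately 1.8
print(final_predicted_value(250.0, deviations))  # approximately 251.8
```

Because recent months carry the largest weights, the correction is dominated by the latest errors, matching the emphasis on recent prices described earlier.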
STEP 2: [BUSINESS DOMAIN IDENTIFICATION]
Group all listed companies into industry sectors or domains according to their business areas, with the help of domain experts.
STEP 3: [SELECTION OF SECTORS AND THEIR COMPANIES FOR INVESTMENT]
3.A: [COMPANY-WISE WITHIN A SECTOR]
- 3.A.i: Pick a company from a particular sector.
- 3.A.ii: Find the percentage growth rate of the company for each time period with respect to the immediately preceding month, from the initial time period (say, p months ago) up to the present. Considering the current month as the 0th month, let \(Gr_i\) (\(i = 1, \ldots , p\)) be the growth rate between the ith and the \((i-1)\)th previous months, i.e., the growth of the \((i-1)\)th month with respect to the month immediately before it, the ith month. The growth rates of a company from the present back to p months earlier are thus \(Gr_1, Gr_2, \ldots , Gr_p\). To give current growth more impact than older growth, they are combined with the weights of Eq. (1) as follows.
- 3.A.iii: Calculate the Company Net Growth Rate (CNGR) by the following formula:
$$\begin{aligned} CNGR_j=Y_1Gr_1 + Y_2Gr_2 +\cdots + Y_iGr_i + \cdots +Y_pGr_p, \end{aligned}$$
where \(CNGR_j\) is the Company Net Growth Rate of the \(j\)th company (\(j=1, \ldots , m\)) and \(Y_i\) is calculated using Eq. (1).
- 3.A.iv: Repeat steps 3.A.i to 3.A.iii until the CNGR of every company in that sector has been calculated.
- 3.A.v: Consider only companies with a positive growth rate for investment, and discard all companies with a negative growth rate for that time period.
3.B: [SECTOR-WISE]
Calculate the net growth rate of a sector as the mean growth rate (CNGR) of all the companies in that sector retained in step 3.A.v.
3.C: [LOOP]
Repeat steps 3.A and 3.B for each sector.
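Steps 3.A and 3.B can be sketched for a single sector as below. The growth-rate convention follows step 3.A.ii: each company's prices are listed from the current month (index 0) backwards, and \(Gr_i\) is the percentage change from month i to month \(i-1\). The sector figure is the mean CNGR of the companies that survive the positive-growth filter of step 3.A.v. Function names, company names and prices are hypothetical.

```python
def growth_rates(prices):
    """Gr_1..Gr_p in percent; prices[0] is the current month, prices[i] is i months ago."""
    return [(prices[i - 1] - prices[i]) / prices[i] * 100 for i in range(1, len(prices))]


def company_net_growth_rate(prices):
    """CNGR_j = Y_1 Gr_1 + ... + Y_p Gr_p with the weights of Eq. (1)."""
    gr = growth_rates(prices)
    p = len(gr)
    weights = [2 * (p - i + 1) / (p * (p + 1)) for i in range(1, p + 1)]
    return sum(w * g for w, g in zip(weights, gr))


def sector_net_growth(sector_prices):
    """Return (sector growth rate, {company: CNGR}) using only positive-CNGR companies."""
    cngr = {name: company_net_growth_rate(prices) for name, prices in sector_prices.items()}
    retained = {name: g for name, g in cngr.items() if g > 0}  # step 3.A.v
    if not retained:
        return 0.0, {}
    return sum(retained.values()) / len(retained), retained  # step 3.B


# Hypothetical sector with p = 2 months of growth history per company.
sector = {
    "A": [110.0, 100.0, 80.0],   # +10% last month, +25% the month before
    "B": [90.0, 100.0, 100.0],   # negative CNGR, discarded
    "C": [105.0, 100.0, 100.0],  # +5% last month, flat the month before
}
growth, kept = sector_net_growth(sector)
print(sorted(kept))  # ['A', 'C']
print(growth)        # mean of about 15.0 (A) and 3.33 (C)
```

Returning the retained companies alongside the sector figure keeps the company-level growth rates at hand for the company-wise allocation in Step 4.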
STEP 4: [ALLOCATION OF FUND]
When allocating the fund, the motive is to place more of it in sectors and companies with higher growth rates than in those with lower growth rates, so as to increase overall profit. Let the overall fund be F.
4.A: [SECTOR-WISE]
- 4.A.i: Find the Sector Multiplying Factor (SMF) by the following formula:
$$\begin{aligned} SMF=\frac{100}{\sum _{i=1}^n G_i}, \end{aligned}$$
where \(G_i\) is the growth rate of sector \(S_i\) and n is the number of sectors selected for investment.
- 4.A.ii: Determine the share of the fund to be invested in each sector by the formula
$$\begin{aligned} SA_i = G_i \cdot SMF, \end{aligned}$$
where \(SA_i\) denotes the sector-wise percentage allocation. Because \(SA_i\) is a percentage, the sector-wise fund allocation is
$$\begin{aligned} SFA_i = \frac{F \cdot SA_i}{100}. \end{aligned}$$
- 4.A.iii: Repeat steps 4.A.i to 4.A.ii for all the selected sectors.
4.B: [COMPANY-WISE]
Repeat for each sector \(S_i\), \(i=1, \ldots , n\). Let sector \(S_i\) consist of m companies \(C_1\) to \(C_m\) with growth percentages \(g_1\) to \(g_m\), respectively.
- 4.B.i: Find the Company Multiplying Factor (CMF) by the following formula:
$$\begin{aligned} CMF=\frac{100}{\sum _{j=1}^m g_j}, \end{aligned}$$
where \(g_j\) is the growth rate of company \(C_j\) (\(j=1, \ldots , m\)) in the sector.
- 4.B.ii: Determine the fund to be invested in each company of sector \(S_i\) by the formula \(CA_k = g_k \cdot CMF\) (\(k = 1, \ldots , m\)), where \(CA_k\) denotes the company-wise percentage allocation. The company-wise fund allocation is then
$$\begin{aligned} SCA_k = \frac{SFA_i \cdot CA_k}{100}. \end{aligned}$$
- 4.B.iii: End of the repeat loop of step 4.B.
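Step 4 applies the same proportional rule twice: first across sectors (SMF, \(SA_i\), \(SFA_i\)) and then, inside each sector, across its retained companies (CMF, \(CA_k\), \(SCA_k\)). The sketch below expresses both levels with one hypothetical helper, `proportional_split`, treating \(SA_i\) and \(CA_k\) as percentages and dividing by 100 accordingly. The fund size and growth figures are invented for illustration, with each sector's growth set to the mean of its companies' growth as in step 3.B.

```python
def proportional_split(amount, growth):
    """Split `amount` in proportion to positive growth rates.

    multiplier = 100 / sum(growth)         (SMF or CMF)
    share_pct  = g * multiplier            (SA_i or CA_k, in percent)
    allocation = amount * share_pct / 100  (SFA_i or SCA_k)
    """
    total = sum(growth.values())
    if total <= 0:
        raise ValueError("growth rates must sum to a positive value")
    multiplier = 100 / total
    return {name: amount * (g * multiplier) / 100 for name, g in growth.items()}


def allocate_fund(fund, sector_growth, company_growth):
    """Two-level allocation: sectors first (4.A), then companies within each sector (4.B)."""
    sector_funds = proportional_split(fund, sector_growth)
    return {
        sector: proportional_split(sector_funds[sector], company_growth[sector])
        for sector in sector_growth
    }


# Hypothetical inputs: sector growth G_i and retained company growth g_k (percent).
sector_growth = {"IT": 12.0, "Pharma": 6.0, "Banking": 2.0}
company_growth = {
    "IT": {"IT-1": 9.0, "IT-2": 15.0},
    "Pharma": {"PH-1": 6.0},
    "Banking": {"BK-1": 1.0, "BK-2": 3.0},
}
plan = allocate_fund(1_000_000, sector_growth, company_growth)
print(plan["IT"])  # IT receives 60% of the fund, split 9:15 between its two companies
```

Since every level's percentages sum to 100, the company-level amounts add back up to the full fund F, which is a useful sanity check on any implementation of Step 4.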