Forecasting mortality rates using hybrid Lee–Carter model, artificial neural network and random forest

Inaccurate mortality predictions can cause an insurance company to suffer catastrophic losses and may lead to overpriced premiums that low-earning consumers cannot afford. The ability to forecast mortality rates accurately allows the insurance company to take preventive measures and introduce new policies at reasonable prices. In this paper, several Lee–Carter (LC) based models are used to forecast mortality rates in a case study of the Malaysian population. The LC-ARIMA model and combinations of the LC model with two machine learning (ML) methods, namely random forest (RF) and artificial neural network (ANN), are used to predict the mortality rates of males and females in Malaysia; the LC-Random Forest (LC-RF) hybrid is a new model introduced in this paper. Seventeen years of Malaysian mortality data are used as the dataset for this research. To analyze how the forecasting models perform for other countries, we also determine the model that has the best fit and produces the best forecasted mortality rates for each of the other countries studied. The results show that LC-ANN and LC-ARIMA are the best models for predicting the mortality rates of males and females in Malaysia, respectively. The study also finds that LC-ARIMA is the best performing model in countries that have longer life expectancy and a good healthcare system, such as Sweden, Ireland, Japan, Hong Kong, Norway, Switzerland and Czechia. In contrast, LC-ANN is the best performing model in countries that have a less efficient and less accessible healthcare system and poorer personal health behavior, such as Malaysia, Canada and Latvia.


Introduction
Insurance is an agreement in which, in return for a premium, the insurer agrees to pay a defined amount to the policyholder when a loss occurs [1]. Its function is to give the policyholder financial protection against losses suffered by the insured. Insurance generally comprises general insurance and life insurance. Life insurance covers the risk to the insured's life, while general insurance covers everything apart from life insurance, such as fire insurance and marine insurance. Inaccurate mortality predictions can cause an insurance company to suffer catastrophic losses and may lead to overpriced premiums that low-earning consumers cannot afford. This served as the motivation to develop novel ways of forecasting mortality rates in this research. The ability to forecast mortality rates accurately allows the insurance company to take preventive measures and introduce new policies at reasonable prices.
Lee and Carter [2] formulated a new model in 1992, which became known as the Lee-Carter (LC) model, for long-run mortality forecasting. The model has two components: two age-specific parameters for every age group, and a time-varying effect under which all age-specific central death rates follow the same pattern of stochastic evolution over time. Unlike other methods that assume an upper limit in age, the LC model allows age-specific death rates to decline exponentially without limit. Lee and Carter applied the LC model to predict mortality rates in the US using mortality data from 1933 to 1987. At present, this method is still being applied to predict the mortality rates of populations around the world. For instance, Li et al. [3] applied the LC model and the Li-Lee (LL) model to model and forecast the mortality rates in China and 15 developed countries, namely Italy, Japan, Canada, Norway, Finland, Denmark, the U.K., the U.S., Germany, Spain, Sweden, Switzerland, the Netherlands, France and Austria. Li and Chan [4] used the LC model to analyze time series outliers and predict the mortality index using mortality data from Canada and the USA. Shair et al. [5] compared the accuracy of predicting mortality rates in Indonesia, Singapore, Malaysia and Thailand using the LC model and an extension of the LC model called the functional data (FD) model, which was introduced by Hyndman and Ullah [6]. Ibrahim et al. [7] used the Heligman-Pollard (HP) model and the LC model to predict the mortality rates of Malaysians.
Apart from the above, the LC model has been applied to predict mortality rates in many other countries: by Hernandez and Sikov [8] in Peru, Zili et al. [9] in Indonesia, Basnayake and Nawarathna [10] in Norway, Chavhan and Shinde [11] in India, Calma and Revadulla [12] in the Philippines, Taruvinga et al. [13] in Zimbabwe, and Ngataman et al. [14] and Kamaruddin and Ismail [15] in Malaysia.
In the original LC model developed by Lee and Carter [2], the parameters are estimated using Singular Value Decomposition (SVD). After the parameter values are obtained, the time-varying parameter $k_t$ is forecasted using an Autoregressive Integrated Moving Average (ARIMA) model. In recent years, several researchers have proposed machine learning (ML) algorithms for forecasting the mortality rates of populations around the world. Levantesi and Pizzorusso [16] used ML estimators such as gradient boosting (GB), decision tree (DT) and random forest (RF) to enhance the forecasting and fitting quality of standard stochastic mortality models, namely the LC model, the Renshaw-Haberman (RH) model and the Plat model (PM). Safitri et al. [17] used the LC model as the framework of their study in Indonesia and forecasted the value of $k_t$ using Artificial Neural Networks (ANN). Weng et al. [18] constructed several models using ANN, RF and Cox regression to forecast the mortality rates of the population in the UK. Richman and Wüthrich [19] introduced an extended LC model using neural network (NN) algorithms to study the mortality rates in different countries. Nigri et al. [20] suggested an alternative to the ARIMA process: a deep learning integrated LC model based on a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) architecture to predict the future values of the parameter $k_t$.
Apart from the above, other machine learning (ML) algorithms have been introduced in the literature. These include the Particle Swarm Optimization (PSO)-based ANN by Abdulkarim and Garko [21], logistic regression (LR) model by Allam et al. [22], and Multilayer Perceptron neural network (MLPNN) by Puddu and Menotti [23]. Atsalakis et al. [24] proposed to predict the mortality rate by using the Adaptive Neuro-Fuzzy Inference System (ANFIS) model, whereas Sakr et al. [25] compared seven ML classification techniques, which included DT, RF, ANN, Naïve Bayesian Network (NBN), SVM, K-Nearest Neighbor (KNN), and Bayesian Classifier (BC) to forecast all-cause mortality by applying cardiorespiratory fitness data. Wiemken et al. [26] applied LR, Least Absolute Shrinkage and Selection Operator (LASSO), RF, Recursive Partitioning Tree, Naïve Bayes, and Conditional Inference Tree in forecasting 30-day mortality of hospitalized patients with Community-Acquired Pneumonia (CAP).
In this paper, we apply two machine learning (ML) algorithms, namely RF and ANN, with the aim of improving the predictive ability of the LC model. Specifically, our approach integrates the original LC formulation with RF and ANN to forecast the future evolution of the parameter $k_t$, thereby overcoming the limitations shown by ARIMA processes. We hypothesize that the use of RF and ANN yields mortality forecasts that are more coherent with the observed mortality dynamics, including in cases of non-linear mortality trends.
The ANN and RF models were chosen to be fused with the LC model because both are relatively simple to construct compared to other ML models, as they require fewer parameters. The ANN model was chosen specifically because it is more suitable for predicting non-linear processes than other models. It is therefore used to examine whether the non-linear trend of the re-estimated $k_t$ can be captured after the re-estimation phase. The RF model, on the other hand, was chosen because it has been shown to perform better in classification tasks than other models. Here, it is used to examine the accuracy of the predictions for the time series data used in this study.
The remainder of this paper is organized as follows. In Sect. "Preliminaries", we recapitulate some of the fundamental concepts related to the LC model and other important concepts. In Sect. "Hybrid RF and ANN methods", we introduce our proposed RF and ANN methods to forecast the values of $k_t$. In Sect. "Application to mortality datasets for Malaysia", the original LC model and the proposed RF and ANN methods are applied to the mortality data for Malaysia, and the results are discussed at the end of the section. The historical mortality data for Malaysia from 2000 to 2016 were obtained from the World Health Organization (WHO). In Sect. "Comparative analysis", we further analyze how the forecasting models used in this research perform for 11 other countries, and the results are analyzed at the end of the section. Concluding remarks are given in Sect. "Conclusions", followed by the acknowledgements and the list of references.

Preliminaries
In this section, we recapitulate some important concepts pertaining to the theory of the original LC model. The LC model is a demographic and statistical model that is applied to forecast mortality rates and life expectancy [27]. By combining a simple approach with statistical time series methods, it can be used to predict the level and age pattern of mortality rates in the long run.
The basic premise of this model is a linear relationship between the logarithm of the age-specific death rates $m_{x,t}$ and two factors, namely the initial age of the age interval $x$ and the year $t$. The equation describing this is:

$$m_{x,t} = \exp\left(a_x + b_x k_t + \varepsilon_{x,t}\right) \qquad (1)$$

Taking the natural logarithm of both sides gives:

$$\ln m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t} \qquad (2)$$

where
$m_{x,t}$: the central mortality rate for age group $x$ during year $t$.
$k_t$: the mortality index in year $t$. This would capture 80-90% of the historical mortality trend.
$a_x$: the average age-specific mortality.
$b_x$: the deviation in mortality due to changes in the $k_t$ index. This describes the amount of mortality change at a given age for one unit of total mortality change.
$\varepsilon_{x,t}$: the random error, assumed to follow the normal distribution $N(0, \sigma^2)$.
$w$: the beginning of the last age interval.
Definition 1 [2] The non-parametric estimate of the central mortality rate is given by the ratio:

$$m_{x,t} = \frac{D_{x,t}}{L_{x,t}} \qquad (3)$$

where
$D_{x,t}$: the number of deaths in age group $x$ during year $t$.
$L_{x,t}$: the population of age interval $x$ in year $t$, $x = 1, 2, \ldots, w$.

Definition 2 [2] The average age-specific mortality, $a_x$, is given by:

$$a_x = \frac{1}{n} \sum_{t=1}^{n} \ln m_{x,t} \qquad (4)$$

where $n$ is the number of years in the fitting period. However, $b_x$ and $k_t$ cannot be solved for explicitly, and the model cannot be fitted with general regression methods. Hence, Lee and Carter used a two-stage estimation approach. The SVD is applied to the matrix $\{\ln m_{x,t} - a_x\}$ to estimate the parameters $b_x$ and $k_t$, and the estimated values of $k_t$ then go through a re-estimation process at the second stage. To standardize the matrix undergoing SVD, and to ensure that a unique solution is obtained for $b_x$ and $k_t$ without loss of generality, the following constraints are imposed:

$$\sum_{x=1}^{w} b_x = 1, \qquad \sum_{t=1}^{n} k_t = 0 \qquad (5)$$

That is, the age-specific responses $b_x$ to one unit of total mortality change sum to one, and the mortality index sums to zero over the fitting period.
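Definitions 1 and 2 amount to an element-wise division followed by a row average of logarithms. The following is a minimal Python sketch of those two steps, included for illustration only (the paper's own computations use R). The death counts and population figures are hypothetical toy values, and the function names are not from the paper.

```python
import math

def central_death_rates(deaths, exposures):
    """m[x][t] = D[x][t] / L[x][t] for each age group x (row) and year t (column)."""
    return [[d / l for d, l in zip(d_row, l_row)]
            for d_row, l_row in zip(deaths, exposures)]

def average_log_rates(m):
    """a[x] = average over the years of ln m[x][t]."""
    return [sum(math.log(value) for value in row) / len(row) for row in m]

# Hypothetical toy data: 2 age groups (rows) by 3 years (columns).
deaths = [[10.0, 8.0, 6.0], [50.0, 45.0, 40.0]]
exposures = [[1000.0, 1000.0, 1000.0], [500.0, 500.0, 500.0]]

m = central_death_rates(deaths, exposures)
a = average_log_rates(m)
```

Subtracting `a[x]` from each `ln m[x][t]` then gives the centered matrix to which the SVD is applied.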

Parameter estimation of the LC model
To ensure that the parameters of the LC model are estimated well, the estimation method has to be chosen carefully at the outset. Currently, the SVD, the Maximum Likelihood Estimate (MLE) and the Weighted Least Squares (WLS) method are the three most widely used methods for estimating the parameters of the LC model. In this study, SVD is applied to estimate the parameters of the LC model.

First-stage parameter estimation
From Eq. (4), the parameter vector $\hat{a}_x$ can be computed easily by taking the average of the logarithm of the central death rate over time. In this stage, SVD is applied to the matrix $\{\ln m_{x,t} - a_x\}$ to obtain the estimated values of $b_x$ and $k_t$. The definition of SVD is presented in Definition 3.
Definition 3 [28] Suppose that $Z$ is an $m \times n$ matrix. Then, there exists a factorization of the form

$$Z = U \Sigma V^{T}$$

where
$U$: an $m \times m$ orthogonal matrix.
$\Sigma$: an $m \times n$ matrix with non-negative numbers on the diagonal.
$V^{T}$: the conjugate transpose of the $n \times n$ orthogonal matrix $V$.

Such a factorization is called the SVD of $Z$. The diagonal entries $\sigma_i = \Sigma_{i,i}$, where $i = 1, 2, \ldots, m$, of $\Sigma$ are the singular values of $Z$. These singular values are the square roots of the eigenvalues of $Z Z^{T}$, whose eigenvectors form the matrix $U$. They are listed in descending order, and the diagonal matrix $\Sigma$ is determined uniquely by $Z$.
The factorization yields orthogonal matrices $U$ and $V$, that is, $U U^{T} = I$ and $V V^{T} = I$, where $I$ is the identity matrix of the appropriate size.
From the definition above, the matrix $Z$ has entries

$$Z_{x,t} = \ln m_{x,t} - a_x$$

The matrices $U$, $\Sigma$ and $V^{T}$ are then constructed according to the definition of SVD, which gives

$$Z_{x,t} = \sum_{i=1}^{r} \sigma_i \, u_{x,i} \, v_{t,i}$$

where $r$ is the rank of the matrix $Z$, and $u_{x,i}$ and $v_{t,i}$ are the entries of the $i$-th columns of $U$ and $V$. To estimate the parameters $b_x$ and $k_t$, the LC model uses only the rank $r = 1$ term. In this phase, the estimate of $b_x$ is obtained from the first column of $U$, while the estimate of $k_t$ is obtained by multiplying the first singular value by the first column of $V$. Imposing the constraints stated in (5), where $\sum_{x=1}^{w} b_x = 1$ and $\sum_{t=1}^{n} k_t = 0$, the estimates of $b_x$ and $k_t$ are obtained as follows:

$$\hat{b}_x = \frac{u_{x,1}}{\sum_{x=1}^{w} u_{x,1}}, \qquad \hat{k}_t = \sigma_1 \, v_{t,1} \sum_{x=1}^{w} u_{x,1}$$

Because $a_x$ is the average of $\ln m_{x,t}$ over time, each row of $Z$ sums to zero, so the constraint $\sum_{t=1}^{n} \hat{k}_t = 0$ is satisfied automatically. The values of the parameters $\hat{a}_x$, $\hat{b}_x$ and $\hat{k}_t$ for both males and females that were obtained using the equations above are given in Tables 1, 2
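The first-stage estimation can be illustrated with a small, dependency-free Python sketch. It is not the paper's implementation (which uses R). Instead of a full SVD it uses power iteration, a swapped-in technique that recovers only the leading singular value and vectors, which is all the rank-1 LC fit needs. The toy log rates are hypothetical and are generated from known parameters so that the recovered values can be checked.

```python
import math

def leading_singular_triplet(Z, iters=200):
    """Power iteration for the largest singular value of Z and its singular vectors."""
    n_rows, n_cols = len(Z), len(Z[0])
    # Non-constant start vector: the rows of Z sum to zero, so an all-ones
    # start would be mapped to the zero vector.
    v = [1.0 / (j + 1) for j in range(n_cols)]
    sigma = 0.0
    for _ in range(iters):
        u = [sum(Z[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm_u = math.sqrt(sum(c * c for c in u))
        u = [c / norm_u for c in u]
        v = [sum(Z[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = math.sqrt(sum(c * c for c in v))
        v = [c / sigma for c in v]
    return u, sigma, v

def lee_carter_first_stage(log_m):
    """Return (a, b, k) from a matrix of log rates (rows: ages, columns: years)."""
    a = [sum(row) / len(row) for row in log_m]
    Z = [[value - a_x for value in row] for row, a_x in zip(log_m, a)]
    u, sigma, v = leading_singular_triplet(Z)
    scale = sum(u)  # assumed non-zero; rescales so that sum(b) == 1
    b = [c / scale for c in u]
    k = [sigma * scale * c for c in v]
    return a, b, k

# Hypothetical toy data built from known parameters (sum(b) = 1, sum(k) = 0).
true_a, true_b, true_k = [-5.0, -3.0], [0.4, 0.6], [2.0, 0.0, -2.0]
log_m = [[a_x + b_x * k_t for k_t in true_k] for a_x, b_x in zip(true_a, true_b)]
a, b, k = lee_carter_first_stage(log_m)
```

The rescaling by `sum(u)` also removes the sign ambiguity of the singular vectors, since flipping the sign of both `u` and `v` leaves `b` and `k` unchanged.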

Second-stage parameter estimation
Because the model has fewer parameters than there are observations, it is not guaranteed that the total fitted number of deaths equals the total observed number of deaths. Therefore, to reconcile the fitted data with the actual data, a second-stage estimate of $k_t$ is computed and the results are presented in Tables 5 and 6.
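The second-stage equation is not reproduced in the text above. A minimal Python sketch follows, assuming the usual Lee-Carter condition that, for each year $t$, the re-estimated $k_t$ makes the fitted total deaths $\sum_x L_{x,t} \exp(a_x + b_x k_t)$ equal to the observed total $\sum_x D_{x,t}$. The root is found by bisection, which further assumes that every $b_x$ is positive so that fitted deaths increase with $k_t$. All numbers and function names are hypothetical.

```python
import math

def fitted_deaths(k, a, b, exposure_t):
    """Total deaths implied by the LC model in one year for a given index value k."""
    return sum(L * math.exp(a_x + b_x * k)
               for a_x, b_x, L in zip(a, b, exposure_t))

def reestimate_k(total_deaths, a, b, exposure_t, lo=-100.0, hi=100.0, tol=1e-10):
    """Bisection on k; assumes all b_x > 0 and that the root lies in [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if fitted_deaths(mid, a, b, exposure_t) < total_deaths:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical parameters and exposures for two age groups in a single year.
a = [-5.0, -3.0]
b = [0.4, 0.6]
exposure_t = [1000.0, 500.0]
observed_total = fitted_deaths(1.5, a, b, exposure_t)  # deaths generated with k = 1.5
k_reestimated = reestimate_k(observed_total, a, b, exposure_t)
```

Running this for every year of the fitting period gives a re-estimated series of the kind compared with the original $k_t$ in the tables.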
The original and re-estimated $k_t$ for both males and females are compared in Tables 5 and 6. From the values in Tables 5 and 6, we notice that the parameter $k_t$ declines roughly linearly from 2000 to 2016. From the values of $k_t$ in Tables 2 and 4, $k_t$ decreases at about the same pace during the first half of the period as during the second half. The short-run fluctuations of $k_t$ in the first part of the period are also similar to those in the second. These results are consistent with the findings of Lee and Carter [2] for the total population of the US. The relatively constant variance and linear decline of $k_t$ are very convenient for forecasting purposes.

Hybrid RF and ANN methods
In this section, we present two ML algorithms, RF and ANN, for predicting the future values of the re-estimated $k_t$. The concepts behind our proposed RF and ANN methods and the iterative steps for forecasting the values of the re-estimated $k_t$ are also presented and explained.

Random forest (RF)
Random forest is a supervised learning algorithm that builds multiple decision trees (DT) at the training stage [29,30]. For classification, each decision tree produces a class prediction, and the class with the most votes becomes the prediction of the model at the testing stage. For regression tasks, such as forecasting $k_t$, the outputs of the trees are averaged instead. The visualization of a basic RF model is shown in Fig. 1.
By extracting features from the data collected at the training stage, each DT learns and formulates decision rules that can be used for prediction. A closer fit requires more trees in the forest and more complex decision rules. To overcome overfitting, the common flaw of a single DT, RF trains each tree on a bootstrap sample and aggregates the predictions of all the trees. Bootstrap, in the context of the RF algorithm, refers to drawing samples with replacement, which means that some of the samples will be selected several times.
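Bootstrap sampling as described above can be sketched in a few lines of Python. This is an illustrative example with hypothetical row indices, not the internals of any particular RF package. The rows that are never drawn are the "out-of-bag" rows for that tree.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]

rng = random.Random(0)   # fixed seed so the draw is reproducible
rows = list(range(10))   # hypothetical training row indices
sample = bootstrap_sample(rows, rng)
out_of_bag = [r for r in rows if r not in sample]  # rows this tree never sees
```

Because sampling is with replacement, `sample` has the same length as `rows` but may contain repeats, and each tree in the forest would receive its own independent draw.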
In general, the concept of RF is simple and powerful: it applies the principle behind the wisdom of crowds. In data science terms, the RF model performs well because a large number of relatively uncorrelated models (trees) operating as a committee will outperform any of the individual constituent models.
The procedure to apply RF in forecasting the values of the re-estimated $k_t$ is described as follows:

Step 1: Set train and test data
To train the RF model for time series forecasting, the data are arranged in the pattern $(k_{t-3}, k_{t-2}, k_{t-1}) \rightarrow k_t$, where the first three values correspond to the input nodes and the fourth value defines the desired value for the output node. This training pattern follows the method applied by Nigri et al. [20]. Three years of data are used for the input nodes because they are sufficient to reflect the current trend of the mortality rates and can be used to predict the value for the next year, which is the desired value for the output node.
As the forecast moves forward, the oldest observation is dropped, which ensures that future predictions are not affected by old data. After the training phase, the ML algorithm has learned the input-output functional relationship and should be able to predict future values of $k_t$ using only the inputs.
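The three-in, one-out training pattern and the rolling of the window can be sketched as follows. This is a minimal Python illustration with a hypothetical $k_t$ series. The function names are not from the paper, and the same arrangement would be passed to whichever learner is used.

```python
def make_windows(series, n_inputs=3):
    """Split a series into (inputs, target) pairs using a sliding window."""
    inputs, targets = [], []
    for i in range(len(series) - n_inputs):
        inputs.append(series[i:i + n_inputs])   # three consecutive values
        targets.append(series[i + n_inputs])    # the value that follows them
    return inputs, targets

def roll_window(window, new_value):
    """Drop the oldest value and append the newest one."""
    return window[1:] + [new_value]

k_series = [5.0, 4.2, 3.1, 2.5, 1.0, 0.2]  # hypothetical re-estimated k_t values
X, y = make_windows(k_series)
next_inputs = roll_window(X[-1], y[-1])     # inputs for the first out-of-sample year
```

For multi-step forecasts, each prediction would be fed back through `roll_window` to form the inputs for the following year.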

Step 2: Fit the data
To fit the training pattern from Step 1 with the 'randomForest' package available in RStudio, the number of trees to grow and the number of variables available for splitting at each tree node have to be set in advance. A preliminary round of fine-tuning is carried out based on the plot of the out-of-bag (OOB) error, a measure of the prediction error of random forests, to determine the best combination of hyperparameters for the RF algorithm. The best combination obtained in this step is used in the forecasting procedure.

Step 3: Prediction of the $k_t$ values
In the previous step, RF models were developed for both males and females. In this step, they are used to predict the values of $k_t$ from 2014 to 2016.

Step 4: Evaluation of the model
To verify that the model identified and estimated in the previous steps fits the historical data well, we compute the mean absolute percentage error (MAPE), the root mean square error (RMSE) and the average forecast error (AFE) of the predicted data to measure the accuracy in predicting the mortality rates in Malaysia. These measures are defined in Eqs. (15), (16) and (17). The lower the AFE, MAPE and RMSE, the higher the accuracy of the forecasting model.
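For reference, the following Python sketch implements the textbook definitions of two of these measures, MAPE and RMSE. The paper's own Eqs. (15) to (17) are not reproduced in this text, so the exact forms used there (and the definition of AFE, which is omitted here for that reason) may differ. The numbers are hypothetical.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [100.0, 200.0]     # hypothetical observed values
predicted = [110.0, 180.0]  # hypothetical forecasts
error_mape = mape(actual, predicted)
error_rmse = rmse(actual, predicted)
```

MAPE is scale-free, which makes it convenient for comparing age groups with very different mortality levels, whereas RMSE is expressed in the units of the data and penalizes large errors more heavily.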
The framework for the hybrid LC-RF model is presented in Fig. 2.

Artificial neural network (ANN)
An artificial neural network is an ML algorithm designed to perform pattern classification, categorization, forecasting and function approximation by imitating the way the neural network in the human brain processes information.
The example shown in Fig. 3 is a feed-forward network that comprises three layers, namely the input layer, the hidden layer and the output layer. The input layer receives the input values and passes the information on to the hidden layer for processing. The hidden layer, which can consist of a single layer or multiple layers, performs computations on the information received from the input layer and transfers the result to the output layer once the computations are complete. The output layer is responsible for transferring the results to the outside world.
There are two types of feed-forward networks, namely the single-layer perceptron and the multi-layer perceptron. The single-layer perceptron has no hidden layers and is considered the simplest feed-forward network, while the multi-layer perceptron contains one or more hidden layers. According to Jain et al. [33], information in a feed-forward network moves in one direction only, from the input layer to the output layer through the hidden layer.
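To make the flow of information concrete, the following Python sketch computes the output of a feed-forward network with three inputs, one hidden layer with a single neuron, and one output. The logistic activation in the hidden layer and the linear output are assumptions made for illustration, and the weights are hypothetical rather than fitted values from the paper.

```python
import math

def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, hidden_bias, output_weight, output_bias):
    """Input layer -> one hidden neuron -> one linear output."""
    hidden = logistic(sum(w * x for w, x in zip(hidden_weights, inputs)) + hidden_bias)
    return output_weight * hidden + output_bias

# Hypothetical weights; with all-zero inputs the hidden neuron outputs 0.5.
prediction = forward([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], 0.0, 2.0, 1.0)
```

Training adjusts the weights and biases so that `forward` maps each three-year input window to the value of the following year.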
The procedure to apply ANN in forecasting the values of the re-estimated $k_t$ is described as follows:

Step 1: Normalize the $k_t$ value
To fit the data with the 'neuralnet' package available in RStudio, the data need to be normalized in advance to reduce the differences in the training result. The derived $k_t$ value in this research is normalized by subtracting the minimum value of $k_t$ in the data and dividing by the difference between the maximum and minimum values of $k_t$. The formula for the normalization of the data is given as follows:

$$k_t^{*} = \frac{k_t - \min(k)}{\max(k) - \min(k)}$$

Step 2: Set train and test data
To train the ANN model for time series forecasting, the normalized data are arranged in the pattern $(k_{t-3}^{*}, k_{t-2}^{*}, k_{t-1}^{*}) \rightarrow k_t^{*}$, where the first three values correspond to the input nodes and the fourth value defines the desired value for the output node.

Fig. 4 Framework for the hybrid LC-ANN model
Step 3: Fit the data
The training pattern from Step 2 is set as the input for the 'neuralnet' package available in RStudio. There are some rules of thumb for identifying a suitable number of neurons in the hidden layers, such as the following:
• The number of neurons in the hidden layer is about 70-90% of the number of neurons in the input layer.
• The number of neurons in the hidden layer should be smaller than twice the number of neurons in the input layer.
• The number of neurons in the hidden layer is between the number of neurons in the input layer and the number in the output layer.
A preliminary round of trial and error, based on the error measures, is carried out to decide a suitable number of hidden layers and neurons and thus determine the best combination of hyperparameters for the ANN algorithm. The best combination obtained in this step is used in the forecasting procedure.

Step 4: Prediction of the $k_t$ values
The ANN model developed in the previous step allows us to predict the $k_t$ values from 2014 to 2016. Since the predicted $k_t$ values are normalized, they are converted back to the original scale by multiplying them by the difference between the maximum and minimum $k_t$ values of the data and then adding the minimum $k_t$ value.
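The normalization before training and the conversion of the predictions back to the original scale can be sketched together. This is a minimal Python illustration with hypothetical $k_t$ values, and the function names are not from the paper.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; also return the minimum and maximum for later use."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def denormalize(scaled, lo, hi):
    """Invert min-max scaling: multiply by the range, then add the minimum."""
    return [s * (hi - lo) + lo for s in scaled]

k_values = [-4.0, 0.0, 4.0]               # hypothetical k_t values
k_scaled, k_min, k_max = min_max_normalize(k_values)
k_restored = denormalize(k_scaled, k_min, k_max)
```

The minimum and maximum must come from the training data and be reused unchanged when converting the predictions back, otherwise the forecasts end up on a different scale from the observations.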
Step 5: Evaluation of the model
In this step, the predicted values of $k_t$ generated in Step 4 are evaluated to verify that the model identified in the previous steps fits the historical data well. The MAPE, RMSE and AFE of the predicted data are then calculated using Eqs. (15), (16) and (17) to measure the accuracy in predicting the mortality rates in Malaysia.
The framework for the hybrid LC-ANN model is presented in Fig. 4.

Pros and cons of the proposed methods
The table below summarizes the pros and cons of the standalone methods related to the topic of this research:

Application to mortality datasets for Malaysia
In this section, the LC-ARIMA, LC-RF and LC-ANN models are applied to the mortality datasets for Malaysia for the years 2000-2016, which were collected from the website of the World Health Organization (WHO) at www.who.int. The datasets for the years 2000-2013 are used as the training data, whereas the datasets for the years 2014-2016 are used as the testing data for assessing the accuracy of the forecast results. The following subsections identify the best model for predicting the values of $k_t$ for males and females, and analyze the models that outperformed the other methods in predicting the values of $k_t$ for Malaysia.

Results of the application of the forecasting models to the Malaysian datasets
In this section, the values of $k_t$ are predicted by the chosen forecasting models and the results are summarized.
The predicted values of $k_t$ are used to calculate the central mortality rate $m_x$ using Eq. (1). To identify the model that achieves the highest accuracy in predicting the mortality rates in Malaysia, the MAPE, RMSE and AFE of $m_x$ have been calculated and are summarized in Tables 7 and 8. From the results in Table 7, we can observe that the ANN (one hidden layer with one neuron) model has outperformed the other models in predicting the value of $k_t$ for male mortality based on the error measures. Conversely, the LC-ARIMA (2, 2, 0) without drift model has achieved the lowest error in predicting the value of $k_t$ for female mortality, as is evident from Table 8. From Eq. (1), we can observe that both $a_x$ and $b_x$ are constant when calculating the values of $m_x$ for a given age group, which means that $k_t$ is the only parameter affecting the value of $m_x$ for that age group in different years. Hence, to analyze the decisive factors that enable one model to outperform the others, we further analyze the trend of the re-estimated $k_t$ over the past 14 years, i.e. from 2000 to 2013.
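The step from forecasted $k_t$ values to mortality rates via Eq. (1) can be sketched in Python as follows. The error term is set to zero, which is an assumption made for this illustration, and the parameter values are hypothetical rather than the estimates reported in the tables.

```python
import math

def forecast_rates(a, b, k_forecast):
    """m[x][t] = exp(a[x] + b[x] * k[t]) for each age group x and forecast year t."""
    return [[math.exp(a_x + b_x * k_t) for k_t in k_forecast]
            for a_x, b_x in zip(a, b)]

a = [-5.0, -3.0]               # hypothetical average log rates per age group
b = [0.4, 0.6]                 # hypothetical age-specific sensitivities
k_forecast = [-1.0, -1.5, -2.0]  # hypothetical forecasts for three test years

m_forecast = forecast_rates(a, b, k_forecast)
```

Since `a` and `b` are fixed for each age group, the year-to-year change in every row of `m_forecast` is driven entirely by `k_forecast`, which is why the quality of the $k_t$ forecast determines the quality of the mortality forecast.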
From the plots of the re-estimated $k_t$ for males and females in Fig. 5, it can be observed that the values of $k_t$ for both genders show a decreasing trend. The $k_t$ trend for males fluctuates much more than the $k_t$ trend for females. This explains why ANN has outperformed the other models in predicting the value of $k_t$ for males, because ANN is good at solving non-linear problems. In contrast, the decreasing trend of $k_t$ for females shows less fluctuation. Hence, the ARIMA model, which assumes that the series is generated by a linear process, achieves the highest performance among the models for females.
In addition, from the plots of $\ln(m_x)$ over the past 14 years for males and females in Figs. 6 and 7, we found that $\ln(m_x)$ for males increases markedly between ages 15 and 24, and that the mortality rates for males are relatively higher than those for females. The increase for males at ages 15-24 might add to the uncertainty that causes the non-linear trend of $k_t$. In contrast, $\ln(m_x)$ for females increases relatively smoothly after age 15.
To conclude, the non-linear trend of $k_t$ and the higher fluctuation of $\ln(m_x)$ reflected in the male plots explain why the LC-ANN (one hidden layer with one neuron) model is the best model for males. Conversely, the linear trend of $k_t$ and the lower fluctuation of $\ln(m_x)$ reflected in the female plots are the main factors behind the best performance of the LC-ARIMA (2, 2, 0) without drift model for females.

Analysis of the best model for males and females
In Sect. "Results of the application of the forecasting models to the Malaysian datasets", we concluded that the LC-ANN (one hidden layer with one neuron) model has the best performance in predicting the value of $k_t$ for males. In contrast, the LC-ARIMA (2, 2, 0) without drift model has achieved the highest accuracy in predicting the value of $k_t$ for females. The results and analysis of the best model for each gender are discussed in the following subsections for the period 2014-2016, which is the testing period for the data used in this research.

Results of the LC-ANN (one hidden layer with one neuron) model for males
The results of the LC-ANN (one hidden layer with one neuron) model, which is the best model for predicting the mortality rates of males in Malaysia for 2014-2016, are presented in Figs. 8, 9 and 10. The error measurement results for the prediction of the mortality rates of males are tabulated in Table 9.

Results of the LC-ARIMA (2, 2, 0) without drift model for females
The result of the LC-ARIMA (2, 2, 0) without drift model which is the best model in predicting the mortality rate for females in Malaysia for 2014-2016 is presented in Figs. 11, 12 and 13, and the error measurement results are tabulated at the end of this subsection.
The error measurement results for the prediction of the mortality rates of females are calculated and tabulated in Table 10.

Analysis and discussion of the best model
From Figs. 8, 9, 10, 11, 12 and 13, the actual and the predicted $\ln m_{x,t}$ values show that the mortality rates of males and females increase with the age of the population from 2014 to 2016. Nevertheless, the lines form a rough "U" pattern between 0 and 20 years old. This indicates that infant mortality rates are higher than those of children in other age groups, young adults between 15 and 34 years old and adults below 50 years old. These findings show that Malaysia had a high infant mortality rate from 2014 to 2016.
By comparing Figs. 8, 9 and 10 with Figs. 11, 12 and 13, it can be observed that, overall, the actual and the predicted $\ln m_{x,t}$ values for males are slightly higher than those for females. In particular, at ages 15-24 in the years 2014-2016, there is a marked increase in the $\ln m_{x,t}$ values for males, which shows that males are exposed to a higher risk of death than females. As a result, we can expect a longer life expectancy for females, which is consistent with the results shown in Fig. 14.
This difference in life expectancy between the genders is partly due to the inherent biological and epidemiological advantage of females, but it also reflects behavioral differences between males and females. Newborn females have a higher chance of surviving to their first birthday than newborn males. This advantage continues throughout their lives: females usually have a lower mortality rate in all age groups, and the longevity advantage becomes more obvious at old age. The lower mortality rates for females may result from lower-risk behavior over their lifetime, for instance in smoking, driving and alcohol use. According to Liew et al. [35], females are less likely to be involved in accidents than males. This was attributed to the fact that male drivers recorded a higher distance driven (in kilometers) than female drivers, and that the number of male drivers is significantly higher than the number of female drivers. Beyond exposure, males and females also behave differently. A report by The Social Issues Research Centre in 2004 [36] pointed out that males tend to seek thrilling sensations, take risks and exhibit aggression, which may result in higher accident rates. Mutalip et al. [37] revealed that males are more prone to risky drinking practices, such as more frequent alcohol intake, which has contributed to a higher mortality rate among males. In addition, Lim et al. [38] revealed that smoking prevalence among Malaysian male adults was relatively higher than among females, and that there was a dramatic increase in smoking among young adults aged 15-24 years old. The difference in smoking prevalence between males and females is also a factor that contributes to a higher mortality rate for males.
Moreover, the lower mortality rates for females may be attributed to harder-to-identify biological and epidemiological advantages that result in lower rates of cancer, cardiovascular disease and coronary heart disease (CHD) among females. Abdullah et al. [39] concluded that more males than females die from CHD, and also pointed out that males generally start to suffer from CHD at age 30, while females only start to suffer from CHD at age 40. Based on Fig. 15, it can be observed that the probability of death from any of the top global causes of death from 2000 to 2015 is higher for males than for females. Males are exposed to a higher risk of cancer, chronic respiratory diseases, diabetes and cardiovascular diseases, which is another factor that contributes to the higher mortality rate of males. As mentioned previously, the mortality rates for Malaysian males aged 15-24 show an obvious increasing trend, compared with a smooth increasing trend for Malaysian females in the same age group. This might be due to a higher level of suicidal ideation among males. Figure 16 shows that the suicide rate for males is higher than that for females. Ibrahim et al. [40] carried out a study on respondents aged between 15 and 25 from selected cities in Malaysia, which showed that suicidal ideation among male teens was indeed higher than among female teens. This may also be a reason for the increasing mortality trend for males aged 15-24 compared with females in the same age group.

The mortality rates for Malaysian infants and teenagers show a clear decreasing trend, whereas the other age groups show small increasing tendencies. In general, mortality rates fall rapidly during infancy, and the mortality rates for the teen age group (10-15) also decrease slightly from 2016. However, there is a small increase in the mortality rates for the age group of 15-24 years. Throughout the forecasted calendar years, the elderly group (65 and above) has a higher $\ln m_{x,t}$ than the other age groups.
Comparing the plotted lines of the actual and the predicted $\ln m_{x,t}$, both the LC-ANN (one hidden layer with one neuron) and the LC-ARIMA (2, 2, 0) without drift models performed well in forecasting the trends of mortality rates for males and females, respectively, from 2014 to 2016.

Comparative analysis
In this section, we further analyze the performance of the forecasting models used in this research by applying the forecasting models to the mortality datasets of 11 other countries, namely Canada, Sweden, Ireland, Japan, South Korea, Hong Kong, Norway, Switzerland, Latvia, Slovak Republic, and Czech Republic. The analysis of the results will be presented at the end of this section.

Analysis and discussion of the performance of the models
To analyze how the forecasting models used in this research perform for other countries, we determined the model that has the best fit and produced the best forecasted mortality rates for each of the 11 other countries studied. The results are summarized in Tables 11 and 12. Based on the error measurements, the classical LC-ARIMA method shows the best performance in predicting the value of $k_t$ for males and females in several countries, such as Sweden, Ireland, Norway, Switzerland and Czechia, owing to the strength of the ARIMA method in modeling time series with a linear trend. According to World Life Expectancy (https://www.worldlifeexpectancy.com/), Sweden, Ireland, Norway and Switzerland have a relatively long life expectancy of around 80 years, and this is due to several reasons.
In Sweden, the citizens' longevity is due in part to the country's commitment to environmental cleanliness. Based on the OECD Better Life Index (https://www.oecdbetterlifeindex.org/), 96% of the Swedish citizens polled agreed that the water quality is satisfactory and free of pollutants. Moreover, Sweden has one of the best healthcare systems in the world, with a universal healthcare system that enables those in poverty to access important services for themselves and their families. As a country's healthcare system improves, the rates of premature death from preventable causes such as self-harming behaviors and lower respiratory infections decline, particularly for those segments of the population living in poverty. In fact, premature death from lower respiratory infections in Sweden decreased by 49% from 1990 to 2010. These factors may also contribute to Sweden's longer-than-average life expectancy.
The long life expectancy of citizens in Ireland can be attributed to the reduction in deaths from major causes such as cancer and cardiovascular diseases. According to the Department of Health of Ireland (https://www.gov.ie/en/), there has been a reduction of 39% in stroke, 16% in breast cancer, 26% in suicide and 39% in pneumonia in the country since 2008, and this has resulted in the overall mortality rate of the country falling by 14.9% since 2008. These facts show the importance of a good healthcare system in providing higher-quality health services for citizens and in improving the life expectancy of a country's population.

Based on the OECD Better Life Index (https://www.oecdbetterlifeindex.org/), the level of air pollutant particles in Norway is considerably lower than the OECD average of 13.9 micrograms per cubic meter, which indicates that the air quality in Norway is good. Hence, a reduction in causes of death such as respiratory diseases and cardiovascular diseases can be expected. Studies have shown that 98% of the citizens in Norway are pleased with the water quality in their country, compared to the OECD average of 81%, which is one of the highest rates in the OECD statistics. These findings indicate that a higher level of environmental cleanliness contributes to a longer-than-average life expectancy in Norway.
A good healthcare system, the promotion of an active lifestyle, a high living standard and a pro-family society have frequently placed Switzerland at the top of rankings related to healthcare and aging. In 2016, Simeon Bennett, a spokesperson for the WHO (https://www.who.int/), stated that nations with higher life expectancy tend to have higher standards of living, healthier lifestyles and diets, greater health service coverage, and higher incomes than nations with low life expectancy. This statement holds in the case of Switzerland, which was named the "best country" overall by U.S. News & World Report in 2019 and 2020, partly due to its superior healthcare system. Switzerland's mandatory basic health insurance guarantees that everyone living in Switzerland is able to afford basic healthcare and has access to good medical care at all times. Besides, the statistics reported by the OECD Better Life Index show that the average annual household income in Switzerland is higher than the OECD average, which also explains the higher-than-average standard of living in Switzerland.
The healthcare system in Czechia has been lauded as one of the best in the Central European region. The country has emerged as a popular destination for medical tourism in Europe owing to the affordability and standard of its medical treatments. All Czech citizens enjoy nearly universal insurance coverage and are automatically insured under the country's public healthcare system through their employers. The quality of care offered by Czech hospitals is on par with the rest of Europe, and the country has a significantly greater number of physicians and nurses than other Central European countries; medical care is therefore widely available.
As a result, a stable trend of mortality rates can be expected for Sweden, Ireland, Norway, Switzerland and Czechia. This allows the ARIMA model to fit the mortality data of these countries well and hence to outperform the other models. Furthermore, the LC-ARIMA (0, 1, 0) with drift model, which was introduced in the study by Lee and Carter [2], has also outperformed the other models in predicting the mortality rates in Sweden, Ireland, Norway and Czechia. This might be due to the fact that the trend of the mortality rates in these countries is quite similar to that of the US, although these countries have a longer life expectancy. This is consistent with the results obtained by Lee and Carter [2], in which the LC-ARIMA (0, 1, 0) with drift model showed the best performance in forecasting mortality rates in the US.
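To make the ARIMA (0, 1, 0) with drift specification concrete: it is a random walk with drift, so the point forecast h steps ahead is the last observed $k_t$ plus h times the average annual change. The following is a minimal Python sketch under that reading; the function name and the sample values are hypothetical and are not taken from the data or code used in this study.

```python
def rw_drift_forecast(k, horizon):
    """Point forecasts for a k_t series under a random walk with drift,
    i.e. ARIMA(0, 1, 0) with drift."""
    if len(k) < 2:
        raise ValueError("need at least two observations")
    # The drift estimate is the mean of the first differences,
    # which simplifies to (k_T - k_1) / (T - 1).
    drift = (k[-1] - k[0]) / (len(k) - 1)
    return [k[-1] + drift * h for h in range(1, horizon + 1)]


# Hypothetical k_t values, for illustration only.
k_hist = [10.0, 8.5, 7.2, 5.4, 4.1, 2.0]
forecast = rw_drift_forecast(k_hist, 3)
print(forecast)
```

Because the forecast is a straight line extending the average historical change, this specification suits a $k_t$ series that declines steadily, which matches the stable trends discussed for these countries.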
A study by Schneider et al. [41] ranked Canada ninth out of 11 countries owing to the low performance of its healthcare system in areas such as equity, healthcare outcomes and accessibility. Simpson et al. [42] stated that the healthcare system of Canada is considerably expensive. At first, the federal government of Canada agreed to bear 50% of provincial hospital and medical care costs. However, it started to reduce its cash contributions when it realized that it had no control over total expenses and had no mechanisms in place to control the costs. Furthermore, the costs of universal hospital insurance in Canada have been increasing over the years, which has put a burden on government finances. As a result, the average total spending of Canadian citizens on healthcare is relatively high compared to other developed countries, and citizens have no choice but to bear the high medical costs to use the healthcare services in the country. This indicates that the performance of a country's healthcare system might affect the mortality rates of the whole population, which may lead to greater fluctuation in the value of $k_t$. Greater fluctuation in the value of $k_t$ would cause the ARIMA model to fit the data poorly. Hence, the ANN and the RF models, which are able to capture non-linear trends accurately, perform well in predicting the values of $k_t$ in the case of Canada.
Japan is ranked second and first for the life expectancy of males and females, respectively, by World Life Expectancy. This can be attributed to the high quality of the Japanese diet, which has been related to a lower risk of mortality. Kurotani et al. [43] found that the Japanese dietary guidelines help lower the risk of total mortality and of mortality from cardiovascular diseases, particularly cerebrovascular disease, among the Japanese population. Consequently, the population of Japan is expected to have a longer life expectancy. In this case, a smooth linear trend in the values of $k_t$ would be expected, and the ARIMA model would be expected to work well for both genders. However, the ANN model surprisingly outperformed the ARIMA model in predicting the value of $k_t$ for females. Further analysis showed that the values of $k_t$ for both genders decrease linearly overall. However, the rate of decrease in the value of $k_t$ for females accelerated in the period from 1996 to 2013, which produced a non-linear trend in the values of $k_t$ and resulted in the ANN model outperforming the ARIMA model in this case.
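One simple way to check for an acceleration of this kind is to compare the mean annual change in $k_t$ before and after a candidate year. The Python sketch below is illustrative only: it is not the analysis procedure used in this study, and the function names, the split year argument and the numbers are hypothetical.

```python
def mean_annual_change(values):
    """Average change per step between the first and the last value."""
    return (values[-1] - values[0]) / (len(values) - 1)


def change_before_after(years, k, split_year):
    """Mean annual change in k_t up to and from split_year (both inclusive)."""
    i = years.index(split_year)
    if i == 0 or i == len(years) - 1:
        raise ValueError("split_year must lie strictly inside the sample")
    return mean_annual_change(k[: i + 1]), mean_annual_change(k[i:])


# Hypothetical series: k_t falls by 1 per year until 1996, then by 2 per year.
years = list(range(1990, 2001))
k = [20.0 - (y - 1990) if y <= 1996 else 14.0 - 2.0 * (y - 1996) for y in years]
before, after = change_before_after(years, k, 1996)
print(before, after)
```

For a purely linear series the two values coincide; a clearly more negative value after the split year indicates an accelerated decline, which a single linear drift term cannot capture.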
In South Korea, both genders have a relatively high life expectancy according to World Life Expectancy. Therefore, the ARIMA model would be expected to outperform the other models. However, the ANN model performs better in predicting the values of $k_t$ for females. The prediction error for females in South Korea is relatively high compared to the other countries, and this might be due to several factors. First, based on the AFE, the data for females in South Korea are not fitted well by the LC model because of the second-stage estimation of the values of $k_t$. This stage adjusts the values of $k_t$ to guarantee that the total observed number of deaths equals the total expected number of deaths. In reality, however, the low death rates of the young contribute far less to the death rates, yet when the values of $k_t$ are re-estimated they are weighted equally with the high death rates of the older ages. It is also notable that differences in the population sizes of the age groups cause differences in weights during the second-stage estimation of $k_t$. Second, the unfairness of applying an equal weighting of $k_t$ to all age groups might increase the uncertainty in the trend of the values of $k_t$, which would affect the subsequent prediction stage because the prediction model is constructed from the re-estimated values of $k_t$. According to Statistics Korea (https://kostat.go.kr/portal/eng/index.action), deaths of people aged 60 and above accounted for around 80% of total deaths in South Korea from 2014 to 2016. However, the re-estimation stage distributes the weighting of the values of $k_t$ equally across all age groups, which means that the poor performance for the age group of 60 and above is spread to the age groups below 60 as well. As a result, the prediction error for females in South Korea is higher than in other countries.
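The second-stage re-estimation described above can be viewed as a one-dimensional root-finding problem: for each year, find the $k_t$ at which the total expected deaths equal the total observed deaths. The Python sketch below assumes the usual LC form $\ln m_{x,t} = a_x + b_x k_t$ with all $b_x > 0$ and solves for $k_t$ by bisection. The function names, the search bracket and all numbers are hypothetical, and the solver is a stand-in rather than the procedure used in this study.

```python
import math


def expected_deaths_by_age(k, a, b, exposure):
    """Expected deaths per age group under ln m_x = a_x + b_x * k."""
    return [e * math.exp(ax + bx * k) for ax, bx, e in zip(a, b, exposure)]


def reestimate_k(a, b, exposure, observed_total, lo=-50.0, hi=50.0, tol=1e-10):
    """Find k so that total expected deaths match total observed deaths.

    Uses bisection and assumes every b_x > 0, so that total expected
    deaths increase monotonically with k.
    """
    def total(k):
        return sum(expected_deaths_by_age(k, a, b, exposure))

    if not total(lo) <= observed_total <= total(hi):
        raise ValueError("observed total is outside the search bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total(mid) < observed_total:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs for three age groups (young, middle, old); illustration only.
a = [-6.0, -4.0, -2.0]
b = [0.2, 0.3, 0.5]
exposure = [1000.0, 800.0, 200.0]

# Simulate "observed" deaths from a known k, then recover it.
true_k = -1.5
deaths = expected_deaths_by_age(true_k, a, b, exposure)
k_hat = reestimate_k(a, b, exposure, sum(deaths))
shares = [d / sum(deaths) for d in deaths]
print(k_hat, shares)
```

In this toy example the oldest group contributes the largest share of total deaths, so the matching condition is driven mainly by the older ages even though a single $k_t$ is applied to every age group, which is the weighting effect discussed above.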
From Tables 11 and 12, we can observe that the LC-ARIMA model has outperformed the other models in predicting the values of $k_t$ for both genders in Hong Kong. This might be due to the linear trend and minor fluctuations in the values of $k_t$ during the period 2014-2016. Several factors contribute to the linear trend of $k_t$, with the citizens' longevity being one of the most important. According to Chung and Marmot [44], life expectancy in Hong Kong has increased gradually over the past 50 years; in 2017 it was 81.9 years for males and 87.6 years for females. The infant mortality rate in Hong Kong is among the lowest in the world, which indicates that good-quality maternal and child healthcare is provided. Based on the Thematic Household Survey conducted in 2015, the overall smoking rate in Hong Kong has remained consistently low over the past few years, at 11.8% in 2012 and 11.4% in 2015, which is among the lowest figures worldwide. The low prevalence of smoking is thought to contribute to the long life expectancy of Hong Kong citizens. As a result, a stable and linearly decreasing trend of $k_t$ for males and females can be expected, and hence the LC-ARIMA model is able to outperform the other models in the case of Hong Kong.
According to health reports published by the European Commission (https://ec.europa.eu/info/index_en) in 2017 and 2019, there is a huge gap in life expectancy by socioeconomic status and gender in Latvia. Females in Latvia are expected to live on average nearly a decade longer than males. These inequalities in health are mainly attributable to a higher prevalence of risk factors, particularly harmful alcohol consumption and tobacco smoking, among Latvian males. While one in four adults reported smoking daily, this average conceals a strong gender difference, with males in Latvia being among the heaviest smokers in the European Union (EU). Although the government of Latvia has implemented a National Health Service system with universal population coverage and general tax-financed healthcare provision, relatively low levels of government spending are allocated to health, resulting in considerable resource constraints and underfunding of the healthcare system. In view of the limited resources allocated to the healthcare system, it is not surprising that it performs poorly in terms of access to and quality of healthcare. Regarding the quality of acute care, Latvia has the highest case-fatality rates for stroke and heart attack among all the EU countries that reported data. Besides, the small share of public spending on healthcare has caused households to rely heavily on their own private spending in the form of direct out-of-pocket payments. This has caused a significant number of households in Latvia to incur catastrophic healthcare expenditure, resulting in severe financial barriers to quality medical care, especially for low-income households. In Tables 11 and 12, it can be observed that the LC-ANN model performed better than the other models in predicting the values of $k_t$ for males.
This is reasonable, as males in Latvia have higher mortality rates owing to a higher prevalence of risk factors related to alcohol consumption and smoking, which might increase the uncertainty in the trend of $k_t$ for males. However, the LC-ARIMA model has outperformed the other models in predicting the values of $k_t$ for females. Further analysis of the trend of the values of $k_t$ for both genders showed that the values of $k_t$ for males increased from 1960 to 1994 and then decreased from 1994 to 2013, which is consistent with the expectation of greater uncertainty in the mortality rates. The analysis of the trend of $k_t$ for females showed a relatively stable trend, with neither an obvious increase nor an obvious decrease. This might be due to the low performance of the healthcare system in Latvia. As a result, the LC-ARIMA model is much more suitable in the case of Latvian females and is able to outperform the other models in predicting the values of $k_t$ for females.
From Tables 11 and 12, the LC-ARIMA model has performed outstandingly in predicting the values of $k_t$ for both genders in Slovakia. Further analysis for females shows that the LC-ARIMA, LC-ANN and LC-RF models achieved very similar performance, with the LC-RF model performing best among them. However, the LC-ARIMA model was chosen as the best model for predicting the values of $k_t$ for females because of the linearly decreasing trend of these values. As mentioned previously, the ARIMA model has its strengths in predicting a linear process, and although the LC-RF model performs slightly better than the LC-ARIMA model, we expect the LC-ARIMA model to perform better than the other models in predicting the values of $k_t$ for 2017 onward. For Slovakian males, the values of the error metrics are relatively high, which may be due to the same reason as in South Korea: the weighting of $k_t$ is distributed equally among all the age groups during the re-estimation stage, which disrupts the prediction stage. In Slovakia, diseases such as diabetes mellitus and cardiovascular diseases have burdened the healthcare and economic systems, and these diseases have contributed considerably to the high mortality rates of the population. Gavurova and Vagasova [45] found that the age group of 25-44 in Slovakia is most exposed to the risk of death from cerebrovascular diseases, while the age group of 45-64 is exposed to a higher risk of death from chronic ischemic heart disease. This reflects that the morbidity of the Slovak population has shifted gradually to the younger age groups. This study also revealed that females aged 25 to 44 in Slovakia are more exposed to death from diabetes mellitus than males of the same age.
The study by Gavurova and Vagasova [45] further suggested that the prevalence of acute coronary syndrome among males, specifically those aged 45 to 64, has increased. These findings reflect that the age group of 25 to 64 in Slovakia accounts for a large number of deaths each year, which makes the equal weighting of $k_t$ across all age groups unfair in the case of Slovakia. As a result, the re-estimation of the values of $k_t$ might not be a necessary step, as it sometimes worsens the prediction. In our findings, we also discovered that the trends of the original $k_t$ and the re-estimated $k_t$ for females differ only slightly, which indicates that the re-estimation stage may be skipped in cases such as the Slovakian population. Instead, the original values of $k_t$ could be used directly in building the mortality rate prediction model without going through the re-estimation stage.
(iv) An analysis of the factors that affect the trend of $k_t$ in diverse countries, such as life expectancy, the quality, accessibility and efficiency of the healthcare system, the level of environmental cleanliness, average household income, personal behavior such as the consumption of alcohol and tobacco and other risky behavior, and dietary preferences, has been presented. (v) An analysis of the factors that affect the mortality rates and cause an abnormal increase or decrease in the mortality rates in the case of Malaysia has also been done. The population of Malaysia was estimated at approximately 32.4 million in 2018 and increased to 32.6 million in 2019. Tracking and predicting the mortality rates is pertinent, as it can provide insights into future mortality trends in Malaysia, which can be used by insurance companies to derive new policies with reasonable premiums for the benefit of consumers while enabling the insurance companies to make sufficient profits to cover their risks. Therefore, a better and more efficient risk prevention and risk management system in the insurance field can be developed if a prediction model is able to predict the mortality rates accurately. The finding that the mortality rates of males in Malaysia show an abnormal increasing trend in the age group of 15-24 years owing to a high level of suicidal ideation among Malaysian male youths can be useful to the relevant government agencies and non-governmental organizations (NGOs) in charge of the well-being and empowerment of youths, as it would enable them to take the necessary actions to reduce suicidal intention and suicide rates among Malaysian youths, which can contribute to an increase in the life expectancy of Malaysian youths. (vi) Further studies will integrate advanced prediction models using hybrid ANN and Genetic Algorithm, Fuzzy Time Series and Statistical Models, etc., as in [46][47][48][49][50], to accelerate the method proposed in this research.