A multiobjective optimization approach for threshold determination in extreme value analysis for financial time series

The literature on the extreme value theory threshold optimization problem for multiple time series analysis does not consider determining a single optimal tail probability for all marginal distributions. With multiple tail probabilities, their discrepancy results in a differing number of exceedances, which may favour a particular marginal series. In this study, we propose a single optimal tail probability by integrating trade-offs among multiple time series within a multiobjective optimization (MOO) framework. Mathematically, our approach links the peaks-over-threshold technique and the goal programming technique by developing a set of regression functions, which represent continuous paths of possible tail areas for multiple time series, and formulating them as the desired levels within the MOO framework. The optimal solution is found as the minimum Chebyshev variant weighted value. Our approach advances the development of the peaks-over-threshold method by considering the characteristics of a group of time series collectively instead of independently. The proposed optimal tail probability can be considered an optimal reference point for practical risk investment portfolio analysis that employs an identical tail size across multiple time series data. The daily log returns of four U.S. stock market indices, namely, S&P 500, NASDAQ Composite, NYSE Composite, and Russell 2000, from 1 July 1992 to 30 June 2022 are studied empirically.


Introduction
Extreme value theory (EVT) is a powerful and accurate statistical modelling approach to examine extreme values in financial time series in accordance with their asymptotic nature and the behaviour of the underlying probability distribution (Embrechts et al. 1999; Rocco 2014). Researchers have employed EVT based on the peaks-over-threshold (POT) approach as a risk management tool in financial analysis, such as the investigation of the asymmetry of illiquidity measures of stock prices and trading volumes (Będowska-Sójka et al. 2022); inspection of the tail relationships between returns and volumes for high-frequency cryptocurrencies (Chan et al. 2022); comparison of the extreme potential losses and gains of stock indices in both the short and long terms (Chikobvu and Jakata 2020); examination of the stress testing perspective of a semiparametric copula-GARCH risk model for financial return series (Koliai 2016); and estimation of the potential loss for digital banking transaction risks (Saputra and Chaerani 2022). The procedure to identify an effective threshold for determining the tail of extreme values is crucial, as an inappropriate threshold can lead to bias or inefficiency in risk analysis (Chan et al. 2022). A threshold that is too small yields too many exceedances and may bias the parameter estimates by including nonextreme observations, whereas a threshold that is too large yields too few exceedances and may produce inefficient parameter estimates with large standard errors by excluding extreme observations. Hence, the identification of an optimal threshold is a critical issue in the risk management of financial time series.
In previous empirical studies of EVT, various POT approaches have been adopted to choose a threshold in extreme value analysis for financial time series. These approaches include the applications of statistical Pareto QQ, Hill, and mean excess plots (Jakata and Chikobvu 2022); parameter estimates by using moments, probability weighted moments, and elementary percentiles (Jocković 2016); Monte Carlo simulation from a known probability distribution (Longin and Solnik 2001); hyperparameter selection based on machine learning algorithms (Nakamura 2021); statistical simulation from a bivariate probability distribution (Verster and Kwaramba 2022); and Bayesian modelling by deriving a posterior distribution for the unknown generalized parameter (Verster and Raubenheimer 2021). All these POT approaches have merely achieved the single objective of choosing an individual or local threshold and analysing the tail properties of extreme values for each corresponding financial time series. For handling multiple financial time series data, the multivariate generalized Pareto distribution (Rootzén and Tajvidi 2006; Rootzén et al. 2018) and generalized copula method (Falk et al. 2019) have proven to be effective. The two methods aim to find the best distributional fitness, and neither the optimal thresholds nor the optimal tail probabilities of the involved time series are restricted to be the same as each other (Caeiro and Gomes 2015; Roth et al. 2016). The resulting discrepancy among threshold values (numbers of exceedances) raises a dispute over the fairness of the overall modelling process. In other words, the unbiasedness is arguable, as the estimated model may favour or overfit a particular marginal series by giving it a larger number of exceedances than the other marginal series.
Multiple criteria decision analysis (MCDA) techniques are widely used in the process of financial decision-making in areas such as business management and accounting, economics and econometrics, computing, decision science, and financial mathematics and engineering (Zopounidis et al. 2015; Spronk et al. 2016). The multiobjective optimization (MOO) approach is one of the core methods of MCDA to systematically evaluate a set of financial actions or criteria that are usually conflicting in nature and produce an optimal decision in accordance with an overall or a global perspective (Steuer and Na 2003; Steuer et al. 2007; Durbach and Stewart 2012). The MOO approach can be employed to aggregate the results of local thresholds, which are identified by using the POT approach, of individual financial time series into a holistic evaluation framework.
In this study, we fill this gap by proposing a single value representing the optimal tail probability that considers the trade-off among the involved multiple time series within an MOO framework. The optimality of the threshold is assessed by its ability to achieve minimum deviations with reference to the tail probabilities of individual financial time series within the group. Mathematically, our proposed approach links the POT method and the MOO approach by using a set of data-driven regression functions that model the trade-off among various tail probabilities. The regression function set serves as the desired levels for the formulation of MOO. The optimal solution is found as the minimum Chebyshev variant weighted value. Our approach advances the development of the peaks-over-threshold method by considering the characteristics of a group of time series collectively instead of independently.
In practice, the assessment of the tail risk level of investment portfolios carried out by investors is based on an identical tail probability across multiple time series data (Dupuis and Jones 2006). The literature does not directly address the need to obtain a single optimal tail probability for achieving the best overall distributional fitness. Our proposed MOO framework fills this gap, and the single optimal value can be considered an optimal reference point. Furthermore, the risk level beyond the MOO optimal tail probability can be accurately estimated by a single derived set of GPDs. It is not necessary to re-estimate a distinct set of GPDs for each user-specified probability separately (Dupuis and Jones 2006). Those portfolio tail risk levels can be examined via the truncated distributions conditioned on the user-specified value. In addition, the derived set of GPDs provided by our model provides the best overall distributional fitness under the restriction of having the same number of exceedances in all marginal time series.
The paper is organized as follows: Sect. 2 covers the background of the POT approach and the corresponding local threshold determination method. In Sect. 3, we introduce the formulation of the proposed multiobjective optimization approach for optimal threshold determination with the use of a data-driven regression function. The new formulation is capable of effectively considering the influence of multiple time series obtained from related financial markets. In Sect. 4, we describe the empirical data and address the simulation results. The proposed model is tested on data from four U.S. stock market indices to assess its effectiveness in determining the MOO tail probability. Concluding remarks are provided in the last section.

Generalized Pareto distribution (GPD)
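The POT approach in EVT is based on a probability distribution, the generalized Pareto distribution (GPD), which describes the extreme values beyond a certain threshold expressed as a cut-off value, percentile, or tail probability (1 − percentile). The approach extracts the extreme values that exceed a sufficiently large threshold value, or location parameter, φ; the distribution of these exceedances converges to the limiting GPD with shape parameter β and scale parameter ω. The probability density function (PDF) $h_{\varphi,\beta,\omega}(x)$ and cumulative distribution function (CDF) $H_{\varphi,\beta,\omega}(x)$ associated with the random variable X over φ (the extracted extreme values) following GPD(φ, β, ω) are given as follows (Chikobvu and Jakata 2020; Chan et al. 2022):

$$h_{\varphi,\beta,\omega}(x) = \begin{cases} \dfrac{1}{\omega}\left(1 + \beta\,\dfrac{x-\varphi}{\omega}\right)^{-\frac{1}{\beta}-1}, & \beta \neq 0, \\[6pt] \dfrac{1}{\omega}\exp\left(-\dfrac{x-\varphi}{\omega}\right), & \beta = 0, \end{cases}$$

$$H_{\varphi,\beta,\omega}(x) = \begin{cases} 1 - \left(1 + \beta\,\dfrac{x-\varphi}{\omega}\right)^{-\frac{1}{\beta}}, & \beta \neq 0, \\[6pt] 1 - \exp\left(-\dfrac{x-\varphi}{\omega}\right), & \beta = 0, \end{cases}$$

where φ, β ∈ R, ω ∈ R+, for x ≥ φ when β ≥ 0, and φ ≤ x ≤ φ − ω/β when β < 0.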
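The GPD density and distribution function can be evaluated numerically as a check on fitted parameters. The following is a minimal sketch, assuming NumPy is available; the function names `gpd_pdf` and `gpd_cdf` and the argument names `loc`, `shape`, and `scale` (standing for the location, shape, and scale parameters) are illustrative and do not come from the original study, which used MATLAB and LINGO.

```python
import numpy as np

def gpd_cdf(x, loc, shape, scale):
    """GPD distribution function; loc, shape, scale are the location, shape, scale parameters."""
    z = np.maximum((np.asarray(x, dtype=float) - loc) / scale, 0.0)  # CDF is 0 below loc
    if abs(shape) < 1e-12:                       # exponential limit for shape -> 0
        return 1.0 - np.exp(-z)
    base = np.maximum(1.0 + shape * z, 0.0)      # 0 beyond the upper endpoint when shape < 0
    return 1.0 - base ** (-1.0 / shape)

def gpd_pdf(x, loc, shape, scale):
    """GPD density; returns 0 outside the support."""
    z = (np.asarray(x, dtype=float) - loc) / scale
    inside = (z >= 0.0) & (1.0 + shape * z > 0.0)
    if abs(shape) < 1e-12:
        dens = np.exp(-np.maximum(z, 0.0)) / scale
    else:
        base = np.maximum(1.0 + shape * z, 0.0)
        with np.errstate(divide="ignore"):       # 0 ** negative is masked out below
            dens = base ** (-1.0 / shape - 1.0) / scale
    return np.where(inside, dens, 0.0)

# Illustrative (made-up) parameter values, not estimates from the paper.
p = gpd_cdf(0.03, loc=0.02, shape=0.2, scale=0.01)
d = gpd_pdf(0.03, loc=0.02, shape=0.2, scale=0.01)
```

The zero-shape branch is handled separately because the general expression is undefined there, although it converges to the exponential form.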

Local threshold
The POT threshold estimation method (Longin and Solnik 2001; Chan et al. 2022) assumes that the distribution of extreme values exceeding the local threshold can be modelled by a Monte Carlo simulation of Student's t-distribution with k degrees of freedom. In accordance with this assumption, the procedure for finding the local threshold of a single time series is summarized in Algorithm 1, which determines the local thresholds for a financial time series. The variable k denotes the degrees of freedom of the t-distribution. The choice of k = 1 to 10 is based on the suggested procedure (Longin and Solnik 2001), which considers the adequacy of capturing different degrees of tail fatness. The objective of the algorithm is to find the optimal degrees of freedom k* and to locate the corresponding optimal tail probability $\alpha^{*}_{k}$. Thereafter, the corresponding optimal distributional parameters of the GPD can be found accordingly. Additional details can be found in Appendix A of the paper (Longin and Solnik 2001).

MOO optimal threshold approach
The MOO approach identifies an optimal solution with respect to several objectives. The goal programming model (GPM) is widely adopted to solve multiobjective problems (Ghandforoush 1993; Tamiz et al. 1998; Jones and Tamiz 2010; Ghufran et al. 2015). A goal in a GPM refers to the desired level, target, or criterion to be attained. The essence of GPM is to incorporate multiple goals into an MOO formulation that can be solved by using a conventional single objective optimization approach (Steuer et al. 2003). For an identified goal, there are three types of target level: exactly the target level (Type I), at most the target level (less is better, Type II), and at least the target level (more is better, Type III) (Jones and Tamiz 2010). A deviational variable is used to compute the difference between the desired level and a given solution on an identified target.
Positive or negative deviational variables $d^{+}$ or $d^{-}$ represent the difference in the achieved values above or below the target level, respectively. The GPM algorithm minimizes the undesired deviational variables to attain the optimal solution with respect to their identified targets. For Type I goals, both positive and negative deviational variables are undesirable. For Type II and Type III goals, the positive and negative deviational variables are undesirable, respectively.
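The role of the deviational variables and the three goal types can be illustrated with a short sketch. The function names `deviations` and `undesired_deviation` and the goal-type labels passed as strings are hypothetical and introduced here only for illustration.

```python
def deviations(achieved, target):
    """Return (d_plus, d_minus) such that achieved + d_minus - d_plus == target."""
    d_plus = max(achieved - target, 0.0)    # amount by which the target is exceeded
    d_minus = max(target - achieved, 0.0)   # amount by which the target is missed
    return d_plus, d_minus

def undesired_deviation(achieved, target, goal_type):
    """Deviation penalized by a goal programme for Type I, II, or III goals."""
    d_plus, d_minus = deviations(achieved, target)
    if goal_type == "I":      # exactly the target: both directions are undesirable
        return d_plus + d_minus
    if goal_type == "II":     # at most the target: only overshoot is undesirable
        return d_plus
    if goal_type == "III":    # at least the target: only shortfall is undesirable
        return d_minus
    raise ValueError("goal_type must be 'I', 'II', or 'III'")

# A solution achieving 5.0 against a target of 3.0 overshoots by 2.0.
over = undesired_deviation(5.0, 3.0, "II")
under = undesired_deviation(5.0, 3.0, "III")
```

At most one of the two deviational variables is nonzero for any given solution, which is why a Type I goal can penalize their sum.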
There are four commonly used formulations, namely, zero-one normalization weighted, percentage normalization weighted, Chebyshev variant weighted, and lexicographic GPMs (Ghandforoush 1993; Tamiz et al. 1998). In this study, we adopt the Chebyshev variant weighted GPM (GPM$_{\mathrm{CVW}}$), which seeks the optimal solution that minimizes the maximum of the weighted undesirable deviations to solve the multiobjective problem (Ghufran et al. 2015).
We extend Longin and Solnik's algorithm listed in Sect. 2.2 with regression functions to capture the relationships between tail size and the MSE of simulated data of the chosen degrees of freedom. Based on the discrete simulation procedure, the original algorithm can only consider discrete tail sizes (i.e., α = 0.01, 0.02, ⋯, 0.20). Our method models the tail size as a continuous variable by using a polynomial linear regression function $g_i(\alpha)$ to represent a continuous path of all possible tail areas for a particular time series i. The parameters of $g_i(\alpha)$ are estimated by using the MSE of the simulated data obtained in step 4 of Algorithm 1 (i.e., $\mathrm{MSE}_{k^*}$) with the optimal degrees of freedom k*.
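The step from the discrete grid of tail sizes to a continuous path can be sketched as a polynomial least-squares fit. This is a minimal sketch, assuming NumPy; the function name `fit_mse_curve` is hypothetical, and the MSE values below are synthetic placeholders, not results from the study.

```python
import numpy as np

def fit_mse_curve(alphas, mse, degree=6):
    """Fit a polynomial regression of MSE on tail probability.

    Polynomial.fit rescales the inputs internally, which keeps a
    high-order fit on a narrow interval numerically stable.
    """
    return np.polynomial.Polynomial.fit(alphas, mse, deg=degree)

# Discrete tail sizes 0.01, 0.02, ..., 0.20 as in the original algorithm.
alphas = np.linspace(0.01, 0.20, 20)

# Synthetic MSE values for illustration only (a smooth curve with a minimum at 0.05).
mse = (alphas - 0.05) ** 2 + 1e-3

g = fit_mse_curve(alphas, mse)
mse_between_grid_points = g(0.04073)   # the fitted curve can be evaluated at any tail size
```

The fitted object is callable, so the same curve can be passed directly to an optimizer that treats the tail probability as a continuous decision variable.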
The setup of regression functions is essential to link the POT model with the MOO approach. Each continuous regression function models the characteristics of an individual market, and the regression function set serves as the desired level for the formulation of MOO. In the overall formulation of the GPM$_{\mathrm{CVW}}$, $g_i(\alpha)$ represents the regression of MSE on tail probability α, $g_i(\alpha_i)$ represents the target value of minimum MSE, $d^{-}_i$ represents the negative deviation from $g_i(\alpha_i)$, and $d^{+}_i$ represents the positive deviation from $g_i(\alpha_i)$ of the financial time series with index i = 1, 2, …, m. A 6th-order polynomial linear regression model $g_i(\alpha)$ is used in this study to achieve a satisfactory empirical result. The regression coefficients are estimated by using the $\mathrm{MSE}_{k^*}$ obtained in step 4 of Algorithm 1 with the optimal degrees of freedom k*. $\alpha_{\mathrm{lower}}$ and $\alpha_{\mathrm{upper}}$ represent the lower and upper limits of the tails and are set to 0.01 and 0.20, respectively, with reference to the suggested boundaries in the previous study (Longin and Solnik 2001). The solution of the GPM$_{\mathrm{CVW}}$ represents the optimal tail probability with minimum deviations from the local values among the group of financial time series. Our proposed model is implemented using MATLAB 2019b software (The MathWorks, Inc., Natick, MA, USA) and LINGO 18.0 software (Lindo Systems Inc., Chicago, IL, USA).
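Because the tail probability is the only decision variable, the Chebyshev (min-max) idea can be illustrated with a simple grid search over the admissible interval. The sketch below is not the authors' LINGO model: the function name `chebyshev_tail_probability`, the equal default weights, the absence of any normalization, and the grid resolution are all assumptions made for illustration, and the two quadratic curves are synthetic.

```python
import numpy as np

def chebyshev_tail_probability(curves, local_alphas, weights=None,
                               lower=0.01, upper=0.20, n_grid=19001):
    """Grid-search the tail probability minimizing the largest weighted deviation."""
    grid = np.linspace(lower, upper, n_grid)
    w = np.ones(len(curves)) if weights is None else np.asarray(weights, dtype=float)
    worst = np.zeros(n_grid)
    for g, a_i, w_i in zip(curves, local_alphas, w):
        target = g(a_i)                          # desired level: MSE at the local optimum
        dev = w_i * np.abs(g(grid) - target)     # sum of positive and negative deviations
        worst = np.maximum(worst, dev)           # Chebyshev: keep the largest deviation
    best = int(np.argmin(worst))
    return grid[best], worst[best]

# Two synthetic regression curves with local optima at 0.03 and 0.07 (assumed values).
curves = [lambda a: (a - 0.03) ** 2 + 1e-3,
          lambda a: (a - 0.07) ** 2 + 2e-3]
alpha_star, max_dev = chebyshev_tail_probability(curves, [0.03, 0.07])
```

With these symmetric curves the compromise falls midway between the two local optima, which shows how the min-max criterion balances the series rather than favouring one of them. A mathematical programming solver would replace the grid search when constraints beyond the simple bounds are present.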

Empirical data
The dataset presented in this study consists of 7556 daily log returns (DLRs) of four U.S. stock market indices, namely, the S&P 500, NASDAQ Composite, NYSE Composite, and Russell 2000, from 1 July 1992 to 30 June 2022, covering the periods of the dot-com bubble (between April 2001 and December 2001), the global financial crisis (between January 2008 and July 2009), and the COVID-19 pandemic (between March 2020 and June 2022). Each index, consisting of time series data of the closing values on 7557 trading days (from 30 June 1992 to 30 June 2022), is downloaded from the website of Yahoo Finance. The DLR at time t is calculated as follows:

$$\mathrm{DLR}_t = \ln\left(\frac{C_t}{C_{t-1}}\right)$$

for t = 1, 2, ⋯, 7556, where $C_t$ is the closing value on trading day t and $C_0$ is the closing value on 30 June 1992.
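The daily log return calculation can be sketched in a few lines, assuming NumPy and an array of closing values ordered by date. The function name `daily_log_returns` and the sample closing values are illustrative only.

```python
import numpy as np

def daily_log_returns(closes):
    """Log returns ln(C_t / C_{t-1}) from a sequence of closing values."""
    closes = np.asarray(closes, dtype=float)
    return np.diff(np.log(closes))   # one fewer return than closing values

# Three made-up closing values give two log returns.
dlr = daily_log_returns([100.0, 110.0, 99.0])

# 7557 closing values yield 7556 returns, matching the sample size in the text.
n_returns = daily_log_returns(np.ones(7557)).size
```

Log returns are additive over time, so their sum over the sample equals the log of the ratio of the last to the first closing value.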

Daily log returns
The histograms, annualized means, maxima, minima, annualized SDs, skewness, and kurtoses of the daily log returns depict the distribution patterns of the four U.S. stock market indices over the past thirty years (Fig. 2). The NASDAQ Composite achieved the highest annualized mean DLR of 0.0991 compared with 0.0753, 0.0742, and 0.0603 achieved by the Russell 2000, S&P 500, and NYSE Composite, respectively.

MOO optimal tail probability
The GPM$_{\mathrm{CVW}}$ is formulated with the fitted regression functions of the four indices. Its solution, which represents the MOO optimal tail probability, is 0.04073. At this tail probability, the global threshold values are 0.01967, 0.02702, 0.01884, and 0.02409 in negative DLRs for the S&P 500, NASDAQ Composite, NYSE Composite, and Russell 2000 indices, respectively; the distributions and fitted GPD curves of the corresponding extreme values are shown in Fig. 4.

Summary of GPD parameter estimates
The tail probabilities and number of exceedances, together with the corresponding GPD parameter estimates of the four U.S. stock market indices, are tabulated in Table 1. The optimal tail probability value (i.e., 0.04073) estimated by the MOO model provides the best overall distributional fitness under the restriction of having the same number of exceedances (i.e., 308) in all marginal time series. Slight differences among the GPD parameters can be observed. For the case in which the optimal tail probability is larger than the locally optimized tail probability, the GPD location parameter φ is adjusted to a smaller value to accommodate the inclusion of additional exceedances, and vice versa. No indicative pattern is observed for the values of the shape β and scale ω parameters.
However, from the empirical analysis we conclude that the location parameter φ estimated with the use of various local and global threshold values is between 0.01884 and 0.02702 (mean = 0.02251) for the S&P 500, NASDAQ Composite, NYSE Composite, and Russell 2000 indices from 1 July 1992 to 30 June 2022 (Table 1). The result is comparable to the findings of a previous study that applied the POT approach to compute tail risk measures of daily returns for six global stock market indices (Gilli and Këllezi 2006).
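The link between a common tail probability and a common number of exceedances can be checked directly: 0.04073 × 7556 ≈ 307.8, which rounds to the 308 exceedances reported for every index. The sketch below assumes NumPy; the function name `pot_exceedances`, the rounding rule, and the convention of taking the smallest retained loss as the threshold are assumptions for illustration, and the losses are simulated rather than market data.

```python
import numpy as np

def pot_exceedances(losses, alpha):
    """Return (threshold, exceedances) for the largest alpha-fraction of losses."""
    ordered = np.sort(np.asarray(losses, dtype=float))[::-1]   # largest loss first
    n_exc = int(round(alpha * ordered.size))
    if n_exc < 1:
        raise ValueError("alpha is too small for this sample size")
    exceed = ordered[:n_exc]
    threshold = exceed[-1]          # assumed convention: smallest retained loss
    return threshold, exceed

# Simulated negative daily log returns (illustrative only).
rng = np.random.default_rng(0)
losses = 0.01 * rng.standard_normal(7556)

u, exc = pot_exceedances(losses, 0.04073)
n_exceedances = exc.size
```

Applying the same tail probability to series of equal length yields the same exceedance count for each, while the threshold values differ because each series has its own distribution.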

Conclusion
Investigation of probable extreme or rare incidents plays a key role in the risk analysis of financial portfolio management in terms of loss or gain for investors with a long or short position on the portfolio, respectively. The application of MCDA is considered a useful tool for regulators, policy-makers, and individual, institutional, and corporate investors (Zopounidis et al. 2015). Accurate estimation of tail risk exposure of value at risk (VaR) and expected shortfall is crucial for risk analysis and portfolio management. The POT approach in EVT provides a powerful framework to quantify extreme risk measures that avoid the underestimation or overestimation of risk levels (Jakata and Chikobvu 2022).
The literature on the extreme value theory threshold optimization problem for multiple time series analysis does not consider determining a single optimal tail probability for all marginal distributions. With multiple tail probabilities, the discrepancy results in a differing number of exceedances, which may favour a particular marginal series. In this study, we develop a single optimal tail probability by integrating trade-offs among multiple time series within an MOO framework. Our method unites the POT technique and the MOO framework. Here, we develop a set of regression functions, representing continuous paths of possible tail areas for multiple time series. The regression functions are then formulated as the desired levels within an MOO framework. The optimality of the MOO tail probability is assessed by its ability to achieve minimum Chebyshev variant weighted deviations with reference to the tail probabilities of individual financial time series within the group.
Apart from proper estimation of the portfolio tail risk level, the identification of extreme returns is vital in formulating the connectedness among financial time series and in identifying contagion during financial turmoil (Bae et al. 2003). Researchers have investigated the linkages and temporal relationships for rare incidents in multivariate time series (Arnold et al. 2007; Chen and Chihying 2007) as well as transmission and comovements of prices in financial markets by using the correlation among stocks that can be employed to signify the dominance of specific stocks in accordance with the correlation-based network under consideration (Ganeshapillai et al. 2013). The optimal tail probability identified in this study can help investors connect a group of financial time series by using the same number of extreme returns and identify the risk that is generated in tandem. Furthermore, we consider the negative DLRs of four U.S. stock market indices, adopt the POT approach to identify the local threshold, and employ the Chebyshev variant weighted GPM to find the global threshold. In future studies, researchers can evaluate both positive and negative DLRs of stock prices or market indices in other countries (Koliai 2016; Będowska-Sójka et al. 2022), can use other methods, such as the block maxima (BM) method, to find the local threshold (Nakamura 2021; Będowska-Sójka et al. 2022), and can apply other weighted lexicographic GPMs to choose the global threshold (Jones and Tamiz 2010).
To conclude, we propose a novel MOO threshold determination framework that is capable of producing a single optimal tail probability for obtaining the same number of exceedances across multiple time series. The MOO tail probability can be considered an optimal reference point for practical risk investment portfolio analysis that employs an identical tail size across multiple time series data. The risk level beyond the MOO optimal tail can be accurately estimated via the truncated distributions conditioned on the user-specified value. Moreover, the calculation method is demonstrated empirically with the use of data from four U.S. stock market indices covering the period from 1 July 1992 to 30 June 2022.

Fig. 1 Thirty-year historical closing price time series (consisting of 7556 trading days) of the four U.S. stock market indices from 1 July 1992 to 30 June 2022

Fig. 3 The 6th-order polynomial regressions of mean squared error (MSE) on tail probability (α) of the four U.S. stock market indices

Fig. 4 The distributions and curves generated by the GPD of the 308 extreme negative daily log returns extracted from 7556 daily log returns of the four U.S. stock market indices