Hypothesis testing for varying coefficient models in tail index regression

This study examines varying coefficient models in tail index regression. The varying coefficient model is an efficient semiparametric model that avoids the curse of dimensionality when many covariates are included in the model. Indeed, the varying coefficient model has proven useful in mean, quantile, and other regressions, and tail index regression is no exception. However, although the varying coefficient model is flexible, leaner and simpler models are preferred in applications. Therefore, it is important to evaluate whether the estimated coefficient function varies significantly with the covariates. If the effect of the nonlinearity of the model is weak, the varying coefficient structure can be reduced to a simpler one, such as a constant or zero. Accordingly, hypothesis tests for model assessment in varying coefficient models have been discussed in mean and quantile regression. However, there are no such results in tail index regression. In this study, we investigate the asymptotic properties of an estimator and provide a hypothesis testing method for varying coefficient models in tail index regression.


Introduction
In various fields, predicting the high- or low-tail behavior of a data distribution is of interest. Examples include events such as heavy rains, large earthquakes, and significant fluctuations in stock prices. Extreme value theory is a standard approach for analyzing the data of such extremal events. Let Y_1, Y_2, . . ., Y_n be independent and identically distributed random variables with distribution function F. In extreme value theory, the following maximum domain of attraction assumption is standard: Assume that there exist sequences of constants a_n > 0 and b_n ∈ R and a non-degenerate distribution function G such that

F^n(a_n y + b_n) → G(y), n → ∞,

for each continuity point y of G. This assumption implies that there exist a constant γ ∈ R and a positive function σ(t) such that

lim_{t↑y*} P((Y − t)/σ(t) > y | Y > t) = (1 + γy)^{−1/γ}

for all y for which 1 + γy > 0, where y* = sup{y : F(y) < 1} ∈ (−∞, ∞] and the right-hand side for γ = 0 is interpreted as e^{−y} (see Theorem 1.1.6 of de Haan and Ferreira 2006). The class of distributions on the right-hand side is called the generalized Pareto distribution, and the parameter γ is called the extreme value index. Therefore, in extreme value theory, the tail behavior of the data distribution is characterized by the extreme value index γ. Existing estimators of γ include the Hill estimator (Hill 1975), the Pickands estimator (Pickands 1975), the kernel estimator (Csörgő et al. 1985), the maximum likelihood estimator (Smith 1987), and the moment estimator (Dekkers et al. 1989). It is noteworthy that the generalized Pareto distribution has different features depending on the sign of γ. If γ > 0, we have

1 − F(y) = y^{−1/γ} L(y)

for all y > 0, where L(y) is a slowly varying function at infinity; i.e., L(ys)/L(y) → 1 as y → ∞ for all s > 0. The class of these distributions is called the Pareto-type distributions. This case is common in areas such as finance and insurance, and we frequently observe extremely large values in the data compared with the case of γ ≤ 0.
Therefore, many researchers in extreme value theory have focused on this case. The Hill estimator mentioned above estimates a positive extreme value index γ and is widely used in extreme value studies. In this study, we assume that the extreme value index γ is positive.
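As a concrete illustration of this setting, the Hill estimator averages the log-excesses of the k largest order statistics. The following Python sketch (our own illustration, not the paper's code) recovers γ from an exact Pareto sample:

```python
import numpy as np

def hill_estimator(y, k):
    """Hill (1975) estimator of a positive extreme value index gamma,
    based on the k largest order statistics of the sample y."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    # average log-excess of the top k observations over the (k+1)-th largest
    return np.mean(np.log(y[n - k:] / y[n - k - 1]))

# Exact Pareto sample with P(Y > y) = y^(-2), i.e., gamma = 0.5
rng = np.random.default_rng(0)
y = rng.uniform(size=100_000) ** -0.5
print(hill_estimator(y, k=2_000))  # close to 0.5
```

The choice of k plays the same role as the threshold ω_n discussed later: too large a k inflates the bias, too small a k inflates the variance.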
In recent years, various regression models for the conditional extreme value index have been studied in so-called tail index regression, where the tail index refers to the inverse of the extreme value index. Nonparametric tail index regression estimators include those of Gardes and Girard (2010), Stupfler (2013), Daouia et al. (2013), Gardes and Stupfler (2014), Goegebeur et al. (2014, 2015), and Ma et al. (2020). For fully nonparametric methods, the curse of dimensionality arises when multiple covariates are used. However, in many applications, extremal events are often triggered by multiple factors, and we wish to account for these factors. To avoid the curse of dimensionality, Wang and Tsai (2009) studied parametric tail index regression assuming a linear model. However, in some applications of extreme value theory, the linear model may be too simple to predict the tail behavior of the distribution of the response. As extensions of Wang and Tsai (2009), Youngman (2019) studied additive models, Li et al. (2022) developed partially linear models, Yoshida (2022) provided single-index models, and Ma et al. (2019) provided varying coefficient models. The varying coefficient model is useful for analyzing time series and longitudinal data, among others. Because time or location is often important in many applications of extreme value theory, the varying coefficient model is expected to be useful in tail index regression. We are therefore interested in tail index regression under the varying coefficient model.
The varying coefficient models pioneered by Hastie and Tibshirani (1993) assume that the regression function m_Y(X, T) of interest satisfies m_Y(X, T) = X^⊤θ(T) for the given explanatory variable vectors X and T and the response variable Y, where θ(•) is a vector of unknown smooth functions of T, called the coefficient function vector. In mean and quantile regression, many authors have developed varying coefficient models, such as Wu et al. (1998), Fan and Zhang (1999), Huang et al. (2002, 2004), Kim (2007), Cai and Xu (2008), and Andriyana et al. (2014, 2018). Fan and Zhang (2008) provided a review article on the varying coefficient model. Some of these studies examined not only estimation methods for the coefficient functions but also hypothesis testing methods. We denote θ(•) = (θ_1(•), θ_2(•), . . ., θ_p(•))^⊤. The hypothesis test for the constancy of a specific component can be represented as H_0C : θ_j(•) ≡ C_0 vs. H_1C : θ_j(•) ≢ C_0 for an unknown constant C_0, where H_0C is the null hypothesis and H_1C is the alternative hypothesis. It is particularly important to test the sparsity of a specific covariate, which can be expressed as H_0Z : θ_j(•) ≡ 0 vs. H_1Z : θ_j(•) ≢ 0, where H_0Z is the null hypothesis and H_1Z is the alternative hypothesis. The varying coefficient model is flexible, but simpler models provide an easier interpretation of the data structure in real data analysis. The above hypothesis tests help reduce the varying coefficient model to a simpler model. In mean and quantile regression, testing methods based on the comparison of residual sums of squares include Cai et al. (2000), Fan et al. (2001), Huang et al. (2002), and Kim (2007), among others, where the bootstrap was used to implement the tests. In mean regression, Fan and Zhang (2000) proposed a testing method based on the asymptotic distribution of the maximum deviation between the estimated and true coefficient functions.
In this study, we employ a logarithmic transformation to link the extreme value index of the response variable Y to the explanatory variable vectors X and T via log γ(X, T)^{−1} = X^⊤θ(T).
To the best of our knowledge, Ma et al. (2019) also studied this model. They provided a kernel-type nonparametric estimator of θ(T) and established its asymptotic normality. However, they did not discuss hypothesis testing. Therefore, there are no existing results on hypothesis tests in tail index regression. Our study aims to establish a testing method for varying coefficient models in tail index regression.
The remainder of this paper is organized as follows. Section 2 introduces the local constant (Nadaraya-Watson type) maximum likelihood estimator of the coefficient functions, and Section 3 investigates its asymptotic properties. Section 4 introduces the proposed method for testing the structure of the coefficient functions and demonstrates the finite sample performance through simulations. A real example is analyzed in Section 5. All technical details are provided in the Appendix.

Varying coefficient models in tail index regression
Let Y > 0 be the univariate response variable of interest, X = (X_1, X_2, . . ., X_p)^⊤ ∈ R^p be the p-dimensional explanatory variable vector, and T = (T_1, T_2, . . ., T_q)^⊤ ∈ R^q be the q-dimensional explanatory variable vector. In addition, let F(y | x, t) = P(Y ≤ y | X = x, T = t) be the conditional distribution function of Y given (X, T) = (x, t). We consider the Pareto-type distribution

1 − F(y | x, t) = y^{−1/γ(x,t)} L(y; x, t),   (2.1)

where γ(x, t) is a positive function of x and t, and L(y; x, t) is a covariate-dependent slowly varying function at infinity; i.e., L(ys; x, t)/L(y; x, t) → 1 as y → ∞ for any s > 0. We assume that the slowly varying function L(y; x, t) converges to a constant at a reasonably high speed. Specifically, we assume

L(y; x, t) = c_0(x, t) + c_1(x, t) y^{−β(x,t)} + o(y^{−β(x,t)}), y → ∞,   (2.2)

where c_0(x, t), c_1(x, t), and β(x, t) are functions of x and t with c_0(x, t) > 0 and β(x, t) > 0, and o(y^{−β(x,t)}) is a higher-order term. This assumption is called the Hall class (Hall 1982). In our study, we adopt the varying coefficient model for the conditional extreme value index γ(x, t):

log γ(x, t)^{−1} = (1, x^⊤)θ(t) = θ_0(t) + Σ_{j=1}^p x_j θ_j(t),   (2.3)

where θ(t) = (θ_0(t), θ_1(t), . . ., θ_p(t))^⊤, and θ_j(t), j = 0, 1, . . ., p are the unknown smooth functions of t.

Local constant maximum likelihood estimator
Let f(y | x, t) be the conditional probability density function of Y given (X, T) = (x, t). If L(•; x, t) is differentiable, we obtain the density by differentiating (2.1). Because L(y; x, t) → c_0(x, t) and ∂L(y; x, t)/∂y → 0 as y → ∞ by using (2.2), the tail of f(y | x, t) behaves like that of a Pareto density. Let {(Y_i, X_i, T_i) : i = 1, 2, . . ., n} be an independent and identically distributed random sample with the same distribution as (Y, X, T). We introduce a sufficiently high threshold ω_n > 0 such that ω_n → ∞ as n → ∞ and use the responses that exceed it. Let f(y | x, t, ω_n) be the conditional probability density function of Y given (X, T) = (x, t) and Y > ω_n. Then, writing α(x, t) = γ(x, t)^{−1} = exp((1, x^⊤)θ(t)), we have

f(y | x, t, ω_n) ≈ α(x, t) ω_n^{α(x,t)} y^{−α(x,t)−1}   (2.4)

for y > ω_n. Thus, we can estimate the coefficient function vector θ(t) by the following weighted maximum likelihood approach: Let

L_n(θ) = − Σ_{i=1}^n { (1, X_i^⊤)θ − exp((1, X_i^⊤)θ) log(Y_i/ω_n) } I(Y_i > ω_n) K(H_n^{−1}(T_i − t)),   (2.5)

where θ ∈ R^{p+1}, I(•) is an indicator function, K(•) is a kernel function on R^q, and H_n = diag(h_n1, . . ., h_nq) is a q-order diagonal matrix whose components are bandwidths h_nk, k = 1, 2, . . ., q such that h_nk → 0 as n → ∞. We define the estimator of the coefficient function vector θ(t) as the minimizer of the objective function L_n(θ). We denote this estimator by θ̂(t) = (θ̂_0(t), θ̂_1(t), . . ., θ̂_p(t))^⊤ ∈ R^{p+1}. Ma et al. (2019) provided the local linear maximum likelihood estimator. When p = 0 and q = 0, the covariate-independent estimator θ̂_0 is explicitly obtained, and we have

exp(−θ̂_0) = (1/n_0) Σ_{i=1}^n I(Y_i > ω_n) log(Y_i/ω_n),

which is the Hill estimator proposed by Hill (1975) and is widely used in univariate extreme value theory. Note that the varying coefficient model includes linear and nonparametric models as special cases. When q = 0, (2.3) simplifies to the linear model log γ(x)^{−1} = (1, x^⊤)θ, where θ = (θ_0, θ_1, . . ., θ_p)^⊤ ∈ R^{p+1}, and θ_j, j = 0, 1, . . ., p are the regression coefficients. Wang and Tsai (2009) studied this tail index regression model. Meanwhile, when p = 0, we obtain a nonparametric estimator of the positive extreme value index,

γ̂(t) = Σ_{i=1}^n I(Y_i > ω_n) K(H_n^{−1}(T_i − t)) log(Y_i/ω_n) / Σ_{i=1}^n I(Y_i > ω_n) K(H_n^{−1}(T_i − t)),

which was studied by Goegebeur et al. (2014, 2015).
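A minimal numerical sketch of this local constant fit, assuming the Pareto approximation of the exceedance density and using Newton's method on the kernel-weighted negative log-likelihood (the function names and the simulated check are our own, not the paper's code):

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def fit_theta(y, x, tvals, t0, omega, h, iters=40):
    """Local constant ML fit at the point t0, assuming
    log gamma(x, t)^{-1} = x^T theta(t) (x contains an intercept column)
    and an approximate Pareto tail above the threshold omega."""
    w = epanechnikov((tvals - t0) / h) * (y > omega)
    logexc = np.log(np.where(y > omega, y / omega, 1.0))
    theta = np.zeros(x.shape[1])
    for _ in range(iters):
        alpha = np.exp(np.clip(x @ theta, -30.0, 30.0))
        grad = x.T @ (w * (alpha * logexc - 1.0))         # gradient of L_n
        hess = (x * (w * alpha * logexc)[:, None]).T @ x  # Hessian of L_n
        theta = theta - np.linalg.solve(hess, grad)       # Newton step
    return theta

# Sanity check with constant coefficients theta = (0.5, -0.3) and an exact
# Pareto tail, so the fit at any t0 should recover (0.5, -0.3).
rng = np.random.default_rng(1)
n = 20_000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
tvals = rng.uniform(size=n)
alpha_true = np.exp(x @ np.array([0.5, -0.3]))
y = rng.uniform(size=n) ** (-1.0 / alpha_true)  # P(Y > y | x) = y^(-alpha)
theta_hat = fit_theta(y, x, tvals, t0=0.5, omega=1.5, h=0.4)
```

The objective is convex in θ, so plain Newton iterations are usually adequate for this sketch; a production implementation would add step-size control.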

Bandwidths and threshold selection
The threshold ω_n and bandwidths h_nk, k = 1, . . ., q are tuning parameters that control the balance between the bias and variance of the estimator θ̂(t). A larger value of h_nk or a smaller value of ω_n leads to more bias, whereas a larger value of ω_n or a smaller value of h_nk leads to a larger variance. Therefore, these tuning parameters must be selected appropriately.
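A sketch of a data-driven threshold choice in the spirit of the discrepancy measure used in this section (the helper names and candidate grid are our own illustration): at a good threshold the transformed exceedances should look standard uniform, so one can pick the candidate minimizing the distance between their order statistics and uniform plotting positions.

```python
import numpy as np

def discrepancy(u):
    """Mean squared distance between sorted values and uniform
    plotting positions l/(m+1); small if u is roughly Uniform(0, 1)."""
    u = np.sort(np.asarray(u, dtype=float))
    m = len(u)
    return np.mean((u - np.arange(1, m + 1) / (m + 1)) ** 2)

def select_threshold(y, alpha_hat, candidates):
    """Pick the threshold whose transformed exceedances
    exp(-alpha_hat * log(y / omega)) look most uniform."""
    best, best_d = None, np.inf
    for omega in candidates:
        keep = y > omega
        if keep.sum() < 30:          # too few exceedances to judge
            continue
        u = np.exp(-alpha_hat[keep] * np.log(y[keep] / omega))
        d = discrepancy(u)
        if d < best_d:
            best, best_d = omega, d
    return best, best_d

# With an exact Pareto tail (alpha = 2), the transform is uniform at
# every threshold, so the discrepancy is small everywhere.
rng = np.random.default_rng(2)
y = rng.uniform(size=20_000) ** -0.5
omega_star, d_star = select_threshold(y, np.full(20_000, 2.0), [1.5, 2.0, 3.0])
```

In practice alpha_hat comes from a pilot fit, and the candidate grid is taken over sample quantiles of Y.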
The threshold selection is needed to obtain a good approximation in (2.4). To achieve this, the discrepancy measure proposed by Wang and Tsai (2009) is suitable. Meanwhile, the choice of the bandwidths controls the smoothness of the estimator. Therefore, we use cross-validation to select the bandwidths, similar to other studies using kernel smoothing (e.g., Ma et al. 2019). Thus, we combine the discrepancy measure and cross-validation into the overall tuning parameter selection method. The algorithm is as follows. In the first step, we select the bandwidths h_nk, k = 1, . . ., q by D-fold cross-validation, minimizing the held-out objective based on (2.5), where ω_0 is a predetermined threshold, ⌊•⌋ is the floor function, and the proposed estimator with ω_n = ω_0 and H_n = H is obtained from the dth training dataset. In the second step, we select the threshold ω_n using the discrepancy measure. We denote the order statistics of {Û_{l,n_0}}_{l=1}^{n_0}, where Û_{l,n_0} = exp{−exp((1, X_i^⊤)θ̂(T_i)) log(Y_i/ω_n)} for the responses exceeding ω_n, and n_0 is the number of responses that exceed the threshold ω_n. Because the conditional distribution of exp{−exp((1, X^⊤)θ(T)) log(Y/ω_n)} given Y > ω_n is approximately standard uniform, we can regard {Û_{l,n_0}}_{l=1}^{n_0} as a sample from the standard uniform distribution. Therefore, we select the threshold ω_n as the minimizer of the discrepancy between these order statistics and the corresponding standard uniform quantiles.

Asymptotic properties

Conditions
In this section, we investigate the asymptotic properties of our proposed estimator.
The following technical conditions are required. We define n(t), n_0(t), and Σ_n(t), where f_T(t) is the marginal probability density function of T.
(C.1) The kernel function K(•) is an absolutely continuous function that has compact support and satisfies standard moment conditions.
(C.2) The joint probability density function f(y, x, t) of (Y, X, T) and the coefficient functions θ_j(t) have continuous second-order derivatives in t.
as n → ∞ for all t ∈ R^q, where I_{p+1} is the (p + 1)-order identity matrix and the symbol "→_P" stands for convergence in probability.

Asymptotic properties
We define the gradient vector L̇_n(θ) and the Hessian matrix L̈_n(θ) of the objective function L_n(θ). The proposed estimator θ̂(t) is defined as the minimizer of L_n(θ) and satisfies L̇_n(θ̂(t)) = 0. Therefore, similar to common approaches for establishing the asymptotic normality of the maximum likelihood estimator, we need to investigate the asymptotic properties of L̇_n(θ) and L̈_n(θ).
Theorem 1. Suppose that conditions (C.1)-(C.5) are satisfied; then, as n → ∞, L̇_n(θ(t)) is asymptotically normal, where the symbol "→_D" denotes convergence in distribution.

From Theorems 1 and 2, we obtain the following asymptotic normality of our proposed estimator θ̂(t):

Theorem 3. Suppose that conditions (C.1)-(C.5) are satisfied; then, θ̂(t) is asymptotically normal as n → ∞.

This result implies that θ̂(t) is a consistent estimator of θ(t). The convergence rate of θ̂(t) to θ(t) is of the same order as n_0(t)^{−1/2}, where n_0(t) is proportional to the number of top order statistics of the responses used for estimation at t. The matrix Σ_n(t) is defined in Section 3.1. The asymptotic bias is caused by two factors. The bias b(t) is caused by the approximation of the tail of the conditional distribution of Y by the Pareto distribution in (2.4), which is related to the convergence rate of the slowly varying function L(•; x, t) to the constant c_0(x, t). From the definition of b(t) given in (C.5), we can see that the proposed estimator is more biased for larger γ(x, t). In other words, the heavier the tail of the data, the more biased the estimator. Meanwhile, if β(x, t) is small, the bias of the estimator becomes large. Thus, the bias of our estimator is particularly sensitive to γ(x, t) and β(x, t). These parameters are related to the second-order condition in extreme value theory. Our asymptotic normality is comparable to that of the local linear maximum likelihood estimator of the coefficient function vector proposed by Ma et al. (2019). The difference between the two estimators is the asymptotic bias. In the asymptotic normality in Ma et al. (2019), it is assumed that the bias caused by the approximation (2.4) is negligible, so the bias b(t) does not appear in their asymptotic normality. The essential difference is the bias caused by kernel smoothing. In the case of Ma et al. (2019), the bias caused by kernel smoothing is Λ_n^{(2)}(t). However, it has the same convergence rate as the bias Λ_n(t). The asymptotic variances of the two estimators are the same.
Testing for the structure of the coefficient functions

Testing method
In varying coefficient models, we often hope to test whether each coefficient function θ_j(•) is constant or zero. If some θ_j(t) does not vary across t, this motivates us to consider models that are simpler than the varying coefficient model. Generally, the hypothesis test can be represented as

H_0 : θ_j(•) ≡ η(•) vs. H_1 : θ_j(•) ≢ η(•)   (4.1)

for a given known function η(•), where H_0 is the null hypothesis and H_1 is the alternative hypothesis. Without loss of generality, we assume that the explanatory variable vector T ∈ R^q is distributed on [0, 1]^q ⊂ R^q. Then, we apply Lemma 1 of Fan and Zhang (2000) to the normalized deviation of θ̂_j(t) with normalizing function σ_nj(t), j = 0, 1, . . ., p, where X_0 ≡ 1. The following conditions are required: (C.6) For all large n ∈ N, the function σ_nj(t) is bounded away from zero for all t ∈ [0, 1]^q and has a bounded partial derivative.
as n → ∞, where the normalizing constants involve det(Ξ) and 4qπ (Rosenblatt 1976). From Theorem 3, we have θ̂(t) →_P θ(t) as n → ∞. By the first-order Taylor expansion around θ̂_j(t) = θ_j(t), whose left-hand side is zero because θ̂(t) = (θ̂_0(t), θ̂_1(t), . . ., θ̂_p(t))^⊤ is the minimizer of L_n(θ), and by Theorems 2 and 3, we obtain an approximation of ψ(t) in Theorem 4. This bias involves many unknown parameters. In particular, β(x, t) included in b(t) corresponds to the so-called second-order parameter (see Gomes et al. 2002). However, an estimation method for the second-order parameter has not yet been developed in the context of tail index regression. Thus, at the present stage, checking that (C.5) is satisfied and estimating E[ψ(t)] are challenging and are posited as future work. Therefore, in this paper, we assume that E[ψ(t)] is zero, similar to Wang and Tsai (2009). Then, Theorem 4 can be used to test (4.1). Under the null hypothesis H_0 : θ_j(t) ≡ η(t), we use the test statistic T based on the maximum deviation, where the kernel estimator of n(t)σ_nj(t) based on (C.3) is plugged in. For a given significance level α, we reject the null hypothesis H_0 if T < e_{α/2} or T > e_{1−α/2}, where e_q denotes the q-quantile of the limiting distribution in Theorem 4. As mentioned above, we are mainly interested in the following two hypothesis tests. One is H_0Z : θ_j(•) ≡ 0 vs. H_1Z : θ_j(•) ≢ 0. If the null hypothesis H_0Z is not rejected, the corresponding X_j may not be important for predicting the tail behavior of the distribution of the response Y. Thus, this can help judge the sparsity of a particular covariate. The other is H_0C : θ_j(•) ≡ C_0 vs. H_1C : θ_j(•) ≢ C_0 for an unknown constant C_0 without prior knowledge. Under the null hypothesis H_0C, we estimate the unknown constant C_0 as the average of the estimates {θ̂_j(t_l)}_{l=1}^L, where t_l, l = 1, 2, . . ., L are equally spaced points in [0, 1]^q. If the null hypothesis H_0C is not rejected, it motivates us to adopt a simpler model that treats the coefficient function θ_j(•) as constant.
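If the normalized maximum deviation has the two-sided Gumbel-type limit exp(−2e^{−x}) (our reading of the Fan and Zhang 2000 type result; the closed form below is an assumption), the critical values e_{α/2} and e_{1−α/2} have a direct formula, and for α = 0.05 they reproduce the cutoffs −0.61 and 4.37 used in Section 5:

```python
import math

def sup_dev_quantile(q):
    """q-quantile of the assumed limiting law exp(-2 * exp(-x)) of the
    normalized supremum deviation (hypothetical closed form)."""
    return -math.log(-math.log(q) / 2.0)

def reject(T, alpha=0.05):
    """Two-sided rule: reject H0 if T falls outside (e_{a/2}, e_{1-a/2})."""
    return T < sup_dev_quantile(alpha / 2) or T > sup_dev_quantile(1 - alpha / 2)

print(round(sup_dev_quantile(0.025), 2), round(sup_dev_quantile(0.975), 2))
# -0.61 4.37
```

The asymmetry of the interval reflects the right skew of the Gumbel limit.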
The simultaneous test from Theorem 4 is more rigorous than a test statistic based on the residual sum of squares (see Cai et al. 2000). Here, we consider separate hypotheses for each coefficient. One might think that a single hypothesis test on all coefficients would be of interest. However, such an extension is genuinely difficult because we would have to consider the joint distribution of the supremum over t of (θ̂_0(t), θ̂_1(t), . . ., θ̂_p(t)). In fact, such a method has not even been studied in the context of mean regression. Thus, the development of the simultaneous testing method into a single hypothesis test on all coefficient functions is posited as important future work.

Simulation
We ran a simulation study to demonstrate the finite sample performance of the proposed estimator and test statistic. We present the results for three model settings. In all settings, we simulated the responses {Y_i}_{i=1}^n using the following conditional distribution function:

F(y | x, t) = 1 − (1 + δ) y^{−1/γ(x,t)} + δ(1 + δ) y^{−2/γ(x,t)},

where log γ(x, t)^{−1} = (1, x^⊤)θ(t). This conditional distribution function satisfies (2.2) with c_0(x, t) = 1 + δ, c_1(x, t) = −δ(1 + δ), and β(x, t) = γ(x, t)^{−1}. If δ ≠ 0, the above conditional distribution is not the exact Pareto distribution; therefore, we need to introduce the threshold ω_n appropriately. Otherwise, modeling bias occurs, resulting in less accurate estimation. We simulated the predictors {(X_i1, X_i2, . . ., X_ip)}_{i=1}^n with unit variance. Meanwhile, we simulated the predictors {(T_i1, T_i2, . . ., T_iq)}_{i=1}^n from a uniform distribution on [−0.2, 1.2]^q ⊂ R^q with cov[T_ik1, T_ik2] = 0.
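Sampling from a distribution of this Hall-class form reduces to inverting a quadratic in z = y^{−1/γ}; the sketch below uses the survival function 1 − F(y) = (1 + δ)z − δ(1 + δ)z², which is our reconstruction consistent with the stated constants c_0, c_1, and β:

```python
import math

def survival(y, gamma, delta):
    """Hall-class survival function consistent with c0 = 1 + delta,
    c1 = -delta * (1 + delta), beta = 1/gamma (reconstructed form)."""
    z = y ** (-1.0 / gamma)
    return (1 + delta) * z - delta * (1 + delta) * z ** 2

def inverse_survival(u, gamma, delta):
    """Solve survival(y) = u for y via the quadratic
    delta*(1+delta)*z^2 - (1+delta)*z + u = 0 in z = y^(-1/gamma)."""
    a, b = delta * (1 + delta), 1 + delta
    z = (b - math.sqrt(b * b - 4 * a * u)) / (2 * a)  # smaller root, z in (0, 1)
    return z ** (-gamma)

# Round trip: survival(inverse_survival(u)) recovers u.
y = inverse_survival(0.3, gamma=0.5, delta=0.1)
print(survival(y, 0.5, 0.1))  # 0.3 (up to floating point)
```

Feeding uniform draws through inverse_survival yields the simulated responses; at δ = 0 the formula collapses to the exact Pareto inverse y = u^{−γ}.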
To measure the goodness of the estimator θ̂(t), we calculated the following empirical mean squared error based on M = 100 simulations:

MSE_j = (1/M) Σ_{m=1}^M (1/L) Σ_{l=1}^L { θ̂_j^{(m)}(t_l) − θ_j(t_l) }²,

where θ̂_j^{(m)}(•) is the estimate of θ_j(•) using the mth dataset and {t_l}_{l=1}^L are equally spaced points in [0, 1]^q. In addition, to evaluate the performance of the test statistic, we obtained the probability of error as follows. When the null hypothesis is true, the empirical probability of the Type I error is defined as

E1 = #{m : T_m ≤ e_{α/2} or T_m ≥ e_{1−α/2}, m = 1, 2, . . ., M}/M,

where T_m is the test statistic T using the mth dataset. Meanwhile, when the null hypothesis is false, the empirical probability of the Type II error is given by

E2 = #{m : e_{α/2} < T_m < e_{1−α/2}, m = 1, 2, . . ., M}/M.

Now, the null hypotheses of interest, H_0C and H_0Z, are defined in Section 4.1. Accordingly, if the null hypothesis H_0C is true, i.e., the given coefficient function θ_j(t) is constant, we report E1 to examine the performance of the constancy test; if not, E2 is reported. Similarly, if the null hypothesis H_0Z is true, i.e., θ_j(t) ≡ 0, E1 is used to evaluate the accuracy of the sparsity test; if not, the result for H_0Z is given as E2.
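The two error rates can be tabulated mechanically from the M simulated test statistics; a small helper (ours, not the paper's code) makes the bookkeeping explicit:

```python
def error_rates(t_stats, lo, hi, null_true):
    """E1 (Type I rate: rejections when H0 is true) or E2 (Type II rate:
    non-rejections when H0 is false), with rejection region
    T <= lo or T >= hi."""
    M = len(t_stats)
    if null_true:
        return sum(t <= lo or t >= hi for t in t_stats) / M   # E1
    return sum(lo < t < hi for t in t_stats) / M              # E2

stats = [0.0, 5.0, 1.0]
print(error_rates(stats, -0.61, 4.37, null_true=True))   # 1/3: only 5.0 rejects
print(error_rates(stats, -0.61, 4.37, null_true=False))  # 2/3: 0.0 and 1.0 accept
```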
In the first model setting, we set p = 3 and q = 1 and defined the coefficient functions θ_j(•), j = 1, 2, 3, where the intercept term θ_0(t) was not considered. We employed the Epanechnikov kernel in the proposed estimator. In the estimation process, we selected the threshold ω_n and bandwidth h_n using the procedure described in Section 2.3. We set the predetermined sample fraction to n_0/n = 0.2 in D = 20-fold cross-validation, where n_0 = Σ_{i=1}^n I(Y_i > ω_n). Table 1 shows the calculated MSEs and empirical probabilities of error for each coefficient function θ_j(•) when δ = 0.1, 0.25, 0.5 and n = 200, 500, 1000. For each coefficient function θ_j(•), the calculated MSEs improved as n increased. This result is desirable and suggests the consistency of the proposed estimator. Note that when testing the null hypothesis H_0C, we must estimate the unknown constant C_0. Since the maximum deviation between θ̂_j(t) and the estimated C_0 tends to be smaller than the maximum deviation between θ̂_j(t) and the true value C_0, the empirical probabilities of the Type I error were smaller for the null hypothesis H_0C than for the null hypothesis H_0Z. In all settings, the empirical probability of the Type II error improved as n increased.
The second model setting focuses on the case where p is larger than in the first model setting. We set p = 10 and q = 1 and defined the coefficient functions as θ_1(t) = 1, θ_2(t) = cos(2t), and θ_j(t) = 0, j = 3, 4, . . ., 10, where the intercept term θ_0(t) was not considered. The kernel function was the Epanechnikov kernel, and the tuning parameters were selected in the same manner as in the first model setting. Table 2 shows the calculated MSEs and empirical probabilities of error for each coefficient function θ_j(•) when δ = 0.1, 0.25, 0.5 and n = 200, 500, 1000. The accuracy of the estimator and test statistic improved as n increased, with no significant deterioration compared with the first model setting with p = 3, indicating that the proposed model can avoid the curse of dimensionality even when the dimension p is large. Figure 1 shows the results of the estimation. The two dotted lines are plots of the 5th and 95th largest estimates of the M = 100 estimates at each point t ∈ [0, 1]. The average estimates (dashed line) resembled the true value (solid line).
In the third model setting, we set p = 2 and q = 2 and defined the coefficient functions as θ_0(t) = 2, θ_1(t) = −exp(−10‖t − (0.5, 0.5)^⊤‖²), and θ_2(t) = 0. We employed a kernel function of the Epanechnikov type. The tuning parameters were selected in the same manner as in the first model setting. Table 3 shows the calculated MSEs and empirical probabilities of error for each coefficient function θ_j(•) when δ = 0.1, 0.25, 0.5 and n = 3000, 5000. As with the first and second settings, the accuracy of the estimator and test statistic improved as n increased.
We note that Tables 1-3 show the results of the hypothesis tests when the tuning parameters are automatically selected based on each dataset.

Table 2: Results of estimation and hypothesis testing in the second model setting. E1 and E2 represent the empirical probabilities of Type I and Type II errors, respectively. Since θ_1(t) = 1 is a nonzero constant, the null hypothesis H_0C is true, whereas the null hypothesis H_0Z is false. Accordingly, we provide E1 for H_0C and E2 for H_0Z. Meanwhile, since θ_2(t) = cos(2t) is not constant, the null hypotheses H_0C and H_0Z are false and thus E2 is given for both tests. Similarly, θ_j(t) = 0, j = 3, 4, . . ., 10 indicate that the null hypotheses H_0C and H_0Z are true. Thus, for each θ_j(t) = 0, j = 3, 4, . . ., 10, E1 is provided for both tests.

Table 3: Results of estimation and hypothesis testing in the third model setting. E1 and E2 represent the empirical probabilities of Type I and Type II errors, respectively. Since θ_0(t) = 2 is a nonzero constant, the null hypothesis H_0C is true, whereas the null hypothesis H_0Z is false. Accordingly, we provide E1 for H_0C and E2 for H_0Z. Meanwhile, since θ_1(t) = −exp(−10‖t − (0.5, 0.5)^⊤‖²) is not constant, the null hypotheses H_0C and H_0Z are false and thus E2 is given for both tests. Similarly, θ_2(t) = 0 indicates that the null hypotheses H_0C and H_0Z are true. Thus, for θ_2(t) = 0, E1 is provided for both tests.

Application
In this section, we apply the proposed method to a real dataset on white blood cells.
The dataset is available on Kaggle (https://www.kaggle.com/amithasanshuvo/cardiac-data-nhanes). White blood cells play a role in processing foreign substances such as bacteria and viruses that have invaded the body and are a type of blood cell that is indispensable for maintaining the normal immune function of the human body. Therefore, if the white blood cell count is abnormal, diseases may be suspected. The top left and right panels of Figure 2 show histograms of the white blood cell counts for n = 18047 males and n = 19032 females aged 20 to 85, respectively, and the bottom two panels show histograms for counts over 15 (×10³/µL). We can judge whether the tails of these distributions have a positive extreme value index by comparing them with the normal distribution, which has a zero extreme value index. In many extreme value studies, kurtosis is often used for this purpose. The sample kurtosis was about 403.8 for males and about 38.3 for females, indicating that the right tails of these distributions are heavy. In addition, Figure 3 shows plots of the subjects' ages and white blood cell counts, suggesting that the number of abnormal white blood cell counts tends to increase with age.
The dataset also contains percentages by type: neutrophils, eosinophils, basophils, monocytes, and lymphocytes. White blood cell differentiation is a clinical test that identifies the types of white blood cells that cause an abnormal white blood cell count. These five types have different immune functions and can help detect certain diseases. The sample averages were about 58.02, 3.10, 0.69, 8.39, and 29.84% for males and about 58.70, 2.57, 0.71, 7.47, and 30.59% for females, respectively. Neutrophils and lymphocytes comprised the majority of white blood cells, and the correlation coefficient calculated from the transformed observations, as described below, was approximately −0.93 for males and −0.95 for females. In other words, there was a strong negative correlation between the percentages of neutrophils and lymphocytes. In this analysis, we define the response Y as the white blood cell count; the predictors X_1, X_2, X_3 and X_4 as the percentages of eosinophils, basophils, monocytes and lymphocytes in the white blood cells; and the predictor T as age. We denote X = (X_1, X_2, X_3, X_4)^⊤.

Figure 4: The three-dimensional scatter plots of (X_j, T, Y), j = 1, 2, 3, 4 with Y > ω_n for males. For the top left, top right, bottom left, and bottom right panels, X_j is the percentage of eosinophils, basophils, monocytes, and lymphocytes in the white blood cells, respectively.
Figures 4 and 5 show the three-dimensional scatter plots of each (X_j, T, Y) for males and females, respectively. As shown in these figures, the predictors X_1, X_2, X_3 and X_4 had many outliers. However, excluding these outliers would also exclude the extreme values of the response Y. Therefore, we apply the normal score transformation to each of X_1, X_2, X_3, and X_4, where all observations are jittered by uniform noise before applying the normal score transformation. Consequently, the redefined predictors X_1, X_2, X_3, and X_4 are normally distributed. Wang and Tsai (2009) applied a similar transformation in their analysis of real data.

Figure 5: The three-dimensional scatter plots of (X_j, T, Y), j = 1, 2, 3, 4 with Y > ω_n for females. For the top left, top right, bottom left, and bottom right panels, X_j is the percentage of eosinophils, basophils, monocytes, and lymphocytes in the white blood cells, respectively.
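A normal score transformation of the kind described above maps jittered ranks to standard normal quantiles; a minimal sketch (the jitter scale and helper name are our own choices):

```python
import numpy as np
from statistics import NormalDist

def normal_score(x, rng):
    """Jitter to break ties, rank, then map ranks r to the standard
    normal quantiles at r/(n+1)."""
    x = np.asarray(x, dtype=float)
    x = x + rng.uniform(-1e-6, 1e-6, size=x.shape)  # tiny jitter breaks ties
    ranks = x.argsort().argsort() + 1               # ranks 1..n
    nd = NormalDist()
    return np.array([nd.inv_cdf(r / (len(x) + 1)) for r in ranks])

z = normal_score(np.array([3.0, 1.0, 2.0]), np.random.default_rng(0))
print(z)  # ranks 3, 1, 2 -> normal quantiles at 0.75, 0.25, 0.50
```

The transform preserves the ordering of the raw percentages while removing the outliers' leverage, which is the point of using it here.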
We assume that the conditional distribution function of Y given (X, T) = (x, t) is of the Pareto type (2.1), where L(•; x, t) is a slowly varying function satisfying (2.2), and log γ(x, t)^{−1} = (1, x^⊤)θ(t), where x = (x_1, x_2, x_3, x_4)^⊤ ∈ R^4, and θ_j(t), j = 0, 1, 2, 3, 4 are unknown smooth functions of t. The aim of the analysis is to investigate the effect of X_j on the extreme values of Y, where the effect of X_j varies with T. To do this, we first estimate the unknown coefficient functions θ_j(•), j = 0, 1, 2, 3, 4. Then, we select the threshold ω_n and bandwidth h_n using the procedure described in Section 2.3. We employ the Epanechnikov kernel in the proposed estimator and set the predetermined sample fraction to n_0/n = 0.030 in D = 20-fold cross-validation, where n_0 = Σ_{i=1}^n I(Y_i > ω_n). We obtained the optimal tuning parameters as (h_n, n_0/n) = (0.21, 0.042) for males and (h_n, n_0/n) = (0.30, 0.036) for females. Figure 6 shows the estimated coefficient functions θ̂_j(•), j = 0, 1, 2, 3, 4 (solid line) together with pointwise 95% confidence intervals computed from the asymptotic normality of the proposed estimator (dashed lines), where the bias is ignored, n(t)σ_nj(t) defined in Section 4.1 is estimated based on (C.3), and ν = ∫ K(u)² du.

Figure 6: The estimated coefficient functions (solid line) and their 95% confidence intervals (dashed lines) with bias ignored for males (first column) and females (second column).

For all coefficient functions, the trends were similar for males and females. The decreasing trend in the estimated intercept term θ̂_0(•) indicates that the number of abnormal white blood cell counts tends to increase with age. Table 4 presents the results of the statistical hypothesis tests for sparsity and constancy, as defined in Section 4.1. For the significance level α = 0.05, we reject the null hypothesis if T < −0.61 or T > 4.37. The null hypothesis H_0Z for sparsity was rejected for all coefficient functions except θ_2(•) for both males and females. In addition, the null hypothesis H_0C for constancy was rejected for θ_1(•) and θ_4(•) for males and θ_0(•) and θ_4(•) for females. Remarkably, eosinophils and monocytes, which represent a small percentage of white blood cells, were associated with abnormal white blood cell counts.
We evaluate the goodness of fit of the model using the Q-Q (quantile-quantile) plot. We regard {exp((1, X_i^⊤)θ̂(T_i)) log(Y_i/ω_n) : Y_i > ω_n, i = 1, . . ., n} as a random sample from the standard exponential distribution. Figure 7 shows plots of these empirical and theoretical quantiles. The two dashed lines show the pointwise 95% confidence intervals computed in the simulations. We can infer that the better the plots align on a straight line, the better the model fits the data. Most of the plots were within the 95% confidence interval, and the goodness of fit of the model did not seem to be bad. In contrast, Figure 8 shows the plots for the linear model proposed by Wang and Tsai (2009), where the predictors are defined as T scaled on [0, 1], X_1, X_2, X_3 and X_4. In this case, many plots were outside the 95% confidence interval and deviated significantly from the straight line, indicating that our model fits the data better. Finally, because the null hypotheses H_0Z and H_0C were not rejected for θ_2(•) and θ_3(•), we adopt a simpler model. We consider the model (5.2), which assumes the sparsity of X_2. For the model (5.1), the discrepancy measure value described in Section 2.3 was approximately 4.322 × 10^{−4} for males and 3.015 × 10^{−4} for females. Meanwhile, for the model (5.2), the discrepancy measure value was approximately 3.730 × 10^{−4} for males and 3.017 × 10^{−4} for females, where (h_n, n_0/n) = (0.22, 0.042) for males and (h_n, n_0/n) = (0.30, 0.036) for females. The discrepancy measure values for females were not very different between the two models, but the discrepancy measure value for males was smaller in the model (5.2) than in the model (5.1). Moreover, we consider the model (5.3), in which θ_3 is fixed at the average of the estimates {θ̂_3(t_l)}_{l=1}^L obtained in model (5.1) and treated as a known constant. For the model (5.3), the discrepancy measure value was approximately 3.628 × 10^{−4} for males and 3.104 × 10^{−4} for females, where (h_n, n_0/n) = (0.19, 0.042) for males and (h_n, n_0/n) = (0.30, 0.036) for females. The discrepancy measure value for males was smaller in the model (5.3) than in the model (5.1). Therefore, from the point of view of the discrepancy measure, the data structure may be explained by a simpler model.

Appendix
In this appendix, we prove Theorems 1-3 for t = t_0 ∈ R^q. For convenience, the intercept term θ_0(·) is not considered.
Proof of Theorem 1. L_n(θ(t_0)) can be regarded as a sum of independent and identically distributed random variables. To apply the central limit theorem, we evaluate the limit of the expectation of L_n(θ(t_0)) in the first step and show that var[L_n(θ(t_0))] → νI_p as n → ∞ in the second step, where "var" denotes the variance-covariance matrix.
Step 1. We can write L_n(θ(t_0)) as a sum of n terms. By the Taylor expansion and condition (C.1), and then by condition (C.4) and the model assumptions (2.1) and (2.2), we evaluate the conditional expectation of each term; the remaining terms are handled analogously. Under condition (C.5), the expectation converges to −b(t_0) as n → ∞. Next, using the second-order Taylor expansion of the exponential, and again the Taylor expansion with condition (C.1), we expand the remainder. Because the conditional distribution of γ(X, t_0)^{-1} log(Y/ω_n) given (X, T) = (x, t_0) and Y > ω_n is approximately standard exponential, the resulting integral involves the marginal density function f_(X,T)(x, t) of (X, T). Therefore, the right-hand side of (A.1) can be written in terms of Λ_n(t) and Λ_n^(2)(t), which are defined in Section 3.2. Combining these evaluations yields the limit of E[L_n(θ(t_0))]. Hence, the proof of the first step is completed.
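The standard-exponential approximation invoked above follows from the Pareto-type tail in the Introduction (the case γ > 0). A one-line sketch, suppressing the covariates and writing P(Y > y) = y^{-1/γ} L(y) with L slowly varying:

```latex
P\!\left(\frac{\log(Y/\omega_n)}{\gamma} > z \,\Big|\, Y > \omega_n\right)
= \frac{P(Y > \omega_n e^{\gamma z})}{P(Y > \omega_n)}
= \frac{L(\omega_n e^{\gamma z})}{L(\omega_n)}\, e^{-z}
\;\longrightarrow\; e^{-z},
```

since L(ys)/L(y) → 1 as y → ∞ and ω_n → ∞, so the conditional exceedance γ^{-1} log(Y/ω_n) is asymptotically Exp(1).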
Step 2. We decompose var[L_n(θ(t_0))] into two terms. From the result of Step 1, the second term on the right-hand side converges to the zero matrix as n → ∞. Using the Taylor expansion, the first term on the right-hand side can be written with the normalization n(t_0)[n_0(t_0)Σ_n(t_0)]^{-1/2}. Under condition (C.3), the right-hand side converges to I_p as n → ∞. Hence, the proof of the second step is completed.
The condition (C.1) is typically used for kernel estimation. The conditions (C.3)-(C.5) correspond to the conditions (C.1)-(C.3) of Wang and Tsai (2009). The condition (C.3) requires that a certain weak law of large numbers holds. The condition (C.4) regularizes the extreme behavior of the slowly varying function L(y; x, t).
2.5 of de Haan and Ferreira 2006, Theorems 2 and 3 of Wang and Tsai 2009, and Theorem 2 of Li et al. 2022, to name a few). In contrast, the biases Λ

Figure 2: The histograms of the response Y for male (top left panel) and female (top right panel). The bottom two panels show the histograms of the response Y greater than 15 for male (bottom left panel) and female (bottom right panel).

Figure 3: The time series plots of Y for male (left panel) and female (right panel), where Y exceeds the threshold ω_n.
Figure 5: The three-dimensional scatter plots of (X_j, T, Y), j = 1, 2, 3, 4, with Y > ω_n for female. For the top left, top right, bottom left, and bottom right panels, X_j is the percentage of eosinophils, basophils, monocytes, and lymphocytes in the white blood cells, respectively.

Figure 7: The Q-Q plots for the proposed model for male (left panel) and female (right panel).

Figure 8: The Q-Q plots for the linear model proposed by Wang and Tsai (2009) for male (left panel) and female (right panel).

Table 1: Results of estimation and hypothesis testing in the first model setting. E1 and E2 represent the empirical probabilities of Type I and Type II errors, respectively. Since θ_1(t) = 1 is a nonzero constant, the null hypothesis H_0C is true, whereas the null hypothesis H_0Z is false. Accordingly, we provide E1 for H_0C and E2 for H_0Z. Meanwhile, since θ_2(t) = cos(2t) is not constant, the null hypotheses H_0C and H_0Z are false, and thus E2 is given for both tests. Similarly, θ_3(t) = 0 indicates that the null hypotheses H_0C and H_0Z are true. Thus, for θ_3(t) = 0, E1 is provided for both tests.

Table 3: Results of estimation and hypothesis testing in the third model setting. E1 and E2 represent the empirical probabilities of Type I and Type II errors, respectively.

Table 4: The results of the hypothesis testing. H_0C is the null hypothesis that θ_j(t) is constant, and H_0Z is the null hypothesis that θ_j(t) is zero. For the significance level α = 0.05, we reject the null hypothesis if T < −0.61 or T > 4.37.