Strong consistency of local linear estimation of a conditional density function under random censorship

In this paper, we study the nonparametric local linear estimation of the conditional density of a randomly censored scalar response variable given a functional random covariate. Under general conditions, we establish the pointwise almost sure convergence, with rates, of this estimator under α-mixing dependence. Finally, to illustrate the practical interest of our results, we conduct a computational study, first on simulated data and then on real kidney transplant data.


Introduction
The conditional density plays an important role, not only in exploring relationships between responses and covariates, but also in financial econometrics (see Ait-Sahalia [1]). A wide variety of papers use estimators of conditional densities as building blocks, including those of Robinson [20] and Tjøstheim [23], among others. However, in all of these papers, the conditional density function is estimated indirectly. Hyndman et al. [13] studied the kernel estimator of the conditional density and its bias-corrected version. Local linear regression has many advantages, such as the lack of boundary modifications, high minimax efficiency, and ease of implementation. Subsequently, Bashtannyk and Hyndman [3] suggested several simple and useful rules for selecting bandwidths for conditional density estimation. Hall et al. [11] applied the cross-validation technique to estimate the conditional density. Fan and Yim [9] proposed a consistent data-driven bandwidth selection procedure for estimating conditional density functions.
In the last decade, the kernel method has been widely used in nonparametric functional data analysis; in this context, we refer to the monograph of Ferraty and Vieu [10]. Since then, many interesting publications have appeared. However, results on local linear modeling in the functional data setting remain limited. Baillo and Grané [4] proposed a local linear estimator (LLE) of the regression operator when the explanatory variable takes values in a Hilbert space, and then when the explanatory variable takes values in a semi-metric space. Demongeot et al. [7] presented the local linear estimation of the conditional density when the data are functional. Furthermore, Messaci et al. [19] used the same approach to estimate the conditional quantile of a scalar response given a functional explanatory variable in the i.i.d. case. In all these papers, it is assumed that the observations are complete.
Censored data analysis is a major issue in survival studies. Censored data, truncated data, missing data, and current status data are among the complex data structures in which only partial information on the variable of interest is available (see Kaplan and Meier [14]). In real data applications, censoring is a condition in which the value of a measurement or observation is only partially known. In this setting, instead of the true lifetimes $(Y_i)_{i=1,\ldots,n}$ of the variable under study (which have a continuous cumulative distribution function (cdf) $F(\cdot)$), we observe lifetimes that may be censored by the censoring times $(C_i)_{i=1,\ldots,n}$. We assume that $\{Y_i, i \ge 1\}$ is a stationary sequence of lifetimes and that $\{C_i, i \ge 1\}$ is a sequence of i.i.d. censoring random variables with common unknown continuous cdf $G(\cdot)$, and we observe only the $n$ pairs $(T_i, \delta_i)$, where $T_i = Y_i \wedge C_i$ and $\delta_i = \mathbf{1}_{\{Y_i \le C_i\}}$ (with $\wedge$ denoting the minimum and $\mathbf{1}_A$ the indicator function of a set $A$).
The distinguishing characteristic of censoring has attracted the attention of many researchers. Beran [5] introduced a nonparametric estimate of the conditional survival function and showed some consistency results. Many other properties of the conditional distribution have been broadly studied in the literature (see Stute [22]). Furthermore, in the same context, many results on conditional quantile and conditional mode have been given (see Horrigue and Ould Saïd [12], Khardani and Thiam [15]).
The present study extends the result of Demongeot et al. [7] to censored data under general conditions. We establish the almost sure consistency, with convergence rates, of the conditional density estimator when the explanatory variable is of functional type.
The paper is organized as follows. Section 2 is devoted to the presentation of our estimator. In Sect. 3, we introduce notations and assumptions, and state the main results. In Sect. 4, we conduct a simulation study that shows the performance of the considered estimator. Finally, proofs of our results are gathered in Sect. 5.

Construction of the conditional density estimator
Consider $n$ pairs of random variables $(X_i, Y_i)$, $i = 1, \ldots, n$, drawn from the pair $(X, Y)$ with values in $\mathcal{F} \times \mathbb{R}$, where $\mathcal{F}$ is a semi-metric space equipped with a semi-metric $d$. In this paper, we consider the problem of nonparametric estimation of the conditional density of $Y$ given $X = x$ when the response variables $(Y_i)_{i=1,\ldots,n}$ are right censored and when the observations $(X_i, Y_i)_{i=1,\ldots,n}$ are strongly mixing. Furthermore, we denote by $(C_i)_{i=1,\ldots,n}$ the censoring random variables, which are independent and identically distributed with a common unknown continuous distribution function $G$. Thus, we observe the triplets $(X_i, T_i, \delta_i)$, $i = 1, \ldots, n$, where $T_i = Y_i \wedge C_i$ and $\delta_i = \mathbf{1}_{\{Y_i \le C_i\}}$ (with $\wedge$ denoting the minimum and $\mathbf{1}_A$ the indicator function of a set $A$). We suppose that $(Y_i)_{i=1,\ldots,n}$ and $(C_i)_{i=1,\ldots,n}$ are independent, which ensures the identifiability of the model.
In the case of complete data, we adopt the fast functional local modeling introduced by Barrientos-Marin et al. [2] for regression analysis; that is, we estimate the conditional density $\zeta(\cdot|x)$ by $a$, which is obtained by minimizing the following quantity:
where $\beta(\cdot, \cdot)$ is a known function from $\mathcal{F}^2$ into $\mathbb{R}$ such that $\beta(\xi, \xi) = 0$ for all $\xi \in \mathcal{F}$, $K$ and $H$ are kernels, $h_K = h_{K,n}$ (resp. $h_H = h_{H,n}$) is a sequence of positive real numbers, and $\delta(\cdot, \cdot)$ is a function defined on $\mathcal{F} \times \mathcal{F}$ such that $d(\cdot, \cdot) = |\delta(\cdot, \cdot)|$. Here, we denote $a$ by $\zeta_n(\cdot|x)$. Then, the expression of $\zeta_n(\cdot|x)$ is given as:
We assume that there exists a compact set $\Omega \subset \mathbb{R}$ such that $\zeta(y|x)$ has a unique mode $\theta(x)$ on $\Omega$, where:
$$\theta(x) = \arg\sup_{y \in \Omega} \zeta(y|x).$$
In the censored case, we adapt the idea of Carbonez et al. (1995), Kohler et al. (2002), and Khardani et al. (2010) to the infinite-dimensional case, using a smooth distribution function $H(\cdot)$ instead of a step function.
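Then, we get the "pseudo"-estimator of $\zeta(\cdot|x)$:
The cumulative distribution function $G$ of the censoring random variables is estimated by the Kaplan-Meier (1958) estimator, defined as:
$$\bar{G}_n(t) = 1 - G_n(t) = \begin{cases} \prod_{i=1}^{n} \left(1 - \dfrac{1 - \delta_{(i)}}{n - i + 1}\right)^{\mathbf{1}_{\{T_{(i)} \le t\}}} & \text{if } t < T_{(n)}, \\ 0 & \text{otherwise}, \end{cases}$$
where $T_{(1)} < T_{(2)} < \cdots < T_{(n)}$ are the order statistics of $T_i$ and $\delta_{(i)}$ is the concomitant of $T_{(i)}$; this estimator is known to be uniformly convergent to $\bar{G}$.
Therefore, a feasible estimator of $\zeta(\cdot|x)$ is given by:
Then, a natural estimator of $\theta(x)$ is defined by:
$$\theta_n(x) = \arg\sup_{y \in \Omega} \zeta_n(y|x).$$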
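The construction of this section can be sketched numerically. The following is a minimal illustration and not the authors' implementation. It assumes that the local linear criterion is the usual kernel-weighted least-squares problem in $(a, b)$, with weights $K(h_K^{-1}\delta(x, X_i))$ and regressor $\beta(X_i, x)$, and that censoring is handled through the synthetic responses $\delta_i \bar{G}_n^{-1}(T_i)\, h_H^{-1} H(h_H^{-1}(y - T_i))$, in the spirit of the references cited above. The function names, the scalar covariates standing in for curves, and the use of the left limit $\bar{G}_n(T_i-)$ to avoid a division by zero at the largest observation are choices made for this sketch only.

```python
import math


def km_censoring_survival(T, delta):
    """Kaplan-Meier estimate of the censoring survival function G_bar = 1 - G.

    Jumps occur at censored observations (delta = 0). Assumes no ties among T.
    Returns a function g_bar(t, left_limit=False).
    """
    order = sorted(range(len(T)), key=lambda i: T[i])
    times = [T[i] for i in order]
    n = len(times)
    # Factor attached to the k-th order statistic (k = 0, ..., n - 1):
    # 1 - (1 - delta_(i)) / (n - i + 1) with i = k + 1.
    factors = [1.0 - (1 - delta[i]) / (n - k) for k, i in enumerate(order)]

    def g_bar(t, left_limit=False):
        beyond = t > times[-1] if left_limit else t >= times[-1]
        if beyond:
            return 0.0  # convention: the estimate vanishes from T_(n) onwards
        value = 1.0
        for s, f in zip(times, factors):
            if s < t or (s == t and not left_limit):
                value *= f
        return value

    return g_bar


def local_linear_cond_density(y, x, X, T, delta, beta, loc, K, H, h_K, h_H):
    """Sketch of a censored local linear conditional density estimate at (y, x).

    beta and loc play the roles of the locating functions beta(., .) and
    delta(., .); the intercept of the weighted least-squares fit is returned.
    """
    g_bar = km_censoring_survival(T, delta)
    s0 = s1 = s2 = t0 = t1 = 0.0
    for X_i, T_i, d_i in zip(X, T, delta):
        k_i = K(loc(x, X_i) / h_K)
        if k_i == 0.0:
            continue
        b_i = beta(X_i, x)
        # Synthetic response (assumed form); left limit of G_bar is a safeguard.
        z_i = 0.0
        if d_i == 1:
            z_i = H((y - T_i) / h_H) / (h_H * g_bar(T_i, left_limit=True))
        s0 += k_i
        s1 += k_i * b_i
        s2 += k_i * b_i * b_i
        t0 += k_i * z_i
        t1 += k_i * b_i * z_i
    denom = s0 * s2 - s1 * s1
    # Closed-form intercept of the weighted least-squares problem (0/0 := 0).
    return (s2 * t0 - s1 * t1) / denom if denom != 0.0 else 0.0


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


# Toy usage with scalar covariates standing in for curves (illustration only).
X = [0.1, 0.3, 0.5, 0.7, 0.9]
T = [1.2, 0.8, 1.0, 1.5, 0.9]
delta = [1, 0, 1, 1, 1]
estimate = local_linear_cond_density(
    y=1.0, x=0.5, X=X, T=T, delta=delta,
    beta=lambda u, x0: u - x0, loc=lambda x0, u: abs(u - x0),
    K=epanechnikov, H=gaussian, h_K=1.0, h_H=0.5)
print(estimate)
```

For genuinely functional covariates, `beta` and `loc` would be replaced by locating functions acting on curves; the rest of the sketch would be unchanged.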

Assumptions and main results
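Let $\{Z_i, i = 1, 2, \ldots\}$ be a strictly stationary sequence of random variables. Given a positive integer $n$, set:
$$\alpha(n) = \sup\left\{ |P(A \cap B) - P(A)P(B)| : A \in \mathcal{A}_1^k(Z),\ B \in \mathcal{A}_{k+n}^{\infty}(Z),\ k \in \mathbb{N}^* \right\},$$
where $\mathcal{A}_i^k(Z)$ denotes the σ-field generated by $\{Z_j, i \le j \le k\}$. The sequence is said to be α-mixing (strong mixing) if the mixing coefficient $\alpha(n) \to 0$ as $n \to \infty$.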
This condition was introduced in 1956 by Rosenblatt. The strong-mixing condition is reasonably weak and has many practical applications (see Doukhan [8], for more details).

Almost sure consistency of the conditional density estimator
Our first result concerns the almost sure convergence of the LLE of the conditional density. We introduce some conditions that are required to state our asymptotic result. Throughout the paper, $x$ denotes a fixed point in $\mathcal{F}$ and $N_x$ denotes a fixed neighborhood of $x$. For any df $L$, let $\tau_L := \sup\{y : L(y) < 1\}$ be the right endpoint of its support, and assume that θ(
Note that our nonparametric model is quite general in the sense that we just need the following assumptions:
where $C_x$ is a positive constant depending on $x$. and ζ (2)
(H7) $H$ is a positive, bounded, Lipschitz continuous function, such that:
(H8) The bandwidth $h_K$ satisfies: there exists an integer $n_0$, such that: .
Remarks on the assumptions Most of the assumptions are common in the nonparametric functional data analysis (NFDA) context. More precisely, assumption (H1) is usually used in NFDA and is linked with the topological structure of the functional space $\mathcal{F}$ of the explanatory variable $X$ (see Ferraty and Vieu [10] for more discussion). Furthermore, assumptions (H2) and (H3) are mild regularity assumptions on the conditional density function. Finally, conditions (H4), (H5), and (H10) are technical assumptions (see Ferraty and Vieu [10] for the local constant method case).

Theorem 3.2 Under assumptions (H1)-(H10), we have:
The proof is a direct consequence of Lemmas 3.3-3.6 and of the following decomposition, valid for all $y \in \Omega$:

Lemma 3.5 Under the assumptions of Theorem 3.2, we get:

Almost sure consistency of the conditional mode estimator
The convergence rate of the LLE of the conditional mode is a direct consequence of the previous result. Thus, in addition to the previous conditions, we assume that:
Then, the asymptotic behaviour of $\theta_n(x)$ is given in the following theorem.
Theorem 3.7 Assume that (H1)-(H11) hold. We have:
Proof of Theorem 3.7 We can easily show that:
Making use of a Taylor expansion of the function $\zeta(\cdot|x)$, we get:
where $\theta^*(x)$ is between $\theta(x)$ and $\theta_n(x)$. Combining Eqs. (22) and (23), we find:
The almost sure consistency of $\theta_n(x)$ is an immediate consequence of Theorem 3.2.
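In practice, the arg sup defining the conditional mode estimator is usually approximated by a search over a finite grid of the compact set. The sketch below illustrates this step only; `density` stands for any function $y \mapsto \zeta_n(y|x)$ at a fixed $x$, and the Gaussian curve and the grid used in the example are placeholders, not quantities from the paper.

```python
import math


def conditional_mode(density, grid):
    """Return the grid point maximizing density(y) (first maximizer on ties)."""
    return max(grid, key=density)


# Placeholder density: a Gaussian bump centred at 1.3 (illustration only).
def toy_density(y):
    return math.exp(-(y - 1.3) ** 2 / (2.0 * 0.04))


grid = [k / 100 for k in range(301)]  # grid on the compact set [0, 3]
mode = conditional_mode(toy_density, grid)
print(mode)
```

The accuracy of the grid search is limited by the grid step, so the step should be small relative to the expected convergence rate of the estimator.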

On simulated data
In this subsection, a simulation study is carried out to investigate the finite-sample performance of the local linear estimator $f_{LL}(y|x)$ of the conditional density function under right-censored and functional dependent data. As is common, the applicability of the asymptotic normality result requires a practical estimation of the asymptotic bias and variance. For this, we neglect the bias term and use a plug-in approach to construct an estimator of the asymptotic variance of the conditional density function, given by:
To test the effectiveness of the asymptotic normality result, let us consider the following regression model where the response is a scalar:
for $i = 1, \ldots, n$, where $\varepsilon_i$ is the error generated by an autoregressive model defined by:
with $\{\eta_i\}_i$ a sequence of i.i.d. random variables normally distributed with variance equal to 0.1. The explanatory variables are constructed by:
where $W$ is generated from a Gaussian distribution $N(0, 1)$ and $A$ is a Bernoulli random variable with parameter $p = 0.5$. The 300 generated curves $X_i$ are plotted in Fig. 1.
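To make the dependence structure of the errors concrete, the following sketch simulates autoregressive errors driven by Gaussian innovations of variance 0.1. The order of the autoregression and its coefficient are not recoverable from the text, so an AR(1) model with a hypothetical coefficient `rho = 0.5` is assumed here; a stationary Gaussian AR(1) process with $|\rho| < 1$ is a classical example of a strongly mixing sequence.

```python
import math
import random


def ar1_errors(n, rho, innovation_var, seed=0, burn_in=200):
    """Simulate n values of eps_i = rho * eps_{i-1} + eta_i, eta_i ~ N(0, innovation_var).

    rho is a hypothetical coefficient (assumption of this sketch); a burn-in
    period is discarded so that the returned path is close to stationarity.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie in (-1, 1) for stationarity")
    rng = random.Random(seed)
    sd = math.sqrt(innovation_var)
    eps, path = 0.0, []
    for step in range(burn_in + n):
        eps = rho * eps + rng.gauss(0.0, sd)
        if step >= burn_in:
            path.append(eps)
    return path


errors = ar1_errors(n=300, rho=0.5, innovation_var=0.1)
print(len(errors))
```

Under this assumed model the stationary variance of the errors is `innovation_var / (1 - rho**2)`, which can serve as a quick check of the simulated path.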
On the other hand, $n$ i.i.d. random variables $(C_i)_i$ are simulated from the exponential distribution $\mathcal{E}(\lambda)$ and, for $i = 1, \ldots, n = 300$, the scalar response $Y_i$ is computed by considering the following operator:
Given $X = x$, we can easily see that $Y$ has a Gaussian distribution $N(r(x), 0.2)$. Then, we can get the corresponding conditional density, which is explicitly defined by:
Therefore, the conditional mode, the conditional mean $r(x)$, and the conditional median functions coincide and are equal to $r(x)$, for any fixed $x$. Our purpose now is to evaluate the accuracy of the conditional mode function estimator based on randomly censored data. The computation of this estimator is based on the observed data $(X_i, T_i, \delta_i)_{i=1,\ldots,n}$.
In this simulation study, we present only the results for the case where $i = 2$ and $q = 1$. For this, we take $K_0(s) = 3(1 - s^2)\mathbf{1}_{[-1,1]}(s)$, $K(1) > 0$, and $K_1(s) = 3(1 - s^2)\mathbf{1}_{[-1,1]}(s)$. Moreover, as is well known in FDA, the choice of the metric and of the smoothing parameters plays a crucial role in computational issues. To optimize these choices in this illustration, we use the local cross-validation procedure to choose the smoothing parameters $h_K$ and $h_H$ (see Laksaci et al. [17]).
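The censoring mechanism of this simulation can be sketched as follows. This is an illustration under stated assumptions: the lifetimes below are placeholder values and not the paper's regression model, and $\lambda$ is interpreted as the rate of the exponential distribution, which the text does not specify. The censoring rate (CR) is computed as the proportion of observations with $\delta_i = 0$.

```python
import random


def simulate_censoring(lifetimes, rate, seed=0):
    """Draw C_i ~ Exp(rate) and return (T, delta, censoring_rate).

    T_i = min(Y_i, C_i) and delta_i = 1 if Y_i <= C_i, else 0.
    """
    rng = random.Random(seed)
    censoring_times = [rng.expovariate(rate) for _ in lifetimes]
    observed = [min(y, c) for y, c in zip(lifetimes, censoring_times)]
    delta = [1 if y <= c else 0 for y, c in zip(lifetimes, censoring_times)]
    censoring_rate = 1.0 - sum(delta) / len(delta)
    return observed, delta, censoring_rate


# Placeholder lifetimes (NOT the paper's model): positive values for illustration.
rng = random.Random(1)
Y = [abs(rng.gauss(2.0, 0.5)) for _ in range(300)]
T, delta, cr = simulate_censoring(Y, rate=0.2)
print(cr)
```

Increasing `rate` shortens the censoring times and therefore raises the censoring rate, which is one way to produce the different CR levels considered in the study.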
Another important point for ensuring good behavior of the considered methods is to use locating functions $\delta$ and/or $\beta$ that are well adapted to the kind of data at hand. Here, the shape of the curves (cf. Fig. 1) allows us to use the locating functions $\delta$ and $\beta$ defined through the derivatives of the curves. More precisely, we take:
Figure 2 displays the distribution of the MSE obtained over the $N$ replications. It can be observed that the proposed estimator performs well, especially when the sample size increases. This conclusion is confirmed by Table 1, which provides a numerical summary of the distribution of the MSE for different censoring rates (CR).
In the second part of the simulation study, we are interested in evaluating the prediction accuracy of the conditional median under different censoring rates (CR). A sample $(X_i, Y_i)_{i=1,\ldots,550}$ of size $n = 550$, generated from the model described above, is considered for this purpose. We split this sample into two parts: a learning subsample $\{(X_i, Y_i);\ i = 1, \ldots, 500\}$, which is used to calculate the predictor (the conditional mode in this case), and a testing subsample $\{(X_i, Y_i);\ i = 501, \ldots, 550\}$, used to evaluate the performance of the predictor. The prediction accuracy is measured, for different values of CR, using the mean absolute error (MAE).
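The learning/testing split and the error measure can be sketched as follows. The helper names are hypothetical, and the mean absolute error is taken in its usual form, the average of the absolute differences between observed and predicted responses over the testing subsample.

```python
def split_learning_testing(sample, n_learning):
    """Split a sample into a learning part (first n_learning items) and a testing part."""
    return sample[:n_learning], sample[n_learning:]


def mean_absolute_error(observed, predicted):
    """Average of |observed_i - predicted_i| over the testing subsample."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Placeholder sample of 550 indices standing in for the pairs (X_i, Y_i).
sample = list(range(550))
learning, testing = split_learning_testing(sample, 500)
print(len(learning), len(testing))
```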

Real data application
A useful tool in survival analysis is the hazard rate, which reflects the instantaneous probability that a duration will end within the next time instant. Among the most commonly used examples in survival analysis are survival times of patients, the Stanford Heart Transplant data, and durations between subsequent transactions in a financial security. For the application to real data, we apply the local linear method to the kidney transplant data (see Klein and Moeschberger [16] and https://www.agence-biomedecine.fr). The bandwidths are selected by plug-in rules; more advanced selection methods, such as cross-validation, are likely to further improve the performance of the local linear hazard rate estimator. To use the kidney transplant data, we need to describe three fundamental parameters: (1) the survival times in days of patients following kidney transplant as the response, (2) race (black/white), and (3) age in years for each patient as the covariate. We propose a new method based on the functional local linear approach. More precisely, we use the conditional hazard rate. The methodology of this study is as follows: first, we consider the subsamples of white males and white females. Second, we take 432 patients as the first group, with a censoring rate of 83%. The second group involves 280 patients, with an 86% censoring rate. The survival times of white patients vary between 1 day and 9.4 years. The average age of the male patients is slightly less than 44 years, whereas the female patients are almost 41 years old on average.
Fig. 3 Estimates of the conditional hazard rate after the transplantation
In conclusion, we find only small differences between the various plug-in bandwidths for the conditional hazard rate, where we have taken age as the conditioning variable (see Fig. 3). We see that the hazard rate is strongly non-monotonic for both male and female patients. As expected, the hazard rate is higher for older patients. Interestingly, the differences in the hazard rates of younger and older women diminish after about 150 days. For men, the differences between the various age groups persist longer. Also, the risk of dying shortly after the transplantation is higher for men than for women. On the other hand, for female patients, the risk of dying at a later stage is lower than for males.
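For reference, a hazard rate is the ratio of a density to the corresponding survival function, $h(y|x) = \zeta(y|x) / (1 - F(y|x))$, where $F(\cdot|x)$ denotes the conditional cdf (a notation assumed here). The sketch below only illustrates this ratio; it is not the estimator used for Fig. 3, and it is checked on an exponential lifetime, whose hazard is constant and equal to its rate.

```python
import math


def hazard_rate(density_value, cdf_value, eps=1e-12):
    """Hazard h = f / (1 - F); returns None where the survival 1 - F is numerically zero."""
    survival = 1.0 - cdf_value
    if survival <= eps:
        return None
    return density_value / survival


# Sanity check on an exponential lifetime with rate lam (illustration only).
lam = 2.0
grid = [0.1, 0.7, 1.5]
hazards = [
    hazard_rate(lam * math.exp(-lam * t), 1.0 - math.exp(-lam * t))
    for t in grid
]
print(hazards)
```

In the tail, where the estimated survival function is close to zero, the ratio becomes unstable, which is why the sketch returns `None` there instead of a number.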

Proofs
In what follows, when no confusion is possible, we denote by $C$ and $C'$ some strictly positive generic constants. Moreover, we put, for any $x \in \mathcal{F}$ and for all $i = 1, \ldots, n$:

Proposition 5.1 (Ferraty and Vieu [10], page 237)
Assume that $\{U_i, i \ge 1\}$ are identically distributed, with strong mixing coefficient $\alpha(n) = O(n^{-a})$, $a > 1$, such that $|U_1|$ is bounded. Then, for each $r > 1$ and $\varepsilon > 0$:
Proof of Lemma 3.4 The bias term is not affected by the dependence condition of $(X_i, Y_i)$. Therefore, since the couples $(X_i, Y_i)$ are identically distributed, we have:
Using conditional expectation properties and the fact that:
for any measurable function $\varphi$, we have:
then, by assumptions (H2) and (H7), we get:
which proves Lemma 3.4.
For this, we consider the following decomposition:
where
and
Therefore, our claimed result is a direct consequence of the following assertions:
and Cov(T 4 , T 5 ) = o a.co.
In the same way, we can choose η, such that s n ℘ 1 is the general term of a convergent series. Finally, we find:
Concerning (10): from Demongeot et al. [7], we have:
Now, we have to study the case where $i = 2, 4$. Since the pairs $(X_i, Y_i)$, $i = 1, \ldots, n$, are identically distributed, we obtain:
We have to evaluate: , for $l = 0, 1$.
Using the conditional expectation properties and Eq. (5), we have, for all $l = 0, 1$:
Using Lemma 3 given in Barrientos-Marin et al. [2], we obtain:
From Eq. (20), we find:
Concerning (11) and (12): following similar steps as in the proof of (16), we get:
Hence: sup y∈ f x n (z y ) − IE[f x n (z y )] = o a.co.
Proof of Lemma 3.6 Observe that: f x n (y) −f x n (y) ≤