Abstract
The receiver operating characteristic (ROC) curve is a popular graphical tool for describing the accuracy of a diagnostic test. Based on the idea of estimating the ROC curve as a distribution function, we propose a new kernel smoothing estimator of the ROC curve which is invariant under nondecreasing data transformations. We prove that the estimator has better asymptotic mean squared error properties than some other estimators involving kernel smoothing, and we present an easy method of bandwidth selection. Simulation studies show that, for limited sample sizes, our proposed estimator is competitive with other nonparametric estimators of the ROC curve. We also give an example of applying the estimator to a real data set.
1 Introduction
The receiver operating characteristic (ROC) curve is used to describe the performance of a diagnostic test which, on the basis of some observable measurements, assigns individuals to one of two different groups. For definiteness, let us think of them as the groups of diseased and healthy patients. This medical terminology is related to the fact that, in practice, the ROC curves are mainly used in medicine. However, their applications have recently been extended to many other fields, such as economics and data mining. More information about the ROC curves and their possible applications can be found, for example, in Swets (1988), Pepe (2003) and Krzanowski and Hand (2009). For a given cutoff point \(c \in \mathbb {R}\), let an individual be classified as diseased if its test score is greater than c and as healthy otherwise. Suppose that the real random variables X and Y denote the test score in the groups of healthy and diseased individuals, respectively, and let \(F(x)=P(X\le x)\) and \(G(x)=P(Y\le x)\) be their continuous and strictly increasing distribution functions. The accuracy of the test is typically summarized by the sensitivity and specificity, given by \(SE(c)=1-G(c)\) and \(SP(c)=F(c)\), respectively. The ROC curve is a plot of SE(c) versus \(1-SP(c)\) for all possible cutoff values \(c \in \mathbb {R}\cup \{ -\infty , \infty \}\). Equivalently, it can be defined as

\(R(t)=1-G\left( F^{-1}(1-t)\right) , \quad t \in [0,1]. \qquad (1)\)
Let \(\pmb {X}_m=(X_1,\ldots ,X_m)\) and \(\pmb {Y}_n=(Y_1,\ldots ,Y_n)\) be independent simple samples from healthy and diseased populations, respectively, and let \(F_m\) and \(G_n\) denote their empirical cumulative distribution functions. The most commonly used nonparametric estimator of R(t) is the empirical ROC curve, which is of the form

\(R_{m,n}(t)=1-G_n\left( F_m^{-1}(1-t)\right) , \quad t \in [0,1], \qquad (2)\)

where \(F_m^{-1}(p)=\inf \{ x \in \mathbb {R}: F_m(x)\ge p \}\) denotes the generalized inverse of \(F_m\).
Asymptotic properties of this estimator were studied by Hsieh and Turnbull (1996). Among other things, they proved that, under some basic assumptions on F and G, \(R_{m,n}(t)\) converges to the true ROC curve uniformly on [0, 1] with probability one.
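For illustration, the empirical ROC curve can be computed directly from the two samples. The sketch below (assuming NumPy; the function names are ours, not from the paper) uses the generalized inverse \(F_m^{-1}(p)=\inf \{ x: F_m(x)\ge p \}\), which for \(p \in (0,1]\) is the \(\lceil mp \rceil\)-th order statistic of \(\pmb {X}_m\):

```python
import numpy as np

def empirical_cdf(sample):
    """Return the empirical CDF of a 1-D sample as a callable."""
    s = np.sort(np.asarray(sample, dtype=float))
    return lambda x: np.searchsorted(s, x, side="right") / s.size

def empirical_roc(x_healthy, y_diseased, t):
    """Empirical ROC curve R_{m,n}(t) = 1 - G_n(F_m^{-1}(1 - t))."""
    x = np.sort(np.asarray(x_healthy, dtype=float))
    G_n = empirical_cdf(y_diseased)
    t = np.asarray(t, dtype=float)
    m = x.size
    # Empirical quantile F_m^{-1}(1 - t): order statistic x_(ceil(m p))
    p = 1.0 - t
    idx = np.clip(np.ceil(p * m).astype(int) - 1, 0, m - 1)
    q = x[idx]
    return 1.0 - G_n(q)
```

When the diseased scores lie entirely above the healthy ones, the curve equals 1 on the interior of the unit interval, as expected for a perfectly separating test.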
Although the empirical ROC curve is very simple and very popular, its obvious weakness is that it is a step function, while R(t) is continuous and smooth. One of the ways to obtain a continuous estimator of R(t) is to use the kernel smoothing method. Zou et al. (1997) proposed a nonparametric estimator of R(t) built from kernel estimates of the density functions corresponding to F and G. Lloyd (1998), using kernel estimates directly for F and G, obtained a smooth ROC curve estimator given by

\(\widetilde{R}_{m,n}(t)=1-\widetilde{G}_n\left( \widetilde{F}_m^{-1}(1-t)\right) , \quad t \in [0,1], \qquad (3)\)
where

\(\widetilde{F}_m(x)=\frac{1}{m}\sum _{i=1}^{m}\mathscr {Q}\left( \frac{x-X_i}{h_F}\right) \quad \text {and} \quad \widetilde{G}_n(x)=\frac{1}{n}\sum _{j=1}^{n}\mathscr {Q}\left( \frac{x-Y_j}{h_G}\right) \)

are kernel estimators of F and G with a kernel function Q, where \(\mathscr {Q}(v)= \int _{- \infty }^{v}Q(z)dz\), and bandwidths \(h_F\) and \(h_G\). Lloyd and Zhou (1999) proved that estimator (3) has better asymptotic mean squared error (MSE) properties than the empirical ROC curve. Unfortunately, to the best of our knowledge, in the case of estimator (3), only pointwise (not uniform) convergence to R(t) has been established. Moreover, the kernel ROC curve estimator is not invariant under monotone data transformations, which may be undesirable in some practical applications. The problem of transformation-invariant nonparametric estimation of the ROC curve is considered, e.g., in Du and Tang (2009) and Tang et al. (2010). Finally, estimator (3) involves two separate bandwidth parameters, so special care is required for bandwidth selection (Zhou and Harezlak 2002; Hall and Hyndman 2003).
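A minimal sketch of a kernel-smoothed estimator in the spirit of Lloyd's, under the assumption of a Gaussian kernel Q (so that \(\mathscr {Q}\) is the standard normal CDF) and user-supplied bandwidths; the quantile \(\widetilde{F}_m^{-1}(1-t)\) is computed by bisection. The names below are ours, not from the paper:

```python
import math
import numpy as np

# Standard normal CDF, vectorized; plays the role of the integrated kernel Q.
_phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def kernel_cdf(sample, h):
    """Kernel CDF estimate: (1/m) * sum Phi((x - X_i) / h)."""
    s = np.asarray(sample, dtype=float)
    return lambda x: float(np.mean(_phi((x - s) / h)))

def lloyd_roc(x_healthy, y_diseased, t, h_F, h_G):
    """Kernel-smoothed ROC value 1 - G~(F~^{-1}(1 - t)) at t in (0, 1)."""
    F = kernel_cdf(x_healthy, h_F)
    G = kernel_cdf(y_diseased, h_G)
    pad = 10.0 * max(h_F, h_G)
    a = min(np.min(x_healthy), np.min(y_diseased)) - pad
    b = max(np.max(x_healthy), np.max(y_diseased)) + pad
    for _ in range(60):                 # bisection: solve F~(q) = 1 - t
        mid = 0.5 * (a + b)
        if F(mid) < 1.0 - t:
            a = mid
        else:
            b = mid
    return 1.0 - G(0.5 * (a + b))
```

Note that the two bandwidths \(h_F\) and \(h_G\) must be chosen separately, which is exactly the complication discussed above.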
To overcome some of the mentioned drawbacks, different methods of smoothing the empirical ROC curve were proposed, including local linear smoothing (Peng and Zhou 2004), Bayesian bootstrap (Gu et al. 2008) and bandwidth-free smoothing of the empirical CDFs (Jokiel-Rokita and Pulit 2012). In this paper, instead of estimating the ROC curve as the composition of estimators of \(F^{-1}\) and G, we use the fact that for \(Z=1-F(Y)\),

\(P(Z\le t)=P\left( Y\ge F^{-1}(1-t)\right) =1-G\left( F^{-1}(1-t)\right) =R(t), \quad t \in [0,1], \qquad (4)\)
and propose to estimate R(t) as the cumulative distribution function of Z. It is clear that, without any knowledge about F, we need to obtain a predictor of the unknown random sample \(\pmb {Z}_n=(1-F(Y_1),\ldots ,1-F(Y_n))\). The simplest way to do this is to replace the unknown distribution function F with any estimator \(\hat{F}\) of it. Based on the vector \(\pmb {\hat{Z}}_{n}=(1-\hat{F}(Y_1),\ldots ,1-\hat{F}(Y_n))\), we can directly estimate R(t), using the well-known method of kernel distribution function estimation.
In Sect. 2 we define a new kernel smoothing estimator of the ROC curve, which is invariant under nondecreasing data transformations and involves only one bandwidth parameter. We also present some asymptotic results, including a MSE comparison of the proposed estimator with the kernel-smoothed estimator of Lloyd (1998). In Sect. 3 we propose a method of bandwidth selection. Section 4 contains the results of simulation studies. Finally, in Sect. 5 we apply the proposed estimator to real data. All proofs are given in Appendices 1 and 2.
2 Main results
Let \(\pmb {X}_m=(X_1,\ldots ,X_m)\) and \(\pmb {Y}_n=(Y_1,\ldots ,Y_n)\) be independent simple samples from unknown distribution functions F and G with the same supports \(I_F=I_G\subseteq \mathbb {R}\) and with density functions f and g (with respect to Lebesgue measure), respectively. Let K be a continuous symmetric density function with support \([-1,1]\) and denote \(\mathscr {K}(x)=\int _{-\infty }^{x}K(y)dy\). Define the smooth ROC curve estimator as

\(\hat{R}_{m,n}(t)=\frac{1}{n}\sum _{i=1}^{n}\mathscr {K}\left( \frac{t-1+F_m(Y_i)}{h_n}\right) , \quad t \in [0,1], \qquad (5)\)
where \(h_n>0\) is a bandwidth parameter and \(F_m\) denotes the empirical distribution function of the sample \(\pmb {X}_m\). For the estimator \(\hat{R}_{m,n}\) to have better (than some other estimators) asymptotic MSE properties, the kernel function K should satisfy some conditions, such as \(\int _{-1}^{1}K''(x)dx<0\). Therefore, for simplicity, we assume that K is the Epanechnikov kernel \(K(x)=\frac{3}{4}(1-x^2)\mathbf {1}_{[-1,1]}(x)\).
If we consider the ROC curve as the distribution function of the random variable \(Z=1-F(Y)\) [see (4)], it is easily seen that \(\hat{R}_{m,n}(t)\) is the kernel distribution function estimator based on \(\hat{Z}_i=1-F_m(Y_i)\), which are estimators of \(Z_i=1-F(Y_i), i=1,2,\ldots ,n\).
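The estimator \(\hat{R}_{m,n}\) is straightforward to implement, since both \(F_m\) and the integrated Epanechnikov kernel \(\mathscr {K}(x)=\frac{1}{2}+\frac{3}{4}x-\frac{1}{4}x^3\) (for \(x \in [-1,1]\)) have closed forms. A minimal sketch (assuming NumPy; the names are ours):

```python
import numpy as np

def epanechnikov_icdf(x):
    """Integral of the Epanechnikov kernel: int_{-1}^{x} (3/4)(1 - u^2) du,
    clipped so that the result is 0 below -1 and 1 above 1."""
    x = np.clip(x, -1.0, 1.0)
    return 0.5 + 0.75 * x - 0.25 * x ** 3

def nks_roc(x_healthy, y_diseased, t, h):
    """Proposed estimator (5): kernel CDF of the pseudo-sample
    Z_i-hat = 1 - F_m(Y_i), evaluated at the points in t."""
    x = np.sort(np.asarray(x_healthy, dtype=float))
    y = np.asarray(y_diseased, dtype=float)
    m = x.size
    F_m = np.searchsorted(x, y, side="right") / m          # F_m(Y_i)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # R-hat(t) = (1/n) * sum_i K_int((t - 1 + F_m(Y_i)) / h)
    return np.array([np.mean(epanechnikov_icdf((s - 1.0 + F_m) / h)) for s in t])
```

Because the data enter only through the ranks of the \(Y_i\) within \(\pmb {X}_m\), applying the same increasing transformation to both samples leaves the output unchanged, in line with Remark 1 below.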
Remark 1
If we apply the same nondecreasing transformation to samples \(\pmb {X}_m\) and \(\pmb {Y}_n\), the estimator given by (5) does not change, which means that \(\hat{R}_{m,n}(t)\) is transformation invariant. Therefore, without loss of generality, we can assume that \(I_F=I_G =\mathbb {R}\).
Theorem 1
Assume that \(R''(s)\) exists for s near \(t \in (0,1)\), \(R''(s)\) is continuous at \(s=t\), and let \(h_n\rightarrow 0\). Then
Let \(k_n \in \mathbb {N}\) denote the minimal sample size for which the MSE of the empirical ROC curve is no greater than the MSE of estimator (3), based on a sample of size \(n\in \mathbb {N}\). Lloyd and Zhou (1999) showed that, under some assumptions on the kernel function and the bandwidths \(h_F\) and \(h_G\), the difference \(k_n-n\) diverges to infinity and \(k_n-n\sim n \sqrt{h_F h_G}\). The proposed estimator \(\hat{R}_{m,n}\) has an analogous advantage, not only over the empirical ROC curve, but also over estimator (3) proposed by Lloyd (1998). Assume that
and denote
where \(\tilde{R}_{n}^{\star }\) is the kernel ROC curve estimator given by (3), with the asymptotically optimal (in the sense of minimizing the MSE) bandwidths \(h_{F}^{\star }\) and \(h_{G}^{\star }\), which are \(O(m^{-1/3})\) and \(O(n^{-1/3})\), respectively (Lloyd and Zhou 1999; Hall and Hyndman 2003). The asymptotic bias and variance of \(\widetilde{R}_{m,n}\) are given by
and were derived by Lloyd (1998). Therefore, using the condition (M), we get
Theorem 2
Suppose that the assumptions of Theorem 1 hold and condition (M) is satisfied. Then, if \(nh_n^2\rightarrow \infty \) and \(nh_n^4\rightarrow 0\), we have
Under the additional assumptions \(nh_n^3\rightarrow 0\) and \(\lambda -\lambda _n=O\left( n^{-1/3} \right) \), we get
and the above limit is strictly positive if \(R'(t)>0\). Finally, if we assume that \(nh_n^2\rightarrow \delta \in (0,\infty )\) and \(\delta >\frac{5}{4\lambda }t(1-t)\), then
and the above limit is strictly positive if \(R'(t)>0\).
Remark 2
The MSE of the ROC curve estimator proposed by Peng and Zhou (2004), with the asymptotically optimal choice of bandwidth, has the same form as the MSE of the estimator proposed by Lloyd (1998), given by (11) (see Peng and Zhou (2004), Sect. 3). Therefore, Theorem 2 remains true if, instead of the estimator \(\tilde{R}_{n}^{\star }\) appearing in definition (8) of \(b_n(t)\), we insert Peng and Zhou’s estimator.
3 Bandwidth selection
In this section we deal with the issue of choosing the parameter \(h_n\) appearing in (5). To the best of our knowledge, only two methods have been investigated for bandwidth selection when estimating a distribution function: plug-in and cross-validation. The plug-in bandwidth choice was studied e.g. by Altman and Leger (1995) and Polansky and Baker (2000). The least-squares cross-validation method was analyzed in Sarda (1993) and in Bowman et al. (1998). It seems that the idea presented in the last paper may be adapted to our problem. Bowman et al. proposed a method which minimizes the function
where \(I(x-x_i)=1\) if \(x-x_i\ge 0\) and 0 otherwise, and \(\widetilde{F}_{n,-i}(x,h)\) denotes the kernel distribution function estimator constructed from the data with the observation \(x_i\) omitted. Analogously, one can choose the bandwidth parameter \(h_n\) by minimizing the function
where
This method of bandwidth selection works decently and usually leads to an estimator \(\hat{R}_{m,n}\) with a smaller MSE than that of the empirical ROC curve. However, when the sample sizes are small, it is not very stable and often gives too small or too large parameters \(h_n\), which results in under- or oversmoothed estimated ROC curves, respectively. Moreover, the numerical minimization of the function CV, repeated many times, is time consuming.
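The leave-one-out criterion above can be sketched as follows for the pseudo-sample \(\hat{z}_i=1-F_m(Y_i)\), with the integral approximated on a uniform grid; the grid size and the candidate bandwidths below are illustrative choices of ours, not from the paper:

```python
import numpy as np

def k_int(x):
    """Integral of the Epanechnikov kernel, clipped to [0, 1]."""
    x = np.clip(x, -1.0, 1.0)
    return 0.5 + 0.75 * x - 0.25 * x ** 3

def cv_score(z_hat, h, grid):
    """Leave-one-out criterion: average integrated squared distance between
    the indicator I(s - z_i >= 0) and the kernel CDF fit without z_i."""
    z = np.asarray(z_hat, dtype=float)
    dg = grid[1] - grid[0]                       # uniform grid spacing
    total = 0.0
    for i in range(z.size):
        z_rest = np.delete(z, i)
        fit = k_int((grid[:, None] - z_rest[None, :]) / h).mean(axis=1)
        indic = (grid >= z[i]).astype(float)
        total += np.sum((indic - fit) ** 2) * dg
    return total / z.size

def select_h(z_hat, h_candidates, n_grid=201):
    """Pick the candidate bandwidth minimizing the CV criterion."""
    grid = np.linspace(0.0, 1.0, n_grid)
    scores = [cv_score(z_hat, h, grid) for h in h_candidates]
    return h_candidates[int(np.argmin(scores))]
```

Repeating this minimization over a fine bandwidth grid for every sample is what makes the cross-validation approach costly in practice.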
For that reason we propose another method of choosing the parameter \(h_n\). From Theorem 2 it follows that for fixed \(t \in (0,1)\) and for \(nh_n^2 \rightarrow \delta \), where \(\frac{5}{4\lambda }t(1-t)<\delta <\infty \), we have
and it is easy to check that the function \(\varPsi (\delta )\) is maximized for
Therefore, for fixed \(t \in (0,1)\), to maximize the asymptotic relative efficiency of \(\hat{R}_{n}(t)\) with respect to \(\tilde{R}_{n}^{\star }(t)\), the bandwidth parameter \(h_n\) should be selected in such a way that \(nh_n^2\rightarrow \delta ^{\star }\). Hence, we propose to choose the bandwidth parameter which depends on t and is of the form
where \(c_n\) is some sequence converging to 1. Note that our proposed method of bandwidth selection gives a smoothing parameter \(h_n^{\star }(t)\) which, in contrast to the optimal bandwidths obtained by other methods for some other kernel ROC curve estimators (e.g. Lloyd and Zhou 1999; Hall and Hyndman 2003; Peng and Zhou 2004), does not depend on the unknown distribution functions F and G. Therefore the parameter \(h_n^{\star }(t)\) is easy to compute. Moreover, \(h_n^{\star }(t)\) becomes small near the ends of the interval [0, 1], which results in a reduction of the bias of the proposed estimator, especially for t close to 0.
4 Simulation study
A small simulation study was performed to investigate the efficiency of the proposed estimator of the ROC curve for limited sample sizes. We considered four different combinations of the distribution functions. In the first two studies, both F and G belong to the same family of distributions, normal or logistic. The parameters are selected so that the resulting ROC curves have similar shapes (see Fig. 1). In the other two studies, F and G are different: if one is normal, the other is logistic. In this case the corresponding ROC curves are also completely different. In the simulations we used 1000 samples of equal sizes \(m=n=20,50\). For each of the considered ROC curves and sample sizes, we computed the empirical ROC curve (EM), the smoothed empirical ROC curve (SEM) of Jokiel-Rokita and Pulit (2012), based on smoothed empirical CDFs, the Bayesian bootstrap estimator (BB) proposed by Gu et al. (2008), the local linear smoothing estimator (LLS) of Peng and Zhou (2004), Lloyd's kernel-smoothed estimator (KS) and the new kernel smoothing estimator (NKS) proposed in this paper.
Although the choice of the sequence \(c_n\) appearing in (15) does not affect the asymptotic behavior of the estimator \(\hat{R}_{m,n}\), for limited sample sizes the best results are achieved when \(c_n\approx 1.5\)–\(2.5\), depending on the estimated ROC curve and the value of n. In the simulation study, when choosing \(h_n^{\star }(t)\) for our estimator, for simplicity we decided to take \(c_n=1+1.8n^{-1/5}\) in all the considered cases. For bandwidth selection for the kernel estimator (KS), we used the normal-reference method proposed by Hall and Hyndman (2003), which is recommended when the sampled distributions are not far from normal. The authors found that, in the context of ROC curve estimation, their method gives a substantial improvement in the mean integrated squared error over other known methods of bandwidth selection. Finally, in the case of the local linear smoothing estimator (LLS), we chose the smoothing parameter which minimizes the mean trimmed integrated squared error, assuming knowledge of the distribution functions F and G [see Peng and Zhou (2004), Sect. 3].
Figures 2 and 3 display the results of the simulations for the sample sizes \(m=n=20\) and \(m=n=50\), respectively. Each figure contains four plots corresponding to the four different ROC curves to be estimated (see Fig. 1), and every single plot compares the considered ROC curve estimators in terms of their mean squared error (MSE) on the unit interval. The obtained results indicate that the proposed estimator (NKS) is competitive with the other estimators, even for limited sample sizes. In the problem of estimating the ROC curve it performs better than the empirical ROC curve (EM), the smoothed empirical ROC curve (SEM) and the Bayesian bootstrap estimator (BB). In some of the cases it is also better than the two other estimators, (KS) and (LLS).
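A schematic version of such a Monte Carlo comparison, reduced to a single point t and to the EM and NKS estimators with a fixed bandwidth, can be sketched as follows (the binormal setting F = N(0,1), G = N(1,1), the bandwidth h and the replication count are illustrative choices of ours, not the paper's settings; the true curve is then \(R(t)=\varPhi (\varPhi ^{-1}(t)+1)\)):

```python
import numpy as np
from statistics import NormalDist

def k_int(x):
    """Integral of the Epanechnikov kernel, clipped to [0, 1]."""
    x = np.clip(x, -1.0, 1.0)
    return 0.5 + 0.75 * x - 0.25 * x ** 3

def mc_mse(t=0.3, m=50, n=50, reps=300, h=0.15, seed=0):
    """Monte Carlo MSE at one point t for the empirical estimator (EM)
    and the proposed one (NKS), with F = N(0,1) and G = N(1,1)."""
    nd = NormalDist()
    true_R = nd.cdf(nd.inv_cdf(t) + 1.0)          # binormal ROC value
    rng = np.random.default_rng(seed)
    se_em, se_nks = 0.0, 0.0
    for _ in range(reps):
        x = np.sort(rng.normal(0.0, 1.0, m))
        y = rng.normal(1.0, 1.0, n)
        F_m = np.searchsorted(x, y, side="right") / m
        # EM: 1 - G_n(F_m^{-1}(1 - t)), quantile as an order statistic
        q = x[min(m - 1, max(0, int(np.ceil((1.0 - t) * m)) - 1))]
        em = 1.0 - np.mean(y <= q)
        # NKS: kernel CDF of the pseudo-sample 1 - F_m(Y_i)
        nks = np.mean(k_int((t - 1.0 + F_m) / h))
        se_em += (em - true_R) ** 2
        se_nks += (nks - true_R) ** 2
    return se_em / reps, se_nks / reps
```

Replacing the single t by a grid over (0, 1) yields the pointwise MSE curves of the kind displayed in Figs. 2 and 3.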
Supplementary materials to the paper, containing some box-plots comparing the accuracy of the estimators in terms of MSE when estimating the AUC, are available at https://drive.google.com/file/d/0B3L4pdDwuWxvT0RRbmUzWGtaa28/view?pli=1.
5 Real data analysis
To illustrate our method, we apply it to a set of real data which comes from a clinical study performed from November 2008 to August 2011 by a research team led by Dr. Krzysztof Tupikowski from the Department of Urology and Oncological Urology, Wroclaw Medical University.
The study investigated the efficacy of a combined treatment of interferon alpha and metronomic cyclophosphamide in patients with metastatic kidney cancer, not eligible for tyrosine kinase inhibitor treatment, with various negative prognostic factors for survival. The study was approved by an independent local bioethics committee. One of its secondary goals was to assess whether there are any predictive factors for response to this novel combination treatment.
Table 1 contains the presence (1) or absence (0) of a clinical response (CR) observed at the 24th week of treatment, the hemoglobin level (HL) and the serum fibrinogen concentration (FC) of 31 patients treated per protocol. Missing data are denoted by x. Low HL has been previously associated with short survival and poor response to treatment in disseminated disease (Tonini et al. 2011). High FC is examined as a negative predictor of response to treatment in metastatic kidney cancer patients for the first time.
The estimators of the ROC curves for HL (left) and FC (right) as the predictive factors (positive and negative, respectively) are plotted in Fig. 4.
References
Altman N, Leger C (1995) Bandwidth selection for kernel distribution function estimation. J Stat Plan Inference 46:195–214
Bowman A, Hall P, Prvan T (1998) Bandwidth selection for the smoothing of distribution functions. Biometrika 85:799–808
Du P, Tang L (2009) Transformation-invariant and nonparametric monotone smooth estimation of ROC curves. Stat Med 28:349–359
Gu J, Ghosal S, Roy A (2008) Bayesian bootstrap estimation of ROC curve. Stat Med 27:5407–5420
Hall PG, Hyndman RJ (2003) Improved methods for bandwidth selection when estimating ROC curves. Stat Probab Lett 64:181–189
Hsieh F, Turnbull BW (1996) Nonparametric and semiparametric estimation of the receiver operating characteristic curve. Ann Stat 24:25–40
Jokiel-Rokita A, Pulit M (2012) Nonparametric estimation of the ROC curve based on smoothed empirical distribution functions. Stat Comput 23:703–712
Krzanowski WJ, Hand DJ (2009) ROC curves for continuous data. Chapman and Hall/CRC, London
Lloyd CJ (1998) Using smoothed receiver operating characteristic curves to summarize and compare diagnostic systems. J Am Stat Assoc 93:1356–1364
Lloyd CJ, Zhou Y (1999) Kernel estimators of the ROC curve are better than empirical. Stat Probab Lett 44:221–228
Peng L, Zhou XH (2004) Local linear smoothing of receiver operating characteristic (ROC) curves. J Stat Plan Inference 118:129–143
Pepe MS (2003) The statistical evaluation of medical tests for classification and prediction. Oxford University Press, Oxford
Polansky AM, Baker ER (2000) Multistage plug-in bandwidth selection for kernel distribution function estimates. J Stat Comput Simul 65:63–80
Sarda P (1993) Smoothing parameter selection for smooth distribution functions. J Stat Plan Inference 35:65–75
Swets JA (1988) Measuring the accuracy of diagnostic systems. Science 240:1285–1293
Tang L, Du P, Wu C (2010) Compare diagnostic tests using transformation-invariant smoothed ROC curves. J Stat Plan Inference 140:3540–3551
Tonini G, Fratto ME, Imperatori M, Pantano F, Vincenzi B, Santini D (2011) Predictive factors of response to treatment in patients with metastatic renal cell carcinoma: new evidence. Expert Rev Anticancer Ther 11(6):921–930
Zhou XH, Harezlak J (2002) Comparison of bandwidth selection methods for kernel smoothing of ROC curves. Stat Med 21:2045–2055
Znidaric M (2009) Asymptotic expansion for inverse moments of binomial and Poisson distributions. Open Stat Probab J 1:7–10
Zou KH, Hall WJ, Shapiro DE (1997) Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Stat Med 16:2143–2156
Appendices
Appendix 1: Proofs of the theorems
Proof (of Theorem 1)
Let us fix any \(t \in (0,1)\) for which \(R''(s)\) exists for s near t and \(R''(s)\) is continuous at \(s=t\). Denote
\(i=1,2,\ldots ,n\). The bias of \(\hat{R}_{m,n}(t)\) is given by
Applying Taylor expansion to function \(\mathscr {K}\left( T_{1,m}\right) \) at \(T_1\), we obtain
The first term in the sum on the right side of equality (17) is equal to
Expanding \(R(t-xh_n)\) in a Taylor series at t to order 2 and substituting the obtained expansion into (18), we get
Note that for all n greater than some \(n_0 \in \mathbb {N}\) we have \(\frac{t-1}{h_n}<-1\) and \(\frac{t}{h_n}>1\). Therefore
The expectation \(I_1\) appearing in (17) is equal to zero, because \(F_m\) is an unbiased estimator of F, namely
Applying Lemma 3 to \(I_2\) and \(I_3\), we obtain
From (17), (19), (20) and (21), we get
The variance of \(\tilde{R}_{m,n}\) is equal to
Let us first take care of the variance of \(\mathscr {K}(T_{1,m})\), denoted by \(J_1\).
Applying Taylor expansion to function \(\mathscr {K}^2\left( T_{1,m}\right) \) at \(T_1\), we obtain
and in consequence, using (23),
With respect to \(J_{1,0}\), we have
Expanding \(R(t-xh_n)\) in a Taylor series at t to order 2, we get
Let us now return to equation (24). Using the fact that \(F_m\) is an unbiased estimator of F, we get \(J_{1,1}=0\). Therefore, applying Lemma 3 to \(J_{1,k}, k=2,3,\ldots ,6\), we obtain
which in combination with (24) and (25), gives
With respect to the covariance \(\mathrm {Cov}\left[ \mathscr {K}\left( T_{1,m}\right) , \mathscr {K}\left( T_{2,m} \right) \right] \) denoted in (22) by \(J_2\), using the fact that \(T_{1,m}\) and \(T_{2,m}\) have the same distributions, we can write
Using now the fact that \(T_1\) and \(T_2\) are i.i.d., we get
Applying the two-variable Taylor formula to \(\mathscr {K}\left( T_{1,m}\right) \mathscr {K}\left( T_{2,m}\right) \) and expanding the function at \((T_1,T_2)\), we obtain
where \(\mathscr {I}_k=\{ i \in \mathbb {N}: 0\vee (k-3) \le i \le k\wedge 3 \}\). Let us now take care of the last term of the sum in (27). It is easy to check that
From (17), (20) and (21), we know that \(\mathrm {E}_{F,G}\left[ \mathscr {K}\left( T_{1,m}\right) \right] - \mathrm {E}_{G}\left[ \mathscr {K}\left( T_1\right) \right] \) is equal to
and as a result
Again using the fact that \(T_1\) and \(T_2\), and \(T_{1,m}\) and \(T_{2,m}\) have the same distributions, and \(T_1\) and \(T_2\) are independent, equality (28) may be written in the following form
Therefore, using (27), we get
where \(\mathscr {J}_k=\{ j \in \mathbb {N}: 1\vee (k-3) \le j \le (k-1)\wedge 3 \}\). The first term in the sum on the right side of equality (29) is equal to
where
Applying Lemma 4 to \(J_{2,2}\) expressed as above, we obtain
where
Simplifying, we get
The term \(J_{2,4}\) is the sum of three other terms, of which the first and the third have the same value; their sum is equal to
and the second one is equal to
One can check that
where
Therefore
Applying Lemma 4 to each of the expectations appearing in (31), we get
where
After simplification, we have
The last term in the sum on the right side of equality (29) is equal to
Again, one can check that
where
and
Applying now Lemma 4 to the expectation in (33), we obtain
where
Simplifying, we get
The remaining terms \(J_{2,3}\) and \(J_{2,5}\) in the sum on the right side of equality (29) can be estimated using Lemma 3:
Combining now (29), (30), (32), (34) and (35), we obtain
Finally, substituting (26) and (36) into (22), after simplification, we have
which completes the proof. \(\square \)
Proof (of Theorem 2)
From (11) and Theorem 1, under the assumption \(nh_n^2\rightarrow \infty \), we can write
and
where
From the definition (8) of \(b_n=b_n(t)\), we have
Substituting (37) and (38) into (40), we get
and after some simple transformations, we obtain
From (37), (38) and (41), after analogical transformations, we get
Combining (42) and (43), and using the assumptions \(h_n\rightarrow 0, nh_n^2\rightarrow \infty , nh_n^4\rightarrow 0, \lambda _n\rightarrow \lambda \) and the fact that \(b_n\rightarrow \infty \), we obtain
Suppose now that \(nh_n^3\rightarrow 0\) and \(\lambda -\lambda _n=O\left( n^{-1/3} \right) \). Subtracting 1 from both sides of inequalities (42) and (43), and multiplying them by \(nh_n^2\), we get
and
respectively. Using the assumptions \(h_n\rightarrow 0, nh_n^2\rightarrow \infty , nh_n^3\rightarrow 0\) and equality (44), form inequalities (45) and (46), we obtain
Suppose now that \(nh_n^2\rightarrow \delta \in (0,\infty )\). Then, from Theorem 1, we have
where A(t) and B(t) are given by (39), and
Substituting (37) and (47) into (40), after some transformations, we get
Analogously, substituting (37) and (47) into (41), we obtain
Combining (49) and (50), and using the condition \(nh_n^2\rightarrow \delta \in (0,\infty )\), we get
One can easily see that for \(\delta >\frac{5}{4\lambda }t(1-t)\), the above limit is greater than 1, which completes the proof. \(\square \)
Appendix 2: Some useful lemmas
Lemma 1
(Znidaric 2009, Corollary 1) Let \(\mu _k(m,p)\) denote the k-th central moment of the binomial distribution \(\mathscr {B}(m,p)\), i.e. \(\mu _k(m,p)=\mathrm {E}\left( X-mp \right) ^k\), where \(X\sim \mathscr {B}(m,p)\). Then
-
(1)
\(\mu _k(m,p)\) is a polynomial of degree \(\lfloor \frac{k}{2} \rfloor \) in m,
-
(2)
\(\mu _k(m,p)\) is a polynomial of degree k in p.
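Lemma 1 can be checked numerically by computing central moments of the binomial distribution exactly (a small sketch, assuming NumPy; the function name is ours):

```python
import numpy as np
from math import comb

def binom_central_moment(k, m, p):
    """Exact k-th central moment of B(m, p) by direct summation
    of (x - mp)^k against the binomial pmf."""
    xs = np.arange(m + 1)
    pmf = np.array([comb(m, int(x)) * p ** x * (1 - p) ** (m - x) for x in xs])
    return float(np.sum(pmf * (xs - m * p) ** k))
```

For instance, \(\mu _2(m,p)=mp(1-p)\) and \(\mu _3(m,p)=mp(1-p)(1-2p)\) are both of degree 1 in m, consistent with the bound \(\lfloor \frac{k}{2} \rfloor \) in the first assertion.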
Lemma 2
Let \(\mu _{k,l}(m,\pmb {p})\) denote the mixed central moment of order \(k+l\) of the multinomial distribution \(\mathscr {M}_3(m,\pmb {p})\), i.e. \(\mu _{k,l}(m,\pmb {p})=\mathrm {E}\left( X_1-mp_1 \right) ^k \left( X_2-mp_2 \right) ^l\), where \(\pmb {X}=\left( X_1,X_2,X_3\right) \sim \mathscr {M}_3(m,\pmb {p}), \pmb {p}=\left( p_1,p_2,p_3 \right) , p_1+p_2+p_3=1\). Then
-
(1)
\(\mu _{k,l}(m,\pmb {p})\) is a polynomial of degree at most \(\lfloor \frac{k+l}{2} \rfloor \) in m.
-
(2)
\(\mu _{k,l}(m,\pmb {p})\) is a polynomial of degree at most \(k+l\) in \(p_1\) and \(p_2\).
Proof
Let \(k,l \in \mathbb {N}\) and let \(\mathscr {X}=\left\{ (x_1,x_2,x_3)\in \mathbb {N}^3: \sum _{i=1}^3x_i=m \right\} \).
Differentiating \(\mu _{k,l}\) with respect to \(p_1\), we get
Analogously, differentiating \(\mu _{k,l}\) with respect to \(p_2\), we obtain
Note that the last of the three terms appearing in (51) and (52) are equal to \(-mk\mu _{k-1,l}\) and \(-ml\mu _{k,l-1}\), respectively. Hence
Using the fact that \(x_3=m-x_1-x_2\), we get
which in combination with (53), gives
The recursive formula (54) and the initial conditions
allow us to prove the lemma by mathematical induction on \(s=k+l\). We give the proof only of the first assertion of the lemma; the proof of the second is essentially the same. Equalities (55) indicate that the thesis is true for \(s=1\) and \(s=2\). Suppose that the thesis holds for some natural \(s_0\ge 2\) and for \(s_0+1\), i.e., for any \(k,l \in \mathbb {N}\) such that \(k+l=s_0\) or \(k+l=s_0+1\), \(\mu _{k,l}\) is a polynomial (in the variable m) of degree at most \(\lfloor \frac{k+l}{2} \rfloor \). We need to show that all \(\mu _{k,l}\), where \((k,l)\in \{ (s_0+2,0),(s_0+1,1), (s_0,2), \ldots , (1,s_0+1), (0,s_0+2) \}\), are polynomials of degree at most \(\lfloor \frac{s_0+2}{2} \rfloor \).
It follows from Lemma 1 that \(\mu _{s_0+2,0}=\mathrm {E}\left( X_1-mp_1 \right) ^{s_0+2}\) is a polynomial of degree \(\lfloor \frac{s_0+2}{2} \rfloor \). Assume that \(\mu _{s_0+2,0}, \mu _{s_0+1,1}, \mu _{s_0,2}, \ldots , \mu _{s_0+2-s_1,s_1}\), where \(0\le s_1<s_0+2\), satisfy the thesis of the lemma. We show that then \(\mu _{s_0+1-s_1,s_1+1}\) also satisfies the thesis. Indeed, from (54) (taking \(k=s_0+1-s_1, l=s_1\)), we have
By the induction hypothesis, \(\mu _{s_0+1-s_1,s_1}\) is a polynomial of degree at most \(\lfloor \frac{s_0+1}{2} \rfloor \), and \(\mu _{s_0-s_1,s_1}\) and \(\mu _{s_0+1-s_1,s_1-1}\) are polynomials of degree at most \(\lfloor \frac{s_0}{2} \rfloor \). Moreover, as we assumed, \(\mu _{s_0+2-s_1,s_1}\) is a polynomial of degree at most \(\lfloor \frac{s_0+2}{2} \rfloor \). Thus, using (56), we conclude that \(\mu _{s_0+1-s_1,s_1+1}\) is also a polynomial of degree at most \(\lfloor \frac{s_0+2}{2} \rfloor \). Hence, one can easily deduce that the degrees of all \(\mu _{s_0+2-i,i}, i=0,1,\ldots ,s_0+2\), are at most \(\lfloor \frac{s_0+2}{2} \rfloor \), which completes the proof. \(\square \)
Lemma 3
Let \(\pmb {X}_m=(X_1,\ldots ,X_m)\) be a simple sample from a continuous distribution function F and let \(F_m\) denote its empirical distribution function. Let \(Y_1\) and \(Y_2\) be independent random variables from a continuous distribution function G and let \(R(s)=1-G(F^{-1}(1-s))\). Assume that \(R''(s)\) exists for s near \(t \in (0,1), R''(s)\) is continuous at \(s=t\), and let \(\phi _1\) and \(\phi _2\) be continuous functions supported on \([-1,1]\). Then for any \(h_n>0\) such that \(h_n\rightarrow 0\) and for any \(k,n \in \mathbb {N}\), we have:
-
(1)
\(\mathrm {E}_{F,G}\left[ \phi _1 \left( T_1 \right) \left( T_{1,m}-T_1 \right) ^k \right] = \left\{ \begin{array}{ll} O\left( \frac{m^{\lfloor \frac{k}{2} \rfloor }}{m^k h_n^{k-2}} \right) , \text { if } \phi _1 \text { is odd function},\\ O\left( \frac{m^{\lfloor \frac{k}{2} \rfloor }}{m^k h_n^{k-1}} \right) , \text { otherwise}, \end{array} \right. \)
-
(2)
\(\mathrm {E}_{F,G}\left[ \phi _1 \left( T_1 \right) \phi _2 \left( T_2 \right) \left( T_{1,m}-T_1 \right) ^k\left( T_{2,m}-T_2 \right) ^l \right] =O\left( \frac{m^{\lfloor \frac{k+l}{2} \rfloor }}{m^{k+l} h_n^{k+l-2}} \right) ,\)
where \(T_{i}=\frac{t-1+F(Y_i)}{h_n}, T_{i,m}=\frac{t-1+F_m(Y_i)}{h_n}, i=1,2\).
Proof
From the fact that the random variable \(mF_m(y) = mF_m(y,\pmb {X}_m)\) has a binomial distribution \(\mathscr {B}(m,F(y))\) for any \(y \in \mathbb {R}\), using Lemma 1, we can write
where \(\omega _i(x), i=1,2,\ldots , \lfloor \frac{k}{2} \rfloor \), are polynomials of degree at most k. Hence
where
and \(\tilde{\omega }_i(t,xh_n)=\omega _i(1-t+xh_n)=\sum _{j=0}^k a_{i,j}(t)(xh_n)^j\) for some \(a_{i,j}(t) \in \mathbb {R}\). Expanding \(r(t-xh_n)\) in a Taylor series at t, we get
The above integral is finite for every \(i=1,2,\ldots , \lfloor \frac{k}{2} \rfloor \) and, if \(\phi _1\) is odd and in consequence \(\int _{-1}^{1} \phi _1(x) dx = 0\), it is of order \(O(h_n)\). This, in combination with (57), proves statement \((1) \) of the lemma. To prove statement \((2) \) let us denote \(DF(x,y)=F(y)-F(x)\) and \(DF_m(x,y)=F_m(y)-F_m(x), x,y \in \mathbb {R}\) and assume that \(x<y\). Then
The random vector \(\left( mF_m(x),mDF_m(x,y),m-mF_m(x)-mDF_m(x,y)\right) \) has a multinomial distribution \(\mathscr {M}_3(m,\pmb {p})\), where \(\pmb {p}=\left( F(x),DF(x,y),1-F(x)\right. \left. -DF(x,y) \right) \). Therefore, from Lemma 2 we conclude that all expectations in the sum on the right side of the above equation, are polynomials of degree \(\lfloor \frac{k+l}{2} \rfloor \) in m. Hence
A similar argument leads to the same conclusion when \(y<x\), so equality (59) is true for any \(x,y \in \mathbb {R}\). Therefore
where \(C>0\) is some constant. With respect to the last expectation, we have
where r(t) is given by (58) and \(\phi _j^{\star }=\sup _{x \in [-1,1]}| \phi _j ( x )| < \infty , j=1,2\). This, in combination with (60), completes the proof of statement \((2) \) of the lemma. \(\square \)
Lemma 4
Let \(Y_1\) and \(Y_2\) be independent random variables from a continuous distribution function G and let F be a continuous distribution function. Let \(\phi _1\) and \(\phi _2\) be continuous functions supported on \([-1,1]\) and let \(R(s)=1-G(F^{-1}(1-s))\). Assume that \(R''(s)\) exists for s near \(t \in (0,1)\) and \(R''(s)\) is continuous at \(s=t\). Denote \(T_{i}=\frac{t-1+F(Y_i)}{h_n}, i=1,2\) and \(H(x,y,z)=\sum _{k=1}^{k_0}c_k x^{\alpha _k} y^{\beta _k} z^{\gamma _k}, k_0 \in \mathbb {N}, c_k \in \mathbb {R}, \alpha _k, \beta _k, \gamma _k \in \mathbb {N}, k=1,2,\ldots ,k_0\). Then for any \(h_n>0\) such that \(h_n\rightarrow 0\) and for any \(k,n \in \mathbb {N}\), we have:
where \(r(t)=R'(t)=\frac{g(F^{-1}(1-t))}{f(F^{-1}(1-t))}, \eta _{0,i}=\int _{-1}^1\phi _i(x)dx, \eta _{1,i}=\int _{-1}^1x\phi _i(x)dx, i=1,2, \eta _2=\int _{-1}^1\phi _2(u)\int _{-1}^u x\phi _1(x) dxdu, \eta _3=\int _{-1}^1u\phi _2(u)\int _{u}^1 \phi _1(x) dxdu\) and \(\sigma _k=\alpha _k+\beta _k+\gamma _k\).
Proof
where \(A_1=\mathbb {R}\times (-\infty ,y_2]\) and \(A_2=\mathbb {R}\times (y_2,\infty )\). Changing variables in the integral \(I_1\) and using the definition of the function H, we get
where \(\widetilde{A_1}=\left( \frac{t-1}{h_n}, \frac{t}{h_n} \right) \times \left( \frac{t-1}{h_n}, x_2 \right] \) and r(t) is given by (58). Applying Newton’s binomial formula to \((1-t+x_1h_n)^{\alpha _k+\beta _k}\) and \((1-t+x_2h_n)^{\gamma _k}\), we obtain
where
Analogously, one can check that in the case of the integral \(I_2\), we have
where
Combining (61), (62) and (63), after simplification, we have
Note that
Expanding \(r(t-x_ih_n), i=1,2\), in a Taylor series at t in the above integrals, we obtain
and
where \(\eta _{0,i}, \eta _{1,i}, i=1,2\), are defined in the lemma. In consequence, substituting the obtained expansions into (65), we get
Now we need to check that
Indeed
Substituting (66) and (67) into (64), we obtain
which completes the proof. \(\square \)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Cite this article
Pulit, M. A new method of kernel-smoothing estimation of the ROC curve. Metrika 79, 603–634 (2016). https://doi.org/10.1007/s00184-015-0569-1