Abstract
The receiver operating characteristic (ROC) curve describes the performance of a diagnostic test, which classifies individuals into one of two categories. Many parametric, semiparametric and nonparametric estimation methods have been proposed for estimating the ROC curve and its functionals. In this paper the minimum distance estimation of the binormal ROC curve is considered. A modification of the estimator considered in the paper of Davidov and Nov (J Stat Plan Inference 142(4):872–877, 2012) and some new estimators are proposed. We compare the accuracy of the new estimators with known minimum distance estimators of the binormal ROC curve and we conclude that our estimators generally perform better than their competitors.
1 Introduction
The receiver operating characteristic (ROC) curve is commonly used to describe the accuracy of a medical or other diagnostic test, which classifies individuals into “non-diseased” and “diseased” categories. It is defined as a plot of the true positive rate against the false positive rate, or sensitivity versus 1 − specificity, for various threshold values. Over the years, it has been widely applied in many fields including biosciences, data mining, experimental psychology, finance, geosciences, machine learning, medicine, radiology, sociology and others. For a comprehensive review of the literature, see Zhou et al. (2002), Pepe (2003), Krzanowski and Hand (2009) and Gonçalves et al. (2014).
More precisely, let X and Y be the test results from a non-diseased population and a diseased population, respectively. Let F be the continuous cumulative distribution function (cdf) of the random variable X, and G the continuous cdf of the random variable Y. The ROC curve is defined as a plot of \(1-G(c)\) versus \(1-F(c)\) for \(-\infty \le c\le \infty ,\) or equivalently as a plot of
$$\begin{aligned} ROC(t)=1-G(F^{-1}(1-t)) \end{aligned}\tag{1}$$
against t, for \(t\in [0,1].\)
A special feature of the ROC curve is that it is invariant to any increasing transformation of the data, i.e. if \(X'=h(X)\) and \(Y'=h(Y)\) for some increasing transformation h, then the ROC curve corresponding to the distribution functions F and G is the same as the ROC curve corresponding to the distribution functions \(F'\) and \(G'\) of the random variables \(X'\) and \(Y',\) respectively.
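The invariance property can be checked numerically. The following is a minimal sketch, assuming the empirical version of the curve, \(1-G_n(F_m^{-1}(1-t))\) with the left-continuous empirical quantile function; the function name `empirical_roc`, the sample sizes, and the choice \(h(v)=e^{v}\) are illustrative assumptions and not part of the paper.

```python
import numpy as np

def empirical_roc(x, y, t):
    """Empirical ROC curve 1 - G_n(F_m^{-1}(1 - t)) evaluated at points t."""
    x, y = np.sort(x), np.sort(y)
    m, n = len(x), len(y)
    p = 1.0 - np.atleast_1d(t)
    # F_m^{-1}(p) = inf{c : F_m(c) >= p} is the ceil(m p)-th order statistic;
    # the small offset guards against floating-point noise in m * p.
    k = np.ceil(m * p - 1e-12).astype(int)
    thresholds = np.where(k >= 1, x[np.clip(k - 1, 0, m - 1)], -np.inf)
    # 1 - G_n(c) is the fraction of diseased results strictly above c.
    return 1.0 - np.searchsorted(y, thresholds, side="right") / n

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=30)   # hypothetical non-diseased sample
y = rng.normal(1.0, 1.5, size=40)   # hypothetical diseased sample
t = np.linspace(0.0, 1.0, 201)

roc_raw = empirical_roc(x, y, t)
roc_exp = empirical_roc(np.exp(x), np.exp(y), t)  # h(v) = exp(v) is increasing

# Only the ranks of the pooled observations matter, so both curves coincide.
print(np.array_equal(roc_raw, roc_exp))
```

Because the curve depends on the data only through the ordering of the pooled sample, any strictly increasing h gives the same result.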
In this paper we consider the problem of estimation of the ROC curve in the binormal model, i.e. we assume that after some increasing transformation h, the random variables \(X'\) and \(Y'\) are normally distributed. Without loss of generality we can assume that \(X'\sim \mathcal{N}(0,1)\) and \(Y'\sim \mathcal{N}(\mu ,\sigma ^{2}).\) In this case the ROC curve has a simple parametric form
$$\begin{aligned} ROC(t)=\Phi \left( \frac{\mu +\Phi ^{-1}(t)}{\sigma }\right) ,\quad t\in [0,1], \end{aligned}\tag{2}$$
where \(\Phi \) is the cumulative distribution function of the standard normal distribution. Thus, in the binormal model, the estimation of the ROC curve reduces to the estimation of the parameters \(\mu \) and \(\sigma .\) The most common arguments in favor of using the binormal estimator are presented in Hanley (1988). Swets (1986) and Hanley (1988, 1996) also argue that the binormal estimator is robust.
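For readers who want to experiment, here is a small sketch of the binormal curve implied by \(X'\sim \mathcal{N}(0,1)\) and \(Y'\sim \mathcal{N}(\mu ,\sigma ^{2})\), namely \(\Phi ((\mu +\Phi ^{-1}(t))/\sigma )\), together with the closed-form area \(\Phi (\mu /\sqrt{1+\sigma ^{2}})\) that follows from \(P(X'<Y')\). The function names and the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def binormal_roc(t, mu, sigma):
    """Binormal ROC curve Phi((mu + Phi^{-1}(t)) / sigma)."""
    return norm.cdf((mu + norm.ppf(t)) / sigma)

def binormal_auc(mu, sigma):
    """Area under the binormal ROC curve, P(X' < Y') = Phi(mu / sqrt(1 + sigma^2))."""
    return norm.cdf(mu / np.sqrt(1.0 + sigma ** 2))

mu, sigma = 1.0, 2.0                       # hypothetical parameter values
n_grid = 200_000
mid = (np.arange(n_grid) + 0.5) / n_grid   # midpoints avoid t = 0 and t = 1

auc_numeric = binormal_roc(mid, mu, sigma).mean()   # midpoint rule on [0, 1]
auc_closed = binormal_auc(mu, sigma)
print(auc_numeric, auc_closed)
```

The two numbers should agree closely, which is a quick sanity check that the parametric form and the area formula are consistent with each other.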
Many different techniques have been proposed to solve the problem of semiparametric estimation of the ROC curve. For estimating ROC curves from discrete or grouped response data, the most commonly used procedure is that proposed by Dorfman and Alf (1969). Metz et al. (1998) developed an algorithm called LABROC, which groups continuous data into a finite number of ordered categories and then uses the maximum likelihood algorithm from Dorfman and Alf (1969). Hsieh and Turnbull (1996) proposed a generalized least squares procedure for grouped data and a minimum distance estimator (MDE), which does not require grouping data. The MDE of the binormal ROC curve was also considered in the papers of Davidov and Nov (2009, 2012). Maximum likelihood and pseudo-likelihood approaches to estimating the binormal ROC curve were considered in the papers of Zou and Hall (2000), Cai and Moskowitz (2004) and Zhou and Lin (2008). Techniques based on regression were also proposed (see for example Lloyd 2002; Cai and Pepe 2002; Qin and Zhang 2003; Wan and Zhang 2007). A Bayesian approach to the semiparametric estimation of the ROC curve was considered in the papers of Branscum et al. (2008), Erkanli et al. (2006), Gu et al. (2008) and Gu and Ghosal (2009). The paper of Gonçalves et al. (2014) gives an overview of developments in the estimation of the ROC curve, with particular emphasis on the frequentist and Bayesian methods most commonly employed in the medical setting.
This paper deals with minimum distance estimation of the binormal ROC curve. To the best of our knowledge, a minimum distance approach to estimating the binormal ROC curve parameters was considered only by Hsieh and Turnbull (1996) and Davidov and Nov (2009, 2012). In the paper of Davidov and Nov (2009) the central idea was to estimate the unknown function h (a transformation of X and Y to normal random variables) in two different ways; only one of the two estimates depended on the unknown parameters \(\mu \) and \(\sigma \) of the binormal ROC curve. Then, they estimated \(\mu \) and \(\sigma \) by the values that minimized a certain norm of the difference between the estimates of the function h. In this paper we do not develop this idea. A different approach is presented in the papers of Hsieh and Turnbull (1996) and Davidov and Nov (2012). They took into consideration two different measures of distance between the empirical and the theoretical ordinal dominance curve (ODC), a curve closely related to the ROC curve. Davidov and Nov (2012) showed that their MDE is consistent and asymptotically normally distributed and that it outperforms Hsieh and Turnbull’s original, grouped-data estimator, but it has not been compared with Hsieh and Turnbull’s MDE.
In this paper we compare the accuracy of the known MDE’s given by Hsieh and Turnbull (1996) and Davidov and Nov (2012). We find that the MDE given by Hsieh and Turnbull (1996) outperforms, in some sense, the MDE given by Davidov and Nov (2012). Both estimators are obtained by minimizing distance measures between the unknown binormal ROC curve and the empirical ROC curve. The empirical ROC curve, as a step function, often gives unsatisfactory nonparametric estimates of the ROC curve in the case of small sample sizes. Therefore, the second purpose of this work is to introduce modifications of these known measures of distance by replacing the underlying empirical ROC curve with its continuous nonparametric counterparts. Another modification of the Davidov and Nov (2012) approach stems from widening the domain taken into account when the distance between the empirical and the binormal ROC curve is calculated. In this paper, a total of seven new estimators in the binormal model are introduced and their performances are compared in a simulation study.
The paper is organized as follows. In Sect. 2 we recall the MDE’s of the binormal ROC curve parameters considered in the papers of Hsieh and Turnbull (1996) and Davidov and Nov (2012). Then we propose a modification of the Davidov and Nov estimator, and some new MDE’s by replacing the empirical ROC curve by the Bayesian bootstrap estimator of the ROC curve (see Gu and Ghosal 2008) in measures of distance considered by Hsieh and Turnbull (1996) and Davidov and Nov (2012). We prove the consistency of the estimators proposed. We also recall two smooth nonparametric estimators of the ROC curve, namely the kernel estimator considered by Lloyd (1998), and the estimator proposed by Jokiel-Rokita and Pulit (2013), which we also use to obtain MDE’s of the binormal ROC curves. Results from simulation studies are provided in Sect. 3. In Sect. 4 real data analysis is discussed. The paper ends with some concluding remarks in Sect. 5.
2 Minimum distance estimation of the ROC curve
In this section, we recall some known methods and provide some new methods of estimation of the parameters \(\mu \) and \(\sigma \) in the binormal model, based on the minimum distance concept. Minimum distance estimation has been studied extensively beginning with the work of Wolfowitz (1957). The concept of minimum distance estimation of the binormal ROC curve parameters was introduced in the framework of estimation of the binormal ordinal dominance curve (ODC) given by \(D(t)=F(G^{-1}(t)),\) \(t\in [0,1].\) The ODC curve is closely related to the ROC curve and in the binormal model it has the following parametric form
$$\begin{aligned} D(t)=\Phi \left( \mu +\sigma \Phi ^{-1}(t)\right) ,\quad t\in [0,1]. \end{aligned}$$
However, in the course of this paper, we find it more convenient to construct all estimators of the unknown parameters \(\mu \) and \(\sigma \) with direct reference to the ROC curves. Therefore all results originally established for ODC curves will be rephrased in terms of ROC curves.
2.1 Minimum distance estimator of Hsieh and Turnbull
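Assume that independent samples \(X_{1},\ldots ,X_{m}\) and \(Y_{1},\ldots ,Y_{n}\) from distributions with cdf’s F and G, respectively, are available. Denote by \(F_{m}\) and \(G_{n}\) the empirical distribution functions of \(X_{1},\ldots ,X_{m}\) and \(Y_{1},\ldots ,Y_{n},\) respectively, and the empirical quantile function by \(G_{n}^{-1}(t)=\inf \{y:G_{n}(y)\ge t\}\) (the quantile function \(F_{m}^{-1}\) is defined analogously). The empirical ROC curve is defined as
$$\begin{aligned} ROC_{mn}(t)=1-G_{n}(F_{m}^{-1}(1-t)),\quad t\in [0,1], \end{aligned}\tag{3}$$
while the empirical ODC curve is given by
$$\begin{aligned} D_{mn}(t)=F_{m}(G_{n}^{-1}(t)),\quad t\in [0,1]. \end{aligned}$$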
In the paper of Hsieh and Turnbull (1996), MDE’s of the ROC curve parameters are derived by finding the ODC curve that fits most closely to the empirical ODC curve using an \(L_{2}\) norm criterion. We adopt the original idea introduced by Hsieh and Turnbull (1996). More precisely, for \(\theta =(\mu ,\sigma )^{T},\) let us denote by
$$\begin{aligned} \xi _{mn}(\theta )(t)=ROC_{mn}(t)-\Phi \left( \frac{\mu +\Phi ^{-1}(t)}{\sigma }\right) \end{aligned}\tag{4}$$
the difference between the empirical and the binormal ROC curve, and by
$$\begin{aligned} \Vert \xi _{mn}(\theta )\Vert ^{2}=\int _{0}^{1}\xi _{mn}^{2}(\theta )(t)\,dt \end{aligned}\tag{5}$$
the \(L_{2}\)-distance measure between ROC(t) and \(ROC_{mn}(t).\)
The MDE \({\widehat{\theta }}=({\widehat{\mu }},{\widehat{\sigma }})^{T}\) of the parameter \(\theta \) is defined by
$$\begin{aligned} {\widehat{\theta }}=\mathop {\mathrm{arg\,min}}_{\theta \in \Theta }\Vert \xi _{mn}(\theta )\Vert , \end{aligned}\tag{6}$$
where \(\Theta =\{\theta =(\mu ,\sigma )^{T}:\mu \in {\mathbb R},\sigma >1\}\), as in the paper of Hsieh and Turnbull (1996). The restriction that \(\sigma >1\) is not unreasonable if one thinks of the healthy response as “noise” and the diseased response as “noise plus signal”. However, we can avoid this restriction if we modify the distance criterion (5) so that the integral is over a closed interval excluding 0 and 1. In the sequel, we will denote the MDE \({\widehat{\theta }}\) by \({\widehat{\theta }}_{HT}=({\widehat{\mu }}_{HT},{\widehat{\sigma }}_{HT}).\) Using the theory developed by Millar (1984), Hsieh and Turnbull (1996) proved the asymptotic normality of their MDE of the parameter \(\theta \), but did not provide any concrete procedure to compute it. In Sect. 3, we describe an algorithm, used in the simulation study, to obtain the estimates \({\hat{\theta }}_{HT}.\)
2.2 Minimum distance estimator of Davidov and Nov
Hsieh and Turnbull (1996) also proposed (in Remark 1), as an object for future research, to modify their measure of distance by applying the \(\Phi ^{-1}\) transformation to both \(D_{mn}(t)\) and D(t) which, in terms of the ROC curve, leads to the following counterpart
of \(\xi _{mn}(\theta ).\) Davidov and Nov (2012) followed up on this suggestion and considered estimation of the parameter \(\theta \) based on minimization of the following objective function
where the integration endpoints \(0<a<b<1\) ensure that the last integral is finite. Namely, they considered the MDE
where
The minimization problem given by (9) is convex and quadratic in \(\mu \) and \(\sigma \) and, unlike (6), it enjoys a closed-form solution
where
Please note that since we employed the ROC instead of the ODC curve, the formulas (12)–(15) differ from corresponding Davidov and Nov’s (2012) formulas.
The integration endpoints a, b were introduced to ensure that \(\Phi ^{-1}(ROC_{mn}(t))\ne \pm \infty \) and hence that the optimization problem (9) is well-defined. However, when the upper integration limit is selected according to Eq. (11), the difference between the empirical ROC curve and the true (binormal) ROC curve on the interval [b, c], where \(c:=\min \{i/m:ROC_{mn}(i/m)=1, i=1,\ldots ,m\}\) (that is, on the last step of \(ROC_{mn}\)), is not taken into account. We think that this loss of information affects the accuracy of the estimates for small sample sizes m and n. Hence, we propose a modification of the minimum distance estimator considered by Davidov and Nov by choosing the upper limit of integration just before the last jump of the empirical ROC curve. Since \(ROC_{mn}(t)\) is right-continuous, we take
where \(\varepsilon _m<1/m\) is a positive constant, which guarantees that \(\Phi ^{-1}(ROC_{mn}(t))<\infty \). Moreover, thanks to the right continuity of the empirical ROC curve, there is no need to introduce any modification for the lower integration endpoint (the lowest possible value is already provided by formula (10)).
The estimates of the parameters \(\mu \) and \(\sigma \) computed with \(b_m'\) instead of b in (12)–(15) will be denoted by \(\hat{\mu }_{DNM}\) and \({\hat{\sigma }}_{DNM},\) respectively. It is clear that these modified estimators are consistent and asymptotically normal, like the original estimators of Davidov and Nov (see Davidov and Nov 2012, Theorems 1 and 2), under the same assumptions.
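As a rough numerical illustration of the idea in this subsection, the sketch below fits a straight line to \(\Phi ^{-1}(ROC_{mn}(t))\) as a function of \(\Phi ^{-1}(t)\) on a grid. It assumes that the objective is the integrated squared difference between \(\Phi ^{-1}(ROC_{mn}(t))\) and \((\mu +\Phi ^{-1}(t))/\sigma \), which is linear in \(\alpha =\mu /\sigma \) and \(\beta =1/\sigma \). It is a discretized least-squares stand-in, not the closed-form formulas (12)–(15). The function names, the grid size, and the rule of keeping every grid point with \(0<ROC_{mn}(t)<1\) (in the spirit of the modified upper endpoint) are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

def empirical_roc(x, y, t):
    """Empirical ROC curve 1 - G_n(F_m^{-1}(1 - t)) evaluated at points t."""
    x, y = np.sort(x), np.sort(y)
    m, n = len(x), len(y)
    p = 1.0 - np.atleast_1d(t)
    k = np.ceil(m * p - 1e-12).astype(int)          # index of the order statistic
    thresholds = np.where(k >= 1, x[np.clip(k - 1, 0, m - 1)], -np.inf)
    return 1.0 - np.searchsorted(y, thresholds, side="right") / n

def dn_type_fit(x, y, n_grid=10_000):
    """Least-squares fit of probit(ROC_mn(t)) = alpha + beta * probit(t)."""
    t = (np.arange(n_grid) + 0.5) / n_grid          # uniform grid, so dt-weighting
    r = empirical_roc(x, y, t)
    keep = (r > 0.0) & (r < 1.0)                    # probit must stay finite
    u, v = norm.ppf(t[keep]), norm.ppf(r[keep])
    beta, alpha = np.polyfit(u, v, 1)               # slope first, then intercept
    return alpha / beta, 1.0 / beta                 # (mu, sigma)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=2000)                 # hypothetical non-diseased sample
y = rng.normal(1.0, 1.5, size=2000)                 # hypothetical diseased sample
mu_hat, sigma_hat = dn_type_fit(x, y)
print(mu_hat, sigma_hat)
```

Because the empirical ROC curve is nondecreasing, the grid points with \(0<ROC_{mn}(t)<1\) form a single interval, so the mask plays the role of the integration endpoints.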
2.3 Minimum distance estimators of the binormal ROC curve parameters based on BB estimator of the ROC curve
In the paper of Gu and Ghosal (2008) the Bayesian bootstrap (BB) for the nonparametric estimation of the ROC curve and its functionals has been proposed (see also Gu et al. 2008). In this approach stochastic empirical distribution functions, introduced by Rubin (1981), are employed. Let \(U_1,\ldots ,U_{m-1}\) be iid uniform \(\mathcal {U}(0,1)\) random variables, independent of the data. Rubin’s stochastic empirical distribution function, say \(F_m^{(b)}\), based on the sample \(X_1,\ldots ,X_m,\) is defined as follows
where \(U_{(i)}\) denotes the i-th order statistic of the vector \((U_{1},\ldots ,U_{m-1}).\) The function \(F_m^{(b)}\) is a step function which at each point \(X_{(i)},\) \(i=1,\ldots ,m,\) jumps up by the random value \(U_{(i)}-U_{(i-1)}\), where \(U_{(0)}=0, U_{(m)}=1\). Let \(G_n^{(b)}\) be Rubin’s stochastic empirical distribution function based on the observations \(Y_1,\ldots ,Y_n\) from the second sample. In order to get a ROC curve estimator, say \(ROC_{mn}^{(b)},\) we proceed in the same way as in the case of the empirical ROC curve given by (3), and plug Rubin’s stochastic empirical distribution function \(G_n^{(b)}\) and the quantile function \(F_m^{(b){-1}}\) into (1). Next the BB estimate of the ROC curve is obtained by averaging over a large number of \(ROC_{mn}^{(b)}\) realizations, i.e.
The estimator \(ROC_{mn}^{BB}\) is a bandwidth-free nonparametric estimator and, because of averaging over two random variations, is “smoother” than \(ROC_{mn}\). The BB estimates of the ROC curve for two different values of B, based on samples of equal sizes \(n=m=15,\) together with the empirical and the true ROC curve, are presented in Fig. 1. As can be seen, even when we average over a small number of realizations, we obtain a “smoother” estimate than the empirical ROC curve.
Remark 1
An efficient three-step procedure for computing BB estimates, which does not require inverting the stochastic empirical distribution function (17), was proposed by Gu et al. (2008). In the first step auxiliary variables \(Z_j\) are defined, based on BB resampling distribution,
where \((p_1,\ldots ,p_m)\sim Dirichlet(m;1,\ldots ,1)\) independent of others. In the second step a random realization of ROC curve, \(ROC^\#_{mn}\), is generated as randomized distribution function of \(Z_1,\ldots ,Z_n\); we have
where \((q_1,\ldots ,q_n)\sim Dirichlet(n;1,\ldots ,1)\) independent of others. In the last step the BB estimate of ROC curve is obtained by averaging over the ensemble of random ROC curves \(ROC_{mn}^{BB}(t)=mean(ROC^\#_{mn}(t))\). A convenient method for generating \((p_1,\ldots ,p_m)\sim Dirichlet(m;1,\ldots ,1)\) was also proposed by Gu.
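The three steps of Remark 1 can be sketched as follows. Since the defining formulas are not reproduced here, the sketch assumes \(Z_j=\sum _{i}p_i\mathbf{1}(X_i>Y_j)\) (a randomly weighted fraction of non-diseased results exceeding \(Y_j\)) and \(ROC^\#_{mn}(t)=\sum _{j}q_j\mathbf{1}(Z_j\le t)\); both are assumptions consistent with the description above, not a transcription of Gu et al. (2008). The function name `bb_roc` and the default \(B=200\) are also illustrative.

```python
import numpy as np

def bb_roc(x, y, t, B=200, rng=None):
    """Bayesian bootstrap ROC estimate averaged over B random realizations."""
    rng = np.random.default_rng() if rng is None else rng
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    m, n = len(x), len(y)
    exceeds = (x[:, None] > y[None, :]).astype(float)   # exceeds[i, j] = 1(X_i > Y_j)
    total = np.zeros(len(t))
    for _ in range(B):
        p = rng.dirichlet(np.ones(m))   # step 1 weights for the non-diseased sample
        q = rng.dirichlet(np.ones(n))   # step 2 weights for the diseased sample
        z = p @ exceeds                 # assumed form of Z_j
        # Step 2: randomized distribution function of Z_1, ..., Z_n at each t.
        total += (z[None, :] <= t[:, None]).astype(float) @ q
    return total / B                    # step 3: average over realizations

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=15)       # hypothetical non-diseased sample
y = rng.normal(1.0, 1.0, size=15)       # hypothetical diseased sample
t = np.linspace(0.0, 1.0, 101)
roc_bb = bb_roc(x, y, t, B=200, rng=rng)
print(roc_bb[:5])
```

No quantile function is inverted anywhere, which is the computational advantage the remark points to.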
Let us assume that
and
Moreover, throughout this section we assume that the sample sizes m, n are such that \(m=m(n)\) and \(n/m \rightarrow \lambda \in (0,\infty )\) as \(n \rightarrow \infty ,\) and that the following two conditions are satisfied
- (C1) The continuous cdf F is twice differentiable on \((\alpha ,\beta )\), the derivative \(F'=f \ne 0\) on \((\alpha ,\beta ),\) and for some \(\gamma >0,\)
$$\begin{aligned} \sup _{x\in (\alpha ,\beta )}\big \{F(x)(1-F(x))|f'(x)/f^2(x)|\big \}\le \gamma . \end{aligned}$$
- (C2) Let cdf’s F and G satisfy condition (C1), and additionally
$$\begin{aligned} \sup _{x\in (\alpha ,\beta )}\left\{ F(x)(1-F(x))\Big |\frac{g'(x)}{f^2(x)}\Big |\right\}<\infty ,~\sup _{x\in (\alpha ,\beta )}\left\{ F(x)(1-F(x))\Big |\frac{g(x)}{f(x)}\Big |\right\} <\infty . \end{aligned}$$
Using the theory of Kiefer processes, Gu and Ghosal (2008) proved some strong approximation results and asymptotic properties of the Bayesian bootstrap ROC curve estimator. In particular, its rate of convergence to the true ROC curve was shown to be \(n^{-1/2}\).
We will consider minimum distance estimation of the binormal ROC curve parameters by replacing the empirical ROC curve with the corresponding BB estimator \(ROC_{mn}^{BB}(t)\) in measure (8). Since the jumps of \(ROC_{mn}^{BB}(t)\) are random, we can choose the integration limits in (12)–(15) to be closer to 0 and 1 than in the original procedure. Namely we define
where \(\varepsilon _m < 1/m\) is a positive constant, which needs to be introduced due to the right continuity of the function \(ROC_{mn}^{BB}\) (analogously to (16)). To be more specific, we consider the MDE
Using the same approach as in Sect. 2.2, one can show that the solution to the optimization problem above is given by
where
are the counterparts of Eqs. (12), (13), respectively. Similarly, \(\tilde{S_3}\) and \(\tilde{S_4}\) are computed by changing the integration domain from (a, b) to \((a_m',b_m')\) in Eqs. (14)–(15).
The following lemma can be proved in an analogous manner to Lemma 1 in Davidov and Nov (2012).
Lemma 1
Under the above assumptions, \(a_m'\rightarrow 0\) and \(b_m'\rightarrow 1\) a.s., as \(m\rightarrow \infty \).
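Denote
$$\begin{aligned} ROC_{mn}^{DNB}(t)=\Phi \left( \frac{{\hat{\mu }}_{DNB}+\Phi ^{-1}(t)}{{\hat{\sigma }}_{DNB}}\right) ,\quad t\in (0,1). \end{aligned}$$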
Theorem 1
Under assumptions (C1)–(C2), \({\hat{\mu }}_{DNB}\rightarrow \mu \) and \({\hat{\sigma }}_{DNB}\rightarrow \sigma \) in probability, as \(n\rightarrow \infty \), and hence the estimator \(ROC_{mn}^{DNB}\) of the binormal ROC curve converges pointwise to the true ROC curve on (0, 1).
A proof of Theorem 1 is given in the Appendix.
We will also consider an estimator of the parameter \(\theta \), which combines the minimum distance concept of Hsieh and Turnbull with the BB nonparametric estimator of the ROC curve. In this method, Eq. (4) is modified by replacing the empirical \(ROC_{mn}(t)\) curve with the Bayesian bootstrap estimator \(ROC_{mn}^{BB}(t)\) which gives
and the corresponding \(L_{2}\)-distance measure is
The minimum distance estimate \(\hat{\theta }_{HTB}=(\hat{\mu }_{HTB},\hat{\sigma }_{HTB})\) of the parameter \(\theta \) is defined as the value which minimizes (21), i.e.
2.4 Minimum distance estimators of the binormal ROC curve parameters based on smooth nonparametric estimators of the ROC curve
The empirical ROC curve retains many properties of the empirical distribution function. It is uniformly convergent to the theoretical curve (Hsieh and Turnbull 1996), but it is also not continuous and not very accurate for small sample sizes. The idea behind semiparametric procedures of Hsieh and Turnbull, as well as Davidov and Nov, is to minimize a distance between binormal ROC curve given by (2), and the empirical one. In this section we propose MDE’s of the binormal curve by replacing the empirical ROC curve, in measures (5) and (8), by its continuous nonparametric counterparts. Consequently, each considered nonparametric estimator of the ROC curve leads to two new semiparametric minimum distance estimators.
2.4.1 Kernel estimator of the ROC curve
Lloyd (1998) used the kernel smoothing technique to obtain a smooth ROC curve estimator given by
where
are standard kernel estimators with kernel function K, \(\mathcal {K}(v)= \int _{- \infty }^{v}K(z)dz\) and bandwidth parameters \(h_n\) and \(h_m\). Lloyd and Yong (1999) showed that estimator (22) has better mean squared error properties than the empirical ROC curve. In the problem of kernel density estimation, choosing between the many available kernel functions is of secondary importance as all give comparable results, but more care needs to be taken over the selection of the bandwidth. Therefore, in kernel ROC curve estimation the main emphasis is put on bandwidth selection (Zhou and Harezlak 2002; Hall and Hyndman 2003). In the simulation study (Sect. 3), the Gaussian kernel is employed and the bandwidth parameter \(h_m\) is chosen according to
where \(s_x\) and \(iqr_x\) are the standard deviation and the interquartile range for the non-diseased population, respectively. The bandwidth parameter \(h_n\) for the diseased population was determined in the same way. This method of bandwidth selection was recommended by Silverman (1986) as it works “very well for a wide range of densities”, which is reasonable in our case, since we have no information about the distribution of the samples.
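A kernel estimator of this kind can be sketched as follows. The bandwidth formula is not reproduced in the text, so the sketch assumes Silverman's rule of thumb \(h=0.9\min (s, iqr/1.34)\,m^{-1/5}\), which matches the quantities named above; that choice, the function names, the support grid, and the inversion of the smoothed cdf by linear interpolation are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth 0.9 * min(s, iqr / 1.34) * size^(-1/5) (assumed form)."""
    sample = np.asarray(sample, dtype=float)
    s = np.std(sample, ddof=1)
    q75, q25 = np.percentile(sample, [75, 25])
    return 0.9 * min(s, (q75 - q25) / 1.34) * len(sample) ** (-0.2)

def kernel_cdf(sample, h, v):
    """Gaussian-kernel cdf estimate: average of Phi((v - sample_i) / h)."""
    v = np.atleast_1d(v)
    return norm.cdf((v[:, None] - sample[None, :]) / h).mean(axis=1)

def kernel_roc(x, y, t, n_support=4000):
    """Smooth ROC estimate 1 - G^K(F^{K,-1}(1 - t)) with numerical inversion."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    hx, hy = silverman_bandwidth(x), silverman_bandwidth(y)
    lo = min(x.min() - 6 * hx, y.min() - 6 * hy)
    hi = max(x.max() + 6 * hx, y.max() + 6 * hy)
    support = np.linspace(lo, hi, n_support)
    fx = kernel_cdf(x, hx, support)                  # increasing in the support points
    # Threshold c with F^K(c) = 1 - t, found by interpolating the tabulated cdf.
    c = np.interp(1.0 - np.atleast_1d(t), fx, support)
    return 1.0 - kernel_cdf(y, hy, c)

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=50)                    # hypothetical non-diseased sample
y = rng.normal(1.0, 1.0, size=50)                    # hypothetical diseased sample
t = np.linspace(0.01, 0.99, 50)
roc_k = kernel_roc(x, y, t)
print(roc_k[:5])
```

The accuracy of the inverse depends on how dense the support grid is, which is why a finer grid is used for the support than for t.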
The kernel estimator (22) of the ROC curve allows us to introduce two new minimum distance estimators of the binormal ROC curve parameters, which will be denoted by \({\hat{\theta }}_{HTK}\) and \({\hat{\theta }}_{DNK}\). The first one employs \(ROC_{mn}^K(t)\) instead of the empirical ROC curve in Eq. (4), while the latter does so in Eq. (7), e.g.
where the integration limits \(a'\) and \(b'\) are the counterparts of Eqs. (18)–(19), where \(ROC_{mn}^{BB}(t)\) is replaced with \(ROC_{mn}^{K}(t)\).
2.4.2 Estimator of the ROC curve by smoothing the sample distribution functions
In the paper of Jokiel-Rokita and Pulit (2013), the authors proposed to estimate the ROC curve using the plug-in method with smoothed sample distribution functions. Let \(X_{1:m} \le X_{2:m} \le \dots \le X_{m:m}\) and \(Y_{1:n} \le Y_{2:n} \le \dots \le Y_{n:n}\) denote the order statistics from the samples \(\pmb {X}_m\) and \(\pmb {Y}_n\), respectively. We set
where L, U are random variables such that \(L \le \min {\{X_{1:m},Y_{1:n} \}}\) and \(U \ge \max {\{X_{m:m},Y_{n:n} \}}\) almost surely. Denote
With this notation we define the estimators of the distribution functions F, G by
respectively, where
where \(r :[0,1] \rightarrow [0,1]\) is a continuous, strictly increasing function such that \(r(0)=0\), \(r(1)=1\), e.g. \(r(x)=x\). The inverse function of \(F^S_m(t)\) on [L, U] can be written as
It is clear that \({F_m^S}^{-1}(t)\) is continuous and strictly increasing on [0, 1]. Since \(G^S_n(t)\) is continuous and strictly increasing on [L, U], it follows that the composition \(G^S_n({F_m^S}^{-1}(t))\) is continuous and strictly increasing on [0, 1]. Hence we can define the continuous and strictly increasing nonparametric ROC curve estimator by
An appropriate choice of the function r, appearing in formula (23), can guarantee differentiability of the estimator (e.g. if function r is differentiable and \(r_{+}'(0)=r_{-}'(1)=0\)). Simultaneously, determination of the estimator (24) remains as easy as in the case of the empirical ROC curve.
Minimum distance estimators of the parameter \(\theta ,\) based on the nonparametric ROC curve estimator \(ROC_{mn}^S\) applied in (4) and (7) instead of the estimator \(ROC_{mn},\) will be denoted by \({\hat{\theta }}_{HTS}\) and \(\hat{\theta }_{DNS},\) respectively.
3 Simulation study
A simulation experiment was conducted in order to
- Investigate the accuracy of the original minimum distance estimators considered by Davidov and Nov (2012) in comparison with their modification proposed in Sect. 2.2,
- Compare the accuracy of the minimum distance estimators of the binormal ROC curve parameters proposed by Hsieh and Turnbull (1996) with those considered by Davidov and Nov (2012) (answer the question: which measure of distance provides more accurate estimators),
- Compare the accuracy of the minimum distance estimators considered by Hsieh and Turnbull (1996) and Davidov and Nov (2012) with their counterparts obtained by replacing the empirical ROC curve with BB estimator or with the smooth nonparametric estimators of the ROC curve (the kernel estimator and the estimator proposed by Jokiel-Rokita and Pulit 2013).
An important index connected with the ROC curve is the area under the curve, commonly denoted by AUC,
$$\begin{aligned} AUC=\int _{0}^{1}ROC(t)\,dt. \end{aligned}$$
It can be easily shown that in the model considered \(AUC=P(X<Y).\) We considered binormal ROC curves with AUC values of 0.75 and 0.85 and assumed that \(X \sim \mathcal {N}(0,1)\) and that Y is normally distributed with standard deviation \(\sigma \in \{1,4/3,2\}\) and mean \(\mu \) given by \(\mu =\sqrt{1+\sigma ^2}\Phi ^{-1}(\text {AUC})\). For each ROC curve, 5000 data sets with \(m=n\in \{15, 20, 100\}\) were generated. Next, for each data set, four nonparametric ROC curve estimators were computed: the empirical ROC curve \(ROC_{mn}\), the smoothed estimator \(ROC_{mn}^S\) according to Eq. (24) with linking function \(r(x)=x\), the kernel estimator \(ROC_{mn}^K\) given by formula (22), and the Bayesian bootstrap estimator \(ROC_{mn}^{BB}\) averaged over \(B=1000\) realizations.
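The data-generating step of this design is easy to reproduce. Below is a minimal sketch that computes \(\mu \) from a target AUC and \(\sigma \) via \(\mu =\sqrt{1+\sigma ^2}\Phi ^{-1}(\text {AUC})\) and draws one data set; the function names, the seed, and the large Monte Carlo sample used to check \(P(X<Y)\) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def mu_for_auc(auc, sigma):
    """Mean of Y giving the requested AUC when X ~ N(0, 1) and Y ~ N(mu, sigma^2)."""
    return np.sqrt(1.0 + sigma ** 2) * norm.ppf(auc)

def simulate(auc, sigma, m, n, rng):
    """Draw one data set (x, y) from the binormal model with the given AUC."""
    x = rng.normal(0.0, 1.0, size=m)
    y = rng.normal(mu_for_auc(auc, sigma), sigma, size=n)
    return x, y

rng = np.random.default_rng(4)
for sigma in (1.0, 4.0 / 3.0, 2.0):
    print(sigma, mu_for_auc(0.75, sigma), mu_for_auc(0.85, sigma))

# Monte Carlo check that P(X < Y) is close to the target AUC.
x_big, y_big = simulate(0.85, 2.0, 200_000, 200_000, rng)
auc_mc = np.mean(x_big < y_big)
print(auc_mc)
```

Pairing the i-th x with the i-th y is enough for the check because the two samples are independent.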
All nonparametric estimators were calculated on a regular grid with a spacing of 0.0001. For the kernel estimator we additionally used a four times denser support grid, in order to compute the inverse of the cdf estimator \({F_m^K}^{-1}\) with sufficient accuracy. Tests showed that a further increase of the grid density left the simulation results virtually unchanged. Then the semiparametric minimum distance estimators were calculated from the nonparametric ones. In the study, nine distinct semiparametric estimators were considered: five based on the minimum distance approach of Davidov and Nov (2012) (D–N estimators for short) and four based on the measure of distance considered by Hsieh and Turnbull (1996) (H–T estimators for short). For all D–N estimators, except the original DN, the integration endpoints were calculated according to equation (19) with the appropriate nonparametric ROC estimator plugged in. In practice, due to the finite distance between grid points, there is no need to introduce the constant \(\varepsilon _m\).
In the Hsieh and Turnbull approach one needs to numerically minimize the \(L_2\)-distance between the binormal ROC curve and the nonparametric estimator considered. For the binormal model this problem corresponds to the minimization of a function of two variables, \(\mu \) and \(\sigma \). In the simulations the Nelder–Mead method was employed to minimize the objective function, and the initial values of the unknown parameters were calculated using the corresponding DNM estimator.
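The numerical minimization can be sketched with SciPy's Nelder–Mead implementation. The sketch assumes the criterion is the squared \(L_2\)-distance between the empirical ROC curve and \(\Phi ((\mu +\Phi ^{-1}(t))/\sigma )\), approximated by an average over grid midpoints. The starting point is passed in by hand rather than taken from a DNM fit, only \(\sigma >0\) is enforced, and the function names and grid size are illustrative, so this is not the exact procedure used in the study.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def empirical_roc(x, y, t):
    """Empirical ROC curve 1 - G_n(F_m^{-1}(1 - t)) evaluated at points t."""
    x, y = np.sort(x), np.sort(y)
    m, n = len(x), len(y)
    p = 1.0 - np.atleast_1d(t)
    k = np.ceil(m * p - 1e-12).astype(int)          # index of the order statistic
    thresholds = np.where(k >= 1, x[np.clip(k - 1, 0, m - 1)], -np.inf)
    return 1.0 - np.searchsorted(y, thresholds, side="right") / n

def ht_type_fit(x, y, start, n_grid=2000):
    """Minimize the grid approximation of the L2 criterion by Nelder-Mead."""
    t = (np.arange(n_grid) + 0.5) / n_grid          # midpoints avoid t = 0 and t = 1
    r = empirical_roc(x, y, t)
    z = norm.ppf(t)

    def criterion(theta):
        mu, sigma = theta
        if sigma <= 0.0:                            # keep the search in sigma > 0
            return np.inf
        return np.mean((r - norm.cdf((mu + z) / sigma)) ** 2)

    result = minimize(criterion, start, method="Nelder-Mead")
    return result.x

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, size=2000)                 # hypothetical non-diseased sample
y = rng.normal(1.0, 1.5, size=2000)                 # hypothetical diseased sample
mu_ht, sigma_ht = ht_type_fit(x, y, start=(0.5, 1.0))
print(mu_ht, sigma_ht)
```

Nelder–Mead needs no derivatives, which suits a criterion built from a step function; a good starting value mainly reduces the number of function evaluations.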
The performance of the estimators introduced in the previous section is studied in two ways: by comparing the estimates of the binormal parameters and by looking at the deviation of the estimated ROC curve from its true shape. In Table 1 the estimated bias and MSE of the parameters \(\mu \) and \(\sigma \) are listed for four binormal models (with \(\sigma =1\) and \(\sigma =2\) and for two values of AUC: 0.75 and 0.85). In practice one is more interested in the estimation of the ROC curve than in the parameters of the binormal model. Hence, in order to examine the overall goodness of fit of the ROC curve estimator, the mean integrated square error (MISE)
$$\begin{aligned} MISE=E\int _{0}^{1}\left( \widehat{ROC}(t)-ROC(t)\right) ^{2}dt \end{aligned}$$
was estimated, where \(\widehat{ROC}(t)\) stands for the considered ROC curve estimator. In Table 2 the estimated values of MISE (multiplied by 100, for brevity) are collected for three values of \(\sigma \), AUC=0.75, and different sample sizes. Results corresponding to AUC=0.85 are given in Table 3. MISE’s are presented for both semiparametric and nonparametric ROC curve estimates for comparison.
As can be seen from Table 1, there are quite large differences in accuracy between the original (DN) and the modified (DNM) minimum distance estimators of Davidov and Nov, even though the latter requires only a marginal modification in the computational procedure. For \(m=n=15\) and \(m=n=20\) the estimated mean square errors of the DNM estimators of the parameters \(\mu \) and \(\sigma \) are significantly smaller (sometimes even by half) than the corresponding estimated errors of the original DN estimators. The bias of \({\hat{\theta }}_{DNM}\) is also smaller than that of \({\hat{\theta }}_{DN}\), but the difference between them is less prominent. For the large sample size, \(m=n=100\), when formulas (11) and (16) yield virtually the same integration endpoints, the DN and DNM procedures give almost the same biases and mean square errors, as expected. The DNM estimator also outperforms the original Davidov and Nov (2012) estimator (DN) in terms of mean integrated square error. The results given in Tables 2 and 3 indicate a reduction of MISE by approximately 10% in the case of small sample sizes and 3% for \(m=n=100\).
We find it interesting to examine the accuracy of the estimates obtained by minimization of the two distinct measures (5) and (8). In the case of small sample sizes \(m=n=15\) and \(m=n=20\), the HT procedure performs much better in terms of bias and mean square error than DNM, and hence also outperforms DN, regardless of the AUC and the true value of the parameter \(\sigma \) (cf. Table 1). For \(m=n=100\), the bias of \({\hat{\mu }}_{HT}\) remains much lower than the corresponding bias of \({\hat{\mu }}_{DN}\) and \({\hat{\mu }}_{DNM}\), while the differences in MSE between these estimators are reduced. At the same time, the HT method also gives a smaller bias of the estimator of \(\sigma \) in comparison to the DN and DNM procedures, but in some cases it yields a greater MSE. These conclusions also hold to a great extent when the DNS estimator, based on the smoothed nonparametric ROC curve, is compared with the corresponding HTS estimator. However, inspection of the results collected in Tables 2 and 3 reveals that the estimators based on the D–N approach, aside from the original DN, yielded a better fit to the true ROC curve in terms of MISE than those originating from the H–T procedure: in all models except one, the estimates that gave the lowest MISE were obtained utilizing the distance measure considered by Davidov and Nov (2012).
Based on the simulations, we may also address how replacing the empirical ROC curve with other nonparametric estimators affects the accuracy of the estimated binormal ROC curve. In all considered models, the semiparametric estimators based on the smoothed empirical ROC curve, \(ROC_{mn}^S(t)\), performed better than their counterparts based on the empirical curve \(ROC_{mn}(t)\) for both distance measures employed. The bias and MSE of \({\hat{\mu }}_{DNS}\) and \({\hat{\sigma }}_{DNS}\) are considerably smaller than those of \({\hat{\mu }}_{DNM}\) and \(\hat{\sigma }_{DNM}\), respectively. Similar conclusions can be drawn when comparing HTS with the original HT procedure. For small sample sizes, the mean square error of the estimates of both parameters decreases, by a factor of 4.5 on average, when the underlying empirical ROC curve is replaced with its smoothed counterpart (24). Naturally, the advantage of the estimates based on \(ROC_{mn}^S(t)\) over those based on \(ROC_{mn}(t)\) decreases as the sample size increases. However, no significant improvement of the parameter estimates is observed when the kernel or BB methods are employed. For the methods based on the Davidov and Nov approach, where one minimizes the objective function given by (9), the estimated biases and MSEs of the estimators \({\hat{\theta }}_{DNK}\) and \({\hat{\theta }}_{DNB}\) are only slightly reduced in comparison with the DNM method. Furthermore, for the HTK and HTB methods even some increase in bias and MSE is observed in comparison with the original minimum distance procedure of Hsieh and Turnbull. Replacing the underlying empirical ROC curve with its smoothed counterpart also decreases the mean integrated square error of both semiparametric and nonparametric estimators. For the eighteen binormal models considered in Tables 2 and 3, the DNS method always outperforms DN, and in fifteen cases it yields a smaller MISE than the DNM estimator.
In fact, for AUC = 0.75, the DNS estimator achieves the lowest MISE among all considered estimators in 8 out of 9 comparisons. The HTS estimator also outperforms HT in 15 out of 18 comparisons. Some improvement of the estimates is observed when the bootstrap estimator is employed (DNB and HTB methods). Consequently, the simulation study shows that replacing the empirical ROC curve (3) with its smoothed counterpart (24) significantly improves the minimum distance estimates of the binormal ROC curve.
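The accuracy measures compared above (bias and MSE of the parameter estimates, MISE of the fitted curve) can be computed from Monte Carlo replications along the following lines. This is a minimal sketch rather than the code used in the study: the function names are hypothetical, and the binormal curve is assumed to be parametrized as \(ROC(t)=\Phi ((\mu +\Phi ^{-1}(t))/\sigma )\), which should be adapted if the model is written differently.

```python
from statistics import NormalDist, fmean

PHI = NormalDist()  # standard normal distribution


def binormal_roc(t, mu, sigma):
    # ASSUMPTION: ROC(t) = Phi((mu + Phi^{-1}(t)) / sigma); adapt this line
    # if the binormal model is parametrized differently.
    return PHI.cdf((mu + PHI.inv_cdf(t)) / sigma)


def bias_and_mse(estimates, truth):
    """Monte Carlo bias and mean square error of a scalar estimator."""
    errors = [e - truth for e in estimates]
    return fmean(errors), fmean([d * d for d in errors])


def ise(mu_hat, sigma_hat, mu, sigma, n_grid=1000):
    """Integrated squared error between the fitted and the true curve on
    (0, 1), approximated with the midpoint rule."""
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        t = (i + 0.5) * h  # midpoints avoid the endpoints 0 and 1
        diff = binormal_roc(t, mu_hat, sigma_hat) - binormal_roc(t, mu, sigma)
        total += diff * diff
    return total * h


def mise(fits, mu, sigma):
    """Average ISE over replications; fits is a list of (mu_hat, sigma_hat)."""
    return fmean([ise(m, s, mu, sigma) for m, s in fits])
```

Here `fits` would hold one pair of estimates per simulated data set, so the same replications feed both the parameter-level summaries and the curve-level MISE.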
4 Real data analysis
To illustrate all the semiparametric estimators considered, we apply them to the data analysed in the paper of Tupikowski et al. (2012). In that dataset the effectiveness of combined treatment with interferon alpha and metronomic cyclophosphamide in patients with metastatic kidney cancer was studied in terms of hemoglobin level (HL) and serum fibrinogen concentration (FC). The dataset contains 31 observations in total: 14 with and 17 without clinical response. A low value of HL or FC has been recognized as a negative predictor of treatment response and is associated with short survival. The estimates of the binormal ROC curve parameters for HL and FC as predictive factors are given in Table 4 for all considered methods. The estimated values of AUC are also tabulated. Interestingly, while the estimates of the parameters \(\mu \) and \(\sigma \) vary between methods, the estimates of AUC are close to each other and differ by only 7% for both HL and FC.
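One way to see why rather different parameter estimates can lead to similar AUC values is that the AUC depends on \(\mu \) and \(\sigma \) only through a single combination. The sketch below assumes that, after a monotone transformation, \(X\sim N(0,1)\) and \(Y\sim N(\mu ,\sigma ^2)\) are independent, in which case \(AUC=P(Y>X)=\Phi (\mu /\sqrt{1+\sigma ^2})\). The function name is hypothetical and the parameter values are purely illustrative; they are not taken from Table 4.

```python
import math
from statistics import NormalDist


def binormal_auc(mu, sigma):
    # ASSUMPTION: X ~ N(0, 1) and Y ~ N(mu, sigma^2) are independent, so that
    # Y - X ~ N(mu, 1 + sigma^2) and AUC = P(Y - X > 0).
    return NormalDist().cdf(mu / math.sqrt(1.0 + sigma * sigma))


# Two hypothetical parameter pairs sharing the same ratio mu / sqrt(1 + sigma^2)
print(binormal_auc(1.0, 1.0))
print(binormal_auc(math.sqrt(2.5), 2.0))
```

Both calls return the same value, because the two pairs give the same ratio \(\mu /\sqrt{1+\sigma ^2}\) even though \(\mu \) and \(\sigma \) differ substantially.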
5 Conclusions and some prospects
In this article seven new estimators of the binormal ROC curve in the semiparametric setting have been proposed. The new estimators originate from the minimum distance concept applied to ROC curve estimation by Hsieh and Turnbull (1996) and recently revisited by Davidov and Nov (2012). In the original MDE procedures one minimizes a distance measure between the binormal ROC curve, characterized by the two parameters \(\mu \) and \(\sigma \), and the empirical ROC curve. In our methods we propose to replace the \(ROC_{mn}\) estimator, which is not continuous and not very accurate for small sample sizes, with other nonparametric estimators of the ROC curve. Procedures involving the kernel, Bayesian bootstrap and smoothed ROC curve estimators were considered. Moreover, for the estimators based on the Davidov and Nov (2012) approach, the role of appropriate integration limits was emphasized.
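The minimum distance idea summarized above can be illustrated with a deliberately simplified stand-in: compute the empirical ROC curve on a grid and fit a straight line on the probit scale by ordinary least squares. This is neither the HT nor the DN estimator (it ignores their weighting and integration limits). The grid, the function names, the simulated data and the parametrization \(\Phi ^{-1}(ROC(t))=(\mu +\Phi ^{-1}(t))/\sigma \) are all assumptions made for the sketch.

```python
import math
import random
from statistics import NormalDist, fmean

PHI = NormalDist()  # standard normal distribution


def empirical_roc(xs_sorted, y, t):
    """ROC_mn(t) = 1 - G_n(F_m^{-1}(1 - t)) for 0 < t < 1."""
    m = len(xs_sorted)
    k = max(1, math.ceil(m * (1.0 - t)))  # rank of the empirical quantile
    c = xs_sorted[k - 1]
    return sum(1 for v in y if v > c) / len(y)


def fit_binormal_ls(x, y, grid):
    """Least-squares line v = a + b*u with u = Phi^{-1}(t) and
    v = Phi^{-1}(ROC_mn(t)).
    ASSUMPTION: a = mu / sigma and b = 1 / sigma."""
    xs = sorted(x)
    pts = []
    for t in grid:
        r = empirical_roc(xs, y, t)
        if 0.0 < r < 1.0:  # the probit transform is finite only inside (0, 1)
            pts.append((PHI.inv_cdf(t), PHI.inv_cdf(r)))
    u_bar = fmean([u for u, _ in pts])
    v_bar = fmean([v for _, v in pts])
    sxy = sum((u - u_bar) * (v - v_bar) for u, v in pts)
    sxx = sum((u - u_bar) ** 2 for u, _ in pts)
    b = sxy / sxx
    a = v_bar - b * u_bar
    return a / b, 1.0 / b  # (mu_hat, sigma_hat)


# Simulated illustration with hypothetical true values mu = 1, sigma = 1
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]  # non-diseased sample
y = [random.gauss(1.0, 1.0) for _ in range(2000)]  # diseased sample
grid = [i / 100 for i in range(5, 96)]
mu_hat, sigma_hat = fit_binormal_ls(x, y, grid)
print(mu_hat, sigma_hat)
```

Swapping `empirical_roc` for a smoother nonparametric estimator is the kind of substitution the proposed methods make, although their distance measures differ from the plain least-squares criterion used here.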
The small-sample performance of the proposed estimators was investigated numerically and compared with the original procedures of Davidov and Nov (2012) and Hsieh and Turnbull (1996). The biggest improvement, both in terms of the accuracy of the parameter estimates and MISE, was observed for the estimators based on the smoothed nonparametric ROC curve estimator \(ROC_{mn}^S\) (see Sect. 2.4.2). For samples of small sizes, we observed that replacing \(ROC_{mn}\) with \(ROC_{mn}^S\) in the minimum distance procedures can reduce the MSE of the estimators of the parameters \(\mu \) and \(\sigma \) by an order of magnitude, and by a factor of 4.5 on average. The goodness of fit of the estimated ROC curve to the true ROC curve is also improved, as indicated by the lower mean integrated square error. Employing the BB estimator does not improve the performance of the MDEs as much, while using the kernel estimators sometimes leads to even less accurate semiparametric ROC curve estimates.
In future research we are going to examine the asymptotic equivalence of the estimators considered. In particular, the asymptotic properties of the DNS and HTS estimators need further investigation, since these methods clearly outperform the others. In fact, the smoothed nonparametric estimator of the ROC curve, introduced by Jokiel-Rokita and Pulit (2013), seems to be a very promising method, and a theoretical investigation of its asymptotic properties is of interest to us. We are also going to study the robustness of the considered estimators to model misspecification.
References
Branscum AJ, Johnson WO, Hanson TE, Gardner IA (2008) Bayesian semiparametric ROC curve estimation and disease diagnosis. Stat Med 27:2474–2496
Cai T, Moskowitz CS (2004) Semi-parametric estimation of the binormal ROC curve for a continuous diagnostic test. Biostatistics 5(4):573–586
Cai T, Pepe MS (2002) Semiparametric receiver operating characteristic analysis to evaluate biomarkers for disease. J Am Stat Assoc 97(460):1099–1107
Davidov O, Nov Y (2009) Minimum-norm estimation for binormal receiver operating characteristic (ROC) curves. Biometrical J 51(6):1030–1046
Davidov O, Nov Y (2012) Improving an estimator of Hsieh and Turnbull for the binormal ROC curve. J Stat Plan Inference 142(4):872–877
Dorfman DD, Alf E (1969) Maximum likelihood estimation of parameters of signal detection theory and determination of confidence interval - rating method data. J Math Psychol 6:487–496
Erkanli A, Sung M, Costello EJ, Angold A (2006) Bayesian semi-parametric ROC analysis. Stat Med 25:3905–3928
Gonçalves L, Subtil A, Oliveira MR, De Zea Bermudez P (2014) ROC curve estimation: an overview. REVSTAT Stat J 12(1):1–20
Gu J, Ghosal S (2008) Strong approximations for resample quantile process and applications to ROC methodology. J Nonparametr Stat 20(3):229–240
Gu J, Ghosal S (2009) Bayesian ROC curve estimation under binormality using a rank likelihood. J Stat Plan Inference 139:2076–2083
Gu J, Ghosal S, Roy A (2008) Bayesian bootstrap estimation of ROC curve. Stat Med 27:5407–5420
Hall PG, Hyndman RJ (2003) Improved methods for bandwidth selection when estimating ROC curves. Stat Prob Lett 64(2):181–189
Hanley JA (1988) The robustness of the “binormal” assumptions used in fitting ROC curves. Med Decis Mak 8:197–203
Hanley JA (1996) The use of binormal model for parametric ROC analysis of quantitative diagnostic tests. Stat Med 15:1575–1585
Hsieh F, Turnbull B (1996) Nonparametric and semiparametric estimation of the receiver operating characteristic curve. Ann Stat 24(1):25–40
Jokiel-Rokita A, Pulit M (2013) Nonparametric estimation of the ROC curve based on smoothed empirical distribution function. Stat Comput 23:703–712
Krzanowski W, Hand D (2009) ROC curves for continuous data, volume 111 of C&H/CRC monographs on statistics & applied probability. Chapman and Hall/CRC, Boca Raton
Lloyd CJ (1998) Using smoothed receiver operating characteristic curves to summarize and compare diagnostic systems. J Am Stat Assoc 93(444):1356–1364
Lloyd CJ (2002) Estimation of a convex ROC curve. Stat Prob Lett 59(1):99–111
Lloyd C, Yong Z (1999) Kernel estimators of the ROC curve are better than empirical. Stat Prob Lett 44(3):221–228
Metz CE, Herman BA, Shen J-H (1998) Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med 17:1033–1053
Millar PW (1984) A general approach to the optimality of minimum distance estimators. Trans Am Math Soc 286:377–418
Mitzenmacher M, Upfal E (2005) Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press, New York
Pepe MS (2003) The statistical evaluation of medical tests for classification and prediction. Oxford University Press, Oxford
Qin J, Zhang B (2003) Using logistic regression procedures for estimating receiver operating characteristic curves. Biometrika 90(3):585–596
Rubin DB (1981) The Bayesian bootstrap. Ann Stat 9(1):130–134
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London
Swets JA (1986) Form of empirical ROCs in discrimination and diagnostic tasks: implications for theory and measurement of performance. Psychol Bull 99:181–198
Tupikowski K, Dembowski J, Kołodziej A, Niezgoda T, Debiński P, Małkiewicz B, Szydełko T, Kowal P, Zdrojowy R (2012) C133 interferon alpha and metronomic cyclophosphamide for metastatic kidney cancer. Eur Urol Suppl 11(4):113–113
Wan S, Zhang B (2007) Smooth semiparametric receiver operating characteristic curves for continuous diagnostic tests. Stat Med 26:2565–2586
Wolfowitz J (1957) The minimum distance method. Ann Math Stat 28(1):75–88
Zhou XH, Harezlak J (2002) Comparison of bandwidth selection methods for kernel smoothing of ROC curves. Stat Med 21:2045–2055
Zhou X-H, Lin H (2008) Semi-parametric maximum likelihood estimates for ROC curves of continuous-scale tests. Stat Med 27:5271–5290
Zhou XH, Obuchowski NA, McClish DK (2002) Statistical methods in diagnostic medicine. Wiley, New York
Zou KH, Hall WJ (2000) Two transformation models for estimating an ROC curve derived from continuous data. J Appl Stat 27(5):621–631
Appendix
Proof of Theorem 1
The idea behind the proof is the same as in the proof of Theorem 1 in Davidov and Nov (2012). Let \(S_i,\) \(i=1,\ldots ,4,\) be the deterministic counterparts of \(\tilde{S}_i,\) obtained by substituting ROC(t) for \(ROC_{mn}^{BB}(t)\), and the values 0 and 1 for the lower and the upper integration limit, respectively, i.e., for example
The convergence \(\tilde{S}_3\rightarrow S_3\) and \(\tilde{S}_4\rightarrow S_4\) in probability, as \(n\rightarrow \infty ,\) can be easily derived from Lemma 1. We will show that \(\tilde{S}_1\rightarrow S_1\) in probability. In a very similar fashion one can show that \(\tilde{S}_2\rightarrow S_2\) in probability; hence, by definition (20) and the continuous mapping theorem, the theorem will be proved.
From Lemma 1, the coefficient \(1/(b_m'-a_m')\) converges to 1 a.s., therefore it can be omitted. We have,
The second term of the right-hand side of the above inequality converges to 0 a.s., as it was indicated in Lemma 1, hence it also converges to 0 in probability. Therefore it remains to show that the first term of the above inequality converges to 0 in probability. Using the same arguments as in the original paper of Davidov and Nov (2012), one can show that
where \(\dot{\Phi }^{-1}(x)=(d/dx)\Phi ^{-1}(x)\). Note that the first factor on the right-hand side of inequality (27) depends on the integration limits, while the second depends on the nonparametric ROC curve estimator. In fact, the rate of convergence of
is \(O_P(1/\sqrt{m}),\) which can be deduced from Theorem 4.1 of Gu and Ghosal (2008). We will show that although \(\dot{\Phi }^{-1}(ROC(a_m'))\) converges to \(\infty \) as \(m\) increases, it converges to 0 after being multiplied by \(1/\sqrt{m}\); the corresponding proof for \(\dot{\Phi }^{-1}(ROC(b_m'))\) is very similar and hence is omitted. Let
then the lower integration limit, defined by (19), can be expressed in terms of \(a_m^{(b)}\) as
The definition of \(a_m^{(b)}\) may be equivalently written as
where \(F_m\) is the empirical distribution function based on \(X_{1},\ldots ,X_{m}\). As in the proof of Theorem 1 in Davidov and Nov (2012), we can show that the rate of convergence of the first term of (29) is \(\Omega _P(1/\sqrt{m}).\) The notation \(\Omega _{P}\) is the equivalent of \(O_{P}\) for an asymptotic lower bound, i.e., \(Q_{n}=\Omega _{P}(R_{n})\) if \(R_{n}/Q_{n}\) is bounded in probability. By the Dvoretzky–Kiefer–Wolfowitz inequality, the term in the second bracket in (29) converges in probability to 0 exponentially. We will show that the expression in the third bracket in (29) converges in probability to 0 faster than \(1/m\), hence \(a_m^{(b)}=O_P(1/\sqrt{m})\). For given samples, let K denote the number of observations in \(X_1,\ldots ,X_m\) which are not greater than \(Y_{n:n}\): \(K=\sum _{i=1}^mI_{\{X_i\le Y_{n:n}\}}\). By definition (17) and properties of the empirical distribution function, the following inequality holds
Since \(U_{(k)}\) is the \(k\)-th order statistic from the uniform distribution \(\mathcal{U}(0,1)\), it has the beta distribution \(B(k,m-k)\) with expected value equal to \(k/m\). A suitably tight upper bound for the last probability can be obtained using the following inequality (see Mitzenmacher and Upfal 2005, p. 59)
We have
and
Therefore, due to decomposition (29), we have \(a_m^{(b)}=O_P(1/\sqrt{m}),\) and combining this with relation (28), we conclude that \(a_m'=O_P(1/\sqrt{m})\). Using the same approach as Davidov and Nov (2012) in their proof of Theorem 1, we can show that \(\dot{\Phi }^{-1}(ROC(a_m'))=o_P(\sqrt{m}),\) which completes the proof that \(\tilde{S}_1\rightarrow S_1\) in probability, as \(n\rightarrow \infty ,\) and thus the theorem is proved.
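For reference, the Dvoretzky–Kiefer–Wolfowitz inequality invoked in the proof can be stated as follows, in its form with the constant 2; the symbol \(\varepsilon \) is introduced here only for illustration. For every \(\varepsilon >0\),

\[ P\Big(\sup _{x\in \mathbb{R}}|F_m(x)-F(x)|>\varepsilon \Big)\le 2\exp (-2m\varepsilon ^2), \]

so the supremum distance between \(F_m\) and \(F\) is \(O_P(1/\sqrt{m})\), and for any fixed \(\varepsilon \) the probability of a deviation larger than \(\varepsilon \) decays exponentially in \(m\).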
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Jokiel-Rokita, A., Topolnicki, R. Minimum distance estimation of the binormal ROC curve. Stat Papers 60, 2161–2183 (2019). https://doi.org/10.1007/s00362-017-0915-7
DOI: https://doi.org/10.1007/s00362-017-0915-7