Minimum distance estimation of the binormal ROC curve
Abstract
The receiver operating characteristic (ROC) curve describes the performance of a diagnostic test, which classifies individuals into one of two categories. Many parametric, semiparametric and nonparametric estimation methods have been proposed for estimating the ROC curve and its functionals. In this paper the minimum distance estimation of the binormal ROC curve is considered. A modification of the estimator considered in the paper of Davidov and Nov (J Stat Plan Inference 142(4):872–877, 2012) and some new estimators are proposed. We compare the accuracy of the new estimators with known minimum distance estimators of the binormal ROC curve and we conclude that our estimators generally perform better than their competitors.
Keywords
Receiver operating characteristic (ROC) curve; Binormal model; Semiparametric estimation; Minimum distance estimation (MDE); Bayesian bootstrap (BB)
1 Introduction
The receiver operating characteristic (ROC) curve is commonly used to describe the accuracy of a medical or other diagnostic test, which classifies individuals into “nondiseased” and “diseased” categories. It is defined as a plot of the true positive rate against the false positive rate, or sensitivity versus 1 − specificity, for various threshold values. Over the years, it has been widely applied in many fields including biosciences, data mining, experimental psychology, finance, geosciences, machine learning, medicine, radiology, sociology and others. For a comprehensive review of the literature, see Zhou et al. (2002), Pepe (2003), Krzanowski and Hand (2009) and Gonçalves et al. (2014).
A special feature of the ROC curve is that it is invariant to any increasing transformation of the data, i.e. if \(X'=h(X)\) and \(Y'=h(Y)\) for some increasing transformation h, then the ROC curve corresponding to the distribution functions F and G is the same as the ROC curve corresponding to the distribution functions \(F'\) and \(G'\) of the random variables \(X'\) and \(Y',\) respectively.
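The invariance property can be checked numerically. The following minimal sketch is not part of the original study: the two samples and the helper `empirical_roc_points` are hypothetical, and the convention that a subject is classified as diseased when its result exceeds the threshold is an assumption. It computes the (false positive rate, true positive rate) pairs of the empirical ROC curve before and after applying the increasing transformation \(h=\exp \).

```python
import math


def empirical_roc_points(x, y):
    """Return the (FPR, TPR) pairs of the empirical ROC curve.

    x: results for nondiseased subjects, y: results for diseased subjects.
    Assumed convention: classify as diseased when the result exceeds c.
    """
    thresholds = sorted(set(x) | set(y))
    points = [(1.0, 1.0)]  # a threshold below every observation
    for c in thresholds:
        fpr = sum(v > c for v in x) / len(x)
        tpr = sum(v > c for v in y) / len(y)
        points.append((fpr, tpr))
    return points


# Hypothetical samples, used only for illustration.
x = [-1.2, -0.4, 0.1, 0.3, 0.9]
y = [-0.2, 0.5, 1.1, 1.4, 2.3]

h = math.exp  # any strictly increasing transformation would do
original = empirical_roc_points(x, y)
transformed = empirical_roc_points([h(v) for v in x], [h(v) for v in y])
print(original == transformed)
```

Only the ranks of the pooled observations enter the computation, which is why the two lists of points coincide.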
Many different techniques have been proposed for semiparametric estimation of the ROC curve. For estimating ROC curves from discrete or grouped response data, the most commonly used procedure is that proposed by Dorfman and Alf (1969). Metz et al. (1998) developed an algorithm called LABROC, which groups continuous data into a finite number of ordered categories and then applies the maximum likelihood algorithm of Dorfman and Alf (1969). Hsieh and Turnbull (1996) proposed a generalized least squares procedure for grouped data and a minimum distance estimator (MDE), which does not require grouping the data. The MDE of the binormal ROC curve was also considered by Davidov and Nov (2009, 2012). Maximum likelihood and pseudolikelihood approaches to estimating the binormal ROC curve were considered by Zou and Hall (2000), Cai and Moskowitz (2004) and Zhou and Lin (2008). Techniques based on regression were also proposed (see, for example, Lloyd 2002; Cai and Pepe 2002; Qin and Zhang 2003; Wan and Zhang 2007). A Bayesian approach to the semiparametric estimation of the ROC curve was considered by Branscum et al. (2008), Erkanli et al. (2006), Gu et al. (2008) and Gu and Ghosal (2009). Gonçalves et al. (2014) give an overview of developments in ROC curve estimation, with particular emphasis on the frequentist and Bayesian methods most often employed in the medical setting.
This paper deals with minimum distance estimation of the binormal ROC curve. To the best of our knowledge, a minimum distance approach to estimating the binormal ROC curve parameters was considered only by Hsieh and Turnbull (1996) and Davidov and Nov (2009, 2012). In the paper of Davidov and Nov (2009) the central idea was to estimate the unknown function h (a transformation of X and Y to normal random variables) in two different ways; only one of the two estimates depended on the unknown parameters \(\mu \) and \(\sigma \) of the binormal ROC curve. Then, they estimated \(\mu \) and \(\sigma \) by the values that minimized a certain norm of the difference between the two estimates of the function h. In this paper we do not develop this idea. A different approach is presented in the papers of Hsieh and Turnbull (1996) and Davidov and Nov (2012). They took into consideration two different measures of distance between the empirical and the theoretical ordinal dominance curve (ODC), a curve closely related to the ROC curve. Davidov and Nov (2012) showed that their MDE is consistent and asymptotically normally distributed and that it outperforms Hsieh and Turnbull’s original, grouped-data estimator, but it has not been compared with Hsieh and Turnbull’s MDE.
In this paper we compare the accuracy of the known MDEs given by Hsieh and Turnbull (1996) and Davidov and Nov (2012). We find that the MDE given by Hsieh and Turnbull (1996) outperforms, in some sense, the MDE given by Davidov and Nov (2012). Both estimators are obtained by minimizing a measure of distance between the unknown binormal ROC curve and the empirical ROC curve. The empirical ROC curve, being a step function, often gives unsatisfactory nonparametric estimates of the ROC curve for small sample sizes. Therefore, the second purpose of this work is to introduce modifications of these known measures of distance by replacing the underlying empirical ROC curve with its continuous nonparametric counterparts. Another modification of the Davidov and Nov (2012) approach stems from widening the domain taken into account when the distance between the empirical and the binormal ROC curve is calculated. In this paper, a total of seven new estimators in the binormal model are introduced and their performance is compared in a simulation study.
The paper is organized as follows. In Sect. 2 we recall the MDEs of the binormal ROC curve parameters considered in the papers of Hsieh and Turnbull (1996) and Davidov and Nov (2012). Then we propose a modification of the Davidov and Nov estimator, and some new MDEs obtained by replacing the empirical ROC curve with the Bayesian bootstrap estimator of the ROC curve (see Gu and Ghosal 2008) in the measures of distance considered by Hsieh and Turnbull (1996) and Davidov and Nov (2012). We prove the consistency of the proposed estimators. We also recall two smooth nonparametric estimators of the ROC curve, namely the kernel estimator considered by Lloyd (1998) and the estimator proposed by Jokiel-Rokita and Pulit (2013), which we also use to obtain MDEs of the binormal ROC curve. Results from simulation studies are provided in Sect. 3. In Sect. 4 a real data analysis is discussed. The paper ends with some concluding remarks in Sect. 5.
2 Minimum distance estimation of the ROC curve
2.1 Minimum distance estimator of Hsieh and Turnbull
2.2 Minimum distance estimator of Davidov and Nov
The estimates of the parameters \(\mu \) and \(\sigma \) computed with \(b_m'\) instead of b in (12)–(15) will be denoted by \(\hat{\mu }_{DNM}\) and \({\hat{\sigma }}_{DNM},\) respectively. Under the same assumptions, these modified estimators are consistent and asymptotically normal, like the original estimators of Davidov and Nov (see Davidov and Nov 2012, Theorems 1 and 2).
2.3 Minimum distance estimators of the binormal ROC curve parameters based on BB estimator of the ROC curve
Remark 1
 (C1) The continuous cdf F is twice differentiable on \((\alpha ,\beta )\), the derivative \(F'=f \ne 0\) on \((\alpha ,\beta ),\) and for some \(\gamma >0,\)$$\begin{aligned} \sup _{x\in (\alpha ,\beta )}\big \{F(x)(1-F(x))|f'(x)|/f^2(x)\big \}\le \gamma . \end{aligned}$$
 (C2) Let the cdf’s F and G satisfy condition (C1), and additionally$$\begin{aligned} \sup _{x\in (\alpha ,\beta )}\left\{ F(x)(1-F(x))\Big |\frac{g'(x)}{f^2(x)}\Big |\right\}<\infty ,~\sup _{x\in (\alpha ,\beta )}\left\{ F(x)(1-F(x))\Big |\frac{g(x)}{f(x)}\Big |\right\} <\infty . \end{aligned}$$
The following lemma can be proved in an analogous manner to Lemma 1 in Davidov and Nov (2012).
Lemma 1
Under the above assumptions, \(a_m'\rightarrow 0\) and \(b_m'\rightarrow 1\) a.s., as \(m\rightarrow \infty \).
Theorem 1
Under assumptions (C1)–(C2), \({\hat{\mu }}_{DNB}\rightarrow \mu \) and \({\hat{\sigma }}_{DNB}\rightarrow \sigma \) in probability, as \(n\rightarrow \infty \), and hence the estimator \(ROC_{mn}^{DNB}\) of the binormal ROC curve converges pointwise to the true ROC curve on (0, 1).
A proof of Theorem 1 is given in the Appendix.
2.4 Minimum distance estimators of the binormal ROC curve parameters based on smooth nonparametric estimators of the ROC curve
The empirical ROC curve retains many properties of the empirical distribution function. It converges uniformly to the theoretical curve (Hsieh and Turnbull 1996), but it is not continuous and not very accurate for small sample sizes. The idea behind the semiparametric procedures of Hsieh and Turnbull, as well as Davidov and Nov, is to minimize a distance between the binormal ROC curve given by (2) and the empirical one. In this section we propose MDEs of the binormal curve obtained by replacing the empirical ROC curve, in measures (5) and (8), with its continuous nonparametric counterparts. Consequently, each nonparametric estimator of the ROC curve considered leads to two new semiparametric minimum distance estimators.
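The general minimum distance idea can be illustrated with a short sketch. It is not the procedure of Hsieh and Turnbull or of Davidov and Nov, whose distance measures are not reproduced here: it assumes the parametrization \(X\sim N(0,1)\), \(Y\sim N(\mu ,\sigma ^2)\), so that \(ROC(t)=\Phi ((\mu +\Phi ^{-1}(t))/\sigma )\), uses a plain integrated squared difference as the distance, and replaces numerical optimization with an exhaustive search over hypothetical candidate values. Any nonparametric estimate evaluated on the grid could be passed as `roc_hat_values`; the exact binormal curve is used below only as a stand-in.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def binormal_roc(t, mu, sigma):
    """Binormal ROC curve under the assumed parametrization
    X ~ N(0, 1) (nondiseased), Y ~ N(mu, sigma^2) (diseased)."""
    return STD_NORMAL.cdf((mu + STD_NORMAL.inv_cdf(t)) / sigma)


def squared_distance(roc_hat_values, mu, sigma, grid):
    """Midpoint-rule approximation of the integral over (0, 1) of
    (roc_hat(t) - ROC(t; mu, sigma))^2."""
    total = sum(
        (r - binormal_roc(t, mu, sigma)) ** 2
        for r, t in zip(roc_hat_values, grid)
    )
    return total / len(grid)


def minimum_distance_fit(roc_hat_values, grid, mu_candidates, sigma_candidates):
    """Exhaustive search for the (mu, sigma) pair with the smallest distance."""
    best = min(
        (squared_distance(roc_hat_values, mu, sigma, grid), mu, sigma)
        for mu in mu_candidates
        for sigma in sigma_candidates
    )
    return best[1], best[2]


# Midpoints of 200 equal subintervals of (0, 1).
grid = [(i + 0.5) / 200 for i in range(200)]
# Hypothetical candidate values for the search.
mu_candidates = [i / 10 for i in range(0, 31)]
sigma_candidates = [i / 10 for i in range(5, 31)]

# Stand-in for a nonparametric ROC estimate: the exact curve with mu=1, sigma=1.5.
roc_hat_values = [binormal_roc(t, 1.0, 1.5) for t in grid]
mu_fit, sigma_fit = minimum_distance_fit(
    roc_hat_values, grid, mu_candidates, sigma_candidates
)
print(mu_fit, sigma_fit)
```

Because the stand-in curve is itself binormal with parameters on the candidate lists, the search recovers them; with an empirical or smoothed estimate in its place, the minimizer would only approximate the true parameters.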
2.4.1 Kernel estimator of the ROC curve
2.4.2 Estimator of the ROC curve by smoothing the sample distribution functions
Minimum distance estimators of the parameter \(\theta ,\) based on the nonparametric ROC curve estimator \(ROC_{mn}^S\) applied in (4) and (7) instead of the estimator \(ROC_{mn},\) will be denoted by \({\hat{\theta }}_{HTS}\) and \(\hat{\theta }_{DNS},\) respectively.
3 Simulation study

Investigate the accuracy of the original minimum distance estimators considered by Davidov and Nov (2012) in comparison with their modification proposed in Sect. 2.2,

Compare the accuracy of the minimum distance estimators of the binormal ROC curve parameters proposed by Hsieh and Turnbull (1996) with those considered by Davidov and Nov (2012) (answer the question: which measure of distance provides more accurate estimators),

Compare the accuracy of the minimum distance estimators considered by Hsieh and Turnbull (1996) and Davidov and Nov (2012) with their counterparts obtained by replacing the empirical ROC curve with the BB estimator or with the smooth nonparametric estimators of the ROC curve (the kernel estimator and the estimator proposed by Jokiel-Rokita and Pulit 2013).
All nonparametric estimators were calculated on a regular grid with spacing 0.0001. For the kernel estimator we additionally used a four times denser support grid, in order to compute the inverse of the cdf estimator \({F_m^K}^{-1}\) with sufficient accuracy. Tests showed that a further increase of the grid density left the simulation results virtually unchanged. The semiparametric minimum distance estimators were then calculated from the nonparametric ones. In the study, nine distinct semiparametric estimators were considered: five based on the minimum distance approach of Davidov and Nov (2012) (D–N estimators for short) and four based on the measure of distance considered by Hsieh and Turnbull (1996) (H–T estimators for short). For all D–N estimators except the original DN, the integration endpoints were calculated according to equation (19) with the appropriate nonparametric ROC estimator plugged in. In practice, due to the finite distance between grid points, there is no need to introduce the constant \(\varepsilon _n\).
Table 1 Estimated bias and MSE (in parentheses) of the estimators of the binormal ROC curve parameters \(\mu \) and \(\sigma \)
AUC  \(\sigma \)  Estimator  \(\hat{\mu }\), \(n=m=15\)  \(\hat{\sigma }\), \(n=m=15\)  \(\hat{\mu }\), \(n=m=20\)  \(\hat{\sigma }\), \(n=m=20\)  \(\hat{\mu }\), \(n=m=100\)  \(\hat{\sigma }\), \(n=m=100\)
0.75  1  DN  0.248 (1.009)  0.365 (0.984)  0.160 (0.427)  0.224 (0.346)  0.022 (0.030)  0.026 (0.017)
0.75  1  DNM  0.218 (0.557)  0.350 (0.484)  0.156 (0.292)  0.229 (0.221)  0.027 (0.030)  0.031 (0.016)
0.75  1  DNS  0.028 (0.211)  0.055 (0.119)  0.051 (0.149)  0.061 (0.074)  0.012 (0.028)  0.013 (0.013)
0.75  1  DNK  0.151 (0.423)  0.126 (0.362)  0.119 (0.243)  0.103 (0.199)  0.025 (0.030)  0.026 (0.016)
0.75  1  DNB  0.145 (0.330)  0.147 (0.259)  0.112 (0.185)  0.089 (0.133)  0.020 (0.027)  −0.001 (0.020)
0.75  1  HT  0.114 (0.392)  0.126 (0.287)  0.083 (0.228)  0.081 (0.162)  0.010 (0.029)  0.008 (0.020)
0.75  1  HTS  0.018 (0.183)  0.037 (0.127)  0.025 (0.146)  0.026 (0.098)  0.007 (0.028)  0.005 (0.019)
0.75  1  HTK  0.114 (0.390)  0.126 (0.286)  0.083 (0.227)  0.081 (0.162)  0.009 (0.029)  0.008 (0.020)
0.75  1  HTB  0.197 (0.486)  0.290 (0.363)  0.142 (0.273)  0.203 (0.194)  0.018 (0.030)  0.029 (0.021)
0.75  2  DN  0.662 (5.162)  0.793 (5.185)  0.348 (1.579)  0.409 (1.486)  0.025 (0.100)  0.032 (0.089)
0.75  2  DNM  0.484 (2.316)  0.616 (2.432)  0.301 (1.045)  0.369 (0.967)  0.030 (0.096)  0.039 (0.085)
0.75  2  DNS  −0.078 (0.452)  −0.058 (0.491)  −0.058 (0.374)  −0.048 (0.368)  −0.031 (0.084)  −0.031 (0.071)
0.75  2  DNK  0.317 (2.060)  0.332 (2.544)  0.210 (0.899)  0.221 (1.002)  0.029 (0.096)  0.037 (0.085)
0.75  2  DNB  0.376 (2.011)  0.482 (2.612)  0.253 (0.884)  0.324 (1.081)  0.049 (0.100)  0.070 (0.095)
0.75  2  HT  0.326 (1.838)  0.412 (2.229)  0.201 (0.822)  0.258 (0.906)  0.022 (0.090)  0.041 (0.094)
0.75  2  HTS  −0.135 (0.358)  −0.143 (0.453)  −0.111 (0.288)  −0.127 (0.356)  −0.029 (0.074)  −0.032 (0.077)
0.75  2  HTK  0.329 (1.968)  0.416 (2.404)  0.201 (0.818)  0.259 (0.902)  0.021 (0.090)  0.041 (0.094)
0.75  2  HTB  0.465 (2.215)  0.628 (2.730)  0.301 (0.967)  0.416 (1.082)  0.040 (0.094)  0.073 (0.100)
0.85  1  DN  0.555 (2.651)  0.641 (2.127)  0.389 (1.553)  0.407 (1.024)  0.054 (0.051)  0.047 (0.028)
0.85  1  DNM  0.442 (1.422)  0.561 (1.158)  0.321 (0.679)  0.361 (0.439)  0.061 (0.049)  0.053 (0.026)
0.85  1  DNS  0.011 (0.215)  0.097 (0.119)  0.053 (0.191)  0.092 (0.087)  0.045 (0.045)  0.049 (0.020)
0.85  1  DNK  0.228 (0.934)  0.195 (0.755)  0.190 (0.465)  0.134 (0.312)  0.054 (0.047)  0.044 (0.026)
0.85  1  DNB  0.109 (0.472)  0.084 (0.356)  0.085 (0.234)  0.024 (0.160)  −0.005 (0.040)  −0.037 (0.031)
0.85  1  HT  0.241 (0.850)  0.256 (0.565)  0.181 (0.432)  0.166 (0.247)  0.025 (0.041)  0.024 (0.027)
0.85  1  HTS  −0.036 (0.185)  0.032 (0.129)  −0.014 (0.143)  0.003 (0.092)  0.009 (0.035)  0.005 (0.023)
0.85  1  HTK  0.242 (0.848)  0.257 (0.563)  0.181 (0.432)  0.166 (0.247)  0.025 (0.041)  0.024 (0.027)
0.85  1  HTB  0.384 (1.041)  0.440 (0.698)  0.294 (0.539)  0.314 (0.307)  0.045 (0.045)  0.055 (0.030)
0.85  2  DN  1.066 (10.041)  1.109 (7.524)  0.970 (9.465)  0.839 (5.630)  0.073 (0.189)  0.042 (0.125)
0.85  2  DNM  0.825 (4.513)  0.928 (3.729)  0.760 (4.453)  0.685 (2.824)  0.078 (0.174)  0.046 (0.117)
0.85  2  DNS  −0.340 (0.569)  −0.173 (0.564)  −0.193 (0.543)  −0.136 (0.485)  −0.043 (0.146)  −0.048 (0.106)
0.85  2  DNK  0.332 (3.520)  0.335 (3.361)  0.400 (3.812)  0.283 (2.675)  0.069 (0.174)  0.038 (0.120)
0.85  2  DNB  0.310 (3.368)  0.370 (3.384)  0.329 (3.548)  0.254 (2.651)  0.052 (0.186)  0.023 (0.147)
0.85  2  HT  0.469 (3.071)  0.550 (2.881)  0.478 (3.265)  0.429 (2.278)  0.067 (0.144)  0.055 (0.121)
0.85  2  HTS  −0.447 (0.490)  −0.308 (0.510)  −0.354 (0.374)  −0.324 (0.440)  −0.106 (0.094)  −0.123 (0.101)
0.85  2  HTK  0.466 (2.995)  0.548 (2.831)  0.476 (3.258)  0.428 (2.266)  0.066 (0.144)  0.056 (0.121)
0.85  2  HTB  0.678 (3.793)  0.774 (3.566)  0.647 (3.948)  0.603 (2.745)  0.103 (0.156)  0.096 (0.129)
Table 2 Simulated mean integrated square error, multiplied by 100, for AUC = 0.75
Estimator  \(n=m=15\), \(\sigma =1\)  \(n=m=15\), \(\sigma =\frac{4}{3}\)  \(n=m=15\), \(\sigma =2\)  \(n=m=20\), \(\sigma =1\)  \(n=m=20\), \(\sigma =\frac{4}{3}\)  \(n=m=20\), \(\sigma =2\)  \(n=m=100\), \(\sigma =1\)  \(n=m=100\), \(\sigma =\frac{4}{3}\)  \(n=m=100\), \(\sigma =2\)
\(ROC_{mn}\)  1.807  1.652  1.455  1.424  1.250  1.120  0.284  0.256  0.231 
\(ROC_{mn}^S\)  1.395  1.274  1.195  1.161  1.010  0.966  0.271  0.242  0.228 
\(ROC_{mn}^K\)  1.788  1.634  1.437  1.414  1.241  1.111  0.284  0.256  0.230 
\(ROC_{mn}^{BB}\)  1.445  1.355  1.241  1.163  1.036  0.968  0.249  0.229  0.211 
DN  1.512  1.366  1.218  1.141  0.985  0.912  0.198  0.181  0.176 
DNM  1.344  1.209  1.093  1.039  0.895  0.843  0.193  0.177  0.172 
DNS  1.329  1.072  1.002  0.862  0.780  0.789  0.185  0.173  0.171 
DNK  1.479  1.370  1.293  1.119  0.981  0.941  0.194  0.178  0.172 
DNB  1.215  1.183  1.143  0.975  0.897  0.881  0.199  0.183  0.173 
HT  1.403  1.275  1.156  1.104  0.947  0.884  0.209  0.187  0.176 
HTS  1.222  1.108  1.041  0.999  0.854  0.820  0.207  0.184  0.174 
HTK  1.404  1.275  1.154  1.104  0.948  0.883  0.209  0.187  0.176 
HTB  1.335  1.231  1.130  1.062  0.922  0.868  0.207  0.187  0.175 
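Table 3 Simulated mean integrated square error, multiplied by 100, for AUC = 0.85 (same layout as Table 2)
Estimator  \(n=m=15\), \(\sigma =1\)  \(n=m=15\), \(\sigma =\frac{4}{3}\)  \(n=m=15\), \(\sigma =2\)  \(n=m=20\), \(\sigma =1\)  \(n=m=20\), \(\sigma =\frac{4}{3}\)  \(n=m=20\), \(\sigma =2\)  \(n=m=100\), \(\sigma =1\)  \(n=m=100\), \(\sigma =\frac{4}{3}\)  \(n=m=100\), \(\sigma =2\)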
\(ROC_{mn}\)  1.379  1.183  0.953  1.078  0.916  0.756  0.223  0.192  0.162 
\(ROC_{mn}^S\)  0.971  0.892  0.892  0.790  0.700  0.709  0.201  0.176  0.178 
\(ROC_{mn}^K\)  1.364  1.169  0.939  1.070  0.909  0.749  0.223  0.192  0.162 
\(ROC_{mn}^{BB}\)  1.091  0.942  0.786  0.872  0.748  0.644  0.192  0.169  0.147 
DN  1.242  1.025  0.811  0.909  0.753  0.632  0.159  0.140  0.128 
DNM  1.037  0.860  0.690  0.781  0.653  0.556  0.153  0.135  0.124 
DNS  0.811  0.719  0.738  0.593  0.539  0.566  0.145  0.131  0.127 
DNK  1.046  0.978  0.949  0.802  0.732  0.712  0.153  0.137  0.125 
DNB  0.771  0.767  0.750  0.645  0.620  0.617  0.161  0.150  0.131 
HT  1.048  0.884  0.721  0.811  0.679  0.576  0.156  0.134  0.120 
HTS  0.857  0.776  0.767  0.681  0.594  0.592  0.151  0.129  0.122 
HTK  1.048  0.883  0.719  0.811  0.679  0.576  0.157  0.134  0.120 
HTB  0.988  0.834  0.693  0.779  0.653  0.564  0.156  0.134  0.120 
Table 4 Estimated parameters for Tupikowski’s kidney cancer data for hemoglobin level (HL) and fibrinogen concentration (FC)
Estimator  HL: \({\hat{\mu }}\)  HL: \({\hat{\sigma }}\)  HL: AUC  FC: \({\hat{\mu }}\)  FC: \({\hat{\sigma }}\)  FC: AUC
DN  0.899  1.391  0.7002  0.709  1.134  0.6805 
DNM  0.837  1.067  0.7165  0.688  0.998  0.6868 
DNS  0.884  1.192  0.7149  0.782  1.117  0.6990 
DNK  0.881  1.082  0.7250  0.786  1.085  0.7030 
DNB  0.998  1.187  0.7399  0.859  1.111  0.7173 
HT  0.857  1.301  0.6992  0.629  1.025  0.6699 
HTS  0.853  1.333  0.6957  0.675  1.085  0.6763 
HTK  0.915  1.337  0.7081  0.689  1.058  0.6820 
HTB  0.877  1.135  0.7190  0.699  0.931  0.6955 
Based on the simulations, we may also address the influence of replacing the empirical ROC curve with other nonparametric estimators on the accuracy of the estimated binormal ROC curve. In all models considered, the semiparametric estimators based on the smoothed empirical ROC curve, \(ROC_{mn}^S(t)\), performed better than their counterparts based on the empirical curve \(ROC_{mn}(t)\) for both distance measures. The bias and MSE of \({\hat{\mu }}_{DNS}\) and \({\hat{\sigma }}_{DNS}\) are considerably smaller than those of \({\hat{\mu }}_{DNM}\) and \(\hat{\sigma }_{DNM}\), respectively. Similar conclusions can be drawn when comparing HTS with the original HT procedure. For small sample sizes, the mean square error of the estimates of both parameters decreases, by a factor of 4.5 on average, when the underlying empirical ROC curve is replaced with its smoothed counterpart (24). Naturally, the advantage of the estimates based on \(ROC_{mn}^S(t)\) over those based on \(ROC_{mn}(t)\) decreases as the sample size increases. However, no significant improvement of the parameter estimates is observed when the kernel or BB methods are employed. For the methods based on the Davidov and Nov approach, where one minimizes the objective function given by (9), the estimated biases and MSEs of the estimators \({\hat{\theta }}_{DNK}\) and \({\hat{\theta }}_{DNB}\) are only slightly reduced in comparison with the DNM method. Furthermore, for the HTK and HTB methods even some increase of bias and MSE is observed in comparison with the original minimum distance procedure of Hsieh and Turnbull.
Replacing the underlying empirical ROC curve with its smoothed counterpart also leads to a decrease of the mean integrated square error (MISE) of both semiparametric and nonparametric estimators. For the eighteen binormal models considered in Tables 2 and 3, the DNS method always outperforms DN, and in fifteen cases it yields a smaller MISE than the DNM estimator. In fact, for AUC = 0.75, the DNS estimator achieves the lowest MISE among all estimators considered in 8 out of 9 comparisons. The HTS estimator also outperforms HT in 15 out of 18 comparisons. Some improvement of the estimates is observed when the bootstrap estimator is employed (DNB and HTB methods). Consequently, the simulation study shows that replacing the empirical ROC curve (3) with its smoothed counterpart (24) significantly improves the minimum distance estimates of the binormal ROC curve.
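To make the MISE criterion concrete, the sketch below shows one way such a quantity could be approximated for a fitted binormal curve. It is an illustration only, not the simulation code used in the study: the parametrization \(ROC(t)=\Phi ((\mu +\Phi ^{-1}(t))/\sigma )\), the midpoint rule, and the three replicate estimates are all assumptions introduced here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def binormal_roc(t, mu, sigma):
    """Binormal ROC curve under the assumed parametrization
    X ~ N(0, 1), Y ~ N(mu, sigma^2)."""
    return STD_NORMAL.cdf((mu + STD_NORMAL.inv_cdf(t)) / sigma)


def integrated_squared_error(mu_hat, sigma_hat, mu, sigma, n_points=1000):
    """Midpoint-rule approximation of the integral over (0, 1) of the
    squared difference between the fitted and the true binormal ROC curve."""
    total = 0.0
    for i in range(n_points):
        t = (i + 0.5) / n_points
        diff = binormal_roc(t, mu_hat, sigma_hat) - binormal_roc(t, mu, sigma)
        total += diff * diff
    return total / n_points


def simulated_mise(estimates, mu, sigma):
    """Average the integrated squared error over Monte Carlo replicates."""
    errors = [integrated_squared_error(m, s, mu, sigma) for m, s in estimates]
    return sum(errors) / len(errors)


# Hypothetical (mu_hat, sigma_hat) pairs from three replicates.
replicates = [(0.9, 1.1), (1.2, 0.8), (1.0, 1.0)]
mise_x100 = 100 * simulated_mise(replicates, mu=1.0, sigma=1.0)
print(mise_x100)
```

The factor of 100 mirrors the scaling used in Tables 2 and 3; in a real study the list of replicates would come from repeated sampling and refitting.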
4 Real data analysis
To illustrate all the semiparametric estimators considered, we apply them to the data analysed in the paper of Tupikowski et al. (2012). In this dataset the effectiveness of combined treatment with interferon alpha and metronomic cyclophosphamide in patients with metastatic kidney cancer was studied in terms of hemoglobin level (HL) and serum fibrinogen concentration (FC). The dataset contains 31 observations in total: 14 with and 17 without clinical response. A low value of HL or FC has been recognized as a negative predictor of treatment response and is associated with short survival. The estimates of the binormal ROC curve parameters for HL and FC as predictive factors are given in Table 4 for all methods considered. The estimated values of AUC are also tabulated. Interestingly, while the estimates of the parameters \(\mu \) and \(\sigma \) vary between methods, the estimates of AUC are close to each other and differ by only 7% for both HL and FC.
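The relation between the parameter estimates and the tabulated AUC values can be checked with a few lines of code. The sketch assumes the parametrization \(X\sim N(0,1)\), \(Y\sim N(\mu ,\sigma ^2)\), under which \(AUC=\Phi (\mu /\sqrt{1+\sigma ^2})\); this formula is not stated in the text above, but it appears to agree with the AUC columns of Table 4 up to the rounding of \({\hat{\mu }}\) and \({\hat{\sigma }}\).

```python
from math import sqrt
from statistics import NormalDist


def binormal_auc(mu, sigma):
    """Area under the binormal ROC curve, assuming X ~ N(0, 1) for the
    nondiseased and Y ~ N(mu, sigma^2) for the diseased population."""
    return NormalDist().cdf(mu / sqrt(1.0 + sigma ** 2))


# (estimator, variable, mu_hat, sigma_hat, tabulated AUC) taken from Table 4.
rows = [
    ("DN", "HL", 0.899, 1.391, 0.7002),
    ("DNM", "HL", 0.837, 1.067, 0.7165),
    ("DNB", "FC", 0.859, 1.111, 0.7173),
]
for name, variable, mu_hat, sigma_hat, auc_table in rows:
    auc = binormal_auc(mu_hat, sigma_hat)
    print(name, variable, round(auc, 4), auc_table)
```

Small discrepancies in the last digit are to be expected, since the tabulated parameter estimates are rounded to three decimals.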
5 Conclusions and some prospects
In this article seven new estimators of the binormal ROC curve in the semiparametric setting have been proposed. The new estimators originate from the minimum distance concept applied to ROC curve estimation by Hsieh and Turnbull (1996) and recently revisited by Davidov and Nov (2012). In the original MDE procedures one minimizes a measure of distance between the binormal ROC curve, characterized by the two parameters \(\mu \) and \(\sigma \), and the empirical ROC curve. In our methods we propose to replace the \(ROC_{mn}\) estimator, which is not continuous and not very accurate for small sample sizes, with other nonparametric estimators of the ROC curve. Procedures involving the kernel, Bayesian bootstrap and smoothed ROC curve estimators were considered. Moreover, for estimators based on the Davidov and Nov (2012) approach, the role of appropriate integration limits was emphasized.
The small-sample performance of the proposed estimators was investigated numerically and compared with the original procedures of Davidov and Nov (2012) and Hsieh and Turnbull (1996). The biggest improvement, both in terms of the accuracy of the parameter estimates and in terms of MISE, was observed for the estimators based on the smoothed nonparametric ROC curve estimator \(ROC_{mn}^S\) (see Sect. 2.4.2). For small samples, we observed that replacing \(ROC_{mn}\) with \(ROC_{mn}^S\) in the minimum distance procedures can reduce the MSE of the estimators of \(\mu \) and \(\sigma \) by up to an order of magnitude, and by a factor of 4.5 on average. The goodness of fit of the estimated ROC curve to the true ROC curve is also improved, as indicated by the lower mean integrated square error. Employing the BB estimator improves the performance of the MDEs much less, while using the kernel estimator sometimes leads to even less accurate semiparametric estimates of the ROC curve.
In future research we are going to examine the asymptotic equivalence of the estimators considered. In particular, the asymptotic properties of the DNS and HTS estimators need further investigation, since these methods clearly outperform the others. In fact, the smoothed nonparametric estimator of the ROC curve introduced by Jokiel-Rokita and Pulit (2013) seems to be a very promising method, and a theoretical investigation of its asymptotic properties is of interest to us. We are also going to study the robustness of the considered estimators to model misspecification.
References
Branscum AJ, Johnson WO, Hanson TE, Gardner IA (2008) Bayesian semiparametric ROC curve estimation and disease diagnosis. Stat Med 27:2474–2496
Cai T, Moskowitz CS (2004) Semiparametric estimation of the binormal ROC curve for a continuous diagnostic test. Biostatistics 5(4):573–586
Cai T, Pepe MS (2002) Semiparametric receiver operating characteristic analysis to evaluate biomarkers for disease. J Am Stat Assoc 97(460):1099–1107
Davidov O, Nov Y (2009) Minimum-norm estimation for binormal receiver operating characteristic (ROC) curves. Biometrical J 51(6):1030–1046
Davidov O, Nov Y (2012) Improving an estimator of Hsieh and Turnbull for the binormal ROC curve. J Stat Plan Inference 142(4):872–877
Dorfman DD, Alf E (1969) Maximum likelihood estimation of parameters of signal detection theory and determination of confidence interval - rating method data. J Math Psychol 6:487–496
Erkanli A, Sung M, Costello EJ, Angold A (2006) Bayesian semiparametric ROC analysis. Stat Med 25:3905–3928
Gonçalves L, Subtil A, Oliveira MR, De Zea Bermudez P (2014) ROC curve estimation: an overview. REVSTAT Stat J 12(1):1–20
Gu J, Ghosal S (2008) Strong approximations for resample quantile process and applications to ROC methodology. J Nonparametr Stat 20(3):229–240
Gu J, Ghosal S (2009) Bayesian ROC curve estimation under binormality using a rank likelihood. J Stat Plan Inference 139:2076–2083
Gu J, Ghosal S, Roy A (2008) Bayesian bootstrap estimation of ROC curve. Stat Med 27:5407–5420
Hall PG, Hyndman RJ (2003) Improved methods for bandwidth selection when estimating ROC curves. Stat Prob Lett 64(2):181–189
Hanley JA (1988) The robustness of the “binormal” assumptions used in fitting ROC curves. Med Decis Mak 8:197–203
Hanley JA (1996) The use of binormal model for parametric ROC analysis of quantitative diagnostic tests. Stat Med 15:1575–1585
Hsieh F, Turnbull B (1996) Nonparametric and semiparametric estimation of the receiver operating characteristic curve. Ann Stat 24(1):25–40
Jokiel-Rokita A, Pulit M (2013) Nonparametric estimation of the ROC curve based on smoothed empirical distribution function. Stat Comput 23:703–712
Krzanowski W, Hand D (2009) ROC curves for continuous data, volume 111 of C & H/CRC monographs on statistics & applied probability. Chapman and Hall/CRC, Boca Raton
Lloyd CJ (1998) Using smoothed receiver operating characteristic curves to summarize and compare diagnostic systems. J Am Stat Assoc 93(444):1356–1364
Lloyd CJ (2002) Estimation of a convex ROC curve. Stat Prob Lett 59(1):99–111
Lloyd C, Yong Z (1999) Kernel estimators of the ROC curve are better than empirical. Stat Prob Lett 44(3):221–228
Metz CE, Herman BA, Shen JH (1998) Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med 17:1033–1053
Millar PW (1984) A general approach to the optimality of minimum distance estimators. Trans Am Math Soc 286:377–418
Mitzenmacher M, Upfal E (2005) Probability and computing: randomized algorithms and probabilistic analysis. Cambridge University Press, New York
Pepe MS (2003) The statistical evaluation of medical tests for classification and prediction. Oxford University Press, Oxford
Qin J, Zhang B (2003) Using logistic regression procedures for estimating receiver operating characteristic curves. Biometrika 90(3):585–596
Rubin DB (1981) The Bayesian bootstrap. Ann Stat 9(1):130–134
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London
Swets JA (1986) Form of empirical ROCs in discrimination and diagnostic tasks: implications for theory and measurement of performance. Psychol Bull 99:181–198
Tupikowski K, Dembowski J, Kołodziej A, Niezgoda T, Debiński P, Małkiewicz B, Szydełko T, Kowal P, Zdrojowy R (2012) C133 interferon alpha and metronomic cyclophosphamide for metastatic kidney cancer. Eur Urol Suppl 11(4):113–113
Wan S, Zhang B (2007) Smooth semiparametric receiver operating characteristic curves for continuous diagnostic tests. Stat Med 26:2565–2586
Wolfowitz J (1957) The minimum distance method. Ann Math Stat 28(1):75–88
Zhou XH, Harezlak J (2002) Comparison of bandwidth selection methods for kernel smoothing of ROC curves. Stat Med 21:2045–2055
Zhou XH, Lin H (2008) Semiparametric maximum likelihood estimates for ROC curves of continuous-scale tests. Stat Med 27:5271–5290
Zhou XH, Obuchowski NA, McClish DK (2002) Statistical methods in diagnostic medicine. Wiley, New York
Zou KH, Hall WJ (2000) Two transformation models for estimating an ROC curve derived from continuous data. J Appl Stat 27(5):621–631
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.