1 Introduction

The receiver operating characteristic (ROC) curve is commonly used to describe the accuracy of a medical or other diagnostic test, which classifies individuals into “non-diseased” and “diseased” categories. It is defined as a plot of the true positive rate against the false positive rate, or sensitivity versus 1-specificity, for various threshold values. Over the years, it has been widely applied in many fields including biosciences, data mining, experimental psychology, finance, geosciences, machine learning, medicine, radiology, sociology and others. For a comprehensive review of the literature, see Zhou et al. (2002), Pepe (2003), Krzanowski and Hand (2009) and Gonçalves et al. (2014).

More precisely, let X and Y be the test results from a non-diseased population and a diseased population, respectively. Let F be a continuous cumulative distribution function (cdf) of the random variable X,  and G—a continuous cdf of the random variable Y. The ROC curve is defined as a plot of \(1-G(c)\) versus \(1-F(c)\) for \(-\infty \le c\le \infty ,\) or equivalently as a plot

$$\begin{aligned} ROC(t)=1-G(F^{-1}(1-t)), \end{aligned}$$
(1)

against t, for \(t\in [0,1].\)

A special feature of the ROC curve is that it is invariant to any increasing transformation of the data, i.e. if \(X'=h(X),\) and \(Y'=h(Y),\) for some increasing transformation h,  then the ROC curve corresponding to the distribution functions F and G is the same as the ROC curve corresponding to the distribution functions \(F'\) and \(G'\) of the random variables \(X'\) and \(Y',\) respectively.

In this paper we consider the problem of estimation of the ROC curve in the binormal model, i.e. we assume that after some increasing transformation h,  the random variables \(X'\) and \(Y'\) are normally distributed. Without loss of generality we can assume that \(X'\sim \mathcal{N}(0,1)\) and \(Y'\sim \mathcal{N}(\mu ,\sigma ^{2}).\) In this case the ROC curve has a simple parametric form

$$\begin{aligned} ROC(t)=\Phi \Big (\frac{\mu }{\sigma }+\frac{1}{\sigma }\Phi ^{-1}(t)\Big ), \end{aligned}$$
(2)

where \(\Phi \) is the cumulative distribution function of the standard normal distribution. Thus, in the binormal model, the estimation of the ROC curve reduces to the estimation of the parameters \(\mu \) and \(\sigma .\) The most common arguments in favor of using the binormal estimator are presented in Hanley (1988). Swets (1986) and Hanley (1988, 1996) also argue that the binormal estimator is robust.
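As a quick numerical check of (2), the following sketch evaluates the binormal ROC curve both from the parametric form and directly from definition (1) with \(F=\mathcal{N}(0,1)\) and \(G=\mathcal{N}(\mu ,\sigma ^{2}).\) The function names are illustrative and not part of the paper; \(\Phi \) and \(\Phi ^{-1}\) are taken from Python's standard-library `statistics.NormalDist`, whose `inv_cdf` requires \(0<t<1.\)

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Phi = STD_NORMAL.cdf, Phi^{-1} = STD_NORMAL.inv_cdf


def binormal_roc(t, mu, sigma):
    """Parametric form (2): Phi(mu/sigma + Phi^{-1}(t)/sigma), for 0 < t < 1."""
    return STD_NORMAL.cdf((mu + STD_NORMAL.inv_cdf(t)) / sigma)


def roc_from_definition(t, mu, sigma):
    """Definition (1) with F = N(0, 1) and G = N(mu, sigma^2)."""
    threshold = STD_NORMAL.inv_cdf(1.0 - t)             # F^{-1}(1 - t)
    return 1.0 - NormalDist(mu, sigma).cdf(threshold)   # 1 - G(threshold)


# Illustrative parameter values (the ones used in Fig. 1).
for t in (0.05, 0.25, 0.5, 0.9):
    print(t, binormal_roc(t, 1.8, 1.5), roc_from_definition(t, 1.8, 1.5))
```

The two columns printed for each t should agree up to floating-point rounding.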

Many different techniques have been proposed to solve the problem of semiparametric estimation of the ROC curve. For estimating ROC curves from discrete or grouped response data, the most commonly used procedure is that proposed by Dorfman and Alf (1969). Metz et al. (1998) developed an algorithm called LABROC, which groups continuous data into a finite number of ordered categories and then uses the maximum likelihood algorithm from Dorfman and Alf (1969). Hsieh and Turnbull (1996) proposed a generalized least squares procedure for grouped data and a minimum distance estimator (MDE), which does not require grouping data. The MDE of the binormal ROC curve was also considered in the papers of Davidov and Nov (2009, 2012). In the papers of Zou and Hall (2000), Cai and Moskowitz (2004) and Zhou and Lin (2008), maximum likelihood and pseudo-likelihood approaches to estimating the binormal ROC curve were considered. Techniques based on regression were also proposed (see for example Lloyd 2002; Cai and Pepe 2002; Qin and Zhang 2003; Wan and Zhang 2007). A Bayesian approach to the semiparametric estimation of the ROC curve was considered in the papers of Branscum et al. (2008), Erkanli et al. (2006), Gu et al. (2008) and Gu and Ghosal (2009). Gonçalves et al. (2014) give an overview of some developments in the estimation of the ROC curve, with particular emphasis on some frequentist and Bayesian methods which have been mostly employed in the medical setting.

This paper deals with minimum distance estimation of the binormal ROC curve. To the best of our knowledge, a minimum distance approach to estimating the binormal ROC curve parameters was considered only by Hsieh and Turnbull (1996) and Davidov and Nov (2009, 2012). In the paper of Davidov and Nov (2009) the central idea was to estimate the unknown function h (a transformation of X and Y to normal random variables) in two different ways; only one of the two estimates depended on the unknown parameters \(\mu \) and \(\sigma \) of the binormal ROC curve. Then, they estimated \(\mu \) and \(\sigma \) by the values that minimized a certain norm of the difference between the estimates of the function h. In this paper we do not develop this idea. A different approach is presented in the papers of Hsieh and Turnbull (1996) and Davidov and Nov (2012). They took into consideration two different measures of distance between the empirical and the theoretical ordinal dominance curve (ODC), a curve closely related to the ROC curve. Davidov and Nov (2012) showed that their MDE is consistent and asymptotically normally distributed and that it outperforms Hsieh and Turnbull’s original, grouped-data estimator, but it has not been compared with Hsieh and Turnbull’s MDE.

In this paper we compare the accuracy of the known MDE’s given by Hsieh and Turnbull (1996) and Davidov and Nov (2012). We find that the MDE given by Hsieh and Turnbull (1996) outperforms, in some sense, the MDE given by Davidov and Nov (2012). Both estimators are obtained by minimizing a distance measure between the unknown binormal ROC curve and the empirical ROC curve. The empirical ROC curve, being a step function, is often an unsatisfactory nonparametric estimator of the ROC curve for small sample sizes. Therefore, the second purpose of this work is to introduce modifications of these known measures of distance by replacing the underlying empirical ROC curve by its continuous nonparametric counterparts. Another modification of the Davidov and Nov (2012) approach widens the domain over which the distance between the empirical and the binormal ROC curve is calculated. In this paper, a total of seven new estimators in the binormal model are introduced and their performances are compared in a simulation study.

The paper is organized as follows. In Sect. 2 we recall the MDE’s of the binormal ROC curve parameters considered in the papers of Hsieh and Turnbull (1996) and Davidov and Nov (2012). Then we propose a modification of the Davidov and Nov estimator, and some new MDE’s by replacing the empirical ROC curve by the Bayesian bootstrap estimator of the ROC curve (see Gu and Ghosal 2008) in measures of distance considered by Hsieh and Turnbull (1996) and Davidov and Nov (2012). We prove the consistency of the estimators proposed. We also recall two smooth nonparametric estimators of the ROC curve, namely the kernel estimator considered by Lloyd (1998), and the estimator proposed by Jokiel-Rokita and Pulit (2013), which we also use to obtain MDE’s of the binormal ROC curves. Results from simulation studies are provided in Sect. 3. In Sect. 4 real data analysis is discussed. The paper ends with some concluding remarks in Sect. 5.

2 Minimum distance estimation of the ROC curve

In this section, we recall some known methods and provide some new methods of estimation of the parameters \(\mu \) and \(\sigma \) in the binormal model, based on the minimum distance concept. Minimum distance estimation has been studied extensively, beginning with the work of Wolfowitz (1957). The concept of minimum distance estimation of the binormal ROC curve parameters was introduced in the framework of estimation of the binormal ordinal dominance curve (ODC) given by \(D(t)=F(G^{-1}(t)),\) \(t\in [0,1].\) The ODC is closely related to the ROC curve and in the binormal model it has the following parametric form

$$\begin{aligned} D(t)=\Phi (\mu +\sigma \Phi ^{-1}(t)). \end{aligned}$$

However, in the course of this paper, we find it more convenient to construct all estimators of the unknown parameters \(\mu \) and \(\sigma \) with direct reference to the ROC curves. Therefore all results originally established for ODC curves will be rephrased in terms of ROC curves.

2.1 Minimum distance estimator of Hsieh and Turnbull

Assume that independent samples \(X_{1},\ldots ,X_{m}\) and \(Y_{1},\ldots ,Y_{n}\) from distributions with cdf’s F and G, respectively, are available. Denote by \(F_{m}\) and \(G_{n}\) the empirical distribution functions of \(X_{1},\ldots ,X_{m}\) and \(Y_{1},\ldots ,Y_{n},\) respectively, and the empirical quantile function by \(G_{n}^{-1}(t)=\inf \{y:G_{n}(y)\ge t\}.\) The empirical ROC curve is defined as

$$\begin{aligned} ROC_{mn}(t)=1-G_{n}(F_{m}^{-1}(1-t)), \quad \ t\in (0,1), \end{aligned}$$
(3)

while the empirical ODC curve is given by

$$\begin{aligned} D_{mn}(t)=F_m(G_n^{-1}(t)), \quad \ t\in (0,1). \end{aligned}$$
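The empirical ROC curve (3) can be computed directly from the two samples. The sketch below is one possible implementation, assuming untied data; the helper names are illustrative. The quantile \(F_{m}^{-1}(p)=\inf \{x:F_{m}(x)\ge p\}\) equals the order statistic with index \(\lceil mp\rceil \), and a small tolerance guards against floating-point rounding of \(mp.\)

```python
import math
from bisect import bisect_right


def empirical_cdf(sample):
    """Return x -> (number of observations <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def empirical_quantile(sample):
    """Return p -> inf{x : F_m(x) >= p}, i.e. the ceil(m p)-th order statistic."""
    ordered = sorted(sample)
    m = len(ordered)

    def quantile(p):
        k = math.ceil(m * p - 1e-12)       # tolerance against rounding of m * p
        return ordered[min(max(k, 1), m) - 1]

    return quantile


def empirical_roc(x, y):
    """ROC_mn(t) = 1 - G_n(F_m^{-1}(1 - t)), Eq. (3)."""
    g_n = empirical_cdf(y)
    f_m_inv = empirical_quantile(x)
    return lambda t: 1.0 - g_n(f_m_inv(1.0 - t))


# Toy data, chosen only for illustration.
roc_mn = empirical_roc([1, 2, 3, 4], [2.5, 3.5, 5, 6])
print([roc_mn(t) for t in (0.1, 0.25, 0.5)])
```

For the toy data the threshold at t = 0.25 is the third order statistic of the first sample, above which three of the four observations of the second sample lie.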

In the paper of Hsieh and Turnbull (1996), MDE’s of the ROC curve parameters are derived by finding the ODC curve that fits most closely to the empirical ODC curve using an \(L_{2}\) norm criterion. We adopt the original idea introduced by Hsieh and Turnbull (1996). More precisely, for \(\theta =(\mu ,\sigma )^{T},\) let us denote by

$$\begin{aligned} \xi _{mn}(\theta )=ROC_{mn}(t)-\Phi \Big (\frac{\mu }{\sigma }+\frac{1}{\sigma }\Phi ^{-1}(t)\Big ), \end{aligned}$$
(4)

and

$$\begin{aligned} \Vert \xi _{mn}(\theta )\Vert = \int _{0}^{1}\xi _{mn}^2(\theta )dt \end{aligned}$$
(5)

the \(L_{2}\)-distance measure between ROC(t) and \(ROC_{mn}(t).\)

The MDE \({\widehat{\theta }}=({\widehat{\mu }},{\widehat{\sigma }})^{T}\) of the parameter \(\theta \) is defined by

$$\begin{aligned} \Vert \xi _{mn}({\widehat{\theta }})\Vert =\inf _{\theta \in \Theta }\Vert \xi _{mn}(\theta )\Vert , \end{aligned}$$
(6)

where \(\Theta =\{\theta =(\mu ,\sigma )^{T}:\mu \in {\mathbb R},\sigma >1\}\), as in the paper of Hsieh and Turnbull (1996). The restriction that \(\sigma >1\) is not unreasonable if one thinks of the healthy response as “noise” and the diseased response as “noise plus signal”. However, we can avoid this restriction if we modify the distance criterion (5) so that the integral is over a closed interval excluding 0 and 1. In the sequel, we will denote the MDE \({\widehat{\theta }}\) by \({\widehat{\theta }}_{HT}=({\widehat{\mu }}_{HT},{\widehat{\sigma }}_{HT}).\) Using the theory developed by Millar (1984), Hsieh and Turnbull (1996) proved the asymptotic normality of their MDE of the parameter \(\theta \), but did not provide any concrete procedure to compute it. In Sect. 3, we describe an algorithm, used in the simulation study, to obtain the estimates \({\hat{\theta }}_{HT}.\)
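To make criterion (5) and definition (6) concrete, the following sketch approximates the \(L_{2}\)-distance by a midpoint rule and minimizes it over a small, user-supplied grid of candidate values. This is only an illustration under stated assumptions: the coarse grid search is a stand-in chosen for brevity and is not the algorithm used in the simulation study, and all names (`l2_distance`, `grid_search_mde`, `n_grid`) are hypothetical. The argument `roc_hat` can be any nonparametric ROC estimate given as a function of t.

```python
from statistics import NormalDist

PHI = NormalDist()


def binormal_roc(t, mu, sigma):
    """Binormal ROC curve (2)."""
    return PHI.cdf((mu + PHI.inv_cdf(t)) / sigma)


def l2_distance(roc_hat, mu, sigma, n_grid=200):
    """Midpoint-rule approximation of the integral in (5)."""
    total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) / n_grid             # midpoints stay strictly inside (0, 1)
        total += (roc_hat(t) - binormal_roc(t, mu, sigma)) ** 2
    return total / n_grid


def grid_search_mde(roc_hat, mu_values, sigma_values):
    """Return the candidate (mu, sigma) with the smallest approximate distance."""
    candidates = [(mu, sigma) for mu in mu_values for sigma in sigma_values]
    return min(candidates, key=lambda theta: l2_distance(roc_hat, *theta))


# Sanity check on a noiseless curve: the minimizer should be the generating pair.
target = lambda t: binormal_roc(t, 1.0, 1.5)
print(grid_search_mde(target, [0.5, 1.0, 1.5], [1.25, 1.5, 2.0]))
```

In practice a derivative-free optimizer started from a reasonable initial value would replace the grid search; the objective function stays the same.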

2.2 Minimum distance estimator of Davidov and Nov

Hsieh and Turnbull (1996) also proposed (in Remark 1), as an object for future research, to modify their measure of distance by applying the \(\Phi ^{-1}\) transformation to both \(D_{mn}(t)\) and D(t) which, in terms of the ROC curve, leads to the following counterpart

$$\begin{aligned} \nu _{mn}(\theta ) = \Phi ^{-1}(ROC_{mn}(t))-\Big (\frac{\mu }{\sigma }+\frac{1}{\sigma }\Phi ^{-1}(t)\Big ) \end{aligned}$$
(7)

of \(\xi _{mn}(\theta ).\) Davidov and Nov (2012) followed up on this suggestion and considered estimation of the parameter \(\theta \) based on minimization of the following objective function

$$\begin{aligned} \Vert \nu _{mn}(\theta )\Vert = \int _a^b \nu _{mn}^2(\theta )dt, \end{aligned}$$
(8)

where the integration endpoints \(0<a<b<1\) ensure that the last integral is finite. Namely, they considered the MDE

$$\begin{aligned} {\widehat{\theta }}_{DN}:=({\widehat{\mu }}_{DN},{\widehat{\sigma }}_{DN})={{\mathrm{arg\,min}}}_{\mu ,\sigma }\int _{a}^{b}\left[ \Phi ^{-1}(ROC_{mn}(t))-\Big (\frac{\mu }{\sigma }+\frac{1}{\sigma }\Phi ^{-1}(t)\Big )\right] ^{2}dt, \end{aligned}$$
(9)

where

$$\begin{aligned} a&=\min \{i/m: ROC_{mn}(i/m)>0, i=1,\ldots ,m\}, \end{aligned}$$
(10)
$$\begin{aligned} b&=\max \{i/m: ROC_{mn}(i/m)<1, i=1,\ldots ,m\}. \end{aligned}$$
(11)

The minimization problem given by (9) is convex and quadratic in \(\mu \) and \(\sigma \) and, unlike (6), it enjoys a closed-form solution

$$\begin{aligned} {\widehat{\mu }}_{DN}={\widehat{\sigma }}_{DN}{\widehat{S}}_{1}-{\widehat{S}}_{3}, \qquad {\widehat{\sigma }}_{DN}=\frac{{\widehat{S}}_{4}-{\widehat{S}}_{3}^{2}}{{\widehat{S}}_{2}-{\widehat{S}}_{1}{\widehat{S}}_{3}}, \end{aligned}$$

where

$$\begin{aligned} \widehat{S}_1&= \displaystyle \frac{1}{b-a}\int _a^b \Phi ^{-1}(ROC_{mn}(t))dt, \end{aligned}$$
(12)
$$\begin{aligned} \widehat{S}_2&= \displaystyle \frac{1}{b-a}\int _a^b \Phi ^{-1}(ROC_{mn}(t))\Phi ^{-1}(t)dt, \end{aligned}$$
(13)
$$\begin{aligned} \widehat{S}_3&= \displaystyle \frac{1}{b-a}\int _a^b \Phi ^{-1}(t)dt, \end{aligned}$$
(14)
$$\begin{aligned} \widehat{S}_4&= \displaystyle \frac{1}{b-a}\int _a^b \left( \Phi ^{-1}(t)\right) ^2dt. \end{aligned}$$
(15)

Please note that since we employed the ROC instead of the ODC curve, the formulas (12)–(15) differ from the corresponding formulas of Davidov and Nov (2012).
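The closed-form solution can be read as a least-squares line fit of \(\Phi ^{-1}(ROC_{mn}(t))\) on \(\Phi ^{-1}(t)\) with intercept \(\mu /\sigma \) and slope \(1/\sigma .\) The sketch below approximates the averages (12)–(15) by a midpoint rule and then applies the two formulas. It is a minimal illustration, assuming that the supplied curve `roc_hat` takes values strictly between 0 and 1 on [a, b]; the function name and the parameter `n_grid` are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist()


def dn_closed_form(roc_hat, a, b, n_grid=2000):
    """Approximate S_1, ..., S_4 of (12)-(15) and return (mu_hat, sigma_hat)."""
    s1 = s2 = s3 = s4 = 0.0
    for k in range(n_grid):
        t = a + (b - a) * (k + 0.5) / n_grid
        x = PHI.inv_cdf(t)                 # Phi^{-1}(t)
        y = PHI.inv_cdf(roc_hat(t))        # Phi^{-1}(ROC_hat(t))
        s1 += y
        s2 += x * y
        s3 += x
        s4 += x * x
    # Dividing by n_grid gives the averages (1 / (b - a)) * integral over [a, b].
    s1, s2, s3, s4 = (s / n_grid for s in (s1, s2, s3, s4))
    sigma_hat = (s4 - s3 ** 2) / (s2 - s1 * s3)
    mu_hat = sigma_hat * s1 - s3
    return mu_hat, sigma_hat


# Sanity check on a noiseless binormal curve with illustrative parameters.
exact = lambda t: PHI.cdf((1.8 + PHI.inv_cdf(t)) / 1.5)
print(dn_closed_form(exact, 0.05, 0.95))
```

When the input curve is exactly binormal, the transformed curve is exactly linear in \(\Phi ^{-1}(t)\), so the formulas return the generating parameters whatever the quadrature grid.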

The integration endpoints a, b were introduced to ensure that \(\Phi ^{-1}(ROC_{mn}(t))\ne \pm \infty \) and hence that the optimization problem (9) is well-defined. However, when the upper integration limit is selected according to Eq. (11), the difference between the empirical ROC curve and the true (binormal) ROC curve on the interval [b, c],  where \(c:=\min \{i/m:ROC_{mn}(i/m)=1, i=1,\ldots ,m\}\) (on the last step of \(ROC_{mn}\)), is not taken into account. We think that this loss of information influences the accuracy of the estimates for small sample sizes m and n. Hence, we propose a modification of the minimum distance estimator considered by Davidov and Nov by choosing the upper limit of integration just before the last jump of the empirical ROC curve. Since \(ROC_{mn}(t)\) is right-continuous, we take

$$\begin{aligned} b_m'=\sup \{t \in [0,1]: ROC_{mn}(t)<1\}-\varepsilon _m, \end{aligned}$$
(16)

where \(\varepsilon _m<1/m\) is a positive constant, which guarantees that \(\Phi ^{-1}(ROC_{mn}(t))<\infty \). Moreover, thanks to the right continuity of the empirical ROC curve, there is no need to introduce any modification for the lower integration endpoint (the lowest possible value is already provided by formula (10)).
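The difference between the endpoints (10), (11) and the modified upper endpoint (16) can be illustrated on a toy step function. The sketch assumes that the values \(ROC_{mn}(i/m),\) \(i=1,\ldots ,m,\) are supplied as a list, that the curve is a nondecreasing right-continuous step function with jumps only at multiples of 1/m (so that the supremum in (16) equals the first grid point at which the curve reaches 1), and that it is below 1 at some grid point. The default \(\varepsilon _m=1/(2m)\) is an arbitrary choice satisfying \(0<\varepsilon _m<1/m\); the function name is hypothetical.

```python
def dn_endpoints(roc_on_grid, eps=None):
    """roc_on_grid[i - 1] holds ROC_mn(i / m) for i = 1, ..., m."""
    m = len(roc_on_grid)
    if eps is None:
        eps = 0.5 / m                      # arbitrary value with 0 < eps < 1 / m
    grid = [(i / m, v) for i, v in enumerate(roc_on_grid, start=1)]
    a = min(t for t, v in grid if v > 0.0)     # Eq. (10)
    b = max(t for t, v in grid if v < 1.0)     # Eq. (11)
    c = min(t for t, v in grid if v >= 1.0)    # first grid point where the curve is 1
    b_modified = c - eps                       # Eq. (16)
    return a, b, b_modified


# Toy values of an empirical ROC curve with m = 5.
print(dn_endpoints([0.2, 0.6, 0.8, 1.0, 1.0]))
```

In the toy example the original upper endpoint stops one full step before the curve reaches 1, while the modified one extends the integration range into that last step.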

The estimates of the parameters \(\mu \) and \(\sigma \) computed with \(b_m'\) instead of b in (12)–(15) will be denoted by \(\hat{\mu }_{DNM}\) and \({\hat{\sigma }}_{DNM},\) respectively. It is clear that these modified estimators are consistent and asymptotically normal, like the original estimators of Davidov and Nov (see Davidov and Nov 2012, Theorems 1 and 2), under the same assumptions.

2.3 Minimum distance estimators of the binormal ROC curve parameters based on BB estimator of the ROC curve

In the paper of Gu and Ghosal (2008) the Bayesian bootstrap (BB) for the nonparametric estimation of the ROC curve and its functionals has been proposed (see also Gu et al. 2008). In this approach stochastic empirical distribution functions, introduced by Rubin (1981), are employed. Let \(U_1,\ldots ,U_{m-1}\) be iid uniform \(\mathcal {U}(0,1)\) random variables, independent of the data. Rubin’s stochastic empirical distribution function, say \(F_m^{(b)}\), based on the sample \(X_1,\ldots ,X_m,\) is defined as follows

$$\begin{aligned} F_m^{(b)}(x) = \left\{ \begin{array}{lll} 0, &{} \text {when}&{}x< X_{(1)}, \\ U_{(i)}, &{} \text {when}&{} X_{(i)} \le x < X_{(i+1)}, 1 \le i \le m-1, \\ 1, &{} \text {when}&{} x \ge X_{(m)}, \end{array} \right. \end{aligned}$$
(17)

where \(U_{(i)}\) denotes the i-th order statistic of the vector \((U_{1},\ldots ,U_{m-1}).\) The function \(F_m^{(b)}\) is a step function which at each point \(X_{(i)},\) \(i=1,\ldots ,m,\) jumps up by the random value \(U_{(i)}-U_{(i-1)}\), where \(U_{(0)}=0, U_{(m)}=1\). Let \(G_n^{(b)}\) be Rubin’s stochastic empirical distribution function based on the observations \(Y_1,\ldots ,Y_n\) from the second sample. In order to get a ROC curve estimator, say \(ROC_{mn}^{(b)},\) we proceed in the same way as in the case of the empirical ROC curve given by (3), and plug Rubin’s stochastic empirical distribution function \(G_n^{(b)}\) and quantile function \(F_m^{(b){-1}}\) into (1). Next, the BB estimate of the ROC curve is obtained by averaging over a large number of \(ROC_{mn}^{(b)}\) realizations, i.e.

$$\begin{aligned} ROC_{mn}^{BB}(t) = \frac{1}{B} \sum _{b=1}^B ROC_{mn}^{(b)}(t). \end{aligned}$$

The estimator \(ROC_{mn}^{BB}\) is a bandwidth-free nonparametric estimator and, because of averaging over two random variations, is “smoother” than \(ROC_{mn}\). The BB estimates of the ROC curve for two different values of B,  based on the samples of equal sizes \(n=m=15,\) together with the empirical and the true ROC curve, are presented in Fig. 1. As can be seen, even when we average over a small number of realizations, we obtain a “smoother” estimate than the empirical ROC curve.

Fig. 1 Comparison of empirical and BB estimates of ROC curve with the true binormal ROC curve for \(\mu = 1.8\) and \(\sigma =1.5\). The estimates are based on samples of sizes \(m=n=15.\)

Remark 1

An efficient three-step procedure for computing BB estimates, which does not require inverting the stochastic empirical distribution function (17), was proposed by Gu et al. (2008). In the first step auxiliary variables \(Z_j\) are defined, based on BB resampling distribution,

$$\begin{aligned} Z_j=1-F^\#(Y_j)=1-\sum _{i=1}^m p_i I(X_i \le Y_j), \end{aligned}$$

where \((p_1,\ldots ,p_m)\sim Dirichlet(m;1,\ldots ,1)\) independent of others. In the second step a random realization of ROC curve, \(ROC^\#_{mn}\), is generated as randomized distribution function of \(Z_1,\ldots ,Z_n\); we have

$$\begin{aligned} ROC^\#_{mn}(t)=\sum _{j=1}^n q_j I(Z_j \le t), \end{aligned}$$

where \((q_1,\ldots ,q_n)\sim Dirichlet(n;1,\ldots ,1)\) independent of others. In the last step the BB estimate of ROC curve is obtained by averaging over the ensemble of random ROC curves \(ROC_{mn}^{BB}(t)=mean(ROC^\#_{mn}(t))\). A convenient method for generating \((p_1,\ldots ,p_m)\sim Dirichlet(m;1,\ldots ,1)\) was also proposed by Gu.
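The three steps of Remark 1 can be sketched as follows. The code is a minimal illustration rather than the implementation of Gu et al. (2008): the names `dirichlet_flat` and `bb_roc`, the number of replications and the fixed seed are assumptions made here. The flat Dirichlet weights are drawn by normalizing independent standard exponential variables, which is a standard construction and not necessarily the generating method referred to above.

```python
import random


def dirichlet_flat(k, rng):
    """Dirichlet(1, ..., 1) weights obtained by normalizing Exp(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def bb_roc(x, y, t_grid, n_rep=200, seed=0):
    """Average n_rep random ROC curves evaluated on t_grid."""
    rng = random.Random(seed)
    averaged = [0.0] * len(t_grid)
    for _ in range(n_rep):
        p = dirichlet_flat(len(x), rng)
        q = dirichlet_flat(len(y), rng)
        # Step 1: Z_j = 1 - sum_i p_i I(X_i <= Y_j).
        z = [1.0 - sum(p_i for p_i, x_i in zip(p, x) if x_i <= y_j) for y_j in y]
        # Step 2: random curve sum_j q_j I(Z_j <= t) on the grid.
        for k, t in enumerate(t_grid):
            averaged[k] += sum(q_j for q_j, z_j in zip(q, z) if z_j <= t)
    # Step 3: average over the replications.
    return [value / n_rep for value in averaged]


# Toy samples, chosen only for illustration.
x_sample = [0.1, -0.3, 0.8, 1.2, -1.0]
y_sample = [1.5, 0.4, 2.2, 0.9, 3.0]
print(bb_roc(x_sample, y_sample, [0.0, 0.25, 0.5, 0.75, 1.0]))
```

Each random curve is a nondecreasing step function of t that reaches 1 at t = 1, and the average inherits both properties.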

Let us assume that

$$\begin{aligned} \sup \{x: F(x)=0\}=\sup \{x: G(x)=0\} := \alpha \end{aligned}$$

and

$$\begin{aligned} \inf \{x: F(x)=1\}=\inf \{x: G(x)=1\} := \beta . \end{aligned}$$

Moreover, throughout this section we assume that the sample sizes m, n are such that \(m=m(n)\) and \(n/m \rightarrow \lambda \in (0,\infty )\) as \(n \rightarrow \infty ,\) and that the following two conditions are satisfied

  1. (C1)

    The continuous cdf F is twice differentiable on \((\alpha ,\beta )\), the derivative \(F'=f \ne 0\) on \((\alpha ,\beta ),\) and for some \(\gamma >0,\)

    $$\begin{aligned} \sup _{x\in (\alpha ,\beta )}\big \{F(x)(1-F(x))|f'(x)/f^2(x)|\big \}\le \gamma . \end{aligned}$$
  2. (C2)

    Let the cdf’s F and G satisfy condition (C1), and additionally

    $$\begin{aligned} \sup _{x\in (\alpha ,\beta )}\left\{ F(x)(1-F(x))\Big |\frac{g'(x)}{f^2(x)}\Big |\right\}<\infty ,~\sup _{x\in (\alpha ,\beta )}\left\{ F(x)(1-F(x))\Big |\frac{g(x)}{f(x)}\Big |\right\} <\infty . \end{aligned}$$

Using the theory of Kiefer processes, Gu and Ghosal (2008) proved some strong approximation results and asymptotic properties of the Bayesian bootstrap ROC curve estimator. In particular, its rate of convergence to the true ROC curve was shown to be \(n^{-1/2}\).

We will consider minimum distance estimation of the binormal ROC curve parameters by replacing the empirical ROC curve with the corresponding BB estimator \(ROC_{mn}^{BB}(t)\) in measure (8). Since the jumps of \(ROC_{mn}^{BB}(t)\) are random, we can choose the integration limits in (12)–(15) to be closer to 0 and 1 than in the original procedure. Namely we define

$$\begin{aligned} a_m'= & {} \inf \left\{ t \in [0,1]: ROC_{mn}^{BB}(t)>0 \right\} , \end{aligned}$$
(18)
$$\begin{aligned} b_m'= & {} \sup \left\{ t \in [0,1]: ROC_{mn}^{BB}(t)<1 \right\} - \varepsilon _m, \end{aligned}$$
(19)

where \(\varepsilon _m < 1/m\) is a positive constant, which needs to be introduced due to the right continuity of the function \(ROC_{mn}^{BB}\) (analogously to (16)). To be more specific, we consider the MDE

$$\begin{aligned} {\widehat{\theta }}_{DNB}= & {} ({\widehat{\mu }}_{DNB},{\widehat{\sigma }}_{DNB})\\:= & {} {{\mathrm{arg\,min}}}_{\mu ,\sigma }\int _{a_m'}^{b_m'}\left[ \Phi ^{-1}(ROC_{mn}^{BB}(t))-\Big (\frac{\mu }{\sigma }+\frac{1}{\sigma }\Phi ^{-1}(t)\Big )\right] ^{2}dt. \end{aligned}$$

Using the same approach as in Sect. 2.2, one can show that the solution to the optimization problem above is given by

$$\begin{aligned} \hat{\mu }_{DNB}=\hat{\sigma }_{DNB}\tilde{S}_1-\tilde{S}_3, \qquad \hat{\sigma }_{DNB}=\frac{\tilde{S}_4-\tilde{S}_3^2}{\tilde{S}_2-\tilde{S}_1\tilde{S}_3}, \end{aligned}$$
(20)

where

$$\begin{aligned} \tilde{S_1}= & {} \frac{1}{b_m'-a_m'}\int _{a_m'}^{b_m'}\Phi ^{-1}(ROC_{mn}^{BB}(t))dt,\\ \tilde{S_2}= & {} \frac{1}{b_m'-a_m'}\int _{a_m'}^{b_m'}\Phi ^{-1}(ROC_{mn}^{BB}(t))\Phi ^{-1}(t)dt, \end{aligned}$$

are the counterparts of Eqs. (12), (13), respectively. Similarly, \(\tilde{S_3}\) and \(\tilde{S_4}\) are computed by changing the integration domain from (a, b) to \((a_m',b_m')\) in Eqs. (14)–(15).

The following lemma can be proved in an analogous manner to Lemma 1 in Davidov and Nov (2012).

Lemma 1

Under the above assumptions, \(a_m'\rightarrow 0\) and \(b_m'\rightarrow 1\) a.s., as \(m\rightarrow \infty \).

Denote

$$\begin{aligned} ROC_{mn}^{DNB}(t)=\Phi \Big (\frac{\hat{\mu }_{DNB}}{\hat{\sigma }_{DNB}}+\frac{1}{\hat{\sigma }_{DNB}}\Phi ^{-1}(t)\Big ). \end{aligned}$$

Theorem 1

Under assumptions (C1)–(C2), \({\hat{\mu }}_{DNB}\rightarrow \mu \) and \({\hat{\sigma }}_{DNB}\rightarrow \sigma \) in probability, as \(n\rightarrow \infty \), and hence the estimator \(ROC_{mn}^{DNB}\) of the binormal ROC curve converges pointwise to the true ROC curve on (0, 1).

A proof of Theorem 1 is given in the Appendix.

We will also consider an estimator of the parameter \(\theta \), which combines the minimum distance concept of Hsieh and Turnbull with the BB nonparametric estimator of the ROC curve. In this method, Eq. (4) is modified by replacing the empirical curve \(ROC_{mn}(t)\) with the Bayesian bootstrap estimator \(ROC_{mn}^{BB}(t)\) which gives

$$\begin{aligned} \xi _{mn}^{BB}(\theta )=ROC_{mn}^{BB}(t)-\Phi \Big (\frac{\mu }{\sigma }+\frac{1}{\sigma }\Phi ^{-1}(t)\Big ), \end{aligned}$$

and the corresponding \(L_{2}\)-distance measure is

$$\begin{aligned} \Vert \xi _{mn}^{BB}(\theta )\Vert = \int _{0}^{1}{\xi _{mn}^{BB}}^2(\theta )dt. \end{aligned}$$
(21)

The minimum distance estimate \(\hat{\theta }_{HTB}=(\hat{\mu }_{HTB},\hat{\sigma }_{HTB})\) of the parameter \(\theta \) is defined as the value which minimizes (21), i.e.

$$\begin{aligned} \Vert \xi _{mn}^{BB}({\hat{\theta }_{HTB}})\Vert =\inf _{\theta }\Vert \xi _{mn}^{BB}(\theta )\Vert . \end{aligned}$$

2.4 Minimum distance estimators of the binormal ROC curve parameters based on smooth nonparametric estimators of the ROC curve

The empirical ROC curve retains many properties of the empirical distribution function. It is uniformly convergent to the theoretical curve (Hsieh and Turnbull 1996), but it is also not continuous and not very accurate for small sample sizes. The idea behind the semiparametric procedures of Hsieh and Turnbull, as well as Davidov and Nov, is to minimize a distance between the binormal ROC curve given by (2) and the empirical one. In this section we propose MDE’s of the binormal curve by replacing the empirical ROC curve, in measures (5) and (8), by its continuous nonparametric counterparts. Consequently, each considered nonparametric estimator of the ROC curve leads to two new semiparametric minimum distance estimators.

2.4.1 Kernel estimator of the ROC curve

Lloyd (1998) used the kernel smoothing technique to obtain a smooth ROC curve estimator given by

$$\begin{aligned} ROC_{mn}^K(t)=1-G^K_n({F^K_m}^{-1}(1-t)),\quad t \in [0,1], \end{aligned}$$
(22)

where

$$\begin{aligned} F^K_m(x)=\frac{1}{m}\sum _{j=1}^{m}\mathcal {K}\left( \frac{x-X_j}{h_m}\right) , \quad G^K_n(x)=\frac{1}{n}\sum _{j=1}^{n}\mathcal {K}\left( \frac{x-Y_j}{h_n}\right) \end{aligned}$$

are standard kernel estimators with kernel function K, \(\mathcal {K}(v)= \int _{- \infty }^{v}K(z)dz\) and bandwidth parameters \(h_n\) and \(h_m\). Lloyd and Yong (1999) showed that estimator (22) has better mean squared error properties than the empirical ROC curve. In the problem of kernel density estimation, choosing between many available kernel functions is of secondary importance as all give comparable results, but more care needs to be taken over the selection of bandwidth. Therefore, in kernel ROC curve estimation the main emphasis is put on the bandwidth selection (Zhou and Harezlak 2002; Hall and Hyndman 2003). In the simulation study (Sect. 3), the Gaussian kernel is employed and the bandwidth parameter \(h_m\) is chosen according to

$$\begin{aligned} h_m=0.9\min (s_x,iqr_x/1.34)m^{-1/5}, \end{aligned}$$

where \(s_x\) and \(iqr_x\) are the standard deviation and the interquartile range for the non-diseased population, respectively. The bandwidth parameter \(h_n\) for the diseased population was determined in the same way. This method of bandwidth selection was recommended by Silverman (1986) as it works “very well for a wide range of densities”, which is reasonable in our case, since we have no information about the distributions of the samples.
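A minimal sketch of the kernel estimator (22) with the Gaussian kernel is given below. The bandwidths are passed in explicitly, so any selection rule can be plugged in. The inverse of \(F^K_m\) is obtained here by bisection, which is an assumption made for this illustration and differs from the grid-based inversion used in the simulation study; the function names are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist()  # for the Gaussian kernel, the integrated kernel is Phi


def kernel_cdf(sample, h):
    """Kernel distribution function estimator with bandwidth h."""
    n = len(sample)
    return lambda x: sum(PHI.cdf((x - s) / h) for s in sample) / n


def invert_increasing(func, p, lo, hi, n_iter=80):
    """Solve func(x) = p by bisection for an increasing func on [lo, hi]."""
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        if func(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def kernel_roc(x, y, h_x, h_y):
    """ROC^K(t) = 1 - G^K((F^K)^{-1}(1 - t)), Eq. (22)."""
    f_k = kernel_cdf(x, h_x)
    g_k = kernel_cdf(y, h_y)
    # Outside this bracket F^K is numerically 0 or 1.
    lo, hi = min(x) - 10 * h_x, max(x) + 10 * h_x
    return lambda t: 1.0 - g_k(invert_increasing(f_k, 1.0 - t, lo, hi))


# Toy samples and bandwidths, chosen only for illustration.
x_sample = [-0.5, 0.1, 0.4, 1.0, 1.3]
y_sample = [0.6, 1.2, 1.9, 2.4, 3.1]
roc_k = kernel_roc(x_sample, y_sample, 0.4, 0.4)
print([roc_k(t) for t in (0.1, 0.5, 0.9)])
```

Because both kernel distribution functions are continuous and strictly increasing, the resulting curve is a smooth, nondecreasing function of t.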

Kernel estimator (22) of the ROC curve allows us to introduce two new minimum distance estimators of the binormal ROC curve parameters which will be denoted by \({\hat{\theta }}_{HTK}\) and \({\hat{\theta }}_{DNK}\). The first one employs \(ROC_{mn}^K(t)\) instead of the empirical ROC curve in Eq. (4), while the latter employs it in Eq. (7), i.e.

$$\begin{aligned} {\hat{\theta }}_{HTK}= & {} ({\hat{\mu }}_{HTK},{\hat{\sigma }}_{HTK}) = {{\mathrm{arg\,min}}}_{\mu ,\sigma }\int _{0}^{1}\left[ ROC_{mn}^{K}(t)-\Phi \Big (\frac{\mu }{\sigma }+\frac{1}{\sigma }\Phi ^{-1}(t)\Big )\right] ^{2}dt, \\ {\hat{\theta }}_{DNK}= & {} ({\hat{\mu }}_{DNK},{\hat{\sigma }}_{DNK}) = {{\mathrm{arg\,min}}}_{\mu ,\sigma }\int _{a'}^{b'}\left[ \Phi ^{-1}(ROC_{mn}^{K}(t))-\Big (\frac{\mu }{\sigma }+\frac{1}{\sigma }\Phi ^{-1}(t)\Big )\right] ^{2}dt, \end{aligned}$$

where the integration limits \(a'\) and \(b'\) are the counterparts of Eqs. (18)–(19), where \(ROC_{mn}^{BB}(t)\) is replaced with \(ROC_{mn}^{K}(t)\).

2.4.2 Estimator of the ROC curve by smoothing the sample distribution functions

In the paper of Jokiel-Rokita and Pulit (2013), the authors proposed to estimate the ROC curve using the plug in method with smoothed sample distribution functions. Let \(X_{1:m} \le X_{2:m} \le \dots \le X_{m:m}\) and \(Y_{1:n} \le Y_{2:n} \le \dots \le Y_{n:n}\) denote order statistics from the samples \(\pmb {X}_m\) and \(\pmb {Y}_n\), respectively. We set

$$\begin{aligned}&X_{0:m}= 2L - X_{1:m}, \quad X_{(m+1):m}=2U - X_{m:m},\\&Y_{0:n}= 2L - Y_{1:n}, \quad Y_{(n+1):n}=2U - Y_{n:n}, \end{aligned}$$

where L, U are random variables such that \(L \le \min {\{X_{1:m},Y_{1:n} \}}\) and \(U \ge \max {\{X_{m:m},Y_{n:n} \}}\) almost surely. Denote

$$\begin{aligned}&Q_j(\pmb {X}_m)=\frac{X_{(j-1):m}+X_{j:m}}{2}, \quad j=1,2,\dots ,m+1,\\&R_j(\pmb {X}_m)=Q_{j+1}(\pmb {X}_m)-Q_j(\pmb {X}_m)=\frac{X_{(j+1):m}-X_{(j-1):m}}{2}, \quad j=1,2,\dots ,m, \\&Q_j(\pmb {Y}_n)=\frac{Y_{(j-1):n}+Y_{j:n}}{2}, \quad j=1,2,\dots ,n+1,\\&R_j(\pmb {Y}_n)=Q_{j+1}(\pmb {Y}_n)-Q_j(\pmb {Y}_n)=\frac{Y_{(j+1):n}-Y_{(j-1):n}}{2}, \quad j=1,2,\dots ,n. \end{aligned}$$

With this notation we define the estimators of the distribution functions F and G by

$$\begin{aligned}&F^S_m(x)=\frac{1}{m}\sum _{j=1}^{m}T\left( \frac{x-Q_j(\pmb {X}_m)}{R_j(\pmb {X}_m)}\right) , \\&G^S_n(x)=\frac{1}{n}\sum _{j=1}^{n}T\left( \frac{x-Q_j(\pmb {Y}_n)}{R_j(\pmb {Y}_n)}\right) , \end{aligned}$$

respectively, where

$$\begin{aligned} T(x) = \left\{ \begin{array}{ll} 0, &{}\quad \text {for }x < 0,\\ r(x), &{}\quad \text {for } 0 \le x \le 1,\\ 1, &{}\quad \text {for }x > 1, \end{array} \right. \end{aligned}$$
(23)

where \(r :[0,1] \rightarrow [0,1]\) is a continuous, strictly increasing function such that \(r(0)=0\), \(r(1)=1\), e.g. \(r(x)=x\). The inverse function of \(F^S_m(t)\) on [L, U] can be written as

$$\begin{aligned} {F_m^S}^{-1}(t) = \left\{ \begin{array}{ll} L, &{} \text {for }t=0,\\ r^{-1}(mt-(k-1))R_k(\pmb {X}_m)+Q_k(\pmb {X}_m), &{} \text {for} \frac{k-1}{m}<t\le \frac{k}{m}, k=1,\dots ,m. \end{array} \right. \end{aligned}$$

It is clear that \({F_m^S}^{-1}(t)\) is continuous and strictly increasing on [0, 1]. Since \(G^S_n(t)\) is continuous and strictly increasing on [L, U], it follows that the composition \(G^S_n({F_m^S}^{-1}(t))\) is continuous and strictly increasing on [0, 1]. Hence we can define the continuous and strictly increasing nonparametric ROC curve estimator by

$$\begin{aligned} ROC^S_{mn}(t)=1-G^S_n({F_m^S}^{-1}(1-t)),\quad t \in [0,1]. \end{aligned}$$
(24)

An appropriate choice of the function r,  appearing in formula (23), can guarantee differentiability of the estimator (e.g. if function r is differentiable and \(r_{+}'(0)=r_{-}'(1)=0\)). Simultaneously, determination of the estimator (24) remains as easy as in the case of the empirical ROC curve.
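For the linear choice \(r(x)=x\) the construction above reduces to a few lines of code. The sketch below is an illustration under stated assumptions: the data contain no ties (so that all \(R_j>0\)), the bounds L and U are supplied by the caller as `lower` and `upper`, and the function names are hypothetical.

```python
import math


def knots_and_widths(sample, lower, upper):
    """Return [Q_1, ..., Q_{m+1}] and [R_1, ..., R_m] for one sample."""
    s = sorted(sample)
    m = len(s)
    # ext[j] plays the role of X_{j:m}, j = 0, ..., m + 1 (reflected end points).
    ext = [2 * lower - s[0]] + s + [2 * upper - s[-1]]
    q = [(ext[j - 1] + ext[j]) / 2 for j in range(1, m + 2)]
    r = [q[j + 1] - q[j] for j in range(m)]
    return q, r


def smoothed_cdf(sample, lower, upper):
    """F^S with r(x) = x, so T in (23) clips its argument to [0, 1]."""
    q, r = knots_and_widths(sample, lower, upper)
    m = len(r)
    return lambda x: sum(min(1.0, max(0.0, (x - q[j]) / r[j])) for j in range(m)) / m


def smoothed_quantile(sample, lower, upper):
    """Inverse of F^S on [0, 1] for r(x) = x."""
    q, r = knots_and_widths(sample, lower, upper)
    m = len(r)

    def inverse(t):
        if t <= 0.0:
            return lower
        k = min(max(math.ceil(m * t), 1), m)   # index with (k - 1)/m < t <= k/m
        return (m * t - (k - 1)) * r[k - 1] + q[k - 1]

    return inverse


def smoothed_roc(x, y, lower, upper):
    """ROC^S(t) = 1 - G^S((F^S)^{-1}(1 - t)), Eq. (24)."""
    g_s = smoothed_cdf(y, lower, upper)
    f_inv = smoothed_quantile(x, lower, upper)
    return lambda t: 1.0 - g_s(f_inv(1.0 - t))


# Toy samples with L = 0 and U = 4, chosen only for illustration.
roc_s = smoothed_roc([0.0, 1.0, 2.0], [1.5, 2.5, 4.0], 0.0, 4.0)
print([roc_s(t) for t in (0.0, 0.5, 1.0)])
```

With \(r(x)=x\) both smoothed distribution functions are piecewise linear, so the estimator is a continuous piecewise-linear curve joining the points (0, 0) and (1, 1).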

Minimum distance estimators of the parameter \(\theta ,\) based on the nonparametric ROC curve estimator \(ROC_{mn}^S\) applied in (4) and (7) instead of the estimator \(ROC_{mn},\) will be denoted by \({\hat{\theta }}_{HTS}\) and \(\hat{\theta }_{DNS},\) respectively.

3 Simulation study

A simulation experiment was conducted in order to

  • Investigate the accuracy of the original minimum distance estimators considered by Davidov and Nov (2012) in comparison with their modification proposed in Sect. 2.2,

  • Compare the accuracy of the minimum distance estimators of the binormal ROC curve parameters proposed by Hsieh and Turnbull (1996) with those considered by Davidov and Nov (2012) (answer the question: which measure of distance provides more accurate estimators),

  • Compare the accuracy of the minimum distance estimators considered by Hsieh and Turnbull (1996) and Davidov and Nov (2012) with their counterparts obtained by replacing the empirical ROC curve with BB estimator or with the smooth nonparametric estimators of the ROC curve (the kernel estimator and the estimator proposed by Jokiel-Rokita and Pulit 2013).

An important index connected with the ROC curve is the area under the curve, commonly denoted by

$$\begin{aligned} AUC=\int _{0}^{1}ROC(t)dt. \end{aligned}$$
(25)

It can be easily shown that in the model considered \(AUC=P(X<Y).\) We considered binormal ROC curves whose AUC values were 0.75 and 0.85 and assumed that \(X \sim \mathcal {N}(0,1)\) and Y is normally distributed with standard deviation \(\sigma \in \{1,4/3,2\}\) and mean \(\mu \) determined by \(\mu =\sqrt{1+\sigma ^2}\Phi ^{-1}(\text {AUC})\). For each ROC curve, 5000 data sets with \(m=n\in \{15, 20, 100\}\) were generated. Next, for each data set, four nonparametric ROC curve estimators were computed: the empirical ROC curve \(\widehat{ROC}_{mn}\), the smoothed estimator \(ROC_{mn}^S\) according to Eq. (24) with linking function \(r(x)=x\), the kernel estimator \(ROC_{mn}^K\) given by formula (22), and the Bayesian bootstrap estimator \(ROC_{mn}^{BB}\) averaged over \(B=1000\) realizations.
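The relation between \(\mu ,\) \(\sigma \) and the AUC follows from \(AUC=P(X<Y)=\Phi \big (\mu /\sqrt{1+\sigma ^{2}}\big ),\) since \(Y-X\sim \mathcal {N}(\mu ,1+\sigma ^{2})\) for independent X and Y. The sketch below computes \(\mu \) for the configurations listed above and checks it against a midpoint-rule approximation of (25); the function names and the grid size are illustrative assumptions.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def mu_for_auc(auc, sigma):
    """mu = sqrt(1 + sigma^2) * Phi^{-1}(AUC)."""
    return math.sqrt(1.0 + sigma ** 2) * PHI.inv_cdf(auc)


def binormal_auc(mu, sigma, n_grid=10000):
    """Midpoint-rule approximation of the integral (25) for the binormal curve (2)."""
    total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) / n_grid
        total += PHI.cdf((mu + PHI.inv_cdf(t)) / sigma)
    return total / n_grid


for auc in (0.75, 0.85):
    for sigma in (1.0, 4.0 / 3.0, 2.0):
        mu = mu_for_auc(auc, sigma)
        print(auc, sigma, mu, binormal_auc(mu, sigma))
```

The last printed column should reproduce the target AUC up to the quadrature error.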

All nonparametric estimators were calculated on a regular grid with spacing 0.0001. For the kernel estimator we additionally used a four times denser support grid, in order to compute the inverse of the cdf estimator \({F_m^K}^{-1}\) with sufficient accuracy. Tests showed that a further increase of the grid density had virtually no effect on the simulation results. Then the semiparametric minimum distance estimators were calculated based on the nonparametric ones. In the study, nine distinct semiparametric estimators were considered: five based on the minimum distance approach considered by Davidov and Nov (2012) (shortly D–N estimators) and four based on the measure of distance considered by Hsieh and Turnbull (1996) (shortly H–T estimators). For all D–N estimators, except the original DN, the integration endpoints were calculated according to equation (19) with the proper nonparametric ROC estimator plugged in. In practice, due to the finite distance between grid points, there is no need to introduce the \(\varepsilon _m\) constant.

In the Hsieh and Turnbull approach one needs to minimize numerically the \(L_2\)-distance between the binormal ROC curve and the nonparametric estimator under consideration. For the binormal model this problem corresponds to the minimization of a function of two variables, \(\mu \) and \(\sigma \). In the simulations the Nelder–Mead method was employed to minimize the objective function, and the initial values of the unknown parameters were calculated using the corresponding DNM estimator.
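The two-parameter minimization can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the objective is a plain Riemann-sum approximation of \(\int _0^1(ROC(t)-\widehat{ROC}(t))^2dt\) on a coarse grid, the optimizer is a compact textbook variant of the Nelder–Mead simplex method written out in pure Python, the starting point is arbitrary (the study started from the DNM estimate), and the target curve is an exact binormal ROC curve rather than a nonparametric estimate. All names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()
K = 200  # number of grid midpoints in (0, 1); far coarser than the paper's grid
Z_GRID = [STD.inv_cdf((i + 0.5) / K) for i in range(K)]  # Phi^{-1}(t) on the grid


def binormal_roc_on_grid(mu, sigma):
    """Binormal ROC(t) = Phi(mu/sigma + Phi^{-1}(t)/sigma) at the grid midpoints."""
    return [STD.cdf((mu + z) / sigma) for z in Z_GRID]


def l2_objective(params, target_roc):
    """Riemann-sum approximation of the integrated squared distance to target_roc."""
    mu, sigma = params
    if sigma <= 0.0:
        return math.inf  # keep the search inside the admissible region
    model = binormal_roc_on_grid(mu, sigma)
    return sum((a - b) ** 2 for a, b in zip(model, target_roc)) / K


def _on_line(centroid, worst, c):
    """Point centroid + c * (centroid - worst)."""
    return [g + c * (g - w) for g, w in zip(centroid, worst)]


def nelder_mead(f, x0, step=0.25, max_iter=1000, tol=1e-14):
    """Minimize f by reflection, expansion, inside contraction and shrinkage."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        reflected = _on_line(centroid, simplex[-1], 1.0)
        f_reflected = f(reflected)
        if f_reflected < values[0]:
            expanded = _on_line(centroid, simplex[-1], 2.0)
            f_expanded = f(expanded)
            if f_expanded < f_reflected:
                simplex[-1], values[-1] = expanded, f_expanded
            else:
                simplex[-1], values[-1] = reflected, f_reflected
        elif f_reflected < values[-2]:
            simplex[-1], values[-1] = reflected, f_reflected
        else:
            contracted = _on_line(centroid, simplex[-1], -0.5)
            f_contracted = f(contracted)
            if f_contracted < values[-1]:
                simplex[-1], values[-1] = contracted, f_contracted
            else:  # shrink every vertex halfway toward the best one
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)] for p in simplex[1:]
                ]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


# Fit to an exact binormal curve with AUC = 0.75 and sigma = 2.
true_sigma = 2.0
true_mu = math.sqrt(1.0 + true_sigma ** 2) * STD.inv_cdf(0.75)
target = binormal_roc_on_grid(true_mu, true_sigma)
(mu_hat, sigma_hat), value = nelder_mead(lambda p: l2_objective(p, target), [1.0, 1.0])
print(mu_hat, sigma_hat, value)
```

In an actual application `target` would hold the values of the chosen nonparametric ROC estimator on the same grid. Returning infinity for \(\sigma \le 0\) is one simple way to keep a derivative-free search within the parameter space.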

Table 1 Estimated bias and MSE (in parentheses) of the estimators of the binormal ROC curve parameters \(\mu \) and \(\sigma \)

The performance of the estimators introduced in the previous section is studied in two ways: by comparing the estimates of the binormal parameters and by looking at the deviation of the estimated ROC curve from its true shape. In Table 1 the estimated bias and MSE of the parameters \(\mu \) and \(\sigma \) are listed for four binormal models (with \(\sigma =1\) and \(\sigma =2\) and for two values of AUC: 0.75 and 0.85). In practice one is more interested in the estimation of the ROC curve than in the parameters of the binormal model. Hence, in order to examine the overall goodness of fit of the ROC curve estimator, the mean integrated square error (MISE)

$$\begin{aligned} \text {MISE}=E\Big (\int _0^1\big (ROC(t)-\widehat{ROC}(t)\big )^2dt\Big ), \end{aligned}$$

was estimated, where \(\widehat{ROC}(t)\) stands for the ROC curve estimator under consideration. In Table 2 the estimated values of MISE (multiplied by 100, for brevity) are collected for three values of \(\sigma \), AUC=0.75, and different sample sizes. Results corresponding to AUC=0.85 are given in Table 3. For comparison, MISEs are presented for both the semiparametric and the nonparametric ROC curve estimates.
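A Monte Carlo estimate of the MISE can be sketched as below for the empirical ROC curve, using only Python's standard library. The sketch assumes a midpoint Riemann sum on a coarse grid and a few hundred replications, whereas the study used a grid spacing of 0.0001 and 5000 data sets; the function names are illustrative.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist

STD = NormalDist()


def true_binormal_roc(t, mu, sigma):
    """ROC(t) = Phi(mu/sigma + Phi^{-1}(t)/sigma) for X ~ N(0,1), Y ~ N(mu, sigma^2)."""
    return STD.cdf(mu / sigma + STD.inv_cdf(t) / sigma)


def empirical_roc(xs_sorted, ys_sorted, t):
    """1 - G_n(F_m^{-1}(1 - t)) with F_m^{-1}(p) = inf{x : F_m(x) >= p}."""
    m, n = len(xs_sorted), len(ys_sorted)
    k = min(m, max(1, math.ceil((1.0 - t) * m)))  # order statistic index
    threshold = xs_sorted[k - 1]
    return (n - bisect_right(ys_sorted, threshold)) / n  # share of y > threshold


def estimate_mise(m, n, mu, sigma, replications, grid_size, rng):
    """Average integrated squared error of the empirical ROC curve."""
    t_grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    truth = [true_binormal_roc(t, mu, sigma) for t in t_grid]
    total = 0.0
    for _ in range(replications):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(m))
        ys = sorted(rng.gauss(mu, sigma) for _ in range(n))
        ise = sum(
            (truth[i] - empirical_roc(xs, ys, t)) ** 2 for i, t in enumerate(t_grid)
        ) / grid_size
        total += ise
    return total / replications


# Model with sigma = 1 and AUC = 0.75, so mu = sqrt(2) * Phi^{-1}(0.75).
mu = math.sqrt(2.0) * STD.inv_cdf(0.75)
for size in (15, 20, 100):
    mise = estimate_mise(size, size, mu, 1.0, 200, 200, random.Random(0))
    print(f"m = n = {size}: 100 * MISE is roughly {100.0 * mise:.3f}")
```

Replacing `empirical_roc` by any other estimator evaluated on the same grid gives the corresponding MISE estimate, so the same loop serves for nonparametric and semiparametric curves alike.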

As can be seen from Table 1, there are substantial differences in accuracy between the original (DN) and the modified (DNM) minimum distance estimators of Davidov and Nov, even though the latter requires only a marginal modification of the computational procedure. For \(m=n=15\) and \(m=n=20\) the estimated mean square errors of the DNM estimators of the parameters \(\mu \) and \(\sigma \) are considerably smaller (sometimes even by half) than the corresponding estimated errors of the original DN estimators. The bias of \({\hat{\vartheta }}_{DNM}\) is also smaller than that of \({\hat{\vartheta }}_{DN}\), but the difference between them is less prominent. For the large sample size, \(m=n=100\), when formulas (11) and (16) yield virtually the same integration endpoints, the DN and DNM procedures give almost the same biases and mean square errors, as expected. The DNM estimator also outperforms the original Davidov and Nov (2012) estimator (DN) in terms of mean integrated square error. The results given in Tables 2 and 3 indicate a reduction of MISE by approximately 10% in the case of small sample sizes and 3% for \(m=n=100\).

Table 2 Simulated mean integrated square error, multiplied by 100, for AUC = 0.75
Table 3 Same as in Table 2, but for AUC = 0.85

We find it interesting to examine the accuracy of the estimates obtained by minimizing the two distinct measures (5) and (8). In the case of the small sample sizes \(m=n=15\) and \(m=n=20\), the HT procedure performs much better in terms of bias and mean square error than DNM, and hence also outperforms DN, regardless of the AUC and the true value of the parameter \(\sigma \) (cf. Table 1). For \(m=n=100\), the bias of \({\hat{\mu }}_{HT}\) remains much lower than the corresponding bias of \({\hat{\mu }}_{DN}\) and \({\hat{\mu }}_{DNM}\), while the differences in MSE between these estimators are reduced. At the same time, the HT method also gives a smaller bias of the estimator of \(\sigma \) than the DN and DNM procedures, but in some cases it yields a greater MSE. These conclusions also hold to a great extent when the DNS estimator, based on the smoothed nonparametric ROC curve, is compared with the corresponding HTS estimator. On the other hand, inspection of the results collected in Tables 2 and 3 reveals that the estimators based on the D–N approach, aside from the original DN, yielded a better fit to the true ROC curve in terms of MISE than those originating from the H–T procedure: in all models except one, the estimates with the lowest MISE were obtained using the distance measure considered by Davidov and Nov (2012).

Table 4 Estimated parameters for Tupikowski’s kidney cancer data for hemoglobin level (HL) and fibrinogen concentration (FC)

Based on the simulations, we may also address the influence of replacing the empirical ROC curve with other nonparametric estimators on the accuracy of the estimated binormal ROC curve. In all considered models, semiparametric estimators based on the smoothed empirical ROC curve, \(ROC_{mn}^S(t)\), performed better than their counterparts based on the empirical curve \(ROC_{mn}(t)\) for both distance measures employed. The bias and MSE of \({\hat{\mu }}_{DNS}\) and \({\hat{\sigma }}_{DNS}\) are considerably smaller than those of \({\hat{\mu }}_{DNM}\) and \(\hat{\sigma }_{DNM}\), respectively. Similar conclusions can be drawn when HTS is compared with the original HT procedure. For small sample sizes, the mean square error of the estimates of both parameters decreases, by a factor of 4.5 on average, when the underlying empirical ROC curve is replaced with its smoothed counterpart (24). Naturally, the advantage of estimates based on \(ROC_{mn}^S(t)\) over those based on \(ROC_{mn}(t)\) decreases as the sample size increases. However, no significant improvement of the parameter estimates is observed when the kernel or BB methods are employed. In the case of methods based on the Davidov and Nov approach, when one minimizes the objective function given by (9), the estimated biases and MSEs of the estimators \({\hat{\theta }}_{DNK}\) and \({\hat{\theta }}_{DNB}\) are only slightly reduced in comparison with the DNM method. Furthermore, for the HTK and HTB methods even some increase of bias and MSE is observed in comparison with the original minimum distance procedure of Hsieh and Turnbull. Replacing the underlying empirical ROC curve with its smoothed counterpart also leads to a decrease of the mean integrated square error of both semiparametric and nonparametric estimators. For the eighteen binormal models considered in Tables 2 and 3 the DNS method always outperforms DN, and in fifteen cases it yields a smaller MISE than the DNM estimator. In fact, for AUC = 0.75, the DNS estimator achieves the lowest MISE among all estimators considered in 8 out of 9 comparisons. The HTS estimator likewise outperforms HT in 15 out of 18 comparisons. Some improvement of the estimates is observed when the bootstrap estimator is employed (DNB and HTB methods). Consequently, the simulation study shows that replacing the empirical ROC curve (3) with its smoothed counterpart (24) significantly improves the minimum distance estimates of the binormal ROC curve.

4 Real data analysis

To illustrate all the semiparametric estimators considered, we apply them to the data analysed by Tupikowski et al. (2012). In that dataset the effectiveness of combined treatment with interferon alpha and metronomic cyclophosphamide in patients with metastatic kidney cancer was studied in terms of hemoglobin level (HL) and serum fibrinogen concentration (FC). The dataset contains 31 observations in total: 14 with and 17 without clinical response. A low value of HL or FC has been recognized as a negative predictor of treatment response and is associated with short survival. The estimates of the binormal ROC curve parameters for HL and FC as predictive factors are given in Table 4 for all methods considered. The estimated values of AUC are also tabulated. Interestingly, while the estimates of the parameters \(\mu \) and \(\sigma \) vary between methods, the estimates of AUC are close to each other, and differ by only 7% for both HL and FC.

5 Conclusions and some prospects

In this article seven new estimators of the binormal ROC curve in a semiparametric setting have been proposed. The new estimators originate from the minimum distance concept applied to ROC curve estimation by Hsieh and Turnbull (1996) and recently revisited by Davidov and Nov (2012). In the original MDE procedures one minimizes some distance measure between the binormal ROC curve, characterized by the two parameters \(\mu \) and \(\sigma \), and the empirical ROC curve. In our methods we propose to replace the \(ROC_{mn}\) estimator, which is not continuous and not very accurate for small sample sizes, with other nonparametric estimators of the ROC curve. Procedures involving kernel, Bayesian bootstrap and smoothed ROC curve estimators were considered. Moreover, for estimators based on the Davidov and Nov (2012) approach, the role of appropriate integration limits was emphasized.

The small-sample performance of the proposed estimators was investigated numerically and compared with the original procedures of Davidov and Nov (2012) and Hsieh and Turnbull (1996). The biggest improvement, both in terms of parameter accuracy and MISE, was observed for estimators based on the smoothed nonparametric ROC curve estimator \(ROC_{mn}^S\) (see Sect. 2.4.2). For small samples, we observed that replacing \(ROC_{mn}\) with \(ROC_{mn}^S\) in the minimum distance procedures can reduce the MSE of the estimators of the parameters \(\mu \) and \(\sigma \) by up to an order of magnitude, and by a factor of 4.5 on average. The goodness of fit of the estimated ROC curve to the true ROC curve is also improved, as indicated by a lower mean integrated square error. Employing the BB estimator improves the performance of the MDEs only modestly, while using the kernel estimator sometimes leads to even less accurate semiparametric ROC curve estimates.

In future research we are going to examine the asymptotic equivalence of the estimators considered. In particular, the asymptotic properties of the DNS and HTS estimators need further investigation, since these methods clearly outperform the others. In fact, the smoothed nonparametric estimator of the ROC curve, introduced by Jokiel-Rokita and Pulit (2013), seems to be a very promising method, and a theoretical investigation of its asymptotic properties is of interest to us. We are also going to study the robustness of the considered estimators to model misspecification.