Consistency of the estimator of binary response models based on AUC maximization

Fedotenkov, Igor

doi:10.1007/s10260-013-0229-4


Published: 02 February 2013

Volume 22, pages 381–390, (2013)

Statistical Methods & Applications


Abstract

This paper examines the asymptotic properties of a binary response model estimator based on maximization of the Area Under receiver operating characteristic Curve (AUC). Given certain assumptions, AUC maximization is a consistent method of binary response model estimation up to normalizations. As AUC is equivalent to Mann-Whitney U statistics and Wilcoxon test of ranks, maximization of area under ROC curve is equivalent to the maximization of corresponding statistics. Compared to parametric methods, such as logit and probit, AUC maximization relaxes assumptions about error distribution, but imposes some restrictions on the distribution of explanatory variables, which can be easily checked, since this information is observable.
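The equivalence between AUC and the Mann-Whitney U (Wilcoxon rank-sum) statistic mentioned above can be illustrated numerically. The following is a minimal sketch, not the paper's estimator: the function names, the simulated scores and the normalization by the number of positive-negative pairs are assumptions made for illustration, and ties are assumed to be absent.

```python
import random

def auc_pairwise(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1 for p in pos for q in neg if p > q)
    return wins / (len(pos) * len(neg))

def auc_rank_sum(scores, labels):
    """Same quantity from the Wilcoxon rank sum of the positives (no ties assumed)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank = {i: r + 1 for r, i in enumerate(order)}  # ranks 1..n
    n1 = sum(labels)
    n0 = len(labels) - n1
    w = sum(rank[i] for i, y in enumerate(labels) if y == 1)  # rank sum of positives
    u = w - n1 * (n1 + 1) / 2  # Mann-Whitney U statistic
    return u / (n1 * n0)

# Hypothetical data: a noisy binary outcome driven by a continuous score.
rng = random.Random(0)
scores = [rng.gauss(0, 1) for _ in range(200)]
labels = [1 if s + rng.gauss(0, 1) > 0 else 0 for s in scores]

a_pairs = auc_pairwise(scores, labels)
a_ranks = auc_rank_sum(scores, labels)
print(a_pairs, a_ranks)
```

Both functions count the same concordant pairs, so maximizing one over the model parameters is the same as maximizing the other.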


Notes

$k=1$ leads to a degenerate case, when the parameter of interest is normalized to 1 or $-$1.


Acknowledgments

I would like to thank the participants at the 12th Symposium of Mathematics and its Applications (2009) in Timisoara. Furthermore, I wish to thank Alfredas Račkauskas, Dmitrij Celov and Irena Mikolajun for their useful comments and Steve Guttenberg for his help with the English language.

Author information

Authors and Affiliations

Department of Economics, University of Verona, Verona, Italy
Igor Fedotenkov


Corresponding author

Correspondence to Igor Fedotenkov.

Appendix

1.1 Proof of Lemma 2

Proof

Using the definition of conditional probability, the expression for $AUC_\infty (b)$ in Eq. (4) can be rewritten:

$$\begin{aligned} AUC_\infty (b)=CP_X(b^{\prime } X_1>b^{\prime } X_2)P_{X,\epsilon }(Y_1=1, Y_2=0 \vert b^{\prime } X_1>b^{\prime } X_2). \end{aligned}$$

(5)

For a randomly drawn pair $X_1$, $X_2$, the probability that the inequality $b^{\prime } X_1>b^{\prime }X_2$ holds is constant; by assumption 6 it equals 0.5. To keep the notation as simple as possible, this constant is absorbed into $C$. With this notation and the law of total probability we find:

$$\begin{aligned} AUC_\infty (b)=C P_{X,\epsilon }(Y_1=1, Y_2=0 \mid b^{\prime }X_1>b^{\prime }X_2) \end{aligned}$$

(6)

$$\begin{aligned}&=C\int \int \limits _{b^{\prime }X_1>b^{\prime }X_2}\int \mathbb{1}(Y_1=1 \text{ and } Y_2=0 \mid X_1,X_2)\,dF_\epsilon \,dF_X(X_1)\,dF_X(X_2) \end{aligned}$$

(7)

$$\begin{aligned}&=C\int \int \limits _{b^{\prime }X_1>b^{\prime }X_2}P_\epsilon (Y_1=1\mid X_1)P_\epsilon (Y_2=0 \mid X_2)\,dF_X(X_1)\,dF_X(X_2). \end{aligned}$$

(8)

Equation (8) follows from two facts: the inner integral in Eq. (7) can be treated as a probability, and $Y_1$ and $Y_2$ are independent events.

Next, we take the true parameter $\beta $ and compare it with an arbitrary parameter $\tilde{\beta }$. In Eq. (8), the parameter alters the range of integration and also affects the product $P_\epsilon (Y_1=1\vert X_1)P_\epsilon (Y_2=0 \vert X_2)$, because $b$ determines which observation of a pair is treated as $Y_1$ and which as $Y_2$. Note that for an observation with explanatory factors $X_r$, $P(Y_r=0\vert X_r)=F_\epsilon (-\beta ^{\prime } X_r)$ and $P(Y_r=1\vert X_r)=1-F_\epsilon (-\beta ^{\prime } X_r)$, $r=1,2$, since the data are generated by the true parameter $\beta $.

Consider $X_1$ and $X_2$ from $D_X$. Without loss of generality, assume that $\beta ^{\prime } X_1> \beta ^{\prime } X_2$. The observations are ranked correctly when $Y_1=1$ and $Y_2=0$. With parameter $\beta $, the probability of ranking the observations correctly is $A= P_\epsilon (Y_1=1\vert X_1)P_\epsilon (Y_2=0\vert X_2)= (1-F_\epsilon (-\beta ^{\prime } X_1))F_\epsilon (-\beta ^{\prime } X_2)$.

Now consider another parameter, $\tilde{\beta }$. When $\tilde{\beta }^{\prime } X_1 > \tilde{\beta }^{\prime } X_2$, the probability of ranking the observations correctly remains the same, because $\tilde{\beta }$ does not generate the data. Namely, the term $1-F_\epsilon (-\beta ^{\prime } X_1)$ remains, because the true probability of $Y_1=1$ is $1-F_\epsilon (-\beta ^{\prime } X_1)$. The other case is $\tilde{\beta }^{\prime } X_2 > \tilde{\beta }^{\prime } X_1$. Now the probability of correct ranking is $\tilde{A}= P_\epsilon (Y_2=1\vert X_2)P_\epsilon (Y_1=0\vert X_1)= F_\epsilon (-\beta ^{\prime } X_1)(1-F_\epsilon (-\beta ^{\prime } X_2)) =F_\epsilon (-\beta ^{\prime } X_1)-F_\epsilon (-\beta ^{\prime } X_1)F_\epsilon (-\beta ^{\prime } X_2)$. Comparing this with $A=F_\epsilon (-\beta ^{\prime } X_2)-F_\epsilon (-\beta ^{\prime } X_1)F_\epsilon (-\beta ^{\prime } X_2)$, it is clear that $F_\epsilon (-\beta ^{\prime } X_2)\ge F_\epsilon (-\beta ^{\prime } X_1)$, because $\beta ^{\prime } X_1> \beta ^{\prime } X_2$ and $F_\epsilon $ is nondecreasing. Hence, $A\ge \tilde{A}$.
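The comparison of $A$ and $\tilde{A}$ can be checked numerically. The sketch below assumes a logistic error distribution purely for illustration (the argument only needs a nondecreasing $F_\epsilon $); the function names and the index values 1.0 and -0.5 are hypothetical.

```python
import math

def logistic_cdf(t):
    """Assumed error distribution F_eps; any nondecreasing CDF would do."""
    return 1.0 / (1.0 + math.exp(-t))

def ranking_probs(index1, index2, cdf=logistic_cdf):
    """Return (A, A_tilde) for true indices beta'X_1 = index1 and beta'X_2 = index2."""
    f1, f2 = cdf(-index1), cdf(-index2)
    a = (1 - f1) * f2        # P(Y_1 = 1 | X_1) * P(Y_2 = 0 | X_2)
    a_tilde = f1 * (1 - f2)  # P(Y_2 = 1 | X_2) * P(Y_1 = 0 | X_1)
    return a, a_tilde

# Hypothetical pair with beta'X_1 = 1.0 > beta'X_2 = -0.5.
a, a_tilde = ranking_probs(1.0, -0.5)
gap = a - a_tilde  # algebraically F_eps(-beta'X_2) - F_eps(-beta'X_1)
print(a, a_tilde, gap)
```

The product term cancels in the difference, so the gap depends only on the two values of $F_\epsilon $, which is why monotonicity of $F_\epsilon $ is all the argument requires.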

The integral range of $AUC_\infty (\beta )$ is $\beta ^{\prime }(X_1-X_2)>0$, while for a parameter $\tilde{\beta }$ the integral range is $\tilde{\beta }^{\prime }(X_1-X_2)>0$. It may be the case that $X$ is concentrated in the area $\tilde{\beta }^{\prime }(X_1-X_2)>0$, with relatively few observations in $\beta ^{\prime }(X_1-X_2)>0$. To ensure that this is not the case, it is assumed that $X$ is drawn from a distribution that is symmetric around zero. The lines $\beta ^{\prime }(X_1-X_2)=0$ and $\tilde{\beta }^{\prime }(X_1-X_2)=0$ both pass through the origin in $D_X\times D_X$ space; therefore the symmetry of $X$ around zero ensures that $AUC_\infty (\beta )\ge AUC_\infty (\tilde{\beta })$. $\square $
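A small simulation can illustrate the conclusion of Lemma 2 in a finite sample. This is a sketch under stated assumptions, not the paper's procedure: the regressors are taken to be independent standard normal (hence symmetric around zero), the errors are Laplace rather than logistic or normal, the model is assumed to be $Y=\mathbb{1}(\beta ^{\prime }X+\epsilon >0)$, and the parameter values, sample size and function names are hypothetical.

```python
import random

def empirical_auc(scores, labels):
    """Share of (Y=1, Y=0) pairs in which the Y=1 observation has the higher score."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1 for p in pos for q in neg if p > q)
    return wins / (len(pos) * len(neg))

def simulate(beta, n, rng):
    """Draw X symmetric around zero and Y = 1{beta'X + eps > 0} with Laplace errors."""
    xs, ys = [], []
    for _ in range(n):
        x = (rng.gauss(0, 1), rng.gauss(0, 1))
        eps = rng.expovariate(1.0) - rng.expovariate(1.0)  # Laplace(0, 1) draw
        ys.append(1 if beta[0] * x[0] + beta[1] * x[1] + eps > 0 else 0)
        xs.append(x)
    return xs, ys

def auc_at(b, xs, ys):
    """Empirical AUC when observations are ranked by the index b'X."""
    return empirical_auc([b[0] * x[0] + b[1] * x[1] for x in xs], ys)

rng = random.Random(1)
beta = (1.0, 2.0)  # hypothetical true parameter
xs, ys = simulate(beta, 400, rng)
auc_true = auc_at(beta, xs, ys)
auc_other = auc_at((1.0, -1.0), xs, ys)  # a deliberately wrong direction
print(auc_true, auc_other)
```

Ranking by the true index gives the larger AUC here even though no distributional form for the errors is used in the ranking step.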

1.2 Proof of Lemma 3

Proof

Suppose there exists a pair $X_1\in D_X$, $X_2 \in D_X$ such that $\beta ^{\prime } X_1> \beta ^{\prime } X_2$ but $\tilde{\beta }^{\prime } X_1< \tilde{\beta }^{\prime } X_2$. Then there exists a neighborhood $\tilde{U}(X_1)$ of the point $X_1$ such that, if we replace $X_1$ with any element $\tilde{X}$ of $\tilde{U}(X_1)$, the inequalities $\beta ^{\prime } \tilde{X}> \beta ^{\prime } X_2$ and $\tilde{\beta }^{\prime } \tilde{X}< \tilde{\beta }^{\prime } X_2$ remain valid.

Define $E_r$:

$$\begin{aligned} E_r=\min \left(\left\vert \frac{\beta ^{\prime }(X_1-X_2)}{2\beta _r k}\right\vert , \left\vert \frac{\tilde{\beta }^{\prime }(X_1-X_2)}{2\tilde{\beta }_r k}\right\vert \right), \quad r=1,2,\ldots ,k. \end{aligned}$$

(9)

Now define the neighborhood $\tilde{U}(X_1)$ of the point $X_1$ as the set of all $\tilde{X}$ whose components satisfy $X_{1,r}-E_r \le \tilde{X}_r \le X_{1,r}+E_r$, $r=1,\ldots ,k$. In general, $\tilde{U}(X_1)$ need not be a subset of $D_X$.
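The reason this box-shaped neighborhood preserves both strict inequalities can be spelled out as follows. This is a sketch of the omitted step, assuming that every component $\beta _r$ and $\tilde{\beta }_r$ is nonzero so that $E_r$ in Eq. (9) is well defined. For any $\tilde{X}\in \tilde{U}(X_1)$:

$$\begin{aligned} \vert \beta ^{\prime }(\tilde{X}-X_1)\vert &\le \sum _{r=1}^{k}\vert \beta _r\vert \,\vert \tilde{X}_r-X_{1,r}\vert \le \sum _{r=1}^{k}\vert \beta _r\vert E_r \le \sum _{r=1}^{k}\frac{\vert \beta ^{\prime }(X_1-X_2)\vert }{2k}=\frac{\beta ^{\prime }(X_1-X_2)}{2}, \\ \beta ^{\prime }(\tilde{X}-X_2)&=\beta ^{\prime }(X_1-X_2)+\beta ^{\prime }(\tilde{X}-X_1)\ge \frac{\beta ^{\prime }(X_1-X_2)}{2}>0. \end{aligned}$$

The same bound with $\tilde{\beta }$ in place of $\beta $ gives $\tilde{\beta }^{\prime }(\tilde{X}-X_2)\le \tilde{\beta }^{\prime }(X_1-X_2)/2<0$, so both inequalities hold throughout $\tilde{U}(X_1)$.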

The convergence in assumption 5 implies that there exists $r<\infty $ such that $U_r(X_1)\subset \tilde{U}(X_1)$. Hence, $P_X( \tilde{\beta }^{\prime } \tilde{X}< \tilde{\beta }^{\prime } X_2 \text{ and } \beta ^{\prime } \tilde{X}> \beta ^{\prime } X_2)>0$. $\square $

1.3 Proof of Lemma 4

Proof

The previous lemma implies that if there exists a pair $X_1\in D_X$, $X_2 \in D_X$ such that $\beta ^{\prime } X_1> \beta ^{\prime } X_2$ but $\tilde{\beta }^{\prime } X_1< \tilde{\beta }^{\prime } X_2$, then these inequalities hold with nonzero probability. Together with assumption 4, this gives $AUC_\infty (\beta )>AUC_\infty (\tilde{\beta })$ (see the proof of Lemma 2).

Suppose that $\tilde{\beta }$ is such that $\tilde{\beta }^{\prime } X_1> \tilde{\beta }^{\prime } X_2$ holds whenever $\beta ^{\prime } X_1> \beta ^{\prime } X_2$ holds. We assumed that the first element of $X$ has a strictly increasing continuous distribution function. Take a pair $X_1,X_2 \in D_X$ and consider a sequence $\eta _r$, $r=1,2,3,\ldots $, such that $\eta _r<\beta ^{\prime }(X_{1}-X_{2})/\beta _1$, where $\beta _1$ is the first element of the vector $\beta $, and $\lim _{r\rightarrow \infty }\eta _r=\beta ^{\prime }(X_{1}-X_{2})/\beta _1$. Then the inequality $\beta ^{\prime } X_{1}> \beta ^{\prime } X_{2}+\beta _1\eta _r$ is satisfied, and likewise the inequality with $\tilde{\beta }$: $\tilde{\beta }^{\prime } X_{1}> \tilde{\beta }^{\prime } X_{2}+\tilde{\beta }_1\eta _r$. As $r\rightarrow \infty $, the last inequality may be rewritten as $(\tilde{\beta }^{\prime }- (\tilde{\beta }_1/\beta _1)\beta ^{\prime })(X_{1}-X_2)\ge 0$. Taking instead a sequence $\eta _r>\beta ^{\prime }(X_{1}-X_{2})/\beta _1$ converging to $\beta ^{\prime }(X_{1}-X_{2})/\beta _1$, the opposite inequality is found: $(\tilde{\beta }^{\prime }- (\tilde{\beta }_1/\beta _1)\beta ^{\prime })(X_{1}-X_2)\le 0$. Hence, $(\tilde{\beta }^{\prime }- (\tilde{\beta }_1/\beta _1)\beta ^{\prime })(X_{1}-X_2)= 0$. Therefore, $\tilde{\beta }/\tilde{\beta }_1=\beta /\beta _1$: the coefficients $\beta $ and $\tilde{\beta }$ are proportional.

If $\tilde{\beta }=c\beta $ for a $c>0$, the proof that $AUC_{\infty }(\tilde{\beta })=AUC_{\infty }(\beta )$ is trivial. It follows directly from the definition of the AUC.

It follows that $AUC_\infty (\tilde{\beta })=AUC_\infty (\beta )$ is equivalent to $\tilde{\beta }=c\beta $, where $c$ is a constant. $\square $
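The role of the proportionality constant can be illustrated on simulated data. The sketch below is an illustration under assumptions, not part of the proof: the data-generating values, the normal errors and the function names are hypothetical. It compares the empirical AUC at $\beta $, at $2\beta $ and at $-\beta $.

```python
import random

def empirical_auc(scores, labels):
    """Share of (Y=1, Y=0) pairs in which the Y=1 observation has the higher score."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1 for p in pos for q in neg if p > q)
    return wins / (len(pos) * len(neg))

rng = random.Random(2)
beta = (1.0, -0.5)  # hypothetical true parameter
xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
ys = [1 if beta[0] * x[0] + beta[1] * x[1] + rng.gauss(0, 1) > 0 else 0 for x in xs]

def auc_at(b):
    """Empirical AUC when observations are ranked by the index b'X."""
    return empirical_auc([b[0] * x[0] + b[1] * x[1] for x in xs], ys)

auc_beta = auc_at(beta)
auc_scaled = auc_at((2.0 * beta[0], 2.0 * beta[1]))  # c = 2 > 0: same ranking
auc_flipped = auc_at((-beta[0], -beta[1]))           # c = -1 < 0: reversed ranking
print(auc_beta, auc_scaled, auc_flipped)
```

A positive multiple leaves every pairwise comparison unchanged, so the AUC is identical, while a negative multiple reverses the ranking and, in the absence of ties, yields one minus the original value. This is why the parameter is identified only up to a positive scale normalization.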

1.4 Proof of Lemma 5

Proof

To show that $AUC_\infty (b)$ is continuous in $b$, it is sufficient to show that $AUC_\infty (b+\Delta b)\rightarrow AUC_\infty (b)$ as $\Delta b\rightarrow 0$. Rewrite Eq. (4) for $b+\Delta b$:

$$\begin{aligned} AUC_\infty (b+\Delta b)&= C P_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1>(b^{\prime }+\Delta b^{\prime })X_2;\quad Y_1=1, Y_2=0\big ) \\ &=CP_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1>(b^{\prime }+\Delta b^{\prime })X_2;\quad b^{\prime }X_1>b^{\prime }X_2;\quad Y_1=1, Y_2=0\big ) \\ &\quad +CP_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1>(b^{\prime }+\Delta b^{\prime })X_2;\quad b^{\prime }X_1\le b^{\prime }X_2;\quad Y_1=1, Y_2=0\big ). \end{aligned}$$

(10)

A similar decomposition can be applied to $AUC_\infty (b)$:

$$\begin{aligned} AUC_\infty (b)&=CP_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1>(b^{\prime }+\Delta b^{\prime })X_2;\quad b^{\prime }X_1>b^{\prime }X_2;\quad Y_1=1, Y_2=0\big ) \\ &\quad +CP_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1\le (b^{\prime }+\Delta b^{\prime })X_2;\quad b^{\prime }X_1> b^{\prime }X_2;\quad Y_1=1, Y_2=0\big ). \end{aligned}$$

(11)

Subtracting $AUC_\infty (b)$ from $AUC_\infty (b+\Delta b)$ we get:

$$\begin{aligned} AUC_\infty (b+\Delta b)-AUC_\infty (b)&=CP_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1>(b^{\prime }+\Delta b^{\prime })X_2;\quad b^{\prime }X_1\le b^{\prime }X_2;\quad Y_1=1, Y_2=0\big ) \\ &\quad - CP_{X,\epsilon }\big ((b^{\prime }+\Delta b^{\prime })X_1\le (b^{\prime }+\Delta b^{\prime })X_2;\quad b^{\prime }X_1> b^{\prime }X_2;\quad Y_1=1, Y_2=0\big ). \end{aligned}$$

(12)

Taking the limit as $\Delta b\rightarrow 0$ gives:

$$\begin{aligned} \lim _{\Delta b\rightarrow 0} \Big (AUC_\infty (b+\Delta b)-AUC_\infty (b)\Big )&=CP_{X,\epsilon }\big (b^{\prime }X_1\ge b^{\prime }X_2;\quad b^{\prime }X_1\le b^{\prime }X_2;\quad Y_1=1, Y_2=0\big ) \\ &\quad -CP_{X,\epsilon }\big (b^{\prime }X_1\le b^{\prime } X_2;\quad b^{\prime }X_1> b^{\prime }X_2;\quad Y_1=1, Y_2=0\big ). \end{aligned}$$

(13)

The first term on the right-hand side of Eq. (13) may be rewritten as $CP_{X,\epsilon }\big (b^{\prime }X_1= b^{\prime }X_2; Y_1=1, Y_2=0\big )$. It is equal to zero because of assumption 6. In the second term, the events $b^{\prime }X_1\le b^{\prime } X_2$ and $b^{\prime }X_1> b^{\prime }X_2$ are mutually exclusive, so the probability of their intersection is also equal to zero. Therefore, $AUC_\infty (b)$ is continuous in $b$. $\square $


About this article

Cite this article

Fedotenkov, I. Consistency of the estimator of binary response models based on AUC maximization. Stat Methods Appl 22, 381–390 (2013). https://doi.org/10.1007/s10260-013-0229-4


Accepted: 13 January 2013
Published: 02 February 2013
Issue Date: August 2013
DOI: https://doi.org/10.1007/s10260-013-0229-4
