The relationship between Gini terminology and the ROC curve

METRON

Abstract

This note corrects a common error and clarifies how the Gini terminology of the economic literature relates to that of the diagnostic and classification literature. Specifically, it clarifies the connection between the area under the receiver operating characteristic (ROC) curve, which is frequently used in diagnosis and classification, and the Gini terminology, which is mainly used in economics. It is shown that the area under the ROC curve is related to the covariance between the two vectors \(Y=\{y_i\}_{i=1}^{n_0}\) and \(\{i/{n_0}\}_{i=1}^{n_0}\). Here \(y_i\) is the number of items classified into group 1 that lie between the \((i-1)\mathrm{th}\) and the \(i\mathrm{th}\) items classified into group 0, and \(n_0\) is the number of items in group 0.
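To make the two vectors concrete, here is a minimal Python sketch that builds \(y_i\) from an ordered sequence of group labels and computes its covariance with \(i/n_0\). The function names `gap_counts` and `cov_with_ranks`, the example ordering, and the use of the divisor \(n_0\) in the covariance are assumptions made for illustration; the treatment of group-1 items that follow the last group-0 item is not specified in the abstract, so the example ordering deliberately ends with a group-0 item.

```python
from fractions import Fraction

def gap_counts(labels):
    """Return y_1..y_n0: the number of group-1 items between
    consecutive group-0 items in an ordered label sequence."""
    y, run = [], 0
    for label in labels:
        if label == 1:
            run += 1          # one more group-1 item in the current gap
        else:
            y.append(run)     # a group-0 item closes the gap
            run = 0
    return y                  # trailing group-1 items are not counted (assumption)

def cov_with_ranks(y):
    """Covariance (divisor n0, an assumption) between y_i and i/n0."""
    n0 = len(y)
    x = [Fraction(i, n0) for i in range(1, n0 + 1)]
    x_bar = sum(x) / n0
    y_bar = Fraction(sum(y), n0)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n0

# Hypothetical ordering of eight items (1 = group 1, 0 = group 0).
labels = [1, 1, 0, 1, 0, 0, 1, 0]
y = gap_counts(labels)
print(y, cov_with_ranks(y))
```

Exact rational arithmetic via `fractions.Fraction` is used only so that the result can be compared with hand calculations without rounding error.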

Acknowledgements

We thank Itai Dattner, David Hand, Foster Provost, Benjamin Reiser, Saharon Rosset and Amit Shelef for helpful discussions.

Author information

Corresponding author

Correspondence to Edna Schechtman.

Appendix: Proofs of Theorems 1 and 2

Proof of Theorem 1:

Recall the definition of \({\tilde{t}}_i\) given just before the statement of Theorem 1, and denote \(x_i=F_T(\tilde{t}_i)=i/n_0\) for \(i=1,\dots ,n_0\), with the convention \(x_0=0\). Since \(\bar{x}=(n_0+1)/(2n_0)\),

$$\begin{aligned} \text {Cov}(Y,F_T(T))=\frac{1}{n_0}\sum _{i=1}^{n_0}(x_i-\bar{x})(y_i-\bar{y})=\frac{1}{n_0}\sum _{i=1}^{n_0}x_iy_i-\frac{n_0+1}{2n_0}\bar{y} \end{aligned}$$
(5)

We start with the first term of the right-hand side of Eq. (5).

$$\begin{aligned} \frac{1}{n_0}\sum _{i=1}^{n_0}x_iy_i= & {} \frac{1}{n_0}\sum _{i=1}^{n_0}\left( \sum _{j=1}^{i}x_j-\sum _{j=0}^{i-1}x_j\right) y_i =\frac{1}{n_0}\sum _{i=1}^{n_0}\left( \sum _{j=1}^{i}x_j-\sum _{j=1}^{i}x_{j-1}\right) y_i\nonumber \\= & {} \frac{1}{n_0}\sum _{i=1}^{n_0}\sum _{j=1}^{i}(x_j-x_{j-1})y_i =\frac{1}{n_0}\sum _{j=1}^{n_0}\sum _{i=j}^{n_0}(x_j-x_{j-1})y_i\nonumber \\= & {} \frac{1}{n_0}\sum _{j=1}^{n_0}(x_j-x_{j-1})\sum _{i=j}^{n_0}y_i =\frac{1}{n^2_0}\sum _{j=1}^{n_0}\left( \sum _{i=j}^{n_0}y_i\right) \nonumber \\= & {} \frac{1}{n^2_0}\sum _{j=1}^{n_0}\left( \sum _{i=1}^{n_0}y_i-\sum _{i=1}^{j-1}y_i\right) =\bar{y}-\frac{1}{n^2_0}\sum _{j=1}^{n_0}\sum _{i=1}^{j-1}y_i. \end{aligned}$$
(6)
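The summation-by-parts step in Eq. (6) can be checked numerically. The sketch below compares the first and last expressions of Eq. (6) for an arbitrary vector of counts; the function names `lhs_eq6` and `rhs_eq6` and the test vector are hypothetical, and \(x_i=i/n_0\) is taken from the definition at the start of the proof.

```python
from fractions import Fraction

def lhs_eq6(y):
    """(1/n0) * sum_i x_i * y_i with x_i = i/n0."""
    n0 = len(y)
    return sum(Fraction(i, n0) * y[i - 1] for i in range(1, n0 + 1)) / n0

def rhs_eq6(y):
    """y_bar - (1/n0^2) * sum_{j=1}^{n0} sum_{i=1}^{j-1} y_i."""
    n0 = len(y)
    y_bar = Fraction(sum(y), n0)
    # y[:j-1] holds y_1..y_{j-1}; it is empty for j = 1.
    double_sum = sum(sum(y[: j - 1]) for j in range(1, n0 + 1))
    return y_bar - Fraction(double_sum, n0 ** 2)

y = [2, 1, 0, 1]  # hypothetical counts
print(lhs_eq6(y), rhs_eq6(y))
```

Both expressions agree for any vector of counts, which is what the chain of equalities in Eq. (6) asserts.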

So

$$\begin{aligned} \text {Cov}(Y,F_T(T))= & {} \frac{1}{n_0}\sum _{i=1}^{n_0}x_iy_i-\frac{n_0+1}{2n_0}\bar{y}=\bar{y}-\frac{1}{n^2_0}\sum _{j=1}^{n_0}\sum _{i=1}^{j-1}y_i-\frac{n_0+1}{2n_0}\bar{y}\nonumber \\= & {} \frac{n_0-1}{2n_0}\bar{y}-\frac{1}{n^2_0}\sum _{j=1}^{n_0}\sum _{i=1}^{j-1}y_i= \frac{n_0+1}{2n^2_0}\sum _{i=1}^{n_0}y_i- \frac{1}{n^2_0}\sum _{j=1}^{n_0+1}\sum _{i=1}^{j-1}y_i . \end{aligned}$$
(7)

We still need to show that

$$\begin{aligned} \sum _{j=1}^{n_0+1}\sum _{i=1}^{j-1}y_i=\sum _{i=1}^{n_0}t_i. \end{aligned}$$

Indeed, since \(y_i=t_i-t_{i-1}\) with \(t_0=0\),

$$\begin{aligned} \sum _{j=1}^{n_0+1}\sum _{i=1}^{j-1}y_i= & {} \sum _{i=1}^{n_0}(n_0+1-i)y_i=\sum _{i=1}^{n_0}(n_0+1-i)(t_i-t_{i-1})\\= & {} \sum _{i=0}^{n_0}(n_0+1-i)t_i-\sum _{i=0}^{n_0-1}(n_0-i)t_i=\sum _{i=0}^{n_0-1}t_i+t_{n_0}=\sum _{i=1}^{n_0}t_i. \end{aligned}$$
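The identity just proved can also be confirmed by direct computation. The following sketch assumes that \(t_i\) is the cumulative sum \(y_1+\dots +y_i\) with \(t_0=0\), which is what the relation \(y_i=t_i-t_{i-1}\) and the final step of the display imply; the function names and the test vector are hypothetical.

```python
from itertools import accumulate

def double_sum(y):
    """sum_{j=1}^{n0+1} sum_{i=1}^{j-1} y_i, evaluated literally."""
    n0 = len(y)
    return sum(sum(y[: j - 1]) for j in range(1, n0 + 2))

def weighted_sum(y):
    """sum_{i=1}^{n0} (n0 + 1 - i) * y_i, the intermediate form."""
    n0 = len(y)
    return sum((n0 + 1 - i) * y[i - 1] for i in range(1, n0 + 1))

def cumulative_counts(y):
    """t_1..t_n0 as running totals of y (t_0 = 0 is implicit)."""
    return list(accumulate(y))

y = [2, 1, 0, 1]  # hypothetical counts
print(double_sum(y), weighted_sum(y), sum(cumulative_counts(y)))
```

All three quantities coincide, matching the first, second, and last expressions in the display above.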

Proof of Theorem 2:

Note that \(\sum _{i=1}^{n_0}y_i=n_1\), so \(\bar{y}=n_1/n_0\) and by Theorem 1,

$$\begin{aligned} -\text {Cov}(Y,F_T(T))+0.5\bar{y}=\frac{1}{n^2_0}\sum _{i=1}^{n_0}t_i-\frac{(n_0+1)n_1}{2n^2_0}+0.5\bar{y}=\frac{1}{n^2_0}\sum _{i=1}^{n_0}t_i-\frac{n_1}{2n^2_0}. \end{aligned}$$

Dividing both sides by \(\bar{y}=\frac{n_1}{n_0}\) we get

$$\begin{aligned} -\frac{n_0}{n_1}\text {Cov}(Y,F_T(T))+0.5=\frac{\sum _{i=1}^{n_0}t_i}{n_1n_0}-\frac{1}{2n_0}=\hat{A}-\frac{1}{2n_0}, \end{aligned}$$

where the last equality follows from Eq. (3). This completes the proof.
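As a sanity check on Theorem 2, the sketch below evaluates both sides of the final identity for a given vector of counts. It assumes, in line with the proof, that \(n_1=\sum y_i\), that \(t_i\) is the running total of \(y\), that \(\hat{A}=\sum t_i/(n_0n_1)\), and that the covariance uses the divisor \(n_0\); the function names and the test vector are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def cov_y_rank(y):
    """Cov(Y, F_T(T)) with x_i = i/n0 and divisor n0."""
    n0 = len(y)
    x_bar = Fraction(n0 + 1, 2 * n0)
    y_bar = Fraction(sum(y), n0)
    return sum((Fraction(i, n0) - x_bar) * (y[i - 1] - y_bar)
               for i in range(1, n0 + 1)) / n0

def a_hat(y):
    """A_hat = sum_i t_i / (n0 * n1), with t_i the running total of y."""
    n0, n1 = len(y), sum(y)
    return Fraction(sum(accumulate(y)), n0 * n1)

def theorem2_sides(y):
    """Return (left side, right side) of the identity in Theorem 2."""
    n0, n1 = len(y), sum(y)   # requires n1 > 0
    lhs = -Fraction(n0, n1) * cov_y_rank(y) + Fraction(1, 2)
    rhs = a_hat(y) - Fraction(1, 2 * n0)
    return lhs, rhs

print(theorem2_sides([2, 1, 0, 1]))  # hypothetical counts
```

The two sides agree exactly, so the covariance-based expression differs from \(\hat{A}\) only by the correction term \(1/(2n_0)\), which vanishes as \(n_0\) grows.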

About this article

Cite this article

Schechtman, E., Schechtman, G. The relationship between Gini terminology and the ROC curve. METRON 77, 171–178 (2019). https://doi.org/10.1007/s40300-019-00160-7

  • DOI: https://doi.org/10.1007/s40300-019-00160-7
