Abstract
The objectives of this note are to correct a common error and to clarify the connection between the Gini terminology as used in the economic literature and the one used in the diagnostic and classification literature. More specifically, the connection between the area under the receiver operating characteristic (ROC) curve, which is frequently used in the diagnosis and classification literature, and the Gini terminology, which is mainly used in the economic literature, is clarified. It is shown that the area under the ROC curve is related to the covariance between the two vectors \(Y=\{y_i\}_{i=1}^{n_0}\) and \(\{i/{n_0}\}_{i=1}^{n_0}\). Here \(y_i\) is the number of items classified to group 1 lying between the \((i-1)\mathrm{th}\) and the \(i\mathrm{th}\) items classified to group 0, and \(n_0\) is the number of items in group 0.
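The quantities named in the abstract can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's own derivation: it assumes there are no tied scores, that larger scores point to group 1, and that no group-1 score exceeds the largest group-0 score, so that the counts \(y_i\) sum to \(n_1\). The function names (`gap_counts`, `auc_pairwise`, `auc_from_covariance`) are hypothetical. The identity used, AUC \(= \mathrm{cov}(Y, i/n_0)/\bar{y} + 1/2 - 1/(2n_0)\) with the population (divide by \(n_0\)) covariance, was worked out for this example from the definition of \(y_i\) and should not be read as the statement of Theorem 1.

```python
import numpy as np


def gap_counts(scores0, scores1):
    """Return y, where y[i-1] counts group-1 scores lying between the
    (i-1)th and the ith smallest group-0 scores (i = 1, ..., n0).

    Assumption: no ties, and no group-1 score above the largest
    group-0 score, so that y sums to n1.
    """
    t = np.sort(np.asarray(scores0, dtype=float))
    s = np.asarray(scores1, dtype=float)
    # For each group-1 score: how many group-0 scores lie strictly below it.
    idx = np.searchsorted(t, s, side="left")
    if np.any(idx >= t.size):
        raise ValueError("a group-1 score exceeds every group-0 score")
    return np.bincount(idx, minlength=t.size)


def auc_pairwise(scores0, scores1):
    """Fraction of (group-0, group-1) pairs in which the group-1 score is larger."""
    t = np.asarray(scores0, dtype=float)
    s = np.asarray(scores1, dtype=float)
    return float(np.mean(s[:, None] > t[None, :]))


def auc_from_covariance(y):
    """AUC recovered from cov(Y, i/n0); identity derived for this sketch only."""
    y = np.asarray(y, dtype=float)
    n0 = y.size
    x = np.arange(1, n0 + 1) / n0
    # Population covariance (divide by n0, not n0 - 1).
    cov = np.mean(y * x) - np.mean(y) * np.mean(x)
    return float(cov / np.mean(y) + 0.5 - 1.0 / (2 * n0))


if __name__ == "__main__":
    group0 = [0.1, 0.4, 0.35, 0.8, 0.9]   # made-up scores for illustration
    group1 = [0.3, 0.5, 0.7, 0.85, 0.2]   # made-up scores for illustration
    y = gap_counts(group0, group1)
    print("y =", y.tolist())
    print("AUC (pairwise)   =", auc_pairwise(group0, group1))
    print("AUC (covariance) =", auc_from_covariance(y))
```

The two AUC values agree because a group-1 item counted in \(y_i\) has exactly \(i-1\) group-0 items below it, so the pairwise count equals \(\sum_i (i-1)\,y_i\), which is a linear function of the covariance between \(Y\) and \(i/n_0\).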
Acknowledgements
We thank Itai Dattner, David Hand, Foster Provost, Benjamin Reiser, Saharon Rosset and Amit Shelef for helpful discussions.
Appendix: Proofs of Theorems 1 and 2
Proof of Theorem 1:
Recall the definition of \({\tilde{t}}_i\) given just before the statement of Theorem 1, and denote \(x_i=F_T(\tilde{t}_i)=i/n_0\) for \(i=1,\dots ,n_0\).
We start with the first term of the right-hand side of Eq. (5).
So
We still need to show that
Indeed,
Proof of Theorem 2:
Note that \(\sum _{i=1}^{n_0}y_i=n_1\), so \(\bar{y}=n_1/n_0\) and by Theorem 1,
Dividing both sides by \(\bar{y}=\frac{n_1}{n_0}\) we get
where the last equality follows from Eq. (3). This completes the proof.
Cite this article
Schechtman, E., Schechtman, G. The relationship between Gini terminology and the ROC curve. METRON 77, 171–178 (2019). https://doi.org/10.1007/s40300-019-00160-7