On the definition of a concentration function relevant to the ROC curve

This is a reader's reaction to a recent paper by E. Schechtman and G. Schechtman (Metron, 2019) about the correct definition of a concentration function for the diagnostic, i.e. supervised classification, problem. We propose and motivate a different definition and refer to the relevant literature.

together two different distributions, and cannot be reduced to univariate indices such as the Gini. Schechtman and Schechtman [5] build on the wealth of research reviewed in the monograph by Yitzhaki and Schechtman [6], where a whole technology based on the Gini and the co-Gini is proposed as an alternative to traditional variance- and covariance-based methods to study variability, correlation, regression and the like.
Of course, studying how jointly distributed random variables interrelate is a fundamental problem in Statistics and its applications to Economics and the Sciences. However, when turning to the diagnostic (or classification) setup, where ROC curves are naturally used, we observe one or more diagnostic variables (called features in the Machine Learning literature) from two populations and try to set up a rule that discriminates between them. Some special requirements can then be identified:
1. Two probability distributions should be evaluated as alternative, mutually exclusive explanations of the data, rather than from a joint point of view; for example, a diagnostic marker observed on a patient has either the sick patient distribution or the healthy patient distribution, and in no way can the same marker be observed jointly under both sick and healthy conditions.
2. The definition of the ROC curve and the associated concentration function should be viable also in the multivariate setup; for example, more than one diagnostic marker can be observed on the same patient.
3. The definition of the ROC curve and the associated concentration function must be given both at the population and at the sample level, as widely discussed in the ROC literature (see for example [2]); a clear definition of the ROC curve at the population level is necessary to understand basic ideas and to give appropriate definitions.
We claim that the definition of (absolute and) relative concentration curve contained in [5] is not appropriate for the diagnostic setup since:
a. conditional distributions are used in Definition 1 of [5], thus contradicting requirement 1;
b. percentiles are used in the same definition, thus contradicting requirement 2;
c. in [5], the discussion on the ROC curve is maintained at the sample level only, making it hard to understand what is, for example, the definition of the population ROC curve.
We believe the correct definition of concentration function for the diagnostic setup was given by Cifarelli and Regazzini in [1] and discussed in [4], where the relationship to the ROC curve is also established. The purpose of this paper is therefore to show that the definition of concentration function as given by Cifarelli and Regazzini is more suitable to the diagnostic setup since it is a one-to-one transformation of the ROC curve of the optimal diagnostic test, i.e. the one based on the likelihood ratio. Some examples are given in Section 4.

The concentration function by Cifarelli and Regazzini and its relationship to the ROC curve of the optimal test
To favor the comparison with [5], we adopt a similar notation and make some simplifying assumptions. This way, we can state special cases of the main results in [1] and [4] which possess sufficient generality to clarify our point but, at the same time, avoid measure-theoretic complications.
In particular, assume that $Y$ is a continuous random variable with distribution function $F_Y$ and positive density $f_Y$, and that $X$ is a continuous random variable with distribution function $F_X$ and positive density $f_X$. $Y$ and $X$ represent, respectively, the relevant diagnostic variable under the two conditions to be compared by a diagnostic test. For example, $Y$ may be a biological marker measured in a diseased person, whereas $X$ is the same marker when measured in a healthy person. Next, define the likelihood ratios
$$L_X = \frac{f_Y(X)}{f_X(X)}, \qquad L_Y = \frac{f_Y(Y)}{f_X(Y)},$$
which are bona fide random variables since they are functions of $X$ and $Y$, respectively.
The definition is a special case of the one given in [1]; according to their suggestion, for each $p \in [0, 1]$ the concentration function $\varphi(p)$ is the likelihood ratio $Y$-mass of a set collecting the smallest $p$ fraction of the likelihood ratio $X$-mass, that is, $\varphi(p) = P(L_Y \le c_p)$ with $c_p$ the $p$-quantile of $L_X$. The strongest simplifying assumption made here is that $L_X$ and $L_Y$ are continuous random variables; that is not true in general, even if $X$ and $Y$ are continuous, since partially parallel (i.e. locally proportional) densities $f_X$ and $f_Y$ may create atoms in the distributions of $L_X$ and $L_Y$.
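The verbal definition above can be illustrated by a small Monte Carlo sketch. It is only an illustration under stated assumptions: the likelihood ratio is taken to be $f_Y/f_X$, the function name `concentration_mc` is hypothetical, and the two unit-variance normal populations (with mean shift `DELTA`) are chosen here for convenience, not taken from [1] or [5]. In this monotone case a closed form, $\Phi(\Phi^{-1}(p) - \texttt{DELTA})$, is available for comparison.

```python
import math
import random
from statistics import NormalDist


def concentration_mc(lik_ratio, sample_x, sample_y, p):
    """Monte Carlo estimate of phi(p) = P(L_Y <= c_p), with c_p the p-quantile of L_X."""
    lx = sorted(lik_ratio(x) for x in sample_x)
    # Index of the empirical p-quantile of L_X, clipped to a valid position.
    k = min(len(lx) - 1, max(0, math.ceil(p * len(lx)) - 1))
    c_p = lx[k]
    # Fraction of the Y sample whose likelihood ratio falls at or below c_p.
    return sum(lik_ratio(y) <= c_p for y in sample_y) / len(sample_y)


DELTA = 1.0  # assumed mean shift: X ~ N(0, 1), Y ~ N(DELTA, 1)


def lik_ratio(z):
    """f_Y(z) / f_X(z) for the two unit-variance normal densities."""
    return math.exp(DELTA * z - 0.5 * DELTA ** 2)


rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
ys = [rng.gauss(DELTA, 1.0) for _ in range(20000)]

std = NormalDist()
for p in (0.25, 0.5, 0.75):
    estimate = concentration_mc(lik_ratio, xs, ys, p)
    exact = std.cdf(std.inv_cdf(p) - DELTA)  # closed form for this monotone case
    print(f"p={p:.2f}  Monte Carlo={estimate:.3f}  closed form={exact:.3f}")
```

Only the likelihood ratio enters the computation, so the same function could be applied unchanged to multivariate observations, provided `lik_ratio` accepts them.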
The likelihood ratio is the fundamental measure of comparison of the $Y$ and $X$ distributions and plays a role similar to (and, in our opinion, more appropriate than) the one the conditional expectation of $Y$ given $X$ plays in [5].
The likelihood ratio is also a fundamental tool in the definition of the following optimal diagnostic test in the no-data situation (i.e. when the $X$ and $Y$ distributions are known and no estimation is needed).
Definition 2 Suppose an observation $Z$ has to be assigned either to the $X$ or to the $Y$ population. Then, for some $0 < t < 1$, the likelihood ratio based diagnostic test assigns $Z$ to the $Y$ population if $f_Y(Z)/f_X(Z) > c_t$, where $c_t$ is the $t$-quantile of $L_X$, and to the $X$ population otherwise.
As is well known in the ROC literature, the diagnostic rule based on the likelihood ratio is the best possible test we can construct based on $Z$, as discussed for example in [7]. Optimality basically stems from the Neyman-Pearson lemma. The point of the above construction is that the optimal diagnostic test has a ROC curve which is a bijective transformation of the concentration function in Definition 1 above, as shown in the following theorem.
Theorem 1 Under the above assumptions, the ROC curve of the optimal likelihood ratio based diagnostic test is
$$\mathrm{ROC}_{opt}(q) = 1 - \varphi(1 - q), \qquad 0 < q < 1, \tag{2}$$
while, as usual, $\mathrm{ROC}_{opt}(0) = 0$ and $\mathrm{ROC}_{opt}(1) = 1$.
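Proof. As usual, the ROC curve can be calculated as a parametric curve in $t$ by computing separately the True Positive Rate (TPR) and the False Positive Rate (FPR). Writing $c_t$ for the $t$-quantile of $L_X$,
$$\mathrm{TPR}(t) = P(L_Y > c_t) = 1 - \varphi(t), \qquad \mathrm{FPR}(t) = P(L_X > c_t) = 1 - t.$$
Setting $q = 1 - t$ and substituting, we obtain the explicit form (2). As a consequence, since $\varphi$ is nondecreasing, continuous and convex, the ROC of the optimal test is a nondecreasing, continuous and concave function, while ROC curves of suboptimal diagnostic rules may not be.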
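The parametric construction in the proof can be traced numerically. The sketch below is an illustration under assumptions that are not in the text: $X$ is exponential with rate 1, $Y$ is exponential with rate `LAM` = 0.5, the likelihood ratio is $f_Y/f_X$, and the rule assigns an observation to $Y$ when its likelihood ratio exceeds the $t$-quantile of $L_X$. For this pair the concentration function has the closed form $\varphi(p) = 1 - (1-p)^{\texttt{LAM}}$, so the simulated rates can be compared with $1 - t$ and $1 - \varphi(t)$.

```python
import math
import random

LAM = 0.5  # assumed rate of Y; X is exponential with rate 1


def lik_ratio(z):
    """f_Y(z) / f_X(z) for the two exponential densities (increasing in z when LAM < 1)."""
    return LAM * math.exp((1.0 - LAM) * z)


def phi(p):
    """Closed-form concentration function for this exponential pair."""
    return 1.0 - (1.0 - p) ** LAM


def roc_point(t, xs, ys):
    """(FPR, TPR) of the likelihood ratio rule whose threshold is the t-quantile of L_X."""
    # The ratio is increasing, so its t-quantile is the ratio at the t-quantile of X.
    c_t = lik_ratio(-math.log(1.0 - t))
    fpr = sum(lik_ratio(x) > c_t for x in xs) / len(xs)
    tpr = sum(lik_ratio(y) > c_t for y in ys) / len(ys)
    return fpr, tpr


rng = random.Random(0)
xs = [rng.expovariate(1.0) for _ in range(20000)]
ys = [rng.expovariate(LAM) for _ in range(20000)]

points = {t: roc_point(t, xs, ys) for t in (0.2, 0.5, 0.8)}
for t, (fpr, tpr) in points.items():
    print(f"t={t:.1f}  FPR={fpr:.3f} (1-t={1 - t:.3f})  TPR={tpr:.3f} (1-phi(t)={1 - phi(t):.3f})")
```

In this example the resulting curve is $q \mapsto q^{\texttt{LAM}}$, which lies above the diagonal and is concave for `LAM` below 1.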

The Lorenz curve and the AUC of the optimal test
An interesting special case discussed in [1] arises when $X$ is a positive random variable with finite mean $\mu_X = \int_0^\infty x f_X(x)\,dx$ and $Y$ is the length-biased version of $X$, i.e. $f_Y(y) = y f_X(y)/\mu_X$, $y > 0$.
In economic applications, $Y$ represents wealth; in general, it may be a transferable character, i.e. some characteristic which can in theory be transported from one unit of the population to another. This is the famous Lorenz-Gini setup. The likelihood ratios in this case simplify to $L_X = X/\mu_X$ and $L_Y = Y/\mu_X$, so that
$$\varphi(p) = P\left(Y \le F_X^{-1}(p)\right) = \frac{1}{\mu_X}\int_0^{F_X^{-1}(p)} x f_X(x)\,dx,$$
in which we recognize one of the usual forms of the Lorenz curve. We have just proven the following
Corollary 1 In the Lorenz-Gini scenario, i.e. when $f_Y(y) = y f_X(y)/\mu_X$, the concentration curve is the usual Lorenz curve.
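Corollary 1 can be checked on a simple case. The sketch below assumes, purely for illustration, that $X$ is exponential with rate 1, so that its length-biased version $Y$ is Gamma with shape 2 and rate 1. It compares the concentration function, computed as the $Y$-mass below the $p$-quantile of $X$, with the empirical Lorenz curve of a simulated sample of $X$. The function names are hypothetical.

```python
import math
import random


def phi_length_biased(p):
    """phi(p) = P(Y <= x_p) for Y ~ Gamma(shape 2, rate 1) and x_p the p-quantile of X ~ Exp(1)."""
    x_p = -math.log(1.0 - p)
    # Gamma(2, 1) distribution function evaluated at x_p.
    return 1.0 - math.exp(-x_p) * (1.0 + x_p)


def empirical_lorenz(sample, p):
    """Share of the total held by the poorest fraction p of the sample."""
    s = sorted(sample)
    k = int(p * len(s))
    return sum(s[:k]) / sum(s)


rng = random.Random(1)
wealth = [rng.expovariate(1.0) for _ in range(50000)]

for p in (0.25, 0.5, 0.75):
    print(f"p={p:.2f}  phi={phi_length_biased(p):.3f}  empirical Lorenz={empirical_lorenz(wealth, p):.3f}")
```

For the exponential distribution both quantities reduce to $p + (1-p)\log(1-p)$.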
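A second important consequence of Theorem 1 is about the AUC of the optimal likelihood ratio based test, which can be easily computed as follows:
$$\mathrm{AUC}_{opt} = \int_0^1 \mathrm{ROC}_{opt}(q)\,dq = \int_0^1 \left(1 - \varphi(1 - q)\right)dq = 1 - \int_0^1 \varphi(p)\,dp. \tag{3}$$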
Now, in the Lorenz-Gini scenario, the Gini concentration coefficient (Gini) is defined to be twice the area between the diagonal and the Lorenz curve. Since the concentration curve is a generalization of the Lorenz curve which describes the concentration of one variable with respect to another (and not necessarily its length-biased version), we can define the generalized Gini as
$$\mathrm{Gini}_{gen} = 2\int_0^1 \left(p - \varphi(p)\right)dp,$$
similarly to the co-Gini in [5]. Substituting into expression (3) we obtain the following corollary.

Corollary 2
The AUC of the optimal likelihood ratio based diagnostic test equals $\mathrm{AUC}_{opt} = \frac{1}{2}\left(1 + \mathrm{Gini}_{gen}\right)$.
The same result can be found in [3] and is mentioned by several other authors. We stress that the result is true for the likelihood ratio based test and, of course, for models with monotone likelihood ratios (like the example considered in [3]) but not in general for the AUC of any ROC, as also noted in [5]. A few more results we have obtained agree with the results in [3], but they have been presented here in a more general form at the population level for continuous variables, for which some examples are presented in the next section.
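The identity in Corollary 2 can be verified by numerical integration. The sketch below is a minimal check under assumptions introduced here: $X$ exponential with rate 1 and $Y$ exponential with rate `LAM` = 0.5, for which $\varphi(p) = 1 - (1-p)^{\texttt{LAM}}$, and the optimal ROC curve taken as $1 - \varphi(1-q)$. The helper `midpoint_integral` is a hypothetical name for a plain midpoint rule on $[0, 1]$.

```python
LAM = 0.5  # assumed rate of Y; X is exponential with rate 1


def phi(p):
    """Closed-form concentration function for the assumed exponential pair."""
    return 1.0 - (1.0 - p) ** LAM


def roc_opt(q):
    """ROC curve of the likelihood ratio based test, written through phi."""
    return 1.0 - phi(1.0 - q)


def midpoint_integral(g, n=10000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))


gini_gen = 2.0 * midpoint_integral(lambda p: p - phi(p))
auc_opt = midpoint_integral(roc_opt)

print(f"Gini_gen          = {gini_gen:.4f}")
print(f"AUC_opt           = {auc_opt:.4f}")
print(f"(1 + Gini_gen)/2  = {0.5 * (1.0 + gini_gen):.4f}")
```

For this pair the exact values are $\mathrm{Gini}_{gen} = (1 - \texttt{LAM})/(1 + \texttt{LAM})$ and $\mathrm{AUC}_{opt} = 1/(1 + \texttt{LAM})$, so the two printed AUC figures should agree up to discretization error.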
Example 3. Assume $X$ is normal with mean $\mu_X$ and variance $\sigma_X^2$ and $Y$ is normal with mean $\mu_Y$ and variance $\sigma_Y^2$, with $\mu_X > \mu_Y$ for the reasons stated in Example 1. This is the well-known binormal classification model, which has been studied by several authors. To compute the concentration and the $\mathrm{ROC}_{opt}$ curves in the general case, one should first compute the distribution functions of $L_X$ and $L_Y$, a task which can be accomplished by simulation or by tedious calculations. Notice that, unlike Examples 1 and 2, if $\sigma_X \ne \sigma_Y$ the model does not have monotone likelihood ratios and only the likelihood ratio based test has a proper ROC curve, as is well known in the literature. If $\sigma_X = \sigma_Y$ instead, the log-likelihood ratio is a linear function of the observation and the likelihood ratio is therefore monotone. The two cases generalize to higher dimensions, giving rise to Fisher's Quadratic and Linear Discriminant functions, respectively. Further details are contained in [4].
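The simulation route mentioned in Example 3 might look as follows. This is a sketch only: the parameter values ($\mu_X = 1$, $\sigma_X = 1$, $\mu_Y = 0$, $\sigma_Y = 2$) are assumptions chosen to respect $\mu_X > \mu_Y$ with unequal variances, the function names are hypothetical, and the log-likelihood ratio $\log f_Y - \log f_X$ is used as the score because any increasing transformation of the likelihood ratio yields the same rule. The comparison rule thresholds the marker itself, assigning small values to $Y$.

```python
import math
import random

MU_X, SD_X = 1.0, 1.0  # assumed parameters of X
MU_Y, SD_Y = 0.0, 2.0  # assumed parameters of Y


def log_lik_ratio(z):
    """log f_Y(z) - log f_X(z); quadratic in z when the variances differ."""
    return (math.log(SD_X / SD_Y)
            - 0.5 * ((z - MU_Y) / SD_Y) ** 2
            + 0.5 * ((z - MU_X) / SD_X) ** 2)


def raw_score(z):
    """Marker-threshold rule: small values of z point to Y, since MU_Y < MU_X."""
    return -z


def tpr_at_fpr(score, xs, ys, q):
    """Empirical TPR of the rule 'score > threshold' with FPR approximately q."""
    sx = sorted(score(x) for x in xs)
    k = min(len(sx) - 1, int((1.0 - q) * len(sx)))
    threshold = sx[k]  # roughly a fraction q of the X scores lies above it
    return sum(score(y) > threshold for y in ys) / len(ys)


rng = random.Random(7)
xs = [rng.gauss(MU_X, SD_X) for _ in range(20000)]
ys = [rng.gauss(MU_Y, SD_Y) for _ in range(20000)]

results = {}
for q in (0.1, 0.5, 0.9, 0.99):
    results[q] = (tpr_at_fpr(log_lik_ratio, xs, ys, q), tpr_at_fpr(raw_score, xs, ys, q))
    print(f"FPR={q:.2f}  TPR likelihood ratio={results[q][0]:.3f}  TPR raw marker={results[q][1]:.3f}")
```

With these assumed parameters the raw-marker curve is expected to fall below the diagonal at high false positive rates (an improper ROC curve), whereas the likelihood ratio based curve stays above it.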

Conclusions
The definition of concentration function given here is a convenient one for the diagnostic problem, since it compares two alternative probability distributions using a natural bivariate generalization of the Lorenz curve. The discussion on the concentration and the ROC curves at the population level allows for a deeper understanding of the concepts and for the proof of Theorem 1, which ties together the concentration function and the ROC curve of the optimal likelihood ratio based diagnostic test. Similar results were given by [3] at the sample level. All results mentioned in this section can conceptually be generalized to higher dimensions, although computations may become very hard. In particular, the likelihood ratio is an efficient dimension reduction technique which reduces the comparison to a one-dimensional problem and allows for Definition 1 of concentration function without involving higher dimensional conditional expectations or quantiles. We hope we have convinced the reader that the nature of the diagnostic (classification) problem requires a definition of concentration function which does not involve conditional and joint distributions of the populations which are being compared.