Visualizing the decision rules behind the ROC curves: understanding the classification process

  • Original Paper
AStA Advances in Statistical Analysis

Abstract

The receiver operating characteristic (ROC) curve is a graphical method commonly used to study the capacity of continuous variables (markers) to properly classify subjects into one of two groups. The decision made is ultimately endorsed by a classification subset on the space where the marker is defined. In this paper, we study graphical representations and propose visual forms to reflect the classification rules that give rise to the construction of the ROC curve. On the one hand, we use static pictures for displaying the classification regions for univariate markers, which are especially convenient when there is no monotone relationship between the marker and the likelihood of belonging to one group. In those cases, there are two options to improve the classification accuracy: to allow for more flexibility in the classification rules (for example, considering two cutoff points instead of one) or to transform the marker by using a function whose resulting ROC curve is optimal. On the other hand, we propose to build videos for visualizing the collection of subsets when several markers are considered simultaneously. A compilation of techniques for finding a rule that maximizes the area under the ROC curve is included, with a focus on linear combinations. We present a tool for the R software which generates those graphics, and we apply it to one real dataset. The R code is provided as Supplementary Material.

Author information

Corresponding author

Correspondence to Sonia Pérez-Fernández.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors gratefully acknowledge support by Grant MTM2015-63971-P from the Spanish Ministerio de Economía y Competitividad, by Grant FC-15-GRUPIN14-101 and Severo Ochoa Grant BP16118 from the Principado de Asturias, and by a grant from the Campus of International Excellence of the University of Oviedo (the last two for Pérez-Fernández).

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1860 KB)

Appendix

1.1 Theoretical result about the existence of a transformation \(h(\cdot )\) of the marker which reports a standard ROC curve equivalent to the gROC curve for the original marker

Proposition 1

It is not true that, for every collection of values (\(x_t^L\), \(x_t^U\)) with \(t \in [0,1]\) giving rise to the gROC curve, one can find a transformation \(h(\cdot )\) of the marker such that the classification regions \(s_t = (x_t^L, x_t^U]\) (or \(s_t = (-\infty , x_t^L] \cup (x_t^U,\infty )\), without loss of generality) can be expressed as \(\mathscr {C}_t = \{ x \in \mathbb {R} \text{ such } \text{ that } h(x) \ge x^*_t\}\) for some \(x^*_t\), for all \(t \in [0,1]\). In other words, in some scenarios there is no transformation \(h(\cdot )\) of the marker such that the resulting standard ROC curve is the same as the gROC curve for the original marker.

Proof

  (1) Suppose that there exists a function \(h: \mathscr {D}(h) \subseteq \mathbb {R} \longrightarrow {\mathscr {R}}(h) \subseteq \mathbb {R}\) (where \(\mathscr {D}(h)\) and \({\mathscr {R}}(h)\) denote the domain and codomain of the function \(h(\cdot )\), respectively) such that, for every false-positive rate \(t \in [0,1]\), there exists an \(x^*_t \in \mathbb {R}\) such that the classification subset \(\mathscr {C}_t\) defined as

    $$\begin{aligned} \mathscr {C}_t = \{ x \in \mathscr {D}(h) \text{ such } \text{ that } h(x) \ge x^*_t\} \end{aligned}$$

    is equivalent to the classification subset \(s_t = (x_t^L, x_t^U]\) resulting from the classification process used in the definition of the gROC curve.

  (2) By definition, \(t = {\mathscr {P}}\{\chi \in \mathscr {C}_t\}\) and hence \(x^*_t \in \mathbb {R}\) is such that \({\mathscr {P}}\{h(\chi ) \ge x^*_t\} = t\). Therefore, given two false-positive rates \(t_1, t_2 \in [0,1]\) such that \(t_1 > t_2\), we have \({\mathscr {P}}\{ h(\chi ) \ge x^*_{t_1} \} > {\mathscr {P}}\{ h(\chi ) \ge x^*_{t_2} \}\) and thus \(x^*_{t_1} < x^*_{t_2}\).

  (3) But for any function \(h: \mathscr {D}(h) \subseteq \mathbb {R} \longrightarrow {\mathscr {R}}(h) \subseteq \mathbb {R}\), the following inclusion holds:

    $$\begin{aligned} \{ x \in \mathscr {D}(h) \text{ such } \text{ that } h(x) \ge a_2 \} \subseteq \{ x \in \mathscr {D}(h) \text{ such } \text{ that } h(x) \ge a_1 \} \end{aligned}$$

    for every pair \(a_1, a_2 \in {\mathscr {R}}(h)\) such that \(a_1 < a_2\). This holds because every \(x \in \{ x \in \mathscr {D}(h) \text{ such } \text{ that } h(x) \ge a_2 \}\) satisfies \(h(x) \ge a_2\) by definition and hence, in particular, \(h(x) \ge a_1\) for every \(a_1 < a_2\), i.e., \(x \in \{ x \in \mathscr {D}(h) \text{ such } \text{ that } h(x) \ge a_1 \}\).

Joining the results above, we have that given two false-positive rates \(t_1, t_2 \in [0,1]\) such that \(t_1 > t_2\), then \(x^*_{t_1} < x^*_{t_2}\) [by (2)] and thus \(\mathscr {C}_{t_2} \subseteq \mathscr {C}_{t_1}\) [by (3)]. But \(\mathscr {C}_t = s_t\) for every \(t \in [0,1]\) by (1), therefore \(s_{t_2} \subseteq s_{t_1}\).

In summary, if there exists a transformation \(h(\cdot )\) of the marker such that the resulting ROC curve coincides with the gROC curve for the initial marker, then the corresponding classification regions, denoted by \(s_t \subset \mathbb {R}\), fulfill that

$$\begin{aligned} s_{t_2} \subseteq s_{t_1} \text{ for } \text{ every } \text{ pair } \text{ of } \text{ false-positive } \text{ rates } t_1, t_2 \in [0,1] \text{ such } \text{ that } t_1 > t_2. \end{aligned}$$
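The inclusion used in step (3) holds for any function, monotone or not, and can be checked numerically. The following is a minimal sketch for illustration only: the function `h`, the evaluation grid and the thresholds are hypothetical choices and do not come from the paper.

```python
import math

def threshold_set(h, grid, a):
    """Grid points x with h(x) >= a (a discrete stand-in for a set C_t)."""
    return {x for x in grid if h(x) >= a}

# Hypothetical non-monotone transformation and evaluation grid (assumptions).
def h(x):
    return math.sin(3 * x) - 0.1 * x * x

grid = [i / 100 for i in range(-300, 301)]

# Increasing thresholds a_1 < a_2 < ... (arbitrary illustrative values).
thresholds = [-1.0, -0.5, 0.0, 0.5]
sets = [threshold_set(h, grid, a) for a in thresholds]

# A larger threshold always yields a subset of the set for a smaller one.
nested = all(sets[k + 1] <= sets[k] for k in range(len(sets) - 1))
print(nested)
```

Whatever `h` is chosen, the threshold sets form a decreasing family as the threshold grows, which is exactly the nesting condition that the classification regions \(s_t\) would have to satisfy.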

In the scenario where the marker in the negative population is normally distributed as \(\mathscr {N}(0, 0.75)\) and, in the positive population, follows the mixture of normal distributions \(\Delta \times \mathscr {N}(-0.5, 0.25) + (1 - \Delta ) \times \mathscr {N}(0.75, 0.25)\), where \(\Delta\) is a Bernoulli random variable with success probability \(\pi = 0.5\), the classification subsets underlying the gROC curve are those reported in Fig. 10.

Fig. 10 Top, the density functions for the scenario \(\chi \sim \mathscr {N}(0, 0.75)\) and \(\xi \sim \Delta \times \mathscr {N}(-0.5, 0.25) + (1 - \Delta ) \times \mathscr {N}(0.75, 0.25)\) where \(\Delta\) is a Bernoulli random variable with success probability \(\pi = 0.5\). Bottom, the classification subsets (left) underlying the gROC curve (right). The classification subsets are of the form \((x^L,x^U] \subset \mathbb {R}\) and are colored in gray for every false-positive rate \(t \in [0,1]\). In particular, those corresponding to \(t_1 = 0.6\) and \(t_2 = 0.4\) are highlighted in blue

The classification subsets are of the form \((x^L,x^U] \subset \mathbb {R}\) and, by definition, are those reporting the maximum sensitivity, \(Se(x^L, x^U)\), for each fixed specificity \(Sp(x^L, x^U) = 1-t\), where

$$\begin{aligned} Sp(x^L, x^U)&= 1 - {\mathscr {P}}\{x^L < \chi \le x^U \} = 1 - \Phi \left( \dfrac{x^U}{0.75} \right) + \Phi \left( \dfrac{x^L}{0.75} \right) \end{aligned} \tag{12}$$

$$\begin{aligned} Se(x^L, x^U)&= {\mathscr {P}}\{x^L < \xi \le x^U \} = 0.5 \cdot \left[ \Phi \left( \dfrac{x^U + 0.5}{0.25} \right) + \Phi \left( \dfrac{x^U - 0.75}{0.25} \right) \right] \\&\quad - 0.5 \cdot \left[ \Phi \left( \dfrac{x^L + 0.5}{0.25} \right) + \Phi \left( \dfrac{x^L - 0.75}{0.25} \right) \right] \end{aligned} \tag{13}$$

with \(\Phi (\cdot )\) denoting the cumulative distribution function of a standard normal, \(\mathscr {N}(0,1)\).
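As a quick numerical companion to Eqs. (12) and (13), the sketch below evaluates both expressions with Python's standard library. It assumes, as the standardizations in (12) and (13) indicate, that the second parameter of each normal distribution is a standard deviation; the function names are illustrative, and the two intervals are the ones reported later in the proof for \(t_1 = 0.6\) and \(t_2 = 0.4\).

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # CDF of the standard normal N(0, 1)

def specificity(x_low, x_up):
    """Eq. (12): 1 - P{x_low < chi <= x_up}, with chi ~ N(0, sd = 0.75)."""
    return 1 - PHI(x_up / 0.75) + PHI(x_low / 0.75)

def sensitivity(x_low, x_up):
    """Eq. (13): P{x_low < xi <= x_up} for the equal-weight normal mixture."""
    def mixture_cdf(x):
        return 0.5 * (PHI((x + 0.5) / 0.25) + PHI((x - 0.75) / 0.25))
    return mixture_cdf(x_up) - mixture_cdf(x_low)

# Intervals reported in the proof for t_1 = 0.6 and t_2 = 0.4.
for x_low, x_up in [(-0.307, 1.174), (0.145, 1.492)]:
    fpr = 1 - specificity(x_low, x_up)
    se = sensitivity(x_low, x_up)
    print(f"({x_low}, {x_up}]: FPR = {fpr:.3f}, Se = {se:.3f}")
```

The false-positive rates recovered this way should agree with the nominal values 0.6 and 0.4 up to the rounding of the interval endpoints.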

Therefore,

$$\begin{aligned} {\mathscr {R}}_g (t) = \sup _{s \in {\mathscr {I}}_g(t)} {\mathscr {P}}\{ \xi \in s \} \end{aligned}$$

where \({\mathscr {I}}_g(t) = \{ s = (x^L,x^U] \subset {\mathbb {R}} \; \text { such that } \; {\mathscr {P}}\{\chi \in s\}\le t\}\), is equivalent to

$$\begin{aligned} {\mathscr {R}}_g (t) = 0.5 \cdot \sup _{x^L \in {\mathscr {I}}^*(t)} H(x^L) \end{aligned}$$

where \({\mathscr {I}}^*(t) = (-\infty , 0.75 \cdot \Phi ^{-1}(1-t)]\), with \(\Phi ^{-1}(\cdot )\) denoting the quantile function of a standard normal, and

$$\begin{aligned} H(x^L)&= \Phi \left( \dfrac{0.75 \cdot \Phi ^{-1} \left( t + \Phi \left( \dfrac{x^L}{0.75} \right) \right) + 0.5}{0.25} \right) - \Phi \left( \dfrac{x^L + 0.5}{0.25} \right) \\&\quad + \Phi \left( \dfrac{0.75 \cdot \Phi ^{-1} \left( t + \Phi \left( \dfrac{x^L}{0.75} \right) \right) - 0.75}{0.25} \right) - \Phi \left( \dfrac{x^L - 0.75}{0.25} \right) \end{aligned}$$

by using the substitution \(x^U = 0.75 \cdot \Phi ^{-1} \left( t + \Phi \left( \dfrac{x^L}{0.75} \right) \right)\) from \(t = 1 - Sp(x^L, x^U) = {\mathscr {P}}\{ \chi \in (x^L, x^U] \}\).

Going back to the beginning of the proof, if there exists a transformation \(h(\cdot )\) of the marker such that the classification subsets of the resulting ROC curve, expressed over the original space, coincide with the subsets underlying the gROC curve, then the highlighted classification regions in Fig. 10, \(s_{t_1}\) and \(s_{t_2}\) corresponding to \(t_1 = 0.6\) and \(t_2 = 0.4\), respectively, should fulfill that \(s_{t_2} \subseteq s_{t_1}\) since \(t_1 > t_2\). However, as can be seen,

$$\begin{aligned} s_{t_2} = (0.145, 1.492] \nsubseteq (-0.307, 1.174] = s_{t_1}. \end{aligned}$$

That is, we have found a scenario where there is no transformation \(h(\cdot )\) of the marker with the ROC curve for such transformation being the same as the gROC curve for the original marker. \(\square\)
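The counterexample can be reproduced approximately by maximizing \(H(x^L)\) over a grid. The sketch below is not the authors' R implementation: the grid step, the lower grid limit `x_min` (a stand-in for \(-\infty\)) and the function names are assumptions made for illustration, and the second parameter of each normal distribution is treated as a standard deviation.

```python
from statistics import NormalDist

_STD = NormalDist()
PHI, PHI_INV = _STD.cdf, _STD.inv_cdf

def mixture_cdf(x):
    """CDF of the positive population (equal-weight normal mixture)."""
    return 0.5 * (PHI((x + 0.5) / 0.25) + PHI((x - 0.75) / 0.25))

def groc_interval(t, step=0.001, x_min=-4.0):
    """Grid search for the interval (x_low, x_up] with FPR = t and maximal
    sensitivity. Valid for 0 < t < 1. Returns (sensitivity, x_low, x_up)."""
    best = (-1.0, None, None)
    upper_limit = 0.75 * PHI_INV(1 - t)   # right end of I*(t)
    n_steps = int((upper_limit - x_min) / step)
    for i in range(n_steps + 1):
        x_low = x_min + i * step
        p = t + PHI(x_low / 0.75)
        if p >= 1.0:                      # x_up would be infinite; stop here
            break
        x_up = 0.75 * PHI_INV(p)          # substitution that fixes FPR = t
        se = mixture_cdf(x_up) - mixture_cdf(x_low)
        if se > best[0]:
            best = (se, x_low, x_up)
    return best

se1, low1, up1 = groc_interval(0.6)       # t_1 = 0.6
se2, low2, up2 = groc_interval(0.4)       # t_2 = 0.4

# s_{t_2} is contained in s_{t_1} only if both endpoints lie inside it.
nested = low1 <= low2 and up2 <= up1
print(f"s_t1 = ({low1:.3f}, {up1:.3f}], s_t2 = ({low2:.3f}, {up2:.3f}]")
print("s_t2 subset of s_t1:", nested)
```

Up to the grid resolution, the endpoints should be close to those displayed above, and the upper endpoint for \(t_2\) should exceed the one for \(t_1\), so the nesting condition fails.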

1.2 Simulation study about the influence of imposing the restriction (C) on the classification subsets underlying the gROC curve (Sect. 2.2.1)

In order to explore the influence of the restriction (C) on the resulting gROC curves and the impact of the selection of the initial point in Step 2 of the algorithm proposed in Sect. 2.2.1 on the classification subsets, a simulation study was carried out. The change in the area under the gROC curve when imposing the restriction (C), starting from different FPRs, was analyzed for different scenarios and sample sizes. Particularly:

  • Scenario 1. \(\chi \sim \mathscr {N}(0,1)\) and \(\xi \sim \mathscr {N}(a,b)\).

  • Scenario 2. \(\chi \sim \mathscr {U}(a,b)\) and \(\xi \sim \Delta \times \mathscr {N}(-2,1) + (1- \Delta ) \times \mathscr {N}(3,0.5)\) where \(\Delta\) is a Bernoulli random variable with success probability \(\pi = 0.5\).

The parameters \(a\) and \(b\) were chosen to obtain gAUCs without restrictions of 0.75 and 0.85. The classification subsets and gROC curves without and with restriction (C) are shown in Figure S1 (Supplementary Material). The results are based on \(B = 500\) simulations and are displayed in Fig. 11. The numerical results are also collected in Table S1 in the Supplementary Material.
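To make the simulated quantity concrete, the sketch below draws one sample from Scenario 1 and computes an unrestricted empirical gROC curve by brute force over intervals \((x^L, x^U]\), followed by its area. It is only a sketch under stated assumptions: the values `a = 1.0` and `b = 1.0` are hypothetical (the values used in the paper are not given here), `b` is treated as a standard deviation, the function names are illustrative, and the restriction (C) and the algorithm of Sect. 2.2.1 are not implemented.

```python
import random
from bisect import bisect_right

def empirical_groc(neg, pos):
    """r[k] = largest fraction of positives in an interval (x_low, x_up]
    that contains at most k negatives, for k = 0, ..., len(neg)."""
    n, m = len(neg), len(pos)
    neg_sorted, pos_sorted = sorted(neg), sorted(pos)
    cuts = [float("-inf")] + sorted(set(neg) | set(pos))
    # Number of negatives / positives at or below each candidate endpoint.
    cum_neg = [bisect_right(neg_sorted, c) for c in cuts]
    cum_pos = [bisect_right(pos_sorted, c) for c in cuts]
    best = [0] * (n + 1)
    for i in range(len(cuts)):
        for j in range(i, len(cuts)):
            k = cum_neg[j] - cum_neg[i]   # negatives inside (cuts[i], cuts[j]]
            s = cum_pos[j] - cum_pos[i]   # positives inside (cuts[i], cuts[j]]
            if s > best[k]:
                best[k] = s
    for k in range(1, n + 1):             # "at most k" means a running maximum
        best[k] = max(best[k], best[k - 1])
    return [s / m for s in best]

def gauc(r):
    """Area under the step function that equals r[k] on [k/n, (k+1)/n)."""
    n = len(r) - 1
    return sum(r[:n]) / n

random.seed(1)                            # fixed seed for reproducibility
a, b = 1.0, 1.0                           # hypothetical values (assumption)
neg = [random.gauss(0.0, 1.0) for _ in range(50)]   # chi ~ N(0, 1)
pos = [random.gauss(a, b) for _ in range(50)]       # xi ~ N(a, b)

r = empirical_groc(neg, pos)
print(f"empirical gAUC = {gauc(r):.3f}")
```

Repeating this over \(B\) independent samples and averaging the resulting areas gives the kind of Monte Carlo summary reported for \(\widehat{{\mathscr {R}}}_g\), although the exact estimator used by the authors may differ from this brute-force version.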

Fig. 11 Results of 500 simulations of Scenarios 1 and 2 for different sample sizes and theoretical gAUC. The mean (95% C.I.) for every estimator is displayed. For each one, the four vertical lines correspond to \(n=m=50\), \(2n=m=100\), \(n=m=100\), \(2n=m=200\). \(\widehat{{\mathscr {R}}}_g\) denotes the estimated gROC curve without restrictions, \(\widehat{{\mathscr {R}}}_g^{C}\) the optimal estimated gROC curve with restriction (C), and \(\widehat{{\mathscr {R}}}_g^{C,Y}\), \(\widehat{{\mathscr {R}}}_g^{C,0}\), \(\widehat{{\mathscr {R}}}_g^{C,0.1}\) and \(\widehat{{\mathscr {R}}}_g^{C,1}\) the estimated gROC curve with restriction (C) considering as starting point in Step 2 of the algorithm the FPR related to the Youden index, FPR = 0, FPR = 0.1 and FPR = 1, respectively

From the results, it can be seen that the area under the optimal gROC curve with the restriction (C) is similar to the gAUC without restrictions. Only a small decrease is observed between \(\widehat{{\mathscr {R}}}_g\) and \(\widehat{{\mathscr {R}}}_g^{C}\), with a maximum difference in gAUC means of 0.028, attained in Scenario 2. This scenario was designed to be pathological regarding the non-compliance of restriction (C) (Figure S1).

The estimation of the optimal restricted gROC curve is computationally time-consuming for large sample sizes. In those cases, the suggestion is to use the FPR reported by the Youden index as the initial point in Step 2, because it results in higher AUCs for all the scenarios considered, compared to the other initial points 0, 0.1 and 1. Despite its superiority, in Scenario 2 with gAUC\(=0.85\), \(\widehat{{\mathscr {R}}}_g^{C,Y}\) slightly underestimates the area (with a maximum difference in means of 0.07), but the optimal \(\widehat{{\mathscr {R}}}_g^{C}\) still gives adequate results.

Cite this article

Pérez-Fernández, S., Martínez-Camblor, P., Filzmoser, P. et al. Visualizing the decision rules behind the ROC curves: understanding the classification process. AStA Adv Stat Anal 105, 135–161 (2021). https://doi.org/10.1007/s10182-020-00385-2
