Abstract
The receiver operating characteristic (ROC) curve is a graphical method commonly used to study the capacity of continuous variables (markers) to properly classify subjects into one of two groups. The decision made is ultimately determined by a classification subset of the space where the marker is defined. In this paper, we study graphical representations and propose visual forms that reflect the classification rules underlying the construction of the ROC curve. On the one hand, we use static pictures to display the classification regions for univariate markers, which are especially convenient when there is no monotone relationship between the marker and the likelihood of belonging to one group. In those cases, there are two options for improving the classification accuracy: allowing more flexibility in the classification rules (for example, considering two cutoff points instead of one) or transforming the marker by a function whose resulting ROC curve is optimal. On the other hand, we propose building videos to visualize the collection of subsets when several markers are considered simultaneously. A compilation of techniques for finding a rule that maximizes the area under the ROC curve is included, with a focus on linear combinations. We present a tool for the R software that generates these graphics, and we apply it to a real dataset. The R code is provided as Supplementary Material.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
The authors gratefully acknowledge support from Grant MTM2015-63971-P of the Spanish Ministerio de Economía y Competitividad, from Grants FC-15-GRUPIN14-101 and Severo Ochoa BP16118 of the Principado de Asturias, and from a Grant of the Campus of International Excellence of the University of Oviedo (the last two for Pérez-Fernández).
Appendix
1.1 Theoretical result about the existence of a transformation \(h(\cdot )\) of the marker which reports a standard ROC curve equivalent to the gROC curve for the original marker
Proposition 1
It is not true that, for every set of values (\(x_t^L\), \(x_t^U\)) with \(t \in [0,1]\) giving rise to the gROC curve, one can find a transformation \(h(\cdot )\) of the marker such that the classification regions \(s_t = (x_t^L, x_t^U]\) (or \(s_t = (-\infty , x_t^L] \cup (x_t^U,\infty )\), without loss of generality) can be expressed as \(\mathscr {C}_t = \{ x \in \mathbb {R} \text{ such } \text{ that } h(x) \ge x^*_t\}\) for some \(x^*_t\), for all \(t \in [0,1]\). In other words, in some scenarios there is no transformation \(h(\cdot )\) of the marker such that the resulting standard ROC curve is the same as the gROC curve for the original marker.
Proof

(1) Suppose that there exists a function \(h: \mathscr {D}(h) \subseteq \mathbb {R} \longrightarrow {\mathscr {R}}(h) \subseteq \mathbb {R}\) (where \(\mathscr {D}(h)\) and \({\mathscr {R}}(h)\) denote the domain and codomain of the function \(h(\cdot )\), respectively) such that, for every false-positive rate \(t \in [0,1]\), there exists an \(x^*_t \in \mathbb {R}\) such that the classification subset \(\mathscr {C}_t\) defined as
$$\begin{aligned} \mathscr {C}_t = \{ x \in \mathscr {D}(h) \text{ such } \text{ that } h(x) \ge x^*_t\} \end{aligned}$$
is equivalent to the classification subset \(s_t = (x_t^L, x_t^U]\) resulting from the classification process used in the definition of the gROC curve.

(2) By definition, \(t = {\mathscr {P}}\{\chi \in \mathscr {C}_t\}\), and hence \(x^*_t \in \mathbb {R}\) is such that \({\mathscr {P}}\{h(\chi ) \ge x^*_t\} = t\). Therefore, for any two false-positive rates \(t_1, t_2 \in [0,1]\) such that \(t_1 > t_2\), we have \({\mathscr {P}}\{ h(\chi ) \ge x^*_{t_1} \} > {\mathscr {P}}\{ h(\chi ) \ge x^*_{t_2} \}\) and thus \(x^*_{t_1} < x^*_{t_2}\).

(3) For any function \(h: \mathscr {D}(h) \subseteq \mathbb {R} \longrightarrow {\mathscr {R}}(h) \subseteq \mathbb {R}\), the following set inclusion holds:
$$\begin{aligned} \{ x \in \mathscr {D}(h) \text{ such } \text{ that } h(x) \ge a_2 \} \subseteq \{ x \in \mathscr {D}(h) \text{ such } \text{ that } h(x) \ge a_1 \} \end{aligned}$$
for every pair \(a_1, a_2 \in {\mathscr {R}}(h)\) such that \(a_1 < a_2\). Indeed, every \(x \in \{ x \in \mathscr {D}(h) \text{ such } \text{ that } h(x) \ge a_2 \}\) satisfies, by definition, \(h(x) \ge a_2\) and hence \(h(x) \ge a_1\) for every \(a_1 < a_2\), i.e., \(x \in \{ x \in \mathscr {D}(h) \text{ such } \text{ that } h(x) \ge a_1 \}\).
Combining the results above, given two false-positive rates \(t_1, t_2 \in [0,1]\) such that \(t_1 > t_2\), we have \(x^*_{t_1} < x^*_{t_2}\) [by (2)] and thus \(\mathscr {C}_{t_2} \subseteq \mathscr {C}_{t_1}\) [by (3)]. Since \(\mathscr {C}_t = s_t\) for every \(t \in [0,1]\) by (1), it follows that \(s_{t_2} \subseteq s_{t_1}\).
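The inclusion used in step (3) can be checked numerically on a toy example. The following Python sketch uses a hypothetical non-monotone transformation \(h(x) = -(x-1)^2\) and a finite grid of marker values, neither of which comes from the paper; it only illustrates that the sets \(\{x : h(x) \ge a\}\) shrink as the threshold \(a\) grows, whatever the shape of \(h(\cdot )\).

```python
def h(x):
    """Hypothetical non-monotone transformation (an assumption for illustration)."""
    return -(x - 1.0) ** 2


def superlevel_set(grid, a):
    """Grid points x with h(x) >= a, i.e. a discretized version of C_t."""
    return {x for x in grid if h(x) >= a}


# Marker values on a grid of quarters in [-2, 4]; these are exact in binary floats.
grid = [i / 4 for i in range(-8, 17)]

a1, a2 = -1.0, -0.25  # two thresholds with a1 < a2
c_a1 = superlevel_set(grid, a1)
c_a2 = superlevel_set(grid, a2)

# Step (3): the set for the larger threshold is contained in the other one.
print(sorted(c_a2))
print(c_a2 <= c_a1)
```

Any other choice of `h` gives the same inclusion, which is why a family of regions \(s_t\) that is not nested cannot be written in this form.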
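In summary, if there exists a transformation \(h(\cdot )\) of the marker such that the resulting ROC curve coincides with the gROC curve for the initial marker, then the corresponding classification regions, denoted by \(s_t \subset \mathbb {R}\), fulfill that
$$\begin{aligned} s_{t_2} \subseteq s_{t_1} \quad \text{ for all } t_1, t_2 \in [0,1] \text{ such } \text{ that } t_1 > t_2. \end{aligned}$$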
In the scenario where the marker in the negative population is normally distributed as \(\mathscr {N}(0, 0.75)\) and in the positive population follows the mixture of normal distributions \(\Delta \times \mathscr {N}(0.5, 0.25) + (1 - \Delta ) \times \mathscr {N}(0.75, 0.25)\), where \(\Delta\) is a Bernoulli random variable with success probability \(\pi = 0.5\), the classification subsets underlying the gROC curve are those reported in Fig. 10.
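The classification subsets are of the form \((x^L,x^U] \subset \mathbb {R}\) and, by definition, are those yielding the maximum sensitivity, \(Se(x^L, x^U)\), for each fixed specificity \(Sp(x^L, x^U) = 1 - t\), where
$$\begin{aligned} Se(x^L, x^U)&= {\mathscr {P}}\{ \xi \in (x^L, x^U] \}, \\ Sp(x^L, x^U)&= 1 - {\mathscr {P}}\{ \chi \in (x^L, x^U] \} = 1 - \left[ \Phi \left( \dfrac{x^U}{0.75} \right) - \Phi \left( \dfrac{x^L}{0.75} \right) \right] , \end{aligned}$$
with \(\Phi (\cdot )\) denoting the cumulative distribution function of a standard normal, \(\mathscr {N}(0,1)\).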
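Therefore,
$$\begin{aligned} {\mathscr {R}}_g(t) = \sup _{(x^L,x^U] \in {\mathscr {I}}_g(t)} Se(x^L, x^U), \end{aligned}$$
where \({\mathscr {I}}_g(t) = \{ s = (x^L,x^U] \subset {\mathbb {R}} \; \text { such that } \; {\mathscr {P}}\{\chi \in s\}\le t\}\), is equivalent to
$$\begin{aligned} {\mathscr {R}}_g(t) = \sup _{x^L \in {\mathscr {I}}^*(t)} Se\left( x^L, \; 0.75 \cdot \Phi ^{-1} \left( t + \Phi \left( \dfrac{x^L}{0.75} \right) \right) \right) , \end{aligned}$$
where \({\mathscr {I}}^*(t) = (-\infty , 0.75 \cdot \Phi ^{-1}(1-t)]\), with \(\Phi ^{-1}(\cdot )\) the quantile function of a standard normal, by using the substitution \(x^U = 0.75 \cdot \Phi ^{-1} \left( t + \Phi \left( \dfrac{x^L}{0.75} \right) \right)\) obtained from \(t = 1 - Sp(x^L, x^U) = {\mathscr {P}}\{ \chi \in (x^L, x^U] \}\).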
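The one-dimensional search implied by this substitution can be sketched in Python. This is only an illustration under stated assumptions: the second parameter of \(\mathscr {N}(0, 0.75)\) is read as a standard deviation (as the factor 0.75 in the substitution suggests), the search over \(x^L\) is a plain grid truncated at a hypothetical lower bound, and the mixture passed in the usage lines is a placeholder rather than the one behind Fig. 10. The function names are invented for this sketch and are not part of the authors' R tool.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
SD_NEG = 0.75  # negative population N(0, 0.75); 0.75 is read as a standard deviation


def upper_endpoint(x_low, t):
    """x^U such that P{chi in (x^L, x^U]} = t (the substitution in the text)."""
    p = t + STD_NORMAL.cdf(x_low / SD_NEG)
    if p >= 1.0:
        return float("inf")
    return SD_NEG * STD_NORMAL.inv_cdf(p)


def sensitivity(x_low, x_up, components):
    """P{xi in (x^L, x^U]} for a normal mixture given as (weight, mean, sd) triples."""
    total = 0.0
    for weight, mean, sd in components:
        dist = NormalDist(mean, sd)
        upper = 1.0 if x_up == float("inf") else dist.cdf(x_up)
        total += weight * (upper - dist.cdf(x_low))
    return total


def best_interval(t, components, grid_size=2000, lowest=-4.0):
    """Grid search over x^L in I*(t), truncated below at `lowest` (0 < t < 1)."""
    highest = SD_NEG * STD_NORMAL.inv_cdf(1.0 - t)  # right end of I*(t)
    best = None
    for i in range(grid_size + 1):
        x_low = lowest + (highest - lowest) * i / grid_size
        x_up = upper_endpoint(x_low, t)
        se = sensitivity(x_low, x_up, components)
        if best is None or se > best[2]:
            best = (x_low, x_up, se)
    return best


# Placeholder mixture (assumed values, not necessarily those behind Fig. 10).
mixture = [(0.5, -1.0, 0.25), (0.5, 1.0, 0.25)]
for t in (0.4, 0.6):
    x_low, x_up, se = best_interval(t, mixture)
    print(t, round(x_low, 3), round(x_up, 3), round(se, 3))
```

Comparing the intervals returned for two false-positive rates is then a direct way to inspect whether the regions are nested for a given pair of distributions.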
Going back to the beginning of the proof, if there exists a transformation \(h(\cdot )\) of the marker such that the classification subsets of the resulting ROC curve, expressed over the original space, coincide with the subsets underlying the gROC curve, then the highlighted classification regions in Fig. 10, \(s_{t_1}\) and \(s_{t_2}\), corresponding to \(t_1 = 0.6\) and \(t_2 = 0.4\), respectively, should fulfill that \(s_{t_2} \subseteq s_{t_1}\) since \(t_1 > t_2\). However, as can be seen in Fig. 10,
$$\begin{aligned} s_{t_2} \not \subseteq s_{t_1}. \end{aligned}$$
That is, we have found a scenario where there is no transformation \(h(\cdot )\) of the marker with the ROC curve for such transformation being the same as the gROC curve for the original marker. \(\square\)
1.2 Simulation study about the influence of imposing the restriction (C) on the classification subsets underlying the gROC curve (Sect. 2.2.1)
A simulation study was carried out to explore the influence of the restriction (C) on the resulting gROC curves, and the impact on the classification subsets of the initial point selected in Step 2 of the algorithm proposed in Sect. 2.2.1. The change in the area under the gROC curve when imposing the restriction (C), starting from different FPRs, was analyzed for different scenarios and sample sizes. In particular:

Scenario 1. \(\chi \sim \mathscr {N}(0,1)\) and \(\xi \sim \mathscr {N}(a,b)\).

Scenario 2. \(\chi \sim \mathscr {U}(a,b)\) and \(\xi \sim \Delta \times \mathscr {N}(2,1) + (1 - \Delta ) \times \mathscr {N}(3,0.5)\) where \(\Delta\) is a Bernoulli random variable with success probability \(\pi = 0.5\).
The parameters a and b were chosen to obtain gAUCs without restrictions of 0.75 and 0.85. The classification subsets and gROC curves without and with restriction (C) are shown in Figure S1 (Supplementary Material). The results are based on \(B = 500\) simulations and are displayed in Fig. 11. The numerical results are also collected in Table S1 in the Supplementary Material.
The results show that the area under the optimal gROC curve with the restriction (C) is similar to the gAUC without restrictions. Only a small decrease is observed from \(\widehat{{\mathscr {R}}}_g\) to \(\widehat{{\mathscr {R}}}_g^{C}\), the maximum difference in gAUC means being 0.028 for Scenario 2. This scenario was designed to be pathological regarding the non-compliance with restriction (C) (Figure S1).
The estimation of the optimal restricted gROC curve is computationally time-consuming for large sample sizes. In those cases, the suggestion is to use the FPR reported by the Youden index as the initial point in Step 2, because it results in higher AUCs for all the scenarios considered, compared with the other initial points 0, 0.1 and 1. Despite its superiority, in Scenario 2 with gAUC\(=0.85\), \(\widehat{{\mathscr {R}}}_g^{C,Y}\) slightly underestimates the area (with a maximum difference in means of 0.07), but the optimal \(\widehat{{\mathscr {R}}}_g^{C}\) still gives adequate results.
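As a rough illustration of what "the FPR reported by the Youden index" refers to, the Python sketch below computes an empirical Youden index from two samples. It assumes the standard single-cutoff rule (classify as positive when the marker exceeds a cutoff \(c\)) and maximizes \(J = \text{TPR} - \text{FPR}\); the algorithm of Sect. 2.2.1 may rely on a generalized, interval-based version instead. The function name and the toy data are made up for this example.

```python
def youden_fpr(negatives, positives):
    """FPR at the cutoff maximizing J = TPR - FPR for the rule 'marker > c'."""
    best_j, best_fpr, best_cut = float("-inf"), None, None
    # Candidate cutoffs: every observed marker value, in increasing order.
    for c in sorted(set(negatives) | set(positives)):
        fpr = sum(x > c for x in negatives) / len(negatives)
        tpr = sum(x > c for x in positives) / len(positives)
        j = tpr - fpr
        if j > best_j:  # ties keep the smallest cutoff
            best_j, best_fpr, best_cut = j, fpr, c
    return best_fpr, best_cut, best_j


# Toy samples (made up for illustration).
neg = [1, 2, 3, 4, 7]
pos = [5, 6, 8, 9, 10]
fpr0, cutoff, j = youden_fpr(neg, pos)
print(fpr0, cutoff, j)
```

The returned false-positive rate would then play the role of the initial point, in place of fixed values such as 0, 0.1 or 1.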
Cite this article
Pérez-Fernández, S., Martínez-Camblor, P., Filzmoser, P. et al. Visualizing the decision rules behind the ROC curves: understanding the classification process. AStA Adv Stat Anal 105, 135–161 (2021). https://doi.org/10.1007/s10182-020-00385-2