A new way for ranking functional data with applications in diagnostic test

  • Original paper
Computational Statistics

Abstract

This paper has two facets. First, it investigates diagnostic tests in situations where the observed variables are functional, that is, diagnostic tests that use functional variables as biomarkers. A procedure based on a functional version of ROC analysis is proposed, the main question being how to rank the sample of functional data in a suitable way. Second, it presents a general new way of ordering functional data in a self-contained manner, allowing for a wide scope of applications beyond the diagnostic test problem. A finite-sample analysis highlights how this ranking procedure behaves for diagnostic tests.

Notes

  1. Note that the line in E through \(\chi _1\) and \(\chi _2\) \((\chi _1, \chi _2 \in E)\) is defined as the set of points \(\{\chi _c(t)=c\chi _1(t)+(1-c)\chi _2(t); t\in T, c \in [0,1]\}\).

  2. If this assumption does not hold on the entire domain of the curves, a subset of the domain where the groups overlap less could be selected and the diagnostic test carried out on it. One could also apply some transformation to the original functional data, for instance the first derivative, before running the diagnostic test.

  3. Note that if E is a Hilbert space or an \(L_p\) space for \(p\in (1,\infty )\), it would be strictly convex (see Kothe (1983)) and then the previous family \(E_1\) would be the line in E through \(M_A\) and \(M_{NA}\).
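To make Notes 1 and 3 concrete, the sketch below locates a curve on the line \(\chi _c = c\chi _1 + (1-c)\chi _2\) by orthogonal projection in \(L_2\). This is only one plausible reading: the choice of \(L_2\) geometry, the discretisation on a grid, and the names `trapezoid` and `line_coefficient` are assumptions made for illustration and are not taken from the paper.

```python
import math


def trapezoid(values, grid):
    """Trapezoidal approximation of the integral of `values` over `grid`."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


def line_coefficient(curve, chi1, chi2, grid):
    """Return c such that c*chi1 + (1-c)*chi2 is the L2-closest point to `curve`.

    Minimising ||(curve - chi2) - c*(chi1 - chi2)||^2 over c gives
    c = <curve - chi2, chi1 - chi2> / ||chi1 - chi2||^2.
    """
    direction = [a - b for a, b in zip(chi1, chi2)]
    offset = [x - b for x, b in zip(curve, chi2)]
    numerator = trapezoid([o * d for o, d in zip(offset, direction)], grid)
    denominator = trapezoid([d * d for d in direction], grid)
    if denominator == 0:
        raise ValueError("chi1 and chi2 coincide, so the line is undefined")
    return numerator / denominator


# Hypothetical example: two reference curves and a curve lying on their line.
grid = [i / 100 for i in range(101)]
chi1 = [math.sin(2 * math.pi * t) + 1.0 for t in grid]
chi2 = [math.sin(2 * math.pi * t) for t in grid]
curve = [0.3 * a + 0.7 * b for a, b in zip(chi1, chi2)]
c = line_coefficient(curve, chi1, chi2, grid)
```

Under these assumptions each curve is summarised by a single real number c, so a sample of curves can be ordered by its coefficients.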

References

  • Aneiros G, Cao R, Fraiman R, Genest C, Vieu P (2019a) Recent advances in functional data analysis and high-dimensional statistics. J Multivar Anal 170(C):3–9

  • Aneiros G, Cao R, Vieu P (2019b) Editorial on the special issue on functional data analysis and related topics. Comput Stat 34(2):447–450

  • Balakrishnan N, Rao CR (1998a) Order statistics: an introduction. Handbook of statistics 16:3–24

  • Balakrishnan N, Rao CR (1998b) Order statistics: applications, vol 17 of handbook of statistics. Elsevier, New York

  • Barnett V (1976) The ordering of multivariate data. J R Stat Soc: Ser A (General) 139(3):318–344

  • Begg CB (1991) Advances in statistical methodology for diagnostic medicine in the 1980s. Stat Med 10(12):1887–1895

  • Bugni FA, Hall P, Horowitz JL, Neumann GR (2009) Goodness-of-fit tests for functional data. Econom J 12:S1–S18

  • Carvalho VI, Carvalho M, Alonzo TA, González-Manteiga W et al (2016) Functional covariate-adjusted partial area under the specificity-ROC curve with an application to metabolic syndrome diagnosis. Ann Appl Stat 10(3):1472–1495

  • Chakraborty A, Chaudhuri P (2014) The spatial distribution in infinite dimensional spaces and related quantiles and depths. Ann Stat 42(3):1203–1231

  • Chakraborty A, Chaudhuri P (2015) A Wilcoxon-Mann-Whitney-type test for infinite-dimensional data. Biometrika 102(1):239–246

  • Cuesta-Albertos JA, Nieto-Reyes A (2008) The random Tukey depth. Comput Stat Data Anal 52(11):4979–4988

  • Cuevas A (2014) A partial overview of the theory of statistics with functional data. J Stat Plan Inference 147:1–23

  • Cuevas A, Febrero M, Fraiman R (2006) On the use of the bootstrap for estimating functions with functional data. Comput Stat Data Anal 51(2):1063–1074

  • Cuevas A, Febrero M, Fraiman R (2007) Robust estimation and classification for functional data via projection-based depth notions. Comput Stat 22(3):481–496

  • Cuevas A, Fraiman R (2009) On depth measures and dual statistics. A methodology for dealing with general data. J Multivar Anal 100(4):753–766

  • David H, Nagaraja H (2003) Order statistics (Third edition). Wiley Series in Probability and Statistics. Wiley

  • D’Esposito MR, Ragozini G (2008) A new r-ordering procedure to rank multivariate performances. Quaderni di Stat 10:5–21

  • Eddy W (1985) Ordering of multivariate data. Computer science and statistics: the interface, pp 25–30

  • Estévez-Pérez G, Vilar JA (2013) Functional ANOVA starting from discrete data: an application to air quality data. Environ Ecol Stat 20(3):495–517

  • Ferraty F, Vieu P (2006) Nonparametric functional data analysis: theory and practice. Springer, Berlin

  • Fraiman R, Muniz G (2001) Trimmed means for functional data. Test 10(2):419–440

  • Goia A, Vieu P (2016) An introduction to recent advances in high/infinite dimensional statistics. J Multivar Anal 146:1–6

  • Horváth L, Kokoszka P (2012) Inference for functional data with applications, vol 200. Springer, Berlin

  • Hsieh F, Turnbull BW (1996) Nonparametric estimation of the receiver operating characteristic curve. Ann Stat 25:25–40

  • Hung H, Chiang C-T (2011) Nonparametric methodology for the time-dependent partial area under the ROC curve. J Stat Plan Inference 141(12):3829–3838

  • Inácio V, González-Manteiga W, Febrero-Bande M, Gude F, Alonzo TA, Cadarso-Suárez C (2012) Extending induced ROC methodology to the functional context. Biostatistics 13(4):594–608

  • Kemmeren P, van Berkum NL, Vilo J, Bijma T, Donders R, Brazma A, Holstege FC (2002) Protein interaction verification and functional annotation by integrated analysis of genome-scale data. Mol Cell 9(5):1133–1143

  • Korhonen P, Siljamäki A (1998) Ordinal principal component analysis theory and an application. Comput Stat Data Anal 26(4):411–424

  • Kothe G (1983) Topological vector spaces I. Springer, New York

  • Ma H, Bandos AI, Rockette HE, Gur D (2013) On use of partial area under the ROC curve for evaluation of diagnostic performance. Stat Med 32(20):3449–3458

  • Peng L, Zhou X-H (2004) Local linear smoothing of receiver operating characteristic (ROC) curves. J Stat Plan Inference 118(1–2):129–143

  • Pepe MS et al (2003) The statistical evaluation of medical tests for classification and prediction. Medicine

  • R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

  • Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York

  • Ratón ML (2016) Optimal cutoff points for classification in diagnostic studies: new contributions and software development. Ph.D. thesis, Universidade de Santiago de Compostela

  • Serfling R (2006) Depth functions in nonparametric multivariate inference. DIMACS Ser Discrete Math Theor Comput Sci 72:1

  • Sguera C, Galeano P, Lillo R (2014) Spatial depth-based classification for functional data. TEST 23(4):725–750

  • Swets JA (1979) ROC analysis applied to the evaluation of medical imaging techniques. Invest Radiol 14:109–121

  • Swets JA, Pickett RM (1982) Evaluation of diagnostic systems: methods from signal detection theory. Academic Press, New York

  • Wang Z, Chang Y-CI (2011) Marker selection via maximizing the partial area under the ROC curve of linear risk scores. Biostatistics 12(2):369–385

  • Weinstein MC, Fineberg HV (1980) Clinical decision analysis. W.B. Saunders Company, Philadelphia

  • Zhang J (2002) Some extensions of Tukey’s depth function. J Multivar Anal 82:134–165

  • Zhou XH, McClish DK, Obuchowski NA (2002) Statistical methods in diagnostic medicine. Wiley-Interscience

  • Zou KH, Hall WJ, Shapiro DE (1997) Smooth nonparametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Stat Med 16:2143–2156

  • Zuo Y, Serfling R (2000) General notions of statistical depth function. Ann Stat 28(2):461–482


Acknowledgements

This research has been supported by MINECO (Grant MTM2014-52876-R), by Xunta de Galicia (Centro Singular de Investigación de Galicia ED431G/01 and Grupos de Referencia Competitiva ED431C2016-015), all of them through the ERDF. The authors would like to thank the Associate Editor and the two anonymous referees for their constructive and helpful comments, which have greatly improved the paper.

Author information

Corresponding author

Correspondence to Graciela Estévez-Pérez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Annex I: Supplementary material

1.1 Simulation study with only one replication

We carried out an initial analysis based on a single replication to understand how the functional diagnostic test works on a particular dataset. Specifically, for the family of mean curves (F1), one trial with \(n_i=30\) independent trajectories for each group was produced. In this case, the parameters \(\sigma \) and \(\sigma _{1}\) take the values 0.04 and 0.06, respectively, although other values could be considered. The smoothed curves, obtained as indicated in Sect. 4, are shown in the left part of Fig. 7. In this situation, the curves of both groups have a similar shape, although the data of Affected subjects generally take higher values than those of Non-Affected ones, as assumed in Sect. 3.1.

Pairing the different location measures r.t with the different proximity measures d.t (see Table 2), we have obtained the corresponding empirical ROC curves and their AUC values, which are shown in Table 6.

Table 6 Area under each empirical ROC curve (AUC) and minimal distance from the empirical ROC curves to the point (0, 1) for several values of r.t and d.t

It can be seen that the AUC values for \(L_2\), Semi.PCA, Semi.hshift and Semi.mplsr \((d.t \in \{1,5,6,8\})\) are around 0.87, 0.87, 0.88 and 0.96, respectively, regardless of the type of representative curve. For the other semi-metrics there is more variability depending on r.t. We observe that \(r.t=4\) (CC) leads to the worst result, in terms of discriminating power, for all semi-metrics. In addition, the best combination in terms of AUC is FMean-Semi.Basis (\(AUC=0.996\)), while other combinations, for example FMean-Semi.Deriv (\(AUC=0.988\)), also give reasonable results.
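As a rough illustration of how an empirical ROC curve and its AUC can be obtained once each curve has been reduced to a scalar score, consider the following sketch. It assumes that larger scores point towards the Affected group; the function names and the scores are made up for illustration and are not the authors' code or data.

```python
def roc_points(scores_na, scores_a):
    """Empirical ROC points (FPR, TPR) for the rule 'positive if score >= cutoff'."""
    cutoffs = sorted(set(scores_na) | set(scores_a), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every score: nobody is declared positive
    for cutoff in cutoffs:
        fpr = sum(s >= cutoff for s in scores_na) / len(scores_na)
        tpr = sum(s >= cutoff for s in scores_a) / len(scores_a)
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    return sum(
        0.5 * (y0 + y1) * (x1 - x0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def auc_mann_whitney(scores_na, scores_a):
    """Share of (A, NA) pairs ranked correctly, counting ties as one half."""
    wins = sum((a > n) + 0.5 * (a == n) for a in scores_a for n in scores_na)
    return wins / (len(scores_a) * len(scores_na))


# Made-up scores, for illustration only (not data from the paper).
scores_na = [0.10, 0.20, 0.35, 0.40]
scores_a = [0.30, 0.50, 0.60, 0.70]
auc_1 = auc_trapezoid(roc_points(scores_na, scores_a))
auc_2 = auc_mann_whitney(scores_na, scores_a)
```

Both functions return the same value here, reflecting the usual identity between the area under the empirical ROC curve and the Mann-Whitney statistic.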

Also, for each combination (r.t, d.t), we compute \(c_0\) as the value in \(\mathbb {R}\) that minimizes the distance from the corresponding ROC curve to the point (0, 1) (North-West corner criterion). These minimum distances, shown in Table 6, lead to results analogous to those observed for the AUC. Next, to obtain the cutoff curve \(\chi _{c_0}\) and hence the diagnostic rule, we choose the combination (r.t, d.t) that maximizes the corresponding AUC, that is, \(r.t=1\) (FMean curve) and \(d.t=2\) (Semi.Basis with first derivative).
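The North-West corner criterion can be sketched as follows, assuming each curve has already been reduced to a scalar score that plays the role of c and that larger scores point towards the Affected group. The function name `north_west_cutoff` and the scores are hypothetical.

```python
import math


def north_west_cutoff(scores_na, scores_a):
    """Return (cutoff, fpr, tpr, distance) minimising the distance to (0, 1).

    A subject is declared positive when its score is >= cutoff; only the
    observed scores are tried as candidate cutoffs.
    """
    best = None
    for cutoff in sorted(set(scores_na) | set(scores_a)):
        fpr = sum(s >= cutoff for s in scores_na) / len(scores_na)
        tpr = sum(s >= cutoff for s in scores_a) / len(scores_a)
        distance = math.hypot(fpr, 1.0 - tpr)  # Euclidean distance to (0, 1)
        if best is None or distance < best[3]:
            best = (cutoff, fpr, tpr, distance)
    return best


# Made-up scores, for illustration only (not data from the paper).
scores_na = [0.10, 0.20, 0.35, 0.40]
scores_a = [0.30, 0.50, 0.60, 0.70]
cutoff, fpr, tpr, distance = north_west_cutoff(scores_na, scores_a)
```

When several cutoffs reach the same minimal distance, this sketch keeps the smallest one; other tie-breaking rules are equally defensible.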

Once the parameters \(d.t=2\) and \(r.t=1\) have been chosen, the optimal cutoff curve \(\chi _{c_0}\) is obtained and the diagnostic test is established. In this case, we achieve \(3.33\%\) false positives and \(96.67\%\) true positives, that is, \((\alpha _{c_0}, 1-\beta _{c_0})=(0.033, 0.967)\). Finally, in the left plot of Fig. 7 the optimal cutoff curve \(\chi _{c_0}\) (thin black line) has been added to the sample data and the representative curves of each group (mean curves for A and NA). The right side of Fig. 7 presents the analogous plot for the first derivatives of each curve (functional data, mean curves and optimal cutoff curve), because the selected semi-metric is \(d.t=2\) (Semi.Basis) with first derivative.
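As a quick consistency check, and assuming these rates are computed on the \(n_i=30\) curves per group used in this replication, they correspond to one misclassified curve in each group:

\[
\alpha _{c_0} = \frac{1}{30} \approx 0.033, \qquad 1-\beta _{c_0} = \frac{29}{30} \approx 0.967 .
\]

Under the same assumption, the distance from this ROC point to (0, 1) would be

\[
\sqrt{\alpha _{c_0}^2 + \beta _{c_0}^2} = \sqrt{\left( \tfrac{1}{30}\right) ^2 + \left( \tfrac{1}{30}\right) ^2} = \frac{\sqrt{2}}{30} \approx 0.047 .
\]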

Fig. 7

The left graphic shows the smoothed simulated dataset (thin lines) for scenario F1 and the mean curves for each group (thick lines). Optimal cutoff curve (black line) together with the sample data (dashed lines) and the representative curves for each group (thick lines)

1.2 Other graphs of simulation study

Fig. 8

Means of area under ROC curve (AUC) for several values of r.t and d.t when F1 scenario is considered. Sample size \(n_1=n_2=30\) and \(\sigma =0.08\) and \(\sigma _1=0.1\) (A bottom chart); \(\sigma =0.06\) and \(\sigma _1=0.06\) (B central chart); \(\sigma =0.04\) and \(\sigma _1=0.04\) (C top chart)

Fig. 9

Means of area under ROC curve (AUC) for several values of r.t and d.t when F2 scenario is considered. Sample size \(n_1=n_2=50\), \(\sigma =0.01\) and \(\sigma _1=0.03\)

Fig. 10

Means of area under ROC curve (AUC) for several values of r.t and d.t when F2 scenario is considered. Sample size \(n_1=n_2=50\), \(\sigma =\sigma _1=0.06\)

Fig. 11

Means of AUC, 1-specificity and sensitivity for several values of r.t and d.t when F3 scenario is considered. Sample size \(n_1=n_2=30\) and \(\sigma =0.2\) and \(\sigma _1=0.8\)

Fig. 12

Means of AUC, 1-specificity and sensitivity for several values of r.t and d.t when F4 scenario is considered. Sample size \(n_1=n_2=20\) and \(\sigma =1.4\) and \(\sigma _1=1.8\)

Cite this article

Estévez-Pérez, G., Vieu, P. A new way for ranking functional data with applications in diagnostic test. Comput Stat 36, 127–154 (2021). https://doi.org/10.1007/s00180-020-01020-z

  • DOI: https://doi.org/10.1007/s00180-020-01020-z
