Abstract
The receiver operating characteristic curve is a popular tool to describe and compare the diagnostic accuracy of biomarkers when a binary-scale gold standard is available. There are, however, many examples of diagnostic tests whose gold standards are continuous. Hence, several extensions have been proposed to evaluate the diagnostic potential of biomarkers when the gold standard is continuous-scale. In practice, there may exist more than one biomarker, and diagnostic accuracy can be improved by combining multiple biomarkers. In this paper, an explicit form of the diagnostic accuracy index and the corresponding linear combination that maximizes the diagnostic accuracy are derived under an elliptical distribution assumption. Simulations are conducted to evaluate the performance of the diagnostic accuracy index and the optimal linear combination under different distribution assumptions. The methodology is applied to a behavioral intervention study for families of youth with type 1 diabetes.
Funding
Research of A. Liu is supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), the National Institutes of Health (NIH). The authors thank two anonymous referees for their valuable comments that resulted in an improved paper.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Appendices
Appendix 1: Generating data from a multivariate lognormal distribution
We use the following scheme to generate a random sample from a multivariate lognormal distribution.
(1) Generate a p-dimensional random vector \((Z_1,Z_2,\ldots ,Z_p)\) from a multivariate normal distribution with mean \(\mathbf {\mu }=(\mu _1,\ldots ,\mu _p)\) and variance-covariance matrix \({\varvec{\Sigma }}=(\sigma _{ij})_{p\times p}\).
(2) Transform the p-dimensional random vector to a new one \((X_1,X_2,\ldots ,X_{p-1},Y)\), where
$$\begin{aligned} X_i&= {} e^{Z_i}, i=1,2,\ldots ,p-1; \\ Y&= {} e^{Z_p}. \end{aligned}$$
The vector \((X_1,X_2,\ldots ,X_{p-1},Y)\) is then a p-dimensional lognormal vector with mean \(\mathbf {\mu _L}=e^{\mathbf {\mu }+\frac{1}{2}diag({\varvec{\Sigma }})}\) and variance-covariance matrix \((\varSigma _{ij})_{p\times p}\), where \(diag({\varvec{\Sigma }})\) is the vector of diagonal elements of \({\varvec{\Sigma }}\), and \(\varSigma _{ij}=e^{\mu _i+\mu _j+\frac{1}{2}(\sigma _{ii}+\sigma _{jj})}(e^{\sigma _{ij}}-1),\ i,j=1,\ldots ,p.\)
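The two steps above can be sketched in Python with NumPy. This is only an illustrative sketch, not the code used for the simulations: the function names `sample_multivariate_lognormal` and `lognormal_moments`, the `seed` argument, and the choice of NumPy are all assumptions made here. The second function evaluates the mean vector and variance-covariance matrix given in this appendix, so that sample moments can be compared against them.

```python
import numpy as np


def sample_multivariate_lognormal(mu, sigma, n, seed=None):
    """Draw n observations of (X_1, ..., X_{p-1}, Y) as in Appendix 1.

    mu    : length-p mean vector of the underlying normal vector Z
    sigma : p x p variance-covariance matrix of Z
    Names and signature are hypothetical, chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Step (1): Z ~ N_p(mu, Sigma)
    z = rng.multivariate_normal(mu, sigma, size=n)
    # Step (2): exponentiate componentwise; the last column plays the role of Y
    w = np.exp(z)
    return w[:, :-1], w[:, -1]


def lognormal_moments(mu, sigma):
    """Mean vector and variance-covariance matrix of the lognormal vector."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    mean = np.exp(mu + 0.5 * np.diag(sigma))
    # exp(mu_i + mu_j + (s_ii + s_jj)/2) equals mean_i * mean_j
    cov = np.outer(mean, mean) * (np.exp(sigma) - 1.0)
    return mean, cov
```

With a large `n`, the column means of the sampled data should be close to the first output of `lognormal_moments`, which gives a quick sanity check on the chosen \(\mathbf {\mu }\) and \({\varvec{\Sigma }}\).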
Appendix 2: Generating data from a multivariate gamma distribution
We propose a simple scheme to generate a data set from a multivariate gamma distribution. The algorithm proceeds as follows.
(1) Generate \(p+1\) independent random observations, \(G_0,G_1,\ldots ,G_p\), from gamma distributions, such that
$$\begin{aligned} G_i \sim Gamma(\alpha _i,\beta ) , i=0,1,\ldots ,p. \end{aligned}$$
(2) Generate a p-dimensional random vector \((X_1,X_2,\ldots ,X_{p-1},Y)\), where
$$\begin{aligned} X_i&= {} G_0+G_i, i=1,2,\ldots ,p-1; \\ Y&= {} G_0+G_1+\cdots +G_p. \end{aligned}$$
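The vector \((X_1,X_2,\ldots ,X_{p-1},Y)\) is then a p-dimensional gamma vector, where the marginal distribution of each \(X_i\) is \(Gamma(\alpha _0+\alpha _i,\beta )\) and the marginal distribution of Y is \(Gamma(\sum _{i=0}^p \alpha _i , \beta )\). Because the \(G_i\) are independent, the entries of the variance-covariance matrix follow directly from the construction:
$$\begin{aligned} Var(X_i)&= {} Var(G_0)+Var(G_i), \\ Cov(X_i,X_j)&= {} Var(G_0), \quad i\ne j, \\ Cov(X_i,Y)&= {} Var(G_0)+Var(G_i), \\ Var(Y)&= {} \sum _{k=0}^p Var(G_k), \end{aligned}$$
where \(i,j=1,\ldots ,p-1\) and \(Var(G_k)\) is the variance of \(Gamma(\alpha _k,\beta )\), namely \(\alpha _k\beta ^2\) if \(\beta \) is a scale parameter or \(\alpha _k/\beta ^2\) if it is a rate parameter.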
This construction is a special case of the multivariate gamma distribution and is the one used in our simulations.
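The scheme in this appendix can be sketched in Python with NumPy as follows. This is a hedged illustration rather than the simulation code itself: the function name `sample_multivariate_gamma` and the `seed` argument are hypothetical, and the sketch assumes that \(\beta \) is a scale parameter, which the text does not state. If \(\beta \) is meant as a rate, pass `1 / beta` instead.

```python
import numpy as np


def sample_multivariate_gamma(alpha, beta, n, seed=None):
    """Draw n observations of (X_1, ..., X_{p-1}, Y) as in Appendix 2.

    alpha : length-(p+1) shape parameters (alpha_0, ..., alpha_p)
    beta  : common second parameter, treated here as a SCALE (assumption)
    Names and signature are hypothetical, chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    # Step (1): independent G_0, ..., G_p, one column per shape parameter
    g = rng.gamma(shape=alpha, scale=beta, size=(n, alpha.size))
    # Step (2): X_i = G_0 + G_i for i = 1, ..., p-1
    x = g[:, [0]] + g[:, 1:-1]
    # Y = G_0 + G_1 + ... + G_p
    y = g.sum(axis=1)
    return x, y
```

The shared component \(G_0\) is what induces positive correlation among the \(X_i\), and since Y sums every \(G_i\) it is always at least as large as each \(X_i\).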
About this article
Cite this article
Sun, X., Liu, A. & Li, Z. Maximizing an ROC-type measure via linear combinations of biomarkers. Health Serv Outcomes Res Method 16, 103–116 (2016). https://doi.org/10.1007/s10742-016-0150-z
DOI: https://doi.org/10.1007/s10742-016-0150-z