Maximizing an ROC-type measure via linear combinations of biomarkers

Sun, Xian; Liu, Aiyi; Li, Zhaohai

doi:10.1007/s10742-016-0150-z

Published: 20 June 2016

Volume 16, pages 103–116 (2016)

Health Services and Outcomes Research Methodology

Abstract

The receiver operating characteristic (ROC) curve is a popular tool for describing and comparing the diagnostic accuracy of biomarkers when a binary-scale gold standard is available. There are, however, many examples of diagnostic tests whose gold standards are continuous. Hence, several extensions have been proposed to evaluate the diagnostic potential of biomarkers when the gold standard is continuous-scale. In practice, more than one biomarker may be available, and diagnostic accuracy can be improved by combining multiple biomarkers. In this paper, an explicit form of the diagnostic accuracy index and the corresponding linear combination that maximizes it are derived under an elliptical distribution assumption. Simulations are conducted to evaluate the performance of the diagnostic accuracy index and the optimal linear combination under different distributional assumptions. The methodology is applied to a behavioral intervention study for families of youth with type 1 diabetes.

Funding

Research of A. Liu is supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), the National Institutes of Health (NIH). The authors thank two anonymous referees for their valuable comments that resulted in an improved paper.

Author information

Authors and Affiliations

Department of Statistics, The George Washington University, Washington, DC, USA
Xian Sun & Zhaohai Li
Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD, USA
Aiyi Liu

Corresponding author

Correspondence to Aiyi Liu.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Appendices

Appendix 1: Generating data from a multivariate lognormal distribution

We use the following scheme to generate a random sample from a multivariate lognormal distribution.

(1) Generate a p-dimensional random vector $(Z_1,Z_2,\ldots ,Z_p)$ from a multivariate normal distribution with mean $\mathbf {\mu }=(\mu _1,\ldots ,\mu _p)$ and variance-covariance matrix ${\varvec{\Sigma }}=(\sigma _{ij})_{p\times p}$.
(2) Transform the p-dimensional random vector to a new one $(X_1,X_2,\ldots ,X_{p-1},Y)$, where
$$\begin{aligned} X_i&= {} e^{Z_i}, \quad i=1,2,\ldots ,p-1; \\ Y&= {} e^{Z_p}. \end{aligned}$$

The vector $(X_1,X_2,\ldots ,X_{p-1},Y)$ is then a p-dimensional lognormal vector with mean $\mathbf {\mu _L}=e^{\mathbf {\mu }+\frac{1}{2}diag({\varvec{\Sigma }})}$ and the variance-covariance matrix

$$\begin{aligned} {\varvec{\Sigma _L}}=\begin{bmatrix} \varSigma _{11}&\varSigma _{12}&\cdots&\varSigma _{1p} \\ \varSigma _{21}&\varSigma _{22}&\cdots&\varSigma _{2p} \\ \vdots&\vdots&\ddots&\vdots \\ \varSigma _{p1}&\varSigma _{p2}&\cdots&\varSigma _{pp} \end{bmatrix}, \end{aligned}$$

where $diag({\varvec{\Sigma }})$ is a vector of diagonal elements in ${\varvec{\Sigma }}$, and $\varSigma _{ij}=e^{\mu _i+\mu _j+\frac{1}{2}(\sigma _{ii}+\sigma _{jj})}(e^{\sigma _{ij}}-1),i,j=1,\ldots ,p.$
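
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which software was used, NumPy is our choice, and the function names `sample_multivariate_lognormal` and `lognormal_moments` are hypothetical. The second function evaluates the formulas for $\mathbf {\mu _L}$ and ${\varvec{\Sigma _L}}$ given above, so that simulated data can be compared with the stated moments.

```python
import numpy as np


def sample_multivariate_lognormal(mu, sigma, n, rng=None):
    """Draw n rows (X_1, ..., X_{p-1}, Y) from a multivariate lognormal.

    mu    : length-p mean vector of the underlying normal vector Z
    sigma : p x p variance-covariance matrix of Z
    (Function name and signature are illustrative, not from the article.)
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Step (1): Z ~ N_p(mu, Sigma)
    z = rng.multivariate_normal(mu, sigma, size=n)
    # Step (2): exponentiate each component; the last column plays the role of Y
    return np.exp(z)


def lognormal_moments(mu, sigma):
    """Mean vector mu_L and covariance matrix Sigma_L from the formulas above."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    mean = np.exp(mu + 0.5 * np.diag(sigma))
    # Sigma_ij = exp(mu_i + mu_j + (sigma_ii + sigma_jj) / 2) * (exp(sigma_ij) - 1)
    cov = np.outer(mean, mean) * (np.exp(sigma) - 1.0)
    return mean, cov


if __name__ == "__main__":
    # Arbitrary illustrative parameters (p = 3), not taken from the article
    mu = [0.0, 0.2, 0.5]
    sigma = [[0.25, 0.10, 0.10],
             [0.10, 0.25, 0.10],
             [0.10, 0.10, 0.25]]
    sample = sample_multivariate_lognormal(mu, sigma, 200_000,
                                           np.random.default_rng(1))
    mean_l, cov_l = lognormal_moments(mu, sigma)
    print("max |sample mean - mu_L| :", np.abs(sample.mean(axis=0) - mean_l).max())
    print("max |sample cov - Sigma_L|:", np.abs(np.cov(sample, rowvar=False) - cov_l).max())
```

With a large sample, the empirical mean and covariance should be close to $\mathbf {\mu _L}$ and ${\varvec{\Sigma _L}}$; the agreement is only approximate and depends on the sample size and on how heavy the lognormal tails are.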

Appendix 2: Generating data from a multivariate gamma distribution

We propose a simple scheme to generate a data set from a multivariate gamma distribution. The algorithm proceeds as follows.

(1) Generate $p+1$ independent random observations, $G_0,G_1,\ldots ,G_p$, from gamma distributions, such that
$$\begin{aligned} G_i \sim Gamma(\alpha _i,\beta ) , \quad i=0,1,\ldots ,p. \end{aligned}$$
(2) Generate a p-dimensional random vector $(X_1,X_2,\ldots ,X_{p-1},Y)$, where
$$\begin{aligned} X_i&= {} G_0+G_i, \quad i=1,2,\ldots ,p-1; \\ Y&= {} G_0+G_1+\cdots +G_p. \end{aligned}$$

The vector $(X_1,X_2,\ldots ,X_{p-1},Y)$ is then a p-dimensional gamma vector, where the marginal distribution of each $X_i$ is $Gamma(\alpha _0+\alpha _i,\beta )$ and the marginal distribution of Y is $Gamma(\sum _{i=0}^p \alpha _i , \beta )$. The variance-covariance matrix is

$$\begin{aligned} {\varvec{\Sigma }}=\begin{bmatrix} \sigma ^2_1&\sigma ^2_x&\cdots&\sigma ^2_x&\sigma ^2_{1} \\ \sigma ^2_x&\sigma ^2_2&\cdots&\sigma ^2_x&\sigma ^2_{2} \\ \vdots&\vdots&\ddots&\vdots&\vdots \\ \sigma ^2_x&\sigma ^2_x&\cdots&\sigma ^2_{p-1}&\sigma ^2_{p-1}\\ \sigma ^2_{1}&\sigma ^2_{2}&\cdots&\sigma ^2_{p-1}&\sigma ^2_y \end{bmatrix}, \end{aligned}$$

where

$$\begin{aligned} \sigma ^2_i&= {} \frac{\alpha _0+\alpha _i}{\beta ^2}, \quad i=1,\ldots ,p-1, \\ \sigma ^2_x&= {} \frac{\alpha _0}{\beta ^2},\\ \sigma ^2_y&= {} \frac{\sum _{i=0}^p \alpha _i}{\beta ^2}. \end{aligned}$$
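
As a brief check of these entries (our own working, not part of the original appendix), note that the variance formulas imply that $\beta$ is a rate parameter, so $Var(G_i)=\alpha _i/\beta ^2$. Because $G_0,\ldots ,G_p$ are independent, only the shared components contribute to each covariance:

$$\begin{aligned} Cov(X_i,X_j)&= {} Cov(G_0+G_i,\,G_0+G_j)=Var(G_0)=\frac{\alpha _0}{\beta ^2}=\sigma ^2_x, \quad i\ne j, \\ Cov(X_i,Y)&= {} Cov\Big (G_0+G_i,\,\sum _{k=0}^p G_k\Big )=Var(G_0)+Var(G_i)=\frac{\alpha _0+\alpha _i}{\beta ^2}=\sigma ^2_i. \end{aligned}$$

This is why the last row and column of ${\varvec{\Sigma }}$ repeat the marginal variances $\sigma ^2_i$ of the $X_i$.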

This construction is a special case of the multivariate gamma distribution and is the one used in our simulations.
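
The algorithm above can be sketched as follows. This is a minimal illustration, not the authors' code: NumPy is our choice of tool, the function names `sample_multivariate_gamma` and `gamma_covariance` are hypothetical, and $\beta$ is treated as a rate parameter because the variance formulas have the form $\alpha /\beta ^2$. NumPy parameterises the gamma distribution by scale, so the sketch passes `scale = 1 / beta`.

```python
import numpy as np


def sample_multivariate_gamma(alpha, beta, n, rng=None):
    """Draw n rows (X_1, ..., X_{p-1}, Y) using the shared-component scheme.

    alpha : shape parameters (alpha_0, alpha_1, ..., alpha_p), length p + 1
    beta  : common rate parameter (assumed to be a rate, see lead-in)
    (Function name and signature are illustrative, not from the article.)
    """
    rng = np.random.default_rng() if rng is None else rng
    alpha = np.asarray(alpha, dtype=float)
    # Step (1): independent G_i ~ Gamma(alpha_i, beta), one column per i = 0..p
    g = rng.gamma(shape=alpha, scale=1.0 / beta, size=(n, alpha.size))
    # Step (2): X_i = G_0 + G_i for i = 1..p-1, and Y = G_0 + G_1 + ... + G_p
    x = g[:, [0]] + g[:, 1:-1]
    y = g.sum(axis=1, keepdims=True)
    return np.hstack([x, y])


def gamma_covariance(alpha, beta):
    """Theoretical p x p covariance matrix of (X_1, ..., X_{p-1}, Y)."""
    alpha = np.asarray(alpha, dtype=float)
    p = alpha.size - 1
    var_x = (alpha[0] + alpha[1:p]) / beta**2      # sigma_i^2, i = 1..p-1
    cov = np.full((p, p), alpha[0] / beta**2)      # sigma_x^2 between X_i and X_j
    idx = np.arange(p - 1)
    cov[idx, idx] = var_x                          # Var(X_i)
    cov[:p - 1, p - 1] = var_x                     # Cov(X_i, Y) = sigma_i^2
    cov[p - 1, :p - 1] = var_x
    cov[p - 1, p - 1] = alpha.sum() / beta**2      # sigma_y^2
    return cov


if __name__ == "__main__":
    # Arbitrary illustrative parameters (p = 3), not taken from the article
    alpha, beta = [1.0, 2.0, 3.0, 1.5], 2.0
    sample = sample_multivariate_gamma(alpha, beta, 200_000,
                                       np.random.default_rng(0))
    diff = np.cov(sample, rowvar=False) - gamma_covariance(alpha, beta)
    print("max |sample cov - Sigma|:", np.abs(diff).max())
```

Comparing the sample covariance with `gamma_covariance` is a simple way to confirm that a simulated data set has the intended dependence structure before it is used in a simulation study.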

About this article

Cite this article

Sun, X., Liu, A. & Li, Z. Maximizing an ROC-type measure via linear combinations of biomarkers. Health Serv Outcomes Res Method 16, 103–116 (2016). https://doi.org/10.1007/s10742-016-0150-z

Received: 01 January 2016
Revised: 07 June 2016
Accepted: 13 June 2016
Published: 20 June 2016
Issue Date: September 2016
DOI: https://doi.org/10.1007/s10742-016-0150-z
