
Maximizing an ROC-type measure via linear combinations of biomarkers

Health Services and Outcomes Research Methodology

Abstract

The receiver operating characteristic (ROC) curve is a popular tool for describing and comparing the diagnostic accuracy of biomarkers when a binary-scale gold standard is available. There are, however, many diagnostic tests whose gold standards are continuous, and several extensions have therefore been proposed to evaluate the diagnostic potential of biomarkers against a continuous-scale gold standard. In practice, more than one biomarker is often available, and diagnostic accuracy can be improved by combining them. In this paper, an explicit form of a diagnostic accuracy index, and the linear combination of biomarkers that maximizes it, are derived under an elliptical distribution assumption. Simulations are conducted to evaluate the performance of the accuracy index and of the optimal linear combination under different distribution assumptions. The methodology is applied to a behavioral intervention study for families of youth with type 1 diabetes.
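The abstract does not reproduce the index itself, so the following is only a hedged illustration of why an elliptical assumption can yield an explicit maximizer. Assume the ROC-type measure is the concordance probability θ = P(T_i > T_j | Y_i > Y_j) for a combined score T = a'X and a continuous gold standard Y, with ties of probability zero, so that θ = (1 + τ)/2 where τ is Kendall's tau. For elliptical distributions with continuous margins, τ = (2/π) arcsin ρ with ρ = corr(a'X, Y); θ is then increasing in ρ and is maximized by a ∝ Σ_XX⁻¹σ_XY. This NumPy sketch encodes that reasoning; it is not necessarily the exact expression derived in the paper, and the function names and covariance values are hypothetical.

```python
import numpy as np


def concordance_index(t, y):
    """Empirical P(t_i > t_j | y_i > y_j) over all ordered pairs (no ties assumed)."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    dt = t[:, None] - t[None, :]
    dy = y[:, None] - y[None, :]
    pairs = dy > 0                      # pairs ordered by the gold standard
    return np.mean(dt[pairs] > 0)       # fraction the score orders the same way


def elliptical_index(a, sigma_xx, sigma_xy, sigma_yy):
    """Closed form theta = 1/2 + arcsin(rho)/pi with rho = corr(a'X, Y).

    Relies on tau = (2/pi) * arcsin(rho) for elliptical laws with continuous
    margins and on theta = (1 + tau) / 2 when ties have probability zero.
    """
    a = np.asarray(a, float)
    rho = a @ sigma_xy / np.sqrt((a @ sigma_xx @ a) * sigma_yy)
    return 0.5 + np.arcsin(rho) / np.pi


def optimal_direction(sigma_xx, sigma_xy):
    """theta increases with rho, so maximize corr(a'X, Y): a ~ Sigma_XX^{-1} sigma_XY."""
    a = np.linalg.solve(sigma_xx, sigma_xy)
    return a / np.linalg.norm(a)        # positive rescaling leaves the index unchanged


# Hypothetical covariance of two markers X and a continuous gold standard Y.
cov = np.array([[1.0, 0.3, 0.5],
                [0.3, 1.0, 0.4],
                [0.5, 0.4, 1.0]])
sxx, sxy, syy = cov[:2, :2], cov[:2, 2], cov[2, 2]

a_opt = optimal_direction(sxx, sxy)
theta_opt = elliptical_index(a_opt, sxx, sxy, syy)
theta_x1 = elliptical_index(np.array([1.0, 0.0]), sxx, sxy, syy)  # marker 1 alone

# Monte Carlo check under a normal law, which is one elliptical distribution.
rng = np.random.default_rng(1)
z = rng.multivariate_normal(np.zeros(3), cov, size=1500)
theta_emp = concordance_index(z[:, :2] @ a_opt, z[:, 2])
print(theta_opt, theta_x1, theta_emp)
```

The closed form makes the optimization a correlation problem, which is why the maximizing direction has the same form as a least-squares regression of Y on X.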



Funding

Research of A. Liu is supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), the National Institutes of Health (NIH). The authors thank two anonymous referees for their valuable comments that resulted in an improved paper.

Author information

Corresponding author

Correspondence to Aiyi Liu.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Appendices

Appendix 1: Generating data from a multivariate lognormal distribution

We use the following scheme to generate a random sample from a multivariate lognormal distribution.

  1. Generate a p-dimensional random vector \((Z_1,Z_2,\ldots ,Z_p)\) from a multivariate normal distribution with mean \(\mathbf {\mu }=(\mu _1,\ldots ,\mu _p)\) and variance-covariance matrix \({\varvec{\Sigma }}=(\sigma _{ij})_{p\times p}\).

  2. Transform this vector into a new p-dimensional vector \((X_1,X_2,\ldots ,X_{p-1},Y)\), where

    $$\begin{aligned} X_i&= {} e^{Z_i}, i=1,2,\ldots ,p-1; \\ Y&= {} e^{Z_p}. \end{aligned}$$
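The two steps above translate directly into a few lines of NumPy. This is a minimal sketch, not the authors' simulation code; the function name `sample_mv_lognormal` and the parameter values are hypothetical.

```python
import numpy as np


def sample_mv_lognormal(n, mu, sigma, rng=None):
    """Appendix 1 scheme: Z ~ N_p(mu, Sigma), then exponentiate every coordinate.

    Returns the first p-1 coordinates as markers X and the last as gold standard Y.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.multivariate_normal(mean=mu, cov=sigma, size=n)  # step (1)
    w = np.exp(z)                                            # step (2), elementwise
    return w[:, :-1], w[:, -1]


# Hypothetical parameters with p = 3: two markers and one gold standard.
mu = np.array([0.0, 0.2, 0.5])
sigma = np.array([[0.25, 0.10, 0.15],
                  [0.10, 0.25, 0.12],
                  [0.15, 0.12, 0.30]])
x, y = sample_mv_lognormal(1000, mu, sigma, rng=np.random.default_rng(2016))
print(x.shape, y.shape)
```

Taking logarithms of the output recovers the normal draws, which gives a quick sanity check on the sampler.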

The vector \((X_1,X_2,\ldots ,X_{p-1},Y)\) is then a p-dimensional lognormal vector with mean \(\mathbf {\mu }_L=e^{\mathbf {\mu }+\frac{1}{2}\mathrm{diag}({\varvec{\Sigma }})}\), where the exponential is taken elementwise, and variance-covariance matrix

$$\begin{aligned} {\varvec{\Sigma }}_L=\begin{bmatrix} \varSigma _{11}&\varSigma _{12}&\cdots &\varSigma _{1p} \\ \varSigma _{21}&\varSigma _{22}&\cdots &\varSigma _{2p} \\ \vdots &\vdots &\ddots &\vdots \\ \varSigma _{p1}&\varSigma _{p2}&\cdots &\varSigma _{pp} \end{bmatrix}, \end{aligned}$$

where \(\mathrm{diag}({\varvec{\Sigma }})\) is the vector of diagonal elements of \({\varvec{\Sigma }}\), and \(\varSigma _{ij}=e^{\mu _i+\mu _j+\frac{1}{2}(\sigma _{ii}+\sigma _{jj})}(e^{\sigma _{ij}}-1)\), \(i,j=1,\ldots ,p\).
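The moment formulas can be checked numerically. The sketch below, which assumes NumPy and uses a hypothetical helper `lognormal_moments` with made-up parameters, computes the mean vector and covariance matrix of the transformed vector and compares them with a large simulated sample. It uses the factorization \(\varSigma _{ij}=\mu _{L,i}\,\mu _{L,j}\,(e^{\sigma _{ij}}-1)\), which follows directly from the expression for \(\varSigma _{ij}\).

```python
import numpy as np


def lognormal_moments(mu, sigma):
    """Mean vector mu_L and covariance Sigma_L of exp(Z), Z ~ N_p(mu, Sigma).

    mu_L = exp(mu + diag(Sigma)/2) elementwise, and
    Sigma_L[i, j] = exp(mu_i + mu_j + (sigma_ii + sigma_jj)/2) * (exp(sigma_ij) - 1),
    which factors as mu_L[i] * mu_L[j] * (exp(sigma_ij) - 1).
    """
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    mean = np.exp(mu + 0.5 * np.diag(sigma))
    cov = np.outer(mean, mean) * np.expm1(sigma)   # expm1(s) = exp(s) - 1
    return mean, cov


# Compare with a large simulated sample (hypothetical parameters).
mu = np.array([0.0, 0.5])
sigma = np.array([[0.25, 0.10],
                  [0.10, 0.25]])
mean_L, cov_L = lognormal_moments(mu, sigma)
w = np.exp(np.random.default_rng(7).multivariate_normal(mu, sigma, size=200_000))
print(mean_L, w.mean(axis=0))
print(cov_L, np.cov(w, rowvar=False))
```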

Appendix 2: Generating data from a multivariate gamma distribution

We propose a simple scheme to generate a data set from a multivariate gamma distribution. The algorithm proceeds as follows.

  1. Generate \(p+1\) independent random variables \(G_0,G_1,\ldots ,G_p\) from gamma distributions with shape parameters \(\alpha _i\) and a common rate parameter \(\beta \):

    $$\begin{aligned} G_i \sim Gamma(\alpha _i,\beta ) , \quad i=0,1,\ldots ,p. \end{aligned}$$

  2. Form the p-dimensional random vector \((X_1,X_2,\ldots ,X_{p-1},Y)\), where

    $$\begin{aligned} X_i&= {} G_0+G_i, i=1,2,\ldots ,p-1; \\ Y&= {} G_0+G_1+\cdots +G_p. \end{aligned}$$
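A minimal NumPy sketch of the two steps above, assuming \(\beta \) is a rate (consistent with the variances \(\alpha /\beta ^2\) given below). NumPy's gamma sampler is parameterized by scale, so the sketch passes `scale = 1 / beta`. The function name `sample_mv_gamma` and the parameter values are hypothetical.

```python
import numpy as np


def sample_mv_gamma(n, alpha, beta, rng=None):
    """Appendix 2 scheme with alpha = (alpha_0, ..., alpha_p) and common rate beta.

    X_i = G_0 + G_i (i = 1, ..., p-1) and Y = G_0 + G_1 + ... + G_p.
    """
    rng = np.random.default_rng() if rng is None else rng
    alpha = np.asarray(alpha, float)
    # NumPy uses a shape/scale parameterization, so scale = 1 / rate.
    g = rng.gamma(shape=alpha, scale=1.0 / beta, size=(n, alpha.size))  # step (1)
    x = g[:, [0]] + g[:, 1:-1]   # step (2): shared G_0 plus marker-specific G_i
    y = g.sum(axis=1)            # step (2): Y accumulates all p + 1 components
    return x, y


x, y = sample_mv_gamma(5000, alpha=[1.0, 2.0, 3.0, 4.0], beta=2.0,
                       rng=np.random.default_rng(11))
print(x.shape, y.shape)
```

Because every \(G_i\) is nonnegative, each simulated \(Y\) is at least as large as every \(X_i\) in the same row, a built-in dependence between the markers and the gold standard.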

The vector \((X_1,X_2,\ldots ,X_{p-1},Y)\) is then a p-dimensional gamma vector, where the marginal distribution of each \(X_i\) is \(Gamma(\alpha _0+\alpha _i,\beta )\) and the marginal distribution of Y is \(Gamma(\sum _{i=0}^p \alpha _i , \beta )\). The variance-covariance matrix is

$$\begin{aligned} {\varvec{\Sigma }}=\begin{bmatrix} \sigma ^2_1&\sigma ^2_x&\cdots &\sigma ^2_x&\sigma ^2_1 \\ \sigma ^2_x&\sigma ^2_2&\cdots &\sigma ^2_x&\sigma ^2_2 \\ \vdots &\vdots &\ddots &\vdots &\vdots \\ \sigma ^2_x&\sigma ^2_x&\cdots &\sigma ^2_{p-1}&\sigma ^2_{p-1}\\ \sigma ^2_1&\sigma ^2_2&\cdots &\sigma ^2_{p-1}&\sigma ^2_y \end{bmatrix}, \end{aligned}$$

where

$$\begin{aligned} \sigma ^2_i&= {} \frac{\alpha _0+\alpha _i}{\beta ^2},i=1,\ldots ,p-1 \\ \sigma ^2_x&= {} \frac{\alpha _0}{\beta ^2},\\ \sigma ^2_y &= {} \frac{\sum _{i=0}^p \alpha _i}{\beta ^2}. \end{aligned}$$

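These entries follow from the independence of the \(G_i\): \(X_i\) and \(X_j\) (\(i\ne j\)) share only \(G_0\), whereas \(X_i\) and \(Y\) share both \(G_0\) and \(G_i\). This construction is a special case of the multivariate gamma distribution, and it is the one used in our simulations.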
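The covariance matrix can be assembled directly from the component variances \(\alpha _i/\beta ^2\). The following NumPy sketch (hypothetical helper `mv_gamma_cov`, made-up parameters) builds it entry by entry, following the sharing argument for the \(G_i\).

```python
import numpy as np


def mv_gamma_cov(alpha, beta):
    """Covariance of (X_1, ..., X_{p-1}, Y) from the Appendix 2 construction.

    With Var(G_i) = alpha_i / beta^2 and independent G's:
      Cov(X_i, X_j) = Var(G_0)            = sigma_x^2   (i != j)
      Var(X_i)      = Var(G_0) + Var(G_i) = sigma_i^2
      Cov(X_i, Y)   = Var(G_0) + Var(G_i) = sigma_i^2
      Var(Y)        = sum_k Var(G_k)      = sigma_y^2
    """
    var_g = np.asarray(alpha, float) / beta**2
    p = var_g.size - 1
    cov = np.full((p, p), var_g[0])            # start with sigma_x^2 everywhere
    for i in range(1, p):
        s2 = var_g[0] + var_g[i]               # sigma_i^2
        cov[i - 1, i - 1] = s2                 # Var(X_i)
        cov[i - 1, p - 1] = cov[p - 1, i - 1] = s2   # Cov(X_i, Y)
    cov[p - 1, p - 1] = var_g.sum()            # sigma_y^2
    return cov


print(mv_gamma_cov([1.0, 2.0, 3.0, 4.0], beta=2.0))
```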


About this article


Cite this article

Sun, X., Liu, A. & Li, Z. Maximizing an ROC-type measure via linear combinations of biomarkers. Health Serv Outcomes Res Method 16, 103–116 (2016). https://doi.org/10.1007/s10742-016-0150-z


  • DOI: https://doi.org/10.1007/s10742-016-0150-z
