Abstract
Biomarkers for assessing cardiovascular function can encompass a wide range of biochemical or physiological measurements. Medical tests that measure biomarkers typically undergo measurement validation and are evaluated for clinical performance in the context of their intended use. This paper discusses general statistical principles for the evaluation of medical tests in the context of heart failure. It highlights statistical aspects of study design and analysis to consider when assessing the quality of measurements and the clinical performance of tests, and it discusses statistical considerations for specific clinical uses. The remarks focus mainly on methods and considerations for the statistical evaluation of medical tests from the perspective of bias and precision. Such an evaluation of performance can give healthcare professionals information that leads to a better understanding of the strengths and limitations of tests related to heart failure.
Notes
CLSI website address: http://www.clsi.org
Acknowledgments
Thanks to Paula Caposino and Bipasa Biswas for their critical review of this paper. We are grateful to the editor and reviewers for their thoughtful and insightful comments, which improved this paper.
Cite this article
De, A., Meier, K., Tang, R. et al. Evaluation of Heart Failure Biomarker Tests: A Survey of Statistical Considerations. J. of Cardiovasc. Trans. Res. 6, 449–457 (2013). https://doi.org/10.1007/s12265-013-9470-3
DOI: https://doi.org/10.1007/s12265-013-9470-3