Abstract
High-throughput technologies can now capture information at both global and targeted scales for the transcriptome, proteome, and metabolome, and can determine functional aspects of these biomolecules. The promise of data integration is that combining these disparate data streams yields a more accurate predictive model of the phenotype of interest by identifying the best subset of molecules associated with the outcome. However, in a space of tens of thousands of variables (e.g., genes, proteins), feature selection approaches often yield over-trained models with poor predictive power. Moreover, feature selection algorithms typically focus on a single source of data and do not evaluate the effect on downstream statistical integration models. The integration of Bayesian statistical outputs has been shown to be an effective approach that optimizes the outcome of interest in the context of the integrated posterior probability. This chapter demonstrates that this approach can improve sensitivity and specificity over simple selection routines based on individual high-throughput datasets generated via mass spectrometry.
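As a rough illustration of what an integrated posterior probability can look like, the sketch below fuses class posteriors from several data sources under the assumption that the sources are conditionally independent given the class, so that P(c | x_1, ..., x_K) is proportional to P(c)^(1-K) times the product of the P(c | x_k). This is a generic fusion rule, not necessarily the exact model used in the chapter. The function name `integrate_posteriors` and the example numbers are hypothetical.

```python
import numpy as np


def integrate_posteriors(posteriors, prior):
    """Fuse per-source class posteriors into one integrated posterior.

    posteriors: array-like, shape (K, n_samples, n_classes), P(c | x_k) per source k.
    prior: array-like, shape (n_classes,), shared class prior P(c).
    Assumption: the K sources are conditionally independent given the class.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    prior = np.asarray(prior, dtype=float)
    k = posteriors.shape[0]
    eps = 1e-12  # guard against log(0)

    # Work in log space: sum_k log P(c | x_k) - (K - 1) * log P(c)
    log_post = np.log(np.clip(posteriors, eps, 1.0)).sum(axis=0)
    log_post -= (k - 1) * np.log(np.clip(prior, eps, 1.0))

    # Normalize each sample's scores so they sum to one across classes
    log_post -= log_post.max(axis=1, keepdims=True)
    fused = np.exp(log_post)
    return fused / fused.sum(axis=1, keepdims=True)


# Hypothetical example: one sample, two classes, two data sources
source_a = [[0.8, 0.2]]  # e.g., posterior from a proteomics classifier
source_b = [[0.6, 0.4]]  # e.g., posterior from a metabolomics classifier
fused = integrate_posteriors([source_a, source_b], prior=[0.5, 0.5])
predicted_class = fused.argmax(axis=1)
```

Under this rule, two sources that each weakly favor the same class produce a fused posterior more confident than either alone, which is the intuition behind selecting features by their effect on the integrated posterior rather than on a single dataset.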
Keywords
- Feature Selection Algorithm
- Poor Predictive Power
- Liquid Chromatography Multiple Reaction Monitoring
- Diabetes Antibody Standardization Program (DASP)
- Total Classification Accuracy
Acknowledgments
This work was funded by NIH NIDDK grant R33 DK070146. Significant portions of the work were performed at the Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored by the Department of Energy’s (DOE) Office of Biological and Environmental Research and located at Pacific Northwest National Laboratory (PNNL) in Richland, Washington. PNNL is a multi-program national laboratory operated by Battelle for the DOE under contract DE-AC05-76RL01830.
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Webb-Robertson, BJ.M., Metz, T.O., Waters, K.M., Zhang, Q., Rewers, M. (2017). Bayesian Posterior Integration for Classification of Mass Spectrometry Data. In: Datta, S., Mertens, B. (eds) Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-45809-0_11
DOI: https://doi.org/10.1007/978-3-319-45809-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45807-6
Online ISBN: 978-3-319-45809-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)