Abstract
Data transformation, normalization, and the handling of batch effects are key parts of data analysis for almost all spectrometry-based omics data. This paper reviews and contrasts these three distinct aspects. We present a systematic overview of the key approaches and critically review some common procedures. Much of this paper is inspired by mass spectrometry-based experimentation, but most of our discussion carries over to omics data obtained with other spectrometric approaches more generally.
Acknowledgements
This work was supported by funding from the European Community’s Seventh Framework Programme FP7/2011: Marie Curie Initial Training Network MEDIASRES (“Novel Statistical Methodology for Diagnostic/Prognostic and Therapeutic Studies and Systematic Reviews,” www.mediasres-itn.eu) with the Grant Agreement Number 290025 and by funding from the European Union’s Seventh Framework Programme FP7/Health/F5/2012: MIMOmics (“Methods for Integrated Analysis of Multiple Omics Datasets,” http://www.mimomics.eu) under the Grant Agreement Number 305280.
Thanks to Mar Rodríguez Girondo for critical comments on an early version of this text.
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Mertens, B.J.A. (2017). Transformation, Normalization, and Batch Effect in the Analysis of Mass Spectrometry Data for Omics Studies. In: Datta, S., Mertens, B. (eds) Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-45809-0_1
DOI: https://doi.org/10.1007/978-3-319-45809-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45807-6
Online ISBN: 978-3-319-45809-0
eBook Packages: Mathematics and Statistics (R0)