
Metabolomics, 14:108

Comparing normalization methods and the impact of noise

  • Thao Vu
  • Eli Riekeberg
  • Yumou Qiu
  • Robert Powers
Original Article

Abstract

Introduction

Failure to properly account for normal systematic variations in OMICS datasets may result in misleading biological conclusions. Accordingly, normalization is a necessary step in the proper preprocessing of OMICS datasets. In this regard, an optimal normalization method will effectively reduce unwanted biases and increase the accuracy of downstream quantitative analyses. However, it is currently unclear which normalization method is best, since each algorithm addresses systematic noise in a different way.

Objective

To determine the optimal normalization method for the preprocessing of metabolomics datasets.

Methods

Nine MVAPACK normalization algorithms were compared using simulated and experimental NMR spectra modified with added Gaussian noise and random dilution factors. Methods were evaluated based on their ability to recover the intensities of the true spectral peaks and to reproduce the true classifying features from an orthogonal projections to latent structures-discriminant analysis (OPLS-DA) model.
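The perturbation scheme described above can be illustrated with a minimal Python sketch. It is not the MVAPACK implementation: the peak positions, line widths, dilution range, the order in which dilution and noise are applied, the masking threshold, and all function names are assumptions chosen for illustration. Constant sum scales each spectrum to unit total intensity. Probabilistic quotient is sketched in its commonly described form: scale to constant sum, build a median reference spectrum, and divide each spectrum by the median of its point-wise quotients against that reference.

```python
import numpy as np


def simulate_spectra(n_samples=20, n_points=500, noise_sd=0.05, seed=0):
    """Return (true, observed, dilution) for a toy set of 1D spectra."""
    rng = np.random.default_rng(seed)
    axis = np.linspace(0.0, 10.0, n_points)
    # Hypothetical peak centers and heights with Lorentzian line shapes.
    centers = [1.5, 3.2, 4.7, 6.1, 8.4]
    heights = [1.0, 0.6, 0.8, 0.4, 0.9]
    width = 0.05
    template = sum(
        h * width**2 / ((axis - c) ** 2 + width**2)
        for c, h in zip(centers, heights)
    )
    true = np.tile(template, (n_samples, 1))
    # Assumed perturbation: one random dilution factor per spectrum,
    # then additive Gaussian noise scaled to the tallest peak.
    dilution = rng.uniform(0.5, 1.5, size=n_samples)
    noise = rng.normal(0.0, noise_sd * template.max(), size=true.shape)
    observed = dilution[:, None] * true + noise
    return true, observed, dilution


def constant_sum(X):
    """Scale each spectrum (row) so its intensities sum to one."""
    return X / X.sum(axis=1, keepdims=True)


def probabilistic_quotient(X, rel_threshold=0.1):
    """Divide each row by its median quotient against a median reference."""
    Xcs = constant_sum(X)
    reference = np.median(Xcs, axis=0)
    # Assumption: use only points with appreciable reference intensity,
    # so quotients are not dominated by near-zero baseline values.
    mask = reference > rel_threshold * reference.max()
    quotients = Xcs[:, mask] / reference[mask]
    factors = np.median(quotients, axis=1)
    return Xcs / factors[:, None]


true, observed, dilution = simulate_spectra(noise_sd=0.0)
cs = constant_sum(observed)
pq = probabilistic_quotient(observed)
```

With `noise_sd=0.0` a pure dilution factor cancels exactly, so every row of `cs` and of `pq` is identical. Raising `noise_sd` makes the rows differ, which is the regime the comparison is concerned with.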

Results

Most normalization methods (except histogram matching) performed equally well at modest levels of signal variance. Only probabilistic quotient (PQ) and constant sum (CS) maintained the highest level of peak recovery (> 67%) and correlation with true loadings (> 0.6) at maximal noise.
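The two evaluation metrics named above can be sketched in Python as follows. This is an illustration only: the abstract does not give the exact definitions, so the relative tolerance used for "peak recovery", the use of the first PCA loading as a stand-in for an OPLS-DA predictive loading, and all function names are assumptions.

```python
import numpy as np


def peak_recovery(true_peaks, est_peaks, tol=0.1):
    """Fraction of peaks whose estimated intensity is within a relative
    tolerance of the true intensity (hypothetical definition)."""
    true_peaks = np.asarray(true_peaks, dtype=float)
    est_peaks = np.asarray(est_peaks, dtype=float)
    within = np.abs(est_peaks - true_peaks) <= tol * np.abs(true_peaks)
    return float(np.mean(within))


def first_loading(X):
    """First principal-axis loading of mean-centered data (PCA via SVD).
    Used here only as a simple stand-in for an OPLS-DA loading."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def loading_correlation(p_true, p_est):
    """Absolute Pearson correlation between two loading vectors.
    The absolute value is taken because loadings are defined up to sign."""
    return float(abs(np.corrcoef(p_true, p_est)[0, 1]))


# Toy check: two of three peaks fall within 10% of their true intensity.
recovery = peak_recovery([1.0, 2.0, 4.0], [1.05, 2.5, 4.1])

# Toy check: rank-one data varies only along `direction`.
direction = np.array([1.0, 2.0, 3.0, 4.0])
direction /= np.linalg.norm(direction)
scores = np.arange(5.0)
X = scores[:, None] * direction
corr = loading_correlation(direction, first_loading(X))
```

Under these assumed definitions, a recovery above 0.67 means more than two thirds of the peaks are within tolerance, and a loading correlation above 0.6 means the estimated loading still points largely in the direction of the true one.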

Conclusion

PQ and CS performed the best at recovering peak intensities and reproducing the true classifying features for an OPLS-DA model regardless of spectral noise level. Our findings suggest that performance is largely determined by the level of noise in the dataset, while the effect of dilution factors was negligible. A minimal allowable noise level of 20% was also identified for a valid NMR metabolomics dataset.

Keywords

Metabolomics, Normalization, Noise, NMR, Preprocessing, Chemometrics

Abbreviations

NMR: Nuclear magnetic resonance
PCA: Principal components analysis
OPLS-DA: Orthogonal projections to latent structures-discriminant analysis
PQ: Probabilistic quotient
HM: Histogram matching
SNV: Standard normal variate
MSC: Multiplicative scatter correction
Q: Quantile
CSpline: Natural cubic splines
SSpline: Smoothing splines
CS: Constant sum
ROI: Region of interest
PSC: Phase-scatter correction
LOESS: Locally estimated scatterplot smoothing
ROC: Receiver operating characteristic curve
1D: One-dimensional
SD: Standard deviation

Notes

Acknowledgements

We thank Dr. Martha Morton, the Director of the Research Instrumentation Facility in the Department of Chemistry at the University of Nebraska-Lincoln, for her assistance with the NMR experiments. This material is based upon work supported by the National Science Foundation under Grant Number 1660921. This work was supported in part by funding from the Redox Biology Center (P30 GM103335, NIGMS) and the Nebraska Center for Integrated Biomolecular Communication (P20 GM113126, NIGMS). The research was performed in facilities renovated with support from the National Institutes of Health (RR015468-01). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Author contributions

TV and ER performed the experiments; RP and YQ designed the experiments; TV, ER, YQ, and RP analyzed the data and wrote the manuscript.

Compliance with ethical standards

Conflict of interest

The authors have no conflict of interest to declare.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary material

Supplementary material 1 (PDF 579 KB)


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Statistics, University of Nebraska-Lincoln, Lincoln, USA
  2. Department of Chemistry, University of Nebraska-Lincoln, Lincoln, USA
  3. Nebraska Center for Integrated Biomolecular Communication, Lincoln, USA
