Normalization techniques for PARAFAC modeling of urine metabolomic data

  • Original Article
Metabolomics

Abstract

Introduction

Urine is one of the body fluids most often used in metabolomics studies. Metabolite concentrations in urine depend on an individual's hydration status, which produces dilution differences between samples. The data must therefore be normalized to correct for these differences. Two normalization techniques are commonly applied to urine samples before further statistical analysis. The first, AUC normalization, standardizes the area under the curve (AUC) of the signals within a sample to the median, mean or another suitable representation of the amount of dilution. The second uses a specific end-product metabolite such as creatinine, and all intensities within a sample are expressed relative to the creatinine intensity.
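The two normalizations described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the row sum of a peak table stands in for the AUC, the median of the row totals is taken as the common target, and the function names, the toy data and the choice of the last column as "creatinine" are all hypothetical rather than taken from the article.

```python
import numpy as np

def total_signal_normalize(X, target=None):
    """Scale each sample (row) so that its total signal equals a common target.

    Assumption: the row sum of the peak table approximates the AUC.
    The target defaults to the median of the row totals.
    """
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1, keepdims=True)
    if target is None:
        target = np.median(totals)
    return X / totals * target

def reference_normalize(X, ref_col):
    """Express every intensity in a sample relative to one reference metabolite."""
    X = np.asarray(X, dtype=float)
    return X / X[:, [ref_col]]

# Toy data: two samples with the same composition, the second diluted twofold.
# The last column plays the role of creatinine (hypothetical).
profile = np.array([4.0, 2.0, 1.0, 8.0])
X = np.vstack([profile, profile / 2.0])

auc_normalized = total_signal_normalize(X)
creatinine_normalized = reference_normalize(X, ref_col=3)
```

In this toy case both procedures map the diluted and undiluted samples to the same row, because the dilution is a pure per-sample scaling factor. Real data can violate the assumptions behind either method, for example when the total signal or the creatinine excretion itself differs between individuals.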

Objectives

Another way of looking at urine metabolomics data is to recognize that the ratios between peak intensities are the information-carrying features. This opens up the possibility of using a class of data analysis techniques designed to deal with such ratios: compositional data analysis. The aim of this paper is to develop PARAFAC modeling of three-way urine metabolomics data in the context of compositional data analysis and to compare it with standard normalization techniques.

Methods

In the compositional data analysis approach, special coordinate systems are defined to deal with the ratio problem. In essence, this amounts to using distance measures other than the Euclidean distance used in the conventional analysis of metabolomic data.
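To make the idea of a ratio-based distance concrete, the following sketch uses the centered log-ratio (clr) representation, in which the Aitchison distance between two compositions equals the ordinary Euclidean distance between their clr vectors. The abstract does not state which coordinate system the authors use, so clr is an assumption made here for illustration, and the function names and numbers are hypothetical.

```python
import numpy as np

def clr(x):
    """Centered log-ratio: log of each part minus the mean log over all parts."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Euclidean distance between clr vectors, which depends only on ratios."""
    return float(np.linalg.norm(clr(x) - clr(y)))

# A sample and a threefold-diluted copy of it: same ratios, different scale.
x = np.array([4.0, 2.0, 1.0, 8.0])
diluted = x / 3.0
# A second sample with genuinely different ratios.
y = np.array([1.0, 3.0, 2.0, 6.0])

d_euclid = float(np.linalg.norm(x - diluted))   # reacts to dilution
d_aitch = aitchison_distance(x, diluted)        # zero up to rounding error
d_other = aitchison_distance(x, y)              # nonzero: the ratios differ
```

The Euclidean distance treats the diluted copy as a different sample, while the Aitchison distance does not, and it is unchanged when either sample is rescaled. Note that the logarithm requires strictly positive intensities, so zeros need separate treatment before a transformation of this kind.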

Results

We illustrate this approach in combination with three-way methods (i.e. PARAFAC) on a longitudinal urine metabolomics study and two simulations. In both cases, the compositional approach shows an advantage in terms of improved interpretability of the scores and loadings of the PARAFAC model.
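As a rough illustration of how a log-ratio transformation can be combined with a three-way model, the sketch below simulates an individuals × metabolites × time array with a random dilution factor per individual and time point, applies a clr transformation along the metabolite mode, and fits a PARAFAC model by plain alternating least squares. It is not the authors' implementation: the clr choice, the simulation design, the fixed iteration count and all names are assumptions made for this example.

```python
import numpy as np

def clr(X, axis):
    """Centered log-ratio along one mode of an array of positive values."""
    logX = np.log(X)
    return logX - logX.mean(axis=axis, keepdims=True)

def khatri_rao(U, V):
    """Column-wise Kronecker product of U (I x R) and V (J x R): (I*J) x R."""
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, U.shape[1])

def parafac_als(X, rank, n_iter=200, seed=0):
    """Minimal PARAFAC fit by alternating least squares (no convergence check)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    X1 = X.reshape(I, J * K)                          # mode-1 unfolding
    X2 = np.transpose(X, (1, 0, 2)).reshape(J, I * K)  # mode-2 unfolding
    X3 = np.transpose(X, (2, 0, 1)).reshape(K, I * J)  # mode-3 unfolding
    for _ in range(n_iter):
        # Each step is the least-squares solution for one factor matrix
        # with the other two held fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Simulated log-ratio structure: rank-2 trilinear model whose metabolite
# loadings sum to zero, as clr values do.
rng = np.random.default_rng(1)
I, J, K, R = 6, 5, 4, 2
A_true = rng.standard_normal((I, R))
B_true = rng.standard_normal((J, R))
B_true -= B_true.mean(axis=0)
C_true = rng.standard_normal((K, R))
L = np.einsum('ir,jr,kr->ijk', A_true, B_true, C_true)

# Observed intensities: each (individual, time) slice has its own dilution.
dilution = rng.uniform(0.2, 5.0, size=(I, 1, K))
observed = np.exp(L) * dilution

Z = clr(observed, axis=1)            # the dilution cancels in the log-ratios
A, B, C = parafac_als(Z, rank=R)
Z_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
rel_err = np.linalg.norm(Z - Z_hat) / np.linalg.norm(Z)
```

In this simulation the clr-transformed array coincides with the undiluted trilinear structure, so the PARAFAC model is fitted to data in which the dilution effect has been removed. A practical analysis would add a convergence criterion, several random starts and a check of the number of components, none of which is shown here.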

Conclusion

For urine metabolomics studies, we advocate the use of compositional data analysis approaches. They are easy to use, well established and proven to give reliable results.

Compliance with Ethical Standards

Conflicts of interest

The authors confirm that they have no conflicts of interest.

Funding

This study was funded by grant 15-34613L of the Czech Science Foundation (GA CR), projects CZ.1.07/2.3.00/20.0170 and LO1304 of the Ministry of Education, Youth and Sports of the Czech Republic, grant LF_2016_014 by IGA MZČR NT12218, IGUP Olomouc, and grant IGA_PrF_2016_025 of the Internal Grant Agency of Palacký University in Olomouc. The authors gratefully acknowledge MUDr. Lumír Kantor, Ph.D., of the Neonatal Department, University Hospital Olomouc, Olomouc, Czech Republic.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Author information

Corresponding author

Correspondence to Alžběta Gardlo.

About this article

Cite this article

Gardlo, A., Smilde, A.K., Hron, K. et al. Normalization techniques for PARAFAC modeling of urine metabolomic data. Metabolomics 12, 117 (2016). https://doi.org/10.1007/s11306-016-1059-9

  • DOI: https://doi.org/10.1007/s11306-016-1059-9
