COVAIN: a toolbox for uni- and multivariate statistics, time-series and correlation network analysis and inverse estimation of the differential Jacobian from metabolomics covariance data

  • Original Article
  • Metabolomics

Abstract

Metabolomics has emerged as one of the cornerstones of systems biology: it characterizes metabolic activities as the ultimate readout of the physiological processes of biological systems and thereby links genotypes with the corresponding phenotypes. Because metabolomics data are high-dimensional, their statistical analysis is complex. No single technique for statistical analysis and biological interpretation is sufficient to reveal the full information content of these data, so a combination of univariate and multivariate statistics, network topology analysis and biochemical pathway mapping is recommended in all cases. We therefore developed a MATLAB toolbox with full graphical user interface support, called covariance inverse (COVAIN). COVAIN provides a complete workflow: uploading data, data preprocessing, uni- and multivariate statistical analysis, Granger time-series analysis, pathway mapping, correlation network topology analysis and visualization, and finally saving results in a user-friendly way. It covers analysis of variance, principal components analysis, independent components analysis, clustering and correlation coefficient analysis, and it integrates new algorithms, such as Granger causality and permutation entropy analysis, that are not implemented in other similar software packages. Furthermore, we provide a new algorithm to reconstruct a differential Jacobian matrix between two different metabolic conditions. The algorithm is based on the assumption of stochastic fluctuations in the metabolic network, as described by us recently. By integrating the metabolomics covariance matrix and the stoichiometric matrix N of the corresponding pathways, this approach allows a systematic investigation of perturbation sites in the biochemical network based on metabolomics data. COVAIN was developed primarily for metabolomics data but can also be used to analyze other omics data. A C language module was integrated to handle computationally intensive work on large datasets, e.g., genome-level proteomics and transcriptomics datasets, which usually contain several thousand or more variables. COVAIN can perform cross analysis and integration between several datasets, which might be useful to investigate responses at different hierarchical levels of the cellular context and to reveal the system's response as an integrated molecular network. The source code can be downloaded from http://www.univie.ac.at/mosys/software.html.
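
To make the permutation entropy analysis mentioned above more concrete, the following is a minimal sketch of the standard ordinal-pattern definition of permutation entropy for a single time series. It is written in Python for illustration only and is not taken from the COVAIN source code; the function name `permutation_entropy` and the parameters `order`, `delay` and `normalize` are assumptions chosen for this example.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1, normalize=True):
    """Shannon entropy (base 2) of the ordinal patterns found in `series`.

    Hypothetical helper for illustration; not part of the COVAIN toolbox.
    """
    n_windows = len(series) - (order - 1) * delay
    if order < 2 or n_windows < 1:
        raise ValueError("series too short for the requested order and delay")

    patterns = Counter()
    for start in range(n_windows):
        window = [series[start + k * delay] for k in range(order)]
        # Ordinal pattern: the index order that sorts the window.
        # sorted() is stable, so tied values are ranked by position.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns[pattern] += 1

    probabilities = [count / n_windows for count in patterns.values()]
    entropy = -sum(p * math.log2(p) for p in probabilities)
    if normalize:
        # Divide by the maximum entropy, log2(order!), to map onto [0, 1].
        entropy /= math.log2(math.factorial(order))
    return entropy
```

With normalization, a strictly monotonic series contains a single ordinal pattern and scores 0, while a series whose patterns are all equally frequent scores 1. For a metabolite time course, low values would indicate regular dynamics and high values irregular dynamics, under the assumption that the series is long enough relative to `order` for the pattern frequencies to be meaningful.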

Acknowledgments

The authors especially thank Dirk Walther, Lena Fragner and Stefanie Wienkoop for their helpful suggestions.

Author information

Corresponding author

Correspondence to Wolfram Weckwerth.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOC 214 kb)

About this article

Cite this article

Sun, X., Weckwerth, W. COVAIN: a toolbox for uni- and multivariate statistics, time-series and correlation network analysis and inverse estimation of the differential Jacobian from metabolomics covariance data. Metabolomics 8 (Suppl 1), 81–93 (2012). https://doi.org/10.1007/s11306-012-0399-3

  • DOI: https://doi.org/10.1007/s11306-012-0399-3
