Abstract
Metabolomics has emerged as one of the cornerstones of systems biology: it characterizes metabolic activities as the ultimate readout of the physiological processes of biological systems, thereby linking genotypes with the corresponding phenotypes. Because metabolomics data are high-dimensional, their statistical analysis is complex. No single technique for statistical analysis and biological interpretation of these highly complex data is sufficient to reveal their full information content. A combination of univariate and multivariate statistics, network topology analysis and biochemical pathway mapping is therefore recommended in all cases. To support this, we developed a toolbox in MATLAB© with a fully graphical user interface, called covariance inverse (COVAIN). COVAIN provides a complete workflow: uploading data, data preprocessing, uni- and multivariate statistical analysis, Granger time-series analysis, pathway mapping, correlation network topology analysis and visualization, and finally saving results in a user-friendly way. It covers analysis of variance, principal components analysis, independent components analysis, clustering and correlation coefficient analysis, and it integrates new algorithms, such as Granger causality and permutation entropy analysis, that are not implemented in other similar software packages. Furthermore, we provide a new algorithm to reconstruct a differential Jacobian matrix of two different metabolic conditions. The algorithm is based on the assumption of stochastic fluctuations in the metabolic network, as described by us recently. By integrating the metabolomics covariance matrix and the stoichiometric matrix N of the corresponding pathways, this approach allows a systematic investigation of perturbation sites in the biochemical network based on metabolomics data. COVAIN was primarily developed for metabolomics data but can also be used to analyze other omics data.
A module written in C was integrated to handle computationally intensive work on large datasets, e.g., genome-level proteomics and transcriptomics datasets, which usually contain several thousand or more variables. COVAIN can perform cross analysis and integration between several datasets, which might be useful to investigate responses at different hierarchical levels of the cell and to reveal the systems response as an integrated molecular network. The source code can be downloaded from http://www.univie.ac.at/mosys/software.html.
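As an illustration of one of the time-series measures named above, the following is a minimal Python sketch of permutation entropy in the sense of Bandt and Pompe: the Shannon entropy of the ordinal patterns found in sliding windows of a series. It is not taken from COVAIN, which is a MATLAB toolbox; the function name `permutation_entropy`, the parameters `order` and `normalize`, and the tie-breaking rule are assumptions made for this example only.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, normalize=True):
    """Shannon entropy (in bits) of the ordinal patterns of length `order`.

    Hypothetical helper for illustration; not part of the COVAIN code base.
    """
    if order < 2 or len(series) < order:
        raise ValueError("need order >= 2 and at least `order` data points")

    patterns = Counter()
    for start in range(len(series) - order + 1):
        window = series[start:start + order]
        # Ordinal pattern: window positions sorted by value.
        # sorted() is stable, so ties are broken by position (an assumption).
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns[pattern] += 1

    total = sum(patterns.values())
    entropy = sum(
        (count / total) * math.log2(total / count)
        for count in patterns.values()
    )
    if normalize:
        # Divide by the maximum entropy, log2(order!), to map onto [0, 1].
        entropy /= math.log2(math.factorial(order))
    return entropy


if __name__ == "__main__":
    # Short example series with four ascending and two descending pairs.
    print(permutation_entropy([4, 7, 9, 10, 6, 11, 3], order=2))
```

With normalization, a strictly monotonic series gives 0 (a single pattern), while a series in which all patterns are equally frequent gives 1. Applied per variable, such a value could be used to rank metabolite or transcript time courses by the complexity of their dynamics.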
Acknowledgments
The authors especially thank Dirk Walther, Lena Fragner and Stefanie Wienkoop for their helpful suggestions.
Cite this article
Sun, X., Weckwerth, W. COVAIN: a toolbox for uni- and multivariate statistics, time-series and correlation network analysis and inverse estimation of the differential Jacobian from metabolomics covariance data. Metabolomics 8 (Suppl 1), 81–93 (2012). https://doi.org/10.1007/s11306-012-0399-3
DOI: https://doi.org/10.1007/s11306-012-0399-3