Volume 8, Supplement 1, pp 81–93

COVAIN: a toolbox for uni- and multivariate statistics, time-series and correlation network analysis and inverse estimation of the differential Jacobian from metabolomics covariance data

  • Xiaoliang Sun
  • Wolfram Weckwerth
Original Article


Metabolomics is emerging as one of the cornerstones of systems biology. By characterizing metabolic activity as the ultimate readout of the physiological processes of a biological system, it links genotypes with the corresponding phenotypes. Because metabolomics data are high-dimensional, their statistical analysis is complex, and no single technique for statistical analysis and biological interpretation is sufficient to reveal the full information content of these ultracomplex data. A combination of univariate and multivariate statistics, network topology analysis and biochemical pathway mapping is therefore recommended in all cases. To support this, we developed covariance inverse (COVAIN), a toolbox for MATLAB© with full graphical user interface support. COVAIN provides a complete workflow: data upload, data preprocessing, uni- and multivariate statistical analysis, Granger time-series analysis, pathway mapping, correlation network topology analysis and visualization, and finally saving the results in a user-friendly way. It covers analysis of variance, principal components analysis, independent components analysis, clustering and correlation coefficient analysis, and it integrates new algorithms, such as Granger causality and permutation entropy analysis, that are not implemented in other similar software packages. Furthermore, we provide a new algorithm to reconstruct the differential Jacobian matrix of two different metabolic conditions. The algorithm is based on the assumption of stochastic fluctuations in the metabolic network, as we described recently. By integrating the metabolomics covariance matrix with the stoichiometric matrix N of the corresponding pathways, this approach allows a systematic investigation of perturbation sites in the biochemical network based on metabolomics data. COVAIN was developed primarily for metabolomics data but can also be used for the analysis of other omics data.
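The abstract does not spell out the estimation step. As a minimal sketch, assuming the stochastic-fluctuation model takes the usual linear-noise (Lyapunov) form J C + C J^T = -2D, where C is the metabolite covariance matrix, J the Jacobian and D a fluctuation matrix (which in the COVAIN setting is related to the stoichiometric matrix N), the Python code below first generates C from a known J (the forward problem) and then recovers J by least squares, treating only an assumed set of nonzero entries as unknowns. The toy network, the diagonal D, the sparsity mask and the helper names `covariance_from_jacobian` and `estimate_jacobian` are all hypothetical; this is not COVAIN's MATLAB implementation.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def covariance_from_jacobian(J, D):
    """Forward model: solve J C + C J^T = -2 D for the steady-state covariance C."""
    return solve_continuous_lyapunov(J, -2.0 * D)


def estimate_jacobian(C, D, mask):
    """Least-squares estimate of J from J C + C J^T = -2 D.

    Only entries with mask[i, j] == True are unknowns; all other entries of J
    are fixed at zero. The mask stands in for structural prior knowledge
    (hypothetically derived from the stoichiometric matrix N).
    """
    n = C.shape[0]
    rows, cols = np.nonzero(mask)
    # Column k holds the contribution E_k C + C E_k^T of unknown J[rows[k], cols[k]].
    A = np.empty((n * n, rows.size))
    for k, (i, j) in enumerate(zip(rows, cols)):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        A[:, k] = (E @ C + C @ E.T).ravel()
    coef, *_ = np.linalg.lstsq(A, (-2.0 * D).ravel(), rcond=None)
    J = np.zeros((n, n))
    J[rows, cols] = coef
    return J


# Toy 3-metabolite chain x0 -> x1 -> x2; all numbers are made up.
J_ref = np.array([[-1.0, 0.0, 0.0],
                  [1.0, -2.0, 0.0],
                  [0.0, 1.5, -1.0]])
J_pert = J_ref.copy()
J_pert[2, 1] = 3.0                 # hypothetical change in how strongly x1 drives x2
D = np.diag([1.0, 0.5, 0.8])       # assumed fluctuation matrix, shared by both conditions
mask = J_ref != 0                  # assumed known sparsity pattern of J

C_ref = covariance_from_jacobian(J_ref, D)
C_pert = covariance_from_jacobian(J_pert, D)

# Differential Jacobian, taken here simply as the difference of the two estimates.
dJ = estimate_jacobian(C_pert, D, mask) - estimate_jacobian(C_ref, D, mask)
print(np.round(dJ, 3))
```

In this noise-free toy case the recovered difference is essentially zero everywhere except at dJ[2, 1], the perturbed entry. Because J C + C J^T is symmetric, the equation supplies only n(n+1)/2 independent constraints for n^2 unknowns, so some prior restriction on J (here the mask) is needed; with real, noisy covariance estimates a regularized or total-least-squares formulation would likely be more appropriate than plain least squares.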
A C-language module was integrated to handle computationally intensive work on large datasets, e.g., genome-level proteomics and transcriptomics datasets, which usually contain several thousand or more variables. COVAIN can perform cross-analysis and integration of several datasets, which may be useful for investigating responses at different hierarchical levels of the cellular context and for revealing the systems response as an integrated molecular network. The source code can be downloaded from
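COVAIN's C module is not reproduced here. As a hedged illustration of what a cross-analysis between two datasets can look like, the NumPy sketch below assumes two data matrices measured on the same samples (for example metabolite and protein levels), computes all metabolite-protein Pearson correlations in a single matrix product, and keeps pairs above a threshold as edges of a bipartite correlation network. The function names `cross_correlation` and `bipartite_edges`, the simulated data and the threshold of 0.8 are hypothetical.

```python
import numpy as np


def cross_correlation(X, Y):
    """Pearson correlations between the columns of X (n x p) and of Y (n x q).

    Rows of X and Y must refer to the same n samples.
    """
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    Yz = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return Xz.T @ Yz / X.shape[0]        # p x q matrix of r values


def bipartite_edges(R, threshold):
    """List (i, j, r) for every cross-dataset pair with |r| >= threshold."""
    ii, jj = np.nonzero(np.abs(R) >= threshold)
    return [(int(i), int(j), float(R[i, j])) for i, j in zip(ii, jj)]


# Hypothetical matched data: 20 samples, 5 metabolites and 4 proteins.
rng = np.random.default_rng(0)
metabolites = rng.normal(size=(20, 5))
proteins = rng.normal(size=(20, 4))
proteins[:, 0] = 2.0 * metabolites[:, 0] + 0.1 * rng.normal(size=20)  # planted link

R = cross_correlation(metabolites, proteins)
edges = bipartite_edges(R, threshold=0.8)   # threshold chosen arbitrarily
print(edges)
```

Expressing the computation as one matrix product keeps the cost dominated by optimized linear algebra, which matters once each dataset holds thousands of variables. In practice the edge threshold would normally be set with a significance test and a multiple-testing correction rather than fixed by hand.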


Metabolomics · Jacobian · Inverse modelling · Genotype · Phenotype · Stoichiometric matrix · Stochastic processes · Network · Perturbation sites



The authors especially thank Dirk Walther, Lena Fragner and Stefanie Wienkoop for their helpful suggestions.

Supplementary material

Supplementary material 1 (DOC 214 kb)



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Molecular Systems Biology, University of Vienna, Vienna, Austria
