Gaussian Graphical Models to Infer Putative Genes Involved in Nitrogen Catabolite Repression in S. cerevisiae

  • Kevin Kontos
  • Bruno André
  • Jacques van Helden
  • Gianluca Bontempi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5483)

Abstract

Nitrogen is an essential nutrient for all life forms. Like most unicellular organisms, the yeast Saccharomyces cerevisiae transports and catabolizes good nitrogen sources in preference to poor ones. Nitrogen catabolite repression (NCR) refers to this selection mechanism. We propose an approach based on Gaussian graphical models (GGMs), which make it possible to distinguish direct from indirect interactions between genes, to identify putative NCR genes from putative NCR regulatory motifs and over-represented motifs in the upstream noncoding sequences of annotated NCR genes. Because of the high dimensionality of the data, we use a shrinkage estimator of the covariance matrix to infer the GGMs. We show that our approach makes significant and biologically valid predictions. We also show that GGMs are more effective than models that rely on measures of direct interactions between genes.
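
As a rough illustration of the idea in the abstract, the following minimal sketch estimates a shrunken covariance matrix and derives partial correlations from its inverse, which is how a GGM separates direct from indirect associations. It is not the authors' implementation: the function names, the diagonal shrinkage target, the hand-picked shrinkage intensity `lam`, and the three-variable toy data are all assumptions made for this example.

```python
import numpy as np

def shrinkage_covariance(X, lam):
    """Shrink the sample covariance toward its diagonal.

    X   : (n_samples, n_genes) expression matrix
    lam : shrinkage intensity in [0, 1], fixed by hand here (an assumption)
    """
    S = np.cov(X, rowvar=False)
    target = np.diag(np.diag(S))          # keeps variances, zeroes covariances
    return lam * target + (1.0 - lam) * S

def partial_correlations(cov):
    """Partial correlations computed from the precision (inverse covariance) matrix."""
    omega = np.linalg.inv(cov)
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Hypothetical toy chain g0 -> g1 -> g2: g0 and g2 interact only through g1.
rng = np.random.default_rng(0)
n = 2000
g0 = rng.normal(size=n)
g1 = g0 + 0.5 * rng.normal(size=n)
g2 = g1 + 0.5 * rng.normal(size=n)
X = np.column_stack([g0, g1, g2])

cor = np.corrcoef(X, rowvar=False)
pcor = partial_correlations(shrinkage_covariance(X, lam=0.01))

print("correlation(g0, g2):        ", round(cor[0, 2], 3))
print("partial correlation(g0, g2):", round(pcor[0, 2], 3))
```

In this toy chain the plain correlation between g0 and g2 is large, while their partial correlation given g1 is close to zero, so only the direct links g0–g1 and g1–g2 would be kept as edges. With far more genes than samples the unshrunken sample covariance is singular and cannot be inverted, which is why some form of shrinkage is needed before this step.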

Keywords

  • Receiver Operator Characteristic Curve
  • Partial Correlation
  • Sample Covariance Matrix
  • Shrinkage Estimator
  • Good Nitrogen Source
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Kevin Kontos (1)
  • Bruno André (2)
  • Jacques van Helden (3)
  • Gianluca Bontempi (1)

  1. Machine Learning Group, Faculté des Sciences, Université Libre de Bruxelles (ULB), Brussels, Belgium
  2. Physiologie Moléculaire de la Cellule, IBMM, Faculté des Sciences, ULB, Gosselies, Belgium
  3. Laboratoire de Bioinformatique des Génomes et des Réseaux, Faculté des Sciences, ULB, Brussels, Belgium
