Data Mining for Systems Biology pp 47-58

Part of the Methods in Molecular Biology book series (MIMB, volume 939)

Supervised Inference of Gene Regulatory Networks from Positive and Unlabeled Examples

Protocol

Abstract

Elucidating the structure of gene regulatory networks (GRNs), i.e., identifying which genes are under the control of which transcription factors, is an important challenge in gaining insight into a cell’s working mechanisms. We present SIRENE, a method to estimate a GRN from a collection of expression data. Unlike most existing methods for GRN inference, SIRENE requires a list of known regulations as input in addition to expression data. It implements a supervised machine-learning approach that learns from positive and unlabeled examples, which accounts for the lack of negative examples.
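
The abstract does not spell out the learning procedure, so the following is only a minimal sketch of one way to learn from positive and unlabeled examples for a single transcription factor. It assumes that known targets of the factor are the positives, that all other genes are unlabeled, and that unlabeled genes outside a held-out fold can temporarily stand in for negatives. The function name `pu_scores_for_tf`, the three-fold split, and the centroid-based linear scorer are illustrative assumptions, not SIRENE's actual implementation; the simple scorer is used only to keep the example free of external dependencies.

```python
import random


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def pu_scores_for_tf(expression, known_targets, n_folds=3, seed=0):
    """Score unlabeled genes as candidate targets of one transcription factor.

    expression: dict mapping gene name -> list of expression values
    known_targets: set of genes known to be regulated by the factor (positives)
    Returns a dict mapping each unlabeled gene -> score (higher = more target-like).
    """
    positives = [expression[g] for g in known_targets]
    unlabeled = sorted(g for g in expression if g not in known_targets)
    if not positives:
        raise ValueError("at least one known target is required")
    if n_folds < 2 or len(unlabeled) < n_folds:
        raise ValueError("need n_folds >= 2 and at least n_folds unlabeled genes")

    # Shuffle reproducibly, then deal the unlabeled genes into folds.
    rng = random.Random(seed)
    rng.shuffle(unlabeled)
    folds = [unlabeled[i::n_folds] for i in range(n_folds)]

    pos_center = centroid(positives)
    scores = {}
    for i, held_out in enumerate(folds):
        # Unlabeled genes outside the held-out fold stand in for negatives.
        pseudo_neg = [expression[g]
                      for j, fold in enumerate(folds) if j != i
                      for g in fold]
        neg_center = centroid(pseudo_neg)
        # Linear scorer: project onto the direction from pseudo-negatives
        # to positives, measured from the midpoint between the two centroids.
        w = [p - q for p, q in zip(pos_center, neg_center)]
        mid = [(p + q) / 2 for p, q in zip(pos_center, neg_center)]
        for g in held_out:
            scores[g] = sum(wi * (xi - mi)
                            for wi, xi, mi in zip(w, expression[g], mid))
    return scores
```

Each unlabeled gene is scored only by a model that never saw it during training, so a true but unrecorded target is not penalized for having been treated as a negative. Repeating the procedure for every transcription factor and ranking the resulting (factor, gene) scores would give a candidate list of regulations.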

Key words

Gene regulatory network; Reverse engineering; Inference; Machine learning; Gene expression

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Computer Science, Duke University, Durham, USA
  2. Mines ParisTech, Centre for Computational Biology, Fontainebleau, France
