Semi-supervised prediction of gene regulatory networks using machine learning algorithms

Journal of Biosciences

Abstract

Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.
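
The iterative selection of reliable negatives mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the details of the procedure, so the function name `select_reliable_negatives`, the fixed size of the negative set, the stopping rule, and the toy data are all assumptions. A simple nearest-centroid score stands in for an SVM or RF classifier so that the example runs with only the Python standard library.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score(x, pos_c, neg_c):
    """Higher means more positive-like.

    Assumption: this nearest-centroid score is only a stand-in for the
    decision value of an SVM or random forest.
    """
    return sq_dist(x, neg_c) - sq_dist(x, pos_c)


def select_reliable_negatives(positives, unlabelled, n_iter=10, seed=0):
    """Return indices of unlabelled examples chosen as reliable negatives."""
    rng = random.Random(seed)
    k = min(len(positives), len(unlabelled))
    # Start by treating a random sample of unlabelled examples as negatives.
    neg_idx = rng.sample(range(len(unlabelled)), k)
    pos_c = centroid(positives)
    for _ in range(n_iter):
        # "Train" on the positives and the current negatives.
        neg_c = centroid([unlabelled[i] for i in neg_idx])
        # Score every unlabelled example and keep the k least positive-like.
        ranked = sorted(
            range(len(unlabelled)),
            key=lambda i: score(unlabelled[i], pos_c, neg_c),
        )
        new_idx = ranked[:k]
        if set(new_idx) == set(neg_idx):
            break  # the negative set has stabilised
        neg_idx = new_idx
    return sorted(neg_idx)


# Toy data (hypothetical): known positive pairs cluster near (5.5, 5.5);
# the unlabelled pool mixes true negatives with two hidden positives.
positives = [[5, 5], [5, 6], [6, 5], [6, 6]]
unlabelled = [[0, 0], [0, 1], [1, 0], [1, 1], [5.5, 5.5], [6, 5.5]]
print(select_reliable_negatives(positives, unlabelled))
```

In a realistic setting each feature vector would be derived from the expression profiles of a candidate regulator and target gene pair, and the scoring step would be replaced by a trained SVM or RF model.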

Author information

Corresponding author

Correspondence to Jason T L Wang.

Additional information

[Patel N and Wang JTL 2015 Semi-supervised prediction of gene regulatory networks using machine learning algorithms. J. Biosci.] DOI 10.1007/s12038-015-9558-9

Cite this article

Patel, N., Wang, J.T.L. Semi-supervised prediction of gene regulatory networks using machine learning algorithms. J Biosci 40, 731–740 (2015). https://doi.org/10.1007/s12038-015-9558-9

  • DOI: https://doi.org/10.1007/s12038-015-9558-9