Abstract
Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.
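The iterative selection of reliable negative training data can be illustrated with a minimal sketch. The abstract does not give the exact selection rule, so everything below is an assumption made for illustration: the function names are hypothetical, a simple nearest-centroid scorer stands in for the SVM or RF classifier, and the number of negatives kept per round is set equal to the number of known positives. Each feature vector is assumed to represent one candidate regulator-target pair.

```python
import random
from statistics import fmean


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def positive_score(x, pos_c, neg_c):
    """Higher means 'looks more like a positive'.

    Stand-in (assumption) for an SVM decision value or an RF class probability.
    """
    return sq_dist(x, neg_c) - sq_dist(x, pos_c)


def select_reliable_negatives(positives, unlabelled, n_iter=5, seed=0):
    """Return indices of unlabelled examples that look least like positives."""
    rng = random.Random(seed)
    k = min(len(positives), len(unlabelled))  # assumption: balanced classes
    # Round 0: no negatives are known yet, so start from a random guess.
    negatives = rng.sample(range(len(unlabelled)), k)
    for _ in range(n_iter):
        # "Train" on the positives and the current negative guess.
        pos_c = centroid(positives)
        neg_c = centroid([unlabelled[i] for i in negatives])
        # Score the whole unlabelled pool and keep the k lowest scores.
        scores = [positive_score(x, pos_c, neg_c) for x in unlabelled]
        ranked = sorted(range(len(unlabelled)), key=lambda i: scores[i])
        new_negatives = ranked[:k]
        if set(new_negatives) == set(negatives):
            break  # the negative set has stabilised
        negatives = new_negatives
    return sorted(negatives)


# Toy data: known positives cluster near (1, 1); the unlabelled pool mixes
# hidden positives (indices 0, 2, 5) with true negatives (indices 1, 3, 4).
positives = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
unlabelled = [
    [1.0, 0.9], [-1.0, -1.0], [0.95, 1.05],
    [-0.9, -1.1], [-1.1, -0.9], [1.05, 1.0],
]
reliable = select_reliable_negatives(positives, unlabelled)
print(reliable)  # [1, 3, 4]
```

In a transductive setting, the pool being scored would be the same unlabelled pairs whose labels are ultimately predicted. In an inductive setting, the final classifier would also be applied to pairs not seen during training. Replacing `positive_score` with the output of a trained SVM or RF would leave the loop structure unchanged.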
Additional information
[Patel N and Wang JTL 2015 Semi-supervised prediction of gene regulatory networks using machine learning algorithms. J. Biosci.] DOI 10.1007/s12038-015-9558-9
Patel, N., Wang, J.T.L. Semi-supervised prediction of gene regulatory networks using machine learning algorithms. J Biosci 40, 731–740 (2015). https://doi.org/10.1007/s12038-015-9558-9
DOI: https://doi.org/10.1007/s12038-015-9558-9