A Learning Framework to Improve Unsupervised Gene Network Inference

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9729)


Network inference through link prediction is an important data mining problem with many applications in computational social science and biomedicine. For example, by predicting links, i.e., regulatory relationships, between genes to infer gene regulatory networks (GRNs), computational biologists gain a better understanding of the functional elements and regulatory circuits in cells. Unsupervised methods have been widely used to infer GRNs; however, these methods often miss true links and introduce spurious ones. In this paper, we propose a learning framework to improve the unsupervised methods. Given a network constructed by an unsupervised method, the proposed framework employs a graph sparsification technique for network sampling and principal component analysis for feature selection to obtain higher-quality training data, which guides three classifiers to predict and clean the links of the given network. The three classifiers are neural networks, random forests and support vector machines. Experimental results on several datasets demonstrate the good performance of the proposed learning framework and the classifiers used in the framework.
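The training-data step described in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation. It assumes the unsupervised method returns one confidence score per candidate link, uses a simple keep-the-extremes rule as a stand-in for the paper's graph sparsification technique, and estimates only the first principal component by power iteration. The names `select_training_links` and `first_principal_component`, the gene labels, the scores and the feature vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def select_training_links(scores: Dict[Edge, float], k: int) -> Tuple[List[Edge], List[Edge]]:
    """Keep the k highest-scored links as pseudo-positives and the k
    lowest-scored links as pseudo-negatives (assumed stand-in for sparsification)."""
    if k < 1 or 2 * k > len(scores):
        raise ValueError("k must satisfy 1 <= k <= len(scores) / 2")
    ranked = sorted(scores, key=lambda edge: scores[edge], reverse=True)
    return ranked[:k], ranked[-k:]


def first_principal_component(
    rows: Sequence[Sequence[float]], iterations: int = 200
) -> Tuple[List[float], List[float]]:
    """Estimate the leading principal direction by power iteration.

    Returns the unit direction and the projection of each centered row onto it.
    Requires at least two rows.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centered features.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Deterministic, non-uniform start vector for the power iteration.
    direction = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        product = [sum(cov[i][j] * direction[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:  # all features constant: no direction to find
            break
        direction = [x / norm for x in product]
    projections = [sum(c[j] * direction[j] for j in range(d)) for c in centered]
    return direction, projections


if __name__ == "__main__":
    # Hypothetical link scores from an unsupervised inference method.
    scores = {
        ("g1", "g2"): 0.9, ("g1", "g3"): 0.1, ("g2", "g3"): 0.7,
        ("g2", "g4"): 0.2, ("g3", "g4"): 0.5, ("g1", "g4"): 0.4,
    }
    positives, negatives = select_training_links(scores, k=2)
    print("pseudo-positive links:", positives)
    print("pseudo-negative links:", negatives)

    # Hypothetical per-link feature vectors, reduced to one component.
    features = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
    direction, reduced = first_principal_component(features)
    print("leading direction:", direction)
    print("reduced features:", reduced)
```

In a fuller pipeline, the reduced features of the selected links would be used to train a classifier such as a random forest or a support vector machine, and the trained model would then re-label every link of the original network.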


Keywords: Feature selection; Graph mining; Network analysis; Applications in biology and medicine





Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Computer Science Department, King Abdulaziz University, Jeddah, Saudi Arabia
  2. New Jersey Institute of Technology, Bioinformatics Program and Department of Computer Science, University Heights, Newark, USA
