Exploiting causality in gene network reconstruction based on graph embedding

  • Gianvito Pio
  • Michelangelo Ceci
  • Francesca Prisciandaro
  • Donato Malerba


Gene network reconstruction is a bioinformatics task that aims at modelling the complex regulatory activities that may occur among genes. This task is typically solved by means of link prediction methods that analyze gene expression data. However, the reconstructed networks often contain a large number of false positive edges. These edges actually result from indirect regulation activities due to common cause and common effect phenomena; in other words, the adopted inductive methods do not take possible causality phenomena into account. This issue is further accentuated by the high amount of noise inherently present in gene expression data. Existing methods for identifying a transitive reduction of a network or for removing (possibly) redundant edges are limited in the structure of the network or in the nature/length of the indirect regulation they can handle, and often require additional pre-processing steps to handle specific peculiarities of the networks (e.g., cycles). Moreover, they are not able to consider possible community structures or possible similar roles of the genes in the network (e.g., hub nodes), which may change the tendency of nodes to be highly connected (and with which nodes) in the network. In this paper, we propose the method INLOCANDA, which learns an inductive predictive model for gene network reconstruction and overcomes all the mentioned limitations. In particular, INLOCANDA is able to (i) identify and exploit indirect relationships of arbitrary length to remove edges due to common cause and common effect phenomena; and (ii) take into account possible community structures and possible similar roles by means of graph embedding. Experiments performed along multiple dimensions of analysis on benchmark, real networks of two organisms (E. coli and S. cerevisiae) show higher accuracy than the competitors, as well as higher robustness to the presence of noise in the data, even when a huge amount of (possibly false positive) interactions is removed. Availability:


Keywords: Causality; Bioinformatics; Network reconstruction; Link prediction
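The abstract does not detail how INLOCANDA detects indirect relationships, so the snippet below is only an illustrative sketch and not the authors' method. It prunes a scored network with a generic heuristic in the spirit of ARACNE's data processing inequality (which drops the weakest edge of every triangle), extended here to paths of arbitrary length. An edge u–v is removed when u and v are also connected by an indirect path whose weakest link is stronger than the direct edge. Edge direction is ignored, so a chain A→C→B, a common cause A←C→B and a common effect A→C←B all appear as the same undirected path A–C–B. The function name `prune_indirect_edges`, the dict-of-pairs input format and the `margin` tolerance are hypothetical. The sketch does not use graph embeddings, so point (ii) is not covered.

```python
from collections import defaultdict, deque


def prune_indirect_edges(edges, margin=0.0):
    """Hypothetical helper: keep only edges NOT explained by a stronger indirect path.

    edges:  dict mapping an undirected pair (u, v) to a confidence score, e.g. the
            output of any link-prediction model. Each unordered pair must appear once.
    margin: hypothetical tolerance (>= 0); an edge is dropped only if an alternative
            path exists whose weakest edge beats it by more than `margin`.
    """
    # Undirected adjacency list: node -> list of (neighbour, score).
    adj = defaultdict(list)
    for (u, v), s in edges.items():
        adj[u].append((v, s))
        adj[v].append((u, s))

    kept = {}
    for (u, v), s in edges.items():
        threshold = s + margin
        # BFS from u that only traverses edges strictly stronger than the threshold.
        # The direct edge (u, v) never qualifies (s is not > s + margin when margin >= 0),
        # so reaching v means an indirect path of length >= 2 exists whose weakest
        # link scores above the threshold.
        seen = {u}
        queue = deque([u])
        explained = False
        while queue and not explained:
            node = queue.popleft()
            for nxt, w in adj[node]:
                if w > threshold and nxt not in seen:
                    if nxt == v:
                        explained = True
                        break
                    seen.add(nxt)
                    queue.append(nxt)
        if not explained:
            kept[(u, v)] = s
    return kept


if __name__ == "__main__":
    # Toy scores: the weak A-C edge is explained by the stronger path A-B-C.
    scores = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.3, ("C", "D"): 0.7}
    print(prune_indirect_edges(scores))
    # {('A', 'B'): 0.9, ('B', 'C'): 0.8, ('C', 'D'): 0.7}
```

With `margin=0.0` and distinct scores, an edge survives exactly when it belongs to the maximum spanning forest of the scored network (by the cycle property). This is usually far too aggressive for regulatory networks, which is why a positive tolerance is exposed: it keeps edges whose indirect explanation is only marginally stronger than the direct evidence.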



We would like to acknowledge the support of the European Commission through the projects MAESTRA - Learning from Massive, Incompletely annotated, and Structured Data (Grant Number ICT-2013-612944) and TOREADOR - Trustworthy Model-aware Analytics Data Platform (Grant Number H2020-688797). We would also like to thank Lynn Rudd for her help in reading and correcting the manuscript.

Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, University of Bari Aldo Moro, Bari, Italy
  2. Big Data Laboratory, National Interuniversity Consortium for Informatics (CINI), Rome, Italy
  3. Jožef Stefan Institute, Ljubljana, Slovenia