A Feature Selection Approach for Evaluate the Inference of GRNs Through Biological Data Integration - A Case Study on A. Thaliana

  • Fábio F. R. Vicente
  • Euler Menezes
  • Gabriel Rubino
  • Juliana de Oliveira
  • Fabrício Martins Lopes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9423)


The inference of gene regulatory networks (GRNs) from expression profiles is a great challenge in bioinformatics due to the curse of dimensionality. For this reason, several methods that perform data integration have been developed to reduce the estimation error of the inference. However, how to use each type of available biological information has not been completely formulated. This work addresses this issue by proposing a feature selection approach that integrates biological data and evaluates three types of biological information regarding their effect on the similarity of inferred GRNs. The proposed feature selection method is based on the sequential forward floating selection (SFFS) search algorithm, with the mean conditional entropy (MCE) as the criterion function. As an additional contribution of this work, an expression dataset for A. thaliana was built, containing 22746 genes and 1206 experiments. The experimental results show an average improvement of 39% in the inferred GRNs compared with inference without biological data integration. Moreover, the results show that the improvement is associated with a specific type of biological information: the cellular localization. This is valuable information for the design of new experiments and an important insight for further investigation.
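The combination of an SFFS search with an MCE criterion can be illustrated with a minimal sketch. It assumes expression values that are already discretized and uses a plain plug-in entropy estimate with no penalty or correction term. It does not include the biological data integration step, and it is not the authors' implementation. The function names, the toy data, and the `max_size` parameter are hypothetical.

```python
from collections import Counter
from math import log2


def mean_conditional_entropy(samples, target, predictors):
    """Plug-in estimate of H(target | predictors) in bits.

    samples: list of rows, each a sequence of discrete values (one per gene).
    target: column index of the target gene.
    predictors: tuple of column indices of candidate predictor genes.
    """
    n = len(samples)
    joint = Counter()     # counts of (predictor state, target value)
    marginal = Counter()  # counts of predictor state
    for row in samples:
        state = tuple(row[i] for i in predictors)
        joint[(state, row[target])] += 1
        marginal[state] += 1
    h = 0.0
    for (state, _), count in joint.items():
        # Accumulate -p(x, y) * log2 p(y | x).
        h -= (count / n) * log2(count / marginal[state])
    return h


def sffs(samples, target, candidates, max_size):
    """Sequential forward floating selection that minimizes the MCE.

    Returns (best_subset, best_score); ties are broken toward smaller subsets.
    """
    def score(subset):
        return mean_conditional_entropy(samples, target, tuple(sorted(subset)))

    selected = []
    best = {0: ((), score([]))}  # best subset seen so far for each size
    while len(selected) < max_size:
        remaining = [g for g in candidates if g not in selected]
        if not remaining:
            break
        # Forward step: add the gene that yields the lowest criterion value.
        gene = min(remaining, key=lambda g: score(selected + [g]))
        selected = selected + [gene]
        current = score(selected)
        size = len(selected)
        if size not in best or current < best[size][1]:
            best[size] = (tuple(selected), current)
        # Floating step: drop genes while that strictly improves on the
        # best subset already recorded for the smaller size.
        while len(selected) > 2:
            drop = min(selected,
                       key=lambda g: score([x for x in selected if x != g]))
            reduced = [x for x in selected if x != drop]
            reduced_score = score(reduced)
            if reduced_score < best[len(reduced)][1]:
                selected = reduced
                best[len(reduced)] = (tuple(reduced), reduced_score)
            else:
                break
    return min(best.values(), key=lambda item: (item[1], len(item[0])))


# Toy data (hypothetical): columns 0-2 are predictor genes and column 3 is
# the target, defined here as gene 0 AND gene 1; gene 2 is irrelevant.
toy = [(a, b, c, a & b) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
subset, value = sffs(toy, target=3, candidates=[0, 1, 2], max_size=2)
print(subset, value)
```

On this toy table the search selects genes 0 and 1, the only pair that fully determines the target. On real expression data with few experiments, a plug-in estimate of this kind tends to favor large predictor sets, so a practical criterion would need some form of correction for unobserved predictor states.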


Gene regulatory networks, Feature selection, Data integration, Bioinformatics, Arabidopsis thaliana



This work was supported by FAPESP grant 2011/50761-2, UTFPR, CNPq, Fundação Araucária, CAPES and NAP eScience - PRP - USP.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Fábio F. R. Vicente (1, 2)
  • Euler Menezes (1)
  • Gabriel Rubino (1)
  • Juliana de Oliveira (3)
  • Fabrício Martins Lopes (1, corresponding author)
  1. Federal University of Technology, Cornélio Procópio, Brazil
  2. Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil
  3. Department of Biological Sciences, Faculty of Sciences and Letters of Assis - FCLA, University of São Paulo State - UNESP, Assis, São Paulo, Brazil
