Hub Gene Selection Methods for the Reconstruction of Transcription Networks

  • José Miguel Hernández-Lobato
  • Tjeerd M. H. Dijkstra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6321)


Transcription control networks have a scale-free topology: most genes participate in only a few links, while a small number of hubs, or key regulators, are connected to a large number of nodes. Several methods, such as ARACNE, have been developed to reconstruct these networks from gene expression data. However, few of them take the scale-free structure of transcription networks into account. In this paper, we focus on the hubs that commonly appear in scale-free networks. First, we propose three feature selection methods for identifying the genes that are likely to be hubs. Second, we introduce an improvement to ARACNE that allows it to take into account the list of hub genes generated by the feature selection methods. Experiments with synthetic gene expression data validate the accuracy of the feature selection methods in identifying hub genes. When ARACNE is combined with the output of these methods, we achieve up to a 62% improvement in performance over the original reconstruction algorithm. Finally, the best method for identifying hub genes is validated on a set of expression profiles from yeast.
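
For readers unfamiliar with ARACNE, the following is a minimal sketch of the pruning step used by the standard algorithm (the data processing inequality, applied to triplets of genes). It is not the hub-aware variant proposed in this paper, which the abstract does not specify. The function name `dpi_prune`, the parameters `threshold` and `tolerance`, the multiplicative form of the tolerance, and the toy mutual information values are all assumptions made for illustration.

```python
from itertools import combinations


def dpi_prune(mi, threshold=0.0, tolerance=0.0):
    """Sketch of ARACNE-style pruning (hypothetical helper, not the paper's code).

    mi is a symmetric list of lists of pairwise mutual information estimates.
    Returns the set of surviving edges as (i, j) pairs with i < j.
    """
    n = len(mi)
    # Candidate edges: gene pairs whose mutual information exceeds the threshold.
    edges = {(i, j) for i, j in combinations(range(n), 2) if mi[i][j] > threshold}
    removed = set()
    for i, j, k in combinations(range(n), 3):
        triplet = [(i, j), (i, k), (j, k)]
        # Only fully connected triplets are examined.
        if not all(edge in edges for edge in triplet):
            continue
        weakest = min(triplet, key=lambda e: mi[e[0]][e[1]])
        second = min(mi[a][b] for a, b in triplet if (a, b) != weakest)
        # Mark the weakest edge as indirect. Marked edges still take part in
        # other triplets, so the result does not depend on iteration order.
        if mi[weakest[0]][weakest[1]] < second * (1.0 - tolerance):
            removed.add(weakest)
    return edges - removed


# Toy example (made-up values): gene 0 acts as a hub regulating genes 1, 2 and 3.
toy_mi = [
    [0.0, 0.9, 0.8, 0.7],
    [0.9, 0.0, 0.4, 0.3],
    [0.8, 0.4, 0.0, 0.2],
    [0.7, 0.3, 0.2, 0.0],
]
kept = dpi_prune(toy_mi)
hub_degree = sum(1 for edge in kept if 0 in edge)
```

In this toy input the weaker links among genes 1, 2 and 3 are treated as indirect and removed, leaving only the three edges attached to gene 0. A hub-aware method would need some additional rule for using a list of candidate hubs during this step; that rule is not reproduced here.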


Keywords: Transcription network, ARACNE, Automatic relevance determination, Group Lasso, Maximum relevance minimum redundancy, Scale-free, Hub



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • José Miguel Hernández-Lobato (1)
  • Tjeerd M. H. Dijkstra (2)
  1. Computer Science Department, Universidad Autónoma de Madrid, Madrid, Spain
  2. Institute for Computing and Information Sciences, Radboud University Nijmegen, The Netherlands
