Dissecting Transcriptional Control Networks

  • Vijayalakshmi H. Nagaraj
  • Anirvan M. Sengupta


Reconstructing how transcriptional networks function requires determining which promoters are affected by which transcription factors. Searching a genome for functional regulatory sites bound by particular transcription factors is therefore of great importance. This chapter discusses efforts to build classifiers that separate promoters targeted by a particular transcription factor from those that are not. We start with simple sequence classifiers based on support vector machines and go on to discuss how to integrate different kinds of data into the analysis.

Key Words

Transcription regulatory elements; motifs; support vector machines; probabilistic models; DNA sequence evolution





Copyright information

© Humana Press Inc. 2007

Authors and Affiliations

  • Vijayalakshmi H. Nagaraj (1)
  • Anirvan M. Sengupta (1)

  1. BioMaPS Institute, Rutgers University, The State University of New Jersey, Piscataway, USA
