
The Identification of Cis-Regulatory Sequence Motifs in Gene Promoters Based on SNP Information

  • Protocol
Plant Synthetic Promoters

Part of the book series: Methods in Molecular Biology (MIMB, volume 1482)

Abstract

Conservation of particular molecular sequence motifs throughout evolution is a strong indicator of their functional relevance, as selective pressure has likely prevented the accumulation of mutations within them. Known as “phylogenetic footprinting”, this rationale has been exploited to identify novel functional motifs from sequence alignments of diverse species, in particular transcription factor binding site (TFBS) motifs in aligned promoter sequences of orthologous genes. With rapid advances in sequencing technologies, whole-genome sequence information is accumulating not only across different species but increasingly also for variants of the same species, which exhibit relatively little sequence variability, present primarily as single nucleotide polymorphisms (SNPs). Here, we lay out the basic strategy for identifying functional cis-regulatory motifs in gene promoter regions based on SNP information.
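The rationale above can be illustrated with a small sketch: if a motif is functional, the promoter positions it covers should carry fewer SNPs than promoter positions overall. The Python below is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes promoter sequences are given as strings keyed by a gene identifier, SNP positions are 0-based offsets within each promoter, motif occurrences are exact matches on either strand, and positions are independent, so the SNP count inside motif occurrences can be compared to the promoter-wide background rate with a one-sided binomial test. All function names (`revcomp`, `motif_positions`, `binom_cdf`, `snp_depletion`) are hypothetical.

```python
import math
from typing import Dict, Set, Tuple

# Hypothetical helpers: a sketch of SNP-depletion scoring, not a published tool.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return "".join(_COMPLEMENT[b] for b in reversed(seq.upper()))


def motif_positions(seq: str, motif: str) -> Set[int]:
    """0-based promoter positions covered by exact motif matches on either strand."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    targets = {motif, revcomp(motif)}  # a palindromic motif yields a single target
    covered: Set[int] = set()
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in targets:
            covered.update(range(i, i + k))  # the set avoids double-counting overlaps
    return covered


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def snp_depletion(promoters: Dict[str, str], snps: Dict[str, Set[int]],
                  motif: str) -> Tuple[int, int, float, float]:
    """Return (covered positions, SNPs in them, background SNP rate, p-value).

    The p-value is the one-sided binomial probability of observing this many
    SNPs or fewer in motif-covered positions, given the promoter-wide SNP rate
    and assuming positions are independent (a simplifying assumption).
    """
    total_len = sum(len(s) for s in promoters.values())
    if total_len == 0:
        raise ValueError("no promoter sequence supplied")
    total_snps = sum(len(snps.get(g, set())) for g in promoters)
    background_rate = total_snps / total_len
    covered = observed = 0
    for gene, seq in promoters.items():
        pos = motif_positions(seq, motif)
        covered += len(pos)
        observed += len(pos & snps.get(gene, set()))
    p_value = binom_cdf(observed, covered, background_rate)
    return covered, observed, background_rate, p_value


if __name__ == "__main__":
    # Toy data: two 14-bp "promoters", each with one palindromic CACGTG match.
    promoters = {"g1": "AAAACACGTGAAAA", "g2": "TTCACGTGTTTTTT"}
    snps = {"g1": {0, 1, 12}, "g2": {9, 13}}
    covered, observed, rate, p = snp_depletion(promoters, snps, "CACGTG")
    print(f"covered={covered} observed={observed} rate={rate:.3f} p={p:.3f}")
```

A small p-value means the motif's positions carry fewer SNPs than the background rate predicts, which under this model is consistent with purifying selection. When many candidate motifs are scanned, the resulting p-values would still need a multiple-testing correction (for example, Benjamini–Hochberg), and the independence assumption ignores local variation in SNP density along the promoter.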


Abbreviations

TFBS: Transcription factor binding site
TSS: Transcription start site
SNP: Single nucleotide polymorphism


Author information

Correspondence to Dirk Walther.


Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Korkuć, P., Walther, D. (2016). The Identification of Cis-Regulatory Sequence Motifs in Gene Promoters Based on SNP Information. In: Hehl, R. (ed) Plant Synthetic Promoters. Methods in Molecular Biology, vol 1482. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6396-6_3


  • DOI: https://doi.org/10.1007/978-1-4939-6396-6_3

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6394-2

  • Online ISBN: 978-1-4939-6396-6

  • eBook Packages: Springer Protocols
