Abstract
Conservation of particular molecular sequence motifs throughout evolution is a strong indicator of their functional relevance, as selective pressure has likely prevented the accumulation of mutations. Known as “phylogenetic footprinting”, this rationale has been exploited to identify novel functional motifs from sequence alignments of diverse species, in particular transcription factor binding site motifs in aligned promoter sequences of orthologous genes. With the rapid advances in sequencing technologies, whole-genome sequence information is accumulating not only across different species, but increasingly for variants of the same species that exhibit relatively little sequence variability, present primarily as single nucleotide polymorphisms (SNPs). Here, we lay out the basic strategy for identifying functional cis-regulatory motifs in gene promoter regions based on SNP information.
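The rationale above (functional motifs should carry fewer SNPs than their surroundings) can be illustrated with a minimal Python sketch. It is not the protocol's actual procedure: the function names, the fixed k-mer length, the toy promoter sequences, and the SNP positions are all assumptions made for illustration. The sketch computes, for every k-mer found in a set of promoters, the fraction of its covered bases that coincide with a SNP, and compares that against the overall SNP density of the promoters.

```python
from collections import defaultdict


def kmer_snp_density(promoters, snp_positions, k=6):
    """Return {kmer: (n_occurrences, snp_density)} over all promoters.

    promoters:     dict gene_id -> promoter sequence (reference allele)
    snp_positions: dict gene_id -> set of 0-based SNP positions in that sequence
    Both inputs are hypothetical structures chosen for this sketch.
    """
    covered = defaultdict(int)  # bases covered by all occurrences of a k-mer
    hits = defaultdict(int)     # SNPs falling inside those occurrences
    for gene, seq in promoters.items():
        snps = snp_positions.get(gene, set())
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            covered[kmer] += k
            hits[kmer] += sum(1 for p in range(i, i + k) if p in snps)
    return {m: (covered[m] // k, hits[m] / covered[m]) for m in covered}


def background_density(promoters, snp_positions):
    """Overall SNPs per base across all promoters (assumes SNPs lie within them)."""
    total = sum(len(s) for s in promoters.values())
    n_snps = sum(len(snp_positions.get(g, ())) for g in promoters)
    return n_snps / total if total else 0.0


# Toy data (invented): two promoters that share the hexamer CACGTG.
promoters = {"g1": "TTCACGTGAA", "g2": "GGCACGTGTT"}
snp_positions = {"g1": {0, 9}, "g2": {1}}

densities = kmer_snp_density(promoters, snp_positions, k=6)
background = background_density(promoters, snp_positions)

# k-mers whose SNP density lies below the background are candidate conserved motifs.
candidates = sorted(m for m, (_, d) in densities.items() if d < background)
```

In a real analysis, the comparison against the background would need a statistical test with multiple-testing correction, and k-mers with few occurrences would have to be treated with caution; the simple threshold used here only shows the direction of the argument.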
Abbreviations
- TFBS: Transcription factor binding site
- TSS: Transcription start site
- SNP: Single nucleotide polymorphism
Copyright information
© 2016 Springer Science+Business Media New York
About this protocol
Cite this protocol
Korkuć, P., Walther, D. (2016). The Identification of Cis-Regulatory Sequence Motifs in Gene Promoters Based on SNP Information. In: Hehl, R. (ed.) Plant Synthetic Promoters. Methods in Molecular Biology, vol 1482. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6396-6_3
DOI: https://doi.org/10.1007/978-1-4939-6396-6_3
Published:
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6394-2
Online ISBN: 978-1-4939-6396-6
eBook Packages: Springer Protocols