Molecular Genetics and Genomics, Volume 291, Issue 5, pp 1851–1869

Selection pressure on human STR loci and its relevance in repeat expansion disease

  • Makoto K. Shimada
  • Ryoko Sanbonmatsu
  • Yumi Yamaguchi-Kabata
  • Chisato Yamasaki
  • Yoshiyuki Suzuki
  • Ranajit Chakraborty
  • Takashi Gojobori
  • Tadashi Imanishi
Original Article


Short tandem repeats (STRs) comprise repeats of one to several base pairs. Because of the high mutability due to strand slippage during DNA synthesis, rapid evolutionary change in the number of repeating units directly shapes the range of repeat-number variation according to selection pressure. Two questions remain, however: why are STRs that cause repeat expansion diseases maintained in the human population, and why are these diseases limited to neurodegenerative diseases? By evaluating the genome-wide selection pressure on STRs using the database we constructed, we identified two different patterns of relationship in repeat-number polymorphisms between DNA and amino-acid sequences, although both patterns are evolutionary consequences of avoiding the formation of harmful long STRs. First, poly-proline (poly-P) repeats are encoded by a mixture of degenerate codons. Second, long poly-glutamine (poly-Q) repeats are favored at the protein level; however, at the DNA level, STRs encoding long poly-Qs are frequently divided by synonymous SNPs. Furthermore, the biological processes of apoptosis and neurodevelopment were significantly enriched specifically in genes encoding poly-Qs with repeat polymorphism. This suggests the existence of a specific molecular function for polymorphic and/or long poly-Q stretches. Given that the poly-Qs causing expansion diseases were longer than other poly-Qs, even in healthy subjects, our results indicate that the evolutionary benefits of long and/or polymorphic poly-Q stretches outweigh the risks of long CAG repeats predisposing to pathological hyper-expansions. Molecular pathways in neurodevelopment requiring long and polymorphic poly-Q stretches may provide a clue to understanding why poly-Q expansion diseases are limited to neurodegenerative diseases.
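The distinction drawn above between repeat length at the protein level and at the DNA level can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the two coding sequences are hypothetical examples, and the helper names (`codons`, `longest_run`, `repeat_summary`) are invented here and are not taken from the study's database or pipeline. The sketch translates an in-frame sequence with the standard genetic code and compares the longest run of one amino acid with the longest run of one identical codon.

```python
# Standard genetic code, built from the conventional TCAG-ordered table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(cds):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def longest_run(items):
    """Return (item, length) for the longest run of identical consecutive items."""
    best_item, best_len = None, 0
    run_item, run_len = None, 0
    for item in items:
        if item == run_item:
            run_len += 1
        else:
            run_item, run_len = item, 1
        if run_len > best_len:
            best_item, best_len = run_item, run_len
    return best_item, best_len


def repeat_summary(cds):
    """Compare the longest amino-acid repeat with the longest uninterrupted codon repeat."""
    cs = codons(cds.upper())
    amino_acids = [CODON_TO_AA[c] for c in cs]
    return {
        "amino_acid_run": longest_run(amino_acids),
        "codon_run": longest_run(cs),
    }


# Hypothetical poly-Q tract: eight glutamines, with one synonymous CAA
# splitting the CAG repeat at the DNA level.
poly_q = "CAG" * 4 + "CAA" + "CAG" * 3

# Hypothetical poly-P tract: six prolines encoded by a mixture of degenerate codons.
poly_p = "CCGCCACCTCCCCCGCCG"

print(repeat_summary(poly_q))
print(repeat_summary(poly_p))
```

In both toy sequences the amino-acid repeat is longer than any pure trinucleotide repeat that encodes it, which is the pattern the abstract describes. The sketch assumes unambiguous A/C/G/T bases and an already known reading frame; real transcript data would need frame and ambiguity handling.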


STR polymorphism · Single amino-acid repeat · Human evolution · Triplet-repeat expansion disease · Database for human polymorphism (VarySysDB)



  • Short tandem repeat
  • Simple amino acids repeat
  • Untranslated region
  • Coding sequence region
  • Coding trinucleotide short tandem repeat
  • The international nucleotide sequence databases collaboration
  • Human-gene diversity of life-style related diseases/gene diversity database system
  • Gene ontology
  • Annotation data set for All Human Genes version 2
  • H-invitational database
  • H-InvDB transcript
  • H-InvDB gene cluster defined by mapping of transcripts on genome sequence
  • Percentage of G or C at the third codon



We are grateful to Hidetoshi Inoko for support in using the H-GOLD/GDBS data; Yasuyuki Fujii, Katsuhiko Murakami, Yoshiharu Sato and Jun-ichi Takeda for providing gene structure and annotation data; Ryuzo Matsumoto and Yosuke Hayakawa for useful suggestions on computer programming; and other former members of the H-Invitational 2 consortium, the Genome Information Integration Project (GIIP), and the Integrated Database and Systems Biology Team of BIRC, AIST, for their helpful support. This research was financially supported by the Ministry of Economy, Trade and Industry of Japan (METI) and the Japan Biological Informatics Consortium (JBIC). This work was also partly supported by Grants-in-Aid for Scientific Research (C) to MKS (JSPS Grant Numbers 24510271 and 21510205) and by the Saito Gratitude Foundation to MKS.

Compliance with ethical standards

Conflict of interest

All authors declare no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

This article does not contain any studies with human participants.

Data availability

Updated data on STRs and SARs within known human transcriptome sequences will be continuously provided in the VarySysDB database. The original data on STRs and SARs in human exonic regions are available at the web site of the first author, MKS, and through a web-based data sharing system provided by the research map.

Supplementary material

438_2016_1219_MOESM1_ESM.xlsx (57 kb)
Supplementary material 1 (XLSX 56 kb)
438_2016_1219_MOESM2_ESM.pdf (299 kb)
Supplementary material 2 (PDF 299 kb)
438_2016_1219_MOESM3_ESM.docx (786 kb)
Supplementary material 3 (DOCX 785 kb)



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Makoto K. Shimada (1, 2, 3)
  • Ryoko Sanbonmatsu (3)
  • Yumi Yamaguchi-Kabata (2, 4)
  • Chisato Yamasaki (2, 3)
  • Yoshiyuki Suzuki (5)
  • Ranajit Chakraborty (6)
  • Takashi Gojobori (2, 7)
  • Tadashi Imanishi (2, 8)

  1. Institute for Comprehensive Medical Science, Fujita Health University, Toyoake, Japan
  2. National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
  3. Japan Biological Informatics Consortium, Tokyo, Japan
  4. Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
  5. Graduate School of Natural Sciences, Nagoya City University, Nagoya, Japan
  6. Health Science Center, University of North Texas, Fort Worth, USA
  7. Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
  8. Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Japan
