Correlated Selection on Amino Acid Deletion and Replacement in Mammalian Protein Sequences
A low ratio of nonsynonymous to synonymous substitution rates (dN/dS) at a codon is an indicator of functional constraint caused by purifying selection. Intuitively, the functional constraint would also be expected to prevent such a codon from being deleted. However, to the best of our knowledge, the correlation between the rates of deletion and substitution has never actually been estimated. Here, we use 8595 protein-coding region sequences from nine mammalian species to examine the relationship between deletion rate and dN/dS. We find significant positive correlations at the levels of both sites and genes. We compared our data against controls consisting of simulated coding sequences evolving along identical phylogenetic trees, where deletions occur independently of substitutions. A much weaker correlation was found in the corresponding simulated sequences, and it was probably caused by alignment errors. In the real data, the correlations cannot be explained by alignment errors. Separate investigations of nonsynonymous (dN) and synonymous (dS) substitution rates indicate that the correlation is most likely due to a similarity in patterns of selection rather than in mutation rates.
Keywords: Mammals, Protein-coding genes, dN/dS, Codon deletion, Purifying selection, Indifferent DNA
We used the Maxwell cluster from the Center of Advanced Computing and Data Systems (CACDS) at the University of Houston. CACDS staff provided technical support. We would like to thank Sarah Parks and her colleagues at EMBL-European Bioinformatics Institute for their help in running the SLR program on part of our data. R.B.R.A. was funded by NIH R01GM101352. We would also like to thank Jaanus Suurväli and Jan Gravemeyer at University of Cologne for their help in manuscript editing.