
The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment

Journal of Molecular Evolution

Abstract

The size distributions of deletions, insertions, and indels (i.e., insertions or deletions) were studied, using 78 human processed pseudogenes and other published data sets. The following results were obtained: (1) Deletions occur more frequently than do insertions in sequence evolution; none of the pseudogenes studied shows significantly more insertions than deletions. (2) Empirically, the size distributions of deletions, insertions, and indels can be described well by a power law, i.e., $f_k = C k^{-b}$, where $f_k$ is the frequency of deletion, insertion, or indel with gap length $k$, $b$ is the power parameter, and $C$ is the normalization factor. (3) The estimates of $b$ for deletions and insertions from the same data set are approximately equal to each other, indicating that the size distributions for deletions and insertions are approximately identical. (4) The variation in the estimates of $b$ among various data sets is small, indicating that the effect of local structure exists but only plays a secondary role in the size distribution of deletions and insertions. (5) The linear gap penalty, which is most commonly used in sequence alignment, is not supported by our analysis; rather, the power law for the size distribution of indels suggests that an appropriate gap penalty is $w_k = a + b \ln k$, where $a$ is the gap creation cost and $b \ln k$ is the gap extension cost. (6) The higher frequency of deletion over insertion suggests that the gap creation cost of insertion ($a_i$) should be larger than that of deletion ($a_d$); that is, $a_i - a_d = \ln R$, where $R$ is the frequency ratio of deletions to insertions.
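The relations stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the truncation of the power law at a maximum gap length `k_max`, and the numeric values for `a_d`, `b`, and `R` are all hypothetical choices made for illustration. The sketch rests on the observation that taking the negative logarithm of $f_k = C k^{-b}$ gives a term $b \ln k$ that grows with gap length, which is the form of the extension cost in $w_k = a + b \ln k$.

```python
import math


def power_law_frequencies(b, k_max):
    """Return normalized f_k = C * k**(-b) for k = 1..k_max.

    Truncating at k_max is an assumption of this sketch; C is chosen
    so that the frequencies sum to 1.
    """
    weights = [k ** (-b) for k in range(1, k_max + 1)]
    c = 1.0 / sum(weights)
    return [c * w for w in weights]


def log_gap_penalty(k, a, b):
    """Logarithmic gap penalty w_k = a + b * ln(k) for a gap of length k >= 1."""
    if k < 1:
        raise ValueError("gap length k must be >= 1")
    return a + b * math.log(k)


def insertion_creation_cost(a_d, ratio):
    """Gap creation cost for insertions, from a_i - a_d = ln(R).

    ratio is R, the frequency ratio of deletions to insertions.
    """
    if ratio <= 0:
        raise ValueError("frequency ratio R must be positive")
    return a_d + math.log(ratio)


# Hypothetical parameter values, chosen only to exercise the functions.
a_d, b, R = 2.0, 1.5, 3.0
a_i = insertion_creation_cost(a_d, R)
for k in (1, 2, 5, 10):
    print(k, log_gap_penalty(k, a_d, b), log_gap_penalty(k, a_i, b))
```

With $R > 1$ (more deletions than insertions), the insertion penalty exceeds the deletion penalty by the constant $\ln R$ at every gap length, while both grow only logarithmically in $k$, unlike a linear penalty.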



Author information


Correspondence to: W.-H. Li


About this article

Cite this article

Gu, X., Li, WH. The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment. J Mol Evol 40, 464–473 (1995). https://doi.org/10.1007/BF00164032


  • DOI: https://doi.org/10.1007/BF00164032
