A comprehensive examination of protein sequences for evidence of internal gene duplication

Summary

We have implemented a routine procedure for screening protein sequences for evidence of intragenic duplication. We tested 163 protein sequences representing 116 superfamilies of unrelated proteins. Twenty superfamilies contain proteins with internal gene duplications. The intragenic duplications detected fall into two major types. (1) One or more duplications of all or part of a gene produce a protein with two or more detectable regions of sequence homology. Sequences from 18 superfamilies contained this type of duplication. (2) Repeated reduplication of a small DNA segment can produce a protein that is repetitive over most of its length. Three superfamilies contain such repetitive sequences. We also investigated the limits of detection of ancient duplications using sequences derived by random mutation of a model sequence consisting of ten 10-residue repeats. The original repetitive nature of the sequence was usually still detected after 250 point mutations, even though the ancestral segment could not be accurately reconstructed.
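The model-sequence experiment described in the summary can be illustrated with a minimal Python sketch. It builds a 100-residue sequence from ten copies of a 10-residue segment, applies 250 random point mutations, and scores how often residues match at a given offset. Everything in it is an assumption made for illustration: the function names (`mutate`, `offset_identity`), the uniform choice of mutated position and replacement residue, and the simple identity score. The summary does not specify the mutation model or scoring method the authors used, so this crude identity score should not be expected to reproduce the detection sensitivity they report.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(sequence, n_mutations, rng):
    """Apply n_mutations point mutations at uniformly chosen positions.

    Assumption: every residue is equally mutable and every replacement
    is equally likely. Real proteins do not evolve this way.
    """
    residues = list(sequence)
    for _ in range(n_mutations):
        pos = rng.randrange(len(residues))
        # Always substitute a different residue, so each event is a real change.
        choices = [a for a in AMINO_ACIDS if a != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def offset_identity(sequence, offset):
    """Fraction of positions i where sequence[i] == sequence[i + offset]."""
    pairs = len(sequence) - offset
    matches = sum(1 for i in range(pairs) if sequence[i] == sequence[i + offset])
    return matches / pairs


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical ancestral segment: 10 random residues, repeated ten times.
unit = "".join(rng.choice(AMINO_ACIDS) for _ in range(10))
model = unit * 10

mutated = mutate(model, 250, rng)

# Compare the repeat period (10) with two unrelated offsets.
for offset in (7, 10, 13):
    print(offset, round(offset_identity(mutated, offset), 3))
```

Before any mutation the score at offset 10 is exactly 1.0, because every residue matches its counterpart in the next repeat. Comparing the score at the repeat period with scores at unrelated offsets gives a rough sense of how much periodic signal survives.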

Cite this article

Barker, W.C., Ketcham, L.K. & Dayhoff, M.O. A comprehensive examination of protein sequences for evidence of internal gene duplication. J Mol Evol 10, 265–281 (1978). https://doi.org/10.1007/BF01734217

Key words

  • Internal Gene Duplication
  • Periodic Proteins
  • Computer Examination of Protein Sequences
  • Ancestral Sequences
  • Evolutionary Mechanisms