Synonymous Codon Substitution Matrices

  • Adrian Schneider
  • Gaston H. Gonnet
  • Gina M. Cannarozzi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3992)


Observing differences between DNA or protein sequences and estimating the true number of substitutions from them is a prominent problem in molecular evolution, as many analyses are based on distance measures between biological sequences. Because the relationship between the observed and the actual number of mutations is very complex, more than four decades of research have been devoted to improving molecular distance measures. In this article we present a method called SynPAM, which estimates the amount of synonymous change between coding DNA sequences. The method is novel in that it is based on an empirical model of codon evolution and uses a maximum-likelihood formalism to measure synonymous change in terms of codon substitutions, while reducing the need for assumptions about DNA evolution to a minimum. We compared SynPAM with two established methods for measuring synonymous sequence divergence. Our results suggest that the new method not only shows less variance but also captures weaker phylogenetic signals than the other methods.
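The abstract describes estimating synonymous change by maximum likelihood under an empirical Markov model of codon substitution. The sketch below illustrates only that generic maximum-likelihood step and is not the SynPAM implementation. All of the following are hypothetical assumptions: the state space is the four synonymous alanine codons (GCT, GCC, GCA, GCG); the frequencies `PI` and exchangeabilities `EXCH` are made-up numbers, not the paper's empirical matrix; `exp(Qt)` is computed with a hand-rolled scaling-and-squaring Taylor series; and the distance `t` is found by golden-section search. The rate matrix is normalized to one expected substitution per codon per time unit, so `100 * t` is printed as a PAM-like value by analogy with Dayhoff's unit of one accepted mutation per 100 positions; whether SynPAM scales its distances exactly this way is also an assumption.

```python
import math

# Hypothetical state space: the four synonymous codons of alanine.
CODONS = ["GCT", "GCC", "GCA", "GCG"]

# Made-up equilibrium frequencies and exchangeabilities (NOT the empirical
# SynPAM values). Third-position transitions (T<->C, A<->G) get a higher
# exchangeability than transversions.
PI = [0.30, 0.30, 0.25, 0.15]
EXCH = {(0, 1): 3.0, (2, 3): 3.0, (0, 2): 1.0, (0, 3): 1.0, (1, 2): 1.0, (1, 3): 1.0}


def rate_matrix(pi, exch):
    """Reversible rate matrix Q[i][j] = s_ij * pi_j, normalized so that the
    expected substitution rate sum_i pi_i * (-Q[i][i]) equals 1."""
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for (i, j), s in exch.items():
        q[i][j] = s * pi[j]
        q[j][i] = s * pi[i]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    rate = -sum(pi[i] * q[i][i] for i in range(n))
    return [[x / rate for x in row] for row in q]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(q, t, squarings=10, terms=12):
    """P(t) = exp(Q t) via scaling and squaring with a truncated Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def log_likelihood(t, pairs, q, pi):
    """Log-likelihood of aligned codon pairs (i, j); for a reversible,
    stationary model the joint probability of a pair is pi_i * P_ij(t)."""
    p = expm(q, t)
    return sum(math.log(pi[i] * p[i][j]) for i, j in pairs)


def ml_distance(pairs, q, pi, lo=1e-6, hi=5.0, tol=1e-6):
    """Golden-section search for the t that maximizes the log-likelihood."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = log_likelihood(c, pairs, q, pi), log_likelihood(d, pairs, q, pi)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = log_likelihood(c, pairs, q, pi)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = log_likelihood(d, pairs, q, pi)
    return (a + b) / 2


if __name__ == "__main__":
    q = rate_matrix(PI, EXCH)
    # Made-up toy alignment: 80 unchanged and 20 synonymously changed codons.
    aligned = ([("GCT", "GCT")] * 40 + [("GCC", "GCC")] * 40
               + [("GCT", "GCC")] * 12 + [("GCA", "GCG")] * 8)
    pairs = [(CODONS.index(x), CODONS.index(y)) for x, y in aligned]
    t = ml_distance(pairs, q, PI)
    print(f"ML distance: {t:.4f} substitutions per codon ({100 * t:.1f} PAM-like units)")
```

With equal frequencies and equal exchangeabilities this toy model reduces to a four-state Jukes-Cantor-type model, whose ML distance has the closed form t = -(3/4) ln(1 - 4p/3) for a fraction p of differing codon pairs; that makes a convenient check on the numerical search. A real analysis would use an empirically estimated codon model rather than these toy numbers.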


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Adrian Schneider (1)
  • Gaston H. Gonnet (1)
  • Gina M. Cannarozzi (1)

  1. Computational Biology Research Group, Institute for Computational Science, ETH Zürich, Zürich, Switzerland
