Abstract
Observing differences between DNA or protein sequences and estimating the true amount of substitutions from them is a prominent problem in molecular evolution as many analyses are based on distance measures between biological sequences. Since the relationship between the observed and the actual amount of mutations is very complex, more than four decades of research have been spent to improve molecular distance measures. In this article we present a method called SynPAM which can be used to estimate the amount of synonymous change between sequences of coding DNA. The method is novel in that it is based on an empirical model of codon evolution and that it uses a maximum-likelihood formalism to measure synonymous change in terms of codon substitutions, while reducing the need for assumptions about DNA evolution to an absolute minimum. We compared the SynPAM method with two established methods for measuring synonymous sequence divergence. Our results suggest that this new method not only shows less variance, but is also able to capture weaker phylogenetic signals than the other methods.
This work was supported by the intramural research program of the National Institutes of Health, National Library of Medicine.
An erratum to this chapter is available at http://dx.doi.org/10.1007/11758525_148.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Miyata, T., Yasunaga, T.: Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its application. J. Mol. Evol. 16, 23–36 (1980)
Perler, F., Efstratiadis, A., Lomedico, P., Gilbert, W., Kolodner, R., Dodgson, J.: The evolution of genes: the chicken preproinsulin gene. Cell 20(2), 555–566 (1980)
Goldman, N., Yang, Z.: A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11(5), 725–736 (1994)
Yang, Z., Nielsen, R., Goldman, N., Pedersen, A.M.K.: Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155, 432–449 (2000)
Benner, S.A.: Interpretive proteomics– finding biological meaning in genome and proteome databases. Advances in Enzyme Regulation 43, 271–359 (2003)
Caraco, M.D.: Neutral Evolutionary Distance: A New Dating Tool and its Applications. PhD thesis, ETH Zürich, Zürich, Switzerland (2002)
Yang, Z.: Paml: A program package for phylogenetic analysis by maximum likelihood. CABIOS 13, 555–556 (1997)
Yang, Z., Nielsen, R.: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17(1), 32–43 (2000)
Schneider, A., Cannarozzi, G.M., Gonnet, G.H.: Empirical codon substitution matrix. BMC Bioinformatics 6(134) (2005)
Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C.: A model for evolutionary change in proteins. In: Dayhoff, M.O. (ed.) Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, vol. 5, pp. 345–352 (1978)
Cox, D., Miller, H.: The Theory of Stochastic Processes. Chapman and Hall, London (1965)
Gonnet, G.H., Hallett, M.T., Korostensky, C., Bernardin, L.: Darwin v. 2.0: An interpreted computer language for the biosciences. Bioinformatics 16(2), 101–103 (2000)
Nakamura, Y., Gojobori, T., Ikemura, T.: Codon usage tabulated from the international DNA sequence database. Nucleic Acids Res. 28, 292 (2000)
Hubbard, T., Andrews, D., Caccamo, M., Cameron, G., Chen, Y., Clamp, M., Clarke, L., Coates, G., Cox, T., Cunningham, F., Curwen, V., Cutts, T., Down, T., Durbin, R., Fernandez-Suarez, X.M., Gilbert, J., Hammond, M., Herrero, J., Hotz, H., Howe, K., Iyer, V., Jekosch, K., Kahari, A., Kasprzyk, A., Keefe, D., Keenan, S., Kokocinsci, F., London, D., Longden, I., McVicker, G., Melsopp, C., Meidl, P., Potter, S., Proctor, G., Rae, M., Rios, D., Schuster, M., Searle, S., Severin, J., Slater, G., Smedley, D., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Storey, R., Trevanion, S., Ureta-Vidal, A., Vogel, J., White, S., Woodwark, C., Birney, E.: Ensembl 2005. Nucleic Acids Res. 33(suppl.1), D447–D453 (2005)
Dessimoz, C., Cannarozzi, G., Gil, M., Margadant, D., Roth, A., Schneider, A., Gonnet, G.: OMA, a comprehensive, automated project for the identification of orthologs from complete genome data: Introduction and first achievements. In: McLysaght, A., Huson, D.H. (eds.) RECOMB 2005. LNCS (LNBI), vol. 3678, pp. 61–72. Springer, Heidelberg (2005)
Bielawski, J.P., Dunn, K.A., Yang, Z.: Rates of nucleotide substitution and mammalian nuclear gene evolution: Approximate and maximum-likelihood methods lead to different conclusions. Genetics 156, 1299–1308 (2000)
Dunn, K.A., Bielawski, J.P., Yang, Z.: Substitution rates in drosophila nuclear genes: Implications for translational selection. Genetics 157, 295–305 (2001)
Waterman, M.S., Smith, T.F., Beyer, W.A.: Some biological sequence metrics. Advances in Mathematics 20, 367–387 (1976)
Gotoh, O.: An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705–708 (1982)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schneider, A., Gonnet, G.H., Cannarozzi, G.M. (2006). Synonymous Codon Substitution Matrices. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science – ICCS 2006. ICCS 2006. Lecture Notes in Computer Science, vol 3992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11758525_86
Download citation
DOI: https://doi.org/10.1007/11758525_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34381-3
Online ISBN: 978-3-540-34382-0
eBook Packages: Computer ScienceComputer Science (R0)