Modeling sequence evolution with kernel methods

Computational Optimization and Applications

Abstract

We model the evolution of biological and linguistic sequences by comparing their statistical properties. The comparison uses efficiently computable kernel functions that take two sequences as input and return a measure of the statistical similarity between them. We show how such kernels make it possible to reconstruct the phylogenetic trees of primates from the mitochondrial DNA (mtDNA) of existing animals, and the phylogenetic tree of Indo-European and other languages from sample documents in existing languages.
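As an illustration of a kernel that takes two sequences and returns a similarity score, the sketch below implements a simple k-mer (spectrum) kernel. It is not necessarily the kernel used in the paper: the choice of a spectrum kernel, the function names, and the default k = 3 are assumptions made for this example only.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Inner product of the k-mer count vectors of s and t.

    Illustrative choice of kernel (an assumption, not taken from the paper).
    """
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    # Iterate over the smaller table; a missing key in a Counter counts as 0.
    if len(cs) > len(ct):
        cs, ct = ct, cs
    return sum(n * ct[kmer] for kmer, n in cs.items())


def normalized_kernel(s, t, k=3):
    """Cosine-normalized kernel, so that a sequence compared with itself gives 1."""
    denom = sqrt(spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k))
    return spectrum_kernel(s, t, k) / denom if denom else 0.0
```

The same functions work on DNA strings and on text, since both are treated as plain character sequences; normalization reduces the effect of differing sequence lengths.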

Kernel methods provide a convenient framework for many pattern analysis tasks, and recent advances have focused on efficient methods for sequence comparison and analysis. A large toolbox of kernel-based algorithms already exists for data analysis; in this paper we demonstrate the use of kernels in combination with standard phylogenetic reconstruction algorithms and visualization methods.
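Distance-based tree reconstruction algorithms expect a matrix of pairwise distances rather than similarities. One standard way to bridge the two, sketched below, is the distance induced by a kernel in its feature space, d(x, y) = sqrt(k(x, x) + k(y, y) - 2 k(x, y)). Whether the paper uses exactly this conversion is an assumption; the function name and the list-of-lists matrix layout are hypothetical.

```python
from math import sqrt


def kernel_to_distance(K):
    """Convert a symmetric kernel matrix K (list of lists) into a distance matrix.

    Uses the feature-space distance induced by the kernel:
        d(i, j) = sqrt(K[i][i] + K[j][j] - 2 * K[i][j])
    """
    n = len(K)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            squared = K[i][i] + K[j][j] - 2.0 * K[i][j]
            # Clamp tiny negative values caused by floating-point rounding.
            D[i][j] = sqrt(max(squared, 0.0))
    return D
```

The resulting matrix is symmetric with a zero diagonal, so it could be passed to a distance-based method such as neighbor joining or to multidimensional scaling for visualization.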

Author information

Corresponding author

Correspondence to Nello Cristianini.

About this article

Cite this article

Bresco, M., Turchi, M., De Bie, T. et al. Modeling sequence evolution with kernel methods. Comput Optim Appl 38, 281–298 (2007). https://doi.org/10.1007/s10589-007-9045-9

  • DOI: https://doi.org/10.1007/s10589-007-9045-9
