Pacemaker Partition Identification

  • Sagi Snir
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8701)


The universally observed conservation of the distribution of evolutionary rates across the complete sets of orthologous genes in pairs of related genomes can be explained by the Universal Pacemaker (UPM) model of genome evolution. Under UPM, the relative evolutionary rates of all genes remain nearly constant, whereas the absolute rates can change arbitrarily. It was shown on several groups of taxa spanning the entire tree of life that the UPM model describes the evolutionary process better than the traditional molecular clock model [26][25]. Here we extend this analysis and ask: how many pacemakers are there, and which genes are affected by which pacemaker? The answer to this question induces a partition of the gene set such that all the genes in one part are affected by the same pacemaker. The input to the problem comes with an arbitrary amount of statistical noise, which makes the problem even harder to solve. In this work we devise a novel heuristic procedure, relying on statistical and geometrical tools, to solve the pacemaker partition identification problem, and demonstrate by simulation that this approach copes satisfactorily with considerable noise and realistic problem sizes. We applied this procedure to a set of over 2000 genes in 100 prokaryotes and demonstrated significant support for the existence of two pacemakers.
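To make the notion of a pacemaker partition concrete, the following is a minimal, noise-free toy sketch and not the heuristic procedure of the paper. It assumes, purely for illustration, that the length of gene g on tree edge e is the product of a gene-specific relative rate, a pacemaker-specific pace on that edge, and the edge's time span. All names (`simulate_edge_lengths`, `log_ratio_spread`) and distribution parameters are hypothetical. Under this assumption, two genes driven by the same pacemaker have a constant log-ratio of edge lengths across all edges, while genes driven by different pacemakers do not.

```python
import math
import random


def simulate_edge_lengths(n_genes, n_edges, n_pacemakers, seed=0):
    """Toy model (assumption): length[g][e] = rate[g] * pace[pm(g)][e] * time[e]."""
    rng = random.Random(seed)
    # Time span of each tree edge (arbitrary units).
    times = [rng.uniform(0.5, 2.0) for _ in range(n_edges)]
    # Relative rate of each gene, constant over the whole tree.
    rates = [rng.lognormvariate(0.0, 1.0) for _ in range(n_genes)]
    # Gene g is assigned to pacemaker g mod n_pacemakers.
    assignment = [g % n_pacemakers for g in range(n_genes)]
    # Each pacemaker speeds up or slows down arbitrarily on every edge.
    paces = [[rng.lognormvariate(0.0, 0.5) for _ in range(n_edges)]
             for _ in range(n_pacemakers)]
    lengths = [[rates[g] * paces[assignment[g]][e] * times[e]
                for e in range(n_edges)]
               for g in range(n_genes)]
    return lengths, assignment


def log_ratio_spread(row_a, row_b):
    """Variance across edges of log(a_e / b_e) for two genes."""
    logs = [math.log(a / b) for a, b in zip(row_a, row_b)]
    mean = sum(logs) / len(logs)
    return sum((x - mean) ** 2 for x in logs) / len(logs)


lengths, assignment = simulate_edge_lengths(n_genes=6, n_edges=20, n_pacemakers=2)
same = log_ratio_spread(lengths[0], lengths[2])  # genes 0 and 2 share pacemaker 0
diff = log_ratio_spread(lengths[0], lengths[1])  # genes 0 and 1 have different pacemakers
print(f"same pacemaker spread:      {same:.3e}")
print(f"different pacemaker spread: {diff:.3e}")
```

In this idealized setting the spread is zero (up to floating-point error) exactly for gene pairs that share a pacemaker, so the partition could be read off directly. With statistical noise the spreads of same-pacemaker pairs are no longer zero, which is why the partition has to be inferred with statistical tools.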


Keywords: Molecular Evolution, Genome Evolution, Pacemaker, Deming regression, Partition Distance, Gap Statistics




References

  1. Adcock, R.J.: A problem in least squares. Annals of Mathematics 5, 53–54 (1878)
  2. Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows. Prentice-Hall, Englewood Cliffs (1993)
  3. Bar-Yehuda, R.: One for the price of two: A unified approach for approximating covering problems. Algorithmica 27, 131–144 (2000)
  4. Borg, I., Groenen, P.: Modern multidimensional scaling, theory and applications. Springer, New York (1997)
  5. Bromham, L.: Why do species vary in their rate of molecular evolution? Biology Letters 5(3), 401–404 (2009)
  6. Deming, W.E.: Statistical adjustment of data. J. Wiley & Sons (1943)
  7. Doolittle, W.F.: Phylogenetic classification and the universal tree. Science 284(5423), 2124–2129 (1999)
  8. Drummond, D.A., Wilke, C.O.: Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134(2), 341–352 (2008)
  9. Finden, C.R., Gordon, A.D.: Obtaining common pruned trees. Journal of Classification 2, 225–276 (1985)
  10. Fuller, W.A.: Measurement error models. John Wiley & Sons, Chichester (1987)
  11. Gogarten, J.P., Doolittle, W.F., Lawrence, J.G.: Prokaryotic evolution in light of gene transfer. Mol. Biol. Evol. 19, 2226–2238 (2002)
  12. Grishin, N.V., Wolf, Y.I., Koonin, E.V.: From complete genomes to measures of substitution rate variability within and between proteins. Genome Research 10(7), 991–1000 (2000), doi:10.1101/gr.10.7.991
  13. Guérin, R., Orda, A.: Computing shortest paths for any number of hops. IEEE/ACM Trans. Netw. 10(5), 613–620 (2002)
  14. Gusfield, D.: Partition-distance: A problem and class of perfect graphs arising in clustering. Information Processing Letters 82(3), 159 (2002)
  15. Hartigan, J.A., Wong, M.A.: A k-means clustering algorithm. Applied Statistics 28, 100–108 (1979)
  16. Kruskal, J.B.: Nonmetric multidimensional scaling: a numerical method. Psychometrika 29, 115–130 (1964)
  17. Kruskal, J.B., Wish, M.: Multidimensional Scaling. Sage Publications (1978)
  18. Lawler, E.L.: Combinatorial optimization: networks and matroids. The University of Michigan (1976)
  19. Lloyd, S.P.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 129–137 (1982)
  20. Mardia, K.V.: Some properties of classical multidimensional scaling. Communications on Statistics – Theory and Methods A7 (1978)
  21. Moran, S., Snir, S.: Efficient approximation of convex recolorings. J. Comput. Syst. Sci. 73(7), 1078–1089 (2007)
  22. Moran, S., Snir, S.: Convex recolorings of strings and trees: Definitions, hardness results and algorithms. J. Comput. Syst. Sci. 74(5), 850–869 (2008)
  23. Preis, R.: Linear time \(\frac{1}{2}\)-approximation algorithm for maximum weighted matching in general graphs. In: Meinel, C., Tison, S. (eds.) STACS 1999. LNCS, vol. 1563, p. 259. Springer, Heidelberg (1999)
  24. Puigbo, P., Wolf, Y., Koonin, E.: Search for a ’tree of life’ in the thicket of the phylogenetic forest. Journal of Biology 8(6), 59 (2009)
  25. Snir, S., Wolf, Y.I., Koonin, E.V.: Universal pacemaker of genome evolution in animals and fungi and variation of evolutionary rates in diverse organisms. In: Genome Biology and Evolution (2014)
  26. Snir, S., Wolf, Y.I., Koonin, E.V.: Universal pacemaker of genome evolution. PLoS Comput Biol. 8, e1002785 (2012)
  27. Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63(2), 411–423 (2001)
  28. Tutte, W.T.: Connectivity in graphs. Mathematical expositions. University of Toronto Press (1966)
  29. Wasserman, L.: All of Statistics, ch. 4. Springer, New York (2004)
  30. Wolf, Y.I., Snir, S., Koonin, E.V.: Stability along with extreme variability in core genome evolution. Genome Biology and Evolution 5(7), 1393–1402 (2013)
  31. Wolf, Y.I., Novichkov, P.S., Karev, G.P., Koonin, E.V., Lipman, D.J.: The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proceedings of the National Academy of Sciences 106(18), 7273–7280 (2009)
  32. Zuckerkandl, E.: On the molecular evolutionary clock. Journal of Mol. Evol. 26(1), 34–46 (1987)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Sagi Snir (1)
  1. Dept. of Evolutionary Biology, University of Haifa, Haifa, Israel
