Novel Phylogenetic Network Inference by Combining Maximum Likelihood and Hidden Markov Models
Horizontal Gene Transfer (HGT) is the transfer of genetic material from one lineage in the evolutionary tree to a different lineage. HGT plays a major role in bacterial genome diversification and is a significant mechanism by which bacteria develop resistance to antibiotics. Although complete HGT is the prevailing assumption, cases of partial HGT (also called chimeric HGT), in which only part of a gene is horizontally transferred, have also been reported, albeit less frequently.
In this work we propose a new probabilistic model for analyzing and modeling phylogenetic networks, the NET-HMM. This new model captures the biologically realistic assumption that neighboring sites of DNA or amino acid sequences are not independent, which increases the accuracy of the inference. The model describes the phylogenetic network as a Hidden Markov Model (HMM), where each hidden state is related to one of the network’s trees. One of the advantages of the NET-HMM is its ability to infer partial HGT as well as complete HGT. We describe the properties of the NET-HMM, devise efficient algorithms for solving a set of problems related to it, and implement them in software. We also provide a novel complementary significance test for evaluating the fit of a model (NET-HMM) to a given data set.
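To make the "hidden state per tree" idea concrete, the following is a minimal sketch of the standard HMM forward recursion over alignment columns. It is not taken from the paper: the function name, the input layout, and the assumption that each state's emission probability is the likelihood of a column under the corresponding tree are all illustrative.

```python
import math

def forward_log_likelihood(site_likelihoods, initial, transition):
    """Log-likelihood of an alignment under a tree-switching HMM (illustrative sketch).

    site_likelihoods[i][k]: assumed likelihood of column i under tree k
    initial[k]:             assumed probability of starting in tree k
    transition[j][k]:       assumed probability of moving from tree j to tree k
    """
    n_states = len(initial)
    # alpha[k] is proportional to P(columns seen so far, current tree = k)
    alpha = [initial[k] * site_likelihoods[0][k] for k in range(n_states)]
    log_lik = 0.0
    for i in range(len(site_likelihoods)):
        if i > 0:
            alpha = [
                site_likelihoods[i][k]
                * sum(alpha[j] * transition[j][k] for j in range(n_states))
                for k in range(n_states)
            ]
        # Rescale at every column to avoid underflow on long alignments
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik
```

Because adjacent columns are coupled through the transition matrix, a column's contribution depends on which tree its neighbors were likely generated by, which is the dependence between neighboring sites described above.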
Using NET-HMM we are able to answer interesting biological questions, such as inferring the length of partial HGTs and the affected nucleotides in the genomic sequences, as well as inferring the exact location of HGT events along the tree branches. These advantages are demonstrated through the analysis of synthetic inputs and two different biological inputs.
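One generic way to read off the length of a partial HGT and the affected sites from such a model is Viterbi decoding followed by collapsing the state path into runs. The sketch below shows that standard technique under assumed inputs; it is not the authors' implementation, and `viterbi_path`, `segments`, and the per-column likelihood table are hypothetical names introduced for illustration.

```python
import math

def _log(x):
    # Log that tolerates zero probabilities
    return math.log(x) if x > 0 else float("-inf")

def viterbi_path(site_likelihoods, initial, transition):
    """Most probable tree index per column (standard Viterbi, illustrative inputs)."""
    n_states = len(initial)
    score = [_log(initial[k]) + _log(site_likelihoods[0][k]) for k in range(n_states)]
    back = []  # back[i-1][k] = best predecessor of state k at column i
    for i in range(1, len(site_likelihoods)):
        new_score, pointers = [], []
        for k in range(n_states):
            best_j = max(range(n_states),
                         key=lambda j: score[j] + _log(transition[j][k]))
            pointers.append(best_j)
            new_score.append(score[best_j] + _log(transition[best_j][k])
                             + _log(site_likelihoods[i][k]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def segments(path):
    """Collapse a state path into (state, start, end) runs, end inclusive."""
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((path[start], start, i - 1))
            start = i
    return runs
```

Under this reading, a run assigned to a tree that contains the transfer edge would mark the transferred segment, and its start and end indices give the affected columns and the segment length.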
Keywords: Hidden Markov Model, Horizontal Gene Transfer, Edge Length, Segment Length, Horizontal Transfer