Abstract
Comparative sequence analysis is widely used to reconstruct phylogenies and to understand the evolutionary history of gene families. Here, we describe methods for reconstructing the phylogenetic and evolutionary history of a gene family across genomes, with a focus on the ARGONAUTE (AGO) family of proteins in plants. The approach can readily be adapted to study the molecular evolution of a wide variety of gene families. We list the methods and parameters for collecting molecular data (nucleic acids and peptides), preparing datasets, selecting evolutionary models, and performing phylogenetic and evolutionary analyses, such as maximum likelihood and Bayesian inference.
Acknowledgments
Financial assistance from the MPG India partner program of the Max Planck Society and the Department of Science and Technology, India, the WHEAT Competitive Grants Initiative, CIMMYT and the CGIAR (A4031.09.10), and core funding from IISER-Kolkata are gratefully acknowledged.
Copyright information
© 2017 Springer Science+Business Media LLC
About this protocol
Cite this protocol
Singh, R.K., Pandey, S.P. (2017). Phylogenetic and Evolutionary Analysis of Plant ARGONAUTES. In: Carbonell, A. (ed.) Plant Argonaute Proteins. Methods in Molecular Biology, vol 1640. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7165-7_20
DOI: https://doi.org/10.1007/978-1-4939-7165-7_20
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7164-0
Online ISBN: 978-1-4939-7165-7
eBook Packages: Springer Protocols