Tools and Methods in the Analysis of Simple Sequences

  • Yogesh Kumar
  • Om Prakash
  • Priyanka Kumari


Comparing and analyzing large-scale nucleotide and protein sequences has long been a challenging task for molecular biologists. However, the development of new statistical methods and computational programs has made it considerably easier for the scientific community to analyze and interpret the features, function, structure, and evolution of biological sequence data. In this context, this chapter presents sequence alignment approaches, including pairwise alignment and multiple sequence alignment (MSA), as well as phylogenetic tree construction. It gives an overview of bioinformatics tools and algorithms, illustrated with basic examples, and covers the essential topics of sequence analysis so that readers can understand them and apply them in their routine work.


Biological sequence; Sequence alignment; MSA; Phylogenetic tree



We are thankful to CSIR-CIMAP, Lucknow, India, and to Dr. Feroz Khan for his support. The authors express special thanks to scientists and professors worldwide for their contributions to research. The authors are also thankful to the community of young research scholars for dedicating their lives to research, and to all the parents who supported them with patience.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Yogesh Kumar (1)
  • Om Prakash (1)
  • Priyanka Kumari (2)

  1. Department of Metabolic & Structural Biology, CSIR-Central Institute of Medicinal and Aromatic Plants, Lucknow, India
  2. Department of Plant Biotechnology, CSIR-Central Institute of Medicinal and Aromatic Plants, Lucknow, India
