Biological Sequence Search and Analysis

  • Chapter
Bioinformatics: A Concept-Based Introduction

Abstract

Protein and genomic sequence analyses help in understanding the structure, function, and organization of cellular systems. Important tasks in gene analysis include identifying promoter regions, protein-coding regions, and intron-exon boundaries. Protein sequence analysis involves identifying functional motifs and patterns. Sequence search tools identify similar sequences in protein and genomic databases. Here, we discuss bioinformatics tools that support biological sequence searches and analyses.



Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Mathura, V.S. (2009). Biological Sequence Search and Analysis. In: Mathura, V.S., Kangueane, P. (eds) Bioinformatics: A Concept-Based Introduction. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-84870-9_5
