Biological Sequence Search and Analysis

Mathura, Venkatarajan S.

doi:10.1007/978-0-387-84870-9_5

Abstract

Protein and genomic sequence analyses help in understanding the structure, function, and organization of cellular systems. Important tasks in gene analysis include identifying promoter regions, protein-coding regions, and intron-exon boundaries. Protein sequence analysis involves identifying functional motifs and patterns. Sequence search tools help identify similar sequences in protein and genomic databases. Here, we discuss bioinformatics tools that support biological sequence searches and analyses.

Author information

Authors and Affiliations

Roskamp Institute, 2040 Whitfield Avenue, Sarasota, Florida 34243, USA
Venkatarajan S. Mathura

Editor information

Editors and Affiliations

Roskamp Institute, 2040 Whitfield Avenue, Sarasota, FL 34243
Venkatarajan S. Mathura
Biomed-Informatics, 17A Main Road, Irulan Chandai Annex, Pondicherry 607 402, India
Pandjassarame Kangueane

About this chapter

Cite this chapter

Mathura, V.S. (2009). Biological Sequence Search and Analysis. In: Mathura, V.S., Kangueane, P. (eds) Bioinformatics: A Concept-Based Introduction. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-84870-9_5

DOI: https://doi.org/10.1007/978-0-387-84870-9_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-84869-3
Online ISBN: 978-0-387-84870-9
eBook Packages: Biomedical and Life Sciences; Biomedical and Life Sciences (R0)
