Database Similarity Searches

Plewniak, Frédéric

doi:10.1007/978-1-59745-398-1_24

Part of the book series: Methods in Molecular Biology ((MIMB,volume 484))

Abstract

With genome sequencing projects producing huge amounts of sequence data, database sequence similarity search has become a central tool in bioinformatics to identify potentially homologous sequences. It is thus widely used as an initial step for sequence characterization and annotation, phylogeny, genomics, transcriptomics, and proteomics studies. Database similarity search is based upon sequence alignment methods also used in pairwise sequence comparison. Sequence alignment can be global (whole sequence alignment) or local (partial sequence alignment) and there are algorithms to find the optimal alignment given particular comparison criteria. However, as database searches require the comparison of the query sequence with every single sequence in the database, heuristic algorithms have been designed to reduce the time required to build an alignment that has a reasonable chance to be the best one. Such algorithms have been implemented as fast and efficient programs (Blast, FastA) available in different types to address different kinds of problems. After searching the appropriate database, similarity search programs produce a list of similar sequences and local alignments. These results should be carefully examined before coming to any conclusion, as many traps await the similarity seeker: paralogues, multidomain proteins, pseudogenes, etc. This chapter presents points that should always be kept in mind when performing database similarity searches for various goals. It ends with a practical example of sequence characterization from a single protein database search using Blast.
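
The distinction between global and local alignment drawn in the abstract can be illustrated with a minimal sketch. It is not taken from the chapter: it assumes a simple match/mismatch score and a linear gap penalty (the values 1, -1, and -2 are arbitrary choices for illustration, not a published substitution matrix), and it computes only the optimal score, in the style of the Needleman-Wunsch (global) and Smith-Waterman (local) dynamic programming algorithms. The function names `global_score` and `local_score` are hypothetical.

```python
def _pair(x, y, match, mismatch):
    """Score for aligning residue x with residue y (illustrative scheme)."""
    return match if x == y else mismatch


def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score: both sequences are aligned end to end."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + _pair(a[i - 1], b[j - 1], match, mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal local alignment score: the best-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + _pair(a[i - 1], b[j - 1], match, mismatch)
            # Flooring at 0 lets an alignment restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # A shared core ("ACGT") flanked by unrelated residues.
    query, subject = "TTTACGTTTT", "GGGACGTGGG"
    print("global:", global_score(query, subject))
    print("local: ", local_score(query, subject))
```

In this toy case the local score reflects only the shared core, while the global score is pulled down by the unrelated flanks. Both functions fill a table whose size is the product of the two sequence lengths, which is why exhaustive optimal alignment against every database entry is costly and why heuristic programs are used for database searches.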

Author information

Authors and Affiliations

Plate-forme Bio-informatique de Strasbourg, Institut de Génétique et de Biologie Moléculaire et Cellulaire, UMR 7104-CNRS-Inserm-ULP, Illkirch, France
Frédéric Plewniak

Editor information

Editors and Affiliations

Laboratoire de Bioinformatique et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch, France
Julie D. Thompson
Department of Protein Science Helmholtz Zentrum München, German Research Center for Environmental Health, Munich-Neuherberg, Germany
Marius Ueffing
LSMBO, ECPM, Institut Pluridisciplinaire Hubert Curien, Strasbourg, France
Christine Schaeffer-Reiss

About this protocol

Cite this protocol

Plewniak, F. (2008). Database Similarity Searches. In: Thompson, J.D., Ueffing, M., Schaeffer-Reiss, C. (eds) Functional Proteomics. Methods in Molecular Biology, vol 484. Humana Press. https://doi.org/10.1007/978-1-59745-398-1_24

DOI: https://doi.org/10.1007/978-1-59745-398-1_24
Publisher Name: Humana Press
Print ISBN: 978-1-58829-971-0
Online ISBN: 978-1-59745-398-1
eBook Packages: Springer Protocols
