An Optimized Distance Function for Comparison of Protein Binding Sites

  • Gábor Iván
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4643)


An important application of string processing algorithms is the comparison of protein or nucleotide sequences. In this paper we present an algorithm that determines the dissimilarity (distance) of protein sequences originating from protein binding sites found in the RS-PDB database, a repaired and cleaned version of the publicly available Protein Data Bank (PDB). The special way these protein sequences are constructed enabled us to optimize the algorithm, achieving runtimes several times faster than the unoptimized approach. One possible use of the proposed algorithm is searching for conserved sequences in protein chains.
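To make the notion of a sequence distance concrete, the following is a minimal sketch of a generic unit-cost edit (Levenshtein) distance between two amino-acid strings. It is not the optimized distance function of the paper, whose details are not given in the abstract. The function name `edit_distance` and the unit costs for insertion, deletion and substitution are assumptions chosen for illustration only; a practical protein comparison would more likely use substitution-matrix scores.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between two sequences (illustrative baseline).

    Assumption: every insertion, deletion and substitution costs 1.
    This is a textbook dynamic program, not the paper's optimized method.
    """
    # Keep the shorter sequence in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] = distance between the first i-1 symbols of a
    # and the first j symbols of b.
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current.append(min(
                previous[j] + 1,         # delete symbol_a
                current[j - 1] + 1,      # insert symbol_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Two hypothetical four-residue fragments in one-letter amino-acid code.
    print(edit_distance("GAVL", "GAIL"))
```

Only two rows of the dynamic-programming table are kept at a time, so memory use grows with the length of the shorter sequence while running time remains proportional to the product of the two lengths.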


Keywords: Protein Data Bank; Protein Binding Site; Substitution Matrix; Protein Data Bank Entry; Structure Protein Data Bank
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Gábor Iván (1)
  1. Department of Computer Science, Eötvös University, Pázmány Péter stny. 1/C, H-1117 Budapest, Hungary
