An Efficient Nearest Neighbor Method for Protein Contact Prediction

  • Gualberto Asencio-Cortés
  • Jesús S. Aguilar-Ruiz
  • Alfonso E. Márquez-Chamorro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9121)


A variety of approaches for protein inter-residue contact prediction have been developed in recent years, yet the problem remains far from solved. In this article, we present an efficient nearest neighbor (NN) approach, called PKK-PCP, and apply it to protein inter-residue contact prediction. The main strength of this approach is its adaptability to that problem. Furthermore, our method is considerably more efficient than other NN approaches. It combines parallel execution with a k-d tree as the search algorithm. The input data are based on structural features and physico-chemical properties of amino acids, in addition to evolutionary information. The results show better efficiency, in terms of time and memory consumption, than other similar approaches.
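To illustrate the general idea of combining a k-d tree search with parallel query execution, the following is a minimal sketch and not the PKK-PCP implementation itself. The names `build_kdtree`, `knn`, `predict_contact` and `predict_all`, the two-dimensional toy feature vectors, the majority vote and the choice of k = 3 are all assumptions made for illustration; the actual features, distance measure and parallelization strategy of PKK-PCP are not specified in this abstract.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree from (feature_vector, label) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])            # cycle through the dimensions
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2                      # the median splits the space
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def knn(node, query, k, heap=None):
    """Collect the k nearest points as a heap of (-squared_distance, label)."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    vec, label = node["point"]
    dist = sum((a - b) ** 2 for a, b in zip(vec, query))
    if len(heap) < k:
        heapq.heappush(heap, (-dist, label))
    elif dist < -heap[0][0]:
        heapq.heapreplace(heap, (-dist, label))  # replace the worst neighbor
    diff = query[node["axis"]] - vec[node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    knn(near, query, k, heap)
    # Visit the far branch only if the splitting plane is closer than
    # the current worst neighbor (or fewer than k neighbors were found).
    if len(heap) < k or diff ** 2 < -heap[0][0]:
        knn(far, query, k, heap)
    return heap


def predict_contact(tree, query, k=3):
    """Majority vote over the k nearest neighbors (1 = contact, 0 = no contact)."""
    neighbors = knn(tree, query, k)
    votes = sum(label for _, label in neighbors)
    return 1 if 2 * votes > len(neighbors) else 0


def predict_all(tree, queries, k=3, workers=4):
    """Answer independent queries concurrently; the tree is only read."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda q: predict_contact(tree, q, k), queries))


# Toy training set (hypothetical values): each vector stands for the
# features of one residue pair, each label for contact (1) or no contact (0).
training = [
    ((0.10, 0.20), 1), ((0.20, 0.10), 1), ((0.15, 0.25), 1),
    ((0.90, 0.80), 0), ((0.80, 0.90), 0), ((0.85, 0.75), 0),
]
tree = build_kdtree(training)
print(predict_all(tree, [(0.12, 0.18), (0.88, 0.82)]))
```

Because the tree is built once and only read afterwards, queries are independent and can be distributed across workers without locking. In CPython, threads mainly illustrate the structure; a process pool or a compiled implementation would be needed for real speed-ups on CPU-bound searches.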


Keywords: k-nearest neighbor; k-d tree; Protein inter-residue contact prediction



This research was supported by the Spanish MEC under project TIN2011-28956-C02-01.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Gualberto Asencio-Cortés (1)
  • Jesús S. Aguilar-Ruiz (1)
  • Alfonso E. Márquez-Chamorro (1)
  1. School of Engineering, Pablo de Olavide University, Sevilla, Spain
