Prediction of Hot Spots Based on Physicochemical Features and Relative Accessible Surface Area of Amino Acid Sequence

  • ShanShan Hu
  • Peng Chen
  • Jun Zhang
  • Bing Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9771)


Hot spots are key to understanding the mechanism of protein-protein interactions and can serve as targets for drug design. Because experimental methods are costly and time-consuming, computational methods that use sequence or structure information are widely applied to hot spot prediction. Here, we propose a new sequence-based model that combines physicochemical features with the relative accessible surface area of the amino acid sequence. The model consists of 83 classifiers built with the IBk algorithm, where the instances for each classifier are encoded by a corresponding property selected from the 544 properties in the AAindex1 database. The top-performing classifiers, ranked by F1 score, are then combined into an ensemble by majority voting. The model outperforms other state-of-the-art computational methods, yielding an F1 score of 0.80 on the BID test set.
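
The ensemble described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. IBk is Weka's instance-based k-nearest-neighbour learner, so a plain k-NN vote stands in for it here. The property scales produced by the hypothetical `placeholder_scale` are random stand-ins for AAindex1 entries, and the window size (`half_window`), `k`, Euclidean distance, `top_n`, and the toy sequence and labels are all illustrative choices rather than values from the paper.

```python
# Minimal sketch (not the authors' code): one k-NN (IBk-style) classifier per
# property scale, top classifiers by F1 kept, predictions merged by majority vote.
# Assumptions: placeholder scales instead of real AAindex1 values, a symmetric
# sequence window, Euclidean distance, k=3 and top_n=3; all are hypothetical.
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def placeholder_scale(seed):
    """Hypothetical stand-in for one AAindex1 property: one value per residue."""
    rng = random.Random(seed)
    return {aa: rng.uniform(-1.0, 1.0) for aa in AMINO_ACIDS}


def encode(seq, pos, rsa, scale, half_window=2):
    """Scale values in a window around `pos` (0.0 past the ends) plus its RSA."""
    feats = [scale[seq[i]] if 0 <= i < len(seq) else 0.0
             for i in range(pos - half_window, pos + half_window + 1)]
    feats.append(rsa[pos])
    return feats


def ibk_predict(train_x, train_y, x, k=3):
    """Label held by the majority of the k nearest training instances."""
    nearest = sorted(range(len(train_x)),
                     key=lambda j: math.dist(train_x[j], x))[:k]
    return Counter(train_y[j] for j in nearest).most_common(1)[0][0]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (hot spot) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_vote(predictions):
    """predictions[c][i] is classifier c's label for sample i."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]


def classify(train, samples, scale, k=3):
    """Train and predict with one scale; samples are (seq, pos, rsa, label)."""
    tx = [encode(s, p, r, scale) for s, p, r, _ in train]
    ty = [y for *_, y in train]
    return [ibk_predict(tx, ty, encode(s, p, r, scale), k)
            for s, p, r, _ in samples]


def select_top(train, valid, scales, top_n=3, k=3):
    """Rank one classifier per scale by validation F1; keep the top_n scales."""
    vy = [y for *_, y in valid]
    scored = [(f1_score(vy, classify(train, valid, sc, k)), i)
              for i, sc in enumerate(scales)]
    scored.sort(reverse=True)
    return [scales[i] for _, i in scored[:top_n]]


def ensemble_predict(train, samples, chosen, k=3):
    """Majority vote over the classifiers built from the chosen scales."""
    return majority_vote([classify(train, samples, sc, k) for sc in chosen])


if __name__ == "__main__":
    # Synthetic toy data for illustration only (not from BID or ASEdb).
    rng = random.Random(0)
    seq = "MKTAYIAKQRQISFVKSHFSRQ"
    rsa = [rng.random() for _ in seq]
    data = [(seq, i, rsa, int(rsa[i] > 0.5)) for i in range(len(seq))]
    train, valid, test = data[:12], data[12:17], data[17:]
    scales = [placeholder_scale(seed) for seed in range(10)]
    chosen = select_top(train, valid, scales)
    print(ensemble_predict(train, test, chosen))
```

Running the script prints the ensemble's labels for the five held-out toy residues. Keeping `top_n` odd means a two-class majority vote can never tie; in a real setting the placeholder scales would be replaced by property values parsed from AAindex1 and the RSA values by those of an accessibility predictor.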


Keywords: Hot spots · Physicochemical features · Majority voting · IBk algorithm



This work was supported by the National Natural Science Foundation of China (Nos. 61300058, 61472282, 61271098 and 61374181).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Institute of Health Sciences, Anhui University, Hefei, China
  2. College of Electrical Engineering and Automation, Anhui University, Hefei, China
  3. School of Electronics and Information Engineering, Tongji University, Shanghai, China
