Predicting Hot Spots Using a Deep Neural Network Approach

Part of the Methods in Molecular Biology book series (MIMB, volume 2190)


Targeting protein–protein interactions is a challenging and crucial task in the drug discovery process. A good starting point for rational drug design is the identification of hot spots (HS) at protein–protein interfaces: typically conserved residues that contribute most significantly to binding. In this chapter, we describe step by step an in-house pipeline for HS prediction that implements a deep neural network and uses only sequence-based features from the well-known SpotOn dataset of soluble proteins (Moreira et al., Sci Rep 7:8007, 2017). The pipeline is divided into three steps: (1) feature extraction, (2) deep learning classification, and (3) model evaluation. We present all the resources needed for full replication of the protocol, including code snippets, the main dataset, and the free and open-source modules/packages. Users should be able to develop an HS prediction model with accuracy, precision, recall, and AUROC of 0.96, 0.93, 0.91, and 0.86, respectively.
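As a rough illustration of step (1), the sketch below turns a protein sequence into per-residue, sequence-only feature vectors: a one-hot encoding of a sliding window centred on each residue, concatenated with the amino-acid composition of the whole chain. The window size, the zero-padding at chain ends, and the function names (`one_hot_window`, `composition`, `residue_features`) are hypothetical choices made for this example; the chapter's actual feature set is not reproduced here.

```python
import numpy as np

# The 20 standard amino acids; any other symbol (e.g. 'X') maps to an all-zero vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_window(sequence: str, position: int, half_window: int = 3) -> np.ndarray:
    """One-hot encode the residues within +/- half_window of `position`.

    Window slots that fall outside the sequence stay as zero vectors (padding).
    Returns a flat vector of length (2 * half_window + 1) * 20.
    """
    width = 2 * half_window + 1
    encoded = np.zeros((width, len(AMINO_ACIDS)), dtype=np.float32)
    for offset in range(-half_window, half_window + 1):
        idx = position + offset
        if 0 <= idx < len(sequence):
            column = AA_INDEX.get(sequence[idx])
            if column is not None:
                encoded[offset + half_window, column] = 1.0
    return encoded.ravel()


def composition(sequence: str) -> np.ndarray:
    """Fraction of each standard amino acid in the whole chain."""
    counts = np.zeros(len(AMINO_ACIDS), dtype=np.float32)
    for aa in sequence:
        if aa in AA_INDEX:
            counts[AA_INDEX[aa]] += 1
    return counts / max(len(sequence), 1)


def residue_features(sequence: str, half_window: int = 3) -> np.ndarray:
    """One feature row per residue: window one-hot followed by chain composition."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    comp = composition(sequence)
    rows = [np.concatenate([one_hot_window(sequence, i, half_window), comp])
            for i in range(len(sequence))]
    return np.vstack(rows)


if __name__ == "__main__":
    features = residue_features("MKTAYIAKQR")
    print(features.shape)  # (10, 160): 7 window slots x 20 + 20 composition values
```

Each row could then be fed to a dense (fully connected) network for step (2), for instance a `tf.keras.Sequential` model; the layer sizes, optimizer, and other hyperparameters are not given in this excerpt and would have to be chosen and tuned separately.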

Key words

Protein–protein interactions, Hot spots, Machine learning, Neural networks, Python, TensorFlow



Abbreviations

FN: False negatives
FP: False positives
TN: True negatives
TP: True positives
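The true/false positive and negative counts listed above are the four entries of a binary confusion matrix, from which the accuracy, precision, and recall reported in the abstract are derived; AUROC instead requires the classifier's continuous output scores. The sketch below spells out these standard definitions in plain Python. The function names are hypothetical, the 0.5 decision threshold in the usage example is an assumption, and AUROC is computed here with the pairwise (Mann-Whitney) formulation rather than necessarily the way the chapter computes it; scikit-learn's `sklearn.metrics.roc_auc_score` returns the same quantity.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels (1 = hot spot, 0 = non-hot spot)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted hot spots that are real hot spots."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of real hot spots that the model recovers (sensitivity)."""
    return tp / (tp + fn) if tp + fn else 0.0


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U statistic).

    Equals the probability that a randomly chosen positive receives a higher
    score than a randomly chosen negative; tied scores count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC requires at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0, 1]
    scores = [0.9, 0.4, 0.35, 0.1, 0.6, 0.8]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    print(tp, fp, tn, fn)                       # 2 1 2 1
    print(round(accuracy(tp, fp, tn, fn), 3))   # 0.667
    print(round(auroc(y_true, scores), 3))      # 0.889
```

Because accuracy, precision, and recall depend on the chosen threshold while AUROC does not, the two kinds of metric can rank the same model quite differently.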



Acknowledgments

This work was supported by the European Regional Development Fund (ERDF), through the Centro 2020 Regional Operational Programme under project CENTRO-01-0145-FEDER-000008: BrainHealth 2020, and through the COMPETE 2020 Operational Programme for Competitiveness and Internationalisation and Portuguese national funds via FCT (Fundação para a Ciência e a Tecnologia) under projects POCI-01-0145-FEDER-031356, PTDC/QUI-OUT/32243/2017, and UIDB/04539/2020. A. J. Preto was also supported by FCT through PhD scholarship SFRH/BD/144966/2019. I. S. Moreira was funded by the FCT Investigator Programme (IF/00578/2014), co-financed by the European Social Fund and Programa Operacional Potencial Humano. The authors would also like to acknowledge ERNEST (European Research Network on Signal Transduction, CA18133) and STRATAGEM (New diagnostic and therapeutic tools against multidrug-resistant tumors, CA17104).


References

  1. Kotlyar M, Pastrello C, Malik Z et al (2019) IID 2018 update: context-specific physical protein–protein interactions in human, model organisms and domesticated species. Nucleic Acids Res 47:D581–D589
  2. Lage K (2014) Protein–protein interactions and genetic diseases: the interactome. Biochim Biophys Acta Mol Basis Dis 1842:1971–1980
  3. Ran X, Gestwicki JE (2018) Inhibitors of protein–protein interactions (PPIs): an analysis of scaffold choices and buried surface area. Curr Opin Chem Biol 44:75–86
  4. Fry DC (2015) Targeting protein–protein interactions for drug discovery. In: Protein–protein interactions. Methods Mol Biol 1278:93–106
  5. Moreira IS, Koukos PI, Melo R et al (2017) SpotOn: high accuracy identification of protein–protein interface hot-spots. Sci Rep 7:8007
  6. Moreira IS, Fernandes PA, Ramos MJ (2007) Hot spots – a review of the protein–protein interface determinant amino-acid residues. Proteins 68:803–812
  7. Melo R, Fieldhouse R, Melo A et al (2016) A machine learning approach for hot-spot detection at protein–protein interfaces. Int J Mol Sci 17:1215
  8. Sommer C, Gerlich DW (2013) Machine learning in cell biology – teaching computers to recognize phenotypes. J Cell Sci 126:5529–5539
  9. Libbrecht MW, Noble WS (2015) Machine learning applications in genetics and genomics. Nat Rev Genet 16:321–332
  10. Lise S, Buchan D, Pontil M et al (2011) Predictions of hot spot residues at protein–protein interfaces using support vector machines. PLoS One 6:e16774
  11. Ofran Y, Rost B (2007) ISIS: interaction sites identified from sequence. Bioinformatics 23:e13–e16
  12. Wang H, Liu C, Deng L (2018) Enhanced prediction of hot spots at protein–protein interfaces using extreme gradient boosting. Sci Rep 8:14285
  13. Jain AK, Mao J, Mohiuddin KM (1996) Artificial neural networks: a tutorial. Computer 29:31–44
  14. Gonzalez RC (2018) Deep convolutional neural networks [lecture notes]. IEEE Signal Process Mag 35:79–87
  15. Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2:1–127
  16. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444
  17. Cock PJA, Antao T, Chang JT et al (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25:1422–1423
  18. van der Walt S, Colbert SC, Varoquaux G (2011) The NumPy array: a structure for efficient numerical computation. Comput Sci Eng 13:22–30
  19. McKinney W (2010) Data structures for statistical computing in Python. In: Proceedings of the 9th Python in Science Conference (SciPy 2010), Austin, Texas
  20. van Rossum G, de Boer J (1991) Linking a stub generator (AIL) to a prototyping language (Python). In: EurOpen Conference Proceedings, Tromsø, Norway
  21. Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
  22. Abadi M, Agarwal A, Barham P et al (2015) TensorFlow: large-scale machine learning on heterogeneous distributed systems. Preprint available at arXiv:1603.04467
  23. Buckman J, Roy A, Raffel C et al (2018) Thermometer encoding: one hot way to resist adversarial examples. In: 6th International Conference on Learning Representations (ICLR 2018), Vancouver, Canada
  24. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. Preprint available at arXiv:1412.6980
  25. Crowther PS, Cox RJ (2005) A method for optimal division of data sets for use in neural networks. In: Knowledge-Based Intelligent Information and Engineering Systems (KES 2005). Lecture Notes in Computer Science, vol 3684. Springer, Berlin, Heidelberg

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2021

Authors and Affiliations

  1. Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
  2. Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  3. Institute for Interdisciplinary Research, University of Coimbra, Coimbra, Portugal
  4. Department of Life Sciences, University of Coimbra, Coimbra, Portugal
