Predicting Hot Spot Residues at Protein–DNA Binding Interfaces Based on Sequence Information

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Hot spot residues at protein–DNA binding interfaces are important for investigating the underlying mechanism of molecular recognition. Currently, only a few tools are available for identifying hot spot residues in protein–DNA complexes, and these tools require three-dimensional protein structures. However, three-dimensional structures are unavailable for most proteins. To address this limitation, we propose a method, named SPDH, that predicts hot spot residues from protein sequences alone. First, we obtained 133 features from physicochemical properties, conservation, predicted solvent accessible surface area and structure. Then, we systematically assessed these features with various feature selection methods to obtain the optimal feature subset, and compared models built with four classical machine learning algorithms (support vector machine, random forest, logistic regression, and k-nearest neighbor) on the training dataset. We found that the variation in physicochemical property features between wild and mutant types was important for improving the performance of the prediction model. On the independent test set, our method achieved an AUC of 0.760 and a sensitivity of 0.808, and outperformed other methods. The data and source code can be downloaded at https://github.com/xialab-ahu/SPDH.
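For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how AUC and sensitivity can be computed for a binary hot spot classifier. It is not taken from the SPDH code base: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true hot spots (label 1) that are predicted as hot spots."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive examples")
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = hot spot, 0 = non-hot spot.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.5]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

auc_value = roc_auc(y_true, scores)
sens_value = sensitivity(y_true, y_pred)
print(f"AUC = {auc_value:.4f}, sensitivity = {sens_value:.2f}")
```

AUC is threshold-free and depends only on how the scores rank positives against negatives, whereas sensitivity depends on the chosen threshold, which is why the two values are usually reported together.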

Acknowledgements

The authors acknowledge the High-performance Computing Platform of Anhui University for providing computing resources. This work was supported by the National Natural Science Foundation of China (11835014 and 21601001), the Recruitment Program for Leading Talent Team of Anhui Province (2019-16), and the China Postdoctoral Science Foundation (2018M630699).

Author information

Corresponding author

Correspondence to Yannan Bin.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest, financial or otherwise.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 78 kb)

About this article

Cite this article

Yao, L., Wang, H. & Bin, Y. Predicting Hot Spot Residues at Protein–DNA Binding Interfaces Based on Sequence Information. Interdiscip Sci Comput Life Sci 13, 1–11 (2021). https://doi.org/10.1007/s12539-020-00399-z

  • DOI: https://doi.org/10.1007/s12539-020-00399-z
