Predicting Hot Spot Residues at Protein–DNA Binding Interfaces Based on Sequence Information

Yao, Lingsong; Wang, Huadong; Bin, Yannan

doi:10.1007/s12539-020-00399-z

Original research article
Published: 17 October 2020

Volume 13, pages 1–11 (2021)
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Hot spot residues at protein–DNA binding interfaces are important for investigating the underlying mechanism of molecular recognition. Currently, few tools are available for identifying hot spot residues in protein–DNA complexes, and these tools require three-dimensional protein structures. However, three-dimensional structures are unavailable for most proteins. To address this limitation, we propose a method, named SPDH, that predicts hot spot residues from protein sequences alone. First, we obtained 133 features covering physicochemical properties, conservation, and predicted solvent-accessible surface area and structure. Then, we systematically assessed these features with various feature selection methods to obtain the optimal feature subset, and compared models built with four classical machine learning algorithms (support vector machine, random forest, logistic regression, and k-nearest neighbor) on the training dataset. We found that the variability of physicochemical property features between wild-type and mutant residues was important for improving the performance of the prediction model. On the independent test set, our method achieved an AUC of 0.760 and a sensitivity of 0.808, and outperformed other methods. The data and source code can be downloaded at https://github.com/xialab-ahu/SPDH.
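As a reading aid for the two metrics reported above, the following minimal sketch shows how AUC and sensitivity can be computed for a binary hot spot classifier. It is not taken from the SPDH source code: the function names `sensitivity` and `roc_auc`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true hot spots (label 1) that are predicted as hot spots."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive examples")
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Tied scores count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels (1 = hot spot) and classifier scores, not SPDH data.
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
    # Assumed decision threshold of 0.5 for turning scores into class labels.
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print("AUC:", roc_auc(labels, scores))
    print("Sensitivity:", sensitivity(labels, preds))
```

AUC is threshold-independent because it only uses the ranking of the scores, whereas sensitivity depends on the chosen cutoff, so the two numbers in the abstract describe different aspects of the same model.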

Acknowledgements

The authors acknowledge the High-performance Computing Platform of Anhui University for providing computing resources. This work was supported by the National Natural Science Foundation of China (11835014 and 21601001), the Recruitment Program for Leading Talent Team of Anhui Province (2019-16), and the China Postdoctoral Science Foundation (2018M630699).

Author information

Authors and Affiliations

Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
Lingsong Yao & Yannan Bin
School of Computer Science and Technology, Anhui University, Hefei, 230601, Anhui, China
Huadong Wang

Corresponding author

Correspondence to Yannan Bin.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest, financial or otherwise.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 78 kb)

About this article

Cite this article

Yao, L., Wang, H. & Bin, Y. Predicting Hot Spot Residues at Protein–DNA Binding Interfaces Based on Sequence Information. Interdiscip Sci Comput Life Sci 13, 1–11 (2021). https://doi.org/10.1007/s12539-020-00399-z

Received: 08 July 2020
Revised: 27 September 2020
Accepted: 01 October 2020
Published: 17 October 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s12539-020-00399-z
