A Novel Prediction Method of ATP Binding Residues from Protein Primary Sequence

  • Chuyi Song
  • Guixia Liu
  • Jiazhi Song
  • Jingqing Jiang (corresponding author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11555)


ATP is an important nucleotide that provides energy for biological activities in cells. Correctly identifying protein-ATP binding sites is helpful for protein function annotation and new drug development. With advances in machine learning, more and more researchers predict binding sites from protein sequences instead of using biochemical experiments. Because non-binding residues far outnumber binding residues, a popular way to handle ATP-binding datasets is to under-sample when constructing the training subset, which inevitably discards negative samples. However, negative samples contain a lot of valuable information about ATP binding properties and should be carefully considered. In this study, a dataset containing the full set of negative samples is used in the training process. To avoid bias in the prediction results, a decision tree classification algorithm, which shows stable performance on imbalanced data, is applied. Five-fold cross-validation results demonstrate that the proposed method improves prediction performance compared with using under-sampled data.
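As a rough illustration of the two data-handling choices contrasted above, the following sketch shows random under-sampling of negatives next to a stratified five-fold split of the full data. It is not the authors' implementation: the function names `undersample` and `stratified_folds`, the 1:1 sampling ratio, and the toy label counts are all assumptions made for this example.

```python
import random

def undersample(labels, ratio=1.0, seed=0):
    """Keep every positive index and a random subset of negative indices.

    `ratio` is the assumed negatives-to-positives ratio after sampling.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    return sorted(pos + rng.sample(neg, n_keep))

def stratified_folds(labels, k=5, seed=0):
    """Split all indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

# Toy labels (invented proportions): 1 = binding residue, 0 = non-binding.
labels = [1] * 10 + [0] * 190

kept = undersample(labels)         # under-sampled training indices
folds = stratified_folds(labels)   # five folds over the full data
discarded = len(labels) - len(kept)
```

In this toy setting, under-sampling throws away most of the negative residues, whereas the stratified folds keep every residue and preserve the class imbalance in each fold, which is the situation the classifier must then cope with.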


Keywords: ATP-binding site · Protein primary sequence · Decision tree · Binary classification



This work was supported by the National Natural Science Foundation of China (Project No. 61662057, 61672301) and the Higher Educational Scientific Research Projects of Inner Mongolia Autonomous Region (Project No. NJZC17198).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Chuyi Song (1)
  • Guixia Liu (2, 3)
  • Jiazhi Song (2, 3)
  • Jingqing Jiang (4)
  1. College of Mathematics, Inner Mongolia University for Nationalities, Tongliao, China
  2. College of Computer Science and Technology, Jilin University, Changchun, China
  3. Key Laboratory of Symbolic Computational and Knowledge Engineering, Changchun, China
  4. College of Computer Science and Technology, Inner Mongolia University for Nationalities, Tongliao, China
