A Novel Prediction Method of ATP Binding Residues from Protein Primary Sequence
ATP is an important nucleotide that provides energy for biological activities in cells. Correctly identifying protein-ATP binding sites is helpful for protein function annotation and new drug development. With advances in machine learning, more and more researchers have begun to predict binding sites from protein sequences instead of relying on biochemical experiments. Because non-binding residues far outnumber binding residues, a popular way to handle the ATP-binding dataset is to apply under-sampling to construct a training subset, which inevitably discards negative samples. However, much valuable information about ATP-binding properties is hidden in the negative samples and should be carefully considered. In this study, the dataset containing all negative samples is used in the training process. To avoid biased prediction results, a decision tree classification algorithm, which shows stable performance on imbalanced data, is applied. Five-fold cross-validation results demonstrate that the proposed method improves prediction performance compared with training on under-sampled data.
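The difference between training on all negative samples and training on an under-sampled subset can be made concrete with a small data-partitioning sketch. It is not the authors' pipeline; it assumes stratified five-fold cross-validation, a 1:1 positive-to-negative ratio after random under-sampling, and a toy label vector (1 = ATP-binding residue, 0 = non-binding). The function names `stratified_folds`, `undersample`, and `training_sets` are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical sketch: label 1 = ATP-binding residue, 0 = non-binding residue.

def stratified_folds(labels, k=5, seed=0):
    """Assign each residue to one of k folds so that every fold keeps
    roughly the same binding/non-binding ratio (stratified k-fold)."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            folds[i] = rank % k  # deal each class round-robin across folds
    return folds

def undersample(indices, labels, seed=0):
    """Keep every positive index and an equally sized random subset of
    negative indices (assumed 1:1 ratio); the remaining negatives are lost."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    return sorted(pos + kept_neg)

def training_sets(labels, k=5, seed=0):
    """Yield (fold, full_train, undersampled_train, test) index lists.
    full_train keeps all negatives; undersampled_train does not."""
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        test = [i for i, g in enumerate(folds) if g == f]
        train = [i for i, g in enumerate(folds) if g != f]
        yield f, train, undersample(train, labels, seed + f), test

if __name__ == "__main__":
    labels = [1] * 40 + [0] * 960  # hypothetical toy data, 1:24 imbalance
    for f, full, under, test in training_sets(labels):
        n_neg_full = sum(1 for i in full if labels[i] == 0)
        n_neg_under = sum(1 for i in under if labels[i] == 0)
        print(f"fold {f}: full={len(full)} under={len(under)} "
              f"negatives discarded={n_neg_full - n_neg_under}")
```

With this toy data each fold prints `full=800 under=64 negatives discarded=736`, showing how much negative information under-sampling throws away. Either training index list could then be passed to a decision tree learner (for example scikit-learn's `DecisionTreeClassifier`), and both models scored on the same untouched test fold.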
Keywords: ATP-binding site; Protein primary sequence; Decision tree; Binary classification
This work was supported by The National Natural Science Foundation of China (Project No. 61662057, 61672301) and Higher Educational Scientific Research Projects of Inner Mongolia Autonomous Region (Project No. NJZC17198).