As one of the most prevalent post-transcriptional modifications, pseudouridine (Ψ) participates in a range of biological processes. Efficient detection of Ψ sites is therefore essential for revealing its functions. Although experimental techniques have been developed to identify Ψ sites at single-base resolution, they remain labor-intensive and expensive. To complement these experimental methods, computational methods for identifying Ψ sites have recently been proposed; however, their performance is still unsatisfactory. In this paper, we propose an eXtreme Gradient Boosting (XGBoost)-based method, called XG-PseU, that identifies Ψ sites using an optimal feature set obtained by forward feature selection combined with incremental feature selection. Our results demonstrate that XG-PseU is superior, or at least complementary, to existing methods for identifying pseudouridine sites. Finally, a freely available web server for XG-PseU has been established at http://www.bioml.cn/. We hope that XG-PseU will become a useful tool for computationally identifying Ψ sites.
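The abstract names two feature-selection steps without detailing them. The following is a minimal Python sketch of the generic textbook versions, not the authors' exact pipeline. Assumptions: features are referred to by hypothetical integer indices, and `score` is a hypothetical callable that rates a feature subset (in practice it might return the cross-validated accuracy of an XGBoost classifier trained on those feature columns). `toy_score` is an invented stand-in scorer used only to make the sketch runnable. Forward selection greedily builds a feature order; incremental feature selection (IFS) then evaluates the top-1, top-2, ... prefixes of that order and keeps the best-scoring one.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer type: maps a tuple of feature indices to a quality score
# (higher is better), e.g. cross-validated accuracy of a classifier.
Scorer = Callable[[Tuple[int, ...]], float]


def forward_feature_selection(
    candidates: Sequence[int], score: Scorer
) -> Tuple[List[int], List[float]]:
    """Greedy sequential forward selection.

    At each step, add the remaining feature that gives the highest score when
    joined to the features already selected. Returns the order in which features
    were added and the score recorded after each addition.
    """
    remaining = list(candidates)
    selected: List[int] = []
    history: List[float] = []
    while remaining:
        # max() returns the first candidate on ties, so the result is deterministic.
        best_f = max(remaining, key=lambda f: score(tuple(selected + [f])))
        selected.append(best_f)
        remaining.remove(best_f)
        history.append(score(tuple(selected)))
    return selected, history


def incremental_feature_selection(
    ranking: Sequence[int], score: Scorer
) -> Tuple[Tuple[int, ...], float]:
    """IFS: score the top-1, top-2, ..., top-n features of a ranking and keep the best prefix."""
    best_subset: Tuple[int, ...] = ()
    best_score = float("-inf")
    for k in range(1, len(ranking) + 1):
        subset = tuple(ranking[:k])
        s = score(subset)
        if s > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = subset, s
    return best_subset, best_score


def select_features(candidates: Sequence[int], score: Scorer) -> Tuple[Tuple[int, ...], float]:
    """Chain the two steps: forward selection produces the order that IFS scans."""
    order, _ = forward_feature_selection(candidates, score)
    return incremental_feature_selection(order, score)


def toy_score(subset: Tuple[int, ...]) -> float:
    """Hypothetical scorer: features 0 and 2 are informative; every feature costs 0.01."""
    useful = {0: 0.3, 2: 0.2}
    return sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)


if __name__ == "__main__":
    print(select_features([0, 1, 2, 3], toy_score))
```

With the toy scorer, forward selection orders the features as [0, 2, 1, 3] and IFS keeps the prefix (0, 2), since each uninformative feature only adds a penalty. Greedy forward selection needs on the order of n² scorer calls for n features, so when n is large a cheaper ranking (for example, per-feature importance scores) can supply the order that IFS scans instead.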
This work was supported by the National Natural Science Foundation of China (31771471, 61772119) and the Natural Science Foundation for Distinguished Young Scholar of Hebei Province (No. C2017209244).
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants performed by any of the authors.
Communicated by Stefan Hohmann.
Cite this article
Liu, K., Chen, W. & Lin, H. XG-PseU: an eXtreme Gradient Boosting based method for identifying pseudouridine sites. Mol Genet Genomics 295, 13–21 (2020). https://doi.org/10.1007/s00438-019-01600-9
- eXtreme Gradient Boosting
- Feature selection
- Web server