iPseU-Layer: Identifying RNA Pseudouridine Sites Using Layered Ensemble Model

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Pseudouridine is one of the most prevalent post-transcriptional RNA modifications. Identifying pseudouridine sites is an essential step toward understanding RNA function, structure stabilization, translation, and RNA stability; however, high-throughput experimental techniques remain expensive and time-consuming. An efficient machine-learning method for identifying pseudouridine sites is therefore valuable for both academic research and drug development. Motivated by this, we present a layered ensemble model, designated iPseU-Layer, for the identification of RNA pseudouridine sites. iPseU-Layer consists of three machine learning layers: a feature selection layer, a feature extraction and fusion layer, and a prediction layer. The feature selection layer reduces the dimensionality of the input and can be regarded as a data pre-processing stage. The feature extraction and fusion layer applies an ensemble of several machine learning algorithms and passes their outputs on as features. The prediction layer applies a classic random forest to these outputs to produce the final predictions. We validate the model systematically using cross-validation tests and an independent test, comparing it against current state-of-the-art models. iPseU-Layer shows promising predictive performance in terms of sensitivity, specificity, accuracy, and Matthews correlation coefficient. Collectively, these findings indicate that the iPseU-Layer framework is a feasible and effective strategy for predicting RNA pseudouridine sites.
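The four evaluation measures named in the abstract have standard definitions for binary classification. The sketch below computes them from true and predicted labels using only the Python standard library. The function names `confusion_counts` and `binary_metrics` are hypothetical, and the zero-denominator handling is an assumption of this sketch; the exact formulation used in the paper is not shown in this preview.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for labels where 1 = pseudouridine site, 0 = non-site."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy and MCC from confusion counts.

    Assumption of this sketch: any measure with a zero denominator is
    reported as 0.0 instead of raising an error.
    """
    total = tp + tn + fp + fn
    sn = tp / (tp + fn) if (tp + fn) else 0.0    # sensitivity (recall on sites)
    sp = tn / (tn + fp) if (tn + fp) else 0.0    # specificity (recall on non-sites)
    acc = (tp + tn) / total if total else 0.0    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the paper.
    y_true = [1, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 1, 1]
    print(binary_metrics(*confusion_counts(y_true, y_pred)))
```

Unlike accuracy, the Matthews correlation coefficient uses all four confusion-matrix cells, so it stays informative when positive and negative samples are imbalanced.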




Acknowledgements

This work was supported by the Research Foundation for Advanced Talents (Nos. 2019BS007, 31401204) of Henan University of Technology and the National Natural Science Foundation of China under Grants (Nos. 61673082, 61773352).

Author information

Corresponding author

Correspondence to Yashuang Mu.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Mu, Y., Zhang, R., Wang, L. et al. iPseU-Layer: Identifying RNA Pseudouridine Sites Using Layered Ensemble Model. Interdiscip Sci Comput Life Sci 12, 193–203 (2020). https://doi.org/10.1007/s12539-020-00362-y

  • DOI: https://doi.org/10.1007/s12539-020-00362-y
