iPseU-Layer: Identifying RNA Pseudouridine Sites Using Layered Ensemble Model

Mu, Yashuang; Zhang, Ruijun; Wang, Lidong; Liu, Xiaodong

doi:10.1007/s12539-020-00362-y

Original research article
Published: 13 March 2020

Interdisciplinary Sciences: Computational Life Sciences, Volume 12, pages 193–203 (2020)

Yashuang Mu (ORCID: orcid.org/0000-0003-0362-9640)¹,²,
Ruijun Zhang³,
Lidong Wang³ &
Xiaodong Liu⁴

Abstract

Pseudouridine is one of the most prevalent post-transcriptional RNA modifications. Identifying pseudouridine sites is an essential step toward understanding RNA function, the stabilization of RNA structure, the translation process, and RNA stability; however, high-throughput experimental techniques remain expensive and time-consuming. An efficient machine-learning method for identifying pseudouridine sites is therefore important for both academic research and drug development. Motivated by this, we present a layered ensemble model, designated iPseU-Layer, for the identification of RNA pseudouridine sites. iPseU-Layer consists of three machine learning layers: a feature selection layer, a feature extraction and fusion layer, and a prediction layer. The feature selection layer reduces the dimensionality of the input and can be regarded as a data pre-processing stage. The feature extraction and fusion layer applies an ensemble of machine learning algorithms to generate intermediate outputs. The prediction layer applies a classic random forest to produce the final predictions. We systematically validate the model with cross-validation tests and an independent test, comparing it against current state-of-the-art models. iPseU-Layer shows promising predictive performance in terms of sensitivity, specificity, accuracy, and Matthews correlation coefficient. Collectively, these findings indicate that the iPseU-Layer framework is a feasible and effective strategy for predicting RNA pseudouridine sites.
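The four evaluation measures named in the abstract can be computed from the confusion matrix of a binary classifier. The following is a minimal sketch using the standard textbook definitions; it is not taken from the article itself. The function names `confusion_counts` and `evaluate`, and the convention that label 1 marks a pseudouridine site and label 0 a non-site, are assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = site, 0 = non-site)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Return sensitivity, specificity, accuracy and MCC as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sensitivity, "Sp": specificity, "Acc": accuracy, "MCC": mcc}


if __name__ == "__main__":
    # Toy labels, invented purely to exercise the functions.
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 1, 1]
    print(evaluate(truth, preds))
```

Reporting MCC alongside accuracy is a common choice because MCC uses all four confusion-matrix cells and so stays informative when the two classes are imbalanced.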

Acknowledgements

This work was supported by the Research Foundation for Advanced Talents (Nos. 2019BS007, 31401204) of Henan University of Technology and the National Natural Science Foundation of China under Grants (Nos. 61673082, 61773352).

Author information

Authors and Affiliations

Key Laboratory of Grain Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou, 450001, People’s Republic of China
Yashuang Mu
College of Information Science and Engineering, Henan University of Technology, Zhengzhou, 450001, People’s Republic of China
Yashuang Mu
School of Science, Dalian Maritime University, Dalian, 116026, People’s Republic of China
Ruijun Zhang & Lidong Wang
School of Control Science and Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, 116024, People’s Republic of China
Xiaodong Liu

Corresponding author

Correspondence to Yashuang Mu.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Mu, Y., Zhang, R., Wang, L. et al. iPseU-Layer: Identifying RNA Pseudouridine Sites Using Layered Ensemble Model. Interdiscip Sci Comput Life Sci 12, 193–203 (2020). https://doi.org/10.1007/s12539-020-00362-y

Received: 09 October 2019
Revised: 16 February 2020
Accepted: 19 February 2020
Published: 13 March 2020
Issue Date: June 2020
DOI: https://doi.org/10.1007/s12539-020-00362-y
