Ensemble Classifiers for Multiclass MicroRNA Classification

  • Protocol
miRNomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 2257)

Abstract

Gene regulation is essential to cell homeostasis, so its dysregulation often leads to disease. MicroRNAs (miRNAs) are involved in posttranscriptional gene regulation, and consequently their dysregulation has been associated with many diseases.

miRBase version 21 contains miRNAs from about 200 species organized into about 70 clades. It has been shown that not all miRNAs collected in the database are likely to be real; therefore, new ways to distinguish genuine from false miRNAs should be explored. We introduce a novel approach based on k-mer frequencies and machine learning that assigns an unknown or unlabeled miRNA to its most likely clade or species of origin. A simple way to filter new data is to check that a novel miRNA is classified close to the species it is reported to originate from. To this end, we designed an ensemble classifier composed of multiple two-class random forest classifiers, each trained on one species–clade pair. The approach was tested with different sampling methods on a dataset taken from miRBase version 21 and evaluated using a hierarchical F-measure. Depending on the sampling method, it predicted 81% to 94% of the test data correctly. This is the first classifier that can assign miRNAs to their species of origin. The method will aid in evaluating the integrity of miRNA databases and in analyzing noisy miRNA samples.
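To make two of the ingredients named in the abstract more concrete, the following minimal Python sketch computes k-mer frequency features for a sequence and a hierarchical F-measure over a label hierarchy. It is an illustration under stated assumptions, not the chapter's implementation: the function names, the choice of k = 2, the toy hierarchy `TOY_PARENT`, and the example sequence are all hypothetical, and the hierarchical F-measure follows a common ancestor-overlap formulation whose details may differ from the one used in the chapter.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Relative frequency of every possible k-mer over `alphabet` in `seq`."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    allowed = set(alphabet)
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Count only windows made up entirely of alphabet characters.
    counts = Counter(w for w in windows if set(w) <= allowed)
    total = sum(counts.values())
    kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def with_ancestors(label, parent):
    """The label plus all its ancestors, excluding the root of the hierarchy."""
    nodes = set()
    while label in parent:  # the root has no entry in `parent`
        nodes.add(label)
        label = parent[label]
    return nodes


def hierarchical_f(true_labels, pred_labels, parent):
    """Hierarchical F-measure based on overlap of ancestor-augmented label sets."""
    overlap = pred_size = true_size = 0
    for t, p in zip(true_labels, pred_labels):
        t_set, p_set = with_ancestors(t, parent), with_ancestors(p, parent)
        overlap += len(t_set & p_set)
        pred_size += len(p_set)
        true_size += len(t_set)
    if overlap == 0:
        return 0.0
    h_precision = overlap / pred_size
    h_recall = overlap / true_size
    return 2 * h_precision * h_recall / (h_precision + h_recall)


# Hypothetical toy hierarchy (child -> parent); "Mammalia" acts as the root.
TOY_PARENT = {"hsa": "Primates", "ptr": "Primates", "mmu": "Rodentia",
              "Primates": "Mammalia", "Rodentia": "Mammalia"}

# Arbitrary example sequence, used only to show the shape of the features.
features = kmer_frequencies("UGAGGUAGUAGGUUGUAUAGUU", k=2)
score = hierarchical_f(["hsa", "hsa"], ["hsa", "ptr"], TOY_PARENT)
```

In this sketch, a prediction that names the wrong species but the right clade still earns partial credit, which is the motivation for a hierarchical measure over plain accuracy. The feature dictionaries could be turned into fixed-length vectors and passed to any two-class classifier, such as a random forest, to build one member of an ensemble.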

Copyright information

© 2022 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Odenthal, L., Allmer, J., Yousef, M. (2022). Ensemble Classifiers for Multiclass MicroRNA Classification. In: Allmer, J., Yousef, M. (eds) miRNomics. Methods in Molecular Biology, vol 2257. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1170-8_12

  • DOI: https://doi.org/10.1007/978-1-0716-1170-8_12

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1169-2

  • Online ISBN: 978-1-0716-1170-8

  • eBook Packages: Springer Protocols
