Deep Learning and Random Forest-Based Augmentation of sRNA Expression Profiles

  • Jelena Fiosina
  • Maksims Fiosins
  • Stefan Bonn
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11490)


The lack of well-structured annotations in a growing body of RNA expression data complicates data interoperability and reusability. Commonly used text mining methods extract annotations from existing unstructured data descriptions and often produce inaccurate output that requires manual curation. Automatic data-based augmentation (generating annotations on the basis of the expression data itself) can considerably improve annotation quality but has not been well studied. We formulate the automatic augmentation of small RNA-seq expression data as a classification problem and investigate deep learning (DL) and random forest (RF) approaches to solve it. We generate tissue and sex annotations from small RNA-seq expression data for tissues and cell lines of Homo sapiens. We validate our approach on 4243 annotated small RNA-seq samples from the Small RNA Expression Atlas (SEA) database. With DL, the average prediction accuracy is 98% for tissue groups, 96.5% for tissues, and 77% for sex. The “one dataset out” average accuracy for tissue group prediction is 83% (DL) and 59% (RF). On average, DL outperforms RF and considerably improves classification performance on ‘unseen’ datasets.
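
The “one dataset out” evaluation mentioned above can be illustrated with a minimal sketch. It assumes the term means holding out all samples of one dataset at a time, training on the remaining datasets, and averaging the held-out accuracies. The nearest-centroid classifier is only a stand-in so that the example runs with the Python standard library; it is not the DL or RF model used in the paper. The function names, toy expression values, tissue labels, and dataset identifiers are hypothetical.

```python
from collections import defaultdict


def nearest_centroid_fit(X, y):
    """Return a dict mapping each label to the mean vector of its samples."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def leave_one_dataset_out(X, y, dataset_ids):
    """Hold out each dataset in turn and return accuracy per held-out dataset."""
    accuracies = {}
    for held_out in sorted(set(dataset_ids)):
        train = [i for i, d in enumerate(dataset_ids) if d != held_out]
        test = [i for i, d in enumerate(dataset_ids) if d == held_out]
        centroids = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(nearest_centroid_predict(centroids, X[i]) == y[i] for i in test)
        accuracies[held_out] = correct / len(test)
    return accuracies


# Hypothetical toy data: two expression features per sample, three datasets.
X = [[10.0, 1.0], [9.0, 2.0], [1.0, 10.0], [2.0, 9.0], [11.0, 0.0], [0.0, 11.0]]
y = ["brain", "brain", "blood", "blood", "brain", "blood"]
dataset_ids = ["D1", "D1", "D2", "D2", "D3", "D3"]

per_dataset = leave_one_dataset_out(X, y, dataset_ids)
mean_accuracy = sum(per_dataset.values()) / len(per_dataset)
print(per_dataset, mean_accuracy)
```

Splitting by dataset rather than by sample keeps samples from the same study out of both the training and test sets, which is why this kind of score is usually lower than a random-split accuracy and is a closer proxy for performance on unseen datasets.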


Augmentation, Deep learning, Random forest, Ontology, Small RNA, Expression counts, Contamination



The research was supported by the German Federal Ministry of Education and Research (BMBF), project Integrative Data Semantics for Neurodegenerative research (031L0029); by the German Research Foundation (DFG), project Quantitative Synaptology (SFB 1286 Z2); and by the Volkswagen Foundation.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jelena Fiosina (1)
  • Maksims Fiosins (2, 3, 4), email author
  • Stefan Bonn (2, 3)

  1. Clausthal University of Technology, Clausthal-Zellerfeld, Germany
  2. German Center for Neurodegenerative Diseases, Tübingen, Germany
  3. Institute for Medical Systems Biology, Center for Molecular Neurobiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  4. Genevention GmbH, Göttingen, Germany
