
Cancer miRNA biomarkers classification using a new representation algorithm and evolutionary deep learning

  • Methodologies and Application
Soft Computing

Abstract

Cancer diagnosis is undergoing a paradigm shift toward diagnostic panels built on molecular biomarkers. MicroRNA (miRNA) data are among the most important genomic datasets representing genome sequences. Since several studies have shown a relationship between miRNAs and cancers, data mining and machine learning methods can be used to extract a large amount of knowledge from cancer genomic datasets. Although previous research on identifying cancers from miRNAs has made diagnosis possible, the accuracy for some classes is not satisfactory. This research therefore proposes a three-phase method that combines a super-class (meta-label) approach with deep learning to diagnose cancers from miRNAs. The first phase, named Representation learning, consists of partitioning the data into super-classes, creating meta-data, and classifying the super-classes. This phase splits the data into subsets to improve classification accuracy: it groups labels into meta-labels based on the separability of classes, and then builds a multi-label learner to predict these meta-labels. In the second phase, feature selection is applied to each super-class to reduce the dimensionality of the problem and to focus the induction algorithm on the features that matter most for predicting the target concept. In the third phase, an evolutionary deep neural network classifies the labels within each super-class. The last two phases are carried out separately for each subset, so that five super-classes are formed and five deep neural networks are trained. The experimental results show that the proposed method achieved more efficient results than 19 recent machine learning methods. Although a dataset covering 29 types of cancer makes learning more difficult for the convolutional neural network, the method performs noticeably better than existing methods. A further benefit is a significant reduction in running time compared with other methods.
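The three-phase flow described in the abstract (route each sample to a super-class, select features per super-class, then classify within that super-class) can be illustrated with a small sketch. This is not the authors' implementation. Several pieces are assumptions made only for illustration: the label-to-super-class mapping `super_of` is supplied by hand rather than derived from class separability, a nearest-centroid rule stands in for both the super-class learner and the evolutionary deep networks, a Fisher-like ratio stands in for the feature selection step, and the data are synthetic.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Stand-in learner (assumption); the abstract describes deep networks."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        # One mean vector per label.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, row):
        return min(self.centroids, key=lambda c: sq_dist(row, self.centroids[c]))


def select_features(X, y, k):
    """Keep the k features with the highest between/within-class spread
    (a Fisher-like score, used here as an illustrative assumption)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    scores = []
    for j in range(len(X[0])):
        means = [sum(r[j] for r in rows) / len(rows) for rows in groups.values()]
        grand = sum(means) / len(means)
        between = sum((m - grand) ** 2 for m in means)
        within = sum(
            sum((r[j] - m) ** 2 for r in rows) / len(rows)
            for rows, m in zip(groups.values(), means)
        )
        scores.append(between / (within + 1e-9))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


class ThreePhaseSketch:
    def __init__(self, super_of, k):
        self.super_of = super_of  # label -> super-class id (hand-supplied here)
        self.k = k

    def fit(self, X, y):
        meta = [self.super_of[label] for label in y]
        # Phase 1: learn to predict the super-class (meta-label).
        self.router = NearestCentroid().fit(X, meta)
        self.experts = {}
        for s in sorted(set(meta)):
            Xs = [row for row, m in zip(X, meta) if m == s]
            ys = [label for label, m in zip(y, meta) if m == s]
            # Phase 2: feature selection inside this super-class only.
            feats = select_features(Xs, ys, self.k)
            # Phase 3: one classifier per super-class on the reduced features.
            model = NearestCentroid().fit([[r[j] for j in feats] for r in Xs], ys)
            self.experts[s] = (feats, model)
        return self

    def predict_one(self, row):
        s = self.router.predict_one(row)
        feats, model = self.experts[s]
        return model.predict_one([row[j] for j in feats])


# Synthetic data: feature 0 separates the super-classes, feature 1 separates
# A from B, feature 2 separates C from D, feature 3 is pure noise.
rng = random.Random(0)
centers = {"A": [0, 0, 0, 0], "B": [0, 5, 0, 0], "C": [10, 0, 0, 0], "D": [10, 0, 5, 0]}
super_of = {"A": 0, "B": 0, "C": 1, "D": 1}
X, y = [], []
for label, center in centers.items():
    for _ in range(20):
        X.append([c + rng.gauss(0, 0.3) for c in center])
        y.append(label)

clf = ThreePhaseSketch(super_of, k=1).fit(X, y)
# Training-set accuracy only; a sanity check, not an evaluation.
accuracy = sum(clf.predict_one(r) == t for r, t in zip(X, y)) / len(y)
print(accuracy, {s: feats for s, (feats, _) in clf.experts.items()})
```

The point of the structure is that each per-super-class model sees only the samples and features relevant to its own labels, which is the motivation the abstract gives for splitting the problem before training the deep networks.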

Notes

  1. http://cancergenome.nih.gov/.

Author information

Corresponding author

Correspondence to Mohammad Saniee Abadeh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest regarding this manuscript.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bagheri Khoulenjani, N., Saniee Abadeh, M., Sarbazi-Azad, S. et al. Cancer miRNA biomarkers classification using a new representation algorithm and evolutionary deep learning. Soft Comput 25, 3113–3129 (2021). https://doi.org/10.1007/s00500-020-05366-w

  • DOI: https://doi.org/10.1007/s00500-020-05366-w
