Identification of tissue-specific tumor biomarker using different optimization algorithms

  • Shib Sankar Bhowmick
  • Debotosh Bhattacharjee
  • Luis Rato
Research Article



Identification of differentially expressed genes, i.e., genes whose transcript abundance differs across biological or physiological conditions, has long been a challenging task. However, the advent of transcriptome sequencing (RNA-seq) technology has revolutionized the simultaneous measurement of transcript abundance for thousands of genes.


In this paper, such next-generation sequencing (NGS) data are used to identify biomarker signatures for several of the most common cancer types (bladder, colon, kidney, brain, liver, lung, prostate, skin, and thyroid).


Here, the problem is cast as a comparison of optimization algorithms for selecting the set of genes that yields the highest accuracy in a two-class classification task between healthy and tumor samples. Artificial Bee Colony (ABC), Ant Colony Optimization, Differential Evolution, and Particle Swarm Optimization are the optimization algorithms chosen for this experiment. A standard statistical method, DESeq2, is used to select differentially expressed genes before they are fed to the optimization algorithms. Healthy and tumor samples are classified with a support vector machine.
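
The wrapper scheme described above (an optimizer proposes gene subsets, and classification accuracy scores them) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. It uses a binary Particle Swarm Optimization variant as the optimizer. To stay self-contained, a nearest-centroid classifier stands in for the support vector machine, and the DESeq2 pre-filtering step is omitted. The function names (centroid_accuracy, binary_pso, make_toy_data), the synthetic data, the leave-one-out scoring, and the swarm parameters (inertia 0.7, acceleration 1.5) are all illustrative assumptions and are not taken from the paper.

```python
import math
import random


def centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the selected genes.

    Assumption: this classifier is a lightweight stand-in for the SVM.
    """
    genes = [j for j, keep in enumerate(mask) if keep]
    if not genes:
        return 0.0  # an empty gene subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Class centroids are computed without the held-out sample i.
        sums = {0: [0.0] * len(genes), 1: [0.0] * len(genes)}
        counts = {0: 0, 1: 0}
        for k, (row, label) in enumerate(zip(X, y)):
            if k == i:
                continue
            counts[label] += 1
            for g, j in enumerate(genes):
                sums[label][g] += row[j]
        dist = {}
        for label in (0, 1):
            dist[label] = sum(
                (X[i][j] - sums[label][g] / counts[label]) ** 2
                for g, j in enumerate(genes)
            )
        pred = 0 if dist[0] <= dist[1] else 1
        correct += pred == y[i]
    return correct / len(X)


def binary_pso(X, y, n_particles=10, n_iter=20, seed=0):
    """Binary PSO over gene masks; fitness is the accuracy defined above."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pos = [[rng.random() < 0.5 for _ in range(n_genes)] for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [centroid_accuracy(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_genes):
                # Velocity update: inertia plus pulls toward personal and global bests.
                vel[i][j] = (0.7 * vel[i][j]
                             + 1.5 * rng.random() * (best_pos[i][j] - pos[i][j])
                             + 1.5 * rng.random() * (g_pos[j] - pos[i][j]))
                # Sigmoid transfer: velocity becomes the probability of selecting gene j.
                pos[i][j] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j]))
            fit = centroid_accuracy(X, y, pos[i])
            if fit > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], fit
                if fit > g_fit:
                    g_pos, g_fit = pos[i][:], fit
    return g_pos, g_fit


def make_toy_data(seed=1):
    """Synthetic data: 16 samples, 6 genes; genes 0 and 3 separate the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # 0 = healthy, 1 = tumor (hypothetical labels)
        for _ in range(8):
            row = [rng.uniform(-0.5, 0.5) for _ in range(6)]
            row[0] += 4.0 * label  # informative gene, higher in tumor
            row[3] -= 4.0 * label  # informative gene, lower in tumor
            X.append(row)
            y.append(label)
    return X, y


X, y = make_toy_data()
mask, acc = binary_pso(X, y)
selected = [j for j, keep in enumerate(mask) if keep]
print("selected genes:", selected, "accuracy:", acc)
```

The toy data are deliberately easy, so the sketch shows only the control flow: each particle encodes a candidate gene subset as a Boolean mask, and the classifier's accuracy on that subset is the fitness the swarm tries to maximize. Swapping in another optimizer (ABC, Ant Colony Optimization, Differential Evolution) would change only how new masks are proposed, not how they are scored.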


Cancer-specific validation yields high classification accuracy. The highest accuracy, 99.10%, is achieved by the ABC algorithm on the brain lower grade glioma data. This validation is supported by a statistical test, gene ontology enrichment analysis, and KEGG pathway enrichment analysis for each cancer biomarker signature.


The current study identified robust genes as biomarker signatures, and these biomarkers might help to accurately identify tumors of unknown origin.


Biomarker; Machine learning tools; Messenger RNA; Optimization algorithm; Pathway analysis


Compliance with ethical standards

Conflicts of interest

Shib Sankar Bhowmick, Debotosh Bhattacharjee, and Luis Rato declare that they have no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Supplementary material

Supplementary material 1: 13258_2018_773_MOESM1_ESM.pdf (PDF, 1425 KB)
Supplementary material 2: 13258_2018_773_MOESM2_ESM.pdf (PDF, 700 KB)
Supplementary material 3: 13258_2018_773_MOESM3_ESM.pdf (PDF, 71 KB)



Copyright information

© The Genetics Society of Korea 2018

Authors and Affiliations

  • Shib Sankar Bhowmick (1, corresponding author)
  • Debotosh Bhattacharjee (2)
  • Luis Rato (3)
  1. Department of Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata, India
  2. Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
  3. Department of Informatics, University of Evora, Evora, Portugal
