Identification of tissue-specific tumor biomarker using different optimization algorithms

  • Research Article
  • Genes & Genomics

Abstract

Background

Identifying differentially expressed genes, i.e., genes whose transcript abundance differs across biological or physiological conditions, has long been a challenging task. Transcriptome sequencing (RNA-seq) technology, however, has made it possible to measure transcript abundance for thousands of genes simultaneously.

Objective

In this paper, such next-generation sequencing (NGS) data are used to identify biomarker signatures for several of the most common cancer types (bladder, colon, kidney, brain, liver, lung, prostate, skin, and thyroid).

Methods

Here, the problem is framed as a comparison of optimization algorithms, each selecting a set of genes that yields the highest accuracy in a two-class classification task between healthy and tumor samples. Four optimization algorithms are chosen for this experiment: Artificial Bee Colony (ABC), Ant Colony Optimization, Differential Evolution, and Particle Swarm Optimization. A standard statistical method, DESeq2, is used to select differentially expressed genes before they are fed to the optimization algorithms. Healthy and tumor samples are classified with a support vector machine.
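The wrapper-style selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, a synthetic expression matrix stands in for DESeq2-filtered RNA-seq data, and a nearest-centroid classifier is swapped in for the support vector machine so the example stays self-contained. The DE/rand/1/bin variant, the 0.5 threshold used to turn a continuous vector into a gene mask, and all function names (`make_toy_data`, `loo_accuracy`, `decode`, `differential_evolution`) are assumptions made for illustration.

```python
import random

def make_toy_data(n_per_class=15, n_genes=12, informative=(0, 1, 2), seed=0):
    """Synthetic expression matrix; informative genes are shifted in tumor samples."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # 0 = healthy, 1 = tumor
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in informative:
                    row[g] += 3.0
            X.append(row)
            y.append(label)
    return X, y

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    if not genes:
        return 0.0  # an empty gene set cannot classify anything
    correct = 0
    for i in range(len(X)):
        sums = {0: [0.0] * len(genes), 1: [0.0] * len(genes)}
        counts = {0: 0, 1: 0}
        for j in range(len(X)):
            if j == i:
                continue  # hold out sample i
            counts[y[j]] += 1
            for k, g in enumerate(genes):
                sums[y[j]][k] += X[j][g]
        dists = {}
        for c in (0, 1):
            dists[c] = sum((X[i][g] - sums[c][k] / counts[c]) ** 2
                           for k, g in enumerate(genes))
        pred = 0 if dists[0] <= dists[1] else 1
        correct += pred == y[i]
    return correct / len(X)

def decode(vec):
    """Map a continuous vector in [0, 1]^d to the indices of selected genes."""
    return [g for g, v in enumerate(vec) if v > 0.5]

def differential_evolution(X, y, pop_size=12, generations=15, F=0.8, CR=0.9, seed=1):
    """DE/rand/1/bin search over gene masks; fitness is classification accuracy."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    fit = [loo_accuracy(X, y, decode(v)) for v in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(n_genes)  # guarantees at least one mutated gene
            trial = []
            for j in range(n_genes):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(1.0, max(0.0, v))  # clip back into [0, 1]
                else:
                    v = pop[i][j]
                trial.append(v)
            f = loo_accuracy(X, y, decode(trial))
            if f >= fit[i]:  # greedy selection: keep the trial if it is no worse
                pop[i], fit[i] = trial, f
    best = max(range(pop_size), key=lambda k: fit[k])
    return decode(pop[best]), fit[best]

if __name__ == "__main__":
    X, y = make_toy_data()
    genes, acc = differential_evolution(X, y)
    print("selected genes:", genes, "accuracy:", acc)
```

The same fitness function could be handed to any of the other three optimizers; only the rule for proposing new candidate gene sets would change, which is what makes a like-for-like comparison of the algorithms possible.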

Results

Cancer-specific validation yields high classification accuracy. The highest accuracy, 99.10%, is achieved by the ABC algorithm on the brain lower grade glioma data. This validation is supported by a statistical test, gene ontology enrichment analysis, and KEGG pathway enrichment analysis for each cancer biomarker signature.

Conclusion

The current study identified robust genes as biomarker signatures; these biomarkers might help to accurately identify tumors of unknown origin.

Notes

  1. http://gdac.broadinstitute.org/.

Author information

Corresponding author

Correspondence to Shib Sankar Bhowmick.

Ethics declarations

Conflicts of interest

Shib Sankar Bhowmick, Debotosh Bhattacharjee and Luis Rato declare that they have no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Cite this article

Bhowmick, S.S., Bhattacharjee, D. & Rato, L. Identification of tissue-specific tumor biomarker using different optimization algorithms. Genes Genom 41, 431–443 (2019). https://doi.org/10.1007/s13258-018-0773-2

  • DOI: https://doi.org/10.1007/s13258-018-0773-2
