Gene Expression-Based Supervised Classification Models for Discriminating Early- and Late-Stage Prostate Cancer

  • Research Article
Proceedings of the National Academy of Sciences, India Section B: Biological Sciences

Abstract

Prostate cancer is one of the most prominent cancers affecting men worldwide. Detecting the cancer at an early stage is crucial for effective treatment. Machine learning refers to a class of algorithms that learn from a dataset and make predictions without being explicitly programmed for the task. Applied to gene expression data, machine learning can help discriminate cancer stage rather than relying on tissue histology and the various other diagnostic methods used in prostate cancer detection. In this study, the authors developed supervised classifiers for discriminating early- and late-stage prostate cancer using RNA sequencing (RNA-seq) gene expression data collected from The Cancer Genome Atlas (TCGA). Five supervised learning algorithms were employed: Naive Bayes, stochastic gradient descent (SGD), J48, Random Forest, and Multilayer Perceptron, each trained on the subset of the 276 most informative features extracted from the gene expression data. The performance of the models was evaluated using tenfold cross-validation. Among the trained classifiers, the SGD-based classifier performed best, with an accuracy of 86.91%, a sensitivity of 86.9%, and an area under the receiver operating characteristic curve (AUC) of 0.656. Gene Ontology and KEGG pathway enrichment analyses of these 276 gene features were also performed to categorize the genes functionally.
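
Accuracy and sensitivity are computed at a single decision threshold, whereas AUC measures how well a classifier's scores rank late-stage samples above early-stage ones across all thresholds, so the two kinds of metric can diverge, for example when one stage dominates the cohort. The Python sketch below shows one way to obtain all three from tenfold cross-validation. It is a minimal illustration under stated assumptions, not the authors' implementation: labels are coded 1 = late stage and 0 = early stage, a logistic-regression model fitted by plain stochastic gradient descent stands in for the SGD classifier, out-of-fold scores are pooled before the metrics are computed, AUC uses the rank-based (Mann-Whitney) formulation, and all function names and the synthetic data are hypothetical.

```python
import math
import random

# Minimal sketch under stated assumptions (hypothetical helpers, not the authors' code):
# label 1 = late stage, 0 = early stage; logistic regression fitted by plain SGD.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds, keeping the class ratio similar in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def train_sgd_logreg(X, y, epochs=50, lr=0.1, seed=0):
    """Fit logistic-regression weights by per-sample SGD on the log loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, X[i])))
            g = p - y[i]  # derivative of the log loss with respect to the linear score
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, k=10):
    """Pool out-of-fold scores, then return (accuracy, sensitivity, AUC)."""
    scores = [0.0] * len(y)
    for fold in stratified_folds(y, k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        w, b = train_sgd_logreg([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            scores[i] = sigmoid(b + sum(wj * xj for wj, xj in zip(w, X[i])))
    preds = [1 if s >= 0.5 else 0 for s in scores]  # single 0.5 decision threshold
    accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
    sensitivity = sum(p == 1 and t == 1 for p, t in zip(preds, y)) / sum(y)  # late-stage recall
    return accuracy, sensitivity, auc(scores, y)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic, well-separated toy data: 20 early-stage and 20 late-stage samples, 3 "genes".
    X = [[rng.gauss(-2.0, 0.5) for _ in range(3)] for _ in range(20)]
    X += [[rng.gauss(2.0, 0.5) for _ in range(3)] for _ in range(20)]
    y = [0] * 20 + [1] * 20
    acc, sens, roc_auc = cross_validate(X, y)
    print(f"accuracy={acc:.3f} sensitivity={sens:.3f} AUC={roc_auc:.3f}")
```

On real data, each row of X would hold the expression values of the selected genes for one sample. Pooling out-of-fold predictions before computing the metrics is one convention; averaging per-fold metrics is another, and the two can give slightly different numbers.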

Fig. 1
Fig. 2

Acknowledgements

The results published here are based on data generated by the TCGA Research Network (http://cancergenome.nih.gov/). The authors are grateful to the Department of Biotechnology, New Delhi, India, for providing financial assistance through Bioinformatics National Certification (BINC) (File No. PU/BINC/2016/E-04). They also thank Mr. Purshotam Das for technical support during this research work.

Author information

Corresponding author

Correspondence to R. K. Gaur.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Significance Statement

In this work, the authors used TCGA gene expression data and machine learning techniques to classify prostate cancer as early- or late-stage. From the TCGA gene expression data, the authors identified the most informative subset of gene features and used the expression of these genes to classify prostate cancer stage. The authors have shown that machine learning-based prediction methods can serve as a substitute for histology-based cancer-stage determination.
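
The method used to pick the most informative gene features is not specified here. One common choice, assumed purely for illustration, is to rank each gene by its information gain with respect to the stage label and keep the top k (k = 276 in this study). The Python sketch below discretizes each gene's expression at its (upper) median before computing the gain; the median split, the function names, and the toy genes GENE_A and GENE_B are hypothetical and do not describe the authors' procedure.

```python
import math
from collections import Counter

# Hypothetical sketch: information-gain ranking is assumed for illustration only.


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """Information gain of one gene after splitting its expression at the (upper) median."""
    cut = sorted(values)[len(values) // 2]
    high = [y for v, y in zip(values, labels) if v >= cut]
    low = [y for v, y in zip(values, labels) if v < cut]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (high, low) if part)
    return entropy(labels) - remainder


def top_k_genes(expr, labels, k):
    """Rank genes (dict: name -> per-sample expression) by information gain; keep the top k."""
    return sorted(expr, key=lambda g: info_gain(expr[g], labels), reverse=True)[:k]


# Toy example with made-up genes: GENE_A separates the stages, GENE_B does not.
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 5.1, 5.3, 4.8],
    "GENE_B": [2.0, 3.0, 2.5, 2.2, 2.9, 2.4],
}
stage = [0, 0, 0, 1, 1, 1]  # 0 = early, 1 = late
print(top_k_genes(expr, stage, k=1))  # ['GENE_A']
```

Ranking genes one at a time is simple but ignores redundancy between correlated genes, which subset-based selectors try to address. Performing the selection inside each cross-validation training fold, rather than once on the full dataset, avoids an optimistically biased performance estimate.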

About this article

Cite this article

Kumar, R., Bhanti, P., Marwal, A. et al. Gene Expression-Based Supervised Classification Models for Discriminating Early- and Late-Stage Prostate Cancer. Proc. Natl. Acad. Sci., India, Sect. B Biol. Sci. 90, 541–565 (2020). https://doi.org/10.1007/s40011-019-01127-4

  • DOI: https://doi.org/10.1007/s40011-019-01127-4

Profiles

  1. Avinash Marwal