Overcoming Interpretability in Deep Learning Cancer Classification

  • Protocol
Deep Sequencing Data Analysis

Part of the book series: Methods in Molecular Biology (MIMB, volume 2243)

Abstract

Since its inception, deep learning has revolutionized the field of machine learning and data-driven science. One such data-driven science to be transformed by deep learning is genomics. In the past decade, numerous genomics studies have adopted deep learning, with applications ranging from predicting regulatory elements to cancer classification. Despite its strong performance in these applications, deep learning is not without drawbacks. A prominent shortcoming is its lack of interpretability. Hence, the main objective of this study is to address this obstacle in deep learning cancer classification. Here we apply a feature importance scoring methodology (gradient-weighted class activation mapping, or Grad-CAM) to a quasi-recurrent neural network model that classifies cancer based on FASTA sequencing data. In this study, we formulate a nucleotide-to-genomic-region Grad-CAM scoring methodology and validate the use of this methodology for the chosen model. Consequently, this allows the Grad-CAM scoring methodology to be used for feature importance in deep learning cancer classification. The results from our study identify potential novel candidate genes, genomic elements, and mechanisms for future cancer research.
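
To make the idea of nucleotide-to-genomic-region scoring concrete, the following is a minimal sketch of Grad-CAM applied to a one-dimensional sequence, followed by aggregation of per-position scores into regions. It is not the chapter's implementation: the function names `grad_cam_1d` and `region_scores`, the toy arrays, the assumption that the feature map has one row per nucleotide, and the choice of averaging scores within each region are all illustrative assumptions. The Grad-CAM step itself follows the general definition: each channel is weighted by its gradient averaged over positions, the weighted channels are summed, and negative values are clipped to zero.

```python
import numpy as np


def grad_cam_1d(activations, gradients):
    """Grad-CAM for a 1D feature map.

    Both inputs have shape (positions, channels): the activations of a chosen
    layer and the gradients of the class score with respect to them.
    Hypothetical helper, written for illustration only.
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    # Channel weights: gradient averaged over all sequence positions.
    weights = gradients.mean(axis=0)
    # Weighted sum of channels at each position, then ReLU.
    cam = activations @ weights
    return np.maximum(cam, 0.0)


def region_scores(cam, regions):
    """Average per-position scores over half-open (start, end) intervals.

    Averaging is an assumed aggregation rule, not one taken from the chapter.
    """
    return {
        name: float(cam[start:end].mean())
        for name, (start, end) in regions.items()
    }


# Toy data: 4 positions, 2 channels (assumed one position per nucleotide).
acts = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 1.0]])
grads = np.array([[0.5, -1.0], [0.5, -1.0], [0.5, -1.0], [0.5, -1.0]])

cam = grad_cam_1d(acts, grads)
scores = region_scores(cam, {"region_a": (0, 2), "region_b": (2, 4)})
print(cam)
print(scores)
```

In practice the activations and gradients would come from the trained network's chosen layer, and the region intervals from a genome annotation; both are stand-ins here.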

Yue Yang (Alan) Teo and Artem Danilevsky are equal contributors

Author information

Corresponding author

Correspondence to Noam Shomron.

Copyright information

© 2021 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Teo, Y.Y.(A.), Danilevsky, A., Shomron, N. (2021). Overcoming Interpretability in Deep Learning Cancer Classification. In: Shomron, N. (eds) Deep Sequencing Data Analysis. Methods in Molecular Biology, vol 2243. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1103-6_15

  • DOI: https://doi.org/10.1007/978-1-0716-1103-6_15

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1102-9

  • Online ISBN: 978-1-0716-1103-6

  • eBook Packages: Springer Protocols
