
Machine Learning Approach for Feature Interpretation and Classification of Genetic Mutations Leading to Tumor and Cancer

  • Conference paper

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 672)

Abstract

Because genetic mutations are interpreted manually, it is difficult to diagnose a large number of patients and deliver their reports quickly, so the interpretation needs to be automated with a machine learning approach. To this end, the natural language processing (NLP) technique term frequency-inverse document frequency (TF-IDF) is used to represent each document as a fixed-size vector for classifying genetic mutations into the nine given classes. The main aim of this study is to identify the machine learning model best suited to this task in terms of multi-class log-loss. Another important aspect of the study is interpreting the features used by the various machine learning algorithms, since feature interpretability is essential in the healthcare domain. Logistic regression (LR) with class balancing, trained on the top 1000 words of the 3-gram TF-IDF features, outperformed the other classifiers with a test log-loss of 0.98.
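The feature and evaluation steps summarized in the abstract (TF-IDF vectors over a top-1000 vocabulary, class balancing, and multi-class log-loss) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: "3-gram" is read as word n-grams of length 1 to 3, the "top 1000" terms are chosen by total corpus frequency, the IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1, and class balancing uses the weight n_samples / (n_classes * count(c)). All function names are hypothetical, and fitting the logistic regression itself is omitted.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# ASSUMPTION: "3-gram" means word n-grams of length 1..3; names here are hypothetical.


def word_ngrams(text: str, n_max: int = 3) -> List[str]:
    """Return all word n-grams of length 1..n_max from lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(docs: Sequence[str], n_max: int = 3, top_k: int = 1000) -> List[str]:
    """Keep the top_k n-grams ordered by total term frequency across the corpus (assumed criterion)."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(word_ngrams(doc, n_max))
    # Descending frequency, then alphabetical order as a deterministic tie-break.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def tfidf_matrix(docs: Sequence[str], vocab: List[str], n_max: int = 3) -> List[List[float]]:
    """Fixed-size TF-IDF vectors (one column per vocabulary term), L2-normalised per row."""
    index = {term: j for j, term in enumerate(vocab)}
    n_docs = len(docs)
    # Raw term counts per document, restricted to the selected vocabulary.
    doc_counts = [Counter(g for g in word_ngrams(d, n_max) if g in index) for d in docs]
    df: Counter = Counter()
    for c in doc_counts:
        df.update(c.keys())  # each term counted once per document
    # ASSUMPTION: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    idf = [math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab]
    rows = []
    for c in doc_counts:
        row = [0.0] * len(vocab)
        for term, tf in c.items():
            j = index[term]
            row[j] = tf * idf[j]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row] if norm > 0 else row)
    return rows


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """weight(c) = n_samples / (n_classes * count(c)), so rarer classes weigh more in training."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def multiclass_log_loss(y_true: Sequence[int], probs: Sequence[Sequence[float]],
                        eps: float = 1e-15) -> float:
    """Mean negative log-probability of the true class; labels are 0-based column indices."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0), then renormalise the row so it sums to 1.
        clipped = [min(max(p, eps), 1 - eps) for p in row]
        total -= math.log(clipped[label] / sum(clipped))
    return total / len(y_true)


if __name__ == "__main__":
    # Toy text snippets with hypothetical 0-based class labels.
    docs = ["BRCA1 truncating mutation", "TP53 missense mutation", "BRCA1 missense variant"]
    labels = [0, 1, 0]
    vocab = build_vocabulary(docs, n_max=3, top_k=1000)
    X = tfidf_matrix(docs, vocab)
    print(len(vocab), len(X[0]))  # vector width equals vocabulary size
    print(balanced_class_weights(labels))
    print(round(multiclass_log_loss([3], [[1 / 9] * 9]), 3))  # 2.197, i.e. ln 9
```

Under this metric, predicting a uniform 1/9 for each of the nine classes scores ln 9 ≈ 2.197, which puts the reported test log-loss of 0.98 in context. Libraries such as scikit-learn provide comparable building blocks (`TfidfVectorizer(ngram_range=(1, 3), max_features=1000)`, `LogisticRegression(class_weight="balanced")`, `sklearn.metrics.log_loss`); the abstract does not state which tooling the authors used.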

Author information

Corresponding authors

Correspondence to Ankit Kumar Sah or U. Srinivasulu Reddy.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sah, A.K., Mishra, A., Reddy, U.S. (2020). Machine Learning Approach for Feature Interpretation and Classification of Genetic Mutations Leading to Tumor and Cancer. In: Sengodan, T., Murugappan, M., Misra, S. (eds) Advances in Electrical and Computer Technologies. Lecture Notes in Electrical Engineering, vol 672. Springer, Singapore. https://doi.org/10.1007/978-981-15-5558-9_35

  • DOI: https://doi.org/10.1007/978-981-15-5558-9_35

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-5557-2

  • Online ISBN: 978-981-15-5558-9

  • eBook Packages: Computer Science, Computer Science (R0)