Abstract
Because genetic mutations are currently interpreted manually, it is difficult to diagnose a large number of patients and deliver their reports quickly. The interpretation therefore needs to be automated using a machine learning approach. To this end, a natural language processing (NLP) technique, term frequency-inverse document frequency (TF-IDF), is used to represent documents as fixed-size vectors for classifying the given nine classes of genetic mutations. The main aim of this study is to identify the machine learning model that gives the best results in terms of multi-class log-loss. Another important aspect of this study is to interpret the features used by the various machine learning algorithms, since feature interpretability is very important in the healthcare domain. Logistic regression (LR) with class balancing, trained on the top 1000 terms of the 3-gram TF-IDF features, outperformed the other classifiers, giving a test log-loss of 0.98.
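The two ingredients named in the abstract, a fixed-size n-gram TF-IDF representation and the multi-class log-loss metric, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. The function names, the toy documents, the smoothed IDF formula, and the reading of "top" terms as the most frequent ones in the corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

def ngrams(tokens, max_n=3):
    """All word n-grams of length 1..max_n, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf_vectors(docs, max_n=3, top_k=1000):
    """Map each document to a fixed-size TF-IDF vector over top_k n-grams."""
    counts = [Counter(ngrams(d.lower().split(), max_n)) for d in docs]
    total = Counter()
    for c in counts:
        total.update(c)
    # Assumption: "top" means most frequent across the corpus,
    # with ties broken alphabetically.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [term for term, _ in ranked[:top_k]]
    n_docs = len(docs)
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    # Assumption: smoothed IDF, so a term found in every document
    # still receives a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for c in counts:
        length = sum(c.values()) or 1
        vectors.append([c[t] / length * idf[t] for t in vocab])
    return vocab, vectors

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, p in zip(y_true, probs):
        total -= math.log(min(max(p[label], eps), 1.0))
    return total / len(y_true)

# Toy documents (made up for illustration, not from the study's data).
docs = [
    "truncating mutation in the kinase domain",
    "missense mutation in the kinase domain",
    "no functional effect observed",
]
vocab, X = tfidf_vectors(docs, max_n=3, top_k=5)

# Baseline: a classifier that guesses uniformly over nine classes.
uniform = [[1.0 / 9.0] * 9 for _ in docs]
baseline = multiclass_log_loss([0, 4, 8], uniform)
```

Every document maps to a vector of the same length regardless of its word count, which is what makes the representation fixed-size. A uniform guess over nine classes scores ln 9 ≈ 2.197, so the reported test log-loss of 0.98 is well below that uninformed baseline.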
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sah, A.K., Mishra, A., Reddy, U.S. (2020). Machine Learning Approach for Feature Interpretation and Classification of Genetic Mutations Leading to Tumor and Cancer. In: Sengodan, T., Murugappan, M., Misra, S. (eds) Advances in Electrical and Computer Technologies. Lecture Notes in Electrical Engineering, vol 672. Springer, Singapore. https://doi.org/10.1007/978-981-15-5558-9_35
DOI: https://doi.org/10.1007/978-981-15-5558-9_35
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5557-2
Online ISBN: 978-981-15-5558-9
eBook Packages: Computer Science, Computer Science (R0)