Abstract
Hepatitis is one of the most commonly diagnosed diseases in the world. The medical industry generates an enormous amount of data, which makes it difficult to draw important conclusions, and data mining techniques are now used to address this problem. In this study, we apply several classifiers, namely KNN, Logistic Regression, Naive Bayes, Decision Tree, Support Vector Machine (SVM), and Random Forest, to the Hepatitis dataset acquired from the UCI Machine Learning Repository. Two feature selection techniques, the Chi-square test and the Boruta algorithm, are used to improve the performance of the classifiers. Finally, we compare the classifiers on various performance measures to determine which one best classifies patients as live or dead. Naive Bayes with Chi-square attribute selection performed best in terms of F1 score. Overall, Logistic Regression, Support Vector Machine, Kernel SVM, and KNN performed equally well, each with an accuracy of 90.32%.
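To make the two ideas named in the abstract concrete, the following is a minimal sketch of chi-square feature ranking and of the accuracy and F1 measures, written in plain Python. It is not the chapter's implementation: the function names, the toy feature columns, and the choice of "die" as the positive class are all assumptions made for illustration, and the data are invented rather than taken from the UCI Hepatitis dataset.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic of a categorical feature against class labels."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for f_value, f_count in feature_counts.items():
        for c_value, c_count in label_counts.items():
            # Expected cell count if feature and class were independent.
            expected = f_count * c_count / n
            score += (observed[(f_value, c_value)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the largest chi-square score."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def accuracy_and_f1(y_true, y_pred, positive):
    """Accuracy and F1 score, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Invented toy data (hypothetical, NOT the UCI Hepatitis records).
labels = ["live", "live", "live", "die", "die", "die"]
columns = {
    "ascites": ["no", "no", "no", "yes", "yes", "yes"],  # tracks the class exactly
    "sex": ["m", "f", "m", "f", "m", "f"],               # weakly related to the class
}
selected = select_top_k(columns, labels, k=1)

# Invented predictions; "die" as the positive class is an assumption.
y_true = ["live", "live", "live", "live", "die", "die"]
y_pred = ["live", "live", "live", "die", "die", "die"]
acc, f1 = accuracy_and_f1(y_true, y_pred, positive="die")
print(selected, round(acc, 4), round(f1, 4))
```

The sketch also shows why accuracy and F1 can rank classifiers differently: accuracy counts every correct prediction equally, while F1 depends only on how the positive class is handled, which matters when one outcome is much rarer than the other.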
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Panda, N., Satapathy, S.K., Mishra, S., Mallick, P.K. (2021). Empirical Study on Different Feature Selection and Classification Algorithms for Prediction of Hepatitis Disease. In: Tripathy, H.K., Mishra, S., Mallick, P.K., Panda, A.R. (eds) Technical Advancements of Machine Learning in Healthcare. Studies in Computational Intelligence, vol 936. Springer, Singapore. https://doi.org/10.1007/978-981-33-4698-7_4
DOI: https://doi.org/10.1007/978-981-33-4698-7_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4697-0
Online ISBN: 978-981-33-4698-7
eBook Packages: Intelligent Technologies and Robotics (R0)