Abstract
Purpose
Hepatitis B, caused by the Hepatitis B virus (HBV), can damage the liver without noticeable symptoms. Early detection is crucial to prevent transmission and improve recovery. This study aims to predict Hepatitis B from cost-effective laboratory test data using machine learning. It evaluates how well various algorithms predict the disease and their potential to support earlier diagnosis.
Methods
Six algorithms (support vector machine, k-nearest neighbors, logistic regression, decision tree, extreme gradient boosting, and random forest) were evaluated alongside an ensemble model. The analysis was run in two rounds: one using all features and one using only the key attributes. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address class imbalance. Each predictive technique was assessed with the confusion matrix, precision, recall, F1 score, accuracy, receiver operating characteristic (ROC) curve, area under the curve (AUC), and mean absolute error (MAE). The data came from the National Health and Nutrition Examination Survey (NHANES).
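To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is not the authors' implementation (the abstract does not describe one); the function name `smote`, its parameters, and the toy feature values are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between neighbours.

    minority: list of equal-length numeric feature lists (minority class only).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical, made-up feature values (not NHANES data): 3 positive vs 8 negative rows.
positives = [[52.0, 61.0], [48.0, 70.0], [55.0, 66.0]]
negatives = [[20.0 + i, 22.0 + i] for i in range(8)]
new_rows = smote(positives, n_new=len(negatives) - len(positives), k=2)
balanced_positives = positives + new_rows
print(len(balanced_positives), len(negatives))  # 8 8
```

Because every synthetic row is an interpolation between two real minority rows, it always stays within the range of the observed minority values. In practice, oversampling is usually applied to the training split only, so that evaluation data contains no synthetic samples; whether the study did so is not stated in the abstract.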
Results
The ensemble model attained the highest accuracy (97%) and AUC (0.997) compared with existing models. The analysis also showed that a subset of key features carries substantial predictive weight in this model.
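For readers less familiar with the reported metrics, the sketch below shows how accuracy, precision, recall, F1 score, MAE, and AUC can be derived from binary labels and predicted scores. The labels and scores are invented toy values, not the study's data, and the helper names (`confusion_counts`, `classification_summary`, `auc`) are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_summary(y_true, y_pred):
    """Compute threshold-based metrics from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    n = len(y_true)
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # For 0/1 labels, MAE is the misclassification rate (1 - accuracy).
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_summary(y_true, y_pred)
print(metrics["accuracy"], metrics["mae"])  # 0.75 0.25
print(round(auc(y_true, scores), 3))        # 0.933
```

Accuracy, precision, recall, F1, and MAE depend on the chosen decision threshold (0.5 here), whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of figures are reported separately.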
Conclusion
The study underscores the potential of the ensemble model as a valuable tool for medical practitioners, using cost-effective and readily obtainable laboratory test data to predict Hepatitis B with high accuracy. By facilitating early diagnosis and intervention, this approach offers a promising way to improve patient outcomes in Hepatitis B.
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12553-023-00802-x/MediaObjects/12553_2023_802_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12553-023-00802-x/MediaObjects/12553_2023_802_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12553-023-00802-x/MediaObjects/12553_2023_802_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12553-023-00802-x/MediaObjects/12553_2023_802_Fig4_HTML.png)
Data availability
The dataset was collected by the Centers for Disease Control and Prevention (CDC) of the United States as part of the National Health and Nutrition Examination Survey (NHANES) and is available online at https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2017
Code availability
Not applicable.
Funding
This research was supported in part by the National Science and Technology Council of the Republic of China (Taiwan), under grants NSTC 112-2221-E-027-107, NSTC 112-2221-E-027-097, and NSTC 112-2119-M-027-001.
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Conflict of interest
The authors confirm that the present study has no conflict of interest.
About this article
Cite this article
Alizargar, A., Chang, YL., Tan, TH. et al. Comparative analysis of machine learning and ensemble approaches for hepatitis B prediction using data mining with synthetic minority oversampling technique. Health Technol. 14, 109–118 (2024). https://doi.org/10.1007/s12553-023-00802-x
DOI: https://doi.org/10.1007/s12553-023-00802-x