Comparative analysis of machine learning and ensemble approaches for hepatitis B prediction using data mining with synthetic minority oversampling technique

  • Original Paper
  • Journal: Health and Technology

Abstract

Purpose

Hepatitis B, caused by the hepatitis B virus (HBV), can damage the liver without noticeable symptoms. Early detection is crucial to prevent transmission and improve recovery. This study aims to predict Hepatitis B from cost-effective laboratory test data using machine learning. It focuses on evaluating how effectively various algorithms predict the disease and on their potential to improve early diagnosis.

Methods

Six algorithms (support vector machine, k-nearest neighbors, logistic regression, decision tree, extreme gradient boosting, and random forest) were evaluated alongside an ensemble model. The analysis was run in two rounds: first with all features, then with only the key attributes. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address class imbalance. Each predictive technique was assessed using the confusion matrix, precision, recall, F1 score, accuracy, the receiver operating characteristic (ROC) curve, area under the curve (AUC), and mean absolute error (MAE). The data came from the National Health and Nutrition Examination Survey (NHANES).
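The oversampling step can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority row is a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is not the authors' implementation (the abstract does not say which library or settings they used), and the function name `smote`, the toy feature values, and the choice of `k` are all assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # random position on the segment from base to neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric features per row, NOT NHANES values.
majority = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [1.3, 1.1], [0.8, 0.9], [1.2, 1.0]]
minority = [[3.0, 3.2], [3.4, 2.9], [2.8, 3.1]]

# Generate just enough synthetic rows to match the majority class size.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

In practice, oversampling is usually applied to the training split only, so that synthetic rows do not leak into evaluation; the abstract does not state how the authors handled this. A maintained implementation is available as the `SMOTE` class in the imbalanced-learn package.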

Results

The experimental results show that the ensemble model attained the highest accuracy (97%) and AUC (0.997) compared with existing models. The analysis also showed that certain key features carry substantial predictive weight in this model.
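To make the reported metrics concrete, the sketch below shows how accuracy, precision, recall, F1 score, MAE, and AUC are computed for a binary classifier. The labels, scores, and 0.5 decision threshold are hypothetical and are not taken from the study; the function names are likewise chosen for illustration. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute threshold-based metrics from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "mae": mae}


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth and model scores (not results from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Note that for hard 0/1 predictions the MAE equals the misclassification rate, that is, one minus accuracy, whereas AUC depends only on how the scores rank the cases and not on the chosen threshold.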

Conclusion

The study underscores the potential of the ensemble model as a valuable tool for medical practitioners, leveraging cost-effective and readily obtainable laboratory test data to predict Hepatitis B with remarkable accuracy. By facilitating early diagnosis and intervention, this research presents a promising avenue to enhance patient outcomes in the context of Hepatitis B.

Data availability

The NHANES dataset, published by the Centers for Disease Control and Prevention (CDC) of the United States as part of the National Health and Nutrition Examination Survey (NHANES), is available online at https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2017

Code availability

Not applicable.

Funding

This research was supported in part by the National Science and Technology Council of the Republic of China (Taiwan), under grants NSTC 112-2221-E-027-107, NSTC 112-2221-E-027-097, and NSTC 112-2119-M-027-001.

Author information

Corresponding author

Correspondence to Tan-Hsu Tan.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Conflict of interest

The authors confirm that the present study has no conflict of interest.

About this article

Cite this article

Alizargar, A., Chang, YL., Tan, TH. et al. Comparative analysis of machine learning and ensemble approaches for hepatitis B prediction using data mining with synthetic minority oversampling technique. Health Technol. 14, 109–118 (2024). https://doi.org/10.1007/s12553-023-00802-x

  • DOI: https://doi.org/10.1007/s12553-023-00802-x
