Cascading Approach for Automatic ICD-10 Codes Association To Diseases in Bulgarian

  • Conference paper
Contemporary Methods in Bioinformatics and Biomedicine and Their Applications (BioInfoMed 2020)

Abstract

ICD-10 is the 10th revision of the International Classification of Diseases, a medical ontology for encoding diseases and related health problems that is provided by the World Health Organization. Physicians use this encoding to describe diseases in a standardized way. Because coding is currently performed manually by medical professionals, automating the task would save time and allow doctors to focus more on patient care. Automatically associating ICD-10 codes with a textual description is an extreme-scale multi-class, multi-label classification task, because of the huge number of classes (11,000) and because multiple valid ICD-10 codes can be assigned to a single diagnosis. Moreover, applying machine learning algorithms to this task requires a large training data set, which makes the task even more challenging for low-resource languages such as Bulgarian. We semi-automatically created a dataset from linked open data and public documents. The corpora contain about 350,000 diagnoses in Bulgarian, labeled with 3-sign and 4-sign ICD-10 codes. The paper presents a cascading approach to the automatic assignment of ICD-10 codes to diagnoses, which exploits the hierarchical nature of the ICD-10 classification to improve classification accuracy. This approach is tested and compared with the flat classification approach on the above-mentioned data set. Different machine learning algorithms are tested, including deep learning transformer models such as BERT. The results of the experiments provide evidence that the proposed approach, which takes the hierarchical structure of the ICD-10 codes into account, outperforms approaches that ignore it.
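
The cascading idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token-overlap scorer is a toy stand-in for the machine learning models the paper evaluates, the class and function names are hypothetical, the training examples are illustrative English paraphrases rather than the Bulgarian corpus, and each diagnosis receives a single code although the real task is multi-label. A first classifier predicts the 3-sign category, and a second classifier trained only on that category's examples then selects the 4-sign code.

```python
from collections import Counter, defaultdict


def three_sign(code):
    """Return the 3-sign parent of a 4-sign ICD-10 code, e.g. 'E11.9' -> 'E11'."""
    return code.split(".")[0]


class TokenOverlapClassifier:
    """Toy stand-in classifier (an assumption, not the paper's model).

    Each label is scored by how often the query tokens occur in that
    label's training texts, normalised by the label's total token count.
    """

    def fit(self, texts, labels):
        self.profiles = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.profiles[label].update(text.lower().split())
        return self

    def predict(self, text):
        tokens = text.lower().split()

        def score(label):
            profile = self.profiles[label]
            total = sum(profile.values()) or 1
            return sum(profile[t] for t in tokens) / total

        # Sorting the labels makes tie-breaking deterministic.
        return max(sorted(self.profiles), key=score)


class CascadingClassifier:
    """Stage 1 picks the 3-sign category; stage 2 picks a 4-sign code inside it."""

    def fit(self, texts, codes):
        parents = [three_sign(code) for code in codes]
        self.top = TokenOverlapClassifier().fit(texts, parents)

        # Group the training examples by their 3-sign parent.
        grouped = defaultdict(lambda: ([], []))
        for text, code, parent in zip(texts, codes, parents):
            grouped[parent][0].append(text)
            grouped[parent][1].append(code)

        # One second-stage classifier per 3-sign category.
        self.children = {
            parent: TokenOverlapClassifier().fit(group_texts, group_codes)
            for parent, (group_texts, group_codes) in grouped.items()
        }
        return self

    def predict(self, text):
        parent = self.top.predict(text)
        return self.children[parent].predict(text)


# Illustrative toy data only; not taken from the paper's dataset.
train_texts = [
    "type 2 diabetes mellitus with renal complications",
    "type 2 diabetes mellitus without complications",
    "acute transmural myocardial infarction of anterior wall",
    "acute transmural myocardial infarction of inferior wall",
    "predominantly allergic asthma",
    "nonallergic asthma",
]
train_codes = ["E11.2", "E11.9", "I21.0", "I21.1", "J45.0", "J45.1"]

model = CascadingClassifier().fit(train_texts, train_codes)
print(model.predict("myocardial infarction inferior wall"))  # I21.1
```

A flat classifier would instead choose among all 4-sign codes in one step. In the cascade, the second stage only has to discriminate between sibling codes, which is the property of the ICD-10 hierarchy that the abstract says the approach relies on.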

Notes

  1. http://www.snomed.org/.

  2. https://www.cdc.gov/nchs/icd/icd9.htm.

  3. https://www.who.int/classifications/icd/icdonlineversions/en/.

  4. https://ncpha.government.bg/bg/2019-02-19-23-22-18/icd-10.

  5. https://github.com/BorisVelichkov/ICD10-Medical-Data.

  6. https://www.wikidata.org/.

  7. Scikit-Learn https://scikit-learn.org/stable/.

  8. nlpaug library https://pypi.org/project/nlpaug/.

  9. https://huggingface.co/transformers/tokenizer_summary.html.

  10. BTB stopword list in Bulgarian: http://bultreebank.org/wp-content/uploads/2017/04/BTB-StopWordList.zip.

  11. BulStem stemmer: https://github.com/mhardalov/bulstem-py.

  12. https://www.kaggle.com/nikkisharma536/fastai-toxic.

  13. Multilingual BERT https://github.com/google-research/bert/blob/master/multilingual.md.

  14. https://www.kaggle.com/nikkisharma536/fastai-toxic.

  15. https://docs.fast.ai/text.models.awdlstm.

Acknowledgments

This research is partially funded by the Bulgarian Ministry of Education and Science, grant DO1-200/2018 ‘Electronic health care in Bulgaria’ (e-Zdrave) and the Bulgarian National Science Fund, grant DN-02/4-2016 ‘Specialized Data Mining Methods Based on Semantic Attributes’ (IZIDA).

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Velichkov, B. et al. (2022). Cascading Approach for Automatic ICD-10 Codes Association To Diseases in Bulgarian. In: Sotirov, S.S., Pencheva, T., Kacprzyk, J., Atanassov, K.T., Sotirova, E., Staneva, G. (eds) Contemporary Methods in Bioinformatics and Biomedicine and Their Applications. BioInfoMed 2020. Lecture Notes in Networks and Systems, vol 374. Springer, Cham. https://doi.org/10.1007/978-3-030-96638-6_27

  • DOI: https://doi.org/10.1007/978-3-030-96638-6_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96637-9

  • Online ISBN: 978-3-030-96638-6

  • eBook Packages: Engineering, Engineering (R0)
