Diagnosing crop diseases based on domain-adaptive pre-training BERT of electronic medical records

Applied Intelligence

Abstract

Crop Electronic Medical Records (CEMRs) contain rich and diverse information about disease characteristics, which is highly valuable to plant doctors diagnosing disease. However, mining CEMRs presents challenges that remain unstudied, such as the lack of publicly available datasets, unlabeled data, and the many agricultural and slang terms in the text. This study proposes a crop disease diagnosis model, CdsBERT-RCNN, which combines Bidirectional Encoder Representations from Transformers specific to the crop disease domain (CdsBERT) with RCNN. First, a crop disease corpus is constructed for domain-adaptive pre-training; second, semantic features of CEMRs are extracted by CdsBERT; third, distinct contextual information is further extracted and the disease is diagnosed by RCNN. A CEMR dataset containing 32 diseases was constructed to validate the model. Experiments showed that the proposed method diagnoses crop diseases effectively, with an F1-score of 85.63% and an accuracy of 85.65%. It outperformed widely used neural network models (CNN, DPCNN, RCNN, RNN, attention-based RNN, FastText, and Transformer) because self-supervised pre-training supplies more information, and it outperformed generic-domain pre-trained language models (BERT, ERNIE, XLNet, and RoBERTa) because its data distribution is better suited to the crop disease domain and its fine-tuning strategies are effective. Ablation studies further demonstrated the value of domain-adaptive pre-training and RCNN in the model. The results demonstrate the effectiveness of the framework for CEMR-based disease diagnosis, with potential applications in electronic medical record systems and artificial intelligence for crop disease management.
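
The two metrics reported in the abstract can be illustrated with a short sketch. The abstract does not state how the F1-score was averaged across the 32 diseases, so macro-averaging is an assumption here; the function names and the disease labels in the example are hypothetical and do not come from the CEMR dataset.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted disease equals the labelled disease."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro-averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct diagnosis for class t
        else:
            fp[p] += 1   # class p was predicted but wrong
            fn[t] += 1   # class t was missed
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean
        # of precision and recall for class c.
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for four records, for illustration only.
labels = ["rust", "rust", "blight", "mildew"]
preds = ["rust", "blight", "blight", "mildew"]
print(f"accuracy = {accuracy(labels, preds):.4f}")  # accuracy = 0.7500
print(f"macro F1 = {macro_f1(labels, preds):.4f}")  # macro F1 = 0.7778
```

With many classes of unequal size, accuracy and macro F1 can diverge noticeably, which is one reason to report both.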

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Notes

  1. Plantwise is a global program managed by CABI that aims to strengthen plant health systems through plant clinics in over 30 countries in Africa, Asia and Latin America: www.plantwise.org.

Funding

This study was funded by the National Natural Science Foundation of China (62176261).

Author information

Corresponding author

Correspondence to Lingxian Zhang.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ding, J., Li, B., Xu, C. et al. Diagnosing crop diseases based on domain-adaptive pre-training BERT of electronic medical records. Appl Intell 53, 15979–15992 (2023). https://doi.org/10.1007/s10489-022-04346-x

  • DOI: https://doi.org/10.1007/s10489-022-04346-x
