Diagnosing crop diseases based on domain-adaptive pre-training BERT of electronic medical records

Ding, Junqi; Li, Bo; Xu, Chang; Qiao, Yan; Zhang, Lingxian

doi:10.1007/s10489-022-04346-x

Published: 01 December 2022

Applied Intelligence, Volume 53, pages 15979–15992 (2023)

Junqi Ding¹, Bo Li², Chang Xu¹, Yan Qiao³ & Lingxian Zhang¹,⁴,⁵ (ORCID: orcid.org/0000-0002-8665-7075)

Abstract

Crop Electronic Medical Records (CEMRs) contain diverse information about disease characteristics that is highly valuable for supporting plant doctors in diagnosing diseases. However, mining CEMRs poses challenges that remain unstudied, including the lack of publicly available datasets, largely unlabeled data, and the many agricultural and slang terms in the text. This study proposes CdsBERT-RCNN, a crop disease diagnosis model that combines Bidirectional Encoder Representations from Transformers adapted to the crop disease domain (CdsBERT) with a recurrent convolutional neural network (RCNN). First, a crop disease corpus is constructed for domain-adaptive pre-training; second, CdsBERT extracts semantic features from CEMRs; third, the RCNN extracts further contextual information and performs the disease diagnosis. To validate the model, a CEMR dataset covering 32 diseases was constructed. Experiments showed that the proposed method diagnosed crop diseases effectively, with an F1-score of 85.63% and an accuracy of 85.65%. It outperformed widely used neural network models, namely CNN, DPCNN, RCNN, RNN, attention-based RNN, FastText, and Transformer, owing to the additional information obtained through self-supervised pre-training. It also outperformed general-domain pre-trained language models, namely BERT, ERNIE, XLNet, and RoBERTa, owing to a training data distribution better matched to the crop disease domain and to effective fine-tuning strategies. Ablation studies further demonstrated the value of domain-adaptive pre-training (DPAT) and of the RCNN component. These results demonstrate the effectiveness of the framework for CEMR-based disease diagnosis, with potential applications in electronic medical record systems and in artificial intelligence for crop disease management.
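The abstract does not say how the domain-adaptive pre-training step is carried out. A common approach is to continue BERT's masked-language-model (MLM) objective on the in-domain corpus. The minimal sketch below shows the standard BERT masking rule in plain Python, assuming CdsBERT follows it. The function name `mask_tokens`, the constant `MASK_TOKEN`, the 0.15 selection rate, and the example tokens and vocabulary are illustrative assumptions, not details taken from the paper.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = "[MASK]"  # assumption: BERT's standard mask symbol


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Hypothetical helper: BERT-style masking for MLM pre-training.

    Each token is selected as a prediction target with probability mask_prob.
    A selected token is replaced by [MASK] 80% of the time, by a random
    vocabulary token 10% of the time, and left unchanged 10% of the time.
    Returns (model_inputs, labels); labels[i] is None where no loss is computed.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not a prediction target
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_TOKEN
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)
        # otherwise keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    # Hypothetical tokenized CEMR sentence and toy vocabulary.
    sentence = ["tomato", "leaves", "show", "brown", "concentric", "spots"]
    toy_vocab = ["rice", "wheat", "blight", "rust", "yellowing", "wilt"]
    masked, targets = mask_tokens(sentence, toy_vocab, rng=random.Random(42))
    print(masked)
    print(targets)
```

In a real pre-training run the masked inputs would be mapped to vocabulary IDs and fed to the encoder, with the loss computed only at positions whose label is not None. Leaving some targets unchanged and randomizing others reduces the mismatch between pre-training inputs, which contain [MASK], and fine-tuning inputs, which do not.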

Data availability

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Notes

Plantwise is a global, CABI-managed program that aims to strengthen plant health systems through plant clinics in more than 30 countries in Africa, Asia and Latin America: www.plantwise.org.

Funding

This study was funded by the National Natural Science Foundation of China (62176261).

Author information

Authors and Affiliations

China Agricultural University, Beijing, 100083, China
Junqi Ding, Chang Xu & Lingxian Zhang
School of Economics and Management, Beijing Information Science and Technology University, Beijing, 100192, China
Bo Li
Beijing Plant Protection Station, Beijing, 100029, China
Yan Qiao
Key Laboratory of Agricultural Informationization Standardization, Ministry of Agriculture and Rural Affairs, Beijing, China
Lingxian Zhang
College of Information and Electrical Engineering, China Agricultural University, 209# No.17 Qinghua Donglu, Haidian District, Beijing, 100083, China
Lingxian Zhang

Corresponding author

Correspondence to Lingxian Zhang.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ding, J., Li, B., Xu, C. et al. Diagnosing crop diseases based on domain-adaptive pre-training BERT of electronic medical records. Appl Intell 53, 15979–15992 (2023). https://doi.org/10.1007/s10489-022-04346-x

Accepted: 16 November 2022
Published: 01 December 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10489-022-04346-x
