Abstract
Crop electronic medical records (CEMRs) contain rich information about disease characteristics, which is valuable support for plant doctors diagnosing disease. However, mining CEMRs presents challenges that remain largely unstudied, such as the lack of publicly available datasets, unlabeled data, and the many agricultural and slang terms in the text. This study proposes a crop disease diagnosis model, CdsBERT-RCNN, which combines Bidirectional Encoder Representations from Transformers adapted to the crop disease domain (CdsBERT) with a recurrent convolutional neural network (RCNN). First, a crop disease corpus is constructed for domain-adaptive pre-training; second, semantic features of CEMRs are extracted by CdsBERT; third, distinct contextual information is further extracted and the disease is diagnosed by the RCNN. A CEMR dataset containing 32 diseases was constructed to validate the model. Experiments showed that the proposed method could effectively diagnose crop diseases, with an F1-score of 85.63% and an accuracy of 85.65%. The proposed method outperformed widely used neural network models (CNN, DPCNN, RCNN, RNN, attention-based RNN, FastText, and Transformer), owing to the additional information obtained through self-supervised pre-training. It also outperformed general-domain pre-trained language models (BERT, ERNIE, XLNet, and RoBERTa), owing to a data distribution better suited to the crop disease domain and to effective fine-tuning strategies. Ablation studies further demonstrated the value of domain-adaptive pre-training and the RCNN in the model. The results demonstrate the effectiveness of the framework for CEMR-based disease diagnosis, with potential applications in electronic medical record systems and artificial intelligence for crop disease management.
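To make the two reported metrics concrete, the following is a minimal sketch of how accuracy and an F1-score can be computed for a multi-class diagnosis task. The abstract does not state which averaging scheme was used for F1, so macro-averaging is an assumption here, and the function names and toy disease labels are hypothetical rather than taken from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted disease matches the label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed
    classes = set(y_true) | set(y_pred)
    # Per-class F1 = 2*TP / (2*TP + FP + FN); every class seen has a non-zero denominator.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(scores) / len(scores)


# Hypothetical toy labels, not data from the study.
y_true = ["rust", "rust", "blight", "mildew"]
y_pred = ["rust", "blight", "blight", "mildew"]
print(f"accuracy={accuracy(y_true, y_pred):.4f}, macro-F1={macro_f1(y_true, y_pred):.4f}")
```

With 32 disease classes, macro-averaging weights rare and common diseases equally, whereas accuracy is dominated by the frequent ones. If the study used a different scheme, such as weighted averaging, the F1 computation would differ accordingly.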
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Notes
Plantwise is a global, CABI-managed program that aims to strengthen plant health systems through plant clinics in over 30 countries in Africa, Asia, and Latin America: www.plantwise.org.
Funding
This study was funded by the National Natural Science Foundation of China (62176261).
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Ding, J., Li, B., Xu, C. et al. Diagnosing crop diseases based on domain-adaptive pre-training BERT of electronic medical records. Appl Intell 53, 15979–15992 (2023). https://doi.org/10.1007/s10489-022-04346-x
DOI: https://doi.org/10.1007/s10489-022-04346-x