Abstract
Machine learning technology has been widely adopted as a cost-saving document prioritization approach in systematic literature reviews related to human health risk assessments. Supervised approaches use a training dataset, a relatively small set of documents with human-annotated labels indicating the topic of each document, to build models that automatically predict the labels of a much larger set of unlabelled documents. Deep learning algorithms form a branch of machine learning that relies on complex neural network architectures to learn the features of the object to be classified. Although deep learning algorithms have until recently been applied mainly to image, video, and audio classification, they are increasingly being applied to text classification problems. To explore the potential advantages and practicalities of using deep learning algorithms in the document prioritization step of systematic literature reviews, we compare the performance of the most commonly used deep learning architectures with that of more traditional machine learning models, using a dataset of approximately 7000 abstracts from the scientific literature related to the chemical arsenic. The dataset was previously annotated by subject matter experts for relevance to toxicological mode of action. We examine the relative performance of each algorithm type at different levels of training by sequentially expanding the training dataset to generate a learning curve. We find that deep learning offers increased performance in some instances, but compared to baseline traditional machine learning algorithms it also requires more training data, longer model training time, more computational power, and more labor-intensive algorithm tuning.
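The learning-curve procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the classifier is a small multinomial naive Bayes written from scratch as a stand-in for a traditional baseline, the corpus is synthetic, accuracy is used as the metric purely for simplicity, and all function names, vocabularies, and training sizes are hypothetical.

```python
import math
import random
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model (stand-in baseline, not the paper's model)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(text.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the label (0 or 1) with the highest smoothed log-probability."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        # Smoothed prior, so a class missing from a small training slice still works.
        score = math.log((class_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def learning_curve(train, test, sizes):
    """Train on growing prefixes of `train` and score each model on a fixed `test` set."""
    curve = []
    for n in sizes:
        subset = train[:n]
        model = train_nb([t for t, _ in subset], [y for _, y in subset])
        correct = sum(predict_nb(model, t) == y for t, y in test)
        curve.append((n, correct / len(test)))
    return curve


def make_toy_corpus(n_docs, seed=0):
    """Generate synthetic 'abstracts'; the vocabularies are invented for illustration."""
    rng = random.Random(seed)
    relevant = ["arsenic", "toxicity", "oxidative", "stress", "methylation"]
    other = ["groundwater", "sampling", "sensor", "filtration", "survey"]
    shared = ["study", "results", "exposure", "analysis"]
    corpus = []
    for _ in range(n_docs):
        label = rng.randint(0, 1)  # 1 = relevant, 0 = not relevant
        topic = relevant if label == 1 else other
        words = [rng.choice(topic if rng.random() < 0.6 else shared) for _ in range(12)]
        corpus.append((" ".join(words), label))
    return corpus


corpus = make_toy_corpus(400)
train, test = corpus[:300], corpus[300:]  # hold the test set fixed across all sizes
curve = learning_curve(train, test, sizes=[10, 50, 100, 200, 300])
for n, acc in curve:
    print(f"{n:4d} training docs -> accuracy {acc:.3f}")
```

Holding the test set fixed while only the training prefix grows keeps the points on the curve comparable. Running the same loop with a different model in place of `train_nb` and `predict_nb` would give a second curve for a side-by-side comparison of algorithm types.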
Acknowledgements
The development of the methods presented here was fully supported by ICF. The results presented here were generated for the purposes of this paper alone.
Cite this article
Varghese, A., Agyeman-Badu, G. & Cawley, M. Deep learning in automated text classification: a case study using toxicological abstracts. Environ Syst Decis 40, 465–479 (2020). https://doi.org/10.1007/s10669-020-09763-2
DOI: https://doi.org/10.1007/s10669-020-09763-2