Mining Medication-Effect Relations from Twitter Data Using Pre-trained Transformer Language Model

Jiang, Keyuan; Zhang, Dingkai; Bernard, Gordon R.

doi:10.1007/978-3-030-93733-1_35

Keyuan Jiang (ORCID: orcid.org/0000-0002-1565-3202),
Dingkai Zhang &
Gordon R. Bernard

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1525)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Pharmacovigilance aims to promote the safe use of pharmaceutical products by continuously assessing the safety of marketed medications. An active area of this effort is the use of social media such as Twitter as an alternative data source for gathering patient-reported experiences with medication use. Published work has focused on identifying expressions of adverse effects in social media data while giving little attention to the relationship between a mentioned medication and any mentioned effect expressions. In this study, we investigated the discovery of medication-effect relations from Twitter text using BERT, a transformer-based language model, with fine-tuning. Our results on a corpus of 9,516 annotated tweets show that the overall performance of our method is superior to that of the four baseline approaches studied. The outcome of this work may help automate and accelerate the discovery of potentially unreported medication effects from patient-reported experiences documented in the large volume of social media data.

Acknowledgement

The authors wish to thank the anonymous reviewers for their critiques and constructive comments, which improved this manuscript.

Author information

Authors and Affiliations

Purdue University Northwest, Hammond, IN, 46323, USA
Keyuan Jiang
Ningbo City College of Vocational Technology, Ningbo, Zhejiang, China
Dingkai Zhang
Vanderbilt University, Nashville, TN, 37232, USA
Gordon R. Bernard

Corresponding author

Correspondence to Keyuan Jiang.

Editor information

Editors and Affiliations

IKIM, Ruhr-University Bochum, Bochum, Germany
Michael Kamp
University of Sydney, Sydney, NSW, Australia
Irena Koprinska
University of Namur, Namur, Belgium
Adrien Bibal
University of Rennes 1, Rennes, France
Tassadit Bouadi
University of Namur, Namur, Belgium
Benoît Frénay
Inria, Rennes, France
Luis Galárraga
University of Antwerp, Antwerp, Belgium
José Oramas
Ruhr University Bochum, Bochum, Germany
Linara Adilova
Royal Holloway University of London, Egham, UK
Yamuna Krishnamurthy
Ghent University, Ghent, Belgium
Bo Kang
Université Jean Monnet, Saint-Etienne cedex 2, France
Christine Largeron
Ghent University, Ghent, Belgium
Jefrey Lijffijt
Telecom Paris, Paris, France
Tiphaine Viard
University of Bonn, Bonn, Germany
Pascal Welke
Norwegian University of Science and Technology, Trondheim, Norway
Massimiliano Ruocco
BI Norwegian Business School, Oslo, Norway
Erlend Aune
University of Pisa, Pisa, Italy
Claudio Gallicchio
University of Duisburg-Essen, Essen, Germany
Gregor Schiele
Graz University of Technology, Graz, Austria
Franz Pernkopf
Xilinx Research, Dublin, Ireland
Michaela Blott
Heidelberg University, Heidelberg, Germany
Holger Fröning
Heidelberg University, Heidelberg, Germany
Günther Schindler
University of Pisa, Pisa, Italy
Riccardo Guidotti
University of Pisa, Pisa, Italy
Anna Monreale
ISTI-CNR, Pisa, Italy
Salvatore Rinzivillo
Warsaw University of Technology, Warsaw, Poland
Przemyslaw Biecek
Freie Universität Berlin, Berlin, Germany
Eirini Ntoutsi
Eindhoven University of Technology, Eindhoven, The Netherlands
Mykola Pechenizkiy
Leibniz University Hannover, Hannover, Germany
Bodo Rosenhahn
University of Sussex, Brighton, UK
Christopher Buckley
University of Chieti-Pescara, Chieti, Italy
Daniela Cialfi
Radboud University Nijmegen, Nijmegen, The Netherlands
Pablo Lanillos
McGill University, Montreal, Canada
Maxwell Ramstead
Ghent University, Ghent, Belgium
Tim Verbelen
University of Lisbon, Lisboa, Portugal
Pedro M. Ferreira
University of Bari Aldo Moro, Bari, Italy
Giuseppina Andresini
Universita di Bari Aldo Moro, Bari, Italy
Donato Malerba
University of Lisbon, Lisbon, Portugal
Ibéria Medeiros
Shenzhen University, Shenzhen, China
Philippe Fournier-Viger
Harbin Institute of Technology, Harbin, China
M. Saqib Nawaz
University of Córdoba, Córdoba, Spain
Sebastian Ventura
Peking University, Beijing, China
Meng Sun
Noah's Ark Lab, Huawei, Beijing, China
Min Zhou
UniCredit, Milan, Italy
Valerio Bitetta
UniCredit, Rome, Italy
Ilaria Bordino
UniCredit, Milan, Italy
Andrea Ferretti
UniCredit, Rome, Italy
Francesco Gullo
ENEA Headquarters, Portici, Italy
Giovanni Ponti
UniCredit, Rome, Italy
Lorenzo Severini
University of Porto, Porto, Portugal
Rita Ribeiro
University of Porto, Porto, Portugal
João Gama
UPC BarcelonaTech, Barcelona, Spain
Ricard Gavaldà
Northwestern University, Chicago, IL, USA
Lee Cooper
PD Personalised Healthcare, Basel, Switzerland
Naghmeh Ghazaleh
University of Lausanne, Lausanne, Switzerland
Jonas Richiardi
ETH Zurich, Basel, Switzerland
Damian Roqueiro
F. Hoffmann–La Roche Ltd, Basel, Switzerland
Diego Saldana Miranda
Novartis Pharma AG, Basel, Switzerland
Konstantinos Sechidis
University of Lisbon, Lisbon, Portugal
Guilherme Graça

About this paper

Cite this paper

Jiang, K., Zhang, D., Bernard, G.R. (2021). Mining Medication-Effect Relations from Twitter Data Using Pre-trained Transformer Language Model. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1525. Springer, Cham. https://doi.org/10.1007/978-3-030-93733-1_35

DOI: https://doi.org/10.1007/978-3-030-93733-1_35
Published: 18 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93732-4
Online ISBN: 978-3-030-93733-1
eBook Packages: Computer Science, Computer Science (R0)
