Data Mining and Knowledge Discovery, Volume 33, Issue 4, pp 848–870

Automatic discovery of adverse reactions through Chinese social media

  • Mengxue Zhang
  • Meizhuo Zhang
  • Chen Ge
  • Quanyang Liu
  • Jiemin Wang
  • Jia Wei
  • Kenny Q. Zhu


Despite tremendous efforts made before the release of every drug, some adverse drug reactions (ADRs) may go undetected and thus cause harm to both users and pharmaceutical companies. One plausible venue for collecting evidence of such ADRs is online social media, where patients and doctors discuss medical conditions and their treatments. There is substantial previous research on ADR extraction from English online forums, but very little on Chinese data. In this paper, we use posts from two popular Chinese social media platforms as the original dataset. We propose a semi-supervised learning framework that detects mentions of medications and colloquial ADR terms and extracts lexico-syntactic features from natural language text to recognize positive associations between drug use and ADRs. The key contribution is an automatic label generation algorithm, which requires very little manual annotation. This bootstrapping algorithm could also be applied to English data. Our results indicate that the algorithm outperforms the hidden Markov model and conditional random fields. With this approach, we discovered a large number of side effects for a variety of popular medicines in real-world scenarios.


Adverse drug reaction · Chinese social media · Natural language processing



This work has been partially supported by AstraZeneca and NSFC grant 91646205.



Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  2. R&D Information, AstraZeneca, Shanghai, China
