
Application of Information Retrieval Approaches to Case Classification in the Vaccine Adverse Event Reporting System

Drug Safety



Automating the classification of adverse event reports is an important step toward improving the efficiency of vaccine safety surveillance. We previously showed that reports can be classified using features extracted from their text.


The aim of this study was to use the information encoded in the Medical Dictionary for Regulatory Activities (MedDRA®) in the US Vaccine Adverse Event Reporting System (VAERS) to support and evaluate two classification approaches: a multiple information retrieval strategy and a rule-based approach. To evaluate the performance of these approaches, we selected the conditions of anaphylaxis and Guillain–Barré syndrome (GBS).


We used MedDRA® Preferred Terms stored in VAERS and two standardized medical terminologies, the Brighton Collaboration (BC) case definitions and the Standardized MedDRA® Queries (SMQs), to classify two sets of reports for GBS and anaphylaxis. Two approaches were used: (i) the rule-based instruments provided by the two terminologies (the Automatic Brighton Classification [ABC] tool and the SMQ algorithms); and (ii) the vector space model.
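As a rough illustration of the vector space idea, a report and a case definition can each be represented as a vector of term counts and compared by cosine similarity. The sketch below is not the study's implementation: the raw-count weighting, the `threshold` value, the function names, and the example terms are all assumptions chosen for illustration, and the term lists are not the actual SMQ or BC term sets.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as matching nothing
    return dot / (norm_a * norm_b)


def classify(report_terms, query_terms, threshold=0.3):
    """Score a report against a query; flag it when the score reaches the threshold.

    The threshold of 0.3 is a hypothetical cut-off, not a value from the study.
    """
    report_vec = Counter(t.lower() for t in report_terms)
    query_vec = Counter(t.lower() for t in query_terms)
    score = cosine_similarity(report_vec, query_vec)
    return score, score >= threshold


# Illustrative terms only; not the real case-definition term list.
query = ["Urticaria", "Dyspnoea", "Hypotension", "Wheezing"]
report = ["Urticaria", "Dyspnoea", "Pyrexia"]
score, flagged = classify(report, query)
print(f"similarity={score:.3f}, flagged={flagged}")
```

Unlike a rule-based algorithm, which returns a yes/no answer from fixed term combinations, this kind of score is continuous, so the cut-off can be tuned to trade sensitivity against specificity.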


We found that the rule-based instruments, particularly the SMQ algorithms, achieved high specificity. This came at a cost in sensitivity for all but the narrow GBS SMQ algorithm, which outperformed the remaining approaches (sensitivity in the testing set was 99.06% for this algorithm vs. 93.40% for the vector space model). For anaphylaxis, the vector space model achieved higher sensitivity in the testing set than the best values of both the ABC tool and the SMQ algorithms (86.44% vs. 64.11% and 52.54%, respectively).
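For readers less familiar with the metrics, sensitivity is the share of true cases that a classifier flags, and specificity is the share of non-cases that it leaves unflagged. A minimal sketch of the calculation follows; the function name and the toy labels are assumptions for illustration and do not reproduce the study's data.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for paired boolean labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1  # true case, flagged
        elif p and not a:
            fp += 1  # non-case, flagged
        elif not p and a:
            fn += 1  # true case, missed
        else:
            tn += 1  # non-case, not flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels only: True means "classified as a case" / "confirmed case".
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
sens, spec = sensitivity_specificity(predicted, actual)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```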


Our results showed the superiority of the vector space model over the existing rule-based approaches irrespective of the standardized medical knowledge represented by either the SMQ or the BC case definition. The vector space model might make automation of case definitions for spontaneous report review more efficient than current rule-based approaches, allowing more time for critical assessment and decision making by pharmacovigilance experts.



  1. MedDRA® terminology is the international medical terminology developed under the auspices of the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH).




We would like to thank Hector Izurieta at the Office of Biostatistics and Epidemiology for the fruitful discussions, and Jan Bonhoeffer and Benus Becker at the BC for their guidance on how to use the ABC tool.

MedDRA® is a registered trademark owned by the International Federation of Pharmaceutical Manufacturers and Associations (IFPMA) on behalf of the International Conference on Harmonization (ICH).

This project was supported in part by the appointment of Taxiarchis Botsis to the Research Participation Program at the Center for Biologics Evaluation and Research administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the US Department of Energy and the US FDA.

Author contributions

Taxiarchis Botsis conceived the idea, performed the analysis and authored the paper; Emily Jane Woo read adverse event reports and determined the diagnostic level of certainty regarding GBS and edited the manuscript; Robert Ball led the overall effort in applying medical informatics to adverse event evaluation, defined the outcomes for the study, selected terms to represent the case definitions and edited the manuscript.

Competing interests

Taxiarchis Botsis, Emily Jane Woo and Robert Ball have no conflicts of interest to declare that are directly relevant to the content of this study.

Author information

Corresponding author

Correspondence to Taxiarchis Botsis.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 149 kb)


About this article

Cite this article

Botsis, T., Woo, E.J. & Ball, R. Application of Information Retrieval Approaches to Case Classification in the Vaccine Adverse Event Reporting System. Drug Saf 36, 573–582 (2013).
