Automated Classification of Quality Defect Issues Relating to Substandard Medicines Using a Hybrid Machine Learning and Rule-Based Approach

  • Original Research Article
Drug Safety

Abstract

Background and Objective

Substandard medicines can lead to serious safety issues affecting public health; however, the nature of such issues can be widely heterogeneous. Health product regulators seek to prioritise critical product quality defects for review to ensure that prompt risk mitigation measures are taken. This study aims to classify the nature of issues for substandard medicines using machine learning to augment a risk-based and timely review of cases.

Methods

A hybrid model combining a machine learning algorithm with a keyword-based model was developed to classify quality issues using text relating to substandard medicines (CISTERM). The nature of the issue in each product defect case was classified using Medical Dictionary for Regulatory Activities–Health Sciences Authority (MedDRA–HSA) lowest-level terms.
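
As a rough illustration of how a keyword layer might sit alongside a machine learning classifier, the following Python sketch merges the two sets of labels. It is not the CISTERM implementation: the keyword lists, the `toy_model` stand-in, the 0.5 threshold, and the rule for merging the two outputs (a simple union, with negative keywords vetoing a positive match) are all assumptions made for this example. The study's actual keywords are in its Online Resources.

```python
import re
from typing import Callable, Dict, List

# Hypothetical keyword lists, for illustration only.
POSITIVE_KEYWORDS: Dict[str, List[str]] = {
    "Product adulterated and/or contains prohibited substance": ["adulterated", "undeclared"],
    "Out of specification or out of trend test result": ["out of specification", "oos"],
}
NEGATIVE_KEYWORDS: Dict[str, List[str]] = {
    "Product adulterated and/or contains prohibited substance": ["not adulterated"],
}


def _mentions(text: str, phrase: str) -> bool:
    """Case-insensitive match of a whole word or phrase."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE) is not None


def keyword_labels(text: str) -> List[str]:
    """Labels whose positive keywords match and whose negative keywords do not."""
    labels = []
    for label, positives in POSITIVE_KEYWORDS.items():
        hit = any(_mentions(text, p) for p in positives)
        vetoed = any(_mentions(text, n) for n in NEGATIVE_KEYWORDS.get(label, []))
        if hit and not vetoed:
            labels.append(label)
    return labels


def hybrid_classify(
    text: str,
    ml_scores: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
) -> List[str]:
    """Union of labels from the model (score >= threshold) and the keyword rules."""
    labels = {label for label, score in ml_scores(text).items() if score >= threshold}
    labels.update(keyword_labels(text))
    return sorted(labels)


def toy_model(text: str) -> Dict[str, float]:
    """Stand-in for a trained text classifier; not the study's model."""
    return {"Manufacturing non-compliance": 0.8 if _mentions(text, "GMP") else 0.1}


print(hybrid_classify("GMP audit found OOS assay result", toy_model))
```

In this sketch a case can receive more than one label, and the keyword layer can add a label the model missed but cannot remove one the model assigned. Other merge rules are equally plausible.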

Results

Product defect cases received from January 2010 to December 2021 were used for training (n = 11,082) and for testing (n = 2771). The machine learning model achieved a good recall (precision) of 92% (96%) for ‘Product adulterated and/or contains prohibited substance’, 86% (90%) for ‘Out of specification or out of trend test result’ and 90% (91%) for ‘Manufacturing non-compliance’.
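
For readers less familiar with the two metrics, recall is the share of true cases of a class that the model found, TP / (TP + FN), and precision is the share of the model's predictions for that class that were correct, TP / (TP + FP). The Python sketch below computes both for one class. The labels and predictions are made-up toy data, not the study's, and the single-label setup is a simplifying assumption.

```python
from typing import List, Tuple


def recall_precision(y_true: List[str], y_pred: List[str], label: str) -> Tuple[float, float]:
    """Return (recall, precision) for one class from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly found
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Toy data (hypothetical): four true "A" cases and four true "B" cases.
y_true = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "B", "B", "B"]

print(recall_precision(y_true, y_pred, "A"))  # (0.75, 1.0)
print(recall_precision(y_true, y_pred, "B"))  # (1.0, 0.8)
```

Here one "A" case is misread as "B", which lowers recall for "A" and precision for "B" while leaving the other two figures at 100%.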

Conclusion

Post-market surveillance of substandard medicines remains a key activity for drug regulatory authorities. Combining a machine learning algorithm with a keyword-based model can help to prioritise the review of product quality defect issues in a timely manner.

Acknowledgements

The authors would like to thank Wanyu Zheng and Dr Han Leong Goh from Synapxe Pte Ltd for their helpful discussions.

Author information

Corresponding author

Correspondence to Desmond Chun Hwee Teo.

Ethics declarations

Funding

This initiative received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Conflict of Interest

The authors have no conflicts of interest that are directly relevant to the content of this article. The views expressed in this article are the authors’ personal views and may not be understood or quoted as being made on behalf of, or as reflecting the position of, HSA.

Ethics Approval

Not applicable.

Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Code Availability

All domain-specific keywords and software libraries needed to run the text classification algorithm are included in the article as Online Resources.

Availability of data and material

The data that support the findings of this study are not openly available because of their sensitivity and the restrictions that apply to them.

Authors' Contributions

Ang proposed the research idea. Desmond, Yiting, Sreemanee and Ang designed the models. Desmond and Yiting analysed the data. Ang, Choong, Doris, Dorothy, Maggie, Michelle and Koh provided the domain expertise for manual annotation of the data and developed the list of positive and negative keywords. Jalene provided the thought leadership for the project. All authors read and approved the final version. 

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Teo, D.C.H., Huang, Y., Dorajoo, S.R. et al. Automated Classification of Quality Defect Issues Relating to Substandard Medicines Using a Hybrid Machine Learning and Rule-Based Approach. Drug Saf 46, 975–989 (2023). https://doi.org/10.1007/s40264-023-01339-8

  • DOI: https://doi.org/10.1007/s40264-023-01339-8
