Abstract
Background and Objective
Substandard medicines can lead to serious safety issues affecting public health; however, the nature of such issues can be widely heterogeneous. Health product regulators seek to prioritise critical product quality defects for review to ensure that prompt risk mitigation measures are taken. This study aims to classify the nature of issues for substandard medicines using machine learning to augment a risk-based and timely review of cases.
Methods
A machine learning algorithm combined with a keyword-based model was developed to classify quality issues using text relating to substandard medicines (CISTERM). The nature of the issue in each product defect case was classified based on Medical Dictionary for Regulatory Activities–Health Sciences Authority (MedDRA–HSA) lowest-level terms.
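As a rough illustration of how a keyword-based model and a machine learning classifier might be combined, the sketch below assumes that keyword rules take precedence and that a statistical classifier handles the remaining cases. The abstract does not specify the combination logic, so this ordering is an assumption. The keywords, the toy training texts and the small Naive Bayes classifier are all hypothetical stand-ins; the study's actual keyword lists and libraries are described in its Online Resources.

```python
import math
from collections import Counter, defaultdict

# Category names taken from the Results; everything else below is illustrative.
ADULTERATED = "Product adulterated and/or contains prohibited substance"
OOS = "Out of specification or out of trend test result"
GMP = "Manufacturing non-compliance"

# Hypothetical keyword rules (not the study's keyword lists).
KEYWORD_RULES = {
    ADULTERATED: ["adulterated", "prohibited substance"],
    OOS: ["out of specification", "out of trend"],
}


def keyword_label(text):
    """Return the first label whose keyword occurs in the text, else None."""
    lowered = text.lower()
    for label, keywords in KEYWORD_RULES.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return None


class NaiveBayesText:
    """Minimal multinomial Naive Bayes with add-one smoothing (stand-in model)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify(text, model):
    """Assumed combination: keyword rule first, classifier as fallback."""
    return keyword_label(text) or model.predict(text)


# Toy training data, invented for this example only.
train_texts = [
    "undeclared steroid detected in herbal capsule",
    "assay result below limit at stability timepoint",
    "inspection found inadequate cleaning records at manufacturing site",
]
train_labels = [ADULTERATED, OOS, GMP]
model = NaiveBayesText().fit(train_texts, train_labels)
```

Here a case description that contains a listed phrase is labelled directly by the rule, while other descriptions fall through to the classifier.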
Results
Product defect cases received from January 2010 to December 2021 were used for training (n = 11,082) and for testing (n = 2771). The machine learning model achieved a good recall (precision) of 92% (96%) for ‘Product adulterated and/or contains prohibited substance’, 86% (90%) for ‘Out of specification or out of trend test result’ and 90% (91%) for ‘Manufacturing non-compliance’.
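For readers less familiar with the metrics, the sketch below shows how per-category recall and precision are conventionally computed from true and predicted labels. The labels "A" and "B" and the toy predictions are invented for illustration and are not the study's data. The last line uses the reported case counts, which imply that the test set is about 20% of all cases; the abstract does not state how the split was made.

```python
def recall_precision(y_true, y_pred, label):
    """Return (recall, precision) for one category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Toy labels, invented for this example only.
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "B"]

recall_a, precision_a = recall_precision(y_true, y_pred, "A")
recall_b, precision_b = recall_precision(y_true, y_pred, "B")

# Share of cases held out for testing, from the counts reported above.
test_share = 2771 / (11_082 + 2771)
```

Recall is the fraction of true cases in a category that the model finds, and precision is the fraction of the model's predictions for that category that are correct.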
Conclusion
Post-market surveillance of substandard medicines remains a key activity for drug regulatory authorities. A machine learning algorithm combined with a keyword-based model can help to prioritise the review of product quality defect issues in a timely manner.
Acknowledgements
The authors would like to thank Wanyu Zheng and Dr Han Leong Goh from Synapxe Pte Ltd for their helpful discussions.
Ethics declarations
Funding
This initiative received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
Conflict of Interest
The authors have no conflicts of interest that are directly relevant to the content of this article. The views expressed in this article are the authors’ personal views and may not be understood or quoted as being made on behalf of, or as reflecting the position of, HSA.
Ethics Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Code Availability
All domain-specific keywords and software libraries needed to run the text classification algorithm are included in the article as Online Resources.
Availability of data and material
The data that support the findings of this study are not openly available due to reasons of sensitivity and restrictions applied to the data.
Authors' Contributions
Ang proposed the research idea. Desmond, Yiting, Sreemanee and Ang designed the models. Desmond and Yiting analysed the data. Ang, Choong, Doris, Dorothy, Maggie, Michelle and Koh provided the domain expertise for manual annotation of the data and developed the list of positive and negative keywords. Jalene provided the thought leadership for the project. All authors read and approved the final version.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Teo, D.C.H., Huang, Y., Dorajoo, S.R. et al. Automated Classification of Quality Defect Issues Relating to Substandard Medicines Using a Hybrid Machine Learning and Rule-Based Approach. Drug Saf 46, 975–989 (2023). https://doi.org/10.1007/s40264-023-01339-8
DOI: https://doi.org/10.1007/s40264-023-01339-8