Automated Classification of Quality Defect Issues Relating to Substandard Medicines Using a Hybrid Machine Learning and Rule-Based Approach

Teo, Desmond Chun Hwee; Huang, Yiting; Dorajoo, Sreemanee Raaj; Ng, Michelle Sau Yuen; Choong, Chih Tzer; Phuah, Doris Sock Tin; Tan, Dorothy Hooi Myn; Tan, Filina Meixuan; Huang, Huilin; Tan, Maggie Siok Hwee; Koh, Suan Tian; Poh, Jalene Wang Woon; Ang, Pei San

doi:10.1007/s40264-023-01339-8


Original Research Article
Published: 30 September 2023

Drug Safety, Volume 46, pages 975–989 (2023)


Abstract

Background and Objective

Substandard medicines can lead to serious safety issues affecting public health; however, the nature of these issues can be highly heterogeneous. Health product regulators seek to prioritise critical product quality defects for review so that prompt risk mitigation measures can be taken. This study aims to use machine learning to classify the nature of issues relating to substandard medicines, in order to support a risk-based and timely review of cases.

Methods

A machine learning algorithm combined with a keyword-based model was developed to classify quality issues using text relating to substandard medicines (CISTERM). The nature of the issue in each product defect case was classified according to Medical Dictionary for Regulatory Activities–Health Sciences Authority (MedDRA–HSA) lowest-level terms.
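The abstract does not say how the machine learning and keyword components are combined. As a minimal sketch, assuming a rule-first design in which a curated positive keyword (not cancelled by a negative keyword) fixes the label and the machine learning prediction is used otherwise, the hybrid could look like the Python below. The toy multinomial naive Bayes classifier, the keyword lists, the training sentences and the names `NaiveBayes`, `matches_any` and `hybrid_classify` are all hypothetical stand-ins; only the three category names come from the abstract.

```python
import math
import re
from collections import Counter, defaultdict

# Category names are MedDRA-HSA lowest-level terms quoted in the abstract.
ADULTERATED = "Product adulterated and/or contains prohibited substance"
OOS = "Out of specification or out of trend test result"
GMP = "Manufacturing non-compliance"

# Hypothetical keyword lists for illustration only (not the study's lists).
KEYWORD_RULES = {
    ADULTERATED: {
        "positive": ["adulterated", "undeclared", "prohibited substance"],
        "negative": ["not adulterated", "no undeclared"],
    },
}

# Hypothetical toy training data (not from the study).
TRAIN_TEXTS = [
    "capsules contain undeclared prohibited substance",
    "dissolution test result out of specification",
    "assay test result out of trend on stability",
    "inspection found manufacturing non-compliance with gmp",
]
TRAIN_LABELS = [ADULTERATED, OOS, OOS, GMP]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_any(text, phrases):
    # Whole-word match, so "unadulterated" does not trigger "adulterated".
    return any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases)


class NaiveBayes:
    """Toy multinomial naive Bayes with add-one smoothing (hypothetical stand-in)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / self.n_docs)  # log prior
            for tok in tokens:  # smoothed log likelihood of each token
                score += math.log((self.word_counts[label][tok] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def hybrid_classify(text, model, rules=KEYWORD_RULES):
    lowered = text.lower()
    # Rule layer: a positive keyword with no negative keyword fixes the label.
    for label, kw in rules.items():
        if matches_any(lowered, kw["positive"]) and not matches_any(lowered, kw["negative"]):
            return label
    # Otherwise defer to the machine learning prediction.
    return model.predict(text)


if __name__ == "__main__":
    model = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
    print(hybrid_classify("herbal product adulterated with steroid", model))
    # → Product adulterated and/or contains prohibited substance (keyword rule)
    print(hybrid_classify("impurity test result out of specification", model))
    # → Out of specification or out of trend test result (ML model)
    print(hybrid_classify("lab confirmed product was not adulterated; assay out of specification", model))
    # → Out of specification or out of trend test result (negative keyword blocks the rule)
```

Putting the rules first lets domain experts override the model for high-priority categories; a real system might instead use the keywords as model features or apply them only when model confidence is low.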

Results

Product defect cases received from January 2010 to December 2021 were used for training (n = 11,082) and testing (n = 2771). The machine learning model achieved a recall (precision) of 92% (96%) for ‘Product adulterated and/or contains prohibited substance’, 86% (90%) for ‘Out of specification or out of trend test result’ and 90% (91%) for ‘Manufacturing non-compliance’.
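Recall and precision here are per-category metrics: for a given category, recall = TP / (TP + FN) and precision = TP / (TP + FP). The training and test sizes correspond to roughly an 80/20 split (2771 of 13,853 cases, about 20%). The minimal sketch below computes both metrics from lists of true and predicted labels; the function name and the labels are hypothetical toy values, not the study's data.

```python
from collections import Counter


def per_class_recall_precision(y_true, y_pred):
    """Return {label: (recall, precision)} for every label seen in y_true or y_pred."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    actual = Counter(y_true)     # TP + FN per class
    predicted = Counter(y_pred)  # TP + FP per class
    scores = {}
    for label in set(actual) | set(predicted):
        recall = tp[label] / actual[label] if actual[label] else 0.0
        precision = tp[label] / predicted[label] if predicted[label] else 0.0
        scores[label] = (recall, precision)
    return scores


if __name__ == "__main__":
    # Hypothetical toy labels: A = adulterated, B = out of specification.
    y_true = ["A", "A", "A", "A", "B", "B"]
    y_pred = ["A", "A", "A", "B", "B", "B"]
    for label, (r, p) in sorted(per_class_recall_precision(y_true, y_pred).items()):
        print(f"{label}: recall {r:.0%} (precision {p:.0%})")
    # A: recall 75% (precision 100%)
    # B: recall 100% (precision 67%)
```

A category whose recall is lower than its precision, such as ‘Out of specification or out of trend test result’ at 86% (90%), misses proportionally more true cases than it mislabels.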

Conclusion

Post-market surveillance of substandard medicines remains a key activity for drug regulatory authorities. A machine learning algorithm combined with a keyword-based model can help regulators prioritise the review of product quality defect issues in a timely manner.



Acknowledgements

The authors would like to thank Wanyu Zheng and Dr Han Leong Goh from Synapxe Pte Ltd for their helpful discussions.

Author information

Authors and Affiliations

Vigilance and Compliance Branch, Health Products Regulation Group, Health Sciences Authority, 11 Biopolis Way #11-01 Helios, Singapore, 138667, Singapore
Desmond Chun Hwee Teo, Yiting Huang, Sreemanee Raaj Dorajoo, Michelle Sau Yuen Ng, Chih Tzer Choong, Doris Sock Tin Phuah, Dorothy Hooi Myn Tan, Filina Meixuan Tan, Huilin Huang, Maggie Siok Hwee Tan, Suan Tian Koh, Jalene Wang Woon Poh & Pei San Ang


Corresponding author

Correspondence to Desmond Chun Hwee Teo.

Ethics declarations

Funding

This initiative received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Conflict of Interest

The authors have no conflicts of interest that are directly relevant to the content of this article. The views expressed in this article are the authors’ personal views and may not be understood or quoted as being made on behalf of, or as reflecting the position of, HSA.

Ethics Approval

Not applicable.

Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Code Availability

All domain-specific keywords and software libraries needed to run the text classification algorithm are included in the article as Online Resources.

Availability of data and material

The data that support the findings of this study are not openly available due to reasons of sensitivity and restrictions applied to the data.

Authors' Contributions

Ang proposed the research idea. Desmond, Yiting, Sreemanee and Ang designed the models. Desmond and Yiting analysed the data. Ang, Choong, Doris, Dorothy, Maggie, Michelle and Koh provided the domain expertise for manual annotation of the data and developed the list of positive and negative keywords. Jalene provided the thought leadership for the project. All authors read and approved the final version.

Supplementary Information

The electronic supplementary material comprises the following files.

Supplementary file1 (PDF 131 KB)

Supplementary file2 (PDF 33 KB)

Supplementary file3 (PDF 25 KB)

Supplementary file4 (PDF 27 KB)

Supplementary file5 (PDF 48 KB)

Supplementary file6 (PDF 38 KB)

Supplementary file7 (PDF 38 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Teo, D.C.H., Huang, Y., Dorajoo, S.R. et al. Automated Classification of Quality Defect Issues Relating to Substandard Medicines Using a Hybrid Machine Learning and Rule-Based Approach. Drug Saf 46, 975–989 (2023). https://doi.org/10.1007/s40264-023-01339-8


Accepted: 25 July 2023
Published: 30 September 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s40264-023-01339-8
