Abstract
Introduction
Postmarketing drug safety surveillance research has focused on the product-patient interaction as the primary source of variability in clinical outcomes. However, pharmaceutical manufacturing and distribution are inherently complex, especially for biologic drugs, so variability in manufacturing and supply chain conditions is an additional risk that could affect clinical outcomes. We propose a data-driven signal detection method, HMMScan, to monitor for manufacturing lot-dependent changes in adverse event (AE) rates, and apply it here to a biologic drug.
Methods
HMMScan selects the best-fitting candidate from a family of hidden Markov models (HMMs) to detect temporal correlations in per-lot AE rates that could signal clinically relevant variability in manufacturing and supply chain conditions. It also indicates which lots are most likely associated with risky states of the manufacturing or supply chain conditions. The method was validated on extensive simulated data and applied to three actual lot sequences of a major biologic drug by combining lot metadata from the manufacturer with AE reports from the US FDA Adverse Event Reporting System (FAERS).
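The idea of comparing candidate models and then flagging lots can be illustrated with a minimal sketch. It is not the HMMScan implementation: the Poisson emission distribution, the BIC criterion, the AE counts, and all parameter values below are assumptions made for illustration, and the two-state parameters are hand-picked rather than fitted by EM, so the BIC values are indicative only.

```python
import numpy as np
from math import lgamma

def poisson_logpmf(k, lam):
    # log P(K = k) for a Poisson(lam) count
    return k * np.log(lam) - lam - lgamma(k + 1)

def forward_backward(counts, start, trans, rates):
    """Scaled forward-backward pass for a Poisson-emission HMM.

    Returns the sequence log-likelihood and, for each lot, the
    posterior probability of each hidden state.
    """
    n, m = len(counts), len(rates)
    # emis[t, s] = P(counts[t] | hidden state s)
    emis = np.exp(np.array(
        [[poisson_logpmf(k, lam) for lam in rates] for k in counts]))
    alpha = np.zeros((n, m))
    scale = np.zeros(n)
    alpha[0] = start * emis[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    beta = np.ones((n, m))
    for t in range(n - 2, -1, -1):
        beta[t] = (trans @ (emis[t + 1] * beta[t + 1])) / scale[t + 1]
    posterior = alpha * beta
    posterior /= posterior.sum(axis=1, keepdims=True)
    return np.log(scale).sum(), posterior

def bic(loglik, n_params, n_obs):
    # Lower is better; penalizes models with more free parameters.
    return n_params * np.log(n_obs) - 2.0 * loglik

# Hypothetical AE counts for 14 consecutive lots (not real data).
counts = np.array([1, 0, 2, 1, 0, 1, 7, 9, 6, 8, 1, 0, 2, 1])

# Candidate 1: a single state, i.e. no lot-dependent variability.
ll_one, _ = forward_backward(
    counts, np.array([1.0]), np.array([[1.0]]), [counts.mean()])

# Candidate 2: a baseline state and an elevated ("risky") state.
# These parameters are illustrative stand-ins for fitted values.
ll_two, post = forward_backward(
    counts,
    np.array([0.9, 0.1]),
    np.array([[0.9, 0.1], [0.2, 0.8]]),
    [1.0, 7.0])

# Lots whose posterior probability of the elevated state exceeds 0.5.
flagged = np.flatnonzero(post[:, 1] > 0.5).tolist()

print("BIC, one state :", bic(ll_one, 1, len(counts)))
print("BIC, two states:", bic(ll_two, 5, len(counts)))
print("Flagged lot indices:", flagged)
```

In this toy sequence the two-state candidate has the lower BIC, and the flagged indices correspond to the run of lots with elevated counts. A real analysis would first estimate each candidate's parameters from the data before comparing them.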
Results
Validation on extensive simulated data indicated that HMMScan correctly detects the presence or absence of variable manufacturing and supply chain conditions for contiguous sequences of 100 or more lots when changes in these conditions have a meaningful impact on AE rates. When HMMScan was applied to FAERS data, two of the three actual lot sequences examined showed evidence of potential manufacturing or supply chain-related variability.
Conclusions
HMMScan could be utilized by both manufacturers and regulators to automate lot variability monitoring and inform targeted root-cause analysis. Broad application of HMMScan would rely on a well-developed data input pipeline. The proposed method is implemented in an open-source GitHub repository.
Notes
An outlier is defined as an AE rate greater than the 75th percentile plus 1.5 times the interquartile range [20].
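As a minimal sketch of this rule, the function below computes the threshold and marks the rates that exceed it. The function name and the sample AE rates are hypothetical, and the use of NumPy's default (linear interpolation) percentile convention is an assumption, since the note does not specify how the percentiles are computed.

```python
import numpy as np

def upper_outliers(ae_rates):
    """Flag AE rates above the 75th percentile plus 1.5 times the IQR."""
    rates = np.asarray(ae_rates, dtype=float)
    # Assumption: NumPy's default linear-interpolation percentiles.
    q1, q3 = np.percentile(rates, [25, 75])
    threshold = q3 + 1.5 * (q3 - q1)
    return threshold, rates > threshold

# Hypothetical per-lot AE rates (not real data).
threshold, is_outlier = upper_outliers(
    [0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.20])
print(threshold, is_outlier.tolist())
```

Only the upper tail is tested here, matching the one-sided definition in the note.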
References
1. Harpaz R, DuMouchel W, Shah NH, Madigan D, Ryan P, Friedman C. Novel data-mining methodologies for adverse drug event discovery and analysis. Clin Pharmacol Ther. 2012;91:1010–21.
2. Khouri C, Nguyen T, Revol B, Lepelley M, Pariente A, Roustit M, et al. Leveraging the variability of pharmacovigilance disproportionality analyses to improve signal detection performances. Front Pharmacol. 2021;12:1–7.
3. Kulldorff M, Dashevsky I, Avery TR, Chan AK, Davis RL, Graham D, et al. Drug safety data mining with a tree-based scan statistic. Pharmacoepidemiol Drug Saf. 2013;22:517–23.
4. Sandberg L, Taavola H, Aoki Y, Chandler R, Norén GN. Risk factor considerations in statistical signal detection: using subgroup disproportionality to uncover risk groups for adverse drug reactions in VigiBase. Drug Saf. 2020;43:999–1009. https://doi.org/10.1007/s40264-020-00957-w.
5. Beninger P. Opportunities for collaboration at the interface of pharmacovigilance and manufacturing. Clin Ther. 2017;39:702–12. https://doi.org/10.1016/j.clinthera.2017.03.010.
6. US FDA Center for Biologics Evaluation and Research. Best practices in drug and biological product postmarket safety surveillance for FDA staff. 2019. https://www.federalregister.gov/documents/2019/11/07/2019-24332/best-practices-in-drug-and-biological-product-postmarket-safety-surveillance-for-food-and-drug. Accessed 15 July 2022.
7. Dumouchel W, Yuen N, Payvandi N, Booth W, Rut A, Fram D. Automated method for detecting increases in frequency of spontaneous adverse event reports over time. J Biopharm Stat. 2013;23:161–77.
8. Heimann G, Belleli R, Kerman J, Fisch R, Kahn J, Behr S, et al. A nonparametric method to detect increased frequencies of adverse drug reactions over time. Stat Med. 2018;37:1491–514.
9. Mahaux O, Bauchau V, Zeinoun Z, Van Holle L. Tree-based scan statistic—application in manufacturing-related safety signal detection. Vaccine. 2018;37:49–55. https://doi.org/10.1016/j.vaccine.2018.11.044.
10. Rabiner LR. A tutorial on Hidden Markov models and selected applications in speech recognition. Proc IEEE. 1989;77:257–86.
11. Raftery AE. Bayesian model selection in social research. Sociol Methodol. 1995;25:111.
12. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol). 1977;39:1–22.
13. Baum LE, Petrie T, Soules G, Weiss N. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov Chains. Ann Math Stat. 1970;41:164–71.
14. Schreiber J, Allen PG. pomegranate: fast and flexible probabilistic modeling in Python. J Mach Learn Res. 2018;18:1–6.
15. Clemons TE, Bradley EL. A nonparametric measure of the overlapping coefficient. Comput Stat Data Anal. 2000;34:51–61.
16. Weitzman MS. Measures of overlap of income distributions of White and Negro Families in the United States. Washington, DC: US Government Printing Office; 1970.
17. US FDA. FDA Adverse Event Reporting System (FAERS). 2022. https://fis.fda.gov/extensions/FPD-QDE-FAERS/FPD-QDE-FAERS.html. Accessed 15 July 2022.
18. Wilde J. HMMScan: surveillance of adverse event variability across manufacturing lots in biologics. 2022. https://github.com/josh-wilde/hmmscan. Accessed 30 Oct 2022.
19. Wilde J, Levi R. HMMScan data repository. Mendeley Data. 2022. https://data.mendeley.com/datasets/zzd5vbj7yn.3. Accessed 9 Sep 2023.
20. Walfish S. A review of statistical outlier methods. Pharmaceutical Technology. 2006. http://www.pharmtech.com/pharmtech/content/printContentPopup.jsp?id=384716. Accessed 30 Oct 2022.
21. Popov AA, Gultyaeva TA, Uvarov VE. Training hidden Markov models on incomplete sequences. 2016 13th International Scientific-Technical Conference on Actual Problems of Electronics Instrument Engineering (APEIE). IEEE; 2016. p. 317–20. http://ieeexplore.ieee.org/document/7806478/
22. Nylund KL, Asparouhov T, Muthén BO. Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study. Struct Equ Model. 2007;14:535–69.
23. Alatawi YM, Hansen RA. Empirical estimation of under-reporting in the US Food and Drug Administration Adverse Event Reporting System (FAERS). Expert Opin Drug Saf. 2017;16:761–7.
24. Dumont T. Context tree estimation in variable length Hidden Markov models. IEEE Trans Inf Theory. 2014;60:3196–208.
25. Kontoyiannis I, Mertzanis L, Panotopoulou A, Papageorgiou I, Skoularidou M. Bayesian context trees: modelling and exact inference for discrete time series. J R Stat Soc Ser B Stat Methodol. 2022;84(4):1287–323. https://doi.org/10.1111/rssb.12511.
26. Monaco JV, Tappert CC. The partially observable hidden Markov model and its application to keystroke dynamics. Pattern Recognit. 2018;76:449–62.
27. Ratcliff R, Tuerlinckx F. Estimating parameters of the diffusion model: approaches to dealing with contaminant reaction times and parameter variability. Psychon Bull Rev. 2002;9:438–81. https://doi.org/10.3758/BF03196302.
28. Wagenmakers EJ, Ratcliff R, Gomez P, Iverson GJ. Assessing model mimicry using the parametric bootstrap. J Math Psychol. 2004;48:28–50.
Ethics declarations
Funding
This work was supported by the FDA, Grant no. U01FD006483. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the financial sponsor.
Conflict of interest
Subsequent to the substantial completion of this work but prior to publication, the second, third, and fourth authors (Stacy Springs, Jacqueline M. Wolfrum, and Retsef Levi, respectively) received, through MIT, an award from the MIT-Takeda initiative to conduct research on signal detection. Joshua T. Wilde, Stacy Springs, Jacqueline M. Wolfrum, and Retsef Levi declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Availability of data and materials
The datasets generated and/or analyzed during the current study are available in the HMMScan Data Repository (https://doi.org/10.17632/zzd5vbj7yn.3).
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Code availability
All code and related documentation are available at https://github.com/josh-wilde/hmmscan.
Author contributions
JTW, SS, JMW, and RL contributed to the study conception and design. JTW and RL contributed to methodology development. Data collection and analysis were performed by JTW. The first draft of the manuscript was written by JTW and RL, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
About this article
Cite this article
Wilde, J.T., Springs, S., Wolfrum, J.M. et al. Development and Application of a Data-Driven Signal Detection Method for Surveillance of Adverse Event Variability Across Manufacturing Lots of Biologics. Drug Saf 46, 1117–1131 (2023). https://doi.org/10.1007/s40264-023-01349-6
DOI: https://doi.org/10.1007/s40264-023-01349-6