
Identifying and Mitigating Potential Biases in Predicting Drug Approvals

Drug Safety




Machine learning models are increasingly applied to predict drug development outcomes from intermediary clinical trial results. A key challenge in this task is addressing the various forms of bias present in historical drug approval data.


We aimed to identify and mitigate the bias in drug approval predictions and quantify the impacts of debiasing in terms of financial value and drug safety.


We instantiated the Debiasing Variational Autoencoder, the state-of-the-art model for automated debiasing. We trained and evaluated the model on the Citeline dataset provided by Informa Pharma Intelligence to predict the final drug development outcome from phase II trial results.
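To make the idea of automated debiasing more concrete, the following is a minimal sketch of latent-density-based resampling, the general mechanism behind this family of models: training examples that fall in sparsely populated regions of the latent space are sampled more often. The per-dimension histogram estimate, the smoothing constant `alpha`, the bin count, the function names, and the synthetic latent values are all assumptions made for illustration; they are not details taken from this article or its code.

```python
# Hedged sketch: resampling probabilities that up-weight rare latent regions.
# All names and constants here are illustrative assumptions.

def histogram_frequency(values, bins=10):
    """Return, for each value, the fraction of all values sharing its bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    bin_of = []
    for v in values:
        b = min(int((v - lo) / width), bins - 1)  # clip the maximum into the last bin
        counts[b] += 1
        bin_of.append(b)
    n = len(values)
    return [counts[b] / n for b in bin_of]


def resampling_probabilities(latents, bins=10, alpha=0.01):
    """Weight each sample by the product over latent dimensions of
    1 / (frequency + alpha), then normalize the weights to sum to 1."""
    n, k = len(latents), len(latents[0])
    weights = [1.0] * n
    for d in range(k):
        freq = histogram_frequency([z[d] for z in latents], bins)
        for i in range(n):
            weights[i] *= 1.0 / (freq[i] + alpha)
    total = sum(weights)
    return [w / total for w in weights]


# Synthetic 2-D latents: 95 "common" points followed by 5 "rare" points.
common = [(-0.5 + i / 94, -(-0.5 + i / 94)) for i in range(95)]
rare = [(2.8 + 0.1 * j, -(2.8 + 0.1 * j)) for j in range(5)]
latents = common + rare

probs = resampling_probabilities(latents)
print("largest common-point probability:", max(probs[:95]))
print("smallest rare-point probability: ", min(probs[95:]))
```

In this toy setup every rare point receives a higher sampling probability than any common point. A smaller `alpha` sharpens that preference, while a larger `alpha` moves the sampling back toward uniform.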


The debiased Debiasing Variational Autoencoder model predicted drug development outcomes better than its un-debiased baseline, with an \(F_{1}\) score of 0.48 versus 0.25. Its true-positive rate was much higher than the baseline's (60% vs 15%), while its true-negative rate was slightly lower (88% vs 99%). The Debiasing Variational Autoencoder distinguished between drugs developed by large pharmaceutical firms and those developed by small biotech companies. The model prediction is strongly influenced by several factors, including prior approval of the drug for another indication, whether the trial meets the positive/negative endpoints, and the year in which the trial is completed. We estimate that the debiased model generates financial value for the drug developer in six major therapeutic areas, ranging from US$763 million to US$1,365 million.
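The reported \(F_{1}\) scores and true-positive rates can be related through the definition \(F_{1} = 2PR/(P+R)\), where \(R\) is recall (the true-positive rate) and \(P\) is precision. The short sketch below solves that identity for \(P\). It assumes the rounded figures quoted above were computed on the same test set, so the resulting precision values are back-of-the-envelope implications of the definition, not numbers reported by the authors. The function name is hypothetical.

```python
# Hedged sketch: precision implied by a reported F1 score and recall (TPR).
# Inputs are the rounded values quoted in the text; outputs are approximate.

def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2*P*R / (P + R) for P, giving P = F1*R / (2*R - F1)."""
    denom = 2.0 * recall - f1
    if denom <= 0:
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denom


reported = {
    "debiased": {"f1": 0.48, "tpr": 0.60},
    "baseline": {"f1": 0.25, "tpr": 0.15},
}

for name, m in reported.items():
    p = implied_precision(m["f1"], m["tpr"])
    print(f"{name}: implied precision ~ {p:.2f}")
# debiased: implied precision ~ 0.40
# baseline: implied precision ~ 0.75
```

Under these assumptions, the debiased model trades precision for recall: it flags many more of the eventual approvals, while a smaller share of its positive predictions are correct. This is consistent with its lower true-negative rate.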


Our analysis shows that debiasing improves the financial efficiency of late-stage drug development. From the pharmacovigilance perspective, the debiased model is more likely to identify drugs that are both safe and effective. However, because of its lower true-negative rate, it may predict a higher probability of success for drugs with potential adverse effects, so it must be used with caution when predicting the development outcomes of drug candidates currently in the pipeline.




  1. As of 5 December 2021. The Citeline data is continually updated to include new clinical trials.

  2. We use the notation of conditional probability, \(Q_{i} ( \cdot |X_{\text{train}})\), to emphasize that the latent space density is sensitive to the distribution of input features in the training dataset, \(X_{\text{train}}\).




We thank Informa Pharma Intelligence for providing us access to their Citeline data. We thank Kate Lyons and Jillian Ternullo for logistical support, and Jayna Cummings for editorial assistance. Research support from the MIT Laboratory for Financial Engineering is gratefully acknowledged. The views and opinions expressed in this article are those of the authors only, and do not necessarily represent the views and opinions of any institution or agency, or of any of their affiliates or employees acknowledged above.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Andrew W. Lo.

Ethics declarations


Research support from MIT Laboratory for Financial Engineering is gratefully acknowledged. No direct funding was received for this study and no funding bodies had any role in the study design, data collection and analysis, decision to publish, or preparation of this article. The authors were personally salaried by their institutions during the period of writing (though no specific salary was set aside or given for the writing of this manuscript).

Conflict of interest

Q.X. reports personal investments in publicly traded pharmaceutical companies. E.A. and A.A. are co-founders of Themis AI. D.R. reports personal investments in technology companies and mutual funds. D.R. is co-founder of Venti Technologies, Themis AI, and The Routing Company. She is a member of the technology advisory board of British Telecom, Hyundai Motor Company, RobGlobal, Knowledge AI, Ten63, Venti, and RIIID. She is a member of the board of trustees of MBZUAI, a senior visiting fellow at The MITRE Corporation, and a member of Accenture’s Luminary program. She was a member of PCAST and DIB and has given recent talks at GITEX, TransformAI, Stavros Niarchos Foundation, Purdue, Harvard Radcliffe, UIUC, University of Pennsylvania, Johns Hopkins University, ETH, EPFL, KTH, and University of Cambridge. D.R.’s research is funded by the US Air Force, NSF, ONR, DARPA, Toyota Research Institute, IBM, The Boeing Company, Amazon, JPMC, DSTA, DSO, GIST, the Israel Ministry of Defense, AMS, SMART, and a TED Audacious Prize. A.W.L. reports personal investments in private biotech companies, biotech venture capital funds, and mutual funds. A.W.L. is a co-founder and partner of QLS Advisors, a healthcare analytics and consulting company; an advisor to Apricity Health, Aracari Bio, BrightEdge Impact Fund, Enable Medicine, FINRA, Lazard, Quantile Health, SalioGen Therapeutics, the Swiss Finance Institute, Thalēs, and Think Therapeutics; a director of AbCellera, Atomwise, BridgeBio Pharma, Roivant Sciences, and Annual Reviews; and a member of the NIH’s National Center for Advancing Translational Sciences Advisory Council. During the most recent 6-year period, A.W.L. has received speaking/consulting fees, honoraria, or other forms of compensation from: AlphaSimplex Group, Annual Reviews, the Bernstein Fabozzi Jacobs Levy Award, BIS, BridgeBio Pharma, Cambridge Associates, Chicago Mercantile Exchange, Financial Times, Harvard Kennedy School, IMF, JOIM, National Bank of Belgium, New Frontiers Advisors (for 2020 Harry M. Markowitz Prize), Q Group, Research Affiliates, Roivant Sciences, and the Swiss Finance Institute.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data and material

Citeline is a proprietary dataset provided by Informa Pharma Intelligence and can be accessed under a commercial license. The probability of success estimates extracted from historical drug development data in Citeline can be accessed via Project ALPHA.

Code availability

The code for this work is available from the corresponding author on reasonable request.

Authors’ contributions

The study was first conceived by D.R. and A.W.L. Data curation and preprocessing were performed by Q.X. The code for training the deep learning models, data visualization, and hyperparameter sensitivity analysis was written by E.A., with help from Q.X. and A.A. The deep learning models were trained by Q.X. on the MIT Engaging cluster. The code for formal analysis was primarily written by Q.X., with crucial inputs from E.A. and A.A. and supervision by D.R. and A.W.L. The first draft of the manuscript was written by Q.X. and E.A. All authors reviewed and commented on previous versions of the manuscript. All authors read and approved the final version of the manuscript.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 165 kb)


About this article


Cite this article

Xu, Q., Ahmadi, E., Amini, A. et al. Identifying and Mitigating Potential Biases in Predicting Drug Approvals. Drug Saf 45, 521–533 (2022).
