Abstract
Multi-stage screening pipelines are ubiquitous in experimental and computational science. Much of the effort in developing them goes into improving generative methods or surrogate models so that each screening step is effective for a specific application. Far less attention has been paid to characterizing how a generic screening pipeline performs as a function of the problem and its parameters. Here, we develop methods to codify and simulate the features and properties of the screening procedure in general. We outline and model common problem settings and identify opportunities to apply decision-making under uncertainty to optimize the execution of screening pipelines. We then illustrate these methods through several simulation studies. Finally, we show how such studies can quantify screening pipeline performance with respect to problem parameters, and in particular identify the significance of stage-wise covariance structure. Such structure can lead to qualitatively different screening behaviors, and in some cases screening can even perform worse than random selection.
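To make the role of stage-wise covariance concrete, the following is a minimal sketch of a two-stage screen. It is not the authors' simulation framework. It assumes, purely for illustration, that a cheap surrogate score and the true property of each candidate are jointly Gaussian with unit variances and correlation `rho`; the function name `screening_gain` and all parameter values are hypothetical.

```python
import numpy as np

def screening_gain(rho, n_candidates=10_000, keep_fraction=0.1, seed=0):
    """Mean true value of screened candidates minus the population mean.

    Assumption (illustrative only): the stage-1 surrogate score and the
    stage-2 true value are bivariate standard normal with correlation rho.
    A random pick has an expected true value equal to the population mean,
    so a negative return value means screening did worse than random.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    scores = rng.multivariate_normal(np.zeros(2), cov, size=n_candidates)
    surrogate, truth = scores[:, 0], scores[:, 1]

    # Stage 1: keep the top fraction of candidates ranked by surrogate score.
    n_keep = max(1, int(keep_fraction * n_candidates))
    kept = np.argsort(surrogate)[-n_keep:]

    # Stage 2: evaluate the true value of the kept candidates only.
    return truth[kept].mean() - truth.mean()

if __name__ == "__main__":
    for rho in (0.8, 0.0, -0.8):
        print(f"rho={rho:+.1f}  gain over random={screening_gain(rho):+.3f}")
```

Under these assumptions, a positively correlated surrogate enriches the kept set, an uncorrelated one is roughly equivalent to random selection, and a negatively correlated one systematically discards the best candidates, which is one simple way screening can underperform random selection.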
Acknowledgements
This material is based upon work supported in part by the National Science Foundation under Grant No. 1950796, NSF REU Site: “Data-driven Materials Design.” The work was also supported in part by the Brookhaven National Laboratory Directed Research and Development (LDRD) Grant No. 21-044. We thank Bill Bauer and Erik Einarsson for organizing the REU Site. We thank Byung-Jun Yoon, Nathan Urban and Frank Alexander for helpful discussions.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Reyes, K.G., Liu, J. & Vargas, C.J.D. Decision-Making Under Uncertainty for Multi-stage Pipelines: Simulation Studies to Benchmark Screening Strategies. JOM 74, 2897–2907 (2022). https://doi.org/10.1007/s11837-022-05368-z
DOI: https://doi.org/10.1007/s11837-022-05368-z