Decision-Making Under Uncertainty for Multi-stage Pipelines: Simulation Studies to Benchmark Screening Strategies

  • Applications of Autonomous Data Collection and Active Learning
  • JOM

Abstract

Multi-stage screening pipelines are ubiquitous in experimental and computational science. Much of the effort in developing such pipelines goes into improving generative methods or surrogate models so that each screening step is effective for a specific application. Far less attention has been paid to characterizing how a generic screening pipeline performs as a function of the problem and its parameters. Here, we develop methods to codify and simulate the features and properties of screening procedures in general. We outline and model common problem settings and identify opportunities to apply decision-making under uncertainty to optimize the execution of screening pipelines. We then illustrate the developed methods through several simulation studies. Finally, we show how such studies can quantify screening pipeline performance with respect to problem parameters, and in particular we identify the significance of stage-wise covariance structure. We show how this structure can lead to qualitatively different screening behaviors and how, in some cases, screening can even perform worse than random selection.
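
As a rough illustration of why stage-wise covariance matters, the following minimal sketch simulates a hypothetical two-stage pipeline. It is not the model used in the article: the bivariate Gaussian scores, the single correlation parameter `rho`, the function name `simulate_two_stage`, and the top-fraction selection rule are all assumptions made for this example. Candidates are ranked by a cheap first-stage score, the top fraction is kept, and the mean final-stage value of the kept set is compared with that of a random subset of the same size.

```python
import numpy as np


def simulate_two_stage(rho, n_candidates=10_000, keep_fraction=0.1, seed=0):
    """Compare screening on a cheap score against random selection.

    Assumption (not from the article): the cheap first-stage score and the
    final-stage value are jointly Gaussian with unit variances and
    correlation `rho`.

    Returns (mean final value of screened set, mean final value of random set).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])  # stage-wise covariance
    scores = rng.multivariate_normal(np.zeros(2), cov, size=n_candidates)
    cheap, final = scores[:, 0], scores[:, 1]

    n_keep = int(keep_fraction * n_candidates)
    # Screening: keep the candidates with the highest cheap score.
    screened = np.argsort(cheap)[-n_keep:]
    # Baseline: keep the same number of candidates uniformly at random.
    random_pick = rng.choice(n_candidates, size=n_keep, replace=False)

    return final[screened].mean(), final[random_pick].mean()


if __name__ == "__main__":
    for rho in (0.8, 0.0, -0.8):
        screened_mean, random_mean = simulate_two_stage(rho)
        print(f"rho={rho:+.1f}  screened={screened_mean:.3f}  random={random_mean:.3f}")
```

Under these assumptions, a positive `rho` makes the screened set better than the random baseline, while a negative `rho` makes screening on the cheap score worse than random, which mirrors the qualitative behavior described in the abstract.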

Acknowledgements

This material is based upon work supported in part by the National Science Foundation under Grant No. 1950796, NSF REU Site: “Data-driven Materials Design.” The work was also supported in part by the Brookhaven National Laboratory Directed Research and Development (LDRD) Grant No. 21-044. We thank Bill Bauer and Erik Einarsson for organizing the REU Site. We thank Byung-Jun Yoon, Nathan Urban and Frank Alexander for helpful discussions.

Author information

Corresponding author

Correspondence to Kristofer G. Reyes.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 3687 KB)

About this article

Cite this article

Reyes, K.G., Liu, J. & Vargas, C.J.D. Decision-Making Under Uncertainty for Multi-stage Pipelines: Simulation Studies to Benchmark Screening Strategies. JOM 74, 2897–2907 (2022). https://doi.org/10.1007/s11837-022-05368-z

  • DOI: https://doi.org/10.1007/s11837-022-05368-z
