The perpetual motion machine of AI-generated data and the distraction of ChatGPT as a ‘scientist’

  • Correspondence
From Nature Biotechnology

This article has been updated

Change history

  • 30 April 2024

    In the version of the article initially published, in Box 1, the text now reading “∼300 billion web pages” originally read “∼3 billion web pages” and has now been amended in the HTML and PDF versions of the article.

References

  1. Burley, S. K. et al. Nucleic Acids Res. 51, D488–D508 (2023).

  2. Jumper, J. et al. Nature 596, 583–589 (2021).

  3. Terwilliger, T. C. et al. Nat. Methods https://www.nature.com/articles/s41592-023-02087-4 (2023).

  4. Jahanian, A., Puig, X., Tian, Y. & Isola, P. Generative models as a data source for multiview representation learning. Preprint at arXiv https://arxiv.org/abs/2106.05258 (2022).

  5. Dietterich, T. G. In Multiple Classifier Systems (MCS 2000), Lecture Notes in Computer Science Vol. 1857 (Springer, 2000).

  6. Schuhmann, C. et al. LAION-5B: an open large-scale dataset for training next generation image-text models. Preprint at arXiv https://arxiv.org/abs/2210.08402v1 (2022).

  7. Deng, J. et al. Fundam. Res. 3, 727–737 (2023).

  8. Kearnes, S. M. et al. J. Am. Chem. Soc. 143, 18820–18826 (2021).

  9. Tran, R. et al. ACS Catal. 13, 3066–3084 (2022).

  10. Sriram, A. et al. The Open DAC 2023 dataset and challenges for sorbent discovery in direct air capture. Preprint at arXiv https://arxiv.org/abs/2311.00341v2 (2023).

  11. Jain, A. et al. APL Mater. 1, 11002 (2013).

Acknowledgements

Thanks to Tyler Bonnen, James Bowden, Jennifer Doudna, Lisa Dunlap, Alyosha Efros, Nicolo Fusi, Aaron Hertzmann, Hanlun Jiang, Aditi Krishnapriyan, Jitendra Malik, Sara Mostafavi, Hunter Nisonoff and Ben Recht for helpful comments on this piece as it was taking shape.

Author information

Corresponding author

Correspondence to Jennifer Listgarten.

Ethics declarations

Competing interests

The author declares no competing interests.

About this article

Cite this article

Listgarten, J. The perpetual motion machine of AI-generated data and the distraction of ChatGPT as a ‘scientist’. Nat Biotechnol 42, 371–373 (2024). https://doi.org/10.1038/s41587-023-02103-0

  • DOI: https://doi.org/10.1038/s41587-023-02103-0

  • Springer Nature America, Inc.
