Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet

  • Conference paper
Epistemic Uncertainty in Artificial Intelligence (Epi UAI 2023)

Abstract

The rapid adoption of generative Artificial Intelligence (AI) tools that can generate realistic images or text, such as DALL-E, MidJourney, or ChatGPT, has put the societal impacts of these technologies at the center of public debate. These tools are made possible by the massive amount of data (text and images) that is publicly available on the Internet. At the same time, these generative AI tools are themselves content creators that are already contributing to the data available for training future models. Therefore, future versions of generative AI tools will be trained on a mix of human-created and AI-generated content, creating a potential feedback loop between generative AI and public data repositories. This interaction raises many questions: how will future versions of generative AI tools behave when trained on a mixture of real and AI-generated data? Will they evolve and improve with the new datasets, or will they instead degrade? Will this evolution introduce biases or reduce diversity in subsequent generations of generative AI tools? What are the societal implications of the possible degradation of these models? Can we mitigate the effects of this feedback loop? In this work, we explore the effects of this interaction and report initial results obtained with simple diffusion models trained on various image datasets. Our results show that the quality and diversity of the generated images can degrade over time, suggesting that incorporating AI-created data can have undesired effects on future versions of generative models.
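
The preview does not include code (the authors' implementation is linked in the Notes below), but the feedback loop described above can be illustrated with a minimal sketch. The Python snippet that follows is only an illustrative sketch, not the authors' method: the helpers train_diffusion_model, sample_images, and compute_fid are hypothetical stand-ins for a real diffusion trainer, sampler, and Fréchet Inception Distance implementation. It shows how each generation of the model is trained on a mixture of the original human-created images and samples produced by the previous generation, while the quality of every generation is tracked.

    import random

    # Hypothetical stand-ins for a real diffusion training stack.
    # They are assumptions for illustration only, not the paper's code.
    def train_diffusion_model(images):
        """Train a diffusion model on `images` and return it."""
        raise NotImplementedError

    def sample_images(model, n):
        """Draw n synthetic images from a trained model."""
        raise NotImplementedError

    def compute_fid(real_images, generated_images):
        """Frechet Inception Distance between two image sets (lower is better)."""
        raise NotImplementedError

    def feedback_loop(real_images, generations=5, synthetic_fraction=0.5):
        """Simulate the generative-AI/Internet feedback loop: retrain a
        diffusion model for several generations, each time replacing a
        fraction of the training data with images generated by the
        previous model, and record FID after every generation."""
        train_set = list(real_images)
        fid_history = []
        for _ in range(generations):
            model = train_diffusion_model(train_set)
            synthetic = sample_images(model, n=len(real_images))
            fid_history.append(compute_fid(real_images, synthetic))

            # Next generation's training set: a mix of the original
            # human-created images and freshly generated ones.
            n_synth = int(synthetic_fraction * len(real_images))
            n_real = len(real_images) - n_synth
            train_set = (random.sample(list(real_images), n_real)
                         + random.sample(list(synthetic), n_synth))
        return fid_history

In this sketch, FID stands in for the quality measure; the degradation reported in the abstract would appear as fid_history growing over successive generations, and a diversity metric could be tracked in the same loop.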

Notes

  1. https://github.com/gonz-mart/Towards-Understanding-the-Interplay-of-Generative-Artificial-Intelligence-and-the-Internet.

Acknowledgements

This work was supported by the FUN4DATE (PID2022-136684O7B-C21/22) and ENTRUDIT (TED2021-130118B-I00) projects funded by the Spanish Agencia Estatal de Investigacion (AEI).

Author information

Corresponding author

Correspondence to Gonzalo Martínez.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Martínez, G., Watson, L., Reviriego, P., Hernández, J.A., Juarez, M., Sarkar, R. (2024). Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet. In: Cuzzolin, F., Sultana, M. (eds) Epistemic Uncertainty in Artificial Intelligence. Epi UAI 2023. Lecture Notes in Computer Science, vol 14523. Springer, Cham. https://doi.org/10.1007/978-3-031-57963-9_5

  • DOI: https://doi.org/10.1007/978-3-031-57963-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-57962-2

  • Online ISBN: 978-3-031-57963-9

  • eBook Packages: Computer Science, Computer Science (R0)
