Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet

  • Conference paper
Epistemic Uncertainty in Artificial Intelligence (Epi UAI 2023)

Abstract

The rapid adoption of generative Artificial Intelligence (AI) tools that can generate realistic images or text, such as DALL-E, MidJourney, or ChatGPT, has put the societal impacts of these technologies at the center of public debate. These tools are made possible by the massive amount of data (text and images) that is publicly available on the Internet. At the same time, these generative AI tools are themselves content creators that are already contributing to the data available for training future models. Therefore, future versions of generative AI tools will be trained on a mix of human-created and AI-generated content, creating a potential feedback loop between generative AI and public data repositories. This interaction raises many questions: how will future versions of generative AI tools behave when trained on a mixture of real and AI-generated data? Will they evolve and improve with the new datasets, or will they instead degrade? Will this evolution introduce biases or reduce diversity in subsequent generations of generative AI tools? What are the societal implications of the possible degradation of these models? Can we mitigate the effects of this feedback loop? In this work, we explore the effects of this interaction and report initial results obtained with simple diffusion models trained on various image datasets. Our results show that the quality and diversity of the generated images can degrade over time, suggesting that incorporating AI-created data can have undesired effects on future versions of generative models.
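
The preview does not include code (the authors' implementation is linked in the Notes below), but the feedback loop described above can be illustrated with a minimal sketch. The Python snippet that follows is only an illustrative sketch, not the authors' method: the helpers train_diffusion_model, sample_images, and compute_fid are hypothetical stand-ins for a real diffusion trainer, sampler, and Fréchet Inception Distance implementation. It shows how each generation of the model is trained on a mixture of the original human-created images and samples produced by the previous generation, while the quality of every generation is tracked.

    import random

    # Hypothetical stand-ins for a real diffusion training stack.
    # They are assumptions for illustration only, not the paper's code.
    def train_diffusion_model(images):
        """Train a diffusion model on `images` and return it."""
        raise NotImplementedError

    def sample_images(model, n):
        """Draw n synthetic images from a trained model."""
        raise NotImplementedError

    def compute_fid(real_images, generated_images):
        """Frechet Inception Distance between two image sets (lower is better)."""
        raise NotImplementedError

    def feedback_loop(real_images, generations=5, synthetic_fraction=0.5):
        """Simulate the generative-AI/Internet feedback loop: retrain a
        diffusion model for several generations, each time replacing a
        fraction of the training data with images generated by the
        previous model, and record FID after every generation."""
        train_set = list(real_images)
        fid_history = []
        for _ in range(generations):
            model = train_diffusion_model(train_set)
            synthetic = sample_images(model, n=len(real_images))
            fid_history.append(compute_fid(real_images, synthetic))

            # Next generation's training set: a mix of the original
            # human-created images and freshly generated ones.
            n_synth = int(synthetic_fraction * len(real_images))
            n_real = len(real_images) - n_synth
            train_set = (random.sample(list(real_images), n_real)
                         + random.sample(list(synthetic), n_synth))
        return fid_history

In this sketch, FID stands in for the quality measure; the degradation reported in the abstract would appear as fid_history growing over successive generations, and a diversity metric could be tracked in the same loop.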

Notes

  1. https://github.com/gonz-mart/Towards-Understanding-the-Interplay-of-Generative-Artificial-Intelligence-and-the-Internet.

Acknowledgements

This work was supported by the FUN4DATE (PID2022-136684O7B-C21/22) and ENTRUDIT (TED2021-130118B-I00) projects funded by the Spanish Agencia Estatal de Investigacion (AEI).

Author information

Corresponding author

Correspondence to Gonzalo Martínez.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Martínez, G., Watson, L., Reviriego, P., Hernández, J.A., Juarez, M., Sarkar, R. (2024). Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet. In: Cuzzolin, F., Sultana, M. (eds) Epistemic Uncertainty in Artificial Intelligence. Epi UAI 2023. Lecture Notes in Computer Science, vol 14523. Springer, Cham. https://doi.org/10.1007/978-3-031-57963-9_5

  • DOI: https://doi.org/10.1007/978-3-031-57963-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-57962-2

  • Online ISBN: 978-3-031-57963-9

  • eBook Packages: Computer Science, Computer Science (R0)
