Abstract
LLM detectors aim to detect text generated by an LLM. They fall into two main types: specific detectors and general detectors. Specific detectors target a particular type of language or context, such as hate speech or spam, whereas general detectors aim to identify a broad range of problematic language, such as misinformation or propaganda. Both typically rely on supervised learning, using large labeled datasets to train models to recognize patterns in the language. General-purpose detectors have shown poor results, while specific-purpose detectors have been more promising. These results must be nuanced, however, given the broad range of effective attacks, especially paraphrasing attacks, to which all defense techniques remain somewhat vulnerable. Developing detectors also faces further challenges, such as the growing number of different LLMs (open source or not) and the need for detection that works across many human languages besides English. Mitigation techniques include storing user conversations with an LLM and watermarking, especially cryptographic watermarking.
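The watermarking idea mentioned above can be illustrated with a minimal sketch in the spirit of the green-list scheme of Kirchenbauer et al.: the generator biases each token choice toward a pseudo-random "green" subset of the vocabulary seeded by the previous token, and the detector counts green tokens and tests whether their fraction exceeds chance. The toy vocabulary, the hash-based seeding, and the hard green-only sampler below are illustrative assumptions for demonstration, not the published implementation:

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # toy vocabulary (assumption)


def green_list(prev_token: str, fraction: float = 0.5) -> set:
    """Seed a PRNG with a hash of the previous token and partition the
    vocabulary into a reproducible 'green' subset of the given fraction."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * fraction)))


def generate(length: int = 200) -> list:
    """Toy 'LLM' that always samples from the green list: a hard watermark.
    A real model would instead add a soft bias to green-token logits."""
    out = ["tok0"]
    rng = random.Random(42)
    for _ in range(length):
        out.append(rng.choice(sorted(green_list(out[-1]))))
    return out


def z_score(tokens: list, fraction: float = 0.5) -> float:
    """Detector: recompute each green list (it only needs the hash key,
    not the model) and z-test the green-token count against chance."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev, fraction))
    n = len(tokens) - 1
    return (hits - fraction * n) / math.sqrt(fraction * (1 - fraction) * n)
```

With 200 watermarked tokens every transition lands in a green list, giving a z-score around 14, while unwatermarked (uniformly random) text stays near 0; this gap is what makes detection statistical rather than heuristic, and also shows why paraphrasing, which rewrites the token sequence, degrades the signal.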
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
Cite this chapter
Da Silva Gameiro, H. (2024). LLM Detectors. In: Kucharavy, A., Plancherel, O., Mulder, V., Mermoud, A., Lenders, V. (eds) Large Language Models in Cybersecurity. Springer, Cham. https://doi.org/10.1007/978-3-031-54827-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54826-0
Online ISBN: 978-3-031-54827-7
eBook Packages: Computer Science, Computer Science (R0)