Abstract
LLM detectors aim to detect text generated by an LLM. They fall into two main types: specific detectors and general detectors. Specific detectors target a particular type of language or context, such as hate speech or spam, whereas general detectors aim to identify a broad range of problematic language, such as misinformation or propaganda. Both typically rely on supervised learning, using large labeled datasets to train models to recognize patterns in the language. General-purpose detectors have shown poor results, while specific-purpose detectors have been more promising. These results must be nuanced, however, given the broad range of effective attacks, especially paraphrasing attacks, to which all defense techniques remain somewhat vulnerable. Developing detectors also faces further challenges, such as the growing number of different LLMs (open source or not) and the need for detection that works across many human languages besides English. Mitigation techniques include storing user conversations with an LLM and watermarking, especially cryptographic watermarking.
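The watermarking idea mentioned above can be illustrated with a minimal sketch in the spirit of the green-list scheme of Kirchenbauer et al.: the generator biases each token choice toward a pseudo-random "green" subset of the vocabulary seeded by the previous token, and the detector counts green tokens and tests whether their fraction exceeds chance. The toy vocabulary, the hash-based seeding, and the hard green-only sampler below are illustrative assumptions for demonstration, not the published implementation:

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # toy vocabulary (assumption)


def green_list(prev_token: str, fraction: float = 0.5) -> set:
    """Seed a PRNG with a hash of the previous token and partition the
    vocabulary into a reproducible 'green' subset of the given fraction."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * fraction)))


def generate(length: int = 200) -> list:
    """Toy 'LLM' that always samples from the green list: a hard watermark.
    A real model would instead add a soft bias to green-token logits."""
    out = ["tok0"]
    rng = random.Random(42)
    for _ in range(length):
        out.append(rng.choice(sorted(green_list(out[-1]))))
    return out


def z_score(tokens: list, fraction: float = 0.5) -> float:
    """Detector: recompute each green list (it only needs the hash key,
    not the model) and z-test the green-token count against chance."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev, fraction))
    n = len(tokens) - 1
    return (hits - fraction * n) / math.sqrt(fraction * (1 - fraction) * n)
```

With 200 watermarked tokens every transition lands in a green list, giving a z-score around 14, while unwatermarked (uniformly random) text stays near 0; this gap is what makes detection statistical rather than heuristic, and also shows why paraphrasing, which rewrites the token sequence, degrades the signal.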
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
Cite this chapter
Da Silva Gameiro, H. (2024). LLM Detectors. In: Kucharavy, A., Plancherel, O., Mulder, V., Mermoud, A., Lenders, V. (eds) Large Language Models in Cybersecurity. Springer, Cham. https://doi.org/10.1007/978-3-031-54827-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54826-0
Online ISBN: 978-3-031-54827-7
eBook Packages: Computer Science, Computer Science (R0)