Generative AI models should include detection mechanisms as a condition for public release

The new wave of ‘foundation models’ (general-purpose generative AI models that produce text, e.g., ChatGPT, or images, e.g., MidJourney) represents a dramatic advance in the state of the art for AI. But their use also introduces a range of new risks, which has prompted an ongoing conversation about possible regulatory mechanisms. Here we propose a specific principle that should be incorporated into legislation: that any organization developing a foundation model intended for public use must demonstrate a reliable detection mechanism for the content it generates, as a condition of its public release. The detection mechanism should be made publicly available in a tool that allows users to query, for an arbitrary item of content, whether the item was generated (wholly or partly) by the model. In this paper, we argue that this requirement is technically feasible and would play an important role in reducing certain risks from new AI models in many domains. We also outline a number of options for the tool’s design, and summarize a number of points where further input from policymakers and researchers would be required.

The new class of generative AI models, sometimes termed 'foundation models' (FMs), has achieved dramatic advances in AI (Bommasani et al., 2022). Foundation models are trained on very large, domain-general datasets; after training, they have remarkable abilities to generate content of the kind they were trained on. For instance, ChatGPT can generate humanlike text and dialogue contributions; MidJourney can generate realistic images. While earlier AI systems were able to generate small amounts of content (for instance, suggesting spelling or style changes to an existing text, or making alterations to images), foundation models can generate high-quality content from scratch, from minimal prompts.
Foundation models also introduce a range of new risks (see again Bommasani et al., 2022). Policymakers and AI researchers are engaged in very active discussions about these risks, and the regulatory measures that might practically manage them (see Hurst, 2023 for a recent survey). In this paper, we focus on one key risk, concerning the provenance of FM-generated content. Texts or images created by FMs can now readily pass as human-generated (see e.g., Jakesh et al., 2023; Waltzer et al., 2023). As FM-generated content begins to flood the Web and the Apps ecosystem, human consumers of content will be faced with a brand new authentication problem: determining whether a given item they encounter was produced by a person or a machine.
Why is it important to know this? Emphatically not because human-produced content is always 'better' than FM-generated content: this is certainly not the case (Bubeck et al., 2023; Singhal et al., 2023). It is rather that human and FM-generated content need to be assessed very differently, because of their very different origin. Consider a piece of text, encountered by a human reader. In many contexts, her assessment of the text will run very differently if she knows it was generated by an AI system. If she is a teacher assessing a piece of student work, she may want to know how engaged the student has been with the text: have they read it closely, has its content been assimilated? How much learning has taken place? If she is an employer assessing a contractor's report, she may want to know how carefully the provider has overseen its generation: how much work has the contractor done in producing the report? If she is assessing the text as a content moderator working in a social media company, she may want to know whether it is part of a larger-scale communication campaign, given that FMs can readily generate personalized communications at scale, including harmful disinformation (e.g., Newsguard, 2023; Tamkin et al., 2021). If she is a citizen receiving the text as professional advice from her doctor or lawyer, she may want to know how thoroughly it has been checked for errors, given the known problems of errors in FM output (e.g., Ji et al., 2023) and overreliance on FM output by human operators (e.g., Wang et al., 2023).
In each case, the human assessor needs to know whether the text is human- or AI-generated, in order to make a proper assessment. The reasons for this need vary greatly between domains. In professional interactions they are about ensuring accuracy; in education they are about ensuring effective student assessment; in social media contexts they are about ensuring a safe Internet.
One might argue that human consumers have a general 'right to know' whether the content they encounter was produced by a person or a machine. In fact we argued for this position in a previous paper (GPAI, 2023). But a more pragmatic argument can also be advanced, that in specific domains and contexts, consumers have specific needs to know about the origin of the content they assess, which justify expenses incurred by the producer in providing this information. This is the kind of argument that justifies laws about labeling the ingredients in food: consumers have no universal right to know what is in the food they eat, but in specific products and sale contexts, their need justifies rules requiring some information to be given (see Messer et al., 2017 for an overview of relevant consumer law).
There are already many actual or proposed laws that require purveyors of AI-generated content to identify it as such. For instance, Article 52.1 of the AI Act being developed by the EU (EU, 2021) requires that AI systems interacting directly with users are clearly identified as AI systems; California's BOT Act already in force (SB1001, 2018) makes similar requirements in specific commercial and political use cases. But these laws do not meet the case we are considering, which is where AI-generated content is disseminated beyond the interactive tool through which it was generated, and consumers encounter it 'indirectly', in some arbitrary new online or offline context. Some laws cover this dissemination process, by placing obligations on the disseminator. For instance, the EU's proposed AI Act (Article 52.3) places obligations on people who disseminate one specific type of AI-generated content ('deep fakes') to label this content as AI-generated. This is a useful measure, but consumers cannot rely on disseminators of AI content doing the right thing, even if it is required by law. Regulation must also cater for disseminators who do not disclose the AI origin of the content they spread. We argue that consumers should have the ability to determine whether some arbitrary item they see was generated (wholly or partly) by FMs.
The only way we see to meet this consumer need at present is with a tool that allows automatic detection of FM-generated content. In the tool we envisage, the user uploads an arbitrary piece of online content, and the tool responds with an analysis of its human or machine provenance. We will discuss this analysis below; for now, our argument is that to help keep FM content generators safe, consumers need access to another AI tool, for the detection of FM-generated content.
A detection tool for FM-generated content would be valuable for companies that supply content to consumers, as well as for consumers themselves. Keeping social media platforms safe from large-scale disinformation campaigns is a pressing issue which poses considerable threats to democratic processes. A reliable detection tool for FM-generated content could be used by social media companies to detect and defuse such campaigns. The remainder of this paper is concerned with mechanisms that ensure that a detection tool of this kind can be made reliable.

A high-level proposal for legislation, and some questions for discussion
There are many tools that attempt to distinguish AI-generated from non-AI-generated content, both for text (e.g., Chaka, 2023) and images (e.g., Stroebel et al., 2023). But as FM generators improve, the ability of detectors to identify FM-generated content purely from an analysis of the content is likely to diminish rapidly (see e.g., Thompson & Hsu, 2023). Text generators are producing increasingly humanlike text, and image generators are producing increasingly realistic images: as generators get better at generalizing from their training inputs, the patterns that distinguish FM-generated content from authentic content necessarily become harder to identify. A consensus is emerging that the only way to create a reliable detector for FM-generated content, as generators improve, is to instrument the generator in some way, to support detection (see e.g., Kirchenbauer et al., 2023a; Tulchinskii et al., 2023). This 'instrumentation' might involve placing hidden patterns or 'watermarks' inside generated content that a detector can identify. But there are other methods too; we will review several options below. For now, the key point is that if reliable detection mechanisms require generators that are configured to support detection, then responsibility for workable detection mechanisms ultimately rests with the organizations that build the generators.
We suggest that legislation should recognise this responsibility. Specifically, we propose that any organization that develops a foundation model intended for public use should be required by law to demonstrate a reliable detection tool for the content the model generates, as a condition for its release to the public. After release, the detection tool should be freely available to the public.
We made this proposal in an earlier paper (GPAI, 2023), and it has stimulated considerable discussion amongst AI researchers and policymakers. In the remainder of the current paper, we will summarize the main issues that have arisen in this discussion, and our initial thoughts on these. Our focus is on the high-level policy questions that should be resolved before any detailed legislation is drafted.
What generative models are in scope for the proposed rule?
as we discuss below. A final question concerns how complex or realistic a generator needs to be before our rule applies. We suggest realism is a more appropriate criterion than complexity, given the possibility of distilling smaller models from large ones (Hinton et al., 2015). Naturally, the most realistic generators will be the ones most used by the public, so a definition focussing on public use may be sufficient here.

Possible detection methods
There are several ways of instrumenting an AI content generator to support detection. One is to include watermarks in the generated content. This method has been demonstrated for text and image generators (see, e.g., Kirchenbauer et al., 2023a, 2023b; Zhao et al., 2023). Other methods involve exploiting statistical features of FM-generated content (see, e.g., Mitchell et al., 2023 for a method operating on text content). A final method, which we feel needs more attention, is for the producer organization simply to keep a (private) log of all the content it generates: a detector tool can then be implemented as a regular plagiarism detector operating on this log. This method was recently demonstrated for text by Krishna et al. (2023). A plagiarism detector is essentially an information retrieval (IR) device: the companies at the forefront of FM content generation also have huge expertise in this area, and would be very well placed to provide detectors of this type. Other, better methods may well be discovered as research advances. To future-proof legislation, it should avoid mention of particular methods, and simply require 'a reliable detection mechanism'.
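To make the logging idea concrete, the following is a minimal Python sketch of a log-based detector. It is an illustration only: the class `GenerationLog`, the helper `shingles` and the choice of 5-word shingles are our own assumptions, and the exact word-overlap matching used here is much simpler than the retrieval scheme of Krishna et al. (2023), so it would not by itself resist paraphrasing.

```python
import re


def shingles(text, n=5):
    """Return the set of lowercased word n-grams (shingles) in `text`."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


class GenerationLog:
    """Hypothetical private log of everything a generator has produced."""

    def __init__(self, n=5):
        self.n = n          # shingle length in words (assumed value)
        self.seen = set()   # all shingles observed in generated output

    def record(self, generated_text):
        """Called by the generator each time it emits content."""
        self.seen |= shingles(generated_text, self.n)

    def overlap(self, candidate):
        """Fraction of the candidate's shingles found in the log.

        Returns None when the candidate is too short to form a shingle,
        in which case no judgement is offered.
        """
        cand = shingles(candidate, self.n)
        if not cand:
            return None
        return len(cand & self.seen) / len(cand)
```

A detector built this way would call `record` on every output at generation time, and `overlap` on any item a user submits; a high overlap suggests the item, or part of it, came from the logged generator.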

The detector's response format
What information would the detector return, when given an input document? As a concrete basis for discussion: for textual input, we currently envisage an analysis similar to that given by plagiarism detectors such as TurnItIn (TurnItIn, 2021). For a short text, the tool returns a probability (with some confidence interval) that it was generated by an FM. It may refrain from any output for very short texts, where confidence is necessarily low. For a longer text, it might identify specific segments that have some super-threshold probability of being FM-generated, again with confidence intervals. (Current commercial detectors such as GPTZero and open-source detectors such as GLTR have some of this functionality.) In cases where small FM-generated 'suggestions' are interleaved throughout a document, we envisage the tool should treat the text as human-generated if these are sparse, and AI-generated if they are dense. Images can similarly be analyzed as wholes or by parts. (FM generators can be asked to produce a specified region of an image, and humans can also post-process certain parts of an image.)
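One possible shape for such a response is sketched below in Python. The type names (`SegmentFlag`, `DetectorReport`), the function `build_report`, the 50-word minimum and the 0.8 threshold are all illustrative assumptions rather than part of the proposal; the probabilities and intervals are taken as inputs, since how they are computed depends on the detection method.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (lower, upper) bounds of a confidence interval


@dataclass
class SegmentFlag:
    start: int            # character offset where the segment begins
    end: int              # character offset where the segment ends
    probability: float    # estimated probability the segment is FM-generated
    interval: Interval


@dataclass
class DetectorReport:
    probability: Optional[float]   # None means the tool abstains
    interval: Optional[Interval]
    flagged: List[SegmentFlag] = field(default_factory=list)


def build_report(text, overall, segments, min_words=50, threshold=0.8):
    """Assemble a report from scores computed elsewhere.

    `overall` is a (probability, interval) pair for the whole text;
    `segments` is a list of SegmentFlag objects covering parts of it.
    """
    # Abstain on very short texts, where confidence is necessarily low.
    if len(text.split()) < min_words:
        return DetectorReport(None, None)
    probability, interval = overall
    # Report only segments whose probability exceeds the threshold.
    flagged = [s for s in segments if s.probability > threshold]
    return DetectorReport(probability, interval, flagged)
```

Returning `None` rather than a low-confidence number makes the abstention explicit to whatever interface presents the report to the user.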

Aggregation of detectors in a user-facing tool
In the proposed rule, an organization providing an FM generation system must make available a detector for content produced by that system. Users obviously need a tool that calls detectors for all generation systems in common public use, and aggregates their responses. Clearly, an aggregator can only target the most commonly used generators, if it is to be practical. But the market share for generators is likely to be heavily skewed towards a few 'winning' systems at any time (see Hefti & Lareida, 2021 for a recent analysis), so a focus on commonly used generators will still provide reasonable coverage. Who should provide this aggregator? There are various possibilities. It could be a commercial company, or a non-profit organization (academia, user group), or an international regulator of some kind. It might also be the FM-generation companies themselves. Note these companies have their own pressing commercial needs for a tool detecting FM-generated content, so they can avoid the 'model collapse' that may potentially occur when a content generator is iteratively retrained on its own output (see Shumailov et al., 2023).
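A minimal sketch of such an aggregator in Python follows. It assumes, purely for illustration, that each provider's detector can be wrapped as a function from content to a probability (or `None` if it abstains); the name `aggregate` and the rule of reporting the highest-scoring generator are our own choices, and other combination rules are possible.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed interface: a detector maps content to the probability that its own
# generator produced that content, or None if it declines to judge.
Detector = Callable[[str], Optional[float]]


def aggregate(
    content: str, detectors: Dict[str, Detector]
) -> Tuple[Optional[str], Dict[str, float]]:
    """Query every registered detector and collect the scores returned.

    Returns the name of the generator with the highest score (None if every
    detector abstained) together with the per-generator scores.
    """
    scores: Dict[str, float] = {}
    for name, detect in detectors.items():
        score = detect(content)
        if score is not None:
            scores[name] = score
    if not scores:
        return None, scores
    most_likely = max(scores, key=scores.get)
    return most_likely, scores
```

Keeping the per-generator scores alongside the top result lets the user-facing tool show which systems were consulted, which matters because the aggregator covers only the most commonly used generators.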

Resistance to adversarial attacks
Any detector tool will naturally be attacked by people seeking to evade detection. For texts, the most commonly discussed attack method at present is by passing the text through an automated 'paraphrasing' system, which changes its form but retains its meaning. Sadasivan et al. (2023) note this method is quite effective against watermarking schemes. (Other methods for evading watermarking schemes are discussed by Jiang et al., 2023; Shi et al., 2023.) Krishna et al.'s logging scheme appears more resistant to paraphrase attacks. But here too, we should anticipate effective attacks in due course. An arms race will naturally play out between detection methods and evasion methods, whether or not detection methods are mandated by law. If there is a law, as we propose, it should require a detector that is reliable 'in the current adversarial context', whatever that is.
As evasion methods mature, it may be that detection methods require broader systems for guaranteeing the provenance of content: for instance, agreements to track and share the provenance of identifiable source material through, and onwards from, paraphrasing products. (These systems could also provide methods for authenticating the human origin of content.) Organizations would have to collaborate in developing systems of this kind. (Again, given companies' shared interest in workable detection systems to prevent 'model collapse', such collaboration is likely a viable proposition.) Crucially, it would be for the agency developing a new FM generator to demonstrate a detection method that is effective in the current adversarial context, and show its practicality, either unilaterally, or in collaboration with other groups. Naturally, each new detection method will elicit new attacks: so our proposed rule will not lead to a perfect detection system for consumers. But it will help to keep consumers safe.

Cost of providing a detection tool
A detection tool has a certain cost, both in its development and in its deployment to users. But we should note that AI-generated content detection is emerging as a commercial field in its own right (see e.g., Marshall, 2023). While companies would provide their detector free of charge to users in our proposed scheme, they could likely generate revenue through advertising. Smaller companies should be able to build on open-source detector tools, which will help limit costs. State agencies could also fund research on detection tools, which then could be made available to companies; arguably states have some responsibility in providing AI safety 'infrastructure' of this kind, especially if they enact rules that require such infrastructure. When considering cost, it is also important to bear in mind the cost of not having a reliable detection tool, both on individual users in specific domains (e.g., the additional costs for teachers, in checking for AI-generated work) and more generally on society (e.g., the destabilization of democracies through AI-generated disinformation).

What counts as a reliable tool?
Any detector tool can be expected to make errors, both false positives (identification of human-generated text as AI-generated) and false negatives (identification of AI-generated texts as human-generated). Decisions will have to be made as to what level of these errors is acceptable. These decisions should be part of the interpretation of the law, rather than the law itself, as they may also change as technology and adversarial methods advance. But the basic evaluation principle can be clearly stated: a classifier's performance must be tested on a sample of AI-generated and human-generated content unseen during its training.
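The two error types defined above can be measured on a held-out labelled sample as in the following Python sketch. The function name `error_rates` and the boolean encoding (True for AI-generated) are assumptions made for illustration; which error levels count as acceptable is left open, as in the text.

```python
def error_rates(labels, predictions):
    """Compute (false positive rate, false negative rate) on a held-out sample.

    `labels[i]` is True if item i really is AI-generated; `predictions[i]`
    is True if the detector said it was. A rate is None when the sample
    contains no items of the relevant class.
    """
    pairs = list(zip(labels, predictions))
    # False positive: human-generated content identified as AI-generated.
    false_pos = sum(1 for actual, predicted in pairs if not actual and predicted)
    # False negative: AI-generated content identified as human-generated.
    false_neg = sum(1 for actual, predicted in pairs if actual and not predicted)
    n_human = sum(1 for actual, _ in pairs if not actual)
    n_ai = sum(1 for actual, _ in pairs if actual)
    fpr = false_pos / n_human if n_human else None
    fnr = false_neg / n_ai if n_ai else None
    return fpr, fnr
```

For the result to say anything about reliability, the sample passed in must be disjoint from the data the detector was trained on.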

Enforcement for open-source generator models
Providers of open-source FMs would also have to comply with our proposed rule, and to supply detector mechanisms for the content their models produce. But enforcing this compliance is likely to be harder for open-source providers than for other providers, because versions of open-source software can proliferate more readily. Nonetheless, there is some useful structure to this proliferation. Within the open-source world, the vast majority of FMs are built as modifications of a small set of high-profile core models (see e.g., Gao & Gao, 2023, for evidence from Hugging Face's language model collection). If the core models comply when first released, and include licenses that require compliance to be maintained, this should provide some support for compliance in the open-source ecosystem. It may also be possible to make the compliance code hard to remove, for instance by 'obfuscating' it (see Goldwasser & Rotblum, 2007 and subsequent work). Independently of this, any open-source generators that attract a large user base will necessarily become visible to enforcement agencies. But generators used by smaller groups (for instance, state-sponsored bad actors) are likely to be harder to find. Of course, actors of this kind won't comply with our proposed law, and regular policing methods for identifying the origin of illegal content will have to be used.

Current initiatives by companies and legislators
Several of the large AI companies have recently announced an initiative to include watermarks in AI-generated audio and visual content (see White House, 2023). This is a good initiative, but it is some way from the scheme we are proposing. For one thing, our proposal extends to FM-generated text as well as audio and visual content. But more importantly, our suggested rule makes reference to an objective (a reliable detection tool) rather than to a specific mechanism such as watermarking. On the legislation front, the EU Parliament has made some reference to our proposal in the amendments it recently agreed to its proposed AI Act catering specifically for FMs (EU, 2023). An amendment to Recital 60g states that generative foundation models 'should ensure transparency about the fact the content [they produce] is generated by an AI system, not by humans'. This amendment is pushing in the right direction. But again, we suggest this requirement should be stated more precisely, by making reference to a workable detection tool. And the intention behind the recital should also be fully reflected in the Act's Articles, most likely in Article 28b (obligations on distributors) and/or Article 52 (transparency obligations).
We look forward to a productive discussion with legislators, companies and other stakeholders about these open questions.
anonymous reviewers for their useful comments. The views expressed here, and any remaining errors, are of course our own.
Funding Open Access funding enabled and organized by CAUL and its Member Institutions.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.