1 Introduction

This article builds on our earlier paper [46], further refining our CCC framework and applying it to a wider range of GenAI systems and contexts.

The recent wave of generative AI systems (GenAI) that competently produce text, images, code and other outputs from human prompts (e.g., Stable Diffusion, ChatGPT, GitHub Copilot) has raised a host of social, ethical, legal and regulatory challenges. In the public and academic debate, central points of contention range from safety and responsibility in regard to offensive or untruthful outputs, through issues of plagiarism, rights infringements, illegitimate scraping of training data and associated harms to artists and other originators, to the disruptive potentials of GenAI for education systems and labor markets [23, 39, 43, 50, 60, 73, 81, 86]. Many of these controversies have since proceeded to large-scale litigation and industrial action, such as the 2023 Writers Guild of America strike, which involved controversy around the use of large language models (LLMs) to substitute for screenplay writers; litigation by Getty Images and others against developers of text-to-image GenAI systems like OpenAI’s DALL-E2 and Stability.ai’s Stable Diffusion; class action lawsuits against GitHub, Microsoft and OpenAI in regard to Copilot, Codex and related systems that can produce code from user prompts; and lawsuits by novelists and other writers against OpenAI for infringing copyright and violating fair use norms by using their works to develop LLMs [30, 69, 72, 74]. In addition, the public debate sees continuing disagreement over how to understand what, exactly, happens when users interact with GenAI systems, e.g., in creative industries, where digital artists are struggling with defining their own roles in relation to GenAI systems that automate and transform processes previously performed by humans [1, 17, 61, 71, 76]. As these ongoing controversies indicate, there is significant moral, legal, and practical uncertainty surrounding the development and use of GenAI technologies, and this uncertainty must be resolved.

In this paper, we contribute to this project by engaging one of the key questions at the heart of these controversies: the creatorship question. The creatorship question asks: who should be understood as a creator of the outputs produced with the help of GenAI systems; who deserves credit for these outputs and can make ownership and copyright claims; and who is responsible for the outputs, including for negative attributes or consequences they might have? For instance, do users create outputs, with GenAI systems serving simply as tools, like cameras or pencils? Do artists and writers have claims to co-creatorship of, and credit for, GenAI outputs when GenAI systems are trained on their works and enabled to mimic their distinctive styles? Can users claim creatorship and perhaps copyright for GenAI outputs, even if they invested neither skill nor effort? And who is responsible for broader features of GenAI systems and their outputs, e.g., LLMs’ tendencies to produce untruthful or unsafe text? Providing informative answers to these questions is crucial as they shape our understanding of what constitutes responsible development and use of GenAI technologies; who can own and profit from their outputs; and whose contributions may and should get credited and compensated. GenAI creatorship is hence, at bottom, a computer ethics issue [59].

To address these urgent questions, we propose the collective-centered creation (CCC) framework, a conceptual framework to attribute credit for generative AI outputs. Centrally, CCC insists that it is misleading to ask for the creator of an output: creating outputs such as images, text and code with the help of GenAI is always a collective endeavor involving many hands. Human users, creators of training data, developers, and even GenAI systems themselves may each play a significant role in the co-production of specific outputs, which may each warrant credit and derivative claims to copyright protection or compensation. To better understand which agents, entities and resources play what roles in producing GenAI outputs and to allocate credit and responsibility for these outputs where it is due, our CCC framework offers a rich menu of conceptual resources. These resources, we show, can better track different contributors’ roles than existing views and help attribute credit more accurately, avoiding misattributions and the injustices they can bring about. Inadequate attributions of credit not only raise moral problems (e.g., when the role of writers and artists whose works are essential for training GenAI systems is not sufficiently recognized), but also have economic and social consequences, affecting how we value works and who benefits from them [41]. Moreover, credit allocation is important for the public’s ability to interpret and assess the validity and meaning of works [9, 42].

Focusing mainly on visual GenAI systems (i.e., text-to-image systems like DALL-E2, Stable Diffusion and Midjourney), we show how CCC advances ongoing debates and resolves controversies by furnishing arguments that strengthen existing positions and introduce novel stances on controversial issues. First, CCC helps highlight that creators of training data (e.g. photographers, digital artists, novelists, programmers) may often play crucial roles in the production of outputs and can therefore have strong co-creatorship claims, which can reinforce existing complaints about web-scale scraping of training data on grounds that are distinct from copyright infringement allegations. Relatedly, while CCC clarifies that users can be creators of GenAI outputs when they play the right kind of role (e.g., make sufficiently relevant and original contributions), it also insists that users will rarely be the sole creators of outputs, as other agents may have similarly strong claims to co-creatorship and credit, which must be acknowledged. Second, CCC draws out novel ways to understand how humans and machines co-produce outputs by clarifying how GenAI systems themselves can be considered co-creators in their own right, sometimes making contributions to outputs that do not simply reduce to inputs provided by users, developers or creators of training data. Recognizing this is helpful for pushing back on cases where users over-credit themselves, and responds to larger questions about the copyrightability of GenAI outputs: if a user’s contributions are minimal, their claims to co-creatorship may be too weak to ground copyright claims. Third, CCC stresses that developers can have significant responsibilities for GenAI outputs at a general level. Although they are usually not directly involved in producing any specific output, and are thus not co-creators, CCC shows that they exert high-level control over, and bear responsibility for, general affordances and tendencies of GenAI systems. 
While CCC does not aim to offer definitive judgments on who has claims to creatorship, and does not offer practical recipes for how credit, or copyright, may be distributed more widely, CCC can inform efforts to do so; reduce uncertainty; and help us better understand, navigate and resolve real-world controversies.

We proceed as follows. Section 2 develops CCC’s main conceptual resources. Section 3 illustrates how CCC may be applied to real cases, focusing on visual GenAI systems, and expands on how CCC can reinforce existing intuitions on creatorship as well as generate new insights that help move current debates over GenAI usage forward. Section 4 provides an outlook on how CCC may be applied to LLMs used for creative writing, marketing copy, essay writing and generating code and draws out additional, unique insights that arise in each domain. Section 5 comments on how CCC can inform larger public and academic debates. Section 6 concludes.

2 The collective-centered creation view

Who should count as a creator has long been contested, especially in collaborative environments such as film-making and artists’ workshops [29, 56, 84] (footnote 2). Attributes such as intention, autonomy and creative agency have been held up as vital to qualifying as a creator [4, 5, 18, 37, 58], but identifying whether someone displays these can be challenging in practice. The rise of GenAI technologies has muddied these waters further, introducing more complex and diffuse webs of possible contributors for any given output. Much disagreement and uncertainty currently exists amongst GenAI users, commentators, academics and technologists on the issue of creatorship in this new context, with no clear consensus in sight [31, 32, 33, 45, 55, 68, 83]. Some insist that GenAI is simply a new tool, like a camera or Photoshop, and that users are the sole creators of what results from their prompts [2, 37, 61, 62, 64]. Others see GenAI itself as wielding creative control, even going so far as to say that AI can autonomously create outputs, minimizing the significance of human inputs [11, 15, 21, 44, 47]. Yet others see the role of creator falling to those who created GenAI systems in the first place, emphasizing the ultimate control that developers’ design choices have over outputs [2]. And finally, some see the use of GenAI as a collaborative endeavor that involves all these groups [1, 17, 27, 54, 61, 76].

While the latter, collaborative view seems, to us, the most plausible contender, very little has been said on how credit may be distributed amongst collaborators. The most direct suggestions regarding visual GenAI have been made by legal scholars Benhamou and Andrijevic [10], albeit solely with a view to copyright and without considering the role of GenAI systems themselves. Scholars such as McCormack et al. [58] have agreed that “[a]uthors have a responsibility to accurately represent the process used to generate a work, including the labour of both machines and other people” [58, p.13], and Anscomb asserts that AI might deserve some of the credit for the production of artworks [4].

But how could we go about ascertaining the need for this credit in individual cases, and then apportioning it? As Epstein et al. [22] and Jago and Carroll [41] suggest, people are vulnerable to allocating credit based on questionable criteria, such as anthropomorphicity, so there is a need to understand and communicate different contributors’ involvement on conceptually firmer grounds. In the spirit of related approaches, such as Jenkins and Lin’s proposals for determining credit for AI-generated text [42], the CCC framework we develop here maintains that GenAI can be part of a co-creating collective, but also provides richer resources to understand different agents’ and entities’ roles within a collective.

According to our CCC (collective-centered creation) framework (footnote 3), the very starting question ‘who is the creator?’ is misleading: creation is a collective achievement, and credit distribution depends on the nature and significance of the contributions made. Specifically, CCC maintains that for most cases of creation using GenAI:

  • There is no clear single creator (footnote 4) who can be credited with an output (footnote 5).

  • A collective of actors and entities all made important contributions to an output.

  • Credit for this output should be distributed between these contributors according to the nature and significance of the contributions made.

CCC, of course, is not the first view to emphasize that artistic, literary and other forms of production often take the format of co-creation. But contra existing views, CCC does not aim at offering neat, principled categorizations between different subgroups of agents, e.g. authors, creators, contributors, assistants [4, 6, 7, 9, 29, 38, 54, 56]. While we agree that making such distinctions can be sensible, we also think that they should be grounded in a conceptually richer analysis that tracks important primary features of contributors and their contributions, especially regarding GenAI. CCC, then, starts bottom-up, by first analyzing which features matter for determining inclusion in a co-creating collective. Pencils and hard drives will not make the cut—not because we say so, but because they do not score highly on relevant criteria. CCC hence provides conceptual machinery that specifies the sorts of considerations we should entertain when seeking to clarify creatorship and locate our disagreements.

Let us explain which features CCC uses to understand creatorship and distribute credit (footnote 6). To do so, we first focus on generative visual AI systems for image creation, such as OpenAI’s DALL-E2, Stability.ai’s Stable Diffusion, Midjourney and related systems [63, 65], which allow users to steer image synthesis through a combination of text and image prompts for conditioning. To develop CCC’s conceptual inventory, we use a series of imagined cases, with and without GenAI, to draw out key intuitions and explain how CCC functions.

2.1 Relevance/(non-)redundancy and control

The first feature CCC uses to understand creatorship comes as a bundle: relevance and non-redundancy track what difference a contribution makes to an output. They are causal-counterfactual notions: to determine how relevant or (non-)redundant a contribution X is to an output Y, we must answer the counterfactual question, ‘take X away, what would the output Y have looked like?’ If a contribution is not relevant, or relevant but highly redundant, Y will remain the same. For instance, if Jo and Jake produce a painting, where Jo does all the painting and Jake’s role is to hand Jo the brushes, we might think that Jake is not terribly relevant and can be made redundant. Take Jake away, and the output would have been the same, either because Jo gets the brushes herself, or because someone else fills in for Jake. By contrast, consider Jerome, who takes a more active role in suggesting a certain composition or what brush could be the right one to achieve a certain texture. Jo and Jerome engage in a symbiotic relationship, with Jerome asking questions, making suggestions, adding interpretations and so on. Jerome’s involvement, let us imagine, makes a difference to the output: the painting would be different if Jerome were not there, and it might be difficult to replace Jerome. Jerome hence scores highly for relevance/non-redundancy. Lastly, consider Jake making a solo attempt to produce an image of a ‘cat on a mat’ using Stable Diffusion. Take away his access to the system, and Jake would fail to produce the image, because he lacks relevant skills to make it another way. Here, the system’s contribution is highly relevant and non-redundant. Generally, the more relevant and non-redundant a contribution, the stronger the claim to inclusion in a co-creating collective (footnote 7).
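The counterfactual question ‘take X away, what would the output Y have looked like?’ can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the function names `produce` and `makes_a_difference` and the feature strings are hypothetical, the output is reduced to a tuple of labels, and the test is binary, whereas relevance and non-redundancy come in degrees. It also does not model substitutes stepping in for a removed contributor.

```python
def produce(contributors):
    """Toy model of an output as a function of who contributed (hypothetical)."""
    if "Jo" not in contributors:
        return None  # without the painter there is no painting
    features = ["painted by Jo"]
    if "Jerome" in contributors:
        features.append("composition shaped by Jerome")
    # Jake only hands over brushes, so his presence leaves the output unchanged.
    return tuple(features)


def makes_a_difference(contributor, collective):
    """Counterfactual test: does removing the contributor change the output?"""
    return produce(collective) != produce(collective - {contributor})


collective = frozenset({"Jo", "Jake", "Jerome"})
for name in sorted(collective):
    print(name, makes_a_difference(name, collective))
```

In this toy setting, removing Jake leaves the output untouched, while removing Jo or Jerome changes it, mirroring the intuitions in the painting example.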

A second feature that is closely related to relevance and non-redundancy is control [82]. Control tracks how precisely and robustly an agent or entity can steer or maintain an output. Intuitively, control may seem to involve intention, but we render it as a deflationary notion that only requires causal powers to make an output be a certain way rather than another. Consider Jo, who iteratively refines her prompts to get the precise image she wants. Jo exerts a high degree of control and can thus stake a strong claim to creatorship. By contrast, consider Jake again, who casually prompts Stable Diffusion with ‘cat on a mat’. Does Jake exhibit control? Not necessarily. Diffusion models begin synthesis from quasi-random noise patterns that are determined by a seed number, which can change from prompt to prompt. Importantly, one and the same prompt can yield dramatically different outputs depending on the seed [71]. So, Jake might end up with an entirely different image if the seed were different. Jake, in this case, does not exercise much control if he is happy with whatever output he gets. There is no back-and-forth interaction of the kind seen in Jo’s iterative endeavor, in which Jake would work against the randomness of diffusion-based image synthesis to realize a specific result.
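The role of the seed can be illustrated with a minimal sketch. It is not a diffusion model: the hypothetical helper `initial_noise` merely stands in for the starting noise pattern, using Python’s standard `random` module, to show that the pattern is fully determined by the seed and changes when the seed changes.

```python
import random


def initial_noise(seed, size=8):
    """Return a reproducible list of Gaussian noise values for a given seed.

    Hypothetical stand-in for the noise pattern that image synthesis starts from.
    """
    rng = random.Random(seed)  # independent generator, fully determined by the seed
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(seed=1)
same_b = initial_noise(seed=1)
other = initial_noise(seed=2)

print(same_a == same_b)  # compares two runs with the same seed
print(same_a == other)   # compares runs with different seeds
```

A user who accepts whatever a fresh seed yields leaves this source of variation unchecked, whereas a user who fixes or varies seeds while refining prompts works against it.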

Three further points help fine-grain control. First, control can be dispositional in a way that relevance and non-redundancy are not: an individual does not always need to exert actual influence in order to exhibit control, but they must be able to if the need arises. Consider a variation of Jo’s case where she is lucky to get the exact image she wants on the first try. We might still maintain that Jo exhibits control if it is true that she would have intervened successfully, had the output diverged from her expectations. Similarly, we might say that Stable Diffusion exhibits control over an output if it would have robustly produced the same output even if Jake had tried to steer it towards another. Second, control is zero-sum: the less control a user exercises, the more control the GenAI system has (footnote 8). So, when clarifying control, we ask (1) how counterfactually robust an output’s features are, and (2) due to whom. Finally, it is possible to have control over different aspects of the output to different extents, for example, control over the form of the output (e.g., whether it is a painting or a sculpture, what textures it has) or its content (e.g. the subtle meanings conveyed by a painting). We will unpack this further in Sect. 4.

Relevance, redundancy and control are thorny concepts, as they all hinge on (appropriate) counterfactuals. Whether Jake would have been able to produce ‘cat on a mat’ without Stable Diffusion, for example, might depend on whether we ask for the exact pixel-by-pixel image or just something in the ballpark. But even if we have clear counterfactuals in mind, learning them empirically is also difficult, e.g., telling what Jo’s painting would have looked like without Jerome’s suggestions or whether Jo would have successfully intervened if the GenAI had not produced what she wanted right away. These challenges are not unique to CCC but obtain in many areas, e.g., in legal reasoning, where we routinely assess what would have happened if people had acted differently. Difficult as this may be in practice, considering relevance, (non-)redundancy and control is essential for understanding creatorship.

2.2 Originality

Originality concerns how original a contribution is, i.e., whether it is novel in character and unique to a contributor. This is related to but different from the originality of an output, which is not our focus here. Let us assume some recognizably original output is generated. A key question for clarifying creatorship is: whose original contributions helped achieve that output’s originality? A natural starting point is to look at users’ text/image prompts. Suppose that there has never before existed an image of a Donald Trump-shaped cheese wheel rolling down a hill. A user’s idea to produce such an image and their formulating a prompt that corresponds to it would constitute an original contribution. By contrast, a generic prompt such as ‘cute dog’ would not score highly. But prompts are not all that is needed to make an image: a GenAI system itself must be disposed in the right way to produce images that correspond well to user prompts. Specifically, GenAI systems may make original contributions to the production of original outputs, when, at training, the systems latch onto text-image relationships in original ways, e.g. by learning novel representations and relationships between them that can be used to competently synthesize, for instance, what a Donald Trump-shaped cheese wheel rolling down a hill would look like. Here, a mere collage might not be enough: success is measured by whether the system made original connections that help synthesize a coherent visual entity that recognizably looks like (1) Donald Trump, (2) a cheese wheel and (3) like it is rolling down a hill.

Right away, one might insist that originality still ultimately comes from the user—after all, it was them who prompted the system in a certain, original way. But while coming up with the ‘what’ may often involve originality on the part of the user, concretizing the ‘how’ may also require originality on the part of a GenAI system. This is best understood in cases where a user is unable to imagine how an image corresponding to their prompt could look. Take Jerome, who prompts Midjourney to produce an image encapsulating ‘the abstract feeling of realizing that you did not tell your parents that you loved them enough’. Here, Jerome might only learn about how this feeling could be visualized once he sees the output. If Jerome thinks it captures the feeling well, and there have not been previous attempts to visualize the feeling with similar results, it seems like Midjourney, too, has made original contributions to producing the output.

Even so, one might wonder where, exactly, we could locate originality in GenAI systems’ contributions. For instance, one might insist that the computations performed by GenAI systems are ‘deterministic’ or ‘always the same’, regardless of whether an output is original. To clarify, we do not claim that there is a mysterious originality property to be found (or not found) anywhere at the computational level. But—like in descriptions of human contributions where brain scans might be indistinguishable between a truly creative and an unoriginal prompter—some token-level macro behaviors that GenAI systems exhibit can nevertheless be usefully characterized by ascriptions of originality (e.g. learning a latent manifold that enables them to produce novel images or following a specific denoising trajectory towards a coherent rendition of ‘Donald Trump-shaped cheese wheel rolling down a hill’). We also do not claim that GenAI systems are always or routinely original. GenAI systems are prone to (near-)reproducing existing works and styles, raising concerns about (near-)plagiarism [53, 80]. So, our suggestion is that, especially in cases where output originality is granted but cannot be fully accounted for by reference to user contributions, GenAI systems may reasonably be described as making original contributions of their own.

2.3 Time/effort

The time and effort an agent or entity spends on furnishing a contribution can matter for their claim to inclusion in a co-creating collective, too. Consider Jake again: even if his brush-handing contributions are not highly relevant, if Jo recruited him for hundreds of hours in order to get the piece done, Jake may nevertheless have some claim to being included in a co-creating collective. Time and effort capture the ‘doing’ of creation (without which an idea would never transform into an output) and matter distributively in relation to others’ contributions. The production of a given output will have involved a certain amount of time and effort, and this feature asks what proportion of that time and effort was spent by each contributor. For GenAI systems, time and effort are best understood as tracking the computational complexity and compute effort (e.g., FLOPs) involved in furnishing a contribution. While GenAI systems are certainly faster than humans at producing images once trained, what matters are the computational efforts needed to furnish GenAI outputs, and these can be significant. For other collaborators, like the brush-handing Jake, this feature can recognize their labor, while not, of course, guaranteeing a strong creatorship claim across CCC as a whole. Time and effort are only one feature of the framework, after all, and they are crude and difficult metrics: spent inefficiently, they should not count for much; there will be cases where they are not relevant; and they introduce additional complexities we cannot address here, such as how to make commensurable the time and effort spent by contributors who differ in efficiency and skill. But without tracking them, CCC would be incomplete, unable to account for extended efforts and sacrifices, e.g. opportunity costs, that are distinctive of some kinds of contributions.

2.4 Leadership and independence

Leadership captures whether a contributor steered the production of an output with a specific intention in mind. For instance, Jo may have a concrete vision for an image, choose a particular method for the job, say by involving Stable Diffusion, and pursue that vision by refining her prompts to realize a specific output. Jake, by contrast, may deploy a generic prompt like ‘cat on a mat’ and turn out happy with whatever result he gets. While there is intention involved, he does not exert a great deal of leadership. Leadership is closely related to control, i.e. the ability to precisely and robustly steer or maintain an output. Yet, while successful leadership involves control, it differs from mere control in that it also involves intentions, e.g., identifying, setting and pursuing goals and directing available means to reach them.

Second, independence tracks whether a contributor depends on detailed guidance to furnish their contribution or whether they act in a more autonomous way. Jo and Jerome might be independent in that sense, both coming up with suggestions for what a painting could look like. Jake, by contrast, would not make independent contributions if his role is to hand Jo the brushes she requests.

While leadership and independence are important, they should not be overemphasized. For instance, leadership roles frequently fall on agents ready to disproportionately absorb credit, such as when a famed film director’s artistic vision is emphasized as key to achieving a significant work, but other agents’ creative contributions that fill important blanks are left underrecognized. Nuancing the role of leadership and independence is especially relevant as GenAI systems have a hard time exhibiting these features at levels comparable to humans. For lack of intentions, they cannot exhibit leadership but only control. Likewise, they cannot exhibit full-fledged forms of independence that humans can, e.g. changing a prompt to deliver a different, better output. However, GenAI systems may still exhibit some thinner forms of independence at training that carry through to the ultimate outputs. Within the confines of a learning task defined by humans, the deep neural networks (DNNs) at the heart of GenAI systems must be sufficiently flexible to learn whatever there is to learn—that is the point of machine learning. Weights and biases are not hand-tuned by humans, and while humans write training algorithms and build system architectures, they do not fully determine what a system learns in particular (e.g. which representations), especially in unsupervised or self-supervised regimes. So, while GenAI systems are not independent in the sense of ‘choosing to do it their own way’, and what they end up learning is still importantly shaped by human aims, leadership and oversight [4, 58], we maintain that GenAI systems can nevertheless exhibit some forms of independence if what they learn and later draw on at inference is not fully determined by humans.

2.5 Directness

Finally, directness captures how directly a contribution is involved in producing an output. For instance, imagine that cash-strapped Jo could not produce any paintings if it was not for her friend Jack, who provides her studio space rent-free. Jack’s help is highly relevant and non-redundant, but not direct: his aid will support Jo, let us assume, in producing whatever paintings she wants to make but does not causally influence the form of any specific painting. Contrast this with Jerome, who is dialectically engaging with Jo to co-shape their open-ended artistic endeavor. His contributions are, therefore, both highly relevant and direct. Like Jerome, GenAI systems can make direct contributions. The computations performed at inference directly generate the ultimate outputs at issue.

Directness plays a special role among the features CCC tracks: it modulates the extent to which other features matter for creatorship. Take developers: without their efforts in building GenAI systems, most users would not be able to produce the images they do. But developers do not make direct contributions to the creation of specific images. Rather, their contributions primarily consist in building GenAI systems that have the capacity to produce images. This is an important achievement but not to be conflated with the production of specific images, to which developers usually contribute only in an indirect, enabling way. So, despite developers’ high causal relevance to specific outputs, this relevance must be appropriately discounted by the typically low directness of their contributions (more on this later). Generally, then, the less direct a contribution is overall, the less strongly the other features that a contribution exhibits weigh in determining its significance.
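The modulating role of directness can be made concrete with a deliberately simple sketch. Everything in it is an assumption made for illustration: CCC does not assign numeric scores, prescribe an aggregation rule or set thresholds, and the `Contribution` class, the multiplicative discount in `discounted_significance` and all numbers are hypothetical. The sketch shows only one possible way in which low directness could reduce the weight of otherwise high scores.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    name: str
    directness: float      # 0.0 (purely enabling) to 1.0 (directly makes the output)
    other_features: dict   # hypothetical degrees in [0, 1] for the remaining features


def discounted_significance(c):
    """Average the other features, then scale by directness (an assumed rule)."""
    if not c.other_features:
        return 0.0
    mean = sum(c.other_features.values()) / len(c.other_features)
    return c.directness * mean


# Invented numbers: a developer with high relevance and effort but low directness,
# and a user with more modest scores but a direct role in a specific output.
developer = Contribution("developer", 0.1, {"relevance": 0.9, "effort": 0.9})
user = Contribution("user", 0.8, {"relevance": 0.6, "effort": 0.3})

for c in (developer, user):
    print(c.name, round(discounted_significance(c), 2))
```

Under these invented numbers, the developer’s high relevance and effort count for less than the user’s more modest but direct contribution, which is the qualitative pattern described above.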

2.6 Putting CCC together

Let us look at how the CCC framework functions as a whole. First, all the features that CCC tracks come in degrees: a contribution can be less or more relevant, exhibit stronger or weaker leadership, show little or much originality, and so on. Moreover, none of the features are individually necessary or sufficient for claims to creatorship, no matter the degree to which they are present. Consider sufficiency: a GenAI system can be highly relevant to producing an output, and yet be considered closer to a mere tool if a user scores highly on leadership, control, originality and so on. Nor is any single feature always necessary: seasoned users do not need much time or effort for good results, though some features will be essential in many cases (e.g. directness).

Second, distinguishing between the features we sketch here is sometimes difficult (e.g. control and leadership). This is neither surprising, nor a problem, however. The broader themes CCC’s concepts draw on, like causation, agency, and originality, have been subjects of study and controversy for centuries because they are complex and interrelated. With creation uniting these themes, it seems misguided to expect a finite list of distinct and razor-sharp conceptual ingredients that explain it neatly. CCC, then, does not raise but only encounters conceptual challenges, and these should not distract us from further exploring CCC’s descriptive and explanatory value.

Third, taken together, the features outlined here (and potentially others) form a basis for candidacy in a co-creating collective: if you exhibit none, or some but to low degrees, you will not get close to being a creator, but if you score highly on all, you should be considered a serious candidate. Within CCC’s feature space, there will be different profiles that can each ground strong claims to candidacy, just as there are naturally different roles to play in creative pursuits. Importantly, though, CCC does not maintain that there is ever a sharp threshold to decide creatorship. To the contrary, it acknowledges substantial and often reasonable disagreement about creatorship and only insists that creatorship is not all-or-nothing. CCC, therefore, invites us to work through attributions carefully, providing a set of clearer criteria that help us locate and potentially resolve disagreement about creatorship. With these tenets in mind, let us proceed to explore what CCC can do for us in practice.

3 What CCC can do for you

3.1 CCC across the space of contenders

To show how CCC can help make progress on understanding creatorship, we proceed as follows: first, we consider CCC’s criteria mapped against possible contenders for creatorship, i.e. users, GenAI systems and others, and comment on how each group may fare at a general level. We then focus specifically on the comparison between human prompters and GenAI systems and discuss two cases on different ends of a credit distribution spectrum. Finally, we elaborate how CCC reinforces existing intuitions offered in the public discourse on creatorship questions, as well as generates novel claims about creatorship.

Let us begin by applying CCC’s criteria to some of the most likely candidates: users, GenAI systems, developers and producers of training data, to better understand whether they are candidates for creatorship and, if not, why not. First, users can make less or more relevant/non-redundant contributions. Users can also spend lower or higher amounts of time and effort, and the originality of their contributions can vary from generic one-word prompts like ‘banana’ to highly engineered prompts pursuing specific objectives. Relatedly, they can exercise lower or higher degrees of control, leadership and independence when pursuing generic or more involved prompting projects. Finally, prompter contributions will always show directness, but to considerably varying degrees, e.g. through only generating a kind of image using a generic prompt like ‘banana’, or exhibiting high degrees of directness using targeted prompts.

Second, like users, GenAI systems can make less or more relevant and non-redundant contributions. But they can only exhibit a certain degree of independence and cannot demonstrate leadership, for lack of intentions. However, if unchallenged by a user, they will exercise control in producing certain images rather than others, given a prompt. GenAI systems’ contributions always involve some and potentially a lot of compute time and effort, and they can be less or more original, e.g., depending on whether they draw on original connections made at training. Importantly, their contributions exhibit high directness: their computations literally make the specific images synthesized.

Third, as elaborated earlier, developers’ contributions are almost always indirect. They do not make specific images, but rather, mainly, enable their production. These contributions can exhibit less or more relevance and redundancy, but will usually involve little specific control over particular outputs. Likewise, they may involve less or more time and effort, as well as varying degrees of originality, leadership and independence but for lack of directness, these features are discounted: developers do not intend to produce any specific image; they only intend to build systems that can. In most cases, then, developers will not qualify as candidate co-creators. That said, there are some cases where developers can play more direct roles: for instance, they may fine-tune GenAI systems to produce outputs of a certain kind, e.g. aesthetically pleasing images rather than just naturalistic ones, or hinder them from producing outputs of a certain kind, e.g. unsafe or toxic images. In the former case, developers will exert positive but imprecise control: they nudge a system towards certain kinds of outputs. But while exhibiting more directness, their contributions would not score highly regarding control for lack of precision, just like Jake, whose ‘cat on a mat’ prompt only controls what kind of image Stable Diffusion produces. Relatedly, while efforts to ensure safety of GenAI systems may be more precise in preventing specific kinds of outputs (e.g. refusing prompts with specific keywords), the control exerted here is negative in character: outputs can be anything, as long as they are not of the kind to be prevented. So, despite developers exercising some forms of control, their contributions remain mostly indirect: they have sway over outputs, but not in the way that users or GenAI systems have. Their role is, mainly, to enable—and perhaps to favor or disfavor—but not to make specific outputs.

Lastly, producers of training data can make varied contributions to creation, too. There are two ways to conceptualize this group: first, as capturing all producers of all training data used to train a GenAI system taken together. Second, as specific producers of particular training data tokens. On the wider construal, producers of training data make contributions that are highly relevant and somewhat non-redundant (e.g. there are more images on the web than large datasets like LAION-5B contain, but many images contained in LAION-5B are unique) but they exercise little control over the output. While they may, as a whole, exercise significant time and effort furnishing their contributions, scoring individually from low (Jack posting a photo of grass, which gets scraped and put into LAION-5B) to high (Jill’s collected 10-year efforts in producing her published illustrations), and with some originality in the mix, their contributions display no leadership, independence or directness regarding any image produced with GenAI (which is why there are concerns about scraping images without consent). These assessments can change importantly when we turn to specific producers of particular training data tokens. For instance, concerning relevance and redundancy, Jacinda’s collected paintings of non-cheese things looking like they are made from cheese may play a crucial role in enabling a GenAI system to produce ‘Donald Trump-shaped cheese wheel rolling down a hill’.

We expand on further differences in regard to producers of specific training data later. For now, let us turn to explore more concrete theses that CCC can ground, focusing first on a comparison of human users and GenAI systems.

3.2 Users vs. GenAI: a spectrum of creatorship

Can GenAI systems be part of co-creating collectives? CCC suggests yes, for they may exhibit a number of important features and to significant enough degrees to merit candidacy. But how would credit for an output be allocated between users and GenAI systems? Let us offer two examples, which fall on opposite sides of a spectrum for how credit may be distributed. These examples will help us establish that GenAI systems can have strong claims to creatorship, sometimes stronger than humans.

Consider Jake’s ‘cat on a mat’ prompt again. Four images are generated (Fig. 1), from which he chooses the first.

Fig. 1 ‘Cat on a mat, art’, produced by Stable Diffusion

How should we consider Jake’s and Stable Diffusion’s claims to credit here? CCC suggests that Stable Diffusion has a stronger claim than Jake. Jake typed in a generic prompt and did not contribute relevantly to the output beyond that. He did not have any concrete ideas regarding composition, palette, style, etc., and he would not have been able to create any of these images without GenAI.

Contrast this with Jill, an experienced visual artist working on campaign visuals for an environmental protection agency. She wants to create an image of a polluted ocean in the palm of a hand to correspond with key mission statements. Starting from a hand-drawn sketch, Jill refines her prompts, guiding the GenAI through a series of many images and exerting precise control, e.g., by using inpainting and ControlNet [87] to pose the hand and steer the composition, until she gets an image that conforms to her concrete expectations. Jill already knew what image she wanted and could have created something similar by different means, say with Photoshop. Here, CCC can ground why Jill deserves a significant share of the credit and why the GenAI system is more akin to a tool than a full-fledged creator.

CCC can capture the difference between these cases in a systematic fashion. Table 1 maps out Jill, Jake and Stable Diffusion against CCC’s criteria. For simplicity, we use a qualitative coding as ‘low’ or ‘high’ to indicate the degree to which each feature tracked by CCC is realized. ‘N/a’ indicates that a feature does not apply in a case, e.g., because GenAI systems do not have intentions necessary for leadership.

Table 1 Comparing contributors

Table 1 encodes Jill’s comparatively stronger claim than Stable Diffusion (SD1). Jake, by contrast, loses out to Stable Diffusion (SD2) on several criteria, including relevance, redundancy, control and time/effort, so Stable Diffusion has a comparatively stronger claim than him. CCC can hence capture how creatorship and credit depend on a number of context-specific details and locate the roles of various agents and entities straddling full creator and mere tool, rather than relying on rigid categories. This flexibility and ability to give insights into different situations, where our intuitions can vary widely and surprisingly, is at the heart of CCC—no agent or entity should be judged in or out at the outset, but instead should be allocated credit according to the specific contributions they make.
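To make the qualitative coding concrete, the comparison between Jake and Stable Diffusion (SD2) can be sketched as a small data structure. This is only an illustrative sketch, not part of CCC itself: the names `Level`, `Profile`, `JAKE`, `SD2` and `outscored_on` are hypothetical, and the profiles record only the codings stated in the text, read as ‘low’ for Jake and ‘high’ for Stable Diffusion on the four criteria named above, plus ‘N/a’ for Stable Diffusion’s leadership. In keeping with CCC’s rejection of a sharp threshold, the sketch compares contributors feature by feature and deliberately computes no aggregate score.

```python
from enum import IntEnum
from typing import Dict, List, Optional


class Level(IntEnum):
    """Two-level qualitative coding, as used for Table 1."""
    LOW = 0
    HIGH = 1


# A profile maps a CCC feature to its coding; None stands for 'N/a'.
Profile = Dict[str, Optional[Level]]

# Assumption: only the codings stated in the text are recorded here.
# Features the text does not code for this case are left out, not guessed.
JAKE: Profile = {
    "relevance": Level.LOW,
    "redundancy": Level.LOW,
    "control": Level.LOW,
    "time_effort": Level.LOW,
}

SD2: Profile = {
    "relevance": Level.HIGH,
    "redundancy": Level.HIGH,
    "control": Level.HIGH,
    "time_effort": Level.HIGH,
    "leadership": None,  # 'N/a': GenAI systems lack the intentions leadership requires
}


def outscored_on(a: Profile, b: Profile) -> List[str]:
    """Return the features on which `a` is coded strictly lower than `b`.

    Features coded 'N/a' (None) or missing for either side are skipped,
    because they are not comparable.
    """
    result = []
    for feature, a_level in a.items():
        b_level = b.get(feature)
        if a_level is None or b_level is None:
            continue
        if a_level < b_level:
            result.append(feature)
    return result


if __name__ == "__main__":
    print("Jake is outscored by SD2 on:", outscored_on(JAKE, SD2))
    print("SD2 is outscored by Jake on:", outscored_on(SD2, JAKE))
```

The function returns a list of features rather than a verdict, so it locates where two contributors differ and leaves the weighing of those differences to case-by-case judgment.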

Nevertheless, there are some likely objections even against our moderate claim that GenAI systems can be strong candidates for co-creating collectives and can sometimes play more significant roles than humans. For instance, one could insist that GenAI systems are not appropriate targets for credit as they are not making the right sorts of contributions to an output. But taking this approach can raise problems. In particular, it can lead to credit attribution gaps and subsequent responsibility gaps (cf. [57, 67]), where the (human) creators established as forming a collective do not fully capture the credit for the output and allocating the concomitant responsibility is hindered by a lack of proper targets. While the ‘cat on a mat’ image may be mundane and unoriginal, credit for this image, however little, must still be allocated somewhere. But if not to Jake, to whom? Consider a variation of Jake’s case, where instead of prompting Stable Diffusion, he asks his artistic friend, Jana, to help him make ‘cat on a mat’. Jana looks at a range of other cat and mat pictures for inspiration, and drawing on experience and learned aesthetic norms, casually sketches some variants she expects Jake to like. Insisting that Jana should be allocated credit, while Stable Diffusion should not, even though their contributions take a similar form, seems to beg the question of who can be a creator and is thus not compelling. The intuition that Jake is not solely responsible for the creation of the ‘cat on a mat’ visuals is even stronger in cases where the output is in some way harmful, for example, if Jake inputs an innocuous prompt and, to his surprise, receives images filled with racist stereotypes. In this case, it seems implausible to allocate responsibility to Jake.
So, until compelling arguments are offered that CCC misses additional criteria to negotiate creatorship, which can sustain principled distinctions between humans and machines, we maintain that GenAIs can sometimes be considered parts of co-creating collectives.

3.3 CCC reinforces and generates intuitions

CCC can reinforce existing intuitions as well as generate new ones to advance ongoing debates. Existing controversy around the role of creators of training data is an important example. While common image datasets like LAION-5B are heavily populated with generic imagery, they also contain the works of dead and living artists who have spent considerable time and effort developing their works, and have not consented to their works being used to train GenAI systems that can appropriate their distinctive style. Many commentators and artists insist that something illegitimate is happening here [26, 36, 78] and CCC can reinforce such intuitions on independent grounds: in some cases, producers of training data may have claims to candidacy in a co-creating collective.

Take Jamal, who spent years crafting his distinctive and acclaimed style as a digital artist. Jamal’s images were scraped and a GenAI trained on them is now capable of rendering images in his style. Jamal may reasonably complain that he is made worse off by GenAI, as almost anyone can now freely produce imagery that looks like his, worsening his prospects of getting commissions and drowning out his distinctiveness in a sea of near-indistinguishable mimicry. Does Jamal have a claim to be considered a part of a co-creating collective for some outputs? CCC answers in the affirmative. Consider relevance and redundancy. Jamal’s works are highly relevant and non-redundant to a GenAI system’s ability to produce outputs in his style—take them out from the training dataset, re-train the system, and the GenAI would not be able to reproduce his unique style. They may also involve high degrees of control: while Jamal did not intend to effect specific results in a GenAI user’s outputs, the look of his works will co-determine what any GenAI outputs prompted to mimic his style will look like—had his palette been warmer, the outputs would have been warmer, too. Contrast this with Jimmy, whose 27 generic pictures of his cat ‘Mr Snuggles’ posted on Instagram will not make a recognizable difference to any cat images produced with the help of GenAI. Generally, the more specific a prompt is to a region of the latent manifold that is crucially shaped by a specific creator’s works, the stronger the claim that creator has to credit for a GenAI’s output due to the relevance/non-redundancy and control involved.

What about the other criteria? We may assume that Jamal’s contributions involved large amounts of time and effort in developing his style and producing his works. But while Jamal may have also exhibited plenty of leadership and independence in producing his oeuvre, his contributions to specific GenAI outputs are indirect: they are causally mediated by GenAI systems. So, what should we conclude about Jamal’s candidacy in a co-creating collective? We think that it is not implausible to consider Jamal a co-creator, albeit a distant one. Nevertheless, even a weaker claim to co-creatorship may ground derivative claims, e.g., to be appropriately credited or asked for consent. Reasonably, Jamal may decline to be a co-creator on prompting endeavors by people he does not know and whose values he may not share (see e.g. [71]). Importantly, CCC makes clear that he may do so on grounds that are independent from concerns about intellectual property violations in scraping and using imagery for training GenAI. In virtue of this, CCC also reinforces arguments pushing back against awarding users exclusive copyright over GenAI outputs, especially if they do not exert sufficient effort and control, and others, like Jamal, may have co-creatorship claims [28].

CCC also generates novel intuitions, for example, that GenAI systems have the capacity to create illusions of creatorship. Specifically, users can be led to overcredit themselves, despite having made only minimal contributions to an output, and CCC explains why. Consider Jake again, who might think he created ‘cat on a mat’, using Stable Diffusion as a mere tool. But Jake might be entirely unaware of how little control he exerted over the output if he does not have access to relevant counterfactuals, such as how the images would have looked if a different seed had been used, or if he had, equally randomly, prompted ‘a mat with a cat on it’ instead of ‘cat on a mat’. Lacking such counterfactuals, Jake may understandably feel he exercised control to effect a specific output, but that feeling might be misleading. Users also lack information about the significance of others’ contributions. Take training data. Jaden likes sci-fi and uses Midjourney to produce a striking image of ‘a battlecruiser landing on a desert planet’. No amount of intricate prompt-engineering would have gotten him anywhere near that output if not for the extensive, aesthetically rich training data produced over decades by concept artists. But for lack of access to relevant counterfactuals, e.g., realizing that without those artists’ contributions Jaden’s battlecruiser image would have looked like a teenager’s pencil drawing, and without considering the kinds of features CCC tracks and what other candidates for co-creatorship there might be, it can be easy for users to overestimate their role in creation processes. CCC can help dispel such overestimations and allow users to better understand their roles: if Jake would have been happy with many different outputs, his role is more akin to someone browsing a gallery of cat images and selecting one they like. That is a fine role to play, but different from being a creator, and we should not worry about withholding credit when it is based on illusion.

4 CCC beyond visual GenAI

As we have shown, CCC offers useful conceptual resources to address pressing questions about how to understand and negotiate creatorship for visual GenAI outputs. But CCC is also more widely applicable to other domains where issues of creatorship arise, including text generation with LLMs; generative video, audio, music and voice synthesis; and code generation. Such cases, too, raise concerns about illegitimately scraping training data, reproducing distinctive styles and dexterities at near-zero cost, and the effects this may have on human laborers. Here, we want to offer a cursory overview of how CCC may provide insights on some of these issues, as well as sketch how CCC can be expanded and tailored to distinctive aspects of these domains. To do so, we focus on two domains of strong current interest: text and code generation using LLMs. Importantly, as before, CCC’s focus is on identifying who is a creator of GenAI outputs, which is different from thicker notions such as author, artist, novelist, writer, programmer and so on. While creatorship, which focuses on understanding who contributed to making or producing something, may often be an essential part of these notions, they carry further requirements and valences that CCC does not aim to track (e.g. telling whether someone merely made an image or whether they should be considered an artist). That said, CCC can help us understand the differences between creatorship and these thicker notions in cases where they come apart, as we highlight below in regard to authorship.

4.1 Text and code generation with LLMs

LLMs currently have the edge over visual GenAI systems in the public consciousness and while there is substantive overlap in the issues they each raise, others are unique to LLMs and their various use-cases. In terms of overlap, LLMs used for text generation raise many of the same concerns as visual GenAI systems: who is the creator of their outputs? Are users the creators and LLMs mere tools or can systems like ChatGPT sometimes be considered creators of text outputs themselves? What about the role of writers whose works were scraped from the web to train LLMs? But LLMs also differ importantly: they are used for a much broader range of purposes, spanning such tasks as ideation; language editing; producing generic and complex texts in professional, creative and academic domains; generating code; augmenting search; and serving as conversational agents. This diversity comes with substantial variation in how important the issues of creatorship and credit attribution appear, as well as differences in the social and legal norms that currently encode how we negotiate these issues across domains. Here, we focus on three use-cases for LLMs to highlight distinctive aspects of using GenAI in these contexts, further sharpen CCC’s conceptual resources, and illustrate two key virtues of CCC: it can accommodate existing norms prevalent in each domain and highlight novel ways to respond to the unique challenges raised by GenAI.

4.1.1 Creative writing

The use of LLMs in creative writing (e.g. for novels, poetry, screenplays and short stories) has serious implications for the livelihoods of writers and our traditionally conceived understandings of such roles as author, novelist, screenwriter or poet. As the recent tentative agreement between the Writers Guild of America and AMPTP demonstrates [85], these implications are inextricably linked: if GenAI can be considered an ‘author’, then human writers are at greater risk of being made redundant, which is one of the reasons why the agreement rules out GenAI systems being considered a creditable author and does not treat the outputs of GenAI as ‘literary material’. At the same time, simply dismissing the contributions of LLMs as negligible also threatens to undermine the status and understanding of what it means to be named an author of a text, as it opens the door for individuals to overcredit themselves. When an author’s name appears on a book jacket, for example, a bundle of information is communicated about that person’s relation to the finished novel and the role they played in creating it: we likely assume that they conceived the idea, developed it and are responsible for the labour, skill and imaginative act of writing itself. While examples such as the ghostwriting of fiction already contravene these norms (take, for example, Millie Bobby Brown’s recent ghostwritten novel [34]), the use of LLMs is likely to further blur these lines, making our ability to explicitly specify the contributions of each agent/entity increasingly important.

To use an example, consider a novelist, Jayani. Struggling to come up with an idea for her next novel, she asks ChatGPT to write her an original novel plan. Let us assume that ChatGPT produces a detailed and original plan for the novel: ‘Whispers of the ChronoSphere: A Tale Beyond Time’ (footnote 9). Jayani is impressed with the ideas and decides to follow ChatGPT’s plan precisely. Does ChatGPT have a claim to being a co-creator of the finished novel in this case? CCC can help make progress on this question. Let us say, favorably, that ChatGPT’s contribution scores highly for non-redundancy and originality: Jayani would not have come up with the plan herself and the ideas were not around elsewhere for her to adopt. ChatGPT has strong claims to directness here: the shape of the final output can be traced directly back to ChatGPT’s plan. ChatGPT also exhibits significant global control over the final novel: insofar as Jayani does not change the initial plan, ChatGPT controls the general direction and key components of the story. However, ChatGPT exhibits little to no local control: the suggestions are general in character and leave considerable scope for Jayani to fill in the blanks. She is the one steering and maintaining the form and content of the final output and ChatGPT gives no further input beyond the ideation stage. Most of the time and effort for producing the novel comes from Jayani’s writing, although, as noted earlier, time and effort involved in ChatGPT’s inference should not be entirely neglected. Leadership is not applicable to ChatGPT: it does not have intentions to direct Jayani to produce the novel or pursue and direct goals for the project. However, while scoring low on some of the criteria, when added together, ChatGPT still seems to have a candidacy claim to be considered a co-creator of ‘Whispers of the ChronoSphere’.

Note that, while it could be argued that ChatGPT should not be credited for the idea due to its nonhuman status and the disruptions to labor markets this could entail downstream (e.g. [43]), acknowledging the importance of ChatGPT’s contribution to the final novel is a necessary step in conducting the finer-grained analysis of how credit for the idea should be allocated. For example, it may be that, after looking at the novel plan produced by ChatGPT, we discover it is not original and highly similar to other authors’ works represented in the training data. Credit for this idea should then be distributed between the original author, ChatGPT and potentially others, as appropriate. Either way, the idea is not Jayani’s, and failing to acknowledge that the idea came from somewhere else overcredits her contribution.

Let us consider Jayani’s role in more detail. It is likely Jayani needed to add significant creative input to fill in the gaps in the plan, bring the characters to life and add her own writing style to the finished story. Scoring highly across CCC’s features, Jayani certainly deserves a share of credit for the finished novel: but credit for what? Her role is not exactly the traditionally conceived, idealized role of a novelist start-to-finish. But it is also not the role of a mere executor, with Jayani robustly steering the form and content of the novel by bringing her own style, skill, effort and non-redundant ideas to the final output, placing her firmly in co-creator territory. It is interesting to consider, if the roles are swapped around and it is Jayani who generates the detailed novel plan before giving it to ChatGPT to write out, whether intuitions may flip—with Jayani now seeming to conduct the important work conceptualizing the novel and ChatGPT being a mere tool to realize her vision.

Considering these scenarios pushes us to more accurately specify the role each contributor plays and to be more explicit about what was actually contributed when a text is produced. Our traditional template of seeking to identify and credit a key ‘author’ often leads us to gloss over different contributions to settle on one convenient name, but this practice comes at the expense of clarity and of fairly attributing credit for contributions made. Cases of text creation involving AI increasingly disrupt this traditional template and encourage us to consider the specific contributions of each agent and entity, in similar ways to efforts currently advocated by some in scientific research [12, 77, 79], although these, too, currently refuse to acknowledge AI’s contributions on par with those of humans.

4.1.2 Marketing copy

LLMs also increasingly drive the accelerated production of marketing materials. In this space, now-familiar questions about creatorship again arise: who should be considered a creator of marketing copy and promotional content produced with LLMs? Can users simply use outputs straightforwardly for commercial purposes without acknowledging the role of LLMs in their production, or indeed, that of other actors? Consider the case of an advertising agency whose marketing professionals and copywriters have invested significant research, time and creativity into developing a specific tone and style for a client’s campaign including, for instance, a distinctive slogan. It may seem morally (and perhaps legally) problematic for a competitor to use LLMs to produce advertising that (near-)replicates this slogan or unique style, and it seems necessary to demand a closer examination of where credit is due for any LLM-generated output that was trained on others’ creatively rich marketing work. Just as in other cases above, CCC recommends that we must trace relevant, non-redundant, original contributions that involved significant forms of leadership, control and independence towards the outputs in question, and that we credit agents and entities that made these contributions accordingly.

But what about more generic forms of copywriting and content production? Here, CCC draws out substantial variation across contexts in how important credit attribution is in the first place. Take John, a freelance copywriter, who uses ChatGPT to produce website text for a Mexican restaurant, with the results sharing strong resemblances to copy describing numerous other Mexican restaurants around the globe (with headlines such as ‘try our mouthwatering burritos’ and ‘fresh, authentic ingredients’). Here, no contributors will score especially highly across the features CCC tracks. John will not have exerted much control, leadership, originality or time/effort by writing prompts such as ‘produce headlines for a Mexican restaurant’s home page’. ChatGPT, meanwhile, may have shown control and directness—but little originality, given the nature of the task: producing text describing Mexican restaurants presumably samples a well-defined region of the latent space and does not require a system making any impressive connections at training. Similarly, those whose texts describing Mexican restaurants and cuisine were included in the training data also score low. Given the repetitive and standardised style of this type of content, it is unlikely that any single contributor to the training data could demonstrate a particularly high degree of relevance or control over the output, compared to other contributors—and their contributions likely lacked originality in the first place.

As this example of applying CCC reflects, in some cases, issues of creatorship are simply less important, e.g. when both inputs and outputs lack originality, and when creatorship does not carry much practical meaning or benefit for creators: John and others like him, for example, will get paid for their freelance work regardless (footnote 10). To be sure, CCC, as we envision it, is not supposed to issue judgments about whether a given output is significant or not, and hence whether there is a need to identify its creators. That said, some of the features CCC draws on, like originality, naturally correspond with intuitions about these issues, and CCC is therefore disposed to usefully reflect such intuitions while also remaining responsive to additional reasons that highlight the urgency of identifying creators and accurately distributing credit. In sum, we might say that distributing credit is like distributing a pie: sometimes the pie is not very large to begin with and the parties that might have claims have little interest in getting a slice. Moreover, in cases where the set of upstream contributors is large, e.g. spanning all the creators of Mexican food-related text on the web, it might be unclear who even has a plate, and not much harm is done if a few crumbs are withheld from those who have one.

4.1.3 Essay writing

Finally, let us briefly turn to the case of essay writing using LLMs in higher education contexts, which has generated significant controversy and uncertainty about the functioning of education systems in the age of GenAI [20]. To explore some of these concerns, let us consider the case of Julius, a bright student who cannot be bothered to do his coursework and who uses ChatGPT to write an essay about climate change and intergenerational justice. He prompts the system iteratively to produce different text passages that he weaves together into an essay with relatively minimal input. Is Julius the (or a) creator of the essay he hands in? If so, why? And if not, who is?

This is an instance where creatorship and authorship importantly come apart, and CCC can help us unpack the distinction. Three features of CCC, in particular, are vital here: originality, independence and control. Let us first turn to originality, and generously assume that Julius had some original ideas for how to best prompt ChatGPT to generate the kind of text that would be useful to put a passing essay together—but did not apply any subject knowledge relevant to the essay. On CCC, this contribution of original prompting brings him closer to being a co-creator of the final output. But, according to our socially agreed upon notions of what it means to be an author in this specific context, the form of originality Julius contributed is not relevant: the institution of essay writing is supposed to prompt students to produce their own, original argument—not to come up with inventive ways of letting others develop or materialize such an argument in their stead. Thus, Julius’ show of originality does not strengthen his status as an author of the essay, even if it warrants claims to being a co-creator of the output.

We may equally grant Julius some forms of independence, e.g. in making choices about which LLM to use, how to formulate prompts, and so on. Similarly, however, this may not be the kind of independence we expect from an author of an essay, i.e. independence in regard to setting their own agenda, deciding whether to do things certain ways rather than others, and developing their argument in a way that does not depend, too much, on others’ guidance. Once again, the independence Julius shows through his ‘prompt engineering’ and his weaving together of ChatGPT’s outputs may boost his position as a co-creator, but not that of an essay author.

Julius’ demonstration of control also does not take the right form to qualify for authorship. For instance, while Julius may exert control over mostly formal aspects of the text, e.g. its length, tone, style, general topic and so on, he may be less able to control the content of the essay if he lacks relevant knowledge and understanding, e.g. when an assumption that the essay makes commits him to a counterintuitive corollary, but he is unaware of this. As highlighted earlier, control on CCC can come in different forms in the sense that it can be about different things (e.g. the form of an output, its contents, the information carried by it)—but to qualify not only as a creator but as an essay author (or artist, novelist, and so on), the control exerted by an agent or entity needs to be about certain things. In this case, Julius would have needed to control not just the form of the essay, but its contents, too, e.g. its main argument, thesis and various finer details, which requires additional knowledge and understanding that he has not demonstrated.

As a result, we see that, in the educational context, merely being the co-creator of a text may not be sufficient, because the role of co-creator has come apart from that of essay author. We can further unpack this divergence by returning to the context of creating visual outputs. Here, the demands for authorship in an essay writing context (and beyond) also depart substantially from the thinner requirements for control that are often pertinent for creating visual outputs. For instance, in prompting Midjourney to produce ‘bird in a tree’, Jared neither tries to communicate an idea or claim, nor are there important ways in which ‘bird in a tree’ has deeper, hidden meanings that Jared might be unaware of and unable to control. Jared simply wants an image of a bird in a tree that looks nice, and if we think the image has contents at all that go beyond mere form (e.g. designating a bird that is in a tree), these can be read off from its form in a straightforward fashion (i.e. pixels arranged in a relevant way). [Footnote 11] Given the lack of ‘deep’ content, there are few ways in which we could imagine ‘bird in a tree’ to fail to be accurate, informative, truthful, logically consistent etc. in a way that an essay can. So unlike Julius, it is difficult for Jared to fail to understand or be unaware of the immediate contents of the image (e.g. failing to see there is no bird, or it’s not in a tree). And there are consequently few ways in which Jared may fail to exercise control-over-content regarding ‘bird in a tree’. Control, in Jared’s case, is straightforward: if he keeps refining his prompts until he gets the image that he envisioned, his job is done. CCC can hence elucidate a range of important intuitions about what it means to be an author, a status which, according to existing norms, requires a more involved relationship to an output (e.g. by understanding the contents of an output one claims authorship for, meaning the things expressed by a text, and so on).

What CCC also alludes to, however, is that our current notions about what constitutes authorship may well change as GenAI systems become increasingly integrated into everyday lives and workflows. As tools like ChatGPT become ubiquitous, for example, the skill of crafting an essay manually (or writing in the traditional sense, more generally) may become less important to society than the ability to competently co-produce and sign-off on a text that one understands and feels represents one’s own opinion.

In sum, CCC can draw out novel insights as its scope is expanded. Focusing on text generation with LLMs, we have shown that CCC has robust explanatory powers across contexts (e.g. stressing that control, leadership, originality and other features are globally relevant to navigating creatorship questions), while also demonstrating that CCC helps explain why intuitions about creatorship and authorship differ between contexts.

4.1.4 Code generation

Another context to which CCC’s explanatory powers can be helpfully applied is coding. Here, too, applying CCC results in fresh insights, helping to clarify the claims of those who feel their work has been misappropriated by GenAI systems that can produce code. As in other domains, LLMs for coding have raised a number of concerns, particularly regarding the misuse of licensed material. These tools, such as GitHub Copilot, were trained on billions of lines of existing code, including code shared by programmers on the GitHub platform itself. It is frequently suggested that Copilot produces unique coding suggestions based on what it learned from this training data, rather than directly copying any existing code, and that those prompting Copilot are the sole owners of the outputs [19, 24, 48]. It is unclear, however, whether this is the case. Many authors of code used for training have disagreed, insisting, for example, that Copilot fails to provide appropriate attribution for licensed code. It appears, from GitHub’s internal testing, that Copilot plagiarizes about 1% of the time, and even when not exactly replicating existing code it has been shown to provide solutions to specific problems with outputs highly similar to existing, licensed code [24, 25]. While many predict that this may turn out to be deemed ‘fair use’ in the various lawsuits brought against GitHub, such an approach goes against existing, socially agreed upon attribution standards in the coding community, where the misappropriation of coding ideas or solutions is disapproved of, even while cooperative copy-pasting is customary. Legal experts, too, currently disagree on whether Copilot’s outputs sufficiently ‘transform’ existing code to qualify as ‘fair use’ [19, 48, 66].

CCC is useful for disentangling the co-creatorship of Copilot’s outputs. Take a real-life example that gained traction on X: Tim Davis, a developer and professor of Computer Science at Texas A&M, complained that Copilot “[even] with ‘public code’ blocked, emits large chunks of my copyrighted code, with no attribution, no LGPL license. For example, the simple prompt ‘sparse matrix transpose, cs_’ produces my cs_transpose in CSparse,” [8] and shared an image comparing the two sections of code. In response, the chief architect of Copilot, Alex Graveley, tweeted that “the code in question is different from the example given. Similar, but different” [8], and dismissed the idea that it was straightforward to automatically identify one as being derivative of the other. As CCC shows, however, producing code that is “similar, but different” may not be sufficient for claiming sole creatorship, just as concerns about problematic appropriations of visual artists’ styles are not about pixel-by-pixel replicas of existing artworks, but rather about appropriating distinctive ways of doing things. So, given the strong similarity between Davis’s cs_transpose and Copilot’s output, Davis’s contribution can be said to score highly on all of CCC’s features except for leadership and independence, given that Copilot produced the output without his knowledge (or permission).

The features of relevance, non-redundancy and originality are particularly important here. If we took Tim Davis’s code for cs_transpose out of the training data and retrained the model, what would Copilot have produced in response to this prompt? If Copilot were unable to produce useful code, or produced different code (e.g. code that is significantly less efficient), then Davis’s input was highly relevant, non-redundant and likely original in character, even if Copilot’s output had been merely “similar” to his. As a result, Davis would be identified as a co-creator of the output. CCC’s ability to demonstrate this route to co-creatorship is especially pertinent to the coding domain, where ideas and form are not always neatly separable (even if current copyright law demands it, e.g. [40]): the appropriation of original programming ideas is objected to, even when the code used to implement the idea varies.

Finally, CCC also provides some answers to important questions currently being asked by programmers, including whether they can and should use code from tools like Copilot. Applying CCC to cases like Davis’s suggests that the specificity and complexity of sections of code matter. AI-generated code that is more generic, shorter and less complex is a safer bet, while code outputs that are extensive and single-handedly resolve a specific problem are more likely to result in significant co-creatorship claims from others. Indeed, CCC highlights that, while it may be more tempting to view the replication of code as ‘fair use’ given its functional nature, writers of code whose work has been drawn on by GenAI systems may have claims to co-creatorship that are as legitimate as those of artists, writers and other groups.
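The retraining counterfactual used above to assess relevance and non-redundancy can be illustrated with a deliberately simple sketch. It is not a method proposed in this article, and it says nothing about how any real code-generation system works: the "model" below is a hypothetical lookup table, and the names `train`, `generate`, `difference_made` and `toy_corpus` are assumptions introduced purely for illustration. The sketch only shows the shape of a leave-one-out test: remove one contribution from the training data, retrain, and compare the outputs.

```python
from difflib import SequenceMatcher


def train(corpus):
    """Toy stand-in for training: the 'model' simply memorises its corpus."""
    return dict(corpus)


def generate(model, prompt):
    """Toy stand-in for generation: return the snippet whose description
    shares the most words with the prompt (empty string if none match)."""
    prompt_words = set(prompt.lower().split())
    best, best_overlap = "", 0
    for description, snippet in model.items():
        overlap = len(prompt_words & set(description.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = snippet, overlap
    return best


def difference_made(corpus, held_out, prompt):
    """Leave-one-out test: compare the output of a model trained on the
    full corpus with one trained without `held_out`.
    Returns 0.0 if the output is unchanged, up to 1.0 if it shares nothing."""
    full_output = generate(train(corpus), prompt)
    ablated = {d: s for d, s in corpus.items() if d != held_out}
    ablated_output = generate(train(ablated), prompt)
    return 1.0 - SequenceMatcher(None, full_output, ablated_output).ratio()


# Hypothetical training data: description -> contributed snippet.
toy_corpus = {
    "sparse matrix transpose": "transpose via column counts and cumulative sums",
    "dense matrix multiply": "triple nested loop over rows and columns",
    "sort a list": "call the built-in sort",
}

prompt = "sparse matrix transpose"
print(difference_made(toy_corpus, "sparse matrix transpose", prompt))
print(difference_made(toy_corpus, "sort a list", prompt))
```

In this toy setting, removing the contribution that the prompt draws on changes the output, whereas removing an unrelated contribution leaves it untouched. For real systems, retraining is costly and textual similarity is only a rough proxy for relevance, non-redundancy or originality, so the sketch should be read as an illustration of the counterfactual question rather than a way of answering it.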

5 CCC advances existing debates

As we have shown, CCC can be productively applied to a range of additional domains beyond image generation and can draw out important novel insights, as well as support and challenge existing intuitions. Before concluding, let us highlight how CCC can advance larger, ongoing academic and public debates.

Addressing the controversial role of GenAI, some have insisted that, in the name of transparency and authenticity, GenAI systems should not be credited with creatorship [16, 58]. But, as others have argued concerning ChatGPT [42] and as we have demonstrated here in regard to visual, textual and code outputs more generally, failing to examine the role of GenAI and other contributors in fact hinders transparency and authenticity, obscuring the process of creation and the significance of the different agents and entities involved. Many academics have called for the fair attribution of credit in the creation of GenAI outputs [4, 22, 41, 58], but have not provided concrete recipes for doing so. Members of the public, too, have been asking and debating who should be able to claim creatorship of GenAI outputs [1, 51].

CCC is, to our knowledge, the first systematic framework to respond to these demands. It provides a fine-grained framework that allows and encourages a more nuanced allocation of credit, accommodating the unique aspects of GenAI-based creation, supporting common intuitions and resolving uncertainty around existing creatorship debates.

In doing so, CCC addresses several problematic tendencies in the public discourse around GenAI. Major differences persist in what people take to be the most compelling approach to attributing credit for GenAI outputs—with some members of the public stating that the “typical structure people will be crediting will be a brilliant human on top and the AI as a facilitator, or a human-AI synergy”, while others have assumed the lion’s share will go to “the AI and its creators”. Each side appears confident that their view is “obviously” what “most people” will take up [2]. CCC works to counter these assumptions by demonstrating the sheer complexity and diversity of credit attribution that uses of GenAI bring about. It also shows that brittle analogies, which liken GenAI systems to, e.g., a pencil or AutoCAD, or flattening assertions that ‘the history of art and technology has seen all this before’, do little justice to the intricacies and novelties of GenAI and its rapidly growing uptake across society [1, 17, 76].

In particular, CCC works against a popular tendency to overstate the contributions of users. Excited by the new possibilities that GenAI offers, users often take credit for outputs with little to no acknowledgement of the other agents involved in their creation, some going so far as to feel “we are becoming like small gods with those tools” [3, see also 64]. Academics in the public discourse have reinforced such hype, with some stating that “AI gives artists superpowers” [75]. As we have seen, CCC untangles agents’ roles in the creative process facilitated by GenAI, thereby helping users to understand, negotiate and articulate the contribution they have made to final outputs.

CCC also helps challenge problematic narratives of GenAI creatorship. For instance, tech companies have incentives to downplay their hand in the creation of users’ individual outputs and to instead present GenAI as a beneficial, innocuous tool. But the collective-driven nature of image, text and code generation that CCC emphasizes makes clear that such a framing is not always accurate. Describing GenAI systems as mere tools may shift too much responsibility onto users; e.g., when GenAI systems have a built-in propensity to generate toxic imagery or text, it seems odd to insist that problematic outputs are the result of inappropriate tool-use alone. CCC makes clear that developers, too, play relevant roles in the production of specific outputs, although usually only indirect ones that are mediated by the GenAI systems they trained, fine-tuned and released. Attempts to push framings of GenAI systems as mere tools have already played out at significant scale in the negotiations surrounding the EU AI Act, in which the most dominant technology companies lobbied to push the act’s regulatory obligations onto European deployers (e.g. app developers whose products access GenAI systems through APIs) and users of their general AI models (including the likes of GPT-4 and Stable Diffusion), rather than taking accountability for potential damages themselves [35, 70]. In campaigning for this framing, tech company leaders and lobbyists have asserted “the balance of responsibility between users, deployers and providers… needs to be better distinguished” and that “giving the right responsibilities to the right actor in the AI value chain is key” (quoted in [70], pp.12–14). We agree in general, but not with their preferred distinctions. As CCC shows, understanding the roles played by users, developers and GenAI systems themselves in greater detail does not in fact absolve developers of responsibility. Their (indirect) hand in creatorship, and the accountability that comes with that, cannot be justifiably attributed to others further downstream. While CCC makes clear that developers are rarely candidates for co-creatorship, we have outlined earlier that, unlike users, they have global causal powers to steer systems away from producing problematic kinds of outputs. While such powers do not ground creatorship, for they only yield imprecise control, such control may nevertheless be sufficient to ground responsibilities for certain global aspects of GenAI outputs, e.g. toxicity, bias, etc.

Finally, CCC also informs and critically challenges existing scholarly and legal conceptualizations of creatorship. CCC suggests that long-held expectations for how creatorship, thicker notions such as authorship, and legal concepts such as copyright should be attributed may now need reworking in the face of GenAI. Copyright attributions, for example, usually aim to identify a small set of agents—but CCC suggests that perhaps copyright sometimes needs to be distributed more widely, even if doing so in practice can be extremely challenging. CCC also highlights the degree to which existing theories are not fully appropriate for these new technologies and the multi-layered processes of creation they entail, while also suggesting that earlier, more general understandings of creatorship may lack sufficient flexibility. Using all-or-nothing categorizations rather than gradations for roles such as artist, author, engineer, programmer, assistant, or contributor, for example, may obscure important contributions. In regard to GenAI specifically, CCC responds to scholars’ calls for the fair attribution of credit, offering a framework to dissect the creative process and distribute degrees of creatorship in a finer-grained way than existing work.

6 Conclusions

In response to the public and scholarly uncertainty currently surrounding the fair attribution of credit, compensation, rights and responsibility regarding textual, visual and code outputs made using generative AI (GenAI), we have proposed the CCC (collective-centered creation) view as a systematic framework for addressing pressing questions about creatorship in this context. At its core, CCC maintains that GenAI outputs are created by collectives in the first instance. Reinforcing collaborative views that have so far been lacking more concrete instruments to understand how credit can be distributed, CCC provides a rich conceptual machinery for better tracking different contributors’ roles and attributing credit more accurately. We have shown how CCC can inform ongoing debates and resolve controversies by lending independent support to influential views and by prompting us to consider new ways of thinking about different forms of co-production with GenAI, be that in regard to the GenAI’s role itself or that of other candidates for co-creation, such as producers of training data. By applying CCC to multiple creative contexts, we have provided insights into how and when creatorship may come apart from thicker notions such as authorship and why it is often misleading to aim at offering neat, principled categorizations between different groups of agents (e.g. authors, creators, contributors, assistants). Taken together, CCC offers a flexible framework that can advance public, academic and legal debate as GenAI is developed further, deployed more broadly, and as we, collectively, form a better understanding of our relationships with it.

As indicated earlier, CCC is also limited in scope. It does not yield definitive judgments on creatorship issues in specific cases, nor does it insist that its criteria are the right ones, or the only ones that matter. CCC, as sketched here, is intended as a systematic conceptual contribution on questions of creatorship with GenAI, but not as the final word on these issues. We hope that scholars from different fields will feel invited to contribute to the larger project of refining this type of approach, be that through technical contributions by computer scientists (e.g. efforts to measure difference-making contributions, control, or originality); conceptual improvements made by art and literary theorists, practitioners and philosophers to further detail CCC’s conceptual machinery; or suggestions by legal scholars to make progress on understanding how CCC’s tenets can be reconciled with existing laws or inform the development of tailor-made law that encodes novel intuitions about creation involving GenAI.