“Obviously it is not desirable that a government department should have any power of censorship (except security censorship, which no one objects to in war time) over books which are not officially sponsored. But the chief danger to freedom of thought and speech at this moment is not the direct interference of the MOI [Ministry of Information] or any official body. If publishers and editors exert themselves to keep certain topics out of print, it is not because they are frightened of prosecution but because they are frightened of public opinion.”

George Orwell, “The Freedom of the Press”[1]

This quote is taken from Orwell’s proposed preface to “Animal Farm”, which, following rejections from at least four publishers in the UK alone, was first published in 1945. The preface itself did not reach the public until 1972, when the original manuscript was discovered, long after Orwell’s death. Translations followed shortly after the book’s publication in 1945, among the first of which were translations into Polish and Ukrainian. Whereas the former appeared under the translator’s real name, the latter, due to safety concerns, was published under a pseudonym. The Ukrainian translation, however, included a dedicated preface by Orwell himself, which, unlike the original “The Freedom of the Press” preface, reached its intended audience during the author’s lifetime.[2]

The story behind the first of Orwell’s two most famous works (the second being “1984”) is perhaps not quite “Orwellian” in the sense in which that adjective is mainly used nowadays. Yet the book’s story and the concerns expressed in the intended preface remain highly relevant today. Orwell worried about self-censorship exercised by publishers and private individuals rather than the state. If written today, the preface would likely also address online service providers facilitating the creation and dissemination of information.

1 Freedom of Expression and Information and Copyright

Copyright and freedom of expression meet at several points. To a varying degree, copyright safeguards expression from state control (for instance, through the abolition of formalities) and provides authors with the means to control the fate of their speech as manifested in a work by exercising economic and moral rights. At the same time, it restricts, ideally only temporarily, access to information and thereby impedes freedom of expression.

EU copyright harmonisation relies on the internal market competence and engages with economic but not moral rights. Flaws and gaps in copyright harmonisation hinder the protection of fundamental rights and the common EU cultural policy. The shortcomings of this narrow approach to harmonisation are evident, among others, in the CJEU judgments in Spiegel Online and Funke Medien, which concerned non-economic interests in controlling the dissemination of a work and the resulting conflict with the freedom of information.

Could, and more importantly, should EU harmonisation continue to draw a strict line excluding moral rights? The question gains relevance in the context of the EU’s aim to ensure a safe and accountable online environment and to create safeguards for general-purpose AI, given the intricate connection between copyright and fundamental rights. In the following, I would like to raise several points.

2 GenAI, Freedom of Expression and Moral Rights

The public availability of GenAI services (e.g. ChatGPT, Google Bard, Microsoft Copilot) has sparked an intense discussion on the copyright implications of such technologies. Do existing exceptions and limitations under copyright cover the use of copyright-protected material for training the models? Should authors be remunerated for the use of their works as training data? Should the output produced with no meaningful human involvement be protected by copyright or any sui generis rights?[3] What creative choices by human users could satisfy the originality requirement?

Perhaps less explored are the broader implications for freedom of expression and information, and how copyright impacts, or ought to impact, these developments. The recently agreed EU AI Act attempts to take such a broader perspective, albeit with a rather limited interface with copyright.

2.1 Input, (Self-)Censorship and Digitisation

The quality of input (training data) for AI models is a general issue, often discussed in terms of the likely reproduction of biases and the resulting discrimination. While not a copyright question per se, it is nevertheless shaped by copyright policy.

The DSM Directive introduced text-and-data mining (TDM) exceptions, albeit safeguarding right holders’ option to expressly reserve TDM outside of scientific research. What does the right to opt out from TDM for non-research purposes mean for developing GenAI models? Widespread reservations might reduce the availability of high-quality sources, and the problem of models relying on publicly available but often lower-quality data is well known (the European Commission, for example, has recently opened proceedings against Twitter/X under the DSA). Furthermore, the decision to opt out might rest not with the author personally but with an intermediary acting as right holder, which could de facto censor the available content at its own discretion.
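In practice, such reservations are often expressed in machine-readable form, with robots.txt directives aimed at known AI crawlers as one commonly cited mechanism (whether this satisfies the “appropriate manner” requirement of Art. 4(3) DSM Directive remains debated). As a minimal sketch, using only the Python standard library, of how a crawler could read such a reservation (the crawler names are real user-agent tokens, while example.org is a placeholder domain):

    from urllib.robotparser import RobotFileParser

    # Real user-agent tokens published by AI crawler operators;
    # the list is illustrative, not exhaustive.
    AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]

    rp = RobotFileParser()
    rp.set_url("https://example.org/robots.txt")  # placeholder domain
    rp.read()

    for agent in AI_CRAWLERS:
        allowed = rp.can_fetch(agent, "https://example.org/")
        print(f"{agent}: {'allowed' if allowed else 'opted out'}")

Whoever controls this file, often the platform hosting the content rather than the author, effectively decides which works remain available for TDM.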

The quality of data further depends on policies with respect to digitising various resources. The long term of copyright protection and the difficulty of clearing rights even for orphan works are well-known issues. The lack of access to digitised works in general, irrespective of their copyright-protection status, can negatively impact the diversity of input data and disproportionately disadvantage certain languages, groups and cultures. A public policy of making content available could help overcome the lack of incentives to gather data that is not as easily accessible. The translation of EU legal acts into all 24 official EU languages is an example of a policy serving multiple interests, from access to justice and the linguistic diversity of the Union to enabling R&D (the DeepL translation service, for example, uses this database).

Needless to say, whether due to public policy or to the choices made by the developer, some works could be prioritised or fail to be included, accidentally or on purpose. The draft AI Act addresses some of these concerns through a transparency requirement, namely an obligation to make publicly available a sufficiently detailed summary of the content used for training the model, irrespective of its copyright-protection status. The obligation matters not only for the moral right of paternity under copyright but also for the ability to detect possible interference with free speech by the state or intermediaries through the curation of training data.

2.2 Models, (Self-)Censorship and Integrity

The quality of output depends on the training data, the model and the prompts used. From the copyright perspective, discussions revolve around the necessity of safeguards preventing output from reproducing copyright-protected works, implying some kind of filter (a deliberately naive version is sketched below). At the same time, the balance between the protection of copyright and the effective use of exceptions and limitations enabling free expression, derivative works and access to information needs to be preserved.
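To make the idea concrete, the following sketch flags generated text that reproduces long verbatim word sequences from a set of protected works. The n-gram length and the corpus are illustrative assumptions, not a description of any deployed system:

    # A deliberately naive reproduction filter: flag output sharing any
    # long verbatim word sequence (n-gram) with a protected work.
    def ngrams(text: str, n: int = 10) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def reproduces_protected(output: str, protected_works: list[str], n: int = 10) -> bool:
        out = ngrams(output, n)
        return any(out & ngrams(work, n) for work in protected_works)

Even this crude example illustrates the balancing problem: a purely verbatim-overlap filter would also block lawful quotation or parody, that is, precisely the uses covered by exceptions and limitations.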

As Orwell further says in “The Freedom of the Press”, “when one demands liberty of speech and of the press, one is not demanding absolute liberty. There always must be, or at any rate there always will be, some degree of censorship, so long as organized societies endure.” Addressing hate speech is the most obvious example. The EU has recently stepped up regulation addressing risks connected to illegal content and content affecting fundamental rights, e.g. under the Digital Services Act (DSA). While it is unclear to what extent providers of GenAI fall under the DSA, the Act’s objectives suggest that covering them might be desirable.

The capability of models to alter input received through a prompt in a way that changes its meaning, deliberately or not, is alarming. Recently, I asked ChatGPT to “streamline” a few sentences. To my great surprise, the phrase “Russia’s war against Ukraine” within a sentence was transformed into a “conflict between Russia and Ukraine”. Running the same sentence with “enhance”, “paraphrase”, “amplify”, and “clarify” instructions resulted in the same alteration. The instruction to “simplify” produced “the war in Ukraine”.

This was quite a departure from the intended wording. It is rather unlikely that ChatGPT was deliberately instructed to call Russia’s war a “conflict”. Could it hypothetically be due to the abundance of training data using that language, or to an attempt to make the output more “neutral”? What is particularly disturbing is that such significant changes were made during a mere editing task. Translating the same sentence into different languages with ChatGPT produced mixed results: the Norwegian translation also suddenly referred to a “conflict”, whereas the original meaning was preserved in various other languages.
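The informal experiment is easy to repeat programmatically. Below is a minimal sketch, assuming the openai Python client and an API key in the environment; the model name and the placeholder sentence are illustrative, not the exact setup used above:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SENTENCE = "..."  # substitute a sentence mentioning "Russia's war against Ukraine"
    INSTRUCTIONS = ["streamline", "enhance", "paraphrase", "amplify", "clarify", "simplify"]

    for instruction in INSTRUCTIONS:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[{
                "role": "user",
                "content": f"Please {instruction} the following sentence:\n{SENTENCE}",
            }],
        )
        print(instruction, "->", response.choices[0].message.content)

Comparing the outputs across instructions, and across repeated runs, makes such silent reframings visible and reproducible.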

In many cases, a user creating a prompt remains in a position to discover such changes. Nevertheless, could such model behaviour encourage self-censorship by human users, nudging them towards more “neutral” (or, for that matter, more “extreme”) language? Moreover, where others’ works are used in a prompt, what about the potentially compromised integrity of those works? The right to the integrity of a work certainly merits attention, reaching beyond the author’s reputation to issues of content distortion and disinformation.

3 Responsible (Use of) GenAI

Should GenAI be treated like any other technological development from the copyright perspective? Perhaps not. Individuals’ responsible use of (Gen)AI will be crucial, just as with other digital tools. However, the realities of the online information space and the role of intermediaries point towards the need for a comprehensive look at the role of copyright in safeguarding and enabling freedom of expression and information in the context of GenAI.

The transparency requirement in the AI Act and the requirement to label AI-generated content are, of course, very welcome, not least because they enable responsible use. However, the discussion of the technology’s (in)ability to preserve the integrity of input (whether training data or prompts) and of the role of developers and models in curating free speech is equally relevant. The concerns are not unique to GenAI per se, but the capability of such systems to produce content that can be disseminated online on an unprecedented scale makes the issue more pressing. Is it not time to discuss (EU) copyright’s role in a democratic society, in particular its moral rights dimension?