
1 Technical Basics on Natural Language Generation

1.1 Introduction to Technical Aspects

Natural Language Generation (NLG) is a major subfield of Natural Language Processing and of Deep Learning more broadly. Recent breakthroughs in autoregressive models such as OpenAI’s GPT-2 (Radford et al. 2019), GPT-3 (Brown et al. 2019), InstructGPT (Ouyang et al. 2022) or Google’s Primer (So et al. 2021) have produced machine-generated texts that are demonstrably difficult, or even impossible, to distinguish from human-written texts. When NLG is used, liability claims can arise in any area where verbal communication takes place.

The first applications commercializing the technology are becoming available, with self-reinforcing chat bots, automated code generation and other products entering the market. In this work we explore these new application areas, focusing particularly on those that would likely be deemed a good fit for well-intentioned use but could lead to undesirable, negative results under certain conditions.

We further examine the capacity of both humans and algorithms to detect machine-generated text, as a means of mitigating the rapid spread of generated content in the form of news articles and similar material. Building on previous work that generated legal texts with an earlier generation of NLG tools, we train a more advanced autoregressive transformer model to illustrate how such models operate and at which points the operator of the model has direct or indirect influence on the generated output.

In the second part of the article, we examine civil liability issues that may arise from the use of NLG, focusing in particular on the Directive on defective products and on fault-based liability under Swiss law. Regarding the latter, we discuss specific legal bases that may give rise to liability when NLG is used.Footnote 1

1.2 Risks of Reinforcement Learning

1.2.1 Undesirable Language Generation

A possible way to adapt this and similar models to their users’ inputs is to apply reinforcement learning to the model. One such approach, using the transformer-based model introduced earlier, is to add relevant user input to the fine-tuning dataset. This allows the operator to adjust the model to the users’ behavior and, in theory, to improve overall readability and comprehension.

The potential danger of uncontrolled reinforcement learning that uses unfiltered user input, or of a main fine-tuning dataset that has not been carefully vetted, is illustrated by undesired outputs from NLG systems. Two prominent recent examples are Microsoft’s Twitter bot Tay in 2016 (Schwartz 2019) and IBM’s Watson in 2013 (Madrigal 2013).

In the case of Tay, the bot trained itself on unfiltered interactions with Twitter users who used inflammatory and offensive language. Based on those interactions, it began generating inflammatory and offensive language itself, even when responding to users who had not used any such language.

One recent approach, adopted by OpenAI, is content filtering at the input-prompt level (Markov et al. 2022). In an environment that requires a high degree of moderation, content filtering can be applied at the source to reject prompts that would likely result in a hateful, violent or otherwise undesirable response.Footnote 2 While this addresses the most extreme detectable input prompts to a degree, it does not guarantee a response free of input bias, which has to be addressed at the model level.
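To make the placement of such a filter concrete, the following minimal sketch gates every prompt through a blocklist check before it reaches the generator. It only illustrates where input-level moderation sits in the pipeline, not the classifier-based system described by Markov et al. (2022); the patterns and function names are invented for the example.

```python
import re

# Toy blocklist; a production system would use a trained moderation
# classifier rather than keyword matching.
BLOCKED_PATTERNS = [r"\bkill\b", r"\bhate\b"]

def is_allowed(prompt: str) -> bool:
    """Return False if the prompt matches any blocked pattern."""
    return not any(re.search(p, prompt, re.IGNORECASE) for p in BLOCKED_PATTERNS)

def moderated_generate(prompt: str, generate) -> str:
    """Pass the prompt to the text generator only if it clears the filter."""
    if not is_allowed(prompt):
        return "[prompt rejected by content filter]"
    return generate(prompt)

# Usage with any text-generation callable:
# moderated_generate("Write a short poem about spring", my_model_generate)
```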

1.2.2 Code Generation and Vulnerable Code Data

With the advancement of transformer-based text generation, it is becoming possible to train models on very specific and technically challenging tasks. One such emerging field is automated code generation based on natural language input. Probably the most famous and widely used tool is GitHub Copilot, which was released in June 2021.Footnote 3 It generates code sequences in a variety of languages given a comment, function name or surrounding code.

Copilot is largely based on the GPT-3 base model, which has then been fine-tuned on open-source code hosted on GitHub (Chen et al. 2021). Since there is no manual review of each entry, GitHub advises that the underlying dataset can contain insecure coding patterns that are in turn synthesized into generated code at the end-user level. A first evaluation of the generated code found that approximately 40% of the produced code excerpts contained security vulnerabilities in scenarios relevant to high-risk Common Weakness Enumeration categories (Pearce et al. 2021).

The unreviewed generation of potentially vulnerable code would pose a severe risk to the owner of that code, which makes a fully unreviewed or near-autonomous application of such a tool based on the underlying models unlikely. There are, however, automated code-review solutions that inspect a given code passage for potential quality and security issues (e.g. Sonar Footnote 4). The most reasonable semi-autonomous use of code generation would therefore seem to be enabling a non-technical operator to use natural language to generate simple code excerpts that are automatically scanned for vulnerabilities and deployed in a test environment used for rapid prototyping.
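The semi-autonomous setup sketched above can be pictured as a small pipeline that only promotes generated code to a test environment after an automated scan. The scanner and deployment steps below are hypothetical placeholders (a real setup would call a tool such as Sonar through its own interface); the sketch only shows where the automated review sits.

```python
import tempfile
from pathlib import Path

def scan_for_vulnerabilities(source: str) -> list[str]:
    """Placeholder for an automated code-review step.

    A real pipeline would invoke a security scanner here; this stub uses a
    deliberately trivial heuristic to mark where that check belongs.
    """
    findings = []
    if "eval(" in source:
        findings.append("use of eval() on possibly untrusted input")
    return findings

def deploy_to_test_environment(source: str) -> Path:
    """Placeholder: write the snippet where a test harness can pick it up."""
    path = Path(tempfile.mkdtemp()) / "generated_snippet.py"
    path.write_text(source)
    return path

def semi_autonomous_codegen(prompt: str, generate_code) -> None:
    """Generate code from natural language and gate it behind an automated scan."""
    source = generate_code(prompt)                  # e.g. a Codex-style model call
    findings = scan_for_vulnerabilities(source)
    if findings:
        print("Blocked; manual review required:", findings)
    else:
        print("Deployed to test environment:", deploy_to_test_environment(source))
```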

1.3 Detection of Machine Generated Text

With the wide availability of cloud computing enabling the production of machine-generated content, and social media enabling its mass distribution, the remaining barrier is the quality of the generated texts and the ability of regular content consumers to distinguish them from human-produced texts. Based on previous related work, the ability of non-trained evaluators depends to a certain degree on the subject domain as well as on the quality of the model itself (Peric et al. 2021). For excerpts of legal language sourced from several decades of US legal opinions, the ability to distinguish ranged from 49% (texts generated by GPT-2) to 53% (texts generated by Transformer-XL), both close to random guessing. Related work also shows that accuracy improved for the more creative domain of human-written stories prompted with “Once upon a time”, where evaluators reached 62% for GPT-2 but, again, a near-random 49% for GPT-3 (Clark et al. 2021). While it is unlikely that we will see machine-generated literature ready for mass consumption any time soon, one concerning finding is that the accuracy for detecting machine-generated news articles is only 57% for GPT-2 and a near-random 51% for GPT-3.

While most consumers have difficulty differentiating between machine- and human-generated texts, the same models can be trained to make that distinction. If the applied model (GPT-2/Transformer-XL) is known beforehand, the detection rate was between 94% and 97%, while not knowing the model in advance resulted in a detection rate of 74% to 76%. Detection would therefore likely be especially hard for models that are not open-sourced and cannot easily be replicated. This will increasingly be the case, as it already has been with GPT-3, which has been licensed to Microsoft with the model source code not publicly available.Footnote 5
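The principle behind such detectors can be illustrated with a deliberately simple sketch: a classifier trained on labelled examples of human-written and machine-generated passages. The cited studies fine-tune transformer models for this task; the bag-of-words classifier and the toy example texts below are assumptions made purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled corpus (in practice: thousands of passages per class).
human_texts = [
    "We affirm the judgment of the district court for the reasons stated below.",
    "The district court did not abuse its discretion in denying the motion.",
]
generated_texts = [
    "The court held that the defendant was personally liable for the overt act.",
    "The Tenth Circuit held that the defendant participated in the conspiracy.",
]

texts = human_texts + generated_texts
labels = [0] * len(human_texts) + [1] * len(generated_texts)  # 1 = machine-generated

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

# Estimated probability that a new passage is machine-generated:
print(detector.predict_proba(["The judgment of the district court is affirmed."])[:, 1])
```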

1.4 Operator Influence on Output

1.4.1 General Remarks

When setting up tools that generate text content, there are only a few options to influence the output. The first and most basic layer of most NLG algorithms is the dataset used to train the base model. In the case of GPT-2, with its 1.5 billion parameters, the model was trained on the text of 8 million web pages, or about 40 GB of internet text. The selection was made in part by choosing outbound web pages from Reddit that had received at least 3 karma, as a rough form of human quality selection. In most cases this layer cannot be replicated by end-users of the tool and has to be taken as is (the option to train a model from scratch exists but is difficult to implement at a sufficiently sophisticated level for most end-users).

The second and more influenceable layer is the dataset used to “fine-tune” the model. This step allows end-users to specialize the output for a certain domain, a certain language style or similar; here the end-users have the highest degree of influence on the actual NLG output that will be generated. A particular domain, such as “legal language” for example, allows users to specialize the generated output so that it sounds quite similar, or even identical, to qualified legal language. Current trends for LLMs, as well as the technical limitations of most operators, will make this layer increasingly inaccessible, with most models only allowing operator interaction via a commercial API.Footnote 6 This approach limits operators’ influence, but it also leaves an auditable utilization trace that can always be traced back to the provider of the model used.

The third and most direct way to shape the quality of the output language lies in the basic parameter settings of the model. In general, and specifically in the case of our example model, these are the desired output text length, the initial text prompt, and the “temperature”. The temperature controls how sharply the model samples from its probability distribution: lowering it increases the likelihood of high-probability words and decreases the likelihood of low-probability words, which usually yields more conservative and repetitive text, while raising it gives less likely words a better chance and often produces more varied, natural-sounding output (Von Platen 2020). This layer is the most easily modifiable and the one end-users would interact with most. These parameters will likely become less available in most commercialized applications of LLMs, leaving the model provider more influence to optimize parameters and outputs based on existing optimized result lengths and probability scores.

An additional control that is sometimes in place is to exclude foul language from being generated. This can mean that even if the given text prompt or the underlying training or fine-tuning dataset contains foul language, the model will still never output words that are considered offensive according to a set keyword list.
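A minimal sketch of these operator-facing controls, using the publicly available GPT-2 checkpoint from the Hugging Face transformers library rather than our fine-tuned reference model, could look as follows; the prompt, length, temperature value and banned word are arbitrary choices for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The Tenth Circuit contravened those settled principles here."
inputs = tokenizer(prompt, return_tensors="pt")

# Words the model should never emit, encoded as token-id sequences.
banned = tokenizer(["damn"], add_special_tokens=False).input_ids

output = model.generate(
    **inputs,
    do_sample=True,                       # sample instead of greedy decoding
    temperature=0.7,                      # lower = sharper, higher = more varied
    max_length=100,                       # desired output length in tokens
    bad_words_ids=banned,                 # keyword-based exclusion list
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```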

1.4.2 Data and Methods

To further illustrate the direct impact that the operator’s parameter settings have on the output, we trained a GPT-Neo model (Black et al. 2021) on a legal text dataset, applying some of the methods and data of previous work (Peric et al. 2021) while using a newer and more advanced model.

Our empirical setting is U.S. Circuit Courts, the intermediate appellate courts in the federal court system. Circuit Court judges review the decisions of the District Courts, deciding whether to affirm or reverse, and explain their decision in a written opinion. Our corpus comprises 50,000 of these U.S. Circuit Court opinions, uniformly sampled from the universe of opinions for the years 1890 through 2010. The sample includes both lead (majority) opinions and addendum opinions (concurrences and dissents). We undertake minimal pre-processing, so that our generator can replicate the original style of the texts: we remove some metadata and XML markup but keep capitalization, punctuation, etc., and we preserve the special legal citation notation used by U.S. courts. The opinions are in general quite lengthy, containing an average of 2024 tokens (words) per article. The average length gradually decreased from the 1890s, reaching a minimum in the 1970s; after that, the average length of these opinions has grown steadily until the present day. Notably, it was around 1970 that digital legal research databases came into use.

Our approach to representing legal documents is an autoregressive language model: given this unsupervised corpus, we fine-tuned an existing GPT-Neo checkpoint on it.
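A minimal sketch of this fine-tuning step with the Hugging Face transformers and datasets libraries is shown below. The checkpoint size, file name, and hyperparameters are placeholder assumptions for illustration and do not reproduce our exact training configuration.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

checkpoint = "EleutherAI/gpt-neo-125M"            # small public GPT-Neo checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token         # GPT-Neo has no pad token by default
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# One plain-text opinion per line (hypothetical file name).
dataset = load_dataset("text", data_files={"train": "circuit_opinions.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt-neo-legal",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```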

1.4.3 Samples of Operator Influence

We demonstrate how the end-user can interact with the model through fine-tuning and parameter selection with the following samples from our reference model. In our first sample (c.) we use the full-sized dataset with the default temperature value, which generates some meaningful results.

In the other two samples, we used the same full-sized dataset but changed the temperature parameter to 0.9 (d.) and 0.1 (e.) respectively. We observe that the sample generated at the higher temperature also has better readability: it does not consist of long lists of legal precedents but rather of more natural language.

(a.) Prompt: The Tenth Circuit contravened those settled principles here. Not one of the decisions relied upon (…)

(b.) Genuine Ending:

(…) by the Court of Appeals—Estate of Ceballos v. Husk, 919 F. 3d 1204 (CA10 2019), Hastings v. Barnes, 252 Fed. Appx. 197 (CA10 2007), Allen, 119 F. 3d 837, and Sevier v. Lawrence, 60 F. 3d 695 (CA10 1995)—comes close to establishing that the officers’ conduct was unlawful. The Court relied most heavily on Allen. But the facts of Allen are dramatically different from the facts here.

(c.) Full dataset, temperature default 0.7

(…) by the majority holds that a defendant who is not a party to a conspiracy cannot be held personally liable for an overt act in furtherance of the conspiracy. To the contrary, the Tenth Circuit held that a defendant is personally liable for an overt act if he “was a member of the conspiracy at the time of the overt act and participated in it in some manner.”

(d.) Full dataset, temperature high 0.9

(…) by the Tenth Circuit is even remotely analogous to this case. In United States v. Smith, 7 F.3d 1433 (10th Cir. 1993), the court held that a defendant’s motion to suppress evidence obtained in violation of the Fourth Amendment was properly denied because the government did not have probable cause to search the defendant’s house.

(e.) Full dataset, temperature very low 0.1

(…) by the Tenth Circuit in this case is inconsistent with the Supreme Court’s decision in United States v. Booker, 543 U.S. 220, 125 S. Ct. 738, 160 L. Ed. 2d 621 (2005). In Booker, the Supreme Court held that the mandatory nature of the Sentencing Guidelines rendered them unconstitutional. 543 U.S. at 244. The Court held that the Guidelines were unconstitutional because they were not “sufficiently reliable to support a finding of ‘reasonableness.’” Id. at 245.

2 Legal Aspects

2.1 Introduction to Legal Analysis

The use of artificial intelligence (AI) such as NLG algorithms creates numerous legal challenges, including liability issues. Most AI applications are designed to develop autonomously in order to deal with problems that their developers did not or could not have considered when programming them. As a result, self-learning AI can evolve in unforeseen ways. In the worst case, an algorithm can cause harm to others through dangerous self-learned behavior.

When using NLG, liability claims can occur in any area where verbal communication, be it oral or written, is used. Hence, a hospital or insurance company using NLG-based bots to communicate with patients, a lawyer using NLG to draft briefs, or a news outlet using NLG to draft articles may face liability claims if the algorithm’s output causes harm to others.

Legal literature dealing with the implications of AI for future tort claims focuses on the European Council’s directive on the liability for defective products (Directive) and on whether new liability provisions are necessary.Footnote 7 This article further analyzes how verbal communication generated by NLG algorithms can violate personal rights, infringe intellectual property rights or give rise to unfair competition claims.

2.2 Liability for Autonomous Actions of AI in General

2.2.1 Unforeseeable Actions of Self-Learning AI as a Challenge for Tort Law

The self-learning ability of AI poses major challenges for developers and operators. On the one hand, AI autonomously develops new solutions to problems. On the other hand, this same characteristic poses a tremendous challenge, as developers and operators are not always able to anticipate the risks that self-learning AI might pose to others.

One might conclude that actions an AI adopts through self-learning mechanisms are not foreseeable to its developers or operators, preventing them from implementing adequate countermeasures (Horner and Kaulartz 2016, p. 7; von Westphalen 2019, p. 889; Gordon and Lutz 2020, p. 58).Footnote 8 This perception would shake the foundations of tort law, as it implies the developer’s inability to control the risk stemming from AI (Weber 2017, n10).

Scholars have attempted to address the autonomy aspect of AI and have proposed various ideas based on existing liability law, such as analogies to the various types of vicarious liability (Borges 2019, p. 151; Zech 2019a, p. 215). As with other new technologies, some argue for a new legal basis to adequately regulate the risks and assign liability to manufacturers and operators (Zech 2019a, p. 214; Gordon and Lutz 2020, p. 61; Säcker et al. 2020, p. 823). Furthermore, some support an entirely new legal concept of e-persons, which would make the AI itself the defendant of a tort claim (Koch 2019, p. 115).

As with most new technologies, it must be carefully analyzed whether AI creates risks of a new quality or merely changes their quantity (Probst 2018, p. 41), as only the former requires the introduction of new liability rules. Whether AI qualifies as such has yet to be determined.

2.2.2 Respondent to Tort Claim

With AI causing harm to others, manufacturers and software developers will play a more significant role as defendants in tort claims, as they are the human minds to whom unwanted actions can be attributed. In the case of NLG algorithms, the generated output is based on the program code developed by the manufacturer of the AI, the person who may be blamed if the output has negative consequences (Conraths 2020, n73).

For AI algorithms, the self-learning phase, which takes place after the product has been put into circulation, becomes increasingly important (Grapentin 2019, p. 179). Due to this shift of product development into the post-marketing phase, legal scholars argue that not only the manufacturer of the AI bears a liability risk but also the operator (Spindler 2015, p. 767 et seq.; Reusch 2020, n178). For individual software, the operator may also be liable for a manufacturer’s actions if the latter can be considered a proxy of the operator and the operator cannot prove that it took all reasonable and necessary measures to instruct and supervise the proxy in order to prevent the damage from occurring (Kessler 2019, n16). For example, at news outlets that harness NLG, the editor-in-chief or other supervisory staff may be responsible for the proper functioning of the software and be liable if the software causes harm to others (Conraths 2020, n81).

2.2.3 Causality as the Limiting Factor of Liability

The fact that self-learning algorithms develop independently after the developer has put them into circulation makes it difficult to delimit each actor’s causal contribution to the damage (Ebers 2020, n194). In most cases, self-learning artificial agents (such as NLG) are not standard products but are individually tailored to the operator’s needs. Hence, the manufacturer and the operator act in concert when developing and training the AI for the operator’s use. Under Swiss law, if two defendants acted together, both are jointly and severally liable for all harm caused (Art. 50 Swiss Code of Obligations (“CO”)).

With AI applications, and NLG algorithms in particular, interaction with third parties, such as the operator’s customers, becomes increasingly important for the algorithm’s further development (Schaub 2019, p. 3). As recent real-life examples have shown, input generated by customers may have undesired effects on the AI’s behavior. In general, a manufacturer must take reasonable measures to prevent an algorithm from using unqualified input data (such as hate speech) to adapt its behavior (Eichelberger 2020, n23).Footnote 9 But a manufacturer cannot be expected to foresee every possible misuse of its product. Under Swiss law, a manufacturer can escape liability if it proves that a third actor’s unforeseeable actions were significantly more relevant in causing the damage than its own, thereby interrupting the chain of causality.Footnote 10

Similarly, the Directive provides that the manufacturer is not liable if it proves that it is probable that the defect which caused the damage did not exist at the time the manufacturer put the product into circulation, or that the defect came into being afterwards. Some authors argue that the user’s interactions with the AI may be the root cause of the harm and that the manufacturer therefore escapes liability.Footnote 11

Apart from these specific challenges, proving causation in any claim for damages is difficult and, in many cases, requires significant resources.Footnote 12 In many tort cases, no single cause will be identified as having caused the damage; rather, various causes will have partially contributed to the claimant’s damage (Zech 2019a, p. 207 et seq.). For the claimant, proof of causation will therefore remain a significant hurdle to compensation for damages (Spindler 2019, p. 139 et seq.).

2.3 Directive on Defective Products

2.3.1 General Remarks

Most NLG algorithms will cause economic losses that are not covered by the Directive (Art. 9) or by Swiss product liability law, which is congruent with the Directive. Nevertheless, it is conceivable that NLG algorithms will also cause personal injury or property damage. This is the case when an NLG algorithm provides wrong information that causes bodily harm to others (e.g., a doctor receiving a diagnosis from a device that uses flawed NLG to communicate, or a communications bot of a private emergency call facility giving false medical advice).

Scholars have extensively discussed whether the Directive applies to software (von Westphalen 2019, pp. 890, 892). Despite the ambiguity, most argue that software falls under the Directive.Footnote 13 To counter any remaining doubts, the EU Commission has published amendments to the Directive that explicitly name software as a product.Footnote 14 The following analysis therefore assumes that software qualifies as a product under the Directive.

Various aspects of the Directive are discussed in the legal literature, with two standing out. First, it must be determined whether the actions of an AI system are to be considered defective within the meaning of the Directive. Second, manufacturers of an AI system may be relieved of liability based on the state-of-the-art defense if they prove that, at the time the product was put into circulation, certain actions of the AI system, particularly those the system develops through self-learning mechanisms, could not have been foreseen with the technical means and scientific knowledge available at the time.

2.3.2 Defectiveness of an AI System

2.3.2.1 Consumer Expectancy Test

Many scholars struggle with how to determine whether an AI system is defective. The Directive considers a product to be defective if it does not provide the safety that a person may expect (Art. 6 (1) Directive). Hence, a product is defective if a reasonable consumer would find it defective considering the presentation of the product, the use to which it could reasonably be expected to be put, and the time when it was put into circulation. This test based on consumer expectations may not be adequate for determining the defectiveness of cutting-edge technology, as such expectations are hard to establish for lack of a point of reference (Lanz 2020, n745 et seq.). A risk-benefit approach that asks whether a reasonable alternative design would have significantly reduced the occurrence of harm may therefore be more appropriate (Wagner 2017, p. 731 et seq.).Footnote 15

2.3.2.2 AI Challenging the Notion of Defect

Various causes can account for the error of a software product. Some of them are easier to prove and do not challenge the definition of defectiveness set forth in the Directive. Among these are cases in which the manufacturer caused an error in the algorithm’s code, trained the algorithm (before putting it into circulation) with unsuitable data (Eichelberger 2020, n22), or did not implement adequate measures to prevent third parties from tampering with the code (e.g. hacking) (Eichelberger 2020, n22; Wagner 2017, p. 727 et seq.).Footnote 16 But other aspects that complicate the proof of a defect or challenge the understanding of the concept of defect arise with AI (see also Zech 2019a, p. 204).

From a technical standpoint, it is difficult to analyze the actions of an AI that led to a damage, because the relevant processes take place in a way not yet perceivable from the outside (the black-box problem).Footnote 17 Especially in the case of NLG, it may already be difficult for a claimant to prove that the output causing the damage was artificially generated at all, so that the Directive applies.Footnote 18

From a normative point of view, the fact that an algorithm may, through self-learning mechanisms, adopt behavior not intended by its developer challenges the perception of defectiveness. Scholars discuss various ways to determine the expectations of a reasonable consumer towards AI systems. AI agents outperform humans at specific tasks; comparing the outcomes of AI algorithms to those of a human therefore does not sufficiently account for the task-limited superior performance of AI (Wagner 2017, p. 734 et seq.). Comparing the results of two algorithms to determine the reasonable expectations of customers is no more suitable, as its consequence would be that only the best-performing algorithm is considered safe, while all others are defective (Wagner 2017, p. 737 et seq.).

Determining the defectiveness of an algorithm’s learning process may further prove difficult, as the process mainly unfolds after the product has been put into circulation and happens outside the manufacturer’s control, particularly with NLG (Binder et al. 2021, n44). The phase in which the NLG algorithm interacts with users challenges the understanding of defectiveness in particular: while the AI provides its services to users, it simultaneously improves its abilities, raising the question of whether the algorithm can be considered defective at the time it was put into use (Binder et al. 2021, n44).

As previous examples have shown, interaction with users can cause an algorithm to develop behavior not intended by the manufacturer (Zech 2019b, p. 192). It must be determined whether the manufacturer must provide reasonable measures to prevent the algorithm from evolving in an unintended manner (Eichelberger 2020, n23). Scholars agree that a manufacturer must implement safeguards to prevent an algorithm from incorporating inappropriate or illegal user behavior into its code. This may prove easier in theory than in practice, because it is very difficult to predict what user behavior might cause a self-learning algorithm to evolve in a way not intended by the manufacturer. If users interact with the AI in unpredictable ways that cause harm, the product cannot be considered defective (Zech 2019a, p. 213).

2.3.3 State of the Art Defense

A manufacturer can escape liability if it proves that a defect could not have been detected with the technical and scientific knowledge available when the product was put into circulation. New technologies with unknown negative effects, such as AI, may qualify for the state-of-the-art defense. Scholars therefore propose exempting certain applications from the state-of-the-art defense, as legislatures in various jurisdictions have done for other technologies such as GMOs and xenotransplantation (see for example: Junod 2019, p. 135; Eichelberger 2020, n20; disagreeing: Zech 2019a, p. 213).

When it comes to AI, the distinction between conditions that qualify as a defect of a product and those that fall under the state-of-the-art defense is difficult. Self-learning algorithms may develop undesired behavior that a diligent manufacturer could not foresee. But the fact that a manufacturer cannot foresee the potentially harmful behavior of its AI software does not automatically trigger the state-of-the-art defense.Footnote 19 Examples from the past showFootnote 20 that it is not sufficient that a manufacturer was unable to foresee a specific risk of its product. The defense can only be invoked if the manufacturer was also unable to anticipate a general risk of harm posed by the product (Wagner 2020, § 1 ProdHaftG n61; Zech 2019a, p. 213).

The drafters of the Directive intended this defense to apply only in very limited cases. Hence, to invoke the defense, manufacturers are required to have applied the utmost care and diligence in anticipating the negative effects of their product. Some authors argue that the risks of self-learning AI are already sufficiently well known to prevent manufacturers from successfully invoking the defense (von Westphalen 2019, p. 892; Zech 2019a, p. 213).

From a practical perspective, the hurdles to invoking the defense are significant as well. A manufacturer that invokes it would most probably have to reveal business secrets (such as the programming code) to the injured party, making it highly unlikely that the defense will become widely used to defend product liability claims (von Westphalen 2019, p. 892).

Finally, there is a wide array of possible applications for AI, and not every product category poses the same dangers to consumers. In most cases the inherent dangers of a conventional product represent the greatest risk of harm; enhancing these products with AI applications does not significantly increase that risk. A general exclusion of AI from the state-of-the-art defense would therefore fail to consider the individual risk of harm of each product category (Koch 2019, p. 114).

In conclusion, an exemption as proposed by some authors requires a more in-depth analysis of the specific risks of AI and their foreseeability. A general call for excluding new technologies from the defense is counterproductive and may deter manufacturers from investing in products using AI.

2.4 Liability for Negligence

In the absence of a specific provision allowing the injured party to claim compensation for damages, the general fault-based civil liability of Swiss law applies (Art. 41 Swiss Code of Obligations). For NLG, fault-based liability becomes relevant if the generated output violates personal rights, infringes intellectual property rights, or triggers the provisions of the Unfair Competition Act.

2.4.1 Infringement of Intellectual Property Rights

The output generated by NLG algorithms without human intervention is not protected by copyright due to the lack of creative input (Ragot et al. 2019, p. 574; Reymond 2019; Ebers et al. 2020, p. 9).Footnote 21 Hence, output generated by NLG algorithms can be used by other parties without violating copyright law or paying royalties for its use.

On the other hand, NLG algorithms may draw on sources available on the internet. The risk that they use copyrighted or patented works must be considered by their developers.Footnote 22

2.4.2 Personal Rights Violation

Several examples show that artificial intelligence algorithms for NLG may generate output that violates the personal rights of others (defamation, libel, etc.). Swiss law provides a victim of a personal rights violation with a range of remedies, from injunctions to claims for damages. The autonomy of NLG algorithms does not exclude the operator’s civil liability if the output generated by the NLG algorithm violates the personal rights of others (Art. 28 (1) Swiss Civil Code).Footnote 23 If the claimant proves that the operator of the NLG algorithm was at fault, he may seek monetary compensation (Art. 28a (3) Swiss Civil Code and Art. 41 Swiss Code of Obligations) (Meili 2018, Art. 28a n16).

News outlets are susceptible to claims if they make extensive use of NLG algorithms without proper oversight. As news cycles become shorter and new players become increasingly important, the risk that output generated by NLG infringes personal rights increases.

News outlets are not the only operators that may find themselves involved in defamation lawsuits when using NLG that does not work properly. In particular, rating portals that use NLG to create comments on businesses (e.g. aggregated from individual customer feedback) may violate personal rights if the (aggregated) feedback is wrong or violates the personal rights of others (Reymond 2019, p. 111 et seq.). Whether search engines or website owners that provide links to content violating the personal rights of others are also liable has not yet been determined under Swiss law.Footnote 24

2.4.3 Unfair Competition

Output of NLG algorithms may give rise to unfair competition claims if it violates fair competition requirements. Unfair competition issues involving NLG become relevant in all types of sales activities in which NLG is used to advertise products. This may involve widespread general advertising, automated comparisons with similar products of competitors, or descriptions tailored to individual customers to persuade them to purchase a particular product (Leeb and Schmidt-Kessel 2020, n6). With the advent of rating websites (such as Google Maps, Yelp, etc.), businesses benefit from good ratings, and NLG algorithms may help businesses easily create fake reviews. Creating or ordering fake reviews to unjustifiably improve the rating of one’s own business or weaken that of a competitor qualifies as unfair competition and may give a competitor a claim for damages.

The Swiss Unfair Competition Act (UWGFootnote 25) sanctions various forms of unfair competitive behavior. In particular, the law sanctions actions that mislead customers about the NLG operator’s own products or those of a competitor (Art. 3 (1) UWG).

The law provides for various remedies, such as injunctive relief, for injured persons, the state or professional associations (Art. 9 and 10 UWG). Injured persons may further claim damages based on the fault-based liability of Art. 41 CO.

2.4.4 Duty of Care

Owners of copyright-protected works or persons whose personal rights have been violated by NLG output have various legal remedies to act against the violation of their rights. Besides injunctions, the injured person may claim damages. The latter is based on the general fault-based liability provision of the Swiss Code of Obligations (Art. 41 CO). Hence, the claimant must prove, in addition to damage and causality, that the tortfeasor breached the applicable duty of care.

The duty of care is derived from legal or private standards, which for new technologies have yet to be established (Reusch 2020, n301). If specific standards are lacking, the general principles applicable to all sorts of dangerous activities apply. Thus, a person creating a risk of harm for others must take all necessary and reasonable precautions to prevent it.Footnote 26

The Swiss Supreme Court has already dealt with cases in which links from online blogs led to webpages that violated personal rights. Without an in-depth assessment, the Court concluded that the operator of a blog cannot constantly monitor the content of all linked webpages (Reymond 2019, p. 114 with further references). Similarly, the German Supreme Court concluded that a search engine operator cannot be held accountable for every personal rights violation in autocomplete suggestions generated by its software: the operator must only take reasonable measures to prevent violations of personal rights, and the smooth and efficient performance of the software should not be impeded by rigorous filtering systems.Footnote 27 On the other hand, the specific expertise of the manufacturer or the operator, which allows them to assess the risk that the AI agent may infringe third-party rights, must be considered when setting the applicable standard of care (Heinze and Wendorf 2020, n84). Furthermore, the operator is also responsible for regularly reviewing the datasets the algorithm uses to improve its abilities (Conraths 2020, n69). But despite careful planning, the developer or operator of an NLG algorithm may not always be able to predict who might be harmed by the algorithm, preventing it from taking measures against such harm (Weber 2017, n22; Binder et al. 2021, n46). Finally, with self-learning NLG algorithms in particular, developers and operators must prevent the algorithm from adopting harmful behavior through interaction with its users (Heinze and Wendorf 2020, n84).

3 Conclusion

NLG offers a wide array of possible applications. Cutting-edge algorithms make it possible to create verbal output that cannot be distinguished from human-created speech. A cat-and-mouse game is underway between those who program NLG and those who develop algorithms capable of determining whether a given output is human or artificial. As shown, the verification of computer-generated text is crucial from a legal perspective, as certain legal bases apply only to one or the other.

The self-learning function of artificial intelligence challenges tort law. Interaction with users can result in unintended behavior and, in the worst case, even cause harm. This raises delicate questions as to what extent a programmer or operator of an AI should be liable for its actions, as they might not always be able to anticipate future behavior that the AI derives from interaction with third parties.

Legal research will have to grapple for some time with how to deal with the specific challenges of AI before rashly giving in to the temptation of new legislation.