1 Introduction

The widespread research and development in the Artificial Intelligence (AI) sector is advancing the technology at an exponential pace. Natural Language Processing and Deep Learning researchers have long worked toward chatbots that can respond to any prompt and provide relevant, tailored solutions to even the most niche situations. Although this development has been underway for years, the recent commercial release of Generative Language Models (GLMs) like ChatGPT (developed and released by OpenAI, available at chat.openai.com) has created massive public awareness of the power of these chatbots, and nearly every field of work is trying to use them to its benefit [1]. Specifically, ChatGPT, running on GPT-3.5, crossed 1 million users within 1 week of its launch, gained 57 million users in its first month, and relies on 175 billion parameters (numerical values the model uses to make predictions from input and produce contextually relevant, coherent responses) [1, 2]. GPT-4, the next iteration of OpenAI’s GLM, is reported to be far larger (OpenAI has not disclosed its parameter count) and can accept images as input, among further improvements that enable far more complex interactions and conversations [1, 3].

This rise in popularity of GLMs like ChatGPT has not skipped over the legal world. There are inherent ethical considerations in entering sensitive client information into a third-party tool, and lawyers must constantly ensure that their actions do not violate attorney-client privilege, for instance by removing identifying information and breaking complex tasks and queries down into smaller, less identifiable subsets. Still, ChatGPT has been and is being used in many legal scenarios, including responding to real law school exams and taking the Bar Exam [4, 5]. In the former case, ChatGPT passed all four classes but consistently performed at or near the bottom of the class, doing better on essay-style questions than on multiple choice [4]. ChatGPT-3.5 was also instructed to take the multiple-choice Multistate Bar Examination (MBE) section of the Bar Exam, on which it achieved an overall accuracy of 50.3% and passing scores on the Evidence and Torts sections [5]. GPT-4 recently passed the Bar Exam with flying colors, scoring in the top 10% of test takers, in contrast to GPT-3.5’s score in the bottom 10% [6]. Newer papers have also examined the implications of generative AI as it intersects with copyright law principles such as authorship and fair use [7]. ChatGPT has also been used experimentally to help fill out legal timesheets and ease the burden of tedious labor for lawyers [8].

Despite this potential, GLMs still have notable limitations. The most evident limitation in GLMs like ChatGPT is the production of “hallucinations” in the output [9]. Hallucinations are defined broadly as text output that is nonsensical or unfaithful to the provided source input [9]. Hallucinations are concerning because they hinder the performance of the model and raise safety concerns for real-world applications such as the legal field [9]. For example, while ChatGPT may encrypt inputted data, it can still share a great deal of personal information, and GLMs can be prompted in specific ways to reveal information they were trained on [9, 10]. This can expose privileged information and undermine the integrity of the legal system. Hallucinations can also take the form of fabricated case law, creating serious problems for lawyers who rely on that case law in litigation. However, hallucinations can potentially be reduced by modeling the input dialogue in a certain fashion [9]. As such, the most necessary next step for GLMs is reducing the prevalence of hallucinations when outputting helpful legal answers, which cannot be done without standardized prompting procedures. This paper presents a first iteration of a nomenclature to standardize legal-related GLM prompting by proposing a framework of interconnected “variables” and “clauses.” This framework aims to provide attorneys and potential clients with better legal advice and more pertinent legal solutions.

2 Methods

2.1 Overview

The GLM examined in this paper was ChatGPT running on GPT-4. At the time of this publication, ChatGPT runs on either GPT-3.5 or GPT-4, with GPT-3.5 available as the free commercial version of OpenAI’s product and GPT-4 requiring a $20 monthly subscription. We chose GPT-4 because its more advanced architecture allows it to generate more accurate and contextually appropriate responses than its predecessors, such as GPT-3.5 [11]. While it might make sense to focus on the free version, since legal problems pervade society regardless of class, race, or orientation, the integration of GPT-4 into Bing provides a free avenue that still leverages the model’s advanced capabilities, making them accessible to a wide range of users, including those who may not have the resources for more advanced or costly AI tools. Also, ChatGPT is currently the most well-known, highest-performing, and most easily accessible GLM available to the public at large. Like other prevalent GLMs in the space, ChatGPT was trained using Reinforcement Learning from Human Feedback (RLHF), so the framework and terms outlined below will be translatable beyond just ChatGPT [12].

The methods for creating the standardized framework were derived from lectures at the UCLA School of Law and National Bar Association publications analyzing common elements of legal cases, along with general prompt engineering techniques researched in other fields. The essential elements of any legal case are the background details (commonly known as a statement of facts when writing legal briefs) and a cause of action (for example, what the person filing a legal case is claiming or what a party has been charged with). Without these two pieces of information, no lawyer, law student, or algorithm could determine which cases to use, how to go about finding a solution, or how best to construct an argument for litigation [13]. For GLMs in particular, without these two categories of information, a model cannot provide any useful or contextually relevant information. This was verified by asking ChatGPT to simply “Help me drop charges against my client” or “What cases support my cause of action?” Both of these led to vague, unhelpful answers unrelated to the issues at hand.

As such, we decided that these two components must be present in every litigational legal prompt; we define them as Background Variables/Clauses and Dependent Variables/Clauses, names chosen for universally understood phrasing. The other component needed to obtain relevant output is a phrase containing the final directions of what a user would like to receive from the GLM. We define this as the Generative Clause, again named for a universally understood meaning (“What type of information are we trying to get the model to generate?”).

On top of these key pieces of the standardized nomenclature, every litigational issue deals with the concept of stare decisis, which holds that courts and judges should honor “precedent,” or the decisions, rulings, and opinions from prior cases [14]. As such, including a precedent-requesting phrase in the input to a GLM is highly encouraged, if not practically required. However, since it is not strictly necessary for obtaining an analytical answer from a GLM, it is not included in the base components of the nomenclature. We define this phrase as the Source Clause.

Lastly, various simple prompt engineering techniques can be incorporated to supplement the accuracy and quality of the output users receive. We find that the most relevant and useful techniques involve perspective shifting and tone alteration. Perspective shifting has been found to improve contextual understanding and the relevancy of output, which is very useful in a specialized field such as law [15,16,17]. Tone alteration is a fundamental ability of GLMs, and taking advantage of it can prove very beneficial to the field of law: tone is extremely important when articulating an argument, and judges will oftentimes ask for both objectivity and persuasiveness, as the law contains a myriad of both objective and subjective tests [18,19,20]. Both of these prompt engineering techniques are highly recommended when prompting for litigation purposes, and we define them as Perspective Clauses and Tonality Clauses, respectively. These last two types of clauses round out the standardized nomenclature we propose.

Below, we provide more detailed descriptions of the nomenclature and then provide example prompts and outputs that were derived to validate the standardized nomenclature for legal prompting.

2.2 Input

An input is simply the text entered into a GLM.

2.3 Variable

A variable is any input text that narrows the scope of the GLM’s processing. A variable is part of a larger clause that is connected to other variables and segments of the input.

2.4 Clause

A clause is a sentence in the input that connects variables and segments to each other and to the overall purpose of the input. Clauses can be connected in ways that further narrow the scope of the GLM’s processing. Below, we define two different types of variables and four different types of clauses that help to categorize and standardize legal-related inputs.
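To make the relationship between these two concepts concrete before defining the specific types below, the following minimal Python sketch models a clause carrying its variables. The class and field names are our own illustrative choices, not part of any established library.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Clause:
    """One sentence of input text, optionally carrying narrowing variables."""
    text: str                                            # the sentence itself
    variables: List[str] = field(default_factory=list)   # scope-narrowing phrases

# Example: a background clause containing two background variables.
clause = Clause(
    text="The case is to be litigated in California on behalf of a woman "
         "being paid less than her male counterparts.",
    variables=["to be litigated in California",
               "a woman being paid less than her male counterparts"],
)
```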

2.5 Background variable and clause

A background variable is any piece of information relevant to furthering the legal matter at hand. These are the most pertinent pieces of information the GLM will use in creating its output. They can include, but are not limited to, the demographics of a client or attorney, the jurisdiction involved, the area of law the problem implicates, the special assumptions that come with that area of law, or applicable pieces of evidence for establishing claims. Almost all legal-related prompts will require a multitude of background variables. For litigation, variables such as known case precedent, jurisdiction, client demographics, and facts surrounding the alleged claim are most useful. For transactional work, the type of transaction, the location of the transaction, the demographics of the clients involved, and the client’s parameters for negotiation are some of the most useful background variables. For example, a user may say that the case is “to be litigated in California,” or that her client is “a woman being paid less than her male counterparts.” A background clause is simply a sentence containing one or more background variables.

2.6 Dependent variable and clause

A dependent variable is any action or step that the user has already taken, or wishes to take, as a result of the information outlined in the background clauses. These can include, but are not limited to, wanting to acquire another company, having to pay a fine, being charged with drug possession, or planning to bring a suit under a certain act. A dependent clause is simply a sentence containing one or more dependent variables.

2.7 Generative clause

Near the end of the input, there must exist a “generative clause,” which instructs the GLM on how to utilize all the preceding input and create the desired output. The clause can be modified as the user sees fit, specifying precisely what task the user is trying to solve. For example, a user could say, “Please help me develop a strategy to proceed in advising my client, preferably with the goal of getting the aforementioned charges dropped.”

2.8 Source clause

Near the end of the input, there may exist a “source clause.” This clause tells the GLM what types of sources are needed to verify the output, as well as how many. The source clause can be modified as the user sees fit to request relevant cases, regulations, secondary sources, or statutes supporting the output. The clause can also be tailored to a specific field of law and can be woven together with other clauses in one sentence. For example, a user could say, “Please find me relevant case precedent and statutes that support your opinion. Make sure your advice is tailored to general tort law in the state of California.”

2.9 Perspective clause

To take advantage of the imaginative capabilities of GLMs, users may consider adding a perspective clause at the beginning of the prompt telling the GLM to pretend that it is an attorney in a specified field of law. This can be useful when the user is unsure which specific arguments, case law, or areas of law to start with, as the resulting output can suggest what area of law to examine, what initial arguments to pursue, and what strategies to take. This can be particularly important in both transactional and litigation work to supplement the brainstorming phase and quickly point laypeople or attorneys in the right direction. For example, a user could say “You are a mass torts attorney based in Costa Mesa, California” or “You are an employment discrimination lawyer.”

2.10 Tonality clause

Depending on the task at hand, and owing to the unique nature of the legal field, the user may need to add a “tonality clause.” This clause simply specifies what tone the user would like the output to have. This is especially important in litigation when writing briefs or synthesizing arguments and strategies, as the appropriate tone can range from purely objective to wholly persuasive for one’s client. For example, a user could say “outline the main arguments that could be made regarding this scenario in an objective manner,” or “outline the best arguments for my client, finding and drawing conclusions in a way that favors my client and is persuasive in tone.”

2.11 Output

The output is the raw text returned by the GLM after an input has been entered.
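Putting the definitions together, the sketch below assembles the clause types from Sects. 2.5 through 2.10 into a single input and submits it to a GLM. This is a hedged illustration rather than the exact procedure used in this study: the clause strings are condensed from the examples above, and the API call assumes the openai Python package and an OPENAI_API_KEY environment variable.

```python
from openai import OpenAI

perspective = "You are an employment discrimination lawyer."              # Sect. 2.9
background = ("My client is a woman being paid less than her male "
              "counterparts, and the case is to be litigated in "
              "California.")                                              # Sect. 2.5
dependent = "She wants to bring a claim under the Equal Pay Act."         # Sect. 2.6
tonality = ("Outline the best arguments for my client, drawing "
            "conclusions in a way that favors her and is persuasive "
            "in tone.")                                                   # Sect. 2.10
generative = ("Please help me develop a strategy to proceed in "
              "advising my client.")                                      # Sect. 2.7
source = ("Please cite relevant case precedent and statutes that "
          "support your opinion.")                                        # Sect. 2.8

# Perspective clause first; generative and source clauses near the end.
prompt = " ".join([perspective, background, dependent, tonality,
                   generative, source])

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)   # the raw output (Sect. 2.11)
```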

3 Annotated input prompts

With prompts 1 and 2, the aim is to test the baseline nomenclature of background, dependent, and generative clauses while also testing the significance of adding a source clause to a prompt. The prompt was modeled on a fact pattern provided in a UCLA Law criminal law assignment to allow for a known set of parameters and relevant cases (Figs. 1, 2).

Fig. 1 Annotated Input Prompt for Criminal Law Scenario with Only Background, Dependent, and Generative Clauses

Fig. 2 Annotated Input Prompt for Criminal Law Scenario with Source Clause Added

With prompts 3 and 4, the aim is to test the significance of adding a perspective clause to a prompt by modeling it on a fact pattern provided in a UCLA Law torts exam to allow for a known set of parameters and relevant cases. The first prompt of the two provides a baseline with all of the previously validated clauses (Figs. 3, 4).

Fig. 3 Annotated Input Prompt for Products Liability Tort Scenario

Fig. 4 Annotated Input Prompt for Mass Torts Scenario with Perspective Clause Added

With prompts 5 and 6, the aim is to test the efficacy of adding the tonality clause to a prompt. The prompt’s fact pattern was modeled on a Legal Research and Writing fact pattern given at the UCLA School of Law to allow for a known set of parameters and relevant cases. The first prompt provides a baseline with all of the previously validated clauses (Figs. 5, 6).

Fig. 5 Annotated Input Prompt for Equal Pay Scenario

Fig. 6 Annotated Input Prompt for Equal Pay Scenario with a Tonality Clause Added

4 Discussion

4.1 Overview

Due to the novel nature of commercially accessible GLMs, no work has been done to standardize the input prompts within the legal litigation sphere. This paper aims to establish a standardized nomenclature that can be easily reproduced and followed to consistently obtain relevant and useful legal information from GLMs for litigational purposes. The methods set forth in this paper were utilized in a multitude of scenarios ranging from criminal law to tort law. These scenarios were chosen because of the broad nature of the legal field itself, but also to show that the proposed methods work regardless of the legal subfield. The input prompt scenarios were created based on known fact patterns from previous test or assignment questions given at the UCLA School of Law. The prompts themselves were then created following the proposed methods, each output was generated by ChatGPT, and then we analyzed the output against the known answers to the fact patterns to see whether the goal of the prompt was met with accuracy and whether each part of the nomenclature provided value.

4.2 Input prompt nomenclature

This paper is the first to propose the generalized structure of “variables” and “clauses” for standardizing legal input prompts to GLMs. All of the above scenarios show that each part of the nomenclature provides a specific benefit, and that combinations of specific variables and clauses can provide relevant legal information in various manners, with varying degrees of persuasiveness, while utilizing relevant, proper sources.

Each prompt started with a large set of background clauses, peppered with a multitude of background variables. More research is needed to determine whether there is an optimal number or set of background variables, but the current scenarios point in favor of including as many background variables as possible so that the GLM can provide less generic legal advice. After the background clauses, all of the above scenarios contain a dependent clause with dependent variables. This mimics real-life scenarios in which some preliminary event has occurred, such as a criminal charge, or the user wants to take some action as a result of the background variables, such as pursuing a claim. Each prompt also included the generative clause as described above. The very flexible nature of GLMs allows these clauses to be written in a variety of ways. With many of the main words used in the generative clauses, like “outline,” “analyze,” and “overview,” leading to similar results, we predict that this portion of the nomenclature will need the least fine-tuning. However, further research can still be done to find the nuances in output when using slightly different generative clauses.

We then implemented a step-wise, iterative testing approach, first adding in the Source Clause, then the Perspective Clause, and lastly the Tonality Clause, verifying at each step that the new clause added more accurate and relevant information to the output. Source clauses do not seem to need much precision, as any mention of “case law,” “sources,” “precedent,” or “statutes” leads to an adequate listing or weaving of sources. If more sources are needed, the source clause can be made more explicit, or a quota can be set by asking for a specific number of sources. Although these clauses are relatively flexible, we urge future research to explore them in more depth to build a better understanding of the effects of the specific words used.
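A minimal sketch of this step-wise approach is given below, reusing the illustrative clause strings from the sketch in Sect. 2.11; the ask() helper is a hypothetical stand-in for any GLM client, and outputs were judged by hand against the known answers, so no automated scoring is shown.

```python
def ask(prompt: str) -> str:
    """Hypothetical stand-in for a GLM call; wire up any client here."""
    raise NotImplementedError

# Required base components (Sects. 2.5-2.7).
baseline = " ".join([background, dependent, generative])

# Optional clauses, added one at a time in the order we tested them.
additions = [("source", source),            # Sect. 2.8
             ("perspective", perspective),  # Sect. 2.9
             ("tonality", tonality)]        # Sect. 2.10

results = {"baseline": ask(baseline)}
prompt = baseline
for name, clause in additions:
    # The perspective clause belongs at the start; the rest are appended.
    prompt = f"{clause} {prompt}" if name == "perspective" else f"{prompt} {clause}"
    results[name] = ask(prompt)  # each output reviewed against the known answer
```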

4.3 Scenarios 1 and 2

The first way we wanted to test our nomenclature was through the inclusion of a source clause in our input. In analyzing a legal fact pattern to make coherent legal arguments, it is imperative to consult case precedent and statutes. Consequently, for AI to become a valuable tool in the legal industry, it is equally important for it to be able to cite to precedent when making its arguments.

To test how well ChatGPT would be able to cite to authorities in its outputs, we started with an input that did not include a source clause to serve as a control. We used a short criminal law fact pattern derived from practice assignments given at the UCLA School of Law, in order to test not only the accuracy of the model’s answer against a known solution, but also the difference in answer quality between including and excluding the source clause. We expected the input with the source clause to produce a higher-quality response, because the model is prompted to ground the accuracy of its arguments in authority rather than producing an output with no such reliance. Additionally, we made our inputs as neutral as possible by excluding any words like “my client.” This was done to avoid introducing bias, to elicit an objective legal analysis, and to avoid any potential overlap with the perspective clause testing.

Further, there are certain components we would expect to see in the model’s response based on the fact pattern and corresponding cases given in UCLA School of Law criminal law courses. First, given that this prompt deals with the killing of a child, there should be some discussion of murder and manslaughter and the differences between the two crimes. There should be some discussion of the defendant’s mental state for each offense and of how the defendant’s actions may or may not be construed to constitute a higher offense. We would also expect the model to connect the facts provided in the input to specific elements of each crime to determine whether the facts support a given criminal offense.

In terms of the substance of the response to the control prompt (see Table 1), ChatGPT does a decent job in its analysis. It correctly identifies a couple of elements of first-degree murder, felony murder, and manslaughter but, as expected, fails to cite any case or other form of authority. In terms of accuracy, the response does a good job: the information is correct but lacks comprehensiveness. First, it identifies the correct legal concepts implicated in the fact pattern. By providing sections on first-degree murder, manslaughter, and felony murder, it identified the possible crimes in which the hypothetical defendant would be implicated. It notes that premeditation and deliberation are typically required for a first-degree murder charge, which is true. It also correctly explains that here there is clearly no intent to kill, but establishes that a felony murder theory of liability would be enough to sustain a first-degree murder charge, since the car theft could constitute a felony. In recognizing the importance of the facts surrounding the car theft, the model’s analysis is stellar.

Table 1 CIO for Scenario 1 (see Fig. 1)

Additionally, the model recognizes that if the prosecution fails on a murder theory of the crime, it would move on to manslaughter, specifically vehicular or involuntary manslaughter. It fails, however, to provide an adequate definition of either of these crimes, a weakness that recurs throughout the model’s response.

The model also does a good job of examining how the defense could rebut or mitigate the charges. The only weakness in this section of the output is suggesting duress as a possible defense here. None of the facts given to the model would suggest that the defendant could assert a duress defense, and the model also fails to define what duress is.

A high point of the model’s response, however, is the inclusion of the defendant’s possible civil liabilities. While we intended the model to focus on the possible criminal liability of the defendant, which it does, it was surprising that the model left space to discuss how a wrongful death suit might also be brought, a suit with a lower burden of proof than a criminal case. This is definitely something a defendant would want to know, and it indicates that the model likely leaned on the part of our input instructing it to analyze the “scope of the defendant’s liability.” Here, the model clearly does this by looking beyond the defendant’s criminal liability.

Next, we wanted to test how the model would do with the inclusion of our source clause. As seen in Fig. 2, we did this by requesting that the model “cite multiple specific relevant murder and manslaughter cases, statutes, and regulations.” As expected, the model complied with this request and provided a new and improved answer (see Table 2).

Table 2 CIO for Scenario 2 (see Fig. 2)

The addition of the source clause prompted the model to include seven sources of authority, which in turn appeared to lead to more comprehensive definitions of the crimes implicated here. For example, in our second criminal law output, the model provides additional information, along with authority, in its discussion of first-degree murder. It also applies the facts of our hypothetical situation by weaving them into its discussion, showing how certain facts match what is required for first-degree murder. Specifically, when discussing first-degree murder and the felony murder rule, the model not only provides definitions for these crimes but also notes, in a parenthetical, how auto theft could be a felony, thus showing how this crime fits the fact pattern.

Next, the model provides definitions for both vehicular and involuntary manslaughter, correctly identifying them as potential crimes for which the man could be liable. This second output, in contrast to the first, provides statutory support for its definitions, highlighting a success of the source clause. It then cites four cases: People v. Sanchez, People v. Knoller, People v. Watson, and People v. Rios. Sanchez, Knoller, and Rios are all relevant to this case, with each touching on either the mental states required for manslaughter and murder or the type of conduct necessary to fulfill the elements of each crime. One downside of the model’s response is the inclusion of People v. Watson. While on the surface the case seems similar to the others in that it deals with a car accident leading to a person’s death, People v. Watson primarily concerns a driver who was under the influence of alcohol. The fact pattern we gave the model in no way suggests that the hypothetical defendant was impaired, so the inclusion of this irrelevant case is one flaw in the response.

In its “Comparative Analysis” section, the model does an excellent job of identifying key factors in the fact pattern that will determine the hypothetical defendant’s liability. The speed of the driver and the circumstances of the police chase are both relevant in determining the crime for which the defendant could eventually be found liable. Additionally, the model makes an excellent point by explaining that a first-degree murder charge is unlikely unless the underlying car theft is successfully proven to be a felony. It correctly identifies that the defendant had no premeditation or deliberation when he killed the child.

Overall, the inclusion of the source clause not only led the model to provide authority for its analysis but also led it to provide a better response overall. It is interesting to note that the model did not touch on a second-degree murder charge; however, a second-degree murder charge is less likely to be sustained here. A more comprehensive analysis of the defendant’s potential liability would likely have included a discussion of second-degree murder and why the defendant’s situation does not fit it. That said, the model does provide a decent starting point. The inclusion of our source clause clearly prompted the model to add cases, jury instructions, and statutes to its response, producing a better output than our original input allowed. Thus, new attorneys should include a source clause in their inputs when consulting generative language models.

4.4 Scenarios 3 and 4

Tort law remains a cornerstone of the American legal system, particularly in the realm of personal injury law. This sector stands out for its accessibility and widespread application. In our study, illustrated in Figs. 3 and 4, we aimed to evaluate ChatGPT’s proficiency in dissecting legal scenarios and articulating coherent legal arguments to a non-specialist audience, leveraging relevant case precedents.

Prompts 3 and 4 were designed to test ChatGPT’s effectiveness in constructing arguments for specific legal issues emerging from the presented fact pattern, and to assess how the inclusion of a perspective clause influences the model’s output. These prompts involved a detailed scenario, specifying the city and state for jurisdictional context, a comprehensive description of the facts to narrow the scope of potential legal arguments, and a precise demand for authoritative support for those arguments. Prompt 4 introduced an additional element by assigning the GLM the role of a legal representative, constituting the perspective clause.

ChatGPT’s responses to these prompts demonstrate its adeptness at analyzing legal cases and identifying key issues within a given fact pattern. In both scenarios, the model accurately identified major concerns, such as the failure to warn about a chocolate bar's potential risks to children and the manufacturer’s possible negligence. The GLM then furnished relevant and issue-specific arguments (see Tables 3, 4).

Table 3 CIO for Scenario 3 (see Fig. 3)
Table 4 CIO for Scenario 4 (see Fig. 4)

Additionally, ChatGPT cited pertinent case law to bolster its arguments. Notably, it referenced cases like Greenman v. Yuba Power Products, Barker v. Lull Engineering, and Anderson v. Owens-Corning Fiberglas Corp. Two of these cases, integral to a UCLA Law Torts course, were correctly identified by the model, along with other significant cases mandated as authoritative in California.

Furthermore, in both outputs, ChatGPT correctly identified the possibility of a comparative negligence defense by the chocolate bar producer. This is important for the plaintiff to consider and prepare for, especially because it may greatly reduce the amount of damages they can recover.

Regarding the effectiveness of the perspective clause, a discernible improvement was evident in the second output (see Table 4), which included this clause. The initial paragraph of this output categorizes California’s products liability cases, pinpointing the applicable category for the fact pattern and elucidating the rationale. This insight is invaluable, as it gives users a clearer understanding of the relevant legal landscape in California. Moreover, the second output excelled in simulating the role of a client’s attorney, offering guidance on trial preparation. The response was more impressive than the first, particularly in addressing critical considerations for a client assessing the viability of a cause of action, such as the ability to gather essential evidence, a point that a Tort Law professor at UCLA Law emphasized as critically important. The model suggested gathering pivotal evidence like the chocolate bar wrapper and the child’s medical records, which are vital for the upcoming litigation; this aspect was completely ignored by the first output, which was not prompted using a perspective clause. Lastly, ChatGPT demonstrated a more nuanced and focused approach in its response to the second prompt. Specifically, its analysis in the negligence section was more precisely aligned with products liability issues rather than a broader, generalized negligence context. This refined focus is indicative of the perspective clause’s strength in producing more focused legal analysis, more in line with the analysis performed by a UCLA Law professor when presented with a similar fact pattern. ChatGPT’s mention of the attractive nuisance doctrine highlights the model’s ability to explore a wide range of legal concepts, even though this doctrine was not mentioned by a UCLA Law professor analyzing the same fact pattern. This aspect, while showcasing the breadth of ChatGPT’s legal knowledge, also underscores the importance of contextual relevance. The effectiveness of the GLM’s legal reasoning can be significantly enhanced by incorporating a perspective clause, as this experiment clearly demonstrates.

While ChatGPT's responses may not meet the exacting standards required for legal briefs or judicial proceedings, this experiment underscores the utility of ChatGPT and the structured framework proposed in this study. These tools emerge as valuable assets for legal professionals seeking inspiration and case law references, as well as for clients desiring insights into potential legal discussions and strategies their attorneys might employ. The prompts also highlight the benefits of incorporating a perspective clause in legal prompting, markedly enhancing ChatGPT's output quality.

4.5 Scenarios 5 and 6

Employment discrimination based on sex remains extremely prevalent in the workplace, with around 42% of women reporting that they have experienced some form of gender-based discrimination [21]. A discrimination case of this kind usually falls under the Equal Pay Act of 1963 (EPA) and requires that a plaintiff show that an employer pays different wages to employees of opposite sexes for equal work on jobs the performance of which requires equal skill, effort, and responsibility, and which are performed under similar working conditions [22].

This prompt was modeled on a UCLA School of Law Legal Research and Writing memorandum assignment, with cases and analysis provided jointly by UCLA School of Law faculty. This scenario was chosen to assess how well the GLM performs when given a tonality clause, providing a proof of concept for the clause. The scenario was purposely tailored so that, objectively, the EPA claim would likely swing in favor of the male chef. We then introduced a tonality clause asking the model to analyze the situation in favor of the female chef, allowing us to properly assess how useful the tonality clause will be when analyzing fact patterns with unknown outcomes.

We expected that the fact pattern laid out in Scenario 5 (see Fig. 5) would lead to an analysis that was either neutral or slightly against the female client. This proved to be the case: the output from ChatGPT indicated there may be a claim, but that the additional responsibilities and duties of the male chef might be hard to argue against (see Table 5). The GLM also correctly included both Corning Glass Works and Rizo v. Yovino, both verified by UCLA Law professors as relevant to this scenario, although Rizo has been challenged and attorneys would not want to rely on certain parts of that case. The GLM likewise did a good job of summarizing the relevant Equal Pay Act issues, correctly pinpointing the central question of whether the jobs are “substantially equal,” which UCLA Law professors confirmed as the main issue presented. The GLM also correctly noticed that the facts were indeed stacked in favor of the male chef, outputting that “The male chef's responsibility of generating profitable ideas, however, seems to be a significant additional duty. Your client's artistic contribution, while valuable, might not equate to the revenue-generating ideas in terms of business impact.”

Table 5 CIO for Scenario 5 (see Fig. 5)

On the other hand, when the tonality clause was introduced and the model was told to analyze the situation in favor of the female chef, the model overall performed well in understanding this nuance (see Table 6). The GLM still output both Corning Glass Works and Rizo v. Yovino. However, the model additionally produced lines of reasoning that truly favored the female chef, such as: “Your client’s diverse experience, including roles at California Pizza Kitchen and as an Executive Chef, is comparable to her male counterpart's experience. This needs to be highlighted…” and, instead of “Address the Revenue Generation” from the previous output, the model recommends to “Downplay the Revenue Generation.” These key differences show the value and efficacy of including the tonality clause as part of the prompting nomenclature to probe fact patterns and real-life scenarios from multiple angles, creating a wide breadth of arguments.

Table 6 CIO for Scenario 6 (see Fig. 6)

Given all of the above results, our default prompting recommendation is to include background, dependent, generative, source, and perspective variables/clauses, which appears to provide the best quality of legal output. When the situation calls for a deliberately one-sided opinion, the tonality clause has been shown to provide effective analysis from varying subjective angles.
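As a compact expression of this default recommendation, the hypothetical template below orders the five recommended components as used throughout our prompts; a tonality clause can be appended when a subjective angle is required. All strings are condensed illustrations, not the verbatim prompts from our scenarios.

```python
DEFAULT_TEMPLATE = "{perspective} {background} {dependent} {generative} {source}"

prompt = DEFAULT_TEMPLATE.format(
    perspective="You are an employment discrimination lawyer.",
    background="My client is a female chef paid less than a male chef "
               "doing substantially equal work in California.",
    dependent="She wants to bring a claim under the Equal Pay Act of 1963.",
    generative="Outline the main arguments that could be made regarding "
               "this scenario in an objective manner.",
    source="Cite relevant case precedent and statutes that support your analysis.",
)
```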

5 Conclusion

The advent of readily available commercial AI has led to much exploration of GLM usage in many spaces, including the legal field, and sufficiently ensuring that legal results are both accurate and standardized requires a standard set of clauses and nomenclature for prompting GLMs. This paper creates one of the first (if not the first) instances of such a nomenclature, providing a beneficial and robust framework that allows for better legal-focused results. The nomenclature was applied in a multitude of legal scenarios, demonstrating that the framework has vast potential in legal applications from both client and attorney perspectives. Additionally, the proposed framework will act as a solid base for future researchers to expand upon. Throughout this paper, we recommend several avenues of future research into prompting GLMs such as ChatGPT in the legal field, which will help make these tools more widespread and beneficial to society.