1 Introduction

In 2021, the European Union (EU) published its first draft of the Artificial Intelligence (AI) Act. Among its proposals, this comprehensive regulation creates a risk classification framework that groups AI systems into four categories: unacceptable, high risk, limited risk, and low risk. Acknowledging the changing nature of technology, authorities incorporated a provision to continuously assess the risk classification of systems. In making these determinations, the EU is instructed to consider “the intended purpose of the AI system” (Council of the European Union, 2021a). This provision raises a critical issue: AI systems may escape or evade the Act’s safeguards because the mapping between who develops and deploys them, the functions or tasks they perform, and the purpose(s) they serve as a product can be complex. Hence, discussion of technologies lacking an intended purpose has crystallized into a debate around the meaning of the term general purpose AI system (GPAIS).

As it stands today, EU policymakers have proactively proposed a characterization of GPAIS in the AI Act. The Slovenian EU presidency defined this technology as an “AI system… able to perform generally applicable functions such as image/speech recognition, audio/video generation, pattern detection, question answering, translation, etc.” (Council of the European Union, 2021b). The French EU presidency further emphasizes that GPAIS “may be used in a plurality of contexts and be integrated in a plurality of other AI systems” (Council of the European Union, 2022a). Updated versions of the proposal discuss the role of actors in GPAIS’s development and value chain (Council of the European Union, 2022b). We believe that the AI Act’s current conception of the technology leaves considerable room for improvement because it is overinclusive, capturing a variety of systems that could be considered fixed purpose.

How the discussion on the definition of GPAIS unfolds is crucial to the governance of AI in the EU and serves as a signal for global policymaking. To complement the debate, this piece is written with three objectives in mind: first, to highlight the variance and ambiguity in the interpretation of GPAIS in the literature; second, to examine the dimensions of generality of purpose available to define GPAIS; and lastly, to propose a functional definition that facilitates this technology’s governance. Our intention with this piece is to offer policymakers an alternative perspective on GPAIS that improves the hard and soft law efforts to mitigate these systems’ risks and protect the well-being and future of constituencies in the EU and globally.

2 GPAIS: A Definitional Morass

The AI Act has created a need for a definition of GPAIS that distinguishes between fixed and general purpose systems, where no consensus on this matter exists in the academic or grey literature (Feigenbaum, 1963; Hernández-Orallo, 2017; Kagiyama et al., 2019; Nilsson, 1983; Russell, 2021; Strelkova, 2017). Prior to its adoption by the EU, we identify few instances in which AI systems are referred to as GPAIS. Where the term does appear, it describes AI systems that vary considerably in terms of autonomy, modality, and training methods.

At one end of the spectrum, GPAIS are conceived as possessing abilities comparable to or greater than those of humans. Scholars have characterized these technologies as being able to adapt to unknown environments, make decisions with limited resources, and function in complex domains that are currently reserved for humans due to the need for contextualization (Wang, 2004; Bieger et al., 2016; Nilsson, 2005; Meltzer, 2018; Szegedy, 2020). This conception of GPAIS resembles the original notion of the general problem solver (or the oft-employed but also imprecisely defined artificial general intelligence), whereby human aptitudes such as creativity, dexterity, and ingenuity manifest in a machine system (Newell et al., 1959).

At the other end of the spectrum, AI systems with limited functionality are occasionally referred to as GPAIS (Bennett & Hauser, 2013). For instance, a medical application in the form of a disease-agnostic simulation able to predict “the consequences of various treatment or policy choices” is described as a GPAIS. Similarly, others use the term for a technology that can play different games, a language model that generates strings of text, or a text-processing tool that analyzes patent data to assist government examiners with a variety of duties (Alderucci & Sicker, 2019; OECD, 2022; Somers, 2018).

3 Perspectives on Generality of Purpose

Identifying the technologies that fit under the GPAIS umbrella requires agreement on the meaning of the acronym’s first two initials. We begin by noting that there are several dimensions in which generality of purpose can be construed, corresponding to different options for the EU to consider as most appropriate for its constituency and regulatory objectives. In this section, we focus on four alternatives: ability, domain, task, and output (Hernández-Orallo et al., 2021). Importantly, we also emphasize that the generality of a system is distinct from its capability. Capability points to a system’s competence, accuracy, or effectiveness; generality expresses how broadly and evenly that capability is distributed.

Generality of purpose could center on a system’s abilities, otherwise described as the categories of actions or processes it can perform. The literature has identified several fields by which to organize abilities, such as language, vision, robotics, interaction, understanding, reasoning, and search (Bommasani et al., 2021; Hernández-Orallo, 2017). In fact, the current proposal for a definition by the EU appears to relate the term “functions” to what is often referred to as ability. Some existing systems have one core ability (such as image recognition); others (e.g., Dall-E and Open Diffusion) combine several, such as text processing plus image generation, and could be considered more general in this dimension.

Domain (also known as application or economic context space in the literature) is another option for contextualizing general purpose (Bommasani et al., 2021; OECD, 2022). It refers to the sector(s) of the economy where a technology provides its solutions (e.g., healthcare, education, defense, and infrastructure). This concept is a proxy for a system’s competence in fulfilling the needs of stakeholders in a particular sector. If we take education as an example, this entails satisfying, to a degree, the needs of teachers, parents, administrators, and students, among others. AI systems serving multiple domains would be thought of as more general than those that serve one domain or a portion of one.

A task is the specification of a distinct type of problem and the definition of the goals or actions a system must achieve (Hernández-Orallo, 2017; OECD, 2022; Thórisson et al., 2016). Tasks relate to but are distinct from abilities. Abilities represent the overarching range of actions, goals, or problems that a system can address, while a task denotes a particular, concrete combination of these elements. For example, most large language models have the ability to process and produce text. But “processing text” is not a well-specified task. Rather, tasks would include summarizing text, labeling data, and answering classes of questions. An AI system that is more general in this dimension would perform a wider variety of distinct tasks.

Lastly, there is output. Output is the final link in the AI chain and identifies the results of a task (Ganguli et al., 2022; OECD, 2022). Examples include a unique essay or image emanating from a single-sentence prompt, the set of actions taken in playing a game or driving a car, or the validation of an individual’s identity via facial recognition. An AI system’s generality within this dimension can be gauged by the variety of distinct outputs it produces.

These distinct dimensions of generality would classify AI systems in different ways. For example, an image classification system would be task-narrow, ability-narrow, and probably output-narrow, but it could be domain-general if incorporated into diverse applications. Meanwhile, a complex task (like autonomous driving) could require many diverse abilities. In terms of capability versus generality, one could have a highly capable but narrow system, like AlphaGo, or a very general system with low levels of capability, making it mediocre at the many things it does.
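To make these dimensions concrete, the following minimal sketch (a hypothetical illustration in Python, not drawn from the AI Act or any cited framework) records a system’s distinct entries along each dimension and counts them. The image-classifier profile mirrors the example above; larger counts only suggest broader generality in a dimension and say nothing about capability.

```python
from dataclasses import dataclass, field

@dataclass
class SystemProfile:
    """Hypothetical record of an AI system's entries along the four
    candidate dimensions of generality discussed above."""
    name: str
    abilities: set[str] = field(default_factory=set)     # e.g., {"vision"}
    domains: set[str] = field(default_factory=set)       # e.g., {"healthcare"}
    tasks: set[str] = field(default_factory=set)         # e.g., {"classify images"}
    output_types: set[str] = field(default_factory=set)  # e.g., {"class label"}

def generality_report(profile: SystemProfile) -> dict[str, int]:
    """Counts the distinct entries in each dimension. Larger counts suggest
    broader generality in that dimension; they say nothing about how well
    the system performs (its capability)."""
    return {
        "ability": len(profile.abilities),
        "domain": len(profile.domains),
        "task": len(profile.tasks),
        "output": len(profile.output_types),
    }

# Example from the text: an image classifier is ability-, task-, and
# output-narrow, yet domain-general once embedded in many applications.
classifier = SystemProfile(
    name="image classifier",
    abilities={"vision"},
    domains={"healthcare", "retail", "agriculture"},
    tasks={"classify images"},
    output_types={"class label"},
)
print(generality_report(classifier))
# {'ability': 1, 'domain': 3, 'task': 1, 'output': 1}
```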

4 Proposing a Definition for GPAIS

Our approach to discerning GPAIS is to propose a task-based definition. In essence, a task is the building block of what AI systems achieve. Tasks can be scoped to a single ability and domain or span several. In addition, regardless of how unique an AI system’s outputs are, its tasks remain a factor that can consistently be used to measure and compare the activities of different systems.

It is relevant to recognize that “tasks” stands in contrast to the term employed by the EU in its proposal, “functions.” The terms “function” or “functions” appear 12 times throughout the EU’s proposal for an AI Act, and their meaning ranges from a system working as intended, to the objective of a system, to the role of an individual, among others (Council of the European Union, 2022b). We believe that selecting “tasks” increases the specificity and clarity of a definition because it identifies a particular element of what an AI system achieves. By contrast, “functions,” as used by the EU, is not a word generally used to denote either a task or an ability. Hence, as it stands, the proposed EU definition encompasses systems that are considered fixed and general purpose alike, which defeats the purpose of defining GPAIS in the first place.

To contextualize our proposal, we divide AI systems into two groups. First, there are fixed-purpose systems. As their name suggests, they are created with a specific objective and can only accomplish tasks they are trained to perform. For instance, a fixed-purpose translation AI system is limited to translating text. Although systems may be coupled or combined with others to complete a higher number of tasks, if each of them is limited to performing the task it was originally trained for, the combination still qualifies as fixed-purpose.

GPAIS can also be trained to complete specific tasks. However, these systems differ from their fixed-purpose counterparts because they can perform tasks that they were not originally trained for. This capacity stems from a combination of factors, such as the quantity of training data and the structure of the model. For instance, even though GPT-3 was trained to predict the next word in a string of text, it has been fine-tuned to support new tasks such as translation or coding. Considering the factors above, we propose the following definition of GPAIS:

An AI system that can accomplish or be adapted to accomplish a range of distinct tasks, including some for which it was not intentionally and specifically trained.

This definition highlights the distinctive feature of a GPAIS, which is its ability to complete tasks outside of those it is specifically trained for. Thus, it includes unimodal (e.g., GPT-3 and BLOOM) and multimodal (e.g., Stable Diffusion, GPT-4, and Dall-E) systems. It contains systems at different points of the autonomy spectrum, with and without humans in the loop. GPAIS also differ in terms of their learning potential. While GPT-3 does not retain information from each interactive session, MuZero starts with complete ignorance about its virtual surroundings and then continuously learns about them. The term GPAIS also encompasses systems with “emergent” abilities, meaning new and surprising abilities that manifest at threshold levels of model parameter count and/or training computation (Wei et al., 2022). Lastly, GPAIS are trained through different methods. Whereas Gato uses supervised learning, MuZero is based on reinforcement learning (Reed et al., 2022; Schrittwieser et al., 2020). Intuitively, the proposed definition can be read as follows: in fixed-purpose AI systems, we choose a set of tasks and then train a system to do those particular tasks; in a GPAIS, we train the system and then choose tasks for it to do (perhaps with additional fine-tuning).
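Read procedurally, that intuition might be sketched as follows. This is a purely illustrative outline in Python under our own assumptions; every name below (collect_data, train, pretrain, adapt, and so on) is a placeholder for exposition and does not correspond to any real library API.

```python
from typing import Callable, Iterable

def build_fixed_purpose_system(
    tasks: list[str],
    collect_data: Callable[[str], list],
    train: Callable[[list], object],
) -> dict:
    """Fixed purpose: choose the tasks first, gather task-specific data,
    and train a model for those tasks only."""
    data = [example for task in tasks for example in collect_data(task)]
    model = train(data)
    return {"model": model, "supported_tasks": set(tasks)}

def build_gpais(
    broad_corpus: Iterable,
    pretrain: Callable[[Iterable], object],
    adapt: Callable[[object, str], object],
) -> Callable[[str], object]:
    """General purpose: train the model first on a broad objective (e.g.,
    next-word prediction); tasks are chosen afterwards and served through
    prompting or fine-tuning, including tasks absent from training."""
    model = pretrain(broad_corpus)

    def perform(task: str) -> object:
        return adapt(model, task)  # the task is selected after training
    return perform
```

The point of the contrast is the ordering: in the first workflow the supported tasks are fixed before training, whereas in the second the task is supplied only after training.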

Our proposed definition for GPAIS excludes a variety of systems that are designed and trained to complete concrete tasks and objectives. Examples include image classification and recognition systems, among others. We consider such technologies fixed-purpose systems that can be incorporated into a variety of other systems (and/or used in different domains) but are not themselves GPAIS under our definition. Likewise, a product such as a voice assistant that incorporates a number of fixed-purpose technologies would be excluded from the definition as long as each task in its repertoire corresponds to a model trained specifically for that task and the product cannot perform distinct tasks beyond them.

As technology continues to progress, AI systems will evolve with it. Efforts in research and development that began in the 1950s led to the creation of fixed-purpose systems. These are now followed by GPAIS, and in the future we can expect advances in autonomy, capability, and generality that supersede any current form of the technology. Regardless of the name such systems are given, whether general or strong AI, our proposed definition is agnostic to these breakthroughs and includes them as a subset.

Our definition is largely qualitative, though one can consider a system to be “more” of a GPAIS to the degree that it performs a wider variety of tasks, fewer of which it was specifically and originally trained for. For some purposes, it could be useful to consider quantitative measures that evaluate the degree of a system’s generality and capability by assessing its performance according to evaluation metrics across a wide range of tasks (Council of the European Union, 2021b; Hernández-Orallo et al., 2021). The completion of such work requires significant further research.
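As one purely hypothetical illustration of what such a quantitative treatment could look like (our own sketch, not a measure proposed in the cited works), one might normalize per-task evaluation scores, take the peak score as a crude proxy for capability, and take the evenness of scores across tasks, here measured with normalized entropy, as a proxy for generality:

```python
import math

def capability(scores: dict[str, float]) -> float:
    """Peak normalized score (in [0, 1]) across the evaluated tasks: a crude
    proxy for how competent the system is at what it does best."""
    return max(scores.values())

def generality(scores: dict[str, float]) -> float:
    """Normalized entropy of the score distribution: near 1.0 when competence
    is spread evenly over many tasks, 0.0 when it is concentrated in one."""
    total = sum(scores.values())
    if total == 0 or len(scores) < 2:
        return 0.0
    probs = [s / total for s in scores.values() if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(scores))

# Toy illustration of the distinction drawn in Section 3: a highly capable
# but narrow system versus a mediocre but more general one.
narrow = {"go": 0.99, "translation": 0.0, "summarization": 0.0}
general = {"go": 0.45, "translation": 0.50, "summarization": 0.40}
print(capability(narrow), round(generality(narrow), 2))    # 0.99 0.0
print(capability(general), round(generality(general), 2))  # 0.5 1.0
```

Any workable metric would, of course, require a principled choice of task suite and score normalization, which is precisely the further research flagged above.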

5 Implications of Defining GPAIS

No consensus definition of GPAIS is available, and this has important repercussions for the governance of AI. As society attempts to promote safe and trustworthy AI systems, it needs clear guidance for discriminating between fixed-purpose systems and GPAIS, and for identifying their potential beneficial and harmful uses.

The EU is a pioneer in AI law. While the AI Act is a groundbreaking piece of proposed legislation, one of its weaknesses is its focus on the purpose of AI systems (Boine, 2022). This approach works effectively with fixed-purpose systems because they are designed with a defined set of tasks in mind, making it possible for policymakers to pinpoint the border between a low-risk and a high-risk system.

Conversely, GPAIS do not necessarily have an intended purpose. They can engage in tasks that are outside the scope of their initial development. This means that delimiting a risk profile via the sector-based approach envisioned by Annex III of the AI Act draft is less effective. Moreover, their ability to function across market sectors means that a flaw in a GPAIS may even have a systemic negative effect on the economy. This reality affects all parties (consumers, developers, and downstream firms that incorporate the technology into their offerings, among others) that engage with and push the limits of AI systems. Therefore, EU regulators should pursue a regulatory framework that ensures the safety of GPAIS design, development, and deployment.

The definition proposed in this piece represents a first step. Through it, policymakers within and outside the EU are better equipped to classify systems placed on their markets. The changing nature of technology will merit continuous updates to our understanding of GPAIS. In fact, members of the European Parliament have proposed the creation of a sub-group in the Artificial Intelligence Board to deal with questions of this nature (Benifei & Tudorache, 2022). Overall, we hope that this piece contributes to improving the predictability of AI governance. A lack of action in addressing the regulatory gap that can surface due to the EU’s emphasis on a system’s “intended purpose” may catalyze important long-term risks that the region and the rest of the world should proactively avoid.