1 Introduction

Since the European Commission tabled the proposal for the Artificial Intelligence Act (AI Act) in April 2021 (European Commission 2021b), important market and technological developments in the field of AI have taken place, including the emergence of ChatGPT (OpenAI 2022) and other similar tools, showcasing the general purpose capabilities of Large Language Models. Those developments have attracted considerable public attention, prompting EU policy makers to specifically consider them in the course of the ongoing negotiations of the AI Act (Helberger and Diakopoulos 2023; Hacker et al. 2023; Boine and Rolnick 2023).

In the European Union legal order, which is distinct from the legal order of the Member States belonging to it, most laws are the result of a process in which three institutions play an essential role. The European Commission, the executive, has the exclusive power to initiate legislation by preparing and publishing a draft law. This is sent in parallel to the European Parliament, directly elected by the EU citizens, and the Council, representing the governments of all 27 Member States. The European Parliament and the Council are known as the ’co-legislators’ insofar as they have the power to adopt the final law by agreeing on the same text. Each of them follows internal procedures to reach its own position (i.e. a set of amendments to the Commission proposal). In order to reach a compromise in a more effective manner, the European Parliament, the Council and the Commission have developed the practice of holding "trialogue" meetings between their representatives, who receive a mandate to negotiate a political deal on the basis of their institution's starting position. Any political deal reached in trialogues must be articulated into a fully fledged legal text, which has to be endorsed by the co-legislators. In trialogues the Commission does not have a formal decision-making role, but remains an influential player by providing technical support to the other two institutions (European Parliamentary Research Service 2021).

The positions expressed by the co-legislators, notably the General Approach of the Council (December 2022)Footnote 1 and the text adopted by the European Parliament (June 2023),Footnote 2 introduce new terms and concepts into the AI Act, notably general purpose AI system, foundation model and generative AI, and accordingly foresee dedicated legal provisions. These terms and concepts add to the already rich and varied ecosystem of specialised jargon used in the research community (Estevez Almenzar et al. 2022; Smuha 2021).

Terminological choices in law are essential and have far-reaching consequences for the practical impact of the rules being introduced, especially in areas permeated by a high degree of interdisciplinarity like AI (Koivisto et al. 2024; Schuett 2023).

The objective of this paper is to present and explain those terms and concepts, considering both a technical and a legal perspective, with a view to facilitating a correct understanding thereof. To offer useful context for the reader, we also propose a section on the term AI system, already included in the Commission proposal.

The considerations contained in this paper are based solely on the texts adopted separately by the three EU institutions before the final political agreement reached on December 9, 2023. We do not comment on the final text of the AI Act, which is still to be published at the time of writing of this paper. Nonetheless, we believe that the analysis we offer provides relevant background and tools also for a correct understanding of the adopted law, which results from negotiations between the co-legislators on the basis of their own positions.

2 AI system

One of the most consequential concepts in the AI Act is the definition of AI system. This determines which systems fall within the scope of the regulation and may therefore be subject to the requirements and obligations established therein depending on their risk level.

Table 1 contains the several versions of the definition of AI system discussed herein. Although they are certainly important elements to consider, in the interest of brevity we do not analyse the recitals (of the AI Act) or other explanatory text (of the OECD) accompanying the definitions in the strict sense. In EU law, recitals can provide additional interpretative context in case of ambiguity in the operative provisions, but are not binding as such.

The AI Act, which follows the logic of the vast EU product legislation acquis called the ’New Legislative Framework’, considers an AI system as a product, as discussed by Mazzini and Scalzo (2023). In line with other product legislation, the intended purpose of the AI system (i.e., the use for which an AI system is intended by the provider, including the specific context and conditions of use: Art. 3(12) Commission proposal) plays an essential role, including, among other things, in its classification according to risk levels.

The Commission’s intention was to propose a definition of AI system that would exclude conventional software as this does not present the characteristics or challenges typical of AI systems, as discussed in the Commission’s impact assessment (European Commission 2021a). On the other hand, the definition was intended to be inclusive enough to cover not only machine learning, but also other AI techniques and approaches such as symbolic ones. In addition, the definition was intended to be capable of adapting to new developments in the field. Because the definition of AI system in the AI Act should serve a regulatory objective and therefore provide legal certainty to operators, other definitions from the scientific literature or other non-binding policy documents would as a matter of principle not be fit for purpose, as discussed by Samoili et al. (2021).

In order to ensure an alignment of the EU legislative initiative with concepts emerging in international fora, the Commission took inspiration from the definition of AI system proposed by the OECD in its 2019 Recommendation of the Council on Artificial Intelligence (OECD 2019). However, because the OECD definition was not designed for regulatory purposes, some refinements were made in the AI Act proposal.

First, the Commission referred to “software” as opposed to “machine-based system” in order to align with already existing references to “software” in other EU legislative acts (e.g. the Medical Devices Regulation regulates "software" as such - which can itself be or can include an AI system - as a medical device).Footnote 3 Second, it highlighted that an AI system “can, for a given set of human-defined objectives, generate outputs” and made it also clear that those outputs can include “content”. Third, the Commission added that the software should be developed “with one or more of the techniques and approaches listed in Annex I”. In order to ensure legal certainty, Annex I listed a set of techniques used in machine learning, logic- and knowledge-based approaches and statistical modelling, which could be updated via delegated acts.

Although there are differences in their choices, both the Council and the Parliament aimed in general to keep the definition of AI system as aligned as possible to the OECD wording.

As regards the Council, it deleted Annex I with the list of techniques and incorporated the concepts that Annex I was meant to clarify and exemplify into the definition itself, with the sentence “infers how to achieve a given set of objectives using machine learning and/or logic- and knowledge-based approaches”. In addition, the Council replaced the term “software” with “system”, added the concept of “elements of autonomy” and emphasised the relevance of data and inputs.

Like the Council, the Parliament also deleted Annex I, but overall departed much less from the wording proposed by the OECD. Among other changes, the Parliament deleted the word “content” proposed by the Commission, replaced the term “software” with “machine-based system”, and included the reference to “varying level of autonomy”.

Compared to the definition adopted by the Council, the Parliament's definition is less effective in distinguishing AI systems from non-AI systems, since it omits the reference to AI approaches and to the ability of the system to infer how to achieve its objectives. Other differences between the Parliament's approach and that of the Council relate, for instance, to the deletion of the word “content”, the reference to a different concept of autonomy (“varying levels of autonomy” as opposed to "elements of autonomy") and the addition of the words “explicit or implicit" for the objectives of the system.

In parallel to the legislative deliberations in the EU, the OECD (2023) worked on revising the initially proposed definition of AI system, which was amended in the 2023 Recommendation of the Council on Artificial Intelligence and followed by an explanatory memorandum in 2024 (OECD 2024). It is interesting to note that the revised OECD definition includes elements discussed at EU level - notably in order to clearly differentiate an AI system from other, more traditional software systems - such as the elements of "infer [...] how to generate outputs" and the mention of “content” as a possible output of the AI system.

Table 1 List of AI system definitions

3 General purpose AI system

The term general purpose AI system was used for the first time in a regulatory context in the Council General Approach. It emerged as a new concept during the Council debates as regards the role and responsibilities of actors in the AI value chain, i.e., of those actors whose software tools and components can be used by downstream providers for the development of AI systems that fall within the scope of the AI Act (e.g. high-risk AI systems). Most likely, the expression “general purpose” was chosen as a derivation of the key concept of “intended purpose” of an AI system, although the two concepts should not be understood as antonyms.

In fact, in the debates surrounding the AI Act, the legal term “intended purpose” has too often been incorrectly understood as meaning a specific purpose. Most probably this confusion originated because all high-risk AI systems (to which the great majority of the AI Act provisions are devoted) have an intended purpose that is specific (e.g. assessing the creditworthiness of individuals, selecting resumes of job candidates, etc.) and, as mentioned earlier, the concept of “intended purpose” is key in the logic of the AI Act for the classification of those systems. However, as the definition itself states (Art. 3(12)), the concept of “intended purpose” merely refers to “the use for which an AI system is intended by the provider”, regardless of whether that use is general or specific.

In recital 60 of the proposal, the Commission had acknowledged the complexity of the AI value chain and the importance of cooperation between relevant third parties and the provider of the AI system, with a view to enabling the latter to comply with its obligations under the AI Act. Such cooperation was left to the relevant parties’ contractual freedom and, consistently with the product legislation approach, regulatory compliance was identified as a responsibility of the provider of the final (high-risk) AI system.

The Council defines the concept of general purpose AI system as an AI system that “is intended by the provider to perform generally applicable functions such as image and speech recognition, audio and video generation, pattern detection, question-answering, translation and others; a general purpose AI system may be used in a plurality of contexts and be integrated in a plurality of other AI systems” (Art. 3(1b) General Approach).

The main characteristic of the general purpose AI system emerging from the Council text is that it can be integrated as a component of a downstream AI system covered by the AI Act, but also that it can be used “as such” as an AI system falling under the scope of the AI Act (Art. 4b(1) and recital 12c General Approach).

In the light of the above elements, the Council proposed to impose specific requirements and obligations (very much aligned with those applicable to high-risk AI systems) on the providers of this category of AI system. On the one hand, such requirements and obligations would improve the balance of responsibilities in the AI value chain (notably when those systems are integrated as components of high-risk AI systems): the downstream provider would benefit from the compliance activity performed by the upstream provider. On the other hand, the approach proposed by the Council would also serve to ensure compliance with the applicable requirements of the AI Act where these systems, in spite of the ’generality’ of purpose intended by the provider, could due to their capabilities be used by themselves as high-risk AI systems. In order to maintain a close link between the new rules on general purpose AI systems and the risk-based approach of the AI Act, the Council clarified that compliance by the providers of general purpose AI systems with the specific requirements and obligations is not expected if those providers have explicitly excluded in good faith all high-risk uses.

From the examples provided in the Council definition, general purpose AI systems include systems that generate content such as audio, video, and also text (e.g. question answering, translation). Therefore, an AI system like ChatGPT - which was launched by OpenAI on 30 November 2022, i.e. just around the time the Council was finalising its General Approach - can certainly be considered an example of the concept of general purpose AI system introduced by the Council.

However, the Council definition of general purpose AI system appears to be more encompassing, in that it also covers “narrow” AI systems, i.e. AI systems typically constrained to perform a more limited task or function, such as image and speech recognition or pattern detection in a specific domain. While AI systems belonging to this category are different from the previous ones, which display broader capabilities and can perform a plurality of tasks, they are part of the AI value chain and can be integrated as components of downstream AI systems such as high-risk AI systems.

The Parliament proposed a different definition for general purpose AI system, referring to an “AI system that can be used in and adapted to a wide range of applications for which it was not intentionally and specifically designed” (Art. 3(1d)). This definition appears to exclude the category of narrow AI systems, and focuses essentially on systems that are capable of dealing with tasks outside those for which they were specifically trained. Therefore, this definition points more directly to those highly capable AI systems that have become known to the public since the appearance of ChatGPT.

In this paper we focus only on the terminology used by the three EU institutions in the context of the AI Act. It is worth noting, however, that several other terms have been used, often interchangeably, by different actors, including outside the scientific communities, to refer to AI tools like ChatGPT and the related underlying techniques (e.g. generative pre-trained transformers). Such terms include, for instance, large pre-trained models, large language models, base models, foundational models, frontier models and also generative AI.

4 AI system vs AI model

In the light of the terminological variations just mentioned, it is important to understand the difference between AI model and AI system. In short, a system is a broader concept than a model, in that the latter is only one component, among others, of the former.

An AI system comprises various components, including, in addition to the model or models, elements such as interfaces, sensors, conventional software, etc. From a scientific and technical standpoint and in accordance with ISO (2022) terminology, an AI model is a “physical, mathematical, or otherwise logical representation of a system, entity, phenomenon, process or data” and a machine learning model is a “mathematical construct that generates an inference or prediction based on input data”.

Therefore, the model in itself would not be useful, or rather usable in the first place, unless it is complemented by other software components which, together, constitute a system. In fact, only a system is capable of performing tasks and functioning effectively, including of course interacting with users, other machine-based systems and the environment more generally, as considered by Junklewitz et al. (2023). As an example, ChatGPT is an AI system (a chatbot), while GPT-3.5 is the model powering the chatbot. Figure 1 depicts the relationship between AI model and AI system.
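To make the distinction concrete, the following minimal Python sketch (with purely hypothetical class and function names) illustrates how a usable "system" typically wraps a model together with other components such as a tokeniser, an output filter and a user-facing interface. It is an illustration of the general idea only, not a description of any actual product or of the AI Act's terminology.

```python
# Minimal illustrative sketch (hypothetical names): an AI "system" wraps a
# model together with other components such as pre-processing, an output
# filter and a user-facing interface.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LanguageModel:
    """Stands in for the trained model (e.g. the network weights)."""
    name: str

    def generate(self, token_ids: List[int]) -> List[int]:
        # A real model would run inference here; we simply echo the input.
        return token_ids


class ChatSystem:
    """The AI *system*: model plus tokenisation, safety filter and interface."""

    def __init__(self, model: LanguageModel, safety_filter: Callable[[str], str]):
        self.model = model
        self.safety_filter = safety_filter

    def _tokenize(self, text: str) -> List[int]:
        return [ord(c) for c in text]            # toy tokeniser

    def _detokenize(self, token_ids: List[int]) -> str:
        return "".join(chr(i) for i in token_ids)

    def chat(self, user_message: str) -> str:
        tokens = self._tokenize(user_message)    # interface + pre-processing
        output = self.model.generate(tokens)     # the model itself
        return self.safety_filter(self._detokenize(output))  # post-processing


system = ChatSystem(LanguageModel("toy-model"), safety_filter=str.strip)
print(system.chat("  hello  "))  # the usable artefact is the system, not the model
```

In this toy example the model alone would be of little use to an end user; it is the surrounding components that turn it into an application that can be interacted with.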

The terms “system” and “model” are often used interchangeably, especially in non-technical discussions, in relation to AI tools such as ChatGPT and similar applications. However, the distinction between the two becomes very important from a legal perspective if a decision is made to establish binding rules not only at the level of an AI system, but also at the level of its components, such as the AI model.

In fact, unlike the Council, which established additional rules for the new category of general purpose AI systems, the Parliament opted to establish new rules upstream in the AI value chain and introduced the concept of foundation model, defined as “an AI model that is trained on broad data at scale, is designed for generality of output, and can be adapted to a wide range of distinctive tasks” (Art. 3(1c)). Accordingly, the Parliament established a number of obligations for the providers of foundation models, including on: risk assessment and management, data governance, performance requirements as regards predictability, interpretability, corrigibility, safety and cybersecurity of the model, energy reduction and energy efficiency, technical documentation, quality management system and registration (Art. 28b).

Like any other component in the AI value chain, the AI model can be provided by actors other than the provider of the final AI system, who determines the purpose for which the AI system can be used. It is therefore important to reflect carefully on any possible obligations at the level of the AI model upstream, so that they are appropriate, feasible and calibrated to the position of the actor in question. For instance, as typically happens with complex engineered products, including software, where components can be sourced from multiple suppliers and integrated by one final manufacturer or developer, providers of AI systems typically have at their disposal tools other than those available to providers of AI models to handle possible risks (e.g. tools adapted to concrete applications or scenarios).

This situation may become even more complex depending on the means by which the AI model is provided. For instance, if a model is provided as a software library, it may become unfeasible for model providers to monitor the manner in which the model is further deployed downstream or whether its use may have led to problems or accidents.

The above does not mean, however, that the provider of the AI model should not have any role for compliance purposes, notably to the extent that legal compliance downstream may depend on information available only to the provider of the AI model. In this respect, an adequate level of transparency vis-à-vis downstream providers of AI systems seems necessary to address the risk of an imbalance of knowledge and to ensure a fair allocation of responsibilities overall.

Fig. 1 Diagram depicting the relationship between AI model and AI system

5 Foundation model

The term foundation model was introduced by the Stanford Institute for Human-Centered Artificial Intelligence in August 2021. That concept refers to a new machine learning paradigm in which one large model is pre-trained on a huge amount of data (broad data at scale) and can be used for many downstream tasks and applications (Bommasani et al. 2021).

Pre-training a model (a base model) for a particular task and then adapting it to a different task or set of tasks is a common approach in the machine learning community. It is traditionally enabled by transfer learning, where, according to Thrun and Mitchell (1995), “knowledge” learned from one task is adapted and applied to another task via fine-tuning (i.e., the parameters of a pre-trained model are re-computed with new data for another task).
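As an illustration of the traditional fine-tuning approach just described, the following PyTorch sketch re-computes the parameters of a (stand-in) pre-trained network on data for a new task. The model sizes, data and hyperparameters are arbitrary placeholders, not taken from any particular system.

```python
# Illustrative sketch of transfer learning via fine-tuning (PyTorch).
# The "pre-trained" backbone below is a stand-in: in practice it would be
# loaded from a checkpoint trained on the source task.
import torch
import torch.nn as nn

backbone = nn.Sequential(                 # pretend pre-trained feature extractor
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
)
head = nn.Linear(64, 3)                   # new task-specific head (3 new classes)
model = nn.Sequential(backbone, head)

# Fine-tuning: the parameters are updated on the new task's data, typically
# with a small learning rate (alternatively, the backbone can be frozen and
# only the new head trained).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(128, 32)                  # toy data for the *new* task
y = torch.randint(0, 3, (128,))

for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```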

However, when it comes to foundation models, this machine learning approach takes on a different dimension. The key element of differentiation is the generality of the capabilities of the model, which derives primarily from the architecture and size of the model (e.g. the number of parameters in the neural network), the scale of the training data and compute, as well as the training methodology.

The learning objectives involved in the training of foundation models tend to be general and largely focused on the structure of the data itself, that is, on learning representations directly from the data attributes without the need for a specific ground truth. Examples of learning objectives are predicting the next word given a sentence, capturing a distribution of images given a text prompt, or capturing and encoding representative features of data (images, audio or text).
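To illustrate the first example (next-word prediction), the sketch below shows how the training targets are obtained directly from the data itself, simply by shifting the token sequence by one position; no task-specific labels are required. The tensors are random stand-ins for real tokenised text and real model outputs.

```python
# Sketch of the next-token prediction objective: the target for each position
# is the token at the following position, so the "ground truth" comes from
# the data itself rather than from task-specific annotations.
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 100, 16, 4
tokens = torch.randint(0, vocab_size, (batch, seq_len))   # tokenised text (stand-in)

inputs = tokens[:, :-1]                                    # what the model would see
targets = tokens[:, 1:]                                    # the same sequence, shifted by one

# In a real setting, logits = model(inputs); here random logits stand in.
logits = torch.randn(batch, seq_len - 1, vocab_size)

loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())
```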

At the current stage of technological development and architectures, the capabilities of the model are largely influenced by the size of the model itself and the amount and quality of the data used to train it. It is worth noting that the relevant orders of magnitude (e.g. as regards model size or amount of data) depend on the modality (i.e., text, images, video or audio) and are also a moving target. For example, in 2019, a language model such as BERT by Devlin et al. (2019), with 340 million parameters trained on a dataset of about 3.3 billion tokens, was considered a very large model trained on a very large dataset. However, four years later, language models considered large are of the order of tens or hundreds of billions of parameters, trained on datasets of more than 1 trillion tokens (e.g. a state-of-the-art model such as Llama 2 by Touvron et al. (2023) contains 70 billion parameters and was trained on 2 trillion tokens).

Considering that quantitative elements can vary significantly with technological development, any possible legal definition that would factor in scale (e.g. data, model parameters or merely the training compute as a joint indicator) as a proxy for capabilities to define the boundaries of foundation models should include a mechanism for flexible and timely updates to ensure that the definition remains future-proof. At the same time, it is important to consider the impact of technological advances (e.g. new neural network architectures) whereby models may be highly capable even with a lower number of parameters or when trained with a lower quantity of data or computational resources. Fig. 2 shows that whilst the compute required for training is the main predictor of a model's performance (on the MMLU benchmark in this case), there is evidence that other factors such as data quality and model architecture also play an important role, which explains why some models like Phi-3 3.8b can outperform much bigger ones (Abdin et al. 2024). Finally, all factors that could be considered relevant for the purpose of regulating foundation models should be closely monitored and documented to ensure they remain fit for purpose and closely linked to the type of capabilities or potential risks that may be relevant from a policy standpoint.
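As an indication of how such scale-based proxies are often computed, the sketch below applies a commonly used rule of thumb that estimates training compute as roughly 6 × (number of parameters) × (number of training tokens). This is only a rough approximation that depends on architecture and implementation details, and the figures below are order-of-magnitude estimates based on the model sizes mentioned above.

```python
# Rough rule of thumb sometimes used to estimate training compute:
# FLOPs ~ 6 x (number of parameters) x (number of training tokens).
# Real figures depend on architecture and implementation details.
def approx_training_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

# Illustrative orders of magnitude for the two models mentioned in the text.
bert_large = approx_training_flops(340e6, 3.3e9)   # roughly 7e18 FLOPs
llama2_70b = approx_training_flops(70e9, 2e12)     # roughly 8e23 FLOPs

print(f"BERT-large ~ {bert_large:.1e} FLOPs, Llama 2 70B ~ {llama2_70b:.1e} FLOPs")
```

The roughly five orders of magnitude between the two estimates illustrate why any fixed quantitative threshold risks becoming obsolete quickly.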

An alternative definitional approach for foundation models that does not rely on factors that are highly variable across modalities and technological evolution could be to focus on the inherent ability of the model to competently perform a wide range of distinctive computational tasks. Such an approach would require establishing a taxonomy of tasks, as well as thresholds regarding the minimum number of tasks and the level of performance (Uuk et al. 2023).

Considering that the advanced capabilities of foundation models, notably those capabilities that may generate certain risks, are what could justify attention by policymakers, the soundest definitional approach would be to attach regulation to the existence of those advanced capabilities, rather than to other elements. From a technical point of view, standardised evaluation protocols or tools (i.e. benchmarks) are the key means to actually identify the capabilities and limitations of a foundation model.

The evaluation of the model against standardised benchmarks would also be the basis for an informed appreciation of the risks that may be associated with the specific context of the AI system in which the model is integrated. Evaluation protocols and tools for foundation models are the subject of intense research by the AI community and are still in the process of being developed and validated.
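As a simplified illustration of what benchmark-based evaluation involves, the sketch below scores a trivial "model" on a handful of hypothetical multiple-choice items and reports accuracy. Real benchmarks such as MMLU comprise thousands of items and far more careful protocols; all names and data here are invented for illustration.

```python
# Minimal sketch of benchmark-style evaluation: score a model's answers to
# multiple-choice questions and report accuracy.
from typing import Callable, List, Tuple

Question = Tuple[str, List[str], int]   # (question, choices, index of correct choice)

def evaluate(model_answer: Callable[[str, List[str]], int],
             questions: List[Question]) -> float:
    correct = sum(model_answer(q, choices) == answer
                  for q, choices, answer in questions)
    return correct / len(questions)

# A trivial "model" that always picks the first choice, as a baseline.
def always_first(question: str, choices: List[str]) -> int:
    return 0

toy_benchmark = [
    ("2 + 2 = ?", ["3", "4", "5", "6"], 1),
    ("Capital of France?", ["Paris", "Rome", "Berlin", "Madrid"], 0),
]
print(f"accuracy = {evaluate(always_first, toy_benchmark):.2f}")
```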

Finally, it is important to note that foundation models can be used to generate content in the form of text, images, video or audio. Therefore, foundation models can be at the basis of what has most recently been colloquially referred to as generative AI, described in the following section. However, it should be noted that foundation models can also be used for “non-generative” purposes. These typically imply a more limited output (e.g. a numeric or discrete value), rather than a longer free-form output. Some examples are text or image classification.

Fig. 2 Graphical representation of estimated compute required to train the model vs MMLU performance for several popular LLMs. Adapted from https://www.interconnects.ai/p/llama-3-and-scaling-open-llms.

6 Generative AI

The concept of generative learning is well known in the machine learning domain. For example, the distinction between discriminative models, focused on predicting the labels of the data, and generative models, focused on learning how the data are distributed, is a classic topic in this field, as observed in Jebara (2004). However, most recently the term generative AI has become widely used, notably after the emergence of consumer-facing products such as ChatGPT or Midjourney, to refer to AI models or AI systems that generate content such as text, images, video or audio. The content generated by these tools is usually conditioned on an input prompt provided by the user (e.g. question-answering, text-to-image, text-to-video or text-to-audio). However, from a technical point of view, the generation capability does not necessarily depend on a prompt (e.g. automatic generation systems).
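The following sketch illustrates, on toy one-dimensional data, the classic contrast mentioned above: a generative view models how the data are distributed (and can therefore produce new samples), whereas a discriminative view only models the decision between labels given an input. All numbers are arbitrary and chosen purely for illustration.

```python
# Toy contrast between generative and discriminative modelling.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data from two classes.
x0 = rng.normal(loc=-2.0, scale=1.0, size=200)   # class 0
x1 = rng.normal(loc=+2.0, scale=1.0, size=200)   # class 1

# Generative view: model how the data are distributed (one Gaussian per class),
# which also makes it possible to *generate* new samples.
mu0, sigma0 = x0.mean(), x0.std()
mu1, sigma1 = x1.mean(), x1.std()
new_samples = rng.normal(mu1, sigma1, size=5)     # newly generated "class 1" data

# Discriminative view: only model the decision between labels given an input
# (here a simple threshold halfway between the class means).
threshold = (mu0 + mu1) / 2

def predict(x: np.ndarray) -> np.ndarray:
    return (x > threshold).astype(int)

print("generated samples:", np.round(new_samples, 2))
print("predicted labels :", predict(np.array([-3.0, 0.5, 2.5])))
```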

Generative models have been around for quite some time. Architectures such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs) and Recurrent Neural Networks (RNNs) have been widely used to develop generative models since at least 2014. These architectures are less scalable than more recent ones, so they have traditionally been developed at smaller scales in terms of the number of parameters, data and computational resources. For this reason, models based on VAEs, GANs or RNNs are not usually considered foundation models, and generative AI therefore does not necessarily imply the use of foundation models. However, the more powerful generative models, based on other architectures such as the Generative Pre-trained Transformer (GPT) or diffusion models, which have most recently been associated with the concept of generative AI, are considered foundation models.

It has to be noted that the concept of generative AI is already captured in the Commission and Council definitions of AI system, as they refer to the generation of “outputs such as content”. Moreover, specific transparency obligations are introduced for this type of AI system output, addressed to “users of AI systems that generate image, audio or video content that appreciably resembles existing persons, objects, places or other entities or events and would falsely appear to a person to be authentic or truthful (’deep fake’)” (Art. 52(3)).

The position of the Parliament (Art. 28b(4)) includes specific obligations for providers of foundation models used in AI systems intended to generate content and for providers of foundation models specialised into a generative AI system. This approach appears to link the concept of generative AI exclusively to foundation models, thereby excluding other types of AI models capable of generating content.

7 Conclusions

Considering the importance and complexity of key legal concepts of the AI Act (notably AI system, general purpose AI system, foundation model and generative AI), this paper seeks to bring clarity to them from a technical and legal point of view.

After highlighting the need to define the concept of AI system in such a way as to duly distinguish it from conventional software/non-AI-based systems, we observe that the concept of general purpose AI system - not included in the Commission proposal - is not the same in the Council and European Parliament versions of the AI Act. We also address the conceptual differences between the terms AI system and AI model and analyse the concept of foundation model in the light of its particular technical properties. While different definitional approaches to this concept are possible from a legal point of view and none is without challenges, we consider that the identification of the relevant capabilities on the basis of standardised evaluation protocols or tools appears to be the most appropriate approach. The AI community is making great efforts to develop holistic evaluation frameworks to establish the capabilities of these models. Finally, we argue that the concept of generative AI is already captured by any definition of the term AI system that includes ’content’ as one of the system’s outputs, and that not all generative AI is based on foundation models. We hope that our considerations on the specific terminological choices made and wording used by the three institutions in their versions of the AI Act can provide useful context and background to better understand the origin and evolution of the text of the AI Act, including in its final form.