1 Introduction

The philosopher Heidegger often reminds readers of the etymological connection between the (German) words for ‘concept’ or ‘to conceptualise’ (i.e., Begriff; begreifen) and that for ‘grasping’ (i.e., greifen) to emphasise that thinking with concepts goes beyond categorising, classifying, or maintaining a mental image of the external world. He suggests that thinking with concepts—at a more fundamental level—is an essential mode of engaging and interacting with the world, a notion aptly encapsulated by the metaphor of attempting to comprehend something by (physically) grasping it (cf. ‘getting a grip’ on the lived world; Bruineberg & Rietveld, 2014; Kiverstein et al., 2019). Yet, just as one might use the wrong tool for a task, one might employ a concept inappropriately: there exists a standard for judging whether a particular employment of a concept on an occasion is appropriate or inappropriate—for instance, a child might use ‘green’ for blue things, which would be an ‘inappropriate’ use of the term. In such evaluations of the appropriateness of a concept, one has to measure the result of the employment of a concept by some external, independent standard. In other words, concepts are said to have a normative dimension.

Philosophical discourse has yielded diverse interpretations of normativity. Ginsborg (2018), in particular, has identified two senses in which normativity has been attributed to concepts. The first, weaker sense (shared by e.g., Boghossian, 1989; Gibbard, 2012) contends that the normative aspect of concepts does not inherently reside within the concepts themselves, but rather emerges through the use of concepts by humans in a certain practice. The second, stronger sense, espoused by Ginsborg herself (also attributed to Kripke and Kant), posits that normativity is a constitutive element of possessing and employing concepts, requiring that “the subject herself take a normative attitude to what she is doing” (Ginsborg, 2018, p. 1009). This distinction delineates a dichotomy between a first person perspective and a third person perspective: in the weaker version, the agents using concepts need only be characterised by fellow agents or by external observers as adhering to normative standards (i.e., behaving in a way that is bound by norms); in other words, the normativity of concepts is downstream from the normativity of the social practice in which concept use is embedded. In contrast, in the stronger version, the normativity of concepts is inherent to the concepts themselves, implying that any subject who has acquired a concept has to have internalised such normativity and be capable of self-conscious evaluation of its use.

The primary objective of the present paper is to propose a perspective that occupies an inclusive middle ground between these two alternative positions: we concur with the stronger view that normativity is not exclusively a feature of a social practice and that the first person perspective cannot be neglected; nevertheless, we diverge from the stronger view by offering a distinct perspective on the role of the subject within the realm of normativity.

The remainder of this paper comprises three sections. In preparation for establishing the intermediate view of normativity, the first section provides a brief review of the philosophical problems surrounding normativity and expands on the strong and weak views on normativity. The next section describes our view of the role of normativity in concept use, using the framework of Active Inference. In the final section we revisit the issues raised in the first section, in light of our proposed view of normativity.

2 The Problem of Normativity

The clearest formulation of the predicament concerning normativity can be found in Kripke’s work (Kripke, 1982). Although Kripke’s formulation is predominantly concerned with semantics (i.e., meaning), rather than concepts per se, the underlying reasoning remains applicable. To illustrate this, consider the following hypothetical scenario: one may posit a situation in which an individual has not previously added numbers larger than 57. Subsequently, this individual is confronted with a challenge: determining the result of 68 + 57. If the individual were to employ the concept of addition, the prescribed response would be 125. However, a sceptical interlocutor raises a probing question: why does it not make sense to answer 68 + 57 = 5? The sceptic, in this context, contends that the instances of ‘plus’ operations performed before are actually cases of another operation, ‘quus’, which happens to coincide with ‘plus’ whenever the numbers involved are smaller than 57; when they are not, the operation yields 5.

The natural response would be that in this case, the individual is simply not employing addition; they have somehow confused ‘addition’ for some other operation, leading to a fundamental failure in understanding the concept of addition. Possessing the concept of addition constrains all applications of the concept, even to cases that one has never encountered before. However, the sceptic’s concern revolves around the underlying issues inherent in this response: upon what basis can it be posited that the individual indeed possesses the concept of addition as opposed to another operation? How does past use and possession of such a meaning or concept determine future deployment?

Kripke himself has explored a variety of possible answers, and subsequent scholarly discourse has further extended this inquiry (for different views, see Miller, 2022). Here, we focus on the dispositional approach—a notion relevant to our proposed framework, to be discussed shortly—as it discloses the issue of normativity most clearly. The proposed solution can be summarised as follows: a subject who has acquired the concept of addition has acquired a set of dispositions. In essence, a subject is said to have a disposition to apply a concept if, given appropriate circumstances, it does actually apply such a concept. These dispositions resemble latent potentials awaiting the requisite external conditions in order to manifest themselves. Such dispositions would subsequently dictate future use by activating a response in accordance with their inherent characteristics. However, Kripke (1982) argues that this alone does not suffice: the mere fact that an individual reacts in a certain manner does not render the reaction correct or epistemologically justified. The crux of the argument is the following:

Suppose I do mean addition by ‘ + ’. What is the relation of this supposition to the question how I will respond to the problem ‘68 + 57’? The dispositionalist gives a descriptive account of this relation: if ‘ + ’ meant addition, then I will answer ‘125.’ But this is not the proper account of the relation, which is normative, not descriptive. The point is not that, if I meant addition by ‘ + ’, I will answer ‘125’, but that, if I intend to accord with my past meaning of ‘ + ’, I should answer ‘125.’... The relation of meaning and intention to future action is normative, not descriptive. (Kripke, 1982, p. 37)

Dispositions, in isolation, lack inherent normative qualities: one disposition to react holds the same normative standing as any other. While a disposition might prompt the agent to react to a novel circumstance, according to Kripke, it still fails “to tell me what I ought to do in each new instance” (Kripke, 1982, p. 24).

We can construe the problem of normativity regarding concepts as follows. A concept exhibits a degree of generality: to understand a concept C is not only to understand C as applied to a set of objects one has encountered before, but also as applicable to other objects that are potential candidates for subsumption under C. This purely logical feature of concepts implies a form of restriction on their future use. Further, this (generalisation) constraint on future use should be normative, in the sense described. The sceptic presses the following issue: in virtue of what can we say that we are using a concept C instead of any other concept?

Fodor, however, has remained sceptical about the existence of a genuine conundrum in this context. According to Fodor, if a method for modelling categorisation is available (such as an algorithm that applies ‘green’ regularly and reliably to green objects), there exists no additional normative element to account for. From this perspective, what more could be reasonably demanded? The algorithm explicitly prescribes the course of action in novel instances—simply executing the algorithm suffices. The proposed theory, which incorporates dispositions once it is fully developed, does indeed instruct us on how to proceed when confronted with a new instance, and it does so while yielding the correct results regularly and reliably. Hence, in Fodor’s view, the issue of normativity represents a red herring—that is, a diversion from the primary objective of designing the appropriate theory for modelling concept use.

Two responses to Fodor’s perspective offer insights that contextualise the weak and strong views on the normativity of concepts briefly introduced in the introduction. The first response places emphasis on the intersubjective aspect that is inherent in Kripke’s scenario. This response is best articulated as a form of the ‘open question argument’. Consider a scenario in which two individuals both employ the concept of ‘addition’ yet come to disagree on the result of a problem. What serves as the basis for asserting that they disagree as opposed to that they just operate differently? If concepts are reducible to dispositions, then it follows that two individuals who possess different dispositions are thus employing different approaches. Consequently, they are not in a state of disagreement. Yet, in many cases involving the application of concepts, it is preferable to assert that individuals are indeed engaged in a dispute concerning the same concept rather than just operating with different concepts (e.g., the original context of the open question argument concerns a dispute about what constitutes moral goodness, where the disputing parties are contending about the same concept, and not simply operating with different concepts; otherwise, there is nothing to dispute about). However, in order to assert that they are disagreeing instead of just operating differently, it becomes imperative to introduce a common standard to which both parties adhere. This appears to require that their respective representations be representations of the same thing. Based on this distinction, one might then claim that the ability to grasp and use concepts extends beyond the ability to discriminate or categorise. It encompasses the additional capability of subjecting our discriminations and categorisations to some external standard, thereby furnishing us with a framework for evaluating appropriateness. In short, normative standards.

The direction of this argument readily leads to the adoption of the weak view on normativity. The focus of the argument here revolves around intersubjectivity, and more particularly, around elucidating the foundational basis of possible agreement between two subjects. These subjects require a prior ground of agreement, which naturally leads to the proposition that a preexisting social practice, in which both subjects partake, serves as the shared common ground. If this is the case, the normativity of the concept would derive from the norms (perhaps implicit) in the practice they participate in. This is the foundational origin of the weak thesis on normativity, espoused by philosophers like Gibbard and Boghossian (Boghossian, 1989; Gibbard, 2012). Embracing this view aligns with Fodor’s assertion that concepts, in and of themselves, lack inherent normativity. It is only when concepts are used by agents within a specific practice that normativity can be attributed to the act of employing these concepts.

However, an alternative response can be formulated. Ginsborg (2018) has argued that independent of the aforementioned intersubjective normativity, there exists an inherent self-aware appraisal of the appropriateness or inappropriateness of a concept accompanying the very use of a concept. According to this view, Fodor’s account exhibits a notable deficiency, even without invoking the open question argument. This deficit pertains to the fact that concepts are not employed by agents as mere tools for categorisation, that is, as mere manifestations of a discriminative capacity or a disposition. As Ginsborg notes, there is a normative attitude associated with such discriminations, an attitude “more primitive” than the ascription of truth or warrant which take place intersubjectively or within a complex practice of providing and substantiating rationales. The use of a concept is not an indifferent act of categorisation; rather it is already imbued with significance and meaning for the subject.

But what is this “more primitive” attitude as understood by Ginsborg? Firstly, it is claimed that this primitive capacity for recognition of appropriateness or inappropriateness bases its evaluations on past experience (Ginsborg, 2018, p. 1012). Secondly—and a much stronger claim—Ginsborg posits that animals are incapable of such primitive attitudes. It is rather a characteristic of human use of concepts linked to the self-aware, reflective use of concepts, i.e., the ability of a human subject to “recognize” the suitability of the application of a concept.

In subsequent sections, we develop an account of concepts that aligns with Ginsborg’s proposition that concepts are normative—in that the discriminative act, in the use of a concept, is already charged with significance for the agent itself and involves a ‘primitive’ evaluative dimension. However, we do not endorse Ginsborg’s strong characterisation of the type of recognition involved in this. While it may be accurate to assert that the human use of concepts showcases a level of sophistication not paralleled by other animals, we are cautious about ascribing a qualitative divergence between human conceptual use and that of other animals. Instead, we leverage the Active Inference framework to reinterpret this primitive manner in which the acts of categorisation are evaluated by the subject by highlighting the fact that such acts of categorisation are embedded within the interaction between the agent and its environment within which the agent maintains itself. Categorisation is itself a dynamical process driven by the dynamics of the subject’s interaction with its environment. Hence, it already has a direct relevance to or significance for the subject, whether or not the subject possesses conscious awareness of its significance. A more comprehensive exploration of this notion will be provided in the next section.

3 An Active Inference View of Concepts

In the preceding sections, we identified philosophers’ observations regarding the gap between the normative use of concepts and mere discriminative abilities or dispositions. The underlying complexity appears to arise from adopting a peculiar view on dispositions: these dispositions are attributed to a subject in a manner akin to attributing any other properties to an agent (such as its physical shape, mass, and so on), without adequately considering the inherent connection between these dispositions and the agency of the subject. By abstracting the agent from the description of dispositions, the essential link between the disposition and its evaluation from the agent’s point of view is severed, leading to Fodor’s conclusions, where conceptual normativity is no longer apparent.

Contrary to this approach, we will start by delineating the subject as an entity firmly embedded within its environment, and in perpetual interaction with it. To emphasise the dimension of agency, we will employ the term ‘agent’ instead of ‘subject’. We will demonstrate that concepts should be conceived as emerging from these ongoing interactions, constituting a dynamic process that encompasses the agent in its entirety. As such, concepts are more appropriately characterised as abilities an agent possesses within a certain agent-environment context, rather than mere stored representations, or a set of objects in the environment or the world awaiting identification. To flesh this out, we start by specifying the structure of the agent and its relation to the environment, as well as the specific dynamics involved in such interactions that define the relation in question.

3.1 Agents in Active Inference

At a fundamental level, every agent is extended across time but remains discernible as individuated and capable of some sort of autonomous or self-regulated action. Thus, first of all, the agent is temporal: it is itself properly seen as a process, or an ensemble of processes. Secondly, it must be structured so as to display some sort of control. In other words, the ensemble of processes making up the agent have to be coupled to each other in specific ways that make the bundle of processes discernible as a singular self-organising system capable of some sort of autonomy. It is this structural organisation that then constitutes the individuality of the agent as it persists through time. These ideas can be made precise as follows.

The processes constituting the agent are understood as a succession of states of a (random) dynamical system. In adopting this perspective, we are simply adhering to the established practice of employing a state space to model systems of interest. Here, the state space can be understood as a structured space of possibilities that the system or the relevant parts of the system possesses.

In light of this, we can define the temporal coherence and individuality of the agent using the structure of the state space (i.e., beliefs about contingencies) of the agent. First, to talk of an agent is to talk of a bundle (of processes) that is in some sense separable from its environment, while simultaneously capable of modifying the environment via its own actions. Thus, it is feasible to characterise the agent via a set of internal states (denoted \(\mu\)), contrasted with external states (\(\eta\)), and provided with a set of action and sensory states (\(\alpha\) and \(s\), respectively). For agents that are capable of planning ahead, the active states support possible policies (sequences of actions) that the agents select on the basis of sensory states; i.e., observable outcomes. For instance, we can view the brain as separated from the body via dynamic variables governing the link between brain and the rest of the central nervous system; one could also think of the visual system and its link to the external world, or distinct regions within the brain itself. It is imperative to recognise that in all these scenarios, the concept of separation pertains not to physical spacetime, but rather to the realm of state spaces.

Assuming this separation, we can characterise what it means for such an agent to maintain itself in some manner throughout its interaction with the environment. One way to make this precise is provided by the free energy principle: we consider the agent as entailing a joint probability distribution over the latent causes of its sensory states, observable outcomes, and policies. From our perspective, the agent is structured such that it models the environment and can be seen as a subset of said environment. This model entailed by the agent is the generative model, given by priors and likelihood. More specifically, we can read the internal states \(\mu\) as parameterising probabilistic or Bayesian beliefs about external states \(\eta\), under some generative or world model \(p\left( {\eta ,s,\alpha } \right)\), where \(\alpha\) are the agent’s active and internal states. The likelihood part of the generative model \(p(s|\eta ,\alpha )\) describes the probability of observable (sensory) consequences, given unobservable (external) causes, while the prior part of the model \(p\left( {\eta ,\alpha } \right)\) is the probability over the external (hidden) causes or states. We will see that the very existence of this partition of states means the implicit agent can be read as inferring, representing, learning and acting upon the causes of its observations.

With this setup, the agent maintains itself in the sense that the trajectory it traces out in state space minimises the path integral of a Lagrangian \({\mathcal{L}}\left( {\vec{x}} \right): = - \ln p\left( {\vec{x}} \right)\); the vector notation is for the fact that here we are not just considering individual states, but trajectories. The form this Lagrangian takes depends on the model of the dynamics the states satisfy—usually this is taken to be driven by Langevin dynamics. This Lagrangian encodes the information theoretic notion of ‘surprisal’, and the path the agent traces out is then the path of least action with respect to the action defined by the Lagrangian encoding surprisal (Friston et al., 2023a).

In light of this formulation, different constraints imposed on the structure of the agent give rise to agents that trace paths satisfying different properties. Furthermore, entertaining certain assumptions about the relation between the sensory, active, internal, and external states results in internal paths of least action minimising free energy functionals of Bayesian beliefs about external paths (Friston et al., 2023a). For mathematical readers, internal states can be read as minimising the following variational free energy:

$$ F\left( {\vec{s},\vec{\alpha }} \right) = \underbrace {{{\mathbb{E}}_{{q\left( {\vec{\eta }} \right)}} \left[ {{\mathcal{L}}\left( {\vec{\eta },\vec{s},\vec{\alpha }} \right) + \ln q\left( {\vec{\eta }} \right)} \right]}}_{{\text{Variational free energy}}} $$

This approach allows us to treat the system as if it is performing approximate Bayesian inference, thereby connecting the purely physical view of agents as systems, with the perspective of self-evidencing proposed by Hohwy (2016). We can see concretely why such agents are said to self-evidence: the agent will act as if it is gathering evidence for its generative model. This is because by pursuing paths of least action, free energy is minimised. Because variational free energy is never less than the Lagrangian, the Lagrangian is likewise minimised; minimising the Lagrangian (or surprisal) just is maximising model evidence, namely, the probability of sensory (and agential) states: \(p\left( {\vec{s},\vec{\alpha }} \right)\) (technically, this is also known as the marginal likelihood because one has marginalised the generative model \(p\left( {\vec{\eta },\vec{s},\vec{\alpha }} \right)\) over unknown external causes).
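The relation between variational free energy and surprisal can be illustrated with a minimal discrete example. The following Python sketch uses a toy generative model of our own devising (one binary hidden state, one observed outcome, with illustrative probabilities, not a model drawn from the cited literature):

```python
import math

# Toy generative model: one binary hidden state eta and one observed outcome s.
# The prior and likelihood values below are illustrative assumptions.
prior = {0: 0.7, 1: 0.3}          # p(eta)
likelihood = {0: 0.9, 1: 0.2}     # p(s | eta), for the outcome actually observed

evidence = sum(prior[e] * likelihood[e] for e in prior)  # p(s)
surprisal = -math.log(evidence)                          # the Lagrangian -ln p(s)

def free_energy(q):
    """F = E_q[ln q(eta) - ln p(eta, s)] for the observed outcome s."""
    return sum(q[e] * (math.log(q[e]) - math.log(prior[e] * likelihood[e]))
               for e in q if q[e] > 0)

# The exact posterior p(eta | s) makes the bound tight.
posterior = {e: prior[e] * likelihood[e] / evidence for e in prior}

# F upper-bounds surprisal for any q, with equality at the exact posterior.
assert free_energy({0: 0.5, 1: 0.5}) >= surprisal
assert abs(free_energy(posterior) - surprisal) < 1e-12
```

Because the bound is tight at the exact posterior, driving free energy down both improves the approximate posterior and maximises model evidence, which is the sense of self-evidencing at issue here.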

Particularly important here are agents that infer their own action and, effectively, plan into the future. Such agents behave as if they have Bayesian beliefs over active paths that minimise expected free energy (see Friston et al., 2023a for details):

$$ E\left( {\vec{\alpha }} \right) = \underbrace {{{\mathbb{E}}_{{p(\vec{\eta },\vec{s}|\vec{\alpha })}} [{\mathcal{L}}\left( {\vec{\eta },\vec{s}} \right) - {\mathcal{L}}(\vec{\eta }|\vec{\alpha })]}}_{{\text{Expected free energy}}} $$

This functional means that agents will actively seek out evidence for their own models by minimising expected surprise. This minimisation comes in two flavours. First, expected surprise is the same as uncertainty, which means that actions will be selected that maximise expected information gain, in the spirit of optimal Bayesian experimental design (Lindley, 1956). Second, the agent will actively avoid surprising states that are unlikely under its prior beliefs. This aspect of self-evidencing conforms to Bayesian decision theory (Berger, 2011).
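These two flavours of minimisation can be made concrete in the standard discrete-state formulation, where expected free energy decomposes into risk (divergence of predicted from preferred outcomes) and ambiguity (expected entropy of the likelihood mapping). The sketch below is illustrative only: the likelihood matrix `A`, the preference distribution `C`, and the two candidate policies are our own assumptions, not taken from the cited papers:

```python
import math

def entropy(dist):
    return -sum(p * math.log(p) for p in dist if p > 0)

def kl(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Likelihood p(s | eta): rows index hidden states, columns index observations.
A = [[0.9, 0.1],
     [0.2, 0.8]]
# Preferred distribution over observations, encoding the agent's prior goals.
C = [0.8, 0.2]

def expected_free_energy(q_eta):
    """Risk + ambiguity form of expected free energy, for a policy that
    renders the hidden-state distribution q_eta likely."""
    q_s = [sum(q_eta[e] * A[e][o] for e in range(2)) for o in range(2)]
    risk = kl(q_s, C)                                       # divergence from preferences
    ambiguity = sum(q_eta[e] * entropy(A[e]) for e in range(2))  # expected outcome entropy
    return risk + ambiguity

# Policy 0 makes hidden state 0 likely; policy 1 makes hidden state 1 likely.
G = [expected_free_energy([0.95, 0.05]), expected_free_energy([0.1, 0.9])]
best = min(range(2), key=lambda i: G[i])  # the self-evidencing agent picks the lower G
```

Here policy 0 is selected: it both yields outcomes close to the preferred distribution (low risk) and visits the less ambiguous hidden state, illustrating how pragmatic and epistemic imperatives are combined in a single quantity.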

3.2 The Dynamics Underlying Conceptual Ability

The outlined dynamics governing the agent can be used to explain structure learning, and thus the nature of concepts. The fact that a concept can be applied to the external world (akin to an agent having a disposition) and possesses a representational dimension is encapsulated in the Active Inference framework by acknowledging that the agent maintains a set of (Bayesian) beliefs encoded as probability distributions under its generative model. The key point to note here is that it is the way in which this set of beliefs interfaces with the actual episodes of interaction between the agent and its environment that allows these beliefs to be beliefs about features of the external world. This interface is best understood as a link between processes running at different time scales, which collectively constitute the comprehensive (scale invariant) dynamics of the agent. Indeed, one could argue that it is this embedding in the whole dynamics of the agent’s interaction with the environment that affords what researchers commonly refer to as ‘grounding’ (e.g., Barsalou, 2008, 2010; Pezzulo et al., 2012).

At faster time scales, beliefs are instrumental in shaping context-sensitive responses to the world, facilitating updates to one’s beliefs that come from such interactions. This accounts for how concepts are applied to the world in a dispositional fashion, and the adaptive learning that takes place during such interactions. At the slowest time scale, the dynamics pertain to the refinement of concepts in the absence of observations, usually through removing redundancy or minimising complexity, in the sense of Occam’s razor or, mathematically, Jaynes’ maximum entropy principle (Jaynes, 1957; Ramstead et al., 2022). Such dynamics elucidate how the agent may systematically evolve its conceptual framework (i.e., develop a structured and parsimonious system of concepts). In current models of Active Inference (Friston et al., 2023b; Neacsu et al., 2022; Smith et al., 2020), the relevant dynamics are generally characterised as follows:

  • At the fastest timescale, the agent interacts with the environment through sensorimotor loops (either continuously or from moment to moment). In Active Inference, as mentioned above, such interactions are understood as a form of inference where sensory data is used to update beliefs about latent states and the most likely policies to pursue (K. Friston et al., 2017a, 2017b).

  • This leads to, at a slower timescale, a more gradual and cumulative modification of the parameters of the generative model which can be described as parametric learning, a mechanism that corresponds to the modification of synaptic weights (K. Friston et al., 2017a, 2017b).

  • Finally, the system of concepts can be modified after (or in between) episodes of engagement with the environment. This happens at the slowest timescale and consists of the processes by which the system of concepts expands or contracts. Broadly speaking, different possible generative models are compared (i.e., treated as alternative hypotheses) and the one with lowest free energy is then selected. In present models the expansion and reduction of the system of concepts is modelled as complementary processes of Bayesian model selection via model expansion (Smith et al., 2020) and Bayesian model reduction (Neacsu et al., 2022).
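The model comparison at the slowest timescale can be sketched in a few lines. Assuming exact inference, the free energy of each candidate generative model reduces to its accumulated surprisal over the observed data, and selection simply retains the model with the lowest score. The two candidate models and the data stream below are illustrative assumptions, not taken from the cited implementations:

```python
import math

# Two candidate generative models for a stream of binary observations,
# treated as alternative hypotheses about the world (illustrative numbers).
models = {
    'one-category': [0.5, 0.5],    # everything equally likely
    'two-category': [0.85, 0.15],  # observations cluster on outcome 0
}
data = [0, 0, 1, 0, 0, 0, 1, 0]

def free_energy(model):
    """With exact inference, F reduces to accumulated surprisal -ln p(data)."""
    return -sum(math.log(model[o]) for o in data)

scores = {name: free_energy(m) for name, m in models.items()}
selected = min(scores, key=scores.get)  # Bayesian model selection picks lowest F
```

On this data the structured ‘two-category’ model attains the lower free energy and is selected; in the full treatments cited above, a complexity penalty additionally guards against over-structured models, which this sketch omits for brevity.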

In summary, different concepts are represented by different configurations of beliefs (e.g., synaptic activity, connectivity, and architecture) in the generative model. One has to be careful, however, in understanding this identification of concepts with configurations of beliefs. As Ramstead et al. (2022) pointed out, the generative model is entailed by the dynamics of the system: the generative model is realised by the dynamics of the system’s evolution, that is, its ‘behaviour’ (the precise definition of entailment is given in Friston, 2012); it is in this sense that the generative model may be said to be enacted. The beliefs of the generative model contribute to the agent’s conceptual capacity through the role they have in the overall adaptive behaviour of the agent. It follows that it would be inaccurate to identify the probability distributions of the generative model with a representation with representational content that the agent possesses, since strictly speaking these probability distributions do not encode information about the environment for the agent; they are entailed by the dynamics of the agent. Nevertheless, there is room for representations in the present framework. The internal states can be said to represent external states in that these internal states, as states within the architecture of the agent and as coupled to the environment, display exploitable structural similarity to features of the environment and are equipped to play a functional (perhaps causal) role in contributing to the success of the agent’s interactions with the environment. Thus, following influential accounts of structural representation (Gladziejewski & Milkowski, 2017; Hohwy & Kiefer, 2017), one can view such internal states as structural representations.
It would thus be more perspicuous to say that a concept is an ability of an agent to make the conceptual discriminations in question, an ability that exploits what is structurally encoded in the agent’s physical structure. We will return to this point concerning abilities below.

One might inquire about the origin of this set of discrete states (i.e., beliefs) representing concepts, as the preceding exposition may appear to have introduced such a discrete set of concepts as an initial assumption. A comprehensive exploration of this matter would require addressing longstanding issues concerning the existence of innate concepts, which would lead us into a complex tangent. Excluding the issue of innate concepts from the discussion, we can provide a general outline of how, in principle, a set of discrete concepts might arise from interactions with a continuous world. This is closely linked to the generality of concepts noted above.

The emergence of a discrete system of categories can be rationalised when considering the (informational and thermodynamic) imperatives from an evolutionary and developmental perspective. On this view, the Bayesian model selection above can now be read in terms of natural selection and the implicit minimisation of complexity cost can be read in terms of minimising thermodynamic work via Landauer’s principle and the Jarzynski equality (Bennett, 2003; Evans, 2003; Jarzynski, 1997; Landauer, 1961). To avoid expending excessive energy on processing information that may be redundant, the agent must engage in some form of data compression (Hinton & Zemel, 1993; Schmidhuber, 1991, 2010; Wallace & Dowe, 1999). Since the form such compression takes is based on evolutionary and developmental pressures, it thus takes a different shape for different capacities. For instance, researchers in robotics have studied how discrete concepts grounded in the sensorimotor capacities of an agent might emerge through interaction with the environment (Nevens, 2020), or through the grouping together of affordances (Ugur & Akbulut, 2020). Techniques from deep learning have also been leveraged for the discovery of useful discrete categories, or for extracting rules that enable robots to complete tasks requiring planning (Ahmetoglu et al., 2022). Equally, in the study of symbol and language emergence, researchers have investigated how phonetic systems and grammatical structure might emerge from the interaction of robotic agents (e.g., Oudeyer, 2006; Steels, 1995, 2003, 2016; Steels & Szathmary, 2018); in the study of robotic and cognitive systems, researchers have envisioned a concrete process leading from basic motion patterns to chains of movements to planning and symbol manipulation and ultimately to writing and communication of signs (e.g., Hagiwara et al., 2019; Taniguchi et al., 2018).

The origin of our generative models also speaks to the intersubjective aspect of concepts, read as an emergent property of Bayesian model selection or structure learning. This follows from an evolutionary psychology perspective and, in particular, cultural niche construction that scaffolds learning at a neurodevelopmental timescale (e.g., language and deontic cues) (Bruineberg et al., 2018; Constant et al., 2019; Laland et al., 2016; Vasil et al., 2020; Veissiere et al., 2019). In other words, the concepts an agent acquires through exchange with her world are quintessentially intersubjective, should that world be constituted by agents gathering evidence for their models of the shared world (Albarracin et al., 2022; Kastel & Hesp, 2021; Vasil et al., 2020). The focus on language as a vehicle of communication and (Bayesian) belief sharing speaks again to the discrete or categorical nature of concepts.

When examining this phenomenon within the framework of Active Inference, one way to construe the problem of generating a discrete set of categories from continuous data is to formulate it as a problem of action selection (Parr & Friston, 2018). Specifically, the decision of where to establish a boundary within the continuous stream of data is regarded as an action that is determined on the basis of available evidence (for an illustration in the case of speech segmentation, see Friston et al., 2021). In Parr and Friston (2018), boundary segmentation is not exclusively determined by the content of the (sensory) signal itself; rather, it proceeds by identifying a number of plausible intervals and selecting the one with the highest evidence, in line with the agent’s prior beliefs.
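To make this concrete, boundary selection can be caricatured as scoring a handful of candidate segmentations by an (approximate) model evidence and picking the best. The sketch below is our own illustration only, not the scheme of Parr and Friston (2018): the Gaussian two-segment model, the candidate list, and the uniform prior over boundaries are all assumptions introduced here.

```python
import numpy as np

def boundary_log_evidence(signal, k, log_prior):
    """Crude log-evidence proxy for the hypothesis 'boundary at index k':
    model each segment by its mean plus Gaussian noise, score the fit by
    the Gaussian log-likelihood, and add the agent's prior belief."""
    ll = 0.0
    for seg in (signal[:k], signal[k:]):
        resid = seg - seg.mean()
        sigma2 = max(resid.var(), 1e-6)   # guard against zero variance
        ll += -0.5 * len(seg) * (np.log(2 * np.pi * sigma2) + 1)
    return ll + log_prior

# A continuous stream whose generating mean shifts at index 40.
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0, 0.5, 40), rng.normal(3, 0.5, 60)])

candidates = [20, 40, 60, 80]             # a few plausible intervals
log_prior = np.log(np.full(len(candidates), 1 / len(candidates)))

scores = [boundary_log_evidence(signal, k, lp)
          for k, lp in zip(candidates, log_prior)]
best = candidates[int(np.argmax(scores))]  # boundary with highest evidence
```

Because prior beliefs enter the score additively, a non-uniform prior could override a weakly evidenced boundary, mirroring the point that segmentation is not fixed by the signal alone.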

Once such discrete categories are in place, one can investigate the dynamics coupling the discrete and the continuous domains. More precisely, one can explore the interplay between a discrete and a continuous generative model, where the discrete model entails the established categories, and the continuous model concerns action in and observation of a continuous environment. The two are mutually interconnected through a reciprocal relationship: the discrete categories serve as prior beliefs for the continuous model, while the continuous model contributes evidence for the discrete categories, treating them as alternative hypotheses (see Parr & Friston, 2018, for a study of this relation in the case of ocular motion). Returning to concepts, such discrete categories, as represented by different beliefs (e.g., states), may also enter into the dynamics of structure learning, and thus play a role in the constitution of the agent’s conceptual ability.

Furthermore, this approach provides a means of accounting for the notion of generality inherent in concepts. The generality of concepts stems from two features of the agent’s interaction with the environment: firstly, it arises from the hierarchical configuration of the agent, enabling it to effectively ‘filter’ the environmental input, and therefore ‘abstract from’ features of its (sensory) stimuli; secondly, these abstractions, understood as beliefs (e.g., states) of the agent’s generative model, can subsequently be employed to facilitate comprehension of the environment: that is, they enable the agent to classify encountered objects based on both past experiences and the anticipated future interactions with the environment.

From this point of view, we can address the longstanding metaphysical issues concerning the nature of such abstract objects. It has often been a source of puzzlement that concepts do not appear to belong to the same ontological category as tangible objects in space and time. Concepts exhibit the capacity to encompass a number of objects dispersed across various spatial and temporal contexts simultaneously, and they defy physical manipulations. Their existence seems to be contingent upon some act of abstraction. However, based on the aforementioned framework, it becomes evident that the present approach does not treat concepts as abstract (logical) objects. Instead, the abstract nature of concepts arises due to the extensive filtration process that occurs within the hierarchical structure of the agent. As such, concepts are not abstractions in the traditional sense; rather, their abstract quality emerges because they are derived only after a comprehensive filtering of specific object details. Consequently, the manner in which the agent engages with a concept, represented as a belief (or set of beliefs), transcends consideration of the spatiotemporal location of objects, or physical interactions with them as part of the agent’s environment. Thus, it is more appropriate to view a concept not as an abstraction, but as a concretely instantiated relationship between the agent and its environment. Such a relationship determines the application of the concept to novel objects based on the probability distributions encoded by the agent. While it is conceivable to abstract from this concrete capacity to introduce an abstraction—e.g., a logical concept—denoting the set of objects to which the agent responds using this concrete capacity, this abstraction assumes no fundamental significance in our account of concepts at this foundational level.

3.3 The Abilities Approach

In the above we have described concepts as abilities. This might call to mind an approach to concepts in philosophy, sometimes called the abilities approach (Margolis & Laurence, 2023). In the abilities approach, as most recently articulated in Kenny (2010), possession of a concept is identified with possession of an ability, where the notion of ability is taken as a metaphysical primitive. Here, the notion of an ability extends beyond animate agents and includes various entities (e.g., the ability of a key to unlock a door). It can be clarified through some distinctions. Firstly, while an ability can be exercised, it should not be conflated with episodes of its exercise: for instance, one’s ability to understand English transcends the cumulative instances in which one has exercised this ability; possessing the ability also implies the ability to understand English sentences that have not been encountered before. In this sense, it can be compared to dispositions, much discussed in the philosophical literature: dispositions also possess conditions for manifestation, although they may not be manifested at any given moment, and should not be identified with episodes of their manifestation. Further, dispositions manifest themselves only given the right conditions, or the right ‘partners’ (for an overview, see Choi & Fara, 2021). As has been pointed out, it is the careful distinction between a disposition and its manifestation that allows one to understand the potentially infinite manifestations of a disposition possessed by a finite system, even if, owing to its finitude, the system cannot actually produce all of them. Secondly, abilities are ‘carried’ by vehicles (for instance, in the example provided by Kenny (2010), the power/ability of whisky to intoxicate has as its vehicle the alcohol it contains). However, the ability should not be reduced to the vehicle: it is simply categorically different.
Again, this distinction calls to mind the distinction between a dispositional property and its categorical basis (usually understood as some causal basis) in the literature on the metaphysics of powers (see Choi & Fara, 2021, and the references therein).

With the Active Inference framework in mind, possessing a concept does indeed imply the possession of an ability in Kenny’s sense. In fact, one might take the active inference approach to be a particular realisation of a dispositional or abilities view of concepts. One can interpret the formalism of active inference, involving the system and its state space along with the variational principle governing trajectories in the state space, as a formal framework for articulating with precision the notion of a disposition. The correspondence is as follows. First, the state space of a system encodes all the degrees of freedom of the system. This includes both the structural (e.g., spatial) degrees of freedom as well as the ‘tendencies’ of the system, understood for instance as derivatives of the relevant function. The state space is thus a formal object which represents both the categorical and dispositional aspects of the system; moreover, since points in the state space are the possible states of the system, it also encodes the modal information concerning the possible states of the system. In a way, then, the state space models as a unified whole what we call the dispositional and categorical properties of the system. Since the trajectories of the system are understood as following a variational principle, one can see them as extending into the future in a definite way. This parallels the fact that the disposition the agent has displays a sort of potential infinity, extending into future occasions where it can be manifested. In the same vein, we may understand the notion of the exercise of an ability, or the manifestation of a disposition, given the right stimulus conditions or ‘partners’. For the case of the agent in particular, the relation between its dispositions and the right stimulus conditions or ‘partners’ of the disposition is captured precisely by the complementarity of the agent and its environment.
This complementarity has long been conceptualised through the concept of an affordance, an essentially relational feature of the environment relative to the agent. In the Active Inference framework, an affordance is that which possesses a high probability for the agent a priori (Linson et al., 2018) (e.g., beliefs about sequential changes in hidden states (Friston et al., 2012)). Furthermore, since there is temporal depth to the agent, an affordance is not just a relational feature of the environment for the agent at the present moment: it also encompasses the relation between the agent as presently given and future relations between agent and environment.

In Active Inference treatments of affordances (Bruineberg & Rietveld, 2014; Cisek, 2007; Gibson, 1977; Kiverstein et al., 2019; Veissiere et al., 2019), the two aspects of self-evidencing afforded by expected free energy are often read in terms of epistemic affordance (information seeking opportunities) and pragmatic affordance (preference seeking opportunities) (Parr & Friston, 2019). The appeal to affordances brings us back to the notion of conceptualisation as ‘grasping’ or ‘getting a grip’ on the world. Thus, the future-directedness and the multiplicity of possibilities open to the agent at any point in time are already part of this complex dynamic. With the notion of affordance as understood within the framework of Active Inference, one may for the moment sidestep metaphysical concerns regarding the distinction between categorical and dispositional, or the relation between a disposition and its categorical base. This contrast is not the salient one for the study of the dynamical evolution of the agent and its interactions with the environment.

3.4 Normativity Revisited

With the theoretical framework we have outlined, we can now revisit the issue of normativity while shedding light on the central features of concepts mentioned above.

3.5 Normativity

The accommodation of normativity becomes readily apparent within this framework. The sceptical challenge requires an answer to the following question: in virtue of what is the agent employing concept C rather than another concept? The present answer appeals to concepts as abilities, understood through the active inference framework: the agent is employing concept C because it possesses the ability to do what is required to exemplify possession of the concept C. In virtue of what is this ability that of concept C rather than C’, given that the exercise of the ability to employ C and to employ C’ is by assumption the same in the cases considered? Here, one appeals to the fact that an ability to do X, or a disposition, has a potential infinity of manifestations, as remarked before. In the active inference framework, this potential infinity is understood as the future trajectory of the system under the variational principle. In this way, one can see that the present resolution of Kripke’s sceptical problem has much in common with resolutions that appeal to a metaphysics of dispositions (see e.g., Martin & Heil, 1998). In such approaches, one also answers the sceptic by appealing to the disposition the agent possesses; one also appeals to the potential infinity of a disposition to manifest itself. However, this alone does not allow one to automatically answer the normative issue: in virtue of what ought the agent to continue following the rule or employing the concept in the way specified by the disposition? The approach of Martin and Heil seems unable to treat this point, in that their response to the normativity challenge does not take into account the difference between the dispositions of inanimate objects to react in a certain way and the dispositions of an agent to follow a rule or use a concept. In the case of inanimate objects, normativity does not seem to play any role. This indicates a weakness in Martin and Heil’s response to the normativity challenge.
The present account gives a more concrete view of how the appeal to dispositions might work to solve the problem: the normativity associated with concepts is mediated through the primitive normativity associated with an agent’s self-sustaining, self-evidencing exchange with its environment. From this perspective, the normativity associated with all concepts originates from the same source: it is the particular manner in which an agent maintains itself in a particular environment that imparts particular content to each concept. This content is acquired through the numerous intermediate steps delineated earlier: the agent’s continuous interaction with its environment, the transformation of such information into discrete categories, and the subsequent role of such discrete categories as discrete states (i.e., beliefs) within the generative model that constitute the agent’s conceptual ability. This dynamic view of agents is neither a first person, self-aware, reflective view nor a community-based view of normativity. The assertion that concept use is imbued with significance rather than being ‘indifferent categorization’ does not emerge through explicit or self-conscious reflection but through the embodiment of the agent and its continual self-maintenance. From the present point of view, to assert that concept use is charged with significance is essentially synonymous with stating that the agent is self-evidencing in a particular manner.

It follows that this account does not conflict with the importance of the social dimension in the constitution of many concepts (for a brief exploration of how Active Inference can be applied to address this social dimension, see Hipolito & van Es, 2022). Considering the actions, intentions, and plans ‘of the other’ introduces additional layers of complexity in the dynamics of the agent’s interaction with the world: the agent has to be able to entertain relations to other agents (for instance, to have beliefs about others’ beliefs, entertaining a ‘theory of mind’). However, this augmentation does not introduce any fundamental alteration to the Active Inference framework, where concepts are viewed as relational constructs within a dynamic context. To accommodate this aspect, one simply needs to extend the existing dynamics.

The present account also gives one a way of accounting for first-person awareness in the evaluation of the appropriateness of an application, if one posits a connection between awareness and the minimisation of free energy. In fact, it can accommodate two phenomena on this level. First, in learning a rule or the employment of a concept there is often a moment of insight: one suddenly ‘sees’ what is to be done. This moment of insight has been modelled under the active inference framework through the process of Bayesian model selection, which is the dynamics at the slowest timescale above (Friston et al., 2017a, 2017b). Second, the sense of right or wrong that accompanies the employment of a concept is tied to whether one has an optimal grip on one’s environment through the employment of the concept. Awareness of optimal grip may be theorised through the minimisation of free energy functionals (Bruineberg & Rietveld, 2014).

3.6 Kripke’s Case Reconsidered

To make the above point concrete, let us now return to Kripke’s scenario. Abstracting from the social situation in which Kripke first posed his question, we answer the following question: how do the dispositions, conceived as part of the dynamical state of the active inference agent, tell the agent what it ought to do?

As remarked above, the sense of the ought comes from the fact that the disposition in question is part of the self-maintaining dynamic of the agent; it is the imperative for this self-maintenance that gives the disposition the force of an ought. Now, since this self-maintaining dynamic, under the free energy principle, is nothing other than the trajectories of the agent minimising free energy functionals, one could alternatively say that the ought derives from the fact that the employment of one concept or rule rather than another minimises the free energy functionals. But a problem arises: by assumption, the results of following plus or quus would have been the same in the past cases the agent has engaged in. On this basis, when the agent next sees a cue for performing some operation on two numbers, why would identifying the state generating the problem as requiring one operation rather than another have a smaller free energy? Similarly, why would planning to execute one operation rather than another have a smaller expected free energy?

To respond to this problem, we have to better understand the meaning of the free energy functionals involved. Recall that it is possible to decompose them as follows (see, e.g., Friston et al., 2023):

$$ \begin{aligned} F\left( \vec{s},\vec{\alpha} \right) & = \underbrace{\mathbb{E}_{q\left(\vec{\eta}\right)}\left[ \mathcal{L}\left( \vec{\eta},\vec{s},\vec{\alpha} \right) + \ln q\left(\vec{\eta}\right) \right]}_{\text{Variational free energy}} \\ & = \mathbb{E}_{q\left(\vec{\eta}\right)}\left[ -\ln p(\vec{s},\vec{\alpha}\mid\vec{\eta})p\left(\vec{\eta}\right) + \ln q\left(\vec{\eta}\right) \right] \\ & = \underbrace{D_{KL}\left[ q\left(\vec{\eta}\right) \,\|\, p\left(\vec{\eta}\right) \right]}_{\text{Complexity}} - \underbrace{\mathbb{E}_{q\left(\vec{\eta}\right)}\left[ \ln p(\vec{s},\vec{\alpha}\mid\vec{\eta}) \right]}_{\text{Accuracy}} \end{aligned} $$
$$ \begin{aligned} E\left(\vec{\alpha}\right) & = \underbrace{\mathbb{E}_{p(\vec{\eta},\vec{s}\mid\vec{\alpha})}\left[ \mathcal{L}\left(\vec{\eta},\vec{s}\right) - \mathcal{L}(\vec{\eta}\mid\vec{\alpha}) \right]}_{\text{Expected free energy}} \\ & = \underbrace{D_{KL}\left[ p(\vec{s}\mid\vec{\alpha}) \,\|\, p\left(\vec{s}\right) \right]}_{\text{Risk}} + \underbrace{\mathbb{E}_{p(\vec{\eta},\vec{s}\mid\vec{\alpha})}\left[ \mathcal{L}(\vec{s}\mid\vec{\eta},\vec{\alpha}) \right]}_{\text{Ambiguity}} \\ & = \underbrace{\mathbb{E}_{p(\vec{s}\mid\vec{\alpha})}\left[ \mathcal{L}\left(\vec{s}\right) \right]}_{\text{Expected cost}} - \underbrace{\mathbb{E}_{p(\vec{s}\mid\vec{\alpha})}\left[ D_{KL}\left[ p(\vec{\eta}\mid\vec{s},\vec{\alpha}) \,\|\, p(\vec{\eta}\mid\vec{\alpha}) \right] \right]}_{\text{Expected information gain}} \end{aligned} $$
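The complexity–accuracy split in the first decomposition can be checked numerically for a toy discrete model, taking $\mathcal{L} = -\ln p$. The distributions below are invented purely for illustration:

```python
import numpy as np

# Toy discrete model: two hidden states eta, one fixed observation s.
p_eta   = np.array([0.7, 0.3])   # prior p(eta)
p_s_eta = np.array([0.9, 0.2])   # likelihood p(s | eta) for the observed s
q_eta   = np.array([0.5, 0.5])   # an arbitrary approximate posterior

# Variational free energy: F = E_q[ln q(eta) - ln p(s|eta) - ln p(eta)]
F = np.sum(q_eta * (np.log(q_eta) - np.log(p_s_eta) - np.log(p_eta)))

complexity = np.sum(q_eta * (np.log(q_eta) - np.log(p_eta)))  # KL[q || p]
accuracy   = np.sum(q_eta * np.log(p_s_eta))                  # E_q[ln p(s|eta)]

# The decomposition holds term by term: F = complexity - accuracy.
assert np.isclose(F, complexity - accuracy)
```

Making q_eta closer to the true posterior would lower F; the point of the decomposition is that F penalises divergence from the prior (complexity) even when the fit to observations (accuracy) is unchanged.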

We can thus see that in each case the minimisation of the free energy functional involves a trade-off. In minimising variational free energy, there is a trade-off between complexity and accuracy; in minimising expected free energy, the trade-off can be read as balancing expected cost against expected information gain; alternatively, it can be understood as jointly minimising risk and ambiguity. Thus, in deciding what state of the world is responsible for one’s observations, one’s posterior density has to maximise not only accuracy, which is intuitively the fit between one’s posterior density and observations, but has to do so in such a way that one minimises complexity, that is, divergence from prior beliefs, which might derive from evolutionary or developmental sources. Thus, fit with a sequence of observations is not the only criterion. Similarly, actions are selected not only for the reduction of ambiguity, which is roughly uncertainty about which rule is in play, but also for the minimisation of risk, which is uncertainty of outcomes in relation to preferences. In the case at hand, it is assumed that both rules could account for all the observations. Thus, the key factor in the agent’s selecting one rule rather than another seems to be its priors and preferences, which encode what the agent takes to be less complex. In terms of the exploration–exploitation trade-off, the expected information gain also does not really distinguish the two rules; the distinction thus comes from the expected cost of following a rule. It is also assumed, of course, that this dynamic is at work at the level of Bayesian model selection, which in an offline manner attempts to find the most parsimonious explanation accounting for the observations.

Now one might note that Kripke himself has objected that simplicity cannot be called upon to dissolve his sceptical puzzle; such simplicity includes what we have called complexity here. The reason is not the difficulty of finding an appropriate measure of ‘simple’. Rather, it is more basic: simplicity is simply irrelevant, for the sceptic is casting doubt on the existence of any fact of the matter concerning whether the individual is using one rule rather than another. If the individual cannot be said to be following one rule rather than another, ways of comparing the two rules would not matter. But it is easy to see that this rejection of simplicity relies on the prior rejection of the existence of any fact of the matter concerning what concept the agent is really applying; this, however, relies on a rejection of the dispositional view. Our appeal to simplicity thus is not supposed, by itself, to meet the sceptical challenge. It simply completes the version of the dispositional view defended here.

It is also evident that the social aspect could be incorporated, should one choose to do so. This could be in the form of a social interaction that scaffolds the learning of a concept, or via another agent’s role in evaluating the application of a concept. It is important to note, however, that this social interaction is an extension of an agent’s basic ability to evaluate its own conceptual operations. While it may influence the agent’s habits or preferences for actions, it is more appropriately seen as contextualising the dynamics underpinning an agent’s conceptual ability.

4 Conclusion

In summary, this paper established an Active Inference view of concepts with the primary objective of highlighting the inherent normativity of concepts. Within this framework, concepts are construed as emerging from the unique dynamics governing an agent’s interactions with its environment. This distinctive viewpoint contrasts with established approaches to normativity, which typically situate normative aspects either within a social dimension or within the realm of self-aware, reflective, and introspective, first-person dimensions. In our framework, we trace the source of a primitive form of normativity to the self-organising structure (or nature) of the agent. Additionally, we have undertaken a comparative analysis between the Active Inference view and existing paradigms for understanding concepts, including those of dispositions and abilities. We contend that the Active Inference view should be regarded as a more concrete and improved version of these approaches. With the Active Inference view, concepts are normative in that they are intrinsically connected to the self-maintaining nature of an agent, whose very structure implies an evaluation of the concepts it employs.