Intentionality and Symbolic Cognition

The evolution of hominin language poses a unique set of problems for scientists and philosophers. While language was undoubtedly of great importance for the evolution of Homo sapiens, there is very little agreement as to when, how, or why language developed, or even what it is. Language, before being written down, leaves few direct traces in the archaeological record. Furthermore, the gap between other great ape and modern human language competencies has been characterized as “glaringly large” (Planer & Sterelny, 2021), and the failure to understand it as “an embarrassment for evolutionary theory” (Premack, 1986: 133) and even as “the hardest problem in science” (Christiansen & Kirby, 2003). While small but important pieces of this massive puzzle have in recent decades fallen together, some painstaking and inevitably controversial reconstruction is required to come to an understanding of around five to nine million years of language evolution (Rogers & Gibbs, 2014).

A concept at the heart of ongoing discussions on the origins of language is that of symbolism. Over the past few decades, archaeological discoveries have definitively overthrown the longstanding idea of a sudden “creative explosion” (Pfeiffer, 1982) in the Upper Paleolithic. In particular, the decorative use of eagle talons (Frayer et al., 2020; Rodríguez-Hidalgo et al., 2019), various possible burial sites (Solecki, 1975; Vandermeersch & Bar-Yosef, 2019), and the use of pigment, pendants, and perforated shells (Bouzouggar et al., 2007; Henshilwood et al., 2002; Hoffmann et al., 2018; Marean et al., 2004) have pushed dates for complex behavior back significantly. Recently, Prévost et al. (2021: 1) claimed “symbolically mediated behavior” for the Middle Stone Age (Africa, MSA)/Middle Paleolithic (outside Africa, MP) Homo sapiens, and Hoffmann et al. (2018: 1) suggested that the “roots of symbolic material culture may be found among the common ancestor of Neandertals and modern humans, more than half-a-million years ago”.

At the same time, archaeological debates are said to suffer from “loose reference to symbolic culture” (Wynn et al., 2016: 15) and “poverty of appropriate interpretive concepts” (Wynn & Coolidge, 2010: 5). Pigment use, beadwork, eagle talons, and figurines are almost standardly assumed to indicate symbolism (Henshilwood & Dubreuil, 2009: 50), even though none of them, in fact, clearly live up to the still widely endorsed criteria of C. S. Peirce. In ongoing debates in primatology and ethology, on the other hand, which are not bound by the study of material culture, it is more commonly suggested that certain non-human animals might be capable of some form of symbolic cognition. For instance, the so-called directed scratch of Ngogo chimpanzees has been interpreted as partially symbolic (Pika, 2012; Pika & Mitani, 2006, 2009), and other chimpanzee populations have been shown not to use this gesture, which could indicate it is culturally transmitted (Wilke et al., 2022). Also, Zuberbühler et al. (2011: 26) argue that primates “produce and understand functionally referential calls,” and Tomasello and Call (2019: 462f.) suggest certain great ape pantomime (usually iconic in origin) could represent “meaningful social acts symbolically”. If so, symbolic cognition in the hominin lineage might go back much further than most archaeologists nowadays assume, possibly well into the Lower Paleolithic/Early Stone Age (LP/ESA).

The present paper uses some conceptual tools from phenomenology to reflect on early hominin social intentionality in the LP/ESA, around the time of the earliest known stone tool industries, at this point identified as the work of Kenyanthropus, Paranthropus, and/or Australopithecus (Toth & Schick, 2018: 7–9). My principal focus lies on the cognitive-intentional structure of pointing. In speaking of intentionality, I have in mind the phenomenological concept, which refers to the first-person viewpoint and the way it is “directed” at or “about” something (van Mazijk, 2019, 2020, 2022). I define a social intentional act broadly as an intentional act that is directed at and apprehends the mental states of another. This aligns with the equally broad definition of intersubjectivity given by Zlatev et al. (2008: 1), “as the sharing of experiential content (e.g., feelings, perceptions, thoughts, and linguistic meanings) among a plurality of subjects”. A social act can then either be a mindreading act without communicative intent, such as in following another’s gaze or apprehending their (involuntary) bodily expression, or an act with communicative intent, such as pointing.

Departing from phenomenology primarily means taking intentionality and the first-person viewpoint seriously (see also van Mazijk, 2022 for a discussion of the role of intentional analysis in cognitive archaeology and related disciplines). Intentional analysis is geared toward the examination of the minds of past hominins and the way they experienced their worlds (called “intentional world-having” in van Mazijk, 2022). An advantage of the concept of intentionality is that it already plays a significant role in discussions on the “Theory of Mind” (ToM) of human infants and non-human primates, although it is there used exclusively in reference to social intentionality (e.g., Tomasello, 2010, 2021).

A fair amount of phenomenological work has been critical of ToM approaches and has defended alternative accounts, often based on concepts of embodiment or action (see e.g., Gallagher, 2007; 2008; Gallagher & Hutto, 2008; Zahavi, 2001, 2008, 2011, 2014b; Williams 2017; Aston, 2019). These contributions often focused on semantic issues regarding concepts like simulation, analogy, inference, mental content, and/or our (in)direct access to other minds. They also primarily targeted early versions of ToM (e.g., Goldman, 2002; Leslie, 1987). However, these early versions of ToM are quite different from more recent ones, for example as developed in Tomasello’s (2010, 2021) recent work, which centrally involve so-called “orders of intentionality,” and which are widely applied in ongoing social cognition research. As a consequence of this predominantly critical stance, phenomenological literature so far has largely overlooked the significant overlap which in fact exists between intentional analysis in the phenomenological (in particular Husserlian) tradition and these recent empirical applications which work with orders of intentionality.

This paper starts with a brief survey of certain social intentional acts thought to be in place at the start of the LP/ESA. I first discuss (a) bodily expression, (b) gaze following, and (c) attention-getters, which are performed by extant chimpanzees, and I analyze the orders of intentionality involved in them. In the follow-up section, I zoom in on (d) pointing gestures and argue that, in spite of it appearing as a “natural” gesture (Tomasello, 2010: 107, 112), pointing involves key elements that are typical of symbolic cognition. In particular, it involves a “social intention” (Tomasello, 2010: 29, 51) which is not codified indexically or iconically in the signaling behavior itself. To decode this social intention (“what does the other want me to do?”), the interpreter must resort to a shared contextual understanding, which I call the “shared practice horizon”. The final part reviews some indications for the use of pointing by Early Stone Age hominins. I conclude that pointing is more complex than is standardly acknowledged and that it may represent a missing link in the evolution of hominin symbolic cognition.

Expression, Gaze Following, and Attention-Getters

In this section, I briefly discuss three basic social acts that were likely available to hominins at the start of the LP/ESA, in order to assess their respective intentional structures. This list of acts is not exhaustive, and the discussion mainly serves as a prelude to the one on pointing in the next section. The acts discussed here are (a) bodily expression, (b) gaze following, and (c) attention-getters. The use of these by LP/ESA hominins is (or should be) relatively uncontroversial, as they are all used by extant great apes (at least by chimpanzees). In the follow-up section, I elaborate in more detail on pointing gestures and how they differ from these acts.

In phenomenology, it is common to distinguish between two types of mental states: those “lived through” and those which are “directed at” something. For instance, I feel pain, anger, frustration, etc. These are mental states of the first kind, which are also called “immanent”. Here we also count being tired, feeling gloomy, down, worried, etc., in other words, moods which “color” what we experience. Today, these are often generically referred to as “phenomenal” or “qualitative” contents (Lycan, 2019). By contrast, the second type of mental state involves intentional directedness at something. For instance: I see a table, I think of you, I imagine a horse, I hear a train go by, etc. Such mental states are intentional acts (see also Husserl, 1984: 396–410).

It is typical of intentional acts that they present things as existing independently of me. For instance, the chair I see is presented as a part of the world outside of me. In other words, the object of the intentional act transcends my mental state. The pain I feel or the feeling of being tired, on the contrary, do not (necessarily) present things that are outside of my mental states. Mental states of this type do not—at least not as clearly—establish a relation to a transcendent (transcending my mental state) object, and can therefore instead be called immanent. It is thus only in intentional acts that the world is made to appear.

We can use this basic division in mental states to better assess some of the social act capacities that were likely available to hominins at the advent of the LP/ESA. In general, all vertebrates have a high degree of cephalization, possess a central nervous system as well as an endocrine system, and have specialized sensory and motor cells. As such, vertebrates are affected by environmental stimuli through their senses, to which they respond in complex ways. Presumably, many (if not all) vertebrates, and certainly all hominins, “live through” various affections in the so-called immanent sense just explained, in other words, they have phenomenal states.

Moreover, behavior can often be seen as “expressing” the complex immanent states others are in. For instance, we might say that flight behavior expresses a state of fear, limping a state of pain, etc. Thus considered, the external behavior expresses an immanent state in overt behavior. Many such basic behavioral expressions—as of pain, pleasure, hunger, anger, etc.—need not involve communicative intent. Still, others may interpret expressions, based on what has been called a “corporeal schema” or “bodily mimesis,” understood as “a capacity to use our bodies as resonance boxes, so to speak, in feeling the emotions of others” (Zlatev, 2007: 124; Zlatev, 2018: 46).

Importantly, such an embodied understanding of another’s states usually does not presuppose an understanding of the other’s intentional states. Many vertebrates can observably apprehend emotional states “immediately” through bodily expression while being more or less unable to follow the other’s gaze, so as to be aware of what the other is oriented at. For example, cats rarely follow gazes (only occasionally when food is involved; Koyasu et al., 2020), but they respond instantly to threatening bodily expressions. Something similar holds for human infants before the age of 6 months, who are very responsive to the facial expressions of caretakers, but cannot yet follow gazes (Gergely et al., 1995; Tomasello, 2021). For this reason, an interpretation of bodily expression qualifies, in my view, as a so-called “first order act” (see also Cole, 2016: 162; Gamble et al., 2018: 53f.). That is to say, it does not require a recursion of the other’s intentional state within the interpreter’s intentional state, as in “I see that you see X,” which is the case in gaze following, as I discuss in a moment.

Expression can also pair with communicative intent, as is likewise observable in many vertebrates (and perhaps invertebrates, see Taylor & Patek, 2010). Phenomenologically speaking, communicative intent means, first of all, establishing an intentional relation to “another,” who is tacitly apprehended as an expressive subject like myself (see Husserl, 1970b: 217; Zahavi, 2014a). It is thus an intentional act of being directed at another. This is unlike in so-called “communicative displays,” which are physical characteristics designed to affect the behavior of others (Tomasello, 2010: 14f.), and which are therefore not intentional or communicative acts in the phenomenological sense. Second, a communicative intention involves the voluntary display of an expressive action to the other, in an attempt to influence their behavior.

Even when an expression is thus voluntarily directed at another, it is usually not necessary to suppose an intentional state is communicated, which would make its interpretation a second order act (on behalf of the interpreter). This is because expression frequently does not communicate anything about objects in the world. Instead, an immanent state is expressed in an attempt to influence the other’s behavior. For instance, frustration or aggression might be expressed behaviorally toward another to “manipulate” their behavior (e.g., a chimpanzee’s aggressive behavioral display toward a conspecific). No object needs to be addressed in such an act, and neither party needs to have an understanding of the other’s intentional state for communication to be successful. Such expressive behaviors are often considered to be genetically fixed (e.g., Tomasello, 2010: 43f.), which could be true even for most primate vocal sounds (Genty et al., 2009; Tomasello, 2010: 8–20; Wheeler & Fischer, 2012; Price et al., 2015; Tomasello & Call, 2019; Griebel & Oller, 2021), although this is contested (Slocombe & Zuberbühler, 2006; Tomasello, 2010: 15–26; Zuberbühler et al., 2011; Schel et al., 2013).

Another act worth considering briefly, which is also widespread among vertebrates (Zeiträg et al., 2022), is gaze following. For instance, nine-month-old infants, chimpanzees, and wolves are capable of following gazes or head directions (Pepperberg, 2012; Range & Virányi, 2011; Tomasello et al., 2005; Ueda et al., 2014). Gazing usually lacks communicative intent on the gazer’s side. On the interpretative side, however, it requires an interest in the other’s object-directedness. Gaze following therefore requires, like expression with communicative intent, an intentional relation to another. In addition to this, it represents a crucial shift from “immanence” to “transcendence” from the mindreader’s viewpoint. For unlike expression, gaze following is an interpretation of the intentional state of the other, namely the object or situation they are directed at, rather than, so to say, how they “feel” about it, as was the case with expression. Gaze following therefore involves intentional recursion: the intentional directedness of another is now included within the mindreader’s own intentional directedness. This makes it a so-called second order act, as in “I see that you see X,” in contrast to the first order structure typical of many expressions.

Consequently, gaze following must be said to involve a different interpretative activity than the interpretation of bodily expression. This is also captured by saying that the gaze is indexical (van Mazijk, 2022). When I follow another’s gaze, it is because I infer a causal relation between the direction of the gaze and the intentional state of the subject in question. There is no similar inference in apprehending bodily expressions, a capacity which rather appears to be “immediate” and “hardwired” (Pika & Mitani, 2009: 167). Gaze following therefore seems to rely on an understanding of the covariation of gaze direction and intentional object awareness, which works simply because the other cannot help being aware of what they look at. This makes it relatively easy to infer the intentional state from observing gaze direction, at least compared to more complex (third order) acts like pointing, to which I turn later.

The last communicative act worth discussing here is the so-called “attention-getter” (Tomasello, 2010: 27). Unlike gaze following, the attention-getter seems, according to Tomasello, to be exclusive to primates, perhaps even to great apes. In short, attention-getters are individually learned behavioral displays that are used to attract the attention of others. An individual may notice that certain extravagant behavior draws attention, and subsequently learns to exploit this in diverging settings. Tomasello and Call (2019: 466) note that the attention-getter does “not [involve] a third object but still involves something in the direction of reference,” and that it manipulates “the attention of the recipient to specific entities”.

In my view, it is questionable whether one should speak of “reference” or “entities” here. It does not seem necessary to suppose that the communicating great ape is intentionally directed at themselves as an object, in order to then establish shared intentionality to that object, as would be the case in pointing to the self, something non-human great apes do not do. Moreover, to interpret a pointing gesture, as I show later, the interpreter must engage in third order intentionality (“I see that you want to show me X”), and must subsequently wonder why the other manipulates their attention in this way.

The attention-getter, by contrast, does not seem to require third order (“I see that you want to show me X”) or even second order (“I see that you see X”) intentionality from the interpreter’s viewpoint. This is because the interpreter need not understand the communicator’s intentions at all in order for their attention to be successfully manipulated. After all, they simply respond to a loud sound that was just made. It is also not necessary to suppose that the communicator considers themselves as an object in the world, as would be the case in pointing to oneself in order to establish shared intentionality. It is instead more likely that the communicator has a pre-reflective understanding of what it is like to be looked at—an experience chimpanzees are thoroughly familiar with, as it is also needed for dyadic pantomime. They subsequently exploit certain attention-drawing behavior to achieve this familiar situation of having the other’s attention, which does not require them to consider themselves as objects.

In short, then, attention-getters do not seem to resemble pointing gestures in terms of intentional structure, as it is not necessary to suppose that they involve reference to a transcendent object, or third order intentionality. This makes it, in my view, unlikely that the attention-getter would be a “missing link” (Tomasello & Call, 2019: 466) in hominin language evolution.

I have so far discussed expression, gaze following, and attention-getters. We can postulate with reasonable certainty that early LP/ESA hominins were capable of at least these three acts. This much should be uncontroversial because we know that extant non-human great apes are capable of these (Tomasello & Call, 2019). That being said, there is little agreement as to whether or in what ways non-human great apes can refer to events or objects. Non-human great apes are certainly good at following gazes, but pointing is often suggested to be “virtually absent in wild chimpanzees” (Leavens et al., 2005; also Tomasello, 2021). Some domesticated mammals follow the pointing gestures of humans (Miklósi & Soproni, 2006), including elephants (Smet & Byrne, 2013) and captive non-human great apes (Tomasello, 2010: 34–48), but only humans (and perhaps ravens, see Pika & Bugnyar, 2011) actively initiate pointing to conspecifics; any observed trunk “pointing” of elephants is likely a mere expression (see Smet & Byrne, 2020).

In any case, expression, gaze following, and attention-getters do not seem to allow such reference; they are not triadic acts. As the previous expositions showed, expression, as I defined it, concerns the expression of an immanent state, and as it does not communicate intentional states, it is a first order act. Gaze following, on the other hand, is a one-sided second order act which interprets the intentional directedness of the other. Finally, attention-getters are equally one-sided, and it is not necessary to suppose that they involve reference to the self as a transcendent object.

In the next section, I offer a more detailed intentional account of pointing gestures. I argue that pointing is unique in virtue of being a triadic act with a social intention that the interpreter cannot decode by considering the signaling behavior only. The social intention, in other words, is not codified indexically or iconically in the behavioral display. As a result, the interpreter must resort to a shared context awareness, something I call a “shared practice horizon”. Figure 1 offers an overview of the discussion so far:

Fig. 1. Overview of first order and second order social acts, which are pre-symbolic

The Intentionality of Pointing

Archaeologists often attempt to settle important debates about early hominin behavior or cognition by referring to a single, vaguely specified capacity for symbolism, which hominins are then alleged to have either possessed or not. For instance, Middle Pleistocene ochre findings and Late Pleistocene decorative items have been used to infer modern symbolism (Hoffmann et al., 2018; Zilhão et al., 2010), syntactical complexity (d’Errico & Vanhaeren, 2012), or modern symbolic behavior in general (Vanhaeren & d’Errico, 2006; Prévost et al., 2021).

Although there are exceptions (e.g., Planer & Sterelny, 2021), the archaeological literature does not always neatly distinguish various elements often associated with symbolic behavior, such as Peircean arbitrariness or conventionality, “unbounded discourse” (meaning that everything can be named, e.g., Rappaport, 1999: 4), “free retrieval” (suggesting that lexical items are available for use at any time, see, e.g., Tallerman, 2011: 181), or “infinite generativity” (the capacity for syntactic recursion, see Hauser et al., 2002: 1574). Debates are occasionally said to suffer from “loose reference to symbolic culture” (Wynn et al., 2016: 15) and “poverty of appropriate interpretive concepts” (Wynn & Coolidge, 2010: 5). Admittedly, classic phenomenologists of the early twentieth century often did not separate these various elements either. Cassirer, Husserl, and Heidegger all studied the “essence” of the human being and concluded that humans are essentially symbolic or speech-capable beings, with a free capacity to capture the experienced world in language (see Cassirer, 1972: 36–52; Husserl, 1983: 295; 1997: 37–42; Heidegger, 2012: 203–210). Husserl even noted at one point that “the human surrounding world is [in terms of its general intentional structures] the same today as always” (Husserl, 1970a: 378).

In this section, I suggest that pointing involves key elements shared with symbolic cognition, although it lacks many of the more demanding characteristics listed above (such as unbounded discourse, free retrieval, and infinite generativity). In my view, pointing could be seen as symbolic because the social intention of the act is not codified in the signaling behavior. More precisely, it is symbolic insofar as the message is not encoded indexically or iconically. This negative definition of a symbol bypasses the often vague concepts of conventionality and arbitrariness, and is more commonly used in primatological literature. Instead, in pointing, the meaning is in each case deferred to what I call a shared practice horizon. Relatedly, and unlike with gaze following, expression, and attention-getters, interpreting a pointing gesture requires third order intentionality on the interpreter’s side (“I see that you want to show me X”), and the interpreter needs to wonder why the other is trying to establish shared intentionality to something. The answer to this why-question is not codified indexically or iconically in the behavior, and thus requires a consideration of the shared practice horizon, a kind of “what are we doing” background awareness.

Let us first return briefly to gazing. As discussed, gaze following, unlike expression, requires an inference based on the covariation of direction of gaze and intentional state. In other words, by seeing someone’s direction of gaze, one infers what they are conscious of. This makes the act indexical, in contrast to a symbolic act, which should have a meaning which is “not codified in the behavior” (Pika, 2012: 578). The inferential act involved in gaze following further requires, as we saw earlier, two orders of intentionality (“I see that you see X”), and only one order on the side of the gazer (they need not be socially engaging). Furthermore, we saw that this act is different from more widespread expressive modes of communication, as the latter tend to communicate immanent, not intentional, mental states, and are therefore first order acts.

Unlike gazing, as I elaborate in what follows, pointing has a meaning which is not codified in the signaling behavior. This is not how pointing is traditionally understood. Most notably, Peirce understood pointing as an indexical act. This is probably because he focused mainly on the referential intention, which is indicated by the direction of the finger, hand, or arm. More recently, Leavens et al. (2005: 1) also noted that pointing “is not arbitrary,” and Tomasello (2010: 145–153) argued that pointing is not symbolic. Tomasello also suggests that pointing continues to be used by linguistic infants in early stages precisely because it is not symbolic, whereas the use of pantomime declines as it would compete with symbolic speech. In my view, pointing is more symbolic than most pantomimic acts, and this is presumably why linguistic infants continue to use it. Put differently, because pointing and early speech involve largely the same intentional structure, pointing can initially serve to support symbolic speech, and thus overlaps with its early development (ontogenetically and phylogenetically).

It can first be noted that, unlike with gazing, there is nothing in the pointing gesture which immediately correlates with object awareness. The same gesture could, after all, be a mere expression, and this is indeed how limb movements are generally interpreted at this stage. A straightforward causal inference is for this reason not possible, as a wave of the arm simply does not covary with any object awareness in the way gazing does. It is therefore in my view likely that pointing was at first combined with gazing, as the gaze is already apprehended as indicative of object awareness, through covariation. Gazing, then, may at first have served to “underline” the pointing gesture, such that both “point” to the same object. Consequently, the interpreter can partially “offload” the referential intention of the act onto the gaze, which is already understood. As this is an extension of gaze following, the referential intention can still be regarded as codified indexically, as Peirce suggested.

As discussed, gaze following requires a second order intentional act on the interpreter’s side, as they need to apprehend something like “I see that you see X”. The gazer, on the other hand, need only engage in first order acts, as they need not be socially engaging at all. Pointing, by contrast, is generally a cooperative act; it involves the intention of sharing, informing, declaring, showing, and the like. However, as Peirce may have failed to sufficiently acknowledge, in real-life practical activity, pointing never serves merely the purpose of referring to things. Instead, there is a social implication involved. Shared intentionality is usually established for a practical, action-involving purpose. In other words, if someone points out something to me, they generally want me to do something. Without such a social intention, there is no “point” to pointing. It is certainly not coincidental that great ape (non-referential) gestures are almost exclusively imperative (Pika & Mitani, 2009: 169; Tomasello, 2010: 41).

Importantly, unlike pantomimic acts, which are generally considered iconic in origin (although the cognitive mechanisms of pantomime are debated, see e.g., Tomasello, 2010, 2021; Halina et al., 2013; Byrne et al., 2017), the social intention of pointing is not codified in the behavior as it is being displayed. For instance, if I point to a tree, the interpreter needs to wonder why I am doing this, or what the meaning of this act is. It might be that I want us to appreciate the tree’s beauty together, that I want to gather wood with them for the fireplace, or something else. None of this is codified in the signaling behavior itself. Instead, the social intention varies depending on the context. Interpreting the act successfully therefore requires a consideration of a shared context. I suggest that this shared context is at first a background understanding of a shared practice, a kind of “what are we doing” awareness which I call a “shared practice horizon” (see also van Mazijk, 2023).

The “meaning” of a pointing act, therefore, differs in each case not just because the referential objects differ, but because the social intentions (“What action does the other want me to perform?”) differ. To understand this social intention, three orders of intentionality are required on the interpreter’s side. Compare “I see that you see the food” (two orders, by gaze following) to “I see that you want to show me the food” (three orders, by cooperative pointing). While chimpanzee pantomime can also involve social intentions, such acts are generally considered iconic in origin, and therefore less in need of shared practice horizons, as the message is here codified in the signaling behavior through resemblance. Pantomimic interactions are also generally dyadic, rather than triadic, as pointing is.

To decode the social intention involved in pointing, something besides the sign itself has to be considered, namely what I call the “shared practice horizon”. In phenomenology, the term “horizon” is used to refer to any kind of background awareness of what is not immediately perceived yet somehow made co-present. Husserl famously distinguished between the “inner” horizon of things (their currently unperceived sides) and their “outer” horizon (their unperceived surroundings), but he used the term in diverging ways, for instance in speaking of a “horizon of familiarities” and “the world as horizon” (Husserl, 1970b: 31; Husserl, 1997: 197; 2001: 40–42). For example, in perceiving a chair, I am tacitly aware of its unperceived sides (inner horizon), its place in my living room (outer horizon), and its belonging within a larger cultural world in which I stand with others (world horizon).

Heidegger used the term horizon more explicitly in relation to a background understanding of how things are used (are “ready-to-hand”) within an ongoing practice or “equipment-context” (Heidegger, 2001: 109, 340, 464). Building on this idea, a shared practice horizon can be said to provide a cognitive frame for inferential reasoning, ordering new experiences, and outlining appropriate courses of action. For instance, when I go shopping, I am usually not intentionally aware of a practice of “going shopping”. Yet such awareness figures here as a horizon for the interpretation of sensory information and for initiating appropriate action. A different practice horizon—say, when I am to examine the building’s construction, or when I plan to rob the store—will result in a different organization of the perceptual field, highlighting different aspects of it for me, and consequently outlining different future courses of action. A horizon, then, is not a thing I am intentionally directed at, but a background framework which guides the ongoing interpretation of what I am directed at.

Interpreting a pointing gesture, even when it is supported by gazing and expressive behavior, presupposes a background horizonal awareness of a shared practice in order for the social intention to come across. This shared practice horizon functions as the primary form of socially enacted common ground enabling the use and interpretation of pointing and symbolic acts. For instance, the communicative success of pointing to an anvil in an Acheulian tool-making session could depend (for Early Stone Age hominins) on a shared understanding of the ongoing practice of making stone tools. It is only because both parties stand in the same practice horizon (making tools), that the interpreter can grasp the imperative force of this gesture (“I want you to use that anvil”). This requires both parties to assume that they are, at least to some extent, cooperatively engaged in the same practice. If, by contrast, the recipient lacks the appropriate practice horizon, this may lead to communicative failure. For example, someone’s pointing to an anvil when I am preparing for sleep does not allow me to disclose whatever the meaning of the gesture at that point may be, as we are not sharing the same practice horizon. Interpretation of the social intention, then, depends on a shared practice horizon, as the social intention is not codified in the behavior.

One major advantage of practice-embedded symbolic acts, pointing included, in early symbolic activity would be that they make symbol use less cognitively demanding than the more “free” symbol use we are tempted to associate with symbolism (see van Mazijk, 2023). The concrete meaning of the act could be said to be “offloaded” onto the environment, or better, onto the shared practice horizon. For instance, in a social practice of tool making, there is already a shared understanding of bodily expressions, arm movements, etc., and the intentions and goals they serve in this context. This shared practice functions as a background horizon for the interpretation of non-iconic and non-indexical acts, and effectively delimits the scope of possible meanings any act may have.

The idea that pointing relies on pre-conceptual shared practice horizons is further supported by recent studies of modern human infants, who have been shown to interpret pointing gestures of adults differently when engaged in a shared practice with them (Liebal et al., 2009). Fourteen-month-old infants use their background understanding of a shared practice to interpret the pointing gestures of adults, and they interpret the same gestures by adults who do not partake in this practice differently. The phenomenological explanation for this is that they are horizonally aware of a shared practice, and of the absence thereof, in the case of the non-partaking adult. They successfully interpret the gestures of the partaking adult by relying on the shared practice horizon as an interpretative framework. Fig. 2 gives an overview of what has been discussed so far.

Fig. 2. Third order acts rely on shared practice horizons, as the social intention is not codified in the signaling behavior

Pointing in the Early Stone Age?

We have so far distinguished four basic social intentional acts. First, (a) bodily expression, defined as a first order act which communicates an immanent mental state. Second, (b) gaze following, which involves the interpretation of another’s gaze as indicating their intentional state, and which is therefore a second order act. Third, (c) attention-getters, which usually serve to draw attention to oneself, and which like gaze following lack a proper reference or third order intentional structure. Fourth, (d) pointing gestures, which are cooperative acts involving third order intentionality on the interpreter’s side, and which rely on shared practice horizons.

I suggested that of these acts, only pointing involves key elements that are characteristic of symbolic cognition. Of the four acts discussed, only pointing counts as a triadic act, and it alone involves a social intention which is not codified indexically or iconically in the overt behavior. The interpreter of a pointing gesture must (i) engage in third order intentionality, (ii) wonder why the other is trying to establish shared intentionality to something, and (iii) solve this riddle by considering the shared practice horizon.

The question can now be raised whether early LP/ESA hominins were capable of pointing gestures, besides expression, gaze following, and attention-getters. There is increasing scientific interest in the view that there was an initial phase of gestural communication in early LP/ESA hominins, prior to verbal speech (see for instance Deacon, 1997; Corballis, 2002, 2003; Pollick & de Waal, 2007; Zlatev, 2018; Tomasello, 2010, 2021; Planer & Sterelny, 2021). This makes sense for a number of reasons, some of which we have already touched upon. For one, we know that non-human great apes have more difficulty with pointing (which is gestural) than with gaze following, but nonetheless spontaneously use it in captivity (Leavens et al., 2005), suggesting they possess the relevant cognitive infrastructure, but lack the social motivation to use it in the wild. Studies in developmental psychology further show that pointing is an important communicative skill for modern humans, which is attained by infants at twelve months or earlier, after gaze following (at around six months) and prior to speech (Tomasello et al., 2005: 683; Tomasello, 2010: 154).

Another, more systematic reason to invoke a phase of pointing prior to symbolic speech is that without it, the gap between gaze following and attention-getters on the one hand and symbolic speech on the other is arguably too big to bridge, at least from the viewpoint of intentional analysis. While gaze following concerns intentional states as do speech acts, gaze following does not similarly include communicative intent, let alone a cooperative motive (non-human great apes mostly use it competitively). Attention-getters, on the other hand, are generally not used to refer to things in the way speech acts do, and neither does pantomime. These important traits are, however, shared by pointing. Pointing therefore resembles simple symbolic speech acts in terms of intentional structure, while it shares the referential intention with gaze following. It could thus function as a bridge between second order acts such as gaze following and symbolic speech acts, as it, in fact, does in human ontogeny.

There are, in short, good reasons to suggest the use of pointing by Early Stone Age hominins, in particular when departing from a phenomenological and/or ToM viewpoint. In this final section, I briefly consider other indications, mainly from archaeology and paleontology. The point of this section is not to assess the “minimal competence” (Killin & Pain, 2023; Wynn & McGrew, 1989) of Early Stone Age tool production, nor to comprehensively overview the many ongoing controversies regarding these early industries. Instead, I depart in what follows from the common sense assumption that “sophisticated thinking of ancient hominins may have been in domains that leave no archaeological signature” (Killin & Pain, 2023), and I overview some recent scholarship which offers further support, albeit often tentatively, as is inevitable for these time periods, that relatively complex communicative strategies such as pointing may have been a part of the early hominin cognitive repertoire.

In ongoing debates, suggested dates for the evolutionary origins of symbolic cognition vary widely. Much of this diversity derives from different (often unspecified) usages of “symbolic”. There is increasing empirical evidence, however, suggesting the early use of relatively complex communicative strategies by Homo erectus in the LP/ESA, perhaps some 1.8 mya. Some general indications include (tentative) evidence for dramatic cognitive expansion during the LP/ESA, with brains rapidly doubling in size (see DeFelipe, 2011; Potts, 2011), for increased technological and niche intensification (Van Arsdale, 2013), cooking and other food processing innovations (Joordens et al., 2009; Wrangham, 2009), long-distance hunting (Henrich, 2017), as well as social adaptations such as cooperative breeding and secondary altriciality (Cofran & Desilva, 2015; Isler & Van Schaik, 2012), all of which suggest the need for new communicative strategies. Further research may indicate changes in body composition (Leonard et al., 2003; Henrich, 2017), brain lateralization, hyoid bone adaptations for increased speech capacities (Capasso et al., 2008), cortical growth required for language (Hillert, 2021), and expansion of Broca’s area (with Homo habilis) associated with gestures, increased vocalization (Corballis, 2003), and procedural know-how (Henrich, 2017).

However, in light of the account of pointing provided earlier, namely as a kind of bridge from gaze following to symbolic forms of communication, it seems plausible to push back dates for basic gestural symbolic acts slightly further than this, possibly to around the time of the earliest stone tool production in the Lomekwian and Oldowan industries, some 3.3 mya and 2.6 mya, or at least shortly thereafter (de la Torre, 2019; Flicker & Key, 2023; Lewis & Harmand, 2016; Sahle & Gossa, 2019). While the archaeological record does not permit direct inference for such communicative strategies, there are some indications (apart from those already mentioned) which make this hypothesis worth considering.

It is worth noting that, apart from making stone tools, these early Oldowan hominins needed to carefully select their rock materials (Toth & Schick, 2018: 14f.), and they likely also developed cooperative foraging (Sterelny, 2012), both of which may involve coordinated social action and possibly pointing to that end. Moreover, they were largely bipedal, and thus had free hands for gesturing, while having ape-like vocal tracts, and therefore could not perform symbolic speech acts well. Also, lithic resources appear to have been collected at notable distances, which requires some planning and which could be aided by pointing. In short, while early stone tool production may have “remained largely nonverbal” (Wynn & Coolidge, 2016: 204), various actions involved would certainly have benefited from the most basic referential act that is available to primates, namely pointing.

For extant apes (humans included), skill acquisition tends to be a social process. Consequently, the development of sophisticated knapping skills “provides an important indication of evolving social cognitive capabilities” (Stout & Semaw, 2006: 317). Hiscock notes that the complexity of the Oldowan tools shows that they may have required “skilled individuals who have been taught and practiced for extensive periods” (Hiscock, 2014: 27). Morgan et al. (2015) also note that technical skills were “learned and required considerable practice,” and that the geographical spread of the tools indicates social transmission and possibly cultural variability. Their experimental results with modern humans are further taken to indicate a “gene-culture co-evolutionary account of human evolution in which reliance on Oldowan tools would have generated selection favouring teaching,” although this interpretation is contested by others (Snyder et al., 2022; Tennie et al., 2017).

Basic cultural transmission can involve different social intentional capacities, and it certainly need not imply symbolic speech, for which there is somewhat more support around the time of Homo erectus and the Acheulian industries. It seems plausible, however, that the earliest stone tool cultures involved at least the ability to share attention to things, as well as to coordinate social activities to allow for basic cooperation. While learning by imitation, as in chimpanzee nut-cracking, does not necessarily presuppose such capacities, it seems likely that group reliance upon stone tool technologies would have selected traits that support the social transmission of technical skills, as Morgan et al. (2015) suggest. If this is so, then it is worth considering that the presence of tool industries boosted the cognitive-intentional capacities involved in the most basic gestural triadic acts which allow coordinating cooperative action based on shared practice horizons. This would be of use not only in the acquisition of knapping skills but also for sharing information pertaining to the “mapping of the distributions of lithic sources across the landscape” (Hiscock, 2014: 31), in other words, for pointing to resources in the immediate environment.

Ultimately, when and where triadic gestural acts first developed is impossible to determine with certainty, at least by current means. Moreover, questions about first beginnings need not always be very meaningful. As mentioned earlier, chimpanzees in captivity already understand pointing, even without any human instruction. It thus seems even they already possess the relevant cognitive-intentional infrastructure, and that relatively basic cooperative circumstances can suffice to motivate them to spontaneously use it. My main point has been that complex communication requires the ability to actively manipulate each other’s gaze, in order to share attention, and for primates, pointing is the most basic way of doing this. Moreover, I argued that pointing’s complex intentional structure shares essential features with both gaze following and symbolic acts, which allows it to function as a kind of bridge between them, as it, in fact, does in human ontogeny. All of this suggests that pointing may have played a crucial role in the evolution of communicative capacities in the hominin lineage.

Conclusion

In this paper, I analyzed the intentional structure of pointing and explored the possible use of it by LP/ESA hominins. I argued that of the four social acts discussed, which at least captive chimpanzees are all capable of, only pointing shares important characteristics with symbolic acts, namely a combination of a triadic act structure, third order intentionality on the interpreter’s side, a social intention which is not codified indexically or iconically, and a reliance upon shared practice horizons. Pointing is more complex than is standardly acknowledged in the literature, and may have functioned as a bridge toward more fully symbolic cognition.