Introduction

I shall consider a perennial question concerning the human condition: What made our cognitive abilities special in the animal kingdom? Instead of pretending to be able to answer this question, I will engage in some anthropologically informed speculation in the vein of Ian Hacking’s enigmatic “Break: Reals and Representations” chapter in Representing and Intervening (1983). In that chapter, Hacking highlights the central role that the making of “likenesses”, by which he refers to forms of figurative art, may have played in the evolution of human language and cognition. I will match a variation on his self-confessed “anthropological fiction” against concrete evidence of early human artefacts and offer an interpretation that anchors them in a co-evolutionary framework.

In a nutshell, my argument is this: Practices of likeness-making appeared early in the evolution of human cognition. Even though Lower Paleolithic engravings or figurines were not forms of visual art in a contemporary sense, their making and use may still have played a central role in the evolution of human thought and language. The artefacts might have acted as material scaffolds in the development of collective practices of symbolic reference-making from more basic embodied abilities of mimesis and imitation.

I will first use Hacking’s anthropological fiction as a backdrop for formulating a set of empirical hypotheses (Section “Likeness-making”). After outlining the key theoretical concepts of co-evolution, scaffolding, mimesis and imitation (Section “Mimesis, imitation and evolutionary scaffolds”), I will use the main part of this essay for presenting two paradigmatic paleontological findings of artefacts (Section “Two Lower Paleolithic artefacts”) in order to elaborate on their potential role in cognitive evolution, its nature and its evidential support (Sections “Questions of evidence”, “A space of explanatory hypotheses” and “The embodied origins of conventional reference”).

The following considerations, first and more indirectly, address a common weakness of naturalistic philosophical theories of mind and language: These theories often postulate a direct connection between animal signalling and human language but often leave that connection underspecified. There might be no such direct connection after all. The second and more proximate aim is to connect the dots between paleoanthropology, co-evolutionary theories of human cognition and theories of embodied, embedded, extended and enactive (“4E”) cognition, by adding a piece to the puzzle that has been overlooked to date.

In order to tell a moderately credible “just-so” story, this piece of Naturphilosophie commits itself to a trinity of virtues: it is equally conscious of its “what if” character and of the need to provide the best evidence for its claims where it is available, and it outlines what potential corroborating or refuting evidence would look like.

Likeness-making

A free-form speculative take on the possibility of a co-evolution of human thought and language with the making and use of likenesses is presented by Hacking in his “anthropological fiction” of Homo depictor as a “representation-maker” (1983). There are several points at which this fiction, whose purpose is to illustrate the importance of practical intervention over detached representation in the sciences rather than to formulate testable hypotheses concerning the human past, can actually be turned into a set of such hypotheses. (They will ultimately also support Hacking’s original project, but this is a different discussion that cannot be pursued here.)

To begin with, the representations made by Homo depictor are neither mental nor linguistic representations. Instead, Hacking refers to the creation of material objects, paradigmatically engravings, paintings, and figurines, that can be visually accessed by members of a population and thereby function as “public likenesses”. He draws a sharp line between these public likenesses and activities of signalling in the animal realm: Warning cries and other displays, for example in mating, bonding, or determination of group hierarchies, have a jointly indicative and imperative function. This two-faced character of animal signals has been colourfully termed “pushmi-pullyus” by Millikan (1995). They are intrinsically bound to a present or projected course of activities, so they are not capable of relating to temporally or spatially remote or fictitious world affairs. By disassociating the origins of language from animal signalling, Hacking also disassociates them from purposes of co-ordination and co-operation: “Language is not for practical affairs.” (Hacking 1983, 135)

This is the first way in which Hacking’s anthropological fiction might bear empirical content:

l.1:

The evolution of human thought and language crucially depends on collective practices of making and referring to likenesses. Likenesses are publicly accessible artefacts that represent world affairs.

The likeness relation envisioned by Hacking is of an iconic kind. The artefacts are meant to bear a phenomenal similarity to an object or situation that is recognisable for other individuals in a collective. However, taken by themselves, these likenesses might not allow observers to recognise their relation to a certain (kind of) world affair. Referential relations might first need to be established by pointing towards the likeness and to what is depicted, that is, in an indexical mode of reference. In turn, the expression of the logical relations involved (“is like”, “and is”, “is not”) may become more precise and articulate with the development of verbal language and symbolic reference. Hence, the origins of language are located in showing public likenesses to other individuals, comparing and relating them to world affairs, and thereby judging representations: “This is what it stands for, this is real, this is not”. Hacking concludes that symbolic, convention-based linguistic systems of representation co-evolved with public likenesses, furnishing the rules of what a likeness is supposed to represent, and how. He presumes the mode of reference of the likenesses themselves to be unequivocally iconic though.

Hacking’s account of likeness-making relies on the basic Peircean distinction between indexical, iconic and symbolic modes of reference (1868/1992; 1894/1998) and the hierarchy that is assumed to hold between them: Symbolic reference, as it is found in spoken or written language, is stimulus-independent, and hence may become detached from spatially and temporally proximate affairs. The capacity of detachment distinguishes it from indexical reference, as it is found in pointing gestures or signposts, which remains relational and object-based and thereby bound to concrete contexts. Unlike iconic reference in turn, as it is to be found in photographs or figurative art, symbolic reference is association-based and independent of conditions of phenomenal similarity between sign and signified. Symbolic reference is independent of proximity and similarity conditions because it is at least in an elementary sense convention-based.

The assumed causal-historical hierarchy involved here is that between indexical reference as the most elementary mode of reference, which can already be found in forms of animal signalling; iconic reference as the intermediate mode that can be found in gestural mimicry and imitation in some primates; and symbolic reference as the most complex and cognitively demanding mode, which remains an exclusive human prerogative. Accordingly, the implicit or explicit reasoning goes, the evolution of the use of signs proceeds along this hierarchy of complexity.

According to this causal-historical narrative, the second way in which Hacking’s story might bear empirical content is this:

l.2:

The making and use of public likenesses establishes an iconic mode of reference, where the ability of indexical reference-making is a prerequisite and the establishment of symbolic modes of reference an implication.

Complementary to the question of the modes of reference involved, Hacking’s account highlights the pragmatic and dynamic aspects of likeness-making as making: Instead of static images that are presented to detached observers, likenesses are objects of shared practices of creation and use. Their modes of reference, too, depend on those practices. Accordingly, the third way in which Hacking’s story might bear empirical content is this:

l.3:

The making and use of public likenesses is an intrinsically collective practice that requires various types of interaction between makers and observers that shape the artefacts’ modes of reference.

Given the purpose of Hacking’s exercise as an anthropological fiction, it naturally and legitimately lacks an account of how, where and when likeness-making appeared in human prehistory, nor does it need to bother citing evidence of how and in what causal sequence and on what time-scale the referential properties of those likenesses were established. Instead, these are the tasks for paleoanthropology and related fields, which have brought forward an array of competing approaches to the question of the role that artefacts such as engravings, figurines or cave paintings might have played in human cognitive evolution.

Adding variations on two of the empirical hypotheses derived from Hacking’s account while keeping l.1 as the shared premiss, I will make a concrete proposal as to how, where, when and to what purpose likeness-making appeared in human prehistory. It relies on paleontological evidence (see section “Two Lower Paleolithic artefacts”) as well as on theoretical considerations (see section “Mimesis, imitation and evolutionary scaffolds”):

\(l.2'\):

The earliest forms of making and using public likenesses helped to establish stimulus-detached and convention-based modes of reference-making, and therefore testify to an emerging ability of symbol use even before the advent of iconic images.

There is evidence that artefacts with intentional markings originated already in the Lower Paleolithic, up to 500 kya (thousand years ago), hence even before the appearance of Homo sapiens. These markings apparently did not serve instrumental, tool-like functions, nor do they appear to be forms of figurative art. Instead, they display abstract geometric patterns of a kind that resembles later, undisputedly symbolic artefacts. While these similarities themselves cannot prove that the Lower Paleolithic artefacts were indeed symbolic rather than iconic or indexical in their mode of reference, or that they in fact bore any referential relation at all, the specific nature of the markings is congenial to an explanation of the emergence of symbolic reference-making in more straightforward and less demanding fashion than Hacking’s iconic image.

\(l.3'\):

In conjunction with being a collective practice according to l.3, likeness-making is an embodied practice, where these practices jointly shape the artefacts’ modes of reference.

The artefacts in question testify to types of embodied skill that cannot be found elsewhere in the animal kingdom. They both involve and foster abilities of mimesis and imitation, and thereby enable cumulative cultural learning, even if they do not qualify as symbolic, indexical or iconic likenesses in a contemporary sense. In conjunction, \(l.2'\) and \(l.3'\) serve to make a tentative but concrete case for an important scaffold for the evolution of human language and cognition that has been mostly overlooked to date in the relevant fields.

Mimesis, imitation and evolutionary scaffolds

There are numerous accounts of how human cognitive abilities evolved. The abilities in question are sometimes described as complex but otherwise straightforward biological adaptations with naturally selected-for functions. This is the approach taken by evolutionary psychology (Barkow et al. 1992; Cosmides and Tooby 1987) and its application to language (Pinker and Bloom 1990). They individuate the requisite abilities as modules adapted to specific perceptual and cognitive tasks that are imposed by the environment. Alternatively and in less adaptationist terms, human-specific social, cultural and technological practices are sometimes described as the product of processes of cultural evolution that operate analogously to but otherwise independently of biological evolution (Cavalli-Sforza and Feldman 1981; Lumsden and Wilson 1981; Mesoudi et al. 2006). Further down the co-evolutionary line are Dual Inheritance Theory (Boyd and Richerson 1985) and approaches from gene-culture co-evolution (Durham 1991), which seek to demonstrate the interplay between biological and cultural factors, while still assigning priority to the former.

However, the artefacts to be considered here are not best viewed as material expressions of pre-existing abilities that would be adaptations to the conditions to which the artefacts respond. This adaptationist narrative might have some prima facie credibility for tool-making and -use, to the extent that tools display environment-directed effects that might be characterised as fitness-enhancing. These effects would make the tool-related abilities adaptive. Language and language abilities will remain more ambiguous in this respect, while there is no viable path to an equivalent characterisation of figurines or engravings and the abilities of making and using them. Any environment-directed effect that artefacts of the latter kind display, and therefore any adaptive function they might serve, such as social bonding or aiding perception, are much harder to define than hunting or food preparation presumably are for tools. The question of how and why the ability of making likenesses evolved will be difficult to answer along adaptationist lines because it is difficult to ask along these lines in the first place.

Instead, it will be more appropriate to view the development of language, culture and artefacts and the evolution of the corresponding biological traits as dynamically and partly symmetrically related. Where, from an adaptationist vantage point, there is a unidirectional and modular trait-to-environment fit that is at most mediated through artefacts and culture, arguments from “environmental” or “niche construction” by Lewontin (1982) and Odling-Smee et al. (1996) highlight the interdependence between an organism’s traits and an array of features of his environment that are specific and specifically relevant to him. This interdependence includes the possibility that the existence and the present functioning of a trait directly involve the creation of artefacts and the modification of features of that environment by the organism himself, his ancestors or his peers.

In continuation of this line of constructivist and co-evolutionary argument, it can be assumed that the evolution of language and cognition, too, will depend on the creation and presence of features in the environment that enable cognitive accomplishments for their makers and users that would be otherwise unattainable. Phenomena of this kind are referred to as “scaffolding” and “cognitive niche construction”, the most pertinent discussions of which are Sterelny (2012a) and Laland (2017). One might also focus on the ways in which the creation and use of certain artefacts and structures might facilitate the evolution of linguistic abilities. Deacon (1997) argues that language as a reproducible structure in a straightforwardly evolutionary sense co-evolved with the human brain and its capability of symbolic reference. Alternatively, it has been argued that gestural communication in particular enabled complex social organisation in hominin groups and collectively shared modes of reference that are unavailable to voice-based animal signalling and that are improbable to have evolved from it (Arbib 2005; Corballis 2009; Sterelny 2012b). The more natural-history-oriented varieties of material engagement theory see a key to the evolution of human cognition and symbolic reference in embodied processes of giving form to natural materials through the creation of artefacts (Renfrew 2012; Malafouris and Renfrew 2010; Froese 2019). In a different direction, it has been argued that tool-making and material engagement require teaching by demonstration while enabling abstract causal reasoning that becomes detached from concrete present circumstances (Gärdenfors and Lombard 2020; Gärdenfors and Högberg 2017). Or one might inquire into the concrete ways in which artefacts are coupled with human agents in such a way as to facilitate, enable or possibly even constitute cognitive processes. This is the domain of extended cognition and related theories from the “4E” field (Clark and Chalmers 1998; Menary 2010; Newen et al. 2018). Whereas the majority of these theories primarily focus on present situations of coupling, I have highlighted the importance of a natural history of coupling relations and their evolutionary role for understanding cognition as extended, embodied and embedded (Greif 2017).

In terms of supporting the broad and diverse set of co-evolutionary and constructivist views described in the previous paragraphs, there are at least two distinct explanatory accounts of the basic human abilities that first entered into the co-evolutionary process. There are also two distinct views of the evolutionary time-line on which that process unfolded. These two variables will provide the boundary conditions of my argument for the co-evolutionary potential of a specific type of early human artefacts.

With respect to the first, the “abilities” variable, one may highlight the importance of imitation and cultural learning, with cumulative and “ratcheting” effects that partly loop back into biological evolution. This is the domain of theories of collective intentionality (Tomasello 1999, 2014). These theories understand imitation as the capacity of replicating an observed individual’s actions not merely in terms of attaining the same goal but in terms of faithfully reiterating the behavioural steps undertaken by the observed individual in pursuit of that goal. The imitation line of argument proceeds towards capacities of aiming at and referring to a shared goal in complex co-operative activities. Imitation, thus understood, is a distinctly cognitive ability that requires from its bearers a basic theory of mind, in terms of a conceptual grasp of one’s counterparts’ actions, aims and perspectives.

In contrast, the mimetic theory of the evolution of language and cognition (Donald 1991) highlights the role of embodied routines in replicating one’s own and one’s counterpart’s behaviours. Donald refers to “mimesis” and “autocueing” as the capacity of rehearsing and voluntarily retrieving one’s own bodily movements, which allow for improved motor control as compared to the primate ancestors of Homo. These basic skills, he continues, enable communication based on gestures and facial expressions, which forms the basis of dance and ritual. Although predating verbal language, these expressions are an important preadaptation to the evolution of language. The “mimesis” line of argument proceeds towards capacities of observing, copying and rehearsing bodily expressions in such a way as to turn them into standardised communicative resources. Mimesis, thus understood, is primarily a motor adaptation. It neither requires an explicit theory of other minds nor of one’s own while building upon a perceptual grasp of one’s counterpart’s actions and their direction.

These two lines of argument give rise to two distinct co-evolutionary storylines: Where imitation builds on an individual’s internal representational capabilities while tying these to environment-bound needs of practical co-operation, mimesis builds on embodied social interaction while highlighting the potential decoupling of what is communicated therein from proximate practical needs.

However, these two lines of argument might ultimately represent two sides of the same co-evolutionary coin. Initially, imitation might not have required a theory of mind but have fostered its development instead (Byrne 2003). Conversely, mimesis is likely to be present in the Great Apes to some degree (Mitchell and Miles 1993), and hence more basic and more deeply rooted in hominin evolution than Donald (1991) claims. Moreover, both abilities are within the functional domain of mirror neurones, which are specific to primates and play an important role in their cognitive abilities (Gallese et al. 1996; Fogassi et al. 2005). They might either be a by-product of associative learning or an adaptation to action understanding (Heyes 2010), or an ‘exaptation’ from the former to the latter. Understood in such deflationary fashion, mimesis and imitation may be two aspects of one ability that enabled co-ordinated activities in conjunction with the requisite communicative skills for a species that was already poised for sociality and manual dexterity.

With respect to the second of the above-mentioned variables, “time”, there are two markedly distinct accounts of the time-line of the evolution of the abilities under consideration here. Apparently, most of the ingredients of human cognitive life were present in some form quite early in human evolution, but equally apparently, they took until relatively late to come to full fruition.

First, there is the punctualist Late Upper Paleolithic “Revolution” model. According to this model, traits associated with behavioural modernity, including fully developed language, art, advanced stone tools and large game hunting, appeared in one package, at one place and within a very short time by evolutionary standards. They did so approximately 50 kya, and hence more than 100,000 years into the existence of anatomically modern Homo sapiens (Klein 1995; Bar-Yosef 2002; Tattersall 2009). Proponents of this model are opposed to the notion of a gradual and polycentric evolution of language and cognition, and include Donald as a prominent advocate.

Second, more gradualist models have been proposed, according to which human behavioural traits that are now considered “modern” developed partly independently and widely distributed over time and space. To some extent, they already appeared in ancestors of Homo sapiens, well before coalescing into what is now considered “behavioural modernity” (Davies 2019; Dediu and Levinson 2013; Henshilwood and Marean 2003; Lieberman and McCarthy 2015; McBrearty and Brooks 2000; Shea 2011; Watts 1999; see also Marshack 1976; Bednarik 2003a; Sterelny 2012b). Proponents of this view tend to be critical of the anthropocentric attitudes embodied in the punctualist models.

The Lower Paleolithic artefacts that I will discuss in the remainder of this essay provide partial evidence for the second, gradualist, polycentric view of the time-line of the evolution of human cognition. Accordingly, the artefacts will be argued to approximate the lower end of the “time” variable. My argument will be more parsimonious with respect to their bearing on the “abilities” variable, but I will suggest that mimesis and imitation should be interpreted in expressly deflationary fashion in this context, and that the artefacts tentatively support this interpretation.

Two Lower Paleolithic artefacts

Let me entertain the possibility that the first artefacts that might rightfully count as likenesses already appeared in Homo erectus and other ancestral humans during the Lower Paleolithic (from about 500 kya onwards), and that their making and use has been one important co-evolutionary factor in the establishment of the elementary forms of symbolic reference that would become characteristic of human language. It might be that these artefacts already were examples of symbolic reference-making. More probably though, the purposeful nature of their production, their specific structural features and the possibility of these features being referred to within a group jointly contributed to the emergence of symbolic reference, without the artefacts themselves being symbolically referring. While the first claim (see e.4 below) is a speculative possibility, the second claim can be supported with evidence from anthropology, comparative psychology and neuroscience besides the pertinent paleontological findings.

The partial and inevitably limited evidence that I can mobilise for this unequal pair of claims consists of two specimens from the Lower Paleolithic that belong to a small but growing corpus of findings of apparently non-utilitarian artefacts created during that period by Homo erectus and Homo heidelbergensis populations. I have selected those two artefacts from that corpus which are most remarkable in age and sophistication respectively (see footnote 1):

e.1 Trinil. The shell engravings from Trinil (Fig. 1) date back approximately 500 kya. They were found at a site at the eponymous village on Java, Indonesia, together with remains of numerous other freshwater shells, including shell-made tools. The artefacts were first described by Joordens et al. (2015) as follows:

One of the Pseudodon shells, specimen DUB1006-fL, displays a geometric pattern of grooves on the central part of the left valve [...]. The pattern consists, from posterior to anterior, of a zigzag line with three sharp turns producing an ‘M’ shape, a set of more superficial parallel lines, and a zigzag with two turns producing a mirrored ‘N’ shape. Our study of the morphology of the zigzags, internal morphology of the grooves, and differential roughness of the surrounding shell area demonstrates that the grooves were deliberately engraved [...]. In addition, substantial manual control is required to produce straight deep lines and sharp turns as on DUB1006-fL. There are no gaps between the lines at the turning points, suggesting that attention was paid to make a consistent pattern. (Joordens et al. 2015, 229)

Fig. 1. Geometric patterns on Pseudodon DUB1006-fL, Trinil, Java, approx. 500 kya (Joordens et al. 2015, 230). Scales: 1 cm (a, c), 1 mm (d), none (b). Reprinted by permission from Nature Publishing Group

e.2 Bilzingsleben. The elephant bone engravings from Bilzingsleben are encountered on four objects from the Steinrinne site in Thuringia, Germany. The artefacts, which partly also appeared to have served as tools, all date back approximately 350 kya. They were first described in English by Mania and Mania (1988), who write about the largest and most intricately engraved of these artefacts (Fig. 2):

[Artefact 1] is a tool which was manufactured from the spall of an elephant tibia. Fractures on one longitudinal edge and one end demonstrate its former use as a percussion tool. [...] The plane, 50–60 mm wide longitudinal surface displays a sequence of straight, single lines engraved into it. This sequence begins at the pointed end with a group of seven divergent lines, which adjoins a central sequence consisting of fourteen single straight lines engraved at regular distances, forming a fan-like arrangement. [...] Microscopic analysis shows them to be of identical cross-section and groove diameter, which allows the assumption that they were all engraved with the same tool. The sequence of lines appears to have been fashioned in the course of one single process. (Mania and Mania 1988, 93)

[...] we can observe that they [= the artefacts] must have had some significance, be it of a communicative, mnemonic or other form. (Mania and Mania 1988, 95)

What contribution could these two and other, similar artefacts have made to the emergence of forms of symbolic reference in early humans? If, on the one hand, the view to be defended here were that artefacts like these are evidence of the presence of symbolic art in Lower Paleolithic human populations, the artefacts would be insufficient to provide conclusive proof for this claim (a fate it would share with Bednarik 1995; Mania and Mania 2005). Nor would this claim already amount to a systematic case for the artefacts’ potential role in the evolution of human symbolic practices. If, on the other hand, my claim merely were that the markings on these artefacts are intentional, it would be fully supported by the available evidence while committing itself to silence on any co-evolutionary matters. Any claim within the spectrum between these poles has to be qualified in terms of evidential support.

Fig. 2. Geometric patterns on Artefact 1, Bilzingsleben, Germany, approx. 350 kya (Mania and Mania 1988, 93). Reprinted by permission from Rock Art Research

Questions of evidence

The most forceful objection against any bearing of the present artefacts on the evolution of symbolic communication is that they might well have been one-off achievements that never gained any cultural traction in early human populations, and therefore cannot help to explain the evolution of symbolic practices under co-evolutionary premisses. This case rests on the observation that there are only a few isolated findings before the Late Upper Paleolithic whose status as non-utilitarian artefacts has been unequivocally established (d’Errico 1998; Henshilwood and Marean 2003).

The interpretation of these observations depends on taphonomic considerations: On the one hand, the rarity and isolation of findings from the Lower Paleolithic might be representative of the factual rarity and isolation of such artefacts. After all, artefacts of the kind under consideration typically preserve much better than other material manifestations of early human practices that might be equally relevant in the present context, such as ochre-based body decoration (Watts 1999), music (Killin 2017) or dance (Laland 2017). Accordingly, one might expect a richer record of relatively persistent bone and shell artefacts than is currently available.

On the other hand, the paleontological record might be partly biased in two relevant respects. First, it is likely that ecological changes in earlier hominid environments resulted in fewer persistent and well-preserved traces, while conditions were much more favourable to the preservation of Late Upper Paleolithic artefacts (Bednarik 2003a; Davies 2019). Even well-preserving bone and shell artefacts might be affected by this condition, which would have to be factored into any comparison between the frequencies of findings across the ages.

A second potential source of taphonomic bias lies in selective sampling practices that focus on sites, regions and ages where experience suggests new findings to be expected, to the neglect of others. However, the fact that one does find certain artefacts from a certain time at a certain place with more ease and in larger quantity is insufficient to prove that more of these artefacts existed at that time in that place than anywhere and anytime else. Conversely, artefacts of certain types are not expected from Homo erectus under the taken-for-granted assumption that art only appeared in the Late Upper Paleolithic. After all, the Trinil artefact had been part of an archeological collection for over one hundred years before the engravings were noted for the first time by Joordens et al. (2015).

The following observations and predictions may serve to properly qualify the evidential value of the artefacts under consideration for my main argument: If and to the extent that more artefacts of a similar kind to the Trinil and Bilzingsleben specimens were found that display some focus in space and time, tenable evidence for cultural transmission and a solid basis for arguments for their co-evolutionary relevance would be established. Being aware of the lack of such a solid basis while presuming that further paleontological research might still establish it, the discussion in section “A space of explanatory hypotheses” considers the possibility of the truth of this ‘maximal’ interpretation, which is not to be confused with the often overbearing speculations on the Bilzingsleben artefact in particular (see footnote 2).

In contrast, if artefacts of the Trinil and Bilzingsleben kind continue to be found only rarely and remain scattered over space and time of origin, they will provide evidence of archaic human abilities that were not, or not reliably, culturally transmitted. However, even if the artefacts did indeed not partake in cumulative culture, the accomplishments of intentional marking they embody would still be notable, as they bear testimony to artefact-making practices that require cognitive and motor abilities otherwise not found in the animal kingdom. This is the basis for the ‘minimal’ interpretation of the artefacts under consideration to be pursued in section “A space of explanatory hypotheses”.

This minimal interpretation and everything from hereon upwards is supported by the observation that the development of bilateral symmetry in the making of bifacial tools was an achievement roughly contemporaneous with the artefacts under consideration here; this development was pervasive, and it also requires a practical and conceptual grasp of equivalence relations and their reversibility as well as of shape constancy (Wynn 1979, 1993, 2002; see also Currie and Killin 2019). The precision and sophistication of tool-making enabled by bilateral symmetry will help to explain the precision and sophistication embodied in the Trinil and Bilzingsleben artefacts. Moreover, bilateral symmetry can be subsumed under an explanation from the motor-control aspects of mimetic abilities, and might be part of one and the same co-evolutionary process.

According to the distinction between “minimal-capacity” and “causal-association” inference that Currie and Killin (2019) introduced into the present explanatory context, the space of admissible interpretations of the artefacts between their minimal and maximal varieties is delimited in two characteristic ways: On one side, there are certain abilities and a certain effort necessarily required for producing those engravings, which form a set of minimum requirements and exclude a number of otherwise imaginable origins. In the present case, these minimum requirements are intentional marking and basic mimetic abilities. They foreclose an incidental origin of the engravings, as brought forward in the “cutting board” charge against the Bilzingsleben artefacts in particular (see White 1992 and his commentary on Bednarik 1995; see also Mithen 1996). On the other side, there is a wide but finite space of roles above the minimum threshold that the engravings might have played. Among these, a possible role of the markings as instrumental and tool-like can be excluded on the grounds of their physical characteristics, none of which suggest an environment-directed effect or a contribution to an effect that they might have made. Keeping in mind that this is a domain of “how-possible” rather than “why-necessary” explanations in the sense that William Dray (1957) identified as typical of explanations in the historical sciences, the space of the remaining possibilities can be explored by comparatively evaluating various lines of evidence, as I will do in the following section.

A space of explanatory hypotheses

In light of the preceding considerations, I will make my case for the Trinil and Bilzingsleben artefacts as early forms of likenesses by going through four hypotheses within the space of possible interpretations outlined in the previous section, and assess their respective plausibility. From this set, two hypotheses will emerge as the most plausible minimal (e.1) and maximal (e.4) ones, while e.2, e.3 and e.4 themselves can be subdivided into more demanding and more elementary varieties. In their most elementary forms, e.2, e.3 and e.4 are complementary to each other rather than mutually exclusive (see section “The embodied origins of conventional reference”).

e.1 The engravings were ‘spandrels’ that were ‘exapted’ for other functions. This is the minimal and evidentially most tenable interpretation of the artefacts. It is possible that the engravings were a structured but secondary effect of an activity that served a different purpose. This effect might in turn have been co-opted for new purposes. In the parlance of contemporary evolutionary theory, they would be ‘spandrels’ that enter into processes of ‘exaptation’ (see Gould and Lewontin 1979 and Gould and Vrba 1982 respectively). Concretely, the engravings might have been created in the process of playfully exploring or more earnestly testing what can be done with a sharp stone and a shell or bone: how much force has to be exerted, how the tool is best controlled, how deep, broad and regular the incisions can be made. This interpretation matches the more basic, motor-control elements of mimesis as envisioned by Donald (1991) and material engagement theory. By virtue of their observable regularity, the engravings would provide feedback to their creator both on his or her own movements and on the nature of tool and material. In this quality, they would have been purposefully created but otherwise “self-sufficient marks” in the sense introduced by Davis (1986). In consequence though, they opened up a space for exploration of other possible capacities.

More specifically, the engravings might become an object of the kind of aesthetic apprehension of regular patterns that has been argued to be present in most higher animals by Hodgson (2006). Regular patterns, he argues, create a positive feedback or “resonance” effect in neural pathways, which in turn facilitates the detection of these patterns. On this account, some neurones in the visual brain are specialised in processing certain geometric patterns, and become “hyper-stimulated” by their perception. The human-specific accomplishment lies in not only seeking out but deliberately triggering resonance effects through the creation of geometric patterns or “primitives” by drawing, painting or incising them. Accordingly, the role of the artefacts would have been to embody geometric primitives that trigger resonance effects, which in turn fed back into an evolving sense of shape constancy. On this view, any possible representational qualities of the engravings are derived from those patterns (see e.3) but not necessitated by them. These qualities might ultimately rest on the artefacts' capacity of being shown to and observed by other members of a group, and to thereby become objects of communication by whatever means available (see e.4), a possibility explicitly discussed by Donald (1991) but not further considered by either Hodgson (2006) or Davis (1986).

e.2 The engravings served elementary indexical functions. This interpretation is marginally supported by evidence. Especially in the Bilzingsleben case, indexical functions may have included pointing, calculating or measuring angles, distances, time or other properties (an interpretation brought forward, for example, by Mania and Mania 2005). In this case, the artefact’s function would have largely consisted in relating co-present individuals to some concrete world affair in their environment, and in providing support in directing their joint activities towards it. If used in this fashion, the engravings would refer to world affairs in a stimulus-dependent and stimulus-directed fashion. Notwithstanding the possibility of being first used for purely individual mnemonic purposes, the artefacts would thereby have lent themselves to becoming instruments of collective reference-making. This interpretation would most closely associate the engravings with a view of the evolution of human cognition as being rooted in the needs and abilities of practical co-ordination, co-operation and shared intentionality, as brought forward by Tomasello (1999).

However, all but the most elementary functions of this kind would already have demanded relatively well-developed abstractive skills from the artefacts’ creators and users. These would require an explanation and evidential support in their own right and at least partly presuppose abilities that fall under e.4. Alternatively, the most plausible elementary function of an indexical kind would be individual counting and calculation with the help of iterated markings. Given that an elementary number sense and the ability of approximately numbering objects can be found in many higher animals and human infants, the development of a discrete numerical system might be scaffolded by artefacts that can be used for counting and basic calculation at a stage prior to the development of convention-based numerical symbol systems (Dehaene 1997; Fabry 2018, but see also Donald 1991).

e.3 The engravings served elementary iconic functions. This interpretation is tentatively supported by evidence. The engravings might have been created with the purpose of bearing an observable resemblance to some patterns detectable in objects and structures in their creators’ environments, such as the scattering of sun rays or the serration of shells. The engraved patterns would have invited both their creators and their observers to consider how they are supposed to relate to such world affairs: how to generalise from cut marks to superordinate patterns, how to continue a line in thought, what is particular about the lines detectable in this object or that structure, or how to highlight a specific function of the object carrying the engravings.

Alternatively, as suggested by Davidson and Noble (1989), the origins of iconicity might be located in the mimicry of events or animal behaviours through the “freezing” of gestures and the making of traces. In this context, mimicry is understood in a more advanced sense, similar to that proposed by Donald (1991). Gestural mimicry, Davidson and Noble’s argument continues, gave rise to the deliberate creation of marks that in some way resembled objects or events in the world. This kind of practice would have first allowed the creators and observers to perceive and signify the relation between themselves, the image and the object. Only when this kind of signification is in place, the authors conclude, meaning could be given to non-iconic marks that are independent of present contexts. However, Davidson and Noble (1989) firmly place the emergence of this kind of practice in the Late Upper Paleolithic, categorically excluding archaic artefacts of the Bilzingsleben and Trinil kind from consideration.

In a more deflationary spirit and in continuation of the argument presented in e.1, Hodgson (2006) suggests that the visual brain resonates with certain marks and shapes in such a way that the preferences for these marks and shapes “become subject to material realization” (2006, 54). This would amount to the claim that the Lower Paleolithic engravings bear similarities to the patterns that specific neurones in the visual brain are adapted to detecting. They would be likenesses not of objects but of the stimuli that resonate with the visual brain. They would be likenesses, for instance, of the radiating patterns detected in sunbeams, or of the vein patterns detected in leaves—but strictly speaking not of the sunset or the plant as a whole. More straightforwardly and unequivocally, Hodgson sees iconic likeness relations embodied in the figurines of Tan Tan (approx. 400 kya) and Berekhat Ram (230 kya)—which, being at most slightly modified natural objects, might have been used rather than created as figurines (see also Bednarik 2003b; Goren-Inbar and Peltz 1995). Early human iconic forms of either kind might have entirely relied on individual perception or interoception rather than on conventions of reference. Both in its deflationary and in its more demanding varieties, this interpretation puts the engravings under investigation closest to Hacking’s notion of likeness-making.

e.4 The engravings served elementary symbolic functions. This is the maximal interpretation that is at least tentatively supported by evidence. The engravings might have been designed to relate to world affairs in a stimulus-detached and elementary convention-based manner. Instead of concerning a concrete individual or type of object that was present or altogether observable to their makers and users, they could have referred to something spatially or temporally remote, or to something non-existent. In doing so, they could also have abstracted from observable similarities to their referent. In order to be able to refer to world affairs in these ways, the artefacts would have required rather than merely invited some form of communication concerning what the markings were supposed to do or mean, because there would be no direct and obvious way of fixing that relation. At this point, elementary forms of negotiating and agreeing referential relations, and therefore pre-linguistic conventions would be involved.

On the most demanding interpretation, artefacts might have attained a status approximating that of “exograms” as “external memory records” (Donald 1991) or of “artificial memory systems” as “physical devices specifically conceived to store and recover coded information” (d’Errico 1998, 20). Besides systems of writing proper, artefacts used for more basic purposes of notation, such as record-keeping or calendars, will fall under this category. D’Errico (1998) in particular presents ample evidence for the existence of notation in the Upper Paleolithic while excluding the much more ancient Bilzingsleben artefacts from closer consideration because of the inconclusive evidence of intentional marking he saw at the time of his writing. Structurally, however, the Lower Paleolithic artefacts share several key properties with artificial memory systems. Given that intentional marking has meanwhile been established for them, the Bilzingsleben artefacts would be prima facie admissible to the domain of symbolic notation under d’Errico’s, but also Donald’s analysis.

On a more basic interpretation, the symbolic nature of the engravings is rooted in the shared neuronal mechanisms involved in the perception of artificial geometric forms, natural objects and written words (Mellet et al. 2019). There is some evidence for a three-way relation between the activation patterns present in the visual perception of geometric forms, in object recognition, and in the recognition of writing. These activation patterns occur on a higher level of neuronal organisation than the ones mobilised by Hodgson (2006). Naturally, all the available evidence for Mellet et al.’s (2019) point comes from studies with contemporary human subjects, whose neural constitution is supposedly similar to but not identical with that of early humans. The third element of the suggested relation also presupposes the existence and knowledge of writing systems on the subjects’ side. However, if early human brains were activated in similar ways in similar areas by the perception of geometric patterns and natural objects alike, and if the same brain areas are responsible for word recognition in modern humans, this parallelism would provide additional support to scaffolding arguments of the evolution of symbol use. It would also lower the bar for the conventionality of symbolic forms, by putting evolved neuronal mechanisms and material structures first.

If perception of and material engagement with geometric forms are as elementary as suggested in the previous argument, artefact properties such as parallelism of lines or their imaginary points of origin can be inquired into and demonstrated without recourse to an elaborated repertoire of convention-based symbolic forms. Establishing and communicating more complex referential relations will require a grasp of type and token identity, negation transformations and logical conditionals, and probably an ability to pose inquiries analogous in some form to “what” and “how” questions. Rather than already presupposing such a grasp, the artefacts under investigation, being tangible objects bearing intentionally created structures, would have lent themselves to practices of triangulation between creators, observers and object that facilitate the development of a grasp of those relations. They might thereby have enabled forms of proto-convention in an exchange of pre-linguistic expressions.

In contrast, forms of animal signalling do not appear equally prone to providing a foundation for convention-based collective reference-making. In all known cases, animal signals remain confined to a combined indicative-and-directive, “pushmi-pullyu” mode of operation. Taken by themselves, voice-based, gestural and mimic signals do not demonstrate a potential of developing into more elaborate forms of communication. In particular, there are no forms in animal communication that would approximate the asking of questions. For instance, Great Apes are capable of referring to concrete objects and situations by imitation and pointing; some chimpanzees could learn to use a simplified symbolic language and answer questions posed to them in that language, but they never came around to asking questions or developing basic forms of symbolic communication by themselves (Premack and Premack 1983). In the case of the artefacts under investigation here, however, embodied practices of inquiry into what some arrangement of markings may stand for appear as a plausible route towards the asking of questions, and therefore the establishment of some of the first forms of convention.

The embodied origins of conventional reference

Although the suggestion in e.4 is that the mode of reference of the artefacts was symbolic, there are several reasonable objections against this interpretation. Most straightforwardly, a clear-cut distinction between modes of reference might be factually impossible. The engravings cannot be determined to be unequivocally symbolic, as they might well have been employed to imitate naturally occurring patterns while doing so on some level of abstraction. Nor can the patterns be determined to be unequivocally iconic, as their “community in some quality” (Peirce 1868/1992, §14) with individuals or types of natural objects might well have been mediated and conceptual. After all, the apparent degrees of abstraction of the engravings are not a measure of being iconic or symbolic in Peircean terms. This could only be judged from knowing how Lower Paleolithic humans actually perceived their environments.

More fundamentally, there is no principled way of determining the mode of reference of the artefacts under consideration, nor of assigning primacy or priority of one mode over the others. This is not only a problem of insufficient evidence, but also of the interpretation of the chosen categories. The distinction between indexical, iconic and symbolic modes of reference is analytically useful when it comes to contemporary ways of relating to world affairs. However, first, it is not a difference in kind (although Peirce himself later, for example in 1903/1998, developed it into a complex set of metaphysical categories). Second and more importantly, symbolic, iconic and indexical reference are categories imposed by the modern observer. It should be clear that this set of distinctions cannot be superimposed on a Lower Paleolithic conditio humana as if it had been evident and meaningful to human beings of that time. Iconic similarities that are apparent to modern observers cannot be presumed to have been similarly apparent to Lower Paleolithic human populations. People who were just in the process of establishing elementary forms of likeness-making cannot be expected to have possessed reflective knowledge on whether a particular shape was an iconic likeness of an individual, an image exemplifying the generic properties of a type or a fully symbolic representation instead. Decisions of this kind would presuppose that these still-emerging modes of reference were already established and well-understood.

The distinction between modes of reference might become meaningful by a different, indirect route though. This route is based on the observation that the engravings in question were geometric in shape. If geometric principles are universal in the Platonic sense of being independent of how human beings conceive of them, the engravings likely testify to a perceptual grasp of the presence of geometric patterns in nature that embody these principles. In this case, the reference of the engravings would be iconic in the specific sense of depicting patterns that are governed by these principles. Their form would be dictated by the pertinent geometric principles, and would at most partly depend on conventions concerning their expression. Conversely, if geometric principles are not universal but dependent on human conceptions, the shape and any kind of referential relation of the engravings could only be fixed in the process of their creation and use. Provided that this use were public, these properties would be convention-based by necessity and ab initio, and hence meet one of the basic conditions for being symbolic. However, this interpretation presupposes that such conventions could already be articulated to some degree at the time of the artefacts’ creation.

There is an alternative to this dichotomy that does not have to make similarly strong ontological commitments and that is able to integrate the iconic, indexical and symbolic hypotheses: Rehearsing embodied skills of tool-use and the various possible kinds of feedback provided by the patterns thus created contributed to establishing a set of regularities of form, in the dual fashion explored in the ‘exaptations’ hypothesis (e.1) and in line with the previously discussed concepts of evolutionary scaffolding and material engagement. The creation of further artefacts that exhibited the same forms, and variations and transformations thereof, would be guided both by the material nature of the artefacts that first embodied these forms and by the embodied nature of the processes that realise artefacts of this type. The regularities of the patterns would serve as proposals of proper form that operate by example rather than through linguistic or quasi-linguistic expressions of the rules of proper form. Through showing the patterns and through demonstrating and learning the making of these patterns, both the pattern types and the skills required for their production would be transmitted between members of a group, enabling cultural accumulation and inheritance. A set of artefacts that are produced and reproduced within a population would also enable its members to refer, either synchronically or diachronically, to the regularities and the regular transformations displayed by the patterns. These patterns would not yet need to refer to some world affair, nor would they be committed to doing so iconically, indexically or symbolically. Instead, the regularities of those patterns, their perceptual recognition and their cultural transmission would prefigure rules and conventions of form, which in turn are necessary ingredients of symbolic reference. Their capacities of detachment both from present stimuli and from similarity conditions will provide the other necessary ingredients.

Hence, this latter route to symbolic reference is not based on the geometricity of the artefacts as such but on their materiality, embodiment and collective use. The aspects of material realisation and embodiment affect the condition of detachment or stimulus-independence that in turn is pertinent to a sign’s mode of reference. Paradigmatically, indexes are spatially and temporally context-bound, and often transient in nature, such as in pointing and other demonstrative gestures. In contrast, iconic and symbolic signs may refer to a subject matter in detachment from the spatio-temporally bounded contexts of its appearance. Gestural communication and voice-based signalling, with their transient and context-bound tokenings, cannot effectively accomplish this. Dance, music and other forms of embodied routines that more strongly rely on repeated and faithful imitation of behaviour are one possible route towards detachment, whose relevance has been investigated by other authors (for example, Killin 2017; Laland 2017) but which left no traces in the paleontological record. They require a similar set of embodied skills as the other route to detachment, which was under investigation here and is more permanent in its mode of realisation: artefacts that display patterns which can be repeatedly, reliably and materially referred to in various contexts and in various ways that may come to stand for present, absent, possible or non-existent world affairs. This is what pictures and writing uniquely accomplish.

Under the qualifying conditions discussed above, my argument for a primacy of symbolic reference with respect to the artefacts under consideration can be summarised as follows: If, first, any collective reference-making requires some elementary form of convention, and if, second, one distinctive capacity of human reference-making is the possibility of detaching signs from stimuli, and if and to the extent that, third, conventionality and stimulus-independence are the key characteristics of symbolic reference, symbolic reference will be more fundamental than either iconic or indexical reference. This, rather than investing the Lower Paleolithic artefacts with symbolic or other meanings that cannot be recovered from the paleontological record, is the way in which our modern categories are useful in interpreting those artefacts.

Even though I cannot come up with an account of the selective pressures and adaptive functions that might have accrued to the Lower Paleolithic artefacts and their qualities without turning to just-so stories that are even less supported by evidence than the preceding speculations, it will at least be important to take another look at the timeline of human evolution in order to understand the potential co-evolutionary significance of the artefacts: If the first forms of figurative art appeared in the Late Upper Paleolithic, as the punctualist “Revolution” model suggests, they appeared at a time when human linguistic abilities were likely already fully developed at least at the organic level. Modern forms of symbolic language might have developed with the help of works of art that facilitated detached and conventional reference-making. Still, it is evident that some form of symbolic language was already in place at that time, given the presence of anatomically modern neuronal and vocal tract structures that would not be useful for much else besides speech or song. This would leave us with the question of how and why these organic traits evolved in the first place. If, however, the first forms of art, figurative or other, appeared in the Lower Paleolithic, as suggested here, the artefacts were created and used at a time when human linguistic abilities were not nearly fully developed at the organic level, so that the presence of a faculty of symbolic language cannot be presupposed. In this case, artefact-based reference-making might have been emerging at a time when forms of spoken or gestural language were emerging, too. All of these kinds of reference-making could then be traced back to the same set of basic and embodied practices of mimesis and imitation, where these practices and abilities would have mutually shaped and supported each other.

Conclusion

Prima facie, one might assume that my case for filling Hacking's (1983) narrative with empirical content is the same as the one brought forward by Davidson and Noble (1989, 125), who say “that communication of some sort is necessary for depiction and, further, that depiction transforms communication into language.” However, what I presented in this essay differs from their account in two important respects. First, my aim was to demonstrate in exemplary fashion that the origins of modern human cognitive abilities were manifold, distributed over time, place, adaptive function and locus of realisation, and that they partly already evolved in early humans rather than the Late Upper Paleolithic. Second, the Trinil and Bilzingsleben engravings, rather than being iconic likenesses, might have anticipated or already partly embodied some of the specific qualities of human symbol use, as patterns that may be made, used and reproduced to collectively refer to any kind of world affair in any kind of agreed-upon way. In spawning symbolic reference, these artefacts may have been a scaffold for the evolution of the human mind as we know it.