At some point between the rise of early Homo around 2 Ma (million years ago) and the appearance of our own species, anatomically modern humans, around 200 ka (thousand years ago), hominins began to increase the size of their social groups significantly beyond those typical of monkeys and apes (and australopithecines) (Gowlett, Gamble, & Dunbar, 2012; Dunbar, 2014a). This much we know as a direct consequence of the social brain hypothesis and the fact that across primates (including modern humans) social group size correlates with brain size (Dunbar, 1992, 1998, 2011): since we know where we started (as a great ape) and where we ended up (as modern humans), it follows that hominin community size must track the changes in brain size in between (Gowlett, Gamble, & Dunbar, 2012; Dunbar, 2014a).

Since monkeys and apes bond their social groups through social grooming (and indeed we still do ourselves, in the adapted form of stroking and patting; Suvilehto et al., 2015), and since there is a direct linear relationship between social group size and the amount of time devoted to social grooming (Dunbar, 1991; Lehmann, Korstjens, & Dunbar, 2007; Dunbar & Lehmann, 2013), this increase in group size would have dramatically increased the demand for social time in a context where time budgets were already stretched to the limit (Bettridge, 2010). For fossil Homo sapiens, the time required to meet these demands would have exceeded the available 12-h tropical day by around 60% (Dunbar, 2014a: figs. 5.2 and 5.3). In effect, the constraints of time created a glass ceiling that would have prevented increases in community size beyond those characteristic of monkeys and apes (including the australopithecines; Bettridge, 2010).

This glass ceiling arises from the fact that time is inelastic (we cannot stretch or create time) combined with the fact that, in both monkeys and humans, there is a direct link between the time invested in a relationship and that relationship’s quality and functionality (Dunbar, 1980; Roberts & Dunbar, 2011). Social grooming is physically restricted to being a one-on-one activity (and still is with us) and this limits the size of social group that can be bonded in the time typically available for social interaction to about 50 individuals (Lehmann, Korstjens, & Dunbar, 2007). With the progressive evolution of hominin community size to around 150 in modern humans (Gowlett, Gamble, & Dunbar, 2012), some novel means were necessary to allow time to be used more efficiently for social bonding so that larger groups could evolve. In effect, it was necessary to find a way of triggering the same neuroendocrine mechanism that underpins bonding in primates (the endorphin system; Keverne, Martensz, & Tuite, 1989; Dunbar, 2010; Machin & Dunbar, 2011) in more individuals simultaneously (thereby increasing the grooming group size).

I have suggested elsewhere that laughter (as a form of wordless, amusical chorusing) evolved very early during human evolution as a way of increasing the size of the grooming group (Dunbar, 2012a). Though both humans and great apes laugh, the form of laughter differs significantly between these two taxa: ape laughter consists of a series of exhalation–inhalation events, whereas human laughter consists of an uninterrupted series of exhalations with no intervening inhalations (Provine, 2000) which is dependent on close breath control unique to humans (MacLarnon & Hewitt, 1999). When humans laugh, the lungs are progressively emptied, and it is likely this that triggers endorphin activation through the stress it puts on the diaphragm and chest wall muscles, not to mention the progressive lack of oxygen (the “dying of laughter” phenomenon). In other words, laughter triggers the endorphin system, just as does grooming (Dunbar et al., 2012b; Nummenmaa et al., 2016).

There are at least four reasons for thinking that laughter is primitive and long predates the evolution of language. First, it has a spontaneous, visceral, strongly involuntary quality; second, it is intensely social (when others laugh, we cannot help laughing even if we did not understand the joke; Provine & Fisher, 1989); third, we do not need language to trigger it (tickling, slapstick and others’ misfortunes and even simply others laughing are good enough); and fourth, we share it with the great apes (in fact, laughter derives from the conventional primate play invitation vocalisation; Provine, 2000; Waller & Dunbar, 2005; Bryant & Aktipis, 2014). Of course, we also use jokes to trigger laughter, but the point is that we do not need to. I return to this below.

While grooming is an exclusively one-on-one activity, laughter is more explicitly social. The average size of naturally occurring laughter groups is about three (Dezecache & Dunbar, 2012) and, since all three people laugh, all get the endorphin ‘hit’. In social grooming, by contrast, it seems that only the recipient gets the ‘hit’. Since grooming thus has an effective group size of one, laughter is three times more efficient than grooming in its capacity to bond individuals. This more efficient use of time would have reduced the time demand for socialisation quite dramatically, and calculations suggest that it might have been sufficient to solve the bonding problem completely for early Homo (Dunbar, 2012a, 2014a). Note that laughter does not replace social grooming, but rather supplements it.
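The efficiency argument above can be made concrete with a small piece of arithmetic. The sketch below is illustrative only: it assumes that the time needed for bonding scales inversely with the number of individuals who receive the endorphin 'hit' in each bout, which is a simplification of the reasoning in the text and not the calculation reported in Dunbar (2012a, 2014a). The function names and the 6-hour demand figure are hypothetical; only the group sizes of one (grooming) and three (laughter) come from the text.

```python
# Illustrative arithmetic only. The inverse scaling of time with group size
# is an assumption made for this sketch, not a result from the cited studies.

GROOMING_GROUP_SIZE = 1  # from the text: only the recipient gets the endorphin 'hit'
LAUGHTER_GROUP_SIZE = 3  # from the text: mean laughter group (Dezecache & Dunbar, 2012)


def relative_efficiency(group_size: int, baseline: int = GROOMING_GROUP_SIZE) -> float:
    """How many times more individuals are reached per bout than at baseline."""
    if group_size < 1 or baseline < 1:
        raise ValueError("group sizes must be at least 1")
    return group_size / baseline


def bonding_time(dyadic_hours: float, group_size: int) -> float:
    """Hours needed if a one-on-one demand is met in bouts reaching `group_size`."""
    return dyadic_hours / relative_efficiency(group_size)


if __name__ == "__main__":
    hypothetical_demand = 6.0  # hours of one-on-one grooming; a made-up figure
    print(relative_efficiency(LAUGHTER_GROUP_SIZE))                # 3.0
    print(bonding_time(hypothetical_demand, LAUGHTER_GROUP_SIZE))  # 2.0
```

On this simplified view, a hypothetical six hours of one-on-one grooming could be covered in two hours of laughter. Because laughter supplements grooming instead of replacing it, the real saving would be smaller than this upper bound.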

Although laughter might have solved the rather modest bonding problem for early Homo ergaster/erectus populations by allowing the same amount of social time to be used more efficiently, it is clear that it cannot have solved the much greater problem faced by later archaic and modern humans after 500 ka with their much larger community sizes (Gowlett, Gamble, & Dunbar, 2012). At this point, something else was needed to break through this constraint and allow still larger social groups. The answer seems to be singing, or musical chorusing.

Singing shares with laughter and speech two important features, namely segmentation and breath control. Segmentation is important for the syntactical structuring of long sentences, but breath control is crucial in that it makes possible the long exhalations on which speech depends for its fluency. Active control over the machinery of vocal production (diaphragm, oral articulation) does not seem to predate archaic humans (Dunbar, 2009, 2014a). The appearance of archaic humans around 500 ka is associated with a shift from a primate-like form to a modern human-like form in a number of key anatomical features related to speech, the most important of which is the thoracic nerve bundle (controlling diaphragm and chest wall muscles) (MacLarnon & Hewitt, 1999). However, the hypoglossal nerve (controlling the tongue and articulatory space) exhibits a similar pattern (Kay, Cartmill, & Balow, 1998), while the position of the hyoid bone (lowering the larynx to increase the articulatory space to allow vowel sounds) (Arensburg et al., 1989) and ear canals capable of hearing human speech (Martinez et al., 2004) all seem to appear at around this time (Dunbar, 2014a; see also Fitch, 2000). While the significance, and even validity, of the last three has been questioned (not always validly in respect of the first; Dunbar, 2009), the fact that, within the limits of the archaeological record, all four seem to converge on the same time point for the appearance of a human-like form adds weight to the more secure finding for the thoracic nerve.

In short, the appearance of archaic humans seems to have been associated with a crucial change in the capacity for breath control and, possibly, articulation of a kind that was not needed for laughter but was later needed for language. This might mark the point at which speech evolved. But equally, it might mark the point at which some other form of vocalisation short of speech evolved. Given the ‘primitive’ and intensely emotional aspect of music, I would suggest, as did Mithen (2005), that this in fact marks the appearance of a form of wordless singing, or humming. This much is clear from the observation that singing shares with laughter the fact that we can sing with or without words – witness both the puirt à beul (literally, ‘music of the mouth’) and waulking (òrain luaidh, work songs whose words are often nonsense sounds) traditions in Scots Gaelic folk music and scat singing in jazz.

Importantly, singing triggers the same endorphin mechanism as grooming and laughter, and at the same time increases the sense of belonging or social bonding (Pearce et al., 2015, 2016; Weinstein et al., 2016). It thus contributes directly to the same bonding process as laughter and grooming. But, unlike laughter and grooming, singing seems to have no limit on the size of group that can experience this effect (Weinstein et al., 2016).

However, something important happened during the archaic human period that indirectly helped to solve the problem, literally overnight. This was control over fire and the regular use of hearths. The archaeological record demonstrates rather clearly that, although hearths occur sporadically from about 1.0 Ma, they do not become a regular feature of hominin fossil sites until around 400 ka, after which they are everywhere (Roebroeks & Villa, 2011; Dunbar & Gowlett, 2014). Hearths have three important ecological consequences: they help with thermoregulation by raising the local ambient temperature, they enable food to be cooked (with a consequential increase in nutrient extraction, although only from red meat and tubers; Wrangham et al., 1999) and, most importantly in the present context, they lengthen the active day. Sitting around a fire in the evening typically adds 4–5 h to the working day in the tropics for modern humans, and makes an even greater contribution at high latitudes where daylength in winter can be as short as 6–7 h.

Firelight is not a panacea for everything, however: the light quality is generally poor, too poor for anything but the simplest kinds of tool manufacture, and there is a limit to the extent to which the business of cooking and eating frees up spare time from the day. However, the evening does lend itself to socialising: if the processes of social bonding can be shifted into these evening hours, it frees up the hours of daylight for food finding, solving at a stroke the whole of the time constraints crisis. There is, however, one crucial limitation: only a handful of individuals can sit around a hearth, and the circle of light it provides does not extend more than a metre or so.

If laughter functioned as a form of chorusing, could the number of individuals involved be increased in a firelit environment to allow large communities to be bonded? The answer seems to be probably no: laughter needs a trigger, and this has to be either physical (slapstick) or verbal (jokes), while jokes depend on significantly higher levels of cognitive processing (Dunbar, Launay, & Curry, 2016) than archaic humans could aspire to (Pearce, Stringer, & Dunbar, 2014; Dunbar, 2014a). However, if wordless chorusing began to be used to allow communal chorusing on a conversational or even camp-wide scale, it would have provided a natural template for the evolution of voiced speech, and hence language, by the very short additional step of mapping meaning onto sound (as originally proposed by Darwin 1871 and Jespersen 1922). Here, spoken language is crucial: gesture is difficult to make out across the half-light of the fireside, but spoken language carries far beyond, from one hearth to the next. In this new context of fireside socialising, language has the important additional advantage that it allows us to tell stories and jokes. Jokes allow us to control the frequency of laughter, and to keep it going in a context where it would naturally die once the slapstick event that had triggered it has passed. More importantly, stories play a crucial role in community bonding by creating a sense of belonging to a community through the transmission of a common culture. Indeed, they are the major means by which we do this.

Sense of humour and shared cultural knowledge seem to be crucial components of relationship quality even at the dyadic level. There appear to be six major dimensions that determine friendship quality, all of which are cultural and thus change through time. These are: shared language (dialect), shared place of origin, shared educational history, shared hobbies/interests (including musical tastes), shared world view (moral, political and religious views) and shared sense of humour (Curry & Dunbar, 2013a,b; Launay & Dunbar, 2015a,b). The fact that all of these are cultural and subject to continuous change (a trait they share with dialects; Nettle & Dunbar, 1997) is highly advantageous: it allows the cues of community membership to change continuously through time, thereby demanding immersion in the cultural community for an extended period of time, and making it difficult for imposters to pass themselves off as community members.

Although most primates are intermittently active during the night, their activity is mainly confined to alarms and occasional squabbles over spacing. Their relatively poor nocturnal vision makes significant social or feeding activity difficult. In contrast, humans regularly use the evening for social activity, typically by exploiting the modest light offered by hearths. The importance of the evening as specialised social time is indicated by an analysis of !Kung San hunter-gatherer conversations. This revealed that factual conversations tended to dominate daytime conversations, whereas stories and jokes tended to dominate conversations around night-time hearths (Wiessner, 2014; see also Dunbar, 2014b). The night time is social time, and humans are the only anthropoid primate capable of being active both day and night. The ability to exploit the evening hours seems to have been crucial in facilitating the evolution of language as a final step in the accumulation of novel, specialised bonding mechanisms that helped break through a series of successive glass ceilings. I suggest (1) that language evolved directly from primate vocalisation, and not via an intermediate gestural stage, (2) that it did so in effect via an intermediate musical phase (as suggested by Mithen 2005), (3) that its use was explicitly social, mainly in the form of story-telling, and (4) that language as we know it evolved late.