Introduction

Sensory perception involves the detection of phenomena within an organism’s internal and external environment, and the resulting sensory data are then processed neurologically. Environmental data are gathered via different sensory channels, such as the acoustic, olfactory, tactile and visual channels. Sensory perception therefore involves different modalities of data processing and interpretation (Jorquera-Cabrera et al. 2017), which are traditionally viewed as functionally and structurally distinct. Because sensory perception provides information about the organism’s environment, it drives behavioural decisions, for example those relating to foraging, mate choice, predator avoidance and habitat selection, which in turn affect an individual’s survival and reproductive fitness (Bradbury and Vehrencamp 2011). From an evolutionary perspective, the morphological features that make sensory perception possible are shaped by the surrounding environment (Nourmohammad et al. 2017). Furthermore, an organism’s perception of the world is based on data gathered from more than one sensory modality, giving a multimodal schema of the world. In addition to perceiving environmental phenomena, organisms share sensory information with each other and signal internal motivational-emotional states via different means of communication.

Communication can be defined as the transfer of energy or matter between individuals that changes the behaviour of the receiver (Bradbury and Vehrencamp 2011). A more precise definition, however, is that energy and matter are the vehicles by which a sign is transferred from one individual to another. Whilst most animal species communicate, human language is an exceptionally complex means of communication, in which information is transferred from signaller to receiver via different sensory channels. When language is expressed vocally as speech, the receiver perceives it via the auditory channel; when it is expressed in written symbols, it is perceived via the visual channel. Language is so fundamental to human cultures that it has been adapted to allow people with sensory impairments, for example visual and hearing impairments, to communicate by alternative means such as braille and sign language (Chen and Saulter 2018). Human language use is both innate and cultural (Clay et al. 2014; Kirby et al. 2007; Chomsky 1967) and drives the formation and cohesion of human societies. The study of language evolution is therefore interdisciplinary, drawing on philosophy, anthropology, psychology, linguistics, cognitive science, sociology, behavioural ecology, physiology and evolutionary biology.

Despite its complexity, human language follows rules that allow novel utterances to be understood by any receiver familiar with the language being spoken. Linguistic studies identify the various rules and elements of a language, including semantics, which focusses on meaning, and syntax and grammar, which describe the form of the language (Smith 2016). Although syntax was originally thought to be unique to humans, other taxonomic groups, including whales and dolphins, songbirds and non-human primates, have now been found to use syntax in their vocal repertoires (Suzuki et al. 2016; Janik 2013; Clarke et al. 2006; Okanoya 2004; Hailman and Ficken 1986). Some of these signals also carry semantic value, such as the nightingale’s aggressive and territorial broadband trill (Schmidt et al. 2008). Nightingales, like all songbirds, learn their vocal repertoires and songs as juveniles from mature adult birds, and the neurological mechanisms by which this learning occurs are analogous to those involved in speech learning in human infants (Beecher and Brenowitz 2005; Zeigler and Marler 2004). Song is also more complex than simple calls, which indicate emotional state, and it has phonetic structures analogous to those found in human language (Sasahara and Ikegami 2007). Moreover, the acoustic parameters of birdsong have been found to contain specific information about the identity of the signaller (Couchoux and Dabelsteen 2015). In summary, the neurological and structural characteristics of birdsong and its production are analogous to human language structure and acquisition.

The Mechanisms Underlying the Sensory Modalities and their Relationship to Birdsong

The vibration mechanism of olfaction is a theoretical model of chemoreception which proposes that the olfactory system detects the vibrational frequencies of the bonds within molecules, rather than the perception of scent relying on molecular shape. On this view, it is these molecular-level vibrations that are sensed by the olfactory system and that give each substance its distinctive aroma (Turin 1996; Franco et al. 2011). If this is the case, the biological mechanism of olfactory perception would be similar to that of auditory perception, and chemoreception could be regarded as another form of mechanoreception, which involves the neurological detection of tension in cell membranes. In mechanoreception, the mechanical distortion of the cell membrane is converted into an electrical signal through mechanotransduction (Sachs 1986). The vibration mechanism of olfaction, however, postulates that olfactory signals operate on a much finer, molecular scale, with the vibrational frequencies of individual chemical bonds being detected by the sensory apparatus. As I will argue, this theoretical mechanism is potentially relevant to communication and the evolution of language.
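For orientation, the bond "vibrational frequencies" invoked above can be estimated with the textbook harmonic-oscillator approximation of a two-atom bond. This is a standard physical-chemistry simplification added here for illustration; it is not part of the vibration mechanism as formulated by Turin (1996) or Franco et al. (2011). Here k is the bond force constant, m₁ and m₂ are the masses of the two bonded atoms, μ is their reduced mass, ν is the vibrational frequency, c is the speed of light, and ν̃ is the wavenumber in which vibrational spectra are usually reported.

```latex
\mu = \frac{m_1 m_2}{m_1 + m_2}, \qquad
\nu = \frac{1}{2\pi}\sqrt{\frac{k}{\mu}}, \qquad
\tilde{\nu} = \frac{\nu}{c}
```

Stiffer bonds (larger k) and lighter atoms (smaller μ) vibrate faster. Typical molecular vibrations fall between a few hundred and about 4000 cm⁻¹, i.e. roughly 10¹³ to 10¹⁴ Hz, far above the frequencies of audible sound, which extend only to tens of kilohertz.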

Many animal species, particularly mammals, are known to rely on olfactory cues to transmit and receive information about the physical and social environment (Quignon et al. 2012; Eisenberg and Kleiman 1972). Whilst most studies of olfaction have focussed on mammals, more recent evidence indicates that birds also rely heavily on their sense of smell. For example, European starlings line their nests with aromatic herbs, whose volatile compounds reduce the parasite load of chicks (Gwinner and Berger 2005); the herbs also serve to attract potential female mates (Gwinner and Berger 2008). Blue tits respond positively to the scent of their natal nest and use scent to discriminate between the natal nest and unfamiliar nests (Petit et al. 2002), and they can also detect changes in the aromatic plant odour composition of their nests (Mennerat 2008). Similarly, juvenile zebra finches use odour cues to identify and locate their natal nest within the nesting colony (Caspers and Krause 2010). Turning to vision, birds have colour vision and can perceive a broad range of wavelengths (Osorio and Vorobyev 2005), including ultraviolet (Bennett and Cuthill 1994). Songbirds, like most birds, have tetrachromatic colour vision (Vorobyev et al. 1998), and plumage colouration is implicated in avian mate choice (Hill 1990). Birds, like other animal species, therefore integrate a range of sensory data when perceiving the world.

To date, most experimental work has investigated sensory perception at the physiological, functional level. Although the mechanisms of colour vision are not yet fully understood (Gegenfurtner 2003), colour vision is based on the spectral characteristics of light, specifically the frequencies of electromagnetic radiation. It has already been postulated that colour vision is a quantum phenomenon (Bouman and Walraven 1962; de Vries 1943). This quantum mechanism of vision postulates that, because light behaves as particles (photons) as well as waves (Dimitrova and Weis 2008), the visual sensory apparatus detects the mechanical vibrations of photons, which arrive with different frequencies and therefore different energies. On this account, vision would also be a form of mechanoreception, with the mechanisms of colour vision essentially the same as those of olfaction and audition. Taken together, the vibration mechanism of olfaction and the quantum mechanism of colour vision suggest that all sensory modalities (including proprioception and magnetoreception) could fundamentally be based on mechanoreceptors and mechanotransduction. This provides the theoretical framework for a unified concept of sensory perception across all sensory modalities.
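To put numbers on "the frequencies of electromagnetic radiation", the short Python sketch below converts example wavelengths into photon frequency (f = c/λ) and energy per photon (E = hf). The wavelengths and labels are illustrative assumptions, not values taken from the cited studies. In a vacuum all photons travel at the same speed, c; what differs between colours is frequency and hence energy per photon.

```python
# Exact SI constants (fixed by the 2019 redefinition of the SI units)
C = 299_792_458.0        # speed of light in vacuum, m/s
H = 6.626_070_15e-34     # Planck constant, J*s
EV = 1.602_176_634e-19   # joules per electronvolt


def photon_frequency_hz(wavelength_m: float) -> float:
    """Frequency of light with the given vacuum wavelength: f = c / lambda."""
    return C / wavelength_m


def photon_energy_ev(wavelength_m: float) -> float:
    """Energy carried by one photon of that wavelength: E = h * f, in eV."""
    return H * photon_frequency_hz(wavelength_m) / EV


# Illustrative wavelengths only, spanning near-ultraviolet to red
examples = [("near-UV", 360), ("blue", 450), ("green", 530), ("red", 650)]
for label, nm in examples:
    wavelength = nm * 1e-9  # nanometres to metres
    print(f"{label:>8}: {nm} nm -> {photon_frequency_hz(wavelength):.3e} Hz, "
          f"{photon_energy_ev(wavelength):.2f} eV")
```

Shorter wavelengths correspond to higher frequencies and more energetic photons, which places ultraviolet light at the high-frequency end of the range that birds can see.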

A unified concept of sensory perception could have applications in the study of language evolution. In addition to indicating emotional states (Seyfarth and Cheney 2003), birdsong may serve the same function as human language: birdsong (and other forms of complex animal communication) may describe perceived environmental phenomena, based on the mechanical detection of the vibrational frequencies of electromagnetic radiation and of molecular bonds. The signaller could then reproduce these frequencies through a process of ‘frequency-mimicking’, in which a representational or descriptive vocal signal is produced that mimics the frequencies of energetic environmental phenomena.

A Hypothesis of Language Evolution Based upon Vibrational Frequencies in the Environment

I therefore propose that a unified hypothesis of sensory perception may provide a new avenue for studying the evolution of language. I further hypothesise that the discrete, syntactic elements of birdsong describe environmental phenomena such as nest odour and plumage colour, and arise through vocal mimicry of the frequencies associated with odour molecules and colours. These frequencies are then reproduced as birdsong or, depending on the taxonomic group, as other syntactic animal vocal communications. A potentially fruitful avenue of research would be to test whether, in species where males build a nest before mating, such as the European starling, mate attraction vocalisations act as indicators of nest quality by describing the scent of the aromatic herbs lining the nest, thereby signalling to females that the nest is a healthy, parasite-free environment. Similarly, the habitat quality of a male’s territory could be indicated by both melanic and carotenoid plumage colouring, since the intensity of a male’s plumage colouring has been linked to the abundance of invertebrate prey available as food. This in turn relates to sexual selection and mate choice, with more colourful males being more attractive to females (Dunn et al. 2010; Ferns and Hinsley 2008; Hill and Montgomerie 1994). A further avenue of study would therefore be to investigate whether birds produce an acoustic frequency and amplitude that describes and advertises the intensity of their plumage colouration, such as ‘brown’ or ‘yellow’, and whether this frequency is based on a matching of electromagnetic to acoustic frequencies that is then expressed as part of the song.

Aposematism, the use of conspicuous colouration to signal risk or danger, is also common in nature. Hazards such as stinging insects often display yellow and black colouring, an aposematic combination that signals a potential hazard both in nature and in human societies (Wogalter et al. 1998; Schuler and Hesse 1985). Red colouration, by contrast, in some contexts signals serious danger and communicates the action ‘stop’, as seen in the leaves of some poisonous plants (Karageorgou et al. 2008; Lev-Yadun 2009; Kehtarnavaz et al. 1993). Many ‘aggressive’ bird vocalisations are trills, in which two frequencies alternate rapidly, analogous to the alternating yellow and black stripes of a wasp. If the contrast between yellow and black can be described by frequency-mimicking, it could be vocalised as a broadband trill, such as that produced by nightingales. Rather than being purely an expression of emotion, the ‘aggressive’ broadband trill may therefore contain information that warns of a potential hazard, i.e. that the signaller will pose a threat if provoked. Vocal mimicry (Kelley et al. 2008; Schachner et al. 2009), as shown by the European starling (West et al. 1983), the mynah bird (Archawaranon 2005) and some species of parrot (Cruickshank et al. 1993), could also be explained by a unified mechanism of sensory perception, on the premise that the vocal mimicry of sounds is a process similar to the proposed ‘frequency-mimicking’ of environmental frequencies. This is particularly apposite, as there is currently no evidence that species which mimic human language understand the semantic meaning of the words they utter. Instead, they appear to copy sounds they perceive regularly within a social context, a process strikingly similar to the mechanisms underpinning song learning in birds.
Further, given that birdsong has been found to carry information and is not simply an expression of emotional valence (Berwick et al. 2011), it is possible that individuals of some species, such as the starling, use mimicry as a guide for potential mates. Vocal signals may advertise the position of the nest to females by mimicking sounds in the local environment and broadcasting them for extended periods, with each bout interspersed with mate attraction song. Similarly, while a pair is rearing chicks, mimicked environmental sounds could provide a familiar feature to which a foraging starling parent can return, without the need for complex cognitive navigational processes. At the same time, the risk of nest predation would be minimised, because the sounds imitated are commonly occurring environmental noises rather than birdsong or calls, so a predator would be unable to identify a potential prey item from them. Nonetheless, those sounds would provide a cue to guide the returning parent back to the nest, given that the location and surrounding environment are familiar. Indeed, various animal species, including dolphins (Richards et al. 1984), whales (Ridgway et al. 2012), harbour seals (Ralls et al. 1985) and even an Asian elephant (Stoeger et al. 2012), have been known to mimic human speech or other novel sounds, demonstrating that these species can mimic and reproduce unfamiliar sound waves detected in the environment. Further study of the function of mimicry within the framework of a unified mechanism of sensory perception may therefore be beneficial.

However, environmental factors such as anthropogenic noise, and anatomical factors such as an individual’s body size, affect and shape the acoustic parameters of animal vocalisations (Antze and Koper 2018; Apol et al. 2018; García and Tubaro 2018). It could therefore be argued that ‘frequency-mimicking’ would not faithfully reproduce frequencies in the environment. For example, human noise such as traffic may have altered the spectral characteristics of some animal vocalisations (Halfwerk and Slabbekoorn 2009). Nonetheless, the selection pressures imposed by anthropogenic noise were absent when vocal communication in birds and other species evolved. Acoustic adaptations of this kind observed in affected species today are therefore recent changes in evolutionary time, reflecting responses to an environment altered by human activity, and the original vocalisations of these animals may more faithfully represent the frequencies detected in the ancestral environment. Moreover, local vocal adaptations within a population that overcome human disturbance and environmental change become analogous to the ‘dialects’ found in different, isolated populations of songbird species (Derryberry 2009; Podos and Warren 2007; Baker and Cunningham 1985). Furthermore, physical factors such as body size or vocal-tract conformation, which may affect the frequency-related parameters of an individual’s vocalisations, can be accounted for by more recent developments in cognitive science, which view cognition as a process based on neural networks and Bayesian probabilities rather than a series of computational functions (Annis and Palmeri 2018; Clark 2013).
Traditional theories of communication assume that larger body size results in lower-pitched vocalisations (Fitch 1997). If this were the case, ‘frequency-mimicking’ would appear impossible, because an individual’s anatomy and physiology would prevent it from faithfully reproducing environmental sensory stimuli. A unified mechanism of sensory perception, however, allows a receiver to evaluate a signaller’s communication on the basis of a sample of its vocal phonemes, and to make Bayesian predictions about the relative, rather than absolute, frequencies in signals from that specific individual (Brown and Brüne 2012). A larger animal could then reproduce a sound it had perceived in the environment at a pitch scaled to its body size, so that the mimicked sound would be a transposition rather than a faithful reproduction. Further, more recent studies are finding that body size is not necessarily an indicator of vocal pitch in all species (Budka and Osiejuk 2013; Patel et al. 2010; Fitch 1997), so differing body sizes between individuals in a population need not be a barrier to testing for a unified mechanism of sensory perception. This is particularly relevant in social species, where individuals often use multimodal sensory perception when interacting with one another (Takagi et al. 2019; Albanie et al. 2018; Nakamura et al. 2018). By drawing on more than one sensory modality, receivers can combine several sources of sensory data to relate another individual’s body size to the relative frequencies in its vocalisations, compared with those of a conspecific of different size. Receivers could thereby account for the lower pitch of vocalisations made by a larger signaller, and the higher pitch of those made by a smaller one.
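The idea that a receiver attends to relative rather than absolute frequencies can be illustrated with a minimal sketch. Transposing a phrase multiplies every frequency by the same factor, so the ratios (musical intervals) between notes are unchanged. The Python below assumes, for illustration only, that a receiver normalises each signaller's notes against that signaller's own mean log-frequency. This is a simple normalisation, not the Bayesian inference described in the text, and the note frequencies and the helper names `relative_contour` and `same_contour` are hypothetical.

```python
import math


def relative_contour(freqs_hz):
    """Express each note in semitones relative to the signaller's own mean pitch.

    Pitch is taken on a log scale (12 semitones per doubling of frequency),
    so multiplying every frequency by a constant shifts all values equally,
    and that shift cancels when the mean is subtracted.
    """
    semitones = [12 * math.log2(f) for f in freqs_hz]
    centre = sum(semitones) / len(semitones)
    return [s - centre for s in semitones]


def same_contour(a, b, tol=1e-9):
    """True if two phrases have the same relative pitch pattern."""
    ca, cb = relative_contour(a), relative_contour(b)
    return len(ca) == len(cb) and all(
        math.isclose(x, y, abs_tol=tol) for x, y in zip(ca, cb)
    )


# Hypothetical phrase (Hz) produced by a smaller signaller
small_bird = [4000.0, 5000.0, 6000.0, 4000.0]
# The same phrase transposed down one octave by a larger signaller
large_bird = [f / 2 for f in small_bird]
# The same notes in a different order, i.e. a different phrase
reordered = [4000.0, 6000.0, 5000.0, 4000.0]

print(small_bird == large_bird)              # False: absolute frequencies differ
print(same_contour(small_bird, large_bird))  # True: relative pattern is preserved
print(same_contour(small_bird, reordered))   # False: the pattern itself differs
```

Under this assumption, a receiver comparing a large and a small conspecific would recognise the same phrase despite the larger individual's lower overall pitch, while still distinguishing genuinely different phrases.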

Conclusion

To conclude, a unified hypothesis of sensory perception may offer an explanation for the evolution of birdsong and other forms of syntactic animal vocal communication, in that they may be species-specific forms of simple non-Saussurean languages. Animal ‘songs’ could describe olfactory, auditory and visual stimuli in the environment via a process of frequency-mimicking, whereby the spectral characteristics of sensory stimuli are perceived and then reproduced as a vocal communication. The frequencies and associated acoustic parameters, such as bandwidth and amplitude, would then act as the syntactic elements of the language. There is certainly a move toward theories of ‘embodied cognition’ or ‘enactive cognition’ (Setti and Borghi 2018), which describe more complex processes of cognition and communication than did previous cognitive theories. Enactivism also rests on the premise that cognition (and thus communication) is a complex interplay of body, environment and mind, rather than a series of computational, algorithmic neurological processes (Thompson 2005). Moreover, on grounds of parsimony, it is arguably more likely that a single system of sensory perception diversified into separate, highly specialised systems, resulting in an integrative yet finely tuned system that can detect a range of physical phenomena. This gives an organism a multimodal picture of the world and its environment, both external and internal. Further, a more plastic and fluid view of sensory perception based on quantum mechanics and mechanotransduction would open fruitful avenues of research into phenomena such as synaesthesia, the underlying mechanisms of which are still not clearly understood (Weiss and Fink 2008), as well as the emotion-colour associations found in human language, such as ‘seeing red’, ‘feeling blue’, ‘yellow bellied’ and ‘green with envy’ (Sandford 2014; Mohammad 2011).
If the origins of language are to be found in acoustic expressions of environmental frequencies, the human perception of musicality, and human emotional reactions to music, could potentially also be explained. It is also notable that birdsong and whale song function in both territory defence and mate attraction, and that the colour red is often associated with both aggression and sexual attraction. It is therefore possible that aggression and sexual attraction, rather than being discrete phenomena, are motivational-emotional states at opposite ends of the same neuro-affective spectrum. Given the advances in quantum theories of mind (Stapp 2004), a unified mechanism of sensory perception would be an interesting avenue to explore, and might yield surprising results and a novel framework in which to interpret existing data. Indeed, Nießner et al. (2011) and Nießner et al. (2013) have already reported experimental evidence that UV receptors in poultry may also act as magnetoreceptors. A unified concept of sensory perception thus provides a basis for extensive future scientific study of cognitive sensory data processing and communication in both animals and humans.