Introduction

The origin and phylogeny of the vertebrate ear is one of the best documented and most fascinating evolutionary stories to have emerged over the last century. The ancestry of the mammalian middle ear, in particular, became a model example of evolutionary transformation of structure and function and is superbly documented in a large array of fossil ancestors (see, e.g., Watson 1953; Wang et al. 2001; Kemp 2005; Takechi and Kuratani 2010; Meng et al. 2011). In the last 20 years, new fossil finds have expanded our understanding of the evolution of middle and inner ears of vertebrates. Compared to inner ears, middle ears are relatively easy to study, as their components are often exposed. In the past, however, the study of fossil inner ears involved techniques such as sectioning, which destroyed specimens and was obviously only possible in the rare cases where many specimens were available. During the last 10 years, a greatly improved method of nondestructive investigation of the inner ear (and, indeed, of any internal bony spaces) became available: micro-CT scanning. With this technique, small differences in density between the material in the inner ear spaces and the material that has replaced the bone can be studied nondestructively in three dimensions and with a spatial resolution of less than 10 μm. It is no surprise that the technique has been eagerly taken up by paleontologists, and some remarkably well-preserved specimens now provide important new insights into the long history of the evolution of the mammalian cochlea (e.g., Vater et al. 2004).

Much of the relevant paleontological literature will not be familiar to most hearing scientists and many might find it difficult to place recently discovered fossils and their structures in context. Thus, the aim of this review is to summarize for the “hearing research community” the recent literature on cochlear evolution as it applies to mammals and to provide one possible framework for understanding functional–physiological consequences of structural changes observed in several mammalian lineages over the very long period of mammalian evolution.

What distinguishes the hearing abilities of many modern mammals from those of nonmammals is not their sensitivity or frequency selectivity—some nonmammals can match mammals in these respects (Manley 1973)—it is the fact that most recent mammals are sensitive to ultrasonic frequencies. To understand how evolution enabled most, but not all, modern mammals to do this, three important lines of evidence will be followed over more than 200 million years (Ma) of evolution. These will be discussed in parallel—the evolution of the (1) middle ears, (2) cochlear ducts, and (3) prestins of the different mammalian lineages. These three approaches are broadly consistent and complementary and enable a unified understanding of the evolution of mammalian hearing.

Mammals are Diverse and Have a Very Long History

There have been few global attempts to understand the evolution of the mammalian cochlea. One influential review (Masterton et al. 1969) concentrated on the question of the historical background of human hearing and its broader implications. The authors concluded that “…the results show that high-frequency hearing (above 32 kHz) is a characteristic unique to mammals and, among members of this class one which is commonplace and primitive” (italics added; note that Masterton et al.'s definition of high-frequency hearing of >32 kHz is higher than that used in this review). While agreeing with the statements that hearing above 32 kHz is unique to mammals and obviously commonplace among them, the concept that it is “primitive” or ancestral ignores fully the first half of mammalian evolution and, indeed, all nontherian mammals, both extinct and modern.

The term “mammal”, while easy to define for modern, extant species, is extremely resistant to a clear definition for fossil species and there is still no agreed-upon version that satisfies all paleontologists, even after analyses based on more than 150 morphological features (e.g., Rowe 1988; Kemp 2005; Wible 1991). One of the difficulties lies in the fact that what we now regard as clearly mammalian features (such as hair, milk glands, particular types of dentition, a secondary palate, and a host of other skeletal attributes) did not arise at the same time. Indeed, they evolved over many millions of years and many, such as a heterodont dentition, were already found in the ancestors of mammals. Thus, the choice of a “beginning” for mammals depends on which feature or features are chosen to define the group. Since we are concerned only with hearing, it is convenient here to define a mammal as an organism with a three-ossicle middle ear. The origin of the components of this middle ear has been known for a very long time (review, e.g., in Manley 2010). This definition places the beginning of mammal evolution at about 230 Ma before the present time, in the Triassic period of the Mesozoic. During the Triassic, for reasons not yet understood, all land vertebrate groups, independently of each other, developed a tympanic middle ear (Clack 2002; Manley and Clack 2004). Since only mammalian ancestors were simultaneously changing the constitution of their jaw joint, a morphological transformation that freed up a number of bones at the back of the jaw, only mammals established a middle ear containing three ossicles, rather than just one. More than 100 Ma later, this proved to be a very useful preadaptation, since it was conducive to the evolution of high-frequency hearing.

During the Triassic, one or more lineages of mammal-like “reptiles” (known as synapsids and recognizable from and named for their characteristic skull structure) became mammals. While it is likely that this only happened once (making the mammals monophyletic), this is still debated; as yet, it is not certain that mammals, as here defined, are not in fact biphyletic (e.g., Rich et al. 2005; Martin and Luo 2005). The first mammals lived in a world dominated by dinosaurs, some of whom were, of course, very large, and it has been assumed that the first mammals were, therefore, generally nocturnal, assuming a lifestyle that enabled them to avoid dinosaur predators. Small mammals were, however, not interesting to large dinosaurs as prey, but rather to small dinosaurs, and evidence (e.g., eye structure, Schmitz and Motani 2011) points to some dinosaurs being nocturnal or at least crepuscular. It is, therefore, not easy to know how dinosaurs acted as a selection pressure on mammalian evolution. All early mammals were small, some very small (shrew and mouse size) and no doubt fast and agile. Within a short time, and assuming monophyly, they diverged into three main lineages that will, to avoid difficult names and conflicts in nomenclature, not be named here but can be seen in Figure 1 as the lineages that led over a time period of 240 Ma to the modern egg-laying mammals or monotremes (on the left of the figure), to the multituberculates (middle), and to the therian mammals (right, placentals and marsupials).

FIG. 1

Schematic summary of the status of the cochlea throughout mammalian evolution. Mammals, with their various characteristic traits, arose during the Triassic and quickly gave rise to a number of separate lineages. During the Jurassic, the length of the cochlear canal rarely exceeded a few millimeters in all lineages. Multituberculate mammals died out in the late Cenozoic, still having very short uncoiled cochleae. Monotreme mammals (of uncertain relationships) retain cochleae of maximally ~8 mm length to the present day. Their cochleae are uncoiled and the primitive organ of Corti is not supported by bony ridges. The lineage leading to therian mammals and their early dryolestid relatives evolved a bony support for the basilar membrane in the Jurassic and continued to coil the cochlea that, by the early Cretaceous, achieved one full coil. Three outline sketches of cochleae are shown for text block numbers 5, 6, and 7; each sketch reflects an interpretation of the shape of the soft tissues of the cochlea. In each sketch, the apex is on the right side.

From the beginning, these three lineages had their own evolutionary trajectories. If we were to see them now, their external appearance would prompt us to call them mammals (although some, such as the monotremes, lack pinnae). If we were only to see the gross morphology of their middle and especially their inner ears, however, few of us would immediately recognize them as mammalian. Thus, one very important conclusion from the evolution of the ears of the earliest mammals is that great care needs to be taken when discussing what is “mammalian.” A coiled cochlea and a very delicate middle ear suspended by ligaments in an air space (Sim and Puria 2008) did not evolve for more than 100 Ma after mammalian origins and are, thus, only part of the whole story and only in one of the three lineages. This can be illustrated by a brief discussion of the group Multituberculata.

Multituberculates

This now extinct lineage was large (more than 100 fossil species are known) and existed for an astounding 200 Ma, parallel to the other mammals (Kemp 2005). Animals of this lineage may be used to make one extremely important point: the evolution of what are now generally considered to be typically mammalian hearing characteristics (a highly sensitive middle ear and a well-developed, coiled cochlea) was in no way an inevitable consequence of being a mammal. Indeed, evidence points to the fact that the multituberculates, in every way a successful group of mammals, never developed their middle and inner ears much beyond the initial stages. After 200 Ma, they still had robust middle ear bones and a cochlea that was both uncoiled and remarkably short (2 to 6.5 mm) and that, in some cases at least, likely still had a lagenar macula (Fig. 1; Luo and Ketten 1991; Fox and Meng 1997; Hurum 1998). Comparing this lineage to that which led to modern mammals, we have to admit that there was nothing inevitable about the evolution of mammalian hearing characteristics.

Monotremes

This mammalian lineage, for which more fossil than extant species are known, is much more familiar to us and almost everyone knows the platypus (Ornithorhynchus) and the spiny anteaters (e.g., Tachyglossus). These mammals make clear that, e.g., “giving birth to live young” is not a definitive mammalian characteristic, a fact that, when discovered in 1884, was a zoological sensation. Monotremes, together with the multituberculates, can also make clear that there are many grades of middle and inner ears within mammals that all need to be taken into account before making assumptions concerning their evolution.

Fortunately, there are studies of hearing in these modern monotreme representatives (Aitkin and Johnstone 1972; Gates et al. 1974; Ladhams and Pickles 1996; Mills and Shepherd 2001) and from them, we can learn much more about the soft-tissue evolution of the mammalian cochlea and its function. In one respect, the monotreme hearing organ resembles that of multituberculates, being not coiled and relatively short (4.4 to 7.6 mm in length, Ladhams and Pickles 1996). The soft tissues do not conform fully to the shape of the bony canal and are, near the tip, slightly coiled in the opposite direction to the therian (placental and marsupial) cochlea. Modern monotremes have a peculiar middle ear that has been described as being very stiff (Aitkin and Johnstone 1972; Gates et al. 1974). Hearing in monotremes is, for mammals of their size, restricted to lower and middle frequencies. Audiograms were described as V-shaped, centered at 5 kHz (Gates et al. 1974) or U-shaped, with rapid loss of sensitivity below 3 kHz and above about 16 kHz (Mills and Shepherd 2001).

Interestingly, there is also a lagenar macula at the apical end of the monotreme cochlear canal, correlating with its likely presence in the multituberculate cochlea and, indeed, in the cochleae of all early mammals. A lagena macula is, of course, also found at the apical end of all modern avian and “reptilian” cochleae. These features of the cochleae of two lineages of mammals clearly indicate that there is not only one type of cochlear configuration in mammals. Indeed, the coiled cochlea, considered by many to be archetypically mammalian, arose only in one of the three lineages, the therians, and only after 100 Ma of mammalian evolution in that lineage (Meng and Fox 1995). We will return to the monotremes below to discuss the soft-tissue characteristics of their hearing organs.

Early Middle Ears and the Hearing of Mammalian Ancestors

Little space will be spent discussing the middle ears of mammals, since this topic has been reviewed before (e.g., Luo 2007; Manley 2010). Suffice it to say in the present context that, while it has been a controversial point, it can be concluded that the early mammalian middle ear was not a very efficient transmitter of sound. In particular, the malleus, originally derived from the primary jaw joint (articular bone), remained attached to the lower jaw for the first half of mammalian evolution and, indeed, has essentially this relationship in modern monotremes, at least up to hatching. Thus, several stages have been recognized in the evolution of the middle ear (Luo 2007).

The early mammal tympanic middle ear was, of course, far better than not having a true middle ear at all, as in their ancestors, but it was restricted to poor sensitivity and to low frequencies (Kemp 2007). This conclusion is drawn from the very short papillae, the size of the “stapes,” the bony component connecting the quadrate to the inner ear of these species, and the intimate connection of the middle ear to the lower jaw (e.g., Kemp 2007). Even the few authors who suggest that the resultant early mammal tympanic middle ear may have transmitted relatively high frequencies (e.g., Hurum 1998) accept that it was not very sensitive to airborne sounds. For the middle ear of the very early mammal Morganucodon (which many regard as a mammaliform and not a true mammal), some authors, e.g., Rosowski and Graybeal (1991), came to the conclusion that its stiffness was compatible with high-frequency transmission. This suggestion was later challenged, however, as it was discovered that the skulls studied initially had been distorted during fossilization (reviewed in Hurum 1998). One suggestion (Rosowski 1992) was that this middle ear perhaps behaved like that of modern monotremes and transmitted a relatively narrow bandwidth of sound centered near 7 kHz. This would be in line with the revised data on Morganucodon, which indicate a more mobile and less stiff set of ear ossicles than originally thought (Hurum 1998). It should be noted, however, that in Morganucodon, the malleus was still firmly attached to the dentary (lower jaw) and the important assumption that the malleus of Morganucodon had a “long arm” producing a significant lever ratio (Rosowski 1992) has recently been shown to be very unlikely (Meng et al. 2011 Supplement). In Morganucodon, the middle ear also had a function in the support of the jaw and was not sufficiently evolved to even be termed a transitional middle ear, let alone a definitive mammalian middle ear (Meng et al. 2011). Morganucodon, however, was not close to the lineage leading to therian mammals, so studies of its middle ear may not be indicative for all early mammals.

The middle ear is, of course, only one component determining the frequency response of the entire ear. Middle ear response characteristics are influenced by the ability of the inner ear to process the frequencies being transmitted by the middle ear (Hemilä et al. 1995; Manley 1972, 1973; Ruggero and Temchin 2002; Lavender et al. 2011). Thus, any conclusions regarding frequency responses need to take the inner ear into account. Roughly for the first half of mammalian evolution, and longer in some lineages, the cochlea remained very short indeed (only 1.5 to 2.5 mm, excluding the lagena macula; Graybeal et al. 1989; Luo et al. 2010). In modern species of vertebrates, a cochlea this short is incompatible with high-frequency hearing. Indeed, the cochlea of most birds is twice as long as this (5 mm) and their upper frequency limit is from 6 to 10 kHz. Modern monotremes, with a cochlea up to 8 mm long, still have an upper frequency limit below 20 kHz (Gates et al. 1974; Mills and Shepherd 2001). The fact that some modern therian mammals with relatively short cochleae (mouse, 6–7 mm) hear very high frequencies (Rosowski 1992) is not very relevant, since their middle and inner ear systems are the result of a further 200 Ma of evolution.

One way to examine the hearing of mammalian ancestors is to use the standard and powerful technique of evolutionary cladistic studies known as the outgroup analysis. This compares the existence of two states (here, high-frequency hearing or not) within one closely related group (here mammals) and compares this to the condition in an outgroup. An outgroup is always the next most closely related lineage. Here, the outgroup would be other amniotes, such as “reptiles” and birds. Since all other amniotes possess only low-frequency hearing, the analysis accepts this as the ancestral state for all amniotes.
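The logic of the outgroup comparison can be sketched in a few lines of code. The example below uses Fitch parsimony, which is one common way to formalize such a comparison and is not necessarily the procedure used in the studies cited here. The tree shape, the lineage labels, and the two-state coding ("low" versus "high" upper hearing limit) are simplifying assumptions made only for illustration; in particular, all modern therians are coded as "high" even though not all of them hear ultrasonic frequencies.

```python
# Minimal Fitch-parsimony sketch of an outgroup comparison.
# Tree shape, labels and state coding are illustrative assumptions.

def fitch_sets(tree, leaf_states, sets):
    """Bottom-up pass: store the set of candidate states for every node."""
    if isinstance(tree, str):                      # a leaf (named lineage)
        result = frozenset([leaf_states[tree]])
    else:                                          # an internal node (left, right)
        left = fitch_sets(tree[0], leaf_states, sets)
        right = fitch_sets(tree[1], leaf_states, sets)
        common = left & right
        # Keep shared states if any exist; otherwise keep both alternatives.
        result = common if common else left | right
    sets[tree] = result
    return result


def assign_states(tree, parent_state, sets, states):
    """Top-down pass: prefer the parent's state whenever it is allowed."""
    options = sets[tree]
    if parent_state in options:
        state = parent_state
    else:
        state = sorted(options)[0]                 # arbitrary but repeatable tie-break
    states[tree] = state
    if not isinstance(tree, str):
        assign_states(tree[0], state, sets, states)
        assign_states(tree[1], state, sets, states)


# Ingroup: two living mammalian lineages. Outgroup: other amniotes.
mammals = ("monotremes", "therians")
tree = (mammals, "other_amniotes")
leaf_states = {
    "monotremes": "low",       # upper limit below 20 kHz
    "therians": "high",        # simplification: treated as uniformly high
    "other_amniotes": "low",   # "reptiles" and birds
}

sets = {}
fitch_sets(tree, leaf_states, sets)
states = {}
assign_states(tree, None, sets, states)

print("ancestral amniote:", states[tree])
print("ancestral mammal: ", states[mammals])
```

Taken alone, the two mammalian lineages leave the ancestral mammalian state ambiguous; it is the low-frequency outgroup that resolves it. This is why the choice of outgroup carries so much weight in this kind of argument.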

This result, together with evidence from the evolution of middle ears (review in Manley 2010) and of prestins (see below), leads to the conclusion that, for the first half of their evolutionary history, and contrary to the suggestion of Masterton et al. (1969), mammals did not hear high frequencies. Instead, in monotremes and multituberculates, the upper limit always remained below 20 kHz and for at least 50 Ma of the therian lineage, there was probably only a gradual increase in the upper limit of hearing to 20 kHz (e.g., Vater et al. 2004). Once full cochlear coiling had been achieved, the middle ear had evolved to the freely suspended form of modern therians, and prestin evolution had made a major leap forward (see below), high-frequency hearing evolved. It is today represented in therian cochleae with lengths between 7 mm (mouse) and >50 mm (baleen whales). The actual relationship between cochlear length and hearing range varies between therian groups, but can be quite close within one group, such as primates (West 1985; Rosowski 1992). Since, however, the width of the basilar membrane also correlates with frequency response (Manley 1973), cochlear length alone is not a reliable indicator of frequency range. This can be conveniently illustrated by a comparison of the human cochlea with those of some dolphins. Both cochlear types have the same length, but the basilar membrane in dolphins is only about half as wide. The result is a huge difference in the upper frequency limit, with some dolphins exceeding 100 kHz (Manley 1973).

Early Inner Ear Soft Tissue Structure

Obviously, no old fossil provides remnants of soft tissues except as far as they influence or are shaped by bone. It is, however, possible to use the cladistic outgroup analysis method to investigate comparative structural questions regarding the soft tissues of the hearing organ. If we compare the structure of the cochleae of modern therian mammals with that of modern monotreme mammals and these again to the structures in nonmammals, we come to the conclusion that all modern mammals have similar and unique structural features (synapomorphies) and all their hearing organs deserve to be called “organs of Corti.” No nonmammals have anything similar.

In monotremes, as in therians, there are clearly two groups of hair cells that lie on the inside and outside of pillar cells. This is the basic framework of an organ of Corti—the differences within mammals being that in monotremes, the numbers of cells in any cell group and in a cross section are larger. Thus, there are three or four rows of pillar cells, four to five rows of inner hair cells, and six or seven rows of outer hair cells in a single transverse section (Ladhams and Pickles 1996). An independent origin of this configuration in monotremes and therian mammals is of course possible but considered extremely unlikely. This suggests that the basic structure of the organ of Corti was already established before the origin of mammals as defined here (Fig. 1). The unique configuration of the organ of Corti, thus, did not in itself automatically confer high-frequency hearing. The early organ of Corti was likely to have been a low- to mid-range frequency receptor receiving input from an insensitive middle ear. It can, however, be viewed as preadapted to facilitate the conditions for high-frequency and, indeed, high ultrasonic hearing. As we note below, this required the interaction of two hair-cell populations, and these populations already existed from the origin of mammals. It also required further changes in cell and protein structure during evolution (see below). Exactly what the functions of the two hair-cell populations of early mammals were can only be speculated upon. The unique configuration of the organ of Corti was, thus, established very early in mammalian evolution and was successful, with the consequence that the basic structure of modern therian hearing organs shows very little variation indeed.

The earliest mammalian cochleae were, thus, very short (2 mm; Fig. 1) bony tubes having smooth walls and harboring a lagena macula at the tip. There was, thus, space for perhaps 1.5 mm of basilar membrane surmounted by the organ of Corti. Many modern lizards and, of course, birds have papillae that are longer than this (Manley 1990).

A Decisive and Unique Step in Evolution: the Integration of Hard and Soft Tissues

One feature of the therian mammalian cochlea that has previously received little attention and that seems decisive to the present author is the unique integration of soft and hard tissues. As noted above, the earliest mammalian cochleae (as those also of modern nonmammals) were smooth-walled bony canals that had no firm contact to the soft tissues. In modern nonmammals, it is possible to insert a small hook into scala tympani and pull out the entire cochlear “tube,” something inconceivable in therian mammals. Yet it would have been possible in the earliest mammals and would even today be possible in monotremes. In only one mammalian lineage, that of the therians, did this change. Late Jurassic fossil mammalian cochleae (that is, after ~80 Ma of mammalian cochlear evolution), in the lineage leading to therian mammals, show a dramatic change in cochlear structure. Even though their cochleae were only marginally longer (~3 mm) than in their ancestors, coiling had reached about 270° (Fig. 1, no. 6), and CT scans reveal that the bony canal wall had become integrated into the soft tissue, as in modern therians. There were both primary and secondary laminae, presumably supporting the inner and outer edges of the basilar membrane, and the cochlear ganglion was itself enclosed in a canal within the bony wall. The nerve fiber bundles passed through clear openings in the bone to enter the organ of Corti (Ruf et al. 2009; Luo et al. 2010).

It is, of course, difficult to speculate upon the selective forces at play that led to this development. Vater et al. (2004) suggested that “development of the primary osseous spiral lamina probably resulted from the coiling of the cochlear canal.” It is likely that, given the stiffness of the middle ear apparatus of earlier mammals, an increase in the stiffness of the basilar membrane would have produced an improved impedance match between middle and inner ears and thus improved sensitivity. Contrary to the suggestions of some authors (e.g., Luo et al. 2010) on the basis of the distribution of cochlear laminae in modern therians, it is highly unlikely that the presence of bony laminae in these early cochleae immediately enabled very high-frequency hearing in these species. The extremely short lengths of these cochleae, the status of the prestins (see below), and a comparison to the hearing ranges of other vertebrate groups suggest, at best, a modest increase in the upper frequency limit of hearing at this stage, but a very useful improvement in sensitivity would have resulted from a better impedance match between middle ear and cochlea. In many modern placental groups, such as anthropoid primates (Coleman and Boyer 2012), including species that hear high frequencies, secondary bony laminae are not seen and their distribution in mammalian cochleae is by no means uniform. In any case, it is possible that the changes in cochlear impedance due to changes in the suspension of the basilar membrane would have had their largest effect below 20 kHz (Ravicz et al. 2010). This unique change in bone distribution in the cochlea is seen in the dryolestid lineage at around 160 Ma. It is, thus, possible and indeed likely that it occurred in the therian lineage before the split into placentals and marsupials (~130 Ma; Fig. 1).

Thus, during the Jurassic, the ancestors in the therian mammal lineage increased the length of their cochleae moderately (Fig. 1), continued the partial coil, integrated bone and soft tissue, and probably (and this is an assumption based on later developments) reduced the number of cells across the organ. During and after this time, the middle ear slowly evolved towards lighter, more freely suspended ossicles, but not uniformly. Two types of eutherian middle ear, with many intermediates, have been recognized (Fleischer 1978; Lavender et al. 2011), a “microtype” in small mammals and a “freely mobile” type in medium to large mammals. Ossicular rotational axes differ between species (Puria and Steele 2010) and scaling with animal size is a general, but not universal, principle (Hemilä et al. 1995).

The Origin and Trajectory of Therian Mammals

Any modern mammal that does not lay eggs is a therian. Thus, this group includes the confusingly named “placentals” (Eutheria; note that marsupials and, indeed, many diverse groups of animals also have some sort of placenta) and the pouched or marsupial mammals (Metatheria). The therians originated in the early Cretaceous (~125 Ma; Fig. 1) and soon after that split into the two modern lineages (Ji et al. 2002; some molecular analyses place these dates earlier, see, e.g., Woodburne et al. 2003). An examination of their middle and inner ears shows so many similarities that we may conclude that by the time they split (1) the middle ear consisted of freely suspended ossicles (bullae came later and arose multiply, see below), (2) the cochlea had already achieved at least one full coil, and (3) the organ of Corti had essentially achieved its modern structure. In spite of the many similarities, unique features characterize fossil ear regions of placentals and marsupials (Wible 1990).

It is very likely that soon after a full coiling of the cochlea was achieved, the resulting spatial restrictions at the cochlear tip led to the loss of the lagena macula, a loss unique to therians. Wible et al. (2001) describe a cochlea in the very early eutherian Prokennalestes (early Cretaceous) that had exactly 360° of coil (Fig. 1). In that specimen, the diagrams indicate that when measured along the center of the cochlear canal, the cochlea was about 4 mm long, far shorter than in any modern therian. The tip of the cochlea was, at that stage, not narrowed, and it is not clear whether there was a lagenar macula or not; Wible et al. (2001) suggest not. In a marsupial from later in the Cretaceous (~80 Ma; Meng and Fox 1995), the cochlea was 7.3 mm long and turned through 1.25 coils. The tip of the cochlea also still had the same diameter as the base. Modern therian cochleae are tapered towards the apical end and have at least 1.5 coils (e.g., mice).

There have been a number of theories concerning potential effects, or even advantages, of cochlear coiling on hearing. Coiling has long been seen simply as an efficient way of packaging a longer auditory organ. If there are other effects, then they will, to a large extent, be accidental ones that are the result of, and not a driving force for, coiling. Evolutionary processes cannot be anticipatory but do, of course, take advantage of structural changes. Potential effects of coiling on the responses of the tectorial membrane were discussed by von Békésy (1960) and Gavara et al. (2011). West (1985) also suggested that coiling minimizes length differences between afferent fibers innervating different positions along the organ of Corti and thus produces equal travel times for afferent information. However, the nervous system is quite capable of equalizing travel times using fiber length and diameter (as seen in an extreme case in the inputs to avian nucleus laminaris; Carr 2004).

While it may be thought that coiling improves high-frequency hearing (and it does so indirectly, of course, by enabling enormous cochlear elongation), Gavara et al. (2011) reached the opposite conclusion, namely that “the cochlear spiral geometry is a major determinant of low-frequency hearing.” Except within recent rodents, there is no simple relationship between basilar membrane length and the number of cochlear turns (West 1985), suggesting that optimization of cochlear form varies between mammalian groups. There is, however, a correlation between the logarithm of basilar membrane length and the range of octaves processed (West 1985). Coleman and Boyer (2012) report a weak correlation between cochlear length and low-frequency sensitivity (at 250 Hz). In modern eutherians, there is an inverse correlation between basilar membrane length and the high-frequency limit (West 1985; Rosowski 1992). These correlations in modern species cannot, however, be assumed to be relevant to the hearing of species of 200 Ma ago that had different middle and inner ears, and less evolved prestins.
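For readers less used to thinking in octaves, the short sketch below shows how a hearing range expressed in kilohertz converts to a range in octaves, each octave being a doubling of frequency. The function name is hypothetical, and the input values are simply figures quoted elsewhere in this review (the 3 to 16 kHz band between the steep flanks of the monotreme audiogram, and the step from 20 kHz to the 100 kHz reached by some dolphins); they serve only as an illustration and are not a reanalysis of any data set.

```python
import math


def octave_span(f_low_khz, f_high_khz):
    """Return the number of octaves between two frequencies.

    One octave corresponds to a doubling of frequency, so the span is
    the base-2 logarithm of the frequency ratio.
    """
    if f_low_khz <= 0 or f_high_khz <= f_low_khz:
        raise ValueError("expected 0 < f_low_khz < f_high_khz")
    return math.log2(f_high_khz / f_low_khz)


# Illustrative values taken from figures quoted in the text.
monotreme_band = octave_span(3.0, 16.0)        # between the audiogram flanks
dolphin_above_20khz = octave_span(20.0, 100.0)  # extension beyond 20 kHz

print(f"monotreme band:        {monotreme_band:.2f} octaves")
print(f"dolphin above 20 kHz:  {dolphin_above_20khz:.2f} octaves")
```

Because the scale is logarithmic, the extension from 20 to 100 kHz adds roughly as many octaves as the entire band between the flanks of the monotreme audiogram, even though it spans far more kilohertz.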

The eventual loss of the lagena macula from the cochlear apex at some time during the Cretaceous (and thus the elimination of otoliths from this endolymphatic compartment) enabled therian mammals to dramatically reduce the concentration of calcium in the cochlear endolymph to micromolar levels. Such a large change in calcium levels perhaps played a hitherto underestimated role in the evolution of the tectorial membrane (which is highly sensitive to the ionic environment, Kronester-Frei 1979), the mechanosensory channels of the hair cells (e.g., Beurg et al. 2010), and perhaps the properties of prestins (e.g., Elgoyhen and Franchini 2011). To reach conclusions regarding the frequency responses of these early therian organs, we need to briefly discuss the evolution of mammalian prestins.

Prestin and Its Relevance to Mammalian Frequency Limits

Prestins have been described from all vertebrate groups, including fishes, and there is a general consensus that this molecule began its evolution as a transporter (Dallos and Fakler 2002). Prestins occur in very high concentrations only in the lateral membranes of the outer hair cells of mammals, but not in such concentrations in inner hair cells or in the hair cells of nonmammals (Köppl et al. 2004). In land vertebrates, a second function of prestin, molecular motility, evolved towards a greater emphasis on a motor function, but showed very different degrees of development in different lineages. In some nonmammalian groups, such as chickens and to some extent monotreme mammals, both transporter and motor functions are evident, but the dramatic development of strong motor forces within a relevant range of cellular membrane potentials, as seen in therian mammals, did not evolve (Franchini and Elgoyhen 2006; Elgoyhen and Franchini 2011; Tan et al. 2011). In the mammalian cochlea, the motor system involving prestin is more important at high frequencies (Hudspeth 2008), perhaps due to the unique cellular structure of the organ of Corti that permits the prestin motor system to amplify the movements of the entire organ of Corti.

The present data suggest that before therians evolved, mammalian prestins (presumably such as those shown in monotremes today) were not so highly specialized for a motor function at very high frequencies (Tan et al. 2011). Remarkably, highly evolved and specialized prestins that are found in modern ultrasound–echolocation species such as bats and toothed whales show many of the same sequence changes over evolutionary time, but evolved independently and during different geological time periods (Li et al. 2010; Liu et al. 2010). The evolution of acetylcholine receptor systems that control the outer hair-cell efferent feedback correlates with the evolution of prestins in therians (Elgoyhen and Franchini 2011) and suggests a parallel evolution of control systems, on the one hand, and a motor system, on the other hand, in the mammalian inner ear. It would be very interesting to see whether bat and whale efferent systems show parallel evolution of modified acetylcholine receptors. Thus, the history of prestins within the various mammalian groups also suggests that high-frequency hearing was not ancestral in mammalian evolution and that very high-frequency hearing only evolved in therians.

When Did Mammals Develop High-Frequency and Ultrasonic Hearing?

Although it has always been a tacit assumption that most mammals hear very high frequencies, the evolutionary evidence indicates that high-frequency hearing is both relatively restricted and moderately recent. The above conclusion concerning the evolution of prestins is an important one, since it supports the structural data indicating that early mammals heard essentially low to intermediate frequencies.

High-frequency hearing apparently developed rapidly beyond 20 kHz after therians evolved their delicate middle ears, prestins, and coiled cochleae to a degree resembling those in modern species (<100 Ma). One of the decisive selection pressures for, and consequences of, improved high-frequency hearing was the ability to use interaural differences for sound localization (e.g., Heffner et al. 2001). Due to their different evolutionary trajectories, mammals lack a pressure gradient middle ear; that is, there is no wide buccal cavity that connects the middle ear spaces (Manley 2010; Christensen-Dalsgaard 2010). In many nonmammals, this connection allows even low-frequency sounds to interact across the head and creates directional effects before the inner ears are stimulated. In lizards, for example, the largest interaural differences are generally found between 1 and 3 kHz (Christensen-Dalsgaard and Manley 2005). In the small early mammals, the lack of such a directional system was a handicap that would have been more than compensated for by the evolution of pinnae, high-frequency hearing, and much-enhanced neural processing. Unfortunately, we do not know exactly when any of these features evolved, but the elongated cochlea and pinnae presumably predated the split of the eutherian and metatherian lineages and the later refinements of the neural pathways.
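The link between small head size and the need for high frequencies can be illustrated with a rough calculation. The following is only a minimal sketch under simplifying assumptions that are not taken from the text: sound travels in a straight line across the head, the speed of sound in air is about 343 m/s, and the head casts a useful sound shadow roughly when the wavelength is no longer than the head width. The head widths and function names are hypothetical, chosen for illustration.

```python
# Rough acoustics of interaural cues; all values are illustrative assumptions.

SPEED_OF_SOUND_AIR = 343.0  # m/s, assumed value for air at about 20 degrees C


def max_interaural_time_difference(head_width_m, c=SPEED_OF_SOUND_AIR):
    """Largest arrival-time difference (s), assuming a straight path across the head."""
    return head_width_m / c


def shadowing_onset_frequency(head_width_m, c=SPEED_OF_SOUND_AIR):
    """Frequency (Hz) at which the wavelength equals the head width.

    Used here as a crude threshold above which interaural level
    differences become substantial.
    """
    return c / head_width_m


if __name__ == "__main__":
    # Hypothetical head widths in metres, not measurements from any fossil.
    examples = [
        ("small early mammal (assumed 2 cm)", 0.02),
        ("large mammal (assumed 20 cm)", 0.20),
    ]
    for label, width in examples:
        itd_us = max_interaural_time_difference(width) * 1e6
        f_khz = shadowing_onset_frequency(width) / 1e3
        print(f"{label}: max ITD ~ {itd_us:.0f} us, shadowing above ~ {f_khz:.1f} kHz")
```

Under these assumptions, a head 2 cm wide yields a maximum time difference of only about 58 μs and a shadowing threshold near 17 kHz, whereas a head 20 cm wide yields about 583 μs and roughly 1.7 kHz. This is consistent with the argument that small mammals lacking a pressure gradient middle ear gained usable interaural cues mainly by extending their hearing to higher frequencies.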

Therian mammals showed numerous divergences of groups, both before and after the important K-T catastrophic event (~65 Ma) that led to the demise of dinosaurs and a loss of 65% of species worldwide (including mammals; Shoshani and McKenna 1998; Bininda-Emonds et al. 2007; Luo 2007). The history of the middle ear and the cochleae of the later divergent groups is by no means uniform. Some, such as the microchiropteran bats, took up flight as a means of prey capture and dispersal. The earliest bats (Simmons et al. 2008) did not have exceptional cochleae, suggesting that they did not echolocate, at least not to detect small prey. Later microchiropteran bat fossils show a dramatic increase in the relative size of their cochleae, similar to modern species, suggesting that the use of ultrasonic echolocation in this group developed less than 50 Ma ago. Microchiropteran prestins also evolved rapidly during this time and, as noted, developed characteristic molecular features that also evolved independently in toothed whales, whose evolution began much later (about 35 Ma ago; McGowan et al. 2009; Zhou et al. 2011). These whales were large, of course, but were able to use high frequencies because, as water dwellers, they were able to abandon their “land-lubber” middle ear (Nummela et al. 2007), whose frequency response is strongly correlated with body size. New evidence points to a more recent, explosive evolution of oceanic dolphins within the last 11 Ma (McGowan 2011).

Many terrestrial therian mammalian groups, such as primates, evolved during the Cenozoic towards larger body size, which would be expected to correlate with better low-frequency hearing sensitivity (Plassmann and Brindle 1992; Rosowski 1992). Indeed, for 35 Ma after the K-T event, there was an exponential increase in maximum mammalian body mass (Evans et al. 2012). The hearing systems of these mammals thus trended during this period more towards lower, rather than (only) higher, frequency sensitivity. In this respect, and in this respect only, the conclusion of Masterton et al. (1969) that lower-frequency hearing developed later in mammalian evolution is indeed appropriate when applied to the last phase of the evolution of larger-bodied therian groups such as primates (Armstrong et al. 2011; Coleman and Boyer 2012).

The variety of modern therian groups correlates with a great diversity of structure and physiology, even within animals of the same size (e.g., Heffner et al. 2001), which reflects more than 100 Ma of evolution. This can be illustrated by reflecting on the diversity shown by one structure, the “bulla.” Bullae are spaces around the middle ear ossicles that can, in relation to the head size, be quite large and influence the impedance of the tympanic membrane. Far from being uniform structures with a single origin, bullae obviously arose quite a number of times independently and are constructed out of different parts of bony, cartilaginous, and even membranous tissue components (Novacek 1977). Although it is generally true that larger mammals have a lower upper frequency limit than smaller mammals, remarkable specializations are seen, as in the low upper frequency limit (~13 kHz) of the very small mole rats (Hemilä et al. 1995; Müller et al. 1992).

Major Evolutionary Traits in the Evolution of Amniote Hearing Organs

The evolution of mammalian cochleae was, of course, preceded by more ancient cochlear structures and evolved in parallel to equivalent organs in amphibians, birds, and lizards (Fig. 2). The earliest land vertebrates had an auditory papilla that rested on solid tissue and presumably had an otolithic covering. In amphibians, the otolithic membrane was replaced by a tectorial membrane free of otoliths. In all the following lineages, a freely suspended basilar membrane originated, but in some, such as lepidosaurs (lizards and relatives), this membrane did not show local frequency tuning. In other lineages, the basilar membrane showed partial tuning, with only moderate frequency selectivity (birds and presumably early mammals). Only in the therian mammalian lineage did a bony support system originate for the basilar membrane and this was later accompanied by sharp frequency selectivity of the oscillations of the basilar membrane/organ of Corti complex (Fig. 2).

FIG. 2

Schematic representation of the major steps in the structural evolution of the cochlea of land vertebrates. Each stage shows an evolutionary trait accompanied, when appropriate, by a representative modern vertebrate group showing this trait or stage of development (this does not imply that these named modern groups form an evolutionary sequence!). The sequence should not be taken to imply that, for example, birds and crocodiles evolved directly from lizards. Rather, the animal groups named simply represent those species that remain at a particular stage of evolutionary development. As noted in the text, a structure that can be termed “organ of Corti” developed right at the beginning of mammalian evolution and is thus also to be found in monotremes. The final trait that enabled the evolution of high-frequency hearing in therian mammalian lineages was the fusion of hard and soft tissues in the cochlea, providing an improved impedance match to the stiff middle ears of early mammals.

Conclusions

Had multituberculate mammals survived until modern times, we would probably have found their hearing to be roughly comparable to, but perhaps somewhat less sensitive than, that of birds. Hearing in modern monotreme mammals is little better. As noted above, the existence of such groups should clearly remind us that the evolution of the remarkably sensitive and specialized ultrasonic auditory organs of some modern therian mammals was by no means an inevitability of mammal evolution. Indeed, a series of remarkable preadaptations that were of less consequence at the time of their origin (e.g., three-ossicle middle ear, integration of hard and soft tissues, cochlear coiling, prestin) were later essential as the basic framework for the evolution of elongated, highly sensitive auditory organs whose upper frequency limit in some cases exceeds 100 kHz.

Thus, three parallel series of developments over 150 Ma led to high-frequency hearing, and only in most modern therian cochleae: (1) The initially stiff middle ear retained structural aspects from its past but gradually became lighter and more freely suspended. (2) The initially very short cochlea was gradually elongated, and the soft tissues incorporated bony support elements of the basilar membrane and thus better matched the middle ear impedance. The cochlea coiled and eliminated the lagena macula. (3) Prestins gradually evolved into more effective components of a motor system specialized for effects in a useful range of cell potentials, with clear, further specializations in late-evolving echolocating species.