Review

Introduction

Try eating a crisp (or potato chip) without making a noise. It is, quite simply, impossible! The question to be addressed in this article concerns the role that such food-related eating sounds play in the perception of food or drink. Do you, for example, think that your experience of eating a crispy, crunchy, or crackly food differs as a function of whether you find yourself at a noisy party, or while listening to loud white noise (if you happen to find yourself in a psychologist’s laboratory; [1])? The sounds that we hear when we eat and drink, and their impact on us, constitute the subject matter of this article.

In the pages that follow, I hope to convince you that what we hear when we bite into a food or take a sip of a drink (be it the crunch of the crisp or the fizz of the carbonation in the glass) plays an important role in our multisensory perception of flavour, not to mention in our enjoyment of the overall multisensory experience of eating or drinking. What we hear can help us to identify the textural properties of what we, or for that matter anyone else, happen to be eating: How crispy, crunchy, or crackly a food is or even how carbonated the cava. Importantly, as we will see below, sound plays a crucial role in determining how much we like the experience. Indeed, it turns out that crispness and pleasantness are highly correlated when it comes to our rating of foods [2]. That said, many of my academic colleagues would rather restrict the contribution of sound to a minor modulatory role in texture perception [note a]. And, as we will also see in a moment, some firmly believe that what we hear has absolutely nothing to do with the perception of flavour. In this article, I hope to convince you otherwise.

I would argue that the zeitgeist on this issue is slowly starting to change. I have certainly noticed a number of my scientific colleagues tentatively including sound as one of the senses that can impact on the experience of food and drink. For instance, Stevenson ([3], p. 58) believes that crispness is a flavour quality. A number of researchers now acknowledge the fact that the sound of consumption is an important factor affecting the consumers’ experience of food and drink [4, 5]. And, as we will see later, food sounds have a particularly noticeable influence on people’s perception of crispness [2, 6]. A growing number of chefs are now considering how to make their dishes more sonically interesting, using everything from a sprinkling of popping candy through to using the latest in digital technology (see [7, 8], for reviews).

I want to take a look at the older research on food sounds as well as the latest findings from the gastrophysics lab. The evidence concerning the contribution of audition to crispy, crunchy, crackly, carbonated, and creamy sensations will be reviewed. I will then go on to illustrate how the cognitive neuroscience-inspired approach has revolutionized our understanding in this area over the last decade or so.

Auditory contributions to flavour perception

The majority of reviews on the topic of multisensory flavour perception either do not talk about audition or else, if they do, provide only the briefest mention of this ‘forgotten’ flavour sense. I have looked at a number of representative review articles and books on flavour that have been published over the decades (and which are arranged chronologically below) and tallied up just how much (or should that be how little) coverage the authors have given over to hearing. The percentages tell their own story: Crocker [9] 0%; Amerine, Pangborn, and Roessler [10] <1%; Delwiche [11] 3%; Verhagen and Engelen [5] <1%; Stevenson [3] 2%; Shepherd [4] 1%; and Stuckey [12] 4% (these percentages were calculated by dividing the number of book pages given over to audition by the total number of book pages. Note that if each of the five senses were given equal weighting, then you would expect to see a figure closer to 20%). One could all too easily come away from such literature reviews with the distinct impression that what we hear simply does not play any significant role in our experience of food and drink. How else to explain the absence of material on this sense? Delwiche ([11], p. 142) seems to have captured the sentiment of many when she states that ‘While the definitive research remain [sic] to be done, the interaction of sound with the chemical senses seems unlikely’.

Indeed, the downplaying of sound’s influence would appear to be widespread amongst food professionals and the general public alike [13, 14]. For instance, when 140 scientists working in the field of food research were questioned, they rated ‘sound’ as the least important attribute contributing to the flavour of food, coming in well behind taste, smell, temperature, texture, appearance, and colour (see Table 1). Furthermore, sound also came in as the least essential and most changeable sense where flavour was concerned. I believe that these experts are all fundamentally underestimating the importance of sound.

Table 1 Summary of the opinions of 140 experts concerning the importance of various sensory attributes to flavour showing in what little regard sound is considered (adapted from [13])

The results of another study [14] highlight that similar opinions are held by regular consumers as well. Eighty people without any special training or expertise in the food or beverage sector were asked to evaluate the relative importance of each of the senses to a wide range of products (N = 45), including various food and drink items. Interestingly, regardless of the product category, audition was rated as the least important of the senses (see Table 2). Perhaps it should come as no surprise, then, to find that auditory cues also fail to make it into the International Organization for Standardization (ISO) definition of flavour (see [15, 16]). Indeed, according to that definition, flavour is a ‘Complex combination of the olfactory, gustatory and trigeminal sensations perceived during tasting. The flavour may be influenced by tactile, thermal, painful and/or kinaesthetic effects’.

Table 2 Results of a study demonstrating that even regular consumers pay surprisingly little attention to what they hear while eating and drinking (Source: [14])

One thing to bear in mind here though is that there is actually quite some disagreement in the field as to how ‘flavour’ should be defined (e.g. [11, 17]). While some researchers would prefer that the term be restricted to gustation, retronasal olfaction, and possibly also trigeminal inputs (see, for example, [15, 16]), others have suggested that the senses of hearing and vision should also be incorporated [4, 5, 18–20]. There is no space to get into the philosophical debate surrounding this issue here (the interested reader is directed to [21]). In this article, I will use the term ‘flavour’ in a fairly broad sense to mean, roughly, ‘the overall experience of a food or beverage’ (see [5], for a similar position). As such, the consumer’s perception of the oral-somatosensory and textural properties of a foodstuff will be treated as a component part of their flavour experience (though see [11], for a different position).

The traditional view (that sound has little role to play in our flavour experiences) contrasts with the position adopted by a number of contemporary modernist chefs such as Heston Blumenthal who, for one, is convinced that you need to engage all of a diner’s senses if you want to create truly memorable dishes. Just take the following quote from the cover sheet of the tasting menu at The Fat Duck restaurant in Bray: ‘Eating is the only thing we do that involves all the senses. I don’t think that we realize just how much influence the senses actually have on the way that we process information from mouth to brain’. (see http://www.fatduck.co.uk). Ferran Adrià seems to have been taking a similar line when he said that ‘Cooking is the most multisensual art. I try to stimulate all the senses’ [22].

The last few years have seen something of a renaissance of interest in this heretofore neglected ‘flavour’ sense [23–25]. The crucial point to bear in mind here is that it turns out that most people are typically unaware of the impact that what they hear has on how they perceive and respond to food and drink. Consequently, I would argue that intuition and unconstrained self-report, not to mention questionnaires asking about the role of audition in flavour, are unlikely to provide an altogether accurate assessment of the sense’s actual role in our multisensory experiences (whether or not those experiences relate to food or drink). Indeed, decades of research from experimental psychologists have shown that the kinds of responses one gets from direct questioning rarely provide particularly good insights into the true drivers of people’s behaviour, especially when one is looking at the interaction between the senses that gives rise to multisensory perception [26–28]. This means that we will need to focus on the results of well-designed empirical studies using more objective psychophysical measures in order to highlight the relative importance of the various factors/senses that really influence flavour perception in us humans.

Why think that what we hear is so much more important than we intuitively believe?

There are several lines of evidence pointing to the importance of sound to our food and drink experiences. In one early study, for instance, Szczesniak and Kleyn [29] reported that consumers mentioned ‘crisp’ more than any other descriptor in a word association test in which they had to list four descriptors in response to each of 79 foods. Now, while you might imagine that crispness is strictly a tactile attribute of food and, hence, that such results provide evidence for the importance of oral-somatosensation to our experience of food, the fact of the matter is that auditory cues play a key role in the delivery of this sensation [6]. Indeed, Vickers and Bourne [6] went so far as to suggest that crispness was an auditory sensation. Many chefs also appear to have texture top of mind: Just take three of the sensations that spring into the mind of the North American chef, Zakary Pelaccio, while eating: crispy (nicely fried chicken skin), fresh and crispy (raw veggies and herbs), and crunchy (corn nuts) ([30], p. 9).

Back in 2007, researchers from the University of Leeds came up with an equation to quantify just how important the crispness of the bacon, especially the sound of the crunch, is to the perfect BLT sandwich (see [31], pp. 79–80). Crucially, crispness was rated as the key element in creating the ideal offering. Dr. Graham Clayton, the lead researcher on the project, stated that ‘We often think it’s the taste and smell of bacon that consumers find most attractive. But our research proves that texture and the crunching sound is just – if not more – important’ [32].

Another example of the unrecognized importance of sound comes from the following anecdote: Some years ago, researchers working on behalf of Unilever asked their brand-loyal consumers what they would change about the chocolate-covered Magnum ice cream (a product that first appeared on the shelves in Sweden back in 1989). A frequent complaint that came back concerned all of those bits of chocolate falling onto the floor and staining one’s clothes when biting into the ice cream. This feedback was promptly passed back to the product development team who set about trying to alter the formulation so as to make the chocolate coating adhere to the ice cream better. In so doing, the distinctive cracking sound of the chocolate coating was lost. And when the enhanced product offering was launched, consumers complained once again. It turned out that they did not like the new formulation either. The developers were confused. Had they not fixed the original problem that consumers had been complaining about? Nevertheless, people simply did not like the resulting product. Why not? Were consumers simply being fickle? In this case, the answer was no, though the story again highlights the dangers of relying on subjective report.

Subsequent analysis revealed that it was that distinctive cracking sound that consumers were missing. It turned out that this was a signature feature of the product experience even though the consumers (not to mention the market researchers) did not necessarily realize it. Ever since, Unilever has returned to the original formulation, thus ensuring a solid cracking sound every time one of their customers bites into one of their distinctive ice cream bars.

In fact, once you realize just how important the sound is to the overall multisensory experience, you start to understand why it is that the food marketers spend so much of their time trying to accentuate the crispy, crunchy, and crackly sounds in their advertisements [33]. I, for one, am convinced that the chocolate crackling sound is accentuated in the Magnum adverts [34, 35]. Obviously, you want to make sure that you get the sensory triggers just right if you happen to be selling 2 billion of these ice creams per year (http://alvinology.com/2014/05/25/magnum-celebrates-25-years-of-pleasure/). Certainly, there is lots of talk of ‘cracking chocolate’ in online descriptions of the product (http://www.mymagnum.co.uk/products/) and in blogs: ‘I experienced the crack of the chocolate while biting into it and the “mmmmm” sound in my mind while eating the ice-cream. I was lost into it :) It was pure pleasure indeed’. (http://rakshaskitchen.blogspot.com/2014/02/magnum-masterclass-with-kunal-kapur.html).

Listen carefully enough and I think that you can often tell that the informative sounds of food consumption appear to have been sonically enhanced in many of the food ads seen on TV. A few years back, a Dutch crisp manufacturer named Crocky took things even further. They ran an advert that specifically focused on the crack of their crisps. The sound was so loud that it appeared to crack the viewer’s television screen when eaten on screen [36].

Why do people like crispy so much?

Crispness is synonymous with freshness in many fruits and vegetables. Indeed, lettuce is the first food that comes to the mind of many North Americans when asked to name examples of crispy foods [37]. Other foods that people often describe as especially crispy include tortilla chips and, perhaps unsurprisingly, crisps [38]. The link with freshness is thought to be part of the evolutionary appeal of crisp and crunchy foods [33, 39]. That said, for some people, these sonic-textural attributes have become desirable in their own right, regardless of their link to the nutritional properties of food. Why else, after all, are crisps so popular? It certainly cannot be for nutritional content nor is the flavour all that great when you come to think about it. Rather, the success of this product is surely all about the sonic stimulation—the crispy crunch. Over the years, a large body of research has documented that the pleasantness of many foods is strongly influenced by the sounds produced when people bite into them (e.g. [2, 6, 40, 41]).

Summarizing what we have seen in this section, while most people—food scientists and regular consumers alike—intuitively downplay (disregard, even) the contribution of sound when thinking about the factors that influence their perception and enjoyment of food, several lines of evidence now hint at just how important what we hear really is to the experience of what we eat (and presumably also to what we drink).

A brief history of the study of the role of hearing in flavour perception

It was during the middle decades of the 20th Century that food scientists first became interested in the role of audition (see [42–44], for early research). In these initial studies, however, researchers tended to focus their efforts on studying the consequences, if any, of changing the background noise on the perception of food and drink (see [1], for a review). Within a decade, Birger Drake had started to analyze the kinds of information that were being conveyed to the consumer by food chewing and crushing sounds. Drake was often to be found in the lab mechanically crushing various foods and recording the distinctive sounds that were generated prior to their careful analysis [40, 45–48]. Perhaps the key finding to emerge from his early work was that the sounds produced by chewing or crushing different foods varied in terms of their amplitude, frequency, and temporal characteristics.

Thereafter, Zata Vickers and her colleagues published an extensive body of research investigating the factors contributing to the perception of, and consumer distinction between, crispness and crunchiness (not to mention crackliness) in a range of dry food products (e.g. [41, 49–54]; see [6, 55], for reviews of this early research; and [56], for a more recent review). Basically, she found that those foods that are associated with higher-pitched biting sounds are more likely to be described as ‘crispy’ than as ‘crunchy’ ([55, 57, 58]; see also [59, 60]). To give some everyday examples of what we are talking about here (at least for those in the English-speaking world): Lettuce and crisps are commonly described as crisp, whereas raw carrots, croutons, Granola bars, almonds, peanuts, etc. are all typically described as crunchy. Crispy foods tend to give off lots of high-frequency sounds above 5 kHz. By contrast, analyze the acoustic energy given off while munching on a raw carrot and you will find lots of acoustic energy in the 1–2 kHz range instead.
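One way to make this spectral distinction concrete is to compute the share of a sound's energy that lies above 5 kHz. The following is a minimal sketch only, not the analysis used in the studies cited above: it relies on pure synthetic tones (6 kHz and 1.5 kHz) as stand-ins for real biting sounds, on a naive discrete Fourier transform, and on hypothetical names (`band_energy_fraction`, `tone`) and parameter values chosen purely for illustration.

```python
import cmath
import math


def band_energy_fraction(samples, sample_rate, cutoff_hz):
    """Return the fraction of spectral energy at or above cutoff_hz.

    Uses a naive O(n^2) DFT over the positive-frequency bins
    (DC and Nyquist excluded), which is adequate for short clips.
    """
    n = len(samples)
    total = 0.0
    high = 0.0
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        power = abs(coeff) ** 2
        total += power
        if k * sample_rate / n >= cutoff_hz:
            high += power
    return high / total if total else 0.0


def tone(freq_hz, sample_rate, n):
    """Generate n samples of a unit-amplitude sine wave (illustrative input)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


SAMPLE_RATE = 16000  # Hz; assumed value, high enough to represent 5 kHz and above
N_SAMPLES = 320      # gives 50 Hz frequency bins at this sample rate

crisp_like = tone(6000, SAMPLE_RATE, N_SAMPLES)    # stand-in for a 'crispy' sound
crunchy_like = tone(1500, SAMPLE_RATE, N_SAMPLES)  # stand-in for a 'crunchy' sound

crisp_share = band_energy_fraction(crisp_like, SAMPLE_RATE, 5000)
crunchy_share = band_energy_fraction(crunchy_like, SAMPLE_RATE, 5000)
```

With these idealized inputs, nearly all of the energy of the first signal and almost none of the second falls above the 5 kHz cutoff; real eating sounds are broadband, so the two shares would differ by degree rather than in this all-or-nothing fashion.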

To date, crackly sensations have not received anything like as much attention from the research community. That said, crackly foods can typically be identified by the sharp sudden and repeated bursts of noise that they make [61]. Masking these sounds leads to a decrease in perceived crackliness. It turns out that the number of sounds given off provides a reasonably good measure of crackliness. Good examples of foods that make a crackly sound include pork scratchings or the aptly named pork crackling.

Despite all of the research that has been conducted in this area over the years, it is still not altogether clear just how distinctive ‘crisp’ and ‘crunchy’ are as concepts to many food scientists, not to mention to the consumers they study [62, 63]. Certainly, the judgments of the crispness, crunchiness, and hardness of foods turn out to be very highly correlated [41]. Part of the problem here seems to be linguistic. Different languages just use different terms, or else simply have no terms at all, to capture some of these textural distinctions: To give you some idea of the problems that one faces when working in this area, the French describe the texture of lettuce as craquante (crackly) or croquante (crunchy) but not as croustillant, which would be the direct translation of crispy [59, 64]. Meanwhile, the Italians use just a single word ‘croccante’ to describe both crisp and crunchy sensations.

Matters become more confusing still when it comes to Spanish speakers [63]. They do not really have their own words for crispy and crunchy, and if they do, they certainly do not use them [note b]. Colombians, for instance, describe lettuce as ‘frisch’ (fresh) rather than as crispy. And when a Spanish-speaking Colombian wants to describe the texture of a dry food product, they either borrow the English word ‘crispy’ or else the French word ‘croquante’. This confusion extends to Spain itself, where 38% of those questioned did not know that the Spanish term for ‘crunchy’ was ‘crocante’. What is more, 17% of consumers thought that crispy and crunchy meant the same thing [63].

Of course, matters would be a whole lot simpler if there were some instrumental means of measuring the crispness/crunchiness/crackliness of a food. Then, we might not care so much what exactly people say when describing the sounds made by food products. However, it turns out that these are multisensory constructs, and hence, simply measuring how a food compresses when a force is applied to it provides an imperfect match to subjective ratings. A much better estimate of crispness, as perceived by the consumer, can be achieved not only by measuring the force-dependent deformation properties of a product but also by recording the sounds that are given off [51, 65–67]. Taken together, these results suggest that the perception of crispness of (especially) crunchy foods (i.e. crisps, biscuits, cereals, vegetables, etc.) is characterized by tactile, mechanical, kinaesthetic, and auditory properties [50]. Of course, while it is one thing to demonstrate that the instrumental measures of crispness can be improved by incorporating some measure of the sound that the food makes when compressed, it is quite another to say that those sounds necessarily play an important role in the consumer’s overall experience of a food [68]. And while Vickers and Bourne [6] originally suggested that crispness was primarily an acoustic sensation, Vickers herself subsequently pulled back from this strong claim [49].

One relevant piece of evidence here comes from Vickers [41] who reported that estimates of the crispness of various foods such as celery, turnips, and Nabisco saltines were the same no matter whether people heard someone else biting into and chewing these foods or whether they themselves actually got to bite and chew them. Meanwhile, Vickers and Wasserman [69] demonstrated that loudness and crispness are highly correlated sensory dimensions (see also [66]).

Assessing the relative contribution of auditory and oral-somatosensory cues to crispness perception

The participants in a study by Christensen and Vickers [70] rated the crispness of various dry and wet foods using magnitude estimation and separately judged the loudness of the chewing sounds. These judgments turned out to be highly correlated both when the food fractured on the first bite (r = 0.98) and when it further broke down as a result of chewing (r = 0.97; see Figure 1). Interestingly, though, the addition of masking sounds did not impair people’s judgments of the food. Such results were taken to suggest that both oral-somatosensory and auditory cues were (redundantly) providing the same information concerning the texture of the food that was being evaluated (though see also [1]).
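For readers less familiar with the statistic being reported, the correlation coefficient r can be computed directly from paired judgments. The sketch below is purely illustrative: the function name `pearson_r` is my own, and the two lists of ratings are invented numbers, not data from Christensen and Vickers [70].

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / (spread_x * spread_y)


# Invented magnitude estimates for five hypothetical foods (one pair per food)
loudness_judgments = [12.0, 30.0, 45.0, 70.0, 95.0]
crispness_judgments = [10.0, 28.0, 50.0, 66.0, 99.0]

r = pearson_r(loudness_judgments, crispness_judgments)
```

A value of r close to 1 indicates that foods judged louder were also judged crisper, which is the pattern described above; it does not, on its own, show that the sound causes the crispness judgment.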

Figure 1

Graph showing the correlation between people’s ratings of the crispness of a food based on the sound it makes while biting into the food versus when actually biting the food itself. Each dot represents a separate food [Source: [70]].

Interim summary

Despite the informational richness contained in the auditory feedback provided by biting into and/or chewing a food, people are typically unaware of the effect that such sounds have on their multisensory perception or evaluation of particular stimuli (see also [71]). While the overall loudness and frequency composition of food-eating sounds are certainly two of the most important auditory cues when it comes to determining the perceived crispness of a food, it should be noted that the temporal profile of any sounds associated with biting into crispy or crunchy foods (e.g. how uneven or discontinuous they are) can also convey important information about the rheological properties of the foodstuff being consumed, such as how crispy or crackly it is [69].

The multisensory integration approach to flavour perception

The opening years of the 21st Century saw the introduction of a radically different approach to the study of flavour perception, one that was based on the large body of research coming out of neurophysiology, cognitive neuroscience, and psychophysics laboratories highlighting the profoundly multisensory nature of human perception. Originally, the majority of this literature tended to focus solely on the integration of auditory, visual, and tactile cues in the perception of distal events, such as the ventriloquist’s dummy and beeping flashing lights (see [72, 73], for reviews). However, it was not long before some of those straddling the boundary between academic and applied food research started to wonder whether the same principles of multisensory integration that had initially been outlined in the anaesthetized animal model might not also be applicable to the multisensory perception of food and beverages in the awake consumer (see [5, 74, 75], for reviews that capture this burgeoning new approach to the study of flavour). It is to this field of research, sometimes referred to as gastrophysics [8, 76, 77], that we now turn.

Manipulating mastication sounds

The first research study based on the multisensory approach to flavour perception that involved sound was published in 2004. Zampini and Spence [78] took a crossmodal interaction that had originally been discovered in the psychophysics laboratory, namely ‘the parchment skin illusion’, and applied it to the world of food. In this perceptual illusion, the dryness/texture of a person’s hands can be changed simply by changing the sound that they hear when they rub their palms together [79–81]. Max Zampini and I wanted to know whether a similar auditory modulation of tactile perception would also be experienced when people bit into a noisy food product.

To this end, a group of participants was given a series of potato chips to evaluate. The participants had to bite each potato chip between their front teeth and rate it in terms of its ‘freshness’ or ‘crispness’ using an anchored visual analogue scale displayed on a computer monitor outside the window of the booth. In total, over the course of an hour-long experimental session, the participants bit into 180 Pringles, one after the other. During each trial, the participants received the real-time auditory feedback of the sounds associated with their own biting action over closed-ear headphones. Interestingly though, the participants typically perceived the sound as coming from the potato chip in their mouth, rather than from the headphones, due to the well-known ventriloquism illusion [82] [note c]. On a crisp-by-crisp basis, this auditory feedback was manipulated by the computer controlling the experiment in terms of its overall loudness and/or frequency composition. Consequently, on some trials, the participants heard the sounds that they were actually making while biting into a crisp. On other trials, the overall volume of their crisp-biting sounds might have been attenuated by either 20 or 40 dB. The higher frequency components of the sound (>2 kHz) could also either be boosted or attenuated (by 12 dB) on some proportion of the trials. Interestingly, on debriefing, three quarters of the participants thought that the crisps had been taken from different packs during the course of the experiment.
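To give a sense of the size of these manipulations, a change in level expressed in decibels can be converted into a linear amplitude factor using the standard relation gain = 10^(dB/20). The sketch below is an illustration under stated assumptions rather than the software used in the study: the function names are hypothetical, the conditions are assumed to be fully crossed, and the band-splitting filter that would be needed to alter only the components above 2 kHz is not shown.

```python
def db_to_amplitude_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def apply_gain(samples, db):
    """Scale every sample by the amplitude factor corresponding to db."""
    gain = db_to_amplitude_gain(db)
    return [s * gain for s in samples]


# Level changes described in the text (0 dB means 'left unchanged')
OVERALL_LEVELS_DB = (0, -20, -40)  # veridical, or attenuated by 20 or 40 dB
HIGH_BAND_DB = (-12, 0, 12)        # components above 2 kHz cut, unchanged, or boosted

# Assumption: every overall level is paired with every high-band setting
conditions = [(overall, high) for overall in OVERALL_LEVELS_DB for high in HIGH_BAND_DB]

# Amplitude factor for each overall level, e.g. -20 dB is one tenth of the amplitude
overall_gains = {db: db_to_amplitude_gain(db) for db in OVERALL_LEVELS_DB}

# Example: attenuating a short, made-up snippet of audio samples by 20 dB
quieter = apply_gain([0.5, -0.25, 1.0], -20)
```

On this scale, a 20 dB attenuation reduces the amplitude of the signal to a tenth of its original value and a 40 dB attenuation to a hundredth, while a 12 dB boost roughly quadruples it.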

The key result to emerge from Zampini and Spence’s [78] study was that participants rated the potato chips as tasting both significantly crisper and significantly fresher when the overall sound level was increased and/or when just the high-frequency sounds were boosted (see Figure 2). By contrast, the crisps were rated as both staler and softer when the overall sound intensity was reduced and/or when the high-frequency sounds associated with their biting into the potato chip were attenuated instead.

Figure 2

Results of a study showing that the sound we hear influences the crispness of the crisp [Source: [78]].

Recently, a group of Italian scientists has extended this approach to study the role of sound in the perception of the crispness and hardness of apples [83]. Once again, reducing the auditory feedback was shown to lead to a reduction in the perceived crispness of the ‘Renetta Canada’, ‘Golden Delicious’, and ‘Fuji’ apples that were evaluated. More specifically, a small but significant reduction in mean crispness and hardness ratings was observed for this moist food product (contrasting with dry food products such as crisps), when the participants’ high-frequency biting sounds were attenuated by 24 dB and/or when there was an absolute reduction in the overall sound level. Thus, it would appear that people’s perception of the textural properties of both dry and moist food products can be changed simply by modifying the sounds that they hear.

The sound of carbonation

Our perception of the carbonation in a beverage is based partly on the sounds of effervescence and popping that we hear when holding a drink in our hand(s): Make the carbonation sounds louder, or else make the bubbles pop more frequently, and people’s judgments of the carbonation of a beverage go up [84]. That said, Zampini and Spence also reported that these crossmodal effects dissipate once their participants took a mouthful of the drink. It would appear that the sour-sensing cells that act as the taste sensors for carbonation [85] and/or the associated oral-somatosensory cues [86] likely dominate the overall experience as soon as we take a beverage into our mouths, which, after all, is what we all want to do when we drink [note e]. The bottom line here, then, is probably that oral-somatosensory and auditory cues play somewhat different roles in the perception of different food attributes. The research that has been published to date suggests that people appear to rely on their sense of touch more when judging the hardness of foods and the carbonation of drinks in the mouth. By contrast, the two senses (of hearing and oral-somatosensation) would appear to make a much more balanced contribution to our judgments of the crispiness of foods. And crackly may, if anything, be a percept that is a little more auditory dominant than the others.

The sound of creaminess

Not only do different foods make qualitatively different sounds when we bite into or chew them, but our mouth itself sometimes starts to sound a little different as a function of the food that we happen to put into it. This field of research is known as ‘acoustic tribology’ [87, 88]. One simple way to demonstrate this phenomenon is with a cup of strong black coffee. Find a quiet spot and take a mouthful. Swill the coffee around your mouth for a while and then swallow. Now rub your tongue against the top of your mouth (the palate) and think about the feeling you experience and the associated sound that you hear. Next, add some cream to your coffee and repeat the procedure. If you listen carefully enough, you should be able to tell that the sound and feel are quite different the second time around (see [89], for a video). In other words, once the cream has coated your oral cavity, your mouth really does start to make a subtly different sound because of the associated change in friction. Who knows whether our brains use such auditory cues in order to ascertain the texture of that which we have put into our mouths. The important point to note is that these sonic cues are always available, no matter if we pay attention to them or not. And some researchers have argued that such subtle sounds do indeed contribute to our perception of creaminess [90].

Squeaky foods

Now, ‘squeaky’ probably is not one of the first sounds that comes to mind when contemplating noisy foods. However, we should not neglect to mention this most unusual of sensations. Typically, this descriptor is used when talking about the sound we make when biting into halloumi cheese [91]. It is an example of the stick–slip phenomenon [92]. While the original version comes from Cyprus, the Finns have their very own version called Leipäjuusto [93]. While many people like the sound nowadays [94], traditionally, it was apparently judged to be rather unattractive (see [10], p. 228).

Interim summary

Taken together, the results of the cognitive neuroscience-inspired food research that has been published to date (e.g. [78]) provide support for the claim that modifying food-related auditory cues, no matter whether those sounds happen to come from the food itself (as in the case of a carbonated beverage) or result from a person’s interaction with it (as in the case of someone biting into a crisp), can indeed impact on the perception of both food and drink. That said, it should be noted that the products that have been used to date in this kind of research have been specifically chosen because they are inherently noisy. It would seem reasonable to assume that the manipulation of food-related auditory cues will have a much more pronounced effect on the consumer’s perception of such noisy foods than on their impression of quieter (or silent) foodstuffs (think sliced bread, bananas, or fruit juice). Having said that, bear in mind that many foods make some sort of noise when we eat them: Not just crisps and crackers but also breakfast cereals and biscuits, not to mention many fruits and vegetables (think apples, carrots, and celery) [note f]. Even some seemingly silent foods sometimes make a distinctive sound if you listen carefully enough: Just think, for instance, of the subtle auditory cues that your brain picks up as your dessert spoon cuts through a beautifully prepared mousse. And, as we have just seen, even creaminess makes your mouth sound a little different.

On the commercialization of crunch

Given the above discussion, it should come as little surprise to find that a number of the world’s largest food producers (e.g. Kellogg’s, Nestlé, Procter & Gamble, and Unilever) are now starting to utilize the cognitive neuroscience approach to the multisensory design (and modification) of their food products. Kellogg’s, for one, certainly believes that the crunchiness of the grain (what the consumer hears and feels in the mouth) is a key driver of the success of their cornflakes (see [95], p. 12). According to Vranica [96]: ‘chip-related loudness is viewed as an asset. Frito-Lay has long pitched many of its various snacks as crunchy. Cheetos has used the slogan “The cheese that goes crunch!” A Doritos ad rolled out in 1989 featured Jay Leno revealing the secret ingredient: crunch.’ Once upon a time, Frito-Lay even conducted research to show that Doritos chips give off the loudest crack [97]. This harks back to the 1953 ‘Noise Abatement League Pledge’ commercial created by Doyle Dane Bernbach, which claimed that Scudder’s were ‘the noisiest chips in the world’ (http://www.youtube.com/watch?v=293DQxMh39o; [98]).

In principle, the experimental approach developed by Zampini and Spence [78] enables such companies to evaluate a whole range of novel food or beverage sounds without necessarily having to go through the laborious process of trying to create each and every sound by actually modifying the ingredients or changing the cooking process (only to find that the consumer does not like the end result anyway). Clearly, then, sound is no longer the forgotten flavour sense as far as the big food and drink companies are concerned. Indeed, from my own work with industry, I see a growing number of companies becoming increasingly interested in the sounds that their foods make when eaten.

Of course, sometimes, it turns out to be impossible to generate the food sounds that the consumers in these laboratory studies rate most highly. At least, though, the food manufacturer has a better idea of what it is they are aiming for in terms of any modification of the sound of their product. In a way, the approach to the auditory design of foods is one that the car industry has been utilizing for decades, as it has tried to perfect the sound of the car door as it closes [99] or the distinctive sound of the engine for the driver of a high-end marque (see [35], for a review).

Caveats and limitations

Before moving on, it is important to note that Zampini and Spence [78] did not modify the bone-conducted auditory cues (that are transmitted through the jaw) when their participants bit into the potato chips in their study.g Given that we know that such sounds play an important role in the evaluation of certain foodstuffs [59, 100], it will certainly be interesting in future research to determine whether there are ways in which they can either be cancelled out, or else modified, while eating (in order to better understand their role in consumer perception). It should also be noted here that Zampini and Spence’s auditory feedback manipulations were certainly not subtle [78, 84]. A 40-dB difference in sound level between the loudest and quietest auditory feedback conditions is a fairly dramatic change; just remember here that, as a rule of thumb, every 10 dB increase in the sound level equates to roughly a doubling of the subjective loudness of a sound. That said, subsequent research has shown that similar crossmodal effects of sound on texture can also be obtained using much more subtle auditory manipulations.
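
To make the arithmetic concrete, here is a minimal sketch in Python that assumes the common rule of thumb of one doubling of subjective loudness per 10 dB. The function names are hypothetical, introduced purely for illustration, and nothing here is taken from Zampini and Spence's own analysis. Under that assumption, a 40-dB difference corresponds to roughly a 16-fold difference in perceived loudness, and to a 100-fold difference in sound pressure.

```python
def loudness_ratio(delta_db: float, db_per_doubling: float = 10.0) -> float:
    """Approximate ratio of perceived loudness for a level difference in dB.

    Assumes the rule of thumb that loudness doubles every `db_per_doubling` dB.
    """
    return 2.0 ** (delta_db / db_per_doubling)


def sound_pressure_ratio(delta_db: float) -> float:
    """Ratio of sound pressure amplitudes for a level difference in dB."""
    return 10.0 ** (delta_db / 20.0)


# The 40-dB span between the loudest and quietest feedback conditions.
print(loudness_ratio(40.0))        # 16.0
print(sound_pressure_ratio(40.0))  # 100.0
```

The rule of thumb is only an approximation, so the 16-fold figure should be read as an order-of-magnitude guide rather than a measured value.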

Another important point to bear in mind here is that much of the research demonstrating the influence of auditory cues on texture perception has been based on judgments of the initial bite [78, 83]. However, if Harrington and Pearson’s [101] early observation that people commonly make between 25 and 47 bites before they end up swallowing a piece of pork meat is anything to go by, then one would certainly want to evaluate judgments of a food’s texture after swallowing (rather than after the first bite) in order perhaps to get a better picture of just how important what we hear really is to our everyday eating experiences (see Figure 3). That said, remember here that our first experience of a food very often plays by far the most important role in our experience of, and subsequent memory for, that which we have consumed [102].h Indeed, observational studies show that people normally use the auditory cues generated during the first bite when trying to assess the crispness of a food ([39, 103]; see also [70]).

Figure 3

Graphs highlighting the general decline in the amplitude of mastication sounds for (A) crisp brown bread, (B) a half peanut, and (C) an apple as a function of the time spent masticating. The different symbols refer to different experiments conducted with each of the foods [Source: [45]; Figure Ten].

Finally here, it should be noted that the boosting of all sound frequencies above 2 kHz might not necessarily be the most appropriate manipulation of the sound envelope associated with food mastication/consumption sounds. Tracing things back, such broad amplification/attenuation was first introduced by researchers working in the lab on the parchment skin illusion [80]. These sonic manipulations were then adopted without much further modification by food researchers. As it happens, Pringles do tend to make a lot of noise at frequencies of 1.9 kHz and above when crushed mechanically [59, 104]. Hence, boosting or attenuating all sounds above 2 kHz will likely have led to a successful manipulation of the relevant auditory cues in the case of Zampini and Spence’s [78] Pringles study. I am not aware of any research that has documented the most important auditory characteristics of the sound of the popping of a carbonated drink. In the future, it will be interesting to determine which specific auditory frequency bands convey the most salient information to the consumer when it comes to different classes of products and/or different product attributes (be it crispy, crunchy, crumbly, crackly, creamy, moist, sticky, fizzy, etc.).
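
For readers who want to see what a broadband manipulation of this kind amounts to, the following is a minimal sketch in Python, using only the standard library. It is not the signal chain used in any of the studies cited here. The function names, the 12-dB gain, the 8-kHz sample rate, and the two-tone test signal are all assumptions chosen for illustration. The sketch scales every frequency component above a cutoff (here 2 kHz) by a fixed gain, leaving everything below the cutoff untouched.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (fine for short illustrative signals)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def shift_band(signal, sample_rate, cutoff_hz, gain_db):
    """Boost (positive gain_db) or attenuate (negative) all content above cutoff_hz."""
    n = len(signal)
    gain = 10.0 ** (gain_db / 20.0)  # dB to amplitude ratio
    spectrum = dft(signal)
    for k in range(n):
        # Bins k and n - k carry the same frequency, so both get the same gain,
        # which keeps the reconstructed signal real-valued.
        freq = min(k, n - k) * sample_rate / n
        if freq > cutoff_hz:
            spectrum[k] *= gain
    return idft(spectrum)


def bin_amplitude(signal, sample_rate, freq_hz):
    """Amplitude of the sinusoidal component lying on the bin nearest freq_hz."""
    n = len(signal)
    k = round(freq_hz * n / sample_rate)
    coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
    return 2.0 * abs(coeff) / n


# Hypothetical test signal: a 500 Hz tone plus a 3 kHz tone, both of amplitude 1.
SAMPLE_RATE = 8000
N_SAMPLES = 80
tone = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
        + math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE)
        for t in range(N_SAMPLES)]

boosted = shift_band(tone, SAMPLE_RATE, cutoff_hz=2000.0, gain_db=12.0)
print(bin_amplitude(boosted, SAMPLE_RATE, 500.0))   # low tone: essentially unchanged
print(bin_amplitude(boosted, SAMPLE_RATE, 3000.0))  # high tone: scaled by the gain
```

A product-specific manipulation of the kind called for above would replace the single cutoff with whichever frequency bands turn out to matter for a given food or attribute.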

Mismatching masticating sounds

On occasion, researchers have investigated the consequences of presenting sounds locked to the movement of a person’s jaw that differ from those actually emanating from the mouth. There are, for instance, anecdotal reports of Jon Prinz having his participants repeatedly chew on a food in time with a metronome. After a few ticks, Prinz would take his subject by surprise and suddenly play the sound of breaking glass (or something equally unpleasant) just as they started to bite down on the food! Apparently, his subjects’ jaws would simply freeze up. It was almost as if some primitive self-preservation reflex designed to avoid bodily harm had suddenly taken over.

Meanwhile, Japanese researchers pre-recorded the sound of their participants masticating rice crackers (a food that has a particularly crunchy texture) and rice dumplings (which, by contrast, have a very sticky texture; [105]). These sounds were then played back over headphones while participants chewed on a variety of foods including fish cakes, gummy candy, chocolate pie, marshmallow, pickled radish, sponge cake, and caramel corn. Importantly, the onset of the mastication sounds was synchronized with that of the participant’s own jaw movements. The ten people who took part in this study had to estimate the degree of texture change and the pleasantness of the ensuing experience either with or without added mastication sounds. Crucially, regardless of the particular food being tested (or should that be tasted), the perceived hardness/softness, moistness/dryness, and pleasantness of the experience were all modified by the addition of sound. Specifically, the foods were rated as harder and drier when the rice cracker sounds were presented than without any sonic modification. By contrast, adding the sound of masticating dumplings resulted in the foods’ texture being rated as softer and moister than under normal auditory feedback.

Finally, the participants in another study from the same research group were given two chocolates that had a similar taste but a very different texture: One, called Crunky (Lotte), was a crunchy chocolate that contained malt-puffs and hence gave rise to loud mastication sounds. The other, Aero (Nestlé), contained nothing but air bubbles and hence did not make much noise at all when eaten. The pre-recorded mastication sounds of the crunchy chocolate were then presented while the blindfolded participants chewed on a piece of the other chocolate.i The participants bit into both kinds of chocolate while either listening only to their self-generated mastication sounds, or else while the pre-recorded crunchy sounds were played back over noise-cancelling headphones [106]. Interestingly, the Aero chocolate was misidentified as the Crunky chocolate 10–15% more often when the time-locked crunchy mastication sounds were presented. That said, given that only three participants took part in this study, the findings should not be treated as anything more than preliminary at this stage.

Interim summary

Taken together, the evidence that has been published over the last decade or so clearly highlights the influence that auditory cues have on the oral-somatosensory and textural qualities of a number of different foods. Boosting or attenuating the actual sounds of food consumption, or substituting another sound that just so happens to be time-locked to a person’s own jaw movements, can result in some really quite profound perceptual changes. It seems plausible to look for an explanation of these findings in terms of the well-established principles of multisensory integration [23, 72]. Indeed, it would not be at all surprising to find that such crossmodal effects can be effectively modelled in terms of the currently popular ‘maximum likelihood estimation’ approach to cue integration [107–109]. The basic idea here is that the more reliable a sensory cue is, the more heavily it will be weighted by the brain, relative to other less reliable cues, in the overall multisensory percept (e.g. when trying to judge how crispy that crisp really is; see also [110]).
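
The reliability-weighting idea can be sketched in a few lines of Python. This is the textbook form of maximum likelihood cue combination for independent Gaussian cues, in which each cue is weighted by the inverse of its variance. It is not a model fitted to any of the food studies reviewed here. The function name and the numbers in the example (crispness estimates on an arbitrary scale, with made-up variances) are hypothetical.

```python
def mle_combine(estimates, variances):
    """Combine independent cue estimates by inverse-variance weighting.

    Returns (combined_estimate, combined_variance, weights).
    """
    reliabilities = [1.0 / v for v in variances]   # reliability = 1 / variance
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]   # weights sum to 1
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_variance = 1.0 / total                # never larger than the best single cue
    return combined, combined_variance, weights


# Hypothetical example: a reliable auditory cue says "crispness 8",
# a noisier oral-somatosensory cue says "crispness 4".
estimate, variance, weights = mle_combine(estimates=[8.0, 4.0], variances=[1.0, 4.0])
print(weights)   # the auditory cue receives the larger weight
print(estimate)  # the percept is pulled toward the auditory estimate
print(variance)  # smaller than the variance of either cue alone
```

In this made-up case, the combined judgment sits much closer to what is heard than to what is felt, simply because the auditory cue was assumed to be the more reliable of the two.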

Alternatively, however, it is also worth noting that auditory cues may influence our judgments of food texture because they simply capture our attention much more effectively than do oral-somatosensory cues [111].j Indeed, after they had finished the experiment, the majority of Zampini and Spence’s [78] participants reported anecdotally that the auditory information had been more salient to them than the oral-tactile cues. Of course, the within-participants design of their study meant that the participants would have been acutely aware of the sound changing from trial to trial, which likely accentuated any auditory attentional capture effects.

In the future, it will be interesting to assess the relative contribution, and possible dominance, of certain sensory cues when they are put into conflict/competition with one another in the evaluation and consumption of realistic food products (e.g. see [112, 113], for examples along these lines). When the differences between the estimates provided by each of our senses are small, one normally sees integration/assimilation (depending on whether the cues are presented simultaneously or successively). However, when the estimates provided by the senses differ by too great a margin, then you are likely to see a negatively valenced disconfirmation of expectation response instead [114, 115]. That said, if you get the timing right [106], the brain has a strong bias toward combining those cues that are perceived to have occurred at the same time, or that appear to be correlated temporally [116], even if those cues have little to do with one another [117].

Conclusions

Sound is undoubtedly the forgotten flavour sense. Most researchers, when they think about flavour, fail to give due consideration to the sound that a food makes when they bite into and chew it. However, as we have seen throughout this article, what we hear while eating plays an important role in our perception of the textural properties of food, not to mention our overall enjoyment of the multisensory experience of food and drink. As Zata Vickers ([54], p. 95) put it: ‘Like flavors and textures, sometimes sounds can be desirable, sometimes undesirable. Always they add complexity and interest to our eating experience and, therefore, make an important contribution to food quality.’ Indeed, the sounds that are generated while biting into or chewing food provide a rich source of information about the textural properties of that which is being consumed, everything from the crunch of the crisp and the crispy sound of lettuce, through to the crackle of your crackling and the carbonation in your cava. Remember also that, evolutionarily speaking, a food’s texture would have provided our ancestors with a highly salient cue to the freshness of whatever they were eating.

In recent years, many chefs, marketers, and global food companies have started to become increasingly interested in trying to perfect the sound that their foods make, both when we eat them and when we see a model biting into our favourite brands on the screen. It is, after all, all part of the multisensory flavour experience. In the future, my guess is that various technologies, some of which will be embedded in digital artefacts, will increasingly come to augment the natural sounds of our foods at the dining table [8, 23]. And that is not all. Given the growing ageing population, there may also be grounds for increasing the crunch in our food in order to make it more interesting (not to say enjoyable) for those who are starting to lose their ability to smell and taste food [118]. Finally, before closing, it is worth noting that the majority of the research that has been reviewed in this article has focused on the moment of tasting or consumption. However, on reflection, it soon becomes clear that much of our enjoyment of food and drink actually resides in the anticipation of consumption and the subsequent memories we have, at least when it comes to those food experiences that are worth remembering (see Figure 4). As such, it will undoubtedly be worthwhile for future research to broaden out the timeframe over which our food experiences are studied. As always, then, much research remains to be conducted.

Figure 4

The majority of research on multisensory flavour perception has focused on the moment of consumption. It is, however, important to note that our enjoyment of eating and drinking often extends over a much longer time period, encompassing both the anticipation of consumption and the subsequent memories associated with consumption. Future research will therefore need to start investigating the role of the various senses (and this includes audition) in the broader range of our food-related thoughts and memories.

Endnotes

aIf you take away the textural cues by pureeing foods, then people’s ability to identify them declines dramatically ([12], p. 91).

b‘Crujiente’ = crispy, while crocante comes from the French and has apparently almost disappeared from the Spanish language [63].

cThis is an audiotactile version of the phenomenon that we all experience when our brain glues the voice we hear onto the lips we see on the cinema screen despite the fact that the sounds actually originate from elsewhere in the auditorium [107].

dOf course, at this point, it could be argued that while these studies show that sound plays an important role in the perception of food texture, this is not the same as showing an effect on the flavour of food itself.

eEvolutionarily speaking, carbonation would have served as a signal to our ancestors that a food had gone off, i.e. that a piece of fruit was overripe/fermenting [85], which makes it all the more surprising that it should nowadays be such a popular sensory attribute in beverages; by contrast, it has been argued that crunchiness is a positive attribute since it signals the likely edibility of a given foodstuff and is associated with freshness [119, 120]. It is intriguing to consider here whether this difference in the meaning of different auditory cues (signalling bad vs. good foods, respectively) might not, then, have led to the different results reported here (cf. [121]). On the other hand, though, it also has to be acknowledged that the specific frequency manipulation introduced by Zampini and Spence [78] may simply not have been altogether ecologically valid, or meaningful, in terms of the perception of carbonation [84].

fAnd as we saw earlier, research from Vickers [41, 122] has shown that we can use those food biting and mastication sounds in order to identify a food, even when it is someone else who happens to be doing the eating.

gHere, we need to distinguish between air-conducted sound, the normal way we hear sound, and bone-conducted sound. It turns out that the jawbone and skull have a maximum resonance at around 160 Hz [33, 123].

hThe pitch of eating sounds changes (specifically it is lowered) by changing from biting to chewing, and, as a result, judgments of crispness tend to be lower ([55, 58]; though see [124]). Chew a food with the molars and the mouth closed and what you will hear is mostly the bone-conducted sound, thus lower in pitch.

iOne might worry here about the effect of blindfolding on participants’ judgments [125, 126]. However, to date, researchers have been unable to demonstrate a significant effect of blindfolding on people’s loudness, pitch, or duration judgments when it comes to their evaluation of food-eating sounds [112].

jRietz [127] would seem to have been thinking of something of the sort when he suggested many years ago that eating blanched almonds with smoked finnan haddie reduced the fishy flavour of the latter through ‘an illusion caused by the dominance of the auditory sense over that of taste and smell generated by the kinesthesis of munching’. However, no experimental evidence was cited in support of this claim.