Introduction

The sense of taste plays a significant role in nutrient balance, the flavor of foods and beverages, and protection against the ingestion of spoiled and poisonous food. As noted by Schier and Spector (2019, p. 605), “The taste system is the gatekeeper of the alimentary tract, permitting and promoting the entry of nutrients while preventing and rejecting ingestion of potentially harmful substances.” Data from the National Health and Nutrition Examination Survey (NHANES) suggest that over 25 million Americans 49 years of age and older suffer from chronic taste problems (Rawal et al., 2015; Liu et al., 2016) – problems that commonly go unrecognized until formal testing occurs (Soter et al., 2008).

Although a wide variety of taste tests have been described in the literature, they are rarely employed outside of academic and industrial settings (for reviews, see Doty, 2019a; Frank et al., 1995; Hawkes & Doty, 2017; Snyder et al., 2015). This reflects, in part, practical limitations of most taste tests, notably their reliance on liquids for stimulus presentation or rinsing. Thus, unless purified water is available and a means for sipping and/or expectorating the oral contents is present, quantitative chemical taste testing is not possible.

This paper describes the development and clinical validation of a practical, reliable, and portable taste test, termed the Waterless Empirical Taste Test (WETT®). This test requires neither liquid tastants nor liquid rinses, making it amenable to office, clinic, and bedside applications. Its stimuli include not only representatives of the classic taste qualities of sweet, sour, bitter, and salty, but also a representative of umami (i.e., monosodium glutamate). Test–retest and split-half reliability coefficients were determined, along with the test’s sensitivity, relative to that of two other taste tests, to sex, age, head trauma, and phenylthiocarbamide (PTC) taste sensitivity.

Materials and methods

Subjects

A total of 198 consecutive patients presenting to the University of Pennsylvania Smell and Taste Center for chemosensory evaluation served as subjects (Table 1). Five percent were cigarette smokers. A subset of 34 subjects were administered the WETT® on two occasions to establish test–retest reliability [17 men, 17 women, mean (SD) respective ages = 64.41 (11.62) and 60.12 (11.87)]. All subjects provided informed written consent. The study was approved by the University’s Office of Regulatory Affairs and complies with the Declaration of Helsinki for medical research involving human subjects.

Table 1 Etiology, sample size, age, and sex distribution of the study population. See text for details

Procedures

The WETT® and two other well-established taste tests, the Whole Mouth Taste Test and the Taste Quadrant Test described below, were administered in a standardized manner by trained test administrators. Testing order was interspersed with other tests, being dependent upon the patient mix and the availability of test stations. All but nine subjects were also administered two PTC taste strips, one marketed by Carolina Biological Supply Company (Burlington, NC, USA; 7 μg PTC/strip) and the other by Sensonics International (Haddon Heights, NJ, USA; 18 μg PTC/strip). One type of strip was administered in the morning and the other type in the afternoon, in counterbalanced order. In addition to reporting whether or not a bitter taste was present on each PTC strip, those who noticed a bitter taste rated its relative intensity on a nine-point rating scale (1 = very weak; 9 = very strong).

Taste tests

Waterless Empirical Taste Test (WETT®)

The WETT® (Sensonics International, Haddon Heights, NJ; see Dr. Doty’s disclosure statement) comprises a series of 53 disposable plastic taste strips. Located on one side of each 1 × 6 cm strip is a 1 × 2.5 cm monomer cellulose pad that contains one concentration of dried sucrose (0.025, 0.05, 0.10, or 0.20 g/ml), citric acid (0.025, 0.05, 0.10, or 0.20 g/ml), sodium chloride (0.0313, 0.0625, 0.125, or 0.25 g/ml), caffeine (0.011, 0.022, 0.044, or 0.088 g/ml), or monosodium glutamate (0.017, 0.034, 0.068, or 0.135 g/ml), or that contains no stimulus. For ease of presentation, the taste strips are contained in a three-drawer portable box (Fig. 1). Rubber gloves are provided for the test administrator in an opening at the top of the box, along with a prompt sheet to remind the subject of the taste qualities to be reported. Each of the three drawers of the kit is divided into nine compartments that open to the kit’s front. The strips are presented in the order denoted in each test drawer compartment.

Fig. 1

The portable Waterless Empirical Taste Test (WETT®) kit with front door closed (left) and open (right). The three drawers containing the white plastic strips with monomer cellulose pads embedded with tastants (in front of pictures) are shown on the right. Courtesy of Sensonics International, Haddon Hts., NJ 08035 USA. Copyright © 2015, 2019, Sensonics International.

On a given trial, the patient is instructed to move the cellulose pad of each strip around the mouth, particularly along the dorsal edges of the tongue, for 5–10 seconds, and to identify the taste quality or to indicate that no taste can be perceived. The test sequence involves presenting the four concentrations of each stimulus twice. In the first half of the test (27 trials), the stimulus concentrations proceed from weak to strong in an ascending sequence, with the different tastants being randomized in presentation order. No tastant (e.g., sucrose) immediately follows itself. The blanks are presented after each of the four caffeine presentations, the 0.25 g/ml sodium chloride presentation, and the 0.025 g/ml and 0.10 g/ml citric acid presentations. In the second half of the test, the reverse presentation order is made, i.e., going from strong to weak concentrations. The blank that follows the 0.25 g/ml sodium chloride stimulus, which is the last trial of the first series, is not repeated at the beginning of the second series, resulting in 26 rather than 27 trials for the second half of the test.
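The trial counts implied by this sequence can be checked with a short tally. The following is a minimal sketch in Python, not part of the test materials; the function and constant names are illustrative assumptions, and only the counts themselves come from the description above.

```python
# Tally of WETT trial counts, based only on the sequence described in the text.
# All names here are hypothetical and chosen for illustration.

TASTANTS = [
    "sucrose",
    "citric acid",
    "sodium chloride",
    "caffeine",
    "monosodium glutamate",
]
CONCENTRATIONS_PER_TASTANT = 4

# Blanks in the first (ascending) half: one after each of the four caffeine
# strips, one after the 0.25 g/ml sodium chloride strip, and one after each of
# the 0.025 and 0.10 g/ml citric acid strips.
BLANKS_FIRST_HALF = 4 + 1 + 2


def wett_trial_counts():
    """Return the number of trials in each half of the test and in total."""
    tastant_trials_per_half = len(TASTANTS) * CONCENTRATIONS_PER_TASTANT
    first_half = tastant_trials_per_half + BLANKS_FIRST_HALF
    # The blank that ends the first half is not repeated at the start of the
    # second (descending) half, so the second half has one trial fewer.
    second_half = first_half - 1
    return {
        "first_half": first_half,
        "second_half": second_half,
        "total": first_half + second_half,
    }


print(wett_trial_counts())
# {'first_half': 27, 'second_half': 26, 'total': 53}
```

The total of 53 matches the number of strips in the kit, and the 27 first-half trials match the 27 compartments of the three nine-compartment drawers.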

Whole Mouth Taste Test (WMT)

In this test, 20 mL of five concentrations each of sucrose (0.08, 0.16, 0.32, 0.64, 1.28 molar [M]), sodium chloride (0.032, 0.064, 0.128, 0.256, 0.512 M), citric acid (0.0026, 0.0051, 0.0102, 0.0205, 0.0410 M), and caffeine (0.0026, 0.0051, 0.0102, 0.0205, 0.0410 M) are presented in disposable plastic cups to each subject in a counterbalanced presentation order (Deems et al., 1991; Stinton et al., 2010). Each solution is sipped, swished in the mouth, and expectorated. The subject indicates whether the solution tasted sweet, salty, sour, or bitter, and rates its intensity and pleasantness on visual analog scales (Hawkes & Doty, 2017). In the present study, only the identification scores are presented, not the intensity and pleasantness ratings. After responding, the mouth is rinsed with purified water. Forty stimulus presentations are administered (4 tastants × 5 concentrations × 2 trials). The total possible identification score for a given tastant is 10, with 40 being the maximum for the overall test.

Taste Quadrant Test (TQT)

In this test, taste identification ability is assessed on the left and right sides of the anterior and posterior tongue (Doty, Heidt, et al., 2016a; Doty, Tourbier, et al., 2016b; Stinton et al., 2010). The selected tongue regions are near the lateral margins of the anterior tongue and near or on the lateral circumvallate papillae in the back of the tongue. For each tongue region, 25 μl of sucrose (0.49 M), sodium chloride (0.31 M), citric acid (0.015 M), and caffeine (0.04 M), equated for kinematic viscosity using cellulose (1.53 mm²/s), are presented in a counterbalanced order using a micropipette (Eppendorf, Hamburg, Germany). On a given trial, each subject reports whether the solution tastes sweet, sour, salty, or bitter before retracting the tongue and rinsing with purified water. A total of 96 forced-choice trials (4 tastants × 4 lingual regions × 6 repetitions) are presented. The maximum score a subject can attain for a given tastant is 24.
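The identification scoring shared by the WMT and TQT, i.e., one point for each correctly named taste quality, can be illustrated as follows. This is a minimal sketch under stated assumptions: the data layout (a list of tastant and response pairs) and all function names are hypothetical and are not taken from the published test protocols.

```python
# Sketch of forced-choice identification scoring for the two liquid tests.
# The (tastant, reported_quality) layout is an assumption for illustration.

CORRECT_QUALITY = {
    "sucrose": "sweet",
    "sodium chloride": "salty",
    "citric acid": "sour",
    "caffeine": "bitter",
}


def score_identification(trials):
    """Count correct quality identifications per tastant and overall.

    trials: iterable of (tastant, reported_quality) pairs.
    Returns (per_tastant_scores, total_score).
    """
    per_tastant = {tastant: 0 for tastant in CORRECT_QUALITY}
    for tastant, reported_quality in trials:
        if CORRECT_QUALITY[tastant] == reported_quality:
            per_tastant[tastant] += 1
    return per_tastant, sum(per_tastant.values())


def max_scores(n_tastants, presentations_per_tastant):
    """Return (maximum per-tastant score, maximum total score)."""
    return presentations_per_tastant, n_tastants * presentations_per_tastant


# WMT: 4 tastants x (5 concentrations x 2 trials) -> maxima of 10 and 40.
wmt_max = max_scores(4, 5 * 2)
# TQT: 4 tastants x (4 lingual regions x 6 repetitions) -> maxima of 24 and 96.
tqt_max = max_scores(4, 4 * 6)

example_trials = [
    ("sucrose", "sweet"),
    ("caffeine", "sour"),  # misidentified, scores no point
    ("citric acid", "sour"),
]
per_tastant, total = score_identification(example_trials)
```

In this example, `total` is 2, with one point each for sucrose and citric acid.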

Statistical analyses

The test–retest reliability of the WETT® was assessed using the Pearson correlation coefficient, and the differences in test scores between the two test occasions were assessed using analysis of covariance (ANCOVA; age = covariate). The time between test and retest ranged from 5 to 6 hours. Split-half reliabilities of all three tests were computed and compared. The Spearman-Brown prophecy formula was used to adjust the split-half reliability coefficients for test length (Guilford, 1954). ANCOVAs assessed the impact of age, sex, head trauma, and PTC on total test scores and scores from the individual taste stimuli that make up each test. P values, SDs, η2 values, and 95% confidence intervals are presented as summary statistics.
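For readers unfamiliar with the length adjustment, the general Spearman-Brown formula is r_k = k·r / (1 + (k − 1)·r), where r is the correlation between the two half-tests and k is the factor by which the test is lengthened (k = 2 when stepping up from a half-test to the full test). The following minimal Python sketch illustrates the computation; it is not the analysis code used in this study, and the function names and input layout are assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists.

    Raises ZeroDivisionError if either list has zero variance.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r, length_factor=2.0):
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * r / (1.0 + (length_factor - 1.0) * r)


def split_half_reliability(first_half_scores, second_half_scores):
    """Correlate subjects' half-test scores, then step up to full length."""
    r_half = pearson_r(first_half_scores, second_half_scores)
    return spearman_brown(r_half, length_factor=2.0)
```

For example, a half-test correlation of 0.50 corresponds to an adjusted full-length reliability of 2(0.50)/(1 + 0.50) ≈ 0.67.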

Results

WETT® test–retest reliability coefficients

The means, SDs, 95% CIs, and test–retest reliability coefficients for the two administrations of the WETT® are shown in Table 2 (n = 34). The coefficient for the total test (r = 0.92), i.e., for all taste qualities combined, was higher than those for the individual taste qualities. The average test scores did not differ significantly across the two test sessions.

Table 2 Mean, SD, and 95% confidence intervals for the trials of the first and second test–retest sessions of the Waterless Empirical Taste Test (WETT®), along with the percent difference in mean test scores, reliability coefficients, and associated p values. N = 34. See text for details

Split-half reliability coefficients of the WETT®, WMT, and TQT

The split-half r’s were higher for the total tests than for the individual taste qualities (Table 3; n = 198). In most cases, scores improved from the first to the second half of the test, although only 6 of the 15 comparisons were statistically significant, and one reflected a decline in performance. Thirteen of the 15 reliability coefficients (87%) were above 0.70, with two-thirds (10/15) being at or above 0.80, a value considered to be very strong (Cohen, 2003; Hemphill, 2003).

Table 3 Split-half reliability coefficients, along with mean, SD, and 95% confidence intervals, for the trials of the first and second halves of the Waterless Empirical Taste Test (WETT®), the Whole Mouth Taste Test (WMT), and the Taste Quadrant Test (TQT). The p value after the % difference column is based on an ANCOVA (age = covariate) comparing the means of the two halves, whereas the p value in the last column represents the significance of the r value. N = 198. See text for details

Sensitivity of the WETT®, WMT, and TQT to sex and age

The mean, SD, and 95% CI data for the total scores of all three tests are presented in Table 4 for both sexes as a function of age quartiles. Women, on average, outperformed men on all three tests (WETT® p < 0.001, η2 = 0.121; WMT p = 0.007, η2 = 0.035; TQT p < 0.001, η2 = 0.071). In all cases, scores decreased with age (WETT® p = 0.016, η2 = 0.026; WMT p = 0.002, η2 = 0.048; TQT p = 0.033, η2 = 0.022).

Table 4 Mean (SD; 95% CI) total test identification scores for three taste tests and their relationship to subject sex and age. See text for details
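One way to form age quartiles of the kind used for Table 4 is sketched below. This is an illustration only: the quartile cut-point method shown (the default "exclusive" method of Python's `statistics.quantiles`) and the function name are assumptions, since the exact procedure used to define the quartiles is not specified here.

```python
from bisect import bisect_right
from statistics import quantiles


def age_quartile_indices(ages):
    """Assign each age to a quartile index from 0 (youngest) to 3 (oldest).

    The cut points come from statistics.quantiles with its default method,
    which is an assumption; other quantile definitions may shift boundaries.
    An age equal to a cut point is placed in the higher quartile.
    """
    cut_points = quantiles(ages, n=4)  # three cut points
    return [bisect_right(cut_points, age) for age in ages]


example_ages = [20, 30, 40, 50, 60, 70, 80, 90]
example_groups = age_quartile_indices(example_ages)
```

Mean identification scores can then be summarized separately for men and women within each quartile index.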

Both sex and age effects were variably present for the individual sweet, sour, salty, and bitter components of the three tests. When present, the magnitude of their effects was similar to those observed in Table 4. The statistical details for each stimulus are as follows:

Sucrose

For sucrose, the WETT® scores were higher for women (p = 0.001, η2 = 0.062) and decreased with age (p = 0.047, η2 = 0.021). This was also true for the TQT scores (sex p < 0.001, η2 = 0.071; age p = 0.033, η2 = 0.022). In contrast, the WMT sucrose scores were not meaningfully related to either sex or age (ps > 0.15).

Sodium chloride

For sodium chloride, women outperformed men on both the WETT® and the WMT (WETT® p = 0.006, η2 = 0.041; WMT p = 0.003, η2 = 0.048). Age did not significantly affect the performance on either test (ps > 0.20), although a trend was present for the WETT® (p = 0.062, η2 = 0.001). TQT sodium chloride scores were not meaningfully impacted by either sex or age (ps > 0.25).

Citric acid

For citric acid, WETT® scores were related to both sex (p = 0.006, η2 = 0.040) and age (p = 0.047, η2 = 0.043). The WMT was also sensitive to age (p < 0.0001, η2 = 0.093), but not sex (p = 0.23, η2 = 0.006). The TQT was not statistically influenced by either sex (p = 0.072, η2 = 0.034) or age (p = 0.077, η2 = 0.033).

Caffeine

For caffeine, both the WETT® scores and those of the TQT were influenced by sex (ps < 0.001; respective η2s = 0.057 & 0.080) and age (ps = 0.030 & 0.015; η2s = 0.043 & 0.027). As with citric acid, the WMT bitter trials were impacted by age (p = 0.016, η2 = 0.031), but not sex (p = 0.119, η2 = 0.013).

Monosodium glutamate

The WETT® is the only one of the three tests to employ monosodium glutamate (umami). Umami scores were influenced by sex (p = 0.033, η2 = 0.025), but not age (p = 0.239, η2 = 0.008).

Sensitivity of the WETT®, WMT, and TQT to head trauma taste deficits

There is evidence that head trauma (HT) can negatively impact taste function, with published dysfunction frequencies ranging from 0.4% to 19% (Schofield & Doty, 2019). Although viral upper respiratory infections (URIs) can also produce chronic taste deficits, they are less common than those observed in HT. For example, in one study, whole-mouth taste loss was evident in 5.3% of 132 HT patients, as compared to 1.6% of 192 URI patients (Deems et al., 1991). For this reason, we sought to determine whether any of the three tests could differentiate between these two groups.

The total test scores of each of the three tests were significantly lower for patients with an HT etiology than for those with a URI etiology (Table 5). While the subtests of the TQT did not discriminate between these two groups, the sucrose and citric acid subtests of the WETT® did so, as did the NaCl, citric acid, and caffeine subtests of the WMT.

Table 5 Mean, SD, and 95% confidence intervals for the identification scores of the Waterless Empirical Taste Test (WETT®), the Whole Mouth Taste Test (WMT), and the Taste Quadrant Test (TQT). P and η2 values are based on an ANCOVA (age = covariate) that compared the means of the two groups. N = 198. See text for details

Sensitivity of the WETT®, WMT, and TQT to phenylthiocarbamide (PTC) tasters

We trichotomized the PTC test scores into three categories: not detecting bitter on either type of taste strip, detecting bitter on only one type of taste strip, and detecting bitter on both types of strips. The mean (SD) bitter intensity ratings given to these strips for these three respective groups were 0 (0.00), 4.11 (2.06), and 5.75 (1.56). The taste test identification scores of the three taste tests are presented in Table 6, along with the results of the ANCOVAs performed across the three categories and the post hoc comparison p values. It is apparent from these data that each test’s total identification score differed among the three PTC sensitivity categories. Although there was a monotonic relationship of performance for each test across the three subject groups, comparisons of means using Tukey’s HSD test found that the most significant differences occurred between the no-strip and two-strip taste groups. Differences did occur between the no-strip and one-strip taste groups for caffeine and NaCl on the WETT® and between the one-strip and two-strip taste groups for the total test, sucrose, and caffeine on the TQT. Significant differences were apparent for seven of the comparisons for the WETT®, three comparisons for the WMT, and five comparisons for the TQT.
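The three-way classification can be expressed compactly. The sketch below is illustrative only; the function and argument names are hypothetical, while the group labels follow the no-strip, one-strip, and two-strip terminology used above.

```python
# Sketch of the PTC trichotomy: classify a subject by the number of PTC strip
# types (Carolina Biological, 7 ug; Sensonics, 18 ug) on which bitterness was
# reported. Names are hypothetical and chosen for illustration.

PTC_GROUPS = ("no-strip", "one-strip", "two-strip")


def ptc_category(tasted_carolina_strip, tasted_sensonics_strip):
    """Return the PTC group label for one subject.

    Each argument is True if the subject reported a bitter taste on that
    type of strip, and False otherwise.
    """
    n_detected = int(bool(tasted_carolina_strip)) + int(bool(tasted_sensonics_strip))
    return PTC_GROUPS[n_detected]


responses = [(False, False), (True, False), (False, True), (True, True)]
groups = [ptc_category(carolina, sensonics) for carolina, sensonics in responses]
# groups == ['no-strip', 'one-strip', 'one-strip', 'two-strip']
```

Note that the one-strip group does not distinguish which of the two strips was detected, which is consistent with the categories as defined above.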

Table 6 Mean, SD, and 95% confidence intervals for the identification scores of the Waterless Empirical Taste Test (WETT®), Whole Mouth Taste Test (WMT), and Taste Quadrant Test (TQT) as a function of PTC tasting ability. P and η2 values are based on an ANCOVA (age = covariate) comparing the three groups. N = 189. See text for details

Discussion

This study demonstrates, in a clinic population, that the WETT® performs as well as, and in some cases better than, two established liquid-based taste tests on a range of tasks, thereby documenting its general validity. Specifically, the WETT® is as sensitive as, or more sensitive than, the other two tests to age, sex, head trauma, and PTC taste detection. Such performance is remarkable in light of the fact that this test has a relatively short administration time and does not employ water rinses. Such features make it very practical in both clinical and non-clinical settings.

The WETT® is a general measure of the ability to identify various concentrations of taste stimuli with a minimum number of trials and without the calculation of thresholds, per se. However, operationally, its presentation paradigm is similar to that of recognition thresholds, since a range of different concentrations is presented in an ascending series in the first half of the test and a descending series in the second half of the test, and the subject’s task is to identify the taste quality of each stimulus. The concentrations of a given tastant (e.g., sucrose) do not immediately follow one another, as normally occurs for a threshold test, but are interspersed among the concentrations of the other tastants. Such an approach is efficient, allowing for the testing of all five basic taste qualities at the same time and short inter-stimulus intervals not confounded by meaningful adaptation. This paradigm is essentially the same as that done previously for some other taste tests, including the WMT (Deems et al., 1991; Stinton et al., 2010) and a 32-trial filter paper strip taste test (Landis et al., 2009).

The WETT® proved to be highly reliable, in terms of both test–retest and split-half reliability. Its overall test–retest reliability in the clinic sample is nominally above its reliability determined in a small cohort (n = 16) from a non-clinic population (respective r’s = 0.92 vs. 0.88; Doty, 2019b), and is nominally larger than that reported for a number of other taste identification tests, including ones presenting stimuli via whole-mouth rinses (r = 0.61; Hwang et al., 2018), filter paper strips [individual tastant r’s ranging from 0.38 to 0.55 (Fjaeldstad et al., 2018) and 0.46 to 0.79 (Mueller et al., 2003; Ribeiro et al., 2016)], chewable tablets (r = 0.69; Ahne et al., 2000), and glass probes or rods (r = 0.68; Pingel et al., 2010). Its split-half r’s, while higher than those of a number of tests, are similar to those reported for a test in which stimuli are placed on the tongue by a pipette, with individual tastant r’s ranging from 0.73 to 0.80 (Fjaeldstad et al., 2018).

Although the split-half reliability values of all three tests evaluated in this study were similar, the scores of the individual stimuli making up these tests were differentially influenced by sex and age. For example, for sucrose, the WETT® and TQT scores, but not the WMT scores, were significantly related to these variables. For NaCl, sex impacted the WETT® and WMT scores, but not the TQT scores. For citric acid, age was related to WETT® and WMT scores, but not to the TQT scores. For caffeine, both the WETT® and TQT scores were sensitive to both sex and age, whereas the WMT scores were only sensitive to age. The reasons for such differences are unclear.

All three tests administered in this study found lower test scores in traumatic head injury patients than in patients with a URI etiology. However, the WETT® and the WMT were superior to the TQT in making this distinction. In regard to PTC, the WETT® better differentiated tasters from non-tasters than the other two tests. Thus, across the taste stimuli, significant differences were apparent for seven of the comparisons for the WETT®, three for the WMT, and four for the TQT. The basis of the WETT®’s greater sensitivity is unknown, but may relate to stimulus concentration and deposition differences.

Our finding that the WETT® and the other taste tests evaluated in this study differentiated between PTC taster categories is in accord with earlier findings that subjects with greater sensitivity to PTC and related compounds [e.g., 6-n-propylthiouracil (PROP)] are also more sensitive to some representatives of the other basic taste qualities (Bartoshuk et al., 1998; Chang et al., 2006; Doty et al., 2017; Doty & De Fonte, 2016; Drewnowski et al., 1997; Keller & Adise, 2016; Webb et al., 2015). Such associations, however, are complex and need not be solely due to genetic factors (Nolden et al., 2020). Although somewhat unorthodox, the present study’s categorization of the PTC bitterness intensity ratings into three groups based on responses to two different PTC strips provided a metric beyond just a single-stimulus-based characterization of tasters and non-tasters. The best discrimination was between the non-tasters and the tasters of both types of test strips. Whether the tripartite differentiation is related to more traditional ways of differentiating among PTC tasters is not clear.

There is evidence that diseases negatively impact smell more than taste, and that most persons who complain of taste loss actually have olfactory deficits (Deems et al., 1991). Nevertheless, quantitative taste testing in clinical settings occurs less frequently than quantitative olfactory testing. Examples of diseases that are reported to affect taste are early stage cancers (Murtaza et al., 2017), hypertension (Roura et al., 2016), hypothyroidism (Pittman & Beschi, 1967), diabetes (Perros et al., 1996), kidney disease (Kim et al., 2018), Parkinson’s disease (Doty et al., 2015), multiple sclerosis (Doty et al., 2016c), and liver disease (Shiue, 2015). Among the most common causes of taste disturbances are cancer treatments (Nolden et al., 2019) and such widely prescribed medications as antifungal agents (Doty & Haxel, 2005) and cardiovascular ACE inhibitors and beta-blockers (Schiffman, 2018). The present development of a more practical taste test may well expand the list of disease-related factors that alter the ability to taste.

This study has both strengths and weaknesses. First, while the WETT® taste strips could be used to assess function in different regions of the tongue, the present application examined only whole-mouth function. Regional testing is needed if damage to one or more of the nerves innervating the lingual epithelium is to be detected. Thus, future research applying the WETT® taste strips to localized tongue regions is needed. That being said, whole-mouth testing is the best reflection of a patient’s overall perception of taste, making such testing of clinical value. Second, this study validated the WETT® in a patient population. While its major findings are therefore generalizable to a patient population with chemosensory complaints, the degree to which they correspond to findings in non-clinic populations needs verification. Third, the test–retest evaluation of the WETT® was performed within the same day. Although this approach is common in the literature (Doty et al., 2019a), reliability data across longer time periods are usually viewed as desirable (Feeney & Hayes, 2014). Nonetheless, an argument can be made that so long as subjects do not recall their responses on the earlier test occasion, a short-interval test–retest reliability coefficient is a better index of the stability of the test than a long-interval one since, in the latter case, unrelated subject factors can intervene that distort the assessment of the true stability of the test (Marx et al., 2003).