Olfactory function can be reduced for several reasons, and its impairment negatively affects patients’ quality of life (Deems et al. 1991). Subjective ratings of olfactory performance frequently do not reflect psychophysical test results (Landis et al. 2003); consequently, olfactory testing plays a central role in diagnosing olfactory dysfunction (OD). Up to one quarter of OD patients may report poor disease management (Landis et al. 2009). This underlines the need for olfactory tests, considering that most patients appreciate a comprehensive medical work-up.

The most widely used test of odor identification, in the United States of America and beyond, is the University of Pennsylvania Smell Identification Test (UPSIT), which uses microencapsulated odorants that are released by a “scratch-and-sniff” technique (Doty et al. 1984). The “Sniffin’ Sticks” test battery, which is more frequently used by European physicians and allows three “dimensions” of the sense of smell to be tested, has been validated in different countries, is commonly used for research, and has the advantage of being reusable, in contrast to single-use microencapsulated tests (Kobal et al. 1996; Hummel et al. 2016). The Sniffin’ Sticks test kit is based on pen-like odor dispensing devices and can be used for testing olfactory threshold (T), discrimination (D), and identification (I) (Hummel et al. 1997). Current guidelines on OD recommend testing at least one of these three olfactory “dimensions”, but preferably all three (Hummel et al. 2016). Summed scores of T, D, and I (TDI) can be compared with normative age-related data and cut-off scores, allowing the diagnosis of normosmia, hyposmia, or anosmia (Hummel et al. 2007).

Full TDI testing takes up to 1 h, and administration time may vary strongly amongst subjects. This limits its application in many clinical settings. Because personnel resources are often limited, reliable self-administered testing procedures are desirable in clinical routine. The Sniffin’ Sticks identification test has already been validated in a self-administered manner (Mueller et al. 2006).

Because odor discrimination and threshold testing are the more time-consuming subtests, developing a self-administered procedure for them seems valuable for clinical and research purposes. Olfactory threshold can now be measured by computerized devices in a self-administered manner (Jiang and Liang 2015). The costs of these devices, however, may exceed many research or clinic budgets. Consequently, the aim of the present study was to investigate the comparability of assisted and self-administered strategies for testing odor threshold and discrimination using reusable Sniffin’ Sticks.

Subjects and Methods

Voluntary participants were recruited through invitational notices displayed at the university campus. The study was carried out according to the guidelines of the Declaration of Helsinki on Biomedical Research Involving Human Subjects and approved by the local ethics committee (EK-Nr. 087/2009). All subjects provided written informed consent.

Experimental Subjects

The study included 50 healthy subjects (30 female, 20 male, mean age 27.3 years ± standard deviation (SD) 8.1, range 18–58). Subjects had to refrain from eating or drinking anything except water for at least 1 h prior to testing, which was performed in a well-ventilated room. Smoking history was recorded, as well as subjective assessment of smell (SAS) on a visual analogue scale (VAS) ranging from 0 (no sense of smell) to 100 (excellent sense of smell). Half of the participants stated that they were smokers. Mean SAS score was 62.7 ± 18.6, range 14–96.

For cognitive testing, the Mini-Mental State Examination (MMSE) was applied prior to olfactory tests (Folstein et al. 1975). The MMSE is a test of general cognitive function containing 11 tasks (e.g., orientation, memory, calculation) with a maximum score of 30: scores of 24–30 indicate normal cognitive function, 18–23 mild cognitive dysfunction, and 10–17 severe cognitive dysfunction. None of the participants scored below 24, indicating normal cognitive function, with a mean score of 29.4 ± 0.9, range 25–30.

Olfactory Testing

Olfactory testing was performed using the Sniffin’ Sticks odor identification test kit (Burghart GmbH, Wedel, Germany), and subjects were randomly split into two groups to alternate the administration order of the different testing settings (see Table 1). As in regular TDI testing, subjects started with threshold testing, followed by discrimination and finally identification testing.

Table 1 Olfactory testing order by grouping

In analogy to the self-administration strategy for identification testing validated by Mueller et al. (2006), examinees were instructed in writing, for the self-testing procedures, to draw some curves on a piece of paper using the odor pen and to smell the piece of paper (“odor-curves-on-paper” method). For self-testing, odor pens were partially covered so that they looked identical, and subjects did not wear eye masks, since they had to be able to read the provided instructions and act on them. Pens were labeled appropriately, and the written instructions contained answer fields for each pen triplet. The letters of the scented pens (i.e., A, B, or C) had to be filled in. The random presentation sequence amongst the triplets was maintained for all participating subjects. Figure 1 schematically illustrates the testing set-up. The time needed for each modality was recorded, and two 7-min rest periods were granted in between. Overall testing time was 65.9 min ± 7.5, range 53–84.

Fig. 1 The test set-up for self-testing. Pen triplets had to be aligned on top of the testing desk with matching letters (i.e., A, B, or C). Pieces of paper were provided to draw “odor curves” onto, using the odor pens. Subjects then had to smell the piece of paper and dispose of it. The correct letter had to be filled in on the answer line below (for threshold testing, the one pen containing an odorant; for discrimination testing, the one pen containing a different odorant than the other two). For threshold testing, only the observing examiner was familiar with the numbers and corresponding dilutions

Odor Threshold Testing

Olfactory threshold was tested using three different strategies in each subject:

  1. Assisted testing was performed in a reverse staircase manner (threshold assisted = T-as), with subjects being blindfolded and pens presented by an examiner in a standardized manner (i.e., using smell-neutral examiner gloves, the same distance to the nose, the same time period, and the same order). Odor threshold was tested using triplets with one pen containing the odorant at a certain dilution and two other odorless pens. Examinees had to identify the one pen with the odorant (target pen) in a forced-choice method. The target pen had to be identified correctly twice at the same dilution to initiate a reversal, or (going towards weaker dilutions) one unsuccessful identification initiated a reversal. Seven reversals needed to be performed to calculate the final score as the mean of the last four reversals (Hummel et al. 1997).

  2. All 16 dilution triplets of the threshold test battery were provided to the subjects in random order (threshold randomized = T-rand), and testing was performed in a self-administered manner as explained above. This strategy has been proposed as an alternative procedure for threshold testing with test–retest reliability similar to the staircase technique (Kobal et al. 2001) and can be administered within a shorter period of time (Lötsch et al. 2004). Thresholds were calculated using a log-likelihood fitting technique described by Linschoten et al. (2001).

  3. Additionally, and also self-administered, subjects had to identify the target odor amongst the triplets but, in contrast to T-rand, with ascending methods of limits (threshold ascending methods of limits = T-aml) (Cain et al. 1983): triplets were provided in order from the weakest to the strongest concentration of n-butanol. T-aml scores were formed by the mean of the last wrong and the first right answer, from which on all target odors were assigned correctly. Table 2 shows an example of the T-aml testing procedure.
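The reversal rule of the assisted staircase (strategy 1) can be sketched in a few lines of Python. This is only an illustrative simplification, not the examiner protocol itself: the numbering of dilution steps (1 = strongest, 16 = weakest), the step size of one, the starting point at the weakest dilution, and the names `staircase_reversals` and `staircase_score` are all assumptions made for the example.

```python
from statistics import mean
from typing import Callable, List

WEAKEST_STEP = 16    # assumption: step 16 is the most diluted pen
STRONGEST_STEP = 1   # assumption: step 1 is the most concentrated pen


def staircase_reversals(identify: Callable[[int], bool],
                        n_reversals: int = 7,
                        max_trials: int = 500) -> List[int]:
    """Run a simplified staircase and return the dilution steps at which it reversed.

    `identify(step)` is a hypothetical callback that returns True when the
    target pen is picked correctly at the given dilution step.
    """
    step = WEAKEST_STEP
    toward_stronger = True          # the run starts by increasing concentration
    reversals: List[int] = []
    trials = 0
    while len(reversals) < n_reversals:
        # Up to two presentations at the same dilution; the second is skipped after a miss.
        first = identify(step)
        second = first and identify(step)
        trials += 2 if first else 1
        if trials > max_trials:
            raise RuntimeError("staircase did not reach the required reversals")
        if second:
            # Identified twice at this dilution: turn towards weaker dilutions.
            if toward_stronger:
                reversals.append(step)
                toward_stronger = False
            step = min(step + 1, WEAKEST_STEP)
        else:
            # One miss: turn towards stronger dilutions.
            if not toward_stronger:
                reversals.append(step)
                toward_stronger = True
            step = max(step - 1, STRONGEST_STEP)
    return reversals


def staircase_score(reversals: List[int], n_scored: int = 4) -> float:
    """Final score: mean of the last four of seven reversals."""
    if len(reversals) < 7:
        raise ValueError("seven reversals are required")
    return mean(reversals[-n_scored:])


# Hypothetical, perfectly consistent subject who detects steps 1 to 8 only.
points = staircase_reversals(lambda step: step <= 8)
print(points, staircase_score(points))
```

With this idealized responder the staircase alternates between steps 8 and 9, so the score settles at 8.5; real subjects also guess correctly by chance one time in three, which the sketch does not model.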

Table 2 Odor threshold testing using ascending methods of limits
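The T-aml scoring rule (mean of the last wrong answer and the first right answer from which on all answers are correct) can be written out as a short Python sketch. The step numbering (1 = strongest, 16 = weakest), the function name `taml_score`, and the handling of the two edge cases (no wrong answer at all, or a wrong answer at the strongest concentration) are assumptions made for illustration; the text above does not specify them.

```python
from typing import Dict


def taml_score(correct_by_step: Dict[int, bool]) -> float:
    """Score an ascending-limits run.

    `correct_by_step` maps each dilution step (assumed: 1 = strongest,
    16 = weakest) to whether the target pen was identified correctly.
    """
    # Presentation order: weakest concentration first, strongest last.
    order = sorted(correct_by_step, reverse=True)
    wrong_positions = [i for i, step in enumerate(order)
                       if not correct_by_step[step]]
    if not wrong_positions:
        # Assumption: every answer correct, so score the weakest step.
        return float(order[0])
    last_wrong_pos = wrong_positions[-1]
    if last_wrong_pos == len(order) - 1:
        # Assumption: strongest concentration missed, so score that step.
        return float(order[-1])
    last_wrong = order[last_wrong_pos]
    first_right = order[last_wrong_pos + 1]
    return (last_wrong + first_right) / 2


# Hypothetical run: a lucky hit at step 12, last miss at step 9, correct from step 8 on.
answers = {step: step <= 8 for step in range(1, 17)}
answers[12] = True
print(taml_score(answers))
```

Because the rule looks only at the last wrong answer, an isolated correct guess at a weak concentration (step 12 in the example) does not change the score.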

Odor Discrimination Testing

For odor discrimination testing, subjects had to distinguish one target odor from two identical odors in a three-alternative forced-choice paradigm. As is common, 16 pen triplets were presented to examinees, either with assistance (D-as) or self-administered (D-s).

Odor Identification Testing

Odor identification was tested last in both groups (I-s), using the self-administration strategy validated by Mueller et al. (2006). As mentioned above, subjects were instructed in writing to remove the cap of each of the 16 odors and draw a few curves on a piece of paper in front of them and smell it. Each piece of paper had to be put in a nearby trash can immediately after smelling, to avoid interference. Answers had to be given in a four-alternative forced-choice paradigm.

Statistical Analysis

The two testing order groups were pooled for statistical analysis. Correlational analyses were performed using the Pearson correlation coefficient. Student’s t tests for paired samples were used for comparisons between groups. The alpha level was set at 0.05. Normality of quantitative variables was tested using the Shapiro–Wilk test. Statistical analysis was performed using IBM SPSS Software Version 24 (SPSS, Chicago, IL, USA). GraphPad Prism 7.0 (GraphPad Software, La Jolla, CA, USA) was used to visualize data.
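For readers who want to retrace the two main statistics by hand, the following Python sketch computes a paired t statistic and a Pearson correlation coefficient using only the standard library. It is not the SPSS procedure used in the study; the function names and the example scores are hypothetical, and p-values are deliberately left out because they require the t distribution (available, for instance, through `scipy.stats.ttest_rel` and `scipy.stats.pearsonr`).

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient of two equally long samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of four subjects under two testing modalities.
assisted = [7.5, 8.0, 6.25, 9.0]
self_test = [8.5, 9.5, 7.0, 9.5]
print(paired_t(assisted, self_test))
print(pearson_r(assisted, self_test))
```

The paired design matters here: each subject serves as their own control, so the test is run on the within-subject differences rather than on the two score lists separately.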


Results

TDI scores were calculated using the I-s score and the scores of the assisted subtests for threshold and discrimination. Mean TDI was 35.9 ± 3.0, range 26.5–42.8. Two subjects scored less than 31 on TDI, indicating hyposmia, whilst all other subjects scored within the normative range of normosmia. TDI scores did not significantly correlate with age, SAS, or MMSE (p > 0.05).
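The composite score is simple arithmetic, sketched below in Python. The cut-off of 31 is the value used in this paragraph to flag hyposmia; the function names and the example subtest scores are hypothetical, and no anosmia cut-off is encoded because none is stated here.

```python
HYPOSMIA_CUTOFF = 31.0  # TDI below this value is read as hyposmia in the text


def tdi_score(threshold: float, discrimination: int, identification: int) -> float:
    """Sum the threshold, discrimination, and identification subtest scores."""
    return threshold + discrimination + identification


def below_normosmia(tdi: float, cutoff: float = HYPOSMIA_CUTOFF) -> bool:
    """True when the composite score falls below the normosmia range."""
    return tdi < cutoff


# Hypothetical subject: assisted T and D scores plus the self-tested I score.
example = tdi_score(threshold=8.5, discrimination=13, identification=14)
print(example, below_normosmia(example))
```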

Also, subscales of TDI did not significantly correlate with SAS, regardless of whether they were self-tested or administered by an examiner. There was no statistically significant difference in olfactory performance, as measured by TDI, between male and female subjects or between smokers and non-smokers (p > 0.05). Mean olfactory test results of each modality and the required testing time are shown in Table 3.

Table 3 Mean overall olfactory test results and time required

T-as scores were significantly lower than T-aml (p < 0.001) and T-rand (p < 0.05). Threshold testing using the T-aml strategy was significantly faster than T-rand and T-as modalities (p < 0.001). Bland–Altman plots illustrate mean differences of testing scores and 95% limits of agreement values (see Fig. 2).

Fig. 2 Bland–Altman plots of odor threshold and odor discrimination results. D-s, discrimination self-test; D-as, discrimination assisted test; T-aml, threshold self-test with ascending methods of limits; T-rand, threshold randomized self-test; T-as, threshold assisted test. Differences between scores from the two modalities are plotted against the average scores of the two sessions; 95% limits of agreement and mean difference are indicated by horizontal lines. Clockwise starting upper left: 1. D-as and D-s: −4.01 to 4.61, mean diff. 0.30; 2. T-aml and T-rand: −4.48 to 4.37, mean diff. −0.06; 3. T-as and T-rand: −5.72 to 3.53, mean diff. −1.09; 4. T-as and T-aml: −4.87 to 2.80, mean diff. −1.04
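The quantities behind a Bland–Altman plot can be reproduced with a few lines of Python. The sketch below assumes the conventional definition of the 95% limits of agreement (mean difference ± 1.96 × SD of the differences) and takes the difference as first modality minus second, which appears consistent with the signs in the caption; the function name and the example scores are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def bland_altman(a: Sequence[float],
                 b: Sequence[float]) -> Tuple[float, float, float, List[float]]:
    """Return mean difference, lower and upper 95% limits of agreement,
    and the per-subject averages used as the x-axis of the plot."""
    diffs = [x - y for x, y in zip(a, b)]          # first modality minus second
    averages = [(x + y) / 2 for x, y in zip(a, b)]
    mean_diff = mean(diffs)
    half_width = 1.96 * stdev(diffs)               # assumes roughly normal differences
    return mean_diff, mean_diff - half_width, mean_diff + half_width, averages


# Hypothetical discrimination scores of four subjects (assisted vs. self-test).
d_as = [10, 12, 14, 11]
d_s = [9, 13, 13, 12]
mean_diff, lower, upper, avg = bland_altman(d_as, d_s)
print(mean_diff, lower, upper)
```

Read backwards, the same relation gives a feel for the spread in the caption: limits of −4.01 to 4.61 around a mean difference of 0.30 correspond to a half-width of about 4.31, that is, an SD of the differences of roughly 2.2 score points.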

For odor discrimination, a paired t test revealed no significant difference between the assisted and the self-administered testing strategy in terms of scores or time needed (p > 0.05). As seen in Fig. 2, the mean difference between D-as and D-s was small (0.3).


Discussion

A major finding of the present investigation is that the “odor-curves-on-paper” method, which has been validated for identification testing (Mueller et al. 2006), is also applicable to threshold and discrimination testing in appropriate settings, which include written instructions and modification of the outer appearance of the odor pens.

The traditional administration strategy of the Sniffin’ Sticks test battery has been investigated extensively, in thousands of healthy subjects and patients with olfactory disorders (Hummel et al. 2007; Kobal et al. 2000; Hummel et al. 2016; Cavazzana et al. 2017). This study did not seek to greatly modify these strategies, but to develop presentation settings that enable self-testing.

For discrimination testing, applying the “odor-curves-on-paper” method seems to be easiest, since triplets only need to be presented once and in a predefined order. In test–retest situations, Hummel et al. (1997) found weaker correlations in odor discrimination than in odor threshold and odor identification testing. The authors assumed that learning processes in odor testing affect discrimination tasks more. To overcome possible effects of learning processes on discrimination scores, we applied two different administration orders (as seen in Table 1, either D-as or D-s was performed first). Under these conditions, scores of discrimination tested in a self-administered manner did not differ significantly from those obtained with administration by an examiner, and the time needed was similar. The homogeneous distribution of the data, as visualized in the Bland–Altman plot, suggests similar reproducibility for the two discrimination tests, regardless of whether subjects had lower or higher scores. However, D-as scores were slightly higher than D-s scores. To a certain extent, this could also be due to blindfolding, which was used in D-as but not in D-s. Blindfolded subjects may be able to focus more intensively on the presented odors, and/or the visual tasks in D-s may distract subjects, with negative effects on performance. However, long-term visual impairment does not seem to affect olfactory performance, as measured by TDI testing, in comparison to healthy sighted subjects (Luers et al. 2014). Also, a recent meta-analysis concluded that blind people do not have superior olfactory abilities (Sorokowska et al. 2018). Therefore, possible effects of blindfolding on testing performance within this study should be interpreted with caution.

Solely applying the “odor-curves-on-paper” method to the traditional reverse staircase odor threshold testing, however, was not suitable for self-testing: the lack of an examiner makes reversals in concentrations impossible. We therefore chose two previously published strategies for odor threshold testing (Cain et al. 1983; Linschoten et al. 2001; Lötsch et al. 2004). These two self-testing strategies revealed small mean differences and rather narrow 95% limits of agreement. However, assisted testing in a reverse staircase paradigm yielded significantly lower scores than the self-testing procedures. This suggests comparability of T-rand and T-aml, but indicates caution in comparing these modalities with the classic reverse staircase method.

Interestingly, T-aml was significantly faster than the two other modalities. Apparently, constantly increasing the concentration enables faster decision-making amongst healthy subjects than random dilution presentation does. In contrast to other smell tests using random presentation of odorants (Kobal et al. 2001), the T-rand method was not significantly faster than T-as. It is important to note in this context that, because recovery time is needed between odor stimulus presentations (Kobal 1981), saving time in olfactory testing is difficult to achieve without negative effects on accuracy. For threshold testing, wider dilution steps may reduce testing time. Croy et al. (2009) found no significant difference between testing subjects with 8 dilution steps and with the traditional 16 dilution steps, but less time was needed for 8 steps. The present study did not take advantage of fewer steps in odor threshold self-testing; this remains subject to future investigations. As a further limitation, this study did not consider testing threshold levels with other strategies, such as the ascending limits procedure proposed by Sijben et al. (2017), where threshold levels are determined by testing until four correct identifications of the target odor at one concentration are obtained in a row.

Given the present results, T-aml may be used as a possible self-testing procedure for odor threshold when personnel resources are limited. Clinicians have to keep in mind, though, that threshold scores (assessed by random or ascending methods of limits) may be lower when tested with the standard staircase procedure. Future comparative investigations should focus on several different threshold testing modalities in one population and perhaps leave out other olfactory dimensions. Using both self-administered and assisted modalities may then help to identify more precisely the best method to apply in threshold self-testing settings.

Another issue in olfactory testing is the individual sensitivity level to single-molecule odors (Keller et al. 2012) and diverse odor discrimination abilities based on familiarity (Jehl et al. 1995). Recent work suggests that more complex odors, in the form of odor mixtures, may overcome these confounding factors and be more reliable than single-molecule testing, as presently used in the Sniffin’ Sticks testing battery (Oleszkiewicz et al. 2017; Hsieh et al. 2017).

Due to these limitations and the lack of dedicated normative data, application for research purposes cannot be recommended at present. Additionally, self-testing may not be suitable for all patient groups, and validation in patients with OD is still pending. Cognitively impaired and possibly older patients, as well as children, may have problems understanding the written tasks. Also, the forced-choice paradigm frequently confuses patients, and they have to be reminded of this principle during testing. In the absence of an examiner, this may lead to skipped answers and hence false scores. Nevertheless, self-testing procedures, as presented in this study, could be useful for screening of olfactory dysfunction.

In this cohort, subjective olfactory ratings did not correlate with olfactory test scores, including scores on self-tests. This inconsistency between subjective ratings and olfactory measurements is a common finding in clinical routine and highlights the need for psychophysical tests to assess olfactory function (Landis et al. 2003). Easy, reliable, and ideally self-administrable olfactory testing modalities are therefore of great value for a comprehensive medical work-up.

Taken together, the results of this study demonstrate that the “odor-curves-on-paper” method is applicable for odor discrimination testing. Odor threshold, with restrictions, may also be tested in a self-administered manner. Self-administration of olfactory testing using Sniffin’ Sticks can easily be performed with appropriate instructions and slight modifications of the Sniffin’ Sticks. In cognitively healthy patients, the proposed testing set-up may improve patient care if personnel and (with limitations) time are restricted.