Subjects
A total of 20 participants (10 women; mean age 24.68 years, SD 2.6 years, range 19–30 years; mean body mass index (BMI) 22.03 kg/m², SD 1.66 kg/m², range 19.77–25.07 kg/m²) took part in the experiment. All participants were screened beforehand in telephone interviews. Exclusion criteria were current smoking, recent history of smoking (< 3 years of abstinence), vegetarian/vegan diet, allergies, current use of medication except oral contraceptives, drug use within the last 2 months, alcoholism, current pregnancy/breastfeeding, any subjective or objective impairment of the sense of smell, nose surgery except childhood nasal polypectomy, and history of neurological or psychiatric disorders. The inclusion criterion was an age between 18 and 36 years. After inclusion, participants provided written informed consent. The study was carried out in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of the University of Leipzig.
Sample Size Estimation
We determined the minimum number of participants a priori by means of a statistical power analysis in G*Power 3.1 (Faul et al. 2007). The estimation was based on data from Hummel et al. (1997), who correlated odor thresholds assessed with the “Sniffin’ Sticks” test battery on two different test days. The effect size in that study was r = .61, which is large by Cohen’s (1988) criteria. With alpha = .05 and power = .80, the projected sample size for this effect size is approximately n = 16. Our sample of n = 20 was therefore adequate for the main objective of this study.
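The order of magnitude of this estimate can be checked with a short sketch. It uses the Fisher z approximation for the sample size of a correlation test, which is not the exact routine implemented in G*Power, and the function name `n_for_correlation` and the `two_sided` switch are illustrative assumptions; the text above does not state whether a one- or two-sided test was specified.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80, two_sided=True):
    """Approximate n needed to detect a correlation r against rho = 0.

    Fisher z approximation: n = ((z_alpha + z_beta) / atanh(r))^2 + 3.
    This is an approximation, not G*Power's exact computation.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


# Effect size, alpha, and power as given in the text.
n_one_sided = n_for_correlation(0.61, two_sided=False)
n_two_sided = n_for_correlation(0.61, two_sided=True)
print(n_one_sided, n_two_sided)
```

Under these assumptions the one-sided variant gives n = 16, while the two-sided variant comes out a few participants higher; an exact computation may differ slightly from either.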
Study Design and Procedure
The investigation involved ODT testing on two test days in a repeated-measures within-subject design. Participants were instructed to refrain from eating and from drinking anything except water for 2 h prior to testing. In the first session, all participants were screened for olfactory function using the short form of the olfactory identification test included in the “Sniffin’ Sticks” test battery (Mueller and Renner 2006). On both test days, olfactory testing was conducted using the single staircase procedure (SSP) as described by Hummel et al. (1997) and the brief ascending procedure (BAP), in a pseudo-randomized order. The interval between test days was approximately 1 week (mean 8.45 days, SD 4.37 days, range 7–20 days). The two ODT tests were conducted successively, separated by a short break of approximately 10 min. After completing both ODT tests, participants rated the intensity (0 = very weak, 10 = very strong), pleasantness (− 5 = unpleasant, + 5 = pleasant), and familiarity (0 = unfamiliar, 10 = familiar) of the odor from the pen containing the highest concentration of n-butanol on a visual analog scale (Aitken 1969).
Materials
All odorants were presented in commercially available felt-tip pens (“Sniffin’ Sticks”; Burghart Instruments, Wedel, Germany). For the screening of olfactory function, we used the short form of the olfactory identification test from the “Sniffin’ Sticks” test battery (Mueller and Renner 2006). In a multiple-choice task, participants must identify the correct smell from a card with four descriptors per odorant. In total, five odorants were presented; the test confirms the presence of normosmia (≥ 4 correct answers) or hyposmia (< 4).
The ODT test from the “Sniffin’ Sticks” test battery uses n-butanol, an odorant that arises from fermentation processes and is frequently used in olfactory testing. It is perceived as rather unpleasant.
Sixteen dilutions of n-butanol are prepared by stepwise dilution, each step halving the previous concentration (ratio 1:2). The strongest odor concentration is 4% (pen number 1) and the weakest is 1.22 ppm (pen number 16).
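The dilution series can be reproduced with a few lines of arithmetic. The sketch below assumes an exact halving at every step and starts from the 4% concentration of pen 1; the function names are illustrative and not part of the test kit.

```python
def butanol_concentration_percent(pen, strongest_percent=4.0, ratio=2.0):
    """Nominal n-butanol concentration (in %) of a pen numbered 1 to 16.

    Pen 1 holds the strongest concentration; each further pen is diluted
    by `ratio` relative to the previous one.
    """
    if not 1 <= pen <= 16:
        raise ValueError("pen number must be between 1 and 16")
    return strongest_percent / ratio ** (pen - 1)


def percent_to_ppm(percent):
    """Convert a concentration in percent to parts per million."""
    return percent * 1e4


for pen in (1, 2, 8, 16):
    conc = butanol_concentration_percent(pen)
    print(f"pen {pen:2d}: {conc:.6f} % = {percent_to_ppm(conc):.2f} ppm")
```

Fifteen halvings of 4% give roughly 1.22 ppm for pen 16, which matches the value stated above.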
The odorized pens are presented in triplets as described by Hummel et al. (1997), one containing diluted n-butanol and two containing the solvent (aqua conservans) only, serving as blanks. In this three-alternative forced-choice procedure, participants are asked to identify the pen containing the odorant. Each pen is presented for approximately 3 s at a distance of 1–2 cm from both nostrils. The interval between triplets is approximately 30 s. During testing, participants are blindfolded to prevent visual identification of the correct pen. We established ODTs based on the standard SSP, an additionally computed threshold score that uses fewer reversals from the standard procedure, and the BAP.
In the standard SSP, odorants are presented from lowest to highest odor concentration. Two successive correct identifications trigger the first turning point (reversal of the staircase), thereby indicating the peri-threshold region. From there, odor concentration is decreased following two correct answers in a row and increased following an incorrect answer. Each turning point results in a reversal of the staircase. Seven reversals must be obtained in the “gold standard” SSP (Hummel et al. 1997). The short SSP follows the principle of the standard SSP, but we estimated the threshold using only the first five of the seven measured reversals from the standard SSP.
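The staircase logic can be sketched as follows. This is a simplified illustration rather than the exact published protocol. It assumes the usual staircase rule (two correct identifications in a row lead to a weaker pen, one incorrect identification leads to a stronger pen) and one-pen steps throughout. The `respond` callback, which returns whether the target pen of a triplet was identified, is a hypothetical stand-in for a participant.

```python
def run_staircase(respond, n_reversals=7, n_pens=16, max_trials=200):
    """Simplified single staircase; returns the pen numbers of the reversals.

    Pen 1 is the strongest concentration, pen `n_pens` the weakest.
    `respond(pen)` returns True if the odorized pen was identified.
    """
    pen = n_pens            # start at the weakest concentration
    going_weaker = False    # initial run ascends in concentration
    reversals = []
    trials = 0
    while len(reversals) < n_reversals and trials < max_trials:
        # A concentration counts as detected only after two correct answers.
        detected = respond(pen)
        trials += 1
        if detected:
            detected = respond(pen)
            trials += 1
        if detected:
            if not going_weaker:        # turning point: stronger -> weaker
                reversals.append(pen)
                going_weaker = True
            pen = min(pen + 1, n_pens)  # move to a weaker pen
        else:
            if going_weaker:            # turning point: weaker -> stronger
                reversals.append(pen)
                going_weaker = False
            pen = max(pen - 1, 1)       # move to a stronger pen
    return reversals


# Idealized observer (an assumption): detects pens 1-8, never guesses.
reversals = run_staircase(lambda pen: pen <= 8)
print(reversals)
```

With a real participant the responses are noisy and include correct guesses, so the reversals scatter around the threshold instead of alternating between two neighboring pens. The `max_trials` guard is an addition of this sketch to keep the loop finite at the ends of the scale.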
In the BAP, each triplet is presented only once in an ascending order from lowest to highest odor concentration. The threshold score is defined as the point of transition between no detection and detection of the odorant, i.e., the threshold score is a value read at the boundary between incorrect and correct detection of the pen containing the odor. Based on the CSP (Fechner 1860) mentioned earlier, we presented each odor level only once. Similarly, based on the ALP (Cain et al. 1988), we defined the threshold score as being reached after five correct odor detections in a row. If the series of five correct detections begins within the five highest odor concentrations, the highest concentration level is repeated until five correct detections are reached. If the highest odor concentration is not detected, the threshold value is zero.
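A minimal sketch of the BAP scoring rule follows, under two assumptions that are not spelled out above: the threshold score is taken as the pen number at which the run of five correct detections begins, and a miss while repeating the highest concentration yields a score of zero. The `respond` callback is a hypothetical stand-in for the participant.

```python
def bap_threshold(respond, n_pens=16, run_length=5):
    """Brief ascending procedure: one pass from weakest to strongest pen.

    Returns the pen number at which `run_length` consecutive correct
    detections begin, or 0 if the strongest pen is not detected.
    """
    run = 0
    run_start = None
    for pen in range(n_pens, 0, -1):    # pen 16 (weakest) down to pen 1
        if respond(pen):
            if run == 0:
                run_start = pen         # candidate transition point
            run += 1
            if run == run_length:
                return run_start
        else:
            run = 0                     # an error resets the run
    if run == 0:
        return 0                        # strongest pen was not detected
    # Run started near the top of the scale: repeat the strongest pen.
    while run < run_length:
        if not respond(1):
            return 0
        run += 1
    return run_start


# Idealized observers (assumptions) that never guess correctly.
print(bap_threshold(lambda pen: pen <= 8))   # run 8, 7, 6, 5, 4
print(bap_threshold(lambda pen: pen <= 3))   # run 3, 2, 1, then pen 1 twice
print(bap_threshold(lambda pen: False))      # nothing detected
```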
Questionnaires and Interviews
Depressive symptoms were assessed using the Beck Depression Inventory (BDI) (Beck et al. 1961), a self-administered four-point rating scale (0 = not at all to 3 = always) that measures depressive symptoms in the past week. The BDI was used to exclude participants with depressive symptoms, because depression has previously been shown to be associated with smell impairments (Croy and Hummel 2017).
Due to known effects of smoking on the olfactory system, smoking behavior was investigated using the Fagerstroem Test for Nicotine Dependence (FTQ, Fagerstroem 1978) as well as a smoking interview implemented previously in the Leipzig Life-Study containing questions about smoking behavior in the past and present, smoking onset and durations, breaks, and passive smoking hours (Loeffler et al. 2015).
To measure individual odor associations, use of the olfactory sense, and the way olfaction influences decisions in daily life, we administered the Importance of Olfaction Questionnaire (IOQ) (Croy et al. 2010).
Women were additionally interviewed about their menstrual cycle, because sensitivity to odors is known to be increased in the follicular phase of the cycle and under oral contraceptives, and decreased in the luteal phase (Derntl et al. 2013; McNeil et al. 2013).
Data Analysis
JASP (version 0.8.1.1 for Mac OS X, JASP Team 2018), IBM SPSS (IBM Corp. Released 2015, IBM SPSS Statistics for Windows, Version 23.0.) and R (version 3.5.0, R Core Team 2013) were used for statistical evaluation. The α-level was set at .05.
Due to the small sample size, normality of the data was assessed using the Shapiro-Wilk test.
Age, BMI, ODT scores (SSP_7, SSP_5 on both days; BAP on the first day), perceived intensity, as well as pleasantness and familiarity of n-butanol on both days were normally distributed. ODT scores measured with BAP on the second day were not normally distributed. Although ANOVAs are relatively robust against violations of the assumption of normality, as a precaution we additionally ran nonparametric tests for each analysis that included the BAP threshold on day two. As the nonparametric results did not deviate from the parametric results, we report the parametric results here.
Data obtained with the “gold standard” SSP were analyzed twice. In a first step, we computed the standard ODT score, which is calculated as the mean of the last four of a total of seven reversals (SSP_7). A second threshold score was computed from the last two of the first five reversals (SSP_5).
For BAP, the threshold value was estimated by identifying the point of transition between no detection and detection, that is, the point from which the odorant was detected five times in a row.
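Both SSP scores can be derived from one list of reversals, as the following sketch illustrates. It assumes that SSP_5, like SSP_7, is the arithmetic mean of the selected reversals, which is not stated explicitly above. The reversal values are invented for illustration only.

```python
from statistics import mean


def ssp_7(reversals):
    """Standard score: mean of the last four of seven reversals."""
    if len(reversals) < 7:
        raise ValueError("seven reversals are required")
    return mean(reversals[3:7])


def ssp_5(reversals):
    """Short score: mean of the last two of the first five reversals.

    Using the mean here is an assumption made for this sketch.
    """
    if len(reversals) < 5:
        raise ValueError("five reversals are required")
    return mean(reversals[3:5])


# Hypothetical pen numbers at the seven reversals of one session.
example_reversals = [7, 9, 8, 10, 8, 9, 7]
print(ssp_7(example_reversals))  # mean of 10, 8, 9, 7
print(ssp_5(example_reversals))  # mean of 10, 8
```

Because SSP_5 reuses the first five reversals of the same run, both scores come from a single testing session; only the stopping point differs.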
To compare the ODT scores obtained with the “gold standard” SSP with seven reversals (SSP_7), the short SSP with five reversals (SSP_5), the BAP, and between testing days, the data were submitted to repeated-measures analysis of variance (rm-ANOVA) using the general linear model with the within-subject factors “Method” (SSP_7/SSP_5/BAP) and “Test day” (T1/T2) and the between-subject factor “Sex” (male/female). Subsequently, we ran a Bayesian repeated measures analysis of variance (Bayesian rm-ANOVA) using the same model to ascertain that there are no significant differences between the ODT scores obtained with the different methods. While conventional statistical testing is based on the frequentist paradigm, the Bayesian approach is based on the subjective probability paradigm (van de Schoot et al. 2014). Compared to conventional statistical testing, the Bayesian approach is advantageous in that the likelihood of an outcome is considered under the null and the alternative hypothesis. This means that by using the Bayesian approach, we can actually estimate the probability of the null hypothesis (no differences between groups in our case), while in the conventional approach, we can only estimate the likelihood of our observations or more extreme values when the null hypothesis of no differences is true.
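The contrast drawn here can be written compactly. The notation below is the standard textbook definition of the Bayes factor and is an addition for illustration, not a formula taken from the study; D denotes the observed data, H_0 the null hypothesis, and H_1 the alternative hypothesis.

```latex
% Bayes factor in favor of the null hypothesis
\mathrm{BF}_{01} = \frac{p(D \mid H_0)}{p(D \mid H_1)}

% Posterior odds = Bayes factor times prior odds
\frac{p(H_0 \mid D)}{p(H_1 \mid D)}
  = \mathrm{BF}_{01} \cdot \frac{p(H_0)}{p(H_1)}
```

A value of BF_01 above 1 indicates that the data are more likely under the null hypothesis than under the alternative, which a p value alone cannot express.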
To examine the test-retest reliability, meaning the relationship between all ODT scores obtained with different methods and on different testing days, we used the intraclass correlation coefficient (ICC). ICC estimates were calculated using SPSS statistical package version 23 (SPSS Inc., Chicago, IL) based on an absolute-agreement, two-way mixed-effects model. Additionally, we report Pearson’s correlation coefficients to make our results comparable to other test-retest correlation studies in olfactory testing. To quantify the time saved by the shorter procedures compared with the standard procedure, we used an rm-ANOVA based on p values with the within-subject factors “Method” (SSP_7/SSP_5/BAP) and “Test day” (T1/T2) and the between-subject factor “Sex” (male/female).
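The difference between absolute agreement and a plain correlation can be illustrated with a small sketch of the single-measure, absolute-agreement ICC, computed from two-way ANOVA mean squares. Whether single or average measures were used is not stated above, so the single-measure form is an assumption. The function name `icc_a1` and the toy scores are invented for illustration.

```python
def icc_a1(scores):
    """Single-measure, absolute-agreement ICC from a two-way layout.

    `scores` is a list of rows (subjects), each holding one value per
    measurement occasion (columns).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between subjects
    ms_cols = ss_cols / (k - 1)                # between occasions
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy data (invented): day 2 is exactly one point higher than day 1.
shifted = [[1, 2], [2, 3], [3, 4]]
identical = [[1, 1], [2, 2], [3, 3]]
print(icc_a1(shifted))    # below 1 despite a perfect linear relation
print(icc_a1(identical))  # perfect agreement
```

In the shifted example Pearson's r equals 1, but the absolute-agreement ICC is penalized by the systematic offset between days. The two coefficients therefore answer different questions, which is one reason to report both.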
To determine whether the stability of the administration time differs between methods, we compared the interindividual variation in test duration across the three methods. We first computed the coefficient of variation (CV) for each method, then adjusted the CVs for the mean of each method, and finally performed a one-way ANOVA with the dependent variable “adjusted CV.”