Distinct smell and taste disorder phenotype of post-acute COVID-19 sequelae

Purpose Olfactory dysfunction (OD) commonly accompanies coronavirus disease 2019 (COVID-19). We investigated the kinetics of OD resolution following SARS-CoV-2 infection (wild-type and alpha variant) and its impact on quality of life, physical and mental health. Methods OD prevalence was assessed in an ambulatory COVID-19 survey (n = 906, ≥ 90 days follow-up) and an observational cohort of ambulatory and hospitalized individuals (n = 108, 360 days follow-up). Co-occurrence of OD with other symptoms and effects on quality of life, physical and mental health were analyzed by multi-dimensional scaling, association rule mining and semi-supervised clustering. Results Both in the ambulatory COVID-19 survey study (72%) and the observational ambulatory and hospitalized cohort (41%) self-reported OD was frequent during acute COVID-19. Recovery from self-reported OD was slow (survey: median 28 days, observational cohort: 90 days). By clustering of the survey data, we identified a predominantly young, female, comorbidity-free group of convalescents with persistent OD and taste disorders (median recovery: 90 days) but low frequency of post-acute fatigue, respiratory or neurocognitive symptoms. This smell and taste disorder cluster was characterized by a high rating of physical performance, mental health, and quality of life as compared with convalescents affected by prolonged fatigue or neurocognitive complaints. Conclusion Our results underline the heterogeneity of post-acute COVID-19 sequelae calling for tailored management strategies. The persistent smell and taste disorder phenotype is characterized by good clinical, physical, and mental recovery and may pose a minor challenge for public health. Study registration ClinicalTrials.gov: NCT04661462 (survey study), NCT04416100 (observational cohort). Supplementary Information The online version contains supplementary material available at 10.1007/s00405-023-08163-x.


Rating of physical recovery, mental health and quality of life in the survey study
Self-perceived complete recovery, rehabilitation need and new medication since COVID-19 at the time of study participation were surveyed as single yes/no items.Percentage of physical performance loss as compared with the time before COVID-19 was rated with a 0 -100% scale [3,4].Quality of life impairment (QoL) and overall mental health impairment (OMH) were rated with a 4 item Likert scale each (possible answers: "excellent," "good," "fair," "poor," scored: 0, 1, 2, and 3) [3,4].Anxiety (ANX) and depression (DPR) were assessed with the Patient Health Questionnaire (PHQ-4) [3][4][5].Mental stress was scored with a modified 7 item PHQ stress module as described before [3,4,6].

Rating of olfactory dysfunction with Sniffin' Stick Test
Objective olfactory dysfunction at the 100-day and 360-day follow-up in the CovILD study participants was investigated with the 16-item Sniffn' Stick Identification Test as described [7][8][9][10].In brief, the nasal chemosensory performance was investigated using pen-like odor-dispensing devices for odor identification of 16 common odorants (multiple forcedchoice from four verbal items per test odorant).Clinically relevant olfactory dysfunction was defined as < 13 correct answers (points) [7,11].In the analysis, participants with the complete answers concerning self-reported olfactory dysfunction and complete test results were included.

Software
Analysis was done with R version 4.2.3 (The R Project for Statistical Computing).
Analysis results were visualized with ggplot [29] (bar, scatter and bubble plots, heat maps of confusion matrices), ExDA (ellipse plots, radar, ribbon, stack and violin plots), kinet (kinetic of symptom resolution), plotROC [30] (ROC plots), clustTools (multi-dimensional scaling projection/scatter plots, heat maps of clustering features, within-cluster sum of squares).HTML elements within the plots were handled with tools provided by the package ggtext [31].[32].Tables were rendered with the flextable package [33].Supplementary Material and parts of the manuscript were written in the rmarkdown environment [34] and were rendered as Word documents with the knitr [35] and bookdown [36] packages.Management of figures and tables within the rmarkdown documents was accomplished with the development package figur.

Data import and formatting
Data of the survey study [3,4] were imported from SPSS files with the foreign package (function read.spss()).The CovILD study data set [37] was imported from an Excel file with the function read_xlsx() (package readxl).Formatting of the raw data sets was done with in-house developed scripts available from the project's GitHub repository.

Estimation of minimal sample size for clustering analysis
To assess the minimal sample size of the survey study required for reliable clustering analysis, random subsets of the pooled Austria and Italy cohort COVID-19 symptom data set with differing observation numbers (50, 100, 200, 300, 400, 500, 600, 700, 800, 900 observations, 20 random subsets per sample size) were generated and their clustering tendency was assessed with Hopkins statistic (H).Possible H values span from 0 to 1, where H = 0 indicates an ideal uniform distribution and H = 1 suggests a highly clustered data.Beginning from the sample size of n = 400, no improvement of the clustering tendency (plateau) could be observed.This suggest n = 400 as minimal sample size required for reproducible clustering analysis results (Supplementary Figure S2).Given the size of the Austria and Italy cohorts (n = 479 and n = 427, respectively) and their H values being very close to the H plateau value in the random subset analysis (Austria: H = 0.8, Italy: H = 0.79), we inferred that the size of single study cohorts was sufficient for reliable and reproducible clustering analysis and hence abstained from pooling the survey study cohorts.

Descriptive statistic and effect size
If not indicated otherwise, numeric values are presented in the manuscript text and tables as medians with interquartile ranges (IQR) and ranges.Qualitative variables are presented as percentages and counts of their categories within the complete observation set.Descriptive statistic values were computed with the function explore() from the ExDA package.

Statistical hypothesis testing and inter-rater reliability
Since multiple study variables were non-normally distributed as assessed by Shapiro-Wilk test and visual assessment of their distribution (quantile -quantile plots), statistical significance for differences in outcome numeric variables were assessed with Mann-Whitney U test with r effect size statistic (two groups, independent data), paired Wilcoxon test with r effect size statistic (two groups, paired data) or Kruskal-Wallis test with η P values were adjusted for multiple testing with Benjamini-Hochberg method [42] separately for each analysis task and cohort.Effects with p < 0.05 were considered significant.

Symptom-symptom distances and multi-dimensional scaling
To assess co-occurrence or exclusivity of symptoms, simple matching distances between manifestations during the first 14 days, at 28 days and at 3 months after clinical onset in the survey study cohorts were calculated (function calculate_dist(), package clustTools employing tools from packages scrime and philentropy) [24,25,43].Subsequently, the distance matrix was subjected to multi-dimensional scaling (MDS, k = 2 dimensions, package stats, function cmdscale()).Association of specific symptoms was assessed by visual analysis of MDS coordinate plots.

Apriori analysis of COVID-19 symptoms in the survey study
Frequent combinations of symptoms during the first 14 days, at 28 days and at 3 months after clinical onset in the survey study cohorts were identified with the apriori algorithm (function apriori(), package arules) [22,44] with the minimal support cutoff of 0.15, 2 -10 item transaction length, confidence > 0.6 and lift > 2. The support statistic were used to estimate the symptom combination frequency.The confidence value was treated as an estimate of conditional probability of the symptom co-occurrence.The lift statistic was interpreted as a measure of the symptom dependence (lift = 1, symptoms are independent).Frequency of symptom combination in the survey study cohorts and percentage of co-occurrence of the symptoms within the given symptom combination were displayed in bobble plots.

Clustering analysis
COVID-19 recovery clusters of the training Austria survey cohort participants in respect to symptom-specific recovery times (Figure 1A) were defined with the PAM (partitioning around medoids) algorithm and Euclidean distance statistic [23,24].The set of participants with the complete clustering variable set (COVID-19 symptom recovery times) was included in the analysis.The symptom recovery times were not subjected to any type of pre-processing.The clustering objects were generated with the function kcluster(clust_fun = 'pam') from the package clustTools.
The choice of the clustering procedure was motivated by the analysis of the fraction of explained clustering variance (ratio of the total between-cluster to total sum of squares) and clustering structure stability in 10-fold cross-validation (metric: rate of correct cluster assignment, cluster assignment predicted by inverse distance-weighted 7-nearest neighbors label propagation algorithm) [45,46] for several clustering algorithms as presented in Supplementary Figure S14A.The fractions of explained clustering variance and cross-validated cluster assignment accuracy were calculated with the methods var() and cv() from the package clustTools.
The optimal number of clusters was determined by the bend of the total within-cluster sum of squares curve (Supplementary Figure S14B, method plot(), package clustTools, employing the genuine factoextra algorithm) [26].Permutation importance of specific clustering variables was investigated by calculating difference in clustering variance (ratio of total between-cluster sum of squares to total sum of squares) between the initial clustering object and the clustering object with the given variable reshuffled at random (function impact(), package clustTools).The permutation importance statistics were computed for 20 random permutations of each clustering variable.
Assignment of the Italy survey cohort participants to the recovery clusters was accomplished with the inverse distance-weighted 7-nearest neighbors label propagation classifier [46].Supplementary Table S7 (b) Comparison of frequencies of objective OD expressed as paired proportions was done with McNemar test with Cohen's q effect size statistic.Percentages of participants with objective OD at the three-month and one-year follow-up are displayed in an alluvial plot.The odds ratio of objective OD (one-year vs three months), effect size, p value and the number of participants are displayed in the plot caption.total between-cluster sum of squares to total sum of squares) and cluster assignment accuracy in 10-fold cross-validation (CV).
(b) Determination of the optimal cluster number in the PAM clustering of the training cohort by the bend of the total within-cluster sum of squares curve.
(c) Permutation importance of the clustering features (symptoms) for clustering of the training cohort expressed as the difference in clustering variance (ratio of total betweencluster sum of squares to total sum of squares) between the initial clustering object and the clustering object with the given variable reshuffled at random.Importance metrics were computed for 20 random permutations of each clustering factor.Median importance metrics with interquartile ranges (IQR) are visualized as boxes, whiskers span over 150% IQR.OD: self-reported olfactory dysfunction; Dim.appetite: diminished appetite; Imp.concentration: impaired concentration; Imp.walk: impaired walk; Imp.FMS: impaired fine motor skills.

Table S3 :
The clustering efficacy in the training Austria cohort and the test Italy cohort measured by clustering variance statistic defined above was similar (Austria: 0.59, Italy: 0.56), which indicate good reproducibility of the clustering structure developed in the training Austria cohort.Results of statistical hypothesis testing for significant recovery of the most frequent COVID-19 symptoms in the Austria (AT) and Italy cohort (IT) of the survey study.
a OD: self-reported olfactory dysfunction; Dim.appetite: diminished appetite.bCochran's Q test with Kendall's W effect size statistic.P values corrected for multiple testing with Benjamini-Hochberg method.Supplementary

Table S4 :
Results of statistical hypothesis testing for significant recovery of COVID-19 symptoms in COVID-19 severity strata of the CovILD study.

Table S5 :
Results of the Sniffin' Stick Test in the CovILD study subset with the complete longitudinal follow-up data.Numeric variables are presented as medians with interquartile ranges (IQR) and ranges.Categorical variables are presented as percentages and counts within the complete observation set.
b Categorical variables: McNemar test with Cohen's g effect size statistic.Numeric variables: paired Wilcoxon test with r effect size statistic.P values corrected for multiple testing with Benjamini-Hochberg method.Supplementary

Table S6 :
Demographic and baseline clinical characteristic at the COVID-19 onset of the survey study participants assigned to the recovery clusters, Austria (AT) cohort.Numeric variables are presented as medians with interquartile ranges (IQR) and ranges.Categorical variables are presented as percentages and counts within the complete observation set.

The largest significant differences in demographic, clinical and recovery variables between the Austria and Italy cohorts of the survey study.
: Demographic and baseline clinical characteristic at the COVID-19 onset of the survey study participants assigned to the recovery clusters, Italy (IT) cohort.Numeric variables are presented as medians with interquartile ranges (IQR) and ranges.Categorical variables are presented as percentages and counts within the complete observation set.Categorical variables: χ² test with Cramer V effect size statistic.Numeric variables: Kruskal-Wallis test with η² effect size statistic.P values corrected form multiple testing with Benjamini-Hochberg method.Categorical variables: χ² test with Cramer V effect size statistic.Numeric variables: Kruskal-Wallis test with η² effect size statistic.P values corrected form multiple testing with Benjamini-Hochberg method.Differences in numeric variables between the Austria (AT) and Italy (IT) survey study cohorts were assessed byMann-Whitney test with r effect size statistic.Differences in categorical variables were investigated by χ 2 test with Cramer's V effect size statistic.P values were corrected for multiple testing with Benjamini-Hochberg method.Significant numeric variables are presented in violin plots with medians and interquartile ranges depicted as diamonds and whiskers and single observations visualized as points.Percentages of categories of qualitative variables within the AT and IT cohorts are displayed as stack plots.Effect sizes and p values are presented in the plot captions.Numbers of complete observations are indicated in the plot axes.Patient Health Questionnaire, PHQ-4; ANX score: anxiety score, Patient Health Questionnaire, PHQ-4; QoL impairment score: score of impairment of quality of life.OD: self-reported olfactory dysfunction; Dim.appetite: diminished appetite; Imp.concentration: impaired concentration; Imp.walk: impaired walk; Imp.FMS: impaired fine motor skills.(a) Comparison of numeric Sniffin' Stick Test results was done with paired Wilcoxon test with r effect size statistic.Results are presented as a before -after plot (left) and box plot (right).Medians with interquartile ranges (IQR) are visualized as boxes, whiskers span over 150% IQR.Single observations are depicted as points.Observations obtained for the same participant are connected with a line.Effect sizes, p values are displayed in the plot caption.Numbers of complete observations are indicated i the X axis.