Background

Traumatic brain injury (TBI) is a condition characterized by changes in brain functioning caused by external head trauma [1]. It imposes life-long limitations [2] and leads to a range of physical, emotional, and cognitive disabilities, impacting functioning of affected individuals [1]. The burden of TBI extends to family caregivers [3], as well as health and economic systems [2].

Individuals after TBI can especially suffer from a range of post-concussion symptoms (PCS), which may persist much longer than initially expected [4]. These symptoms encompass physical (e.g., headaches or nausea), cognitive (e.g., a diminished ability to concentrate), and emotional/behavioral (e.g., depressiveness or fatigue) impairments [5]. PCS are commonly reported after mild to moderate TBI [6], but individuals following severe TBI also experience similar deficits [7, 8], referred to as PC-like symptoms.

To assess PCS, researchers and clinicians often rely on the subjective experiences of those affected using patient-reported outcome measures (PROMs), such as the Rivermead Post-Concussion Symptoms Questionnaire (RPQ) [6]. The RPQ is widely used to assess self-reported PCS. For the Collaborative European NeuroTrauma Effectiveness Research study (CENTER-TBI; clinicaltrials.gov NCT02210221), which was designed to examine treatment and outcomes of individuals after TBI in 18 countries [9], translations and linguistic validations were performed for the RPQ, resulting in eleven additional versions [10].

In multilingual studies, the equivalence of translated PROMs, in terms of their conceptual alignment with the original version and other translations, cultural relevance, acceptability to the target populations, and psychometric comparability, is essential for language and country comparisons, as well as for data aggregation [11]. Measurement invariance (MI) analysis is a valuable tool to determine whether the translations of an instrument measure the same construct [12]. In particular, MI analysis investigates whether differences in observed variables across language versions can be attributed solely to differences in latent means.

Therefore, the main aim of the present study is to provide empirical evidence of MI for the RPQ in six European languages. The RPQ has been declared a unidimensional PROM, but this property could not be replicated across translations, including English-speaking samples (e.g., [13]). Thus, the factorial structure of the RPQ is examined to find the best-fitting model prior to MI analyses. In addition, the study seeks to explore whether the same construct is measured across the spectrum of TBI severity.

Once the assumption of MI is met, differences in RPQ scores between language samples will be due to true differences in self-reported PCS and not to differences in translation, allowing for data aggregation and direct comparisons.

Methods

Study design and participants

Data were collected from December 2014 to December 2019 within the CENTER-TBI study involving 63 centers in 18 countries in Europe and Israel. A total of 4,509 individuals after TBI were enrolled in the core study. Inclusion criteria for study participation were clinical diagnosis of TBI and indication for computed tomography (CT) scan, enrollment within 24 h after injury, and informed consent for study participation. Written informed consent was obtained according to the local and national legislations for all patients (either by the patients or the legal representatives) and documented in the electronic case report form. To avoid bias in outcome assessment, patients with severe pre-existing neurological disorders were excluded from the study. Individuals were either seen in the emergency room (ER) and then discharged, or admitted to the hospital ward or intensive care unit (ICU). A more detailed description of the study design is provided by Steyerberg et al. [14]. Data were retrieved from the CENTER-TBI database via the Neurobot tool (core data set 2.1, November 2019).

The following analyses were limited to individuals aged 16 years or older who completed the RPQ six months after TBI. Following the recommendation for MI analyses, we included language samples with at least 200 participants (N = 1,818). Additional analyses on TBI severity involved individuals with available information on the Glasgow Coma Scale (GCS) [15] score at baseline (N = 1,790). For the composition of the study sample, see Fig. 1.

Fig. 1 Sample attrition

Sample characteristics

Sociodemographic characteristics were collected at study enrollment and included sex, age, education (in groups and years), employment status, marital status, and living situation. Language samples were identified according to the languages spoken in the participating sites. For more details on language sample compositions in the CENTER-TBI study, see von Steinbuechel et al. [13].

The following variables were used to characterize premorbid and injury-related factors: mental health status before the injury (presence vs. absence of prior psychiatric diseases), cause of injury, clinical care pathways (ER, ward, ICU), and TBI severity measured using the Glasgow Coma Scale at baseline (GCS) [15] combined with information on abnormalities on the CT scan (uncomplicated mild, complicated mild, moderate, and severe TBI) [15, 16]. The functional recovery status at six months was rated using the Glasgow Outcome Scale – Extended (GOSE) [17]. The total injury severity score (ISS) and the brain injury severity score from the Abbreviated Injury Scale (AIS) [18] assessed total injury severity and brain injury severity, respectively.

The Rivermead Post-Concussion Symptoms Questionnaire

The Rivermead Post-Concussion Symptoms Questionnaire (RPQ) [6] assesses 16 symptoms including headaches, dizziness, nausea and/or vomiting, noise sensitivity, sleep disturbance, fatigue, irritability, depression, frustration, forgetfulness and poor memory, poor concentration, slow thinking, blurred vision, light sensitivity, double vision, and restlessness. Individuals are asked to rate the symptoms over the last 24 h compared with their condition before the TBI using a five-point Likert-type scale (from 0 “not experienced at all” to 4 “a severe problem”). Based on the originally proposed unidimensional factor structure, the total score ranges from 0 to 64 with higher values indicating greater impairment, whereby values of “1” indicating no more of a problem than before TBI are treated as zero. For clinical screening, a cut-off score of 12 can be applied [19]. The factorial structure of the RPQ has so far been the subject of several studies [19,20,21,22,23,24] and no agreement on an ultimate solution has yet been reached. Initial analyses of the unidimensional structure using data collected in the CENTER-TBI study also revealed rather poor model fit across language samples [13].
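
The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring code used in the study: the function names are hypothetical, and reading the cut-off as "total score of 12 or more" is an assumption, since the text only states that a cut-off score of 12 can be applied.

```python
from typing import Sequence

N_ITEMS = 16   # number of RPQ symptoms
CUTOFF = 12    # clinical screening cut-off mentioned in the text


def rpq_total(responses: Sequence[int]) -> int:
    """Total score (0-64) with "1" (no more of a problem than before TBI) counted as 0."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("responses must be integers from 0 to 4")
    return sum(0 if r == 1 else r for r in responses)


def screens_positive(responses: Sequence[int], cutoff: int = CUTOFF) -> bool:
    """Assumption: the cut-off is interpreted as total >= cutoff."""
    return rpq_total(responses) >= cutoff


if __name__ == "__main__":
    # Hypothetical respondent: six mild problems, ten symptoms unchanged since before TBI.
    example = [2] * 6 + [1] * 10
    print(rpq_total(example), screens_positive(example))
```

Because "1" responses are recoded to zero before summing, a respondent who marks every symptom as unchanged receives a total score of 0.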

Statistical analyses

Descriptive statistics and language-samples comparisons

Prior to statistical analyses, language samples were compared by sample characteristics (esp. injury-related factors) using Kruskal–Wallis tests and pairwise U-tests to ensure their general comparability. For the pairwise U-tests, Vargha and Delaney’s effect size [25] was calculated and interpreted using the following cut-offs: group equality (0.50), small (0.35–0.44 or 0.56–0.63), medium (0.30–0.34 or 0.64–0.70), and large effect (below 0.29 or above 0.71). The distribution of the TBI severity groups in the language samples was investigated by applying a two-dimensional chi-squared test and computing Cramér’s V to determine the effect size. For this purpose, we first used the initial distribution containing four TBI severity groups (i.e., uncomplicated mild, complicated mild, moderate, and severe TBI), and then a collapsed classification (i.e., mild/moderate and severe TBI). The effect size was determined using Cohen’s taxonomy [26] with values of 0.10, 0.30, and 0.50 representing small, medium, and large effects, respectively. Furthermore, the distribution of PCS in language samples was investigated.
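
For readers unfamiliar with the two effect sizes, the following Python sketch shows how they can be computed and labelled. It is an illustration only (the study used R). The function names are hypothetical, and values falling between the quoted ranges (e.g., 0.345) are assigned to the weaker category, which is an assumption because the text does not define those gaps.

```python
import math
from typing import List, Sequence


def vargha_delaney_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Probability that a value from x exceeds a value from y (ties count half)."""
    greater = sum(1 for a in x for b in y if a > b)
    ties = sum(1 for a in x for b in y if a == b)
    return (greater + 0.5 * ties) / (len(x) * len(y))


def classify_a(a: float) -> str:
    """Label A with the cut-offs quoted in the text (gap handling is an assumption)."""
    if a >= 0.71 or a <= 0.29:
        return "large"
    if a >= 0.64 or a <= 0.34:
        return "medium"
    if a >= 0.56 or a <= 0.44:
        return "small"
    return "negligible"


def cramers_v(table: List[List[float]]) -> float:
    """Cramér's V for an r x c contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row_total, row in zip(row_totals, table):
        for col_total, observed in zip(col_totals, row):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


def classify_v(v: float) -> str:
    """Label V with the thresholds 0.10, 0.30, and 0.50 quoted in the text."""
    if v >= 0.50:
        return "large"
    if v >= 0.30:
        return "medium"
    if v >= 0.10:
        return "small"
    return "negligible"
```

The pairwise loop in `vargha_delaney_a` is quadratic in the sample sizes; it is written for clarity rather than speed.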

Analyses of dimensionality

First, we analyzed the response pattern of the RPQ items within the language samples. The factorial structure of the RPQ translations was then examined separately for each language version using confirmatory factor analysis (CFA) with the robust weighted least squares estimator (WLSMV) [27] for ordered categorical data. In the absence of agreement on the factorial structure of the RPQ, six competing models derived from previous research were estimated: the initial one-factor model [6], two two-factor models [19, 20], two three-factor models [21, 22], the first of which [21] was based on research findings presented by Gerber and Schraa [28], and one four-factor model [23], corresponding to the study findings by Lannsjö and colleagues [24]. For more details on RPQ models estimated in the present study, see Table 1.

Table 1 Factorial structures of the RPQ investigated in the present study

The fit of all estimated models was assessed by several goodness of fit indices: \(\chi^{2}\) and degrees of freedom (df), as well as the ratio \(\chi^{2}/df\), the comparative fit index (CFI) [29], the Tucker-Lewis index (TLI) [30], the root mean square error of approximation (RMSEA) [31] including the 90% confidence interval (CI90%), and the standardized root mean square residual (SRMR). A ratio \(\chi^{2}/df\) ≤ 2 indicates good model fit [32], CFI and TLI values larger than 0.95 indicate a good fit [33], RMSEA values less than 0.05 signal a close fit, values from 0.05 to 0.08 a fair fit, values between 0.08 and 0.10 a mediocre fit, and values above 0.10 a poor fit [34]; the same criteria apply to the CIs. SRMR values less than 0.08 demonstrate a good model fit [33]. Since the cut-off values for the CFI, TLI, and RMSEA have not yet been validated for ordinal data, interpretation should be carried out with caution [35]. Therefore, all fit measures were considered simultaneously to identify the best fitting model.
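
The cut-offs listed above can be collected in a small helper that labels a set of fit indices. The Python sketch below is illustrative only; the function names and the returned dictionary layout are hypothetical, and it simply encodes the thresholds as stated in the text rather than reproducing any step of the authors' workflow.

```python
from typing import Dict, Union


def rmsea_label(value: float) -> str:
    """Label an RMSEA value (or a CI bound) using the ranges quoted in the text."""
    if value < 0.05:
        return "close"
    if value <= 0.08:
        return "fair"
    if value <= 0.10:
        return "mediocre"
    return "poor"


def assess_fit(chi2: float, df: int, cfi: float, tli: float,
               rmsea: float, srmr: float) -> Dict[str, Union[bool, str]]:
    """Apply each cut-off separately; the text stresses judging all indices jointly."""
    return {
        "chi2_df_ratio_ok": chi2 / df <= 2,   # ratio <= 2 indicates good fit
        "cfi_ok": cfi > 0.95,
        "tli_ok": tli > 0.95,
        "rmsea": rmsea_label(rmsea),
        "srmr_ok": srmr < 0.08,
    }


if __name__ == "__main__":
    # Hypothetical fit indices, chosen only to exercise every branch of the helper.
    print(assess_fit(chi2=150.0, df=101, cfi=0.97, tli=0.96, rmsea=0.045, srmr=0.06))
```

Returning the individual flags instead of a single verdict mirrors the recommendation to consider all fit measures simultaneously.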

For the CFA, we first used the raw data to obtain the model fit and then the data with modified items due to missing responses in some categories. Thus, responses 3 “moderate problem” or 4 “severe problem” of the items Nausea and Double vision were collapsed with the category 2 “mild problem”, implying a trichotomized response format when considering 1 “no more of a problem than before TBI” as “1” and a dichotomized scale when treating “1” as “0”. For all other items, the original responses were kept to maximally retain information.
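
The recoding described above can be illustrated as follows. This is a minimal Python sketch under stated assumptions: the item keys `nausea` and `double_vision` and both function names are hypothetical, and the study's own data preparation was done in R.

```python
from typing import Dict, Iterable

# Hypothetical item keys for the two sparse items named in the text.
SPARSE_ITEMS = ("nausea", "double_vision")


def collapse_item(response: int, treat_one_as_zero: bool = False) -> int:
    """Merge categories 3 and 4 into 2; optionally recode "1" to "0"."""
    if response not in (0, 1, 2, 3, 4):
        raise ValueError("response must be an integer from 0 to 4")
    collapsed = min(response, 2)          # 3 and 4 become 2
    if treat_one_as_zero and collapsed == 1:
        collapsed = 0                     # yields a dichotomized item (0 or 2)
    return collapsed


def recode_record(record: Dict[str, int],
                  sparse_items: Iterable[str] = SPARSE_ITEMS,
                  treat_one_as_zero: bool = False) -> Dict[str, int]:
    """Collapse only the sparse items; all other items keep their categories."""
    sparse = set(sparse_items)
    recoded = {}
    for item, value in record.items():
        if item in sparse:
            recoded[item] = collapse_item(value, treat_one_as_zero)
        elif treat_one_as_zero and value == 1:
            recoded[item] = 0
        else:
            recoded[item] = value
    return recoded
```

With `treat_one_as_zero=False` the two sparse items take the values 0, 1, and 2 (trichotomized); with `treat_one_as_zero=True` they take only 0 and 2 (dichotomized), while the remaining items keep their upper categories.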

Measurement invariance (MI)

The best-fitting factor solution served as the basis for MI analyses across languages and for TBI severity MI analyses. Because the absence of responses in some categories would not allow for invariance testing across groups, only modified items were considered. We conducted a multi-group CFA with stepwise increasing constraints following the framework proposed by Wu and Estabrook [36] and updated by Svetina et al. [37] to make it more suitable for Likert-type scales. This approach slightly differs from the classical MI procedure. First, the baseline model testing for configural invariance was fitted. Then, this model was constrained by requiring invariance first of thresholds and then of both thresholds and loadings across the groups. The models were compared stepwise by calculating the chi-square difference test and changes in CFI (ΔCFI) and RMSEA (ΔRMSEA). Models showing non-significant differences (p ≥ 0.05), ΔCFI < 0.01 [38], and ΔRMSEA ≤ 0.01, which is recommended for groups with unequal sample sizes [39], were considered equivalent. If the difference tests between the models were not significant, the assumption of MI was considered justified. Once the MI assumption was fulfilled, a multi-group CFA approach was again used to examine the differences between mild and moderate/severe TBI across all RPQ translations. Finally, the best-fitting model was estimated and visualized for the total study sample.
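
The stepwise comparison rule can be written down compactly. The Python sketch below only encodes the decision criteria quoted above; it does not fit any model or compute the difference test, whose p-value is taken as an input. The class and function names are hypothetical, and treating ΔCFI and ΔRMSEA as absolute changes between consecutive models is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ModelFit:
    name: str
    cfi: float
    rmsea: float
    # p-value of the chi-square difference test against the previous,
    # less constrained model (None for the configural baseline).
    p_diff: Optional[float] = None


def step_is_equivalent(previous: ModelFit, current: ModelFit,
                       alpha: float = 0.05) -> bool:
    """Equivalence requires p >= alpha, delta CFI < 0.01, and delta RMSEA <= 0.01."""
    if current.p_diff is None:
        raise ValueError("the constrained model needs a difference-test p-value")
    delta_cfi = abs(current.cfi - previous.cfi)        # assumption: absolute change
    delta_rmsea = abs(current.rmsea - previous.rmsea)  # assumption: absolute change
    return current.p_diff >= alpha and delta_cfi < 0.01 and delta_rmsea <= 0.01


def evaluate_sequence(models: List[ModelFit]) -> List[Tuple[str, bool]]:
    """Compare each model with its predecessor, from configural onwards."""
    return [(cur.name, step_is_equivalent(prev, cur))
            for prev, cur in zip(models, models[1:])]


if __name__ == "__main__":
    # Hypothetical values for the three nested models.
    steps = [
        ModelFit("configural", cfi=0.990, rmsea=0.050),
        ModelFit("thresholds", cfi=0.989, rmsea=0.051, p_diff=0.20),
        ModelFit("thresholds+loadings", cfi=0.988, rmsea=0.052, p_diff=0.35),
    ]
    print(evaluate_sequence(steps))
```

Keeping the three criteria in one function makes explicit that a step counts as equivalent only when all of them hold at the same time.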

According to the original scoring [6], “1” responses indicating that a symptom posed no more of a problem than before TBI should be treated as “0”. Because participants explicitly used a five-point Likert-type scale when completing the questionnaire, “1” responses were considered in both CFA and MI analyses. However, we additionally replicated the analyses using the simplified response scale (i.e., treating 1 as 0) to achieve greater congruence with the original scoring procedure. These results are reported in the Additional file 1.

All analyses were carried out with R version 4.0.0 [40] and the packages "table1" [41] for descriptive analyses, "lavaan" [42] for the CFA and MI testing, and "lavaanPlot" for model visualization [43]. The significance level was set at 5% except for pairwise comparisons, for which the Bonferroni correction was performed to avoid alpha inflation \(\left( \alpha_{\text{adj}} = \frac{0.05}{6} \approx 0.008 \right)\).

Results

Sample characteristics

The total sample comprised 1,818 participants (65.4% male) with a mean age of 49.5 ± 19.5 years (Mdn = 51, range 16–95) who completed the RPQ at six months after injury. Most individuals sustained a mild TBI (73.1%). The cause of injury was most commonly either a road traffic accident (41.7%) or incidental fall (43.7%). At six months after TBI, more than half of the individuals showed good recovery (GOSE: 7–8) across all language samples. For more details, see Table 2 and Additional file 1: Table S1 – Additional characteristics of the language samples.

Table 2 Characteristics of the language samples

Some significant differences were observed between the language samples regarding age, years of education, ISS, GCS at baseline, and GOSE at six months. Dutch participants were significantly older (52.9 ± 19.1) compared to all but the Italian sample. Finnish (13.3 ± 3.16) and Italian (12.4 ± 4.37) participants had slightly, but statistically significantly, fewer years of education compared to the others. The Italian sample had a lower GCS (11.93 ± 4.13) compared to the Dutch, Finnish, and Norwegian samples. At six months after TBI, Finnish participants recovered slightly better (Mdn = 8; range 3–8) and had less severe extracranial injuries (ISS: 13.3 ± 9.94) compared to individuals from the Netherlands, the UK, Italy, and Spain (ISS only). However, the effects were small according to the predefined cut-offs (i.e., 0.35–0.44 or 0.56–0.63). The distributions of both four (i.e., uncomplicated mild, complicated mild, moderate, and severe) and two (i.e., mild/moderate and severe) TBI severity groups differed between language samples (p < 0.001). The effect sizes represented a small effect (V = 0.13 and V = 0.17, respectively). There was no significant difference in the RPQ total score across the samples. For details, see Additional file 1: Table S2 – Comparisons of language samples regarding sociodemographic and injury-related factors.

Distribution of PCS across language samples and TBI severity groups

The distribution of PCS was similar across all language samples. Fatigue was the most frequently reported symptom at six months after TBI with 37% (Spanish sample) to 56% (English sample), followed by Forgetfulness with 36% (Finnish sample) to 46% (English sample), and Poor concentration with 31% (Spanish sample) to 40% (Italian sample). Individuals from the English sample tended to report more intense PCS (8 out of 16 symptoms were rated as at least a mild problem) compared to participants from other language samples (see Table 3, left part: Language samples).

Table 3 Proportion of PCS rated as at least mild

Similar patterns were observed when examining TBI severity groups. The items Fatigue (44% and 63%), Forgetfulness (38% and 57%), and Poor concentration (34% and 44%) were the most frequently reported symptoms in both the mild/moderate and severe TBI groups, respectively. More than one-third of individuals after severe TBI also rated the following symptoms as at least mild: prolonged thinking (42%), being frustrated (39%), irritable (36%), and depressed (35%) (see Table 3, right part: TBI severity). For visualization, see Additional file 2: Figure S1 – Distribution of the PCS ratings in (A) each language sample and (B) for the TBI severity groups.

Analyses of response pattern

The analysis of response patterns per language sample revealed an unequal distribution of the response categories across all samples. In particular, the higher categories (i.e., 3 “a moderate problem” and 4 “a severe problem”) showed a low endorsement rate in some items. One item (Nausea) was not rated as a severe problem in the Finnish, Italian, and Norwegian samples. The frequencies of endorsements for this item in the Dutch, English, and Spanish samples were also sparse: 1 (0.4%), 2 (0.9%), and 5 (0.8%), respectively. The endorsement of the category “a moderate problem” varied from 0% (English sample) to 2.2% (Dutch sample). In addition, the item Double vision was rated a maximum of 2 (“a mild problem”) in the Norwegian sample, resulting in no endorsement in two response categories (3 “a moderate problem” and 4 “a severe problem”). The highest endorsement rate for the category 3 “a moderate problem” or 4 “a severe problem” was observed in the Italian sample (5.2%). For more details, see Additional file 1: Table S3 – Analyses of response patterns by language sample.

Confirmatory factor analyses (CFA)

The CFA revealed that a four-factor structure [23] comprising vertigo, mood/somatic, cognitive, and vision symptoms fitted the data best across the languages, closely followed by the three-factor structure [21] including somatic, emotional, and cognitive symptoms (see Table 4). The estimation of the two-factor model comprising emotional-somatic and cognitive domains [19] did not converge properly, as the covariance matrix of the latent variables was not positive definite. Therefore, the goodness of fit indices of this model should be interpreted with caution. Correlations between latent factors were high across all models and languages (i.e., standardized coefficients exceeded 0.65; see Additional file 1: Table S4 – Correlations between latent variables (raw data)). When using trichotomized responses of the items Nausea and Double Vision, the models revealed comparable fit across the languages (i.e., difference observed on the third decimal place; see Additional file 1: Table S5 – CFA results for competitive factorial structure analyses of the RPQ across the language samples (trichotomized items Nausea and Double Vision) for the model fit indices and Table S6 – Correlations between latent variables (trichotomized items Nausea and Double Vision) for correlations between latent factors).
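
To make the notion of a not positive definite latent covariance matrix concrete, the following Python sketch tests a symmetric matrix by attempting a Cholesky factorization. It is a generic illustration, not the check performed by the software used in the study; the function name is hypothetical and the example matrices are made up for demonstration.

```python
import math
from typing import List


def is_positive_definite(matrix: List[List[float]]) -> bool:
    """Return True if a symmetric matrix admits a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0:
                    # A non-positive pivot means the matrix is not positive definite.
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


if __name__ == "__main__":
    # Illustrative correlation matrices (values chosen for demonstration only).
    admissible = [[1.0, 0.85], [0.85, 1.0]]
    inadmissible = [[1.0, 1.02], [1.02, 1.0]]   # "correlation" above 1
    print(is_positive_definite(admissible), is_positive_definite(inadmissible))
```

A standardized covariance above 1 between two latent factors, as in the second example, is one way such a matrix can fail this check, which typically signals that the two factors cannot be distinguished in the data.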

Table 4 CFA results for competitive factorial structure analyses of the RPQ across the language samples (raw data)

When “1” responses were treated as zero, some fit indices indicated slightly better model fit across all estimated factorial solutions and languages. However, the item measuring Nausea, which was dichotomized because of missing responses in the higher response categories, showed high correlations (approx. r = 1.00) with items Dizziness, Feeling Frustrated, Poor Concentration, Taking Longer to Think, and Blurred Vision. Furthermore, two model estimations resulted in not positive definite covariance matrices: the three-factor model (general somatic, mood/cognition, and visual somatic symptoms) [22], and the initially favored four-factor model (vertigo, mood/somatic, cognitive, and vision) [23]. For details, see Additional file 1: Table S7 – CFA results for competitive factorial structure analyses of the RPQ across the language samples (considering “1” responses as “0” and using dichotomized items Nausea and Double Vision) for the model fit indices and Table S8 – Correlations between latent variables (considering “1” responses as “0” and using dichotomized items Nausea and Double Vision) for correlations between latent factors.

Overall, the three-factor model comprising somatic, emotional, and cognitive factors [21] performed best across all competing factorial solutions in all language samples, regardless of whether the “1” responses were treated as “1” or “0”. Therefore, this factorial solution was chosen as a baseline model for the MI analyses.

Measurement invariance (MI)

The cross-linguistic MI analyses revealed satisfactory results (see Table 5, upper part). Except for the χ2 p-values, no fit indices exceeded the predefined cut-off values in the baseline model or in the models with increased constraints (i.e., the thresholds model and the thresholds and loadings model). Model comparisons were not significant. Taken together, the three-factor model did not show any violation of measurement equivalence between languages. When treating “1” responses as zero, model fit slightly increased (see Additional file 1: Table S9 – Results of MI analyses between language samples and TBI severity groups and model comparison for the three-factor model comprising somatic, emotional, and cognitive factors considering “1” responses as “0”; upper part). Therefore, this model was considered suitable for measuring PCS using the six RPQ translations.

Table 5 Results of MI analyses between language samples and TBI severity groups and model comparison for the three-factor model comprising somatic, emotional, and cognitive factors using raw data [21]

Analyses of the TBI severity groups revealed no violation of the MI assumption, as reflected by non-significant differences between the models with different constraints (see Table 5, lower part). Here, again, an increase in model fit was observed when treating “1” as “0” (see Additional file 1: Table S9 – Results of MI analyses between language samples and TBI severity groups and model comparison for the three-factor model comprising somatic, emotional, and cognitive factors considering “1” responses as “0”; lower part). These findings also support the applicability of the three-factor solution for PCS assessment using the RPQ in both examined TBI severity groups.

Final model

Estimation of the final model comprising somatic, emotional, and cognitive factors using raw data of the total study sample revealed satisfactory results with χ2(101) = 647.04, χ2/df = 6.41, p < 0.001, CFI = 0.995, TLI = 0.994, RMSEA = 0.055, CI90%[0.051, 0.059], SRMR = 0.051. Except for the significant p-value and the χ2/df ratio > 2, which can be explained by the large sample size, all other fit indices showed excellent model fit. The correlation between latent factors was high (somatic–emotional: 0.85; somatic–cognitive: 0.81; emotional–cognitive: 0.81). For the model visualization, see Fig. 2. When treating “1” responses as “0”, the results indicated a better fit with χ2(101) = 377.78, χ2/df = 3.74, p < 0.001, CFI = 0.997, TLI = 0.996, RMSEA = 0.039, CI90%[0.035, 0.043], SRMR = 0.049. Again, latent factors were highly correlated (somatic–emotional: 0.86; somatic–cognitive: 0.83; emotional–cognitive: 0.81). For the model visualization, see Additional file 3: Figure S2 – Final model somatic (soma), emotional (emo), and cognitive (cog) factors for the total study sample when treating “1” responses as “0”.
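
As a plausibility check on the reported indices, the χ2/df ratio and the RMSEA point estimate can be recomputed from χ2, df, and the total sample size (N = 1,818). The Python sketch below uses the common single-group textbook formula for the RMSEA; whether the software applied exactly this formula to the robust test statistic is an assumption, and the function names are hypothetical.

```python
import math


def chi2_df_ratio(chi2: float, df: int) -> float:
    """Ratio of the chi-square statistic to its degrees of freedom."""
    return chi2 / df


def rmsea_point_estimate(chi2: float, df: int, n: int) -> float:
    """Single-group textbook RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


if __name__ == "__main__":
    n_total = 1818  # total study sample reported in the text
    for label, chi2 in (("raw data", 647.04), ('"1" treated as "0"', 377.78)):
        ratio = chi2_df_ratio(chi2, 101)
        rmsea = rmsea_point_estimate(chi2, 101, n_total)
        print(f"{label}: chi2/df = {ratio:.2f}, RMSEA = {rmsea:.3f}")
```

Under this formula the recomputed values agree with the reported χ2/df ratios and RMSEA point estimates to the printed precision, which supports the internal consistency of the reported figures.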

Fig. 2 Final model including somatic (soma), emotional (emo), and cognitive (cog) factors using raw data of the total study sample. The numbers depict standardized coefficients

Discussion

The present study aimed to examine the factorial validity and cross-linguistic comparability of the RPQ between six language samples. Additionally, measurement equivalence of the RPQ within TBI severity groups was investigated. The results suggest that a three-factor structure consisting of somatic, emotional, and cognitive symptom groups best captures PCS across languages. Moreover, the RPQ measures PCS equivalently across both the six language samples (i.e., Dutch, English, Finnish, Italian, Norwegian, and Spanish) and TBI severity groups (i.e., mild/moderate vs. severe). This enables national and international research on PCS and direct comparisons of outcomes across the analyzed languages within the full spectrum of TBI severity.

The RPQ has a relatively long history of attempts (2003–2018) to identify the best fitting factorial solution and thus best suitable scoring. To date, there is still no agreement as to which factorial structure would be more appropriate to assess PCS. Nevertheless, most researchers do agree on the multidimensionality of the RPQ [19,20,21,22,23,24].

Our findings show that the three-factor structure [21] including somatic, emotional, and cognitive scales is most appropriate for PCS assessment across six language-based samples after TBI. Apart from the original unidimensional factor structure, the favored model is also the only one that was based on theoretical assumptions [28]. This may partly explain the difficulties in fitting the models that had shown satisfactory results in previous studies [19, 23]. Exploratory, data-driven factorial solutions may fit the data well in derivation studies but perform poorly in other datasets.

Furthermore, the scoring demonstrates clinical practicality because there are no additional constraints that may complicate the calculation of scale scores (e.g., no correlated error terms as proposed by Thomas et al. [23] to increase the model fit). In addition, two studies on the factorial structure of the RPQ [19, 23], aimed in part at replicating previous scoring results, found that the three-factor model provided at least a satisfactory model fit. Potter et al. [19] found high covariance (i.e., 1.02) between the somatic and emotional latent factors; however, this was not observed in the present study (i.e., 0.85 for the total sample using raw data and 0.86 considering “1” responses as “0”).

Yielding satisfactory results in one language does not provide any evidence for cross-linguistic comparability of a questionnaire. All but one study [24], which recruited Swedish-speaking participants, investigated factorial structure of the RPQ in English-speaking samples. In the present study, we observed that fit indices of the competitive factorial solutions were comparable across the languages. Since the favored three-factor structure showed empirical evidence of MI, we would recommend using this scoring in both national and international clinical and scientific investigations using the RPQ. However, from the intercorrelations between the scales, it is evident that cognitive, somatic, and emotional symptoms are not completely independent of each other. Therefore, the use of the RPQ total score can be maintained at least as a proxy for the total PCS severity rating.

In line with previous suggestions [20], we would recommend a reduction of the response categories. In particular, the response category “1—no more of a problem than before” adds hardly any information. The original scoring of the RPQ excludes this category from the calculation of the total score. However, there are some pitfalls in modifying data for scoring post hoc, which is generally not recommended [44]. First, there is a difference between the number of categories presented and the number of categories used for scoring. Second, specifically in the case of the RPQ, the original response scale consists of a mixture of information from the present (i.e., current symptom burden) and the past (i.e., before TBI). Although these types of scales have advantages, such as avoiding the administration of two forms of questionnaires to assess pre-TBI and post-TBI symptoms, as in the use of the Postconcussion Symptom Inventory [45], they may be particularly challenging for participants with cognitive impairments, which is likely to be the case after TBI. In addition, potential self-report or memory bias may influence response behavior in general [46] and in the case of traumatic (brain) events in particular [47]. The use of this type of scale may result in inaccurate or even false information being collected. In the present study, both the CFAs and the MI analyses using the simplified response scale resulted in a better model fit. Hence, we can conclude that treating “1” as “0” may contribute to a more valid outcome assessment. However, further empirical evidence is needed before reducing the number of responses. We would suggest that future studies address this issue by having the same group of patients complete the RPQ using different response scales (i.e., 0–4 and 0–3, where “0” could mean either “no problem at all or as before TBI” or “currently no problem”). This comparison would provide more evidence and facilitate the decision on the number of response categories of the RPQ, as has been done with other questionnaires [48].

Alternatively, future studies could address the issue of the RPQ scoring by investigating the differences between individuals choosing “0” and “1” responses, for example, using multidimensional Item Response Theory based models. Furthermore, identification of the individuals suffering from symptoms comparable to PCS prior to TBI would facilitate interpretation of the “1” responses. For example, those suffering from chronic health complaints such as cancer, chronic pain, or other conditions can suffer from fatigue and problems with concentration or sleep. This information could be considered when establishing reference or norm values for the interpretation of patients’ results when applying the RPQ. For example, in a recent study [49] that provided reference values for the United Kingdom, the Netherlands, and Italy, one of the stratifications for the reference values was the presence of chronic health conditions, which proved important for the RPQ scores.

Strengths and limitations

The present study holds several advantages over previous investigations. First, this is the first study involving data on multiple RPQ translations, which allows for a broader overview of PCS self-report in six European languages. Second, in contrast to other studies, we applied methods within the CFA framework considering the ordinal nature of the questionnaire items. Third, we additionally addressed the applicability of the RPQ in different TBI severity groups, which had not yet been done.

Some limitations should be mentioned as well. Most of the sample consisted of individuals after mild TBI. Thus, those affected by moderate and severe TBI were underrepresented in this study. Therefore, the results of the MI analyses for the TBI groups should be interpreted with caution and further investigation of moderate and severe TBI regarding PCS or PC-like symptoms is highly recommended. A larger sample size of the moderate and/or severe group may result in higher test power and thus lead to more robust results.

Furthermore, there are still some difficulties in assessing PCS related to particular symptoms. The authors are aware that modification of the responses of the items Nausea and Double vision presents a potential weakness, as response behavior reflects the exhaustion of response choices and/or absence of these symptoms at six months after TBI. Interestingly, these items have already undergone some rearrangements during previous analyses of the factorial structure of the RPQ. For example, Potter et al. [19] suggested dropping the item Double vision from the RPQ due to severe skewness and kurtosis. Eyres et al. [20] distinguished between “acute” and “post-acute” PCS, whereby the item Double vision was a part of the “post-acute” scale. Other authors attributed the item either to a somatic scale [21], visual somatic [22], or visual domain [23]. Lannsjö and colleagues (2011) [24] found an underrepresentation of responses in the category “severe problem” in a large mild TBI sample, and the omission of this item was suggested again.

The item Nausea was the one with the lowest endorsement rate across all language samples. This finding is consistent with the distinction between early and late onset PCS proposed by Ryan and Warden [50] within a mild TBI group. Moreover, Eyres et al. [20] have allocated the Nausea item to the “acute” symptoms using a Rasch-based approach questioning the stability of the PCS and thus the factorial structure of the RPQ over time. Since our data refer to the six-month outcome assessments, there is no information on early-onset symptoms.

Furthermore, the focus of this study was on the factorial structure and its validity, as well as the comparability of the overall PCS construct across language samples. Therefore, item-by-item comparisons using differential item functioning (DIF) techniques were not conducted. Given the rigorous translation and linguistic validation process of the RPQ, which included several stages of harmonization of translations with feedback from psychologists and health professionals, translators, laypersons, and TBI patients, and item-by-item evaluation at the semantic, cultural, idiomatic/pragmatic, and syntactic/grammatical levels, all possible linguistic issues that might arise during the translation process were addressed [10]. However, some specific problems of individual items may have been overlooked. To further strengthen the evidence for the comparability of RPQ translations, additional research involving item-level analyses is strongly encouraged.

Finally, we only took one specific time point, i.e., six months after TBI, into account. Longitudinal analyses would provide more insight into the prevalence and persistence of PCS, and the applicability of the RPQ over time. Agtarap et al. [51] provided longitudinal analyses on PCS using a large U.S. sample of individuals after mild TBI. In Europe, a recent study using CENTER-TBI data at 3, 6, and 12 months post TBI [52] showed evidence of the applicability of the RPQ over time and the stability of the three-factor model by Smith-Seemiller et al. [21] that includes emotional, somatic, and cognitive domains.

Conclusions

Despite some limitations, the six RPQ translations were found to measure the PCS construct equally across six European languages and TBI severity groups. The three-factor model consisting of somatic, emotional, and cognitive domains showed the best fit regardless of the treatment of “1” responses. Further studies on the reduction of the RPQ response categories may provide more insight into the comparability of four- and five-point response scales. In the absence of further evidence, we recommend the use of the three-factor structure for scoring, with “1” treated as “0”, in addition to the conventional total score. Finally, item-by-item comparisons between different translations of the RPQ are recommended to strengthen its aggregated applicability across languages.