Background

Racing thoughts refer to the subjective acceleration and overproduction of thoughts, which have classically been associated with mania and hypomania in bipolar disorder (BD) [1, 2]. As a concept, racing thoughts encompass different psychopathological experiences that are also present in mixed, depressive, and anxious states, and different terms can be recognized in the literature. “Crowded thoughts” are commonly described by patients as the feeling that their head is full of thoughts they cannot stop [3]; “racing thoughts” indicate an increased velocity of thoughts; and “depressive ruminations” are thoughts and ideas confined to specific situations in the past [4, 5]. Distinguishing between the different domains of racing thoughts holds great clinical and psychopathological value. Their presence or absence, and the degree to which they present, can guide the differential diagnosis when clinical manifestations overlap, especially in disorders characterized by affective disturbances [6]. It also allows for a deeper psychopathological understanding of the patients’ experience [7].

However, until now, efforts to diagnose these types of disturbances have focused on data obtained from the medical records. Tools for measuring this symptom are scarce, mainly due to the difficulty inherent to the subjective nature of thinking. Nonetheless, these tools are necessary, as spontaneous patient reporting is rare, and it is up to the clinicians to specifically investigate this aspect [8]. Another difficulty arises from possible fluctuations, so it may not be easy to capture the full picture. Furthermore, thought processes are not considered primary targets in the acute phases of mood disorders, where the priority for clinicians is to stabilize a patient’s clinical condition. In fact, when the scales used for depressive and anxious disorders include the velocity of thoughts, they only include items on slowness and not on acceleration. In addition, the scales used for mania and hypomania include items on racing thoughts, but do not explore their different domains [9].

For this reason, Weiner et al. [8] developed the Racing and Crowded Thoughts Questionnaire (RCTQ) in 2018, consisting of 34 items in English that measure the number, velocity, and types of thoughts. Their conceptual framework proposes that “racing thoughts” is a multifaceted phenomenon involving three domains: 1) “thought overactivation,” referring to an excessive number and velocity of thoughts; 2) “burden of thought overactivation,” which evokes the overwhelming impact of thought overactivation; and 3) “thought overexcitability,” which describes distractibility, a distinctive characteristic associated with racing thoughts. A factor analysis conducted during the initial validation in a BD population confirmed a three-factor structure. However, it yielded redundant items, which were eliminated, giving rise to a 13-item version of the scale [10].

This short version of the RCTQ (RCTQ-13) preserves the initial three-factor structure and has been shown to have adequate internal consistency and adequate convergent, divergent, and discriminant validity. It was validated on hypomanic and mixed states, as well as on depression with subclinical hypomanic/activation symptoms. This suggests that it could be particularly sensitive to activation symptoms in BD and could become a valuable tool in providing follow-up for these patients. It could be useful in depressive and anxious disorders as well, where patients have also reported this experience [10, 11]. For this reason, it is the most widely used specific scale for this mental phenomenon. However, there is currently no Spanish validation of the RCTQ-13 that would allow evaluation in Spanish-speaking patients, and no item-level analysis has been performed. Therefore, the aim of this study was to translate, adapt, and validate the RCTQ-13 into Spanish in a sample of Colombian patients with mood disorders using classical test theory and item response theory (IRT).

Methods

This was a multicenter study conducted in three centers in the city of Medellín, Colombia (Hospital San Vicente Fundación, Hospital Mental de Antioquia, and Hospital Alma Máter de Antioquia). It complies with the Declaration of Helsinki and was approved by the Bioethics Committee of the School of Medicine of Universidad de Antioquia (Approval Act 016 of 2021) and by the participating institutions. All participants signed the informed consent form. The first stage consisted of translation, adaptation, and the pilot test, and the second stage was for the evaluation of psychometric properties.

Translation and adaptation

We obtained permission from the lead author (Dr. Luisa Weiner) to use the scale. The objective of this stage was to produce a Spanish version of the RCTQ-13 that would be linguistically and culturally equivalent to the original English version. The scale was translated and adapted following the translation and back-translation process. Initially, two translators independently translated the items from English to Spanish. The two translators and a review board consisting of three psychiatrists, a psychiatry resident, and a professor from the School of Languages reviewed both translations and agreed on a unified Spanish version of the instrument. This version was then translated back into English by two different translators who were not familiar with the original version of the scale. The review board and the translators compared both back-translations and resolved the inconsistencies to produce a single back-translated scale, which was then compared with the original. Any inconsistencies found were resolved until all board members agreed that the original and translated versions had identical meaning and content, differing only in the particularities specific to the Colombian population.

Pilot test

The pilot test was conducted on 14 subjects diagnosed with BD. The aim was to determine the ease of administration of the questionnaire, the average time of administration, and the difficulties that could arise while answering. Afterwards, the participants underwent a cognitive interview, as recommended by the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) [12], to evaluate the comprehensibility, comprehensiveness, and relevance of the items, instructions, and response options. These interviews were recorded and transcribed verbatim for later analysis by the research team.

Validation stage

Participants

We included patients diagnosed with bipolar I disorder, in a manic or depressive episode or in full remission (considered euthymic), and patients with major depressive disorder following the criteria of the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) [13], who were under inpatient or outpatient care in the participating institutions. Patients with cognitive impairment, intellectual disability, psychosis, or a level of education under 5 years were excluded. We calculated the sample size for each of the evaluated psychometric properties. We considered 250 people for internal consistency following Streiner’s recommendations [14] for scales with over 10 items, with an expected Cronbach’s alpha of 0.7 and a 95% confidence interval (95% CI) width of 0.1. We used the same number of patients for structural validity. We included 100 participants for the test-retest reliability, as recommended by De Vet [15], with an expected intraclass correlation coefficient (ICC) of 0.7 and a 95% CI width of 0.1. For convergent construct validity, we calculated a sample size of 55 people using the sample size formula for determining the correlation coefficient, with a type I error of 0.05, a type II error of 0.20, an alternative hypothesis correlation coefficient of 0.5 (as moderate correlations were expected with other related but not identical constructs), and a null hypothesis correlation coefficient of 0.2, as well as a one-tailed hypothesis test. For discriminant construct validity, we included 63 patients in each group, calculating a sample size for the mean difference between independent groups, with a type I error of 0.05, a type II error of 0.2, an expected standardized mean difference of 0.5, and a 1:1 ratio of affected vs. unaffected [16].
For responsiveness, a sample size of 72 was calculated using the Hanley and McNeil formula [17] with an expected area under the ROC curve (AUROC) of 0.7, a type I error of 0.05, a type II error of 0.20, a null hypothesis AUROC of 0.5 and an expected 2:1 ratio between subjects who do not change and those who do.
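The sample sizes for convergent and discriminant validity can be checked with a short calculation. The sketch below is illustrative only: the text does not state the exact formulas, so it assumes the usual Fisher z approximation for a correlation coefficient and the normal-approximation formula for two independent means with a two-sided test. The function names are hypothetical. Under these assumptions the inputs given above yield 55 and 63, matching the reported figures.

```python
import math
from statistics import NormalDist


def n_for_correlation(r_alt, r_null, alpha, power, one_tailed=True):
    """Sample size to detect r_alt against r_null (Fisher z approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_tailed else z(1 - alpha / 2)
    z_beta = z(power)
    # Difference between the two correlations on the Fisher z scale.
    effect = math.atanh(r_alt) - math.atanh(r_null)
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


def n_per_group_for_mean_difference(d, alpha, power):
    """Per-group size for a standardized mean difference d, 1:1 allocation.

    Assumes a two-sided test (an assumption, not stated in the text).
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    z_beta = z(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Type II error of 0.20 corresponds to a power of 0.80.
n_corr = n_for_correlation(0.5, 0.2, alpha=0.05, power=0.80)
n_group = n_per_group_for_mean_difference(0.5, alpha=0.05, power=0.80)
print(n_corr, n_group)  # prints: 55 63
```

The Hanley and McNeil calculation for responsiveness is not reproduced here because it requires additional variance terms that the text does not spell out.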

Procedures

Each subject received information about the study and was asked to complete the questionnaire after signing the informed consent form. A subsample was re-administered the RCTQ-13 5 days after the first administration to evaluate test-retest reliability. To evaluate construct validity through hypothesis testing, a subsample of 55 people was administered the Young Mania Rating Scale (YMRS) [18], the Montgomery–Åsberg Depression Rating Scale (MADRS) [19], the Ruminative Response Scale (RRS) [20], the Penn State Worry Questionnaire (PSWQ) [21], and the State-Trait Anxiety Inventory (STAI) [22]. Discriminant validity was initially approached by comparing the RCTQ-13 scores of the following relevant patient groups: 1) with hypomanic episodes, 2) with manic episodes, 3) with manic episodes with mixed features, 4) with depressive episodes, 5) with depressive episodes with mixed features, and 6) euthymic patients (in full remission). However, the hypomanic episode group was not included in the final analysis due to the small number of individuals (n = 3), and there were no patients with mania with the mixed symptoms specifier. Therefore, only the euthymia, mania, depression, and depression with mixed symptoms groups were left in the final analysis. Classification into each of the groups was determined by an interview conducted by an experienced psychiatrist, using DSM-5 criteria and the results of the Young Mania Rating Scale and the Montgomery–Åsberg Depression Rating Scale. For determining responsiveness, we used a criterion-based approach with the Clinical Global Impression (CGI) rating scale as the reference standard [15]. The RCTQ-13 was administered a second time to a sample of 72 patients 4 weeks after the first administration, together with the CGI for determining change.

Instruments

Short version of the Racing and Crowded Thoughts Questionnaire (RCTQ-13): 13-item self-report questionnaire that evaluates thought overactivity during the past 24 h [10]. The first 4 items belong to the thought overactivation subscale. The following 4 items belong to the burden of thought overactivation subscale, and the last 5 items correspond to the thought overexcitability subscale.

Young Mania Rating Scale (YMRS): it consists of 11 items, which are individually scored on a 5-option response scale corresponding to different degrees of severity of the mania, explicitly defined for each item [18]. For each item, the response options are rated with 0, 1, 2, 3, or 4 points. However, the five response options for items 5, 6, 8, and 9 are scored with double points. The final total score of the scale is obtained by adding up all the points, indicating the degree of severity of the patient’s manic state from least to most severe. The scale takes about 15-30 minutes to administer, and the general recommendation is to mark the highest score applicable to the patient for each item. For this study, we used a cutoff score of > 5 points to determine whether a patient presents hypomania or mania. The scale is not validated for Colombia, but it has been validated in Spanish [18].
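The YMRS scoring rule described above can be sketched in a few lines. This is a minimal illustration of the arithmetic only: the function name, the dictionary input format, and the example ratings are hypothetical, and the cutoff is the > 5 threshold used in this study.

```python
# Items whose selected option (0-4) counts double, as described in the text.
DOUBLE_WEIGHT_ITEMS = {5, 6, 8, 9}
CUTOFF = 5  # study cutoff: a total > 5 indicates hypomania or mania


def ymrs_total(option_by_item):
    """Sum the 11 item ratings, doubling items 5, 6, 8, and 9.

    option_by_item maps item number (1-11) to the selected option (0-4).
    """
    if set(option_by_item) != set(range(1, 12)):
        raise ValueError("expected ratings for items 1 to 11")
    total = 0
    for item, option in option_by_item.items():
        if option not in range(5):
            raise ValueError(f"item {item}: option must be 0-4")
        weight = 2 if item in DOUBLE_WEIGHT_ITEMS else 1
        total += option * weight
    return total


# Hypothetical patient rated 1 on every item: 7 * 1 + 4 * 2 = 15.
example = {item: 1 for item in range(1, 12)}
score = ymrs_total(example)
print(score, score > CUTOFF)  # prints: 15 True
```

With this weighting the maximum possible total is 7 × 4 + 4 × 8 = 60.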

Montgomery-Åsberg Depression Rating Scale (MADRS): hetero-administered questionnaire consisting of 10 statements for major depressive episode diagnosis, which focuses on cognitive, affective, and somatic aspects. It has been validated in Spanish and for Colombia [19]. Additionally, 7 degrees of severity (0-6) are considered for each item, which associate the even values (0, 2, 4, 6) to statements. The scale allows for intermediate scores between two statements when it is uncertain which statement applies. The total score of the scale is obtained by adding the values selected for each item, with an interval of 0-60 points.

Ruminative Response Scale (RRS): it is a 22-item self-report questionnaire that evaluates two aspects of rumination during the last 7 days, including the past 24 h: “brooding” (5 items), which refers to the tendency for brooding and mood pondering, is related to a negative mood, and is considered to be maladaptive; and “reflection” (5 items), which refers to active efforts to understand one’s negative feelings, and is considered adaptive. The items are rated on a scale from 1 “almost never” to 4 “almost always.” It is validated for Colombia [20].

Penn State Worry Questionnaire (PSWQ): it is a measure of anxiety designed to evaluate the general tendency to experience worry [20]. It consists of 16 items to which participants respond according to a 5-point scale, ranging from 1 (“not at all typical of me”) to 5 (“very typical of me”). The possible range of scores is 16-80: 16-39 = low worry, 40-59 = moderate worry, and 60-80 = high worry. The questionnaire is currently validated for Colombia [21].

State-Trait Anxiety Inventory (STAI): instrument based on a theoretical model of anxiety as a state and as a trait [22]. State anxiety is a transient emotional condition characterized by consciously perceived subjective feelings of tension and apprehension, as well as by hyperactivity of the autonomic nervous system. Trait anxiety is a relatively stable personality attribute whereby subjects tend to perceive situations as threatening, consequently raising their anxiety level. The time frame of reference for state anxiety is “right now” (20 items) and was the one used in this study. Each subscale is made up of 20 items on a 4-point Likert scale system based on intensity (0 = almost never/not at all; 1 = somewhat/sometimes; 2 = moderately so/often; 3 = very much so/almost always). The total score in each subscale ranges from 0 to 60 points [21]. It is validated in Spanish and for Colombia [22].

Clinical Global Impressions (CGI) scale: it refers to the global impression of the patient and therefore requires clinical experience [23]. It is a descriptive scale that provides qualitative information regarding the severity of the condition and the change seen in the patient compared to the baseline state. It is comprised of two subscales that evaluate the severity of the condition and the improvement of the condition due to treatment. The notion of improvement refers to the distance between the patient’s current condition and the condition recorded at the start of the treatment. Both scales consist of a single item, which in this case was answered by a clinician who evaluated the patients at the time the scales were applied. It is validated in Spanish [23].

Statistical analysis

To describe the sociodemographic and clinical characteristics of the participating subjects, we used frequencies and percentages for qualitative variables, and medians and interquartile ranges for quantitative variables, since they did not present a normal distribution according to the Shapiro–Wilk test. We also determined the frequency of items with missing data and the frequency of use of each response option.

For structural validity, we conducted a confirmatory factor analysis (CFA) of the three-factor model proposed by the authors using the diagonally weighted least squares estimator with mean and variance adjustment (WLSMV) [11]. The following goodness-of-fit statistics were used: RMSEA (Root Mean Square Error of Approximation), CFI (Comparative Fit Index), TLI (Tucker-Lewis Index), and SRMSR (Standardized Root Mean Square Residual). The fit of the model was considered adequate if: RMSEA = 0.06-0.08 and CFI and TLI > 0.95 [24]. We also evaluated internal consistency using Cronbach’s alpha and McDonald’s omega [25], as well as the correlations between each item and the total score. Test-retest reliability was determined by means of the ICC with a 95% CI. In addition, Bland–Altman plots were used to represent the limits of agreement between the two measurements for the total score and each factor.
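Two of the reliability quantities named above have simple closed forms. The sketch below is illustrative, not the study's code (the analysis was run in Stata and R): it computes Cronbach's alpha from a respondents-by-items matrix and the Bland–Altman bias with 95% limits of agreement (mean difference ± 1.96 standard deviations of the differences). Function names and the toy data are hypothetical.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    differences = [a - b for a, b in zip(first, second)]
    bias = mean(differences)
    half_width = 1.96 * stdev(differences)
    return bias, bias - half_width, bias + half_width


# Toy data (hypothetical): 5 respondents, 3 items.
toy_items = [[1, 2, 1], [2, 3, 2], [3, 3, 3], [4, 5, 4], [2, 2, 3]]
print(round(cronbach_alpha(toy_items), 3))

# Toy test and retest totals (hypothetical).
bias, lower, upper = bland_altman([10, 12, 14, 16], [9, 10, 13, 13])
print(round(bias, 2), round(lower, 2), round(upper, 2))
```

A positive bias in this convention means that the first administration scored higher than the second.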

Regarding construct validity through hypothesis testing, convergent validity was assessed by calculating the Spearman correlation coefficient of the RCTQ-13 scores with item 7 of the YMRS, which assesses language-thought disorders; a moderate positive correlation was expected. For divergent validity, we calculated the Spearman correlation coefficient between the RCTQ-13 and the MADRS, the state subscale of the STAI, and the PSWQ, expecting it to be low as they do not specifically include racing thoughts. While the statistical significance of the Spearman coefficients was calculated, the interpretation was primarily based on the strength of the association. Correlations with values greater than 0.60 were considered “strong,” those between 0.30 and 0.60 “moderate,” and those below 0.30 low or weak [26].
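As an illustration of how these coefficients are obtained and labeled, the following sketch computes Spearman's rho as the Pearson correlation of average ranks (which handles ties) and applies the strength bands given above to the absolute value of the coefficient. The function names are hypothetical, and treating exactly 0.30 and 0.60 as “moderate” is an assumption about the boundary cases.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    covariance = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread_x = sum((a - mx) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - my) ** 2 for b in ry) ** 0.5
    return covariance / (spread_x * spread_y)


def strength(rho):
    """Label a coefficient using the bands stated in the text."""
    size = abs(rho)
    if size > 0.60:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"


print(average_ranks([10, 20, 20, 30]))  # prints: [1.0, 2.5, 2.5, 4.0]
print(strength(-0.31))  # prints: moderate
```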

For discriminant validity, we compared total and subscale scores between the different patient groups using the Kruskal–Wallis test, as data distribution was not normal. Likewise, ordinal epsilon squared (ε2) was calculated as a nonparametric effect size measure for comparing 2 or more groups, with values interpreted as small (0.01-0.06), moderate (0.08-0.26), and large (≥0.26) [27]. When the overall test was significant, post-hoc pairwise comparisons were performed with Dunn’s test, using the Bonferroni adjustment for multiple comparisons. Statistical significance was defined as a p-value of less than 0.05.
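The relationship between the Kruskal–Wallis statistic and epsilon squared can be made concrete with a small sketch. It assumes the common definition ε2 = H(n + 1)/(n2 − 1), where H is the tie-corrected statistic and n the total sample size; the text does not give the formula, so this is an assumption. The Bonferroni step simply multiplies each pairwise p-value by the number of comparisons and caps it at 1. Function names and toy data are hypothetical, and Dunn's test itself is not implemented here.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of score lists."""
    counts = Counter(value for group in groups for value in group)
    n = sum(counts.values())
    # Average rank of each distinct value in the pooled sample.
    rank_of, position = {}, 0
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = position + (c + 1) / 2
        position += c
    rank_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_correction = 1 - sum(c**3 - c for c in counts.values()) / (n**3 - n)
    return h / tie_correction


def epsilon_squared(h, n):
    """Ordinal epsilon squared from H and the total sample size n."""
    return h * (n + 1) / (n**2 - 1)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values for a family of comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy data (hypothetical): three fully separated groups of three.
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(toy_groups)
print(round(h, 2), round(epsilon_squared(h, 9), 2))  # prints: 7.2 0.9
```

With four groups in the final analysis there are six pairwise comparisons, so under Bonferroni each raw p-value would be multiplied by 6.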

For responsiveness, we calculated the Spearman correlation coefficient between the change classification indicated in the CGI and the mean difference of the scores obtained in the two measurements of each RCTQ-13 subscale. In addition, the AUROC was calculated for the entire scale, using as a reference the presence of change; it was considered adequate if it presented values > 0.7 [17].
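For readers unfamiliar with the AUROC in this setting, it can be read as the probability that a randomly chosen patient classified as changed has a larger score difference than a randomly chosen patient classified as unchanged, with ties counted as one half. The sketch below computes it by direct pair counting. The function name and toy values are hypothetical, and the direction (larger differences indicating change) is an assumption made for illustration.

```python
def auroc(changed_scores, unchanged_scores):
    """AUROC by pair counting (equivalent to the Mann-Whitney U / (n1 * n2))."""
    wins = 0.0
    for c in changed_scores:
        for u in unchanged_scores:
            if c > u:
                wins += 1.0
            elif c == u:
                wins += 0.5  # ties count as half a win
    return wins / (len(changed_scores) * len(unchanged_scores))


# Toy score differences (hypothetical) for changed and unchanged patients.
print(auroc([8, 5, 9, 3], [2, 4, 1, 5, 0, 3, 2, 1]))
```

A value of 0.5 corresponds to no discrimination between changed and unchanged patients, and 1.0 to perfect separation.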

Item response theory was used to estimate the difficulty and discrimination for each item by applying a generalized partial credit model (GPCM) [28, 29]. The category characteristic curve (CCC) was also obtained for each item. The fit was evaluated for each item based on the values of the infit and outfit statistics, which were considered acceptable if they were between 0.5 and 1.5 [30].
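The category characteristic curves come directly from the GPCM response function. In the standard formulation, for an item with discrimination a and step thresholds b_1, ..., b_M, the probability of responding in category k at trait level θ is proportional to exp(Σ_{j≤k} a(θ − b_j)), with the sum taken as 0 for the lowest category. The sketch below evaluates these probabilities. The parameter values are hypothetical and are not the estimates reported for the RCTQ-13; the actual model was fitted with the ltm package in R.

```python
import math


def gpcm_probabilities(theta, discrimination, thresholds):
    """Category probabilities under the generalized partial credit model.

    Returns one probability per response category (len(thresholds) + 1).
    """
    # Cumulative sums of a * (theta - b_j); the lowest category has sum 0.
    cumulative = [0.0]
    for b in thresholds:
        cumulative.append(cumulative[-1] + discrimination * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item with four response categories (three thresholds).
for theta in (-2.0, 0.0, 2.0):
    probs = gpcm_probabilities(theta, discrimination=1.2,
                               thresholds=[-1.0, 0.0, 1.0])
    print(theta, [round(p, 3) for p in probs])
```

Plotting each category's probability against θ gives the CCC for the item. Ordered thresholds mean that each successive category becomes the most likely one over some range of the trait.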

The statistical analysis was conducted using Stata 15. For factor analysis and evaluation based on item response theory, we used R [31] and RStudio [32] with the lavaan [33] and ltm [34] packages, respectively.

Results

Translation, adaptation and pilot test

We obtained a Spanish RCTQ-13 version which was approved by the review board. It proved to be easy to administer in the pilot test, although supervision was required for a few subjects with a low level of education. Thus, it was decided that participants must have completed at least the fifth year of elementary school to participate in the rest of the study. The average time for administering the questionnaire was 5.1 minutes. In the cognitive interview, certain items presented comprehensibility issues, which led to modifications. Item 2 was changed from “Mis pensamientos van a 200km/h” (My thoughts race at 200 km/h) to “Mis pensamientos van muy rápido” (My thoughts go very fast) because the symbol “km/h” was confusing. Item 5 was also modified from “Mi cerebro no puede manejar todos los pensamientos que me surgen al mismo tiempo” (My brain cannot manage all these thoughts that arise at the same time) to “Mi cerebro no puede controlar todos los pensamientos cuando me surgen al mismo tiempo” (My brain cannot control all thoughts when they come to me at the same time) because the participants had difficulty understanding the word “manage” specifically associated with mental capacity. Item 6, “Me siento angustiado en mi vida diaria por la gran cantidad de pensamientos o por la velocidad de estos en mi mente” (I feel distressed in my everyday life by the great number of thoughts or by the velocity of thoughts in my mind) was changed to “Me angustia tener tantos pensamientos en la mente y/o que vayan tan rápido” (I feel distressed by so many thoughts in my mind and/or to have them go so fast) because the participants found the item to be too long and complex. These modifications were evaluated in a new group of 10 patients. They found the modified items easy to understand, and the version was submitted for validation (the complete scale in Spanish is available in Additional file 1).

Validation process

A total of 250 participants were included, 22% of whom were male, with a median age of 37.5 years and an 11-year level of education. In the clinical interview, 190 patients were diagnosed with BD type I (76%), mainly in a manic episode. Other demographic and clinical characteristics are shown in Table 1. There were no unanswered items, and the participants used all the response options in each item (Table 2).

Table 1 Sociodemographic and clinical characteristics of the participants (n = 250)
Table 2 Frequency of responses to each item of the Spanish version of the RCTQ-13 (n = 250)

Structural validity

The three-factor structure hypothesis proposed by the developers of the scale was confirmed in the CFA (Fig. 1), with goodness-of-fit statistics that were adequate for the model (RMSEA = 0.061, CFI = 0.9, TLI = 0.9, and SRMSR = 0.04).

Fig. 1 Confirmatory factor analysis of the Spanish version of the RCTQ-13. Three-factor structure displaying correlations between items and factors, for which the goodness of fit was good (Root Mean Square Error of Approximation = 0.061, Comparative Fit Index = 0.9, Tucker-Lewis Index = 0.9, and Standardized Root Mean Square Residual = 0.04)

Reliability

Internal consistency was adequate for the entire scale and for each of the factors (Table 3).

Table 3 Internal consistency of the Spanish version of the RCTQ-13 evaluated in a Colombian population

Regarding test-retest reliability, the ICC for the entire RCTQ-13 was 0.82 (95% CI 0.70-0.88); it was 0.79 (95% CI 0.68-0.86) for Factor #1, 0.80 (95% CI 0.70-0.87) for Factor #2, and 0.77 (95% CI 0.66-0.85) for Factor #3. The Bland–Altman plots for the entire scale and each of the factors showed slight differences between the two administrations, with a slightly higher score in the first evaluation, especially in the middle range of scores, and no observable systematic trend (Fig. 2).

Fig. 2 Bland–Altman plots for test-retest reliability of the Spanish version of the RCTQ-13 and its factors. A Entire scale (Racing and Crowded Thoughts Questionnaire 13 items). Mean difference: 3.8 (95% CI: 2.1 to 5.6). Limits of agreement: −13.5 (95% CI: −16.5 to −10.5) and 21.2 (95% CI: 18.3 to 24.2). B Factor #1. Mean difference: 1.2 (95% CI: 0.5 to 1.8). Limits of agreement: −5.4 (95% CI: −6.5 to −4.2) and 7.8 (95% CI: 6.7 to 8.9). C Factor #2. Mean difference: 1.1 (95% CI: 0.5 to 1.8). Limits of agreement: −5.4 (95% CI: −6.5 to −4.3) and 7.8 (95% CI: 6.6 to 8.9). D Factor #3. Mean difference: 1.4 (95% CI: 0.6 to 2.3). Limits of agreement: −6.8 (95% CI: −8.2 to −5.4) and 8.8 (95% CI: 8.3 to 11.2)

Construct validity

Regarding convergent validity, there was a low negative correlation between the RCTQ-13 and the scores of item 7 of the YMRS (Table 4). Regarding divergent validity, the total scale and the three factors showed a low positive correlation with the MADRS and the STAI, as expected. However, there were moderate positive correlations with the PSWQ and the RRS (Table 4).

Table 4 Convergent and divergent construct validity of the Spanish version of the RCTQ-13 in a Colombian population

As for discriminant validity, statistically significant differences in the total score of the scale were found between patients with different affective episodes, with a moderate effect size (ε2 = 0.09) (Fig. 3).

Fig. 3 Discriminant validity of the RCTQ-13. Median total scores of the patients with different affective episodes (n = 247). Medians were compared using the Kruskal–Wallis test, which showed statistically significant differences with a moderate effect size, so post-hoc pairwise comparisons were performed using Dunn’s test, applying a Bonferroni correction for multiple comparisons. The statistically significant pairwise comparisons are shown at the top. Patients with a current hypomanic episode (n = 3) were not included

Individuals in the groups with depressive episodes, with or without mixed features, had higher total scale scores than those in the manic episode and euthymic groups (Table 5). Specifically, both depressive groups had higher scores on Factor #2 (Burden of thought overactivation) than individuals in the manic episode and euthymic groups, suggesting that the difference in total score is driven by this factor. As expected, the euthymia group scored lower on the total scale and on all factors.

Table 5 Discriminant construct validity of the Spanish version of the RCTQ-13 in a Colombian population with different affective episodes (n = 247)a

Responsiveness

The correlation between the change in the CGI and the mean difference in RCTQ-13 scores was moderate and negative (Spearman’s ρ = −0.31). Based on the total scores of the RCTQ-13 scale and the outcome of change or no change according to the CGI, the area under the ROC curve was 0.71 (95% CI 0.50-0.92).

Item response theory

Upon analyzing item difficulty, we found that item 1 “Tengo demasiados pensamientos al mismo tiempo” (I have too many thoughts at the same time) was the easiest, while item 10 “No tengo tiempo suficiente para comprender el significado de un pensamiento, porque inmediatamente me surge otro” (There is not enough time to grasp the meaning of a thought, as new ones immediately arise) was the most difficult. The CCCs for each item are presented in Fig. 4. In general, the thresholds between response options are ordered.

Fig. 4 Category characteristic curve for each item of the Spanish version of the RCTQ-13. The number for each item corresponds to the number after “rctq”. Item 10, for example, is rctq10. Each response option of the item is presented as a specific curve and is designated with the letter P. The first response option, “Not at all”, for example, is P1

The infit was adequate for all items and the outfit was acceptable, except for items 3, 4, and 7 (Table 6).

Table 6 Item response theory parameters of the Spanish version of the RCTQ-13 in a Colombian population (n = 250)

Discussion

We linguistically and culturally adapted the short version of the RCTQ (RCTQ-13), and we found that it has a structure coherent with the theoretical development of the instrument, adequate internal consistency, and adequate test-retest reliability in patients with affective disorders. Evidence of its discriminant construct validity was also observed, as the hypothesis of differences between euthymic patients and those with affective episodes was met. In our study, most participants were experiencing a manic episode. This differs from the original RCTQ-13 validation study, which did not include patients with mania and had a larger sample of patients with hypomania [10]. Despite this, we found evidence that the Spanish version of the RCTQ-13 measures the construct it is intended to measure, as it captures not only racing thoughts but also crowded thoughts.

The view that racing thoughts occur only in episodes of mania and hypomania has been broadened by evidence that racing thoughts are also described in depression. This symptom would therefore not be only a specifier or an indication of bipolar depression. In patients with unipolar depression, up to 56.4% may experience racing/crowded thinking [5]. For some clinicians, the often-unrecognized description of this thought symptom can even guide the pharmacological treatment of unipolar depression [35]. In our study, patients with depressive disorder scored higher on the RCTQ-13, even without the mixed symptoms specifier. This supports the idea that racing and crowded thoughts are expressed across all affective disorders as a broad spectrum [36, 37], although possibly with different nuances.

In this regard, the depressive episode groups had statistically higher scores than the other groups on Factor #2 (“Burden of thought overactivation”). In addition, when evaluating the divergent construct validity of the RCTQ-13, a moderate positive correlation was found with the PSWQ and the RRS, especially for Factor #2. Thus, as previous studies have suggested, racing thoughts in depression could generate great emotional distress and are related to rumination [38, 39, 40], which is perceived as a “crowded” type of thinking that patients often describe as their head being full of thoughts they cannot stop [38]. To that extent, it is possible that the experience itself may constitute a specific focus of concern for the subject. It is therefore necessary to maintain the distinction of crowded thoughts as a specific subtype of racing thoughts and to continue studying the relationship of the racing thoughts construct with rumination and worry.

An important finding in this study is that patients with a manic episode scored higher than patients in euthymia specifically on Factor #1, which concerns thought overactivation; this would indicate a specific facet of racing thoughts in mania. Taken together, these findings support the idea that racing thoughts constitute a multifaceted phenomenon existing differentially along a continuum across the spectrum of symptomatic mood states [41]. As patients with mania had statistically lower total scores than patients with depression and depression with mixed symptoms, it is possible that the experience of racing thoughts is different in this group. We observed that the correlation between the Spanish version of the RCTQ-13 and the racing thoughts assessed by clinicians in item 7 of the YMRS was small and negative. We believe that this difference has to do with the patients’ subjective experience of racing thoughts, which could be different from what the clinician perceives when scoring the YMRS. It is possible that, for patients in a manic state, who were the majority, this experience of acceleration is pleasant or normalized and lacks the burden (Factor #2) seen in depressive episodes, so that their scores on the scale do not reflect the degree of disturbance perceived by the clinician. It is also possible that the patient perceived an increase in thought velocity but the clinician did not detect it in the language assessment.

This discrepancy in the assessment of thought experience between the patient and the clinician has also been suggested by Goldberg [42], who found very low concordance (coefficient κ = 0.15) on the racing thought item of the Mood Disorder Questionnaire (MDQ). This highlights the complexity of approaching thought assessment, where an interaction occurs between the patient’s perception as internal subjective experience and the external detection that the clinician can make in the clinical evaluation. Another possible explanation is that some patients may have been under the effects of sedatives, given that enrollment occurred during inpatient care of the acute episode. Therefore, despite experiencing acceleration, the clinician may not have been able to perceive it due to drowsiness or dysarthria. We do not consider that we have found an indication of a lack of construct validity, but rather that there is a research opportunity to delve deeper into racing thoughts as a subjective experience and how it is reflected in clinical examination.

We also found that the RCTQ-13 has adequate responsiveness when applied a month after the first assessment. This finding is noteworthy, since the authors of the original scale did not analyze this psychometric property in patients with mood disorders [8, 10]. Responsiveness holds particular significance in longitudinal patient follow-up, as it denotes the capacity of a measurement instrument to detect changes over time in the targeted construct [43], and it is a fundamental psychometric property for instruments used to measure outcomes in clinical trials [44]. In this sense, we have provided evidence of the responsiveness of the RCTQ-13, which could be used to measure changes in patients before and after treatment or for follow-up over time.

We also conducted an analysis of the Spanish version of the RCTQ-13 using IRT, which had not been done before. One of the main advantages of this approach is that it allows estimation of the difficulty of the items and of the trait level of the individuals measured, and it has become an important and complementary approach in the validation of scales that measure psychological constructs [45]. It also helps determine how much of the racing thought experience is required to endorse each item. With this information, it is possible to select items for different purposes and populations. If, for example, a clinician wanted to screen for this experience in the general population, where the amount of the trait is expected to be low, they could use the easiest items, such as item 1 and item 2, which generically inquire about thought overactivation. However, for the assessment of severity and classification of patients with affective episodes, more difficult items should be used, such as item 10, which requires much more of the trait to be endorsed. It is important to note that items 3, 4, and 7 had a low outfit. This could indicate that these items do not fit the model well and do not represent the outliers. This may be because the participants made careless mistakes or guessed, which is to be expected to an extent in manic episodes (the most frequent in our study) and may also have contributed to the lower total score obtained in our study. We could therefore suggest that, for this subgroup of patients, supervision by the clinician may be required or that the possible elimination of these items should be reviewed.

An important limitation in our study was the low representation of patients experiencing a hypomanic episode, which makes it difficult to directly compare our adaptation with the developmental studies of the original version of the scale. It also does not allow us to establish differences in the subjective experience of racing thoughts between patients experiencing these episodes and in other affective states.

Conclusion

The Spanish version of the RCTQ-13 adapted for the Colombian population has adequate reliability, construct validity, and responsiveness. Thus, it can be used to measure the construct of racing and crowded thoughts in patients across the spectrum of affective disorders, in whom this experience can be expressed with different nuances. It is important to continue studying the racing thoughts construct, considering its relationship with rumination and worry.