Introduction

Many children suffer from emotional disorders. Indeed, lifetime prevalence rates range from 7 to 15%, depending on the type of emotional disorder studied (Costello et al. 2003; Verhulst et al. 1997). Although evidence-based therapies are available to treat children with emotional disorders, around 45% of children do not respond sufficiently (Bodden et al. 2008), so there is room for improvement. To improve treatment outcomes, we need to know more about the factors involved in the development and maintenance of these disorders, as well as about the effective components of treatment. One factor presumed to play a critical role in the onset and maintenance of anxiety disorders is cognition (Beck 2005). Anxious cognition is commonly examined by measuring negative thoughts (e.g., “I am worthless”). However, to investigate cognitive models of anxiety disorders, it is also necessary to measure positive thoughts (e.g., “I feel good about myself”). In the current study, we describe the development of a questionnaire that incorporates both negative and positive thoughts: the Children’s Automatic Thoughts Scale-Negative/Positive (CATS-N/P).

Cognitions play an important role in emotional disorders and their treatment. For example, children with anxiety and mood disorders report more dysfunctional and negative beliefs than healthy children (Beck 2005). Three important models describe the contribution of thoughts to emotional disorders. According to the States-of-Mind (SOM) model (Schwartz and Garamoni 1989), the balance of positive and negative thoughts is essential for psychological well-being. This balance is expressed as the ratio of positive thoughts to the sum of positive and negative thoughts: a ratio of about .62 is considered optimal or healthy, whereas a ratio below .31 is related to depression or anxiety (Schwartz and Garamoni 1989). A second model is Kendall’s “power of nonnegative thinking”. This model states that anxious children may benefit more from a reduction in the number of their negative thoughts than from an increase in the number of their positive thoughts (Kendall and Chansky 1991; Kendall and Korgeski 1979). Third, Beck’s content-specificity hypothesis focuses on dysfunctional cognitive schemata and specific cognitive content: anxious self-talk is future-oriented, unstable, and focused on threat, whereas depressive self-talk is past-oriented, stable, and focused on loss and failure (Beck and Clark 1997; Ronan and Kendall 1997).
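As an illustration of the SOM ratio, the following Python sketch computes positive / (positive + negative) for two thought scores and compares the result with the .62 and .31 reference values cited above. The function and constant names are hypothetical, and the inputs are assumed to be scores on comparable scales (for example, two subscales with the same number of items). This is not a scoring procedure prescribed by Schwartz and Garamoni or by the CATS-N/P.

```python
# Hypothetical sketch of the States-of-Mind (SOM) ratio: P / (P + N).
OPTIMAL_SOM = 0.62  # ratio described as optimal or healthy
LOW_SOM = 0.31      # ratios below this value are related to depression or anxiety


def som_ratio(positive: float, negative: float) -> float:
    """Return the proportion of positive thoughts among all (positive + negative) thoughts."""
    if positive < 0 or negative < 0:
        raise ValueError("scores must be non-negative")
    total = positive + negative
    if total == 0:
        raise ValueError("the SOM ratio is undefined when both scores are zero")
    return positive / total


def below_low_cutoff(ratio: float) -> bool:
    """True if the ratio falls below the .31 value associated with depression or anxiety."""
    return ratio < LOW_SOM


if __name__ == "__main__":
    r = som_ratio(positive=31, negative=19)
    print(f"SOM ratio = {r:.2f}, below .31: {below_low_cutoff(r)}")
```

Using a ratio rather than a difference keeps the index bounded between 0 and 1, which is what makes fixed reference values such as .62 and .31 meaningful.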

Several questionnaires have been developed to measure cognitions in children. However, several problems are associated with their application. First, the majority of cognition questionnaires for children [e.g., the Children’s Anxious Self-Statements Questionnaire (CASSQ; Ronan et al. 1988) or the Cognition Checklist for Children (CCL-C; Jolly and Dykman 1994)] are downward extensions of measures developed for adults. Children might have trouble understanding the items of these questionnaires, or they might make self-statements that differ from those of adults. Therefore, the factor structure originally found in adults may not hold in a younger population. A second problem is that most cognition questionnaires fail to distinguish between thoughts and symptoms. For example, the Negative Affectivity Self-Statement Questionnaire (NASSQ; Ronan et al. 1994; NASSQ-Anxiety scale; Sood and Kendall 2007), which was developed using self-statements generated by children, measures both symptoms (e.g., “I was shaking”) and thoughts (e.g., “I usually do something stupid”). This overlap in item content might artificially inflate correlations between symptom measures and cognition measures, and it makes it difficult to disentangle the specific contributions of symptoms and cognitions to the disorder. Third, most cognition questionnaires measure general (negative) affect rather than anxiety and depression separately, which makes it difficult to examine content specificity. Finally, there are no psychometrically sound questionnaires that incorporate both negative and positive thoughts (as opposed to positive affect).

A cognitions questionnaire that circumvents most of these problems is the Children’s Automatic Thoughts Scale (CATS; Schniering and Rapee 2002). This questionnaire was specifically designed for children, has been used in several international studies, and measures thoughts rather than symptoms. The items of the CATS are based on self-statements made by clinically anxious, depressed, or behaviorally disturbed children (Schniering and Rapee 2002). The CATS assesses negative beliefs common to both internalizing and externalizing problems. In addition, it contains items specific to different disorders, which facilitates the investigation of content specificity (i.e., thoughts that are specific to or common across these disorders). Confirmatory factor analysis of the CATS in a community sample revealed four distinct first-order factors (Physical threat, Social threat, Personal failure, and Hostility) and one higher-order factor reflecting negative beliefs (Schniering and Rapee 2002). This factor structure was replicated in two other studies (Schniering and Lyneham 2007; Schniering and Rapee 2004). The CATS has consistently shown good internal reliability, with Cronbach’s alphas ranging from .82 to .96 (Bodden and Bögels 2006; Schniering and Lyneham 2007; Schniering and Rapee 2002, 2004). Test–retest reliability was good at 1 month (.66–.80) and 3 months (.68–.77; Schniering and Rapee 2002). The CATS also has good discriminant validity: it has been shown to discriminate between children with anxiety disorders and healthy controls (Bodden and Bögels 2006; Schniering and Rapee 2002), between anxiety, depression, and behavioral disorders (Bodden and Bögels 2006; Schniering and Rapee 2002), and between different anxiety disorders (Bodden and Bögels 2006; Schniering and Lyneham 2007). Finally, the CATS has been shown to be sensitive to treatment change (Mifsud and Rapee 2005; Schniering and Lyneham 2007).

While the CATS has a number of advantages over other measures of cognition, it does not assess positive thoughts. The inclusion of positive thoughts in a cognition questionnaire makes it possible to examine theoretical cognitive models like the SOM model, power of nonnegative thinking, and the content-specificity hypothesis. Therefore, to increase the applicability of this questionnaire in research on cognition in children, we decided to extend the CATS with positive self-statements. The resulting measure was named the CATS-Negative/Positive (CATS-N/P). The objective of the present study was to describe the development and psychometric properties of the CATS-N/P in a community sample of children and adolescents.

Our first research question concerned the factor structure of the CATS-N/P and consisted of two parts: (a) whether the original four-factor structure of the CATS could be replicated in a Dutch population; and (b) whether the factor structure of the CATS-N/P would include an extra factor for positive thoughts. A factor analysis of the CATS-N/P is important for several reasons. First, adding items to a questionnaire might change its overall factor structure. We wanted to be confident that the original subscales were still relevant to the new questionnaire, because earlier studies showed that subscales of the CATS discriminated between different disorders (Bodden and Bögels 2006; Schniering and Rapee 2002). Second, translating a questionnaire or using it in a different population can change the factor structure, and the factor structure of the Dutch translation of the CATS had not been examined in previous studies. Third, a factor analysis can reveal whether the positive items form a coherent factor. This should be established before the balance between negative and positive thoughts can be examined in future studies using the CATS-N/P.

Our second research question concerned the internal reliability of the scale and its 8-week test–retest reliability. Our third research question focused on the convergent and discriminant validity of the CATS-N/P. We hypothesized a positive correlation between the negative beliefs factor and the measures of emotional problems and anxiety, and a negative correlation between these measures and the Positive thoughts scale of the CATS-N/P. Finally, age and sex differences on the CATS-N/P were explored.

Method

Participants

Participants were a community sample of 554 children (8–11 years, n = 183) and adolescents (12–18 years, n = 371) with a mean age of 12.55 years. There were 272 boys and 282 girls. Ten schools were asked to participate in the study, and six agreed (60%). These were public secondary and elementary schools in several rural and urban areas of the Netherlands. A total of 681 children and their parents were informed about the study, and 569 children and their parents (83.6%) agreed to participate. Of these, twelve children were absent when the questionnaires were administered. Three children were excluded from the analyses because they had too many missing items on the CATS-N/P (a maximum of two missing items per subscale was allowed, and missing items were replaced with the subscale mean). Despite repeated reminders, only 402 parents (72.6%) returned their questionnaires.
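The missing-data rule can be sketched as follows. The sketch assumes that “replaced with the subscale mean” means the mean of the child’s own answered items on that subscale (person-mean imputation); the text does not specify this, and a sample-level item mean would be an alternative reading. The function name is hypothetical, and missing responses are represented as None.

```python
from typing import List, Optional, Sequence

MAX_MISSING_PER_SUBSCALE = 2  # rule stated in the text


def impute_subscale(items: Sequence[Optional[float]],
                    max_missing: int = MAX_MISSING_PER_SUBSCALE) -> Optional[List[float]]:
    """Fill missing item scores (None) with the mean of the answered items on the same subscale.

    Returns None when more than `max_missing` items are missing, signalling that the
    child should be excluded. Assumption: person-mean imputation within the subscale.
    """
    answered = [x for x in items if x is not None]
    n_missing = len(items) - len(answered)
    if n_missing > max_missing or not answered:
        return None
    subscale_mean = sum(answered) / len(answered)
    return [subscale_mean if x is None else float(x) for x in items]


if __name__ == "__main__":
    print(impute_subscale([1, None, 3, 2, None, 0, 4, 2, 2, 2]))  # two missing, filled with 2.0
    print(impute_subscale([None, None, None, 1, 1, 1, 1, 1, 1, 1]))  # three missing, so None
```

Under this reading, a child would be excluded as soon as any one of the five subscales returns None.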

Socioeconomic status of participants was assessed. Parental educational level was low (29.8%), medium (36.5%), or high (33.7%) for mothers, and low (27.3%), medium (31.2%), or high (41.5%) for fathers. A total of 113 families (28.1%) declined to give information about their income level. Of the other families, most (68.9%) had an income above the mean (>34,000 EUR), 17.0% had a mean income (28,500–34,000 EUR), and 14.1% had an income below the mean (<28,500 EUR). Most children (83.4%) lived in two-parent families. Children represented the main ethnic groups in the Netherlands: Dutch (94.0%), Turkish (2.7%), Moroccan (0.7%), Antillean (1.2%), or other (1.2%). Most parents (89.6%) had Dutch nationality.

Measures

Development of the CATS-N/P

The original Children’s Automatic Thoughts Scale (Schniering and Rapee 2002) consists of 40 items representing different negative thoughts (e.g., “Something awful is going to happen” or “Kids will think I’m stupid”). Children rate how often they have had each of the 40 thoughts in the past week. Items are scored on a five-point scale from “not at all” (0) to “all the time” (4). Four 10-item subscales (Physical threat, Social threat, Personal failure, and Hostility) are calculated by summing item scores, and the Total score is the sum of the four subscale scores. We used the 40-item Dutch CATS, translated by Bodden and Bögels (2006), and added ten positive thoughts. Ten items were added so that all subscales have an equal number of items, which facilitates the future calculation of SOM (ratio) scores. We selected the positive items from the Flemish PNG-k (Positieve en Negatieve Gedachten bij kinderen [Positive and Negative Thoughts in Children]; Bracke and Braet 2000). In addition to 35 negative items (all from the NASSQ-39; Ronan et al. 1994), the PNG-k contains 35 positive items selected from the NASSQ-39 (Ronan et al. 1994), the Automatic Thoughts Questionnaire-Positive (ATQ-P; Ingram and Wisnicki 1988), and other child questionnaires. The PNG-k was validated in a sample of 690 children; all items had high factor loadings, and internal reliability was good (Cronbach’s α > .91). However, the PNG-k items reflect positive and negative affect, including symptoms, as well as cognition. Therefore, from the highest-loading positive items we selected those representing thoughts rather than symptoms, giving priority to positive items that were the opposite of negative CATS items. 
Examples of the positive items are “Only good things will happen to me”, “My future looks bright”, and “I enjoy life” (see Table 1 for all items). For the English-language version of the CATS-N/P, the positive Flemish items were translated into English in three steps. First, all authors (except EK), who are bilingual and familiar with the topic of the questionnaire, independently translated all ten items from Flemish into English, and the translations were compared until consensus on the phrasing was reached. Second, a native English speaker who is an expert in the field of childhood anxiety disorders reviewed the items and recommended some minor changes. Third, the revised items were reviewed by three other bilingual psychologists, and a back translation was made from English into Dutch (equivalent to Flemish). No further changes were made after this step.

Table 1 Factors, items and factor loadings on the CATS-N/P

The final items of the CATS-N/P are scored on a five-point scale ranging from “not at all” (0) to “all the time” (4). Higher scores on the five subscales reflect more frequent negative or positive thoughts. Each subscale ranges from 0 to 40. Because the Total score of the CATS-N/P represents the extent to which a child has negative thoughts, the positive items are not included in it; the Total score therefore ranges from 0 to 160.
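The scoring rules can be expressed as a short Python sketch. It assumes that responses have already been grouped by subscale (the item-to-subscale assignment is given in Table 1 and is not reproduced here) and that missing items have already been handled. The dictionary keys and the function name are hypothetical.

```python
from typing import Dict, Sequence

NEGATIVE_SUBSCALES = ("Physical threat", "Social threat", "Personal failure", "Hostility")
POSITIVE_SUBSCALE = "Positive thoughts"
ITEMS_PER_SUBSCALE = 10
MIN_SCORE, MAX_SCORE = 0, 4  # "not at all" (0) to "all the time" (4)


def score_cats_np(responses: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Sum item scores per subscale (0-40 each) and the negative Total (0-160)."""
    scores: Dict[str, float] = {}
    for name in NEGATIVE_SUBSCALES + (POSITIVE_SUBSCALE,):
        items = responses[name]
        if len(items) != ITEMS_PER_SUBSCALE:
            raise ValueError(f"{name}: expected {ITEMS_PER_SUBSCALE} items, got {len(items)}")
        if any(not MIN_SCORE <= x <= MAX_SCORE for x in items):
            raise ValueError(f"{name}: item scores must lie between {MIN_SCORE} and {MAX_SCORE}")
        scores[name] = sum(items)
    # The Total reflects negative thoughts only, so the positive subscale is left out.
    scores["Total"] = sum(scores[name] for name in NEGATIVE_SUBSCALES)
    return scores


if __name__ == "__main__":
    example = {name: [1] * 10 for name in NEGATIVE_SUBSCALES}
    example[POSITIVE_SUBSCALE] = [4] * 10
    print(score_cats_np(example))
```

Keeping the positive subscale out of the Total lets the same function produce both the traditional CATS Total and a separate positive score for ratio-based analyses.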

Symptom Questionnaires

Four symptom questionnaires were administered in order to determine anxiety levels and to establish the convergent and discriminant validity of the CATS-N/P.

Strengths and Difficulties Questionnaire (SDQ). The SDQ-parent version (Goodman 1997) is a 25-item questionnaire that assesses the psychological adjustment of children and adolescents. It has five scales: Emotional symptoms, Conduct problems, Hyperactivity-attention, Peer problems, and Prosocial behavior. The Total problem score is the sum of the four problem scales (excluding Prosocial behavior); higher scores (range 0–40) reflect more problems. The Total problem score was used in the current study to determine whether children had substantial psychological problems. The SDQ-parent version has good internal reliability (Cronbach’s α for Total problems .81) and concurrent validity (Van Widenfelt et al. 2003). Cronbach’s α of the Total problem score in this sample was .80.

Spielberger State-Trait Anxiety Inventory for Children-trait subscale (STAIC-trait). The STAIC trait subscale (Spielberger et al. 1973) has 20 items and measures trait anxiety in children aged 7–14. The adult version of the scale (STAI-trait; Spielberger 1983) was used in the current study with children aged 15 years and older. Scores range from 20 to 60 on the STAIC-trait and from 20 to 80 on the STAI-trait; on both scales, higher scores reflect higher levels of trait anxiety. The STAIC-trait and STAI-trait have been widely used and have satisfactory psychometric properties (for the Dutch versions, see Bakker et al. 1989 and Van der Ploeg 2000, respectively). Cronbach’s alphas in the current study were .88 for the STAIC-trait and .89 for the STAI-trait.

Revised Child Anxiety and Depression Scale-child version (RCADS). The RCADS (Chorpita et al. 2000) is a 47-item self-report questionnaire that measures anxiety and depression symptoms in children and adolescents. Higher scores (range 0–141) reflect more symptoms. The RCADS has good internal reliability (Cronbach’s alphas of .73–.82), moderate to good 1-week test–retest reliability, and good convergent and discriminant validity (Chorpita et al. 2000; Chorpita et al. 2005). Cronbach’s α in the current sample was .95.

Children’s Depression Inventory (CDI). The CDI (Kovacs 1992) was included in the current study to explicitly measure depressive symptoms. It is a 27-item self-report questionnaire (range 0–54), with higher scores reflecting more depressive symptoms. The CDI has demonstrated adequate to good psychometric properties (Kovacs 1992). The Cronbach’s α in the current sample was .85.

Data Analysis and Overall Procedure

After parents and children had received written information about the study, informed consent was obtained from all parents and children. In agreement with the participating schools and the Ethics Committee of the Clinical Psychology department (University of Amsterdam), the majority of parents (70.6%) gave passive consent. Parents completed the SDQ and a demographics questionnaire. The CATS-N/P, STAI(C)-trait, RCADS, and CDI were administered to the children in their classrooms under the supervision of research assistants. Completing all questionnaires took children about 40 min. A subsample of 139 children who had given informed consent completed a second CATS-N/P to examine test–retest reliability. Because the starting date of the summer vacation varied between schools, some children completed the second CATS-N/P at school and others at home. Although children were reminded twice (in writing and by telephone) to return the questionnaires as soon as possible, the time until return of the retest questionnaire varied considerably (range 7–21 weeks after the initial administration, M = 9.66, SD = 2.36). Therefore, we divided the retest data into two groups based on the median retest interval (9 weeks). After the study, participants were informed about the results through an article in the school paper.

Confirmatory factor analysis (CFA) was performed using Amos 16.0 (Arbuckle 2007). We chose a confirmatory over an exploratory analysis given the strong theoretical assumptions about the factor structure based on earlier studies (Schniering and Rapee 2002). A confirmatory analysis allows scale items to be assigned to pre-specified factors (Brown 2006). The internal reliability of the CATS-N/P was calculated with Cronbach’s alpha, and test–retest reliability with Pearson’s r. ANOVAs were used to detect age and sex differences on the subscales of the CATS-N/P.

Results

Confirmatory Factor Analysis

Amos 16.0 was used to determine which of three alternative models provided the best explanation of the data relative to a null model. Each alternative model was based on a theoretical conceptualization and on earlier studies of the factor structure of the CATS (Schniering and Rapee 2002). Model 1 was the original four-factor model found by Schniering and Rapee (2002), containing the four subscales of the original CATS: physical threat, social threat, failure, and hostility. Including this model allowed us to examine whether the original structure of the 40 CATS items could be replicated in a Dutch population. Model 2 was a five-factor model containing the four original CATS subscales and a fifth factor containing positive thoughts. Model 3 was a hierarchical model with one higher-order ‘negative thoughts’ factor and a separate but correlated first-order ‘positive thoughts’ factor. The four first-order factors (physical threat, social threat, failure, and hostility) were allowed to covary and to contribute to the higher-order factor.

Tests of normality showed that the data were not normally distributed: the majority of items showed positive skewness and kurtosis, because most children reported low frequencies of negative thoughts. This is consistent with the type of sample used: non-referred children with low anxiety levels. When data violate the assumption of multivariate normality, normal-theory estimation methods such as maximum likelihood are not appropriate for CFA (Anderson and Gerbing 1988; Brown 2006). Therefore, we used unweighted least-squares (UWLS) estimation based on the correlation matrix (Brown 2006; Schniering and Rapee 2004).

To evaluate model fit, a range of fit indices was used. Many classes of fit indices are available, and different indices are recommended in different situations, depending on estimation method, model parsimony, and sample size (Hu and Bentler 1999). Absolute fit indices assess model fit at an absolute level and indicate the extent to which the observed data match the predicted model of the population (Brown 2006; Hu and Bentler 1998). Of these, we used chi-square, the goodness-of-fit index (GFI), and the adjusted goodness-of-fit index (AGFI). Chi-square should be non-significant and small relative to the degrees of freedom. However, chi-square is very sensitive to sample size and almost always significant in large samples (Bentler 1990; Breckler 1990). GFI and AGFI values greater than .95 indicate good model fit (Kline 2005), although others use a more lenient cut-off of .90 or even .85 (see Schniering and Rapee 2004).

Another class is the relative fit indices, which give an indication of the proportional improvement of the model relative to a more restricted, nested baseline model, usually the ‘null’ model. We used the normed fit index (NFI) and the relative fit index (RFI). NFI and RFI values greater than .90 demonstrate good model fit (Bentler and Bonett 1980).
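The relative fit indices named above are simple functions of the chi-square values and degrees of freedom of the tested model and the null model. The sketch below uses the standard formulas: NFI is the proportional reduction in chi-square (Bentler and Bonett 1980), and RFI applies the same logic to chi-square/df ratios. The input values in the usage lines are invented for illustration and are not results from this study.

```python
def nfi(chi2_null: float, chi2_model: float) -> float:
    """Normed fit index: proportional reduction in chi-square relative to the null model."""
    return (chi2_null - chi2_model) / chi2_null


def rfi(chi2_null: float, df_null: int, chi2_model: float, df_model: int) -> float:
    """Relative fit index: like NFI, but based on chi-square/df ratios."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / ratio_null


GOOD_FIT_CUTOFF = 0.90  # values above .90 indicate good fit (Bentler and Bonett 1980)

if __name__ == "__main__":
    # Illustrative numbers only, not taken from Table 2.
    print(round(nfi(1000.0, 100.0), 3))
    print(round(rfi(1000.0, 45, 100.0, 35), 3))
```

Because RFI divides each chi-square by its degrees of freedom, it penalizes models that gain fit merely by estimating more parameters, which is why it can fall below the cut-off while NFI does not.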

As Table 2 shows, the original factor structure (model 1) was found in our Dutch sample. Fit indices were satisfactory but lower than those found in the original sample (Schniering and Rapee 2002). The fit of model 2, in which the positive items were added to the existing negative items, was reasonable; however, the NFI and RFI were just below the recommended cut-off of .90. The proposed higher-order model (model 3) could not be examined because the fit of the underlying first-order model (model 2) was not fully satisfactory (Brown 2006). Instead, we examined theoretically relevant Modification Indices (MIs) provided by Amos. An MI estimates how much the model chi-square would decrease if a single fixed parameter (e.g., a cross-loading constrained to zero) were freely estimated. MIs may be used when only one theoretically justified parameter is freed at a time (Brown 2006; Kline 2005). After allowing items 13 and 44 to load on both hostility and social threat, and item 34 to load on both hostility and physical threat, the first-order model (model 2a) showed good fit: all fit indices exceeded .94, and the model explained the data better than model 2. All other items were fixed to a single factor, as outlined in Table 1.

Table 2 Goodness of fit indices and comparison of different models

Next, we examined whether a higher-order factor explained the covariation among the first-order factors more parsimoniously (Brown 2006; Marsh and Hocevar 1985). This is only possible when the first-order factors are highly intercorrelated (>.80). Because hostility correlated only moderately with the other first-order factors, it was not included in the higher-order factor. The higher-order model (model 4) therefore included one higher-order factor explaining the first-order factors social threat, physical threat, and failure. The first-order factors hostility and positive thoughts were allowed to correlate with this higher-order factor. As shown in Table 2, model fit was good, with fit indices above .93. The target coefficient T (Marsh and Hocevar 1985) was used to compare the higher-order model with the first-order model. T is the ratio of the chi-square of the first-order model to the chi-square of the more restrictive higher-order model. A target coefficient close to 1 indicates that the higher-order model can effectively explain the correlations between the first-order factors (Marsh and Hocevar 1985). In the current study, the target coefficient was sufficient (T = 0.91). The standardized loadings of the first-order factors on the higher-order factor were all high, ranging from .85 to .97 (see Table 3), and the proportion of variance explained by the higher-order factor was also high (.73–.95).
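The target coefficient and the explained-variance figures can be computed as in the sketch below. T follows the definition in the text (first-order chi-square divided by higher-order chi-square). The explained variance is taken to be the squared standardized loading of a first-order factor on the higher-order factor, the usual interpretation in a standardized solution; squaring the rounded loadings (.85 and .97) only approximately reproduces the reported .73–.95. The chi-square values in the usage line are hypothetical.

```python
def target_coefficient(chi2_first_order: float, chi2_higher_order: float) -> float:
    """Marsh and Hocevar's T: first-order chi-square / higher-order chi-square."""
    if chi2_higher_order <= 0:
        raise ValueError("higher-order chi-square must be positive")
    if chi2_first_order > chi2_higher_order:
        # The higher-order model is more restrictive, so its chi-square should not be smaller.
        raise ValueError("first-order chi-square should not exceed the higher-order chi-square")
    return chi2_first_order / chi2_higher_order


def explained_variance(standardized_loading: float) -> float:
    """Proportion of a first-order factor's variance explained by the higher-order factor."""
    return standardized_loading ** 2


if __name__ == "__main__":
    print(round(target_coefficient(910.0, 1000.0), 2))  # hypothetical chi-square values
    print([round(explained_variance(l), 2) for l in (0.85, 0.97)])
```

Since T can never exceed 1 for properly nested models, values near 1 mean that imposing the higher-order structure costs almost nothing in fit.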

Table 3 Intercorrelations of factors, standardized loadings of first-order factor on higher-order factor and percentage of explained variance by higher-order factor

In order to facilitate comparisons between different studies, both the original total score of the CATS (‘Total negative thoughts’) and the total score of the CATS-N/P without items from the ‘hostility’ and ‘positive thought’ factors (‘Total internalizing negative thoughts’) will be reported in the remainder of this article.

Internal Reliability

Internal reliability was calculated with Cronbach’s alpha for all subscales and the original higher-order negative thought scale. Alphas were all satisfactory to good (Physical threat .84; Social threat .89; Failure .87; Hostility .83; Positive thoughts .86; and Total negative thoughts .94). The alpha for the Total internalizing negative thoughts scale was .94.
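For reference, Cronbach’s alpha for a k-item scale is k/(k − 1) × (1 − sum of item variances / variance of the total score). The sketch below implements this formula with the Python standard library. The data layout (one row per child, one column per item) and the function name are assumptions for illustration, not the software used in the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix (sample variances)."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in rows):
        raise ValueError("all rows must have the same number of items")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    data = [[1, 2, 3], [2, 3, 3], [3, 3, 4], [4, 5, 5]]  # toy data: 4 children, 3 items
    print(round(cronbach_alpha(data), 3))  # 0.964
```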

Intercorrelations between the scales were also calculated (see Table 3). As expected, Social threat, Physical threat, and Failure were highly correlated. Hostility and Positive thoughts showed low to moderate correlations with the other first-order factors and moderate correlations with the higher-order factor.

Age and Gender Differences

To examine age and gender differences in scores on the CATS-N/P, ANOVAs were carried out for each subscale and for the two total scores. A Bonferroni correction for these seven tests was applied to avoid inflation of the type I error rate (α = .05/7 ≈ .0071). Normative data for each (sub)scale are presented in Table 4. Significant main effects for age were found for the Total internalizing negative thoughts scale, F(1, 550) = 22.37, P < .001, Total negative thoughts, F(1, 550) = 16.35, P < .001, Physical threat, F(1, 550) = 12.75, P < .001, Social threat, F(1, 550) = 14.75, P < .001, and Failure, F(1, 550) = 27.53, P < .001. Younger children (aged 8–11 years) reported more negative thoughts on these (sub)scales than adolescents (aged 12–18 years).
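The Bonferroni-adjusted threshold follows from dividing the family-wise α of .05 by the seven ANOVAs (five subscales plus two total scores). A minimal sketch; the function names and the p-values in the test are hypothetical.

```python
from typing import Dict

FAMILY_ALPHA = 0.05
N_TESTS = 7  # five subscales + two total scores


def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / n_tests


def significant_after_bonferroni(p_values: Dict[str, float],
                                 family_alpha: float = FAMILY_ALPHA) -> Dict[str, bool]:
    """Flag each test whose p-value falls below the Bonferroni-adjusted alpha."""
    threshold = bonferroni_alpha(family_alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


if __name__ == "__main__":
    print(round(bonferroni_alpha(FAMILY_ALPHA, N_TESTS), 4))  # 0.0071
```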

Table 4 Means (and SD) for the total sample and separate for different age levels and gender

A significant main effect for sex was found on the Hostility scale, F(1, 550) = 35.18, P < .001, and the Positive thoughts scale, F(1, 550) = 13.52, P < .001, indicating that boys reported more hostile but also more positive thoughts than girls.

Test–Retest Reliability

A subsample of 139 children (25.1%) completed a second CATS-N/P. These children were comparable to the total community sample in terms of sex (χ²(1) = 0.54, P > .05), mean scores on the CATS-N/P at T1, and mean scores on anxiety measures at T1 (all Ps > .05). However, children who participated in the retest were slightly younger (M = 12.13) than children who did not (M = 12.69), t(552) = 2.65, P < .05. Test–retest reliability of all factor scores and total scores at 7–9 weeks (n = 91, 65.5%) and at 10–21 weeks (n = 48, 34.5%) is reported in Table 5. Test–retest reliability at 7–9 weeks was satisfactory (Pearson’s r = .62–.77). Mean scores at the first and second administration differed only for Total negative thoughts (M = 32.05 at T1 and M = 28.04 at T2, t(90) = 2.41, P < .05) and Total internalizing negative thoughts (M = 19.98 at T1 and M = 16.07 at T2, t(90) = 2.17, P < .05). Test–retest reliability at 10–21 weeks was moderate to good (Pearson’s r = .40–.62). Mean scores at the first and second administration differed only for Total negative thoughts (M = 26.46 at T1 and M = 21.02 at T2, t(47) = 2.33, P < .05) and Social threat (M = 6.88 at T1 and M = 5.23 at T2, t(47) = 2.05, P < .05).
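The two statistics used in this paragraph, Pearson’s r between T1 and T2 scores and a paired t-test on the T1 minus T2 difference (df = n − 1), can be sketched with the Python standard library as follows. The toy scores are invented, and the sketch does not compute p-values, which would require a t distribution (e.g., SciPy’s `scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Test-retest correlation between paired T1 and T2 scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two pairs")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(t1: Sequence[float], t2: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic for the mean T1 - T2 difference, with df = n - 1."""
    if len(t1) != len(t2) or len(t1) < 2:
        raise ValueError("need two equally long sequences with at least two pairs")
    diffs = [a - b for a, b in zip(t1, t2)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


if __name__ == "__main__":
    t1 = [3, 5, 4, 6]  # toy T1 scores
    t2 = [2, 4, 4, 4]  # toy T2 scores
    print(round(pearson_r(t1, t2), 3))
    print(paired_t(t1, t2))
```

The two statistics answer different questions: r reflects whether children keep their rank order over time, whereas the paired t-test detects a shift in the mean level, as observed here for the Total scores.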

Table 5 Test–retest correlations of the CATS-N/P for 7–9 weeks and 10–21 weeks

Relationship with Symptom Measures

The convergent and discriminant validity of the CATS-N/P was examined using Pearson’s correlations with self-report and parent-report measures of anxiety and emotional disturbance (Table 6). The mean score was 25.24 on the RCADS (SD = 17.43, n = 542) and 7.74 on the CDI (SD = 6.07, n = 548). The mean trait anxiety score was 34.90 on the STAI-trait (SD = 8.71, n = 209) and 30.49 on the STAIC-trait (SD = 7.03, n = 343). The mean SDQ Total problem score was 6.75 (SD = 5.13), and 41 children (10.2%) scored in the clinical range. All correlations were significant and in the expected direction. However, the correlations did not differentiate between anxiety and depression measures.

Table 6 Intercorrelations of the CATS-N/P scales and measures of anxiety and emotional disturbance

Discussion

In this study we investigated an adapted version of the Children’s Automatic Thoughts Scale, the CATS-Negative/Positive (CATS-N/P), in a non-referred community sample. In order to enhance the applicability of the CATS in studying the cognitions of children with different disorders, we added an extra subscale containing positive thoughts. The main research question of the current study was whether the factor structure and the psychometric properties of the new CATS-N/P were satisfactory.

Although some modifications to the factor structure were made, the factor structure and internal reliability of the CATS-N/P in the current sample closely matched the results of earlier studies using the CATS (Schniering and Rapee 2002, 2004). Although the fit indices found in the current study were slightly lower than in earlier studies (Schniering and Rapee 2002, 2004), the original four-factor structure of the CATS was supported in a Dutch sample. The major difference between the findings of the current study and earlier results was the addition of an extra ‘positive thoughts’ factor. In addition, three Hostility items were found to cross-load on other factors. Closer examination revealed that these cross-loadings had face validity: the three items were all negative thoughts about hostility directed at the child (e.g., “Most people are against me”). In contrast, the remaining seven Hostility items described hostility directed at other people and focused on revenge or on other people being bad or stupid (e.g., “Bad people deserve to be punished”). Another difference from earlier results is that in the current study the Hostility factor correlated only modestly with the other factors. Therefore, it was not appropriate to use a higher-order factor including the Hostility factor or to calculate a Total score including Hostility items. Indeed, earlier studies also found the Hostility factor to display the lowest intercorrelations and factor loadings relative to the other factors (Schniering and Rapee 2002, 2004). However, the three aforementioned items were not found to be problematic in earlier studies (Schniering and Rapee 2002, 2004). The results regarding the Hostility items in this study may reflect a shift in the underlying structure caused by the addition of the positive items. 
Another possibility is that, owing to the translation, children interpreted the items slightly differently than in the English version. Based on the current findings, and to facilitate comparisons between studies using the CATS, we recommend calculating and reporting two negative-thought Total scores: one with and one without the Hostility items. This is in line with findings from other studies using the CATS, in which children with internalizing disorders (anxiety and depression) reported more negative thoughts concerning Physical threat, Social threat, and Failure, whereas children with behavior disorders scored higher on Hostile intent (Schniering and Rapee 2002).

The new positive items showed good internal reliability and high factor loadings. As expected, the Positive thoughts scale correlated negatively with almost all other subscales. However, it was positively correlated with the Hostility scale. A possible explanation is that both types of items share a common feature: assertive, extraverted, or externalizing thoughts. Indeed, the higher-order model found in the current study suggests that the CATS-N/P contains three types of items: items reflecting negative internalizing self-statements (comprising separate social threat, physical threat, and failure items), items reflecting negative externalizing self-statements (hostility), and items reflecting positive self-statements.

The second research question in this study was whether the test–retest reliability of the CATS-N/P was satisfactory. Although the short-term reliability of the CATS-N/P was good and comparable to earlier results with the CATS (Schniering and Rapee 2002), reliability over a longer interval was only moderate to good. However, several practical constraints may have influenced the results of the test–retest analysis. First, the retest took place across a broad time frame, which makes the results difficult to interpret. Second, because of the school holidays, some children completed the retest at home without supervision. Administering the measure in different settings might have influenced the answers (e.g., at home there was less group pressure). Moreover, during the school holidays children were probably less exposed to threatening events, potential failure, and hostility from peers than on normal school days, which may have temporarily reduced the occurrence of their negative thoughts. Third, the retest was subject to selection bias: children did not automatically take part in the second part of the study but had to give separate consent for the retest. Given these constraints, the stability of the CATS-N/P over longer periods of time should be investigated further.

The third research question explored in this study was whether the convergent and discriminant validity of the CATS-N/P was satisfactory. As expected, the Positive thoughts subscale was negatively associated with measures of anxious and depressive symptoms. Furthermore, the Physical threat, Social threat, Failure, and Total scores all correlated highly with the anxiety and depression measures, whereas Hostility correlated only moderately with them. This result is in line with previous findings that Hostility distinguished between children with internalizing problems and children with behavior problems (Schniering and Rapee 2002). The low correlations between the CATS-N/P subscales and the emotional symptoms subscale of the SDQ may be explained by the fact that the SDQ measures global emotional problems. Moreover, the SDQ was completed by parents, whereas the CATS-N/P was completed by the children themselves.

Unexpectedly, the CATS-N/P correlated equally strongly with the anxiety and the depression measures. This result contrasts with earlier research, which found that the CATS Failure subscale discriminated between depressed and anxious children (Schniering and Rapee 2002). In the current study, however, the validity of the CATS-N/P was assessed through correlations between its subscales and different symptom measures in a non-referred sample, rather than through differences between clinically anxious, depressed, or behaviorally disturbed children. The current sample may therefore have been too homogeneous to reveal differences between anxiety and depression, particularly given that anxiety and depression symptoms are known to be highly correlated (Costello et al. 2003). Of course, another possibility is that the CATS-N/P is not able to discriminate between anxiety and depression.

Finally, we examined age and gender differences in mean scores on the CATS-N/P subscales. First, mean scores were substantially lower overall (by about 8 points for the Total score and 2 points for each subscale) than in the community sample described by Schniering and Rapee (2002). Although the samples in the current study and in Schniering and Rapee (2002) appear comparable in terms of age, gender, and socioeconomic status, the difference in mean scores might be explained by cultural differences. Moreover, the translation and the extra positive items might have influenced response style. However, the means found in the current study were higher (by about 11 points for the Total score) than those found in a Dutch control sample described by Bodden and Bögels (2006). This difference cannot be attributed to translation, but interregional variation and the smaller sample used by Bodden and Bögels may account for it. Large differences in mean scores between studies have also been found in clinically anxious groups (e.g., Bodden and Bögels 2006; Schniering and Lyneham 2007; Schniering and Rapee 2002). Further research should investigate whether the CATS and CATS-N/P are particularly sensitive to sample characteristics.

The age differences in this study were rather unexpected. In contrast with earlier results (Bodden and Bögels 2006; Schniering and Lyneham 2007), younger children reported more negative thoughts than older children on some scales. Although one would expect older children to worry more about social threat and failure, the group setting in which the data were collected may have led to underreporting of such thoughts because of social concerns. As for gender differences, the results of the current study are similar to those of earlier studies: boys reported more hostile thoughts and more positive thoughts than girls. These findings may reflect that boys in general display more externalizing behavior (Costello et al. 2003) and are more self-confident than girls (Birndorf et al. 2005).

Because cross-cultural differences were found in overall mean scores and in differences between age groups, the generalizability of the CATS and CATS-N/P to other countries and cultures is uncertain. Norm tables from different samples should therefore be interpreted cautiously. Another limitation of this study is that the factor structure and psychometric properties of the measure were not evaluated in a clinical group. Future studies should therefore focus on further establishing the discriminant validity and psychometric properties of the CATS-N/P, for example by comparing non-clinical groups with clinical groups with different disorders (e.g., anxiety disorders, depression, behavior disorders). It would also be worthwhile to examine whether the CATS-N/P can predict treatment change and whether the Positive thoughts subscale can discriminate between clinically anxious and depressed children.

The CATS-N/P is an adapted and innovative version of the CATS designed specifically to measure both positive and negative thoughts in children. This was the first study to apply the CATS-N/P in a large community sample in the Netherlands. The psychometric properties of the new measure were found to be good, and the added positive items formed a psychometrically sound factor. The CATS-N/P can therefore be a valuable tool for research into the role of cognitive factors in the development and maintenance of different childhood disorders. Moreover, in clinical settings the CATS-N/P might give clinicians better insight into the amount of dysfunctional thinking a child shows before treatment and into whether cognitions change over the course of treatment (especially after cognitive restructuring).