Introduction

Gender dimorphic behavior is a well-established phenomenon in child development. Previous studies have shown that most children begin to develop a gender stereotyped pattern of toys and activity choices by age three, and exhibit marked gender-related behaviors and interests by middle childhood (Beere, 1990; Lytton & Romney, 1991). Exploring children’s gender-related behavior in middle childhood will enrich our knowledge of normative gender development and provide information for understanding atypical gender development. Despite a number of studies which have investigated this phenomenon in Western societies (e.g., Elizabeth & Green, 1984; Sandberg & Meyer-Bahlburg, 1994), scarce literature exists on gender-related behavior issues among Chinese children.

Four parent-report questionnaires are available for assessing a range of children’s gender-related behaviors, and they have shown consistent reliability and validity (for reviews, see Bailey & Zucker, 1995; Zucker, 2005). These instruments are easy to administer with large samples and can display low levels of response bias (Meyer-Bahlburg, Sandberg, Dolezal, & Yager, 1994a; Meyer-Bahlburg, Sandberg, Yager, Dolezal, & Ehrhardt, 1994b). Among the four questionnaires, two provide normative data collected on non-clinical school-aged children: the Child Behavior and Attitude Questionnaire (CBAQ) (Bates, Bentler, & Thompson, 1973) and the Child Game Participation Questionnaire (CGPQ) (Bates & Bentler, 1973). The present study aimed to modify the CBAQ and CGPQ to construct a measure that can be used to assess Chinese children’s gender-related behaviors.

The CBAQ was initially developed for boys only to measure gender conformity or nonconformity by means of parents’ reports of their sons’ participation in boy-typical and atypical behaviors, their relations to peers and parents, and the occurrence of behavior problems considered more frequent for boys than for girls. The final form of the original CBAQ contained 55 items. An example item is “He swishes and swings his hips when he walks.” Parents rate the frequency of the occurrence of such behaviors either on a 5-point Likert scale from 1 (Never) to 5 (Always) or an 8-point Likert scale from 1 (Once every 6 months or less) to 8 (Daily).

The CBAQ was modified by Meyer-Bahlburg et al. (1994b). They created a 71-item CBAQ for boys by adding 16 gender items and also designed a comparable 68-item scale for girls. The full questionnaire was administered to a community sample of American parents of children aged 6–10 years. Through factor analysis, the items were reduced to 35 distributed over four subscales: (1) Femininity (FEM), (2) Cross-Gender, Boys and Girls (CG-A), (3) Cross-Gender, Boy only (CG-B), and (4) Cross-Gender, Girl only (CG-C). The subscales showed high internal consistencies and yielded significant differences between boys and girls and large effect sizes.

The original CGPQ (Bates & Bentler, 1973) was designed to differentiate boys with gender identity disorder from gender typical boys by virtue of items assessing game-playing behavior. Items on the CGPQ cover 64 children’s games, such as “Play with dolls” and “Indian wrestling.” Parents answer “yes” or “no” based on whether their children regularly participated in a specific game. To better screen school-age children with gender problems, Meyer-Bahlburg et al. (1994a) later incorporated five items and constructed four subscales through factor analysis: (1) CGPQ-A, a bipolar Gender scale, (2) CGPQ-B, a Preschool scale, (3) CGPQ-C, a Masculinity scale, and (4) CGPQ-D, a Femininity/Preschool scale. The modified CGPQ consisted of 52 items, and all four scales showed acceptable internal consistency and significant gender differences (Meyer-Bahlburg et al., 1994a).

Later research using CBAQ and CGPQ has shown significant gender differences (Bailey, Bechtold, & Berenbaum, 2002; Meyer-Bahlburg et al., 1994a, b) and good discriminant validity (Berenbaum & Snyder, 1995; Hines, 2004). The CBAQ and CGPQ can, therefore, be used to measure gender-related behavior in the general population and to screen gender non-conformity in epidemiological studies.

Another desirable feature of the two gender scales, the CBAQ-FEM and the CGPQ-A, is the insensitivity of the scale scores to some demographic variables. Participants’ ethnicity and parents’ education did not have any significant effects on children’s scale scores (Meyer-Bahlburg et al., 1994a, b). The instruments may well be appropriate for use in cross-cultural research involving children with different ethnic and socioeconomic backgrounds. It should be noted that both the CBAQ and CGPQ were modified by the same group of researchers (Meyer-Bahlburg et al., 1994a, b), and the constructs and contents of the two instruments are very similar. Therefore, it is reasonable to combine items from the two instruments and modify them to yield one screening tool to identify children who exhibit gender atypical behaviors, including in the non-clinical range.

In China, studies on children’s gender-related behavior have been mostly qualitative and have mainly focused on clinic-referred children with gender problems. Researchers often assess children’s gender typical or atypical behavior with direct observation, interviews with teachers or parents, and self-developed questionnaires (Liang, Li, & Huang, 2006; Shang Guan, 2000; Shi, 2005). Consistent with Western findings, Chinese children begin to exhibit observable sex-dimorphic behaviors by the second year of their lives; girls prefer soft toys and tend to be sedentary, whereas boys like cars and trucks and tend to be more active (He, 2005). Between 2 and 3 years of age, play becomes gender-segregated (Xing & Jia, 2001). Children appear to be most rigidly gender-stereotyped at age four to five, and boys are overall more rigid than girls. By age seven, children’s gender stereotypes become more flexible (Du & Su, 2005). Since the Pre-School Activities Inventory (PSAI) was first introduced and validated for use with Chinese preschool children (Golombok & Rust, 1993a, b), more quantitative findings have been reported (Du, Su, & Li, 1995). For school-aged Chinese children, however, there is still neither a standardized measure for the assessment of gender role behavior nor any prevalence data about school-aged children’s gender dimorphic behaviors in representative community samples.

Gender is culturally defined and there are variations across cultures and historical epochs in terms of behaviors or traits defined as masculine or feminine. Any measures of gender development and gender atypicality should be consistent with cultural definitions of masculinity and femininity in the time and place they are used (Wolff & Watson, 1983). Behaviors, attitudes, and personality traits designated as stereotypically masculine or feminine in China may be different from those of Western countries. For example, talkativeness may be regarded as a feminine trait in America but as gender-neutral in China. Therefore, before using any Western measure in Chinese society, it is necessary to ensure the accuracy of translation, as well as to re-examine the psychometric characteristics of the translated instrument.

The CBAQ and CGPQ scales appear to be robust instruments—psychometrically sound, capable of being used with clinical and non-clinical samples regardless of ethnicity and parents’ education. However, the research on these instruments is largely North American. They have not been systematically tested with a Chinese sample. China has the largest population in the world, with over 200 million children under 12 years old (National Bureau of Statistics of China, 2006). Thus, a validated Chinese version of such instruments would be of great value for both researchers and clinicians. The purpose of the present study was to modify the CBAQ and CGPQ into a new measurement tool, namely the Child Play Behavior and Activity Questionnaire (CPBAQ), for the assessment of children’s gender typical behaviors in Chinese culture, and then to test the psychometric properties of the CPBAQ.

Method

Participants

Participants were sampled from three elementary schools in Hefei, the capital city of an eastern Chinese province. With the assistance of the local Education Bureau, three standard elementary schools were randomly selected. All children studying in the first through sixth grade were eligible for inclusion in the study. Among them, 40 boys and 40 girls were randomly selected from each grade in each school, and their parents were asked to serve as informants. A total of 1,440 questionnaires were distributed. Parents of 246 children refused to participate, and 291 questionnaires were incomplete and were therefore excluded from the analyses. Complete data were obtained from parents of 903 children: 486 boys aged 6–12 years (M = 9.43, SD = 1.62) and 417 girls aged 6–12 years (M = 9.50, SD = 1.71).

Procedure

With approval by the school administration committee, a detailed cover letter from the researcher, a consent form, and a set of questionnaires (with a return envelope) were sent to the parents or guardians of selected children. The cover letter explained the purpose of the study and how the survey would be conducted. Parents were invited to complete the questionnaire and consent form, as well as a background information sheet, and then seal them in the return envelope and send them the next day to the researcher’s temporary and locked mailbox in their children’s school. The researcher collected all envelopes in the following week. The parents were assured that survey information was confidential and only aggregate data would be made available to the schools. Parents took approximately 30 minutes to complete the questionnaire.

Measures

Step 1: Item Pool Construction

All 52 items of the CGPQ and 35 items of the CBAQ were translated into Chinese and back-translated to ensure comparability with the English versions. In addition to these items, 14 gender-related items (Footnote 1) were selected from a list of Chinese folk games for children (Wang, 2000). The selection of these new items was to expand the existing items to cover domains commonly considered as markers for gender-stereotyping: peer preferences, role in fantasy play, relations to peers, and relations to parents (Zucker, 2005).

For the CGPQ, 13 of the 52 items were removed because the games either did not exist or were uncommon in Chinese children’s daily activities. Among the 14 newly incorporated items, nine were similar in content to existing CBAQ or CGPQ items (e.g., a Chinese item “Nose, mouth, eye” is similar to an original CGPQ item, “Simon says”). Based on findings from Wang (2000) and the researchers’ own knowledge, another five children’s activities, each apparently gender-dimorphic in Chinese culture (“Play with plasticene,” “Put handkerchief behind you,” “Hawk catches chicken,” “Sliding board,” and “Wushu”) were included.

As mentioned earlier, some CBAQ items were designed to be answered by members of one gender group only. In the present study, the items of CG-B (Cross-Gender, Boy only) and CG-C (Cross-Gender, Girl only) were rewritten so that parents of either boys or girls could answer. For example, the item “He does things with female relatives” was rewritten as “He (She) does things with female relatives.”

In addition, several duplicate or similar items in the CBAQ and CGPQ were replaced or modified by either combining the two items or employing two forms. For example, “Play with dolls” in the CGPQ was replaced by “Play with girlish dolls, such as Barbie doll” and “Play with boyish dolls, such as Robot” (two items in CBAQ); “Soccer” and “Basketball” were combined as “Soccer or Basketball”; “Wrestling” and “Wushu” were combined as “Wrestling or Wushu.”

In summary, the ultimate item pool comprised 100 items covering gender-related play preference, behavior, attitude, and relation to other people. The items were classified into three groups based on wording and specific contents: (1) Boy typical items, in which the behavior described is performed more frequently by boys than by girls (e.g., “He (She) imitates male characters on TV or in the movies”); (2) Girl typical items performed more frequently by girls than boys (e.g., “He (She) uses feminine gestures with hands when talking”); (3) Cross gender items, specifically describing a cross gender behavior and applied to both boys and girls (e.g., “He (She) has stated the wish to be a girl (boy) or a woman (man)”). For each item, parents rated the frequency of occurrence on a 5-point Likert scale (“Never” = 1, “Occasionally” = 2, “Sometimes” = 3, “Usually” = 4, and “Always” = 5).

Step 2: Identification of Gender-Stereotyped Items

Items displaying a gender difference were identified as follows. First, parents’ responses on the 5-point Likert scale were defined as dependent variables. Gender differences in play were assessed using a series of multiple regression analyses (Norusis, 2000) by entering the subject’s sex in the second block of the regression model after statistically controlling the influence of subjects’ age. Participant sex was coded −1 for boys and +1 for girls. Thus, a negative regression coefficient signified greater male participation relative to girls in an activity.
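
The item-level analysis described above can be illustrated with a minimal sketch. It assumes a single item rated on the 5-point scale, age entered first, and sex coded −1 for boys and +1 for girls as in the text. The function names and the toy ratings are hypothetical and are not taken from the study data; the sketch uses the equivalence between entering sex after age and regressing age-residualized ratings on age-residualized sex, which is not necessarily how the original analysis was run.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def residualize(y, x):
    """Residuals of y after a simple least-squares regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]


def sex_effect_controlling_age(item, age, sex):
    """Return (b_sex, partial_r) for one item.

    b_sex is the coefficient of sex in the regression item ~ age + sex;
    partial_r is the partial correlation of item with sex, holding age constant.
    """
    item_res = residualize(item, age)
    sex_res = residualize(sex, age)
    sxy = sum(s * i for s, i in zip(sex_res, item_res))
    sxx = sum(s * s for s in sex_res)
    syy = sum(i * i for i in item_res)
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Hypothetical toy data: six boys (sex = -1) and six girls (sex = +1).
age = [6, 7, 8, 9, 10, 11, 6, 7, 8, 9, 10, 11]
sex = [-1] * 6 + [1] * 6
toy_gun = [5, 4, 5, 4, 4, 3, 1, 2, 1, 1, 2, 1]  # invented 1-5 ratings

b_sex, partial_r = sex_effect_controlling_age(toy_gun, age, sex)
print(b_sex < 0)  # a negative coefficient means greater participation by boys
```

With this coding, a negative `b_sex` marks a boy typical item and a positive one a girl typical item, which matches the sign convention stated in the text.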

Step 3: CPBAQ Scale Construction

A new item pool was created based on the results of the regression analyses, consisting of three parts: (1) gender typical game items (boy typical and girl typical games which showed the greatest gender differences, mainly from CGPQ items); (2) gender typical behavior and activity items (which displayed statistically significant gender differences, mainly from CBAQ items); (3) cross gender description items (items from the cross gender scale for boys and girls in CBAQ). The new item pool was subjected to an exploratory principal-axis factor analysis using the sample described earlier (N = 903). Because factor structures depend on intercorrelations, which are influenced by the variability of the individual variables, a forced three-factor extraction with varimax rotation was performed in order to maximize variance of the data set and to simplify the interpretation of the resulting scales. For the scales derived from the results of the factor analysis, gender effect sizes were calculated based on Cohen (1988).
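
As an illustration of the effect size step, the sketch below computes a standardized mean difference between girls' and boys' scale scores. The text cites Cohen (1988) without giving the exact formula, so the pooled-standard-deviation form of Cohen's d used here is an assumption, and the scores are invented for the example.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scale scores (not study data).
girls = [4.0, 3.5, 4.5, 4.0, 3.0]
boys = [2.0, 1.5, 2.5, 2.0, 3.0]

d = cohens_d(girls, boys)
print(d > 0.8)  # 0.8 is Cohen's conventional threshold for a large effect
```

Swapping the two groups only changes the sign of d, so the direction of the difference has to be reported alongside the value.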

Results

Identification of Gender-Stereotyped Items

Of the 52 game items on the questionnaire, 37 items showed statistically significant gender differences when participant’s age was controlled (Footnote 2). The gender-related games were categorized into two groups: Girl typical games, in which girls’ participation exceeded boys’, or Boy typical games, in which boys’ participation exceeded girls’. Table 1 presents the 19 girl-typical items and 18 boy-typical items, and the correlation coefficient for each item. In the table, the game items are ordered within each group from the smallest to the largest gender difference based on multiple regression. Statistically significant partial correlations between CGPQ item and subject sex ranged in absolute value from .09 (“Plays with kite”) to .69 (“Plays toy gun” and “Plays soccer or basketball”). The six girl typical games and six boy typical games with the largest regression coefficients were selected for inclusion in the final item pool.

Table 1 Gender differences for game items

Apart from the 37 game items, statistically significant gender differences were found on 17 out of 35 other gender-related behavior items in the questionnaire, which concerned children’s play patterns (e.g., “Plays with girls at school”), interests (e.g., “Likes real automobile”), mannerisms (e.g., “Use feminine gestures with hands when talking”), and peer relations (e.g., “Popular among boys”). Among the 17 items, 10 items were identified as girl typical behavior (girls’ participation significantly exceeded boys’) and seven items as boy typical behaviors (boys’ participation significantly exceeded girls’). Table 2 presents these items ordered within each group by gender differences. Statistically significant partial correlations between item and sex ranged from .22 (“Play-acts, puts on little dramas”) to −.76 (“Plays with boys at school”).

Table 2 Gender differences for behavior and activity items

Table 3 shows the cross-gender behavior items. For half of the items, there was no significant gender difference, suggesting that these cross-gender behavior patterns were equally common in both boys and girls. However, these items were retained in the new item pool because the content of the 10 items pertained to the core phenomenology of gender identity disorder (Meyer-Bahlburg et al., 1994b). Inclusion of these items would contribute to making the scale useful for clinical application.

Table 3 Gender differences for cross-gender behavior items

Factor Analysis

Factor analyses were limited to the new 39-item pool based on the previous results. The new item set included the 12 gender typical games, the 17 other gender typical behaviors and attitudes, and the 10 cross gender description items. The unrestricted solution yielded six factors which explained 64.4% of the variance. The first three factors, explaining 21.8%, 20.2%, and 10.8% of the variance, respectively, were interpretable in relation to comprehensive gender scales. Factor loadings of the remaining factors were quite small and were not included in the subsequent analyses, which were restricted to a forced three-factor solution.

The forced three-factor extraction, unrotated, resulted in a bipolar Factor 1 (Table 4). Twenty-four items with absolute loadings above .40 were selected, based on the criterion used in developing the original CGPQ and CBAQ (Meyer-Bahlburg et al., 1994a, b). The items with positive loadings were all girl typical behaviors (i.e., significantly more frequently endorsed for girls), and those with negative loadings were all boy typical behaviors. After varimax rotation, Factor 1 was unipolar, containing 14 items with loadings above .40, all girl typical behaviors. The varimax-rotated Factor 2 was also unipolar: 12 items had loadings above .40, all boy-typical behaviors. Factor 3 was also unipolar, and included six items with loadings above .40, all cross-gender behaviors. Selected items and their loadings for each factor are shown in Table 4.
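
The selection rule for a bipolar factor can be sketched as follows. The item labels and loading values are hypothetical placeholders, not the loadings reported in Table 4; only the .40 cutoff comes from the text.

```python
def select_items(loadings, threshold=0.40):
    """Split items on one factor into positive and negative poles.

    `loadings` maps an item label to its loading on the factor. Items whose
    absolute loading does not exceed the threshold are dropped.
    """
    positive = [item for item, value in loadings.items() if value > threshold]
    negative = [item for item, value in loadings.items() if value < -threshold]
    return positive, negative


# Hypothetical loadings on an unrotated bipolar factor (invented values).
factor1 = {
    "plays_with_girlish_dolls": 0.62,
    "uses_feminine_gestures": 0.45,
    "plays_toy_gun": -0.58,
    "plays_soccer_or_basketball": -0.66,
    "plays_with_kite": 0.12,
}

girl_typical, boy_typical = select_items(factor1)
print(girl_typical, boy_typical)
```

On a unipolar rotated factor the same function applies, and the negative list is simply expected to come back empty.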

Table 4 Factor loadings for CPBAQ items

Based on the factor analyses, four interpretable scales (Footnote 3) were constructed, incorporating items with loadings above .40, in unit-weighted fashion. The first one was a bipolar Gender Scale (GS), incorporating 24 items loading above .40 in absolute value on the unrotated Factor 1, with negative loadings for boy-typical items. The GS measured how feminine a child was. The second scale was a Cross-Gender Scale (CGS), incorporating six items loading significantly on the stable Factor 3 (invariant across several variations of the factor analysis) and assessing children’s gender nonconforming behavior. The other two unipolar scales were the 14-item Girl Typicality Scale (GTS, varimax-rotated Factor 1) and the 12-item Boy Typicality Scale (BTS, varimax-rotated Factor 2), measuring girl-typical and boy-typical play behavior and attitudes, respectively.
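
A minimal sketch of unit-weighted scoring follows, assuming that each retained item receives a weight of +1 or −1 according to the sign of its loading and that the scale score is the weighted sum of the 1 to 5 ratings. Whether the published scales use sums, means, or reverse-scored items is not stated in the text, so this is one plausible reading. The item labels and ratings are hypothetical.

```python
def unit_weighted_score(responses, weights):
    """Sum of item ratings, each multiplied by its unit weight (+1 or -1)."""
    return sum(weight * responses[item] for item, weight in weights.items())


# Hypothetical two-item-per-pole versions of the scales.
GTS_WEIGHTS = {"girlish_dolls": 1, "feminine_gestures": 1}
BTS_WEIGHTS = {"toy_gun": 1, "soccer_or_basketball": 1}
# Bipolar Gender Scale: girl typical items count positively,
# boy typical items negatively, so higher scores mean more feminine.
GS_WEIGHTS = {**GTS_WEIGHTS, **{item: -1 for item in BTS_WEIGHTS}}

# One parent's invented ratings on the 5-point scale.
ratings = {
    "girlish_dolls": 4,
    "feminine_gestures": 5,
    "toy_gun": 1,
    "soccer_or_basketball": 2,
}

gts = unit_weighted_score(ratings, GTS_WEIGHTS)
bts = unit_weighted_score(ratings, BTS_WEIGHTS)
gs = unit_weighted_score(ratings, GS_WEIGHTS)
print(gts, bts, gs)  # 9 3 6
```

In this toy version the bipolar score equals the girl typicality score minus the boy typicality score, because the two unipolar scales share no items.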

Psychometric Properties

Internal consistencies and gender effect sizes of the CPBAQ scales are shown in Table 5. The gender effect sizes were very large for three of the four scales. Cronbach’s α coefficients of the original CBAQ and CGPQ scales (Meyer-Bahlburg et al., 1994a, b) are also shown in Table 5 to facilitate comparison with those of CPBAQ. Table 6 presents the intercorrelations among the four CPBAQ scales for boys and girls separately.
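
For readers unfamiliar with the internal consistency index, the sketch below computes Cronbach's α from a respondents-by-items table using the standard variance formula. The data are invented and deliberately extreme (perfectly consistent items), so they do not reflect the values in Table 5.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: three items that rank respondents identically.
perfectly_consistent = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
    [5, 5, 5],
]

alpha = cronbach_alpha(perfectly_consistent)
print(round(alpha, 6))  # 1.0, the maximum, because the items covary perfectly
```

For a bipolar scale such as the GS, items on the negative pole would need to be reverse-scored before applying this formula, otherwise their negative covariances lower α.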

Table 5 Internal consistencies and effect sizes of CPBAQ, CBAQ and CGPQ

Table 6 Inter-correlations among the four CPBAQ scales

Discussion

Gender role development is an important part of children’s socialization. Despite numerous Western studies on children’s gender-related behaviors and personality traits, very few studies have addressed this issue among Chinese children. This is particularly true of quantitative research. This study was the first to adapt two questionnaires developed in the U.S., the CBAQ and the CGPQ, for assessing children’s gender-related behaviors in a sample in China.

The items in CGPQ and CBAQ were earlier viewed as comprising four scales (a bipolar gender scale, a boy typical scale, a girl typical scale, and a cross-gender scale); this view was based upon differential item correlations with the factor-analytically derived scales of a second gender behavior questionnaire (Meyer-Bahlburg et al., 1994a, b). The present study goes beyond these earlier reports by adapting some items in line with Chinese culture and demonstrating gender differences for the majority of the adapted items in a relatively large sample of Chinese boys and girls in middle childhood and in a culture different from the setting in which the original instruments were developed. Through factor analyses on the adapted items, four scales were generated: a Gender Scale (GS), a Girl Typicality Scale (GTS), a Boy Typicality Scale (BTS), and a Cross-Gender Scale (CGS), together constituting the 32-item CPBAQ. The gender effect sizes on the CPBAQ scales were comparable to those of the original CBAQ bipolar Masculinity-Femininity scale (CBAQ-FEM) and CGPQ bipolar Gender scale (CGPQ-A). The prominent gender differences detected by the CPBAQ provide evidence for the validity of this instrument.

We believe that another robust property of the CPBAQ is its gender-nonspecific item language, which makes the questionnaire applicable to both sexes. This is in contrast to two of the scales of the CBAQ, i.e., CG-B (Cross-Gender Scale for Boys Only) and CG-C (Cross-Gender Scale for Girls Only), which incorporate items specific to either boys or girls. As a result, boys’ and girls’ scores on the CPBAQ are comparable, and researchers will be able to contrast gender-related behaviors directly between the two sexes. We believe that they should also be able to investigate developmental trends in behavior (individual behaviors or a set thereof) in both boys and girls. The GTS, BTS, and CGS can be used for screening in epidemiological studies and clinical assessment to tap three sets of gender-related behaviors: girl typicality (feminine behaviors), boy typicality (masculine behaviors), and gender atypicality (cross-gender behaviors). The bipolar GS can be used as a discriminant function derived from an orthogonal pair involving boy typicality and girl typicality. Based on GS scores, researchers might classify children into different categories, such as masculine boy, masculine girl, feminine boy, or feminine girl, for different research purposes. Finally, the comparability of boys’ and girls’ scores on the CPBAQ facilitates analyses for factor analytic and scaling purposes.

The absence of an appropriate instrument for assessing Chinese children’s gender role development has seriously hindered not only gender research among Chinese children but also cross-cultural comparison studies involving China. Such studies might identify the pan-cultural commonalities and cross-cultural dissimilarities that we have reason to believe exist (Low, 1989) and might shed new light on existing theories, assumptions, beliefs, and practices in particular societies (Cross & Madson, 1997). Scholars have proposed that certain unique features of Chinese culture and social background (e.g., the one-child family, Confucianism, and the Cultural Revolution) would very likely exert effects on Chinese people’s gender-related attitudes and expectations as well as children’s gender role formation. For example, the effect of the one-child policy may be to strengthen Chinese parents’ gender expectations for their children, and to increase children’s gender typing, especially for the commonly preferred sex of child (male). On the other hand, the one-child policy has arguably introduced a Western “child-centered” attitude into Chinese child-rearing, particularly among well-educated populations (Chang, Schwartz, Dodge, & McBride-Chang, 2003). Only-children may experience less pressure from parents to adhere to normative gender role behavior. In addition, the absence of siblings may in itself have a direct effect upon children’s gender role development (McHale, Updegraff, Helms-Erikson, & Crouter, 2001). It has been reported that children with few or no siblings in the home are more likely to show gender-egalitarian beliefs (Hertsgaard & Light, 1984; Levy, 1989). However, there is little empirical evidence on these issues in Chinese children based on a standardized instrument for the assessment of gender role (Footnote 4). It is hoped that the CPBAQ developed in this study might provide the means for researchers to investigate Chinese children’s gender role development and accumulate empirical evidence in preparation for future cross-cultural comparisons.

The present study had several limitations. First, the participants were drawn from a single Chinese city, which is unlikely to be representative of the Chinese population overall. Therefore, more nationally representative samples of Chinese children are needed to further validate the CPBAQ for nationwide use. Second, only parent-report data were collected in this study. It remains unclear whether and to what extent the parents’ ratings reflect their children’s actual gender role behaviors rather than the parents’ own gender stereotypes. Future research with the CPBAQ might profitably administer the questionnaire to children themselves or to teachers, allowing comparison with parents’ ratings. Third, this study was based on a community sample. The psychometric properties of the CPBAQ in a clinical sample remain to be tested. In addition, although the criterion group validity of the CPBAQ was demonstrated by the presence of significant gender differences, discriminant validity and concurrent validity were not examined in this study. Future research might examine the relationships among children’s gender role behaviors and other gender-related personality traits in order to test further the validity of the CPBAQ.

Despite the above limitations, we believe the CPBAQ is a questionnaire with satisfactory psychometric properties. Designed to measure gender typical and atypical behavior in Chinese children, it fills a long-standing gap in research on Chinese children’s gender role development.