How Valid Are Measures of Children’s Self-Concept/Self-Esteem? Factors and Content Validity in Three Widely Used Scales

Children’s self-esteem/self-concept, a core psychological construct, has been measured in an overwhelming number of studies, and the widespread use of such measures should indicate they have well-established content validity, internal consistency and factor structures. This study, sampling a demographically representative cohort in late childhood/early adolescence in Dublin, Ireland (total n = 651), examined three major self-esteem/self-concept scales designed for late childhood/early adolescence: Piers-Harris Self-Concept Scale for Children 2 (Piers et al. 2002), Self-Description Questionnaire I (Marsh 1992) and Self-Perception Profile for Children (Harter 1985). It also examined findings in light of the salient self factors identified by participants in a linked mixed-methods study. The factor structure of Piers-Harris Self-Concept Scale was not replicated. The Self-Description Questionnaire I and Self-Perception Profile for Children were replicated only in part although in similar ways. In all three scales, a global/appearance self-evaluation factor accounted for the largest variance in factor analyses. Sport/athletic ability, school ability, school enjoyment, maths and reading ability/enjoyment, behaviour, peer popularity, and parent factors were also identified but did not always reflect existing scale structures. Notably, the factors extracted, or items present in these scales, often did not reflect young people’s priorities, such as friendship over popularity, the importance of family and extended family members, and the significance of incremental personal mastery in activities rather than assessing oneself as comparatively good at preferred activities. The findings raise questions about how self-esteem/self-concept scales are used and interpreted in research with children and young people.


Introduction
Children's self-esteem/self-concept is a core psychological construct that has been measured in an 'overwhelming' number of studies, the great majority employing standardised measures (Kwan et al. 2007). As a fundamental premise of psychological measurement is that instruments should be reliable and valid, accurately reflecting the underlying construct they purport to measure (Anastasi and Urbina 1997), the widespread use of self-esteem/self-concept scales should indicate that they reflect participants' key self factors, and that the factors measured are internally consistent and stable. Yet self-esteem/self-concept scales vary considerably, both in structure and content, raising questions about which scales are more valid. Furthermore, qualitative enquiry has established that children value domains and content not found in these scales (Tatlow-Golden and Guerin 2010, 2017), which are typically adult-devised, involving little or no reported consultation with children (Butler and Gasson 2005). Therefore, there is a case for considering content validity, internal consistency and factor structures of self-esteem/self-concept scales designed for children. This study does so for three major self-esteem/self-concept scales, considering findings in light of the salient self factors that children in the same samples prioritised (Tatlow-Golden and Guerin 2017).
The self literature has long been characterised as a 'morass' and a 'shambles' (Leary 2004, 2006; Rosenberg 1979), replete with 'fuzzy' concepts (Markusen 2003) that are not consistently defined. We use self-concept for the overall self (e.g. Baumeister 1996; Byrne 1996; Rosenberg 1979); self-esteem for self-evaluations (thus self-esteem is a subset of the overall self-concept); global self-esteem for overall self-evaluations; and domain-specific self-esteem (or, e.g., academic self-esteem) for specific domains. Note, however, that others use 'self-concept' for domain-specific self-evaluations, and some scale titles reflect this (Harter 1985; Piers et al. 2002).

Comparing Scale and Subscale Content across Self-Concept Scales
Participants' most salient self-concept factors should be found in self-concept scale domains and items, and indeed scale creators argue that domains are relevant to all or most participants (Harter 1999; Roche and Marsh 1993). However, the domains found in children's self-concept scales vary considerably. Table 1 identifies the domains found in six scales.
Only one domain, peers/popularity, appears in all six scales. School/academic domains are in four scales; parent/family domains in three; two scales contain a dedicated physical appearance domain. Half of all the subscales (9 of 18) appear in one scale only (Significance, Value, Control over Destiny, Resilience, Competence, Anxiety, Happiness, Reading and Maths). Therefore, where global self-esteem is a summed score (as is the case for four of these six scales), it measures different domains, e.g., the Self-Esteem Inventory (SEI, Coopersmith 1967) measures Peer, Parent, School and Personal domains whereas Robson (1989) measures Social, Appearance, Moral-ethical, Competence, Significance, Resilience, Value and Control over Destiny. Two further scales, the Self-Description Questionnaire I (SDQI; Marsh 1992) and Self-Perception Profile for Children (SPPC; Harter 1985) measure global self-esteem with a dedicated subscale rather than by summing all items, but also vary in the other subscales they contain. These varying operationalisations indicate underlying differences in definitions of children's key self-concept domains. In light of these differences, and given the widespread use of these scales in empirical research, this study aims to examine commonalities, differences and factor structures of multidimensional self-esteem/self-concept scales for late childhood/early adolescence.
Note to Table 1: Two (•)s in a column for a scale indicate that items from those two categories combine to form a single subscale, e.g. in the TSCS, physical appearance and physical ability items are in a single physical subscale. Scale abbreviations: SPPC (Self-Perception Profile for Children; Harter 1985); SDQI (Self-Description Questionnaire I; Marsh 1992); PH2 (Piers-Harris Self-Concept Scale for Children; Piers, Herzberg & Harris 2002); SEI (Coopersmith Self-Esteem Inventory; Coopersmith 1967); TSCS (Tennessee Self-Concept Scale; Roid & Fitts 1988); Robson (Robson Self-Esteem Scale, 1989).
Three key scales were selected: the most widely used scale in self-concept research with children (Butler and Gasson 2005), the Piers-Harris Self-Concept Scale for Children 2 (PH2; Piers et al. 2002), and two that have been evaluated as psychometrically superior (Byrne 1996), the Self-Description Questionnaire I (SDQI; Marsh 1992) and the Self-Perception Profile for Children (SPPC; Harter 1985).

Participants
As part of a larger mixed methods study of children's self-concept (Tatlow-Golden and Guerin 2017), participants (n = 651, 10-13 years) in 5th and 6th class, the final two years of primary school, in co-educational National (public) schools across Dublin, Ireland were invited to complete one of the three scales cited above. Demographics for those completing each scale, including those from schools in communities experiencing social disadvantage, are given in Table 2.

Compliance with Ethical Standards
The study was submitted to the Human Research Ethics Committee at University College Dublin and passed required procedures for full ethical review.

Materials
The Self-Description Questionnaire I (SDQI; Marsh 1992), considered the most psychometrically validated self-esteem measure for late childhood/early adolescence (Byrne 1996), employs eight domains: seven domain-specific subscales (School, Reading, Mathematics, Physical Appearance, Physical Ability, Peer Relations and Parent Relations) and an eighth subscale, General-Self, derived from the Rosenberg Self Esteem Scale (76 items in total). The scale is normed on responses from 3652 elementary school children from diverse backgrounds (the General subscale was normed on only 732 of these) in New South Wales, Australia, Grades 2-6 (2768 from Grades 5-6), no ages given.
The Self-Perception Profile for Children (SPPC, Harter 1985), widely used in self-esteem studies in late childhood/early adolescence (Butler and Gasson 2005), measures self-perceived competence/acceptability with 36 items in six subscales: Scholastic Competence, Athletic Competence, Social Acceptance, Physical Appearance, Behavioural Conduct, and Global Self-Worth. The total normative sample for this version of the scale reported in the 1985 manual is n = 1543 (from 4 separate samples, Grade 3 to 8; no ages reported), all from the State of Colorado in the United States, mostly lower to upper middle class and 90% Caucasian.
The Piers-Harris 2 Children's Self-Concept Scale (PH2; Piers et al. 2002), the most frequently cited self-esteem scale with children and adolescents (Butler and Gasson 2005), was originally unidimensional (Piers and Harris 1964). Six subscales were created post hoc by retaining items identified in six factor analyses (Byrne 1996; Piers 1984): Happiness and Satisfaction, Freedom from Anxiety, Physical Appearance and Attributes, Popularity, Behaviour, and Intellectual and School Status. This scale differs from the other two in that items load on to multiple factors. The 2002 version contains 60 items, reduced from 80 in earlier versions. The full standardization sample reported in the 2002 manual was 1387 ethnically diverse students from across the USA, ranging in age from 7 to 18 years.

Analyses
A series of psychometric analyses was undertaken in SPSS v24. Internal consistency reliability (Cronbach's alpha) and completion rates for the scales were explored. Exploratory factor analyses (EFA) identified factors in the current sample and item content of these factors was compared to those of the published subscales. EFA was chosen rather than confirmatory factor analysis as the premise of this review of these scales is to examine their content validity and hence the analysis aimed to allow for alternative structures to emerge if they were present.
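The internal consistency statistic used here can be illustrated with a minimal sketch. This is not the SPSS procedure used in the study; the function name, the toy responses and the handling of missing answers (dropping any respondent with an unanswered item, coded as None) are assumptions made purely for illustration.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Return (alpha, n_complete) for a list of respondents' item scores.

    Assumption for this sketch: a respondent with any missing item (None)
    is dropped before alpha is computed (complete cases only).
    """
    complete = [r for r in rows if None not in r]
    k = len(complete[0])  # number of items in the (sub)scale
    # Variance of each item across the complete respondents
    item_vars = [variance([r[i] for r in complete]) for i in range(k)]
    # Variance of the summed score across the complete respondents
    total_var = variance([sum(r) for r in complete])
    alpha = k / (k - 1) * (1 - sum(item_vars) / total_var)
    return alpha, len(complete)

# Hypothetical responses to a three-item subscale (not study data)
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, None],  # incomplete, so excluded from the calculation
    [3, 3, 3],
]
alpha, n_complete = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f} from {n_complete} complete respondents")
```

Because only complete cases enter the calculation, the number of respondents behind an alpha value can be much smaller than the number who started the scale.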
To explore scale factors, Principal Component Analysis (PCA) was chosen as the primary method, for its psychometric soundness. As the underlying principle of these multidimensional self-concept scales is that they address psychometrically distinct self domains (Marsh 1992), we employed orthogonal rotation. Varimax was chosen to maximize the dispersion of loadings within factors; only items that loaded at .4 or above (16% or more of variance) were interpreted (Field 2005). Key assumptions were assessed using Bartlett's Test of Sphericity and the Kaiser-Meyer-Olkin (KMO) Test for Sampling Adequacy. Finally, in order to examine whether any differences observed in the factor structures related to the use of orthogonal rotation, a second PCA was conducted using oblique rotations, as had predominantly been used in the normative analyses of the scales. The findings of this oblique set of rotations are reported in the narrative here rather than in the tables, to support the reporting of the orthogonally extracted factors while avoiding further complexity in the reporting of results.
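The .4 interpretation rule can be sketched as follows. The loading values and item names are hypothetical and the helper function is an illustration only, not the routine used in the analyses; the sketch simply shows that a loading of .4 corresponds to 16% of an item's variance (.4 squared) and how items below the cut-off end up uninterpreted.

```python
THRESHOLD = 0.4  # cut-off stated in the text; 0.4 ** 2 = 0.16 of item variance

# Hypothetical rotated loadings on two factors (not study data)
loadings = {
    "item_a": [0.72, 0.10],
    "item_b": [0.15, -0.55],
    "item_c": [0.31, 0.28],
}

def assign_items(loadings, threshold=THRESHOLD):
    """Map each item to the factors on which |loading| >= threshold."""
    assigned, unloaded = {}, []
    for item, row in loadings.items():
        # Factors are numbered from 1; the sign of a loading is ignored
        hits = [f + 1 for f, value in enumerate(row) if abs(value) >= threshold]
        if hits:
            assigned[item] = hits
        else:
            unloaded.append(item)
    return assigned, unloaded

assigned, unloaded = assign_items(loadings)
print(assigned)   # items interpreted, with the factor(s) they load on
print(unloaded)   # items that fail to reach the cut-off on any factor
```

In this toy example item_c shares at most about 10% of its variance with either factor (.31 squared), so it would be reported as failing to load.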

Reliability and Completion Rates
The reliability (internal consistency) for reported subscales within each scale was generally very good (Table 3); only one (Global Self-Worth, SPPC) was lower than the generally considered acceptable level of .7 (Field 2005). SDQI alphas were high (.85-.94; median .89) and closely matched those reported for the normative sample (Marsh 1992). SPPC alphas (.67-.82; median .77) were largely consistent with published figures (Harter 1985), though in the present sample the alpha level for Global Self-Worth was noticeably lower. Full scale reliabilities are not calculated for these two scales as this does not reflect their structure. PH2 alphas (.72-.83; median .80) were similar to the normative sample (Piers et al. 2002); full scale internal consistency was also very high (Table 3). As reliability analyses are only conducted with data from respondents who completed a subscale or scale in full, scale completion rates were calculated. These varied considerably: SPPC 79% (n = 189), SDQI 58% (n = 123), PH2 50% (n = 100). The low PH2 completion rate is notable given its simple response set (Yes/No) and brevity compared to the SDQI. Many participants wrote on PH2 answer sheets, with comments such as 'sometimes', or 'middle', rather than selecting either 'yes' or 'no', suggesting the dichotomous response format may be experienced as too limiting.

Factor Analyses
We begin by reporting the PCA with Varimax rotation as the primary analysis for each scale, and also report findings for each oblique rotation in the course of the narrative. The oblique rotations required increased iterations for convergence, n = 57 iterations for the PH2 and n = 28 for the SPPC. All analyses met the assumptions of sampling adequacy (KMO values above 0.5 for all analyses) and sphericity (p < 0.05 for the Bartlett's test in all instances).

Factor Analyses: SDQI
An initial unforced extraction of the SDQI produced 16 factors (76% variance), but 8 of these consisted of one or two items; as such factors are generally unstable and interpretation is hazardous (Tabachnick and Fidell 2007), this model was not considered further. The scree plot suggested six to nine factors; as the SDQI is structured as an eight-factor scale (Marsh 1992), the PCA was repeated, forcing extraction of eight components, which cumulatively accounted for 61% of the variance. The first factor, with 19 items, had an eigenvalue of 10.76 (14% variance). The remaining seven factors had between 5 and 10 items; eigenvalues of 7.36-3.15 accounted for 10%-4% of variance each (Table 4).
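The link between the eigenvalues and the variance percentages reported for the SDQI can be sketched with a short calculation. It assumes the PCA was run on the item correlation matrix, so that each item contributes one unit of variance and a component's share is its eigenvalue divided by the number of items (76, the SDQI item count used in this analysis); the function name is hypothetical.

```python
def variance_share(eigenvalue, n_items):
    """Proportion of total variance for one component, assuming a
    correlation-matrix PCA (total variance equals the number of items)."""
    return eigenvalue / n_items

N_ITEMS = 76  # SDQI items entered in the analysis, as stated in the text

# Eigenvalues quoted in the text: the first factor, then the range limits
for ev in (10.76, 7.36, 3.15):
    share = variance_share(ev, N_ITEMS)
    print(f"eigenvalue {ev:.2f} -> {100 * share:.0f}% of variance")
```

Under that assumption the three eigenvalues correspond to roughly 14%, 10% and 4% of the variance, in line with the figures given above.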
Six of the 76 scale items (8%) loaded at below .4 so were excluded. This factor analysis of the SDQI thus resulted in one large 19-item factor, four factors identical or close to original SDQI subscales and three smaller factors (eigenvalues 10.76-3.15). Alpha coefficients for the eight extracted factors ranged from .80-.94, indicating strong internal consistency (Table 4).
To compare SDQI item dispersion, Table 5 displays the items in original subscales and factors extracted in the current analysis.
For four factors, original subscale names were retained: Factors 2, 3 and 4 were identical to SDQI Reading, Maths and Physical Ability subscales, and Factor 5 matched the SDQI Parent subscale except for item 12, My parents are usually unhappy or disappointed with what I do, which failed to load onto any factor.
The other four factors extracted differed notably from the original SDQI. Factor 1, to which we assigned the name Looks, Self-esteem and Likeability, contained all nine Appearance items, six General items (related to self-acceptance, acceptance by others and competence) and four Peer items, related to likeability (e.g., Item 36, I am easy to like) and ease of friendship (Item 44, Other kids want me to be their friend). Factor 6, General/School Ability, had five items (four School Subjects, one General Self-Esteem) relating to ability to work well, getting good marks in school or doing things well (e.g., Item 16, I get good marks in all school subjects). Factor 7, Enjoy Schoolwork, had four items addressing schoolwork enjoyment/interest (e.g., Item 39, I am interested in all school subjects); thus, the SDQI School Subjects subscale divided into factors for school ability and enjoyment. Factor 8, Peer Popularity, contained four popularity-related items (e.g. Item 52, I have more friends than most other kids) and one further Peer subscale item.
Six SDQI items failed to load at .4 or above on any factor: Item 12 (cited above) and items 23: I hate all school subjects; 29: I do lots of important things; 47: I am dumb in all school subjects; 53: Overall I have a lot to be proud of; and 61: I can't do anything right.
On conducting the PCA using an oblique rotation, the factor structure described above was replicated to a large extent. Again, the original General Self Esteem factor was not evident, with six of the items failing to load and three (items 45, 70, 72) loading in the new factor Looks, Esteem & Likeability with items from the original Appearance subscale, as had been the case for the orthogonal rotation. Interestingly, peer items did not load here for this analysis. The original Peers subscale was retained in this rotation except for the item Kids want me to be friends (item 44). This contrasts with the orthogonal rotation finding where a separate smaller Peer Popularity factor loaded, and peer 'likeability' items loaded instead into the Looks, Esteem & Likeability factor. As was found for the orthogonal rotation, Physical ability, Parents, Reading and Maths were generally retained, while the original School Subjects subscale was again split as described above. The similarity of these findings suggests that the rotation method used was not the determining factor regarding the outcome of the analysis with the current sample.

Factor Analyses: SPPC
In the PCA with Varimax rotation of the SPPC, the initial unforced extraction produced nine factors (64% variance), but the scree plot suggested four to nine. Therefore the PCA was repeated, extracting six components in accordance with the original SPPC's factor structure (Harter 1985); these accounted for 54% variance. Exact variance and all eigenvalues are shown in Table 6. Factor 1 (eigenvalue 4.10) accounted for 12% variance; the others accounted for 10%-7% variance (eigenvalues 3.52-2.59). Two of the 36 items (6%) loaded below .4 and were excluded. The forced extraction of six factors from the SPPC therefore produced one large 9-item factor (Factor 1), three matching original SPPC subscales (Factors 2, 3 and 5) and two smaller factors (Factors 4 and 6). Alpha coefficients for internal consistency of the six SPPC factors extracted (Table 6) were .68 (Factor 4) to .86 (Factor 1; median .76); some fell below the accepted level of .7 (Field 2005), as the original SPPC subscales had (see Table 3). Table 7 displays the dispersion of SPPC items in the original subscales and in the extracted factors.
The original SPPC Behavioural, Scholastic and Athletic Competence subscales were exactly replicated by three extracted factors (2, 3 and 5), but three further factors differed from the SPPC structure. Factor 1, Self-acceptance: Appearance and Self as a Person, contained nine items: all six Physical Appearance subscale items (e.g. item 4, I am happy with the way I look) and three Global Self-Worth subscale items (e.g. item 30, I am very happy with the way I am). Two factors primarily contained Peer Competence subscale items: Factor 6, Peer Popularity, contained the three positive items from the Peer Competence subscale (Table 7), and Factor 4, a negative item factor, Negative Self- and Social Perceptions (Table 7), contained the three negatively worded Peer Competence items (e.g. item 26, I wish that more people my age liked me) and one original Global Self-Worth scale item (12, I don't like the way I am leading my life). Finally, two items from Global Self-Worth failed to load onto any factor at .4 or over: 6, I am often unhappy with myself and 36, I am not happy with the way I do a lot of things.
As with the SDQI, when using an oblique rotation the factor structure described above was replicated, in this case with only one exception. In this analysis the item Don't like how I am leading life (Item 12), which appears in the original Global Self Worth subscale, did not load. No other differences were noted. This suggests, again, that the rotation method used did not account for the outcome of the analysis with the current sample.

Factor Analyses: PH2
PCA with Varimax rotation for the PH2 produced 18 factors, but as the rotation failed to converge in 25 iterations, the PCA was repeated, forcing extraction and rotation of six components, reflecting the original PH2 factor structure (Piers et al. 2002). This model accounted for 44% of the variance (eigenvalues 6.22 to 2.57) (Table 8).
The forced extraction of six factors from the PH2 resulted in four large factors with 9-15 items each, and one smaller 4-item factor. Factor 6, with only one item, cannot be considered stable (Tabachnick and Fidell 2007), and 11 of the 60 items (18% of the scale) loaded below .4 and were therefore not interpreted. Internal consistency of the five interpretable factors was somewhat lower than for the original PH2 subscales. Coefficient alphas ranged from .85 (high) for Factor 1 to .62 (unacceptable) for Factor 5 (Field 2005) (Table 8). Table 9 displays the original PH2 items and their dispersion in the factors extracted by the current analysis. As many PH2 items contribute to multiple subscales, all the original subscales to which each item contributes are indicated. For scale copyright reasons, item content is not reproduced here. The item numbers correspond to the 60-item scale version.
For the five interpretable factors generated by the forced PH2 extraction, Factors 1, 2 and 3 converged partially with original PH2 subscales (Table 9). Factor 1, Peer Popularity and Anxiety, contained 15 items addressing popularity, peers liking their ideas, physical strength, friendship, fitting in, anxiety and feeling lucky (8 from the Popularity subscale, 3 from Freedom from Anxiety, 2 from Physical Appearance and Attributes, and 1 each from Happiness and Intellectual subscales). Factor 2, Being Bad and Getting it Wrong, largely contained negative items (behaviour in general/at home, negative cognitions and negative views of appearance), half from the Behavioural subscale, two from Freedom from Anxiety and one from Happiness and Satisfaction. Factor 3, School Ability and Behaviour, contained 7 items from the 14-item Intellectual and School Status subscale, and 2 from Behavioural Adjustment. These encompassed ability to learn and read and self-perceptions of smartness, as well as behaviour in school. The two remaining interpretable factors contained a greater mix: Factor 4, Happy, Sad and Being Accepted, consisted primarily of happiness/sadness items (Table 9), originally from the Freedom from Anxiety, Happiness, Popularity, and Behaviour subscales. Factor 5, Confidence in Class and Appearance, contained items relating to being good-looking and an important class member, from the Intellectual, Physical, and Happiness subscales. Factor 6, one item from the Physical subscale, was not interpreted. Overall, therefore, six PH2 factors were extracted, of which five interpretable factors converged to some degree with some PH2 original subscales. However, the factors extracted reflect the mix of items and many overlaps between subscales in the original PH2.
Finally, the PCA with oblique rotation once again showed a very similar pattern of item loadings, with factors that differed from the original subscales. Five factors mirrored those described above for the orthogonal rotation; small changes in the individual items that did not load this time (items 24, 26 and 33), or that now did (items 1, 6, 12, 18, 38, 51 and 60), left the same conceptual factors present. For example, items 1, 6 and 60 (being made fun of; shyness; overall qualities of self), which had not loaded for the orthogonal rotation, here loaded on and conceptually fit within the factor Happy, Sad & Being Accepted, a factor that had also been identified for the orthogonal rotation. Items 12 and 18 (to do with behaviour and ability in school) loaded on and conceptually fit within the new factor named School Ability and Behaviour. Items 38 and 51 loaded on Factor 6, which still could not be interpreted.

Discussion
Self-esteem/self-concept scales are very widely employed in psychological research. Yet they vary substantially in content and structure, and therefore this study examined the factor structure and internal consistency of three widely used self-concept scales for late childhood/early adolescence. The ultimate goal was not only to consider the factors identified and any patterns across the scales, but also to assess their validity in light of empirical research that had established young people's own perceptions of salient self-concept factors (Tatlow-Golden and Guerin 2017). Factor analyses identified interesting patterns across the three scales. Some closely matched existing subscales, yet notable differences were also identified, in patterns that were, interestingly, found in more than one scale, with separate yet demographically similar participant groups.

Factors Extracted from the Three Scales
Exploring the three scales, patterns of similarity and difference from original scales were found, and we turn to these first. Both orthogonal and oblique rotations were conducted. It is of particular note that the method of rotation does not appear to be the source of the differences we found from the normative subscales reported by scale authors, as the factors we extracted mapped closely on to one another in both rotations. As we stated at the outset, this study draws on self-concept theory and scale creators' assumptions that subscales reflect distinct aspects of the self. Therefore, in discussing the specifics of the factors found, we refer to those from the orthogonal rotations.
For both the SPPC and the SDQI scales, large, global/appearance factors were extracted: global self-esteem items loaded with physical appearance and these accounted for more variance than any other extracted factor. Factors representing school ability, sport/athletic ability, and peer popularity were also extracted from both SPPC and SDQI. Indeed, where factors differed between the two scales, these contained items present in one scale only: for the SDQI, school enjoyment, maths ability/enjoyment, reading ability/enjoyment, and parent factors were extracted, for which items are not present in the SPPC. For the SPPC, negative peer perceptions and behaviour were extracted, content that is not present in the SDQI.
The fact that a 'pure' global self-esteem factor was not extracted for either the SDQI or the SPPC is notable, suggesting that global and appearance self-perceptions may not be psychometrically distinct. This represents a potential challenge to models of multidimensional self-esteem as proposed by researchers such as Marsh (1992) and Harter (1985). These large global/appearance factors reflect strong positive correlations identified in the present dataset between global self-esteem and appearance (Tatlow-Golden 2011), trends that are consistently found in empirical research across cultural settings (Baudsen et al. 2016; Harter 2006, 2012; Klomsten et al. 2004). Harter (2012) argues that self-perceptions of appearance are distinct yet that they are the primary cause of global self-esteem; an item response theory modelling analysis of the SPPC (Egberink and Meijer 2011) concluded that as global self-esteem is heavily saturated with appearance, global self-esteem subscales may be measuring appearance self-perceptions instead. These distinctions remain to be teased out in further research.
Furthermore, peer items relating to likeability (make friends easily, easy to like) also loaded on the appearance/global factor for the SDQI, suggesting a global/appearance self factor may also be associated with self-perceived peer likeability. Other SDQI peer items, addressing popularity (being popular, having lots of friends), loaded onto a separate peer popularity factor, as did the positively phrased SPPC popularity items (negative popularity items loaded separately). Therefore popularity factors, distinct from likeability or friendship quality, were extracted from both SDQI and SPPC, supporting consistent findings that popularity is distinct from aspects of peer relationships such as friendship (both functionally and regarding longer-term outcomes: Asher et al. 1996; Bukowski et al. 2010).
Both SPPC and SDQI, for school and sports/athletics, extracted ability factors. The SPPC Scholastic Competence subscale was extracted complete (it does not measure enjoyment); the SDQI School Subjects subscale, which contains ability and enjoyment items, split into distinct factors, suggesting generalised self-perceptions of school learning ability and enjoyment may be psychometrically distinct. Interestingly, however, enjoyment and ability loaded together for the SDQI's subject-specific maths and reading factors, suggesting that self-perceptions of ability/enjoyment may be related for more specific subjects, e.g., maths/reading, but not for school overall. This complex set of relationships regarding enjoyment and ability in sporting and school-related endeavours requires further investigation.
In sum, half the SDQI and SPPC factors extracted in the present study matched original scales, and where they did not, the factors that were extracted consistently aligned with one another, suggesting that these two scales with similar items may be accessing consistent, valid factors. Interestingly, the original SPPC manual (Harter 1985) reports a variation of factor patterns for some sub-samples, where fewer factors were extracted: Scholastic Competence items loaded together with Social Acceptance in one school and with Behavioral Conduct in another. Harter (1985) interprets these variations as reflecting local school values, suggesting that the SPPC may be subject to local variation. Shevlin et al. (2003) report that confirmatory factor analyses with samples in various countries have produced mixed results, and found, with a northern Irish sample, that SPPC subscale domains varied across time.
The PH2 factor analysis did not extract any factor matching its original subscales, and factors extracted were rather difficult to interpret. Once again, this lends support to multiple reviewers who have queried PH2 subscales' validity (Byrne 1996; Marsh and Holmes 1990; Wylie 1989) and who have questioned their use in research (e.g. Byrne 1996). The factorial confusion of the PH2 may be due to the fact that overall, 15 of the scale's 60 items contribute to more than one subscale; for example, one item (item 8), to do with negative self-perception of appearance, contributes to three subscale scores: Happiness and Satisfaction, Freedom from Anxiety, and Physical Appearance and Attributes. A further concern relates to the conceptual coherence of PH2 subscales, as some items have low face validity in at least some of the subscales to which they contribute. It is difficult to understand, for example, how items that relate to being smart, or peers liking or approving of one's ideas (5, 26, and 39) have a logical place in a subscale titled Physical Appearance and Attributes.
For the PH2, there were however also some similarities to the factors extracted for the SDQI and SPPC. A PH2 schoolwork factor was extracted (although behaviour items loaded as well, in contrast to the SPPC, which also has both schoolwork and behaviour items) as was a peer popularity factor (on which many anxiety items also loaded). Furthermore, as with the SPPC, a negative item factor was extracted from the PH2; negative item factors in measures for this age group are considered below. Further factors that were extracted related to emotions and acceptance and confidence in class, or were uninterpretable.

Methodological Considerations
In the present dataset, internal consistency of the scales was fairly good (SPPC) to very good (SDQI, PH2) and Cronbach's alpha values were comparable to published reliabilities. However, given that Cronbach's alphas are only calculated for participants who complete a scale (or relevant subscale) in full, the considerable variation in completion rates of the three scales was notable. Over three-quarters of participants completed the SPPC in full, somewhat over half for the SDQI and just half for the PH2. These full scale completion rates may reflect lower scale acceptability for longer scales in terms of participant fatigue, as the SPPC has 36 items, compared to 60 for the PH2 and 76 for the SDQI. Another potential factor was suggested by participants' annotations on the PH2, indicating that this scale's dichotomous yes/no options were insufficiently nuanced (the SDQI and SPPC, which had higher completion rates, are Likert scales).
The extraction of negative item factors for the PH2 and SPPC in the present study supports Marsh and Holmes' (1990) findings regarding earlier SPPC and PH2 versions (Perceived Competence Scale; Harter 1982; Piers-Harris Self Concept Scale for Children; Piers 1984). It further supports Marsh and colleagues' concerns (e.g., Marsh 1992; Marsh and Holmes 1990) about children's capacity to respond to negatively worded items. The SDQI does contain negatively worded items to disrupt response bias, but these are disregarded when calculating subscale scores as they were found to contribute to reduced reliability (Marsh 1992). This raises questions about the validity of any scale using negatively worded items with participants up to and including young adolescents.
Sampling-related limitations of the present study are that students in more advantaged areas were over-represented, potentially limiting generalizability, and that participant numbers (ns = 100-189) differed for the three scales. However, important sampling strengths are that participants were drawn from schools across the greater Dublin region, randomly selected within specified clusters to reflect school characteristics, and had comparable age and gender characteristics for the three scales.
The sample sizes in the present study were considerably smaller than those on which the scales were normed and for which factor structures were reported. However, smaller samples might be expected to yield fewer factors, and this proved not to be the case in the present study: more factors were identified in open extractions, leading to the decision to force extraction of the number of factors reported for each scale's normative sample.

Content Validity of Scales
A further, and still more fundamental, challenge to these scales is the question of their content validity, that is, their ability to measure aspects of self-concept that are experienced phenomenologically as salient by children themselves. Our work with young people on this topic (Tatlow-Golden and Guerin 2017) suggests that mixed methods approaches elicit aspects of self that these young people themselves value, as well as meanings they attribute to them. Many of these aspects and meanings are missing from these scales, with implications for scales' content validity. For the active and social domains of the self, self-concept scales focus on ability in schoolwork and sports, and popularity with peers. This contrasted with our findings that participants focused on friendship quality rather than on their perceived popularity, and on many other significant relationships, particularly those with immediate and extended family and even with pets (Tatlow-Golden and Guerin 2017). Furthermore, in contrast with scales' focus, participants very rarely cited schoolwork, and they were less concerned with their self-assessed ability in their favoured activities (how 'good' they were at them) than with their sense of their individual skill progression. This indicates that in some domains self-concept scale content favours adult researchers' priorities over those of children (Tatlow-Golden and Guerin 2017), measuring extrinsic self factors rather than those that are intrinsically meaningful in late childhood and early adolescence.
Extensive empirical research in Self-Determination Theory (SDT; Deci and Ryan 2000) has established that, compared with extrinsic motivation, intrinsic motivation (where activities are engaged in due to inner values and interest) is associated with greater well-being in multiple domains including self-esteem. If self-concept/self-esteem scales measure children and young people's externally defined, contingent self-esteem rather than intrinsic self-esteem factors, as we have argued (Tatlow-Golden and Guerin 2017), this has substantial implications for the interpretation of findings from self-concept research employing these scales. The factor analyses reported here lend further support to the conclusion that scales as currently constructed may not reflect the selves that children and young people experience phenomenologically.
The self, self-concept and self-esteem have tended to be viewed as synonymous with 'whatever is measured with tests of the self-concept' (Bruner 1990, p. 101). However, it is impossible for self-esteem scales to evaluate every aspect of a construct as broad as the self-concept. The SDQI's author notes that 'by its very nature … there is no perfect indicator of self-concept, let alone a perfect criterion against which to validate a measure of self-concept' (Marsh et al. 1983, p. 336), and the SPPC's author has wisely alerted researchers to guard against the temptation of treating any self-esteem scale as synonymous with the construct (Harter 1982). Unfortunately, this most pertinent caution is not generally heeded (Bruner 1990; Wylie 1974, 1989).

Conclusion
The present study suggests concerns regarding the factor structure and content validity of all three self-concept scales examined, but indicates that the SPPC and SDQI are considerably more psychometrically valid than the PH2. This reflects findings of scale analyses to date; notwithstanding the PH2's high internal consistency reliability and very widespread use, we concur with those reviewers who have long argued that the PH2 is unlikely to be a valid measure of multiple self-esteem dimensions (Byrne 1996; Marsh and Holmes 1990; Wylie 1989). The PH2 factor analysis failed to replicate any factor from the original scale, the factors extracted lacked conceptual clarity, and a large negative item factor was extracted, suggesting that it is less suitable for younger children. Its internal consistency reliability was also compromised by a very low full scale completion rate. Taken together, these considerations suggest that the PH2 may be a poor measure for accessing self-esteem (either dimensional or global) in late childhood or early adolescence.
There are also some concerns regarding the SPPC, which had poorer internal consistency and yielded a small negative item factor, and regarding the low full scale completion rate for the SDQI. In addition, only half of the factors were replicated for both SPPC and SDQI. However, the factor patterns we identified were much clearer for these two scales, and these aligned strongly with one another, indicating that there may be certain psychometrically distinct aspects of children's self-esteem. These suggest that global and appearance esteem may form a single dimension of children's self-evaluation, and that further valid dimensions of self-evaluation may be sports ability, school ability, reading ability and enjoyment, maths ability and enjoyment, behaviour, peer popularity, and parent relationships. This study therefore supports the content and construct validity of certain self-concept factors, but in light of qualitative findings regarding children and young people's salient self-concept factors, it suggests that vital dimensions and items, reflecting more intrinsic self-esteem factors, are not represented in these scales (Tatlow-Golden and Guerin 2017). These include friendship, many other close relationships beyond peers, many activities not represented in scales, and crucially, individual self-perceived progression, rather than comparative ability, in these activities. Taken together, these findings call into question the ability of existing, adult-developed self-concept/self-esteem scales for young people to accurately reflect participants' self-concept. They certainly challenge researchers to consider carefully the value of such scales in future research.