Introduction

Quality physical education (QPE) has been suggested as an important and effective approach to curbing the childhood obesity epidemic (Quinn 2012). Preservice physical education teachers (PPETs) are the future professionals who will deliver QPE programs to all school-aged children (Kahan and McKenzie 2015; Scrabis-Fletcher et al. 2016). However, physical education has been a marginalized subject in schools for more than four decades (Beddoes et al. 2014), with many negative images attached to physical education teachers (McCullick et al. 2003; Spittle et al. 2009). As Pajares (1992) noted, preservice teachers “may have commitments to prior beliefs” that they learned from their previous schooling experiences, and it is extremely difficult to challenge those negative beliefs, because the reality of their everyday lives “may continue largely unaffected in higher education” (Pajares 1992, p. 323). More empirical research is therefore needed to reveal the beliefs prevailing among PPETs, so that teacher educators can challenge negative beliefs more purposefully through coursework and field practicum (Erbas 2013; Sofo et al. 2012).

Beliefs are an “individual’s judgment of the truth or falsity of a proposition” (Pajares 1992, p. 316). Teacher beliefs are associated with a specific subject area (Fives and Buehl 2012). In the field of physical education, studies on teachers’ beliefs have focused on concepts related to self-concept and specific teaching and learning outcomes, such as teaching efficacy (Hand 2014; Pendergast et al. 2011), perceived roles and images in school (McCullick et al. 2012; Rachele et al. 2016), teacher preparation (Chróinín and O’Sullivan 2016), and intended curriculum outcomes (Adamakis and Zounhia 2016; Capel et al. 2011). However, to date, beliefs about the physical education profession (BPEP) have received very little attention (Richards et al. 2014). Teachers’ beliefs about their profession have a significant impact on their career choice and retention (Dundar 2014; Ralph and MacPhail 2015). With positive beliefs about the teaching profession, people tend to choose a teaching career for intrinsic reasons (e.g., personal interests and personal impact on students) (Eick 2002; Ustuner et al. 2009). As a result, they are more likely to enter and remain in the teaching profession for the long term (Eick 2002; Perrachione et al. 2008; Spittle et al. 2009).

Previous research on teacher socialization and motivation provides some knowledge about BPEP, even though studies with a specific focus on BPEP are still needed. PPETs tend to enter the physical education teacher education (PETE) major because they believe that this major leads to sport coaching positions (Matanin and Collier 2003; Richards and Templin 2012; Tsangaridou 2006) and requires lower admission criteria than other majors (McCullick et al. 2012; Spittle et al. 2009). As PETE program study and school practicum experience accumulate, conflicting beliefs about the value of physical education (improving fitness or enhancing sport performance) and about the academic rigor of physical education (core subject or supplementary subject) emerge (Garrett and Wrench 2007; Matanin and Collier 2003; Philpot and Smith 2011; Richards et al. 2017; Spittle et al. 2009). What remains unchanged is the low attraction of the physical education profession compared with other subject matters (McKenzie and Lounsbery 2014). Yet, many PETE programs are “simply not designed to help recruits deliberately and directly confront their belief systems, either about subject matter or pedagogy” (Placek 1993, p. 364), even though some success has been reported in a few studies (e.g., Chróinín and O’Sullivan 2016; McCullick et al. 2012; Philpot and Smith 2011).

Another approach to understanding teachers’ beliefs can be drawn from Fishbein and Ajzen’s (1975) two-domain theoretical framework about beliefs: the importance (i.e., centrality and relevance to personal goals) and the value (i.e., evaluation of usefulness and goodness) of the object being judged (Greenleaf et al. 2008). This framework is particularly valuable to this study, because it sheds light upon one’s judgement of a profession in relation to oneself and to the people that one mainly contributes to. Some empirical studies have validated the two-domain framework. For example, Tsangaridou (2008) reported two themes in their study about PPET-BPEP: the contribution of physical education to students’ development and the equal significance of physical education to other school subjects. Richards et al. (2017) focused on one aspect of beliefs—the perceived mattering (i.e., the value) of physical education, and reported two subdomains: the value as a teacher and as a subject matter in schools. Furthermore, teachers’ beliefs about the importance and the value of their work were closely related to teachers’ career choices across subject matters (Fokkens-Bruinsma and Canrinus 2012; Lin et al. 2012; Thomson et al. 2012), revealing external validity of Fishbein and Ajzen’s framework.

It is important to note that almost all the existing studies on PPET-BPEP employed qualitative methods such as interviews and document analysis (e.g., Chróinín and O’Sullivan 2016; Placek 1993; Ralph and MacPhail 2015; Tsangaridou 2008). While qualitative studies have certainly provided situated and in-depth data, it is imperative to go beyond one confined context and conduct large-scale, multi-site investigations on the topic using quantitative methods. To our knowledge, an instrument measuring PPET-BPEP has yet to be fully developed for quantitative studies, which limits our overall understanding of PPET-BPEP. To fill this research gap, this study aimed to develop and validate an initial scale measuring PPET-BPEP using a mixed-method research design across three consecutive studies. The following sections report each study in the order in which it was completed.

Unlike in many Western countries, approval for research involving human subjects is not mandated in China. Nonetheless, before any data collection began in the three studies described below, the Institutional Review Board of the corresponding author’s university approved this study.

Study 1

Study 1 aimed to explore the domains of PPET-BPEP and develop a scale item pool in three steps. First, a literature review on the topic was conducted to build on the existing knowledge about BPEP. Second, a group of PPETs’ responses to two open-ended questions were collected and analyzed to explore whether additional domains would emerge. Third, after triangulating the literature review and the PPETs’ responses, we summarized tentative domains and developed scale items for each domain.

Method

The existing literature was first searched to identify the underlying structure of PPET-BPEP. Keywords included “preservice teacher” or “teacher candidate”, “beliefs”, and “physical education”. We used major databases for educational research in the U.S. and China, including EBSCOhost, China Academy Journals, and CNKI (Keating et al. 2017). We restricted the search to publications written in English or Chinese between 2000 and 2016 so that the domains would be based on a variety of studies in relevant contexts. To reduce the risk of bias, we included all relevant publications in our analysis, regardless of research method (i.e., qualitative, survey, systematic review, quantitative, etc.), type of article (i.e., theoretical and empirical studies), and source (i.e., peer-reviewed journals, books, theses, and dissertations).

The senior author conducted a thematic coding of all selected studies resulting from the search. Two other authors re-examined the coding accuracy and discussed the discrepancies in coding until a consensus was reached. The three authors also reviewed book chapters about teacher beliefs that were frequently cited to gain a thorough understanding of BPEP. In the end, the tentative domains of PPET-BPEP were identified and triangulated with survey responses among PPETs and experts as described below.

In addition to the literature review, 25 Chinese PPETs enrolled in a large university in China completed an open-ended questionnaire with two questions: (a) What do you think about the physical education profession? and (b) Why do you want to become a physical education teacher? Two professionally trained researchers, who were not involved in the literature review process, coded these responses independently using the constant content comparison method (Dye et al. 2000). Themes that emerged from the questionnaire responses were compared with the literature review results for data triangulation. Peer debriefing was conducted during coding and triangulation to ensure the trustworthiness of the results.

Finally, to gain insights concerning the domain structure and item development, the existing scales measuring teachers’ beliefs related to the teaching profession were also examined (e.g., Teacher Beliefs Survey, FIT-Choice, Watt and Richardson 2007). Based on the above three sources, the domains of PPET-BPEP were conceived and an item pool was developed. Three experts (i.e., two from Chinese universities and one from a U.S. university) provided feedback for the domains and items.

Results

Two salient themes were identified from the existing literature and open-ended questionnaire responses: the benefits of physical education for students’ health and well-being, and the value of physical education for PPETs’ self-fulfillment. These two themes are in line with Fishbein and Ajzen’s (1975) framework about beliefs. Based on these findings, a two-domain model of PPET-BPEP was developed. The two domains were labeled and defined as follows: (a) the value of the physical education profession, defined as the self-perceived significance of physical education with regard to students’ development and the well-being of society; (b) the importance of the physical education profession, referring to the self-assessed aptness for the physical education profession. Accordingly, BPEP was defined as the understanding, presumptions, and propositions about the importance and the value of the physical education profession that one thinks to be true.

Based on the two-domain model described above, an item pool consisting of 28 items was developed. Two items were drawn from related scales about teacher beliefs (i.e., Greenleaf et al. 2008; Watt and Richardson 2007). Five negatively worded items were included to reduce response bias (Groves et al. 2011). Because beliefs are conceptualized as propositions that one thinks to be true, we used a seven-point Likert response format, ranging from strongly disagree (scoring 1) to strongly agree (scoring 7), which generates the variation needed for measuring latent variables (Groves et al. 2011). Based on the three experts’ feedback, five items were modified, and nine items were deleted due to a lack of clarity or of relevance to their intended domain. At the end of Study 1, an initial scale with 19 items was developed.

Study 2

Study 2 was conducted to evaluate the content validity of the initial survey items. A panel of experts in the fields of physical education and educational psychology was recruited to complete an online survey. Because the scale is designed for Chinese PPETs, experts from China were also included in the study to ensure acceptable content validity.

Method

Six currently tenured professors in PETE programs and educational psychology from China and the U.S. participated in Study 2. They were selected based on a search of previously published studies on teacher education and teachers’ beliefs. An online Qualtrics survey link was emailed to the six experts. The survey included the definition of PPET-BPEP, the definitions of the two aforementioned domains, and the 19 randomly ordered survey items. The experts were asked to identify the best matching domain for each item and to make suggestions for item wording. Agreement among experts on the matching domain for each item was calculated. Items with at least 80% expert agreement were considered as having acceptable content validity (Groves et al. 2011).
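
The agreement rule above can be illustrated with a minimal sketch. It assumes that agreement for an item is the share of experts who chose the most frequently selected domain; the source does not spell out the exact formula. The item names and ratings are hypothetical. With six experts, an 80% threshold means that at least five must agree (5/6 ≈ 83%, whereas 4/6 ≈ 67%).

```python
from collections import Counter

THRESHOLD = 0.80  # minimum expert agreement reported in the text


def item_agreement(choices):
    """Return (modal domain, share of experts who chose it) for one item."""
    domain, votes = Counter(choices).most_common(1)[0]
    return domain, votes / len(choices)


# Hypothetical domain assignments from six experts (not the study's data)
ratings = {
    "item_a": ["value", "value", "value", "value", "value", "importance"],
    "item_b": ["value", "importance", "value", "importance", "neither", "value"],
}

for item, choices in ratings.items():
    domain, share = item_agreement(choices)
    decision = "retain" if share >= THRESHOLD else "delete"
    print(f"{item}: modal domain={domain}, agreement={share:.2f} -> {decision}")
```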

Results

In total, five items were deleted due to low agreement (i.e., < 80%) among the six experts. These five items were either closely relevant to both domains or unrelated to either domain. For example, the item “Teaching physical education is very meaningful” was deemed relevant to both domains, whereas the item “I chose physical education for lighter demands of coursework” was viewed as more relevant to career motivation than to BPEP. Minor revisions of wording were made to the remaining items according to the experts’ feedback about item clarity. A preliminary scale of 14 items was developed at the completion of Study 2.

Study 3

Study 3 tested the reliability (i.e., internal consistency among items) and factorial validity of the 14-item scale from Study 2.

Method

Participants

In total, a convenience sample of 696 PPETs participated in Study 3, with 533 male and 163 female PPETs (Mage = 21.28 years, SD = 1.50 years). They were undergraduate students from four large normal universities in China who had been enrolled in the PETE major for at least 1 year. About half of the participants (N = 346, Nmale = 264, Nfemale = 82, Mage = 21.34 years, SD = 1.50 years) were randomly selected using SPSS 21.0 for exploratory factor analysis (EFA). The remaining 350 participants (Nmale = 269, Nfemale = 81, Mage = 21.23 years, SD = 1.53 years) were used for confirmatory factor analysis (CFA).
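
The random split can be sketched as follows. The study performed this step in SPSS 21.0; the Python version below is only an illustration, and the participant IDs, the seed, and the function name `split_sample` are assumptions introduced here.

```python
import random


def split_sample(ids, n_first, seed=0):
    """Shuffle participant IDs reproducibly and split them into two groups."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_first], shuffled[n_first:]


# Hypothetical IDs 1..696; 346 cases go to EFA and the remaining 350 to CFA
participant_ids = range(1, 697)
efa_ids, cfa_ids = split_sample(participant_ids, n_first=346)
print(len(efa_ids), len(cfa_ids))
```

Splitting before any factor analysis keeps the CFA subsample untouched by the item deletions made during EFA.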

Instrument and Data Collection

The preliminary PPET-BPEP scale consisted of two domains with a total of 14 items, using a seven-point Likert response format. Basic demographic information such as gender, age, and class standing was collected at the end of the survey. A Qualtrics online survey was distributed to Chinese PPETs at six normal universities via email.

Data Analyses

The statistical analyses consisted of four steps. Before the data analyses, the researchers excluded cases with more than three missing values or outliers. First, correlations between the 14 items were analyzed using SPSS 21.0 to identify and eliminate highly correlated items (i.e., r > 0.80) (Meyers et al. 2017). Second, an exploratory factor analysis (EFA) was performed to determine the underlying common factors. Using principal component extraction, the number of common factors was determined based on the eigenvalues (> 1) and the scree plot. Because the correlation between the two factors was not close to zero, a promax rotation was then used, following the recommendation of Meyers and colleagues (2017). The factor loadings of the items after rotation were obtained. An item was deleted if its rotated factor loading was lower than 0.30 or if it cross-loaded (i.e., loaded closely on both factors).
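
The first two screening rules can be sketched in a few lines of NumPy. This is a simplified illustration, not the SPSS procedure used in the study: it covers only the r > 0.80 check and the eigenvalue-greater-than-1 rule on the item correlation matrix, and it omits the scree plot and the promax rotation. The simulated responses and both function names are assumptions.

```python
import numpy as np


def highly_correlated_pairs(data, cutoff=0.80):
    """Return index pairs of items whose Pearson correlation exceeds the cutoff."""
    r = np.corrcoef(data, rowvar=False)
    n_items = r.shape[0]
    return [(i, j) for i in range(n_items)
            for j in range(i + 1, n_items) if r[i, j] > cutoff]


def kaiser_factor_count(data):
    """Count eigenvalues of the item correlation matrix that are greater than 1."""
    r = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(r)[::-1]  # sorted from largest to smallest
    return int(np.sum(eigenvalues > 1.0)), eigenvalues


# Simulated data (not the study's): 346 respondents, 12 items, two latent factors
rng = np.random.default_rng(0)
factors = rng.normal(size=(346, 2))
loadings = np.zeros((12, 2))
loadings[:6, 0] = 0.7   # first six items load on factor 1
loadings[6:, 1] = 0.7   # last six items load on factor 2
data = factors @ loadings.T + rng.normal(scale=0.7, size=(346, 12))

print("pairs with r > 0.80:", highly_correlated_pairs(data))
n_factors, eigenvalues = kaiser_factor_count(data)
print("factors retained by the eigenvalue > 1 rule:", n_factors)
print("eigenvalues:", np.round(eigenvalues, 2))
```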

Using the items retained from EFA, CFA was performed in Mplus 7 to examine the factorial validity as evidence for the construct validity of the model obtained from the EFA results. We examined the normality of item scores using the Shapiro–Wilk test in SPSS 21.0. Because none of the item scores were normally distributed, a rescaling-robust estimator (robust maximum-likelihood, MLR) was used in CFA (Meyers et al. 2017; Wang and Wang 2012). Following Meyers et al. (2017), fit indices including the root-mean-square error of approximation (RMSEA) and its 90% confidence interval, the comparative fit index (CFI), the Tucker–Lewis index (TLI), and the standardized root-mean-square residual (SRMR) were examined to test the overall model fit to the data. The chi-square test was not used because it is sensitive to large sample sizes, which may bias the interpretation of model fit (Meyers et al. 2017). RMSEA < 0.06, CFI and TLI > 0.95, and SRMR < 0.08 have been recommended for concluding that model fit is adequate (Hooper et al. 2008). Standardized factor loadings lower than 0.30 and residuals greater than 1.96 were also used to locate the source of model misfit, if any. The correlation between the two factors was expected not to exceed 0.80 (Meyers et al. 2017).
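
The cut-off rules in this paragraph can be expressed as a small check. This is a sketch under stated assumptions: the function name `assess_fit` and the example values are hypothetical, and the fit indices themselves would come from the CFA software rather than from this code.

```python
def assess_fit(rmsea, cfi, tli, srmr):
    """Compare fit indices with the cut-offs cited in the text (Hooper et al. 2008)."""
    checks = {
        "RMSEA < 0.06": rmsea < 0.06,
        "CFI > 0.95": cfi > 0.95,
        "TLI > 0.95": tli > 0.95,
        "SRMR < 0.08": srmr < 0.08,
    }
    return checks, all(checks.values())


# Illustrative values only; here the TLI criterion is not met
checks, adequate = assess_fit(rmsea=0.05, cfi=0.96, tli=0.94, srmr=0.06)
for rule, passed in checks.items():
    print(f"{rule}: {'met' if passed else 'not met'}")
print("all criteria met:", adequate)
```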

Finally, after removing items with low factor loadings or high residual covariance, Cronbach’s alpha was calculated using SPSS 21.0 to examine the internal consistency of the items in each resulting domain and in the entire scale. Cronbach’s alpha is commonly used to indicate measurement reliability in cross-sectional studies (Cohen et al. 2013). A Cronbach’s alpha greater than 0.80 was taken as the cut-off for acceptable reliability (Meyers et al. 2017).
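
For readers who want to reproduce the reliability check outside SPSS, the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), can be sketched as follows. The response matrix is hypothetical and is not taken from the study.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


# Hypothetical seven-point Likert responses: five respondents, four items
responses = [
    [7, 6, 7, 6],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [6, 6, 5, 6],
    [2, 3, 2, 2],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}; exceeds 0.80 cut-off: {alpha > 0.80}")
```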

Results

Correlation analyses showed that the items were moderately correlated with each other, r = 0.15–0.61. EFA results are presented in Table 1. EFA revealed two common factors with eigenvalues greater than 1. The initial two-factor solution was then rotated using the promax rotation. Two items showed high cross-loading on both factors and were removed. EFA was performed again and revealed a two-factor solution without salient cross-loading items. In total, the two factors, each with six items, accounted for 50.60% of the total variance of the entire scale. Based on the shared meaning of the items in each factor, the first factor was labeled “value of physical education”, and the other factor was labeled “sense of calling”.

Table 1 Reliability and EFA results of the PPET-BPEP scale

Using the other half of the total sample, CFA results showed a good model fit to the data (RMSEA = 0.03, 90% confidence interval between 0.01 and 0.05; SRMR = 0.04; CFI = 0.98; TLI = 0.98). All these fit indices met the recommended cut-off values for a good model fit (Hooper et al. 2008). All the structure coefficients were greater than 0.58 and significantly different from 0 at the p < 0.05 level (see Table 2). The correlation between the two factors was moderate (r = 0.70) (Meyers et al. 2017). The internal consistency of each factor and the entire scale was relatively high, given that the Cronbach’s alpha values for each factor and the entire scale were 0.87, 0.81, and 0.83, respectively (see Table 1).

Table 2 Factor loadings of the PPET-BPEP scale

Discussion

This study is the first attempt to develop a scale measuring PPET-BPEP. Based on the body of evidence suggesting that teachers’ beliefs are an essential factor in their career choice (Dundar 2014) and in the quality and decision-making process of teaching practice (Buehl and Beck 2014; Ennis et al. 1992), we believe that the scale can add an important dimension to the ongoing research on teacher education.

Scale Domains and Item Content

Two domains of PPET-BPEP were identified in this study: the value of physical education in school, which means how useful and valuable physical education is for students; and the sense of calling, which is one’s perceived aptness for choosing physical education as a career. The two-factor model supported Fishbein and Ajzen’s (1975) theory about beliefs. The content of the scale items also supported findings in the literature discussed earlier. For instance, “becoming a physical education teacher is my dream” and “physical education is an ideal professional career for me” are items that represent the attraction of the profession, which was also found in Thomson et al.’s (2012) study about preservice teachers’ beliefs. Items such as “I think physical education is of equal importance as other subjects in K-12 program” and “physical education is important in school” represent one form of teacher career motivation, the social utility value in Watt and Richardson’s (2007) framework. Overall, this study suggested that the PPET-BPEP is a reliable and valid scale among the Chinese PPET population.

Factorial Validity and Reliability of the Scale

This study marks the first attempt to validate a PPET-BPEP scale using two analyses (i.e., EFA and CFA) in one study. Sample size is important to EFA and CFA (Meyers et al. 2017). The participant–item ratio was close to 30:1, which surpassed the recommended ratio of 10:1 for CFA (DeVellis 2016). Thus, the sample size was deemed large enough to provide meaningful statistical power. The EFA and CFA findings supported the two-factor model across two independent samples, lending us considerable confidence in the structural validity of the scale. The Cronbach’s alpha coefficient was greater than 0.80 for the entire scale and the two factors, suggesting acceptable reliability among items in the same domain and in the entire scale (Meyers et al. 2017).

It is important to note that all the negatively worded items were deleted because of their low factor loadings. In general, negatively worded items are included in a scale to reduce acquiescent response bias (Groves et al. 2011). However, researchers have reported that this practice results in ambiguous results and low reliability (Roszkowski and Soven 2010). In recent studies, the method effects associated with negatively worded items were recognized as a response style among adults (DiStefano and Motl 2006) that could be controlled by modeling negatively worded items as a distinct latent variable (Ye and Wallace 2014). Future studies are needed to examine the necessity of including negatively worded items in the scale.

Overall, the scale is a short, valid, and reliable measure for future research on the topic. We suggest using the two-domain model for future conceptualization of PPET-BPEP. However, the scale needs to be re-validated before it is used in other countries or in other subject areas. PPETs in other countries may undertake different tasks and roles in schools. For example, physical education teachers in secondary schools in the U.S. may focus on athletic coaching more than on teaching physical education (Konukman et al. 2010), unlike Chinese physical education teachers, who seldom have coaching responsibilities (Sun 2014). Likewise, preservice teachers of other subject matters may hold different beliefs about their profession because of differences in content knowledge and academic status.

Implications of Using the Scale

There are several potential areas where the scale of PPET-BPEP can be used. First, this scale can be utilized by PETE recruitment staff and program advisors to assess PPET-BPEP by obtaining an average score of the entire scale. Tok (2012), for example, suggested that such practice is necessary in addition to knowledge test scores. Teacher education program applicants’ beliefs towards the teaching profession need to be taken into consideration in the admission process to teacher education programs. Career advising services can help current students by providing useful and individualized advice for career choice based on the students’ BPEP scores measured by the PPET-BPEP scale.

Second, the scale can provide valuable information about the effects of PETE programs in reshaping students’ beliefs. Beliefs about the teaching profession are not static. They can be influenced by many factors (Tarman 2012). Field practicum experiences (e.g., observation, peer-teaching, internship, and student teaching) can largely shape teachers’ initial beliefs about the teaching profession (Tarman 2012). These beliefs, in turn, serve as the base for meaningful teaching reflection for teacher educators (Ruitenberg 2011). For instance, a PPET with a high score on the value of physical education but a low score on the sense of calling may indicate that the coursework and field experience need to be better aligned with personal interests and goals. In addition, by administering this scale multiple times throughout students’ program study, practitioners may identify whether the teacher preparation program has effectively shaped students’ beliefs about the profession.
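
The scoring described above can be sketched as follows. The source reports six items per factor, but the item labels used here (`v1`–`v6`, `c1`–`c6`), the use of item means as scores, and the example responses are assumptions made only for illustration.

```python
from statistics import mean

# Hypothetical item keys; the actual item-to-factor assignment is given in the scale itself
VALUE_ITEMS = ["v1", "v2", "v3", "v4", "v5", "v6"]
CALLING_ITEMS = ["c1", "c2", "c3", "c4", "c5", "c6"]


def score_scale(responses):
    """Return mean scores (1-7) for each domain and for the entire scale."""
    return {
        "value": mean(responses[i] for i in VALUE_ITEMS),
        "calling": mean(responses[i] for i in CALLING_ITEMS),
        "overall": mean(responses[i] for i in VALUE_ITEMS + CALLING_ITEMS),
    }


# Hypothetical PPET who rates the value of physical education highly
# but reports a weak sense of calling, at two time points
entry = {**{i: 6 for i in VALUE_ITEMS}, **{i: 3 for i in CALLING_ITEMS}}
exit_ = {**{i: 6 for i in VALUE_ITEMS}, **{i: 5 for i in CALLING_ITEMS}}

before, after = score_scale(entry), score_scale(exit_)
change = {domain: after[domain] - before[domain] for domain in before}
print(before, after, change)
```

Comparing domain scores across administrations, rather than only the overall mean, shows which of the two beliefs a program has shifted.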

Third, the scale can be used to investigate salient factors influencing PPET-BPEP through correlational and quasi-experimental studies. Our knowledge about effective approaches to shaping positive BPEP through PETE programs is still limited. With the measure of BPEP developed in this study, it becomes possible to deepen research on the construction of positive BPEP by targeting its determinants.

Finally, the scale could serve as an important avenue for educational reform that shifts from a standardized-test-based approach to a more holistic approach to educating the whole child in schools. The latter approach can be jeopardized by the negative BPEP that currently prevail among preservice classroom teachers (Faulkner et al. 2004). Moreover, physical education is not the only subject matter being marginalized in schools. This study provides a broadly situated framework and a pragmatic survey validation protocol for teacher educators in subject matters such as health, music, art, and foreign language education. Researchers in these subject matters may develop a similar belief scale to improve the capacity of their teacher education programs to shape preservice teachers’ beliefs about teaching within their subject area.

Limitations

The following limitations should be noted. First, the scale developed in this study focused on Chinese PPETs. Therefore, it has limited use in other populations. Re-validating the scale in various countries with different samples is needed to improve its applicability. Second, the comparability of the two samples was unknown. Although the entire sample was randomly split into two samples to maximize the comparability (i.e., age, gender, and class standing), it was still possible that the two samples varied in biographies such as the previous physical education quality and/or sporting experience.

Conclusions

This research provides a much-needed scale for measuring PPET-BPEP in the Chinese population, with acceptable structural validity and internal consistency. The scale validated in this research may be used for studies on the factors influencing PPET-BPEP, the effects of PETE programs on PPET-BPEP development, and the relationship between PPET-BPEP and teaching practices. This scale may also assist PPET recruitment and the development of holistic educational approaches that educate the whole child in China. However, caution needs to be exercised when using the PPET-BPEP scale. This study is an initial attempt to develop the PPET-BPEP scale. To avoid potential influences from other measures on the newly developed scale (Worthington and Whittaker 2006), we did not assess the convergent and discriminant validity of the scale. We intend to do so with a separate sample in future studies. Re-validation of the scale in other countries is also needed.