Background

School physical education is considered one of the most important ways to improve students’ physical fitness. Schools deliver physical education through systematic sports courses and activities, which expose students to a variety of sports during school time and thereby promote their physical health. School physical education not only develops students’ sports skills but also helps them form healthy lifestyles, with positive effects on their lives and well-being [1, 2]. Some studies have also shown that school physical education helps improve students’ cognitive performance, foster a positive sense of competition, and build a healthy body image and self-esteem [3, 4]. The school physical education program is a central component of school physical education, and a well-implemented program can better promote students’ physical fitness [4, 5]. In China, where school attendance is mandatory for all school-age children, there is a large body of research on the impact of school physical education, including many empirical studies documenting the implementation and effectiveness of various sports programs [6,7,8,9]. Our literature review found that survey studies on the implementation of school physical education mostly relied on self-developed questionnaires, many of which lacked formal validation of their measurement properties. As a result, many studies still use the relatively complex evaluation forms issued by the Chinese government. Although these studies are abundant in the Chinese literature, many of them remain inaccessible to the international academic community because of language barriers.
This lack of translation and international dissemination has left a significant gap in the global discourse on China’s educational practices, particularly regarding physical activity in Chinese middle schools.

Therefore, a validated questionnaire is needed to evaluate the implementation level of physical education programs in Chinese middle schools. The questionnaire is intended to capture the main variables related to the implementation of school physical education, namely school organization and management, education and teaching, condition guarantee, student physical fitness, and supervision and inspection. It will provide a survey tool that future researchers can use when investigating the implementation level of physical education programs in Chinese junior high schools.

Methods

Questionnaire construction

First, we searched several online databases, including PubMed, ScienceDirect, Web of Science and ResearchGate, using the keyword ‘Implementation of physical education’. After reviewing a large body of relevant literature, we found that most studies of physical education implementation in schools covered factors such as school management, curriculum implementation, physical education teachers, physical education equipment, and safety guidelines [10,11,12,13,14]. Having identified these factors, we searched for further information relevant to China’s specific context. We found that the content of the “Measures for the Evaluation of Physical Education in Primary and Secondary Schools”, issued by the Ministry of Education of China in 2014, closely matched the objectives of this study, so the questionnaire content was adapted from this document [15]. The questionnaire comprises 38 items in five dimensions: organizational management, education and teaching, condition guarantee, student physical fitness, and supervision and inspection. Each item is rated on a five-point Likert scale ranging from 0 (“not implemented/not in compliance at all/very poor”) to 4 (“very good/always implemented/fully in compliance”).
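As an illustration of how responses to the 38-item instrument could be summarized, the Python sketch below computes a mean 0–4 score per dimension for each respondent. The per-dimension item counts (6, 10, 12, 8 and 2) follow the item groups reported in the factor analysis, but the column order of the dimensions and the use of unweighted means are assumptions for illustration, not the scoring rule prescribed by the authors; the names `DIMENSIONS` and `dimension_scores` are hypothetical.

```python
import numpy as np

# Hypothetical column layout: item counts per dimension match the reported
# item groups (6 + 10 + 12 + 8 + 2 = 38); the ordering is an assumption.
DIMENSIONS = {
    "organization_management": 6,
    "education_teaching": 10,
    "condition_guarantee": 12,
    "student_physical_fitness": 8,
    "supervision_inspection": 2,
}


def dimension_scores(responses) -> dict[str, np.ndarray]:
    """Return each respondent's mean 0-4 rating within every dimension.

    responses: array-like of shape (n_respondents, 38) holding Likert ratings.
    """
    x = np.asarray(responses, dtype=float)
    n_items = sum(DIMENSIONS.values())
    if x.ndim != 2 or x.shape[1] != n_items:
        raise ValueError(f"expected shape (n, {n_items}), got {x.shape}")
    if x.min() < 0 or x.max() > 4:
        raise ValueError("ratings must lie on the 0-4 Likert scale")
    scores, start = {}, 0
    for name, k in DIMENSIONS.items():
        scores[name] = x[:, start:start + k].mean(axis=1)  # unweighted mean
        start += k
    return scores
```

A simple mean keeps every dimension on the original 0–4 metric, so dimensions with 2 and 12 items remain directly comparable.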

Next, a panel of five experts in physical education (four professors and one associate professor) was invited to review the draft questionnaire. The review was conducted face-to-face so that the panel’s expertise could be fully used and any problems in the questionnaire design could be addressed immediately. The experts evaluated each item and made suggestions regarding clarity, relevance, and potential bias. Each expert rated the relevance and clarity of each item on a 4-point scale, from 1 (“very unsuitable”) to 4 (“very suitable”). Based on the expert review, three items were revised. Item 15, which originally asked specifically about the sports in which students were trained, was changed to “evaluating the effectiveness of physical education courses in skill training”, because the new curriculum standard issued by the Chinese government in 2022 broadens physical education content. In addition to basic sports skills, physical fitness, and health education, the standard specifies six major categories of specialized sports skills (ball games, track and field, gymnastics, aquatic or ice and snow sports, traditional Chinese sports, and emerging sports), giving students access to more sports. The other two items, items 9 and 28, were rephrased from declarative sentences into questions to make them easier to understand.

Finally, we randomly selected 10 junior high school PE teachers, five each from Hangzhou and Huaibei, for a pilot survey and recorded all of their requests for clarification.

Data collection

Data were collected in several ways. During the expert review, data were collected through face-to-face communication; in the pilot survey, we communicated with the 10 physical education teachers through the online chat tool WeChat and recorded their feedback. Data for the validity analysis were collected through the online questionnaire platform Wenjuanxing, yielding a total of 350 responses. The platform restricted participation to respondents who were working junior high school physical education teachers or administrators in China.

Statistical analysis

After the questionnaire items were generated, we tested content validity, which is usually assessed through expert review and through feedback from members of the target population as they complete the questionnaire [16]. Using the five experts’ ratings, we calculated the content validity index (CVI) and Cohen’s kappa coefficient to measure agreement among the experts [17, 18]. After the agreement analysis, we conducted a brief pre-test with the target population and recorded their feedback, in order to assess whether respondents could understand each question and to identify potential deficiencies [19].
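The item-level CVI and a chance-corrected kappa can be computed directly from the experts’ 4-point ratings. The sketch below assumes the common convention that ratings of 3 or 4 count as “relevant”. Because Cohen’s kappa is defined for two raters, it uses the modified kappa for multiple experts described by Polit and colleagues, which is often reported alongside I-CVI; whether this is the exact kappa variant used in this study is an assumption, and the function names are hypothetical.

```python
from math import comb

RELEVANT_MIN = 3  # assumption: a rating of 3 or 4 on the 4-point scale means "relevant"


def item_cvi(ratings: list[int]) -> float:
    """Item-level content validity index: share of experts rating the item 3 or 4."""
    if not ratings:
        raise ValueError("need at least one expert rating")
    agree = sum(1 for r in ratings if r >= RELEVANT_MIN)
    return agree / len(ratings)


def modified_kappa(ratings: list[int]) -> float:
    """I-CVI adjusted for chance agreement among multiple experts.

    The chance probability that exactly `agree` of `n` experts rate the item
    as relevant is C(n, agree) * 0.5**n.
    """
    n = len(ratings)
    agree = sum(1 for r in ratings if r >= RELEVANT_MIN)
    p_chance = comb(n, agree) * 0.5 ** n
    return (item_cvi(ratings) - p_chance) / (1 - p_chance)


# Example: five experts, one of whom rates the item below 3
print(item_cvi([4, 4, 3, 4, 2]), round(modified_kappa([4, 4, 3, 4, 2]), 2))  # 0.8 0.76
```

When all five experts rate an item 3 or 4, I-CVI is 1 and the chance probability is 0.5^5 = 0.03125, so the modified kappa is exactly 1.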

Reliability test

We used Cronbach’s alpha to assess the internal consistency of the questionnaire items. Higher values indicate better internal consistency, and a coefficient greater than 0.7 is usually considered acceptable [20, 21]. We also assessed test-retest reliability by administering the same questionnaire to the same group at two time points and calculating the correlation between the two sets of results. A high correlation indicates good test-retest reliability [22].
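Cronbach’s alpha for one dimension follows the standard formula α = k/(k − 1) · (1 − Σσ²ᵢ/σ²ₓ), where k is the number of items, σ²ᵢ the variance of item i, and σ²ₓ the variance of the summed score. The NumPy sketch below assumes a complete respondents × items matrix with no missing values; the function name is illustrative.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) matrix of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    x = np.asarray(items, dtype=float)
    if x.ndim != 2 or x.shape[1] < 2:
        raise ValueError("expected a 2-D array with at least two item columns")
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1)        # sample variance of each item
    total_var = x.sum(axis=1).var(ddof=1)   # sample variance of the summed score
    return float(k / (k - 1) * (1 - item_var.sum() / total_var))


# Worked example: item variances sum to 2, total-score variance is 3
print(cronbach_alpha([[1, 1], [2, 3], [3, 2]]))  # approximately 0.667 (= 2/3)
```

To obtain one coefficient per dimension, the function would be applied separately to the column slice holding that dimension’s items.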

We then assessed construct validity. To confirm that the data were suitable for factor analysis, we first applied the Kaiser-Meyer-Olkin (KMO) test and Bartlett’s test of sphericity. The KMO statistic ranges from 0 to 1, and values above 0.6 are usually considered adequate for factor analysis. A Bartlett’s test p value below 0.05 indicates that the correlation matrix differs significantly from an identity matrix, that is, the items are sufficiently correlated for factor analysis. Exploratory factor analysis (EFA) was then performed to examine the structure of the data and to check whether the questionnaire reflects the hypothesized theoretical structure. Factors were extracted by principal component analysis (PCA), which applies an eigenvalue decomposition to the item correlation matrix. The number of components to retain was determined by the Kaiser criterion (eigenvalues greater than 1) and by the cumulative proportion of variance explained (generally 60–80% of the total variance) [16, 23, 24].
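Both suitability checks can be computed from the item correlation matrix R. In the sketch below, which assumes complete data and a non-singular R, the overall KMO compares squared correlations with squared partial correlations obtained from the inverse of R. Bartlett’s statistic is χ² = −(n − 1 − (2p + 5)/6) · ln|R| with p(p − 1)/2 degrees of freedom, where n is the number of respondents and p the number of items. The function names are hypothetical, and the sketch returns the statistic and degrees of freedom rather than a p value.

```python
import numpy as np


def kmo(data) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    inv = np.linalg.inv(r)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)               # partial correlations
    off = ~np.eye(r.shape[0], dtype=bool)         # mask for off-diagonal cells
    r2 = np.sum(r[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return float(r2 / (r2 + p2))


def bartlett_sphericity(data) -> tuple[float, int]:
    """Bartlett's test of sphericity: chi-square statistic and degrees of freedom."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    r = np.corrcoef(x, rowvar=False)
    _, logdet = np.linalg.slogdet(r)              # numerically stable ln|R|
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return float(chi2), df
```

If SciPy is available, the p value would be `scipy.stats.chi2.sf(chi2, df)`.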

Results

Content validity

After obtaining the five experts’ ratings on the 4-point scale, we calculated the item-level content validity index (I-CVI) and kappa values. As Table 1 shows, the I-CVI for both relevance and clarity was 1 for every item, indicating that all experts judged each item to be relevant to the construct being measured and clearly worded, which supports the content validity of the questionnaire. The kappa value of every item was also 1, indicating perfect agreement among the five experts.

Table 1 Expert review

Reliability test

Cronbach’s alpha was calculated for each dimension of the questionnaire, and the results are shown in Table 2. The coefficients were 0.889 for organization and management, 0.925 for education and teaching, 0.929 for condition guarantee, 0.901 for student physical fitness, and 0.784 for supervision and inspection. An alpha coefficient above 0.70 is usually considered to indicate acceptable reliability [20, 22, 25]. Because the coefficients of all five dimensions exceed 0.70, the internal consistency of the questionnaire is good.

Table 2 Reliability test

Exploratory factor analysis

The construct validity of the scale was examined with the KMO test and Bartlett’s test of sphericity. A KMO value above 0.8 indicates that the data are very suitable for factor extraction; a value between 0.7 and 0.8 indicates that the data are suitable; a value between 0.6 and 0.7 indicates that the data are moderately suitable; and a value below 0.6 indicates that the data are poorly suited to factor analysis [26, 27]. As shown in Table 3, the KMO value was 0.951, well above 0.6, and Bartlett’s test of sphericity was significant (p < 0.001), indicating that the items are sufficiently intercorrelated and that factor analysis could proceed.

Table 3 KMO and Bartlett’s test

We next examined factor extraction and the amount of information captured by the extracted factors. An eigenvalue indicates how much of the total variance in the item correlation matrix is accounted for by a given factor, and only factors with eigenvalues greater than 1 were retained. The cumulative variance explained measures how much of the total variance the extracted factors account for together; a model whose factors explain 60–70% of the variance is generally considered reasonable. Factors were therefore extracted based on both the eigenvalues and the cumulative variance explained [24, 28]. As shown in Table 4, five factors had eigenvalues greater than 1. After rotation, these five factors explained 17.948%, 15.182%, 13.170%, 10.773%, and 4.127% of the variance, respectively, for a cumulative 61.2%, which exceeds the 60% threshold and indicates that the factors effectively capture the information in the data (see Table 4 at the end of the document).
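The extraction and retention rules can be sketched as an eigendecomposition of the item correlation matrix followed by the Kaiser criterion. The function below is illustrative, and its name and return format are assumptions. It reports unrotated variance proportions. Because an orthogonal rotation only redistributes variance among the retained factors, the individual percentages change after rotation while their cumulative total stays the same; for Table 4, 17.948 + 15.182 + 13.170 + 10.773 + 4.127 = 61.2%.

```python
import numpy as np


def pca_retention(data) -> dict:
    """Eigendecompose the item correlation matrix and apply the Kaiser criterion."""
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(r)          # symmetric matrix, so use eigh
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()           # eigenvalues sum to the item count
    n_keep = int(np.sum(eigvals > 1.0))           # Kaiser criterion
    loadings = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])  # unrotated loadings
    return {
        "eigenvalues": eigvals,
        "n_factors": n_keep,
        "cumulative_explained": float(explained[:n_keep].sum()),
        "loadings": loadings,
    }
```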

Table 4 Results of scale variance explanation

To show the strength of the relationships between factors and items, we examined the rotated component matrix. As Table 5 shows, the absolute factor loading of each item exceeds 0.4, indicating that every item is meaningfully associated with a factor. Each group of items also loads on its own factor: condition guarantee items 1–12 on factor 1, education and teaching items 1–10 on factor 2, student physical fitness items 1–8 on factor 3, organization and management items 1–6 on factor 4, and supervision and inspection items 1–2 on factor 5. Each factor therefore represents a specific dimension of the questionnaire, which supports the consistency between the hypothesized dimensions and the empirical item groupings and thus the validity of the questionnaire design (see Table 5 at the end of the document).
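The text reports an orthogonal rotation without naming the method. Varimax is the most common orthogonal choice, so the sketch below assumes it. It minimizes the negative raw varimax criterion by gradient projection, in the style of Bernaards and Jennrich’s GPArotation algorithm. It then assigns each item to the factor with its largest absolute loading and marks items whose largest loading does not exceed 0.4 with −1. All function names are hypothetical, and no Kaiser row normalization is applied.

```python
import numpy as np


def _varimax_objective(loadings: np.ndarray) -> tuple[float, np.ndarray]:
    """Negative raw varimax criterion and its gradient (minimization form)."""
    sq = loadings ** 2
    centered = sq - sq.mean(axis=0)         # squared loadings minus column means
    value = -np.sum(centered ** 2) / 4.0
    grad = -loadings * centered
    return float(value), grad


def varimax(loadings, max_iter: int = 1000, tol: float = 1e-6) -> np.ndarray:
    """Orthogonal varimax rotation by gradient projection with backtracking."""
    a = np.asarray(loadings, dtype=float)
    t = np.eye(a.shape[1])                  # current orthogonal rotation matrix
    f, gq = _varimax_objective(a @ t)
    g = a.T @ gq
    step = 1.0
    for _ in range(max_iter):
        m = t.T @ g
        gp = g - t @ ((m + m.T) / 2)        # gradient projected onto the tangent space
        s = np.linalg.norm(gp)
        if s < tol:
            break                           # stationary point reached
        step *= 2
        for _attempt in range(11):          # backtracking (Armijo-type) line search
            u, _sv, vt = np.linalg.svd(t - step * gp)
            t_new = u @ vt                  # project back onto orthogonal matrices
            f_new, gq_new = _varimax_objective(a @ t_new)
            if f_new < f - 0.5 * s ** 2 * step:
                break
            step /= 2
        t, f, g = t_new, f_new, a.T @ gq_new
    return a @ t


def assign_items(rotated, threshold: float = 0.4) -> np.ndarray:
    """Factor index with the largest |loading| per item, or -1 if it is <= threshold."""
    abs_load = np.abs(np.asarray(rotated, dtype=float))
    labels = abs_load.argmax(axis=1)
    labels[abs_load.max(axis=1) <= threshold] = -1
    return labels


def rotated_variance_explained(rotated) -> np.ndarray:
    """Proportion of total (standardized) item variance carried by each rotated factor."""
    r = np.asarray(rotated, dtype=float)
    return (r ** 2).sum(axis=0) / r.shape[0]
```

Because the rotation is orthogonal, each item’s communality and the total variance explained are unchanged. Only the distribution of variance across factors shifts, which is why per-factor percentages are reported after rotation.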

Table 5 Factor loading coefficients after rotation

Test-retest

Forty respondents were randomly selected from the initial test sample to complete the questionnaire a second time. The initial test data were designated Group A and the retest data Group B, and the paired data of these 40 respondents were used to assess the test-retest reliability of each dimension. The results for the five dimensions (organizational management, education and teaching, condition guarantee, student physical fitness, and supervision and inspection) are shown in Table 6. The correlation coefficients between Group A and Group B were 0.875, 0.884, 0.793, 0.908, and 0.747, respectively, and all p values were below 0.05. These results indicate that each dimension of the questionnaire has good test-retest reliability and high stability (see Table 6 at the end of the document).
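The text does not name the correlation coefficient used for Group A versus Group B. Assuming Pearson’s r computed on per-dimension scores of the same respondents in the same order, the calculation can be sketched as follows. The function names and the dictionary layout are hypothetical.

```python
import numpy as np


def retest_correlation(group_a, group_b) -> float:
    """Pearson correlation between first- and second-administration scores."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("both administrations must be 1-D and cover the same respondents")
    return float(np.corrcoef(a, b)[0, 1])


def retest_by_dimension(first: dict, second: dict) -> dict:
    """Apply retest_correlation to each dimension present in both administrations."""
    return {name: retest_correlation(first[name], second[name]) for name in first}
```

If SciPy is available, `scipy.stats.pearsonr(a, b)` returns both the coefficient and its p value, which corresponds to the significance test reported above.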

Table 6 Test-retest reliability table

Discussion

Principal findings

This study aimed to develop a questionnaire for evaluating the implementation level of physical education programs in junior high schools and to test its reliability and validity. The Cronbach’s α coefficients of all dimensions exceeded 0.7, indicating that the questionnaire is reliable. In the exploratory factor analysis (EFA), principal component analysis extracted five main factors, which explained 17.948%, 15.182%, 13.170%, 10.773%, and 4.127% of the variance, respectively, for a cumulative 61.2%. The extracted factors therefore account for most of the variance in the original data and capture the main information and structure of the data set. After orthogonal rotation, each item loaded on a single factor. Orthogonal rotation improves the interpretability of the factor loadings, so that each factor corresponds clearly to a set of items that together reflect a single dimension: factor 1 corresponds to condition guarantee, factor 2 to education and teaching, factor 3 to students’ physical fitness, factor 4 to organization and management, and factor 5 to supervision and inspection. The factor structure thus matches the theoretical constructs of the questionnaire. This indicates that the questionnaire performed well in the early stages of design, that its items effectively capture the predetermined constructs, and that its theoretical basis and structural validity are supported.

Theoretical and practical significance

This questionnaire is based on the evaluation content of the physical education evaluation index system for primary and secondary schools issued by the Ministry of Education of China in 2014. The evaluation indicators were further simplified during the development and validation of the questionnaire, making it easier to understand and apply when evaluating the implementation level of physical education programs in junior high schools. As a result, the questionnaire is convenient and quick to use: respondents need no professional training and can answer according to their own experience. In addition, the questionnaire incorporates recent physical education policies, such as the 2022 curriculum standard, which makes it more adaptable.

Strengths and weaknesses

The main advantages of this questionnaire are its simple operation and practicality and its ability to reflect the actual state of physical education in junior high schools. Compared with the existing government evaluation form, however, its limitation is that it has not yet been widely used in practice, so its stability and validity across different regions and cultural backgrounds remain uncertain, especially when evaluating schools in remote areas.

Future directions

The reliability and validity results show that the structure of the questionnaire is consistent with the hypothesized design and that the questionnaire has considerable potential for practical application. Future work should collect and analyze data more extensively and further verify the questionnaire’s validity and reliability across different regions and cultural backgrounds to strengthen its general applicability. In addition, an electronic evaluation system could be developed to make data collection and analysis more efficient, enabling real-time feedback and continuous monitoring.

Conclusions

This study designed a questionnaire for evaluating the implementation level of physical education programs in Chinese junior middle schools and tested its reliability and validity using a variety of analytical methods. The results showed that the questionnaire’s five-factor structure has good reliability and validity and that the questionnaire can be used to investigate the implementation of physical education programs in Chinese junior middle schools. First, the questionnaire addresses the limited operability and level of detail of existing evaluation tools and provides a practical instrument for school administrators and education policy makers. Second, the implementation level derived from the questionnaire can serve as a basis for studying how students’ physical fitness changes with a school’s implementation level. Finally, a school’s scores can be used to identify deficiencies in the implementation of its physical education program, and efforts to remedy these deficiencies and raise the implementation level can help safeguard students’ physical fitness. Future research should, however, extend to a wider range of regions to improve the general applicability of the questionnaire.