Introduction

One of the most important components of pre-service teacher education programs is practical field experience. This is appreciated by both pre-service teachers and teacher educators (Arnold, Gröschner, & Hascher, 2014) and is therefore “a key aspect of a teacher education program” (Beck & Kosnik, 2002, p. 81). Accordingly, most teacher education programs in Germany have implemented long-term internships (Hascher & de Zordo, 2015). In line with this strategy, there is a growing body of research regarding the effects of internships on pre-service teachers’ teaching competencies (e.g. Arnold et al., 2014; Cohen, Hoz, & Kaplan, 2013; Hascher, 2012). However, most studies focus on pre-service teachers’ beliefs, attitudes, motivational variables, or perceived competency development and are mainly based on self-reports (Arnold et al., 2014; Cohen et al., 2013; Hascher, 2012). In order to measure the development of pre-service teachers’ performance during a long-term internship, classroom observations are a less frequently used tool. Especially in physical education (PE), there is a lack of research regarding the effects of long-term internships on pre-service teachers’ actual teaching performance. This research gap needs to be closed, since PE with movement and play as its core content has specific challenges that are fundamentally different from other subjects like math or sciences, hence teaching situations differ from regular classrooms. Scherler (2008), for example, reported specific challenges for PE teachers with regard to content orientation, learning group management, and teacher–student communication. In order to contribute to a closure of the research gap in the effects of a long-term internship on pre-service teachers’ teaching performance in PE classrooms, the authors conducted a video-based analysis using the Classroom Assessment Scoring System (CLASS; Pianta, La Paro, & Hamre, 2008), focusing on the quality of teacher–student interactions in PE classrooms.

Core domains of teacher–student interactions

Individual teachers have a great impact on students’ learning (Hattie, 2009). Content knowledge and knowledge of methodical/didactic preparation of these contents can be regarded as the basis for good teaching. At this point, a distinction must be made between the structural quality of lesson preparation by teachers and the dynamic quality of teacher–student interactions during the lesson itself. Good planning does not necessarily lead to good teaching. Highly qualified teachers should know what to teach and how to teach (pedagogical content knowledge) and should be able to apply this knowledge to lesson planning and in teacher–student interactions during their lessons. Ultimately, it is the classroom processes that students directly experience and primarily the quality of teacher–student interactions that facilitate student learning and future academic success (La Paro & Pianta, 2000; Pianta, La Paro, Payne, Cox, & Bradley, 2002). According to Pianta, Hamre, and Mintz (2012), the quality of social and instructional interactions between teachers and students is essential for promoting student learning and long-term school success. Most existing work on the competence diagnostics of teachers, however, disregards actual performance (Baumgartner, 2018). For reasons of ecological validity, greater consideration should therefore be given to teachers’ skill performance in the actual teaching situation. Following Shavelson (2010, 2013), the concrete skill level of teachers in a teaching situation is referred to as teaching performance. In German-speaking countries, three basic dimensions of teaching quality are used as a conceptual framework for teaching performance (e.g. Klieme, Pauli, & Reusser, 2009; Klieme & Rakoczy, 2008; Praetorius, Klieme, Herbert, & Pinger, 2018).
These dimensions are classroom management, student-oriented classroom climate, and cognitive activation and therefore very similar to the domains suggested by Pianta and colleagues. They grouped these interactions into three domains: Classroom Organization, Emotional Support, and Instructional Support (Pianta et al., 2012; Pianta & Hamre, 2008). Especially for PE classrooms, the broader term of instructional support, which is suggested by Pianta et al. (2012), seems to grasp teacher–student activities better than the term cognitive activation. All CLASS dimensions are supposed to be subject-independent and are based on established theories and various empirical investigations in this field. Research suggests that students in classrooms with more emotional support have higher social competence and demonstrate higher performances in school (Burchinal, Vandergrift, Pianta, & Mashburn, 2010; Curby et al., 2009; Curby & Chavez, 2013; Guo, Piasta, Justice, & Kaderavek, 2010; Mashburn et al., 2008). Moreover, effective classroom organization and instructional support are positively linked with behavior competence (Burchinal, Vernon-Feagans, Vitiello, Greenberg, & Family Life Project Key Investigators, 2014) and increased skills across different subjects (Hamre, Hatfield, Pianta, & Jamil, 2014; Maier, Vitiello, & Greenfield, 2012; Xu, Chin, Reed, & Hutchinson, 2014).

For PE, the three core domains of teacher–student interactions are also considered to be central to the quality of teaching (Herrmann, Seiler, Pühse, & Gerlach, 2015), taking into account the special features of the subject in terms of shape and form. This is especially important for the domain Instructional Support. For this domain, there is consensus in the scientific community that subject-specific supplementation is needed (Niederkofler & Amesberger, 2016). As this instrument does not exist—neither does a generally accepted competency model for PE—CLASS K‑3 is used in this study without additions or modifications. A corresponding procedure is not considered necessary for the dimensions of Classroom Organization and Emotional Support; and corresponding subject-specific reading is considered sufficient (Herrmann et al., 2015). In the following, the three domains of teacher–student interactions are specified and empirical findings for PE presented.

Classroom organization

One of the most important components of teacher competence is classroom management (König, 2015). Classroom management or classroom organization refers to “the organization and management of students’ behavior, time, and attention in the classroom” (Pianta et al., 2012, p. 3). Teachers who can organize classrooms effectively minimize interruptions and maximize learning (Kounin, 1970). Therefore, in highly organized classrooms, students can spend more time on task (Lipowsky et al., 2009). Teachers display good classroom management skills when they establish consistent rules and routines and when their lessons are well-structured and efficiently organized (Praetorius, Pauli, Reusser, Rakoczy, & Klieme, 2014). These aspects can be defined as Productivity according to the CLASS framework (Pianta et al., 2012). Moreover, teachers should be able to stop inappropriate behavior immediately and encourage positive behavior (Behavior Management, Pianta et al., 2012). The way in which teachers utilize interesting learning materials is also important to engage students in active learning (Harris, 1998). The CLASS K‑3 framework defines this as Instructional Learning Formats. Research has shown a correlation between student achievement and the ability of teachers to organize classrooms (for an overview, see Hattie, 2009). Moreover, research indicates that classroom organization is positively associated with children’s social skills, their behavioral competence, and their engagement in learning (Pianta et al., 2012). Furthermore, effective classroom management helps to prevent anxiety disorders and burnout of teachers (Friedman, 2006; Lopez et al., 2008). To improve the ability of teachers to organize a classroom, not only is theoretical knowledge needed, but also teaching experiences in various classrooms. 
PE, in contrast to other subjects, does not take place in a regular classroom and PE teachers must perform in different teaching spaces (gymnasium, outdoor sports grounds, swimming pool) that are likely to suffer from high noise levels and poor acoustics, making effective teaching more demanding. Additionally, the children do not usually have a fixed seat in PE, but move around in a comparatively large area. Therefore, PE teachers must develop clear and consistent, yet flexible rules and routines that fit the different learning spaces (Cothran & Kulinna, 2014). Regarding the field of PE, Miethling and Krieger (2004) identified unsuccessful classroom management as a negative predictor for motivation. In addition, Seiler (2016) was able to show that effective classroom management has an impact on students’ motivation, while Heemsoth (2014) was able to prove that successful classroom management predicts performance motivation.

Emotional support

Another core domain of teacher–student interaction is emotional support. This refers to ways in which teachers help children develop warm and supportive relationships, experience enjoyment and excitement about learning, feel comfortable in the classroom, and experience appropriate levels of autonomy (Pianta & Hamre, 2008). The CLASS K‑3 framework suggests four dimensions of emotional support (Positive Climate, Negative Climate, Teacher Sensitivity, and Regard for Student Perspectives). Research indicates that children display more social competence and engage more positively with their peers and teachers in classrooms with high levels of emotional support (Burchinal et al., 2010; Curby et al., 2009; Downer et al., 2011; Mashburn et al., 2008). Moreover, they demonstrate fewer behavior problems and therefore have fewer conflicts with teachers (Hamre, Pianta, Downer, & Mashburn, 2008). There are also findings that emotional support is related to more engagement in school, academic achievements, and increased school motivation (e.g. Pianta et al., 2012). In comparison to other subjects, situations in PE are characterized by physical actions and are usually more emotionally charged (e.g., losing a match, performing in front of others). This makes emotional support more important and challenging and shapes teacher–student interactions in a special way (e.g., with regard to physical proximity and non-verbal or verbal cues). Regarding PE, Gerlach (2005) and Gerlach, Kussin, Brandl-Bredenbeck, & Brettschneider (2006) showed that positive teacher–student relationships have an impact on students’ exertion and well-being. Corresponding results are available for the intrinsic motivation of students (Jaakkola & Liukkonen, 2006). Heemsoth (2014) was able to show a connection between the performance motivation of students and teacher–student interactions as well as student–student relationships. 
There are also results on the importance of relationships in PE, mostly from qualitative studies (including Miethling & Krieger, 2004; about the students’ need for safety in PE lessons).

Instructional support

The conceptual foundation of instructional support is based on research into students’ cognitive and language development (Pianta et al., 2012). Teachers’ instructional support is associated with students’ development and learning, as well as their academic performance (Jamil, Sabol, Hamre, & Pianta, 2015). Teachers who demonstrate good instructional support provide their students with consistent, process-oriented feedback (Quality of Feedback; Pianta et al., 2008); they focus on higher-order thinking skills and help their students to understand things in a meaningful context (Concept Development; Pianta et al., 2008). Instructional support in preschool and elementary school also includes how teachers promote the language skills of their students (Language Modeling; Pianta et al., 2008). Klieme, Pauli, and Reusser (2009) refer to the dimension of cognitive activation when teachers encourage higher-order thinking in their students. Teachers who are interested in their students’ thoughts, who provide stimulating questions and challenging tasks, and who connect different concepts and ideas are more likely to create a cognitively activating classroom (Taut & Rakoczy, 2016). However, in contrast to classroom organization or emotional support, cognitive activation/instructional support “has to be defined for each academic subject based on specific findings from didactic research and cognitive psychology in that field” (Taut & Rakoczy, 2016, p. 47). In PE, many of these cognitive aspects seem less relevant at first glance. Students often imitate actions that were demonstrated by the teacher or other students. In addition, periods of conversation and verbal reflection are less frequent in PE, but not less important (Greve, 2013). In the field of PE research, the dimension of cognitive activation gained importance only recently (Niederkofler & Amesberger, 2016).
This can be explained by the fact that there is currently no subject-specific competence model that could be transferred to a corresponding examination instrument (Herrmann et al., 2015).

Development of teacher–student interactions during long-term internships

Field experiences have “been widely recognized as a fundamental educational tool that allows for the integration of theoretical knowledge with real world practice in the professional field” (Tapp, Macke, & McLendon, 2012). Therefore, it is assumed that school internships offer a variety of learning opportunities for pre-service teachers and improve teaching quality accordingly. In particular, the implementation of long-term internships in the first phase of teacher training is often justified by the claim that longer internships are superior to shorter ones (Weyland & Wittmann, 2015). In addition, Gröschner et al. (2015) point out that internships in teacher education can promote the acquisition of teaching skills.

Looking at previous research, it is noticeable that studies on subjective competence assessment by pre-service teachers themselves or by their mentors predominate (Gröschner, Schmidt, & Seidel, 2013; Weyland & Wittmann, 2015). For the field of PE, Linka and Gerlach (2019), for example, compare pre-service teachers’ assessments of the dimension of class leadership with corresponding students’ assessments.

However, restraint is called for, since only a handful of empirical studies examine the extent to which internships improve the concrete teaching activities of pre-service teachers (Gröschner, Klaß, & Dehne, 2018). Küster (2008) analyses classroom videos of eleven pre-service teachers during a long-term internship. He uses both low and highly inferential procedures of classroom observation to investigate the growth in teaching performance. Results of the highly inferential analysis display a significant increase over time, whereas the findings of the low inferential video analysis showed no increase. For the subject of PE, there are virtually no known findings regarding the analysis of pre-service teachers’ performance using classroom videos. Only Baumgartner (2018) was able to reveal, in a pre-post design, that pre-service teachers for vocational school do not improve their feedback-related performance through a long-term internship. Baumgartner used the concept of performance as the observable behavior that emerges from professional competencies (Shavelson, 2010, 2013), which is also used in the present study. Schwarz (2009) was able to show that pre-service teachers remember their own teaching situations better if they analyzed them with the help of video recording.

Measuring teacher–student interactions and teaching performance

Classroom observation instruments have increasingly gained importance in recent years due to their potential to provide an evidence-based measure of teacher–student interactions and teaching performance (White, 2018). In contrast to low inferential rating systems, highly inferential rating systems require a high degree of inference on the part of the observer (Lotz, Gabriel, & Lipowsky, 2013). Therefore, observers must be carefully trained in order to provide accurate and consistent ratings (White, 2018). One instrument that was developed and published in 2008 and has since been widely adopted for research and evaluation purposes in over 3000 early childhood and elementary classrooms is the Classroom Assessment Scoring System (CLASS; Pianta et al., 2008). CLASS is not restricted to a specific subject and can be used to analyze changes in different subdimensions and the aforementioned three domains of teacher–student interactions (Sandilos & DiPerna, 2011). Two previous US studies conducted with CLASS K‑3, covering a total of 115 primary school teachers, place the performance of the involved teachers in the upper mid-range for emotional support, in the upper mid-range for classroom organization, and in the lower mid-range for instructional support (Pianta et al., 2008). In their study of 332 Italian primary school teachers, Longobardi, Pasta, Marengo, Prino, and Settanni (2018) found upper mid-range performances in all dimensions and domains.

CLASS observers need to complete a 2-day training program first and succeed in a video-based reliability certification. During the recertification process, trainees score five 20-min videotaped segments from classrooms on a seven-point scale, and their scores are compared with those of the master coders. To assess interrater reliability, CLASS uses percent-within-one (PWO) analysis (Hamre et al., 2008; Pianta et al., 2008; La Paro, Pianta, & Stuhlman, 2004). When calculating PWO, scores are considered to be in agreement if they fall within ±1 point of each other. Thus, for two raters to achieve 80% reliability on a CLASS cycle, eight out of 10 scores must fall within one point of each other. Trainees who pass with a PWO of 80% or higher in relation to the master coders are certified in CLASS. In field studies with different CLASS observers, most researchers report adequate interrater reliability (between 78 and 96%) of scores on CLASS dimensions (Sandilos & DiPerna, 2011).
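The PWO rule described above can be illustrated with a minimal Python sketch. The function name `percent_within_one` and the example scores are hypothetical and are not taken from the CLASS manual or from this study's data; the sketch only applies the stated rule that two scores agree when they differ by at most one point.

```python
def percent_within_one(rater_a, rater_b, tolerance=1):
    """Percent of paired scores that differ by at most `tolerance` points."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters need the same, non-zero number of scores.")
    # Count the pairs whose absolute difference is within the tolerance.
    hits = sum(abs(a - b) <= tolerance for a, b in zip(rater_a, rater_b))
    return 100.0 * hits / len(rater_a)


# Hypothetical ratings for the 10 CLASS dimensions in one cycle (scale 1-7).
rater_a = [5, 4, 6, 2, 3, 5, 4, 1, 6, 5]
rater_b = [4, 4, 7, 2, 5, 5, 3, 1, 3, 5]

pwo = percent_within_one(rater_a, rater_b)
print(f"PWO = {pwo:.0f}%")  # eight of the ten pairs agree within one point
```

With these illustrative scores, eight of the ten pairs fall within one point of each other, which corresponds to the 80% threshold mentioned in the text.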

Aim of the study

Up until now, there have been no empirical findings on the development of PE pre-service teachers’ teaching performance during long-term internships. The teaching performance of pre-service PE teachers was analyzed and evaluated with regard to the improvement of classroom organization, emotional support, and instructional support during a 5-month internship. In addition, the professional performance of pre-service teachers in the present study is compared to performance values from other CLASS studies (that do not refer to results of PE studies) to inspect possible subject-specificity or particularities of the sample.

Method

Participants and study design

The sample consisted of 11 pre-service PE teachers for primary school (72% female; M_age = 24, SD_age = 1.5) in their second master’s semester of teacher education. All participants held a bachelor’s degree in teaching and learning, which they had received from the same university. The bachelor’s program comprised six semesters and included two internships. The first internship was a 3-week period of observation and teacher assistance in the second bachelor semester, while the second internship was a 4-week teaching position in the fourth bachelor semester. Since classroom management is considered one of the most important tasks for pre-service teachers (Wolff, Jarodzka, & Boshuizen, 2017), the first teaching internship focused on classroom management. A 5-month internship is integrated in the curriculum of the master’s program. The pre-service teachers attend a subject-specific preparatory seminar for the internship, as well as an accompanying seminar during their internship. During the internship, pre-service teachers are obliged to teach 64 lessons in each subject. They had a school mentor in each subject and received three visits from lecturers from the university seminars. Various topics were reflected on in the consultations, for example: organization in the gymnasium, the conversation between pre-service teachers and students, and the didactic-methodical preparation of lesson contents.

Video material and rating procedures

All pre-service teachers were videotaped three times at the beginning of the internship (cycles 1–3; first 3 weeks) and three times at the end (cycles 4–6; last 3 weeks), so that each participant had six videotaped lessons of his or her teaching. All six lessons per pre-service teacher were conducted in the same class (elementary classrooms, class level one, two, or three). Between the third and fourth filmed lesson, the pre-service teachers received three visits from their university supervisors. The supervisors observed the lessons and discussed them afterwards with the pre-service teachers. The discussions were generally about the content of the lessons and the performance of the pre-service teachers, but did not focus on CLASS criteria. In total, the data basis for this study consists of 66 filmed PE lessons. All the videos were rated by two licensed observers (independently from each other) using the CLASS K‑3 (Pianta et al., 2008).

All 10 sub-dimensions were rated by using indicators and behavioral markers. See Table 1 for brief descriptions of the CLASS domains, the sub-dimensions, and examples of indicators and markers. All indicators and markers are described in detail in the CLASS K‑3 Manual (Pianta et al., 2008). Moreover, the manual describes when processes in the classroom lead to a high, mid, or low rating in each dimension. All 10 subdimensions are rated from 1 = low to 7 = high, with 1 or 2 indicating low quality, 3, 4, or 5 indicating mid-range of quality, and 6 or 7 indicating high quality. To calculate the three domain scores (classroom organization, emotional support, and instructional support), the average dimension scores that fall into each domain are added up and divided by the number of dimensions in that domain.
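The domain-score calculation described above can be sketched as follows. The grouping of dimensions into domains follows the dimension names given in the text; the function name, the example values, and the `reverse_negative_climate` flag are illustrative assumptions. CLASS scoring commonly reverse-codes Negative Climate (8 minus the score) before averaging, but the text does not state whether this was done here, so the flag is off by default.

```python
# Dimension-to-domain grouping as named in the text (CLASS K-3).
CLASS_DOMAINS = {
    "Emotional Support": [
        "Positive Climate",
        "Negative Climate",
        "Teacher Sensitivity",
        "Regard for Student Perspectives",
    ],
    "Classroom Organization": [
        "Behavior Management",
        "Productivity",
        "Instructional Learning Formats",
    ],
    "Instructional Support": [
        "Concept Development",
        "Quality of Feedback",
        "Language Modeling",
    ],
}


def domain_scores(dimension_means, reverse_negative_climate=False):
    """Average the dimension scores that belong to each domain."""
    scores = {}
    for domain, dimensions in CLASS_DOMAINS.items():
        values = []
        for dim in dimensions:
            value = dimension_means[dim]
            # Assumption: optional reverse-coding on the 1-7 scale.
            if reverse_negative_climate and dim == "Negative Climate":
                value = 8 - value
            values.append(value)
        scores[domain] = sum(values) / len(values)
    return scores


# Hypothetical average dimension scores for one pre-service teacher.
example = {
    "Positive Climate": 5.0, "Negative Climate": 1.0,
    "Teacher Sensitivity": 4.0, "Regard for Student Perspectives": 4.0,
    "Behavior Management": 5.0, "Productivity": 4.0,
    "Instructional Learning Formats": 3.0,
    "Concept Development": 2.0, "Quality of Feedback": 1.5,
    "Language Modeling": 2.5,
}

for domain, score in domain_scores(example).items():
    print(f"{domain}: {score:.2f}")
```

Keeping the reverse-coding as an explicit flag makes visible how strongly the Emotional Support score depends on that choice.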

Table 1 Definitions and examples from this study of CLASS K‑3 domains and sub-dimensions

Interrater reliability

Since PWO is the primary indicator of interrater reliability reported in the CLASS manual, it was also used as the reliability index in this study. Reliability indices for all subdimensions are reported in Tables 2 and 3. The overall interrater reliability ranged from 82 to 100%.

Table 2 Interrater agreement for the CLASS dimensions calculated by dimensions at the beginning (T1) and at the end of the long-term internship (T2)
Table 3 Interrater agreement for the CLASS dimensions calculated by cycles

CLASS observations were conducted in cycles. A cycle consisted of a 20-min observation period at the beginning of the lesson and a 10-min rating period. Agreement calculations were computed by cycle, as the six 20-min cycles were scored independently. All ratings were conducted based on the videos. Scores from the PWO analysis ranged from 82 to 100% for cycle 1, from 73 to 100% for cycle 2, from 91 to 100% for cycle 3, from 73 to 100% for cycle 4, from 82 to 100% for cycle 5 and from 91 to 100% for cycle 6.

Results

Development of physical education pre-service teachers’ teaching performance

Analyses were conducted using the SPSS software (version 26; IBM Corp., Armonk, NY, USA). Means, standard deviations, and effect sizes are reported in Table 4. The results demonstrate that emotional support and classroom organization were rated as mid-range at the beginning of the long-term internship (cycles 1–3), whereas instructional support was rated low. The same pattern can be reported for the end of the internship (cycles 4–6). To investigate whether the change is significant, t‑tests for paired samples were conducted and effect sizes for repeated measures calculated. In accordance with Cohen (1988), values <0.4 were regarded as a small effect, values between 0.5 and 0.7 as a medium effect, and values ≥0.8 as a large effect. None of the core domains increased significantly over time. Regarding the sub-dimensions, results show that the PE pre-service teachers in this study displayed little or no negative affect, which resulted in low scores for Negative Climate, both at the beginning and at the end of the long-term internship. However, results show significantly less Negative Climate at the end of the long-term internship with a large effect size (see Table 4). Positive Climate, Teacher Sensitivity, Behavior Management, Productivity, and Instructional Learning Formats had an average score of four to five at both measurement points. In these dimensions, the pre-service teachers fit the mid-range description of the CLASS K‑3 manual, with one or two indicators in the high range. The dimension Regard for Student Perspectives was rated somewhat lower than the aforementioned dimensions. Descriptively, participants had the highest increase in this dimension with a medium effect size (see Table 4). However, results of t‑tests for paired samples indicated no significant effect. All dimensions of the domain Instructional Support were rated in the low range, both at the beginning and at the end of the long-term internship.
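The paired comparison described above can be sketched in Python using only the standard library. This is not the authors' SPSS procedure. The text does not specify which repeated-measures effect size was used, so the sketch assumes d_z (mean of the paired differences divided by their standard deviation). The function name and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_and_dz(t1, t2):
    """Return (t statistic, degrees of freedom, d_z) for paired scores."""
    if len(t1) != len(t2) or len(t1) < 2:
        raise ValueError("Need two equally long samples with at least 2 pairs.")
    diffs = [after - before for before, after in zip(t1, t2)]
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("All paired differences are identical; t is undefined.")
    n = len(diffs)
    mean_diff = mean(diffs)
    t_stat = mean_diff / (sd_diff / sqrt(n))
    d_z = mean_diff / sd_diff  # assumed effect-size definition
    return t_stat, n - 1, d_z


# Hypothetical domain scores of 11 participants at T1 and T2 (scale 1-7).
t1 = [4, 4, 5, 3, 4, 5, 4, 3, 4, 5, 4]
t2 = [4, 5, 5, 4, 4, 5, 5, 3, 4, 5, 5]

t_stat, df, d_z = paired_t_and_dz(t1, t2)
print(f"t({df}) = {t_stat:.2f}, d_z = {d_z:.2f}")
```

The p-value is deliberately left out because it requires the t distribution, for example from a statistics package. With only 11 participants, the effect size is arguably as informative as the significance test.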

Table 4 Means (M), standard deviations (SD), and effect size for repeated measures (d) for CLASS dimension and domain scores at the beginning (T1) and at the end of the long-term internship (T2)

Discussion

This study aimed to offer some insights into the development of PE pre-service teachers’ teaching performance during a 5-month internship. To measure teacher–student interactions of pre-service teachers, the CLASS K‑3 observation instrument was applied. Results revealed no significant changes in the teaching performance of PE pre-service teachers over the course of 5 months, with one exception for the subdimension of Negative Climate. The significant change in this dimension could be due to the fact that pre-service teachers were stressed and nervous during their first teaching lessons, which could have resulted in slightly higher ratings. At the end of the internship, pre-service teachers could have been more used to interactions with their students. However, these interpretations need to be further investigated in future research by using questionnaires or by conducting interviews with the pre-service teachers after each lesson. The recorded performance values of pre-service teachers in our study are similar to findings in other studies with experienced teachers in non-PE classrooms (Longobardi et al., 2018; Pianta et al., 2008). Like experienced teachers, the pre-service teachers in PE classrooms score in the upper mid-range in the domains of Emotional Support and Classroom Organization. Only in the domain Instructional Support, the pre-service teachers achieved lower values by comparison. They score in the low-range while experienced teachers in the studies mentioned achieved ratings in the middle-range. It is surprising that the pre-service teachers in our study performed equally well at the beginning of their internship as experienced teachers in the domains of Emotional Support and Classroom Organization. 
In the case of Classroom Management, one possible explanation is that this topic is discussed in detail during different university seminars in preparation for the internship and that the specific professional expertise is easily transferable to actual classroom performance. Since the university seminars usually do not cover aspects of Emotional Support, it can be assumed that these performatively visible competences of PE pre-service teachers are acquired and consolidated in non-academic contexts ahead of time, e.g., in care and learning situations with children in private settings or through child-related coaching activities in sports clubs. The fact that scores in Emotional Support and Classroom Organization were already relatively high at the beginning of the internship of pre-service PE teachers may explain why the impact of the internship in these areas of teacher performance is comparatively low. To increase performance quality in these areas, targeted and recurring feedback over longer periods of time would probably be necessary.

The low scores of PE pre-service teachers in the domain of Instructional Support, on the other hand, indicate that this is the most promising area of developmental growth for pre-service teachers at this stage. Since performative competences in this area are obviously acquired neither through university seminars nor in non-academic contexts beforehand, they should be targeted in teacher training during the established long-term internships. Yet, the findings here demonstrate that no significant developmental changes in teaching performance occur during the internship of pre-service PE teachers. However, multiple possible explanations for these findings and implications for possible changes in these sequences of teacher training seem conceivable.

It is possible that developmental changes to the pre-service PE teachers’ teaching quality occur on the comparatively structural level of lesson planning and preparation and do not yet “trickle down” to the actual teacher–student interactions. Improved lesson planning as well as routines could later reduce the cognitive load in the teaching situation, which experienced teachers can use to focus their attention on individual learning processes of the students and give specific instructional support. Hence, it is possible that a 5-month internship is too short to see learning effects on a performance level of pre-service teachers. Further assessment of pre-service PE teachers’ performance during the second phase of teacher training and at the beginning of their teaching career could provide a deeper insight here. Another possibility is that pre-service PE teachers receive too little or non-specific feedback related to the domain of Instructional Support during their reflective talks with school and university mentors after their lessons. Vogler, Messmer, and Allemann (2017) come to similar conclusions regarding the pedagogical content knowledge development of pre-service PE teachers. In the case of the present study, these talks followed no systematic scheme, but focused in a situational manner on different aspects that stood out to the mentors in the observed lesson. Reflective talks that identify and select individual areas of growth and target these over longer periods of time during the internship could be more effective in this regard (Shavelson, 2010, 2013).

Furthermore, unfocussed and non-systematic reflective talks in combination with a large extent of compulsory teaching in the internship could lead to learning by doing or trial and error strategies. In the case of novices, these might only lead to success by chance and not to systematic development. Moreover, this would support existing critical views on the effects of internships in teacher education (Hascher, 2011). Whereas most research underlines positive effects on pre-service teachers’ competency development, there are empirical findings that long-term internships with a high teaching obligation can have a deprofessionalizing effect. Due to the increasing and prolonged school practice during studies, currently existing (possibly problematic) school practice is reproduced without more critical reflection (Hascher, 2012; Vogler et al., 2017; Weyland & Wittmann, 2015). This self-reproduction of the school system contradicts the idea of a critical-constructive university education for teachers and can be regarded as very problematic in consideration of the discussion on the quality of teaching.

Limitations

This study faced some limitations that encourage future research in this field. First, the sample size is relatively small. Pre-service teachers needed to volunteer to participate in this study and, moreover, data privacy laws require parental consent for the school students. For such reasons, video-based studies rarely involve large sample sizes (e.g., Major & Watson, 2018). Second, the CLASS K‑3 was used, which is a widely used instrument to measure the quality of teacher–student interactions. CLASS K‑3 is supposed to be subject-independent. However, as far as the authors know, there are no studies using CLASS K‑3 in PE classrooms. As mentioned before, there is an ongoing discussion about a subject-specific supplementation for the domain of Instructional Support for PE classrooms, which has not yet resulted in an adapted observation instrument. Therefore, the comparisons drawn here between scores in this domain of pre-service PE teachers and experienced teachers in other CLASS studies (in non-PE classrooms) may be limited in their validity. For future research, it would be of interest to adapt and validate CLASS K‑3 for the rating of PE lessons. Whereas most subjects in schools aim at higher-order thinking skills and cognitive activation of students, PE lessons have a different pivotal learning goal (Vogler et al., 2017). In PE, hands-on materials and inductive physical-motor learning processes play a central role, where feedback is not always required from the teacher but is provided by the reactions of the materials themselves. Third, one has to take into consideration that a video camera in a classroom may have an effect on the behavior of pre-service teachers during the lesson. Fourth, other factors related to pre-service teachers’ teaching performance were not measured.
For example, contextual factors, such as the number of informal feedback occasions in school, subject knowledge, motivation, or prior teaching experiences might have had an effect on the results of the study. Therefore, future research should include these variables.

Conclusions

This study is the first to measure the development of PE pre-service teachers’ teaching performance during a long-term internship in Germany. The results showed no significant changes in the teaching performance of PE pre-service teachers over the course of 5 months (with one exception for the subdimension of Negative Climate). These results have several implications, both for teacher educators and for researchers in the field of PE and video-based measurement. School internships may need to be longer than 5 months to achieve an increase in actual teaching performance and interactional teaching quality of pre-service teachers. Feedback sessions between mentors and pre-service teachers after lessons should be more structured and should focus on the quality of teacher–student interaction. Moreover, it is of interest for researchers to validate the CLASS instruments for use in PE lessons. In order to validate the CLASS instrument for PE with its specific features, studies should be carried out in which CLASS performances of pre-service teachers during internships in PE and a regular subject are collected and compared. Thus, one could examine whether the interaction quality of teachers can be captured with CLASS independently of the subject and whether the interaction quality of pre-service teachers does not rise over the course of a 5-month internship in any of the observed classes. Corresponding data could support the insights of this study that an extended practice phase and focused mentoring talks after the lessons are necessary to induce an increase in the interaction quality of pre-service PE teachers.