Today, we have reached a scholarly understanding of teaching and learning as being situated, complex, and reciprocally interactive activities. Teaching can be considered as situated because each pedagogical action is embedded in a concrete communicational context, and therefore, its meaning and function can only be reconstructed in relation to this context. For example, whether a teacher’s praise for a student’s performance on a task can be regarded as autonomy promoting or rather autonomy constraining (see Ryan and Deci 2000) depends on the concrete learning situation (e.g., receiving explicit praise for the mastery of a very simple task might lead the student to believe that s/he has a rather low ability level compared with the peer students). Characterizing the nature of teaching as complex and reciprocally interactive is justified because each pedagogical action is influenced and co-determined by multiple factors inherent in the teaching situation, such as the students’ current motivation and personal intentions, their individual learning prerequisites, the subject matter to be taught, the learning culture of the school, and so on. Thus, a teacher’s pedagogical actions are always action and reaction at the same time. Given the complex nature of teaching, there is often no straightforward or algorithmic way to determine which pedagogical action should be taken in a given situation because, in teaching, there are always different roads that may lead to Rome (see Renkl 2015). Therefore, competent teaching requires the ability to make informed pedagogical decisions by weighing up the pros and cons of possible strategies with respect to the concrete boundary conditions of a given teaching situation (Wegner et al. 2014).

Similar to teaching, learning is likewise situated inasmuch as the learners have to respond to the constraints and affordances set by the teacher (see Greeno and Middle School Mathematics through Applications Project Group 1998) and this influences how students deal with the subject matter. For example, the students may (erroneously) infer from the teacher’s clear and coherent presentation that the subject matter is easy and it is therefore appropriate to reduce mental effort, or they may come to believe that, in mathematics, every problem has a solution because they were never confronted with ambiguous real world problems where some information is always missing and has to be hypothesized by the problem-solver in order to make the problem solvable (Reusser and Stebler 1997).

By looking at teaching and learning from the perspective of communication theory (Clark 1996), the reciprocal-interactive nature of teaching and learning can theoretically be understood as a collaborative process. Teaching and learning are interconnected in a reciprocal and interactive way precisely because, during classroom instruction, the teacher and the students have to collaborate in order to establish a common ground (see Clark 1996) about the meaning of subject matter. In this collaborative process, the role of the teacher is to make contributions that enable and facilitate students’ construction of knowledge, whereas the students’ role is to engage in such knowledge construction processes through social interaction with the teacher and peers. It should be noted that teacher and students negotiate not only the meaning of subject matter concepts and principles, but, on a meta level, also what the subject is all about (e.g., whether mathematics requires argumentative reasoning or simply practice of step-by-step procedures, see Richland et al. 2012; Weinhuber et al. 2019). For the social construction of knowledge to become successful, teacher and students need to continuously coordinate their individual perspectives on the subject matter and the enacted classroom social practice both on a verbal and non-verbal level (Clark and Wilkes-Gibbs 1986; Weinhuber et al. 2019).

An Emerging Research Paradigm

Irrespective of the situated, complex, and reciprocally interactive nature of teaching and learning, it has been the promise of educational research that characteristics of teacher expertise (i.e., “input factors”, see Scheerens 2015) can be identified that enable teachers to influence students’ learning in a systematic and positive way. On this view, desirable processes can be measured in terms of raised instructional quality, such as student engagement and cognitive activation, which eventually lead to outcomes in terms of raised achievement gains. Large-scale studies using advanced statistical methodologies, such as structural equation modeling, provided valuable insights into the dimensional structure of teacher knowledge and its impact on measures of instructional quality and student achievement (Baumert et al. 2010; Hill et al. 2005; Krauss et al. 2008; Kunter et al. 2013; Voss et al. 2011). However, these studies typically adopted a bird’s-eye view by correlating inter-individual differences in teacher knowledge with ratings of teaching quality (e.g., cognitive activation, classroom management) and student achievement scores. Therefore, they contribute little to our understanding of the underlying cognitive processes that enable a teacher to effectively deal with the complexity of classroom interaction and facilitate the social construction of knowledge. In this respect, the seven contributions to this special issue offer a promising avenue. One might even say that in these contributions, a new research paradigm takes shape that brings together several different and originally separate strands of research:

  1.

    In research on professional development in teacher education, the concept of professional vision, originally developed by Goodwin (1994), has increasingly received attention in the last 15 years stimulated by Sherin’s and van Es’s (2005; see van Es and Sherin 2008) work on video clubs in teacher professional development. The basic assumption underlying this research is that teachers can develop their teaching competencies through the continuous and collaborative study of video records of their own as well as colleagues’ classroom teaching (Kleinknecht and Schneider 2013; Seidel et al. 2011). Following Sherin and van Es (2005), teachers can develop, in these video clubs, their awareness of students’ learning processes during classroom instruction through noticing and reasoning about the noticed events. Researchers in the field of professional vision further argue that there is a direct link between teachers’ professional vision and their ability to act effectively in the classroom (Kersting et al. 2012; Weber et al. 2018). In this respect, the ability to envision potentially alternative courses of action in a given teaching situation is regarded as crucial (i.e., “How would you have acted if you had been in the position of the teacher in the video?”, see Kersting et al. 2012; Kleinknecht and Gröschner 2016). Hence, the ability to envision as well as to weigh up the pros and cons of alternative pedagogical actions when faced with a problematic teaching situation seems to be a core feature of teaching competence. Building on the concept of professional vision, standardized assessments using authentic or staged video clips have been developed to measure teachers’ ability to competently evaluate critical teaching events (Gold and Holodynski 2017; Stürmer and Seidel 2017).

  2.

    The focus on visual perception (i.e., professional vision) in teacher professional development parallels classic research on expertise within the field of cognitive science. In a seminal study, De Groot (1965, orig. 1946) found that chess masters, in contrast to chess players below master level, were able to almost perfectly reproduce from memory complex chess configurations they had been shown for only a few seconds (see also Chase and Simon 1973). Similar to de Groot, Berliner (1992) showed expert and novice teachers photographs of classroom situations for a few seconds and asked them to describe what they saw. Berliner found that the expert teachers’ descriptions were strongly driven by their knowledge schemas about classroom situations (so-called curriculum scripts, see Putnam 1987), whereas the novices simply described details of what they saw rather than interpreting the depicted situations. In the domain of medical image perception (e.g., X-ray and ultrasound pictures), there is considerable evidence for the so-called rapid holistic processing account (see Sheridan and Reingold 2017). Following this account, medical experts typically extract a global impression of an image almost instantly (e.g., of an X-ray picture). It is assumed that in forming this impression, medical experts compare the contents of the image with their stored knowledge schemas about the visual appearance of normal and abnormal medical images. This global impression enables them to identify perturbations, that is, deviations from the expert’s schemas that indicate possible pathogenic processes (see Sheridan and Reingold 2017). In contrast to experts, novices have not yet acquired this rapid holistic mode of visual perception and are therefore limited to a slow “search-to-find” mode. In summary, cognitive research on expertise has from its very beginning focused on the differences between experts and novices in the categorical perception of professionally relevant situations. This opens up a broad array of potential synergies with the current focus on professional vision in the domain of teacher professional development.

  3.

    From a methodological point of view, advanced eye-tracking technologies may substantially help to unfold these synergies. Eye-tracking has been widely used in cognitive and educational psychology for several decades, for example, to investigate grammatical parsing processes in sentence comprehension (Konieczny and Döring 2003), or to study students’ integration of static and dynamic visualizations with expository text in multimedia learning (e.g., van Gog and Scheiter 2010; Mason et al. 2013). While in the beginning, the use of eye-tracking was restricted to the analysis of the processing of static displays, the rapid evolution of ever more powerful eye-tracking machines opened up interesting new avenues for educational research. Thus, it became possible to track people’s eye movements while observing videos of classroom teaching (Wolff et al. 2016; Wolff et al. 2014). The development of dyadic eye-tracking approaches made it possible to study gaze coordination processes during dyadic communication (e.g., “Does Susan’s gaze pattern influence John’s speech?” see Gergle and Clark 2011; Jermann et al. 2011; Richardson et al. 2007). Even more importantly, the availability of mobile eye-tracking glasses in recent years now allows for the study of communicational processes in naturalistic and unconstrained environments such as tutoring and small group scaffolding (see Haataja et al. 2020) and even more complex one-to-many communication settings as in whole-class instruction.

    Hence, with regard to professional vision and teacher expertise, a major promise of mobile eye-tracking technology is to shed light on how expert teachers as compared with novices competently use professional vision as a pedagogical and communicational means. This includes, for example, teachers’ monitoring and directing of students’ attention during whole-class instruction, the non-verbal grounding of their instructional contributions (e.g., casting a questioning glance at the students, see Clark and Brennan 1991), and the socio-emotional regulation of interpersonal relationships (using gaze to signal emotional attachment or distance, see Haataja et al. 2020).

  4.

    Eye-tracking technology may further be fruitfully applied to provide objective measures for certain aspects of instructional quality. Researchers in the domain of educational effectiveness have suggested conceptualizations of instructional quality that converge on three core dimensions (Hamre et al. 2013; Klieme et al. 2009): (a) classroom management that comprises a teacher’s ability to successfully prevent and cope with disruptive behavior of students in order to create a positive work environment that maximizes students’ time on task (Fauth et al. 2014a); (b) cognitive activation that denotes a teacher’s ability to engage students in productive knowledge construction activities, for example, by providing them with challenging tasks or by asking thought-provoking questions. From the perspective of cognitive load theory (Sweller et al. 2011, 2019), cognitive activation can be conceived of as the extent to which a teacher succeeds in engaging students in germane processing, that is, cognitive processes that are productive for learning (see Sweller et al. 2019). (c) Supportive climate refers to the teacher’s ability to give positive and constructive feedback, to see student errors and misconceptions as an opportunity to help students extend their knowledge, and to maintain a caring, appreciative and emotionally positive relationship with the students.

    In empirical studies on instructional quality, these dimensions are typically assessed by collecting ratings from students, teachers, or external observers. Empirical studies showed that agreement among these groups of raters is at best moderate (in the case of classroom management and partly supportive climate), whereas low and even non-significant correlations between rater groups were found for cognitive activation (see Fauth et al. 2014b). These results suggest that even with regard to seemingly easy-to-observe aspects of instructional quality, such as the perceived frequency of disruptions or the level of quietness in the class, students, teachers, and external observers disagree considerably. Hence, methodologically, it could be a fruitful advancement of educational effectiveness research if digital technologies such as eye-tracking or machine learning approaches to visual perception could be used to provide objective measures of aspects of instructional quality such as students’ attentional level and time on task.

Rewards and Challenges of the Seven Contributions of the Special Issue

The seven articles of this special issue on visual perception underlying teaching and learning originate from research groups who have made significant contributions in the past to at least one of the strands of research sketched above and who are therefore well-equipped to bring to bear the full potential of synergies inherent in this new paradigm for educational research. In the following, I will discuss the rewards and challenges of each of the articles. To this end, I will draw on my expertise as an educational psychologist and as a teacher educator with research interests in the cognitive processes underlying teacher behavior and teaching competences.

Seidel, Schnitzler, Kosel, Stürmer, and Holzberger (this issue) have a strong research background in teacher education. They focus on in-service and pre-service teachers’ impression formation regarding distinct types of students with specific cognitive and motivational characteristics (e.g., underestimating student with high knowledge but low self-concept). Their expert-novice study is part of a likewise ambitious, interesting, and challenging endeavor: To what extent are teachers—within the context of whole-class instruction—able to diagnose learning-relevant individual characteristics of students in order to adapt their pedagogical actions accordingly? How can eye-tracking methodology help to identify differences between experienced teachers and novice teachers in such professional vision competencies? Hence, Seidel et al. investigate so-called on-the-fly formative assessment (Shavelson 2006) in a whole-class setting. Generally, formative assessment is regarded as crucial for effective instruction because it enables the teacher to offer instructional and pedagogical support which is adapted to the students’ individual needs (see supportive climate as a core dimension of instructional quality; Klieme et al. 2009).

It is a major strength of Seidel et al.’s study that the authors drew upon distinct student profiles, which they had validated in extensive previous research, for designing their professional vision video task. Overall, the authors found partial support for their hypotheses. Especially with regard to the profiles of the underestimating, overestimating, and uninterested students, experienced teachers were more accurate in diagnosing these profiles in the video than novices. The eye-tracking data additionally revealed that the experienced teachers spent more fixations on the profiles of the uninterested and underestimating students, but, contrary to the authors’ expectations, not on the overestimating student.

The study alludes in some respect to the holistic processing account discussed above with regard to visual expertise in medicine (see Sheridan and Reingold 2017): Are experienced teachers, similar to medical experts, able, or could they be enabled with some deliberate practice, to rapidly form a global impression of individual students’ overt behavior in order to diagnose hidden learning-relevant student characteristics? The empirical evidence provided by Seidel et al. is rather tentative, and there are substantial differences between the domain of clinical medicine and the domain of teaching and learning. For example, for the visual inspection of an ultrasound picture, there are relatively concise and unequivocal pictorial codes that define what a healthy liver without any signs of fibrosis looks like. In contrast, diagnostic judgments regarding covert psychological characteristics of individual students in a whole-class setting are clearly more uncertain because the possibilities to ascertain the hidden meaning of an individual’s communicational contributions by grounding are very limited due to the one-to-many communication setting (see Footnote 1). Hence, in order to prevent teachers from simply succumbing to an actor-observer bias (see Footnote 2; Jones and Nisbett 1971), they should be informed about the uncertain nature of informal impression formation. Additionally, they should be encouraged to see their informal impressions of students during whole-class instruction as an initial and preliminary hypothesis that needs to be carefully examined, that is, either verified or falsified, by subsequently switching into a slow search-to-find mode (similar to medical experts’ differential diagnoses), for example, in a one-to-one diagnostic interview with the particular student (White and Gunstone 2014).

Wyss, Rosenberger, and Bührer (this issue), like Seidel et al. (this issue), have a background in research on teacher education. Similar to Seidel et al., they used an expert-novice comparison design. However, interestingly, their expert participants were not simply experienced teachers but a group of teacher educators (i.e., “second order practitioners” as Wyss et al. call them), which is quite an exceptional and rarely studied group of expert teachers. A further strength of Wyss et al.’s study is their use of the critical incident technique. More concretely, Wyss et al. had their participants watch a video clip that featured an incident which the researchers viewed as significant from the perspective of pertinent pedagogical knowledge on classroom management. Indeed, disruptive behavior initiated by the teacher themselves has been comprehensively described by Kounin in relation to the “smoothness-dimension” in his seminal book on the principles of effective classroom management (Kounin 1970). Ever since, disruptive behavior by the teacher themselves has been treated as a prime example of dysfunctional teacher behavior in practice recommendations on classroom management for novice teachers. Given the prominence of the teacher behavior featured in Wyss et al.’s video clip, it is quite surprising that only six out of the twenty-eight teacher educators interpreted the critical incident as expected by the researchers. Wyss et al. (2020) speculate that this might be due to their prompting of the participants to simply tell what they saw in the video instead of asking for an interpretation of the scene.

On the other hand, the eye-tracking data Wyss et al. collected showed marked and unequivocal differences in fixation times on the protagonist student in the video clip between the six teacher educators who mentioned the disruptive teacher behavior and the vast majority of participants (irrespective of being teacher students or teacher educators) who did not. Hence, combining the verbal protocols with the eye-tracking data suggests that none of the teacher students and only a minority of the second order practitioners interpreted the critical incident as theoretically expected by the researchers.

One possible conclusion from these results is that pertinent evidence-based knowledge about effective classroom management regarding Kounin’s dimension of smoothness is not standard knowledge and common ground in the population of teacher educators out of which Wyss et al. recruited their expert participants. Alternatively, the results might rather indicate how difficult it is even for proficient teacher educators—who are generally familiar with the kind of situations depicted in the video—to decipher the meaning of classroom communication episodes in which one had not been involved as an active participant and is thus compelled to reach a conclusion in the communicative role of a coincidental “eavesdropper” (see Schober and Clark 1989).

Similar to the research teams discussed before, Pouta, Lehtinen, and Palonen (this issue) have a focus on teacher education, but also a strong background in learning and expertise research. In their naturalistic eye-tracking study, several teacher students and experienced teachers wore mobile eye-tracking glasses while teaching a lesson on fractions. The lessons were also videotaped and used to collect retrospective recall protocols from the participants afterwards. The results were quite surprising and counterintuitive: Compared to the experienced teachers, the teacher students provided significantly more advanced interpretations of students’ mathematical thinking in the retrospective recall protocols, whereas the experienced teachers’ reasoning remained mostly on a descriptive level, or they described how students typically operate rather than focusing on an individual student’s thinking.

On the other hand, the coding of the lesson videos clearly showed that the experienced teachers were far more successful than the teacher students in providing effective instructional support for students’ understanding of fractions. The eye-tracking data regarding teachers’ attempts to establish shared attention with the students provided additional evidence for this finding inasmuch as experienced teachers indicated more shared attention in fraction-understanding supportive episodes, whereas student teachers indicated more shared attention in fraction-understanding non-supportive episodes. These results show that experienced teachers and novice teachers differ in their ability to scaffold students’ mathematical thinking effectively.

At the same time, and this is the surprising part of Pouta et al.’s findings, effective instructional support by the experienced teachers was not associated with a superior ability to reason about individual students’ mathematical thinking retrospectively. Evidently, there was a dissociation between the quality of the teachers’ realized scaffolding of students’ thinking and their ability to consciously reason about the students’ thinking afterwards. This paradoxical finding casts some doubt on a fundamental theoretical assumption underlying the notion of professional vision as currently favored in teacher education research (see Blömeke et al. 2015). According to this assumption, “accurate noticing and knowledge-based reasoning […] are needed in order to transfer a teacher’s pedagogical content knowledge into accurate instructions” (see Pouta et al. 2020). In fact, Pouta et al.’s results call this mediator role of professional vision as “a system bridging knowledge and practice” into question. I will come back to this issue in the final subsection of this discussion paper.

Haataja, Salonen, Laine, Toivanen, and Hannula (this issue) have a research background in the psychology of mathematics education. In their naturalistic eye-tracking study, Haataja et al. (2020) employed multiple-person mobile eye-tracking to investigate how teachers and students used non-verbal communication through eye contact to coordinate their individual perspectives and regulate their interpersonal relationships. The study is interesting and innovative because the researchers tapped into the micro-level regulation processes of social interaction during classroom instruction by assessing the interactants’ non-verbal gaze behavior and the situationally varying qualities of their interpersonal relationships (i.e., agency and communion, see Kiesler 1983) simultaneously. This methodologically sophisticated research strategy of triangulating data on gaze behavior with continuous ratings of agency and communion makes it possible to analyze how agency and communion were reflected in teachers’ and students’ patterns of gaze behavior and how teachers and students used gaze behavior to express agency and communion. The results showed that both teachers and students used their gaze to take initiative in their interactions with each other. Teachers especially used gazes to express high agency, for example, in directing students’ attention to specific aspects of the learning material. On the other hand, with “friendly eye-contacts,” teachers also expressed communion and thereby encouraged students to take initiative in the problem-solving process. These results might be fruitfully used for teacher trainings that aim to develop teachers’ non-verbal communication skills. Given the naturalistic design of the study, however, the relative contribution of such non-verbal gaze behavior to the effectiveness of the regulation of the interpersonal relationship between teacher and students remains an open question. 
For example, how important is non-verbal behavior compared with other pedagogical means (e.g., articulating interest in the students’ learning progress, see Ryan and Deci 2000) teachers can draw upon to give students a sense of belonging? Also, is non-verbal gaze behavior a communicational skill that has to be developed by deliberate practice or is it rather a primary socio-cognitive ability as suggested by evolutionary psychology (see Geary 2008) that humans naturally acquire during ontogenesis? These are open questions that could be addressed by future research.

Whereas Haataja et al. (2020) employed mobile eye-tracking glasses to investigate how teachers and students used their gaze behavior to regulate agency and communion during mathematics problem-solving, Rosengrant, Hearrington, and O’Brien (this issue) applied a mobile eye-tracker to study students’ on-task vigilance and vigilance decrement in teacher-centered instruction within higher-education classrooms. Rosengrant et al.’s goal was to empirically examine a widespread educational myth, according to which students’ ability to follow a lecture attentively typically depletes after only 15 to 20 min (e.g., Chaney 2005). Accordingly, advice literature on teaching often recommends not extending the presentation of content beyond this time span because of the students’ limited attentional capacity. To investigate empirically to what extent such recommendations are justified, Rosengrant et al. conducted a methodologically sound longitudinal study in which a sample of university students wore mobile eye-tracking glasses repeatedly during a varying number of classes over longer periods of time. The authors meticulously coded on a second-by-second basis the students’ off-task and on-task attention behavior based on the eye-tracking data. They found that students spent on average almost 89% of class sessions lasting 70 min on-task, which is very high. When plotting students’ on-task behavior on a minute-by-minute basis over the whole classroom session, it was found that students’ vigilance level reached 90% and higher a few minutes after the beginning and stayed on that high level for the rest of the session.
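The aggregation step described above, from second-by-second codes to an overall on-task share and a minute-by-minute profile, can be illustrated with a minimal sketch. This is not Rosengrant et al.'s actual analysis pipeline: the function names, the boolean coding scheme (True for on-task, False for off-task), and the toy data are assumptions made purely for illustration.

```python
from typing import List, Sequence


def on_task_profile(codes: Sequence[bool], bin_seconds: int = 60) -> List[float]:
    """Share of on-task seconds within each consecutive time bin.

    `codes` holds one hypothetical code per second (True = on-task).
    """
    if bin_seconds <= 0:
        raise ValueError("bin_seconds must be positive")
    profile = []
    for start in range(0, len(codes), bin_seconds):
        window = codes[start:start + bin_seconds]
        profile.append(sum(window) / len(window))
    return profile


def overall_on_task(codes: Sequence[bool]) -> float:
    """Share of on-task seconds over the whole session."""
    if not codes:
        raise ValueError("codes must not be empty")
    return sum(codes) / len(codes)


# Invented toy session of three minutes: the first 30 seconds are coded
# off-task, the remaining 150 seconds are coded on-task.
toy_codes = [False] * 30 + [True] * 150
minute_profile = on_task_profile(toy_codes)
overall = overall_on_task(toy_codes)
```

Plotting such a profile over a full session would correspond to the minute-by-minute vigilance curve described in the text; how the original study derived its codes from the gaze recordings is not reproduced here.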

These results clearly refute the widespread belief that students’ attentional resources deplete after only 15 or 20 min (see Chaney 2005). They show that adult students have no problem attentively following lectures and classroom sessions of more than 1 h, provided that sufficient opportunities for active participation and interaction with the teacher are available. Rosengrant et al.’s results are of clear practical relevance regarding course design in higher education. Furthermore, it could be interesting to extend Rosengrant et al.’s eye-tracking approach to the investigation of students’ engagement in watching instructional videos in online learning environments such as MOOCs. Recent findings by data mining analyses of MOOCs suggest that student engagement is highest for short videos of no more than 6 min of length and sharply drops below the 50% watch-time for videos exceeding this length (Guo et al. 2014). These findings are remarkably consistent with the average high-level vigilance time spans Rosengrant et al. (this issue) found in their study, but as Guo et al.’s results were based only on log-file data, using an eye-tracking approach might lead to more reliable and valid results.

It could be even more fruitful to look at Rosengrant et al.’s eye-tracking approach to students’ attention from the perspective of cognitive load theory (Sweller et al. 2011, 2019) and the recent effort monitoring and regulation framework (EMR) suggested by de Bruin et al. (in press). A core idea of both theories is that the mental effort a learner invests during learning can be conceived of as the learner’s deliberate allocation of attentional resources in the processing of learning materials. Thus, it could be interesting to investigate experimentally how, for example, manipulations of the complexity of the learning material (i.e., intrinsic load) or the instructional design (i.e., extraneous load) would be reflected in patterns of students’ vigilance decrement in ecologically valid learning environments such as the classrooms investigated by Rosengrant et al. (this issue). Hence, Rosengrant et al.’s approach might offer an innovative objective approach to measuring cognitive load (i.e., invested mental effort, see de Bruin et al., in press) in authentic learning environments.

Goldberg and colleagues’ study (this issue) is part of an interdisciplinary research project in which researchers from educational science, psychology, and computer science collaborate. The goal is to develop an artificial intelligence system that allows for the automatic assessment of student engagement during classroom instruction. Goldberg et al. report in their paper results from a gold standard study in which they compared measures of student engagement generated by a machine learning algorithm with fine-grained manual ratings made on a second-by-second basis. The paper is theoretically interesting because Goldberg et al. went one step further than Rosengrant et al. (this issue) and sought to conceptualize as well as to measure students’ engagement and not merely students’ attention.

Following Goldberg et al., student engagement is a more complex concept which includes attention as one aspect but also comprises internal cognitive and motivational processes. The authors also relate their notion of student engagement closely to Chi’s ICAP-framework (see Chi and Wylie 2014), according to which high levels of student engagement are manifested in constructive and co-constructive learning activities such as reflecting out loud and explaining content to peers (see Goldberg et al. 2020; Fig. 1; Chi and Wylie 2014). From the perspective of cognitive load theory, Goldberg et al.’s notion of student engagement closely parallels Sweller et al.’s (1998, 2019) notion of germane processing, that is, the mental effort learners invest in genuine knowledge construction processes. On an operational level, Goldberg et al.’s manual rating scale mirrors this cognitive conceptualization of student engagement because raters judge the extent to which students engage in productive overt learning activities such as explaining, or rather show disengaged or even disruptive off-task behavior.

The machine learning algorithm, in contrast, was trained to detect student engagement from more surface-related behavioral features, including gaze direction, head posture, and facial expressions. Interestingly, the authors additionally included a measure of synchrony between neighboring students, given that co-constructive (i.e., interactive) learning behavior should be manifested in synchronized behavior of collaborating students. Overall, the study yielded promising results showing a moderate to high correlation of the machine learning measure (combining head pose, gaze features, and neighbor synchrony) with the manual ratings of student engagement. The manual ratings, however, were more successful than the machine learning measures in predicting students’ self-reported cognitive engagement and situational interest.

In summary, Goldberg et al.’s machine learning approach to measuring student engagement during classroom instruction offers an automated and therefore very economical alternative to the time-consuming and laborious manual ratings by human raters. Hence, this approach seems to be particularly promising for empirical research on instructional quality and educational effectiveness (see Hamre et al. 2013; Klieme et al. 2009). Indeed, Goldberg et al.’s notion of student engagement nicely complements Klieme et al.’s notion of cognitive activation. Cognitive activation is a core dimension of instructional quality representing a teacher’s ability to offer challenging tasks and stimulating ways of interacting with the subject matter and with peers. Goldberg et al.’s notion of engagement, on the other hand, captures the extent to which students actually make use of the available learning opportunities.

From a psychological perspective, however, the added value of the concept of student engagement may be limited given that powerful evidence-based theoretical frameworks on students’ learning processes have been established, especially within cognitive load theory and self-regulated learning theory (see Sweller et al. 1998, 2019; de Bruin et al., in press; Nückles et al. 2020). The strength of the latter frameworks is that they offer a detailed picture of the micro-level processes of learning. Dimensions of instructional quality such as cognitive activation or student engagement are clearly less revealing in this respect because their strength lies rather in allowing the researcher to capture the overall quality of learning achieved in a classroom in a parsimonious and reductive way. Against this background, Goldberg et al.’s approach to student engagement may be especially promising for macro-level (i.e., “bird’s eye”) research on educational effectiveness but possibly less so for micro-level research on students’ internal cognitive processing.

Wolff, Jarodzka, and Boshuizen (this issue) describe in their paper a theoretical model of the cognitive differences between expert and novice teachers regarding classroom management skills. The authors’ primary goal is to explain the distinct cognitive processes that enable expert teachers to more successfully perceive and manage critical events in the classroom than novice teachers. The model is valuable because an analysis of the structure and development of teaching competencies from a cognitive-psychological perspective is rare in teacher education research (for exceptions see Berliner 1988, 2004; Bromme 2014; Leinhardt and Greeno 1986). Similar to Lachner et al. (2016) and Putnam (1987), Wolff et al. (this issue) place the notion of script at the center of their theory (see Schank and Abelson 1977). Lachner et al. (2016) adopted Putnam’s notion of curriculum script to generally refer to the knowledge structures that enable proficient teachers to understand classroom situations and to act pedagogically in a competent way. In a similar, but at the same time more specific, vein, Wolff et al. (this issue) use the term “classroom management script” to refer to the specific knowledge structures that allow proficient teachers to establish a productive work climate in the classroom by preventing and managing off-task or disruptive behavior effectively. Following Wolff et al.’s definition, a classroom management script comprises enabling conditions that specify factors of the classroom situation, such as the extent to which the transition from one phase of a lesson (e.g., the teacher’s presentation) to the next phase (small-group work) goes smoothly. Pedagogical consequences, the second major script component, refer to student behavior such as the proliferation of off-task behavior and disengagement resulting, for example, from the lack of smoothness and momentum (see Kounin 1970). 
The third script component comprises possible alternative courses of action allowing the teacher either to prevent or to counteract the undesirable consequences emanating from particular enabling conditions. Wolff et al. argue that, compared to novices, experienced teachers not only possess a larger number of such classroom-management scripts but their scripts are also richer (i.e., more saturated with professional experience) and more integrated (i.e., more coherent and more inter-related). It is this difference in the quality of the cognitive organization of experienced teachers’ classroom-management scripts that enables them to anticipate and understand critical classroom events more rapidly and adequately than novices and to prevent or counteract undesirable pedagogical consequences more effectively and efficiently.

Following Wolff et al. (this issue), such classroom management scripts evolve through “exposure to numerous and varied experiences in the classroom.” At first glance, this reads as if teachers develop classroom management expertise in a completely inductive way by inferring general pedagogical principles from their teaching experiences. However, Wolff et al. also contend that “through experience, teachers blend formal professional knowledge with personal and practical knowledge.” Hence, teacher expertise in classroom management develops rather through the fusion of formal pedagogical knowledge, acquired during academic teacher education, with generalized experiences made during teaching (see Bromme and Tillema 1995). This contention, however, has to be questioned because most teacher education programs focus strongly on the acquisition of content knowledge (i.e., scientific knowledge about the subjects to be taught), but far less on the acquisition of general pedagogical knowledge (e.g., scientific knowledge about classroom management) and pedagogical content knowledge (i.e., scientific pedagogical knowledge about how to teach a subject, see Borko and Putnam 1996; Shulman 1986).

Hence, we have a situation in teacher education in which novice teachers enter professional life with rather superficial and fragmented knowledge (see Ohst et al. 2015) about important pedagogical methods such as principles of classroom management. Accordingly, it is questionable to what extent novice teachers, without elaborated and well-organized scientific pedagogical knowledge, can make use of their ample experiences in teaching to develop powerful and expert-like classroom management scripts.

Indeed, recent experimental studies on teachers’ instructional skills suggest that, for example, with regard to the provision of instructional explanations, experienced mathematics teachers showed the same bias towards procedural instead of more effective principle-oriented explanations as teacher students (Lachner and Nückles 2016; Lachner et al. 2019; Weinhuber et al. 2019). Similarly, regarding mathematics teachers’ ability to accurately predict the difficulty of mathematical tasks for students, Wagner et al. (2020) found that experienced mathematics teachers were not better than teacher students because they lacked knowledge about core instructional design principles (e.g., spatial contiguity) derived from cognitive load theory (see also Hellmann and Nückles 2013). These findings, of course, do not refute Wolff et al.’s theory of classroom management scripts, but they call into question the assumption that the development of pedagogical expertise can be modeled in analogy to other expertise domains such as academic medicine or psychotherapy, where much more professionalized and coherent curricula have been established as compared with the vast majority of teacher education programs (see Darling-Hammond 2006).

Critical Remarks on a Promising Research Paradigm and Avenues for Future Research

In this final subsection, I would like to critically discuss some theoretical assumptions that seem to be tacitly shared by the contributors to this special issue and that are inherent in the notion of professional vision as it is currently discussed in teacher education research.

  1.

    When reading the papers, I came across the idea several times that proficient teaching is the ability to react appropriately to the sudden and unforeseeable events occurring in a very complex and dynamic classroom. I do not want to deny the complexity and dynamics of classroom interaction. However, I would like to challenge the idea that teaching is mainly reacting to and managing sudden events. This conception of the “reacting teacher” is currently also reinforced by Blömeke et al.’s (2015) model of teaching competence, according to which professional vision is presented as a crucial situation-specific skill that “mediates” between a teacher’s knowledge bases (i.e., content knowledge, pedagogical content knowledge, and general pedagogical knowledge, see Shulman 1986) and their actual teaching performance. This conception, however, neglects the fundamental role of planning in teaching (Reigeluth 2013). Teaching does not start in the classroom but when the teacher sits at home at their desk and plans the lessons for the next morning. It is this planning (i.e., selecting and sequencing learning content in line with the students’ learning prerequisites, orchestrating cognitively challenging learning tasks and forms of social interaction, see Reigeluth 2013) that makes the complexity of classroom interaction predictable and manageable. This does not mean that a teacher never has to modify their plans. Of course, unpredictable things always happen during classroom instruction, but it is precisely their well-thought-out didactic plan that enables proficient teachers to react flexibly if required in a certain situation (see Leinhardt and Greeno 1986). For example, it is much easier to respond to a sudden conflict if the teacher has previously planned and implemented an effective system of behavioral rules and a schedule of reinforcements that ensures a generally high level of compliance by the students.
Or, it is much easier to respond to unexpected comprehension difficulties by students if the teacher has included sufficient buffer time in their lesson plan. Hence, given the significance of careful lesson planning for successful teaching, future research on professional vision could investigate to what extent teachers are able to realize their planned behavior during class instruction. For example, if they had decided to consistently use their gaze as a means of reinforcement (“appreciative gaze”) and punishment (“punitive gaze”), to what extent are they capable of realizing this plan while simultaneously presenting subject matter to the students (see Kounin’s withitness and overlapping, Kounin 1970)?

  2.

    There is still another problem associated with the assumption of professional vision as mediator between a teacher’s knowledge and pedagogical actions. From this assumption it follows that teachers who are better able to notice and reason about classroom events should also show superior teaching performance. Indeed, there are correlational data supporting this contention (see Weber et al. 2018; Kersting et al. 2012). On the other hand, the study by Pouta et al. (2020) in particular showed that the experienced teachers reflected on the videos of their teaching in a less sophisticated manner than the student teachers did. Nevertheless, the experienced teachers were far more successful in scaffolding students’ mathematical thinking than the student teachers. This result suggests that proficient teachers sometimes can do more than they can tell!

    How can this puzzling result be theoretically explained? Boshuizen and Schmidt’s (1992) theory of knowledge encapsulation, developed for the domain of medical expertise, offers an explanation for Pouta et al.’s findings. Apparently, the experienced teachers in their study had developed effective ways of scaffolding students’ mathematical understanding. Due to their extensive teaching practice, these pedagogical routines may have become largely encapsulated and automated. Therefore, they were able to enact them in a situationally appropriate and supportive manner, but, when watching the video of their teaching afterwards, they were not fully able to articulate the rationale behind their actions in that particular teaching situation. Recent findings from priming studies provide further evidence that teachers’ awareness of the way they teach may (at least in some situations) be limited (Weinhuber et al. 2019). Accordingly, an interesting question for future research on professional vision concerns situations in which teachers’ conscious representation of their teaching diverges from what they are actually doing in the classroom. Eye-tracking may be a very good methodology to detect such discrepancies between awareness and performance because a teacher’s gaze behavior may be especially revealing with regard to intentions and action plans that are not consciously accessible in a particular teaching situation.

  3.

    A further issue that in my opinion is controversial refers to the type of gaze behavior and noticing that student teachers should learn in teacher education. I am skeptical as to whether expressing agency and communion through non-verbal gaze has to be instructed and practiced in teacher education (see Haataja et al. 2020) because non-verbal communication skills are part of the primary socio-cognitive abilities that humans normally develop naturally during ontogenesis (Geary 2008). The same argument applies to the type of monitoring skills some of the papers in this special issue suggest for teacher training. Thus, I am not convinced that teacher students need to learn to discern whether pupils are concentrated and engaged as suggested by Goldberg et al. (2020). Everyone with a normally developed theory of mind should be able to infer from another person’s facial expression and body posture whether the person is concentrated and engaged or rather disengaged and distracted (e.g., if the students permanently focus on the display of their mobile phones instead of looking at the teacher). On the other hand, whether a student who looks engaged and concentrated is in fact engaging in germane processing (see Sweller et al. 2019) can hardly be inferred from the student’s overt behavior but has to be assessed by asking the student some diagnostic questions (White and Gunstone 2014).

    Furthermore, the ability to effectively monitor students’ off-task and disruptive behavior does not mean that a teacher should be able to detect 100% of the disruptive moves which the students enact during a classroom session. Kounin’s notion of withitness (Kounin 1970) denotes a teacher’s ability to communicate a stance rather than an observational skill. Thus, successful implementation of withitness implies that a teacher has successfully established the mutual belief (see Clark 1996) that s/he is willing and able to detect and to prohibit every single disruptive behavior that occurs in the classroom. Such a transactional or communicational perspective on teaching is indeed adopted by several papers in this special issue (see Haataja et al. 2020; Pouta et al. 2020; Wyss et al. 2020). Hence, in learning to implement withitness effectively in the classroom, the demand on teacher students is mainly communicational rather than observational.

    With regard to withitness, a critical skill is to maintain non-verbal communication with all students while, for example, instructing a subgroup of students or giving a presentation. Therefore, Kounin (1970) explicitly related withitness to “overlapping,” by which he meant the ability to do two things at the same time (i.e., multitasking). Many novice teachers experience the demand of presenting before a larger audience in particular as stressful and frightening. Therefore, they typically tend to avoid eye contact with the audience in order to reduce their anxiety level. Furthermore, non-verbal communication with the audience also requires freely available working-memory resources. Consequently, one should rehearse one’s presentation to the extent that speech production does not absorb all working-memory resources, so that some can be invested in non-verbal communication with the audience (i.e., overlapping). Hence, with regard to the successful implementation of withitness and overlapping, novice teachers have a lot to learn. Given the importance of non-verbal communication, eye-tracking could be a valuable method to investigate teachers’ non-verbal communication with the audience (see also Haataja et al. 2020) and to provide novice teachers with feedback on their success in implementing withitness during presentation.

  4.

    With regard to Wolff et al.’s theoretical paper on classroom management, I have argued that in the domain of teaching, development of pedagogical expertise through professional experience cannot be taken for granted. The reason is that extant teacher education programs offer too few opportunities for deep learning and deliberate practice of scientific pedagogical knowledge (see Bauer and Prenzel 2012; Darling-Hammond 2006). Thus, using an explorative approach to study cognitive differences between novices and experienced teachers will often be of limited value if the goal is to find out what experienced teachers can master and novice teachers should therefore learn. On the other hand, psychology has spawned numerous evidence-based principles with regard to effective instructional design and classroom management. Hence, future research should move beyond explorative expert-novice comparisons to controlled hypothesis-driven designs that investigate to what extent teachers are able to successfully apply evidence-based pedagogical knowledge and how such knowledge can be taught effectively in teacher education. The papers of Seidel et al. (this issue) and Wyss et al. (2020) take important steps towards such a theory-driven approach.

In this discussion paper, I started with a characterization of teaching and learning as situated, complex, and reciprocally interactive activities. I have argued that investigating visual perception processes in teaching and learning with advanced eye-tracking methodologies constitutes an innovative and promising new research paradigm. I also attempted to challenge some theoretical assumptions that, in my view as an educational psychologist and teacher educator, are rather controversial with regard to the notion of professional vision as it is currently discussed in this emerging research community. I do hope that my remarks will help to further advance this promising field of research at the intersection of educational psychology, cognitive science, and teacher education.