1 Introduction

Students’ mathematical thinking has often been chosen as a focus for investigating and developing professional noticing, especially in the US context (e.g., Colestock and Sherin 2015; Jacobs, Lamb, and Philipp 2010). A main reason for this choice is the idea that whether and how teachers attend to, interpret, and respond to students’ thinking in the mathematics classroom is an important factor in teaching quality. Corresponding studies on teacher noticing usually use, at least implicitly, a frame of reference for what teachers are supposed to notice in the sense of high-level professional noticing (so-called “target noticing”, Stockero and Rupnow 2017). However, it is well known that Western and East Asian perspectives on what characterizes high instructional quality in mathematics classrooms differ in many respects (Clarke 2013b; Leung 2001). Hence, it may be assumed that such different norms influence how teacher noticing regarding students’ thinking is operationalized by mathematics education researchers in different cultures, and thus it is questionable whether such research can be cross-culturally valid (Clarke 2013a).

Therefore, it is essential in our intercultural research community to make explicit the culture-specific norms held at the level of experts (scholars in mathematics education), which may influence how teacher noticing is assessed, and to consider these norms when interpreting the corresponding findings.

Consequently, in this contribution, we want to raise awareness of possible influences of different cultural norms of instructional quality on how we assess teacher noticing and propose a way in which such cultural norms can be made explicit. In particular, we illustrate, by means of an example, how expert norms of responding to students’ thinking in a specific classroom situation can be different when viewed from German and Taiwanese perspectives.

2 Theoretical framework

2.1 Teacher noticing

In the last decade, research on teacher noticing has gathered momentum in the international mathematics education community, since situation-specific skills including perception and interpretation of key aspects in complex classroom situations are considered to be important components of teaching expertise (e.g., Blömeke et al. 2015; Kaiser, Blömeke, König, Busse, Döhrmann, and Hoth 2017; Schack, Fisher, and Wilhelm 2017). As a result, there is a growing body of research indicating that teacher noticing is indeed a central aspect of the missing link between teachers’ dispositions (knowledge and beliefs) and their performance in the sense of instructional quality (for an overview, see Stahnke, Schüler, and Rösken-Winter 2016).

Although different conceptualizations of teacher noticing can be found (see theme 1 of this issue), it essentially comprises aspects of perception and interpretation of relevant features of instructional situations (Sherin, Jacobs, and Philipp 2011). Hence, in line with the conceptualization by Sherin (2007, 2017) and the underlying theoretical approach by Goodwin (1994), we understand teacher noticing as including the following components:

  • attending to aspects of classroom situations that are relevant for instructional quality (selective attention) and

  • interpreting them by drawing on professional knowledge and beliefs (knowledge-based reasoning).

Since it is well known that “perception is as much a top-down as a bottom-up process” (Sherin 2017, p. 403), we do not consider a separable sub-facet ‘perception’, as is the case in some other prominent frameworks such as the “Perception, Interpretation, and Decision-making (PID) model” (Kaiser et al. 2017). Instead, we understand the processes of attending and interpreting as being interrelated, cyclical, and interacting in a dynamic manner. In particular, “teachers necessarily interpret what they see, relating observed events to abstract categories and characterizing what they see in terms of familiar instructional episodes” (Sherin et al. 2011, p. 5). Moreover, what teachers attend to is influenced by what they expect, how they interpret the overall situation, and the categories they have in mind (Clarke, Mesiti, Cao, and Novotná 2017; Sherin 2017). Based on the well-known work by Jacobs, Lamb, and Philipp (2010), much research in the field has included a third process in its conceptualization of teacher noticing: “deciding how to respond” or, more generally, “decision making” (e.g., Kaiser et al. 2017), which is certainly also an important aspect of teaching expertise. However, in line with Sherin (2017), we do not think that deciding is as tightly integrated into perception as interpreting is, and thus we do not see the need to further increase the complexity of the construct by including deciding.

Often, studies on teacher noticing specify certain aspects of instructional quality that are to be considered (e.g., Stürmer and Seidel 2017). This means either that the focus is on a specific aspect, such as responding to students’ mathematical thinking (e.g., Jacobs et al. 2010), or that a certain model of instructional quality is taken as a reference: for instance, Mitchell and Marin (2015) employed the Mathematical Quality of Instruction framework by Hill et al. (2008) to investigate pre-service teachers’ noticing. This framework proposes several aspects to characterize subject-specific instructional quality, including mathematical explanation and justification, interpreting students’ mathematical utterances, the use of multiple representations, and linking among representations (Hill et al. 2008, p. 437).

2.2 Assessing teacher noticing

Since investigating teacher noticing involves challenges and corresponding decisions regarding the methodology (Jacobs 2017; Sherin and Russ 2011), many different operationalizations of the construct exist in the growing body of research (see theme 2 of this issue). However, it is widely accepted that vignettes in the form of videos, comics, or transcripts can be used to represent a specific instructional situation (e.g., Friesen 2017). Such vignettes are also called representations of practice as they provide specific practical classroom scenarios in order to elicit teacher noticing (e.g., Herbst and Kosko 2014; Stürmer and Seidel 2017). While some research on teacher noticing aims merely to describe what teachers notice (e.g., Colestock and Sherin 2015), most studies implicitly or explicitly use a normative frame of reference of what teachers should notice in order to demonstrate teaching expertise (e.g., Choy, Thomas, and Yoon 2017; Stockero and Rupnow 2017). Our project is based on the latter approach, which resonates with an understanding of teacher noticing as a key component of teaching expertise that is trainable and can be measured (e.g., Fernández and Choy 2019).

A common ‘operational trick’ for assessing teacher noticing is to design or select representations of practice in which something occurs that does not meet the expectations of ‘good’ teaching, that is, they include a breach of a norm (e.g., Dreher and Kuntze 2015; Herbst and Kosko 2014), which is sometimes also called a critical incident (e.g., Wyss, Rosenberger, and Bührer 2020). As suggested by Herbst and Chazan (2011), the notion of a norm is used here “in the sociological sense as the normal or unmarked behavior that is tacitly expected in a setting” (p. 411). The teachers’ reaction to such a breach of a norm is then used as an indicator of noticing expertise (e.g., Friesen 2017; Meschede et al. 2017). As Herbst and Chazan (2011) pointed out, the way in which the vignettes are used to investigate teacher noticing in this case corresponds to the ethnomethodological notion of a breaching experiment (Mehan and Wood 1975).

In a study by Dreher and Kuntze (2015), for instance, instructional situations were represented in which the teacher’s instruction involved an unnecessary change of representations that potentially hindered students’ understanding, since the representations were not linked appropriately. Hence, the instruction illustrated by the vignette included a breach of a norm regarding the aspect of instructional quality “linking among representations” (e.g., Hill et al. 2008). In order to show specific professional noticing, the teachers participating in the study were expected to notice this breach of a norm in the sense of indicating that they (1) attended to the change of representations and (2) interpreted this change of representations as being critical based on their professional knowledge.

This example makes it obvious that norms regarding aspects of instructional quality play a double role in teacher noticing research: On the one hand, beyond teachers’ personal dispositions (e.g., individual knowledge and beliefs), cultural norms are also assumed to influence teacher noticing (e.g., Yang, Kaiser, König, and Blömeke 2019). On the other hand, such norms form the frame of reference that is already built into the operationalization by the researchers. In particular, researchers use the consistency of what teachers notice with their own norms as an indicator of noticing expertise (e.g., Stockero and Rupnow 2017; Stürmer and Seidel 2017). Consequently, the results of such research on teacher noticing depend on the researchers’ norms. Closer inspection therefore reveals that it is not clear whether such research can be cross-culturally valid, since such norms may be culture-specific (e.g., Louie 2018; Xu and Clarke 2019).

2.3 Differences between Taiwanese and German perspectives on instructional quality in mathematics classrooms

In international comparative studies such as PISA and TEDS-M, East Asian learners as well as teachers have consistently outperformed their Western counterparts regarding their mathematical knowledge (e.g., Kaiser and Blömeke 2013; Kleickmann et al. 2015; OECD 2014). Such findings raised the interest of the scientific community in comparing mathematics education in East Asian and Western cultures, which led to a better understanding of typical intercultural differences in mathematics classrooms.

In his search for an East Asian identity in mathematics education, Leung (2001) contrasted, for instance, features and their underlying values in East Asian and Western mathematics education by means of six dichotomies: product versus process; rote learning versus meaningful learning; studying hard versus pleasurable learning; extrinsic versus intrinsic motivation; whole class teaching versus individualized learning; and competence of teachers regarding subject matter versus pedagogy. He argued that these distinct characteristics “are based on deep-rooted cultural values and paradigms” (p. 35). For instance, regarding the second dichotomy, Leung (2001) explained that “underlying this dichotomy are different views on the nature of mathematics learning” (p. 40): While the dominant Western conception of constructivist learning entails the view that memorization without understanding is not meaningful learning, in East Asia, memorization before understanding is often considered to play an important role in learning mathematics. Moreover, in view of the examination-driven culture dominant in East Asia, memorization plays an important role (e.g., Lin and Li 2009), as students need to get prepared for tests in which they must solve tasks at a very fast pace. This means in particular that memorizing procedures and algorithms, which can be applied to certain kinds of mathematical tasks, is important. However, from a Western point of view, there is often more emphasis on students’ individual problem solving strategies, and heuristics showing their own mathematical investigations, than on the mastery of universally valid algorithms (e.g., Leung 2001).

Although there is also diversity among Western as well as among East Asian countries (e.g., Clarke 2013a), several studies have revealed that conceptions of ideal mathematics instruction differ especially between East Asian and Western countries (e.g., Bryan, Wang, Perry, Wong, and Cai 2007; Clarke 2013b; Kaiser and Vollstedt 2007), such as Taiwan and Germany.

While traditional mathematics classrooms in Taiwan are characterized by teacher-directed and product-oriented whole-class teaching focusing on clear explanation of the mathematics content (Lin and Li 2009), ideal mathematics instruction in Germany is student-centered and process-oriented (Kaiser and Vollstedt 2007). In particular, this difference suggests that responding to individual students’ thinking might be an aspect of instructional quality that is perceived as being more important in Germany than in Taiwan. However, influenced by Western perspectives, Taiwanese curricula have been reformed several times since 1993 to emphasize the development of students’ mathematical literacy and student-centered instruction (Hsieh 1997). Consequently, what is considered high-quality mathematics instruction in Taiwan today reflects not only traditional perspectives, but is also shaped by Western ideas of constructivist-based instruction, such as discussing students’ solutions as well as focusing on students’ thinking and misconceptions (Hsieh, Wang, and Chen 2020; Lin and Li 2009).

This situation is also reflected in the findings by Felbrich, Kaiser, and Schmotz (2014) regarding pre-service teachers’ epistemological beliefs concerning the nature of mathematics, where they distinguished between individualistic and collectivistic countries following the terminology by Hofstede (1986): In general, pre-service teachers in individualistic countries stressed the dynamic nature of mathematics more strongly than its static nature, whereas the reverse held for pre-service teachers in collectivistic countries; Taiwanese pre-service teachers, however, emphasized both aspects of mathematics to the same extent. Nevertheless, the findings of this study revealed clear differences between Taiwanese and German pre-service teachers’ beliefs regarding the nature of mathematics. In contrast to their Taiwanese counterparts, the German pre-service teachers agreed significantly more with the dynamic perspective (e.g., “In mathematics many things can be discovered and tried out by oneself”) than with the static perspective (e.g., “Mathematics is a collection of rules and procedures that prescribe how to solve a problem”). This finding resonates with Leung’s (2001) considerations regarding his first dichotomy: He emphasized that although mathematics educators from both East Asian and Western countries would say that mathematics is both the product (a body of knowledge with distinctive knowledge structure) and the process (a distinctive way or process of dealing with particular aspects of reality), their positions on the continuum between the two extremes are different. While the contemporary Western perspective is that the process of doing mathematics is more important than the content arising out of the process, East Asian scholars rather believe that ultimately the content and its correctness are fundamental (Leung 2001, p. 39).

Against this background, it can be assumed that such beliefs also influence how mathematics education researchers in Germany and Taiwan perceive different aspects of instructional quality, which are the focus of research on teacher noticing. In particular, regarding the focus of students’ thinking, Colestock and Sherin (2015) pointed out that there are different purposes for attending to students’ mathematical thinking, which may depend on different overarching instructional goals and beliefs. For instance, such different purposes can be diagnosing student errors or misunderstandings that need to be addressed or looking for students’ ideas that have the potential to serve as the foundation for new understandings. In their study, they investigated different purposes for attending to students’ mathematical thinking and found that even within one cultural context the teachers focused on these purposes to various degrees. However, they did not take into account the perspectives of experts in mathematics education or different cultural contexts and thus it is still an open question whether cultural norms of responding to students’ mathematical thinking differ.

2.4 The role of cultural norms for research on teacher noticing

As argued in the previous sections, research on teacher noticing often depends on the norms of researchers and some of these norms may be culture-specific. Thus, it is not clear whether such research can be cross-culturally valid. Currently, the international research community is generating cumulative evidence concerning the construct of teacher noticing (e.g., Stahnke et al. 2016). However, it is rarely explicitly questioned whether these findings are culture-specific or hold true across cultures. A few initial studies have compared teacher noticing across cultures. In particular, Yang et al. (2019) compared noticing expertise of teachers in Germany and China by means of a video-based test instrument developed for the German Teacher Education and Development Study – Follow Up (TEDS-FU). The existing tests were translated into Chinese and the video vignettes were redone by Chinese teachers and their students. Yang, Kaiser, König, and Blömeke (2018) described how this study followed the recommendations for test translations that include adaptation of test materials as well as judgment of linguistic equivalence of original and translated versions (e.g., translation-back-translation procedures, see also Guidelines for Translating and Adapting Tests, ITC 2017). The findings of this study showed different patterns of strengths and weaknesses of German and Chinese teachers and provide empirical evidence that teacher noticing is influenced by different cultural contexts (Yang et al. 2019). In particular, German teachers performed significantly better on noticing aspects regarding general pedagogy and process-oriented mathematical skills, whereas Chinese teachers performed better on aspects regarding mathematical instruction, such as identifying students’ mathematical errors. As this study is pioneering cross-cultural comparative research on teacher noticing, it is very valuable for the field.
However, the study is an example of the phenomenon that research in education is often designed from a ‘Western’ point of view and then used in other cultural contexts. Clarke (2013a) emphasized that this approach is not without difficulties, since the operationalizations that are used to categorize and compare aspects of education are themselves results of cultural value systems that shape researchers’ analyses and conclusions. Such a problem occurred, for instance, in the Chinese adaptation of the research design for the Middle School Mathematics and the Institutional Setting of Teaching (MIST) project: The Instructional Quality Assessment (IQA) instrument by Silver and Stein (1996) was rejected, since the instrument clearly reflected norms for instructional quality of the United States as the authoring culture (Clarke 2013a). For instance, the IQA uses answers to the question “Was there widespread participation in teacher-facilitated discussion?” as a criterion for instructional quality (Silver and Stein 1996).

However, educational research scarcely focuses on such culture-specific differences regarding researchers’ understandings of central constructs. The usual way to deal with such differences in international comparative research is to omit culture-specific aspects based on the evaluation of experts. Against this background, Clarke (2013a) argued that international comparative research often sacrifices validity in the interest of comparability.

Therefore, investigating teacher noticing in a way that is sensitive to different cultural contexts, and exploring the role of cultural norms for teacher noticing, requires researchers to take into account the views of participants at the level of experts in mathematics education (i.e., researchers and teacher educators) as a frame of reference. Making the understandings of experts in different cultures more explicit should in particular include a comparison of their norms regarding instructional quality, in order to determine to what extent these norms overlap (ITC 2017). Furthermore, to consider different cultural perspectives from the beginning, the process of instrument development requires intensive collaboration in an intercultural research team.

3 Objectives

In line with the need for research pointed out in the previous sections, the first objective of this contribution is the following:

  Objective 1. Developing a methodology for the concurrent design of vignettes and for eliciting corresponding expert norms as a prerequisite to investigating teacher noticing in a way that is sensitive to different cultural contexts.

To show how this design can uncover different implicit cultural norms of experts in mathematics education and thus make visible different frames of reference for investigating teacher noticing in different cultural contexts, the second objective of this paper is the following:

  Objective 2. Illustrating by means of an example how expert norms of responding to students’ thinking can be different from German and Taiwanese perspectives.

4 Resulting methodology of concurrent development of vignettes (Objective 1)

To meet objective 1, we designed a concurrent process for developing culturally sensitive vignettes based on the Guidelines for Translating and Adapting Tests (ITC 2017) as an orienting framework. In the following sections, we present the resulting methodology.

As we could not include any bicultural German-Taiwanese mathematics education researcher, English was chosen as the mediating language. An overview of the development process that we designed is shown in Fig. 1 and the phases of this process are elaborated subsequently.

Fig. 1 Overview of the concurrent development process

4.1 Phase 1: reconciliation of the development framework

Aspects of instructional quality. Based on East Asian as well as Western frameworks, we selected three aspects of mathematics teaching quality that are central in both cultures, but also prone to potentially different norms: responding to students’ thinking, use of representations, and use of tasks. In this paper, we focus on responding to students’ thinking.

Mathematical content. We decided to focus on a specific topic to exclude construct-irrelevant variance due to different mathematical content. The topic of linear and quadratic functions and equations was chosen, since it is central in the secondary curriculum (grades 7–9) in Germany and Taiwan. It was also identified as a topic suited to international comparisons in the Teaching and Learning International Survey (TALIS) 2018 video-study.

Design features of the vignettes. As reviewed in Sect. 2.2, teacher noticing assessment is often based on representations of practice in the form of vignettes. We decided to follow the approach by Dreher and Kuntze (2015) using text vignettes that include a breach of a specific norm regarding the focused aspect of instructional quality. While different cultural norms regarding the quality of mathematics instruction were to be disclosed by means of these vignettes, other distinguishing features of mathematics classrooms in Taiwan and Germany should not become visible, for the sake of ecological validity (i.e., valid representations of practice) in both cultures. This is one reason why text vignettes (with pictorial representations) were used instead of videos, which display the individualities of people as well as settings and thus convey the full range of cultural discrepancies. Moreover, text vignettes give designers more control over how to represent a specific breach of a norm, and they can easily be adapted in an iterative process. To further facilitate ecological validity, we identified features that should be avoided in the vignettes because they are uncommon in one of the countries (ITC 2017: PC-3); for example, calculators are uncommon in Taiwan and chorus answers are uncommon in Germany.

4.2 Phase 2: development within cultures

Each national research team developed draft vignettes including a breach of a norm regarding an aspect of instructional quality from their cultural perspective (aiming at 3 per aspect). This breach of a norm was also externalized in one sentence per vignette. The process was iterative and we used published and unpublished materials as resources. After finalizing a first set of drafts, we recruited mathematics educators and teachers for external feedback to get indications of whether the intended noticing processes were enabled by the vignettes in the target populations within the original cultural context (ITC 2017: TD-4, TD-5). The goal of this external feedback from practitioners in mathematics education was to reduce the chance that insufficient quality of the vignettes would hinder teacher noticing regarding the target aspects of instructional quality. The practitioners were asked to read the draft vignettes, answer the noticing prompts, note possible sources of misunderstanding (e.g., ambiguous wording, lack of structural clarity), and assess the ecological validity of the vignettes with respect to secondary classrooms. In a second round, for each vignette, we presented the intended breach of a norm and asked whether they agreed, regardless of what they had noticed before. The responses informed our revision processes. For instance, in Germany it turned out that regarding one vignette focusing on a common misconception of students, the practitioners expected the teacher to react according to a typical approach using a visualization of the equations. Since the teacher in the vignette did not do this, they focused very much on this issue and did not notice the breach of a norm intended by the German researchers, although they agreed with this breach in the second round. Therefore, this vignette was revised such that the dominant issue that the practitioners saw no longer occurred. 
Moreover, the feedback helped to refine the wording of the vignettes (e.g., regarding overly complex student statements). Through this external feedback from practitioners the quality and in particular the ecological validity of the vignettes could be improved within each country.

4.3 Phase 3: exchange between cultures

Subsequently, the vignettes and the description of the included breach of a norm were exchanged in English between the teams and subjected to feedback. Each team checked whether the vignettes from the other team could represent practice in their country. For instance, this was not the case if the specific content was usually not treated in class or if the teaching materials were uncommon. Moreover, the wording of the vignettes was checked and preliminary translations into the third language were made to check whether adequate terms existed in each language. We thus took into account that different cultures use different pedagogical lexicons with respect to teaching mathematics (Clarke et al. 2017). In this step, we also had to tackle difficulties related to the use of English as a mediating language. For instance, the German word “Äquivalenzumformungen” is used for operations on equations that do not change the set of solutions. In Chinese, a similar term (等量公理) is known. However, English appears not to provide a specific term for these operations, so we were able to find equivalent wordings in German and Chinese, but not in the intermediate language. In other cases, paraphrasing expressions could prevent linguistic inequivalence between the German and Chinese texts of the vignette. If necessary and possible, vignettes were revised in an iterative process to represent practice and pedagogical lexicons in both cultures. Otherwise, vignettes had to be discarded or redesigned.
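As a brief illustration of the operations denoted by “Äquivalenzumformungen”, consider the following example, which is our own and is not taken from any of the vignettes. Each step leaves the set of solutions unchanged:

\[
2x + 3 = 7 \iff 2x = 4 \iff x = 2
\]

Subtracting 3 on both sides and dividing both sides by 2 preserve the solution set \(\{2\}\). By contrast, squaring both sides of \(x = 2\) yields \(x^2 = 4\) with the solution set \(\{-2, 2\}\), so squaring is not an operation of this kind.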

Finally, this process yielded 9 vignettes from each country in the local language (and in English), which were considered ecologically valid in both contexts and included a breach of a norm regarding instructional quality from one national team’s perspective, but not necessarily from the other.

4.4 Phase 4: translation processes

In phase 4, the national teams translated the foreign vignettes from English into their language. Again, other mathematics educators were involved to get feedback on whether the translated vignettes were ecologically valid.

Finally, we subjected the German and Chinese versions of all 18 vignettes to a check of linguistic equivalence (ITC 2017: TD-2) by a person with a German–Taiwanese background: Mr. Huang grew up bilingually in Germany and lives in Taiwan today, where his son is educated in a Taiwanese school. He studied in Germany, as well as in the US (business administration) and Taiwan (Chinese language and culture). He also worked in the educational context in Germany (as a teacher for business and economics classes) as well as in Taiwan (English and German language teacher). Since he completed the advanced mathematics course at school (Leistungskurs) and studied in a field related to mathematics, he was mathematically literate with respect to the topic of functions and equations. He was thus considered to have the relevant expertise to evaluate the linguistic equivalence of the test materials (ITC 2017: TD-1). The check led to minor adaptations of both the German and Chinese versions, but largely validated the development process designed for the purpose of this study.

5 Different expert norms of responding to students’ thinking in Taiwan and Germany (Objective 2)

Focusing on objective 2, in the following we show how the approach of this study can uncover different implicit cultural norms of experts in mathematics education. To this end, we use one of our vignettes to illustrate how expert norms of responding to students’ thinking can be different from Taiwanese and German perspectives. Focusing on one specific vignette can certainly not provide generalizable results nor yield the full range of different norms, but it allows us to give detailed insight into experts’ answers as well as our process of extracting different norms. Thus, in some sense, the following can be seen as a ‘proof of existence’, which shows that expert norms playing a role in investigating teacher noticing can be culture-specific.

5.1 Methods

The vignette shown in Fig. 2 was chosen for this purpose, since already during the process of development described above, it became clear that this was one of the vignettes that had the potential to bring to light different expert norms of responding to students’ thinking in the two countries.

Fig. 2 Taiwanese vignette focusing on responding to students’ thinking

The vignette was developed by the Taiwanese research team. Thus, the represented classroom situation contains a breach of a norm of how the teacher responds to students’ thinking from their perspective: The teacher does not address S1’s misunderstanding and inadequate use of strategy (over-generalizing the strategy applicable in the case “\(f \times g = 0\)”) properly, but merely conveys factual knowledge regarding the number of solutions and a standard procedure. (It should be noted that this is the interpretation of the Taiwanese members of our research team and not necessarily in line with the interpretations of researchers from other contexts.)
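To make concrete what over-generalizing the strategy for “\(f \times g = 0\)” can mean, consider the following hypothetical example. It is our own illustration under assumed factors; the equation and the student’s reasoning in the vignette may differ. For a product equal to zero, the strategy is valid:

\[
(x - 1)(x - 2) = 0 \iff x - 1 = 0 \ \text{or} \ x - 2 = 0 \iff x \in \{1, 2\}
\]

For a nonzero right-hand side, setting each factor equal to that value is not justified in general. In \((x - 1)(x - 2) = 6\), the candidates \(x = 7\) and \(x = 8\) obtained from \(x - 1 = 6\) and \(x - 2 = 6\) give products of 30 and 42, whereas expanding yields

\[
x^2 - 3x - 4 = 0 \iff (x - 4)(x + 1) = 0 \iff x \in \{-1, 4\}
\]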

When the German research team saw this vignette, it was hard for them to see such a problem with S1’s answer and the strategy that was presumably used. We assumed that these different perspectives on the student’s thinking would probably not be restricted to our research teams and thus we anticipated specific cultural differences between the perspectives of Taiwanese and German experts in mathematics education.

To investigate whether this was true or whether this was merely a matter of different individual perspectives of researchers in mathematics education, this vignette was presented to Taiwanese and German professors of mathematics education in an online expert survey in their native language (Chinese/German). To recruit these experts, we focused on professors who were active in mathematics education research and also in preparing future secondary mathematics teachers. As we aimed for a sample of 15 experts in each country and assumed a participation rate of at least 50%, in Germany, a random sample of 30 professors, out of the full list of persons meeting these criteria, was contacted. In Taiwan, these criteria yielded a list of only 32 professors and thus all of them were contacted. In total, a sample of \(n_1 = 19\) Taiwanese professors (6 female, 13 male) from 10 universities and a sample of \(n_2 = 19\) German professors (5 female, 14 male) from 14 universities worked on the vignette (completion rates were TW 59%, GER 63%). Besides being active in research in mathematics education, some of them had also conducted research in mathematics (TW 5, GER 6). Moreover, besides educating mathematics pre-service teachers, most of them also had experience as school teachers (TW 14, GER 17). To capture the experts’ frame of reference for investigating teacher noticing regarding students’ thinking, the experts were given the same open-ended prompt that would be used to assess corresponding teacher noticing: “Please evaluate how the teacher responds to students’ thinking in this situation and give reasons for your answer.” We decided to use “evaluate” in combination with “give reasons” in the prompt to make sure that the experts did not merely describe what they noticed. Since we aimed at identifying the experts’ norms of responding to students’ thinking, it was important to know how the experts evaluate what they notice.
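The reported completion rates follow directly from the numbers of contacted and participating professors given above. The following minimal sketch reproduces this arithmetic; the function name `completion_rate` and the dictionary layout are our own illustrative choices and not part of the study’s materials.

```python
def completion_rate(completed: int, contacted: int) -> int:
    """Return the completion rate as a percentage rounded to a whole number."""
    return round(100 * completed / contacted)


# Numbers of contacted and participating professors as reported in the text.
contacted = {"TW": 32, "GER": 30}
completed = {"TW": 19, "GER": 19}

# Completion rate per country.
rates = {country: completion_rate(completed[country], contacted[country])
         for country in contacted}

print(rates)  # {'TW': 59, 'GER': 63}
```

Both samples exceed the targeted 15 experts per country, and both rates lie above the assumed participation rate of 50%.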

Accordingly, we analyzed the experts’ evaluations concerning two main aspects:

  1. Did they see some breach of a norm regarding how the teacher responded to S1’s thinking? And if so:

  2. Which norm was breached from their perspective?

To enable the whole research team to engage in the coding of all experts’ answers and to compare them directly across cultures, all answers were translated into English by the first and the fourth author. These translations were reviewed critically by the other authors. As in the process of translating the vignettes, the external trilingual person Mr. Huang compared the translations from Chinese to English with those from German to English in detail and pointed out possible alternative translations, which were subsequently discussed. Both language versions were used in parallel during the coding processes.

The coding process consisted of two phases, namely, the development of the coding scheme and its application.

Development. First, we sought to identify possible alternative norms that were breached from the experts’ perspectives. To this end, each member of the research team went through all of the experts’ answers, looking for indications of perceived breaches of a norm regarding how the teacher responded to S1’s thinking (indicator: the way the teacher responded to S1’s thinking was evaluated as insufficient/inadequate). Subsequently, we analyzed why the teacher’s responding to S1’s thinking was evaluated negatively in order to deduce which norm participants felt was breached. Specifically, this meant that the different reasons why participants thought the teacher’s response was inadequate were coded. On the one hand, we examined whether the experts pointed out the reason that corresponds to the breach of the norm that the Taiwanese researchers who developed the vignette had integrated (Code: “S1’s misunderstanding or inadequate use of strategy is not addressed properly”). On the other hand, reasons indicating a different norm were extracted inductively from the experts’ answers. After similar reasons had been aggregated, corresponding breaches of possible alternative norms were first discussed within the national research teams and then in the full research team. In view of our aim to identify culture-specific or intercultural norms of instructional quality rather than individual teachers’ beliefs, reasons that were mentioned only by a few individual experts were not considered to indicate a possible alternative norm, and thus they were not included in the analysis using the resulting coding scheme. For instance, three experts (2 from GER, 1 from TW) pointed out that the teacher’s response was inadequate, since instead of answering S1’s question him- or herself, he or she should have asked other students (e.g., S2) to do this. As this reason was mentioned by only three participants, it was not included in the coding scheme for the second phase. However, one other reason corresponding to a possible alternative norm appeared more often and was thus included (see results).

Application. In the second phase, each author applied the resulting coding scheme independently to all answers. Corresponding to the two main aspects mentioned above, this top-down coding process included a dichotomous coding of whether the expert saw some breach of a norm regarding how the teacher responded to S1’s thinking. If so, it was coded why the teacher’s responding to S1’s thinking was evaluated negatively, as an indicator of which norm was breached from the expert’s perspective (i.e., the codes were reasons why participants thought the teacher’s responding to S1’s thinking was inadequate).

Again, the codings were first compared within the national research teams and discrepancies were resolved through discussion. Then, these national coding results were compared and yielded good interrater reliability across cultures (main aspect 1, Cohen’s kappa = 0.89; main aspect 2, Cohen’s kappa = 0.72). In case of discrepancies, a consensus was reached through discussion, where usually the interpretation of the research team from the culture of the expert was given priority.
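For readers less familiar with the statistic, Cohen’s kappa corrects the observed agreement between two sets of codings for the agreement expected by chance. The following is a minimal sketch under stated assumptions: the function name and the example codings are hypothetical and purely illustrative, and they are not the data of this study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(codes_a: Sequence[Hashable], codes_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two codings of the same items."""
    if len(codes_a) != len(codes_b) or len(codes_a) == 0:
        raise ValueError("Both codings must cover the same non-empty set of items.")
    n = len(codes_a)
    # Observed proportion of items on which both codings agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: product of the marginal proportions, summed over categories.
    counts_a = Counter(codes_a)
    counts_b = Counter(codes_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both codings use a single identical category
    return (observed - expected) / (1 - expected)


# Invented dichotomous codes (1 = breach seen, 0 = no breach seen); NOT study data.
team_one = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
team_two = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
example_kappa = cohens_kappa(team_one, team_two)
```

In this invented example the two codings agree on nine of ten answers, yet kappa is clearly lower than 0.9, because most of that agreement would also be expected by chance when one category dominates.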

In view of the aim to identify culture-specific or intercultural norms of instructional quality regarding the aspect of responding to students’ thinking, we finally considered how many of the experts in each country recognized a breach of a specific norm regarding this vignette. For evaluating this result, it is important to consider that even if a specific norm exists, it cannot be expected that all of the experts’ answers reveal that they noticed the corresponding breach of this norm. Firstly, there may always be individuals who do not agree with commonly accepted norms in their culture. Secondly, just like teachers, the experts had to accomplish a process of noticing and hence attend to the specific aspect of the represented classroom situation and also write down their corresponding interpretations in a way that we were able to code. Therefore, we reasoned that if the majority of the experts from one country actively recognized the breach of a specific norm, then this could be considered a strong indication of the existence of this norm in the corresponding culture.

5.2 Results

Since the vignette in focus was authored in Taiwan, we first consider the answers of the Taiwanese experts, in the sense of a validation within a culture. Of the 19 answers, 17 contained negative evaluations of how the teacher responded to S1’s thinking, which indicates that the corresponding experts saw some breach of a norm of responding to students’ thinking. Regarding the reasons why the participants thought the teacher’s responding to S1’s thinking was inadequate, 11 of these experts’ evaluations indicated the assumption that S1’s answer shows a problem to be addressed (misunderstanding/inadequate strategy), which was not done appropriately by the teacher. These experts recognized the breach of the norm that was intended by the Taiwanese research team (“original norm”). To explore such evaluations of Taiwanese experts more closely, we now focus on two typical examples.

Expert TW1 (Fig. 3) criticizes that the teacher did not ask S1 how he got his answer, indicating that he or she recognized a breach of a norm of responding to students’ thinking. Furthermore, TW1 clearly marks the purpose of why the teacher should have attended to S1’s thinking as a problem to be addressed.

Fig. 3 Expert TW1’s answer

Expert TW2 (Fig. 4) starts by pointing out that although the students were taught how to solve quadratic equations, S1 and others got only one solution, which is apparently considered to be a problem. Interestingly, four other Taiwanese experts also emphasize the discrepancy between the fact that the students had already been taught how to solve quadratic equations a month earlier and their observation that S1 and others apparently did not use the standard algorithm (e.g., “currently, the students are at the level of substituting specific numbers”), whereas none of the German experts mentions anything like this. TW2 then hypothesizes what S1’s mathematical thinking may have been and suggests a way to take up S1’s thinking to guide S1 to complete his answer (which was so far insufficient) and to realize that his strategy is not universally applicable. Afterwards, TW2 suggests, the teacher’s comment on the adequate solution strategy could follow. Thus, although TW2 does not explicitly evaluate the way the teacher responded to S1’s thinking as being insufficient, the suggestion of an alternative can be interpreted as an implicit evaluation. Moreover, even though it is suggested to use S1’s strategy to find the second solution, essentially this strategy is considered to be inadequate (since it just happened to work in that case) and it is emphasized that S1 found only one solution. Thus, in this case too, the purpose of why the teacher should have attended to S1’s thinking is to address a problem.

Fig. 4 Expert TW2’s answer

Concerning the answers of the German experts, it quickly became obvious that the situation was different: Although most of the German experts also saw some breach of a norm of responding to students’ thinking (15 answers), only one of the answers could be considered to indicate that the breach of the norm intended by the Taiwanese researchers was recognized. Coding why the other German experts evaluated the teacher’s responding to S1’s thinking negatively showed that many of them also thought that the teacher should have attended to S1’s thinking, but they pointed out different reasons for why the teacher should have done so. We illustrate this with two examples of typical reasoning.

At first glance, the answer by expert GER1 (Fig. 5) may look similar to TW2’s answer, since both of them explicitly hypothesize what S1’s mathematical thinking may have been and suggest a way to take up S1’s thinking. However, GER1 emphasizes the sophisticated nature and flexibility of S1’s mathematical thinking and strategy, which were neither valued nor encouraged by the teacher. Hence, in contrast to TW2, GER1 does not see a problem to be addressed in S1’s thinking, but instead valuable mathematical thoughts to be encouraged. From GER1’s perspective, the development of such flexible solution strategies is apparently more important than the question of whether the students are able to apply the standard solution method a month after its introduction. Accordingly, GER1 does not suggest that the teacher should guide the students to use the universally valid standard method at the end.

Fig. 5 Expert GER1’s answer

Similarly, expert GER2 (Fig. 6) argues that the teacher did not appreciate and pick up S1’s achievement of quickly seeing a solution in the given equation, which is “an expression of number sense or structure sense”. Hence, in this case too, the expert sees a breach of a norm in the way the teacher responded to S1’s thinking not because he or she did not address a problem, but because he or she did not use the potential for valuable mathematical thinking.

Fig. 6 Expert GER2’s answer

Thus, from answers like the two presented here, another kind of reason for seeing the way the teacher responded to S1’s thinking as inadequate/insufficient was extracted during code development: “S1’s valuable mathematical ability or strategy is not addressed properly”. In order to investigate further whether these two perspectives reflected culture-specific norms of instructional quality of responding to students’ thinking, the resulting coding scheme was applied to all of the experts’ answers as described above. In particular, the resulting coding allowed us to distinguish the following cases:

  1. breach of originally intended norm recognized (code: “S1’s misunderstanding or inadequate use of strategy is not addressed properly”),

  2. breach of alternative norm recognized (code: “S1’s valuable mathematical ability or strategy is not addressed properly”),

  3. breach of unidentifiable norm recognized (other or no reason mentioned), and

  4. no breach of a norm recognized (positive/neutral evaluation of the teacher’s responding).

An answer fell into one of the first three cases if the teacher’s responding to S1’s thinking was evaluated as insufficient/inadequate. Which of the three cases applied depended on the kind of reason that could be identified in the answer. The comparison of the number of these cases among the experts in Taiwan and Germany presented in Table 1 shows clear differences. While the majority of the Taiwanese experts actively recognized the breach of the norm intended by the Taiwanese research team, the majority of the German experts’ evaluations indicated that they recognized a breach of a different norm corresponding to another kind of purpose for attending to students’ mathematical thinking (mathematical strategy/ability to be valued).
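The decision rule for the four cases, together with the majority criterion described in the method section, can be summarized in a short sketch. This is only an illustration under stated assumptions: the function and constant names are hypothetical, and the example answers are invented and do not reproduce the counts in Table 1.

```python
from collections import Counter
from typing import Optional, Sequence

# Reason codes quoted from the coding scheme; the constant names are hypothetical.
ORIGINAL_NORM = "S1's misunderstanding or inadequate use of strategy is not addressed properly"
ALTERNATIVE_NORM = "S1's valuable mathematical ability or strategy is not addressed properly"


def classify_case(negative_evaluation: bool, reason: Optional[str] = None) -> int:
    """Return the case number (1 to 4) for one coded expert answer."""
    if not negative_evaluation:
        return 4  # no breach of a norm recognized
    if reason == ORIGINAL_NORM:
        return 1  # breach of originally intended norm recognized
    if reason == ALTERNATIVE_NORM:
        return 2  # breach of alternative norm recognized
    return 3  # breach recognized, but other or no reason mentioned


def has_majority(cases: Sequence[int], case: int) -> bool:
    """True if more than half of the coded answers fall into the given case."""
    return Counter(cases)[case] > len(cases) / 2


# Invented answers for illustration only; these are NOT the counts in Table 1.
example_cases = [
    classify_case(True, ORIGINAL_NORM),
    classify_case(True, ORIGINAL_NORM),
    classify_case(True, ORIGINAL_NORM),
    classify_case(True, ALTERNATIVE_NORM),
    classify_case(False),
]
original_norm_majority = has_majority(example_cases, 1)
```

Applied separately to the coded answers of each country, such a tally would yield the per-case counts and show whether a majority recognized the breach of a specific norm.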

Table 1 Numbers of experts in each case

6 Discussion and conclusions

This contribution may serve two purposes: Firstly, it addresses the question of cross-cultural validity of research on teacher noticing, which certainly merits attention in view of the growing body of research in this area in our intercultural mathematics education community. In particular, it may raise awareness of the role of potentially different cultural norms of instructional quality in research on teacher noticing, and it introduces a design that allows intercultural as well as culture-specific norms to be made explicit on the level of experts in mathematics education. We consider this an important prerequisite to addressing the question of whether and how teacher noticing skills can be assessed in a cross-culturally valid way, since it makes visible potentially different frames of reference.

Secondly, this contribution gives insight into certain expert norms of instructional quality from Taiwanese and German perspectives. As in many studies on teacher noticing, we focused on responding to students’ thinking, and thus we illustrated similarities and differences between corresponding norms of professors in mathematics education from Taiwan and Germany. On the one hand, regardless of their cultural background, the large majority of these experts pointed out that the teacher in the classroom situation should have attended to the student’s thinking. Hence, these findings suggest that attending to individual students’ thinking is considered an important aspect of instructional quality in Germany as well as in Taiwan. As reasoned in Sect. 2.3, this is in line with the present situation in Taiwan, where the understanding of high-quality mathematics instruction reflects not only ideas rooted in traditional East Asian culture, but also Western ideas of constructivist-based instruction (Hsieh et al. 2020).

On the other hand, different purposes for attending to the student’s thinking (Colestock and Sherin 2015) were identified by the experts in mathematics education in Taiwan and Germany. The majority of the Taiwanese experts assumed that the student’s answer indicates a misunderstanding or inappropriate strategy, which should be addressed, whereas the majority of their German counterparts assumed that the student’s answer shows a mathematical ability or strategy, which should be valued and fostered. Thus, the experts from the two countries evaluated this student’s thinking differently. In particular, there appears to be a difference regarding the most important frame of reference for their interpretation and evaluation of the students’ thinking: the content and its correctness or the students’ processes of doing mathematics. This finding may possibly be interpreted as evidence for how the deep-rooted cultural values underlying Leung’s (2001) dichotomy concerning the nature of mathematics (product versus process) still shape the perspectives of researchers and educators in mathematics education in responding to students’ mathematical thinking. Furthermore, this finding may also be considered to reflect Leung’s second dichotomy described above, since memorization of procedures that can always be applied for solving quadratic equations appears to be more important to Taiwanese experts than to German experts.

It is noteworthy that such a prototypical difference could be extracted inductively from experts’ evaluations regarding the same representation of practice in our globalized community of mathematics education researchers. These differences would most likely not have been visible if these experts had been asked about the significance of attending to students’ thinking on a general level. It can be assumed that regardless of their cultural background, the vast majority of experts would have emphasized that addressing problems of understanding as well as fostering valuable mathematical strategies are important purposes of attending to students’ mathematical thinking. However, on a situated level, in view of a specific classroom situation, different interpretations and evaluations were revealed. This phenomenon illustrates why research on situation-specific skills such as teacher noticing is more prone to the influence of different cultural norms than research on dispositions of teachers on a more general level.

Before discussing possible implications for intercultural research on teacher noticing, we would like to recall the limitations of this study, which suggest interpreting the evidence with care. As we did not investigate the experts’ dispositions, we cannot exclude the possibility that differences in professional knowledge also influenced their noticing. Despite our sampling strategies and a response rate of about 60% in both countries, it is not entirely clear whether these experts’ answers can fully represent the perspectives of mathematics education researchers and educators in Taiwan and Germany. This is relevant because variance also exists within countries. Moreover, we do not know to what extent these findings can be generalized to other East Asian or Western countries.

Regarding the process of coding the expert answers, we realized in our bicultural collaboration that sometimes evaluations are very implicit, so that it is particularly difficult to interpret answers by experts from the other culture. For instance, when experts write that a student “guessed” or “used trial and error”, are these activities connoted as being valuable or inferior within a context? We dealt with these problems by intensive discussions between our national research teams and by validating our interpretations and codings of specific experts’ answers within one culture by discussing them with other experts. However, this means that sometimes the frame of reference in a specific culture is needed in order to identify implicit evaluations by the experts.

The findings regarding differences between expert norms of responding to students’ thinking in Germany and Taiwan presented in this paper are based on only one vignette for reasons explained above. While there are further vignettes for which our data indicate culture-specific expert norms, there are also vignettes on which experts from both countries largely concur, or for which the variance within one country is so large that no specific norm appears to exist on the level of the country. The analysis of our data regarding all of our vignettes will soon give further insight.

Bearing these limitations in mind, our findings nevertheless afford means of addressing the objectives of this contribution, and suggest that there exist culture-specific norms that may influence how teacher noticing regarding students’ thinking is assessed by researchers in different cultures. Hence, it is questionable whether and how such research can be cross-culturally valid (Clarke 2013a). In particular, if the aim is to measure and compare teacher noticing across cultures, we must be careful when determining what high noticing expertise is in such a case. One may even think about some kind of culturally adaptive scoring which takes into account the culture-specific frame of reference. In any case, the question of how teacher noticing can be investigated in a way that is sensitive to different cultural contexts certainly merits attention in our intercultural research community.