Introduction

If you teach inquiry while teaching Ecology- it’s like a death verdict for everything else… It’s meeting the requirements and covering the curriculum against thinking [quote from one of the interviewees for this chapter].

Wide-scale implementation of HOT does not take place in a void. Its outcomes are strongly influenced by the educational context in which it operates. In many countries, policies advocating the instruction of HOT are implemented while multiple standardized tests are administered for accountability purposes. This testing creates a climate of high-stakes testing, with far-reaching consequences for all those involved: school principals, teachers, and students. One of its most negative outcomes is intense test preparation. This chapter examines how a culture of high-stakes testing affects the implementation of policies advocating the instruction of HOT.

The effects on learning and instruction of policies that in effect create a climate of high-stakes testing have been documented by numerous researchers in many school subjects (e.g., Koretz, 2008; Mansell, 2007; Nichols & Berliner, 2007). These studies show that when efforts are made to raise test scores in a short time, scores may go up while deep processes of learning and instruction are undermined.

Many studies show that this general phenomenon is also prevalent in science education (e.g., Maerten-Rivera et al., 2010; Marx & Harris, 2006; Shaver et al., 2006). Anderson (2012) conducted an integrative review of science education studies in this area. As noted in previous chapters, numerous policy documents and science education organizations currently support the use of progressive pedagogies such as inquiry-based instruction, instruction geared toward constructivist learning, project-based learning, and student-centered teaching. Anderson’s review shows that under policies that induce high-stakes testing, research-based reforms aiming to implement progressive pedagogies tend to be compromised. Teachers’ practice becomes more fact-based; they teach less science content; they become less satisfied and fail to meet many students’ needs. Accordingly, studies addressing educators’ beliefs show that educators at all levels think that such compromises indeed take place. For example, Kersaint et al. (2001) interviewed 46 principals supported by NSF-funded science education centers in four cities across the USA. Most of these principals felt that testing policies, not reform ideals, were in effect the force driving instruction.

Teachers often perceived accountability as disruptive to efforts to induce educational reforms and as changing the course of their instruction. In addition to feeling that they must teach to the test, teachers reported that they no longer taught the way they thought best. Although science education experts encouraged the use of inquiry-based instruction, teachers reported that high-stakes testing discouraged its use. Studies consistently showed that under high-stakes testing conditions, inquiry-based lessons took place much less frequently. Teachers stated that they included much more inquiry-based curriculum in classes not connected with tests, whereas instruction in classes connected to the tests was more fact-based. Generally speaking, the reviewed studies indicated that accountability measures emphasize isolated facts rather than HOT and that even when tests try to assess HOT skills, they do not necessarily influence teachers to teach these skills (Anderson, 2012).

Achievement Gaps and High-Stakes Testing

On the face of it, test-based accountability appears to have increased attention to achievement gaps because it raises expectations for all students, particularly low-income and minority ones (Anderson, 2012; Nichols & Berliner, 2007). Yet many researchers argue that high-stakes testing and accountability widen rather than narrow achievement gaps. Research suggests that the structure of most accountability systems leads teachers to focus more on students near the scoring cut-off point for meeting the standard than on lower-achieving students. Teachers often view low-achieving students as less likely to be able to move from below to above the critical standard. Therefore, teachers neglect to nurture low-achieving students because they view them as “lost causes” in terms of their ability to meet the standard (Elmore, 2004; Gamoran, 2007; Huber & Moore, 2000; Nichols & Berliner, 2007; Shaver et al., 2006; Supovitz, 2009). Nichols and Berliner (2007) argue that children in too many schools rote-learn while slaving over worksheets for too many hours “preparing for the tests.” Many poor and minority children, however, are required to do so even more than other children. The phenomenon of “narrowing the curriculum,” including the avoidance of critical thinking while teaching for high-stakes tests, affects such students more than others. Marx and Harris (2006) address this point eloquently. They warn that, especially in low-performing schools under intense pressure to show immediate improvement in test scores, test preparation and test taking account for substantial instructional time. These researchers are concerned that instructional time in low-performing schools will be spent on a narrow set of scientific facts needed for short-term success on tests.
Moreover, as testing requirements increase, there is mounting pressure in marginal or failing schools to standardize instructional approaches, which in turn squeezes components such as inquiry-based learning (IBL) out of science teaching. In high-performing schools, time is more likely to be spent on a more ambitious approach to instruction, including more time for inquiry, because far less time has to be allocated to test preparation.

These findings relate to a group of other studies, showing that teachers believe that instruction of HOT is indeed an appropriate educational goal for high-achieving students, but not for low-achieving ones. Research data show that in effect, teachers apply more thinking-rich instruction with high-achieving than with low-achieving students (Oakes, 1985; Raudenbush et al., 1993; Warburton & Torff, 2005; Zohar et al., 2001; Zohar & Dori, 2003).

Educational Context

The study described in this chapter was conducted in a specific educational context. Between 2009 and 2012, the official policy of the Israeli Ministry of Education (MOE) consisted of (among other issues) the following three components:

  (a)

    An aggressive policy stating the need for a rapid improvement in the scores of standardized tests (the international PISA and TIMSS, and the national Mafmar and Meitzav science achievement tests, Israeli MOE, 2009). In addition, the policy also indicated a need for a rapid increase in students’ participation rate in the most prestigious high school matriculation exams. The following quotation provides an example of a policy document stating this policy:

    In the next three academic years the educational system will advance 10 places in the international (2012) PISA ranking in mathematics, science and language … In four years, the number of students eligible for the 5 units matriculation certificate [i.e., the highest matriculation level] will increase by 10% in each of the following school subjects: mathematics, physics, chemistry and English as a foreign language (Israeli MOE, 2010).

  (b)

    A policy advocating the development of students’ HOT and inquiry skills. As noted in Chap. 1, the MOE continued to support the “Pedagogical Horizon: Educating for Thinking” policy advocating a system-wide change towards a thinking-rich curriculum. The support continued even after a change of government was accompanied by radical changes in the MOE’s educational policy in many other areas. Consequently, in science education, the goal of teaching thinking was expressed in the official policy of the MOE Science and Technology Unit, which decided that the implementation of scientific inquiry would be one of its major goals:

    This decision would lead to construction of HOT skills as well as to meaningful construction of content knowledge. Consequently, students’ achievements will improve (Israeli MOE, Science and technology administration, 2012).

  (c)

    A policy calling for narrowing achievement gaps. For example, section #2 of the MOE “Goal Plans for the year 2011” states that the education system will strive to “Narrow academic gaps” (MOE, 2011).

The MOE took several quite dramatic steps in a top-down manner to implement the “raising test scores” policy in junior high schools:

  (a)

    Introduction of considerable changes to the junior high school science curriculum in order to improve its overlap with the TIMSS framework.

  (b)

    Addition of extra weekly hours to science instruction for test preparation.

  (c)

    Putting together a set of new learning materials (called the “Hila Kits”) for teachers’ use. The Kits consisted of a detailed description of the required knowledge and skills, theoretical materials for teachers, suggestions for instruction, and numerous examples of test items.

  (d)

    Issuing new strict regulations about what teachers needed to teach in each part of the year.

  (e)

    Hiring a team of instructors to visit schools, both to provide guidance and assistance on how to teach the renewed curriculum and new learning materials and to inspect whether teachers were teaching according to the new guidelines.

Several top-down steps were also taken in order to implement the teaching for thinking policy. The development of the Hila Kits was a significant contribution to this process. The Kits were designed to support teachers in planning and teaching the content and skills designated by the science and technology junior high school curriculum. They were viewed as suggestions only, and each teacher was advised to adapt them to the specific needs of his or her school. The units of the Kits focus on chosen central topics of the curriculum (each unit covers 15–20 h of instruction) and central skills. Each unit consists of a description of the knowledge and skills students would need, relevant pedagogical content knowledge for the unit’s main topic (e.g., energy, reproduction and heredity, ecological systems), scientific background, practical suggestions for instruction, suggestions for lab activities, and a collection of assessment tasks (National Teaching Center for Science and Technology in Junior High School, 2010). The learning materials in the Hila Kits contain many HOT and inquiry activities that are integrated into the science content. In addition, in order to encourage teachers to actually engage in teaching thinking, the national science achievement tests were gradually changed over several years so that, eventually, approximately one-third of their items assessed HOT (Zohar, 2013a, b). The items in these tests are either multiple-choice items or items that require a short (up to three lines) constructed response. All items address topics from the science curriculum.

The HOT items normally require students to apply knowledge studied in class to new circumstances, to construct explanations of a scientific phenomenon, or to apply scientific inquiry skills in investigations that are intertwined with the science content covered by the curriculum. The latter items present either a problem or some research data and ask students to plan experiments, to record findings, to analyze data, to draw conclusions, etc. An analysis of these items according to a taxonomy of levels of thinking, such as modern versions of Bloom’s taxonomy (e.g., Krathwohl, 2002; Leighton, 2011), would indeed classify them as items that require more than memorization or simple comprehension. For example, an item addressing the eighth-grade electricity chapter stated that Neta conducted an experiment with the goal of finding out how the thickness of an iron rod influenced the level of current in an electric circuit. The item presented a diagram of an electric circuit that included a section made of an iron rod and a table with data about six iron rods differing in length and thickness. The item asked students to advise Neta as to which rods she should pick for her experiment. This was a multiple-choice item because students were asked to circle one of four combinations of three rods that the table provided (e.g., a. Rods #1, #3, and #5; b. Rods #1, #2, and #6; etc.). This was followed by another question, asking students: “Explain your choice by referring to the length and thickness of the rods you chose.” Students were given one line to compose their response. An official website of the National Authority for Assessment and Evaluation published an analysis of these items, stating that they addressed the sections of the curriculum about energy and about understanding scientific inquiry. In explaining what these items require students to be able to do, the website included the following assertion:

these items require students to identify the independent variable (thickness of the rods); to understand that only the independent variable in an experiment needs to be changed while all the other variables (length of the rods) need to stay constant; to understand how to apply the rule of variable control in an experiment; to identify in the table of data the rods that are suitable for this experiment. (National Authority for Assessment and Evaluation)

It should be noted that it is not easy to come up with original types of HOT test items. Indeed, an examination of tests from several consecutive years reveals that they consist of a limited number of patterns of test items that are repeated in diverse content areas.

An important question in the context of the current research is whether or not students can be cued or prepared for answering HOT test items, such as the example described in the previous paragraph. When answering multiple-choice items that are designed to assess HOT, students may often pick the correct answer by using a heuristic that directs them to answer correctly rather than by applying deep understanding of the reasoning strategy assessed by the test item (Cooper, 2015; Talanquer et al., 2015; Zohar, 2013a, b). A possible remedy for this problem is to ask students to explain their responses to the test’s multiple-choice questions in order to verify their understanding. Yet simply asking for a short explanation is not always sufficient, because there is evidence that students can rote-learn correct responses to recurrent patterns of reasoning items requiring simple, short constructed answers (Zohar, 2013a, b). One such example draws on data from the Israeli matriculation exam in biology during the 1980s. The exam contained a chapter that addressed scientific inquiry skills, including an item that assessed the control of variables strategy. For several years, students’ scores on this item were extremely high. Then, in a certain year, students’ scores suddenly dropped dramatically. An examination of the items that appeared in the exam over the years explained the drop: the pattern of the control of variables item had been quite similar over many years but was changed in the year the scores dropped. The item was not more difficult, but since it differed from the pattern students had been cued on, they could not use the heuristics their teachers had taught them for answering the control of variables item.
In general, the limited number of patterns of HOT test items that are repeated over the years in diverse content areas is a key factor in the current discussion because under such circumstances, teachers can and often do prepare students not only for choosing the correct multiple-choice response but also for justifying their choice in a short sentence (Zohar, 2013a, b).

It should be noted that whether or not an item actually makes it necessary to think in order to answer it correctly obviously depends on the nature of the item, but also on the educational context within which students have to answer it. An item may require deep thinking when students encounter it for the first time. The same item may require mainly retrieval from memory if students have been drilled in numerous examples of similar items as part of intense test preparation.

Finally, in order to implement the third policy, calling for narrowing achievement gaps, teachers were asked to provide personalized treatment of students. Specifically, teachers were required to follow up on and report the achievements of individual students.

The educational context in which these three policies were enacted at the same time provides a unique opportunity to study how their consequences interact with each other. The present study therefore aims to address this issue from the perspective of how senior science teachers view the effects of several simultaneous and interacting policies on classroom practices.

Methodology

Semi-structured interviews were conducted with 20 senior science teachers in junior high schools. In addition to their current science classroom teaching, all participants either served as heads of science departments in their schools or were engaged by the MOE as instructors in professional development programs for science teachers. In order to be chosen for these roles, teachers needed to have a good reputation in terms of their instructional skills, as well as a robust background consisting of a large variety of in-service professional development courses (many of which focused on teaching inquiry and HOT) and/or a degree higher than the BA and teaching certificate that are necessary for teaching science. As indicated by the data analysis presented in what follows, the participating teachers, who were more senior than average, were indeed knowledgeable about progressive teaching methods.

The singularity of the interviews lies in their timing: they took place close to the time the international TIMSS and the local Meitzav and Mafmar science tests were administered. Consequently, teachers had recently prepared their students for these tests, and the experience was still fresh in their memory. The semi-structured interviews were approximately 1.5 h long, consisting of 11 core questions and numerous probes addressing issues such as: What are teachers’ main goals in teaching science, and what do they see as the best ways to achieve them? What is their view about teaching HOT in science classrooms? How do they believe students should best be assessed? What is their opinion concerning the policy of raising test scores, the new science curricula, the new testing regime, and the Hila Kits? And how do they think these issues had affected science teaching? In addition, the interview asked about the pressure to raise test scores and how it had affected all of the above.

The data were analyzed using a pragmatic qualitative research approach that is particularly suitable for professional fields because it provides descriptive information that can inform professional practices (Savin-Baden & Howell-Major, 2013; see also Chap. 4). Research conducted within this approach is just what the name implies: research that draws upon the most sensible and practical methods available in order to answer a given research question. It aims for a description of experiences and events as interpreted by the researchers. It therefore marks the meeting point of description and interpretation, in which description involves presentation of facts, feelings, and experiences in the everyday language of participants, as interpreted by the researcher. Analysis typically consists of qualitative content analysis using modifiable coding systems that correspond to the data collected, and interpretation stays close to the data (Savin-Baden & Howell-Major, 2013).

Findings: “Because of the Measuring—We Are Losing It”

The official voice of the system is to encourage thinking skills. But the way it is implemented … because of the attempt to raise test scores, because of the measuring—we are losing it.

These pessimistic words of one of the interviewees summarize the views of many of the teachers concerning the gap between the stated policy regarding the advancement of inquiry and HOT in science learning and what actually takes place in science classrooms. It seems that the simultaneous requirements for a rapid improvement in test scores and for fostering HOT created considerable tensions and conflicts. The rest of this chapter examines various aspects of this statement and their implications.

Teaching for the Test Increases the Frequency of Engagement with HOT During Instruction

Sixteen teachers (80%) view the system-wide tests as a tool that directs learning and instruction:

If we want to succeed in international tests, there is no other way. Currently, we are teaching for the test (original emphasis by the interviewee).

According to these 16 teachers, in order to prepare students for the tests, it is necessary to teach them both the content and the skills the tests require. Because the tests are rich in thinking items, part of test preparation must consist of addressing thinking skills in the classroom:

  • The Meitzav test is oriented towards learning, and it focuses on thinking skills. Once I understood that this is what is important for the test, I began to emphasize it [in my teaching].

  • This year the Mafmar test will definitely include inquiry skills such as controlling variables. Students don’t know that, so I taught it before the test.

  • Throughout the years we learned that students fail the HOT items. We need to prepare them in a better way for the challenge involved in this part of the test.

Not only the tests but also the learning materials in the Hila Kits addressed HOT in an extensive way:

I thought the Hila Kits were really good because they had more thinking skills … It is important that thinking issues are being treated.

A total of 13 teachers (65%) noted that the materials they received in the Hila Kits addressed diverse thinking levels, including questions requiring HOT strategies such as text analysis, understanding graphs, formulating research questions, formulating evidence-based arguments, drawing conclusions, etc. These findings indicate that on the face of it, both the high-stakes tests and the Hila Kits learning materials encouraged and supported the policy of teaching for thinking. The picture changes, however, when we look deeper into the data.

Rote Learning of HOT?

According to many of the interviewees, test preparation caused “teaching thinking” to consist of rote rather than meaningful learning. In order to explain this view, I will first highlight the differences between how teachers view rote and meaningful learning in general and then show how these differences are expressed in the ways by which test preparation affects teaching for thinking.

Fourteen teachers (70%) explained the difference between how they view meaningful learning (which they called “real learning”) and non-meaningful learning. The interview transcripts showed that these teachers have rich pedagogical knowledge that may enable them to support students’ deep understanding in diverse topics. The following citation is an example of how one of the teachers explains the central idea of the relationship between surface area and volume to her grade 7 students. This central idea is replicated in many biological contexts such as the small intestine, lungs, red blood cells, and plants’ leaves. I chose this particular example from many other transcripts showing teachers’ rich pedagogical knowledge because it demonstrates several aspects of meaningful learning of both scientific concepts and HOT:

I found a solution [for how to teach] the relationship between surface area and volume. I bring to the classroom two baguettes and chocolate spread. I spread the chocolate on the surface of one of the baguettes and cut the other one to small, round slices. Then I spread chocolate around each slice. While I am doing it, I ask them where I am using more chocolate … They are watching and know the answer right away: the sliced one. In the test you could see that this was experiential learning. This is meaningful learning. I met again several of the students who were in that class while we were studying about the digestion system when they were in 10th grade. They all said that this is exactly the same thing as the baguette and the chocolate. It is exactly the same … They remembered and could apply [the principle] correctly. Students are active, involved, experiencing and they take responsibility during [meaningful] learning … [when students come to me and ask:] How should we do this experiment? [I tell them:]… Let’s plan it together. I find this is the most important thing… For meaningful learning they need to think about it for themselves in a deep way, the need must come from them … That they will ask clever questions … that don’t have easy answers, that you need to search information [in order to find the answer]. All students are going to search for information and then next time we meet each student tells what he or she found about this question. Each student presents his or her findings. This process also places me in a proper role: the teacher also does not know [the answer] so we are all investigating it together. (#1)

In this excerpt, the teacher talks about two ways of meaningful teaching and learning. In the first part of the excerpt, she explains how she had taught the principle of the relationship between surface area and volume by engaging students with an experiential demonstration using a baguette and chocolate spread. This demonstration shows students that the amount of chocolate you can spread on a whole baguette (which has a large volume compared to the volume of each slice) is smaller than the combined amount of chocolate you can spread on all the slices you get when you slice the baguette. This illustrates that the total surface area of many bodies with small volumes is larger than the surface area of one large body whose volume equals all the smaller volumes combined. The teacher notes that due to the vivid experience involved in this demonstration, students not only remembered it for several years but could also apply the principle they had learned to a new context (the digestive system).

In the second part of the excerpt, the teacher discusses several characteristics of meaningful learning, noting the following relevant aspects: learning is experiential; students are active learners who are deeply involved in their own learning; students ask “smart questions” leading to an inquiry process that triggers a need to look for information and construct new knowledge; during inquiry learning the teacher’s role changes from being a source of information to supporting students in thinking and in looking for solutions; and the teacher is learning along with her students.

Like additional teachers who talked about meaningful learning, this teacher held a constructivist view of learning, according to which learning is a process of meaning-making by active learners who engage in inquiry and HOT. These teachers view instruction in which the teacher transmits information and “spoon-feeds” her students as producing learning that does not bring about deep understanding. They view such learning as shallow rote learning that does not support the development of thinking tools and the capacity for deep thinking, because learners do not go through an active process that helps them construct their own knowledge. This idea is expressed in the following citations:

  • Real learning means you reach some very serious situations of thinking. Raising authentic hypotheses, examining things, deliberating, experimenting. If you don’t allow them to make mistakes, it’s not real learning. Because it is you that does and explain everything.

  • To transmit information…, while it doesn’t matter whether or not they got it … Rote learning does not mean that I went through a process, or that I had learned anything for the sake of learning, for the purpose of fostering students’ thinking. Instead, I had learned because I had to, and I did not receive enough tools that can help me to think differently.

Another teacher had explicitly described the negative effect of the policy for raising test scores on the quality of learning:

Measuring and running after achievements is not real learning. It’s the same as reciting a History chapter … and doing well on the test. […] It’s not clear what will remain in my head in a month.

The interviews show that the pressure to do well in the tests leads to abandonment of complex teaching goals such as meaningful learning with deep understanding and thinking, in favor of simpler goals focusing on test achievements.

Although the interviewed teachers seemed to have rich pedagogical knowledge that may enable them to support students’ deep understanding in diverse topics, there is ample evidence that test preparation hindered their use of that knowledge. Seventeen (85%) of the interviewees reported that the new guidelines to teach the content and skills necessary for the exam drove them to change their teaching patterns, narrowing the opportunities to provide students with meaningful learning experiences. In general, these teachers stated that they were no longer able to combine complex thinking tasks with their daily teaching routine. About half of these teachers said explicitly that the requirement to raise test scores made them cut back mainly on teaching processes focusing on thinking, inquiry, and creativity:

  • Precisely because of the new program, I am less able to provide meaningful learning so that the child will engage in inquiry, will be interested, will be able to explain to others, to ask HOT questions … I have less time for creativity.

  • Why did I stop doing inquiry? Because my class was supposed to take the TIMSS test … So I needed time to prepare them … And you can’t do it all at once. It creates a conflict. (emphasis in teachers’ own voice)

  • It turns out that when you’re stressed learning is not meaningful. Teachers teach the material because they have to, they just tick it off, and it chills any enthusiasm. Teachers felt stressed … They had no time for projects, papers … These things take time.

  • A stressed teacher will run away from allowing students to think. She doesn’t have the time to develop a discussion until they reach a conclusion … [she] will tell them what the concept means and that’s it.

How do these ideas align with the findings from the earlier section according to which test preparation increases the frequency of engagement with HOT during instruction? Teaching thinking is usually viewed as contradicting rote learning (Zohar, 2013a, b). A careful reading of the interview transcripts, however, shows that in an era of intense test preparation, this contradiction is no longer necessarily true. HOT test items address strategies such as control of variables, formulating research questions, verbal explanations of graphs, and drawing conclusions. Twelve teachers (60%) said that it is possible to engage with issues involved with such thinking strategies in a superficial or “mechanical” way. This means that instruction focuses on drill and practice that aim at improving students’ ability to respond correctly to HOT test items, rather than on the construction of students’ thinking abilities. Therefore, according to the interviews, engaging with HOT items does not necessarily reflect scientific thinking because students can engage with them and even answer test items correctly by applying rote learning:

  • Many children learn by memorization, they engage in rote learning of knowledge, and get great test scores. This is not yet scientific thinking … The system however views test scores as the ultimate manifestation of achievements.

  • A child who studies science in the way that is common in today’s schools, does not develop scientific thinking even if he does get high test scores … The Meitzav test indeed contains HOT items, but this is not enough because they are clearly not questions about processes. Questions about fragmented issues that are disconnected to each other, actually mean you need to spill out what you have memorized, to spill out the thinking skills. (#19)

These quotations indicate that the teachers believe it is not necessary to engage in active HOT in order to answer the tests’ HOT items, because students can answer such questions by “spilling out” the material they had memorized for the exam. Indeed, nine teachers (45%) said explicitly that while preparing students for the test, they must teach HOT strategies in a mechanical way rather than as a process that fosters the development of students’ thinking abilities and deep understanding.

It is important to note that instruction emphasizing memorization rather than active thinking results from the pressure to prepare students for the exam rather than from a lack of the pedagogical knowledge required for good teaching. For example, one of the teachers described how she was obliged to teach a thinking strategy (formulation of an inquiry question) through a “transmission of information” approach. The teacher explained that this was not the way she would have liked to teach, making it clear that she had the pedagogical knowledge required for constructing students’ reasoning abilities in an active and profound way. She explained how she would have taught if she only could: she would have encouraged students to brainstorm multiple possible questions and then asked them to classify the questions they had raised, to create criteria for high-quality research questions, to make reasoned decisions about which question to choose, and to think about the variables they would like to investigate. Yet, rather than apply this comprehensive pedagogical knowledge in the classroom, she described how she had begun her lesson by “telling” her students what a research question is, which research question they were going to investigate, and what the dependent and independent variables in their investigation would be. This teacher explained that the time pressure created by the preparation for the test forced her to teach thinking strategies “in a mechanical rather than a meaningful way.” She summarized this section of the interview by stressing that she believes the way she had taught does not lead to “real learning” (these quotations are cited verbatim).

Another teacher elaborated even more about teachers’ difficulties in working in the midst of conflicting policies—on the one hand a policy pressing to raise test scores and on the other hand a policy advocating the development of students’ thinking:

The [Hila] Kits were written with a new spirit of saying YES to thinking and skills. This year they told us to emphasize inquiry skills … I would have expected it to be performed in that spirit, so that students will get a message that they really needed to think. In second thought, perhaps their goal was only to show that test scores are suddenly going up (original emphases by teacher). The Kit consists of some rather amazing questions that could be part of our lessons … But this is all a function of how much time we have, if we can actually fulfil their potential.

This teacher clearly saw the Hila Kits’ potential in terms of advancing the new spirit of teaching HOT and was hoping that it would indeed affect her teaching. However, her expectation that the new spirit evolving from the teaching thinking policy would indeed affect students’ thinking was not fulfilled. This excerpt indicates that parallel to recognizing the demand to teach students to think, she also recognizes the demand to increase test scores. According to this teacher, the possibility of integrating the Hila thinking questions (which she describes as “amazing”) into her routine teaching does not materialize because of the lack of time resulting from the pressure to prepare students for the test. It seems that under these conditions, students and teachers cannot allow themselves to “waste” time on thinking. Teachers’ attitudes toward the conflicting messages coming from the MOE are also expressed in the following citations, whose first lines are also cited at the beginning of the chapter:

If you teach inquiry while teaching Ecology- it’s like a death verdict for everything else. You must devote special time for it because otherwise you will be short of time … I see it in other schools too, exactly the same difficulties as here … I must meet the requirements and cover the curriculum … What can I do? It’s meeting the requirements and covering the curriculum against thinking [emphasis in original, apparent in teacher’s voice]. This is difficult because our students are racing to reach the goal – the final test, after they had covered all the material … At the beginning of the year I give them the syllabus and each time I am marking on it what we already covered … So we are talking about doing something according to schedule, not about learning [emphasis in original, apparent in teacher’s voice]. Learning is not meaningful. Teachers say that they feel they are racing. We may have mentioned a concept on the level of naming it, but we didn’t really teach it. There is no joy of learning, of accomplishment, of doing something deep … Everything is about tic, tic, tic, quickly, quickly, and about meeting requirements on time.

Following the new program, I have less of a possibility to teach in a meaningful way, so that the child will investigate, will be interested, will be able to explain to others, to ask HOT questions about a process … I have less time for creativity, so that each child would be able to be creative, that his curiosity will not be lost… I have changed the way I teach.

We are currently abandoning learning by inquiry [because otherwise] we will be unable to meet the requirements on time. I don’t have the privilege of letting them conduct an experiment, do something with their own hands. This is a considerable transformation in perception.

These teachers describe their difficulty in devoting time to teaching HOT, particularly to teaching thinking (as well as content) in a meaningful way, while still meeting the requirements in terms of test preparation. As mentioned earlier, 60% of the interviewees reported a similar conflict.

In summary, despite a policy stating that developing students’ thinking is one of the MOE’s explicit goals, the policy stating the need to raise test scores overrules it. Test preparation drives out thinking-rich instruction and deep learning.

Narrowing Achievement Gaps

The studies presented earlier point to educators’ belief that the pressure to raise test scores increases rather than decreases achievement gaps (e.g., Elmore, 2004; Marx & Harris, 2006; Nichols & Berliner, 2007; Shaver et al., 2006). Nichols and Berliner (2007), for example, argued that the focus of schools on raising test scores and on test preparation undermines the school system’s stated goal of narrowing achievement gaps. As explained, another policy advocated by the Israeli MOE during the period examined here addressed the need to narrow achievement gaps by (among other things) raising test scores for all students (Israeli MOE, 2011).

This section examines how the raising test scores policy affected diverse student populations. Although no interview question addressed this issue directly, 11 interviewees (55%) spontaneously raised the issue of students’ diversity and meaningful instruction of science topics (including application of HOT) to all students. These teachers argued that there is a contradiction between the requirement to teach complex topics (e.g., density, forces and interactions, and complex processes in the human body) in a limited amount of time and the requirement to adapt instruction to students’ diversity. According to these teachers, the quick pace of instruction required to cover the curriculum in terms of both content and skills suits the abilities of high-achieving students. The same quick pace, however, harms teachers’ ability to provide a suitable response to students with low academic achievement:

  • Meeting requirements in terms of covering the material in a limited amount of time is something that currently pressures teachers … From my own point of view, this pressure is very positive … I teach a strong student population for whom this pressure is excellent. The pressure to study helps them to make progress … Strong students get much more from the new program … They get everything really quickly … There is also a feeling that we can run quickly with the material and waste less time … rehearsing each topic … So the weak students are left behind (emphasis added by authors).

  • For the weak students the topics we need to teach are too many and too complex. This creates difficulties … For these students I need to reduce the amount of topics I teach … It flows smoothly for the strong ones … The strong ones want to move forward, they want more information, but I need to restrain myself so that I will not frustrate the weak ones who need more drilling in order to understand … It is very difficult to cope with all this in a heterogeneous classroom.

Turning our attention more specifically to the issue of teaching thinking, previous studies found clear evidence that students at all levels gain from instruction of HOT. Yet, most teachers believe that it is an appropriate educational goal for high-achieving (HA) students but not for low-achieving (LA) ones (Warburton & Torff, 2005; Zohar et al., 2001). These studies alert us to the danger embedded in this belief because it might become a self-fulfilling prophecy: teachers who hold it may direct thinking activities more to HA than to LA students, thereby inhibiting the latter from making progress. In effect, this belief may widen achievement gaps. Previous studies also found that teachers lack instructional tools for teaching thinking to LA students, and specifically for providing the scaffolding these students need.

In contrast to these previous studies, 10 of the 11 leading teachers who expressed their views on this issue in the present study believe that the goal of teaching HOT is equally appropriate for LA and HA students. Only four teachers were apprehensive about the frustration of LA students following instruction of HOT. Yet, the interviewees are not blind to the difficulties of LA students. Indeed, they acknowledged these difficulties but believed they can overcome them by using appropriate scaffolding, as long as they can devote an appropriate amount of time to teaching this issue. The excerpts also show that the interviewees have complex pedagogical knowledge that includes specific instructional strategies, such as scaffolding a complex task by dividing it into several smaller tasks, or scaffolding through a series of guiding questions:

  • Not all students can reach the highest levels of analysis or synthesis…. I would give such tasks only if I could teach the proper background. This is not an easy task… Weaker students have a problem with understanding graphs… I would not give them such tasks without first verifying that they know how to read a graph, and that I had taught them about dependent and independent variables. Only then would I ask them to analyze it.

  • I would give that task to the stronger students, but I would not make it too easy for those who find it more difficult … There are so many thinking strategies … I would break them into small portions … do it in a more friendly way … and let everybody deal with it as is. They can do such a task even if it is difficult.

  • I believe that a task which requires HOT is appropriate for all students. Some students will be able to draw a conclusion on a rather simple level and others will find it difficult to draw any conclusion at all … Therefore, whenever we have high-level questions, I would provide support by asking guiding questions to those who find it more difficult … Perhaps by the end of the school year more students will be able to operate on high thinking levels.

Yet, these teachers explained that there is a mismatch between the program’s demands to teach many topics (some of which are rather complex) in a limited amount of time and the need to adapt instruction to students’ diversity. They stressed that the quick pace of teaching dictated by the pressured atmosphere characterizing high-stakes testing may suit the needs of HA students, but it interferes with providing the support that LA students need:

  • The stated goal of the MOE was to increase the test scores of weak students. For example, they gave teachers examples of how to accommodate learners’ diversity when planning a lesson … But a stressed teacher will be less attentive to learners’ diversity. She will be running ahead to cover the curriculum and therefore will lose 20% of the ‘weaker’ students.

  • Teachers that got stressed … did not reach high achievements and did not adapt their teaching to learners’ diversity … To learn information by heart or to watch the teacher demonstrate an experiment is evidently not enough. In order to treat learners’ diversity you must apply a variety of teaching methods, to use scaffolds. This must be done in a systematic way by those who design new learning materials … teachers must have the option to respond to students’ diverse needs.

  • Strong students get much more from the new program… they get it quickly… There is a feeling of a quicker pace and less wasted time in the new program… so the weak ones stay behind.

  • I failed to bring the weak students to the level of application … because they did not go through the process, did not internalize the skills. In addition, the topics they learned this year were very complex and abstract, requiring high levels of thinking … If the curriculum is not suited to students’ cognitive level, even the most professional teacher with the highest motivation will not be able to make it. Following the innovations of the last two years teachers had started to question students’ abilities. They say … students will not be able to cope with it. Such teachers will be less attentive to students’ diversity. If such teachers will run forward they would lose 20% (of the students).

Summary and Discussion

This chapter examines how senior science teachers view the policy of raising test scores in terms of how it affects instruction of HOT and the narrowing of achievement gaps. Previous studies show that “teaching for the test” focuses on memorization of facts and basic skills rather than on instruction of HOT (e.g., Koretz, 2008; Mansell, 2007; Nichols & Berliner, 2007). When comparing these findings with those of the present study, the data show mixed results. On the one hand, our data show that following the implementation of a policy calling for instruction of HOT, teachers indeed devoted more time to engaging students with HOT while preparing them for the tests. Note, however, that we write “teachers devoted more time to engage students with HOT” rather than “teachers devoted more time to teaching HOT,” because our analysis shows that many of the teachers did not actually think they were teaching students how to think, but rather how to respond to particular types of test items. They did this by using algorithms and drill and practice.

Part of preparing students for the tests consisted of engaging with HOT tasks taken from the Hila Kits and from tests of previous years. Following the policy calling for teaching students to think, the design of the Hila Kits addressed thinking strategies such as analysis of scientific texts and graphs, formulating inquiry questions, controlling variables, formulating evidence-based arguments, and drawing conclusions. Teachers therefore believed that the Hila Kits had an important role in preparing students for the tests. Many of the participants in this study expressed the view that teaching HOT is a worthwhile instructional goal and indicated that they had elaborated pedagogical knowledge of how to practice it with their students in a meaningful way. Yet, their expectations regarding a “new spirit” calling for instruction of inquiry and HOT throughout the system did not materialize. They reported that under the regime of high-stakes testing, instruction of HOT seemed to take the form of “mechanical instruction,” implying rote learning and drilling students in answering HOT items, rather than teaching for thinking in a meaningful way.

How can we explain this finding? As explained in the methodology section, the standardized tests under consideration indeed included many items that can be classified as HOT items. But as the data reveal, the structure of a test item is only one of the factors determining how it will affect learning and instruction. Our study shows the significance of the educational context in which the tests are applied. In the case of the present study, two main factors in the educational context seemed to have an especially large effect. The first is the large amount of material that needed to be covered in a limited amount of time. The second is that the tests were given in an educational context that pressured teachers to experience them as high-stakes tests. The data show that the combination of these two factors made teachers resort to more didactic methods than they cared for. Pressing teachers to “cover the curriculum” in a limited amount of time led them to engage with the complex learning materials from the Hila Kits in a shallow way of drill and practice. Teachers’ beliefs regarding the strategies addressed by the tests influenced their decisions as to which thinking strategies they should teach. In addition, the fact that the tests consisted of patterns of HOT items that were repeated over the years made it possible to drill students in how to respond to such items using external cues, rules, and algorithms rather than deep thinking. Under these circumstances, instruction may have improved students’ test scores, but it did not make a real contribution to the development of their scientific reasoning and deep understanding. In sum, the chapter demonstrates the diverse ways in which the educational climate created by the “raising test scores” policy affected substantive pedagogy in the context of teaching HOT.

The analysis presented in this chapter points to future directions for improving testing from the perspective of its potential contribution to valid assessment of HOT. One recommendation is to develop more complex tasks, for which teachers will find it difficult to prepare students through “technical” drilling and rote learning. We may assume, however, that written tests are a limited means of accomplishing this goal. Consequently, there is a need to develop more diverse means of assessment, such as inquiry papers, projects, and portfolios. Yet, improving assessment is only part of the story. A second recommendation relates to the educational culture of the school system.

Strong pressure to raise test scores creates a culture of high-stakes testing. Even the best assessment methods cannot function well in such a culture, which drives principals, teachers, and students to adopt diverse means for raising test scores without really improving learning (Zohar, 2013a, b). Therefore, if we want assessment to support deep thinking, we need to develop both appropriate assessment tools and a climate free of the pressure to raise test scores.

The findings show that despite a policy stating the need to narrow achievement gaps, under such circumstances the learning of low-achieving students is compromised. Many of the teachers in our study believed that, in principle, it is possible and worthwhile to develop the thinking of low-achieving students. Yet, because of the aggressive policy demanding a quick improvement in students’ achievements, they had to abandon the goal of teaching HOT to low-achieving students.

In sum, the central contribution of this chapter to the main argument of the book is in demonstrating how a policy embracing the development of students’ thinking is affected by policies aimed at raising test scores and by the climate of high-stakes testing that follows such policies. In particular, the findings show that the considerable efforts to implement the policy advocating instruction of HOT were compromised, in terms of their effects on classroom practice, within the climate of high-stakes testing. Substantive pedagogy does not take place in a void because it is sensitive to culture and context. Strong pressure from policy makers and administrators to raise test scores leads to a climate of high stakes. Such a climate interferes with teachers’ ability to develop students’ HOT even when they have the knowledge, supporting learning materials, and motivation to do so.