1 Introduction

Educators are increasingly turning to student surveys as a valuable source of information about important features of school and classroom learning environments, ranging from time on task and content coverage to more qualitative aspects of teaching, such as the extent to which classes are well-managed, teachers foster student cognitive engagement, or students feel emotionally, physically, and intellectually safe (Baumert et al., 2010; Klieme et al., 2009; Pianta & Hamre, 2009). Considerable research shows that student survey reports can be aggregated into reliable indicators of constructs that have been variously identified in the literature with terms like learning environment, classroom climate, instructional practice, or teaching quality. These constructs may or may not be interchangeable across areas of study, but irrespective of terminology, the literature shows that student survey aggregates tend to correlate significantly with each other, with indicators derived through other methods (e.g., classroom observation), and with a range of desirable student outcomes. However, little research has investigated whether within-classroom or within-school variability in such student survey responses offers additional information beyond that conveyed by average indicators. This question is important in light of emerging evidence that the educational experiences of individual students can vary considerably within schools, and even within the same classroom, including opportunities for student participation (Reinholz & Shah, 2018; Schweig et al., 2020) and the quantity and quality of teacher–student interactions (e.g., Connor et al., 2009). In this chapter we review literature that examines aggregate survey indicators in different fields, and consider the key assumptions and consequences of various measurement models and analytic methods commonly used to summarize student survey reports of teaching. We then examine the growing literature that investigates variability in student survey responses within classrooms and schools, and whether this variation may relate to educational experiences and outcomes. We illustrate the potential implications of this kind of variation using a hypothetical example case. In the final section, we discuss the implications of this research for evaluation policy and instructional improvement.

2 Student Surveys, Teaching, and the Learning Environment

There are many reasons why educators are increasingly interested in student surveys as a source of information about learning environments. First, and perhaps most importantly, students can spend over 1,000 hours in their schools every year, and thus have unmatched depth and breadth of experience interacting with teachers and peers (Ferguson, 2012; Follman, 1992; Fraser, 2002). Students also provide a unique perspective compared to other reporters (Downer et al., 2015; Feldlaufer et al., 1988), and asking students about their perceptions of teaching and the learning environment acknowledges their voice (Bijlsma et al., 2019; Lincoln, 1995) and the significance of their school-based experiences (Fraser, 2002; Mitra, 2007). Second, a growing body of research suggests that students can provide trustworthy information about important aspects of the learning environment (Marsh, 2007). For example, survey-based aggregate indicators can reliably distinguish among instructional practices (Fauth et al., 2014; Kyriakides, 2005; Wagner et al., 2013) and aspects of teaching quality (e.g., Benton & Cashin, 2012). These aggregates are furthermore significantly and positively associated with other measures of teaching quality (e.g., Burniske & Meibaum, 2012; Kane & Staiger, 2012).

Like other measures, student survey responses can be susceptible to error (e.g., recall, inconsistency in interpretation; see e.g., Popham, 2013; van der Lans et al., 2015), bias (e.g., acquiescence), and halo effects (perceptions of one aspect of teaching influencing those of other aspects; see e.g., Fauth et al., 2014; Chap. 3 by Röhl and Rollett of this volume) that may influence their psychometric properties (see for example, Follman, 1992; Schweig, 2014; Wallace et al., 2016). Nevertheless, most existing studies suggest that these biases are generally small in magnitude and do not greatly influence comparisons across teachers or student groups, or how aggregates relate to one another and to external variables (Kane & Staiger, 2012; Vriesema & Gehlbach, 2019). Research also demonstrates that aggregated student survey responses are associated with important student outcomes including academic achievement (Durlak et al., 2011; Shindler et al., 2016), engagement (Christle et al., 2007), and self-efficacy and confidence (e.g., Fraser & McRobbie, 1995).

Student surveys also have the benefit of being cost-effective, relatively easy to administer, and feasible to use at scale (e.g., Balch, 2012; West et al., 2018). This is a particular advantage when contrasted with other commonly used methods for measuring teaching and the quality of the learning environment, including direct classroom observation. In large school districts, an observation system closely tied to professional development can require dozens of full-time positions, with yearly costs in the millions of dollars (Balch, 2012; Rothstein & Mathis, 2013). As a result, the use of student surveys has seen remarkable growth over the last two decades for evaluating educational interventions (Augustine et al., 2016; Gottfredson et al., 2005; Teh & Fraser, 1994), and monitoring and assessing educational programs and practices (Hamilton et al., 2019). In particular, student surveys are commonly used to inform teacher evaluation and accountability systems—summatively as input for setting actionable targets (Burniske & Meibaum, 2012; Little et al., 2009), or formatively to provide feedback and promote teacher reflection and instructional improvement (Bijlsma et al., 2019; Gehlbach et al., 2016; Wubbels & Brekelmans, 2005).

3 Psychological Climate, Organizational Climate, and Student Surveys

In most contexts, schooling is an inherently social activity, and students typically experience schooling in organizational clusters (Bardach et al., 2019). The common pattern of student clustering within classrooms and schools presents challenges and choices in using surveys to understand teaching and the quality of the learning environment. One of the first choices is whether to focus the survey on understanding the personal perceptions and experiences of individual class members, or more broadly on shared elements of teaching quality relevant to the class or school as a whole (Bliese & Halverson, 1998; Den Brok et al., 2006; Echterhoff et al., 2009).

Surveys that aim to capture individual student interpretations of teaching quality or of the learning environment are described as reflecting psychological climate, and include items that ask for individual self-perceptions and personal beliefs (Glick, 1985; Maehr & Midgley, 1991). A long history of educational research suggests that psychological climate is a key proximal determinant of academic beliefs, behaviors, and emotions (Maehr & Midgley, 1991; Ryan & Grolnick, 1986). Because psychological climate variables treat individual perceptions as interpretable, it is appropriate to analyze them at the individual level (Stapleton et al., 2016), and differences among individual respondents are considered substantively meaningful: individuals can react in different ways to the same practices, procedures that seem fair to one individual might seem unfair to another, and so forth. Psychological climate variables can also be aggregated to describe the composition of an organization (Sirotnik, 1980).

On the other hand, surveys that focus on the classroom or the school as a whole are described as reflecting organizational climate (see e.g., Lüdtke et al., 2009; Marsh et al., 2012), a concept that has a rich history in industrial and social psychology (Bliese & Halverson, 1998; Chan, 1998). Unlike psychological climate, organizational climate emerges from the collective perceptions of individuals as they experience policies, practices, and procedures (e.g., Hoy, 1990; Ostroff et al., 2003). Aggregating individual perceptions produces measures of organizational level phenomena (Sirotnik, 1980). These new variables can be interpreted to reflect an overall or shared perception of the environment (Lüdtke et al., 2009). The concept of organizational climate informs the design and use of many student surveys, which are typically directed toward students as a group, often asking for observations of the behavior of others (e.g., classmates, teachers; see Den Brok et al., 2006).

When conceived as measures of organizational climate, aggregating survey responses essentially positions students as informants or judges of a classroom or school level trait, similar to observers who would provide ratings using a standardized protocol. To illustrate this assumption, consider the following claims in Table 1 regarding three widely used student surveys.

Table 1 Measurement claims for three widely used surveys

Thus, while psychological climate variables treat interindividual differences as substantively interpretable, organizational climate variables emerge from shared student experiences and assume that students have similar mental images of their classroom or school (Fraser, 1998). Students in a particular classroom or school are treated as exchangeable (Lüdtke et al., 2009), and interindividual differences are treated as idiosyncratic measurement error. Lüdtke et al. (2006, p. 207) noted that in the ideal scenario, “each student would assign the same rating, such that the responses of students in the same class would be interchangeable.” Because organizational climate variables treat individual perceptions as error, it is appropriate to analyze them at the classroom or school level (Stapleton et al., 2016). However, while the distinction between psychological and organizational variables is frequently drawn in the theoretical and methodological literature, much of the applied literature does not explicitly or consistently consider student survey-based ratings of teaching quality and the learning environment as either psychological or organizational level measures (Lam et al., 2015; Schweig, 2014; Sirotnik, 1980). This in part reflects the fact that most student surveys occupy a gray area between these two classifications. On one hand, classrooms and schools are shared spaces, students interact socially and build relationships with their peers and teachers, and some aspects of teaching quality are more or less equally applicable to all students in the classroom (Lam et al., 2015; Urdan & Schoenfelder, 2006). At the same time, students’ school-based experiences can and often do differ, making their responses not exchangeable; students are not objective, external observers, but active participants involved in complex interactions with other students, teachers, and features of the classroom and school environment. Teachers often interact with students through multiple modes and formats, both individually and as a group (whole-class instruction, group work), and students of course interact directly with one another, individually and as a group (Den Brok et al., 2006; Glick, 1985; Sirotnik, 1980).

4 Reporting Survey Results: Common Practices and Opportunities for Improvement

In the previous section, we argued that researchers often do not explicitly state the measurement assumptions that underlie their use of student surveys. In particular, researchers are not always explicit about the unit of interest (e.g., the individual or the group) and what this implies for the interpretability of individual student responses. These issues also arise in how survey developers choose to summarize and report survey results. In practice, nearly all survey platforms report measures of teaching and the quality of the learning environment by aggregating individual student responses to create classroom-level or school-level scores. It is these aggregates that are subsequently communicated to stakeholders or practitioners through data dashboards or survey reports (Bradshaw, 2017; Panorama Education, 2015). These aggregates can reflect simple averages (Balch, 2012; Bijlsma et al., 2019), percentages of respondents who report a certain experience or behavior (Panorama Education, 2015), or more sophisticated statistical models (e.g., IRT or other latent variable models; see e.g., Maulana et al., 2014).
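
To make the two simplest of these reporting conventions concrete, the sketch below computes a classroom average and a “percent favorable” summary from hypothetical item scores. The data, the 1–5 scale, and the favorability cutoff are illustrative assumptions on our part, not values drawn from any of the platforms cited above.

```python
from statistics import mean

def classroom_summary(scores, favorable_cutoff=4):
    """Summarize one classroom's survey scores the way many reporting
    platforms do: a simple average, plus the share of students whose
    score meets a (hypothetical) favorability cutoff on a 1-5 scale."""
    average = mean(scores)
    pct_favorable = sum(s >= favorable_cutoff for s in scores) / len(scores)
    return average, pct_favorable

# Hypothetical responses from ten students in one classroom.
average, pct = classroom_summary([5, 4, 4, 3, 2, 5, 4, 3, 1, 5])
print(f"average = {average:.2f}, percent favorable = {pct:.0%}")
# -> average = 3.60, percent favorable = 60%
```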

Irrespective of whether the survey developers are interested in individual-, school-, or classroom-level variables, this approach to score reporting typically does not include information about the variability of student responses within classrooms or schools (Chan, 1998; Lüdtke et al., 2006). Thus, whether by accident or design, survey reports are firmly rooted in the notion of organizational climate from industrial and organizational psychology described previously: the shared learning environment is the central substantive focus, students are assumed to react similarly to similar external stimuli, and individual variation is assumed to be idiosyncratic or reflective of random measurement error (Chan, 1998; Lüdtke et al., 2009; Marsh et al., 2012).

However, while aggregated scores are useful for characterizing the overall learning experiences of a typical student, a growing body of research shows that these experiences can in fact vary greatly within schools and classrooms. Croninger and Valli (2009), for example, found that the vast majority (more than 80 percent) of the variance in the quality of spoken teacher–student exchanges occurred among lessons delivered by the same teachers. Den Brok et al. (2006) found that the majority of the variance in student survey reports reflects differences among students within the same classroom (between 60 and 80 percent of the total variance). Crucially, emerging research also suggests that disagreement among students in their reports of the learning environment does not reflect only error, and indeed can provide important additional insights into teaching and learning that are not captured by classroom or school aggregates. In a study of elementary school students, Griffith (2000) found that schools with higher levels of agreement in student and parent survey reports of order and discipline tended to have higher levels of student achievement and parent engagement. Recent work by Bardach and colleagues (2019) found that within-classroom consensus on student reports of classroom goal structures was positively associated with socio-emotional and academic outcomes.
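
For readers who want to see where such percentages come from, the following minimal sketch decomposes the total variance in student scores into within- and between-classroom shares, in the spirit of the decompositions reported above. The function name and the toy data are our own illustrative assumptions; a full analysis would typically fit a multilevel model rather than this descriptive ANOVA-style calculation.

```python
import statistics

def variance_shares(classrooms):
    """Descriptive ANOVA-style decomposition of total score variance
    into within- and between-classroom shares. `classrooms` is a list
    of lists of student scores, one inner list per classroom."""
    all_scores = [s for c in classrooms for s in c]
    grand_mean = statistics.mean(all_scores)
    n = len(all_scores)
    # Squared deviations of students from their own classroom mean...
    within = sum((s - statistics.mean(c)) ** 2
                 for c in classrooms for s in c) / n
    # ...plus squared deviations of classroom means from the grand mean.
    between = sum(len(c) * (statistics.mean(c) - grand_mean) ** 2
                  for c in classrooms) / n
    total = within + between
    return within / total, between / total

# Toy example: three small classrooms with modest mean differences.
w, b = variance_shares([[2, 3, 3, 4], [3, 4, 4, 5], [1, 2, 3, 4]])
print(f"within-classroom share: {w:.0%}, between-classroom share: {b:.0%}")
# -> within-classroom share: 66%, between-classroom share: 34%
```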

4.1 An Example Case of Within-Classroom Variability

Examining the distribution of student reports can open up possibilities for using information about the nature and extent of student disagreements for diagnostic and formative purposes, including focused professional development opportunities for teachers. The three hypothetical classrooms in Fig. 1 illustrate how different within-classroom distributions can produce the same aggregate classroom climate rating (e.g., Lindell & Brandt, 2000; Lüdtke et al., 2006).

Fig. 1 Three hypothetical distributions of student climate ratings yielding the same average of 3.42 (dot plots of student scores in Classrooms 1, 2, and 3)

For the purposes of this example, students in each of these classrooms are asked about their perceptions of cognitive activation in the classroom, and the extent to which they are presented with questions that encourage them to think thoroughly and explain their thinking (Lipowski et al., 2009). Figure 1 displays the ratings provided by twenty students in each of the three classrooms. All three classrooms have the same average score of 3.42 on a 5-point scale.

In Classroom 1 there is noticeable disagreement in student survey responses, with students providing responses across the entire allowable score range. In Classroom 2, there is also considerable variability in student responses, but student perceptions appear polarized: one large group of students feels very positive about the level of cognitive activation, while another large group feels very negative. Finally, in Classroom 3, there is perfect agreement among all students; this is the hypothetical ideal classroom described in Lüdtke and colleagues (2006), in which all students experience classroom climate in the same way. These scenarios raise important questions for practice. In principle, it does not seem justifiable to give the three classrooms in Fig. 1 the same feedback and professional development recommendations, thereby ignoring the fact that the patterns of within-classroom variation are dramatically different. A more sensible approach would consider whether the within-classroom variability in student reports can be informative for diagnosing and improving teaching quality. It is not possible to determine from this raw quantitative display why students in these three classrooms perceived cognitive activation in different ways, but the distributions themselves carry information that the shared average conceals. In the remainder of this chapter, we summarize and discuss literature relevant to understanding these interindividual differences.
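
One way to quantify these contrasts is with a within-group agreement index of the kind discussed by Lindell and Brandt (2000). The sketch below computes the classic rwg index, which compares the observed rating variance to the variance expected if students answered uniformly at random, for three hypothetical score distributions constructed to mirror Fig. 1. The individual score values are invented for illustration (treated here as multi-item averages, which is why they need not be whole numbers); only the shared mean of 3.42 and the qualitative shapes follow the example.

```python
import statistics

def rwg(ratings, scale_points=5):
    """Within-group agreement index: 1 minus the ratio of the observed
    rating variance to the variance of a uniform ("no agreement") null
    distribution over the scale points."""
    null_variance = (scale_points ** 2 - 1) / 12  # 2.0 for a 5-point scale
    return 1 - statistics.pvariance(ratings) / null_variance

# Hypothetical multi-item average scores for 20 students per classroom,
# mirroring Fig. 1: spread out, polarized, and unanimous, all with mean 3.42.
classroom_1 = [1, 1.4, 2, 2, 2.5, 3, 3, 3, 3.5, 3.5,
               3.5, 4, 4, 4, 4, 4.5, 4.5, 5, 5, 5]
classroom_2 = [1.5] * 8 + [4.7] * 12
classroom_3 = [3.42] * 20

for label, scores in (("1", classroom_1), ("2", classroom_2), ("3", classroom_3)):
    print(f"Classroom {label}: mean = {statistics.mean(scores):.2f}, "
          f"r_wg = {rwg(scores):.2f}")
# All three means are 3.42, but agreement differs sharply: r_wg is about
# 0.34 in Classroom 1, negative in the polarized Classroom 2 (more spread
# than the uniform null; such values are often truncated to zero), and
# exactly 1.0 in the unanimous Classroom 3.
```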

5 School and Classroom Factors Associated with Variation in Student Perceptions of Teaching Quality

Within classrooms or schools, interindividual differences in perceptions of teaching or the learning environment can arise for many reasons. We begin this section by discussing the standard assumption invoked by common approaches to survey score reporting (that within-classroom or within-school variation reflects measurement error), and subsequently present four alternative interpretations that have support in the literature: (1) differential expectations and teacher treatment, (2) diversity of student needs and expectations, (3) diversity of student backgrounds, experiences, cultural values, and norms, and (4) teacher characteristics.

5.1 Measurement Error

Interindividual variability in student perceptions can be assumed to involve some idiosyncratic component of measurement error, i.e., random fluctuations around the “true” score of a school or classroom related to memory, inconsistency, and unpredictable interactions among time, location, and personal factors. Individual students may also vary in their standards of comparison (Heine et al., 2002), or in the internal scales they use to calibrate their perceptions (Guion, 1973). This can create differences in student scores analogous to rater effects in studies of observational protocols: some students may be more lenient or severe than others. Thus, some differences among students are not substantively interpretable (Marsh et al., 2012; Stapleton et al., 2016). Moreover, to the extent that students are not systematically sorted into classrooms based on stringency, these differences are not expected to induce bias and are best treated as measurement error (West et al., 2018). If interindividual variability were idiosyncratic and random, however, we would generally not expect within-classroom variability in student ratings to be associated with other measures of teaching quality or student outcomes. Yet a number of prior studies have demonstrated that individual perceptions of school or classroom climate can be positively associated with student achievement. Griffith (2000) and Schweig (2016) found that learning environments with more interindividual disagreement about order, discipline, and the quality of classroom management had lower academic performance, even holding average ratings constant. Similarly, Schenke et al. (2018) found that heterogeneity among students’ perceptions of emotional support, autonomy support, and performance focus was negatively associated with mathematics achievement. Martínez (2012) found that individual perceptions of opportunity to learn (OTL) were predictive of reading achievement, even after controlling for class- and school-level OTL. Such findings strongly suggest that within-classroom variability in student reports is not entirely reflective of measurement error.
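
To see why this reasoning has force, consider a small simulation; the model and all of its parameters are invented for illustration. If student ratings were pure noise around a classroom’s true score, and outcomes depended only on that true score, then the spread of ratings within a classroom should be essentially unrelated to the outcome, unlike the significant associations reported in the studies cited above.

```python
import random
import statistics

random.seed(1)

# Simulate 200 classrooms in which student ratings are pure noise around
# a classroom "true" climate score, and the outcome tracks only that
# true score. Under this error-only model, within-classroom spread
# carries no information about the outcome.
spreads, outcomes = [], []
for _ in range(200):
    true_climate = random.uniform(2, 5)                  # classroom true score
    ratings = [random.gauss(true_climate, 0.8) for _ in range(25)]
    spreads.append(statistics.stdev(ratings))            # within-classroom spread
    outcomes.append(true_climate + random.gauss(0, 0.5)) # outcome tracks truth only

r = statistics.correlation(spreads, outcomes)  # requires Python 3.10+
print(f"correlation between rating spread and outcome: {r:.2f}")
# Close to zero (up to sampling noise).
```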

5.2 Differential Expectations and Teacher Treatment

Teacher expectations are a critical determinant of student learning (Muijs et al., 2014). Teachers may consciously or unconsciously hold differential expectations for subgroups of students, which may translate into different sets of rules, classroom environments, and pedagogical strategies (Babad, 1993; Brophy & Good, 1974), potentially leading to opportunity gaps (Flores, 2007). Research has shown that some teachers hold lower achievement expectations for students of color (Banks & Banks, 1995; Oakes, 1990). Teachers may also have lower achievement expectations for female students (Lazarides & Watt, 2015), and offer them less reinforcement and feedback (e.g., Simpson & Erickson, 1983). Teacher expectations may also differ based on perceptions of student ability. At higher grades, research has shown that prior academic achievement is the most significant influence on teacher expectations (Lockheed, 1976). More recent research suggests that learning tasks are often differentially assigned to students based on teacher beliefs about student ability. For example, “mathematically rich” instruction (tasks requiring reasoning and creativity, multiple concepts and methods, and application to novel contexts) is often reserved for students perceived to be high-achieving, while students perceived as lower achieving spend more time developing and practicing basic skills (Schweig et al., 2020; Stipek et al., 2001). Thus, within-classroom variability in student survey reports could point to suboptimal or inequitable participation opportunities and instructional experiences for students of different groups (Gamoran & Weinstein, 1998; Seidel, 2006), which may, in turn, result in achievement gaps (Voight et al., 2015).

5.3 Diversity of Student Needs and Expectations

Student perceptions of teaching and the learning environment may reflect different student needs and expectations: learning experiences and instructional practices that are successful with some students may not be effective with others, and student socio-emotional needs and expectations may also differ substantially within classrooms. Levy et al. (2003), for example, suggest that students with lower self-esteem may have greater needs with respect to the establishment of a supportive climate. Lüdtke et al. (2006) suggest that higher- and lower-ability students may differ in their perceptions of certain aspects of instructional practice, including pacing or task difficulty. English learners (ELs) and students with disabilities tend to report their schools to be less safe and supportive than their peers do (Crosnoe, 2005; De Boer et al., 2013; Watkins & Melde, 2009). ELs face challenges with language comprehension, particularly with academic or mathematical language (Freeman & Crawford, 2008), which may create differential perceptions of the clarity of classroom procedures. On the other hand, Hough and colleagues (2017) found that ELs had systematically more favorable perceptions of their teachers and classrooms than their peers on several aspects of climate. EL students may be more engaged, more challenged, and better behaved, which could influence their overall perceptions of the classroom; they may also be more proactive in seeking out additional support from teachers, or teachers may be particularly sensitive to their needs (LeClair et al., 2009).

Alternatively, teachers may use instructional strategies that are responsive to and supportive of students’ diverse needs and expectations, potentially causing student perceptions of the quality of their learning experiences to be more similar. For example, teachers may use complex instruction structured to promote student engagement, support critical thinking, and connect content in meaningful ways to students’ lives (Averill et al., 2009; Freeman & Crawford, 2008). Thus, to the extent that within-classroom agreement is associated with the use of instructional strategies responsive to students’ diverse needs and expectations, there may be more equitable opportunities for all students. In a recent mixed-methods study of science classrooms, we found that classrooms with higher levels of student agreement tended to provide more collaborative learning opportunities for students, including more group work, and to have more structured systems for eliciting student participation (Schweig et al., n.d.).

5.4 Diversity of Student Backgrounds, Experiences, Cultural Values, and Norms

Reports of teaching and the quality of the learning environment may reflect cultural or contextual factors that cause students to perceive the learning environment differently (Bankston & Zhou, 2002; West et al., 2018). There is also research suggesting that student perceptions of the learning environment may differ by grade level (West et al., 2018). In the United States, research has shown that Black and Hispanic/Latino students often report feeling less connected to their schools, feel less positively about their relationships with teachers and administrators, and feel less safe in some areas of the school (Lacoe, 2015; Voight et al., 2015). However, recent literature suggests that this may not always be the case. Hough and colleagues (2017) found that while Black students had systematically lower ratings of school connectedness, discipline, and safety than their peers, Hispanic/Latino students tended to report systematically higher perceptions. These findings are not inherently at odds, and other literature suggests that perceptions of the learning environment can differ even from one area of the school to another. Using data from New York City, Lacoe (2015) found, for example, that Black students have systematically lower perceptions of safety than their white peers in classrooms, but systematically higher perceptions of safety in hallways, bathrooms, and locker rooms. In our own work in mathematics and science classrooms, we found that classes with higher proportions of ELs and low-achieving students tended to have more interindividual disagreement about teaching and the quality of the learning environment (Schweig, 2016; Schweig et al., 2017). We also found significant within-classroom gaps between Black and white students on several aspects of teaching and the quality of the learning environment, with Black students typically having more positive perceptions relative to their white peers (Perera & Schweig, 2019).

The perception of some teacher behaviors, including the extent to which teachers make students feel cared for, may depend strongly on cultural conceptions of caring (Garza, 2009). Calarco (2011) highlighted several ways in which economically disadvantaged students’ help-seeking behaviors differed from those of their classmates in ways that could affect perceptions of teaching quality. Specifically, Calarco found that economically disadvantaged students sought less teacher assistance and, as a result, received less guidance from their teachers. Atlay and colleagues (2019) found that students from higher socioeconomic backgrounds were more critical of teacher assistance, perhaps reflecting a sense of entitlement (Lareau, 2002). Students’ perceptions of teaching quality can also be influenced by out-of-school experiences; for example, there may be differential exposure to external stressors that influence feelings of school safety (Bankston & Zhou, 2002; Lareau & Horvat, 1999).

5.5 Teacher Characteristics

A number of teacher characteristics can influence survey-based reports. Past work, for example, has shown that student perceptions of teachers are associated with teacher experience, and in particular, that more experienced teachers are perceived as more dominant and strict (Levy & Wubbels, 1992). More experienced teachers, however, are not generally perceived as more caring or supportive by their students (Den Brok et al., 2006; Levy et al., 2003). Teacher race and ethnicity can also play a role in survey-based ratings of teaching quality. Newly emerging research suggests that race-based disparities in perceptions of teaching quality can be ameliorated by the presence of teachers of color; specifically, teacher–student race congruence may positively influence students’ perceptions of teaching quality (Dee, 2005; Gershenson et al., 2016). In our own research, however, we did not find evidence that observable teacher characteristics, including teacher race, gender, years of experience, and level of education, explain variation in race-based perceptual gaps (Perera & Schweig, 2019; Schweig, 2016).

6 Conclusion

A growing body of evidence suggests that in considering instructional climate, researchers and school leaders may want to look beyond aggregate indicators and also consider the extent of variation (or consensus) in student survey reports as a potential indicator of important aspects of the school or classroom environment. In fact, the ability to capture within-school or within-classroom variability in student experiences is one of the defining strengths of student survey-based measures. Other commonly used measurement modes (including teacher self-report and structured classroom observation) are structurally not well equipped to capture differential student experiences. Classroom observation protocols, for example, are typically not designed to measure whether or how teachers engage with individual students (Cohen & Goldhaber, 2016; Douglas, 2009). Student surveys, on the other hand, offer information that goes beyond typical experiences and can allow teachers and instructional leaders to better understand how instruction, socio-emotional support, and other aspects of the learning environment are experienced by different students or groups of students.

Collectively, the research presented in this chapter suggests that variation in student survey reports of their learning environment may reflect a variety of factors and influences, including strategic instructional choices, responsive pedagogy, and classroom structures implemented by teachers; the varying needs and perceptions of particular students or groups of students; contextual factors; and the interactions among these. Importantly, variation can also reflect more pernicious influences, like differential teacher expectations and other structural disadvantages for some groups of students. Our example case also raises important questions about whether within-school or within-classroom variability should be treated as ignorable measurement error when examining student survey reports of teaching quality and learning environments. Should we give the three classrooms in Fig. 1 the same feedback and professional development recommendations for teachers? Or is there evidence in the within-classroom variability in student reports that can be informative for these purposes? Recent policy guidelines in the United States point in the latter direction, explicitly requiring or implicitly advising education agencies to provide schools not only aggregated survey-based indicators, but also indicators disaggregated by student subgroup (Holahan & Batey, 2019; Voight et al., 2015). A growing consensus also sees attending to these subgroup differences as key to the school-wide adoption of instructional improvement strategies that meet the learning needs of the most vulnerable students (Kostyo et al., 2018).

Considering the diversity of student perspectives and experiences can be particularly useful for informing efforts to promote equitable learning and outcomes. Ultimately, whether climate is conceived as psychological or organizational, or both, if subgroups of students experience school life in meaningfully different ways, reliance on aggregated survey indicators as measures of teaching quality can obscure diagnostic information (Roberts et al., 1978) and compromise the validity and utility of these measures to inform teacher reflection, feedback, and other improvement processes within schools (Gehlbach, 2015; Lüdtke et al., 2006).