Dangers of “Making Diversity Visible”: Historicizing Metrics of Science Achievement in U.S. Education Policy

Science achievement is often taken for granted as itself the inequality that must be remediated to permit greater social equality, access, and inclusion. Where prior critiques of Science-for-All reforms invoke a divide between policy rhetoric and classroom reality, I argue that policy is not mere rhetoric. Education policy actively produces typologies of students and science curricula as authorized by inherited metrics of science achievement. This chapter historically examines how science achievement emerged as a psychological category and calculable attribute in the early twentieth century. The design and calibration of these early tests drew upon sociological taxonomies, religious hierarchies, and cultural distinctions to stabilize science achievement as both a criterion of the ideal American citizen and a universalized standard of comparison. Juxtaposing this history with current U.S. and international policy reports, I argue that today’s efforts to include “diverse groups” by closing science achievement gaps retain expectations about the desired future citizen that inadvertently marginalize those projected as outside these norms.

literacy, presumed to threaten their "personal well-being," prospects in "job markets," and "civic decision-making." This chapter explores how this notion of a demographic difference in science achievement became taken for granted as itself the inequality that must be remediated to allow for greater economic equality, political access, and social inclusion.
Recent calls for broadening participation in science, technology, engineering, and mathematics (STEM) education can be examined as part of a broader hope of the modern school to make the kind of person who is happier, healthier, and more productive (Diaz 2017; Ideland 2018; Miller 2017; Popkewitz 2008; Valero 2017; Zheng 2019). Transnational discourses in STEM education are not simply about improving learning outcomes or economic productivity; they also embody anxieties about the increasing cultural diversity ascribed to immigrants, refugees, and other marginalized groups (Bazzul 2014; Ideland and Malmberg 2014). This relationship is not new. For at least a century, science education has participated in making categories of self and Other through distinctions that divide scientific from superstitious and healthy from pathological, and which render citizenship into a moral and cultural qualification rather than an assumption (Kirchgasler 2017, 2018).
This chapter historically examines key conditions of possibility for dividing children by something called science achievement. Drawing on insights from science studies and curriculum studies, the argument explores how achievement data are not simply descriptions that represent a pre-existing reality with greater or lesser fidelity. Achievement metrics (assembled with policy objectives, curricular standards, psychological categories, and pedagogical techniques) act to shape that reality in multiple, indeterminate ways. This chapter analyzes present policy alongside past research in U.S. science education. This juxtaposition indicates that current efforts to include "diverse groups" by closing gaps in science achievement retain historical and cultural principles about the desired future citizen that unintentionally marginalize those projected as outside these norms. At stake is how the seemingly neutral categories, methods, and practices of education policy inadvertently generate new exclusions even as they seek to empower and include.
This chapter is organized as follows. As a starting point, I consider how the most recent U.S. science curriculum standards, called the Next Generation Science Standards (NGSS), respond to concerns that the prior standards had made diversity invisible and promoted a one-size-fits-all approach. I situate these concerns in relation to extant critiques of the role of achievement data in education policy, and outline the need for a new approach that treats science achievement as a historical object. Next, this chapter briefly examines how science achievement emerged in early twentieth-century U.S. science education research as a psychological category and a calculable attribute. Related techniques of research and pedagogy helped to make up different "kinds" (Hacking 2007) of science learners as needing different levels of science instruction. This chapter concludes by returning to the NGSS to consider key historical shifts in how science achievement operates to classify and order difference. The argument highlights limits and dangers in efforts to make diversity visible through science achievement. It also illustrates how science achievement itself acquires visibility through anxieties about the "nation's increasingly diverse student population" (NGSS Lead States 2013a: 359).

Reevaluating the Premises of Invisibility and a "One-Size-Fits-All" Science Education
The recent U.S. Next Generation Science Standards (NGSS) aim to address inequity by "making diversity visible" (NGSS Lead States 2013a: 364). "Persistent achievement gaps" are taken to indicate that "non-dominant groups" have different learning needs and require instructional shifts (p. 359). The standards present case studies for seven categories of students. Contrasts appear in the pedagogical strategies recommended for some groups versus for others. The case studies for "economically disadvantaged students" and for "students from major racial and ethnic groups" recommend strategies to make science more accessible and concrete, such as multimodal representations to review below-grade-level material (NGSS Lead States 2013b, c). Meanwhile, those identified as gifted and talented are said to require instruction that is more open-ended and abstract, such as self-directed projects to explore above-grade-level material (NGSS Lead States 2013d). These contrasts raise the question: How did it become reasonable to advocate for distinct kinds of science education for socioeconomic, racial, and ethnic groups in the name of equity?
The NGSS' framework (National Research Council 2012) cites critiques that prior standards had promoted a "dangerous discourse of invisibility" (Rodriguez 1997: 19) by failing to address critical issues of ethnicity, socioeconomic status, and gender. During the late 1990s, policy analyses outlined Science-for-All reforms as "egalitarian in theory," but "difficult to actualize in practice" (Calabrese Barton 1998: 525). Accompanying these critiques were calls to close the research-practice gap by identifying specific strategies to support demographic groups historically overlooked in U.S. science education (Lee 1999). In other words, the concern was how to expand the "all" of Science-for-All to include students with disabilities (Mastropieri and Scruggs 1992), Mexican American students (Barton and Osborne 1995), bilingual students (Fradd and Lee 1995), girls (Shakeshaft 1995), and urban homeless children (Calabrese Barton 1998), among others. Such critiques emphasized that the problem did not lie with deficits within these groups, but with a curriculum that represented science in narrow, discriminatory ways and failed to respond to their ideas, interests, and everyday lives (see, e.g., Brickhouse 1994). A special issue on diversity in K-12 science education concluded in 2001 that, "It has become increasingly obvious that 'science for all' does not necessarily mean 'one size fits all'" (Lynch 2001: 622).
Yet contrary to the premise of a "one size fits all" approach, U.S. science education has long distinguished between the curricula and pedagogies needed by some students versus by others. Efforts to differentiate science instruction for specific categories of students date back much further than 1990s discussions of multicultural science education (e.g., Atwater and Riley 1993; Hodson 1993) or the Science for All Americans report (American Association for the Advancement of Science [AAAS] 1990). A persistent preoccupation with difference is evident in titles of Science Education articles published over the years, such as, "The inner city child: An attempt to improve his problem solving skills" (George and Dietz 1971), "Adapting science instruction in New York City junior high schools to the needs of Puerto Rican pupils" (Sanguinetti 1961), and "Teaching science to defective delinquents" (Schuyler 1940). While much has changed over the past century, the NGSS' attempt to bring visibility to different groups of students and their science learning needs is not entirely new. Next, I consider the value of shifting the analytical focus from diverse groups to the "dividing practices" (Foucault 1994: 126) by which differences are seen and sorted in the classroom.

Not Just Rhetoric and Misrepresentation: Why Historicize the Making of Science-for-All
Science achievement data sit at the crux of nearly three decades of national and international reforms to promote Science-for-All (Hodson and Reid 1988; Linder et al. 2010; McEneaney 2003; Orion 2007). These initiatives have sought to raise the science achievement of all members of society, but particularly of those groups identified as historically underserved. The disaggregation of achievement data is envisioned to play a crucial role in revealing gaps, identifying effective strategies for specific demographic groups, and monitoring the success of pedagogical interventions. This logic, however, has come under sustained critique. While a review of this work is beyond the scope of this chapter, it is helpful to situate my approach in relation to work that interrogates the link presumed between achievement metrics and equity outcomes in terms of rhetoric and representation. Prior education policy analyses have discussed: (1) Science-for-All reforms as rhetoric, and (2) the achievement gap as a misrepresentation of the capabilities and needs of diverse groups. Within science education, many have argued that the policy emphasis on Science-for-All is mere rhetoric that is not implemented in reality (e.g., Atwater 2000; Calabrese Barton 1998). Others have taken issue with the rhetorical justification of Science-for-All as the need to optimize human capital and economic competitiveness, rather than as a moral imperative (e.g., Basile and Lopez 2015; DeBoer 2013). Beyond science education, scholars have argued that the overwhelming focus on racial achievement gaps functions as a deficit lens that perpetuates stereotypes and diverts attention from systemic disparities (e.g., Gutiérrez 2008; Ladson-Billings 2006). Others have contended that education policies employ the rhetoric of data to lend a scientific veneer to achievement metrics, when in reality data-driven reforms tend to disadvantage marginalized groups and to compound inequity through educational triage (e.g., Booher-Jennings 2005; Horn et al. 2015; Sleeter 2007; Valenzuela 2005).
Critiques of policy-as-rhetoric raise concerns about how policy narratives elide, obscure, and exacerbate the educational exclusion of marginalized groups. However, there are several limits to analyses that presuppose a divide between policy rhetoric and classroom reality. First, the premise of a rhetoric/reality or text/context divide makes it more difficult to examine how techniques for seeing and ordering difference circulate across domains of policy, research, and practice. Second, the tendency to interpret achievement discourse as a case of a broader ideology (e.g., neoliberalism, deficit thinking) omits scrutiny of the historical principles that made it possible to think about people as differing in science achievement in the first place. Third, the argument that achievement metrics misrepresent the real science capabilities and needs of marginalized populations risks reinscribing the notion that these groups constitute distinct types of learners whose capabilities and needs could be revealed objectively through the elimination of biased test items or through more culturally valid forms of assessment. Instead of debunking the science achievement gap as a false representation, a more pressing issue is to understand how it became a candidate for scientific truth or falsehood (Hacking 1992). In other words, how did science achievement itself become visible in national policy as a singular quality of mind, or as a metric of universal knowledge, practices, and reasoning that seems to vary in degree and appears to be distributed unequally between individuals, populations, and nations?
Rather than viewing the "all" of Science-for-All as an empty promise that is said but not done, or what Ahmed (2006) calls non-performative discourse, I am interested in how science education policy does perform, act, and impact educational inequalities. My research draws on scholarship from science studies and curriculum studies that examines how education policies comprise technologies that produce material effects. Popkewitz et al. (2018), for instance, discuss how benchmarks and notions of empirical evidence "perform as expectations about universal characteristics of society and people" that, ironically, generate difference through their statements of unity (p. 113). If I return to the opening epigraphs, the "all" of Science-for-All is not simply an egalitarian vision that is left incomplete or unfulfilled. Instead, that "all," linked to frameworks of science literacy and metrics of science achievement, creates hierarchical distinctions through rules and standards of what each citizen must know and do. These universalized qualities come to appear necessary to secure one's personal well-being, job market prospects, and civic decision-making. In so doing, the "all" inscribes differing needs onto the minds, attitudes, and home lives of students, which then appear to demand distinct forms of science instruction in response.
It is important to attend to this performative making of difference, because inclusion and exclusion are not just opposite phenomena (Popkewitz 2008). In a process called abjection (Butler 2011), those identified as needing to be included are classified as different from the norm (e.g., not-yet-scientifically-literate) and subjected to rescue and reform, where their inclusion depends on developing the qualities they are seen as lacking. Abjection directs attention to how scientific discourses and tools operate to "overrepresent" (Wynter 2003: 260) a historically peculiar and culturally particular genre of human thought and activity (e.g., the "basic skills" measured by PISA) as a generic baseline for human existence and a prerequisite for equal participation in society.
The point of historicizing science achievement, then, is not to debunk it as illusion or ideology. Instead of subtracting reality from achievement, I will attempt to add reality back to it by analyzing its historical shifts, political entanglements, and material agency (Latour 2004: 232). Science achievement only appears as a scientific object through what Latour (2000) calls a historical network of production. Rather than seeking its definitive origins, I highlight a few of the countless events out of which science achievement formed as an unstable assembly of various strategies of knowledge production, social administration, and pedagogical intervention. Notions of science achievement, ability, potential, and talent have materialized in mutating configurations over the past century. While some appear today as timeless cognitive factors, each emerged at a particular moment in response to a perceived social problem that its measurement was intended to solve. As a history of the present (Foucault 1977), this chapter explores two such moments: the emergence of standardized tests of science ability and achievement in the 1920s, and the current linkage of achievement test data to issues of equity and diversity in the 2010s. My starting points of analysis include the science education journal, General Science Quarterly (GSQ), published from 1916 to 1929, and the Next Generation Science Standards (NGSS) and their accompanying documents.
This chapter is not concerned with the internal validity or reliability of test items used to assess science achievement, nor with the authors' intentions. Rather, I examine the scientific and schooling practices that make certain differences knowable and actionable in the science classroom. Prior to the 1920s, for instance, it was not possible to make scientific claims about students' capacities for science learning. The subsequent century has witnessed a proliferation of instruments for assessing how schoolchildren measure up to standards codified as science, and later for ranking demographic groups and nations. Over the past century in the United States, techniques for conceptualizing and measuring science achievement have acquired, discarded, and reforged linkages to other elements, including evolutionary theories, psychological categories, narratives of American exceptionalism, Piagetian stage theories, political discourses of accessibility, and protocols of data-driven decision-making. These partial substitutions and rearrangements make it hard to recognize that, while many elements have changed, today's network of "science achievement" still generates distinctions in both individualizing and racializing terms.

The Making of Science Achievement as a Measurable Attitude of the Mind (Early 1900s)
Denaturalizing notions of science achievement requires briefly returning to a moment before it became natural to think of children's minds as possessing distinct amounts of scientific understanding. In the mid-nineteenth-century United States, truth about human difference was established through religious doctrines about the soul. Societal problems were attributed to cities as sites of moral contagion where virtues dissipated and vices spread (Boyer 1978). Physiology courses in the common schools sought to combat the vice of ignorance, fostering moral character through teaching obedience to God's laws in nature (Mann 1867). By the early twentieth century, the explicit aims of school science began shifting from moral character to a mental attitude. The notion of science as a mental quality of the child emerged at a moment in the United States when hopes of scientific progress were coupled with fears of racial degeneration. In the early 1900s, popular narratives of national identity highlighted America's inventive genius and technological progress as the height of modern civilization (Nye 1999). The social sciences brought principles of scientific planning to problems of human improvement. Of utmost concern was the Social Question, which attributed the perceived moral disorder of U.S. cities to the Great Migration and to the immigration of "foreigners" from southern and eastern Europe (Popkewitz 2008). Societal problems were imputed to the mental habits of these racialized populations, and mass schooling took on importance as a site of their rescue and reform. Hopes were placed in education, and the new educational sciences, to "Americanize the masses" by fostering desired characteristics among future citizens. Given the central role of science in concurrent narratives of American exceptionalism, science education could see itself as having a special role in transforming immigrants of "unscientific mind" (Woodhull 1918: 3) into "straight-thinking Americans" (Whitman 1921: 88).
A "scientific attitude of mind" emerged as a new object of empirical investigation through psychological techniques (Barber 1917: 108). Yet in the shift from soul to mind, moralizing judgments of social behavior did not disappear. Bad moral habits, such as poor hygiene, were still attributed to the ignorance of the urban masses. However, this ignorance was now construed not as a spiritual vice but as a product of the mental immaturity of immigrants, such as "inferior southern European stocks" (Grier 1920: 47). The new goal of developing scientific attitudes sought to bring moral order to these pupils' daily lives as they learned to follow scientific recommendations concerning physical, mental, and sexual hygiene.
Reconfiguring science as a mental trait relied upon and reiterated long-circulating assumptions of "lower races" as less capable of scientific reason. Drawing on recapitulation theory, scientific thinking was argued to be the upper anchor of human evolution, exemplified by the "best American stocks" (Grier 1920: 47), and was defined and discriminated against the unsystematized thinking attributed to the "savage" (Dewey 1910: 16). Yet the new notion of science (as superior reasoning, civilized living, and national belonging) was not part of the curriculum in existing courses. Psychological theories suggested that the rapid expansion of public schooling had yielded populations of pupils for whom existing forms of science teaching were inadequate. According to Thorndike's Law of Readiness, demanding that all students take physics and chemistry would be an attempt "to force nature," forgetting that the requisite attitude develops "relatively late in youthful minds as in that of the race" (Woodhull 1918: 49).
This "recapitulatory point of view" made it possible to reorganize science education as a differentiated, developmental progression (Downing 1925: 74). At the top of the trajectory was knowledge of physics and chemistry, now designated as "specialized sciences" suitable only for the few judged capable of quantitative abstractions. At the bottom of the developmental trajectory, a new course called general science would help "immature minds" acquire the scientific attitude seen as a prerequisite for more advanced, abstract thinking. Recapitulatory principles thus provided the grounds for defining scientific minds in opposition to allegedly immature minds and for differentiating the curriculum for these new categories of pupils. In the historical shift from religious moralization to psychological normalization, then, what got constituted as a "scientific attitude" continued to embody moral principles about who the child was and needed to become, and who was construed as furthest from these norms.
So far, I have considered the emergence of a notion of science as a mental quality differentiating kinds of people. But how did it become a quantifiable attribute, not merely inferred but empirically measured? Around the same time as the general science course was spreading across the country, the intelligence quotient (IQ) test and other psychological instruments began entering U.S. schools. It soon became "self-evident that the first thing one must do is to find out the exact mental equipment of his [sic] students" (Woodhull 1918: 83). In part, this demand was tied to the perception that those entering high schools were no longer homogeneous, but "a mongrel lot of pupils of all races" whose foundations for science learning had to be assessed rather than assumed (p. 224). Like their overall mental capacity, pupils' abilities for learning science were assumed to vary by "sex, age, environment, [and] heredity" (Hunter 1920: 385). The standardized tests developed over the next decade would materialize science ability as a measurable attribute that varied in degrees from a norm and could be used to compare distinct categories of pupils.
In these early standardized science tests, what became codified as science ability or achievement (terms often used interchangeably) was not simply a subset of the natural sciences, but the mental qualities presumed lacking in the masses. Sociological studies of the time defined scientific thinking in opposition to the "folk beliefs" of the "Southern Negro" (Puckett 1926), and the "superstitions" of the "Italian" and "Jew" (Jones 1904). Sociologists classified particular religious practices, such as the hanging of rosary beads or of the mezuzah, as "superstition," and identified adherents of Roman Catholicism and Judaism as less science-minded than those of Protestantism, which was upheld as a model of independent thinking (i.e., for purportedly having emancipated itself from the constricts of Old World religious traditions) (p. 77). Since general science aimed to free American citizens from superstition (Whitman 1921), early tests of science ability generated questions to assess "common superstitions or beliefs arrived at through unscientific thinking" (Maxwell 1920: 444). For instance, one question on a test of scientific reasoning asked whether the date Friday the thirteenth was unlucky (p. 449), a belief classified by sociologists of the era as a "Negro Taboo" (Puckett 1926: xii). Part of what the tests constituted as science ability, then, was pupils' rejection of beliefs presumed to distinguish racialized Others from allegedly rational Americans.
The theoretical object of science ability was reconfigured further through the operation of its measurement, such that, like intelligence (Danziger 1997), science ability became that which science achievement tests measured. The instruments made it possible to conceptualize each mind as having a stable degree of future capacity for science learning. Early science ability tests were designed to distribute individuals along a bell curve, keeping only those items that "differentiate bright pupils from dull ones" (Whitman 1920c: 50). The validity of the test could only be secured through an alignment with pre-existing appraisals of what constituted a mature scientific thinker, which required pre-determining which pupils were bright and which were dull. Such judgments were supplied by calibrating the tests against either IQ tests (Dvorak 1926) or teachers' grades and rankings (Ruch 1920). In particular, test designers asked teachers to rank their students by qualities such as "diligence, classroom behavior, personality of the pupil, punctiliousness with assignments, neatness, spontaneity, and many others" (p. 17). Such categories were not neutral but embodied specific social values and norms. Consider "spontaneity," a positive intellectual quality presumed to distinguish the American both from the "Frenchman" characterized as "bound by tradition, inert and pessimistic" (Downing 1925: 174), and also from the random impulsivity imputed to the "savage" (Dewey 1910: 14). Cultural norms of belief, conduct, and expression came to serve not only as universal indicators of scientific ability, but also as signs of American exceptionalism, expressed as "our own buoyancy, alertness, and ability to tackle forcefully and efficiently the changing problems" of society (Downing 1925: 174). The external validity of standardized science tests, like other psychological instruments (Rose 1985), relied upon registering as subnormal those individuals who had already been designated as problematic by institutions like schooling.
Stabilizing science content on standardized tests spatialized difference along a numerical scale. In generalizing a particular performance as a personal attribute, the notion of scientific thinking-already racialized through recapitulation theory, and culturally specified through sociological studies of foreign superstitions-became quantifiable. Statistical techniques sorted individual scores by pre-determined categories of difference (e.g., sex, heredity, environment), and demographic averages became inscribed as personal traits. This statistical style of reasoning made available new types of truth claims, such as test data suggesting that girls have more difficulty acquiring science knowledge than boys (Dvorak 1926), or that students from "typical Chicago high schools" do not grasp fundamental science concepts (Downing 1925). Through the rendering of such claims as empirical "findings," numerical data reordered pupils and populations, marking their distance from social norms that were, in the process, abstracted and universalized as science knowledge and conceptual understanding.
At the same time, the standardized test did not enjoy universal acclaim. Some scholars in GSQ expressed dissatisfaction that the standardized test revealed only the most "mechanical aspects" of science learning (Kilpatrick 1921: 281). Additional techniques would be necessary to capture the "wider gamut of achievement" in terms of the ideals and habits associated with a scientific attitude of mind (p. 282). Besides science achievement tests, evidence of a scientific attitude could also be displayed through inventories of science-related interests (Lyon 1918), questionnaires of what children collected and why (Hunter 1919), home surveys of fire hazards (Whitman 1920a), and neighborhood surveys of sanitary conditions in local grocery stores and meat shops (Bayer and Clark 1920; Andress and Evans 1925). Embedded in these survey techniques were categories, guidelines, and normalized values that would allow teachers (and pupils themselves) to identify "both the defects and the good features of both home and community," so that those defects could "naturally" give rise to project work and classroom discussions (Whitman 1920b: 30). In this way, what scholars identified as the "danger" of the standardized test, as an overly narrow measure of science achievement (Kilpatrick 1921: 282), could be mitigated through its linkage with other emerging practices of research and pedagogy.
Moreover, the numerical precision of standardized tests afforded a "more secure basis" for distributing pupils into different levels of science education (Ruch 1923: 196). They offered a single measurement that could be used to index past educational experiences, indicate present levels of proficiency, and predict degrees of readiness to access a particular educational objective or pedagogical approach. By claiming to reveal natural differences in a mechanical way, the tests promised an objective basis for segregating "low grade mental types" (Hunter 1920: 382), differentiating science courses into fast- and slow-moving sections (Ruch 1923), and guiding students toward vocations for which they appeared mentally fit (Whitman 1922). The tests' numerical precision made it possible to classify students into a growing range of course levels and sections, whose pacing and pedagogical approaches could be calibrated along a clear, linear (or, recapitulatory) progression.
Through the concrete practices of designing and validating the first standardized science tests, a transformation can be observed. Extant theories of racialized differences in group beliefs, behaviors, and IQ became sedimented in tests of science ability. The tests assessed the degree to which an individual adopted information codified as scientific, adapted to the social norms of the classroom (e.g., spontaneity), and rejected those views labeled as superstitious (e.g., Friday the 13th). By calibrating standardized tests to teachers' assessment of pupils' personalities, specific cultural values gained momentary visibility in methodological discussions before becoming embedded and effaced through statistical procedures of quantification and correlation. What became codified as "science achievement" had less to do with the natural sciences than with social science practices of classifying mental qualities of populations to guide their proper education, Americanization, and sexual differentiation. The data generated by these assessments effectively produced the differences that they purported to reveal. Fabricated distinctions between biological races and sociological types became tethered to and reconfigured as a split between scientific and unscientific minds, one that could now be measured as degrees of science ability or achievement.
The standardized test, coupled with the developmental scale and survey techniques, offered a new mode of producing and sorting difference in the science classroom. In projecting science ability as a set of universal ideals, difference could only be seen as deviation from norms of sound reasoning, correct knowledge, and healthy habits. Standardized science tests made new differences visible, calculable, and governable, ordering individual pupils and subpopulations along an evolutionary trajectory and matching them with distinct levels of instruction. In the presumed symmetry between psychological and civilizational development, science achievement operated as a "dense transfer point for relations of power" (Foucault 1990: 103), a site for ranking individual merit, delineating national belonging, and regulating racialized groups deemed unready for democratic participation.
It is significant that notions and metrics of science achievement were assembled at a particular historical moment, one in which today's concepts of equity and diversity did not exist. As discussed, the explicit goals of early twentieth-century U.S. science education included sorting the leaders from the led, separating out the feeble-minded, teaching girls their place in the domestic sphere, and assimilating immigrant groups (e.g., Hunter 1920). This history matters, because theories and techniques invented in the early 1900s have become blackboxed (Latour 1999) and continue to circulate in modified forms within science classrooms today.
The next section outlines a few shifts that appear to separate current U.S. science education policy reforms from the early 1900s premise of a natural hierarchy of science ability. Gradually, terms like intelligence, science capacity, and aptitude were dropped in favor of achievement. Over decades, equity-oriented concerns arose in terms of the underrepresentation of women and racial minorities in scientific fields (e.g., Crowley 1977). National policy reports began to argue that racial and gender gaps in performance were not signs of an inevitable evolutionary order (as presupposed earlier), but rather evidence of unjust disparities facing groups that had "largely been bypassed in science and mathematics education" (AAAS 1990: xviii). Despite this important shift, many of the early twentieth-century practices that projected a demographic difference in science achievement had, by the mid- to late twentieth century, been reconfigured but not replaced.

The Next Generation of Science Achievement (Early 2000s)
The recent U.S. Next Generation Science Standards (NGSS) formulate equity as a technical problem of "making diversity visible" to differentiate instruction for various demographic groups (NGSS Lead States 2013a: 364). The historical analysis above has indicated that making diversity visible is not a neutral, passive reading of social reality. Embodied in school science are cultural norms that are productive of new distinctions. Setting aside the question of authors' intentions, I focus here on the classificatory techniques that make certain differences appear as objective entities to which teachers must respond, given in current policy as the "learning needs of the nation's increasingly diverse student population" (p. 359). Following Latour (2004), rather than subtracting reality from science achievement, the purpose of historicizing is to add reality back to it by identifying tethers once linked to its historical network of production. This section will highlight a few clear differences in how science achievement appears in the NGSS versus in General Science Quarterly (GSQ), which can be thought of as discarded tethers. It also highlights a few resemblances in how science achievement is seen and ordered. Since this is not an evolutionary history, these resemblances may not be continuities, but rather "partial reinscriptions, modified displacements, and amplified recuperations" (Stoler 2016: 27). The point is to open up this taken-for-granted category of science achievement for further investigation so as to ask what it "authorizes, and what precisely it excludes or forecloses" (Butler 1993: 7, emphasis in original). By recognizing science achievement as a historical artifact, it no longer appears natural or inevitable.
Unlike GSQ, the NGSS reject hereditary notions of mental ability and refute deficit stereotypes by asserting the capability of all students to learn science. "[R]eports continually highlight that when provided with equitable learning opportunities, students from diverse backgrounds are capable of engaging in scientific practices and constructing meaning" (NGSS Lead States 2013a: 359). The absolutist language of incapacity is out. This is also the case in international assessment programs. Science literacy, in contrast with older notions of science ability, talent, or potential, is viewed "not as an attribute that a student has or does not have, but as a set of knowledge and skills that can be acquired to a greater or lesser extent" (OECD 2016: 1). These statements suggest that the field has dismissed past definitions of science achievement as one fixed trait in favor of science achievement as a malleable, multidimensional set of understandings and practices that everyone can (and therefore, should) be supported to acquire.
Nevertheless, current notions of science achievement still produce distinctions. Although today's science assessments may be understood as indexing multiple dimensions of science-related knowledge, practices, and dispositions, they continue to register proficiency in science as a single number. Similar to GSQ, the NGSS take standardized tests as objective measures of something called scientific thinking. Psychometric techniques transfigure the heterogeneity of the scientific disciplines into "science" as a universalized quality of mind, one that differs in degree and appears unevenly distributed in the population. Achievement data array individuals and demographic groups onto a numerical scale assumed to reveal relative amounts of scientific knowledge, conceptual understanding, and reasoning. Statistical techniques, combined with populational reasoning, make it possible to identify a person or a group of people as high-achieving or as underperforming in science. Science achievement continues to be posited as a feature of the mind that operates as a "potential site of unity" (Baker 2013: 38), one that divides and abjects some as not-yet-qualified for inclusion within that unity.
In spite of the repudiation of deficit thinking, what persists is talk of relative differences in students' current capability, or readiness, to access a certain level of cognitive demand:
[A]chievement gaps in science and other key academic indicators among demographic subgroups have persisted … As these new standards are cognitively demanding, teachers must make instructional shifts to enable all students to be college and career ready … [and] to ensure that the NGSS are accessible to all students. (NGSS Lead States 2013a: 359)
In name, the focus has shifted from a problem within the child (e.g., cognitive deficit) to a problem with the curriculum (e.g., cognitive demand) (see Brickhouse 1994). Yet because raising cognitive demand is presented as key to national competitiveness, the problem is not ultimately located in the curriculum, but in the mismatch posited between subpopulations of students and the (necessarily) demanding curriculum.
Moreover, the historical accumulation of data points reinscribes distinctions between non-dominant groups that "traditionally struggled to demonstrate mastery [on] less demanding standards" versus "those who can and should surpass the NGSS" (NGSS Lead States 2013a: 359). Trends in achievement data make it appear sensible to promote a "two-pronged approach" to K-12 science education (p. 370), where "low-performing at-risk groups" must be elevated to the baseline of the standards through pedagogies that make science more accessible and "concrete" (NGSS Lead States 2013c: 6), while "our future innovators" need access to science instruction that is more advanced and "abstract" (NGSS Lead States 2013d: 2). Data fabricate a division between certain racial and ethnic groups as requiring interventions to meet the standards, and their unmarked peers as deserving opportunities to exceed this baseline. In effect, rather than critiquing tracking as an equity problem, the stratification of science coursework becomes naturalized as a reasonable response to the distinct achievement, or readiness, ascribed to racialized groups.
As in GSQ, the NGSS operate within a developmental logic that divides children and curricula into different kinds ordered along a hierarchical scale. The case studies that accompany the NGSS depict "gifted and talented students" as above grade level and "economically disadvantaged students" as below grade level (NGSS Lead States 2013b, d). Being located above or below on this scale is then linked to different curricular content and pedagogies. The gifted and talented are matched with more abstract, open-ended, and complex pedagogies (NGSS Lead States 2013d). In contrast, economically disadvantaged students and major racial and ethnic groups are said to require pedagogies that connect science to the physical dimensions and tangible problems of their local community (NGSS Lead States 2013b, c).
Of course, there are crucial differences between the developmental scales in GSQ and the NGSS. Rather than presuming racial categories to differ by nature as in recapitulation theory, distinctions now appear through numerical data taken to indicate that not all are ready for the same level of instruction. Moreover, the politics have changed. Whereas GSQ's locally-focused project method was discussed as Americanizing the unscientific masses, the NGSS' place-based, project-based approach is offered as empowering for students from historically underserved groups. Yet, in this effort to empower, local and applied aspects of science are nevertheless positioned as compensatory strategies for making science accessible to traditionally underperforming groups, and as contrasting with the pedagogies designated for children labeled as gifted and talented. A danger is that pedagogies intended to close achievement gaps may inadvertently reiterate a century-old pattern in the United States: treating those racialized as non-White as not-yet-ready for the more "abstract" instruction designated for those seen as "potential scientists" or "future innovators."
Another important difference from the past is the repudiation of claims of cultural superiority linked to evolutionary stages of civilization that were widely taken for granted in the early twentieth-century U.S. social sciences. Whereas GSQ scholars relied on sociological studies to identify the superstitions of less evolved groups, the NGSS reject the tendency to focus on the deficits of "non-dominant" groups and instead call for valuing these students' diverse backgrounds. Nonetheless, because the standards conceptualize science as a universal set of concepts and practices derived from the disciplines, not all backgrounds become equally valued. Specifically, the NGSS contrast the "academic backgrounds" of dominant groups with the "cultural knowledge" of non-dominant groups (NGSS Lead States 2013a: 359), where only the latter must be filtered for connections and disconnections with science (p. 364). Here, the standards have already stabilized the science from which (dis)connections can be seen, and elevated the backgrounds of dominant groups as more closely corresponding with a universal science and thus as rising above culture (i.e., as academic rather than cultural). Despite calls to value cultural diversity, the NGSS articulate the purpose of school science as supporting non-dominant groups to "transition from their naïve conceptions of the world to more scientifically based conceptions" (p. 363). As Brown (2006) observes, where once culture was elevated as the unique property of civilized societies (versus primitive groups cast as closer to nature), today those marked as "cultural" are typically those populations positioned as furthest behind or as yet to enter the global knowledge economy. This dangerous logic presumes that while a cosmopolitan "we" may have culture, culture has "them" (p. 151).
The past is not repeated in the present, but new assemblies of tools and theories continue to codify science as a universal ideal that generates cultural distinctions, dividing students and the science instruction they appear to demand. At issue is how it became possible to conceive of human beings as different types of thinkers, of students as more or less ready for a particular "level" of thought, and of science instruction as existing in discrete but developmental forms (i.e., concrete to abstract) that correspond to these types of minds. These notions are not natural, but emerge from a network of heterogeneous theories and techniques, and the epistemic, political, and moral principles they carry. To reduce all of these elements to a psychologized problem of deficit thinking within the mind of the teacher would be to obscure how the ordering strategy functions. As comparative distinctions, science ability and achievement depend on the production of abnormal Others as lacking in ability or behind in achievement.
Consequently, a paradox appears in current efforts to promote equity through the paradigm of achievement. As measures of science achievement interact with developmental trajectories of school science, they fabricate different "kinds" of students and match them with hierarchical levels of science education. Passed down through the decades of the twentieth century, the psychological construct of science achievement has already expunged the dynamic variation within and between the sciences, as well as erased any obvious trace of the sociological taxonomies, racial and religious hierarchies, and cultural normativities that once assembled to stabilize science achievement as both a criterion of the ideal American citizen and a universal standard of comparison. Numerical distance from that fabricated and universalized ideal would become one of the primary differences inscribed and reinscribed in science classrooms, research studies, and policy reports; in effect, it would become the "diversity" made most visible.
This chapter highlights how diversity is not simply the recognition of representational categories of people. Rather, there is a need to examine more closely how central characteristics of U.S. science education (its goal of fostering independent thought and individual agency through scientific methods, its egalitarian emphasis on making science relevant to populations' varied needs, and its pragmatic focus on designing solutions to local community problems) emerged in relation to concerns about the nature of the child, reason, and democracy that were entangled historically with racializing distinctions. This analytical approach calls attention to the limits of current discourses of "making diversity visible" in understanding the paradoxes of inclusion and exclusion in schooling today.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.