Which factors contribute to standardized test scores for prospective general science teachers: An analysis of the Praxis® General Science Content Knowledge Test.

This study aims to contribute to the knowledge base about the general science certified teaching population. Studies have shown that science teacher content knowledge is among the most foundational components of effective teaching and learning. Our study analyzes the Praxis® General Science Content Knowledge Test (GSCKT) from May 2006 to June 2016. We present one of the largest datasets, comprising 28,688 general science teacher candidates, in order to provide information about their demonstrated general science content knowledge. Our results can be used to design targeted professional learning experiences for pre- and inservice teachers with the objective of strengthening teacher recruitment and retention efforts. Findings from this study are particularly useful when planning topic-specific professional learning for pre- and inservice general science (GS) teachers, and they address the following research questions: (1) How have personal and professional characteristics correlated with Praxis® GSCKT performance in the last decade? (2) How have examinees performed as a whole in each category on the Praxis® GSCKT? (3) Which personal and/or professional characteristics have been associated with examinee performance in each category? What have been the relative category performances of examinees of varying characteristics? Examinee performance at the category level was analyzed through a four-part process: (1) percent correct; (2) regression; (3) ANOVA; and (4) scaled points lost. Our findings revealed that examinees demonstrated the strongest performance in the topics assessing Life Science and identified Earth & Space Science as a topic in need of support. Across categories, we found differences in achievement associated with undergraduate major, gender, and ethnicity. Test-takers with STEM majors consistently lost fewer points than their out-of-field counterparts, men outperformed women, and White test-takers lost fewer scaled points than Black and Hispanic candidates. Our recommendations include reviewing our results for alignment with state standards in order to plan comprehensive content knowledge development that can anchor focused support on the topics where test-takers tend to demonstrate the lowest proficiency.


Introduction
Middle school science experiences play a pivotal role in forming students' STEM identities as they progress through academic and career trajectories. Their teachers are unique in that they require broader CK across topics than their high school counterparts, who primarily focus on one discipline. This means that they must be able to make abstract science concepts relatable to students with diverse learning needs (Mesa & Pringle, 2019). At the center of science teacher research are efforts to improve student learning. For this to happen, teachers need to have an understanding of the content they teach (Sadler et al., 2013). Teacher licensure testing is designed to ensure a baseline quality of teacher content knowledge (Goldhaber & Hansen, 2010).
Due to a national shortage of certified science teachers in the United States, many states have been forced to adopt policies that allow teaching assignments across disciplines without requiring teachers to demonstrate mastery in a specific field. Although the majority of states require discipline-specific endorsements, many states continue to allow secondary science teachers to teach across disciplines with a general science certification (National Council on Teacher Quality, 2010). An area of particular concern is that new science teachers are assigned to teach out-of-field more frequently than those with more experience, which ultimately negatively impacts development and leads to attrition (Nixon, Luft, & Ross, 2017; Shah et al., 2019). Understanding that teachers are primarily responsible for student learning outcomes, it is imperative that teachers are supported in content knowledge (CK) development. In this study, we focus on the CK of prospective general science teachers through analysis of the Praxis General Science Content Knowledge Test (GSCKT) with the objective of informing professional learning (PL) experiences.

Background & Literature Review
Licensure testing is common, and nearly all states include it as part of the requirements for teaching in public schools. Across the United States, the Praxis® II content knowledge tests are most commonly administered to assess the knowledge and competencies of entry-level teachers (Goldhaber & Hansen, 2010). Each state determines its own passing score. This means that although the scaled score earned on each test is considered the same across states, the passing score may differ. In order to minimize the impact of prior experience with the assessment, multiple test forms are offered throughout the year, and questions vary between test forms (ETS, 2018). The GSCKT is a 2.5-hour computer-delivered selected-response examination that includes integration of basic topics in chemistry, physics, life science, and Earth science.
As part of the commitment to offer high-quality tests with minimal bias, ETS evaluates the assessment using differential item functioning (DIF). DIF allows test developers to determine whether people in different groups (typically gender or race) perform differently on test items. Groups of people are matched using the content and skills scores from the test or section of the test. DIF occurs when people within matched groups perform differently on test items. Each question is assigned a category. Category A indicates questions with little or no difference between matched groups. Category B contains questions with small to moderate differences, and Category C indicates questions with the greatest differences.
Test developers select questions from Category A whenever possible. If there are not enough Category A questions, Category B questions are used, with preference given to those with the smallest DIF values. Developers only use Category C questions if they are considered essential and must document the reasons why those questions are selected (Zieky, 2003). This study included a DIF analysis following a methodology similar to that of ETS (Zieky, 2003). We applied a false discovery rate correction (Benjamini & Hochberg, 1995), which limits the expected proportion of false positives across multiple hypothesis tests while preserving statistical power. Subjects were grouped into quartiles, and a logistic regression that included gender and race was run. Our analysis does not account for sample sizes. Results from these analyses are reported in Supplemental Materials Tables B and C.
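To make the false discovery rate step concrete, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. It is an illustration only: the function name, the significance level of 0.05, and the p-values are hypothetical and are not taken from the study's DIF analysis.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null hypothesis is rejected.

    Standard Benjamini-Hochberg step-up rule: find the largest rank k such that
    p_(k) <= (k / m) * alpha, then reject the k smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for eight test items (not from the study).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
flags = benjamini_hochberg(example_p, alpha=0.05)
```

In this hypothetical run only the two smallest p-values fall under their rank-based thresholds, so only those two items would be flagged.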
Teacher learning experiences are diverse and include everything from formal topic-specific seminars to informal collegial conversations in school buildings. They begin in teacher preparation programs and continue once teachers enter the classroom (Desimone, 2009). Content focus, paired with how students learn that content, is considered among the top characteristics of effective teacher learning experiences influencing student achievement (Desimone, 2009; Sadler et al., 2013). Science instruction includes learning new ideas while unlearning old ones; therefore, knowledge of and ability to reveal common misconceptions is central to designing quality learning experiences. Because middle school science teachers are frequently required to teach across disciplines, it is important to leverage teacher training and PL opportunities that address gaps in teachers' knowledge (Sadler et al., 2013).

Research Questions
This study investigates the following research questions: (1) How have personal and professional characteristics correlated with Praxis® General Science: Content Knowledge Test performance in the last decade? (2) How have examinees performed as a whole in each category on the Praxis General Science: Content Knowledge Test? (3) Which personal and/or professional characteristics have been associated with examinee performance in each category? What have been the relative category performances of examinees of varying characteristics?

Conceptual Framework
It is our assertion that strong foundational CK has a positive impact on licensure examination performance. Figure 1 presents a model depicting the relationship between science teacher professional knowledge and skills and student learning. Teacher knowledge has a direct impact on instructional design and is considered among the most influential factors contributing to student achievement (Goldhaber & Hansen, 2010; Keller, Neumann, & Fischer, 2017; Sadler et al., 2013). Teachers' self-efficacy and multiple identities (personal and professional characteristics) associated with unique backgrounds influence development as they grow in their practice (Cochran-Smith, 2012; Lotter et al., 2016). Because of this, science instruction is impacted by the beliefs teachers hold about science teaching (Lotter et al., 2016). Strong content knowledge, in-field or out-of-field placement, and access to mentors and learning communities are among the factors contributing to recruitment and retention of STEM teachers (Cochran-Smith, 2012).
Science teacher identity is dynamic and complex. It is a socially constructed, ongoing process that describes the teacher within a personal and/or professional context (Beauchamp & Thomas, 2009; Polizzi et al., 2021). Student populations are becoming increasingly diverse, yet science learning environments tend to be Eurocentric (Cheruvu et al., 2015; Mensah & Jackson, 2018). This results in overrepresentation of White middle-class candidates with monocultural perspectives within the STEM fields. Understanding that preservice teachers of color experience burdens including subliminal racism or impostor syndrome, it is important to consider recruitment, retention, and professional learning efforts that foster development of underrepresented science teacher candidates (Cheruvu et al., 2015; Mensah & Jackson, 2018).
Instructional quality, when thought of as a continuum, is influenced by licensure policies and practices.
Learning to teach is dependent upon both CK and pedagogical content knowledge (PCK). CK differs from PCK in that PCK is specific to the knowledge possessed by teachers that is used to transfer content to students and incorporates an understanding of how students learn the content and skills specific to the discipline (Ball, 2000; Minor et al., 2016; Shulman, 1986). Teacher CK impacts instructional design and is central to PCK. It is considered among the most influential factors contributing to student achievement (Goldhaber & Hansen, 2010; Keller, Neumann, & Fischer, 2017). Much like their students, science teachers also enter the classroom with misconceptions about the content they teach (Kartal, Öztürk, & Yalvaç, 2011).
High-quality, differentiated PL is central to improving instruction, organizing curriculum, facilitating clear communication of ideas, and creating 3D science learning experiences. Within the context of the Next Generation Science Standards, this incorporates science & engineering practices, disciplinary core ideas, and crosscutting concepts (National Research Council, 2012). An integral feature of a PD theory of action is how to facilitate transfer of new ideas into systems of practice inside the classroom when knowledge and skills gained through PD are commonly acquired outside of the classroom (Kennedy, 2016). Understanding what teachers need to know, how they have to know it, and how to help them learn it (Ball, 2000) has the potential to increase CK, PCK, and overall confidence in teaching. Because of this, teachers' initial CK must be taken into consideration when planning for instructional support (Minor et al., 2016).
We present findings about examinee performance on the Praxis GSCKT as a whole and at the category level. In order to gain insight into examinee performance on the assessment, we followed a four-part methodology similar to that of Ndembera et al. (2022), which is reiterated below: (1) percent correct; (2) regression; (3) ANOVA; and (4) scaled points lost per category.

Study Sample
The data analyzed in this study included examinees who sat for the Praxis® GSCKT from 2006 to 2016. Because test-takers may take the exam more than once, the data were restricted to each examinee's highest score, resulting in a study population of 28,688. Examinee data included self-reported demographic characteristics; selected demographics are presented in Table 1, and full descriptive statistics are found in Supplemental Table A.
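The restriction to each examinee's highest score can be sketched as follows. This is a minimal illustration under assumed data structures: the field names `examinee_id` and `scaled_score` and the sample records are hypothetical, and the layout of the actual ETS dataset is not described in the source.

```python
def keep_highest_attempt(attempts):
    """Keep one record per examinee: the attempt with the highest scaled score.

    Ties keep the earliest attempt encountered. Field names are hypothetical.
    """
    best = {}
    for attempt in attempts:
        examinee = attempt["examinee_id"]
        current = best.get(examinee)
        if current is None or attempt["scaled_score"] > current["scaled_score"]:
            best[examinee] = attempt
    return list(best.values())


# Hypothetical attempts: examinee "A" sat for the test twice.
attempts = [
    {"examinee_id": "A", "scaled_score": 148},
    {"examinee_id": "B", "scaled_score": 171},
    {"examinee_id": "A", "scaled_score": 160},
]
study_sample = keep_highest_attempt(attempts)
```

Here `study_sample` holds two records, with examinee "A" represented by the second, higher-scoring attempt.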
After exclusions were applied, the majority of test-takers were female, comprising 60.1% of the testing population as compared to 39.5% male. Of those who responded, White test-takers comprised 77.7% of the study population, while Black and Hispanic test-takers represented 8.3% and 2.5% respectively. Biology undergraduate majors represented the largest group, comprising 29.2% of the testing population, followed by other non-STEM (24.8%), other STEM (11.8%), physical science (9.4%), and Earth & space science (3.8%). Of the testing population, 77% held undergraduate grade point averages (GPA) above 3.0. Regarding teaching experience, 59.3% reported that they had not yet entered the teaching field, 17.4% had completed more than 3 years of teaching, and 17.3% had 1-3 years of teaching. It can be inferred that those already teaching registered for the assessment as part of an additional certification.
For each category, the three variables explaining the greatest η² were analyzed in order to determine an estimation of scaled points lost.

Estimation of Scaled Points Lost Per Category
Scaled points lost were calculated in order to determine examinees' relative performance and provide information on whether there were disciplinary content areas in need of support, using the equation:
$$\text{Scaled points lost}_{C1} = m \cdot (\text{total number of questions}_{C1}) - m \cdot (\text{number of correctly answered questions}_{C1})$$
where $m$ was equal to the slope between scaled score and total questions correct on the exam.
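As a worked illustration of the scaled-points-lost equation, the sketch below estimates the slope m and applies it to one category. The source does not state how the slope was obtained, so the ordinary least-squares fit used here is an assumption, and all function names and numbers are hypothetical.

```python
def least_squares_slope(total_correct, scaled_scores):
    """Slope of scaled score regressed on total questions correct.

    Assumption: an ordinary least-squares fit; the source only says that m is
    the slope between scaled score and total questions correct.
    """
    n = len(total_correct)
    mean_x = sum(total_correct) / n
    mean_y = sum(scaled_scores) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(total_correct, scaled_scores))
    sxx = sum((x - mean_x) ** 2 for x in total_correct)
    return sxy / sxx


def scaled_points_lost(m, total_questions, correct_questions):
    """Scaled points lost in one category: m * total - m * correct."""
    return m * total_questions - m * correct_questions


# Hypothetical exam-level data: questions correct and matching scaled scores.
m = least_squares_slope([60, 80, 100], [130, 150, 170])

# Hypothetical category with 27 questions, 18 answered correctly.
lost = scaled_points_lost(m, total_questions=27, correct_questions=18)
```

With these made-up values the slope is 1 scaled point per question, so the nine missed questions correspond to nine scaled points lost in that category.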

Stepwise Model
The stepwise linear regression yielded several statistically significant relationships between reported personal and professional characteristics and performance on the GSCKT. Undergraduate major, ethnicity, and gender were identified by the regression model (Table 2) as the top demographic variables associated with performance. The F values and associated p < .0001 values confirm that the independent variables in the model reliably predict the dependent variable, test-taker performance. Reported R² and η² values can be expressed as percentages and provide information about the proportion of variance in the scaled score accounted for in the sample. Results are presented as an average because there was little change over time in undergraduate major, ethnicity, and gender over the decade studied. Whiskers indicate the variability of the mean scaled scores, and points outside the whiskers represent outliers.

Scaled Score
Table 2 and Fig. 2 present results from the analysis of the Praxis® GSCKT as a whole and offer additional context for research question 1. Undergraduate major (Fig. 2) explained 11% of the overall variance (Table 2) in the GSCKT. Test-takers with physical science degrees demonstrated the highest performance on the assessment, followed by Earth & space science, biology, and other STEM majors, with average scaled scores of 175, 171, 167, and 166 respectively. Other STEM included majors such as engineering, mathematics, and computer science. Non-STEM majors demonstrated the lowest performance on the assessment. Ethnicity (Fig. 2) explained 7% of the overall variance (Table 2) in the assessment. Without considering other interacting factors, there are differences in achievement between White test-takers and Black or Hispanic test-takers. The greatest variability was found among Black and Hispanic test-takers, with average scaled scores of 144 and 158 respectively. Over the decade studied, we found that the mean scaled scores of White examinees exceeded those of Black examinees by 20 scaled points and those of Hispanic examinees by 5 scaled points. In order to determine the extent to which the assessment serves as a barrier to the teaching field, additional information is needed about the states in which Black examinees are likely to test. Although females outnumber males in the assessment sample, males earned an average of 8 scaled points more than their female counterparts. Gender explained 6% of the overall variance (Table 2) in the GSCKT.

Category
Table 2 presents results from the category analysis of the Praxis® GSCKT. To provide additional context for research question 2, our results and analysis focus on the physical science, life science, and Earth & space science categories because those most closely align with the undergraduate majors represented. Life science and Earth & space science topics each comprised 20% of the exam. Estimated percent correct for life science and Earth & space science was 75% and 67% respectively. Physical science consists of questions assessing chemistry and physics topics. While it makes up the largest portion of the exam at 38%, it had the lowest estimated percent correct (64%).
The ANOVA model presented in Table 3 was developed as part of research question 3. Our correlational analysis of the stepwise linear regression at the category level revealed several statistically significant relationships. Table 3 presents the examinee characteristics most strongly correlated with category performance on the Praxis® GSCKT. For the three major categories assessed, the F values (Table 3) and associated p < .0001 values confirm that the demographic variables represented within the model account for a significant portion of the variability at the category level. Reported η² values provide information about the proportion of variance in category score accounted for in the sample. Strong relationships between reported demographic variables and test-taker performance at the category level are represented in the large η² effect sizes seen in Table 3. Comparison of means reveals differences in achievement most consistently associated with undergraduate major, ethnicity, and gender across the three categories presented in the study. These data were further analyzed at the category level to make comparisons in scaled points lost and are presented graphically in Fig. 3.
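For readers less familiar with η², the sketch below computes it for a one-way ANOVA as the between-group sum of squares divided by the total sum of squares, and labels the result using the conventional thresholds quoted with Table 3 (small 0.01, moderate 0.06, large 0.14). The function names, the "below small" label, and the data are hypothetical and are not drawn from the study.

```python
def eta_squared(groups):
    """One-way ANOVA effect size: SS_between / SS_total.

    `groups` maps a group label to a list of scores.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total


def effect_size_label(eta2):
    """Label eta squared with the conventional cutoffs 0.01 / 0.06 / 0.14."""
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "moderate"
    if eta2 >= 0.01:
        return "small"
    return "below small"  # hypothetical label for values under 0.01


# Hypothetical category scores for two groups (not from the study).
example = {"group_1": [1, 2, 3], "group_2": [4, 5, 6]}
eta2 = eta_squared(example)
label = effect_size_label(eta2)
```

Multiplying η² by 100 gives the percentage of variance explained, which is how the values are reported in the text.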
The three characteristics most strongly associated with performance in the category assessing physical science (Fig. 3) were undergraduate major, ethnicity, and gender. They explained 12.1%, 4.4%, and 3.3% of the total variance respectively (Table 3). Physical science majors lost the fewest scaled points (11.9) and outperformed non-STEM majors by 12 scaled points. White test-takers lost an average of 18.6 scaled points, Hispanic test-takers lost an average of 19.9 scaled points, and Black test-takers lost an average of 25.8 scaled points. Male test-takers lost 3 fewer scaled points than female test-takers.
Undergraduate major, ethnicity, and undergraduate GPA were most strongly associated with performance in the life science category of the assessment (Fig. 3), explaining 11.7%, 3.5%, and 1.7% of the total variance respectively (Table 3). Biology (4.8) and physical science (7.0) majors lost the fewest scaled points. White test-takers lost an average of 6.2 scaled points, Hispanic test-takers lost an average of 6.5 scaled points, and Black test-takers lost an average of 9.4 scaled points. Test-takers with undergraduate GPAs of 3.5-4.0 lost an average of 5.8 scaled points, outperforming those with undergraduate GPAs below 2.99 (7.5 scaled points).
In the category assessing Earth & space science (ESS) (Fig. 3), undergraduate major, ethnicity, and gender explained 13.8% of the overall variance (Table 3), with undergraduate major accounting for 4.7% of the overall variance. ESS (4.9) and physical science (7.9) majors lost the fewest scaled points. In alignment with the other categories and the test as a whole, non-STEM majors lost the most points (10.0). White test-takers lost an average of 8.3 scaled points, Hispanic test-takers lost an average of 10.1 scaled points, and Black test-takers lost an average of 12.8 scaled points. Males and females performed similarly, losing an average of 7.6 and 9.6 scaled points respectively.

Discussion & Implications for Practice
While general science teachers earn degrees to specialize in one science content area, they are responsible for demonstrating foundational knowledge across disciplines. Understanding how science teachers' knowledge progresses over time is essential as professional developers design and facilitate targeted learning experiences (Schneider & Plasman, 2011). Our findings revealed differences in performance on the assessment as a whole and within sub-discipline categories, most commonly associated with both professional characteristics, including undergraduate major and undergraduate GPA, and personal characteristics such as gender and ethnicity. The estimated percent correct (Table 3) was lowest for the physical science category. This category combines chemistry and physics topics, thus warranting details about the questions and test-takers themselves in order to offer context about performance.
Ethnic representation of the testing population did not match the overall makeup of the United States within the testing window (US Census Bureau, 2018): those who identify as Black or Hispanic make up 13.4% and 18.3% of the US population respectively, but only 8.3% and 2.5% of the population studied.
Across physical science, life science, and Earth & space science categories, test-takers demonstrated strongest performance in the category that best aligned with their undergraduate major (Fig. 3).
Examinees across disciplines lost the fewest scaled points and performed most similarly in the category assessing life science topics.
As seen in previous research (Ndembera et

Recruitment & Retention
With the growing science teacher shortage and high teacher turnover, pre- and inservice teachers must be supported in order to promote confidence in teaching. Although many undergraduates enrolled in STEM majors may not have considered education as a career option, it is critical to expose them to the field of teaching early in their post-secondary educational program. Programs are encouraged to offer inquiry-based learning through early field experiences (Taskin-Can, 2011; Luft et al., 2011). Identification of students with an affinity towards STEM disciplines as early as high school can help strengthen these efforts. Placing preservice teachers with strong mentors during field experience will strengthen recruitment efforts and facilitate development of effective science educators (Dailey, Bunn, & Cotabish, 2015). In this way, they will be more likely to include coursework that aligns with state certification requirements as they progress in their studies (Dailey, Bunn, & Cotabish, 2015). We assert that these early exposures will also facilitate diversification of the field.

Professional Learning
The GSCKT integrates basic topics in chemistry, physics, life science, and Earth science in alignment with the National Science Education Standards and National Science Teacher Association standards. Although content domains are determined by practitioners in each field (ETS, 2018), overall GS content categories include (1) Science Methodology, Techniques, and History; (2) Physical Science; (3) Life Science; (4) Earth and Space Science; and (5) Science, Technology, and Society (ETS, 2021).

Figure 1 Model

Table 1
Descriptive Statistics: Detailed list of personal and professional characteristics of General Science: Content Knowledge test-takers from 2006-2016. See Supplemental Materials Table A for a complete set of reported demographic variables.

Estimation of Categorical Percent Correct
Information on the highest number of points each test-taker earned per category was provided in the ETS dataset. Because it did not include the number of test items for each category, we used the highest reported number of items to represent the total number of questions. The following equation was used to estimate the categorical percentage score for each examinee:
$$\text{Percent correct}_{C1} = \frac{\text{number of correctly answered questions}_{C1}}{\text{total number of questions}_{C1}} \times 100$$

Regression Model Selection
A stepwise linear regression was performed on the whole data set using a 10-fold cross-validation procedure in order to estimate associations between self-reported test-taker characteristics and category performance on the Praxis® GSCKT. This allowed us to determine which groups of teachers would be most likely to be in need of CK development. Examinee characteristics were categorized as personal (i.e., gender, age, ethnicity) or professional (i.e., undergraduate major, graduate major, years in teaching, undergraduate GPA). Demographic variables identified by the regression model are presented in Table 2.
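The percent-correct estimation described above can be sketched as follows, assuming that the highest number of points reported by any examinee in a category stands in for that category's total number of questions. The function name, examinee identifiers, and point values are hypothetical.

```python
def estimate_percent_correct(points_by_examinee):
    """Estimate each examinee's percent correct within one category.

    Assumption from the text: the highest reported number of points in the
    category represents the total number of questions in that category.
    """
    estimated_total = max(points_by_examinee.values())
    return {
        examinee: 100.0 * points / estimated_total
        for examinee, points in points_by_examinee.items()
    }


# Hypothetical points earned in one category by three examinees.
category_points = {"e1": 18, "e2": 24, "e3": 12}
percent_correct = estimate_percent_correct(category_points)
```

Because the denominator is an observed maximum rather than a known item count, the resulting percentages are estimates and would be overstated if no examinee answered every item in the category correctly.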

Table 2
Top examinee characteristics most strongly associated with performance on the Praxis® General Science: Content Knowledge Test.

Table 3
Results from category analysis of the Praxis® General Science: Content Knowledge Test. Total and partial η² results of one-way ANOVA analyses for examinee performance in each category of the Praxis General Science Subject Assessment are presented here. Supplemental Materials Table B presents the full list of demographic variables for each category. The general rules for η² ANOVA effect sizes are: small, 0.01; moderate, 0.06; large, 0.14. * corresponds to corrected significance levels of < .0001. ** Approximate percentage of examination (ETS, 2020).
As seen in previous research (Ndembera et al., 2021; Ndembera et al., 2022; Shah et al., 2018), males outperformed females on the assessment as a whole and across categories. Although Black-White and Hispanic-White ethnicity DIF was relatively low for Category C questions per test form (Supplemental Table C), similar trends were identified in regard to ethnicity, with White test-takers scoring above their Black and Hispanic counterparts.