Relationships Between the Way Students Are Assessed in Science Classrooms and Science Achievement Across Canada

  • Man-Wai Chu
  • Karen Fung


Canadian students experience many different assessments throughout their schooling (O’Connor 2011). Using a variety of assessment types, item formats, and science-based performance tasks in the classroom offers many benefits for measuring the many dimensions of science education. Although using a variety of assessments is beneficial, it is unclear exactly which types, formats, and tasks are used in Canadian science classrooms. Additionally, since assessments are often administered to help improve student learning, this study identified assessments that may improve student learning as measured by achievement scores on a standardized test. Secondary analyses were performed on students’ and teachers’ responses to the questionnaire items in the Pan-Canadian Assessment Program. The results of the hierarchical linear modeling analyses indicated that both students and teachers identified teacher-developed classroom tests or quizzes as the most common type of assessment used. Although this ranking was similar across the country, statistically significant differences among the provinces in the assessments used in science classrooms were also identified. The investigation of which assessment best predicted student achievement indicated that minds-on science performance-based tasks significantly explained 4.21% of the variance in student scores. However, mixed results were observed between student and teacher responses regarding tasks that required students to choose their own investigation and design their own experiment or investigation. Additionally, students whose teachers reported conducting more demonstrations of an experiment or investigation obtained lower scores.
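A figure such as "explained 4.21% of the variance" in a hierarchical linear model is typically computed as the proportional reduction in the variance components when a predictor is added to a null (intercept-only) model, the pseudo-R² described by Raudenbush and Bryk (2002). The sketch below illustrates only that computation; the variance components used are hypothetical, not values from this study.

```python
# Hedged illustration of a multilevel pseudo-R^2 (Raudenbush & Bryk, 2002):
# the proportional reduction in total variance (within-classroom +
# between-classroom) after adding a predictor to the null model.
# All numeric values below are hypothetical, for illustration only.

def variance_explained(null_within, null_between, full_within, full_between):
    """Proportion of total variance explained by the added predictor(s)."""
    total_null = null_within + null_between
    total_full = full_within + full_between
    return (total_null - total_full) / total_null

# Hypothetical variance components from a null model and a model that
# adds a student-level predictor (e.g., minds-on task frequency):
r2 = variance_explained(null_within=80.0, null_between=20.0,
                        full_within=77.0, full_between=18.79)
print(round(r2 * 100, 2))  # → 4.21
```

In practice the four variance components would come from two fitted mixed models (e.g., a random-intercept model before and after adding the predictor); the arithmetic above is the only step shown here.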


Keywords: Classroom assessments · Assessment types · Item formats · Science-based performance tasks · Pan-Canadian Assessment Program



Preparation of this paper was supported by the Council of Ministers of Education, Canada (CMEC). CMEC encourages researchers to freely express their professional judgment. This paper, therefore, does not necessarily represent the positions or the policies of CMEC, and no official endorsement should be inferred.


References

  1. Abrahams, I., & Millar, R. (2008). Does practical work really work? A study of the effectiveness of practical work as a teaching and learning method in school science. International Journal of Science Education, 30(14), 1945–1969.
  2. American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME]. (2014). Standards for educational and psychological testing. Washington, DC: Author.
  3. Barab, S. A., Gresalfi, M. S., & Ingram-Goble, A. (2010). Transformational play: using games to position person, content, and context. Educational Researcher, 39(7), 525–536.
  4. Bennett, R. E., & Gitomer, D. H. (2009). Transforming K-12 assessment: integrating accountability testing, formative assessment and professional support. In C. Wyatt-Smith & J. J. Cumming (Eds.), Educational assessment in the 21st century (pp. 43–61). New York, NY: Springer.
  5. Bennett, R. E., Persky, H., Weiss, A. R., & Jenkins, F. (2007). Problem solving in technology-rich environments: a report from the NAEP technology-based assessment project (NCES 2007–466). U.S. Department of Education. Washington, DC: National Center for Education Statistics.
  6. Black, P., & Wiliam, D. (1998). Inside the black box: raising standards through classroom assessment. London: School of Education, King’s College.
  7. Chalmers, A. F. (1999). What is this thing called science? Indianapolis, IN: Hackett Publishing Company.
  8. Chu, M.-W. (2017, March). Using computer simulated science laboratories: a test of pre-laboratory activities with the learning error and formative feedback model. Unpublished doctoral dissertation, University of Alberta, Edmonton.
  9. Council of Ministers of Education, Canada [CMEC]. (2013a). Pan-Canadian Assessment Program PCAP—2013 student questionnaire. Toronto: Author.
  10. Council of Ministers of Education, Canada [CMEC]. (2013b). Pan-Canadian Assessment Program PCAP—2013 teacher questionnaire. Toronto: Author.
  11. Council of Ministers of Education, Canada [CMEC]. (2014). Pan-Canadian Assessment Program 2013: report on the pan-Canadian assessment of science, reading, and mathematics. Toronto: Author.
  12. Duncan, & Noonan. (2007). Factors affecting teachers’ grading and assessment practices. Alberta Journal of Educational Research, 53(1), 1–21.
  13. Frontline. (2014). The testing industry’s big four: profiles of the four companies that dominate the business of making and scoring standardized achievement tests.
  14. Fung, K., & Chu, M.-W. (2015). Fairness of standardized assessments: discrepancy between provincial and territorial results. Journal of Contemporary Issues in Education, 10(1), 2–24.
  15. Gobert, J., Sao Pedro, M., Raziuddin, J., & Baker, R. (2013). From log files to assessment metrics for science inquiry using educational data mining. Journal of the Learning Sciences, 22(4), 521–563.
  16. Hodson, D. (1996). Laboratory work as scientific method: three decades of confusion and distortion. Journal of Curriculum Studies, 28(2), 115–135.
  17. Hodson, D. (2003). Time for action: science education for an alternative future. International Journal of Science Education, 25(6), 645–670.
  18. Hofstein, A., & Lunetta, V. N. (2003). The laboratory in science education: foundations for the twenty-first century. Science Education, 88(1), 28–54.
  19. Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: American Council on Education/Praeger.
  20. Klinger, D. A., & Saab, H. (2012). Educational leadership in the context of low-stakes accountability: the Canadian perspective. In L. Volante (Ed.), School leadership in the context of standard-based reform: international perspective (pp. 69–94). New York, NY: Springer Science + Business Media.
  21. Klinger, D., DeLuca, C., & Miller, T. (2008). The evolving culture of large-scale assessments in Canadian education. Canadian Journal of Educational Administration and Policy, 76, 1–34.
  22. Krathwohl, D. R. (2002). A revision of Bloom’s taxonomy: an overview. Theory Into Practice, 41(4), 212–218.
  23. Leighton, J. P., Chu, M.-W., & Seitz, P. (2013). Cognitive diagnostic assessment and the learning errors and formative feedback (LEAFF) model. In R. Lissitz (Ed.), Informing the practice of teaching using formative and interim assessment: a systems approach (pp. 183–207). Charlotte: Information Age Publishing.
  24. Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
  25. Ma, J., & Nickerson, J. V. (2006). Hands-on, simulated, and remote laboratories: a comparative literature review. ACM Computing Surveys, 38(3), 7.
  26. McMillan, J. H. (2001). Fundamental assessment principles for teachers and school administrators. Practical Assessment, Research & Evaluation, 7(8).
  27. National Research Council. (2006). America’s lab report: investigations in high school science. Committee on High School Science Laboratories: Role and Vision. S. R. Singer, M. L. Hilton, & H. A. Schweingruber (Eds.). Board on Science Education, Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
  28. National Research Council. (2014). Developing assessments for the next generation science standards. J. W. Pellegrino, M. R. Wilson, J. A. Koenig, & A. S. Beatty (Eds.), Committee on Developing Assessments of Science Proficiency in K-12. Board on Testing and Assessment and Board on Science Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
  29. Next Generation Science Standards Lead States. (2013). Next generation science standards: for states, by states. Washington, DC: The National Academies Press.
  30. O’Connor, K. (2011). 15 fixes for broken grades (Canadian edition). Toronto, ON: Pearson Canada.
  31. Organization for Economic Co-operation and Development [OECD]. (2017). PISA 2015 assessment and analytical framework: science, reading, mathematic, financial literacy and collaborative problem solving. Paris: OECD Publishing.
  32. PhET. (2017). PhET interactive simulations: research.
  33. Popham, W. J. (2011). Classroom assessment: what teachers need to know (6th ed.). Boston: Pearson.
  34. Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: applications and data analysis methods (2nd ed.). Newbury Park, CA: Sage.
  35. Shepard, L. A. (2000). The role of assessment in a learning culture. Educational Researcher, 29(7), 4–14.
  36. Shute, V. J., & Ventura, M. (2013). Measuring and supporting learning in games: stealth assessment. Cambridge, MA: Massachusetts Institute of Technology Press.
  37. Shute, V., Leighton, J. P., Jang, E. E., & Chu, M.-W. (2016). Advances in the science of assessment. Educational Assessment, 21(1), 34–59.
  38. Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: an introduction to basic and advanced multilevel modeling (2nd ed.). London: Sage.
  39. Supovitz, J. (2009). Can high-stakes testing leverage educational improvement? Prospects from the last decade of testing and accountability reform. Journal of Educational Change, 10(1), 211–227.
  40. Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics. Boston: Pearson/Allyn & Bacon.
  41. Volante, L., & Jaafar, S. B. (2008). Educational assessment in Canada. Assessment in Education: Principles, Policy, & Practice, 15(2), 201–210.
  42. Wainer, H. (1990). Introduction and history. In H. Wainer, N. J. Dorans, R. Flaugher, B. F. Green, R. J. Mislevy, L. Steinberg, & D. Thissen (Eds.), Computerized adaptive testing: a primer (pp. 1–21). Hillsdale, NJ: Erlbaum.
  43. Zenisky, A. L., & Sireci, S. G. (2002). Technological innovations in large-scale assessment. Applied Measurement in Education, 15(4), 337–362.

Copyright information

© Springer Science+Business Media B.V., part of Springer Nature 2018

Authors and Affiliations

  1. University of Calgary, Calgary, Canada
  2. University of Alberta, Edmonton, Canada
