
Educational Psychology Review, Volume 31, Issue 1, pp 1–34

The Psychometric Modeling of Scientific Reasoning: a Review and Recommendations for Future Avenues

  • Peter A. Edelsbrunner
  • Fabian Dablander
REVIEW ARTICLE

Abstract

Psychometric modeling has become a frequently used statistical tool in research on scientific reasoning. We review psychometric modeling practices in this field, including model choice, model testing, and the inferences researchers draw from these practices. A review of 11 empirical research studies reveals that the predominant psychometric approach is Rasch modeling with a focus on item fit statistics, applied in a manner closely resembling the practices of national and international large-scale educational assessment programs. This approach is common in the educational assessment community and rooted in subtle philosophical views on measurement. We find, however, that researchers tend to draw interpretations that lie outside the inferential domain of this specific approach and do not accord with its associated practices and inferential purposes. In some of the reviewed articles, researchers emphasize item infit statistics for dimensionality assessment. Item infit statistics, however, cannot be regarded as a valid indicator of the dimensionality of scientific reasoning. Using simulations as illustration, we argue that this practice is limited in the psychological insights it can deliver; in fact, various recent inferences about the structure, cognitive basis, and correlates of scientific reasoning might be unwarranted. To harness the full potential of psychometric modeling, we suggest adjusting modeling practices to the psychological and educational questions at hand.
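To make the abstract's central methodological point concrete, here is a minimal simulation sketch in R, the language of the psychometric packages the article draws on (e.g., eRm). The design and all parameter values (two traits correlated at .3, 10 items per trait, 1,000 persons) are our illustrative assumptions, not the authors' actual simulation: although the data are generated from two dimensions, item infit statistics from a unidimensional Rasch model typically remain close to 1.

```r
## Minimal sketch (assumed design, not the authors' simulation): simulate
## responses from TWO correlated latent dimensions, then fit a single
## unidimensional Rasch model and inspect item infit statistics.
library(MASS)  # mvrnorm() for correlated latent traits
library(eRm)   # Rasch model estimation (Mair & Hatzinger, 2007)

set.seed(1)
n <- 1000                                    # persons
k <- 10                                      # items per dimension
theta <- mvrnorm(n, mu = c(0, 0),            # two traits, correlated r = .3
                 Sigma = matrix(c(1, .3, .3, 1), 2))
beta <- seq(-2, 2, length.out = k)           # item difficulties on each scale

## Rasch response probabilities: items 1-10 load on trait 1, items 11-20 on trait 2
p1 <- plogis(outer(theta[, 1], beta, "-"))
p2 <- plogis(outer(theta[, 2], beta, "-"))
X  <- cbind(matrix(rbinom(n * k, 1, p1), n),
            matrix(rbinom(n * k, 1, p2), n))

fit <- RM(X)                     # unidimensional Rasch model (CML estimation)
pp  <- person.parameter(fit)
itemfit(pp)                      # infit mean squares remain near 1 despite
                                 # the two-dimensional generating structure
```

Under these assumptions, a global test such as Andersen's likelihood-ratio test (LRtest(fit) in eRm) or an explicit comparison against a two-dimensional model would speak to dimensionality far more directly than screening item infit values.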

Keywords

Scientific reasoning · Psychometrics · Review · Rasch model · Item response theory


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. ETH Zürich, Zürich, Switzerland
  2. University of Amsterdam, Amsterdam, Netherlands
