Psychometrics is the study of the measurement of educational and psychological characteristics such as abilities, aptitudes, achievement, personality traits and knowledge (Everitt, 2006). Psychometric methods address challenges and problems arising in these measurements.



  1. Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1–23.
  2. Adams, R. J., Wilson, M., & Wu, M. (1997). Multilevel item response models: An approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22(1), 47–76.
  3. American Educational Research Association (AERA), American Psychological Association (APA), & National Council for Measurement in Education (NCME). (1999). Standards for educational and psychological testing. Washington, DC: AERA, APA, and NCME.
  4. Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Dekker.
  5. Banta, T. W., Lund, J. P., Black, K. E., & Oblander, F. W. (1996). Assessment in practice: Putting principles to work on college campuses. San Francisco: Jossey-Bass.
  6. Baxter, J. (1995). Children’s understanding of astronomy and the earth sciences. In S. M. Glynn & R. Duit (Eds.), Learning science in the schools: Research reforming practice (pp. 155–177). Mahwah, NJ: Lawrence Erlbaum Associates.
  7. Black, P., Wilson, M., & Yao, S. (2011). Road maps for learning: A guide to the navigation of learning progressions. Measurement: Interdisciplinary Research and Perspectives, 9, 1–52.
  8. Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.
  9. Brennan, R. L. (2006). Perspectives on the evolution and future of educational measurement. In R. L. Brennan (Ed.), Educational measurement (4th ed.). Westport, CT: Praeger.
  10. Briggs, D., Alonzo, A., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11(1), 33–63.
  11. Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296–322.
  12. Campbell, N. R. (1928). An account of the principles of measurement and calculation. London: Longmans, Green & Co.
  13. Claesgens, J., Scalise, K., Wilson, M., & Stacy, A. (2009). Mapping student understanding in chemistry: The perspectives of chemists. Science Education, 93(1), 56–85.
  14. Cooke, L. (2006). Is the mouse a poor man’s eye tracker? Proceedings of the Society for Technical Communication Conference. Arlington, VA: STC, 252–255.
  15. Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334.
  16. Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: John Wiley.
  17. Dahlgren, L. O. (1984a). Outcomes of learning. In F. Marton, D. Hounsell, & N. Entwistle (Eds.), The experience of learning. Edinburgh: Scottish Academic Press.
  18. De Ayala, R. J. (2009). The theory and practice of item response theory. New York: The Guilford Press.
  19. De Boeck, P., & Wilson, M. (Eds.). (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer-Verlag.
  20. Draney, K., & Wilson, M. (2009). Selecting cut scores with a composite of item types: The Construct Mapping procedure. In E. V. Smith & G. E. Stone (Eds.), Criterion-referenced testing: Practice analysis to score reporting using Rasch measurement (pp. 276–293). Maple Grove, MN: JAM Press.
  21. Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
  22. Everitt, B. S. (2010). Cambridge dictionary of statistics (3rd ed.). Cambridge: Cambridge University Press.
  23. Galton, F. (1883). Inquiries into human faculty and its development. New York: AMS Press.
  24. Guttman, L. (1944). A basis for scaling qualitative data. American Sociological Review, 9, 139–150.
  25. Guttman, L. A. (1950). The basis for scalogram analysis. In S. A. Stouffer, L. A. Guttman, F. A. Suchman, P. F. Lazarsfeld, S. A. Star, & J. A. Clausen (Eds.), Studies in social psychology in World War Two, Vol. 4. Measurement and prediction. Princeton: Princeton University Press.
  26. Holland, P. W., & Hoskens, M. (2003). Classical test theory as a first-order item response theory: Application to true-score prediction from a possibly-nonparallel test. Psychometrika, 68, 123–149.
  27. Ivie, J. L., & Embretson, S. E. (2010). Cognitive process modeling of spatial ability: The assembling objects task. Intelligence, 38(3), 324–335.
  28. Janssen, R., Schepers, J., & Peres, D. (2004). Models with item and item-group predictors. In P. De Boeck & M. Wilson (Eds.), Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer-Verlag.
  29. Kakkonen, T., Myller, N., Sutinen, E., & Timonen, J. (2008). Comparison of dimension reduction methods for automated essay grading. Educational Technology & Society, 11(3), 275–288.
  30. Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed.). Westport, CT: Praeger.
  31. Kofsky, E. (1966). A scalogram study of classificatory development. Child Development, 37, 191–204.
  32. Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2, 151–160.
  33. Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 52.
  34. Longford, N. T., Holland, P. W., & Thayer, D. T. (1993). Stability of the MH D-DIF statistics across populations. In P. W. Holland & H. Wainer (Eds.), Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum.
  35. Luce, R. D., & Tukey, J. W. (1964). Simultaneous conjoint measurement: A new type of fundamental measurement. Journal of Mathematical Psychology, 1, 1–27.
  36. Magidson, J., & Vermunt, J. K. (2002). A nontechnical introduction to latent class models. Statistical Innovations white paper No. 1.
  37. Marton, F. (1981). Phenomenography: Describing conceptions of the world around us. Instructional Science, 10(2), 177–200.
  38. Masters, G. N., Adams, R. J., & Wilson, M. (1990). Charting of student progress. In T. Husen & T. N. Postlethwaite (Eds.), International encyclopedia of education: Research and studies. Supplementary Volume 2 (pp. 628–634). Oxford: Pergamon Press.
  39. Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York: American Council on Education and Macmillan.
  40. Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13–23.
  41. Meulders, M., & Xie, Y. (2004). Person-by-item predictors. In P. De Boeck & M. Wilson (Eds.), Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer-Verlag.
  42. Michell, J. (1990). An introduction to the logic of psychological measurement. Hillsdale, NJ: Lawrence Erlbaum Associates.
  43. Mislevy, R. J., Wilson, M., Ercikan, K., & Chudowsky, N. (2003). Psychometric principles in student assessment. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation. Dordrecht, The Netherlands: Kluwer Academic Press.
  44. National Research Council. (2001). Knowing what students know: The science and design of educational assessment (Committee on the Foundations of Assessment. J. Pellegrino, N. Chudowsky, & R. Glaser (Eds.), Division on Behavioral and Social Sciences and Education). Washington, DC: National Academy Press.
  45. National Research Council. (2008). Early childhood assessment: Why, what, and how? (Committee on Developmental Outcomes and Assessments for Young Children. C. E. Snow & S. B. Van Hemel (Eds.), Board on Children, Youth and Families, Board on Testing and Assessment, Division of Behavioral and Social Sciences and Education). Washington, DC: The National Academies Press.
  46. Nisbet, R. J., Elder, J., & Miner, G. D. (2009). Handbook of statistical analysis and data mining applications. Academic Press.
  47. Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
  48. Paek, I. (2002). Investigation of differential item functioning: Comparisons among approaches, and extension to a multidimensional context. Unpublished doctoral dissertation, University of California, Berkeley.
  49. Ramsden, P., Masters, G., Stephanou, A., Walsh, E., Martin, E., Laurillard, D., & Marton, F. (1993). Phenomenographic research and the measurement of understanding: An investigation of students’ conceptions of speed, distance, and time. International Journal of Educational Research, 19(3), 301–316.
  50. Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danmarks Paedagogiske Institut.
  51. Saal, F. E., Downey, R. G., & Lahey, M. A. (1980). Rating the ratings: Assessing the psychometric quality of rating data. Psychological Bulletin, 88(2), 413–428.
  52. Scalise, K., & Wilson, M. (2011). The nature of assessment systems to support effective use of evidence through technology. E-Learning and Digital Media, 8(2), 121–132.
  53. Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal and structural equation models. Boca Raton, FL: Chapman & Hall/CRC.
  54. Shavelson, R. J., Webb, N. M., & Rowley, G. L. (1989). Generalizability theory. American Psychologist, 44, 922–932.
  55. Spearman, C. C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72–101.
  56. Spearman, C. C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3, 271–295.
  57. Takane, Y. (2007). Applications of multidimensional scaling in psychometrics. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics, Vol. 26: Psychometrics. Amsterdam: Elsevier.
  58. Thorburn, W. M. (1918). The myth of Occam’s razor. Mind, 27(107), 345–353.
  59. van der Linden, W. (1992). Fundamental measurement and the fundamentals of Rasch measurement. In M. Wilson (Ed.), Objective measurement: Theory into practice, Vol. 2. Norwood, NJ: Ablex Publishing Corp.
  60. van der Linden, W. J., & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. New York: Springer.
  61. Vosniadou, S., & Brewer, W. F. (1994). Mental models of the day/night cycle. Cognitive Science, 18, 123–183.
  62. Wang, W.-C., & Wilson, M. (2005). The Rasch testlet model. Applied Psychological Measurement, 29, 126–149.
  63. Wiliam, D. (2011). Embedded formative assessment. Bloomington, IN: Solution Tree Press.
  64. Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Lawrence Erlbaum Associates.
  65. Wilson, M. (2009). Measuring progressions: Assessment structures underlying a learning progression. Journal of Research in Science Teaching, 46(6), 716–730.
  66. Wilson, M., & Adams, R. J. (1995). Rasch models for item bundles. Psychometrika, 60(2), 181–198.
  67. Wilson, M., & Draney, K. (2002). A technique for setting standards and maintaining them over time. In S. Nishisato, Y. Baba, H. Bozdogan, & K. Kanefugi (Eds.), Measurement and multivariate analysis (Proceedings of the International Conference on Measurement and Multivariate Analysis, Banff, Canada, May 12–14, 2000), pp. 325–332. Tokyo: Springer-Verlag.
  68. Wright, B. D. (1968). Sample-free test calibration and person measurement. Proceedings of the 1967 invitational conference on testing (pp. 85–101). Princeton, NJ: Educational Testing Service.
  69. Wright, B. D. (1977). Solving measurement problems with the Rasch model. Journal of Educational Measurement, 14, 97–116.
  70. Wright, B. D., & Stone, M. H. (1979). Best test design. Chicago: MESA Press.

Copyright information

© Sense Publishers 2013

Authors and Affiliations

  • Mark Wilson
  • Perman Gochyyev
