Abstract
We propose a generalization of the speed–accuracy response model (SARM) introduced by Maris and van der Maas (Psychometrika 77:615–633, 2012). These models describe the scores produced by a scoring rule that incorporates both the speed and the accuracy of item responses. Our generalization is analogous to the extension of the one-parameter logistic (or Rasch) model to the two-parameter logistic (or Birnbaum) model in item response theory. An expectation–maximization (EM) algorithm was developed for estimating model parameters and standard errors. Furthermore, methods to assess model fit are provided in the form of generalized residuals for item score functions and saddlepoint approximations to the density of the sum score. The presented methods were evaluated in a small simulation study, which indicated good parameter recovery and reasonable type I error rates for the residuals. Finally, the methods were applied to two real data sets. In both data sets, the two-parameter SARM fit better than the one-parameter SARM.
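For orientation, the analogy invoked above can be written out with the standard item response functions of the Rasch and Birnbaum models. This is only a sketch of that textbook relationship, not of the SARM itself, whose score model is not reproduced here. The symbols are assumptions introduced for illustration: $\theta_i$ is the ability of person $i$, $b_j$ the difficulty of item $j$, and $a_j$ the discrimination of item $j$.

```latex
% One-parameter logistic (Rasch) model: all items share the same slope.
P(X_{ij} = 1 \mid \theta_i) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}

% Two-parameter logistic (Birnbaum) model: item-specific discrimination a_j.
P(X_{ij} = 1 \mid \theta_i) = \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}}
```

Setting $a_j = 1$ for every item reduces the second expression to the first, which is the sense in which the two-parameter model generalizes the one-parameter model.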
Notes
The executable and software manual are freely available for noncommercial use on request from sarm@ets.org.
This data set was suggested by an anonymous reviewer.
References
Andersen, E. B. (1973). Conditional inference and multiple choice questionnaires. British Journal of Mathematical and Statistical Psychology, 26, 31–44. https://doi.org/10.1111/j.2044-8317.1973.tb00504.x.
Biehler, M., Holling, H., & Doebler, P. (2015). Saddlepoint approximations of the distribution of the person parameter in the two parameter logistic model. Psychometrika, 80, 665–688. https://doi.org/10.1007/s11336-014-9405-1.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. Lord & M. Novick (Eds.), Statistical theories of mental test scores (pp. 397–472). Reading, MA: Addison-Wesley.
Butler, R. W. (2007). Saddlepoint approximations with applications. Cambridge: Cambridge University Press.
De Boeck, P., Chen, H., & Davison, M. (2017). Spontaneous and imposed speed of cognitive test responses. British Journal of Mathematical and Statistical Psychology, 70, 225–237. https://doi.org/10.1111/bmsp.12094.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39, 1–38.
Fahrmeir, L., & Tutz, G. (2001). Multivariate statistical modelling based on generalized linear models (2nd edn). Berlin: Springer. https://doi.org/10.1007/978-1-4757-3454-6.
Goldhammer, F. (2015). Measuring ability, speed, or both? Challenges, psychometric solutions, and what can be gained from experimental control. Measurement, 13, 133–164. https://doi.org/10.1080/15366367.2015.1100020.
Haberman, S. J. (2006). Joint and conditional estimation for implicit models for tests with polytomous item scores (ETS Research Report RR-06-03). Princeton, NJ: Educational Testing Service. https://doi.org/10.1002/j.2333-8504.2006.tb02009.x.
Haberman, S. J. (2013). A general program for item-response analysis that employs the stabilized Newton–Raphson algorithm (ETS Research Report RR-13-32). Princeton, NJ: Educational Testing Service. https://doi.org/10.1002/j.2333-8504.2013.tb02339.x.
Haberman, S. J. (2016). Exponential family distributions relevant to IRT. In W. J. van der Linden (Ed.), Handbook of item response theory, volume two: Statistical tools (pp. 47–70). Boca Raton, FL: CRC Press.
Haberman, S. J., & Sinharay, S. (2013). Generalized residuals for general models for contingency tables with application to item response theory. Journal of the American Statistical Association, 108, 1435–1444. https://doi.org/10.1080/01621459.2013.835660.
Haberman, S. J., Sinharay, S., & Chon, K. H. (2013). Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions. Psychometrika, 78, 417–440. https://doi.org/10.1007/s11336-012-9305-1.
Hooker, G., Finkelman, M., & Schwartzman, A. (2009). Paradoxical results in multidimensional item response theory. Psychometrika, 74, 419–442. https://doi.org/10.1007/S11336-009-9111-6.
Kim, S. (2012). A note on the reliability coefficients for item response model-based ability estimates. Psychometrika, 77, 153–162. https://doi.org/10.1007/s11336-011-9238-0.
Kim, S. (2013). Generalization of the Lord–Wingersky algorithm to computing the distributions of summed test scores based on real-number item scores. Journal of Educational Measurement, 50, 381–389.
Lee, Y. H., & Chen, H. (2011). A review of recent response-time analyses in educational testing. Psychological Test and Assessment Modeling, 3, 359–379.
Lord, F. M. (1975). Formula scoring and number right scoring. Journal of Educational Measurement, 12, 7–11. https://doi.org/10.1111/j.1745-3984.1975.tb01003.x.
Lord, F. M., & Wingersky, M. S. (1984). Comparison of “IRT” true-score and equipercentile observed-score equatings. Applied Psychological Measurement, 8, 453–461.
Louis, T. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 44, 226–233. https://doi.org/10.2307/2345828.
Luce, R. D. (1986). Response times. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780195070019.001.0001.
Maris, G., & van der Maas, H. L. J. (2012). Speed-accuracy response models: Scoring rules based on response time and accuracy. Psychometrika, 77, 615–633. https://doi.org/10.1007/s11336-012-9288-y.
Marsman, M. (2014). Plausible values in statistical inference. Doctoral dissertation, University of Twente, Enschede.
Meng, X. L., & Rubin, D. (1991). Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm. Journal of the American Statistical Association, 86, 899–909.
Naylor, J. C., & Smith, A. F. M. (1982). Applications of a method for efficient computation of posterior distributions. Applied Statistics, 31, 214–225. https://doi.org/10.2307/2347995.
Ranger, J., & Kuhn, J. T. (2012). A flexible latent trait model for response times in tests. Psychometrika, 77, 31–47. https://doi.org/10.1007/s11336-011-9231-7.
Ranger, J., Kuhn, J. T., & Gaviria, J. L. (2015). A race model for responses and response times in tests. Psychometrika, 80, 791–810. https://doi.org/10.1007/s11336-014-9427-8.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Paedagogiske Institut.
Roskam, E. E. (1997). Models for speed and time-limit tests. In R. K. Hambleton & W. J. van der Linden (Eds.), Handbook of modern item response theory (pp. 187–208). New York: Springer.
Rouder, J. N., Sun, D., Speckman, P. L., Lu, J., & Zhou, D. (2003). A hierarchical Bayesian statistical framework for response time distributions. Psychometrika, 68, 589–606.
Spearman, C. (1927). The abilities of man. London: Macmillan.
Thurstone, L. L. (1919). A scoring method for mental tests. Psychological Bulletin, 16, 235–240.
Thurstone, L. L. (1937). Ability, motivation, and speed. Psychometrika, 2, 249–254.
Tuerlinckx, F., & De Boeck, P. (2005). Two interpretations of the discrimination parameter. Psychometrika, 70, 629–650. https://doi.org/10.1007/s11336-000-0810-3.
Tuerlinckx, F., Molenaar, D., & van der Maas, H. L. J. (2016). Diffusion-based item response modeling. In W. J. van der Linden (Ed.), Handbook of item response theory (Vol. 1, pp. 283–302). Boca Raton, FL: Chapman & Hall/CRC Press.
van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72, 287–308. https://doi.org/10.1007/s11336-006-1478-z.
van der Linden, W. J. (2008). Using response times for item selection in adaptive testing. Journal of Educational and Behavioral Statistics, 33, 5–20. https://doi.org/10.3102/1076998607302626.
van der Linden, W. J. (2009). Conceptual issues in response-time modeling. Journal of Educational Measurement, 46, 247–272.
van der Maas, H., & Wagenmakers, E. J. (2005). A psychometric analysis of chess expertise. American Journal of Psychology, 118, 29–60.
van Rijn, P. W., & Ali, U. S. (2017). A comparison of item response models for accuracy and speed of item responses with applications to adaptive testing. British Journal of Mathematical and Statistical Psychology, 70, 317–345. https://doi.org/10.1111/bmsp.12101.
van Rijn, P. W., & Ali, U. S. (2018, in press). SARM: A computer program for estimating speed-accuracy response models (ETS Research Report). Princeton, NJ: Educational Testing Service.
van Rijn, P. W., & Rijmen, F. (2015). On the explaining-away phenomenon in multivariate latent variable models. British Journal of Mathematical and Statistical Psychology, 68, 1–22. https://doi.org/10.1111/bmsp.12046.
Yuan, K. H., Cheng, Y., & Patton, J. (2014). Information matrices and standard errors for MLEs of item parameters in IRT. Psychometrika, 79, 232–254. https://doi.org/10.1007/S11336-013-9334-4.
Acknowledgements
Funding was provided by Educational Testing Service. The authors would like to thank Rebecca Zwick, Yi-Hsuan Lee, Fred Robin, and three anonymous reviewers for their comments on earlier drafts of the paper.
Cite this article
van Rijn, P.W., Ali, U.S. A Generalized Speed–Accuracy Response Model for Dichotomous Items. Psychometrika 83, 109–131 (2018). https://doi.org/10.1007/s11336-017-9590-9
DOI: https://doi.org/10.1007/s11336-017-9590-9