Semiparametric Factor Analysis for Item-Level Response Time Data

Liu, Yang; Wang, Weimeng

doi:10.1007/s11336-021-09832-8

Theory and Methods
Published: 31 January 2022

Volume 87, pages 666–692, (2022)

Psychometrika

Abstract

Item-level response time (RT) data can be conveniently collected from computer-based test/survey delivery platforms and have been demonstrated to bear a close relation to a miscellany of cognitive processes and test-taking behaviors. Individual differences in general processing speed can be inferred from item-level RT data using factor analysis. Conventional linear normal factor models make strong parametric assumptions, sacrificing modeling flexibility for interpretability, and are thus not ideal for describing complex associations between observed RT and latent speed. In this paper, we propose a semiparametric factor model with minimal parametric assumptions. Specifically, we adopt a functional analysis of variance representation for the log conditional densities of the manifest variables, in which the main effect and interaction functions are approximated by cubic splines. Penalized maximum likelihood estimation of the spline coefficients can be performed by an Expectation-Maximization algorithm, and the penalty weight can be empirically determined by cross-validation. In a simulation study, we compare the semiparametric model with incorrectly and correctly specified parametric factor models with regard to the recovery of the data-generating mechanism. A real data example is also presented to demonstrate the advantages of the proposed method.

Notes

Proposals that apply standard curve-fitting methods to observable estimates of the LV (e.g., a properly transformed total score) can be adapted to the current context (e.g., Ramsay, 1991). However, those methods require the instrument to be sufficiently long (Douglas, 1997) and therefore are not further considered in the present paper.
Molenaar et al. (2018) used the generalized partial credit model, which is a special case of the nominal model (Thissen & Steinberg, 1986).
We note that the domains of variables remain different in the two models. A more careful comparison can be made after truncating the MV domain from above in the PH model.
For simplicity, we use the same number of basis functions for both x and y.
We found in numerical experiments that the final estimates are insensitive to randomly generated starting values for the spline coefficients.
Log-RT for items 5, 6, 7, and 8 in the empirical data were selected for LMCV, LMLV, QMCV, and QMLV, respectively.
A slight difference is that the fitted QMLV model does not impose range restrictions on the MVs and LVs as the data-generating model does. But the discrepancy was found to be negligible in pilot runs.

References

Agresti, A. (2003). Categorical data analysis. Wiley.
Aitkin, M. (1999). A general maximum likelihood analysis of variance components in generalized linear models. Biometrics, 55(1), 117–128.
Alexander, P. A. & The Disciplined Reading and Learning Research Laboratory. (2012). Reading into the future: Competence for the 21st century. Educational Psychologist, 47(4), 259–280.
Alexander, P. A., Dumas, D., Grossnickle, E. M., List, A., & Firetto, C. M. (2016). Measuring relational reasoning. The Journal of Experimental Education, 84(1), 119–151.
Bartholomew, D., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach. Wiley.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), 29–51.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.
Bollen, K. (1989). Structural equations with latent variables. Wiley.
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10(4), 433–436.
Brown, L., Gans, N., Mandelbaum, A., Sakov, A., Shen, H., Zeltyn, S., & Zhao, L. (2005). Statistical analysis of a telephone call center: A queueing-science perspective. Journal of the American Statistical Association, 100(469), 36–50.
Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75(1), 33–57.
Chambers, J. M., Cleveland, W. S., Kleiner, B., & Tukey, P. A. (1983). Graphical methods for data analysis. Chapman.
Currie, I. D., Durban, M., & Eilers, P. H. (2006). Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(2), 259–280.
Davis, P., & Polonsky, I. (1964). Numerical interpolation, differentiation and integration. In M. Abramowitz & I. A. Stegun (Eds.), Handbook of mathematical functions with formulas, graphs, and mathematical tables. National Bureau of Standards.
De Boeck, P., & Jeon, M. (2019). An overview of models for response times and processes in cognitive tests. Frontiers in Psychology, 10, 102.
De Boor, C. (1978). A practical guide to splines. Springer.
De Boor, C., & Daniel, J. W. (1974). Splines with nonnegative B-spline coefficients. Mathematics of Computation, 28(126), 565–568.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1–22.
Dierckx, P. (1993). Curve and surface fitting with splines. Clarendon.
Douglas, J. (1997). Joint consistency of nonparametric item characteristic curve and ability estimation. Psychometrika, 62(1), 7–28.
Eilers, P. H., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–102.
Entink, R. K., van der Linden, W., & Fox, J. P. (2009). A Box–Cox normal model for response times. British Journal of Mathematical and Statistical Psychology, 62(3), 621–640.
Glas, C. A., & van der Linden, W. J. (2010). Marginal likelihood inference for a model for item responses and response times. British Journal of Mathematical and Statistical Psychology, 63(3), 603–626.
Gu, C. & Qiu, C. (1993). Smoothing spline density estimation: Theory. The Annals of Statistics, 217–234.
Gu, C. (1995). Smoothing spline density estimation: Conditional distribution. Statistica Sinica, 709–726.
Gu, C. (1993). Smoothing spline density estimation: A dimensionless automatic algorithm. Journal of the American Statistical Association, 88(422), 495–504.
Gu, C. (2013). Smoothing spline ANOVA models. Springer.
Gu, M., & Kong, F. (1998). A stochastic approximation algorithm with Markov chain Monte-Carlo method for incomplete data estimation problems. Proceedings of the National Academy of Sciences, 95(13), 7270–7274.
Gu, C., & Wahba, G. (1993). Smoothing spline ANOVA with component-wise Bayesian ‘confidence interval’. Journal of Computational and Graphical Statistics, 2(1), 97–117.
Hastie, T., Tibshirani, R., & Friedman, J. (2013). The elements of statistical learning: Data mining, inference, and prediction. Springer.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55.
Kang, H. A. (2017). Penalized partial likelihood inference of proportional hazards latent trait models. British Journal of Mathematical and Statistical Psychology, 70(2), 187–208.
Kendall, M. (1955). Rank correlation methods (2nd ed.). Charles Griffin and Co.
Kyllonen, P. C., & Zu, J. (2016). Use of response time for measuring cognitive ability. Journal of Intelligence, 4(14), 1–29.
Lee, Y. H., & Haberman, S. J. (2016). Investigating test-taking behaviors using timing and process data. International Journal of Testing, 16(3), 240–267.
Lee, S. Y., Lu, B., & Song, X. Y. (2008). Semiparametric Bayesian analysis of structural equation models with fixed covariates. Statistics in Medicine, 27(13), 2341–2360.
Leitenstorfer, F., & Tutz, G. (2007). Generalized monotonic regression based on B-splines with an application to air pollution data. Biostatistics, 8(3), 654–673.
Liu, Y., Magnus, B. E., & Thissen, D. (2016). Modeling and testing differential item functioning in unidimensional binary item response models with a single continuous covariate: A functional data analysis approach. Psychometrika, 81(2), 371–398.
MacCallum, R. C. (2003). 2001 presidential address: Working with imperfect models. Multivariate Behavioral Research, 38(1), 113–139.
MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992). Model modifications in covariance structure analysis: The problem of capitalization on chance. Psychological Bulletin, 111(3), 490–504.
Maydeu-Olivares, A. (2017). Assessing the size of model misfit in structural equation models. Psychometrika, 82(3), 533–558.
Molenaar, D., Bolsinova, M., & Vermunt, J. K. (2018). A semi-parametric within-subject mixture approach to the analyses of responses and response times. British Journal of Mathematical and Statistical Psychology, 71(2), 205–228.
Nocedal, J., & Wright, S. (2006). Numerical optimization. Springer.
OECD. (2017). PISA 2015 assessment and analytical framework. https://doi.org/10.1787/9789264281820-en
Pya, N., & Wood, S. N. (2015). Shape constrained additive models. Statistics and Computing, 25(3), 543–559.
R Core Team. (2020). R: A language and environment for statistical computing [computer software manual], Vienna, Austria. https://www.R-project.org/
Ramsay, J. O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56(4), 611–630.
Ramsay, J. O., & Silverman, B. W. (1997). Functional data analysis. Springer.
Ramsay, J. O., & Winsberg, S. (1991). Maximum marginal likelihood estimation for semiparametric item analysis. Psychometrika, 56(3), 365–379.
Ranger, J., Kuhn, J. T., & Ortner, T. M. (2020). Modeling responses and response times in tests with the hierarchical model and the three-parameter lognormal distribution. Educational and Psychological Measurement, 80(6), 1059–1089.
Ranger, J., & Ortner, T. (2012). A latent trait model for response times on tests employing the proportional hazards model. British Journal of Mathematical and Statistical Psychology, 65(2), 334–349.
Ranger, J., & Ortner, T. M. (2013). Response time modeling based on the proportional hazards model. Multivariate Behavioral Research, 48(4), 503–533.
Ranger, J., & Wolgast, A. (2019). Using response times as collateral information about latent traits in psychological tests. Methodology, 15, 185–196.
Rossi, N., Wang, X., & Ramsay, J. O. (2002). Nonparametric item response function estimates with the EM algorithm. Journal of Educational and Behavioral Statistics, 27(3), 291–317.
Rudin, W. (1964). Principles of mathematical analysis. McGraw-Hill.
Schnipke, D. L., & Scrams, D. J. (2002). Exploring issues of examinee behavior: Insights gained from response-time analyses. In C. N. Mills, M. Potenza, J. J. Fremer, & W. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 237–266). Lawrence Erlbaum Associates.
Shaked, M., & Shanthikumar, J. (2007). Stochastic orders. Springer.
Sinharay, S., & Johnson, M. S. (2019). The use of item scores and response times to detect examinees who may have benefited from item preknowledge. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12187.
Sinharay, S., & van Rijn, P. W. (2020). Assessing fit of the lognormal model for response times. Journal of Educational and Behavioral Statistics, 45(5), 534–568.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. CRC Press.
Snow, J. (2012). Qualtrics survey software: Handbook for research professionals. Qualtrics Labs Inc.
Song, X. Y., & Lu, Z. H. (2010). Semiparametric latent variable models with Bayesian P-splines. Journal of Computational and Graphical Statistics, 19(3), 590–608.
Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51(4), 567–577.
Thissen, D., & Wainer, H. (2001). Test scoring. Taylor & Francis.
van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31(2), 181–204.
van der Linden, W. J., & Guo, F. (2008). Bayesian procedures for identifying aberrant response-time patterns in adaptive testing. Psychometrika, 73(3), 365–384.
van der Linden, W. J., Klein Entink, R. H., & Fox, J. P. (2010). IRT parameter estimation with response times as collateral information. Applied Psychological Measurement, 34(5), 327–347.
Wang, C., Fan, Z., Chang, H. H., & Douglas, J. A. (2013). A semiparametric model for jointly analyzing response times and accuracy in computerized testing. Journal of Educational and Behavioral Statistics, 38(4), 381–417.
Wood, S. N. (2004). Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association, 99(467), 673–686.
Wu, C. J. (1983). On the convergence properties of the EM algorithm. The Annals of Statistics, 95–103.
Yalcin, I. & Amemiya, Y. (2001). Nonlinear factor analysis as a statistical method. Statistical Science, 275–294.
Zhang, S., Chen, Y., & Liu, Y. (2020). An improved stochastic EM algorithm for large-scale full-information item factor analysis. British Journal of Mathematical and Statistical Psychology, 73(1), 44–71.
Zhang, D., & Davidian, M. (2001). Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics, 57(3), 795–802.
Zhao, H., Alexander, P. A., & Sun, Y. (2020). Relational reasoning’s contributions to mathematical thinking and performance in Chinese elementary and middle-school students. Journal of Educational Psychology. https://doi.org/10.1037/edu0000595.

Author information

Authors and Affiliations

Department of Human Development and Quantitative Methodology, University of Maryland, College Park, USA
Yang Liu & Weimeng Wang

Corresponding author

Correspondence to Yang Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Data sharing is not applicable to this article as no new data were created or analyzed in this study. Correspondence should be made to Yang Liu at 1230B Benjamin Bldg, 3942 Campus Dr, University of Maryland, College Park, MD 20742. Email: yliu87@umd.edu. The work is sponsored by the National Science Foundation under grant No. 1826535. The authors are grateful to Dr. David Thissen from the University of North Carolina at Chapel Hill and Dr. Hao Wu from Vanderbilt University for their insightful comments on the project. The authors would also like to thank Drs. Hongyang Zhao and Patricia Alexander from the University of Maryland, College Park, for providing the empirical data example, as well as Dr. Jochen Ranger for sharing his estimation code for the proportional hazards factor model.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 125 KB)

Derivatives for M-Step Optimization

Taking the logarithm of the conditional density (Eq. 4) yields

$$\begin{aligned} \log f_j(y|x) = g_j(x, y) - \log \int _0^1\exp \left( g_j(x, y') \right) \mathrm{d}y'. \end{aligned}$$

(30)
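
As a rough numerical illustration of Eq. 30, the sketch below evaluates the log conditional density by replacing the integral over $[0, 1]$ with a midpoint rule. The function `g_toy`, its coefficients, and the choice of a midpoint rule are assumptions made for illustration only; this excerpt does not state which quadrature rule the authors use, and the paper's $g_j$ is built from cubic splines rather than the simple form used here.

```python
import math

def g_toy(x, y, a=1.5, b=-2.0):
    # Hypothetical g_j(x, y): a main effect in y plus an x-by-y interaction.
    return a * y + b * x * y

def log_cond_density(g, x, y, n_nodes=200):
    # Midpoint rule on [0, 1] for the normalizing integral in Eq. 30.
    h = 1.0 / n_nodes
    vals = [g(x, (k + 0.5) * h) for k in range(n_nodes)]
    m = max(vals)  # log-sum-exp shift for numerical stability
    log_norm = m + math.log(h * sum(math.exp(v - m) for v in vals))
    return g(x, y) - log_norm

def total_mass(g, x, n_nodes=200):
    # Integrates exp(log f_j(y|x)) over y with the same midpoint nodes.
    h = 1.0 / n_nodes
    return h * sum(
        math.exp(log_cond_density(g, x, (k + 0.5) * h, n_nodes))
        for k in range(n_nodes)
    )
```

Because the same nodes are used for normalization and for the check, `total_mass(g_toy, 0.3)` equals 1 up to rounding, which is a convenient sanity test that the density is properly normalized.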

The first and second derivatives of Eq. 30 with respect to the reduced coefficients $\varvec{\theta }_j$ are

$$\begin{aligned} \frac{\partial \log f_j}{\partial \varvec{\theta }_j}(x, y) = \frac{\partial g_j}{\partial \varvec{\theta }_j}(x, y) - \frac{\int _0^1 \frac{\partial g_j}{\partial \varvec{\theta }_j}(x, y') \exp \left( g_j(x, y') \right) \mathrm{d}y'}{\int _0^1\exp \left( g_j(x, y') \right) \mathrm{d}y'}, \end{aligned}$$

(31)

and

$$\begin{aligned} \frac{\partial ^2\log f_j}{\partial \varvec{\theta }_j\partial \varvec{\theta }_j^\top }(x, y) =&- \frac{\int _0^1 \frac{\partial g_j}{\partial \varvec{\theta }_j}(x, y')\frac{\partial g_j}{\partial \varvec{\theta }_j^\top }(x, y') \exp \left( g_j(x, y') \right) \mathrm{d}y'}{\int _0^1\exp \left( g_j(x, y') \right) \mathrm{d}y'} \nonumber \\&+ \frac{\left( \int _0^1 \frac{\partial g_j}{\partial \varvec{\theta }_j}(x, y') \exp \left( g_j(x, y') \right) \mathrm{d}y'\right) \left( \int _0^1 \frac{\partial g_j}{\partial \varvec{\theta }_j^\top }(x, y') \exp \left( g_j(x, y') \right) \mathrm{d}y'\right) }{\left( \int _0^1\exp \left( g_j(x, y') \right) \mathrm{d}y'\right) ^2},\quad \end{aligned}$$

(32)

respectively. Because the spline coefficients for the main effect and interaction functions are separable, we have

$$\begin{aligned} \frac{\partial g_j}{\partial \varvec{\theta }_j}(x, y) = \left( \frac{\partial g_j^y}{\partial \varvec{\alpha }_j^\top }(y),\ \frac{\partial g_j^{xy}}{\partial \mathrm {vec}(\mathbf {B}_j)^\top }(x, y) \right) ^\top , \end{aligned}$$

(33)

in which

$$\begin{aligned} \frac{\partial g_j^y}{\partial \varvec{\alpha }_j}(y) = \mathbf {N}^\top \varvec{\psi }(y), \end{aligned}$$

(34)

and

$$\begin{aligned} \frac{\partial g_j^{xy}}{\partial \mathbf {B}_j}(x, y) = \mathbf {N}^\top \varvec{\psi }(y)\varvec{\psi }(x)^\top \mathbf {N}. \end{aligned}$$

(35)
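
The structure of Eqs. 31 and 32 can be checked numerically: when $g_j$ is linear in the coefficients, the gradient is the derivative vector at $y$ minus its conditional mean over $y'$, and the Hessian is the negative conditional covariance of that vector. The sketch below is a minimal illustration under stated assumptions: `features` is a hypothetical polynomial stand-in for $\partial g_j/\partial \varvec{\theta }_j$ (the paper uses spline bases through $\mathbf {N}^\top \varvec{\psi }$), and the integrals are approximated with a midpoint rule, which this excerpt does not prescribe.

```python
import numpy as np

def features(x, y):
    """Hypothetical dg_j/dtheta_j: two main-effect and two interaction terms."""
    y = np.asarray(y, dtype=float)
    return np.stack([y, y**2, x * y, x * y**2], axis=-1)

def log_density_derivs(theta, x, y, n_nodes=400):
    """Return log f_j(y|x), its gradient (Eq. 31), and its Hessian (Eq. 32)."""
    nodes = (np.arange(n_nodes) + 0.5) / n_nodes   # midpoint nodes on [0, 1]
    phi = features(x, nodes)                       # shape (n_nodes, p)
    g = phi @ theta                                # g_j(x, y') at each node
    e = np.exp(g - g.max())                        # shifted for stability
    w = e / e.sum()                                # discretized exp(g) / integral of exp(g)
    mean_phi = w @ phi                             # ratio of integrals in Eq. 31
    second_moment = phi.T @ (w[:, None] * phi)     # first ratio in Eq. 32
    log_norm = g.max() + np.log(e.mean())          # log of the midpoint-rule integral
    phi_y = features(x, y)
    logf = phi_y @ theta - log_norm
    grad = phi_y - mean_phi
    hess = -second_moment + np.outer(mean_phi, mean_phi)
    return logf, grad, hess
```

Comparing `grad` and `hess` against central finite differences of `logf` is a cheap way to catch sign or indexing mistakes before such derivatives are passed to a Newton-type M-step optimizer.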

The penalty terms (Eqs. 15 and 16) are quadratic forms in the spline coefficients. Let $\mathbf {Q}= \mathbf {D}_2\mathbf {N}$. The corresponding derivatives are obtained as follows:

$$\begin{aligned}&\frac{\partial p_1}{\partial \varvec{\alpha }_j}(\varvec{\alpha }_j; \uplambda ) = \uplambda \mathbf {Q}^\top \mathbf {Q}\varvec{\alpha }_j, \end{aligned}$$

(36)

$$\begin{aligned}&\frac{\partial ^2 p_1}{\partial \varvec{\alpha }_j\partial \varvec{\alpha }_j^\top }(\uplambda ) = \uplambda \mathbf {Q}^\top \mathbf {Q}, \end{aligned}$$

(37)

$$\begin{aligned}&\frac{\partial p_2}{\partial \mathrm {vec}(\mathbf {B}_j)}(\mathbf {B}_j; \uplambda ) = \uplambda \left( \mathbf {I}\otimes \mathbf {Q}^\top \mathbf {Q}+ \mathbf {Q}^\top \mathbf {Q}\otimes \mathbf {I}\right) \mathrm {vec}(\mathbf {B}_j), \end{aligned}$$

(38)

and

$$\begin{aligned} \frac{\partial ^2 p_2}{\partial \mathrm {vec}(\mathbf {B}_j)\partial \mathrm {vec}(\mathbf {B}_j)^\top }(\uplambda ) = \uplambda \left( \mathbf {I}\otimes \mathbf {Q}^\top \mathbf {Q}+ \mathbf {Q}^\top \mathbf {Q}\otimes \mathbf {I}\right) . \end{aligned}$$

(39)
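
The Kronecker structure in Eqs. 38 and 39 can be sketched in a few lines of NumPy. Several pieces here are assumptions, since Eqs. 15 and 16 are not reproduced in this excerpt: the penalties are taken to be $p_1 = \tfrac{\uplambda }{2}\Vert \mathbf {Q}\varvec{\alpha }_j\Vert ^2$ and $p_2 = \tfrac{\uplambda }{2}\left( \Vert \mathbf {Q}\mathbf {B}_j\Vert _F^2 + \Vert \mathbf {B}_j\mathbf {Q}^\top \Vert _F^2\right)$, which are forms whose derivatives match Eqs. 36 to 39; $\mathrm {vec}(\cdot )$ is taken to stack columns; and the matrix `N` used in the usage note is a hypothetical placeholder, not the authors' constraint matrix.

```python
import numpy as np

def second_diff_matrix(k):
    # (k - 2) x k second-order difference matrix, playing the role of D_2.
    return np.diff(np.eye(k), n=2, axis=0)

def penalty_main(alpha, Q, lam):
    # Assumed form of p_1: (lam / 2) * ||Q alpha||^2.
    return 0.5 * lam * np.sum((Q @ alpha) ** 2)

def penalty_inter(B, Q, lam):
    # Assumed form of p_2: (lam / 2) * (||Q B||_F^2 + ||B Q^T||_F^2).
    return 0.5 * lam * (np.sum((Q @ B) ** 2) + np.sum((B @ Q.T) ** 2))

def grad_main(alpha, Q, lam):
    return lam * Q.T @ Q @ alpha                    # Eq. 36

def hess_main(Q, lam):
    return lam * Q.T @ Q                            # Eq. 37

def hess_inter(Q, lam):
    P = Q.T @ Q
    I = np.eye(P.shape[0])
    return lam * (np.kron(I, P) + np.kron(P, I))    # Eq. 39

def grad_inter(B, Q, lam):
    vec_B = B.reshape(-1, order="F")                # column-stacking vec(B)
    return hess_inter(Q, lam) @ vec_B               # Eq. 38
```

For example, with `k = 6`, `N = np.eye(k)[:, 1:]` (a placeholder that simply drops the first coefficient), and `Q = second_diff_matrix(k) @ N`, reshaping `grad_inter(B, Q, lam)` back to a matrix with `order="F"` gives `lam * (Q.T @ Q @ B + B @ Q.T @ Q)`. For larger bases this matrix form avoids building the full Kronecker product.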

About this article

Liu, Y., Wang, W. Semiparametric Factor Analysis for Item-Level Response Time Data. Psychometrika 87, 666–692 (2022). https://doi.org/10.1007/s11336-021-09832-8

Received: 05 March 2021
Revised: 27 September 2021
Published: 31 January 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s11336-021-09832-8
