Semiparametric Factor Analysis for Item-Level Response Time Data

  • Theory and Methods
  • Psychometrika

Abstract

Item-level response time (RT) data can be conveniently collected from computer-based test/survey delivery platforms and have been demonstrated to bear a close relation to a miscellany of cognitive processes and test-taking behaviors. Individual differences in general processing speed can be inferred from item-level RT data using factor analysis. Conventional linear normal factor models make strong parametric assumptions, sacrificing modeling flexibility for interpretability, and thus are not ideal for describing complex associations between observed RT and the latent speed. In this paper, we propose a semiparametric factor model with minimal parametric assumptions. Specifically, we adopt a functional analysis of variance representation for the log conditional densities of the manifest variables, in which the main effect and interaction functions are approximated by cubic splines. Penalized maximum likelihood estimation of the spline coefficients can be performed by an Expectation-Maximization algorithm, and the penalty weight can be empirically determined by cross-validation. In a simulation study, we compare the semiparametric model with incorrectly and correctly specified parametric factor models with regard to the recovery of the data-generating mechanism. A real data example is also presented to demonstrate the advantages of the proposed method.

Notes

  1. Proposals that apply standard curve-fitting methods to observable estimates of the LV (e.g., a properly transformed total score) can be adapted to the current context (e.g., Ramsay, 1991). However, those methods require the instrument to be sufficiently long (Douglas, 1997) and therefore are not further considered in the present paper.

  2. Molenaar et al. (2018) used the generalized partial credit model, which is a special case of the nominal model (Thissen & Steinberg, 1986).

  3. We note that the domains of variables remain different in the two models. A more careful comparison can be made after truncating the MV domain from above in the PH model.

  4. For simplicity, we use the same number of basis functions for both x and y.

  5. We found in numerical experiments that the final estimates are insensitive to randomly generated starting values for the spline coefficients.

  6. Log-RT for items 5, 6, 7, and 8 in the empirical data were selected for LMCV, LMLV, QMCV, and QMLV, respectively.

  7. A slight difference is that the fitted QMLV model does not impose range restrictions on the MVs and LVs as the data-generating model does. But the discrepancy was found to be negligible in pilot runs.

References

  • Agresti, A. (2003). Categorical data analysis. Wiley.

  • Aitkin, M. (1999). A general maximum likelihood analysis of variance components in generalized linear models. Biometrics, 55(1), 117–128.

  • Alexander, P. A. & The Disciplined Reading and Learning Research Laboratory. (2012). Reading into the future: Competence for the 21st century. Educational Psychologist, 47(4), 259–280.

  • Alexander, P. A., Dumas, D., Grossnickle, E. M., List, A., & Firetto, C. M. (2016). Measuring relational reasoning. The Journal of Experimental Education, 84(1), 119–151.

  • Bartholomew, D., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach. Wiley.

  • Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), 29–51.

  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.

  • Bollen, K. (1989). Structural equations with latent variables. Wiley.

  • Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.

  • Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10(4), 433–436.

  • Brown, L., Gans, N., Mandelbaum, A., Sakov, A., Shen, H., Zeltyn, S., & Zhao, L. (2005). Statistical analysis of a telephone call center: A queueing-science perspective. Journal of the American Statistical Association, 100(469), 36–50.

  • Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75(1), 33–57.

  • Chambers, J. M., Cleveland, W. S., Kleiner, B., & Tukey, P. A. (1983). Graphical methods for data analysis. Chapman.

  • Currie, I. D., Durban, M., & Eilers, P. H. (2006). Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(2), 259–280.

  • Davis, P., & Polonsky, I. (1964). Numerical interpolation, differentiation and integration. In M. Abramowitz & I. A. Stegun (Eds.), Handbook of mathematical functions with formulas, graphs, and mathematical tables. National Bureau of Standards.

  • De Boeck, P., & Jeon, M. (2019). An overview of models for response times and processes in cognitive tests. Frontiers in Psychology, 10, 102.

  • De Boor, C. (1978). A practical guide to splines. Springer.

  • De Boor, C., & Daniel, J. W. (1974). Splines with nonnegative B-spline coefficients. Mathematics of Computation, 28(126), 565–568.

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1–22.

  • Dierckx, P. (1993). Curve and surface fitting with splines. Clarendon.

  • Douglas, J. (1997). Joint consistency of nonparametric item characteristic curve and ability estimation. Psychometrika, 62(1), 7–28.

  • Eilers, P. H., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–102.

  • Entink, R. K., van der Linden, W., & Fox, J. P. (2009). A Box–Cox normal model for response times. British Journal of Mathematical and Statistical Psychology, 62(3), 621–640.

  • Glas, C. A., & van der Linden, W. J. (2010). Marginal likelihood inference for a model for item responses and response times. British Journal of Mathematical and Statistical Psychology, 63(3), 603–626.

  • Gu, C. & Qiu, C. (1993). Smoothing spline density estimation: Theory. The Annals of Statistics, 217–234.

  • Gu, C. (1995). Smoothing spline density estimation: Conditional distribution. Statistica Sinica, 709–726.

  • Gu, C. (1993). Smoothing spline density estimation: A dimensionless automatic algorithm. Journal of the American Statistical Association, 88(422), 495–504.

  • Gu, C. (2013). Smoothing spline ANOVA models. Springer.

  • Gu, M., & Kong, F. (1998). A stochastic approximation algorithm with Markov chain Monte-Carlo method for incomplete data estimation problems. Proceedings of the National Academy of Sciences, 95(13), 7270–7274.

  • Gu, C., & Wahba, G. (1993). Smoothing spline ANOVA with component-wise Bayesian ‘confidence interval’. Journal of Computational and Graphical Statistics, 2(1), 97–117.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2013). The elements of statistical learning: Data mining, inference, and prediction. Springer.

  • Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55.

  • Kang, H. A. (2017). Penalized partial likelihood inference of proportional hazards latent trait models. British Journal of Mathematical and Statistical Psychology, 70(2), 187–208.

  • Kendall, M. (1955). Rank correlation methods (2nd ed.). Charles Griffin and Co.

  • Kyllonen, P. C., & Zu, J. (2016). Use of response time for measuring cognitive ability. Journal of Intelligence, 4(4), 14.

  • Lee, Y. H., & Haberman, S. J. (2016). Investigating test-taking behaviors using timing and process data. International Journal of Testing, 16(3), 240–267.

  • Lee, S. Y., Lu, B., & Song, X. Y. (2008). Semiparametric Bayesian analysis of structural equation models with fixed covariates. Statistics in Medicine, 27(13), 2341–2360.

  • Leitenstorfer, F., & Tutz, G. (2007). Generalized monotonic regression based on B-splines with an application to air pollution data. Biostatistics, 8(3), 654–673.

  • Liu, Y., Magnus, B. E., & Thissen, D. (2016). Modeling and testing differential item functioning in unidimensional binary item response models with a single continuous covariate: A functional data analysis approach. Psychometrika, 81(2), 371–398.

  • MacCallum, R. C. (2003). 2001 presidential address: Working with imperfect models. Multivariate Behavioral Research, 38(1), 113–139.

  • MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992). Model modifications in covariance structure analysis: The problem of capitalization on chance. Psychological Bulletin, 111(3), 490–504.

  • Maydeu-Olivares, A. (2017). Assessing the size of model misfit in structural equation models. Psychometrika, 82(3), 533–558.

  • Molenaar, D., Bolsinova, M., & Vermunt, J. K. (2018). A semi-parametric within-subject mixture approach to the analyses of responses and response times. British Journal of Mathematical and Statistical Psychology, 71(2), 205–228.

  • Nocedal, J., & Wright, S. (2006). Numerical optimization. Springer.

  • OECD. (2017). PISA 2015 assessment and analytical framework. https://doi.org/10.1787/9789264281820-en

  • Pya, N., & Wood, S. N. (2015). Shape constrained additive models. Statistics and Computing, 25(3), 543–559.

  • R Core Team. (2020). R: A language and environment for statistical computing [computer software manual], Vienna, Austria. https://www.R-project.org/

  • Ramsay, J. O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56(4), 611–630.

  • Ramsay, J. O., & Silverman, B. W. (1997). Functional data analysis. Springer.

  • Ramsay, J. O., & Winsberg, S. (1991). Maximum marginal likelihood estimation for semiparametric item analysis. Psychometrika, 56(3), 365–379.

  • Ranger, J., Kuhn, J. T., & Ortner, T. M. (2020). Modeling responses and response times in tests with the hierarchical model and the three-parameter lognormal distribution. Educational and Psychological Measurement, 80(6), 1059–1089.

  • Ranger, J., & Ortner, T. (2012). A latent trait model for response times on tests employing the proportional hazards model. British Journal of Mathematical and Statistical Psychology, 65(2), 334–349.

  • Ranger, J., & Ortner, T. M. (2013). Response time modeling based on the proportional hazards model. Multivariate Behavioral Research, 48(4), 503–533.

  • Ranger, J., & Wolgast, A. (2019). Using response times as collateral information about latent traits in psychological tests. Methodology, 15, 185–196.

  • Rossi, N., Wang, X., & Ramsay, J. O. (2002). Nonparametric item response function estimates with the EM algorithm. Journal of Educational and Behavioral Statistics, 27(3), 291–317.

  • Rudin, W. (1964). Principles of mathematical analysis. McGraw-Hill.

  • Schnipke, D. L., & Scrams, D. J. (2002). Exploring issues of examinee behavior: Insights gained from response-time analyses. In C. N. Mills, M. Potenza, J. J. Fremer, & W. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 237–266). Lawrence Erlbaum Associates.

  • Shaked, M., & Shanthikumar, J. (2007). Stochastic orders. Springer.

  • Sinharay, S., & Johnson, M. S. (2019). The use of item scores and response times to detect examinees who may have benefited from item preknowledge. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12187.

  • Sinharay, S., & van Rijn, P. W. (2020). Assessing fit of the lognormal model for response times. Journal of Educational and Behavioral Statistics, 45(5), 534–568.

  • Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. CRC Press.

  • Snow, J. (2012). Qualtrics survey software: Handbook for research professionals. Qualtrics Labs Inc.

  • Song, X. Y., & Lu, Z. H. (2010). Semiparametric latent variable models with Bayesian P-splines. Journal of Computational and Graphical Statistics, 19(3), 590–608.

  • Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51(4), 567–577.

  • Thissen, D., & Wainer, H. (2001). Test scoring. Taylor & Francis.

  • van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31(2), 181–204.

  • van der Linden, W. J., & Guo, F. (2008). Bayesian procedures for identifying aberrant response-time patterns in adaptive testing. Psychometrika, 73(3), 365–384.

  • van der Linden, W. J., Klein Entink, R. H., & Fox, J. P. (2010). IRT parameter estimation with response times as collateral information. Applied Psychological Measurement, 34(5), 327–347.

  • Wang, C., Fan, Z., Chang, H. H., & Douglas, J. A. (2013). A semiparametric model for jointly analyzing response times and accuracy in computerized testing. Journal of Educational and Behavioral Statistics, 38(4), 381–417.

  • Wood, S. N. (2004). Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association, 99(467), 673–686.

  • Wu, C. J. (1983). On the convergence properties of the EM algorithm. The Annals of Statistics, 95–103.

  • Yalcin, I. & Amemiya, Y. (2001). Nonlinear factor analysis as a statistical method. Statistical Science, 275–294.

  • Zhang, S., Chen, Y., & Liu, Y. (2020). An improved stochastic EM algorithm for large-scale full-information item factor analysis. British Journal of Mathematical and Statistical Psychology, 73(1), 44–71.

  • Zhang, D., & Davidian, M. (2001). Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics, 57(3), 795–802.

  • Zhao, H., Alexander, P. A., & Sun, Y. (2020). Relational reasoning’s contributions to mathematical thinking and performance in Chinese elementary and middle-school students. Journal of Educational Psychology. https://doi.org/10.1037/edu0000595.

Author information

Corresponding author

Correspondence to Yang Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Data sharing is not applicable to this article as no new data were created or analyzed in this study. Correspondence should be made to Yang Liu at 1230B Benjamin Bldg, 3942 Campus Dr, University of Maryland, College Park, MD 20742. Email: yliu87@umd.edu. The work is sponsored by the National Science Foundation under grant No. 1826535. The authors are grateful to Dr. David Thissen from the University of North Carolina at Chapel Hill and Dr. Hao Wu from Vanderbilt University for their insightful comments on the project. The authors would also like to thank Drs. Hongyang Zhao and Patricia Alexander from University of Maryland, College Park, for providing the empirical data example, as well as Dr. Jochen Ranger for sharing his estimation code for the proportional hazard factor model.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 125 KB)

Derivatives for M-Step Optimization

Taking the logarithm of the conditional density (Eq. 4) yields

$$\begin{aligned} \log f_j(y|x) = g_j(x, y) - \log \int _0^1\exp \left( g_j(x, y') \right) \mathrm{d}y'. \end{aligned}$$
(30)
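As a rough numerical illustration of Eq. 30, the sketch below evaluates the log conditional density by approximating the normalizing integral over [0, 1] with Gauss–Legendre quadrature. The quadrature rule, the function name `log_conditional_density`, and the toy function `toy_g` are assumptions made for illustration only; they are not taken from the article, and `toy_g` is not a fitted spline model.

```python
import numpy as np

def log_conditional_density(g, x, y, n_nodes=41):
    """Evaluate Eq. 30: log f(y|x) = g(x, y) - log of the integral of exp(g(x, y')) over [0, 1]."""
    # Gauss-Legendre nodes and weights on [-1, 1], mapped to [0, 1] (assumed quadrature rule).
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    y_q = 0.5 * (nodes + 1.0)
    w_q = 0.5 * weights
    g_q = g(x, y_q)
    # Log-sum-exp keeps the normalizing integral stable when g takes large values.
    g_max = np.max(g_q)
    log_norm = g_max + np.log(np.sum(w_q * np.exp(g_q - g_max)))
    return g(x, y) - log_norm

# Hypothetical toy function (illustration only): g(x, y) = x * y.
def toy_g(x, y):
    return x * y

x0 = 1.5
# Closed form for the toy case: the integral of exp(x * y') over [0, 1] is (exp(x) - 1) / x.
exact = x0 * 0.3 - np.log((np.exp(x0) - 1.0) / x0)
approx = log_conditional_density(toy_g, x0, 0.3)
```

For this smooth toy function, `approx` and `exact` should agree closely; a spline-based \(g_j\) may call for more nodes or a rule aligned with the knots.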

The first and second derivatives of Eq. 30 with respect to the reduced coefficients \(\varvec{\theta }_j\) are

$$\begin{aligned} \frac{\partial \log f_j}{\partial \varvec{\theta }_j}(x, y) = \frac{\partial g_j}{\partial \varvec{\theta }_j}(x, y) - \frac{\int _0^1 \frac{\partial g_j}{\partial \varvec{\theta }_j}(x, y') \exp \left( g_j(x, y') \right) \mathrm{d}y'}{\int _0^1\exp \left( g_j(x, y') \right) \mathrm{d}y'}, \end{aligned}$$
(31)

and

$$\begin{aligned} \frac{\partial ^2\log f_j}{\partial \varvec{\theta }_j\partial \varvec{\theta }_j^\top }(x, y) =&- \frac{\int _0^1 \frac{\partial g_j}{\partial \varvec{\theta }_j}(x, y')\frac{\partial g_j}{\partial \varvec{\theta }_j^\top }(x, y') \exp \left( g_j(x, y') \right) \mathrm{d}y'}{\int _0^1\exp \left( g_j(x, y') \right) \mathrm{d}y'} \nonumber \\&+ \frac{\left( \int _0^1 \frac{\partial g_j}{\partial \varvec{\theta }_j}(x, y') \exp \left( g_j(x, y') \right) \mathrm{d}y'\right) \left( \int _0^1 \frac{\partial g_j}{\partial \varvec{\theta }_j^\top }(x, y') \exp \left( g_j(x, y') \right) \mathrm{d}y'\right) }{\left( \int _0^1\exp \left( g_j(x, y') \right) \mathrm{d}y'\right) ^2},\quad \end{aligned}$$
(32)

respectively. Because the spline coefficients for the main effect and interaction functions are separable, we have

$$\begin{aligned} \frac{\partial g_j}{\partial \varvec{\theta }_j}(x, y) = \left( \frac{\partial g_j^y}{\partial \varvec{\alpha }_j^\top }(y),\ \frac{\partial g_j^{xy}}{\partial \mathrm {vec}(\mathbf {B}_j)^\top }(x, y) \right) ^\top , \end{aligned}$$
(33)

in which

$$\begin{aligned} \frac{\partial g_j^y}{\partial \varvec{\alpha }_j}(y) = \mathbf {N}^\top \varvec{\psi }(y), \end{aligned}$$
(34)

and

$$\begin{aligned} \frac{\partial g_j^{xy}}{\partial \mathbf {B}_j}(x, y) = \mathbf {N}^\top \varvec{\psi }(y)\varvec{\psi }(x)^\top \mathbf {N}. \end{aligned}$$
(35)
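Because \(g_j\) is linear in the coefficients, Eqs. 31 and 32 reduce to a mean and a negative covariance of \(\partial g_j/\partial \varvec{\theta }_j\) under the conditional density. The sketch below checks this numerically. The polynomial `features` function is a hypothetical stand-in for the spline-based derivatives in Eqs. 33 to 35, and the Gauss–Legendre quadrature is likewise an assumption; neither is the authors' implementation.

```python
import numpy as np

# Quadrature on [0, 1]; Gauss-Legendre is an assumed choice.
nodes, weights = np.polynomial.legendre.leggauss(41)
Y_Q, W_Q = 0.5 * (nodes + 1.0), 0.5 * weights

def features(x, y):
    """Hypothetical stand-in for dg/dtheta: polynomial terms instead of spline bases."""
    y = np.atleast_1d(y)
    # Two main-effect terms in y and two x-by-y interaction terms.
    return np.stack([y, y**2, x * y, x * y**2], axis=-1)

def log_f(theta, x, y):
    """Eq. 30 with g(x, y) = features(x, y) @ theta."""
    g_q = features(x, Y_Q) @ theta
    return features(x, y)[0] @ theta - np.log(W_Q @ np.exp(g_q))

def grad_hess(theta, x, y):
    """Eqs. 31 and 32 for a g that is linear in theta."""
    phi_q = features(x, Y_Q)                 # dg/dtheta at each quadrature node
    p = W_Q * np.exp(phi_q @ theta)
    p /= p.sum()                             # normalized weights of f(y'|x)
    mean = p @ phi_q                         # ratio of integrals in Eq. 31
    second = phi_q.T @ (p[:, None] * phi_q)  # first ratio in Eq. 32
    grad = features(x, y)[0] - mean
    hess = -second + np.outer(mean, mean)
    return grad, hess

theta0 = np.array([0.5, -1.0, 0.8, 0.3])
grad, hess = grad_hess(theta0, x=0.7, y=0.4)

# Central finite differences of log_f as an independent check of the gradient.
eps = 1e-6
fd_grad = np.array([
    (log_f(theta0 + eps * e, 0.7, 0.4) - log_f(theta0 - eps * e, 0.7, 0.4)) / (2 * eps)
    for e in np.eye(4)
])
```

The Hessian computed this way is negative semidefinite for any coefficient vector, so the unpenalized log conditional density is concave in \(\varvec{\theta }_j\) for fixed \(x\) and \(y\).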

The penalty terms (Eqs. 15 and 16) are quadratic forms in the spline coefficients. Let \(\mathbf {Q}= \mathbf {D}_2\mathbf {N}\). The corresponding derivatives are obtained as follows:

$$\begin{aligned}&\frac{\partial p_1}{\partial \varvec{\alpha }_j}(\varvec{\alpha }_j; \uplambda ) = \uplambda \mathbf {Q}^\top \mathbf {Q}\varvec{\alpha }_j, \end{aligned}$$
(36)
$$\begin{aligned}&\frac{\partial ^2 p_1}{\partial \varvec{\alpha }_j\partial \varvec{\alpha }_j^\top }(\uplambda ) = \uplambda \mathbf {Q}^\top \mathbf {Q}, \end{aligned}$$
(37)
$$\begin{aligned}&\frac{\partial p_2}{\partial \mathrm {vec}(\mathbf {B}_j)}(\mathbf {B}_j; \uplambda ) = \uplambda \left( \mathbf {I}\otimes \mathbf {Q}^\top \mathbf {Q}+ \mathbf {Q}^\top \mathbf {Q}\otimes \mathbf {I}\right) \mathrm {vec}(\mathbf {B}_j), \end{aligned}$$
(38)

and

$$\begin{aligned} \frac{\partial ^2 p_2}{\partial \mathrm {vec}(\mathbf {B}_j)\partial \mathrm {vec}(\mathbf {B}_j)^\top }(\uplambda ) = \uplambda \left( \mathbf {I}\otimes \mathbf {Q}^\top \mathbf {Q}+ \mathbf {Q}^\top \mathbf {Q}\otimes \mathbf {I}\right) . \end{aligned}$$
(39)
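The Kronecker structure in Eqs. 38 and 39 can be checked numerically. The penalty definitions (Eqs. 15 and 16) are not reproduced in this appendix, so the function `p2` below is an assumed quadratic form, chosen only because its gradient matches Eq. 38; the random matrix `Q` is a stand-in for \(\mathbf {D}_2\mathbf {N}\), and the column-stacking convention for vec is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
k, lam = 5, 0.3
Q = rng.standard_normal((k - 2, k))  # random stand-in for Q = D_2 N (illustration only)
B = rng.standard_normal((k, k))      # interaction coefficient matrix
S = Q.T @ Q

def p2(B):
    """Assumed quadratic penalty whose gradient has the form of Eq. 38."""
    return 0.5 * lam * (np.sum((Q @ B) ** 2) + np.sum((Q @ B.T) ** 2))

def vec(M):
    """Column-stacking vec operator."""
    return M.reshape(-1, order="F")

I = np.eye(k)
hess = lam * (np.kron(I, S) + np.kron(S, I))  # Eq. 39
grad = hess @ vec(B)                          # Eq. 38
grad_matrix_form = lam * (S @ B + B @ S)      # same gradient without Kronecker products

# Central finite differences of p2 as an independent check.
eps = 1e-6
fd = np.zeros((k, k))
for r in range(k):
    for c in range(k):
        E = np.zeros((k, k))
        E[r, c] = eps
        fd[r, c] = (p2(B + E) - p2(B - E)) / (2 * eps)
```

The matrix form `lam * (S @ B + B @ S)` avoids building the \(k^2 \times k^2\) Kronecker matrices, which may matter when the number of basis functions is large.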

Cite this article

Liu, Y., Wang, W. Semiparametric Factor Analysis for Item-Level Response Time Data. Psychometrika 87, 666–692 (2022). https://doi.org/10.1007/s11336-021-09832-8

  • DOI: https://doi.org/10.1007/s11336-021-09832-8
