Abstract
Reliability is a crucial concept in psychometrics. Although it is typically estimated as a single fixed quantity, previous work suggests that reliability can vary across persons, groups, and covariates. We propose a novel method for estimating and modeling case-specific reliability without repeated measurements or parallel tests. The proposed method employs a “Reliability Factor” that models the error variance of each case across multiple indicators, thereby producing case-specific reliability estimates. Additionally, we use Gaussian process modeling to estimate a nonlinear, non-monotonic function between the latent factor itself and the reliability of the measure, providing an analogue to test information functions in item response theory. The reliability factor model is a new tool for examining latent regions with poor conditional reliability, and correlates thereof, in a classical test theory framework.
Notes
Note that the reliability factor models error variance in responses, and not intraindividual variability of responses, nor variance in latent factors.
A traditional Gaussian model assumes that, e.g., \(y \sim {\mathcal {N}}(\mu = f(X), \sigma ^2)\). A location-scale Gaussian model includes a second submodel on the scale parameter, \(\sigma ^2\), to model variance: \(y \sim {\mathcal {N}}(\mu = f(X), \sigma ^2 = g(X))\). Because \(\sigma ^2\) must be positive, a log-link function is used in the submodel, \(\sigma ^2 = \exp (g(X))\). The reliability factor approach uses this exact strategy in its formulation.
The latent factors (\({\varvec{\eta _i}}\)) can be endogenously modeled as per usual, but this is not discussed here.
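The location-scale Gaussian model described in the second note can be illustrated with a minimal sketch. It is not the authors' reliability factor model, which additionally involves latent factors and multiple indicators. The linear forms chosen here for \(f(X)\) and \(g(X)\), and the names `variance`, `location_scale_loglik`, `beta`, and `gamma`, are assumptions made purely for illustration.

```python
import math


def variance(x, gamma):
    """Scale submodel: sigma^2 = exp(g(x)), with g(x) = gamma[0] + gamma[1] * x.

    The linear form of g is an illustrative assumption. The log link
    guarantees a positive variance for any real-valued g(x).
    """
    return math.exp(gamma[0] + gamma[1] * x)


def location_scale_loglik(y, x, beta, gamma):
    """Log-likelihood of y_i ~ N(mu_i, sigma2_i), where
    mu_i = beta[0] + beta[1] * x_i (location submodel, assumed linear) and
    sigma2_i = exp(gamma[0] + gamma[1] * x_i) (scale submodel).
    """
    total = 0.0
    for yi, xi in zip(y, x):
        mu = beta[0] + beta[1] * xi
        sigma2 = variance(xi, gamma)
        total += -0.5 * (
            math.log(2.0 * math.pi) + math.log(sigma2) + (yi - mu) ** 2 / sigma2
        )
    return total


# Setting gamma[1] = 0 recovers the traditional constant-variance Gaussian model.
homoscedastic = location_scale_loglik([0.2, -0.1], [1.0, 2.0], (0.0, 0.0), (0.0, 0.0))
# A non-zero gamma[1] lets the error variance change with the covariate x.
heteroscedastic = location_scale_loglik([0.2, -0.1], [1.0, 2.0], (0.0, 0.0), (0.0, 0.5))
```

Because the scale submodel is specified on the log scale, its coefficients are unconstrained, so they can be estimated or given priors like ordinary regression coefficients.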
References
Asparouhov, T., Hamaker, E. L., & Muthén, B. (2018). Dynamic structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 25(3), 359–388. https://doi.org/10.1080/10705511.2017.1406803
Bacon, D. R., Sauer, P. L., & Young, M. (1995). Composite reliability in structural equations modeling. Educational and Psychological Measurement, 55(3), 394–406. https://doi.org/10.1177/0013164495055003003
Barnard, J., McCulloch, R., & Meng, X.-L. (2000). Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage. Statistica Sinica, 10(4), 1281–1311.
Bauer, D. J. (2017). A more general model for testing measurement invariance and differential item functioning. Psychological Methods, 22(3), 507–526. https://doi.org/10.1037/met0000077
Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency reliability. Psychometrika, 74(1), 137–143. https://doi.org/10.1007/s11336-008-9100-1
Betancourt, M. (2017). A conceptual introduction to Hamiltonian Monte Carlo. Retrieved from arxiv.org/abs/1701.02434
Brennan, R. L. (2005). Generalizability theory. Educational Measurement: Issues and Practice, 11(4), 27–34. https://doi.org/10.1111/j.1745-3992.1992.tb00260.x
Brunton-Smith, I., Sturgis, P., & Leckie, G. (2017). Detecting and understanding interviewer effects on survey data by using a cross-classified mixed effects location-scale model. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(2), 551–568. https://doi.org/10.1111/rssa.12205
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software. https://doi.org/10.18637/jss.v076.i01
de Ayala, R. J. (2009). The theory and practice of item response theory. New York: The Guilford Press.
Dunn, T. J., Baguley, T., & Brunsden, V. (2014). From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. British Journal of Psychology, 105(3), 399–412. https://doi.org/10.1111/bjop.12046
Ellis, J. L., & van den Wollenberg, A. L. (1993). Local homogeneity in latent trait models. A characterization of the homogeneous monotone IRT model. Psychometrika, 58(3), 417–429. https://doi.org/10.1007/BF02294649
Feldt, L. S., & Qualls, A. L. (1996). Estimation of measurement error variance at specific score levels. Journal of Educational Measurement, 33(2), 141–156. https://doi.org/10.1111/j.1745-3984.1996.tb00486.x
Feldt, L. S., Steffen, M., & Gupta, N. C. (1985). A comparison of five methods for estimating the standard error of measurement at specific score levels. Applied Psychological Measurement, 9(4), 351–361. https://doi.org/10.1177/014662168500900402
Geldhof, G. J., Preacher, K. J., & Zyphur, M. J. (2014). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological Methods, 19(1), 72–91. https://doi.org/10.1037/a0032138
Gelman, A., Hill, J., & Yajima, M. (2012). Why we (usually) don’t have to worry about multiple comparisons. Journal of Research on Educational Effectiveness, 5(2), 189–211. https://doi.org/10.1080/19345747.2011.618213
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472. https://doi.org/10.1214/ss/1177011136
Gelman, A., Vehtari, A., Simpson, D., Margossian, C. C., Carpenter, B., Yao, Y., Kennedy, L., Gabry, J., Bürkner, P.-C., & Modrák, M. (2020). Bayesian workflow. Retrieved from arXiv:2011.01808
Harvill, L. M. (1991). An NCME instructional module on standard error of measurement. Educational Measurement: Issues and Practice, 10(2), 33–41. https://doi.org/10.1111/j.1745-3992.1991.tb00195.x
Hedeker, D., Mermelstein, R. J., Berbaum, M. L., & Campbell, R. T. (2009). Modeling mood variation associated with smoking: An application of a heterogeneous mixed-effects model for analysis of ecological momentary assessment (EMA) data. Addiction, 104(2), 297–307. https://doi.org/10.1111/j.1360-0443.2008.02435.x
Hedeker, D., Mermelstein, R. J., & Demirtas, H. (2008). An application of a mixed-effects location scale model for analysis of ecological momentary assessment (EMA) data. Biometrics, 64(2), 627–634. https://doi.org/10.1111/j.1541-0420.2007.00924.x
Hedeker, D., Mermelstein, R. J., & Demirtas, H. (2012). Modeling between-subject and within-subject variances in ecological momentary assessment data using mixed-effects location scale models. Statistics in Medicine, 31(27), 3328–3336. https://doi.org/10.1002/sim.5338
Holzinger, K. J., & Swineford, F. A. (1939). A study in factor analysis: The stability of a bi-factor solution. Supplementary Education Monographs, 48.
Hu, Y., Nesselroade, J. R., Erbacher, M. K., Boker, S. M., Burt, S. A., Keel, P. K., & Klump, K. (2016). Test reliability at the individual level. Structural Equation Modeling: A Multidisciplinary Journal, 23(4), 532–543. https://doi.org/10.1080/10705511.2016.1148605
Jöreskog, K. G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36(2), 109–133. https://doi.org/10.1007/BF02291393
Kapur, K., Li, X., Blood, E. A., & Hedeker, D. (2015). Bayesian mixed-effects location and scale models for multivariate longitudinal outcomes: An application to ecological momentary assessment data. Statistics in Medicine, 34(4), 630–651. https://doi.org/10.1002/sim.6345
Leckie, G., French, R., Charlton, C., & Browne, W. (2014). Modeling heterogeneous variance-covariance components in two-level models. Journal of Educational and Behavioral Statistics, 39(5), 307–332. https://doi.org/10.3102/1076998614546494
Lee, Y., & Nelder, J. A. (2006). Double hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society: Series C (Applied Statistics), 55(2), 139–185. https://doi.org/10.1111/j.1467-9876.2006.00538.x
Lek, K. M., & Van De Schoot, R. (2018). A comparison of the single, conditional and person-specific standard error of measurement: What do they measure and when to use them? Frontiers in Applied Mathematics and Statistics. https://doi.org/10.3389/fams.2018.00040
Lewandowski, D., Kurowicka, D., & Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis. https://doi.org/10.1016/j.jmva.2009.04.008
Li, X., & Hedeker, D. (2012). A three-level mixed-effects location scale model with an application to ecological momentary assessment data. Statistics in Medicine, 31(26), 3192–3210. https://doi.org/10.1002/sim.5393
Liu, H., Zhang, Z., & Grimm, K. J. (2016). Comparison of inverse Wishart and separation-strategy priors for Bayesian estimation of covariance parameter matrix in growth curve analysis. Structural Equation Modeling: A Multidisciplinary Journal, 23(3), 354–367. https://doi.org/10.1080/10705511.2015.1057285
Lord, F. M., & Novick, M. R. (2008). Statistical theories of mental test scores. Information Age Publishing.
Martin, S. R., Williams, D. R., & Rast, P. (2019). Measurement invariance assessment with Bayesian hierarchical inclusion modeling. PsyArXiv. https://doi.org/10.31234/osf.io/qbdjt
Martin, S. R., Williams, D. R., & Rast, P. (2020). Omegad. Retrieved from http://github.com/stephensrmmartin/omegad
McNeish, D. (2018). Thanks coefficient alpha, we’ll take it from here. Psychological Methods, 23(3), 412–433. https://doi.org/10.1037/met0000144
Mehta, P. D., & Neale, M. C. (2005). People are variables too: Multilevel structural equations modeling. Psychological Methods, 10(3), 259–284. https://doi.org/10.1037/1082-989X.10.3.259
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543. https://doi.org/10.1007/BF02294825
Merkle, E. C., & Wang, T. (2018). Bayesian latent variable models for the analysis of experimental psychology data. Psychonomic Bulletin & Review, 25(1), 256–270. https://doi.org/10.3758/s13423-016-1016-7
Muthén, B. O. (1994). Multilevel covariance structure analysis. Sociological Methods & Research, 22(3), 376–398. https://doi.org/10.1177/0049124194022003006
Nestler, S. (2020). Modelling inter-individual differences in latent within-person variation: The confirmatory factor level variability model. British Journal of Mathematical and Statistical Psychology, 73(3), 452–473. https://doi.org/10.1111/bmsp.12196
Raju, N. S., Price, L. R., Oshima, T., & Nering, M. L. (2007). Standardized conditional SEM: A case for conditional reliability. Applied Psychological Measurement, 31(3), 169–180. https://doi.org/10.1177/0146621606291569
Rast, P., & Ferrer, E. (2018). A mixed-effects location scale model for dyadic interactions. Multivariate Behavioral Research, 53(5), 756–775. https://doi.org/10.1080/00273171.2018.1477577
Rast, P., Hofer, S. M., & Sparks, C. (2012). Modeling individual differences in within-person variation of negative and positive affect in a mixed effects location scale model using BUGS/JAGS. Multivariate Behavioral Research, 47(2), 177–200. https://doi.org/10.1080/00273171.2012.658328
Rast, P., Martin, S. R., Liu, S., & Williams, D. R. (2020). A new frontier for studying within-person variability: Bayesian multivariate generalized autoregressive conditional heteroskedasticity models. Psychological Methods. https://doi.org/10.1037/met0000357
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Raykov, T. (1997). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement, 21(2), 173–184. https://doi.org/10.1177/01466216970212006
Raykov, T. (2001). Estimation of congeneric scale reliability using covariance structure analysis with nonlinear constraints. British Journal of Mathematical and Statistical Psychology, 54(2), 315–323. https://doi.org/10.1348/000711001159582
Raykov, T., & du Toit, S. H. C. (2005). Estimation of reliability for multiple-component measuring instruments in hierarchical designs. Structural Equation Modeling: A Multidisciplinary Journal, 12(4), 536–550. https://doi.org/10.1207/s15328007sem1204_2
Rothenberg, T. J. (1971). Identification in parametric models. Econometrica, 39(3), 577. https://doi.org/10.2307/1913267
Schad, D. J., Betancourt, M., & Vasishth, S. (2019). Toward a principled Bayesian workflow in cognitive science.
Solin, A., & Särkkä, S. (2019). Hilbert space methods for reduced-rank Gaussian process regression. Statistics and Computing. https://doi.org/10.1007/s11222-019-09886-w
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432. https://doi.org/10.1007/s11222-016-9696-4
Viallefont, A., Lebreton, J.-D., Reboulet, A.-M., & Gory, G. (1998). Parameter identifiability and model selection in capture-recapture models: A numerical approach. Biometrical Journal, 40(3), 313–325. https://doi.org/10.1002/(SICI)1521-4036(199807)40:3<313::AID-BIMJ313>3.0.CO;2-2
Williams, D. R., Liu, S., Martin, S. R., & Rast, P. (2019). Bayesian multivariate mixed-effects location scale modeling of longitudinal relations among affective traits, states, and physical activity. PsyArXiv. https://doi.org/10.31234/osf.io/4kfjp
Williams, D. R., Martin, S. R., & Rast, P. (2019). Putting the individual into reliability: Bayesian testing of homogeneous within-person variance in hierarchical models. PsyArXiv. https://doi.org/10.31234/osf.io/hpq7w
Yang, Y., Bhattacharya, A., & Pati, D. (2017). Frequentist coverage and sup-norm convergence rate in Gaussian process regression. Retrieved from arxiv.org/abs/1708.04753
Zhang, X., & Savalei, V. (2019). Examining the effect of missing data on RMSEA and CFI under normal theory full-information maximum likelihood. Structural Equation Modeling: A Multidisciplinary Journal. https://doi.org/10.1080/10705511.2019.1642111
Data Availability Statement
The datasets generated during the current study are available from the corresponding author on request. The Holzinger–Swineford (1939) dataset is freely available in the lavaan (https://cran.r-project.org/web/packages/lavaan/index.html) and MBESS (https://cran.r-project.org/web/packages/MBESS/index.html) R packages.
Additional information
Research reported in this publication was supported by the National Institute On Aging of the National Institutes of Health under Award Number R01AG050720 to PR. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Cite this article
Martin, S.R., Rast, P. The Reliability Factor: Modeling Individual Reliability with Multiple Items from a Single Assessment. Psychometrika 87, 1318–1342 (2022). https://doi.org/10.1007/s11336-022-09847-9
DOI: https://doi.org/10.1007/s11336-022-09847-9