Abstract
In educational and psychological measurement, researchers and/or practitioners are often interested in examining whether the ability of an examinee is the same over two sets of items. Such problems can arise in measurement of change, detection of cheating on unproctored tests, erasure analysis, detection of item preknowledge, etc. Traditional frequentist approaches that are used in such problems include the Wald test, the likelihood ratio test, and the score test (e.g., Fischer, Appl Psychol Meas 27:3–26, 2003; Finkelman, Weiss, & Kim-Kang, Appl Psychol Meas 34:238–254, 2010; Glas & Dagohoy, Psychometrika 72:159–180, 2007; Guo & Drasgow, Int J Sel Assess 18:351–364, 2010; Klauer & Rettig, Br J Math Stat Psychol 43:193–206, 1990; Sinharay, J Educ Behav Stat 42:46–68, 2017). This paper shows that approaches based on higher-order asymptotics (e.g., Barndorff-Nielsen & Cox, Inference and asymptotics. Springer, London, 1994; Ghosh, Higher order asymptotics. Institute of Mathematical Statistics, Hayward, 1994) can also be used to test for the equality of the examinee ability over two sets of items. The modified signed likelihood ratio test (e.g., Barndorff-Nielsen, Biometrika 73:307–322, 1986) and the Lugannani–Rice approximation (Lugannani & Rice, Adv Appl Prob 12:475–490, 1980), both of which are based on higher-order asymptotics, are shown to provide some improvement over the traditional frequentist approaches in three simulations. Two real data examples are also provided.
Notes
Finkelman et al. (2010) suggested computing \(s^2_1({\theta _1})\) and \(s^2_2({\theta _2})\) using \(\theta _1=\theta _2=\hat{\theta }_0\). Alternatively, one can use \(\theta _1=\hat{\theta }_1\) and \(\theta _2=\hat{\theta }_2\) to perform the Wald test; this variation produced very similar results in a limited simulation. Therefore, results using \(\theta _1=\theta _2=\hat{\theta }_0\) are reported in this paper.
Note that the item parameters are assumed known throughout this paper.
If both \(\hat{\theta }_1\) and \(\hat{\theta }_2\) under the square root sign in the expression of \(q(\psi _0)\) are replaced by \(\hat{\theta }_0\), \(q(\psi _0)\) would become identical to the Wald statistic.
The results using the weighted maximum likelihood estimator (WLE; Warm, 1989) of ability were very similar.
To create the plot for the LRA, which lies between 0 and 1, the standard normal quantile of the LRA was used as the input. This is because using the LRA provided by Eq. 14 to test \(H_0\) is equivalent to using the standard normal quantile of the LRA as a statistic along with a standard normal null distribution assumption.
For quantiles between −4 and 2, the curves for the SS and SLR statistics and the LRA were very close to the diagonal line.
Note that 3 has been subtracted from the formula of kurtosis so that the kurtosis of the standard normal distribution is 0 according to the formula used in this paper.
That is expected given that the null distribution of all these statistics converges to the standard normal distribution as test length increases.
The SS statistic in this case is a signed square root of the statistic that Glas and Dagohoy (2007) used to test against a two-sided alternative.
The method used by Haberman (2006) was applied to our first real data example—the 3PLM did not provide a substantial gain over the 2PLM for that data set either.
References
Barndorff-Nielsen, O. E. (1986). Inference on full or partial parameters based on the standardized signed log likelihood ratio. Biometrika, 73, 307–322.
Barndorff-Nielsen, O. E. (1991). Modified signed log likelihood ratio. Biometrika, 78, 557–563.
Barndorff-Nielsen, O. E., & Cox, D. R. (1994). Inference and asymptotics. London: Springer.
Bedrick, E. J. (1997). Approximating the conditional distribution of person fit indexes for checking the Rasch model. Psychometrika, 62, 191–199.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57, 289–300.
Biehler, M., Holling, H., & Doebler, P. (2015). Saddlepoint approximations of the distribution of the person parameter in the two parameter logistic model. Psychometrika, 80, 665–688.
Brazzale, A. R., Davison, A. C., & Reid, N. (2007). Applied asymptotics. Cambridge: Cambridge University Press.
Cizek, G. J., & Wollack, J. A. (2017). Handbook of quantitative methods for detecting cheating on tests. Washington, DC: Routledge.
Costa, P. T., & McCrae, R. R. (1992). Normal personality assessment in clinical practice: The NEO personality inventory. Psychological Assessment, 4, 5–13.
Cox, D. R., & Hinkley, D. V. (1974). Theoretical statistics. London: Chapman and Hall.
Donoghue, J. R. (1994). An empirical examination of the IRT information of polytomously scored reading items under the generalized partial credit model. Journal of Educational Measurement, 31, 295–311.
Drasgow, F., Levine, M. V., & Zickar, M. J. (1996). Optimal identification of mismeasured individuals. Applied Measurement in Education, 9, 47–64.
Ferrara, S. (2017). A framework for policies and practices to improve test security programs: Prevention, detection, investigation, and resolution (PDIR). Educational Measurement: Issues and Practice, 36(3), 5–24.
Finkelman, M., Weiss, D. J., & Kim-Kang, G. (2010). Item selection and hypothesis testing for the adaptive measurement of change. Applied Psychological Measurement, 34, 238–254.
Fischer, G. H. (2003). The precision of gain scores under an item response theory perspective: A comparison of asymptotic and exact conditional inference about change. Applied Psychological Measurement, 27, 3–26.
Ghosh, J. K. (1994). Higher order asymptotics. Hayward, CA: Institute of Mathematical Statistics.
Glas, C. A. W., & Dagohoy, A. V. T. (2007). A person fit test for IRT models for polytomous items. Psychometrika, 72, 159–180.
Guo, J., & Drasgow, F. (2010). Identifying cheating on unproctored internet tests: The z-test and the likelihood ratio test. International Journal of Selection and Assessment, 18, 351–364.
Haberman, S. J. (2006). An elementary test of the normal 2PL model against the normal 3PL alternative. ETS Research Report RR-06-14, ETS, Princeton, NJ.
Haberman, S. J., & Lee, Y.-H. (2017). A statistical procedure for testing unusually frequent exactly matching responses and nearly matching responses. ETS Research Report RR-17-23, ETS, Princeton, NJ.
Jensen, J. L. (1992). The modified signed likelihood statistic and saddlepoint approximations. Biometrika, 79, 693–703.
Jensen, J. L. (1995). Saddlepoint approximations. Oxford: Clarendon Press.
Jensen, J. L. (1997). A simple derivation of r\(^*\) for curved exponential families. Scandinavian Journal of Statistics, 24, 33–46.
Klauer, K. C. (1991). An exact and optimal standardized person test for assessing consistency with the Rasch model. Psychometrika, 56, 213–228.
Klauer, K. C., & Rettig, K. (1990). An approximately standardized person test for assessing consistency with a latent trait model. British Journal of Mathematical and Statistical Psychology, 43, 193–206.
Lewis, C., & Thayer, D. T. (1998). The power of the K-index (or PMIR) to detect copying. ETS Research Report 98-49, Educational Testing Service, Princeton, NJ.
Lugannani, R., & Rice, S. (1980). Saddle point approximation for the distribution of the sum of independent random variables. Advances in Applied Probability, 12, 475–490.
Maris, G., & Bechger, T. (2009). On interpreting the model parameters for the three parameter logistic model. Measurement: Interdisciplinary Research and Perspective, 7(2), 75–88.
Martín, E. S., González, J., & Tuerlinckx, F. (2015). On the unidentifiability of the fixed-effects 3PL model. Psychometrika, 80, 450–467.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.
Pierce, D. A., & Peters, D. (1992). Practical use of higher-order asymptotics for multiparameter exponential families. Journal of the Royal Statistical Society, Series B, 54, 701–738.
R Core Team. (2017). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Rao, C. R. (1973). Linear statistical inference and its applications (2nd ed.). New York, NY: Wiley.
Reid, N. (2003). Asymptotics and the theory of inference. The Annals of Statistics, 31, 1695–1731.
Sinharay, S. (2017). Detection of item preknowledge using likelihood ratio test and score test. Journal of Educational and Behavioral Statistics, 42, 46–68.
Sinharay, S., Duong, M. Q., & Wood, S. W. (2017). A new statistic for detection of aberrant answer changes. Journal of Educational Measurement, 54, 200–217.
Skorupski, W. P., & Wainer, H. (2017). The case for Bayesian methods when investigating test fraud. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 214–231). Washington, DC: Routledge.
Skovgaard, I. M. (1990). On the density of minimum contrast estimators. The Annals of Statistics, 18, 779–789.
von Davier, M., & Molenaar, I. W. (2003). A person-fit index for polytomous Rasch models, latent class models, and their mixture generalizations. Psychometrika, 68, 213–228.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427–450.
Wollack, J. A., Cohen, A. S., & Eckerly, C. A. (2015). Detecting test tampering using item response theory. Educational and Psychological Measurement, 75, 931–953.
Wollack, J. A., & Eckerly, C. (2017). Detecting test tampering at the group level. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 214–231). Washington, DC: Routledge.
Wollack, J. A., & Schoenig, R. W. (2018). Cheating. In B. B. Frey (Ed.), The SAGE encyclopedia of educational research, measurement, and evaluation (pp. 260–265). Thousand Oaks, CA: Sage.
Additional information
Note: The research reported in this article was supported by the Institute of Education Sciences (IES), U.S. Department of Education, through grant R305D170026. Any opinions expressed in this publication are those of the author and not necessarily of IES or Educational Testing Service.
Appendix: The MSLRT and the LRA for the GPCM
Donoghue (1994) provided the result that, for the log likelihood given by Eq. 21,
Noting that
Equation 25 can be rewritten as
For two sets of items, using the notation introduced below Eq. 22, the joint log likelihood of \(\theta _1\) and \(\theta _2\) is given by
The last equality holds because \(\sum _{k=0}^{m_{i}} d_k(X_{i})=\sum _{k=0}^{m_{j}} d_k(Y_j)=1\) under the assumption that no data are missing, which means, for example, that \(\sum _{k=0}^{m_{i}} d_k(X_{i})\log (\Gamma _{i}(\theta _1))=\log (\Gamma _{i}(\theta _1))\).
Let us apply the transformations \(\psi =\theta _2-\theta _1\) and \(\lambda =\theta _1\), which means that \(\theta _1=\lambda \) and \(\theta _2=\psi +\lambda \). Let us also denote
Note that both \(S_1\) and \(S_2\) are functions of the data (\(X_{i}\)’s and \(Y_j\)’s) and not of the parameters (\(\theta _1\) and \(\theta _2\)). The above log likelihood, \(\ell (\theta _1,\theta _2)\), or, \(\ell (\psi ,\lambda )\), then is given by
The above log likelihood belongs to the exponential family of distributions with canonical parameters \(\psi \) and \(\lambda \) and joint sufficient statistics \(S_1\) and \((S_1+S_2)\).
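As a hedged sketch of the structure behind this statement, the log likelihood can be written in the generic canonical form below. The symbols \(K\) and \(c\) are introduced here for illustration only and are not the paper's notation. The sketch also assumes that \(\Gamma _{i}\) and \(\Gamma _{j}\) denote the normalizing denominators of the GPCM category probabilities for the two item sets, and that the item parameters are known.

\[
\ell (\psi ,\lambda ) = \psi S_1 + \lambda (S_1+S_2) - K(\psi ,\lambda ) + c, \qquad K(\psi ,\lambda ) = \sum _{i} \log \Gamma _{i}(\lambda ) + \sum _{j} \log \Gamma _{j}(\psi +\lambda ),
\]

where \(c\) collects the terms that involve neither \(\psi \) nor \(\lambda \). Because the data enter this expression only through terms that are linear in \(\psi \) and \(\lambda \), the second derivatives of \(\ell (\psi ,\lambda )\) equal the negatives of those of \(K(\psi ,\lambda )\) and do not depend on the responses, so the observed and expected information coincide.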
Then, given the discussion on the applicability of the MSLRT and LRA to the exponential family of distributions, the MSLRT and LRA can be applied to test \(H_0:\psi =0\), or, \(H_0:\theta _1=\theta _2\), in applications of the GPCM. The SLR statistic is given by
Then, using the result provided in Eq. 26 (or, by differentiating the joint log likelihood provided in Eq. 29 twice),
Then,
which implies that
One can obtain \(j_{\lambda \lambda }(\psi _0,\hat{\lambda }_{\psi _0})\) from Eq. 31 as
Then, \(q(\psi _0)\) is given by
Once \(q(\psi _0)\) is computed using Eq. 34 and \(r(\psi _0)\) is computed using Eq. 30, one can compute the MSLR statistic \( r^*(\psi _0)\) using Eq. 12 and can compute the LRA using Eq. 14.
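The computation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that should be checked against the paper: (a) it covers only dichotomous items under the 2PLM, that is, the GPCM with two score categories; (b) Eq. 12 is taken to have the standard form \(r^*=r+r^{-1}\log (q/r)\); (c) Eq. 14 is taken to be the standard upper-tail Lugannani–Rice formula \(1-\Phi (r)+\phi (r)(1/q-1/r)\); and (d) \(q(\psi _0)\) is taken to be \((\hat{\theta }_2-\hat{\theta }_1)\) multiplied by the square root of \(I_1(\hat{\theta }_1)I_2(\hat{\theta }_2)/\{I_1(\hat{\theta }_0)+I_2(\hat{\theta }_0)\}\), where \(I_1\) and \(I_2\) denote the information of the two item sets, which agrees with the note that replacing \(\hat{\theta }_1\) and \(\hat{\theta }_2\) by \(\hat{\theta }_0\) under the square root yields the Wald statistic. All function names are hypothetical.

```python
import math

def prob(theta, a, b):
    """2PLM probability of a correct response (item parameters assumed known)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def loglik(theta, items, resp):
    """Log likelihood of ability theta for items [(a, b), ...] and 0/1 responses."""
    return sum(x * math.log(prob(theta, a, b))
               + (1 - x) * math.log(1.0 - prob(theta, a, b))
               for (a, b), x in zip(items, resp))

def info(theta, items):
    """Information at theta: sum of a^2 * p * (1 - p)."""
    return sum(a * a * prob(theta, a, b) * (1.0 - prob(theta, a, b))
               for a, b in items)

def mle(items, resp, lo=-10.0, hi=10.0):
    """Maximum likelihood estimate of ability by bisection on the score function."""
    def score(theta):
        return sum(a * (x - prob(theta, a, b)) for (a, b), x in zip(items, resp))
    if score(lo) <= 0.0 or score(hi) >= 0.0:
        raise ValueError("no finite MLE (all responses correct or all incorrect)")
    for _ in range(200):  # the score is decreasing in theta, so bisection converges
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def higher_order_test(items1, x, items2, y):
    """Return (r, q, r_star, lra) for H0: theta_1 = theta_2 (assumed formulas)."""
    t1 = mle(items1, x)                                       # ability on set 1
    t2 = mle(items2, y)                                       # ability on set 2
    t0 = mle(list(items1) + list(items2), list(x) + list(y))  # pooled, under H0
    lr = 2.0 * (loglik(t1, items1, x) + loglik(t2, items2, y)
                - loglik(t0, items1, x) - loglik(t0, items2, y))
    r = math.copysign(math.sqrt(max(lr, 0.0)), t2 - t1)       # signed LR statistic
    q = (t2 - t1) * math.sqrt(info(t1, items1) * info(t2, items2)
                              / (info(t0, items1) + info(t0, items2)))
    if abs(r) < 1e-6:
        # Both corrections are numerically unstable near r = 0; fall back to r.
        return r, q, r, 1.0 - norm_cdf(r)
    r_star = r + math.log(q / r) / r                          # assumed form of Eq. 12
    lra = 1.0 - norm_cdf(r) + norm_pdf(r) * (1.0 / q - 1.0 / r)  # assumed Eq. 14
    return r, q, r_star, lra

# Illustrative data only: five Rasch-type items administered twice.
items = [(1.0, b) for b in (-1.0, -0.5, 0.0, 0.5, 1.0)]
r, q, r_star, lra = higher_order_test(items, [1, 1, 0, 0, 0], items, [1, 1, 1, 1, 0])
print(r, q, r_star, lra)
```

Bisection is used instead of Newton–Raphson only to keep the sketch short and robust. The approach requires finite maximum likelihood estimates, so response patterns that are all correct or all incorrect on either item set are rejected with an error.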
Cite this article
Sinharay, S., Jensen, J.L. Higher-Order Asymptotics and Its Application to Testing the Equality of the Examinee Ability Over Two Sets of Items. Psychometrika 84, 484–510 (2019). https://doi.org/10.1007/s11336-018-9627-8
DOI: https://doi.org/10.1007/s11336-018-9627-8