Higher-Order Asymptotics and Its Application to Testing the Equality of the Examinee Ability Over Two Sets of Items

Psychometrika

Abstract

In educational and psychological measurement, researchers and/or practitioners are often interested in examining whether the ability of an examinee is the same over two sets of items. Such problems can arise in measurement of change, detection of cheating on unproctored tests, erasure analysis, detection of item preknowledge, etc. Traditional frequentist approaches that are used in such problems include the Wald test, the likelihood ratio test, and the score test (e.g., Fischer, Appl Psychol Meas 27:3–26, 2003; Finkelman, Weiss, & Kim-Kang, Appl Psychol Meas 34:238–254, 2010; Glas & Dagohoy, Psychometrika 72:159–180, 2007; Guo & Drasgow, Int J Sel Assess 18:351–364, 2010; Klauer & Rettig, Br J Math Stat Psychol 43:193–206, 1990; Sinharay, J Educ Behav Stat 42:46–68, 2017). This paper shows that approaches based on higher-order asymptotics (e.g., Barndorff-Nielsen & Cox, Inference and asymptotics. Springer, London, 1994; Ghosh, Higher order asymptotics. Institute of Mathematical Statistics, Hayward, 1994) can also be used to test for the equality of the examinee ability over two sets of items. The modified signed likelihood ratio test (e.g., Barndorff-Nielsen, Biometrika 73:307–322, 1986) and the Lugannani–Rice approximation (Lugannani & Rice, Adv Appl Prob 12:475–490, 1980), both of which are based on higher-order asymptotics, are shown to provide some improvement over the traditional frequentist approaches in three simulations. Two real data examples are also provided.

(Figs. 1–5: images not available)

Notes

  1. Finkelman et al. (2010) suggested computing \(s^2_1({\theta _1})\) and \(s^2_2({\theta _2})\) at \(\theta _1=\theta _2=\hat{\theta }_0\). Alternatively, one can use \(\theta _1=\hat{\theta }_1\) and \(\theta _2=\hat{\theta }_2\) to perform the Wald test; in a limited simulation, this variation did not produce substantially different results. Therefore, results using \(\theta _1=\theta _2=\hat{\theta }_0\) are reported in this paper.

  2. Note that the item parameters are assumed known throughout this paper.

  3. If both \(\hat{\theta }_1\) and \(\hat{\theta }_2\) under the square root sign in the expression of \(q(\psi _0)\) were replaced by \(\hat{\theta }_0\), \(q(\psi _0)\) would become identical to the Wald statistic.

  4. The results using the weighted maximum likelihood estimator (WLE; Warm, 1989) of ability were very similar.

  5. To create the plot for the LRA, which lies between 0 and 1, the standard normal quantile of the LRA was used as the input, because using the LRA given by Eq. 14 to test \(H_0\) is equivalent to using the standard normal quantile of the LRA as a statistic with a standard normal null distribution.

  6. For quantiles between −4 and 2, the curves for the SS and SLR statistics and the LRA were very close to the diagonal line.

  7. Note that 3 has been subtracted in the kurtosis formula used in this paper, so that the kurtosis of the standard normal distribution is 0.

  8. That is expected, given that the null distributions of all these statistics converge to the standard normal distribution as test length increases.

  9. The SS statistic in this case is the signed square root of the statistic that Glas and Dagohoy (2007) used to test against a two-sided alternative.

  10. The method of Haberman (2006) was applied to our first real data example; the 3PLM did not provide a substantial gain over the 2PLM for that data set either.
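
Note 3 can be checked with a few lines of arithmetic. The following Python sketch is illustrative, not the authors' code. It assumes that the Wald statistic of Note 1 is \((\hat{\theta }_2-\hat{\theta }_1)/\sqrt{s^2_1+s^2_2}\), with \(s^2_1\) and \(s^2_2\) taken as the reciprocals of the test information of the two item sets at \(\hat{\theta }_0\) (written \(I_1\) and \(I_2\) here, a hypothetical notation), and the information values are made up. Because \(I_1I_2/(I_1+I_2)=1/(1/I_1+1/I_2)\), the two statistics coincide.

```python
import math

def q_stat(d_hat, info1_t1, info2_t2, info1_t0, info2_t0):
    """q(psi_0) in the form of Eq. 34.

    d_hat              -- theta2_hat - theta1_hat
    info1_t1           -- information of item set 1 at theta1_hat
    info2_t2           -- information of item set 2 at theta2_hat
    info1_t0, info2_t0 -- information of each item set at theta0_hat
    """
    return d_hat * math.sqrt(info1_t1 * info2_t2 / (info1_t0 + info2_t0))

def wald_stat(d_hat, info1_t0, info2_t0):
    """Wald statistic, assuming s1^2 = 1/info1 and s2^2 = 1/info2 at theta0_hat."""
    return d_hat / math.sqrt(1.0 / info1_t0 + 1.0 / info2_t0)

# Hypothetical values. Evaluating the numerator information at theta0_hat
# (instead of theta1_hat and theta2_hat) turns q into the Wald statistic.
d_hat, i1_t0, i2_t0 = 0.8, 6.0, 4.0
print(q_stat(d_hat, i1_t0, i2_t0, i1_t0, i2_t0), wald_stat(d_hat, i1_t0, i2_t0))
```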

References

  • Barndorff-Nielsen, O. E. (1986). Inference on full or partial parameters based on the standardized signed log likelihood ratio. Biometrika, 73, 307–322.

  • Barndorff-Nielsen, O. E. (1991). Modified signed log likelihood ratio. Biometrika, 78, 557–563.

  • Barndorff-Nielsen, O. E., & Cox, D. R. (1994). Inference and asymptotics. London: Springer.

  • Bedrick, E. J. (1997). Approximating the conditional distribution of person fit indexes for checking the Rasch model. Psychometrika, 62, 191–199.

  • Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57, 289–300.

  • Biehler, M., Holling, H., & Doebler, P. (2015). Saddlepoint approximations of the distribution of the person parameter in the two parameter logistic model. Psychometrika, 80, 665–688.

  • Brazzale, A. R., Davison, A. C., & Reid, N. (2007). Applied asymptotics. Cambridge: Cambridge University Press.

  • Cizek, G. J., & Wollack, J. A. (2017). Handbook of quantitative methods for detecting cheating on tests. Washington, DC: Routledge.

  • Costa, P. T., & McCrae, R. R. (1992). Normal personality assessment in clinical practice: The NEO personality inventory. Psychological Assessment, 4, 5–13.

  • Cox, D. R., & Hinkley, D. V. (1974). Theoretical statistics. London: Chapman and Hall.

  • Donoghue, J. R. (1994). An empirical examination of the IRT information of polytomously scored reading items under the generalized partial credit model. Journal of Educational Measurement, 31, 295–311.

  • Drasgow, F., Levine, M. V., & Zickar, M. J. (1996). Optimal identification of mismeasured individuals. Applied Measurement in Education, 9, 47–64.

  • Ferrara, S. (2017). A framework for policies and practices to improve test security programs: Prevention, detection, investigation, and resolution (PDIR). Educational Measurement: Issues and Practice, 36(3), 5–24.

  • Finkelman, M., Weiss, D. J., & Kim-Kang, G. (2010). Item selection and hypothesis testing for the adaptive measurement of change. Applied Psychological Measurement, 34, 238–254.

  • Fischer, G. H. (2003). The precision of gain scores under an item response theory perspective: A comparison of asymptotic and exact conditional inference about change. Applied Psychological Measurement, 27, 3–26.

  • Ghosh, J. K. (1994). Higher order asymptotics. Hayward, CA: Institute of Mathematical Statistics.

  • Glas, C. A. W., & Dagohoy, A. V. T. (2007). A person fit test for IRT models for polytomous items. Psychometrika, 72, 159–180.

  • Guo, J., & Drasgow, F. (2010). Identifying cheating on unproctored internet tests: The z-test and the likelihood ratio test. International Journal of Selection and Assessment, 18, 351–364.

  • Haberman, S. J. (2006). An elementary test of the normal 2PL model against the normal 3PL alternative. ETS Research Report RR-06-14, ETS, Princeton, NJ.

  • Haberman, S. J., & Lee, Y.-H. (2017). A statistical procedure for testing unusually frequent exactly matching responses and nearly matching responses. ETS Research Report RR-17-23, ETS, Princeton, NJ.

  • Jensen, J. L. (1992). The modified signed likelihood statistic and saddlepoint approximations. Biometrika, 79, 693–703.

  • Jensen, J. L. (1995). Saddlepoint approximations. Oxford: Clarendon Press.

  • Jensen, J. L. (1997). A simple derivation of \(r^*\) for curved exponential families. Scandinavian Journal of Statistics, 24, 33–46.

  • Klauer, K. C. (1991). An exact and optimal standardized person test for assessing consistency with the Rasch model. Psychometrika, 56, 213–228.

  • Klauer, K. C., & Rettig, K. (1990). An approximately standardized person test for assessing consistency with a latent trait model. British Journal of Mathematical and Statistical Psychology, 43, 193–206.

  • Lewis, C., & Thayer, D. T. (1998). The power of the K-index (or PMIR) to detect copying. ETS Research Report 98-49, Educational Testing Service, Princeton, NJ.

  • Lugannani, R., & Rice, S. (1980). Saddle point approximation for the distribution of the sum of independent random variables. Advances in Applied Probability, 12, 475–490.

  • Maris, G., & Bechger, T. (2009). On interpreting the model parameters for the three parameter logistic model. Measurement: Interdisciplinary Research and Perspectives, 7(2), 75–88.

  • Martín, E. S., González, J., & Tuerlinckx, F. (2015). On the unidentifiability of the fixed-effects 3PL model. Psychometrika, 80, 450–467.

  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.

  • Pierce, D. A., & Peters, D. (1992). Practical use of higher-order asymptotics for multiparameter exponential families. Journal of the Royal Statistical Society, Series B, 54, 701–738.

  • R Core Team. (2017). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

  • Rao, C. R. (1973). Linear statistical inference and its applications (2nd ed.). New York, NY: Wiley.

  • Reid, N. (2003). Asymptotics and the theory of inference. The Annals of Statistics, 31, 1695–1731.

  • Sinharay, S. (2017). Detection of item preknowledge using likelihood ratio test and score test. Journal of Educational and Behavioral Statistics, 42, 46–68.

  • Sinharay, S., Duong, M. Q., & Wood, S. W. (2017). A new statistic for detection of aberrant answer changes. Journal of Educational Measurement, 54, 200–217.

  • Skorupski, W. P., & Wainer, H. (2017). The case for Bayesian methods when investigating test fraud. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 214–231). Washington, DC: Routledge.

  • Skovgaard, I. M. (1990). On the density of minimum contrast estimators. The Annals of Statistics, 18, 779–789.

  • von Davier, M., & Molenaar, I. W. (2003). A person-fit index for polytomous Rasch models, latent class models, and their mixture generalizations. Psychometrika, 68, 213–228.

  • Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427–450.

  • Wollack, J. A., Cohen, A. S., & Eckerly, C. A. (2015). Detecting test tampering using item response theory. Educational and Psychological Measurement, 75, 931–953.

  • Wollack, J. A., & Eckerly, C. (2017). Detecting test tampering at the group level. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 214–231). Washington, DC: Routledge.

  • Wollack, J. A., & Schoenig, R. W. (2018). Cheating. In B. B. Frey (Ed.), The SAGE encyclopedia of educational research, measurement, and evaluation (pp. 260–265). Thousand Oaks, CA: Sage.

Author information

Correspondence to Sandip Sinharay.

Additional information

Note: The research reported in this article was supported by the Institute of Education Sciences (IES), U.S. Department of Education, through grant R305D170026. Any opinions expressed in this publication are those of the author and not necessarily of IES or Educational Testing Service.

Appendix: The MSLRT and the LRA for the GPCM

Donoghue (1994) showed that, for the log likelihood given by Eq. 21,

$$\begin{aligned} \frac{\partial ^2 \ell (\theta ) }{\partial \theta ^2}= \sum _{i=1}^{n} a^2_i \left( \left[ \sum _{k=0}^{m_i} kP_{ik}(\theta ) \right] ^2-\sum _{k=0}^{m_i} k^2P_{ik}(\theta ) \right) \cdot \end{aligned}$$
(25)

Noting that

$$\begin{aligned} \mathop {\mathrm {Var}}\nolimits (X_i|\theta )=\sum _{k=0}^{m_i} k^2P_{ik}(\theta ) - \left[ \sum _{k=0}^{m_i} kP_{ik}(\theta ) \right] ^2, \end{aligned}$$

Equation 25 can be rewritten as

$$\begin{aligned} \frac{\partial ^2 \ell (\theta ) }{\partial \theta ^2}= - \sum _{i=1}^{n} a^2_i \mathop {\mathrm {Var}}\nolimits (X_i|\theta ) \cdot \end{aligned}$$
(26)
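
Eq. 26 can be checked numerically by comparing it with a finite-difference second derivative of the log likelihood. The sketch below is illustrative: it assumes that Eq. 21 is the single-test analogue of the first term of Eq. 27, stores each item as a pair (a_i, [b_i0, ..., b_im]), and uses made-up item parameters and responses; the helper names are hypothetical.

```python
import math

def gpcm_probs(theta, a, b):
    """GPCM category probabilities P_k(theta), k = 0..m, for one item.

    The numerator for category k is exp(sum_{h=0}^{k} a*(theta - b[h]));
    b[0] appears in every term and therefore cancels.
    """
    z, s = [], 0.0
    for bh in b:
        s += a * (theta - bh)
        z.append(s)
    zmax = max(z)
    e = [math.exp(v - zmax) for v in z]  # shift by the max for numerical stability
    total = sum(e)
    return [v / total for v in e]

def loglik(theta, resp, items):
    """Log likelihood: sum over items of log P_{i, x_i}(theta)."""
    return sum(math.log(gpcm_probs(theta, a, b)[x])
               for x, (a, b) in zip(resp, items))

def information(theta, items):
    """sum_i a_i^2 Var(X_i | theta), i.e. minus the right-hand side of Eq. 26."""
    total = 0.0
    for a, b in items:
        p = gpcm_probs(theta, a, b)
        m1 = sum(k * pk for k, pk in enumerate(p))
        m2 = sum(k * k * pk for k, pk in enumerate(p))
        total += a * a * (m2 - m1 * m1)
    return total

# Hypothetical items (a_i, [b_i0, ..., b_im]) and one response vector.
items = [(1.2, [0.0, -0.5, 0.4]), (0.8, [0.0, 0.3]), (1.5, [0.0, -1.0, 0.0, 1.0])]
resp = [2, 0, 1]

theta, h = 0.3, 1e-3
second_diff = (loglik(theta + h, resp, items) - 2.0 * loglik(theta, resp, items)
               + loglik(theta - h, resp, items)) / h ** 2
print(second_diff, -information(theta, items))  # should agree to several decimals
```

Because the right-hand side of Eq. 26 does not involve the responses, changing `resp` leaves `-information(theta, items)` unchanged, and the finite-difference value should still match it.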

For two sets of items, using the notation introduced below Eq. 22, the joint log likelihood of \(\theta _1\) and \(\theta _2\) is given by

$$\begin{aligned} \ell (\theta _1,\theta _2)= & {} \sum _{{i}=1}^{n_1} \sum _{k=0}^{m_{i}} d_k(X_{i}) \log P_{{i}k}(\theta _1)+\sum _{j=1}^{n_2} \sum _{k=0}^{m_{j}} d_k(Y_j) \log \tilde{P}_{{j}k}(\theta _2) \nonumber \\= & {} \sum _{{i}=1}^{n_1} \sum _{k=0}^{m_{i}} d_k(X_{i})\left\{ \sum _{h=0}^{k} a_{i}(\theta _1-b_{{i}h}) - \log (\Gamma _{i}(\theta _1)) \right\} \nonumber \\&\quad +\, \sum _{{j}=1}^{n_2} \sum _{k=0}^{m_{j}} d_k(Y_j)\left\{ \sum _{h=0}^{k} \tilde{a}_{j}(\theta _2-\tilde{b}_{{j}h}) - \log (\tilde{\Gamma }_{{j}}(\theta _2)) \right\} \nonumber \\= & {} \theta _1 \sum _{{i}=1}^{n_1} a_{i} \sum _{k=0}^{m_{i}} (k+1) d_k(X_{i}) - \sum _{{i}=1}^{n_1} a_{i} \sum _{k=0}^{m_{i}} d_k(X_{i})\sum _{h=0}^{k} b_{{i}h} - \sum _{{i}=1}^{n_1} \log (\Gamma _{{i}}(\theta _1))\nonumber \\&\quad +\, \theta _2 \sum _{{j}=1}^{n_2} \tilde{a}_{j} \sum _{k=0}^{m_{j}} (k+1) d_k(Y_j) - \sum _{{j}=1}^{n_2} \tilde{a}_{j} \sum _{k=0}^{m_{j}} d_k(Y_j)\sum _{h=0}^{k} \tilde{b}_{{j}h} - \sum _{{j}=1}^{n_2} \log (\tilde{\Gamma }_{{j}}(\theta _2))\nonumber \\ \end{aligned}$$
(27)

The last equality holds because \(\sum _{k=0}^{m_{i}} d_k(X_{i})=\sum _{k=0}^{m_{j}} d_k(Y_j)=1\) under the assumption that no data are missing, which means, for example, that \(\sum _{k=0}^{m_{i}} d_k(X_{i})\log (\Gamma _{i}(\theta _1))=\log (\Gamma _{i}(\theta _1))\).

Let us apply the transformations \(\psi =\theta _2-\theta _1\) and \(\lambda =\theta _1\), which means that \(\theta _1=\lambda \) and \(\theta _2=\psi +\lambda \). Let us also denote

$$\begin{aligned} S_1=\sum _{{i}=1}^{n_1} a_{i} \sum _{k=0}^{m_{i}} (k+1) d_k(X_{i}), \text{ and } S_2= \sum _{{j}=1}^{n_2} \tilde{a}_{j} \sum _{k=0}^{m_{j}} (k+1) d_k(Y_j) \cdot \end{aligned}$$

Note that both \(S_1\) and \(S_2\) are functions of the data (the \(X_{i}\)’s and \(Y_j\)’s) and not of the parameters (\(\theta _1\) and \(\theta _2\)). In terms of \(\psi \) and \(\lambda \), the log likelihood \(\ell (\theta _1,\theta _2)\), now written \(\ell (\psi ,\lambda )\), is given by

$$\begin{aligned} \ell (\psi ,\lambda )= & {} \sum _{{i}=1}^{n_1} \sum _{k=0}^{m_{i}} d_k(X_{i}) \log P_{{i}k}(\lambda )+\sum _{j=1}^{n_2} \sum _{k=0}^{m_{j}} d_k(Y_j) \log \tilde{P}_{{j}k}(\psi +\lambda ) \end{aligned}$$
(28)
$$\begin{aligned}= & {} \lambda S_1 - \sum _{{i}=1}^{n_1} a_{i} \sum _{k=0}^{m_{i}} d_k(X_{i})\sum _{h=0}^{k} b_{{i}h} - \sum _{{i}=1}^{n_1} \log (\Gamma _{{i}}(\lambda )) \nonumber \\&+\, (\psi +\lambda ) S_2 - \sum _{{j}=1}^{n_2} \tilde{a}_{j} \sum _{k=0}^{m_{j}} d_k(Y_j)\sum _{h=0}^{k} \tilde{b}_{{j}h} - \sum _{{j}=1}^{n_2} \log (\tilde{\Gamma }_{{j}}(\psi +\lambda )) \nonumber \\= & {} S_2 \psi + (S_1+S_2) \lambda - \sum _{{i}=1}^{n_1} \log (\Gamma _{{i}}(\lambda )) - \sum _{{j}=1}^{n_2} \log (\tilde{\Gamma }_{{j}}(\psi +\lambda )) \nonumber \\&-\, \sum _{{i}=1}^{n_1} a_{i} \sum _{k=0}^{m_{i}} d_k(X_{i})\sum _{h=0}^{k} b_{{i}h} - \sum _{{j}=1}^{n_2} \tilde{a}_{j} \sum _{k=0}^{m_{j}} d_k(Y_j)\sum _{h=0}^{k} \tilde{b}_{{j}h} \cdot \end{aligned}$$
(29)

The log likelihood in Eq. 29 belongs to the exponential family of distributions with canonical parameters \(\psi \) and \(\lambda \); the corresponding canonical statistics, which are jointly sufficient, are \(S_2\) and \((S_1+S_2)\).
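
The exponential-family form in Eq. 29 can be verified by evaluating the log likelihood both directly (first line of Eq. 27) and through \(S_1\), \(S_2\), the cumulant terms and the constant. This sketch assumes that \(d_k(X)\) is the indicator of \(X=k\), so that \(\sum _k (k+1)d_k(X_i)=X_i+1\); the item parameters, responses and helper names are hypothetical.

```python
import math

def cum_terms(theta, a, b):
    """sum_{h=0}^{k} a*(theta - b[h]) for k = 0..m (log numerators of the GPCM)."""
    out, s = [], 0.0
    for bh in b:
        s += a * (theta - bh)
        out.append(s)
    return out

def log_gamma(theta, a, b):
    """log Gamma_i(theta), the log of the GPCM normalising sum (log-sum-exp)."""
    z = cum_terms(theta, a, b)
    zmax = max(z)
    return zmax + math.log(sum(math.exp(v - zmax) for v in z))

def loglik_direct(t1, t2, x, items1, y, items2):
    """First line of Eq. 27: log P summed over both item sets."""
    ll = sum(cum_terms(t1, a, b)[k] - log_gamma(t1, a, b) for k, (a, b) in zip(x, items1))
    ll += sum(cum_terms(t2, a, b)[k] - log_gamma(t2, a, b) for k, (a, b) in zip(y, items2))
    return ll

def loglik_canonical(psi, lam, x, items1, y, items2):
    """Eq. 29: S2*psi + (S1 + S2)*lam minus cumulant and constant terms."""
    s1 = sum(a * (k + 1) for k, (a, _) in zip(x, items1))
    s2 = sum(a * (k + 1) for k, (a, _) in zip(y, items2))
    kappa = (sum(log_gamma(lam, a, b) for a, b in items1)
             + sum(log_gamma(psi + lam, a, b) for a, b in items2))
    const = (sum(a * sum(b[:k + 1]) for k, (a, b) in zip(x, items1))
             + sum(a * sum(b[:k + 1]) for k, (a, b) in zip(y, items2)))
    return s2 * psi + (s1 + s2) * lam - kappa - const

items1 = [(1.2, [0.0, -0.5, 0.4]), (0.8, [0.0, 0.3]), (1.5, [0.0, -1.0, 0.0, 1.0])]
items2 = [(1.0, [0.0, 0.2, 0.8]), (1.3, [0.0, -0.3]), (0.9, [0.0, 0.5, -0.2, 0.9])]
x, y = [2, 0, 1], [2, 1, 1]

t1, t2 = -0.4, 0.7  # psi = t2 - t1, lambda = t1
print(loglik_direct(t1, t2, x, items1, y, items2),
      loglik_canonical(t2 - t1, t1, x, items1, y, items2))  # equal up to rounding
```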

Given the earlier discussion of the applicability of the MSLRT and the LRA to the exponential family of distributions, the MSLRT and the LRA can therefore be applied to test \(H_0:\psi =0\), or equivalently \(H_0:\theta _1=\theta _2\), under the GPCM. The SLR statistic is given by

$$\begin{aligned} r(\psi _0)= & {} \text{ sign }(\hat{\psi }-\psi _0) \left[ 2[\ell (\hat{\psi },\hat{\lambda })- \ell (\psi _0,\hat{\lambda }_{\psi _0})] \right] ^{1/2} \nonumber \\= & {} \text{ sign }(\hat{\theta }_2-\hat{\theta }_1) \sqrt{2} \left[ \sum _{{i}=1}^{n_1} \sum _{k=0}^{m_{i}} d_k(X_{i}) \log P_{{i}k}(\hat{\theta }_1)+\sum _{j=1}^{n_2} \sum _{k=0}^{m_{j}} d_k(Y_j) \log \tilde{P}_{{j}k}(\hat{\theta }_2) \right. \nonumber \\&- \left. \sum _{{i}=1}^{n_1} \sum _{k=0}^{m_{i}} d_k(X_{i}) \log P_{{i}k}(\hat{\theta }_0)-\sum _{j=1}^{n_2} \sum _{k=0}^{m_{j}} d_k(Y_j) \log \tilde{P}_{{j}k}(\hat{\theta }_0) \right] ^{1/2} \cdot \end{aligned}$$
(30)
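
A minimal sketch of computing \(r(\psi _0)\) in Eq. 30 follows. It assumes known item parameters and finds \(\hat{\theta }_1\), \(\hat{\theta }_2\) and \(\hat{\theta }_0\) by a damped Newton–Raphson iteration, a standard choice that is not necessarily the one used in the paper; the data and helper names are hypothetical. A finite MLE requires that the responses in each set are neither all in the lowest nor all in the highest category.

```python
import math

def gpcm_probs(theta, a, b):
    """GPCM category probabilities P_k(theta), k = 0..m, for one item."""
    z, s = [], 0.0
    for bh in b:
        s += a * (theta - bh)
        z.append(s)
    zmax = max(z)
    e = [math.exp(v - zmax) for v in z]
    total = sum(e)
    return [v / total for v in e]

def loglik(theta, resp, items):
    """Log likelihood: sum over items of log P_{i, x_i}(theta)."""
    return sum(math.log(gpcm_probs(theta, a, b)[x]) for x, (a, b) in zip(resp, items))

def mle(resp, items, theta=0.0, tol=1e-10, max_iter=200):
    """Damped Newton-Raphson MLE of theta for known GPCM item parameters."""
    for _ in range(max_iter):
        score = info = 0.0
        for x, (a, b) in zip(resp, items):
            p = gpcm_probs(theta, a, b)
            m1 = sum(k * pk for k, pk in enumerate(p))
            m2 = sum(k * k * pk for k, pk in enumerate(p))
            score += a * (x - m1)           # first derivative of the log likelihood
            info += a * a * (m2 - m1 * m1)  # minus the second derivative (Eq. 26)
        step = max(-1.0, min(1.0, score / info))  # cap the step length at 1
        theta += step
        if abs(step) < tol:
            break
    return theta

def signed_lr(x, items1, y, items2):
    """r(psi_0) of Eq. 30 for H0: theta1 = theta2, returned with the three MLEs."""
    t1, t2 = mle(x, items1), mle(y, items2)
    t0 = mle(x + y, items1 + items2)  # common-ability MLE from both item sets
    lr = loglik(t1, x, items1) + loglik(t2, y, items2) - loglik(t0, x + y, items1 + items2)
    r = math.copysign(math.sqrt(2.0 * max(lr, 0.0)), t2 - t1)
    return r, t1, t2, t0

items1 = [(1.2, [0.0, -0.5, 0.4]), (0.8, [0.0, 0.3]), (1.5, [0.0, -1.0, 0.0, 1.0])]
items2 = [(1.0, [0.0, 0.2, 0.8]), (1.3, [0.0, -0.3]), (0.9, [0.0, 0.5, -0.2, 0.9])]
x, y = [2, 0, 1], [2, 1, 1]

r, t1, t2, t0 = signed_lr(x, items1, y, items2)
print(r, t1, t2, t0)
```

In an exponential family the combined score is the sum of the two set-specific scores, so \(\hat{\theta }_0\) should fall between \(\hat{\theta }_1\) and \(\hat{\theta }_2\), which is a quick sanity check on the output.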

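Then, using Eq. 26 (or differentiating the joint log likelihood in Eq. 29 twice), the elements of the observed information matrix \(j(\psi ,\lambda )\), the negative of the matrix of second derivatives of \(\ell (\psi ,\lambda )\), are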

$$\begin{aligned} j_{\psi \psi }({\psi },{\lambda })= & {} \sum _{j=1}^{n_2} \tilde{a}^2_{j} \mathop {\mathrm {Var}}\nolimits (Y_j|\psi +\lambda ), \nonumber \\ j_{\lambda \lambda }({\psi },{\lambda })= & {} \sum _{i=1}^{n_1} a^2_{i} \mathop {\mathrm {Var}}\nolimits (X_{i}|\lambda ) + \sum _{j=1}^{n_2} \tilde{a}^2_{j} \mathop {\mathrm {Var}}\nolimits (Y_j|\psi +\lambda ), \nonumber \\ j_{\psi \lambda }({\psi },{\lambda })= & {} j_{\lambda \psi }({\psi },{\lambda })=j_{\psi \psi }({\psi },{\lambda })\cdot \end{aligned}$$
(31)

Then,

$$\begin{aligned} |j({\psi },{\lambda })|= & {} j_{\psi \psi }({\psi },{\lambda })j_{\lambda \lambda }({\psi },{\lambda })-[j_{\psi \psi }({\psi },{\lambda })]^2= j_{\psi \psi }({\psi },{\lambda }) [j_{\lambda \lambda }({\psi },{\lambda })-j_{\psi \psi }({\psi },{\lambda })]\\= & {} \sum _{j=1}^{n_2} \tilde{a}^2_{j} \mathop {\mathrm {Var}}\nolimits (Y_j|\psi +\lambda ) \sum _{i=1}^{n_1} a^2_{i} \mathop {\mathrm {Var}}\nolimits (X_{i}|\lambda ), \end{aligned}$$

which, because \(\hat{\lambda }=\hat{\theta }_1\) and \(\hat{\psi }+\hat{\lambda }=\hat{\theta }_2\), implies that

$$\begin{aligned} |j({\hat{\psi }},{\hat{\lambda }})|= & {} \sum _{i=1}^{n_1}a^2_{i} \mathop {\mathrm {Var}}\nolimits (X_{i}|\hat{\theta }_1) \sum _{j=1}^{n_2} \tilde{a}^2_{j} \mathop {\mathrm {Var}}\nolimits (Y_j|\hat{\theta }_2) \cdot \end{aligned}$$
(32)

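Because \(\psi _0=0\), the constrained estimate \(\hat{\lambda }_{\psi _0}\) equals \(\hat{\theta }_0\), the ability estimate based on both sets of items, so \(j_{\lambda \lambda }(\psi _0,\hat{\lambda }_{\psi _0})\) follows from Eq. 31 as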

$$\begin{aligned} j_{\lambda \lambda }(\psi _0,\hat{\lambda }_{\psi _0})= & {} \sum _{i=1}^{n_1} a^2_{i} \mathop {\mathrm {Var}}\nolimits (X_{i}|\hat{\theta }_0) + \sum _{j=1}^{n_2} \tilde{a}^2_{j} \mathop {\mathrm {Var}}\nolimits (Y_j|\hat{\theta }_0) \end{aligned}$$
(33)

Then, \(q(\psi _0)\) is given by

$$\begin{aligned} q(\psi _0)= & {} (\hat{\theta }_2-\hat{\theta }_1) \sqrt{\frac{|j(\hat{\psi },\hat{\lambda })|}{j_{\lambda \lambda }(\psi _0,\hat{\lambda }_{\psi _0})}} \nonumber \\= & {} (\hat{\theta }_2-\hat{\theta }_1) \sqrt{\frac{\sum _{i=1}^{n_1}a^2_{i} \mathop {\mathrm {Var}}\nolimits (X_{i}|\hat{\theta }_1) \sum _{j=1}^{n_2} \tilde{a}^2_{j} \mathop {\mathrm {Var}}\nolimits (Y_j|\hat{\theta }_2)}{\sum _{i=1}^{n_1} a^2_{i} \mathop {\mathrm {Var}}\nolimits (X_{i}|\hat{\theta }_0) + \sum _{j=1}^{n_2} \tilde{a}^2_{j} \mathop {\mathrm {Var}}\nolimits (Y_j|\hat{\theta }_0) }} \cdot \end{aligned}$$
(34)
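
Eqs. 31 to 34 can be assembled into a small function. The sketch below is illustrative: the item parameters are hypothetical, the helper names are not from the paper, and the three ability estimates are placeholders standing in for the MLEs \(\hat{\theta }_1\), \(\hat{\theta }_2\) and \(\hat{\theta }_0\).

```python
import math

def gpcm_probs(theta, a, b):
    """GPCM category probabilities P_k(theta), k = 0..m, for one item."""
    z, s = [], 0.0
    for bh in b:
        s += a * (theta - bh)
        z.append(s)
    zmax = max(z)
    e = [math.exp(v - zmax) for v in z]
    total = sum(e)
    return [v / total for v in e]

def set_information(theta, items):
    """sum_i a_i^2 Var(X_i | theta) over one item set (minus Eq. 26)."""
    total = 0.0
    for a, b in items:
        p = gpcm_probs(theta, a, b)
        m1 = sum(k * pk for k, pk in enumerate(p))
        m2 = sum(k * k * pk for k, pk in enumerate(p))
        total += a * a * (m2 - m1 * m1)
    return total

def q_psi0(t1_hat, t2_hat, t0_hat, items1, items2):
    """q(psi_0) of Eq. 34 assembled from Eqs. 31 to 33."""
    # Eq. 31 at (psi_hat, lambda_hat): lambda_hat = t1_hat, psi_hat + lambda_hat = t2_hat.
    i1 = set_information(t1_hat, items1)
    i2 = set_information(t2_hat, items2)
    j_pp, j_ll, j_pl = i2, i1 + i2, i2
    det_j = j_pp * j_ll - j_pl ** 2  # reduces to i1 * i2, as in Eq. 32
    # Eq. 33: j_lambda_lambda at psi_0 = 0 and lambda = t0_hat.
    j_ll0 = set_information(t0_hat, items1) + set_information(t0_hat, items2)
    return (t2_hat - t1_hat) * math.sqrt(det_j / j_ll0)

items1 = [(1.2, [0.0, -0.5, 0.4]), (0.8, [0.0, 0.3]), (1.5, [0.0, -1.0, 0.0, 1.0])]
items2 = [(1.0, [0.0, 0.2, 0.8]), (1.3, [0.0, -0.3]), (0.9, [0.0, 0.5, -0.2, 0.9])]

# Placeholder ability estimates; in practice these are the three MLEs.
q = q_psi0(-0.35, 0.62, 0.12, items1, items2)
print(q)
```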

Once \(q(\psi _0)\) has been computed using Eq. 34 and \(r(\psi _0)\) using Eq. 30, the MSLR statistic \(r^*(\psi _0)\) can be computed using Eq. 12, and the LRA using Eq. 14.
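
Eqs. 12 and 14 are not shown in this appendix. Assuming they take the standard forms \(r^*=r+r^{-1}\log (q/r)\) (Barndorff-Nielsen, 1986) and \(\Phi (r)+\phi (r)(1/r-1/q)\) (Lugannani & Rice, 1980), the last step could be sketched as follows; the values of r and q are hypothetical, and the sketch treats the LRA as an approximation to the lower-tail probability \(P(R\le r)\). Both formulas become numerically unstable as r approaches 0, so a practical implementation would need special handling there. The last value printed is the standard normal quantile of the LRA, the transformation described in Note 5.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf, pdf and inv_cdf

def mslr(r, q):
    """Modified signed likelihood ratio r* = r + log(q / r) / r (assumed form of Eq. 12)."""
    return r + math.log(q / r) / r

def lugannani_rice(r, q):
    """Phi(r) + phi(r) * (1/r - 1/q), approximating P(R <= r) (assumed form of Eq. 14)."""
    return STD_NORMAL.cdf(r) + STD_NORMAL.pdf(r) * (1.0 / r - 1.0 / q)

# Hypothetical values; r and q share the sign of theta2_hat - theta1_hat, so q / r > 0.
r, q = 1.5, 1.6
r_star = mslr(r, q)
lra = lugannani_rice(r, q)
print(r_star, STD_NORMAL.cdf(r_star), lra, STD_NORMAL.inv_cdf(lra))
```

The two higher-order approximations are asymptotically equivalent, so \(\Phi (r^*)\) and the LRA should be close, as they are for these inputs.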

Cite this article

Sinharay, S., Jensen, J.L. Higher-Order Asymptotics and Its Application to Testing the Equality of the Examinee Ability Over Two Sets of Items. Psychometrika 84, 484–510 (2019). https://doi.org/10.1007/s11336-018-9627-8

  • DOI: https://doi.org/10.1007/s11336-018-9627-8
