Detecting test fraud using Bayes factors

  • Invited Paper
  • Behaviormetrika

Abstract

According to Wollack and Schoenig (2018), score differencing is one of six types of statistical methods used to detect test fraud. In this paper, we suggest the use of Bayes factors (e.g., Kass and Raftery 1995) for score differencing. A simulation study shows that the suggested approach performs slightly better than an existing frequentist approach. We demonstrate the usefulness of the suggested approach using a real data set that involves actual test fraud.
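The abstract does not spell out the computation, so the following is only a minimal sketch of how a Bayes factor for score differencing might be computed, not the authors' implementation. It assumes a Rasch model with known item difficulties, a null hypothesis of one common ability with a standard normal prior, and an alternative that the ability on the second item set is at least as large as on the first, with the prior form borrowed from Note 2. The function names, the grid bounds, and the choice of null prior are all hypothetical.

```python
import math

def rasch_likelihood(theta, responses, difficulties):
    """Likelihood of 0/1 responses under a Rasch model (assumed model)."""
    like = 1.0
    for x, b in zip(responses, difficulties):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        like *= p if x == 1 else 1.0 - p
    return like

def normal_pdf(x, sd):
    """Density of a zero-mean normal with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def score_differencing_bf(resp1, diff1, resp2, diff2,
                          sd1=1.0, sd2=math.sqrt(10.0),
                          lo=-15.0, hi=15.0, n=601):
    """Bayes factor of H1 (theta2 >= theta1) against H0 (theta1 == theta2).

    Both marginal likelihoods are approximated by Riemann sums on a grid.
    """
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    # Likelihood times prior for item set 1, and for item set 2.
    w1 = [rasch_likelihood(t, resp1, diff1) * normal_pdf(t, sd1) for t in grid]
    like2 = [rasch_likelihood(t, resp2, diff2) for t in grid]
    w2 = [l * normal_pdf(t, sd2) for t, l in zip(grid, like2)]

    # H0: a single ability theta ~ N(0, sd1^2) drives both item sets (assumption).
    m0 = sum(a * l for a, l in zip(w1, like2)) * step

    # H1: prior 2 * N(0, sd1^2) * N(0, sd2^2) restricted to theta2 >= theta1.
    # A running tail sum over theta2 keeps the double integral linear in n.
    tail = 0.0
    m1 = 0.0
    for i in range(n - 1, -1, -1):
        tail += w2[i]
        m1 += w1[i] * tail
    m1 *= 2.0 * step * step
    return m1 / m0

# Hypothetical examinees on two 10-item sets of average difficulty.
items = [0.0] * 10
bf_gain = score_differencing_bf([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], items, [1] * 10, items)
bf_flat = score_differencing_bf([1, 0] * 5, items, [1, 0] * 5, items)
print(bf_gain, bf_flat)
```

Under these assumptions, a large jump in performance on the second item set should yield a Bayes factor well above 1, while identical performance on both sets should yield a value below 1.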

Notes

  1. Note that the term “score differencing” was used in only one of these references. However, the methods suggested in these references are various versions of “score differencing.”

  2. Note that \(\int _{\theta _1=-\infty }^{\theta _1=\infty }\int _{\theta _2=-\infty }^{\theta _2=\infty }2 \phi (\theta _1)\frac{1}{\sqrt{10}}\phi (\frac{\theta _2}{\sqrt{10}})I(\theta _2\ge \theta _1)\mathrm{d}\theta _1\mathrm{d}\theta _2=1\).

  3. Sinharay (2017a) noted this phenomenon, which occurs when \(\hat{\theta }_1\) and \(\hat{\theta }_2\) are very close: a conclusion of no significant score difference is made for the corresponding examinees.

  4. That can be achieved by using 1.64 as the cutoff for the SLR statistic and a simulation-based cutoff for the Bayes factor.

  5. In those simulations, however, we noticed a slight tendency for the Bayes factor to increase as the prior variances of the ability distributions increased.

  6. Sinharay and Johnson (2020) made some progress regarding the use of the posterior probability for score differencing.
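
The identity in Note 2 and the cutoff in Note 4 can be checked numerically. The sketch below assumes that the integrand in Note 2 is the product of independent N(0, 1) and N(0, 10) densities restricted to \(\theta _2\ge \theta _1\), so that the integral equals twice the probability that \(\theta _2-\theta _1\ge 0\). It also assumes that 1.64 in Note 4 corresponds to a one-sided 5% level under a standard normal reference distribution, which the notes do not state explicitly. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def constrained_prior_mass(sd1=1.0, sd2=math.sqrt(10.0)):
    """Return 2 * P(theta2 >= theta1) for independent zero-mean normals."""
    # theta2 - theta1 is normal with mean 0 and variance sd1^2 + sd2^2.
    diff = NormalDist(mu=0.0, sigma=math.sqrt(sd1 ** 2 + sd2 ** 2))
    return 2.0 * (1.0 - diff.cdf(0.0))

def monte_carlo_mass(n=200_000, seed=1):
    """Simulation check of the same quantity."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        theta1 = rng.gauss(0.0, 1.0)
        theta2 = rng.gauss(0.0, math.sqrt(10.0))
        hits += theta2 >= theta1
    return 2.0 * hits / n

exact_mass = constrained_prior_mass()
simulated_mass = monte_carlo_mass()
# 0.95 quantile of the standard normal, to compare with the 1.64 in Note 4.
one_sided_cutoff = NormalDist().inv_cdf(0.95)
print(exact_mass, simulated_mass, one_sided_cutoff)
```

By symmetry the difference of two zero-mean normals is nonnegative with probability one half, which is why the factor 2 makes the constrained prior integrate to 1 regardless of the two variances.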

References

  • Allen J, Ghattas A (2016) Estimating the probability of traditional copying, conditional on answer-copying statistics. Appl Psychol Meas 40:258–273

  • Chen W-H, Thissen D (1997) Local dependence indexes for item pairs using item response theory. J Educ Behav Stat 22:265–289

  • Cizek GJ, Wollack JA (2017) Handbook of detecting cheating on tests. Routledge, Washington, DC

  • Cox DR (2006) Principles of statistical inference. Cambridge University Press, New York

  • Drasgow F, Levine MV, Williams EA (1985) Appropriateness measurement with polychotomous item response models and standardized indices. Br J Math Stat Psychol 38:67–86

  • Finkelman M, Weiss DJ, Kim-Kang G (2010) Item selection and hypothesis testing for the adaptive measurement of change. Appl Psychol Meas 34:238–254

  • Fischer GH (2003) The precision of gain scores under an item response theory perspective: a comparison of asymptotic and exact conditional inference about change. Appl Psychol Meas 27:3–26

  • Fox J-P, Mulder J, Sinharay S (2017) Bayes factor covariance testing in item response models. Psychometrika 82:979–1006

  • Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2014) Bayesian data analysis, 3rd edn. Chapman and Hall, New York

  • Gu X, Mulder J, Deković M, Hoijtink H (2014) Bayesian evaluation of inequality constrained hypotheses. Psychol Methods 19:511–527

  • Guo J, Drasgow F (2010) Identifying cheating on unproctored internet tests: the Z-test and the likelihood ratio test. Int J Sel Assess 18:351–364

  • Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29–36

  • Hoijtink H, Mulder J, van Lissa C, Gu X (2019) A tutorial on testing hypotheses using the Bayes factor. Psychol Methods. https://doi.org/10.1037/met0000201

  • Jeffreys H (1961) Theory of probability. Oxford University Press, Oxford

  • Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795

  • Klugkist I, Laudy O, Hoijtink H (2005) Inequality constrained analysis of variance: a Bayesian approach. Psychol Methods 10:477–493

  • Masson MEJ (2011) A tutorial on a practical Bayesian alternative to null-hypothesis significance testing. Behav Res Methods 43:679–690

  • Morey RD, Romeijn J-W, Rouder JN (2016) The philosophy of Bayes factors and the quantification of statistical evidence. J Math Psychol 72:6–18

  • Mulder J, Klugkist I, van de Schoot R, Meeus WHJ, Selfhout M, Hoijtink H (2009) Bayesian model selection of informative hypotheses for repeated measurements. J Math Psychol 53:530–546

  • Orlando M, Thissen D (2000) Likelihood-based item-fit indices for dichotomous item response theory models. Appl Psychol Meas 24:50–64

  • Schönbrodt FD, Wagenmakers E-J, Zehetleitner M, Perugini M (2017) Sequential hypothesis testing with Bayes factors: efficiently testing mean differences. Psychol Methods 22:322–339

  • Sinharay S (2017a) Detection of item preknowledge using likelihood ratio test and score test. J Educ Behav Stat 42:46–68

  • Sinharay S (2017b) Which statistic should be used to detect item preknowledge when the set of compromised items is known? Appl Psychol Meas 41:403–421

  • Sinharay S (2018) Application of Bayesian methods for detecting fraudulent behavior on tests. Meas Interdiscip Res Perspect 16:100–113

  • Sinharay S, Jensen JL (2019) Higher-order asymptotics and its application to testing the equality of the examinee ability over two sets of items. Psychometrika 84:484–510

  • Sinharay S, Johnson MS (2020) The use of the posterior probability in score differencing. J Educ Behav Stat (in press)

  • Sinharay S, Duong MQ, Wood SW (2017) A new statistic for detection of aberrant answer changes. J Educ Meas 54:200–217

  • Skorupski WP, Wainer H (2017) The case for Bayesian methods when investigating test fraud. In: Cizek GJ, Wollack JA (eds) Handbook of detecting cheating on tests. Routledge, Washington, DC, pp 214–231

  • Stern HS (2005) Model inference or model selection: discussion of Klugkist, Laudy, and Hoijtink (2005). Psychol Methods 10:494–499

  • Tendeiro JN, Meijer RR (2014) Detection of invalid test scores: the usefulness of simple nonparametric statistics. J Educ Meas 51:239–259

  • Tijmstra J, Hoijtink H, Sijtsma K (2015) Evaluating manifest monotonicity using Bayes factors. Psychometrika 80:880–896

  • van der Linden WJ (2009) A bivariate lognormal response-time model for the detection of collusion between test takers. J Educ Behav Stat 34:378–394

  • van der Linden WJ, Lewis C (2015) Bayesian checks on cheating on tests. Psychometrika 80:689–706

  • Verhagen J, Levy R, Millsap RE, Fox J-P (2016) Evaluating evidence for invariant items: a Bayes factor applied to testing measurement invariance in IRT models. J Math Psychol 72:171–182

  • Wagenmakers E-J (2007) A practical solution to the pervasive problems of p values. Psychon Bull Rev 14:779–804

  • Wang X, Liu Y, Hambleton RK (2017) Detecting item preknowledge using a predictive checking method. Appl Psychol Meas 41:243–263

  • Wang X, Liu Y, Robin F, Guo H (2019) A comparison of methods for detecting examinee preknowledge of items. Int J Test 19:207–226

  • Warm TA (1989) Weighted likelihood estimation of ability in item response theory. Psychometrika 54:427–450

  • Wasserman L (2004) All of statistics: a concise course in statistical inference. Springer, New York

  • Wasserstein RL, Lazar NA (2016) The ASA’s statement on p-values: context, process, and purpose. Am Stat 70:129–133

  • Wetzels R, Matzke D, Lee MD, Rouder JN, Iverson GJ, Wagenmakers E-J (2011) Statistical evidence in experimental psychology. Perspect Psychol Sci 6:291–298

  • Wollack JA, Schoenig RW (2018) Cheating. In: Frey BB (ed) The SAGE encyclopedia of educational research, measurement, and evaluation. Sage, Thousand Oaks, pp 260–265

  • Wollack JA, Cohen AS, Eckerly CA (2015) Detecting test tampering using item response theory. Educ Psychol Meas 75:931–953


Acknowledgements

The authors wish to express sincere appreciation and gratitude to Wim van der Linden and Kazuo Shigemasu, the editors. The authors thank Sooyeon Kim, Carol Eckerly, and Daniel McCaffrey for their helpful comments on an earlier version. Any opinions expressed in this publication are those of the authors and not necessarily those of ETS or the Institute of Education Sciences. The research was supported by the Institute of Education Sciences, US Department of Education, through Grant R305D170026.

Author information

Corresponding author

Correspondence to Sandip Sinharay.

Additional information

Communicated by Kazuo Shigemasu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Any opinions expressed in this publication are those of the authors and not necessarily of Educational Testing Service or Institute of Education Sciences.

About this article

Cite this article

Sinharay, S., Johnson, M.S. Detecting test fraud using Bayes factors. Behaviormetrika 47, 339–354 (2020). https://doi.org/10.1007/s41237-020-00113-9

  • DOI: https://doi.org/10.1007/s41237-020-00113-9
