Comparison of Outlier Detection Methods in NEAT Design

Liu, Chunyan; Jurich, Daniel

doi:10.1007/978-3-030-74772-5_20

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 353)

Included in the following conference series:

The Annual Meeting of the Psychometric Society

Abstract

In equating practice, outliers among the anchor items can degrade equating accuracy and threaten the validity of test scores. This simulation study compared the performance of three outlier detection methods in equating: the t-test method, the logit difference method, and the robust z statistic. The manipulated factors were sample size, proportion of outliers, direction of item difficulty drift, and group difference. Across all simulated conditions, the t-test method outperformed the other two methods in sensitivity (flagging true outliers), specificity (leaving true non-outliers unflagged), bias of the translation constant, and root mean square error of the estimated examinee ability.
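
As a rough illustration of two of the methods named above, the following sketch flags anchor items from Rasch difficulty estimates on two forms and then scores the flags against known drift status. It is not the authors' implementation. Everything here is an assumption: the function names, the toy difficulty values, the single-pass (non-iterative) logit difference rule with a 0.3 logit cutoff after removing the mean shift, the commonly used robust z form (difference minus median, divided by 0.74 times the interquartile range), and the 1.645 cutoff. The t-test method is omitted because it requires standard errors of the item estimates and its exact form is not given in the abstract.

```python
from statistics import mean, median, quantiles


def difficulty_shifts(b_old, b_new):
    """Per-item difference in difficulty (new form minus old form), in logits."""
    return [new - old for old, new in zip(b_old, b_new)]


def logit_difference_flags(b_old, b_new, cutoff=0.3):
    """Single-pass logit difference rule (assumed form): flag an anchor item
    when its shift differs from the mean shift by more than `cutoff` logits."""
    d = difficulty_shifts(b_old, b_new)
    shift = mean(d)  # mean-mean translation constant (assumed linking method)
    return [abs(di - shift) > cutoff for di in d]


def robust_z_flags(b_old, b_new, cutoff=1.645):
    """Robust z (assumed form): (d - median(d)) / (0.74 * IQR(d))."""
    d = difficulty_shifts(b_old, b_new)
    q1, _, q3 = quantiles(d, n=4, method="inclusive")
    scale = 0.74 * (q3 - q1)  # undefined if the IQR is zero
    med = median(d)
    return [abs((di - med) / scale) > cutoff for di in d]


def sensitivity_specificity(flags, truth):
    """Sensitivity = flagged true outliers / all true outliers;
    specificity = unflagged non-outliers / all non-outliers."""
    tp = sum(f and t for f, t in zip(flags, truth))
    tn = sum((not f) and (not t) for f, t in zip(flags, truth))
    return tp / sum(truth), tn / (len(truth) - sum(truth))


# Toy data (hypothetical): eight anchor items, item 3 drifts by about one logit.
b_old = [-1.5, -1.0, -0.5, 0.0, 0.3, 0.8, 1.2, 1.6]
shifts = [0.15, 0.25, 0.17, 1.20, 0.20, 0.28, 0.18, 0.22]
b_new = [b + s for b, s in zip(b_old, shifts)]
truth = [i == 3 for i in range(len(b_old))]

ld_flags = logit_difference_flags(b_old, b_new)
rz_flags = robust_z_flags(b_old, b_new)
print("logit difference:", ld_flags, sensitivity_specificity(ld_flags, truth))
print("robust z:        ", rz_flags, sensitivity_specificity(rz_flags, truth))
```

The robust z uses the median and interquartile range rather than the mean and standard deviation, so a single large drift has little influence on the centre and scale used to judge the other items. The mean shift in the logit difference rule, by contrast, is pulled toward the outlier.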

Author information

Authors and Affiliations

National Board of Medical Examiners®, Philadelphia, PA, USA
Chunyan Liu & Daniel Jurich

Corresponding author

Correspondence to Chunyan Liu.

Editor information

Editors and Affiliations

Department of Statistics, USBE, Umeå University, Umeå, Västerbottens Län, Sweden
Marie Wiberg
Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
Dylan Molenaar
Facultad de Matemáticas, Pontificia Universidad Católica de Chile, Santiago, Chile
Jorge González
Kellogg School of Management, Northwestern University, Evanston, IL, USA
Ulf Böckenholt
Department of Educational Psychology, University of Wisconsin-Madison, Madison, WI, USA
Jee-Seon Kim

About this paper

Cite this paper

Liu, C., Jurich, D. (2021). Comparison of Outlier Detection Methods in NEAT Design. In: Wiberg, M., Molenaar, D., González, J., Böckenholt, U., Kim, JS. (eds) Quantitative Psychology. IMPS 2020. Springer Proceedings in Mathematics & Statistics, vol 353. Springer, Cham. https://doi.org/10.1007/978-3-030-74772-5_20

DOI: https://doi.org/10.1007/978-3-030-74772-5_20
Published: 23 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74771-8
Online ISBN: 978-3-030-74772-5
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)