Abstract
Studies of measurement invariance are essential both for validating high-stakes tests and for ensuring that the results of such examinations are fair to students with special needs. The aim of this study is to examine the measurement invariance of the “Central Examination for Secondary Education Institutions” in Turkey according to participants’ disability status. A focal group of 369 visually impaired students was formed, and an equal number of non-visually impaired peers were randomly selected as a reference group. The Mantel-Haenszel, logistic regression, Breslow-Day, and standardization methods of classical test theory were used to detect items with differential item functioning (DIF). The DIF analyses showed that 16 (17.78%) of the 90 test items exhibited DIF, and that ten of the flagged items (62.5%) disadvantaged the visually impaired participants. A total of 17 experts were consulted to investigate item bias. Based on the collective expert opinion, five items were judged “biased”, in the Turkish (n = 1), English (n = 2), and Science (n = 2) subtests. The experts closely agreed that the biased items favored the non-visually impaired participants. The use of visuals/graphics, complex or lengthy stems and response options, the need to reread questions, and negative attitudes of readers/coders were identified as sources of item bias.
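Of the four classical-test-theory DIF procedures named above, the Mantel-Haenszel method lends itself to a compact illustration. The sketch below is not the author’s analysis code (none is published with the article); it is a minimal Python implementation of the standard Mantel-Haenszel DIF statistic with the ETS delta effect size, run on hypothetical data shaped like the study’s design (369 focal and 369 reference examinees, 90 dichotomous items). All function and variable names (`mantel_haenszel_dif`, `group`, `strata`) are illustrative.

```python
import numpy as np

def mantel_haenszel_dif(item, group, strata):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    item   : 0/1 item scores for all examinees
    group  : 0 = reference (e.g., non-visually impaired),
             1 = focal (e.g., visually impaired)
    strata : matching variable, typically the total test score
    Returns the MH chi-square (with continuity correction) and the
    ETS delta effect size, delta_MH = -2.35 * ln(alpha_MH).
    """
    item, group, strata = map(np.asarray, (item, group, strata))
    sum_A = sum_EA = sum_varA = num = den = 0.0
    for k in np.unique(strata):
        m = strata == k
        # 2x2 table (group x correct/incorrect) at score level k
        A = np.sum((group[m] == 0) & (item[m] == 1))  # reference correct
        B = np.sum((group[m] == 0) & (item[m] == 0))  # reference incorrect
        C = np.sum((group[m] == 1) & (item[m] == 1))  # focal correct
        D = np.sum((group[m] == 1) & (item[m] == 0))  # focal incorrect
        N = A + B + C + D
        if N < 2:
            continue  # stratum too thin to contribute
        n_R, n_F = A + B, C + D   # group margins
        m1, m0 = A + C, B + D     # correct/incorrect margins
        sum_A += A
        sum_EA += n_R * m1 / N
        sum_varA += n_R * n_F * m1 * m0 / (N**2 * (N - 1))
        num += A * D / N
        den += B * C / N

    alpha_MH = num / den                                   # common odds ratio
    chi2_MH = (abs(sum_A - sum_EA) - 0.5) ** 2 / sum_varA  # MH chi-square
    delta_MH = -2.35 * np.log(alpha_MH)                    # ETS delta metric
    return chi2_MH, delta_MH

# Hypothetical data mirroring the 369 + 369 design and 90-item test
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 369)
total = rng.integers(0, 91, size=738)   # total scores used as strata
item = rng.binomial(1, 0.6, size=738)   # one item's 0/1 responses
chi2, delta = mantel_haenszel_dif(item, group, total)
print(f"MH chi-square = {chi2:.2f}, ETS delta = {delta:.2f}")
```

Under the common ETS convention, an item is classified as showing large (category C) DIF when its MH chi-square is significant and |delta_MH| ≥ 1.5; a negative delta_MH indicates that the item favors the reference group, i.e., works against the focal group.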

Cite this article
Şenel, S. Assessing measurement invariance of Turkish “Central Examination for Secondary Education Institutions” for visually impaired students. Educ Asse Eval Acc 33, 621–648 (2021). https://doi.org/10.1007/s11092-020-09345-5


