Abstract
There is a growing consensus among educational measurement experts and psychometricians that test-taker characteristics may unduly affect performance on tests. Such effects introduce construct-irrelevant variance into the scores and thus render the test biased. Hence, it is incumbent on test developers and users alike to provide evidence that their tests are free of such bias. The present study employed generalizability theory to examine gender differential performance on a high-stakes language proficiency test, the University of Tehran English Proficiency Test. An analysis of the performance of 2,343 examinees who took the test in 2009 indicated that the relative contributions of the different facets to score variance were almost uniform across the gender groups. Further, there was no significant interaction between items and persons, indicating that the relative standings of the persons were uniform across all items. The lambda reliability coefficients were also uniformly high. All in all, the study provides evidence that the test is free of gender bias and enjoys a high level of dependability.
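The analysis described above rests on a persons × items (p × i) generalizability study: variance components for persons, items, and their interaction are estimated from two-way ANOVA mean squares, and a generalizability coefficient is computed from them. The sketch below illustrates that standard computation on simulated dichotomous data; the function name, the simulated responses, and the sample sizes are illustrative assumptions, not the study's actual data or software (the study used EduG).

```python
import numpy as np

def g_study_p_x_i(scores):
    """Estimate variance components for a fully crossed persons x items
    (p x i) design from expected mean squares, and return the
    generalizability (relative) coefficient for an n_i-item test."""
    n_p, n_i = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    item_means = scores.mean(axis=0)
    # Sums of squares for persons, items, and the residual (p x i) term
    ss_p = n_i * ((person_means - grand) ** 2).sum()
    ss_i = n_p * ((item_means - grand) ** 2).sum()
    ss_total = ((scores - grand) ** 2).sum()
    ss_pi = ss_total - ss_p - ss_i
    # Mean squares
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_pi = ss_pi / ((n_p - 1) * (n_i - 1))
    # Solve the expected-mean-square equations for the variance components,
    # truncating any negative estimates at zero
    var_pi = ms_pi
    var_p = max((ms_p - ms_pi) / n_i, 0.0)
    var_i = max((ms_i - ms_pi) / n_p, 0.0)
    # G coefficient for relative decisions: universe-score variance over
    # universe-score variance plus relative error variance
    g_coef = var_p / (var_p + var_pi / n_i)
    return var_p, var_i, var_pi, g_coef

# Simulated dichotomous responses: 200 persons x 40 items
rng = np.random.default_rng(0)
ability = rng.normal(0, 1, size=(200, 1))
difficulty = rng.normal(0, 1, size=(1, 40))
prob = 1 / (1 + np.exp(-(ability - difficulty)))
scores = (rng.random((200, 40)) < prob).astype(float)

var_p, var_i, var_pi, g_coef = g_study_p_x_i(scores)
print(f"persons: {var_p:.4f}  items: {var_i:.4f}  "
      f"p x i: {var_pi:.4f}  G: {g_coef:.3f}")
```

In a study of gender differential performance, this decomposition would be run separately for each gender group; comparable variance-component profiles across groups, as reported above, are evidence against bias.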
Notes
The EduG software, along with its manual, is freely available from http://www.irdp.ch/edumetrie/englishprogram.htm.
Cite this article
Karami, H. An investigation of the gender differential performance on a high-stakes language proficiency test in Iran. Asia Pacific Educ. Rev. 14, 435–444 (2013). https://doi.org/10.1007/s12564-013-9272-y