
An investigation of the gender differential performance on a high-stakes language proficiency test in Iran


Abstract

There has been a growing consensus among educational measurement experts and psychometricians that test-taker characteristics may unduly affect performance on tests. Such effects introduce construct-irrelevant variance into the scores and thus render the test biased. Hence, it is incumbent on test developers and users alike to provide evidence that their tests are free of such bias. The present study employed generalizability theory to examine gender differential performance on a high-stakes language proficiency test, the University of Tehran English Proficiency Test. An analysis of the performance of 2,343 examinees who had taken the test in 2009 indicated that the relative contributions of the different facets to score variance were almost uniform across the gender groups. Further, there was no significant interaction between items and persons, indicating that the relative standings of the persons were uniform across all items. The lambda reliability coefficients were also uniformly high. All in all, the study provides evidence that the test is free of gender bias and enjoys a high level of dependability.
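
For readers unfamiliar with how such a G-study decomposes score variance, the sketch below illustrates the basic persons x items (p x i) crossed random-effects design on a scored response matrix. This is a minimal illustration under stated assumptions, not the EduG analysis reported in the article: the function name g_study_pxi and the toy response matrix are hypothetical, and the D-study defaults to the same number of items as the G-study.

```python
import numpy as np

def g_study_pxi(scores, n_items_decision=None):
    """Variance components and dependability for a persons x items (p x i)
    crossed random-effects design (single-facet G-study sketch).

    scores : 2-D array, rows = persons, columns = items (0/1 or graded).
    n_items_decision : number of items assumed in the D-study
                       (defaults to the observed number of items).
    """
    scores = np.asarray(scores, dtype=float)
    n_p, n_i = scores.shape
    grand = scores.mean()

    # Sums of squares for the two main effects and the residual.
    ss_p = n_i * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_i = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_tot = ((scores - grand) ** 2).sum()
    ss_res = ss_tot - ss_p - ss_i

    # Mean squares.
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Expected-mean-square solutions for the variance components
    # (negative estimates are truncated to zero, a common convention).
    var_res = ms_res                         # sigma^2(pi,e)
    var_p = max((ms_p - ms_res) / n_i, 0.0)  # sigma^2(p)
    var_i = max((ms_i - ms_res) / n_p, 0.0)  # sigma^2(i)

    n_prime = n_items_decision or n_i
    rel_error = var_res / n_prime
    abs_error = (var_i + var_res) / n_prime
    return {
        "var_persons": var_p,
        "var_items": var_i,
        "var_residual": var_res,
        "g_coefficient": var_p / (var_p + rel_error),    # relative (Ep^2)
        "phi_coefficient": var_p / (var_p + abs_error),  # absolute dependability
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = (rng.random((200, 40)) < 0.6).astype(float)  # toy 0/1 response matrix
    print(g_study_pxi(demo))
```

The relative (G) coefficient treats the item facet as constant across persons, while the phi coefficient also counts item variance as error; comparing the estimated components, and these coefficients, across the male and female subsamples is the kind of contrast the study reports.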


Notes

  1. The EduG software and its manual are freely available from http://www.irdp.ch/edumetrie/englishprogram.htm.


Author information


Corresponding author

Correspondence to Hossein Karami.


About this article

Cite this article

Karami, H. An investigation of the gender differential performance on a high-stakes language proficiency test in Iran. Asia Pacific Educ. Rev. 14, 435–444 (2013). https://doi.org/10.1007/s12564-013-9272-y

