MIMIC approach to assessing differential item functioning with control of extreme response style
Likert or rating scales may elicit an extreme response style (ERS), whereby responses reflect a preference for the endpoint categories rather than the trait the scale is intended to measure. Research has shown that the presence of ERS can bias scale scores and thus undermine the accuracy of differential item functioning (DIF) detection. In this study, a new method under the multiple-indicators multiple-causes (MIMIC) framework is proposed to eliminate the impact of ERS on DIF detection. A series of simulations showed that, when ERS was not taken into account, a between-group difference in ERS inflated false-positive rates and deflated true-positive rates in DIF detection. Compared with the conventional MIMIC model, logistic discriminant function analysis, ordinal logistic regression, and their extensions, the modified MIMIC model controlled false-positive rates across conditions and yielded trustworthy true-positive rates. An empirical example from a study of Chinese marital resilience was analyzed to demonstrate the proposed model.
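The confound that the simulations target can be illustrated with a minimal sketch (all parameter values and function names below are hypothetical illustrations, not taken from the study): two groups share an identical latent-trait distribution, but the group with the stronger ERS produces far more endpoint responses on a 5-point item, which a DIF procedure that ignores response style can mistake for item bias.

```python
import random

random.seed(42)

# Hypothetical thresholds mapping a latent trait to categories 1..5.
THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]

def categorize(value):
    """Map a continuous value to an ordinal category 1..5 via thresholds."""
    category = 1
    for t in THRESHOLDS:
        if value > t:
            category += 1
    return category

def respond(theta, ers):
    """Generate one 5-point response; with probability `ers`,
    a non-neutral response is pushed to the nearer endpoint."""
    c = categorize(theta + random.gauss(0, 0.5))  # item-level noise
    if c != 3 and random.random() < ers:
        return 1 if c < 3 else 5
    return c

def simulate(n, ers):
    """Simulate n respondents with trait ~ N(0, 1) and a given ERS level."""
    thetas = [random.gauss(0, 1) for _ in range(n)]
    responses = [respond(t, ers) for t in thetas]
    return thetas, responses

def extreme_rate(responses):
    """Proportion of endpoint (1 or 5) responses."""
    return sum(r in (1, 5) for r in responses) / len(responses)

# Two groups with the same trait distribution but different ERS levels.
theta_a, resp_a = simulate(2000, ers=0.1)
theta_b, resp_b = simulate(2000, ers=0.4)
print(f"mean trait A={sum(theta_a)/len(theta_a):+.2f}, "
      f"B={sum(theta_b)/len(theta_b):+.2f}")
print(f"endpoint rate A={extreme_rate(resp_a):.2f}, "
      f"B={extreme_rate(resp_b):.2f}")
```

Although both groups have the same underlying trait, the high-ERS group's response distribution differs markedly, so a naive group comparison of observed item responses will register differences that are pure response-style artifacts. This is the mechanism behind the inflated false-positive rates reported above; the proposed modified MIMIC model addresses it by modeling the ERS influence explicitly rather than leaving it confounded with the trait.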
Keywords: Extreme response style · Multiple indicators multiple causes · Differential item functioning · Measurement invariance
The study was funded by the University Research Council under an Early Career Scheme Grant (No. CityU 21615416).