MIMIC approach to assessing differential item functioning with control of extreme response style

  • Kuan-Yu Jin
  • Hui-Fang Chen


Likert or rating scales may elicit an extreme response style (ERS), a tendency to favor the endpoint categories, which means that responses to scales do not solely reflect the ability that is meant to be measured. Research has shown that the presence of ERS can lead to biased scores and thus reduce the accuracy of differential item functioning (DIF) detection. In this study, a new method under the multiple-indicators multiple-causes (MIMIC) framework is proposed as a means to eliminate the impact of ERS in DIF detection. The findings from a series of simulations showed that, when ERS was not taken into account, a difference in ERS between groups caused inflated false-positive rates and deflated true-positive rates in DIF detection. Compared with conventional MIMIC, logistic discriminant function analysis, ordinal logistic regression, and their extensions, the modified MIMIC model controlled false-positive rates across conditions and yielded trustworthy true-positive rates. An empirical example from a study of Chinese marital resilience was analyzed to demonstrate the proposed model.


Keywords: Extreme response style; Multiple indicators multiple causes; Differential item functioning; Measurement invariance


Author note

The study was funded by the University Research Council under an Early Career Scheme Grant (No. CityU 21615416).


Copyright information

© The Psychonomic Society, Inc. 2019

Authors and Affiliations

  1. Faculty of Education, University of Hong Kong, Pokfulam, Hong Kong
  2. Department of Social and Behavioural Sciences, City University of Hong Kong, Kowloon, Hong Kong
