
Null Results in Assessing Survey Score Comparability: Illustrating Measurement Invariance Using Item Response Theory

Published in the Journal of Business and Psychology.

Abstract

In using organizational surveys for decision-making, it is essential to consider measurement equivalence/invariance (ME/I), which addresses the question of whether score differences are attributable to differences in the latent variable we intend to measure or to confounding differences in measurement properties. Because null results tend to remain unpublished, most articles have focused on findings of, and reasons for, violations of ME/I. By contrast, little is available to practitioners and researchers concerning situations in which ME/I can be expected to hold. This is especially disconcerting because the null is the desired result in such analyses, as it allows for unfettered observed-score comparisons. This special issue presents a unique opportunity to provide such a discussion using real-world examples from an organizational culture survey. In doing so, we hope to clear up confusion surrounding the concept of ME/I, when it can be expected, and how it relates to actual differences in scores. First, we review the basic tenets of ME/I and past findings, and describe the item response theory differential item functioning framework used here. Next, we show ME/I being upheld using organizational survey data in which violations would reasonably not be expected (i.e., the null hypothesis was predicted and supported), and we simulate the consequences of ignoring ME/I. Finally, we suggest a set of conditions under which ME/I is likely to be upheld.
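The consequence the abstract describes can be illustrated with a minimal simulation. The sketch below is hypothetical and is not the authors' analysis: it uses a simple two-parameter logistic (2PL) IRT model, draws two groups with identical latent-trait distributions, and shifts the difficulty of one item for the focal group (i.e., a violation of ME/I). The groups then differ in observed sum scores even though they do not differ on the latent variable, which is exactly why observed-score comparisons are unsafe when invariance fails. All item parameters and sample sizes are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(42)

def p_correct(theta, a, b):
    # 2PL item response function: probability of endorsing the item
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def simulate_scores(n, difficulties, shift_item=None, shift=0.0):
    # Sum scores for n respondents with theta ~ N(0, 1).
    # Optionally make one item harder (DIF) by shifting its difficulty.
    scores = []
    for _ in range(n):
        theta = random.gauss(0.0, 1.0)
        total = 0
        for j, b in enumerate(difficulties):
            bj = b + (shift if j == shift_item else 0.0)
            if random.random() < p_correct(theta, 1.5, bj):
                total += 1
        scores.append(total)
    return scores

items = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical difficulty parameters

# Both groups share the same latent distribution; only measurement differs.
ref = simulate_scores(20000, items)
foc = simulate_scores(20000, items, shift_item=2, shift=1.0)  # item 3 shows DIF

print("reference mean:", round(statistics.mean(ref), 2))
print("focal mean:    ", round(statistics.mean(foc), 2))
```

With no true latent difference, the focal group's mean sum score is noticeably lower, so a naive observed-score comparison would misattribute a measurement artifact to a substantive group difference.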


Figures 1–5 appear in the full article.

Notes

  1. A fully freed model was fit to ensure that item and person parameters were on the same scale.
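The footnote's concern — ensuring item and person parameters are on a common scale before comparing groups — is often handled by metric linking when groups are calibrated separately. The sketch below shows the classic mean/sigma linking method as a generic illustration; it is not the procedure used in this article, and the parameter values are invented.

```python
import statistics

# Hypothetical item difficulty estimates from two separate calibrations
# of the same items; the focal group's metric is shifted by 0.3.
ref_b = [-1.2, -0.4, 0.1, 0.6, 1.1]
foc_b = [-0.9, -0.1, 0.4, 0.9, 1.4]

# Mean/sigma linking: find the linear transformation b* = A*b + B that
# places the focal-group difficulties on the reference-group scale.
A = statistics.stdev(ref_b) / statistics.stdev(foc_b)
B = statistics.mean(ref_b) - A * statistics.mean(foc_b)
linked = [A * b + B for b in foc_b]

print([round(b, 2) for b in linked])  # recovers the reference-scale values
```

Once parameters share a metric, apparent between-group differences in item parameters reflect DIF rather than an arbitrary scale difference.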


Author information

Corresponding author: Nathan T. Carter.


Cite this article

Carter, N.T., Kotrba, L.M. & Lake, C.J. Null Results in Assessing Survey Score Comparability: Illustrating Measurement Invariance Using Item Response Theory. J Bus Psychol 29, 205–220 (2014). https://doi.org/10.1007/s10869-012-9283-4
