Abstract
In using organizational surveys for decision-making, it is essential to consider measurement equivalence/invariance (ME/I), which addresses whether score differences are attributable to differences in the latent variable we intend to measure or to confounding differences in measurement properties. Because null results tend to remain unpublished, most articles have focused on findings of, and reasons for, violations of ME/I. Little guidance is available to practitioners and researchers concerning situations in which ME/I can be expected to hold. This gap is especially disconcerting because the null is the desired result in such analyses and is what permits unfettered observed-score comparisons. This special issue presents a unique opportunity to provide such a discussion using real-world examples from an organizational culture survey. In doing so, we hope to clear up confusion surrounding the concept of ME/I, when it can be expected, and how it relates to actual differences in scores. First, we review the basic tenets of ME/I and past findings, and describe the item response theory differential item functioning framework used here. Next, we show ME/I being upheld in organizational survey data where violations would reasonably not be expected (i.e., the null hypothesis was predicted and supported), and we simulate the consequences of ignoring ME/I. Finally, we suggest a set of conditions under which ME/I is likely to hold.
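The central idea of the abstract can be made concrete with a small numerical sketch. The example below is not the authors' analysis; it is a minimal illustration, with hypothetical item parameters, of how a violation of ME/I (here, uniform differential item functioning under a two-parameter logistic model) produces different expected observed scores for two groups even when their latent standing is identical.

```python
import math

def p_2pl(theta, a, b):
    """Probability of item endorsement under the 2PL IRT model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical 5-item scale: (discrimination a, difficulty b) pairs.
# Both groups share all parameters except item 3, whose difficulty is
# shifted upward for the focal group (uniform DIF).
ref_items = [(1.2, -1.0), (0.8, -0.5), (1.5, 0.0), (1.0, 0.5), (0.9, 1.0)]
foc_items = list(ref_items)
foc_items[2] = (1.5, 0.8)  # same a, but the item is harder for the focal group

def expected_score(theta, items):
    """Expected observed total score at a given latent trait level."""
    return sum(p_2pl(theta, a, b) for a, b in items)

# Identical latent standing (theta = 0) in both groups, yet the expected
# observed scores differ purely because of the measurement difference —
# exactly the confound that ME/I analyses are designed to rule out.
theta = 0.0
ref_total = expected_score(theta, ref_items)
foc_total = expected_score(theta, foc_items)
print(round(ref_total, 3), round(foc_total, 3))
```

When the shifted item's parameters are restored to the reference values, the two expected scores coincide, which is the situation the article argues is often reasonable to predict in advance: the null (no DIF) holds and observed scores can be compared directly.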
Notes
A fully freed model was fit to ensure that item and person parameters were on the same scale.
Carter, N.T., Kotrba, L.M. & Lake, C.J. Null Results in Assessing Survey Score Comparability: Illustrating Measurement Invariance Using Item Response Theory. J Bus Psychol 29, 205–220 (2014). https://doi.org/10.1007/s10869-012-9283-4