Abstract
Purpose
The purpose of this study was to compare the results of a subjective and a statistical method of detecting non-comparable items in 14 language-translated forms of a survey measuring organizational culture.
Design/Methodology/Approach
Data were obtained from a large multinational organization using a 60-item organizational culture survey. Each of 14 language-translated forms was administered to members of its respective language group and compared to the original English (United States) form. Subjective reviews were conducted by fluent bilingual organizational members, who flagged items they did not believe were comparable. Statistical analyses using an item response theory (IRT) approach were used to detect problematic items in 14 samples with sizes ranging from 304 to 3,014. Detection patterns from the two approaches were compared.
Findings
The subjective approach identified far fewer items as problematic, and its flagged items did not agree with those identified by the statistical approach.
Implications
Our results suggest that the subjective approach as a pre-screening adaptation procedure has little added value over a careful translation/back-translation procedure.
Originality/Value
The use of language-adapted organizational surveys has become increasingly important to multinational organizations. In examining scores from such surveys, establishing score comparability is essential. IRT analyses are often used to determine whether scores are comparable across language translations. It is also common for subjective reviews of item content to be conducted prior to statistical techniques to determine whether the translated items are comparable to the original form. No research known to the authors has compared modern statistical and subjective approaches to addressing these issues.
Notes
Throughout this manuscript, we refer to the LR approach as “the statistical” approach. However, as one reviewer noted, there are many statistical approaches, and the choice of a statistical approach is to some extent subjective. We have done our best to consider past research evidence in choosing a method that has strong empirical support.
It should be noted that these analyses represent initial translations of the DOCS and these language-translated forms have since undergone substantial revisions. For example, the proportion of items shown to have DIF in the most recent Japanese form was significantly smaller, z = 2.47, p < .05. Details of this analysis can be obtained by contacting the first author.
As a measure of association we utilized Cramér’s φ, calculated as \( \varphi = \sqrt{\frac{\chi^{2}}{n(r - 1)}} \), where n is the number of observations and r is the number of rows (see Cramér 1945).
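For readers who wish to reproduce this calculation, a minimal sketch follows; the function name and the example values are illustrative, not taken from the study.

```python
import math

def cramers_phi(chi2: float, n: int, r: int) -> float:
    """Cramér's phi: sqrt(chi2 / (n * (r - 1))), where n is the
    number of observations and r is the number of rows."""
    return math.sqrt(chi2 / (n * (r - 1)))

# Illustrative values: chi-square of 10.0 from a 2-row table of 100 observations.
phi = cramers_phi(chi2=10.0, n=100, r=2)
print(round(phi, 4))  # → 0.3162
```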
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Baker, M., Olohan, M., & Perez, M. C. (2010). Text and context: Essays on translation and interpreting in honour of Ian Mason. Manchester, UK: Saint Jerome Publishing.
Behrend, T. S., Thompson, L. F., Meade, A. W., Newton, D. A., & Grayson, M. S. (2008). Measurement invariance in careers research: Using IRT to study gender differences in medical students’ specialization decisions. Journal of Career Development, 35, 60–83.
Berdie, F. S. (1971). What test questions are likely to offend the general public. Journal of Educational Measurement, 8, 87–93.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–1071.
Brown, R. T., Reynolds, C. R., & Whitaker, J. S. (1999). Bias in mental testing since “Bias in Mental Testing”. School Psychology Quarterly, 14, 208–238.
Budgell, G. R., Raju, N. S., & Quartetti, D. A. (1995). Analysis of differential item functioning in translated assessment instruments. Applied Psychological Measurement, 19, 309–321.
Candell, G. L., & Drasgow, F. (1988). An iterative procedure for linking metrics and assessing item bias in item response theory. Applied Psychological Measurement, 12, 253–260.
Carter, N. T., & Zickar, M. J. (2011). A comparison of the LR and DFIT frameworks of differential functioning to the generalized graded unfolding model. Applied Psychological Measurement, 35, 623–642.
Clark, P. C., & LaHuis, D. M. (2011). An examination of power and type I errors for two differential item functioning indices using the graded response model. Organizational Research Methods. doi:10.1177/1094428111403815.
Cramér, H. (1945). Mathematical methods of statistics. Uppsala: Almqvist & Wiksells.
De Ayala, R. J., Kim, S.-H., Stapleton, L. M., & Dayton, C. M. (2002). Differential item functioning: A mixture distribution conceptualization. International Journal of Testing, 2, 243–276.
Denison, D. R. (1984). Bringing corporate culture to the bottom line. Organizational Dynamics, 13, 4–22.
Denison, D. R. (1990). Corporate culture and organizational effectiveness. New York: Wiley.
Denison, D. R. (2000). Organizational culture: Can it be a key lever for driving organizational change? In C. L. Cooper, S. Cartwright, & P. C. Earley (Eds.), The international handbook of organizational culture and climate (pp. 347–372). New York: Wiley.
Denison, D. R., Janovics, J., Young, J., & Cho, H. J. (2007). Diagnosing organizational cultures: Validating a model and method. Working paper, International Institute for Management Development, Lausanne, Switzerland.
Denison, D. R., & Mishra, A. K. (1995). Toward a theory of organizational culture and effectiveness. Organizational Science, 6, 204–223.
Denison, D. R., & Neale, W. (1996). Denison organizational culture survey. Ann Arbor: Aviat.
Denison, D. R., & Neale, W. (2000). Denison organizational culture survey. Ann Arbor, MI: Denison Consulting.
Drasgow, F. (1984). Scrutinizing psychological tests: Measurement equivalence and equivalent relations with external variables are central issues. Psychological Bulletin, 95, 134–135.
Drasgow, F., Levine, M. V., Tsien, S., Williams, B. A., & Mead, A. D. (1995). Fitting polytomous item response theory models to multiple-choice tests. Applied Psychological Measurement, 19, 143–165.
Ellis, B. B., Becker, P., & Kimmel, H. D. (1993). An item response theory evaluation of an English version of the Trier Personality Inventory. Journal of Cross-Cultural Psychology, 23, 133–148.
Ellis, B. B., Minsel, B., & Becker, P. (1989). Evaluation of attitude survey translations: An investigation using item response theory. International Journal of Psychology, 24, 665–684.
Fey, C. F., & Denison, D. R. (2003). Organizational culture and effectiveness: Can American theory be applied in Russia? Organizational Science, 14, 686–706.
Gierl, M. J., & Khaliq, S. N. (2001). Identifying sources of differential item and bundle functioning on translated achievement tests: A confirmatory analysis. Journal of Educational Measurement, 38, 164–187.
Guion, R. M. (1977). Content validity: The source of my discontent. Applied Psychological Measurement, 1, 1–10.
Guion, R. M. (1980). On Trinitarian doctrines of validity. Professional Psychology, 11, 385–398.
Hambleton, R. K., & de Jong, J. H. A. L. (2003). Advances in translating and adapting educational and psychological tests. Language Testing, 20, 127–134.
Hambleton, R. K., & Kanjee, A. (1995). Increasing the validity of cross-cultural assessments: Use of improved methods for test adaptation. European Journal of Psychological Assessment, 11, 147–157.
Hambleton, R. K., & Patsula, L. (1999). Increasing the validity of adapted tests: Myths to be avoided and guidelines for improving test adaptation practices. Journal of Applied Testing Technology, 1, 1–12.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Highhouse, S. (2008). Stubborn reliance on intuition and subjectivity in employee selection. Industrial and Organizational Psychology: Perspectives on Science and Practice, 1, 333–342.
Hofstede, G. (1991). Empirical models of cultural differences. In N. Bleichrodt & P. J. D. Drenth (Eds.), Contemporary issues in cross-cultural psychology (pp. 4–20). Lisse: Swets & Zeitlinger.
Huang, C. D., Church, A. T., & Katigbak, M. S. (1997). Identifying cultural differences in items and traits: Differential item functioning in the NEO personality inventory. Journal of Cross-Cultural Psychology, 28, 192–218.
Hulin, C. L., Drasgow, F., & Komocar, J. (1982). Applications of item response theory to analysis of attitude scale translations. Journal of Applied Psychology, 67, 818–825.
Hulin, C. L., & Mayer, L. J. (1986). Psychometric equivalence of a translation of the job descriptive index into Hebrew. Journal of Applied Psychology, 71, 83–94.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.
Kim, S.-H., & Cohen, A. S. (1995). A comparison of Lord’s Chi-square, Raju’s area measures, and the likelihood ratio test on detection of differential item functioning. Applied Measurement in Education, 8, 291–312.
Kim, S.-H., & Cohen, A. S. (1998). Detection of differential item functioning under the graded response model with the likelihood ratio test. Applied Psychological Measurement, 22, 345–355.
Landy, F. J. (2005). Employment discrimination litigation: Behavioral, quantitative, and legal perspectives. San Francisco: Jossey-Bass.
McKay, R., Breslow, M., Sangster, R., Gabbard, S., Reynolds, R., Nakamoto, J., et al. (1996). Translating survey questionnaires: Lessons learned. New Directions for Evaluation, 70, 93–104.
Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143.
Oshima, T. C., & Morris, S. B. (2008). Raju’s differential functioning of items and tests. Educational Measurement: Issues and Practice, 27, 43–50.
Pérez, E. O. (2009). Lost in translation? Item validity in bilingual political surveys. The Journal of Politics, 71, 1530–1548.
Plake, B. S. (1980). A comparison of a statistical and subjective procedure to ascertain item validity: One step in the test validation process. Educational and Psychological Measurement, 40, 397–404.
Ramirez, M., Ford, M. E., Stewart, A. L., & Teresi, J. A. (2005). Measurement issues in health disparities research. Health Services Research, 40, 1640–1657.
Ramsey, P. (1993). Sensitivity review: The ETS experience as a case study. In P. Holland & H. Wainer (Eds.), Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum.
Rogers, T. B. (1992). Antecedents of operationalism: A case history in radical positivism. In C. W. Tolman (Ed.), Positivism in psychology: Historical and contemporary problems. London: Springer-Verlag.
Ross, S. J., & Okabe, J. (2006). The subjective and objective interface of bias detection on languages. International Journal of Testing, 6, 229–253.
Ryan, A. M., Chan, D., Ployhart, R. E., & Slade, A. L. (1999). Employee attitude surveys in a multinational organization: Considering language and culture in assessing measurement equivalence. Personnel Psychology, 52, 37–58.
Ryan, A. M., Horvath, M., Ployhart, R. E., Schmitt, N., & Slade, L. A. (2000). Hypothesizing differential item functioning in global employee opinion surveys. Personnel Psychology, 53, 531–562.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34, 100–114.
Sandoval, J., & Miille, M. W. (1980). Accuracy of judgments of WISC-R item difficulty for minority groups. Journal of Consulting and Clinical Psychology, 48, 249–253.
Searcy, C. A., & Lautenschlager, G. J. (1999). A monte carlo investigation of DIF assessment for polytomously scored items. Paper presented at the 14th annual meeting of the Society for Industrial and Organizational Psychology, Atlanta, GA.
Thissen, D. (2001). IRTLRDIFv2.0b: Software for the computation of the statistics involved in item response theory-based likelihood ratio tests for differential item functioning. Chapel Hill, NC: University of North Carolina.
Thissen, D. (2003). MULTILOG 7.0: Multiple, categorical item analysis and test scoring using item response theory [Computer Program]. Chicago, IL: Scientific Software, Inc.
Van de Vijver, F., & Hambleton, R. K. (1996). Translating tests: Some practical guidelines. European Psychologist, 1, 89–99.
Willgerodt, M. A. (2003). Using focus groups to develop culturally relevant instruments. Western Journal of Nursing Research, 25, 798–814.
Woods, C. M. (2011). DIF testing for ordinal items with poly-SIBTEST, the Mantel and GMH tests, and IRT-LR-DIF when the latent distribution is nonnormal for both groups. Applied Psychological Measurement, 35, 145–164.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning: Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.
Carter, N.T., Kotrba, L.M., Diab, D.L. et al. A Comparison of a Subjective and Statistical Method for Establishing Score Comparability in an Organizational Culture Survey. J Bus Psychol 27, 451–466 (2012). https://doi.org/10.1007/s10869-011-9254-1