Abstract
Purpose
The purpose of this study was to compare the results of a subjective and a statistical method of detecting non-comparable items in 14 language-translated forms of a survey measuring organizational culture.
Design/Methodology/Approach
Data were obtained from a large multinational organization using a 60-item organizational culture survey. Each of 14 language-translated forms was administered to members of its respective language group and compared to the original English (United States) form. Subjective reviews were conducted by fluent bilingual organizational members, who flagged items they did not believe were comparable. Statistical analyses using an item response theory (IRT) approach were used to detect problematic items in 14 samples with sizes ranging from 304 to 3,014. Detection patterns from the two approaches were compared.
Findings
The subjective approach identified far fewer items as problematic, and its flagged items did not agree with those identified by the statistical approach.
Implications
Our results suggest that the subjective approach as a pre-screening adaptation procedure has little added value over a careful translation/back-translation procedure.
Originality/Value
The use of language-adapted organizational surveys has become increasingly important to multinational organizations. In examining scores from such surveys, establishing score comparability is essential. IRT analyses are often used to determine whether scores are comparable across language translations. It is also common for subjective reviews of item content to be conducted prior to statistical techniques to determine whether the translated items are comparable to the original form. No research known to the authors has compared modern statistical and subjective approaches to addressing these issues.
Notes
Throughout this manuscript, we refer to the LR approach as “the statistical” approach. However, as one reviewer noted, there are many statistical approaches, and the choice of a statistical approach is to some extent subjective. We have done our best to consider past research evidence in choosing a method that has strong empirical support.
It should be noted that these analyses represent initial translations of the DOCS and these language-translated forms have since undergone substantial revisions. For example, the proportion of items shown to have DIF in the most recent Japanese form was significantly smaller, z = 2.47, p < .05. Details of this analysis can be obtained by contacting the first author.
As a measure of association we utilized Cramér’s φ, calculated as \( \varphi = \sqrt{\frac{\chi^{2}}{n(r - 1)}} \), where n is the number of observations and r is the number of rows (see Cramér 1945).
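For readers who wish to reproduce this calculation, a minimal sketch follows; the function name and the example values are illustrative, not taken from the study.

```python
import math

def cramers_phi(chi2: float, n: int, r: int) -> float:
    """Cramér's phi: sqrt(chi2 / (n * (r - 1))), where n is the
    number of observations and r is the number of rows."""
    return math.sqrt(chi2 / (n * (r - 1)))

# Illustrative values: chi-square of 10.0 from a 2-row table of 100 observations.
phi = cramers_phi(chi2=10.0, n=100, r=2)
print(round(phi, 4))  # → 0.3162
```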
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Baker, M., Olohan, M., & Perez, M. C. (2010). Text and context: Essays on translation and interpreting in honour of Ian Mason. Manchester, UK: Saint Jerome Publishing.
Behrend, T. S., Thompson, L. F., Meade, A. W., Newton, D. A., & Grayson, M. S. (2008). Measurement invariance in careers research: Using IRT to study gender differences in medical students’ specialization decisions. Journal of Career Development, 35, 60–83.
Berdie, F. S. (1971). What test questions are likely to offend the general public. Journal of Educational Measurement, 8, 87–93.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–1071.
Brown, R. T., Reynolds, C. R., & Whitaker, J. S. (1999). Bias in mental testing since “Bias in Mental Testing”. School Psychology Quarterly, 14, 208–238.
Budgell, G. R., Raju, N. S., & Quartetti, D. A. (1995). Analysis of differential item functioning in translated assessment instruments. Applied Psychological Measurement, 19, 309–321.
Candell, G. L., & Drasgow, F. (1988). An iterative procedure for linking metrics and assessing item bias in item response theory. Applied Psychological Measurement, 12, 253–260.
Carter, N. T., & Zickar, M. J. (2011). A comparison of the LR and DFIT frameworks of differential functioning to the generalized graded unfolding model. Applied Psychological Measurement, 35, 623–642.
Clark, P. C., & LaHuis, D. M. (2011). An examination of power and type I errors for two differential item functioning indices using the graded response model. Organizational Research Methods. doi:10.1177/1094428111403815.
Cramér, H. (1945). Mathematical methods of statistics. Uppsala: Almqvist & Wiksells.
De Ayala, R. J., Kim, S.-H., Stapleton, L. M., & Dayton, C. M. (2002). Differential item functioning: A mixture distribution conceptualization. International Journal of Testing, 2, 243–276.
Denison, D. R. (1984). Bringing corporate culture to the bottom line. Organizational Dynamics, 13, 4–22.
Denison, D. R. (1990). Corporate culture and organizational effectiveness. New York: Wiley.
Denison, D. R. (2000). Organizational culture: Can it be a key lever for driving organizational change? In C. L. Cooper, S. Cartwright, & P. C. Earley (Eds.), The international handbook of organizational culture and climate (pp. 347–372). New York: Wiley.
Denison, D. R., Janovics, J., Young, J., & Cho, H. J. (2007). Diagnosing organizational cultures: Validating a model and method. Working paper, International Institute for Management Development, Lausanne, Switzerland.
Denison, D. R., & Mishra, A. K. (1995). Toward a theory of organizational culture and effectiveness. Organizational Science, 6, 204–223.
Denison, D. R., & Neale, W. (1996). Denison organizational culture survey. Ann Arbor: Aviat.
Denison, D. R., & Neale, W. (2000). Denison organizational culture survey. Ann Arbor, MI: Denison Consulting.
Drasgow, F. (1984). Scrutinizing psychological tests: Measurement equivalence and equivalent relations with external variables are central issues. Psychological Bulletin, 95, 134–135.
Drasgow, F., Levine, M. V., Tsien, S., Williams, B. A., & Mead, A. D. (1995). Fitting polytomous item response theory models to multiple-choice tests. Applied Psychological Measurement, 19, 143–165.
Ellis, B. B., Becker, P., & Kimmel, H. D. (1993). An item response theory evaluation of an English version of the Trier Personality Inventory. Journal of Cross-Cultural Psychology, 23, 133–148.
Ellis, B. B., Minsel, B., & Becker, P. (1989). Evaluation of attitude survey translations: An investigation using item response theory. International Journal of Psychology, 24, 665–684.
Fey, C. F., & Denison, D. R. (2003). Organizational culture and effectiveness: Can American theory be applied in Russia? Organizational Science, 14, 686–706.
Gierl, M. J., & Khaliq, S. N. (2001). Identifying sources of differential item and bundle functioning on translated achievement tests: A confirmatory analysis. Journal of Educational Measurement, 38, 164–187.
Guion, R. M. (1977). Content validity: The source of my discontent. Applied Psychological Measurement, 1, 1–10.
Guion, R. M. (1980). On Trinitarian doctrines of validity. Professional Psychology, 11, 385–398.
Hambleton, R. K., & de Jong, J. H. A. L. (2003). Advances in translating and adapting educational and psychological tests. Language Testing, 20, 127–134.
Hambleton, R. K., & Kanjee, A. (1995). Increasing the validity of cross-cultural assessments: Use of improved methods for test adaptation. European Journal of Psychological Assessment, 11, 147–157.
Hambleton, R. K., & Patsula, L. (1999). Increasing the validity of adapted tests: Myths to be avoided and guidelines for improving test adaptation practices. Journal of Applied Testing Technology, 1, 1–12.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Highhouse, S. (2008). Stubborn reliance on intuition and subjectivity in employee selection. Industrial and Organizational Psychology: Perspectives on Science and Practice, 1, 333–342.
Hofstede, G. (1991). Empirical models of cultural differences. In N. Bleichrodt & P. J. D. Drenth (Eds.), Contemporary issues in cross-cultural psychology (pp. 4–20). Lisse: Swets & Zeitlinger.
Huang, C. D., Church, A. T., & Katigbak, M. S. (1997). Identifying cultural differences in items and traits: Differential item functioning in the NEO personality inventory. Journal of Cross-Cultural Psychology, 28, 192–218.
Hulin, C. L., Drasgow, F., & Komocar, J. (1982). Applications of item response theory to analysis of attitude scale translations. Journal of Applied Psychology, 67, 818–825.
Hulin, C. L., & Mayer, L. J. (1986). Psychometric equivalence of a translation of the job descriptive index into Hebrew. Journal of Applied Psychology, 71, 83–94.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.
Kim, S.-H., & Cohen, A. S. (1995). A comparison of Lord’s Chi-square, Raju’s area measures, and the likelihood ratio test on detection of differential item functioning. Applied Measurement in Education, 8, 291–312.
Kim, S.-H., & Cohen, A. S. (1998). Detection of differential item functioning under the graded response model with the likelihood ratio test. Applied Psychological Measurement, 22, 345–355.
Landy, F. J. (2005). Employment discrimination litigation: Behavioral, quantitative, and legal perspectives. San Francisco: Jossey-Bass.
McKay, R., Breslow, M., Sangster, R., Gabbard, S., Reynolds, R., Nakamoto, J., et al. (1996). Translating survey questionnaires: Lessons learned. New Directions for Evaluation, 70, 93–104.
Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143.
Oshima, T. C., & Morris, S. B. (2008). Raju’s differential functioning of items and tests. Educational Measurement: Issues and Practice, 27, 43–50.
Pérez, E. O. (2009). Lost in translation? Item validity in bilingual political surveys. The Journal of Politics, 71, 1530–1548.
Plake, B. S. (1980). A comparison of a statistical and subjective procedure to ascertain item validity: One step in the test validation process. Educational and Psychological Measurement, 40, 397–404.
Ramirez, M., Ford, M. E., Stewart, A. L., & Teresi, J. A. (2005). Measurement issues in health disparities research. Health Services Research, 40, 1640–1657.
Ramsey, P. (1993). Sensitivity review: The ETS experience as a case study. In P. Holland & H. Wainer (Eds.), Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum.
Rogers, T. B. (1992). Antecedents of operationalism: A case history in radical positivism. In C. W. Tolman (Ed.), Positivism in psychology: Historical and contemporary problems. London: Springer-Verlag.
Ross, S. J., & Okabe, J. (2006). The subjective and objective interface of bias detection on languages. International Journal of Testing, 6, 229–253.
Ryan, A. M., Chan, D., Ployhart, R. E., & Slade, A. L. (1999). Employee attitude surveys in a multinational organization: Considering language and culture in assessing measurement equivalence. Personnel Psychology, 52, 37–58.
Ryan, A. M., Horvath, M., Ployhart, R. E., Schmitt, N., & Slade, L. A. (2000). Hypothesizing differential item functioning in global employee opinion surveys. Personnel Psychology, 53, 531–562.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34, 100–114.
Sandoval, J., & Miille, M. W. (1980). Accuracy of judgments of WISC-R item difficulty for minority groups. Journal of Consulting and Clinical Psychology, 48, 249–253.
Searcy, C. A., & Lautenschlager, G. J. (1999). A monte carlo investigation of DIF assessment for polytomously scored items. Paper presented at the 14th annual meeting of the Society for Industrial and Organizational Psychology, Atlanta, GA.
Thissen, D. (2001). IRTLRDIFv2.0b: Software for the computation of the statistics involved in item response theory-based likelihood ratio tests for differential item functioning. Chapel Hill, NC: University of North Carolina.
Thissen, D. (2003). MULTILOG 7.0: Multiple, categorical item analysis and test scoring using item response theory [Computer Program]. Chicago, IL: Scientific Software, Inc.
Van de Vijver, F., & Hambleton, R. K. (1996). Translating tests: Some practical guidelines. European Psychologist, 1, 89–99.
Willgerodt, M. A. (2003). Using focus groups to develop culturally relevant instruments. Western Journal of Nursing Research, 25, 798–814.
Woods, C. M. (2011). DIF testing for ordinal items with poly-SIBTEST, the Mantel and GMH tests, and IRT-LR-DIF when the latent distribution is nonnormal for both groups. Applied Psychological Measurement, 35, 145–164.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning: Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.
Carter, N.T., Kotrba, L.M., Diab, D.L. et al. A Comparison of a Subjective and Statistical Method for Establishing Score Comparability in an Organizational Culture Survey. J Bus Psychol 27, 451–466 (2012). https://doi.org/10.1007/s10869-011-9254-1